
Cohere Command A with Clanker Cloud — Open-Weights Enterprise AI for Infrastructure

Use Cohere Command A's 256K context and open-weights model in Clanker Cloud. BYOK or self-host for air-gapped enterprise infrastructure management.

Most teams evaluating AI for infrastructure management face the same tension: capability versus control. Managed API providers offer the most capable models, but data leaves your environment with every query. Open-source models keep data local, but typically sacrifice context length, tool use, and reliability.

Cohere Command A resolves that tension. It is open-weights, self-hostable, and purpose-built for enterprise agentic tasks — with a 256K context window that holds your entire infrastructure configuration in a single pass. Paired with Clanker Cloud, it becomes a practical infrastructure reasoning layer that works equally well via Cohere's managed API or fully air-gapped on your own hardware.


Cohere Command A — the enterprise-first model

Command A was released in March 2025. At 111 billion parameters, it is currently Cohere's most performant chat and agentic model. The API identifier is cohere.command-a-03-2025.

The model was not fine-tuned for enterprise use after the fact — it was designed for it from the outset. Cohere's positioning is explicit: "maximum performance with minimum hardware costs," with particular strength in business-critical agentic tasks, tool use, retrieval-augmented generation (RAG), and multilingual workloads across 10 or more languages.

Compared to Command R+, Command A delivers better throughput, stronger tool-use reliability, and more coherent coordination across multi-step agent workflows. These are not marginal improvements — they are the difference between an agent that completes a plan and one that stalls partway through.

The open-weights release (published under Cohere's CC-BY-NC license with an acceptable use policy, so review the terms for commercial deployment) means the model weights are downloadable. You can run Command A on your own GPU cluster, on bare metal, or inside an air-gapped data center. No data leaves your environment unless you choose the managed API path.


Why Command A is uniquely suited for enterprise infrastructure

256K context window

Infrastructure configuration does not fit neatly into small chunks. A non-trivial AWS environment might include hundreds of Terraform resources spread across dozens of files, CloudFormation stacks with cross-stack references, Kubernetes manifests for multiple namespaces, and IAM policies that interact with resources defined elsewhere. Chunking this input and feeding it to a model in pieces loses the cross-resource relationships that matter most for security audits, cost analysis, and dependency mapping.

Command A's 256K token context window holds an entire Terraform state file, a full CloudFormation template, and a suite of Kubernetes manifests simultaneously. Cross-resource relationships — an IAM role defined in one file affecting an EC2 instance defined in another — remain visible to the model in a single pass.

Among open-weights models, a 256K context window puts Command A at the top of the range; most peers sit at 128K or 200K. For large enterprise environments, that extra headroom is the difference between a complete audit and a partial one.

Open-weights and self-hosted deployment

Running Command A on your own infrastructure eliminates data egress by design. For organizations in financial services, healthcare, government, or defense contracting, this is not a preference — it is a compliance requirement.

Command A is deployable on 2x A100 80GB GPUs or, with AWQ quantization, on 4x RTX 4090s. For teams already running Hetzner bare-metal or on-premises GPU clusters, the hardware cost to self-host Command A is well within range. Clanker Cloud's local-first architecture means the model endpoint, the infrastructure credentials, and the query results all remain on your machines.
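Before downloading weights, a quick capacity check on the target host can save time. A minimal sketch, assuming the NVIDIA driver is installed; the memory target follows the 2x A100 80GB guidance above:

```shell
# List each GPU's name and total memory; unquantized Command A wants
# roughly 160 GB of GPU memory across cards per the guidance above.
gpus=$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null \
  || echo "nvidia-smi not found - no NVIDIA GPU driver on this host")
echo "$gpus"
```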

Tool use and agent coordination

Command A was built with tool calling as a first-class capability. It handles multi-step agentic workflows without the degradation patterns common in models that were adapted for tool use after training. When Clanker Cloud routes a complex infrastructure task — spanning provider APIs, configuration files, and state data — Command A maintains plan coherence across the full sequence of tool calls.


Two deployment modes with Clanker Cloud

Mode 1: Cohere API (BYOK)

For teams that want Command A's capabilities without managing GPU infrastructure, the BYOK path uses Cohere's managed API:

  1. Obtain an API key at dashboard.cohere.com
  2. In Clanker Cloud, navigate to Settings → AI Model → Bring Your Own Key → Cohere
  3. Paste the key and select cohere.command-a-03-2025

Your key is stored locally in the Clanker Cloud desktop app and is never transmitted to Clanker Cloud servers. Queries route directly from your machine to Cohere's API. This mode is suitable for teams that want enterprise-grade performance without the overhead of running their own GPU infrastructure.
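Before pasting a key into the app, it can be worth a one-off smoke test against Cohere's public chat endpoint. A hedged sketch; note that Cohere's native API model name (command-a-03-2025) differs from the identifier shown in the Clanker Cloud settings:

```shell
# Smoke-test a Cohere API key against the v2 chat endpoint.
# Skips the network call when COHERE_API_KEY is unset.
payload='{"model":"command-a-03-2025","messages":[{"role":"user","content":"Reply with OK"}]}'
if [ -n "${COHERE_API_KEY:-}" ]; then
  curl -s https://api.cohere.com/v2/chat \
    -H "Authorization: Bearer $COHERE_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$payload"
else
  echo "set COHERE_API_KEY to run the smoke test"
fi
```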

For more on the BYOK model framework across all supported providers, see Clanker Cloud documentation.

Mode 2: Self-hosted Command A via Ollama

For fully air-gapped operation:

  1. Download Command A weights from Cohere's model repository (CC-BY-NC license with an acceptable use policy)
  2. Run the model locally via Ollama or vLLM on your on-premises hardware
  3. In Clanker Cloud Settings, configure the AI model endpoint to point to your local Ollama instance (e.g., http://localhost:11434)

In this configuration, zero external API calls are made. All inference happens on your hardware. Combined with Clanker Cloud's local credential storage, this creates a fully isolated environment: infrastructure credentials stay on-prem, model inference stays on-prem, and query logs stay on-prem.
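A quick probe confirms the local instance is serving before you point Clanker Cloud at it. A sketch assuming Ollama's default port and REST API; the model tag command-a is a placeholder for whatever name you registered the weights under locally:

```shell
# Probe the local Ollama endpoint, then run a one-shot generation.
endpoint="${OLLAMA_HOST:-http://localhost:11434}"
if curl -s --max-time 2 "$endpoint/api/tags" >/dev/null 2>&1; then
  curl -s "$endpoint/api/generate" \
    -d '{"model":"command-a","prompt":"Name one AWS cost lever.","stream":false}'
else
  echo "no Ollama instance responding at $endpoint"
fi
```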

This is the configuration that large enterprises with SOC 2, HIPAA, or GDPR constraints should evaluate before defaulting to managed API providers.


What Command A with 256K context can do

Full-stack infrastructure review

clanker ask "review my complete infrastructure configuration across all files and find security gaps, over-provisioning, and missing redundancy"

Feed the entire Terraform state, CloudFormation templates, and Kubernetes manifests in a single context. Command A tracks cross-resource dependencies — it understands that the IAM role in iam.tf grants permissions to the Lambda function in functions.tf, and that a permissive policy in one file has downstream effects across the stack.

This is the kind of holistic review that chunked-context approaches cannot reliably perform. A 128K model reviewing the same configuration in two passes can miss relationships between resources that land in different windows.
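A rough way to check whether a configuration fits in one pass is to estimate its token count up front. The ~4 characters per token rule of thumb is an approximation, and the file globs below are illustrative:

```shell
# Estimate total tokens across Terraform and manifest files
# using a ~4 chars/token heuristic against the 256K budget.
bytes=$(find . -name '*.tf' -o -name '*.yaml' -o -name '*.yml' 2>/dev/null \
  | xargs -r cat 2>/dev/null | wc -c)
tokens=$((bytes / 4))
echo "approx tokens: $tokens of 262144"
```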

Multi-language team support

Command A supports 10 or more languages natively. Global enterprise teams can query their infrastructure in their working language without translation overhead or loss of precision:

clanker ask "zeige mir alle ungenutzten AWS-Ressourcen in der eu-west-1 Region"

This query in German — "show me all unused AWS resources in the eu-west-1 region" — returns the same structured analysis as the English equivalent. For distributed teams with engineers in Germany, Japan, France, or Spain, this removes a real friction point.

Agentic infrastructure workflows

clanker ask "plan and execute a cost optimization pass across my AWS account — identify savings, draft the Terraform changes, and prepare a summary for the team" --maker --apply

Command A's native tool use coordinates across multiple Clanker Cloud tools in sequence: querying provider APIs, analyzing the results, generating Terraform with the --maker flag, applying changes with --apply, and producing a formatted summary. The --agent-trace flag surfaces the full tool call sequence for review.

This is a multi-step agentic workflow that requires sustained plan coherence. Command A's design for enterprise agentic tasks makes it reliable here in ways that general-purpose models often are not.


Command A and Deep Research

Clanker Cloud's Deep Research feature fans out across every connected provider — AWS, GCP, Azure, Kubernetes, Cloudflare, and others — runs parallel analysis with multiple subagents, and returns prioritized findings organized by severity: cost drivers, misconfigurations, resilience gaps, and availability issues.

Command A is particularly well-suited as the backbone model for Deep Research runs in large environments:

clanker ask "run a deep research scan — prioritize security and compliance findings"

The 256K context window means the Deep Research agent can hold more infrastructure state simultaneously than most models allow. For enterprises with hundreds of resources across multiple regions and providers, this avoids mid-audit context truncation, where a smaller-context model must drop earlier findings before the scan completes.

Deep Research results include severity levels, affected resources, evidence sources, estimated cost impact, and concrete action labels. Findings can be exported as JSON or Markdown for team sharing. Because Clanker Cloud is local-first, credentials never leave the machine during any part of the scan.
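Once exported as JSON, findings can be filtered with standard tools. A sketch assuming a simple list shape with severity, resource, and title fields; the actual export schema may differ:

```shell
# Fake a small export, then pull out only the high-severity findings.
cat > findings.json <<'EOF'
[
  {"severity":"high","resource":"s3/logs-bucket","title":"Bucket policy allows public read"},
  {"severity":"low","resource":"ec2/i-0abc","title":"Unattached EBS volume"}
]
EOF
jq -r '.[] | select(.severity == "high") | [.severity, .resource, .title] | @tsv' findings.json
```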


Command R7B — the lightweight Cohere option

For teams that do not need Command A's full capacity, Command R7B (released December 2024) offers a practical alternative for frequent, lightweight queries.

At 7 billion parameters, Command R7B runs on a single RTX 3090 or an Apple M-series Mac. It is optimized for RAG and tool use within a smaller footprint:

clanker ask "quick check — any new alerts or failures in the last 15 minutes"

Use cases for Command R7B include lightweight monitoring agents, high-frequency small queries where latency matters more than depth, and edge deployments. It can be deployed as a sidecar on Kubernetes nodes to provide local infrastructure reasoning without egress — a useful pattern for clusters where network egress is restricted or expensive.

Command A and Command R7B serve different points on the capability-resource tradeoff. Many teams will use both: Command R7B for continuous low-overhead monitoring, Command A for periodic deep audits and complex agentic workflows.


Compliance and data residency

Self-hosted Command A, combined with Clanker Cloud's local-first architecture, produces a configuration that is SOC 2, HIPAA, and GDPR-friendly by design — not by policy claims, but by architecture.

The data flow in a fully air-gapped deployment:

  • Infrastructure credentials: stored locally in the Clanker Cloud desktop app, never transmitted externally
  • Model inference: runs on your hardware, on your network
  • Query logs: remain on-prem, no third-party AI provider logs to audit or negotiate over
  • Provider API calls: go directly from your machine to AWS/GCP/Azure — no proxy, no intermediary

For organizations subject to data residency requirements, this is the difference between building a compliance case and inheriting one. There is no third-party AI data processing agreement to negotiate, no model provider DPA to review, and no inference log export to request.

The AI DevOps for Teams page covers how Clanker Cloud handles team deployments in regulated environments, including credential scoping and audit trail configuration.


Command A via MCP for enterprise agents

Command A's tool-use capability extends beyond direct CLI interaction. Teams building enterprise automation agents — deployment pipelines, incident response workflows, cost governance systems — can connect those agents to Clanker Cloud's infrastructure tooling via MCP:

clanker mcp --transport http --listen 127.0.0.1:39393

With the MCP server running, any agent built on Command A (via Cohere API or self-hosted) can call Clanker Cloud infrastructure tools using the standard MCP protocol. The primary tool for natural-language infrastructure queries is clanker_route_question.

A practical pattern: an enterprise deployment agent that queries infrastructure state before every release. Before applying a Terraform plan, the agent calls clanker_route_question to confirm the target environment is healthy and that the planned changes do not conflict with current state. This happens programmatically, without human intervention, using Command A as the reasoning layer.
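Under the hood, MCP tool calls are JSON-RPC 2.0 requests. A raw sketch against the endpoint above; the HTTP path and argument schema are assumptions, since agent SDKs normally handle the MCP handshake (initialization, capability exchange) for you:

```shell
# Hand-rolled MCP tools/call request; a real client would first run
# the initialize handshake the protocol requires.
req='{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"clanker_route_question","arguments":{"question":"is the production environment healthy?"}}}'
curl -s --max-time 2 -H "Content-Type: application/json" -d "$req" \
  http://127.0.0.1:39393/ \
  || echo "no MCP server listening - start one with: clanker mcp --transport http --listen 127.0.0.1:39393"
```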

For detailed MCP integration guidance, see Clanker Cloud for AI Agents. Teams building production agent workflows should also review the vibe-coding-to-production guide for patterns that move agent-assisted work safely through staging and into production.


FAQ

What is Cohere Command A and can I self-host it?

Command A is Cohere's flagship enterprise model, released March 2025. It has 111 billion parameters and a 256K token context window, and its weights are openly published under Cohere's CC-BY-NC license. You can download the model weights and run Command A on your own GPU hardware (2x A100 80GB, or 4x RTX 4090 with AWQ quantization) with no dependency on Cohere's API. The model is designed for agentic tasks, tool use, RAG, and multilingual workloads.

How do I use Cohere Command A with Clanker Cloud?

There are two paths. For the managed API route, obtain a Cohere API key at dashboard.cohere.com, then go to Clanker Cloud Settings → AI Model → Bring Your Own Key → Cohere and paste the key. Select the cohere.command-a-03-2025 model identifier. For self-hosted inference, run Command A via Ollama or vLLM on your local hardware and point Clanker Cloud's model endpoint to your local Ollama instance. Full setup documentation is at docs.clankercloud.ai.

Why does the 256K context window matter for infrastructure management?

Infrastructure configurations involve cross-resource dependencies that break when the context is chunked. An IAM policy in one file affects resources defined in another; a VPC configuration constrains subnets and security groups throughout the stack. A model with a 128K or 200K context window reviewing a large environment in multiple passes loses those cross-file relationships. Command A's 256K window holds an entire multi-file infrastructure configuration in a single context, enabling accurate dependency analysis, security audits, and cost reviews without truncation.

Is Cohere Command A suitable for air-gapped enterprise environments?

Yes. Because Command A is open-weights, you can run it entirely within your network perimeter. Combined with Clanker Cloud's local-first architecture — where credentials never leave the desktop app — a self-hosted Command A deployment produces a fully air-gapped configuration. No data is sent to Cohere's API, no query logs are held by third-party AI providers, and all inference runs on your hardware. This configuration is appropriate for financial services, healthcare, government, and defense environments with strict data residency requirements. See AI DevOps for Teams for compliance deployment guidance.


Get started

Clanker Cloud is in public beta with no cost on the Beta tier. The demo walks through provider connection, model configuration, and your first infrastructure query in under five minutes.

To create an account and connect your first provider, go to clankercloud.ai/account. For teams evaluating the self-hosted Command A path, start with the documentation — it covers Ollama endpoint configuration, MCP setup, and team credential scoping.

Questions about deployment options and compliance configurations are covered in the FAQ.

Next step

Give your agent live infrastructure context

Download Clanker Cloud, expose the local MCP surface, and let coding agents work from current cloud, Kubernetes, GitHub, and cost state instead of guesses.

Download and connect MCP
Watch demo