Local Models with Claude Code
Anthropic's API sends your code to their servers for processing. For most businesses this is acceptable: under Anthropic's commercial terms, API inputs are not used for model training by default. But some teams need their code to stay on-premises, whether for regulated industries, government contracts, or simply a preference for data sovereignty. Local models give you AI-assisted development without the data leaving your infrastructure.
When to consider local models
- Regulatory requirements — Your industry mandates that source code stays within your infrastructure or specific geographic regions.
- Classified or sensitive code — Government, defence, or financial code that cannot be transmitted to third-party APIs.
- Cost control at scale — Very high-volume usage (hundreds of sessions daily) where API costs exceed the cost of running your own infrastructure.
- Latency requirements — Models co-located with your infrastructure can cut round-trip time compared with calls to Anthropic's servers, which matters for rapid iteration.
Options for local/private deployment
GCP Vertex AI
Google Cloud hosts Claude models on Vertex AI. Your code is processed within GCP's infrastructure under your project's data residency settings. This is the easiest path to "private" Claude usage:
- Same model quality as the Anthropic API
- Data stays within your GCP project
- Governed by your existing GCP agreements and compliance certifications
- No infrastructure to manage — it's a managed API
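In practice, pointing Claude Code at Vertex AI is a handful of environment variables. The variable names below follow Claude Code's documented Vertex AI integration at the time of writing; the project ID and region are placeholders, so verify both against the current docs and your project's model availability.

```shell
# Assumes gcloud is installed and Claude models are enabled in your project.
gcloud auth application-default login

# Tell Claude Code to route requests through Vertex AI.
export CLAUDE_CODE_USE_VERTEX=1
export ANTHROPIC_VERTEX_PROJECT_ID=my-gcp-project   # placeholder project ID
export CLOUD_ML_REGION=europe-west1                 # pick a region where Claude is available
```

Because authentication rides on your existing gcloud credentials, no separate API key is involved.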
AWS Bedrock
Amazon Bedrock provides similar managed access to Claude models within AWS infrastructure. If your team is already on AWS, this keeps everything in one cloud:
- VPC integration for network isolation
- IAM-based access control
- CloudTrail logging for audit compliance
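The Bedrock path is similar, leaning on standard AWS credential resolution. These variable names match Claude Code's documented Bedrock integration at the time of writing; the region is a placeholder.

```shell
# Assumes AWS credentials are already configured (aws configure, SSO, or an
# instance role) and Claude model access is enabled in the Bedrock console.
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=eu-west-1   # placeholder; use a region where Claude is available
```

IAM policies and CloudTrail then apply to Claude Code sessions the same way they apply to any other Bedrock caller.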
Self-hosted open models
For complete infrastructure control, run open-source models on your own hardware. Open-weight models such as Llama and Mistral can run locally and, fronted by a compatible gateway, serve Claude Code through its provider configuration. Trade-offs:
- Full data sovereignty — nothing leaves your network
- Lower capability than Claude for complex coding tasks
- Requires GPU infrastructure (significant upfront cost)
- You manage updates, scaling, and availability
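Claude Code speaks the Anthropic Messages API, so a self-hosted model is typically fronted by a translation gateway (LiteLLM is one option) that exposes an Anthropic-compatible endpoint. A minimal sketch, assuming such a gateway is already running locally; the URL, token, and model name are placeholders for whatever your gateway actually exposes.

```shell
# Point Claude Code at a local gateway that translates the Anthropic
# Messages API to your self-hosted model.
export ANTHROPIC_BASE_URL=http://localhost:4000     # placeholder gateway URL
export ANTHROPIC_AUTH_TOKEN=local-dev-key           # placeholder gateway key
export ANTHROPIC_MODEL=llama-3.3-70b                # the model name your gateway serves
```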
Configuring Claude Code for alternative providers
Claude Code supports configuring alternative API endpoints. You can point it at Vertex AI, Bedrock, or any endpoint that speaks the Anthropic Messages API; a translation gateway can bridge OpenAI-compatible backends:
- Set the API endpoint to your Vertex AI or Bedrock URL
- Configure authentication using your cloud provider's credentials
- Select the model — the same Claude models are available through these providers
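Once the provider environment is configured, model selection and launch look the same regardless of backend. `ANTHROPIC_MODEL` is Claude Code's documented override; the model ID below is illustrative, since exact IDs vary by provider.

```shell
# Model ID is a placeholder; check your provider's catalog for current IDs.
export ANTHROPIC_MODEL='claude-sonnet-4-5'
claude   # launches Claude Code against whichever provider the environment selects
```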
Hybrid approaches
Most teams don't go fully local. A hybrid approach balances privacy with capability:
- Sensitive repos use Vertex AI/Bedrock — Code stays within your cloud infrastructure.
- Non-sensitive repos use the Anthropic API — Simpler setup, potentially lower cost.
- Local models for routine tasks — Use smaller, local models for code formatting, simple refactoring, and boilerplate generation. Save the full Claude models for complex work.
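One way to make a hybrid split stick is per-repository configuration, so nobody has to remember which backend a given project uses. The sketch below uses the `env` key in a project-level `.claude/settings.json`, which Claude Code reads on startup; the path and key follow the documented settings convention, but verify against the current docs, and treat the region as a placeholder.

```shell
# Hypothetical setup: force Bedrock for one sensitive repository.
cd sensitive-repo
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "env": {
    "CLAUDE_CODE_USE_BEDROCK": "1",
    "AWS_REGION": "eu-west-1"
  }
}
EOF
```

Repositories without such a file fall back to whatever the user's global configuration specifies.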
Cost comparison
Cost varies significantly by approach:
- Anthropic API — Pay per token. Most cost-effective for low to medium usage. The Max plan offers predictable pricing.
- Vertex AI / Bedrock — Per-token pricing broadly comparable to the Anthropic API; check current list prices. Costs may be offset by existing cloud spend commitments.
- Self-hosted — High fixed cost (GPU hardware or cloud GPU instances), low marginal cost per token. Only economical at very high volumes.
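The self-hosting decision reduces to a break-even calculation: fixed monthly infrastructure cost divided by the per-token API price you would otherwise pay. A quick sketch, with both figures as illustrative assumptions rather than real quotes:

```shell
# Break-even sketch: all prices are illustrative assumptions, not quotes.
api_cost_per_mtok=10   # blended $/million tokens via a managed API
gpu_monthly=3000       # fixed monthly cost of a self-hosted GPU node

# Monthly token volume at which self-hosting starts to pay off.
breakeven_mtok=$(awk -v a="$api_cost_per_mtok" -v g="$gpu_monthly" \
  'BEGIN { printf "%.0f", g / a }')
echo "Break-even: ${breakeven_mtok}M tokens/month"
```

Under these assumed numbers the break-even is 300 million tokens per month; below that volume, the managed APIs are cheaper before you even account for operations staff time.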
Quality considerations
Not all models are equal for coding tasks. Claude Opus and Sonnet are specifically tuned for code understanding, generation, and review. Open-source alternatives are improving rapidly but still lag behind for:
- Large codebase comprehension
- Multi-file refactoring
- Complex debugging and root cause analysis
- Following nuanced instructions in CLAUDE.md and skills
Test any alternative model thoroughly on your actual codebase before committing to it for production use.
Next steps
Local models fit into a broader AgentOps strategy. Combine with guardrails for access control and centralised logging for visibility across both local and cloud-hosted sessions.