Anthropic Provider

Access Claude models (Claude 3.5, Claude 4, Claude Opus 4.5) through Anthropic's API with support for extended thinking, prompt caching, and function calling.

Name anthropic
SDK anthropic
Models Claude 3.5, Claude 4, Claude Opus 4.5
Max Context 200K tokens (all models)

Available Models

Model                       Context  Extended Thinking
claude-opus-4-5-20251101    200K     Yes
claude-sonnet-4-20250514    200K     Yes
claude-haiku-4-20250414     200K     No
claude-3-5-sonnet-20241022  200K     Yes
claude-3-5-haiku-20241022   200K     No
Quick start
from shared import load_provider, ProviderConfig

provider = load_provider("anthropic")

# Initialize with API key
provider.initialize(ProviderConfig(
    api_key="sk-ant-...",  # or from ANTHROPIC_API_KEY env
))

# Connect to a model
provider.connect("claude-sonnet-4-20250514")

# Start chatting
provider.create_session(
    system_instruction="You are helpful."
)
response = provider.send_message("Hello!")
print(response.text)
With extended thinking
provider.initialize(ProviderConfig(
    api_key="sk-ant-...",
    extra={
        'enable_thinking': True,
        'thinking_budget': 10000
    }
))

# Response includes reasoning
response = provider.send_message("Complex question")
print(f"Thinking: {response.thinking}")
print(f"Answer: {response.text}")

Extended Thinking

Extended thinking allows Claude to show its reasoning process before generating a response. This is useful for complex problems, debugging model behavior, or when you want transparency into the reasoning.

When to Use Extended Thinking

  • Complex reasoning - Math, logic, multi-step problems
  • Code analysis - Understanding large codebases
  • Decision making - Weighing pros and cons
  • Debugging - Understanding why the model chose an answer

Configuration Options

Option           Type  Default  Description
enable_thinking  bool  False    Enable extended thinking
thinking_budget  int   10000    Max tokens for thinking
Thinking Budget
The thinking budget limits how many tokens Claude can use for internal reasoning. Higher values allow more thorough analysis but increase latency and cost. Start with 10,000 and adjust based on task complexity.
Enable extended thinking
provider.initialize(ProviderConfig(
    api_key="sk-ant-...",
    extra={
        'enable_thinking': True,
        'thinking_budget': 15000  # More tokens for complex tasks
    }
))
Access thinking in response
response = provider.send_message(
    "Explain the trade-offs between microservices and monolith"
)

# Check if thinking is present
if response.has_thinking:
    print("=== Claude's reasoning ===")
    print(response.thinking)
    print()

print("=== Final answer ===")
print(response.text)
Example thinking output
=== Claude's reasoning ===
Let me think through the key trade-offs:

For microservices:
- Pros: Independent deployment, technology flexibility...
- Cons: Distributed systems complexity, network latency...

For monolith:
- Pros: Simpler to develop initially, no network overhead...
- Cons: Deployment coupling, scaling limitations...

I should structure this by category: development, deployment,
scaling, and operations...

=== Final answer ===
Here are the key trade-offs between microservices and monolith architectures:
...

Prompt Caching

Prompt caching reduces cost by up to 90% and improves latency by up to 85% for repeated prompts. It's automatically applied to system instructions and tool definitions.

How It Works

  • Cached content is stored for 5 minutes (refreshed on each use)
  • Cached reads cost 0.1x the normal input price
  • Cache writes cost 1.25x but only happen once
  • System instructions and tools are automatically cached

Best Use Cases

Scenario                  Savings
Long system instructions  High - reused every message
Many tool definitions     High - reused every message
Document analysis         High - document cached across questions
Simple conversations      Low - little to cache
Automatic Caching
When caching is enabled, the provider automatically adds cache_control to system instructions and tool definitions. No manual configuration needed.
Enable prompt caching
provider.initialize(ProviderConfig(
    api_key="sk-ant-...",
    extra={
        'enable_caching': True
    }
))
Cost comparison (100K token prompt)
Without caching:
  100K tokens × $3.00/MTok = $0.30 per request

With caching:
  First request: $0.375 (1.25x write cost)
  Subsequent:    $0.03  (0.1x read cost)

After 2 requests:  $0.405 vs $0.60  (~33% savings)
After 10 requests: $0.645 vs $3.00  (~78% savings)
Per cached request, the read discount approaches the full 90% savings.
Caching with tools
from jaato import ToolSchema

tools = [
    ToolSchema(name='search', description='...', parameters={...}),
    ToolSchema(name='execute', description='...', parameters={...}),
    ToolSchema(name='read_file', description='...', parameters={...}),
    # ... many more tools
]

provider.create_session(
    system_instruction="You are an AI assistant...",
    tools=tools  # Tools and system cached automatically
)

Function Calling

Claude supports function calling (tool use) for interacting with external systems. Tools are defined using JSON Schema and automatically converted to Anthropic's format.

Tool Schema Differences

Anthropic uses input_schema instead of parameters. The provider handles this conversion automatically.

jaato Format       Anthropic Format
parameters         input_schema
function_call      tool_use block
function_response  tool_result block
Automatic Conversion
You don't need to handle Anthropic-specific formats. Use the standard ToolSchema type and the provider converts automatically.
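The mapping in the table amounts to a one-field rename. An illustrative sketch of the conversion the provider performs internally (the helper name is ours):

```python
def to_anthropic_tool(name: str, description: str, parameters: dict) -> dict:
    """Convert a jaato-style tool definition to Anthropic's wire format.

    The structural difference is that Anthropic calls the JSON Schema
    field `input_schema` rather than `parameters`.
    """
    return {
        "name": name,
        "description": description,
        "input_schema": parameters,
    }

tool = to_anthropic_tool(
    "get_weather",
    "Get current weather for a location",
    {"type": "object",
     "properties": {"location": {"type": "string"}},
     "required": ["location"]},
)
```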
Define and use tools
from jaato import ToolSchema, ToolResult

# Define tool with standard ToolSchema
tools = [ToolSchema(
    name='get_weather',
    description='Get current weather for a location',
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"]
    }
)]

# Create session with tools
provider.create_session(
    system_instruction="You are a weather assistant.",
    tools=tools
)

# Model may request tool use
response = provider.send_message("What's the weather in Tokyo?")

if response.function_calls:
    fc = response.function_calls[0]
    print(f"Tool: {fc.name}, Args: {fc.args}")
    # Tool: get_weather, Args: {'location': 'Tokyo'}

    # Send result back
    result = ToolResult(
        call_id=fc.id,
        name=fc.name,
        result={"temp": 22, "condition": "sunny"}
    )
    response = provider.send_tool_results([result])
    print(response.text)
    # "The weather in Tokyo is 22°C and sunny!"
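The single round above generalizes to a loop: keep executing requested tools and returning results until the model answers in plain text. A sketch against the same provider interface (the dispatch dict and loop are ours, not part of the provider; the local dataclass stands in for jaato's ToolResult):

```python
from dataclasses import dataclass

@dataclass
class ToolResult:  # stand-in for jaato.ToolResult
    call_id: str
    name: str
    result: dict

def run_tool_loop(provider, user_message: str, handlers: dict,
                  max_rounds: int = 5) -> str:
    """Drive a conversation until the model stops requesting tools."""
    response = provider.send_message(user_message)
    for _ in range(max_rounds):
        if not response.function_calls:
            return response.text
        results = []
        for fc in response.function_calls:
            handler = handlers[fc.name]  # KeyError signals an unknown tool
            results.append(ToolResult(
                call_id=fc.id,
                name=fc.name,
                result=handler(**fc.args),
            ))
        response = provider.send_tool_results(results)
    raise RuntimeError("tool loop exceeded max_rounds")
```

Capping the rounds guards against a model that keeps requesting tools indefinitely.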

Authentication

The Anthropic provider supports three authentication methods, listed in priority order:

1. PKCE OAuth Login (Recommended)

For Claude Pro/Max subscribers. Uses a browser-based OAuth flow; no API key is needed, and requests draw on your subscription quota.

2. OAuth Token

A token starting with sk-ant-oat01-..., obtained via claude setup-token. Uses subscription quota.

3. API Key

A token starting with sk-ant-api03-..., created in the Anthropic Console. Uses API credits (pay-per-use).
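The two token prefixes make the credential kind easy to distinguish. A minimal sketch (the helper name is ours):

```python
def classify_token(token: str) -> str:
    """Classify an Anthropic credential by its documented prefix."""
    if token.startswith("sk-ant-oat01-"):
        return "oauth"    # subscription quota
    if token.startswith("sk-ant-api03-"):
        return "api_key"  # pay-per-use API credits
    return "unknown"
```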

Environment Variables

Variable              Description
ANTHROPIC_API_KEY     API key (sk-ant-api03-...) - uses API credits
ANTHROPIC_AUTH_TOKEN  OAuth token (sk-ant-oat01-...) - uses subscription
Cost Savings
OAuth tokens use your Claude Pro/Max subscription instead of API credits. For heavy usage, this can save significant costs.
Environment variables
# Option 1: API Key (uses API credits)
ANTHROPIC_API_KEY=sk-ant-api03-...

# Option 2: OAuth Token (uses subscription)
ANTHROPIC_AUTH_TOKEN=sk-ant-oat01-...

# Option 3: PKCE OAuth - no env var needed
OAuth login (browser-based)
from shared.plugins.model_provider.anthropic import oauth_login

# Trigger browser-based OAuth
token = oauth_login(
    on_message=lambda msg: print(msg)
)
# Opens browser, user logs in
# Token is automatically saved

# Or via JaatoClient
from jaato import JaatoClient  # assuming JaatoClient is exported by jaato

client = JaatoClient(provider_name="anthropic")
client.connect(None, None, "claude-sonnet-4-20250514")
client.verify_auth(allow_interactive=True)
Get OAuth token via CLI
# Install Claude CLI
npm install -g @anthropic-ai/claude-code

# Get OAuth token
claude setup-token

# Token saved to ~/.claude/settings.json

Error Handling

The provider raises specific exceptions with actionable error messages for common failure scenarios.

Exception            Cause
APIKeyNotFoundError  No API key in env or config
APIKeyInvalidError   API key rejected (401)
RateLimitError       Too many requests (429)
OverloadedError      Anthropic API overloaded (529)
ContextLimitError    Context window exceeded
ModelNotFoundError   Invalid model ID
Handle errors
from shared.plugins.model_provider.anthropic.errors import (
    APIKeyNotFoundError,
    APIKeyInvalidError,
    RateLimitError,
    OverloadedError,
)

try:
    provider.initialize(config)
    provider.send_message("Hello")
except APIKeyNotFoundError:
    print("Set ANTHROPIC_API_KEY environment variable")
except APIKeyInvalidError:
    print("Check your API key at console.anthropic.com")
except RateLimitError:
    print("Too many requests, retry later")
except OverloadedError:
    print("Anthropic API is busy, retry in a moment")
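RateLimitError and OverloadedError are both transient, so they are natural candidates for retry with exponential backoff. A generic sketch (the helper is ours; pass the provider's transient exception types as retryable):

```python
import time

def with_retry(fn, retryable, attempts: int = 4, base_delay: float = 1.0):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of retries, surface the error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Usage with the provider's transient exceptions:
# reply = with_retry(lambda: provider.send_message("Hello"),
#                    retryable=(RateLimitError, OverloadedError))
```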

Configuration Summary

Environment Variables

Variable              Required  Description
ANTHROPIC_API_KEY     No*       API key (uses API credits)
ANTHROPIC_AUTH_TOKEN  No*       OAuth token (uses subscription quota)

* At least one authentication method is required: an API key, an OAuth token, or an interactive PKCE OAuth login.

ProviderConfig.extra Options

Key              Type  Default  Description
enable_caching   bool  False    Enable prompt caching
enable_thinking  bool  False    Enable extended thinking
thinking_budget  int   10000    Max thinking tokens
Full configuration example
from shared import load_provider, ProviderConfig

provider = load_provider("anthropic")
provider.initialize(ProviderConfig(
    api_key="sk-ant-...",  # Or use ANTHROPIC_API_KEY env
    extra={
        'enable_caching': True,     # 90% cost reduction
        'enable_thinking': True,    # Reasoning traces
        'thinking_budget': 15000,   # Max thinking tokens
    }
))

provider.connect("claude-sonnet-4-20250514")

# Create session
provider.create_session(
    system_instruction="You are a helpful assistant.",
    tools=[...]  # Optional tools
)

# Chat
response = provider.send_message("Explain quantum computing")
print(f"Thinking: {response.thinking}")
print(f"Answer: {response.text}")
.env file example
# .env
ANTHROPIC_API_KEY=sk-ant-api03-...
JAATO_PROVIDER=anthropic
MODEL_NAME=claude-sonnet-4-20250514