Ollama Provider

Run AI models locally with Ollama. No API costs, complete privacy, and access to a wide variety of open-source models including Qwen, Llama, Mistral, and more.

Provider Name: ollama
Module: shared.plugins.model_provider.ollama
Requires: Ollama v0.14.0+
Auth: None (local)

Benefits

  • No API costs - Run models for free
  • Privacy - Data never leaves your machine
  • Offline - Works without internet
  • Model variety - Qwen, Llama, Mistral, DeepSeek, etc.
  • Function calling - Via Anthropic-compatible API
Hardware Requirements
Local models require significant RAM (8GB+) and ideally a GPU. Performance depends on your hardware and model size.
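As a rough rule of thumb (an approximation, not an official Ollama figure), a 4-bit-quantized model needs about half a gigabyte of RAM per billion parameters, plus roughly 25% overhead for the KV cache and runtime buffers:

```python
def estimate_ram_gb(params_billion: float, bytes_per_weight: float = 0.5,
                    overhead: float = 1.25) -> float:
    """Rough RAM estimate for a quantized model.

    bytes_per_weight: ~0.5 for 4-bit quants, ~1.0 for 8-bit.
    overhead: multiplier covering KV cache and runtime buffers.
    """
    return params_billion * bytes_per_weight * overhead

# A 32B model at 4-bit lands around 20GB, matching the figures
# in the model tables in this document
print(round(estimate_ram_gb(32)))  # → 20
```

Treat this as a lower bound for sizing hardware; longer context windows grow the KV cache and push real usage higher.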
Quick start
from jaato import JaatoClient

client = JaatoClient(provider_name="ollama")
client.connect(
    project=None,
    location=None,
    model="qwen3:32b"
)
client.configure_tools(registry)

response = client.send_message(
    "Hello from local Qwen!",
    on_output=on_output
)
Prerequisites
# Install Ollama
# macOS/Linux: https://ollama.com/download
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama server
ollama serve

# Pull a model
ollama pull qwen3:32b

# Verify it works
ollama run qwen3:32b "Hello!"

Recommended Models

These models work well with function calling. Model choice depends on your hardware capabilities.

For Coding Tasks

Model                  Size  RAM
qwen3:32b              32B   ~20GB
qwen3:14b              14B   ~10GB
deepseek-coder-v2:16b  16B   ~12GB
codellama:13b          13B   ~10GB

For General Tasks

Model         Size  RAM
llama3.3:70b  70B   ~40GB
llama3.1:8b   8B    ~6GB
mistral:7b    7B    ~5GB
Pull models
# Pull recommended coding model
ollama pull qwen3:32b

# Or smaller for limited RAM
ollama pull qwen3:14b

# List available models
ollama list

# See model details
ollama show qwen3:32b
Search for models
# Browse available models
# https://ollama.com/library

# The Ollama CLI has no search command; search and filter
# models on the library site instead

# Pull specific version
ollama pull qwen3:32b-instruct-q4_K_M

Configuration

Environment Variables

Variable               Default                 Description
OLLAMA_HOST            http://localhost:11434  Ollama server URL
OLLAMA_MODEL           -                       Default model name
OLLAMA_CONTEXT_LENGTH  32768                   Context window override

ProviderConfig.extra Options

Key             Type  Default          Description
host            str   localhost:11434  Ollama server URL
context_length  int   32768            Context window size
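How these options reach the provider depends on your jaato version; as a sketch, the extra mapping could be built like this (only the key names come from the table above — the ProviderConfig constructor shown in the comment is an assumption, so check your API):

```python
# Keys match the ProviderConfig.extra table; values are examples
extra = {
    "host": "http://localhost:11434",  # Ollama server URL
    "context_length": 32768,           # context window size
}

# e.g. ProviderConfig(extra=extra)  # hypothetical usage, verify
# against your jaato version before relying on it
```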
Environment configuration
# .env file
JAATO_PROVIDER=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=qwen3:32b
OLLAMA_CONTEXT_LENGTH=32768
Remote Ollama server
# Connect to remote server
OLLAMA_HOST=http://192.168.1.100:11434

Function Calling

Ollama v0.14.0+ supports function calling through the Anthropic-compatible API. The provider automatically converts tool schemas to the correct format.
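For reference, an Anthropic-style tool definition looks like the following. The list_files tool here is hypothetical, and since the provider performs the conversion automatically, you normally never build this by hand:

```python
# Hypothetical tool definition in the Anthropic tool-use format:
# a name, a description, and a JSON Schema for the inputs
list_files_tool = {
    "name": "list_files",
    "description": "List the files in a directory",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Directory to list"},
        },
        "required": ["path"],
    },
}
```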

Requirements

  • Ollama v0.14.0 or later
  • A model that supports function calling

Supported Models

Most instruction-tuned models support function calling:

  • Qwen 3 (all sizes)
  • Llama 3.1, 3.3
  • Mistral
  • DeepSeek Coder
Function calling example
from jaato import JaatoClient

client = JaatoClient(provider_name="ollama")
client.connect(None, None, "qwen3:32b")
client.configure_tools(registry)

# Model will use tools as needed
response = client.send_message(
    "List the files in the current directory",
    on_output=on_output
)
# Model calls list_files tool
Check Ollama version
# Verify Ollama version >= 0.14.0
ollama --version

# Update if needed
curl -fsSL https://ollama.com/install.sh | sh

Running Ollama

Start Server

Ollama runs as a background server. Start it before using the provider.

GPU Acceleration

Ollama automatically uses GPU acceleration when available:

  • NVIDIA - CUDA (automatic)
  • AMD - ROCm (Linux)
  • Apple - Metal (automatic)

Memory Management

Ollama keeps models loaded in memory. Use ollama stop to unload when not needed.

Start Ollama
# Start server (foreground)
ollama serve

# Or run as service (Linux)
sudo systemctl start ollama

# Check if running
curl http://localhost:11434/api/tags
Memory management
# See loaded models
ollama ps

# Unload a model
ollama stop qwen3:32b

# Control how long models stay loaded after use
OLLAMA_KEEP_ALIVE=10m ollama serve

vs Cloud Providers

Feature        Ollama                Cloud APIs
Cost           Free (hardware only)  Pay-per-token
Privacy        100% local            Data sent to cloud
Speed          Depends on hardware   Generally faster
Model quality  Good (open source)    Best (proprietary)
Offline        Yes                   No

When to Use Ollama

  • Privacy-sensitive tasks
  • Offline development
  • Cost-conscious usage
  • Experimentation with open models
Choose based on needs
# Privacy-first (local)
client = JaatoClient(provider_name="ollama")
client.connect(None, None, "qwen3:32b")

# Performance-first (cloud)
client = JaatoClient(provider_name="anthropic")
client.connect(None, None, "claude-sonnet-4-20250514")