Ollama Provider
Run AI models locally with Ollama. No API costs, complete privacy, and access to a wide variety of open-source models including Qwen, Llama, Mistral, and more.
| Attribute | Value |
|---|---|
| Provider Name | ollama |
| Module | shared.plugins.model_provider.ollama |
| Requires | Ollama v0.14.0+ |
| Auth | None (local) |
Benefits
- No API costs - Run models for free
- Privacy - Data never leaves your machine
- Offline - Works without internet
- Model variety - Qwen, Llama, Mistral, DeepSeek, etc.
- Function calling - Via Anthropic-compatible API
from jaato import JaatoClient
client = JaatoClient(provider_name="ollama")
client.connect(
project=None,
location=None,
model="qwen3:32b"
)
client.configure_tools(registry)
response = client.send_message(
"Hello from local Qwen!",
on_output=on_output
)
# Install Ollama
# macOS/Linux: https://ollama.com/download
curl -fsSL https://ollama.com/install.sh | sh
# Start Ollama server
ollama serve
# Pull a model
ollama pull qwen3:32b
# Verify it works
ollama run qwen3:32b "Hello!"
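Before wiring up the client, it can help to confirm the Ollama server is actually reachable from Python. A minimal sketch using Ollama's `/api/tags` endpoint; `is_ollama_running` is an illustrative helper name, not part of jaato:

```python
import urllib.request
import urllib.error


def is_ollama_running(host: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers on /api/tags."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Server not running, wrong host, or network error
        return False


if __name__ == "__main__":
    print("Ollama reachable:", is_ollama_running())
```

If this returns False, start the server with `ollama serve` before connecting the client.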
Recommended Models
These models work well with function calling. Model choice depends on your hardware capabilities.
For Coding Tasks
| Model | Size | RAM |
|---|---|---|
| qwen3:32b | 32B | ~20GB |
| qwen3:14b | 14B | ~10GB |
| deepseek-coder-v2:16b | 16B | ~12GB |
| codellama:13b | 13B | ~10GB |
For General Tasks
| Model | Size | RAM |
|---|---|---|
| llama3.3:70b | 70B | ~40GB |
| llama3.1:8b | 8B | ~6GB |
| mistral:7b | 7B | ~5GB |
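The RAM figures above roughly follow a rule of thumb for 4-bit quantized models: about 0.6 GB per billion parameters plus a small fixed overhead. The constants below are an assumption fitted to the tables, not Ollama internals, and actual usage varies with quantization and context length:

```python
def estimate_ram_gb(params_billion: float) -> float:
    """Very rough RAM estimate for a 4-bit quantized model.

    Assumed heuristic: ~0.6 GB per billion parameters plus
    ~1 GB of runtime overhead. Treat as a ballpark only.
    """
    return 0.6 * params_billion + 1.0


for size in (7, 14, 32, 70):
    print(f"{size}B -> ~{estimate_ram_gb(size):.0f} GB")
```

Leave headroom beyond the estimate: the OS and a long context window both consume additional memory.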
# Pull recommended coding model
ollama pull qwen3:32b
# Or smaller for limited RAM
ollama pull qwen3:14b
# List available models
ollama list
# See model details
ollama show qwen3:32b
# Browse and search available models on the website
# (the Ollama CLI has no built-in search command)
# https://ollama.com/library
# Pull specific version
ollama pull qwen3:32b-instruct-q4_K_M
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
| OLLAMA_MODEL | - | Default model name |
| OLLAMA_CONTEXT_LENGTH | 32768 | Context window override |
ProviderConfig.extra Options
| Key | Type | Default | Description |
|---|---|---|---|
| host | str | localhost:11434 | Ollama server URL |
| context_length | int | 32768 | Context window size |
# .env file
JAATO_PROVIDER=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=qwen3:32b
OLLAMA_CONTEXT_LENGTH=32768
# Connect to remote server
OLLAMA_HOST=http://192.168.1.100:11434
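The environment variables above can be resolved explicitly before connecting, with the documented defaults applied as fallbacks. `resolve_ollama_config` is an illustrative helper, not part of jaato:

```python
import os


def resolve_ollama_config() -> dict:
    """Read Ollama settings from the environment,
    applying the defaults from the table above."""
    return {
        "host": os.environ.get("OLLAMA_HOST", "http://localhost:11434"),
        # No default model -- it must be set here or passed to connect()
        "model": os.environ.get("OLLAMA_MODEL"),
        "context_length": int(os.environ.get("OLLAMA_CONTEXT_LENGTH", "32768")),
    }


print(resolve_ollama_config())
```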
Function Calling
Ollama v0.14.0+ supports function calling through the Anthropic-compatible API. The provider automatically converts tool schemas to the correct format.
Requirements
- Ollama v0.14.0 or later
- A model that supports function calling
Supported Models
Most instruction-tuned models support function calling:
- Qwen 3 (all sizes)
- Llama 3.1, 3.3
- Mistral
- DeepSeek Coder
from jaato import JaatoClient
client = JaatoClient(provider_name="ollama")
client.connect(None, None, "qwen3:32b")
client.configure_tools(registry)
# Model will use tools as needed
response = client.send_message(
"List the files in the current directory",
on_output=on_output
)
# Model calls list_files tool
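The schema conversion the provider performs can be pictured roughly as follows. This is an illustrative sketch of mapping a generic JSON-Schema tool definition into the Anthropic-style tool shape (`name`/`description`/`input_schema`), not the provider's actual code:

```python
def to_anthropic_tool(tool: dict) -> dict:
    """Map a generic tool definition (name/description/parameters)
    into the Anthropic-style shape (name/description/input_schema).

    Illustrative only -- the jaato provider handles this internally.
    """
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "input_schema": tool.get("parameters", {"type": "object", "properties": {}}),
    }


generic = {
    "name": "list_files",
    "description": "List files in a directory",
    "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}
print(to_anthropic_tool(generic))
```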
# Verify Ollama version >= 0.14.0
ollama --version
# Update if needed
curl -fsSL https://ollama.com/install.sh | sh
Running Ollama
Start Server
Ollama runs as a background server. Start it before using the provider.
GPU Acceleration
Ollama automatically uses GPU acceleration when available:
- NVIDIA - CUDA (automatic)
- AMD - ROCm (Linux)
- Apple - Metal (automatic)
Memory Management
Ollama keeps models loaded in memory. Use `ollama stop` to unload a model when it is not needed.
# Start server (foreground)
ollama serve
# Or run as service (Linux)
sudo systemctl start ollama
# Check if running
curl http://localhost:11434/api/tags
# See loaded models
ollama ps
# Unload a model
ollama stop qwen3:32b
# Control how long models stay loaded after use
OLLAMA_KEEP_ALIVE=5m ollama serve
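The `/api/tags` endpoint used in the health check above returns JSON listing the installed models. A small sketch of parsing that response; the sample payload is abbreviated from the real response shape:

```python
import json


def installed_models(tags_json: str) -> list:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]


# Abbreviated example of the /api/tags response shape
sample = '{"models": [{"name": "qwen3:32b"}, {"name": "llama3.1:8b"}]}'
print(installed_models(sample))  # ['qwen3:32b', 'llama3.1:8b']
```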
vs Cloud Providers
| Feature | Ollama | Cloud APIs |
|---|---|---|
| Cost | Free (hardware only) | Pay-per-token |
| Privacy | 100% local | Data sent to cloud |
| Speed | Depends on hardware | Generally faster |
| Model quality | Good (open source) | Best (proprietary) |
| Offline | Yes | No |
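The cost trade-off can be made concrete with back-of-the-envelope arithmetic. The per-token price below is a placeholder assumption, not a current rate for any provider:

```python
def monthly_cloud_cost(tokens_per_day: int, usd_per_million_tokens: float) -> float:
    """Rough monthly API spend at a given daily token volume.

    The price argument is an assumption; check your provider's
    actual rates. Local Ollama has no per-token cost.
    """
    return tokens_per_day * 30 * usd_per_million_tokens / 1_000_000


# e.g. 2M tokens/day at an assumed $5 per million tokens
print(f"${monthly_cloud_cost(2_000_000, 5.0):.2f}/month")  # $300.00/month
```

Against that recurring spend, local inference costs only the hardware and electricity, which is why sustained high-volume workloads often favor Ollama.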
When to Use Ollama
- Privacy-sensitive tasks
- Offline development
- Cost-conscious usage
- Experimentation with open models
# Privacy-first (local)
client = JaatoClient(provider_name="ollama")
client.connect(None, None, "qwen3:32b")
# Performance-first (cloud)
client = JaatoClient(provider_name="anthropic")
client.connect(None, None, "claude-sonnet-4-20250514")