Ollama Provider

Run AI models locally with Ollama. No API costs, complete privacy, and access to a wide variety of open-source models including Qwen, Llama, Mistral, and more.

Provider Name: ollama
Module: shared.plugins.model_provider.ollama
Requires: Ollama v0.14.0+
Auth: None (local)

Benefits

  • No API costs - Run models for free
  • Privacy - Data never leaves your machine
  • Offline - Works without internet
  • Model variety - Qwen, Llama, Mistral, DeepSeek, etc.
  • Function calling - Via Anthropic-compatible API
Hardware Requirements
Local models require significant RAM (8GB+) and ideally a GPU. Performance depends on your hardware and model size.
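As a rough rule of thumb (an approximation, not an official Ollama figure), a 4-bit-quantized model needs about half a gigabyte of RAM per billion parameters, plus roughly 25% overhead for the KV cache and runtime buffers:

```python
def estimate_ram_gb(params_billion: float, bytes_per_weight: float = 0.5,
                    overhead: float = 1.25) -> float:
    """Rough RAM estimate for a quantized model.

    bytes_per_weight: ~0.5 for 4-bit quants, ~1.0 for 8-bit.
    overhead: multiplier covering KV cache and runtime buffers.
    """
    return params_billion * bytes_per_weight * overhead

# A 32B model at 4-bit lands around 20GB, matching the figures
# in the model tables in this document
print(round(estimate_ram_gb(32)))  # → 20
```

Treat this as a lower bound for sizing hardware; longer context windows grow the KV cache and push real usage higher.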
Quick start
from jaato import JaatoClient

client = JaatoClient(provider_name="ollama")
client.connect(
    project=None,
    location=None,
    model="qwen3:32b"
)
client.configure_tools(registry)

response = client.send_message(
    "Hello from local Qwen!",
    on_output=on_output
)
Prerequisites
# Install Ollama
# macOS/Linux: https://ollama.com/download
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama server
ollama serve

# Pull a model
ollama pull qwen3:32b

# Verify it works
ollama run qwen3:32b "Hello!"

Recommended Models

These models work well with function calling. Model choice depends on your hardware capabilities.

For Coding Tasks

Model                  Size  RAM
qwen3:32b              32B   ~20GB
qwen3:14b              14B   ~10GB
deepseek-coder-v2:16b  16B   ~12GB
codellama:13b          13B   ~10GB

For General Tasks

Model         Size  RAM
llama3.3:70b  70B   ~40GB
llama3.1:8b   8B    ~6GB
mistral:7b    7B    ~5GB
Pull models
# Pull recommended coding model
ollama pull qwen3:32b

# Or smaller for limited RAM
ollama pull qwen3:14b

# List available models
ollama list

# See model details
ollama show qwen3:32b
Search for models
# Browse available models
# https://ollama.com/library

# The Ollama CLI has no search command; search and filter
# models on the library site instead

# Pull specific version
ollama pull qwen3:32b-instruct-q4_K_M

Configuration

Environment Variables

Variable               Default                 Description
OLLAMA_HOST            http://localhost:11434  Ollama server URL
OLLAMA_MODEL           -                       Default model name
OLLAMA_CONTEXT_LENGTH  32768                   Context window override

ProviderConfig.extra Options

Key             Type  Default          Description
host            str   localhost:11434  Ollama server URL
context_length  int   32768            Context window size
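How these options reach the provider depends on your jaato version; as a sketch, the extra mapping could be built like this (only the key names come from the table above — the ProviderConfig constructor shown in the comment is an assumption, so check your API):

```python
# Keys match the ProviderConfig.extra table; values are examples
extra = {
    "host": "http://localhost:11434",  # Ollama server URL
    "context_length": 32768,           # context window size
}

# e.g. ProviderConfig(extra=extra)  # hypothetical usage, verify
# against your jaato version before relying on it
```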
Environment configuration
# .env file
JAATO_PROVIDER=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=qwen3:32b
OLLAMA_CONTEXT_LENGTH=32768
Remote Ollama server
# Connect to remote server
OLLAMA_HOST=http://192.168.1.100:11434

Function Calling

Ollama v0.14.0+ supports function calling through the Anthropic-compatible API. The provider automatically converts tool schemas to the correct format.
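For reference, an Anthropic-style tool definition looks like the following. The list_files tool here is hypothetical, and since the provider performs the conversion automatically, you normally never build this by hand:

```python
# Hypothetical tool definition in the Anthropic tool-use format:
# a name, a description, and a JSON Schema for the inputs
list_files_tool = {
    "name": "list_files",
    "description": "List the files in a directory",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Directory to list"},
        },
        "required": ["path"],
    },
}
```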

Requirements

  • Ollama v0.14.0 or later
  • A model that supports function calling

Supported Models

Most instruction-tuned models support function calling:

  • Qwen 3 (all sizes)
  • Llama 3.1, 3.3
  • Mistral
  • DeepSeek Coder
Function calling example
from jaato import JaatoClient

client = JaatoClient(provider_name="ollama")
client.connect(None, None, "qwen3:32b")
client.configure_tools(registry)

# Model will use tools as needed
response = client.send_message(
    "List the files in the current directory",
    on_output=on_output
)
# Model calls list_files tool
Check Ollama version
# Verify Ollama version >= 0.14.0
ollama --version

# Update if needed
curl -fsSL https://ollama.com/install.sh | sh

Running Ollama

Start Server

Ollama runs as a background server. Start it before using the provider.

GPU Acceleration

Ollama automatically uses GPU acceleration when available:

  • NVIDIA - CUDA (automatic)
  • AMD - ROCm (Linux)
  • Apple - Metal (automatic)

Memory Management

Ollama keeps models loaded in memory. Use ollama stop to unload when not needed.

Start Ollama
# Start server (foreground)
ollama serve

# Or run as service (Linux)
sudo systemctl start ollama

# Check if running
curl http://localhost:11434/api/tags
Memory management
# See loaded models
ollama ps

# Unload a model
ollama stop qwen3:32b

# Control how long models stay loaded after use
OLLAMA_KEEP_ALIVE=10m ollama serve

vs Cloud Providers

Feature        Ollama                Cloud APIs
Cost           Free (hardware only)  Pay-per-token
Privacy        100% local            Data sent to cloud
Speed          Depends on hardware   Generally faster
Model quality  Good (open source)    Best (proprietary)
Offline        Yes                   No

When to Use Ollama

  • Privacy-sensitive tasks
  • Offline development
  • Cost-conscious usage
  • Experimentation with open models
Choose based on needs
# Privacy-first (local)
client = JaatoClient(provider_name="ollama")
client.connect(None, None, "qwen3:32b")

# Performance-first (cloud)
client = JaatoClient(provider_name="anthropic")
client.connect(None, None, "claude-sonnet-4-20250514")