NVIDIA NIM Provider

Access AI models through NVIDIA NIM (Inference Microservices), supporting both NVIDIA's hosted API at build.nvidia.com and self-hosted NIM containers. Uses the OpenAI-compatible chat completions API.

Provider Name  nim
Module         shared.plugins.model_provider.nim
SDK            openai (OpenAI-compatible API)
Auth           API key (hosted) or none (self-hosted)

Highlights

  • NIM Catalog — Access Llama, DeepSeek-R1, Nemotron, Mistral, and more
  • Self-hosted support — Run NIM containers locally with no API key
  • Reasoning models — DeepSeek-R1 with chain-of-thought via reasoning_content
  • Function calling — Full tool use support via OpenAI-compatible API
  • Streaming — Real-time streaming with cancellation support
OpenAI Compatibility
NIM exposes an OpenAI-compatible chat completions API. The provider uses the openai Python SDK as its transport layer, making it compatible with any model in the NIM catalog.
Quick start
from jaato import JaatoClient

client = JaatoClient(provider_name="nim")
client.connect(
    project=None,
    location=None,
    model="meta/llama-3.3-70b-instruct"
)
client.configure_tools(registry)

response = client.send_message(
    "Hello from NIM!",
    on_output=on_output
)
Self-hosted NIM container
from shared import load_provider, ProviderConfig

provider = load_provider("nim")
provider.initialize(ProviderConfig(
    extra={
        'base_url': 'http://localhost:8000/v1',
    }
))
# No API key needed for self-hosted
provider.connect("meta/llama-3.1-8b-instruct")

response = provider.send_message("Hello!")
print(response.text)

Available Models

NIM hosts a large catalog of models. Below are some popular options. See build.nvidia.com for the full catalog.

Popular Models

Model                                   Reasoning  Notes
meta/llama-3.3-70b-instruct             No         Strong general-purpose model
meta/llama-3.1-405b-instruct            No         Largest open model
meta/llama-3.1-70b-instruct             No         Good balance of speed and quality
meta/llama-3.1-8b-instruct              No         Fast, lightweight
deepseek/deepseek-r1                    Yes        Chain-of-thought reasoning
nvidia/llama-3.1-nemotron-70b-instruct  No         NVIDIA-tuned Llama
Model Naming
NIM models use the format org/model-name. When using with jaato, prefix with nim/ for provider routing: nim/meta/llama-3.3-70b-instruct.
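The routing prefix can be sketched as a simple prefix strip. resolve_model_name is an illustrative helper, not the actual jaato routing API:

```python
def resolve_model_name(routed_name: str) -> str:
    """Strip the 'nim/' routing prefix, leaving the org/model-name id.

    Illustrative sketch; the real jaato routing logic may differ.
    """
    prefix = "nim/"
    if routed_name.startswith(prefix):
        return routed_name[len(prefix):]
    return routed_name
```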
Connect to a model
# Llama 3.3 70B
provider.connect("meta/llama-3.3-70b-instruct")

# DeepSeek-R1 (reasoning)
provider.connect("deepseek/deepseek-r1")

# Nemotron (NVIDIA-tuned)
provider.connect("nvidia/llama-3.1-nemotron-70b-instruct")
Reasoning model output
provider.connect("deepseek/deepseek-r1")
provider.create_session(
    system_instruction="You are helpful.",
    tools=[]
)

response = provider.send_message(
    "What is 25 * 37?"
)

if response.thinking:
    print("=== Reasoning ===")
    print(response.thinking)
    print()

print("=== Answer ===")
print(response.text)

Authentication

NVIDIA's hosted NIM API requires an API key (nvapi-...). Self-hosted NIM containers run without authentication.
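A quick format check can catch typos before any network round trip. looks_like_nim_key is a hypothetical helper based only on the documented nvapi- prefix; real validation requires a request to the API:

```python
def looks_like_nim_key(key: str) -> bool:
    """Return True if key matches the documented hosted-key shape (nvapi-...)."""
    # Format check only -- a well-formed key can still be revoked or invalid
    return key.startswith("nvapi-") and len(key) > len("nvapi-")
```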

Get your API key from build.nvidia.com.

Credential Priority

Priority     Source
1 (highest)  JAATO_NIM_API_KEY environment variable
2            Stored credentials (.jaato/nim_auth.json)
3            api_key in ProviderConfig
Self-hosted endpoints need no key

Credential Storage

Credentials are stored in JSON files with Unix permissions 0600:

  • Project: .jaato/nim_auth.json (if .jaato/ exists)
  • User: ~/.jaato/nim_auth.json (fallback)
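Owner-only permissions on Unix can be set atomically at file creation rather than with a chmod after the fact. store_credentials is an illustrative sketch of that behavior, not the provider's actual code:

```python
import json
import os

def store_credentials(path: str, api_key: str) -> None:
    """Write a credentials JSON file readable only by the owner (0600)."""
    # os.open sets the mode at creation, avoiding a window where
    # the file briefly exists with looser permissions
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump({"api_key": api_key}, f)
```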
Environment variable
# .env file
JAATO_NIM_API_KEY=nvapi-your-key-here
JAATO_PROVIDER=nim
JAATO_NIM_MODEL=meta/llama-3.3-70b-instruct
Interactive login (TUI)
# Login - shows instructions
nim-auth login

# Store API key directly
nim-auth key nvapi-your-key-here

# Check status
nim-auth status

# Logout - clears stored credentials
nim-auth logout
Programmatic login
from shared.plugins.model_provider.nim.auth import (
    login_with_key,
    validate_api_key,
)

# Validate and store
creds = login_with_key(
    api_key="nvapi-your-key",
    on_message=lambda msg: print(msg)
)

# Or just validate
valid, detail = validate_api_key("nvapi-your-key")
print(f"Valid: {valid}")

Self-Hosted NIM

NVIDIA NIM containers can run locally or on your own infrastructure. Self-hosted instances on localhost or private networks don't require an API key.

Detected as Self-Hosted

The provider automatically detects self-hosted endpoints by hostname:

  • localhost / 127.0.0.1
  • 192.168.* / 10.* (private networks)
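The detection rules above amount to checking the hostname of the base URL. is_self_hosted is an illustrative heuristic built from the documented rules, assuming IP literals for the private-network cases:

```python
import ipaddress
from urllib.parse import urlparse

def is_self_hosted(base_url: str) -> bool:
    """Heuristic for the documented self-hosted detection."""
    host = urlparse(base_url).hostname or ""
    if host == "localhost":
        return True
    try:
        # is_private covers 127.0.0.1, 10.*, and 192.168.* among others
        return ipaddress.ip_address(host).is_private
    except ValueError:
        # Not an IP literal: assume a public hostname that needs a key
        return False
```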
Self-hosted configuration
# .env file for self-hosted
JAATO_PROVIDER=nim
JAATO_NIM_BASE_URL=http://localhost:8000/v1
JAATO_NIM_MODEL=meta/llama-3.1-8b-instruct
Docker example
# Pull and run a NIM container
docker run --gpus all \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest

# Connect jaato
export JAATO_NIM_BASE_URL=http://localhost:8000/v1
export JAATO_PROVIDER=nim

Configuration

Environment Variables

Variable                  Default                              Description
JAATO_NIM_API_KEY         (none)                               API key (nvapi-...) for hosted NIM
JAATO_NIM_BASE_URL        https://integrate.api.nvidia.com/v1  API endpoint URL
JAATO_NIM_MODEL           (none)                               Default model name
JAATO_NIM_CONTEXT_LENGTH  32768                                Context window size override

ProviderConfig.extra Options

Key             Type  Default  Description
base_url        str   See env  Override API base URL
context_length  int   32768    Override context window
Full .env example
# .env
JAATO_PROVIDER=nim
JAATO_NIM_API_KEY=nvapi-your-key-here
JAATO_NIM_MODEL=meta/llama-3.3-70b-instruct
JAATO_NIM_CONTEXT_LENGTH=131072
Full programmatic config
from shared import load_provider, ProviderConfig

provider = load_provider("nim")
provider.initialize(ProviderConfig(
    api_key="nvapi-your-key",
    extra={
        'base_url': 'https://integrate.api.nvidia.com/v1',
        'context_length': 131072,
    }
))

provider.connect("meta/llama-3.3-70b-instruct")

provider.create_session(
    system_instruction="You are a helpful assistant.",
    tools=[...]
)

response = provider.send_message("Hello!")
print(response.text)

Error Handling

The provider maps OpenAI SDK exceptions to NIM-specific error types for consistent error handling.

Exception            Cause
APIKeyNotFoundError  No API key found and endpoint is not self-hosted
AuthenticationError  API key rejected (401/403)
RateLimitError       Rate limit exceeded (429)
ModelNotFoundError   Invalid model name (404)
ContextLimitError    Input too large for model context
InfrastructureError  Server error (5xx) or connection failure
Handle errors
from shared.plugins.model_provider.nim.errors import (
    APIKeyNotFoundError,
    AuthenticationError,
    RateLimitError,
)

try:
    provider.initialize(config)
    provider.send_message("Hello")
except APIKeyNotFoundError:
    print("Set JAATO_NIM_API_KEY or run: nim-auth login")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    if e.retry_after:
        print(f"Rate limited, retry in {e.retry_after}s")

Corporate Proxy Support

The provider supports corporate proxy environments including:

  • HTTP/HTTPS proxies via standard environment variables
  • Custom CA certificate bundles
  • Kerberos/SPNEGO proxy authentication

Configuration uses the same proxy settings as other jaato providers. See the Proxy & Kerberos guide for details.

Proxy configuration
# Standard proxy
export HTTPS_PROXY=http://proxy.corp.com:8080

# Corporate CA bundle
export REQUESTS_CA_BUNDLE=/path/to/ca-bundle.crt

# Kerberos proxy auth
export JAATO_KERBEROS_PROXY=true