Ollama Engine

The Ollama engine provides inference using the Ollama local LLM server.

Status

Production - Fully implemented and tested.

Features

  • Text generation (chat and completion)
  • Embedding generation
  • Vision/multimodal models (LLaVA, etc.)
  • Streaming responses
  • Automatic model management (pull, list, delete)

Requirements

  • Ollama server running locally (default: http://localhost:11434)
  • Models pre-pulled or available for on-demand pulling
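A quick way to verify both requirements at once is to ask the server which models it already has. The sketch below uses Ollama's `GET /api/tags` model-listing endpoint; the helper names are illustrative, not part of the engine's API.

```python
import json
import urllib.request


def tags_url(base_url: str) -> str:
    """Build the URL for Ollama's model-listing endpoint (GET /api/tags)."""
    return base_url.rstrip("/") + "/api/tags"


def list_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Return the names of models currently available on the Ollama server.

    Raises URLError if the server is not reachable (see Troubleshooting).
    """
    with urllib.request.urlopen(tags_url(base_url), timeout=5) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]


if __name__ == "__main__":
    print(list_models())
```

If a model you need is missing from the list, pull it first (see Usage below).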

Configuration

engine:
  available:
    - ollama

  ollama:
    base_url: ${OLLAMA_BASE_URL:-http://localhost:11434}
    timeout: 600  # 10 minutes for large generations

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
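The `${OLLAMA_BASE_URL:-http://localhost:11434}` expansion in the config above means "use the environment variable if set, else the default". In Python that resolution looks like this (helper name illustrative):

```python
import os


def resolve_base_url(default: str = "http://localhost:11434") -> str:
    """Mirror the ${OLLAMA_BASE_URL:-...} expansion from the YAML config."""
    return os.environ.get("OLLAMA_BASE_URL", default) or default
```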

Supported Model Types

  • Chat models: llama3, mistral, qwen, etc.
  • Embedding models: nomic-embed-text, mxbai-embed-large, etc.
  • Vision models: llava, bakllava, qwen2.5-vl, etc.

Usage

Starting Ollama

# Start Ollama server
ollama serve

# Pull models
ollama pull llama3.1:8b
ollama pull nomic-embed-text

Job Examples

Text Generation:

{
  "model_id": "llama3.1:8b",
  "platform": "ollama",
  "job_type": "llm",
  "input_data": "Explain quantum computing in simple terms."
}
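A job like the one above maps onto Ollama's `POST /api/generate` endpoint. The sketch below shows that translation; the function names are illustrative and not the actual `OllamaEngine` methods.

```python
import json
import urllib.request


def build_generate_payload(job: dict) -> dict:
    """Translate a job document into an Ollama /api/generate request body."""
    return {
        "model": job["model_id"],
        "prompt": job["input_data"],
        "stream": False,  # one JSON response instead of a streamed sequence
    }


def run_llm_job(job: dict, base_url: str = "http://localhost:11434") -> str:
    """Submit a text-generation job and return the generated text."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(build_generate_payload(job)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)["response"]
```

Setting `"stream": True` instead yields newline-delimited JSON chunks, which is how the engine's streaming mode works.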

Embedding:

{
  "model_id": "nomic-embed-text",
  "platform": "ollama",
  "job_type": "embed",
  "input_data": {"texts": ["Hello world", "How are you?"]}
}
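Embedding jobs map onto Ollama's `POST /api/embed` endpoint, which accepts a batch of inputs and returns one vector per input in its `embeddings` field. A minimal sketch of the payload translation (helper name illustrative):

```python
def build_embed_payload(job: dict) -> dict:
    """Translate an embed job into an Ollama /api/embed request body.

    job["input_data"]["texts"] is a list of strings; Ollama returns a
    matching list of vectors under the "embeddings" response key.
    """
    return {
        "model": job["model_id"],
        "input": job["input_data"]["texts"],
    }
```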

Vision:

{
  "model_id": "llava:7b",
  "platform": "ollama",
  "job_type": "llm",
  "input_data": "Describe this image",
  "attached_files": [{"download_url": "...", "file_type": "image/png"}]
}
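For vision jobs, Ollama's `/api/generate` endpoint takes images as base64-encoded strings in an `images` array. The sketch below assumes the engine has already downloaded each `attached_files` entry to raw bytes; the helper name is illustrative.

```python
import base64


def build_vision_payload(job: dict, image_bytes: list[bytes]) -> dict:
    """Attach downloaded images as base64 strings, as /api/generate expects."""
    return {
        "model": job["model_id"],
        "prompt": job["input_data"],
        "images": [base64.b64encode(b).decode("ascii") for b in image_bytes],
        "stream": False,
    }
```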

Parameter Mapping

Generic parameters are automatically mapped to Ollama-specific parameters:

| Generic | Ollama | Description |
| --- | --- | --- |
| `max_tokens` | `num_predict` | Maximum tokens to generate |
| `temperature` | `temperature` | Sampling temperature |
| `top_p` | `top_p` | Nucleus sampling |
| `top_k` | `top_k` | Top-k sampling |
| `stop_sequences` | `stop` | Stop sequences |
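The mapping amounts to a key-rename plus a filter: unrecognized generic parameters are dropped rather than forwarded. A sketch (not the actual `OllamaEngine` code):

```python
# Generic-to-Ollama parameter names, per the table above.
PARAM_MAP = {
    "max_tokens": "num_predict",
    "temperature": "temperature",
    "top_p": "top_p",
    "top_k": "top_k",
    "stop_sequences": "stop",
}


def map_params(generic: dict) -> dict:
    """Rename generic sampling parameters to Ollama's option keys,
    silently dropping anything Ollama does not recognize."""
    return {PARAM_MAP[k]: v for k, v in generic.items() if k in PARAM_MAP}
```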

Implementation Files

  • src/engines/ollama.py - OllamaEngine class
  • src/engines/base.py - InferenceEngine base class

Troubleshooting

Connection refused

Ensure the Ollama server is running (ollama serve) and that base_url in the config points to it.

Model not found

Pull the model first: ollama pull <model-name>

Timeout errors

Increase the timeout value in the engine config for large models or long generations.
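For example, doubling the limit from the Configuration section above (value illustrative):

```yaml
engine:
  ollama:
    timeout: 1200  # 20 minutes
```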