Ollama Engine

The Ollama engine provides inference using the Ollama local LLM server.

Status

Production - Fully implemented and tested.

Features

  • Text generation (chat and completion)
  • Embedding generation
  • Vision/multimodal models (LLaVA, etc.)
  • Streaming responses
  • Automatic model management (pull, list, delete)

Requirements

  • Ollama server running locally (default: http://localhost:11434)
  • Models pre-pulled or available for on-demand pulling
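A quick way to verify both requirements at once is to ask the server which models it already has. The sketch below uses Ollama's `GET /api/tags` model-listing endpoint; the helper names are illustrative, not part of the engine's API.

```python
import json
import urllib.request


def tags_url(base_url: str) -> str:
    """Build the URL for Ollama's model-listing endpoint (GET /api/tags)."""
    return base_url.rstrip("/") + "/api/tags"


def list_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Return the names of models currently available on the Ollama server.

    Raises URLError if the server is not reachable (see Troubleshooting).
    """
    with urllib.request.urlopen(tags_url(base_url), timeout=5) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]


if __name__ == "__main__":
    print(list_models())
```

If a model you need is missing from the list, pull it first (see Usage below).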

Configuration

engine:
  available:
    - ollama

  ollama:
    base_url: ${OLLAMA_BASE_URL:-http://localhost:11434}
    timeout: 600  # 10 minutes for large generations

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
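The `${OLLAMA_BASE_URL:-http://localhost:11434}` expansion in the config above means "use the environment variable if set, else the default". In Python that resolution looks like this (helper name illustrative):

```python
import os


def resolve_base_url(default: str = "http://localhost:11434") -> str:
    """Mirror the ${OLLAMA_BASE_URL:-...} expansion from the YAML config."""
    return os.environ.get("OLLAMA_BASE_URL", default) or default
```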

Supported Model Types

  • Chat models: llama3, mistral, qwen, etc.
  • Embedding models: nomic-embed-text, mxbai-embed-large, etc.
  • Vision models: llava, bakllava, qwen2.5-vl, etc.

Usage

Starting Ollama

# Start Ollama server
ollama serve

# Pull models
ollama pull llama3.1:8b
ollama pull nomic-embed-text

Job Examples

Text Generation:

{
  "model_id": "llama3.1:8b",
  "platform": "ollama",
  "job_type": "llm",
  "input_data": "Explain quantum computing in simple terms."
}
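A job like the one above maps onto Ollama's `POST /api/generate` endpoint. The sketch below shows that translation; the function names are illustrative and not the actual `OllamaEngine` methods.

```python
import json
import urllib.request


def build_generate_payload(job: dict) -> dict:
    """Translate a job document into an Ollama /api/generate request body."""
    return {
        "model": job["model_id"],
        "prompt": job["input_data"],
        "stream": False,  # one JSON response instead of a streamed sequence
    }


def run_llm_job(job: dict, base_url: str = "http://localhost:11434") -> str:
    """Submit a text-generation job and return the generated text."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(build_generate_payload(job)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)["response"]
```

Setting `"stream": True` instead yields newline-delimited JSON chunks, which is how the engine's streaming mode works.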

Embedding:

{
  "model_id": "nomic-embed-text",
  "platform": "ollama",
  "job_type": "embed",
  "input_data": {"texts": ["Hello world", "How are you?"]}
}
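Embedding jobs map onto Ollama's `POST /api/embed` endpoint, which accepts a batch of inputs and returns one vector per input in its `embeddings` field. A minimal sketch of the payload translation (helper name illustrative):

```python
def build_embed_payload(job: dict) -> dict:
    """Translate an embed job into an Ollama /api/embed request body.

    job["input_data"]["texts"] is a list of strings; Ollama returns a
    matching list of vectors under the "embeddings" response key.
    """
    return {
        "model": job["model_id"],
        "input": job["input_data"]["texts"],
    }
```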

Vision:

{
  "model_id": "llava:7b",
  "platform": "ollama",
  "job_type": "llm",
  "input_data": "Describe this image",
  "attached_files": [{"download_url": "...", "file_type": "image/png"}]
}
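For vision jobs, Ollama's `/api/generate` endpoint takes images as base64-encoded strings in an `images` array. The sketch below assumes the engine has already downloaded each `attached_files` entry to raw bytes; the helper name is illustrative.

```python
import base64


def build_vision_payload(job: dict, image_bytes: list[bytes]) -> dict:
    """Attach downloaded images as base64 strings, as /api/generate expects."""
    return {
        "model": job["model_id"],
        "prompt": job["input_data"],
        "images": [base64.b64encode(b).decode("ascii") for b in image_bytes],
        "stream": False,
    }
```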

Parameter Mapping

Generic parameters are automatically mapped to Ollama-specific parameters:

| Generic | Ollama | Description |
| --- | --- | --- |
| `max_tokens` | `num_predict` | Maximum tokens to generate |
| `temperature` | `temperature` | Sampling temperature |
| `top_p` | `top_p` | Nucleus sampling |
| `top_k` | `top_k` | Top-k sampling |
| `stop_sequences` | `stop` | Stop sequences |
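The mapping amounts to a key-rename plus a filter: unrecognized generic parameters are dropped rather than forwarded. A sketch (not the actual `OllamaEngine` code):

```python
# Generic-to-Ollama parameter names, per the table above.
PARAM_MAP = {
    "max_tokens": "num_predict",
    "temperature": "temperature",
    "top_p": "top_p",
    "top_k": "top_k",
    "stop_sequences": "stop",
}


def map_params(generic: dict) -> dict:
    """Rename generic sampling parameters to Ollama's option keys,
    silently dropping anything Ollama does not recognize."""
    return {PARAM_MAP[k]: v for k, v in generic.items() if k in PARAM_MAP}
```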

Implementation Files

  • src/engines/ollama.py - OllamaEngine class
  • src/engines/base.py - InferenceEngine base class

Troubleshooting

Connection refused

Ensure the Ollama server is running (ollama serve) and that base_url in the config points to it.

Model not found

Pull the model first: ollama pull <model-name>

Timeout errors

Increase the timeout value in the engine config for large models or long generations.
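For example, doubling the limit from the Configuration section above (value illustrative):

```yaml
engine:
  ollama:
    timeout: 1200  # 20 minutes
```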