# Ollama Engine

The Ollama engine provides inference using the Ollama local LLM server.
## Status

Production - fully implemented and tested.
## Features
- Text generation (chat and completion)
- Embedding generation
- Vision/multimodal models (LLaVA, etc.)
- Streaming responses
- Automatic model management (pull, list, delete)
## Requirements

- Ollama server running locally (default: `http://localhost:11434`)
- Models pre-pulled or available for on-demand pulling
## Configuration

```yaml
engine:
  available:
    - ollama
  ollama:
    base_url: ${OLLAMA_BASE_URL:-http://localhost:11434}
    timeout: 600  # 10 minutes for large generations
```
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
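To point the engine at a non-default server, override the variable before starting the service. The host name below is purely illustrative:

```shell
# Point the engine at a remote Ollama host (hypothetical host name)
export OLLAMA_BASE_URL=http://gpu-server:11434
```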
## Supported Model Types
- Chat models: llama3, mistral, qwen, etc.
- Embedding models: nomic-embed-text, mxbai-embed-large, etc.
- Vision models: llava, bakllava, qwen2.5-vl, etc.
## Usage

### Starting Ollama

```bash
# Start Ollama server
ollama serve

# Pull models
ollama pull llama3.1:8b
ollama pull nomic-embed-text
```
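With the server up, you can sanity-check it outside the engine by posting directly to Ollama's REST API (`/api/generate`). A minimal standard-library sketch; the helper name is hypothetical and the model is assumed to be pulled already:

```python
import json
import urllib.request

def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("http://localhost:11434", "llama3.1:8b", "Say hello")
# urllib.request.urlopen(req) returns the JSON response once the server is running
```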
## Job Examples

Text Generation:

```json
{
  "model_id": "llama3.1:8b",
  "platform": "ollama",
  "job_type": "llm",
  "input_data": "Explain quantum computing in simple terms."
}
```
Embedding:

```json
{
  "model_id": "nomic-embed-text",
  "platform": "ollama",
  "job_type": "embed",
  "input_data": {"texts": ["Hello world", "How are you?"]}
}
```
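The vectors returned by an embed job are typically compared with cosine similarity. A small self-contained sketch; the function is illustrative and not part of the engine:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0, orthogonal directions score 0.0
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
```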
Vision:

```json
{
  "model_id": "llava:7b",
  "platform": "ollama",
  "job_type": "llm",
  "input_data": "Describe this image",
  "attached_files": [{"download_url": "...", "file_type": "image/png"}]
}
```
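Ollama's API expects images as base64-encoded strings in its `images` field. The engine handles downloading `attached_files` internally, but for a local file the encoding step looks roughly like this (helper name is illustrative):

```python
import base64

def encode_image(path: str) -> str:
    """Read an image file and return it as a base64 string for Ollama's images field."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```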
## Parameter Mapping

Generic parameters are automatically mapped to Ollama-specific parameters:

| Generic | Ollama | Description |
|---|---|---|
| `max_tokens` | `num_predict` | Maximum tokens to generate |
| `temperature` | `temperature` | Sampling temperature |
| `top_p` | `top_p` | Nucleus sampling |
| `top_k` | `top_k` | Top-k sampling |
| `stop_sequences` | `stop` | Stop sequences |
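The table above amounts to a simple name lookup. A sketch of how such a mapping might be implemented (not the engine's actual code):

```python
# Generic → Ollama parameter names, mirroring the table above
GENERIC_TO_OLLAMA = {
    "max_tokens": "num_predict",
    "temperature": "temperature",
    "top_p": "top_p",
    "top_k": "top_k",
    "stop_sequences": "stop",
}

def map_parameters(generic: dict) -> dict:
    """Translate generic parameter names to Ollama option names, dropping unknown keys."""
    return {GENERIC_TO_OLLAMA[k]: v for k, v in generic.items() if k in GENERIC_TO_OLLAMA}
```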
## Implementation Files

- `src/engines/ollama.py` - `OllamaEngine` class
- `src/engines/base.py` - `InferenceEngine` base class
## Troubleshooting

### Connection refused

Ensure the Ollama server is running: `ollama serve`

### Model not found

Pull the model first: `ollama pull <model-name>`

### Timeout errors

Increase the `timeout` value in the Ollama engine configuration for large models or long generations.
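For connection issues, a quick programmatic reachability check helps distinguish a down server from a misconfigured URL. `/api/version` is a lightweight Ollama endpoint; the helper below is a sketch, not part of the engine:

```python
import urllib.error
import urllib.request

def ollama_reachable(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the Ollama server answers on its version endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False
```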