# Multi-Engine Architecture
The MicroDC Worker supports running multiple inference engines simultaneously, loading them on-demand based on job requirements.
## Overview
Instead of a single static engine, the worker can be configured with multiple available engines. Each engine is loaded only when a job requests it, optimizing resource usage.
## Configuration

### Available Engines

Configure which engines this worker supports via the `engine.available` list in the config file, or via environment variable.
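A minimal sketch of both options; the YAML keys follow the `engine.available` setting used throughout this page, while the environment-variable name is a hypothetical placeholder (check your deployment docs for the exact one):

```yaml
# config file: enable only the engines this worker should offer
engine:
  available:
    - ollama
    - transformers
```

```shell
# Hypothetical variable name, shown for illustration only
export ENGINE_AVAILABLE="ollama,transformers"
```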
### Engine-Specific Configuration
Each engine has its own configuration section:

```yaml
engine:
  available:
    - ollama
    - transformers

ollama:
  base_url: http://localhost:11434
  timeout: 600

transformers:
  model_path: ./models
  device: auto
  max_memory_mb: 0
  auto_unload: true
```
## How It Works

### 1. Engine Discovery

At startup, the worker reads `engine.available` and stores the list of supported engine types. No engines are loaded yet.
### 2. Job Routing

When a job arrives, the worker checks the job's `platform` field to select an engine.
### 3. On-Demand Loading
If the requested engine isn't loaded:
- Worker creates the engine instance
- Engine initializes with its config
- Engine is cached for future jobs
- Job is executed
### 4. Default Platform
If no platform is specified, the worker uses the first available engine.
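Steps 2–4 above can be sketched as follows. The class and attribute names here are illustrative, not the worker's actual API:

```python
# Sketch of job routing, on-demand engine loading, and default-platform
# fallback. Engine classes are stand-ins for the real implementations.

class OllamaEngine:
    def run(self, job):
        return f"ollama ran {job['model_id']}"

class TransformersEngine:
    def run(self, job):
        return f"transformers ran {job['model_id']}"

ENGINE_CLASSES = {"ollama": OllamaEngine, "transformers": TransformersEngine}

class Worker:
    def __init__(self, available):
        self.available = available  # from engine.available
        self.loaded = {}            # cache of initialized engines

    def get_engine(self, platform):
        # Step 4: no platform given -> first entry in engine.available
        if platform is None:
            platform = self.available[0]
        if platform not in self.available:
            raise ValueError(f"engine {platform!r} not available")
        # Step 3: create and cache the engine on first use
        if platform not in self.loaded:
            self.loaded[platform] = ENGINE_CLASSES[platform]()
        return self.loaded[platform]

    def execute(self, job):
        # Step 2: route on the job's platform field
        engine = self.get_engine(job.get("platform"))
        return engine.run(job)
```

Because engines are cached in `loaded`, only the first job for a given platform pays the initialization cost.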
## Job Examples

### Ollama Job

```json
{
  "model_id": "llama3.1:8b",
  "platform": "ollama",
  "job_type": "llm",
  "input_data": "Explain quantum computing."
}
```
### Transformers Job

```json
{
  "model_id": "meta-llama/Llama-2-7b-chat-hf",
  "platform": "transformers",
  "job_type": "llm",
  "input_data": "Write a haiku about coding."
}
```
### Embedding Job (Auto-Platform)

When `platform` is omitted, the worker uses the first available engine that has the model.
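For example, an embedding job might omit `platform` entirely. The model name matches the heartbeat example below; the `"embedding"` job type is an assumption for illustration, since this page only shows `"llm"` jobs:

```json
{
  "model_id": "nomic-embed-text",
  "job_type": "embedding",
  "input_data": "The quick brown fox."
}
```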
## Heartbeat Reporting
The worker reports all available engines and their loaded models in heartbeats:
```json
{
  "engines": ["ollama", "transformers"],
  "models": [
    {"id": "llama3.1:8b", "platform": "ollama"},
    {"id": "nomic-embed-text", "platform": "transformers"}
  ]
}
```
## Memory Management
Each engine manages its own memory independently:
- Ollama: Managed by Ollama server
- Transformers: LRU eviction with VRAM tracking
When multiple engines are loaded, be aware of total GPU memory usage.
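A sketch of LRU eviction under a memory budget, in the spirit of the Transformers engine's VRAM tracking. The class name and the per-model sizes are illustrative; the real engine measures actual VRAM usage:

```python
from collections import OrderedDict

class ModelCache:
    """Evicts least-recently-used models when a new load would exceed the budget."""

    def __init__(self, max_memory_mb):
        self.max_memory_mb = max_memory_mb
        self.models = OrderedDict()  # model_id -> size_mb, oldest first

    def load(self, model_id, size_mb):
        if model_id in self.models:
            self.models.move_to_end(model_id)  # mark as recently used
            return
        # Evict least-recently-used models until the new one fits
        while self.models and self._total() + size_mb > self.max_memory_mb:
            self.models.popitem(last=False)
        self.models[model_id] = size_mb

    def _total(self):
        return sum(self.models.values())
```

With `max_memory_mb: 0` (as in the config above, presumably meaning "no limit" or "auto"), no budget-based eviction would apply; `auto_unload` then controls whether idle models are released.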
## Adding Custom Engines
To add a new engine:
- Create an engine class inheriting from `InferenceEngine`
- Implement all abstract methods (see `src/engines/base.py`)
- Add a configuration section to `config/default.yaml`
- Register it in `src/core/client.py:_create_engine()`
- Add documentation in `docs/engines/`
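The first two steps might look like the sketch below. The real abstract methods live in `src/engines/base.py`; the two shown here (`initialize`, `run`) are an assumption for illustration:

```python
from abc import ABC, abstractmethod

# Stand-in for the base class in src/engines/base.py; the actual
# abstract methods may differ.
class InferenceEngine(ABC):
    @abstractmethod
    def initialize(self, config: dict) -> None: ...

    @abstractmethod
    def run(self, job: dict) -> str: ...

class EchoEngine(InferenceEngine):
    """A trivial engine that echoes its input, useful as a template."""

    def initialize(self, config: dict) -> None:
        # Reads its section of the config, like ollama/transformers above
        self.prefix = config.get("prefix", "echo: ")

    def run(self, job: dict) -> str:
        return self.prefix + job["input_data"]
```

After registering the class in `_create_engine()` and adding an `echo:` section to the config, jobs with `"platform": "echo"` would route to it.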
## Troubleshooting

### Engine not loading

- Check that the engine is listed in `engine.available`
- Verify that the engine's dependencies are installed
- Check that the engine-specific config is valid
### Wrong engine used

- Explicitly set the `platform` field in the job
- Check that the model exists in the expected engine
### Out of memory
- Limit engines to what you need
- Use quantization for Transformers models
- Enable `auto_unload` for dynamic memory management