# Inference Engines
This directory contains documentation for the inference engines supported by the MicroDC Worker.
## Available Engines
| Engine | Status | Description |
|---|---|---|
| Ollama | Production | Local LLM inference via Ollama |
| Transformers | Production | HuggingFace Transformers for local model inference |
| vLLM | Planned | High-performance LLM inference |
## Multi-Engine Architecture
The worker supports running multiple engines simultaneously. See `MULTI_ENGINE.md` for details on:
- Configuring multiple engines
- On-demand engine loading
- Job routing based on platform
- Memory management across engines
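As a rough illustration, a multi-engine setup in `config/default.yaml` might look like the sketch below. The key names (`engines`, `enabled`, `on_demand`) are assumptions for illustration; check the file itself for the actual schema.

```yaml
# Hypothetical sketch of a multi-engine configuration.
# Key names here are assumptions, not the actual schema.
engines:
  ollama:
    enabled: true
    base_url: http://localhost:11434
  transformers:
    enabled: true
    on_demand: true   # load into memory only when a job needs it
    device: cuda
```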
## Quick Start

### Engine Selection for Jobs
Jobs specify which engine to use via the `platform` field:
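A job payload might look like the following minimal sketch. Only the `platform` field is documented here; the other field names are illustrative assumptions.

```python
# Minimal sketch of a job payload selecting an engine via "platform".
# All field names other than "platform" are hypothetical.
job = {
    "id": "job-123",
    "platform": "ollama",  # route this job to the Ollama engine
    "model": "llama3",
    "prompt": "Summarize the following text...",
}

print(job["platform"])  # → ollama
```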
If no platform is specified, the worker uses the first available engine.
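The selection rule above can be sketched as follows; the engine names and dictionary shape are assumptions for illustration.

```python
def select_engine(job: dict, engines: dict):
    """Pick the engine named by the job's platform, else the first available."""
    platform = job.get("platform")
    if platform is not None:
        return engines[platform]
    # No platform specified: fall back to the first available engine.
    return next(iter(engines.values()))

# Hypothetical registry of loaded engines (insertion order = availability order).
engines = {"ollama": "OllamaEngine", "transformers": "TransformersEngine"}
print(select_engine({"platform": "transformers"}, engines))  # → TransformersEngine
print(select_engine({}, engines))                            # → OllamaEngine
```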
## Adding a New Engine
To add a new inference engine:
1. Create an engine class inheriting from `InferenceEngine` (see `src/engines/base.py`)
2. Implement all required abstract methods
3. Add a configuration section to `config/default.yaml`
4. Register the engine in `src/core/client.py:_create_engine()`
5. Add documentation in `docs/engines/`
6. Add tests in `tests/`
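Putting the first two steps together, a new engine might be sketched as below. The base-class and method names are assumptions made so the sketch is self-contained; see `src/engines/base.py` for the real interface.

```python
from abc import ABC, abstractmethod


# Stand-in for the real base class in src/engines/base.py; the method
# names below are assumptions, not the actual interface.
class InferenceEngine(ABC):
    @abstractmethod
    def load(self) -> None: ...

    @abstractmethod
    def generate(self, prompt: str) -> str: ...


class EchoEngine(InferenceEngine):
    """Toy engine that echoes the prompt back, for illustration only."""

    def load(self) -> None:
        self.ready = True

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


engine = EchoEngine()
engine.load()
print(engine.generate("hello"))  # → echo: hello
```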
See `transformers.md` for a complete example.