Inference Engines

This directory contains documentation for the inference engines supported by the MicroDC Worker.

Available Engines

Engine        Status      Description
Ollama        Production  Local LLM inference via Ollama
Transformers  Production  Local model inference via HuggingFace Transformers
vLLM          Planned     High-performance LLM inference

Multi-Engine Architecture

The worker supports running multiple engines simultaneously. See MULTI_ENGINE.md for details on:

  • Configuring multiple engines
  • On-demand engine loading
  • Job routing based on platform
  • Memory management across engines

Quick Start

# config/default.yaml
engine:
  available:
    - ollama
    - transformers

Engine Selection for Jobs

Jobs specify which engine to use via the platform field:

{
  "model_id": "llama3.1:8b",
  "platform": "ollama",
  "input_data": "Hello, world!"
}

If no platform is specified, the worker uses the first available engine.
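The routing rule above can be sketched in Python. This is an illustrative sketch, not the worker's actual internals; the function name `select_engine` and its arguments are assumptions made here for clarity.

```python
def select_engine(job: dict, available: list[str]) -> str:
    """Pick the engine for a job: honor the job's `platform` field,
    falling back to the first available engine when it is absent.
    (Illustrative sketch; not the worker's real routing code.)"""
    platform = job.get("platform")
    if platform is not None:
        if platform not in available:
            raise ValueError(f"engine {platform!r} is not available")
        return platform
    if not available:
        raise RuntimeError("no engines are configured")
    return available[0]

# A job that names its engine explicitly:
job = {"model_id": "llama3.1:8b", "platform": "ollama",
       "input_data": "Hello, world!"}
print(select_engine(job, ["ollama", "transformers"]))  # ollama

# A job with no platform falls back to the first configured engine:
print(select_engine({"model_id": "x"}, ["transformers", "ollama"]))  # transformers
```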

Adding a New Engine

To add a new inference engine:

  1. Create an engine class that inherits from InferenceEngine (see src/engines/base.py)
  2. Implement all required abstract methods
  3. Add a configuration section to config/default.yaml
  4. Register the engine in src/core/client.py:_create_engine()
  5. Add documentation in docs/engines/
  6. Add tests in tests/
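Steps 1–2 might look like the sketch below. The abstract method names (`load_model`, `infer`) are assumptions for illustration only, not the real InferenceEngine interface; check src/engines/base.py for the actual abstract methods and signatures.

```python
from abc import ABC, abstractmethod

class InferenceEngine(ABC):
    """Stand-in for the base class in src/engines/base.py.
    Method names here are hypothetical."""

    @abstractmethod
    def load_model(self, model_id: str) -> None: ...

    @abstractmethod
    def infer(self, input_data: str) -> str: ...

class EchoEngine(InferenceEngine):
    """Toy engine showing the shape of a concrete subclass:
    every abstract method must be implemented before instantiation."""

    def load_model(self, model_id: str) -> None:
        self.model_id = model_id

    def infer(self, input_data: str) -> str:
        return f"[{self.model_id}] {input_data}"

engine = EchoEngine()
engine.load_model("llama3.1:8b")
print(engine.infer("Hello, world!"))  # [llama3.1:8b] Hello, world!
```

Leaving any abstract method unimplemented makes the subclass abstract as well, so instantiating it raises a TypeError — a quick check that step 2 is complete.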

See transformers.md for a complete example.