Inference Engines

This directory contains documentation for the inference engines supported by the MicroDC Worker.

Available Engines

Engine        Status      Description
Ollama        Production  Local LLM inference via Ollama
Transformers  Production  Local model inference via HuggingFace Transformers
vLLM          Planned     High-performance LLM inference

Multi-Engine Architecture

The worker supports running multiple engines simultaneously. See MULTI_ENGINE.md for details on:

  • Configuring multiple engines
  • On-demand engine loading
  • Job routing based on platform
  • Memory management across engines

Quick Start

# config/default.yaml
engine:
  available:
    - ollama
    - transformers

Engine Selection for Jobs

Jobs specify which engine to use via the platform field:

{
  "model_id": "llama3.1:8b",
  "platform": "ollama",
  "input_data": "Hello, world!"
}

If no platform is specified, the worker uses the first available engine.
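The routing rule above can be sketched in Python. This is an illustrative sketch, not the worker's actual internals; the function name `select_engine` and its arguments are assumptions made here for clarity.

```python
def select_engine(job: dict, available: list[str]) -> str:
    """Pick the engine for a job: honor the job's `platform` field,
    falling back to the first available engine when it is absent.
    (Illustrative sketch; not the worker's real routing code.)"""
    platform = job.get("platform")
    if platform is not None:
        if platform not in available:
            raise ValueError(f"engine {platform!r} is not available")
        return platform
    if not available:
        raise RuntimeError("no engines are configured")
    return available[0]

# A job that names its engine explicitly:
job = {"model_id": "llama3.1:8b", "platform": "ollama",
       "input_data": "Hello, world!"}
print(select_engine(job, ["ollama", "transformers"]))  # ollama

# A job with no platform falls back to the first configured engine:
print(select_engine({"model_id": "x"}, ["transformers", "ollama"]))  # transformers
```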

Adding a New Engine

To add a new inference engine:

  1. Create an engine class that inherits from InferenceEngine (see src/engines/base.py)
  2. Implement all required abstract methods
  3. Add a configuration section to config/default.yaml
  4. Register the engine in src/core/client.py:_create_engine()
  5. Add documentation in docs/engines/
  6. Add tests in tests/
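Steps 1–2 might look like the sketch below. The abstract method names (`load_model`, `infer`) are assumptions for illustration only, not the real InferenceEngine interface; check src/engines/base.py for the actual abstract methods and signatures.

```python
from abc import ABC, abstractmethod

class InferenceEngine(ABC):
    """Stand-in for the base class in src/engines/base.py.
    Method names here are hypothetical."""

    @abstractmethod
    def load_model(self, model_id: str) -> None: ...

    @abstractmethod
    def infer(self, input_data: str) -> str: ...

class EchoEngine(InferenceEngine):
    """Toy engine showing the shape of a concrete subclass:
    every abstract method must be implemented before instantiation."""

    def load_model(self, model_id: str) -> None:
        self.model_id = model_id

    def infer(self, input_data: str) -> str:
        return f"[{self.model_id}] {input_data}"

engine = EchoEngine()
engine.load_model("llama3.1:8b")
print(engine.infer("Hello, world!"))  # [llama3.1:8b] Hello, world!
```

Leaving any abstract method unimplemented makes the subclass abstract as well, so instantiating it raises a TypeError — a quick check that step 2 is complete.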

See transformers.md for a complete example.