# MicroDC.ai Worker Client - Development TODO

## CI/CD & Infrastructure ✅ (2026-02-13)
### Dockerfile ✅

- [x] Created multi-stage Dockerfile based on `nvidia/cuda:12.2.2-runtime-ubuntu22.04`
- [x] Builder stage installs Python 3.11 and pip dependencies in a venv
- [x] Runtime stage copies only the venv and source code for a smaller image
- [x] Configuration via environment variables (MICRODC_API_KEY, MICRODC_SERVER_URL, etc.)
- [x] Entrypoint set to `python -m src.core.cli`
- [x] Added `.dockerignore` to exclude .git, venv, tests, docs, caches, models
### GitLab CI Pipeline ✅

- [x] Created `.gitlab-ci.yml` with 4 stages: test, lint, build, deploy
- [x] `unit-tests` job: runs pytest on Python 3.11
- [x] `lint` job: runs black --check, isort --check, ruff check
- [x] `docker-build` job: builds and pushes the image to the GitLab Container Registry (main only)
- [x] `pages` job: builds the MkDocs site and deploys it to GitLab Pages (main only)
- [x] All jobs tagged with `aisrv05-docker`
- [x] Pip cache configured for faster CI runs
### MkDocs Documentation Site ✅

- [x] Created `mkdocs.yml` with Material theme (dark/light toggle)
- [x] Created `requirements-docs.txt` (mkdocs + mkdocs-material)
- [x] Created `docs/index.md` landing page adapted from the README
- [x] Nav structure covers all existing docs (setup, architecture, engines, API, etc.)
- [x] GitLab Pages deployment configured in the CI pipeline
## New Features ✅ (2025-11-21)

### Multimodal File Support ✅

- [x] Added `attached_files` field to the Job model (models.py:284-287)
- [x] Implemented file download method in ServerClient (server_client.py:978-1010)
- [x] Updated client.py to pass attached_files to Job objects (client.py:1012)
- [x] Added file download and base64 encoding in JobExecutor (executor.py:333-377)
- [x] Updated OllamaEngineV2.generate() to accept images parameter (ollama_v2.py:329, 365-366, 383, 404)
- [x] Added logging for attached files in job details (client.py:1050-1053)
- [x] Support for vision models (qwen2.5vl:7b, llava, bakllava, etc.)
- [x] Support for document processing models (docling)
- [x] Automatic base64 encoding for image files
- [x] Document file handling for processing engines (executor.py:404-432)
- [x] Job routing based on job_type first, not model_id (executor.py:384-396)
- [x] Case-insensitive job_type handling (executor.py:164, 380)
- [x] Relaxed input_data validation for document jobs with attached files (executor.py:173-178)
- [x] Graceful handling of unsupported file types
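The automatic base64 step for image attachments can be sketched as follows. This is an illustrative helper (the real logic lives in executor.py), with the extension set and function name being assumptions, not the actual code:

```python
import base64
from pathlib import Path

# Hypothetical extension set; vision engines such as Ollama expect images
# as base64-encoded strings rather than raw bytes.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}

def encode_image_for_job(path):
    """Return the base64 payload for an image file, or None for
    unsupported file types (handled gracefully rather than failing)."""
    p = Path(path)
    if p.suffix.lower() not in IMAGE_EXTENSIONS:
        return None
    return base64.b64encode(p.read_bytes()).decode("ascii")
```

Returning `None` instead of raising lets the executor skip non-image attachments and still run the job.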
## Bug Fixes ✅ (2025-11-21)

### Configuration Type Casting Fixes ✅
- [x] Fixed multiple "'<' not supported between instances of 'int' and 'str'" errors
- [x] Root cause: Environment variable substitution in YAML returns strings, not ints
- [x] Fixed job queue max_size comparison (client.py:287)
- [x] Fixed docling max_file_size_mb comparison (docling_engine.py:51)
- [x] Fixed docling timeout_seconds (docling_engine.py:52)
- [x] Enhanced priority validator to handle None values (models.py:289)
- [x] Added explicit int() cast for all numeric priority values (models.py:299)
- [x] Changed priority default from string "normal" to integer 5 (client.py:971)
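The root cause above boils down to one pattern: environment-variable substitution in YAML always yields strings, so numeric comparisons need an explicit cast. A minimal sketch (helper name is illustrative, not the project's actual code):

```python
def coerce_int(value, default):
    """Coerce a config value to int before it is compared.

    Env-var substitution in YAML returns strings ("100", "5"), so a
    comparison like `queue_size < max_size` raises
    TypeError: '<' not supported between instances of 'int' and 'str'
    unless the value is cast explicitly.
    """
    if value is None or value == "":
        return default
    return int(value)

# "100" < 50 would raise TypeError; coerce_int("100", 50) < 50 is safe.
max_queue_size = coerce_int("100", default=50)
```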
### Ollama API Parameter Handling Fix ✅
- [x] Fixed TypeError when passing None values for optional parameters
- [x] Changed from passing `images=None` to conditionally including the images parameter
- [x] Build parameter dictionaries dynamically (ollama_v2.py:378-415)
- [x] Only include options/images when they have actual values
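The conditional-inclusion pattern can be sketched like this (function name is illustrative; the real dictionaries are built inside ollama_v2.py):

```python
def build_ollama_kwargs(model, prompt, images=None, options=None):
    """Build request kwargs, omitting parameters that are None or empty.

    Passing images=None explicitly triggered a TypeError in the client;
    only including keys that carry real values avoids it.
    """
    kwargs = {"model": model, "prompt": prompt}
    if images:
        kwargs["images"] = images
    if options:
        kwargs["options"] = options
    return kwargs
```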
### Embed Job Payload Compatibility Fix ✅

- [x] Fixed embed job execution to accept both `texts` and `input` field formats
- [x] Server sends the `input` field in the payload; the worker expected `texts`
- [x] Updated executor.py:367 to try both fields with a fallback
- [x] Maintains compatibility with both payload formats
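The fallback logic can be sketched as follows (function name is illustrative; the field names mirror the payload formats described above):

```python
def extract_embed_texts(payload):
    """Accept both embed payload shapes: {"texts": [...]} (what the
    worker originally expected) and {"input": ...} (what the server
    actually sends). A bare string is wrapped in a list."""
    texts = payload.get("texts")
    if texts is None:
        texts = payload.get("input")
    if texts is None:
        raise ValueError("embed payload has neither 'texts' nor 'input'")
    if isinstance(texts, str):
        texts = [texts]
    return texts
```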
### Test Suite CI/CD Compatibility Fix ✅
- [x] Fixed test failures in automation/CI environments (test_config.py)
- [x] Tests were failing due to MICRODC_API_KEY environment variable overriding config
- [x] Added `@patch.dict` decorator to isolate the test environment (test_config.py:30, 58)
- [x] Clear MICRODC_* environment variables at the start of affected tests
- [x] Both failing tests now pass consistently in automation environments
- [x] All 115 tests now pass reliably in CI/CD pipelines
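The isolation pattern looks roughly like this (a sketch of the approach, not the actual test_config.py code): `patch.dict` snapshots `os.environ` and restores it after the test, so deleting `MICRODC_*` keys inside the test cannot leak into other tests or the CI runner:

```python
import os
from unittest.mock import patch

@patch.dict(os.environ, {}, clear=False)
def test_defaults_without_env_overrides():
    # Remove worker env vars so config loading sees only file-based values;
    # patch.dict restores the original environment when the test exits.
    for key in [k for k in os.environ if k.startswith("MICRODC_")]:
        del os.environ[key]
    assert not any(k.startswith("MICRODC_") for k in os.environ)
```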
### Heartbeat Model Reporting Enhancement ✅
- [x] Added processing engine models (e.g., docling) to heartbeat current_models field
- [x] Worker now reports both inference engine models (Ollama) and processing engine models
- [x] Added timeout protection (1.0s) for processing engine model listing
- [x] Enhanced debug logging to distinguish between inference and processing engines
- [x] Server now receives complete model inventory including docling
- [x] Added get_platform_name() method to Engine base class
- [x] DoclingEngine reports platform as "docling" (not "ollama")
- [x] Each engine reports its own platform name for accurate server tracking
- [x] Updated run.sh and run_prod.sh to display processing engines in configuration output
## Document Processing Integration ✅ (2025-11-21)

### Docling Integration ✅
- [x] Added Docling dependency to requirements.txt
- [x] Created DocumentProcessor class in src/processors/
- [x] Implemented support for PDF, DOCX, PPTX, XLSX, HTML, images, audio
- [x] Added URL, base64, and path input handling
- [x] Implemented multiple output formats (markdown, HTML, JSON, doctags)
- [x] Added OCR and table extraction capabilities
- [x] Implemented file size limits and timeout handling
- [x] Added comprehensive error handling and validation
### Surya OCR Integration ✅ (2025-12-09)
- [x] Created SuryaProcessor class in src/processors/surya_processor.py
- [x] Created SuryaEngine class in src/engines/surya_engine.py
- [x] Implemented support for PDF and image files (PNG, JPEG, TIFF, BMP, WEBP)
- [x] Added OCR text recognition with 90+ language support
- [x] Implemented layout detection feature
- [x] Implemented table recognition feature
- [x] Added URL, base64, and path input handling
- [x] Implemented multiple output formats (markdown, JSON, text)
- [x] Added file size limits and timeout handling
- [x] Added comprehensive error handling and validation
- [x] Added configuration to default.yaml
- [x] Registered engine in client.py
- [x] Updated job executor to route by model_id (not just job_type)
- [x] Auto-detect device (cuda/mps/cpu) for optimal performance
- [x] Updated to new Surya API (Predictor classes)
- [x] Updated documentation (README.md, FUTURE_ENGINES.md, TODO.md)
### Engine Architecture Refactoring ✅
- [x] Created Engine base class and EngineType/EngineCapability enums
- [x] Created ProcessingEngine abstract class for non-inference engines
- [x] Refactored InferenceEngine to inherit from Engine base
- [x] Implemented DoclingEngine as ProcessingEngine
- [x] Added model registration system for processing engines
- [x] Implemented model-based routing (jobs route by model_id, not job_type)
- [x] Added lazy model lookup for processing engines
- [x] Updated JobExecutor to support List[Engine] for processing engines
- [x] Implemented _get_processing_engine_for_model() helper method
- [x] Fixed event loop handling for async model registration
- [x] Updated all tests for model-based routing (17/17 passing)
- [x] Updated WorkerClient to pass processing engines as list
- [x] Fixed capability registration to iterate over list
### Job Routing Changes ✅
- [x] Changed from job_type-based routing to model_id-based routing
- [x] Jobs now specify model_id="docling" to route to DoclingEngine
- [x] Supports multiple engines of same type (e.g., model_id="tesseract" vs "easyocr")
- [x] All jobs now require model_id for routing
- [x] Processing engine models skip model loading (always available)
- [x] Inference engine models follow normal load/unload lifecycle
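The model_id-based lookup can be sketched as follows. The class and function names here are illustrative stand-ins (the real helper is `_get_processing_engine_for_model()`):

```python
class StubEngine:
    """Minimal stand-in for a ProcessingEngine with registered models."""
    def __init__(self, name, model_ids):
        self.name = name
        self.model_ids = set(model_ids)

def get_processing_engine_for_model(engines, model_id):
    """Return the first engine that registered this model_id, else None.

    Routing by model_id (not job_type) lets several engines of the same
    type coexist, e.g. "docling" vs "surya" for document jobs.
    """
    for engine in engines:
        if model_id in engine.model_ids:
            return engine
    return None

engines = [StubEngine("docling", ["docling"]), StubEngine("surya", ["surya"])]
```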
### Configuration ✅
- [x] Added engine.processing.docling section to config/default.yaml
- [x] Moved configuration under engine section for consistency
- [x] Implemented environment variable support for all engine settings
- [x] Added configurable max file size, timeout
- [x] Implemented temp directory configuration
- [x] Updated WorkerClient to read from engine.processing.docling path
- [x] Cleaned up default.yaml to remove unimplemented engines (vLLM, Transformers)
- [x] Created docs/FUTURE_ENGINES.md for planned engine configurations
- [x] Updated README.md to reference FUTURE_ENGINES.md
- [x] Removed unused config sections (performance, security, development)
- [x] Removed unused config values (metrics_interval, model loading strategy)
- [x] Simplified config to only show what's actually implemented
### Testing ✅
- [x] Created comprehensive test suite (tests/test_document_processor.py)
- [x] Unit tests for DocumentProcessor class
- [x] Integration tests for JobExecutor document handling
- [x] Tests for error conditions and edge cases
- [x] Tests for JobParameters document field validation
### Documentation ✅
- [x] Created DOCUMENT_PROCESSING_INTEGRATION.md guide
- [x] Documented server-side changes needed
- [x] Documented Python client integration requirements
- [x] Updated README.md with document processing section
- [x] Added document processing to feature list
- [x] Updated project structure diagram
## Production Readiness ✅ (2025-11-14)

### Long-Running Job Heartbeat Fix ✅
- [x] Fixed heartbeat blocking during long-running jobs
- [x] Added 5s timeout to engine.list_models() in heartbeat loop
- [x] Added 3s timeout to system metrics collection with asyncio.to_thread()
- [x] Added 15s timeout to heartbeat send operation
- [x] Implemented fallback to cached model registry on timeout
- [x] Heartbeats now guaranteed to send every 30s even during 10min+ jobs
- [x] Job progress included in heartbeat to reset server-side timeout
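The timeout-with-fallback pattern for the model listing step can be sketched like this (function name illustrative; the real loop applies similar timeouts to metrics collection and the heartbeat send itself):

```python
import asyncio

async def list_models_with_timeout(engine, cached_models, timeout=5.0):
    """Query the engine's model list, falling back to the cached registry
    when the engine is busy with a long-running job, so the heartbeat is
    never blocked waiting on it."""
    try:
        return await asyncio.wait_for(engine.list_models(), timeout=timeout)
    except asyncio.TimeoutError:
        return cached_models
```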
### Code Quality and Standards ✅
- [x] Fixed all ruff linting errors (14 errors resolved)
- [x] Updated pyproject.toml to modern ruff configuration format
- [x] Replaced all bare except clauses with specific Exception handling
- [x] Fixed unused variable warnings
- [x] All 115 tests passing (updated from 98 after multimodal/document features)
- [x] Set up Black for code formatting (v24.10.0) - available in dev extras
- [x] Set up Ruff for linting (v0.9.1) - configured in pyproject.toml
- [x] Set up MyPy for type checking (v1.14.0) - available in dev extras
- [x] Configure pre-commit hooks (2025-11-23)
- [x] Created .pre-commit-config.yaml with 10 hooks
- [x] Configured Black (line-length=100)
- [x] Configured isort (profile=black)
- [x] Configured Ruff (auto-fix enabled, excludes notebooks)
- [x] Configured mypy (excludes tests/examples/notebooks)
- [x] Configured bandit for security checks
- [x] Configured markdownlint
- [x] Configured YAML formatter (pretty-format-yaml)
- [x] Added python-safety-dependencies-check
- [x] Configured general file checks (trailing-whitespace, end-of-file-fixer, etc.)
- [x] Install pre-commit: `pip install pre-commit && pre-commit install`
- [x] Run hooks: `pre-commit run --all-files`
- [x] Fixed all Bandit security issues (2025-11-24)
- [x] Fixed 2 high-severity tarfile extraction vulnerabilities (B202)
- [x] Replaced unsafe os.system() calls with subprocess.run()
- [x] Replaced 13 empty except:pass blocks with proper error handling
- [x] Replaced 4 assert statements with RuntimeError exceptions
- [x] Added #nosec comments for legitimate subprocess usage
- [x] All security checks now pass with 0 issues
- [ ] Add type hints to all functions (in progress - core modules have hints)
- [ ] Achieve >90% test coverage (currently 37% overall - 4547 statements, 2880 uncovered)
- High coverage modules: api/models.py (98%), core/exceptions.py (100%), core/config.py (78%)
- Medium coverage: engines/base.py (72%), jobs/monitor.py (87%), models/registry.py (72%)
- Low coverage: cli.py (0%), client.py (0%), server_client.py (10%) - mostly integration code
- Focus areas: Increase coverage for job executor (47%), ollama engines, document processor (74%)
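The high-severity tarfile fix (B202) noted above amounts to validating every member path before extraction. An illustrative sketch, not the project's actual code (Python 3.12+ can use `extractall(..., filter="data")` instead):

```python
import os
import tarfile

def safe_extract(tar, dest):
    """Refuse archive members that would escape dest via ../ or
    absolute paths, then extract the validated archive."""
    dest_real = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(dest_real, member.name))
        if os.path.commonpath([dest_real, target]) != dest_real:
            raise RuntimeError(f"Blocked path traversal in archive: {member.name}")
    tar.extractall(dest_real)  # nosec - members validated above
```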
### Documentation and File Management ✅
- [x] Added docs/AUTHENTICATION.md to version control
- [x] Added docs/CODE_QUALITY.md for development tools and standards (2025-11-23)
- [x] Created CONTRIBUTING.md with contributor guidelines and workflow (2025-11-23)
- [x] Removed deprecated documentation files (SETUP.md, WORKER_CHANGE_REQUEST.md)
- [x] Removed unused example_usage.py file
- [x] Moved test_max_tokens.py to tests/ directory with proper pytest decorator
- [x] Updated README.md with code quality tools section and contributor guide
### TODO Comments Resolution ✅
- [x] Replaced all TODO comments with explanatory notes
- [x] Documented that log rotation is handled by systemd
- [x] Noted signature verification is future enhancement (checksums provide integrity)
- [x] Clarified credential validation handled by ServerClient
- [x] Updated daemon/log viewing commands to reference systemd
### Project Hygiene ✅
- [x] Enhanced .gitignore with comprehensive Python patterns
- [x] Cleaned all build artifacts (`__pycache__`, *.pyc, .DS_Store)
- [x] All changes staged and ready for commit
## Phase 1: Core Implementation 🚀 ✅

### Project Setup ✅
- [x] Set up project structure and core directories
- [x] Create requirements.txt and setup.py
- [x] Create default configuration file (config/default.yaml)
### Core Components ✅
- [x] Implement core configuration management (config.py)
- [x] Create Pydantic models for API communication
- [x] Implement abstract InferenceEngine base class
- [x] Build Ollama engine integration
### Server Communication ✅
- [x] Create server API client for communication
- [x] Implement worker registration flow
- [x] Build model discovery and capability reporting
- [x] Implement heartbeat mechanism
### Job Processing ✅
- [x] Create job executor for processing inference jobs
- [x] Build job queue management system
- [x] Implement system resource monitoring utilities
### Client Orchestration ✅
- [x] Create main client orchestrator
- [x] Add CLI interface with commands
- [x] Set up logging and health checks
## Phase 2: Testing & Robustness 🛡️

### Testing
- [x] Write unit tests for core components
- [x] Config tests (tests/test_config.py)
- [x] Model management tests (tests/test_models.py)
- [x] Job queue tests (tests/test_jobs.py)
- [x] System monitor tests (tests/test_system.py)
- [x] API model validation tests (tests/test_api_models.py)
- [ ] Test end-to-end worker registration and job execution
- [ ] Integration tests with mock server
### Error Handling ✅
- [x] Add error handling and retry logic
- [x] Implement graceful shutdown
- [x] Add resource limits enforcement
### GPU Support ✅
- [x] NVIDIA GPU detection via nvidia-ml-py
- [x] Apple Metal Performance Shaders (MPS) detection (requires PyTorch)
- [x] Unified memory tracking for Apple Silicon
- [x] GPU capability reporting in system info
- [x] PyTorch dependency added to GPU extras for MPS support
### Development Tools ✅
- [x] Development run script (tools/run.sh)
- [x] Test script with interactive menu (tools/test_worker.sh)
- [x] Makefile for common tasks
- [x] Watch mode for auto-restart
- [x] Interactive shell mode for debugging
## Phase 3: Additional Features (Future) 🔮

### Engine Support
- [ ] vLLM integration
- [x] HuggingFace Transformers integration (2025-12-15)
- [x] Created TransformersEngine class inheriting from InferenceEngine
- [x] Support for text generation, embeddings, and multimodal models
- [x] Dynamic VRAM management with LRU model eviction
- [x] bitsandbytes 4-bit/8-bit quantization support
- [x] HuggingFace Hub download with configurable allowlist/blocklist
- [x] Case-insensitive allowlist matching
- [x] On-demand model download for allowlisted Hub models
- [x] Streaming text generation via TextIteratorStreamer
- [x] Auto device selection (CUDA, MPS, CPU)
- [x] Job platform routing via `platform` field
- [x] 47 unit tests with 66% coverage
- [ ] Custom engine plugin system
### Advanced Features
- [x] Auto-Update System (2025-09-23)
- [x] Version checking against server requirements
- [x] Automatic download and installation of updates
- [x] Graceful shutdown before updates
- [x] Rollback capability on update failure
- [x] Maintenance window support
- [x] Platform-specific update scripts (Linux/macOS/Windows)
- [x] Model auto-pulling on demand (Server-Initiated Model Downloads)
- [x] Added PendingDownloadRequest, DownloadResponseType Pydantic models
- [x] Added model_downloads config section with enable toggle, allowlists, hardware thresholds
- [x] Created ModelDownloadManager for download orchestration with hardware validation
- [x] Integrated with heartbeat response `pending_download_requests` field
- [x] API endpoints: respond, progress, complete (GET/POST /api/v1/workers/download-requests/)
- [x] Hardware compatibility checking (estimated_size_gb, required_vram_gb, required_ram_gb)
- [x] Platform-specific allowlist/blocklist support with wildcard patterns
- [x] Automatic model existence check before downloading
- [x] Resume detection via GET /api/v1/workers/active-downloads on startup
- [ ] Multi-GPU support
- [ ] Job prioritization (basic priority support already implemented)
- [ ] Result caching
- [ ] Distributed inference support
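The platform-specific allowlist/blocklist matching with wildcard patterns described above could look like this. This is a sketch under assumptions: the function name is illustrative, and the "empty allowlist means allow everything, blocklist wins" policy is an assumption rather than confirmed behavior:

```python
import fnmatch

def model_download_allowed(model_id, allowlist, blocklist=()):
    """Case-insensitive allowlist/blocklist check with wildcard patterns.

    Assumed policy: blocklist matches always deny; an empty allowlist
    permits everything; otherwise the model must match an allow pattern.
    """
    name = model_id.lower()
    if any(fnmatch.fnmatchcase(name, pat.lower()) for pat in blocklist):
        return False
    if not allowlist:
        return True
    return any(fnmatch.fnmatchcase(name, pat.lower()) for pat in allowlist)
```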
### Code Cleanup & Refactoring ✅
- [x] Resolve duplicate class definitions (GPUInfo, CPUInfo classes are correctly separated - internal dataclasses with to_api_model converters)
- [x] Remove unused exception classes (removed ConfigurationError)
- [x] Fix placeholder code (removed "not yet implemented" message in logging.py)
- [x] Fix linting issues (resolved unused variables and bare except clauses)
- [x] Enhanced error messages for missing API key configuration
- [x] Add parameter tracking for max_tokens → num_predict conversion
- [x] Create test utilities for parameter verification
- [ ] Review and remove/document unused API models (kept as they define server API contract)
- [ ] Clean up unused system utility functions (many are CLI entry points or public APIs)
- [ ] Add tests for public APIs to prevent accidental removal
- [ ] Document which "unused" functions are actually public APIs or future hooks
### Documentation
- [ ] API documentation
- [x] Deployment guide
- [x] Created docs/setup/UBUNTU_SETUP.md - comprehensive Ubuntu/systemd installation guide
- [x] Created docs/setup/WINDOWS_SETUP.md - Windows installation guide with service options
- [ ] Configuration reference
- [ ] Troubleshooting guide
### Operations
- [ ] Prometheus metrics export
- [ ] OpenTelemetry tracing
- [ ] Performance profiling
- [ ] Auto-scaling support
## Current Status 📊

### Completed ✅
- All core infrastructure and main components
- Ollama engine integration with full API support
- Complete server communication layer with retry logic
- Comprehensive resource monitoring (CPU, GPU, Memory, Storage)
- Model lifecycle management with loading strategies
- Job processing pipeline with queue management
- Rich CLI interface with all essential commands
- Error handling with custom exceptions
- Unit test suite covering core components
- Claim-based job assignment system implementation
- Bearer token authentication with credential persistence
- Automatic credential reuse on worker restart
- Support for both "input" and "prompt" fields in job payload
- Proper handling of the required `tokens_used` field in job completion
- Race condition mitigation with a completion delay
- Detailed server response logging for debugging
- Updated to match new server API (removed capabilities field, uses normalized tables)
- Fixed heartbeat format to include status object and supported_models list
- Enhanced error messaging for missing API key configuration
- Parameter tracking and logging for max_tokens → num_predict conversion
- Code cleanup: removed unused code, fixed linting issues
- Test utilities for parameter verification
- Multimodal support: Job model and client support for llm_interaction_type and modalities
- Model platform tracking: Reports inference platform (ollama, vLLM, etc.) for each model in heartbeat (registration simplified to not include models)
- Resilient server communication: Worker survives temporary server outages with automatic retry and graceful recovery
### In Progress 🚧
- Integration testing with real MicroHub server
- Performance optimization for high-throughput scenarios
### Recently Completed ✅

- **HuggingFace Transformers Engine (2025-12-15)**
  - Created TransformersEngine class in `src/engines/transformers_engine.py` (~1000 lines)
  - Support for text generation models (CausalLM, Seq2SeqLM)
  - Support for embedding models (BERT, RoBERTa, sentence-transformers)
  - Support for multimodal models (LLaVA, Qwen-VL)
  - Dynamic VRAM management with LRU eviction for multiple loaded models
  - bitsandbytes 4-bit/8-bit quantization support
  - HuggingFace Hub download with configurable allowlist/blocklist
  - Case-insensitive allowlist matching for model names
  - On-demand model download for allowlisted Hub models
  - Streaming text generation via TextIteratorStreamer
  - Auto device selection (CUDA, MPS, CPU)
  - Job routing via `platform` field in job data
  - 47 unit tests with 66% coverage
  - Configuration in `config/default.yaml`
  - Integration in `src/core/client.py`
  - Documentation in `docs/engines/transformers.md`
  - Silenced noisy third-party loggers (urllib3, filelock, etc.)
- **Removed deprecated `engine.type` config (use the `engine.available` list)**
- **Fixed ReadTimeout Error for Long-Running Generations (2025-11-07)**
  - Fixed timeout handling in OllamaEngineV2 to prevent ReadTimeout errors
  - ollama_v2.py:10: Added httpx import for granular timeout configuration
  - ollama_v2.py:56-61: Configured httpx.Timeout with a 120s read timeout between chunks
  - ollama_v2.py:318-417: Refactored generate() to use internal streaming for batch jobs
  - Batch jobs now internally stream from Ollama while returning complete results
  - Set a reasonable read timeout (120s between chunks) that detects actual stalls
  - No need for artificial unlimited timeouts: as long as tokens flow, the job continues
  - Better error detection: quickly identifies when Ollama actually stalls
  - Maintained connection timeout (30s), write timeout (30s), pool timeout (10s)
  - Fixes "Generation failed: ReadTimeout" errors on large models (e.g., deepseek-r1:70b)
  - Jobs no longer fail prematurely during extended generation tasks
  - **Worker can now handle models that require unlimited generation time**
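The internal-streaming idea can be sketched as follows: the timeout bounds the gap *between* chunks rather than total generation time, so arbitrarily long generations succeed as long as tokens keep flowing. A simplified stand-in for the real httpx-based code:

```python
import asyncio

async def collect_stream(chunk_iter, read_timeout=120.0):
    """Drain an async token stream into one result, resetting the
    stall timer on every chunk (sketch of the batch-via-streaming
    approach; names are illustrative)."""
    parts = []
    iterator = chunk_iter.__aiter__()
    while True:
        try:
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=read_timeout)
        except StopAsyncIteration:
            break
        parts.append(chunk)
    return "".join(parts)
```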
- **Automatic Version Management System (2025-11-04)**
  - Implemented fully automated version bumping on every git commit
  - tools/bump_version.py: Python utility for version incrementing
  - tools/git-hooks/pre-commit: Git hook for automatic PATCH version bump
  - tools/install-git-hooks.sh: One-time setup script for git hooks
  - version.py:15: Updated to version 0.1.0 (initial development)
  - docs/VERSIONING.md: Comprehensive guide with automatic versioning instructions
  - README.md:568-599: Added automatic version management section
  - Every commit now auto-increments the PATCH version (0.1.0 → 0.1.1)
  - Manual bumps available for MINOR/MAJOR versions
  - Zero configuration after one-time hook installation
  - **Established 0.x.x for development, 1.0.0 for first production release**
- **Worker Version Tracking (2025-11-04)**
  - Added worker_version field to WorkerHeartbeat model
  - models.py:368: Added worker_version as an optional string field
  - client.py:684: Heartbeat now includes version from version.py
  - README.md:271: Updated heartbeat format example to include worker_version
  - Worker version now reported in every heartbeat for tracking and debugging
  - Server can track which worker versions are deployed in the fleet
  - **Enables version-specific debugging and compatibility checks**
- **Embed Job Execution Fix (2025-11-04)**
  - Fixed embed job execution to properly handle payloads with a "texts" field
  - client.py:795: Extract job_type early to use in payload parsing logic
  - client.py:805-807: Added special handling for job_type="embed" to preserve the entire payload
  - executor.py:89-91: Updated validation to accept dict input_data for embed/chat jobs
  - executor.py:103-105: Store job_type in result metadata for proper output formatting
  - executor.py:240-253: Extract the "texts" field from the payload dict for embedding generation
  - server_client.py:1012-1026: Format output based on job_type (embeddings vs text)
  - Fixes error: "Job has no input_data/prompt" when executing embed jobs
  - Fixes output format: embeddings now properly structured as {"embeddings": [...], "finish_reason": "stop"}
  - **Worker now fully supports embedding job execution with proper payload parsing and result formatting**
- **Embedding Model Support Fix (2025-10-26)**
  - Fixed the `load_model()` method in both Ollama engines to handle embedding models
  - ollama.py:249: Added embedding model detection and proper test method
  - ollama_v2.py:261: Added embedding model detection and proper test method
  - Embedding models (containing "embed" in the name) are now tested with `generate_embeddings()`
  - Non-embedding models continue to use the `generate()` test
  - Fixes error: "does not support generate" when loading embedding models
  - **Worker now fully supports embedding models like qwen3-embedding:8b, nomic-embed-text, etc.**
- **Test Suite Fixes (2025-10-14)**
  - Fixed all 6 failing tests after the resilient server communication implementation
  - test_worker_registration: Updated for new WorkerRegistration schema without supported_models
  - test_required_fields: Changed to test ModelCapability instead of the Job model
  - test_download_update_success: Fixed async context manager mock for streaming
  - test_periodic_update_check: Fixed async task cancellation pattern
  - test_default_config_loading: Fixed config path and expected values
  - test_nested_config_access: Added handling for string vs int config values
  - **All 97 tests now passing successfully**
- **Resilient Server Communication (2025-10-14)**
  - Worker now handles server unavailability gracefully without crashing
  - Registration retry: Added automatic retry with exponential backoff for network errors
  - Continuous operation: Heartbeat and job polling loops continue even when the server is down
  - Server availability tracking: New `_server_unavailable_since` and `_last_server_success` tracking
  - Helper methods: Added `_mark_server_success()`, `_mark_server_failure()`, `_check_server_unavailability()`
  - Graceful shutdown: Worker exits gracefully after a configurable maximum unavailability (default: 5 minutes)
  - Server availability monitor: New background task monitors server status periodically
  - Smart recovery: Automatically resumes when the server comes back online, with logging
  - Configuration: Added `server.retry.*` and `server.unavailable.*` config options
  - **Workers now survive temporary server outages and network issues**
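The exponential-backoff registration retry can be sketched like this (a minimal illustration, not the project's actual retry code; attempt counts and delays are assumptions):

```python
import asyncio

async def call_with_backoff(op, attempts=5, base_delay=1.0, max_delay=60.0):
    """Retry an async operation on network errors with exponential
    backoff: delays grow 1s, 2s, 4s, ... capped at max_delay; the
    final failure is re-raised."""
    for attempt in range(attempts):
        try:
            return await op()
        except (ConnectionError, OSError):
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(min(max_delay, base_delay * (2 ** attempt)))
```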
- **Model Platform Tracking (2025-10-13)**
  - Worker now reports which inference platform hosts each model
  - Registration simplified: Removed the `supported_models` field (server gets updates via heartbeat only)
  - Heartbeat includes platform info in the `current_models` field with model+platform objects
  - Added `ModelWithPlatform` Pydantic model for structured platform data
  - Platform auto-detection based on engine type configuration
  - Support for 7 standard platforms plus a custom platform type
  - Helper method `_models_to_platform_format()` for converting model lists
  - Backward-compatible Union type supports both old and new formats
  - Enables server-side platform-specific job routing and performance tracking
  - **Model updates are now fully dynamic via the heartbeat system**
- **Runtime Token Refresh (2025-10-11)**
  - Fixed authentication failures during worker runtime (heartbeat, job polling, etc.)
  - Enhanced `_make_request()` with automatic token refresh on 401 errors
  - Added transparent request retry after a successful token refresh
  - Prevents infinite retry loops with the `retry_auth` flag
  - Enhanced error logging in heartbeat and job polling loops for authentication failures
  - Workers now seamlessly handle token expiration during long-running sessions
  - **No service interruption when tokens expire: automatic recovery**
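The refresh-and-retry shape can be sketched as follows. The `send`/`refresh` callables are illustrative stand-ins, not the real `_make_request()` signature; the `retry_auth` flag is what prevents an infinite refresh loop if the new token is also rejected:

```python
async def request_with_token_refresh(send, refresh, retry_auth=True):
    """On a 401 response, refresh credentials once and retry the request."""
    status, body = await send()
    if status == 401 and retry_auth:
        await refresh()
        return await request_with_token_refresh(send, refresh, retry_auth=False)
    return status, body
```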
- **Configuration Loading Fix (2025-10-11)**
  - Fixed "Config file not found" warnings on production workers
  - WorkerConfig now respects the `MICRODC_CONFIG` environment variable
  - Added `config/worker.yaml` to the search paths (used by ubuntu_setup.sh)
  - Proper configuration loading for systemd service installations
  - **Production workers now correctly load configuration from `/srv/microdcworker/config/worker.yaml`**
- **Automatic Token Refresh (2025-10-09)**
  - Fixed token expiration handling for long-running workers
  - Added proper calculation of token expiration time from the `expires_in` field
  - Implemented automatic credential refresh using the saved `secret_key`
  - Added `_refresh_credentials()` method in ServerClient to handle token renewal
  - Workers now save `expires_at`, `refresh_token`, and `secret_key` with credentials
  - Enhanced error messages for expired bootstrap tokens with clear renewal instructions
  - **Workers can now run indefinitely without "Invalid or expired token" errors**
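The expiration calculation reduces to converting the server's relative `expires_in` into an absolute timestamp. A sketch (function names and the refresh margin are illustrative assumptions):

```python
import time

def compute_expires_at(expires_in, now=None):
    """Convert the server's relative `expires_in` (seconds) into an
    absolute timestamp that can be persisted with the credentials."""
    return (time.time() if now is None else now) + expires_in

def token_needs_refresh(expires_at, now=None, margin=60.0):
    """Refresh slightly before expiry so in-flight requests never
    carry a stale token."""
    return (time.time() if now is None else now) >= expires_at - margin
```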
- **Multimodal Support (2025-10-04)**
  - Added llm_interaction_type field to the Job model (generation vs chat)
  - Added input_modalities and output_modalities fields to the Job model
  - Added job_type field to the Job model
  - Enhanced job executor logging with multimodal information
  - Updated client job claim parsing to extract multimodal fields
  - Enhanced job claim logging with modality and interaction type details
  - Implemented routing in the job executor based on interaction type
  - Added _prepare_chat_messages() helper for chat format conversion
  - Implemented embedding job support with JSON result formatting
  - Handle chat messages in the payload for proper message extraction
  - **Full compatibility with server WORKER_CHANGE_REQUEST.md specifications**
- **Auto-Update System (2025-09-23)**
  - Created WorkerAutoUpdater class with comprehensive update management
  - Implemented version checking against server endpoints (/api/v1/version/)
  - Added automatic download with progress tracking
  - Created backup and rollback functionality
  - Integrated graceful shutdown for running jobs
  - Added maintenance window support for controlled updates
  - Created platform-specific update scripts for Linux/macOS/Windows
  - Implemented update configuration in default.yaml
  - Added comprehensive test coverage for update functionality
  - **Integrated the auto-updater with the main worker client lifecycle**
- **Enhanced Heartbeat System (2025-09-22)**
  - Added SystemMetricsCollector class for comprehensive metrics collection
  - Enhanced WorkerHeartbeat model with a SystemMetrics field
  - Implemented collection of load average, CPU count, memory metrics, disk metrics, GPU metrics, network I/O, and uptime
  - Added temperature monitoring (CPU and GPU when available)
  - Dynamic model reporting in the current_models field
  - Backward compatible with the legacy heartbeat format
  - **Tested on macOS with Apple Silicon (MPS GPU support)**
- **Dynamic Model Reporting (2025-09-22)**
  - Heartbeat now fetches a fresh model list from the engine on each cycle
  - Added refresh_models() method for silent registry updates
  - Implemented model_refresh_loop() for periodic registry synchronization
  - Model registry cleared and repopulated on refresh to prevent stale entries
  - Fallback to the cached registry if the engine query fails
  - Configurable refresh interval (default: 5 minutes)
  - Successfully tested with 31 Ollama models
### Next Steps 📋
- Monitor and optimize job completion success rate
- Add metrics collection for job processing performance
- Implement adaptive retry strategies based on error types
- Add vLLM inference engine support
- Add Prometheus metrics export for monitoring
- Implement distributed inference support for large models
## Notes

### Core Implementation ✅
- All Phase 1 core implementation completed successfully
- Ollama is fully integrated as the primary inference engine
- Async/await patterns used throughout for optimal concurrency
- Comprehensive error handling with retry logic implemented
- Structured logging with configurable outputs and detailed job logging
- Pluggable architecture ready for additional engines
### API Compatibility Updates ✅
- Server Schema Changes: Worker updated to match new server API without capabilities field
- Normalized Tables: Hardware specs and supported models sent for server's normalized database storage
- Heartbeat Format: Correctly wraps status object and includes supported_models list
- Parameter Mapping: Properly converts generic parameters to engine-specific ones (e.g., max_tokens → num_predict)
### Job Processing ✅
- Claim-based system: Atomic job claiming prevents race conditions
- Flexible payload handling: Supports both "input" and "prompt" fields from server
- Token usage: Always includes the required `tokens_used` field in completions
- Race condition fix: 0.5s delay before result submission prevents "Assignment not active" errors
- Error reporting: All job failures properly reported to server with detailed error messages
### Authentication & Security ✅
- Credential persistence: Saves credentials after registration for reuse
- Automatic token refresh: Transparently refreshes expired tokens using saved secret_key
- Token expiration handling: Properly calculates and tracks token expiration times
- Clean error handling: Worker exits cleanly on authentication failures
- Bearer token auth: Proper implementation of bearer token authentication
- Bootstrap token: Initial registration uses one-time bootstrap token
- Enhanced error messages: Clear guidance when bootstrap tokens expire
### Testing & Quality
- Unit tests provide good coverage for individual components
- Ready for integration testing with actual MicroDC server
- Comprehensive logging for debugging and monitoring