# MicroDC.ai Worker Client - Development TODO

## CI/CD & Infrastructure ✅ (2026-02-13)
### Dockerfile ✅

- [x] Created multi-stage Dockerfile based on `nvidia/cuda:12.2.2-runtime-ubuntu22.04`
- [x] Builder stage installs Python 3.11 and pip dependencies in a venv
- [x] Runtime stage copies only the venv and source code for a smaller image
- [x] Configuration via environment variables (MICRODC_API_KEY, MICRODC_SERVER_URL, etc.)
- [x] Entrypoint set to `python -m src.core.cli`
- [x] Added `.dockerignore` to exclude .git, venv, tests, docs, caches, models
### GitLab CI Pipeline ✅

- [x] Created `.gitlab-ci.yml` with 4 stages: test, lint, build, deploy
- [x] `unit-tests` job: runs pytest on Python 3.11
- [x] `lint` job: runs black --check, isort --check, ruff check
- [x] `docker-build` job: builds and pushes the image to the GitLab Container Registry (main only)
- [x] `pages` job: builds the MkDocs site and deploys it to GitLab Pages (main only)
- [x] All jobs tagged with `aisrv05-docker`
- [x] Pip cache configured for faster CI runs
### MkDocs Documentation Site ✅

- [x] Created `mkdocs.yml` with Material theme (dark/light toggle)
- [x] Created `requirements-docs.txt` (mkdocs + mkdocs-material)
- [x] Created `docs/index.md` landing page adapted from the README
- [x] Nav structure covers all existing docs (setup, architecture, engines, API, etc.)
- [x] GitLab Pages deployment configured in the CI pipeline
## New Features ✅ (2025-11-21)

### Multimodal File Support ✅

- [x] Added `attached_files` field to the Job model (models.py:284-287)
- [x] Implemented file download method in ServerClient (server_client.py:978-1010)
- [x] Updated client.py to pass attached_files to Job objects (client.py:1012)
- [x] Added file download and base64 encoding in JobExecutor (executor.py:333-377)
- [x] Updated OllamaEngineV2.generate() to accept images parameter (ollama_v2.py:329, 365-366, 383, 404)
- [x] Added logging for attached files in job details (client.py:1050-1053)
- [x] Support for vision models (qwen2.5vl:7b, llava, bakllava, etc.)
- [x] Support for document processing models (docling)
- [x] Automatic base64 encoding for image files
- [x] Document file handling for processing engines (executor.py:404-432)
- [x] Job routing based on job_type first, not model_id (executor.py:384-396)
- [x] Case-insensitive job_type handling (executor.py:164, 380)
- [x] Relaxed input_data validation for document jobs with attached files (executor.py:173-178)
- [x] Graceful handling of unsupported file types
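The automatic base64 step for image attachments can be sketched as follows. This is an illustrative helper (the real logic lives in executor.py), with the extension set and function name being assumptions, not the actual code:

```python
import base64
from pathlib import Path

# Hypothetical extension set; vision engines such as Ollama expect images
# as base64-encoded strings rather than raw bytes.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}

def encode_image_for_job(path):
    """Return the base64 payload for an image file, or None for
    unsupported file types (handled gracefully rather than failing)."""
    p = Path(path)
    if p.suffix.lower() not in IMAGE_EXTENSIONS:
        return None
    return base64.b64encode(p.read_bytes()).decode("ascii")
```

Returning `None` instead of raising lets the executor skip non-image attachments and still run the job.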
## Bug Fixes ✅ (2025-11-21)

### Configuration Type Casting Fixes ✅
- [x] Fixed multiple "'<' not supported between instances of 'int' and 'str'" errors
- [x] Root cause: Environment variable substitution in YAML returns strings, not ints
- [x] Fixed job queue max_size comparison (client.py:287)
- [x] Fixed docling max_file_size_mb comparison (docling_engine.py:51)
- [x] Fixed docling timeout_seconds (docling_engine.py:52)
- [x] Enhanced priority validator to handle None values (models.py:289)
- [x] Added explicit int() cast for all numeric priority values (models.py:299)
- [x] Changed priority default from string "normal" to integer 5 (client.py:971)
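The root cause above boils down to one pattern: environment-variable substitution in YAML always yields strings, so numeric comparisons need an explicit cast. A minimal sketch (helper name is illustrative, not the project's actual code):

```python
def coerce_int(value, default):
    """Coerce a config value to int before it is compared.

    Env-var substitution in YAML returns strings ("100", "5"), so a
    comparison like `queue_size < max_size` raises
    TypeError: '<' not supported between instances of 'int' and 'str'
    unless the value is cast explicitly.
    """
    if value is None or value == "":
        return default
    return int(value)

# "100" < 50 would raise TypeError; coerce_int("100", 50) < 50 is safe.
max_queue_size = coerce_int("100", default=50)
```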
### Ollama API Parameter Handling Fix ✅
- [x] Fixed TypeError when passing None values for optional parameters
- [x] Changed from passing `images=None` to conditionally including the images parameter
- [x] Build parameter dictionaries dynamically (ollama_v2.py:378-415)
- [x] Only include options/images when they have actual values
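The conditional-inclusion pattern can be sketched like this (function name is illustrative; the real dictionaries are built inside ollama_v2.py):

```python
def build_ollama_kwargs(model, prompt, images=None, options=None):
    """Build request kwargs, omitting parameters that are None or empty.

    Passing images=None explicitly triggered a TypeError in the client;
    only including keys that carry real values avoids it.
    """
    kwargs = {"model": model, "prompt": prompt}
    if images:
        kwargs["images"] = images
    if options:
        kwargs["options"] = options
    return kwargs
```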
### Embed Job Payload Compatibility Fix ✅

- [x] Fixed embed job execution to accept both `texts` and `input` field formats
- [x] Server sends the `input` field in the payload; the worker expected `texts`
- [x] Updated executor.py:367 to try both fields with a fallback
- [x] Maintains compatibility with both payload formats
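The fallback logic can be sketched as follows (function name is illustrative; the field names mirror the payload formats described above):

```python
def extract_embed_texts(payload):
    """Accept both embed payload shapes: {"texts": [...]} (what the
    worker originally expected) and {"input": ...} (what the server
    actually sends). A bare string is wrapped in a list."""
    texts = payload.get("texts")
    if texts is None:
        texts = payload.get("input")
    if texts is None:
        raise ValueError("embed payload has neither 'texts' nor 'input'")
    if isinstance(texts, str):
        texts = [texts]
    return texts
```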
### Test Suite CI/CD Compatibility Fix ✅
- [x] Fixed test failures in automation/CI environments (test_config.py)
- [x] Tests were failing due to MICRODC_API_KEY environment variable overriding config
- [x] Added `@patch.dict` decorator to isolate the test environment (test_config.py:30, 58)
- [x] Clear MICRODC_* environment variables at the start of affected tests
- [x] Both failing tests now pass consistently in automation environments
- [x] All 115 tests now pass reliably in CI/CD pipelines
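The isolation pattern looks roughly like this (a sketch of the approach, not the actual test_config.py code): `patch.dict` snapshots `os.environ` and restores it after the test, so deleting `MICRODC_*` keys inside the test cannot leak into other tests or the CI runner:

```python
import os
from unittest.mock import patch

@patch.dict(os.environ, {}, clear=False)
def test_defaults_without_env_overrides():
    # Remove worker env vars so config loading sees only file-based values;
    # patch.dict restores the original environment when the test exits.
    for key in [k for k in os.environ if k.startswith("MICRODC_")]:
        del os.environ[key]
    assert not any(k.startswith("MICRODC_") for k in os.environ)
```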
### Heartbeat Model Reporting Enhancement ✅
- [x] Added processing engine models (e.g., docling) to heartbeat current_models field
- [x] Worker now reports both inference engine models (Ollama) and processing engine models
- [x] Added timeout protection (1.0s) for processing engine model listing
- [x] Enhanced debug logging to distinguish between inference and processing engines
- [x] Server now receives complete model inventory including docling
- [x] Added get_platform_name() method to Engine base class
- [x] DoclingEngine reports platform as "docling" (not "ollama")
- [x] Each engine reports its own platform name for accurate server tracking
- [x] Updated run.sh and run_prod.sh to display processing engines in configuration output
## Document Processing Integration ✅ (2025-11-21)

### Docling Integration ✅
- [x] Added Docling dependency to requirements.txt
- [x] Created DocumentProcessor class in src/processors/
- [x] Implemented support for PDF, DOCX, PPTX, XLSX, HTML, images, audio
- [x] Added URL, base64, and path input handling
- [x] Implemented multiple output formats (markdown, HTML, JSON, doctags)
- [x] Added OCR and table extraction capabilities
- [x] Implemented file size limits and timeout handling
- [x] Added comprehensive error handling and validation
### Surya OCR Integration ✅ (2025-12-09)
- [x] Created SuryaProcessor class in src/processors/surya_processor.py
- [x] Created SuryaEngine class in src/engines/surya_engine.py
- [x] Implemented support for PDF and image files (PNG, JPEG, TIFF, BMP, WEBP)
- [x] Added OCR text recognition with 90+ language support
- [x] Implemented layout detection feature
- [x] Implemented table recognition feature
- [x] Added URL, base64, and path input handling
- [x] Implemented multiple output formats (markdown, JSON, text)
- [x] Added file size limits and timeout handling
- [x] Added comprehensive error handling and validation
- [x] Added configuration to default.yaml
- [x] Registered engine in client.py
- [x] Updated job executor to route by model_id (not just job_type)
- [x] Auto-detect device (cuda/mps/cpu) for optimal performance
- [x] Updated to new Surya API (Predictor classes)
- [x] Updated documentation (README.md, FUTURE_ENGINES.md, TODO.md)
### Engine Architecture Refactoring ✅
- [x] Created Engine base class and EngineType/EngineCapability enums
- [x] Created ProcessingEngine abstract class for non-inference engines
- [x] Refactored InferenceEngine to inherit from Engine base
- [x] Implemented DoclingEngine as ProcessingEngine
- [x] Added model registration system for processing engines
- [x] Implemented model-based routing (jobs route by model_id, not job_type)
- [x] Added lazy model lookup for processing engines
- [x] Updated JobExecutor to support List[Engine] for processing engines
- [x] Implemented _get_processing_engine_for_model() helper method
- [x] Fixed event loop handling for async model registration
- [x] Updated all tests for model-based routing (17/17 passing)
- [x] Updated WorkerClient to pass processing engines as list
- [x] Fixed capability registration to iterate over list
### Job Routing Changes ✅
- [x] Changed from job_type-based routing to model_id-based routing
- [x] Jobs now specify model_id="docling" to route to DoclingEngine
- [x] Supports multiple engines of same type (e.g., model_id="tesseract" vs "easyocr")
- [x] All jobs now require model_id for routing
- [x] Processing engine models skip model loading (always available)
- [x] Inference engine models follow normal load/unload lifecycle
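The model_id-based lookup can be sketched as follows. The class and function names here are illustrative stand-ins (the real helper is `_get_processing_engine_for_model()`):

```python
class StubEngine:
    """Minimal stand-in for a ProcessingEngine with registered models."""
    def __init__(self, name, model_ids):
        self.name = name
        self.model_ids = set(model_ids)

def get_processing_engine_for_model(engines, model_id):
    """Return the first engine that registered this model_id, else None.

    Routing by model_id (not job_type) lets several engines of the same
    type coexist, e.g. "docling" vs "surya" for document jobs.
    """
    for engine in engines:
        if model_id in engine.model_ids:
            return engine
    return None

engines = [StubEngine("docling", ["docling"]), StubEngine("surya", ["surya"])]
```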
### Configuration ✅
- [x] Added engine.processing.docling section to config/default.yaml
- [x] Moved configuration under engine section for consistency
- [x] Implemented environment variable support for all engine settings
- [x] Added configurable max file size, timeout
- [x] Implemented temp directory configuration
- [x] Updated WorkerClient to read from engine.processing.docling path
- [x] Cleaned up default.yaml to remove unimplemented engines (vLLM, Transformers)
- [x] Created docs/FUTURE_ENGINES.md for planned engine configurations
- [x] Updated README.md to reference FUTURE_ENGINES.md
- [x] Removed unused config sections (performance, security, development)
- [x] Removed unused config values (metrics_interval, model loading strategy)
- [x] Simplified config to only show what's actually implemented
### Testing ✅
- [x] Created comprehensive test suite (tests/test_document_processor.py)
- [x] Unit tests for DocumentProcessor class
- [x] Integration tests for JobExecutor document handling
- [x] Tests for error conditions and edge cases
- [x] Tests for JobParameters document field validation
### Documentation ✅
- [x] Created DOCUMENT_PROCESSING_INTEGRATION.md guide
- [x] Documented server-side changes needed
- [x] Documented Python client integration requirements
- [x] Updated README.md with document processing section
- [x] Added document processing to feature list
- [x] Updated project structure diagram
## Production Readiness ✅ (2025-11-14)

### Long-Running Job Heartbeat Fix ✅
- [x] Fixed heartbeat blocking during long-running jobs
- [x] Added 5s timeout to engine.list_models() in heartbeat loop
- [x] Added 3s timeout to system metrics collection with asyncio.to_thread()
- [x] Added 15s timeout to heartbeat send operation
- [x] Implemented fallback to cached model registry on timeout
- [x] Heartbeats now guaranteed to send every 30s even during 10min+ jobs
- [x] Job progress included in heartbeat to reset server-side timeout
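The timeout-with-fallback pattern for the model listing step can be sketched like this (function name illustrative; the real loop applies similar timeouts to metrics collection and the heartbeat send itself):

```python
import asyncio

async def list_models_with_timeout(engine, cached_models, timeout=5.0):
    """Query the engine's model list, falling back to the cached registry
    when the engine is busy with a long-running job, so the heartbeat is
    never blocked waiting on it."""
    try:
        return await asyncio.wait_for(engine.list_models(), timeout=timeout)
    except asyncio.TimeoutError:
        return cached_models
```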
### Code Quality and Standards ✅
- [x] Fixed all ruff linting errors (14 errors resolved)
- [x] Updated pyproject.toml to modern ruff configuration format
- [x] Replaced all bare except clauses with specific Exception handling
- [x] Fixed unused variable warnings
- [x] All 115 tests passing (updated from 98 after multimodal/document features)
- [x] Set up Black for code formatting (v24.10.0) - available in dev extras
- [x] Set up Ruff for linting (v0.9.1) - configured in pyproject.toml
- [x] Set up MyPy for type checking (v1.14.0) - available in dev extras
- [x] Configure pre-commit hooks (2025-11-23)
- [x] Created .pre-commit-config.yaml with 10 hooks
- [x] Configured Black (line-length=100)
- [x] Configured isort (profile=black)
- [x] Configured Ruff (auto-fix enabled, excludes notebooks)
- [x] Configured mypy (excludes tests/examples/notebooks)
- [x] Configured bandit for security checks
- [x] Configured markdownlint
- [x] Configured YAML formatter (pretty-format-yaml)
- [x] Added python-safety-dependencies-check
- [x] Configured general file checks (trailing-whitespace, end-of-file-fixer, etc.)
- [x] Install pre-commit: `pip install pre-commit && pre-commit install`
- [x] Run hooks: `pre-commit run --all-files`
- [x] Fixed all Bandit security issues (2025-11-24)
- [x] Fixed 2 high-severity tarfile extraction vulnerabilities (B202)
- [x] Replaced unsafe os.system() calls with subprocess.run()
- [x] Replaced 13 empty except:pass blocks with proper error handling
- [x] Replaced 4 assert statements with RuntimeError exceptions
- [x] Added #nosec comments for legitimate subprocess usage
- [x] All security checks now pass with 0 issues
- [ ] Add type hints to all functions (in progress - core modules have hints)
- [ ] Achieve >90% test coverage (currently 37% overall - 4547 statements, 2880 uncovered)
- High coverage modules: api/models.py (98%), core/exceptions.py (100%), core/config.py (78%)
- Medium coverage: engines/base.py (72%), jobs/monitor.py (87%), models/registry.py (72%)
- Low coverage: cli.py (0%), client.py (0%), server_client.py (10%) - mostly integration code
- Focus areas: Increase coverage for job executor (47%), ollama engines, document processor (74%)
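The high-severity tarfile fix (B202) noted above amounts to validating every member path before extraction. An illustrative sketch, not the project's actual code (Python 3.12+ can use `extractall(..., filter="data")` instead):

```python
import os
import tarfile

def safe_extract(tar, dest):
    """Refuse archive members that would escape dest via ../ or
    absolute paths, then extract the validated archive."""
    dest_real = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(dest_real, member.name))
        if os.path.commonpath([dest_real, target]) != dest_real:
            raise RuntimeError(f"Blocked path traversal in archive: {member.name}")
    tar.extractall(dest_real)  # nosec - members validated above
```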
### Documentation and File Management ✅
- [x] Added docs/AUTHENTICATION.md to version control
- [x] Added docs/CODE_QUALITY.md for development tools and standards (2025-11-23)
- [x] Created CONTRIBUTING.md with contributor guidelines and workflow (2025-11-23)
- [x] Removed deprecated documentation files (SETUP.md, WORKER_CHANGE_REQUEST.md)
- [x] Removed unused example_usage.py file
- [x] Moved test_max_tokens.py to tests/ directory with proper pytest decorator
- [x] Updated README.md with code quality tools section and contributor guide
### TODO Comments Resolution ✅
- [x] Replaced all TODO comments with explanatory notes
- [x] Documented that log rotation is handled by systemd
- [x] Noted signature verification is future enhancement (checksums provide integrity)
- [x] Clarified credential validation handled by ServerClient
- [x] Updated daemon/log viewing commands to reference systemd
### Project Hygiene ✅
- [x] Enhanced .gitignore with comprehensive Python patterns
- [x] Cleaned all build artifacts (`__pycache__`, *.pyc, .DS_Store)
- [x] All changes staged and ready for commit
## Phase 1: Core Implementation 🚀 ✅

### Project Setup ✅
- [x] Set up project structure and core directories
- [x] Create requirements.txt and setup.py
- [x] Create default configuration file (config/default.yaml)
### Core Components ✅
- [x] Implement core configuration management (config.py)
- [x] Create Pydantic models for API communication
- [x] Implement abstract InferenceEngine base class
- [x] Build Ollama engine integration
### Server Communication ✅
- [x] Create server API client for communication
- [x] Implement worker registration flow
- [x] Build model discovery and capability reporting
- [x] Implement heartbeat mechanism
### Job Processing ✅
- [x] Create job executor for processing inference jobs
- [x] Build job queue management system
- [x] Implement system resource monitoring utilities
### Client Orchestration ✅
- [x] Create main client orchestrator
- [x] Add CLI interface with commands
- [x] Set up logging and health checks
## Phase 2: Testing & Robustness 🛡️

### Testing
- [x] Write unit tests for core components
- [x] Config tests (tests/test_config.py)
- [x] Model management tests (tests/test_models.py)
- [x] Job queue tests (tests/test_jobs.py)
- [x] System monitor tests (tests/test_system.py)
- [x] API model validation tests (tests/test_api_models.py)
- [ ] Test end-to-end worker registration and job execution
- [ ] Integration tests with mock server
### Error Handling ✅
- [x] Add error handling and retry logic
- [x] Implement graceful shutdown
- [x] Add resource limits enforcement
### GPU Support ✅
- [x] NVIDIA GPU detection via nvidia-ml-py
- [x] Apple Metal Performance Shaders (MPS) detection (requires PyTorch)
- [x] Unified memory tracking for Apple Silicon
- [x] GPU capability reporting in system info
- [x] PyTorch dependency added to GPU extras for MPS support
### Development Tools ✅
- [x] Development run script (tools/run.sh)
- [x] Test script with interactive menu (tools/test_worker.sh)
- [x] Makefile for common tasks
- [x] Watch mode for auto-restart
- [x] Interactive shell mode for debugging
## Phase 3: Additional Features (Future) 🔮

### Engine Support
- [ ] vLLM integration
- [x] HuggingFace Transformers integration (2025-12-15)
- [x] Created TransformersEngine class inheriting from InferenceEngine
- [x] Support for text generation, embeddings, and multimodal models
- [x] Dynamic VRAM management with LRU model eviction
- [x] bitsandbytes 4-bit/8-bit quantization support
- [x] HuggingFace Hub download with configurable allowlist/blocklist
- [x] Case-insensitive allowlist matching
- [x] On-demand model download for allowlisted Hub models
- [x] Streaming text generation via TextIteratorStreamer
- [x] Auto device selection (CUDA, MPS, CPU)
- [x] Job platform routing via `platform` field
- [x] 47 unit tests with 66% coverage
- [ ] Custom engine plugin system
### Advanced Features
- [x] Auto-Update System (2025-09-23)
- [x] Version checking against server requirements
- [x] Automatic download and installation of updates
- [x] Graceful shutdown before updates
- [x] Rollback capability on update failure
- [x] Maintenance window support
- [x] Platform-specific update scripts (Linux/macOS/Windows)
- [x] Model auto-pulling on demand (Server-Initiated Model Downloads)
- [x] Added PendingDownloadRequest, DownloadResponseType Pydantic models
- [x] Added model_downloads config section with enable toggle, allowlists, hardware thresholds
- [x] Created ModelDownloadManager for download orchestration with hardware validation
- [x] Integrated with heartbeat response `pending_download_requests` field
- [x] API endpoints: respond, progress, complete (GET/POST /api/v1/workers/download-requests/)
- [x] Hardware compatibility checking (estimated_size_gb, required_vram_gb, required_ram_gb)
- [x] Platform-specific allowlist/blocklist support with wildcard patterns
- [x] Automatic model existence check before downloading
- [x] Resume detection via GET /api/v1/workers/active-downloads on startup
- [ ] Multi-GPU support
- [ ] Job prioritization (basic priority support already implemented)
- [ ] Result caching
- [ ] Distributed inference support
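The platform-specific allowlist/blocklist matching with wildcard patterns described above could look like this. This is a sketch under assumptions: the function name is illustrative, and the "empty allowlist means allow everything, blocklist wins" policy is an assumption rather than confirmed behavior:

```python
import fnmatch

def model_download_allowed(model_id, allowlist, blocklist=()):
    """Case-insensitive allowlist/blocklist check with wildcard patterns.

    Assumed policy: blocklist matches always deny; an empty allowlist
    permits everything; otherwise the model must match an allow pattern.
    """
    name = model_id.lower()
    if any(fnmatch.fnmatchcase(name, pat.lower()) for pat in blocklist):
        return False
    if not allowlist:
        return True
    return any(fnmatch.fnmatchcase(name, pat.lower()) for pat in allowlist)
```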
### Code Cleanup & Refactoring ✅
- [x] Resolve duplicate class definitions (GPUInfo, CPUInfo classes are correctly separated - internal dataclasses with to_api_model converters)
- [x] Remove unused exception classes (removed ConfigurationError)
- [x] Fix placeholder code (removed "not yet implemented" message in logging.py)
- [x] Fix linting issues (resolved unused variables and bare except clauses)
- [x] Enhanced error messages for missing API key configuration
- [x] Add parameter tracking for max_tokens → num_predict conversion
- [x] Create test utilities for parameter verification
- [ ] Review and remove/document unused API models (kept as they define server API contract)
- [ ] Clean up unused system utility functions (many are CLI entry points or public APIs)
- [ ] Add tests for public APIs to prevent accidental removal
- [ ] Document which "unused" functions are actually public APIs or future hooks
### Documentation
- [ ] API documentation
- [x] Deployment guide
- [x] Created docs/setup/UBUNTU_SETUP.md - comprehensive Ubuntu/systemd installation guide
- [x] Created docs/setup/WINDOWS_SETUP.md - Windows installation guide with service options
- [ ] Configuration reference
- [ ] Troubleshooting guide
### Operations
- [ ] Prometheus metrics export
- [ ] OpenTelemetry tracing
- [ ] Performance profiling
- [ ] Auto-scaling support
## Current Status 📊

### Completed ✅
- All core infrastructure and main components
- Ollama engine integration with full API support
- Complete server communication layer with retry logic
- Comprehensive resource monitoring (CPU, GPU, Memory, Storage)
- Model lifecycle management with loading strategies
- Job processing pipeline with queue management
- Rich CLI interface with all essential commands
- Error handling with custom exceptions
- Unit test suite covering core components
- Claim-based job assignment system implementation
- Bearer token authentication with credential persistence
- Automatic credential reuse on worker restart
- Support for both "input" and "prompt" fields in job payload
- Proper handling of the required `tokens_used` field in job completion
- Race condition mitigation with a completion delay
- Detailed server response logging for debugging
- Updated to match new server API (removed capabilities field, uses normalized tables)
- Fixed heartbeat format to include status object and supported_models list
- Enhanced error messaging for missing API key configuration
- Parameter tracking and logging for max_tokens → num_predict conversion
- Code cleanup: removed unused code, fixed linting issues
- Test utilities for parameter verification
- Multimodal support: Job model and client support for llm_interaction_type and modalities
- Model platform tracking: Reports inference platform (ollama, vLLM, etc.) for each model in heartbeat (registration simplified to not include models)
- Resilient server communication: Worker survives temporary server outages with automatic retry and graceful recovery
### In Progress 🚧
- Integration testing with real MicroHub server
- Performance optimization for high-throughput scenarios
### Recently Completed ✅

- **HuggingFace Transformers Engine (2025-12-15)**
  - Created TransformersEngine class in `src/engines/transformers_engine.py` (~1000 lines)
  - Support for text generation models (CausalLM, Seq2SeqLM)
  - Support for embedding models (BERT, RoBERTa, sentence-transformers)
  - Support for multimodal models (LLaVA, Qwen-VL)
  - Dynamic VRAM management with LRU eviction for multiple loaded models
  - bitsandbytes 4-bit/8-bit quantization support
  - HuggingFace Hub download with configurable allowlist/blocklist
  - Case-insensitive allowlist matching for model names
  - On-demand model download for allowlisted Hub models
  - Streaming text generation via TextIteratorStreamer
  - Auto device selection (CUDA, MPS, CPU)
  - Job routing via `platform` field in job data
  - 47 unit tests with 66% coverage
  - Configuration in `config/default.yaml`
  - Integration in `src/core/client.py`
  - Documentation in `docs/engines/transformers.md`
  - Silenced noisy third-party loggers (urllib3, filelock, etc.)
- **Removed deprecated `engine.type` config (use the `engine.available` list)**
- **Fixed ReadTimeout Error for Long-Running Generations (2025-11-07)**
  - Fixed timeout handling in OllamaEngineV2 to prevent ReadTimeout errors
  - ollama_v2.py:10: Added httpx import for granular timeout configuration
  - ollama_v2.py:56-61: Configured httpx.Timeout with a 120s read timeout between chunks
  - ollama_v2.py:318-417: Refactored generate() to use internal streaming for batch jobs
  - Batch jobs now internally stream from Ollama while returning complete results
  - Set a reasonable read timeout (120s between chunks) that detects actual stalls
  - No need for artificial unlimited timeouts: as long as tokens flow, the job continues
  - Better error detection: quickly identifies when Ollama actually stalls
  - Maintained connection timeout (30s), write timeout (30s), pool timeout (10s)
  - Fixes "Generation failed: ReadTimeout" errors on large models (e.g., deepseek-r1:70b)
  - Jobs no longer fail prematurely during extended generation tasks
  - **Worker can now handle models that require unlimited generation time**
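The internal-streaming idea can be sketched as follows: the timeout bounds the gap *between* chunks rather than total generation time, so arbitrarily long generations succeed as long as tokens keep flowing. A simplified stand-in for the real httpx-based code:

```python
import asyncio

async def collect_stream(chunk_iter, read_timeout=120.0):
    """Drain an async token stream into one result, resetting the
    stall timer on every chunk (sketch of the batch-via-streaming
    approach; names are illustrative)."""
    parts = []
    iterator = chunk_iter.__aiter__()
    while True:
        try:
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=read_timeout)
        except StopAsyncIteration:
            break
        parts.append(chunk)
    return "".join(parts)
```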
- **Automatic Version Management System (2025-11-04)**
  - Implemented fully automated version bumping on every git commit
  - tools/bump_version.py: Python utility for version incrementing
  - tools/git-hooks/pre-commit: Git hook for automatic PATCH version bump
  - tools/install-git-hooks.sh: One-time setup script for git hooks
  - version.py:15: Updated to version 0.1.0 (initial development)
  - docs/VERSIONING.md: Comprehensive guide with automatic versioning instructions
  - README.md:568-599: Added automatic version management section
  - Every commit now auto-increments the PATCH version (0.1.0 → 0.1.1)
  - Manual bumps available for MINOR/MAJOR versions
  - Zero configuration after one-time hook installation
  - **Established 0.x.x for development, 1.0.0 for first production release**
- **Worker Version Tracking (2025-11-04)**
  - Added worker_version field to WorkerHeartbeat model
  - models.py:368: Added worker_version as an optional string field
  - client.py:684: Heartbeat now includes version from version.py
  - README.md:271: Updated heartbeat format example to include worker_version
  - Worker version now reported in every heartbeat for tracking and debugging
  - Server can track which worker versions are deployed in the fleet
  - **Enables version-specific debugging and compatibility checks**
- **Embed Job Execution Fix (2025-11-04)**
  - Fixed embed job execution to properly handle payloads with a "texts" field
  - client.py:795: Extract job_type early to use in payload parsing logic
  - client.py:805-807: Added special handling for job_type="embed" to preserve the entire payload
  - executor.py:89-91: Updated validation to accept dict input_data for embed/chat jobs
  - executor.py:103-105: Store job_type in result metadata for proper output formatting
  - executor.py:240-253: Extract the "texts" field from the payload dict for embedding generation
  - server_client.py:1012-1026: Format output based on job_type (embeddings vs text)
  - Fixes error: "Job has no input_data/prompt" when executing embed jobs
  - Fixes output format: embeddings now properly structured as {"embeddings": [...], "finish_reason": "stop"}
  - **Worker now fully supports embedding job execution with proper payload parsing and result formatting**
- **Embedding Model Support Fix (2025-10-26)**
  - Fixed the `load_model()` method in both Ollama engines to handle embedding models
  - ollama.py:249: Added embedding model detection and proper test method
  - ollama_v2.py:261: Added embedding model detection and proper test method
  - Embedding models (containing "embed" in the name) are now tested with `generate_embeddings()`
  - Non-embedding models continue to use the `generate()` test
  - Fixes error: "does not support generate" when loading embedding models
  - **Worker now fully supports embedding models like qwen3-embedding:8b, nomic-embed-text, etc.**
- **Test Suite Fixes (2025-10-14)**
  - Fixed all 6 failing tests after the resilient server communication implementation
  - test_worker_registration: Updated for new WorkerRegistration schema without supported_models
  - test_required_fields: Changed to test ModelCapability instead of the Job model
  - test_download_update_success: Fixed async context manager mock for streaming
  - test_periodic_update_check: Fixed async task cancellation pattern
  - test_default_config_loading: Fixed config path and expected values
  - test_nested_config_access: Added handling for string vs int config values
  - **All 97 tests now passing successfully**
- **Resilient Server Communication (2025-10-14)**
  - Worker now handles server unavailability gracefully without crashing
  - Registration retry: Added automatic retry with exponential backoff for network errors
  - Continuous operation: Heartbeat and job polling loops continue even when the server is down
  - Server availability tracking: New `_server_unavailable_since` and `_last_server_success` tracking
  - Helper methods: Added `_mark_server_success()`, `_mark_server_failure()`, `_check_server_unavailability()`
  - Graceful shutdown: Worker exits gracefully after a configurable maximum unavailability (default: 5 minutes)
  - Server availability monitor: New background task monitors server status periodically
  - Smart recovery: Automatically resumes when the server comes back online, with logging
  - Configuration: Added `server.retry.*` and `server.unavailable.*` config options
  - **Workers now survive temporary server outages and network issues**
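The exponential-backoff registration retry can be sketched like this (a minimal illustration, not the project's actual retry code; attempt counts and delays are assumptions):

```python
import asyncio

async def call_with_backoff(op, attempts=5, base_delay=1.0, max_delay=60.0):
    """Retry an async operation on network errors with exponential
    backoff: delays grow 1s, 2s, 4s, ... capped at max_delay; the
    final failure is re-raised."""
    for attempt in range(attempts):
        try:
            return await op()
        except (ConnectionError, OSError):
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(min(max_delay, base_delay * (2 ** attempt)))
```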
- **Model Platform Tracking (2025-10-13)**
  - Worker now reports which inference platform hosts each model
  - Registration simplified: Removed the `supported_models` field (server gets updates via heartbeat only)
  - Heartbeat includes platform info in the `current_models` field with model+platform objects
  - Added `ModelWithPlatform` Pydantic model for structured platform data
  - Platform auto-detection based on engine type configuration
  - Support for 7 standard platforms plus a custom platform type
  - Helper method `_models_to_platform_format()` for converting model lists
  - Backward-compatible Union type supports both old and new formats
  - Enables server-side platform-specific job routing and performance tracking
  - **Model updates are now fully dynamic via the heartbeat system**
- **Runtime Token Refresh (2025-10-11)**
  - Fixed authentication failures during worker runtime (heartbeat, job polling, etc.)
  - Enhanced `_make_request()` with automatic token refresh on 401 errors
  - Added transparent request retry after a successful token refresh
  - Prevents infinite retry loops with the `retry_auth` flag
  - Enhanced error logging in heartbeat and job polling loops for authentication failures
  - Workers now seamlessly handle token expiration during long-running sessions
  - **No service interruption when tokens expire: automatic recovery**
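The refresh-and-retry shape can be sketched as follows. The `send`/`refresh` callables are illustrative stand-ins, not the real `_make_request()` signature; the `retry_auth` flag is what prevents an infinite refresh loop if the new token is also rejected:

```python
async def request_with_token_refresh(send, refresh, retry_auth=True):
    """On a 401 response, refresh credentials once and retry the request."""
    status, body = await send()
    if status == 401 and retry_auth:
        await refresh()
        return await request_with_token_refresh(send, refresh, retry_auth=False)
    return status, body
```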
- **Configuration Loading Fix (2025-10-11)**
  - Fixed "Config file not found" warnings on production workers
  - WorkerConfig now respects the `MICRODC_CONFIG` environment variable
  - Added `config/worker.yaml` to the search paths (used by ubuntu_setup.sh)
  - Proper configuration loading for systemd service installations
  - **Production workers now correctly load configuration from `/srv/microdcworker/config/worker.yaml`**
- **Automatic Token Refresh (2025-10-09)**
  - Fixed token expiration handling for long-running workers
  - Added proper calculation of token expiration time from the `expires_in` field
  - Implemented automatic credential refresh using the saved `secret_key`
  - Added `_refresh_credentials()` method in ServerClient to handle token renewal
  - Workers now save `expires_at`, `refresh_token`, and `secret_key` with credentials
  - Enhanced error messages for expired bootstrap tokens with clear renewal instructions
  - **Workers can now run indefinitely without "Invalid or expired token" errors**
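The expiration calculation reduces to converting the server's relative `expires_in` into an absolute timestamp. A sketch (function names and the refresh margin are illustrative assumptions):

```python
import time

def compute_expires_at(expires_in, now=None):
    """Convert the server's relative `expires_in` (seconds) into an
    absolute timestamp that can be persisted with the credentials."""
    return (time.time() if now is None else now) + expires_in

def token_needs_refresh(expires_at, now=None, margin=60.0):
    """Refresh slightly before expiry so in-flight requests never
    carry a stale token."""
    return (time.time() if now is None else now) >= expires_at - margin
```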
- **Multimodal Support (2025-10-04)**
  - Added llm_interaction_type field to the Job model (generation vs chat)
  - Added input_modalities and output_modalities fields to the Job model
  - Added job_type field to the Job model
  - Enhanced job executor logging with multimodal information
  - Updated client job claim parsing to extract multimodal fields
  - Enhanced job claim logging with modality and interaction type details
  - Implemented routing in the job executor based on interaction type
  - Added _prepare_chat_messages() helper for chat format conversion
  - Implemented embedding job support with JSON result formatting
  - Handle chat messages in the payload for proper message extraction
  - **Full compatibility with server WORKER_CHANGE_REQUEST.md specifications**
- **Auto-Update System (2025-09-23)**
  - Created WorkerAutoUpdater class with comprehensive update management
  - Implemented version checking against server endpoints (/api/v1/version/)
  - Added automatic download with progress tracking
  - Created backup and rollback functionality
  - Integrated graceful shutdown for running jobs
  - Added maintenance window support for controlled updates
  - Created platform-specific update scripts for Linux/macOS/Windows
  - Implemented update configuration in default.yaml
  - Added comprehensive test coverage for update functionality
  - **Integrated the auto-updater with the main worker client lifecycle**
- **Enhanced Heartbeat System (2025-09-22)**
  - Added SystemMetricsCollector class for comprehensive metrics collection
  - Enhanced WorkerHeartbeat model with a SystemMetrics field
  - Implemented collection of load average, CPU count, memory metrics, disk metrics, GPU metrics, network I/O, and uptime
  - Added temperature monitoring (CPU and GPU when available)
  - Dynamic model reporting in the current_models field
  - Backward compatible with the legacy heartbeat format
  - **Tested on macOS with Apple Silicon (MPS GPU support)**
- **Dynamic Model Reporting (2025-09-22)**
  - Heartbeat now fetches a fresh model list from the engine on each cycle
  - Added refresh_models() method for silent registry updates
  - Implemented model_refresh_loop() for periodic registry synchronization
  - Model registry cleared and repopulated on refresh to prevent stale entries
  - Fallback to the cached registry if the engine query fails
  - Configurable refresh interval (default: 5 minutes)
  - Successfully tested with 31 Ollama models
### Next Steps 📋
- Monitor and optimize job completion success rate
- Add metrics collection for job processing performance
- Implement adaptive retry strategies based on error types
- Add vLLM inference engine support
- Add Prometheus metrics export for monitoring
- Implement distributed inference support for large models
## Notes

### Core Implementation ✅
- All Phase 1 core implementation completed successfully
- Ollama is fully integrated as the primary inference engine
- Async/await patterns used throughout for optimal concurrency
- Comprehensive error handling with retry logic implemented
- Structured logging with configurable outputs and detailed job logging
- Pluggable architecture ready for additional engines
### API Compatibility Updates ✅
- Server Schema Changes: Worker updated to match new server API without capabilities field
- Normalized Tables: Hardware specs and supported models sent for server's normalized database storage
- Heartbeat Format: Correctly wraps status object and includes supported_models list
- Parameter Mapping: Properly converts generic parameters to engine-specific ones (e.g., max_tokens → num_predict)
### Job Processing ✅
- Claim-based system: Atomic job claiming prevents race conditions
- Flexible payload handling: Supports both "input" and "prompt" fields from server
- Token usage: Always includes the required `tokens_used` field in completions
- Race condition fix: 0.5s delay before result submission prevents "Assignment not active" errors
- Error reporting: All job failures properly reported to server with detailed error messages
### Authentication & Security ✅
- Credential persistence: Saves credentials after registration for reuse
- Automatic token refresh: Transparently refreshes expired tokens using saved secret_key
- Token expiration handling: Properly calculates and tracks token expiration times
- Clean error handling: Worker exits cleanly on authentication failures
- Bearer token auth: Proper implementation of bearer token authentication
- Bootstrap token: Initial registration uses one-time bootstrap token
- Enhanced error messages: Clear guidance when bootstrap tokens expire
### Testing & Quality
- Unit tests provide good coverage for individual components
- Ready for integration testing with actual MicroDC server
- Comprehensive logging for debugging and monitoring