
MicroDC.ai Worker Client - Development TODO

CI/CD & Infrastructure ✅ (2026-02-13)

Dockerfile ✅

  • [x] Created multi-stage Dockerfile based on nvidia/cuda:12.2.2-runtime-ubuntu22.04
  • [x] Builder stage installs Python 3.11 and pip dependencies in a venv
  • [x] Runtime stage copies only the venv and source code for smaller image
  • [x] Configuration via environment variables (MICRODC_API_KEY, MICRODC_SERVER_URL, etc.)
  • [x] Entrypoint set to python -m src.core.cli
  • [x] Added .dockerignore to exclude .git, venv, tests, docs, caches, models

GitLab CI Pipeline ✅

  • [x] Created .gitlab-ci.yml with 4 stages: test, lint, build, deploy
  • [x] unit-tests job: runs pytest on Python 3.11
  • [x] lint job: runs black --check, isort --check, ruff check
  • [x] docker-build job: builds and pushes image to GitLab Container Registry (main only)
  • [x] pages job: builds MkDocs site and deploys to GitLab Pages (main only)
  • [x] All jobs tagged with aisrv05-docker
  • [x] Pip cache configured for faster CI runs

MkDocs Documentation Site ✅

  • [x] Created mkdocs.yml with Material theme (dark/light toggle)
  • [x] Created requirements-docs.txt (mkdocs + mkdocs-material)
  • [x] Created docs/index.md landing page adapted from README
  • [x] Nav structure covers all existing docs (setup, architecture, engines, API, etc.)
  • [x] GitLab Pages deployment configured in CI pipeline

New Features ✅ (2025-11-21)

Multimodal File Support ✅

  • [x] Added attached_files field to Job model (models.py:284-287)
  • [x] Implemented file download method in ServerClient (server_client.py:978-1010)
  • [x] Updated client.py to pass attached_files to Job objects (client.py:1012)
  • [x] Added file download and base64 encoding in JobExecutor (executor.py:333-377)
  • [x] Updated OllamaEngineV2.generate() to accept images parameter (ollama_v2.py:329, 365-366, 383, 404)
  • [x] Added logging for attached files in job details (client.py:1050-1053)
  • [x] Support for vision models (qwen2.5vl:7b, llava, bakllava, etc.)
  • [x] Support for document processing models (docling)
  • [x] Automatic base64 encoding for image files
  • [x] Document file handling for processing engines (executor.py:404-432)
  • [x] Job routing based on job_type first, not model_id (executor.py:384-396)
  • [x] Case-insensitive job_type handling (executor.py:164, 380)
  • [x] Relaxed input_data validation for document jobs with attached files (executor.py:173-178)
  • [x] Graceful handling of unsupported file types
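The download/encode/forward flow above can be sketched roughly like this (a minimal illustration; `build_vision_payload` and the file-tuple shape are hypothetical, not the worker's actual interface):

```python
import base64

# Hypothetical helper illustrating the encoding and filtering step; in the
# real worker this happens inside JobExecutor before the engine call.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".gif", ".bmp"}

def build_vision_payload(prompt: str, files: list[tuple[str, bytes]]) -> dict:
    """Base64-encode image attachments; skip unsupported file types."""
    images = [
        base64.b64encode(data).decode("ascii")
        for name, data in files
        if any(name.lower().endswith(ext) for ext in IMAGE_EXTENSIONS)
    ]
    payload = {"prompt": prompt}
    if images:  # only include the field when there is something to send
        payload["images"] = images
    return payload
```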

Bug Fixes ✅ (2025-11-21)

Configuration Type Casting Fixes ✅

  • [x] Fixed multiple "'<' not supported between instances of 'int' and 'str'" errors
  • [x] Root cause: Environment variable substitution in YAML returns strings, not ints
  • [x] Fixed job queue max_size comparison (client.py:287)
  • [x] Fixed docling max_file_size_mb comparison (docling_engine.py:51)
  • [x] Fixed docling timeout_seconds (docling_engine.py:52)
  • [x] Enhanced priority validator to handle None values (models.py:289)
  • [x] Added explicit int() cast for all numeric priority values (models.py:299)
  • [x] Changed priority default from string "normal" to integer 5 (client.py:971)
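The fix pattern can be sketched as follows (illustrative only; `get_int_setting` is a made-up helper, the real casts live at the call sites listed above):

```python
# Environment-variable substitution in YAML yields strings ("${MICRODC_MAX_SIZE}"
# becomes "100", not 100), so numeric settings must be cast before comparison.
def get_int_setting(config: dict, key: str, default: int) -> int:
    """Return an int from a value that may be an int, a numeric string, or None."""
    value = config.get(key)
    if value is None:
        return default
    return int(value)  # "100" -> 100; 100 -> 100; raises ValueError on junk
```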

Ollama API Parameter Handling Fix ✅

  • [x] Fixed TypeError when passing None values for optional parameters
  • [x] Changed from passing images=None to conditionally including images parameter
  • [x] Build parameter dictionaries dynamically (ollama_v2.py:378-415)
  • [x] Only include options/images when they have actual values
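In sketch form (function and parameter names are illustrative stand-ins for the logic in ollama_v2.py):

```python
# Build the call kwargs dynamically so that None/empty optionals never reach
# the client library, which raised TypeError on an explicit images=None.
def build_generate_kwargs(model, prompt, options=None, images=None):
    kwargs = {"model": model, "prompt": prompt}
    if options:  # include only when non-empty
        kwargs["options"] = options
    if images:
        kwargs["images"] = images
    return kwargs
```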

Embed Job Payload Compatibility Fix ✅

  • [x] Fixed embed job execution to accept both texts and input field formats
  • [x] Server sends input field in payload, worker expected texts
  • [x] Updated executor.py:367 to try both fields with fallback
  • [x] Maintains compatibility with both payload formats
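The fallback can be sketched as (hypothetical helper name; the actual change is the one at executor.py:367):

```python
def extract_embed_texts(payload: dict) -> list:
    """Accept the server's "input" field as well as the legacy "texts" field."""
    texts = payload.get("texts")
    if texts is None:
        texts = payload.get("input")
    if texts is None:
        raise ValueError("embed payload has neither 'texts' nor 'input'")
    if isinstance(texts, str):  # tolerate a single string as one text
        texts = [texts]
    return texts
```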

Test Suite CI/CD Compatibility Fix ✅

  • [x] Fixed test failures in automation/CI environments (test_config.py)
  • [x] Tests were failing due to MICRODC_API_KEY environment variable overriding config
  • [x] Added @patch.dict decorator to isolate test environment (test_config.py:30, 58)
  • [x] Clear MICRODC_* environment variables at start of affected tests
  • [x] Both failing tests now pass consistently in automation environments
  • [x] All 115 tests now pass reliably in CI/CD pipelines
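The isolation pattern looks roughly like this (test name and body are illustrative):

```python
import os
from unittest.mock import patch

# clear=True empties os.environ for the duration of the test, so CI-provided
# MICRODC_* variables cannot override values from the config file under test.
@patch.dict(os.environ, {}, clear=True)
def test_config_defaults_without_env_overrides():
    assert "MICRODC_API_KEY" not in os.environ
    # ...load config here and assert on file-provided defaults...
```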

Heartbeat Model Reporting Enhancement ✅

  • [x] Added processing engine models (e.g., docling) to heartbeat current_models field
  • [x] Worker now reports both inference engine models (Ollama) and processing engine models
  • [x] Added timeout protection (1.0s) for processing engine model listing
  • [x] Enhanced debug logging to distinguish between inference and processing engines
  • [x] Server now receives complete model inventory including docling
  • [x] Added get_platform_name() method to Engine base class
  • [x] DoclingEngine reports platform as "docling" (not "ollama")
  • [x] Each engine reports its own platform name for accurate server tracking
  • [x] Updated run.sh and run_prod.sh to display processing engines in configuration output

Document Processing Integration ✅ (2025-11-21)

Docling Integration ✅

  • [x] Added Docling dependency to requirements.txt
  • [x] Created DocumentProcessor class in src/processors/
  • [x] Implemented support for PDF, DOCX, PPTX, XLSX, HTML, images, audio
  • [x] Added URL, base64, and path input handling
  • [x] Implemented multiple output formats (markdown, HTML, JSON, doctags)
  • [x] Added OCR and table extraction capabilities
  • [x] Implemented file size limits and timeout handling
  • [x] Added comprehensive error handling and validation

Surya OCR Integration ✅ (2025-12-09)

  • [x] Created SuryaProcessor class in src/processors/surya_processor.py
  • [x] Created SuryaEngine class in src/engines/surya_engine.py
  • [x] Implemented support for PDF and image files (PNG, JPEG, TIFF, BMP, WEBP)
  • [x] Added OCR text recognition with 90+ language support
  • [x] Implemented layout detection feature
  • [x] Implemented table recognition feature
  • [x] Added URL, base64, and path input handling
  • [x] Implemented multiple output formats (markdown, JSON, text)
  • [x] Added file size limits and timeout handling
  • [x] Added comprehensive error handling and validation
  • [x] Added configuration to default.yaml
  • [x] Registered engine in client.py
  • [x] Updated job executor to route by model_id (not just job_type)
  • [x] Auto-detect device (cuda/mps/cpu) for optimal performance
  • [x] Updated to new Surya API (Predictor classes)
  • [x] Updated documentation (README.md, FUTURE_ENGINES.md, TODO.md)
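Device auto-detection typically follows a CUDA → MPS → CPU preference order; a hedged sketch (function name assumed, not Surya's or the worker's actual code):

```python
def pick_device() -> str:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    try:
        import torch  # optional dependency; absence implies CPU
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```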

Engine Architecture Refactoring ✅

  • [x] Created Engine base class and EngineType/EngineCapability enums
  • [x] Created ProcessingEngine abstract class for non-inference engines
  • [x] Refactored InferenceEngine to inherit from Engine base
  • [x] Implemented DoclingEngine as ProcessingEngine
  • [x] Added model registration system for processing engines
  • [x] Implemented model-based routing (jobs route by model_id, not job_type)
  • [x] Added lazy model lookup for processing engines
  • [x] Updated JobExecutor to support List[Engine] for processing engines
  • [x] Implemented _get_processing_engine_for_model() helper method
  • [x] Fixed event loop handling for async model registration
  • [x] Updated all tests for model-based routing (17/17 passing)
  • [x] Updated WorkerClient to pass processing engines as list
  • [x] Fixed capability registration to iterate over list

Job Routing Changes ✅

  • [x] Changed from job_type-based routing to model_id-based routing
  • [x] Jobs now specify model_id="docling" to route to DoclingEngine
  • [x] Supports multiple engines of same type (e.g., model_id="tesseract" vs "easyocr")
  • [x] All jobs now require model_id for routing
  • [x] Processing engine models skip model loading (always available)
  • [x] Inference engine models follow normal load/unload lifecycle
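The model_id-based dispatch can be sketched as follows (the registry shape is an assumption, not the executor's actual data structure):

```python
# Jobs carry a model_id; each processing engine registers the model IDs it
# serves, and the executor dispatches on that ID instead of job_type.
class ModelRouter:
    def __init__(self):
        self._engines_by_model = {}

    def register(self, engine, model_ids):
        for model_id in model_ids:
            self._engines_by_model[model_id] = engine

    def engine_for(self, model_id):
        engine = self._engines_by_model.get(model_id)
        if engine is None:
            raise KeyError(f"no engine registered for model_id={model_id!r}")
        return engine
```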

Configuration ✅

  • [x] Added engine.processing.docling section to config/default.yaml
  • [x] Moved configuration under engine section for consistency
  • [x] Implemented environment variable support for all engine settings
  • [x] Added configurable max file size, timeout
  • [x] Implemented temp directory configuration
  • [x] Updated WorkerClient to read from engine.processing.docling path
  • [x] Cleaned up default.yaml to remove unimplemented engines (vLLM, Transformers)
  • [x] Created docs/FUTURE_ENGINES.md for planned engine configurations
  • [x] Updated README.md to reference FUTURE_ENGINES.md
  • [x] Removed unused config sections (performance, security, development)
  • [x] Removed unused config values (metrics_interval, model loading strategy)
  • [x] Simplified config to only show what's actually implemented

Testing ✅

  • [x] Created comprehensive test suite (tests/test_document_processor.py)
  • [x] Unit tests for DocumentProcessor class
  • [x] Integration tests for JobExecutor document handling
  • [x] Tests for error conditions and edge cases
  • [x] Tests for JobParameters document field validation

Documentation ✅

  • [x] Created DOCUMENT_PROCESSING_INTEGRATION.md guide
  • [x] Documented server-side changes needed
  • [x] Documented Python client integration requirements
  • [x] Updated README.md with document processing section
  • [x] Added document processing to feature list
  • [x] Updated project structure diagram

Production Readiness ✅ (2025-11-14)

Long-Running Job Heartbeat Fix ✅

  • [x] Fixed heartbeat blocking during long-running jobs
  • [x] Added 5s timeout to engine.list_models() in heartbeat loop
  • [x] Added 3s timeout to system metrics collection with asyncio.to_thread()
  • [x] Added 15s timeout to heartbeat send operation
  • [x] Implemented fallback to cached model registry on timeout
  • [x] Heartbeats now guaranteed to send every 30s even during 10min+ jobs
  • [x] Job progress included in heartbeat to reset server-side timeout
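The timeout-with-fallback pattern can be sketched as (engine interface and names are assumptions):

```python
import asyncio

async def models_for_heartbeat(engine, cached_models, timeout=5.0):
    """Cap the engine query; fall back to the cached registry on a stall."""
    try:
        return await asyncio.wait_for(engine.list_models(), timeout=timeout)
    except asyncio.TimeoutError:
        return cached_models  # heartbeat still goes out on time
```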

Code Quality and Standards ✅

  • [x] Fixed all ruff linting errors (14 errors resolved)
  • [x] Updated pyproject.toml to modern ruff configuration format
  • [x] Replaced all bare except clauses with specific Exception handling
  • [x] Fixed unused variable warnings
  • [x] All 115 tests passing (updated from 98 after multimodal/document features)
  • [x] Set up Black for code formatting (v24.10.0) - available in dev extras
  • [x] Set up Ruff for linting (v0.9.1) - configured in pyproject.toml
  • [x] Set up MyPy for type checking (v1.14.0) - available in dev extras
  • [x] Configure pre-commit hooks (2025-11-23)
  • [x] Created .pre-commit-config.yaml with 10 hooks
  • [x] Configured Black (line-length=100)
  • [x] Configured isort (profile=black)
  • [x] Configured Ruff (auto-fix enabled, excludes notebooks)
  • [x] Configured mypy (excludes tests/examples/notebooks)
  • [x] Configured bandit for security checks
  • [x] Configured markdownlint
  • [x] Configured YAML formatter (pretty-format-yaml)
  • [x] Added python-safety-dependencies-check
  • [x] Configured general file checks (trailing-whitespace, end-of-file-fixer, etc.)
  • [x] Install pre-commit: pip install pre-commit && pre-commit install
  • [x] Run hooks: pre-commit run --all-files
  • [x] Fixed all Bandit security issues (2025-11-24)
  • [x] Fixed 2 high-severity tarfile extraction vulnerabilities (B202)
  • [x] Replaced unsafe os.system() calls with subprocess.run()
  • [x] Replaced 13 empty except/pass blocks with proper error handling
  • [x] Replaced 4 assert statements with RuntimeError exceptions
  • [x] Added #nosec comments for legitimate subprocess usage
  • [x] All security checks now pass with 0 issues
  • [ ] Add type hints to all functions (in progress - core modules have hints)
  • [ ] Achieve >90% test coverage (currently 37% overall - 4547 statements, 2880 uncovered)
  • High coverage modules: api/models.py (98%), core/exceptions.py (100%), core/config.py (78%)
  • Medium coverage: engines/base.py (72%), jobs/monitor.py (87%), models/registry.py (72%)
  • Low coverage: cli.py (0%), client.py (0%), server_client.py (10%) - mostly integration code
  • Focus areas: Increase coverage for job executor (47%), ollama engines, document processor (74%)
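The B202 fix pattern (rejecting members that would resolve outside the destination) can be sketched as below; newer Pythons also offer `tar.extractall(dest, filter="data")` for the same guard:

```python
import os
import tarfile

def safe_extract(tar: tarfile.TarFile, dest: str) -> None:
    """Refuse archives whose members would escape the destination directory."""
    dest_root = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(dest_root, member.name))
        if target != dest_root and not target.startswith(dest_root + os.sep):
            raise RuntimeError(f"blocked path traversal: {member.name}")
    tar.extractall(dest)  # nosec - members validated above
```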

Documentation and File Management ✅

  • [x] Added docs/AUTHENTICATION.md to version control
  • [x] Added docs/CODE_QUALITY.md for development tools and standards (2025-11-23)
  • [x] Created CONTRIBUTING.md with contributor guidelines and workflow (2025-11-23)
  • [x] Removed deprecated documentation files (SETUP.md, WORKER_CHANGE_REQUEST.md)
  • [x] Removed unused example_usage.py file
  • [x] Moved test_max_tokens.py to tests/ directory with proper pytest decorator
  • [x] Updated README.md with code quality tools section and contributor guide

TODO Comments Resolution ✅

  • [x] Replaced all TODO comments with explanatory notes
  • [x] Documented that log rotation is handled by systemd
  • [x] Noted signature verification is future enhancement (checksums provide integrity)
  • [x] Clarified credential validation handled by ServerClient
  • [x] Updated daemon/log viewing commands to reference systemd

Project Hygiene ✅

  • [x] Enhanced .gitignore with comprehensive Python patterns
  • [x] Cleaned all build artifacts (__pycache__, *.pyc, .DS_Store)
  • [x] All changes staged and ready for commit

Phase 1: Core Implementation 🚀 ✅

Project Setup ✅

  • [x] Set up project structure and core directories
  • [x] Create requirements.txt and setup.py
  • [x] Create default configuration file (config/default.yaml)

Core Components ✅

  • [x] Implement core configuration management (config.py)
  • [x] Create Pydantic models for API communication
  • [x] Implement abstract InferenceEngine base class
  • [x] Build Ollama engine integration

Server Communication ✅

  • [x] Create server API client for communication
  • [x] Implement worker registration flow
  • [x] Build model discovery and capability reporting
  • [x] Implement heartbeat mechanism

Job Processing ✅

  • [x] Create job executor for processing inference jobs
  • [x] Build job queue management system
  • [x] Implement system resource monitoring utilities

Client Orchestration ✅

  • [x] Create main client orchestrator
  • [x] Add CLI interface with commands
  • [x] Set up logging and health checks

Phase 2: Testing & Robustness 🛡️

Testing

  • [x] Write unit tests for core components
  • [x] Config tests (tests/test_config.py)
  • [x] Model management tests (tests/test_models.py)
  • [x] Job queue tests (tests/test_jobs.py)
  • [x] System monitor tests (tests/test_system.py)
  • [x] API model validation tests (tests/test_api_models.py)
  • [ ] Test end-to-end worker registration and job execution
  • [ ] Integration tests with mock server

Error Handling ✅

  • [x] Add error handling and retry logic
  • [x] Implement graceful shutdown
  • [x] Add resource limits enforcement

GPU Support ✅

  • [x] NVIDIA GPU detection via nvidia-ml-py
  • [x] Apple Metal Performance Shaders (MPS) detection (requires PyTorch)
  • [x] Unified memory tracking for Apple Silicon
  • [x] GPU capability reporting in system info
  • [x] PyTorch dependency added to GPU extras for MPS support

Development Tools ✅

  • [x] Development run script (tools/run.sh)
  • [x] Test script with interactive menu (tools/test_worker.sh)
  • [x] Makefile for common tasks
  • [x] Watch mode for auto-restart
  • [x] Interactive shell mode for debugging

Phase 3: Additional Features (Future) 🔮

Engine Support

  • [ ] vLLM integration
  • [x] HuggingFace Transformers integration (2025-12-15)
  • [x] Created TransformersEngine class inheriting from InferenceEngine
  • [x] Support for text generation, embeddings, and multimodal models
  • [x] Dynamic VRAM management with LRU model eviction
  • [x] bitsandbytes 4-bit/8-bit quantization support
  • [x] HuggingFace Hub download with configurable allowlist/blocklist
  • [x] Case-insensitive allowlist matching
  • [x] On-demand model download for allowlisted Hub models
  • [x] Streaming text generation via TextIteratorStreamer
  • [x] Auto device selection (CUDA, MPS, CPU)
  • [x] Job platform routing via platform field
  • [x] 47 unit tests with 66% coverage
  • [ ] Custom engine plugin system

Advanced Features

  • [x] Auto-Update System (2025-09-23)
  • [x] Version checking against server requirements
  • [x] Automatic download and installation of updates
  • [x] Graceful shutdown before updates
  • [x] Rollback capability on update failure
  • [x] Maintenance window support
  • [x] Platform-specific update scripts (Linux/macOS/Windows)
  • [x] Model auto-pulling on demand (Server-Initiated Model Downloads)
  • [x] Added PendingDownloadRequest, DownloadResponseType Pydantic models
  • [x] Added model_downloads config section with enable toggle, allowlists, hardware thresholds
  • [x] Created ModelDownloadManager for download orchestration with hardware validation
  • [x] Integrated with heartbeat response pending_download_requests field
  • [x] API endpoints: respond, progress, complete (GET/POST /api/v1/workers/download-requests/)
  • [x] Hardware compatibility checking (estimated_size_gb, required_vram_gb, required_ram_gb)
  • [x] Platform-specific allowlist/blocklist support with wildcard patterns
  • [x] Automatic model existence check before downloading
  • [x] Resume detection via GET /api/v1/workers/active-downloads on startup
  • [ ] Multi-GPU support
  • [ ] Job prioritization (basic priority support already implemented)
  • [ ] Result caching
  • [ ] Distributed inference support

Code Cleanup & Refactoring ✅

  • [x] Resolve duplicate class definitions (GPUInfo, CPUInfo classes are correctly separated - internal dataclasses with to_api_model converters)
  • [x] Remove unused exception classes (removed ConfigurationError)
  • [x] Fix placeholder code (removed "not yet implemented" message in logging.py)
  • [x] Fix linting issues (resolved unused variables and bare except clauses)
  • [x] Enhanced error messages for missing API key configuration
  • [x] Add parameter tracking for max_tokens → num_predict conversion
  • [x] Create test utilities for parameter verification
  • [ ] Review and remove/document unused API models (kept as they define server API contract)
  • [ ] Clean up unused system utility functions (many are CLI entry points or public APIs)
  • [ ] Add tests for public APIs to prevent accidental removal
  • [ ] Document which "unused" functions are actually public APIs or future hooks

Documentation

  • [ ] API documentation
  • [x] Deployment guide
  • [x] Created docs/setup/UBUNTU_SETUP.md - comprehensive Ubuntu/systemd installation guide
  • [x] Created docs/setup/WINDOWS_SETUP.md - Windows installation guide with service options
  • [ ] Configuration reference
  • [ ] Troubleshooting guide

Operations

  • [ ] Prometheus metrics export
  • [ ] OpenTelemetry tracing
  • [ ] Performance profiling
  • [ ] Auto-scaling support

Current Status 📊

Completed ✅

  • All core infrastructure and main components
  • Ollama engine integration with full API support
  • Complete server communication layer with retry logic
  • Comprehensive resource monitoring (CPU, GPU, Memory, Storage)
  • Model lifecycle management with loading strategies
  • Job processing pipeline with queue management
  • Rich CLI interface with all essential commands
  • Error handling with custom exceptions
  • Unit test suite covering core components
  • Claim-based job assignment system implementation
  • Bearer token authentication with credential persistence
  • Automatic credential reuse on worker restart
  • Support for both "input" and "prompt" fields in job payload
  • Proper handling of required tokens_used field in job completion
  • Race condition mitigation with completion delay
  • Detailed server response logging for debugging
  • Updated to match new server API (removed capabilities field, uses normalized tables)
  • Fixed heartbeat format to include status object and supported_models list
  • Enhanced error messaging for missing API key configuration
  • Parameter tracking and logging for max_tokens → num_predict conversion
  • Code cleanup: removed unused code, fixed linting issues
  • Test utilities for parameter verification
  • Multimodal support: Job model and client support for llm_interaction_type and modalities
  • Model platform tracking: Reports inference platform (ollama, vLLM, etc.) for each model in heartbeat (registration simplified to not include models)
  • Resilient server communication: Worker survives temporary server outages with automatic retry and graceful recovery

In Progress 🚧

  • Integration testing with real MicroHub server
  • Performance optimization for high-throughput scenarios

Recently Completed ✅

  • HuggingFace Transformers Engine (2025-12-15)
  • Created TransformersEngine class in src/engines/transformers_engine.py (~1000 lines)
  • Support for text generation models (CausalLM, Seq2SeqLM)
  • Support for embedding models (BERT, RoBERTa, sentence-transformers)
  • Support for multimodal models (LLaVA, Qwen-VL)
  • Dynamic VRAM management with LRU eviction for multiple loaded models
  • bitsandbytes 4-bit/8-bit quantization support
  • HuggingFace Hub download with configurable allowlist/blocklist
  • Case-insensitive allowlist matching for model names
  • On-demand model download for allowlisted Hub models
  • Streaming text generation via TextIteratorStreamer
  • Auto device selection (CUDA, MPS, CPU)
  • Job routing via platform field in job data
  • 47 unit tests with 66% coverage
  • Configuration in config/default.yaml
  • Integration in src/core/client.py
  • Documentation in docs/engines/transformers.md
  • Silenced noisy third-party loggers (urllib3, filelock, etc.)
  • Removed deprecated engine.type config (use engine.available list)

  • Fixed ReadTimeout Error for Long-Running Generations (2025-11-07)

  • Fixed timeout handling in OllamaEngineV2 to prevent ReadTimeout errors
  • ollama_v2.py:10: Added httpx import for granular timeout configuration
  • ollama_v2.py:56-61: Configured httpx.Timeout with 120s read timeout between chunks
  • ollama_v2.py:318-417: Refactored generate() to use internal streaming for batch jobs
  • Batch jobs now internally stream from Ollama while returning complete results
  • Set reasonable read timeout (120s between chunks) that detects actual stalls
  • No need for artificially unlimited timeouts; as long as tokens flow, the job continues
  • Better error detection - quickly identifies when Ollama actually stalls
  • Maintained connection timeout (30s), write timeout (30s), pool timeout (10s)
  • Fixes "Generation failed: ReadTimeout:" errors on large models (e.g., deepseek-r1:70b)
  • Jobs no longer fail prematurely during extended generation tasks
  • Worker can now handle models that need arbitrarily long generation times
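The internal-streaming idea can be sketched generically (the async iterator stands in for Ollama's streaming response; the real timeout configuration lives in ollama_v2.py):

```python
import asyncio

async def collect_stream(chunks, read_timeout=120.0):
    """Accumulate a streamed generation into one result; the per-chunk
    timeout detects stalls without capping total generation time."""
    parts = []
    iterator = chunks.__aiter__()
    while True:
        try:
            part = await asyncio.wait_for(iterator.__anext__(), timeout=read_timeout)
        except StopAsyncIteration:
            break  # stream finished normally
        parts.append(part)
    return "".join(parts)
```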

  • Automatic Version Management System (2025-11-04)

  • Implemented fully automated version bumping on every git commit
  • tools/bump_version.py: Python utility for version incrementing
  • tools/git-hooks/pre-commit: Git hook for automatic PATCH version bump
  • tools/install-git-hooks.sh: One-time setup script for git hooks
  • version.py:15: Updated to version 0.1.0 (initial development)
  • docs/VERSIONING.md: Comprehensive guide with automatic versioning instructions
  • README.md:568-599: Added automatic version management section
  • Every commit now auto-increments PATCH version (0.1.0 → 0.1.1)
  • Manual bumps available for MINOR/MAJOR versions
  • Zero-configuration after one-time hook installation
  • Established 0.x.x for development, 1.0.0 for first production release
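The PATCH bump itself is a one-liner in spirit (sketch only; tools/bump_version.py presumably also rewrites version.py in place):

```python
def bump_patch(version: str) -> str:
    """0.1.0 -> 0.1.1; MINOR/MAJOR bumps stay manual."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"
```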

  • Worker Version Tracking (2025-11-04)

  • Added worker_version field to WorkerHeartbeat model
  • models.py:368: Added worker_version as optional string field
  • client.py:684: Heartbeat now includes version from version.py
  • README.md:271: Updated heartbeat format example to include worker_version
  • Worker version now reported in every heartbeat for tracking and debugging
  • Server can track which worker versions are deployed in the fleet
  • Enables version-specific debugging and compatibility checks

  • Embed Job Execution Fix (2025-11-04)

  • Fixed embed job execution to properly handle payload with "texts" field
  • client.py:795: Extract job_type early to use in payload parsing logic
  • client.py:805-807: Added special handling for job_type="embed" to preserve entire payload
  • executor.py:89-91: Updated validation to accept dict input_data for embed/chat jobs
  • executor.py:103-105: Store job_type in result metadata for proper output formatting
  • executor.py:240-253: Extract "texts" field from payload dict for embedding generation
  • server_client.py:1012-1026: Format output based on job_type (embeddings vs text)
  • Fixes error: "Job has no input_data/prompt" when executing embed jobs
  • Fixes output format: embeddings now properly structured as {"embeddings": [...], "finish_reason": "stop"}
  • Worker now fully supports embedding job execution with proper payload parsing and result formatting

  • Embedding Model Support Fix (2025-10-26)

  • Fixed load_model() method in both Ollama engines to handle embedding models
  • ollama.py:249: Added embedding model detection and proper test method
  • ollama_v2.py:261: Added embedding model detection and proper test method
  • Embedding models (containing "embed" in name) now tested with generate_embeddings()
  • Non-embedding models continue to use generate() test
  • Fixes error: "does not support generate" when loading embedding models
  • Worker now fully supports embedding models like qwen3-embedding:8b, nomic-embed-text, etc.

  • Test Suite Fixes (2025-10-14)

  • Fixed all 6 failing tests after resilient server communication implementation
  • test_worker_registration: Updated for new WorkerRegistration schema without supported_models
  • test_required_fields: Changed to test ModelCapability instead of Job model
  • test_download_update_success: Fixed async context manager mock for streaming
  • test_periodic_update_check: Fixed async task cancellation pattern
  • test_default_config_loading: Fixed config path and expected values
  • test_nested_config_access: Added handling for string vs int config values
  • All 97 tests now passing successfully

  • Resilient Server Communication (2025-10-14)

  • Worker now handles server unavailability gracefully without crashing
  • Registration retry: Added automatic retry with exponential backoff for network errors
  • Continuous operation: Heartbeat and job polling loops continue even when server is down
  • Server availability tracking: New _server_unavailable_since and _last_server_success tracking
  • Helper methods: Added _mark_server_success(), _mark_server_failure(), _check_server_unavailability()
  • Graceful shutdown: Worker exits gracefully after configurable max unavailability (default: 5 minutes)
  • Server availability monitor: New background task monitors server status periodically
  • Smart recovery: Automatically resumes when server comes back online with logging
  • Configuration: Added server.retry.* and server.unavailable.* config options
  • Workers now survive temporary server outages and network issues
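The registration retry can be sketched as exponential backoff with jitter (delay values and jitter factor are assumptions; the actual values come from the server.retry.* config):

```python
import asyncio
import random

async def retry_with_backoff(op, max_attempts=5, base_delay=1.0, max_delay=60.0):
    """Retry an async operation on network errors, doubling the delay each time."""
    for attempt in range(max_attempts):
        try:
            return await op()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller decide
            delay = min(base_delay * 2 ** attempt, max_delay)
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
```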

  • Model Platform Tracking (2025-10-13)

  • Worker now reports which inference platform hosts each model
  • Registration simplified: Removed supported_models field (server gets updates via heartbeat only)
  • Heartbeat includes platform info in current_models field with model+platform objects
  • Added ModelWithPlatform Pydantic model for structured platform data
  • Platform auto-detection based on engine type configuration
  • Support for 7 standard platforms plus custom platform type
  • Helper method _models_to_platform_format() for converting model lists
  • Backward compatible Union type supports both old and new formats
  • Enables server-side platform-specific job routing and performance tracking
  • Model updates are now fully dynamic via heartbeat system

  • Runtime Token Refresh (2025-10-11)

  • Fixed authentication failures during worker runtime (heartbeat, job polling, etc.)
  • Enhanced _make_request() with automatic token refresh on 401 errors
  • Added transparent request retry after successful token refresh
  • Prevents infinite retry loops with retry_auth flag
  • Enhanced error logging in heartbeat and job polling loops for authentication failures
  • Workers now seamlessly handle token expiration during long-running sessions
  • No service interruption when tokens expire - automatic recovery
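The refresh-and-retry flow can be sketched as follows (transport and method names are stand-ins for the worker's ServerClient, not its actual code):

```python
class AuthRetryClient:
    """Retry a request once after refreshing the token on HTTP 401."""

    def __init__(self, send, refresh_token):
        self._send = send                    # callable(path) -> (status, body)
        self._refresh_token = refresh_token  # callable() -> None

    def make_request(self, path, retry_auth=True):
        status, body = self._send(path)
        if status == 401 and retry_auth:
            self._refresh_token()
            # retry_auth=False guarantees at most one retry (no infinite loop)
            return self.make_request(path, retry_auth=False)
        return status, body
```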

  • Configuration Loading Fix (2025-10-11)

  • Fixed "Config file not found" warnings on production workers
  • WorkerConfig now respects MICRODC_CONFIG environment variable
  • Added config/worker.yaml to search paths (used by ubuntu_setup.sh)
  • Proper configuration loading for systemd service installations
  • Production workers now correctly load configuration from /srv/microdcworker/config/worker.yaml

  • Automatic Token Refresh (2025-10-09)

  • Fixed token expiration handling for long-running workers
  • Added proper calculation of token expiration time from expires_in field
  • Implemented automatic credential refresh using saved secret_key
  • Added _refresh_credentials() method in ServerClient to handle token renewal
  • Workers now save expires_at, refresh_token, and secret_key with credentials
  • Enhanced error messages for expired bootstrap tokens with clear renewal instructions
  • Workers can now run indefinitely without "Invalid or expired token" errors

  • Multimodal Support (2025-10-04)

  • Added llm_interaction_type field to Job model (generation vs chat)
  • Added input_modalities and output_modalities fields to Job model
  • Added job_type field to Job model
  • Enhanced job executor logging with multimodal information
  • Updated client job claim parsing to extract multimodal fields
  • Enhanced job claim logging with modality and interaction type details
  • Implemented routing in job executor based on interaction type
  • Added _prepare_chat_messages() helper for chat format conversion
  • Implemented embedding job support with JSON result formatting
  • Handle chat messages in payload for proper message extraction
  • Full compatibility with server WORKER_CHANGE_REQUEST.md specifications

  • Auto-Update System (2025-09-23)

  • Created WorkerAutoUpdater class with comprehensive update management
  • Implemented version checking with server endpoints (/api/v1/version/)
  • Added automatic download with progress tracking
  • Created backup and rollback functionality
  • Integrated graceful shutdown for running jobs
  • Added maintenance window support for controlled updates
  • Created platform-specific update scripts for Linux/macOS/Windows
  • Implemented update configuration in default.yaml
  • Added comprehensive test coverage for update functionality
  • Integrated auto-updater with main worker client lifecycle

  • Enhanced Heartbeat System (2025-09-22)

  • Added SystemMetricsCollector class for comprehensive metrics collection
  • Enhanced WorkerHeartbeat model with SystemMetrics field
  • Implemented collection of: load average, CPU count, memory metrics, disk metrics, GPU metrics, network I/O, uptime
  • Added temperature monitoring (CPU and GPU when available)
  • Dynamic model reporting in current_models field
  • Backward compatible with legacy heartbeat format
  • Tested on macOS with Apple Silicon (MPS GPU support)

  • Dynamic Model Reporting (2025-09-22)

  • Heartbeat now fetches fresh model list from engine on each cycle
  • Added refresh_models() method for silent registry updates
  • Implemented model_refresh_loop() for periodic registry synchronization
  • Model registry cleared and repopulated on refresh to prevent stale entries
  • Fallback to cached registry if engine query fails
  • Configurable refresh interval (default: 5 minutes)
  • Successfully tested with 31 Ollama models

Next Steps 📋

  1. Monitor and optimize job completion success rate
  2. Add metrics collection for job processing performance
  3. Implement adaptive retry strategies based on error types
  4. Add vLLM inference engine support
  5. Add Prometheus metrics export for monitoring
  6. Implement distributed inference support for large models

Notes

Core Implementation ✅

  • All Phase 1 core implementation completed successfully
  • Ollama is fully integrated as the primary inference engine
  • Async/await patterns used throughout for optimal concurrency
  • Comprehensive error handling with retry logic implemented
  • Structured logging with configurable outputs and detailed job logging
  • Pluggable architecture ready for additional engines

API Compatibility Updates ✅

  • Server Schema Changes: Worker updated to match new server API without capabilities field
  • Normalized Tables: Hardware specs and supported models sent for server's normalized database storage
  • Heartbeat Format: Correctly wraps status object and includes supported_models list
  • Parameter Mapping: Properly converts generic parameters to engine-specific ones (e.g., max_tokens → num_predict)

Job Processing ✅

  • Claim-based system: Atomic job claiming prevents race conditions
  • Flexible payload handling: Supports both "input" and "prompt" fields from server
  • Token usage: Always includes required tokens_used field in completions
  • Race condition fix: 0.5s delay before result submission prevents "Assignment not active" errors
  • Error reporting: All job failures properly reported to server with detailed error messages

Authentication & Security ✅

  • Credential persistence: Saves credentials after registration for reuse
  • Automatic token refresh: Transparently refreshes expired tokens using saved secret_key
  • Token expiration handling: Properly calculates and tracks token expiration times
  • Clean error handling: Worker exits cleanly on authentication failures
  • Bearer token auth: Proper implementation of bearer token authentication
  • Bootstrap token: Initial registration uses one-time bootstrap token
  • Enhanced error messages: Clear guidance when bootstrap tokens expire

Testing & Quality

  • Unit tests provide good coverage for individual components
  • Ready for integration testing with actual MicroDC server
  • Comprehensive logging for debugging and monitoring