ML Model Versioning with DVC

Data Version Control for ML experiments and model tracking

Advanced · 18 minutes


Overview

This guide shows how to use DVC (Data Version Control) to track ML experiments and version models, with a practical implementation aimed at production ML systems.

Why This Matters in MLOps

Setup

bash

# Install required tools
pip install dvc mlflow pandas numpy scikit-learn

# Or with Docker
docker pull python:3.11-slim

Core Implementation

python
import os
import json
import logging
from datetime import datetime, timezone
from pathlib import Path

logger = logging.getLogger(__name__)


class MLModelVersioningwithDVC:
    """
    ML Model Versioning with DVC implementation.

    Handles: experiment tracking
    Tool: dvc
    """

    def __init__(self, config: dict | None = None):
        self.config = config or self._default_config()
        self._setup()

    def _default_config(self) -> dict:
        return {
            "tool": "dvc",
            "environment": os.getenv("ENVIRONMENT", "development"),
            "log_level": "INFO",
        }

    def _setup(self):
        """Configure logging. No DVC connection is opened here."""
        logging.basicConfig(level=self.config.get("log_level", "INFO"))
        logger.info("Initialized ML Model Versioning with DVC with config: %s", self.config)

    def run(self, **kwargs) -> dict:
        """Execute experiment tracking and report status and elapsed time."""
        # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
        start = datetime.now(timezone.utc)
        try:
            result = self._execute(**kwargs)
            elapsed = (datetime.now(timezone.utc) - start).total_seconds()
            logger.info("ML Model Versioning with DVC completed in %.2fs", elapsed)
            return {
                "status": "success",
                "result": result,
                "elapsed_seconds": elapsed,
            }
        except Exception as e:
            # logger.exception keeps the traceback in the log
            logger.exception("ML Model Versioning with DVC failed: %s", e)
            return {"status": "failed", "error": str(e)}

    def _execute(self, **kwargs) -> dict:
        """Core experiment tracking logic. Override to customize."""
        return {"completed": True, "tool": "dvc"}

# Configuration
config = {
    "tool": "dvc",
    "tracking_uri": os.getenv("MLFLOW_TRACKING_URI", "http://localhost:5000"),
    "artifact_root": "./artifacts",
}

# Initialize
processor = MLModelVersioningwithDVC(config)
result = processor.run()
print(json.dumps(result, indent=2))
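The `_execute` hook above is the intended extension point. As a minimal sketch of one way to use it, the subclass below fingerprints a model file and returns the hash together with the run's parameters. The stand-in base class `TrackerBase`, the subclass `HashingTracker`, the `demo` helper, and the `model_path` and `params` arguments are all hypothetical and exist only so the snippet runs on its own. MD5 is chosen because DVC records MD5 checksums in its `.dvc` files, but this is an illustration, not DVC's implementation.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


class TrackerBase:
    """Trimmed stand-in for the tutorial class (hypothetical name)."""

    def run(self, **kwargs) -> dict:
        try:
            return {"status": "success", "result": self._execute(**kwargs)}
        except Exception as e:
            return {"status": "failed", "error": str(e)}

    def _execute(self, **kwargs) -> dict:
        return {"completed": True}


class HashingTracker(TrackerBase):
    """Hypothetical subclass: fingerprints a model artifact for each run."""

    def _execute(self, model_path: str, params: Optional[dict] = None) -> dict:
        digest = hashlib.md5()
        with open(model_path, "rb") as f:
            # Read in 1 MiB chunks so large model files are not loaded at once
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return {"model_md5": digest.hexdigest(), "params": params or {}}


def demo() -> dict:
    """Run the tracker against a throwaway file standing in for a model."""
    with tempfile.TemporaryDirectory() as tmp:
        model = Path(tmp) / "model.bin"
        model.write_bytes(b"weights-v1")
        return HashingTracker().run(model_path=str(model), params={"lr": 0.01})
```

Because `run` catches exceptions, a missing model file surfaces as a `"failed"` status in the returned dictionary rather than as a raised error.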

DVC Integration

python

# Specific dvc integration for experiment tracking
import json
from pathlib import Path


def setup_dvc():
    """Write project metadata for experiment tracking into the .dvc directory."""
    print("Setting up dvc for experiment tracking...")

    # Example configuration
    config = {
        "project": "my-ml-project",
        "tool": "dvc",
        "specialty": "experiment tracking",
        "version": "1.0.0",
    }

    # Save configuration. This JSON file is project metadata only: DVC does
    # not read it. DVC's own settings live in .dvc/config, which `dvc init`
    # creates, so run `dvc init` before calling this function.
    Path(".dvc").mkdir(exist_ok=True)
    with open(".dvc/config.json", "w") as f:
        json.dump(config, f, indent=2)

    print("dvc configured for experiment tracking")
    return config


config = setup_dvc()
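The function above only writes a metadata file; putting a model under DVC control goes through the `dvc` command line. The sketch below builds a typical sequence (`dvc init`, `dvc add`, then `git add` for the generated `.dvc` pointer file) and executes it only on request. The helper names `dvc_commands` and `track_model` and the path `models/model.pkl` are hypothetical, and the sketch assumes `git` and `dvc` are on `PATH` and that the working directory is a Git repository.

```python
import shutil
import subprocess
from pathlib import Path


def dvc_commands(model_path: str, initialized: bool) -> list[list[str]]:
    """Build the CLI calls that put one model file under DVC control."""
    path = Path(model_path)
    commands = []
    if not initialized:
        commands.append(["dvc", "init"])  # creates .dvc/ inside a Git repo
    # `dvc add` writes a small <file>.dvc pointer next to the file
    commands.append(["dvc", "add", path.as_posix()])
    # Git tracks the pointer file and the .gitignore that `dvc add` updates
    pointer = path.with_name(path.name + ".dvc")
    gitignore = path.parent / ".gitignore"
    commands.append(["git", "add", pointer.as_posix(), gitignore.as_posix()])
    return commands


def track_model(model_path: str, dry_run: bool = True) -> list[list[str]]:
    """Print the commands, and execute them only when dry_run is False."""
    commands = dvc_commands(model_path, initialized=Path(".dvc").is_dir())
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            if shutil.which(cmd[0]) is None:
                raise RuntimeError(f"{cmd[0]} not found on PATH")
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Calling `track_model("models/model.pkl")` prints the plan without touching the repository; pass `dry_run=False` to execute it. Uploading to shared storage is a separate step (`dvc push`) once a remote has been configured with `dvc remote add`.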

Monitoring and Alerting

python
import logging
import time
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)


@dataclass
class MetricSnapshot:
    """One recorded metric value; fields follow how record() builds it."""
    timestamp: float
    metric_name: str
    value: float
    labels: dict = field(default_factory=dict)


class MLOpsMonitor:
    """Monitor experiment tracking metrics."""

    def __init__(self):
        self.metrics: list[MetricSnapshot] = []
        self.thresholds = {
            "error_rate": 0.05,
            "latency_p99_ms": 1000,
            "data_drift_score": 0.3,
        }

    def record(self, metric: str, value: float, labels: dict | None = None):
        snapshot = MetricSnapshot(
            timestamp=time.time(),
            metric_name=metric,
            value=value,
            labels=labels or {},
        )
        self.metrics.append(snapshot)
        self._check_threshold(metric, value)

    def _check_threshold(self, metric: str, value: float):
        threshold = self.thresholds.get(metric)
        # Compare against None so that a threshold of 0 still triggers alerts
        if threshold is not None and value > threshold:
            logger.warning("ALERT: %s=%.3f exceeds threshold %s", metric, value, threshold)


monitor = MLOpsMonitor()

CI/CD Integration

yaml

# .github/workflows/ml-pipeline.yml
name: ML Pipeline

on:
  push:
    # '**' matches every file under the directory
    paths: ['src/**', 'data/**']

jobs:
  train-and-evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run experiment tracking
        run: python -m src.ml_model_versioning_with_dvc
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}

      - name: Check model quality
        run: python -m src.validate_model
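The workflow's last step calls `src.validate_model`, which this guide does not show. One possible shape for such a quality gate, as a sketch only: load the metrics the training step wrote, compare them with minimum thresholds, and return a non-zero exit code so the job fails. The file name `metrics.json`, the metric names, and the threshold values are assumptions for illustration.

```python
import json
import sys
from pathlib import Path

# Hypothetical minimum acceptable values; tune per project
MIN_THRESHOLDS = {"accuracy": 0.90, "f1": 0.85}


def find_failures(metrics: dict, thresholds: dict) -> list[str]:
    """Return one message per metric that is missing or below its minimum."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing from metrics")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} < required {minimum:.3f}")
    return failures


def main(metrics_file: str = "metrics.json") -> int:
    """Return a process exit code: 0 when every threshold is met, 1 otherwise."""
    path = Path(metrics_file)
    if not path.exists():
        print(f"{path} not found", file=sys.stderr)
        return 1
    failures = find_failures(json.loads(path.read_text()), MIN_THRESHOLDS)
    for message in failures:
        print(f"FAILED {message}", file=sys.stderr)
    return 1 if failures else 0


# In src/validate_model.py, finish with:
# if __name__ == "__main__":
#     sys.exit(main())
```

Returning the exit code from `main` instead of calling `sys.exit` inside it keeps the check easy to unit test, while the commented guard gives the CI step the failing exit status it needs.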

Best Practices

Resources
