Deploying AI Computer Vision in Production: From Training to the Edge

Building scalable vision AI systems for real-world applications

Advanced, about 22 minutes

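A practical guide to building and deploying production-scale computer vision systems, covering object detection, image classification, video analytics, and edge deployment strategies.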

Tags: computer vision, AI, object detection, edge AI, deep learning, production ML

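The Production Challenge of Computer Vision

Research computer vision chases state-of-the-art accuracy on benchmark datasets. Production computer vision is about building systems that run reliably on real-world data, scale to millions of images, and meet strict latency requirements.

Key production challenges:

Distribution shift: model performance degrades when production images differ from the training data

Latency constraints: many applications require inference in under 100 ms

Cost at scale: processing millions of images per day demands high efficiency

Edge deployment: many applications need inference to run on the device itself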
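
A latency requirement such as "under 100 ms" is usually easier to enforce when it is checked against a high percentile rather than an average. The following is a minimal standard-library sketch of that idea. The names `measure_latency_ms`, `p95`, `within_budget`, and `fake_inference` are hypothetical, and `fake_inference` merely stands in for a real model call.

```python
import statistics
import time
from typing import Callable

LATENCY_BUDGET_MS = 100.0  # assumed budget, taken from the "under 100 ms" requirement


def measure_latency_ms(infer: Callable[[], object], runs: int = 50, warmup: int = 5) -> list[float]:
    """Time repeated calls to `infer` and return per-call latency in milliseconds."""
    for _ in range(warmup):  # warm-up calls are excluded from the measurement
        infer()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def p95(samples: list[float]) -> float:
    """95th percentile of the samples (needs at least two samples)."""
    return statistics.quantiles(samples, n=100, method="inclusive")[94]


def within_budget(samples: list[float], budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """True when the 95th-percentile latency fits the budget."""
    return p95(samples) <= budget_ms


def fake_inference() -> None:
    """Hypothetical stand-in for a model forward pass."""
    time.sleep(0.002)


if __name__ == "__main__":
    latencies = measure_latency_ms(fake_inference)
    print(f"p95 = {p95(latencies):.1f} ms, within budget: {within_budget(latencies)}")
```

When timing a GPU model this way, the callable should include device synchronization and the preprocessing steps, otherwise the measured number may understate the latency a caller actually sees.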

Modern Computer Vision Architecture Choices

Foundation Models vs. Custom Training

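python
# Option 1: fine-tune a foundation model (recommended starting point)
from transformers import AutoImageProcessor, AutoModelForImageClassification
import torch

your_num_classes = 4  # placeholder: the number of classes in your dataset

# Start from a pretrained Vision Transformer
image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=your_num_classes,
    ignore_mismatched_sizes=True  # replace the 1000-class ImageNet head with a new one
)

# Option 2: zero-shot classification with CLIP
from transformers import CLIPProcessor, CLIPModel

clip_model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def zero_shot_classify(image, class_names: list[str]) -> dict:
    """Score one image against free-text class names; returns {name: probability}."""
    inputs = processor(
        text=class_names,
        images=image,
        return_tensors="pt",
        padding=True
    )
    with torch.no_grad():  # inference only, no gradients needed
        outputs = clip_model(**inputs)
    probs = outputs.logits_per_image.softmax(dim=1)
    return dict(zip(class_names, probs[0].tolist()))

# No training needed for new classes!
# product_image is assumed to be a PIL image loaded elsewhere.
result = zero_shot_classify(
    product_image,
    ["electronics", "clothing", "food", "furniture"]
)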
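
The last step of the zero-shot classifier, turning image-text similarity logits into per-class probabilities, can be illustrated without loading a model. The sketch below uses made-up logits and a hypothetical `pick_label` helper with an assumed confidence threshold of 0.5. It is only meant to show why a threshold is worth considering, not how CLIP itself is implemented.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(class_names: list[str], logits: list[float], min_prob: float = 0.5) -> str:
    """Return the most probable class, or "unknown" when no class is confident enough."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best] if probs[best] >= min_prob else "unknown"


# Made-up similarity logits for one image scored against four text prompts.
names = ["electronics", "clothing", "food", "furniture"]
print(pick_label(names, [24.1, 18.3, 17.9, 19.0]))  # one prompt clearly dominates
print(pick_label(names, [20.0, 20.1, 19.9, 20.0]))  # nearly uniform scores
```

Because softmax normalizes over the candidate list only, the probabilities always sum to 1, even for an image that matches none of the classes. A threshold like the one above is a simple guard, though it is unlikely to be a complete out-of-distribution detector.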

Object Detection in Production

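python
# YOLOv8: a strong balance of speed and accuracy for production use
from ultralytics import YOLO

# Training
model = YOLO('yolov8n.pt')  # start from pretrained weights
results = model.train(
    data='dataset.yaml',
    epochs=100,
    imgsz=640,
    batch=16,
    device=0  # first GPU
)

# Batch inference
model = YOLO('best.pt')  # path to the best checkpoint written during training

# Process a batch of images efficiently
results = model(
    ['image1.jpg', 'image2.jpg'],  # ... extend with more image paths
    batch=32,
    conf=0.5,  # drop detections below 50% confidence
    iou=0.45   # IoU threshold used by non-maximum suppression
)

# Export for production
model.export(format='onnx', simplify=True)  # ONNX for portability; simplify cleans up the graph
model.export(format='tflite')               # TensorFlow Lite for mobile
model.export(format='engine')               # TensorRT for NVIDIA GPUs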
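
The `conf` and `iou` arguments in the inference call control confidence filtering and non-maximum suppression (NMS). The sketch below shows the classical greedy NMS algorithm in plain Python to make those two thresholds concrete. It is an illustration under simplifying assumptions (a single class, boxes as `(x1, y1, x2, y2)` tuples) and is not the Ultralytics implementation; the function names are hypothetical.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(
    detections: list[tuple[Box, float]], conf: float = 0.5, iou_thr: float = 0.45
) -> list[tuple[Box, float]]:
    """Greedy NMS: filter by confidence, then keep the best box of each overlapping group."""
    candidates = sorted(
        (d for d in detections if d[1] >= conf), key=lambda d: d[1], reverse=True
    )
    kept: list[tuple[Box, float]] = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with a higher-scoring kept box
        if all(iou(box, kept_box) <= iou_thr for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),    # best box for the first object
    ((1, 1, 11, 11), 0.8),    # near-duplicate of the first box
    ((20, 20, 30, 30), 0.7),  # a separate object
    ((0, 0, 5, 5), 0.3),      # below the confidence threshold
]
print(nms(detections))
```

Raising `conf` trades recall for precision, while lowering `iou_thr` suppresses more aggressively and can merge objects that genuinely sit close together.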

Building a Production Vision Pipeline

High-Throughput Image Processing

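python
import asyncio
import torch

class ProductionVisionPipeline:
    # load_optimized_model, preprocess and postprocess are application-specific
    # helpers that are not shown here.
    def __init__(self, model_path: str, batch_size: int = 32):
        self.model = load_optimized_model(model_path)
        self.batch_size = batch_size
        self.queue = asyncio.Queue(maxsize=1000)  # bounded queue applies backpressure
    
    def process_batch(self, images: list) -> list:
        """GPU-efficient batch processing (blocking, so it is run off the event loop)."""
        preprocessed = [self.preprocess(img) for img in images]
        batch_tensor = torch.stack(preprocessed).cuda()
        
        # Mixed precision: often close to a 2x speedup on Tensor Core GPUs
        with torch.no_grad(), torch.autocast(device_type="cuda"):
            outputs = self.model(batch_tensor)
        
        return self.postprocess(outputs)
    
    async def worker(self):
        """Continuously pull batches from the queue and process them."""
        while True:
            batch = []
            # Collect up to batch_size items; stop early if the queue stays empty for 100 ms
            try:
                for _ in range(self.batch_size):
                    item = await asyncio.wait_for(
                        self.queue.get(), timeout=0.1
                    )
                    batch.append(item)
            except asyncio.TimeoutError:
                pass
            
            if batch:
                try:
                    # Run the blocking GPU call in a thread so the event loop stays responsive
                    results = await asyncio.to_thread(
                        self.process_batch, [b['image'] for b in batch]
                    )
                    for item, result in zip(batch, results):
                        item['future'].set_result(result)
                except Exception as exc:
                    # Fail the waiting callers instead of leaving their futures pending
                    for item in batch:
                        if not item['future'].done():
                            item['future'].set_exception(exc)

# Reaches 500+ images per second on an A100 GPU (depends on the model and input size)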
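
The queue-and-future batching pattern in the worker can be tried without a GPU. The sketch below is a toy version under stated assumptions: `collect_batch`, `batch_worker`, and `demo` are hypothetical names, doubling an integer stands in for model inference, and a single deadline per batch is used instead of a per-item timeout so that the worst-case wait is bounded by `max_wait_s`.

```python
import asyncio


async def collect_batch(queue: asyncio.Queue, batch_size: int, max_wait_s: float) -> list:
    """Block for the first item, then gather more until the batch is full or the deadline passes."""
    batch = [await queue.get()]
    loop = asyncio.get_running_loop()
    deadline = loop.time() + max_wait_s
    while len(batch) < batch_size:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break
        try:
            batch.append(await asyncio.wait_for(queue.get(), timeout=remaining))
        except asyncio.TimeoutError:
            break
    return batch


async def batch_worker(queue: asyncio.Queue, batch_size: int, max_wait_s: float, batch_sizes: list) -> None:
    """Resolve each request's future; `value * 2` is a stand-in for model inference."""
    while True:
        batch = await collect_batch(queue, batch_size, max_wait_s)
        batch_sizes.append(len(batch))
        for value, future in batch:
            future.set_result(value * 2)


async def demo(n_requests: int = 10, batch_size: int = 4) -> tuple[list, list]:
    """Enqueue requests, wait for their results, and report the batch sizes used."""
    queue: asyncio.Queue = asyncio.Queue()
    batch_sizes: list = []
    worker = asyncio.create_task(batch_worker(queue, batch_size, 0.05, batch_sizes))
    loop = asyncio.get_running_loop()
    futures = []
    for i in range(n_requests):
        future = loop.create_future()
        queue.put_nowait((i, future))
        futures.append(future)
    results = await asyncio.gather(*futures)
    worker.cancel()
    return list(results), batch_sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The deadline caps how long the first request in a batch can wait, which matters when traffic is light: a larger `max_wait_s` tends to give fuller batches and better GPU utilization at the cost of added latency.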

Edge Deployment

Optimizing for Mobile and Edge Devices

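python
# Step 1: quantize for edge deployment
import torch
from torch.quantization import quantize_dynamic

# PTQ (post-training quantization): no retraining required.
# Dynamic quantization covers Linear (and recurrent) layers only; Conv2d layers
# need static quantization with a calibration pass.
quantized = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
# INT8 weights are about 4x smaller than FP32. A 2-3x speedup with under 1%
# accuracy loss is a typical target, but verify it on your own model and hardware.

# Step 2: export to TFLite (Android/iOS)
import tensorflow as tf

# saved_model_path: directory containing a TensorFlow SavedModel export of the model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # FP16 for GPU acceleration
tflite_model = converter.convert()

# Step 3: ONNX for cross-platform deployment
dummy_input = torch.randn(1, 3, 224, 224)  # example input; match your model's input shape
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=13,
    input_names=['input'],  # dynamic_axes keys must match these names
    dynamic_axes={'input': {0: 'batch_size'}}
)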
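
The size reduction from INT8 quantization follows directly from the arithmetic: each 32-bit float weight is replaced by an 8-bit integer plus a shared scale and zero point. The sketch below works through affine quantization in plain Python. The function names and the example weights are made up for illustration, and real frameworks add per-channel scales and calibration on top of this.

```python
def quantize_params(w_min: float, w_max: float, qmin: int = -128, qmax: int = 127) -> tuple[float, int]:
    """Scale and zero point that map the range [w_min, w_max] onto [qmin, qmax]."""
    w_min, w_max = min(w_min, 0.0), max(w_max, 0.0)  # the range must contain zero
    scale = (w_max - w_min) / (qmax - qmin) or 1.0   # avoid a zero scale for all-zero weights
    zero_point = round(qmin - w_min / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(weights: list[float], scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> list[int]:
    """Map floats to clamped 8-bit integers."""
    return [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]


def dequantize(values: list[int], scale: float, zero_point: int) -> list[float]:
    """Map 8-bit integers back to approximate floats."""
    return [(v - zero_point) * scale for v in values]


weights = [-0.5, -0.1, 0.0, 0.3, 0.75, 1.5]  # made-up FP32 weights
scale, zero_point = quantize_params(min(weights), max(weights))
q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"scale={scale:.5f}, zero_point={zero_point}")
print(f"max round-trip error={max_error:.5f} (half a step is {scale / 2:.5f})")
print(f"storage ratio FP32/INT8 = {32 // 8}x")
```

For values inside the calibrated range, the round-trip error stays within about half a quantization step, which is why weights with a narrow, well-covered range tend to quantize with little accuracy loss while outlier-heavy layers do worse.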

On-Device Inference on iPhone

swift
// Core ML model inference on iOS
import CoreML
import Vision
import UIKit

enum ClassifierError: Error {
    case missingCGImage
}

class VisionClassifier {
    private let model: VNCoreMLModel
    
    init() throws {
        let config = MLModelConfiguration()
        config.computeUnits = .all  // use the Neural Engine when available
        // YourModel is the class Xcode generates from your .mlmodel file
        let coreMLModel = try YourModel(configuration: config)
        self.model = try VNCoreMLModel(for: coreMLModel.model)
    }
    
    func classify(image: UIImage) async throws -> [VNClassificationObservation] {
        // Fail with an error instead of crashing when the image has no CGImage backing
        guard let cgImage = image.cgImage else {
            throw ClassifierError.missingCGImage
        }
        
        let request = VNCoreMLRequest(model: model)
        let handler = VNImageRequestHandler(
            cgImage: cgImage,
            options: [:]
        )
        // perform(_:) runs synchronously and throws on failure, so errors reach the caller
        try handler.perform([request])
        return request.results as? [VNClassificationObservation] ?? []
    }
}
// About 30 ms per inference on the iPhone 15 Pro Neural Engine

Production Monitoring and Quality Control

Data Drift Detection

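python
# Uses Evidently's legacy Report API; import paths differ in newer Evidently releases
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

def monitor_vision_data_quality(reference_images, production_images):
    """
    Detect whether production images differ significantly from the training data.
    """
    # Extract statistical image features into pandas DataFrames.
    # extract_image_features is an application-specific helper that is not shown here.
    ref_features = extract_image_features(reference_images)
    prod_features = extract_image_features(production_images)
    
    # Evidently drift report
    report = Report(metrics=[DataDriftPreset()])
    report.run(
        reference_data=ref_features,
        current_data=prod_features
    )
    
    # Raise an alert if dataset-level drift is detected
    if report.as_dict()['metrics'][0]['result']['dataset_drift']:
        trigger_retraining_alert()  # application-specific alerting hook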
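
To make the idea of a drift test concrete, the sketch below compares one simple image feature, mean brightness, between a reference set and a production set using the two-sample Kolmogorov-Smirnov statistic. It is a standard-library illustration only: the function names, the synthetic brightness values, and the 0.3 alert threshold are assumptions, and this is not the set of tests Evidently runs internally.

```python
import bisect
import random


def mean_brightness(pixels: list[list[float]]) -> float:
    """Average pixel value of a grayscale image given as rows of numbers."""
    flat = [value for row in pixels for value in row]
    return sum(flat) / len(flat)


def ks_statistic(reference: list[float], current: list[float]) -> float:
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in ref + cur:  # the maximum gap always occurs at an observed value
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def has_drifted(reference: list[float], current: list[float], threshold: float = 0.3) -> bool:
    """Flag drift when the KS statistic exceeds an assumed threshold."""
    return ks_statistic(reference, current) > threshold


# Synthetic per-image brightness values: production images are darker in the second case.
rng = random.Random(0)
train = [rng.gauss(120, 20) for _ in range(500)]
prod_same = [rng.gauss(120, 20) for _ in range(500)]
prod_dark = [rng.gauss(80, 20) for _ in range(500)]

print(has_drifted(train, prod_same), has_drifted(train, prod_dark))
```

A single brightness feature will catch lighting or exposure changes but is likely to miss semantic shifts such as new object types, which is one reason to track several features or embedding statistics side by side.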

Vision AI Platforms

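| Use case | Recommended tools |
| --- | --- |
| General classification | AWS Rekognition, Google Vision API |
| Custom training | Roboflow + YOLOv8 |
| Medical imaging | AWS HealthLake, Google Health AI |
| Manufacturing quality inspection | Landing AI, Cognex VisionPro |
| Video analytics | NVIDIA Metropolis, Azure Video Analyzer |
| Edge deployment | NVIDIA Jetson, Apple Core ML |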

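Key Takeaways

Start by fine-tuning a foundation model rather than training from scratch

CLIP supports zero-shot classification: new classes can be added without retraining

YOLOv8 was a widely used default for production object detection as of 2024

Edge deployment requires quantization and format conversion to run efficiently

Monitor for data drift: production images eventually diverge from the training data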
