Tutorial Center
AI Agents from Beginner to Practice: Core Concepts, Using MCP, Hands-On Platform Guides, Workflow Automation
1252
Total tutorials
234
Beginner tutorials
42
Hands-on tutorials
Browse by topic
Fine-Tuning GPT-4o Mini: OpenAI Fine-Tuning API Complete Guide
When and how to fine-tune LLMs for domain-specific tasks
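Complete guide to fine-tuning GPT-4o mini (2026): use the OpenAI fine-tuning API to get a hosted model with consistent format and style, and to cut costs at high call volumes. Includes real code for JSONL data preparation, upload, training, and inference; when to fine-tune versus prompting or RAG; and why data quality matters more than quantity.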
LangGraph Tutorial: Build Stateful AI Agents with Persistent Memory
Build complex multi-step AI workflows with state management using LangGraph
LangGraph enables AI agents with persistent state, conditional branching, and human-in-the-loop workflows. This tutorial builds a real research agent from scratch with memory, tool use, and error recovery.
Advanced Prompt Engineering: Techniques That Actually Work
Chain-of-thought, tree-of-thoughts, self-consistency, and systematic evaluation methods
Beyond basic prompting: master chain-of-thought, self-consistency sampling, tree-of-thoughts, constitutional AI prompting, and systematic evaluation techniques that reliably improve LLM performance.
vLLM Production Deployment: Self-Host Llama 3 at Scale
Deploy Llama 3 with 20x higher throughput than naive serving
Deploy open-source LLMs in production with vLLM. Covers GPU selection, Docker setup, Kubernetes orchestration, AWQ quantization for 75% memory reduction, and cost comparison showing break-even vs OpenAI at 5M tokens/month.
Technical Architecture for AI Startups: From Prototype to Scale
Build AI infrastructure that grows with your startup
Architecture guide for AI startups covering the evolution from prototype to production scale. Includes cost-effective infrastructure choices, avoiding common pitfalls, and when to invest in custom ML.
LLM Fine-Tuning for Production: LoRA, QLoRA & RLHF in 2025
Adapt foundation models to your domain efficiently with parameter-efficient fine-tuning techniques
Fine-tuning LLMs allows adapting powerful foundation models to specific domains without training from scratch. This guide covers LoRA and QLoRA for parameter-efficient fine-tuning, dataset preparation and quality filtering, instruction tuning format, RLHF and DPO for alignment, fine-tuning on consumer GPUs with quantization, evaluation with domain benchmarks, and deploying fine-tuned models with vLLM or TGI for production serving.
Vector Databases & RAG in Production: Pinecone, Weaviate & pgvector in 2025
Build production-grade retrieval-augmented generation systems with vector search at scale
Retrieval-Augmented Generation (RAG) is the dominant pattern for grounding LLMs with up-to-date knowledge. This guide covers vector database selection (Pinecone, Weaviate, Qdrant, pgvector), embedding model selection and optimization, chunking strategies for documents, hybrid search (vector + keyword), re-ranking, evaluating RAG quality, and deploying production RAG systems that stay accurate over time.
Build a Production RAG System with LlamaIndex and Pinecone
Step-by-step guide to retrieval-augmented generation that works on real data
Most RAG tutorials only show the happy path. This guide builds a production-ready RAG system covering chunking strategies, embedding selection, reranking, evaluation, and edge case handling.
AI Agent Frameworks: LangChain, AutoGen & CrewAI for Production in 2025
Build reliable AI agents that use tools, plan multi-step tasks, and collaborate in teams
AI agents go beyond chatbots—they use tools, maintain memory, plan multi-step tasks, and collaborate with other agents. This guide compares LangChain, LangGraph, AutoGen, and CrewAI for different use cases, covers reliable agent design patterns, tool calling best practices, memory architectures (short-term, long-term, episodic), handling errors and hallucinations, and deploying production agents with observability.
LLM Inference Optimization: vLLM, TensorRT-LLM, and Serving at Scale
PagedAttention, continuous batching, quantization, and production serving strategies
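LLM inference optimization with vLLM, TensorRT-LLM, and serving at scale (2026): the KV cache is the bottleneck, and PagedAttention plus continuous batching is the biggest throughput lever. Covers choosing between vLLM and TensorRT-LLM, plus other techniques such as quantization, speculative decoding, prefix caching, and picking a smaller model.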
CrewAI vs AutoGen vs LangGraph: Multi-Agent Framework 2026
Build production multi-agent systems with the right framework
Comprehensive comparison of CrewAI, AutoGen, and LangGraph for multi-agent AI systems. Covers role-based collaboration, conversation agents, state machines, and production deployment patterns.
Vector Database Guide 2026: Pinecone vs Qdrant vs pgvector vs Weaviate
Choose the right vector database for your RAG application performance and cost
Complete 2026 comparison of Pinecone, Qdrant, pgvector, and Weaviate. Includes Python code examples, performance benchmarks at 1M vectors, filtering, and self-hosting setup.
Fine-Tuning GPT-4 and Claude: When to Fine-Tune vs RAG 2026
Make the right architectural decision: fine-tuning or RAG for your LLM application
Comprehensive guide to deciding between fine-tuning and RAG for LLM applications. Covers fine-tuning GPT-4o mini, LoRA training with Hugging Face, cost comparison, and use case decision framework.
AI System Design Patterns 2026: Rate Limiting, Caching, Fallbacks
Production patterns for reliable, cost-efficient AI applications
Essential system design patterns for production AI applications: token budgeting, response caching, fallback chains, circuit breakers, and monitoring. Reduce costs 60-80% while improving reliability.
Python AI Development Stack 2026: FastAPI + LangChain + Supabase
Build production-ready AI applications with the modern Python AI stack
Complete guide to building production AI applications with FastAPI, LangChain, and Supabase in 2026. Covers project setup, async AI endpoints, RAG pipeline, vector search, and deployment.
Building Real-Time AI Personalization Engines
Deliver hyper-personalized experiences at scale
Design and implement real-time personalization using AI, covering user profiling, content ranking, A/B testing, and multi-armed bandit algorithms for continuous optimization.
AI-Powered Code Review: Beyond Static Analysis
Use LLMs to review code for bugs, security, and quality
Build intelligent code review tools using LLMs that go beyond traditional linters. Detect security vulnerabilities, suggest refactoring, explain complex code, and enforce team conventions automatically.
LLM Inference Optimization: vLLM, TensorRT-LLM & Quantization in 2025
Achieve 10-50x throughput improvements for LLM serving through batching, quantization, and GPU optimization
Serving LLMs in production requires careful optimization to achieve cost-effective performance at scale. This guide covers continuous batching with vLLM, NVIDIA TensorRT-LLM for GPU-optimized inference, speculative decoding, flash attention, KV cache optimization, INT4/INT8 quantization with AWQ and GPTQ, and benchmarking LLM serving systems to find the right performance/cost tradeoff.
AI-Powered DevOps: Intelligent Infrastructure Management and Incident Resolution
AIOps, automated root cause analysis, capacity planning, and self-healing systems
Implement AIOps practices including ML-powered anomaly detection, automated root cause analysis, predictive capacity planning, and self-healing infrastructure for modern cloud environments.
Reducing LLM Hallucinations: Techniques That Actually Work in Production
RAG, self-consistency, chain-of-verification, and calibration for faithful AI outputs
Comprehensive guide to practical techniques for reducing LLM hallucinations in production systems, including RAG, retrieval verification, self-consistency sampling, and chain-of-verification prompting.
AI-Powered Search and Autocomplete with Elasticsearch and LLMs
Semantic search, neural reranking, personalized suggestions, and query understanding
Build an intelligent search system combining Elasticsearch with AI for semantic understanding, neural reranking, personalized autocomplete, and query expansion for superior search relevance.
Production NER Systems: Fine-Tuning spaCy and Transformers for Custom Entities
Training custom NER models, handling low-resource scenarios, and deployment patterns
Build production Named Entity Recognition systems for custom entity types using spaCy and transformer models, covering annotation strategies, active learning, and deployment optimization.
Production Computer Vision with YOLO v11: Object Detection at Scale
Training, optimization, edge deployment, and real-time video processing with YOLO
Build production computer vision systems using YOLO v11 for object detection, including custom training, model optimization with TensorRT, edge deployment, and real-time video stream processing.
Production Document Q&A System: PDF Processing to Enterprise Deployment
Complete guide from PDF parsing to scalable enterprise document intelligence
Build a production document Q&A system from PDF parsing and chunking through vector indexing, RAG-based answering, citation extraction, and enterprise deployment with access controls.
AI Anomaly Detection for Time Series: From Statistical to Deep Learning Approaches
Isolation Forest, LSTM Autoencoders, and production anomaly detection systems
Build production anomaly detection systems for time series data using statistical methods, isolation forest, LSTM autoencoders, and modern time series foundation models for infrastructure and IoT monitoring.
LLM Application Architecture Patterns: From Simple to Complex Systems
Simple chains, RAG, agents, and multi-agent patterns with decision frameworks
Comprehensive guide to LLM application architecture patterns from simple prompt-response to complex multi-agent systems, with a decision framework for choosing the right architecture.
AI Content Moderation at Scale: Building Trust and Safety Systems
Multi-modal content classification, human review workflows, and policy enforcement
Design production-grade AI content moderation systems for text, images, and video, covering classification models, human review workflows, policy management, and appeals processes.
Designing AI-Powered APIs: Best Practices for LLM-Backed Services
Rate limiting, streaming, idempotency, and versioning for AI APIs in production
Design patterns and best practices for building robust AI-powered REST and WebSocket APIs including streaming responses, idempotency, rate limiting, versioning, and managing non-deterministic outputs.
AI Function Calling and Tool Use: Production Patterns and Best Practices
Building reliable tool-using agents with OpenAI, Anthropic, and open source models
Master AI function calling and tool use patterns for building reliable agents, covering tool design, error handling, parallel tool execution, and preventing tool abuse.
LLM Cost Optimization: Reduce API Costs by 60-80% Without Sacrificing Quality
Caching, model routing, prompt compression, batching, and smart model selection
Practical strategies to dramatically reduce LLM API costs including semantic caching, intelligent model routing, prompt compression, request batching, and monitoring cost per feature.
LangChain in Production: Best Practices, Pitfalls, and Performance Optimization
Lessons from deploying LangChain applications handling millions of requests
Production guide for LangChain applications covering caching strategies, error handling, observability with LangSmith, cost optimization, and common anti-patterns to avoid.
Synthetic Data Generation for AI: Techniques, Tools, and Quality Evaluation
GANs, diffusion models, LLM-based generation, and validation methods for synthetic datasets
Learn to generate high-quality synthetic data for AI training using LLMs, GANs, and diffusion models. Covers data augmentation, privacy-preserving synthesis, and evaluating synthetic data quality.
On-Device AI: Running LLMs on iPhone, Android, and Edge Devices in 2025
CoreML, ONNX Runtime, MLC-LLM, and optimization techniques for edge inference
Technical guide to deploying AI models on edge devices including mobile phones, IoT devices, and edge servers using Apple CoreML, Android NNAPI, MLC-LLM, and hardware-specific optimizations.
Advanced RAG: Moving Beyond Naive Retrieval to Production-Grade Systems
Corrective RAG, Self-RAG, adaptive retrieval, and evaluation with RAGAS
Go beyond basic RAG implementation to build production-grade retrieval-augmented generation systems with query rewriting, reranking, corrective mechanisms, and comprehensive evaluation.
Building Enterprise Semantic Search with AI: Beyond Keyword Matching
Hybrid search, reranking, and personalization for intelligent enterprise knowledge systems
Design and implement enterprise semantic search systems that combine vector embeddings, BM25 keyword search, and LLM reranking for accurate, fast, and contextually relevant results.