
The Complete Guide to Local LLMs 2026: Running AI Models on Your Own Machine with Ollama

Installation, Model Selection, API Integration—Run AI Completely Locally

Why Run LLMs Locally?

  • Privacy: Data never leaves your machine, ideal for sensitive information
  • Cost: One-time hardware investment, no API fees
  • Latency: No network delays
  • Availability: No internet dependency

Installation (3 minutes)

    bash
    # macOS/Linux
    curl -fsSL https://ollama.com/install.sh | sh
    ollama run llama3.2  # Run your first model
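
After installing, it can help to confirm that the local server is answering before wiring it into an application. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`, and queries the `/api/tags` endpoint, which lists locally installed models. The function name `list_local_models` is illustrative and not part of any Ollama library.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 2.0):
    """Return the names of installed models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not running"
        return None
    return [model["name"] for model in payload.get("models", [])]


if __name__ == "__main__":
    models = list_local_models()
    if models is None:
        print("Ollama server not reachable; is it running?")
    else:
        print("Installed models:", models)
```

Returning `None` rather than raising keeps the check easy to use in a startup script, where "server not running" is an expected state and not an error.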

    Recommended Models

    Model            Size    Memory   Best For
    Llama 3.2 3B     2GB     4GB      Low-end devices
    Qwen2.5 7B       5GB     8GB      Chinese tasks
    DeepSeek-R1 7B   5GB     8GB      Logical reasoning
    Llama 3.1 70B    40GB    64GB     High-quality generation
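
The memory column above can be turned into a quick lookup when deciding what to pull on a given machine. This is a minimal sketch: the memory figures come from the table, while the model tags (`llama3.2:3b` and so on) are assumed to match the names in the Ollama library, and `models_that_fit` is a hypothetical helper.

```python
# (model tag, minimum memory in GB), copied from the table above.
# The tags are assumptions about how each model is named in the Ollama library.
MODELS = [
    ("llama3.2:3b", 4),
    ("qwen2.5:7b", 8),
    ("deepseek-r1:7b", 8),
    ("llama3.1:70b", 64),
]


def models_that_fit(memory_gb: float) -> list[str]:
    """Return the tags of all models whose memory requirement is within memory_gb."""
    return [tag for tag, needed_gb in MODELS if needed_gb <= memory_gb]


if __name__ == "__main__":
    for ram in (4, 8, 64):
        print(f"{ram} GB:", models_that_fit(ram))
```

On a machine that also runs other applications, it is probably safer to pass in the memory that is actually free, not the installed total.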

    API Integration (OpenAI Compatible)

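    python
    from openai import OpenAI

    # Ollama ignores the API key, but the client requires a non-empty value
    client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

    response = client.chat.completions.create(
        model="qwen2.5:7b",
        messages=[{"role": "user", "content": "Write a quicksort"}],
    )
    print(response.choices[0].message.content)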
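
Because the endpoint follows the OpenAI chat-completions format, the same call can be made without the `openai` package. The sketch below uses only the standard library and assumes the same base URL as the client above. `build_chat_request` and `chat` are illustrative names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"  # same base URL as the client above


def build_chat_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request in the OpenAI chat-completions format."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request to a running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]


if __name__ == "__main__":
    req = build_chat_request("qwen2.5:7b", "Write a quicksort")
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect or log, which is useful when debugging a local setup.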

    Custom Models

    dockerfile
    FROM qwen2.5:7b
    SYSTEM """You are a professional code review assistant"""
    PARAMETER temperature 0.1
    

    bash
    ollama create code-reviewer -f ./Modelfile
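
When several custom models share the same structure, the Modelfile can be generated instead of written by hand. The sketch below only reproduces the three directives shown above (`FROM`, `SYSTEM`, `PARAMETER`). `render_modelfile` is a hypothetical helper, and it assumes the system prompt contains no triple quotes.

```python
def render_modelfile(base_model: str, system_prompt: str, **parameters) -> str:
    """Render a Modelfile using the FROM, SYSTEM, and PARAMETER directives."""
    lines = [f"FROM {base_model}", f'SYSTEM """{system_prompt}"""']
    # One PARAMETER line per keyword argument, e.g. temperature=0.1
    lines += [f"PARAMETER {name} {value}" for name, value in parameters.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_modelfile(
        "qwen2.5:7b",
        "You are a professional code review assistant",
        temperature=0.1,
    )
    print(text, end="")
```

The output can be saved as `./Modelfile` and passed to the `ollama create` command shown above.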
    

    Comparison with Cloud APIs

    On a MacBook with an M3 Max chip:

  • Qwen2.5 7B: 50 tokens/s, excellent Chinese performance
  • Llama 3.1 8B: 45 tokens/s, balanced capabilities
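
Throughput figures like these can be checked on your own hardware. A non-streaming call to Ollama's native `/api/generate` endpoint returns `eval_count` (tokens generated) and `eval_duration` (nanoseconds), from which tokens per second follows directly. The sketch below assumes the default local address. The function names and the prompt are illustrative.

```python
import json
import urllib.request

GENERATE_URL = "http://localhost:11434/api/generate"  # assumed default address


def tokens_per_second(response: dict) -> float:
    """Generation speed: eval_count tokens over eval_duration nanoseconds."""
    return response["eval_count"] * 1e9 / response["eval_duration"]


def measure(model: str, prompt: str = "Explain quicksort in one paragraph.") -> float:
    """Run one non-streaming generation against a running server and report its speed."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        GENERATE_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return tokens_per_second(json.load(resp))
```

A single run is noisy, and the first call also includes model loading time in the total duration, so averaging a few calls after a warm-up gives a fairer number.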

Summary

    Local LLMs complement cloud APIs rather than replace them. They are ideal for private data processing, high-frequency small tasks, offline scenarios, and development/testing.

    Getting started tip: Mac users with M-series chips can start directly with qwen2.5:7b.
