Ollama Guide: Run Llama 3 and Mistral Locally on Mac and PC

Complete privacy with zero API costs - setup, models, and integration

Run powerful AI models locally with Ollama for complete privacy. This guide covers installation, model selection, the OpenAI-compatible API, LangChain integration, performance on Mac M-series, and privacy use cases.

Tags: ollama, local llm, private ai, llama, mistral, offline ai

Ollama: Run LLMs Locally for Free

Why Run Locally?

  • Complete privacy: data never leaves your machine
  • Zero API costs: unlimited use after setup
  • Offline capability
  • No rate limits for development

Trade-off: local inference is slower than cloud APIs, and the models are smaller than GPT-4o.

    Installation

  • macOS: brew install ollama
  • Linux: curl -fsSL https://ollama.ai/install.sh | sh
  • Windows: download the installer from ollama.ai
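
After installing, it can help to confirm that the local server is reachable before pointing any code at it. The sketch below is a minimal check using only the Python standard library. It assumes the server listens on the default address `http://localhost:11434` used later in this guide; the function name `is_ollama_running` is made up for this example.

```python
import urllib.error
import urllib.request


def is_ollama_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as "not running".
        return False
```

Calling `is_ollama_running()` returns `False` rather than raising when nothing is listening, so it can guard the first API call in a script.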

    Running Models

    bash
    ollama run llama3:8b
    ollama run mistral:7b
    ollama run codellama:13b

    ollama list              # List installed models
    ollama pull phi3:mini    # Download without running
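
The list of installed models is also available programmatically. As a rough sketch, the server's `/api/tags` endpoint returns JSON with a `models` array, and each entry carries a `name` field. The helper names and the sample response below are illustrative, and the sample is trimmed to the one field the code reads.

```python
import json
import urllib.request


def installed_model_names(tags_response):
    """Extract model names from a parsed /api/tags response."""
    return [m["name"] for m in tags_response.get("models", [])]


def fetch_installed_models(base_url="http://localhost:11434"):
    """Ask a running Ollama server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return installed_model_names(json.load(resp))


# Hypothetical response, shaped like the server's reply but trimmed to one field.
sample = {"models": [{"name": "llama3:8b"}, {"name": "mistral:7b"}]}
print(installed_model_names(sample))
```

`fetch_installed_models()` needs a running server; `installed_model_names` works on any already-parsed response.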

    Model Guide

    Model           RAM     Best Use
    llama3:8b       8GB     General use, coding
    llama3:70b      40GB    Complex reasoning
    mistral:7b      8GB     Fast instruction following
    codellama:13b   16GB    Code generation
    phi3:mini       4GB     Very fast, low resource
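
One way to use the table is to filter it by the memory you actually have. The sketch below hard-codes the RAM figures for the general-purpose models listed above and returns those that fit a given budget. The dictionary and the function name `models_that_fit` are illustrative, and the figures are the guide's rough minimums, not measured values.

```python
# RAM figures in GB for the general-purpose models in the table above.
MODEL_RAM_GB = {
    "phi3:mini": 4,
    "mistral:7b": 8,
    "llama3:8b": 8,
    "llama3:70b": 40,
}


def models_that_fit(available_gb):
    """Return the models whose RAM need fits, smallest requirement first."""
    fitting = [(need, name) for name, need in MODEL_RAM_GB.items()
               if need <= available_gb]
    # Sorting the (need, name) pairs orders by RAM, then alphabetically.
    return [name for need, name in sorted(fitting)]


print(models_that_fit(16))
```

On a 16GB machine this leaves the 7B/8B models and phi3:mini as candidates and rules out llama3:70b.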

    Mac M1/M2/M3: Uses Metal GPU acceleration automatically.

    API Integration

    python
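    from openai import OpenAI

    # Point the OpenAI client at the local Ollama server instead of the cloud API.
    client = OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama",  # Required by the client, ignored by Ollama
    )
    response = client.chat.completions.create(
        model="llama3:8b",  # Must already be downloaded (ollama pull llama3:8b)
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)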
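
If you would rather not install the `openai` package, the same endpoint can be called with the standard library alone. The following is a minimal sketch, assuming the chat route sits at `/v1/chat/completions` under the base URL shown above (the path the OpenAI client appends). The helper names `build_chat_request` and `send_chat` are made up for this example.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/v1/chat/completions"


def build_chat_request(model, prompt, url=OLLAMA_CHAT_URL):
    """Build a POST request carrying an OpenAI-style chat payload."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request, timeout=120):
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # Same response shape the OpenAI client exposes as choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running and the model pulled, `print(send_chat(build_chat_request("llama3:8b", "Hello")))` should print the model's reply. Splitting request construction from sending keeps the payload easy to inspect without a network call.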
    

    LangChain

    python
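    from langchain_community.llms import Ollama

    # Connects to the local Ollama server on its default port (11434).
    llm = Ollama(model="llama3:8b")
    result = llm.invoke("Explain Python vs JavaScript differences")
    print(result)  # invoke() returns the completion as a string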
    

    Best Use Cases

    Privacy-sensitive:

  • Legal document analysis
  • Medical record processing
  • Proprietary code review
  • Private journal and notes

    Development:

  • Prototyping without API costs
  • Testing without rate limits
  • Offline work on flights

    Performance (Mac M-series)

  • Llama 3 8B on M2 Pro: 20-40 tokens/second
  • Llama 3 70B: requires an M2 Ultra or 64GB+ RAM
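
To compare your own hardware against the tokens/second figures above, you can compute throughput from the timing fields that Ollama's native `/api/generate` endpoint includes in a non-streaming response: `eval_count` (tokens generated) and `eval_duration` (nanoseconds). The sketch below shows only the arithmetic. The function name and the sample numbers are made up for illustration, not measurements.

```python
def tokens_per_second(response):
    """Compute generation speed from a parsed /api/generate response."""
    # eval_duration is reported in nanoseconds, so convert to seconds first.
    seconds = response["eval_duration"] / 1e9
    return response["eval_count"] / seconds


# Hypothetical timing fields: 300 tokens generated in 10 seconds.
sample = {"eval_count": 300, "eval_duration": 10_000_000_000}
print(tokens_per_second(sample))
```

The sample works out to 30 tokens/second, which would sit inside the 20-40 range quoted for Llama 3 8B on an M2 Pro.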
