Ollama Guide: Run Llama 3 and Mistral Locally on Mac and PC

Complete privacy with zero API costs - setup, models, and integration

Getting started, about 14 minutes

Run powerful AI models locally with Ollama for complete privacy. Covers installation, model selection guide, OpenAI-compatible API, LangChain integration, performance on Mac M-series, and privacy use cases.

ollama, local llm, private ai, llama, mistral, offline ai

Ollama: Run LLMs Locally for Free

Why Run Locally?

Complete privacy: data never leaves your machine

Zero API costs: unlimited use after setup

Offline capability

No rate limits for development

Trade-off: local inference is usually slower than cloud APIs, and the models that fit on consumer hardware are smaller than GPT-4o.

Installation

macOS: brew install ollama
Linux: curl -fsSL https://ollama.ai/install.sh | sh
Windows: download the installer from ollama.ai
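
After installing, it can help to confirm that the server is reachable before pulling any models. The following is a minimal sketch using only the Python standard library; it assumes the server listens on the default address `http://localhost:11434` (the same one used in the API examples in this guide), and the helper name `is_ollama_running` is made up for illustration.

```python
import urllib.request

# Default address the Ollama server listens on (assumed; matches the API section).
OLLAMA_URL = "http://localhost:11434"


def is_ollama_running(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP errors
        # (urllib's URLError is a subclass of OSError).
        return False
```

Calling `is_ollama_running()` returns `False` rather than raising when nothing is listening, so it can be used as a guard at the top of a script.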

Running Models

bash
ollama run llama3:8b
ollama run mistral:7b
ollama run codellama:13b

ollama list              # List installed models
ollama pull phi3:mini    # Download without running
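
The same "what is installed?" check can be done from code. The sketch below assumes a running server that exposes Ollama's `GET /api/tags` endpoint, whose JSON body contains a `models` list with a `name` field per entry. The helper names and the sample payload are hypothetical and only illustrate the shape of the data.

```python
import json
import urllib.request


def installed_model_names(tags_payload: dict) -> set[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return {entry["name"] for entry in tags_payload.get("models", [])}


def missing_models(wanted: list[str], tags_payload: dict) -> list[str]:
    """Return the wanted models that are not installed, preserving order."""
    installed = installed_model_names(tags_payload)
    return [name for name in wanted if name not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Ask a running Ollama server which models it has (network call)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Hand-written sample in the shape of an /api/tags response, for illustration only.
sample = {"models": [{"name": "llama3:8b"}, {"name": "phi3:mini"}]}
print(missing_models(["llama3:8b", "mistral:7b"], sample))  # ['mistral:7b']
```

With a server running, `missing_models(wanted, fetch_tags())` gives the names that still need an `ollama pull`.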

Model Guide

| Model         | RAM   | Best Use               |
| llama3:8b     | 8GB   | General use, coding    |
| llama3:70b    | 40GB  | Complex reasoning      |
| mistral:7b    | 8GB   | Fast instructions      |
| codellama:13b | 16GB  | Code generation        |
| phi3:mini     | 4GB   | Very fast, low resource |

Mac M1/M2/M3: Uses Metal GPU acceleration automatically.
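
The model guide can be turned into a small lookup that lists which models fit a given amount of RAM. This is a sketch, not an official sizing tool: the figures are the rough guidance from the table above, with codellama:13b set to 16 GB here on the assumption that 13B models need about that much (as Ollama's README suggests). The function name `models_that_fit` is hypothetical.

```python
# RAM guidance in GB, taken from the model guide; codellama:13b is assumed
# at 16 GB, following Ollama's README guidance for 13B models.
MODEL_RAM_GB = {
    "phi3:mini": 4,
    "llama3:8b": 8,
    "mistral:7b": 8,
    "codellama:13b": 16,
    "llama3:70b": 40,
}


def models_that_fit(available_gb: float) -> list[str]:
    """Return models whose RAM guidance fits, largest requirement first."""
    fitting = [(ram, name) for name, ram in MODEL_RAM_GB.items() if ram <= available_gb]
    # Sort by RAM descending, then by name, so the order is deterministic.
    fitting.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in fitting]


print(models_that_fit(8))  # ['llama3:8b', 'mistral:7b', 'phi3:mini']
```

These numbers are lower bounds for the model weights alone; the operating system and other applications need memory too, so leaving headroom is sensible.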

API Integration

python
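from openai import OpenAI

# Point the standard OpenAI client at the local Ollama server.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # the client requires a value; Ollama ignores it
)
response = client.chat.completions.create(
    model="llama3:8b",  # must already be pulled locally
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)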
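
To make clear what "OpenAI-compatible" means in practice, here is a dependency-free sketch of the same call using only the standard library. It assumes the server accepts an OpenAI-style chat completion request at `/v1/chat/completions` under the base URL shown above and returns an OpenAI-style body; the helper names (`build_chat_request`, `extract_reply`, `chat`) are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"  # same base URL as the OpenAI client example


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the POST request for a single-turn chat completion."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion body."""
    return response_body["choices"][0]["message"]["content"]


def chat(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one prompt to a running Ollama server and return the reply (network call)."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

With the server running and the model pulled, `chat("llama3:8b", "Hello")` should return the same kind of text as the client example. The generous default timeout is a guess to allow for slow first-token latency while a model loads.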

LangChain

python
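# Newer LangChain releases also provide this class as OllamaLLM in the langchain-ollama package.
from langchain_community.llms import Ollama

llm = Ollama(model="llama3:8b")
result = llm.invoke("Explain Python vs JavaScript differences")
print(result)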

Best Use Cases

Privacy-sensitive:

Legal document analysis

Medical record processing

Proprietary code review

Private journal and notes

Development:

Prototyping without API costs

Testing without rate limits

Offline work on flights

Performance (Mac M-series)

Llama 3 8B on M2 Pro: 20-40 tokens/second
Llama 3 70B: requires M2 Ultra or 64GB+ RAM
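
Throughput on your own machine can be computed from the timing fields Ollama returns. The sketch below assumes a response body from Ollama's native `/api/generate` endpoint (non-streaming), which reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The sample numbers are hypothetical and are not a measurement.

```python
def tokens_per_second(generate_response: dict) -> float:
    """Compute generation speed from an Ollama /api/generate response body.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    eval_count = generate_response["eval_count"]
    eval_duration_ns = generate_response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    seconds = eval_duration_ns / 1e9
    return eval_count / seconds


# Hypothetical numbers for illustration: 300 tokens generated in 10 seconds.
sample = {"eval_count": 300, "eval_duration": 10_000_000_000}
print(tokens_per_second(sample))  # 30.0
```

Prompt processing time is reported separately and is not included here, so this figure corresponds to generation speed only.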
