DeepSeek-R1 Local Deployment Complete Guide: Run a Top-Tier Reasoning Model at Zero Cost

Run DeepSeek-R1 on Mac/Linux/Windows with Ollama, fully offline, data stays local

DeepSeek-R1 Local Deployment Complete Guide

Supported Systems: macOS 12+, Ubuntu 20.04+, Windows 11 (WSL2)

Minimum Requirements: 8GB RAM (for 7B version); 32GB RAM recommended (for 70B version)

Why Choose DeepSeek-R1?

In January 2025, the release of DeepSeek-R1 shook the entire AI industry:

Performance: Math, code, and logical reasoning capabilities on par with OpenAI o1

Cost: Training cost only 3% of o1, API price only 1/20 of OpenAI

Open Source: Fully open-source under MIT license, can run locally, data never leaves your machine

Scale: From 1.5B to 671B parameters, fitting everything from laptops to servers

Local Deployment vs Cloud API

Local Deployment (Ollama)Cloud API (DeepSeek.com)

CostOne-time setup, free foreverPay per Token Privacy✅ Data stays completely local❌ Data uploaded to server SpeedDepends on hardware (M3 MacBook ~30 tok/s)Stable and fast Offline Use✅ Works without internet❌ Requires internet Model SizeLimited by local memoryCan use largest models

Step 1: Install Ollama

Ollama is the simplest tool for running local large models, supporting 100+ open-source models.

macOS

bash
Method 1: Download installer from website (recommended for beginners)
Visit https://ollama.ai to download the .dmg file, double-click to install
Method 2: Install via command line
brew install ollama

Linux

bash
curl -fsSL https://ollama.ai/install.sh | sh

Windows (WSL2)

bash
First install WSL2, then run in the WSL terminal:
curl -fsSL https://ollama.ai/install.sh | sh

Verify installation:

bash
ollama --version
Output similar to: ollama version 0.5.x

Step 2: Choose the Right DeepSeek-R1 Version

Choose a version based on your RAM/VRAM:

VersionModel SizeMinimum RAMRecommended Use Case

deepseek-r1:1.5b~1GB4GBLightweight testing deepseek-r1:7b~4.7GB8GBDaily use (recommended for starters) deepseek-r1:14b~9GB16GBStronger reasoning deepseek-r1:32b~20GB32GBNear cloud quality deepseek-r1:70b~43GB64GBBest local version

Recommendations:

Regular MacBook (16GB unified memory) → deepseek-r1:14b

M3 Max MacBook Pro (32GB) → deepseek-r1:32b

High-performance workstation → deepseek-r1:70b

Step 3: Download and Run the Model

bash
Download and run directly (first time requires download, takes a few minutes to tens of minutes)
ollama run deepseek-r1:14b
Or download first, then run
ollama pull deepseek-r1:14b
ollama run deepseek-r1:14b

Once successful, you'll see a command-line interactive interface:


>>> Write a quick sort algorithm for me

The user wants a quick sort algorithm...
Here is a Python implementation of quick sort:def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    ...

💡 R1's signature: You'll see ... tags, which are the model's "chain of thought" during reasoning. DeepSeek-R1 displays its reasoning process.

Exit the conversation: type /bye or press Ctrl+D

Step 4: Common Management Commands

bash
List downloaded models
ollama list
Delete a model (free up disk space)
ollama rm deepseek-r1:1.5b
Run Ollama service in background (API mode)
ollama serve
View running models
ollama ps
Update a model to the latest version
ollama pull deepseek-r1:14b

Step 5: Integrate with Cursor (AI Coding Assistant)

Ollama provides an OpenAI-compatible API interface that seamlessly integrates with Cursor:

Open Cursor → Settings (gear icon) → Models

Click Add Model, fill in:

- Base URL: http://localhost:11434/v1 - Model name: deepseek-r1:14b - API Key: ollama (any value, no local verification)

In the Chat panel, select deepseek-r1:14b as the current model

Test: type "Help me optimize the performance of this code"

Step 6: Integrate with VS Code (Continue Plugin)

Continue is the best AI coding plugin for VS Code, with native support for Ollama:

Install the Continue plugin

Edit ~/.continue/config.json:

json
{
  "models": [
    {
      "title": "DeepSeek-R1 14B (Local)",
      "provider": "ollama",
      "model": "deepseek-r1:14b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek-R1 1.5B (Fast)",
    "provider": "ollama",
    "model": "deepseek-r1:1.5b"
  }
}

💡 Tip: Use the large model for conversations (14B) and the small model for Tab completion (1.5B) for faster speed and better experience.

Step 7: Use via API (Advanced)

Once Ollama is running, it provides an OpenAI-compatible API at http://localhost:11434:

python
from openai import OpenAI
Point to local Ollama
client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama'  # any string
)
response = client.chat.completions.create(
    model='deepseek-r1:14b',
    messages=[
        {'role': 'user', 'content': 'Write a binary search function in Python with complete tests'}
    ]
)print(response.choices[0].message.content)

Performance Optimization Tips

Enable GPU Acceleration (if you have a dedicated GPU)

bash
Ollama automatically detects NVIDIA/AMD GPUs, no extra configuration needed
Verify GPU usage:
ollama run deepseek-r1:14b
After running, execute:
ollama ps
If you see GPU: NVIDIA GeForce..., GPU is enabled

Apple Silicon Optimization

The unified memory architecture of M-series chips is particularly friendly to LLM inference. The 14B model can reach 30-50 tokens/s, close to cloud API experience.

bash
View real-time speed
ollama run deepseek-r1:14b --verbose

Concurrent Requests

Ollama supports concurrent requests by default, suitable for building multi-user applications:

bash
Set maximum concurrency (default is 1)
OLLAMA_NUM_PARALLEL=4 ollama serve

Other Recommended Local Models

ModelFeaturesBest For

qwen2.5-coder:7bCode-specific, HumanEval 92%Code generation and completion qwen2.5:14bExcellent Chinese capabilityChinese writing and conversation llama3.3:70bMeta flagship, strong general abilityComprehensive tasks (needs 64GB+ RAM) mistral:7bVery fastReal-time conversation and completion nomic-embed-textText embeddingRAG knowledge base

FAQ

Q: What if the model download is very slow? A: You can use a mirror for acceleration (for users in China):

bash
OLLAMA_REGISTRY_URL=https://registry.ollama.ai ollama pull deepseek-r1:14b

Or manually download the GGUF file from HuggingFace mirror and import it.

Q: What if I get a "context length exceeded" error? A: Reduce the context window:

bash
ollama run deepseek-r1:14b --context-length 4096

Q: How do I make Ollama start on boot?

macOS:

bash
brew services start ollama

Linux (systemd):

bash
sudo systemctl enable ollama
sudo systemctl start ollama

Q: What's the difference between DeepSeek-R1 and DeepSeek-V3? A:

R1: Reasoning-specific model, has reasoning process, suitable for math/code/logic problems

V3: General flagship model, faster, suitable for writing/conversation/daily tasks

For coding scenarios, R1 is recommended; for daily conversation, V3 is recommended (ollama run deepseek-v3:8b).

Also available in 中文.