← Back to tutorials

DeepSeek-R1 Local Deployment Complete Guide: Run a Top-Tier Reasoning Model at Zero Cost

Run DeepSeek-R1 on Mac/Linux/Windows with Ollama, fully offline, data stays local

DeepSeek-R1 Local Deployment Complete Guide

Supported Systems: macOS 12+, Ubuntu 20.04+, Windows 11 (WSL2)
Minimum Requirements: 8GB RAM (for 7B version); 32GB RAM recommended (for 70B version)


Why Choose DeepSeek-R1?

In January 2025, the release of DeepSeek-R1 shook the entire AI industry:

  • Performance: Math, code, and logical reasoning capabilities on par with OpenAI o1
  • Cost: Training cost only 3% of o1, API price only 1/20 of OpenAI
  • Open Source: Fully open-source under MIT license, can run locally, data never leaves your machine
  • Scale: From 1.5B to 671B parameters, fitting everything from laptops to servers
  • Local Deployment vs Cloud API

    Local Deployment (Ollama)Cloud API (DeepSeek.com)

    CostOne-time setup, free foreverPay per Token Privacy✅ Data stays completely local❌ Data uploaded to server SpeedDepends on hardware (M3 MacBook ~30 tok/s)Stable and fast Offline Use✅ Works without internet❌ Requires internet Model SizeLimited by local memoryCan use largest models


    Step 1: Install Ollama

    Ollama is the simplest tool for running local large models, supporting 100+ open-source models.

    macOS

    bash
    

    Method 1: Download installer from website (recommended for beginners)

    Visit https://ollama.ai to download the .dmg file, double-click to install

    Method 2: Install via command line

    brew install ollama

    Linux

    bash
    curl -fsSL https://ollama.ai/install.sh | sh
    

    Windows (WSL2)

    bash
    

    First install WSL2, then run in the WSL terminal:

    curl -fsSL https://ollama.ai/install.sh | sh

    Verify installation:

    bash
    ollama --version
    

    Output similar to: ollama version 0.5.x


    Step 2: Choose the Right DeepSeek-R1 Version

    Choose a version based on your RAM/VRAM:

    VersionModel SizeMinimum RAMRecommended Use Case

    deepseek-r1:1.5b~1GB4GBLightweight testing deepseek-r1:7b~4.7GB8GBDaily use (recommended for starters) deepseek-r1:14b~9GB16GBStronger reasoning deepseek-r1:32b~20GB32GBNear cloud quality deepseek-r1:70b~43GB64GBBest local version

    Recommendations:

  • Regular MacBook (16GB unified memory) → deepseek-r1:14b
  • M3 Max MacBook Pro (32GB) → deepseek-r1:32b
  • High-performance workstation → deepseek-r1:70b

  • Step 3: Download and Run the Model

    bash
    

    Download and run directly (first time requires download, takes a few minutes to tens of minutes)

    ollama run deepseek-r1:14b

    Or download first, then run

    ollama pull deepseek-r1:14b ollama run deepseek-r1:14b

    Once successful, you'll see a command-line interactive interface:

    
    >>> Write a quick sort algorithm for me

    The user wants a quick sort algorithm...

    Here is a Python implementation of quick sort:

    def quicksort(arr): if len(arr) <= 1: return arr pivot = arr[len(arr) // 2] ...

    💡 R1's signature: You'll see ... tags, which are the model's "chain of thought" during reasoning. DeepSeek-R1 displays its reasoning process.

    Exit the conversation: type /bye or press Ctrl+D


    Step 4: Common Management Commands

    bash
    

    List downloaded models

    ollama list

    Delete a model (free up disk space)

    ollama rm deepseek-r1:1.5b

    Run Ollama service in background (API mode)

    ollama serve

    View running models

    ollama ps

    Update a model to the latest version

    ollama pull deepseek-r1:14b


    Step 5: Integrate with Cursor (AI Coding Assistant)

    Ollama provides an OpenAI-compatible API interface that seamlessly integrates with Cursor:

  • Open Cursor → Settings (gear icon)Models
  • Click Add Model, fill in:
  • - Base URL: http://localhost:11434/v1 - Model name: deepseek-r1:14b - API Key: ollama (any value, no local verification)

  • In the Chat panel, select deepseek-r1:14b as the current model
  • Test: type "Help me optimize the performance of this code"

  • Step 6: Integrate with VS Code (Continue Plugin)

    Continue is the best AI coding plugin for VS Code, with native support for Ollama:

  • Install the Continue plugin
  • Edit ~/.continue/config.json:
  • json
    {
      "models": [
        {
          "title": "DeepSeek-R1 14B (Local)",
          "provider": "ollama",
          "model": "deepseek-r1:14b",
          "apiBase": "http://localhost:11434"
        }
      ],
      "tabAutocompleteModel": {
        "title": "DeepSeek-R1 1.5B (Fast)",
        "provider": "ollama",
        "model": "deepseek-r1:1.5b"
      }
    }
    

    💡 Tip: Use the large model for conversations (14B) and the small model for Tab completion (1.5B) for faster speed and better experience.


    Step 7: Use via API (Advanced)

    Once Ollama is running, it provides an OpenAI-compatible API at http://localhost:11434:

    python
    from openai import OpenAI

    Point to local Ollama

    client = OpenAI( base_url='http://localhost:11434/v1', api_key='ollama' # any string )

    response = client.chat.completions.create( model='deepseek-r1:14b', messages=[ {'role': 'user', 'content': 'Write a binary search function in Python with complete tests'} ] )

    print(response.choices[0].message.content)


    Performance Optimization Tips

    Enable GPU Acceleration (if you have a dedicated GPU)

    bash
    

    Ollama automatically detects NVIDIA/AMD GPUs, no extra configuration needed

    Verify GPU usage:

    ollama run deepseek-r1:14b

    After running, execute:

    ollama ps

    If you see GPU: NVIDIA GeForce..., GPU is enabled

    Apple Silicon Optimization

    The unified memory architecture of M-series chips is particularly friendly to LLM inference. The 14B model can reach 30-50 tokens/s, close to cloud API experience.

    bash
    

    View real-time speed

    ollama run deepseek-r1:14b --verbose

    Concurrent Requests

    Ollama supports concurrent requests by default, suitable for building multi-user applications:

    bash
    

    Set maximum concurrency (default is 1)

    OLLAMA_NUM_PARALLEL=4 ollama serve


    Other Recommended Local Models

    ModelFeaturesBest For

    qwen2.5-coder:7bCode-specific, HumanEval 92%Code generation and completion qwen2.5:14bExcellent Chinese capabilityChinese writing and conversation llama3.3:70bMeta flagship, strong general abilityComprehensive tasks (needs 64GB+ RAM) mistral:7bVery fastReal-time conversation and completion nomic-embed-textText embeddingRAG knowledge base


    FAQ

    Q: What if the model download is very slow? A: You can use a mirror for acceleration (for users in China):

    bash
    OLLAMA_REGISTRY_URL=https://registry.ollama.ai ollama pull deepseek-r1:14b
    
    Or manually download the GGUF file from HuggingFace mirror and import it.

    Q: What if I get a "context length exceeded" error? A: Reduce the context window:

    bash
    ollama run deepseek-r1:14b --context-length 4096
    

    Q: How do I make Ollama start on boot?

    macOS:

    bash
    brew services start ollama
    

    Linux (systemd):

    bash
    sudo systemctl enable ollama
    sudo systemctl start ollama
    

    Q: What's the difference between DeepSeek-R1 and DeepSeek-V3? A:

  • R1: Reasoning-specific model, has reasoning process, suitable for math/code/logic problems
  • V3: General flagship model, faster, suitable for writing/conversation/daily tasks
  • For coding scenarios, R1 is recommended; for daily conversation, V3 is recommended (ollama run deepseek-v3:8b).

    Also available in 中文.