Tongyi Qianwen API Developer Guide 2026: The Most Cost-Effective Domestic LLM Integration Solution

From API Integration to Production Deployment: A Full-Stack Qwen Development Tutorial

By AI Skill Navigation Editorial Team | Published May 27, 2026

Qwen (Tongyi Qianwen) is a large language model series developed by Alibaba, occupying a unique position in the domestic AI landscape: it offers both a closed-source cloud API (via Alibaba Cloud) and an open-source version (runnable locally with Ollama).

For developers, this means rapid development and validation through the API, with a path to local deployment if needed.

1. Qwen Model Overview

| Model | Use Case | Context | Price (per million tokens) |
|---|---|---|---|
| Qwen-Long | Ultra-long documents | 10M tokens | Input ¥0.5, Output ¥2 |
| Qwen-Max | Highest quality | 32K | Input ¥40, Output ¥120 |
| Qwen-Plus | Balanced | 131K | Input ¥4, Output ¥12 |
| Qwen-Turbo | Fast/Low cost | 131K | Input ¥2, Output ¥6 |
| Qwen2.5-Coder-32B | Code-specific | 131K | Pay-as-you-go |

Cost comparison: Qwen-Plus is about 90% cheaper than GPT-4o, with comparable quality on Chinese tasks.
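
To make the price table concrete, here is a minimal sketch of a per-request cost estimate. The prices are copied from the table above; the `PRICES` dictionary and `estimate_cost` function are illustrative names rather than part of any SDK, and actual billing may differ from this arithmetic.

```python
# Prices in CNY per million tokens, copied from the table above.
PRICES = {
    "qwen-long": {"input": 0.5, "output": 2.0},
    "qwen-max": {"input": 40.0, "output": 120.0},
    "qwen-plus": {"input": 4.0, "output": 12.0},
    "qwen-turbo": {"input": 2.0, "output": 6.0},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in CNY for a single request."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# 10,000 input tokens and 2,000 output tokens on qwen-plus
print(f"{estimate_cost('qwen-plus', 10_000, 2_000):.3f}")  # 0.064
```

The same request on qwen-max would cost ten times as much at the listed prices, which is the kind of gap the routing strategy later in this guide is meant to exploit.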

2. API Integration

Obtaining an API Key

Visit dashscope.aliyuncs.com

Activate the DashScope service

Create an API Key in the console
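
Once the key exists, it is usually safer to read it from an environment variable than to paste it into source files. A minimal sketch, assuming the key has been exported as `DASHSCOPE_API_KEY` (the variable name used in the Vercel AI SDK example in this guide); `load_api_key` is a hypothetical helper, not part of any SDK.

```python
import os


def load_api_key(env_var: str = "DASHSCOPE_API_KEY") -> str:
    """Read the DashScope API key from the environment, failing early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before calling the API")
    return key
```

The result can then be passed as `api_key=load_api_key()` when constructing the client.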

Python Invocation

python
# Method 1: OpenAI-compatible format (recommended)
from openai import OpenAI
client = OpenAI(
    api_key="your_dashscope_api_key",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "system", "content": "You are a professional code reviewer"},
        {"role": "user", "content": "Please review this Python code..."}
    ],
    temperature=0.7,
    max_tokens=2000
)
print(response.choices[0].message.content)

python
# Method 2: DashScope SDK
import dashscope
from dashscope import Generation
dashscope.api_key = "your_api_key"
response = Generation.call(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Hello"}],
    result_format='message'
)
print(response.output.choices[0].message.content)

Streaming Output

python
stream = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Write a 500-word article"}],
    stream=True
)
for chunk in stream:
    # Skip chunks that carry no choices or no text
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
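
When the full reply is needed after streaming finishes (for logging or caching, say), the chunks can be accumulated while they are printed. A minimal sketch: `collect_stream` is a hypothetical helper, and it assumes each chunk has the same `choices[0].delta.content` shape used in the loop above.

```python
from typing import Iterable


def collect_stream(stream: Iterable, echo: bool = True) -> str:
    """Optionally print streamed text as it arrives, and return the full reply."""
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or no text; skip them.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
            if echo:
                print(text, end="", flush=True)
    return "".join(parts)
```

Calling `full_text = collect_stream(stream)` would replace the `for` loop above while keeping the same console output.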

3. Multimodal Capabilities

The Qwen-VL series supports image understanding:

python
response = client.chat.completions.create(
    model="qwen-vl-max",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://..."}},
                {"type": "text", "text": "What is in this image?"}
            ]
        }
    ]
)
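
For images that are not publicly hosted, a common approach with OpenAI-style `image_url` inputs is to embed the file as a base64 data URL. The sketch below only builds the message. Whether a given Qwen-VL model accepts data URLs, and what size limits apply, are assumptions to verify against the DashScope documentation. `image_message` is a hypothetical helper.

```python
import base64
import mimetypes
from pathlib import Path


def image_message(path: str, question: str) -> dict:
    """Build a user message that embeds a local image as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
            {"type": "text", "text": question},
        ],
    }
```

The returned dictionary has the same shape as the single entry in `messages` in the request above.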

4. Production-Level Usage Recommendations

Error Handling and Retries

python
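import time
from openai import APIConnectionError, RateLimitError
def call_with_retry(client, max_retries=3, **kwargs):
    """Call chat.completions.create, retrying on rate limits and connection errors."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the original error
            wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s, ...
            print(f"Rate limit hit, waiting {wait_time}s...")
            time.sleep(wait_time)
        except APIConnectionError:
            if attempt == max_retries - 1:
                raise
            time.sleep(1)
    # Only reached when max_retries is 0
    raise RuntimeError("Max retries exceeded")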
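
With the plain `2 ** attempt` schedule in the retry helper above, many clients may retry at the same moment after a shared rate-limit event. A common refinement is to cap the delay and add random jitter. A minimal sketch, independent of any SDK; `backoff_delay` and its parameters are illustrative names.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0, rng=random) -> float:
    """Return a delay in seconds: exponential in `attempt`, capped, with full jitter."""
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly between 0 and the exponential ceiling.
    return rng.uniform(0, ceiling)
```

In the retry helper, `time.sleep(backoff_delay(attempt))` would take the place of `time.sleep(wait_time)`.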

Cost Control

python
# Request concise output in the system prompt
system_prompt = """You are an assistant.
Please answer concisely, avoid unnecessary elaboration,
and do not repeat information."""
# Set a reasonable max_tokens
response = client.chat.completions.create(
    model="qwen-turbo",  # Use turbo for lightweight tasks
    messages=[...],
    max_tokens=500  # Control output length
)
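
To check whether these settings actually reduce spend, token usage can be recorded per call. A minimal sketch, assuming responses expose `usage.prompt_tokens` and `usage.completion_tokens` as in the OpenAI Python SDK; `UsageTracker` is a hypothetical helper, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulate token counts across calls that report usage."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    calls: int = 0

    def record(self, response) -> None:
        usage = getattr(response, "usage", None)
        if usage is None:  # Some responses may not include usage data
            return
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens
        self.calls += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens
```

Calling `tracker.record(response)` after each request gives running totals that can be compared before and after tightening `max_tokens` or the system prompt.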

Multi-Model Routing Strategy

python
def get_model(task_type: str, content_length: int) -> str:
    """Select the most suitable model based on task type and content length"""
    if content_length > 50000:
        return "qwen-long"
    elif task_type == "code":
        return "qwen2.5-coder-32b-instruct"
    elif task_type == "simple":
        return "qwen-turbo"
    else:
        return "qwen-plus"
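
Note the order of checks in `get_model`: the length test runs first, so a very long code input goes to `qwen-long` rather than the coder model. The table-driven variant below is a sketch that keeps the same behavior while making new task types easier to add. `route_model` and the constant names are illustrative. The source does not say whether `content_length` counts characters or tokens, so the threshold is carried over unchanged.

```python
LONG_CONTEXT_THRESHOLD = 50_000  # Same cut-off as get_model above
TASK_MODELS = {
    "code": "qwen2.5-coder-32b-instruct",
    "simple": "qwen-turbo",
}
DEFAULT_MODEL = "qwen-plus"


def route_model(task_type: str, content_length: int) -> str:
    """Table-driven equivalent of get_model: length first, then task type."""
    if content_length > LONG_CONTEXT_THRESHOLD:
        return "qwen-long"
    return TASK_MODELS.get(task_type, DEFAULT_MODEL)


print(route_model("code", 1_200))   # qwen2.5-coder-32b-instruct
print(route_model("code", 80_000))  # qwen-long (the length check wins)
print(route_model("chat", 1_200))   # qwen-plus
```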

5. Integration with Vercel AI SDK

typescript
// Integrate Qwen in Next.js
import { createOpenAI } from '@ai-sdk/openai';
import { streamText } from 'ai';
const qwen = createOpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});
const result = streamText({
  model: qwen('qwen-plus'),
  messages: [...],
});
