Tongyi Qianwen API Developer Guide 2026: The Most Cost-Effective Chinese LLM Integration Solution
From API Integration to Production Deployment: A Full-Stack Qwen Development Tutorial
Qwen (Tongyi Qianwen) is a large language model series developed by Alibaba Cloud. It occupies a distinctive position in China's AI landscape because it is available in two forms: a closed-source cloud API (via Alibaba Cloud) and open-source models that you can run locally, for example with Ollama.
For developers, this means you can build and validate quickly against the API, with a path to local deployment if you need it.
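Because both the DashScope compatible-mode endpoint used later in this guide and Ollama speak the OpenAI chat-completions protocol, moving between cloud and local can come down to a base URL and a model name. The sketch below shows that switch under a few assumptions: Ollama runs on its default local address (`http://localhost:11434/v1`), the local model tag is `qwen2.5:7b` (use whichever Qwen model you pulled), and `BackendConfig` / `resolve_backend` are hypothetical names, not part of any SDK.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendConfig:
    """Connection settings for an OpenAI-compatible chat endpoint."""

    base_url: str
    api_key: str
    model: str


def resolve_backend(local: bool) -> BackendConfig:
    """Return settings for a local Ollama server or the DashScope cloud API."""
    if local:
        # Ollama's OpenAI-compatible endpoint on its default port. Ollama does
        # not check the key, but OpenAI client libraries expect a non-empty one.
        return BackendConfig(
            base_url="http://localhost:11434/v1",
            api_key="ollama",
            model="qwen2.5:7b",  # assumed tag; use whichever Qwen model you pulled
        )
    return BackendConfig(
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
        api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
        model="qwen-plus",
    )
```

The fields map directly onto `OpenAI(api_key=cfg.api_key, base_url=cfg.base_url)` and `model=cfg.model`, so the rest of the calling code can stay the same in both environments.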
1. Qwen Model Overview
Cost comparison: Qwen-Plus is about 90% cheaper than GPT-4o, with comparable quality on Chinese tasks.
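To check a claim like this against your own workload, multiply token counts by each provider's per-million-token prices. The sketch below takes prices as parameters instead of hard-coding them, since list prices change; the function names are illustrative, and the numbers in the usage lines are placeholders chosen to show a 10x price gap, not real quotes for any model.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request, given input and output prices per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def percent_cheaper(cost: float, baseline_cost: float) -> float:
    """How much cheaper `cost` is than `baseline_cost`, as a percentage."""
    return (1 - cost / baseline_cost) * 100


# Placeholder prices for illustration only; look up current pricing.
candidate = request_cost(2_000, 500, input_price_per_m=0.4, output_price_per_m=1.2)
baseline = request_cost(2_000, 500, input_price_per_m=4.0, output_price_per_m=12.0)
print(f"{percent_cheaper(candidate, baseline):.0f}% cheaper")
```

Because output tokens are usually priced differently from input tokens, the savings for a given workload depend on its input/output mix, which is why both counts are parameters.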
2. API Integration
Obtaining an API Key
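However you call the API, keep the key out of source code. A minimal sketch, assuming the key is exported as `DASHSCOPE_API_KEY` (the environment variable name used in DashScope's documentation); `load_api_key` is a hypothetical helper that fails fast with a clear message instead of letting an unauthenticated request reach the server.

```python
import os


def load_api_key(var_name: str = "DASHSCOPE_API_KEY") -> str:
    """Read the API key from the environment, failing fast if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before starting the app, "
            f"e.g. `export {var_name}=<your-key>`"
        )
    return key
```

The returned value can be passed as `api_key=load_api_key()` to the OpenAI client, or assigned to `dashscope.api_key` when using the DashScope SDK.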
Python Invocation
```python
# Method 1: OpenAI-compatible format (recommended)
from openai import OpenAI

client = OpenAI(
    api_key="your_dashscope_api_key",
    # DashScope's OpenAI-compatible endpoint
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "system", "content": "You are a professional code reviewer"},
        {"role": "user", "content": "Please review this Python code..."},
    ],
    temperature=0.7,
    max_tokens=2000,
)
print(response.choices[0].message.content)
```
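The chat completions API is stateless: each request must carry the whole conversation you want the model to see. One way to manage that is a small history object. The sketch below is illustrative (`Conversation` and `max_turns` are hypothetical names) and, for simplicity, trims old turns by count rather than by token budget.

```python
from typing import Dict, List


class Conversation:
    """Keeps a system prompt plus recent turns to resend on each request."""

    def __init__(self, system_prompt: str, max_turns: int = 10):
        self.system = {"role": "system", "content": system_prompt}
        self.max_turns = max_turns  # one turn = a user message plus a reply
        self.turns: List[Dict[str, str]] = []

    def add_user(self, text: str) -> None:
        self.turns.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        self.turns.append({"role": "assistant", "content": text})

    def messages(self) -> List[Dict[str, str]]:
        """System prompt followed by (at most) the last max_turns turns."""
        recent = self.turns[-2 * self.max_turns:]
        # Make the trimmed history start with a user message
        while recent and recent[0]["role"] != "user":
            recent = recent[1:]
        return [self.system, *recent]
```

In a chat loop you would pass `messages=conv.messages()` to `client.chat.completions.create(...)`, then call `conv.add_assistant(...)` with `response.choices[0].message.content`.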
```python
# Method 2: DashScope SDK
import dashscope
from dashscope import Generation

dashscope.api_key = "your_api_key"

response = Generation.call(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Hello"}],
    result_format="message",  # return output as choices/message, OpenAI-style
)
print(response.output.choices[0].message.content)
```
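In the DashScope SDK's documented examples, a failed call is detected by comparing `response.status_code` with `HTTPStatus.OK` and reading `response.code` and `response.message`, rather than by catching an exception. A minimal sketch of a hypothetical `extract_reply` helper that turns that check into an exception; it relies only on those fields and the `output.choices[0].message.content` path used above.

```python
from http import HTTPStatus


def extract_reply(response) -> str:
    """Return the reply text from a DashScope Generation response, or raise."""
    if response.status_code != HTTPStatus.OK:
        raise RuntimeError(
            f"DashScope call failed: status={response.status_code}, "
            f"code={response.code}, message={response.message}"
        )
    # With result_format="message", output mirrors the OpenAI choices layout
    return response.output.choices[0].message.content
```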
Streaming Output
```python
stream = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Write a 500-word article"}],
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g. a trailing usage-only chunk) may carry no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
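If you also need the complete text once streaming finishes (for logging or caching), accumulate the deltas while printing them. A small sketch; `collect_stream` is a hypothetical helper that assumes only the `choices[0].delta.content` chunk shape used in the loop above.

```python
from typing import Iterable, List


def collect_stream(stream: Iterable, echo: bool = True) -> str:
    """Optionally print streamed deltas as they arrive; return the joined text."""
    parts: List[str] = []
    for chunk in stream:
        if not chunk.choices:  # e.g. a usage-only chunk
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            parts.append(piece)
            if echo:
                print(piece, end="", flush=True)
    if echo:
        print()  # end the line after the last delta
    return "".join(parts)
```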
3. Multimodal Capabilities
The Qwen-VL series supports image understanding:
```python
response = client.chat.completions.create(
    model="qwen-vl-max",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://..."}},
                {"type": "text", "text": "What is in this image?"},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```
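For a local file rather than a public URL, OpenAI-style APIs accept a base64 `data:` URL in the same `image_url` field. A minimal sketch, assuming DashScope's compatible mode accepts that form for Qwen-VL as OpenAI's API does (check the current size limits); `image_message` and `image_message_from_file` are hypothetical helpers.

```python
import base64
import mimetypes
from pathlib import Path


def image_message(image_bytes: bytes, mime_type: str, question: str) -> dict:
    """Build a user message with an inline base64 image plus a text question."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
            {"type": "text", "text": question},
        ],
    }


def image_message_from_file(path: str, question: str) -> dict:
    """Like image_message, guessing the MIME type from the file extension."""
    mime_type, _ = mimetypes.guess_type(path)
    return image_message(Path(path).read_bytes(), mime_type or "image/png", question)
```

Usage would look like `messages=[image_message_from_file("photo.jpg", "What is in this image?")]`, where `photo.jpg` stands for any local image.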
4. Production-Level Usage Recommendations
Error Handling and Retries
```python
import time

from openai import APIConnectionError, RateLimitError


def call_with_retry(client, max_retries=3, **kwargs):
    """Call the chat API, retrying on rate limits and connection errors."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            # Exponential backoff: 1s, 2s, 4s, ...
            wait_time = 2 ** attempt
            print(f"Rate limit hit, waiting {wait_time}s...")
            time.sleep(wait_time)
        except APIConnectionError:
            # Re-raise on the last attempt; otherwise pause briefly and retry
            if attempt == max_retries - 1:
                raise
            time.sleep(1)
    raise RuntimeError("Max retries exceeded")
```
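A fixed `2 ** attempt` schedule can make many clients retry in lockstep after a shared rate limit. A common refinement, offered here as an alternative technique rather than anything Qwen-specific, is capped exponential backoff with "full jitter": wait a random time between zero and the exponential bound. The helper name and defaults below are illustrative.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    upper = min(cap, base * 2 ** attempt)
    # Use the provided generator (handy for tests) or the module-level one
    return (rng or random).uniform(0, upper)
```

In `call_with_retry`, `wait_time = backoff_delay(attempt)` would replace `wait_time = 2 ** attempt`. Note also that the official `openai` Python client already retries some failures on its own (configurable with `max_retries` when constructing `OpenAI`), so you may want to lower that setting when wrapping calls in your own retry loop.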
Cost Control
```python
# Request concise output in the system prompt
system_prompt = """You are an assistant.
Please answer concisely, avoid unnecessary elaboration,
and do not repeat information."""

# Set a reasonable max_tokens
response = client.chat.completions.create(
    model="qwen-turbo",  # Use turbo for lightweight tasks
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "..."},
    ],
    max_tokens=500,  # Cap output length (and output-token cost)
)
```
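To see whether these measures actually reduce spend, record the token counts the API returns. OpenAI-compatible responses carry a `usage` object with `prompt_tokens` and `completion_tokens`; the `UsageTracker` below is a hypothetical helper that totals them per model.

```python
from collections import defaultdict


class UsageTracker:
    """Accumulates prompt and completion token counts per model."""

    def __init__(self) -> None:
        self.totals = defaultdict(lambda: {"prompt": 0, "completion": 0})

    def record(self, model: str, usage) -> None:
        """Add one response's usage (an object with prompt/completion token counts)."""
        if usage is None:  # e.g. streamed responses without usage reporting
            return
        self.totals[model]["prompt"] += usage.prompt_tokens
        self.totals[model]["completion"] += usage.completion_tokens

    def summary(self) -> dict:
        """Plain-dict snapshot of the totals."""
        return {model: dict(counts) for model, counts in self.totals.items()}
```

After each call, `tracker.record("qwen-turbo", response.usage)` adds that request's counts; the totals can then be multiplied by current per-token prices.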
Multi-Model Routing Strategy
```python
def get_model(task_type: str, content_length: int) -> str:
    """Select the most suitable model based on task type and content length."""
    if content_length > 50000:
        return "qwen-long"  # long-document tasks
    elif task_type == "code":
        return "qwen2.5-coder-32b-instruct"
    elif task_type == "simple":
        return "qwen-turbo"
    else:
        return "qwen-plus"
```
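Routing pairs naturally with fallback: if the preferred model fails (quota, outage), the same request can be retried on another model. The sketch below is one way to do that; `call_with_fallback` and the model order in the usage note are hypothetical, and `call` stands for any function that sends the request for a given model name.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def call_with_fallback(call: Callable[[str], T],
                       models: Sequence[str]) -> Tuple[str, T]:
    """Try each model in order; return (model, result) from the first success."""
    errors: List[str] = []
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # narrow this to API error types in real code
            errors.append(f"{model}: {exc!r}")
    raise RuntimeError("All models failed: " + "; ".join(errors))
```

With the OpenAI-compatible client, `call` could be `lambda m: client.chat.completions.create(model=m, messages=messages)`, with `models` such as `[get_model(task_type, n), "qwen-turbo"]`.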
5. Integration with Vercel AI SDK
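```typescript
// Integrate Qwen in Next.js
import { createOpenAI } from '@ai-sdk/openai';
import { streamText } from 'ai';

const qwen = createOpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
});

const result = streamText({
  // .chat() targets the Chat Completions endpoint that compatible mode exposes
  model: qwen.chat('qwen-plus'),
  messages: [{ role: 'user', content: '...' }],
});
```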