
Tongyi Qianwen API Developer Guide 2026: The Most Cost-Effective Domestic LLM Integration Solution

From API Integration to Production Deployment: A Full-Stack Qwen Development Tutorial

Qwen (Tongyi Qianwen) is a large language model series developed by Alibaba. It occupies a unique position in China's domestic AI landscape: it is available both as a closed-source cloud API (via Alibaba Cloud) and as an open-source version that can be run locally, for example with Ollama.

For developers, this means you can prototype and validate quickly against the API and still have a path to local deployment if you need one later.

1. Qwen Model Overview

| Model | Use Case | Context | Price (per million tokens) |
|---|---|---|---|
| Qwen-Long | Ultra-long documents | 10M tokens | Input ¥0.5, Output ¥2 |
| Qwen-Max | Highest quality | 32K | Input ¥40, Output ¥120 |
| Qwen-Plus | Balanced | 131K | Input ¥4, Output ¥12 |
| Qwen-Turbo | Fast/Low cost | 131K | Input ¥2, Output ¥6 |
| Qwen2.5-Coder-32B | Code-specific | 131K | Pay-as-you-go |

Cost comparison: Qwen-Plus is about 90% cheaper than GPT-4o, with comparable quality on Chinese tasks.
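To make the price table concrete, the following is a rough per-request cost estimate. The prices are copied from the table above; the lowercase model IDs, the function name, and the sample token counts are illustrative assumptions, and actual billing may differ.

```python
# Prices in CNY per million tokens as (input, output), copied from the table above.
PRICES_CNY_PER_MILLION = {
    "qwen-long": (0.5, 2.0),
    "qwen-max": (40.0, 120.0),
    "qwen-plus": (4.0, 12.0),
    "qwen-turbo": (2.0, 6.0),
}


def estimate_cost_cny(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in CNY from its token counts."""
    input_price, output_price = PRICES_CNY_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 input tokens and 500 output tokens.
    for model in PRICES_CNY_PER_MILLION:
        print(f"{model}: {estimate_cost_cny(model, 2000, 500):.4f} CNY")
```

For the hypothetical request above, Qwen-Plus works out to (2000 × 4 + 500 × 12) / 1,000,000 = ¥0.014, while Qwen-Max costs ten times as much at ¥0.14.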

2. API Integration

Obtaining an API Key

  • Visit dashscope.aliyuncs.com
  • Log in with your Alibaba Cloud account
  • Activate the DashScope service
  • Create an API Key in the console

Python Invocation

    python
    # Method 1: OpenAI-compatible format (recommended)
    from openai import OpenAI

    client = OpenAI(
        api_key="your_dashscope_api_key",
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )

    response = client.chat.completions.create(
        model="qwen-plus",
        messages=[
            {"role": "system", "content": "You are a professional code reviewer"},
            {"role": "user", "content": "Please review this Python code..."},
        ],
        temperature=0.7,
        max_tokens=2000,
    )

    print(response.choices[0].message.content)
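The snippet above hardcodes the key for brevity. In a real project you would typically read it from the environment instead. Here is a minimal sketch, assuming the key is stored in a `DASHSCOPE_API_KEY` environment variable; the helper name `load_api_key` is hypothetical.

```python
import os


def load_api_key(env_var: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is missing."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable before calling the API")
    return api_key
```

The client can then be created with `OpenAI(api_key=load_api_key(), base_url=...)`, which keeps the key out of source control.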

    python
    # Method 2: DashScope SDK
    import dashscope
    from dashscope import Generation

    dashscope.api_key = "your_api_key"

    response = Generation.call(
        model="qwen-plus",
        messages=[{"role": "user", "content": "Hello"}],
        result_format='message',
    )
    print(response.output.choices[0].message.content)
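With the DashScope SDK, a failed call usually comes back as a response object rather than an exception, so it is worth checking the status before reading the output. The sketch below assumes the response exposes `status_code`, `code`, and `message` attributes, as in the SDK's published examples; verify this against the SDK version you use. The helper name `extract_text` is hypothetical.

```python
from http import HTTPStatus


def extract_text(response) -> str:
    """Return the assistant text from a DashScope-style response, or raise on failure."""
    # Assumption: non-200 responses carry an error code and message on the response object.
    if response.status_code != HTTPStatus.OK:
        raise RuntimeError(
            f"Request failed: status={response.status_code}, "
            f"code={response.code}, message={response.message}"
        )
    return response.output.choices[0].message.content
```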

Streaming Output

    python
    stream = client.chat.completions.create(
        model="qwen-plus",
        messages=[{"role": "user", "content": "Write a 500-word article"}],
        stream=True
    )

    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
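If you also need the complete reply after streaming (for logging or caching, say), the loop can collect the pieces as it prints them. This is a minimal sketch; `collect_stream` is a hypothetical helper, and the guard for chunks with an empty `choices` list is a defensive assumption about what the endpoint may send.

```python
from typing import Iterable


def collect_stream(stream: Iterable) -> str:
    """Print streamed text as it arrives and return the full reply."""
    parts = []
    for chunk in stream:
        # Defensive: skip chunks that carry no choices (e.g. a trailing usage-only chunk).
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()  # Finish the line once the stream ends.
    return "".join(parts)
```

Calling `full_text = collect_stream(stream)` with the `stream` object created above prints incrementally and returns the joined text.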

3. Multimodal Capabilities

The Qwen-VL series supports image understanding:

    python
    response = client.chat.completions.create(
        model="qwen-vl-max",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": "https://..."}},
                    {"type": "text", "text": "What is in this image?"}
                ]
            }
        ]
    )
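The message structure above can be wrapped in a small builder so that image requests are assembled consistently. The sketch below also shows one way to send a local image by encoding it as a base64 data URL; whether the endpoint accepts data URLs is an assumption to check against the DashScope documentation, and both helper names are hypothetical.

```python
import base64


def image_bytes_to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_image_message(image_url: str, question: str) -> dict:
    """Build a user message containing one image and one text question."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": question},
        ],
    }
```

The result can be passed directly as `messages=[build_image_message(url, "What is in this image?")]`.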
    

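4. Production-Level Usage Recommendations

Error Handling and Retries

    python
    import time
    from openai import RateLimitError, APIConnectionError

    # max_retries must come before **kwargs; the original signature was a syntax error.
    def call_with_retry(client, max_retries=3, **kwargs):
        """Call chat.completions.create, retrying on rate limits and connection errors."""
        for attempt in range(max_retries):
            try:
                return client.chat.completions.create(**kwargs)
            except RateLimitError:
                if attempt == max_retries - 1:
                    raise  # Out of retries: surface the original error
                wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s, ...
                print(f"Rate limit hit, waiting {wait_time}s...")
                time.sleep(wait_time)
            except APIConnectionError:
                if attempt == max_retries - 1:
                    raise
                time.sleep(1)
        raise Exception("Max retries exceeded")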
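The fixed `2 ** attempt` wait in the retry helper can cause many clients to retry at the same moment. A common refinement is to cap the delay and add random jitter. The sketch below swaps in the "full jitter" variant of exponential backoff; the function name and the default `base` and `cap` values are illustrative assumptions, not part of the original helper.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    upper = min(cap, base * (2 ** attempt))
    return random.uniform(0, upper)
```

In the retry loop, `time.sleep(backoff_delay(attempt))` would replace the fixed `time.sleep(wait_time)`.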

Cost Control

    python
    # Request concise output in the system prompt
    system_prompt = """You are an assistant. Please answer concisely, avoid unnecessary elaboration, and do not repeat information."""

    # Set a reasonable max_tokens
    response = client.chat.completions.create(
        model="qwen-turbo",  # Use turbo for lightweight tasks
        messages=[...],
        max_tokens=500,  # Control output length
    )
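Beyond limiting each request, it can help to track cumulative token usage against a budget. This is a minimal sketch, assuming each response carries a `usage` object with `prompt_tokens` and `completion_tokens`, as OpenAI-compatible chat completion responses normally do. The `TokenBudget` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Accumulate token usage across responses and compare it to a fixed limit."""

    limit: int
    used: int = 0

    def record(self, response) -> None:
        # Responses without usage data (e.g. some streamed replies) are ignored.
        usage = getattr(response, "usage", None)
        if usage is not None:
            self.used += usage.prompt_tokens + usage.completion_tokens

    @property
    def remaining(self) -> int:
        return max(0, self.limit - self.used)

    def exhausted(self) -> bool:
        return self.used >= self.limit
```

A caller would invoke `budget.record(response)` after each request and stop, or switch to a cheaper model, once `budget.exhausted()` returns `True`.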

Multi-Model Routing Strategy

    python
    def get_model(task_type: str, content_length: int) -> str:
        """Select the most suitable model based on task type and content length"""
        if content_length > 50000:
            return "qwen-long"
        elif task_type == "code":
            return "qwen2.5-coder-32b-instruct"
        elif task_type == "simple":
            return "qwen-turbo"
        else:
            return "qwen-plus"
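To show how the routing function behaves, the sketch below repeats it and wraps it in a hypothetical `route_prompt` helper that measures the prompt with `len()`. Treating `content_length` as a character count is an assumption; the original does not say whether the 50,000 threshold is in characters or tokens.

```python
def get_model(task_type: str, content_length: int) -> str:
    """Routing function repeated from above so this example is self-contained."""
    if content_length > 50000:
        return "qwen-long"
    elif task_type == "code":
        return "qwen2.5-coder-32b-instruct"
    elif task_type == "simple":
        return "qwen-turbo"
    else:
        return "qwen-plus"


def route_prompt(task_type: str, prompt: str) -> str:
    """Pick a model for a prompt, using its character count as the length (assumption)."""
    return get_model(task_type, len(prompt))


if __name__ == "__main__":
    print(route_prompt("code", "Refactor this function"))  # qwen2.5-coder-32b-instruct
    print(route_prompt("simple", "Translate 'hello'"))     # qwen-turbo
    print(route_prompt("chat", "Explain transformers"))    # qwen-plus
    print(route_prompt("code", "x" * 60000))               # qwen-long
```

Note that the length check runs first, so a very long code task is routed to `qwen-long` rather than to the coder model.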
    

5. Integration with Vercel AI SDK

    typescript
    // Integrate Qwen in Next.js
    import { createOpenAI } from '@ai-sdk/openai';
    import { streamText } from 'ai'; // streamText was used below but never imported

    const qwen = createOpenAI({
      apiKey: process.env.DASHSCOPE_API_KEY,
      baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
    });

    const result = streamText({
      model: qwen('qwen-plus'),
      messages: [...],
    });


Further Reading

  • Kimi K2 Complete Usage Guide
  • Vercel AI SDK Practical Tutorial
  • LLM API Cost Optimization Guide