Gemini 2.0 API Tutorial 2026: Multimodal AI with 2M Token Context

Build multimodal AI apps with Gemini 2.0 Flash and Pro: vision, audio, documents

Advanced · 35 min

Complete Gemini 2.0 API tutorial covering multimodal inputs, 2M token context, function calling, grounding with Google Search, and code execution.

Gemini 2.0 is Google's multimodal model family. The Pro variant offers a 2M token context window; Flash and Thinking offer 1M.

Models

| Model | Context | Best For | Cost/1M tokens |
|---|---|---|---|
| Gemini 2.0 Flash | 1M | Fast, cost-efficient | $0.075/$0.30 |
| Gemini 2.0 Pro | 2M | Complex, large docs | $3.50/$10.50 |
| Gemini 2.0 Thinking | 1M | Reasoning | $0.15/$0.60 |
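To make the pricing column concrete, here is a rough sketch that turns token counts into an estimated request cost. It assumes the two figures in each cell are the input and output prices in USD per 1M tokens, which the table does not state explicitly. The dictionary keys and the `estimate_cost` helper are illustrative names, not part of the SDK.

```python
# Prices copied from the Models table, in USD per 1M tokens.
# Assumption: the first figure is the input price, the second the output price.
PRICES_PER_1M = {
    "gemini-2.0-flash": (0.075, 0.30),
    "gemini-2.0-pro": (3.50, 10.50),
    "gemini-2.0-thinking": (0.15, 0.60),  # hypothetical key for the "Thinking" row
}


def estimate_cost(model_name: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    input_price, output_price = PRICES_PER_1M[model_name]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example: a 500K-token prompt with a 2K-token answer on each model.
    for name in PRICES_PER_1M:
        print(f"{name}: ${estimate_cost(name, 500_000, 2_000):.4f}")
```

Under these assumptions the same large prompt costs far more on Pro than on Flash, so it may be worth reserving Pro for inputs that actually need the larger window.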

Setup

bash
pip install google-generativeai

python
import google.generativeai as genai
genai.configure(api_key='your-key')
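Hardcoding the key as in the snippet above is fine for a quick test, but one common alternative is to read it from the environment. The sketch below assumes an environment variable named `GEMINI_API_KEY`; both that name and the `load_api_key` helper are illustrative choices, not something the SDK requires.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the API key from an environment variable and fail loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


# Usage with the setup code above (not executed here):
#   genai.configure(api_key=load_api_key())
```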

Text Generation

python
model = genai.GenerativeModel('gemini-2.0-flash')
response = model.generate_content('Explain RAG vs fine-tuning')
print(response.text)

# Streaming
for chunk in model.generate_content('Write a FastAPI tutorial', stream=True):
    print(chunk.text, end='', flush=True)
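The streaming loop prints chunks as they arrive but discards them afterwards. If the full text is also needed, for example to save or post-process it, a small wrapper along these lines could help. `collect_stream` is a hypothetical helper name, and the `ValueError` handling reflects the SDK's behavior of raising when a chunk has no text part, which is worth verifying against the version you have installed.

```python
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Optionally print streamed chunks as they arrive and return the joined text."""
    parts = []
    for chunk in chunks:
        try:
            text = chunk.text
        except ValueError:
            # Assumption: a chunk without a text part raises ValueError; skip it.
            continue
        if echo:
            print(text, end="", flush=True)
        parts.append(text)
    if echo:
        print()
    return "".join(parts)


# Usage with the model above (not executed here):
#   full_text = collect_stream(model.generate_content('Write a FastAPI tutorial', stream=True))
```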

Image Understanding

python
import PIL.Image

model = genai.GenerativeModel('gemini-2.0-flash')
image = PIL.Image.open('screenshot.png')

response = model.generate_content([
    image,
    'What UI elements are visible? Describe all interactive components.'
])
print(response.text)

# Compare multiple images
chart1 = PIL.Image.open('q1_sales.png')
chart2 = PIL.Image.open('q2_sales.png')
response = model.generate_content([chart1, chart2, 'Compare Q1 and Q2 trends'])
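When several images go into one request, it can help to label each one in the prompt so the answer can refer to them by name. The sketch below only builds the content list by interleaving short text labels with the image objects. `labelled_images` is a hypothetical helper, and whether labels improve results for a given task is something to test rather than assume.

```python
def labelled_images(images: dict, question: str) -> list:
    """Interleave 'Image <label>:' text parts with the images, then append the question."""
    contents = []
    for label, image in images.items():
        contents.append(f"Image {label}:")
        contents.append(image)
    contents.append(question)
    return contents


# Usage with the charts above (not executed here):
#   contents = labelled_images({'Q1': chart1, 'Q2': chart2}, 'Compare Q1 and Q2 trends')
#   response = model.generate_content(contents)
```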

Large Document Analysis (2M Context)

python
# The 2M window belongs to the Pro model (see the Models table)
model = genai.GenerativeModel('gemini-2.0-pro')

# Process entire PDF reports
with open('annual_report.pdf', 'rb') as f:
    pdf = f.read()

response = model.generate_content([
    {'mime_type': 'application/pdf', 'data': pdf},
    'Summarize key financial highlights, risks, and growth opportunities.'
])

# Process entire codebase (500K+ tokens)
with open('codebase.txt') as f:
    code = f.read()
response = model.generate_content(f'Codebase:\n{code}\n\nFind all security vulnerabilities.')
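Before sending a very large input, it is worth checking that it fits in the chosen model's window. The sketch below relies on the SDK's `count_tokens` method and its `total_tokens` field. The `fits_in_context` helper, the default output reserve of 8,192 tokens, and the limits in the usage comment are illustrative assumptions.

```python
def fits_in_context(model, text: str, context_limit: int, reserve_for_output: int = 8_192) -> bool:
    """Return True if the prompt plus a reserve for the answer fits in the context window."""
    total = model.count_tokens(text).total_tokens
    return total + reserve_for_output <= context_limit


# Usage with the codebase example above (not executed here):
#   if not fits_in_context(model, code, context_limit=2_000_000):
#       raise ValueError('Codebase is too large for a single request; split it first.')
```

Passing the model in as a parameter keeps the helper independent of any one model name, so the same check works with the 1M limit of Flash.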

Audio Processing

python
with open('meeting.mp3', 'rb') as f:
    audio = f.read()

# Inline data is passed as raw bytes, as in the PDF example
response = model.generate_content([
    {'mime_type': 'audio/mp3', 'data': audio},
    'Transcribe this and provide a summary with action items.'
])
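The PDF and audio examples both build a dict with `mime_type` and `data` by hand. A small helper can do this from a file path, passing raw bytes as the PDF example does. `inline_part` and the `MIME_TYPES` map are hypothetical names, and the map covers only a few formats for illustration; extend it for the files you actually send.

```python
from pathlib import Path

# Extension-to-MIME map (illustrative subset, not an exhaustive list).
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".mp3": "audio/mp3",
    ".wav": "audio/wav",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
}


def inline_part(path: str) -> dict:
    """Build a {'mime_type', 'data'} dict from a file on disk."""
    file_path = Path(path)
    suffix = file_path.suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"No MIME type configured for {suffix!r}")
    return {"mime_type": MIME_TYPES[suffix], "data": file_path.read_bytes()}


# Usage (not executed here):
#   response = model.generate_content([inline_part('meeting.mp3'), 'Summarize this meeting.'])
```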

Function Calling

python
tools = genai.protos.Tool(
    function_declarations=[genai.protos.FunctionDeclaration(
        name='get_stock_price',
        description='Get current stock price',
        parameters=genai.protos.Schema(
            type=genai.protos.Type.OBJECT,
            properties={'symbol': genai.protos.Schema(type=genai.protos.Type.STRING)},
            required=['symbol']
        )
    )]
)

model = genai.GenerativeModel('gemini-2.0-pro', tools=[tools])
response = model.generate_content('What is AAPL price?')
fc = response.candidates[0].content.parts[0].function_call
print(f'{fc.name}({dict(fc.args)})')
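The example above stops at printing the function call the model requested. Running it is the application's job. A minimal dispatch sketch follows, assuming a registry that maps declared function names to local Python callables. `REGISTRY`, `dispatch`, and the stub body of `get_stock_price` are hypothetical, and the stub returns placeholder data rather than a real quote.

```python
def get_stock_price(symbol: str) -> dict:
    """Hypothetical stand-in; a real version would query a market-data source."""
    return {"symbol": symbol.upper(), "price": None, "note": "stub, no real data"}


# Maps the names declared to the model onto local callables.
REGISTRY = {"get_stock_price": get_stock_price}


def dispatch(function_call):
    """Look up the requested function by name and call it with the model's arguments."""
    name = function_call.name
    if name not in REGISTRY:
        # Also covers an empty name, which can occur if the model answered in plain text.
        raise KeyError(f"Model requested an unknown function: {name!r}")
    return REGISTRY[name](**dict(function_call.args))


# Usage with the call above (not executed here):
#   result = dispatch(fc)
```

The result would then be sent back to the model as a function response so it can produce the final natural-language answer.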

Grounding with Google Search

python
model = genai.GenerativeModel('gemini-2.0-flash', tools=['google_search_retrieval'])
response = model.generate_content('Latest AI model releases May 2026?')
print(response.text)  # Grounded in real-time search
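A grounded answer is more useful when its sources can be shown alongside it. The sketch below pulls title and URI pairs out of the response, assuming the first candidate exposes `grounding_metadata` with `grounding_chunks` whose `web` entries carry `uri` and `title`. Those field names are an assumption about the response shape and should be checked against the SDK version in use; `grounding_sources` is a hypothetical helper.

```python
def grounding_sources(response) -> list:
    """Return (title, uri) pairs from the first candidate's grounding metadata, if present."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return []
    # Assumed field names: grounding_metadata -> grounding_chunks -> web.uri / web.title
    metadata = getattr(candidates[0], "grounding_metadata", None)
    chunks = getattr(metadata, "grounding_chunks", None) or []
    sources = []
    for chunk in chunks:
        web = getattr(chunk, "web", None)
        uri = getattr(web, "uri", "")
        if uri:
            sources.append((getattr(web, "title", ""), uri))
    return sources


# Usage with the response above (not executed here):
#   for title, uri in grounding_sources(response):
#       print(f'- {title}: {uri}')
```

The helper returns an empty list when any of these fields is missing, so it degrades quietly on ungrounded responses.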

Conclusion

Gemini 2.0 handles images, audio, and documents through a single API. The 2M token context window of Gemini 2.0 Pro is a genuine differentiator: it can fit complete codebases or entire document archives into a single request.
