OpenAI Whisper API 2026: Speech-to-Text for AI Applications

Transcribe audio files, meetings, and real-time speech with Whisper

Beginner, about 25 minutes

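A complete Whisper API tutorial covering transcription with timestamps, translation, local faster-whisper, real-time recording, and a meeting transcription pipeline with AI summaries.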

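Whisper is OpenAI's speech recognition model. You can call it through the API or run it locally.

Why choose Whisper?

Supports 99 languages with high accuracy

Handles accents, background noise, and specialized vocabulary

Outputs aligned timestamps for words and segments

Available through the API, or run locally to keep audio private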

API Transcription

python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Basic transcription
with open('audio.mp3', 'rb') as f:
    transcript = client.audio.transcriptions.create(
        model='whisper-1',
        file=f,
        language='en',  # optional; auto-detected if omitted
        response_format='text'  # text, json, srt, vtt, or verbose_json
    )
print(transcript)

# Verbose JSON with timestamps
with open('meeting.mp3', 'rb') as f:
    transcript = client.audio.transcriptions.create(
        model='whisper-1',
        file=f,
        response_format='verbose_json',
        timestamp_granularities=['word', 'segment']
    )
print(f'Duration: {transcript.duration}s')
for seg in transcript.segments:
    print(f'[{seg.start:.1f}s - {seg.end:.1f}s] {seg.text}')

# Word-level timestamps
for word in transcript.words:
    print(f'{word.word}: {word.start:.2f}s - {word.end:.2f}s')
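
The API can return SRT directly with response_format='srt', but when you already hold segment objects (from verbose_json, or from a local model) it can be handy to build subtitles yourself. The following is a minimal sketch of that idea. The helper names srt_timestamp and segments_to_srt are made up for this example, and the demo segments are stand-ins rather than real API output.

```python
from types import SimpleNamespace


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from objects with .start, .end and .text attributes."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks
    return "\n".join(blocks)


# Stand-in segments for illustration; in practice pass transcript.segments.
demo = [
    SimpleNamespace(start=0.0, end=2.5, text=" Hello everyone."),
    SimpleNamespace(start=2.5, end=5.04, text=" Let's get started."),
]
print(segments_to_srt(demo))
```

Because the helper only reads .start, .end and .text, it should work equally well with segments from the API and from faster-whisper.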

Translation (non-English to English)

python
with open('french_interview.mp3', 'rb') as f:
    translation = client.audio.translations.create(
        model='whisper-1',
        file=f,
        response_format='text'
    )
print(translation)  # always returns English

Local Whisper (free, private)

bash
pip install openai-whisper
# or faster-whisper, up to 4x faster
pip install faster-whisper

python
# faster-whisper (recommended for local use)
from faster_whisper import WhisperModel

# Models: tiny, base, small, medium, large-v3
model = WhisperModel('medium', device='cuda', compute_type='float16')
# CPU: model = WhisperModel('base', device='cpu', compute_type='int8')

segments, info = model.transcribe('audio.mp3', beam_size=5)
print(f'Detected language: {info.language} (probability: {info.language_probability:.0%})')
for segment in segments:
    print(f'[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}')
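
One detail worth knowing: in faster-whisper, segments is a generator, so the transcription work happens as you iterate, and a second loop over the same object yields nothing. If you need both the segments and the full text, one option is to consume the generator once and keep the results. The sketch below shows that pattern. collect_transcript and fake_segments are hypothetical names, and fake_segments stands in for the real generator so the example runs without a model.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_transcript(segments: Iterable) -> tuple[list, str]:
    """Consume a segment iterator once; return the segments and the joined text."""
    collected = list(segments)  # iterating is what triggers the work
    text = " ".join(seg.text.strip() for seg in collected)
    return collected, text


def fake_segments():
    # Hypothetical stand-in for the generator returned by model.transcribe(...)
    yield SimpleNamespace(start=0.0, end=1.8, text=" Good morning.")
    yield SimpleNamespace(start=1.8, end=4.2, text=" Shall we begin?")


segments, text = collect_transcript(fake_segments())
print(text)
print(len(segments))
```

With a real model you would pass the first value returned by model.transcribe(...) instead of fake_segments().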

Real-time transcription

python
import os
import tempfile
import wave

import pyaudio
from openai import OpenAI

client = OpenAI()

CHUNK = 1024
FORMAT = pyaudio.paInt16  # 16-bit PCM, which the wave module can write
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 5

def record_and_transcribe():
    # Records one fixed-length clip, then transcribes it (not true streaming)
    audio = pyaudio.PyAudio()
    sample_width = audio.get_sample_size(FORMAT)
    stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK)
    print('Recording...')
    frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * RECORD_SECONDS))]
    stream.stop_stream()
    stream.close()
    audio.terminate()

    # Close the temp file handle before reopening the path by name
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
        path = f.name
    try:
        with wave.open(path, 'wb') as wf:
            wf.setnchannels(CHANNELS)
            wf.setsampwidth(sample_width)
            wf.setframerate(RATE)
            wf.writeframes(b''.join(frames))
        with open(path, 'rb') as audio_file:
            result = client.audio.transcriptions.create(model='whisper-1', file=audio_file)
    finally:
        os.remove(path)  # clean up the temporary recording
    return result.text

print(record_and_transcribe())
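
The recording example captures a single fixed-length clip. For longer input, one simple approach is to cut the raw PCM stream into short windows and transcribe each window in turn. The sketch below shows only the splitting step, on plain bytes, so it runs without a microphone. split_pcm is a hypothetical helper, and the 5-second default window is an arbitrary choice for illustration.

```python
def split_pcm(data: bytes, rate: int = 16000, sample_width: int = 2,
              channels: int = 1, window_seconds: float = 5.0) -> list[bytes]:
    """Split raw PCM bytes into windows of at most window_seconds each."""
    frame_size = sample_width * channels  # bytes per sample frame
    window_bytes = int(rate * window_seconds) * frame_size
    if window_bytes <= 0:
        raise ValueError("window_seconds must be positive")
    return [data[i:i + window_bytes] for i in range(0, len(data), window_bytes)]


# 12 seconds of silence at 16 kHz, 16-bit mono
silence = b"\x00\x00" * 16000 * 12
windows = split_pcm(silence)
print([len(w) / (16000 * 2) for w in windows])  # [5.0, 5.0, 2.0]
```

Fixed windows can cut a word in half at the boundary, so in practice you may want overlapping windows or a split on silence. That refinement is left out here to keep the sketch short.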

Meeting transcription + AI summary pipeline

python
def transcribe_and_summarize(audio_path: str) -> dict:
    # Transcribe
    with open(audio_path, 'rb') as f:
        transcript = client.audio.transcriptions.create(
            model='whisper-1', file=f, response_format='verbose_json'
        )

    text = transcript.text

    # Summarize with GPT-4o
    summary = client.chat.completions.create(
        model='gpt-4o',
        messages=[{
            'role': 'user',
            'content': 'Summarize the following meeting transcript. Include:\n'
                       '1. Key decisions made\n'
                       '2. Action items and owners\n'
                       '3. Next steps\n\n'
                       f'Transcript:\n{text}'
        }]
    )

    return {
        'transcript': text,
        'duration': transcript.duration,
        'summary': summary.choices[0].message.content
    }

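Conclusion

Whisper remains a reliable speech-to-text option in 2026. Use the API for convenience, or run faster-whisper locally for privacy and lower cost. The meeting transcription and AI summary pipeline is a solid starting point for production use.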
