OpenAI Whisper API: Complete Guide to Speech Recognition in Your App

Add accurate speech-to-text to any application using OpenAI Whisper API

Advanced · 14 min read

Complete guide to integrating OpenAI Whisper for speech recognition: API setup, language detection, translation, real-time streaming, cost optimization, and handling audio quality issues.

Tags: whisper api, speech to text, openai, audio processing, transcription

OpenAI Whisper API: Complete Integration Guide

What Whisper Does

Whisper transcribes audio to text with state-of-the-art accuracy. It supports 57 languages and can translate non-English audio directly into English text.

Cost: $0.006 per minute of audio, or about $0.36 per hour, which is affordable for most use cases.
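
Because billing is per minute of audio, the cost of a job can be estimated from its duration alone. The following is a minimal sketch, assuming the $0.006 rate quoted above still applies; `estimate_cost` is a hypothetical helper, not part of the OpenAI SDK.

```python
WHISPER_PRICE_PER_MINUTE = 0.006  # USD, the rate quoted in this guide (assumption: still current)

def estimate_cost(duration_seconds: float) -> float:
    """Return the estimated transcription cost in USD for one audio file."""
    return duration_seconds / 60 * WHISPER_PRICE_PER_MINUTE

# A one-hour meeting is 60 minutes at $0.006 each
print(f'${estimate_cost(3600):.2f}')  # → $0.36
```

Running the same function on the durations before and after silence trimming gives the saving in dollars rather than as a percentage.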

Basic Setup

python
from openai import OpenAI

# Pass the key explicitly, or omit api_key to read OPENAI_API_KEY from the environment
client = OpenAI(api_key='your-key')

# Transcribe an audio file
with open('meeting.mp3', 'rb') as audio_file:
    transcript = client.audio.transcriptions.create(
        model='whisper-1',
        file=audio_file
    )

print(transcript.text)

Language Detection and Translation

python

# Uses the client from Basic Setup

# Auto-detect language
with open('meeting.mp3', 'rb') as audio_file:
    transcript = client.audio.transcriptions.create(
        model='whisper-1',
        file=audio_file,
        response_format='verbose_json'  # Includes detected language
    )
print(f'Language: {transcript.language}')
print(f'Text: {transcript.text}')

# Translate non-English to English ('spanish_audio.mp3' is a placeholder file name)
with open('spanish_audio.mp3', 'rb') as spanish_audio:
    translation = client.audio.translations.create(
        model='whisper-1',
        file=spanish_audio
    )
print(translation.text)  # Always English output

Timestamps and Segmentation

python
# Uses the client from Basic Setup
with open('meeting.mp3', 'rb') as audio_file:
    transcript = client.audio.transcriptions.create(
        model='whisper-1',
        file=audio_file,
        response_format='verbose_json',
        timestamp_granularities=['segment', 'word']  # Both segment and word timestamps
    )

# Print each segment with its start and end time in seconds
for segment in transcript.segments:
    print(f'[{segment.start:.1f}s - {segment.end:.1f}s]: {segment.text}')
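
Segment timestamps like these map directly onto subtitle formats. The sketch below shows one way to turn (start, end, text) triples into SRT text; `format_srt_time` and `segments_to_srt` are hypothetical helpers written for this guide, not SDK functions. The API can also return SRT itself via `response_format='srt'`, so building it by hand is mainly useful when you want to post-process segments first.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f'{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}'

def segments_to_srt(segments) -> str:
    """Build SRT text from an iterable of (start, end, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f'{index}\n'
            f'{format_srt_time(start)} --> {format_srt_time(end)}\n'
            f'{text.strip()}\n'
        )
    # SRT entries are separated by a blank line
    return '\n'.join(blocks)
```

With a verbose_json response, the input could be built as `segments_to_srt((s.start, s.end, s.text) for s in transcript.segments)`.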

Cost Optimization

python
import librosa
import soundfile as sf

def optimize_audio_for_whisper(input_path: str, output_path: str) -> str:
    # Load and resample to 16kHz mono (Whisper native format)
    audio, sr = librosa.load(input_path, sr=16000, mono=True)

    # Trim leading and trailing silence; librosa.effects.trim does not
    # remove pauses in the middle of the recording
    audio_trimmed, _ = librosa.effects.trim(audio, top_db=20)

    # Save as 16-bit PCM WAV (about 1.9MB per minute at 16kHz mono)
    sf.write(output_path, audio_trimmed, sr, subtype='PCM_16')

    # Durations come from the sample counts, so the file is decoded only once
    original_duration = len(audio) / sr
    trimmed_duration = len(audio_trimmed) / sr
    savings = (original_duration - trimmed_duration) / original_duration
    print(f'Audio reduced by {savings:.1%}')
    return output_path

Handling Large Files

The Whisper API has a 25MB file size limit. For longer audio, split the file into chunks and transcribe each one:

python
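import os
import tempfile

from pydub import AudioSegment

def transcribe_long_audio(file_path: str, chunk_minutes: int = 10) -> str:
    # Uses the client from Basic Setup; pydub needs ffmpeg for MP3 input and output
    audio = AudioSegment.from_file(file_path)
    chunk_ms = chunk_minutes * 60 * 1000  # pydub slices by milliseconds
    chunks = [audio[i:i + chunk_ms] for i in range(0, len(audio), chunk_ms)]

    transcripts = []
    with tempfile.TemporaryDirectory() as tmp_dir:  # chunk files are deleted on exit
        for i, chunk in enumerate(chunks):
            chunk_path = os.path.join(tmp_dir, f'chunk_{i}.mp3')
            # At 64 kbps a 10-minute chunk is about 5MB, well under the limit
            chunk.export(chunk_path, format='mp3', bitrate='64k')
            with open(chunk_path, 'rb') as f:
                result = client.audio.transcriptions.create(
                    model='whisper-1',
                    file=f
                )
            transcripts.append(result.text)

    return ' '.join(transcripts)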
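
How long a chunk can be depends on the export bitrate. The sketch below works out the longest chunk that stays under the upload limit for a constant-bitrate encoding. `max_chunk_minutes` and the 10% safety margin are illustrative assumptions, and the limit is read conservatively as 25,000,000 bytes.

```python
MAX_UPLOAD_BYTES = 25 * 1000 * 1000  # 25MB limit, read conservatively as decimal megabytes (assumption)

def max_chunk_minutes(bitrate_kbps: int, safety_margin: float = 0.9) -> int:
    """Longest whole-minute chunk whose encoded size stays under the upload limit."""
    bytes_per_minute = bitrate_kbps * 1000 / 8 * 60  # kilobits/s -> bytes/min
    return int(MAX_UPLOAD_BYTES * safety_margin / bytes_per_minute)

print(max_chunk_minutes(64))   # → 46
print(max_chunk_minutes(128))  # → 23
```

The 10-minute default above is therefore far from the size limit at 64 kbps. Shorter chunks mainly limit how much work is lost if one request fails.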

Real-World Applications

Meeting transcription: Record meetings, transcribe, then use GPT-4o to extract action items and summaries.

Customer service analytics: Transcribe support calls to identify common issues and sentiment patterns.

Subtitle generation: Whisper can return SRT or VTT subtitles directly (response_format='srt' or 'vtt'), and segment or word timestamps let you build custom subtitle files.

Multilingual support: Support users in any of 57 languages without separate language-specific models.

Quality Tips

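  • Whisper processes audio as 16kHz mono internally, so resampling to that format before upload loses nothing and keeps files small
  • Use verbose_json format to detect and handle low-confidence segments; each segment includes avg_logprob and no_speech_prob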
  • For specialized vocabulary (medical, legal, technical), use the prompt parameter to provide context
  • Background noise significantly impacts accuracy - consider noise reduction preprocessing
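
The low-confidence check from the tips above can be sketched as a filter over the segments of a verbose_json response. The `Segment` dataclass is a stand-in for the SDK's segment objects, `flag_low_confidence` is a hypothetical helper, and the thresholds are only starting points borrowed from the defaults in the open-source Whisper decoder; tune them on your own audio.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Stand-in for the segment objects in a verbose_json response."""
    start: float
    end: float
    text: str
    avg_logprob: float      # mean token log-probability; lower means less certain
    no_speech_prob: float   # probability that the segment contains no speech

def flag_low_confidence(segments, min_avg_logprob=-1.0, max_no_speech_prob=0.6):
    """Return the segments that look unreliable and may need human review."""
    return [
        seg for seg in segments
        if seg.avg_logprob < min_avg_logprob or seg.no_speech_prob > max_no_speech_prob
    ]

# Illustrative values, not real API output
demo = [
    Segment(0.0, 2.0, 'Clear speech.', avg_logprob=-0.2, no_speech_prob=0.01),
    Segment(2.0, 4.0, 'Mumbled words', avg_logprob=-1.4, no_speech_prob=0.10),
    Segment(4.0, 6.0, '', avg_logprob=-0.3, no_speech_prob=0.90),
]
for seg in flag_low_confidence(demo):
    print(f'Review [{seg.start:.1f}s - {seg.end:.1f}s]: {seg.text!r}')
```

With a real response, the call would be `flag_low_confidence(transcript.segments)`, since the function only reads the `avg_logprob` and `no_speech_prob` attributes.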