Real-time Transcription with AI

Live speech-to-text and translation pipeline — hands-on project tutorial

Intermediate, about 25 minutes

What You'll Build

You'll build a live speech-to-text and translation pipeline. By the end of this tutorial, you'll have a working implementation (a core class, a FastAPI service, a CLI, and a small test suite) that you can extend for production use.

Time: ~25 minutes
Difficulty: Intermediate
Prerequisites: Python, basic LLM API knowledge

Project Overview


Input → Processing Layer → AI Model → Output
  ↓           ↓               ↓         ↓
Validate   Transform       Inference  Format
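
The four stages in the diagram can be sketched as plain functions chained together. This is only an illustration under assumptions: the names `validate`, `transform`, `format_output`, `run_pipeline`, and `PipelineResult` are hypothetical and do not appear in the project files, and the model call is passed in as a callable so the sketch runs without an API key.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    text: str
    language: str


def validate(chunk: bytes) -> bytes:
    """Validate: reject empty input before it reaches the model."""
    if not chunk:
        raise ValueError("empty audio chunk")
    return chunk


def transform(chunk: bytes) -> bytes:
    """Transform: placeholder for resampling or format conversion."""
    return chunk


def format_output(text: str, language: str = "en") -> PipelineResult:
    """Format: normalize whitespace and wrap the text in a result object."""
    return PipelineResult(text=" ".join(text.split()), language=language)


def run_pipeline(chunk: bytes, infer: Callable[[bytes], str]) -> PipelineResult:
    """Chain the four stages; `infer` stands in for the AI model call."""
    return format_output(infer(transform(validate(chunk))))
```

Passing the inference step as a parameter keeps each stage testable on its own; a real deployment would plug a speech-to-text call into `infer`.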

Step 1: Project Setup

bash
# Create project directory
mkdir real-time-transcription-with-ai
cd real-time-transcription-with-ai

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies (OpenAI's Whisper is published on PyPI as openai-whisper)
pip install openai anthropic openai-whisper fastapi uvicorn python-dotenv pydantic pytest

# Create .env file
cat > .env << 'EOF'
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
EOF
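
Before moving on, it can help to confirm that the key from `.env` is actually visible to Python. A minimal sketch, assuming only `OPENAI_API_KEY` is needed by the code in this tutorial; the helper `missing_keys` and the `REQUIRED_KEYS` tuple are hypothetical names, not part of the project:

```python
import os

# Assumption: only the OpenAI key is required by the code in this tutorial.
REQUIRED_KEYS = ("OPENAI_API_KEY",)


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_keys()` right after `load_dotenv()` and stopping when the list is non-empty gives a clearer failure than an authentication error on the first request.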

Step 2: Core Implementation

python
# main.py
from typing import Optional

from dotenv import load_dotenv
from openai import OpenAI
from pydantic import BaseModel

# Load OPENAI_API_KEY from .env before the client is created
load_dotenv()
client = OpenAI()


class RealtimeTranscriptionWithAIConfig(BaseModel):
    model: str = "gpt-4o-mini"
    temperature: float = 0.3
    max_tokens: int = 2000
    system_prompt: str = "You are a helpful AI assistant specializing in transcription."


class RealtimeTranscriptionWithAI:
    """Real-time Transcription with AI implementation.
    
    Live speech-to-text and translation pipeline
    """
    
    def __init__(self, config: Optional[RealtimeTranscriptionWithAIConfig] = None):
        self.config = config or RealtimeTranscriptionWithAIConfig()
        self.client = client
        self.history = []
    
    def process(self, user_input: str) -> str:
        """Main processing pipeline for transcription."""
        
        # Add to conversation history
        self.history.append({"role": "user", "content": user_input})
        
        # Build messages
        messages = [
            {"role": "system", "content": self.config.system_prompt}
        ] + self.history[-10:]  # Keep the last 10 messages (about 5 turns)
        
        # Call AI
        response = self.client.chat.completions.create(
            model=self.config.model,
            messages=messages,
            temperature=self.config.temperature,
            max_tokens=self.config.max_tokens
        )
        
        assistant_msg = response.choices[0].message.content
        
        # Add assistant response to history
        self.history.append({"role": "assistant", "content": assistant_msg})
        
        return assistant_msg
    
    def reset(self):
        """Clear conversation history."""
        self.history = []
        print("Conversation history cleared.")
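
The core class works on text, so spoken input has to become text first. One way to sketch that step, assuming the hosted `whisper-1` model behind the OpenAI audio endpoints is acceptable for your use case; the helper name `transcribe_audio` is hypothetical and not part of the original project:

```python
from typing import Any, BinaryIO


def transcribe_audio(client: Any, audio_file: BinaryIO, translate: bool = False) -> str:
    """Send one audio file to the hosted Whisper model and return its text.

    `client` is expected to be an `openai.OpenAI` instance. With
    `translate=True` the translations endpoint is used, which returns
    English text for non-English speech.
    """
    endpoint = client.audio.translations if translate else client.audio.transcriptions
    result = endpoint.create(model="whisper-1", file=audio_file)
    return result.text


# Usage (assumes OPENAI_API_KEY is set and a local file sample.wav exists):
# from openai import OpenAI
# with open("sample.wav", "rb") as f:
#     text = transcribe_audio(OpenAI(), f)
```

The returned text could then be handed to `RealtimeTranscriptionWithAI.process` for cleanup or follow-up questions. This handles one complete file per call; splitting a live stream into short files is left out of the sketch.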

Step 3: API Service

python
# api.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from main import RealtimeTranscriptionWithAI

app = FastAPI(title="Real-time Transcription with AI API")

# Initialize the system. One instance (and one history) is shared by every
# request; session_id is only echoed back to the caller.
system = RealtimeTranscriptionWithAI()


class ProcessRequest(BaseModel):
    input: str
    session_id: str = "default"


class ProcessResponse(BaseModel):
    output: str
    session_id: str


# Plain `def` handlers run in FastAPI's threadpool, so the blocking
# OpenAI call does not stall the event loop.
@app.post("/process", response_model=ProcessResponse)
def process(req: ProcessRequest):
    """Process input through the AI system."""
    try:
        result = system.process(req.input)
        return ProcessResponse(output=result, session_id=req.session_id)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.delete("/session/{session_id}")
def clear_session(session_id: str):
    """Clear the shared conversation history."""
    system.reset()
    return {"message": f"Session {session_id} cleared"}


@app.get("/health")
async def health():
    return {"status": "ok", "service": "Real-time Transcription with AI"}


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
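
The service accepts a `session_id`, but a single shared instance means all callers see the same history. If separate histories are wanted, one possible approach is a small in-memory store keyed by session ID. The `SessionStore` class below is a hypothetical sketch, not part of the original project, and it keeps everything in process memory:

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class SessionStore(Generic[T]):
    """Keep one independent object per session ID, created on first use."""

    def __init__(self, factory: Callable[[], T]):
        self._factory = factory
        self._sessions: Dict[str, T] = {}

    def get(self, session_id: str) -> T:
        """Return the session's object, creating it if needed."""
        if session_id not in self._sessions:
            self._sessions[session_id] = self._factory()
        return self._sessions[session_id]

    def clear(self, session_id: str) -> bool:
        """Drop one session; return True if it existed."""
        return self._sessions.pop(session_id, None) is not None
```

In `api.py` this might be wired in as `sessions = SessionStore(RealtimeTranscriptionWithAI)`, with the handlers calling `sessions.get(req.session_id).process(req.input)` and `sessions.clear(session_id)`. Sessions are lost on restart and never expire here, which is where the persistent storage item under What's Next would come in.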

Step 4: CLI Interface

python
# cli.py
import sys

from main import RealtimeTranscriptionWithAI


def main():
    """Interactive CLI for Real-time Transcription with AI."""
    print("Real-time Transcription with AI")
    print("Type 'quit' to exit, 'reset' to clear history\n")
    
    system = RealtimeTranscriptionWithAI()
    
    while True:
        try:
            user_input = input("You: ").strip()
            
            if not user_input:
                continue
            
            if user_input.lower() == 'quit':
                print("Goodbye!")
                sys.exit(0)
            
            if user_input.lower() == 'reset':
                system.reset()
                continue
            
            response = system.process(user_input)
            print(f"AI: {response}\n")
            
        except (KeyboardInterrupt, EOFError):
            # Ctrl-C or end of input (Ctrl-D) exits cleanly
            print("\nGoodbye!")
            sys.exit(0)


if __name__ == "__main__":
    main()

Step 5: Testing

python
# test_main.py
# Note: these tests call the live OpenAI API, so OPENAI_API_KEY must be set.
import pytest

from main import RealtimeTranscriptionWithAI


@pytest.fixture
def system():
    return RealtimeTranscriptionWithAI()


def test_basic_processing(system):
    result = system.process("Hello, please help me with transcription")
    assert isinstance(result, str)
    assert len(result) > 10


def test_history_tracking(system):
    system.process("First message")
    system.process("Second message")
    assert len(system.history) == 4  # 2 user + 2 assistant


def test_history_reset(system):
    system.process("Some message")
    system.reset()
    assert len(system.history) == 0
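
Because these tests hit the real API, they are slow, cost money, and return different text on every run. One option is to swap in a stand-in client that mimics the shape of `chat.completions.create`. The helper below is a hypothetical sketch (`make_fake_client` is not part of the project) and covers only the attributes that `process` reads:

```python
from types import SimpleNamespace


def make_fake_client(reply: str, calls: list):
    """Build a stand-in exposing `chat.completions.create` like the OpenAI client.

    Every call's keyword arguments are appended to `calls` so a test can
    inspect what would have been sent.
    """
    def create(**kwargs):
        calls.append(kwargs)
        message = SimpleNamespace(content=reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

    completions = SimpleNamespace(create=create)
    return SimpleNamespace(chat=SimpleNamespace(completions=completions))
```

A fixture could then set `system.client = make_fake_client("canned reply", calls)` before calling `system.process(...)`. Note that `main.py` still constructs a real `OpenAI()` client at import time, so a key (possibly a placeholder) likely still needs to be present in the environment.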

Running the Project

bash
# Run API server
uvicorn api:app --reload

# Test API
curl -X POST http://localhost:8000/process \
  -H "Content-Type: application/json" \
  -d '{"input": "Test transcription", "session_id": "test"}'

# Run CLI
python cli.py

# Run tests
pytest test_main.py -v

What's Next

Extend this project with:

Authentication (JWT/API keys)

Rate limiting

Persistent storage (PostgreSQL/Redis)

Response streaming

Multi-user support

Metrics and monitoring
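
As a starting point for the rate-limiting item, here is a minimal in-memory sliding-window limiter. It is a sketch under assumptions: the class name `SlidingWindowLimiter` is hypothetical, the state lives in one process only, and the clock is injectable so the behavior can be tested without sleeping.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record a call for `key` and report whether it is within the limit."""
        now = self._clock()
        hits = self._hits[key]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

In the API service, the `/process` handler could call `limiter.allow(req.session_id)` and return HTTP 429 when it is `False`. With several worker processes, the counters would need shared storage such as Redis.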

Resources

OpenAI API: https://platform.openai.com/docs

FastAPI: https://fastapi.tiangolo.com

Whisper documentation: https://github.com/openai/whisper
