Real-time Transcription with AI
Live speech-to-text and translation pipeline — hands-on project tutorial
What You'll Build
You'll build a live speech-to-text and translation pipeline. By the end of this tutorial, you'll have a working implementation (a core class, an HTTP API, a CLI, and tests) that you can extend toward production use.
Time: ~25 minutes
Difficulty: Intermediate
Prerequisites: Python, basic LLM API knowledge
Project Overview
Input → Processing Layer → AI Model → Output
  ↓            ↓               ↓         ↓
Validate   Transform       Inference  Format
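The four stages in the diagram can be read as plain functions composed in order. The sketch below is illustrative only: the names `validate`, `transform`, `format_output`, `run_pipeline`, and the stand-in `echo_model` are assumptions and do not appear in the tutorial's code. In the real project the inference step would call the AI model.

```python
from typing import Callable


def validate(text: str) -> str:
    """Reject empty input before it reaches the model."""
    if not text or not text.strip():
        raise ValueError("input must not be empty")
    return text


def transform(text: str) -> str:
    """Collapse runs of whitespace so the model sees clean text."""
    return " ".join(text.split())


def format_output(raw: str) -> dict:
    """Wrap the model's reply in a JSON-friendly structure."""
    return {"output": raw.strip()}


def run_pipeline(text: str, infer: Callable[[str], str]) -> dict:
    """Validate -> transform -> inference -> format, as in the diagram."""
    return format_output(infer(transform(validate(text))))


def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for the AI model call."""
    return f"[transcript] {prompt}"
```

Passing the inference step in as a function keeps the other three stages testable without network access.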
Step 1: Project Setup
bash
# Create project directory
mkdir real-time-transcription-with-ai
cd real-time-transcription-with-ai

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
# (OpenAI's Whisper is published on PyPI as openai-whisper; pytest is needed for Step 5)
pip install openai anthropic openai-whisper fastapi uvicorn python-dotenv pydantic pytest

# Create .env file
cat > .env << 'EOF'
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
EOF
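A missing key in `.env` otherwise surfaces only when the first API call fails. The sketch below shows one way to check for it at startup; `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the list of required variables simply mirrors the two entries written to `.env` above.

```python
from typing import Mapping, Sequence

# Assumption: these are the only variables the project needs.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env: Mapping[str, str],
                 required: Sequence[str] = REQUIRED_KEYS) -> list:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]
```

After `load_dotenv()` has run, calling `missing_keys(os.environ)` and exiting with a clear message when the result is non-empty gives a faster failure than a rejected API request.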
Step 2: Core Implementation
python
# main.py
from typing import Optional

from dotenv import load_dotenv
from openai import OpenAI
from pydantic import BaseModel

# Load OPENAI_API_KEY from .env before the client is created
load_dotenv()
client = OpenAI()


class RealtimeTranscriptionWithAIConfig(BaseModel):
    model: str = "gpt-4o-mini"
    temperature: float = 0.3
    max_tokens: int = 2000
    system_prompt: str = "You are a helpful AI assistant specializing in transcription."


class RealtimeTranscriptionWithAI:
    """Real-time Transcription with AI implementation.

    Live speech-to-text and translation pipeline
    """

    def __init__(self, config: Optional[RealtimeTranscriptionWithAIConfig] = None):
        self.config = config or RealtimeTranscriptionWithAIConfig()
        self.client = client
        self.history = []

    def process(self, user_input: str) -> str:
        """Main processing pipeline for transcription."""
        # Add to conversation history
        self.history.append({"role": "user", "content": user_input})

        # Build messages: system prompt plus the last 10 messages
        # (user and assistant combined, so roughly 5 turns)
        messages = [
            {"role": "system", "content": self.config.system_prompt}
        ] + self.history[-10:]

        # Call AI
        response = self.client.chat.completions.create(
            model=self.config.model,
            messages=messages,
            temperature=self.config.temperature,
            max_tokens=self.config.max_tokens,
        )
        # content can be None, so fall back to an empty string
        assistant_msg = response.choices[0].message.content or ""

        # Add assistant response to history
        self.history.append({"role": "assistant", "content": assistant_msg})
        return assistant_msg

    def reset(self):
        """Clear conversation history."""
        self.history = []
        print("Conversation history cleared.")
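The class above works on text; it does not itself turn audio into text. One way to add that step is the OpenAI SDK's audio endpoints, `client.audio.transcriptions.create` and `client.audio.translations.create`. The sketch below assumes the hosted `whisper-1` model and an audio file already on disk; the helper names `transcribe_file` and `translate_file` are hypothetical, and the client is passed in rather than imported so the helpers stay easy to test.

```python
from pathlib import Path
from typing import Any


def transcribe_file(client: Any, audio_path: str, model: str = "whisper-1") -> str:
    """Send one audio file to the transcription endpoint and return its text."""
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text


def translate_file(client: Any, audio_path: str, model: str = "whisper-1") -> str:
    """Send one audio file to the translation endpoint, which returns English text."""
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.translations.create(model=model, file=audio_file)
    return result.text
```

For a near-real-time feel, a recording could be cut into short segments, each passed to `transcribe_file`, with the returned text then handed to `RealtimeTranscriptionWithAI.process` for cleanup or further translation. How the audio is captured and segmented is outside what this tutorial's code covers.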
Step 3: API Service
python
# api.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from main import RealtimeTranscriptionWithAI

app = FastAPI(title="Real-time Transcription with AI API")

# One pipeline instance per session_id, so histories do not mix between sessions
sessions = {}


class ProcessRequest(BaseModel):
    input: str
    session_id: str = "default"


class ProcessResponse(BaseModel):
    output: str
    session_id: str


@app.post("/process", response_model=ProcessResponse)
def process(req: ProcessRequest):
    """Process input through the AI system."""
    # Plain `def`: FastAPI runs it in a thread pool, so the blocking
    # OpenAI call does not stall the event loop.
    if req.session_id not in sessions:
        sessions[req.session_id] = RealtimeTranscriptionWithAI()
    try:
        result = sessions[req.session_id].process(req.input)
        return ProcessResponse(output=result, session_id=req.session_id)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.delete("/session/{session_id}")
def clear_session(session_id: str):
    """Clear a conversation session."""
    # Drop only the requested session; other sessions keep their history
    sessions.pop(session_id, None)
    return {"message": f"Session {session_id} cleared"}


@app.get("/health")
def health():
    return {"status": "ok", "service": "Real-time Transcription with AI"}


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
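The `/process` endpoint returns each result in one piece. For a live transcript, partial text could instead be pushed to the client as it is produced. The sketch below formats text chunks as server-sent events; `sse_events` is a hypothetical helper, and wiring it into the service with `StreamingResponse(sse_events(chunks), media_type="text/event-stream")` is an assumption about how the API might be extended, not something the tutorial's code does.

```python
import json
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each text chunk as one server-sent event, then a final 'done' event."""
    for index, chunk in enumerate(chunks):
        payload = json.dumps({"index": index, "text": chunk})
        # An SSE message is one or more "field: value" lines ended by a blank line
        yield f"data: {payload}\n\n"
    yield "event: done\ndata: {}\n\n"
```

Because the helper accepts any iterable, the same function works with a list in tests and with a generator that yields text as the model produces it.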
Step 4: CLI Interface
python
# cli.py
import sys

from main import RealtimeTranscriptionWithAI


def main():
    """Interactive CLI for Real-time Transcription with AI."""
    print("Real-time Transcription with AI")
    print("Type 'quit' to exit, 'reset' to clear history\n")
    system = RealtimeTranscriptionWithAI()
    while True:
        try:
            user_input = input("You: ").strip()
            if not user_input:
                continue
            if user_input.lower() == 'quit':
                print("Goodbye!")
                sys.exit(0)
            if user_input.lower() == 'reset':
                system.reset()
                continue
            response = system.process(user_input)
            print(f"AI: {response}\n")
        except (KeyboardInterrupt, EOFError):
            # Ctrl+C or end of input: exit cleanly
            print("\nGoodbye!")
            sys.exit(0)


if __name__ == "__main__":
    main()
Step 5: Testing
python
# test_main.py
# Note: these tests call the live OpenAI API, so OPENAI_API_KEY must be set.
import pytest

from main import RealtimeTranscriptionWithAI


@pytest.fixture
def system():
    return RealtimeTranscriptionWithAI()


def test_basic_processing(system):
    result = system.process("Hello, please help me with transcription")
    assert isinstance(result, str)
    assert len(result) > 10


def test_history_tracking(system):
    system.process("First message")
    system.process("Second message")
    assert len(system.history) == 4  # 2 user + 2 assistant


def test_history_reset(system):
    system.process("Some message")
    system.reset()
    assert len(system.history) == 0
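Because these tests go through the real API, they need a valid key, consume tokens, and can fail on network errors. One option is to swap in a stub client. The sketch below is a stand-in under stated assumptions: `FakeClient` is hypothetical and mimics only the `chat.completions.create(...)` call and the `choices[0].message.content` path that `main.py` reads.

```python
from types import SimpleNamespace


class FakeClient:
    """Hypothetical stub exposing only client.chat.completions.create()."""

    def __init__(self, reply: str = "stub reply"):
        self.reply = reply
        self.calls = []  # keyword arguments of every create() call, for assertions
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, **kwargs):
        # Record the request and return an object shaped like a chat completion
        self.calls.append(kwargs)
        message = SimpleNamespace(content=self.reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])
```

In a test, setting `system.client = FakeClient("ok")` before calling `system.process(...)` keeps the history assertions meaningful without any network traffic. Importing `main` still constructs `OpenAI()` at module level, so `OPENAI_API_KEY` likely needs to hold some value, though a placeholder should be enough when no request is sent.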
Running the Project
bash
# Run API server
uvicorn api:app --reload

# Test API (from a second terminal)
curl -X POST http://localhost:8000/process \
  -H "Content-Type: application/json" \
  -d '{"input": "Test transcription", "session_id": "test"}'

# Run CLI
python cli.py

# Run tests
pytest test_main.py -v
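The request that the `curl` command sends can also be built from Python with the standard library. This is a minimal sketch, assuming the server is running at `http://localhost:8000`; `build_process_request` and `call_process` are hypothetical helper names.

```python
import json
import urllib.request


def build_process_request(text: str, session_id: str = "default",
                          base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /process request with a JSON body."""
    body = json.dumps({"input": text, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/process",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_process(text: str, session_id: str = "default") -> str:
    """Send the request and return the 'output' field (needs a running server)."""
    with urllib.request.urlopen(build_process_request(text, session_id)) as resp:
        return json.loads(resp.read())["output"]
```

Splitting request construction from sending means the payload can be inspected or unit-tested without starting the API.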
What's Next
Extend this project with: