← Back to tutorials

Building AI Voice Assistants for Customer Service: IVR That Actually Works

How to replace frustrating phone trees with natural language voice AI that customers actually like

Building AI Voice Assistants for Customer Service: IVR That Actually Works

"Press 1 for billing, press 2 for technical support, press 3 for..." — customers hate phone trees. AI voice assistants using large language models are making these frustrating experiences obsolete.

The IVR Problem

Traditional Interactive Voice Response systems:

  • Average 7-12 menu levels before reaching resolution
  • 67% of customers hang up in frustration before getting help
  • Force customers to know the right category before they can explain their problem
  • Can't handle "actually, I need something different"
  • AI voice assistants let customers say what they need naturally: "I got charged twice last month and need a refund" — no menu navigation required.

    Architecture for AI Voice Customer Service

    
    Incoming Call → Speech-to-Text → Intent Understanding (LLM) → 
    Action/Lookup → Response Generation (LLM) → Text-to-Speech → 
    Spoken Response → Continue or Transfer
    

    Building a Natural Language Voice System

    python
    from openai import OpenAI
    import anthropic
    import json
    from dataclasses import dataclass
    from typing import Optional

    client_openai = OpenAI() client_anthropic = anthropic.Anthropic()

    @dataclass class VoiceConversationState: session_id: str caller_phone: str authenticated: bool customer_data: Optional[dict] intent: Optional[str] turn_count: int conversation_history: list[dict] transfer_requested: bool = False

    class AIVoiceAgent: """ AI-powered voice customer service agent. Integrates with Twilio or Amazon Connect for telephony. """ SYSTEM_PROMPT = """You are a voice customer service agent. Keep responses SHORT (1-2 sentences). You will be converted to speech, so:

  • No bullet points, lists, or formatting
  • Speak naturally as you would on a phone call
  • Confirm customer understanding with brief questions
  • If you can't resolve an issue, smoothly offer to transfer to a specialist
  • Remember you're speaking, not writing.""" def __init__(self, company_name: str, knowledge_base: dict): self.company_name = company_name self.knowledge_base = knowledge_base def transcribe_audio(self, audio_file_path: str) -> str: """Convert spoken audio to text using Whisper.""" with open(audio_file_path, 'rb') as audio_file: transcript = client_openai.audio.transcriptions.create( model="whisper-1", file=audio_file, language="en" ) return transcript.text def process_turn(self, state: VoiceConversationState, customer_speech: str) -> dict: """Process one turn of voice conversation.""" # Add to history state.conversation_history.append({ "role": "user", "content": customer_speech }) state.turn_count += 1 # Build context customer_context = "" if state.customer_data: customer_context = f""" Current customer:

  • Name: {state.customer_data.get('name')}
  • Account: {state.customer_data.get('account_number')}
  • Plan: {state.customer_data.get('plan')}
  • Open issues: {state.customer_data.get('open_tickets', 0)}
  • """ # Determine if transfer needed transfer_keywords = ['speak to someone', 'real person', 'human', 'agent', 'representative'] needs_transfer = any(kw in customer_speech.lower() for kw in transfer_keywords) if needs_transfer or state.turn_count > 10: return { 'response': f"Of course, let me connect you with one of our specialists. Please hold for just a moment.", 'action': 'transfer', 'transfer_reason': 'Customer requested' if needs_transfer else 'Unresolved after multiple turns', 'context_for_agent': self._generate_handoff_summary(state) } # Generate response messages = [{"role": "system", "content": self.SYSTEM_PROMPT + customer_context}] messages.extend(state.conversation_history[-6:]) # Last 6 turns response = client_anthropic.messages.create( model="claude-haiku-4-5", max_tokens=150, # Keep responses SHORT for voice messages=messages ) agent_response = response.content[0].text state.conversation_history.append({ "role": "assistant", "content": agent_response }) return { 'response': agent_response, 'action': 'continue', 'turn_count': state.turn_count } def synthesize_speech(self, text: str) -> bytes: """Convert text response to natural-sounding speech.""" response = client_openai.audio.speech.create( model="tts-1", voice="nova", # Natural, professional voice input=text, speed=0.95 # Slightly slower for phone clarity ) return response.content def generate_greeting(self, state: VoiceConversationState) -> str: """Generate personalized greeting.""" if state.authenticated and state.customer_data: name = state.customer_data.get('name', 'there') return f"Hi {name}, thanks for calling {self.company_name}. I'm your AI assistant. How can I help you today?" return f"Thank you for calling {self.company_name}. I'm here to help. Can you tell me your account number or the phone number on your account?" def _generate_handoff_summary(self, state: VoiceConversationState) -> str: """Generate context summary for human agent receiving transfer.""" if not state.conversation_history: return "New call, no conversation history." summary = client_anthropic.messages.create( model="claude-haiku-4-5", max_tokens=200, messages=[{ "role": "user", "content": f"""Summarize this customer service call for the human agent receiving the transfer. Include: issue, what was attempted, current status, any account details mentioned. Keep it under 3 sentences.

    Conversation: {json.dumps(state.conversation_history, indent=2)[:2000]}""" }] ) return summary.content[0].text

    Twilio webhook handler example

    def handle_twilio_webhook(request_data: dict) -> dict: """ Handle incoming call from Twilio Programmable Voice. Returns TwiML response. """ call_sid = request_data.get('CallSid') caller = request_data.get('From') speech_result = request_data.get('SpeechResult', '') # Look up or create conversation state state = VoiceConversationState( session_id=call_sid, caller_phone=caller, authenticated=False, customer_data=None, intent=None, turn_count=0, conversation_history=[] ) agent = AIVoiceAgent("Acme Corp", {}) if not speech_result: # First turn greeting = agent.generate_greeting(state) return { "twiml": f""" {greeting} """ } # Process customer speech result = agent.process_turn(state, speech_result) if result['action'] == 'transfer': return { "twiml": f""" {result['response']} +18005551234 """ } return { "twiml": f""" {result['response']} """ }

    The Economics of AI Voice vs. Human Agents

    MetricHuman AgentAI Voice Agent

    Cost per call$5-12$0.10-0.50 AvailabilityBusiness hours24/7 Wait time2-8 minutesInstant ConsistencyVariableConsistent Complex issuesExcellentPoor LanguagesLimited100+ (with Whisper)

    The winning strategy: AI handles routine calls (60-70%), humans handle complex/emotional situations. Companies implementing this model are seeing 50-60% reduction in customer service costs while improving satisfaction scores.

    Also available in 中文.