Voice AI for Clinical Conversations: Lessons from 10,000 Triage Calls
What we've learned about building AI systems that can conduct safe, effective clinical conversations over the telephone.
James Morrison
Chief Technology Officer, Medelic
Building AI that can talk to patients about their health is one of the most challenging problems in healthcare technology. After processing over 10,000 triage calls in our pilot practices, here's what we've learned about making it work safely and effectively.
Why Voice Matters
Despite the proliferation of apps and online services, the telephone remains the universal interface for healthcare access. It requires no technical literacy, works for patients with visual impairments, and doesn't discriminate based on smartphone ownership. For many - particularly older patients and those in underserved communities - the phone is the primary way of reaching their GP.
But the phone is also where general practice is most constrained. Limited phone lines, limited staff, and the serial nature of phone conversations create bottlenecks that web forms and apps can't fully solve. Voice AI offers a way to scale the human touch.
The Technical Challenge
Clinical voice AI is harder than general conversational AI for several reasons:
- Stakes are high - errors can have serious consequences for patient safety
- Vocabulary mismatch - patients describe symptoms in countless ways, often imprecisely and without medical terminology
- Emotional context - patients may be anxious, in pain, or caring for someone else
- Diverse populations - accents, dialects, and languages vary enormously across the NHS
- Audio quality - phone lines aren't designed for high-fidelity audio
What We Got Wrong (At First)
Our early prototypes made several mistakes that are worth sharing:
Too much structure. Our first version followed a rigid question tree, like a phone menu. Patients hated it. They wanted to explain their problem in their own words, not answer 20 yes/no questions. We redesigned to start with open-ended questions and only drill down when needed.
Too little acknowledgment. When a patient says "I've been having terrible chest pain since yesterday," they need to feel heard before being asked follow-up questions. Adding natural acknowledgments ("I'm sorry to hear you're experiencing chest pain - let me ask a few questions to help understand this better") dramatically improved patient satisfaction.
Insufficient handling of uncertainty. Speech recognition isn't perfect. Our early system would confidently proceed even when it wasn't sure what the patient said. Now, when confidence is low, we confirm: "I want to make sure I understood correctly - did you say the pain is in your chest or your back?"
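The confirmation behaviour above can be sketched as a simple confidence gate. This is an illustrative sketch only - the `TranscriptSegment` type, the threshold value, and the `alternatives` list are assumptions for the example, not Medelic's actual implementation:

```python
# Hypothetical sketch of confidence-gated confirmation for ASR output.
# All names and the 0.85 threshold are illustrative assumptions.
from dataclasses import dataclass, field

CONFIRM_THRESHOLD = 0.85  # below this, confirm with the patient before proceeding

@dataclass
class TranscriptSegment:
    text: str
    confidence: float                      # ASR confidence, 0.0 to 1.0
    alternatives: list = field(default_factory=list)  # other plausible transcriptions

def next_prompt(segment: TranscriptSegment) -> str:
    """Proceed silently on high confidence; otherwise ask the patient to confirm."""
    if segment.confidence >= CONFIRM_THRESHOLD:
        return "PROCEED"
    if segment.alternatives:
        alt = segment.alternatives[0]
        return (f"I want to make sure I understood correctly - "
                f"did you say {segment.text!r} or {alt!r}?")
    return f"I want to make sure I understood correctly - did you say {segment.text!r}?"
```

The key design point is that low confidence never silently fails: the system either proceeds with a clear transcript or explicitly offers the candidates back to the patient.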
"The breakthrough came when we stopped thinking about it as a chatbot and started thinking about it as a really good listener with medical training. The technology is in service of the conversation, not the other way around."
Red Flag Detection
The most critical capability for any triage system is reliably identifying clinical red flags - symptoms that indicate potentially serious conditions requiring urgent attention. Our approach uses multiple layers:
- Keyword monitoring - certain words and phrases trigger immediate alerts regardless of context
- Pattern recognition - trained on thousands of cases to recognise concerning symptom combinations
- Explicit safety questions - structured questions for high-risk presentations (e.g., chest pain always triggers questions about radiation, breathlessness, sweating)
- Tone analysis - detecting signs of distress or deterioration in the patient's voice
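As a rough sketch, the keyword-monitoring and explicit-safety-question layers above might compose like this. The keywords, presentations, and follow-up questions are illustrative placeholders, not a clinical rule set, and the real system adds the pattern-recognition and tone-analysis layers on top:

```python
# Illustrative two-layer red-flag screen; not a clinical rule set.
# Layer 1: keywords that escalate regardless of surrounding context.
RED_FLAG_KEYWORDS = {"crushing chest pain", "can't breathe", "slurred speech"}

# Layer 2: high-risk presentations that always trigger structured safety questions
# (example: chest pain always asks about radiation, breathlessness, sweating).
SAFETY_QUESTIONS = {
    "chest pain": [
        "Does the pain spread to your arm, jaw or back?",
        "Are you breathless?",
        "Are you sweating or feeling clammy?",
    ],
}

def screen(utterance: str) -> tuple[str, list[str]]:
    """Return an action and any structured safety questions to ask next."""
    text = utterance.lower()
    # Keyword monitoring fires first, regardless of context.
    if any(keyword in text for keyword in RED_FLAG_KEYWORDS):
        return ("ESCALATE", [])
    # High-risk presentations trigger the explicit safety questions.
    for presentation, questions in SAFETY_QUESTIONS.items():
        if presentation in text:
            return ("ASK_SAFETY_QUESTIONS", questions)
    return ("CONTINUE", [])
```

Ordering matters here: the unconditional keyword layer is checked before anything else, so an utterance like "crushing chest pain" escalates immediately rather than falling through to follow-up questions.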
In our validation studies, red flag detection sensitivity is 100% - we have never missed a case that required emergency escalation. Specificity is lower (we over-triage rather than under-triage), but that's the appropriate trade-off for patient safety.
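For readers less familiar with the metrics, sensitivity and specificity fall straight out of the confusion counts, and over-triage shows up as false positives that lower specificity without touching sensitivity. The counts below are invented purely for illustration:

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Compute triage performance from confusion counts.

    tp: true emergencies escalated      fn: true emergencies missed
    tn: non-emergencies not escalated   fp: non-emergencies escalated (over-triage)
    """
    sensitivity = tp / (tp + fn)  # share of real emergencies caught
    specificity = tn / (tn + fp)  # share of non-emergencies correctly not escalated
    return sensitivity, specificity

# Made-up example: zero missed emergencies (fn=0) keeps sensitivity at 1.0,
# while 160 over-triaged calls (fp) pull specificity down to ~0.83.
sens, spec = sensitivity_specificity(tp=40, fn=0, tn=800, fp=160)
```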
Handling Multiple Languages
The NHS serves patients who speak over 300 languages. While we can't support all of them, we've built capabilities for the most common languages in our pilot areas, including Urdu, Punjabi, Bengali, and Polish.
Multilingual support isn't just about translation - it's about cultural competence. Medical concepts don't translate directly; symptoms are described differently in different cultures; and the appropriate level of formality varies. We work with native speakers and cultural consultants to ensure our conversations are natural and appropriate in each language.
What's Next
We're continuing to improve our voice AI in several areas:
- Better handling of background noise and poor-quality connections
- Support for more languages and dialects
- Improved detection of cognitive impairment and safeguarding concerns
- Integration with video for visual assessment when appropriate
The goal isn't to replace human clinicians - it's to handle the initial contact so clinicians can focus on the patients who need them most. Every call our AI handles well is time that a practice nurse or GP can spend with a complex patient who needs their expertise.
Want to hear Medelic in action?
We can arrange a live demonstration of our voice AI conducting a simulated triage call.
Request a Demo