By Lily Clifford
“Your call is very important to us. Please hold.” Can you hear that in your head? Curious why it’s so cringey?
Here’s the science. A lot comes down to prosody—the musicality of speech that gives every conversation its distinctive rhythm, pitch, and pauses.
As humans, we like familiar prosody. That’s why an old friend’s voice is so comforting. And why we’re distrustful of salespeople who sound fake. Are they hiding something?
Prosody involves nuanced elements like pitch contours, temporal patterns, and strategic pauses. We process them unconsciously, and they dramatically affect how we feel about what we hear.
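For the curious, here is a rough sketch of how two of those elements, pitch and pauses, can be measured from audio. It is a toy illustration in Python, not a description of how Rime processes calls: the sample rate, frame size, silence threshold, and every function name are assumptions made for the example, and sine tones stand in for recorded speech. Real voices need a sturdier pitch tracker than plain autocorrelation.

```python
import math

SAMPLE_RATE = 8000   # Hz; assumed, roughly telephone quality
FRAME = 400          # samples per frame (50 ms); assumed


def tone(freq_hz, seconds):
    """Stand-in for recorded speech: a sine tone, or silence if freq_hz is 0."""
    n = round(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) if freq_hz else 0.0
            for i in range(n)]


def frame_pitch(frame, fmin=75, fmax=400, silence=1e-4):
    """Estimate one frame's pitch in Hz by autocorrelation; None if silent."""
    energy = sum(x * x for x in frame) / len(frame)
    if energy < silence:
        return None
    best_lag, best_score = None, 0.0
    # Try every lag whose frequency falls between fmax and fmin.
    for lag in range(SAMPLE_RATE // fmax, SAMPLE_RATE // fmin + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return SAMPLE_RATE / best_lag if best_lag else None


def pitch_contour(samples):
    """Pitch per 50 ms frame: the kind of line a pitch plot draws."""
    return [frame_pitch(samples[s:s + FRAME])
            for s in range(0, len(samples) - FRAME + 1, FRAME)]


def pause_lengths(contour):
    """Length in seconds of each run of silent (None) frames."""
    pauses, run = [], 0
    for pitch in contour + [0.0]:   # sentinel closes a trailing pause
        if pitch is None:
            run += 1
        else:
            if run:
                pauses.append(run * FRAME / SAMPLE_RATE)
            run = 0
    return pauses


# A made-up "utterance": a higher tone, a short pause, then a lower tone.
speech = tone(200, 0.2) + tone(0, 0.1) + tone(160, 0.2)
contour = pitch_contour(speech)
print(contour)
print(pause_lengths(contour))
```

The contour falls from the first tone to the second with a gap in between, which is the shape a listener would hear as a phrase that drops in pitch after a brief pause.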
So if the prosody of the training data for an AI voice model isn’t context-appropriate, the voices will sound off. Many of our competitors’ models are trained on polished podcast data, resulting in an upbeat, voiceover-announcer style. While that energetic cadence works great for commercials, it can feel out of place in customer service interactions, where familiarity and approachability are key.
On top of that, when everyone trains on the same publicly available data that’s easily scraped from online sources, all the voices start to sound the same. And you risk boring your customers, or worse, making them cringe.
Rime’s unique approach to data collection—recording live conversations with everyday people in our private studio—harnesses the natural flow of conversation, ensuring our digital voices sound authentic. That in turn boosts customer engagement and conversion metrics like CSAT and call success.
By the way, the screenshot is from a live Rime call! The blue line tracks the pitch, one piece of the prosody.
So what do you think? Would you feel more comfortable talking to a bot on the phone if it sounded like a friend?