Interview: Speech Recognition in Telephony

Many people already use voice assistants such as Alexa, Siri or Cortana, and automatic transcription of spoken content into written text now works almost flawlessly. In telephony, however, such systems and artificial intelligence do not yet seem to be as advanced. Voice commands on the service hotlines of various providers are often poorly recognized and leave callers frustrated, and the functionality is usually limited to very simple selection options. To find out why this is so, what technical solutions already exist, and what possibilities these technologies offer, we interviewed Peter Kugler, CEO of YouCon.

Hello Peter, do you share the perception of many people that speech recognition in telephony often works less well than in other areas?

Yes, I share that perception. I have some experience with the topic – also as a customer – and it has rarely convinced me.

Is this due to the AI and speech recognition solutions used, or are there other reasons?

When we talk about telephony, we usually mean conversations conducted over a cell phone. The amount of information transmitted over this channel is much lower than, for example, with Internet telephony – not to mention the ambient noise. The AI then has practically no chance to do a good job. You can compare it to a low-resolution picture: there, too, it becomes difficult to recognize anything.
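
To make the "low-resolution picture" analogy concrete, here is a minimal Python sketch (using a synthetic signal, not real speech) of what the narrowband phone channel does: resampling wideband 16 kHz audio down to the classic 8 kHz telephone rate discards all detail above roughly 4 kHz before a recognizer ever sees it.

```python
# Minimal sketch: simulate the loss of detail when wideband audio is
# reduced to the 8 kHz narrowband rate of the classic phone network.
import numpy as np
from scipy.signal import resample_poly

fs_wide = 16_000   # wideband rate typical for VoIP / voice assistants
fs_narrow = 8_000  # narrowband rate typical for the phone network

t = np.arange(0, 1.0, 1 / fs_wide)
# Synthetic "speech": a 300 Hz fundamental plus a 5 kHz fricative-like component.
wideband = np.sin(2 * np.pi * 300 * t) + 0.3 * np.sin(2 * np.pi * 5000 * t)

# Downsampling to 8 kHz low-pass filters everything above ~4 kHz,
# so the 5 kHz component is simply gone for the recognizer.
narrowband = resample_poly(wideband, up=1, down=2)

print(f"wideband samples:   {len(wideband)}")
print(f"narrowband samples: {len(narrowband)}")
```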

So the phone line itself is a barrier?

Yes, the utilization of the lines and radio masts doubles every year in terms of data volume, but the infrastructure behind them does not grow at the same pace. Providers therefore try to compress everything as much as possible – reducing the information content – in order to carry more connections over the existing lines.
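
A rough back-of-the-envelope sketch of the operators' incentive (payload bitrates only, ignoring packet overhead; the link size is an illustrative assumption): the lower the codec bitrate, the more simultaneous calls fit over the same link.

```python
# Approximate payload bitrates of common voice codecs, in kbit/s.
CODECS_KBPS = {
    "G.711 (uncompressed PCM)": 64.0,
    "AMR-NB 12.2 (typical mobile)": 12.2,
    "G.729 (heavily compressed)": 8.0,
}

link_kbps = 2_048  # e.g. one E1 trunk, used here purely as an example

for name, rate in CODECS_KBPS.items():
    calls = int(link_kbps // rate)
    print(f"{name:32s} -> ~{calls} simultaneous calls")
```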

Is there a technical solution for this?

There really is. Microsoft, Google and others are starting to run the transcribed conversation through additional algorithms that check the text for grammar and replace missing or misrecognized words. This is, of course, very costly.
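
To illustrate the idea of such post-processing, here is a minimal, self-contained sketch: given alternative word hypotheses from the recognizer, pick the one that best fits the surrounding context. Real systems use large language models; the tiny bigram table and example words here are purely illustrative assumptions.

```python
# Toy bigram scores: how plausible a word is after a given previous word.
BIGRAM_SCORES = {
    ("my", "invoice"): 0.9,
    ("my", "in voice"): 0.1,
    ("cancel", "contract"): 0.8,
    ("cancel", "contact"): 0.2,
}

def pick_best(previous_word: str, candidates: list[str]) -> str:
    """Choose the candidate with the highest bigram score after previous_word."""
    return max(candidates, key=lambda c: BIGRAM_SCORES.get((previous_word, c), 0.0))

# The acoustic model is unsure whether it heard "contact" or "contract".
print(pick_best("cancel", ["contact", "contract"]))  # -> "contract"
```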

Google, for example, has already developed a function that can make appointments by phone. Is this based on a different technical solution?

No, it is the same. The functionality is deliberately restricted to a very narrow task, and that is why it works well.

Is this the reason why most telephone voice systems follow only very simple commands?

Exactly. Free-form conversations without a clear framework are not yet possible in good quality at a reasonable economic cost. The systems focus on single keywords, which then trigger follow-up actions.
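
A minimal sketch of that keyword-driven approach: the system does not understand free speech, it only scans the transcript for known keywords and maps them to a follow-up action. The keywords and action names below are illustrative assumptions, not taken from any specific product.

```python
# Map recognized keywords to follow-up actions in a simple voice menu.
KEYWORD_ACTIONS = {
    "invoice": "route_to_billing",
    "cancel": "route_to_retention",
    "fault": "route_to_technical_support",
}

def handle_utterance(transcript: str) -> str:
    """Return the first matching follow-up action, or fall back to an agent."""
    words = transcript.lower().split()
    for keyword, action in KEYWORD_ACTIONS.items():
        if keyword in words:
            return action
    return "transfer_to_human_agent"

print(handle_utterance("I would like to cancel my subscription"))
# -> "route_to_retention"
```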

What would have to change for more complex topics to be carried out on the phone by artificial intelligence in the future?

A lot still has to change. It is already possible today to teach an AI many specialized topics, but it will be a few years before it can conduct a complex complaint conversation. The basis for this will be much faster hardware on which the AI runs, and the costs must make economic sense for the operator. The AI must be able to respond individually to the caller and apply what it has learned in a “situation-elastic” manner.

Are there use cases where this would make particular sense?

Every large company that produces or sells consumer goods has to deal with a very large number of customers. Depending on the company, this means several million customer contacts per year. So it is obvious that these companies want to automate a large part of their customer contacts.

Thank you very much for the interview.

If you are particularly interested in this topic and would like to ask Peter more questions, we would be happy to hear from you.