Microsoft researchers have created a technology that recognises the words in a conversation as accurately as humans do.
The team from Microsoft Artificial Intelligence and Research reported a speech recognition system that makes the same or fewer errors than professional transcriptionists.
The researchers reported a word error rate (WER) of 5.9 percent, down from the 6.3 percent WER the team reported just last month.
The 5.9 percent error rate is about equal to that of people who were asked to transcribe the same conversation, and it’s the lowest ever recorded against the industry standard “Switchboard” speech recognition task.
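For readers unfamiliar with the metric, word error rate is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that general definition, not the team's own scoring pipeline: the function name `word_error_rate` and the example sentences are illustrative assumptions, and real evaluations typically normalise punctuation and spelling variants before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn the first i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: one wrong word in a ten-word reference.
reference = "we will meet at the office at nine tomorrow morning"
hypothesis = "we will meet at the office at five tomorrow morning"
print(f"{word_error_rate(reference, hypothesis):.1%}")  # prints 10.0%
```

On this scale, a 5.9 percent WER corresponds to roughly six word errors for every 100 words in the reference transcript.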
To reach the human parity milestone, the team used Microsoft’s Computational Network Toolkit (CNTK).
CNTK’s ability to process deep learning algorithms quickly across multiple computers running specialised chips called graphics processing units (GPUs) vastly improved the speed at which the team could do research and, ultimately, reach human parity.
The milestone means that, for the first time, a computer can recognise the words in a conversation as well as a person would.
In doing so, the team has beaten a goal they set less than a year ago and greatly exceeded everyone else’s expectations as well.
The milestone comes after decades of research in speech recognition, beginning in the early 1970s with DARPA, the US agency tasked with making technology breakthroughs in the interest of national security.
The milestone will have broad implications for consumer and business products that can be significantly augmented by speech recognition. That includes consumer entertainment devices like the Xbox, accessibility tools such as instant speech-to-text transcription and personal digital assistants such as Cortana.