
Intelligent Voice Control Applications – All You Need to Do is Talk

Engineers have developed speech recognition algorithms which, combined with large-scale computing power, let designers build technology that responds quickly and precisely to a huge variety of commands.

Intelligent Voice Control

In the quest for a technology that provides an innovative platform for human-technology interaction, engineers have never stopped experimenting. The touchscreen was the product of several such experiments, offering a much more natural and intuitive method of control than the interfaces that preceded it.

However, every technology has limitations, and very few are ideal for every application. Touchscreens add cost to cheap applications, are awkward to use on smaller devices, are a weak point in outdoor installations (exposed to both the environment and vandals), can be a threat to safety, and of course require physical proximity for interaction. To engineers, voice recognition and control looks like an ideal way to overcome these limitations. They have developed speech recognition algorithms which, combined with large-scale computing power, let designers build technology that responds quickly and precisely to a huge variety of commands.

Several important experiments and innovations between the 1950s and the 1990s provided the building blocks for voice control applications:

  • The very first attempt at developing speech recognition algorithms was made by Bell Laboratories in 1952 when it developed a basic system named Audrey. This system could only understand a few numbers that were spoken by specific people.
  • The 1970s saw significant events that paved the way for voice control applications. The DARPA Speech Understanding Research (SUR) program, run by the US Department of Defence, led to the development of Carnegie Mellon’s Harpy system, which had a vocabulary of over 1,000 words and searched a finite-state network of possible sentences.
  • Following this, a statistical modelling technique called the Hidden Markov Model (HMM) was applied to speech. It could estimate the probability that a sequence of sounds formed a word, and expanded the number of words that a computer could learn to several thousand.
  • The next giant technological advance came in 1997 with the launch of Dragon NaturallySpeaking, the first system that could understand natural speech. It could process around 100 words per minute.
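To give a feel for how an HMM scores speech, here is a minimal sketch of the forward algorithm choosing between two toy word models. The sound labels, the probabilities and the helper names (`word_model`, `recognise`) are all invented for illustration; real recognisers work on acoustic features and far larger models.

```python
# Toy sound labels (assumed for illustration; real systems use acoustic features).
SOUNDS = ["y", "eh", "s", "n", "oh"]


def emissions(expected, hit=0.8):
    """Emission table: the expected sound is likely, the rest share the remainder."""
    miss = (1.0 - hit) / (len(SOUNDS) - 1)
    return {sound: (hit if sound == expected else miss) for sound in SOUNDS}


def word_model(sounds):
    """Build a toy left-to-right HMM with one state per expected sound."""
    n = len(sounds)
    trans = []
    for i in range(n):
        row = [0.0] * n
        if i + 1 < n:
            row[i] = 0.5      # stay on the same sound
            row[i + 1] = 0.5  # move on to the next sound
        else:
            row[i] = 1.0      # the final state loops on itself
        trans.append(row)
    return {"trans": trans, "emit": [emissions(s) for s in sounds]}


def forward_likelihood(model, observed):
    """Return P(observed sounds | word model) using the forward algorithm."""
    trans, emit = model["trans"], model["emit"]
    n = len(emit)
    # The model always starts in its first state.
    alpha = [emit[0][observed[0]]] + [0.0] * (n - 1)
    for sound in observed[1:]:
        alpha = [
            emit[j][sound] * sum(alpha[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)


MODELS = {"yes": word_model(["y", "eh", "s"]), "no": word_model(["n", "oh"])}


def recognise(observed):
    """Pick the word whose model gives the observed sounds the highest likelihood."""
    return max(MODELS, key=lambda word: forward_likelihood(MODELS[word], observed))


print(recognise(["y", "eh", "eh", "s"]))
```

Because each state may repeat, the same word model tolerates a drawn-out vowel, which is part of what made the approach practical for speech.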

Even after these projects and innovations, the technology still needed to reach the mainstream through low cost, wide availability and large-scale computing capable of real-time responses for control.

This came more recently from two industry giants, Google and Apple, when Google Search and Siri became the secondary means of control after the touchscreen.

Siri – Apple’s intelligent personal assistant

In 2011, Apple launched Siri, the company’s intelligent digital personal assistant, on the iPhone 4S. Siri added a level of user control to the system, allowing users to call friends, dictate messages or play music by voice.

Google’s Voice Search Application

In 2012, Google’s Voice Search app, originally developed for Apple’s iPhone, took advantage of the phone’s inherent connectivity to compare search phrases against the host of data from user searches that the company had accumulated in the cloud. The ability to compare with previous searches greatly improved accuracy, as it allowed the AI to better understand the context of the search.

Industry analysts ABI Research estimate that 120 million voice-enabled devices will be shipped annually, and that voice control will be a key user interface for the smart home, by 2021. These findings have made speech recognition and control a viable option for designers and hobbyists planning their next design. Designers can build a system that operates offline with relatively limited functionality, or offer a more comprehensive instruction set through a connection to the cloud.
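As a rough illustration of the offline option, the sketch below matches a transcript against a small, fixed command set. It assumes a separate speech-to-text step has already produced the text; the phrases, the action names and the `match_command` helper are hypothetical, and the fuzzy matching comes from Python's standard `difflib` module.

```python
import difflib

# Hypothetical command set: spoken phrase -> action name (both invented here).
COMMANDS = {
    "lights on": "LIGHTS_ON",
    "lights off": "LIGHTS_OFF",
    "play music": "PLAY_MUSIC",
    "stop music": "STOP_MUSIC",
}


def match_command(transcript, cutoff=0.75):
    """Return the action for the closest known phrase, or None if nothing is close."""
    # Normalise case and whitespace before comparing.
    phrase = " ".join(transcript.lower().split())
    best = difflib.get_close_matches(phrase, list(COMMANDS), n=1, cutoff=cutoff)
    return COMMANDS[best[0]] if best else None


print(match_command("Light on"))
print(match_command("what is the weather"))
```

A short fixed vocabulary keeps the matching cheap enough for a small board, but anything outside the list is simply rejected, which is the limited functionality that a cloud connection is meant to overcome.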

Most large cloud providers, including Amazon and Google, take the cloud-connected approach and offer speech tools that are relatively cheap to incorporate into designs.

They offer an ecosystem that includes some of the most respected developers of home automation products. Amazon’s partners include Nexia, Philips Hue, Cree, Osram, Belkin and Samsung. Google shares many of the same partners; its list includes Hive, Nest, Nvidia, Philips Hue and Belkin.

To help designers incorporate voice control into their products, Amazon and Google offer access to their platforms at relatively low cost. Amazon’s Alexa Voice Service (AVS) allows developers to integrate Alexa directly into their products. Similarly, Google lets developers use the functionality of its Google Assistant intelligent digital personal assistant through an SDK.

For those who would rather use an open-source interface, other options are available. For example, Mycroft is a free and open-source intelligent personal assistant for Linux-based operating systems that uses a natural language user interface. Mycroft is also modular, allowing users to change its components. Jasper is another open-source option that allows developers to easily add new functionality to the software.

As for hardware, the most likely host will be a single-board computer such as a Raspberry Pi. Some boards have been designed specifically for voice control applications, such as the Matrix Creator board, which can work as a Raspberry Pi HAT or as a stand-alone unit. The board has an array of seven MEMS microphones to give a 360° listening field. It is powered by an ARM Cortex-M3 with 64 Mbit of SDRAM. A variety of sensors are also incorporated to allow developers to add functionality.

Microphones are also a very important design consideration. Multiple microphones, usually arranged in an array, are often used to capture a more accurate representation of the sound. If the technology to stitch together the sound from the various microphones is not built into the array, extra design work and processing power may be required. Noise reduction technology is also extremely important to ensure instructions are received accurately.
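One common way of stitching microphone signals together is delay-and-sum beamforming: each channel is shifted so the speech lines up, then the channels are averaged. The sketch below is a simplified illustration, not a description of how any particular board does it. It assumes the delays are already known and are whole numbers of samples, and the `delay_and_sum` name is invented for this example.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its delay (in samples) and average the result.

    channels: list of equal-rate sample lists, one per microphone.
    delays:   how many samples later the sound reached each microphone
              (assumed known and non-negative for this sketch).
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # Keep only the samples for which every shifted channel still has data.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(length)
    ]


speech = [0, 1, 0, -1, 0, 1, 0, -1]
mic_a = speech + [0, 0]   # hears the sound immediately
mic_b = [0, 0] + speech   # hears the same sound two samples later
aligned = delay_and_sum([mic_a, mic_b], [0, 2])
print(aligned)
```

Sound arriving from the chosen direction adds up coherently, while noise that differs from one microphone to the next tends to average out. In practice the delays have to be estimated, which is part of the extra processing mentioned above.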


The quest for the most intuitive possible interface between humans and machines shows no sign of ending. While no interface can match the instinctive way that humans communicate with each other, voice control is almost at a stage where the process feels nearly as natural as talking to another human. Because much of the hard processing is done in the cloud, the hardware required is not as demanding as you might think. Specialised boards, tools and services are also widely available to simplify the process, making it possible to add voice control to almost any project.


Cliff Ortmeyer, Global Head of Solutions Development, Premier Farnell


Jyoti Gazmer

