Voice as an Interface

Machine learning on edge devices

In the past, using voice as an interface was a terrible experience that felt more like a gamble than precise control. With recent improvements in speech recognition and natural language understanding, however, smart assistants have conquered our homes. But can a voice interface actually be useful, and more than just a fun toy to show off? Why would anyone prefer it over a simple switch or button? Can’t a smartphone app accomplish the same task more easily and more accurately?

Often this will be the case, but I believe that voice can be an important interface if applied to the correct tasks.

1. Hands-free operation

Everyday life: In many activities and circumstances, people are simply not able to use their hands as an interface. We usually don’t think of these situations until they happen. Imagine carrying two heavy bags of groceries and trying to turn the light on. There are many more activities where being able to execute simple commands hands-free can be very useful. Think of driving a car, riding a bicycle or running.

Emergencies: In emergency situations it might not be possible for a person to move. Imagine an elderly person who has fallen and can’t get up. Being able to call for help by voice might be lifesaving.

Jobs: Another class of use cases occurs in people’s everyday jobs. Many professions require working with your hands. If workers have to put things down to adjust or execute something and then pick them back up, productivity goes down. And in business, being efficient saves money.

Medical: In medical environments, staff often have to change gloves after a simple action like pushing a button. A hands-free solution can save considerable cost and time.

In all of these cases, even being able to recognize simple commands can be useful. Systems that do this are available now and can run locally on small, inexpensive embedded devices like the Raspberry Pi, or even on microcontrollers.

2. Faster or more convenient than typing

“Jarvis, play me X from Artist Y”

“Jarvis, add milk to the shopping list”

We know these kinds of commands from the omnipresent smart assistants, and these are the applications where they really shine. Within narrow domains (skills), users can use their voice to execute semi-complex commands. This is often faster and more convenient than picking up a smartphone, opening the correct app and entering data.
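To make the idea of a narrow domain concrete, here is a minimal sketch of how a transcribed utterance could be mapped to an intent and its parameters. It assumes the speech has already been converted to text. The intent names (`play_music`, `add_to_list`) and the regular expressions are hypothetical and only cover the two example sentences above; a real assistant would use a trained natural language understanding model instead of hand-written patterns.

```python
import re
from typing import Optional

# Wake word taken from the example commands above.
WAKE_WORD = "jarvis"

# Hypothetical intents for illustration only; each pattern describes one
# narrow "skill" and names the parameters (slots) it expects.
INTENT_PATTERNS = {
    "play_music": re.compile(r"^play (?:me )?(?P<title>.+?) from (?P<artist>.+)$"),
    "add_to_list": re.compile(r"^add (?P<item>.+?) to the (?P<list>.+?) list$"),
}


def parse_command(transcript: str) -> Optional[dict]:
    """Map transcribed text to an intent and its slots, or None if unknown."""
    # Normalize: lowercase and drop trailing punctuation.
    text = transcript.strip().lower().rstrip(".!?")
    # Ignore anything that is not addressed to the assistant.
    if not text.startswith(WAKE_WORD):
        return None
    # Remove the wake word and the comma or space that follows it.
    command = text[len(WAKE_WORD):].lstrip(" ,")
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.match(command)
        if match:
            return {"intent": intent, "slots": match.groupdict()}
    return None


if __name__ == "__main__":
    print(parse_command("Jarvis, play me X from Artist Y"))
    print(parse_command("Jarvis, add milk to the shopping list"))
```

Because the text is lowercased before matching, the extracted slot values are lowercase as well. The point of the sketch is that a small, fixed set of patterns is enough for a single skill, which is why narrow domains work well today.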

3. People not used to technology

Young people in the first world often forget that not everybody is used to today’s technology. Yes, there are lots of people who have never had a smartphone or computer. But they will most likely have used their voice before.

4. Disabled people

For some people, accurately entering data with their hands is simply not possible. The most extreme cases are having no arms or being unable to move them, for example due to tetraplegia.

But even less severe impairments make traditional input methods difficult. Parkinson’s disease makes it hard to type long texts on a smartphone or a keyboard. For blind people, it can be hard to find a physical button, especially in unfamiliar environments. Even minor impairments like damaged nerves in the fingers can make using a smartphone cumbersome.

5. Use common interfaces

In 2018, Google showed an impressive demo of the Google Assistant calling restaurants and hair salons to make appointments. Of course, it would be much easier for the assistant if the restaurant’s website had a standardized reservation API. But the vast majority of small businesses, even in modern cities, have neither the time, the money nor the intent to provide one.

Using an already established interface (the phone) is much easier than making everyone and their dog switch over to some obscure new technology they have never heard of. If you look at small businesses, most of them won’t even have a website.

We can’t expect every small business to adopt the latest web API. Using voice as a common interface can bring modern services to everyday people.

6. Somebody to talk to

Voice has been a natural way for humans to interact for a long time. Right now, voice assistants are still limited to a set of simple tasks. But with speech-to-text and text-to-speech nearing human performance, and natural language understanding making big leaps, voice is here to stay.

It won’t be long until using a voice interface feels as natural as talking to a real person. Time will tell whether humans want to build a connection to machines, and whether they will, but movies like Her give us a glimpse of one possible future.

