VUI Design Guide (Voice-User Interface)

Good morning…

  • “Alexa, set a reminder for 11pm”
  • “No, set a reminder for 11pm!”
  • “I’m sorry, I can’t help with that”
  • *Sets alarm manually*

“If you’ve ever been to a major city anywhere in the world, you’ll be aware of how diverse and varied our voices, mannerisms, dialects, phrasing and intentions are. Language is a complex system of communication that even the best of us can sometimes struggle to understand.”

For computers, understanding natural language has been a real challenge. “Natural language” describes how people talk under normal settings and circumstances, and it is the term used by designers creating natural language processing software capable of understanding human speech.

Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyse large amounts of natural language data. The field has been around since the 1950s.

Now, the first really successful VUI, or voice user interface, implemented in a mass-market product is arguably Amazon’s Alexa, launched with the Amazon Echo in November 2014. To date, over 100 million Alexa-enabled devices have been sold. Perhaps the second most popular VUI device on the market is the Google Home, with over 52 million units sold since its release in 2016. In that time the technology has come a really long way:

  • Major speech recognition platforms have achieved accuracy over 95%, which is on par with humans
  • 50% of all searches will be voice searches by 2020 – comScore
  • 65% of people who own an Amazon Echo or Google Home can’t imagine going back to the days before they had a smart speaker – GeoMarketing
  • 72% of people who own a voice-activated speaker say their devices are used as part of their daily routines – Google

So I think it’s no real stretch to say that VUIs are fast becoming an effective way for humans to interact with their devices. End-users have shown that they are willing to adapt to the technology, and that they are as likely to use it as traditional search methods. So how can we as designers embrace this and begin to implement it in our designs?

Planning – constraints, use cases and target markets

The successful implementation of any product starts with planning and an effective definition of the constraints you are designing to. VUIs are no different: defining who your target market is and how they will most likely use your product’s VUI is critical, as it will shape the terminology of your voice commands. And before you can define your interaction design, you must first define the environmental context that frames the voice interaction.

Define the voice genre

  • Phone
  • Wearables
  • Stationary connected smart device
  • Non-stationary computing device


Phone

  • The big players are Apple, Google and Samsung, so you should focus testing on these platforms initially
  • Connectivity — Cellular networks (3G/4G), wifi, paired devices, Bluetooth
  • Environmental variability is greatest with phones, as people can conceivably take their phones anywhere
  • Thanks to “Siri”, users are well accustomed to using voice interfaces
  • Allows users to interact via audio and visuals, with manual input or selection from multiple choices
  • The method of interaction is fairly standard across the industry, with set user flows that users are accustomed to


Wearables

  • Tighter use-case parameters, as wearables tend to be focused around specific devices such as smartwatches or fitness trackers
  • Connectivity — Cellular networks (3G/4G), wifi, paired devices, Bluetooth
  • Limited standardisation across the industry for voice interactions, and users may not be accustomed to talking to their watch in public. Don’t assume users have a pre-existing understanding of user flows.
  • More variability in user interaction: some devices have screens with input via touch screen or buttons, and devices may allow visual, audio or tactile feedback.
  • Lower-end devices depend on connectivity to another device for core functionality. Higher-end devices may have their own cellular connection and can operate more independently, but don’t assume this.

Stationary Connected Devices

  • Amazon Echos and Google Homes fall into this category, as do computers, TVs, smart thermostats, appliances with screens, smart sound systems and speakers, and other smart home hub systems.
  • Connectivity — Wired networks (ethernet), wifi, paired devices, Bluetooth
  • Because these devices are stationary, users adjust to using them under consistent circumstances, and usage can become highly habitual.
  • Standardisation can be expected within a genre of device (smart home hubs will behave similarly to one another, as will computers), but expect little standardisation across device types. Users have also adapted to using each device type differently, so avoid trying to standardise across types; take a tailored approach where applicable.

Non-Stationary Computing Devices (Non-Phones)

  • Laptops, tablets and in-car connected entertainment systems
  • Connectivity — Wifi, paired devices, Bluetooth; rarely cellular connections
  • Primary input mode is typically not voice
  • Environmental context has a substantial impact on voice interactivity
  • Typically have unstandardized voice interaction methods between device genres

Once you have defined the who, what, where and when, you can start to define the voice input UX.

Voice Input UX

To understand users’ underlying expectations of voice interfaces, we must understand the principles that govern human communication. In other words, we need to take a look in the mirror before we can determine what makes a design of this type resonate with users or end up frustrating them, remembering, too, that users will become very frustrated very quickly if things go wrong.

1. Keep it simple

Short phrases are easier for any natural language processor to understand, so when creating voice commands make them short, clear and to the point: “Alexa, turn off the living room lights”. This makes things easier for both the user and the VCD (voice command device).
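As a minimal sketch of this idea (in Python, with hypothetical device and command names, not any real smart-home API), a small vocabulary of short, fixed commands might be matched like this:

```python
# Illustrative sketch only: short, exact commands mapped to device actions.
# Device identifiers and command phrasings here are made up for the example.
COMMANDS = {
    "turn off living room lights": ("lights.living_room", "off"),
    "turn on living room lights": ("lights.living_room", "on"),
}

def handle_utterance(utterance: str):
    """Map a normalised utterance to a (device, action) pair, or None."""
    key = utterance.lower().strip().rstrip(".!?")
    return COMMANDS.get(key)
```

A short, exact command resolves immediately; a long, compound request (“dim the lights slowly over an hour”) simply fails to match, which is the point — the shorter and more constrained the vocabulary, the less the recogniser has to get right.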

2. Keep things simple

I know I’m repeating myself, but really boil down what you are trying to achieve to as few steps as possible. That is, if you are designing a smart thermostat, be really critical about which features you make voice-controlled. Turning the temperature up and down by voice? Definitely achievable. Setting the temperature to a specific value, e.g. 24 degrees? Also very achievable. Programming specific heating schedules with multiple temperature changes, shut-off times and geofencing triggers when you leave a certain distance from your home? No. Too complicated. Again, keep it simple.

3. Prompt with guidance for the user

When building your VUI, build in prompts where necessary or when a user is presented with a choice. Again, KEEP IT SIMPLE. Where possible, make these choices binary, as users will struggle to keep more than two options in their head.
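A binary prompt can be sketched as a tiny classifier over the user's reply: confirm, deny, or re-prompt with the same two options. This is an illustrative sketch, not a real voice-platform API; the accepted reply words are assumptions you would refine through user testing.

```python
# Sketch of handling a binary prompt such as
# "Do you want me to set the reminder? Yes or no."
YES = {"yes", "yeah", "yep", "sure", "ok"}
NO = {"no", "nope", "cancel"}

def binary_prompt(reply: str) -> str:
    """Classify a reply to a yes/no prompt."""
    word = reply.lower().strip().rstrip(".!?")
    if word in YES:
        return "confirm"
    if word in NO:
        return "deny"
    # Anything else: repeat the same two choices rather than guessing.
    return "reprompt"
```

The key design choice is the fallback: an unrecognised reply repeats the two options instead of picking one, so the user is never committed to an action they didn't clearly confirm.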

4. Consider parallel language

Okay, without going too deep into the theory: parallel language is where two languages are considered equal in a particular domain, and the choice of language depends on what is deemed most appropriate and efficient in a specific situation. In its simplest form, this could be the difference between American and British English phrasings. In more complex situations, it could be a country where people speak both English and a native language as their main languages and use them interchangeably. So again, consider the context and the end-user, and exercise some design thinking. It may be that through user testing you find that end-users use several different phrases for the same command. Where possible, build this into your VUI.
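In practice, supporting several phrasings for one command often comes down to registering all the observed variants against a single canonical intent. The phrasings and intent name below are hypothetical examples of what user testing might surface, not a real platform schema:

```python
# Illustrative: several phrasings (e.g. collected in user testing)
# all resolve to one canonical intent. Names are made up for the sketch.
SYNONYMS = {
    "switch off the lights": "LightsOffIntent",
    "turn the lights off": "LightsOffIntent",
    "kill the lights": "LightsOffIntent",
    "lights off": "LightsOffIntent",
}

def resolve_intent(utterance: str):
    """Return the canonical intent for an utterance, or None."""
    return SYNONYMS.get(utterance.lower().strip())
```

Platforms such as the Alexa Skills Kit express the same idea as multiple “sample utterances” per intent; the sketch above is just the underlying mapping made explicit.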

5. Give users credit.

People know how to talk. Don’t put words in their mouths. In general, people know what they want to do and what they want to say naturally to get that task done. Don’t get in the way of this. User studies are a great way to work this out if you are unsure.

Voice command technology still has a way to go, and as it is still in its infancy we don’t yet have clear standardisation of user flows, interaction design or even device capabilities. Users have shown that they want this technology and are more than willing to persevere. In time, we hope to see the UX standardise across platforms and devices, as naturally happened with app design.

With all that said, the technology is by no means experimental, nor is it a gimmick. The future will see an increase in voice user interface devices, and users will come to expect to be able to talk to their devices. If you would like to implement a voice user interface in your product, or would like to know more about how you could be leveraging voice user interfaces, get in touch with us today. Here at Detekt, we have the expertise and experience to guide you through your journey with voice user interfaces.