Read time: 8 minutes

Digital Transformation

April 19, 2024

The Transformative Potential of Voice User Interfaces

Voice User Interface (VUI) is a technology that allows users to interact with a system through voice or speech commands. Over the last decade it has evolved not only to understand but also to anticipate user needs, enabling more personalised and emotionally intelligent interactions.

Essential to the development of smart homes, virtual assistants and a myriad of mobile applications, VUIs make it possible to control devices and access information without the need for physical touch or sight. Virtual assistants such as Amazon's Alexa and Apple's Siri, which can set reminders, play music or control smart home devices, are prime examples of how VUI technology has revolutionised the way we interact with our devices and become an integral part of daily life (Kakade, 2018).

The benefits of VUIs are broad: they offer an intuitive, hands-free mode of operation, enhance accessibility for people with physical or visual impairments, and can lead to more efficient task completion by eliminating the need to navigate complex menus. However, the VUI design landscape is continually evolving, presenting new opportunities and many new challenges.

Its Current State

Until now, VUI design has focused primarily on usability and accessibility, with designers aiming to create experiences that are intuitive, responsive and capable of understanding natural language commands (Gachko, 2024). For example, a user can ask their smartphone to 'call mum' or 'play the latest news', and the VUI system carries out these actions without any manual input. However, existing VUI systems often lack contextual awareness, which can lead to frustrating interactions. A case in point is a user asking for directions while driving: if the VUI fails to consider the current location or traffic conditions, it may suggest outdated or needlessly long routes.
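To make that concrete, here is a deliberately simplified Python sketch of how a VUI might map an utterance to an intent and use situational context, such as the user's location, to shape its reply. Every name in it (the Context class, handle_utterance, the keyword rules) is hypothetical and stands in for the far richer speech-recognition, language-understanding and mapping services a real assistant would use.

```python
# A purely illustrative sketch of intent handling in a VUI. The class, function
# and keyword rules below are hypothetical, not any real assistant's code.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    """Situational signals a contextually aware VUI could draw on."""
    location: Optional[str] = None
    is_driving: bool = False


def handle_utterance(utterance: str, ctx: Context) -> str:
    text = utterance.lower()
    if "call" in text:
        contact = text.split("call", 1)[1].strip() or "unknown contact"
        return f"Calling {contact}."
    if "directions" in text or "navigate" in text:
        if ctx.location is None:
            # Without context the assistant falls back to a generic question,
            # the frustrating failure mode described above.
            return "Where are you starting from?"
        mode = "fastest driving route" if ctx.is_driving else "walking route"
        return f"Finding the {mode} from {ctx.location}."
    return "Sorry, I didn't catch that."


print(handle_utterance("Call mum", Context()))
print(handle_utterance("Directions to the office", Context(location="Leeds", is_driving=True)))
```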

Additionally, VUI design often falls short in terms of personalisation and customisation; for instance, users with distinctive speech patterns or accents may find the system's responses less accurate. Emotional engagement is another area where VUIs typically underperform, as most systems cannot detect subtleties in a user's mood or tone, missing cues that call for a more empathetic response.

The Voice of the Future

Looking ahead, VUI design holds immense promise. Advances in artificial intelligence and machine learning will enable VUI systems to become more contextually aware and more empathetic towards the user. Developments in Large Language Models (LLMs) have already revolutionised human-computer interaction by enabling more natural and intuitive communication, bridging the gap between human language and machine understanding. Underpinned by AI, these systems will not only understand what users say but also anticipate their needs and emotions, leading to more meaningful and satisfying interactions. This could lead to an overhaul in how we run our homes, businesses and vehicles, not to mention opening up new levels of accessibility for everyone.

A noteworthy example of the potential offered by VUI is ChatGPT's introduction of voice and image capabilities. This enhancement to the AI language model facilitates dual-mode interaction, allowing users to hold verbal dialogues with the AI as well as supply images to contextualise their enquiries. From an educational perspective, for instance, the integration of image recognition and voice interaction offers a collaborative platform for problem-solving; in disciplines such as mathematics, visual and auditory cues can significantly enhance comprehension (Cervini and Molaschi, 2023).

ChatGPT's leap into VUI signifies a major shift in how AI platforms serve us, moving seamlessly from text-based interfaces to a multimodal approach that embraces both voice and visual elements, making technology adapt to human behaviour rather than the other way around.
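As a rough, assumption-laden illustration of what such a voice-plus-image exchange can look like from a developer's side, the sketch below uses the OpenAI Python SDK (v1.x) to transcribe a spoken question and send it to a multimodal model together with an image, before turning the answer back into speech. The model names, file names and URL are placeholders chosen for the example; this is not a description of how ChatGPT itself is built.

```python
# A minimal sketch of a voice + image round trip with the OpenAI Python SDK.
# Model names, file names and the image URL are assumed placeholders.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# 1. Speech in: transcribe the user's spoken question.
with open("question.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2. Ask a vision-capable chat model, pairing the transcribed question with an
#    image (e.g. a photographed maths worksheet) for context.
response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable chat model; name assumed for the example
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": transcript.text},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/worksheet.png"}},
        ],
    }],
)
answer = response.choices[0].message.content

# 3. Speech out: turn the answer back into audio for a fully voice-driven loop.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
speech.write_to_file("answer.mp3")

print(answer)
```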

How to Adapt

One of the key areas where design will play a significant role in the future of VUI is personalisation. Just as graphical user interfaces (GUIs) have evolved to tailor experiences to user preferences (something we see especially in the gaming industry), VUI will follow suit. Designers will need to create interfaces that adapt to individual users' speech patterns, preferences and even moods, providing a truly human-centric and intuitively customisable experience (Diaz, 2023).
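What might preference-driven adaptation look like under the hood? The sketch below is a purely hypothetical illustration: it stores a handful of user preferences and uses them to shape a spoken reply. The profile fields and rules are invented for the example rather than drawn from any real product.

```python
# Hypothetical sketch of preference-driven personalisation in a VUI. The profile
# fields and the adaptation rules are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class UserProfile:
    name: str
    verbosity: str = "normal"   # "brief" | "normal" | "detailed"
    speech_rate: float = 1.0    # text-to-speech playback speed multiplier


def personalise(reply: str, profile: UserProfile) -> dict:
    """Shape a raw reply to match the user's stored preferences."""
    if profile.verbosity == "brief":
        reply = reply.split(". ")[0].rstrip(".") + "."   # keep only the first sentence
    return {
        "text": f"{profile.name}, {reply[0].lower()}{reply[1:]}",
        "tts_rate": profile.speech_rate,
    }


profile = UserProfile(name="Sam", verbosity="brief", speech_rate=1.2)
print(personalise("Your meeting starts at 10am. It is in room 4B.", profile))
```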

What About EQ?

Emotional intelligence is something even humans struggle with from time to time, as every situation calls for its own response; even on film, AI is rarely depicted convincingly and often comes across as 'robotic' or fake (Naskar, 2023). However, as AI becomes more adept at understanding human emotions, we can expect VUI systems to respond empathetically, offering support, encouragement or even humour when appropriate. Designers will need to imbue these systems with emotional sensitivity, ensuring that interactions feel not just functional but genuinely human.
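As a toy illustration of the direction of travel, the sketch below prefixes a reply with an empathetic opener chosen by a crude keyword-based mood detector. A production system would rely on much richer signals, such as tone of voice and conversation history; everything here is an assumption made for the sake of the example.

```python
# Illustrative sketch of routing a VUI reply by detected mood. The keyword-based
# classifier is a stand-in for a real emotion-recognition model; all names are
# hypothetical.
def detect_mood(utterance: str) -> str:
    text = utterance.lower()
    if any(w in text for w in ("frustrated", "annoyed", "not working", "again")):
        return "frustrated"
    if any(w in text for w in ("great", "thanks", "love")):
        return "positive"
    return "neutral"


EMPATHETIC_OPENERS = {
    "frustrated": "Sorry about that, let's sort it out together. ",
    "positive": "Glad to hear it! ",
    "neutral": "",
}


def respond(utterance: str, answer: str) -> str:
    """Prefix the factual answer with a tone suited to the user's mood."""
    return EMPATHETIC_OPENERS[detect_mood(utterance)] + answer


print(respond("The timer is not working again", "Restarting your kitchen timer now."))
```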

Integration with Multimodal Interfaces

Another exciting prospect within the field of user experience design is the integration of VUI design with multimodal interfaces (Reeves et al., 2004). As technology continues to advance, VUI will not exist in isolation but rather as part of a broader ecosystem that includes graphical, tactile and even olfactory interfaces (Cornelio, Velasco and Obrist, 2021). The Apple Vision Pro, for example, uses hand gestures, eye movements and spoken commands to navigate through applications and carry out tasks.

Combining these modalities fluidly requires thoughtful design to ensure that transitions between modes are smooth and intuitive. Designers are therefore tasked with harmonising these different interaction modes within a single interface, which demands a deep understanding of how users engage with each mode separately and in combination. The goal is an ecosystem where switching from voice commands to gestures, for example, feels effortless and natural.

Moreover, it's crucial that these systems adapt to individual users' changing circumstances, whether they're driving a car, cooking in the kitchen or working in a noisy environment, offering the most suitable mode of interaction for any given scenario.
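One hypothetical way to reason about this in code is to derive the preferred input and output modalities from a coarse description of the user's situation, as in the sketch below. The thresholds and categories are invented purely for illustration.

```python
# A hypothetical sketch of choosing interaction modes from the user's context.
# The situation fields, thresholds and rules are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class Situation:
    hands_free_needed: bool   # e.g. driving or cooking
    ambient_noise_db: float   # rough measure of background noise
    screen_available: bool


def choose_modalities(s: Situation) -> dict:
    noisy = s.ambient_noise_db > 70
    return {
        # Voice input is unreliable in loud environments; fall back to gesture/touch.
        "input": "voice" if not noisy else ("gesture" if s.screen_available else "touch"),
        # Spoken output when hands and eyes are busy, otherwise visual output.
        "output": "speech" if s.hands_free_needed or not s.screen_available else "visual",
    }


print(choose_modalities(Situation(hands_free_needed=True, ambient_noise_db=45, screen_available=False)))
print(choose_modalities(Situation(hands_free_needed=False, ambient_noise_db=80, screen_available=True)))
```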

Challenges and Considerations

The future of VUI design won't be without its challenges. Privacy concerns, ethical considerations and issues of inclusivity will need to be carefully navigated, and designers must strike a balance between functionality and respect for users' rights and boundaries. Furthermore, the growing concern around constant screen use suggests a societal shift that VUI may both influence and adapt to, as people seek more natural, inclusive and personalised interactions with technology.

Inclusivity deserves particular attention. Individuals who are hard of hearing, for example, may struggle with standard VUI systems. Designers must innovate to accommodate these users, possibly by enhancing voice recognition to work with assistive devices such as hearing aids, or by pairing auditory responses with visual signalling. Such adaptations ensure that VUI technology remains an inclusive advancement rather than a barrier.
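As a trivially simplified sketch of that last idea, the example below pairs every spoken response with an equivalent visual signal so that the auditory channel is never the only one. The function names and channels are assumptions made for the example.

```python
# Illustrative-only sketch of dual-channel VUI feedback: every spoken message is
# mirrored on screen so users who are hard of hearing are not excluded.
from dataclasses import dataclass


@dataclass
class Response:
    text: str
    priority: str = "normal"   # "normal" | "urgent"


def deliver(response: Response) -> None:
    # Auditory channel: text-to-speech, stubbed here with a print.
    print(f"[speaker] {response.text}")
    # Visual channel: an on-screen caption that mirrors, rather than replaces,
    # the spoken content; urgent messages are emphasised.
    caption = response.text.upper() if response.priority == "urgent" else response.text
    print(f"[screen ] {caption}")


deliver(Response("Your oven timer has finished.", priority="urgent"))
```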

Conclusion

The world of VUI, and it could be a very different world, promises huge technical advances. It could help across all sectors, from education and healthcare to breaking down barriers to diversity and inclusion.

Designers have a huge role to play in navigating the ethical dilemmas VUI will bring to the forefront. How this technology is designed and produced could determine whether it succeeds or fails. Hopefully, designers can create experiences that are not only functional but also emotionally resonant and deeply customisable. By embracing empathy, innovation and inclusivity, we can unlock the full potential of VUI, ushering in a new era of human-machine interaction.

Co-authored by Cat Mellors (Lead Product Designer) & Anusha Gurung (Technical Research Analyst).

References

1. Kakade, S. (2018) Voice User Interface (VUI). Available at: https://www.techtarget.com/searcherp/definition/voice-user-interface-VUI#:~:text=Voice%20user%20interface%20(VUI)%20is,are%20prime%20examples%20of%20VUIs. (Accessed 18 April 2024).

2. Gachko, E. (2024) Voice User Interface (VUI) Design Best Practices. Available at: https://designlab.com/blog/voice-user-interface-design-best-practices (Accessed 18 April 2024).

3. Diaz, I. (2023) The Evolution of User Interface: From GUI to Voice and Gesture Control. Available at: https://apiumhub.com/tech-blog-barcelona/the-evolution-of-user-interfaces/ (Accessed 18 April 2024).

4. Reeves, L.M., Lai, J., Larson, J.A., Oviatt, S., Balaji, T.S., Buisine, S., Collings, P., Cohen, P., Kraal, B., Martin, J.C. and McTear, M. (2004) Guidelines for multimodal user interface design. Communications of the ACM, 47(1), pp. 57-59.

5. Naskar, V. (2023) Why AI Can't Replicate Human Emotional Intelligence and Creativity. Available at: https://medium.com/illumination/why-ai-cant-replicate-human-emotional-intelligence-and-creativity-e5ad137033a5 (Accessed 19 April 2024).

6. Cornelio, P., Velasco, C. and Obrist, M. (2021) Multisensory Integration as per Technological Advances: A Review. Frontiers in Neuroscience. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8257956/ (Accessed 19 April 2024).
