#VOICE

Voice technologies, Voice assistants & Voice as a Service

7 min read · Dec 13, 2020

I’ve seen a growing number of fintechs, banks, and other organizations set up voice innovation labs to experiment with voice-based technologies, whether in fintech, insurance, retail, healthcare or any number of other fields.

Voice itself is an interesting channel? technology? commodity? medium?

Voice is being applied in a variety of forms: as a virtual voice assistant, built into your phone or sitting on your bedside table, that lets you access information on the fly; as another channel for delivering content (podcasts, ads, information, guidance, touchless support); or as a search mechanism (in addition to text and image search).

It’s nothing new per se (voice-based technologies such as IVR have been part of the call support ecosystem since the 1970s). However, the current proliferation of voice-based innovation owes more to the deeper exploration of nuanced use cases across a smorgasbord of contexts and industries.

First and foremost, how can voice help the business?

Business case for voice based technologies & virtual voice assistants

Interactive Voice ads (Revenue play) —

Pandora recently launched interactive voice ads, in which users respond verbally to advertiser prompts. The format is meant in particular to help advertisers connect with users who are not looking at their phone, for example people listening to Pandora while driving, cooking, cleaning the house or doing some other hands-free activity.

Voice assistants for employees (Employee productivity play)—

Walmart is set to launch its own voice assistant for its employees. It would let store employees look up prices, access store maps, find products, view sales information and check email, as well as access COVID-19 information, including the latest guidelines, guidance and safety videos. Asking by voice can be faster than typing a query on a small screen, which lets employees engage with customers instead of spending time on their device looking for information.

And in the COVID-19 era, the tool would offer another perk — it’s easier to use a voice app when you’re wearing gloves.

Voice search for customers (Customer engagement play) —

Snap is preparing to roll out a new way to sort through Lenses (Snapchat image/video filters) via voice. If it can nail this, the company will have a clear pathway to transition from entertainment-only AR to a platform built around utility. In its current form, the app’s new voice search will let Snapchat users ask the app to surface filters that enable them to do something unique.

The three examples above suggest that increasing employee productivity, generating ad revenue and driving customer engagement/adoption are three primary business cases for conversational AI, virtual voice assistants and voice technologies in general. Let’s add cost reduction to the mix as well, considering we’ve been using voice bots, chat assistants and automated voice technologies in customer support for quite a long time now.

However, why is voice taking so long to be adopted across the enterprise?

Barriers to adoption

From a business standpoint —

Data integration: For voice to work seamlessly across legacy systems, formats and technical architectures, the underlying data needs to be normalized into a common format. And if a company is working with a vendor or other providers to enable voice technology for a specific use case, integrating the differing ecosystems makes it harder to quickly fine-tune and enhance voice for smarter functioning.
ROI: Organizations have yet to realize the ROI of investing heavily in voice technologies. Many companies today have innovation hubs that have launched experimental versions of voice assistants, but these voice capabilities are still just ‘nice to haves’ in the ecosystem.

From a customer standpoint —

Visual or face-to-face interaction: A mobile screen or a face-to-face conversation is still the preferred way of interacting with or accessing information, especially in a banking or payments context.
Trust: It’s challenging to put our trust (and data) in a voice, whether it comes from the bigger players (Alexa, Google Home) or from niche providers.
Work in progress: The AI technology behind voice assistants is still being iterated on. We’re not yet at a stage where we can have in-depth conversations about our finances, expenses, mortgages or transaction disputes with a voice assistant.

However, organizations are still investing heavily in voice. Voice labs have been set up just to explore the different use cases where voice technology applies. The technology is being refined every day, in the hope that adoption will grow incrementally as we progress.

More niche use cases are being considered. That is, instead of AI-based voice assistants trying to solve everything for the user, they can be applied to niche cases such as telling you how much you spent on groceries last month or over the last four to five months, and suggesting how you can save more through discounts and offers. Basically, solving a very particular use case, but solving it well. This would be an opportunity for companies to provide real added value to the customer, gain the customer’s trust and then slowly expand the technology to more use cases.
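To make the grocery example concrete, here is a minimal Python sketch of the lookup such a narrowly scoped assistant might run once it has understood the question. Everything in it is an assumption for illustration: the `TRANSACTIONS` records, the field names and the `spend_by_category` helper are hypothetical, and a real deployment would read from the bank's own systems.

```python
from datetime import date

# Hypothetical transaction records, invented for this example.
TRANSACTIONS = [
    {"date": date(2020, 7, 2), "category": "groceries", "amount": 99.00},
    {"date": date(2020, 10, 15), "category": "groceries", "amount": 60.00},
    {"date": date(2020, 11, 3), "category": "groceries", "amount": 82.40},
    {"date": date(2020, 11, 20), "category": "groceries", "amount": 45.10},
    {"date": date(2020, 11, 21), "category": "dining", "amount": 30.00},
    {"date": date(2020, 12, 1), "category": "groceries", "amount": 25.00},
]


def spend_by_category(transactions, category, today, months_back=1):
    """Total spent in `category` over the `months_back` full calendar
    months before the month that contains `today`."""
    current = today.year * 12 + today.month
    total = 0.0
    for t in transactions:
        month = t["date"].year * 12 + t["date"].month
        # 1 means "last month", 2 the month before that, and so on.
        if t["category"] == category and 1 <= current - month <= months_back:
            total += t["amount"]
    return round(total, 2)


today = date(2020, 12, 13)
last_month = spend_by_category(TRANSACTIONS, "groceries", today)
last_four = spend_by_category(TRANSACTIONS, "groceries", today, months_back=4)
print(f"You spent ${last_month:.2f} on groceries last month.")
print(f"Over the last four months you spent ${last_four:.2f}.")
```

The point of the sketch is how small the problem becomes once the use case is narrow: the hard part is the voice understanding, not the arithmetic behind the answer.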

Benefits and risks of 1st party deployments (web, mobile app) vs. 3rd party (Alexa, Assistant)

This depends on the organizational goals and what you’re trying to solve for.

One immediate benefit of developing voice capabilities as a first party deployment is that you control the end-to-end experience, without having to integrate with another channel provider (such as Alexa or Google Home) and without having to negotiate a revenue-share deal.

You also have the greatest control over your customers’ data privacy and security with a 1st party deployment, which might be a risk with 3rd party deployments. You can keep the data in your own ecosystem, without exposing it to breaches or hacks at a 3rd party.

However, on the flip side, you cannot leverage the user base and customer traction of platforms that already have significant voice usage, such as Alexa, Google Home or Siri. You cannot, for instance, insert your banking use case into the user’s music-listening journey on Alexa.

Voice innovation in the market

Voice isolating algorithms

With the rise of many virtual collaboration tools including video and audio conference call systems, the need for an algorithm or a solution that can eliminate or minimize surrounding background noise is critical, especially with the whole world working from home currently.
Krisp’s smart noise suppression tech, which silences ambient sounds and isolates your voice for calls, has arrived just in time. Krisp applies a machine learning system to audio in real time that has been trained on what is and isn’t the human voice. What isn’t a voice gets carefully removed even during speech, and what remains sounds clearer.

The next iteration will tell you not just about noise, but give you real time feedback on how you are performing as a speaker. Haven’t you ever wondered about how much you actually spoke during some call, or whether you interrupted or were interrupted by others, and so on?
When someone is speaking they may not necessarily want to see that. But over time we’ll analyze what you say, give you hints about vocabulary, how to improve your speaking abilities.
Think Grammarly for voice and video.
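As a rough illustration of that kind of speaker feedback, the Python sketch below computes each participant's share of talk time and counts interruptions. It is not Krisp's method. It assumes the call has already been split into speaker-labeled segments (the made-up `SEGMENTS` list), and it uses a deliberately simple definition of an interruption: starting to talk while a different speaker is still mid-segment.

```python
from collections import defaultdict

# Hypothetical diarized call: (speaker, start_seconds, end_seconds).
SEGMENTS = [
    ("ana", 0.0, 10.0),
    ("ben", 8.0, 15.0),   # starts while ana is still speaking
    ("ana", 15.0, 20.0),
    ("ben", 20.0, 30.0),
    ("ana", 28.0, 32.0),  # starts while ben is still speaking
]


def talk_share(segments):
    """Fraction of total speaking time attributed to each speaker."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {speaker: t / grand_total for speaker, t in totals.items()}


def count_interruptions(segments):
    """Count, per speaker, how often they started talking while a
    different speaker's segment was still in progress."""
    counts = defaultdict(int)
    ordered = sorted(segments, key=lambda seg: seg[1])
    for i, (speaker, start, _end) in enumerate(ordered):
        for other, other_start, other_end in ordered[:i]:
            if other != speaker and other_start < start < other_end:
                counts[speaker] += 1
                break  # count each new segment at most once
    return dict(counts)


for speaker, share in talk_share(SEGMENTS).items():
    print(f"{speaker} spoke {share:.0%} of the time")
print(count_interruptions(SEGMENTS))
```

A production tool would need real speaker diarization and a more forgiving rule (brief overlaps such as "mm-hm" are not really interruptions), but the shape of the feedback is the same.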

Microchips for voice based apps

Syntiant makes semiconductors for voice recognition and speech-based applications. Its chips are aimed at widening the adoption of voice services like Alexa, especially in mobile scenarios that require devices to balance low power with continuous, high-accuracy voice recognition.

Digital voice messaging services

Yac is a digital voice messaging service: users can turn their call button into a Yac button to deliver audio messages instead of making real-time phone calls.
Riff is a London-based startup developing what it describes as a “voice-first” chat tool for remote working. While tools like Slack and Zoom are built first for text and video, respectively, Riff thinks that audio is the missing piece of the puzzle, but used in a way that encourages the spontaneity and serendipity experienced by in-person teams.

The idea is a tool that enables effortless and spontaneous collaboration throughout the day: the remote equivalent of turning to ask a colleague a quick question, or to discuss the project you are working on.

Voice-to-Text

Medallia is a customer experience platform that scans online reviews, social media and other sources to provide better insights into what a company is doing right and wrong, and what needs to be addressed.

Voci transcribes 100% of live and recorded calls into text that can be analyzed quickly to determine customer satisfaction.

While there are a lot of speech-to-text offerings in the market today, the key with Voci is that it can discern a number of other details in the call, including emotion, gender, sentiment and voice biometric identity. It’s also able to filter out personally identifiable information, to ensure more privacy when the data is used for further analytics.
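To give a feel for what filtering personal information out of a transcript can involve, here is a small rule-based Python sketch. It is not how Voci does it, and the three patterns (email addresses, 13 to 16 digit card numbers, US-style phone numbers) are assumptions chosen for the example. Real systems typically combine rules like these with trained models.

```python
import re

# Illustrative patterns only; real transcripts need far broader coverage.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    # 13 to 16 digits, optionally separated by spaces or hyphens.
    ("CARD", re.compile(r"\b(?:\d[ -]?){12,15}\d\b")),
    # US-style phone number such as 555-123-4567.
    ("PHONE", re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b")),
]


def redact(transcript):
    """Replace each match of a known PII pattern with a [LABEL] tag."""
    for label, pattern in PII_PATTERNS:
        transcript = pattern.sub(f"[{label}]", transcript)
    return transcript


call = ("my card is 4111 1111 1111 1111 and my email is jo@example.com, "
        "call 555-123-4567")
print(redact(call))
```

Replacing matches with labeled tags, instead of deleting them, keeps the transcript readable for downstream analytics while still removing the sensitive values.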

Final thoughts / Future of Voice

Organizations and customers will likely adopt voice more widely as the technology becomes more sophisticated and intuitive, and as the customer experience becomes more free-flowing.

I imagine a future where I start my financial journey on one device or channel and seamlessly carry it over to another device, channel or form factor. For example, I’m entering the number of Tesla shares I want to buy, then I switch gears because I’m driving, so I use my voice to complete the transaction and get a trade completion notification on the holographic screen in my car.

Voice will become a commodity like a website or an app. Just as we publish on a website and on a mobile app, we’ll start publishing on a third channel, which is voice.

Voice isn’t quite there yet, but it’s moving in that direction. The pandemic has only amplified its use.