How can I equip my smart speaker with NLP?

by Kelvin

As a company, I want my smart speaker or voice assistant to be able to talk about my business. The smart speakers already on the market, such as Amazon Echo or Google Home, come with some default features, which are basically what you read on the box: “Alexa, what is the weather like today in Barcelona?”, “Alexa, tell me a joke”, “Ok, Google, who is Barack Obama?”

This default knowledge belongs to the device, but what if I want my smart speaker to support my customers, or even let them shop? For example, suppose I am an airline and want my customers to be able to check the estimated time of departure before their flight, or I am an insurance company and want Alexa to give health advice to my policyholders. Where does this knowledge come from, and how can my smart speaker use it?


How can I create my own application, a.k.a. skill or action?

For the smart speaker to handle knowledge beyond its defaults, I have to create my own application, which the end user accesses through an invocation phrase. For example: “Order a pizza at Pizzi-Pizzo-Pizza.”

Depending on the device, this type of application is called a skill (Amazon) or an action (Google). Ultimately, we are talking about building a piece of code that uses a dataset (the knowledge base) and a user interface (the device itself). This knowledge can be entered directly into the Amazon or Google console, creating a unit of knowledge for each response the smart speaker has to give and attaching as many expressions as possible to each one. The objective of this training process is to teach the matches between all the questions users might ask and each unit of knowledge. Needless to say, this is an inefficient task that consumes a great deal of time and resources. However, if I am already working with a natural language processing (NLP) solution, I can bypass the training process by linking my smart speaker application directly to my NLP technology through its APIs. This is where Artificial Intelligence companies come in to solve the equation.
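To make the cost of that manual training concrete, here is a minimal sketch of knowledge units with hand-written sample expressions. It is not the format of the Amazon or Google console; the names `KnowledgeUnit`, `build_index` and `match`, and the sample answer, are invented for illustration. The matcher only recognizes phrasings that someone has typed in beforehand.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class KnowledgeUnit:
    """One answer plus the expressions that should lead to it (hypothetical shape)."""
    answer: str
    sample_utterances: List[str] = field(default_factory=list)


def normalize(utterance: str) -> str:
    """Lowercase, trim surrounding punctuation and collapse whitespace."""
    return " ".join(utterance.lower().strip(" ?!.").split())


def build_index(units: List[KnowledgeUnit]) -> Dict[str, KnowledgeUnit]:
    """Map every listed expression to its knowledge unit."""
    index: Dict[str, KnowledgeUnit] = {}
    for unit in units:
        for sample in unit.sample_utterances:
            index[normalize(sample)] = unit
    return index


def match(utterance: str, index: Dict[str, KnowledgeUnit]) -> Optional[KnowledgeUnit]:
    """Return the unit for a listed expression, or None for an unseen phrasing."""
    return index.get(normalize(utterance))


# Invented sample data: one unit, two expressions.
units = [
    KnowledgeUnit(
        answer="We open at 9 am.",
        sample_utterances=["What are your opening hours?", "When do you open"],
    )
]
index = build_index(units)
```

With this data, “When do you open?” finds the unit, but “At what time can I come by?” returns `None` until someone adds that expression by hand. Closing such gaps one phrasing at a time is the effort that an NLP solution with semantic matching is meant to remove.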

Let’s go back to creating my smart speaker app. I still have to create a piece of code, but instead of storing my knowledge on Amazon or Google, my app sends every user interaction (i.e. any question asked to Amazon Echo or Google Home) to Inbenta through the chatbot API. The specialized AI company understands and processes this input (NLU and NLP capabilities) and finds semantic matches in the knowledge base. In this scenario, my knowledge base lives on that company’s platform instead of the Amazon or Google console. Once the matching is done, the AI company returns the correct answer to my application, which lets the smart speaker communicate it to the customer. To summarize each role: the smart speaker handles voice recognition and text-to-speech, while the artificial intelligence company is responsible for the NLP matching. Thus, the end user receives an immediate and relevant response. Finally, and just as importantly, as a company I save the time I would have spent training the machine and can spend it on building a chatbot knowledge base that suits smart speakers as well as web, applications, messaging channels, etc.
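The division of roles described above can be sketched in a few lines. This is an illustration under stated assumptions, not the API of Amazon, Google or Inbenta: `SpeakerRequest`, `SpeakerResponse`, `handle_request` and `fake_backend` are hypothetical names, and the backend is a toy in-memory stand-in for the authenticated call a real app would make to the NLP provider.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: any function that takes the user's text and returns an answer.
# In a real app this would wrap a request to the NLP provider's chatbot API.
NlpBackend = Callable[[str], str]

FALLBACK = "Sorry, I did not find an answer to that."


@dataclass
class SpeakerRequest:
    """Text the smart speaker produced from the user's speech."""
    utterance: str


@dataclass
class SpeakerResponse:
    """Text the smart speaker will read aloud via text-to-speech."""
    speech: str


def handle_request(request: SpeakerRequest, ask_nlp: NlpBackend) -> SpeakerResponse:
    """Forward the utterance to the NLP backend and wrap its answer."""
    utterance = request.utterance.strip()
    if not utterance:
        return SpeakerResponse(speech=FALLBACK)
    answer = ask_nlp(utterance)
    # An empty answer means the backend found no match in the knowledge base.
    return SpeakerResponse(speech=answer or FALLBACK)


def fake_backend(utterance: str) -> str:
    """Toy stand-in for the NLP service: keyword lookup over invented data."""
    knowledge_base = {"flight": "Your flight is estimated to depart at 10:45."}
    for keyword, answer in knowledge_base.items():
        if keyword in utterance.lower():
            return answer
    return ""
```

Calling `handle_request(SpeakerRequest("When does my flight leave?"), fake_backend)` returns the stored answer, and anything the backend cannot match falls back to a generic reply. The app itself holds no knowledge: swapping `fake_backend` for a real client changes where the answers come from without touching the handler.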

What should I keep in mind when creating a smart speaker app?

An important rule of digital projects is that whoever provides the user interface is in control of it. In this case, it means that if I want to publish a skill for Alexa, I must ask Amazon to validate it, and Amazon has the last word on whether my skill meets its requirements. The invocation phrase that lets the user enter my application was mentioned earlier. These “magic words” have to be approved by the interface provider, which also ensures that no invocation phrase is shared by multiple applications. Another example of Amazon’s policy on skills: my application cannot claim to be anyone other than Alexa. It means the knowledge unit “Hi, this is Jessica, your Pizzi-Pizzo-Pizza assistant, how can I help you?” would be rejected by Amazon, because Alexa has to be Alexa and cannot be Jessica. Once my application complies with the rules of Amazon or Google, the provider will approve and publish it so that any end user of the smart speaker can use it.

The second key factor concerns the answers that my application will give. As the person in charge of the knowledge base, and considering that this knowledge may be shared with another chatbot, I must also take into account the specifics of a voice channel: among them, I cannot show rich media, users cannot click on a hyperlink, and my end users will not be willing to listen to very long answers to a simple question. However, if my NLP solution supports complex dialogs and decision trees, I can easily create step-by-step scripts through my knowledge base and offer my users a very human conversation, with the added value of interacting through voice instead of text.
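When the same knowledge base also feeds a text chatbot, one way to respect these voice constraints is to post-process each answer before it is spoken. The sketch below is one possible approach, not a feature of any particular platform: the function name `to_voice_answer` and the two-sentence limit are assumptions chosen for illustration, and the sentence splitting is deliberately naive.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")                  # HTML-style markup
URL_RE = re.compile(r"https?://\S+")             # bare links nobody can click
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")   # naive sentence boundary


def to_voice_answer(text: str, max_sentences: int = 2) -> str:
    """Strip markup and links, then keep only the first few sentences."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()
    sentences = SENTENCE_END_RE.split(text)
    return " ".join(sentences[:max_sentences])
```

A cut-off like this is a blunt tool. Where the NLP solution supports dialogs and decision trees, splitting a long answer into several short turns usually serves the listener better than truncating it.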

At the end of the day, a smart speaker app published for Amazon Echo or Google Home is nothing more than a chatbot, with the only difference that it uses a voice channel instead of the usual text interfaces. As with any chatbot, the key elements to pay attention to are: the knowledge base (the answers or the processes that I make available to my clients), the use case (what I want to say, to whom, and why) and any restrictions of the user interface.