I made a bot with API.ai

03. Jun 2017 00:30

I made a bot with API.ai, Google Cloud Speech, WebSockets and WebRTC

It has become very obvious that the next big thing is digital home assistants: apps and devices you can talk to and use to control lights and heating, among thousands of other things. These services are usually built on some sort of chatbot. Chatbots are nothing new, though. Apple's Siri has existed for several years, Microsoft has its Cortana, Google has Google Assistant and Amazon has Amazon Alexa. With Amazon Echo or Echo Dot, you have a device in your home that you can talk to and integrate your own apps with. Google recently presented its version of Amazon Echo, named Google Home, at Google I/O 2017, and now it seems Apple is releasing something similar too.

With all the hype I wanted to see for myself whether chatbots actually have a future or are just a fad, like 3D movies and VR (yes, I said it). So I made a tiny bot with a couple of Google services, whose purpose is to help the user search for products, add them to a cart and check out in a conversational way. It is built on Google's bot framework, api.ai, a free service for building bots and mapping text to "intents". An intent can be seen as the action you want the bot to perform. My bot has 11 intents, including "add to cart" and "check" among others. Several of the intents use a web service I've set up with NodeJS that performs product search against a MongoDB database.
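To give an idea of how intent handling works on the backend, here is a minimal sketch of dispatching api.ai intent names to handler functions. The intent names, parameters and handler bodies are hypothetical, not the demo's actual code:

```javascript
// Map api.ai intent names to handler functions.
// Intent names and handlers here are made up for illustration.
const handlers = {
  'search.products': (params) => `Searching for ${params.query}...`,
  'cart.add': (params) => `Added ${params.product} to your cart.`,
  'cart.checkout': () => 'Checking out. Thanks for shopping!',
};

// Look up the handler for an intent and produce the bot's reply text.
function handleIntent(intentName, params = {}) {
  const handler = handlers[intentName];
  if (!handler) {
    return "Sorry, I didn't understand that.";
  }
  return handler(params);
}
```

In a real fulfillment webhook, the returned string would be sent back to api.ai as the bot's spoken or written reply.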

The frontend has a very plain look and mostly focuses on sending written or spoken queries to a backend Node service, which handles all the communication between Google Cloud Speech and api.ai. For spoken queries I use WebRTC to capture the sound and the MediaRecorder API for "recording" from the WebRTC stream. The recording is posted to the backend as a blob and forgotten by the frontend. When the AppService is ready and has received a response from api.ai, it sends the data to the frontend over a websocket connection.
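The browser side of this flow can be sketched roughly as follows. This assumes a browser with getUserMedia, MediaRecorder and WebSocket support; the `/speech` endpoint, the port and the response shape are hypothetical:

```javascript
// Capture microphone audio via WebRTC, record it with MediaRecorder,
// and POST the resulting blob to the backend ("fire and forget").
async function recordAndSend(durationMs = 5000) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks = [];

  recorder.ondataavailable = (event) => chunks.push(event.data);
  recorder.onstop = () => {
    const blob = new Blob(chunks, { type: 'audio/webm' });
    // The frontend does not wait for a reply here;
    // the bot's answer arrives later over the websocket.
    fetch('/speech', { method: 'POST', body: blob });
    stream.getTracks().forEach((track) => track.stop());
  };

  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
}

// Listen for the api.ai response pushed from the backend.
function listenForReplies() {
  const socket = new WebSocket('ws://localhost:3000');
  socket.onmessage = (event) => {
    console.log('Bot says:', JSON.parse(event.data).speech);
  };
}
```

Decoupling the upload from the reply this way keeps the frontend simple: it never blocks on speech recognition, which can take a couple of seconds.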

This setup works very well, except for an issue with MongoDB full-text search: MongoDB returns whatever matches any of the words you enter, so the results may include things you didn't ask for. Instead of going for the full Elasticsearch experience, I ended up using regular expressions instead. The drawback is that your queries must be more precise. After all, it's just a demo...
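The regex approach can be sketched like this. The `name` field and filter shape are assumptions about the product schema; escaping the user's query prevents characters like `+` or `*` from being interpreted as regex syntax:

```javascript
// Escape regex metacharacters in user input so it is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-insensitive MongoDB filter for a product-name search,
// e.g. collection.find(buildProductFilter('iphone')).
function buildProductFilter(query) {
  return { name: { $regex: escapeRegex(query), $options: 'i' } };
}
```

Unlike `$text` search, this only matches products whose name actually contains the query string, which is why queries have to be more precise.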

The screenshot to the left shows the result of a simple conversation. In the example to the right, the bot performs a free-text search for iPhones.

So what is my conclusion?

In written form it works very well, and it could be a great addition to Slack, Skype, Facebook Messenger or even your own website. There are still some challenges with spoken input, though. Language support is one of the major ones, and names, product names and addresses can also be troublesome. It might be hype to some, but with training and enough data I'm sure it will be useful in many settings.

Check it out here!