With Apple's Siri, Microsoft's Cortana, Amazon's Alexa, and Google's Assistant, smart voice interfaces have found their way onto each of our smartphones, and they are about to be found in more and more gadgets in our homes:
The newest generation of smart speakers ships with a voice assistant as a key feature. These speakers not only let us listen to our favorite tracks, streamed from Spotify on our smartphones via Bluetooth, but also act as an interface to the web.
Using a voice assistant to get the weather forecast for the day, find out what the capital of Finland is, or create a reminder to buy milk makes these tasks a lot easier.
Natural voice interfaces open a door to the connected world, letting us control our homes and gain easy access to information. The technology is still young and full of flaws: many questions that you might ask a digital assistant are not understood yet and result in an answer like "Sorry, I could not understand you. Did you mean…?" Conversations with an assistant are also still far from resembling natural conversations between humans. Asking one of the voice assistants "How is the weather tomorrow?" is an easy task, and the response will most likely tell you whether you should pack an umbrella. Asking a follow-up question such as "And the day after?" is a completely different problem, because the assistant has to remember the context of the previous question, and it was not supported until very recently.
Thanks to advancements in machine learning, these problems are being tackled: the assistants get better each year, respond to more complex questions (and follow-up questions), and behave more and more like humans.
You might be wondering what all this has to do with IoT. When working on IoT projects, and especially wearables, you might face strict spatial restrictions. Let's say you want to create an LED necklace that can change color. Changing the color would require physical buttons, which take up extra space, and without a display the buttons would often feel hard to use. Together, this could already be too much to carry around. Using a voice assistant would feel far more natural in this case. The necklace could have a single button to activate the assistant; once it is pressed, you could tell your necklace to change its color to blue, for example, and pass the recording to an external voice recognition service such as the Google Speech-to-Text API. From the API, you would then get the text "blue" back, which you could use in your Arduino code to actually switch the color.
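To make the last step of the necklace example concrete, here is a minimal sketch of how the text returned by the speech service could be turned into an LED color. Everything in it is an assumption made for illustration: the names `Rgb`, `ColorMatch`, `kPalette`, and `colorFromTranscript` are hypothetical, the palette is arbitrary, and the speech service call itself is left out. The sketch is written as plain C++14 without standard library containers, so you can try it with a desktop compiler before moving it into a sketch for your board.

```cpp
#include <stddef.h>
#include <stdint.h>

// One byte per channel, the layout most RGB LED drivers expect.
struct Rgb {
    uint8_t r, g, b;
};

// Result of a lookup: `found` is false when no known color was spoken.
struct ColorMatch {
    bool found;
    Rgb rgb;
};

struct NamedColor {
    const char* name;  // lowercase color word
    Rgb rgb;
};

// Hypothetical palette; extend it with whatever the necklace supports.
constexpr NamedColor kPalette[] = {
    {"red", {255, 0, 0}},
    {"green", {0, 255, 0}},
    {"blue", {0, 0, 255}},
    {"white", {255, 255, 255}},
};

constexpr bool isLetter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

constexpr char toLowerAscii(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// True if `word` starts at `text` (ignoring ASCII case) and is not
// immediately followed by another letter.
constexpr bool startsWithWord(const char* text, const char* word) {
    size_t i = 0;
    for (; word[i] != '\0'; ++i) {
        if (toLowerAscii(text[i]) != word[i]) return false;
    }
    return !isLetter(text[i]);
}

// Returns the first palette color that appears as a whole word in the
// transcript, so "colored" does not accidentally match "red".
constexpr ColorMatch colorFromTranscript(const char* transcript) {
    bool atWordStart = true;
    for (size_t pos = 0; transcript[pos] != '\0'; ++pos) {
        if (atWordStart) {
            for (const NamedColor& entry : kPalette) {
                if (startsWithWord(transcript + pos, entry.name)) {
                    return {true, entry.rgb};
                }
            }
        }
        // The next character starts a word only if this one is not a letter.
        atWordStart = !isLetter(transcript[pos]);
    }
    return {false, {0, 0, 0}};
}
```

On the device, you would call `colorFromTranscript` with the text received from the speech service and, if `found` is true, hand the three channel values to whatever LED driver your hardware uses. Matching whole words against a small fixed palette keeps the logic tiny; if nothing matches, the necklace can simply keep its current color.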
Using external services and premade modules is one of the things I really want to push you toward. Creating a prototype is not about creating a consumer-grade product that is hand-coded in tens of thousands of lines and ready to be produced in a batch of 50,000 in a factory in Shenzhen. It is about succeeding (or failing) fast at building something functional, either because you really want to build your own smart coffee maker or because you want to find out whether that idea you had for your company might actually help with digitalization. Use whatever is available and hack it together. If one of the components is a proprietary voice recognition library, that is all right. If your prototype is doing well and you are thinking about taking it to the next stage and actually producing it, you can still look for alternatives.