HOW TO BUILD A SIMPLE VIRTUAL ASSISTANT USING PYTHON
Virtual assistants are everywhere, from Alexa to Google Home to Apple's Siri. They help us check the weather, make phone calls, and control thermostats, door locks, and other smart home devices.
In this article, I will be walking you through how to create a simple virtual assistant using Google Speech Recognition and IBM Watson Text to Speech in Python. This article assumes you have a basic understanding of the following prerequisites:
Functions
Classes
The Requests library
WHAT WE WILL BE BUILDING
In this article, we will be building a simple virtual assistant named Cali.
Cali can check the weather, tell the local time, open up a Twitter profile, and search for YouTube videos. (This isn't a lot, but you can build on top of Cali to add more features; I urge you to do so when you're done with this blog post.)
Cali is made up of the following source code files:
Util.py: Uses Python's webbrowser module to open a new browser tab.
Timeservice.py: Checks and returns the local computer time and the local time in other cities.
Weather.py: Uses OpenWeatherMap API to fetch weather data for any city.
Speaker.py: Uses the IBM Watson Text to Speech library to synthesize text to speech, save it as an audio file, and play it using the MPG321 command-line player.
Main.py: The main program file; it receives input using Google Speech Recognition and maps the recognized text to an action to take.
To follow along with this blog post, go ahead and clone Cali from Github at https://github.com/iamabeljoshua/Cali/tree/online-tts-only or download the code as a ZIP using this link: https://github.com/iamabeljoshua/Cali/archive/online-tts-only.zip
For this tutorial, we will be using the online-tts-only branch on Github.
After you have downloaded or cloned Cali from Github, follow the steps in the README.MD file to download and install Cali's dependencies.
CODING UP CALI
Cali's source code is pretty straightforward and isn't that difficult to understand. In this section, I will walk you through how Cali's entire source code works.
Util.py
The Util.py file is the simplest of all Cali's source code files: it contains the code we will use to open a new browser tab, and it uses Python's webbrowser module to do that.
If you're not familiar with the webbrowser module in Python, visit this link to learn more about how it works.
import webbrowser

def open_page(url):
    webbrowser.open_new_tab(url)
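For example, passing any URL to open_page launches it in a new tab of your default browser (the URL below is just an illustration):

from util import open_page

open_page("https://www.youtube.com")  # opens YouTube in a new browser tab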
Timeservice.py
The Timeservice.py file is responsible for checking the local system time and the local time in other cities (the latter is left as an exercise). It uses Python's datetime module to check and return the current system time.
from datetime import datetime

class TimeService(object):
    def __init__(self):
        pass

    def get_time(self, city_name):
        # TODO: implement fetching the local time in any city, using the city name.
        # As an exercise, you can implement this function.
        pass

    def get_local_time(self):
        current_time = datetime.now().strftime("%I:%M %p")
        output = "Your current local time is " + current_time
        return output
datetime.now() returns a new datetime object for the current system date and time. We simply use strftime("%I:%M %p") to create a formatted string from the datetime object.
"%I" gives the hour on a 12-hour clock, "%M" gives the minute of the current hour, and "%p" gives AM or PM depending on the time of day.
To learn more about datetime in Python, check out this link.
Weather.py
The Weather.py file uses the OpenWeatherMap API to fetch the current weather data for a city.
This file uses the Python requests library to send a GET request to http://api.openweathermap.org/data/2.5/weather. We send the APP ID that we got from registering at openweathermap.org and the city name as request parameters.
If you don't know how HTTP requests and the Python requests library work, check out this article and this one too.
import requests

class WeatherService(object):
    API_URL = "http://api.openweathermap.org/data/2.5/weather?q={}&APPID={}&units=metric"
    API_KEY = "3b31a7e394e41c3a30759dfde1a3383e"

    def __init__(self):
        pass

    def get_weather_data(self, city_name):
        r = requests.get(WeatherService.API_URL.format(city_name, WeatherService.API_KEY))
        weather_data = r.json()
        temp = self._extract_temp(weather_data)
        description = self._extract_desc(weather_data)
        return "Currently, in {}, it's {} degrees with {}".format(city_name, temp, description)

    def _extract_temp(self, weatherdata):
        temp = weatherdata['main']['temp']
        return temp

    def _extract_desc(self, weatherdata):
        return weatherdata['weather'][0]['description']
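A quick way to try this class on its own (assuming the bundled API key is still active; the output shown is only illustrative):

# Quick standalone test of WeatherService
ws = WeatherService()
print(ws.get_weather_data("Abuja"))
# e.g. "Currently, in Abuja, it's 27.5 degrees with scattered clouds"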
The get_weather_data function receives city_name as an argument and uses it to fetch weather data. The function sends out a GET request using the requests library and decodes the response as JSON using .json().
The .json() method reads the response content as JSON and transforms it into Python list and dictionary objects, depending on the response content.
We can then use simple list and dictionary indexing to retrieve the parts of the response we care about, which is what _extract_temp and _extract_desc do: they extract the weather temperature and the weather description from the response object, respectively.
Once we've done that, we simply return a new descriptive string of the city's current weather data.
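To make that indexing concrete, here is an abridged, illustrative version of the JSON OpenWeatherMap returns (real responses contain many more fields) and how the two helpers walk it:

# Abridged, illustrative response body (not the full OpenWeatherMap payload)
weather_data = {
    "weather": [{"main": "Clouds", "description": "scattered clouds"}],
    "main": {"temp": 27.5, "humidity": 83},
}

temp = weather_data["main"]["temp"]                      # 27.5
description = weather_data["weather"][0]["description"]  # "scattered clouds"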
Speaker.py
The Speaker.py file uses IBM Watson Text to Speech to synthesize text into speech. To do this, we import and use the ibm_watson Python library.
You should have installed this library already if you followed the steps in the README.MD on Cali's Github repository.
# A Python program that converts text to speech using IBM Watson TTS
from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
import os
# import pyttsx3

class TTSSpeaker(object):
    voice = "online"  # set to "offline" to use pyttsx3

    def __init__(self):
        # initialize offline tts
        # self.pyttsxengine = pyttsx3.init()

        # initialize an authenticator with our UNIQUE key
        self.authenticator = IAMAuthenticator('1-D9_Ydjmq8Hg72JkLFBZTNWtC9i6X6NEb6LffHi2LBH')

        # initialize the TextToSpeech module with the authenticator
        self.text_to_speech = TextToSpeechV1(
            authenticator=self.authenticator
        )

        # setting the service URL is recommended on the IBM Watson tutorial page
        self.text_to_speech.set_service_url('https://api.eu-gb.text-to-speech.watson.cloud.ibm.com/instances/1915a1e6-e3b2-43a7-a600-91c6810567ac')

    def speak(self, input_texts):
        print(input_texts)
        if TTSSpeaker.voice == "offline":
            pass
        else:
            filename = "ibmvoice.mp3"
            with open(filename, 'wb') as audio_file:
                audio_file.write(self.text_to_speech.synthesize(
                    input_texts,
                    voice='en-US_AllisonV3Voice',
                    accept='audio/mp3'
                ).get_result().content)
            os.system("mpg321 " + filename)
In the __init__ method of the TTSSpeaker class, we initialize the IBM TextToSpeechV1 module with an authenticator object, which we configured with the unique key we got from registering at IBM Cloud. This is necessary for the IBM Watson Text to Speech API to verify the authenticity of our requests.
We also set the service URL, as we are instructed to do on the IBM TTS documentation page.
The speak function receives input_texts as an argument and uses the IBM Watson TextToSpeechV1 module to synthesize it by calling .synthesize(input_texts, voice='en-US_AllisonV3Voice', accept='audio/mp3').
input_texts is the text to synthesize.
The voice parameter is the IBM TTS voice we want to use to synthesize the text. You can change the voice to one of the voices in the IBM TTS DEMO.
The accept parameter is the audio file format we want to receive back from IBM.
Note that we already initialized TextToSpeechV1 in the __init__ method and assigned it to self.text_to_speech.
Once IBM has synthesized the text to speech, we get the binary content of the synthesized audio and write it to a file.
We then use the MPG321 command-line player to play the newly created audio file. You should have MPG321 installed on your PC if you followed the steps in the README.MD on Cali's Github repository.
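Note that mpg321 is a Linux command-line player, so this step is platform-specific. If you're on Windows or macOS, one possible substitute (my suggestion, not one of Cali's listed dependencies) is the third-party playsound package:

# Hypothetical alternative to mpg321 (install with: pip install playsound).
# Not part of Cali's dependencies; shown only as a cross-platform option.
from playsound import playsound

playsound("ibmvoice.mp3")  # blocks until the audio file finishes playing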
Main.py
Main.py uses the SpeechRecognition library with Google Speech Recognition to transcribe speech to text. We then try to map the recognized text to the right action to take. If we can't map the text to an action, our Speaker class will synthesize "I am sorry, I didn't get that" to speech and play it. Below is the source code for this file:
import speech_recognition as sr
from speaker import *
from timeservice import *
from weather import *
from util import *
import re

# The following functions use regex and string matching to decide which action to take
# for the recognized text.

def is_twitter_profile_action(recognized_text):
    # demonstrates how to use regex for pattern matching and extraction
    pattern = r"open up (\S+) on twitter"
    matches = re.findall(pattern, recognized_text, re.IGNORECASE)
    return len(matches) > 0

def is_youtube_search_action(recognized_text):
    text = recognized_text.lower()  # convert everything to lower case
    return "search for" in text and "on youtube" in text

def extract_youtube_search_term(recognized_text):
    text = recognized_text.lower()
    text = text.replace("search for", "")
    text = text.replace("on youtube", "")
    return text.strip()  # remove any leading or trailing whitespace

def get_twitter_profile(recognized_text):
    pattern = r"open up (\S+) on twitter"
    matches = re.findall(pattern, recognized_text, re.IGNORECASE)
    return matches[0]

def is_weather_search_action(recognized_text):
    text = recognized_text.lower()  # convert everything to lower case
    return "what is the weather in" in text

def extract_city_name_for_weather_action(recognized_text):
    text = recognized_text.lower()
    return text.replace("what is the weather in", "").strip()

def main():
    tts_speaker = TTSSpeaker()
    recognizer = sr.Recognizer()
    while True:
        with sr.Microphone() as source:
            print("Say something!")
            recognizer.adjust_for_ambient_noise(source)  # listen for 1 second to calibrate the energy threshold for ambient noise levels
            audio = recognizer.listen(source)  # now when we listen, the energy threshold is already set to a good value, and we can reliably catch speech right away

        # Speech recognition using Google Speech Recognition
        try:
            # To use your API key, call `recognizer.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
            # instead of `recognizer.recognize_google(audio)`
            recognized_text = recognizer.recognize_google(audio)
            print("You said: " + recognized_text)

            # Here we use simple if/elif statements to map the captured text to the appropriate action.
            if "local time" in recognized_text:
                tts_speaker.speak(TimeService().get_local_time())
            # should open a Twitter profile? sentence to match: open up iamabeljoshua on twitter
            elif is_twitter_profile_action(recognized_text):
                open_page("https://twitter.com/" + get_twitter_profile(recognized_text))
            # should open a YouTube search page? sentence to match: search for {searchterm} on youtube
            elif is_youtube_search_action(recognized_text):
                open_page("https://www.youtube.com/results?search_query=" + extract_youtube_search_term(recognized_text))
            # should fetch weather data for a particular city?
            elif is_weather_search_action(recognized_text):
                tts_speaker.speak(WeatherService().get_weather_data(extract_city_name_for_weather_action(recognized_text)))
            else:
                tts_speaker.speak("I am sorry, I didn't get that! There is no procedure available to handle your request.")
        except sr.UnknownValueError:
            tts_speaker.speak("I am sorry, I didn't get that!")
            print("Google Speech Recognition could not understand audio")
        except sr.RequestError as e:
            tts_speaker.speak("I am sorry, I didn't get that!")
            print("Could not request results from Google Speech Recognition service; {0}".format(e))

if __name__ == "__main__":
    main()
The code above listens for speech, converts the speech to text using the SpeechRecognition library, and tries to map the recognized text to an action to take. It uses simple regex patterns and string functions to search for certain keywords, like "on youtube" to know that the command is to open up a YouTube video search page, or "on twitter" to know that the command is to open up a Twitter profile.
Cali isn't super-smart, so it can only process specific commands. For example, it understands what "open up iamabeljoshua on twitter" means and even what "what is the weather in Abuja" means, but it doesn't understand what "what is the weather like in Abuja" means.
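You can see both behaviors by running the patterns directly. The first call below uses Cali's actual Twitter pattern; the looser weather pattern after it is just a sketch of one small improvement you could make yourself (it is not part of Cali's code):

import re

# Cali's Twitter pattern matches the exact phrasing it was written for
re.findall(r"open up (\S+) on twitter", "open up iamabeljoshua on Twitter", re.IGNORECASE)
# -> ['iamabeljoshua']

# A sketch of a looser weather pattern (not in Cali's code) that also
# tolerates "like": it matches both phrasings and captures the city name.
pattern = r"what is the weather (?:like )?in (.+)"
re.findall(pattern, "What is the weather like in Abuja", re.IGNORECASE)
# -> ['Abuja']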
To make Cali smarter than this and teach it to be able to understand and extract intent from a text, we will need to use natural language processing, which is beyond the scope of this tutorial.
For now, enjoy testing out and adding more features to Cali. Hopefully, in the future, I will be publishing a new blog post series on how to use NLP techniques to make Cali smarter.
CONCLUSION
Thanks for reading this blog post. Cali is a simple project that demonstrates how you can use speech recognition and text to speech to create a simple virtual assistant. Feel free to download and reuse a portion or all of Cali's source code, and to fork it and submit pull requests on Github.
ABOUT ME
I am Abel Joshua, a self-taught full-stack software developer and co-founder of https://clique.ng
You can also follow me on Twitter at https://twitter.com/iamabeljoshua
If you have any questions, drop a comment below and I will be glad to answer them.