Visualize News Word Cloud using Python, Flask and JQCloud
Learning objectives of this post are:
- Get a news feed using the REST-based API from NewsAPI.org
- Get words and their frequency
- Visualize word cloud using JQCloud
We will build a Flask app to put everything in place.
Goal-1: Get news feed from NewsAPI.org
NewsAPI.org provides a free and simple-to-use RESTful API. It has two endpoints, sources and articles.
Using the sources endpoint, we can get a list of the news sources and blogs available on News API, and using the articles endpoint, we can get a list of articles for a particular source.
Let's now look at the request URL for the articles endpoint.
https://newsapi.org/v1/articles?source={source}&apiKey={apikey}
As we can see, the API requires two parameters, source and apiKey. The source is a short code that NewsAPI.org assigns to each news source or blog it lists. You can choose any source you like; some of the sources with their respective codes are Bloomberg (bloomberg), BBC News (bbc-news) and Business Insider (business-insider).
You can generate your API key by registering on NewsAPI.org.
So the request will look like this:
https://newsapi.org/v1/articles?source=bbc-news&apiKey=123456
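Instead of concatenating strings by hand, the query string can also be assembled with the standard library, which takes care of URL-encoding the values. This is only a minimal sketch: build_articles_url and ARTICLES_ENDPOINT are hypothetical names chosen for illustration and are not part of NewsAPI or requests.

```python
from urllib.parse import urlencode

# Base URL of the articles endpoint described above
ARTICLES_ENDPOINT = "https://newsapi.org/v1/articles"


def build_articles_url(source, api_key):
    """Return the articles request URL for one source (hypothetical helper)."""
    # urlencode escapes any characters that are not safe in a query string
    query = urlencode({"source": source, "apiKey": api_key})
    return ARTICLES_ENDPOINT + "?" + query


print(build_articles_url("bbc-news", "123456"))
```

If you already use requests, passing the same dictionary as params= to requests.get should give an equivalent request.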
Let's write Python code to get the news data:
import requests # this we will use to call API and get data
import json # to convert python dictionary/list to string format
# get API key from NewsAPI.org
NEWS_API_KEY = "123456"
# url for articles endpoint
# I'm using bbc-news source, you can choose a source of your choice
# or can pull data from multiple sources
url = "https://newsapi.org/v1/articles?source=bbc-news&apiKey="+NEWS_API_KEY
# call the api
response = requests.get(url)
# get the data in json format
result = response.json()
print(result)
This is the JSON response we will get:
{
"status":"ok",
"source":"bbc-news",
"sortBy":"top",
"articles":[
{
"author":"BBC News",
"title":"British Airways to resume most flights but delays still expected",
"description":"British Airways warns there will still be some delays and cancellations, a day after its IT crash.",
"url":"http://www.bbc.co.uk/news/uk-40074751",
"urlToImage":"https://ichef.bbci.co.uk/news/1024/cpsprodpb/11F52/production/_96245537_ba_reuters.jpg",
"publishedAt":"2017-05-28T07:54:49+00:00"
}
]
}
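Before processing the response, it can help to check the status field and to guard against articles that have no description. Here is a minimal sketch on a trimmed copy of the response above. extract_descriptions is a hypothetical helper name, and skipping missing or null descriptions is an assumption about how you may want to handle them.

```python
def extract_descriptions(result):
    """Return the non-empty descriptions from an articles response dict."""
    # Anything other than status "ok" is treated as "no articles" here
    if result.get("status") != "ok":
        return []
    descriptions = []
    for article in result.get("articles", []):
        description = article.get("description")
        # Skip articles whose description is missing, None or empty
        if description:
            descriptions.append(description)
    return descriptions


# Trimmed copy of the sample response shown above
sample = {
    "status": "ok",
    "source": "bbc-news",
    "sortBy": "top",
    "articles": [
        {
            "author": "BBC News",
            "title": "British Airways to resume most flights but delays still expected",
            "description": "British Airways warns there will still be some delays and cancellations, a day after its IT crash.",
        },
        # Made-up entry, added only to exercise the missing-description case
        {"author": "BBC News", "title": "Placeholder", "description": None},
    ],
}

print(extract_descriptions(sample))
```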
Goal-2: Get words and their frequency
To achieve this, we will first get the description of each news article returned by the API. Then we will split the descriptions into words using NLTK, and after that we will use collections.Counter to get the words and their frequencies.
Let's achieve this step by step!
from nltk.tokenize import word_tokenize # to split sentences into words
from nltk.corpus import stopwords # to get a list of stopwords
from collections import Counter # to get words-frequency
descriptions = []
# this is in continuation of above code
# result variable holds the json response
# all the news articles are listed under 'articles' key
# we are interested in the description of each news article
for each_article in result['articles']:
    descriptions.append(each_article['description'])
# split sentences into words
words = []
for description in descriptions:
    tokens = word_tokenize(description)
    words.extend(tokens)
# remove stopwords from our words list and also remove any word whose length is less than 3
# stopwords are commonly occurring words like is, am, are, they, some, etc.
stop_words = set(stopwords.words('english'))
words = [word for word in words if word not in stop_words and len(word)>2]
# now, get the words and their frequency
words_freq = Counter(words)
print(words_freq)
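NLTK's tokenizer and stop-word list rely on data that has to be downloaded separately (via nltk.download). To try the counting step without that setup, here is a rough stand-in that uses a regular expression instead of word_tokenize and a tiny hand-written stop-word list instead of NLTK's. Both substitutions are assumptions made for illustration, and count_words is a hypothetical helper name. The sketch also lowercases the text first, since NLTK's English stop words are lowercase and would otherwise miss words like "The".

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; NLTK's English list is much longer
STOP_WORDS = {"the", "there", "will", "still", "some", "and", "its", "after"}


def count_words(descriptions, stop_words=STOP_WORDS, min_length=3):
    """Count words across descriptions (regex stand-in for word_tokenize)."""
    words = []
    for description in descriptions:
        # Lowercase first so that "The" and "the" are treated alike
        tokens = re.findall(r"[a-z]+", description.lower())
        words.extend(tokens)
    # Drop stop words and anything shorter than min_length characters
    kept = [w for w in words if w not in stop_words and len(w) >= min_length]
    return Counter(kept)


freq = count_words([
    "British Airways warns there will still be some delays and cancellations, a day after its IT crash."
])
print(freq)
```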
Goal-3: Visualize word cloud using JQCloud
In this goal we will return the word cloud data from Python to JQCloud for the visualization.
JQCloud requires data in the following format, so before returning the word cloud data we have to convert it:
[
{
'text':'police',
'weight':100
},
{
'text':'parents',
'weight':80
}
]
Code to convert the data into a JQCloud-compatible format and dump it as a JSON string:
words_json = [{'text': word, 'weight': count} for word, count in words_freq.items()]
# json.dumps converts a JSON-compatible object, i.e. a dictionary or list, into a string
print(json.dumps(words_json))
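A news feed can easily produce hundreds of distinct words, so it may be worth sending only the most frequent ones to the browser. Here is a minimal sketch using Counter.most_common. to_jqcloud is a hypothetical helper name, and the default limit of 50 is an arbitrary assumption.

```python
import json
from collections import Counter


def to_jqcloud(words_freq, limit=50):
    """Convert a Counter into the list-of-dicts shape shown above."""
    # most_common returns (word, count) pairs, highest count first
    return [{"text": word, "weight": count}
            for word, count in words_freq.most_common(limit)]


# "school" is a made-up extra word so that the limit has something to cut
words_freq = Counter({"police": 100, "parents": 80, "school": 5})
print(json.dumps(to_jqcloud(words_freq, limit=2)))
```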
Now let's write some jQuery code to call our Flask app endpoint, get the data and build the word cloud.
First, we will write our HTML (index.html) to include the CSS and JS for JQCloud and also our jQuery script.
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/html">
<head>
    <meta charset="UTF-8">
    <title>News Word Cloud</title>
    <!-- You can download css and js from https://github.com/mistic100/jQCloud/tree/master/dist -->
    <link rel="stylesheet" href="../static/css/jqcloud.min.css">
    <!-- You need to include jquery before the jqcloud.js, you can get it from -->
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
    <script type="text/javascript" src="../static/js/jqcloud.min.js"></script>
    <!-- here we included our script.js -->
    <script type="text/javascript" src="../static/js/script.js"></script>
</head>
<body>
    <!-- Empty div where JQCloud will build the word cloud-->
    <div id="word_cloud">
    </div>
</body>
</html>
In script.js, we will call our Flask app endpoint 'word_cloud', get the data and visualize it using JQCloud:
$(document).ready(function () {
    // on page load this will fetch data from our flask-app asynchronously
    $.ajax({url: '/word_cloud', success: function (data) {
        // returned data is in string format we have to convert it back into json format
        var words_data = $.parseJSON(data);
        // we will build a word cloud into our div with id=word_cloud
        // we have to specify width and height of the word_cloud chart
        $('#word_cloud').jQCloud(words_data, {
            width: 800,
            height: 600
        });
    }});
});
Here is our complete Flask app code:
from flask import Flask, render_template
from nltk.tokenize import word_tokenize # to split sentences into words
from nltk.corpus import stopwords # to get a list of stopwords
from collections import Counter # to get words-frequency
import requests # this we will use to call API and get data
import json # to convert python dictionary to string format

app = Flask(__name__)

# get API key from NewsAPI.org
NEWS_API_KEY = "123456"


@app.route('/')
def home_page():
    return render_template('index.html')


@app.route('/word_cloud', methods=['GET'])
def word_cloud():
    try:
        # url for articles endpoint
        # I'm using bbc-news source, you can choose a source of your choice
        # or can pull data from multiple sources
        url = "https://newsapi.org/v1/articles?source=bbc-news&apiKey="+NEWS_API_KEY
        # call the api
        response = requests.get(url)
        # get the data in json format
        result = response.json()
        # all the news articles are listed under 'articles' key
        # we are interested in the description of each news article
        sentences = ""
        for news in result['articles']:
            # a description can be missing or null, so fall back to an empty string
            description = news.get('description') or ""
            sentences = sentences + " " + description
        # split sentences into words
        words = word_tokenize(sentences)
        # get stopwords
        stop_words = set(stopwords.words('english'))
        # remove stopwords from our words list and also remove any word whose length is less than 3
        # stopwords are commonly occurring words like is, am, are, they, some, etc.
        words = [word for word in words if word not in stop_words and len(word) > 2]
        # now, get the words and their frequency
        words_freq = Counter(words)
        # JQCloud requires words in format {'text': 'sample', 'weight': 100}
        # so, let's convert our words_freq into that format
        words_json = [{'text': word, 'weight': count} for word, count in words_freq.items()]
        # now convert it into a string format and return it
        return json.dumps(words_json)
    except Exception:
        # on any failure, return an empty word list so the page still loads
        return '[]'


if __name__ == '__main__':
    app.run()
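The comments in the app mention that data can be pulled from multiple sources. One way to sketch that is to keep the counting separate from the HTTP call and merge the per-source word lists into a single Counter. combined_word_freq and fetch_words are hypothetical names, and the fake_feed dictionary below is made-up data standing in for real API calls.

```python
from collections import Counter


def combined_word_freq(sources, fetch_words):
    """Merge word counts across several sources.

    fetch_words is a hypothetical callable: it takes a source code such as
    'bbc-news' and returns the filtered word list for that source.
    """
    total = Counter()
    for source in sources:
        # Counter.update adds to existing counts instead of replacing them
        total.update(fetch_words(source))
    return total


# Stand-in for the real API call, using made-up words for illustration
fake_feed = {
    "bbc-news": ["police", "flights", "police"],
    "bloomberg": ["markets", "police"],
}
totals = combined_word_freq(["bbc-news", "bloomberg"], fake_feed.get)
print(totals)
```

In the Flask app, fetch_words would wrap the request, tokenization and stop-word filtering steps shown above for a single source.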
I added a jumbotron from Bootstrap to make it look a little better!
You can fork the complete code from the Git repository: https://github.com/prateekkrjain/newsapi_word_cloud