Wordcloud with sentiment analysis

Published Apr 25, 2019. Last updated Apr 28, 2019.

Only a few days are left until the 2019 general elections in Spain. Four candidates are taking part in TV debates. The media are talking about polls and undecided voters. Parties are spending big money on Facebook advertising. Political leaders are taking part in their last rallies and sending their final words on Twitter.

So, this is the perfect moment to analyze the Twitter messages of the main leaders. What has their message been during the last year? This post is a Twitter word analysis of the five most important political leaders in Spain in 2019. I will show you how to build some wordclouds and classify their words according to positive and negative sentiment using R.

I chose five leaders (ordered by predicted number of votes according to polls on 24/04/2019) and their tweets from the 12 months ending on 24 April 2019:

Pedro Sánchez (PSOE) (1M followers)
Pablo Casado (PP) (238k followers)
Albert Rivera (Ciudadanos) (1.1M followers)
Pablo Iglesias (Unidas Podemos) (2.2M followers)
Santiago Abascal (VOX) (218k followers)

The first step is to retrieve the tweets from these politicians. Let's assume that we already have the tweets for the selected period (12 months) in a CSV file for each leader. I will describe the steps for one leader; the process is similar for the rest. For Pablo Casado, we have a file called 'PabloCasado.csv' with a column called text that contains the content of each tweet.

library(readr)  # provides read_delim()
Tweets <- read_delim("PabloCasado.csv", delim = ";")

Next, I use the tm library for text mining: I build a Corpus, clean the content and generate the document-term matrix. Then I tidy the matrix and join it with a data frame containing the sentiment classification. The join looks something like this:

library(dplyr)  # provides %>% and inner_join()
ap_sentiments <- ap_td %>%
  inner_join(get_sentiments_spanish(), by = c(term = "word"))

On one side I have a data frame whose rows contain the leader's words and their frequencies. On the other side, I have a data frame containing a list of positive words in Spanish (such as abrazar, cantar, bailar) and another with negative words (such as llorar, golpear, sufrir). The inner_join therefore returns the words tweeted by the leader that have a positive or negative classification. That is the main idea of our simple sentiment analysis.
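
The steps described above can be sketched end to end on a few made-up tweets. This is only a minimal illustration, not the original code of this post: the tweets, the `toy_lexicon` data frame (a hypothetical stand-in for the Spanish lexicon) and the particular cleaning steps are all assumptions.

```r
library(tm)        # corpus handling and cleaning
library(tidytext)  # tidy() method for document-term matrices
library(dplyr)     # %>% and inner_join()

# Hypothetical tweets standing in for the 'text' column of a leader's CSV.
tweets <- data.frame(
  text = c("Gracias por abrazar la libertad",
           "No queremos sufrir ni llorar",
           "Vamos a cantar y bailar"),
  stringsAsFactors = FALSE
)

# Build the corpus and clean it (assumed steps): lower case, no punctuation,
# no numbers, no Spanish stop words, no extra whitespace.
corpus <- VCorpus(VectorSource(tweets$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("spanish"))
corpus <- tm_map(corpus, stripWhitespace)

# Document-term matrix, then one row per (document, term, count).
dtm <- DocumentTermMatrix(corpus)
ap_td <- tidy(dtm)

# Hypothetical miniature lexicon with the example words from the text.
toy_lexicon <- data.frame(
  word = c("abrazar", "cantar", "bailar", "llorar", "golpear", "sufrir"),
  sentiment = c(rep("positive", 3), rep("negative", 3)),
  stringsAsFactors = FALSE
)

# Keep only the tweeted words that have a sentiment classification.
ap_sentiments <- ap_td %>%
  inner_join(toy_lexicon, by = c(term = "word"))
print(ap_sentiments)
```

Words that are not in the lexicon (for example 'libertad' here) simply drop out of the result, which is why the choice of lexicon matters so much for this kind of analysis.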

As these tweets are written in Spanish, I needed a Spanish lexicon of positive and negative words, and I decided to use this one:

Saralegi X., San Vicente I. 2013. "Elhuyar at TASS 2013". In Proceedings of "XXIX Congreso de la Sociedad Española de Procesamiento de lenguaje natural". Workshop on Sentiment Analysis at SEPLN (TASS2013). Madrid. pp. 143-150. ISBN: 978-84-695-8349-4

Now it is time to process the new data frame and visualize the results. Some immediate metrics: how many positive words did this candidate tweet during that period? How many negative words? Is the balance positive? Who is the most positive leader, and who is the most negative? The following table summarizes this:

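| Leader | Positive Words | Negative Words | Total (Positive minus Negative) |
| --- | --- | --- | --- |
| Pedro Sánchez | 3911 | 1153 | 2758 |
| Pablo Casado | 6054 | 3024 | 3030 |
| Albert Rivera | 4300 | 1954 | 2346 |
| Pablo Iglesias | 2443 | 1271 | 1172 |
| Santiago Abascal | 1425 | 1634 | -209 |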
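
A summary like the one in the table can be computed from the joined data frame. The sketch below is an assumption about how that might look: the `ap_sentiments` rows are invented, and it sums word occurrences (the count column) rather than distinct words, which is one possible reading of the table.

```r
library(dplyr)

# Hypothetical result of the join: one row per classified word.
ap_sentiments <- data.frame(
  term = c("libertad", "gracia", "igualdad", "odio", "miedo"),
  count = c(5, 8, 2, 3, 1),
  sentiment = c("positive", "positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Total occurrences per sentiment class.
totals <- ap_sentiments %>%
  group_by(sentiment) %>%
  summarise(words = sum(count))

positive <- totals$words[totals$sentiment == "positive"]
negative <- totals$words[totals$sentiment == "negative"]

# The last column of the table: positive minus negative.
balance <- positive - negative
print(c(positive = positive, negative = negative, balance = balance))
```

Repeating this for each leader's file and binding the rows together would give one line of the table per leader.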

Facts: Mr. Casado has the highest positive balance and is also the leader who tweets the most (at least in sentiment words). The second most positive leader is Mr. Sánchez, and the third is Mr. Rivera. About one third of Mr. Iglesias's sentiment words are negative, and Mr. Abascal is the only leader with more negative words than positive ones.

To build the word clouds, I used the wordcloud library and colored each word according to its classification. The results are the following:

Pedro Sánchez:
SanchezCastejon_wc_positive_withcolor.png

Pablo Casado:
PabloCasado_wc_negative_withcolor.png
Albert Rivera:
AlbertRivera_wc_positive_withcolor.png
Pablo Iglesias:
PabloIglesias_wc_positive_withcolor.png
Santiago Abascal:
SantiagoAbascal_wc_positive_withcolor.png
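
One way to draw a cloud colored by classification is sketched below. The words, counts and color names are invented for illustration; only the `wordcloud()` call with `ordered.colors = TRUE` (one color per word, in the same order as the words) comes from the wordcloud package itself.

```r
library(wordcloud)

# Hypothetical classified words with their frequencies.
words <- data.frame(
  term = c("gracia", "libertad", "odio", "miedo"),
  count = c(20, 12, 6, 4),
  sentiment = c("positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Assumed color scheme: one color per sentiment class.
sentiment_colors <- c(positive = "darkgreen", negative = "red")
word_colors <- unname(sentiment_colors[words$sentiment])

set.seed(42)  # make the layout reproducible
wordcloud(words = words$term,
          freq = words$count,
          min.freq = 1,
          random.order = FALSE,   # most frequent words in the center
          colors = word_colors,
          ordered.colors = TRUE)  # colors[i] belongs to words[i]
```

Filtering `words` to a single sentiment before the call gives a positive-only or negative-only cloud; passing all rows gives a combined cloud in which color alone distinguishes the two classes.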

When I first looked at the results, I thought: gee, why do they use the word 'gracia' so many times? Then I discovered that the word all the leaders use is 'gracias', which means 'thank you'. A method in the tm library was applied to reduce each word to its stem, so that 'games' and 'game' are both represented by the same stem. That is why 'gracia' appears instead of 'gracias'.
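
The effect of stemming can be tried directly with tm's `stemDocument()`, which relies on the SnowballC package. Which stemmer language was used for the clouds is not stated here, so `language = "english"` in this small sketch is purely an assumption for illustration.

```r
library(tm)  # stemDocument() needs the SnowballC package installed

# A plural and its singular should collapse to the same stem;
# 'gracias' is included to see what happens to its trailing 's'.
stems <- stemDocument(c("games", "game", "gracias"), language = "english")
print(stems)
```

Stemming merges word variants, which makes frequencies more meaningful, at the price of showing truncated forms in the cloud.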
It is important to note that the negative clouds appear bigger than the positive clouds, but that says nothing about the relative frequency of negative and positive words. To show this, let's plot negative and positive words together for one leader, to verify that in general the leaders tweet more positive words than negative ones. For instance, for Pablo Casado:

PabloCasado_wc_all_bycolor.png

So wordclouds allow a quick visualization of the most frequent words and, in our case, a check of whether each leader's most frequent words are aligned with the core values of their party. Those familiar with the Spanish political scene may draw some conclusions from the wordclouds, such as:

  1. Far right (VOX)
  • The number of negative words used by Santiago Abascal, leader of a new far-right party, with 'odio' (hate) and 'miedo' (fear) as the most frequent words.
  2. Liberal parties (PP, Ciudadanos and VOX)
  • The positive word 'libertad' (freedom) appears with high frequency for the right-wing parties (Mr. Casado, Mr. Abascal and Mr. Rivera).
  3. Liberal parties (PP, Ciudadanos and VOX)
  • The negative word 'impuestos' (taxes) appears for the liberal parties (Mr. Casado and Mr. Rivera) but not for Mr. Abascal.
  4. Socialist party (PSOE)
  • The positive word 'igualdad' (equality) and the negative words 'fight' and 'inequality' appear for the PSOE (Mr. Sánchez).
  5. Far-left party (Unidas Podemos)
  • The positive word 'derechos' (rights) and the negative word 'corruptos' (corrupt) appear for Unidas Podemos (Mr. Iglesias).
  6. PP
  • The negative word 'terrorismo' (terrorism) (Mr. Casado).
  7. Ciudadanos
  • The word 'odio' (hate) appears as a very frequent negative word (Mr. Rivera).

That is all: just a glimpse of the powerful tools that R offers for text mining and for visualizing the results with modern techniques.

Discover and read more posts from Luis
Comments (1)
Rushabh Agarwal
5 years ago

Where can I get the code? I need this in a project.