Detection and analysis of major topics addressed by US politicians
Introduction

The goal of our project is to investigate which topics and problems US politicians addressed the most between 2015 and April 2020. For this purpose, we used the Quotebank dataset, which consists of more than 178M quotations extracted from English-language articles. We are interested not only in categorizing topics, but also in diving deeper into specific events that happened during this period.

Research questions

More specifically, the research questions we are trying to answer are as follows:
  • What are the major topics addressed among US politicians?
  • How does the popularity of these topics change over time?
  • Is it possible to detect certain events which justify such trends?
  • How much do US parties, Republicans and Democrats, address these problems?
  • Is it possible to compare the results with the results from previous research done on this matter?

Methodology

Although Jana's mother was worried that her daughter would have to read through 178 million quotations, luckily we were able to put our knowledge from the Applied Data Analysis course to use! To detect the major topics addressed by US politicians, we used the LDA (Latent Dirichlet Allocation) method. To filter the quotations by topic, we applied the SBERT (Sentence-BERT) model.
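
To give a feeling for what filtering quotations by topic can look like, here is a minimal sketch. It assumes that each quotation and each topic description has already been turned into an embedding vector (for example by an SBERT model) and that a quotation is kept when its cosine similarity to the topic exceeds a threshold. The function names, the toy 3-dimensional vectors, and the threshold value are all assumptions made for illustration, not the exact pipeline used in the project.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_u * norm_v)


def filter_by_topic(quote_embeddings, topic_embedding, threshold=0.5):
    """Return the ids of quotes whose embedding is close enough to the topic."""
    return [
        quote_id
        for quote_id, embedding in quote_embeddings.items()
        if cosine_similarity(embedding, topic_embedding) >= threshold
    ]


# Toy vectors standing in for real sentence embeddings (assumption).
toy_quotes = {
    "q1": [0.9, 0.1, 0.0],
    "q2": [0.0, 1.0, 0.2],
    "q3": [0.7, 0.3, 0.1],
}
toy_topic = [1.0, 0.0, 0.0]

selected = filter_by_topic(toy_quotes, toy_topic, threshold=0.8)
print(selected)
```

With real embeddings, the threshold would have to be tuned by inspecting which quotations pass the filter for each topic.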

Major topic detection

We used the LDA (Latent Dirichlet Allocation) method to detect the most important topics. We clustered the quotes separately for each year and each party. Every quote is assigned a probability of belonging to each cluster. One cluster represents one or more topics. Ideally, each cluster would cover exactly one topic, but of course this is not the case. After empirically comparing different numbers of clusters, we decided to use 8 clusters for this task. For each cluster, we show the keywords assigned to it, as well as a topic name that we derived from those keywords.
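
As a small illustration of how per-quote cluster probabilities can be read, the sketch below picks the most probable of the 8 clusters for each quote and counts how many quotes end up in each one. The probabilities are made up, and the helper names are hypothetical; in practice the rows would come from a fitted LDA model.

```python
from collections import Counter


def dominant_cluster(topic_probs):
    """Index of the most probable cluster for one quote."""
    return max(range(len(topic_probs)), key=lambda k: topic_probs[k])


def cluster_sizes(prob_rows):
    """Count how many quotes have each cluster as their dominant one."""
    return Counter(dominant_cluster(row) for row in prob_rows)


# Made-up probabilities for three quotes over 8 clusters (each row sums to 1).
toy_probs = [
    [0.60, 0.10, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.70, 0.04, 0.04, 0.04, 0.04, 0.04],
    [0.55, 0.15, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05],
]

sizes = cluster_sizes(toy_probs)
print(sizes)
```

Keeping the full probability row, rather than only the dominant cluster, preserves the fact that a quote can partly belong to several clusters.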

This gives us insight into which topics were most present in US media. Moreover, it allows us to detect some interesting events. For example, check the Republicans' quotations from 2018! Isn't it fascinating that there is a cluster containing the keywords China, trade, and war? This clearly reflects the importance of the trade war with China. Many clusters include the keywords health and budget together. Indeed, the affordability of health care is an important issue in the US.

Note: The spatial layout of the clusters isn't semantically meaningful; it was inherited from the pyLDAvis tool and only helps in visualizing the clusters. The sizes of the words in the word clouds are arbitrary and do not correlate with word frequencies.

Major topics by Democrats

Try clicking one of the topics!

Major topics by Republicans

Try clicking one of the topics!

Popularity of major topics

In the previous section, we used LDA models to detect topics. After analyzing the resulting clusters, we got an impression of which topics are addressed the most. Moreover, we see significant overlap between the topics found in the LDA clusters and the topics from a study performed by the Pew Research Center. For example, according to Pew Research, the affordability of health care is a big issue in the US, and indeed health and budget usually appear in the same clusters. Here, we decided to fix a set of topics by taking all the topics mentioned in the Pew Research study, plus some topics that we found interesting after analyzing the LDA clusters (politics, China, Russia, Korea, etc.). We show which topics from this set were the most popular in each year (based on the topic frequency in both parties).
For each year, we can conclude that politics is the absolute winner. The second, third, and fourth places are shared among topics related to crime, the economy, and Donald Trump (who entered the top 4 once he began his political career). We also see that job opportunities are an important issue for people in the US.
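
The yearly ranking described above boils down to counting topic labels per year. Here is a minimal sketch, assuming every quotation has already been labeled with a year and one topic from the fixed set; the record layout, the function name, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def rank_topics_by_year(records, top_n=4):
    """Map each year to its top_n most frequent topics, most frequent first."""
    counts = defaultdict(Counter)
    for year, topic in records:
        counts[year][topic] += 1
    return {
        year: [topic for topic, _ in counter.most_common(top_n)]
        for year, counter in counts.items()
    }


# Toy (year, topic) labels, one pair per quotation (made-up data).
toy_records = [
    (2018, "politics"), (2018, "politics"), (2018, "politics"),
    (2018, "economy"), (2018, "economy"),
    (2018, "crime"),
    (2019, "politics"), (2019, "politics"),
    (2019, "crime"),
]

ranking = rank_topics_by_year(toy_records, top_n=2)
print(ranking)
```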

Major topic popularity over time

Differences in views of major problems

We compared our results with those of a previous study conducted by the Pew Research Center, which investigated how Republicans and Democrats differ in their views of the country's major problems. The problems they considered are climate change, economic inequality, racism, sexism, job opportunities, affordability of college education, affordability of health care, drug addiction, the federal budget deficit, terrorism, and illegal immigration. They compare the percentages of people in each party who consider each of these problems relevant for the country. We tried to estimate this using a different variable: the percentage of each party's quotations (out of all of its quotations) that relate to each topic. The first plot shows the topics that, according to Pew Research, are taken more seriously by Democrats, while the second plot shows the topics that are taken more seriously by Republicans. Let us concentrate on the question of who takes a given problem more seriously. It makes sense to assume that the party that takes a problem more seriously would also speak more about it. We see that our results differ from those of the Pew Research Center only for drug addiction and job opportunities. However, in both of those cases, the differences between Republicans and Democrats are minimal. For all the other topics, our results follow a similar trend to the one found by Pew Research, although the ratios are a bit different.
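
The variable used for this comparison can be sketched in a few lines: for each party, divide the number of quotations on a topic by the total number of that party's quotations, then check which party has the larger share. The counts below are made up for illustration and the function names are hypothetical.

```python
def topic_share_by_party(topic_counts, total_quotes):
    """Percentage of each party's quotations that relate to each topic."""
    return {
        party: {
            topic: 100.0 * count / total_quotes[party]
            for topic, count in topics.items()
        }
        for party, topics in topic_counts.items()
    }


def leading_party(shares, topic):
    """Party that devotes the larger share of its quotations to the topic."""
    return max(shares, key=lambda party: shares[party].get(topic, 0.0))


# Made-up counts: quotations per topic and total quotations per party.
toy_counts = {
    "Democrats": {"climate change": 30, "terrorism": 10},
    "Republicans": {"climate change": 10, "terrorism": 40},
}
toy_totals = {"Democrats": 1000, "Republicans": 2000}

shares = topic_share_by_party(toy_counts, toy_totals)
for topic in ("climate change", "terrorism"):
    print(topic, leading_party(shares, topic))
```

Normalizing by each party's total matters because the two parties do not have the same number of quotations in the dataset, so raw counts alone would not be comparable.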

Meet the limunADA team

Jana Vučković
Gojko Čutura
Vuk Vuković
Miloš Novaković