Data-Informed Thinking + Doing

Topic Classification Using Bag-of-Words

Extracting the important topics from the final 2012 presidential debate transcript—using TF-IDF in R, Python, and Julia.

Term frequency-inverse document frequency (TF-IDF) is a baseline natural language processing (NLP) technique for discovering and extracting meaningful patterns in text. It starts from a Bag-of-Words representation, which counts how often each word appears in a document. It then weights each count by how rare that word is across the whole collection of documents. Words that are frequent in one document but uncommon elsewhere score highest, which signals the words that are driving that document's topic. More advanced techniques, such as word embeddings (e.g., Word2Vec) and language models (e.g., BERT), are explored in other posts.
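The following is a minimal from-scratch sketch in Python that makes the weighting concrete. It makes two assumptions. First, tokenization is plain whitespace splitting. Second, it follows one common TF-IDF convention: term frequency is a word's count divided by the document's length, and inverse document frequency is the natural log of the number of documents divided by the number of documents that contain the word. The three toy "documents" are lowercased phrases taken from the transcript. `tf_idf` is a hypothetical helper, not a library function.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = count of term in the document / number of terms in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)  # the Bag-of-Words for this document
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "we ended the war in iraq",
    "iran on the path to a nuclear weapon",
    "the war in syria has now spilled over into lebanon",
]
for weights in tf_idf(docs):
    # Show the three highest-weighted terms of each toy document
    print(sorted(weights.items(), key=lambda kv: -kv[1])[:3])
```

"the" appears in every toy document, so its inverse document frequency is ln(3/3) = 0 and it drops out completely. Words that occur in only one document, such as "iraq", receive the largest weights in that document.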

Let’s see how a Bag-of-Words weighted with TF-IDF can help us process the transcript of the final 2012 presidential debate and understand the most important issues/promises for U.S. President (and fellow Columbian) Barack Obama and former Massachusetts Governor (and current U.S. Senator from Utah) Mitt Romney.

Getting Started

If you are interested in reproducing this work, here are the versions of R, Python, and Julia I used, along with the package versions for each. I have also adopted Leland Wilkinson’s approach to data visualization, the Grammar of Graphics. Finally, my coding style here is deliberately verbose. That makes it easy to trace where each function, method, and variable comes from, and I hope it makes this a learning experience for everyone, including me.

cat(R.version$version.string, R.version$nickname)
R version 4.2.3 (2023-03-15) Shortstop Beagle
require(devtools)
devtools::install_version("jsonlite", version="1.8.7", repos="http://cran.us.r-project.org")
devtools::install_version("dplyr", version="1.1.2", repos="http://cran.us.r-project.org")
devtools::install_version("tidyr", version="1.3.0", repos="http://cran.us.r-project.org")
devtools::install_version("ggplot2", version="3.4.2", repos="http://cran.us.r-project.org")
devtools::install_version("tidytext", version="0.4.1", repos="http://cran.us.r-project.org")
devtools::install_version("tm", version="0.7-11", repos="http://cran.us.r-project.org")
devtools::install_version("qdap", version="2.4.6", repos="http://cran.us.r-project.org")
devtools::install_version("wordcloud", version="2.6", repos="http://cran.us.r-project.org")
library(jsonlite)
library(dplyr)
library(tidyr)
library(ggplot2)
library(tidytext)
library(tm)
library(qdap)
library(wordcloud)
import sys
print(sys.version)
3.11.4 (v3.11.4:d2340ef257, Jun  6 2023, 19:15:51) [Clang 13.0.0 (clang-1300.0.29.30)]
!pip install pandas==2.0.3
!pip install plotnine==0.12.1
import pandas
import plotnine
using InteractiveUtils
InteractiveUtils.versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
Environment:
  DYLD_FALLBACK_LIBRARY_PATH = /Library/Frameworks/R.framework/Resources/lib:/Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/jre/lib/server
  DYLD_LIBRARY_PATH = /Library/Java/JavaVirtualMachines/jdk-20.jdk/Contents/Home/lib/server
using Pkg
Pkg.add(name="CSV", version="0.10.11")
Pkg.add(name="DataFrames", version="1.5.0")
Pkg.add(name="CategoricalArrays", version="0.10.8")
Pkg.add(name="Colors", version="0.12.10")
Pkg.add(name="Cairo", version="1.0.5")
Pkg.add(name="Gadfly", version="1.4.0")
using Dates
using CSV
using DataFrames
using CategoricalArrays
using Colors
using Cairo
using Gadfly
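Pinned versions only help if the environment actually matches them. The following Python sketch is an optional convenience that is not part of the original workflow. It uses the standard library's `importlib.metadata` to compare the installed versions against the two pip pins above. The `PINNED` dictionary is copied by hand from those pip commands, and `check_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Pins copied by hand from the pip install commands above
PINNED = {"pandas": "2.0.3", "plotnine": "0.12.1"}


def check_versions(pins):
    """Map each package name to (installed version or None, matches pin?)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (installed, installed == wanted)
    return report


for name, (installed, ok) in check_versions(PINNED).items():
    print(f"{name}: installed={installed} pinned={PINNED[name]} match={ok}")
```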

Importing and Examining the Dataset

transcript_r <- readLines("../../dataset/2012-final-presidential-debate-obama-romney.txt")
str(transcript_r)
 chr [1:973] "SCHIEFFER: Good evening from the campus of Lynn University here in Boca Raton, Florida. This is the fourth and "| __truncated__ "" "This one’s on foreign policy. I’m Bob Schieffer of CBS News. The questions are mine, and I have not shared them"| __truncated__ "" "SCHIEFFER: The audience has taken a vow of silence — no applause, no reaction of any kind, except right now whe"| __truncated__ "" "(APPLAUSE)" "" "Gentlemen, your campaigns have agreed to certain rules and they are simple. They’ve asked me to divide the even"| __truncated__ "" "Tonight’s debate, as both of you know, comes on the 50th anniversary of the night that President Kennedy told t"| __truncated__ "" "So let’s begin." "" "SCHIEFFER: The first segment is the challenge of a changing Middle East and the new face of terrorism. I’m goin"| __truncated__ "" "Governor Romney, you said this was an example of an American policy in the Middle East that is unraveling befor"| __truncated__ "" "SCHIEFFER: I’d like to hear each of you give your thoughts on that." "" "Governor Romney, you won the toss. You go first." "" "ROMNEY: Thank you, Bob. And thank you for agreeing to moderate this debate this evening. Thank you to Lynn Univ"| __truncated__ "" "This is obviously an area of great concern to the entire world, and to America in particular, which is to see a"| __truncated__ "" "With the Arab Spring, came a great deal of hope that there would be a change towards more moderation, and oppor"| __truncated__ "" "Our hearts and — and minds go out to them. Mali has been taken over, the northern part of Mali by al-Qaeda type"| __truncated__ "" "But we can’t kill our way out of this mess. We’re going to have to put in place a very comprehensive and robust"| __truncated__ "" "ROMNEY: It’s certainly not hiding. This is a group that is now involved in 10 or 12 countries, and it presents "| __truncated__ "" "SCHIEFFER: Mr. President?" 
"" "OBAMA: Well, my first job as commander in chief, Bob, is to keep the American people safe. And that’s what we’v"| __truncated__ "" "We ended the war in Iraq, refocused our attention on those who actually killed us on 9/11. And as a consequence"| __truncated__ "" "In addition, we’re now able to transition out of Afghanistan in a responsible way, making sure that Afghans tak"| __truncated__ "" "But I think it’s important to step back and think about what happened in Libya. Keep in mind that I and America"| __truncated__ "" "OBAMA: Now that represents the opportunity we have to take advantage of. And, you know, Governor Romney, I’m gl"| __truncated__ "" "ROMNEY: Well, my strategy is pretty straightforward, which is to go after the bad guys, to make sure we do our "| __truncated__ "" "But my strategy is broader than that. That’s — that’s important, of course. But the key that we’re going to hav"| __truncated__ "" "We don’t want another Iraq, we don’t want another Afghanistan. That’s not the right course for us. The right co"| __truncated__ "" "And how do we do that? A group of Arab scholars came together, organized by the U.N., to look at how we can hel"| __truncated__ "" "One, more economic development. We should key our foreign aid, our direct foreign investment, and that of our f"| __truncated__ "" "Number two, better education." "" "Number three, gender equality." "" "Number four, the rule of law. We have to help these nations create civil societies." "" "But what’s been happening over the last couple of years is, as we’ve watched this tumult in the Middle East, th"| __truncated__ "" "ROMNEY: It’s wonderful that Libya seems to be making some progress, despite this terrible tragedy." "" "But next door, of course, we have Egypt. Libya’s 6 million population; Egypt, 80 million population. We want — "| __truncated__ "" "And, of course, Iran on the path to a nuclear weapon, we’ve got real (inaudible)." 
"" "SCHIEFFER: We’ll get to that, but let’s give the president a chance." "" "OBAMA: Governor Romney, I’m glad that you recognize that Al Qaida is a threat, because a few months ago when yo"| __truncated__ "" "But Governor, when it comes to our foreign policy, you seem to want to import the foreign policies of the 1980s"| __truncated__ "" "You say that you’re not interested in duplicating what happened in Iraq. But just a few weeks ago, you said you"| __truncated__ "" "You said that we should still have troops in Iraq to this day. You indicated that we shouldn’t be passing nucle"| __truncated__ "" "OBAMA: So, what — what we need to do with respect to the Middle East is strong, steady leadership, not wrong an"| __truncated__ "" "SCHIEFFER: I’m going to add a couple of minutes here to give you a chance to respond." "" "ROMNEY: Well, of course I don’t concur with what the president said about my own record and the things that I’v"| __truncated__ "" "But I’ll respond to a couple of things that you mentioned. First of all, Russia I indicated is a geopolitical foe. Not…" "" "(CROSSTALK)" "" "ROMNEY: Excuse me. It’s a geopolitical foe, and I said in the same — in the same paragraph I said, and Iran is "| __truncated__ "" "(CROSSTALK)" "" "ROMNEY: Oh you didn’t? You didn’t want a status of…" "" "OBAMA: What I would not have had done was left 10,000 troops in Iraq that would tie us down. And that certainly"| __truncated__ "" "ROMNEY: I’m sorry, you actually — there was a — there was an effort on the part of the president to have a stat"| __truncated__ "" "(CROSSTALK)" "" "OBAMA: Governor…" "" "(CROSSTALK)" "" "ROMNEY: …that your posture. That was my posture as well. You thought it should have been 5,000 troops…" "" "(CROSSTALK)" "" "OBAMA: Governor?" "" "ROMNEY: … I thought there should have been more troops, but you know what? The answer was we got…" "" "(CROSSTALK)" "" "ROMNEY: … no troops through whatsoever." 
"" "OBAMA: This was just a few weeks ago that you indicated that we should still have troops in Iraq." "" "ROMNEY: No, I…" "" "(CROSSTALK)" "" "ROMNEY: …I’m sorry that’s a…" "" "(CROSSTALK)" "" "OBAMA: You — you…" "" "ROMNEY: …that’s a — I indicated…" "" "(CROSSTALK)" "" "OBAMA: …major speech." "" "(CROSSTALK)" "" "ROMNEY: …I indicated that you failed to put in place a status…" "" "(CROSSTALK)" "" "OBAMA: Governor?" "" "(CROSSTALK)" "" "ROMNEY: …of forces agreement at the end of the conflict that existed." "" "OBAMA: Governor — here — here’s — here’s one thing…" "" "(CROSSTALK)" "" "OBAMA: …here’s one thing I’ve learned as commander in chief." "" "(CROSSTALK)" "" "SCHIEFFER: Let him answer…" "" "OBAMA: You’ve got to be clear, both to our allies and our enemies, about where you stand and what you mean. You"| __truncated__ "" "Now, it is absolutely true that we cannot just meet these challenges militarily. And so what I’ve done througho"| __truncated__ "" "Number two, make sure that they are standing by our interests in Israel’s security, because it is a true friend"| __truncated__ "" "Number three, we do have to make sure that we’re protecting religious minorities and women because these countr"| __truncated__ "" "Number four, we do have to develop their economic — their economic capabilities." "" "But number five, the other thing that we have to do is recognize that we can’t continue to do nation building i"| __truncated__ "" "SCHIEFFER: Let me interject the second topic question in this segment about the Middle East and so on, and that"| __truncated__ "" "The war in Syria has now spilled over into Lebanon. We have, what, more than 100 people that were killed there "| __truncated__ "" "Mr. President, it’s been more than a year since you saw — you told Assad he had to go. Since then, 30,000 Syria"| __truncated__ ""
"The war goes on. He’s still there. Should we reassess our policy and see if we can find a better way to influen"| __truncated__ "" "And you go first, sir." "" "OBAMA: What we’ve done is organize the international community, saying Assad has to go. We’ve mobilized sanctio"| __truncated__ "" "But ultimately, Syrians are going to have to determine their own future. And so everything we’re doing, we’re d"| __truncated__ "" "This — what we’re seeing taking place in Syria is heartbreaking, and that’s why we are going to do everything w"| __truncated__ "" "And I am confident that Assad’s days are numbered. But what we can’t do is to simply suggest that, as Governor "| __truncated__ "" "ROMNEY: Well, let’s step back and talk about what’s happening in Syria and how important it is. First of all, 3"| __truncated__ "" "ROMNEY: Syria is Iran’s only ally in the Arab world. It’s their route to the sea. It’s the route for them to ar"| __truncated__ "" "And so the right course for us, is working through our partners and with our own resources, to identify respons"| __truncated__ "" "But the Saudi’s and the Qatari, and — and the Turks are all very concerned about this. They’re willing to work "| __truncated__ "" "This — this is a critical opportunity for America. And what I’m afraid of is we’ve watched over the past year o"| __truncated__ "" "SCHIEFFER: All right." "" "ROMNEY: …by the leadership role." "" "OBAMA: We are playing the leadership role. We organized the Friends of Syria. We are mobilizing humanitarian su"| __truncated__ "" "And to the governor’s credit, you supported us going into Libya and the coalition that we organized. But when i"| __truncated__ ""
"Imagine if we had pulled out at that point. You know, Moammar Gadhafi had more American blood on his hands than"| __truncated__ "" "But we did so in a careful, thoughtful way, making certain that we knew who we were dealing with, that those fo"| __truncated__ "" "SCHIEFFER: Governor, can I just ask you, would you go beyond what the administration would do, like for example"| __truncated__ "" "ROMNEY: I don’t want to have our military involved in Syria. I don’t think there is a necessity to put our mili"| __truncated__ "" "As I indicated, our objectives are to replace Assad and to have in place a new government which is friendly to "| __truncated__ "" ...
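In Python, a comparable way to load the file is to read it as text and split it into lines. This sketch rests on two assumptions. First, the transcript is UTF-8 encoded, which is likely because it contains curly quotes and em dashes. Second, the relative path from the R code applies unchanged. `read_lines` is a hypothetical helper. `str.splitlines()` also breaks on a few characters that R's `readLines()` does not, such as form feeds and Unicode line separators, so in unusual files the line count may not match exactly.

```python
from pathlib import Path


def read_lines(path):
    """Read a text file into a list of lines without trailing newlines,
    similar in spirit to R's readLines()."""
    # Assumption: the transcript file is UTF-8 encoded
    return Path(path).read_text(encoding="utf-8").splitlines()


# Path reused from the R code above; adjust it to your own folder layout.
# transcript_py = read_lines("../../dataset/2012-final-presidential-debate-obama-romney.txt")
# len(transcript_py)   # line count, the counterpart of chr [1:973] in str()
# transcript_py[:25]   # Python slices are 0-based and end-exclusive, so this is R's [1:25]
# transcript_py[-1]    # the last line, the counterpart of R's transcript_r[973]
```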
transcript_r[1:25]
 [1] "SCHIEFFER: Good evening from the campus of Lynn University here in Boca Raton, Florida. This is the fourth and last debate of the 2012 campaign, brought to you by the Commission on Presidential Debates."
 [2] ""
 [3] "This one’s on foreign policy. I’m Bob Schieffer of CBS News. The questions are mine, and I have not shared them with the candidates or their aides."
 [4] ""
 [5] "SCHIEFFER: The audience has taken a vow of silence — no applause, no reaction of any kind, except right now when we welcome President Barack Obama and Governor Mitt Romney."
 [6] ""
 [7] "(APPLAUSE)"
 [8] ""
 [9] "Gentlemen, your campaigns have agreed to certain rules and they are simple. They’ve asked me to divide the evening into segments. I’ll pose a question at the beginning of each segment. You will each have two minutes to respond and then we will have a general discussion until we move to the next segment."
[10] ""
[11] "Tonight’s debate, as both of you know, comes on the 50th anniversary of the night that President Kennedy told the world that the Soviet Union had installed nuclear missiles in Cuba, perhaps the closest we’ve ever come to nuclear war. And it is a sobering reminder that every president faces at some point an unexpected threat to our national security from abroad."
[12] ""
[13] "So let’s begin."
[14] ""
[15] "SCHIEFFER: The first segment is the challenge of a changing Middle East and the new face of terrorism. I’m going to put this into two segments so you’ll have two topic questions within this one segment on the subject. The first question, and it concerns Libya. The controversy over what happened there continues. Four Americans are dead, including an American ambassador. Questions remain. What happened? What caused it? Was it spontaneous? Was it an intelligence failure? Was it a policy failure? Was there an attempt to mislead people about what really happened?"
[16] ""
[17] "Governor Romney, you said this was an example of an American policy in the Middle East that is unraveling before our very eyes."
[18] ""
[19] "SCHIEFFER: I’d like to hear each of you give your thoughts on that."
[20] ""
[21] "Governor Romney, you won the toss. You go first."
[22] ""
[23] "ROMNEY: Thank you, Bob. And thank you for agreeing to moderate this debate this evening. Thank you to Lynn University for welcoming us here. And Mr. President, it’s good to be with you again. We were together at a humorous event a little earlier, and it’s nice to maybe funny this time, not on purpose. We’ll see what happens."
[24] ""
[25] "This is obviously an area of great concern to the entire world, and to America in particular, which is to see a — a complete change in the — the structure and the — the environment in the Middle East."
transcript_r[973]
[1] "Good night."
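The output reveals how the file is structured. Paragraphs are separated by empty strings. Speaker tags such as `SCHIEFFER:` and `ROMNEY:` open each turn, continuation paragraphs carry no tag, and stage directions such as `(APPLAUSE)` and `(CROSSTALK)` appear as lines of their own. To compare what each candidate talks about, one option is to drop the blank lines and stage directions and credit each untagged paragraph to the most recent speaker. The Python sketch below illustrates that idea as an assumption, not as the approach taken in this post. `attribute_speakers` is a hypothetical helper. The tag pattern (an all-caps name followed by a colon) is inferred only from the lines shown above, so check it against the full transcript, and run it before lowercasing.

```python
import re

# Assumed speaker tag format: an all-caps name followed by a colon
SPEAKER_TAG = re.compile(r"^([A-Z]+):\s*(.*)$")
# Assumed stage direction format: a whole line wrapped in parentheses
STAGE_DIRECTION = re.compile(r"^\(.*\)$")


def attribute_speakers(lines):
    """Return (speaker, text) pairs. Untagged paragraphs are credited to the
    most recent tagged speaker; blank lines and stage directions are dropped."""
    pairs = []
    speaker = None
    for line in lines:
        stripped = line.strip()
        if not stripped or STAGE_DIRECTION.match(stripped):
            continue  # skip separators like "" and directions like (CROSSTALK)
        match = SPEAKER_TAG.match(stripped)
        if match:
            speaker, text = match.group(1), match.group(2)
        else:
            text = stripped
        pairs.append((speaker, text))
    return pairs


sample = [
    "SCHIEFFER: So let’s begin.",
    "",
    "ROMNEY: Thank you, Bob.",
    "",
    "This is obviously an area of great concern.",
    "",
    "(CROSSTALK)",
]
for speaker, text in attribute_speakers(sample):
    print(speaker, "|", text)
```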

Wrangling Data

transcript_r <- transcript_r %>%
    tolower() %>%
    # Optional cleaning steps, disabled for now (uncomment to apply):
    # tm::removePunctuation() %>%
    # qdap::replace_contraction() %>%
    # qdap::replace_symbol() %>%
    # qdap::replace_number() %>%
    # tm::removeNumbers() %>%
    # qdap::replace_abbreviation() %>%
    # qdap::bracketX() %>%
    tm::stripWhitespace()
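For readers following along in Python, the following is a rough equivalent of this pipeline. It assumes that `tm::stripWhitespace()` collapses each run of whitespace into a single space without trimming the ends. The optional `remove_punctuation` flag is a loose stand-in for the commented-out `tm::removePunctuation()`. It strips only ASCII punctuation, so the transcript's curly apostrophes and em dashes survive it, which may not match tm's behavior. `clean_lines` is a hypothetical name.

```python
import re
import string


def clean_lines(lines, remove_punctuation=False):
    """Lowercase each line and collapse runs of whitespace into one space,
    roughly mirroring tolower() %>% tm::stripWhitespace()."""
    cleaned = []
    for line in lines:
        line = line.lower()
        if remove_punctuation:
            # ASCII punctuation only; curly quotes and em dashes are kept
            line = line.translate(str.maketrans("", "", string.punctuation))
        # Collapse whitespace runs; leading/trailing spaces are not trimmed
        line = re.sub(r"\s+", " ", line)
        cleaned.append(line)
    return cleaned


print(clean_lines(["SCHIEFFER:  Good   evening from the campus of Lynn University."]))
```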

# Optional stopword and name removal, also disabled for now:
# transcript_r <- tm::removeWords(transcript_r, tm::stopwords(kind = "en"))
# transcript_r <- tm::removeWords(transcript_r, tm::stopwords(kind = "SMART"))
# transcript_r <- tm::removeWords(transcript_r, c("president", "obama", "governor", "romney", "schieffer"))
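The commented-out `tm::removeWords()` calls delete whole words from each line. The following minimal Python sketch implements the same idea with a word-boundary regular expression. `STOPWORDS` is a tiny illustrative set and not tm's "en" or "SMART" list. `NAMES` copies the speaker and title words from the last commented-out line. `remove_words` is a hypothetical helper, and unlike the R calls it also collapses the spaces left behind.

```python
import re

# Tiny illustrative stopword set; NOT tm's "en" or "SMART" lists
STOPWORDS = {"the", "and", "of", "to", "a", "in", "is", "that", "we", "i"}
# Speaker and title words, copied from the commented-out R line above
NAMES = {"president", "obama", "governor", "romney", "schieffer"}


def remove_words(text, words):
    """Drop whole-word matches of any word in `words`, then tidy the spaces."""
    if not words:
        return text
    pattern = r"\b(?:" + "|".join(map(re.escape, sorted(words))) + r")\b"
    text = re.sub(pattern, "", text)
    return re.sub(r"\s+", " ", text).strip()


print(remove_words("schieffer: the audience has taken a vow of silence", STOPWORDS | NAMES))
```

Because the pattern relies on `\b`, the colon after a speaker tag survives, and a contraction such as "i’m" loses its "i" and leaves "’m" behind. The order of the cleaning steps therefore matters.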

transcript_r[1:25]
 [1] "schieffer: good evening from the campus of lynn university here in boca raton, florida. this is the fourth and last debate of the 2012 campaign, brought to you by the commission on presidential debates."
 [2] ""
 [3] "this one’s on foreign policy. i’m bob schieffer of cbs news. the questions are mine, and i have not shared them with the candidates or their aides."
 [4] ""
 [5] "schieffer: the audience has taken a vow of silence — no applause, no reaction of any kind, except right now when we welcome president barack obama and governor mitt romney."
 [6] ""
 [7] "(applause)"
 [8] ""
 [9] "gentlemen, your campaigns have agreed to certain rules and they are simple. they’ve asked me to divide the evening into segments. i’ll pose a question at the beginning of each segment. you will each have two minutes to respond and then we will have a general discussion until we move to the next segment."
[10] ""
[11] "tonight’s debate, as both of you know, comes on the 50th anniversary of the night that president kennedy told the world that the soviet union had installed nuclear missiles in cuba, perhaps the closest we’ve ever come to nuclear war. and it is a sobering reminder that every president faces at some point an unexpected threat to our national security from abroad."
[12] ""
[13] "so let’s begin."
                                                                                                                                                                                                                                                                            ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     "schieffer: the first segment is the challenge of a changing middle east and the new face of terrorism. i’m going to put this into two segments so you’ll have two topic questions within this one segment on the subject. the first question, and it concerns libya. the controversy over what happened there continues. four americans are dead, including an american ambassador. questions remain. what happened? what caused it? was it spontaneous? was it an intelligence failure? was it a policy failure? was there an attempt to mislead people about what really happened?" 
""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     "governor romney, you said this was an example of an american policy in the middle east that is unraveling before our very eyes."                                                                                                                                                                                                                                                                                                                                                                                                                                                     
[18] ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     "schieffer: i’d like to hear each of you give your thoughts on that."                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     "governor romney, you won the toss. you go first."                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                 ""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     "romney: thank you, bob. and thank you for agreeing to moderate this debate this evening. thank you to lynn university for welcoming us here. and mr. president, it’s good to be with you again. we were together at a humorous event a little earlier, and it’s nice to maybe funny this time, not on purpose. we’ll see what happens."                                                                                                                                                                                                                                               
""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     "this is obviously an area of great concern to the entire world, and to america in particular, which is to see a — a complete change in the — the structure and the — the environment in the middle east."                                                                                                                                                                                                                                                                                                                                                                            

Top 25 Most Frequent Terms

# freq_terms() (qdap) counts word occurrences across the transcript and keeps the 25 most frequent
top_25_terms_r <- freq_terms(transcript_r, 25)
str(top_25_terms_r)
Classes 'freq_terms', 'all_words' and 'data.frame':	25 obs. of  2 variables:
 $ WORD: chr  "the" "to" "and" "that" ...
 $ FREQ: num  777 715 568 513 368 339 335 327 247 236 ...
top_25_terms_r
   WORD   FREQ
1  the     777
2  to      715
3  and     568
4  that    513
5  we      368
6  in      339
7  a       335
8  of      327
9  is      247
10 i       236
11 you     211
12 have    210
13 our     202
14 with    129
15 for     128
16 not     125
17 it      122
18 were    117
19 but     115
20 this    114
21 they    113
22 are     112
23 romney  106
24 going   103
25 on       97
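A rough Python analogue of `freq_terms(transcript_r, 25)` can be sketched with `collections.Counter` from the standard library. This is a minimal sketch, not qdap's implementation. It assumes the transcript is a list of strings (the hypothetical `transcript` below holds three lines from the debate in place of `transcript_r`). It approximates the tokenization by lowercasing, dropping apostrophes (consistent with terms such as "thats" and "weve" in the stopword-filtered list that follows), and splitting on anything that is not a letter. Counts on the full transcript may therefore differ slightly from the R output.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase, drop apostrophes ("that's" -> "thats"), and split on non-letters."""
    # Remove both straight and curly apostrophes so contractions stay one token.
    text = text.lower().replace("'", "").replace("\u2019", "")
    return [token for token in re.split(r"[^a-z]+", text) if token]


def term_frequencies(documents):
    """Count every token across all documents: a Bag-of-Words over the transcript."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


# Hypothetical stand-in for transcript_r: three lines taken from the debate.
transcript = [
    "schieffer: i’d like to hear each of you give your thoughts on that.",
    "governor romney, you won the toss. you go first.",
    "so let’s begin.",
]

counts = term_frequencies(transcript)
for word, freq in counts.most_common(5):
    print(word, freq)
```

On these three lines "you" is the only repeated token, so it tops the list with a count of 3; on the full transcript the top of the list is taken over by function words, as in the R output above.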
# Same count, this time excluding the words in the Top200Words stopword list
freq_terms(text.var=transcript_r, top=25, stopwords=Top200Words)
   WORD       FREQ
1  romney      106
2  going       103
3  thats        74
4  obama        69
5  sure         69
6  president    68
7  governor     62
8  weve         61
9  schieffer    57
10 crosstalk    49
11 military     48
12 those        43
13 years        41
14 dont         40
15 iran         38
16 nuclear      38
17 got          37
18 im           37
19 leadership   33
20 china        32
21 jobs         32
22 israel       31
23 states       27
24 american     26
25 four         26
26 nation       26
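The second call differs from the first in two visible ways. Passing `stopwords=Top200Words` removes very common words such as "the" and "to". The result also has 26 rows rather than 25, because "american", "four", and "nation" tie at a frequency of 26 for the last places. Both behaviors can be sketched in Python under a few assumptions. A tiny hypothetical stopword set stands in for `Top200Words`, and a handful of counts copied from the outputs above serve as sample data. Ties are broken alphabetically, which matches the ordering in that output. The helper name `top_terms` is illustrative and is not part of qdap.

```python
from collections import Counter


def top_terms(counts, top, stopwords=frozenset()):
    """Return the `top` most frequent non-stopword terms, keeping ties at the cutoff."""
    if top <= 0:
        return []
    ranked = sorted(
        ((word, freq) for word, freq in counts.items() if word not in stopwords),
        key=lambda item: (-item[1], item[0]),  # frequency descending, then alphabetical
    )
    if len(ranked) <= top:
        return ranked
    cutoff = ranked[top - 1][1]  # frequency of the last term that fits within `top`
    # Keep every term at or above the cutoff, so ties can push the list past `top`.
    return [(word, freq) for word, freq in ranked if freq >= cutoff]


# Sample counts copied from a few rows of the R output above.
counts = Counter({"the": 777, "to": 715, "romney": 106, "going": 103,
                  "american": 26, "four": 26, "nation": 26})
# Hypothetical stand-in for Top200Words.
stopwords = {"the", "to"}

for word, freq in top_terms(counts, top=3, stopwords=stopwords):
    print(word, freq)
```

Asking for the top 3 returns five terms here, because the third-place frequency (26) is shared by three words. This mirrors the 26-row result above.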
