I’ve been dusting off my machine learning/data-science skills by diving into Python, which has become, along with R, a lingua franca of the data analysis world. Python’s libraries for data analysis and visualization are superb and can make quick work of complex data analysis tasks.

Sentiment Analysis

Today, it’s possible to use computers to quantify, with reasonable accuracy, the emotional “sentiment” of an utterance, determining whether it is fundamentally positive, negative, or neutral in nature. This is called “sentiment analysis,” and it is a fertile area for some really interesting projects.

Looking for a simple topic to put my new Python skills to the test, I thought it’d be fun to look at the Twitter sentiment and “virality” of two famous tweeters: @realDonaldTrump and @hillaryclinton.

Using basic data science techniques, it seemed possible to answer questions such as:

  • Which of the two is currently more negative on Twitter?
  • Are Trump’s Twitter followers more “viral” than Clinton’s?
  • Under what conditions do their followers tend to like and retweet their messages?
  • When one of them goes negative, how do their followers respond?
  • etc.

[Update: I’ve just discovered Jupyter, which would have been the ideal platform for writing this up.]

Caveats

This is only the result of about 90 minutes of work, meant as a fun “throwaway” project to help me learn the frameworks. This is not university-level, peer-reviewed research. There are numerous caveats, and I caution against over-interpreting the data.

The two biggest caveats: First, Twitter bots are a real and well-known problem. Twitter is fighting them, but the data below can’t tell us how much they skew the results. Still, the results below suggest to me that Trump probably has more bots that simply retweet and like whatever he says, regardless of content.

Second, note that I’m using only a very basic sentiment analysis library: TextBlob. It is prone to false positives because it scores text largely word by word, so it can misjudge constructions such as double negatives. This is experimental and meant solely for learning and fun; several tests I’ve come across put its accuracy at only about 83–85%.
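TextBlob exposes a polarity score through `TextBlob(text).sentiment.polarity`, a float from -1.0 (most negative) to 1.0 (most positive). Below is a minimal sketch of turning that score into the positive/negative/neutral labels discussed here, assuming a simple sign-based cutoff (the exact thresholds behind the results aren’t stated). The helper names `textblob_polarity`, `label_sentiment`, and `classify_tweets` are hypothetical, and the scorer is injectable so the bucketing can be tried without TextBlob installed.

```python
from typing import Callable, Iterable, List, Optional


def textblob_polarity(text: str) -> float:
    """Return TextBlob's polarity score for text, in [-1.0, 1.0]."""
    from textblob import TextBlob  # imported lazily; only needed for real scoring
    return TextBlob(text).sentiment.polarity


def label_sentiment(polarity: float) -> str:
    """Bucket a polarity score by its sign (an assumed cutoff, not TextBlob's own)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def classify_tweets(texts: Iterable[str],
                    scorer: Optional[Callable[[str], float]] = None) -> List[str]:
    """Label each text; defaults to TextBlob, but any text -> float scorer works."""
    score = scorer or textblob_polarity
    return [label_sentiment(score(text)) for text in texts]
```

Calling `classify_tweets(tweet_texts)` scores with TextBlob (after `pip install textblob`). Swapping in a scorer that handles negation better only requires passing a different `scorer` function.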

Results

Nevertheless, here’s the result of the analysis as of March 13, 2018, looking at their last 200 tweets:

What jumped out at me

  • Trump has 2.2x as many followers as HRC. OK, duh; we don’t need Python for that. But it’s important to keep in mind when reading the table above, so it comes first. Put another way, if the two audiences were equally engaged, you’d expect the same 2.2x ratio to show up in retweets, likes, and responses to negative or positive tweets.
  • HRC’s followers are significantly more “viral” per capita: per follower, they like and retweet far more than Trump’s do. While Trump has 2.2x as many followers, he gets only 69% more “likes”; in general, HRC’s followers are about twice as active in liking and retweeting what she writes, based on her last 200 tweets.
  • Trump’s followers are equally likely to retweet him whether what he writes is positive, negative, or neutral. In other words, no matter what Trump says, a sizable percentage of his core followers will retweet it. (One plausible theory: see “bots” above. He may well have more of them.)
  • Note the substantial drop-off in retweets for Trump (but not for Clinton) around late February and early March 2018. One hypothesis (which I haven’t yet checked) is that Twitter purged bots around then.

[Chart: Likes and Retweets over Time, Hillary Clinton: much higher per follower than Trump’s]
[Chart: Likes and Retweets over Time, Donald Trump: steadier likes, regardless of content]
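The per-capita comparison is simple division: if Trump has F times the followers and E times the likes, Clinton’s likes per follower are F / E times Trump’s. Below is a minimal sketch of that arithmetic plus, for the third observation, an average of retweets within each sentiment label. The function names and the record layout (dicts with hypothetical `sentiment`, `retweets`, and `likes` keys) are assumptions for illustration. Using only the likes figures quoted above, the ratio comes out to about 1.3x; the “about twice as active” estimate covers liking and retweeting together, and the retweet figures aren’t quoted in the text.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def per_follower_ratio(follower_ratio: float, engagement_ratio: float) -> float:
    """Per-follower engagement of the smaller account relative to the larger one.

    follower_ratio:   larger account's followers / smaller account's followers
    engagement_ratio: larger account's likes (or retweets) / smaller account's
    """
    # (E_small / F_small) / (E_large / F_large) == (F_large / F_small) / (E_large / E_small)
    return follower_ratio / engagement_ratio


def mean_by_sentiment(tweets: List[dict], metric: str = "retweets") -> Dict[str, float]:
    """Average one engagement metric (e.g. 'retweets' or 'likes') per sentiment label."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for tweet in tweets:
        groups[tweet["sentiment"]].append(tweet[metric])
    return {label: mean(values) for label, values in groups.items()}


# Likes only, using the figures quoted above: 2.2x the followers, 1.69x the likes.
print(round(per_follower_ratio(2.2, 1.69), 2))  # 1.3
```

A roughly flat set of averages across the three labels is the pattern described above for Trump’s retweets.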

Most Retweeted Tweet, Trump (in past 200):

Lowest rated Oscars in HISTORY. Problem is, we don’t have Stars anymore — except your President (just kidding, of course)!

Most Retweeted Tweet, Clinton (in past 200):

RT @BillKristol: Two weeks ago a 26-year old soldier raced repeatedly into a burning Bronx apartment building, saving four people before he died in the flames. His name was Pvt. Emmanuel Mensah and he immigrated from Ghana, a country Donald Trump apparently thinks produces very subpar immigrants.
https://twitter.com/BillKristol/status/951637572576477184?utm_campaign=crowdfire&utm_content=crowdfire&utm_medium=social&utm_source=twitter

Code
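One way the last 200 tweets per account could be pulled is Tweepy’s `API.user_timeline`, which returns at most 200 tweets per request. Below is a minimal sketch, not necessarily the code behind the results above: it assumes Tweepy’s Twitter API v1.1 interface and your own developer credentials, and the function names and record keys (`text`, `likes`, `retweets`, `created_at`) are hypothetical. Twitter’s API access rules have changed since 2018, so treat it as illustrative.

```python
from typing import Dict, List


def status_to_record(status) -> Dict:
    """Flatten a Tweepy Status object into the fields used for the analysis."""
    return {
        # tweet_mode="extended" populates full_text instead of text
        "text": getattr(status, "full_text", None) or getattr(status, "text", ""),
        "likes": status.favorite_count,
        "retweets": status.retweet_count,
        "created_at": status.created_at,
    }


def fetch_recent_tweets(screen_name: str, consumer_key: str, consumer_secret: str,
                        access_token: str, access_secret: str,
                        count: int = 200) -> List[Dict]:
    """Fetch up to `count` recent tweets for one account (needs API credentials)."""
    import tweepy  # imported lazily so status_to_record works without Tweepy

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth)
    statuses = api.user_timeline(screen_name=screen_name, count=count,
                                 tweet_mode="extended")
    return [status_to_record(s) for s in statuses]
```

For example, `fetch_recent_tweets("realDonaldTrump", ...)` with your four credential strings returns a list of plain dicts that can then be scored for sentiment and grouped.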

Acknowledgements

Thanks very much to Rodolfo Ferro for his informative article, which gave me the jump-start.

Author

Steve's career has included leadership and entrepreneurship in consumer apps, online travel, games, relational databases, consulting and telecommunications. Steve founded BigOven, the first recipe app for iPhone, now with more than 13 million downloads, which was purchased in 2018. Steve served as Chairman of Escapia Inc., the leading SaaS solution for the US vacation rental industry, now part of Expedia. Steve was co-founder, President, CEO and Chairman of VacationSpot, a pioneer in the online reservation of vacation rentals, bought by Expedia in January 2000. At Expedia, Steve was Vice President of Vacation Packages, leading the vacation package and destination services teams, helping to create two patents on the first-ever dynamic vacation packaging system on the Internet, which now represents billions in annual transactions for Expedia. Steve has keynoted on several occasions at the Vacation Rental Managers Association (VRMA) and taught a course on the management of innovation at the University of Washington Graduate Business School in Seattle, Washington. Steve worked for Microsoft from 1991 to 1997 in a variety of senior marketing and executive positions, and led the creation of the internet games group, helping develop several products and patents related to online multiplayer gaming. He helped launch Microsoft Access and was involved in the acquisition of Fox Software by Microsoft in 1993. He's worked for IBM, Booz-Allen Hamilton and Bell Communications Research. Steve holds an MS in Computer Science from Stanford University in Symbolic and Heuristic Computation (AI), an MBA from Harvard Business School, where he was named a George F. Baker Scholar (awarded to top 5% of graduating class), and a dual BS in Applied Mathematics / Computer Science and Industrial Management from Carnegie Mellon University (CMU) with University Honors. 
Steve volunteers when time allows with Habitat for Humanity, University District Food Bank, YMCA Seattle, Technology Access Foundation (TAF) and SPEAK OUT Seattle, among other groups.
