Twitter Analysis – Part 1: Getting Started

For my first steps into data analysis I wanted to work on something related to my interests, to keep it enjoyable as well as informative. Since I like a wide variety of sports, I decided to analyse a UFC (Ultimate Fighting Championship) event. These events last for approximately 6 hours, depending on the length of the bouts. That time frame is perfect for me, as I can collect a large amount of data from one event without having to wait days or weeks.

After some brief research I decided I would do the analysis for the upcoming event, UFC 197: Jones vs. Saint Preux. However, I wanted to do a test run of scraping tweets from the event before it, UFC on Fox: Teixeira vs. Evans. I want to make sure that everything goes well when scraping tweets for UFC 197, and the UFC on Fox event is a perfect test because both events have the same structure.

Now that I had set some objectives, I needed to learn how to scrape tweets. From the beginning I wanted to use Python, since it’s one of the programming languages widely used for data analysis thanks to the tools and libraries available for it. Also, I didn’t have a Twitter account before this, so I set one up and did everything you need to do before scraping tweets (e.g. getting the consumer keys and access tokens).

After some googling I found a series of guides called Mine Twitter Data with Python. Funnily enough the author of these guides, Marco Bonzanini, was a teaching assistant of mine for a module in my undergraduate degree (small world eh!). After reading through the first few parts I felt this was a suitable resource for getting started with scraping tweets.

So I started off by following Part 1: Collecting data. I used all the code he provided, as I felt there is no point reinventing the wheel when it comes to scraping tweets. However, I made sure I understood the code to get a better idea of what was going on, and added some comments for myself.

Click here to see part 1 of my IPython Notebook.

For example –

I didn’t understand one particular line of code:

  • twitter_stream.filter(track=['#python'])

What is filter()? What does it do?

filter() lets you match incoming tweets against a ‘filter’ you apply. If a tweet matches your filter then you can do whatever you want with it; in my case, I store the tweet in a JSON file.
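To get a feel for the idea, here is a minimal local sketch of "match a tweet, then store it as JSON". It is not the streaming API itself (the real matching happens on Twitter's side) and it is not code from the guide. The function names, the sample tweets and the output file name are all made up for illustration, and the only assumption about the data is that each tweet is a dict with a `text` field.

```python
import json
import os
import tempfile


def matches_track(tweet, terms):
    """Return True if any tracked term appears in the tweet text.

    Simplification: plain case-insensitive substring matching,
    which is only an approximation of what the real service does.
    """
    text = tweet.get("text", "").lower()
    return any(term.lower() in text for term in terms)


def store_tweet(tweet, path):
    """Append one tweet to a file as a single line of JSON."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(tweet) + "\n")


# Hypothetical sample data standing in for a live stream.
incoming = [
    {"text": "Learning #Python this weekend"},
    {"text": "Nothing to see here"},
]

# Hypothetical output file, written to a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "python_tweets.json")

for tweet in incoming:
    if matches_track(tweet, ["#python"]):
        store_tweet(tweet, out_path)

# Read the file back: one JSON object per line.
with open(out_path, encoding="utf-8") as f:
    stored = [json.loads(line) for line in f]

print(len(stored))
```

Writing one JSON object per line keeps the file easy to append to while the event is running, and easy to read back one tweet at a time afterwards.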

The filters you can apply are:

  • follow – returns statuses from the user IDs specified
  • track – returns tweets containing the specified words or phrases
  • locations – returns tweets based on the location they were tweeted from (specified as bounding boxes)
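To make the difference between the three filter types concrete, here is a rough local sketch of what each one checks. This is only my own illustration, not how Twitter implements it. The function names, the sample tweet and the bounding box are hypothetical, and I am assuming the tweet is a dict shaped like the tweet JSON, with `text`, `user.id_str` and a `coordinates` point stored as [longitude, latitude].

```python
def by_follow(tweet, user_ids):
    """Keep tweets posted by one of the given user IDs."""
    return tweet.get("user", {}).get("id_str") in user_ids


def by_track(tweet, terms):
    """Keep tweets whose text contains any of the given terms (simplified)."""
    text = tweet.get("text", "").lower()
    return any(term.lower() in text for term in terms)


def by_box(tweet, box):
    """Keep tweets whose point falls inside the bounding box.

    box is (west_lon, south_lat, east_lon, north_lat).
    Tweets without coordinates never match.
    """
    coords = tweet.get("coordinates")
    if not coords:
        return False
    lon, lat = coords["coordinates"]
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


# Hypothetical tweet and bounding box, for illustration only.
sample = {
    "text": "Can't wait for #UFC197",
    "user": {"id_str": "12345"},
    "coordinates": {"type": "Point", "coordinates": [-115.17, 36.12]},
}
EXAMPLE_BOX = (-115.4, 36.0, -114.9, 36.4)

results = (
    by_follow(sample, {"12345"}),
    by_track(sample, ["#ufc197"]),
    by_box(sample, EXAMPLE_BOX),
)
print(results)
```

For my purposes only the track-style check matters, since I want tweets by hashtag rather than by user or place.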

I now understand that to return tweets with a certain hashtag, filter() must be used with the track parameter. This is perfect for me, as UFC 197 has an associated hashtag (#UFC197), which will help me return tweets related to the event.

In the next part I will discuss how the test run went.

 
