Since I had completed Part 1 a week or so before the UFC on Fox event, I decided to do some mini test runs. A few sporting events were going on at the time: another UFC event, UFC Fight Night: Rothwell vs. dos Santos, and the Masters (golf).
The mini-test runs
The UFC event was taking place while I finished the script, so I quickly decided to use it as a mini test run. The hashtag was #UFCZagreb. I captured tweets from the middle of the 3rd round of the main event (9.00pm BST) until 15 minutes after the fight finished, at around 10.30pm BST. The test went well, with no hiccups.
I then ran a slightly longer mini test with the Masters. I intended it to run from 10.30pm, straight after the UFC capture finished, until the end of the BBC coverage of the Masters. However, the script errored at 11.16pm. I didn't know this at the time, because I was watching the golf on TV while the script ran on my laptop in my room; only when I came back did I find out it had stopped. I suspect the laptop lost its internet connection, as it connects over WiFi.
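Moving to a VM was the fix I chose, but a complementary safeguard is to have the capture script retry after a dropped connection and log each failure, so a drop is visible rather than silent. The sketch below is a generic retry wrapper with exponential backoff. It assumes the capture code from Part 1 can be wrapped as a zero-argument callable (`flaky_capture` here is a hypothetical stand-in) and that a dropped connection surfaces as Python's built-in `ConnectionError`; many HTTP libraries raise their own exception classes instead, so those would need to be passed via the `retry_on` parameter. Note that reconnecting a stream only restarts it: tweets posted during the gap are still missed.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("capture")


def run_with_retries(task, max_retries=5, base_delay=5.0, retry_on=(ConnectionError,)):
    """Call task(); if it raises one of retry_on, wait and try again.

    The wait doubles after each failure (base_delay, 2*base_delay, ...).
    Returns task()'s result, or re-raises after max_retries failed retries.
    """
    attempt = 0
    while True:
        try:
            return task()
        except retry_on as exc:
            attempt += 1
            if attempt > max_retries:
                log.error("Giving up after %d retries: %s", max_retries, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            log.warning("Connection lost (%s); retry %d/%d in %.1fs",
                        exc, attempt, max_retries, delay)
            time.sleep(delay)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_capture():
        # Hypothetical stand-in for the real capture function:
        # fails twice with a simulated WiFi drop, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("WiFi dropped")
        return "captured"

    print(run_with_retries(flaky_capture, base_delay=0.1))
```

Each retry is logged as a warning, so even if the script eventually gives up, the log shows when and how often the connection dropped.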
This got me thinking about how to handle the upcoming test event and the actual event I wanted to capture tweets for. I wanted to avoid a repeat of the problem I had during the Masters. Thankfully, where I work we use Microsoft Azure to deliver our platform as a SaaS solution, so I have experience deploying Virtual Machines, creating customer environments and so on. This gave me the confidence to explore other cloud infrastructure providers for my data analysis learning.
The reason for using a Virtual Machine (VM) is that it can run scripts without you having to worry about it going on standby, draining your laptop battery or dropping off the WiFi, as a laptop might. A VM lets me run the script without any worry, so I can sit back and enjoy the event while it works hard scraping tweets for me!
I knew of two other major providers: Google Cloud Platform and Amazon Web Services (AWS), whose virtual machine service is EC2. After reviewing both, I decided to go with Amazon EC2, as AWS offered a 12-month free tier and has the largest share of the cloud services market. Furthermore, it would add to my existing Microsoft Azure skills.
The test run
Creating the Virtual Machine was straightforward: I followed the guides provided to deploy a t2.micro VM. On first impressions, deploying a VM in EC2 is more fiddly than in Azure because of the key pairs and setting up the VPC. Hopefully it will become less finicky as I use EC2 more. It may also be that at work we have the whole Azure process documented, which makes deploying there easier.
I needed to decide when to run the script during the UFC on Fox event. The schedule for the event was as follows:
- 4:30 pm ET – Prelims on @UFCFightPass
- 6 pm ET – Prelims on Fox
- 8 pm ET – Main Card on Fox
I decided to run the script from 9.30pm to 3.30am BST, i.e. from the start of the Fight Pass prelims until half an hour after the final bout ended. Note: ET = BST - 5 hours.
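The ET-to-BST conversion can be checked with a few lines of Python. This is a minimal sketch, assuming both zones are on summer time (US Eastern on EDT, UTC-4; UK on BST, UTC+1), which is what makes the 5-hour difference hold; the event date below is an assumption used only so the times can be converted, and fixed offsets are used instead of a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: mid-April, so both zones are on daylight/summer time.
EDT = timezone(timedelta(hours=-4), "EDT")  # US Eastern daylight time
BST = timezone(timedelta(hours=1), "BST")   # British Summer Time

# Assumed event date (year, month, day), used only for the conversion.
EVENT_DATE = (2016, 4, 16)

# Start times from the event schedule, in ET (24-hour clock).
schedule_et = {
    "Prelims on @UFCFightPass": (16, 30),
    "Prelims on Fox": (18, 0),
    "Main Card on Fox": (20, 0),
}


def to_bst(hour, minute):
    """Convert an ET wall-clock time on EVENT_DATE to a BST datetime."""
    et_time = datetime(*EVENT_DATE, hour, minute, tzinfo=EDT)
    return et_time.astimezone(BST)


for name, (h, m) in schedule_et.items():
    print(f"{name}: {to_bst(h, m):%H:%M} BST")
# Prelims on @UFCFightPass: 21:30 BST
# Prelims on Fox: 23:00 BST
# Main Card on Fox: 01:00 BST
```

The first line confirms that a 9.30pm BST start lines up with the first Fight Pass prelim, and the main card starting at 1.00am BST (the next day) explains why the run extended into the early hours.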
The event hashtag was #UFCTampa, as it was held in Tampa, Florida. The run was successful: the script didn't error at any point during the event, and I ended up with a 228MB JSON file.
In the next part I will discuss how the actual run went for UFC 197, along with some brief post-run analysis that altered my original plan slightly.