Data Challenge 2018 Experience

The Data Challenge really lived up to its name: getting all the pieces of the project together by the deadline was nothing less than a challenge. Challenges came at us in various shapes and sizes. The three of us were ready to take them on and present something truly insightful by the end of Data Challenge week. Like everyone else, we had to choose a dataset from the rich repository presented to us. That is where we faced our first challenge: should we start from the data we were interested in, or from the problem? We decided to go with the problem. A considerable amount of time was spent figuring out a problem statement that motivated us to come up with a solution. Driven by the idea of being world citizens, we decided to target issues with a global impact. The Global Database of Events, Language and Tone (GDELT) describes how social systems are operating and presents us with a completely new perspective on the planet that we call home. The march of history has always been more of a stumble than a steady rise. Did you really think the world has 'evolved' this way on purpose?

Economic growth and political stability are deeply interconnected. On one hand, the uncertainty associated with an unstable political environment may reduce investment and slow economic development. On the other hand, poor economic performance may lead to government collapse and political unrest. What would it look like to use massive computing power to see the world through the eyes of the world's news media? We planned to use the world events database of the GDELT repository to highlight events of world leadership failure and provide insight into how they impact the political stability of a nation and, in turn, affect the world economy. Also, social ties between nations do not change at regular intervals, and the human eye often fails to perceive how world relations shift over a prolonged period of time. We used this global database of society to analyze how the connections between nations change over time and to present a spectrum of world relations to the audience.

With the problem statement decided, it was time to get our hands dirty with the analysis. We knew the dataset was huge (about 158GB), and since it was stored on Amazon’s S3 storage, there was a clear need to use Amazon Web Services for extraction, querying, and both exploratory and detailed analyses of the data. Little did we know, we were already facing the next challenge: Big Data! With Big Data comes a series of cascading challenges. We had to select a web service suited to our analyses and make sure the credit provided (yes, everyone was provided with free credits on AWS) was usable for the service we chose. A considerable amount of time was spent in discussion with AWS to make the right selection.

The analyses started with BigQuery, which we used to get a good idea of the dataset. BigQuery was a solution provided on the Global Events website, but the problem was that only a part of the dataset was available there for analysis. The next step was to look for a similar tool on AWS, and after a lot of research we stumbled upon Amazon Athena. Amazon Athena is a serverless, interactive query service that makes it easy to analyze big data in S3 using standard SQL. With just a few clicks we were able to access the entire Global Events dataset on S3 using simple SQL commands.
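To give a flavour of what "simple SQL commands" can look like, here is a minimal sketch in Python that builds the kind of query one might paste into the Athena console. It is an illustration and not the exact query we ran. The table name `gdelt.events` and the lowercase column names (`year`, `actor1countrycode`, `eventrootcode`) are assumptions about how the events table was declared in Athena. The helper `build_event_count_query` is hypothetical, and the event root code 14 is the CAMEO code for protests.

```python
def build_event_count_query(table: str, event_root_code: str,
                            start_year: int, end_year: int) -> str:
    """Build a SQL string that counts events per year and country.

    The table and column names are assumptions about the Athena table
    definition; adjust them to match the actual schema.
    """
    # Basic validation so that only a numeric CAMEO root code is interpolated.
    if not event_root_code.isdigit():
        raise ValueError("event_root_code must be numeric, e.g. '14'")
    if start_year > end_year:
        raise ValueError("start_year must not be after end_year")

    return (
        "SELECT year, actor1countrycode, COUNT(*) AS n_events\n"
        f"FROM {table}\n"
        f"WHERE eventrootcode = '{event_root_code}'\n"
        f"  AND year BETWEEN {int(start_year)} AND {int(end_year)}\n"
        "GROUP BY year, actor1countrycode\n"
        "ORDER BY year, n_events DESC"
    )


# Example: protest events (CAMEO root code 14) per country over a decade.
query = build_event_count_query("gdelt.events", "14", 2008, 2018)
print(query)
```

The printed text can be run as-is in the Athena query editor, provided a table with those names exists. Since Athena bills by the amount of data scanned, restricting the year range is an easy way to stretch the free credits.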

After sleepless nights spent trying to find the effect of political instability on a nation's economic growth through failures in world leadership, and to represent ties between nations over the decades, we finally found success not very far from the deadline. The struggle to make sense of the big data was not the end of it, though; we still faced the challenge of communicating the insights. We realized this had to be done in the simplest form, since we wanted every kind of audience to comprehend the results just as clearly as we did. Data visualization to the rescue: Tableau and Alteryx helped us convey the information as knowledge through timelines and geographic representations. It isn’t over until they say it is; a key aspect of the Data Challenge is to present your seven days of work in a mere 7 minutes. These challenges were truly transformative. They not only taught us something new at every step but also made us more resilient. It was a great opportunity to create interesting insights and learn new things throughout the journey. If you’re reading this and are a Data Nerd, we would definitely like to encourage you to challenge yourself in the most interesting way possible: go register for the next Data Challenge!

Aniket Jadhav, Arpit Chandra, and Shashank Kava
Master of Information Management 2018