RNW Media is an NGO that focuses on countries where there is limited freedom of expression. The organisation tries to make an impact through online channels such as social media and websites. In order to measure that impact, RNW Media drew up a Theory of Change (a kind of KPI framework for NGOs).
It turned out that no system could centrally store the data of the 12 websites and their related social media channels, so the metrics from the framework were largely measured by hand. In practice, this meant making a separate export of Facebook data 12 times, of YouTube data 12 times, of Instagram data 12 times, et cetera. Because of the volume of data and the growing complexity, it was impossible to adjust course based on that data. RNW Media asked us to help automate these processes.
The project consists of two parts. In consultation with the client, we first prioritised the channels. We then wrote Python scripts that brought the data into PowerBI. As a proof of concept, these scripts wrote the data to separate, local files that you had to open yourself. Once this proved to work well, we converted the process into a fully automated data lake.
RNW Media wants to be able to analyse all content by subject matter. That’s why we developed a method in which all social media posts are collected per subject. We linked the websites and the social media channels to give a complete overview. There are tools on the market that link different social media channels, but not in a way that suits RNW Media’s working method.
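To illustrate what collecting posts per subject can look like, here is a minimal Python sketch. It is not the method built for RNW Media: the record layout, the field names (`channel`, `subject`, `views`, `comments`) and all values are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical records: field names and values are illustrative only.
posts = [
    {"channel": "facebook", "subject": "health", "views": 1200, "comments": 30},
    {"channel": "youtube", "subject": "health", "views": 800, "comments": 12},
    {"channel": "website", "subject": "rights", "views": 500, "comments": 4},
    {"channel": "instagram", "subject": "rights", "views": 300, "comments": 9},
]


def summarise_by_subject(records):
    """Aggregate views and comments per subject across all channels."""
    summary = defaultdict(lambda: {"views": 0, "comments": 0, "channels": set()})
    for record in records:
        entry = summary[record["subject"]]
        entry["views"] += record["views"]
        entry["comments"] += record["comments"]
        entry["channels"].add(record["channel"])
    return dict(summary)


overview = summarise_by_subject(posts)
for subject, totals in sorted(overview.items()):
    print(subject, totals["views"], totals["comments"], sorted(totals["channels"]))
```

Grouping on the subject rather than on the channel is what lets website and social media figures end up in the same row of the overview.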
We used Airflow, which automates and monitors Python jobs, to set up the data lake. We rewrote the Python scripts from the first phase of the project as Airflow pipelines. This also made it possible to backfill data retroactively.
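Retroactive loading works well when every job handles exactly one day and can safely be rerun. The sketch below shows that idea in plain Python, without Airflow itself; in Airflow the scheduler passes each run its date, which the `day` argument imitates here. The function names, the in-memory `lake` dictionary and the dates are assumptions for the example, and `fetch_channel_stats` is a hypothetical placeholder for a real API call.

```python
from datetime import date, timedelta

# In-memory stand-in for the data lake, keyed by (channel, day).
lake = {}


def fetch_channel_stats(channel, day):
    """Hypothetical extractor; a real job would call the channel's API here."""
    return {"channel": channel, "day": day.isoformat(), "views": 0}


def run_daily_job(channel, day):
    """Load one day of data. Writing to a fixed key makes reruns idempotent."""
    lake[(channel, day)] = fetch_channel_stats(channel, day)


def backfill(channel, start, end):
    """Run the daily job for every day from start to end, inclusive."""
    day = start
    while day <= end:
        run_daily_job(channel, day)
        day += timedelta(days=1)


backfill("facebook", date(2020, 1, 1), date(2020, 1, 7))
backfill("facebook", date(2020, 1, 5), date(2020, 1, 7))  # rerun adds no duplicates
print(len(lake))
```

Because each day is written to its own key, filling a gap in the past or repeating a failed day overwrites that day instead of duplicating it.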
Creating internal support for the automation of data processing was a nice challenge. To help with this, we set up the Metabase tool in addition to PowerBI. In this tool, everyone within RNW Media can ‘click around’ and get insights from the data via a simple interface.
With the help of Airflow we automated the ETL process. The result is stored in the data lake and a new (BigQuery) data warehouse. Here all cleaned data is available and accessible to the analysts. They now all work with the same facts. By combining the warehouse with the data lake, the data storage is central and scalable. The data is automatically refreshed on a daily basis.
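As a rough picture of such an ETL step, the following Python sketch cleans a handful of raw rows and loads them into a table. It is a simplified illustration under stated assumptions: an in-memory SQLite database stands in for the BigQuery warehouse, and the table name `daily_views`, the field names and the sample rows are invented for the example.

```python
import sqlite3

# Hypothetical raw export; field names and values are illustrative only.
raw_rows = [
    {"channel": " Facebook ", "day": "2020-01-01", "views": "120"},
    {"channel": "youtube", "day": "2020-01-01", "views": "80"},
    {"channel": "youtube", "day": "2020-01-01", "views": "80"},  # duplicate
    {"channel": "instagram", "day": "2020-01-01", "views": ""},  # missing value
]


def transform(rows):
    """Normalise channel names, cast views to int, drop incomplete and duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row["views"]:
            continue
        record = (row["channel"].strip().lower(), row["day"], int(row["views"]))
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


def load(connection, records):
    """Write cleaned records to the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS daily_views (channel TEXT, day TEXT, views INTEGER)"
    )
    connection.executemany("INSERT INTO daily_views VALUES (?, ?, ?)", records)
    connection.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for the BigQuery warehouse
load(warehouse, transform(raw_rows))
total = warehouse.execute("SELECT SUM(views) FROM daily_views").fetchone()[0]
print(total)
```

Keeping the cleaning rules in one `transform` function is what allows every analyst to query the same facts afterwards.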
RNW Media now has insight into the metrics from the Theory of Change based on social media data, grouped by subject. In the next phase, we will also link the cost data (data on marketing expenditure) to the data warehouse environment. This will also give the NGO insight into its financial costs in relation to the impact it makes.
RNW Media employees in 12 different countries now have access to new data on a daily basis. They can use this data to find out how many people are discussing and viewing a subject and how engaged they are. This provides clear insight into the questions that arise around a topic. As a result, the NGO can respond better to the information needs of its target groups and make an even greater social impact.
We are still working on creating new links between the online channels of RNW Media and the data lake. The data lake now mainly contains raw data, which we are going to clean up and combine even further. As soon as all sources are connected, we will build a business layer on top of the data layer that RNW Media’s analysts can work with themselves.
In the future, the data warehouse will function as the single source of truth. Where possible (and efficient), we will also add offline data sources.