Measurable impact on social change using a data lake
RNW Media
- Customer case
- Data Engineering
- Data consultancy
RNW Media is an NGO that focuses on countries with limited freedom of expression. The organisation aims to make an impact through online channels such as social media and websites. To measure that impact, RNW Media drew up a Theory of Change (a kind of KPI framework for NGOs).
However, there was no system that could centrally store the data from 12 websites and their associated social media channels, so the metrics in the framework were largely measured by hand. In practice, this meant making 12 printouts of the Facebook data, 12 of the YouTube data, 12 of the Instagram data, and so on. Given the large volume of data and its growing complexity, it was impossible to make data-driven adjustments. RNW Media asked us to help automate these processes.
Our approach
The project consisted of two phases. First, in consultation with the customer, we prioritised the channels and wrote Python scripts that brought their data into Power BI. As a proof of concept, this data was retrieved into separate local files that had to be run manually. Once this proved to work well, we converted the process into a fully automated data lake.
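The following is a minimal sketch of what a proof-of-concept export script might look like. It assumes each channel's metrics have already been fetched as a list of dictionaries. The `fetch_channel_metrics` function, the field names and the one-CSV-per-channel layout are hypothetical, because the real scripts called each platform's own API. The script writes one local CSV file per channel, and Power BI can import those files.

```python
import csv
import tempfile
from pathlib import Path

FIELDS = ["date", "post_id", "views", "engagements"]


def fetch_channel_metrics(channel: str) -> list:
    """Hypothetical stand-in for a platform API call (Facebook, YouTube, ...)."""
    # Placeholder rows so the sketch runs without network access or credentials.
    return [
        {"date": "2021-01-01", "post_id": f"{channel}-1", "views": 120, "engagements": 14},
        {"date": "2021-01-02", "post_id": f"{channel}-2", "views": 95, "engagements": 9},
    ]


def export_channel(channel: str, out_dir: Path) -> Path:
    """Write one channel's metrics to a local CSV file for Power BI to import."""
    rows = fetch_channel_metrics(channel)
    out_path = out_dir / f"{channel}_metrics.csv"
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return out_path


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp())
    # In the proof-of-concept phase, a script like this was run by hand per channel.
    for channel in ["facebook", "youtube", "instagram"]:
        print(export_channel(channel, out_dir))
```

Because the scripts had to be run by hand, they were a natural candidate for the automated pipelines described below.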
RNW Media wants to be able to analyse all of its content by subject. We therefore developed a method that collects all social media posts per subject, and we linked the website and social media channels to give a complete overview. Tools on the market can link different social media channels, but not in a way that fits RNW Media's way of working.
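The subject-based grouping could be sketched as follows. The sketch assumes every post, whether a website article or a social media post, has already been tagged with a subject. The `Post` fields, the example subjects and the figures are hypothetical, and the article does not describe how subjects are actually assigned. The function adds up views and engagements per subject across all channels, which gives the combined website and social media view described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    channel: str      # e.g. "website", "facebook", "youtube"
    subject: str      # subject tag assigned to the post (assumed to exist already)
    views: int
    engagements: int


def totals_per_subject(posts: list) -> dict:
    """Aggregate views, engagements and post counts per subject across all channels."""
    totals = defaultdict(lambda: {"views": 0, "engagements": 0, "posts": 0})
    for post in posts:
        t = totals[post.subject]
        t["views"] += post.views
        t["engagements"] += post.engagements
        t["posts"] += 1
    return dict(totals)


# Illustrative data only.
posts = [
    Post("website", "elections", 500, 20),
    Post("facebook", "elections", 1200, 150),
    Post("youtube", "health", 800, 60),
]
print(totals_per_subject(posts))
# {'elections': {'views': 1700, 'engagements': 170, 'posts': 2}, 'health': {'views': 800, 'engagements': 60, 'posts': 1}}
```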
We used Airflow, which automates and monitors Python jobs, to set up the data lake. We converted the Python scripts from the first phase of the project into Airflow pipelines. This also made it possible to load historical data retroactively (backfilling).
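In Airflow, retroactive loading is handled by the scheduler's catch-up and backfill mechanism. That mechanism runs a pipeline once for every past interval that has no completed run yet. The plain-Python sketch below only illustrates the idea, assuming a daily schedule. It is not Airflow's own API. `run_pipeline` and the set of completed dates are hypothetical stand-ins for a real job and for Airflow's record of finished runs.

```python
from datetime import date, timedelta


def missing_dates(start: date, end: date, completed: set) -> list:
    """Return every day in [start, end] that has no completed run yet."""
    all_days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [day for day in all_days if day not in completed]


def run_pipeline(day: date) -> None:
    """Hypothetical placeholder for one daily extract-and-load job."""
    print(f"loading data for {day.isoformat()}")


def backfill(start: date, end: date, completed: set) -> list:
    """Run the pipeline for each missing day, oldest first, and record it as done."""
    ran = []
    for day in missing_dates(start, end, completed):
        run_pipeline(day)
        completed.add(day)
        ran.append(day)
    return ran


done = {date(2021, 1, 2)}
backfill(date(2021, 1, 1), date(2021, 1, 4), done)
# loading data for 2021-01-01
# loading data for 2021-01-03
# loading data for 2021-01-04
```

Each daily run is independent, so a backfill only needs to know which days are missing, and re-running it afterwards does nothing.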
Building internal support for automated data processing was a major challenge. Alongside Power BI, we therefore set up Metabase, whose simple interface lets anyone within RNW Media 'click around' and extract insights from the data.
The result
We automated the ETL process using Airflow. Its output is stored in the data lake and in a new BigQuery data warehouse, where all cleaned data is available to the analysts, so everyone now works from the same facts. Combining the warehouse with the data lake makes data storage central and scalable. The data is refreshed automatically every day.
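The daily ETL step could be sketched roughly as below. To keep the example self-contained, an in-memory SQLite database stands in for BigQuery, and a list of JSON strings stands in for raw files in the data lake. The table and field names are hypothetical. The transform step drops malformed or incomplete records and normalises channel names, so analysts only ever query cleaned data.

```python
import json
import sqlite3

REQUIRED = ("date", "channel", "views")


def extract(raw_records: list) -> list:
    """Parse raw JSON records from the data lake, skipping malformed ones."""
    parsed = []
    for line in raw_records:
        try:
            parsed.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # a real pipeline would log or quarantine these records
    return parsed


def transform(records: list) -> list:
    """Keep only complete records and normalise them into warehouse rows."""
    rows = []
    for r in records:
        if isinstance(r, dict) and all(k in r for k in REQUIRED):
            rows.append((r["date"], r["channel"].lower(), int(r["views"])))
    return rows


def load(conn: sqlite3.Connection, rows: list) -> None:
    """Write cleaned rows to the warehouse table (SQLite stands in for BigQuery)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_views (date TEXT, channel TEXT, views INTEGER)"
    )
    conn.executemany("INSERT INTO daily_views VALUES (?, ?, ?)", rows)
    conn.commit()


raw = [
    '{"date": "2021-01-01", "channel": "Facebook", "views": "120"}',
    '{"date": "2021-01-01", "channel": "YouTube"}',  # incomplete: dropped
    "not valid json",                                # malformed: skipped
]
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(raw)))
print(conn.execute("SELECT * FROM daily_views").fetchall())
# [('2021-01-01', 'facebook', 120)]
```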
RNW Media now has insight into the Theory of Change metrics, based on social media data and grouped by subject. In the next phase, we will also link cost data (marketing expenditure) to the data warehouse. This will give the NGO insight into its financial costs in relation to the impact it makes.
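Once cost data is in the warehouse, one simple way to relate spend to impact is to divide marketing expenditure by engagements per subject. The sketch below assumes both figures are already aggregated per subject. "Cost per engagement" is only one possible measure and not necessarily the one RNW Media will use, and the figures are illustrative.

```python
from typing import Dict, Optional


def cost_per_engagement(
    costs: Dict[str, float], engagements: Dict[str, int]
) -> Dict[str, Optional[float]]:
    """Divide marketing spend by engagements for every subject with cost data.

    Returns None for subjects with zero or missing engagements, to avoid
    dividing by zero.
    """
    result: Dict[str, Optional[float]] = {}
    for subject, cost in costs.items():
        n = engagements.get(subject, 0)
        result[subject] = round(cost / n, 2) if n > 0 else None
    return result


costs = {"elections": 340.0, "health": 90.0, "education": 50.0}
engagements = {"elections": 170, "health": 60}
print(cost_per_engagement(costs, engagements))
# {'elections': 2.0, 'health': 1.5, 'education': None}
```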
RNW Media employees in 12 different countries now receive fresh data every day. They can see how many people are discussing and viewing a topic and how engaged they are, and they gain clear insight into the questions people have about a subject. This lets the NGO respond better to the information needs of its target groups and increase its social impact.
We are still connecting more of RNW Media's online channels to the data lake. At present, the data lake mainly holds raw data, which we will clean and combine further. Once all sources are connected, we will build a business layer on top of the data lake so that RNW Media's analysts can work with it themselves.
The data warehouse will serve as the single source of truth in the future. Where possible (and efficient), we will also add offline data sources.
Want to know more?
Joachim will be happy to talk to you about what we can do for you and your organisation as a data partner.
Business Manager, +31(0)20 308 43 90, +31(0)6 23 59 83