Measurable impact on social change using a data lake
RNW Media is an NGO that focuses on countries where there is limited freedom of expression. The organisation tries to make an impact through online channels such as social media and websites. To measure that impact, RNW Media drew up a Theory of Change (a kind of KPI framework for NGOs).
It turned out that no system existed that could centrally store the data of 12 websites and their associated social media channels. As a result, the metrics from the framework were largely measured by hand. In practice, this meant making 12 separate exports of the Facebook data, 12 of the YouTube data, 12 of the Instagram data, and so on. Because of the volume of data and its growing complexity, it was impossible to steer decisions on the data. RNW Media asked us to help automate their processes.
The project consisted of two parts. In the first part, we prioritised the channels in consultation with the customer and wrote Python scripts that brought the data into Power BI. As a proof of concept, these scripts wrote the data to separate local files and had to be run by hand. Once this proved to work well, the second part was to convert the process into a fully automated data lake.
RNW Media wants to be able to analyse all content by subject. That is why we developed a method in which all social media posts are collected per subject. We linked the websites to the social media channels to give a complete overview. There are tools on the market that link different social media channels, but none in a way that fits RNW Media's working method.
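To illustrate the idea of collecting posts per subject across channels, here is a minimal Python sketch. It is not the method built for RNW Media: the `Post` fields, the subject labels and the sample numbers are all invented for illustration, and it assumes each post already carries a subject label.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    channel: str   # hypothetical channel name, e.g. "facebook" or "website"
    subject: str   # hypothetical subject label already assigned to the post
    views: int
    comments: int


def summarise_by_subject(posts):
    """Aggregate views and comments per subject across all channels."""
    summary = defaultdict(lambda: {"views": 0, "comments": 0, "channels": set()})
    for post in posts:
        row = summary[post.subject]
        row["views"] += post.views
        row["comments"] += post.comments
        row["channels"].add(post.channel)
    return dict(summary)


# Invented sample data, purely for illustration.
posts = [
    Post("facebook", "health", views=1200, comments=35),
    Post("youtube", "health", views=800, comments=12),
    Post("website", "education", views=450, comments=4),
]
summary = summarise_by_subject(posts)
```

Grouping on the subject rather than on the channel is what lets a website article and the social media posts about the same topic end up in one row.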
We used Airflow, a tool that schedules and monitors Python jobs, to set up the data lake. We converted the Python scripts from the first phase of the project into Airflow pipelines. This also made it possible to backfill data for earlier periods.
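Retroactive loading works because each pipeline run is tied to a specific date, so runs for earlier dates can be executed later. The sketch below shows that idea in plain Python rather than with Airflow's own API; the function names and the `run_for_day` callback are hypothetical.

```python
from datetime import date, timedelta


def missing_partitions(start, end, loaded):
    """Return every day in [start, end] that has no loaded partition yet."""
    days = (end - start).days + 1
    return [
        start + timedelta(days=offset)
        for offset in range(days)
        if start + timedelta(days=offset) not in loaded
    ]


def backfill(start, end, loaded, run_for_day):
    """Run the daily job for each missing day and record it as loaded."""
    for day in missing_partitions(start, end, loaded):
        run_for_day(day)  # hypothetical job: fetch and store one day's data
        loaded.add(day)
    return loaded


# Two days are already present; the other two are filled in afterwards.
already_loaded = {date(2021, 1, 1), date(2021, 1, 3)}
processed = []
backfill(date(2021, 1, 1), date(2021, 1, 4), already_loaded, processed.append)
```

Because the job takes the day as its only input, running it again for a past date gives the same result as running it on that date, which is what makes a backfill safe.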
Creating internal support for automating data processing was a major challenge. To help with this, we set up Metabase alongside Power BI. It lets everyone within RNW Media 'click around' in a simple interface and extract insights from the data.
We automated the ETL process using Airflow. The results are stored in the data lake and in a new BigQuery data warehouse. All the cleaned data is available there and accessible to the analysts, who are now all working with the same facts. Combining the warehouse with the data lake makes the data storage central and scalable. The data is refreshed automatically every day.
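The extract, transform and load steps can be sketched as follows. This is an assumption-laden illustration, not the project's code: an in-memory SQLite database stands in for the BigQuery warehouse, and the record layout, the `channel_views` table and the sample values are invented.

```python
import sqlite3


def extract():
    """Return raw records; a stand-in for calls to the channel APIs."""
    return [
        {"channel": "Facebook ", "subject": "Health", "views": "1200"},
        {"channel": "youtube", "subject": "health", "views": "800"},
        {"channel": "youtube", "subject": "health", "views": None},
    ]


def transform(raw_rows):
    """Drop incomplete records and normalise names and types."""
    cleaned = []
    for row in raw_rows:
        if row["views"] is None:
            continue  # skip records without a view count
        cleaned.append(
            (
                row["channel"].strip().lower(),
                row["subject"].strip().lower(),
                int(row["views"]),
            )
        )
    return cleaned


def load(conn, rows):
    """Write the cleaned rows to the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS channel_views "
        "(channel TEXT, subject TEXT, views INTEGER)"
    )
    conn.executemany("INSERT INTO channel_views VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
total = conn.execute(
    "SELECT SUM(views) FROM channel_views WHERE subject = 'health'"
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM channel_views").fetchone()[0]
```

Keeping the three steps as separate functions mirrors how they would typically become separate tasks in a scheduled pipeline, so a failure in one step can be retried without repeating the others.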
RNW Media now has insight into the metrics from the Theory of Change based on social media data, grouped by subject. In the next phase, we will also link the cost data (marketing expenditure) to the data warehouse environment. This will give the NGO insight into its costs in relation to the impact it makes.
RNW Media employees in 12 different countries now have access to new data every day. From this data they can see how many people are discussing and viewing a topic and how engaged they are. It also gives a clear picture of the questions people have about a subject. This allows the NGO to respond better to the information needs of its target groups and to have even more social impact.
We are still working on creating new links between the online channels of RNW Media and the data lake. For now, the data lake mainly holds raw data, which we will clean and combine further. As soon as all sources are connected, we will build a business layer on top of the data lake so that the analysts of RNW Media can work with it themselves.
The data warehouse will serve as the single source of truth in the future. Where possible (and efficient), we will also add offline data sources.