The COVID-19 Violence Tracker

PeaceTech Lab

Marieke Schulte
Chair Digital Power Datahub
12 Oct 2021

The outbreak of the corona pandemic in early 2020 turned the world upside down. Besides countless infections, hospitalisations and deaths, many countries also saw an outbreak of violence. Citizens took to the streets, sometimes violently, to protest against the measures taken; domestic violence increased in many places; and fear and frustration fuelled racism.

In May 2020, PeaceTech Lab (US) approached PeaceTech Lab NL and Digital Power with a prototype of the COVID-19 Violence Tracker. With the help of volunteers, PeaceTech Lab had manually collected reports of corona-related violence and visualised the insights in a dashboard (see figure 1). The lab now wanted to automate the collection of news items. This turned out to be an excellent challenge for our consultants!

Figure 1: The first visualisation of the Violence Tracker

Our approach

From May 2020 onwards, Data Scientists, Data Engineers and Data Analysts from our team contributed to building the tracker. We donated more than 200 hours of work through our foundation, the Digital Power Datahub. We took charge of most of the technical implementation and discussed progress with the PeaceTech Lab teams every week. The project ultimately consisted of eight phases:

Phase 1

We started with a text analysis of the vocabulary in the manually collected news items, visualised in the word cloud below. Among the most common words in the news stories were, for example, "domestic", "violence" and "police".

A word cloud based on the prototype
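The kind of frequency count that sits behind a word cloud like this can be sketched in a few lines of Python. This is an illustrative sketch, not the code used in the project: the stop-word list, the example headlines and the `top_words` helper are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "was", "after", "amid", "during"}

def top_words(texts, n=10):
    """Count lowercase word tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Illustrative headlines, not taken from the actual dataset.
headlines = [
    "Domestic violence rises during lockdown",
    "Police clash with protesters over lockdown rules",
    "Calls to domestic violence hotline increase",
]
print(top_words(headlines, n=3))
```

The resulting word/count pairs are what a word-cloud renderer scales the words by.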

Phase 2

After some research, we chose social listening with the Brandwatch tool as our data-collection method. This allowed us to automatically collect news items from the web that contained certain words (such as "domestic" and "violence").

Phase 3

We wrote a search query containing the most relevant English words and word combinations, for example "covid" + "violence".
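A query like this can be read as groups of alternative terms that must all appear in an item. The sketch below assumes that reading; the term lists and the `matches` helper are hypothetical, and Brandwatch's own query syntax is not reproduced here.

```python
import re

# Hypothetical query: every group must match (AND), and within a group
# any one term is enough (OR). These term lists are illustrative only.
QUERY = [
    {"covid", "coronavirus", "corona"},
    {"violence", "violent", "attack"},
]

def tokens(text):
    """Lowercase word tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(text, query=QUERY):
    """Return True if the text contains at least one term from every group."""
    words = tokens(text)
    return all(words & group for group in query)

print(matches("Covid lockdown linked to rise in domestic violence"))  # True
print(matches("Violence at football match"))  # False: no covid term
```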

Phase 4

From July 2020 onwards, we continuously collected news items via Brandwatch. In total, we collected more than 9 million messages.

Phase 5

Although Brandwatch is a useful tool for data collection, it lacked some of the analysis capabilities we needed. We therefore built a separate data infrastructure in Google Cloud, to which the Brandwatch data was exported automatically, so that we could work with the data ourselves.

Phase 6

As soon as we started analysing the data, we found a lot of noise (irrelevant news items) in our dataset. To get rid of it, we held three validation rounds in which volunteers indicated which messages were truly about corona-related violence. This marked the start of a major data cleanup.
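The article does not describe how the volunteers' judgements were combined. If several volunteers labelled the same item, a majority vote is one common way to turn their votes into a single label; the sketch below assumes that setup, and the item IDs, votes and `majority_label` helper are hypothetical.

```python
from collections import Counter

def majority_label(votes):
    """Return the most common label, or None when there are no votes or a tie."""
    ranked = Counter(votes).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # unresolved tie: could go back for another review round
    return ranked[0][0]

# Hypothetical volunteer votes per news item ("relevant" / "irrelevant").
votes_per_item = {
    "item-1": ["relevant", "relevant", "irrelevant"],
    "item-2": ["irrelevant", "irrelevant", "irrelevant"],
    "item-3": ["relevant", "irrelevant"],
}
labels = {item: majority_label(v) for item, v in votes_per_item.items()}
print(labels)
```

Returning `None` on a tie, rather than picking a side, keeps ambiguous items out of the training data until someone has looked at them again.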

Phase 7

Based on insights from these validation rounds, we optimised our query in Brandwatch. We removed words with double meanings, such as "beat" (to hit someone, but also to defeat an opponent in sports), which were largely responsible for the noise.
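Validated labels make it possible to check which query terms attract the most noise. A minimal sketch, assuming each validated item is stored as a `(text, is_relevant)` pair; the term list, sample items and `noise_by_term` helper are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical query terms whose contribution to noise we want to measure.
TERMS = ["beat", "assault", "violence"]

def noise_by_term(labelled_items, terms=TERMS):
    """For each term, the share of matching items that volunteers marked irrelevant."""
    matched = defaultdict(int)
    irrelevant = defaultdict(int)
    for text, is_relevant in labelled_items:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for term in terms:
            if term in words:
                matched[term] += 1
                if not is_relevant:
                    irrelevant[term] += 1
    # Only report terms that matched at least one item.
    return {t: irrelevant[t] / matched[t] for t in terms if matched[t]}

# Illustrative labelled sample, not real validation data.
sample = [
    ("Lakers beat Celtics in overtime", False),
    ("Team beat rivals despite covid rules", False),
    ("Man beat partner during lockdown, police say", True),
    ("Covid protest turns to violence", True),
    ("Nurse assault reported at covid ward", True),
]
print(noise_by_term(sample))
```

A term with a high irrelevance rate, like "beat" in this toy sample, is a candidate for removal from the query.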

Phase 8

Even with these adjustments, there was still a lot of noise in our dataset. So it was time to bring out the big guns: we used our validated datasets to train an NLP model that learned to distinguish between relevant and irrelevant news items. In the end, we concluded the project with a properly cleaned dataset.
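The article does not say which kind of NLP model was built. As a stand-in, the sketch below trains a multinomial naive Bayes classifier, a simple baseline for relevant-versus-irrelevant text classification, on a handful of hypothetical labelled headlines. It is not the project's model, and the `NaiveBayes` class and training data are illustrative assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.totals[c] + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][word] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

# Hypothetical validated examples, not taken from the real dataset.
train_texts = [
    "Domestic violence calls rise during covid lockdown",
    "Police use force against covid lockdown protesters",
    "Racist attack on Asian student blamed on coronavirus fears",
    "Home team beat rivals as fans return after covid",
    "Covid vaccine trial results beat expectations",
    "Stock market beat forecasts despite pandemic",
]
train_labels = ["relevant"] * 3 + ["irrelevant"] * 3

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Rise in domestic violence during lockdown"))
print(model.predict("Local team beat rivals in cup final"))
```

Naive Bayes is only a first baseline; a real filter would need far more labelled data and a held-out evaluation set.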

The result

In September 2021, during a hackathon, 16 Digital Power data specialists were given 3 hours to extract insights from the dataset that could interest policymakers.

While this hackathon once again exposed data quality issues, our consultants also discovered some interesting opportunities. One of the teams, for example, worked on a text analysis to map the language used around various themes (such as racism).

A second team looked at geographic patterns in reporting on corona-related violence and explored possible links between the volume of reporting and press freedom in different countries. In short: although the quality of the dataset is still not perfect, there is plenty to investigate with the COVID-19 Violence Tracker!

Curious about the dataset? Download it here.

Partners

We carried out this project via our Datahub foundation, together with PeaceTech Lab. In addition, the following partners contributed:

[Image: logos of the partners of the COVID-19 Violence Tracker]

Want to know more?

Marieke, chair of the Digital Power Datahub, will be happy to talk to you about what we can do for you and your organisation as a data partner.

Marieke Schulte

Chair Digital Power Datahub
+31(0)6 10 93 54 60
marieke.schulte@digital-power.com
