Converting billions of streams into actionable insights with a new data & analytics platform

Merlin

  • Customer case
  • Data Engineering
  • Data projects
Zev Posma
Business Manager
22 Jan 2024

Merlin is the largest digital music licensing partner for independent labels, distributors, and other rightsholders. Merlin’s members represent 15% of the global recorded music market. The company has deals in place with Apple, Facebook, Spotify, YouTube, and 40 other innovative digital platforms around the world for its members’ recordings. The Merlin team tracks payments and usage reports from digital partners while ensuring that its members are paid and reported to accurately, efficiently, and consistently.

In addition to financial data, Merlin receives non-financial usage trend information from digital service providers (DSPs) worldwide. Combining and analysing this data is of immense value to Merlin's members, but the question is: how do you collect, structure, and combine data from over one billion stream, royalty, and trend inputs each day? And how do you derive actionable insights from it?

Merlin enlisted our help to design a data platform where all this data converges. A distinctive challenge in the design and development was the enormous volume of streaming data that needed to be collected and structured. Another challenge was that the old system was spread across different cloud providers.

Forty sources had to be connected to the new system, with Spotify being by far the largest. To illustrate: for Spotify alone, nearly 1 billion rows of data are processed daily.

Approach

For this project, we worked in two phases.

1. Discovery phase

We started with a discovery phase, gathering all technical and business requirements from the stakeholders involved. The deliverables included a series of concrete recommendations and an architecture with a tool stack. In this case, we opted for a data lake architecture. The open nature of a data lake aligned well with the existing platform, which reduced risk.

As seen in the image below, both the analytical data flow and the financial data flow were brought together on one platform.

Figure: analytics and financial data brought together in one data platform

2. Implementation phase

As part of the migration, many pipelines had to be moved off the old cloud platform and orchestrator. We also connected numerous new sources and transferred historical data. In practice, challenges arose, such as a data model that changed over time or a pipeline that practically burst at the seams. Along the way, we addressed and resolved these challenges in collaboration with the stakeholders involved.

Platforms like these, processing terabytes of data daily, benefit significantly from a robust setup. Tool selection plays a crucial role, as does a good setup of Continuous Integration/Continuous Deployment (CI/CD). Additionally, having high test coverage of functional code and running automated quality tests across all data within the ETL/ELT process is essential.
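To make the idea of automated quality tests concrete, here is a minimal sketch in plain Python. It is not Merlin's actual test suite: the row layout, the field names (`partner`, `track_id`, `report_date`, `streams`), and the three rules are assumptions chosen for illustration. In a real setup, checks like these would typically run as a step inside the ETL/ELT pipeline.

```python
from collections import Counter

# Hypothetical row layout for a daily usage report; field names are assumptions.
REQUIRED_FIELDS = ("partner", "track_id", "report_date", "streams")


def run_quality_checks(rows):
    """Return a list of human-readable problems found in `rows`."""
    problems = []

    for i, row in enumerate(rows):
        # Completeness: every required field must be present and non-empty.
        for field in REQUIRED_FIELDS:
            if row.get(field) in (None, ""):
                problems.append(f"row {i}: missing {field}")

        # Validity: a stream count that is present must be a non-negative integer.
        streams = row.get("streams")
        if streams in (None, ""):
            continue
        if not isinstance(streams, int) or streams < 0:
            problems.append(f"row {i}: invalid streams value {streams!r}")

    # Uniqueness: at most one row per (partner, track_id, report_date).
    keys = Counter(
        (row.get("partner"), row.get("track_id"), row.get("report_date"))
        for row in rows
    )
    for key, count in keys.items():
        if count > 1:
            problems.append(f"duplicate key {key}: {count} rows")

    return problems


# Made-up sample rows, each bad row violating exactly one rule.
sample = [
    {"partner": "spotify", "track_id": "t1", "report_date": "2024-01-01", "streams": 120},
    {"partner": "spotify", "track_id": "t1", "report_date": "2024-01-01", "streams": 80},
    {"partner": "youtube", "track_id": "t2", "report_date": "2024-01-01", "streams": -5},
    {"partner": "apple", "track_id": "", "report_date": "2024-01-01", "streams": 10},
]

for problem in run_quality_checks(sample):
    print(problem)
```

Returning a list of problems rather than raising on the first failure lets a pipeline report everything that is wrong with a batch in one run.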

Over nine months, our team, in close collaboration with Merlin, worked on implementing the new platform. The first step was to set up the platform using Infrastructure as Code (IaC) and to deploy Dremio, the query engine. We then integrated the data pipelines and established the data layers within the data lakehouse. The pipelines run on Python code within Airflow, and the tables are set up modularly using dbt and SQL.

Figure: data architecture sketch
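The way an orchestrator runs pipelines across data layers can be sketched with the Python standard library. This is not Airflow's API and not the actual Merlin pipeline: the task names and the three-layer split (ingest, stage, mart) are hypothetical, and `graphlib.TopologicalSorter` stands in for the dependency resolution that an orchestrator performs.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks, each mapped to the set of tasks it depends on.
DEPENDENCIES = {
    "ingest_usage_reports": set(),
    "ingest_royalty_statements": set(),
    "stage_usage": {"ingest_usage_reports"},
    "stage_royalties": {"ingest_royalty_statements"},
    "build_combined_mart": {"stage_usage", "stage_royalties"},
}


def execution_order(dependencies):
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
print(order)
```

Declaring dependencies instead of a fixed sequence means a new source can be added as one more ingest and staging task without touching the rest of the graph.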

Result

The new platform enables Merlin to link financial and non-financial usage data across multiple partners. Merlin can now gain deeper insights into trends, for use both internally and by its members. It is close to answering questions such as 'What is the best day to release a new single in a particular market for an artist in country or pop?' or 'How does the popularity of artists and genres compare across different music platforms?' The consolidated system has also significantly reduced the operational burden on Merlin’s Analytics Team, as all data is available within the same data warehouse.
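As a rough illustration of what linking financial and usage data means in practice, the sketch below joins two small record sets on a shared key and totals them per partner. The data is made up, and the field names and the join key `(partner, track_id)` are assumptions. On the real platform this kind of join would be expressed in SQL on the query engine, not in Python.

```python
from collections import defaultdict

# Made-up illustration data; field names and values are assumptions.
usage = [
    {"partner": "spotify", "track_id": "t1", "streams": 1000},
    {"partner": "spotify", "track_id": "t2", "streams": 500},
    {"partner": "youtube", "track_id": "t1", "streams": 2000},
]
royalties = [
    {"partner": "spotify", "track_id": "t1", "amount": 4.0},
    {"partner": "spotify", "track_id": "t2", "amount": 2.0},
    {"partner": "youtube", "track_id": "t1", "amount": 2.0},
]


def combine(usage_rows, royalty_rows):
    """Join usage and royalty rows on (partner, track_id), then total per partner."""
    amount_by_key = {
        (r["partner"], r["track_id"]): r["amount"] for r in royalty_rows
    }
    totals = defaultdict(lambda: {"streams": 0, "amount": 0.0})
    for row in usage_rows:
        key = (row["partner"], row["track_id"])
        if key not in amount_by_key:
            continue  # no matching financial record; skipped in this sketch
        totals[row["partner"]]["streams"] += row["streams"]
        totals[row["partner"]]["amount"] += amount_by_key[key]
    return dict(totals)


for partner, total in combine(usage, royalties).items():
    print(partner, total["streams"], total["amount"])
```

Once both kinds of data share a key, comparisons across platforms become a simple aggregation over the joined rows.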

Because Merlin works with labels, distributors, and rightsholders around the world, it has a unique variety of data that is highly versatile and granular. The new platform gives Merlin the opportunity to generate unique, previously unobtainable insights for its members.

Want to know more?

Zev will be happy to talk to you about what we can do for you and your organisation as a data partner.
