Working more efficiently thanks to migration to Databricks

The Kadaster


Joachim van Biemen

Business Manager


02 Apr 2024

The Kadaster manages complex (geo)data, including records of all real estate in the Netherlands. All data is stored and processed in an on-premises Postgres data warehouse, which is maintained by an IT partner. To save costs and work more efficiently, the Kadaster aims to migrate to a Databricks environment. They asked us to help implement this data lakehouse in the Microsoft Azure cloud.

Approach

Together with an internal tech team, we mapped out the existing pipelines. We assessed which data was already available in Databricks via the Datahub and which data we could load ourselves using Python code. Based on this, we defined several use cases and designed the Databricks structure around them. We used a medallion architecture (bronze, silver and gold layers) and the dbt framework for data transformations within Databricks. Step by step, we developed the use cases further until they could replace the Postgres setup.
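
To illustrate the layering idea behind a medallion architecture, here is a minimal sketch in plain Python. It is not the Kadaster's implementation: in the project the transformations were dbt models running in Databricks, and the field names, records and the functions to_silver and to_gold below are hypothetical.

```python
# Bronze layer: raw records exactly as ingested (hypothetical sample data).
# Values are strings and may contain duplicates or gaps.
bronze = [
    {"id": "1", "type": "Apartment ", "area_m2": "75"},
    {"id": "1", "type": "Apartment ", "area_m2": "75"},   # duplicate row
    {"id": "2", "type": "detached house", "area_m2": "140"},
    {"id": "3", "type": "apartment", "area_m2": ""},       # missing area
    {"id": "4", "type": "apartment", "area_m2": "65"},
]


def to_silver(rows):
    """Silver layer: deduplicate, drop incomplete rows, normalise types."""
    seen = set()
    silver = []
    for row in rows:
        if row["id"] in seen or not row["area_m2"]:
            continue
        seen.add(row["id"])
        silver.append(
            {
                "id": int(row["id"]),
                "type": row["type"].strip().lower(),
                "area_m2": float(row["area_m2"]),
            }
        )
    return silver


def to_gold(rows):
    """Gold layer: aggregate cleaned rows into a report-ready summary."""
    totals = {}
    for row in rows:
        count, area = totals.get(row["type"], (0, 0.0))
        totals[row["type"]] = (count + 1, area + row["area_m2"])
    return {
        kind: {"count": count, "avg_area_m2": area / count}
        for kind, (count, area) in totals.items()
    }


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each layer reads only from the layer before it, so raw data stays untouched in bronze and every cleaning or aggregation step can be rerun and tested on its own.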

Figure: a simplified overview of the Kadaster's architecture

Among other things, the Kadaster keeps track of whether a property is an apartment, terraced house, corner house, semi-detached house, or detached house. To determine this, the relationships between objects on a map are examined. Comparing every property in the Netherlands with every other property is computationally heavy. We therefore divided the map of the Netherlands into grid squares, each with its own index, and compared only the properties within the same square. The calculation now runs in a few hours instead of a whole day, a significant gain in both time and costs.
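
The grid approach can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the code used in the project: buildings are reduced to hypothetical axis-aligned bounding boxes, the cell size is arbitrary, and the helper names (cells_for, touches, neighbour_pairs, neighbour_counts) are made up for this example.

```python
from collections import defaultdict
from itertools import combinations

CELL_SIZE = 100.0  # hypothetical edge length of one grid square, in metres

# Hypothetical footprints as bounding boxes: id -> (min_x, min_y, max_x, max_y)
buildings = {
    "A": (0.0, 0.0, 10.0, 10.0),
    "B": (10.0, 0.0, 20.0, 10.0),        # shares a wall with A
    "C": (20.0, 0.0, 30.0, 10.0),        # shares a wall with B
    "D": (500.0, 500.0, 510.0, 510.0),   # stands alone
}


def cells_for(box, size=CELL_SIZE):
    """Yield the index of every grid square the box overlaps."""
    min_x, min_y, max_x, max_y = box
    for cx in range(int(min_x // size), int(max_x // size) + 1):
        for cy in range(int(min_y // size), int(max_y // size) + 1):
            yield (cx, cy)


def touches(a, b):
    """True if two boxes overlap or share an edge or corner."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def neighbour_pairs(boxes):
    """Compare only boxes that fall in the same grid square."""
    grid = defaultdict(list)
    for building_id, box in boxes.items():
        for cell in cells_for(box):
            grid[cell].append(building_id)

    pairs = set()  # a set, so a pair found in two squares is counted once
    for ids in grid.values():
        for a, b in combinations(sorted(ids), 2):
            if touches(boxes[a], boxes[b]):
                pairs.add((a, b))
    return pairs


def neighbour_counts(boxes):
    """Count adjoining buildings per building."""
    counts = {building_id: 0 for building_id in boxes}
    for a, b in neighbour_pairs(boxes):
        counts[a] += 1
        counts[b] += 1
    return counts


print(neighbour_counts(buildings))
```

Because a box is registered in every square it overlaps, buildings that straddle a square boundary are still compared with their neighbours. The saving comes from never comparing buildings in different squares, such as A and D here.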

We focused on specific use cases that could be rolled out to all teams within the Kadaster. During implementation, we were in constant communication with the Kadaster's cloud architects. We shared best practices and prepared the data for end-users, primarily analysts providing insights to the Kadaster's clients. To create internal support and enthusiasm for the new platform, we organised knowledge-sharing sessions.

Result

Several use cases have been developed and set up in Databricks, serving as a blueprint for migrating other processes. We helped accelerate the Kadaster's migration of their infrastructure to the cloud. The Postgres components can now be replicated in Databricks one by one, following the setup we built, allowing the Kadaster to gradually save more costs and time.

Handling geodata on Databricks is a niche area. Together with the Kadaster, we pioneered efficient methods in this field. Many geotransformations are readily available in the Postgres environment via PostGIS, but the equivalent functionality is still in development in Databricks. We used the open-source extension Mosaic and, throughout the project, often had to work out how to perform the same transformations in Databricks.

It was crucial to keep the organisation informed about what is currently possible and what will be possible in the future. We trained the internal team in the new processes and taught them the software principles we applied during development.

To ensure knowledge was retained within the team, we created various documentation pages within the internal wiki environment. Additionally, we provided training throughout the project on Python, dbt, and Databricks to enhance the team's knowledge level.

