Central data storage with a new data infrastructure
Dedimo
- Customer case
- Data Engineering
Dedimo is a collaboration of five mental healthcare initiatives. To continuously improve the quality of their care, they organise their internal processes more efficiently, drawing on insights from the data that is available internally. Previously, they extracted this data from the various source systems themselves using ad hoc scripts. They asked for our help to make this process more robust and efficient, and to professionalise it further. Specifically, they asked us to facilitate the central storage of their data in a cloud data warehouse. Since they were already used to working with Google Cloud Platform (GCP), the goal was to set up the data infrastructure within this environment.
Our approach
To keep the project manageable during the start-up phase, we began by delivering an MVP: we selected a single data source to connect to the data pipelines we were building. Dedimo's goal is to extend the infrastructure to additional data sources in the long term, which means the MVP will need to be expanded in the future. We therefore took the scalability of the solution into account when developing the MVP.
To secure that scalability, we set up a scalable scheduler that kicks off the data pipelines on a regular basis: Apache's open-source scheduler, Airflow. Initially, we ran Airflow locally within the development environment using Docker. For the production environment, we used a GCP-managed instance of Airflow in the form of Cloud Composer.
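As an illustration, the sketch below shows what such a scheduled pipeline definition looks like in Airflow 2.x, the version Cloud Composer runs. The DAG id and schedule are hypothetical placeholders, not taken from the actual project.

```python
# Minimal Airflow 2.x DAG sketch: a pipeline the scheduler kicks off daily.
# The dag_id and schedule are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="dedimo_daily_ingest",   # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",     # run once a day
    catchup=False,                  # do not backfill runs from before deployment
) as dag:
    # The real export, transform and load tasks would be chained here.
    placeholder = EmptyOperator(task_id="placeholder")
```

The same DAG definition runs unchanged in the local Docker set-up and in Cloud Composer, which is what keeps the step from development to production small.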
As a data warehouse, we selected BigQuery, GCP's serverless data warehouse solution. A major advantage is that the data warehouse also scales automatically, and you only pay for what you actually use. Furthermore, a data lake was set up in Cloud Storage, in which all data exported from the source databases is stored centrally.
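As a sketch of how data moves from the data lake into the warehouse, the snippet below loads a source export from Cloud Storage into a BigQuery staging table using the google-cloud-bigquery Python client. The project, bucket, dataset and table names are hypothetical.

```python
# Sketch: load a newline-delimited JSON export from the Cloud Storage data lake
# into a BigQuery staging table. All names below are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="dedimo-dwh")  # hypothetical project id

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,                            # let BigQuery infer the schema
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(
    "gs://dedimo-data-lake/source_a/clients.json",  # export in the data lake
    "dedimo-dwh.staging.clients",                   # staging table in BigQuery
    job_config=job_config,
)
load_job.result()  # block until the load job finishes
```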
We built the data pipelines in such a way that the ETL process (exporting the data to the data lake, transforming it and loading it into the data warehouse) is performed entirely by the BigQuery engine. Airflow is only used for what it is designed for: as a pure scheduler. This keeps the implementation simple and elegant, and therefore robust.
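In Airflow terms, this pattern means each task merely submits a SQL job to BigQuery and waits for it to finish. A minimal sketch, assuming hypothetical table names and a simplified query, using the BigQueryInsertJobOperator from the Google provider package:

```python
# Sketch: Airflow acts purely as a scheduler; BigQuery executes the SQL.
# DAG id, table names and query are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

with DAG(
    dag_id="dedimo_elt",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    transform_clients = BigQueryInsertJobOperator(
        task_id="transform_clients",
        configuration={
            "query": {
                # The heavy lifting happens inside the BigQuery engine;
                # Airflow only submits the job and polls for completion.
                "query": (
                    "CREATE OR REPLACE TABLE dwh.clients AS "
                    "SELECT client_id, intake_date FROM staging.clients"
                ),
                "useLegacySql": False,
            }
        },
    )
```

Because no data flows through the Airflow workers themselves, the scheduler stays lightweight regardless of data volume.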
To manage BigQuery, we chose to use SQL code that is dynamically generated by several Python scripts, which are in turn kicked off by Airflow; a scalable and flexible solution. With this approach, the pipelines can be modified through a small number of configuration parameters, which is effective when connecting new source databases or handling changes in the source data. The Python scripts then translate the configuration parameters into SQL commands that the BigQuery engine understands.
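A minimal sketch of this idea, assuming a hypothetical configuration format: a few parameters per source table are rendered into the SQL statement that BigQuery executes.

```python
# Sketch: translate configuration parameters into a BigQuery SQL statement.
# The config format, names and columns are illustrative assumptions.
SOURCE_CONFIG = {
    "table": "clients",
    "source_dataset": "staging",
    "target_dataset": "dwh",
    "columns": ["client_id", "intake_date", "care_team"],
}

def build_transform_sql(cfg: dict) -> str:
    """Render a CREATE OR REPLACE TABLE statement from config parameters."""
    column_list = ",\n  ".join(cfg["columns"])
    return (
        f"CREATE OR REPLACE TABLE {cfg['target_dataset']}.{cfg['table']} AS\n"
        f"SELECT\n  {column_list}\n"
        f"FROM {cfg['source_dataset']}.{cfg['table']}"
    )

print(build_transform_sql(SOURCE_CONFIG))
```

Connecting a new source table then amounts to adding one configuration entry rather than writing a new pipeline.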
With Cloud Build, Google's serverless CI/CD platform, we laid the foundation for a deployment pipeline that connects the local development environment to the production environment.
The result
The delivered MVP is an initial version of a production environment. This allows Dedimo to continue testing internally, which clarifies their own data needs so that the current proof of concept can be steered and developed further. To support Dedimo in adopting the production environment, we organised BigQuery training for their internal employees.
Moreover, we provided a development environment with which the project can be expanded further, using the MVP as a starting point. This development path was made possible by prioritising scalability in the infrastructure and architecture choices.
In addition, by keeping the architecture simple and having the processing performed centrally by BigQuery, we laid a robust and reliable foundation for the future. In this way, digitisation lets Dedimo consistently focus on what it does best: caring for its clients.
Want to know more about this project?
Reimer would gladly tell you all about it.
Business Manager
+31 (0)20 308 43 90
+31 (0)6 83 69 07 78
reimer.vandepol@digital-power.com