Using MLOps for fully automated and reliable sales forecasting

Global asset manager

  • Customer case
  • Data Engineering
  • Data consultancy
  • MLOps
Philip Roeleveld
Machine Learning Engineer
5 min
24 Sep 2024

A global asset manager, specialising in Quant and Sustainable Investing, offers a range of investment strategies, including equities and bonds. To strengthen their competitive position and proactively respond to changing client needs and market developments, the sales and marketing department aimed to adopt a more data-driven approach.

The team conducted data analyses to answer ad-hoc queries, and a data scientist developed a machine learning model to predict sales opportunities. However, the model ran only on the data scientist's laptop, so the forecasts on the dashboard quickly became outdated and keeping them current took a lot of manual work. While this was a solid first step, it was not a sustainable, future-proof solution.

To create an automated system that could generate periodic forecasts and send them directly to the dashboard, they enlisted the help of one of our Data Engineers.

Approach

We worked alongside the asset manager’s data scientist through coaching sessions. Our goal was to automate and future-proof the existing sales forecasting model using MLOps best practices. This ensured that the model would continue to operate in the future and allowed for easier integration of new models. Since the data scientist was closely involved throughout the process and implemented many aspects with coaching, the knowledge and MLOps methodology would remain embedded within the company. The process involved the following steps:

1. Automating source data: The first task was to automate retrieval of the model's training data from the source. A challenge here was that the asset manager was simultaneously restructuring their data warehouse, so we couldn't connect to it directly. Instead, we temporarily automated the upload of data from the analytics system to Azure (see the first sketch after this list). Once the new data warehouse is ready, the model will be able to connect to it directly.

2. Setting up pipelines: Next, we created two pipelines in Azure ML and Azure DevOps: one for training the model and one for generating predictions. The prediction pipeline also ensures that the output automatically reaches the appropriate location for the dashboard (see the second sketch after this list).

3. Rewriting code: We rewrote the code to make it suitable for automation and for use in the two pipelines. We also placed the code in a Git repository to enable version control and CI/CD.
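
To make steps 1 and 2 concrete, here are two minimal sketches. They are illustrative only: the container name, datastore path, compute target, environment name, and script names (such as train.py) are assumptions, not the asset manager's actual setup.

The first sketch shows how an export from an analytics system can be pushed to Azure Blob Storage as a temporary bridge until the new data warehouse is ready:

```python
import os

from azure.storage.blob import BlobServiceClient

# The connection string, container name, and file names are placeholders.
service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)
container = service.get_container_client("sales-source-data")

# Upload the latest export so the training pipeline always reads from one known path.
with open("crm_export.csv", "rb") as src:
    container.upload_blob(name="latest/crm_export.csv", data=src, overwrite=True)
```

The second sketch outlines how a training pipeline can be defined with the Azure ML Python SDK (v2); a prediction pipeline follows the same pattern with a scoring step instead of a training step:

```python
from azure.ai.ml import Input, MLClient, Output, command
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

# Workspace details are placeholders.
ml_client = MLClient(
    DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace>"
)

# A reusable training step; the environment and compute names are assumptions.
train_step = command(
    code="./src",
    command="python train.py --data ${{inputs.data}} --model ${{outputs.model}}",
    inputs={"data": Input(type="uri_folder")},
    outputs={"model": Output(type="uri_folder")},
    environment="azureml:forecast-env@latest",
    compute="cpu-cluster",
)

@pipeline(description="Retrain the sales forecasting model")
def training_pipeline(raw_data):
    train = train_step(data=raw_data)
    return {"trained_model": train.outputs.model}

# Point the pipeline at the uploaded source data and submit it.
job = training_pipeline(
    raw_data=Input(type="uri_folder", path="azureml://datastores/salesdata/paths/latest/")
)
ml_client.jobs.create_or_update(job, experiment_name="sales-forecasting")
```

In a setup like this, Azure DevOps typically runs the pipelines on a schedule or on new commits, which is what removes the manual work described above.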

We set up the automation according to MLOps best practices, making it suitable for multiple models and future-proof. To do this, we made the boilerplate code generic, so that both the code and configuration can easily be reused for other models. This promotes consistency, speeds up the development of new models, and ensures that all models run in a stable environment.
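
We can't show the asset manager's code here, but the pattern itself is simple to sketch: keep everything model-specific in a configuration file and keep the pipeline code generic. A hypothetical example, assuming models follow the scikit-learn constructor convention:

```python
import importlib

import yaml

# Example config (hypothetical) for one model; adding another model means
# adding another file like this, with no changes to the pipeline code:
#
#   model:
#     module: sklearn.ensemble
#     class: GradientBoostingRegressor
#     params:
#       n_estimators: 200

def build_model(config_path: str):
    """Instantiate a model class from a YAML config, so the same generic
    pipeline code can train and score every model."""
    with open(config_path) as f:
        cfg = yaml.safe_load(f)["model"]
    model_cls = getattr(importlib.import_module(cfg["module"]), cfg["class"])
    return model_cls(**cfg.get("params", {}))
```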

Result

The model is now fully automated and future-proof. Predictions are generated automatically and immediately integrated into the dashboard, saving a significant amount of manual work. The only remaining manual process is the approval of the model after training. This final review by a person remains essential to ensure quality.

Additionally, the data scientist has gained considerable knowledge in data engineering and MLOps. This enables her to apply these skills more independently in the future, ensuring that the expertise remains within the organisation.

Future

In the future, we will continue to collaborate with the asset manager’s data scientist to develop a new model for another application. As this model will need to be developed from scratch, we will begin with data science and then integrate it into the same structure for automation. Since the first model was set up using the proper MLOps approach, this process will be significantly more efficient.

Want to know more?

Joachim will be happy to talk to you about what we can do for you and your organisation as a data partner.

