Designing valuable ML systems

A step-by-step guide: from business need to production


Machine learning (ML) is often seen as a modelling process. You pick an algorithm, train it, evaluate the metrics, and deploy it. In reality, the choice of algorithm is one of the least important decisions you make.


This article is for you if you're a Data Scientist, ML Engineer, or Analytics Lead who has built models that perform well in testing but struggles to turn them into real-world results. If you want to understand the challenges of an end-to-end ML project and how to overcome them in production, this guide gives you the structure to get from business question to working system.

What separates ML projects that deliver business value from those that stall in a notebook is system design: the structured process of translating a business need into a production-ready decision system. The difference becomes clear when you compare the two side by side:

[Table: The differences between ML Modelling and ML System Design, compared across focus, success metrics, scope, and risk profile]

What is ML system design?

ML system design is the discipline of translating a business question into a production-ready system that reliably supports decision-making using data and machine learning. It covers everything from problem framing and data strategy to model development, evaluation, and operational integration.

Step 1: Translate the business question

Goal: Define what decision will change, under which constraints, and how success is measured in business terms.

The first step is also the most underestimated. Before any data work begins, you need to understand what the client actually needs, not what they literally asked for. Clients speak in outcomes (“reduce churn”), not in ML terms (“train a binary classifier with a 30-day prediction window”). Your job is to bridge that gap.

Clarify the desired outcome. What decision will be made differently because this model exists? Clients rarely want a score; they want to know who to contact, when, and with what offer.

Identify operational constraints. How many customers can the team realistically reach? What’s the budget per intervention? What’s the minimum lead time?

Define success metrics. Model accuracy is almost never the right success metric. Business KPIs like revenue retained, churn rate reduction, and campaign ROI are what matter.

Specify actionability. Every prediction must map to a concrete intervention. If there’s no action tied to a prediction, the model is an expensive dashboard decoration.
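To make these constraints concrete, here is a minimal Python sketch of capacity-constrained targeting. The names (`Customer`, `select_targets`), the save rate, the cost per intervention, and all numbers are hypothetical. The idea it illustrates is to rank customers by expected net value rather than by raw score, and to stop at the capacity of the team.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    churn_probability: float  # model output, between 0 and 1
    annual_value: float       # revenue at risk if the customer leaves


def expected_net_value(c: Customer, save_rate: float, cost: float) -> float:
    """Expected revenue retained by contacting this customer, minus the cost."""
    return c.churn_probability * save_rate * c.annual_value - cost


def select_targets(customers, capacity, save_rate, cost):
    """Pick at most `capacity` customers whose intervention is expected to pay off."""
    scored = [(expected_net_value(c, save_rate, cost), c) for c in customers]
    worthwhile = [pair for pair in scored if pair[0] > 0]
    worthwhile.sort(key=lambda pair: pair[0], reverse=True)
    return [c.customer_id for _, c in worthwhile[:capacity]]


# Illustrative values only (assumptions, not data from a real project).
customers = [
    Customer("A", 0.80, 200.0),
    Customer("B", 0.30, 5000.0),
    Customer("C", 0.60, 1200.0),
    Customer("D", 0.05, 300.0),
]
targets = select_targets(customers, capacity=2, save_rate=0.25, cost=20.0)
print(targets)  # ['B', 'C']
```

In this illustrative data, customer A has the highest churn probability but is not selected, because the two available slots go to customers with more revenue at risk.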

Step 2: Frame the ML problem

Goal: Turn the business need into a precise prediction target with a defined horizon, scope, and integration path.

Once you understand the business need, you translate it into a formal ML problem. This means making explicit the decisions that are often left vague, which is where projects tend to go wrong.

Define the prediction target. What exactly are you predicting? The label must have a precise, unambiguous definition. In a churn model, for example, does it include partial disengagement, or only complete exits?

Choose the prediction horizon. How far ahead does the model need to predict? Too short and there’s no time to intervene. Too long and predictions become unreliable.

Set the model scope. Which entities does the model cover? Not every segment deserves the same model. High-value accounts may need different treatment than self-serve users.

Ensure integration alignment. How will predictions be consumed? If the output needs to feed a CRM workflow, that shapes everything from output format to latency requirements.
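As an illustration of how a target and horizon can be pinned down, the sketch below defines a churn label in Python. It assumes that churn means a full cancellation and uses a 30-day horizon. The function name `churn_label` and its parameters are hypothetical, and a real project would agree the exact definition with the business.

```python
from datetime import date, timedelta
from typing import Optional


def churn_label(snapshot: date, cancelled_on: Optional[date],
                horizon_days: int = 30) -> Optional[int]:
    """Label a customer as seen on the snapshot date.

    Returns 1 if the customer cancels within the horizon after the snapshot,
    0 if they are still active at the end of it, and None if they had
    already left on or before the snapshot (out of scope for prediction).
    """
    if cancelled_on is not None and cancelled_on <= snapshot:
        return None
    horizon_end = snapshot + timedelta(days=horizon_days)
    if cancelled_on is not None and cancelled_on <= horizon_end:
        return 1
    return 0


snapshot = date(2024, 1, 1)
print(churn_label(snapshot, date(2024, 1, 20)))  # cancels inside the horizon
print(churn_label(snapshot, date(2024, 3, 1)))   # cancels after the horizon
print(churn_label(snapshot, None))               # never cancelled
```

Writing the definition as code forces the vague questions into the open: what counts as an exit, from which date the clock starts, and which customers are excluded.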

Step 3: Make your data decisions

Goal: Identify the right sources, engineer features with domain knowledge, and guard against quality pitfalls that silently break models.

Data decisions form the backbone of any ML system. Start by mapping which internal systems (CRM, billing, support logs, product analytics) can provide predictive power. For each source, assess availability, freshness, and reliability.

Raw data rarely has predictive power on its own. Effective features emerge when you combine data with domain understanding. Three categories tend to matter most: rolling aggregates that capture behaviour over time windows (30, 60, 90 days), trend and slope features that reveal whether engagement is increasing or declining, and recency indicators that measure how recently key interactions occurred.
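The three feature categories can be sketched in a few lines of plain Python. This is a simplified illustration, not a recommended implementation: the function name `engagement_features` is hypothetical, the trend is a simple difference between two 30-day windows rather than a fitted slope, and in practice a dataframe library would usually do this work.

```python
from datetime import date


def engagement_features(event_dates, snapshot):
    """Build window counts, a simple trend, and recency from event dates.

    Only events strictly before the snapshot are used, so the features
    describe what was known at prediction time.
    """
    days_ago = sorted((snapshot - d).days for d in event_dates if d < snapshot)
    features = {}
    # Rolling aggregates: number of events in the last 30, 60, and 90 days.
    for window in (30, 60, 90):
        features[f"events_last_{window}d"] = sum(1 for n in days_ago if n <= window)
    # Trend: activity in the most recent 30 days minus the 30 days before that.
    previous_30 = features["events_last_60d"] - features["events_last_30d"]
    features["trend_30d"] = features["events_last_30d"] - previous_30
    # Recency: days since the last event, or None if there is no history.
    features["days_since_last_event"] = days_ago[0] if days_ago else None
    return features


# Illustrative login dates for one customer (assumed data).
events = [date(2024, 3, 25), date(2024, 3, 10), date(2024, 2, 15),
          date(2024, 1, 20), date(2024, 4, 5)]
features = engagement_features(events, snapshot=date(2024, 4, 1))
print(features)
```

The event dated after the snapshot is ignored, which keeps the features consistent with what would be available when the prediction is made.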

Data quality is the most underestimated source of ML project failure. Watch for data leakage (features computed after the label event), survivorship bias (if departed users are removed from your data, the model only learns from those who stayed), class imbalance (a naive model predicting “no” every time can score 95%+ accuracy while being useless), and poor label quality (inconsistencies and incorrect measurements in historical data).
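Two of these pitfalls are easy to check in code. The sketch below, with hypothetical helper names and made-up records, filters out records observed on or after the snapshot date as a guard against leakage, and computes the accuracy of a model that always predicts the majority class to show why accuracy misleads on imbalanced data.

```python
from datetime import date


def point_in_time(records, snapshot):
    """Keep only records observed strictly before the snapshot date."""
    return [r for r in records if r["observed_on"] < snapshot]


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most common class."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)


# Illustrative records (assumed). The cancellation reason is only known
# after the customer has left, so using it as a feature would leak the label.
records = [
    {"name": "support_ticket", "observed_on": date(2024, 1, 10)},
    {"name": "cancellation_reason", "observed_on": date(2024, 2, 3)},
]
usable = point_in_time(records, snapshot=date(2024, 1, 31))
print([r["name"] for r in usable])

labels = [1] * 5 + [0] * 95  # 5% of customers churn in this made-up sample
print(majority_baseline_accuracy(labels))
```

Any candidate model should be compared against this majority baseline before its accuracy is quoted to anyone.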

Step 4: Select and train your model

Goal: Find the model that best balances performance, interpretability, and maintenance cost for your specific context.

Model selection is a trade-off exercise, not a competition. A reliable approach follows three phases:

Establish a baseline. Start with a simple, interpretable model like logistic regression. It gives you a performance floor and initial insight into feature importance. If the baseline already meets the business need, you may not need anything more complex.

Explore complexity. Gradient boosting models (LightGBM, XGBoost) tend to perform well on tabular data and handle common data imperfections such as missing values, noisy features, and imbalanced classes through built-in parameters.

Tune deliberately. Use Bayesian hyperparameter search on stratified k-fold cross-validation, optimised for a metric that reflects the business objective (for example F1-score or AUC-ROC).

The critical question is not “which model scores highest?” but “does the marginal improvement justify the added complexity and maintenance burden?” A model that nobody can explain or maintain delivers no lasting value.
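The stratified k-fold split mentioned in the tuning phase can be illustrated with a small dependency-free sketch. The function `stratified_folds` is a hypothetical helper written for this example. Libraries such as scikit-learn provide a tested equivalent, which is what a real project would normally use.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each row index to one of k folds, preserving class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


labels = [1] * 10 + [0] * 90  # illustrative 10% positive rate
folds = stratified_folds(labels, k=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(positives_per_fold)  # [2, 2, 2, 2, 2]
```

Each fold keeps the same share of positives as the full dataset, so validation scores are not distorted by a fold that happens to contain almost no positive cases.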

Step 5: Evaluate, offline and online

Goal: Validate that the model works statistically and delivers measurable business impact in the real world.

Evaluation is where many ML projects create a false sense of confidence. A model can look excellent in a notebook and fail completely in production.

Choose metrics that reflect the business objective, not just statistical performance. When the target event is rare, accuracy is misleading. Metrics like precision, recall, F1-score, and AUC-ROC give a more honest picture. Explainability matters too: SHAP values allow you to show, per individual prediction, which factors were decisive. This builds trust with the client and helps catch unexpected model behaviour before deployment.
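The gap between accuracy and the other metrics is easy to see on a small example. The sketch below computes the metrics from raw predictions. The function name and the data are illustrative assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up sample: 5 churners among 100 customers.
y_true = [1] * 5 + [0] * 95
# A hypothetical model that finds 4 of the 5 churners and raises 4 false alarms.
y_pred = [1, 1, 1, 1, 0] + [1] * 4 + [0] * 91
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On this sample the model reaches 95% accuracy, 50% precision, and 80% recall. A model that never predicts churn would also reach 95% accuracy, with a recall of zero, which is why accuracy alone says little when the target event is rare.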

Offline metrics tell you whether the model is statistically sound. Online evaluation, typically A/B testing, tells you whether it actually works in the real world. Always relate model performance back to business KPIs. A 3% improvement in AUC means nothing to a CFO. A €200K increase in retained revenue does.
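A minimal sketch of how an A/B test result can be tied back to a business KPI is given below. It uses a standard two-sided two-proportion z-test. The group sizes, churn counts, and the average customer value are invented for illustration, and a real experiment needs a proper design (sample size, duration, randomisation) that this sketch does not cover.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed experiment: control group without the model, treatment group with
# model-driven retention actions.
control_churned, control_size = 120, 1000
treated_churned, treated_size = 90, 1000
z, p_value = two_proportion_z_test(control_churned, control_size,
                                   treated_churned, treated_size)

# Translate the churn reduction into retained revenue (assumed average value).
average_customer_value = 1500.0
churn_reduction = control_churned / control_size - treated_churned / treated_size
retained_revenue = churn_reduction * treated_size * average_customer_value
print(round(z, 2), round(p_value, 3), round(retained_revenue))
```

The last figure is the one to report to business stakeholders: the statistical test only tells you whether the difference is likely to be real.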

Step 6: Integrate, monitor, and maintain

Goal: Embed predictions into workflows, close the feedback loop, and detect drift before it erodes value.

A model that is not embedded in business workflows delivers nothing. Predictions must land where decisions are made, whether that’s a CRM, a marketing automation platform, or an operational dashboard. Define exactly what actions are triggered by which predictions and under what conditions.
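One way to make "which actions are triggered by which predictions" explicit is a small, reviewable rule function. Everything in the sketch below is hypothetical: the thresholds, the segment name, and the action names would be agreed with the team that owns the workflow.

```python
def next_action(churn_probability, segment, days_to_renewal):
    """Map a prediction and its business context to one concrete action."""
    if churn_probability >= 0.7 and segment == "high_value":
        return "account_manager_call"
    if churn_probability >= 0.7:
        return "retention_offer_email"
    if churn_probability >= 0.4 and days_to_renewal <= 30:
        return "renewal_reminder"
    return "no_action"


print(next_action(0.85, "high_value", 90))
print(next_action(0.85, "self_serve", 90))
print(next_action(0.50, "self_serve", 14))
print(next_action(0.10, "self_serve", 14))
```

Keeping these rules in code, separate from the model, means the business can change thresholds or actions without retraining anything.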

Once in production, feedback loops become essential. No amount of offline testing can guarantee real-world predictive value. Capturing actual outcomes (did the customer churn or not?) and comparing them against predictions is what allows you to measure true model performance, identify weaknesses, and continuously fine-tune the system.
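A feedback loop can start very simply: join stored predictions with the outcomes that were later observed and compare them per score bucket. The sketch below assumes predictions and outcomes are available as dictionaries keyed by a customer ID. The function name and the data are illustrative.

```python
from collections import defaultdict


def calibration_by_bucket(predictions, outcomes, n_buckets=5):
    """Compare mean predicted probability with observed rate per score bucket."""
    buckets = defaultdict(list)
    for customer_id, probability in predictions.items():
        if customer_id not in outcomes:
            continue  # outcome not observed yet
        bucket = min(int(probability * n_buckets), n_buckets - 1)
        buckets[bucket].append((probability, outcomes[customer_id]))
    report = {}
    for bucket, rows in sorted(buckets.items()):
        report[bucket] = {
            "n": len(rows),
            "mean_predicted": sum(p for p, _ in rows) / len(rows),
            "observed_rate": sum(o for _, o in rows) / len(rows),
        }
    return report


# Assumed data: predicted churn probabilities and observed outcomes (1 = churned).
predictions = {"a": 0.90, "b": 0.85, "c": 0.10, "d": 0.15, "e": 0.50}
outcomes = {"a": 1, "b": 0, "c": 0, "d": 0}  # customer "e" is still pending
report = calibration_by_bucket(predictions, outcomes)
for bucket, row in report.items():
    print(bucket, row)
```

A persistent gap between the predicted and observed rates in a bucket is a signal to investigate, although with interventions in place the observed rate also reflects the effect of the actions taken.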

ML systems are not static. Over time, the patterns a model learned can become outdated, a phenomenon known as drift. Customer behaviour changes, new products launch, markets shift, and the data the model encounters in production gradually diverges from the data it was trained on (data drift), or the relationship between inputs and outcomes itself changes (concept drift). Without continuous monitoring, model performance degrades silently. Track prediction distributions, feature distributions, and business KPIs over time to catch these shifts early and trigger retraining before they erode value.
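One common way to track a feature or prediction distribution is the Population Stability Index (PSI), which compares the share of values per bin between training data and production data. The sketch below is a minimal version with hand-picked bin edges and made-up scores. The function name, the edges, and the `epsilon` floor for empty bins are assumptions made for this example.

```python
import math


def population_stability_index(expected, actual, edges, epsilon=1e-6):
    """PSI between two samples, using the given bin edges."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # The bin index is the number of edges the value has reached.
            counts[sum(1 for e in edges if v >= e)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), epsilon) for c in counts]

    e_shares, a_shares = shares(expected), shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))


# Made-up score samples: training time versus two production snapshots.
training = [0.1] * 50 + [0.5] * 30 + [0.9] * 20
stable = [0.1] * 50 + [0.5] * 30 + [0.9] * 20
shifted = [0.1] * 20 + [0.5] * 30 + [0.9] * 50
edges = [0.33, 0.66]

psi_stable = population_stability_index(training, stable, edges)
psi_shifted = population_stability_index(training, shifted, edges)
print(round(psi_stable, 3), round(psi_shifted, 3))
```

A commonly cited rule of thumb treats a PSI above roughly 0.25 as a substantial shift, but the alerting threshold is a judgment call that should be validated against the business KPIs being tracked.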

And communicate transparently with the client: business teams need to understand what model outputs mean, how reliable they are, and where limitations exist.

The bigger picture

ML system design is not just a technical exercise; it is a structured decision-making discipline. At every stage there are trade-offs: between simplicity and complexity, between accuracy and interpretability, between speed and robustness.

A question we often hear is: what is the difference between an ML model and an ML system? An ML model produces predictions. An ML system includes the business logic, data pipelines, integrations, monitoring, and feedback loops that turn those predictions into better decisions. The model is one component. The system is what delivers value.

This also explains why most ML projects fail to deliver impact. It is rarely because the algorithm was wrong. It is because predictions were never aligned to real business actions, constraints, and incentives. A model optimised for AUC in a notebook is not the same as a system that reduces churn in production.

The churn example illustrates this clearly: the model itself is almost incidental. What makes the system work is the precise problem definition, the carefully engineered features, the business-aligned evaluation, and the tight integration into the retention workflow. Take away any of those, and the algorithm alone delivers nothing.

This is an article by George Pavlidis

George Pavlidis is a Data Scientist at Digital Power. He has six years of experience in data science, machine learning engineering, and data engineering. He has worked in a range of sectors, including finance, renewable energy, and manufacturing, always focusing on translating business questions into production-ready ML solutions. George holds a bachelor's degree in applied informatics from the University of Macedonia and has published research in areas ranging from speech recognition to energy demand forecasting.
