Moet je LLM’s lokaal draaien?

When local AI models outperform API-based services

Artikel
AI & Data Science
Data Engineering

Bob Strube

Data Engineer

5 min

30 Mar 2026

Large Language Models (LLMs) zijn in korte tijd een standaard onderdeel geworden van moderne applicaties. De meeste teams starten met het integreren van modellen zoals OpenAI of Claude via API’s. Dat is snel, eenvoudig en vereist weinig infrastructuur.

Maar wat gebeurt er als:

API-kosten stijgen?
Modellen onverwacht veranderen?
Je met gevoelige data werkt die je omgeving niet mag verlaten?

Voor organisaties die werken met proprietary data, hoge volumes of strikte compliance-eisen, wordt het draaien van open-source LLMs lokaal een steeds aantrekkelijker alternatief.

In dit artikel leggen we uit wanneer lokale LLM deployment zinvol is, hoe je begint en welke infrastructuur nodig is om van experiment naar productie te gaan.

Voor de leesbaarheid en vanwege gangbare terminologie binnen het vakgebied is de rest van dit artikel in het Engels geschreven.

This article in short

Use API-based LLMs for fast prototyping and low-volume use cases
Run LLMs locally when data privacy, cost control or scale are critical
Local LLMs remove rate limits and give full control over model behaviour
Mid-size models (~70B) offer the best balance between performance and cost
A hybrid setup (API + local models) is often the most practical strategy

Why not just use API-based LLMs?

API services such as OpenAI or Anthropic offer powerful models that are easy to integrate. For many use cases, they remain the simplest option.

However, there are situations where local models offer clear advantages.

When running LLMs locally becomes attractive

Local deployment becomes particularly valuable when:

You process sensitive or regulated data
Usage volumes become large and predictable
API costs grow significantly
You need stable and reproducible model behaviour

Understanding local LLM deployment

Running LLMs locally involves several components. To simplify the landscape, we divide the process into five key areas:

Local development tools
Interfaces and user experience
Model selection and performance
Deployment infrastructure
Cost considerations

1. Local development tools

Before deploying models in production, you will typically start by testing them locally. Two tools make this particularly straightforward:

Both tools allow you to prototype locally before committing to production infrastructure.

In practice, LM Studio is often preferred for quick testing, while Ollama offers more flexibility for automation and integration.

2. Interface and user experience

Once models run locally, you need a way to interact with them.

While LM Studio and Ollama include basic chat interfaces, many teams prefer a dedicated interface layer.

Open WebUI provides a modern web interface similar to ChatGPT, but connected to your own models.

Key capabilities:

Connect to multiple model backends
Compare model responses side-by-side
Share access with team members
Manage conversation history
Upload documents for context

This makes it especially useful when you want to give non-technical colleagues access to local AI tools.

3. Model selection and performance

Choosing the right model is essential when running LLMs locally. Model size is typically measured in parameters (billions).

For many organisations, mid-size models around 70B parameters provide the best balance between performance and infrastructure cost. Smaller models can handle high-volume, simpler tasks, while larger models are typically reserved for specialised workloads where maximum capability is required.

4. Deployment and infrastructure

Once you move beyond experimentation, you need infrastructure capable of serving models reliably.

Running models locally

Your own laptop or workstation is often sufficient for development.

This works well when:

testing integrations
experimenting with prompts
running smaller models

Deploying to GPU virtual machines

For production workloads, a GPU-enabled VM is typically required.

This is necessary when:

multiple users access the model
uptime and reliability are important
larger models require more VRAM
higher throughput is required

Typical deployment setup

A common architecture includes:

GPU VM
Ollama running the model
Nginx reverse proxy
client applications connecting via API

This setup provides an OpenAI-compatible endpoint that you control entirely.

5. Cost considerations

There are generally three ways to serve LLMs:

When does local deployment make sense?

Local models are particularly valuable when:

Workloads are high volume and predictable
Batch processing tasks analyse large datasets
Multiple use cases share the same infrastructure
Data privacy requirements prevent external sharing

Many organisations adopt a hybrid strategy, combining local models with external APIs depending on the task.

Conclusion

Running open-source LLMs locally gives organisations significantly more control over their AI infrastructure.

You gain:

full data privacy
predictable infrastructure costs
independence from API rate limits
stable model behaviour in production

For teams exploring this approach, a practical path forward is:

Start small
Experiment locally using tools such as LM Studio or Ollama.
Choose the right model
Match model size to the complexity of your task.
Scale thoughtfully

When moving to production, evaluate whether a budget GPU provider or a major cloud platform fits your architecture best.

Local LLMs are not always the right solution, but for many organisations they give greater flexibility, control and cost efficiency in modern AI systems.

Dit is een artikel van Bob Strube

Bob is Data Engineer bij Digital Power en specialiseert zich in AI en schaalbare dataplatformen. Hij heeft machine learning- en GenAI-oplossingen gebouwd en uitgerold op Azure en Databricks. Hij helpt organisaties om van experiment naar productie te gaan met betrouwbare, kostenefficiënte en privacyvriendelijke infrastructuur.

Bob Strube

Data Engineer

1x per maand data insights, praktijkcases en een kijkje achter de schermen ontvangen?

Meld je aan voor onze maillijst en blijf 'up to data':

Aanmelden

Misschien vind je dit ook leuk

Maandelijkse Digital Insights – juli 2026

In deze maandelijkse serie delen we de belangrijkste trends, productupdates en inzichten voor professionals in AI, data en analytics. Daarnaast delen we onze visie op wat deze ontwikkelingen betekenen voor organisaties en waar wij de grootste kansen zien om waarde te creëren.

This article in short

Why not just use API-based LLMs?

When running LLMs locally becomes attractive

Understanding local LLM deployment

1. Local development tools

2. Interface and user experience

3. Model selection and performance

4. Deployment and infrastructure

5. Cost considerations

When does local deployment make sense?

Conclusion

Dit is een artikel van Bob Strube

1x per maand data insights, praktijkcases en een kijkje achter de schermen ontvangen?

Misschien vind je dit ook leuk

Maandelijkse Digital Insights – juli 2026

Hoe Databricks Genie Agents data toegankelijk maken met natuurlijke taal

AI lost je dataprobleem niet op. Het vergroot het uit.

Van sensordata naar beslissingen die je operatie echt verbeteren

Data governance in Azure Databricks

In 5 stappen grip op data governance voor je data- & AI-platform

Claude Code configureren met CLAUDE.md

Snellere AI-zoekresultaten met een schaalbare streaming data pipeline

In 3 stappen naar effectieve data governance

Waarom je niet langer kunt wachten met data governance

Waarom moderne data-architectuur een organisatievraagstuk is

Het ontwerpen van waardevolle ML-systemen

Tealium Digital Velocity: AI in de praktijk

Minder administratietijd in de gezondheidszorg dankzij veilige AI-gespreksrapportage

4× snellere personalisatie met een composable CDP (Databricks deepdive)

Direct inzicht in sensordata met een self-service analytics platform

Hoe AI programeren verandert: Van autocomplete naar agentic coderen

Meer grip op AI-initiatieven met ondersteuning van een Analytics Translator

Dataplatform audit biedt helder inzicht en concrete optimalisaties

Slimme tekstanalyse: hoe onze AI-tool snel grote hoeveelheden data categoriseert

Van ambitie naar activatie: hoe Ennatuurlijk met data echt in beweging kwam

Hoe migreer je je data warehouse?

Download: Migratiegids voor moderne data warehousing

Welke Europese data warehouse oplossingen zijn beschikbaar?

Is jouw organisatie klaar voor onafhankelijkheid van de VS en de overstap naar EuroStack?

AI, GenAI, ML en MLOps uitgelegd

AI agents ontrafeld

Waarom het voor organisaties nú belangrijk is om in AI-trainingen te investeren

Hoe een start-up begint met datagedreven werken

Hoe bouw je een sterk cloud governance framework op?

Webinar | Machine Learning operations framework

Van strategie naar realisatie: een datagedreven toekomst

400% snellere time-to-market voor nieuwe personalisatie use cases

Webinar | Hoe Transavia haar klantdata samenvoegde via een composable customer data platform (CDP)

Wat is een composable CDP en waarom is het de toekomst?

Gepersonaliseerde marketing met een composable CDP

Schaalbare machine learning-modellen dankzij implementatie MLOps-framework

Implementatie van AI-toepassingen die businesswaarde opleveren

Wat is data governance?

Machine Learning-inferentie optimaliseren met PySpark en Pandas UDF's

Effectiever verkopen dankzij voorspelling van kans op leadconversie

Met MLOps naar volledig geautomatiseerde en betrouwbare salesvoorspellingen

Een schaalbaar datamodel voor de analytics van meerdere websites

Wat is social listening?

Duurzame groei door komst datateam

Low-code/no-code versus zelf coderen

Wat doet een (Cloud) Data Engineer versus een Machine Learning Engineer?

De organisatorische voordelen van het implementeren van je eigen AI-chatbot

Hoe werkt de AI Document Explorer in de praktijk?

Snelle en betrouwbare interne informatie met behulp van AI Document Explorer

Een dataplatform implementeren

Efficiënter werken dankzij migratie naar Databricks

Kwalitatieve onderzoekers vervangen door AI, een goede beslissing?

Breng structuur aan in je data

Miljarden streams omgezet in bruikbare inzichten met een nieuw data- en analytics platform

Cloudmigratie: hoe werkt dit in de praktijk?

Wat is machine learning operations (MLOps)?

Webinar: Data Governance

20% minder klachten dankzij datagedreven onderhoudsrapportages

Waardevolle inzichten uit Microsoft Dynamics 365

Kubernetes-based event-driven autoscaling met KEDA: een praktische gids

AWS (Amazon Web Services) versus GCP (Google Cloud Platform) voor Apache Airflow

Inzicht in de complete salesfunnel dankzij een datawarehouse met dbt

Datakwaliteit: de basis voor effectief datagedreven werken

Het all-round profiel van de moderne data engineer

Inzicht in marktdynamieken voor een stevigere concurrentiepositie