Smart text analysis: how our AI tool rapidly categorises large amounts of data
Is AI replacing the human researcher?


Analysing hundreds or thousands of open answers from surveys, interviews or reviews is time-consuming. To better understand those answers, we group them into themes (for example: ease of use, service, delivery or reliability). At Digital Power, we use Large Language Models (LLMs) to quickly categorise high volumes of open answers. Our team built its own secure, transparent tool that lets us see exactly what happens under the hood. But is AI already advanced enough to replace our human researchers?
Our conclusion from an earlier blog was clear: AI is fast, seems easy to use and, when given the right instructions, is reasonably accurate. But many tools are black boxes or make up incorrect answers. The human researcher remains essential for nuance, context and trustworthy insights.
In this article, we take you step by step through the process of categorising large volumes of qualitative data, and weigh up the pros and cons for the researcher of working with and without the AI tool. At the end, we look ahead to planned improvements to our AI tool. We keep learning and developing to make the collaboration between human and AI even more effective.
Step 1. Preparing the data
To use our AI tool, we place the dataset in a secure container in an Azure environment. In Databricks, we use a Python script to load the data. Our data engineers made the script user-friendly: as a researcher, you only fill in a few parameters, such as the location, name and format of the file. In this step, we can also easily filter out duplicates.
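Our actual loading script runs in Databricks against the Azure container and isn’t shown here. To illustrate the parameter-driven idea, here is a minimal, stdlib-only Python sketch. It assumes the data sits in a local CSV or JSON Lines file with one column of answers. The function `load_answers` and the default column name `answer` are hypothetical. Duplicates are detected on a normalised version of the text (whitespace collapsed, lowercase), which is one possible definition of a duplicate, not necessarily the one our script uses.

```python
import csv
import json
from pathlib import Path


def load_answers(location: str, name: str, file_format: str = "csv",
                 text_column: str = "answer") -> list[dict]:
    """Load open answers from one file and drop duplicate or empty answers.

    `location`, `name` and `file_format` mirror the parameters a researcher
    fills in; `text_column` is a hypothetical default column name.
    """
    path = Path(location) / f"{name}.{file_format}"
    if file_format == "csv":
        with path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
    elif file_format == "jsonl":
        with path.open(encoding="utf-8") as f:
            rows = [json.loads(line) for line in f if line.strip()]
    else:
        raise ValueError(f"unsupported format: {file_format}")

    seen: set[str] = set()
    unique_rows = []
    for row in rows:
        # Collapse whitespace and lowercase so trivial variants count as duplicates.
        key = " ".join(str(row.get(text_column, "")).split()).lower()
        # Empty answers are skipped; only the first copy of each answer is kept.
        if key and key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows
```

For example, `load_answers("data", "survey_2024")` would read the hypothetical file `data/survey_2024.csv` and return one dictionary per unique answer.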
💡 With or without AI
Researchers also prepare data without AI; the tool mainly speeds up cleaning and filtering. It takes some time initially to figure out what your data should look like, where to store it and how the script works, but once you’ve done it: piece of cake!
Step 2. Creating categories with an LLM
This step requires some practice and iteration. As a researcher, you choose how many categories the LLM should generate. Based on the output, you assess whether that number needs to be adjusted. When the categories don’t fully fit the data, we can tweak them manually using SQL. For large datasets, working with a sample is especially valuable: it saves time and reduces computing costs. The trade-off is reliability, so stay critical: make sure your sample is large enough to represent the entire dataset.
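How large is “large enough”? One common rule of thumb, not necessarily the right one for every project, is Cochran’s sample-size formula with a finite population correction: n₀ = z²·p(1 − p)/e², then n = n₀ / (1 + (n₀ − 1)/N) for a dataset of N answers. The sketch below combines it with a seeded random sample and a prompt that asks the LLM for a chosen number of categories. The prompt wording and all function names are hypothetical, and the LLM call itself is left out.

```python
import math
import random


def sample_size(population: int, margin: float = 0.05, z: float = 1.96,
                p: float = 0.5) -> int:
    """Cochran's formula with finite population correction.

    Defaults: 95% confidence (z = 1.96), a 5% margin of error and the most
    conservative proportion p = 0.5.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def draw_sample(answers: list[str], n: int, seed: int = 42) -> list[str]:
    """Simple random sample without replacement; the seed makes it repeatable."""
    return random.Random(seed).sample(answers, min(n, len(answers)))


def build_category_prompt(sample: list[str], n_categories: int) -> str:
    """Hypothetical prompt asking an LLM to propose `n_categories` themes."""
    numbered = "\n".join(f"{i}. {a}" for i, a in enumerate(sample, start=1))
    return (
        f"Read the open survey answers below and propose exactly {n_categories} "
        "categories that together cover the main themes. Return one category "
        "name per line, without explanations.\n\n" + numbered
    )
```

With the defaults, `sample_size(1000)` gives 278. Note that the formula is designed for estimating a proportion; a rare theme can still be missing from a sample of this size, which is one more reason to check the categories against the full data.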
💡 With or without AI
How can a researcher judge whether the AI-generated categories match the data? When you use AI like this, you haven’t yet immersed yourself in the content. For interviews you conducted yourself, this is easier, because you know the context. With survey data, it is more challenging. To properly evaluate the categorisation, a researcher must always spend time understanding the data first.
Step 3. Classifying answers into categories
For this step, we use a prompt that positions our AI tool as an “opinion analyst”. The task: take each piece of text and classify it into one of the given categories. The advantage: the classification is added as extra columns, without changing the original data. This lets us see exactly what happens and ensures our opinion analyst doesn’t hallucinate (invent answers that aren’t there).
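A minimal sketch of such a classification loop, assuming a hypothetical `call_llm(system_prompt, text)` function that wraps whichever LLM endpoint you use and returns plain text. The prompt wording is illustrative. Each row is copied and given extra columns, so the original answers stay untouched. As an extra safeguard (our assumption, not a description of the actual script), any label that is not in the category list is set to `None` and flagged for review instead of being accepted.

```python
from typing import Callable

# Illustrative system prompt; the real prompt wording is not shown in this article.
OPINION_ANALYST = (
    "You are an opinion analyst. Classify the answer into exactly one of "
    "these categories: {categories}. Reply with the category name only."
)


def classify_answers(rows: list[dict], categories: list[str],
                     call_llm: Callable[[str, str], str],
                     text_column: str = "answer") -> list[dict]:
    """Return new rows with extra columns; the input rows are not modified.

    `call_llm(system_prompt, user_text)` is a hypothetical wrapper around an LLM.
    """
    system_prompt = OPINION_ANALYST.format(categories=", ".join(categories))
    allowed = {c.lower(): c for c in categories}
    result = []
    for row in rows:
        raw = call_llm(system_prompt, row[text_column]).strip()
        # Map the reply onto a known category; None if the model invents a label.
        label = allowed.get(raw.lower())
        result.append({**row, "llm_raw": raw, "category": label,
                       "needs_review": label is None})
    return result
```

Because the model is passed in as a function, a simple fake function can stand in for it when checking the logic, without calling a model at all.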
💡 With or without AI
Often, a single answer touches on multiple topics. Currently, the script can assign each answer to only one category. As a researcher, you have more flexibility: you can assign an answer to several categories. Abbreviations, jargon and ambiguity are another challenge. Our AI tool lacks the research and business context, which makes accurate categorisation harder. On top of that, it still doesn’t understand sarcasm.
On the positive side, this step takes significantly less time than doing it manually. However, not all answers are categorised correctly, which brings us to step 4.
Step 4. AI-based validation
To validate the categorisation, we assign our AI tool the role of “expert AI evaluator”. It assesses whether each categorisation is correct or incorrect and explains its reasoning.
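A sketch of an evaluator pass, assuming a hypothetical `call_llm(system_prompt, text)` function that returns the model’s reply as text, a model that has been asked to reply in JSON, and rows that hold an `answer` and an assigned `category`. The JSON fields (`verdict`, `reasoning`, `suggested_category`) are our own illustrative choice. Replies that can’t be parsed, or verdicts other than “correct”, are flagged so the researcher checks them rather than trusting them silently.

```python
import json
from typing import Callable

# Illustrative evaluator prompt; field names are hypothetical.
EVALUATOR = (
    "You are an expert AI evaluator. Given an answer, the category assigned to "
    "it and the list of allowed categories, decide whether the assignment is "
    "correct. Reply with JSON: "
    '{"verdict": "correct" or "incorrect", "reasoning": "...", '
    '"suggested_category": "..." or null}'
)


def evaluate(row: dict, categories: list[str],
             call_llm: Callable[[str, str], str]) -> dict:
    """Return a copy of the row with the evaluator's verdict as extra columns."""
    user_text = json.dumps({"answer": row["answer"],
                            "assigned_category": row["category"],
                            "allowed_categories": categories})
    try:
        reply = json.loads(call_llm(EVALUATOR, user_text))
        verdict = reply.get("verdict")
    except (json.JSONDecodeError, AttributeError):
        # Not valid JSON, or JSON that is not an object.
        reply, verdict = {}, None
    suggestion = reply.get("suggested_category")
    if suggestion not in categories:
        suggestion = None  # ignore suggestions outside the category list
    return {**row,
            "eval_verdict": verdict,
            "eval_reasoning": reply.get("reasoning", ""),
            "eval_suggestion": suggestion,
            # Anything not clearly "correct" goes to the researcher.
            "needs_review": verdict != "correct"}
```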
💡 With or without AI
Our expert AI evaluator is surprisingly good at correcting misclassified answers, which is a promising sign. It not only corrects outputs that our human researcher flagged as illogical, but also offers valuable additional suggestions. Even so, a researcher still needs to check the output. Again, the AI tool struggles with emotion, context and sarcasm.
Step 5. The insights
So far, our AI tool delivers solid and valuable work, but the most important step is still ahead: generating insights. This is where the AI tool’s capabilities currently end. The output is our original list of answers, each with an assigned category.
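Once that output list exists, counting answers per category and collecting the answers behind each count is straightforward. The sketch below assumes each row is a dictionary with an `answer` and a `category` key, with `None` for answers that received no category; the function names are hypothetical. It answers “how many and which”, but not the sentiment or the reasons behind it.

```python
from collections import Counter, defaultdict


def summarise(rows: list[dict]) -> list[tuple[str, int, float]]:
    """Count answers per category, largest first, with their percentage of the total."""
    counts = Counter(row["category"] or "unassigned" for row in rows)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]


def answers_by_category(rows: list[dict]) -> dict[str, list[str]]:
    """Group the raw answers per category so a researcher can read them."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        groups[row["category"] or "unassigned"].append(row["answer"])
    return dict(groups)
```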
💡 With or without AI
Suppose the AI tool produces the category “satisfaction”. We know how many and which answers fall into this category, but not what the sentiment is. Are people satisfied or not? And what exactly are they satisfied or dissatisfied with? Here, the researcher’s expertise is vital: by exploring the data directly, they gain a deep understanding of the content.
Optimising our AI tool
Our AI tool already performs well, but there’s plenty of room for improvement. In future iterations, we want the tool to:
• be usable without technical knowledge
• determine the number of categories automatically
• place an answer in multiple categories
• handle context, jargon and emotion
• analyse sentiment within each topic
• automatically generate summaries per category
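For the item about placing an answer in multiple categories, one possible approach is to let the model return a comma-separated list of categories, keep only known labels, and store one row per answer/category pair. This is a sketch under our own assumptions, not the planned design; the function names and the limit of three labels are hypothetical. The long format keeps per-category counts easy to compute, but the percentages per category will then add up to more than 100% of the answers.

```python
def parse_labels(reply: str, categories: list[str], max_labels: int = 3) -> list[str]:
    """Turn a comma-separated LLM reply into known categories, without duplicates."""
    allowed = {c.lower(): c for c in categories}
    labels: list[str] = []
    for part in reply.split(","):
        label = allowed.get(part.strip().lower())  # unknown labels are dropped
        if label is not None and label not in labels:
            labels.append(label)
    return labels[:max_labels]


def to_long_format(row: dict, labels: list[str]) -> list[dict]:
    """One copied row per (answer, category) pair; a single None row if no label."""
    return [{**row, "category": label} for label in labels] or [{**row, "category": None}]
```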
We hope these improvements will allow our AI tool to add even more value in collaboration with researchers, who remain in control of interpretation and meaning.
Conclusion: the power lies in the collaboration between human and AI
AI shows that qualitative data analysis can be smarter, faster and more efficient. The biggest advantage of automatic categorisation is speed. AI also helps uncover patterns a researcher might overlook. But there’s a clear downside: the AI tool struggles with nuance, jargon, emotion, context and subjective language.
Our AI tool brings speed, structure and clarity, while the researcher provides context, interpretation and intuition. Individually, each has limitations, but together they form a powerful duo that enables faster and smarter insights. Should researchers be worried that AI will take over their jobs? No. Understanding context, assigning meaning and making the right interpretations: that remains human work, for now.
In short: let AI handle the heavy lifting, but rely on the researcher for the real insights.
This is an article by Mila van der Zwaag
Mila is a Researcher and Customer Experience Specialist at Digital Power. She has a background in Cognitive Psychology and likes to combine this knowledge about people and behaviour with data to arrive at the best solutions.
Customer Experience Specialist, mila.vanderzwaag@digital-power.com