Fast and reliable internal information using AI Document Explorer

A financial institution

  • Customer case
  • Data projects
  • Data Engineering
image of euros
Joachim-business-manager
Joachim van Biemen
Business Manager
3 min
16 May 2024

Financial institutions need to process large amounts of documentation. For this particular institution, an internal team facilitates this by, for example, creating summaries using text analysis and natural language processing (NLP). They make these available to the various business units. To conduct audits more efficiently, they wanted to develop a question-and-answer model to get the right information to them faster. When ChatGPT was launched, they asked us to create a proof of concept.

Approach

Setting up a large language model within a proprietary environment with proprietary data is a relatively new field. To ensure privacy and set up a secure environment, we used the AI Document Explorer. This is a private instance of a GPT model according to the Retrieval Augemented Generation framework that we connected to the financial institution's existing infrastructure.

We set up the data processing process as follows:

  1. Documents from SharePoint are retrieved within the Azure environment.
  2. Data from images, web pages, PDFs, PowerPoints, Word documents and Excel files are converted to readable text.
  3. This output is split into small pieces of text and indexed for each piece. The semantic value per piece of text is also retrieved by a smart algorithm and stored with it.

We also built a chat app to talk to the data. When a user asks a question within this app, the following steps are taken:

  1. We convert the query into keywords that are used to return a top 10 indexed pieces of text that are semantically most similar to the keywords.
  2. We send the returned pieces of text to the ChatGPT model along with a system prompt (instructions) and the original query.
  3. We return the model's answer and the pieces of text used to the user.

Because ChatGPT passes along the pieces of text used, the user can see which source the answer came from and check if it is correct.

Result

Instead of having to search by themselves for answers to ad hoc questions and explanations of specific processes to be followed, employees of the financial institution can now ask questions about some 100,000 pieces of text of documentation. They get 'human' answers with source references and citations. The output looks as follows:

example output of AI document explorer
example of the output, not related to this specific case

Information is available faster and reliability is always verified by an employee. Thus, work is done more efficiently and the quality of work remains undiminished.

Want to know more?

Do you also want to safely search your confidential documents with the AI Document Explorer? Joachim would be glad to discuss the possibilities with you.

Receive data insights, use cases and behind-the-scenes peeks once a month?

Sign up for our email list and stay 'up to data':

You might find this interesting too

Securely search through your confidential documents with the AI Document Explorer

The AI Document Explorer is a secure, AI-driven tool to improve your work efficiency. Streamline your work by quickly finding answers and accessing your documents, all in one secure place. Take the step towards working more efficiently and easily!

Read more

The organisational benefits of implementing your own AI-chatbot

With the increasing availability of cloud services that enable companies to leverage Large Language Models, it becomes relatively easy to setup your own GPT-model. However, one important question needs to be answered before you start building: what are the benefits for my organisation?

Read more
ai document explorer example

How does the AI Document Explorer work in practice?

The AI Document Explorer (AIDE) is a cloud solution developed by Digital Power that utilises OpenAI's GPT model. It can be deployed to quickly gain insights into company documents. AIDE securely indexes your files, enabling you to ask questions about your own documents. Not only does it provide you with the answers you are looking for, but it also references the locations where these answers are found.

Read more
business managers having a conversation

Insight into the complete sales funnel thanks to a data warehouse with dbt

Our consultants log the assignments they take on for our clients in our ERP system AFAS. In our CRM system HubSpot, we can see all the information relevant before signing a collaboration agreement. When we close a deal, all the information from HubSpot automatically transfers to AFAS. So, HubSpot is mainly used for the process before entering a collaboration, while AFAS is used for the subsequent phase. To tighten our people's planning and improve our financial forecasts, we decided to set up a data warehouse to integrate data from both data sources.

Read more
iphone with spotify music

Converting billions of streams into actionable insights with a new data & analytics platform

Merlin is the largest digital music licensing partner for independent labels, distributors, and other rightsholders. Merlin’s members represent 15% of the global recorded music market. The company has deals in place with Apple, Facebook, Spotify, YouTube, and 40 other innovative digital platforms around the world for its’ member’s recordings. The Merlin team tracks payments and usage reports from digital partners while ensuring that their members are paid and reported to accurately, efficiently, and consistently.

Read more
woman shopping online

A standardised way of processing data using dbt

One of the largest online shops in the Netherlands wanted to develop a standardised way of data processing within one of its data teams. All data was stored in the scalable cloud data warehouse Google BigQuery. Large amounts of data were available within this platform regarding orders, products, marketing, returns, customer cases and partners.

Read more
kadaster header

Working more efficiently thanks to migration to Databricks

The Kadaster manages complex (geo)data, including all real estate in the Netherlands. All data is stored and processed using an on-premise data warehouse in Postgres. They rely on an IT partner for maintaining this warehouse. The Kadaster aims to save costs and work more efficiently by migrating to a Databricks environment. They asked us to assist in implementing this data lakehouse in the Microsoft Azure Cloud.

Read more
dutch highway

Reliable reporting using robust Python code

The National Road Traffic Data Portal (NDW) is a valuable resource for municipalities, provinces, and the national government to gain insight into traffic flows and improve infrastructure efficiency.

Read more

A fully automated data import pipeline

Stichting Donateursbelangen aims to strengthen trust between donors and charities. They believe that that trust is based on collecting money honestly, openly, transparently and respectfully. At the same time effectively using the raised donation funds to make an impact. To further this goal, Stichting Donateursbelangen wants to share information about charities with donors through their own search engine.

Read more
data platform

A scalable data platform in Azure

TM Forum, an alliance of over 850 global companies, engaged our company as a data partner to identify and solve data-related challenges.

Read more
potatoes

Valuable insights from Microsoft Dynamics 365

Agrico is a cooperative of potato growers. They cultivate potatoes for various purposes such as consumption and planting future crops. These potatoes are exported worldwide through various subsidiaries. All logistical and operational data is stored in their ERP system, Microsoft Dynamics 365. Due to the complexity of this system with its many features, the data is not suitable for direct use in reporting. Agrico asked us to help make their ERP data understandable and develop clear reports.

Read more
valk exclusief

Setting up a future-proof data infrastructure

Valk Exclusief is a chain of 4-star+ hotels with 43 hotels in the Netherlands. The hotel chain wants to offer guests a personal experience, both in the hotel and online.

Read more
data mental healthcare

Central data storage with a new data infrastructure

Dedimo is a collaboration of five mental healthcare initiatives. In order to continuously enhance the quality of their care, they organize internal processes more efficiently. Therefore, they use perceptions from the data that is internally available. Previously, they acquired the data themselves from different source systems with ad hoc scripts. They requested our help to make this process more robust, efficient and to further professionalise it. They asked us to facilitate the central storage of their data, located in a cloud data warehouse. The goal was to set up the data infrastructure within this environment, since they were already used to working with Google Cloud Platform (GCP).

Read more
billboards

A scalable machine-learning platform for predicting billboard impressions

The Neuron provides a programmatic bidding platform to plan, buy and manage digital Out-Of-Home ads in real-time. They asked us to predict the number of expected impressions for digital advertising on billboards in a scalable and efficient way.

Read more