How does the AI Document Explorer work in practice?

Generating more insight into your organisation with AI

The AI Document Explorer (AIDE) is a cloud solution developed by Digital Power that utilises OpenAI's GPT model. It can be deployed to quickly gain insights into company documents. AIDE securely indexes your files, enabling you to ask questions about your own documents. Not only does it provide you with the answers you are looking for, but it also references the locations where these answers are found.

AIDE uses several components and operates as follows: users (like Sarah) ask questions through a web application. These questions are then processed via a Smart Retriever and subsequently forwarded to a private instance of a GPT model.

The Smart Retriever and the model use embeddings to generate answers. All of this runs within the Azure AI infrastructure. In this blog post, we explain the technical terms involved and how they are used.

Azure AI: enabling intelligent solutions

Azure AI is a product offered within the Azure Cloud Environment. It provides diverse AI capabilities in the areas of language, vision, automation, and more. It facilitates the integration of powerful models like GPT, developed by OpenAI.

AIDE utilises two Azure AI components: Azure AI Search and Azure OpenAI. Azure AI Search retrieves relevant documents based on your query. Azure OpenAI is used to connect to your private instance of the GPT model. This structure ensures that documents within the organisation are securely stored and not used for model training.

The role of Natural Language Processing

Natural Language Processing (NLP) is a subset of Artificial Intelligence (AI) aimed at enabling machines to understand and process natural language. This involves not only written language but also spoken language.

Because NLP programs can understand natural language, they have a wide range of applications. These include summarising and translating text as well as analysing input data.

The rise of Large Language Models

Large Language Models (LLMs) are a specific type of NLP model. An LLM can generate new text, including combinations of words that never appeared in its training data. It does so by learning the patterns and rules of language from extremely large amounts of text and then applying them.

The importance of embeddings

LLMs use embeddings to convert words into numerical representations (vectors) that machines can process. During model training, each word is mapped to a point in an n-dimensional space.

In the image below, "apple" and "pear" are represented as [0.08, 0.38] and [0.25, 0.16], indicating their proximity. This numerical conversion enables calculations that support language comprehension and generation. The same applies to "king" and "queen," which are also close to each other. If, for example, "mango" were added to this space, it would be located near "apple" and "pear."

Embeddings example
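
The idea of proximity can be made concrete with a small calculation. The sketch below is only an illustration: the vectors for "apple" and "pear" come from the text above, while the vector for "king" and the helper names `distance` and `nearest` are assumptions made for this example. Real embeddings have hundreds or thousands of dimensions rather than two.

```python
import math

# Vectors for "apple" and "pear" are taken from the article.
# The vector for "king" is a made-up value for illustration only.
embeddings = {
    "apple": [0.08, 0.38],
    "pear": [0.25, 0.16],
    "king": [0.90, 0.85],  # hypothetical
}


def distance(word_a, word_b):
    """Euclidean distance between the embeddings of two words."""
    return math.dist(embeddings[word_a], embeddings[word_b])


def nearest(word):
    """Return the other word whose embedding lies closest to `word`."""
    others = [w for w in embeddings if w != word]
    return min(others, key=lambda w: distance(word, w))


print(distance("apple", "pear"))  # small: the two fruits lie close together
print(distance("apple", "king"))  # larger: unrelated words lie further apart
print(nearest("apple"))
```

Euclidean distance is used here because it matches the two-dimensional picture. In practice, cosine similarity is a common alternative for comparing embeddings.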

Converting words into numerical representations makes it possible to perform calculations on them. This allows LLMs to answer questions by calculating which words are most probable, given their positions in that n-dimensional space.
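
As a rough sketch of what "calculating the probability of words" can look like, the example below turns a score per candidate word into a probability with the softmax function, which is a standard way for language models to do this. The candidate words and their scores are invented for illustration. A real model scores tens of thousands of tokens at every step.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    # Subtracting the highest score keeps the exponentials numerically stable.
    exps = {word: math.exp(score - highest) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


# Hypothetical scores for the next word after "An apple is similar to a ..."
scores = {"pear": 2.0, "mango": 1.5, "king": -1.0}

probabilities = softmax(scores)
most_likely = max(probabilities, key=probabilities.get)

for word, probability in probabilities.items():
    print(f"{word}: {probability:.2f}")
print("Most likely next word:", most_likely)
```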

Retrieval Augmented Generation: connecting AI models with your data

Retrieval Augmented Generation (RAG) is a technique to connect AI models with your own data. In practice, you supply the model with additional data so that it can give more accurate answers. A Smart Retriever retrieves your relevant documents, which are then forwarded to the machine learning model along with the original question and additional instructions (a prompt). Within the Azure environment, various versions of the GPT models are primarily used.

Thus, instead of just the question, the GPT model now also receives a prompt and the relevant documents. This allows the model to base its answer on these documents, which makes the answer more accurate.
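
The retrieve-then-prompt flow can be sketched in a few lines. This is not AIDE's implementation: the `Passage` class, the `retrieve` and `build_prompt` functions and the sample documents are all hypothetical, and the simple word-overlap ranking stands in for the Smart Retriever, which the article describes as using Azure AI Search and embeddings. The call to the GPT model itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # where the text comes from, e.g. a file name and page
    text: str


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(question, passages, top_k=2):
    """Toy retriever: rank passages by the number of words shared with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda p: len(question_words & tokenize(p.text)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, passages):
    """Combine instructions, retrieved passages and the question into one prompt."""
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return (
        "Answer the question using only the sources below. "
        "Cite the source in square brackets.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up example documents.
passages = [
    Passage("hr-policy.pdf, p.3", "Employees receive 25 days of annual leave per year."),
    Passage("it-guide.pdf, p.1", "Laptops are replaced every four years."),
    Passage("hr-policy.pdf, p.7", "Unused annual leave expires after two years."),
]

question = "How many days of annual leave do employees receive?"
top_passages = retrieve(question, passages, top_k=2)
prompt = build_prompt(question, top_passages)
print(prompt)  # this text would be sent to the model instead of the bare question
```

Keeping the source label next to each passage is what lets the model point back to where an answer was found.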

Curious about the possibilities for your organisation? We would be happy to discuss how we can effectively deploy the AI Document Explorer! Feel free to contact us for more information or schedule a meeting.

This is an article by Myrthe Lammerse, Data Engineer at Digital Power

Myrthe has been working at Digital Power as a Data Engineer since 2022.
