How does the AI Document Explorer work in practice?
Generating more insight into your organisation with AI

The AI Document Explorer (AIDE) is a cloud solution developed by Digital Power that utilises OpenAI's GPT model. It can be deployed to quickly gain insights into company documents. AIDE securely indexes your files, enabling you to ask questions about your own documents. Not only does it provide you with the answers you are looking for, but it also references the locations where these answers are found.
AIDE uses several components and operates as follows: users (like Sarah) ask questions through a web application. These questions are then processed via a Smart Retriever and subsequently forwarded to a private instance of a GPT model.
The Smart Retriever and the model use embeddings to generate answers. All of this runs within the Azure AI infrastructure. In this blog, we delve deeper into the definitions and use of various technical terms.
Azure AI: enabling intelligent solutions
Azure AI is a product offered within the Azure Cloud Environment. It provides diverse AI capabilities in the areas of language, vision, automation, and more. It facilitates the integration of powerful models like GPT, developed by OpenAI.
AIDE utilises two Azure AI components: Azure AI Search and Azure OpenAI. Azure AI Search retrieves relevant documents based on your query. Azure OpenAI is used to connect to your private instance of the GPT model. This structure ensures that documents within the organisation are securely stored and not used for model training.
The role of Natural Language Processing
Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) aimed at enabling machines to understand and process natural language. This covers not only written language but also spoken language.
Because NLP programs can interpret natural language, they have many applications. These include summarising and translating text as well as analysing input data.
The rise of Large Language Models
Large Language Models (LLMs) are a specific type of NLP model. An LLM can produce language independently, even when a specific combination of words has never appeared in its training data. Trained on extremely large amounts of text, the model learns the patterns and rules of language and then applies them when generating new text.
The importance of embeddings
Embeddings are used by LLMs to convert words into numerical representations (vectors) that machines can process. During model training, each word is mapped to a point in an n-dimensional space.
In the accompanying illustration, "apple" and "pear" are represented as [0.08, 0.38] and [0.25, 0.16], two points that lie close together because the words have related meanings. The same applies to "king" and "queen", which are also close to each other. If, for example, "mango" were added to this space, it would be located near "apple" and "pear".
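To make the idea of proximity concrete, here is a minimal Python sketch that measures the distance between word vectors. Only the coordinates for "apple" and "pear" come from the example above; the values for "mango", "king" and "queen" are assumptions chosen purely for illustration, and real embeddings have hundreds or thousands of dimensions rather than two.

```python
import math

# "apple" and "pear" use the coordinates quoted in the text.
# The other vectors are hypothetical values picked for this illustration.
embeddings = {
    "apple": [0.08, 0.38],
    "pear": [0.25, 0.16],
    "mango": [0.15, 0.30],  # hypothetical
    "king": [0.85, 0.80],   # hypothetical
    "queen": [0.90, 0.72],  # hypothetical
}


def euclidean_distance(a, b):
    """Straight-line distance between two points in the embedding space."""
    return math.dist(a, b)


def nearest(word, vocabulary):
    """Return the other word whose vector lies closest to the given word."""
    others = [w for w in vocabulary if w != word]
    return min(
        others,
        key=lambda w: euclidean_distance(vocabulary[word], vocabulary[w]),
    )


if __name__ == "__main__":
    for word in embeddings:
        print(word, "is closest to", nearest(word, embeddings))
```

Euclidean distance is used here because it matches the picture of points lying close together. Many production systems compare embeddings with cosine similarity instead, which looks at the angle between vectors rather than the distance between them.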

Converting words into numerical representations makes it possible to perform calculations. This allows LLMs to answer questions by calculating, from positions in that n-dimensional space, how probable each candidate word is as the next word of the answer.
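As a rough illustration of how vectors can be turned into word probabilities, the Python sketch below scores each candidate word against a context vector and normalises the scores with the softmax function, which is the standard way language models convert scores into probabilities. The context vector, the candidate vectors and the function names are all assumptions made for this example; an actual GPT model computes these values internally across a vocabulary of many thousands of tokens.

```python
import math


def dot(a, b):
    """Dot product: a higher value means the vectors point in a similar direction."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_probabilities(context, candidates):
    """Score every candidate word against the context and normalise the scores."""
    words = list(candidates)
    scores = [dot(context, candidates[w]) for w in words]
    return dict(zip(words, softmax(scores)))


# Hypothetical vectors: imagine the context is "I ate a juicy ..."
context_vector = [0.2, 0.9]
candidate_vectors = {
    "apple": [0.1, 1.0],
    "pear": [0.3, 0.8],
    "king": [1.0, 0.1],
}

probabilities = word_probabilities(context_vector, candidate_vectors)

if __name__ == "__main__":
    for word, p in sorted(probabilities.items(), key=lambda kv: -kv[1]):
        print(f"{word}: {p:.2f}")
```

With these made-up values the fruit words receive a higher probability than "king", because their vectors point in roughly the same direction as the context vector.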
Retrieval Augmented Generation: connecting AI models with your data
Retrieval Augmented Generation (RAG) is a technique to connect AI models with your own data. In practice, you give the model additional, relevant data so that it can produce more accurate answers. A Smart Retriever retrieves the documents relevant to the question, which are then forwarded to the machine learning model along with the original question and additional instructions (a prompt). Within the Azure environment, various versions of the GPT models are primarily used.
Thus, instead of just the question, the GPT model now also receives a prompt and the relevant documents. This allows the model to base its answer on those documents, which makes the answer more accurate and makes it possible to reference where the information was found.
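The flow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not AIDE's actual implementation: the `Chunk` class, the `retrieve` and `build_prompt` functions, the file names and the two-dimensional embeddings are all hypothetical, and the final call to the GPT model is left out. In AIDE, retrieval is handled by Azure AI Search and the prompt is sent to a private GPT instance through Azure OpenAI.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str             # where the text was found (hypothetical file and page)
    text: str
    embedding: list[float]  # hypothetical 2-D vector; real ones are much larger


def cosine_similarity(a, b):
    """Similarity of two vectors based on the angle between them."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(question_embedding, chunks, top_k=2):
    """Stand-in for the retriever: return the top_k most similar chunks."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(question_embedding, c.embedding),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, retrieved):
    """Combine instructions, retrieved sources and the question into one prompt."""
    sources = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite the source number for every statement.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )


# Hypothetical indexed documents and question embedding.
chunks = [
    Chunk("hr_policy.pdf, p. 3", "Employees receive 25 days of leave per year.", [0.9, 0.1]),
    Chunk("it_handbook.pdf, p. 12", "Laptops are replaced every four years.", [0.1, 0.9]),
    Chunk("hr_policy.pdf, p. 7", "Unused leave expires on 1 July.", [0.8, 0.3]),
]
question = "How many days of leave do I get?"
question_embedding = [0.95, 0.15]

top_chunks = retrieve(question_embedding, chunks)
prompt = build_prompt(question, top_chunks)

if __name__ == "__main__":
    print(prompt)  # this text would be sent to the GPT model
```

Because every retrieved chunk carries its source, the model can be instructed to cite it, which is how an answer can point back to the location in the original documents.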
Curious about the possibilities for your organisation? We would be happy to discuss how we can effectively deploy the AI Document Explorer! Feel free to contact us for more information or schedule a meeting.