This post is part of a series of texts aiming to explore and understand patterns and practices that enable the construction of a production-ready AI data infrastructure. The main focus of the series is on the modeling and retrieval of evolving data, which would empower Large Language Model (LLM) apps and Agents to serve millions of users concurrently.

For a broad overview of the problem and our understanding of the current state of the LLM landscape, check out our initial post here.

In this post, we delve into context enrichment and testing in Retrieval Augmented Generation (RAG) applications.

RAG applications can retrieve relevant information from a knowledge base and generate detailed, context-aware answers to user queries.

As we are trying to improve on the base information LLMs are giving us, we need to be able to retrieve and understand more complex data, which can be stored in various data stores, in many formats, and using different techniques.

All of this leads to a lot of opportunities, but also creates a lot of confusion in generating and using RAG applications and extending the existing context of LLMs with new knowledge.

1. Context Enrichment and Testing in RAG Applications

In navigating the complexities of RAG applications, the first challenge we face is the need for robust testing. Determining whether augmenting a LLM's context with additional information will yield better results is far from straightforward and often relies on subjective assessments.

Imagine, for instance, adding the digital version of the book The Adventures of Tom Sawyer to the LLM's database in order to enrich its context and obtain more detailed answers about the book's content for a paper we're writing. To evaluate this enhancement, we need a way to measure the accuracy of the responses before and after adding the book while considering the variations of every adjustable parameter.

2. Adjustable Parameters in RAG Applications

The end-to-end process of enhancing RAG applications involves various adjustable parameters, which offer multiple paths toward achieving similar goals with varying outcomes. These parameters include:

  1. Number of documents loaded into memory.
  2. Size of each sub-document chunk uploaded.
  3. Overlap between documents uploaded.
  4. Relationship between documents (Parent-Son etc.)
  5. Type of embedding used for data-to-vector conversion (OpenAI, Cohere, or any other embedding method).
  6. Metadata structure for data navigation.
  7. Indexes and data structures.