Advanced RAG
A proper way to do Retrieval-Augmented Generation (RAG)
Cause
Lately, I’ve been diving deeper into RAG by studying so-called Advanced RAG. LangChain already provides a very simple API for vanilla RAG, and this article gives the quickest code introduction.
I can’t say how powerful it is, but I can guarantee that this code can already handle day-to-day document retrieval and summarization tasks.
What is RAG?
Retrieval-augmented generation (RAG) is a natural language processing (NLP) technique that combines retrieval-based and generative AI models.
What is Vanilla RAG?
Naive RAG commonly involves dividing documents into sections, embedding them, and then retrieving sections based on semantic similarity in response to a user’s query.
This method is straightforward, but its retrieval quality is often poor.
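To make the naive pipeline concrete, here is a minimal, library-free sketch of the split-embed-retrieve loop. Everything is a stand-in: a hashed bag-of-words vector replaces a real embedding model, the chunking is naive fixed-size slicing, and the final LLM call is replaced by assembling a prompt string.

```python
import hashlib
import math
import re

def embed(text, dim=64):
    # Hashed bag-of-words vector: a deterministic toy stand-in for a real embedding model.
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def split_document(document, chunk_size=60):
    # Naive fixed-size character chunking.
    return [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]

def retrieve(chunks, query, k=2):
    # Rank chunks by cosine similarity to the query embedding.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]

document = (
    "Chroma is a vector store. "
    "MultiVectorRetriever links summaries to raw documents. "
    "partition_pdf splits a PDF into text and table elements."
)
chunks = split_document(document)
question = "What splits a PDF into elements?"
context = "\n".join(retrieve(chunks, question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The weakness the article alludes to is visible even here: fixed-size chunking cuts sentences in half, and retrieval quality depends entirely on how well the chunks happen to line up with the query.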
Why do we need Advanced RAG?
- More control over the summaries and related documents
- Improving the retrieval of diagram or chart-based data
- Enhancement of image data retrieval
- Retrieved context that is more informative than the raw chunks alone.
Architecture
Basically, our goal is to retrieve a batch of “vectorized” summaries through prompts and then retrieve the text related to the summaries. We use Chroma’s vector store to store the summaries and a doc store to store the text. In this example, the doc store is an InMemoryStore. For the retriever, we use the MultiVectorRetriever, which is based on Chroma.
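The pattern can be sketched without any libraries: one store holds summaries (this is what gets searched), another holds the raw text under the same IDs (this is what gets returned). This is a toy illustration of the idea, not LangChain’s actual MultiVectorRetriever; the word-overlap score stands in for real vector similarity.

```python
# Toy multi-vector retrieval: search over summaries, return raw documents.
vector_store = {}  # doc_id -> summary (searched)
doc_store = {}     # doc_id -> raw text (returned)

def add(doc_id, summary, raw_text):
    vector_store[doc_id] = summary
    doc_store[doc_id] = raw_text

def similarity(a, b):
    # Word-overlap (Jaccard) score standing in for embedding similarity.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def retrieve(query, k=1):
    ranked = sorted(vector_store,
                    key=lambda d: similarity(query, vector_store[d]),
                    reverse=True)
    return [doc_store[d] for d in ranked[:k]]

add("t1", "a table of model benchmark scores",
    "model | score\nGPT-4 | 0.91\nGPT-3.5 | 0.72")
add("x1", "an introduction to retrieval augmented generation",
    "RAG combines retrieval with generation ...")

results = retrieve("which model has the best benchmark results?")
```

The key point: the query matches the concise summary of the table, but what comes back is the full raw table, which is what the LLM actually needs as context.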
The final chain will be
question: prompt -> context: retriever -> LLM -> result in string
chain = (
    {"question": RunnablePassthrough(), "context": retriever}
    | prompt
    | ChatOpenAI(temperature=0, model="gpt-4-1106-preview")
    | StrOutputParser()
)
Process
- Assume we are doing a search over a PDF file.
- We use the partition_pdf function from the unstructured library to split the PDF. Just a heads up: the splitting process can be really slow without a GPU.
- The split elements mainly consist of two types, text and tables:
<class 'unstructured.documents.elements.Table'>
<class 'unstructured.documents.elements.CompositeElement'>
- We’re lucky that we can just use str() to convert both types directly.
- We now have two lists: one holds the table contents, the other holds the text contents.
- We run an LLM prompt over each list separately to produce summaries. Note that the number of table summaries must match the number of tables, and likewise for text, so using assert here is necessary.
- Create the retriever: store the table summaries in the vector store and the raw table content in the doc store, then treat the text the same way.
- Establish the chain for prompting.
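End to end, the steps above can be sketched as follows. Everything here is a labeled stand-in: fake_partition_pdf replaces unstructured’s partition_pdf, and fake_summarize replaces the LLM summarization prompt; only the control flow and the consistency check mirror the real pipeline.

```python
def fake_partition_pdf(path):
    # Stand-in for unstructured's partition_pdf: pretend the PDF yielded
    # two CompositeElement-like text chunks and one Table-like element.
    return [
        ("text", "Retrieval augmented generation combines retrieval with generation."),
        ("table", "model | score\nGPT-4 | 0.91"),
        ("text", "MultiVectorRetriever maps summaries back to raw documents."),
    ]

def fake_summarize(content):
    # Stand-in for the LLM summarization call.
    return "Summary: " + content[:40]

elements = fake_partition_pdf("example.pdf")
texts = [body for kind, body in elements if kind == "text"]
tables = [body for kind, body in elements if kind == "table"]

text_summaries = [fake_summarize(t) for t in texts]
table_summaries = [fake_summarize(t) for t in tables]

# The consistency check from the steps above: summaries must align 1:1
# with their sources before both are written to the vector store and doc store.
assert len(text_summaries) == len(texts)
assert len(table_summaries) == len(tables)
```

Once the assertions pass, each summary is embedded into the vector store and its raw counterpart is stored in the doc store under a shared ID, and the chain shown earlier handles the prompting.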