RAG with Multi-Query pattern

TeeTracker
Feb 19, 2024


Additional discussion of the Multi-Query pattern in RAG

The term Multi-Query mostly refers to asking the LLM to generate multiple “similar” queries, so-called sub-queries, from the original query. An earlier article introduced the know-how behind this pattern; LangChain has direct support for it, and the Llama-Index implementation is also very simple. You can check it out here:

This article will skip over the details of retrieval and instead introduce two approaches that directly use the question-answer pairs of sub-queries as “context”, commonly referred to as “decomposition”.

Overview

Using the multi-query pattern incurs additional overhead, because sub-queries and their corresponding answers must be generated on top of the original query. If you use OpenAI or another non-locally hosted model, be sure to optimize prompts and other aspects to keep cost down.

Sub-queries in Bundle

We use the LLM to generate “similar” sub-queries from the original query, then run a retrieval for each sub-query to get its corresponding answer. The sub-queries and their answers, together with the retrieval results of the original query itself, then serve as the “context” for answering the original query. Essentially this is similar to Multi-Query Retrieval; the difference is that we skip the retrieval details and work directly at a higher level, the query level. That is why this article mainly focuses on the Llama-Index code. A sketch follows below.
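In Llama-Index, one way to express this bundle style is the SubQuestionQueryEngine, which generates sub-questions, answers each one against the index, and synthesizes a final answer from all question-answer pairs at once. Below is a minimal sketch, not the article's exact code; the data directory, tool name, and sample question are placeholders, and the import paths assume llama-index >= 0.10:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Index some local documents ("./data" is a placeholder path).
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Expose the index as a tool the decomposer can route sub-queries to.
tool = QueryEngineTool(
    query_engine=index.as_query_engine(),
    metadata=ToolMetadata(
        name="docs",
        description="Answers questions about the indexed documents.",
    ),
)

# The engine asks the LLM to decompose the query into sub-queries,
# answers each one, then synthesizes a final answer from all
# question-answer pairs at once (the "bundle").
engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=[tool])
response = engine.query("Compare the pros and cons of the two approaches.")
print(response)
```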

Sub-queries in step down

This pattern is actually very similar to Prio-reasoning. We run the sub-queries one by one, and the result of each sub-query (a question-answer pair) is stored in a “memory”. Each sub-query is answered not only against its own retrieval results as context but also against this “memory”; in other words, each sub-query is influenced by the sub-queries that came before it. When the original query is finally run, it is influenced by all the sub-queries as well as its own retrieval. A sketch of the flow follows below.
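Here is a minimal hand-rolled sketch of the idea; it is an illustration, not the article's exact code. The sub-queries are hard-coded where an LLM would normally generate them, the data directory and questions are placeholders, and the “memory” is just a list of question-answer pairs prepended to every subsequent query:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Build the index ("./data" is a placeholder path).
documents = SimpleDirectoryReader("./data").load_data()
query_engine = VectorStoreIndex.from_documents(documents).as_query_engine()

# Sub-queries would normally be generated by the LLM from the original
# query; they are hard-coded here to keep the sketch short.
original_query = "How should I tune a RAG pipeline?"
sub_queries = [
    "Which retrieval parameters affect RAG answer quality?",
    "How is RAG answer quality usually evaluated?",
]

# "Memory": question-answer pairs accumulated from earlier sub-queries.
memory: list[str] = []

for sub_query in sub_queries:
    # Each sub-query sees its own retrieval plus everything answered so far.
    context = "\n".join(memory)
    answer = query_engine.query(
        f"Previously answered:\n{context}\n\nQuestion: {sub_query}"
    )
    memory.append(f"Q: {sub_query}\nA: {answer}")

# The original query is finally answered under the influence of all
# accumulated question-answer pairs plus its own retrieval.
final = query_engine.query(
    "Previously answered:\n" + "\n".join(memory)
    + f"\n\nQuestion: {original_query}"
)
print(final)
```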

Furthermore

The MultiStepQueryEngine of Llama-Index is also a kind of “decomposition” implementation, and this article has a know-how introduction to it:
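For reference, a minimal sketch of wiring up MultiStepQueryEngine with StepDecomposeQueryTransform, which rewrites the query step by step and feeds each step's answer into the next. The import paths assume llama-index >= 0.10, and the data directory, index summary, and question are placeholder values:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.indices.query.query_transform.base import (
    StepDecomposeQueryTransform,
)
from llama_index.core.query_engine import MultiStepQueryEngine

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# The transform decomposes the query into sequential steps; each new
# step is conditioned on the answers from the previous steps.
step_decompose = StepDecomposeQueryTransform(verbose=True)

engine = MultiStepQueryEngine(
    query_engine=index.as_query_engine(),
    query_transform=step_decompose,
    index_summary="Documents about RAG techniques.",  # placeholder summary
)
response = engine.query("What are the trade-offs between the two patterns?")
print(response)
```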

Code

Streamlit app

You can find more code in:
