RAG with Multi-Query pattern
Additional discussion of the Multi-Query pattern in RAG
The term Multi-Query mostly refers to asking the LLM to generate multiple “similar” queries from the original query, the so-called sub-queries. An earlier article introduced the know-how behind this pattern; LangChain has fairly direct support for it, and the Llama-Index implementation is also very simple. You can check it out here:
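The generation step can be sketched as follows. This is a minimal, self-contained illustration, not a LangChain or Llama-Index API: the `llm` function is a hypothetical stub standing in for a real model client, and `generate_sub_queries` and its prompt wording are assumptions for demonstration.

```python
def llm(prompt: str) -> str:
    # Hypothetical stub: a real LLM client would return rephrasings
    # of the question, one per line, based on the prompt.
    return (
        "How does retrieval work in RAG?\n"
        "What is retrieval-augmented generation?\n"
        "How are documents retrieved for an LLM?"
    )

def generate_sub_queries(original_query: str, n: int = 3) -> list[str]:
    # Ask the LLM for n "similar" queries, one per line, then parse them.
    prompt = (
        f"Generate {n} search queries similar to the question below, "
        f"one per line.\nQuestion: {original_query}"
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

sub_queries = generate_sub_queries("What is RAG?")
```

Each returned sub-query is then used for retrieval (or, as below, for full question answering) alongside the original query.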
This article will skip over the details of retrieval and instead introduce two approaches that directly use the question-answer pairs of the sub-queries as context, an idea commonly referred to as “decomposition”.
Overview
Generating sub-queries and their corresponding answers, on top of answering the original query, means the multi-query pattern incurs additional overhead. If you use OpenAI or other non-locally hosted models, be sure to optimize your prompts and related settings.
Sub-queries in Bundle
We use the LLM to generate “similar” sub-queries from the original query, then run retrieval for each sub-query to obtain its answer. After that, we use the sub-queries and their answers, together with the retrieval results for the original query itself, as the context for the final answer. Essentially, this is similar to Multi-Query Retrieval; the difference is that we skip the retrieval details and work at a higher level, the query level. That is why this article mainly focuses on the Llama-Index code.
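The bundle variant can be sketched like this. All three helpers (`llm`, `retrieve`, `answer`) are hypothetical stand-ins for a real model client and vector store, so the example runs on its own; the point is how the sub-query Q&A pairs and the original query’s own retrieval are packed into one context.

```python
def llm(prompt: str) -> str:
    return "final answer"                    # fake final LLM call

def retrieve(query: str) -> list[str]:
    return [f"doc about: {query}"]           # fake vector-store retrieval

def answer(query: str) -> str:
    return f"answer to: {query}"             # fake per-sub-query QA

def answer_with_bundle(original_query: str, sub_queries: list[str]) -> str:
    # 1. Answer every sub-query independently.
    qa_pairs = [(q, answer(q)) for q in sub_queries]
    # 2. Bundle all Q&A pairs plus the original query's own retrieval
    #    into a single context.
    context_parts = [f"Q: {q}\nA: {a}" for q, a in qa_pairs]
    context_parts += retrieve(original_query)
    context = "\n\n".join(context_parts)
    # 3. Answer the original query against that combined context.
    return llm(f"Context:\n{context}\n\nQuestion: {original_query}")
```

Note that the sub-queries never see each other here; they are answered independently and only meet in the final context.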
Sub-queries in step down
This pattern is actually very similar to Prio-reasoning. We run the sub-queries one by one, and each sub-query’s result (a question-answer pair) is stored in a “memory”. Each sub-query is answered based not only on its own retrieval as context but also on this memory; in other words, each sub-query’s answer is influenced by the previous sub-queries. When the original query is finally answered, it is influenced by all the sub-queries as well as its own retrieval.
Furthermore
Llama-Index’s MultiStepQueryEngine is also a kind of “decomposition” implementation, and this article has a know-how introduction to it:
Code
More code can be found in