LangChain: History-Driven RAG Enhanced By LLM-Chain Routing

TeeTracker
6 min read · Mar 27, 2024


· Abstract
· Method
RAG Basic Concept (Vanilla RAG)
RAG with Route
· Implementation
Core
Route Chain
Context Chain
LLM
Code


Abstract

We use an LLM to predict the routing of the chain and achieve Retrieval-Augmented Generation with the help of conversation history, applying either HyDE or a refined standalone query. The advantage of HyDE lies in its ability to generate hypothetical documents within the query-specific topic domain; here, HyDE is augmented with the conversation history, which sharpens that hypothetical boundary. If the route bypasses HyDE, the LLM instead refines the original query based on the history and the search continues with that.

Streamlit code:

Method

RAG Basic Concept (Vanilla RAG)

RAG is basically a search process: the user's query is embedded and used to search for similar, relevant information in documents that have also been embedded. The retrieved results serve as context, and with the help of the LLM and the original query, the final response text is generated. This is the simplest RAG process and the basis for numerous variations.

Vanilla RAG
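As a reference point, here is a minimal LCEL sketch of such a vanilla pipeline. The OpenAI models, the sample texts, and the prompt are illustrative assumptions, not the models or data used later in this lab:

from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Embed a few documents and expose the store as a retriever.
vectorstore = Chroma.from_texts(
    [
        "HyDE generates a hypothetical document for a query.",
        "Vanilla RAG embeds the query and retrieves similar chunks.",
    ],
    embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

prompt = ChatPromptTemplate.from_template(
    "Answer the query solely based on the following context:\n{context}\n\nQuery: {query}"
)

# Vanilla RAG: embed the query, retrieve similar chunks, then generate.
vanilla_rag = (
    {"context": retriever, "query": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
)

print(vanilla_rag.invoke("What does vanilla RAG do?"))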

RAG with Route

The method in this article adds routing logic on top of the basic concept. There are various ways to drive the route, such as LangGraph; in this lab we simply let the LLM decide whether to go with HyDE or to refine the original query. In either case we also use the conversation history between the user and the LLM. This benefits HyDE by giving the hypothetical text a tighter boundary, and when the user query itself has "flaws", the conversation history can also improve it. In experiments, even when the conversation history is empty and the route goes to HyDE, the LLM still generates an improved query. This implicit effect steers the user query in the "correct" direction, thereby improving the quality of retrieval.

Streamlit app

Implementation

Core

The core chain is as follows. As shown in the picture above, we use the LLM to decide which branch to take; for this lab we have defined "standardalonequery" and "hydechain", pointing to refining the original query and HyDE respectively.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
)
from langchain_core.runnables import RunnablePassthrough
from langchain_core.runnables.history import RunnableWithMessageHistory

prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessagePromptTemplate.from_template(
            "Answer query inside [Query] marks solely based on the following context:\n{context}\n"
        ),
        MessagesPlaceholder(variable_name="history"),
        HumanMessagePromptTemplate.from_template("[Query]{query}[Query]"),
    ]
)

mid_chain = (
    RunnablePassthrough.assign(next_step=route_chain())
    | RunnablePassthrough.assign(context=(_routing_chain_ | base_retriever))
    | prompt
    | llm
    | StrOutputParser()
)

final_chain = RunnableWithMessageHistory(
    mid_chain,
    lambda _: st.session_state["history"],
    input_messages_key="query",
    history_messages_key="history",
)
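For context, here is a sketch of how this final chain might be wired up and invoked in the Streamlit app. Keeping a ChatMessageHistory object in st.session_state["history"] and the "default" session id are assumptions for illustration; RunnableWithMessageHistory requires a session_id in its config even though the factory lambda above ignores it:

import streamlit as st
from langchain_community.chat_message_histories import ChatMessageHistory

# Assumption: one chat history object lives in the Streamlit session state,
# matching the `lambda _: st.session_state["history"]` factory above.
if "history" not in st.session_state:
    st.session_state["history"] = ChatMessageHistory()

answer = final_chain.invoke(
    {"query": "What is HyDE used for here?"},
    config={"configurable": {"session_id": "default"}},
)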

Route Chain

In this lab, the route is decided entirely by the LLM, which predicts the tone of the query. After several rounds of prompt engineering, we arrived at the following.


def route_chain() -> RunnableSerializable:
    prompt = ChatPromptTemplate.from_messages(
        [
            SystemMessagePromptTemplate.from_template(
                """As an AI assistant for application routing, determine whether to enter the process for handling declarative and normal sentences or interrogative and question sentences (indirect or direct questions) based on the user's query tone inside [Query] marks.
If it is a declarative and normal sentence process, return the string: "hydechain";
otherwise, return: "standardalonequery".

Notice:
Do not answer the query or make up the query, only return as simple as possible, either "standardalonequery" or "hydechain" as string without any instruction text, reasoning text, headlines, leading-text or other additional information.
"""
            ),
            HumanMessagePromptTemplate.from_template("""[Query]{query}[Query]"""),
        ]
    )

    return prompt | llm | StrOutputParser()

Note that it is not only "questions" that lead to the standalone query chain; on the contrary, even declarative sentences may lead the LLM to the standalone query chain. As mentioned earlier, the route could be implemented differently, for example with LangGraph, and programmers may also distinguish inputs through hard-coded rules to determine the route direction. This experiment uses LCEL entirely.
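For a quick sanity check, the route chain can be invoked on its own. The example queries below are made up, and the expected strings in the comments depend on the model's judgment:

route = route_chain()

# Declarative sentence: the prompt asks the model to answer "hydechain".
print(route.invoke({"query": "The report covers Q3 revenue and churn."}))

# Direct question: the prompt asks the model to answer "standardalonequery".
print(route.invoke({"query": "What does the report say about churn?"}))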

Context Chain

Our main goal is to obtain high-quality retrieval to build the final context for generation. This article assumes the reader is familiar with LCEL, so I won't go into that. Here we walk through the code that builds the context chain, to make the full listing quicker to read:

After obtaining the routing decision from route_chain (1st RunnablePassthrough), the chain acts as a state machine and moves on to the next transition (2nd RunnablePassthrough). The first step there is to enter the _routing_chain_ method, which reads the next_step of the state machine; its two if statements return the chains that generate the HyDE text or the standalone query respectively.

mid_chain = (
    RunnablePassthrough.assign(next_step=route_chain())
    | RunnablePassthrough.assign(context=(_routing_chain_ | base_retriever))
    | prompt
    | llm
    | StrOutputParser()
)

We have provided the following sub-chains; an illustrative invocation of one of them in isolation follows this list:

  • HyDE Chain

def hyde_chain() -> RunnableSerializable:
    prompt = ChatPromptTemplate.from_messages(
        [
            SystemMessagePromptTemplate.from_template(
                """Generate a different version of the text inside [Origin text] marks with the help of conversation history to retrieve relevant documents from a vector storage.
The version is for better model comprehension while maintaining the original text sentiment and brevity.
Your goal is to help the user overcome some of the limitations of the distance-based similarity search.
Notice: Only return the reformulated statement without any instruction text, reasoning text, headlines, leading-text or other additional information."""
            ),
            MessagesPlaceholder(variable_name="history"),
            HumanMessagePromptTemplate.from_template(
                "[Origin text]\n{query}\n[Origin text]"
            ),
        ]
    )

    return prompt | hyde_llm | StrOutputParser()

It is worth noting that in this lab we can create a specialized LLM just for HyDE and raise its temperature. Traditional RAG typically relies on a temperature of 0 to keep responses tied to the query, but given HyDE's nature we want to add more randomness, so it is recommended to increase the temperature appropriately. The other LLMs in this experiment do not set a temperature.

  • Standalone Query Chain

def standalone_query_chain() -> RunnableSerializable:
    prompt = ChatPromptTemplate.from_messages(
        [
            SystemMessagePromptTemplate.from_template(
                """Given a conversation history and a follow-up query inside [Query] marks, rephrase the follow-up query to be a standalone query. \
Do NOT answer the query, just reformulate it if needed, otherwise return it as is.
Notice: Only return the final standalone query as simple as possible without any instruction text, headlines, leading-text or other additional information."""
            ),
            MessagesPlaceholder(variable_name="history"),
            HumanMessagePromptTemplate.from_template("[Query]{query}[Query]"),
        ]
    )

    return prompt | llm | StrOutputParser()
  • Routing Chain

@chain
def _routing_chain_(info) -> RunnableSerializable:
    pretty_print("info", info)
    if "standardalonequery" in info["next_step"].lower():
        pretty_print("standalone_query_chain")
        return standalone_query_chain()

    if "hydechain" in info["next_step"].lower():
        pretty_print("hydechain")
        return hyde_chain()

    raise ValueError("Invalid next step")
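Below is an illustrative, standalone invocation of one of these sub-chains. The history messages and the query are made up, and the refined output in the comment is only what one would hope a capable model returns:

from langchain_core.messages import AIMessage, HumanMessage

# The history fills the MessagesPlaceholder, the query fills {query}.
history = [
    HumanMessage(content="We were talking about the Chroma vector store."),
    AIMessage(content="Right, the documents are embedded and stored in Chroma."),
]
print(standalone_query_chain().invoke({"query": "How is it configured?", "history": history}))
# hoped-for output: "How is the Chroma vector store configured?"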

LLM

The purpose of this experiment is to compare the models provided by Groq and NVIDIA Cloud. We used the llama2-70b-4096 and mixtral-8x7b-32768 models from these two providers. Readers can refer to the provider documentation to apply for an API key and configure it globally in the development environment using export.
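As a rough sketch of the setup, assuming the API keys have already been exported (for example GROQ_API_KEY and NVIDIA_API_KEY), the chat models might be instantiated as below. The exact model identifiers, and the temperature of 0.8 for the HyDE model, are illustrative choices rather than the lab's definitive configuration:

from langchain_groq import ChatGroq
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Groq-hosted model, deterministic for routing, refinement and answering.
llm = ChatGroq(model_name="mixtral-8x7b-32768", temperature=0)

# A separate, "hotter" model for HyDE so the hypothetical document gets
# more lexical variety for the similarity search.
hyde_llm = ChatGroq(model_name="mixtral-8x7b-32768", temperature=0.8)

# Or the NVIDIA Cloud counterpart (model id is an assumption).
# llm = ChatNVIDIA(model="mixtral_8x7b", temperature=0)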

Code
