Experience with the LangChain App Template
Create wine price Q&A with the LangChain App Template
We will use LangChain’s app template to build a basic Q&A application that queries an LLM for wine price references.
Install LangChain CLI
pip install -U langchain-cli
OpenAI Key
The templates use OpenAI by default, so the API key must be set globally:
export OPENAI_API_KEY=<your-openai-api-key>
App creation
To create a new application, run in the terminal:
langchain app new wine-price-app
Add template
There are several ways to add templates. When you run langchain app new, you may be prompted to add them directly. Alternatively, you can skip that step and add templates later:
langchain app add csv-agent
We use the csv-agent template to query information from a CSV file. Check the template package here.
The template uses passenger data from the Titanic disaster as its source of information. Once the application is built, we can query this data directly, without supplying our own.
Route for FastAPI
After adding the template (it takes less than a minute), you will see the following code, which should be added to app/server.py.
Under the hood, the template-based app is a FastAPI application.
Copy this code; we will use it later.
from csv_agent import agent_executor as csv_agent_chain
add_routes(app, csv_agent_chain, path="/csv-agent")
Poetry on Dependencies
Once the application has been created, take a look at the project directory (wine-price-app).
The pyproject.toml there tells us that the project uses Poetry to manage dependencies.
If you run into a Python version conflict, follow these steps:
1. Update your local Python version, as some templates require a specific one.
2. Change the python constraint in the pyproject.toml file to match, as shown below.
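For example, if a template requires Python 3.11, the constraint in pyproject.toml might be changed like this (a hypothetical excerpt; the version bounds depend on the templates you use):
[tool.poetry.dependencies]
python = ">=3.11,<3.13"
After editing the constraint, run poetry lock and poetry install again so Poetry re-resolves the environment.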
Run the App
🚨The application must be launched from its root directory, here:
wine-price-app/
Edit server.py
Before running the application, we must edit app/server.py. As you can see, it is a typical FastAPI application:
from fastapi import FastAPI
from fastapi.responses import RedirectResponse
from langserve import add_routes

app = FastAPI()


@app.get("/")
async def redirect_root_to_docs():
    return RedirectResponse("/docs")


# Edit this to add the chain you want to add
add_routes(app, NotImplemented)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
Change the code below the comment:
# Edit this to add the chain you want to add
from fastapi import FastAPI
from fastapi.responses import RedirectResponse
from langserve import add_routes

app = FastAPI()


@app.get("/")
async def redirect_root_to_docs():
    return RedirectResponse("/docs")


from csv_agent import agent_executor as csv_agent_chain

add_routes(app, csv_agent_chain, path="/csv-agent")

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
Remember the code we copied earlier? This is where it goes.
Start server
Run in the terminal (Cmd+C or Ctrl+C to stop the server):
poetry run langchain serve
The app’s API docs are published at http://127.0.0.1:8000/docs
The app’s main entry point is the playground at http://127.0.0.1:8000/csv-agent/playground/
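Besides the playground, LangServe also exposes REST endpoints such as /invoke for each route. A minimal smoke test, assuming the requests package is installed (the question targets the template’s default Titanic data):
import requests

# LangServe wraps the chain input in an outer "input" key; the agent itself
# expects an AgentInputs object with a single "input" field.
response = requests.post(
    "http://127.0.0.1:8000/csv-agent/invoke",
    json={"input": {"input": "What is the sex of Braund, Mr. Owen Harris?"}},
)
print(response.json()["output"])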
Utilize our own data
As mentioned earlier, we have so far only been using the default data provided by the template: passenger information from the Titanic disaster. We can ask related questions, such as “What is the sex of Braund, Mr. Owen Harris?”, and get the relevant answer.
We need to modify the items under the application packages directory for our own data, in our case:
wine-price-app/packages/csv-agent
🚨First and foremost, download the data source from here: https://github.com/XinyueZ/llm-fine-tune-wine-price/blob/master/data/wine_data.csv
The CSV file contains the following columns:
country, description, designation, points, price, province, region_1, region_2, variety, winery
Their meanings are largely self-explanatory.
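Before wiring the file into the app, a quick look at it with pandas can confirm the schema. A minimal sketch, assuming wine_data.csv sits in the current directory:
import pandas as pd

# Load the wine data and inspect its shape and schema.
df = pd.read_csv("wine_data.csv")
print(df.shape)
print(df.head())

# The column we will query most often: price.
print(df["price"].describe())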
Create data based on this CSV
Before we do, let’s check the official documentation. According to it, we must reuse the ingest.py script to convert the CSV into a vectorstore.
In the wine-price-app/packages/csv-agent directory, we find the ingest.py script.
After retrieving the data source, place wine_data.csv in this same directory.
Let’s edit the ingest.py file for our specific data. First, review the original source:
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import FAISS
loader = CSVLoader("/Users/harrisonchase/Downloads/titanic.csv")
docs = loader.load()
index_creator = VectorstoreIndexCreator(vectorstore_cls=FAISS)
index = index_creator.from_documents(docs)
index.vectorstore.save_local("titanic_data")
Clearly, all we need to change are the loader path and the save_local target:
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import FAISS
loader = CSVLoader("wine_data.csv")
docs = loader.load()
index_creator = VectorstoreIndexCreator(vectorstore_cls=FAISS)
index = index_creator.from_documents(docs)
index.vectorstore.save_local("wine_data")
Run the script to convert the data. Due to the large size of the data source, it will take some time to complete:
python ingest.py
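When the script finishes, it saves the FAISS index into a wine_data directory next to the script. A quick sanity check that the index loads and returns results (run from the same directory, with OPENAI_API_KEY set):
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Load the freshly built index and run a single similarity search.
vectorstore = FAISS.load_local("wine_data", OpenAIEmbeddings())
docs = vectorstore.similarity_search("a fruity Pinot Noir from Oregon", k=2)
for doc in docs:
    print(doc.page_content[:200])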
Connect to new data
To connect the agent to the new data, we also need to modify agent.py:
wine-price-app/packages/csv-agent/csv_agent/agent.py
Yes, there is also a csv_agent directory, and its agent.py is our target:
from pathlib import Path

import pandas as pd
from langchain.agents import AgentExecutor, OpenAIFunctionsAgent
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools.retriever import create_retriever_tool
from langchain.vectorstores import FAISS
from langchain_experimental.tools import PythonAstREPLTool

MAIN_DIR = Path(__file__).parents[1]

pd.set_option("display.max_rows", 20)
pd.set_option("display.max_columns", 20)

embedding_model = OpenAIEmbeddings()
vectorstore = FAISS.load_local(MAIN_DIR / "titanic_data", embedding_model)
retriever_tool = create_retriever_tool(
    vectorstore.as_retriever(), "person_name_search", "Search for a person by name"
)

TEMPLATE = """You are working with a pandas dataframe in Python. The name of the dataframe is `df`.
It is important to understand the attributes of the dataframe before working with it. This is the result of running `df.head().to_markdown()`

<df>
{dhead}
</df>

You are not meant to use only these rows to answer questions - they are meant as a way of telling you about the shape and schema of the dataframe.
You also do not have to use only the information here to answer questions - you can run intermediate queries to do exploratory data analysis to give you more information as needed.

You have a tool called `person_name_search` through which you can lookup a person by name and find the records corresponding to people with similar name as the query.
You should only really use this if your search term contains a person's name. Otherwise, try to solve it with code.

For example:

<question>How old is Jane?</question>
<logic>Use `person_name_search` since you can use the query `Jane`</logic>

<question>Who has id 320</question>
<logic>Use `python_repl` since even though the question is about a person, you don't know their name so you can't include it.</logic>
"""  # noqa: E501


class PythonInputs(BaseModel):
    query: str = Field(description="code snippet to run")


df = pd.read_csv(MAIN_DIR / "titanic.csv")
template = TEMPLATE.format(dhead=df.head().to_markdown())

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", template),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
        ("human", "{input}"),
    ]
)

repl = PythonAstREPLTool(
    locals={"df": df},
    name="python_repl",
    description="Runs code and returns the output of the final line",
    args_schema=PythonInputs,
)
tools = [repl, retriever_tool]
agent = OpenAIFunctionsAgent(
    llm=ChatOpenAI(temperature=0, model="gpt-4"), prompt=prompt, tools=tools
)
agent_executor = AgentExecutor(
    agent=agent, tools=tools, max_iterations=5, early_stopping_method="generate"
) | (lambda x: x["output"])


# Typing for playground inputs
class AgentInputs(BaseModel):
    input: str


agent_executor = agent_executor.with_types(input_type=AgentInputs)
🚨Find all “titanic_data” and “titanic.csv”, replace with “wine_data” and “wine_data.csv”:
from pathlib import Path

import pandas as pd
from langchain.agents import AgentExecutor, OpenAIFunctionsAgent
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools.retriever import create_retriever_tool
from langchain.vectorstores import FAISS
from langchain_experimental.tools import PythonAstREPLTool

MAIN_DIR = Path(__file__).parents[1]

pd.set_option("display.max_rows", 20)
pd.set_option("display.max_columns", 20)

embedding_model = OpenAIEmbeddings()
vectorstore = FAISS.load_local(MAIN_DIR / "wine_data", embedding_model)
retriever_tool = create_retriever_tool(
    vectorstore.as_retriever(), "person_name_search", "Search for a person by name"
)

TEMPLATE = """You are working with a pandas dataframe in Python. The name of the dataframe is `df`.
It is important to understand the attributes of the dataframe before working with it. This is the result of running `df.head().to_markdown()`

<df>
{dhead}
</df>

You are not meant to use only these rows to answer questions - they are meant as a way of telling you about the shape and schema of the dataframe.
You also do not have to use only the information here to answer questions - you can run intermediate queries to do exploratory data analysis to give you more information as needed.

You have a tool called `person_name_search` through which you can lookup a person by name and find the records corresponding to people with similar name as the query.
You should only really use this if your search term contains a person's name. Otherwise, try to solve it with code.

For example:

<question>How old is Jane?</question>
<logic>Use `person_name_search` since you can use the query `Jane`</logic>

<question>Who has id 320</question>
<logic>Use `python_repl` since even though the question is about a person, you don't know their name so you can't include it.</logic>
"""  # noqa: E501


class PythonInputs(BaseModel):
    query: str = Field(description="code snippet to run")


df = pd.read_csv(MAIN_DIR / "wine_data.csv")
template = TEMPLATE.format(dhead=df.head().to_markdown())

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", template),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
        ("human", "{input}"),
    ]
)

repl = PythonAstREPLTool(
    locals={"df": df},
    name="python_repl",
    description="Runs code and returns the output of the final line",
    args_schema=PythonInputs,
)
tools = [repl, retriever_tool]
agent = OpenAIFunctionsAgent(
    llm=ChatOpenAI(temperature=0, model="gpt-4"), prompt=prompt, tools=tools
)
agent_executor = AgentExecutor(
    agent=agent, tools=tools, max_iterations=5, early_stopping_method="generate"
) | (lambda x: x["output"])


# Typing for playground inputs
class AgentInputs(BaseModel):
    input: str


agent_executor = agent_executor.with_types(input_type=AgentInputs)
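Note that the find-and-replace leaves the retriever tool named person_name_search with a person-oriented description, which made sense for Titanic passengers but not for wine. Optionally, you could rename it along the lines of this hypothetical tweak (if you do, also update every mention of person_name_search in TEMPLATE):
retriever_tool = create_retriever_tool(
    vectorstore.as_retriever(),
    "wine_search",
    "Search for a wine by name, winery, or description",
)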
(Optional) You can also modify main.py if you want to run the agent from the command line in a script style. I’ll skip that here and just note the file’s location:
wine-price-app/packages/csv-agent/main.py
Run the application with new data
🚨The application must be launched from its root directory, here:
wine-price-app/
Run in the terminal (Cmd+C or Ctrl+C to stop the server):
poetry run langchain serve
The app’s API docs are published at http://127.0.0.1:8000/docs
The app’s main entry point is the playground at http://127.0.0.1:8000/csv-agent/playground/
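With the server running, we can also query the new data programmatically through LangServe’s client instead of the playground. A minimal sketch (the question is just an example):
from langserve import RemoteRunnable

# Connect to the csv-agent route exposed by our FastAPI app.
wine_agent = RemoteRunnable("http://127.0.0.1:8000/csv-agent/")

# AgentInputs defines a single "input" field.
answer = wine_agent.invoke({"input": "What is the average price of Pinot Noir from Oregon?"})
print(answer)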
Summary
I really like the concept of the template: not because of its architecture, but because I can use it to learn LangChain’s API itself, such as embedding processing. It’s a good learning tool.