Experience with the LangChain App Template

TeeTracker
8 min read · Dec 2, 2023

Create wine price Q&A with the LangChain App Template

We will use a LangChain template to create a basic Q&A application that uses an LLM to look up price references for wine.

Install LangChain CLI

pip install -U langchain-cli
Then, in the terminal, run: langchain

OpenAI Key

The template uses OpenAI by default, so the API key must be set globally as an environment variable:

export OPENAI_API_KEY="your-api-key"
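Since a missing key only shows up later as an error from the OpenAI client, a small check can save some time. This is a minimal sketch; the helper name openai_key_is_set is my own and not part of LangChain.

```python
import os


def openai_key_is_set(environ=os.environ) -> bool:
    """Return True if OPENAI_API_KEY is present and not blank."""
    # Hypothetical helper: it only checks presence, not whether the key is valid.
    return bool(environ.get("OPENAI_API_KEY", "").strip())


if not openai_key_is_set():
    print("OPENAI_API_KEY is not set; export it before running the app.")
```

The check says nothing about whether the key is accepted by OpenAI; it only catches the case where the export was forgotten in the current shell.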

App creation

To create a new application, run in the terminal:

langchain app new wine-price-app
The CLI then asks whether to add a template.

Add template

There are several ways to add templates. When you run langchain app new, you may be prompted directly. Alternatively, you can skip the prompt and add templates later by running, for example:

langchain app add csv-agent

We use the csv-agent template to request information from a CSV file. Check the template package here.

The template uses passenger information from the Titanic incident as a source of information. After we finish building the application, we can query relevant information from this document without using our own data.

Route for FastAPI

After adding the template (which takes less than a minute), the CLI prints the following code, which should be added to server.py.

Under the hood, the template-based app is a FastAPI application.

Copy the code; we will use it later.

from csv_agent import agent_executor as csv_agent_chain

add_routes(app, csv_agent_chain, path="/csv-agent")

Poetry on Dependencies

Once the application has been created, the project directory (wine-price-app) contains a pyproject.toml file, which means Poetry is used to manage dependencies.

If you run into a Python version issue, there are two options:

1. Update the Python version, as some templates require a specific version. For example, run: conda install python=3.11 (this may or may not solve it).
2. Force the change by editing the python property in the pyproject.toml file.
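To see whether the interpreter in the current environment is the problem, the version can be compared in Python itself. This is a minimal sketch; the threshold (3, 11) is only taken from the conda command above and is an assumption, so check the python property in your own pyproject.toml for the real constraint.

```python
import sys


def meets_minimum(minimum: tuple, current: tuple = tuple(sys.version_info[:2])) -> bool:
    """Return True if the (major, minor) version is at least `minimum`."""
    # Tuples compare element by element, so (3, 12) >= (3, 11) is True.
    return tuple(current) >= tuple(minimum)


# (3, 11) is an assumed threshold for illustration, not a documented requirement.
print(f"Python {sys.version_info.major}.{sys.version_info.minor}, "
      f"at least 3.11: {meets_minimum((3, 11))}")
```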

Run the App

🚨The application must be launched from the root directory of the application, here:

wine-price-app/
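A quick way to confirm that the current directory is the application root is to look for the two files mentioned in this article, pyproject.toml and app/server.py. This is a minimal sketch; the helper name looks_like_app_root is my own, and other templates may lay out the project differently.

```python
from pathlib import Path


def looks_like_app_root(directory) -> bool:
    """Return True if `directory` contains pyproject.toml and app/server.py."""
    root = Path(directory)
    # Both files are created by `langchain app new` in the project described here.
    return (root / "pyproject.toml").is_file() and (root / "app" / "server.py").is_file()


print("Current directory looks like the app root:", looks_like_app_root(Path.cwd()))
```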

Edit server.py

Before running the application, we must edit app/server.py. As you can see, it is a typical FastAPI application:

from fastapi import FastAPI
from fastapi.responses import RedirectResponse
from langserve import add_routes

app = FastAPI()


@app.get("/")
async def redirect_root_to_docs():
    return RedirectResponse("/docs")


# Edit this to add the chain you want to add
add_routes(app, NotImplemented)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)

Change the code below the comment:

# Edit this to add the chain you want to add

from fastapi import FastAPI
from fastapi.responses import RedirectResponse
from langserve import add_routes

app = FastAPI()


@app.get("/")
async def redirect_root_to_docs():
    return RedirectResponse("/docs")

from csv_agent import agent_executor as csv_agent_chain

add_routes(app, csv_agent_chain, path="/csv-agent")

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)

Remember the code we copied previously?

Start server

Run the following in the terminal (press Ctrl+C to stop the server):

poetry run langchain serve

The app’s API documentation is served at http://127.0.0.1:8000/docs

The app’s main entry is located at http://127.0.0.1:8000/csv-agent/playground/
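Besides the playground, the route can also be called from code. The sketch below assumes that LangServe exposes an invoke endpoint under the route path and that the request body wraps the agent input as {"input": {"input": ...}}, which matches my understanding of add_routes and the AgentInputs type in this template; please verify against http://127.0.0.1:8000/docs on your own server. The function names are my own.

```python
import json
import urllib.request


def build_invoke_request(question: str, base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build a POST request for the csv-agent route (assumed path: /csv-agent/invoke)."""
    # Assumed body shape: the outer "input" is LangServe's envelope,
    # the inner "input" is the single field of the template's AgentInputs.
    payload = {"input": {"input": question}}
    return urllib.request.Request(
        url=f"{base_url}/csv-agent/invoke",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> str:
    """Send the question to the running server and return the "output" field."""
    with urllib.request.urlopen(build_invoke_request(question)) as response:
        return json.loads(response.read())["output"]
```

With the server running, ask("What is the sex of Braund, Mr. Owen Harris?") should return the agent's answer as a string, provided the assumptions above hold.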

Utilize our own data

As mentioned earlier, so far we have only used the default data provided by the template: passenger information from the Titanic. We can ask related questions, such as "What is the sex of Braund, Mr. Owen Harris?", and get the relevant answer.

To use our own data, we need to modify the files under the application's packages directory, in our case:

wine-price-app/packages/csv-agent

🚨First and foremost, we use this data source: https://github.com/XinyueZ/llm-fine-tune-wine-price/blob/master/data/wine_data.csv

The CSV file contains the following columns:

country, description, designation, points, price, province, region_1, region_2, variety, winery

I think there’s no need to explain their meaning too much.
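Before ingesting, it can be worth confirming that the downloaded file really has these columns. This is a minimal sketch using only the standard library; the function name missing_columns is my own, and the header strings in the usage note are made up for illustration.

```python
import csv
import io

# Column names as listed in this article.
EXPECTED_COLUMNS = [
    "country", "description", "designation", "points", "price",
    "province", "region_1", "region_2", "variety", "winery",
]


def missing_columns(csv_text: str) -> list:
    """Return the expected column names that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    present = {name.strip() for name in header}
    # Extra columns (for example an index column) are tolerated.
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

For the real file, pass its content, for example missing_columns(open("wine_data.csv", encoding="utf-8").read()); an empty list means every expected column is there.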

Create data based on this CSV

Before we do, let’s check the official documentation. According to it, we must reuse the ingest.py script to convert the CSV into a vectorstore.

After downloading the data source into the wine-price-app/packages/csv-agent directory, the directory contains the new file:

wine_data.csv

Edit the ingest.py file for our specific data. Let’s review the original source:

from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import FAISS

loader = CSVLoader("/Users/harrisonchase/Downloads/titanic.csv")

docs = loader.load()
index_creator = VectorstoreIndexCreator(vectorstore_cls=FAISS)

index = index_creator.from_documents(docs)

index.vectorstore.save_local("titanic_data")

Clearly, what we need to change is the path passed to the loader and the name passed to save_local:

from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import FAISS

loader = CSVLoader("wine_data.csv")

docs = loader.load()
index_creator = VectorstoreIndexCreator(vectorstore_cls=FAISS)

index = index_creator.from_documents(docs)

index.vectorstore.save_local("wine_data")

Run ingest.py to convert the data. Because the data source is large, this takes some time to complete:

python ingest.py
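To get a feeling for what is being embedded, here is a rough standard-library approximation of what CSVLoader does as far as I understand it: each CSV row becomes one document whose text is a list of "column: value" lines. The function name rows_to_documents and the dictionary layout are my own simplification, not LangChain's actual Document class.

```python
import csv
import io


def rows_to_documents(csv_text: str) -> list:
    """Turn each CSV row into a dict with 'page_content' and 'metadata' keys."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader):
        # One "column: value" line per field, in header order.
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append({"page_content": content, "metadata": {"row": row_number}})
    return documents
```

Each of these texts is then embedded and stored in the FAISS index, which is why a large CSV takes a while and produces many embedding requests.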

Connect to new data

To connect the app to our data, we also need to modify agent.py:

wine-price-app/packages/csv-agent/csv_agent/agent.py

Yes, there is also a csv_agent directory, and its agent.py is our target:

from pathlib import Path

import pandas as pd
from langchain.agents import AgentExecutor, OpenAIFunctionsAgent
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools.retriever import create_retriever_tool
from langchain.vectorstores import FAISS
from langchain_experimental.tools import PythonAstREPLTool

MAIN_DIR = Path(__file__).parents[1]

pd.set_option("display.max_rows", 20)
pd.set_option("display.max_columns", 20)

embedding_model = OpenAIEmbeddings()
vectorstore = FAISS.load_local(MAIN_DIR / "titanic_data", embedding_model)
retriever_tool = create_retriever_tool(
    vectorstore.as_retriever(), "person_name_search", "Search for a person by name"
)


TEMPLATE = """You are working with a pandas dataframe in Python. The name of the dataframe is `df`.
It is important to understand the attributes of the dataframe before working with it. This is the result of running `df.head().to_markdown()`

<df>
{dhead}
</df>

You are not meant to use only these rows to answer questions - they are meant as a way of telling you about the shape and schema of the dataframe.
You also do not have use only the information here to answer questions - you can run intermediate queries to do exporatory data analysis to give you more information as needed.

You have a tool called `person_name_search` through which you can lookup a person by name and find the records corresponding to people with similar name as the query.
You should only really use this if your search term contains a persons name. Otherwise, try to solve it with code.

For example:

<question>How old is Jane?</question>
<logic>Use `person_name_search` since you can use the query `Jane`</logic>

<question>Who has id 320</question>
<logic>Use `python_repl` since even though the question is about a person, you don't know their name so you can't include it.</logic>
""" # noqa: E501


class PythonInputs(BaseModel):
    query: str = Field(description="code snippet to run")


df = pd.read_csv(MAIN_DIR / "titanic.csv")
template = TEMPLATE.format(dhead=df.head().to_markdown())

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", template),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
        ("human", "{input}"),
    ]
)

repl = PythonAstREPLTool(
    locals={"df": df},
    name="python_repl",
    description="Runs code and returns the output of the final line",
    args_schema=PythonInputs,
)
tools = [repl, retriever_tool]
agent = OpenAIFunctionsAgent(
    llm=ChatOpenAI(temperature=0, model="gpt-4"), prompt=prompt, tools=tools
)
agent_executor = AgentExecutor(
    agent=agent, tools=tools, max_iterations=5, early_stopping_method="generate"
) | (lambda x: x["output"])

# Typing for playground inputs


class AgentInputs(BaseModel):
    input: str


agent_executor = agent_executor.with_types(input_type=AgentInputs)
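One detail worth noting in this file is MAIN_DIR = Path(__file__).parents[1]: both the vectorstore folder and the CSV are looked up relative to the package directory, not relative to where the server is started. The sketch below only illustrates the path arithmetic, using the agent.py location given above and the wine file names from the ingest step.

```python
from pathlib import PurePosixPath

# Location of agent.py as given in this article.
agent_file = PurePosixPath("wine-price-app/packages/csv-agent/csv_agent/agent.py")

# parents[0] is the csv_agent folder, parents[1] is the package root.
main_dir = agent_file.parents[1]

print(main_dir)                    # wine-price-app/packages/csv-agent
print(main_dir / "wine_data")      # wine-price-app/packages/csv-agent/wine_data
print(main_dir / "wine_data.csv")  # wine-price-app/packages/csv-agent/wine_data.csv
```

So the output of ingest.py and the CSV file both need to sit directly in wine-price-app/packages/csv-agent.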

🚨Find all “titanic_data” and “titanic.csv”, replace with “wine_data” and “wine_data.csv”:

from pathlib import Path

import pandas as pd
from langchain.agents import AgentExecutor, OpenAIFunctionsAgent
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools.retriever import create_retriever_tool
from langchain.vectorstores import FAISS
from langchain_experimental.tools import PythonAstREPLTool

MAIN_DIR = Path(__file__).parents[1]

pd.set_option("display.max_rows", 20)
pd.set_option("display.max_columns", 20)

embedding_model = OpenAIEmbeddings()
vectorstore = FAISS.load_local(MAIN_DIR / "wine_data", embedding_model)
retriever_tool = create_retriever_tool(
    vectorstore.as_retriever(), "person_name_search", "Search for a person by name"
)


TEMPLATE = """You are working with a pandas dataframe in Python. The name of the dataframe is `df`.
It is important to understand the attributes of the dataframe before working with it. This is the result of running `df.head().to_markdown()`

<df>
{dhead}
</df>

You are not meant to use only these rows to answer questions - they are meant as a way of telling you about the shape and schema of the dataframe.
You also do not have use only the information here to answer questions - you can run intermediate queries to do exporatory data analysis to give you more information as needed.

You have a tool called `person_name_search` through which you can lookup a person by name and find the records corresponding to people with similar name as the query.
You should only really use this if your search term contains a persons name. Otherwise, try to solve it with code.

For example:

<question>How old is Jane?</question>
<logic>Use `person_name_search` since you can use the query `Jane`</logic>

<question>Who has id 320</question>
<logic>Use `python_repl` since even though the question is about a person, you don't know their name so you can't include it.</logic>
""" # noqa: E501


class PythonInputs(BaseModel):
    query: str = Field(description="code snippet to run")


df = pd.read_csv(MAIN_DIR / "wine_data.csv")
template = TEMPLATE.format(dhead=df.head().to_markdown())

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", template),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
        ("human", "{input}"),
    ]
)

repl = PythonAstREPLTool(
    locals={"df": df},
    name="python_repl",
    description="Runs code and returns the output of the final line",
    args_schema=PythonInputs,
)
tools = [repl, retriever_tool]
agent = OpenAIFunctionsAgent(
    llm=ChatOpenAI(temperature=0, model="gpt-4"), prompt=prompt, tools=tools
)
agent_executor = AgentExecutor(
    agent=agent, tools=tools, max_iterations=5, early_stopping_method="generate"
) | (lambda x: x["output"])

# Typing for playground inputs


class AgentInputs(BaseModel):
    input: str


agent_executor = agent_executor.with_types(input_type=AgentInputs)
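If you prefer not to edit by hand, the same find-and-replace can be scripted. This is a minimal sketch; the function name retarget is my own, and it only swaps the two names mentioned above, leaving the rest of the file untouched.

```python
# Old name -> new name, exactly the two replacements described in this article.
REPLACEMENTS = {
    "titanic_data": "wine_data",
    "titanic.csv": "wine_data.csv",
}


def retarget(source: str) -> str:
    """Return `source` with the Titanic data names replaced by the wine ones."""
    for old, new in REPLACEMENTS.items():
        source = source.replace(old, new)
    return source
```

To apply it, read agent.py as text, pass the content through retarget, and write the result back; keeping a copy of the original file first is a sensible precaution.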

(Optional) You can also modify the package’s main.py if you want to run the agent from the command line as a script. I’m skipping that here and only give the location of the file:

wine-price-app/packages/csv-agent/main.py

Run the application with new data

🚨The application must be launched from the root directory of the application, here:

wine-price-app/

Run the following in the terminal (press Ctrl+C to stop the server):

poetry run langchain serve

The app’s API documentation is served at http://127.0.0.1:8000/docs

The app’s main entry is located at http://127.0.0.1:8000/csv-agent/playground/

Summary

I really like the concept of the template, not because of its architecture, but because I can use it to learn how to use LangChain’s API itself, such as embedding processing, and so on. It’s a good learning tool.
