LangChain: Use LLM Agent for World GPU Demand Prediction Report

13 min read · Mar 11, 2024


This article reflects the analytical methods of the AI era: using the Agent mechanism and generative AI, the necessary information and code are generated automatically.

We use the Agent mechanism to produce a market and demand analysis report, predicting future demand from historical GPU market demand and from comparisons between regions.

Don’t be scared by the topic; very few technical issues are involved here. If you’re familiar with LangChain, you just need to know that most of the work is writing prompts.

Through this case study, we can also develop a pattern that offers a new method for data analysis in the AI era.


As mentioned at the beginning, the market demand prediction report is based on historical market demand and on a comparison of demand across regions.

We define tools that let the Agent collect data and generate the code needed to organize the data and draw charts, and finally produce a report.

Although many tools are listed in the code, they fall roughly into three groups: free information on the internet (Google, Yahoo, DuckDuckGo, etc.), Arxiv and Wikipedia (including Wikidata), and Python tools that generate the code for plots and dataframes. This reflects an element of software engineering in the AI era: “Generative AI generates AI”.

Demand Prediction Analysis

‼️ Note: because the Agent operates as an uncontrollable “black box”, we describe our purpose entirely through the prompt, and all plots and data are generated by the AI. Since the generated code is not saved as a local Python file, each run of the analysis may produce completely different results. In real projects, we need to collect the AI’s charts and data at least 3–5 times before doing further analysis.

The charts shown next were obtained repeatedly over dozens of calls to the Agent, meaning they are “stable”. The final report is based entirely on a synthesis of results from three different collections.
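
To make the "collect several runs" advice concrete, here is a minimal sketch of one possible way to combine repeated runs. It is an illustration under assumptions, not the notebook's method: `run_agent_once` is a hypothetical stand-in for an Agent call that returns a year-to-value mapping, the numbers are made up, and taking the median is just one reasonable choice for damping outlier runs.

```python
from statistics import median
from typing import Callable, Dict, List


def collect_runs(run_once: Callable[[], Dict[int, float]], n_runs: int = 5) -> List[Dict[int, float]]:
    """Call the (hypothetical) agent wrapper several times and keep every result."""
    return [run_once() for _ in range(n_runs)]


def aggregate_by_median(runs: List[Dict[int, float]]) -> Dict[int, float]:
    """Combine runs year by year, using only the years that every run reported."""
    common_years = set(runs[0])
    for run in runs[1:]:
        common_years &= set(run)
    return {year: median(run[year] for run in runs) for year in sorted(common_years)}


# Stand-in for an Agent call: replays three made-up results to mimic run-to-run variation.
_fake_results = iter([
    {2021: 10.0, 2022: 12.0, 2023: 15.0},
    {2021: 11.0, 2022: 12.5, 2023: 14.0},
    {2021: 10.5, 2022: 30.0},  # an outlier run that also misses 2023
])


def run_agent_once() -> Dict[int, float]:
    return next(_fake_results)


runs = collect_runs(run_agent_once, n_runs=3)
stable = aggregate_by_median(runs)
print(stable)  # {2021: 10.5, 2022: 12.5}
```

Years that are missing from any run are dropped, which is a deliberately conservative choice: a value only counts as "stable" if every run produced it.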

Early Demand Global Market

We fetch historical GPU demand from 2010 to the end of 2023. Across several attempts, the data sources were Wikipedia, DuckDuckGo Search, and four other external links, on which we ran the most basic RAG. Each time, the Agent stops once it is ready to plot the final data and diagram.


Comparing the demand across various regions

We deliberately chose the Americas, Europe, and Asia as the basis for comparison. Surprisingly, the Agent pulled Google Markets Search and Google Finance Search into its query queue.

What confuses me is that Wikidata didn’t provide any data.

│   AgentAction(
│ │ │ │ tool='Wikidata Search',
│ │ │ │ tool_input='GPU market demand by region',
│ │ │ │ log='The Wikipedia search did not yield the specific information required for an overview of the demand for GPUs over the past decades between Americas, Europe, and Asia. I will proceed with the next tool in the priority order, which is Wikidata Search.\n\nAction:\n```json\n{\n "action": "Wikidata Search",\n "action_input": "GPU market demand by region"\n}\n```'
│ │ │ ),
│ │ │ 'No good Wikidata Search Result was found'
│ │ ),

Future Demand Global Market

What surprised me even more was that, after several prompts, the AI did not simply present a single projected curve from 2024 to 2032, but carefully broke the forecast down by industry category.

It’s important to emphasize that in this analysis prompt we deliberately included all the useful intermediate steps from the previous two analyses. In other words, this query carries the “chain of thought” of the previous queries, although to me it feels more like a “chain of reasoning”.

-- Early Demand Rate --

-- Compare between different regions: Americas, Europe and Asia --
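
The way earlier intermediate steps are folded into the new prompt can be sketched in plain Python. This is only an illustration under assumptions: the function names, the question text, and the placeholder observations are hypothetical, and the two section markers are the ones shown above.

```python
from typing import Sequence


def format_steps(steps: Sequence[str]) -> str:
    """Render a list of observations as a bulleted block."""
    return "\n".join(f"- {step}" for step in steps)


def build_future_demand_prompt(
    question: str, early_steps: Sequence[str], region_steps: Sequence[str]
) -> str:
    """Place the earlier analyses under their section markers, after the question."""
    sections = [
        question,
        "-- Early Demand Rate --",
        format_steps(early_steps),
        "-- Compare between different regions: Americas, Europe and Asia --",
        format_steps(region_steps),
    ]
    return "\n\n".join(sections)


prompt = build_future_demand_prompt(
    "Estimate the global GPU demand for the coming decade.",
    early_steps=["placeholder observation from the early-demand run"],
    region_steps=["placeholder observation from the regional comparison run"],
)
print(prompt)
```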

“Arxiv” is also quite unexpected: according to the intermediate_steps returned by the Agent, it was the only tool used to search for additional content for the “future prediction”.

'intermediate_steps': [
│ │ (
│ │ │ AgentAction(
│ │ │ │ tool='arxiv',
│ │ │ │ tool_input='global GPU market demand forecast 2023-2033',
│ │ │ │ log='Here is my attempt at estimating global GPU demand over the next decade:\n\nThought: To estimate global GPU demand through 2033, I\'ll need to analyze key market trends and demand drivers across gaming, AI, data centers, and other segments. Searching arxiv and other sources for data and insights.\n\nAction:\n```\n{\n "action": "arxiv",\n "action_input": "global GPU market demand forecast 2023-2033"\n}\n```\n\n'
│ │ │ ),
│ │ │ 'Published: 2018-04-29\nTitle: Application of Probabilistic Graphical Models in Forecasting Crude Oil Price\nAuthors: Danish A. Alvi\nSummary: The dissertation investigates the application of Probabilistic Graphical\nModels (PGMs) in forecasting the price of Crude Oil.\n This research is important because crude oil plays a very pivotal role in the\nglobal economy hence is a very critical macroeconomic indicator of the\nindustrial growth. Given the vast amount of macroeconomic factors affecting the\nprice of crude oil such as supply of oil from OPEC countries, demand of oil\nfrom OECD countries, geopolitical and geoeconomic changes among many other\nvariables - probabilistic graphical models (PGMs) allow us to understand by\nlearning the graphical structure. This dissertation proposes condensing data\nnumerous Crude Oil factors into a graphical model in the attempt of creating a\naccurate forecast of the price of crude oil.\n The research project experiments with using different libraries in Python in\norder to construct models of the crude oil market. The experiments in this\nthesis investigate three main challenges commonly presented while trading oil\nin the financial markets. The first challenge it investigates is the process of\nlearning the structure of the oil markets; thus allowing crude oil traders to\nunderstand the different physical market factors and macroeconomic indicators\naffecting crude oil markets and how they are \\textit{causally} related. 
The\nsecond challenge it solves is the exploration and exploitation of the available\ndata and the learnt structure in predicting the behaviour of the oil markets.\nThe third challenge it investigates is how to validate the performance and\nreliability of the constructed model in order for it to be deployed in the\nfinancial markets.\n A design and implementation of a probabilistic framework for forecasting the\nprice of crude oil is also presented as part of the research.\n\nPublished: 2018-05-02\nTitle: Utilizing Device-level Demand Forecasting for Flexibility Markets - Full Version\nAuthors: Bijay Neupane, Torben Bach Pedersen, Bo Thiesson\nSummary: The uncertainty in the power supply due to fluctuating Renewable Energy\nSources (RES) has severe (financial and other) implications for energy market\nplayers. In this paper, we present a device-level Demand Response (DR) scheme\nthat captures the atomic (all available) flexibilities in energy demand and\nprovides the largest possible solution space to generate demand/supply\nschedules that minimize market imbalances. We evaluate the effectiveness and\nfeasibility of widely used forecasting models for device-level flexibility\nanalysis. In a typical device-level flexibility forecast, a market player is\nmore concerned with the \\textit{utility} that the demand flexibility brings to\nthe market, rather than the intrinsic forecast accuracy. In this regard, we\nprovide comprehensive predictive modeling and scheduling of demand flexibility\nfrom household appliances to demonstrate the (financial and otherwise)\nviability of introducing flexibility-based DR in the Danish/Nordic market.\nFurther, we investigate the correlation between the potential utility and the\naccuracy of the demand forecast model. Furthermore, we perform a number of\nexperiments to determine the data granularity that provides the best financial\nreward to market players for adopting the proposed DR scheme. 
A cost-benefit\nanalysis of forecast results shows that even with somewhat low forecast\naccuracy, market players can achieve regulation cost savings of 54% of the\ntheoretically optimal.\n\nPublished: 2023-09-27\nTitle: A City-centric Approach to Estimate and Evaluate Global Urban Air Mobility Demand\nAuthors: Lukas Asmer, Roman Jaksche, Henry Pak, Petra Kokus\nSummary: Urban Air Mobility (UAM) is expected to effectively complement the existing\ntransportation system by providing fast and safe travel options, contributing\nto decarbonization, and providing benefits to citizens and communities. A\npreliminary es'
│ │ ),
│ │ (
│ │ │ AgentAction(
│ │ │ │ tool='python-tool',
│ │ │ │ tool_input="import pandas as pd\nimport matplotlib.pyplot as plt\n\n# Estimated global GPU demand in millions of units\ndata = {\n 'Year': [2024, 2025, 2026, 2027, 2028, 2029, 2030, 2031, 2032, 2033],\n 'Gaming': [60, 66, 72, 79, 86, 94, 103, 112, 122, 133], \n 'Data Center': [25, 30, 36, 44, 52, 62, 75, 90, 108, 129],\n 'Professional Visualization': [6, 7, 8, 9, 10, 11, 12, 14, 16, 18],\n 'Automotive': [10, 12, 14, 17, 20, 24, 29, 35, 42, 50],\n 'Other': [5, 5, 6, 6, 7, 7, 8, 9, 10, 11]\n}\n\ndf = pd.DataFrame(data)\ndf['Total'] = df.sum(axis=1)\n\nprint(df)\n\nax = df.plot(x='Year', y=['Gaming', 'Data Center', 'Professional Visualization', 'Automotive', 'Other'], \n kind='area', stacked=True, alpha=0.5, title='Estimated Global GPU Demand')\nax.set_ylabel('GPU Demand (Millions)')\nax.set_xlabel('Year')\\n",
│ │ │ │ log='Here is my analysis of the global GPU market and demand forecast for 2024-2033:\n\nAction:\n```\n{\n "action": "python-tool",\n "action_input": "import pandas as pd\nimport matplotlib.pyplot as plt\n\n# Estimated global GPU demand in millions of units\ndata = {\n \'Year\': [2024, 2025, 2026, 2027, 2028, 2029, 2030, 2031, 2032, 2033],\n \'Gaming\': [60, 66, 72, 79, 86, 94, 103, 112, 122, 133], \n \'Data Center\': [25, 30, 36, 44, 52, 62, 75, 90, 108, 129],\n \'Professional Visualization\': [6, 7, 8, 9, 10, 11, 12, 14, 16, 18],\n \'Automotive\': [10, 12, 14, 17, 20, 24, 29, 35, 42, 50],\n \'Other\': [5, 5, 6, 6, 7, 7, 8, 9, 10, 11]\n}\n\ndf = pd.DataFrame(data)\ndf[\'Total\'] = df.sum(axis=1)\n\nprint(df)\n\nax = df.plot(x=\'Year\', y=[\'Gaming\', \'Data Center\', \'Professional Visualization\', \'Automotive\', \'Other\'], \n kind=\'area\', stacked=True, alpha=0.5, title=\'Estimated Global GPU Demand\')\nax.set_ylabel(\'GPU Demand (Millions)\')\nax.set_xlabel(\'Year\')\\n```\n\n'
│ │ │ ),
│ │ │ ' Year Gaming Data Center Professional Visualization Automotive Other \n0 2024 60 25 6 10 5 \\\n1 2025 66 30 7 12 5 \n2 2026 72 36 8 14 6 \n3 2027 79 44 9 17 6 \n4 2028 86 52 10 20 7 \n5 2029 94 62 11 24 7 \n6 2030 103 75 12 29 8 \n7 2031 112 90 14 35 9 \n8 2032 122 108 16 42 10 \n9 2033 133 129 18 50 11 \n\n Total \n0 2130 \n1 2145 \n2 2162 \n3 2182 \n4 2203 \n5 2227 \n6 2257 \n7 2291 \n8 2330 \n9 2374 \n'
│ │ )
│ ]
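
One detail in the output above is worth checking: the generated code computes `df['Total'] = df.sum(axis=1)`, which also adds the `Year` column, so the 2024 total of 2130 is 2024 + 60 + 25 + 6 + 10 + 5 rather than a unit count. The plain-Python check below reproduces that arithmetic from the first row of the Agent's table. In pandas, one possible fix would be to sum only the segment columns, for example `df[segments].sum(axis=1)`, where `segments` is a hypothetical list of the five segment names.

```python
# First row of the Agent's table, copied from the output above
row_2024 = {
    "Year": 2024,
    "Gaming": 60,
    "Data Center": 25,
    "Professional Visualization": 6,
    "Automotive": 10,
    "Other": 5,
}

# Every column except the year is a demand segment
segments = [name for name in row_2024 if name != "Year"]

total_with_year = sum(row_2024.values())                         # what the generated code computed
total_segments_only = sum(row_2024[name] for name in segments)   # the intended total

print(total_with_year)      # 2130
print(total_segments_only)  # 106
```

This is a good example of why AI-generated analysis code needs a review pass before its numbers go into a report.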

💡 I wonder: could we add the intermediate steps of “Early Demand Global Market” to the prompt context for the section “Comparing the demand across various regions”?

If you are interested, give it a try 👍

Reusing earlier queries and responses is also common in RAG. It is a kind of prior-reasoning concept: you gradually narrow the scope of the context to make the results more accurate. Related work can be found under In-Context Learning, prior reasoning, or decomposition (the multi-query pattern and retrieval).

Additionally, LlamaIndex has a class that expresses a similar idea:

Final report paper

The prompt that generates the report uses all the intermediate steps from the previous three steps, that is, the whole “chain of thought” (or “chain of reasoning”) built up so far.


  • LangChain’s AgentExecutor provides a switch called return_intermediate_steps, which we need to turn on for this case.
  • After each Agent invocation finishes, we collect the intermediate steps from that execution. Some cleanup is needed, such as filtering out useless information.

# Drop empty, invalid, "No good ... Result" and error entries
all_intermediate_steps = list(
    filter(
        lambda x: x != ""
        and x != "Invalid or incomplete response"
        and "No good" not in x
        and "ValueError" not in x,
        all_intermediate_steps,
    )
)

This way, the collected intermediate steps contain no empty strings, invalid responses, or error messages.
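
As the dumps earlier in the article show, each intermediate step is an (action, observation) pair, while the filter above works on strings. A minimal sketch of how the observations might be pulled out and filtered in one pass follows. It makes assumptions: `FakeAction` is a hypothetical stand-in for LangChain's AgentAction, and the sample steps are placeholders modelled on the logs above.

```python
from collections import namedtuple
from typing import Any, List, Sequence, Tuple

# Hypothetical stand-in for LangChain's AgentAction (only the fields visible in the logs).
FakeAction = namedtuple("FakeAction", ["tool", "tool_input", "log"])

EXACT_NOISE = ("", "Invalid or incomplete response")
NOISE_MARKERS = ("No good", "ValueError")


def useful_observations(steps: Sequence[Tuple[Any, Any]]) -> List[str]:
    """Keep the observation text of each (action, observation) pair unless it is noise."""
    kept = []
    for _action, observation in steps:
        text = str(observation)
        if text in EXACT_NOISE:
            continue
        if any(marker in text for marker in NOISE_MARKERS):
            continue
        kept.append(text)
    return kept


steps = [
    (FakeAction("Wikidata Search", "GPU market demand by region", ""), "No good Wikidata Search Result was found"),
    (FakeAction("Wikipedia Search", "GPU", ""), ""),
    (FakeAction("DuckDuckGo Search", "GPU demand", ""), "placeholder observation text"),
]
all_intermediate_steps = useful_observations(steps)
print(all_intermediate_steps)  # ['placeholder observation text']
```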

The final report is saved as markdown. Interestingly, I had to split the generation of the final result into two steps.

The first step produces the markdown output and the second persists it. I tried to merge both into one prompt, but in practice the Agent, being a “black box”, could never finish generating the result and then go on to save the file in a single run. Therefore, I split it into two parts.
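
As an alternative to a second prompt, the persistence step could also be done in ordinary Python once the markdown text is available. The sketch below shows that option under assumptions: it is not what the notebook does according to the text, and `save_report`, the file path, and the report body are all hypothetical.

```python
import tempfile
from pathlib import Path


def save_report(markdown_text: str, path: Path) -> Path:
    """Write the generated markdown to disk, creating parent folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown_text, encoding="utf-8")
    return path


# Placeholder standing in for the markdown returned by the report-generation step
report_markdown = "# GPU Demand Report\n\nPlaceholder body standing in for the Agent's markdown output.\n"

output_path = save_report(
    report_markdown,
    Path(tempfile.gettempdir()) / "gpu_report_demo" / "report.md",
)
print(output_path.read_text(encoding="utf-8"))
```

Keeping the file write outside the Agent makes that step deterministic, at the cost of the Agent no longer choosing the file name or location itself.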

The following sections provide a technical overview to help in understanding the entire workflow. Here is the link to the notebook:

Model Selection

  • Agent Model: GPT 4 (vision) or Claude-3-opus

The Agent model’s main job is calling the various tools. This case study involves many tools and a large amount of data, so the latest claude-3 or gpt4 is quite suitable.

I don’t have other options here; so far these are the only ones with good token throughput. Mistral might also work, but I haven’t tried it, so I won’t write about it. Also, make sure to estimate the expenses properly.
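
A rough expense estimate can be as simple as multiplying token counts by per-token rates. The sketch below is a back-of-the-envelope helper under loud assumptions: the token counts, the number of calls, and especially the rates are placeholders, not real pricing for any vendor, so substitute the current numbers from your provider.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    usd_per_1k_prompt: float,
    usd_per_1k_completion: float,
) -> float:
    """Cost of one call, given token counts and per-1k-token rates."""
    return (
        prompt_tokens / 1000 * usd_per_1k_prompt
        + completion_tokens / 1000 * usd_per_1k_completion
    )


# Placeholder numbers: NOT real pricing, replace with your vendor's rates
calls = 30
cost_per_call = estimate_cost(
    prompt_tokens=8000,
    completion_tokens=2000,
    usd_per_1k_prompt=0.01,
    usd_per_1k_completion=0.03,
)
total_cost = cost_per_call * calls
print(round(total_cost, 2))  # 4.2
```

Since a stable result may take dozens of Agent calls, the number of calls matters as much as the per-call cost.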

These stronger models are used for the Agent and for generating the final report.

from langchain_anthropic import ChatAnthropic

agent_model = ChatAnthropic(model="claude-3-opus-20240229", temperature=0.0)

# Alternative: GPT-4 (vision) from OpenAI
from langchain_openai import ChatOpenAI

agent_model = ChatOpenAI(
    model="gpt-4-vision-preview", temperature=0, max_tokens=1024 * 2
)

Because the final report is generated from all the previous intermediate steps, choose a better model and a decent temperature for it.

  • Retrieval Model:

The case study includes a RAG step. To extract information from a set of URLs faster and more cheaply, I went with Groq.

from langchain_groq import ChatGroq

retrieval_model = ChatGroq(model="llama2-70b-4096", temperature=0)

# Alternative: Mixtral on Groq
retrieval_model = ChatGroq(model="mixtral-8x7b-32768", temperature=0)

  • Local Model:

Some really small tasks, like “translation”, only need a local model, so I used Ollama to host Google’s latest Gemma.

from langchain_community.llms.ollama import Ollama

local_model = Ollama(
    # constructor arguments are cut off here; the hosted model is Gemma (see the notebook)
)


With more and more large models being released and a growing need for them in (AI) software development, it is essential to consider the origins, purposes, and costs of these models carefully. One should no longer simply default to OpenAI’s or any particular vendor’s model, but choose the model based on specific requirements.

Tool Definition

Most of the tools used here are predefined ones, but I introduce two tools that I deliberately defined myself. One is a language translation tool: as mentioned earlier, I deployed Gemma locally, which is low-cost and reliable.

class TranslationTool(BaseModel):
    string: str = Field(description="The origin text before translation.")
    language: str = Field(description="The language to translate to.")


# The message list is reconstructed: the human message "{string}" is assumed
# from the keys passed to invoke() below.
translate_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You're a translator to translate string to {language}.
ONLY return the result.
NO instruction text.
NO other information.
Keep the result as simple as possible.""",
        ),
        ("human", "{string}"),
    ]
)


@tool("translation-tool", args_schema=TranslationTool, return_direct=True)
def translation_tool(string: str, language: str) -> str:
    """Translate a string to the given language, only return the result."""
    return (translate_template | local_model | StrOutputParser()).invoke(
        {
            "string": string,
            "language": language,
        }
    )

The other is a very basic RAG tool: it reads a series of URLs and extracts information from them. The vector store is FAISS, and the model is Groq’s llama2-70b-4096 or mixtral-8x7b-32768.

# The [query]/[context] delimiter lines are reconstructed; the placeholder names
# match the keys supplied by the chain below.
load_urls_prompt_template = PromptTemplate.from_template(
    """Respond to the query inside [query] based on the context inside [context]:

[query]{query}[query]

[context]{context}[context]

Only return the answer without any instruction text or additional information.
Keep the result as simple as possible."""
)


@tool("load-urls-tool", args_schema=LoadUrlsTool, return_direct=False)
def load_urls_tool(query: str, urls: List[str]) -> str:
    """Load the content of the given Urls and return the query result."""
    # Fetch the pages and split them into token-sized chunks
    loader = WebBaseLoader(urls)
    docs = loader.load()
    splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=1000, chunk_overlap=0
    )
    chunks = splitter.split_documents(docs)
    # Index the chunks in FAISS and expose them as a retriever
    db = FAISS.from_documents(chunks, OpenAIEmbeddings())
    retriever = db.as_retriever()

    # Retrieve context for the query, fill the prompt, and ask the retrieval model
    chain = (
        {"context": retriever, "query": RunnablePassthrough()}
        | load_urls_prompt_template
        | retrieval_model
        | StrOutputParser()
    )
    res = chain.invoke(query)
    return res

Tool list

Apart from the translation tool and the load-urls-tool defined above, the class dependencies and usage tests for all the tools in the following Python list can be found directly at the following link.

Tools | 🦜️🔗 Langchain

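# Entries are shown in outline: the Tool(...) wrappers are inferred from the keyword
# arguments, and arguments marked "not shown" must be taken from the notebook.
tools = [
    Tool(
        name="Google Markets Search",
        # func=... is not shown here; it was configured with these search params:
        # {"engine": "google_finance_markets", "google_domain": "", "gl": "us", "hl": "en", "trend": "indexes"}
        description="Use google markets search to find the information from the web, almost up-to-date.",
    ),
    Tool(
        name="Google Finance Search",
        # func=... is not shown here; it was configured with these search params:
        # {"engine": "google_finance", "google_domain": "", "gl": "us", "hl": "en"}
        description="Use google finance search to find the information from the web, almost up-to-date.",
    ),
    Tool(
        name="Google Search",
        # func=... is not shown here
        description="Search Google for recent results.",
    ),
    Tool(
        name="Yahoo Finance News",
        # func=... is not shown here
        description="Use to search for the news from Yahoo Finance.",
    ),
    Tool(
        name="DuckDuckGo Search",
        # func=... is not shown here
        description="Use to search for the information from DuckDuckGo.",
    ),
    Tool(
        name="Wikipedia Search",
        # func=... is not shown here
        description="Use to search for the information from Wikipedia.",
    ),
    Tool(
        name="Wikidata Search",
        # func=... is not shown here
        description="Use to search for the information from Wikidata.",
    ),
    Tool(
        # name=... is not shown here
        func=PythonREPLTool(name="python-tool", return_direct=True).run,
        description="General Python REPL tool according to demand.",
    ),
]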


Base Prompt

You may have noticed that, by default, the Agent is quite a “black box”: we can’t control which tool will be used or directly assign priorities. So I deliberately defined a base prompt that guides the model to call tools in a certain order as far as possible, dividing them into 2 batches.

base_prompt = """When there is external link(s) to open, either you like or not, avoid ignoring useful information, 
use the tool 'load-urls-tool' to load URLs carrying with the user input.

For searching, use the tools by the following priority order:

- Wikipedia Search
- Wikidata Search
- DuckDuckGo Search
- Google Search

When the above tools are not available, use the following tools:

- Google Markets Search
- Google Finance Search

'arxiv', the tool is for the academic papers to solidify the estimation if necessary."""

First batch of agent tools

I arranged the tools this way for cost reasons, and because these sources fit the scope of this business case. Since we need to query historical data (GPU market demand 2010–2023), Wikipedia and Wikidata are the low-cost first choices. If we expand to the entire web, I would start with the zero-cost DuckDuckGo and then move to the low-cost Google Programmable Search Engine.

Second batch

As the base prompt states, when the first batch of tools is ineffective, the second batch is considered. For this I recommend:

Google Finance Markets API — SerpApi

For trial users, the request limit is 100 per month.
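
The two-batch priority that the base prompt asks for can be written out as deterministic logic, which helps to see what the Agent is being nudged towards. This is only a sketch of the ordering idea under assumptions, not how the Agent actually picks tools: the stub functions and their return strings are hypothetical, and `is_useful` is a made-up check modelled on the "No good ... Result" messages seen earlier.

```python
from typing import Callable, List, Optional, Tuple

SearchTool = Tuple[str, Callable[[str], str]]


def is_useful(result: str) -> bool:
    """Treat empty results and 'No good ...' messages as failures."""
    return bool(result.strip()) and "No good" not in result


def search_with_priority(query: str, batches: List[List[SearchTool]]) -> Optional[Tuple[str, str]]:
    """Try tools batch by batch, in order, and return the first useful (tool name, result)."""
    for batch in batches:
        for name, run in batch:
            result = run(query)
            if is_useful(result):
                return name, result
    return None


# Hypothetical stubs standing in for the real search tools
first_batch: List[SearchTool] = [
    ("Wikipedia Search", lambda q: ""),
    ("Wikidata Search", lambda q: "No good Wikidata Search Result was found"),
    ("DuckDuckGo Search", lambda q: "stub result from DuckDuckGo"),
    ("Google Search", lambda q: "stub result from Google"),
]
second_batch: List[SearchTool] = [
    ("Google Markets Search", lambda q: "stub result from Google Markets"),
    ("Google Finance Search", lambda q: "stub result from Google Finance"),
]

hit = search_with_priority("GPU market demand by region", [first_batch, second_batch])
print(hit)  # ('DuckDuckGo Search', 'stub result from DuckDuckGo')
```

With a prompt, this ordering is only a strong hint; the Agent may still deviate from it, as the regional comparison above showed.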


For more details, refer to the notebook. I am not focusing on statistical data analysis or in-depth data mining techniques here. My aim is to establish a new approach to use before statistical analysis starts: use AI for preprocessing and initial analysis, apply it to client data for tasks such as checking skewness, data correlation, and other EDA tasks, and use the python-tool to generate relevant diagrams and get an initial overview of the available data.
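
To give an idea of what such an initial check looks like, here is a small plain-Python sketch of skewness and Pearson correlation. It is an illustration under assumptions: the helper names are hypothetical, the skewness is the simple population version, and the two series are the AI-estimated "Gaming" and "Data Center" columns from the forecast above, not measured data.

```python
from math import sqrt
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def skewness(values: Sequence[float]) -> float:
    """Population skewness: third central moment over the cubed population std."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# AI-estimated series taken from the Agent's forecast table (millions of units)
gaming = [60, 66, 72, 79, 86, 94, 103, 112, 122, 133]
data_center = [25, 30, 36, 44, 52, 62, 75, 90, 108, 129]

print("skewness (data center):", skewness(data_center))
print("correlation (gaming vs data center):", pearson(gaming, data_center))
```

In the workflow described here, the python-tool would generate comparable code on demand; a hand-written version like this is mainly useful for cross-checking what the Agent produced.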