LLM Training: In-Context Learning
The simplest way to "train" a model is just to have a chat with it.
At first I didn't think of this as "training", but after using ChatGPT and Bard many times, I found that as long as you give the model enough context and keep the chat history within a manageable range, it can make some simple "predictions" and "inferences".
Note the quotes: this kind of "reasoning" is heavily overfit. If something isn't mentioned in the context, the model often "talks nonsense" in a perfectly serious manner, which is known as "hallucination".
Use case
This text and its codelab are based on the wine price dataset: https://github.com/XinyueZ/llm-fine-tune-wine-price/blob/master/data/wine_data.csv
The CSV file contains the following columns:
country, description, designation, points, price, province, region_1, region_2, variety, winery
Their meanings are mostly self-explanatory, so I won't go through them one by one.
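For readers who want to poke at the data first, here is a minimal sketch of loading the CSV with Python's standard library. The sample row below is made up for illustration; only the column names come from the dataset.

```python
import csv
import io

# A tiny in-memory sample standing in for data/wine_data.csv.
# The columns match the dataset; the values are invented for illustration.
SAMPLE = """country,description,designation,points,price,province,region_1,region_2,variety,winery
US,"Ripe and jammy, with notes of plum.",Reserve,88,35.0,California,Napa Valley,Napa,Cabernet Sauvignon,Example Winery
"""

def load_wines(csv_text: str) -> list[dict]:
    """Parse the wine CSV into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

wines = load_wines(SAMPLE)
print(wines[0]["variety"], wines[0]["price"])
```

To read the real file, replace `io.StringIO(csv_text)` with an open file handle on `wine_data.csv`.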
The goal: have the model give the price of a wine.
As mentioned earlier, this is not a real prediction. Rather, we give the model enough context, similar to KNN: the model carries a large amount of data and answers based on "what it already knows", which is relatively overfit.
We try two kinds of inference:
- Use wine data the model has already processed. With a relatively low `temperature`, the results should stay stable, since the model has limited creativity.
- Conversely, give the model information about a wine it has never seen, but set `temperature` to a relatively large value. The result will be quite "unexpected".
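The two settings can be summarized as a pair of hypothetical configurations. The actual model call is out of scope here (it depends on which SDK you use); only the parameter we vary is shown.

```python
# Two hypothetical inference settings; the chat-completion call itself
# is omitted, since it depends on the SDK (ChatGPT, Bard, etc.).
seen_wine = {
    "context": "a wine already shown to the model in the examples",
    "temperature": 0.1,  # low creativity -> stable, repeatable prices
}
unseen_wine = {
    "context": "a wine the model has never seen",
    "temperature": 0.9,  # high creativity -> 'unexpected' prices
}
print(seen_wine["temperature"] < unseen_wine["temperature"])
```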
Train model
First off, we need to tell the model:
As an assistant, you have to give the price of the wine based on the provided wine information.
This is similar to a “system” message.
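As a sketch, the instruction can be packaged in the common role/content chat-message convention. The exact message format is an assumption here, not tied to any one SDK.

```python
# System-style instruction in the common role/content chat convention
# (an assumption about the message format, not a specific SDK's API).
SYSTEM_MESSAGE = {
    "role": "system",
    "content": (
        "As an assistant, you have to give the price of the wine "
        "based on the provided wine information."
    ),
}
print(SYSTEM_MESSAGE["role"])
```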
Second, there are roughly three types of so-called in-context learning we can use to "train" the model:
- zero-shot
- one-shot
- few-shot
zero-shot
Give the prompt and ask the model directly. The model usually responds incoherently, with nothing to go on except its pre-trained state.
For example, I directly provide a red wine's place of production and description and hope the model can give the price; the result is very "unreliable".
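A zero-shot prompt can be built as a plain string, with no examples at all. The field labels ("Wine description", "Origin") are my own template, not anything the dataset or model mandates.

```python
def zero_shot_prompt(description: str, origin: str) -> str:
    """Zero-shot: just the wine's info and the question, no examples.
    The field labels are an illustrative template of my own."""
    return (
        f"Wine description: {description}\n"
        f"Origin: {origin}\n"
        "How much does this wine cost?"
    )

print(zero_shot_prompt("Ripe and jammy, with notes of plum.", "Napa Valley, US"))
```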
one-shot
We give the model an “example” so it knows what we’re trying to do.
We provide a description of a wine and its origin, then ask how much the wine costs, and finally give the model the price. After that, we append the information for the wine we actually want to ask about (its description, origin, etc.) and ask the model for its price.
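The step above can be sketched as one worked example prepended to the query. Again, the template text and the example wine are illustrative assumptions.

```python
# One worked example (values invented for illustration).
EXAMPLE = (
    "Wine description: Ripe and jammy, with notes of plum.\n"
    "Origin: Napa Valley, US\n"
    "How much does this wine cost?\n"
    "Price: $35\n"
)

def one_shot_prompt(description: str, origin: str) -> str:
    """One-shot: a single worked example, then the wine we actually
    want priced, with the price left for the model to fill in."""
    return (
        EXAMPLE
        + "\n"
        + f"Wine description: {description}\n"
        + f"Origin: {origin}\n"
        + "How much does this wine cost?\n"
        + "Price:"
    )

print(one_shot_prompt("Crisp, citrusy, with a mineral finish.", "Marlborough, NZ"))
```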
few-shot
Continuing from one-shot, we provide multiple examples so the model has more information to work with. As mentioned several times already, the model will overfit, so it can very "stably" answer about wines it has seen before and give the appropriate price.
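Few-shot is the same template generalized to a list of worked examples. The prompt layout here is my own illustration of the idea, not a prescribed format.

```python
def few_shot_prompt(
    examples: list[tuple[str, str, str]],
    description: str,
    origin: str,
) -> str:
    """Few-shot: several worked (description, origin, price) examples,
    then the query wine with its price left blank for the model."""
    parts = []
    for ex_desc, ex_origin, ex_price in examples:
        parts.append(
            f"Wine description: {ex_desc}\n"
            f"Origin: {ex_origin}\n"
            f"Price: {ex_price}\n"
        )
    parts.append(
        f"Wine description: {description}\n"
        f"Origin: {origin}\n"
        "Price:"
    )
    return "\n".join(parts)

prompt = few_shot_prompt(
    [("Ripe and jammy.", "Napa Valley, US", "$35"),
     ("Crisp and citrusy.", "Marlborough, NZ", "$18")],
    "Earthy, with firm tannins.", "Tuscany, IT",
)
print(prompt)
```

The more examples from the dataset you include, the more "stable" the answers for already-seen wines tend to be, which is exactly the overfitting behavior described above.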
Summary
In-context learning is not a traditional machine-learning concept; it is more a way for the general public to use an implementation like ChatGPT. It can be understood as:
Carrying a bunch of chat history and fixed template responses, then expecting the model to give a templated answer to the final question.
This concept is very similar to KNN, where the model itself carries the “data”.
Before fine-tuning a model, you can use this method to warm up and get a feel for it; maybe your project doesn't need fine-tuning at all.
Colab
The Colab contains experiments for one-shot and few-shot.
Thanks for reading