Content-Based Filtering (Recommendation)
In brief, content-based recommendation (CBR) tries to figure out which aspects of an item (e.g. a movie, a song, a book) a user likes most. The algorithm can then recommend other items that have those aspects.
Advantages
Learns the user’s preferences (the user’s profile).
Highly personalized for the user.
Disadvantages
Doesn’t take into account what others think of the item, so low-quality recommendations might happen.
Extracting data is not always intuitive.
Determining which characteristics (properties, features) of the item the user likes or dislikes is not always obvious.
Doesn’t work well if the characteristics (properties, features) are not included in the user’s profile.
Input
To produce recommendations, the algorithm first builds a user profile from the user’s content input values.
The content values are numerical descriptions of what the user favors, e.g. a like or a rating.
In machine learning terms, this profile is the weight vector (also called the parameter matrix or weight matrix).
In the real world, the items are products with different properties, e.g. a movie has genres such as [Adventure, Animation, Children, Comedy, Fantasy, …].
We call those properties features. To lay out all sample products clearly, we can build a feature matrix.
We use the One Hot Encoding technique to convert the list of features to a vector where each column corresponds to one possible value of the feature.
This encoding is needed to feed categorical data to the algorithm. In this case, every genre gets its own column containing either 1 or 0: 1 means the movie has that genre and 0 means it doesn’t.
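As a minimal sketch of that encoding, the snippet below turns genre lists into rows of a feature matrix. The genre vocabulary, the movie titles, and the helper name one_hot are made up for illustration. Note that when a movie has several genres, its row contains several 1s (strictly speaking a multi-hot vector).

```python
# Hypothetical genre vocabulary and movies, chosen only for illustration.
GENRES = ["Adventure", "Animation", "Children", "Comedy", "Fantasy"]

movies = {
    "Movie A": ["Adventure", "Animation", "Children"],
    "Movie B": ["Comedy"],
    "Movie C": ["Adventure", "Fantasy"],
}


def one_hot(genres, vocabulary):
    """Return a 0/1 list with one entry per genre in the vocabulary."""
    present = set(genres)
    return [1 if genre in present else 0 for genre in vocabulary]


# Each row is one movie, each column is one genre.
feature_matrix = {title: one_hot(g, GENRES) for title, g in movies.items()}

for title, row in feature_matrix.items():
    print(title, row)
```

Keeping the vocabulary in a fixed order matters: every row must use the same column for the same genre, otherwise the weights computed later would not line up with the features.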
Output
With the input profile (weights) and the complete list of sample products and their features in hand, including the items the user gave as input,
we take the weighted sum (or average) of each item’s features under the profile and recommend the top twenty items, i.e. the ones with the highest scores.
Build Model
Input
# One user content value vector (one value per product below)
U := [2, 10, 8]
# A bundle of product features (feature matrix), one row per product
X := [[0, 1, 1, 0],
[1, 1, 1, 1],
[1, 0, 1, 0],
]
S.t Build user profile, the weights
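# Take the dot product of U and X to get the content weights.
W := U dot X
# U dot X = [18, 12, 20, 10]; dividing by the sum (60) normalizes it into 0~1
W := norm W into 0~1
# Values truncated to two decimals
W := [0.3, 0.2, 0.33, 0.16]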
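The profile step can be sketched in plain Python as follows. The function name build_profile is hypothetical, and the normalization is assumed to be "divide by the sum of the raw weights", which is the choice that reproduces the numbers above.

```python
def build_profile(ratings, features):
    """Weight each product's feature row by its rating, sum per feature,
    then normalize so the weights add up to 1 (assumed normalization)."""
    n_features = len(features[0])
    raw = [
        sum(rating * row[j] for rating, row in zip(ratings, features))
        for j in range(n_features)
    ]
    total = sum(raw)
    if total == 0:
        # No signal at all: return an all-zero profile instead of dividing by 0.
        return [0.0] * n_features
    return [value / total for value in raw]


U = [2, 10, 8]
X = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
]

W = build_profile(U, X)
print([round(w, 2) for w in W])  # prints [0.3, 0.2, 0.33, 0.17]
```

The last weight is 10/60 ≈ 0.1667, so rounding gives 0.17 while truncating gives the 0.16 used in the walkthrough. The ranking is the same either way.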
Output
# A bundle of product features (feature matrix), including the input ones
X_hat := [
[1, 1, 0, 1],
[0, 0, 1, 0],
[1, 0, 1, 0],
[0, 1, 1, 0],
[1, 1, 1, 1],
[1, 0, 1, 0],
]
# Multiply the item features by the weights and then take the weighted summation or average
scores := W dot transpose(X_hat)
scores := [0.66, 0.33, 0.63, 0.53, 0.99, 0.63]
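A minimal sketch of the scoring step, using the truncated weights from the walkthrough so the numbers match. The function name score_items is hypothetical.

```python
def score_items(weights, items):
    """Score each item as the dot product of its feature row and the weights."""
    return [sum(w * x for w, x in zip(weights, row)) for row in items]


W = [0.3, 0.2, 0.33, 0.16]
X_hat = [
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
]

scores = score_items(W, X_hat)
print([round(s, 2) for s in scores])  # prints [0.66, 0.33, 0.63, 0.53, 0.99, 0.63]
```

Because the features are 0 or 1, an item's score is simply the sum of the weights of the features it has, so an item with every feature gets the maximum possible score.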
Result
scores := sort scores by DESC
scores := [0.99, 0.66, 0.63, 0.63, 0.53, 0.33]
# The product with the features [1, 1, 1, 1] wins the final recommendation.
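Sorting the bare scores loses track of which product each score belongs to. One way to keep that link, sketched below with a hypothetical helper top_k, is to sort (index, score) pairs and cut the list at k (twenty in the description above, three here for brevity).

```python
def top_k(scores, k):
    """Return the k best (item_index, score) pairs, highest score first.
    Python's sort is stable, so tied items keep their original order."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


scores = [0.66, 0.33, 0.63, 0.53, 0.99, 0.63]

print(top_k(scores, 3))  # prints [(4, 0.99), (0, 0.66), (2, 0.63)]
```

Index 4 is the row [1, 1, 1, 1] of X_hat, which matches the winner in the walkthrough.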
That’s all: the whole algorithm of Content-Based Filtering (Recommendation).