Content-Based Filtering (Recommendation)

TeeTracker
3 min read · Jan 28, 2022

In brief, content-based recommendation (CBR) tries to figure out which aspects of an item (e.g. a movie, a song, a book) a user likes most. The algorithm can then recommend other items that share those aspects.

Advantages

Learns the user’s preferences (the user profile).

Highly personalized for the user.

Disadvantages

Doesn’t take into account what other users think of an item, so low-quality items may be recommended.

Extracting feature data from items is not always straightforward.

Determining which characteristics (properties, features) of an item the user likes or dislikes is not always obvious.

Doesn’t work well for items whose characteristics (properties, features) are not covered by the user’s profile.

Input

To produce recommendations, the algorithm first builds a profile of the user from the user’s content input values.

Item: a movie, a song, a book, or any other kind of product.

The content values are numerical descriptions of how much the user likes an item, e.g. a like or a rating.

In machine-learning terms, the profile described above is a weight vector (also called a parameter matrix or weight matrix).

In the real world, items are products with different properties, e.g. a movie has genres such as [Adventure, Animation, Children, Comedy, Fantasy ….].

We also call those properties features. To lay out all sample products clearly, we can build a feature matrix.

We use the one-hot encoding technique to convert the list of features into a vector in which each column corresponds to one possible value of the feature.

This encoding is needed to feed categorical data into the algorithm. In this case, we store every distinct genre in its own column, which contains either 1 or 0: 1 means the movie has that genre, and 0 means it doesn’t.
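The encoding step can be sketched in a few lines of plain Python. This is only an illustration: the genre vocabulary `GENRES`, the movie titles, and the helper `one_hot` are made up for the example and do not come from a real dataset or library. Because a movie can have several genres, a row may contain more than one 1.

```python
# Hypothetical genre vocabulary and movies, used only for illustration.
GENRES = ["Adventure", "Animation", "Children", "Comedy", "Fantasy"]

movies = {
    "Movie A": ["Adventure", "Animation", "Children"],
    "Movie B": ["Comedy"],
    "Movie C": ["Adventure", "Fantasy"],
}


def one_hot(genres, vocabulary):
    """Return a 0/1 vector with one column per genre in `vocabulary`."""
    present = set(genres)
    return [1 if genre in present else 0 for genre in vocabulary]


# Feature matrix: one row per movie, one column per genre.
feature_matrix = {
    title: one_hot(genres, GENRES) for title, genres in movies.items()
}

for title, row in feature_matrix.items():
    print(title, row)
```

Each row of `feature_matrix` has the same length as `GENRES`, so rows can be stacked into the feature matrix used in the rest of the post.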

Output

We now have the user’s profile (the weights) and the complete list of sample products with their features, including the items the user may have given as input.

We take the weighted sum (or average) of every item’s features using the profile and recommend the top twenty items, i.e. those with the highest scores.

Build Model

Input

# The user's content values (e.g. ratings), one per input product
U := [2, 10, 8]
# Feature matrix of the input products, one row per product
X := [[0, 1, 1, 0],
      [1, 1, 1, 1],
      [1, 0, 1, 0],
]

Build the user profile, i.e. the weights

# Take the dot product of U and X to get the content weights.
W := U dot X    # = [18, 12, 20, 10]
# Normalize W into 0~1 by dividing by the sum of its entries (60).
W := norm W into 0~1
W := [0.3, 0.2, 0.33, 0.16]
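The profile-building step can be written as a small runnable sketch. It uses plain Python lists rather than a numeric library, and the function name `build_profile` is made up for this example. It assumes that at least one rated product has at least one feature, so the sum of the raw weights is not zero.

```python
def build_profile(ratings, features):
    """Weight each feature by the ratings of the products that have it,
    then normalize so the weights sum to 1."""
    n_features = len(features[0])
    # Dot product U . X: one raw weight per feature column.
    raw = [
        sum(rating * row[j] for rating, row in zip(ratings, features))
        for j in range(n_features)
    ]
    total = sum(raw)  # assumed to be non-zero (see the note above)
    return [weight / total for weight in raw]


U = [2, 10, 8]
X = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
]

W = build_profile(U, X)
print([round(w, 2) for w in W])  # [0.3, 0.2, 0.33, 0.17]
```

Rounding gives 0.17 for the last weight, where the listing above truncates 10/60 to 0.16.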

Output

# Feature matrix of all candidate products, including the input ones
X_hat := [
[1, 1, 0, 1],
[0, 0, 1, 0],
[1, 0, 1, 0],
[0, 1, 1, 0],
[1, 1, 1, 1],
[1, 0, 1, 0],
]
# Multiply the item features by the weights and then take the weighted summation or average.
scores := W dot transpose(X_hat)
scores := [0.66, 0.33, 0.63, 0.53, 0.99, 0.63]

Result

scores := sort scores by DESC
scores := [0.99, 0.66, 0.63, 0.63, 0.53, 0.33]
# The product with the features [1, 1, 1, 1] gets the highest score and wins the final recommendation.
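The scoring and ranking steps can be sketched the same way. The function name `rank_items` and its `top_n` parameter are made up for this example; the default of 20 mirrors the "top twenty" mentioned earlier. The weights are copied as printed in the listing above (truncated to two decimals) so that the scores match.

```python
def rank_items(weights, items, top_n=20):
    """Score each item as the weighted sum of its features and return
    up to `top_n` (index, score) pairs, highest score first."""
    scores = [
        sum(w * x for w, x in zip(weights, row))
        for row in items
    ]
    # sorted() is stable, so tied items keep their original order.
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


W = [0.3, 0.2, 0.33, 0.16]  # weights as printed in the listing above
X_hat = [
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
]

ranking = rank_items(W, X_hat)
for index, score in ranking:
    print(index, round(score, 2))
```

The first pair returned has index 4, the row `[1, 1, 1, 1]`, which agrees with the result above. Returning the indices along with the scores makes it possible to map each score back to its product after sorting.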

That’s the whole algorithm of Content-Based Filtering (Recommendation).
