Content-based user profiles in recommendation systems

Nov 13, 2022


Math base, linear combination

> If you don't like math, skip this part: go to the bottom of this article and read from bottom to top.

Yes, we need to start the topic with linear combinations 😅. A pity, but this is definitely not an in-depth explanation, just a quote:

《Deep Learning》, Ian Goodfellow, Yoshua Bengio, Aaron Courville

Chapter 2 Linear Algebra

we can think of the columns of A as specifying different directions we can travel from the origin (the point specified by the vector of all zeros), and determine how many ways there are of reaching b. In this view, each element of x specifies how far we should travel in each of these directions, with x_i specifying how far to move in the direction of column i:

$$Ax = \sum_i x_i A_{:,i}$$

In general, this kind of operation is called a linear combination. Formally, a linear combination of some set of vectors {v^(1), ..., v^(n)} is given by multiplying each vector v^(i) by a corresponding scalar coefficient and adding the results:

$$\sum_i c_i v^{(i)}$$
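
As a small sketch of the quote (the matrix `A` and the vector `x` below are made-up values, not taken from the book), NumPy can confirm that the product `Ax` equals the sum of the columns of `A`, each scaled by the matching element of `x`:

```python
import numpy as np

# Hypothetical example values, chosen only for illustration.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([2.0, -1.0])

# Ordinary matrix-vector product.
product = A @ x

# The same result written as a linear combination of the columns of A:
# x[0] * A[:, 0] + x[1] * A[:, 1]
combination = sum(x[i] * A[:, i] for i in range(A.shape[1]))

print(product)      # [0. 2. 4.]
print(combination)  # [0. 2. 4.]
```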


🧐 My explanation and understanding

A vector is a magnitude plus a direction. The magnitude (sometimes called the size or length) is a number representing the vector's “value”. The direction can be projected onto different axes, e.g. x and y in an x-y coordinate system. A vector stays unchanged when it is merely moved around in the coordinate system.

Think of b (the green point) as the destination we hope to reach. Here we have two such points.

Think of the yellow point as the approximate position actually reached after running the algorithm.

We have vertical and horizontal ways, and every point has its own directions: x00 and x10 for the horizontal, x01 and x11 for the vertical. Clearly, they are all vectors.

Computing according to the concept above gives the so-called linear combination: a vector whose length is determined by the equation. Here we have two combinations for the two goals.

Don’t get me wrong: the main point of this article has little to do with these 2 formulas; it only “borrows” a little from them.

Assume that we have a space with 3 coefficients and two vectors:

v_0 = [3, 4, 1]
v_1 = [1, 3, 4]
x = [1,
     0,
     1]

According to the quote above, we have 3 ways to reach b, and the 3 elements (coefficients) of x determine how far to go along each of them. This is very abstract; just take note of it, it is not a must-read.

According to equations:

Ax := [3*1, 4*0, 1*1,
1*1, 3*0, 4*1]
Ax := [3*1 + 4*0 + 1*1,
1*1 + 3*0 + 4*1]
Ax = [4,
      5]

According to the quote above, combining the elements of v_0 with the coefficients in x gives 4, and doing the same for v_1 gives 5.
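
The same arithmetic can be checked in NumPy. This is only a sketch: it stacks `v_0` and `v_1` as the rows of a matrix `A` and uses `x = [1, 0, 1]`, the coefficients that appear in the products above.

```python
import numpy as np

v_0 = np.array([3, 4, 1])
v_1 = np.array([1, 3, 4])
x = np.array([1, 0, 1])

# Stack the two vectors as the rows of A (shape 2 x 3).
A = np.vstack([v_0, v_1])

# Each entry of A @ x combines one row's elements with the coefficients in x.
Ax = A @ x
print(Ax)  # [4 5]
```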

Build content based user profile

Back on track: let's see how to create a user-specific profile from the content of the items.

Assume that we have two lectures:

Each column means a topic that the lecture involves, 1 for yes, 0 for no.

Assume that we have two audience members who have given ratings (actually any behavior works, for example “clicked in”, “checked in”, ...).

User profile table

Let’s write them in NumPy so that, hopefully, every reader can follow easily.

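import numpy as np
import pandas as pd

# Members (users): one rating per lecture, 0 means "not rated".
u0 = np.array([[3, 0]])  # rated lecture 0 only
u1 = np.array([[0, 2]])  # rated lecture 1 only

# Lectures: one row per lecture, three topics (columns) for each lecture.
L = np.array([[1, 0, 1],
              [1, 1, 0]])

u0_weights = np.matmul(u0, L)
# u0_weights == array([[3, 0, 3]])

u1_weights = np.matmul(u1, L)
# u1_weights == array([[2, 2, 0]])

# Stack both profile vectors into one DataFrame, one row per user.
df = pd.DataFrame(np.vstack([u0_weights, u1_weights]), index=["u0", "u1"])

The last line collects the two vectors into a pandas DataFrame.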

u0_weights and u1_weights are the so-called user profile vectors; the DataFrame built from them is then the profile matrix.

🧐 Link to the math

W = U . C

# Suppose that we have one user U who has rated 4 lectures.
U.shape := 1 x 4
# Suppose that those lectures involve only one topic.
C.shape := 4 x 1
# The result is the preference for that topic, based on those lectures.
W.shape := 1 x 1

# When we have 4 topics:
C.shape := 4 x 4
# The result is a 1-row vector: the preference for each of the 4 topics.
W.shape := 1 x 4

The destination, or goal, is to find the user's preference for a topic. There are different ways to arrive at that value; here the ways are the different lectures.

Assume that we have only 1 topic and 4 lectures (as in the code above). We then naturally have a 4-dim vector (4 x 1), and each element of the vector is the how-far factor of the corresponding way.

With 1 topic, the result of the combination is a single number (1 x 1); with 4 topics it is a 4-dim row vector (1 x 4).
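
These shapes are easy to confirm in NumPy. In the sketch below the ratings and topic flags are made-up values; only the shapes matter.

```python
import numpy as np

# One user who rated 4 lectures (hypothetical ratings): shape 1 x 4.
U = np.array([[5, 3, 0, 1]])

# 4 lectures, 1 topic (hypothetical flags): shape 4 x 1.
C_one_topic = np.array([[1], [0], [1], [1]])
W_one_topic = U @ C_one_topic
print(W_one_topic.shape)  # (1, 1)

# 4 lectures, 4 topics (hypothetical flags): shape 4 x 4.
C_four_topics = np.array([[1, 0, 0, 1],
                          [0, 1, 0, 0],
                          [1, 1, 0, 0],
                          [0, 0, 1, 1]])
W_four_topics = U @ C_four_topics
print(W_four_topics.shape)  # (1, 4)
```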

In practice, there is no need to think in such a complicated way. Let each column of the user matrix hold the ratings of one lecture, and let each row of the topic matrix be a lecture and each column a topic. Multiply the two (dot op), and each row of the resulting matrix is one user’s profile.
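
A minimal sketch of that shortcut, with both users stacked into one matrix. The names `U`, `C` and `W` follow the notation above; the second row of `C` is an assumption, chosen as the row implied by `u1_weights := [2, 2, 0]`.

```python
import numpy as np

# User matrix: one row per user, one column per lecture (0 = not rated).
U = np.array([[3, 0],
              [0, 2]])

# Topic matrix: one row per lecture, one column per topic.
# The second row is assumed from the u1_weights result in the text.
C = np.array([[1, 0, 1],
              [1, 1, 0]])

# One multiplication yields every user's profile, one row per user.
W = U @ C
print(W)
# [[3 0 3]
#  [2 2 0]]
```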

Make a recommendation

Assume that we now have a new lecture:

Recall that we have a profile matrix already:

two users
one_lecture    := [1, 0, 0]
user_0_profile := [3,
                   0,
                   3]
(one_lecture dot user_0_profile).shape := 1 x 1
# a scalar: one user's amount of interest in one lecture.

For two users

one_lecture   := [1, 0, 0]
users_profile := [[3, 2],
                  [0, 2],
                  [3, 0]]

(one_lecture dot users_profile).shape := 1 x 2
# two users' amounts of interest in one lecture.

For two users and three lectures, we then get the score matrix.

three_lectures := [[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]]
users_profile  := [[3, 2],
                   [0, 2],
                   [3, 0]]

(three_lectures dot users_profile).shape := 3 x 2
# two users' preferences for three lectures.
# each row represents one lecture, each column represents one user's preference.
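
The pseudo-notation above can be run as a short NumPy sketch. It assumes the profile vectors `[3, 0, 3]` and `[2, 2, 0]` computed earlier and transposes them so that the topics line up with the lecture columns.

```python
import numpy as np

# Profile matrix: one row per user, one column per topic.
profiles = np.array([[3, 0, 3],   # user 0
                     [2, 2, 0]])  # user 1

# Three new lectures: one row per lecture, one column per topic.
three_lectures = np.array([[1, 0, 0],
                           [0, 1, 0],
                           [0, 0, 1]])

# (3 lectures x 3 topics) @ (3 topics x 2 users) -> 3 lectures x 2 users.
scores = three_lectures @ profiles.T
print(scores)
# [[3 2]
#  [0 2]
#  [3 0]]
```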

🧐 Link to the math with this example

The goal is to find how much interest a user potentially has in a lecture. To do this we need different ways, which here means different topics. In the code above, each lecture has three topics.

Assume that a user has a profile vector: a 3-dim vector representing the preference for 3 topics. Each element of the vector is the how-far factor of the corresponding way (topic) to the destination.

After the linear combination, we know the user’s interest in the different lectures.


Again, we call this the score matrix. Each number represents the amount of interest a specific user has in a lecture.
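
One possible way to turn the score matrix into an actual recommendation is to rank the lectures for each user. The article does not prescribe a ranking rule, so the sketch below is an assumption: it sorts by descending score and keeps the original lecture order when scores tie.

```python
import numpy as np

# Score matrix from the example: rows are lectures, columns are users.
scores = np.array([[3, 2],
                   [0, 2],
                   [3, 0]])

# For each user (column), lecture indices from highest to lowest score.
# kind="stable" keeps the original lecture order when scores tie.
ranking = np.argsort(-scores, axis=0, kind="stable")

for user in range(scores.shape[1]):
    print(f"user {user}: lectures ranked {ranking[:, user].tolist()}")
# user 0: lectures ranked [0, 2, 1]
# user 1: lectures ranked [0, 1, 2]
```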


  • Actually, we have recapped matrix multiplication with a column vector from a “row-wise” point of view, whereas it is usually computed “column-wise”. Thinking in terms of linear combinations is better, though, because ML and DL topics such as embeddings, PCA, NMF, and even autoencoders can be easily explained on the basis of vector decomposition and combination.
  • Everything here is a linear combination. Leaving the math aside, the steps are:
  1. Find the user-item (lecture) matrix: each row represents a user, and each column holds the amount of a behavior (rating) for one item. This table can also be called a user profile. Don’t get confused: what we are looking for are profile matrices or profile vectors, which share the name but are not the same thing.
  2. Find the item (lecture) matrix: each row represents an item, and each column a different attribute (topic). A scalar indicates the corresponding amount here, too. In this example it is just a boolean, 1 or 0, indicating whether the topic is covered, but it could be any value, depending on the business environment.
  3. Multiply the two matrices (dot op) to get the profile matrix.
  4. Find the unseen items: the unknown-item matrix.
  5. Transpose the profile matrix if needed, depending on how it was built.
  6. Multiply the two matrices (dot op) to get the score matrix.
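
The six steps above can be sketched as two small functions. The function names `build_profiles` and `score_items` are hypothetical, and the unseen lectures in `new_items` are made-up values; the ratings and topic rows follow the earlier example.

```python
import numpy as np

def build_profiles(ratings, item_topics):
    """Steps 1-3: (users x items) @ (items x topics) -> users x topics."""
    return ratings @ item_topics

def score_items(profiles, new_item_topics):
    """Steps 4-6: (new items x topics) @ (topics x users) -> new items x users."""
    return new_item_topics @ profiles.T

# Step 1: user-item matrix (0 = not rated).
ratings = np.array([[3, 0],
                    [0, 2]])
# Step 2: item-topic matrix.
item_topics = np.array([[1, 0, 1],
                        [1, 1, 0]])
# Step 4: unseen items (hypothetical values for illustration).
new_items = np.array([[0, 0, 1],
                      [1, 1, 0]])

profiles = build_profiles(ratings, item_topics)
scores = score_items(profiles, new_items)
print(profiles)
print(scores)
```

Keeping profile building and scoring separate means the profile matrix can be computed once and reused whenever new items arrive.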


For this kind of recommendation we have to know the content of each lecture, i.e. whether the lecture involves a topic (1 or 0, or some other design). Quite naturally, without this information we cannot work out how much interest a user has in each topic.

The question is: can we use a batch of the user’s interaction history (user x lecture) to estimate the amount of interest, or the rating? In other words, can we obtain the “topics” that a lecture involves, and the preference for those “topics”, implicitly?



