Multitask Training for Recommender Systems

Description

In this project, I developed a multitask recommender system that predicts both whether a user will interact with a movie and what score the user would give it. The system is a hybrid of Matrix Factorization and Neural Collaborative Filtering. It extends the BellKor solution to the Netflix Grand Prize challenge with a multi-task neural network architecture that serves two tasks.

The first task predicts whether a user would watch a given movie; the second estimates the score the user would assign to it. I trained and evaluated the model on the MovieLens dataset, which comprises 100K ratings of roughly 1,700 movies from roughly 1,000 users. The dataset was split into training (95%) and testing (5%) sets.
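The 95%/5% split can be sketched as follows. This is a minimal illustration, assuming a uniformly random split over individual ratings with a fixed seed; the write-up does not say how the split was drawn, and the function name `train_test_split`, its parameters, and the toy `ratings` list are hypothetical.

```python
import random


def train_test_split(interactions, test_fraction=0.05, seed=0):
    """Randomly split a list of (user, item, rating) tuples.

    Assumption: a uniform random split over ratings; the original
    project may have split differently (for example per user).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(interactions)
    # A dedicated Random instance keeps the split reproducible
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data standing in for the MovieLens ratings (hypothetical values)
ratings = [(user, item, 1 + (user + item) % 5)
           for user in range(20) for item in range(10)]
train, test = train_test_split(ratings)
print(len(train), len(test))  # 190 10
```

A per-user split would be a reasonable alternative if every user must appear in both sets.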

The core of the system is the MultiTaskNet model, implemented in PyTorch. I explored different configurations, including shared and separate latent vector representations for users and movies across the two tasks. The model was evaluated on the held-out user ratings using Mean Squared Error (MSE) for score prediction and Mean Reciprocal Rank (MRR) for likelihood prediction.
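The two metrics can be sketched in plain Python as below. The exact evaluation protocol is not given in this write-up, so the MRR here assumes one common convention: for each user, rank all candidate items by predicted likelihood, average the reciprocal ranks of that user's held-out items, then average over users. The function names and dictionary layout are hypothetical.

```python
def mean_squared_error(predicted, actual):
    """Average squared difference between predicted and observed scores."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def mean_reciprocal_rank(scores_by_user, held_out_by_user):
    """MRR over users (assumed protocol, see the lead-in).

    scores_by_user: {user: {item: predicted likelihood score}}
    held_out_by_user: {user: set of held-out items for that user}
    """
    per_user = []
    for user, held_out in held_out_by_user.items():
        if not held_out:
            continue  # users without held-out items do not contribute
        scores = scores_by_user[user]
        # Rank 1 is the item with the highest predicted score
        ranking = sorted(scores, key=scores.get, reverse=True)
        rank_of = {item: pos for pos, item in enumerate(ranking, start=1)}
        reciprocal = [1.0 / rank_of[item] for item in held_out]
        per_user.append(sum(reciprocal) / len(reciprocal))
    return sum(per_user) / len(per_user)


print(mean_squared_error([3.0, 4.0], [4.0, 2.0]))  # 2.5
scores = {"u1": {"a": 0.9, "b": 0.1, "c": 0.5}, "u2": {"a": 0.2, "b": 0.8}}
held_out = {"u1": {"c"}, "u2": {"b"}}
print(mean_reciprocal_rank(scores, held_out))  # 0.75
```

Lower MSE and higher MRR are better, so the two numbers move in opposite directions as the model improves.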

I then ran four experiments that varied the task loss weights and whether the embeddings were shared between tasks. The goal was to understand how parameter sharing and loss weighting affect the model's performance on each task.
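The four configurations listed under Findings can be written out as a small table in code. This sketch assumes that LF and LR in the figure file names are the weights on the factorization (likelihood) loss and the regression (score) loss, and that the total loss is their weighted sum; neither is stated explicitly in this write-up. `ExperimentConfig` and `combined_loss` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    shared: bool  # share user/item embeddings between the two tasks
    lf: float     # assumed: weight on the factorization loss
    lr: float     # assumed: weight on the regression loss


# Values taken from the figure file names in the Findings section
CONFIGS = [
    ExperimentConfig(shared=True, lf=0.99, lr=0.01),
    ExperimentConfig(shared=True, lf=0.5, lr=0.5),
    ExperimentConfig(shared=False, lf=0.5, lr=0.5),
    ExperimentConfig(shared=False, lf=0.99, lr=0.01),
]


def combined_loss(factorization_loss, regression_loss, config):
    """Weighted sum of the two task losses (assumed form)."""
    return config.lf * factorization_loss + config.lr * regression_loss


for config in CONFIGS:
    print(config, combined_loss(2.0, 4.0, config))
```

With weights of 0.99 and 0.01 the gradient is dominated by the factorization task, which is what makes the comparison against the 0.5/0.5 setting informative.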

This project deepened my understanding of multi-task architectures, training pipelines, and PyTorch, and it provides a foundation for exploring more sophisticated multi-task recommender systems. Comparing the four model configurations side by side gave me insight into how parameter sharing and loss weighting affect each task.

Findings

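Figure 1: shared=True, LF=0.99, LR=0.01 (1_shared=True_LF=0.99_LR=0.01.png)

Figure 2: shared=True, LF=0.5, LR=0.5 (2_shared=True_LF=0.5_LR=0.5.png)

Figure 3: shared=False, LF=0.5, LR=0.5 (3_shared=False_LF=0.5_LR=0.5.png)

Figure 4: shared=False, LF=0.99, LR=0.01 (4_shared=False_LF=0.99_LR=0.01.png)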

Code Snippet

"""
Classes defining user and item latent representations in
factorization models.
"""
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledEmbedding(nn.Embedding):
    """
    Embedding layer that initialises its values
    using a normal variable scaled by the inverse
    of the embedding dimension.
    """

    def reset_parameters(self):
        """
        Initialize parameters.
        """

        self.weight.data.normal_(0, 1.0 / self.embedding_dim)
        if self.padding_idx is not None:
            self.weight.data[self.padding_idx].fill_(0)

class ZeroEmbedding(nn.Embedding):
    """
    Embedding layer that initialises its values
    to zero.

    Used for biases.
    """

    def reset_parameters(self):
        """
        Initialize parameters.
        """

        self.weight.data.zero_()
        if self.padding_idx is not None:
            self.weight.data[self.padding_idx].fill_(0)

class MultiTaskNet(nn.Module):
    """
    Multitask factorization representation.

    Encodes both users and items as an embedding layer; the likelihood score
    for a user-item pair is given by the dot product of the item
    and user latent vectors. The numerical score is predicted using a small MLP.

    Parameters
    ----------

    num_users: int
        Number of users in the model.
    num_items: int
        Number of items in the model.
    embedding_dim: int, optional
        Dimensionality of the latent representations.
    layer_sizes: list
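        List of layer sizes for the regression network. The first
        entry must equal 3 * embedding_dim, because the network input
        is the concatenation of the user vector, the item vector, and
        their element-wise product.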
    embedding_sharing: boolean, optional
        Share embedding representations for both tasks.

    """

    # layer_sizes defaults to a tuple to avoid a mutable default argument
    def __init__(self, num_users, num_items, embedding_dim=32, layer_sizes=(96, 64),
                 embedding_sharing=True):

        super().__init__()
        self.embedding_dim = embedding_dim
        self.embedding_sharing = embedding_sharing

        if embedding_sharing:
            self.U, self.Q = self.init_shared_user_and_item_embeddings(num_users, num_items, embedding_dim)
        else:
            self.U_reg, self.Q_reg, self.U_fact, self.Q_fact = self.init_separate_user_and_item_embeddings(num_users, num_items, embedding_dim)

        self.B = self.init_item_bias(num_users, num_items)
        self.mlp_layers = self.init_mlp_layers(layer_sizes)

    def forward(self, user_ids, item_ids):
        """
        Compute the forward pass of the representation.

        Values are computed pairwise: element i of each output
        corresponds to the pair (user_ids[i], item_ids[i]).

        Parameters
        ----------

        user_ids: tensor
            A tensor of integer user IDs of shape (batch,)
        item_ids: tensor
            A tensor of integer item IDs of shape (batch,)

        Returns
        -------

        predictions: tensor
            Tensor of user-item interaction predictions of shape (batch,)
        score: tensor
            Tensor of user-item score predictions of shape (batch,)
        """
        if self.embedding_sharing:
            predictions, score = self.forward_with_embedding_sharing(user_ids, item_ids)
        else:
            predictions, score = self.forward_without_embedding_sharing(user_ids, item_ids)

        # Both outputs must have shape (batch,)
        if predictions.dim() > 1 or score.dim() > 1:
            raise ValueError(
                "Expected outputs of shape (batch,), got "
                f"{tuple(predictions.shape)} and {tuple(score.shape)}")
        
        return predictions, score
    
    def init_shared_user_and_item_embeddings(self, num_users, num_items, embedding_dim):
        """
        Initializes shared user and item embeddings
        used in both factorization and regression tasks

        Parameters
        ----------

        num_users: int
            Number of users in the model.
        num_items: int
            Number of items in the model.
        embedding_dim: int, optional
            Dimensionality of the latent representations.
            

        Returns
        -------

        U: ScaledEmbedding layer for users
            nn.Embedding of shape (num_users, embedding_dim)
        Q: ScaledEmbedding layer for items
            nn.Embedding of shape (num_items, embedding_dim)
        """
        U = Q = None
        ### START CODE HERE ###
        U = ScaledEmbedding(num_embeddings=num_users, embedding_dim=embedding_dim)
        Q = ScaledEmbedding(num_embeddings=num_items, embedding_dim=embedding_dim)
        ### END CODE HERE ###
        return U, Q
    
    def init_separate_user_and_item_embeddings(self, num_users, num_items, embedding_dim):
        """
        Initializes separate user and item embeddings:
        one pair is used for the factorization task (suffix _fact) and
        the other for the regression task (suffix _reg)

        Parameters
        ----------

        num_users: int
            Number of users in the model.
        num_items: int
            Number of items in the model.
        embedding_dim: int, optional
            Dimensionality of the latent representations.
            

        Returns
        -------

        U_reg: first ScaledEmbedding layer for users
            nn.Embedding of shape (num_users, embedding_dim)
        Q_reg: first ScaledEmbedding layer for items
            nn.Embedding of shape (num_items, embedding_dim)
        U_fact: second ScaledEmbedding layer for users
            nn.Embedding of shape (num_users, embedding_dim)
        Q_fact: second ScaledEmbedding layer for items
            nn.Embedding of shape (num_items, embedding_dim)

        Note: Order does matter here! Please declare the layers in the order
        they are returned.
        """
        U_reg = Q_reg = U_fact = Q_fact = None
        ### START CODE HERE ###
        U_reg = ScaledEmbedding(num_embeddings=num_users, embedding_dim=embedding_dim)
        Q_reg = ScaledEmbedding(num_embeddings=num_items, embedding_dim=embedding_dim)
        U_fact = ScaledEmbedding(num_embeddings=num_users, embedding_dim=embedding_dim)
        Q_fact = ScaledEmbedding(num_embeddings=num_items, embedding_dim=embedding_dim)

        ### END CODE HERE ###
        return U_reg, Q_reg, U_fact, Q_fact
    
    def init_item_bias(self, num_users, num_items):
        """
        Initializes item bias terms

        Parameters
        ----------

        num_users: int
            Number of users in the model.
        num_items: int
            Number of items in the model.

        Returns
        -------
        B: ZeroEmbedding layer for items
            nn.Embedding of shape (num_items, 1)
        """
        B = None
        ### START CODE HERE ###
        # Item bias terms (Matrix Factorization Only)
        B = ZeroEmbedding(num_embeddings=num_items, embedding_dim=1)
        ### END CODE HERE ###
        return B
    
    def init_mlp_layers(self, layer_sizes):
        """
        Initializes MLP layer for regression task

        Parameters
        ----------

        layer_sizes: list
            List of layer sizes for the regression network.

        Returns
        -------

        mlp_layers: nn.Sequential
            MLP network containing Linear and ReLU layers
        """
        mlp_layers = None
        ### START CODE HERE ###
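        # MLP for the regression task: one Linear + ReLU block for each
        # consecutive pair of sizes, then a final Linear layer to one output
        layers = []
        for in_features, out_features in zip(layer_sizes[:-1], layer_sizes[1:]):
            layers.append(nn.Linear(in_features, out_features))
            layers.append(nn.ReLU())
        layers.append(nn.Linear(layer_sizes[-1], 1))
        mlp_layers = nn.Sequential(*layers)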
        ### END CODE HERE ###
        return mlp_layers

    def forward_with_embedding_sharing(self, user_ids, item_ids):
        """
        Please see forward() docstrings for reference
        """
        predictions = score = None
        ### START CODE HERE ###
        U = self.U(user_ids)
        Q = self.Q(item_ids)
        B = self.B(item_ids)

        # Regression head: the MLP input is [U, Q, U * Q], so its width
        # is 3 * embedding_dim
        U_Q = U * Q
        features = torch.cat((U, Q, U_Q), dim=1)
        # squeeze(-1) keeps shape (batch,) even when batch == 1
        score = self.mlp_layers(features).squeeze(-1)

        # Matrix factorization head: dot product plus item bias
        predictions = (torch.sum(U_Q, dim=1, keepdim=True) + B).squeeze(-1)

        ### END CODE HERE ###
        return predictions, score
    
    def forward_without_embedding_sharing(self, user_ids, item_ids):
        """
        Please see forward() docstrings for reference
        """
        predictions = score = None
        ### START CODE HERE ###
        U_reg = self.U_reg(user_ids)
        Q_reg = self.Q_reg(item_ids)

        U_fact = self.U_fact(user_ids)
        Q_fact = self.Q_fact(item_ids)
        B_fact = self.B(item_ids)

        # Regression head: the MLP input is [U_reg, Q_reg, U_reg * Q_reg],
        # so its width is 3 * embedding_dim
        features_reg = torch.cat((U_reg, Q_reg, U_reg * Q_reg), dim=1)
        # squeeze(-1) keeps shape (batch,) even when batch == 1
        score = self.mlp_layers(features_reg).squeeze(-1)

        # Matrix factorization head: dot product plus item bias
        U_fact_Q_fact = U_fact * Q_fact
        predictions = (torch.sum(U_fact_Q_fact, dim=1, keepdim=True) + B_fact).squeeze(-1)

        ### END CODE HERE ###
        return predictions, score
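
The arithmetic behind the two heads can be checked by hand for a single user-item pair. The sketch below is a framework-free illustration with plain Python lists rather than the PyTorch code above; the helper names and the toy vectors are hypothetical, and the MLP itself is left out, so only its input is shown.

```python
def factorization_score(user_vec, item_vec, item_bias):
    """Likelihood score: dot product of the latent vectors plus item bias."""
    return sum(u * q for u, q in zip(user_vec, item_vec)) + item_bias


def regression_features(user_vec, item_vec):
    """Input to the regression MLP: [u, q, u * q] concatenated."""
    product = [u * q for u, q in zip(user_vec, item_vec)]
    return list(user_vec) + list(item_vec) + product


user_vec = [1.0, 2.0]   # toy 2-dimensional user embedding
item_vec = [0.5, -1.0]  # toy 2-dimensional item embedding

print(factorization_score(user_vec, item_vec, 0.25))  # -1.25
print(regression_features(user_vec, item_vec))
# [1.0, 2.0, 0.5, -1.0, 0.5, -2.0]

# With embedding_dim=32 the MLP input has 3 * 32 = 96 features,
# which matches the first entry of the default layer_sizes.
print(len(regression_features([0.0] * 32, [0.0] * 32)))  # 96
```

This also shows why changing embedding_dim requires changing the first entry of layer_sizes to match.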