Building a Movie Recommendation Engine with Flask

Updated: Aug 23

Introduction


In this post, we'll walk through building a simple web-based movie recommendation engine using Flask, Pandas, and Scikit-learn. The project uses natural language processing techniques to recommend movies based on their descriptions.



Overview


We'll develop a Flask web application that recommends movies similar to a user-provided title. Similarity between movies is computed by turning each movie's text into a TF-IDF (Term Frequency-Inverse Document Frequency) vector and comparing those vectors with cosine similarity.



Prerequisites


Before diving in, make sure you have the following installed:

  • Python 3.x

  • Flask

  • Pandas

  • Scikit-learn
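
A quick way to confirm that the prerequisites can be imported is sketched below. It uses only the standard library; the helper name check_prerequisites is made up for this example. Note that Scikit-learn is imported under the name sklearn.

```python
import importlib.util

def check_prerequisites(modules=("flask", "pandas", "sklearn")):
    """Return a dict mapping each module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}

if __name__ == "__main__":
    for name, available in check_prerequisites().items():
        print(f"{name}: {'installed' if available else 'missing'}")
```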



The Dataset


We'll use a dataset that contains metadata about various movies; the code in this post reads it from ./model/tmdb.csv. The data is available at https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata



Setting Up the Flask Application


First, we import the necessary libraries and set up a basic Flask app:


import flask
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

app = flask.Flask(__name__, template_folder='templates')

Here, flask.Flask initializes our Flask application, and template_folder specifies the directory where HTML templates are stored.



Loading and Processing the Data


Next, we load the movie data and prepare it for similarity calculations:


Loading Data: The dataset is loaded into a DataFrame (df2).

df2 = pd.read_csv('./model/tmdb.csv')
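
The later steps read a text column named soup from this file, but the post does not show how that column is produced. One plausible way, sketched below on a tiny made-up DataFrame, is to concatenate a few descriptive text columns into one string per movie. The column names overview, genres, and keywords, the sample rows, and the helper build_soup are assumptions for illustration, not the author's preprocessing.

```python
import pandas as pd

def build_soup(df, columns=("overview", "genres", "keywords")):
    """Join the given text columns into one lowercase string per row."""
    # Treat missing values as empty strings so the join never fails
    parts = df[list(columns)].fillna("").astype(str)
    return parts.agg(" ".join, axis=1).str.lower().str.strip()

# Made-up sample data, only to show the shape of the result
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "overview": ["A robot explores space.", None],
    "genres": ["Science Fiction", "Comedy"],
    "keywords": ["robot space", "wedding"],
})
movies["soup"] = build_soup(movies)
print(movies[["title", "soup"]])
```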

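TF-IDF Vectorization: We use TfidfVectorizer from Scikit-learn to transform the soup column into a TF-IDF matrix. The soup column holds each movie's descriptive text as a single string and is expected to already exist in the CSV. Each row of the matrix is a movie, each column is a term, and each value weights a term by how often it appears in that movie and how rare it is across all movies. Setting stop_words='english' drops common English words before the weights are computed.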



tfidf = TfidfVectorizer(stop_words='english', analyzer='word')
# Construct the TF-IDF matrix by fitting and transforming the data
# (missing text is replaced with an empty string so fitting does not fail)
tfidf_matrix = tfidf.fit_transform(df2['soup'].fillna(''))
print(tfidf_matrix.shape)
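
To see what such a matrix looks like, here is a small sketch on three made-up descriptions (the sample strings are illustrative, not rows from the dataset). Each row is one document and each column is one vocabulary term that survived stop-word removal.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up descriptions standing in for the soup column
docs = [
    "a robot explores deep space",
    "aliens invade from deep space",
    "a romantic comedy set in paris",
]

vectorizer = TfidfVectorizer(stop_words="english", analyzer="word")
matrix = vectorizer.fit_transform(docs)

# One row per document, one column per remaining term
print(matrix.shape)
# Terms kept after removing English stop words such as "from" and "in"
print(sorted(vectorizer.vocabulary_))
```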

Cosine Similarity: Using the TF-IDF matrix, we calculate the cosine similarity between all movies. This results in a square matrix where each element represents the similarity between two movies.
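
As a concrete illustration of the measure itself, the sketch below computes cosine similarity for plain Python lists: the dot product of two vectors divided by the product of their lengths. The function name and the sample vectors are made up for this example; the application itself relies on Scikit-learn's cosine_similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Made-up term-weight vectors for three movies
movie_a = [1.0, 2.0, 0.0]
movie_b = [2.0, 4.0, 0.0]   # same direction as movie_a
movie_c = [0.0, 0.0, 3.0]   # shares no terms with movie_a

print(cosine(movie_a, movie_b))
print(cosine(movie_a, movie_c))
```

Vectors that point in the same direction score close to 1, and vectors with no terms in common score 0, regardless of how long the descriptions are.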


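# Construct cosine similarity matrix
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
print(cosine_sim.shape)
# Map each title to its row position in cosine_sim
df2 = df2.reset_index(drop=True)
indices = pd.Series(df2.index, index=df2['title'])
# Keep the first row per title so indices[title] is a single integer
indices = indices[~indices.index.duplicated(keep='first')]
# List of all titles, used by the route to check that a movie exists
all_titles = df2['title'].tolist()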
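
One detail of a title-to-position lookup is worth checking on a toy example: Series.drop_duplicates removes repeated values, not repeated index labels, so it leaves the lookup unchanged when two movies share a title. The sketch below, with made-up titles, shows the difference and one way to keep a single row per title using Index.duplicated.

```python
import pandas as pd

# Made-up titles; "Movie A" appears twice
titles = pd.Series(["Movie A", "Movie B", "Movie A"])
lookup = pd.Series(titles.index, index=titles)

# The values 0, 1, 2 are all distinct, so nothing is dropped here
by_value = lookup.drop_duplicates()
print(len(by_value))

# Filtering on the index keeps only the first row for each title
by_title = lookup[~lookup.index.duplicated(keep="first")]
print(by_title["Movie A"])
```

With a repeated title left in place, looking up that title returns several positions instead of one, and the similarity row can no longer be selected with a single index.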


Building the Recommendation Function


We define a function to get movie recommendations based on cosine similarity:


def get_recommendations(title):
    # sim_scores stays global because the route passes it to the template
    global sim_scores
    # Get the index of the movie that matches the title
    idx = indices[title]
    # Pair every movie's position with its similarity to that movie
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Sort the movies by similarity score, highest first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Skip position 0 (normally the movie itself) and keep the next 10
    sim_scores = sim_scores[1:11]

    # Get the movie indices
    movie_indices = [i[0] for i in sim_scores]

    # Return a DataFrame with similar movies
    return pd.DataFrame({
        'Title': df2['title'].iloc[movie_indices],
        'Homepage': df2['homepage'].iloc[movie_indices],
        'ReleaseDate': df2['release_date'].iloc[movie_indices],
    })

  • Finding the Movie: The function first locates the index of the provided movie title.

  • Ranking by Similarity: It then reads that movie's row of precomputed similarity scores and sorts the other movies in descending order of score.

  • Returning Recommendations: The function returns the top 10 most similar movies, along with their titles, homepages, and release dates.
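
The ranking step can be sketched on its own with plain Python. This variant drops the queried movie by its index rather than by assuming it sorts first, which matters if another movie has an identical description and ties with it. The function name top_k_similar and the sample scores are made up for illustration.

```python
def top_k_similar(scores, query_idx, k=10):
    """Return up to k (index, score) pairs, highest score first, excluding query_idx."""
    candidates = [(i, s) for i, s in enumerate(scores) if i != query_idx]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]

# Made-up similarity row for movie 0; movie 3 has an identical description
row = [1.0, 0.2, 0.7, 1.0, 0.0]
print(top_k_similar(row, query_idx=0, k=3))
```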



Setting Up Flask Routes


We define the main route that handles GET and POST requests:


@app.route('/', methods=['GET', 'POST'])
def main():
    if flask.request.method == 'GET':
        return flask.render_template('index.html')

    if flask.request.method == 'POST':
        # Collapse repeated whitespace in the submitted title
        m_name = " ".join(flask.request.form['movie_name'].split())
        if m_name not in all_titles:
            return flask.render_template('notFound.html', name=m_name)

        result_final = get_recommendations(m_name)
        names = result_final['Title'].tolist()
        releaseDate = result_final['ReleaseDate'].tolist()
        # A missing homepage is NaN, whose string form "nan" has length 3; link those to "#"
        homepage = [h if len(str(h)) > 3 else "#" for h in result_final['Homepage']]

        return flask.render_template('found.html', movie_names=names, movie_homepage=homepage, search_name=m_name, movie_releaseDate=releaseDate, movie_simScore=sim_scores)

  • GET Request: Renders the main search page (index.html).

  • POST Request: Handles the form submission, checks if the movie exists, and returns the recommendations. If the movie isn't found, it renders a notFound.html template.
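
The exact-match check rejects titles with typos. The full listing later in this post contains a commented-out difflib call that hints at fuzzy matching; a minimal sketch of that idea using the standard library's difflib.get_close_matches is shown below. The helper name closest_title and the sample titles are assumptions for illustration, and the match is case-sensitive.

```python
import difflib

def closest_title(name, titles, cutoff=0.5):
    """Return the title most similar to name, or None if nothing clears the cutoff."""
    # Collapse repeated whitespace the same way the route does
    cleaned = " ".join(name.split())
    matches = difflib.get_close_matches(cleaned, titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Made-up title list for the example
sample_titles = ["Avatar", "Titanic", "Inception"]
print(closest_title("  Avatr ", sample_titles))
print(closest_title("zzzz", sample_titles))
```

A route could fall back to the suggested title, or show it on the not-found page, when the exact lookup fails.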



Running the Flask App


Finally, to run the Flask application, add this block:


if __name__ == '__main__':
    app.run(host="127.0.0.1", port=8080, debug=True)


Putting It All Together


import flask
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

app = flask.Flask(__name__, template_folder='templates')

df2 = pd.read_csv('./model/tmdb.csv')

tfidf = TfidfVectorizer(stop_words='english', analyzer='word')
# Construct the required TF-IDF matrix by fitting and transforming the data
# (missing text is replaced with an empty string so fitting does not fail)
tfidf_matrix = tfidf.fit_transform(df2['soup'].fillna(''))

print(tfidf_matrix.shape)

# Construct cosine similarity matrix
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
print(cosine_sim.shape)

df2 = df2.reset_index(drop=True)

# Map each title to its row position, keeping the first row for any repeated title
indices = pd.Series(df2.index, index=df2['title'])
indices = indices[~indices.index.duplicated(keep='first')]
# Create a list with all movie titles
all_titles = df2['title'].tolist()

def get_recommendations(title):
    # sim_scores stays global because the route passes it to the template
    global sim_scores
    # Get the index of the movie that matches the title
    idx = indices[title]
    # Pair every movie's position with its similarity to that movie
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Sort the movies by similarity score, highest first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Skip position 0 (normally the movie itself) and keep the next 10
    sim_scores = sim_scores[1:11]
    # Print similarity scores as (row position, score) pairs
    print("\n movieId      score")
    for i in sim_scores:
        print(i)

    # Get the movie indices
    movie_indices = [i[0] for i in sim_scores]

    # Return a DataFrame with similar movies
    return pd.DataFrame({
        'Title': df2['title'].iloc[movie_indices],
        'Homepage': df2['homepage'].iloc[movie_indices],
        'ReleaseDate': df2['release_date'].iloc[movie_indices],
    })

# Set up the main route
@app.route('/', methods=['GET', 'POST'])
def main():
    if flask.request.method == 'GET':
        return flask.render_template('index.html')

    if flask.request.method == 'POST':
        # Collapse repeated whitespace in the submitted title
        m_name = " ".join(flask.request.form['movie_name'].split())
        # Optional fuzzy matching (requires "import difflib"):
        # check = difflib.get_close_matches(m_name, all_titles, cutoff=0.50, n=1)
        if m_name not in all_titles:
            return flask.render_template('notFound.html', name=m_name)

        result_final = get_recommendations(m_name)
        names = result_final['Title'].tolist()
        releaseDate = result_final['ReleaseDate'].tolist()
        # A missing homepage is NaN, whose string form "nan" has length 3; link those to "#"
        homepage = [h if len(str(h)) > 3 else "#" for h in result_final['Homepage']]

        return flask.render_template('found.html', movie_names=names, movie_homepage=homepage, search_name=m_name, movie_releaseDate=releaseDate, movie_simScore=sim_scores)

if __name__ == '__main__':
    app.run(host="127.0.0.1", port=8080, debug=True)

If you require any assistance with your Machine Learning projects, please do not hesitate to contact us. We have a team of experienced developers who specialize in Machine Learning and can provide you with the necessary support and expertise to ensure the success of your project. You can reach us through our website or by contacting us directly via email or phone.


