Building an AutoML Regression Application with Streamlit: A Step-by-Step Guide

Aug 15, 2024

In this tutorial, we'll walk through building an AutoML application using Python, Streamlit, PyCaret, and pandas profiling. The application lets users upload a dataset, profile it, train and compare regression models, and download the best trained model, all within an interactive web interface. Below, we'll go through the code step by step. Before moving to the code, though, let's look at what AutoML and Streamlit are and what this AutoML application can do.

Introduction

In machine learning, AutoML (Automated Machine Learning) automates the time-consuming, repetitive parts of applying machine learning to real-world problems, such as preprocessing, model selection, and evaluation. This makes machine learning accessible to people with limited expertise in the field. The application we'll build today is an AutoML tool that simplifies data exploration, model training, and exporting the trained model.

What is Streamlit?

Streamlit is an open-source framework that allows developers to create interactive web applications using Python. It’s particularly popular for building data-driven applications due to its simplicity and ability to integrate with various data science libraries. With Streamlit, you can turn your data scripts into shareable web apps in minutes.

What Can You Do with This AutoML Application?

This AutoML application enables users to upload datasets, perform exploratory data analysis (EDA), automatically train machine learning models, and download the best-performing model. It’s built with user-friendliness in mind, allowing non-experts to utilize machine learning capabilities without needing to write any code. Now, let's dive into the implementation of this powerful tool.

Step 1: Importing Necessary Libraries

import streamlit as st
import plotly.express as px  # not used below, but handy for adding charts
from pycaret.regression import setup, compare_models, pull, save_model
import pandas_profiling  # adds DataFrame.profile_report(); now published as ydata-profiling
import pandas as pd
from streamlit_pandas_profiling import st_profile_report
import os

In the section above, we import the necessary libraries:

streamlit is used for creating the web application.
plotly.express is included for data visualization, though it's not used in this specific code.
pycaret.regression provides the AutoML functionality.
pandas handles data loading and manipulation; importing pandas_profiling adds the profile_report() method to pandas DataFrames, which generates the data profile.
streamlit_pandas_profiling integrates pandas profiling with Streamlit.
os is used to interact with the file system.
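Several of these packages have changed across releases (for example, pandas-profiling is now published as ydata-profiling, and PyCaret 3.0 dropped some 2.x setup arguments), so it can help to check what is installed before running the app. The sketch below uses only the standard library's importlib.metadata; the helper name installed_versions and the package list are illustrative assumptions, not part of the original app.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: installed version, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


if __name__ == "__main__":
    # pip distribution names, which can differ from the import names
    packages = ["streamlit", "pycaret", "pandas", "ydata-profiling",
                "pandas-profiling", "streamlit-pandas-profiling", "plotly"]
    for pkg, ver in installed_versions(packages).items():
        print(f"{pkg:28} {ver or 'not installed'}")
```

Run it with the same interpreter you use for streamlit run, so it reports the environment the app actually sees.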

Step 2: Setting Up the Streamlit Page Layout

st.set_page_config(layout="wide")

Here, we configure the Streamlit page to use a wide layout, giving the application more space to display content. st.set_page_config should be the first Streamlit command in the script.

Step 3: Checking for an Existing Dataset

if os.path.exists('./dataset.csv'):
    df = pd.read_csv('dataset.csv', index_col=None)

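If a file named dataset.csv exists in the current directory, it is loaded into a pandas DataFrame. Because Streamlit re-runs the entire script every time the user interacts with a widget, variables such as df are not kept between runs. Saving the upload to disk and reloading it here lets users switch between sections, or come back later, without re-uploading the dataset.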
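The same load step can be wrapped in a small helper that returns None when nothing has been uploaded yet, so each section can check for a missing dataset instead of failing on an undefined df. This is a sketch; the name load_saved_dataset is hypothetical.

```python
import os
from typing import Optional

import pandas as pd


def load_saved_dataset(path: str = "dataset.csv") -> Optional[pd.DataFrame]:
    """Return the previously saved dataset, or None if no upload exists yet."""
    if os.path.exists(path):
        return pd.read_csv(path, index_col=None)
    return None  # nothing uploaded yet
```

In the app, df = load_saved_dataset() would replace the if block, and the Profiling and Modelling sections could show a warning when df is None.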

Step 4: Building the Sidebar

with st.sidebar:
    st.image("https://www.onepointltd.com/wp-content/uploads/2020/03/inno2.png", width=200)
    st.title("AutoML")
    choice = st.radio("Navigation", ["Upload", "Profiling", "Modelling", "Download"])
    st.info("This project application helps you build and explore your data.")

The sidebar contains:

An image logo.
The title of the application.
A navigation radio button to switch between different sections: Upload, Profiling, Modelling, and Download.
A brief description of the application's purpose.

Step 5: Handling the "Upload" Section

if choice == "Upload":
    st.header("Upload Your Dataset")
    file = st.file_uploader("Upload Your Dataset", type=["csv"])  # accept CSV files only
    if file:
        df = pd.read_csv(file, index_col=None)
        df.to_csv('dataset.csv', index=False)  # don't write the pandas index as a column
        st.dataframe(df)

When "Upload" is selected, users can upload a CSV file. The uploaded dataset is saved as dataset.csv and displayed in a table format.
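Streamlit's file_uploader returns an UploadedFile, which behaves like an in-memory binary file, so pd.read_csv can read it directly. The sketch below mimics that with io.BytesIO and shows why the CSV should be written with index=False: if the index is written too, reading the file back yields an extra "Unnamed: 0" column that would later be treated as a feature. The helper name save_upload and the sample data are hypothetical.

```python
import io

import pandas as pd


def save_upload(file_obj, path="dataset.csv"):
    """Read an uploaded CSV and persist it without the pandas index column."""
    df = pd.read_csv(file_obj, index_col=None)
    df.to_csv(path, index=False)  # the default index=True adds 'Unnamed: 0' on reload
    return df


# io.BytesIO stands in for Streamlit's UploadedFile here
sample_upload = io.BytesIO(b"size,price\n50,100\n80,160\n")
```

Calling save_upload(sample_upload, "some/path.csv") and reading the file back gives only the size and price columns.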

Step 6: Handling the "Profiling" Section

if choice == "Profiling":
    st.header("Exploratory Data Analysis")
    profile_df = df.profile_report()
    st_profile_report(profile_df)

In the "Profiling" section, the code generates an exploratory data analysis report using pandas profiling and displays it in the application. This provides a comprehensive overview of the dataset's characteristics.
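Profiling reports can take a while on large datasets. As a rough illustration of a few per-column statistics such a report includes (type, missing values, distinct values), here is a plain-pandas sketch. It is not how pandas profiling computes its report, and the helper name quick_profile is hypothetical.

```python
import pandas as pd


def quick_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing-value count and percentage, and distinct-value count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "distinct": df.nunique(dropna=True),  # NaN is not counted as a value
        }
    )
```

In the app it could be shown with st.dataframe(quick_profile(df)) as a fast preview before generating the full report.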

Step 7: Handling the "Modelling" Section

if choice == "Modelling":
    st.header("Modeling")
    chosen_target = st.selectbox('Choose the Target Column', df.columns)

Users can select the target column for their regression model in the "Modelling" section.
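Because this is a regression app, the target should be a numeric column; a text column would only be usable after label encoding, and its integer codes are not meaningful quantities to predict. One option, sketched here as an assumption rather than part of the original app, is to offer only numeric columns, e.g. st.selectbox('Choose the Target Column', numeric_columns(df)). The helper name numeric_columns is hypothetical.

```python
import pandas as pd


def numeric_columns(df: pd.DataFrame) -> list:
    """Return names of numeric-dtype columns (candidate regression targets)."""
    return df.select_dtypes(include=["number"]).columns.tolist()
```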

# Label-encode non-numeric (string/categorical) columns as integer codes
categorical_columns = df.select_dtypes(exclude=['number']).columns

for col in categorical_columns:
    df[col] = df[col].astype('category').cat.codes

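Before modeling, string and categorical columns are label-encoded: each distinct value is replaced by an integer category code. This is necessary because the setup call below uses preprocess=False, which turns off PyCaret's own preprocessing, including categorical encoding and missing-value imputation. Every feature must therefore already be numeric, and the dataset should not contain missing values.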
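It helps to see exactly what astype('category').cat.codes produces: each distinct value gets an integer in sorted category order, and missing values become -1. The codes therefore impose an arbitrary ordering (here blue < green < red), which tree-based models usually tolerate better than linear ones. The sample data below is made up for illustration.

```python
import pandas as pd

colors = pd.Series(["red", "blue", None, "green", "red"])
as_category = colors.astype("category")

print(list(as_category.cat.categories))  # ['blue', 'green', 'red'] (sorted)
print(as_category.cat.codes.tolist())    # [2, 0, -1, 1, 2]; the missing value -> -1
```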

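if st.button('Run Modeling'):
    # `silent` was removed in PyCaret 3.0, so it is not passed here
    setup(df, target=chosen_target, preprocess=False, fold_shuffle=True)
    setup_df = pull()  # table summarizing the experiment setup
    st.dataframe(setup_df)
    best_model = compare_models()  # cross-validates many regressors, returns the best
    compare_df = pull()  # leaderboard produced by compare_models
    st.dataframe(compare_df)
    save_model(best_model, 'best_model')  # writes best_model.pkl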

Upon clicking the "Run Modeling" button:

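PyCaret's setup function initializes the experiment: it splits the data into training and test sets and configures cross-validation, with fold_shuffle=True shuffling the rows before the folds are built.
compare_models trains the available regression models with cross-validation and returns the best one, ranked by R2 by default.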
The results of the setup and model comparison are displayed.
The best model is saved as best_model.pkl.
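To make the idea behind compare_models concrete, here is a much simplified scikit-learn sketch of the same pattern: cross-validate several candidate regressors on shuffled folds (the role fold_shuffle=True plays) and rank them by mean R2. This is not PyCaret's implementation, which evaluates many more models and metrics; the candidate list, fold count, and helper name rank_regressors are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor


def rank_regressors(X, y, n_splits=5, seed=42):
    """Return (model name, mean cross-validated R2) pairs, best first."""
    candidates = {
        "Linear Regression": LinearRegression(),
        "Ridge": Ridge(),
        "Decision Tree": DecisionTreeRegressor(random_state=seed),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    # Shuffled K-fold split, analogous to PyCaret's fold_shuffle=True
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
        for name, model in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Synthetic data for demonstration only
    X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
    for name, r2 in rank_regressors(X, y):
        print(f"{name:18} R2 = {r2:.3f}")
```

Sorting by a single metric mirrors compare_models' default behavior of picking the top row of its leaderboard.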

Step 8: Handling the "Download" Section

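if choice == "Download":
    if os.path.exists('best_model.pkl'):
        with open('best_model.pkl', 'rb') as f:
            st.download_button('Download Model', f, file_name="best_model.pkl")
    else:
        st.warning("No trained model yet. Run the Modelling step first.")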

In the "Download" section, users can download the trained model by clicking a button.

Step 9: Adding a Footer

# Footer
st.markdown("<hr>", unsafe_allow_html=True)
st.markdown("<p style='text-align: center; color: #999;'>C</p>", unsafe_allow_html=True)

A simple footer is added to the bottom of the page for a polished look.

Complete Code

import streamlit as st
import plotly.express as px  # not used below, but handy for adding charts
from pycaret.regression import setup, compare_models, pull, save_model
import pandas_profiling  # adds DataFrame.profile_report(); now published as ydata-profiling
import pandas as pd
from streamlit_pandas_profiling import st_profile_report
import os

# Set page config to wide layout (should be the first Streamlit command)
st.set_page_config(layout="wide")

# Streamlit re-runs this script on every interaction, so reload the saved dataset
df = None
if os.path.exists('./dataset.csv'):
    df = pd.read_csv('dataset.csv', index_col=None)

with st.sidebar:
    st.image("https://www.onepointltd.com/wp-content/uploads/2020/03/inno2.png", width=200)
    st.title("AutoML")
    choice = st.radio("Navigation", ["Upload", "Profiling", "Modelling", "Download"])
    st.info("This project application helps you build and explore your data.")

if choice == "Upload":
    st.header("Upload Your Dataset")
    file = st.file_uploader("Upload Your Dataset", type=["csv"])
    if file:
        df = pd.read_csv(file, index_col=None)
        df.to_csv('dataset.csv', index=False)
        st.dataframe(df)

if choice == "Profiling":
    st.header("Exploratory Data Analysis")
    if df is None:
        st.warning("Please upload a dataset first.")
    else:
        profile_df = df.profile_report()
        st_profile_report(profile_df)

if choice == "Modelling":
    st.header("Modeling")
    if df is None:
        st.warning("Please upload a dataset first.")
    else:
        chosen_target = st.selectbox('Choose the Target Column', df.columns)

        # Label-encode string and categorical columns as integer codes
        categorical_columns = df.select_dtypes(exclude=['number']).columns
        for col in categorical_columns:
            df[col] = df[col].astype('category').cat.codes

        if st.button('Run Modeling'):
            # `silent` was removed in PyCaret 3.0, so it is not passed here
            setup(df, target=chosen_target,
                  preprocess=False, fold_shuffle=True)
            setup_df = pull()
            st.dataframe(setup_df)
            best_model = compare_models()
            compare_df = pull()
            st.dataframe(compare_df)
            save_model(best_model, 'best_model')

if choice == "Download":
    if os.path.exists('best_model.pkl'):
        with open('best_model.pkl', 'rb') as f:
            st.download_button('Download Model', f, file_name="best_model.pkl")
    else:
        st.warning("No trained model yet. Run the Modelling step first.")

# Footer
st.markdown("<hr>", unsafe_allow_html=True)
st.markdown("<p style='text-align: center; color: #999;'>C</p>", unsafe_allow_html=True)

This tutorial has walked you through building an AutoML application using Streamlit, PyCaret, and pandas profiling. The application lets users upload datasets, profile their data, train and compare regression models, and download the best model, all within an intuitive interface. The setup is flexible and can be extended to other machine learning tasks (PyCaret also provides classification and clustering modules) or additional features.