Building a Table Question-Answering Application with Streamlit and TAPAS: A Step-by-Step Guide

Aug 15, 2024

In this tutorial, we'll walk through building a Table Question-Answering (QA) application using Python, Streamlit, and the TAPAS model from Hugging Face's Transformers library. The application lets users upload a CSV file, ask questions about the data in it, and receive answers drawn directly from the table, with the relevant cells highlighted for easy reference. Below, we'll explore the code step by step.

Introduction

In the realm of Natural Language Processing (NLP), Table Question-Answering lets users extract specific information from tabular data using natural language queries. TAPAS (short for Table Parser) is a BERT-based model that answers a question by selecting cells from the table and, optionally, an aggregation operator to apply to them. In this tutorial, we’ll use a checkpoint fine-tuned for this task and integrate it into a Streamlit application.

What is Streamlit?

Streamlit is an open-source framework that allows developers to create interactive web applications using Python. It’s particularly useful for building data-driven applications due to its simplicity and ability to integrate with various data science libraries. With Streamlit, you can quickly turn data scripts into shareable web apps.

What Can You Do with This Table QA Application?

This application enables users to:

Upload CSV files containing tabular data.
Ask natural language questions related to the data.
Receive precise answers, with relevant cells highlighted in the table.
Display and interact with the table directly in the application.

This tool is ideal for users who need to query large datasets without manual search or coding, making data analysis more accessible and efficient.

Step 1: Importing Necessary Libraries

import tensorflow.compat.v1 as tf
import os
import shutil
import csv
import sys
import pandas as pd
import numpy as np
import IPython
import streamlit as st
from itertools import islice
import random
from transformers import TapasTokenizer, TapasForQuestionAnswering

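In the section above, we import the libraries for the application:

tensorflow.compat.v1 is imported as tf but never called in the code shown. Importing it does not by itself suppress TensorFlow warnings, and the model below runs on PyTorch (return_tensors="pt"), so this import can likely be dropped.
os, shutil, csv, and sys are standard-library modules for file operations and system interactions; none of them is called in the code shown.
pandas and numpy are used for data manipulation and numerical operations.
IPython and islice (from itertools) are imported but not used in this code.
streamlit creates the web application.
random is used for generating random colors for highlighting answers.
transformers provides the TAPAS model and tokenizer for question-answering.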

Step 2: Loading the TAPAS Model and Tokenizer

model_name = 'google/tapas-base-finetuned-wtq'
model = TapasForQuestionAnswering.from_pretrained(model_name, local_files_only=False)
tokenizer = TapasTokenizer.from_pretrained(model_name)

Here, we load the TAPAS model and tokenizer:

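google/tapas-base-finetuned-wtq is the base-size TAPAS checkpoint fine-tuned on WikiTableQuestions (WTQ), a dataset for table-based question answering.
The model and tokenizer are initialized using the from_pretrained method, which downloads the weights on first use. local_files_only=False is the default and is spelled out here only for clarity.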

Step 3: Configuring Streamlit

st.set_option('deprecation.showfileUploaderEncoding', False)
st.title('Query your Table')
st.header('Upload CSV file')

We configure Streamlit to:

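Suppress a deprecation warning related to file uploader encoding. This option belongs to older Streamlit releases; if your installed version rejects it as an unrecognized option, the line can simply be removed.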
Set the title and header for the application.

Step 4: Handling File Upload and Displaying the Data

uploaded_file = st.file_uploader("Choose your CSV file", type='csv')
placeholder = st.empty()

if uploaded_file is not None:
    data = pd.read_csv(uploaded_file)
    data.replace(',', '', regex=True, inplace=True)
    if st.checkbox('Want to see the data?'):
        placeholder.dataframe(data)

In this section:

Users can upload a CSV file, which is then read into a pandas DataFrame.
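Commas are removed from every string cell (for example, "1,234" becomes "1234"), so that values can later be converted to numbers and so that answers joined with commas in Step 7 can be split apart again in Step 8.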
A checkbox allows users to view the uploaded data in the app.
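
To see why the comma removal matters, here is a small pandas-free sketch. The strip_commas helper and the sample values are illustrative assumptions, not part of the original app; the helper only imitates what the replace call does to a single cell.

```python
def strip_commas(value):
    """Imitate data.replace(',', '', regex=True) for one cell."""
    # The regex replace only rewrites string cells; other types pass through.
    return value.replace(',', '') if isinstance(value, str) else value


raw_cells = ['1,234', '56', 'Smith, John', 7]
clean_cells = [strip_commas(cell) for cell in raw_cells]
print(clean_cells)

# float('1,234') would raise ValueError; the cleaned value converts fine.
print(float(clean_cells[0]))
```

The trade-off is that commas inside text cells (such as names) are lost too, which changes how those cells are displayed in the answers.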

Step 5: Handling User Queries

st.header('Enter your queries')
input_queries = st.text_input('Type your queries separated by comma(,)', value='')
input_queries = input_queries.split(',')

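Users enter their queries in a single text box, and the string is split on commas so that several questions can be asked at once. As a consequence, a question that itself contains a comma is split into two queries.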
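
One wrinkle with a plain split: ''.split(',') returns [''], so an empty text box still produces one blank query, and spaces around each question are kept. A minimal sketch of a stricter parser follows; parse_queries is a hypothetical helper, not part of the original code.

```python
def parse_queries(raw_text):
    """Split a comma-separated string into a list of non-empty questions."""
    return [part.strip() for part in raw_text.split(',') if part.strip()]


print(parse_queries('total sales?, which city is largest? ,'))
print(parse_queries(''))
```

With a helper like this, the app could check for an empty list and skip the prediction step instead of sending a blank query to the model.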

Step 6: Generating Random Colors for Highlighting

colors1 = ["#" + ''.join([random.choice('0123456789ABCDEF') for j in range(6)]) for i in range(len(input_queries))]
colors2 = ['background-color:' + str(color) + '; color: black' for color in colors1]

Random colors are generated to highlight the answers in the table, making it easy to distinguish between different queries.
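
Because Streamlit reruns the whole script on every interaction, unseeded colors change each time the user clicks something. The sketch below wraps the two list comprehensions in functions and adds an optional seed so the colors can be made repeatable. The function names and the seed parameter are assumptions added for illustration.

```python
import random


def random_hex_colors(n, seed=None):
    """Return n random colors as '#RRGGBB' strings."""
    rng = random.Random(seed)  # a fixed seed gives the same colors on every run
    return ['#' + ''.join(rng.choice('0123456789ABCDEF') for _ in range(6))
            for _ in range(n)]


def to_css(colors):
    """Turn hex colors into the CSS strings used to style table cells."""
    return ['background-color:' + color + '; color: black' for color in colors]


colors1 = random_hex_colors(3, seed=42)
colors2 = to_css(colors1)
print(colors1)
print(colors2[0])
```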

Step 7: Predicting Answers and Highlighting Cells

if st.button('Predict Answers'):
    with st.spinner('It will take approx a minute'):
        table = data.astype(str)
        inputs = tokenizer(table=table, queries=input_queries, padding='max_length', truncation=True, return_tensors="pt")
        outputs = model(**inputs)
        predicted_answer_coordinates, predicted_aggregation_indices = tokenizer.convert_logits_to_predictions(inputs, outputs.logits.detach(), outputs.logits_aggregation.detach())

        id2aggregation = {0: "NONE", 1: "SUM", 2: "AVERAGE", 3: "COUNT"}
        aggregation_predictions_string = [id2aggregation[x] for x in predicted_aggregation_indices]

        answers = []
        for coordinates in predicted_answer_coordinates:
            if len(coordinates) == 1:
                answers.append(table.iat[coordinates[0]])
            else:
                cell_values = []
                for coordinate in coordinates:
                    cell_values.append(table.iat[coordinate])
                answers.append(", ".join(cell_values))

In this step:

Users click a button to predict answers based on their queries.
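The table is cast to strings, because the TAPAS tokenizer expects every cell to be text, and is tokenized together with the queries before being passed to the model.
convert_logits_to_predictions turns the model's logits into the coordinates of the relevant cells and an aggregation index for each query, which id2aggregation maps to NONE, SUM, AVERAGE, or COUNT.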
The answers are compiled from the predicted cells.
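
The last bullet can be tried without loading the model. In this sketch a nested list stands in for the DataFrame, and the coordinates are made up, assuming the tokenizer returns one list of (row, column) tuples per query. The cells_to_answers helper is hypothetical.

```python
def cells_to_answers(table_rows, predicted_coordinates):
    """Join the text of the selected cells into one answer string per query."""
    answers = []
    for coordinates in predicted_coordinates:
        values = [table_rows[row][col] for row, col in coordinates]
        # A single join covers one cell, several cells, and no cells at all.
        answers.append(', '.join(values))
    return answers


table_rows = [
    ['Paris', '2100000'],
    ['Lyon', '516000'],
    ['Nice', '342000'],
]
predicted_coordinates = [[(0, 0)], [(0, 1), (1, 1)], []]
print(cells_to_answers(table_rows, predicted_coordinates))
# → ['Paris', '2100000, 516000', '']
```

The third query shows that a query with no selected cells yields an empty string, which later numeric conversion has to be prepared for.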

Step 8: Displaying the Results

    # These lines continue inside the `if st.button('Predict Answers'):` block from Step 7
    st.success('Done! Check the answers below and the highlighted cells in the table above.')

    # styling_specific_cell must be defined earlier in the script; its definition is not part of this listing
    placeholder.dataframe(data.style.apply(styling_specific_cell, tags=predicted_answer_coordinates, colors=colors2, axis=None))

    for query, answer, predicted_agg, c in zip(input_queries, answers, aggregation_predictions_string, colors1):
        st.write('\n')
        st.markdown('<font color={} size=4>**{}**</font>'.format(c, query), unsafe_allow_html=True)
        st.write('\n')

        if predicted_agg == "NONE" or predicted_agg == 'COUNT':
            # COUNT shows the selected cells as they are, not their number
            st.markdown('**>** ' + str(answer))
        elif predicted_agg == 'SUM':
            st.markdown('**>** ' + str(sum(map(float, answer.split(',')))))
        else:  # AVERAGE
            st.markdown('**>** ' + str(np.round(np.mean(list(map(float, answer.split(',')))), 2)))
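
The call to styling_specific_cell above refers to a helper that the listing never defines. What follows is one possible sketch, not the author's implementation. It assumes tags is a list of (row, column) lists, one per query, and colors is the matching list of CSS strings. With axis=None, pandas passes the whole DataFrame to the callback and expects a same-shaped DataFrame of CSS strings back. build_style_grid is a hypothetical pure-Python helper that does the actual work.

```python
def build_style_grid(n_rows, n_cols, tags, colors):
    """Return an n_rows x n_cols grid of CSS strings, empty where nothing is selected."""
    grid = [['' for _ in range(n_cols)] for _ in range(n_rows)]
    for coordinates, css in zip(tags, colors):
        for row, col in coordinates:
            grid[row][col] = css  # on shared cells, the later query's color wins
    return grid


def styling_specific_cell(data, tags, colors):
    """Callback for data.style.apply(..., axis=None, tags=..., colors=...)."""
    import pandas as pd  # imported lazily; only needed when the Styler calls this
    grid = build_style_grid(data.shape[0], data.shape[1], tags, colors)
    return pd.DataFrame(grid, index=data.index, columns=data.columns)


# Pure-Python check with made-up coordinates and styles
tags = [[(0, 1)], [(1, 0), (1, 1)]]
colors = ['background-color:#AA0000; color: black',
          'background-color:#00AA00; color: black']
grid = build_style_grid(2, 2, tags, colors)
print(grid)
```

The function has to be defined above the button block in the script so that it exists when the Styler calls it.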

Finally:

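The application highlights the predicted cells in the table through the styling_specific_cell helper, whose definition is not included in the code shown.
Each query is displayed alongside its answer. For SUM and AVERAGE, the selected cells are converted to floats and aggregated; for NONE and COUNT, the cells are shown as they are, so a COUNT prediction lists the matching cells rather than their number.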
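
The aggregation branch can also be pulled into a standalone function, which makes it easier to test. The sketch below is a variation on the original logic under a few stated assumptions: apply_aggregation is a hypothetical name, COUNT returns the number of selected cells (the original shows the cells themselves), an empty answer is returned unchanged instead of raising an error, and the built-in round replaces np.round.

```python
def apply_aggregation(answer, aggregation):
    """Apply the predicted aggregation to a comma-joined answer string."""
    if aggregation == 'NONE' or not answer:
        return answer
    cells = [cell.strip() for cell in answer.split(',')]
    if aggregation == 'COUNT':
        return str(len(cells))
    numbers = [float(cell) for cell in cells]  # raises ValueError on non-numeric cells
    if aggregation == 'SUM':
        return str(sum(numbers))
    if aggregation == 'AVERAGE':
        return str(round(sum(numbers) / len(numbers), 2))
    raise ValueError('Unknown aggregation: ' + aggregation)


print(apply_aggregation('10, 20, 30', 'SUM'))      # → 60.0
print(apply_aggregation('10, 20, 30', 'AVERAGE'))  # → 20.0
print(apply_aggregation('10, 20, 30', 'COUNT'))    # → 3
print(apply_aggregation('Paris', 'NONE'))          # → Paris
```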

Templates

answer.html

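This template renders the answers in a user-friendly format using Bootstrap for styling. It uses Jinja syntax ({% for %} and {{ }}) and is not loaded by the Streamlit script above, so it needs a separate server-side framework to render it.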

<!DOCTYPE html>
<html>
<head>
    <title>Question-Answering App - Answers</title>
    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css">
</head>
<body>
    <div class="container">
        <h1 class="mt-4">Question-Answering App - Answers</h1>
        <ul class="mt-4">
            {% for answer in answers %}
                <li>{{ answer["answer"] }}</li>
            {% endfor %}
        </ul>
    </div>
</body>
</html>

index.html

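This template is used for the main page where users can upload files and submit their questions. Its form posts to an /upload endpoint and accepts .xlsx as well as .csv files, whereas the Streamlit uploader above accepts only CSV.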

<!DOCTYPE html>
<html>
<head>
    <title>Question-Answering App</title>
    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css">
</head>
<body>
    <div class="container">
        <h1 class="mt-4">Question-Answering App</h1>
        <form method="post" action="/upload" enctype="multipart/form-data">
            <div class="form-group mt-4">
                <label for="file">Upload File:</label>
                <input type="file" name="file" accept=".csv,.xlsx" class="form-control-file">
            </div>
            <div class="form-group mt-4">
                <label for="question">Question:</label>
                <textarea name="question" class="form-control" rows="4" required></textarea>
            </div>
            <button type="submit" class="btn btn-primary">Submit</button>
        </form>
    </div>
</body>
</html>

In this post, I explained step by step how to build a Table Question-Answering application using Streamlit and the TAPAS model from Hugging Face. The application enables users to upload a CSV file, ask questions, and receive answers directly from the table. The setup can be extended to handle more complex queries and larger datasets. Keep in mind that TAPAS inputs are limited to 512 tokens, so with truncation=True a large table is cut down to fit before the model sees it.


For the complete solution or any assistance with building a Table Question-Answering application with Streamlit and TAPAS, feel free to contact us.