
Blood Transfusion Service Center Data Set - Classification




Description :


This dataset was taken from a blood transfusion service center in Taiwan. It contains information about each blood donor, e.g. the number of months since the last donation, the number of donations made, the total volume of blood donated, and the number of months since the first donation. The dataset consists of 748 instances and 5 attributes. We can use it to predict whether a donor donated blood in March 2007.


Recommended Model :


Suggested algorithms include the TPOT classifier, logistic regression, etc.
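As a rough illustration of the logistic regression option, the sketch below fits a scaled logistic regression with scikit-learn and reports accuracy on a held-out split. The function name `fit_logistic_baseline`, the 75/25 split, and the seed are assumptions made for this example, and the demo frame is filled with random numbers (not the real data) purely to show the call. TPOT is not covered here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

TARGET = "whether he/she donated blood in March 2007"


def fit_logistic_baseline(data, target=TARGET, test_size=0.25, seed=0):
    """Fit a scaled logistic regression; return (model, held-out accuracy)."""
    X = data.drop(columns=[target])
    y = data[target]
    # Stratify so both classes appear in the train and test splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Synthetic stand-in frame with the dataset's column names.
# The values are random and only demonstrate how the function is called.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Recency (months)": rng.integers(0, 40, n),
    "Frequency (times)": rng.integers(1, 20, n),
    "Monetary (c.c. blood)": rng.integers(250, 5000, n),
    "Time (months)": rng.integers(2, 98, n),
    TARGET: rng.integers(0, 2, n),
})

model, accuracy = fit_logistic_baseline(demo)
print("Held-out accuracy on the synthetic frame: {:.3f}".format(accuracy))
```

With the real data, pass the loaded DataFrame instead of `demo`. Accuracy alone can be misleading if the two classes are unevenly represented, so it is best treated as a starting point rather than a final metric.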


Recommended Projects :


Predict whether a donor donated blood in March 2007.

Dataset link



Overview of data


Detailed overview of dataset

  • Records in the dataset = 748 rows

  • Columns in the dataset = 5 columns

  1. Recency (months) - The number of months since the most recent donation

  2. Frequency (times) - Total number of donations made by the donor

  3. Monetary (c.c. blood) - Total volume of blood the donor has donated, in c.c.

  4. Time (months) - Number of months since the donor's first donation

Target Variable

  1. whether he/she donated blood in March 2007 - A binary variable indicating whether the donor donated blood in March 2007 (0 = did not donate, 1 = donated)
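Because the target is binary, it can help to look at how the two classes are distributed before modelling. The following is a minimal sketch; the helper name `target_balance` and the four-row demo frame are made up for illustration and do not reflect the real class shares.

```python
import pandas as pd

TARGET = "whether he/she donated blood in March 2007"


def target_balance(data, target=TARGET):
    """Return the count and share of each class in the target column."""
    counts = data[target].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


# Tiny made-up frame, only to show the shape of the result.
demo = pd.DataFrame({TARGET: [0, 0, 0, 1]})
print(target_balance(demo))
```

On the real data, call `target_balance(blood_transfusion_data)` after loading the file.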


EDA[Code]


Blood donation Dataset



import pandas as pd
# Load Data
file_loc = "data\\transfusion.DATA"
blood_transfusion_data = pd.read_csv(file_loc)
blood_transfusion_data.head()



Total number of rows and column in the dataset.


# Number of Rows and columns 
rows_col = blood_transfusion_data.shape
print("Total number of Rows in the dataset : {}".format(rows_col[0]))
print("Total number of columns in the dataset : {}".format(rows_col[1]))



Dataset information

# Data information
blood_transfusion_data.info()


Check the number of missing values in the dataset.


# Check the number of missing values in each column
blood_transfusion_data.isna().sum()


Statistical information.


# Statistical information
blood_transfusion_data.describe()


Data Visualization


Correlation

import seaborn as sns
import matplotlib.pyplot as plt
# correlation
corr = blood_transfusion_data.corr()
corr.style.background_gradient(cmap='coolwarm')
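The full correlation matrix can be hard to scan, so one option is to pull out only the column for the target and rank the features by absolute correlation. This is a sketch under assumptions: `correlation_with_target` is a hypothetical helper, and the demo values are invented for illustration.

```python
import pandas as pd

TARGET = "whether he/she donated blood in March 2007"


def correlation_with_target(data, target=TARGET):
    """Pearson correlation of each feature with the target, strongest first."""
    corr = data.corr()[target].drop(target)
    # Order by absolute value but keep the sign in the result.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Invented values, only to demonstrate the call.
demo = pd.DataFrame({
    "Recency (months)": [1, 2, 3, 4],
    "Frequency (times)": [5, 3, 4, 1],
    TARGET: [1, 1, 0, 0],
})
print(correlation_with_target(demo))
```

Pearson correlation against a 0/1 target only captures linear association, so a weak value here does not rule out a useful feature.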



Plot the count plot of Target Variable

# 0 = did not donate, 1 = donated
sns.set_style("whitegrid")
plt.figure(figsize=(8,5))
sns.countplot(x="whether he/she donated blood in March 2007", data=blood_transfusion_data)


Count plot of Recency (months)

plt.figure(figsize=(8,5))
sns.countplot(x= "Recency (months)",data=blood_transfusion_data)


Count plot of Frequency (times)



plt.figure(figsize=(8,5))
sns.countplot(x= "Frequency (times)",data=blood_transfusion_data)

Count plot of Monetary (c.c. blood)


plt.figure(figsize=(18,5))
sns.countplot(x= "Monetary (c.c. blood)",data=blood_transfusion_data)


Count plot of Time (months)


plt.figure(figsize=(20,5))
sns.countplot(x= "Time (months)",data=blood_transfusion_data)


Box plot of each feature


# Box plot of every feature column (the last column is the target, so skip it)
sns.set_theme(style="whitegrid")
num_cols = blood_transfusion_data.columns[:-1]
for col in num_cols:
    plt.figure(figsize=(10,5))
    ax = sns.boxplot(x=blood_transfusion_data[col])
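Box plots mark points beyond 1.5 times the interquartile range (IQR) from the quartiles by default, so the same rule can be applied numerically to count how many values each plot would flag. The sketch below assumes that default; `iqr_outlier_counts` is a hypothetical helper and the demo column contains invented values.

```python
import pandas as pd


def iqr_outlier_counts(data, columns, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each column."""
    counts = {}
    for col in columns:
        q1, q3 = data[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        outside = (data[col] < q1 - k * iqr) | (data[col] > q3 + k * iqr)
        counts[col] = int(outside.sum())
    return pd.Series(counts, name="outliers")


# Invented values with one obviously extreme entry.
demo = pd.DataFrame({"Frequency (times)": [1, 2, 2, 3, 3, 4, 50]})
print(iqr_outlier_counts(demo, ["Frequency (times)"]))
```

On the real data, pass `blood_transfusion_data` together with the feature columns (every column except the target).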

