Description:
This dataset was taken from a blood transfusion service center in Taiwan. It contains information about each blood donor: the number of months since the last donation, the total number of donations, the total volume of blood donated, and the number of months since the first donation. The dataset consists of 748 instances and 5 attributes (4 features plus the binary target). It can be used to predict whether a donor donated blood in March 2007.
Recommended Models:
Suggested algorithms: TPOT classifier, logistic regression, etc.
Recommended Projects:
Predict whether a donor donated blood in March 2007.
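As a starting point for the logistic-regression option above, here is a minimal baseline sketch. It assumes scikit-learn is installed and that the DataFrame uses the UCI column names listed below; the helper `fit_baseline` and the toy data (including its labelling rule) are hypothetical and exist only so the sketch runs without the real file. It is a plain logistic regression, not TPOT.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

TARGET = "whether he/she donated blood in March 2007"
FEATURES = [
    "Recency (months)",
    "Frequency (times)",
    "Monetary (c.c. blood)",
    "Time (months)",
]


def fit_baseline(df: pd.DataFrame, seed: int = 42):
    """Hypothetical helper: fit a standardized logistic regression, return (model, held-out accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES],
        df[TARGET],
        test_size=0.25,
        stratify=df[TARGET],  # keep the class ratio the same in both splits
        random_state=seed,
    )
    # Scale features first: Monetary is in the thousands, Recency is in single digits
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Hypothetical toy data so the sketch runs without the real file;
# with the real data, call fit_baseline(blood_transfusion_data) instead.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Recency (months)": rng.integers(0, 30, n),
    "Frequency (times)": rng.integers(1, 20, n),
    "Monetary (c.c. blood)": rng.integers(250, 5000, n),
    "Time (months)": rng.integers(2, 100, n),
})
toy[TARGET] = (toy["Recency (months)"] < 6).astype(int)  # made-up labelling rule

model, acc = fit_baseline(toy)
print(f"Held-out accuracy: {acc:.2f}")
```

Stratifying the split matters here because the target is imbalanced; without it, a small test set could end up with very few donors.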
Dataset link
UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center
Overview of data
Detailed overview of dataset
Records in the dataset: 748 rows
Columns in the dataset: 5 columns
Recency (months) - The number of months since the most recent donation
Frequency (times) - Total number of blood donations made by the donor
Monetary (c.c. blood) - Total volume of blood donated by the donor, in c.c.
Time (months) - Number of months since the donor's first donation
Target Variable
whether he/she donated blood in March 2007 - A binary variable indicating whether the donor donated blood in March 2007 (0 = did not donate, 1 = donated)
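The column names above are long and contain spaces, slashes and parentheses, which makes them awkward to type repeatedly. A minimal sketch, assuming you prefer short identifiers: the short names and the helper `shorten_columns` are hypothetical choices, not part of the dataset, and the two example rows are made up.

```python
import pandas as pd

# Hypothetical short names mapped from the original UCI column names
SHORT_NAMES = {
    "Recency (months)": "recency",
    "Frequency (times)": "frequency",
    "Monetary (c.c. blood)": "monetary",
    "Time (months)": "time",
    "whether he/she donated blood in March 2007": "target",
}


def shorten_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the long column names replaced by short ones."""
    return df.rename(columns=SHORT_NAMES)


# Two made-up rows, only to show the effect
example = pd.DataFrame(
    [[3, 4, 1000, 20, 0], [10, 1, 250, 10, 1]],
    columns=list(SHORT_NAMES),
)
print(shorten_columns(example).columns.tolist())
# ['recency', 'frequency', 'monetary', 'time', 'target']
```

`rename` returns a new DataFrame, so the original names are still available if you need them for plot labels.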
Exploratory Data Analysis (EDA)
Loading the blood donation dataset
import pandas as pd
# Load the data (the UCI file is comma-separated and has a header row)
file_loc = "data/transfusion.data"
blood_transfusion_data = pd.read_csv(file_loc)
blood_transfusion_data.head()
Total number of rows and columns in the dataset.
# Number of Rows and columns
rows_col = blood_transfusion_data.shape
print("Total number of Rows in the dataset : {}".format(rows_col[0]))
print("Total number of columns in the dataset : {}".format(rows_col[1]))
Dataset information
# Data information
blood_transfusion_data.info()
Check the number of missing values in the dataset.
# Count the missing values in each column
blood_transfusion_data.isna().sum()
Statistical information.
# Statistical information
blood_transfusion_data.describe()
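`describe()` summarizes each column over all donors. Splitting the same statistics by the target shows which features differ between donors and non-donors. A minimal sketch using `groupby`; the helper `mean_by_target` and the toy rows are hypothetical, and on the real data you would pass `blood_transfusion_data`.

```python
import pandas as pd

TARGET = "whether he/she donated blood in March 2007"


def mean_by_target(df: pd.DataFrame) -> pd.DataFrame:
    """Mean of every feature, computed separately for non-donors (0) and donors (1)."""
    return df.groupby(TARGET).mean()


# Made-up rows for illustration only
toy = pd.DataFrame({
    "Recency (months)": [2, 4, 14, 23],
    "Frequency (times)": [10, 6, 2, 1],
    "Time (months)": [40, 30, 20, 23],
    TARGET: [1, 1, 0, 0],
})
print(mean_by_target(toy))
```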
Data Visualization
Correlation
import seaborn as sns
import matplotlib.pyplot as plt
# Pairwise (Pearson) correlation between all columns
corr = blood_transfusion_data.corr()
corr.style.background_gradient(cmap='coolwarm')
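When reading the correlation matrix, a quick way to test whether Monetary (c.c. blood) carries any information beyond Frequency (times) is to look at the ratio between them: if the ratio is a single constant, one column is just a rescaled copy of the other and their correlation is exactly 1. A minimal sketch; the helper `monetary_per_donation` is hypothetical, and the toy rows assume a fixed 250 c.c. per donation purely for illustration, so run it on `blood_transfusion_data` to see what the real data shows.

```python
import pandas as pd


def monetary_per_donation(df: pd.DataFrame) -> pd.Series:
    """Distinct values of Monetary / Frequency, with how often each occurs."""
    ratio = df["Monetary (c.c. blood)"] / df["Frequency (times)"]
    return ratio.value_counts()


# Made-up rows where every donation is 250 c.c.
toy = pd.DataFrame({
    "Frequency (times)": [1, 4, 10],
    "Monetary (c.c. blood)": [250, 1000, 2500],
})
print(monetary_per_donation(toy))
print(toy["Frequency (times)"].corr(toy["Monetary (c.c. blood)"]))
```

If the real data also yields a single ratio, keeping both columns in a linear model adds no information and only introduces collinearity.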
Count plot of the target variable
# Target: 0 = did not donate, 1 = donated
sns.set_style("whitegrid")
plt.figure(figsize=(8, 5))
sns.countplot(x="whether he/she donated blood in March 2007", data=blood_transfusion_data)
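The count plot shows raw counts; the class shares as numbers make the imbalance explicit, and the largest share is the accuracy of always predicting the majority class, which any model has to beat. A minimal sketch; the helper `class_balance` and the toy column are hypothetical.

```python
import pandas as pd

TARGET = "whether he/she donated blood in March 2007"


def class_balance(df: pd.DataFrame) -> pd.Series:
    """Share of each target class (0 and 1), sorted by class label."""
    return df[TARGET].value_counts(normalize=True).sort_index()


# Made-up target column for illustration only
toy = pd.DataFrame({TARGET: [0, 0, 0, 1]})
shares = class_balance(toy)
print(shares)
print(f"Majority-class baseline accuracy: {shares.max():.2f}")
```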
Count plot of Recency (months)
plt.figure(figsize=(8,5))
sns.countplot(x= "Recency (months)",data=blood_transfusion_data)
Count plot of Frequency (times)
plt.figure(figsize=(8,5))
sns.countplot(x= "Frequency (times)",data=blood_transfusion_data)
Count plot of Monetary (c.c. blood)
plt.figure(figsize=(18,5))
sns.countplot(x= "Monetary (c.c. blood)",data=blood_transfusion_data)
Count plot of Time (months)
plt.figure(figsize=(20,5))
sns.countplot(x= "Time (months)",data=blood_transfusion_data)
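A count plot draws one bar per distinct value, so columns with many distinct values (such as Time (months)) need very wide figures to stay readable. Grouping the values into fixed-width intervals with `pd.cut` gives a more compact summary. A minimal sketch; the helper `time_bins` and the 12-month width are arbitrary illustrative choices, and the toy values are made up.

```python
import pandas as pd


def time_bins(df: pd.DataFrame, width: int = 12) -> pd.Series:
    """Count donors per `width`-month interval [start, start + width) of Time (months)."""
    t = df["Time (months)"]
    # The last edge is the first multiple of `width` strictly above the maximum
    edges = list(range(0, int(t.max()) + width + 1, width))
    return pd.cut(t, bins=edges, right=False).value_counts().sort_index()


# Made-up values for illustration only
toy = pd.DataFrame({"Time (months)": [2, 5, 13, 30, 36]})
counts = time_bins(toy)
print(counts)
```

On the real data, `time_bins(blood_transfusion_data).plot(kind="bar")` would draw the binned counts.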
Box plots of the features
# All columns except the last one (the target variable)
num_cols = blood_transfusion_data.columns[:-1]
sns.set_theme(style="whitegrid")
for col in num_cols:
    plt.figure(figsize=(10, 5))
    ax = sns.boxplot(x=blood_transfusion_data[col])
    ax.set_title(col)
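The points drawn beyond the whiskers of a box plot are, with seaborn's default `whis=1.5`, the values more than 1.5 × IQR below the first quartile or above the third. Counting them per column puts numbers on what the plots show. A minimal sketch; the helper `iqr_outlier_counts` and the toy data are hypothetical, and on the real data you would pass `blood_transfusion_data[num_cols]` so the target column is excluded.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Per column, count values outside [Q1 - k*IQR, Q3 + k*IQR] (the box-plot whisker rule)."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # Comparisons broadcast the per-column bounds across every row
    outside = (df < q1 - k * iqr) | (df > q3 + k * iqr)
    return outside.sum()


# Made-up values: 50 is far above the rest of the Frequency column
toy = pd.DataFrame({
    "Frequency (times)": [1, 2, 2, 3, 3, 4, 50],
    "Recency (months)": [1, 2, 3, 4, 5, 6, 7],
})
print(iqr_outlier_counts(toy))
```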