Heart Disease data set - classification

Oct 30, 2021

Updated: Nov 3, 2021

Description :

The heart disease dataset is available on Kaggle and the UCI Machine Learning Repository. According to UCI, "This dataset contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date." The dataset can be used for classification: predicting whether a patient has heart disease from the patient's recorded clinical features.

Recommended Model :

Suitable algorithms include Logistic Regression, SVM, Naive Bayes, Random Forest, and neural networks.

Recommended Projects :

Predict whether a patient has heart disease from the patient's recorded features.
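As a rough illustration of the classification task, the sketch below fits a logistic regression with scikit-learn. It runs on synthetic data generated by `make_classification` (303 rows and 13 features, chosen only to mirror the dataset's shape), so the printed accuracy says nothing about the real heart data. The 80/20 split, the scaling step, and `max_iter=1000` are assumptions made for this example, not settings from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 303 rows and 13 features; NOT the heart data.
# With the real CSV, X would hold the 13 feature columns and y the label column.
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

# Hold out 20% of the rows for testing, keeping the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Standardise the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy on synthetic data: {accuracy:.3f}")
```

Swapping `LogisticRegression` for another scikit-learn estimator such as `SVC`, `GaussianNB`, or `RandomForestClassifier` should leave the rest of the sketch unchanged.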

Dataset link

Kaggle : https://www.kaggle.com/ronitf/heart-disease-uci

Overview of data

Detailed overview of dataset

Records in the dataset = 303 rows
Columns in the dataset = 14 columns

Age : Patient's age in years (continuous value)
Sex : Sex of the patient (1 - male, 0 - female)
CP : Chest pain type (1 - typical angina, 2 - atypical angina, 3 - non-anginal pain, 4 - asymptomatic)
Trestbps : Resting blood pressure (continuous value in mm Hg)
Chol : Serum cholesterol (continuous value in mg/dl)
FBS : Fasting blood sugar
Restecg : Resting electrocardiographic result
Thalach : Maximum heart rate achieved (continuous value)
Exang : Exercise-induced angina
Oldpeak : ST depression induced by exercise relative to rest
Slope : Slope of the peak exercise ST segment
Ca : Number of major vessels coloured by fluoroscopy
Thal : Defect type
Num : Diagnosis of heart disease (the code in this post refers to this column as target)
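
The coded columns are easier to read in tables and plots once the codes are mapped to labels. The sketch below does this for sex and chest pain using the codes from the attribute list above. The helper `add_labels` and the toy rows are hypothetical and exist only for illustration. Some CSV exports may number the chest pain categories differently, so it is worth checking `heart_data["cp"].unique()` before relying on the map.

```python
import pandas as pd

# Code-to-label maps taken from the attribute list above.
SEX_LABELS = {1: "male", 0: "female"}
CP_LABELS = {
    1: "typical angina",
    2: "atypical angina",
    3: "non-anginal pain",
    4: "asymptomatic",
}

def add_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with readable label columns next to the coded ones."""
    out = df.copy()
    out["sex_label"] = out["sex"].map(SEX_LABELS)
    # Codes that are missing from CP_LABELS become NaN instead of raising.
    out["cp_label"] = out["cp"].map(CP_LABELS)
    return out

# Made-up rows for illustration only; these are not records from the dataset.
toy = pd.DataFrame({"sex": [1, 0, 1], "cp": [1, 4, 2]})
labelled = add_labels(toy)
print(labelled)
```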

EDA[Code]

Dataset

import pandas as pd
# Load Data
file_loc = "data/heart.csv"  # forward slashes work on Windows, macOS and Linux
heart_data = pd.read_csv(file_loc)
heart_data.head()
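
It can help to confirm that the file has the expected columns right after loading, so that a wrong or renamed file fails early. The following is a minimal sketch: `load_heart` and `EXPECTED_COLUMNS` are hypothetical names, and the lower-case column names are an assumption based on the names used in the plotting code in this post. The single CSV row is made up so the example runs without the real file.

```python
import io
import pandas as pd

# Assumed column names, following the lower-case names used in this post's plots.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]

def load_heart(source) -> pd.DataFrame:
    """Read the CSV from a path or file-like object and verify its columns."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return df

# Made-up single row, used only to exercise the function without the real file.
sample_csv = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "50,1,2,130,240,0,1,150,0,1.0,1,0,2,1\n"
)
sample = load_heart(sample_csv)
print(sample.shape)
```

With the real file, the call would be `load_heart("data/heart.csv")`.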

Total number of rows and columns

# Number of Rows and columns 
rows_col = heart_data.shape
print("Total number of Rows in the dataset : {}".format(rows_col[0]))
print("Total number of columns in the dataset : {}".format(rows_col[1]))

Check Details

# Data information
heart_data.info()

Check Missing values

# Missing Values
heart_data.isna().sum()
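
Raw counts are easier to judge next to the share of rows they represent. The sketch below builds a small report with both. `missing_report` is a hypothetical helper and the toy values are made up; they are not taken from the dataset.

```python
import numpy as np
import pandas as pd

def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return report.sort_values("missing", ascending=False)

# Made-up values for illustration; not taken from the dataset.
toy = pd.DataFrame({
    "age": [63, 45, np.nan, 58],
    "chol": [233, np.nan, np.nan, 210],
    "sex": [1, 0, 1, 1],
})
report = missing_report(toy)
print(report)
```

On the real data the call would be `missing_report(heart_data)`.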

Statistical information

# Statistical information
heart_data.describe()

Data Visualization

Correlation

import seaborn as sns
import matplotlib.pyplot as plt
# correlation
corr = heart_data.corr()
corr.style.background_gradient(cmap='coolwarm')
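
The full correlation matrix is large, so a common follow-up is to look only at how each feature correlates with the label. The sketch below ranks columns by the absolute value of their Pearson correlation with a `target` column. `rank_by_target_correlation` is a hypothetical helper, and the data is randomly generated for illustration (one column built to follow the label, one pure noise), not the heart data.

```python
import numpy as np
import pandas as pd

def rank_by_target_correlation(df: pd.DataFrame, target: str = "target") -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

# Randomly generated stand-in data; NOT the heart dataset.
rng = np.random.default_rng(0)
label = rng.integers(0, 2, size=200)
toy = pd.DataFrame({
    "signal": label + rng.normal(0, 0.5, size=200),  # built to follow the label
    "noise": rng.normal(size=200),                   # unrelated to the label
    "target": label,
})
ranked = rank_by_target_correlation(toy)
print(ranked)
```

On the real data the call would be `rank_by_target_correlation(heart_data)`. Correlation only captures linear association, so a low value does not rule out a useful feature.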

Count of heart disease patients

# target: 0 = no heart disease, 1 = heart disease
sns.set_style("whitegrid")
plt.figure(figsize=(8,5))
sns.countplot(x= "target",data=heart_data)
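
The count plot shows the class balance visually; the same information as numbers is useful when choosing an evaluation metric. A minimal sketch, where `class_balance` is a hypothetical helper and the toy labels are made up:

```python
import pandas as pd

def class_balance(labels: pd.Series) -> pd.DataFrame:
    """Number of rows and share of the total for each class label."""
    counts = labels.value_counts().sort_index()
    return pd.DataFrame({
        "count": counts,
        "share": (counts / counts.sum()).round(3),
    })

# Made-up labels for illustration; not taken from the dataset.
toy_target = pd.Series([1, 0, 1, 1, 0, 1], name="target")
balance = class_balance(toy_target)
print(balance)
```

On the real data the call would be `class_balance(heart_data["target"])`.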

Count of male and female patients

# Sex: 1 = male, 0 = female
plt.figure(figsize=(8,5))
sns.countplot(x= "sex",data=heart_data)

Count plot of chest pain

# Chest pain
plt.figure(figsize=(8,5))
sns.countplot(x= "cp",data=heart_data)

Count plot of fasting blood sugar

# fasting blood sugar
plt.figure(figsize=(8,5))
sns.countplot(x= "fbs",data=heart_data)

Count plot of resting electrocardiographic result

# resting electrocardiographic results
plt.figure(figsize=(8,5))
sns.countplot(x= "restecg",data=heart_data)

Exercise induced angina count

# exercise induced angina 0 - no  1 - yes
plt.figure(figsize=(8,5))
sns.countplot(x= "exang",data=heart_data)

Count plot of thal

plt.figure(figsize=(8,5))
sns.countplot(x= "thal",data=heart_data)

Histogram plot of age

# Histogram 
plt.figure(figsize=(8,5))
sns.histplot(x="age",data=heart_data)
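
To relate age to the diagnosis, the ages can be grouped into bands and cross-tabulated against the label. The sketch below is one way to do that. `age_band_table` is a hypothetical helper, the ten-year bands are an arbitrary choice, and the toy rows are made up rather than taken from the dataset.

```python
import pandas as pd

def age_band_table(df: pd.DataFrame, bins=(20, 40, 50, 60, 70, 80)) -> pd.DataFrame:
    """Cross-tabulate age bands (left-inclusive) against the target label."""
    # Labels such as "50-59" assume whole-number ages.
    labels = [f"{lo}-{hi - 1}" for lo, hi in zip(bins[:-1], bins[1:])]
    bands = pd.cut(df["age"], bins=list(bins), right=False, labels=labels)
    # Plain strings keep the result's index simple to look up.
    bands = bands.astype(str).rename("age_band")
    return pd.crosstab(bands, df["target"])

# Made-up rows for illustration; not records from the dataset.
toy = pd.DataFrame({
    "age": [35, 45, 52, 58, 61, 67],
    "target": [0, 0, 1, 1, 1, 0],
})
table = age_band_table(toy)
print(table)
```

On the real data the call would be `age_band_table(heart_data)`. Ages outside the chosen bins end up in a band labelled "nan", so the bin edges should cover the observed age range.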

Other related data

Occupancy Detection Data Set - Classification

Census income Data Set - Classification

Wholesale customer - Classification and Clustering

Online retail dataset - Classification, Clustering and Regression

Cervical Cancer Risk Factor Dataset - Classification and Clustering

Blood Transfusion service center dataset - Classification

Divorce Predictor Dataset - Classification

Fire Forest Dataset - Regression

Student performance dataset - Classification and Regression
