Description :
The heart disease dataset is available on Kaggle and in the UCI Machine Learning Repository. According to UCI, "This dataset contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date." The dataset is suited to binary classification: predicting whether a patient has heart disease from a set of clinical features.
Recommended Models :
Candidate algorithms include Logistic Regression, SVM, Naive Bayes, Random Forest, and neural networks.
Recommended Projects :
Predict whether a patient has heart disease from the patient's clinical features.
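As a rough sketch of how the recommended algorithms could be compared on this task, the snippet below cross-validates four of them with scikit-learn. The helper name `compare_models`, the choice of 5-fold accuracy, and the feature scaling are assumptions made for illustration, and the data is a synthetic stand-in generated by `make_classification`, not the heart disease dataset. A neural network is left out to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, cv=5):
    """Return the mean cross-validated accuracy of each candidate classifier."""
    models = {
        # Scaling helps the linear model and the SVM; the others do not need it.
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "Naive Bayes": GaussianNB(),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


# Synthetic stand-in data (NOT the heart disease dataset): 303 rows, 13 features.
X_demo, y_demo = make_classification(n_samples=303, n_features=13, random_state=0)
scores = compare_models(X_demo, y_demo)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

With the real data, `X` would be the feature columns and `y` the diagnosis column, for example `heart_data.drop(columns="target")` and `heart_data["target"]` once the CSV has been loaded.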
Dataset link
Kaggle : https://www.kaggle.com/ronitf/heart-disease-uci
Overview of data
Detailed overview of dataset
Records in the dataset = 303 rows
Columns in the dataset = 14 columns
Age : Patient’s age in years (continuous value)
Sex : Sex of the patient (1 - male, 0 - female)
CP : Chest pain type (1 - typical angina, 2 - atypical angina, 3 - non-anginal pain, 4 - asymptomatic)
Trestbps : Resting blood pressure (continuous value in mm Hg)
Chol : Serum cholesterol (continuous value in mg/dl)
FBS : Fasting blood sugar > 120 mg/dl (1 - true, 0 - false)
Restecg : Resting electrocardiographic result
Thalach : Maximum heart rate achieved (continuous value)
Exang : Exercise-induced angina (1 - yes, 0 - no)
Oldpeak : ST depression induced by exercise relative to rest
Slope : Slope of the peak exercise ST segment
Ca : Number of major vessels coloured by fluoroscopy
Thal : Defect type
Num : Diagnosis of heart disease (this column is named "target" in the CSV used in the code below)
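The coded columns are easier to read in tables and plots once the codes are mapped to labels. The following is a minimal sketch using the codes from the list above; the helper `add_labels` and the toy rows are made up for illustration. Some copies of this dataset number the chest pain types differently, so it is worth checking `heart_data["cp"].unique()` before applying the mapping.

```python
import pandas as pd

# Code-to-label mappings taken from the column descriptions above.
SEX_LABELS = {1: "male", 0: "female"}
CP_LABELS = {
    1: "typical angina",
    2: "atypical angina",
    3: "non-anginal pain",
    4: "asymptomatic",
}


def add_labels(df):
    """Return a copy of df with readable label columns next to the coded ones."""
    out = df.copy()
    out["sex_label"] = out["sex"].map(SEX_LABELS)
    out["cp_label"] = out["cp"].map(CP_LABELS)  # codes not in the dict become NaN
    return out


# Made-up rows for illustration only.
toy = pd.DataFrame({"sex": [1, 0, 1], "cp": [1, 4, 2]})
labelled = add_labels(toy)
print(labelled)
```

Because unmapped codes turn into NaN, a mismatch between the dictionary and the file shows up immediately in the new columns.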
EDA [Code]
Dataset
import pandas as pd
# Load Data
file_loc = "data/heart.csv"  # forward slashes work on Windows, macOS, and Linux
heart_data = pd.read_csv(file_loc)
heart_data.head()
Total number of rows and columns
# Number of Rows and columns
rows_col = heart_data.shape
print("Total number of Rows in the dataset : {}".format(rows_col[0]))
print("Total number of columns in the dataset : {}".format(rows_col[1]))
Check Details
# Data information
heart_data.info()
Check Missing values
# Missing Values
heart_data.isna().sum()
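If the check above reports missing values, one possible follow-up is to summarise them as percentages and fill the numeric gaps. This is only a sketch: the helper names and the median-fill strategy are assumptions, not a step from the original walkthrough, and the toy values are made up.

```python
import numpy as np
import pandas as pd


def missing_report(df):
    """Count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    report = pd.DataFrame(
        {"missing": counts, "percent": (counts / len(df) * 100).round(1)}
    )
    return report.sort_values("missing", ascending=False)


def fill_numeric_with_median(df):
    """Return a copy of df with numeric NaNs replaced by each column's median."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out


# Made-up values for illustration only.
toy = pd.DataFrame(
    {"age": [63, np.nan, 41, 56], "chol": [233, 250, np.nan, np.nan]}
)
report = missing_report(toy)
filled = fill_numeric_with_median(toy)
print(report)
print(filled)
```

Median filling is a simple default; dropping rows or using a model-based imputer may suit some analyses better.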
Statistical information
# Statistical information
heart_data.describe()
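`describe()` summarises the whole table at once. For a classification problem it can also help to compare the averages of each class side by side. A minimal sketch, assuming the diagnosis column is called `target` as in the plots later on; the helper name and toy values are made up.

```python
import pandas as pd


def mean_by_target(df, target="target"):
    """Mean of every numeric column, computed separately for each target class."""
    return df.groupby(target).mean(numeric_only=True)


# Made-up values for illustration only.
toy = pd.DataFrame(
    {
        "age": [50, 60, 40, 70],
        "chol": [200, 240, 180, 260],
        "target": [0, 1, 0, 1],
    }
)
summary = mean_by_target(toy)
print(summary)
```

On the real data the call would be `mean_by_target(heart_data)`.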
Data Visualization
Correlation
import seaborn as sns
import matplotlib.pyplot as plt
# Correlation matrix of all numeric columns
corr = heart_data.corr()
# Colour-coded table (renders in a Jupyter notebook)
corr.style.background_gradient(cmap='coolwarm')
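The full correlation matrix can be hard to scan. One way to focus it is to rank the features by the strength of their correlation with the diagnosis column. The sketch below assumes that column is named `target`; the helper name and the toy values are made up for illustration.

```python
import pandas as pd


def rank_by_target_correlation(df, target="target"):
    """Correlation of each feature with the target, strongest (by absolute value) first."""
    corr_with_target = df.corr()[target].drop(target)
    order = corr_with_target.abs().sort_values(ascending=False).index
    return corr_with_target.reindex(order)


# Made-up values for illustration only.
toy = pd.DataFrame(
    {
        "age": [40, 50, 60, 70],
        "fbs": [1, 0, 1, 0],
        "thalach": [170, 170, 120, 120],
        "target": [0, 0, 1, 1],
    }
)
ranked = rank_by_target_correlation(toy)
print(ranked)
```

The sign is kept in the result, so a strongly negative relationship ranks as high as a strongly positive one but remains distinguishable.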
Count of patients with and without heart disease
# target: 0 - no heart disease, 1 - heart disease
sns.set_style("whitegrid")
plt.figure(figsize=(8,5))
sns.countplot(x= "target",data=heart_data)
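The same information as the count plot can be printed as numbers, which makes it easy to see how balanced the two classes are. A small sketch; the helper name and the toy values are made up.

```python
import pandas as pd


def class_balance(df, target="target"):
    """Count and share of each target class."""
    counts = df[target].value_counts().sort_index()
    return pd.DataFrame(
        {"count": counts, "share": (counts / counts.sum()).round(3)}
    )


# Made-up values for illustration only.
toy = pd.DataFrame({"target": [1, 1, 1, 0, 0]})
balance = class_balance(toy)
print(balance)
```

A strongly uneven split would suggest looking beyond plain accuracy when the models are evaluated.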
Count of male and female patients
# sex: 1 - male, 0 - female
plt.figure(figsize=(8,5))
sns.countplot(x= "sex",data=heart_data)
Count plot of chest pain
# Chest pain
plt.figure(figsize=(8,5))
sns.countplot(x= "cp",data=heart_data)
Count plot of fasting blood sugar
# fasting blood sugar
plt.figure(figsize=(8,5))
sns.countplot(x= "fbs",data=heart_data)
Count plot of resting electrocardiographic result
# resting electrocardiographic results
plt.figure(figsize=(8,5))
sns.countplot(x= "restecg",data=heart_data)
Exercise induced angina count
# exercise induced angina 0 - no 1 - yes
plt.figure(figsize=(8,5))
sns.countplot(x= "exang",data=heart_data)
Count plot of thal
plt.figure(figsize=(8,5))
sns.countplot(x= "thal",data=heart_data)
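The count plots above show how often each category occurs, but not how each category relates to the diagnosis. A row-normalised cross-tabulation is one way to see that. The sketch below assumes the diagnosis column is named `target`; the helper name and the toy values are made up.

```python
import pandas as pd


def disease_rate_by_category(df, column, target="target"):
    """Share of each target class within each category of the given column."""
    return pd.crosstab(df[column], df[target], normalize="index")


# Made-up values for illustration only.
toy = pd.DataFrame(
    {"sex": [1, 1, 1, 1, 0, 0], "target": [1, 0, 0, 0, 1, 1]}
)
rates = disease_rate_by_category(toy, "sex")
print(rates)
```

The same call works for any of the categorical columns plotted above, for example `disease_rate_by_category(heart_data, "cp")`.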
Histogram plot of age
# Histogram
plt.figure(figsize=(8,5))
sns.histplot(x="age",data=heart_data)
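To read exact counts off the age distribution, the ages can be grouped into bands. A minimal sketch; the band edges, the helper name, and the toy ages are arbitrary choices made for illustration.

```python
import pandas as pd


def age_band_counts(df, bins=(20, 40, 50, 60, 70, 80)):
    """Number of patients per age band; each band includes its lower edge only."""
    bands = pd.cut(df["age"], bins=list(bins), right=False)
    return bands.value_counts().sort_index()


# Made-up ages for illustration only.
toy = pd.DataFrame({"age": [29, 45, 52, 58, 61, 77]})
counts = age_band_counts(toy)
print(counts)
```

Ages outside the chosen edges are not assigned to any band, so the edges should be widened if the data contains younger or older patients.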