Titanic Survival Exploration Using Python Machine Learning
In 1912, the RMS Titanic struck an iceberg on its maiden voyage and sank, and most of the people on board died. In this exploration we use the passenger data to predict which passengers survived and which did not.
Let's Get Started
To begin, we import the required libraries and load the data into a pandas DataFrame.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#from titanic_visualizations import survival_stats
%matplotlib inline
# Load the dataset
in_file = 'Q2Titanicdata/titanic.csv'
data = pd.read_csv(in_file)
# Print the first few entries of the RMS Titanic data
data.head()
Calling data.head() displays the first five rows of the dataset. The Titanic data contains the following features:
Survived: Outcome of survival (0 = No; 1 = Yes)
Pclass: Socio-economic class (1 = Upper class; 2 = Middle class; 3 = Lower class)
Name: Name of passenger
Sex: Sex of the passenger
Age: Age of the passenger (Some entries contain NaN)
SibSp: Number of siblings and spouses of the passenger aboard
Parch: Number of parents and children of the passenger aboard
Ticket: Ticket number of the passenger
Fare: Fare paid by the passenger
Cabin: Cabin number of the passenger (Some entries contain NaN)
Embarked: Port of embarkation of the passenger (C = Cherbourg; Q = Queenstown; S = Southampton)
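Two of the features above are described as containing NaN. Counting missing values per column is a quick way to confirm this. The sketch below is a minimal illustration that uses a small made-up DataFrame named `sample` in place of the real file; on the actual dataset the equivalent call would be `data.isna().sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row sample with some of the columns described above;
# the values are made up for illustration and are not taken from titanic.csv.
sample = pd.DataFrame({
    'Pclass': [3, 1, 3, 2],
    'Sex': ['male', 'female', 'female', 'male'],
    'Age': [22.0, np.nan, 26.0, 35.0],
    'Cabin': [np.nan, 'C85', np.nan, np.nan],
    'Embarked': ['S', 'C', 'S', 'Q'],
})

# isna() marks each missing entry as True; sum() counts them per column.
missing = sample.isna().sum()
print(missing)
```

Columns with a non-zero count, such as Age and Cabin here, are the ones that need care when they are used in a prediction rule.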
Next, we separate the target variable, Survived, from the rest of the features:
# Store 'Survived' in a new target variable and remove it from data
outcomes = data['Survived']
data = data.drop('Survived', axis=1)
data.head()
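The select-then-drop step can also be written as a single call. `DataFrame.pop` removes a column from the DataFrame in place and returns it as a Series. This is a minimal sketch on a made-up three-row DataFrame, not the real file.

```python
import pandas as pd

# Hypothetical three-row stand-in for the Titanic data (made-up values).
data = pd.DataFrame({
    'Survived': [0, 1, 1],
    'Sex': ['male', 'female', 'female'],
    'Age': [22.0, 38.0, 26.0],
})

# pop() returns the 'Survived' column and deletes it from `data` in place.
outcomes = data.pop('Survived')

print(outcomes.tolist())    # [0, 1, 1]
print(list(data.columns))   # ['Sex', 'Age']
```

Both approaches leave the same `outcomes` Series and feature DataFrame. `drop` returns a new DataFrame, while `pop` mutates the existing one.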
Next, we define a function that calculates the accuracy of our predictions against the true survival outcomes:
def accuracy_score(truth, pred):
    """ Returns accuracy score for input truth and predictions. """
    # Ensure that the number of predictions matches number of outcomes
    if len(truth) == len(pred):
        # Calculate and return the accuracy as a percent
        return "Predictions have an accuracy of {:.2f}.".format((truth == pred).mean() * 100)
    else:
        return "Number of predictions does not match number of outcomes!"

# Test input: five predictions that the passenger survived (1)
predictions = pd.Series(np.ones(5, dtype=int))
predictions
Output:
0 1
1 1
2 1
3 1
4 1
dtype: int32
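The accuracy is the mean of the element-wise comparison `truth == pred`, multiplied by 100. The worked example below illustrates this on five made-up outcomes. `accuracy_percent` is a hypothetical helper that performs the same comparison as `accuracy_score` but returns the number instead of a formatted string, and raises an error on a length mismatch. This makes the result easier to compare or assert on.

```python
import numpy as np
import pandas as pd

def accuracy_percent(truth, pred):
    """Hypothetical helper: same comparison as accuracy_score, returned as a float."""
    if len(truth) != len(pred):
        raise ValueError("Number of predictions does not match number of outcomes!")
    # (truth == pred) is a boolean Series; its mean is the fraction of matches.
    return (truth == pred).mean() * 100

# Made-up outcomes for five passengers (not the real first five rows).
truth = pd.Series([0, 1, 1, 1, 0])
pred = pd.Series(np.ones(5, dtype=int))

# 3 of the 5 predictions match, so the accuracy is 3 / 5 * 100.
print("{:.2f}".format(accuracy_percent(truth, pred)))  # 60.00
```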
Next, we make a baseline prediction that assumes no passenger survived:
# Assume that no passenger of the RMS Titanic survived
def predictions_0(data):
    predictions = []
    for index, passenger in data.iterrows():
        # Predict that 'passenger' did not survive
        predictions.append(0)
    return pd.Series(predictions)

# Make the predictions
predictions = predictions_0(data)
predictions.head()
Output:
0 0
1 0
2 0
3 0
4 0
dtype: int64
Now we compute the accuracy of this prediction:
print(accuracy_score(outcomes, predictions))
Output:
Predictions have an accuracy of 61.62.
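Predicting 0 for everyone is correct exactly for the passengers who did not survive. The 61.62 above is therefore the percentage of non-survivors in the data. The minimal sketch below checks this on ten made-up outcomes, not the real file. It also replaces the `iterrows()` loop of `predictions_0` with a constant Series that shares the outcomes' index.

```python
import pandas as pd

# Made-up outcomes for ten passengers: six did not survive (0), four did (1).
outcomes = pd.Series([0, 1, 0, 0, 1, 0, 1, 0, 0, 1])

# Vectorized equivalent of predictions_0: a Series of zeros with the same index.
predictions = pd.Series(0, index=outcomes.index)

accuracy = (outcomes == predictions).mean() * 100
death_rate = (outcomes == 0).mean() * 100

print("{:.2f} {:.2f}".format(accuracy, death_rate))  # 60.00 60.00
```

Any rule worth keeping therefore has to beat this baseline, which here is simply the share of the majority class.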
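Next, we plot the number of male and female passengers as a bar chart:
# Count the 'male' and 'female' passengers in the dataset
counts = data['Sex'].value_counts()
# Plot the bar graph; using the counts' own index as labels keeps each bar matched to its label
plt.bar(counts.index, counts.values)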
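The bar chart shows how many passengers of each sex there were, but not how often each group survived. Because `Survived` was already moved into `outcomes`, the survival rate per sex can be computed by grouping the outcomes by `data['Sex']`. The two objects align on their shared index. This is a minimal sketch with hypothetical stand-ins `sample_data` and `sample_outcomes` holding made-up values. On the real data, the analogous call would be `outcomes.groupby(data['Sex']).mean()`.

```python
import pandas as pd

# Hypothetical stand-ins for `data` (features) and `outcomes` (target);
# the eight rows are made up for illustration.
sample_data = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male', 'male', 'female', 'male', 'female'],
})
sample_outcomes = pd.Series([0, 1, 1, 0, 1, 1, 0, 0], name='Survived')

# Group the 0/1 outcomes by sex; each group's mean is its survival rate.
survival_rate = sample_outcomes.groupby(sample_data['Sex']).mean()
print(survival_rate)
```

A large gap between the two rates is what motivates a rule based on sex.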
Next, we predict that every female passenger survived and every male passenger did not.
How accurate would a prediction be that all female passengers survived and the remaining passengers did not survive?
def predictions_1(data):
    # Create a list of predictions
    predictions = []
    for index, passenger in data.iterrows():
        # Predict survival (1) for female passengers and death (0) otherwise
        if passenger['Sex'] == 'female':
            predictions.append(1)
        else:
            predictions.append(0)
    # Return our predictions
    return pd.Series(predictions)

# Make the predictions
predictions = predictions_1(data)
predictions.head()
Output:
0 0
1 1
2 1
3 1
4 0
dtype: int64
print(accuracy_score(outcomes, predictions))
Output:
Predictions have an accuracy of 78.68.
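The same rule can be expressed without looping over rows. Comparing the `Sex` column to `'female'` yields a boolean Series, and `astype(int)` maps True/False to 1/0. This is a minimal sketch on a made-up five-passenger DataFrame. `predictions_1_vectorized` is a hypothetical name, not part of the original code.

```python
import pandas as pd

# Hypothetical five-passenger sample (made-up values).
data = pd.DataFrame({'Sex': ['male', 'female', 'female', 'female', 'male']})

def predictions_1_vectorized(data):
    """Predict 1 (survived) for female passengers and 0 otherwise, without iterrows()."""
    # Boolean comparison per row, then True -> 1 and False -> 0.
    return (data['Sex'] == 'female').astype(int)

print(predictions_1_vectorized(data).tolist())  # [0, 1, 1, 1, 0]
```

One difference from `predictions_1` is that the result keeps `data`'s index instead of building a fresh 0..n-1 index. The two match when `data` has the default index that `pd.read_csv` creates.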
How accurate would a prediction be that all female passengers, and all male passengers younger than 10, survived?
def predictions_2(data):
    predictions = []
    for index, passenger in data.iterrows():
        if passenger['Sex'] == 'female':
            predictions.append(1)
        elif passenger['Sex'] == 'male' and passenger['Age'] < 10:
            # Boys younger than 10 are also predicted to survive
            predictions.append(1)
        else:
            predictions.append(0)
    # Return our predictions
    return pd.Series(predictions)

# Make the predictions
predictions = predictions_2(data)
predictions.head(20)
Output:
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 1
8 1
9 1
10 1
11 1
12 0
13 0
14 1
15 1
16 1
17 0
18 1
19 1
dtype: int64
print(accuracy_score(outcomes, predictions))
Output:
Predictions have an accuracy of 79.35.
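Because Age contains NaN for some passengers, it is worth knowing how the rule treats them. Any comparison with NaN, such as `NaN < 10`, evaluates to False. A male passenger with an unknown age therefore falls through to the "did not survive" branch. The sketch below is a vectorized version of the same rule on a made-up sample that includes missing ages. `predictions_2_vectorized` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample, including passengers whose age is unknown (NaN).
data = pd.DataFrame({
    'Sex': ['male', 'female', 'male', 'male', 'female'],
    'Age': [22.0, 38.0, 4.0, np.nan, np.nan],
})

def predictions_2_vectorized(data):
    """Predict survival for all females and for males younger than 10."""
    is_female = data['Sex'] == 'female'
    # NaN < 10 is False, so males with a missing age are predicted 0,
    # the same outcome as the loop in predictions_2.
    young_male = (data['Sex'] == 'male') & (data['Age'] < 10)
    return (is_female | young_male).astype(int)

print(predictions_2_vectorized(data).tolist())  # [0, 1, 1, 0, 1]
```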