Machine Learning Code to Predict the Boston Housing Prices using Pandas Dataframe
In this, you will evaluate the performance and predictive of data collected from homes in the suburbs of Boston, Massachusetts.
Now we load the datasets and its related libraries:
# Import libraries necessary for this project
import numpy as np
import pandas as pd
import visuals as vs # Supplementary code
from sklearn.model_selection import ShuffleSplit
# Pretty display for notebooks
%matplotlib inline
# Load the Boston housing dataset
data = pd.read_csv('housing.csv')
prices = data['MEDV']
features = data.drop('MEDV', axis = 1)
# Success
print('Boston housing dataset has {0} data points with {1} variables each'.format(*data.shape))
Now we will see the output of this like:
Boston housing dataset has 489 data points with 4 variables each
Now next step is Data Exploration:
In this first section of this project, you will make a cursory investigation about the Boston housing data and provide your observations. Familiarizing yourself with the data through an explorative process is a fundamental practice to help you better understand and justify your results.
Since the main goal of this project is to construct a working model that has the capability of predicting the value of houses, we will need to separate the dataset into features and the target variable. The features, 'RM', 'LSTAT', and 'PTRATIO', give us quantitative information about each data point. The target variable, 'MEDV', will be the variable we seek to predict. These are stored in features and prices, respectively.
# FIND: Minimum price of the data
minimum_price = np.min(prices)
# TODO: Maximum price of the data
maximum_price = np.max(prices)
# TODO: Mean price of the data
mean_price = np.mean(prices)
# TODO: Median price of the data
median_price = np.median(prices)
# TODO: Standard deviation of prices of the data
std_price = np.std(prices)
# There are other statistics you can calculate too like quartiles
first_quartile = np.percentile(prices, 25)
third_quartile = np.percentile(prices, 75)
inter_quartile = third_quartile - first_quartile
# Show the calculated statistics
print("Statistics for Boston housing dataset:\n")
print("Minimum price: ${:,.2f}".format(minimum_price))
print("Maximum price: ${:,.2f}".format(maximum_price))
print("Mean price: ${:,.2f}".format(mean_price))
print("Median price ${:,.2f}".format(median_price))
print("Standard deviation of prices: ${:,.2f}".format(std_price))
print("First quartile of prices: ${:,.2f}".format(first_quartile))
print("Second quartile of prices: ${:,.2f}".format(third_quartile))
print("Interquartile (IQR) of prices: ${:,.2f}".format(inter_quartile))
Output:
Statistics for Boston housing dataset: Minimum price: $105,000.00 Maximum price: $1,024,800.00 Mean price: $454,342.94 Median price $438,900.00 Standard deviation of prices: $165,171.13 First quartile of prices: $350,700.00 Second quartile of prices: $518,700.00 Interquartile (IQR) of prices: $168,000.00
Now plot it:
# Using pyplot
import matplotlib.pyplot as plt
plt.figure(figsize=(20, 5))
# i: index
for i, col in enumerate(features.columns):
# 3 plots here hence 1, 3
plt.subplot(1, 3, i+1)
x = data[col]
y = prices
plt.plot(x, y, 'o')
# Create regression line
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)))
plt.title(col)
plt.xlabel(col)
plt.ylabel('prices')
Output:
Developing a Model
# TODO: Import 'r2_score'
from sklearn.metrics import r2_score
def performance_metric(y_true, y_predict):
# TODO: Calculate the performance score between 'y_true' and 'y_predict'
score = r2_score(y_true, y_predict)
# Return the score
return score
Now we are calculation the performance of the model:
# Calculate the performance of this model
score = performance_metric([3, -0.5, 2, 7, 4.2], [2.5, 0.0, 2.1, 7.8, 5.3])
print("Model has a coefficient of determination, R^2, of {:.3f}.".format(score))
The model has a coefficient of determination, R^2, of 0.923.
If you need any deep learning assignment help then you can contact us at given below link references:
Get your project or assignment completed by machine learning expert and experienced developers and researchers.
OR
If you have project files, You can send at codersarts@gmail.com directly