Task 1: Regression
In this task you are required to apply a machine learning algorithm to the data
set houseprice_data.csv, which can be downloaded from the assignment task on
Canvas. This data set contains information about house sales in King County, USA.
The data has 18 features, such as the number of bedrooms, bathrooms and floors,
and a target variable: house price.
Using linear regression (simple or multiple), develop a model to predict the price
of a house. After developing the model you should also analyze the results and
discuss the effectiveness of the model, outlining the improvements made while
developing it.
Ideas to consider when completing this task:
• Is there a way of visualizing your model? (Possibly just one or two input/feature
variable(s).)
• How will you assess the effectiveness of the model?
• Include as many features as you can. Does the model improve?
• How could you make further improvements?
• What can you conclude about your model?
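As a starting point for the ideas above, the following is a minimal sketch of simple linear regression with a hold-out R² score, written in plain Python. It runs on synthetic stand-in data, not on houseprice_data.csv: the feature name living_area, the numbers used to generate the data and the 75/25 split are all assumptions made for illustration. For the real data set a library such as scikit-learn would normally be used instead.

```python
import random

def fit_simple_linear(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

# Synthetic stand-in data (assumption): price grows linearly with a
# hypothetical "living_area" feature, plus Gaussian noise.
rng = random.Random(0)
living_area = [rng.uniform(50, 300) for _ in range(200)]
price = [50_000 + 2_000 * a + rng.gauss(0, 20_000) for a in living_area]

# Fit on the first 75% of rows and score on the remaining 25%.
split = 150
slope, intercept = fit_simple_linear(living_area[:split], price[:split])
test_predictions = [intercept + slope * a for a in living_area[split:]]
test_r2 = r_squared(price[split:], test_predictions)

print(f"slope={slope:.1f}, intercept={intercept:.1f}, test R^2={test_r2:.3f}")
```

Scoring on rows that were not used for fitting gives a fairer picture of effectiveness than scoring on the training rows. The same comparison of test R² can be repeated as more features are added to see whether the model actually improves.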
Task 2: Clustering
In this task you are required to apply a machine learning algorithm to the data set
country_data.csv, which can be downloaded from the assignment task on Canvas.
This data set contains information about each country's child mortality, exports,
health spending, etc.
Use clustering to investigate this data set. After clustering the data you should
analyze the results and discuss what can be concluded from the clusters.
Ideas to consider when completing this task:
• Is there a way of visualizing the clusters?
• Can you draw any conclusions about the clustering?
• Include as many features as you can. Does the clustering change?
• What advice would you give, in the context of the data, based on the clustering?
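One possible way to approach the clustering is sketched below: standardize the features, then run k-means. This is a plain-Python illustration on synthetic stand-in data, not on country_data.csv. The two columns (child mortality and health spending), their values and the choice of k = 2 are assumptions. The sketch also swaps the usual random or k-means++ seeding for a deterministic farthest-first initialization so that the result is reproducible.

```python
import random

def standardize(rows):
    """Scale each column to mean 0 and standard deviation 1.
    Assumes no column is constant (otherwise the division fails)."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [(sum((v - m) ** 2 for v in col) / len(col)) ** 0.5
            for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def k_means(points, k, iterations=50):
    # Farthest-first initialization (an assumption, chosen for determinism):
    # start from the first point, then repeatedly add the point that is
    # farthest from all centroids picked so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:  # converged: nothing moved
            break
        centroids = updated
    return assign(points, centroids), centroids

# Synthetic stand-in data (assumption): [child mortality, health spending].
rng = random.Random(1)
countries = [[rng.gauss(80, 5), rng.gauss(2000, 300)] for _ in range(20)]
countries += [[rng.gauss(5, 1), rng.gauss(40000, 3000)] for _ in range(20)]

labels, centroids = k_means(standardize(countries), k=2)
print(labels)
```

Standardizing first matters because the columns are on very different scales. Without it, the column with the largest raw values would dominate the distance calculation and therefore the clusters.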
Task 3: Classification & Neural Networks
In this task you are required to apply a variety of machine learning algorithms to
the data set nba_rookie_data.csv, which can be downloaded from the assignment
task on Canvas. This data set contains NBA rookie performance statistics. The target
variable Target_5Yrs is 1 if career length >= 5 years and 0 if career length < 5 years.
The classification problem here is to predict whether a player will last 5 years in the NBA.
Apply Logistic Regression and Gaussian Naïve Bayes, and construct Neural Networks.
After developing the various models you should also analyze the results and
discuss the effectiveness of the models, outlining the improvements made while
developing the models and comparing the approaches/algorithms used (strengths and weaknesses).
Ideas to consider when completing this task:
• Apply various algorithms to the problem. Caution: use a small number rather
than many, and analyse in depth rather than being superficial and repetitive.
• Is there a way of visualising the model(s)?
• How will you assess the effectiveness of the model(s)?
• Include as many features as you can. Does the model improve?
• Compare the models produced.
• How could you make further improvements?
• What can you conclude about your model?
• How strong is the relationship between the predictor and target variables?
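To make the assessment questions above concrete, here is a minimal plain-Python sketch of one of the named algorithms, Gaussian Naïve Bayes, scored on a hold-out set with a confusion matrix and accuracy. It runs on synthetic stand-in data rather than nba_rookie_data.csv. The two features (games played and points per game), the numbers used to generate them and the 180/60 split are assumptions for illustration only. A library implementation would normally be used for the real data.

```python
import math
import random

def fit_gaussian_nb(rows, labels):
    """Per class, store (prior, per-feature means, per-feature variances)."""
    model = {}
    for cls in sorted(set(labels)):
        members = [row for row, label in zip(rows, labels) if label == cls]
        columns = list(zip(*members))
        means = [sum(col) / len(col) for col in columns]
        # The small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[cls] = (len(members) / len(rows), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one row."""
    def log_posterior(cls):
        prior, means, variances = model[cls]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)

def confusion_counts(actual, predicted):
    """Return (true pos, false pos, false neg, true neg) for labels 0/1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

# Synthetic stand-in data (assumption): [games played, points per game],
# with label 1 for players who last at least 5 years and 0 otherwise.
rng = random.Random(0)
rows, labels = [], []
for i in range(240):
    label = i % 2  # alternate classes so both splits stay balanced
    if label == 1:
        rows.append([rng.gauss(70, 8), rng.gauss(10, 3)])
    else:
        rows.append([rng.gauss(40, 10), rng.gauss(4, 2)])
    labels.append(label)

# Train on the first 180 rows and evaluate on the remaining 60.
model = fit_gaussian_nb(rows[:180], labels[:180])
predictions = [predict_gaussian_nb(model, row) for row in rows[180:]]
tp, fp, fn, tn = confusion_counts(labels[180:], predictions)
accuracy = (tp + tn) / len(predictions)

print(f"TP={tp} FP={fp} FN={fn} TN={tn} accuracy={accuracy:.3f}")
```

Applying the same hold-out split and the same confusion counts to each model (Logistic Regression, Gaussian Naïve Bayes and a neural network) keeps the comparison between approaches like-for-like.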