
Data Modelling In Machine Learning | Machine Learning Homework Help | Codersarts



Introduction

This assignment focuses on data modelling, a core step in the data science process. You will need to develop and implement appropriate steps, in IPython, to complete the corresponding tasks.


This assignment is intended to give you practical experience with the typical 5th and 6th steps of the data science process: data modelling, and presentation and automation. The "Practical Data Science" Canvas contains further announcements and a discussion board for this assignment. Please be sure to check these on a regular basis; it is your responsibility to stay informed with regard to any announcements or changes.

Login through https://rmit.instructure.com/.


Coding Environment

Please develop your code using Anaconda (with Python 3 or above).

Academic integrity and plagiarism (standard warning)

Academic integrity is about the honest presentation of your academic work. It means acknowledging the work of others while developing your own insights, knowledge, and ideas.


You should take extreme care that you have:

  • Acknowledged words, data, diagrams, models, frameworks and/or ideas of others you have quoted (i.e. directly copied), summarised, paraphrased, discussed or mentioned in your assessment through the appropriate referencing methods

  • Provided a reference list of the publication details so your reader can locate the source if necessary. This includes material taken from Internet sites. If you do not acknowledge the sources of your material, you may be accused of plagiarism because you have passed off the work and ideas of another person without appropriate referencing, as if they were your own.

RMIT University treats plagiarism as a very serious offense constituting misconduct.

Plagiarism covers a variety of inappropriate behaviors, including:

  • Failure to properly document a source

  • Copying material from the internet or databases

  • Collusion between students

For further information on our policies and procedures, please refer to the following:

https://www.rmit.edu.au/students/student-essentials/rights-and-responsibilities/academic-integrity


General Requirements

This section contains information about the general requirements that your assignment must meet. Please read all requirements carefully before you start.

  • You must do all modelling in IPython or Jupyter Notebook (in Anaconda).

  • You must include a plain text file called "readme.txt" with your submission. This file should include your name and student ID, and instructions for how to execute your submitted script files. This is important as automation is part of the 6th step of the data science process, and will be assessed strictly.

  • Parts of this assignment will include a written report; this must be in PDF format.

  • Please ensure that your submission follows the file naming rules specified in the tasks below. File names are case sensitive, i.e. if it is specified that the file name is gryphon, then that is exactly the file name you should submit; Gryphon, GRYPHON, griffin, and anything else but gryphon will be rejected.


Task 1: Retrieving and Preparing the Data

This assignment will focus on data modelling, and you can choose to focus on one approach: Classification or Clustering.


For this assignment, you need to select one dataset from the following options, and then work on it:


Being a careful data scientist, you know that it is vital to set the goal of the project, then thoroughly pre-process any available data (each attribute) before starting to analyse and model it. In your report in Task 4, you need to clearly state the goal of your project, and the design/steps of pre-processing your data. Please ensure you understand the data you selected, including the meaning of each attribute.
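
The pre-processing steps mentioned above could be sketched as follows. This is only an illustration under stated assumptions: the records, the column names (age, income, segment) and the cleaning choices (median imputation, dropping exact duplicates) are invented for the example and do not come from any of the assignment datasets. Your own steps should follow from the meaning of each attribute in the data you selected.

```python
import numpy as np
import pandas as pd

# Toy records invented purely for illustration (hypothetical columns).
raw = pd.DataFrame({
    "age": [25, 31, np.nan, 47, 31],
    "income": ["52000", "61000", "58000", "n/a", "61000"],
    "segment": [" A", "b", "B ", "a", "b"],
})

# Inspect every attribute first: data types and missing values.
print(raw.dtypes)
print(raw.isna().sum())

clean = raw.copy()

# Text attribute: trim stray whitespace and normalise the case.
clean["segment"] = clean["segment"].str.strip().str.upper()

# Numeric attribute stored as text: convert, turning unparseable
# entries such as "n/a" into NaN instead of raising an error.
clean["income"] = pd.to_numeric(clean["income"], errors="coerce")

# Remove exact duplicate rows (one possible choice; justify it in the report).
clean = clean.drop_duplicates()

# Fill remaining gaps in numeric attributes with the column median
# (again a choice made for this sketch, not the only valid option).
for col in ["age", "income"]:
    clean[col] = clean[col].fillna(clean[col].median())

print(clean)
```

Whatever steps you settle on, each of them should be described and justified in the report.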


Task 2: Data Exploration

Explore the selected data, carrying out the following tasks:

  • Explore each column (or at least 10 columns if there are more than 10 columns), using appropriate descriptive statistics and graphs (if appropriate). For each explored column, please think carefully and report (in your report in Task 4): 1) the way you explored the column (e.g. the graph); 2) what you can observe from it. (Please format each graph carefully, and use it in your final report. You need to include appropriate labels on the x-axis and y-axis, a title, and a legend. The fonts should be sized for good readability. Components of the graphs should be coloured appropriately, if applicable.)

  • Explore the relationship between all pairs of attributes (or at least 10 pairs of attributes, if there are more in the data), and show the relationship in appropriate graphs. You may choose which pairs of columns to focus on, but you need to generate a visualization graph for each pair of attributes. Each attribute pair should address a plausible hypothesis for the data concerned. In your report, for each plot (pair of attributes), state the hypothesis that you are investigating. Then, briefly discuss any interesting relationships (or lack of relationships) that you can observe from your visualization.


Please note that you do not need to put all the graphs in your report; you only need to include the representative ones and/or those showing significant information.
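
As a minimal sketch of the two exploration tasks, the example below summarises one column and plots one pair of attributes with labelled axes, a title and a legend. It assumes the Iris data bundled with scikit-learn as a stand-in (it is not necessarily one of the assignment datasets), and the hypothesis, colours and figure sizes are illustrative choices only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line inside a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Stand-in data (an assumption): four numeric columns plus an integer "target".
iris = load_iris(as_frame=True)
df = iris.frame
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# 1) One column: descriptive statistics and a histogram.
column = "sepal length (cm)"
summary = df[column].describe()  # count, mean, std, min, quartiles, max
print(summary)

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(df[column], bins=15, color="steelblue", edgecolor="black", label=column)
ax.set_xlabel("Sepal length (cm)", fontsize=12)
ax.set_ylabel("Number of flowers", fontsize=12)
ax.set_title("Distribution of sepal length", fontsize=14)
ax.legend()
plt.close(fig)  # call fig.savefig(...) before closing to keep the graph for the report

# 2) One pair of columns. Hypothesis (illustrative): longer petals are also wider.
x_col, y_col = "petal length (cm)", "petal width (cm)"
correlation = df[x_col].corr(df[y_col])  # Pearson correlation coefficient
print("Pearson correlation:", round(correlation, 3))

fig, ax = plt.subplots(figsize=(6, 4))
for name, group in df.groupby("species"):
    ax.scatter(group[x_col], group[y_col], label=name, alpha=0.7)
ax.set_xlabel("Petal length (cm)", fontsize=12)
ax.set_ylabel("Petal width (cm)", fontsize=12)
ax.set_title("Petal width against petal length", fontsize=14)
ax.legend(title="Species")
plt.close(fig)
```

The printed summary and the correlation value give you numbers to quote next to each graph when you discuss the hypothesis in the report.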


Task 3: Data Modelling

Model the data by treating it as either a Classification or Clustering Task, depending on your choice.

You must use two different models (i.e. two Classification models, or two Clustering models), and when building each model, it must include the following steps:

  • Select the appropriate features

  • Select the appropriate model (e.g. DecisionTree for classification) from sklearn.

If you choose to do a Classification Task,

  1. Train and evaluate the model appropriately.

  2. Train the model by selecting the appropriate values for each parameter in the model. You need to show how you chose these values and justify why you chose them.
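
One possible way to carry out the two classification steps above is a held-out test set combined with a cross-validated grid search. The sketch below assumes the Iris data bundled with scikit-learn as a stand-in dataset, a decision tree as the model, and an illustrative parameter grid; none of these choices is prescribed by the assignment.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (an assumption, not an assignment dataset).
X, y = load_iris(return_X_y=True)

# Hold out a test set that is never touched while choosing parameters.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Candidate parameter values (illustrative; adapt them to your data).
param_grid = {
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_leaf": [1, 3, 5],
    "criterion": ["gini", "entropy"],
}

# 5-fold cross-validation on the training data scores every combination.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Mean cross-validated accuracy:", round(search.best_score_, 3))

# Evaluate the refitted best model once on the held-out test set.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_confusion = confusion_matrix(y_test, y_pred)
print("Test accuracy:", round(test_accuracy, 3))
print(test_confusion)
```

The table in search.cv_results_ records the score of every parameter combination, which can help you justify the values you finally choose.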

If you choose to do a Clustering Task,

  1. Train the model by selecting appropriate values for each parameter in the model.

     • Show how you chose each value, and justify why you chose it (for example, k in the k-means model).

  2. Determine the optimal number of clusters, and justify your choice.

  3. Evaluate the performance of the clustering model by:

     • Checking the clustering results against the true observation labels

     • Constructing a "confusion matrix" to analyze the meaning of each cluster by looking at the majority of observations in the cluster. (You can do this by using a pen and a piece of paper, as we did in Practical Exercise; if you prefer, you can also explore how to do this step directly in IPython.)
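
The clustering steps can also be done directly in IPython. The sketch below is one possible approach, not the required one: it assumes the Iris data bundled with scikit-learn as a stand-in, uses the silhouette score as the criterion for comparing values of k, and then fixes k to the number of known classes so that the clusters can be tabulated against the true labels. All of these are choices made for the illustration.

```python
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Stand-in data (an assumption, not an assignment dataset).
X, y_true = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # k-means is sensitive to feature scale

# Compare candidate values of k with the silhouette score (higher is better).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)
    print(f"k={k}: silhouette={scores[k]:.3f}")

# For the comparison with the true labels, use as many clusters as there
# are known classes (a choice made for this sketch).
k = len(set(y_true))
cluster_ids = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)

# "Confusion matrix": for each cluster, count the true labels it contains.
table = {c: Counter() for c in range(k)}
for cluster, label in zip(cluster_ids, y_true):
    table[int(cluster)][int(label)] += 1

# Name each cluster after the majority label of its observations.
majority = {c: counts.most_common(1)[0][0] for c, counts in table.items()}

# Purity: the share of observations that carry their cluster's majority label.
purity = sum(counts[majority[c]] for c, counts in table.items()) / len(y_true)

for c, counts in table.items():
    print(f"cluster {c}: {dict(counts)} -> majority label {majority[c]}")
print("Purity:", round(purity, 3))
```

The value of k with the best silhouette score need not equal the number of true classes; if the two disagree on your data, that difference is worth discussing when you justify your choice.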


After you have built two Classification models, or two Clustering models, on your data, the next step is to compare the models. You need to include the results of this comparison, including a recommendation of which model should be used, in your report (see next section).
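
A comparison of two classification models could be sketched as below, assuming both are scored on the same cross-validation folds. The Iris stand-in data, the two models (a decision tree and k-nearest neighbours) and their parameter values are assumptions for illustration only. For two clustering models, the same loop could instead compare a clustering metric such as the silhouette score.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (an assumption, not an assignment dataset).
X, y = load_iris(return_X_y=True)

# A fixed random_state gives both models exactly the same folds.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Two candidate models with illustrative parameter values.
models = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "k-nearest neighbours": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
}

results = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
    results[name] = (fold_scores.mean(), fold_scores.std())
    print(f"{name}: accuracy {fold_scores.mean():.3f} (std {fold_scores.std():.3f})")

# Highest mean accuracy; a recommendation should also weigh the spread
# across folds, interpretability and training cost.
best_name = max(results, key=lambda name: results[name][0])
print("Highest mean accuracy:", best_name)
```

When the mean scores are close, the standard deviation across folds indicates whether the difference is large enough to base a recommendation on.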



Contact us for solutions to this machine learning assignment from Codersarts specialists, who can mentor and guide you through such machine learning assignments.


If you have project or assignment files, you can send them directly to contact@codersarts.com.

