Need help with a Machine Learning assignment or project? At Codersarts we offer one-on-one sessions with experts, code mentorship, course training, and ongoing development projects. Get help from vetted Machine Learning engineers, mentors, experts, and tutors.
Task
In this project we will implement a random forest classifier in Python via a Jupyter notebook. The performance of the classifier will be evaluated via the out-of-bag (OOB) error estimate, using the provided dataset pimaindians-diabetes.csv, a comma-separated values (CSV) file in the Q2 folder. The dataset was derived from the National Institute of Diabetes and Digestive and Kidney Diseases. You must not modify the dataset. Each row describes one person (a data point, or data record) using 9 columns: the first 8 are attributes, and the 9th is the label, which you must NOT treat as an attribute. You will perform binary classification on the dataset to determine whether a person has diabetes.
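As a quick orientation, here is a minimal loading sketch that separates the 8 attribute columns from the label column. It assumes the CSV has no header row and that the notebook runs from the Q2 folder; adjust the path if your layout differs.

import numpy as np

# Load the dataset; assumes no header row and that the notebook
# runs from the Q2 folder (adjust the path otherwise).
data = np.genfromtxt("pimaindians-diabetes.csv", delimiter=",")

X = data[:, :8]  # first 8 columns: attributes
y = data[:, 8]   # 9th column: the label (do NOT treat it as an attribute)

print(X.shape, y.shape)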
Technology used
Python 3.7.x
Scikit-learn
Deliverable files
Code
We have prepared some Python starter code to help you load the data and evaluate your model. The starter file, Q2.ipynb, contains three classes:
Utility: contains utility functions that help you build a decision tree
DecisionTree: a decision tree class that you will use to build your random forest
RandomForest: a random forest class
We will implement:
1. Utility class: implement the functions to compute entropy and information gain, perform splitting, and find the best attribute (variable) and split-point. You may add additional methods for convenience. Note: do not round the output of any of your functions. (A minimal sketch of these utilities appears after this list.)
2. DecisionTree class: implement the learn() method to build your decision tree using the utility functions above (see the tree sketch below).
3. DecisionTree class: implement the classify() method to predict the label of a test record using your decision tree.
4. RandomForest class: implement the methods _bootstrapping(), fitting(), voting(), and user() (see the forest sketch below).
5. get_random_seed(), get_forest_size(): implement these functions to return a random seed and a forest size (number of decision trees) for your implementation.
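As a starting point, here is a minimal sketch of the core Utility functions. The names entropy, information_gain, and partition_classes are assumptions, common in this style of assignment; match them to the actual signatures in your Q2.ipynb.

import numpy as np

def entropy(class_y):
    # Entropy of a list/array of class labels; do not round the result.
    class_y = np.asarray(class_y)
    if class_y.size == 0:
        return 0.0
    _, counts = np.unique(class_y, return_counts=True)
    probs = counts / class_y.size
    return float(-np.sum(probs * np.log2(probs)))

def information_gain(previous_y, current_y):
    # Parent entropy minus the size-weighted entropy of the child partitions.
    n = len(previous_y)
    weighted = sum(len(part) / n * entropy(part) for part in current_y)
    return entropy(previous_y) - weighted

def partition_classes(X, y, split_attribute, split_val):
    # Numeric split: rows with X[:, split_attribute] <= split_val go left.
    X, y = np.asarray(X), np.asarray(y)
    mask = X[:, split_attribute] <= split_val
    return X[mask], X[~mask], y[mask], y[~mask]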
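Building on those utilities, here is one possible shape for learn() and classify(). For brevity this sketch uses the attribute mean as the only candidate split point; a complete implementation would search over candidate split values. The dictionary-based tree representation is an assumption, not the starter's required structure.

class DecisionTree(object):
    def __init__(self, max_depth=10):
        self.tree = {}
        self.max_depth = max_depth

    def learn(self, X, y, depth=0):
        X, y = np.asarray(X), np.asarray(y)
        # Stop on a pure node or when the depth limit is reached.
        if depth >= self.max_depth or len(np.unique(y)) <= 1:
            self.tree = {"label": int(np.bincount(y.astype(int)).argmax())}
            return
        best_gain, best_attr, best_val = 0.0, None, None
        for attr in range(X.shape[1]):
            val = float(np.mean(X[:, attr]))  # simple split-point heuristic
            _, _, y_left, y_right = partition_classes(X, y, attr, val)
            if len(y_left) == 0 or len(y_right) == 0:
                continue
            gain = information_gain(y, [y_left, y_right])
            if gain > best_gain:
                best_gain, best_attr, best_val = gain, attr, val
        if best_attr is None:  # no useful split found: make a leaf
            self.tree = {"label": int(np.bincount(y.astype(int)).argmax())}
            return
        X_l, X_r, y_l, y_r = partition_classes(X, y, best_attr, best_val)
        left, right = DecisionTree(self.max_depth), DecisionTree(self.max_depth)
        left.learn(X_l, y_l, depth + 1)
        right.learn(X_r, y_r, depth + 1)
        self.tree = {"attr": best_attr, "val": best_val, "left": left, "right": right}

    def classify(self, record):
        # Walk from the root to a leaf and return its majority label.
        node = self.tree
        while "label" not in node:
            child = node["left"] if record[node["attr"]] <= node["val"] else node["right"]
            node = child.tree
        return node["label"]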
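And a hedged sketch of the RandomForest side. The real starter's signatures may differ (e.g. whether fitting() takes a combined data matrix), and user() is an assignment-specific stub omitted here. This version tracks bootstrap membership by row index, which assumes voting() is later called on the original dataset in its original order.

class RandomForest(object):
    def __init__(self, num_trees=10):
        self.num_trees = num_trees
        self.decision_trees = [DecisionTree() for _ in range(num_trees)]
        self.bootstrap_indices = []  # row indices sampled for each tree

    def _bootstrapping(self, n):
        # Draw n row indices uniformly with replacement; roughly one-third
        # of the original rows are left out and become OOB samples.
        return np.random.randint(0, n, size=n)

    def fitting(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        for t in range(self.num_trees):
            idx = self._bootstrapping(len(X))
            self.bootstrap_indices.append(set(idx.tolist()))
            self.decision_trees[t].learn(X[idx], y[idx])

    def voting(self, X):
        # Majority vote per record, using only trees for which the record
        # was out-of-bag; fall back to all trees if it never was.
        predictions = []
        for i, record in enumerate(np.asarray(X)):
            votes = [tree.classify(record)
                     for t, tree in enumerate(self.decision_trees)
                     if i not in self.bootstrap_indices[t]]
            if not votes:
                votes = [tree.classify(record) for tree in self.decision_trees]
            predictions.append(int(np.bincount(np.asarray(votes, dtype=int)).argmax()))
        return np.array(predictions)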
Description
Essential Reading
Decision Trees. To complete this question, you will need a good understanding of how decision trees work. We recommend that you review the lecture on decision trees. Specifically, review how to construct a decision tree using entropy and information gain to select the splitting attribute and the split point for the selected attribute. The slides from CMU (also mentioned in the lecture) provide an excellent example of how to construct a decision tree using entropy and information gain. Note: there is a typo on page 10, which contains the entropy equation; ignore one of the negative signs (only one negative sign is needed).
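For reference, the entropy equation with its single negative sign, for a node whose records fall into classes with proportions p_i, is:

H(S) = -\sum_{i} p_i \log_2 p_i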
Random Forests. To refresh your memory about random forests, see Chapter 15 of the Elements of Statistical Learning book and the lecture on random forests. Here is a blog post that introduces random forests in a fun way, in layman's terms.

Out-of-Bag Error Estimate. In random forests, it is not necessary to perform explicit cross-validation or to use a separate test set for performance evaluation.
The out-of-bag (OOB) error estimate has been shown to be reasonably accurate and unbiased. Below, we summarize the key points about OOB from the original article by Breiman and Cutler. Each tree in the forest is constructed using a different bootstrap sample from the original data.
Each bootstrap sample is constructed by randomly sampling from the original dataset with replacement (usually, a bootstrap sample has the same size as the original dataset). Statistically, about one-third of the data records (or data points) are left out of the bootstrap sample and are not used in the construction of the kth tree. Each such record can therefore be classified by the kth tree as unseen data. As a result, every record receives a "test set" classification from the subset of trees that treat it as an out-of-bag sample, and the majority vote among those trees is the record's predicted class. The proportion of records whose predicted class differs from their true class is the OOB error estimate.
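Putting the sketches above together, an OOB evaluation might look like the following. The fixed seed and forest size are stand-ins for get_random_seed() and get_forest_size(); the X and y arrays come from the loading sketch earlier.

np.random.seed(42)                    # stand-in for get_random_seed()
forest = RandomForest(num_trees=10)   # stand-in for get_forest_size()

forest.fitting(X, y.astype(int))
y_pred = forest.voting(X)             # OOB majority-vote predictions

# OOB error: fraction of records misclassified by their OOB vote.
oob_error = float(np.mean(y_pred != y.astype(int)))
print("OOB error estimate: %.4f" % oob_error)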
How can Codersarts help you with a random forest classifier?
Codersarts provides:
Random Forest Algorithm Assignment Help
Random Forest Algorithm Project Help
Mentorship in Random Forest Algorithm from Experts
Random Forest Algorithm Development Project
If you are looking for any kind of help with the Random Forest algorithm, contact us.