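The Breast Cancer Wisconsin (Diagnostic) dataset, available in scikit-learn, contains two diagnosis classes: WDBC-Malignant and WDBC-Benign. It has 569 instances (212 malignant, 357 benign), each described by 30 numeric attributes. The attributes are computed from digitized images of a fine needle aspirate (FNA) of a breast mass and describe the cell nuclei in each image. Ten characteristics are measured for each nucleus, and the mean, standard error, and “worst” value (mean of the three largest values) of each one give the 30 attributes. The ten characteristics are: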
radius (mean of distances from center to points on the perimeter)
texture (standard deviation of gray-scale values)
perimeter
area
smoothness (local variation in radius lengths)
compactness (perimeter^2 / area - 1.0)
concavity (severity of concave portions of the contour)
concave points (number of concave portions of the contour)
symmetry
fractal dimension (“coastline approximation” - 1)
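As a quick check of the description above, the dataset can be loaded with scikit-learn's load_breast_cancer. This is a minimal sketch assuming a recent scikit-learn; the column names it shows (“mean …”, “… error”, “worst …”) are scikit-learn's own naming.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load features (X), labels (y) and metadata such as feature and class names.
data = load_breast_cancer()
X, y = data.data, data.target

print(X.shape)             # (569, 30)
print(data.target_names)   # ['malignant' 'benign']  (label 0 = malignant, 1 = benign)
print(np.bincount(y))      # [212 357]

# Columns 0-9 hold the mean, 10-19 the standard error and 20-29 the "worst"
# value of the ten characteristics, e.g. "mean radius", "radius error", "worst radius".
print(data.feature_names[[0, 10, 20]])
```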
More information on the dataset can be found at the following links:
https://scikit-learn.org/stable/datasets/index.html
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
Project requirements:
• Build and implement a classifier for the Breast Cancer Wisconsin dataset based on a Multi-Layer Perceptron (MLP).
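A baseline MLP classifier can be sketched with scikit-learn's MLPClassifier. The single hidden layer of 30 units, the 80/20 stratified split and the random_state values are illustrative assumptions, not values fixed by the project. The features are standardized first because their scales differ widely (for example area versus smoothness), and gradient-based MLP training is sensitive to that.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% as a test set; stratify keeps the malignant/benign ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# The scaler is fit on the training split only, inside the pipeline.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(30,), activation="relu",
                  max_iter=1000, random_state=42))
mlp.fit(X_train, y_train)

test_acc = accuracy_score(y_test, mlp.predict(X_test))
print(f"Test accuracy: {test_acc:.3f}")
```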
Study the performance of the classifier in terms of accuracy with respect to the different parameters of the MLP (number of layers, activation function in the hidden layers, learning rate, batch size, number of iterations, etc.).
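One way to organize this study is a one-parameter-at-a-time sweep: start from a baseline configuration and change a single parameter while keeping the others fixed, recording the 5-fold cross-validated accuracy. The sketch below maps “batch size” to MLPClassifier's batch_size and “number of iterations” to max_iter (for the adam and sgd solvers scikit-learn counts max_iter in epochs). The baseline values and candidate grids are hypothetical choices for illustration.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small max_iter values deliberately stop before convergence; silence those warnings.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

X, y = load_breast_cancer(return_X_y=True)

# Baseline hyperparameters; each experiment below changes exactly one of them.
base = dict(hidden_layer_sizes=(30,), activation="relu",
            learning_rate_init=1e-3, batch_size=32, max_iter=200,
            random_state=0)

# Illustrative candidate values for each parameter.
grid = {
    "hidden_layer_sizes": [(10,), (30,), (30, 30), (30, 30, 30)],
    "activation": ["logistic", "tanh", "relu"],
    "learning_rate_init": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 64, 256],
    "max_iter": [10, 50, 200],
}

results = {}
for param, values in grid.items():
    for value in values:
        params = {**base, param: value}
        model = make_pipeline(StandardScaler(), MLPClassifier(**params))
        # Mean and spread of 5-fold cross-validated accuracy for this configuration.
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        results[(param, value)] = scores.mean()
        print(f"{param}={value!r}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Varying one parameter at a time keeps the effect of each parameter readable; GridSearchCV can be used afterwards to search combinations, at a much higher cost.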
Study the generalization ability of the MLP classifier, i.e. how well the accuracy it reaches on the training data carries over to unseen data.
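A learning curve is one common way to look at generalization: the model is trained on growing fractions of the training data and scored both on the data it was trained on and on held-out folds. This sketch uses scikit-learn's learning_curve; the deliberately large (100, 100) network and the chosen fractions are illustrative assumptions meant to make any overfitting visible.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import learning_curve
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(100, 100), max_iter=1000, random_state=0))

# Train on growing fractions of each CV training split; score on the held-out fold.
sizes, train_scores, val_scores = learning_curve(
    model, X, y, train_sizes=[0.1, 0.25, 0.5, 0.75, 1.0], cv=5,
    scoring="accuracy", shuffle=True, random_state=0)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    # A large, persistent train/validation gap indicates poor generalization.
    print(f"n_train={int(n):4d}  train acc={tr:.3f}  val acc={va:.3f}  gap={tr - va:.3f}")
```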
To improve the generalization ability of the MLP classifier, we suggest implementing two different techniques:
1. Early Stopping
2. Dropout
Implement these two techniques and assess the impact of each one on the generalization performance.
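scikit-learn's MLPClassifier supports early stopping but has no dropout option, so the sketch below implements both techniques by hand in NumPy for a network with one hidden ReLU layer and a sigmoid output, trained by full-batch gradient descent on binary cross-entropy. Dropout is applied as inverted dropout on the hidden units. Early stopping monitors the loss on a separate validation split and restores the best weights. The network size, learning rate, dropout probability (0.5), patience (50 epochs) and data splits are all illustrative assumptions. The test split is used only for the final comparison.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# Train / validation (used only for early stopping) / test (final assessment).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_rest, y_rest, test_size=0.2, stratify=y_rest, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_val, X_test = (scaler.transform(a) for a in (X_tr, X_val, X_test))


def bce(p, t):
    """Mean binary cross-entropy, clipped to avoid log(0)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))


def train_mlp(drop_p=0.0, patience=None, epochs=2000, hidden=128, lr=0.1, seed=0):
    """One hidden ReLU layer + sigmoid output, full-batch gradient descent.

    drop_p:   dropout probability on the hidden units (0.0 disables dropout).
    patience: epochs without validation-loss improvement before stopping; the
              best weights are then restored (None disables early stopping).
    Returns (train accuracy, test accuracy, epochs run).
    """
    rng = np.random.default_rng(seed)
    n_in = X_tr.shape[1]
    params = {"W1": rng.normal(0, np.sqrt(2 / n_in), (n_in, hidden)),
              "b1": np.zeros(hidden),
              "W2": rng.normal(0, np.sqrt(1 / hidden), (hidden, 1)),
              "b2": np.zeros(1)}

    def forward(A, training):
        z1 = A @ params["W1"] + params["b1"]
        mask = np.ones_like(z1)
        if training and drop_p > 0:
            # Inverted dropout: zero each unit with probability drop_p and scale
            # the survivors by 1/(1 - drop_p), so inference needs no rescaling.
            mask = (rng.random(z1.shape) >= drop_p) / (1 - drop_p)
        h = np.maximum(z1, 0) * mask
        p = 1 / (1 + np.exp(-(h @ params["W2"] + params["b2"])))
        return z1, mask, h, p.ravel()

    def accuracy(A, t):
        return np.mean((forward(A, training=False)[3] >= 0.5) == t)

    best_loss, best_params, wait = np.inf, None, 0
    for epoch in range(epochs):
        z1, mask, h, p = forward(X_tr, training=True)
        dz2 = ((p - y_tr) / len(y_tr))[:, None]         # dLoss/dlogit for sigmoid + BCE
        dz1 = (dz2 @ params["W2"].T) * mask * (z1 > 0)  # back through dropout and ReLU
        grads = {"W1": X_tr.T @ dz1, "b1": dz1.sum(axis=0),
                 "W2": h.T @ dz2, "b2": dz2.sum(axis=0)}
        for k in params:
            params[k] -= lr * grads[k]
        if patience is not None:
            val_loss = bce(forward(X_val, training=False)[3], y_val)
            if val_loss < best_loss:
                best_loss, wait = val_loss, 0
                best_params = {k: v.copy() for k, v in params.items()}
            else:
                wait += 1
                if wait >= patience:
                    break
    if best_params is not None:
        params.update(best_params)  # restore the weights with the lowest validation loss
    return accuracy(X_tr, y_tr), accuracy(X_test, y_test), epoch + 1


results = {}
for name, kwargs in {"baseline": {}, "early stopping": {"patience": 50},
                     "dropout": {"drop_p": 0.5},
                     "dropout + early stopping": {"drop_p": 0.5, "patience": 50}}.items():
    results[name] = train_mlp(**kwargs)
    tr, te, n = results[name]
    print(f"{name:26s} train acc={tr:.3f}  test acc={te:.3f}  epochs={n}")
```

With a single split and a single seed, differences between the four rows may be within noise, so repeating the runs over several seeds or cross-validation folds gives a fairer assessment. In a deep learning framework the same two techniques are usually taken off the shelf, for example Keras's Dropout layer and EarlyStopping callback (with restore_best_weights=True). In scikit-learn, only early stopping is available, through MLPClassifier(early_stopping=True) together with its validation_fraction and n_iter_no_change parameters.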
Compare the performance of the MLP classifier with that of a Support Vector Machine (SVM) classifier.
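For a fair comparison, both classifiers should be evaluated on exactly the same cross-validation folds and with the same preprocessing. This is a minimal sketch using scikit-learn's SVC with an RBF kernel and default-style C and gamma. The kernel, C, gamma and the MLP configuration are illustrative assumptions, and tuning each model (for example with GridSearchCV) would make the comparison more meaningful.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
# Identical folds for both models, so the accuracies are directly comparable.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

models = {
    "MLP": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(30,), max_iter=1000,
                                       random_state=0)),
    "SVM (RBF)": make_pipeline(StandardScaler(),
                               SVC(kernel="rbf", C=1.0, gamma="scale")),
}

scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name:10s} {scores[name].mean():.3f} +/- {scores[name].std():.3f}")
```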