Dimensionality Reduction: Feature Transformation using PCA
Here we use PCA (Principal Component Analysis) to reduce the dimensionality of a dataset. In this first example, we will use the Iris dataset.
First, import the required libraries
# Import
from sklearn.decomposition import PCA
from sklearn import datasets
Now load the dataset
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
Standardizing
Standardization rescales each feature to zero mean and unit variance, so that features measured on larger scales do not dominate the principal components.
from sklearn.preprocessing import StandardScaler
X_std = StandardScaler().fit_transform(X)
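As a quick sanity check (not part of the original walkthrough), each standardized feature should now have mean ~0 and standard deviation ~1:
# Each column should have mean ~0 and std ~1 after scaling
print(X_std.mean(axis=0).round(6))
print(X_std.std(axis=0).round(6))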
Now apply PCA
# Instantiate
pca = PCA(n_components=2)
# Fit PCA and apply the dimensionality reduction to the standardized data
X_pca = pca.fit_transform(X_std)
Inspecting the explained variance
explained_variance_ratio_ reports the fraction of the total variance captured by each component; the eigenvalues themselves are stored in explained_variance_.
pca.explained_variance_ratio_
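For the standardized Iris data, the first two components together capture roughly 96% of the variance. A quick check (not in the original walkthrough):
# Fraction of total variance retained by the two components
print(pca.explained_variance_ratio_.sum())
# The corresponding eigenvalues
print(pca.explained_variance_)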
Accessing Components
# Access components
pc_1 = pca.components_[0]
print(pc_1)
pc_2 = pca.components_[1]
print(pc_2)
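Each row of pca.components_ is a unit-length direction in the original four-dimensional feature space, and transform() is simply a projection onto these directions. A minimal sketch verifying this, assuming numpy is available:
import numpy as np
# Components are unit-norm direction vectors
print(np.linalg.norm(pc_1), np.linalg.norm(pc_2))
# Projecting the centered data onto the components reproduces the PCA output
manual = (X_std - pca.mean_) @ pca.components_.T
print(np.allclose(manual, X_pca))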
PCA for Facial Recognition
In this example, we apply PCA to a face-recognition task, reducing the data from 1850 features to 150 principal components.
Import all the libraries first
# Import libraries
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.datasets import fetch_lfw_people
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC
After importing the libraries, load the face data and split it into training and test sets
# Load the Labeled Faces in the Wild (LFW) dataset of famous people's faces
faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
X = faces.data
# The label to predict is the id of the person
y = faces.target
target_names = faces.target_names
n_classes = target_names.shape[0]
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
Inspect the number of samples, features, and classes in the dataset
# Introspect the images array to find the shapes (for plotting)
n_samples, h, w = faces.images.shape
# For machine learning we use the flattened pixel data directly
# (relative pixel position info is ignored by this model)
n_features = X.shape[1]
print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)
Output:
Total dataset size:
n_samples: 1288
n_features: 1850
n_classes: 7
Now apply PCA to reduce the dimension
Here we apply PCA to reduce the data from 1850 features to 150 principal components (n_components=150).
# Compute a PCA (eigenfaces) on the face dataset
n_components = 150
print("Extracting the top {} eigenfaces from {} faces".format(n_components, X_train.shape[0]))
pca = PCA(n_components=n_components, svd_solver='randomized', whiten=True).fit(X_train)
# We've gone from 1850 features to 150 components
eigenfaces = pca.components_.reshape((n_components, h, w))
# Transform data into principal components representation
print("Projecting the input data on the eigenfaces orthonormal basis")
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
Output:
Extracting the top 150 eigenfaces from 966 faces
Projecting the input data on the eigenfaces orthonormal basis
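The choice of 150 components trades compression against fidelity. As a quick check (not shown in the original walkthrough), we can print how much of the training-set variance the retained components explain:
# Total fraction of variance retained by the 150 components
print("Variance retained: {:.1%}".format(pca.explained_variance_ratio_.sum()))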
Now train an SVM classifier on the PCA-transformed features
# Train an SVM classification model
print("Fitting the classifier to the training set")
param_grid = {
    'C': [1e3, 5e3, 5e4, 1e5],
    'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1]
}
# Instantiate model
svm = SVC(kernel='rbf', class_weight='balanced', random_state=42)
# GridSearch
clf = GridSearchCV(svm, param_grid)
clf.fit(X_train_pca, y_train)
print(clf.best_estimator_)
Output:
Fitting the classifier to the training set
SVC(C=1000.0, cache_size=200, class_weight='balanced', coef0=0.0,
    decision_function_shape=None, degree=3, gamma=0.001, kernel='rbf',
    max_iter=-1, probability=False, random_state=42, shrinking=True,
    tol=0.001, verbose=False)
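If you only need the winning hyper-parameter values rather than the full estimator, GridSearchCV also exposes them directly:
# Best hyper-parameter combination found by the grid search
print(clf.best_params_)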
Now predict on the test set
# Predict labels for the test data
print("Predicting people's names on the test set")
y_pred = clf.predict(X_test_pca)
Generate the classification report and confusion matrix
# Classification report and confusion matrix
print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))
Output:
                   precision    recall  f1-score   support

     Ariel Sharon       0.47      0.54      0.50        13
     Colin Powell       0.72      0.85      0.78        60
  Donald Rumsfeld       0.67      0.52      0.58        27
    George W Bush       0.84      0.86      0.85       146
Gerhard Schroeder       0.80      0.80      0.80        25
      Hugo Chavez       0.88      0.47      0.61        15
       Tony Blair       0.76      0.69      0.72        36

      avg / total       0.78      0.77      0.77       322
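To see what the retained components look like, here is a minimal plotting sketch (using matplotlib, and assuming the eigenfaces, h, and w variables defined above):
# Display the first 8 eigenfaces as grayscale images
fig, axes = plt.subplots(2, 4, figsize=(8, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(eigenfaces[i], cmap='gray')
    ax.set_title("Eigenface {}".format(i + 1))
    ax.axis('off')
plt.tight_layout()
plt.show()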