Dimensionality Reduction: Feature Transformation using PCA
Here we use PCA (Principal Component Analysis) to reduce the dimensionality of a dataset. In this first example, we will use the Iris dataset.
First, import the required libraries
# Import
from sklearn.decomposition import PCA
from sklearn import datasets
Now load the dataset
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
Standardizing
Standardization rescales each feature to zero mean and unit variance, so that features measured on larger scales do not dominate the principal components.
from sklearn.preprocessing import StandardScaler
X_std = StandardScaler().fit_transform(X)
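As a quick sanity check (not part of the original walkthrough), each standardized feature should now have mean ~0 and standard deviation ~1:
# Each column should have mean ~0 and std ~1 after scaling
print(X_std.mean(axis=0).round(6))
print(X_std.std(axis=0).round(6))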
Now apply PCA
# Instantiate
pca = PCA(n_components=2)
# Fit PCA and apply the dimensionality reduction to the standardized data
X_pca = pca.fit_transform(X_std)
Inspecting the explained variance
explained_variance_ratio_ reports the fraction of the total variance captured by each component; the eigenvalues themselves are stored in explained_variance_.
pca.explained_variance_ratio_
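For the standardized Iris data, the first two components together capture roughly 96% of the variance. A quick check (not in the original walkthrough):
# Fraction of total variance retained by the two components
print(pca.explained_variance_ratio_.sum())
# The corresponding eigenvalues
print(pca.explained_variance_)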
Accessing Components
# Access components
pc_1 = pca.components_[0]
print(pc_1)
pc_2 = pca.components_[1]
print(pc_2)
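Each row of pca.components_ is a unit-length direction in the original four-dimensional feature space, and transform() is simply a projection onto these directions. A minimal sketch verifying this, assuming numpy is available:
import numpy as np
# Components are unit-norm direction vectors
print(np.linalg.norm(pc_1), np.linalg.norm(pc_2))
# Projecting the centered data onto the components reproduces the PCA output
manual = (X_std - pca.mean_) @ pca.components_.T
print(np.allclose(manual, X_pca))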
PCA for Facial Recognition
In this example, we apply PCA to a face-recognition task, reducing the data from 1850 features to 150 principal components.
Import all the libraries first
# Import libraries
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.datasets import fetch_lfw_people
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC
After importing the libraries, load the face data and split it into training and test sets
# Load the Labeled Faces in the Wild (LFW) dataset of famous people's faces
faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
X = faces.data
# The label to predict is the id of the person
y = faces.target
target_names = faces.target_names
n_classes = target_names.shape[0]
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
Inspect the number of samples, features, and classes in the dataset
# Introspect the images array to find the shapes (for plotting)
n_samples, h, w = faces.images.shape
# For machine learning we use the flattened pixel data directly
# (relative pixel position info is ignored by this model)
n_features = X.shape[1]
print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)
Output:
Total dataset size:
n_samples: 1288
n_features: 1850
n_classes: 7
Now apply PCA to reduce the dimension
Here we apply PCA to reduce the data from 1850 features to 150 principal components (n_components=150).
# Compute a PCA (eigenfaces) on the face dataset
n_components = 150
print("Extracting the top {} eigenfaces from {} faces".format(n_components, X_train.shape[0]))
pca = PCA(n_components=n_components, svd_solver='randomized', whiten=True).fit(X_train)
# We've gone from 1850 features to 150 components
eigenfaces = pca.components_.reshape((n_components, h, w))
# Transform data into principal components representation
print("Projecting the input data on the eigenfaces orthonormal basis")
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
Output:
Extracting the top 150 eigenfaces from 966 faces
Projecting the input data on the eigenfaces orthonormal basis
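The choice of 150 components trades compression against fidelity. As a quick check (not shown in the original walkthrough), we can print how much of the training-set variance the retained components explain:
# Total fraction of variance retained by the 150 components
print("Variance retained: {:.1%}".format(pca.explained_variance_ratio_.sum()))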
Now train an SVM classifier on the PCA-transformed features
# Train an SVM classification model
print("Fitting the classifier to the training set")
param_grid = {
    'C': [1e3, 5e3, 5e4, 1e5],
    'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1]
}
# Instantiate model
svm = SVC(kernel='rbf', class_weight='balanced', random_state=42)
# GridSearch
clf = GridSearchCV(svm, param_grid)
clf.fit(X_train_pca, y_train)
print(clf.best_estimator_)
Output:
Fitting the classifier to the training set
SVC(C=1000.0, cache_size=200, class_weight='balanced', coef0=0.0,
    decision_function_shape=None, degree=3, gamma=0.001, kernel='rbf',
    max_iter=-1, probability=False, random_state=42, shrinking=True,
    tol=0.001, verbose=False)
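If you only need the winning hyper-parameter values rather than the full estimator, GridSearchCV also exposes them directly:
# Best hyper-parameter combination found by the grid search
print(clf.best_params_)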
Now predict on the test set
# Predict labels for the test data
print("Predicting people's names on the test set")
y_pred = clf.predict(X_test_pca)
Generate the classification report and confusion matrix
# Classification report and confusion matrix
print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))
Output:
                   precision    recall  f1-score   support

     Ariel Sharon       0.47      0.54      0.50        13
     Colin Powell       0.72      0.85      0.78        60
  Donald Rumsfeld       0.67      0.52      0.58        27
    George W Bush       0.84      0.86      0.85       146
Gerhard Schroeder       0.80      0.80      0.80        25
      Hugo Chavez       0.88      0.47      0.61        15
       Tony Blair       0.76      0.69      0.72        36

      avg / total       0.78      0.77      0.77       322
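To see what the retained components look like, here is a minimal plotting sketch (using matplotlib, and assuming the eigenfaces, h, and w variables defined above):
# Display the first 8 eigenfaces as grayscale images
fig, axes = plt.subplots(2, 4, figsize=(8, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(eigenfaces[i], cmap='gray')
    ax.set_title("Eigenface {}".format(i + 1))
    ax.axis('off')
plt.tight_layout()
plt.show()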