Dimensionality Reduction: Feature Transformation using PCA
Here we use Principal Component Analysis (PCA) to reduce the dimensionality of the data. In this example, we use the Iris dataset.
Import the required libraries.
# Import
from sklearn.decomposition import PCA
from sklearn import datasets
Now load the dataset.
# Create data
iris = datasets.load_iris()
X = iris.data
y = iris.target
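Standardize the features so that each one has zero mean and unit variance. PCA is sensitive to the scale of the features, so this step keeps features with large numeric ranges from dominating the principal components.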
from sklearn.preprocessing import StandardScaler
X_std = StandardScaler().fit_transform(X)
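To see what the scaler does, the sketch below recomputes the standardization by hand with NumPy and checks that every column ends up with (approximately) zero mean and unit standard deviation. It uses only the iris data loaded above; note that StandardScaler divides by the population standard deviation (ddof=0), which is also NumPy's default.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# StandardScaler computes z = (x - mean) / std for each column,
# using the population standard deviation (ddof=0).
X_std = StandardScaler().fit_transform(X)

# The same transformation written out by hand with NumPy.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_std, X_manual))  # True
print(X_std.mean(axis=0))            # values very close to 0
print(X_std.std(axis=0))             # values very close to 1
```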
Now apply PCA, keeping the first two principal components.
# Instantiate
pca = PCA(n_components=2)
# Fit PCA and apply the dimensionality reduction to the standardized data
X_pca = pca.fit_transform(X_std)
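Finding the eigenvalues
# Variance captured by each component: the eigenvalues of the covariance matrix
eigenvalues = pca.explained_variance_
print(eigenvalues)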
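In scikit-learn the eigenvalues are exposed as `explained_variance_`. As a sanity check, this sketch compares them with the eigenvalues of the sample covariance matrix computed directly with NumPy (`np.cov` divides by n - 1, as scikit-learn does), and shows that `explained_variance_ratio_` is each eigenvalue divided by the sum of all of them.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(load_iris().data)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

# Eigenvalues of the sample covariance matrix (np.cov uses n - 1 by default).
cov = np.cov(X_std, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order; reverse it

# explained_variance_ holds the largest eigenvalues, in decreasing order.
print(pca.explained_variance_)
print(eigvals[:2])

# explained_variance_ratio_ is each eigenvalue over the total variance.
print(pca.explained_variance_ratio_)
print(eigvals[:2] / eigvals.sum())
```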
Accessing Components
# Access components
pc_1 = pca.components_[0]
pc_2 = pca.components_[1]
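Each row of `components_` is a principal axis expressed in terms of the original four features. The sketch below checks that the two axes are orthonormal and that `transform()` amounts to centering the data and projecting it onto those axes (this holds for the default `whiten=False`).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=2).fit(X_std)

pc_1 = pca.components_[0]
pc_2 = pca.components_[1]

# Each component is a unit-length direction in the 4-D feature space,
# and the two components are orthogonal to each other.
print(np.linalg.norm(pc_1), np.linalg.norm(pc_2))  # both close to 1
print(np.dot(pc_1, pc_2))                          # close to 0

# transform() subtracts the training mean and projects onto the components.
manual = (X_std - pca.mean_) @ pca.components_.T
print(np.allclose(manual, pca.transform(X_std)))  # True
```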
PCA for Facial Recognition
In this example, we use PCA for face recognition, reducing the dimensionality from 1850 features (pixels) to 150 principal components.
First, import the libraries.
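How many components to keep is a judgment call; a common way to make it is to look at the cumulative explained variance ratio. The sketch below uses hypothetical synthetic data (500 samples, 50 features generated from 5 latent factors plus a little noise) rather than the face images, and shows that passing a float to `n_components` lets scikit-learn pick the smallest number of components whose cumulative ratio exceeds that threshold.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 50 observed features driven by 5 latent factors plus noise.
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 50))

pca = PCA().fit(X)  # keep every component
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest k whose cumulative explained variance ratio exceeds 95%.
k = int(np.searchsorted(cumulative, 0.95, side="right") + 1)
print(k, cumulative[k - 1])

# A float in (0, 1) asks scikit-learn to make the same choice itself.
pca_95 = PCA(n_components=0.95, svd_solver="full").fit(X)
print(pca_95.n_components_)
```

With only five underlying factors, almost all of the variance is captured by at most five components, so the remaining 45 dimensions mostly carry noise.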
#import libraries
from time import time
import logging
import pylab as pl
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report, confusion_matrix, f1_score
# RandomizedPCA no longer exists in scikit-learn; PCA(svd_solver='randomized') replaces it
from sklearn.decomposition import PCA as RandomizedPCA
from sklearn.svm import SVC
After importing the libraries, load the data and split it into training and test sets.
# Data of famous people's faces
faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
X = faces.data
y = faces.target
target_names = faces.target_names
n_classes = target_names.shape[0]
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
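With 1288 samples and `test_size=0.25`, the split leaves 322 faces for testing and 966 for training, which matches the 966 faces reported by the PCA step further down. The sketch below uses zero-filled placeholder arrays of the same size instead of downloading the LFW data; the labels are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the LFW subset used here.
X = np.zeros((1288, 1850))
y = np.arange(1288) % 7  # hypothetical labels for 7 classes

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
print(X_train.shape, X_test.shape)  # (966, 1850) (322, 1850)
```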
Inspect the image arrays to find their shape, then print the size of the whole dataset: the number of samples, features, and classes.
# Introspect the images arrays to find the shapes (for plotting)
n_samples, h, w = faces.images.shape
# For machine learning we use the data directly (as relative pixel
# position info is ignored by this model)
X = faces.data
n_features = X.shape[1]
# the label to predict is the id of the person
y = faces.target
target_names = faces.target_names
n_classes = target_names.shape[0]
print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)
Total dataset size:
n_samples: 1288
n_features: 1850
n_classes: 7
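The 1850 features are the pixels of each image laid out in a single row. Assuming the `resize=0.4` images are 50 x 37 pixels (50 x 37 = 1850), the sketch below shows that flattening on a zero-filled stand-in for `faces.images`, together with the reverse reshape that is later applied to the eigenfaces.

```python
import numpy as np

# Assumption: with resize=0.4, fetch_lfw_people returns 50 x 37 pixel images.
n_samples, h, w = 1288, 50, 37
images = np.zeros((n_samples, h, w))  # stand-in for faces.images

# faces.data holds the same images flattened row by row, one face per row.
data = images.reshape(n_samples, h * w)
print(data.shape)  # (1288, 1850)

# Reshaping a single row gives back the original 2-D image.
first_face = data[0].reshape(h, w)
print(first_face.shape)  # (50, 37)
```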
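Now apply PCA to reduce the dimensionality.
Here we reduce the 1850 pixel features to 150 principal components (n_components = 150). PCA is fitted on the training set only, and the fitted model is then used to project both the training and the test sets.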
# Compute a PCA (eigenfaces) on the face dataset
n_components = 150
print("Extracting the top {} eigenfaces from {} faces".format(n_components, X_train.shape[0]))
pca = RandomizedPCA(n_components=n_components, svd_solver='randomized', whiten=True).fit(X_train)
# We've gone from 1850 to 150 dimensions
eigenfaces = pca.components_.reshape((n_components, h, w))
# Transform data into principal components representation
print("Projecting the input data on the eigenfaces orthonormal basis")
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
Extracting the top 150 eigenfaces from 966 faces
Projecting the input data on the eigenfaces orthonormal basis
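The PCA above is created with `whiten=True`. Whitening divides each projected component by the square root of its eigenvalue, so every component ends up with unit variance; one common motivation is that the RBF kernel used next then treats all components on the same scale instead of being dominated by the first few eigenfaces. The sketch below checks this on hypothetical random correlated data rather than the faces.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical correlated data: 200 samples, 50 features.
X = rng.normal(size=(200, 50)) @ rng.normal(size=(50, 50))

# Whitened projection: every component has (sample) variance 1.
pca_w = PCA(n_components=10, whiten=True, svd_solver="randomized", random_state=0)
Z_white = pca_w.fit_transform(X)
print(Z_white.var(axis=0, ddof=1))  # values very close to 1

# Without whitening, each component's variance equals its eigenvalue.
pca_raw = PCA(n_components=10, svd_solver="randomized", random_state=0)
Z_raw = pca_raw.fit_transform(X)
print(np.allclose(Z_raw.var(axis=0, ddof=1), pca_raw.explained_variance_))  # True
```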
Now train a support vector machine (SVM) classifier on the PCA-transformed training data, using a grid search to tune the C and gamma parameters.
# Train an SVM classification model
print("Fitting the classifier to the training set")
param_grid = {
    'C': [1e3, 5e3, 5e4, 1e5],
    'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}
# Instantiate model
svm = SVC(kernel='rbf', class_weight='balanced', random_state=42)
# GridSearch
clf = GridSearchCV(svm, param_grid)
clf.fit(X_train_pca, y_train)
# Best model found by the grid search
print(clf.best_estimator_)
Fitting the classifier to the training set
SVC(C=1000.0, cache_size=200, class_weight='balanced', coef0=0.0,
  decision_function_shape=None, degree=3, gamma=0.001, kernel='rbf',
  max_iter=-1, probability=False, random_state=42, shrinking=True,
  tol=0.001, verbose=False)
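In the code above, PCA is fitted on the whole training set before the grid search runs, so the PCA used inside each cross-validation fold has already seen that fold's validation rows. One way to avoid this, shown here as an alternative sketch rather than the approach used above, is to wrap PCA and the SVM in a `Pipeline` so that PCA is refitted inside every fold. The data is a hypothetical synthetic stand-in from `make_classification`, and the grid values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical stand-in for the face data: 300 samples, 100 features, 3 classes.
X, y = make_classification(n_samples=300, n_features=100, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

pipe = Pipeline([
    ("pca", PCA(n_components=20, whiten=True, svd_solver="randomized", random_state=42)),
    ("svc", SVC(kernel="rbf", class_weight="balanced", random_state=42)),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
param_grid = {
    "svc__C": [1e3, 5e3, 1e4],
    "svc__gamma": [0.001, 0.01, 0.1],
}
clf = GridSearchCV(pipe, param_grid, cv=5)
clf.fit(X_train, y_train)

print(clf.best_params_)                           # chosen C and gamma
print(clf.best_estimator_.named_steps["svc"])     # the refitted best SVC
test_accuracy = clf.score(X_test, y_test)
print(test_accuracy)
```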
Now predict the labels of the test set.
# Predict on the test data
print("Predicting the people names on the testing set")
y_pred = clf.predict(X_test_pca)
Evaluate the predictions with a classification report.
# Classification report and confusion matrix
print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))
                   precision    recall  f1-score   support

     Ariel Sharon       0.47      0.54      0.50        13
     Colin Powell       0.72      0.85      0.78        60
  Donald Rumsfeld       0.67      0.52      0.58        27
    George W Bush       0.84      0.86      0.85       146
Gerhard Schroeder       0.80      0.80      0.80        25
      Hugo Chavez       0.88      0.47      0.61        15
       Tony Blair       0.76      0.69      0.72        36

      avg / total       0.78      0.77      0.77       322
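The per-class numbers in the report can be read straight off the confusion matrix: precision is the fraction of predictions of a class that were correct, recall is the fraction of that class's true members that were found, and f1 is their harmonic mean. The "avg / total" row is the support-weighted mean (newer scikit-learn versions label it "weighted avg" and also print "accuracy" and "macro avg" rows). The sketch below checks this on a small hypothetical set of labels.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_recall_fscore_support

# Hypothetical labels for a three-class problem.
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0, 2, 2]

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
tp = np.diag(cm)
precision = tp / cm.sum(axis=0)  # correct predictions / all predictions of the class
recall = tp / cm.sum(axis=1)     # correct predictions / all true members of the class
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)         # number of true members of each class

p, r, f, s = precision_recall_fscore_support(y_true, y_pred)
print(np.allclose(precision, p), np.allclose(recall, r), np.allclose(f1, f))  # True True True

# Support-weighted average, as in the "avg / total" row.
weighted_f1 = np.average(f1, weights=support)
print(np.isclose(weighted_f1, f1_score(y_true, y_pred, average="weighted")))  # True
```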
