In machine learning, non-numeric (categorical) data must be converted to numeric form before most algorithms can use it. This conversion is called encoding. Several encoding techniques are commonly used:
Label Encoder
One-Hot Encoding
Binary Encoding
Hashing
Target Encoding
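As a rough illustration of how the first two techniques in the list differ, here is a minimal sketch on a hypothetical toy column (the `sizes` series and its values are assumptions made up for this example, not part of any dataset used below). Label encoding produces a single column of integer codes, while one-hot encoding, done here with pandas `get_dummies`, produces one indicator column per category.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column with three categories
sizes = pd.Series(["small", "large", "medium", "small"], name="size")

# Label encoding: one integer per category, kept in a single column.
# Codes follow the sorted order of the labels: large=0, medium=1, small=2.
label_codes = LabelEncoder().fit_transform(sizes)
print(label_codes.tolist())  # [2, 0, 1, 2]

# One-hot encoding: one indicator column per category
one_hot = pd.get_dummies(sizes, prefix="size")
print(one_hot.columns.tolist())  # ['size_large', 'size_medium', 'size_small']
```

Note that the integer codes imply an ordering (large < medium < small) that has nothing to do with the actual sizes, which is one reason one-hot encoding is often preferred for input features.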
Let's go through the Label Encoder, which assigns each distinct category an integer code:
First, import pandas and the LabelEncoder class from scikit-learn:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
Next, fit the label encoder on a data frame column and replace the column with its encoded values:
labelencoder = LabelEncoder()
df['x'] = labelencoder.fit_transform(df['x'])
Here 'x' is the name of the data frame column to encode.
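To see what fit_transform actually returns, the snippet above can be tried on a small data frame. This is only a sketch: the toy data frame and its colour values are hypothetical, chosen so the result is easy to check by hand.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; the 'x' column holds non-numeric categories
df = pd.DataFrame({"x": ["red", "green", "blue", "green", "red"]})

labelencoder = LabelEncoder()
df["x"] = labelencoder.fit_transform(df["x"])

# classes_ lists the original labels in sorted order;
# a label's position in this list is its integer code
print(labelencoder.classes_.tolist())  # ['blue', 'green', 'red']
print(df["x"].tolist())                # [2, 1, 0, 1, 2]
```

The codes run from 0 to the number of distinct labels minus one, and they are assigned by sorted label order, not by order of appearance.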
Example:
import pandas as pd
import numpy as np
# Define the headers since the data does not have any
headers = ["A", "B", "C", "D", "E","F", "G", "I", "J","K", "L", "M", "N", "O","P", "Q", "R", "S","T", "U", "V", "W", "X","Y", "Z", "A1"]
# Read in the CSV file and convert "?" to NaN
df = pd.read_csv("http://mlr.cs.umass.edu/ml/machine-learning-databases/autos/imports-85.data",
header=None, names=headers, na_values="?" )
df.head()
Now we apply label encoding to column 'E', which is non-numeric:
# Convert column 'E' of the pandas data frame to numeric form using label encoding
from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
df['E'] = labelencoder.fit_transform(df['E'])
Displaying the data frame again, for example with df.head(), shows that column 'E' has been successfully encoded in numeric form.
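Once a column is encoded, it is often useful to check which code stands for which label, or to get the original labels back. The following is a minimal sketch using a small stand-in data frame so that it runs without downloading anything. The values "std" and "turbo" are hypothetical placeholders for the categories of column 'E', and the `mapping` dictionary is a helper introduced only for this example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for a categorical column such as 'E'
df = pd.DataFrame({"E": ["std", "turbo", "std", "std", "turbo"]})

labelencoder = LabelEncoder()
df["E"] = labelencoder.fit_transform(df["E"])

# Build a label -> code lookup from the fitted encoder
mapping = {str(label): code for code, label in enumerate(labelencoder.classes_)}
print(mapping)  # {'std': 0, 'turbo': 1}

# inverse_transform recovers the original labels from the integer codes
original = labelencoder.inverse_transform(df["E"])
print(original.tolist())  # ['std', 'turbo', 'std', 'std', 'turbo']
```

Keeping the fitted labelencoder object around is what makes this possible. If the column is overwritten and the encoder is discarded, the original labels cannot be recovered from the codes alone.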