
Replacing strings with numbers in Python for Data Analysis with pandas

Updated: Mar 23, 2021


In this post, we will learn how to convert string data into numbers so that it can be fitted into machine learning (ML) models.


Sometimes data is given in string format, which many ML models cannot accept directly. To solve this, split the data into training and testing sets and convert the string values to numeric values before fitting the model.


Suppose the dataset has a sex column that we want to use in an ML model. We need to change its values to numbers, like this:


F (Female) - 1

M (Male) - 2



Here we replace F with 1 and M with 2. The steps below show how to do this in Python:


Step 1:


Read the CSV file using pandas:



>>> import pandas as pd
>>> df = pd.read_csv('mydata.csv')
>>> df.head()


After this, remove all rows that contain NaN values from the dataset:



>>> data = df.dropna()
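
To see what dropna() does, here is a small sketch on a made-up DataFrame. The column names and values are assumptions for illustration, not the contents of mydata.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two of the four rows contain a missing value.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan"],
    "sex": ["F", "M", np.nan, "M"],
    "age": [31.0, np.nan, 45.0, 28.0],
})

# dropna() returns a new DataFrame without the rows that contain any NaN;
# df itself is left unchanged.
data = df.dropna()

print(len(df), len(data))
print(data["name"].tolist())
```

Note that dropna() removes a whole row even if only one of its columns is missing, so it is worth checking how many rows remain afterwards.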



Step 2:


Divide the data into features (x) and a target (y). We will use one column as the target for prediction.


# separate the features from the target
# ('target column' is a placeholder for the name of your target column)


>>> x = data.drop('target column', axis=1)
>>> y = data['target column']


Now we will split the data into training and testing sets:



>>> from sklearn.model_selection import train_test_split
>>> x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.5)
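
By default the split is shuffled randomly, so each run produces different subsets. The following sketch, on a made-up dataset (the `age` column and all values are assumptions), passes `random_state` to make the split repeatable and prints the resulting shapes.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical features and target, ten rows each.
x = pd.DataFrame({"age": range(20, 30), "sex": ["F", "M"] * 5})
y = pd.Series([0, 1] * 5, name="target")

# test_size=0.5 keeps half of the rows for testing;
# random_state fixes the shuffle so the split is repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=42
)

print(x_train.shape, x_test.shape)
```

The rows of x_train and y_train stay aligned by index, which is what allows them to be passed to a model together later.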


Next, we will change the string values to numeric values so that the data can be fitted into ML models for predictions.


First, drop the unnecessary column from the data using drop:



>>> x_train_data = x_train.drop(['name'],axis=1)


Step 3:



>>> # map each string label in the sex column to its numeric code
>>> x_train_data['sex'] = x_train_data['sex'].map({'F': 1, 'M': 2})


This updates every value in the sex column to a numeric value.
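
The test split needs the same conversion before it can be passed to a model. Here is a minimal sketch, assuming a hypothetical helper named `encode_sex` and made-up data. The helper is not part of pandas; it only wraps the mapping so that both splits are treated identically.

```python
import pandas as pd

# The same codes as above: F -> 1, M -> 2.
SEX_CODES = {"F": 1, "M": 2}


def encode_sex(frame):
    """Return a copy of frame with the sex column mapped to numeric codes.

    encode_sex is a hypothetical helper written for this example.
    """
    out = frame.copy()
    out["sex"] = out["sex"].map(SEX_CODES)
    # map() yields NaN for any value missing from SEX_CODES, so fail loudly.
    if out["sex"].isna().any():
        raise ValueError("sex column contains values other than 'F' or 'M'")
    return out


# Made-up stand-ins for the real training and testing features.
x_train = pd.DataFrame({"age": [31, 28], "sex": ["F", "M"]})
x_test = pd.DataFrame({"age": [45, 52], "sex": ["M", "F"]})

x_train_data = encode_sex(x_train)
x_test_data = encode_sex(x_test)

print(x_test_data["sex"].tolist())
```

Working on a copy leaves the original split untouched, and the explicit check catches unexpected labels such as a lowercase 'f' instead of silently producing NaN.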


Step 4:


After this, we will fit the data into a model.


Fit a Logistic Regression model:



>>> from sklearn.linear_model import LogisticRegression
>>> model = LogisticRegression()
>>> fit = model.fit(x_train_data, y_train)
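
Putting the steps together, here is an end-to-end sketch on a small made-up dataset. The column names (`age`, `target`) and all values are assumptions for illustration, since the original data is not shown. For brevity it maps the sex column before splitting, and it adds `random_state` and `stratify` (neither is used in the steps above) so that the tiny example is repeatable and both classes appear in the training set.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical dataset standing in for the contents of the CSV file.
data = pd.DataFrame({
    "name": [f"person{i}" for i in range(12)],
    "sex": ["F", "M"] * 6,
    "age": [22, 35, 41, 29, 53, 47, 31, 38, 26, 60, 44, 33],
    "target": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# Separate features and target, dropping the column the model cannot use.
x = data.drop(["target", "name"], axis=1)
y = data["target"]

# Replace the string labels with numeric codes.
x["sex"] = x["sex"].map({"F": 1, "M": 2})

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0, stratify=y
)

model = LogisticRegression()
model.fit(x_train, y_train)

# Predict on the held-out rows; one prediction per test row.
predictions = model.predict(x_test)
print(predictions.shape)
```

The predictions can then be compared with y_test, for example with `model.score(x_test, y_test)`, to estimate how well the model does on unseen data.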


I hope this is helpful. If you need to make predictions with other models, or need help with any type of ML model, contact us here.

We have a highly professional team of experts who can help with any type of machine learning or data science problem and deliver solutions by your due date.



