Introduction
Speech Emotion Recognition (SER) is the task of recognizing human emotions and affective states from speech. It capitalizes on the fact that the voice often reflects underlying emotion through tone and pitch, the same cues that animals such as dogs and horses use to understand human emotion.
First, Install the Required Libraries:
Librosa is a Python library for analyzing audio and music. It offers a flat package layout, standardized interfaces and names, backwards compatibility, modular functions, and readable code.
After this, start a Jupyter notebook and install the related packages:
#install all the related libraries
pip install librosa soundfile numpy scikit-learn pyaudio
Import libraries
#import all the libraries
import librosa
import soundfile
import os, glob, pickle
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
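As a quick sanity check that librosa installed correctly, you can load any audio file (the path below is just a placeholder) and inspect its sampling rate and duration:
#sanity check: load a clip and print its sampling rate and duration (placeholder path)
signal, sr = librosa.load("example.wav", sr=None)
print(sr, librosa.get_duration(y=signal, sr=sr))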
Extract the mfcc, chroma, and mel features from a sound file. Here mfcc (Mel-Frequency Cepstral Coefficients) represents the short-term power spectrum of the sound, chroma pertains to the 12 pitch classes, and mel is the Mel-scaled spectrogram.
#Extract features (mfcc, chroma, mel) from a sound file
def extract_feature(file_name, mfcc, chroma, mel):
    with soundfile.SoundFile(file_name) as sound_file:
        X = sound_file.read(dtype="float32")
        sample_rate = sound_file.samplerate
        if chroma:
            #chroma is computed from the short-time Fourier transform
            stft = np.abs(librosa.stft(X))
        result = np.array([])
        if mfcc:
            mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
            result = np.hstack((result, mfccs))
        if chroma:
            chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
            result = np.hstack((result, chroma))
        if mel:
            mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
            result = np.hstack((result, mel))
    return result
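For example, calling the function on a single clip (the filename below is just a placeholder) returns one flat feature vector; with the settings used here that is 180 values per file: 40 MFCCs, 12 chroma values, and 128 mel bands.
#example: extract all three feature types from one file (placeholder name)
feature = extract_feature("example.wav", mfcc=True, chroma=True, mel=True)
print(feature.shape)  #(180,) = 40 mfcc + 12 chroma + 128 mel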
Creating the dictionary of emotions
#creating the emotion dictionary
emotions={
    '01':'neutral',
    '02':'calm',
    '03':'happy',
    '04':'sad',
    '05':'angry',
    '06':'fearful',
    '07':'disgust',
    '08':'surprised'
}
#Emotions to observe
observed_emotions=['calm', 'happy', 'fearful', 'disgust']
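These two-digit codes follow the RAVDESS dataset's filename convention, in which the third hyphen-separated field encodes the emotion. The sketch below, with a made-up filename, shows how load_data() will recover the label:
#the third field of a RAVDESS-style filename is the emotion code
file_name = "03-01-06-01-02-01-12.wav"   #made-up example
print(emotions[file_name.split("-")[2]]) #prints 'fearful'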
Now load the data
Load the data with the function load_data(), which takes the relative size of the test set as a parameter. It walks over the dataset's .wav files (adjust the glob pattern below to wherever your dataset lives), skips emotions that are not in observed_emotions, and extracts features from the rest.
def load_data(test_size=0.2):
    x, y = [], []
    #adjust this glob pattern to the location of your dataset's .wav files
    for file in glob.glob("data/Actor_*/*.wav"):
        file_name = os.path.basename(file)
        #the third field of the filename encodes the emotion
        emotion = emotions[file_name.split("-")[2]]
        if emotion not in observed_emotions:
            continue
        feature = extract_feature(file, mfcc=True, chroma=True, mel=True)
        x.append(feature)
        y.append(emotion)
    return train_test_split(np.array(x), y, test_size=test_size, random_state=9)
Split the dataset
#Split Data Sets
x_train,x_test,y_train,y_test=load_data(test_size=0.25)
Get the shape of the training and testing datasets
#printing the shape of datasets
print((x_train.shape[0], x_test.shape[0]))
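You can also print how many features were extracted per sample; with the feature settings above this should be 180:
#printing the number of features per sample
print(f'Features extracted: {x_train.shape[1]}')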
Training the model
Initialize the Multi-Layer Perceptron classifier. An MLP is a feedforward neural network; here it has a single hidden layer of 300 units, an adaptive learning rate, and trains for up to 500 iterations.
#initialize the model
model=MLPClassifier(alpha=0.01, batch_size=256, epsilon=1e-08, hidden_layer_sizes=(300,), learning_rate='adaptive', max_iter=500)
Fit/train the model.
#fit into the model
model.fit(x_train,y_train)
Predict for the test set and find the accuracy
#predict for the test set
y_pred=model.predict(x_test)
#find the accuracy
accuracy=accuracy_score(y_true=y_test, y_pred=y_pred)
#Print the accuracy
print("Accuracy: {:.2f}%".format(accuracy*100))