
Chatbot Assignment Help

Updated: May 11, 2022





The aim of the project is to build an intelligent conversational chatbot, Riki, that can understand complex queries from the user and respond intelligently.


Background


R-Intelligence Inc., an AI startup, has partnered with bluedit.io, an online chat and discussion website. The site averages over 5 million active customers across the globe and more than 100,000 active chat rooms. Because of the increased traffic, they want to improve the user experience with a chatbot moderator, Riki, which helps users engage in meaningful conversation and keeps them updated on trending topics while they chat. The Artificial Intelligence-powered chat experience gives customers easy access to information and a host of options.


Business Requirement


R-Intelligence Inc. has invested in Python, PySpark, and TensorFlow. Using the emerging technologies of Artificial Intelligence, Machine Learning, and Natural Language Processing, Riki the chatbot should make the whole conversation as realistic as talking to an actual human.


The chatbot should understand that users have different intents and make these simple to handle by presenting users with the options and recommendations that best suit their needs.


Suggested Approach


R-Intelligence Inc. uses an approach based only on Natural Language Processing, in which seq2seq models (an encoder and a decoder) serve as the state-of-the-art approach to end-to-end text generation for a conversational bot.


Tasks to be performed


Download the GloVe model available at https://nlp.stanford.edu/projects/glove/. Specification: Twitter (2B tweets, 27B tokens, 1.2M vocab, uncased, 25d, 50d, 100d, & 200d vectors, 1.42 GB download): glove.twitter.27B.zip


Load the GloVe word embeddings into a dictionary where the key is a unique word token and the value is a d-dimensional vector.
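A minimal sketch of the loading step, assuming the 100d file from the archive (`glove.twitter.27B.100d.txt`) has been unzipped locally. The function names `parse_glove_lines` and `load_glove` are hypothetical, and the sketch relies on the GloVe text layout of one token followed by its space-separated vector components per line.

```python
import numpy as np


def parse_glove_lines(lines):
    """Turn an iterable of GloVe text lines into a {word: vector} dict."""
    word2em = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        # First field is the token, the rest are the vector components.
        word2em[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return word2em


def load_glove(path):
    """Read a GloVe text file from disk (the path is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return parse_glove_lines(f)
```

For example, `load_glove("glove.twitter.27B.100d.txt")` would return a dictionary whose values are 100-dimensional vectors, provided the file sits in the working directory.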


Data Preparation - Filter the conversations to a maximum word length and convert the dialogue pairs into input texts and target texts. Add start and end tokens to mark the beginning and end of each target sentence.
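One way the data preparation could look is sketched below. Several choices here are assumptions rather than part of the brief: pairs longer than the limit are dropped instead of truncated, tokenization is a plain lowercase whitespace split, the limit of 20 words is arbitrary, and the `<start>` and `<end>` token strings and the name `build_pairs` are hypothetical.

```python
MAX_WORDS = 20  # assumed limit, tune for your data


def build_pairs(conversations, max_words=MAX_WORDS):
    """Build (input, target) word lists from consecutive utterances.

    conversations: list of conversations, each a list of utterance strings
    in the order they were spoken.
    """
    input_texts, target_texts = [], []
    for utterances in conversations:
        # Each utterance is the input for the one that follows it.
        for prev_line, next_line in zip(utterances, utterances[1:]):
            in_words = prev_line.lower().split()
            out_words = next_line.lower().split()
            if not in_words or not out_words:
                continue
            if len(in_words) > max_words or len(out_words) > max_words:
                continue  # drop pairs that exceed the limit
            input_texts.append(in_words)
            # Only the target side gets start and end markers.
            target_texts.append(["<start>"] + out_words + ["<end>"])
    return input_texts, target_texts
```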


Create two dictionaries:


target_word2id

target_id2word

and save them to disk in NumPy file format.
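The two dictionaries could be built and stored roughly as follows. This is a sketch under assumptions: id 0 is reserved for a hypothetical `<unk>` token, ids are assigned by descending frequency, and the helper names are invented for illustration. `np.save` stores a Python dict as a pickled object array, so reading it back needs `allow_pickle=True` and `.item()`.

```python
from collections import Counter

import numpy as np


def build_target_vocab(target_texts, max_vocab_size=None):
    """Map target words to integer ids and back."""
    counts = Counter(word for sentence in target_texts for word in sentence)
    target_word2id = {"<unk>": 0}  # assumed placeholder for unseen words
    for word, _ in counts.most_common(max_vocab_size):
        target_word2id.setdefault(word, len(target_word2id))
    target_id2word = {idx: word for word, idx in target_word2id.items()}
    return target_word2id, target_id2word


def save_vocab(path, mapping):
    """Save a dict to a .npy file (NumPy pickles it as an object array)."""
    np.save(path, mapping)


def load_vocab(path):
    """Load a dict that was written by save_vocab."""
    return np.load(path, allow_pickle=True).item()
```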


• Prepare the input data with embedding. The input data is a list of lists:

First list is a list of sentences

Each sentence is a list of words


• Generate training data per batch


• Define the model architecture and perform the following steps:


Step 1: Use an LSTM encoder to encode the input words in the form (encoder outputs, encoder hidden state, encoder context).


Step 2: Use an LSTM decoder to encode the target words in the form (decoder outputs, decoder hidden state, decoder context). Use the encoder hidden state and encoder context (which represent the input memory) as the decoder's initial state.


Step 3: Use a dense layer to predict the next token from the vocabulary, given the decoder outputs generated in Step 2.


Step 4: Compile the model with loss='categorical_crossentropy' and optimizer='rmsprop'.


• Generate the model summary

• Finally generate the prediction
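The batch generation and the four model steps above can be sketched with Keras as shown below. This is an illustration under assumptions, not the reference solution: "context" is taken to mean the LSTM cell state, decoder inputs and targets are one-hot encoded from target ids, the hidden size of 256 is arbitrary, and the names `generate_batches` and `build_seq2seq` are hypothetical.

```python
import numpy as np
from tensorflow.keras.layers import Dense, Input, LSTM
from tensorflow.keras.models import Model


def generate_batches(encoder_seqs, target_id_seqs, num_decoder_tokens, batch_size):
    """Yield ((encoder_input, decoder_input), decoder_target) batches forever.

    encoder_seqs: list of sentences, each a list of embedding vectors.
    target_id_seqs: list of target sentences as token ids, start/end included.
    """
    embedding_dim = len(encoder_seqs[0][0])
    enc_len = max(len(s) for s in encoder_seqs)
    dec_len = max(len(s) for s in target_id_seqs)
    while True:
        for start in range(0, len(encoder_seqs), batch_size):
            enc_batch = encoder_seqs[start:start + batch_size]
            tgt_batch = target_id_seqs[start:start + batch_size]
            n = len(enc_batch)
            enc_in = np.zeros((n, enc_len, embedding_dim), dtype="float32")
            dec_in = np.zeros((n, dec_len, num_decoder_tokens), dtype="float32")
            dec_out = np.zeros((n, dec_len, num_decoder_tokens), dtype="float32")
            for i, (vectors, ids) in enumerate(zip(enc_batch, tgt_batch)):
                for t, vector in enumerate(vectors):
                    enc_in[i, t] = vector
                for t, token_id in enumerate(ids):
                    dec_in[i, t, token_id] = 1.0
                    if t > 0:
                        # Target is the decoder input shifted one step left.
                        dec_out[i, t - 1, token_id] = 1.0
            yield (enc_in, dec_in), dec_out


def build_seq2seq(embedding_dim, num_decoder_tokens, hidden_units=256):
    # Step 1: encoder returns its outputs plus hidden and cell states.
    encoder_inputs = Input(shape=(None, embedding_dim), name="encoder_inputs")
    encoder_lstm = LSTM(hidden_units, return_state=True, name="encoder_lstm")
    _, state_h, state_c = encoder_lstm(encoder_inputs)

    # Step 2: decoder starts from the encoder's final states.
    decoder_inputs = Input(shape=(None, num_decoder_tokens), name="decoder_inputs")
    decoder_lstm = LSTM(hidden_units, return_sequences=True, return_state=True,
                        name="decoder_lstm")
    decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                         initial_state=[state_h, state_c])

    # Step 3: dense softmax layer scores every vocabulary token per step.
    decoder_dense = Dense(num_decoder_tokens, activation="softmax",
                          name="decoder_dense")
    outputs = decoder_dense(decoder_outputs)

    # Step 4: compile with the loss and optimizer named in the task list.
    model = Model([encoder_inputs, decoder_inputs], outputs)
    model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
    return model
```

Calling `build_seq2seq(100, len(target_word2id)).summary()` would print the layer table for the summary task, assuming 100d GloVe vectors. Generating replies at prediction time usually needs separate encoder and decoder inference models that reuse these layers and feed each predicted token back in; that part is not shown here.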


Dataset Description

Dataset: Cornell Movie Dialogue corpus


Brief Description

This corpus contains a large metadata-rich collection of fictional conversations extracted from raw movie scripts:


  • 220,579 conversational exchanges between 10,292 pairs of movie characters

  • involves 9,035 characters from 617 movies

  • in total 304,713 utterances

  • movie metadata included:

- genres

- release year

- IMDB rating

- number of IMDB votes


  • character metadata included:

- gender (for 3,774 characters)

- position on movie credits (3,321 characters)



File Description

In all files the field separator is " +++$+++ "


movie_titles_metadata.txt


Contains information about each movie title


fields:

movieID,

movie title,

movie year,

IMDB rating,

no. IMDB votes,

genres in the format ['genre1', 'genre2', …, 'genreN']


movie_characters_metadata.txt


Contains information about each movie character


fields:

characterID

character name

movieID

movie title

gender ("?" for unlabeled cases)

position in credits ("?" for unlabeled cases)


movie_lines.txt

Contains the actual text of each utterance


fields:

lineID

characterID (who uttered this phrase)

movieID

character name

text of the utterance
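Reading this file could look like the sketch below, which splits each row on the field separator. It assumes the separator is the string ` +++$+++ ` with one space on each side and that the file can be decoded as ISO-8859-1; the dictionary keys and function names are hypothetical.

```python
SEPARATOR = " +++$+++ "
FIELDS = ("lineID", "characterID", "movieID", "character_name", "text")


def parse_movie_line(raw):
    """Split one row of movie_lines.txt into a dict, or None if malformed."""
    # Limit the split so separators inside the utterance text are kept.
    parts = raw.rstrip("\n").split(SEPARATOR, len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def load_movie_lines(path):
    """Return {lineID: utterance text} for every well-formed row."""
    id2text = {}
    # The encoding is an assumption; adjust if decoding fails.
    with open(path, encoding="iso-8859-1") as f:
        for raw in f:
            row = parse_movie_line(raw)
            if row is not None:
                id2text[row["lineID"]] = row["text"]
    return id2text
```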


raw_script_urls.txt


Contains the URLs from which the raw sources were retrieved


How to Start with the Project?


1. Log in to Google Colab and load the notebook into the environment. Go to Runtime and choose "Change runtime type". For faster training, choose GPU as the hardware accelerator and save.


2. Open the "Chatbot.ipynb" notebook and start filling in the code.


3. Import all the necessary Python packages: NumPy and pandas for numerical processing, data importing, preprocessing, etc.; scikit-learn for splitting datasets; and Keras/TensorFlow for deep learning model creation, training, testing, inference, etc.


4. From here you can take over the project and start building the conversational chatbot.


How Can Codersarts Help You with Chatbots?


Codersarts provides:


  • Chatbot Assignment help

  • Chatbot Error Resolving Help

  • Mentorship in Chatbot from Experts

  • Chatbot Development Project

If you are looking for any kind of help with a chatbot assignment, contact us.

