INTRODUCTION
In this blog, you will be introduced to the TextBlob library and some of the operations we will perform on a text file containing Jane Austen's novel Emma. You will also be introduced to WordCloud.
IMPLEMENTATION
To begin, we will import the essential libraries.
import re
import nltk
import spacy
import pandas as pd
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')
en_stop = set(nltk.corpus.stopwords.words('english'))
nltk.download('omw-1.4')
nlp = spacy.load('en_core_web_sm')
all_stopwords = nlp.Defaults.stop_words
from textblob import TextBlob
nltk.download('averaged_perceptron_tagger') # for pos
nltk.download('brown') # for noun phrase
from textblob import Word
from operator import itemgetter
import imageio
from wordcloud import WordCloud
import matplotlib.pyplot as plt
Now, we are ready to import the text file. The file is available at Project Gutenberg. To download the file, you can refer to this link: Emma By Jane Austen
text = open("/content/emma.text", "r")
get_text = text.read()
text.close()
no_specials_string = re.sub('[!#?,:";]', ' ', get_text)
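As a quick check of what the regular expression does, here is a minimal sketch on a made-up sample string: each of the listed punctuation characters is replaced with a space.

```python
import re

sample = 'Emma Woodhouse, handsome; clever: "rich"!'

# Replace each of the listed punctuation marks with a space.
cleaned = re.sub('[!#?,:";]', ' ', sample)
print(cleaned)
```

Replacing with a space (rather than deleting) keeps words from being glued together when punctuation sits between them.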
Now, we will load the text into TextBlob(), which takes a string as input.
blob = TextBlob(no_specials_string)
blob.sentences[:5]
[Sentence("VOLUME I
CHAPTER I
Emma Woodhouse handsome clever and rich with a comfortable home and
happy disposition seemed to unite some of the best blessings of
existence and had lived nearly twenty one years in the world with very
little to distress or vex her."),
Sentence("She was the youngest of the two daughters of a most affectionate
indulgent father and had in consequence of her sister’s marriage
been mistress of his house from a very early period."),
Sentence("Her mother had
died too long ago for her to have more than an indistinct remembrance
of her caresses and her place had been supplied by an excellent woman
as governess who had fallen little short of a mother in affection."),
Sentence("Sixteen years had Miss Taylor been in Mr. Woodhouse’s family less as a
governess than a friend very fond of both daughters but particularly
of Emma."),
Sentence("Between _them_ it was more the intimacy of sisters.")]
To get the first hundred tokenized words:
blob.words[:100]
WordList(['VOLUME', 'I', 'CHAPTER', 'I', 'Emma', 'Woodhouse', 'handsome', 'clever', 'and', 'rich', 'with', 'a', 'comfortable', 'home', 'and', 'happy', 'disposition', 'seemed', 'to', 'unite', 'some', 'of', 'the', 'best', 'blessings', 'of', 'existence', 'and', 'had', 'lived', 'nearly', 'twenty', 'one', 'years', 'in', 'the', 'world', 'with', 'very', 'little', 'to', 'distress', 'or', 'vex', 'her', 'She', 'was', 'the', 'youngest', 'of', 'the', 'two', 'daughters', 'of', 'a', 'most', 'affectionate', 'indulgent', 'father', 'and', 'had', 'in', 'consequence', 'of', 'her', 'sister', '’', 's', 'marriage', 'been', 'mistress', 'of', 'his', 'house', 'from', 'a', 'very', 'early', 'period', 'Her', 'mother', 'had', 'died', 'too', 'long', 'ago', 'for', 'her', 'to', 'have', 'more', 'than', 'an', 'indistinct', 'remembrance', 'of', 'her', 'caresses', 'and', 'her']
To get the parts of speech, we will use the .tags attribute.
blob.tags[:10]
[('VOLUME', 'NNP'),
('I', 'PRP'),
('CHAPTER', 'VBP'),
('I', 'PRP'),
('Emma', 'NNP'),
('Woodhouse', 'NNP'),
('handsome', 'VBD'),
('clever', 'NN'),
('and', 'CC'),
('rich', 'JJ')]
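Since .tags returns plain (word, tag) tuples, the list is easy to post-process. For example, a short sketch that keeps only noun-tagged words (the sample tuples below are taken from the output above):

```python
# (word, tag) pairs as returned by blob.tags
tags = [('VOLUME', 'NNP'), ('I', 'PRP'), ('Emma', 'NNP'),
        ('Woodhouse', 'NNP'), ('rich', 'JJ')]

# Penn Treebank noun tags all start with 'NN' (NN, NNS, NNP, NNPS).
nouns = [word for word, tag in tags if tag.startswith('NN')]
print(nouns)  # ['VOLUME', 'Emma', 'Woodhouse']
```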
Next, perform sentiment analysis on the first few sentences. Each sentence gets a polarity score (negative to positive, -1.0 to 1.0) and a subjectivity score (objective to subjective, 0.0 to 1.0).
for sentence in blob.sentences[:5]:
    print(sentence)
    print(sentence.sentiment)
    print()
VOLUME I
CHAPTER I
Emma Woodhouse handsome clever and rich with a comfortable home and
happy disposition seemed to unite some of the best blessings of
existence and had lived nearly twenty one years in the world with very
little to distress or vex her.
Sentiment(polarity=0.3872395833333333, subjectivity=0.7166666666666668)
She was the youngest of the two daughters of a most affectionate
indulgent father and had in consequence of her sister’s marriage
been mistress of his house from a very early period.
Sentiment(polarity=0.315, subjectivity=0.445)
Her mother had
died too long ago for her to have more than an indistinct remembrance
of her caresses and her place had been supplied by an excellent woman
as governess who had fallen little short of a mother in affection.
Sentiment(polarity=0.2525, subjectivity=0.5399999999999999)
Sixteen years had Miss Taylor been in Mr. Woodhouse’s family less as a
governess than a friend very fond of both daughters but particularly
of Emma.
Sentiment(polarity=0.06666666666666667, subjectivity=0.2333333333333333)
Between _them_ it was more the intimacy of sisters.
Sentiment(polarity=0.5, subjectivity=0.5)
To get the definitions of a specific word, we will use Word('specific word').definitions.
Word('resolved').definitions
['bring to an end; settle conclusively',
'reach a conclusion after a discussion or deliberation',
'reach a decision',
'understand the meaning of',
'make clearly visible',
'find the solution',
'cause to go into a solution',
'determined',
'explained or answered']
To get the synonyms of a specific word, we will use Word('specific word').synsets, which returns its WordNet synsets (sets of synonymous words).
Word('resolved').synsets
[Synset('decide.v.02'),
Synset('conclude.v.03'),
Synset('purpose.v.02'),
Synset('answer.v.04'),
Synset('resolve.v.05'),
Synset('resolve.v.06'),
Synset('dissolve.v.02'),
Synset('single-minded.s.01'),
Synset('solved.a.01')]
To get the n-grams, we will use the ngrams() function; by default it returns trigrams (n=3).
blob.ngrams()[:5]
[WordList(['VOLUME', 'I', 'CHAPTER']),
WordList(['I', 'CHAPTER', 'I']),
WordList(['CHAPTER', 'I', 'Emma']),
WordList(['I', 'Emma', 'Woodhouse']),
WordList(['Emma', 'Woodhouse', 'handsome'])]
To get n-grams of five words, pass n=5.
blob.ngrams(n = 5)[:5]
[WordList(['VOLUME', 'I', 'CHAPTER', 'I', 'Emma']),
WordList(['I', 'CHAPTER', 'I', 'Emma', 'Woodhouse']),
WordList(['CHAPTER', 'I', 'Emma', 'Woodhouse', 'handsome']),
WordList(['I', 'Emma', 'Woodhouse', 'handsome', 'clever']),
WordList(['Emma', 'Woodhouse', 'handsome', 'clever', 'and'])]
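Under the hood, ngrams() simply slides a fixed-size window over the token list. The same idea in a short plain-Python sketch (make_ngrams is a hypothetical helper for illustration, not part of TextBlob):

```python
def make_ngrams(tokens, n=3):
    # Slide a window of size n over the token list, one step at a time.
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

tokens = ['VOLUME', 'I', 'CHAPTER', 'I', 'Emma']
print(make_ngrams(tokens, n=3))
# [['VOLUME', 'I', 'CHAPTER'], ['I', 'CHAPTER', 'I'], ['CHAPTER', 'I', 'Emma']]
```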
To get the count of each word, we will use the word_counts.items() expression. After that, we will remove the stop words and then sort in descending order to get the most frequent words.
items = blob.word_counts.items()
items = [item for item in items if item[0] not in all_stopwords]
sorted_items = sorted(items, key=itemgetter(1), reverse=True)
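The same count, filter, and sort pipeline can be sketched with the standard library's collections.Counter (the toy token list and stop-word set below are made up):

```python
from collections import Counter

tokens = ['emma', 'and', 'the', 'emma', 'clever', 'of', 'emma', 'clever']
stop_words = {'and', 'the', 'of'}  # toy stop-word set

# Count tokens while skipping stop words; most_common sorts by frequency.
counts = Counter(t for t in tokens if t not in stop_words)
print(counts.most_common(2))  # [('emma', 3), ('clever', 2)]
```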
Get the top-twenty words.
top20 = sorted_items[1:21]
df = pd.DataFrame(top20, columns=['words', 'count'])
Plot the most frequent words as a bar chart.
df = df.iloc[4:, :]
axes = df.plot.bar(x='words', y='count')
Import an image with a white background to use as the mask, create a WordCloud object, then generate the cloud from the cleaned text and save it to a file.
mask_image = imageio.imread("white.jpg")
wordcloud = WordCloud(background_color='white', mask=mask_image)
wordcloud = wordcloud.generate(no_specials_string)
wordcloud = wordcloud.to_file("new_white.jpg")
If you need implementation for any of the topics mentioned above or assignment help on any of its variants, feel free to contact us.