
INTRODUCTION TO TEXTBLOB AND WORDCLOUD

Updated: Aug 20, 2022

INTRODUCTION

In this post, you will be introduced to the TextBlob library and some of its functions, which we will apply to a text file containing the novel Emma by Jane Austen. You will also be introduced to WordCloud.


IMPLEMENTATION

To begin, we import the required libraries and download the NLTK resources that TextBlob relies on.


import re
import nltk
import spacy
import pandas as pd
import imageio
import matplotlib.pyplot as plt
from operator import itemgetter
from textblob import TextBlob, Word
from wordcloud import WordCloud

# NLTK resources used by TextBlob and for the stop-word list
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')  # for POS tagging
nltk.download('brown')  # for noun phrase extraction

en_stop = set(nltk.corpus.stopwords.words('english'))  # NLTK stop words
nlp = spacy.load('en_core_web_sm')
all_stopwords = nlp.Defaults.stop_words  # spaCy stop words

Now we are ready to load the text file. The file is available from Project Gutenberg; to download it, refer to this link: Emma by Jane Austen


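# Read the whole novel into a single string
with open("/content/emma.text", "r") as text:
    get_text = text.read()
# Replace selected punctuation with spaces; periods are kept
no_specials_string = re.sub('[!#?,:";]', ' ', get_text)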
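
To see what the substitution does, here is a minimal sketch on a short made-up sentence (the sample string is not from the novel). Each character listed inside the brackets is replaced by a space, while periods and apostrophes are left alone, so sentence boundaries should still be detectable afterwards.

```python
import re

# Hypothetical sample text, used only for illustration
sample = "Emma's home, happy; clever."

# Same pattern as above: each listed punctuation mark becomes one space
cleaned = re.sub('[!#?,:";]', ' ', sample)

print(repr(cleaned))
```

Because every removed character leaves a space behind, double spaces appear where punctuation used to be followed by a space.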

Now we load the text into TextBlob(), which takes a string as input, and look at the first five sentences.

blob = TextBlob(no_specials_string)
blob.sentences[:5]
 [Sentence("VOLUME I




  CHAPTER I


  Emma Woodhouse  handsome  clever  and rich  with a comfortable home and
  happy disposition  seemed to unite some of the best blessings of
  existence  and had lived nearly twenty one years in the world with very
  little to distress or vex her."),
  Sentence("She was the youngest of the two daughters of a most affectionate 
  indulgent father  and had  in consequence of her sister’s marriage 
  been mistress of his house from a very early period."),
  Sentence("Her mother had
  died too long ago for her to have more than an indistinct remembrance
  of her caresses  and her place had been supplied by an excellent woman
  as governess  who had fallen little short of a mother in affection."),
  Sentence("Sixteen years had Miss Taylor been in Mr. Woodhouse’s family  less as a
  governess than a friend  very fond of both daughters  but particularly
  of Emma."),
  Sentence("Between _them_ it was more the intimacy of sisters.")]

To get the first hundred tokenized words, we use the .words attribute.

blob.words[:100]
WordList(['VOLUME', 'I', 'CHAPTER', 'I', 'Emma', 'Woodhouse', 'handsome', 'clever', 'and', 'rich', 'with', 'a', 'comfortable', 'home', 'and', 'happy', 'disposition', 'seemed', 'to', 'unite', 'some', 'of', 'the', 'best', 'blessings', 'of', 'existence', 'and', 'had', 'lived', 'nearly', 'twenty', 'one', 'years', 'in', 'the', 'world', 'with', 'very', 'little', 'to', 'distress', 'or', 'vex', 'her', 'She', 'was', 'the', 'youngest', 'of', 'the', 'two', 'daughters', 'of', 'a', 'most', 'affectionate', 'indulgent', 'father', 'and', 'had', 'in', 'consequence', 'of', 'her', 'sister', '’', 's', 'marriage', 'been', 'mistress', 'of', 'his', 'house', 'from', 'a', 'very', 'early', 'period', 'Her', 'mother', 'had', 'died', 'too', 'long', 'ago', 'for', 'her', 'to', 'have', 'more', 'than', 'an', 'indistinct', 'remembrance', 'of', 'her', 'caresses', 'and', 'her']

To get the part-of-speech tags, we use the .tags attribute.

blob.tags[:10]
  [('VOLUME', 'NNP'),
 ('I', 'PRP'),
 ('CHAPTER', 'VBP'),
 ('I', 'PRP'),
 ('Emma', 'NNP'),
 ('Woodhouse', 'NNP'),
 ('handsome', 'VBD'),
 ('clever', 'NN'),
 ('and', 'CC'),
 ('rich', 'JJ')]
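
Because .tags returns a plain list of (word, tag) pairs, it can be filtered with ordinary Python. As a small sketch, the snippet below keeps only the words whose tag starts with NN (nouns and proper nouns). It uses the ten pairs printed above as hard-coded input so it runs without TextBlob, and the helper name words_with_tag_prefix is made up for this example.

```python
# The ten (word, tag) pairs printed above, copied in by hand
tags = [('VOLUME', 'NNP'), ('I', 'PRP'), ('CHAPTER', 'VBP'), ('I', 'PRP'),
        ('Emma', 'NNP'), ('Woodhouse', 'NNP'), ('handsome', 'VBD'),
        ('clever', 'NN'), ('and', 'CC'), ('rich', 'JJ')]


def words_with_tag_prefix(tagged_words, prefix):
    """Return the words whose POS tag starts with the given prefix."""
    return [word for word, tag in tagged_words if tag.startswith(prefix)]


nouns = words_with_tag_prefix(tags, 'NN')
print(nouns)
```

Note that the tagger is not perfect on this text: 'CHAPTER' is tagged as a verb and 'clever' as a noun, so a filter like this inherits those mistakes.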
 

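Perform sentiment analysis on the first five sentences. Each sentence gets a polarity score between -1.0 (negative) and 1.0 (positive) and a subjectivity score between 0.0 (objective) and 1.0 (subjective).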

for sentence in blob.sentences[:5]:
	print(sentence)
	print(sentence.sentiment)
	print()
  VOLUME I




  CHAPTER I


  Emma Woodhouse  handsome  clever  and rich  with a comfortable home and
  happy disposition  seemed to unite some of the best blessings of
  existence  and had lived nearly twenty one years in the world with very
  little to distress or vex her.
  Sentiment(polarity=0.3872395833333333, subjectivity=0.7166666666666668)

  She was the youngest of the two daughters of a most affectionate 
  indulgent father  and had  in consequence of her sister’s marriage 
  been mistress of his house from a very early period.
  Sentiment(polarity=0.315, subjectivity=0.445)

  Her mother had
  died too long ago for her to have more than an indistinct remembrance
  of her caresses  and her place had been supplied by an excellent woman
  as governess  who had fallen little short of a mother in affection.
  Sentiment(polarity=0.2525, subjectivity=0.5399999999999999)

  Sixteen years had Miss Taylor been in Mr. Woodhouse’s family  less as a
  governess than a friend  very fond of both daughters  but particularly
  of Emma.
  Sentiment(polarity=0.06666666666666667, subjectivity=0.2333333333333333)

  Between _them_ it was more the intimacy of sisters.
  Sentiment(polarity=0.5, subjectivity=0.5)
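
The polarity score is often turned into a coarse label. The sketch below does this with a hypothetical helper, label_polarity, and an arbitrary threshold of 0.1; neither is part of TextBlob. It is applied to the five polarity values printed above.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    The threshold is an arbitrary choice for illustration.
    """
    if polarity > threshold:
        return 'positive'
    if polarity < -threshold:
        return 'negative'
    return 'neutral'


# Polarity values of the first five sentences, copied from the output above
polarities = [0.3872395833333333, 0.315, 0.2525, 0.06666666666666667, 0.5]
labels = [label_polarity(p) for p in polarities]
print(labels)
```

With a live TextBlob object, the same helper could be called as label_polarity(sentence.sentiment.polarity).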

To get the definitions of a specific word, we use Word('specific word').definitions

Word('resolved').definitions
['bring to an end; settle conclusively',
 'reach a conclusion after a discussion or deliberation',
 'reach a decision',
 'understand the meaning of',
 'make clearly visible',
 'find the solution',
 'cause to go into a solution',
 'determined',
 'explained or answered']

To get the WordNet synsets (sets of synonyms) of a specific word, we use Word('specific word').synsets

Word('resolved').synsets
 [Synset('decide.v.02'),
 Synset('conclude.v.03'),
 Synset('purpose.v.02'),
 Synset('answer.v.04'),
 Synset('resolve.v.05'),
 Synset('resolve.v.06'),
 Synset('dissolve.v.02'),
 Synset('single-minded.s.01'),
 Synset('solved.a.01')]

To get n-grams, we use the ngrams(n) method, where n is the number of words in each n-gram. The default is n=3.

blob.ngrams()[:5]
  [WordList(['VOLUME', 'I', 'CHAPTER']),
  WordList(['I', 'CHAPTER', 'I']),
  WordList(['CHAPTER', 'I', 'Emma']),
  WordList(['I', 'Emma', 'Woodhouse']),
  WordList(['Emma', 'Woodhouse', 'handsome'])]

Get n-grams of five words.

blob.ngrams(n = 5)[:5]
  [WordList(['VOLUME', 'I', 'CHAPTER', 'I', 'Emma']),
 WordList(['I', 'CHAPTER', 'I', 'Emma', 'Woodhouse']),
 WordList(['CHAPTER', 'I', 'Emma', 'Woodhouse', 'handsome']),
 WordList(['I', 'Emma', 'Woodhouse', 'handsome', 'clever']),
 WordList(['Emma', 'Woodhouse', 'handsome', 'clever', 'and'])]
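
Conceptually, an n-gram list is a sliding window of n consecutive words. The following sketch reproduces that idea in plain Python. It is an illustration of the technique, not TextBlob's actual implementation, and the function name sliding_ngrams is made up.

```python
def sliding_ngrams(words, n=3):
    """Return every run of n consecutive words as a list."""
    if n <= 0:
        return []
    # A list of length L has L - n + 1 windows of size n (none if L < n)
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = ['VOLUME', 'I', 'CHAPTER', 'I', 'Emma', 'Woodhouse', 'handsome']
trigrams = sliding_ngrams(words)
fivegrams = sliding_ngrams(words, n=5)
print(trigrams[:2])
print(fivegrams[:1])
```

Each window overlaps the previous one by n - 1 words, which is why consecutive entries in the outputs above differ by only one word at each end.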

To get the count of each word, we use the word_counts.items() expression. We then remove the stop words and sort the result in descending order of count to find the most frequent words.

items = blob.word_counts.items()
items = [item for item in items if item[0] not in all_stopwords]
sorted_items = sorted(items, key=itemgetter(1), reverse=True) 
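
The same count, filter and sort pipeline can be sketched with only the standard library. The snippet below uses collections.Counter on a short hand-written word list and a tiny made-up stop-word set, so both inputs are assumptions for illustration rather than the novel's data or spaCy's stop-word list.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical inputs for illustration
demo_words = ['emma', 'and', 'the', 'emma', 'harriet', 'the',
              'emma', 'harriet', 'and', 'weston']
demo_stop_words = {'and', 'the'}

# Count each word, drop stop words, then sort by count (highest first)
demo_items = Counter(demo_words).items()
demo_items = [item for item in demo_items if item[0] not in demo_stop_words]
demo_sorted = sorted(demo_items, key=itemgetter(1), reverse=True)

print(demo_sorted)
```

itemgetter(1) picks the count out of each (word, count) pair, so the sort is by frequency rather than alphabetical.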

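Get the top twenty words and put them in a DataFrame. The slice starts at index 1, so the first entry in the sorted list is skipped.

top20 = sorted_items[1:21]
df = pd.DataFrame(top20, columns = ['words', 'count'])
df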


Plot the word counts as a bar chart, skipping the first four rows of the table.

df = df.iloc[4:, :]
axes = df.plot.bar(x='words', y='count')

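Import an image with a white background to use as the mask, then create the WordCloud and save it to a file. The constructor arguments shown here (the mask and a white background) are an assumption; other settings would work as well.

mask_image = imageio.imread("white.jpg")
wordcloud = WordCloud(mask=mask_image, background_color='white')  # assumed arguments
wordcloud = wordcloud.generate(no_specials_string)
wordcloud = wordcloud.to_file("new_white.jpg")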


