Natural Language Processing in Python: Part 2



This is part 2 of our series "Natural Language Processing". In the previous blog we learned about text analysis using NLP. In this blog we will learn about N-grams, the second topic of the series.


Before we start, let's repeat the general information we covered in part 1.


I suggest going through part 1 first, as it is helpful for the rest of this NLP series.


What is NLP?


NLP is the branch of data science that consists of systematic processes for analyzing, understanding, and deriving information from text data in a smart and efficient manner.


First, install the libraries used in this series:

nltk, numpy, matplotlib, tweepy, TwitterSearch, unidecode, langdetect, langid, gensim


Then import them (the comment after each import links to its install page):


import nltk # https://www.nltk.org/install.html

import numpy # https://www.scipy.org/install.html

import matplotlib.pyplot # https://matplotlib.org/downloads.html

import tweepy # https://github.com/tweepy/tweepy

import TwitterSearch # https://github.com/ckoepp/TwitterSearch

import unidecode # https://pypi.python.org/pypi/Unidecode

import langdetect # https://pypi.python.org/pypi/langdetect

import langid # https://github.com/saffsd/langid.py

import gensim


List of topics we will cover in this series:

  • Text-analysis using NLTK library

  • N-Grams

  • Detecting text language

  • Language identifier

  • Stemming and Lemmatization using Bigrams

  • Finding unusual words

  • Part of speech and meaning

  • Name-Gender identifier

  • Classify document into categories

  • Sentiment Analysis

  • Sentiment Analysis with NLTK

  • Work with Twitter streaming and Cleaning

  • Language detection


Now let's start the topic: N-Grams


What are N-Grams?


N-grams are used in text mining, language detection, and other natural language processing tasks. An n-gram is a sequence of N words that occur together in a fixed-size window. N-grams can also be used to find the title of a text. The window moves forward one word at a time, dropping one word at the back, and its size N is chosen by the user.


To understand this in general terms, let's go through a basic theoretical example.

Suppose the text is:


"He is very lazy boy."


Here N can be 2 (bigrams) or 3 (trigrams).


The generated bigrams (N=2) are:


He is

is very

very lazy

lazy boy
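
The same sliding window can be sketched in a few lines of plain Python. This is only a minimal illustration, not code from the original notebook: the helper name make_ngrams is made up for this example, and the final period of the sentence is left out so that it does not stick to the last word.

```python
def make_ngrams(words, n):
    """Slide a window of n words over the list, one word at a time."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# Example sentence from above, without the final period.
words = "He is very lazy boy".split()

bigrams = make_ngrams(words, 2)   # N = 2
trigrams = make_ngrams(words, 3)  # N = 3

print(bigrams)
print(trigrams)
```

With N=3 the window holds three words, so the first trigram is "He is very" and the last is "very lazy boy".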


Total number of N-grams in a text


Here is the formula for the total number of N-grams in a text:


Total N-grams = X - (N - 1) # where X is the total number of words

For the example above, X = 5 and N = 2, so there are 5 - (2 - 1) = 4 bigrams.
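
As a quick check, the formula can be written as a small Python function. The function name ngram_count is hypothetical, and returning 0 when the text has fewer than N words is an assumption added here; the formula above does not cover that case.

```python
def ngram_count(num_words, n):
    """Total n-grams = X - (N - 1), where X is the number of words.

    Assumption: a text shorter than N words has zero n-grams.
    """
    return max(num_words - (n - 1), 0)


words = "He is very lazy boy".split()  # X = 5

print(ngram_count(len(words), 2))  # 5 - (2 - 1) = 4
print(ngram_count(len(words), 3))  # 5 - (3 - 1) = 3
```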


Uses of N-Grams


N-grams have many uses, such as:

  • spelling correction

  • word breaking

  • text summarization


Now let's work through a more advanced example, step by step:


Step 1:

First, tokenize the text and remove punctuation.


Run in a Jupyter notebook:
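
The notebook cell for this step is not reproduced here, so the following is only a sketch of one way to do it using the Python standard library. The sample text and the variable names are made up for illustration. NLTK's word_tokenize could be used instead, provided its tokenizer data has been downloaded.

```python
import string

# Made-up sample text for illustration.
text = "He is very lazy boy. He is very fond of sleep, and he is never on time!"

# Remove every punctuation character, then lowercase and split on whitespace.
no_punct = text.translate(str.maketrans("", "", string.punctuation))
tokens = no_punct.lower().split()

print(tokens)
```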


Step 2:

Generating 2-grams: first import this

from nltk.util import ngrams


Run in a Jupyter notebook:
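
A minimal sketch of this step, assuming the tokens come from a short made-up text. The ngrams function from nltk.util takes a sequence and the value of N and yields tuples of words. The except branch is an assumption added only so that the sketch still runs where NLTK is not installed; it implements the same sliding window.

```python
import string

try:
    from nltk.util import ngrams  # the import named in this step
except ImportError:
    # Assumed fallback, not part of NLTK: the same sliding window in plain Python.
    def ngrams(sequence, n):
        sequence = list(sequence)
        return zip(*(sequence[i:] for i in range(n)))

# Made-up sample text, tokenized with punctuation removed as in step 1.
text = "He is very lazy boy. He is very fond of sleep."
tokens = text.translate(str.maketrans("", "", string.punctuation)).lower().split()

# ngrams() returns an iterator, so wrap it in list() to index or slice it.
generated_2grams = list(ngrams(tokens, 2))

print(generated_2grams)
```

Each bigram is a tuple such as ('he', 'is'). The text has 11 words, so by the formula above there are 11 - (2 - 1) = 10 bigrams.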


To show only a fixed number of n-grams, for example the first eight, use this line of code:


print(generated_2grams[:8])

Step 3:


Sort the n-grams by frequency



Run in a Jupyter notebook:
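
One possible way to do the sorting is with collections.Counter from the standard library. This is a sketch, not the original notebook code; the token list and the variable names are made up for illustration. NLTK's FreqDist offers a similar most_common method.

```python
from collections import Counter

# Made-up tokens, already lowercased and without punctuation.
tokens = "he is very lazy boy he is very fond of sleep".split()

# Build the bigrams by pairing each word with the word that follows it.
bigrams = list(zip(tokens, tokens[1:]))

# Count each bigram, then list them from most frequent to least frequent.
counts = Counter(bigrams)
sorted_2grams = counts.most_common()

for gram, freq in sorted_2grams[:3]:
    print(" ".join(gram), freq)
```

Here "he is" and "is very" each occur twice, so they come first; every other bigram occurs once.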


Thanks for reading. Part 2 is finished; in the next part we will learn "Detecting Text Language".


If you like the Codersarts blog and are looking for assignment help, project help, or programming tutor help and suggestions, you can send mail to contact@codersarts.com.

Please write your suggestions in the comment section below if you find anything incorrect in this blog post.
