Problem Statement
In this project, you will implement a part-of-speech (POS) tagger using two methods: a Hidden Markov Model (HMM) and a Recurrent Neural Network (RNN). The dataset is the Universal Dependencies POS dataset, which is annotated with two different tag sets, UD and Penn Treebank. Use the Penn Treebank tags for this task.
1. In this part, you will implement the Viterbi decoding algorithm for a generative HMM. Evaluate your model on the validation and test sets, and report accuracy, precision, recall, and F-measure. You may use only basic Python and the NumPy library for this part.
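As a rough illustration of part 1, the sketch below shows one way Viterbi decoding and the requested metrics could be written with NumPy only. It assumes words and tags have already been mapped to integer ids and that the HMM parameters are given as log-probability arrays; the names `viterbi`, `macro_prf`, `log_pi`, `log_A` and `log_B` are hypothetical, and macro-averaging over tags is an assumption, since the assignment does not say how precision, recall, and F-measure should be averaged.

```python
import numpy as np


def viterbi(obs, log_pi, log_A, log_B):
    """Return the most likely tag-id sequence for the word ids in `obs`.

    log_pi[s]   : log P(first tag = s)
    log_A[s, t] : log P(next tag = t | current tag = s)
    log_B[s, w] : log P(word = w | tag = s)
    """
    n_steps, n_states = len(obs), log_pi.shape[0]
    if n_steps == 0:
        return []
    score = np.empty((n_steps, n_states))            # best log-score per state
    back = np.zeros((n_steps, n_states), dtype=int)  # backpointers
    score[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, n_steps):
        # cand[i, j]: best path ending in state i at t-1, then moving to j
        cand = score[t - 1][:, None] + log_A
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_B[:, obs[t]]
    # Follow the backpointers from the best final state.
    path = [int(score[-1].argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


def macro_prf(gold, pred, n_tags):
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    gold, pred = np.asarray(gold), np.asarray(pred)
    accuracy = float((gold == pred).mean())
    ps, rs, fs = [], [], []
    for tag in range(n_tags):
        tp = int(np.sum((pred == tag) & (gold == tag)))
        fp = int(np.sum((pred == tag) & (gold != tag)))
        fn = int(np.sum((pred != tag) & (gold == tag)))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        ps.append(p)
        rs.append(r)
        fs.append(2 * p * r / (p + r) if p + r else 0.0)
    return accuracy, float(np.mean(ps)), float(np.mean(rs)), float(np.mean(fs))
```

Working in log space avoids numerical underflow on long sentences. The gold and predicted tag ids from all sentences of a split can be concatenated into flat lists before calling `macro_prf`.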
Code for downloading and preprocessing the data:
# Note: Field and UDPOS.splits belong to the legacy torchtext API
# (exposed as torchtext.data in torchtext releases before 0.9).
import torch
import torchtext
import torchtext.data as data
# set up fields
text = data.Field(lower=True)
ud_tags = data.Field(unk_token=None)
ptb_tags = data.Field(unk_token=None)
fields = [('text', text), ('ud_tags', ud_tags), ('ptb_tags', ptb_tags)]
# make splits for data
train, val, test = torchtext.datasets.UDPOS.splits(fields)
# You can access tokens/tags by index
print('text', train.examples[0].text)
print('penn tags', train.examples[0].ptb_tags)
# Alternately, you can iterate over the object
for sample in train:
    print('text', sample.text, 'penn tags', sample.ptb_tags)
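Once the samples can be iterated, the HMM parameters can be estimated by counting. The following is a minimal sketch under stated assumptions: `estimate_hmm` is a hypothetical helper, add-alpha smoothing and a single `<unk>` word id for unseen words are choices made here for illustration, and the assignment itself does not prescribe either.

```python
import numpy as np


def estimate_hmm(sentences, alpha=1.0):
    """Estimate smoothed HMM log-probabilities from tagged sentences.

    sentences : list of (tokens, tags) pairs (a list, since it is read twice)
    alpha     : add-alpha smoothing constant (an assumption, not from the task)
    """
    words = {"<unk>": 0}  # id 0 is reserved for words unseen in training
    tags = {}
    for tokens, tag_seq in sentences:
        for w in tokens:
            words.setdefault(w, len(words))
        for t in tag_seq:
            tags.setdefault(t, len(tags))

    n_tags, n_words = len(tags), len(words)
    # Start every count at alpha so that no probability is exactly zero.
    pi = np.full(n_tags, alpha)
    A = np.full((n_tags, n_tags), alpha)
    B = np.full((n_tags, n_words), alpha)

    for tokens, tag_seq in sentences:
        ids = [tags[t] for t in tag_seq]
        if not ids:
            continue
        pi[ids[0]] += 1                      # sentence-initial tag
        for prev, nxt in zip(ids, ids[1:]):  # tag-to-tag transitions
            A[prev, nxt] += 1
        for s, w in zip(ids, tokens):        # tag-to-word emissions
            B[s, words[w]] += 1

    log_pi = np.log(pi / pi.sum())
    log_A = np.log(A / A.sum(axis=1, keepdims=True))
    log_B = np.log(B / B.sum(axis=1, keepdims=True))
    return words, tags, log_pi, log_A, log_B
```

With the dataset loaded as above, a call could look like `estimate_hmm([(s.text, s.ptb_tags) for s in train])`. At decoding time, a word missing from `words` would be mapped to id 0 via `words.get(w, 0)`.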