Designing a Part Of Speech Tagger



Sudhanshu Srivastava

1506041 (, Part-3)

NIT Patna, Bihar

Under the Guidance of

Dr. A. K. Singh














Department of Computer Science &


VARANASI – 221005

Artificial Intelligence

It can be viewed as the superset of machine learning, which is itself a
superset of deep learning. Broadly speaking, it is the technology that gives
a machine a human-like approach to computation.


Natural language processing

A branch of Artificial Intelligence which deals with communicating with a
machine or intelligent system in a natural language such as English or Hindi.



Machine learning

Giving a computer the ability to learn without being explicitly programmed
for the task at hand. Basically, training a system on past data so that it
can predict the output for present or future inputs.


It has two Sub branches –




Machine learning is the superset of
Deep learning.




In deep learning, the machine generates its features by itself; the
underlying algorithms are designed to mimic the human brain.

It is implemented through neural networks, whose basic functional unit is
called a perceptron.

The basic structure of a perceptron. At first, the weights are randomly
assigned to the inputs.

Back propagation method

It compares the predicted output with the given (target) output and adjusts
the weights accordingly.
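The error-driven weight update described above can be illustrated on a single perceptron. This is a minimal sketch, not code from the project: it uses the classic perceptron learning rule (the simplest case of "compare with the target and correct the weights", not full back propagation through hidden layers), and the AND-gate data, the learning rate and the function names are all assumptions chosen for illustration.

```python
import random


def step(z):
    """Threshold activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def predict(weights, bias, inputs):
    """Output of one perceptron for one input tuple."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, max_epochs=1000, lr=0.1, seed=0):
    """Train with the perceptron rule: w <- w + lr * (target - output) * x."""
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    # Weights start out random, as described in the text.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                weights = [w + lr * error * x for w, x in zip(weights, inputs)]
                bias += lr * error
        if mistakes == 0:  # every sample is classified correctly
            break
    return weights, bias


# Hypothetical toy data: the logical AND of two binary inputs.
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_GATE)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_GATE]
print(predictions)
```

Because AND is linearly separable, the loop stops once an epoch passes with no corrections.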

A neural network with several hidden layers constitutes a deep network.

Feed forward networks

Networks that are not cyclic in nature: information flows only from input
to output, so the outputs are independent of each other.


Convolutional neural network

Here, a neuron in a layer is only connected to a small region of the layer
before it. It’s a feed forward neural network inspired by the visual cortex.

Recurrent neural networks

A neural network in which the present output depends on the previous
outputs (this can be understood by analogy with dynamic programming).
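The dependence of the present output on the previous ones can be seen in a very small recurrent cell. This is a minimal sketch assuming a single scalar hidden state and hand-picked weights (w_x, w_h and b are hypothetical values, not taken from the project); a real RNN uses weight matrices that are learned during training.

```python
import math


def rnn_forward(inputs, w_x=1.0, w_h=0.5, b=0.0):
    """Run a one-unit recurrent cell: h_t = tanh(w_x * x_t + w_h * h_(t-1) + b)."""
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        # The previous state h feeds back into the computation of the new one.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Two sequences that differ only in their first element.
with_early_input = rnn_forward([1.0, 0.0, 0.0])
without_early_input = rnn_forward([0.0, 0.0, 0.0])

print(with_early_input)
print(without_early_input)
```

The last two inputs are identical in both runs, yet the final states differ, because the first input is carried forward through the hidden state.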





Basic structure of an RNN

RNNs have some limitations.

Vanishing gradient problem

When the change in a weight is very small, i.e. the gradient of the error
with respect to the weight satisfies dE/dw << 1, the new weight is almost
equal to the old one and the network effectively stops learning. This is
addressed by using another neural network known as the Long Short Term
Memory network (LSTM).

Long short term memory networks (LSTM)

An RNN equipped to handle long term dependencies.

WORD2VEC

A model that learns word vectors by predicting between a center word and its
context words. It comprises two models:

· Skip-gram model
· Continuous bag of words model

Task

Designing a Part of Speech tagger.

Dataset

A merged Bhojpuri dataset consisting of Bhojpuri sentences and the
corresponding label for each word.

A sample of the dataset.

Tools used

· Python 3
· Keras
· TensorFlow backend

After gaining a thorough understanding of the topics listed above, I first
took the Word2vec embeddings of the words, sentence by sentence: I extracted
each sentence and then created its vectors word by word. The result can be
viewed as a 2D array indexed by sentence and word. I did the same with the
labels, creating a 2D array of the labels corresponding to the words in each
sentence. A dictionary is used to map the words to their corresponding
labels.

For the label vectors, the set of distinct tags was used to create one-hot
vectors. There are 29 different labels, namely:

['NNP', 'NN', 'PSP','NST','VM','JJ','RB',
 'RP','CC','VAUX','SYM','RDP','QC','PRP','QF','NEG',
 'DEM','RDP','WQ','INJ','CL','ECH','UT','INTF','UNK','NP','VGF','CCP','BLK']

Another dictionary is used to map the labels to their vectors.

Next, we take a sample of the data, train the LSTM model on it, and then
predict on the test values. We have also encoded the word vectors and labels
of the test dataset, which we used as the validation data.
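The encoding described above (one fixed-size array of word vectors per sentence and one array of one-hot tag vectors) might look roughly like the following. This is a sketch under stated assumptions, not the project's code: the toy embedding dictionary and the placeholder words "w1" and "w2" are hypothetical stand-ins for real Word2vec output, zero-padding up to the sentence length is an assumption, and the names encode_sentence and one_hot are invented for illustration. The tag list is copied as it appears in the text, where 'RDP' occurs twice; the sketch keeps all 29 positions and maps a repeated tag to its first position.

```python
# Tag list copied verbatim from the text ('RDP' appears twice there).
TAGS = ['NNP', 'NN', 'PSP', 'NST', 'VM', 'JJ', 'RB',
        'RP', 'CC', 'VAUX', 'SYM', 'RDP', 'QC', 'PRP', 'QF', 'NEG',
        'DEM', 'RDP', 'WQ', 'INJ', 'CL', 'ECH', 'UT', 'INTF', 'UNK',
        'NP', 'VGF', 'CCP', 'BLK']
MAX_LEN = 226   # length of the longest sentence, per the text
VEC_SIZE = 100  # size of each word vector, per the text

# Dictionary mapping each tag to the first position at which it occurs.
tag_to_index = {}
for position, tag in enumerate(TAGS):
    tag_to_index.setdefault(tag, position)


def one_hot(tag):
    """Return a vector of len(TAGS) zeros with a single 1 at the tag's index."""
    vector = [0] * len(TAGS)
    vector[tag_to_index[tag]] = 1
    return vector


def encode_sentence(words, tags, embeddings):
    """Turn one tagged sentence into (X, y), both padded to MAX_LEN rows.

    Assumption: words without an embedding and padding positions get a
    zero vector, and padding positions get an all-zero label row.
    """
    zero_vector = [0.0] * VEC_SIZE
    x_rows = [embeddings.get(word, zero_vector) for word in words][:MAX_LEN]
    y_rows = [one_hot(tag) for tag in tags][:MAX_LEN]
    padding = MAX_LEN - len(x_rows)
    x_rows += [list(zero_vector) for _ in range(padding)]
    y_rows += [[0] * len(TAGS) for _ in range(padding)]
    return x_rows, y_rows


# Hypothetical two-word sentence with a made-up embedding for "w1" only.
toy_embeddings = {"w1": [0.1] * VEC_SIZE}
X, y = encode_sentence(["w1", "w2"], ["NN", "VM"], toy_embeddings)
print(len(X), len(X[0]), len(y), len(y[0]))
```

Stacking the (X, y) pairs of all sentences gives arrays of shape (sentences, 226, 100) and (sentences, 226, 29).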
A Sequential model was used. The longest sentence has 226 words and each word vector has size 100, so the LSTM was trained with an input shape of 226 x 100, with return_sequences set to True. 29 was passed to the Dense layer, as there are 29 different tags. After the LSTM is trained, an attention mechanism is applied.
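The architecture described above can be sketched in Keras roughly as follows. This is a minimal sketch, not the original implementation: the number of LSTM units, the optimizer, the loss and the softmax activation are assumptions that the text does not specify, and the attention step is left out because the text gives no details about it.

```python
from tensorflow import keras

MAX_LEN = 226    # longest sentence, per the text
VEC_SIZE = 100   # Word2vec vector size, per the text
NUM_TAGS = 29    # number of tags, per the text
LSTM_UNITS = 64  # assumption: the text does not give the layer size

model = keras.Sequential([
    keras.Input(shape=(MAX_LEN, VEC_SIZE)),
    # return_sequences=True keeps one output per word, not one per sentence.
    keras.layers.LSTM(LSTM_UNITS, return_sequences=True),
    # One 29-way prediction for every word position.
    keras.layers.TimeDistributed(
        keras.layers.Dense(NUM_TAGS, activation="softmax")
    ),
])

# Assumed training setup for one-hot tag targets.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Training would then be a call to model.fit with arrays of shape (sentences, 226, 100) and (sentences, 226, 29), passing the encoded test set as validation_data.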

