Amharic Parts-of-speech Tagger Using Neural Word Embeddings As Features

Part-of-speech (POS) tagging for the Amharic language is not yet mature enough to serve as an important component in other natural language processing (NLP) applications. Previous studies on Amharic POS tagging used hand-crafted features to develop tagging models. In Amharic, prepositions and conjunctions are usually attached to other parts of speech. This forces a single tag to represent more than one piece of basic information, and it also decreases the total number of instances in the training corpus. In addition, designing features manually requires more time, more labor, and linguistic background.

In this study, automatically generated neural word embeddings are used as features for the development of an Amharic POS tagger. Neural word embeddings are multi-dimensional vector representations of words that capture syntactic and semantic information about them. The study also segments prepositions and conjunctions attached to other parts of speech using the HornMorpho morphological analyzer. The tagging models are developed with state-of-the-art deep learning algorithms, specifically Long Short-Term Memory (LSTM) recurrent neural networks and their bidirectional versions (Bi-LSTM RNNs).

The best evaluation result, an F-measure of 93.67%, was obtained from the model developed with a Bi-LSTM recurrent neural network. The results show that word embeddings generated by neural networks can replace manually designed features, which is an important advantage. Segmenting prepositions and conjunctions attached to other parts of speech also improved the accuracy of the POS tagger by more than 5%. This improvement comes from the increased total number of instances and the decreased number of tags that result from segmentation.
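The abstract attributes part of the accuracy gain to segmentation, which yields more training instances and fewer distinct tags. The toy sketch below illustrates that effect in Python. It assumes that each word with an attached preposition or conjunction already carries `+` marks at its morpheme boundaries, standing in for a morphological analyzer's output. It does not use or imitate HornMorpho's actual interface. The compound tags (`NP`, `NC`, `NPC`, `VC`), the function name `segment`, and the transliterated words are all illustrative assumptions, not the study's tagset, code, or corpus.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, tag)

# Hypothetical compound tags: base POS plus an attached preposition and/or
# conjunction, mapped to (base_tag, has_preposition, has_conjunction).
# Illustrative only; not the tagset used in the study.
COMPOUND_TAGS = {
    "NP": ("N", True, False),
    "NC": ("N", False, True),
    "NPC": ("N", True, True),
    "VC": ("V", False, True),
}


def segment(tokens: List[Token]) -> List[Token]:
    """Split compound-tagged tokens into a base token plus clitic tokens.

    Assumes words arrive pre-analyzed with '+' at morpheme boundaries
    (prefix+stem, stem+suffix or prefix+stem+suffix), a stand-in for a
    morphological analyzer's output. Prepositions are treated as prefixes
    and conjunctions as suffixes.
    """
    out: List[Token] = []
    for word, tag in tokens:
        if tag not in COMPOUND_TAGS:
            out.append((word, tag))  # already a single basic unit
            continue
        base, has_prep, has_conj = COMPOUND_TAGS[tag]
        parts = word.split("+")
        if len(parts) != 1 + has_prep + has_conj:
            raise ValueError(f"cannot segment {word!r} as {tag}")
        if has_prep:
            out.append((parts.pop(0), "PREP"))
        conj = parts.pop() if has_conj else None
        out.append((parts[0], base))
        if conj is not None:
            out.append((conj, "CONJ"))
    return out


if __name__ == "__main__":
    # Toy transliterated tokens: be- "in", ke- "from", -nna "and", bet "house", hede "he went".
    corpus = [("be+bet", "NP"), ("hede", "V"), ("ke+bet+nna", "NPC"),
              ("hede+nna", "VC"), ("bet", "N")]
    segmented = segment(corpus)
    print(len(corpus), len({t for _, t in corpus}))        # 5 5
    print(len(segmented), len({t for _, t in segmented}))  # 9 4
```

On this toy input, five tokens with five distinct tags become nine tokens with only four distinct tags. This mirrors, in miniature, the two effects the abstract credits for the improvement.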
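The Bi-LSTM tagging idea above can be illustrated with a minimal, untrained forward pass in plain Python. This is a sketch under several stated assumptions, not the study's model:

- the names `LSTMCell` and `BiLSTMTagger` are hypothetical;
- weights are random rather than learned, and gate biases are omitted;
- training is left out entirely;
- unknown words map to a zero vector.

It shows only how each word's embedding feeds a forward and a backward LSTM, and how the two hidden states are concatenated so that every tag decision sees both left and right context. A practical tagger would be built and trained with a deep learning framework.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Small random weights; a trained model would learn these.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTMCell:
    """One LSTM direction with input (i), forget (f), output (o) gates and candidate (c)."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to [x; h_prev]; biases omitted for brevity.
        self.w = {g: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for g in "ifoc"}

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(a) for a in matvec(self.w["i"], z)]
        f = [sigmoid(a) for a in matvec(self.w["f"], z)]
        o = [sigmoid(a) for a in matvec(self.w["o"], z)]
        g = [math.tanh(a) for a in matvec(self.w["c"], z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs: List[Vector]) -> List[Vector]:
        h, c = [0.0] * self.hidden_dim, [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


class BiLSTMTagger:
    """Untrained Bi-LSTM tagger over word-embedding features (illustrative only)."""

    def __init__(self, embeddings: Dict[str, Vector], tags: List[str],
                 hidden_dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.embeddings = embeddings
        self.dim = len(next(iter(embeddings.values())))
        self.tags = tags
        self.fwd = LSTMCell(self.dim, hidden_dim, rng)
        self.bwd = LSTMCell(self.dim, hidden_dim, rng)
        self.out = rand_matrix(len(tags), 2 * hidden_dim, rng)

    def embed(self, word: str) -> Vector:
        # Assumed OOV strategy: unknown words get a zero vector.
        return self.embeddings.get(word, [0.0] * self.dim)

    def tag(self, words: List[str]) -> List[str]:
        xs = [self.embed(w) for w in words]
        fwd_states = self.fwd.run(xs)
        bwd_states = self.bwd.run(xs[::-1])[::-1]  # reverse back to word order
        result = []
        for hf, hb in zip(fwd_states, bwd_states):
            scores = matvec(self.out, hf + hb)  # concatenated left and right context
            # Argmax of raw scores equals argmax of softmax, so softmax is skipped.
            result.append(self.tags[max(range(len(scores)), key=scores.__getitem__)])
        return result


if __name__ == "__main__":
    # Hypothetical 4-dimensional toy embeddings; real ones would be learned from a corpus.
    toy_embeddings = {"be": [0.1, 0.0, 0.2, 0.0], "bet": [0.3, 0.1, 0.0, 0.2],
                      "hede": [0.0, 0.4, 0.1, 0.1]}
    tagger = BiLSTMTagger(toy_embeddings, ["N", "V", "PREP", "CONJ"], hidden_dim=4)
    print(tagger.tag(["be", "bet", "hede"]))  # arbitrary tags: the weights are untrained
```

Running the backward LSTM over the reversed sentence and then reversing its outputs keeps the two state sequences aligned. As a result, position *k* combines a summary of words 1..*k* with a summary of words *k*..*n*.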
