Design And Development Of Part-of-speech Tagger For Kafi-noonoo Language


A part-of-speech (POS) tagger is a program that reads text in a given language and assigns a part of speech, such as noun, verb or adjective, to each word and other token in the text. Several POS taggers are available on the web for different languages, including Amharic, Oromifa and Tigrigna. However, these taggers cannot be applied directly to the Kafi-noonoo language. This thesis therefore presents research on a Kafi-noonoo part-of-speech tagger. To develop the tagger, the study employed a hybrid approach that combines a hidden Markov model (HMM) tagger and a rule-based tagger at the sentence level. A part-of-speech tagger for a language has many uses: it can provide input to a full parser; it can be used in a text-to-speech system to correct pronunciation; it can be used for surface linguistic analysis; it can serve as a pre-processing step for researchers developing higher-level NLP applications; and it provides a way of learning the language by revealing its word categories and grammatical constructions.

For training and testing, 354 untagged Kafi-noonoo sentences were collected from two genres and annotated using an incremental corpus preparation approach. In addition, 34 part-of-speech tags were identified for tagging. After word class information was assigned to each word in the sentences, both the HMM and rule-based taggers were trained on 90% of the tagged sentences. Training produced lexical and transition probabilities for the statistical component of the hybrid tagger and a set of transformation rules for its rule-based component. Based on these probabilities and transformation rules, the hybrid tagger (a combination of the HMM and rule-based taggers) assigns the most suitable word class to each word in untagged Kafi-noonoo text. The performance of the prototypes, i.e. the HMM, rule-based and hybrid taggers, was tested in different experiments. As a result, the HMM tagger and the rule-based tagger with a unigram initial-state tagger achieved 77.19% and 61.88% accuracy respectively, whereas the hybrid tagger improved the accuracy to 80.47%.

Key words: Part-of-speech tagger, HMM, Rule-based, Hybrid tagger, Transformation rules
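The abstract does not say how the two components are combined, so the following is only a minimal sketch of one plausible arrangement: an HMM tagger decoded with the Viterbi algorithm from lexical (emission) and transition counts, followed by a pass of Brill-style transformation rules over its output. The toy English corpus, the tag names, the add-one smoothing, the single rule in `RULES`, and all function names are assumptions made for illustration; none of them come from the thesis or its Kafi-noonoo data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (English, not Kafi-noonoo) of (word, tag) pairs.
TRAIN = [
    [("the", "DET"), ("dog", "N"), ("barks", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("sleeps", "V")],
]

# Hypothetical transformation rule: (from_tag, to_tag, required_previous_tag).
RULES = [("N", "V", "N")]


def train_hmm(sentences):
    """Collect transition counts (tag -> next tag) and lexical counts (tag -> word)."""
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []
    tags = list(emit)
    n_tags = len(tags)
    n_words = len({w for c in emit.values() for w in c}) + 1  # +1 for unknown words
    # Each column maps tag -> (best log score, backpointer to previous tag).
    cols = [{
        t: (log_prob(trans["<s>"], t, n_tags) + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for word in words[1:]:
        col = {}
        for t in tags:
            score, prev = max(
                (cols[-1][p][0] + log_prob(trans[p], t, n_tags), p) for p in tags
            )
            col[t] = (score + log_prob(emit[t], word, n_words), prev)
        cols.append(col)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: cols[-1][t][0])
    path = [tag]
    for col in reversed(cols[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]


def apply_rules(tags, rules):
    """Rewrite a tag when the previous tag matches the rule's context."""
    out = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(out)):
            if out[i] == from_tag and out[i - 1] == prev_tag:
                out[i] = to_tag
    return out


def hybrid_tag(words, trans, emit, rules=RULES):
    """HMM output used as the initial tagging, then corrected by the rules."""
    return apply_rules(viterbi(words, trans, emit), rules)


if __name__ == "__main__":
    trans, emit = train_hmm(TRAIN)
    print(hybrid_tag(["the", "dog", "sleeps"], trans, emit))
```

In a real setting the rules would be learned from the tagged training sentences rather than written by hand, and the corpus would be split into training and held-out test portions, as the abstract describes with its 90% training share.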
