Use Of Part Of Speech Tagging For Afaan Oromo Word Sense Modeling


Word sense induction (WSI) is the task of automatically discovering all senses of an ambiguous word in a corpus. Induced senses can help researchers improve performance in machine translation and information retrieval.

In this thesis we investigated whether POS tagging can improve the performance of word sense disambiguation for Afaan Oromo through word sense modeling (WSM).

To conduct the study, the untagged corpus was taken from Yehuwalashet [1]. We prepared an annotated corpus by applying POS tagging to the data. A corpus of 424,397 words for WSM and 29,845 words for POS tagging, with 20 ambiguous words, was used to test the system. NLTK and Python were used for POS tagging, and the WSM system was run in Java with NetBeans. Preprocessing tasks such as tokenization, stop word removal, and normalization were applied to both the unannotated and the POS-tagged corpora to prepare them for the experiments.

The experiments were carried out with two clustering algorithms, EM and K-means, and context window sizes of one to three. The results show that using the annotated corpus improved system performance for both approaches. The ML approach with the EM algorithm achieved 74.85% on the annotated corpus and 70.35% on the unannotated one. The hybrid approach with the K-means algorithm scored 79.1% on the annotated corpus and 74.85% on the unannotated corpus. The EM algorithm produced erroneous results for the hybrid approach. Overall, the results show that a POS-annotated corpus improves the WSM system for Afaan Oromo words, and that the hybrid approach performed well with the POS-annotated corpus.
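The role of the context window and of the POS annotation can be illustrated with a minimal Python sketch. It is not the thesis implementation (which the abstract says ran in Java); the function name `context_features`, the `word/TAG` feature format, and the toy English tokens and tags are all assumptions made for illustration. For each occurrence of an ambiguous word, it collects the tokens within a fixed window on either side, optionally joined with their POS tags, as a bag of features that a clustering algorithm such as K-means or EM could then group into senses.

```python
from collections import Counter


def context_features(tagged_tokens, target, window=2, use_pos=True):
    """Return one bag-of-features Counter per occurrence of `target`.

    tagged_tokens: list of (word, pos_tag) pairs (hypothetical format).
    window: number of tokens taken on each side of the target word.
    use_pos: if True, features are "word/TAG"; otherwise the bare word.
    """
    contexts = []
    for i, (word, _) in enumerate(tagged_tokens):
        if word != target:
            continue
        # Neighbours to the left and right, clipped at the sequence edges.
        left = tagged_tokens[max(0, i - window):i]
        right = tagged_tokens[i + 1:i + 1 + window]
        feats = Counter()
        for w, tag in left + right:
            feats[f"{w}/{tag}" if use_pos else w] += 1
        contexts.append(feats)
    return contexts


# Toy data with made-up tags; NOT taken from the Afaan Oromo corpus.
toy = [("a", "DT"), ("bank", "NN"), ("of", "IN"), ("the", "DT"),
       ("river", "NN"), ("and", "CC"), ("the", "DT"), ("bank", "NN"),
       ("loan", "NN")]

contexts = context_features(toy, "bank", window=1)
for c in contexts:
    print(dict(c))
```

Setting `use_pos=False` yields the features for an unannotated corpus, so the same occurrences can be clustered both ways and compared, in the spirit of the annotated versus unannotated experiments described above.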



