Development Of Stemming Algorithm For Afaan Oromo Language Text



This paper reports the design and development of a stemming algorithm for the Afaan Oromo language. Afaan Oromo morphology, existing stemming algorithms, and other relevant materials were reviewed. In Afaan Oromo, inflectional and derivational affixation are the major word-formation processes. The initial design of the stemming algorithm was based on context-free conflation procedures following the longest-match suffix removal approach. This initial attempt achieved an accuracy rate of 71%. The improved algorithm incorporated suffix, context-sensitive, and recoding rules in its procedures. Before stemming, functional and frequently occurring words, compiled as a stoplist, are excluded from the input term(s) to increase the efficiency of the stemmer. Procedures for prefix removal and for conflation of words formed by reduplication of the first syllable are also components of the modified algorithm. With the modified stemmer, an accuracy rate of 92% was obtained in a test based on a sample of 1061 words. The percentages of errors recorded as understemming and overstemming were reduced to 4.58% and 2.5%, respectively, from 10.5% and 17.5% for the first version. The stemmer also achieves a substantial reduction in the size of the sample text. The morphological complexity of the language is the main source of the remaining stemming errors. A detailed study of Afaan Oromo morphology would therefore help to improve the stemmer further. Overall, the results of this study show that a stemming algorithm can be employed to conflate Afaan Oromo words.
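
The pipeline described in the abstract (stoplist filtering, longest-match suffix removal, a context-sensitive condition, and a recoding step) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the stoplist, the suffix list, the minimum stem length, and the recoding rule are all hypothetical placeholders chosen for the example, and the paper's actual rule tables are not reproduced here. Prefix removal and reduplication handling are left out.

```python
from typing import Optional

# Illustrative placeholders only: these are NOT the paper's rule tables.
STOPLIST = {"fi", "kan"}
# Sort suffixes longest first so the longest match is tried before shorter ones.
SUFFIXES = sorted(["oota", "wwan", "tti"], key=len, reverse=True)
# Hypothetical context-sensitive condition: the remaining stem must keep
# at least this many characters, otherwise the suffix is not removed.
MIN_STEM_LEN = 3


def recode(stem: str) -> str:
    """Hypothetical recoding rule: collapse a doubled final consonant."""
    if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
        return stem[:-1]
    return stem


def stem_word(word: str) -> Optional[str]:
    """Return the stem of `word`, or None if the word is on the stoplist."""
    word = word.lower()
    if word in STOPLIST:
        return None
    for suffix in SUFFIXES:
        candidate = word[: -len(suffix)]
        if word.endswith(suffix) and len(candidate) >= MIN_STEM_LEN:
            return recode(candidate)
    # No suffix matched (or the stem would be too short): leave unchanged.
    return word


if __name__ == "__main__":
    for token in ["manoota", "manatti", "barattoota", "fi"]:
        print(token, "->", stem_word(token))
```

With these toy rules, "manoota" and "manatti" come out as different stems ("man" and "mana"), which is the kind of understemming error the abstract counts. A real rule set would need additional context-sensitive and recoding rules to conflate them.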
