Development Of Stemming Algorithm For Wolaytta Text



This study describes the design of a stemming algorithm for the Wolaytta language. To give a solid background for the thesis, literature on conflation in general and stemming algorithms in particular was reviewed. Since it is the nature and characteristics of suffixation that guide the development of a stemmer, the morphology of the Wolaytta language was studied and described in order to model the language and develop an automatic procedure for conflation. The inflectional and derivational morphologies of the language are discussed. The study indicates that suffixation is the main word formation process in the Wolaytta language. It also attempts to show that the language is morphologically complex and makes extensive use of suffix concatenation.

The result of the study is a prototype context-sensitive iterative stemmer for the Wolaytta language. An error counting technique was employed to evaluate the performance of this stemmer. The stemmer was trained on 3537 words (80% of the sample text), and the improved version achieved an accuracy of 90.6% on the training set. Overstemmed and understemmed words on the training set amounted to 8.6% (304 words) and 0.8% (28 words), respectively. When the stemmer was run on the unseen sample of 884 words (20% of the sample text), it performed with an accuracy of 86.9%. The percentages of errors recorded as understemmed and overstemmed on this unseen sample (the test set) were 9% and 4.1%, respectively. Moreover, a dictionary reduction of 38.92% was attained on the test set. The major sources of errors are also reported, with recommendations for further improving the performance of the stemmer and for further research.
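
The training-set figures reported above follow from simple error counting, and the arithmetic can be checked with a short Python sketch. This is only an illustration, not the evaluation code used in the thesis: the names `StemmerEvaluation` and `dictionary_reduction` are hypothetical, and the dictionary-reduction formula (the relative drop in distinct entries after stemming) is an assumed definition, since the abstract does not state one.

```python
from dataclasses import dataclass


@dataclass
class StemmerEvaluation:
    """Error counts for one evaluation run (hypothetical helper class)."""
    total: int         # number of words stemmed
    overstemmed: int   # words where too much was removed
    understemmed: int  # words where too little was removed

    @property
    def correct(self) -> int:
        # Every word that is neither overstemmed nor understemmed counts as correct.
        return self.total - self.overstemmed - self.understemmed

    def rate(self, count: int) -> float:
        # Percentage of the sample, rounded to one decimal place.
        return round(100.0 * count / self.total, 1)

    @property
    def accuracy(self) -> float:
        return self.rate(self.correct)


def dictionary_reduction(distinct_words: int, distinct_stems: int) -> float:
    """Assumed definition: percentage drop in distinct entries after stemming."""
    return 100.0 * (distinct_words - distinct_stems) / distinct_words


# Training-set counts as reported in the abstract.
training = StemmerEvaluation(total=3537, overstemmed=304, understemmed=28)
print(training.accuracy)                     # accuracy in percent
print(training.rate(training.overstemmed))   # overstemming rate in percent
print(training.rate(training.understemmed))  # understemming rate in percent
```

With the reported counts, 3537 - 304 - 28 = 3205 words are stemmed correctly, which matches the stated 90.6% accuracy. The abstract gives only percentages for the test set, so its raw counts are not reproduced here.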
