The Application of Decision Tree for Part of Speech (POS) Tagging for Amharic



Automatic understanding of natural languages requires a set of language processing tools. A POS tagger, which assigns the proper parts of speech (such as noun, verb, and adjective) to the words in a sentence, is one of these tools. This study investigates the possibility of applying a decision tree based POS tagger to Amharic. The tagger was developed using the J48 decision tree classifier algorithm, which is Weka's implementation of the C4.5 algorithm. In the process, a corpus developed by the ELRC annotation team was used to obtain the data required for training and testing the models. The dataset comprises 1065 news documents (210,000 words). A sample of some 800 sentences was selected and used for model development and evaluation. The dataset was processed in line with the requirements of the Weka data mining tool. To support the decision tree classification models, a table containing contextual and orthographic information was constructed semi-automatically and used as the training and testing dataset.

The tags of the right and left neighboring words of each word are used as contextual information. Moreover, orthographic information about the word, such as its first and last characters, its prefix and suffix, the presence of a numeric digit within the word, and so on, is included in the table to provide useful information about the word to be tagged.

Performance tests were conducted at various stages using the 10-fold cross-validation test option. Experimental results show that only the tags of the two successive left and right words provide useful contextual information; contextual information beyond two words does not provide useful information but rather noise. In the end, an overall accuracy of 84.9%, including ambiguous and unknown words, was obtained using the 10-fold cross-validation test option.

Even though the accuracy of this study is encouraging, further study to improve the accuracy so as to reach implementation level is recommended.
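The feature table described in the abstract can be illustrated with a minimal Python sketch. It is not the study's actual procedure, which was semi-automatic and targeted Weka's input format. The function name `build_feature_rows`, the column names, the two-character affix length, the `NONE` boundary placeholder, and the sample tokens and tags are all assumptions made for illustration. Neighbor tags are read from an already annotated sentence, as they would be when preparing training data.

```python
from typing import Dict, List, Tuple

BOUNDARY = "NONE"  # assumed placeholder for positions outside the sentence


def build_feature_rows(
    sentence: List[Tuple[str, str]], window: int = 2, affix_len: int = 2
) -> List[Dict[str, str]]:
    """Return one feature row per (word, tag) pair of an annotated sentence.

    Assumes every token is a non-empty string.
    """
    tags = [tag for _, tag in sentence]
    rows = []
    for i, (word, tag) in enumerate(sentence):
        # Orthographic features of the word itself.
        row = {
            "first_char": word[0],
            "last_char": word[-1],
            "prefix": word[:affix_len],   # assumed fixed-length prefix
            "suffix": word[-affix_len:],  # assumed fixed-length suffix
            "has_digit": str(any(ch.isdigit() for ch in word)),
        }
        # Contextual features: tags of up to `window` neighbors on each side.
        for k in range(1, window + 1):
            row[f"left_tag_{k}"] = tags[i - k] if i - k >= 0 else BOUNDARY
            row[f"right_tag_{k}"] = tags[i + k] if i + k < len(tags) else BOUNDARY
        row["tag"] = tag  # class label the decision tree learns to predict
        rows.append(row)
    return rows


# Made-up tokens and tags, purely illustrative (not taken from the corpus).
sample = [("abc", "N"), ("de12", "NUM"), ("fghi", "V")]
rows = build_feature_rows(sample)
```

The default `window=2` mirrors the reported finding that only two neighboring tags on each side were useful. Each row could then be written out as one instance in whatever format the chosen decision tree learner expects.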
