Automatic Classification of Amharic News Items: The Case of Ethiopian News Agency

To organize its news stock efficiently and to facilitate the storage and retrieval of news items, the Ethiopian News Agency (ENA) uses a classification scheme developed in-house. With the large volume of news items it produces each year, ENA faces problems classifying news items in a timely manner. This research developed the Amharic News Classifier (ANC), which automatically classifies Amharic news items into predefined classes based on their content.

The development of an automatic document classification system passes through several steps, and different methods can be used at each step. This research used statistical techniques of automatic classification in all the steps. The steps in automatic classification include document analysis, generation of document and class vectors based on document and class representatives, and matching document and class vectors to determine the class to which a document belongs.

The process of document analysis requires some preprocessing activities, such as stemming and stopword removal, which are language dependent. In this research, the key terms are stemmed using a simple depluralization and suffix and prefix removal program developed for this purpose. A database of stopwords, containing the most frequently occurring Amharic words, was also developed. In addition, problems related to the Amharic script were considered during text processing.

To identify document representatives, the tf × idf weighting technique is used. Class vectors, also called centroid vectors, are generated by computing the average value of document vectors. After identifying class representatives from the learning data set, the cosine function is used as a matching technique to automatically classify the test data set, which played no part in the construction of the class vectors.

The overall result of this research showed that statistical techniques can be used to analyze Amharic news items and classify them automatically into predefined classes. After training the classifier, 273 out of 321 news items (about 85%) were correctly classified by the system. The result is very promising; however, additional work is recommended before the system is implemented.
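The three statistical steps described above (tf × idf weighting, centroid class vectors, and cosine matching) can be sketched in a few lines of Python. This is only an illustrative sketch, not the ANC implementation: the abstract does not give the exact tf × idf variant, so the common form tf · log(N/df) is assumed here, and the function names, the English toy tokens, and the class labels "sport" and "economy" are all hypothetical. The documents are assumed to be already tokenized, stemmed, and stripped of stopwords.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs, idf=None):
    """Turn tokenized documents into sparse tf x idf vectors (dicts).

    Assumed weighting: raw term frequency times log(N / df).
    If idf is given (e.g. learned from training data), it is reused
    and terms unseen during training are dropped.
    """
    if idf is None:
        n = len(docs)
        df = Counter(term for doc in docs for term in set(doc))
        idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: f * idf[t] for t, f in tf.items() if t in idf})
    return vectors, idf


def centroids(vectors, labels):
    """Average the document vectors of each class into one class vector."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            sums[label][term] += weight
    return {
        label: {t: w / counts[label] for t, w in terms.items()}
        for label, terms in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(vec, class_vectors):
    """Return the label whose centroid is most similar to vec."""
    return max(class_vectors, key=lambda label: cosine(vec, class_vectors[label]))


# Hypothetical toy data standing in for preprocessed news items.
train_docs = [
    ["match", "goal", "team"],
    ["team", "coach", "goal"],
    ["bank", "loan", "market"],
    ["market", "price", "bank"],
]
train_labels = ["sport", "sport", "economy", "economy"]
test_docs = [["goal", "coach"], ["price", "loan"]]

train_vectors, idf = tfidf_vectors(train_docs)
class_vectors = centroids(train_vectors, train_labels)
test_vectors, _ = tfidf_vectors(test_docs, idf=idf)
predictions = [classify(vec, class_vectors) for vec in test_vectors]
print(predictions)
```

Reusing the idf values learned from the training set when weighting the test items keeps the test data out of the construction of the class vectors, in line with the separation the abstract describes.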
