Automatic Amharic Text News Classification: A Neural Networks Approach



Text classification is one of the methods used to organize massive amounts of available textual information in a meaningful context so as to maximize the utilization of information. Automatic text classification is the preferred method for classifying large volumes of information. Research on automatic classification is flourishing for other languages, whereas research on automatic Amharic text classification is in its infancy and very few attempts have been made so far. This study makes its own contribution to automatic Amharic text classification.

Before the classifier was constructed, the data was preprocessed to make it ready for the learning algorithm. Preprocessing included changing the various Amharic characters that share the same sound to one common form, stemming word variants, and removing stop words, punctuation marks and numbers. A Document Frequency (DF) threshold was then applied to select the features of news items. Two weighting schemes, Term Frequency (TF) and Term Frequency by Inverse Document Frequency (TF*IDF), were used to weight the features in news documents and construct the news-by-features matrix, which is fed to the learning algorithm. This study considers one of the neural network learning methods, Learning Vector Quantization (LVQ), to assess its suitability for automatic Amharic text news classification. In the course of the study, the TF weighting scheme was found to outperform the TF*IDF weighting scheme by 3.54% on average. Using the TF weighting method, accuracies of 94.81%, 61.61% and 70.08% were obtained in the three-, six- and nine-category experiments respectively, with an average accuracy of 75.5%. For similar experiments, the TF*IDF weighting method resulted in accuracies of 69.63%, 78.22% and 68.03%, with an average accuracy of 71.96%.

Previous research on Amharic text classification shows that accuracy decreases consistently as the number of categories increases. The result of this study shows that accuracy does not depend on the number of news items and categories considered; rather, accuracy is determined by representing each category with a sufficient number of subclasses. Therefore, finding the optimum number of subclasses is the major direction for further research on Amharic text news classification using LVQ.
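The pipeline summarized in the abstract (DF-threshold feature selection, TF or TF*IDF weighting, then an LVQ classifier) can be illustrated with a minimal Python sketch. It is not the study's implementation. The function names, the `min_df` value, the `log(N / df)` form of IDF, the basic LVQ1 update rule with a linearly decaying learning rate, and the toy English tokens standing in for stemmed Amharic words are all assumptions made for illustration. The `prototypes_per_class` parameter is one plausible reading of what the abstract calls "subclasses" per category.

```python
import math
import random
from collections import Counter

def select_features(docs, min_df=2):
    """Keep terms whose document frequency (DF) is at least min_df (assumed threshold)."""
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(t for t, n in df.items() if n >= min_df)
    return vocab, df

def weight_matrix(docs, vocab, df, scheme="tf"):
    """Build the news-by-features matrix using TF or TF*IDF weights."""
    n_docs = len(docs)
    rows = []
    for doc in docs:
        tf = Counter(doc)
        if scheme == "tf":
            rows.append([float(tf[t]) for t in vocab])
        else:  # TF*IDF with idf = log(N / df), one common variant (assumption)
            rows.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return rows

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train_lvq1(X, y, prototypes_per_class=1, lr=0.1, epochs=20, seed=0):
    """Basic LVQ1: pull the nearest prototype toward same-class samples, push it away otherwise."""
    rng = random.Random(seed)
    protos = []
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        # Initialize each prototype from a randomly chosen training sample of its class.
        for x in rng.sample(members, prototypes_per_class):
            protos.append((list(x), label))
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            x, label = X[i], y[i]
            w, w_label = min(protos, key=lambda p: sq_dist(p[0], x))
            sign = 1.0 if w_label == label else -1.0
            for j in range(len(w)):
                w[j] += sign * rate * (x[j] - w[j])
    return protos

def predict(protos, x):
    """Return the label of the prototype nearest to x."""
    return min(protos, key=lambda p: sq_dist(p[0], x))[1]

# Toy, already-preprocessed documents (hypothetical tokens, not data from the study).
docs = [
    ["goal", "match", "team", "goal"],
    ["team", "match", "coach"],
    ["bank", "market", "price"],
    ["market", "price", "bank", "price"],
]
labels = ["sport", "sport", "economy", "economy"]

vocab, df = select_features(docs, min_df=2)
X_tf = weight_matrix(docs, vocab, df, scheme="tf")
X_tfidf = weight_matrix(docs, vocab, df, scheme="tfidf")
protos = train_lvq1(X_tf, labels, prototypes_per_class=1)
predictions = [predict(protos, x) for x in X_tf]
```

Raising `prototypes_per_class` gives each category more codebook vectors, which is the knob the abstract's conclusion points to. It must not exceed the number of training items in the smallest category, because `random.sample` cannot draw more items than exist.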
