N-gram-Based Automatic Indexing for Amharic Text



This research explored the applicability of the n-gram method for indexing text written in the Amharic language. 100 documents (Amharic news articles written in the Visual Ge'ez font, obtained from Walta Information Center) and 24 queries (collected from people who frequently read newspapers) were selected and used for the test. The values of n used were n=2 (bi-grams) and n=3 (tri-grams). For comparison purposes, unstemmed words were also used as index terms.

The Vector Space Model (VSM) was used for document representation and retrieval. Thus, the individual words, bi-grams and tri-grams were identified for the collection. These unique terms were then weighted using the TF-IDF weighting technique used in the VSM. The term vectors were generated from these calculated weights for each type of term, i.e. unstemmed word, bi-gram, and tri-gram. The query terms (words, bi-grams, and tri-grams) were also identified and weighted. A different weighting formula was used for the query terms. The vectors of terms were then formed.

In order to retrieve relevant documents, similarity calculations were performed between each document-query vector pair. The ranked results from this calculation were then used to calculate precision and recall measures that are used in the VSM to test or compare retrieval effectiveness. The relevance information that was used to determine recall and precision was stored in a table. Recall and precision values for the queries for each type of index (word, bi-gram, and tri-gram) were calculated and compared.

The results showed that although word indexes are better in overall indexing performance, bi-grams and tri-grams also have values for indexing comparable to words.
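The pipeline described above (n-gram extraction, TF-IDF weighting, vector similarity, then precision and recall) can be illustrated with a minimal Python sketch. It is not the study's implementation, and several details are assumptions: n-grams are taken as overlapping character sequences within each word, the weight is tf * log(N/df), similarity is cosine, and the query is weighted with the same scheme because the abstract does not state the separate query formula. All function names are hypothetical, and the toy documents use Latin text rather than Amharic.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Overlapping character n-grams within each whitespace-separated word."""
    grams = []
    for word in text.split():
        if len(word) < n:
            grams.append(word)  # assumption: words shorter than n are kept whole
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def tfidf_vectors(docs_terms):
    """Return one {term: tf * log(N / df)} dict per document, plus the idf table."""
    n_docs = len(docs_terms)
    df = Counter(term for terms in docs_terms for term in set(terms))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(terms).items()}
        for terms in docs_terms
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs, n):
    """Indices of documents with non-zero similarity to the query, best first."""
    vectors, idf = tfidf_vectors([char_ngrams(doc, n) for doc in docs])
    # Assumption: the query reuses the document idf values; the study used a
    # different (unspecified) weighting formula for query terms.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(char_ngrams(query, n)).items()}
    scored = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return [i for score, i in sorted(scored, key=lambda p: (-p[0], p[1])) if score > 0]


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against a set of relevant ids."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy collection (Latin placeholders, not data from the study).
docs = ["addis ababa news", "football match result", "addis market prices"]
ranked = rank("addis", docs, n=2)
print(ranked, precision_recall(ranked, relevant={0, 2}))
```

Because Python strings are sequences of Unicode code points, the same `char_ngrams` function would treat each Ethiopic character as one unit if given Unicode Amharic text. Text stored in a legacy font encoding such as Visual Ge'ez would presumably need converting first.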
