Concept-based Amharic Documents Similarity (CADS) Measure

Similarity measurement is important in NLP applications such as search engines, information extraction and document classification, and such applications have been implemented for the Amharic language. However, most of them rely on simple matching techniques or probabilistic methods to measure similarity, and these approaches do not always capture conceptual relatedness as humans judge it. Some studies try to account for the semantic nature of a document but do not handle word ambiguity. In this research, we propose Concept-based Amharic Document Similarity (CADS), developed by building AmhWordNet.

The objective of this research is to implement an effective document similarity measure that addresses polysemy, synonymy and semantic relationships between words. The main components of the proposed system (CADS) are AmhWordNet and the Concept-based Similarity Measure (CSM). CSM consists of three modules: Word Sense Disambiguation (WSD), Concept Tree Extraction and Semantic Similarity Measure.

AmhWordNet serves as input to concept tree extraction and is used to implement the WSD module. The extracted concept tree, together with the WSD module, is used to find the semantic similarity between words. Word similarities are then used to compute sentence similarity, and document similarity is finally computed from sentence similarities.

The performance of CADS is evaluated using precision, recall and F-measure. CADS without WSD (CADS WoWSD), Pointwise Mutual Information (PMI), Jaccard and Cosine similarity measures are also implemented so that the five systems can be compared. According to our experimental results, the proposed system performs better than the existing approaches.
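The abstract does not say which WSD algorithm CADS uses. As an illustration only, the sketch below uses simplified Lesk (pick the sense whose gloss overlaps most with the surrounding words). The `SENSES` dictionary is a hypothetical stand-in for AmhWordNet, and English glosses are used purely for readability.

```python
from typing import Dict, List

# Hypothetical sense inventory standing in for AmhWordNet: word -> {sense id: gloss}.
# English glosses are used only for readability; this is not AmhWordNet's real format.
SENSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank.finance": "institution that accepts money deposits and gives loans",
        "bank.river": "sloping land beside a river or lake",
    },
}


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization; Amharic text would need a proper tokenizer."""
    return text.lower().split()


def simplified_lesk(word: str, context: str) -> str:
    """Pick the sense whose gloss shares the most tokens with the context (first sense wins ties)."""
    context_tokens = set(tokenize(context))
    senses = SENSES[word]
    return max(senses, key=lambda s: len(context_tokens & set(tokenize(senses[s]))))


print(simplified_lesk("bank", "she sat on the bank of the river"))  # bank.river
print(simplified_lesk("bank", "he asked the bank for loans"))  # bank.finance
```

Disambiguating first means a word's similarity is later computed from the selected sense's position in the concept tree rather than from every sense of the word. Whether this matches how CADS handles polysemy in detail is an assumption.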
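The word, then sentence, then document pipeline can be sketched as follows, under stated assumptions. A toy hypernym tree (`PARENT`, hypothetical) plays the role of the extracted concept tree. Word similarity uses the Wu-Palmer formula, and sentences and documents are compared by symmetric best-match averaging. The abstract does not give the formulas CADS actually uses, so treat these as illustrative substitutes.

```python
from typing import Dict, List, Optional

# Hypothetical concept tree: child concept -> parent concept ("entity" is the root).
PARENT: Dict[str, str] = {
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
}


def path_to_root(concept: str) -> List[str]:
    """Return [concept, parent, ..., root]."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept: str) -> int:
    """Depth counted from 1 at the root."""
    return len(path_to_root(concept))


def wu_palmer(a: str, b: str) -> float:
    """2 * depth(LCS) / (depth(a) + depth(b)); LCS is the lowest common subsumer."""
    ancestors_b = set(path_to_root(b))
    lcs: Optional[str] = next((c for c in path_to_root(a) if c in ancestors_b), None)
    if lcs is None:
        return 0.0  # no shared ancestor in the tree
    return 2 * depth(lcs) / (depth(a) + depth(b))


def sentence_similarity(s1: List[str], s2: List[str]) -> float:
    """Average each word's best match in the other sentence, in both directions."""
    def directed(x: List[str], y: List[str]) -> float:
        return sum(max(wu_palmer(w, v) for v in y) for w in x) / len(x)
    return (directed(s1, s2) + directed(s2, s1)) / 2


def document_similarity(d1: List[List[str]], d2: List[List[str]]) -> float:
    """The same best-match averaging one level up, with sentences in place of words."""
    def directed(x: List[List[str]], y: List[List[str]]) -> float:
        return sum(max(sentence_similarity(s, t) for t in y) for s in x) / len(x)
    return (directed(d1, d2) + directed(d2, d1)) / 2


print(round(wu_palmer("dog", "cat"), 4))  # 0.6667 (shared parent "animal")
print(round(document_similarity([["dog", "car"]], [["cat"]]), 4))  # 0.5833
```

The sketch assumes non-empty sentences and documents. Best-match averaging lets "dog" and "cat" score highly even though they share no surface tokens, which is the kind of conceptual relatedness the abstract says string matching misses.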
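For comparison, the baseline measures named in the abstract have standard textbook definitions. The sketch below shows Jaccard and cosine over token lists and PMI at the word level, with probabilities estimated from document co-occurrence. The abstract does not state the preprocessing, weighting, or the way the thesis aggregates PMI into a document score, so those details are assumptions here.

```python
import math
from collections import Counter
from typing import List


def jaccard(a: List[str], b: List[str]) -> float:
    """|A ∩ B| / |A ∪ B| over the sets of tokens."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # convention: two empty texts are identical
    return len(sa & sb) / len(sa | sb)


def cosine(a: List[str], b: List[str]) -> float:
    """Cosine of raw term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def pmi(w1: str, w2: str, corpus: List[List[str]]) -> float:
    """log2(P(w1, w2) / (P(w1) P(w2))), estimated from document-level co-occurrence."""
    n = len(corpus)
    docs = [set(d) for d in corpus]
    p1 = sum(w1 in d for d in docs) / n
    p2 = sum(w2 in d for d in docs) / n
    p12 = sum(w1 in d and w2 in d for d in docs) / n
    if p12 == 0:
        return float("-inf")  # the words never co-occur
    return math.log2(p12 / (p1 * p2))


print(jaccard(["a", "b", "c"], ["b", "c", "d"]))  # 0.5
print(pmi("x", "y", [["x", "y"], ["x", "y"], ["z"], ["w"]]))  # 1.0
```

Jaccard and cosine only reward exact token overlap, so two texts using synonyms score zero. That is the limitation the concept-based approach is meant to address.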
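The abstract reports precision, recall and F-measure but does not describe how similarity scores become binary decisions. One common setup, assumed here, thresholds each document-pair score and compares the result with human "similar" labels. The scores, labels and `THRESHOLD` below are hypothetical.

```python
from typing import List, Tuple


def precision_recall_f(predicted: List[bool], gold: List[bool]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure (F1) for binary 'similar' decisions."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical similarity scores for four document pairs and the matching human labels.
scores = [0.9, 0.7, 0.4, 0.2]
gold = [True, False, True, False]
THRESHOLD = 0.5  # hypothetical cut-off turning a score into a "similar" decision
predicted = [s >= THRESHOLD for s in scores]
print(precision_recall_f(predicted, gold))  # (0.5, 0.5, 0.5)
```

Under this setup, all five systems would be scored on the same labelled pairs, so their F-measures are directly comparable. The choice of threshold affects every system's result.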
