Teteyeq () Amharic Question Answering System For Factoid Questions

The number of Amharic documents on the Web is increasing as many newspaper publishers have started offering their services electronically. Users have relied on information retrieval (IR) systems to satisfy their information needs, but IR has been criticized for not delivering “ready-made” information to the user, so question answering (QA) systems have emerged as the best solution for delivering the required information to the user with the help of information extraction techniques. QA systems for other languages have been extensively researched and have shown reasonable results, but this is the first such work for Amharic: Amharic is a less-resourced language, and no QA system had been developed for it before. A number of techniques and approaches were used in developing the Amharic QA system. Language-specific issues in Amharic were studied extensively, and document normalization was found to be crucial for the performance of our QA system; experiments showed that normalized documents yield higher performance than un-normalized ones. Based on our investigation of these language-specific issues, a distinct technique was used to determine question types, possible question focuses, and expected answer types, and to generate a proper IR query. Document retrieval was designed to retrieve three types of documents: sentences, paragraphs, and files. File-based retrieval was found to be more useful than the other two techniques because it takes advantage of the distribution of concepts over sentences and of the less populous answer particles found with file-based retrieval. An algorithm was developed for sentence/paragraph re-ranking and answer selection, and the named entity (gazetteer) and pattern-based answer pinpointing algorithms help locate possible answer particles in a document. The evaluation of our system, the first Amharic QA system, has shown promising performance. The rule-based question classification module classified about 89% of the questions correctly.
The document retrieval component achieved high coverage of relevant documents (97%), with sentence-based retrieval having the lowest coverage (93%); this coverage contributed to the better recall of our system. The gazetteer-based answer selection, using the paragraph-based answer selection technique, answered 72% of the questions correctly, which can be considered promising. The file-based answer selection technique exhibits better recall (0.909), indicating that most of the relevant documents thought to contain the correct answer are returned. For pattern-based answer selection, the paragraph-based technique achieved better accuracy on person names, while the sentence-based technique performed best on numeric and date questions. In general, our algorithms and tools have shown good performance compared with QA systems for high-resourced languages such as English.
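The abstract above reports that document normalization was crucial but does not list the rules used. The sketch below assumes one common form of Amharic normalization: merging homophonous Fidel series (ሐ and ኀ into ሀ, ሠ into ሰ, ዐ into አ, ፀ into ጸ) so that spelling variants of the same word match. The row table and the function names `build_table` and `normalize` are hypothetical; the same function would have to be applied to both documents and questions.

```python
# Sketch of Amharic homophone normalization (an assumed rule set, not the
# system's documented one). Each Fidel series occupies consecutive code points
# in the Ethiopic block, with its seven basic vowel orders at offsets 0..6.
HOMOPHONE_ROWS = [
    (0x1210, 0x1200),  # ሐ series -> ሀ series
    (0x1280, 0x1200),  # ኀ series -> ሀ series
    (0x1220, 0x1230),  # ሠ series -> ሰ series
    (0x12D0, 0x12A0),  # ዐ series -> አ series
    (0x1340, 0x1338),  # ፀ series -> ጸ series
]


def build_table(rows):
    """Map the seven basic orders of each variant series onto the canonical series."""
    table = {}
    for variant_start, canonical_start in rows:
        for order in range(7):
            table[variant_start + order] = chr(canonical_start + order)
    return table


NORMALIZATION_TABLE = build_table(HOMOPHONE_ROWS)


def normalize(text: str) -> str:
    """Return text with every variant character replaced by its canonical form."""
    return text.translate(NORMALIZATION_TABLE)


if __name__ == "__main__":
    # "ፀሐይ" (sun) and "ጸሀይ" become identical after normalization.
    print(normalize("ፀሐይ") == normalize("ጸሀይ"))
```

Only the seven basic vowel orders of each series are mapped; labialized forms such as ሗ are left unchanged in this sketch.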
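The rule-based module described above determines question type, possible question focus, and expected answer type, and generates the IR query, but the abstract gives no rules. A minimal sketch, assuming that an exact-match interrogative cue decides the type: ማን/ማነው (who), መቼ (when), and ስንት (how many) map to the person, date, and numeric types mentioned in the evaluation. The cue table, stop-word list, and `analyze` function are hypothetical; the remaining content words serve both as candidate focus terms and as the IR query.

```python
from dataclasses import dataclass

# Hypothetical cue table: interrogative word -> (question type, expected answer type).
CUES = {
    "ማን": ("who", "PERSON"),
    "ማነው": ("who", "PERSON"),  # contracted form of ማን ነው ("who is")
    "መቼ": ("when", "DATE"),
    "ስንት": ("how_many", "NUMBER"),
}
# Hypothetical stop words removed from the query (the copulas "is" / "are").
STOP_WORDS = {"ነው", "ናቸው"}
# "?" plus the Ethiopic question mark, full stop, and comma.
PUNCTUATION = "?፧።፣"


@dataclass
class QuestionAnalysis:
    question_type: str
    answer_type: str
    query_terms: list


def analyze(question: str) -> QuestionAnalysis:
    """Classify a question by its first interrogative cue and build an IR query."""
    tokens = [tok.strip(PUNCTUATION) for tok in question.split()]
    tokens = [tok for tok in tokens if tok]

    question_type, answer_type = "unknown", "UNKNOWN"
    for tok in tokens:
        if tok in CUES:
            question_type, answer_type = CUES[tok]
            break

    # Content words left after dropping cues and stop words act as the
    # candidate question focus and as the IR query.
    query_terms = [tok for tok in tokens if tok not in CUES and tok not in STOP_WORDS]
    return QuestionAnalysis(question_type, answer_type, query_terms)
```

For example, `analyze("የአድዋ ጦርነት መሪ ማን ነው?")` yields question type `"who"`, answer type `"PERSON"`, and query terms `["የአድዋ", "ጦርነት", "መሪ"]`. Exact token matching misses interrogatives that carry affixes, so a fuller rule set would need some morphological handling.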
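Retrieval was performed over three unit types: sentences, paragraphs, and whole files. One way to produce those units is sketched below, assuming that sentences end with the Ethiopic full stop (።), the Ethiopic question mark (፧), or a Latin "?" or "!", and that paragraphs are separated by blank lines; `split_units` and its granularity names are hypothetical.

```python
import re

# A sentence is a run of non-terminator characters plus an optional terminator:
# Ethiopic full stop (።), Ethiopic question mark (፧), "?" or "!".
SENTENCE_RE = re.compile(r"[^።፧?!]+[።፧?!]?")
PARAGRAPH_BREAK_RE = re.compile(r"\n\s*\n")


def split_units(text: str, granularity: str) -> list:
    """Split a document into sentence, paragraph, or file units for indexing."""
    if granularity == "file":
        stripped = text.strip()
        return [stripped] if stripped else []
    if granularity == "paragraph":
        return [p.strip() for p in PARAGRAPH_BREAK_RE.split(text) if p.strip()]
    if granularity == "sentence":
        # Collapse internal whitespace (including line breaks) inside each sentence.
        return [" ".join(s.split()) for s in SENTENCE_RE.findall(text) if s.strip()]
    raise ValueError(f"unknown granularity: {granularity!r}")
```

Each unit would then be normalized and indexed separately, so the same retrieval engine can be queried at any of the three granularities.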
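The abstract mentions a sentence/paragraph re-ranking algorithm without describing it. As a stand-in (a swapped-in technique, not the authors' algorithm), the sketch below ranks candidate units first by the fraction of query terms they contain, then by how densely those terms occur, and finally by their original retrieval order; `rerank` is a hypothetical name.

```python
from collections import Counter

# Ethiopic full stop, comma, and question mark, plus Latin "?" and "!".
PUNCTUATION = "።፣፧?!"


def rerank(query_terms, units):
    """Rank text units by query-term coverage, then term density, then original order."""
    query = set(query_terms)
    scored = []
    for index, unit in enumerate(units):
        tokens = [tok.strip(PUNCTUATION) for tok in unit.split()]
        tokens = [tok for tok in tokens if tok]
        counts = Counter(tokens)
        matched = query.intersection(counts)
        coverage = len(matched) / len(query) if query else 0.0
        density = sum(counts[t] for t in matched) / len(tokens) if tokens else 0.0
        # -index keeps the retrieval order as the final tie-breaker.
        scored.append((coverage, density, -index, unit))
    scored.sort(reverse=True)
    return [(unit, coverage, density) for coverage, density, _, unit in scored]
```

Putting coverage before density favors units that mention every query term over short units that repeat a single term.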
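Gazetteer-based pinpointing, as described above, looks up entries of the expected answer type inside the top-ranked passages. A minimal sketch, assuming a tiny hypothetical gazetteer and a reciprocal-rank weight so that candidates found in higher-ranked passages, or in several passages, score higher; the weighting and `pinpoint_with_gazetteer` are assumptions. Substring matching lets an entry match even with an affix attached (for example the object marker in አዲስ አበባን), and entries should go through the same normalization as the documents.

```python
from collections import defaultdict

# Hypothetical toy gazetteer keyed by expected answer type.
GAZETTEER = {
    "PERSON": ["ምኒልክ", "ጣይቱ", "ቴዎድሮስ"],
    "PLACE": ["አዲስ አበባ", "ጎንደር", "ላሊበላ"],
}


def pinpoint_with_gazetteer(passages, answer_type, gazetteer=GAZETTEER):
    """Return the best gazetteer entry of answer_type found in ranked passages, or None."""
    scores = defaultdict(float)
    for rank, passage in enumerate(passages):
        for entry in gazetteer.get(answer_type, []):
            hits = passage.count(entry)  # substring match tolerates attached affixes
            if hits:
                # Reciprocal-rank weighting: earlier passages contribute more.
                scores[entry] += hits / (rank + 1)
    if not scores:
        return None
    return max(scores, key=scores.get)
```

The reciprocal-rank weight is only one simple choice; scores from the re-ranking step could be used instead.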
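For numeric and date questions, pattern-based pinpointing can be approximated with regular expressions. The sketch assumes that dates appear as an optional Ethiopian-calendar month name and day followed by a four-digit year, and that numbers are integers or decimals written with Arabic digits. The month list is a deliberately partial, hypothetical subset, and `pinpoint_with_pattern` is an illustrative name, not the system's.

```python
import re

# Hypothetical, non-exhaustive subset of Ethiopian calendar month names.
MONTHS = ["መስከረም", "ጥቅምት", "ጥር", "የካቲት", "መጋቢት", "ሚያዝያ", "ግንቦት", "ሰኔ"]

PATTERNS = {
    # Optional "<month> <day> [ቀን]" prefix followed by a four-digit year.
    "DATE": re.compile(r"(?:(?:%s)\s+\d{1,2}\s*(?:ቀን\s*)?)?\d{4}" % "|".join(MONTHS)),
    # Integers or decimals written with Arabic digits.
    "NUMBER": re.compile(r"\d+(?:[.,]\d+)?"),
}


def pinpoint_with_pattern(passages, answer_type):
    """Return the first pattern match for answer_type, scanning passages in rank order."""
    pattern = PATTERNS.get(answer_type)
    if pattern is None:
        return None
    for passage in passages:
        match = pattern.search(passage)
        if match:
            return match.group(0)
    return None
```

Because any four-digit number counts as a year here, a real rule set would need more context to tell years from other quantities, and Ethiopic numerals such as ፲ are not matched by `\d`.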
