Incorporation Of Relevance Data In The Term Discrimination Value


Indexing in information retrieval is used to obtain a suitable vocabulary of index terms and an optimum assignment of these terms to documents, in order to increase the effectiveness and efficiency of the retrieval system. A great many automatic indexing models have been developed over the years in an effort to produce indexing methods that are both effective and usable in practice. One of the most elegant approaches to the automatic selection and weighting of index terms is the term discrimination value developed by Salton and his co-workers. This model ranks the index terms according to how well they discriminate the documents of a collection from each other; that is, the value of an index term depends on how much the average separation between individual documents changes when the given term is assigned for content identification. It is suggested that the most useful index terms, those which achieve the greatest separation, are the medium-frequency terms.

Since the basic requirement in effective retrieval is the separation between documents which are relevant to a given query and documents which are not relevant to that query, a more complete picture of a term's behavior may be obtained by considering its ability to effect greater separation between relevant and non-relevant documents while at the same time moving relevant documents closer to each other.

This study was aimed at testing the extent to which the discrimination value model considers the relevance characteristics of documents in ranking the index terms. An overview of the more important ideas current in automatic indexing is provided. The term discrimination value model is discussed in greater detail. An efficient technique for computing exact term discrimination values for the relevant/non-relevant document distinction is introduced. The study is conducted using the KEEN, CRANFIELD, EVANS, HARDING and LISA document collections and their associated queries and relevance judgments.

While some of the results are consistent with those derived by previous workers, in some cases, especially in the case of relevant/relevant discrimination, the results obtained appear to be in complete disagreement with Salton's theory: the medium-frequency terms are not the most useful terms.
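As an illustration of the basic discrimination value idea described above, the following minimal sketch measures how much the average pairwise cosine similarity of a collection rises when one term is removed. It is not the efficient technique introduced in the study, nor does it use relevance data; the function names and the toy term-frequency vectors are hypothetical and chosen only for the example.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two term-frequency vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def density(docs):
    """Average similarity over all document pairs (a high value means poor separation)."""
    pairs = list(combinations(docs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def discrimination_value(docs, k):
    """Density with term k removed minus density with all terms.

    A positive value means that assigning term k spreads the documents apart
    (a good discriminator); a negative value means it pulls them together.
    """
    reduced = [d[:k] + d[k + 1:] for d in docs]
    return density(reduced) - density(docs)

# Toy collection (assumed data): rows are documents, columns are terms.
# Term 0 occurs in every document; terms 1 to 3 each occur in one document.
docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]

for k in range(4):
    print(k, round(discrimination_value(docs, k), 4))
```

In this toy collection the term shared by every document receives a negative value, while each term that occurs in a single document receives a positive one. A relevance-based variant of the kind the study investigates would restrict the pairs to relevant/non-relevant or relevant/relevant documents for each query; that restriction is not shown here.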
