A Generative Theory of Relevance by Victor Lavrenko


A modern information retrieval system must be able to find, organize and present very different manifestations of information (such as text, pictures, videos or database records), any of which may be relevant to the user. However, the concept of relevance, while seemingly intuitive, is actually hard to define, and it is even harder to model in a formal way.

Lavrenko does not attempt to put forward a new definition of relevance, nor does he argue that any particular definition is theoretically superior or more complete. Instead, he takes a widely accepted, albeit somewhat conservative, definition, makes several assumptions, and from them develops a new probabilistic model that explicitly captures that notion of relevance. With this book, he makes two major contributions to the field of information retrieval: first, a new way to look at topical relevance, complementing the two dominant models (the classical probabilistic model and the language modeling approach), which explicitly combines documents, queries, and relevance in a single formalism; second, a new method for modeling exchangeable sequences of discrete random variables that makes no structural assumptions about the data and can also handle rare events.

Thus his book is of major interest to researchers and graduate students in information retrieval who specialize in relevance modeling, ranking algorithms, and language modeling.


Similar structured design books

MCITP SQL Server 2005 Database Developer All-in-One Exam Guide

All-in-One Is All You Need. Get complete coverage of all three Microsoft Certified IT Professional database developer exams for SQL Server 2005 in this comprehensive volume. Written by a SQL Server expert and MCITP, this definitive exam guide features learning objectives at the beginning of each chapter, exam tips, practice questions, and in-depth explanations.

Transactions on Computational Systems Biology IX

The LNCS journal Transactions on Computational Systems Biology is devoted to inter- and multidisciplinary research in the fields of computer science and the life sciences, and supports a paradigmatic shift in the techniques from computer and information science used to cope with the new challenges arising from a systems-oriented view of biological phenomena.

The Scheme Programming Language: Third Edition

This thoroughly updated edition of The Scheme Programming Language provides an introduction to Scheme and a definitive reference for standard Scheme, presented in a clear and concise manner. Written for professionals and students with some prior programming experience, it begins by leading the programmer gently through the basics of Scheme and continues with an introduction to some of the more advanced features of the language.

Euro-Par 2014: Parallel Processing Workshops: Euro-Par 2014 International Workshops, Porto, Portugal, August 25-26, 2014, Revised Selected Papers, Part I

The two volumes LNCS 8805 and 8806 constitute the thoroughly refereed post-conference proceedings of 18 workshops held at the 20th International Conference on Parallel Computing, Euro-Par 2014, in Porto, Portugal, in August 2014. The 100 revised full papers presented were carefully reviewed and selected from 173 submissions.

Additional info for A Generative Theory of Relevance

Example text

Harter assumed that the frequency of $v$ in both classes follows a Poisson distribution, but that the mean is higher in the elite class. Under this assumption, the frequency of $v$ in the collection as a whole would follow a mixture of two Poissons:

$$P(D_v = d_v) = P(E=1)\,\frac{e^{-\mu_{1,v}}\,\mu_{1,v}^{d_v}}{d_v!} + P(E=0)\,\frac{e^{-\mu_{0,v}}\,\mu_{0,v}^{d_v}}{d_v!}$$

Here $E$ is a binary variable specifying whether $D$ is in the elite set of $v$, $\mu_{1,v}$ is the mean frequency of $v$ in the elite documents, and $\mu_{0,v}$ is the same for the non-elite set. Since we don't know which documents are elite for a given word, we need some way to estimate three parameters: $\mu_{1,v}$, $\mu_{0,v}$ and $P(E=1)$.
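To make the estimation problem concrete, here is a minimal sketch that evaluates the two-Poisson mixture and fits its three parameters from the per-document counts of a single word $v$. It uses expectation-maximization (EM), a standard technique for fitting mixtures when the component labels (here $E$) are hidden; the excerpt does not say which estimator Harter or Lavrenko used, so EM is a swapped-in choice. The function names, the starting values and the synthetic counts are all hypothetical.

```python
import math

EPS = 1e-12  # floor that keeps means and mixing weights strictly inside their valid range


def poisson_pmf(k, mu):
    # P(X = k) for X ~ Poisson(mu), computed in log space for numerical stability
    if mu <= 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def two_poisson_pmf(k, p_elite, mu1, mu0):
    # P(D_v = k) = P(E=1) Poisson(k; mu1) + P(E=0) Poisson(k; mu0)
    return p_elite * poisson_pmf(k, mu1) + (1 - p_elite) * poisson_pmf(k, mu0)


def fit_two_poisson(counts, iters=500, tol=1e-10):
    """Hypothetical EM estimator for (P(E=1), mu1, mu0) given counts d_v of one word."""
    n = len(counts)
    mean = sum(counts) / n
    # Assumed starting point: elite mean above the overall mean, non-elite mean below it
    p_elite, mu1, mu0 = 0.5, 2.0 * mean + 1e-3, 0.5 * mean + 1e-3
    prev_ll = -math.inf
    for _ in range(iters):
        # E-step: posterior probability that each document is elite for v
        resp, ll = [], 0.0
        for d in counts:
            a = p_elite * poisson_pmf(d, mu1)
            b = (1 - p_elite) * poisson_pmf(d, mu0)
            resp.append(a / (a + b))
            ll += math.log(a + b)
        # M-step: closed-form maximizers of the expected complete-data log-likelihood
        r_sum = sum(resp)
        p_elite = min(max(r_sum / n, EPS), 1 - EPS)
        mu1 = max(sum(r * d for r, d in zip(resp, counts)) / max(r_sum, EPS), EPS)
        mu0 = max(sum((1 - r) * d for r, d in zip(resp, counts)) / max(n - r_sum, EPS), EPS)
        if ll - prev_ll < tol:  # log-likelihood has stopped improving
            break
        prev_ll = ll
    if mu1 < mu0:  # label the component with the higher mean as "elite"
        p_elite, mu1, mu0 = 1 - p_elite, mu0, mu1
    return p_elite, mu1, mu0


if __name__ == "__main__":
    # Synthetic counts: 80 "non-elite" documents with v rare, 20 "elite" ones with v frequent
    counts = [0] * 60 + [1] * 20 + [4, 5, 6, 5, 4, 6, 5, 5, 4, 6] * 2
    p, m1, m0 = fit_two_poisson(counts)
    print(f"P(E=1)={p:.3f}  mu1={m1:.3f}  mu0={m0:.3f}")
```

On data this well separated, the fitted $P(E=1)$ should land near the 20% of documents that were built to be elite. EM is attractive here because both the E-step (posterior eliteness of each document) and the M-step (weighted means) have closed forms. Like any mixture fit, however, it only finds a local optimum, so the starting values matter on less clear-cut data.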

On the other hand, we gain several distinct advantages by hypothesizing the process described above. These are:

1. We can define a common generative model. By assuming that documents and queries originate in the same space, we pave the way for defining a single distribution that can describe both documents and queries. This is an absolute necessity if we want to entertain the generative hypothesis.
2. Anything can be used as a query in the model. The fact that the query has an artificially enriched latent representation means that any aspect of that representation can initiate the searching process, as long as it happens to be observable.

Since both assumptions appear obviously incorrect, a lot of effort went into improving performance by modeling dependencies. Unfortunately, explicit models of dependence did not lead to consistent performance improvements in either framework. In an attempt to understand this curious effect, we provided two different arguments for why dependency models do not help in the classical framework and in the language modeling framework. […] 2 for details). For the language modeling framework, we informally argued that explicit models of dependence will capture nothing but the surface form (well-formedness) of text, which has little to do with its topical content.
