Statistical language models for information retrieval

As online information grows dramatically, search engines such as Google are playing a more and more important role in our lives. Critical to all search engines is the problem of designing an effective retrieval model that can rank documents accurately for a given query. This has been a central resea...


Bibliographic Details
Main Author: Zhai, ChengXiang.
Format: Electronic
Language: English
Published: San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool Publishers, c2008.
Series: Synthesis lectures on human language technologies (Online) ; # 1.
Online Access: Abstract with links to full text
Table of Contents:
  • Introduction
  • Basic concepts in information retrieval
  • Statistical language models
  • Overview of information retrieval models
  • Similarity-based models
  • Probabilistic relevance models
  • Probabilistic inference models
  • Axiomatic retrieval framework
  • Decision-theoretic retrieval framework
  • Summary
  • Simple query likelihood retrieval model
  • Basic idea
  • Event models for θ_d
  • Multinomial θ_d
  • Multiple Bernoulli θ_d
  • Multiple Poisson θ_d
  • Comparison of the three models
  • Estimation of θ_d
  • A general smoothing strategy using collection language model
  • Jelinek-Mercer smoothing (fixed coefficient interpolation)
  • Dirichlet prior smoothing
  • Absolute discounting smoothing
  • Interpolation vs. backoff
  • Other smoothing methods
  • Comparison of different smoothing methods
  • Smoothing and TF-IDF weighting
  • Two-stage smoothing
  • Exploit document prior
  • Summary
  • Complex query likelihood retrieval model
  • Document-specific smoothing of θ_d
  • Cluster-based smoothing
  • Document expansion
  • Beyond unigram models
  • Parsimonious language models
  • Full Bayesian query likelihood
  • Translation model
  • Summary
  • Probabilistic distance retrieval model
  • Difficulty in supporting feedback with query likelihood
  • Kullback-Leibler divergence retrieval model
  • Estimation of query models
  • Model-based feedback
  • Markov chain query model estimation
  • Relevance model
  • Structured query models
  • Negative relevance feedback
  • Summary
  • Language models for special retrieval tasks
  • Cross-lingual information retrieval
  • Distributed information retrieval
  • Structured document retrieval and combining representations
  • Personalized and context-sensitive search
  • Expert finding
  • Passage retrieval
  • Subtopic retrieval
  • Other retrieval-related tasks
  • Modeling redundancy and novelty
  • Predicting query difficulty
  • Summary
  • Language models for latent topic analysis
  • Probabilistic latent semantic analysis (PLSA)
  • Latent Dirichlet allocation (LDA)
  • Extensions of PLSA and LDA
  • Topic model labeling
  • Using topic models for retrieval
  • Summary
  • Conclusions
  • Language models vs. traditional retrieval models
  • Summary of research progress
  • Future directions