Statistical language models for information retrieval

As online information grows dramatically, search engines such as Google are playing a more and more important role in our lives. Critical to all search engines is the problem of designing an effective retrieval model that can rank documents accurately for a given query. This has been a central resea...

Bibliographic Details
Main Author: Zhai, ChengXiang.
Format: Electronic
Language: English
Published: San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool Publishers, c2008.
Series: Synthesis lectures on human language technologies (Online) ; # 1.
Subjects:
Online Access: Abstract with links to full text
LEADER 06255nam a2200541 a 4500
001 3380
005 20081216144101.0
006 m e d
007 cr cn |||m|||a
008 081210s2008 caua fsab 000 0 eng d
020 # # |a 9781598295917 (electronic bk.) 
020 # # |a 9781598295900 (pbk.) 
024 7 # |a 10.2200/S00158ED1V01Y200811HLT001  |2 doi 
035 # # |a 176927388 (OCLC) 
035 # # |a (CaBNvSL)gtp00532206 
040 # # |a CaBNvSL  |c CaBNvSL  |d CaBNvSL 
050 # 4 |a P98.5.S83  |b Z427 2008 
082 0 4 |a 025.04  |2 22 
100 1 # |a Zhai, ChengXiang. 
245 1 0 |a Statistical language models for information retrieval  |h [electronic resource] /  |c ChengXiang Zhai. 
260 # # |a San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) :  |b Morgan & Claypool Publishers,  |c c2008. 
300 # # |a 1 electronic text (xiii, 125 p. : ill.) :  |b digital file. 
490 1 # |a Synthesis lectures on human language technologies ;  |v # 1 
500 # # |a Part of: Synthesis digital library of engineering and computer science. 
500 # # |a Title from PDF t.p. (viewed on December 10, 2008). 
500 # # |a Series from website. 
504 # # |a Includes bibliographical references (p. 109-125). 
505 0 # |a Introduction -- Basic concepts in information retrieval -- Statistical language models -- Overview of information retrieval models -- Similarity-based models -- Probabilistic relevance models -- Probabilistic inference models -- Axiomatic retrieval framework -- Decision-theoretic retrieval framework -- Summary -- Simple query likelihood retrieval model -- Basic idea -- Event models for [theta]d -- Multinomial [theta]d -- Multiple Bernoulli [theta]d -- Multiple Poisson [theta]d -- Comparison of the three models -- Estimation of [theta]d -- A general smoothing strategy using collection language model -- Jelinek-Mercer smoothing (fixed coefficient interpolation) -- Dirichlet prior smoothing -- Absolute discounting smoothing -- Interpolation vs. backoff -- Other smoothing methods -- Comparison of different smoothing methods -- Smoothing and TF-IDF weighting -- Two-stage smoothing -- Exploit document prior -- Summary -- Complex query likelihood retrieval model -- Document-specific smoothing of [theta]d -- Cluster-based smoothing -- Document expansion -- Beyond unigram models -- Parsimonious language models -- Full Bayesian query likelihood -- Translation model -- Summary -- Probabilistic distance retrieval model -- Difficulty in supporting feedback with query likelihood -- Kullback-Leibler divergence retrieval model -- Estimation of query models -- Model-based feedback -- Markov chain query model estimation -- Relevance model -- Structured query models -- Negative relevance feedback -- Summary -- Language models for special retrieval tasks -- Cross-lingual information retrieval -- Distributed information retrieval -- Structured document retrieval and combining representations -- Personalized and context-sensitive search -- Expert finding -- Passage retrieval -- Subtopic retrieval -- Other retrieval-related tasks -- Modeling redundancy and novelty -- Predicting query difficulty -- Summary -- Language models for latent topic analysis -- Probabilistic latent semantic analysis (PLSA) -- Latent Dirichlet allocation (LDA) -- Extensions of PLSA and LDA -- Topic model labeling -- Using topic models for retrieval -- Summary -- Conclusions -- Language models vs. traditional retrieval models -- Summary of research progress -- Future directions. 
506 # # |a Abstract freely available; full-text restricted to subscribers or individual document purchasers. 
510 0 # |a Compendex 
510 0 # |a INSPEC 
510 0 # |a Google scholar 
510 0 # |a Google book search 
520 # # |a As online information grows dramatically, search engines such as Google are playing a more and more important role in our lives. Critical to all search engines is the problem of designing an effective retrieval model that can rank documents accurately for a given query. This has been a central research problem in information retrieval for several decades. In the past ten years, a new generation of retrieval models, often referred to as statistical language models, has been successfully applied to solve many different information retrieval problems. Compared with the traditional models such as the vector space model, these new models have a more sound statistical foundation and can leverage statistical estimation to optimize retrieval parameters. They can also be more easily adapted to model nontraditional and complex retrieval problems. Empirically, they tend to achieve comparable or better performance than a traditional model with less effort on parameter tuning. This book systematically reviews the large body of literature on applying statistical language models to information retrieval with an emphasis on the underlying principles, empirically effective language models, and language models developed for non-traditional retrieval tasks. All the relevant literature has been synthesized to make it easy for a reader to digest the research progress achieved so far and see the frontier of research in this area. The book also offers practitioners an informative introduction to a set of practically useful language models that can effectively solve a variety of retrieval problems. No prior knowledge about information retrieval is required, but some basic knowledge about probability and statistics would be useful for fully digesting all the details. 
530 # # |a Also available in print. 
538 # # |a Mode of access: World Wide Web. 
538 # # |a System requirements: Adobe Acrobat reader. 
650 # 0 |a Web search engines  |x Mathematical models. 
650 # 0 |a Computational linguistics  |x Statistical methods. 
690 # # |a Information retrieval. 
690 # # |a Search engines. 
690 # # |a Retrieval models. 
690 # # |a Language models. 
690 # # |a Smoothing. 
690 # # |a Topic models. 
730 0 # |a Synthesis digital library of engineering and computer science. 
830 # 0 |a Synthesis lectures on human language technologies (Online) ;  |v # 1. 
856 4 2 |u https://ezaccess.library.uitm.edu.my/login?url=http://dx.doi.org/10.2200/S00158ED1V01Y200811HLT001  |3 Abstract with links to full text