Latent semantic mapping : principles & applications /


Bibliographic Details
Main Author: Bellegarda, Jerome Rene, 1961-
Format: Electronic
Language: English
Published: San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool Publishers, c2007.
Edition: 1st ed.
Series: Synthesis lectures on speech and audio processing (Online), #3.
Subjects:
Online Access:Abstract with links to resource
LEADER 05226nam a2200577 a 4500
001 3251
005 20081106162106.0
006 m d
006 m e d
007 cr cn |||m|||a
008 070913s2007 caua sb 000 0 eng d
020 # # |a 159829105X (electronic bk.) 
020 # # |a 9781598291056 (electronic bk.) 
024 7 # |a 10.2200/S00048ED1V01Y200609SAP003  |2 doi 
035 # # |a (CaBNvSL)gtp00531413 
040 # # |a WAU  |c WAU  |d CaBNvSL 
050 # 4 |a P98  |b .B455 2007 
082 0 4 |a 401/.9  |2 22 
100 1 # |a Bellegarda, Jerome Rene,  |d 1961- 
245 1 0 |a Latent semantic mapping  |h [electronic resource] :  |b principles & applications /  |c Jerome R. Bellegarda. 
250 # # |a 1st ed. 
260 # # |a San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) :  |b Morgan & Claypool Publishers,  |c c2007. 
300 # # |a 1 electronic text (x, 101 p. : ill.) :  |b digital file. 
490 1 # |a Synthesis lectures on speech and audio processing,  |x 1932-1678 ;  |v #3 
500 # # |a Part of: Synthesis digital library of engineering and computer science. 
500 # # |a Title from PDF t.p. (viewed on October 24, 2008). 
500 # # |a Series from website. 
504 # # |a Includes bibliographical references (p. 89-100). 
505 0 # |a Principles -- Introduction -- Motivation -- From LSA to LSM -- Organization -- Latent semantic mapping -- Co-occurrence matrix -- Vector representation -- Interpretation -- LSM feature space -- Closeness measures -- LSM framework extension -- Salient characteristics -- Computational effort -- Off-line cost -- Online cost -- Possible shortcuts -- Probabilistic extensions -- Dual probability model -- Probabilistic latent semantic analysis -- Inherent limitations -- Applications -- Junk e-mail filtering -- Conventional approaches -- LSM-based filtering -- Performance -- Semantic classification -- Underlying issues -- Semantic inference -- Caveats -- Language modeling -- N-gram limitations -- MultiSpan language modeling -- Smoothing -- Pronunciation modeling -- Grapheme-to-phoneme conversion -- Pronunciation by latent analogy -- Speaker verification -- The task -- LSM-based speaker verification -- TTS unit selection -- Concatenative synthesis -- LSM-based unit selection -- LSM-based boundary training -- Perspectives -- Discussion -- Inherent tradeoffs -- General applicability -- Conclusion -- Summary -- Perspectives. 
506 # # |a Abstract freely available; full-text restricted to subscribers or individual document purchasers. 
510 0 # |a Compendex 
510 0 # |a INSPEC 
510 0 # |a Google Scholar 
510 0 # |a Google Book Search 
520 # # |a Latent semantic mapping (LSM) is a generalization of latent semantic analysis (LSA), a paradigm originally developed to capture hidden word patterns in a text document corpus. In information retrieval, LSA enables retrieval on the basis of conceptual content, instead of merely matching words between queries and documents. It operates under the assumption that there is some latent semantic structure in the data, which is partially obscured by the randomness of word choice with respect to retrieval. Algebraic and/or statistical techniques are brought to bear to estimate this structure and get rid of the obscuring "noise." This results in a parsimonious continuous parameter description of words and documents, which then replaces the original parameterization in indexing and retrieval. This approach exhibits three main characteristics: 1) discrete entities (words and documents) are mapped onto a continuous vector space; 2) this mapping is determined by global correlation patterns; and 3) dimensionality reduction is an integral part of the process. Such fairly generic properties are advantageous in a variety of different contexts, which motivates a broader interpretation of the underlying paradigm. The outcome (LSM) is a data-driven framework for modeling meaningful global relationships implicit in large volumes of (not necessarily textual) data. This monograph gives a general overview of the framework, and underscores the multifaceted benefits it can bring to a number of problems in natural language understanding and spoken language processing. It concludes with a discussion of the inherent tradeoffs associated with the approach, and some perspectives on its general applicability to data-driven information extraction. 
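The abstract's core mechanism — factoring a term-document co-occurrence matrix with a singular value decomposition and keeping only the top singular values, so that words and documents become points in a low-dimensional continuous space — can be sketched in a few lines. This is an illustrative toy example with a made-up matrix, not code from the monograph; it assumes NumPy and raw counts rather than the weighted counts the LSM framework actually uses.

```python
import numpy as np

# Toy term-document co-occurrence matrix W (terms x documents).
# Rows are hypothetical terms, columns are four tiny "documents";
# entries are raw counts (the LSM framework applies weighting,
# omitted here for clarity).
W = np.array([
    [2, 1, 0, 0],   # "speech"
    [1, 2, 0, 0],   # "audio"
    [0, 0, 3, 1],   # "retrieval"
    [0, 0, 1, 3],   # "query"
], dtype=float)

# Singular value decomposition: W = U S V^T.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Dimensionality reduction: keep only the top R singular values.
R = 2
s_r, Vt_r = s[:R], Vt[:R, :]

# Each document maps to a point in the R-dimensional latent space
# (columns of S_r V_r^T); rows of doc_vectors are documents.
doc_vectors = (np.diag(s_r) @ Vt_r).T   # shape: (4 documents, R dims)

def cosine(a, b):
    """Closeness measure between two points in the latent space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents 0 and 1 share vocabulary, so they land close together
# in the latent space; documents 0 and 2 share none, so they do not.
print(cosine(doc_vectors[0], doc_vectors[1]))
print(cosine(doc_vectors[0], doc_vectors[2]))
```

Retrieval then proceeds by projecting a query into the same latent space and ranking documents by this closeness measure, which is how conceptual matches surface even without shared words.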
530 # # |a Also available in print. 
538 # # |a Mode of access: World Wide Web. 
538 # # |a System requirements: Adobe Acrobat reader. 
650 # 0 |a Latent semantic indexing. 
650 # 0 |a Semantics  |x Mathematical models. 
650 # 0 |a Computational linguistics. 
650 # 0 |a Automatic speech recognition. 
690 # # |a Natural language processing. 
690 # # |a Long-span dependencies. 
690 # # |a Data-driven modeling. 
690 # # |a Parsimonious representation. 
690 # # |a Singular value decomposition. 
730 0 # |a Synthesis digital library of engineering and computer science. 
830 # 0 |a Synthesis lectures on speech and audio processing (Online),  |x 1932-1678 ;  |v #3. 
856 4 2 |u https://ezaccess.library.uitm.edu.my/login?url=http://dx.doi.org/10.2200/S00048ED1V01Y200609SAP003  |3 Abstract with links to resource