Introduction to linguistic annotation and text analytics

Bibliographic Details
Main Author: Wilcock, Graham.
Format: Electronic
Language: English
Published: San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool Publishers, c2009.
Series: Synthesis lectures on human language technologies (Online), # 3.
Subjects:
Online Access: Abstract with links to full text
LEADER 05339nam a2200553 a 4500
001 3421
005 20090605144200.0
006 m e d
007 cr cn |||m|||a
008 090604s2009 caua fsab 000 0 eng d
020 # # |a 9781598297393 (electronic bk.) 
020 # # |z 9781598297386 (pbk.) 
024 7 # |a 10.2200/S00194ED1V01Y200905HLT003  |2 doi 
035 # # |a (CaBNvSL)gtp00534713 
040 # # |a CaBNvSL  |c CaBNvSL  |d CaBNvSL 
050 # 4 |a P98.3  |b .W555 2009 
082 0 4 |a 410.285  |2 22 
100 1 # |a Wilcock, Graham. 
245 1 0 |a Introduction to linguistic annotation and text analytics  |h [electronic resource] /  |c Graham Wilcock. 
260 # # |a San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) :  |b Morgan & Claypool Publishers,  |c c2009. 
300 # # |a 1 electronic text (x, 149 p. : ill.) :  |b digital file. 
490 1 # |a Synthesis lectures on human language technologies,  |x 1947-4059 ;  |v # 3 
500 # # |a Part of: Synthesis digital library of engineering and computer science. 
500 # # |a Title from PDF t.p. (viewed on June 4, 2009). 
500 # # |a Series from website. 
504 # # |a Includes bibliographical references (p. 147-149). 
505 0 # |a Working with XML -- Introduction -- XML basics -- XML parsing and validation -- XML transformations -- In-line annotations -- Stand-off annotations -- Annotation standards -- Further reading -- Linguistic annotation -- Levels of linguistic annotation -- WordFreak annotation tool -- Sentence boundaries -- Tokenization -- Part-of-speech tagging -- Syntactic parsing -- Semantics and discourse -- WordFreak with OpenNLP -- Further reading -- Using statistical NLP tools -- Statistical models -- OpenNLP and Stanford NLP tools -- Sentences and tokenization -- Statistical tagging -- Chunking and parsing -- Named entity recognition -- Coreference resolution -- Further reading -- Annotation interchange -- XSLT transformations -- WordFreak-OpenNLP transformation -- GATE XML format -- GATE-WordFreak transformation -- XML metadata interchange: XMI -- WordFreak-XMI transformation -- Towards interoperability -- Further reading -- Annotation architectures -- GATE -- GATE information extraction tools -- Annotations with JAPE rules -- Customizing GATE gazetteers -- UIMA -- UIMA wrappers for OpenNLP tools -- Annotations with regular expressions -- Customizing UIMA dictionaries -- Further reading -- Text analytics -- Text analytics tools -- Named entity recognition -- Training statistical models -- Coreference resolution -- Information extraction -- Text mining and searching -- New directions -- Further reading -- Bibliography. 
506 # # |a Abstract freely available; full-text restricted to subscribers or individual document purchasers. 
510 0 # |a Compendex 
510 0 # |a INSPEC 
510 0 # |a Google scholar 
510 0 # |a Google book search 
520 3 # |a Linguistic annotation and text analytics are active areas of research and development, with academic conferences and industry events such as the Linguistic Annotation Workshops and the annual Text Analytics Summits. This book provides a basic introduction to both fields, and aims to show that good linguistic annotations are the essential foundation for good text analytics. After briefly reviewing the basics of XML, with practical exercises illustrating in-line and stand-off annotations, a chapter is devoted to explaining the different levels of linguistic annotations. The reader is encouraged to create example annotations using the WordFreak linguistic annotation tool. The next chapter shows how annotations can be created automatically using statistical NLP tools, and compares two sets of tools, the OpenNLP and Stanford NLP tools. The second half of the book describes different annotation formats and gives practical examples of how to interchange annotations between different formats using XSLT transformations. The two main text analytics architectures, GATE and UIMA, are then described and compared, with practical exercises showing how to configure and customize them. The final chapter is an introduction to text analytics, describing the main applications and functions including named entity recognition, coreference resolution and information extraction, with practical examples using both open source and commercial tools. Copies of the example files, scripts, and stylesheets used in the book are available from the companion website, located at http://sites.morganclaypool.com/wilcock. 
530 # # |a Also available in print. 
538 # # |a Mode of access: World Wide Web. 
538 # # |a System requirements: Adobe Acrobat reader. 
650 # 0 |a Computational linguistics. 
650 # 0 |a Corpora (Linguistics) 
650 # 0 |a Linguistic analysis (Linguistics) 
650 # 0 |a XML (Document markup language) 
690 # # |a Linguistic annotation 
690 # # |a Statistical natural language processing 
690 # # |a Part-of-speech tagging 
690 # # |a Named entity recognition 
690 # # |a Information extraction 
690 # # |a Text analytics 
730 0 # |a Synthesis digital library of engineering and computer science. 
830 # 0 |a Synthesis lectures on human language technologies (Online),  |x 1947-4059 ;  |v # 3. 
856 4 2 |u https://ezaccess.library.uitm.edu.my/login?url=http://dx.doi.org/10.2200/S00194ED1V01Y200905HLT003  |3 Abstract with links to full text