Natural language annotation for machine learning

By: Pustejovsky, J
Contributor(s): Stubbs, Amber
Material type: Text
Publication details: Mumbai : Shroff Publishers & Distr, 2013
Description: xiv, 324 p
ISBN: 9781449306663 (pbk.); 9789351103738
Subject(s): Natural language processing (Computer science) | Corpora (Linguistics) | Machine learning
DDC classification: 006.35
Summary: Create your own natural language training corpus for machine learning. This example-driven book walks you through the annotation cycle, from selecting an annotation task and creating the annotation specification to designing the guidelines, creating a "gold standard" corpus, and then beginning the actual data creation with the annotation process.

Includes bibliographical references (p.306-315) and index.

Machine generated contents note: The Importance of Language Annotation --
The Layers of Linguistic Description --
What Is Natural Language Processing? --
A Brief History of Corpus Linguistics --
What Is a Corpus? --
Early Use of Corpora --
Corpora Today --
Kinds of Annotation --
Language Data and Machine Learning --
Classification --
Clustering --
Structured Pattern Induction --
The Annotation Development Cycle --
Model the Phenomenon --
Annotate with the Specification --
Train and Test the Algorithms over the Corpus --
Evaluate the Results --
Revise the Model and Algorithms --
Summary --
Defining Your Goal --
The Statement of Purpose --
Refining Your Goal: Informativity Versus Correctness --
Background Research --
Language Resources --
Organizations and Conferences --
NLP Challenges --
Assembling Your Dataset --
The Ideal Corpus: Representative and Balanced --
Collecting Data from the Internet --
Eliciting Data from People --
The Size of Your Corpus --
Existing Corpora --
Distributions Within Corpora --
Summary --
Basic Probability for Corpus Analytics --
Joint Probability Distributions --
Bayes Rule --
Counting Occurrences --
Zipf's Law --
N-grams --
Language Models --
Summary --
Some Example Models and Specs --
Film Genre Classification --
Adding Named Entities --
Semantic Roles --
Adopting (or Not Adopting) Existing Models --
Creating Your Own Model and Specification: Generality Versus Specificity --
Using Existing Models and Specifications --
Using Models Without Specifications --
Different Kinds of Standards --
ISO Standards --
Community-Driven Standards --
Other Standards Affecting Annotation --
Summary --
Metadata Annotation: Document Classification --
Unique Labels: Movie Reviews --
Multiple Labels: Film Genres --
Text Extent Annotation: Named Entities --
Inline Annotation --
Stand-off Annotation by Tokens --
Stand-off Annotation by Character Location --
Linked Extent Annotation: Semantic Roles --
ISO Standards and You --
Summary --
The Infrastructure of an Annotation Project --
Specification Versus Guidelines --
Be Prepared to Revise --
Preparing Your Data for Annotation --
Metadata --
Preprocessed Data --
Splitting Up the Files for Annotation --
Writing the Annotation Guidelines --
Example 1: Single Labels-Movie Reviews --
Example 2: Multiple Labels-Film Genres --
Example 3: Extent Annotations-Named Entities --
Example 4: Link Tags-Semantic Roles --
Annotators --
Choosing an Annotation Environment --
Evaluating the Annotations --
Cohen's Kappa (κ) --
Fleiss's Kappa (κ) --
Interpreting Kappa Coefficients --
Calculating κ in Other Contexts --
Creating the Gold Standard (Adjudication) --
Summary --
What Is Learning? --
Defining Our Learning Task --
Classifier Algorithms --
Decision Tree Learning --
Gender Identification --
Naive Bayes Learning --
Maximum Entropy Classifiers --
Other Classifiers to Know About --
Sequence Induction Algorithms --
Clustering and Unsupervised Learning --
Semi-Supervised Learning --
Matching Annotation to Algorithms --
Testing Your Algorithm --
Evaluating Your Algorithm --
Confusion Matrices --
Calculating Evaluation Scores --
Interpreting Evaluation Scores --
Problems That Can Affect Evaluation --
Dataset Is Too Small --
Algorithm Fits the Development Data Too Well --
Too Much Information in the Annotation --
Final Testing Scores --
Summary --
Revising Your Project --
Corpus Distributions and Content --
Model and Specification --
Annotation --
Training and Testing --
Reporting About Your Work --
About Your Corpus --
About Your Model and Specifications --
About Your Annotation Task and Annotators --
About Your ML Algorithm --
About Your Revisions --
Summary --
The Goal of TimeML --
Related Research --
Building the Corpus --
Model: Preliminary Specifications --
Times --
Signals --
Events --
Links --
Annotation: First Attempts --
Model: The TimeML Specification Used in TimeBank --
Time Expressions --
Events --
Signals --
Links --
Confidence --
Annotation: The Creation of TimeBank --
TimeML Becomes ISO-TimeML --
Modeling the Future: Directions for TimeML --
Narrative Containers --
Expanding TimeML to Other Domains --
Event Structures --
Summary --
The TARSQI Components --
GUTime: Temporal Marker Identification --
EVITA: Event Recognition and Classification --
GUTenLINK --
Slinket --
SputLink --
Machine Learning in the TARSQI Components --
Improvements to the TTK --
Structural Changes --
Improvements to Temporal Entity Recognition: BTime --
Temporal Relation Identification --
Temporal Relation Validation --
Temporal Relation Visualization --
TimeML Challenges: TempEval-2 --
TempEval-2: System Summaries --
Overview of Results --
Future of the TTK --
New Input Formats --
Narrative Containers/Narrative Times --
Medical Documents --
Cross-Document Analysis --
Summary --
Crowdsourcing Annotation --
Amazon's Mechanical Turk --
Games with a Purpose (GWAP) --
User-Generated Content --
Handling Big Data --
Boosting --
Active Learning --
Semi-Supervised Learning --
NLP Online and in the Cloud --
Distributed Computing --
Shared Language Resources --
Shared Language Applications --
And Finally ... --
Appendices.



© University of Vavuniya

---