TokenizedDocument

edu.columbia.cs.ref.model
Class TokenizedDocument

java.lang.Object
  edu.columbia.cs.ref.model.Document
      edu.columbia.cs.ref.model.TokenizedDocument

All Implemented Interfaces: Matchable, Writable, java.io.Serializable

public class TokenizedDocument
extends Document

A Document that has gone through a tokenization process.

Like a Document, a TokenizedDocument is defined by its path, its file name, a list of Segments that represent the content of the document, and the annotations of entities and relationships in the document. In addition, a TokenizedDocument holds the information produced by the tokenization.

Since: 2011-09-27
Version: 0.1
Author: Pablo Barrio, Goncalo Simoes
See Also: Serialized Form

Constructor Summary
`TokenizedDocument(Document d, Tokenizer tokenizer)` Constructs a TokenizedDocument by tokenizing the given Document with the given Tokenizer.

Method Summary
`Span`	`getEntitySpan(Entity entity)` Returns the start and end indexes of the given entity in the tokenization.
`Span[]`	`getTokenizedSpans()` Returns an array of spans where each entry holds the start and end indexes of one token in the text.
`java.lang.String[]`	`getTokenizedString()` Returns an array of Strings where each entry is the value of one token of the text.

Methods inherited from class edu.columbia.cs.ref.model.Document
`addEntity, addRelationship, equals, getEntities, getEntity, getFilename, getPath, getPlainText, getRelationship, getRelationships, getSubstring, getWritableValue, setFilename, setPath, setPlainText, toString`

Methods inherited from class java.lang.Object
`getClass, hashCode, notify, notifyAll, wait, wait, wait`

Constructor Detail

TokenizedDocument

public TokenizedDocument(Document d,
                         Tokenizer tokenizer)

Constructs a TokenizedDocument by tokenizing the given Document with the given Tokenizer.

Parameters:
    d - the document without tokenization
    tokenizer - the tokenizer used to tokenize the document

Method Detail

getEntitySpan

public Span getEntitySpan(Entity entity)

Returns the start and end indexes of the given entity in the tokenization.

Parameters:
    entity - the Entity whose indexes are requested
Returns: the start and end indexes of the input entity
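
The following is a minimal sketch of one way an entity could be mapped onto token indexes. It is not the library's implementation. It assumes that the entity is known by its character offsets in the text, that each token is described by a pair of character offsets with an exclusive end, and that the result is a token index range with an exclusive end; none of these conventions is stated on this page. The class EntitySpanSketch and the method entityTokenRange are hypothetical names used only for illustration.

```java
public class EntitySpanSketch {

    /**
     * Hypothetical helper, not part of the documented API.
     * tokenOffsets[k] = {start, end} character offsets of token k (end exclusive).
     * Returns {firstToken, endToken} (endToken exclusive) covering every token
     * that overlaps the entity, or null if no token overlaps it.
     */
    public static int[] entityTokenRange(int[][] tokenOffsets, int entityStart, int entityEnd) {
        int first = -1;
        int last = -1;
        for (int k = 0; k < tokenOffsets.length; k++) {
            // Two half-open ranges overlap when each starts before the other ends.
            boolean overlaps = tokenOffsets[k][0] < entityEnd && tokenOffsets[k][1] > entityStart;
            if (overlaps) {
                if (first < 0) {
                    first = k;
                }
                last = k;
            }
        }
        return first < 0 ? null : new int[] { first, last + 1 };
    }

    public static void main(String[] args) {
        // Character offsets of the tokens in "Pablo visited New York".
        int[][] tokenOffsets = { {0, 5}, {6, 13}, {14, 17}, {18, 22} };
        // The entity "New York" occupies characters 14 to 22.
        int[] range = entityTokenRange(tokenOffsets, 14, 22);
        System.out.println(range[0] + ".." + range[1]); // prints 2..4
    }
}
```

In this sketch an entity that falls entirely between tokens yields null; how the real getEntitySpan handles that case is not documented here.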

getTokenizedString

public java.lang.String[] getTokenizedString()

Returns an array of Strings where each entry is the value of one token of the text.

Returns: the tokens of the text

getTokenizedSpans

public Span[] getTokenizedSpans()

Returns an array of spans where each entry holds the start and end indexes of one token in the text.

Returns: the indexes of the tokens of the text
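
To illustrate how an array of token values and an array of token spans can line up entry by entry, here is a small self-contained sketch. It does not use the library's Tokenizer or Span classes. SimpleSpan, tokenSpans and tokenStrings are hypothetical stand-ins, the tokenizer is a plain whitespace split, and the spans are assumed to be character offsets with an exclusive end, which this page does not confirm for the real Span.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenSpanSketch {

    /** Hypothetical stand-in for Span: character offsets, end exclusive. */
    public static final class SimpleSpan {
        public final int start;
        public final int end;

        public SimpleSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Splits the text on whitespace and records where each token starts and ends. */
    public static SimpleSpan[] tokenSpans(String text) {
        List<SimpleSpan> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip any whitespace before the next token.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Advance to the end of the token.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                spans.add(new SimpleSpan(start, i));
            }
        }
        return spans.toArray(new SimpleSpan[0]);
    }

    /** Builds the token values from the spans, so that tokens[k] matches spans[k]. */
    public static String[] tokenStrings(String text, SimpleSpan[] spans) {
        String[] tokens = new String[spans.length];
        for (int k = 0; k < spans.length; k++) {
            tokens[k] = text.substring(spans[k].start, spans[k].end);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Pablo visited New York";
        SimpleSpan[] spans = tokenSpans(text);
        String[] tokens = tokenStrings(text, spans);
        for (int k = 0; k < tokens.length; k++) {
            System.out.println(tokens[k] + " [" + spans[k].start + ", " + spans[k].end + ")");
        }
    }
}
```

The point of the sketch is the parallel-array layout: entry k of the token array and entry k of the span array describe the same token.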
