java.lang.Object
  edu.columbia.cs.ref.model.Document
    edu.columbia.cs.ref.model.TokenizedDocument
public class TokenizedDocument extends Document
A Document that has gone through a tokenization process.
Like a Document, a TokenizedDocument is defined by its path, its file name,
a list of Segments that represent the content of the document, and annotations of
the entities and relationships in the document. In addition, a TokenizedDocument
holds the information produced by the tokenization.
Constructor Summary | |
---|---|
TokenizedDocument(Document d, Tokenizer tokenizer) | Constructs a TokenizedDocument by tokenizing the given Document with the given Tokenizer. |
Method Summary | |
---|---|
Span | getEntitySpan(Entity entity): Returns the indexes of the given entity in the tokenization. |
Span[] | getTokenizedSpans(): Returns an array of spans, one per token, each holding the start and end indexes of that token in the text. |
java.lang.String[] | getTokenizedString(): Returns an array of Strings, one per token, each holding the value of that token. |
Methods inherited from class edu.columbia.cs.ref.model.Document |
---|
addEntity, addRelationship, equals, getEntities, getEntity, getFilename, getPath, getPlainText, getRelationship, getRelationships, getSubstring, getWritableValue, setFilename, setPath, setPlainText, toString |
Methods inherited from class java.lang.Object |
---|
getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public TokenizedDocument(Document d, Tokenizer tokenizer)
d - document without tokenization
tokenizer - tokenizer used to tokenize the document
Method Detail |
---|
public Span getEntitySpan(Entity entity)
entity - the Entity to find the indexes for
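The page documents getEntitySpan only as returning "the indexes in the tokenization", so the exact semantics are not confirmed here. One plausible reading is that an entity annotated with character offsets is mapped to the range of tokens it overlaps. The following is a minimal sketch of that idea under that assumption. The class EntitySpanSketch, the method tokenRange, and the use of plain int arrays are hypothetical and are not part of edu.columbia.cs.ref.model.

```java
// Hypothetical illustration; NOT the library's implementation of getEntitySpan.
final class EntitySpanSketch {

    // Assumes token k covers characters [tokenStarts[k], tokenEnds[k]) and the
    // entity covers characters [entityStart, entityEnd).
    // Returns {firstToken, lastToken}, both inclusive, for the tokens that
    // overlap the entity, or null when no token overlaps it.
    static int[] tokenRange(int[] tokenStarts, int[] tokenEnds,
                            int entityStart, int entityEnd) {
        int first = -1;
        int last = -1;
        for (int k = 0; k < tokenStarts.length; k++) {
            boolean overlaps = tokenStarts[k] < entityEnd && tokenEnds[k] > entityStart;
            if (overlaps) {
                if (first < 0) {
                    first = k;
                }
                last = k;
            }
        }
        return first < 0 ? null : new int[] {first, last};
    }

    public static void main(String[] args) {
        // Character offsets of the tokens in "Columbia is in New York".
        int[] starts = {0, 9, 12, 15, 19};
        int[] ends = {8, 11, 14, 18, 23};
        // The entity "New York" occupies characters [15, 23).
        int[] range = tokenRange(starts, ends, 15, 23);
        System.out.println(range[0] + ".." + range[1]); // prints 3..4
    }
}
```

Whether the real Span returned by getEntitySpan uses inclusive or exclusive end indexes is not stated on this page, so check the Span class before relying on either convention.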
public java.lang.String[] getTokenizedString()
public Span[] getTokenizedSpans()
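Going by the method summary, getTokenizedString() and getTokenizedSpans() read as parallel arrays: entry i of one describes the same token as entry i of the other. The sketch below illustrates that relationship with stand-in types only. TokenSpanSketch, SimpleSpan, the whitespace tokenizer, and the start-inclusive, end-exclusive character offsets are all assumptions made for illustration; the real Span and Tokenizer classes may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; NOT the edu.columbia.cs.ref.model classes.
final class TokenSpanSketch {

    // Assumed span shape: start inclusive, end exclusive, in character offsets.
    static final class SimpleSpan {
        final int start;
        final int end;

        SimpleSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Splits on whitespace and records where each token sits in the text.
    static List<SimpleSpan> tokenSpans(String text) {
        List<SimpleSpan> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                spans.add(new SimpleSpan(start, i));
            }
        }
        return spans;
    }

    // Builds the token values from the spans, so tokens[k] matches spans.get(k).
    static String[] tokenStrings(String text, List<SimpleSpan> spans) {
        String[] tokens = new String[spans.size()];
        for (int k = 0; k < tokens.length; k++) {
            SimpleSpan s = spans.get(k);
            tokens[k] = text.substring(s.start, s.end);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Columbia is in New York";
        List<SimpleSpan> spans = tokenSpans(text);
        String[] tokens = tokenStrings(text, spans);
        for (int k = 0; k < tokens.length; k++) {
            SimpleSpan s = spans.get(k);
            System.out.println(k + ": [" + s.start + ", " + s.end + ") " + tokens[k]);
        }
    }
}
```

With the real class, the equivalent loop would presumably iterate over the results of getTokenizedString() and getTokenizedSpans() in lockstep, using the same index for both arrays.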