edu.columbia.cs.ref.tool.tokenizer
Interface Tokenizer
- All Known Implementing Classes:
- OpenNLPTokenizer
public interface Tokenizer
Tokenizer is an interface for objects that split a text into tokens.
The only method a Tokenizer must implement is tokenize, which receives a String
with the content of the text and returns an array of Span objects, where each
Span points to one word in the text.
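The contract above can be illustrated with a small implementation. This is a minimal sketch, not the library's OpenNLPTokenizer. This page does not show the package of Span, so the code declares a hypothetical stand-in. It assumes that a Span records a start offset (inclusive) and an end offset (exclusive), exposed through getStart() and getEnd(), as OpenNLP's Span does. WhitespaceTokenizer is also hypothetical. It treats every maximal run of non-whitespace characters as one token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Span: [start, end) character offsets into the text.
final class Span {
    private final int start;
    private final int end;

    Span(int start, int end) {
        this.start = start;
        this.end = end;
    }

    int getStart() { return start; }
    int getEnd() { return end; }
}

// Local copy of the contract described above.
interface Tokenizer {
    Span[] tokenize(String text);
}

// Hypothetical implementation: each maximal run of non-whitespace
// characters becomes one token.
class WhitespaceTokenizer implements Tokenizer {
    @Override
    public Span[] tokenize(String text) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip whitespace that precedes the next token.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token's characters.
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                spans.add(new Span(start, i));
            }
        }
        return spans.toArray(new Span[0]);
    }
}
```

The method returns offsets rather than copies of the words. The caller can therefore map every token back to its exact position in the original text.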
- Since:
- 2011-09-27
- Version:
- 0.1
- Author:
- Pablo Barrio, Goncalo Simoes
Method Summary
Span[] | tokenize(java.lang.String text) | Splits the content of a text into several tokens.
tokenize
Span[] tokenize(java.lang.String text)
- Splits the content of a text into several tokens
- Parameters:
- text - the content of the text
- Returns:
- the Spans pointing to each word in the text
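tokenize returns Spans, not strings. A caller therefore recovers each word by slicing the original text with the Span offsets. The sketch below shows this under the same assumptions as before: Span is a hypothetical stand-in with an inclusive start and an exclusive end. RegexTokenizer and Spans.coveredText are hypothetical names used only for illustration. The tokenizer follows a simple regex rule: each run of letters or digits is a word, and each other non-whitespace character is a token of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Span stand-in: [start, end) character offsets into the text.
final class Span {
    private final int start;
    private final int end;
    Span(int start, int end) { this.start = start; this.end = end; }
    int getStart() { return start; }
    int getEnd() { return end; }
}

interface Tokenizer {
    Span[] tokenize(String text);
}

// Hypothetical tokenizer: letter/digit runs are words; any other
// non-whitespace character (e.g. punctuation) is a separate token.
class RegexTokenizer implements Tokenizer {
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{Nd}]+|[^\\s\\p{L}\\p{Nd}]");

    @Override
    public Span[] tokenize(String text) {
        List<Span> spans = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            spans.add(new Span(m.start(), m.end()));
        }
        return spans.toArray(new Span[0]);
    }
}

final class Spans {
    // Recovers the token strings that the Spans point to in the text.
    static String[] coveredText(String text, Span[] spans) {
        String[] tokens = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            tokens[i] = text.substring(spans[i].getStart(), spans[i].getEnd());
        }
        return tokens;
    }
}
```

For example, tokenizing "Hello, world!" this way yields the tokens "Hello", ",", "world" and "!". Each token is read back from the unchanged input string.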