edu.columbia.cs.ref.tool.tokenizer
Interface Tokenizer

All Known Implementing Classes:
OpenNLPTokenizer

public interface Tokenizer

Tokenizer is an interface for objects that split a text into tokens.

The only method a Tokenizer needs to implement is tokenize, which receives a String with the content of the text and returns an array of Span objects, each pointing to one word (token) in the text.
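The contract above can be illustrated with a minimal whitespace-based implementation. This is only a sketch under stated assumptions: the real Span class is not described on this page, so a hypothetical stand-in with start/end character offsets (end exclusive) is used, and WhitespaceTokenizer is a hypothetical class, not the library's OpenNLPTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the library's Span type: a half-open
// character range [start, end) into the original text.
final class Span {
    private final int start;
    private final int end;

    Span(int start, int end) {
        this.start = start;
        this.end = end;
    }

    int getStart() { return start; }
    int getEnd() { return end; }
}

// Mirrors the interface documented on this page.
interface Tokenizer {
    Span[] tokenize(String text);
}

// Hypothetical implementation: every maximal run of non-whitespace
// characters becomes one token.
class WhitespaceTokenizer implements Tokenizer {
    @Override
    public Span[] tokenize(String text) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip any whitespace before the next token.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token itself.
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                spans.add(new Span(start, i));
            }
        }
        return spans.toArray(new Span[0]);
    }
}
```

For the input "  Hello   world! " this sketch returns two spans, (2, 7) and (10, 16). Because the spans carry offsets rather than copies of the words, callers can still see where each token sits in the original text; note that punctuation stays attached to the word in this simple scheme.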

Since:
2011-09-27
Version:
0.1
Author:
Pablo Barrio, Goncalo Simoes

Method Summary
 Span[] tokenize(java.lang.String text)
          Splits the content of a text into tokens.
 

Method Detail

tokenize

Span[] tokenize(java.lang.String text)
Splits the content of a text into tokens.

Parameters:
text - the content of the text
Returns:
the Spans pointing to each word in the text
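The following sketch shows how a caller might turn the returned Spans back into words. It assumes Java 8 or later and, as before, uses a hypothetical Span stand-in whose offsets are half-open [start, end); the regex token rule and the TokenizerDemo class are illustrative only. Since tokenize is the interface's only method, a lambda is enough to implement it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the library's Span type (half-open [start, end)).
final class Span {
    final int start;
    final int end;

    Span(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

// Mirrors the single-method interface documented on this page.
interface Tokenizer {
    Span[] tokenize(String text);
}

final class TokenizerDemo {
    // Hypothetical rule: a token is either a run of word characters
    // or a single punctuation character.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // Tokenizer has one abstract method, so a lambda can implement it.
    static final Tokenizer REGEX_TOKENIZER = text -> {
        List<Span> spans = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            spans.add(new Span(m.start(), m.end()));
        }
        return spans.toArray(new Span[0]);
    };

    // Recovers the word each returned Span points to.
    static List<String> words(Tokenizer tokenizer, String text) {
        List<String> words = new ArrayList<>();
        for (Span s : tokenizer.tokenize(text)) {
            words.add(text.substring(s.start, s.end));
        }
        return words;
    }
}
```

With this rule, words(REGEX_TOKENIZER, "Hello, world!") yields [Hello, ,, world, !]. The words helper depends only on the Tokenizer interface, so any implementation can be passed in its place.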