edu.columbia.cs.ref.tool.tokenizer.impl
Class OpenNLPTokenizer

java.lang.Object
  extended by edu.columbia.cs.ref.tool.tokenizer.impl.OpenNLPTokenizer
All Implemented Interfaces:
Tokenizer

public class OpenNLPTokenizer
extends java.lang.Object
implements Tokenizer

The OpenNLPTokenizer is an implementation of the Tokenizer interface that uses OpenNLP models to split the text into tokens.

Since:
2011-09-27
Version:
0.1
Author:
Pablo Barrio, Goncalo Simoes

Constructor Summary
OpenNLPTokenizer(java.lang.String path)
          Instantiates a new OpenNLP tokenizer.
 
Method Summary
 Span[] tokenize(java.lang.String text)
          Splits the content of a text into several tokens.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

OpenNLPTokenizer

public OpenNLPTokenizer(java.lang.String path)
Instantiates a new OpenNLP tokenizer from the model found at the given path.

Parameters:
path - the path to the OpenNLP tokenizer model

Method Detail

tokenize

public Span[] tokenize(java.lang.String text)
Description copied from interface: Tokenizer
Splits the content of a text into several tokens.

Specified by:
tokenize in interface Tokenizer
Parameters:
text - the content of the text
Returns:
the Spans pointing to each word in the text
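
The contract of tokenize, returning spans that point into the original text rather than copies of the words, can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: TokenSpan and SpanTokenizer are stand-ins for the library's Span and Tokenizer types (whose members are not documented on this page), the half-open [start, end) offset convention is an assumption, and the whitespace splitter replaces the OpenNLP model so the example runs without a model file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Span: an assumed half-open [start, end) character range. */
final class TokenSpan {
    final int start;
    final int end;

    TokenSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /** Recovers the token text by slicing the original string. */
    String coveredText(String text) {
        return text.substring(start, end);
    }
}

/** Hypothetical stand-in mirroring the shape of Tokenizer.tokenize(String). */
interface SpanTokenizer {
    TokenSpan[] tokenize(String text);
}

/** Splits on whitespace only; NOT the OpenNLP model, just a placeholder strategy. */
final class WhitespaceSpanTokenizer implements SpanTokenizer {
    @Override
    public TokenSpan[] tokenize(String text) {
        List<TokenSpan> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                if (start >= 0) {
                    spans.add(new TokenSpan(start, i)); // close the open token
                    start = -1;
                }
            } else if (start < 0) {
                start = i; // first character of a new token
            }
        }
        if (start >= 0) {
            spans.add(new TokenSpan(start, text.length())); // token ending at end of text
        }
        return spans.toArray(new TokenSpan[0]);
    }
}
```

Because each span stores offsets instead of strings, a caller can map every token back to its exact position in the input, which is what later processing steps typically need when they annotate the original text.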