edu.columbia.cs.cg.prdualrank.index.tokenizer
Class SpanBasedTokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
              extended by edu.columbia.cs.cg.prdualrank.index.tokenizer.SpanBasedTokenizer
All Implemented Interfaces:
java.io.Closeable

public class SpanBasedTokenizer
extends org.apache.lucene.analysis.Tokenizer

This class requires the Apache Lucene engine.
It is part of our implementation of "Searching Patterns for Relation Extraction over the Web: Rediscovering the Pattern-Relation Duality", Y. Fang and K. C.-C. Chang, in WSDM, pages 825-834, 2011. For further information, see the WSDM 2011 Conference Website.

Description

A tokenizer for the Apache Lucene search engine that is driven by the already computed tokens (spans) of the element to be indexed or searched for.

Since:
2011-10-07
Version:
0.1
Author:
Pablo Barrio, Goncalo Simoes
See Also:
Apache Lucene Engine, WSDM 2011 Conference Website

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
 
Constructor Summary
SpanBasedTokenizer(Span[] spans, java.lang.String[] content)
          Instantiates a new span-based tokenizer.
 
Method Summary
 boolean incrementToken()
           Advances the stream to the next token, following the contract inherited from TokenStream.
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, reset
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
end, reset
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SpanBasedTokenizer

public SpanBasedTokenizer(Span[] spans,
                          java.lang.String[] content)
Instantiates a new span-based tokenizer.

Parameters:
spans - the spans of the element to be tokenized
content - the split content of the element to be tokenized; must match the spans (one entry per span).
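
The requirement that the content match the spans can be pictured as two parallel arrays, where content[i] is the text covered by spans[i]. The following is a minimal sketch of that relationship, not code from this library: SimpleSpan is a hypothetical stand-in for the project's Span class (whose API is not shown on this page), and splitting on whitespace is only an assumed way of producing the spans.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the project's Span class:
// a half-open [start, end) character range in the original text.
final class SimpleSpan {
    final int start;
    final int end;

    SimpleSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

final class SpanContentSketch {

    // Records one span per whitespace-delimited token (assumed splitting strategy).
    static SimpleSpan[] whitespaceSpans(String text) {
        List<SimpleSpan> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip any run of whitespace before the next token.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token itself.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                spans.add(new SimpleSpan(start, i));
            }
        }
        return spans.toArray(new SimpleSpan[0]);
    }

    // Builds the parallel array so that content[k] is the text covered by spans[k].
    static String[] contentFor(String text, SimpleSpan[] spans) {
        String[] content = new String[spans.length];
        for (int k = 0; k < spans.length; k++) {
            content[k] = text.substring(spans[k].start, spans[k].end);
        }
        return content;
    }

    public static void main(String[] args) {
        String text = "Columbia is in New York";
        SimpleSpan[] spans = whitespaceSpans(text);
        String[] content = contentFor(text, spans);
        for (int k = 0; k < spans.length; k++) {
            System.out.println(spans[k].start + "-" + spans[k].end + " " + content[k]);
        }
    }
}
```

Arrays built this way always have the same length and the same order, which is presumably what the constructor expects of its two arguments.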

Method Detail

incrementToken

public boolean incrementToken()
                       throws java.io.IOException
Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
Throws:
java.io.IOException
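
In Lucene, a consumer typically calls reset(), then calls incrementToken() repeatedly until it returns false, and finally calls end() and close(). The sketch below models only the incrementToken() part of that contract without depending on Lucene, so it is not the actual SpanBasedTokenizer. It assumes one token is emitted per precomputed span; the class name, the int[][] offsets used in place of Span[], the plain fields used in place of Lucene's term and offset attributes, and the length check in the constructor are all illustrative assumptions.

```java
// Lucene-free model of the incrementToken() contract; NOT the actual SpanBasedTokenizer.
final class PrecomputedTokenStreamSketch {
    private final int[][] offsets;  // offsets[i] = {start, end} of token i (stand-in for Span[])
    private final String[] content; // content[i] = text of token i
    private int next = 0;           // index of the token the next call will emit

    // State of the current token, playing the role of Lucene's term and offset attributes.
    String term;
    int startOffset;
    int endOffset;

    PrecomputedTokenStreamSketch(int[][] offsets, String[] content) {
        // Assumed validation; whether the real constructor checks this is not documented.
        if (offsets.length != content.length) {
            throw new IllegalArgumentException("offsets and content must have the same length");
        }
        this.offsets = offsets;
        this.content = content;
    }

    // Advances to the next precomputed token.
    // Returns false once every token has been emitted.
    boolean incrementToken() {
        if (next >= content.length) {
            return false;
        }
        term = content[next];
        startOffset = offsets[next][0];
        endOffset = offsets[next][1];
        next++;
        return true;
    }

    public static void main(String[] args) {
        int[][] offsets = {{0, 3}, {4, 8}};
        String[] content = {"New", "York"};
        PrecomputedTokenStreamSketch stream = new PrecomputedTokenStreamSketch(offsets, content);
        // Same consumption loop a Lucene consumer would run between reset() and end().
        while (stream.incrementToken()) {
            System.out.println(stream.term + " [" + stream.startOffset + ", " + stream.endOffset + ")");
        }
    }
}
```

Because the tokens are supplied up front, this model does no text analysis of its own; each call simply exposes the next array entry.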