edu.columbia.cs.cg.prdualrank.index.analyzer
Class TokenizerBasedAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by edu.columbia.cs.cg.prdualrank.index.analyzer.TokenizerBasedAnalyzer
All Implemented Interfaces:
java.io.Closeable

public class TokenizerBasedAnalyzer
extends org.apache.lucene.analysis.Analyzer

This class requires the Apache Lucene Engine.
It is part of our implementation of "Searching Patterns for Relation Extraction over the Web: Rediscovering the Pattern-Relation Duality", Y. Fang and K. C.-C. Chang, in WSDM, pages 825-834, 2011. For further information, see the WSDM 2011 Conference Website.

Description

Analyzer for Apache Lucene based on a particular instance of a Tokenizer.

Since:
2011-10-07
Version:
0.1
Author:
Pablo Barrio, Goncalo Simoes
See Also:
Apache Lucene Engine, WSDM 2011 Conference Website

Constructor Summary
TokenizerBasedAnalyzer(Tokenizer tokenizer, java.util.Set<java.lang.String> stopWords)
          Instantiates a new tokenizer-based analyzer.
 
Method Summary
 org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
           
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, reusableTokenStream
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TokenizerBasedAnalyzer

public TokenizerBasedAnalyzer(Tokenizer tokenizer,
                              java.util.Set<java.lang.String> stopWords)
Instantiates a new tokenizer-based analyzer.

Parameters:
tokenizer - the tokenizer used to split the input stream into tokens
stopWords - the set of stop words to be used during indexing
Method Detail

tokenStream

public org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
                                                          java.io.Reader reader)
Specified by:
tokenStream in class org.apache.lucene.analysis.Analyzer
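
Example (illustrative sketch):
The page above does not describe how the tokenizer and the stop-word set interact, so the following is only a minimal sketch of the general idea, written with the Java standard library alone. It assumes that tokens found in the stop-word set are dropped from the stream; that behavior is an assumption, not something this documentation confirms. The class StopWordSketch and its methods analyze and whitespaceTokenizer are hypothetical names introduced for illustration. They are not part of this package or of Apache Lucene, and the sketch does not use the project's Tokenizer type.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical stand-in for illustration; not the documented class and not a Lucene API. */
public class StopWordSketch {

    /**
     * Reads all text from the reader, splits it with the given tokenizer,
     * and keeps only the tokens that are not in the stop-word set
     * (assumed behavior, see the note above).
     */
    public static List<String> analyze(Reader reader,
                                       Function<String, List<String>> tokenizer,
                                       Set<String> stopWords) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> kept = new ArrayList<>();
        for (String token : tokenizer.apply(text.toString())) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    /** Assumed tokenizer: splits on whitespace and lowercases each token. */
    public static List<String> whitespaceTokenizer(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "of", "is"));
        List<String> tokens = analyze(
                new StringReader("The capital of France is Paris"),
                StopWordSketch::whitespaceTokenizer,
                stopWords);
        System.out.println(tokens); // prints [capital, france, paris]
    }
}
```

In the sketch the tokenizer is passed in as a parameter, mirroring the way the constructor above receives a tokenizer instance together with the stop-word set rather than fixing a tokenization strategy inside the analyzer.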