edu.columbia.cs.cg.prdualrank.index.analyzer
Class TokenizerBasedAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
edu.columbia.cs.cg.prdualrank.index.analyzer.TokenizerBasedAnalyzer
- All Implemented Interfaces:
- java.io.Closeable
public class TokenizerBasedAnalyzer
- extends org.apache.lucene.analysis.Analyzer
This class requires the Apache Lucene engine.
This class is used for our implementation of:
"Searching Patterns for Relation Extraction over the Web: Rediscovering the Pattern-Relation Duality". Y. Fang and K. C.-C. Chang. In WSDM, pages 825-834, 2011.
For further information, see the WSDM 2011 Conference Website.
Description
Analyzer for Apache Lucene based on a particular instance of a Tokenizer.
- Since:
- 2011-10-07
- Version:
- 0.1
- Author:
- Pablo Barrio, Goncalo Simoes
- See Also:
- Apache Lucene Engine, WSDM 2011 Conference Website
Method Summary
org.apache.lucene.analysis.TokenStream | tokenStream(java.lang.String fieldName, java.io.Reader reader)
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, reusableTokenStream
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TokenizerBasedAnalyzer
public TokenizerBasedAnalyzer(Tokenizer tokenizer,
java.util.Set<java.lang.String> stopWords)
- Instantiates a new tokenizer-based analyzer.
- Parameters:
tokenizer - the tokenizer used to tokenize the stream
stopWords - the set of stop words to be used during indexing
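The following is a minimal, Lucene-free sketch of the idea behind this constructor: a pluggable tokenizer splits the input, and tokens found in the stop-word set are dropped. It is not the actual implementation of this class. SimpleTokenizer, StopWordAnalysisSketch, analyze and whitespaceLowercase are hypothetical names introduced only for illustration, and the whitespace/lowercase tokenization is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the project's Tokenizer type, which is not shown on this page.
interface SimpleTokenizer {
    List<String> tokenize(String text);
}

// Illustrative only: mirrors the (tokenizer, stopWords) constructor shape without using Lucene.
final class StopWordAnalysisSketch {
    private final SimpleTokenizer tokenizer;
    private final Set<String> stopWords;

    StopWordAnalysisSketch(SimpleTokenizer tokenizer, Set<String> stopWords) {
        this.tokenizer = tokenizer;
        this.stopWords = stopWords;
    }

    // Reads the whole input, tokenizes it, and keeps only tokens that are not stop words.
    List<String> analyze(Reader reader) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> kept = new ArrayList<>();
        for (String token : tokenizer.tokenize(text.toString())) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Assumed tokenizer for the example: split on whitespace and lowercase each token.
    static SimpleTokenizer whitespaceLowercase() {
        return text -> {
            List<String> tokens = new ArrayList<>();
            for (String t : text.trim().split("\\s+")) {
                if (!t.isEmpty()) {
                    tokens.add(t.toLowerCase(Locale.ROOT));
                }
            }
            return tokens;
        };
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "of"));
        StopWordAnalysisSketch sketch =
                new StopWordAnalysisSketch(whitespaceLowercase(), stopWords);
        // Prints [duality, patterns]
        System.out.println(sketch.analyze(new StringReader("The duality of patterns")));
    }
}
```

In the real class, the same two inputs are presumably wired into a Lucene TokenStream instead of being returned as a list; that wiring is not shown on this page.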
tokenStream
public org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
java.io.Reader reader)
- Specified by:
- tokenStream in class org.apache.lucene.analysis.Analyzer