edu.columbia.cs.cg.prdualrank.index
Class Index

java.lang.Object
  extended by edu.columbia.cs.cg.prdualrank.index.Index

public class Index
extends java.lang.Object

This class requires the Apache Lucene engine.
This class is part of our implementation of "Searching Patterns for Relation Extraction over the Web: Rediscovering the Pattern-Relation Duality", Y. Fang and K. C.-C. Chang, in WSDM, pages 825-834, 2011. For further information, see the WSDM 2011 Conference Website.

Description

Apache Lucene Indexer and Searcher. Used for optimal matching of the search patterns.
See Algorithm PatternSearch(To,S,E) in Figure 9, Section 5 of the paper cited above for details.

Since:
2011-10-07
Version:
0.1
Author:
Pablo Barrio, Goncalo Simoes
See Also:
Apache Lucene Engine, WSDM 2011 Conference Website

Field Summary
static java.lang.String CONTENT
           
 
Constructor Summary
Index(TokenBasedAnalyzer myAnalyzer, boolean lowercase, java.util.Set<java.lang.String> stopWords)
          Instantiates a new index.
 
Method Summary
 void addDocument(TokenizedDocument document)
          Adds the document to the index.
 void close()
          Closes and optimizes the index.
 java.util.List<TokenizedDocument> search(org.apache.lucene.search.Query query, int n)
          Searches the index for the given query.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CONTENT

public static final java.lang.String CONTENT
See Also:
Constant Field Values
Constructor Detail

Index

public Index(TokenBasedAnalyzer myAnalyzer,
             boolean lowercase,
             java.util.Set<java.lang.String> stopWords)
Instantiates a new index.

Parameters:
myAnalyzer - the analyzer to be used to index the content.
lowercase - whether the content is stored in lowercase. If true, case-sensitive matching is not possible.
stopWords - the set of stop words. Pass an empty set if no stop words are used.
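
As a rough illustration of how the lowercase and stopWords parameters interact, here is a minimal sketch in plain Java. It does not use Lucene or the project's TokenBasedAnalyzer; the class NormalizationSketch and its normalize method are hypothetical names introduced only for this example, and the assumption that stop words are compared after lowercasing is ours, not something this page states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in: illustrates the parameters only, not the real Index internals.
class NormalizationSketch {

    // Lowercases each token when requested, then drops tokens found in stopWords.
    // Assumption: the stop-word check happens after lowercasing.
    static List<String> normalize(List<String> tokens, boolean lowercase, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String term = lowercase ? token.toLowerCase(Locale.ROOT) : token;
            if (!stopWords.contains(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "CEO", "of", "Apple");
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "of"));

        // With lowercase=true, "The" becomes "the" and is removed as a stop word.
        System.out.println(normalize(tokens, true, stopWords));  // prints [ceo, apple]

        // With lowercase=false, "The" does not equal "the" and is kept.
        System.out.println(normalize(tokens, false, stopWords)); // prints [The, CEO, Apple]
    }
}
```

Under these assumptions, enabling lowercase means a query for "CEO" and one for "ceo" cannot be told apart, which is the trade-off the parameter description refers to.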
Method Detail

addDocument

public void addDocument(TokenizedDocument document)
Adds the document to the index.

Parameters:
document - the document to be indexed.

close

public void close()
Closes and optimizes the index.


search

public java.util.List<TokenizedDocument> search(org.apache.lucene.search.Query query,
                                                int n)
Searches the index for the given query.

Parameters:
query - the query to be issued.
n - the number of documents to be retrieved.
Returns:
the list of documents that match the query.
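
To make the add, close, and search life cycle and the role of n concrete, the following is a toy in-memory stand-in written in plain Java. It is not the Lucene-backed implementation: ToyIndexSketch, its single-term search, and the choice to reject additions after close() are all assumptions made for illustration. Documents are modeled as token lists instead of TokenizedDocument, and the query is a single term instead of an org.apache.lucene.search.Query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the documented Index class; no Lucene involved.
class ToyIndexSketch {
    private final List<List<String>> documents = new ArrayList<>();
    private boolean closed = false;

    // Stores a copy of the document's tokens.
    // Assumption: adding after close() is an error in this sketch.
    void addDocument(List<String> tokens) {
        if (closed) {
            throw new IllegalStateException("index is closed");
        }
        documents.add(new ArrayList<>(tokens));
    }

    // Marks the index as finished; the real class also optimizes the Lucene index here.
    void close() {
        closed = true;
    }

    // Returns at most n documents containing the term, in insertion order.
    List<List<String>> search(String term, int n) {
        List<List<String>> hits = new ArrayList<>();
        for (List<String> doc : documents) {
            if (hits.size() >= n) {
                break;
            }
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndexSketch index = new ToyIndexSketch();
        index.addDocument(Arrays.asList("apple", "ceo", "tim", "cook"));
        index.addDocument(Arrays.asList("microsoft", "ceo"));
        index.addDocument(Arrays.asList("apple", "founder"));
        index.addDocument(Arrays.asList("apple", "store"));
        index.close();

        // Three documents contain "apple", but n = 2 caps the result list.
        System.out.println(index.search("apple", 2));
    }
}
```

The sketch returns hits in insertion order; a Lucene-backed search would normally rank them by relevance score, so the order of the returned list should not be inferred from this example.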