edu.columbia.cs.cg.prdualrank
Class PRDualRank

java.lang.Object
  extended by edu.columbia.cs.cg.prdualrank.PRDualRank
All Implemented Interfaces:
Engine

public class PRDualRank
extends java.lang.Object
implements Engine

This class is used for our implementation of: "Searching Patterns for Relation Extraction over the Web: Rediscovering the Pattern-Relation Duality". Y. Fang and K. C.-C. Chang. In WSDM, pages 825-834, 2011. For further information, see the WSDM 2011 Conference Website.

Description

Main algorithm of PRDualRank, which generates search and extraction patterns. The method train generates a PRDualRank model instance.
This class implements the behavior described in Algorithm PatternSearch(To,S,E), shown in Figure 9 of Section 5 of the paper cited above.

Since:
2011-10-07
Version:
0.1
Author:
Pablo Barrio, Goncalo Simoes
See Also:
WSDM 2011 Conference Website

Constructor Summary
PRDualRank(PatternExtractor<Document> spe, PatternExtractor<Relationship> epe, SearchEngine se, QueryGenerator<java.lang.String> qg, int k_seed, int minsupport, int k_nolabel, RankFunction<Pattern<Document,TokenizedDocument>> searchpatternRankFunction, RankFunction<Pattern<Relationship,TokenizedDocument>> extractpatternRankFunction, RankFunction<Relationship> tupleRankFunction, Tokenizer tokenizer, RelationshipType rType, TokenBasedAnalyzer myAnalyzer, QueryGenerator<org.apache.lucene.search.Query> forIndexQueryGenerator, QuestCalculator<Document,TokenizedDocument> searchPatternQuestCalculator, QuestCalculator<Relationship,TokenizedDocument> extractionPatternQuestCalculator)
          Instantiates a new PRDualRank.
 
Method Summary
 Model train(java.util.List<OperableStructure> list)
          Given a list of labeled operable structures that corresponds to the training data, produces a model for relationship extraction.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PRDualRank

public PRDualRank(PatternExtractor<Document> spe,
                  PatternExtractor<Relationship> epe,
                  SearchEngine se,
                  QueryGenerator<java.lang.String> qg,
                  int k_seed,
                  int minsupport,
                  int k_nolabel,
                  RankFunction<Pattern<Document,TokenizedDocument>> searchpatternRankFunction,
                  RankFunction<Pattern<Relationship,TokenizedDocument>> extractpatternRankFunction,
                  RankFunction<Relationship> tupleRankFunction,
                  Tokenizer tokenizer,
                  RelationshipType rType,
                  TokenBasedAnalyzer myAnalyzer,
                  QueryGenerator<org.apache.lucene.search.Query> forIndexQueryGenerator,
                  QuestCalculator<Document,TokenizedDocument> searchPatternQuestCalculator,
                  QuestCalculator<Relationship,TokenizedDocument> extractionPatternQuestCalculator)
Instantiates a new PRDualRank.

Parameters:
spe - the Search Pattern Extractor instance.
epe - the Extraction Pattern Extractor instance.
se - the Search Engine.
qg - the Query Generator for the Search Engine se.
k_seed - the number of documents to be retrieved per query.
minsupport - the minimum required support for patterns to be considered in the graph generation.
k_nolabel - the number of non-seed tuples used in graph generation. The recommended value is 10 times k_seed.
searchpatternRankFunction - the ranking function for the search patterns.
extractpatternRankFunction - the ranking function for the extraction patterns.
tupleRankFunction - the ranking function for tuples.
tokenizer - the tokenizer used to tokenize the retrieved documents.
rType - the relationship type to be processed.
myAnalyzer - the Lucene analyzer for documents to be indexed and queried.
forIndexQueryGenerator - the query generator for the index. Has to understand the search patterns generated by the Search Pattern Extractor instance.
searchPatternQuestCalculator - the search pattern quest calculator (as defined in PRDualRank).
extractionPatternQuestCalculator - the extraction pattern quest calculator (as defined in PRDualRank).
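
The numeric parameters k_seed, minsupport and k_nolabel interact, so a small sketch may help when choosing values. The helper below is hypothetical and not part of this package: the class name PRDualRankParameterSketch and both of its methods are assumptions made for illustration. It assumes that "support" is an occurrence count per pattern and that the minsupport threshold is inclusive; the library's own graph generation may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the PRDualRank API.
final class PRDualRankParameterSketch {

    // Applies the documented recommendation: k_nolabel = 10 * k_seed.
    static int recommendedNoLabel(int kSeed) {
        if (kSeed <= 0) {
            throw new IllegalArgumentException("k_seed must be positive");
        }
        return Math.multiplyExact(10, kSeed); // fails loudly on int overflow
    }

    // Keeps only the patterns whose support reaches minSupport.
    // Assumption: support is an occurrence count and the threshold is inclusive.
    static Map<String, Integer> filterBySupport(Map<String, Integer> supportByPattern,
                                                int minSupport) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : supportByPattern.entrySet()) {
            if (entry.getValue() >= minSupport) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int kSeed = 5;
        System.out.println(recommendedNoLabel(kSeed)); // prints 50

        Map<String, Integer> support = new LinkedHashMap<>();
        support.put("was born in", 4);
        support.put("visited", 1);
        // With minSupport = 2, only "was born in" survives the filter.
        System.out.println(filterBySupport(support, 2).keySet());
    }
}
```

The resulting k_seed, minsupport and k_nolabel values would then be passed to the constructor in the positions shown in the signature above.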

Method Detail

train

public Model train(java.util.List<OperableStructure> list)
Description copied from interface: Engine
Given a list of labeled operable structures that corresponds to the training data, produces a model for relationship extraction.

Specified by:
train in interface Engine
Parameters:
list - the training data
Returns:
the relationship extraction model produced by this engine with the provided training data.
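
Because PRDualRank is used through the Engine interface, calling code can be written against Engine alone. The sketch below illustrates that calling pattern under stated assumptions: Engine, Model and OperableStructure are minimal stand-ins that mirror only the train signature documented above, so that the example compiles on its own, and TrainUsageSketch with its trainModel method is a hypothetical wrapper. Rejecting empty training data is a choice made in this sketch, not documented behavior of the library.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins that mirror only the documented train signature.
// The real types live in the library and have more members.
interface OperableStructure {}
interface Model {}
interface Engine {
    Model train(List<OperableStructure> list);
}

// Hypothetical wrapper for illustration only.
final class TrainUsageSketch {

    // Trains any Engine implementation (a PRDualRank instance would be one)
    // on labeled training data and returns the resulting model.
    static Model trainModel(Engine engine, List<OperableStructure> trainingData) {
        if (trainingData == null || trainingData.isEmpty()) {
            // Assumption of this sketch: empty training data is treated as an error.
            throw new IllegalArgumentException("training data must not be empty");
        }
        return engine.train(trainingData);
    }

    public static void main(String[] args) {
        List<OperableStructure> trainingData = new ArrayList<>();
        trainingData.add(new OperableStructure() {});

        // Stub engine standing in for a fully constructed PRDualRank instance.
        Engine engine = list -> new Model() {};

        Model model = trainModel(engine, trainingData);
        System.out.println(model != null); // prints true
    }
}
```

In real code, the stub engine would be replaced by a PRDualRank instance built with the constructor described above, and the training list would hold the labeled operable structures for the relationship type being learned.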