edu.columbia.cs.cg.prdualrank
Class PRDualRank
java.lang.Object
edu.columbia.cs.cg.prdualrank.PRDualRank
- All Implemented Interfaces:
- Engine
public class PRDualRank
- extends java.lang.Object
- implements Engine
This class is used for our implementation of:
"Searching Patterns for Relation Extraction over the Web: Rediscovering the Pattern-Relation Duality". Y. Fang and K. C.-C. Chang. In WSDM, pages 825-834, 2011.
For further information, see the WSDM 2011 Conference Website.
Description
Implements the main PRDualRank algorithm, which generates search and extraction patterns. The train method produces a PRDualRank model instance.
This class implements the behavior of Algorithm PatternSearch(To,S,E), shown in Figure 9 in Section 5 of the paper cited above.
- Since:
- 2011-10-07
- Version:
- 0.1
- Author:
- Pablo Barrio, Goncalo Simoes
- See Also:
- WSDM 2011 Conference Website
Constructor Summary
PRDualRank(PatternExtractor<Document> spe,
PatternExtractor<Relationship> epe,
SearchEngine se,
QueryGenerator<java.lang.String> qg,
int k_seed,
int minsupport,
int k_nolabel,
RankFunction<Pattern<Document,TokenizedDocument>> searchpatternRankFunction,
RankFunction<Pattern<Relationship,TokenizedDocument>> extractpatternRankFunction,
RankFunction<Relationship> tupleRankFunction,
Tokenizer tokenizer,
RelationshipType rType,
TokenBasedAnalyzer myAnalyzer,
QueryGenerator<org.apache.lucene.search.Query> forIndexQueryGenerator,
QuestCalculator<Document,TokenizedDocument> searchPatternQuestCalculator,
QuestCalculator<Relationship,TokenizedDocument> extractionPatternQuestCalculator)
Instantiates a new PRDualRank engine.
Method Summary
Model train(java.util.List<OperableStructure> list)
Given a list of labeled operable structures that corresponds to the training data, produces a model for relationship extraction.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
PRDualRank
public PRDualRank(PatternExtractor<Document> spe,
PatternExtractor<Relationship> epe,
SearchEngine se,
QueryGenerator<java.lang.String> qg,
int k_seed,
int minsupport,
int k_nolabel,
RankFunction<Pattern<Document,TokenizedDocument>> searchpatternRankFunction,
RankFunction<Pattern<Relationship,TokenizedDocument>> extractpatternRankFunction,
RankFunction<Relationship> tupleRankFunction,
Tokenizer tokenizer,
RelationshipType rType,
TokenBasedAnalyzer myAnalyzer,
QueryGenerator<org.apache.lucene.search.Query> forIndexQueryGenerator,
QuestCalculator<Document,TokenizedDocument> searchPatternQuestCalculator,
QuestCalculator<Relationship,TokenizedDocument> extractionPatternQuestCalculator)
- Instantiates a new PRDualRank engine.
- Parameters:
spe - the Search Pattern Extractor instance.
epe - the Extraction Pattern Extractor instance.
se - the Search Engine.
qg - the Query Generator for the Search Engine se.
k_seed - the number of documents to be retrieved per query.
minsupport - the minimum required support for patterns to be considered in the graph generation.
k_nolabel - the number of non-seed tuples used in graph generation. Recommended 10 times k_seed.
searchpatternRankFunction - the ranking function for the search patterns.
extractpatternRankFunction - the ranking function for the extraction patterns.
tupleRankFunction - the ranking function for tuples.
tokenizer - the tokenizer used to tokenize the retrieved documents.
rType - the relationship type to be processed.
myAnalyzer - the Lucene analyzer for documents to be indexed and queried.
forIndexQueryGenerator - the query generator for the index. Has to understand the search patterns generated by the Search Pattern Extractor instance.
searchPatternQuestCalculator - the search pattern quest calculator (as defined in PRDualRank).
extractionPatternQuestCalculator - the extraction pattern quest calculator (as defined in PRDualRank).
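The recommendation that k_nolabel be 10 times k_seed can be captured in a small helper, sketched below. PRDualRankSettings is a hypothetical class that is not part of this package, and the check that all three values are positive is an assumption; the documentation above does not state the valid ranges.

```java
/**
 * Hypothetical holder for the three integer arguments of the PRDualRank
 * constructor. Not part of the library; for illustration only.
 */
public final class PRDualRankSettings {
    public final int kSeed;      // documents retrieved per query
    public final int minSupport; // minimum support for a pattern to enter the graph
    public final int kNoLabel;   // non-seed tuples used in graph generation

    public PRDualRankSettings(int kSeed, int minSupport, int kNoLabel) {
        // Assumption: all three values must be positive.
        if (kSeed <= 0 || minSupport <= 0 || kNoLabel <= 0) {
            throw new IllegalArgumentException("all settings must be positive");
        }
        this.kSeed = kSeed;
        this.minSupport = minSupport;
        this.kNoLabel = kNoLabel;
    }

    /** Applies the documented recommendation: k_nolabel = 10 * k_seed. */
    public static PRDualRankSettings withRecommendedNoLabel(int kSeed, int minSupport) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return new PRDualRankSettings(kSeed, minSupport, Math.multiplyExact(10, kSeed));
    }
}
```

The fields of such an object could then be passed as the k_seed, minsupport, and k_nolabel arguments of the constructor.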
train
public Model train(java.util.List<OperableStructure> list)
- Description copied from interface: Engine
- Given a list of labeled operable structures that corresponds to the training data, produces a model for relationship extraction.
- Specified by:
- train in interface Engine
- Parameters:
list - the training data
- Returns:
- the relationship extraction model produced by this engine with the provided training data.
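The calling pattern of train can be illustrated with a minimal, self-contained sketch. The nested Engine, Model, and OperableStructure types below are simplified stand-ins written for this example, not the library's real interfaces; only the shape of the train signature is taken from the documentation above. CountingEngine, label(), trainingSize(), and trainOn are hypothetical names, and the toy engine does not perform any PRDualRank computation.

```java
import java.util.ArrayList;
import java.util.List;

public class TrainSketch {
    /** Simplified stand-in for the library's OperableStructure (hypothetical). */
    interface OperableStructure { String label(); }

    /** Simplified stand-in for the library's Model (hypothetical). */
    interface Model { int trainingSize(); }

    /** Mirrors the documented signature: Model train(List<OperableStructure>). */
    interface Engine { Model train(List<OperableStructure> list); }

    /** Toy engine that only records how many labeled examples it received. */
    static class CountingEngine implements Engine {
        @Override
        public Model train(List<OperableStructure> list) {
            final int n = list.size();
            return () -> n;
        }
    }

    /** Builds labeled training data from the given labels and trains the engine. */
    static Model trainOn(Engine engine, String... labels) {
        List<OperableStructure> data = new ArrayList<>();
        for (String label : labels) {
            data.add(() -> label);
        }
        return engine.train(data);
    }

    public static void main(String[] args) {
        Model model = trainOn(new CountingEngine(), "ORG-LOC", "ORG-LOC", "NONE");
        System.out.println(model.trainingSize()); // prints 3
    }
}
```

With the real library, a PRDualRank instance built through the constructor described above would take the place of CountingEngine.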