edu.columbia.cs.cg.prdualrank.pattern.extractor.impl
Class ExtractionPatternExtractor<T extends Relationship>

java.lang.Object
  extended by edu.columbia.cs.cg.prdualrank.pattern.extractor.impl.ExtractionPatternExtractor<T>
All Implemented Interfaces:
PatternExtractor<Relationship>

public class ExtractionPatternExtractor<T extends Relationship>
extends java.lang.Object
implements PatternExtractor<Relationship>

This class is used for our implementation of: "Searching Patterns for Relation Extraction over the Web: Rediscovering the Pattern-Relation Duality". Y. Fang and K. C.-C. Chang. In WSDM, pages 825-834, 2011. For further information, see the WSDM 2011 Conference Website.

Description

Class used to generate extraction patterns as described in Algorithm PatternSearch(To,S,E) in Figure 9 of Section 5 and in Definition 2 of Section 3.1.

Since:
2011-10-07
Version:
0.1
Author:
Pablo Barrio, Goncalo Simoes
See Also:
WSDM 2011 Conference Website

Constructor Summary
ExtractionPatternExtractor(int span, int individualPatternSize, RelationshipType rType)
          Instantiates a new extraction pattern extractor.
 
Method Summary
 java.util.Map<Pattern<Relationship,TokenizedDocument>,java.lang.Integer> extractPatterns(TokenizedDocument document, Relationship relationship, java.util.List<Relationship> matchingRelationships)
          Extract specific patterns from the document in the parameter list for the specified relationship and other matching relationships in the same document.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ExtractionPatternExtractor

public ExtractionPatternExtractor(int span,
                                  int individualPatternSize,
                                  RelationshipType rType)
Instantiates a new extraction pattern extractor.

Parameters:
span - Maximum distance (in words) between the attributes of a tuple.
individualPatternSize - Maximum size of an extraction pattern, per attribute.
rType - The relationship type to be extracted.
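
The two size parameters can be pictured with a small sketch. This is not the library's code: the class SpanWindowSketch, its methods, and the [start, end) token ranges are hypothetical, and it assumes that span bounds the number of words between two attribute mentions and that individualPatternSize bounds the number of context words kept around one attribute.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names belong to the documented library.
final class SpanWindowSketch {

    // Number of words strictly between two mentions, each given as a [start, end) token range.
    public static int wordsBetween(int startA, int endA, int startB, int endB) {
        if (startA <= startB) {
            return Math.max(0, startB - endA);
        }
        return Math.max(0, startA - endB);
    }

    // Assumed reading of 'span': the two attributes may be at most 'span' words apart.
    public static boolean withinSpan(int startA, int endA, int startB, int endB, int span) {
        return wordsBetween(startA, endA, startB, endB) <= span;
    }

    // Assumed reading of 'individualPatternSize': keep at most 'size' tokens
    // immediately before the attribute that starts at token index 'start'.
    public static List<String> contextBefore(List<String> tokens, int start, int size) {
        int from = Math.max(0, start - size);
        return tokens.subList(from, start);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Larry", "Page", "co-founded", "Google", "in", "1998");
        // "Larry Page" occupies [0, 2) and "Google" occupies [3, 4).
        System.out.println(wordsBetween(0, 2, 3, 4));
        System.out.println(withinSpan(0, 2, 3, 4, 5));
        System.out.println(contextBefore(tokens, 3, 2));
    }
}
```

Under these assumptions, a larger span admits attribute pairs that lie further apart, while a larger individualPatternSize allows longer context windows per attribute.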
Method Detail

extractPatterns

public java.util.Map<Pattern<Relationship,TokenizedDocument>,java.lang.Integer> extractPatterns(TokenizedDocument document,
                                                                                                Relationship relationship,
                                                                                                java.util.List<Relationship> matchingRelationships)
Description copied from interface: PatternExtractor
Extract specific patterns from the document in the parameter list for the specified relationship and other matching relationships in the same document. The definition of matching used in this project is based on the EntityMatchers contained in the specified relationship.

Specified by:
extractPatterns in interface PatternExtractor<Relationship>
Parameters:
document - the document to be processed.
relationship - the relationship that the extractor is trying to generate patterns for.
matchingRelationships - the relationships contained in 'document' that match the specified relationship.
Returns:
a map associating each extracted pattern with an integer value.
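
Results of this shape are often combined across several documents. The sketch below shows one way to do that, assuming the Integer values are occurrence counts (the documentation above does not state this). PatternCountMerger and mergeCounts are hypothetical names, and the key type is left generic so the example runs without the library; with the library on the classpath the key would be Pattern<Relationship,TokenizedDocument>.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of the documented package.
final class PatternCountMerger {

    // Sums per-document maps into one map, adding the values of keys that repeat.
    // Assumes the Integer values are counts, which the source does not confirm.
    public static <P> Map<P, Integer> mergeCounts(Iterable<Map<P, Integer>> perDocument) {
        Map<P, Integer> total = new HashMap<>();
        for (Map<P, Integer> counts : perDocument) {
            for (Map.Entry<P, Integer> entry : counts.entrySet()) {
                total.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Strings stand in for pattern objects in this self-contained example.
        Map<String, Integer> first = new HashMap<>();
        first.put("X co-founded Y", 2);
        first.put("X , founder of Y", 1);

        Map<String, Integer> second = new HashMap<>();
        second.put("X co-founded Y", 3);

        Map<String, Integer> total = mergeCounts(Arrays.asList(first, second));
        System.out.println(total.get("X co-founded Y"));
        System.out.println(total.get("X , founder of Y"));
    }
}
```

Because the merge relies on the keys' equals and hashCode, this only works with real pattern objects if Pattern implements both consistently, which should be checked against the library before relying on it.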