edu.columbia.cs.cg.prdualrank.pattern.extractor.impl
Class ExtractionPatternExtractor<T extends Relationship>
java.lang.Object
edu.columbia.cs.cg.prdualrank.pattern.extractor.impl.ExtractionPatternExtractor<T>
- All Implemented Interfaces:
- PatternExtractor<Relationship>
public class ExtractionPatternExtractor<T extends Relationship>
- extends java.lang.Object
- implements PatternExtractor<Relationship>
This class is used for our implementation of:
"Searching Patterns for Relation Extraction over the Web: Rediscovering the Pattern-Relation Duality". Y. Fang and K. C.-C. Chang. In WSDM, pages 825-834, 2011.
For further information, see the WSDM 2011 Conference Website.
Description
Generates extraction patterns as described in Algorithm PatternSearch(To,S,E) in Figure 9 of Section 5 and in Definition 2 of Section 3.1.
- Since:
- 2011-10-07
- Version:
- 0.1
- Author:
- Pablo Barrio, Goncalo Simoes
- See Also:
- WSDM 2011 Conference Website
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ExtractionPatternExtractor
public ExtractionPatternExtractor(int span,
int individualPatternSize,
RelationshipType rType)
- Instantiates a new extraction pattern extractor.
- Parameters:
span
- Maximum distance (in words) between the attributes of a tuple.
individualPatternSize
- Maximum size of an extraction pattern, per attribute.
rType
- The relationship type to be extracted.
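The span parameter can be pictured with a small, self-contained sketch. It does not use the library's classes: SpanSketch, wordsBetween and withinSpan are hypothetical names, and measuring the distance as the number of tokens strictly between two attribute mentions is an assumption about how "distance in words" is counted.

```java
import java.util.Arrays;
import java.util.List;

public class SpanSketch {
    // Hypothetical helper: number of words strictly between two attribute mentions.
    // firstEnd is the index just past the first mention, secondStart the index
    // where the second mention begins.
    static int wordsBetween(int firstEnd, int secondStart) {
        return Math.max(0, secondStart - firstEnd);
    }

    // Assumption: a pair of mentions is usable when the gap does not exceed span.
    static boolean withinSpan(int firstEnd, int secondStart, int span) {
        return wordsBetween(firstEnd, secondStart) <= span;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Paris", "is", "the", "capital", "of", "France");
        // "Paris" occupies [0,1) and "France" occupies [5,6),
        // so the words in between are tokens 1 through 4.
        System.out.println(tokens.subList(1, 5));
        System.out.println(wordsBetween(1, 5));
        System.out.println(withinSpan(1, 5, 5));
        System.out.println(withinSpan(1, 5, 3));
    }
}
```

Under this reading, a larger span admits attribute pairs that are further apart in the sentence, at the cost of longer and possibly noisier patterns.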
extractPatterns
public java.util.Map<Pattern<Relationship,TokenizedDocument>,java.lang.Integer> extractPatterns(TokenizedDocument document,
Relationship relationship,
java.util.List<Relationship> matchingRelationships)
- Description copied from interface:
PatternExtractor
- Extracts patterns from the given document for the specified relationship and for the other matching relationships in the same document. The definition of matching used in this project is based on the EntityMatchers contained in the specified relationship.
- Specified by:
extractPatterns
in interface PatternExtractor<Relationship>
- Parameters:
document
- the document to be processed.
relationship
- the relationship that the extractor is trying to generate patterns for.
matchingRelationships
- the relationships contained in 'document' that match the specified relationship.
- Returns:
- a map from each extracted pattern to an Integer value
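The shape of the returned map can be illustrated with a minimal sketch built on stand-in types. The documentation above does not state what the Integer value represents; the sketch assumes it is an occurrence count, which is an assumption. PatternCountSketch and countPatterns are hypothetical names, and a plain String (the words between two attribute mentions) stands in for Pattern<Relationship,TokenizedDocument>.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PatternCountSketch {
    // Stand-in for the real pattern type: the words between two attribute
    // mentions, joined by single spaces.
    // Each int[] holds {index just past the first mention, start of the second mention}.
    static Map<String, Integer> countPatterns(List<String> tokens, List<int[]> mentionPairs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int[] pair : mentionPairs) {
            String pattern = String.join(" ", tokens.subList(pair[0], pair[1]));
            // Assumed meaning of the Integer value: how often the pattern was seen.
            counts.merge(pattern, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "Paris", "is", "the", "capital", "of", "France", ".",
                "Rome", "is", "the", "capital", "of", "Italy", ".");
        // One pair for the specified relationship, one for a matching relationship.
        List<int[]> pairs = Arrays.asList(new int[] {1, 5}, new int[] {8, 12});
        System.out.println(countPatterns(tokens, pairs));
    }
}
```

In this sketch both relationships yield the same context, so the map holds a single key whose value is 2; the real extractor may build and weight patterns differently.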