edu.columbia.cs.api
Class OpenIEUnsupervisedRelationshipExtractor<D extends Document>

java.lang.Object
  extended by edu.columbia.cs.api.OpenIEUnsupervisedRelationshipExtractor<D>
All Implemented Interfaces:
RelationshipExtractor<Document>

public class OpenIEUnsupervisedRelationshipExtractor<D extends Document>
extends java.lang.Object
implements RelationshipExtractor<Document>

Relationship extractor based on the unsupervised learning approach of KnowItAll. The extractor can additionally use a classifier to assign each candidate produced by the unsupervised step a confidence that it is a relationship (the ReVerb approach).

This class uses the original ReVerb software, which can be found at http://reverb.cs.washington.edu/

To learn more about KnowItAll or ReVerb, please refer to: Identifying Relations for Open Information Extraction

Since:
2011-09-27
Version:
0.1
Author:
Pablo Barrio, Goncalo Simoes

Constructor Summary
OpenIEUnsupervisedRelationshipExtractor(SentenceSplitter splitter, Tokenizer tokenizer, POSTagger pos, Chunker chunker)
          Constructor of the Open IE relationship extractor.
OpenIEUnsupervisedRelationshipExtractor(SentenceSplitter splitter, Tokenizer tokenizer, POSTagger pos, Chunker chunker, weka.classifiers.Classifier cla)
          Constructor of the Open IE relationship extractor.
OpenIEUnsupervisedRelationshipExtractor(SentenceSplitter splitter, Tokenizer tokenizer, POSTagger pos, Chunker chunker, weka.classifiers.Classifier cla, double threshold)
          Constructor of the Open IE relationship extractor.
OpenIEUnsupervisedRelationshipExtractor(SentenceSplitter splitter, Tokenizer tokenizer, POSTagger pos, Chunker chunker, double threshold)
          Constructor of the Open IE relationship extractor.
 
Method Summary
 java.util.List<Relationship> extractTuples(Document doc)
          Implementation of the extractTuples method.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

OpenIEUnsupervisedRelationshipExtractor

public OpenIEUnsupervisedRelationshipExtractor(SentenceSplitter splitter,
                                               Tokenizer tokenizer,
                                               POSTagger pos,
                                               Chunker chunker)
Constructor of the Open IE relationship extractor. This constructor receives neither a classifier nor a threshold: the default ReVerb classifier is used and the threshold is set to 0.5.

Parameters:
splitter - the sentence splitter
tokenizer - the tokenizer
pos - the POS tagger
chunker - the NLP chunker

OpenIEUnsupervisedRelationshipExtractor

public OpenIEUnsupervisedRelationshipExtractor(SentenceSplitter splitter,
                                               Tokenizer tokenizer,
                                               POSTagger pos,
                                               Chunker chunker,
                                               weka.classifiers.Classifier cla)
Constructor of the Open IE relationship extractor. This constructor does not receive a threshold, so the threshold is set to 0.5.

Parameters:
splitter - the sentence splitter
tokenizer - the tokenizer
pos - the POS tagger
chunker - the NLP chunker
cla - the classifier used to compute the confidence that a given candidate is a relationship

OpenIEUnsupervisedRelationshipExtractor

public OpenIEUnsupervisedRelationshipExtractor(SentenceSplitter splitter,
                                               Tokenizer tokenizer,
                                               POSTagger pos,
                                               Chunker chunker,
                                               double threshold)
Constructor of the Open IE relationship extractor. This constructor does not receive a classifier, so the default ReVerb classifier is used.

Parameters:
splitter - the sentence splitter
tokenizer - the tokenizer
pos - the POS tagger
chunker - the NLP chunker
threshold - the confidence threshold for accepting a candidate as a relationship: a candidate whose confidence is higher than the threshold is considered a relationship; otherwise it is discarded

OpenIEUnsupervisedRelationshipExtractor

public OpenIEUnsupervisedRelationshipExtractor(SentenceSplitter splitter,
                                               Tokenizer tokenizer,
                                               POSTagger pos,
                                               Chunker chunker,
                                               weka.classifiers.Classifier cla,
                                               double threshold)
Constructor of the Open IE relationship extractor. This constructor receives both a classifier and a threshold.

Parameters:
splitter - the sentence splitter
tokenizer - the tokenizer
pos - the POS tagger
chunker - the NLP chunker
cla - the classifier used to compute the confidence that a given candidate is a relationship
threshold - the confidence threshold for accepting a candidate as a relationship: a candidate whose confidence is higher than the threshold is considered a relationship; otherwise it is discarded
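
The four constructors differ only in which of the classifier and the threshold the caller supplies. The following is a minimal sketch of how such overloads could share their defaults and apply the threshold. It is not the library's source: the class ThresholdDefaultsSketch, its String stand-in for weka.classifiers.Classifier, and the accepts method are hypothetical names introduced for illustration. Only the 0.5 default and the "higher than the threshold" rule come from the documentation above.

```java
// Hypothetical sketch; NOT the actual OpenIEUnsupervisedRelationshipExtractor source.
final class ThresholdDefaultsSketch {

    /** Default threshold stated in the constructor documentation. */
    static final double DEFAULT_THRESHOLD = 0.5;

    /** Assumed label standing in for "the default classifier of ReVerb". */
    static final String DEFAULT_CLASSIFIER = "ReVerb default classifier";

    private final String classifier; // stand-in for weka.classifiers.Classifier
    private final double threshold;

    // Neither classifier nor threshold supplied: both defaults apply.
    ThresholdDefaultsSketch() {
        this(DEFAULT_CLASSIFIER, DEFAULT_THRESHOLD);
    }

    // Classifier supplied: the threshold falls back to 0.5.
    ThresholdDefaultsSketch(String classifier) {
        this(classifier, DEFAULT_THRESHOLD);
    }

    // Threshold supplied: the classifier falls back to the default.
    ThresholdDefaultsSketch(double threshold) {
        this(DEFAULT_CLASSIFIER, threshold);
    }

    // Both supplied: nothing is defaulted.
    ThresholdDefaultsSketch(String classifier, double threshold) {
        this.classifier = classifier;
        this.threshold = threshold;
    }

    String classifier() {
        return classifier;
    }

    double threshold() {
        return threshold;
    }

    /** A candidate is kept only if its confidence is strictly higher than the threshold. */
    boolean accepts(double confidence) {
        return confidence > threshold;
    }

    public static void main(String[] args) {
        ThresholdDefaultsSketch defaults = new ThresholdDefaultsSketch();
        System.out.println("threshold = " + defaults.threshold());
        System.out.println("accepts(0.5) = " + defaults.accepts(0.5));
        System.out.println("accepts(0.7) = " + defaults.accepts(0.7));
    }
}
```

Under the strict reading of "higher than", a candidate whose confidence equals the threshold exactly is discarded; whether the real implementation treats the boundary this way is an assumption.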
Method Detail

extractTuples

public java.util.List<Relationship> extractTuples(Document doc)
Implementation of the extractTuples method. The method first splits the input document into sentences. Each sentence is tokenized, and its POS tags and chunks are computed. Next, candidate relationships are generated using the KnowItAll unsupervised approach. Finally, the classifier computes a confidence for each candidate. Only candidates with confidence above the threshold are returned as relationships.

Specified by:
extractTuples in interface RelationshipExtractor<Document>
Parameters:
doc - the document that contains the information to be extracted
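
The stages described for extractTuples can be sketched as a small generic pipeline. This is an illustration under stated assumptions, not the library's implementation: the class ExtractionPipelineSketch, the extract and demo methods, and the toy stages (a regex sentence splitter, whitespace tokenization standing in for tokenizing, tagging and chunking, and a fixed confidence function) are all hypothetical. The real class delegates these stages to SentenceSplitter, Tokenizer, POSTagger, Chunker, ReVerb and a Weka classifier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of the documented control flow; all names are illustrative.
final class ExtractionPipelineSketch {

    /**
     * Splits the document, annotates each sentence, generates candidates,
     * and keeps the candidates whose confidence is above the threshold.
     */
    static <S, C> List<C> extract(String document,
                                  Function<String, List<String>> splitter,
                                  Function<String, S> annotator,
                                  Function<S, List<C>> candidateGenerator,
                                  ToDoubleFunction<C> confidence,
                                  double threshold) {
        List<C> accepted = new ArrayList<>();
        for (String sentence : splitter.apply(document)) {
            // Stands in for tokenization, POS tagging and chunking.
            S annotated = annotator.apply(sentence);
            for (C candidate : candidateGenerator.apply(annotated)) {
                if (confidence.applyAsDouble(candidate) > threshold) {
                    accepted.add(candidate);
                }
            }
        }
        return accepted;
    }

    /** Runs the pipeline with toy stages on a two-sentence document. */
    static List<String> demo() {
        Function<String, List<String>> splitter =
                doc -> Arrays.asList(doc.split("(?<=\\.)\\s+"));
        Function<String, String[]> annotator =
                sentence -> sentence.replace(".", "").split("\\s+");
        // Toy rule: a sentence with at least three tokens yields one candidate.
        Function<String[], List<String>> generator = tokens ->
                tokens.length >= 3
                        ? Collections.singletonList(String.join(" ", tokens))
                        : Collections.<String>emptyList();
        // Toy confidence: a fixed score, not a trained classifier.
        ToDoubleFunction<String> confidence =
                candidate -> candidate.startsWith("Edison") ? 0.9 : 0.2;
        return extract("Edison invented the phonograph. Hello.",
                splitter, annotator, generator, confidence, 0.5);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Passing the stages in as functions mirrors the way the constructors accept the splitter, tokenizer, tagger and chunker as separate components, so each stage can be swapped without touching the loop.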