edu.columbia.cs.ref.algorithm
Class CandidatesGenerator

java.lang.Object
  extended by edu.columbia.cs.ref.algorithm.CandidatesGenerator

public class CandidatesGenerator
extends java.lang.Object

The candidate generator is responsible for generating all the candidate sentences from a given document for a given set of relationship types.

A candidate generator contains a sentence splitter that segments the document into sentences. Although the behavior of the candidate generator is mainly driven by this sentence segmentation, there is one major exception: if the splitter tries to split a sentence in the middle of an entity, that decision is ignored and the two resulting sentences are merged into one.
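
The merge rule described above can be illustrated with a minimal sketch. It is not the library's implementation: the class BoundaryCorrectionSketch, the Span type, and the method mergeSplitEntities are hypothetical names. The sketch assumes that sentences and entities are given as half-open character offsets and that sentences are sorted in document order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of boundary correction; not part of the documented API. */
class BoundaryCorrectionSketch {

    /** Half-open character span [start, end). Hypothetical helper type. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Merges adjacent sentences whenever some entity overlaps both of them,
     * i.e. whenever the splitter placed a boundary inside an entity.
     */
    static List<Span> mergeSplitEntities(List<Span> sentences, List<Span> entities) {
        List<Span> merged = new ArrayList<Span>();
        for (Span sentence : sentences) {
            if (!merged.isEmpty()) {
                Span previous = merged.get(merged.size() - 1);
                if (entityCrosses(previous, sentence, entities)) {
                    // Ignore the split: extend the previous sentence over this one.
                    merged.set(merged.size() - 1, new Span(previous.start, sentence.end));
                    continue;
                }
            }
            merged.add(sentence);
        }
        return merged;
    }

    /** True if an entity starts inside 'left' and ends inside or after 'right'. */
    private static boolean entityCrosses(Span left, Span right, List<Span> entities) {
        for (Span entity : entities) {
            if (entity.start < left.end && entity.end > right.start) {
                return true;
            }
        }
        return false;
    }
}
```

For the text "I read the l.a. times today." with the entity "l.a. times" at offsets [11, 21), a splitter that produces the sentences [0, 15) and [16, 28) has cut the entity in two, so the sketch returns the single sentence [0, 28).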

Since:
2011-09-27
Version:
0.1
Author:
Pablo Barrio, Goncalo Simoes

Constructor Summary
CandidatesGenerator(SentenceSplitter splitter)
          Creates a candidate generator that uses the given sentence splitter.
 
Method Summary
 java.util.Set<CandidateSentence> generateCandidates(Document doc, java.util.Set<RelationshipType> relationshipTypes)
          Generates the candidate sentences of a document for the given relationship types.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CandidatesGenerator

public CandidatesGenerator(SentenceSplitter splitter)
Creates a candidate generator. The given splitter is used to segment documents into sentences.

Parameters:
splitter - model that generates the sentence splits

Method Detail

generateCandidates

public java.util.Set<CandidateSentence> generateCandidates(Document doc,
                                                           java.util.Set<RelationshipType> relationshipTypes)
Generates the candidates for a given document. The generation is divided into three steps:

1) Sentence splitting: using the splitter passed in the constructor

2) Boundary correction: if the splitter breaks a sentence in the middle of an entity (e.g. "(...) the l.a. times (...)" may be broken into "(...) the l.a." and "times (...)"), the two resulting sentences are merged back together

3) Particular candidate generation: for each sentence, the entities that belong to it are assigned to each possible role in the relationship types in order to generate particular candidates
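
Step 3 can be sketched as an enumeration of role assignments. This is only an illustration under stated assumptions, not the library's implementation: the class RoleAssignmentSketch and the method assignRoles are hypothetical, roles and entities are represented as plain strings, an entity is assumed to fill at most one role per candidate, and entity type constraints are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of particular candidate generation; not the documented API. */
class RoleAssignmentSketch {

    /**
     * Returns every way of filling all roles with distinct entities of one sentence.
     * Each map is one particular candidate: role name -> entity.
     */
    static List<Map<String, String>> assignRoles(List<String> roles, List<String> entities) {
        List<Map<String, String>> candidates = new ArrayList<Map<String, String>>();
        fill(roles, entities, 0, new LinkedHashMap<String, String>(), candidates);
        return candidates;
    }

    private static void fill(List<String> roles, List<String> entities, int roleIndex,
                             Map<String, String> partial, List<Map<String, String>> out) {
        if (roleIndex == roles.size()) {
            // All roles are filled: record a copy of the assignment.
            out.add(new LinkedHashMap<String, String>(partial));
            return;
        }
        String role = roles.get(roleIndex);
        for (String entity : entities) {
            if (partial.containsValue(entity)) {
                continue; // assumption: an entity fills at most one role
            }
            partial.put(role, entity);
            fill(roles, entities, roleIndex + 1, partial, out);
            partial.remove(role); // backtrack before trying the next entity
        }
    }
}
```

Under these assumptions, a sentence with three entities and a relationship type with two roles yields six particular candidates, while a sentence with fewer entities than roles yields none.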

Parameters:
doc - the document from which the candidates are extracted
relationshipTypes - the relationship types to look for
Returns:
the set of candidate sentences that can be generated from a document