edu.columbia.cs.cg.prdualrank.pattern.extractor.resource
Class TupleContext

java.lang.Object
  extended by edu.columbia.cs.cg.prdualrank.pattern.extractor.resource.TupleContext

public class TupleContext
extends java.lang.Object

This class is used in our implementation of "Searching Patterns for Relation Extraction over the Web: Rediscovering the Pattern-Relation Duality", Y. Fang and K. C.-C. Chang, in WSDM, pages 825-834, 2011. For further information, see the WSDM 2011 Conference Website.

Description

This class represents the text surrounding a tuple or many tuples. It is used in the generation of Search Patterns using Window Generation or Document Generation, respectively. For Document Search Pattern Generation, no restriction is placed on the size of the word arrays surrounding the tuples, so all the text except the tuple attribute values is considered in Search Pattern generation.
For instance, given a span between entities larger than 9, a window size of 10, and the following passage (from Google.com): "When the acquisition is complete, YouTube will retain its distinct brand identity, strengthening and complementing Google’s own fast-growing video business. YouTube will continue to be based in San Bruno, CA, and all YouTube employees will remain with the company. With Google’s technology, advertiser relationships and global reach, YouTube will continue to build on its success as one of the world’s most popular services for video entertainment."
One of the resulting Tuple Contexts (considering the first occurrence of each of Google and YouTube) is:
1. ["When","the","acquisition","is","complete"] COMPANY ["will","retain","its","distinct","brand","identity","strengthening","and","complementing"] BUYER ["'s","own","fast-growing","video","business","YouTube","will","continue","to","be"]

For more information, read Definition 1 in Section 3.1 of the paper cited above.
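
The window example above can be approximated with a short, self-contained sketch. It is an illustration only and not the library's implementation: the class WindowContextSketch, the method contexts, and the use of token indexes to mark attribute positions are all assumptions made for this example, and the maximum span between entities is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the documented API.
public class WindowContextSketch {

    /**
     * Splits a token list into the word sequences that appear before,
     * between, and after the attribute positions. At most `window` tokens
     * are kept before the first attribute and after the last one.
     * `attributeIndexes` must be sorted in ascending order.
     */
    public static List<List<String>> contexts(List<String> tokens,
                                              int[] attributeIndexes,
                                              int window) {
        List<List<String>> result = new ArrayList<>();
        int first = attributeIndexes[0];
        int last = attributeIndexes[attributeIndexes.length - 1];

        // Words before the first attribute, limited by the window size.
        result.add(new ArrayList<>(
                tokens.subList(Math.max(0, first - window), first)));

        // Words between each pair of consecutive attributes.
        for (int i = 0; i + 1 < attributeIndexes.length; i++) {
            result.add(new ArrayList<>(
                    tokens.subList(attributeIndexes[i] + 1, attributeIndexes[i + 1])));
        }

        // Words after the last attribute, limited by the window size.
        int end = Math.min(tokens.size(), last + 1 + window);
        result.add(new ArrayList<>(tokens.subList(last + 1, end)));
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "When", "the", "acquisition", "is", "complete", "YouTube",
                "will", "retain", "its", "distinct", "brand", "identity",
                "strengthening", "and", "complementing", "Google",
                "'s", "own", "fast-growing", "video", "business", "YouTube",
                "will", "continue", "to", "be", "based");
        // Index 5 is the first "YouTube", index 15 is the first "Google".
        for (List<String> words : contexts(tokens, new int[] {5, 15}, 10)) {
            System.out.println(words);
        }
    }
}
```

With a window of 10, the sketch keeps the 5 words available before the first attribute, the 9 words between the two attributes, and 10 words after the second attribute, matching the three arrays in the example.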

Since:
2011-10-07
Version:
0.1
Author:
Pablo Barrio, Goncalo Simoes
See Also:
WSDM 2011 Conference Website

Constructor Summary
TupleContext(java.util.List<Span> realSpans)
          Instantiates a new tuple context.
 
Method Summary
 void addWords(java.lang.String[] newWords)
          Adds a new sequence of words to the context.
 java.util.Set<java.lang.String[]> generateNgrams(int ngram)
          Generates ngrams of size less than or equal to 'ngram' based on the sequences of words.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TupleContext

public TupleContext(java.util.List<Span> realSpans)
Instantiates a new tuple context.

Parameters:
realSpans - the non-overlapping text segments of a tuple. Overlapping segments were combined in a previous step.
Method Detail

addWords

public void addWords(java.lang.String[] newWords)
Adds a new sequence of words to the context. Note that this text might appear before, between, or after the attributes of a tuple.

Parameters:
newWords - the newly detected sequence of words surrounding the attributes.

generateNgrams

public java.util.Set<java.lang.String[]> generateNgrams(int ngram)
Generates ngrams of size less than or equal to 'ngram' based on the sequences of words.

Parameters:
ngram - the maximum size of ngrams.
Returns:
the set of ngrams extracted from the words surrounding the tuple.
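
A minimal sketch of this kind of ngram generation follows, as an illustration rather than the actual implementation. The class NgramSketch and the method ngramsUpTo are hypothetical, and the sketch assumes that ngrams are contiguous and never cross the boundary between two word sequences. It returns List<String> elements instead of String[] because Java arrays compare by identity, so a hash-based set of arrays does not merge ngrams with equal content.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the documented API.
public class NgramSketch {

    /**
     * Collects every contiguous ngram of size 1..maxSize from each word
     * sequence. Ngrams never span two different sequences.
     */
    public static Set<List<String>> ngramsUpTo(List<String[]> sequences, int maxSize) {
        Set<List<String>> ngrams = new LinkedHashSet<>();
        for (String[] words : sequences) {
            for (int size = 1; size <= maxSize; size++) {
                for (int start = 0; start + size <= words.length; start++) {
                    // Copy the slice so each ngram is independent of the input array.
                    ngrams.add(Arrays.asList(
                            Arrays.copyOfRange(words, start, start + size)));
                }
            }
        }
        return ngrams;
    }

    public static void main(String[] args) {
        List<String[]> sequences = Arrays.asList(
                new String[] {"will", "retain", "its"},
                new String[] {"'s", "own"});
        for (List<String> ngram : ngramsUpTo(sequences, 2)) {
            System.out.println(ngram);
        }
    }
}
```

In this sketch each String[] plays the role of one sequence passed to addWords, and maxSize plays the role of the 'ngram' parameter.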

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object