edu.columbia.cs.cg.pattern.prdualrank
Class SearchPattern<T extends Document,D extends TokenizedDocument>

java.lang.Object
  extended by edu.columbia.cs.ref.model.pattern.Pattern<Document,TokenizedDocument>
      extended by edu.columbia.cs.cg.pattern.prdualrank.SearchPattern<T,D>
Type Parameters:
T - the type of document to be retrieved
D - the

public class SearchPattern<T extends Document,D extends TokenizedDocument>
extends Pattern<Document,TokenizedDocument>

The SearchPattern class represents a pattern that can be used for document retrieval.

A SearchPattern is composed of several phrases that can be used to query for documents.

Since:
2011-09-27
Version:
0.1
Author:
Pablo Barrio, Goncalo Simoes

Constructor Summary
SearchPattern(java.util.List<java.lang.String[]> phrases)
          Instantiates a new search pattern by providing the set of phrases to be used as queries
 
Method Summary
 java.util.List<Document> findMatch(TokenizedDocument d)
          Abstract method that finds all matches in the input document
 java.util.List<java.lang.String[]> getNGrams()
          Gets the list of phrases from the search pattern
static boolean isPatternizable(java.lang.String[] nGram)
          Static method that checks whether a given phrase can be used as a valid pattern.
 boolean isValid()
          Checks whether the search pattern is valid.

A search pattern is valid if:
1) None of its phrases are stop words
2) It does not contain repeated phrases
3) The phrases do not overlap
 
Methods inherited from class edu.columbia.cs.ref.model.pattern.Pattern
equals, hashCode, toString
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SearchPattern

public SearchPattern(java.util.List<java.lang.String[]> phrases)
Instantiates a new search pattern by providing the set of phrases to be used as queries

Parameters:
phrases - the phrases to be used as queries
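
The constructor takes each phrase as a String[] of tokens. The following is a minimal sketch of how such a list might be assembled from plain strings. The class PhraseListSketch and the helper toPhrases are hypothetical names introduced for illustration, and whitespace splitting is an assumption; the library may expect tokens produced by its own tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

final class PhraseListSketch {

    // Hypothetical helper: splits each phrase string on whitespace into a token array.
    static List<String[]> toPhrases(String... phraseStrings) {
        List<String[]> phrases = new ArrayList<>();
        for (String s : phraseStrings) {
            phrases.add(s.trim().split("\\s+"));
        }
        return phrases;
    }

    public static void main(String[] args) {
        List<String[]> phrases = toPhrases("natural disaster", "earthquake");
        // With the library on the classpath, this list would be the constructor argument:
        // new SearchPattern<Document, TokenizedDocument>(phrases);
        for (String[] phrase : phrases) {
            System.out.println(String.join(" ", phrase));
        }
    }
}
```

Because the element type is String[], a multi-word phrase stays together as one entry instead of being flattened into separate single-word queries.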
Method Detail

isValid

public boolean isValid()
Checks whether the search pattern is valid.

A search pattern is valid if:
1) None of its phrases are stop words
2) It does not contain repeated phrases
3) The phrases do not overlap

Returns:
true if the search pattern is valid
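
The three validity conditions can be illustrated with a self-contained sketch. This is not the library's implementation. The stop-word list is hypothetical, and two readings are assumptions: a phrase is treated as a stop-word phrase when all of its tokens are stop words, and two phrases are treated as overlapping when they share at least one token.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SearchPatternValiditySketch {

    // Hypothetical stop-word list; the list used by the library is not documented here.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "of", "and", "in"));

    // Condition 1 (assumed reading): every token of the phrase is a stop word.
    // An empty phrase is also rejected, since it has no content token.
    static boolean isStopWordPhrase(String[] phrase) {
        for (String token : phrase) {
            if (!STOP_WORDS.contains(token.toLowerCase())) {
                return false;
            }
        }
        return true;
    }

    // Condition 3 (assumed reading): the two phrases share at least one token.
    static boolean overlap(String[] a, String[] b) {
        Set<String> tokens = new HashSet<>(Arrays.asList(a));
        for (String token : b) {
            if (tokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    static boolean isValid(List<String[]> phrases) {
        for (int i = 0; i < phrases.size(); i++) {
            if (isStopWordPhrase(phrases.get(i))) {
                return false; // violates condition 1
            }
            for (int j = i + 1; j < phrases.size(); j++) {
                if (Arrays.equals(phrases.get(i), phrases.get(j))) {
                    return false; // violates condition 2: repeated phrase
                }
                if (overlap(phrases.get(i), phrases.get(j))) {
                    return false; // violates condition 3: overlapping phrases
                }
            }
        }
        return true;
    }
}
```

Under these assumptions, the phrases {"natural", "disaster"} and {"earthquake"} form a valid pattern, while {"natural", "disaster"} and {"disaster", "relief"} do not, because they share the token "disaster".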

findMatch

public java.util.List<Document> findMatch(TokenizedDocument d)
Description copied from class: Pattern
Abstract method that finds all matches in the input document

Specified by:
findMatch in class Pattern<Document,TokenizedDocument>
Parameters:
d - the document where we are looking for the matches
Returns:
the matched objects in document d
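
This page does not state the exact matching semantics of findMatch. As one plausible reading, the sketch below treats a tokenized document as matching when every phrase of the pattern occurs as a contiguous run of tokens. The class PhraseMatchSketch and its methods are hypothetical and operate on plain String arrays, not on the library's TokenizedDocument type.

```java
import java.util.List;

final class PhraseMatchSketch {

    // Returns true if the phrase occurs as a contiguous run of tokens.
    static boolean containsPhrase(String[] tokens, String[] phrase) {
        if (phrase.length == 0) {
            return false; // an empty phrase is treated as never matching
        }
        for (int start = 0; start + phrase.length <= tokens.length; start++) {
            boolean match = true;
            for (int k = 0; k < phrase.length; k++) {
                if (!tokens[start + k].equals(phrase[k])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    // Assumed semantics: the document matches only if every phrase occurs in it.
    static boolean matchesAll(String[] tokens, List<String[]> phrases) {
        for (String[] phrase : phrases) {
            if (!containsPhrase(tokens, phrase)) {
                return false;
            }
        }
        return true;
    }
}
```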

isPatternizable

public static boolean isPatternizable(java.lang.String[] nGram)
Static method that checks whether a given phrase can be used as a valid pattern.

See the description of the method isValid() to learn what a valid pattern is.

Parameters:
nGram - the sequence that may be used as a phrase
Returns:
true, if nGram can form a valid pattern

getNGrams

public java.util.List<java.lang.String[]> getNGrams()
Gets the list of phrases from the search pattern

Returns:
the list of phrases from the search pattern