edu.columbia.cs.ref.tool.preprocessor.impl
Class HTMLContentKeeper

java.lang.Object
  extended by edu.columbia.cs.ref.tool.preprocessor.impl.HTMLContentKeeper
All Implemented Interfaces:
Preprocessor

public class HTMLContentKeeper
extends java.lang.Object
implements Preprocessor

This class implements the Preprocessor interface. It extracts the plain-text content of HTML documents.

Since:
2011-09-27
Version:
0.1
Author:
Pablo Barrio, Goncalo Simoes

Constructor Summary
HTMLContentKeeper()
           
 
Method Summary
 java.lang.String process(java.lang.String content)
          Processes the content of a document and returns the transformed version of that content as a String.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HTMLContentKeeper

public HTMLContentKeeper()
Method Detail

process

public java.lang.String process(java.lang.String content)
Processes the content of a document and returns the transformed version of that content as a String.

This preprocessor extracts the content of HTML files. It does so by calling the method HtmlToText.htmlToPlainText(content) from the Google Data Java Client Library.

Specified by:
process in interface Preprocessor
Parameters:
content - the content of the document, represented as a String
Returns:
the transformed version of the input content
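
Example:
The following is a minimal, self-contained sketch of the String-in, String-out contract described above. It is not the library's implementation. The nested Preprocessor interface is a local stand-in that mirrors the single process method documented here, and NaiveHtmlContentKeeper is a hypothetical class that strips tags with a regular expression instead of calling HtmlToText.htmlToPlainText(content). Its output on real-world HTML (entities, scripts, comments) will likely differ from that of HTMLContentKeeper.

```java
import java.util.regex.Pattern;

public class PreprocessorSketch {

    /** Local stand-in for the Preprocessor interface (assumption: one String-to-String method). */
    interface Preprocessor {
        String process(String content);
    }

    /**
     * Hypothetical, simplified stand-in for HTMLContentKeeper.
     * The documented class delegates to HtmlToText.htmlToPlainText(content);
     * this regex version only approximates the idea and is not a robust HTML parser.
     */
    static class NaiveHtmlContentKeeper implements Preprocessor {
        private static final Pattern TAG = Pattern.compile("<[^>]*>");
        private static final Pattern WHITESPACE = Pattern.compile("\\s+");

        @Override
        public String process(String content) {
            // Replace every tag with a space so adjacent words do not run together.
            String withoutTags = TAG.matcher(content).replaceAll(" ");
            // Collapse the runs of whitespace left behind and trim both ends.
            return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
        }
    }

    public static void main(String[] args) {
        Preprocessor preprocessor = new NaiveHtmlContentKeeper();
        String html = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>";
        // prints "Title Some bold text."
        System.out.println(preprocessor.process(html));
    }
}
```

With the actual library on the classpath, the equivalent call would presumably be new HTMLContentKeeper().process(html), since the class exposes a no-argument constructor and the process method documented above.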