edu.columbia.cs.ref.tool.collection.splitter.impl
Class KFoldSplitter<E extends Writable>

java.lang.Object
  extended by edu.columbia.cs.ref.tool.collection.splitter.Splitter<E>
      extended by edu.columbia.cs.ref.tool.collection.splitter.impl.KFoldSplitter<E>

public class KFoldSplitter<E extends Writable>
extends Splitter<E>

This class is an implementation of the Splitter class for the K-fold cross-validation method.

This splitter is parameterized by the desired number of splits (folds). The files it generates are named "train-i" and "test-i", where i is the number of the fold.
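As a rough illustration of the naming scheme, the sketch below lists the files one might expect to find in the output folder after a split. The class FoldFileNames and the method expectedFiles are hypothetical helpers, not part of this package, and numbering the folds from 0 is an assumption; this page does not say whether the first fold is 0 or 1.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the documented package.
public class FoldFileNames {

    /** Lists the "train-i" and "test-i" files expected for k folds in outputFolder. */
    public static List<File> expectedFiles(File outputFolder, int k) {
        List<File> files = new ArrayList<>();
        // Assumption: folds are numbered starting at 0.
        for (int i = 0; i < k; i++) {
            files.add(new File(outputFolder, "train-" + i));
            files.add(new File(outputFolder, "test-" + i));
        }
        return files;
    }

    public static void main(String[] args) {
        // Print the expected file paths for a 3-fold split into "folds".
        for (File f : expectedFiles(new File("folds"), 3)) {
            System.out.println(f.getPath());
        }
    }
}
```

A list like this could be used to check that every fold file exists once split has finished.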

Since:
2011-09-27
Version:
0.1
Author:
Pablo Barrio, Goncalo Simoes

Constructor Summary
KFoldSplitter(int numberSplits)
          Constructor for the KFoldSplitter.
 
Method Summary
 void split(Dataset<E> dataset, java.io.File outputFolder)
          Implements the split method for K-fold cross-validation.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

KFoldSplitter

public KFoldSplitter(int numberSplits)
Constructor for the KFoldSplitter. It receives as input the number of splits (folds) that will be produced.

Parameters:
numberSplits - number of folds produced by the split method

Method Detail

split

public void split(Dataset<E> dataset,
                  java.io.File outputFolder)
Implements the split method for K-fold cross-validation. The splits are computed as follows:

1) Create a list with all the elements of the dataset

2) Shuffle that list using the Collections.shuffle method from the Java API

3) Create k buckets and scan the list, putting each element in one bucket

4) Generate the fold files
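The first three steps can be sketched in plain Java as shown below. This is a minimal illustration and not the actual source of KFoldSplitter: the class KFoldSketch and its methods are hypothetical, the round-robin assignment to buckets is one reading of step 3, the seeded Random is added only to make runs repeatable, and file writing (step 4) is left out. The sketch also assumes that, for fold i, bucket i serves as the test set and the remaining buckets form the training set, which is the usual K-fold convention.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not the library's implementation.
public class KFoldSketch {

    /** Shuffles a copy of the items and deals them round-robin into k buckets. */
    public static <T> List<List<T>> makeBuckets(List<T> items, int k, Random rng) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        List<T> shuffled = new ArrayList<>(items);  // step 1: list of all elements
        Collections.shuffle(shuffled, rng);         // step 2: shuffle the list
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            buckets.add(new ArrayList<>());
        }
        // Step 3 (assumed round-robin): element i goes to bucket i mod k.
        for (int i = 0; i < shuffled.size(); i++) {
            buckets.get(i % k).add(shuffled.get(i));
        }
        return buckets;
    }

    /** Training set for a fold: every bucket except the one held out for testing. */
    public static <T> List<T> trainingSet(List<List<T>> buckets, int fold) {
        List<T> train = new ArrayList<>();
        for (int i = 0; i < buckets.size(); i++) {
            if (i != fold) {
                train.addAll(buckets.get(i));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        List<List<Integer>> buckets = makeBuckets(data, 3, new Random(42));
        for (int fold = 0; fold < buckets.size(); fold++) {
            System.out.println("fold " + fold
                    + ": test size=" + buckets.get(fold).size()
                    + ", train size=" + trainingSet(buckets, fold).size());
        }
    }
}
```

With round-robin assignment the bucket sizes differ by at most one element, so every fold tests on roughly the same number of items.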

Specified by:
split in class Splitter<E extends Writable>
Parameters:
dataset - dataset to be split
outputFolder - folder where the fold files will be written