edu.columbia.cs.ref.tool.collection.splitter.impl
Class KFoldSplitter<E extends Writable>
java.lang.Object
edu.columbia.cs.ref.tool.collection.splitter.Splitter<E>
edu.columbia.cs.ref.tool.collection.splitter.impl.KFoldSplitter<E>
public class KFoldSplitter<E extends Writable>
- extends Splitter<E>
This class implements the Splitter API for the K-fold cross-validation method.
The splitter is parameterized by the desired number of splits. The files it generates
are named "train-i" and "test-i", where i is the number of the fold.
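As an illustration of the naming scheme, the following minimal sketch lists the files one might expect to find in the output folder. The class FoldFileNames, the method foldFiles, and the firstIndex parameter are hypothetical and not part of this library; in particular, this page does not say whether fold numbering starts at 0 or 1, so the starting index is left to the caller.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the documented API.
class FoldFileNames {

    // Builds the expected "train-i" / "test-i" paths for each fold.
    // firstIndex is an assumption: the documentation does not state
    // whether folds are numbered from 0 or from 1.
    static List<File> foldFiles(File outputFolder, int numberSplits, int firstIndex) {
        List<File> files = new ArrayList<>();
        for (int i = 0; i < numberSplits; i++) {
            int fold = firstIndex + i;
            files.add(new File(outputFolder, "train-" + fold));
            files.add(new File(outputFolder, "test-" + fold));
        }
        return files;
    }
}
```

A caller could use such a list to check that a run of the splitter produced one train file and one test file per fold.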
- Since:
- 2011-09-27
- Version:
- 0.1
- Author:
- Pablo Barrio, Goncalo Simoes
Constructor Summary
KFoldSplitter(int numberSplits)
Constructor for the KFoldSplitter.
Method Summary
void split(Dataset<E> dataset, java.io.File outputFolder)
This method is an implementation of the split method for the K-fold cross-validation method.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
KFoldSplitter
public KFoldSplitter(int numberSplits)
- Constructor for the KFoldSplitter. It takes the number of splits (folds) that the
split method will produce.
- Parameters:
numberSplits
- number of folds produced by the method split
split
public void split(Dataset<E> dataset,
java.io.File outputFolder)
- This method implements the split method for K-fold cross-validation. The splits are
computed as follows:
1) Create a list with all the elements of the dataset
2) Shuffle that list using the Collections.shuffle method from the Java API
3) Create k buckets and scan the list, putting each element into one bucket at a time
4) Generate the fold files
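The steps above can be sketched roughly as follows. This is not the library's source code: the class KFoldBucketSketch and its methods are hypothetical, the dataset is modeled as a plain Iterable, step 3 is interpreted as a round-robin assignment, and the train/test composition follows the usual K-fold convention (bucket i is the test set of fold i, the remaining buckets form its training set). Writing the fold files is omitted because the file format is not described here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the documented steps, not the actual implementation.
class KFoldBucketSketch {

    // Steps 1 to 3: copy, shuffle, and distribute the elements into k buckets.
    static <T> List<List<T>> toBuckets(Iterable<T> dataset, int k, Random rng) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        // 1) Create a list with all the elements of the dataset.
        List<T> items = new ArrayList<>();
        for (T item : dataset) {
            items.add(item);
        }
        // 2) Shuffle the list with Collections.shuffle.
        Collections.shuffle(items, rng);
        // 3) Create k buckets and assign elements round-robin (assumed reading
        //    of "putting each element in one bucket at a time").
        List<List<T>> buckets = new ArrayList<>();
        for (int b = 0; b < k; b++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < items.size(); i++) {
            buckets.get(i % k).add(items.get(i));
        }
        return buckets;
    }

    // Part of step 4 under the usual K-fold convention: the training set of a
    // fold is every bucket except the one used as that fold's test set.
    static <T> List<T> trainingSet(List<List<T>> buckets, int fold) {
        List<T> train = new ArrayList<>();
        for (int b = 0; b < buckets.size(); b++) {
            if (b != fold) {
                train.addAll(buckets.get(b));
            }
        }
        return train;
    }
}
```

With a round-robin assignment, bucket sizes differ by at most one element, so every fold tests on roughly the same number of elements even when the dataset size is not a multiple of k.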
- Specified by:
split
in class Splitter<E extends Writable>
- Parameters:
dataset
- dataset to be split
outputFolder
- folder where the fold files will be written