org.encog.ml.data.buffer
Class BufferedMLDataSet

java.lang.Object
  extended by org.encog.ml.data.buffer.BufferedMLDataSet
All Implemented Interfaces:
Serializable, Iterable<MLDataPair>, MLDataSet

public class BufferedMLDataSet
extends Object
implements MLDataSet, Serializable

This dataset is not held in memory, so very large files can be used without running out of memory. It uses an Encog binary training file as a buffer. When the original data comes from a slower source that must be parsed, such as CSV, XML or SQL, this dataset can be loaded from that source once and then used to train at much higher speed. The class uses Java file channels for fast file access.

To create a binary file with the add methods, first call beginLoad so that Encog opens an output file. Once all the data has been added, call endLoad. You can also use the BinaryDataLoader class, with a CODEC, to load many other popular external formats.

The binary files produced by this class are in the Encog binary training format and can be used with any Encog platform. Encog binary files store numbers in little-endian byte order.
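The file-channel and little-endian points above can be illustrated with the JDK alone. The following is a minimal sketch, not Encog's implementation: the class LittleEndianChannelSketch and its methods are hypothetical names, and the file it writes is a bare run of doubles, not an Encog binary training file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration only; this is not part of Encog.
public class LittleEndianChannelSketch {

    /** Write the values to the file as consecutive little-endian doubles. */
    static void writeDoubles(Path file, double[] values) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (double v : values) {
            buf.putDouble(v);
        }
        buf.flip(); // switch the buffer from filling to draining
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    /** Read the whole file back as little-endian doubles. */
    static double[] readDoubles(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size())
                    .order(ByteOrder.LITTLE_ENDIAN);
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or the file ends
            }
            buf.flip();
            double[] out = new double[buf.remaining() / Double.BYTES];
            for (int i = 0; i < out.length; i++) {
                out[i] = buf.getDouble();
            }
            return out;
        }
    }

    /** Write the values to a temporary file, read them back, and clean up. */
    static double[] roundTrip(double[] values) {
        try {
            Path tmp = Files.createTempFile("le-sketch", ".bin");
            try {
                writeDoubles(tmp, values);
                return readDoubles(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTrip(new double[] {0.0, 1.0, 0.5})));
    }
}
```

Setting the byte order explicitly matters because a ByteBuffer defaults to big-endian; without the order call, the bytes on disk would not match a little-endian reader.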

See Also:
Serialized Form

Field Summary
static String ERROR_ADD
          Error message for ADD.
static String ERROR_REMOVE
          Error message for REMOVE.
 
Constructor Summary
BufferedMLDataSet(File binaryFile)
          Construct the dataset using the specified binary file.
 
Method Summary
 void add(MLData data1)
          Add only input data, for an unsupervised dataset.
 void add(MLData inputData, MLData idealData)
          Add both the input and ideal data.
 void add(MLDataPair pair)
          Add a data pair of both input and ideal data.
 void beginLoad(int inputSize, int idealSize)
          Begin loading to the binary file.
 void close()
          Close the dataset.
 void endLoad()
          This method should be called once all the data has been loaded.
 MLDataPair get(int index)
           
 EncogEGBFile getEGB()
           
 File getFile()
           
 int getIdealSize()
           
 int getInputSize()
           
 BufferedMLDataSet getOwner()
           
 void getRecord(long index, MLDataPair pair)
          Read an individual record.
 long getRecordCount()
          Determine the total number of records in the set.
 boolean isSupervised()
           
 Iterator<MLDataPair> iterator()
           
 void load(MLDataSet training)
          Load the specified training set.
 MLDataSet loadToMemory()
          Load the binary dataset to memory.
 void open()
          Open the binary file for reading.
 BufferedMLDataSet openAdditional()
          Opens an additional instance of this dataset.
 void removeAdditional(BufferedMLDataSet child)
          Remove an additional dataset that was created.
 void setOwner(BufferedMLDataSet theOwner)
          Set the owner of this dataset.
 int size()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ERROR_ADD

public static final String ERROR_ADD
Error message for ADD.

See Also:
Constant Field Values

ERROR_REMOVE

public static final String ERROR_REMOVE
Error message for REMOVE.

See Also:
Constant Field Values
Constructor Detail

BufferedMLDataSet

public BufferedMLDataSet(File binaryFile)
Construct the dataset using the specified binary file.

Parameters:
binaryFile - The file to use.
Method Detail

open

public void open()
Open the binary file for reading.


iterator

public Iterator<MLDataPair> iterator()
Specified by:
iterator in interface Iterable<MLDataPair>
Returns:
An iterator.

getRecordCount

public long getRecordCount()
Description copied from interface: MLDataSet
Determine the total number of records in the set.

Specified by:
getRecordCount in interface MLDataSet
Returns:
The record count.

getRecord

public void getRecord(long index,
                      MLDataPair pair)
Read an individual record.

Specified by:
getRecord in interface MLDataSet
Parameters:
index - The zero-based index. Specify 0 for the first record, 1 for the second, and so on.
pair - The pair that receives the data read.
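Reading one record by its zero-based index from a file, rather than from memory, comes down to seeking to a computed offset. The following is a rough sketch under stated assumptions: it uses a hypothetical headerless layout in which every record is inputSize + idealSize little-endian doubles. The real Encog binary training format is not described here and may differ, and FixedWidthRecordSketch is an illustrative name, not an Encog class.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration only; the layout is assumed, not Encog's.
public class FixedWidthRecordSketch {

    /** Byte offset of a zero-based record in the assumed headerless layout. */
    static long offsetOf(long index, int inputSize, int idealSize) {
        return index * (long) (inputSize + idealSize) * Double.BYTES;
    }

    /** Read one record (input values followed by ideal values) by index. */
    static double[] readRecord(FileChannel ch, long index,
                               int inputSize, int idealSize) throws IOException {
        int width = inputSize + idealSize;
        ByteBuffer buf = ByteBuffer.allocate(width * Double.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        long pos = offsetOf(index, inputSize, idealSize);
        while (buf.hasRemaining()) {
            // Positional read: does not move the channel's own position.
            int n = ch.read(buf, pos + buf.position());
            if (n < 0) {
                throw new EOFException("record " + index + " is past end of file");
            }
        }
        buf.flip();
        double[] record = new double[width];
        buf.asDoubleBuffer().get(record);
        return record;
    }

    /** Write the XOR truth table to a temporary file and read back record 2. */
    static double[] demo() {
        double[][] xor = {{0, 0, 0}, {0, 1, 1}, {1, 0, 1}, {1, 1, 0}};
        ByteBuffer out = ByteBuffer.allocate(xor.length * 3 * Double.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (double[] row : xor) {
            for (double v : row) {
                out.putDouble(v);
            }
        }
        try {
            Path tmp = Files.createTempFile("records", ".bin");
            try {
                Files.write(tmp, out.array());
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    return readRecord(ch, 2, 2, 1);
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

Because every record has the same width, the cost of reading record N does not depend on N, which is what makes indexed access practical on files too large to hold in memory.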

openAdditional

public BufferedMLDataSet openAdditional()
Description copied from interface: MLDataSet
Opens an additional instance of this dataset.

Specified by:
openAdditional in interface MLDataSet
Returns:
An additional training set.

add

public void add(MLData data1)
Add only input data, for an unsupervised dataset.

Specified by:
add in interface MLDataSet
Parameters:
data1 - The data to be added.

add

public void add(MLData inputData,
                MLData idealData)
Add both the input and ideal data.

Specified by:
add in interface MLDataSet
Parameters:
inputData - The input data.
idealData - The ideal data.

add

public void add(MLDataPair pair)
Add a data pair of both input and ideal data.

Specified by:
add in interface MLDataSet
Parameters:
pair - The pair to add.

close

public void close()
Close the dataset.

Specified by:
close in interface MLDataSet

getIdealSize

public int getIdealSize()
Specified by:
getIdealSize in interface MLDataSet
Returns:
The ideal data size.

getInputSize

public int getInputSize()
Specified by:
getInputSize in interface MLDataSet
Returns:
The input data size.

isSupervised

public boolean isSupervised()
Specified by:
isSupervised in interface MLDataSet
Returns:
True if this dataset is supervised.

getOwner

public BufferedMLDataSet getOwner()
Returns:
The owner of this dataset. If this dataset was created by openAdditional, the owner is the dataset that created it.

setOwner

public void setOwner(BufferedMLDataSet theOwner)
Set the owner of this dataset.

Parameters:
theOwner - The owner.

removeAdditional

public void removeAdditional(BufferedMLDataSet child)
Remove an additional dataset that was created.

Parameters:
child - The additional dataset to remove.

beginLoad

public void beginLoad(int inputSize,
                      int idealSize)
Begin loading to the binary file. After calling this method the add methods may be called.

Parameters:
inputSize - The input size.
idealSize - The ideal size.

endLoad

public void endLoad()
This method should be called once all the data has been loaded. The underlying file will be closed. The binary file will then be opened for reading.


getFile

public File getFile()
Returns:
The binary file used.

getEGB

public EncogEGBFile getEGB()
Returns:
The EGB file to use.

loadToMemory

public MLDataSet loadToMemory()
Load the binary dataset to memory. Memory access is faster.

Returns:
A memory dataset.

load

public void load(MLDataSet training)
Load the specified training set.

Parameters:
training - The training set to load.

size

public int size()
Specified by:
size in interface MLDataSet

get

public MLDataPair get(int index)
Specified by:
get in interface MLDataSet


Copyright © 2014. All Rights Reserved.