org.encog.util.normalize
Class DataNormalization

java.lang.Object
  extended by org.encog.util.normalize.DataNormalization
All Implemented Interfaces:
Serializable

public class DataNormalization
extends Object
implements Serializable

This class is used to normalize both input and ideal data for neural networks. It can accept input from a variety of sources and send output to a variety of targets. Normalization is the process of scaling input data so that it falls within a specific range. Neural networks typically require input in the range 0 to 1 or -1 to 1, depending on how the network is structured. The DataNormalization class is typically given four different types of objects that tell it how to process data.

Input Fields: Input fields specify the raw data that will be read by the DataNormalization class. They are added by calling the addInputField method and must implement the InputField interface. A number of different input fields are provided. Input data can be read from several different sources; for example, you can read the "neural network input" data from one CSV file and the "ideal neural network output" from another.

Output Fields: The output fields specify the final output from the DataNormalization class, covering both the "neural network input" and the "ideal output". Each output field is flagged as either input or ideal. Output fields are not necessarily one-to-one with input fields: several input fields may combine to produce a single output field. Further, some input fields may be used only to segregate data, whereas others may be ignored altogether. The type of output field you specify determines the type of processing performed on that field. An output field is added by calling the addOutputField method.

Segregators: Segregators are generally used for two related purposes. First, they can exclude rows of data based on certain input values; perhaps the data includes several classes and you only want to train on one of them. Second, they can split data into training and evaluation sets.
For example, you may choose to use 80% of your data for training and 20% for evaluation. A segregator is added by calling the addSegregator method.

Target Storage: The data created by the DataNormalization class must be stored somewhere, and the storage target specifies where. The output can be sent to a CSV file, a NeuralDataSet, or any other target supported by a NormalizationStorage implementation. The target is specified by calling the setTarget method.

Because normalization can take some time, progress can be reported to a StatusReportable object, which is set by calling the setReport method. Normalization is a two-pass process. The first pass counts the number of records and computes the statistics that will be used to normalize the output. The second pass performs the normalization and writes to the target. Both passes are performed when the process method is called.
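The range scaling described above is commonly a linear map from the observed interval [min, max] onto the target interval [lo, hi]. The following is a minimal sketch of that formula, assuming a linear mapping; RangeMap is a hypothetical helper, not an Encog class, and the real output field types may scale differently.

```java
// Hypothetical helper illustrating linear range mapping; not an Encog API.
final class RangeMap {
    /**
     * Maps x from the observed interval [min, max] onto the target interval [lo, hi]:
     * lo + (x - min) / (max - min) * (hi - lo).
     * A zero-width source interval is mapped to the midpoint of the target range.
     */
    static double map(double x, double min, double max, double lo, double hi) {
        double span = max - min;
        if (span == 0) {
            return (lo + hi) / 2;
        }
        return lo + (x - min) / span * (hi - lo);
    }

    public static void main(String[] args) {
        // A value of 15 observed in the range [10, 20]:
        System.out.println(map(15, 10, 20, 0, 1));  // prints 0.5 (for a 0..1 network)
        System.out.println(map(15, 10, 20, -1, 1)); // prints 0.0 (for a -1..1 network)
    }
}
```

The same raw value lands in a different place depending on whether the network expects 0 to 1 or -1 to 1, which is why the target range is part of the normalization setup rather than a property of the data.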

See Also:
Serialized Form

Constructor Summary
DataNormalization()
           
 
Method Summary
 void addInputField(InputField f)
          Add an input field.
 void addOutputField(OutputField outputField)
          Add an output field.
 void addOutputField(OutputField outputField, boolean ideal)
          Add a field and allow it to be specified as an "ideal output field".
 void addSegregator(Segregator segregator)
          Add a segregator.
 MLData buildForNetworkInput(double[] data)
          Build "input data for a neural network" based on the input values provided.
 InputField findInputField(Class<?> clazz, int count)
          Find an input field by its class.
 OutputField findOutputField(Class<?> clazz, int count)
          Find an output field by its class.
 CSVFormat getCSVFormat()
           
 Set<OutputFieldGroup> getGroups()
           
 List<InputField> getInputFields()
           
 int getNetworkInputLayerSize()
           
 int getNetworkOutputLayerSize()
           
 int getOutputFieldCount()
           
 List<OutputField> getOutputFields()
           
 int getRecordCount()
           
 StatusReportable getReport()
           
 List<Segregator> getSegregators()
           
 NormalizationStorage getStorage()
           
 void init()
           
 void initForOutput()
          Set up the row for output.
 void initForPass()
          Set up the row for output.
 void process()
          Call this method to begin the normalization process.
 void setCSVFormat(CSVFormat csvFormat)
          Set the CSV format to use.
 void setReport(StatusReportable report)
          Set the object that progress will be reported to.
 void setTarget(NormalizationStorage target)
          Determines where the normalized data will be sent.
 boolean twoPassesNeeded()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DataNormalization

public DataNormalization()
Method Detail

addInputField

public void addInputField(InputField f)
Add an input field.

Parameters:
f - The input field to add.

addOutputField

public void addOutputField(OutputField outputField)
Add an output field. The field is added as a neural network input field, not as an "ideal output field".

Parameters:
outputField - The output field to add.

addOutputField

public void addOutputField(OutputField outputField,
                           boolean ideal)
Add a field and allow it to be specified as an "ideal output field". An "ideal" field is the expected output that the ML network is training towards.

Parameters:
outputField - The output field.
ideal - True if this is an ideal output field; false if it is a network input field.

addSegregator

public void addSegregator(Segregator segregator)
Add a segregator.

Parameters:
segregator - The segregator to add.
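The class overview names two uses for segregators: excluding rows by value and splitting data into training and evaluation sets. The sketch below models both with hypothetical types (RowFilter, ClassFilter, SampleFilter, Segregate), none of which are Encog classes. It assumes a row is kept only if every registered segregator accepts it, and that sampling is done by position within repeating blocks of rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of segregation; RowFilter, ClassFilter, SampleFilter
// and Segregate are illustrative names, not Encog types.
interface RowFilter {
    /** Returns true if the row at this original position should be kept. */
    boolean shouldInclude(double[] row, int index);
}

/** Keeps only rows whose value in one column equals a chosen class label. */
final class ClassFilter implements RowFilter {
    private final int column;
    private final double label;

    ClassFilter(int column, double label) {
        this.column = column;
        this.label = label;
    }

    @Override
    public boolean shouldInclude(double[] row, int index) {
        return row[column] == label;
    }
}

/** Keeps rows whose position within each block of blockSize rows lies in [first, last]. */
final class SampleFilter implements RowFilter {
    private final int first;
    private final int last;
    private final int blockSize;

    SampleFilter(int first, int last, int blockSize) {
        this.first = first;
        this.last = last;
        this.blockSize = blockSize;
    }

    @Override
    public boolean shouldInclude(double[] row, int index) {
        int pos = index % blockSize;
        return pos >= first && pos <= last;
    }
}

final class Segregate {
    /** Keeps a row only if every filter accepts it; index is the row's original position. */
    static List<double[]> apply(List<double[]> rows, List<RowFilter> filters) {
        List<double[]> kept = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            double[] row = rows.get(i);
            boolean keep = true;
            for (RowFilter f : filters) {
                if (!f.shouldInclude(row, i)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

With a block size of 10, SampleFilter(0, 7, 10) selects 80% of the rows and SampleFilter(8, 9, 10) selects the remaining 20%, so running the same pipeline once with each would yield disjoint training and evaluation sets.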

buildForNetworkInput

public MLData buildForNetworkInput(double[] data)
Build "input data for a neural network" based on the input values provided. This allows input for a neural network to be normalized. This is typically used when data is to be presented to a trained neural network.

Parameters:
data - The input values to be normalized.
Returns:
The data to be sent to the neural network.
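When data is presented to a trained network, each new row has to be scaled with the statistics of the original training data, not with its own. The sketch below illustrates that idea; it assumes, rather than confirms, that buildForNetworkInput reuses the minimum and maximum gathered during process(). FittedInputBuilder is a hypothetical name, and it returns a plain double[] instead of MLData so that it runs without Encog.

```java
// Hypothetical sketch: the stored statistics stand in for those gathered
// during process(); this is not Encog's implementation.
final class FittedInputBuilder {
    private final double[] min;
    private final double[] max;
    private final double lo;
    private final double hi;

    /** Captures per-column min/max from the original normalization run. */
    FittedInputBuilder(double[] min, double[] max, double lo, double hi) {
        this.min = min.clone();
        this.max = max.clone();
        this.lo = lo;
        this.hi = hi;
    }

    /** Normalizes one raw row for a trained network. Values outside the saved range are not clamped. */
    double[] build(double[] raw) {
        if (raw.length != min.length) {
            throw new IllegalArgumentException("expected " + min.length + " values, got " + raw.length);
        }
        double[] out = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            double span = max[i] - min[i];
            // Same linear map as during training, using the saved min and max.
            out[i] = span == 0 ? (lo + hi) / 2 : lo + (raw[i] - min[i]) / span * (hi - lo);
        }
        return out;
    }
}
```

Because the sketch does not clamp, a value above the training maximum maps outside [lo, hi]; whether to clamp is a design choice the caller would have to make.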

findInputField

public InputField findInputField(Class<?> clazz,
                                 int count)
Find an input field by its class.

Parameters:
clazz - The input field class type you are looking for.
count - Which matching input field to return: 0 for the first, 1 for the second, and so on.
Returns:
The input field if found, otherwise null.

findOutputField

public OutputField findOutputField(Class<?> clazz,
                                   int count)
Find an output field by its class.

Parameters:
clazz - The output field class type you are looking for.
count - Which matching output field to return: 0 for the first, 1 for the second, and so on.
Returns:
The output field if found, otherwise null.
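findInputField and findOutputField share the same lookup pattern: walk a list, count the elements of the requested class, and return the one at position count. The following is a minimal generic sketch; FieldLookup is a hypothetical name, and it assumes that subclass instances also match (Class.isInstance). An exact-class comparison would be an equally plausible reading of "by its class".

```java
import java.util.List;

// Hypothetical sketch of a class-based lookup; not Encog's implementation.
final class FieldLookup {
    /**
     * Returns the count-th element (0 = first) that is an instance of clazz,
     * or null if there are not enough matching elements.
     */
    static <T> T findByClass(List<T> fields, Class<?> clazz, int count) {
        int seen = 0;
        for (T field : fields) {
            if (clazz.isInstance(field)) {
                if (seen == count) {
                    return field;
                }
                seen++;
            }
        }
        return null;
    }
}
```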

getCSVFormat

public CSVFormat getCSVFormat()
Returns:
The CSV format being used.

getGroups

public Set<OutputFieldGroup> getGroups()
Returns:
The output field groups.

getInputFields

public List<InputField> getInputFields()
Returns:
The input fields.

getNetworkInputLayerSize

public int getNetworkInputLayerSize()
Returns:
The number of output fields that are not used as ideal values. These fields are the input to the neural network, so this is the input layer size for the neural network.

getNetworkOutputLayerSize

public int getNetworkOutputLayerSize()
Returns:
The number of output fields that are used as ideal values. These fields are the ideal output of the neural network, so this is the output layer size for the neural network.

getOutputFieldCount

public int getOutputFieldCount()
Returns:
The total size of all output fields. This takes into account output fields that generate more than one value.
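The three size methods above partition the output fields by their ideal flag. The sketch below shows one way such sizes could be computed. It assumes that, as with getOutputFieldCount, a field producing several values (for example a one-of-three encoding) contributes each of them to the layer size; LayerSizes and its Field type are hypothetical stand-ins, not Encog classes.

```java
import java.util.List;

// Hypothetical model of how layer sizes could be derived from output fields.
final class LayerSizes {

    /** Illustrative stand-in for an output field. */
    static final class Field {
        final int valueCount; // values this field emits, e.g. 3 for a one-of-three encoding
        final boolean ideal;  // true if the field was added as an ideal output field

        Field(int valueCount, boolean ideal) {
            this.valueCount = valueCount;
            this.ideal = ideal;
        }
    }

    /** Sums the values emitted by non-ideal fields: the network input layer size. */
    static int inputLayerSize(List<Field> fields) {
        int total = 0;
        for (Field f : fields) {
            if (!f.ideal) {
                total += f.valueCount;
            }
        }
        return total;
    }

    /** Sums the values emitted by ideal fields: the network output layer size. */
    static int outputLayerSize(List<Field> fields) {
        int total = 0;
        for (Field f : fields) {
            if (f.ideal) {
                total += f.valueCount;
            }
        }
        return total;
    }

    /** Sums the values emitted by every field, ideal or not. */
    static int totalValueCount(List<Field> fields) {
        int total = 0;
        for (Field f : fields) {
            total += f.valueCount;
        }
        return total;
    }
}
```

Under these assumptions the input and output layer sizes always add up to the total value count.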

getOutputFields

public List<OutputField> getOutputFields()
Returns:
The output fields.

getRecordCount

public int getRecordCount()
Returns:
The record count.

getReport

public StatusReportable getReport()
Returns:
The object that progress will be reported to.

getSegregators

public List<Segregator> getSegregators()
Returns:
The segregators in use.

getStorage

public NormalizationStorage getStorage()
Returns:
The place that the normalization output will be stored.

initForOutput

public void initForOutput()
Set up the row for output.


initForPass

public void initForPass()
Set up the row for output.


init

public void init()

process

public void process()
Call this method to begin the normalization process. Any status updates are sent to the StatusReportable object specified by the setReport method.
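The two passes described in the class overview can be sketched for a single numeric column as follows. ProcessSketch and its Progress callback are hypothetical names: Progress stands in for the StatusReportable object, and the target list stands in for the storage set with setTarget. This illustrates the documented flow, not Encog's code.

```java
import java.util.List;

// Hypothetical sketch of the two-pass process() flow; all names are illustrative.
final class ProcessSketch {

    /** Stand-in for a progress callback (the role the StatusReportable object plays). */
    interface Progress {
        void report(int total, int current, String message);
    }

    /**
     * Normalizes one column into [0, 1] and appends the results to target
     * (the role the storage target plays). Returns the record count from pass 1.
     */
    static int process(List<Double> source, List<Double> target, Progress progress) {
        // Pass 1: count the records and gather the statistics (min and max).
        int count = 0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : source) {
            count++;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        // Pass 2: normalize each record, write it to the target, and report progress.
        double span = max - min;
        int current = 0;
        for (double v : source) {
            target.add(span == 0 ? 0.5 : (v - min) / span);
            current++;
            progress.report(count, current, "Normalizing");
        }
        return count;
    }
}
```

Counting in the first pass is what lets the second pass report progress as "current of total"; a single-pass design would know neither the total nor the min and max until the data had been read once.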


setCSVFormat

public void setCSVFormat(CSVFormat csvFormat)
Set the CSV format to use.

Parameters:
csvFormat - The CSV format to use.

setReport

public void setReport(StatusReportable report)
Set the object that progress will be reported to.

Parameters:
report - The object that progress reports should be sent to.

setTarget

public void setTarget(NormalizationStorage target)
Determines where the normalized data will be sent.

Parameters:
target - The storage target that will receive the normalized data.

twoPassesNeeded

public boolean twoPassesNeeded()
Returns:
True if two passes are needed.


Copyright © 2014. All Rights Reserved.