org.encog.app.analyst.csv.balance
Class BalanceCSV

java.lang.Object
  extended by org.encog.app.analyst.csv.basic.BasicFile
      extended by org.encog.app.analyst.csv.balance.BalanceCSV
All Implemented Interfaces:
QuantTask

public class BalanceCSV
extends BasicFile

Balance a CSV file. This utility is useful when you have an unbalanced training set: a large number of elements of one particular class and far fewer elements of the other classes. Such an imbalance can hinder many machine learning methods, and this class can be used to balance the data. It cannot generate data, so you must specify how many items you want per class. Any class above this amount is trimmed to that amount. A class that was already below the specified amount keeps its lower count.
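
The trimming rule described above can be sketched in plain Java. This is a minimal illustration, not Encog's implementation: the class name `BalanceSketch`, the in-memory row representation, and the choice to keep the first `countPer` rows of each class in input order are all assumptions, since the description does not say which rows are discarded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of per-class trimming; not part of Encog. */
final class BalanceSketch {

    /**
     * Keep at most countPer rows per class, preserving input order.
     * The class of a row is the value found in column targetField.
     * Assumption: the first countPer rows of each class are the ones kept.
     */
    static List<String[]> balance(List<String[]> rows, int targetField, int countPer) {
        Map<String, Integer> kept = new HashMap<>();
        List<String[]> result = new ArrayList<>();
        for (String[] row : rows) {
            String key = row[targetField];
            int soFar = kept.getOrDefault(key, 0);
            if (soFar < countPer) {
                // Still below the requested amount for this class: keep the row.
                result.add(row);
                kept.put(key, soFar + 1);
            }
            // Otherwise the class is already full and the row is dropped.
        }
        return result;
    }
}
```

With three rows of class "a", one row of class "b", and `countPer` set to 2, this sketch keeps two "a" rows and the single "b" row. No rows are invented for the smaller class.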


Field Summary
 
Fields inherited from class org.encog.app.analyst.csv.basic.BasicFile
REPORT_INTERVAL
 
Constructor Summary
BalanceCSV()
           
 
Method Summary
 void analyze(File inputFile, boolean headers, CSVFormat format)
          Analyze the data.
 String dumpCounts()
          Return a string that lists the counts per class.
 Map<String,Integer> getCounts()
           
 void process(File outputFile, int targetField, int countPer)
          Process and balance the data.
 
Methods inherited from class org.encog.app.analyst.csv.basic.BasicFile
appendSeparator, getColumnCount, getFormat, getInputFilename, getInputHeadings, getPrecision, getRecordCount, getReport, getReportInterval, getScript, isAnalyzed, isExpectInputHeaders, isProduceOutputHeaders, performBasicCounts, prepareOutputFile, readHeaders, reportDone, reportDone, requestStop, resetStatus, setAnalyzed, setColumnCount, setExpectInputHeaders, setInputFilename, setInputFormat, setInputHeadings, setPrecision, setProduceOutputHeaders, setRecordCount, setReport, setReportInterval, setScript, shouldStop, toString, updateStatus, updateStatus, validateAnalyzed, writeRow
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

BalanceCSV

public BalanceCSV()
Method Detail

analyze

public void analyze(File inputFile,
                    boolean headers,
                    CSVFormat format)
Analyze the data. This counts the records and prepares the data to be processed.

Parameters:
inputFile - The input file to process.
headers - True if headers are present.
format - The format of the CSV file.
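
As a rough picture of the record-counting part of this step, the sketch below counts the data lines in a file and skips the first line when headers are present. It is an assumption-laden illustration rather than what `analyze` actually does: the class `AnalyzeSketch`, the UTF-8 encoding, and the decision to ignore blank lines are hypothetical, and the `CSVFormat` argument is left out entirely.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical illustration of counting records; not part of Encog. */
final class AnalyzeSketch {

    /**
     * Count the data records in a CSV file.
     * When headers is true, the first line is treated as a header and not counted.
     * Assumptions: the file is UTF-8 and blank lines are not records.
     */
    static int countRecords(File inputFile, boolean headers) throws IOException {
        int records = 0;
        try (BufferedReader reader =
                Files.newBufferedReader(inputFile.toPath(), StandardCharsets.UTF_8)) {
            if (headers) {
                reader.readLine(); // discard the header row (null if the file is empty)
            }
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    records++;
                }
            }
        }
        return records;
    }
}
```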

dumpCounts

public String dumpCounts()
Return a string that lists the counts per class.

Returns:
The counts per class.

getCounts

public Map<String,Integer> getCounts()
Returns:
A map of the counts for each class.
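
To show the kind of data a `Map<String,Integer>` of class counts holds, and how a per-class listing could be rendered from it, here is a small stand-alone sketch. The class `CountsSketch`, the sorted key order, and the `name : count` line format are assumptions made for illustration. The actual text produced by `dumpCounts()` is not specified here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of per-class counts; not part of Encog. */
final class CountsSketch {

    /** Count how many rows fall into each class, keyed by the value in targetField. */
    static Map<String, Integer> countByClass(List<String[]> rows, int targetField) {
        // TreeMap keeps keys sorted so the listing below is deterministic.
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] row : rows) {
            counts.merge(row[targetField], 1, Integer::sum);
        }
        return counts;
    }

    /** Render one "class : count" line per class (assumed format). */
    static String dump(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append(" : ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```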

process

public void process(File outputFile,
                    int targetField,
                    int countPer)
Process and balance the data.

Parameters:
outputFile - The output file to write data to.
targetField - The field that is being balanced; its values determine the classes.
countPer - The desired count per class.


Copyright © 2014. All Rights Reserved.