org.encog.app.analyst.csv.balance
Class BalanceCSV
java.lang.Object
org.encog.app.analyst.csv.basic.BasicFile
org.encog.app.analyst.csv.balance.BalanceCSV
- All Implemented Interfaces:
- QuantTask
public class BalanceCSV
- extends BasicFile
Balance a CSV file. This utility is useful when you have an unbalanced
training set: a large number of elements of one particular class and many
fewer elements of other classes. Such an imbalance can hinder many Machine
Learning methods. This class can be used to balance the data.
This class cannot generate data. You must specify how many items you want
per class. Classes that were already below the specified amount keep their
lower count. Any class above this amount will be trimmed to that amount.
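The trimming rule described above can be illustrated with a minimal, standalone sketch. It does not use Encog and is not the BalanceCSV implementation: the class name BalanceSketch and its balance method are hypothetical, and keeping the first rows seen for each class is an assumption, since the text does not say which rows are discarded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Encog.
public class BalanceSketch {

    /**
     * Keep at most countPer rows for each distinct value in column targetField.
     * Rows keep their original order. Classes with fewer rows than countPer
     * are passed through unchanged, because no data is generated.
     */
    public static List<String[]> balance(List<String[]> rows, int targetField, int countPer) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        List<String[]> result = new ArrayList<>();
        for (String[] row : rows) {
            String key = row[targetField];
            int soFar = kept.getOrDefault(key, 0);
            // Assumption: the first countPer rows of each class are the ones kept.
            if (soFar < countPer) {
                result.add(row);
                kept.put(key, soFar + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"5.1", "setosa"});
        rows.add(new String[] {"4.9", "setosa"});
        rows.add(new String[] {"4.7", "setosa"});
        rows.add(new String[] {"6.3", "virginica"});

        // Ask for 2 rows per class: "setosa" is trimmed from 3 to 2,
        // "virginica" stays at 1 because it was already below the target.
        for (String[] row : balance(rows, 1, 2)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Here the arguments targetField and countPer are named after the parameters of process below, but the sketch works on in-memory rows rather than on files.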
Methods inherited from class org.encog.app.analyst.csv.basic.BasicFile
appendSeparator, getColumnCount, getFormat, getInputFilename, getInputHeadings, getPrecision, getRecordCount, getReport, getReportInterval, getScript, isAnalyzed, isExpectInputHeaders, isProduceOutputHeaders, performBasicCounts, prepareOutputFile, readHeaders, reportDone, reportDone, requestStop, resetStatus, setAnalyzed, setColumnCount, setExpectInputHeaders, setInputFilename, setInputFormat, setInputHeadings, setPrecision, setProduceOutputHeaders, setRecordCount, setReport, setReportInterval, setScript, shouldStop, toString, updateStatus, updateStatus, validateAnalyzed, writeRow
BalanceCSV
public BalanceCSV()
analyze
public void analyze(File inputFile,
boolean headers,
CSVFormat format)
- Analyze the data. This counts the records and prepares the data to be
processed.
- Parameters:
inputFile - The input file to process.
headers - True, if headers are present.
format - The format of the CSV file.
dumpCounts
public String dumpCounts()
- Return a string that lists the counts per class.
- Returns:
- The counts per class.
getCounts
public Map<String,Integer> getCounts()
- Returns:
- The counts of each class.
process
public void process(File outputFile,
int targetField,
int countPer)
- Process and balance the data.
- Parameters:
outputFile - The output file to write data to.
targetField - The field that is being balanced; this field determines the classes.
countPer - The desired count per class.
Copyright © 2014. All Rights Reserved.