Constructor and Description |
---|
SummaryStatsGenerator(long datasetSchemaId, long datasetVersionId, org.wso2.carbon.ml.commons.domain.config.SummaryStatisticsSettings summaryStatsSettings, DatasetProcessor processor) |
Modifier and Type | Method and Description |
---|---|
protected List<org.apache.commons.math3.stat.descriptive.DescriptiveStatistics> | calculateDescriptiveStats() - Calculate descriptive statistics for numerical columns. |
protected List<SortedMap<?,Integer>> | calculateIntervalFreqs(int column, int intervals) - Calculate the frequencies of each interval of a column. |
protected List<SortedMap<?,Integer>> | calculateNumericColumnFrequencies() - Calculate the frequencies of each category/interval of numerical data columns. |
protected List<SortedMap<?,Integer>> | calculateStringColumnFrequencies() - Calculate the frequencies of each category in String columns, needed to plot bar graphs/histograms. |
protected String[] | identifyColumnDataType() - Find the columns with categorical data and the columns with numerical data. |
void | run() - Get a summary of a sample from the given CSV file, including descriptive statistics, missing values, unique values, etc. |
public SummaryStatsGenerator(long datasetSchemaId, long datasetVersionId, org.wso2.carbon.ml.commons.domain.config.SummaryStatisticsSettings summaryStatsSettings, DatasetProcessor processor)
public void run()
protected String[] identifyColumnDataType()
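This page does not show how the categorical/numerical decision is made. As a rough illustration only, the sketch below labels a column numerical when every non-missing sample value parses as a number, and categorical otherwise. The class name `ColumnTypeSketch`, the method `identifyTypes`, and the label strings are hypothetical and are not part of the documented API; the real method may apply other rules, such as a threshold on the number of unique values.

```java
import java.util.Arrays;
import java.util.List;

public class ColumnTypeSketch {
    // Hypothetical labels; the strings returned by the real method are not documented here.
    static final String NUMERICAL = "NUMERICAL";
    static final String CATEGORICAL = "CATEGORICAL";

    // Returns true if the cell can be parsed as a double.
    static boolean isNumeric(String value) {
        try {
            Double.parseDouble(value.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // One label per column: NUMERICAL unless some non-missing cell fails to parse.
    static String[] identifyTypes(List<String[]> rows, int columnCount) {
        String[] types = new String[columnCount];
        Arrays.fill(types, NUMERICAL);
        for (String[] row : rows) {
            for (int c = 0; c < columnCount; c++) {
                String cell = c < row.length ? row[c] : "";
                if (cell == null || cell.trim().isEmpty()) {
                    continue; // treat empty cells as missing values
                }
                if (!isNumeric(cell)) {
                    types[c] = CATEGORICAL;
                }
            }
        }
        return types;
    }

    public static void main(String[] args) {
        List<String[]> sample = Arrays.asList(
                new String[] {"5.1", "setosa"},
                new String[] {"", "versicolor"},
                new String[] {"6.3", "virginica"});
        System.out.println(Arrays.toString(identifyTypes(sample, 2)));
    }
}
```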
protected List<org.apache.commons.math3.stat.descriptive.DescriptiveStatistics> calculateDescriptiveStats()
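The method returns one Commons Math `DescriptiveStatistics` object per numerical column. To show the one-accumulator-per-column idea without requiring that library, the sketch below swaps in the JDK's `java.util.DoubleSummaryStatistics`, which tracks only count, sum, min, max, and average. The class `DescriptiveStatsSketch`, the method `perColumnStats`, and the use of `NaN` to mark missing values are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.List;

public class DescriptiveStatsSketch {
    // Builds one statistics accumulator per column and feeds each row into it.
    // Assumption: missing values are encoded as NaN and are skipped.
    static List<DoubleSummaryStatistics> perColumnStats(List<double[]> rows, int columnCount) {
        List<DoubleSummaryStatistics> stats = new ArrayList<>();
        for (int c = 0; c < columnCount; c++) {
            stats.add(new DoubleSummaryStatistics());
        }
        for (double[] row : rows) {
            for (int c = 0; c < columnCount; c++) {
                if (!Double.isNaN(row[c])) {
                    stats.get(c).accept(row[c]);
                }
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        List<double[]> rows = new ArrayList<>();
        rows.add(new double[] {1.0, 10.0});
        rows.add(new double[] {3.0, Double.NaN});
        for (DoubleSummaryStatistics s : perColumnStats(rows, 2)) {
            System.out.println(s);
        }
    }
}
```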
protected List<SortedMap<?,Integer>> calculateStringColumnFrequencies()
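Category frequencies for a single String column can be pictured as a sorted map from category to count, which matches the `SortedMap<?,Integer>` element type in the signature. The following is a minimal sketch under that reading; `CategoryFrequencySketch` and `categoryFrequencies` are hypothetical names, and skipping empty cells is an assumption rather than documented behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class CategoryFrequencySketch {
    // Counts how often each category occurs; keys are kept in sorted order.
    static SortedMap<String, Integer> categoryFrequencies(List<String> values) {
        SortedMap<String, Integer> freq = new TreeMap<>();
        for (String v : values) {
            if (v == null || v.isEmpty()) {
                continue; // assumption: empty cells are missing values, not a category
            }
            freq.merge(v, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("red", "blue", "red", "", "green", "red");
        System.out.println(categoryFrequencies(column)); // prints {blue=1, green=1, red=3}
    }
}
```

A sorted map keeps the bars of a bar graph in a stable order from one run to the next.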
protected List<SortedMap<?,Integer>> calculateNumericColumnFrequencies()
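For numerical columns, interval frequencies amount to a histogram. The sketch below assumes equal-width intervals between the column's minimum and maximum, keyed by interval index; the binning rule actually used by the class is not stated on this page. `IntervalFrequencySketch` and `intervalFrequencies` are hypothetical names, and the input is assumed to contain no `NaN` values.

```java
import java.util.Arrays;
import java.util.SortedMap;
import java.util.TreeMap;

public class IntervalFrequencySketch {
    // Splits [min, max] into equal-width intervals and counts the values in each.
    // Keys are interval indices 0..intervals-1; every interval appears, even if empty.
    static SortedMap<Integer, Integer> intervalFrequencies(double[] values, int intervals) {
        if (intervals <= 0) {
            throw new IllegalArgumentException("intervals must be positive");
        }
        SortedMap<Integer, Integer> freq = new TreeMap<>();
        for (int i = 0; i < intervals; i++) {
            freq.put(i, 0);
        }
        if (values.length == 0) {
            return freq;
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / intervals;
        for (double v : values) {
            // If all values are equal, width is 0 and everything goes to interval 0.
            int bin = width == 0 ? 0 : (int) ((v - min) / width);
            if (bin >= intervals) {
                bin = intervals - 1; // the maximum value belongs to the last interval
            }
            freq.merge(bin, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        double[] column = {0, 1, 2, 3, 4, 10};
        System.out.println(intervalFrequencies(column, 2)); // prints {0=5, 1=1}
    }
}
```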
Copyright © 2015 WSO2, Inc. All Rights Reserved.