com.univocity.parsers.common.processor.core
Class AbstractBatchedColumnProcessor<T extends Context>

java.lang.Object
  extended by com.univocity.parsers.common.processor.core.AbstractBatchedColumnProcessor<T>
All Implemented Interfaces:
Processor<T>
Direct Known Subclasses:
BatchedColumnProcessor

public abstract class AbstractBatchedColumnProcessor<T extends Context>
extends Object
implements Processor<T>

A Processor implementation that stores values of columns in batches. Use this implementation in favor of AbstractColumnProcessor when processing large inputs to avoid running out of memory. Values parsed in each row will be split into columns of Strings. Each column has its own list of values.

During the execution of the process, the batchProcessed(int) method will be invoked after a given number of rows has been processed.

The user can access the lists with values parsed for all columns using the methods getColumnValuesAsList(), getColumnValuesAsMapOfIndexes() and getColumnValuesAsMapOfNames().

After batchProcessed(int) is invoked, all values will be discarded and the next batch of column values will be accumulated. This process will repeat until there are no more rows in the input.
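
The accumulate, notify, discard cycle described above can be modelled in plain Java. The sketch below is illustrative only: BatchingSketch, its fields and its demo() method are hypothetical names and are not part of univocity-parsers. It assumes every row has the same number of values, and it assumes that a trailing partial batch is reported once the input ends.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: this class is NOT part of univocity-parsers.
final class BatchingSketch {
    private final int rowsPerBatch;
    private final List<List<String>> columns = new ArrayList<>(); // one list per column
    private final List<String> log = new ArrayList<>();           // what each batch contained
    private int rowsInBatch;

    BatchingSketch(int rowsPerBatch) {
        this.rowsPerBatch = rowsPerBatch;
    }

    // Splits one row into the per-column lists, then flushes when the batch is full.
    void rowProcessed(String[] row) {
        for (int i = 0; i < row.length; i++) {
            while (columns.size() <= i) {
                columns.add(new ArrayList<>());
            }
            columns.get(i).add(row[i]);
        }
        if (++rowsInBatch == rowsPerBatch) {
            flush();
        }
    }

    // Assumption: a trailing partial batch is reported when the input ends.
    void processEnded() {
        if (rowsInBatch > 0) {
            flush();
        }
    }

    // Stands in for the batchProcessed(int) callback: record the batch, then discard it.
    private void flush() {
        log.add(rowsInBatch + " rows: " + columns);
        columns.clear();
        rowsInBatch = 0;
    }

    // Feeds three two-column rows through a processor with two rows per batch.
    static List<String> demo() {
        BatchingSketch sketch = new BatchingSketch(2);
        sketch.rowProcessed(new String[]{"a", "b"});
        sketch.rowProcessed(new String[]{"c", "d"});
        sketch.rowProcessed(new String[]{"e", "f"});
        sketch.processEnded();
        return sketch.log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

With two rows per batch and three input rows, the model reports one full batch followed by one batch holding the single remaining row. Memory use is bounded by the batch size rather than by the size of the input.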

Author:
uniVocity Software Pty Ltd - parsers@univocity.com
See Also:
AbstractParser, BatchedColumnReader, Processor

Constructor Summary
AbstractBatchedColumnProcessor(int rowsPerBatch)
          Constructs a batched column processor configured to invoke the batchProcessed(int) method after a given number of rows has been processed.
 
Method Summary
abstract  void batchProcessed(int rowsInThisBatch)
          Callback to the user, where the lists with values parsed for all columns can be accessed using the methods getColumnValuesAsList(), getColumnValuesAsMapOfIndexes() and getColumnValuesAsMapOfNames().
 int getBatchesProcessed()
          Returns the number of batches already processed
 List<String> getColumn(int columnIndex)
          Returns the values of a given column.
 List<String> getColumn(String columnName)
          Returns the values of a given column.
 List<List<String>> getColumnValuesAsList()
          Returns the values processed for each column
 Map<Integer,List<String>> getColumnValuesAsMapOfIndexes()
          Returns a map of column indexes and their respective list of values parsed from the input.
 Map<String,List<String>> getColumnValuesAsMapOfNames()
          Returns a map of column names and their respective list of values parsed from the input.
 String[] getHeaders()
          Returns the column headers.
 int getRowsPerBatch()
          Returns the number of rows processed in each batch
 void processEnded(T context)
          This method will be invoked by the parser once, after the parsing process stopped and all resources were closed.
 void processStarted(T context)
          This method will be invoked by the parser once, when it is ready to start processing the input.
 void putColumnValuesInMapOfIndexes(Map<Integer,List<String>> map)
          Fills a given map associating each column index to its list of values
 void putColumnValuesInMapOfNames(Map<String,List<String>> map)
          Fills a given map associating each column name to its list of values
 void rowProcessed(String[] row, T context)
          Invoked by the parser after all values of a valid record have been processed.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AbstractBatchedColumnProcessor

public AbstractBatchedColumnProcessor(int rowsPerBatch)
Constructs a batched column processor configured to invoke the batchProcessed(int) method after a given number of rows has been processed.

Parameters:
rowsPerBatch - the number of rows to process in each batch.

Method Detail

processStarted

public void processStarted(T context)
Description copied from interface: Processor
This method will be invoked by the parser once, when it is ready to start processing the input.

Specified by:
processStarted in interface Processor<T extends Context>
Parameters:
context - A contextual object with information and controls over the current state of the parsing process

rowProcessed

public void rowProcessed(String[] row,
                         T context)
Description copied from interface: Processor
Invoked by the parser after all values of a valid record have been processed.

Specified by:
rowProcessed in interface Processor<T extends Context>
Parameters:
row - the data extracted by the parser for an individual record. Note that:
context - A contextual object with information and controls over the current state of the parsing process

processEnded

public void processEnded(T context)
Description copied from interface: Processor
This method will be invoked by the parser once, after the parsing process stopped and all resources were closed.

It will always be called by the parser: in case of errors, if the end of the input is reached, or if the user stopped the process manually using Context.stop().

Specified by:
processEnded in interface Processor<T extends Context>
Parameters:
context - A contextual object with information and controls over the state of the parsing process

getHeaders

public final String[] getHeaders()
Returns the column headers. These are either the headers defined in CommonSettings.getHeaders() or the headers parsed from the input when CommonParserSettings.isHeaderExtractionEnabled() is true.

Returns:
the headers of all columns parsed.

getColumnValuesAsList

public final List<List<String>> getColumnValuesAsList()
Returns the values processed for each column

Returns:
a list of lists. The position of each stored list corresponds to the position of its column in the input. Each list contains the values parsed for that column across multiple rows.

putColumnValuesInMapOfNames

public final void putColumnValuesInMapOfNames(Map<String,List<String>> map)
Fills a given map associating each column name to its list of values

Parameters:
map - the map to hold the values of each column

putColumnValuesInMapOfIndexes

public final void putColumnValuesInMapOfIndexes(Map<Integer,List<String>> map)
Fills a given map associating each column index to its list of values

Parameters:
map - the map to hold the values of each column
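
Because both putColumnValuesIn... methods fill a map supplied by the caller, the caller decides which Map implementation to use, for example a TreeMap for sorted keys. The plain-Java sketch below models that idea only. ColumnMapSketch, putByName and putByIndex are hypothetical helpers, not part of univocity-parsers, and they make no claim about how the library fills the map internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model only: this class is NOT part of univocity-parsers.
final class ColumnMapSketch {

    // Hypothetical helper: associates each header with its column's values
    // in whatever Map implementation the caller supplies.
    static void putByName(String[] headers, List<List<String>> columns,
                          Map<String, List<String>> target) {
        for (int i = 0; i < columns.size(); i++) {
            target.put(headers[i], columns.get(i));
        }
    }

    // Hypothetical helper: same idea, keyed by 0-based column position.
    static void putByIndex(List<List<String>> columns,
                           Map<Integer, List<String>> target) {
        for (int i = 0; i < columns.size(); i++) {
            target.put(i, columns.get(i));
        }
    }

    public static void main(String[] args) {
        String[] headers = {"name", "city"};
        List<List<String>> columns = Arrays.asList(
                Arrays.asList("Ann", "Bob"),
                Arrays.asList("Oslo", "Rome"));

        // A TreeMap keeps the column names in sorted order.
        Map<String, List<String>> byName = new TreeMap<>();
        putByName(headers, columns, byName);
        System.out.println(byName);
    }
}
```

Passing a LinkedHashMap instead would keep the columns in input order; the helpers themselves do not care which implementation they receive.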

getColumnValuesAsMapOfNames

public final Map<String,List<String>> getColumnValuesAsMapOfNames()
Returns a map of column names and their respective list of values parsed from the input.

Returns:
a map of column names and their respective list of values.

getColumnValuesAsMapOfIndexes

public final Map<Integer,List<String>> getColumnValuesAsMapOfIndexes()
Returns a map of column indexes and their respective list of values parsed from the input.

Returns:
a map of column indexes and their respective list of values.

getColumn

public List<String> getColumn(String columnName)
Returns the values of a given column.

Parameters:
columnName - the name of the column in the input.
Returns:
a list with all data stored in the given column

getColumn

public List<String> getColumn(int columnIndex)
Returns the values of a given column.

Parameters:
columnIndex - the position of the column in the input (0-based).
Returns:
a list with all data stored in the given column

getRowsPerBatch

public int getRowsPerBatch()
Returns the number of rows processed in each batch

Returns:
the number of rows per batch

getBatchesProcessed

public int getBatchesProcessed()
Returns the number of batches already processed

Returns:
the number of batches already processed

batchProcessed

public abstract void batchProcessed(int rowsInThisBatch)
Callback to the user, where the lists with values parsed for all columns can be accessed using the methods getColumnValuesAsList(), getColumnValuesAsMapOfIndexes() and getColumnValuesAsMapOfNames().

Parameters:
rowsInThisBatch - the number of rows processed in the current batch. This corresponds to the number of elements of each list of each column.
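
Since the column values are discarded after each callback, an implementation of batchProcessed(int) typically has to reduce or copy what it needs before returning. The plain-Java sketch below models such a callback. BatchCallbackSketch is a hypothetical class, not part of univocity-parsers, and the amounts parameter stands in for the list a real subclass would obtain from getColumn(...).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only: this class is NOT part of univocity-parsers.
final class BatchCallbackSketch {
    private long total;                                          // aggregate kept across batches
    private final List<Integer> batchSizes = new ArrayList<>();  // rows seen in each batch

    // Hypothetical stand-in for a batchProcessed(int) override. `amounts` plays
    // the role of one column's list of values for the current batch.
    void batchProcessed(int rowsInThisBatch, List<String> amounts) {
        // Each column list is expected to hold exactly rowsInThisBatch elements.
        if (amounts.size() != rowsInThisBatch) {
            throw new IllegalArgumentException("column size does not match batch size");
        }
        // Reduce the batch to a running total instead of retaining the values.
        for (String amount : amounts) {
            total += Long.parseLong(amount);
        }
        batchSizes.add(rowsInThisBatch);
    }

    long total() {
        return total;
    }

    List<Integer> batchSizes() {
        return batchSizes;
    }

    public static void main(String[] args) {
        BatchCallbackSketch sketch = new BatchCallbackSketch();
        sketch.batchProcessed(2, Arrays.asList("1", "2"));
        sketch.batchProcessed(1, Arrays.asList("3")); // a smaller final batch
        System.out.println(sketch.total() + " " + sketch.batchSizes());
    }
}
```

Only the running total and the batch sizes survive between callbacks, so the memory held by the model does not grow with the number of rows.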


Copyright © 2017 uniVocity Software Pty Ltd. All rights reserved.