org.apache.spark.sql.execution.aggregate

TungstenAggregationIterator

class TungstenAggregationIterator extends AggregationIterator with Logging

An iterator used to evaluate aggregate functions. It operates on UnsafeRows.

This iterator first uses hash-based aggregation to process input rows, storing groups and their corresponding aggregation buffers in a hash map. If the map cannot allocate memory from the memory manager, the iterator spills the map to disk and creates a new one. After all input has been processed, it merges the spills using an external sorter and performs sort-based aggregation.

The process has the following steps:

  • Step 0: Do hash-based aggregation.
  • Step 1: Sort all entries of the hash map based on values of grouping expressions and spill them to disk.
  • Step 2: Create an external sorter based on the spilled sorted map entries and reset the map.
  • Step 3: Get a sorted KVIterator from the external sorter.
  • Step 4: Repeat step 0 until there is no more input.
  • Step 5: Initialize sort-based aggregation on the sorted iterator. Then, this iterator works in the way of sort-based aggregation.
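
The steps above can be illustrated with a minimal, self-contained sketch in plain Scala. It is not Spark's implementation: the `HashThenSortAggregation` object, the `aggregate` function, and the `maxGroupsInMemory` parameter are hypothetical names, the group-count limit stands in for the memory manager refusing an allocation, spills are kept in memory rather than written to disk, and an in-memory sort stands in for the external sorter.

```scala
import scala.collection.mutable

object HashThenSortAggregation {
  /** Sums values per key. `maxGroupsInMemory` is a hypothetical stand-in
    * for the memory manager refusing to grow the hash map. */
  def aggregate(
      input: Iterator[(String, Long)],
      maxGroupsInMemory: Int): Iterator[(String, Long)] = {
    require(maxGroupsInMemory > 0, "maxGroupsInMemory must be positive")
    val hashMap = mutable.HashMap.empty[String, Long]
    // Each spill is a run of (key, partial sum) pairs sorted by key.
    // A real implementation would write these runs to disk.
    val spills = mutable.ArrayBuffer.empty[Vector[(String, Long)]]

    // Steps 1-2: sort the map entries by grouping key, spill them, reset the map.
    def spill(): Unit = {
      spills += hashMap.toVector.sortBy(_._1)
      hashMap.clear()
    }

    // Step 0, repeated until the input is exhausted (Step 4).
    input.foreach { case (key, value) =>
      if (!hashMap.contains(key) && hashMap.size >= maxGroupsInMemory) spill()
      hashMap.update(key, hashMap.getOrElse(key, 0L) + value)
    }

    if (spills.isEmpty) {
      // The map never ran out of room, so the result is purely hash-based.
      hashMap.toVector.iterator
    } else {
      spill() // the last map joins the other spills
      // Step 3: one sorted iterator over all spilled entries.
      val sorted = spills.flatten.sortBy(_._1).iterator.buffered
      // Step 5: sort-based aggregation merges adjacent entries with equal keys.
      new Iterator[(String, Long)] {
        def hasNext: Boolean = sorted.hasNext
        def next(): (String, Long) = {
          val (key, first) = sorted.next()
          var sum = first
          while (sorted.hasNext && sorted.head._1 == key) sum += sorted.next()._2
          (key, sum)
        }
      }
    }
  }
}
```

In this sketch a key can appear in several spills with different partial sums. The final sort places those partial results next to each other, so a single pass can merge them.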

The code of this class is organized as follows:

  • Part 1: Initializing aggregate functions.
  • Part 2: Methods and fields used for setting aggregation buffer values, processing input rows from inputIter, and generating output rows.
  • Part 3: Methods and fields used by hash-based aggregation.
  • Part 4: Methods and fields used when we switch to sort-based aggregation.
  • Part 5: Methods and fields used by sort-based aggregation.
  • Part 6: Loading input and processing input rows.
  • Part 7: Public methods of this iterator.
  • Part 8: A utility function used to generate a result when there is no input and there is no grouping expression.
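
The special case named in Part 8 can be sketched in plain Scala. This is an illustration of the usual SQL behavior, not Spark's code: an aggregate without grouping expressions still produces one row for empty input, built from the initial buffer values, while a grouped aggregate produces no rows. The `EmptyInputSketch` object and `countRows` function are hypothetical, and COUNT(*) is used as the only aggregate function.

```scala
object EmptyInputSketch {
  /** COUNT(*) per group; an empty `groupingColumns` means a global aggregate. */
  def countRows(
      rows: Seq[Map[String, Any]],
      groupingColumns: Seq[String]): Seq[(Seq[Any], Long)] = {
    if (rows.isEmpty && groupingColumns.isEmpty) {
      // No input and no grouping expression: emit a single row
      // holding the initial buffer value (count = 0).
      Seq((Seq.empty[Any], 0L))
    } else {
      // Otherwise emit one row per distinct grouping key.
      // With empty input this yields no rows at all.
      rows
        .groupBy(row => groupingColumns.map(col => row(col)))
        .map { case (key, group) => (key, group.size.toLong) }
        .toSeq
    }
  }
}
```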
Linear Supertypes
AggregationIterator, Logging, Iterator[UnsafeRow], IterableOnceOps[UnsafeRow, Iterator, Iterator[UnsafeRow]], IterableOnce[UnsafeRow], AnyRef, Any

Instance Constructors

  1. new TungstenAggregationIterator(partIndex: Int, groupingExpressions: Seq[NamedExpression], aggregateExpressions: Seq[AggregateExpression], aggregateAttributes: Seq[Attribute], initialInputBufferOffset: Int, resultExpressions: Seq[NamedExpression], newMutableProjection: (Seq[Expression], Seq[Attribute]) => MutableProjection, originalInputAttributes: Seq[Attribute], inputIter: Iterator[InternalRow], testFallbackStartsAt: Option[(Int, Int)], numOutputRows: SQLMetric, peakMemory: SQLMetric, spillSize: SQLMetric, avgHashProbe: SQLMetric, numTasksFallBacked: SQLMetric)

    partIndex

    index of the partition

    groupingExpressions

    expressions for grouping keys

    aggregateExpressions

    AggregateExpression containing AggregateFunctions with mode Partial, PartialMerge, or Final.

    aggregateAttributes

    the attributes of the aggregateExpressions' outputs when they are stored in the final aggregation buffer.

    resultExpressions

    expressions for generating output rows.

    newMutableProjection

    the function used to create mutable projections.

    originalInputAttributes

    attributes representing input rows from inputIter.

    inputIter

    the iterator containing input UnsafeRows.

Type Members

  1. class GroupedIterator[B >: A] extends AbstractIterator[Seq[B]]
    Definition Classes
    Iterator

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ++[B >: UnsafeRow](xs: => IterableOnce[B]): Iterator[B]
    Definition Classes
    Iterator
    Annotations
    @inline()
  4. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  5. final def addString(b: StringBuilder): StringBuilder
    Definition Classes
    IterableOnceOps
    Annotations
    @inline()
  6. final def addString(b: StringBuilder, sep: String): StringBuilder
    Definition Classes
    IterableOnceOps
    Annotations
    @inline()
  7. def addString(b: StringBuilder, start: String, sep: String, end: String): StringBuilder
    Definition Classes
    IterableOnceOps
  8. val aggregateFunctions: Array[AggregateFunction]
    Attributes
    protected
    Definition Classes
    AggregationIterator
  9. val allImperativeAggregateFunctionPositions: Array[Int]
    Attributes
    protected[this]
    Definition Classes
    AggregationIterator
  10. val allImperativeAggregateFunctions: Array[ImperativeAggregate]
    Attributes
    protected[this]
    Definition Classes
    AggregationIterator
  11. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  12. def buffered: BufferedIterator[UnsafeRow]
    Definition Classes
    Iterator
  13. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  14. def collect[B](pf: PartialFunction[UnsafeRow, B]): Iterator[B]
    Definition Classes
    Iterator → IterableOnceOps
  15. def collectFirst[B](pf: PartialFunction[UnsafeRow, B]): Option[B]
    Definition Classes
    IterableOnceOps
  16. def concat[B >: UnsafeRow](xs: => IterableOnce[B]): Iterator[B]
    Definition Classes
    Iterator
  17. def contains(elem: Any): Boolean
    Definition Classes
    Iterator
  18. def copyToArray[B >: UnsafeRow](xs: Array[B], start: Int, len: Int): Int
    Definition Classes
    IterableOnceOps
  19. def copyToArray[B >: UnsafeRow](xs: Array[B], start: Int): Int
    Definition Classes
    IterableOnceOps
    Annotations
    @deprecatedOverriding("This should always forward to the 3-arg version of this method", "2.13.4")
  20. def copyToArray[B >: UnsafeRow](xs: Array[B]): Int
    Definition Classes
    IterableOnceOps
    Annotations
    @deprecatedOverriding("This should always forward to the 3-arg version of this method", "2.13.4")
  21. def corresponds[B](that: IterableOnce[B])(p: (UnsafeRow, B) => Boolean): Boolean
    Definition Classes
    IterableOnceOps
  22. def count(p: (UnsafeRow) => Boolean): Int
    Definition Classes
    IterableOnceOps
  23. def distinct: Iterator[UnsafeRow]
    Definition Classes
    Iterator
  24. def distinctBy[B](f: (UnsafeRow) => B): Iterator[UnsafeRow]
    Definition Classes
    Iterator
  25. def drop(n: Int): Iterator[UnsafeRow]
    Definition Classes
    Iterator → IterableOnceOps
  26. def dropWhile(p: (UnsafeRow) => Boolean): Iterator[UnsafeRow]
    Definition Classes
    Iterator → IterableOnceOps
  27. def duplicate: (Iterator[UnsafeRow], Iterator[UnsafeRow])
    Definition Classes
    Iterator
  28. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  29. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  30. def exists(p: (UnsafeRow) => Boolean): Boolean
    Definition Classes
    IterableOnceOps
  31. val expressionAggInitialProjection: MutableProjection
    Attributes
    protected[this]
    Definition Classes
    AggregationIterator
  32. def filter(p: (UnsafeRow) => Boolean): Iterator[UnsafeRow]
    Definition Classes
    Iterator → IterableOnceOps
  33. def filterNot(p: (UnsafeRow) => Boolean): Iterator[UnsafeRow]
    Definition Classes
    Iterator → IterableOnceOps
  34. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  35. def find(p: (UnsafeRow) => Boolean): Option[UnsafeRow]
    Definition Classes
    IterableOnceOps
  36. def flatMap[B](f: (UnsafeRow) => IterableOnce[B]): Iterator[B]
    Definition Classes
    Iterator → IterableOnceOps
  37. def flatten[B](implicit ev: (UnsafeRow) => IterableOnce[B]): Iterator[B]
    Definition Classes
    Iterator → IterableOnceOps
  38. def fold[A1 >: UnsafeRow](z: A1)(op: (A1, A1) => A1): A1
    Definition Classes
    IterableOnceOps
  39. def foldLeft[B](z: B)(op: (B, UnsafeRow) => B): B
    Definition Classes
    IterableOnceOps
  40. def foldRight[B](z: B)(op: (UnsafeRow, B) => B): B
    Definition Classes
    IterableOnceOps
  41. def forall(p: (UnsafeRow) => Boolean): Boolean
    Definition Classes
    IterableOnceOps
  42. def foreach[U](f: (UnsafeRow) => U): Unit
    Definition Classes
    IterableOnceOps
  43. val generateOutput: (UnsafeRow, InternalRow) => UnsafeRow
    Attributes
    protected
    Definition Classes
    AggregationIterator
  44. def generateProcessRow(expressions: Seq[AggregateExpression], functions: Seq[AggregateFunction], inputAttributes: Seq[Attribute]): (InternalRow, InternalRow) => Unit
    Attributes
    protected
    Definition Classes
    AggregationIterator
  45. def generateResultProjection(): (UnsafeRow, InternalRow) => UnsafeRow
    Attributes
    protected
    Definition Classes
    TungstenAggregationIterator → AggregationIterator
  46. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  47. def grouped[B >: UnsafeRow](size: Int): GroupedIterator[B]
    Definition Classes
    Iterator
  48. val groupingAttributes: Seq[Attribute]
    Attributes
    protected
    Definition Classes
    AggregationIterator
  49. val groupingProjection: UnsafeProjection
    Attributes
    protected
    Definition Classes
    AggregationIterator
  50. final def hasNext: Boolean
    Definition Classes
    TungstenAggregationIterator → Iterator
  51. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  52. def indexOf[B >: UnsafeRow](elem: B, from: Int): Int
    Definition Classes
    Iterator
  53. def indexOf[B >: UnsafeRow](elem: B): Int
    Definition Classes
    Iterator
  54. def indexWhere(p: (UnsafeRow) => Boolean, from: Int): Int
    Definition Classes
    Iterator
  55. def initializeAggregateFunctions(expressions: Seq[AggregateExpression], startingInputBufferOffset: Int): Array[AggregateFunction]
    Attributes
    protected
    Definition Classes
    AggregationIterator
  56. def initializeBuffer(buffer: InternalRow): Unit

    Initializes buffer values for all aggregate functions.

    Attributes
    protected
    Definition Classes
    AggregationIterator
  57. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  58. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  59. def isEmpty: Boolean
    Definition Classes
    Iterator → IterableOnceOps
    Annotations
    @deprecatedOverriding("isEmpty is defined as !hasNext; override hasNext instead", "2.13.0")
  60. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  61. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  62. def isTraversableAgain: Boolean
    Definition Classes
    IterableOnceOps
  63. final def iterator: Iterator[UnsafeRow]
    Definition Classes
    Iterator → IterableOnce
    Annotations
    @inline()
  64. def knownSize: Int
    Definition Classes
    IterableOnce
  65. final def length: Int
    Definition Classes
    Iterator
    Annotations
    @inline()
  66. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  67. def logDebug(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  68. def logDebug(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  69. def logError(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  70. def logError(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  71. def logInfo(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  72. def logInfo(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  73. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  74. def logTrace(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  75. def logTrace(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  76. def logWarning(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  77. def logWarning(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  78. def map[B](f: (UnsafeRow) => B): Iterator[B]
    Definition Classes
    Iterator → IterableOnceOps
  79. def max[B >: UnsafeRow](implicit ord: Ordering[B]): UnsafeRow
    Definition Classes
    IterableOnceOps
  80. def maxBy[B](f: (UnsafeRow) => B)(implicit cmp: Ordering[B]): UnsafeRow
    Definition Classes
    IterableOnceOps
  81. def maxByOption[B](f: (UnsafeRow) => B)(implicit cmp: Ordering[B]): Option[UnsafeRow]
    Definition Classes
    IterableOnceOps
  82. def maxOption[B >: UnsafeRow](implicit ord: Ordering[B]): Option[UnsafeRow]
    Definition Classes
    IterableOnceOps
  83. def min[B >: UnsafeRow](implicit ord: Ordering[B]): UnsafeRow
    Definition Classes
    IterableOnceOps
  84. def minBy[B](f: (UnsafeRow) => B)(implicit cmp: Ordering[B]): UnsafeRow
    Definition Classes
    IterableOnceOps
  85. def minByOption[B](f: (UnsafeRow) => B)(implicit cmp: Ordering[B]): Option[UnsafeRow]
    Definition Classes
    IterableOnceOps
  86. def minOption[B >: UnsafeRow](implicit ord: Ordering[B]): Option[UnsafeRow]
    Definition Classes
    IterableOnceOps
  87. final def mkString: String
    Definition Classes
    IterableOnceOps
    Annotations
    @inline()
  88. final def mkString(sep: String): String
    Definition Classes
    IterableOnceOps
    Annotations
    @inline()
  89. final def mkString(start: String, sep: String, end: String): String
    Definition Classes
    IterableOnceOps
  90. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  91. final def next(): UnsafeRow
    Definition Classes
    TungstenAggregationIterator → Iterator
  92. def nextOption(): Option[UnsafeRow]
    Definition Classes
    Iterator
  93. def nonEmpty: Boolean
    Definition Classes
    IterableOnceOps
    Annotations
    @deprecatedOverriding("nonEmpty is defined as !isEmpty; override isEmpty instead", "2.13.0")
  94. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  95. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  96. def outputForEmptyGroupingKeyWithoutInput(): UnsafeRow

    Generate an output row when there is no input and there is no grouping expression.

  97. def padTo[B >: UnsafeRow](len: Int, elem: B): Iterator[B]
    Definition Classes
    Iterator
  98. def partition(p: (UnsafeRow) => Boolean): (Iterator[UnsafeRow], Iterator[UnsafeRow])
    Definition Classes
    Iterator
  99. def patch[B >: UnsafeRow](from: Int, patchElems: Iterator[B], replaced: Int): Iterator[B]
    Definition Classes
    Iterator
  100. val processRow: (InternalRow, InternalRow) => Unit
    Attributes
    protected
    Definition Classes
    AggregationIterator
  101. def product[B >: UnsafeRow](implicit num: Numeric[B]): B
    Definition Classes
    IterableOnceOps
  102. def reduce[B >: UnsafeRow](op: (B, B) => B): B
    Definition Classes
    IterableOnceOps
  103. def reduceLeft[B >: UnsafeRow](op: (B, UnsafeRow) => B): B
    Definition Classes
    IterableOnceOps
  104. def reduceLeftOption[B >: UnsafeRow](op: (B, UnsafeRow) => B): Option[B]
    Definition Classes
    IterableOnceOps
  105. def reduceOption[B >: UnsafeRow](op: (B, B) => B): Option[B]
    Definition Classes
    IterableOnceOps
  106. def reduceRight[B >: UnsafeRow](op: (UnsafeRow, B) => B): B
    Definition Classes
    IterableOnceOps
  107. def reduceRightOption[B >: UnsafeRow](op: (UnsafeRow, B) => B): Option[B]
    Definition Classes
    IterableOnceOps
  108. def reversed: Iterable[UnsafeRow]
    Attributes
    protected
    Definition Classes
    IterableOnceOps
  109. def sameElements[B >: UnsafeRow](that: IterableOnce[B]): Boolean
    Definition Classes
    Iterator
  110. def scanLeft[B](z: B)(op: (B, UnsafeRow) => B): Iterator[B]
    Definition Classes
    Iterator → IterableOnceOps
  111. def size: Int
    Definition Classes
    IterableOnceOps
  112. def slice(from: Int, until: Int): Iterator[UnsafeRow]
    Definition Classes
    Iterator → IterableOnceOps
  113. def sliceIterator(from: Int, until: Int): Iterator[UnsafeRow]
    Attributes
    protected
    Definition Classes
    Iterator
  114. def sliding[B >: UnsafeRow](size: Int, step: Int): GroupedIterator[B]
    Definition Classes
    Iterator
  115. def span(p: (UnsafeRow) => Boolean): (Iterator[UnsafeRow], Iterator[UnsafeRow])
    Definition Classes
    Iterator → IterableOnceOps
  116. def splitAt(n: Int): (Iterator[UnsafeRow], Iterator[UnsafeRow])
    Definition Classes
    IterableOnceOps
  117. def stepper[S <: Stepper[_]](implicit shape: StepperShape[UnsafeRow, S]): S
    Definition Classes
    IterableOnce
  118. def sum[B >: UnsafeRow](implicit num: Numeric[B]): B
    Definition Classes
    IterableOnceOps
  119. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  120. def take(n: Int): Iterator[UnsafeRow]
    Definition Classes
    Iterator → IterableOnceOps
  121. def takeWhile(p: (UnsafeRow) => Boolean): Iterator[UnsafeRow]
    Definition Classes
    Iterator → IterableOnceOps
  122. def tapEach[U](f: (UnsafeRow) => U): Iterator[UnsafeRow]
    Definition Classes
    Iterator → IterableOnceOps
  123. def to[C1](factory: Factory[UnsafeRow, C1]): C1
    Definition Classes
    IterableOnceOps
  124. def toArray[B >: UnsafeRow](implicit arg0: ClassTag[B]): Array[B]
    Definition Classes
    IterableOnceOps
  125. final def toBuffer[B >: UnsafeRow]: Buffer[B]
    Definition Classes
    IterableOnceOps
    Annotations
    @inline()
  126. def toIndexedSeq: IndexedSeq[UnsafeRow]
    Definition Classes
    IterableOnceOps
  127. def toList: List[UnsafeRow]
    Definition Classes
    IterableOnceOps
  128. def toMap[K, V](implicit ev: <:<[UnsafeRow, (K, V)]): Map[K, V]
    Definition Classes
    IterableOnceOps
  129. def toSeq: Seq[UnsafeRow]
    Definition Classes
    IterableOnceOps
  130. def toSet[B >: UnsafeRow]: Set[B]
    Definition Classes
    IterableOnceOps
  131. def toString(): String
    Definition Classes
    Iterator → AnyRef → Any
  132. def toVector: Vector[UnsafeRow]
    Definition Classes
    IterableOnceOps
  133. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  134. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  135. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  136. def withFilter(p: (UnsafeRow) => Boolean): Iterator[UnsafeRow]
    Definition Classes
    Iterator
  137. def zip[B](that: IterableOnce[B]): Iterator[(UnsafeRow, B)]
    Definition Classes
    Iterator
  138. def zipAll[A1 >: UnsafeRow, B](that: IterableOnce[B], thisElem: A1, thatElem: B): Iterator[(A1, B)]
    Definition Classes
    Iterator
  139. def zipWithIndex: Iterator[(UnsafeRow, Int)]
    Definition Classes
    Iterator → IterableOnceOps

Deprecated Value Members

  1. final def /:[B](z: B)(op: (B, UnsafeRow) => B): B
    Definition Classes
    IterableOnceOps
    Annotations
    @deprecated @inline()
    Deprecated

    (Since version 2.13.0) Use foldLeft instead of /:

  2. final def :\[B](z: B)(op: (UnsafeRow, B) => B): B
    Definition Classes
    IterableOnceOps
    Annotations
    @deprecated @inline()
    Deprecated

    (Since version 2.13.0) Use foldRight instead of :\

  3. def aggregate[B](z: => B)(seqop: (B, UnsafeRow) => B, combop: (B, B) => B): B
    Definition Classes
    IterableOnceOps
    Annotations
    @deprecated
    Deprecated

    (Since version 2.13.0) aggregate is not relevant for sequential collections. Use foldLeft(z)(seqop) instead.

  4. final def copyToBuffer[B >: UnsafeRow](dest: Buffer[B]): Unit
    Definition Classes
    IterableOnceOps
    Annotations
    @deprecated @inline()
    Deprecated

    (Since version 2.13.0) Use dest ++= coll instead

  5. final def hasDefiniteSize: Boolean
    Definition Classes
    Iterator → IterableOnceOps
    Annotations
    @deprecated @inline()
    Deprecated

    (Since version 2.13.0) hasDefiniteSize on Iterator is the same as isEmpty

  6. def scanRight[B](z: B)(op: (UnsafeRow, B) => B): Iterator[B]
    Definition Classes
    Iterator
    Annotations
    @deprecated
    Deprecated

    (Since version 2.13.0) Call scanRight on an Iterable instead.

  7. def seq: TungstenAggregationIterator.this.type
    Definition Classes
    Iterator
    Annotations
    @deprecated
    Deprecated

    (Since version 2.13.0) Iterator.seq always returns the iterator itself

  8. final def toIterator: Iterator[UnsafeRow]
    Definition Classes
    IterableOnceOps
    Annotations
    @deprecated @inline()
    Deprecated

    (Since version 2.13.0) Use .iterator instead of .toIterator

  9. final def toStream: Stream[UnsafeRow]
    Definition Classes
    IterableOnceOps
    Annotations
    @deprecated @inline()
    Deprecated

    (Since version 2.13.0) Use .to(LazyList) instead of .toStream
