public abstract class CassandraRDD<T>
extends org.apache.spark.rdd.RDD<T>
… Cells element.

| Modifier and Type | Field and Description |
|---|---|
| protected org.apache.spark.broadcast.Broadcast<IDeepJobConfig<T>> | config |
| Constructor and Description |
|---|
| CassandraRDD(org.apache.spark.SparkContext sc, IDeepJobConfig<T> config)<br>Public constructor that builds a new Cassandra RDD given the context and the configuration object. |
| Modifier and Type | Method and Description |
|---|---|
| scala.collection.Iterator<T> | compute(org.apache.spark.Partition split, org.apache.spark.TaskContext ctx)<br>Computes the current RDD over the given data partition. |
| static <W,T extends IDeepType> void | cql3SaveRDDToCassandra(org.apache.spark.rdd.RDD<W> rdd, IDeepJobConfig<W> writeConfig)<br>Persists the given RDD to the underlying Cassandra datastore using the Java CQL3 driver. Beware: this method does not perform a distributed write as saveRDDToCassandra(org.apache.spark.rdd.RDD<W>, com.stratio.deep.config.IDeepJobConfig<W>) does; it uses the DataStax Java Driver to perform a batch write to the Cassandra server. It currently scans the partitions one by one, so it will be slow if many partitions are involved. |
| protected scala.runtime.AbstractFunction0<scala.runtime.BoxedUnit> | getComputeCallback(DeepRecordReader recordReader, DeepPartition dp)<br>Gets an instance of the callback that will be used on the completion of the computation of this RDD. |
| org.apache.spark.Partition[] | getPartitions()<br>Returns the partitions on which this RDD depends. |
| scala.collection.Seq<String> | getPreferredLocations(org.apache.spark.Partition split)<br>Returns a list of hosts on which the given split resides. |
| static <W> void | saveRDDToCassandra(org.apache.spark.api.java.JavaRDD<W> rdd, IDeepJobConfig<W> writeConfig)<br>Persists the given JavaRDD to the underlying Cassandra datastore. |
| static <W,T extends IDeepType> void | saveRDDToCassandra(org.apache.spark.rdd.RDD<W> rdd, IDeepJobConfig<W> writeConfig)<br>Persists the given RDD of Cells to the underlying Cassandra datastore, using the configuration options provided by writeConfig. |
| protected abstract T | transformElement(Pair<Map<String,ByteBuffer>,Map<String,ByteBuffer>> elem)<br>Transforms a row coming from Cassandra's API into an element of type T. |
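The cql3SaveRDDToCassandra entry above warns that the write is not distributed: partitions are scanned one by one and their rows are flushed to the server in batches. A minimal, driver-free sketch of that batching pattern (the `BatchSink` class and the batch size are illustrative assumptions, not part of the Deep API or the DataStax driver):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchWriteSketch {
    static final int BATCH_SIZE = 3; // illustrative; real drivers tune this

    // Stand-in for the driver: records every batch it "executes".
    static class BatchSink {
        final List<List<String>> executedBatches = new ArrayList<>();
        void executeBatch(List<String> rows) {
            executedBatches.add(new ArrayList<>(rows));
        }
    }

    // Scans the partitions one by one (hence non-distributed) and flushes
    // rows to the sink in fixed-size batches, mirroring the javadoc note.
    static void saveAllPartitions(List<List<String>> partitions, BatchSink sink) {
        for (List<String> partition : partitions) {
            List<String> batch = new ArrayList<>();
            for (String row : partition) {
                batch.add(row);
                if (batch.size() == BATCH_SIZE) {
                    sink.executeBatch(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                sink.executeBatch(batch); // flush the partition's tail
            }
        }
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
            List.of("r1", "r2", "r3", "r4"),
            List.of("r5"));
        BatchSink sink = new BatchSink();
        saveAllPartitions(partitions, sink);
        System.out.println(sink.executedBatches.size()); // prints 3
        System.out.println(sink.executedBatches.get(0)); // prints [r1, r2, r3]
    }
}
```

Because every batch passes through the single process driving the loop, throughput degrades with the number of partitions; the distributed saveRDDToCassandra variants avoid this bottleneck.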
Methods inherited from class org.apache.spark.rdd.RDD:
$plus$plus, aggregate, cache, cartesian, checkpoint, checkpointData_$eq, checkpointData, clearDependencies, coalesce, coalesce$default$2, collect, collect, collectPartitions, computeOrReadCheckpoint, conf, context, count, countApprox, countApprox$default$2, countApproxDistinct, countApproxDistinct$default$1, countByValue, countByValueApprox, countByValueApprox$default$2, dependencies, distinct, distinct, doCheckpoint, elementClassTag, filter, filterWith, first, firstParent, flatMap, flatMapWith, flatMapWith$default$2, fold, foreach, foreachPartition, foreachWith, generator_$eq, generator, getCheckpointFile, getDependencies, getStorageLevel, glom, groupBy, groupBy, groupBy, id, isCheckpointed, isTraceEnabled, iterator, keyBy, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logTrace, logTrace, logWarning, logWarning, map, mapPartitions, mapPartitions$default$2, mapPartitionsWithContext, mapPartitionsWithContext$default$2, mapPartitionsWithIndex, mapPartitionsWithIndex$default$2, mapPartitionsWithSplit, mapPartitionsWithSplit$default$2, mapWith, mapWith$default$2, markCheckpointed, name_$eq, name, org$apache$spark$Logging$$log__$eq, org$apache$spark$Logging$$log_, org$apache$spark$rdd$RDD$$countPartition$1, org$apache$spark$rdd$RDD$$debugString$1, org$apache$spark$rdd$RDD$$dependencies__$eq, org$apache$spark$rdd$RDD$$dependencies_, org$apache$spark$rdd$RDD$$mergeMaps$1, org$apache$spark$rdd$RDD$$partitions__$eq, org$apache$spark$rdd$RDD$$partitions_, origin, partitioner, partitions, persist, persist, pipe, pipe, pipe, pipe$default$2, pipe$default$3, pipe$default$4, preferredLocations, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, setGenerator, setName, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toArray, toDebugString, toJavaRDD, top, toString, union, unpersist, unpersist$default$1, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions

protected final org.apache.spark.broadcast.Broadcast<IDeepJobConfig<T>> config
public CassandraRDD(org.apache.spark.SparkContext sc, IDeepJobConfig<T> config)
Parameters:
sc - the Spark context to which the RDD will be bound.
config - the deep configuration object.

protected abstract T transformElement(Pair<Map<String,ByteBuffer>,Map<String,ByteBuffer>> elem)
Parameters:
elem - the element to transform.

public static <W,T extends IDeepType> void cql3SaveRDDToCassandra(org.apache.spark.rdd.RDD<W> rdd, IDeepJobConfig<W> writeConfig)
Beware: this method does not perform a distributed write as saveRDDToCassandra(org.apache.spark.rdd.RDD<W>, com.stratio.deep.config.IDeepJobConfig<W>) does; it uses the DataStax Java Driver to perform a batch write to the Cassandra server.
Parameters:
rdd - the RDD to persist.
writeConfig - the write configuration object.

public static <W,T extends IDeepType> void saveRDDToCassandra(org.apache.spark.rdd.RDD<W> rdd, IDeepJobConfig<W> writeConfig)
Parameters:
rdd - the RDD to persist.
writeConfig - the write configuration object.

public static <W> void saveRDDToCassandra(org.apache.spark.api.java.JavaRDD<W> rdd, IDeepJobConfig<W> writeConfig)
Type Parameters:
W - the generic type associated to the provided configuration object.
Parameters:
rdd - the RDD to persist.
writeConfig - the write configuration object.

public scala.collection.Iterator<T> compute(org.apache.spark.Partition split, org.apache.spark.TaskContext ctx)
Overrides:
compute in class org.apache.spark.rdd.RDD<T>

protected scala.runtime.AbstractFunction0<scala.runtime.BoxedUnit> getComputeCallback(DeepRecordReader recordReader, DeepPartition dp)
Parameters:
recordReader - the deep record reader.
dp - the spark deep partition.

public org.apache.spark.Partition[] getPartitions()
Overrides:
getPartitions in class org.apache.spark.rdd.RDD<T>

Copyright © 2014. All rights reserved.
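transformElement receives a pair of column maps serialized as ByteBuffers and must turn them into an element of type T. A self-contained sketch of what a concrete implementation might do; the `Pair` stand-in, the `UserRow` type, and the column names here are illustrative assumptions, not part of the Deep API:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class TransformElementSketch {
    // Minimal stand-in for the Pair type used in the transformElement signature.
    record Pair<L, R>(L left, R right) {}

    // Illustrative domain object standing in for the RDD element type T.
    record UserRow(String id, int age) {}

    // Sketch of a concrete transformElement: decode each ByteBuffer-encoded
    // column into a typed field. We duplicate() each buffer so decoding does
    // not disturb the original buffer's position.
    static UserRow transformElement(Pair<Map<String, ByteBuffer>, Map<String, ByteBuffer>> elem) {
        String id = StandardCharsets.UTF_8.decode(elem.left().get("id").duplicate()).toString();
        int age = elem.right().get("age").duplicate().getInt();
        return new UserRow(id, age);
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> keys = Map.of("id", StandardCharsets.UTF_8.encode("user-42"));
        Map<String, ByteBuffer> cols = Map.of("age", ByteBuffer.allocate(4).putInt(31).flip());
        UserRow row = transformElement(new Pair<>(keys, cols));
        System.out.println(row.id() + " " + row.age()); // prints user-42 31
    }
}
```

Each concrete subclass of CassandraRDD supplies this conversion once, and the rest of the RDD machinery (compute, partitioning, persistence) stays generic.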