| author | msabhi <abhi.is2006@gmail.com> | 2016-12-04 08:52:46 -0500 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2016-12-04 08:52:46 -0500 |
| commit | 729cbb73db20226f91b40d16c4af9102c3c80b98 (patch) | |
| tree | a0550080aeb14194f561defad9b212fba33aac13 | |
| parent | a9883554b8e4ab00e41dbd8a358f97628f35f392 (diff) | |
Fixed indentation
| -rw-r--r-- | chapter/8/big-data.md | 23 |
1 file changed, 12 insertions(+), 11 deletions(-)
```diff
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index bae6b83..6778f52 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -105,20 +105,21 @@ The properties that power RDD with the above mentioned features :
 Spark API provide two kinds of operations on a RDD:
 
-Transformations - lazy operations that return another RDD.
-`map (f : T => U) : RDD[T] ⇒ RDD[U]` : Return a MappedRDD[U] by applying function f to each element
-`flatMap( f : T ⇒ Seq[U]) : RDD[T] ⇒ RDD[U]` : Return a new FlatMappedRDD[U] by first applying a function to all elements and then flattening the results.
-`filter(f:T⇒Bool) : RDD[T] ⇒ RDD[T]` : Return a FilteredRDD[T] having elemnts that f return true
-`groupByKey()` : Being called on (K,V) Rdd, return a new RDD[([K], Iterable[V])]
-`reduceByKey(f: (V, V) => V)` : Being called on (K, V) Rdd, return a new RDD[(K, V)] by aggregating values using eg: reduceByKey(_+_)
-`join((RDD[(K, V)], RDD[(K, W)]) ⇒ RDD[(K, (V, W))]` :Being called on (K,V) Rdd, return a new RDD[(K, (V, W))] by joining them by key K.
+- Transformations - lazy operations that return another RDD.
+  - `map (f : T => U) : RDD[T] ⇒ RDD[U]` : Return a MappedRDD[U] by applying function f to each element
+  - `flatMap( f : T ⇒ Seq[U]) : RDD[T] ⇒ RDD[U]` : Return a new FlatMappedRDD[U] by first applying a function to all elements and then flattening the results.
+  - `filter(f:T⇒Bool) : RDD[T] ⇒ RDD[T]` : Return a FilteredRDD[T] having elemnts that f return true
+  - `groupByKey()` : Being called on (K,V) Rdd, return a new RDD[([K], Iterable[V])]
+  - `reduceByKey(f: (V, V) => V)` : Being called on (K, V) Rdd, return a new RDD[(K, V)] by aggregating values using eg: reduceByKey(_+_)
+  - `join((RDD[(K, V)], RDD[(K, W)]) ⇒ RDD[(K, (V, W))]` :Being called on (K,V) Rdd, return a new RDD[(K, (V, W))] by joining them by key K.
-Actions - operations that trigger computation on a RDD and return values.
-`reduce(f:(T,T)⇒T) : RDD[T] ⇒ T` : return T by reducing the elements using specified commutative and associative binary operator
-`collect()` : Return an Array[T] containing all elements
-`count()` : Return the number of elements
+- Actions - operations that trigger computation on a RDD and return values.
+
+  - `reduce(f:(T,T)⇒T) : RDD[T] ⇒ T` : return T by reducing the elements using specified commutative and associative binary operator
+  - `collect()` : Return an Array[T] containing all elements
+  - `count()` : Return the number of elements
 
 Why RDD over Distributed Shared memory (DSM) ?
```
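The distinction the diff documents — lazy transformations that build a new RDD versus actions that trigger computation and return plain values — can be sketched with a toy `ToyRDD` class over ordinary Python lists. This is a hypothetical illustration of the semantics only, not Spark's actual API or execution model:

```python
from collections import defaultdict
from functools import reduce as freduce

class ToyRDD:
    """Toy stand-in for a Spark RDD over a local list.

    Transformations only wrap a thunk (nothing runs); actions call the
    thunk and return an ordinary Python value.
    """
    def __init__(self, compute):
        self._compute = compute  # zero-arg function producing the data

    # --- Transformations: return a new ToyRDD, nothing is computed yet ---
    def map(self, f):
        return ToyRDD(lambda: [f(x) for x in self._compute()])

    def flat_map(self, f):
        return ToyRDD(lambda: [y for x in self._compute() for y in f(x)])

    def filter(self, f):
        return ToyRDD(lambda: [x for x in self._compute() if f(x)])

    def group_by_key(self):
        def run():
            groups = defaultdict(list)
            for k, v in self._compute():
                groups[k].append(v)
            return list(groups.items())
        return ToyRDD(run)

    def reduce_by_key(self, f):
        # aggregate each key's values with f, e.g. reduce_by_key(lambda a, b: a + b)
        return self.group_by_key().map(lambda kv: (kv[0], freduce(f, kv[1])))

    def join(self, other):
        def run():
            right = defaultdict(list)
            for k, w in other._compute():
                right[k].append(w)
            return [(k, (v, w)) for k, v in self._compute() for w in right[k]]
        return ToyRDD(run)

    # --- Actions: trigger the computation and return plain values ---
    def reduce(self, f):
        return freduce(f, self._compute())

    def collect(self):
        return list(self._compute())

    def count(self):
        return len(self._compute())

pairs = ToyRDD(lambda: [("a", 1), ("b", 2), ("a", 3)])
summed = pairs.reduce_by_key(lambda x, y: x + y)  # lazy: nothing computed yet
print(sorted(summed.collect()))                   # action runs the pipeline
```

Chaining transformations here just nests thunks, mirroring how Spark records a lineage of RDDs and defers all work until an action such as `collect()` or `count()` forces evaluation.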
