author: msabhi <abhi.is2006@gmail.com> 2016-12-04 08:51:00 -0500
committer: GitHub <noreply@github.com> 2016-12-04 08:51:00 -0500
commit: a9883554b8e4ab00e41dbd8a358f97628f35f392
tree: 29fd71a3ac5597e9f696ac1a7a2372ebc0bea3d7
parent: b92cacd9c46dd9da407eacad33a6fdb9acbf2ff2
Fix diagram
-rw-r--r-- chapter/8/big-data.md 4
1 files changed, 4 insertions, 0 deletions
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index 1ca16aa..bae6b83 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -97,9 +97,13 @@ The properties that power RDD with the above mentioned features :
- A compute function to do a computation on partitions.
- Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
- Optional preferred locations (aka locality info), (e.g. block locations for an HDFS file)
+
+
<figure class="main-container">
<img src="./spark_pipeline.png" alt="MapReduce Execution Overview" />
</figure>
+
+
Spark API provide two kinds of operations on a RDD:
Transformations - lazy operations that return another RDD.
`map (f : T => U) : RDD[T] ⇒ RDD[U]` : Return a MappedRDD[U] by applying function f to each element