From 09ae3171dcc60933ed9a1bc3ebf27e6611423626 Mon Sep 17 00:00:00 2001
From: Jingjing Ren
Date: Mon, 5 Dec 2016 10:56:29 -0500
Subject: update outline

---
 chapter/8/big-data.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index 54dde79..608341e 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -30,7 +30,7 @@ by: "Jingjing and Abhilash"
 - Graphs :
 - Pregel :Overview of Pregel. Its implementation and working. its limitations. Do not stress more since we have a better model GraphX to explain a lot.
 - GraphX : Working on this.
- - SparkSQL Catalyst & Spark execution model : Discuss Parser, LogicalPlan, Optimizer, PhysicalPlan, Execution Plan. Why catalyst? how catalyst helps in SparkSQL , data flow from sql-core-> catalyst->spark-core
+ - SparkSQL Catalyst & Spark execution model : Discuss Parser, LogicalPlan, Optimizer, PhysicalPlan, Execution Plan. Why catalyst? how catalyst helps in SparkSQL , data flow from sql-core-> catalyst->spark-core
 - Evaluation: Given same algorithm, what is the performance differences between Hadoop, Spark, Dryad?
 There are no direct comparison for all those models, so we may want to compare separately:
 - Hadoop vs. Spark
@@ -77,7 +77,7 @@ reduce(String key, Iterator values):
 Emit(AsString(result));
 ```
 
-*Execution*
+*Execution* `TODO: move this to execution and talk about fault-tolerance instead`
 
 At high level, when the user program calls *MapReduce* function, the input files are split into *M* pieces and it runs *map* function on corresponding splits; then intermediate key space are partitioned into *R* pieces using a partitioning function; After the reduce functions all successfully complete, the output is available in *R* files. The sequences of actions are shown in the figure below.
 We can see from label (4) and (5) that the intermediate key/value pairs are written/read into disks, this is a key to fault-tolerance in MapReduce model and also a bottleneck for more complex computation algorithms.
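The execution flow in the paragraph above (split the input into *M* pieces, run *map* on each split, partition the intermediate keys into *R* pieces, then produce *R* reduce outputs) can be illustrated with a small single-process sketch. It is not the MapReduce implementation; it assumes a word-count job, assumes a partitioning function of the form `hash(key) mod R` (using `zlib.crc32` as a deterministic stand-in hash), and keeps the intermediate buckets in memory where the real system would write and read them on disk at steps (4) and (5). All function names here are hypothetical.

```python
import zlib
from collections import defaultdict


def map_fn(key, value):
    # Hypothetical word-count map: key is a split name, value is one line of text.
    for word in value.split():
        yield word, "1"


def reduce_fn(key, values):
    # Hypothetical word-count reduce: sum the counts emitted for one word.
    return str(sum(int(v) for v in values))


def partition(key, r):
    # Assumed partitioning function: hash(key) mod R.
    # crc32 is used so the assignment is stable across Python runs.
    return zlib.crc32(key.encode("utf-8")) % r


def split_input(lines, m):
    # Split the input into at most M contiguous pieces (real systems split by byte range).
    size = max(1, -(-len(lines) // m))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def run_mapreduce(lines, m, r):
    # Map phase: each map task partitions its output into R buckets.
    # These in-memory lists stand in for the intermediate files on disk (steps 4 and 5).
    intermediate = []
    for task_id, split in enumerate(split_input(lines, m)):
        buckets = [[] for _ in range(r)]
        for line in split:
            for k, v in map_fn(f"split-{task_id}", line):
                buckets[partition(k, r)].append((k, v))
        intermediate.append(buckets)

    # Reduce phase: reduce task j reads bucket j from every map task,
    # groups values by key, and writes one of the R outputs.
    outputs = []
    for j in range(r):
        grouped = defaultdict(list)
        for buckets in intermediate:
            for k, v in buckets[j]:
                grouped[k].append(v)
        outputs.append({k: reduce_fn(k, grouped[k]) for k in sorted(grouped)})
    return outputs


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "the end"]
    for j, out in enumerate(run_mapreduce(text, m=2, r=3)):
        print(j, out)
```

Because every map task uses the same partitioning function, all values for a given key end up in the same reduce partition, which is what lets each of the *R* reduce tasks run independently.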