author: Jingjing Ren <renjj@ccs.neu.edu> 2016-12-05 10:56:29 -0500
committer: Jingjing Ren <renjj@ccs.neu.edu> 2016-12-05 10:56:29 -0500
commit: 09ae3171dcc60933ed9a1bc3ebf27e6611423626
tree: 1a50f6a4f03f476f18287760ae4ed49e5bc2a6c6 /chapter/8
parent: d64b5eea953b10e02e0c9bc232a7b2a803addbdd
update outline
Diffstat (limited to 'chapter/8')
-rw-r--r-- chapter/8/big-data.md | 4
1 files changed, 2 insertions, 2 deletions
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index 54dde79..608341e 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -30,7 +30,7 @@ by: "Jingjing and Abhilash"
- Graphs :
- Pregel :Overview of Pregel. Its implementation and working. its limitations. Do not stress more since we have a better model GraphX to explain a lot.
- GraphX : Working on this.
- - SparkSQL Catalyst & Spark execution model : Discuss Parser, LogicalPlan, Optimizer, PhysicalPlan, Execution Plan. Why catalyst? how catalyst helps in SparkSQL , data flow from sql-core-> catalyst->spark-core
+ - SparkSQL Catalyst & Spark execution model : Discuss Parser, LogicalPlan, Optimizer, PhysicalPlan, Execution Plan. Why catalyst? how catalyst helps in SparkSQL , data flow from sql-core-> catalyst->spark-core
- Evaluation: Given same algorithm, what is the performance differences between Hadoop, Spark, Dryad? There are no direct comparison for all those models, so we may want to compare separately:
- Hadoop vs. Spark
@@ -77,7 +77,7 @@ reduce(String key, Iterator values):
Emit(AsString(result));
```
-*Execution*
+*Execution* `TODO: move this to execution and talk about fault-tolerance instead`
At high level, when the user program calls *MapReduce* function, the input files are split into *M* pieces and it runs *map* function on corresponding splits; then intermediate key space are partitioned into *R* pieces using a partitioning function; After the reduce functions all successfully complete, the output is available in *R* files. The sequences of actions are shown in the figure below. We can see from label (4) and (5) that the intermediate key/value pairs are written/read into disks, this is a key to fault-tolerance in MapReduce model and also a bottleneck for more complex computation algorithms.
<figure class="main-container">