From 768f7e51fd7d6bafdc5658b86503463f7e4a2486 Mon Sep 17 00:00:00 2001
From: msabhi
Date: Sun, 4 Dec 2016 07:41:04 -0500
Subject: Updated outline

---
 chapter/8/big-data.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index c6baac0..cc11e28 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -14,13 +14,13 @@ This chapter is organized in
 - PM of Dryad: can support DAG computation, limitations: low-level, `Q: Should this go to execution model?`
 - PM of Spark, RDD/lineage: can support iterative algorithm, interactive analytics
 - Large-scale Parallelism on Graphs
- - PM of Pregel/GraphX
+ - PM of Pregel
 - Querying: more declarative `Q: put here or in the execution model?`
 - DryadLINQ, SQL-like, use Dryad as execution engine;
 - Pig, on top of Hadoop, independent of execution platform, in theory can compiled into DryadLINQ too; what is the performance gain/lost? Easier to debug?
 - Hive, SQL-like, on top of Hadoop, what is the performance gain/lost.
 - Dremel, query natively w/o translating into MP jobs
- - Spark SQL, on top of Spark
+ - Spark SQL - how is it different from other above models? How does it leverage Spark execution engine and enhanced RDDs like data frames? what are its goals? whats a Dataframe API and how is it different from a RDD?
 - Execution Models
 - MapReduce (intermediate writes to disk)
-- 
cgit v1.2.3