From cd0e236b5a13cb8ef7f96d21d0d82d611e0c64fd Mon Sep 17 00:00:00 2001
From: msabhi
Date: Sun, 4 Dec 2016 08:59:12 -0500
Subject: Update big-data.md

---
 chapter/8/big-data.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index 6778f52..56bb9e2 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -14,7 +14,7 @@ This chapter is organized in
     - PM of Dryad: can support DAG computation, limitations: low-level, `Q: Should this go to execution model?`
     - PM of Spark, RDD/lineage: can support iterative algorithm, interactive analytics
   - Large-scale Parallelism on Graphs
-    - PM of Pregel
+    - Why a separate graph processing model? what is a BSP? working of BSP? Do not stress more since its not a map reduce world exactly.
   - Querying: more declarative `Q: put here or in the execution model?`
     - DryadLINQ, SQL-like, use Dryad as execution engine;
     - Pig, on top of Hadoop, independent of execution platform, in theory can compiled into DryadLINQ too; what is the performance gain/lost? Easier to debug?
@@ -26,9 +26,10 @@ This chapter is organized in
   - MapReduce (intermediate writes to disk)
     - Limitations, iteration, performance
   - Spark (all in memory)
-    - Limitations ?
+    what is Spark? why is Spark so powerful - RDD and API? What is a RDD and why is it so efficient? properties of a RDD?
+    why is RDD better than DSM? What are the transformations and actions available in Spark ? what are the limitations of Spark ?
   - Pregel
-    - Limitations ?
+    Overview of Pregel. Its implementation and working. its limitations. Do not stress more since we have a better model GraphX to explain a lot.
 - Performance
 - Things people are building on top of MapReduce/Spark
   - // FlumeJava? ...Etc
-- 
cgit v1.2.3