author msabhi <abhi.is2006@gmail.com> 2016-12-04 07:41:04 -0500
committer GitHub <noreply@github.com> 2016-12-04 07:41:04 -0500
commit 768f7e51fd7d6bafdc5658b86503463f7e4a2486
tree 74fc9bf61f0cc7f6a2b4b7615d13ef9f56bea8e6
parent b90835ab7de523f18149ae26b3e972c6f6407e1e
Updated outline
Diffstat (limited to 'chapter/8')
-rw-r--r-- chapter/8/big-data.md 4
1 files changed, 2 insertions, 2 deletions
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index c6baac0..cc11e28 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -14,13 +14,13 @@ This chapter is organized in
- PM of Dryad: can support DAG computation, limitations: low-level, `Q: Should this go to execution model?`
- PM of Spark, RDD/lineage: can support iterative algorithm, interactive analytics
- Large-scale Parallelism on Graphs
- - PM of Pregel/GraphX
+ - PM of Pregel
- Querying: more declarative `Q: put here or in the execution model?`
- DryadLINQ, SQL-like, use Dryad as execution engine;
- Pig, on top of Hadoop, independent of execution platform, in theory can compiled into DryadLINQ too; what is the performance gain/lost? Easier to debug?
- Hive, SQL-like, on top of Hadoop, what is the performance gain/lost.
- Dremel, query natively w/o translating into MP jobs
- - Spark SQL, on top of Spark
+ - Spark SQL - how is it different from other above models? How does it leverage Spark execution engine and enhanced RDDs like data frames? what are its goals? whats a Dataframe API and how is it different from a RDD?
- Execution Models
- MapReduce (intermediate writes to disk)