| author | msabhi <abhi.is2006@gmail.com> | 2016-12-04 07:41:04 -0500 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2016-12-04 07:41:04 -0500 |
| commit | 768f7e51fd7d6bafdc5658b86503463f7e4a2486 (patch) | |
| tree | 74fc9bf61f0cc7f6a2b4b7615d13ef9f56bea8e6 /chapter/8 | |
| parent | b90835ab7de523f18149ae26b3e972c6f6407e1e (diff) | |
Updated outline
Diffstat (limited to 'chapter/8')
| -rw-r--r-- | chapter/8/big-data.md | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index c6baac0..cc11e28 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -14,13 +14,13 @@ This chapter is organized in
 - PM of Dryad: can support DAG computation, limitations: low-level, `Q: Should this go to execution model?`
 - PM of Spark, RDD/lineage: can support iterative algorithm, interactive analytics
 - Large-scale Parallelism on Graphs
- - PM of Pregel/GraphX
+ - PM of Pregel
 - Querying: more declarative `Q: put here or in the execution model?`
 - DryadLINQ, SQL-like, use Dryad as execution engine;
 - Pig, on top of Hadoop, independent of execution platform, in theory can compiled into DryadLINQ too; what is the performance gain/lost? Easier to debug?
 - Hive, SQL-like, on top of Hadoop, what is the performance gain/lost.
 - Dremel, query natively w/o translating into MP jobs
- - Spark SQL, on top of Spark
+ - Spark SQL - how is it different from other above models? How does it leverage Spark execution engine and enhanced RDDs like data frames? what are its goals? whats a Dataframe API and how is it different from a RDD?
 - Execution Models
 - MapReduce (intermediate writes to disk)
