Updated outline

author: msabhi <abhi.is2006@gmail.com> 2016-12-04 07:41:04 -0500
committer: GitHub <noreply@github.com> 2016-12-04 07:41:04 -0500
commit: 768f7e51fd7d6bafdc5658b86503463f7e4a2486
tree: 74fc9bf61f0cc7f6a2b4b7615d13ef9f56bea8e6 /chapter/8
parent: b90835ab7de523f18149ae26b3e972c6f6407e1e
1 files changed, 2 insertions, 2 deletions
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index c6baac0..cc11e28 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -14,13 +14,13 @@ This chapter is organized in
     - PM of Dryad: can support DAG computation, limitations: low-level, `Q: Should this go to execution model?`
     - PM of Spark, RDD/lineage: can support iterative algorithm, interactive analytics
   - Large-scale Parallelism on Graphs
-    - PM of Pregel/GraphX
+    - PM of Pregel
   - Querying: more declarative `Q: put here or in the execution model?`
     - DryadLINQ, SQL-like, use Dryad as execution engine;
     - Pig, on top of Hadoop, independent of execution platform, in theory can compiled into DryadLINQ too; what is the performance gain/lost? Easier to debug?
     - Hive, SQL-like, on top of Hadoop, what is the performance gain/lost.
     - Dremel, query natively w/o translating into MP jobs
-    - Spark SQL, on top of Spark
+    - Spark SQL - how is it different from other above models? How does it leverage Spark execution engine and enhanced RDDs like data frames? what are its goals? whats a Dataframe API and how is it different from a RDD?
 
 - Execution Models
   - MapReduce (intermediate writes to disk)