author msabhi <abhi.is2006@gmail.com> 2016-12-08 17:19:13 -0500
committer GitHub <noreply@github.com> 2016-12-08 17:19:13 -0500
commit b6bda137472d20297163ddf001f4f344be563410
tree 26820c37f8300c0de27b80db8013ef04986012f9 /chapter/8
parent 919359282b6c81a5a5fec84a463ed402664808a3
Updated Query section
Diffstat (limited to 'chapter/8')
-rw-r--r-- chapter/8/big-data.md | 6
1 files changed, 4 insertions, 2 deletions
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index 1f98e6b..2afb1c5 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -203,9 +203,11 @@ Relational interface to big data is good, however, it doesn’t cater to users w
- ETL to and from various semi or unstructured data sources.
- advanced analytics like machine learning or graph processing.
-These user actions require best of both the worlds - relational queries and procedural algorithms. Pig Latin and Spark SQL bridges this gap by letting users to seamlessly intermix both relational and procedural API.
+These user actions require best of both the worlds - relational queries and procedural algorithms. Pig Latin and Spark SQL bridges this gap by letting users to seamlessly intermix both relational and procedural API. Both the frameworks free the programmer from worrying about internal execution model by providing implicit optimization on the user input DAG of transformations.
-Pig Latin {% cite olston2008pig --file big-data%} aims at a sweet spot between declarative and procedural programming. For advanced programmers, SQL is unnatural to implement program logic and Pig Latin wants to dissemble the set of data transformation into a sequence of steps. This makes Pig more verbose than Hive. However, Pig offers
+Pig Latin {% cite olston2008pig --file big-data%} aims at a sweet spot between declarative and procedural programming. For advanced programmers, SQL is unnatural to implement program logic and Pig Latin wants to dissemble the set of data transformation into a sequence of steps. This makes Pig more verbose than Hive.
+
+SparkSQL though has the same goals as that of Pig, is better given the Spark exeuction engine, efficient fault tolerance mechanism of Spark and specialized data structure called Dataset.
The following subsections will discuss Hive, Pig Latin, SparkSQL in details.