| author | msabhi <abhi.is2006@gmail.com> | 2016-12-15 01:35:48 -0500 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2016-12-15 01:35:48 -0500 |
| commit | 7d565b86c499491bc18e5fa1c439744eed056007 (patch) | |
| tree | 99cb69f11eb66ca462175b4abd093654243cbb06 /chapter | |
| parent | adb64f799c47d47804f0faddec29277ce05b5461 (diff) | |
updating query section
Diffstat (limited to 'chapter')
| -rw-r--r-- | chapter/8/big-data.md | 7 |
1 files changed, 0 insertions, 7 deletions
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index 2fd3e59..511c7dd 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -230,13 +230,6 @@ Apart from Sawzal, Pig {%cite olston2008pig --file big-data %} and Hive {%cite
 
 Hive is built by Facebook to organize dataset in structured formats and still utilize the benefit of MapReduce framework. It has its own SQL-like language: HiveQL {%cite thusoo2010hive --file big-data %} which is easy for anyone who understands SQL. Hive reduces code complexity and eliminates lots of boiler plate that would otherwise be an overhead with Java based MapReduce approach.
 
-Relational interface to big data is good, however, it doesn’t cater to users who want to perform
-
-- ETL to and from various semi or unstructured data sources.
-- advanced analytics like machine learning or graph processing.
-
-These user actions require best of both the worlds - relational queries and procedural algorithms. Pig Latin {% cite olston2008pig --file big-data%} and Spark SQL {% cite armbrust2015spark --file big-data%} bridges this gap by letting users to seamlessly intermix both relational and procedural API. Both the frameworks free the programmer from worrying about internal execution model by providing implicit optimization on the user input DAG of transformations.
-
 Pig Latin aims at a sweet spot between declarative and procedural programming. For advanced programmers, SQL is unnatural to implement program logic and Pig Latin wants to dissemble the set of data transformation into a sequence of steps. This makes Pig more verbose than Hive.
 
 SparkSQL though has the same goals as that of Pig, is better given the Spark exeuction engine, efficient fault tolerance mechanism of Spark and specialized data structure called Dataset.
