| author | Jingjing Ren <renjj@ccs.neu.edu> | 2016-12-13 15:44:24 -0500 |
|---|---|---|
| committer | Jingjing Ren <renjj@ccs.neu.edu> | 2016-12-13 15:44:24 -0500 |
| commit | d481dd67059324d25a2af04214905d2bbac55995 (patch) | |
| tree | cdb077cb21292b5f9867ed4c6c8e758e987a0337 | |
| parent | b214b7afb85a61ea6932bdf235062e8f784cc0df (diff) | |
| -rw-r--r-- | chapter/8/big-data.md | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index 111b3a8..f51198f 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -119,7 +119,7 @@ The SiteData example{%cite chambers2010flumejava --file big-data %} shows that a
 ### 1.1.3 Dryad
-Dryad is a more general and flexible execution engine that execute subroutines at a specified graph vertices. Developers can specify an arbitrary directed acyclic graph to combine computational "vertices" with communication channels (file, TCP pipe, shared-memory FIFO) and build a dataflow graph. Compared with MapReduce, Dryad can specify an arbitrary DAG that have multiple number of inputs/outputs and support multiple stages. Also it can have more channels and boost the performance when using TCP pipes and shared-memory. But like writing a pipeline of MapReduce jobs, Dryad is a low-level programming model and hard for users to program, thus a more declarative model - DryadLINQ {%cite yu2008dryadlinq --file big-data %} was created to fill in the gap. It exploits LINQ, a query language in .NET and automatically translates the data-parallel part into execution plan and passed to the Dryad execution engine. Like MR, writing raw Dryad is hard, programmers need to understand system resources and other lower-level details. This motivates a more declarative programming model: DryadLINQ - a querying language.
+Dryad is a more general and flexible execution engine than MapReduce? that execute subroutines at a specified graph vertices. Developers can specify an arbitrary directed acyclic graph to combine computational "vertices" with communication channels (file, TCP pipe, shared-memory FIFO) and build a dataflow graph. Compared with MapReduce, Dryad can specify an arbitrary DAG that have multiple number of inputs/outputs and support multiple stages. Also it can have more channels and boost the performance when using TCP pipes and shared-memory.
But like writing a pipeline of MapReduce jobs, Dryad is a low-level programming model and hard for users to program, thus a more declarative model - DryadLINQ {%cite yu2008dryadlinq --file big-data %} was created to fill in the gap. It exploits LINQ, a query language in .NET and automatically translates the data-parallel part into execution plan and passed to the Dryad execution engine. Like MR, writing raw Dryad is hard, programmers need to understand system resources and other lower-level details. This motivates a more declarative programming model: DryadLINQ - a querying language.
 ### 1.1.4 Spark
@@ -461,7 +461,7 @@ Hence, in Spark SQL, transformation of user queries happens in four phases :
 ## 3. Big Data Ecosystem
 *Hadoop Ecosystem*
-Apache Hadoop is an open-sourced framework that supports distributed processing of large dataset. It involves a long list of projects that you can find in this table https://hadoopecosystemtable.github.io/. In this section, it is also important to understand the key players in the system, namely two parts: the Hadoop Distributed File System (HDFS) and the open-sourced implementation of MapReduce model - Hadoop.
+Apache Hadoop is an open-sourced framework that supports distributed processing of large dataset. It involves dozens of projects, all of which are listed [here](https://hadoopecosystemtable.github.io/). In this section, it is also important to understand the key players in the system, namely two parts: the Hadoop Distributed File System (HDFS) and the open-sourced implementation of MapReduce model - Hadoop.
 <figure class="main-container">
 <img src="./hadoop-ecosystem.jpg" alt="Hadoop Ecosystem" />
