| author | msabhi <abhi.is2006@gmail.com> | 2016-12-11 00:05:46 -0500 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2016-12-11 00:05:46 -0500 |
| commit | 1181d1ca440e5f74c87193373ec733cac02cdf5c (patch) | |
| tree | 87585dfad0ff20cc38885b83f27960b3133098d0 | |
| parent | 9baf00cc2472ecea464a6f2003d34dadb6e73e9a (diff) | |
Adding missing references
| -rw-r--r-- | chapter/8/big-data.md | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index 49a4a0d..38b0691 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -250,6 +250,7 @@ Winding up - we can compare SQL vs Dataframe vs Dataset as below :
 <figure class="main-container">
   <img src="./sql-vs-dataframes-vs-datasets.png" alt="SQL vs Dataframe vs Dataset" />
 </figure>
+*Figure from the website :* https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html

 ### 1.3 Large-scale Parallelism on Graphs
 Map Reduce doesn’t scale easily and is highly inefficient for iterative / graph algorithms like page rank and machine learning algorithms. Iterative algorithms requires programmer to explicitly handle the intermediate results (writing to disks). Hence, every iteration requires reading the input file and writing the results to the disk resulting in high disk I/O which is a performance bottleneck for any batch processing system.
