From 1181d1ca440e5f74c87193373ec733cac02cdf5c Mon Sep 17 00:00:00 2001
From: msabhi
Date: Sun, 11 Dec 2016 00:05:46 -0500
Subject: Adding missing references

---
 chapter/8/big-data.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index 49a4a0d..38b0691 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -250,6 +250,7 @@ Winding up - we can compare SQL vs Dataframe vs Dataset as below :
SQL vs Dataframe vs Dataset
+*Figure from the website :* https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html

 ### 1.3 Large-scale Parallelism on Graphs
 Map Reduce doesn’t scale easily and is highly inefficient for iterative / graph algorithms like page rank and machine learning algorithms. Iterative algorithms requires programmer to explicitly handle the intermediate results (writing to disks). Hence, every iteration requires reading the input file and writing the results to the disk resulting in high disk I/O which is a performance bottleneck for any batch processing system.
--
cgit v1.2.3