Adding missing references

author: msabhi <abhi.is2006@gmail.com> 2016-12-11 00:05:46 -0500
committer: GitHub <noreply@github.com> 2016-12-11 00:05:46 -0500
commit: 1181d1ca440e5f74c87193373ec733cac02cdf5c (patch)
tree: 87585dfad0ff20cc38885b83f27960b3133098d0
parent: 9baf00cc2472ecea464a6f2003d34dadb6e73e9a (diff)
1 file changed, 1 insertion, 0 deletions
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index 49a4a0d..38b0691 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -250,6 +250,7 @@ Winding up - we can compare SQL vs Dataframe vs Dataset as below :
 <figure class="main-container">
   <img src="./sql-vs-dataframes-vs-datasets.png" alt="SQL vs Dataframe vs Dataset" />
 </figure>
+*Figure from the website:* https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html
 
 ### 1.3 Large-scale Parallelism on Graphs
 Map Reduce doesn’t scale easily and is highly inefficient for iterative and graph algorithms such as PageRank and many machine learning algorithms. With iterative algorithms, the programmer must explicitly handle intermediate results by writing them to disk. Hence, every iteration reads its input from disk and writes its results back to disk, and this heavy disk I/O is a performance bottleneck for any batch processing system.