From 68b6294cef1fd0f5c4a245ca3206038c824130d8 Mon Sep 17 00:00:00 2001
From: msabhi
Date: Fri, 2 Dec 2016 05:49:56 -0500
Subject: Update big-data.md

---
 chapter/8/big-data.md | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index 1b0fff1..42e68d5 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -177,14 +177,16 @@ In Spark SQL, transformation happens in four phases :
 STILL WORKING ON THIS..
-## Large Scale Graph processing :
-Map Reduce doesn’t scale easily and is highly inefficient for iterative / graph algorithms like page rank and machine learning algorithms. Iterative algorithms requires programmer to explicitly handle the intermediate results (writing to disks). Hence, every iteration requires reading the input file and writing the results to the disk resulting in high disk I/O which is a performance bottleneck for any batch processing system.
- Also graph algorithms require exchange of messages between vertices. In case of PageRank, every vertex requires the contributions from all its adjacent nodes to calculate its score. Map reduce currently lacks this model of message passing which makes it complex to reason about graph algorithms.
- -`Bulk synchronous parallel` model was introduced in 1980 to represent the hardware design features of parallel computers. It gained popularity as an alternative for map reduce since it addressed the above mentioned issues with map reduce to an extent. - - **Bulk synchronous parallel model** - This model was introduced in 1980 to represent the hardware design features of parallel computers. It gained popularity as an alternative for map reduce since it addressed the above mentioned issues with map reduce to an extent.
- In BSP model
+## Large Scale Graph processing
+
+Map Reduce doesn’t scale easily and is highly inefficient for iterative / graph algorithms like page rank and machine learning algorithms. Iterative algorithms requires programmer to explicitly handle the intermediate results (writing to disks). Hence, every iteration requires reading the input file and writing the results to the disk resulting in high disk I/O which is a performance bottleneck for any batch processing system.
+
+Also graph algorithms require exchange of messages between vertices. In case of PageRank, every vertex requires the contributions from all its adjacent nodes to calculate its score. Map reduce currently lacks this model of message passing which makes it complex to reason about graph algorithms.
+
+**Bulk synchronous parallel model**
+
+This model was introduced in 1980 to represent the hardware design features of parallel computers. It gained popularity as an alternative for map reduce since it addressed the above mentioned issues with map reduce to an extent.
+In BSP model
 - Computation consists of several steps called as supersets.
 - The processors involved have their own local memory and every processor is connected to other via a point-to-point communication.
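
Note on the patched text: the BSP description can be made concrete with a small sketch. The Python below is not part of the patch or of any framework. `bsp_pagerank`, its parameters, and the dictionary-based graph are hypothetical names chosen for illustration, and a single process stands in for the separate processors. It treats each loop iteration as one of the steps the text calls "supersets" (usually written "supersteps"): every vertex sends its PageRank contribution to its neighbours as messages, and scores are only updated once all messages have been delivered.

```python
from collections import defaultdict


def bsp_pagerank(graph, supersteps=20, damping=0.85):
    """Illustrative BSP-style PageRank.

    graph: dict mapping each vertex to a list of its out-neighbours.
    Assumes every vertex appears as a key and has at least one out-edge.
    """
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}

    for _ in range(supersteps):
        # Computation + communication phase: each vertex uses only its own
        # local state (its rank) and sends an equal share to each neighbour.
        inbox = defaultdict(float)
        for v, neighbours in graph.items():
            share = rank[v] / len(neighbours)
            for u in neighbours:
                inbox[u] += share

        # Barrier: all messages of this step are delivered before any
        # vertex computes the rank it will use in the next step.
        rank = {v: (1 - damping) / n + damping * inbox[v] for v in graph}

    return rank


if __name__ == "__main__":
    ranks = bsp_pagerank({"a": ["c"], "b": ["c"], "c": ["a", "b"]})
    for vertex, score in sorted(ranks.items()):
        print(vertex, round(score, 4))
```

Because each step reads only the messages produced in the previous step, no intermediate result has to be written to and re-read from disk between iterations, which is the contrast with Map Reduce that the text draws.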