author msabhi <abhi.is2006@gmail.com> 2016-11-17 16:45:45 -0500
committer GitHub <noreply@github.com> 2016-11-17 16:45:45 -0500
commit 5b5b7221244988db67903836ebe6d76194001f23
tree 7e3799a41ea2db737274fb19c0cc2ba092fe035f
parent c4ffeb4613d4cc96d4f520d4f57be9b22d951c1f
Update big-data.md
-rw-r--r-- chapter/8/big-data.md | 2
1 file changed, 1 insertion, 1 deletion
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index 89e3d6a..0bafaed 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -26,7 +26,7 @@ Pregel’s API provides <br />
+ Combiners reduce the amount of messages passed from multiple vertices to the same destination vertex.
+ Aggregators capture the global state of the graph. A reduce operation combines the value given by every vertex to the aggregator. The combined/aggregated value is passed onto to all the vertices in the next superstep.
+ Fault tolerance is achieved through checkpointing and instructing the workers to save the state of nodes to a persistent storage. When a machine fails, all workers restart the execution with state of their recent checkpoint.
-+ Master and worker implementation : The master partitions graph into set of vertices (hash on vertex ID mod number of partitions) and outgoing edges per partition. Each partition is assigned to a worker who manages the state of all its vertices by executing compute() method and coordinating the message communication. The workers also notifies the master of the vertices that are active for the next superstep.
++ Master and worker implementation : The master partitions graph into set of vertices (hash on vertex ID mod number of partitions) and outgoing edges per partition. Each partition is assigned to a worker who manages the state of all its vertices by executing compute() method and coordinating the message communication. The workers also notifies the master of the vertices that are active for the next superstep.<br />
Pregel works good for sparse graphs. However, dense graph could cause communication overhead resulting in system to break. Also, the entire computation state resides in the main memory.
Apache Giraph is an open source implementation of Pregel in which new features like master computation, sharded aggregators, edge-oriented input, out-of-core computation are added making it more efficient. The most high performance graph processing framework is GraphLab which is developed at Carnegie Melon University and uses the BSP model and executes on MPI.