aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authormsabhi <abhi.is2006@gmail.com>2016-12-11 00:07:18 -0500
committerGitHub <noreply@github.com>2016-12-11 00:07:18 -0500
commitf492fcc86e98074561e494466a107f08d7bc26b0 (patch)
treec229dce116774d6c122ef513382329307b9818f9
parent1181d1ca440e5f74c87193373ec733cac02cdf5c (diff)
Update big-data.md
-rw-r--r--chapter/8/big-data.md4
1 files changed, 2 insertions, 2 deletions
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index 38b0691..fc4e8f2 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -284,14 +284,14 @@ Graph-parallel computation requires every vertex or edge to be processed in the
<figure class="main-container">
<img src="./edge-cuts.png" alt="edge cuts" />
</figure>
-
+*Figure from {%cite xin2013graphx --file big-data%}*
***Why Edge-cuts are expensive ?***
Edge-cuts for partitioning requires random assignment of vertices and edges across all the machines. hus the communication and storage overhead is proportional to the number of edges cut, and this makes balancing the number of cuts a priority. For most real-world graphs, constructing an optimal edge-cut is cost prohibitive, and most systems use random edge-cuts which achieve appropriate work balance, but nearly worst-case communication overhead.
<figure class="main-container">
<img src="./vertex-cuts.png" alt="Vertex cuts" />
</figure>
-
+*Figure from {%cite xin2013graphx --file big-data%}*
***Vertex-cuts - GraphX’s solution to effective partitioning*** : An alternative approach which does the opposite of edge-cut — evenly assign edges to machines, but allow vertices to span multiple machines. The communication and storage overhead of a vertex-cut is directly proportional to the sum of the number of machines spanned by each vertex. Therefore, we can reduce communication overhead and ensure balanced computation by evenly assigning edges to machines in way that minimizes the number of machines spanned by each vertex.
The GraphX RDG structure implements a vertex-cut representation of a graph using three unordered horizontally partitioned RDD tables. These three tables are as follows: