Diffstat (limited to 'chapter/9/streaming.md')
-rw-r--r--  chapter/9/streaming.md  12
1 file changed, 10 insertions, 2 deletions
diff --git a/chapter/9/streaming.md b/chapter/9/streaming.md
index d359a60..6ba92a4 100644
--- a/chapter/9/streaming.md
+++ b/chapter/9/streaming.md
@@ -135,11 +135,19 @@ In conclusion, *Storm* was a critical infrastructure at Twitter that powered man
*Storm* long served as the core of real-time analysis at Twitter. However, as the scale of data being processed grew, along with the diversity and number of use cases, many limitations of *Storm* became apparent.
-There are several issues with *Storm* that make using is at Twitter become challenging. The first challenge is debug-bility, there is no clean mapping from the logical units of computation in the topology to each physical process, this makes finding the root cause of misbehavior extremely hard. Another challenge is as the cluster resouces becomes precious, the need for dedicated cluster resources in *Storm* leads to inefficiency and it is better to share resources across different types of systems. In addition, since in *Storm* provisioning a new production topology needs manual isolation of machines, this makes the management process cumbersome. Finally, Twitter needs a more efficient system, simply with the increase scale, any improvement in performance can translate to huge benefit.
+Several issues made using *Storm* at Twitter increasingly challenging. The first is debuggability: there is no clean mapping from the logical units of computation in a topology to the physical processes that run them, which makes finding the root cause of misbehavior extremely hard. Another challenge is resource efficiency: as cluster resources become precious, *Storm*'s need for dedicated cluster resources leads to waste, and it would be better to share resources across different types of systems. In addition, Twitter needs a more efficient system; at this scale, even a small improvement in performance translates into a huge benefit.
Twitter realized that to meet all of these needs it required a new real-time stream processing system, Heron, which is API-compatible with *Storm* and provides significant performance improvements and lower resource consumption, along with better debuggability, scalability, and manageability.
-A key design goal for Heron is compatibility with the Storm API, thus Heron runs topologies, graphs with spouts and bolts like Storm.
+A key design goal for Heron is compatibility with the *Storm* API, so Heron runs topologies: directed graphs of spouts and bolts, just like *Storm*. Unlike *Storm*, though, a Heron topology is translated into a physical plan before execution, and the physical plan consists of several distinct components.
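+
+To make the *Storm* API compatibility concrete, the sketch below shows a minimal word-count topology written against the Storm-style `TopologyBuilder` programming model that Heron accepts. It is an illustrative sketch, not code from Twitter's production topologies: the spout and bolt classes are stand-ins written for this example, and the `org.apache.storm` package names are an assumption about the particular compatibility package in use.
+
+```java
+// Minimal Storm-style word-count topology. Heron's API compatibility means
+// the same spout/bolt/TopologyBuilder model applies; the exact package name
+// of the compatibility layer is assumed here.
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Random;
+
+import org.apache.storm.Config;
+import org.apache.storm.StormSubmitter;
+import org.apache.storm.spout.SpoutOutputCollector;
+import org.apache.storm.task.TopologyContext;
+import org.apache.storm.topology.BasicOutputCollector;
+import org.apache.storm.topology.OutputFieldsDeclarer;
+import org.apache.storm.topology.TopologyBuilder;
+import org.apache.storm.topology.base.BaseBasicBolt;
+import org.apache.storm.topology.base.BaseRichSpout;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Tuple;
+import org.apache.storm.tuple.Values;
+
+public class WordCountTopology {
+
+  // Spout: emits one random word per call; a stand-in for a real data source.
+  public static class WordSpout extends BaseRichSpout {
+    private SpoutOutputCollector collector;
+    private final String[] words = {"heron", "storm", "stream", "tuple"};
+    private final Random rand = new Random();
+
+    public void open(Map conf, TopologyContext ctx, SpoutOutputCollector collector) {
+      this.collector = collector;
+    }
+
+    public void nextTuple() {
+      collector.emit(new Values(words[rand.nextInt(words.length)]));
+    }
+
+    public void declareOutputFields(OutputFieldsDeclarer declarer) {
+      declarer.declare(new Fields("word"));
+    }
+  }
+
+  // Bolt: keeps a running count per word and emits (word, count) pairs.
+  public static class CountBolt extends BaseBasicBolt {
+    private final Map<String, Long> counts = new HashMap<>();
+
+    public void execute(Tuple tuple, BasicOutputCollector collector) {
+      String word = tuple.getStringByField("word");
+      long count = counts.merge(word, 1L, Long::sum);
+      collector.emit(new Values(word, count));
+    }
+
+    public void declareOutputFields(OutputFieldsDeclarer declarer) {
+      declarer.declare(new Fields("word", "count"));
+    }
+  }
+
+  public static void main(String[] args) throws Exception {
+    TopologyBuilder builder = new TopologyBuilder();
+    builder.setSpout("word", new WordSpout(), 2);
+    // fieldsGrouping routes all tuples with the same word to the same bolt task.
+    builder.setBolt("count", new CountBolt(), 4)
+           .fieldsGrouping("word", new Fields("word"));
+
+    StormSubmitter.submitTopology("word-count", new Config(), builder.createTopology());
+  }
+}
+```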
+
+Each topology runs as an Aurora job; instead of using Nimbus as the scheduler, Twitter chose Aurora since it is developed and used by other Twitter projects. Each Aurora job consists of several containers. The first container runs the Topology Master, which provides a single point of contact for discovering the status of the topology and also serves as the gateway for topology metrics through an endpoint. Each of the other containers runs a Stream Manager, a Metrics Manager, and a number of Heron Instances. The key job of each Stream Manager is to route tuples efficiently: all Stream Managers are connected to one another, and tuples exchanged between Heron Instances in different containers travel through their Stream Managers, so a Stream Manager can be viewed as a super-node for communication. The Stream Manager also provides a backpressure mechanism that dynamically adjusts the rate at which data flows through the network; for example, if the Stream Managers on the bolt side are overwhelmed, they notify the Stream Managers on the spout side to slow down, ensuring all data is properly processed. A Heron Instance carries out the real work of a spout or a bolt; unlike a worker in *Storm*, each Heron Instance runs only a single task as a process, and in addition to performing the work it also collects a variety of metrics. The metrics collected by Heron Instances are sent to the Metrics Manager in the same container and on to the central monitoring system.
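+
+As an illustration of the spout-side reaction to backpressure, the sketch below shows one way a Stream Manager could pause and resume its local spouts when downstream Stream Managers report that they are overwhelmed. The class and method names here are invented for this example and the protocol is heavily simplified; this is not Heron's actual implementation.
+
+```java
+// Simplified, hypothetical sketch of spout-side backpressure handling.
+import java.util.HashSet;
+import java.util.Set;
+
+public class SpoutBackpressureHandler {
+  // Downstream Stream Managers that have signalled "start backpressure".
+  private final Set<String> congestedPeers = new HashSet<>();
+  private boolean spoutsPaused = false;
+
+  // A bolt-side Stream Manager reports that its buffers are filling up.
+  public synchronized void onStartBackpressure(String peerId) {
+    congestedPeers.add(peerId);
+    if (!spoutsPaused) {
+      spoutsPaused = true;
+      pauseLocalSpouts();   // stop reading tuples from local spout instances
+    }
+  }
+
+  // The same Stream Manager reports that its buffers have drained.
+  public synchronized void onStopBackpressure(String peerId) {
+    congestedPeers.remove(peerId);
+    if (spoutsPaused && congestedPeers.isEmpty()) {
+      spoutsPaused = false;
+      resumeLocalSpouts();  // resume reading so the data flow picks back up
+    }
+  }
+
+  private void pauseLocalSpouts() { /* stop consuming from spout connections */ }
+  private void resumeLocalSpouts() { /* resume consuming from spout connections */ }
+}
+```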
+
+Because the components of a Heron topology are clearly separated, failures at different levels are handled differently. For example, if the Topology Master dies, its container restarts the process and the standby Topology Master takes over as master, while the restarted one becomes the new standby. When a Stream Manager dies, it is restarted in the same container; after rediscovering the Topology Master, it fetches the physical plan and checks whether any changes need to be made to its state. Other failures are handled by Heron in a similarly graceful way.
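+
+The Topology Master failover described above can be pictured as a small leader-election step. The sketch below is a hypothetical illustration with invented names, not Heron's code; in practice the coordination would go through a shared state store such as ZooKeeper rather than an in-process interface.
+
+```java
+// Hypothetical sketch of standby promotion when the Topology Master dies.
+public class TopologyMasterFailover {
+
+  // Minimal election primitive; a stand-in for a real coordination service.
+  interface MasterElection {
+    boolean tryBecomeMaster(String candidateId);  // true if this candidate wins
+  }
+
+  private final MasterElection election;
+  private final String myId;
+  private boolean isMaster = false;
+
+  public TopologyMasterFailover(MasterElection election, String myId) {
+    this.election = election;
+    this.myId = myId;
+  }
+
+  // Invoked by the standby (or a freshly restarted process) when it detects
+  // that the current master is gone.
+  public void onMasterLost() {
+    if (election.tryBecomeMaster(myId)) {
+      isMaster = true;        // the standby takes over as the active master
+      serveTopologyStatus();  // start answering status and metrics requests
+    }
+    // The process that loses the election stays (or becomes) the standby,
+    // mirroring the role swap described in the paragraph above.
+  }
+
+  private void serveTopologyStatus() { /* expose the topology status endpoint */ }
+}
+```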
+
+Heron addresses the challenges of *Storm*. First, each task is performed by a single Heron Instance, and the different functionalities are separated into distinct components, which makes debugging far clearer. Second, the provisioning of resources is abstracted away, which makes it easier to share infrastructure with other systems. Third, Heron exposes a rich set of metrics along with the backpressure mechanism, which together make it possible to reason precisely about, and achieve, a consistent rate of delivering results.
+
+*Storm* has been decommissioned, and Heron is now the de facto streaming system at Twitter. Notably, after migrating all the topologies to Heron, there was an overall 3x reduction in hardware. Not only does Heron reduce the infrastructure needed, it also outperforms *Storm*, delivering 6-14x improvements in throughput and 5-10x reductions in tuple latencies.
### Spotify