author William King <william.king@quentustech.com> 2017-10-06 23:22:21 -0700
committer William King <william.king@quentustech.com> 2017-10-06 23:22:21 -0700
commit 20f0986a1dafe8c4fdb83c4355189e4fb0c7e59f
tree bab3f8935b0461422a1ed6841667a2292ca50f3e /chapter
parent b4c1801a9ea3c697e8d622acf262ffc618d945e5
Fixing small typo. bath => batch.
Diffstat (limited to 'chapter')
-rw-r--r-- chapter/9/streaming.md 2
1 files changed, 1 insertions, 1 deletions
diff --git a/chapter/9/streaming.md b/chapter/9/streaming.md
index d805d1d..13abea2 100644
--- a/chapter/9/streaming.md
+++ b/chapter/9/streaming.md
@@ -195,7 +195,7 @@ In short, *Naiad* allows processing of messages from different epochs and aggreg
We now have seen three different systems that can process data stream in large scale, however, each of them are constraint in the way of viewing the dataset. *Storm* can perform stream processing on each tuple, where *Spark streaming* and *Naiad* have their own way of grouping tuples together into small batches before processing. The authors of *Google Dataflow* {% cite akidau2015dataflow --file streaming %} believe that the fundamental problem of those views is they are limited by the processing engine, for example, if you were to use *Spark streaming* to process the stream, you can only group the tuples into small time intervals. The motivation of *Google Dataflow* is then a general underlying system with which the users can express what processing model they want.
-*Google Dataflow* is a system that allows batch, micro-bath and stream processing where users can choose based on the tradeoffs provided by each processing model: latency or resouce constraint. *Google Dataflow* implements many features in order to achieve its goal, and we will briefly talk about them.
+*Google Dataflow* is a system that allows batch, micro-batch and stream processing where users can choose based on the tradeoffs provided by each processing model: latency or resouce constraint. *Google Dataflow* implements many features in order to achieve its goal, and we will briefly talk about them.
*Google Dataflow* provides a windowing model that supports unaligned event-time windows, which helped the users to express how to batch the tuples together in a stream. Windowing slices a dataset into finite chunks for processing as a group, one can think of it as batching as we discussed before. Unaligned windows are the windows that would only be applied to certain tuples during the period, for example, if we have an unaligned window *w[1:00,2:00)(k)*, and only the events with key *k* during the time period [1:00, 2:00) would be grouped by this window. This is powerful since it provides an alternative way of batching tuples other than just time before processing.
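The unaligned window *w[1:00,2:00)(k)* described in the last paragraph can be sketched in a few lines of Python. This is only an illustration of the idea, not the Dataflow API: the `(key, event_time, value)` tuple layout, the function name `events_in_keyed_window`, and the sample data are all assumptions made for this example.

```python
from datetime import datetime

def events_in_keyed_window(events, key, start, end):
    """Select the events that belong to the window w[start, end)(key).

    `events` is an iterable of hypothetical (key, event_time, value) tuples.
    An event is in the window only if its key matches and its event time
    falls in the half-open interval [start, end).
    """
    return [e for e in events if e[0] == key and start <= e[1] < end]

# Hypothetical sample stream; times are event times, not arrival times.
day = (2017, 10, 6)
events = [
    ("k", datetime(*day, 1, 0), "a"),    # key k, at window start: included
    ("k", datetime(*day, 1, 30), "b"),   # key k, inside window: included
    ("j", datetime(*day, 1, 30), "c"),   # other key: excluded
    ("k", datetime(*day, 2, 0), "d"),    # at window end (exclusive): excluded
]

window = events_in_keyed_window(
    events, "k", datetime(*day, 1, 0), datetime(*day, 2, 0)
)
values = [value for _, _, value in window]
```

Because the window is bound to key *k*, events with other keys in the same time range are left for other windows, which is what distinguishes this from batching on time alone.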