| author | Jingjing Ren <renjj@ccs.neu.edu> | 2016-12-15 11:02:58 -0500 |
|---|---|---|
| committer | Jingjing Ren <renjj@ccs.neu.edu> | 2016-12-15 11:02:58 -0500 |
| commit | d1ba81f4afc3eece7ade1aeae6e262c6b8a7165e | |
| tree | f574f5a61c12c151cc17e8ab4aff75fc76c56ad6 /chapter | |
| parent | 1e20be80a76ea452d9f9109b6924860e4e1d6f94 | |
update pig
Diffstat (limited to 'chapter')
| mode | file | changes |
|---|---|---|
| -rw-r--r-- | chapter/8/big-data.md | 22 |
1 file changed, 11 insertions, 11 deletions
````diff
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index f800de7..1d08292 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -360,16 +360,6 @@ output = FOREACH big_groups GENERATE category, AVG(good_urls.pagerank);
 ```
 
 
-*Word count implementation in PIG*
-
-```
-Ignore the below
- lines = LOAD 'input_fule.txt' AS (line:chararray);
-words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word;
-grouped = GROUP words BY word;
-wordcount = FOREACH grouped GENERATE group, COUNT(words);
-DUMP wordcount;
-```
 
 *Interoperability* Pig Latin is designed to support ad-hoc data analysis, which means the input only requires a function to parse the content of files into tuples. This saves the time-consuming import step. While as for the output, Pig provides freedom to convert tuples into byte sequence where the format can be defined by users. This allows Pig to interoperate with other existing applications in Yahoo's ecosystem.
 
@@ -379,8 +369,18 @@ DUMP wordcount;
 
 *Debugging Environment* Pig Latin has a novel interactive debugging environment that can generate a concise example data table to illustrate output of each step.
 
-*Limitations* The procedural design gives users more control over execution, but at same time the data schema is not enforced explicitly, so it much harder to utilize database-style optimization.
+*Limitations* The procedural design gives users more control over execution, but at same time the data schema is not enforced explicitly, so it much harder to utilize database-style optimization.
 
+*Word count implementation in PIG*
+
+```
+Ignore the below
+ lines = LOAD 'input_fule.txt' AS (line:chararray);
+words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word;
+grouped = GROUP words BY word;
+wordcount = FOREACH grouped GENERATE group, COUNT(words);
+DUMP wordcount;
+```
 
 ### 1.2.3 SparkSQL :
 
````
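The Pig word-count script that this commit moves can be traced one relation at a time. The Python sketch below is a hypothetical, in-memory illustration of what each statement computes. It is not how Pig executes the script. The names `pig_style_word_count` and `tokenize` and the sample lines are made up for illustration. The sketch assumes the loaded file fits in a Python list. `tokenize` only approximates Pig's `TOKENIZE` by splitting on whitespace, double quotes, commas, parentheses and `*`.

```python
import re
from collections import defaultdict

# Assumption: approximates Pig's TOKENIZE separators (whitespace, double
# quote, comma, parentheses, star); not Pig's actual implementation.
_SEPARATORS = re.compile(r'[\s",()*]+')


def tokenize(line: str) -> list[str]:
    """Hypothetical stand-in for TOKENIZE; drops empty tokens."""
    return [tok for tok in _SEPARATORS.split(line) if tok]


def pig_style_word_count(lines: list[str]) -> dict[str, int]:
    """Mirror the Pig word-count dataflow, one relation per step.

    `lines` plays the role of the relation produced by
    LOAD 'input_fule.txt' AS (line:chararray).
    """
    # words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word;
    words = [word for line in lines for word in tokenize(line)]

    # grouped = GROUP words BY word;  (each key maps to its bag of words)
    grouped: dict[str, list[str]] = defaultdict(list)
    for word in words:
        grouped[word].append(word)

    # wordcount = FOREACH grouped GENERATE group, COUNT(words);
    return {group: len(bag) for group, bag in grouped.items()}


if __name__ == "__main__":
    # DUMP wordcount;  (here: print each (group, count) pair)
    sample = ["the quick fox", "the lazy dog"]  # hypothetical input
    for word, count in pig_style_word_count(sample).items():
        print((word, count))
```

Each comment names the Pig statement that the following Python line stands in for. On Hadoop, Pig would instead compile these statements into MapReduce jobs, and the `GROUP` step would become the shuffle between map and reduce.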
