author     msabhi <abhi.is2006@gmail.com>    2016-12-05 10:25:56 -0500
committer  GitHub <noreply@github.com>       2016-12-05 10:25:56 -0500
commit     b9f699d22cc89fdee96d257ed9a65137327103ca (patch)
tree       c625374908efe2c91a197d7387d08b87244b2175 /chapter/8
parent     2256ae1da929d709d12e1d5e7a13ba948a2a9b45 (diff)
Update big-data.md
Diffstat (limited to 'chapter/8')
-rw-r--r--  chapter/8/big-data.md  1
1 file changed, 1 insertion, 0 deletions
diff --git a/chapter/8/big-data.md b/chapter/8/big-data.md
index 25e0119..e812f54 100644
--- a/chapter/8/big-data.md
+++ b/chapter/8/big-data.md
@@ -185,6 +185,7 @@ Several of these operators like === for equality test, > for greater than, a ri
A cache() operation on the DataFrame lets Spark SQL store the data in memory so that it can be reused by iterative algorithms and interactive queries. In the case of Spark SQL, the memory footprint is considerably smaller because it applies columnar compression schemes such as dictionary encoding and run-length encoding.
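A minimal sketch of caching, assuming a hypothetical Parquet dataset at `/data/events.parquet` with a `status` column (these names are illustrative, not from the chapter):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("cache-sketch").getOrCreate()

// Hypothetical input path and column names, used only for illustration.
val events = spark.read.parquet("/data/events.parquet")

// cache() marks the DataFrame for in-memory, columnar-compressed storage;
// the first action materializes it, and later queries read from memory.
events.cache()
events.filter(events("status") === "error").count()   // materializes the cache
events.groupBy("status").count().show()               // served from the cached columns
```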
The DataFrame API also supports inline UDF definitions without complicated packaging and registration. Because UDFs and queries are both expressed in the same general-purpose language (Python or Scala), users can rely on standard debugging tools.
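As a rough illustration of an inline UDF in Scala (the `users` DataFrame and its `name` column are assumptions, not from the chapter):

```scala
import org.apache.spark.sql.functions.udf

// Define the UDF inline; no separate packaging or registration step is needed.
val initial = udf((name: String) => name.take(1).toUpperCase)

// Apply it like any other column expression. Because the lambda is plain Scala,
// an ordinary debugger or a println inside it works as usual.
users.select(users("name"), initial(users("name")).as("initial")).show()
```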
+
However, a DataFrame lacks type safety. In the example above, attributes are referred to by string names, so the compiler cannot catch errors in them. If an attribute name is incorrect, the error is only detected at runtime, when the query plan is created.
Spark introduced an extension to DataFrame called ***Dataset*** to provide this compile-time type safety. It embraces an object-oriented programming style and adds a feature termed Encoders. Encoders translate between JVM representations (objects) and Spark's internal binary format. Spark's built-in encoders are very advanced in that they generate bytecode to interact with off-heap data and provide on-demand access to individual attributes without having to deserialize an entire object.
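A sketch of the contrast, assuming a hypothetical `people.json` file and a `Person` case class (path, class, and field names are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("dataset-sketch").getOrCreate()
import spark.implicits._                         // brings the built-in Encoders into scope

case class Person(name: String, age: Long)

val df = spark.read.json("/data/people.json")    // untyped DataFrame (Dataset[Row])
// df.select("agee")                             // typo in the string still compiles;
                                                 // it fails only at runtime, at plan analysis

val people = df.as[Person]                       // typed Dataset[Person] via an Encoder
// people.map(_.agee)                            // the same typo is rejected at compile time
people.filter(_.age > 21).map(_.name).show()     // field names checked by the compiler
```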