author    James Larisch <james@jameslarisch.com>  2016-12-15 23:08:03 -0500
committer James Larisch <james@jameslarisch.com>  2016-12-15 23:08:03 -0500
commit    c97cb3b91165525b1cb0c3273ad1edc59f9f2bd9
tree      5b2dd79043f720413b91d6ea9ecc02c26a3e2806  /chapter/7/langs-consistency.md
parent    1b7f186b0ce9dd0014e021331f70dc22bb5798ff
Small fixes
Diffstat (limited to 'chapter/7/langs-consistency.md')
-rw-r--r--  chapter/7/langs-consistency.md  6
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/chapter/7/langs-consistency.md b/chapter/7/langs-consistency.md
index b8f013f..78162bc 100644
--- a/chapter/7/langs-consistency.md
+++ b/chapter/7/langs-consistency.md
@@ -6,7 +6,7 @@ by: "James Larisch"
# Formal, Yet Relaxed: Models for Consistency
## What's the problem?
- As processors become expensive and the limits of Moore's Law are pushed, programmers find themselves in situations where they need to connect multiple computers together using a network cable. Perhaps it's not even due to cost or performance constraints; perhaps your company has servers in New York and San Francisco, and there is some global state that requires synchronization across the country. Problems requiring solutions of this nature can be described as "distributed systems" problems. Your data / processing power / entry points are distributed for some reason. In many ways, web developers deal with distributed systems problems every day: your client and your server are in two different geographical locations, and thus, some coordination is required.
+ In many ways, web developers deal with distributed systems problems every day: your client and your server are in two different geographical locations, and thus, some coordination between computers is required.
As Aviral discussed in the previous section, many computer scientists have done a lot of thinking about the nature of distributed systems problems. As such, we realize that it's impossible to completely emulate the behavior of a single computational machine using multiple machines. For example, the network is simply not as reliable as, say, memory - and waiting for responses can result in a lack of timeliness for the application's client. After introducing the Consistency/Availability/Partition-tolerance (CAP) theorem, Section 6 discussed how we can drill down into the CAP pyramid and choose the properties of our systems. As stated, we can't perfectly emulate a single computer using multiple machines, but once we accept that fact and learn to work with it... there are plenty of things we *can* do!
@@ -35,7 +35,7 @@ Turns out there's a company out there called Amazon.com - and they've been havin
## Dynamo
Amazon built DynamoDB, which is basically a big distributed hash table. In other words, it's a hashmap spread across multiple computers. A user's cart would be stored as a value, with the user's username as the key. When a user adds a new item to her cart, the cart data is replicated across multiple machines within the network. If the client changes locations and performs another write, or a few machines fail and later recover, it's possible for different machines to have different opinions about the state of a given user's cart.
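To make that "hashmap spread across multiple computers" concrete, here is a minimal Python sketch of key placement. The node names are made up, and the hash-modulo placement rule is a deliberate simplification rather than Dynamo's actual consistent-hashing ring - the point is only that the key determines which machines hold the value, and that each key lives on several of them.

```python
import hashlib

# Hypothetical cluster: four storage machines, each key replicated on three.
NODES = ["node-a", "node-b", "node-c", "node-d"]
REPLICAS = 3

def nodes_for_key(key):
    """Pick which machines store the value for `key` (simplified placement)."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    start = digest % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]

print(nodes_for_key("james"))   # three machines, chosen by the hash of the key
```

One logical write to a user's cart fans out to several physical machines - which is exactly why those machines can later disagree about what the cart contains.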
-Dynamo has a rather unique way of dealing with these types of conflicts. Since Dynamo always wants to be available for both writes and reads (add/removes, viewing/checkouts, resp) it must have a way of combining inconsistent data. Dynamo chooses to perform this resolution at read time. When a client performs a `get()` on the user's cart, Dynamo will take the multiple conflicting carts...aaaaaand... push it all up to the application! Huh? I thought Dynamo resolves this for the programmer!? Actually, Dynamo is a generic key-value store. It detects inconsistencies in the data - but once it does, it simply tells the application (in this case the application is the shopping cart code) that there are some conflicts. The application (shopping cart, in this case) is free to resolve these inconsistencies as it pleases.
+Dynamo has a rather unique way of dealing with these types of conflicts. Since Dynamo always wants to be available for both writes and reads (adds/removes and views/checkouts, respectively), it must have a way of combining inconsistent data. Dynamo chooses to perform this resolution at read time. When a client performs a `get()` on the user's cart, Dynamo will take the multiple conflicting carts and push them all up to the application! Huh? I thought Dynamo resolved this for the programmer!? Actually, Dynamo is a generic key-value store. It detects inconsistencies in the data - but once it does, it simply tells the application (in this case, the shopping cart code) that there are some conflicts. The application is free to resolve these inconsistencies as it pleases.
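As a rough sketch of that division of labor, imagine a client interface where `get()` can hand back several conflicting versions plus an opaque context, and the application merges them and writes the result back. The interface names below are hypothetical - this is the shape of the read-reconcile-write pattern, not Dynamo's literal API:

```python
def read_cart(store, username):
    """Fetch a cart; if the store returns several conflicting versions,
    reconcile them in application code and write the result back."""
    versions, context = store.get(username)   # may return >1 conflicting carts
    if len(versions) == 1:
        return versions[0]
    merged = reconcile(versions)              # application-specific merge policy
    store.put(username, merged, context)      # write back the resolved cart
    return merged

def reconcile(carts):
    # One possible shopping-cart policy: keep anything that appears in any
    # replica's version of the cart (a union of the item sets).
    merged = set()
    for cart in carts:
        merged |= set(cart)
    return merged
```

Note what a union-style merge buys and costs you: nothing a customer added is ever lost, but an item deleted on one replica can resurface after the merge - a trade-off the Dynamo paper itself acknowledges.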
How should Amazon's shopping cart proceed with resolution? It may be fed two cart states like so:
@@ -59,7 +59,7 @@ Green Umbrella
It's important to understand that Amazon has multiple machines storing the contents of your cart. These machines communicate asynchronously to tell each other about updates they've received. Conflicts like this can happen when you try to read before the nodes have had time to gossip about your cart. More likely, however, is the situation in which one of the machines holding your cart goes offline and misses some updates. When it comes back online, you try to read, and this resolution process must occur.
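How does a machine even know that two carts *conflict*, rather than one simply being newer than the other? Dynamo tracks versions with vector clocks. The sketch below (with illustrative replica names, not Dynamo's internal code) shows the comparison: if neither clock has seen everything the other has, the writes were concurrent, and both versions get surfaced to the application.

```python
def descends(a, b):
    """True if clock `a` has seen everything clock `b` has (a >= b)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    if descends(a, b) and descends(b, a):
        return "identical"
    if descends(a, b):
        return "a supersedes b"       # b can be discarded
    if descends(b, a):
        return "b supersedes a"       # a can be discarded
    return "concurrent"               # a real conflict: surface both to the app

clock_a = {"replica1": 2, "replica2": 1}   # cart updated twice via replica1
clock_b = {"replica1": 1, "replica3": 1}   # an offline replica3 took a write
print(compare(clock_a, clock_b))           # -> "concurrent"
```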
### Good & Bad
-What do we love about Dynamo? It's a highly available key-value store. It replicates data well, and according to the paper, has an insanely high uptime and low latency. We love that it's *eventually consistent*. Nodes are constantly gossiping, so given enough time (and assuming failures are resolved), nodes' states will eventually converge. However, this property is *weak*. It's weak because when failures+conflicts occur, and [and they will occur](https://www.youtube.com/watch?v=JG2ESDGwHHY), it's up to the application developer to figure out how to handle it. In the case of the shopping cart, it's relatively trivial. But as a programmer, every time you'd like to use DynamoDB you need to consider your resolution strategy. The database doesn't provide a general solution.
+What do we love about Dynamo? It's a highly available key-value store. It replicates data well, and according to the paper, has high uptime and low latency. We love that it's *eventually consistent*. Nodes are constantly gossiping, so given enough time (and assuming failures are resolved), nodes' states will eventually converge. However, this property is *weak*. It's weak because when failures and conflicts occur, [and they will occur](https://www.youtube.com/watch?v=JG2ESDGwHHY), it's up to the application developer to figure out how to handle them. In the case of the shopping cart, it's relatively trivial. But as a programmer, every time you'd like to use DynamoDB you need to consider your resolution strategy. The database doesn't provide a general solution.
Instead of constructing an all-purpose database and forcing the burden of resolution on programmers, what if we constructed general-purpose data structures that required no manual resolution? These data structures would resolve conflicts inherently, and depending on your application, you could choose which data structure works best for you.
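One family of such structures is often called conflict-free replicated data types (CRDTs). The simplest example is a grow-only set, sketched below with made-up cart items: merging two replicas is just a set union, and because union is commutative, associative, and idempotent, every replica converges to the same state no matter how or how often merges happen.

```python
class GSet:
    """A state-based grow-only set: replicas merge by set union."""
    def __init__(self, items=()):
        self.items = set(items)

    def add(self, item):
        self.items.add(item)

    def merge(self, other):
        # Conflict-free merge: no application-specific resolution code needed.
        return GSet(self.items | other.items)

# Two replicas of the same cart diverge while a node is offline...
replica_a = GSet({"Red Hat", "Green Umbrella"})
replica_b = GSet({"Red Hat", "Blue Scarf"})

# ...and converge automatically when their states meet.
print(replica_a.merge(replica_b).items)   # all three items, no conflict to resolve
```

A grow-only set can't express removals, which obviously matters for a shopping cart; richer structures in the same family handle removal, but the convergence argument is the same.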