proj-plbook-plChDistributedSystems

Table of Contents for Programming Languages: a survey

Chapter: distributed systems

types of network failures you have to worry about a lot

https://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing

naming

redundancy

durability

idempotency

REST

caching

upgrades

proxy

CAP theorem

The CAP theorem says that you cannot have a database that guarantees consistency and availability and partition tolerance.

What the theorem actually says

Since the meaning of the terms in the theorem are often misunderstood, we provide excerpts from the paper Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services

Partially synchronous model: "every node has a clock, and all clocks increase at the same rate. However, the clocks themselves are not synchronized, in that they may display different values at the same real time...A local timer can be used to schedule an action to occur a certain interval of time after some other event. Furthermore, assume that every message is either delivered within a given, known time..., or it is lost. Also, every node processes a received message within a given, known time..., and local processing takes zero time"

Consistency:

"Atomic [5], or linearizable [4], consistency ...Under this consistency guarantee, there must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time. One important property of an atomic read/write shared memory is that any read operation that begins after a write operation completes must return that value, or the result of a later write operation...Discussing atomic consistency is somewhat different than talking about an ACID database, as database consistency refers to transactions, while atomic consistency refers only to a property of a single request/response operation sequence. And it has a different meaning than the Atomic in ACID, as it subsumes the database notions of both Atomic and Consistent"

Availability: "every request received by a non-failing node in the system must result in a response"

Partition Tolerance: The above definitions of availability and atomicity are qualified by the need to tolerate partitions...When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost"

" It is impossible in the partially synchronous network model to implement a read/write data object that guarantees the following properties:

in all executions (even those in which messages are lost).

It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:

Summary of the proof (of a different but similar theorem in the same paper): "The basic idea of the proof is to assume that all messages between G1 and G2 are lost. Then if a write occurs in G1, and later a read occurs in G2, then the read operation cannot return the results of the earlier write operation."

What it does not say

CAP does not say that no nodes from a distributed consistent database can continue to serve in the case of a network partition. As Nick Lavezzo points out, nodes in a minority partition could simply detect that they have been partitioned and respond to all requests with an error message. If a majority partition exists with writable copies of all data, then clients who can connect to the majority partition can still read and write data. The CAP theorem is simply saying that not ALL nodes can continue to accept updates during a partition if consistency is to be maintained.

The CAP theorem does not say that a distributed database cannot be consistent and available during times when there is no partition.

The CAP theorem does not say that a distributed database must choose, in the event of a partition, C or A for all operations; the choice may be made differently for each operation, and may depend on the user or data involved. And "all three properties are more continuous than binary. Availability is obviously continuous from 0 to 100 percent, but there are also many levels of consistency, and even partitions have nuances, including disagreement within the system about whether a partition exists." ([1]).

The CAP theorem does not say that nodes cannot respond to requests at all, or cannot respond selectively to read requests during a partition. Nodes are still permitted to return error messages, to return stale data, or to accept read requests but refuse writes. As Brewer says, the Availability in CAP means availability for updates.

Links

routing

security

zero-trust global consensus algorithms e.g. bitcoin (but this is getting far away from programming language constructs...)