An Out of Date CAP

  • Comments posted to this topic are about the item An Out of Date CAP

  • Good article.

    I'm certainly one of those guys looking to expand into the distributed platform waters with Hadoop (particularly HDFS, HBase, Hive and Impala). I'm not looking to replace the traditional RDBMS (SQL Server, in my case). I'm looking to build on the core capabilities of my current stack, adding more support for distributed processing and lowering the cost per terabyte.

    Along that journey, I have to consider availability, consistency and partition tolerance (e.g. network failure, node failure). And I do have to make some trade-offs compared to other systems, such as accepting eventual consistency, even with no guarantee that the data will actually become consistent. It all comes down to what is acceptable (or tolerable) and what is not. There is no perfect solution per se.

    At the end of the day, the trade-off is tolerable in order to accomplish tasks that a single system cannot handle well, such as computation over billions of records on a daily basis.

  • CAP isn't really out of date; it's a statement of fact. Fine, it may not be a problem in all cases, but that doesn't invalidate it.

    If I go to an ATM I expect my balance to be correct, not approximate.

    If I'm doing a regression analysis on a billion rows, missing a few is probably fine, but that's a business decision, not a technology decision.

  • I've never heard of the CAP Theorem before. Very interesting reading!! I can see its application. And I'm quite impressed that it could be proven rigorously. Thanks for writing this article Steve.

    Happy #BackToTheFuture Day!!

    Kindest Regards, Rod

  • Daniel Abadi has put forward an interesting alternative model for thinking about some of the kinds of trade-offs for which people commonly bring up the CAP theorem: FIT (Fairness, Isolation, and Throughput). The basic idea as I understand it is that you can have two of the following three things:

    -Treat transactions equally and process them in the order received

    -Freedom from inconsistent data and race conditions

    -Ability to process a high volume of transactions concurrently, free from lengthy locking or commit protocols

    The following paper gives more details:

    http://sites.computer.org/debull/A15mar/A15MAR-CD.pdf#page=12

    Abadi has also been working on a database/scheduler system called Calvin that I find promising. Calvin uses an up-front scheduler that generates non-deterministic values and determines a specific serial transaction ordering before sending those transactions to the distributed storage engine. The system sacrifices Fairness in order to maintain consistency and Throughput without the need for expensive commit protocols. More details here:

    http://cs-www.cs.yale.edu/homes/dna/papers/determinism-vldb10.pdf

    I really like Calvin's approach and hope that the idea takes off.
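
    The idea described above (fix the order and any non-deterministic values up front, then let every replica execute the same log) can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not Calvin's actual code: `LogEntry`, `sequence`, `replay` and the "bonus"/"fee" request kinds are hypothetical names invented for the example.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = Dict[str, int]


@dataclass(frozen=True)
class LogEntry:
    seq: int      # position in the agreed serial order
    account: str
    delta: int    # fully resolved amount; no randomness left to evaluate


def sequence(requests: List[Tuple[str, str]], rng: random.Random) -> List[LogEntry]:
    """Hypothetical up-front scheduler.

    Assigns each request a sequence number and resolves the one
    non-deterministic input (a random bonus amount) exactly once,
    before anything reaches the storage layer.
    """
    log = []
    for seq, (account, kind) in enumerate(requests):
        delta = rng.randint(1, 10) if kind == "bonus" else -5
        log.append(LogEntry(seq, account, delta))
    return log


def replay(log: List[LogEntry], initial: State) -> State:
    """Apply the log in sequence order; the result depends only on the log."""
    state = dict(initial)
    for entry in sorted(log, key=lambda e: e.seq):
        state[entry.account] = state.get(entry.account, 0) + entry.delta
    return state


requests = [("alice", "bonus"), ("bob", "fee"), ("alice", "fee")]
log = sequence(requests, random.Random(0))

# Two replicas receive the same log (one of them out of order) and
# still converge on the same state without talking to each other.
replica_a = replay(log, {"alice": 100, "bob": 100})
replica_b = replay(list(reversed(log)), {"alice": 100, "bob": 100})
```

    Because every replica's result is a pure function of the log, the replicas do not need a commit protocol to agree with each other; the cost is that the scheduler, not arrival time at each node, decides who goes first.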

  • I'd never heard of CAP. It seems to me to be sort of an expansion of ACID, and it makes sense to me. But I've never dealt with distributed systems, so that's been one headache that I've avoided.

    I'm about to reluctantly wade into NoSQL waters. A new project at our school that's contracted out is going to be implemented in MongoDB, so that'll be a learning experience. We'd written the RFP mentioning SQL Server and .NET, but the funding is forcing us into the OSS realm. I discussed PostgreSQL with the vendor at the kickoff meeting, which is a tech that I'd love to learn, but they think Mongo would be a better fit. So that's what I'll learn.

    -----
    Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson

  • Distributed systems are complicated by necessity. Solutions to various competing functional and non-functional requirements often require us to treat what many previously regarded as systems as sub-systems.

    This leads to different components retaining their properties, such as consistency, even though the overall system does not require them. A great example of this is stock levels at online retailers. The stock control sub-system may require accuracy; other sub-systems, however, may work well with an approximation.
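
    A rough sketch of that split, with made-up names (`StockControl`, `StorefrontView`, the `widget` SKU and the low-stock threshold of 5 are all assumptions for illustration): the stock control component is the only one that must be exact, while the storefront reads a snapshot that is allowed to be stale.

```python
from typing import Dict, Iterable


class StockControl:
    """Authoritative component: reservations must never oversell."""

    def __init__(self, on_hand: Dict[str, int]) -> None:
        self._on_hand = dict(on_hand)

    def level(self, sku: str) -> int:
        return self._on_hand.get(sku, 0)

    def reserve(self, sku: str, qty: int) -> bool:
        """Reserve stock only if enough is really available."""
        if qty <= 0 or self.level(sku) < qty:
            return False
        self._on_hand[sku] -= qty
        return True


class StorefrontView:
    """Approximate component: shows a snapshot that may lag behind."""

    LOW_STOCK_THRESHOLD = 5  # arbitrary value chosen for the example

    def __init__(self, source: StockControl) -> None:
        self._source = source
        self._snapshot: Dict[str, int] = {}

    def refresh(self, skus: Iterable[str]) -> None:
        """Copy current levels; called occasionally, not on every sale."""
        for sku in skus:
            self._snapshot[sku] = self._source.level(sku)

    def availability(self, sku: str) -> str:
        qty = self._snapshot.get(sku, 0)
        if qty == 0:
            return "out of stock"
        if qty <= self.LOW_STOCK_THRESHOLD:
            return "low stock"
        return "in stock"


stock = StockControl({"widget": 3})
view = StorefrontView(stock)
view.refresh(["widget"])

sold = stock.reserve("widget", 3)        # exact: succeeds, level is now 0
stale_label = view.availability("widget")  # approximate: still the old snapshot
oversell = stock.reserve("widget", 1)    # exact: refused, nothing left
view.refresh(["widget"])
fresh_label = view.availability("widget")
```

    The storefront can be wrong for a while without harm, because the reservation in the stock control component is what actually decides whether an order goes through.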

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!
