Distributed Systems and the End of the API

(writings.quilt.org)

209 points · mcms · 12y ago · 94 comments

71 comments · 18 top-level

At least the author is wise enough to see that REST/RPC are not so different from each other.

I actually find it interesting that, as I was learning about earlier networked "objects" type systems, programmers ran into problems where they were treating the networked objects as if they were local and as if the network always worked. Now, when we build REST APIs, they always ship with client libraries that feel like local objects and completely abstract away most notions of network failure, etc.

I'm not saying we've made an unreasonable tradeoff, it's just interesting that we seem to be making more refined versions of the same solutions with the same fundamental problems.

I guess the author was making a similar point.

steveklabnik12y ago

"Layman's REST" is very much RPC, yes.

Fielding's REST is very much not.

mantrax512y ago

Fielding's REST is pretty much CRUD in HTTP disguise.

Don't get me wrong, this can be great for "hypermedia applications" as Fielding's paper argues. But "hypermedia applications" just doesn't fit what many distributed services do these days.

Services are naturally centered around verbs (commands and queries) and not nouns (resources), so like with any other CRUD system, at some point a REST API that shoehorns everything into the four standard verbs HTTP commonly gives us no longer adequately describes the business requirements of your app. You can definitely force things to be RESTful, but it's typically not the natural way to build an API. Feels akin to the ORM kind of impedance mismatch in some ways.

steveklabnik12y ago

I agree that many services are simply CRUD wrappers. That doesn't have much to do with the nature of the architecture Fielding proposes.

I would be interested in some citations from Fielding which demonstrate that RPC is its organizational principle. I don't think they're there, though.

ademarre12y ago

The problem with REST is that many (I dare say most) who think they are applying it really aren't.

I think this is quite relevant: http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hyperte...

What sets REST apart from RPC is the Hypermedia-as-the-Engine-of-Application-State principle (HATEOAS). With HATEOAS, interaction semantics are removed from URIs and defined in terms of link relations. This decouples clients from URIs; very valuable.

ADD: To equate REST with CRUD overlooks HATEOAS completely. Simple CRUD solutions work on resources that have already been identified, but HATEOAS adds resource identification/addressing and discovery in a very maintainable way.
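A minimal sketch of the decoupling described above, assuming a hypothetical service whose representations carry a `links` map keyed by relation name. The URLs, the relation names, and the `fetch` stand-in are all invented for illustration; the point is only that the client hard-codes an entry point and relation names, never the intermediate URIs.

```python
# Hypothetical representations served by an imaginary API.
# Every URL and relation name here is made up for the example.
RESPONSES = {
    "/": {"links": {"orders": "/v2/orders"}},
    "/v2/orders": {"items": [], "links": {"create": "/v2/orders/new"}},
}


def fetch(url):
    """Stand-in for an HTTP GET that returns the parsed representation."""
    return RESPONSES[url]


def follow(start_url, *rels):
    """Navigate from start_url by link relation names instead of fixed URIs."""
    url = start_url
    for rel in rels:
        url = fetch(url)["links"][rel]
    return url


create_url = follow("/", "orders", "create")
print(create_url)
```

If the server later moves the order collection to another path, only the `links` values in its responses change; a client written this way keeps working as long as the relation names stay stable.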

bodhi12y ago

So, honest question. I've been musing over REST and RPC for a while, and was trying to come up with some domains that were verb-oriented instead of noun-oriented. The only thing I could come up with was message passing, a-la XMPP or streaming content.

What are some other problem domains that are better represented in verb-oriented terminology?

icedchai12y ago

Exactly this. I've seen people doing crazy, nonsensical stuff to make their APIs "RESTful". So RESTful that they make no sense.

dreamfactory212y ago

> Services are naturally centered around verbs (commands and queries) and not nouns (resources)

Can you qualify that? Queries are presumably queries on resources, and which commands did you have in mind apart from creating or updating resources?

grey-area12y ago

I think the reason people limit verbs (or nouns) usually comes down to attempting to limit the gestalt which others coming new to the system have to hold in their head - if I know that you serve n resources, each of which has 4 well understood verbs, it's much easier to reason about than if I must know the verbs which go with each object and what they do to that particular object in your world.

Of course the real world and real systems will never conform to this sort of system, and you have to break out of it occasionally, but sometimes it's a good starting point as long as you don't let it limit the horizons of your world, such that only 4 verbs should be enough for anyone, or verbs become completely subsidiary to nouns and must be escorted by them at all times. Zealotry based on this perfectly reasonable idea (limiting complexity to promote understanding) often leads to a Kingdom of Nouns situation:

http://steve-yegge.blogspot.co.uk/2006/03/execution-in-kingd...

scotth12y ago

Aside: Why do you have so many accounts mantrax? At least one of them is dead, and I'm sure you can imagine why.

dgreensp12y ago· 7 in thread

As the author of EtherPad I'm familiar with CRDT, which is a cousin of OT. They don't really replace APIs, unless you are using an API to synchronize data, which is only one of many things you might be trying to do.

In other words, if you're building EtherPad or Wave, use a fancy data structure for the collaborative document. Otherwise, don't. Meteor's DDP provides a nice model, where the results of RPCs stream in asynchronously.

cemerick12y ago

Hi, author here. I'm not sure you read the whole piece. :-) (Modern) APIs are a very limited mechanism of state transfer that happens to be paired with often side-effecting operations. Thus, a "synchronization" (I don't think that word is particularly useful because reasons) mechanism paired with reactive computational services _does_ replace APIs, and offers the ability to do much, much more.

OTs (operational transforms) _are_ a related precursor to CRDTs only in that they are both ways to reconcile concurrent changes, but that's really the limit of the connection. Unfortunately, the substrate for OTs (text, integer-indexed sequences of characters) is fundamentally not amenable to commutative operations. This makes implementing OTs _very_ difficult and error-prone, and certain combinations of concurrent operations are completely unreconcilable (a result that came out of a Korean group's study, can't find the cite for it right now).

_urga12y ago

I think the paper you are referencing might be [1]?

It's one of my favorite papers on CRDTs and provides practical pseudocode for learning how to implement CRDTs yourself.

The structures they present are simple to understand and have good performance characteristics compared to similar CRDTs [2].

A key insight from the second paper is to write CRDTs that optimize for applying remote operations over applying local operations, as the ratio of remote operations to local operations will be greater. For example, 100 clients each making 1 change to a CRDT will require all 100 clients to each apply 99 remote operations and 1 local operation.

[1] Replicated abstract data types: Building blocks for collaborative applications - http://dl.acm.org/citation.cfm?id=1931272

[2] Evaluating CRDTs for Real-time Document Editing - http://hal.archives-ouvertes.fr/docs/00/62/95/03/PDF/doce63-...

cemerick12y ago

The cite I'm missing at the moment is a multi-year study that catalogued all known operational transforms over text (there were many more than I imagined prior), along with proofs showing that certain combinations of concurrent operations simply could not be reconciled consistently.

Thanks for the other pointers, though!

dgreensp12y ago

There's actually an interesting deeper connection between OT and CRDT, in which OT comes across as a special case of CRDT.

Suppose your state is a text document or array of characters (we could also examine other kinds of state like an unordered set of objects with properties, but it's less interesting). CRDT assigns a semi-permanent name to each unique data element (character), which is typically a string that indexes into a tree. It's permanent unless the names get too long, in which case you rebalance the tree. The papers I've read treat the rebalancing as an offline operation, to be done one day at 3am when no one is using the system, but in principle you could do it online, as long as you save enough information to rewrite the names in any operations you receive that were meant for the old tree to apply to the new tree. OT is equivalent to rebalancing the tree after every operation. You don't actually need a tree, then, and the names are just numbers (in the case of an array). Names are scoped to a revision, and operations are always rewritten to use the appropriate names before applying them.

Another maintenance operation you might do on a CRDT tree is to remove "garbage" (deleted elements, which you keep around so that you can perform insertion operations relative to them). OT always deletes garbage immediately, and operations that refer to a deleted element are rewritten (when they are transformed against the operation that deleted the element).

I'm not saying one is better than the other. People seem to have an easier time wrapping their heads around CRDT, but maybe just because OT hasn't been explained well. The CRDT tree and name strings sound like kind of a pain to implement versus OT's arrays, but I've only implemented OT and not CRDT.
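To make the "permanent name" idea concrete, here is a toy sketch. It is not Treedoc or Logoot as published: each character is named by a `(position, site)` pair, where `position` is a fraction between its neighbours, and a delete leaves a tombstone behind. The class and method names are hypothetical, and the sketch assumes an insert is always delivered before any delete that refers to it.

```python
from fractions import Fraction


class Sequence:
    """Toy sequence CRDT: every character keeps a permanent (position, site) name."""

    def __init__(self):
        self.elements = {}  # name -> [char, is_deleted]

    @staticmethod
    def name_between(left, right, site):
        """Make a name that sorts between two existing names (None = document edge)."""
        lo = left[0] if left else Fraction(0)
        hi = right[0] if right else Fraction(1)
        return ((lo + hi) / 2, site)

    def insert(self, name, char):
        # Re-delivery of the same insert is a no-op.
        self.elements.setdefault(name, [char, False])

    def delete(self, name):
        # Keep the element as a tombstone ("garbage") instead of removing it.
        self.elements[name][1] = True

    def text(self):
        ordered = sorted(self.elements.items())
        return "".join(char for _, (char, dead) in ordered if not dead)


a, b = Sequence(), Sequence()
h = Sequence.name_between(None, None, "A")
for replica in (a, b):
    replica.insert(h, "h")

# Concurrent edits: site A appends "i" while site B appends "!".
i = Sequence.name_between(h, None, "A")
bang = Sequence.name_between(h, None, "B")

# The two replicas receive the operations in opposite orders.
a.insert(i, "i")
a.insert(bang, "!")
b.insert(bang, "!")
b.insert(i, "i")
print(a.text(), b.text())
```

Both replicas end up with the same text because ordering depends only on the names, not on arrival order. The fractions grow with every insert between close neighbours, which is the "names get too long" problem that rebalancing addresses.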

Saying that APIs are a "mechanism of state transfer" is as overbroad as saying function calls are a mechanism of state transfer. The article at first seems to provide itself an out, by saying that only a certain class of APIs is being considered, but then it defines API as a "set of names." Similarly, you say that any application touching more than one computer is a distributed system, and then you preemptively defend against exceptions by saying, "If this doesn't apply to you, maybe you don't have a distributed system."

More concretely, APIs do a lot of stuff. They send and receive text messages and emails; they transcode video; they turn on your coffee maker; they post to your Facebook wall. Often there is little or no shared representation, except perhaps the status of the operation, which can typically be communicated in a simple way.

Don't get me wrong, I think more APIs could work by synchronizing state. Basically, use something equivalent to a git repo under the hood. Gmail could work this way. Maybe mail servers could even work this way.

Posting to a Facebook wall doesn't work this way. The way to make posting to a Facebook wall use CRDT would be to replace API calls like addPost and deletePost (say) with a single API call "updateWall" which performs arbitrary operations on a user's wall. Thanks to CRDT, this operation never fails (though the client may still want to know when it has completed). In casual conversation at Meteor, we call it the "Lotus Notes" model when all operations go through the data layer, which synchronizes over the network. Asana's internal framework also uses this model, so a couple Meteor devs who worked at Asana have experience with it. The main drawback is that it is difficult to perform validation and security checks. If the Facebook API only has "updateWall," Facebook must determine whether the diff it receives constitutes a valid operation or series of operations for user A to perform on user B's wall (for example, you can add any number of posts to anyone's wall, but only delete posts off your own). This is much more complicated than having addPost and deletePost, each with the appropriate security checks, and knowing that no other operations are permitted.
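A rough sketch of the validation step described above, assuming the incoming diff has already been decomposed into a list of `(kind, post_id)` operations. That decomposition is the hard part and is not shown; the function name and the tuple shape are illustrative, not any real Facebook or Meteor API.

```python
def authorize_diff(actor, wall_owner, ops):
    """Accept a diff only if every operation in it is allowed for `actor`.

    `ops` is a list of (kind, post_id) tuples; this shape is an assumption.
    """
    for kind, _post_id in ops:
        if kind == "add":
            continue  # anyone may add posts to anyone's wall
        if kind == "delete" and actor == wall_owner:
            continue  # only the owner may delete posts off their own wall
        return False  # forbidden delete, or an operation we do not recognise
    return True


print(authorize_diff("alice", "bob", [("add", 1), ("add", 2)]))
print(authorize_diff("alice", "bob", [("add", 1), ("delete", 7)]))
```

Rejecting anything unrecognised is what stands in for "knowing that no other operations are permitted" when there is a single entry point.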

To abolish The API completely like you say, you'd have to not just have updateWall but basically one, unnamed API call for all of Facebook, and then you could say there's no API.

cemerick12y ago

A lot of different distributed storage and computation architectures are special cases of CRDTs, just with different sets of commutative operations and/or convergent types of state. (One of the aspects of CRDTs that I most appreciate, as it provides a framework within which one can compare different technologies in a thoroughgoing way.) Ones I like to cite as common examples that people have often touched before are datastores like Riak, CouchDB, and S3.
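For readers who have not seen a convergent type, the grow-only counter is one of the simplest state-based CRDTs in the literature. The sketch below is illustrative only (the function names are arbitrary, and it says nothing about how Riak, CouchDB, or S3 are implemented): each replica increments its own slot, and merging takes the per-replica maximum, which is commutative, associative, and idempotent.

```python
def increment(state, replica):
    """Return a new state in which `replica` has counted one more event."""
    new_state = dict(state)
    new_state[replica] = new_state.get(replica, 0) + 1
    return new_state


def merge(a, b):
    """Join two states by taking the per-replica maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(state):
    """The counter's value is the sum over all replicas."""
    return sum(state.values())


x = increment(increment({}, "n1"), "n1")  # replica n1 counted twice
y = increment({}, "n2")                   # replica n2 counted once
print(value(merge(x, y)), merge(x, y) == merge(y, x))
```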

The document model treatment you describe is talked about some in the Shapiro et al. paper as a "continuous sequence", and is roughly what was used by Logoot and Treedoc. The latter is explored more thoroughly here: http://arxiv.org/abs/0907.0929.

I was only talking about network APIs in the original piece. The "set of names" bit was there to establish the lineage between "classic" programming language/library APIs and those that touch the network.

APIs themselves do exactly nothing. It is the computational service on the other side of an API that does something. This conflation is exactly the sort of thing that is allowed and encouraged by the construction of APIs as "just another function you call in your runtime".

I find the Facebook examples you offer very curious. APIs have no inherent model for authentication and authorization, and the same goes for CRDTs. So, why do you think that verifying authorization over a set of operations or set of modifications to some state is any different than verifying authorization on N operations attempted via N API endpoints? I'll certainly grant that the latter comes with a body of current programming practice and infrastructure, but that's hardly an endorsement of its relative quality or suitability for the job-to-be-done.

My preferred characterization is that the Facebook API would be replaced with a data model. The original piece already hints at a number of advantages to such an architecture, and omits many others that I'll talk about at a later date.

gritzko12y ago

I'm the author (the leading one) of Yandex Live Letters, which is a CRDT-based EtherPad-like thing. Some flavours of CRDT are indeed related to OT. My favorite technique (pure op-based CRDT variant) is very much operation-centric, but instead of transformations (like in OT), it employs per-operation Lamport identifiers.
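As a hedged illustration of per-operation Lamport identifiers (this is not the actual design of Swarm or Live Letters, and the class and field names are invented), the sketch below tags every operation with a `(counter, replica_name)` pair and keeps, per key, the value written by the operation with the highest identifier. A last-writer-wins map is used only because it is the simplest structure that shows the idea.

```python
class Replica:
    """Toy op-based replica: each operation carries a Lamport identifier."""

    def __init__(self, name):
        self.name = name
        self.clock = 0
        self.state = {}  # key -> (op_id, value)

    def local_set(self, key, value):
        """Create an operation locally, apply it, and return it for broadcast."""
        self.clock += 1
        op = ((self.clock, self.name), key, value)
        self.apply(op)
        return op

    def apply(self, op):
        """Apply a local or remote operation; arrival order does not matter."""
        op_id, key, value = op
        self.clock = max(self.clock, op_id[0])  # Lamport clock update
        if key not in self.state or self.state[key][0] < op_id:
            self.state[key] = (op_id, value)


a, b = Replica("a"), Replica("b")
op_a = a.local_set("title", "draft")   # concurrent with the next line
op_b = b.local_set("title", "final")
a.apply(op_b)
b.apply(op_a)
print(a.state == b.state)
```

Ties on the counter are broken by replica name, so the two replicas agree on a winner without transforming either operation.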

Based on our new project named Swarm [1], I may say that CRDT and "async RPC" fit rather nicely together.

OT indeed behaves poorly in a highly asynchronous environment. I suspect that is the reason why Google Docs doesn't have a decent offline mode yet. CRDT (any flavor) is async-friendly.

[1] http://slideshare.net/gritzko/swarm-34428560

_urga12y ago

I think operational transformation is more of a predecessor to CRDTs than a cousin, and OT simply does not work offline, whereas CRDTs do.

ChuckMcM12y ago· 6 in thread

Fun stuff. It's amusing that the definition of a distributed system used, "Where a computer that you never heard of can bring your system down," is actually one of Leslie Lamport's more famous quotes.

When I joined Sun in '86 I thought it was the pinnacle of technological excellence to be a kernel programmer, and I joined the Systems Group, the notional center of the Sun universe, in 1987. However, I discovered that the primary reason you had to be picky about kernel programmers was that their bogus pointer references crashed the machine (as they occurred in kernel mode with full privileges), but then discovered that network programmers could crash the whole world with their bugs. So clearly they must be in a pantheon above kernel programmers. :-)

The author has come to discover that in the network world things can die anywhere, and this makes reasoning about such systems very complicated. Having been a part of the RPC and CORBA evolution I keenly felt the challenges of making APIs that "looked" like function calls to a programmer but took place across a network fabric and thus introduced error conditions that couldn't exist in locally called routines. (like the inability to return from the function due to a network partition for a simple example).
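The error condition mentioned above, a call that never returns, can be sketched without a real network. In this illustration `slow_add` stands in for a remote procedure and `delay` simulates latency or a partition; the names and the `("unknown", None)` result shape are assumptions made for the example.

```python
import concurrent.futures
import time


def slow_add(a, b, delay):
    """Stand-in for a remote procedure; `delay` simulates the network."""
    time.sleep(delay)
    return a + b


def rpc(fn, *args, timeout):
    """Run fn on a worker thread and stop waiting after `timeout` seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args)
    try:
        return ("ok", future.result(timeout=timeout))
    except concurrent.futures.TimeoutError:
        # The caller cannot tell whether the work ran, is still running,
        # or never started. A local function call has no such outcome.
        return ("unknown", None)
    finally:
        pool.shutdown(wait=False)


print(rpc(slow_add, 1, 2, 0.0, timeout=1.0))
print(rpc(slow_add, 1, 2, 0.3, timeout=0.05))
```

The second call shows the third outcome that a function-call-shaped API tends to hide: neither a return value nor an exception raised by the callee, just silence.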

Lamport's work in this space is brilliant and inspired. Network systems can be analysed and reasoned about as physical systems when they exhibit discontinuities when considered as simple algorithms. The value here is to realize that a large number of physical systems tolerate a tremendous amount of randomness and continue to work as intended (windmills for example) while many algorithms only work consistently given a set of key invariants.

I gave a talk that was inspired by Dr. Lamport's work titled 'Java as Newtonian Physics', which was a call to action to create a set of invariants, in the spirit of physical laws, that would govern the behavior and capabilities of distributed systems. It was way early for its time (AOL dialup connections were still a thing) but much of the same inspiration (presumably from Lamport) made it into the Google Spanner project.

As with many things, at a surface level many people learn an API which does something under the covers across the network but having come up through their education thinking of everything as an API they don't fundamentally grasp the notion of distributed computation. Then at some point in their experience there will be that 'ah ha' moment when suddenly everything they know is wrong, which really means they suddenly see a bigger picture of things. It makes distributed systems questions in interviews an excellent litmus test for understanding where people are in their journey.

jacquesm12y ago

I've never seen an RPC system that I really liked. The closest to a model of distributed computing that gets me from 'a' to 'b' without going terminally insane is anything based on message passing. Even though there is significant overhead I figure that by the time you go distributed and your target of the RPC call or message lives on the other side of a barrier with unknown latency that overhead is probably low compared to the penalties that you'll be hit with anyway.

So then the trick becomes to make sure that a message contains a payload that is 'worth it'.

Making the assumption that any message may not make it to its destination and that confirmations may be lost (akin to your return example) is still challenging but I find it easier to reason about than in the RPC analogy.
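One common way to cope with lost confirmations is to give each message an identifier and make the receiver idempotent, so the sender can simply resend until it sees an acknowledgement. The sketch below simulates the lost acks deterministically; `Receiver`, `send_with_retry`, and the `ack_lost_on` parameter are hypothetical names for illustration.

```python
class Receiver:
    """Applies each message at most once, keyed by message id."""

    def __init__(self):
        self.seen = set()
        self.balance = 0

    def handle(self, msg_id, amount):
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.balance += amount
        return "ack"


def send_with_retry(receiver, msg_id, amount, ack_lost_on):
    """Resend until an ack gets through; return the number of attempts.

    `ack_lost_on` is the set of attempt numbers whose ack is dropped,
    which simulates an unreliable return path.
    """
    attempt = 0
    while True:
        attempt += 1
        receiver.handle(msg_id, amount)  # the message itself arrives
        if attempt not in ack_lost_on:   # but its ack may not
            return attempt


receiver = Receiver()
attempts = send_with_retry(receiver, "m1", 10, ack_lost_on={1, 2})
print(attempts, receiver.balance)
```

The message is delivered three times here, yet the effect is applied once, which is what makes blind retries safe.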

I love that Lamport quote :)

A nasty side effect of all this network business is that what looks like a function call can activate an immense cascade of work behind the scenes; gethostbyname (ok, getaddrinfo) is a nice example of such a function. On the surface it's a pretty easily understood affair, but by the time you're done and you get your results back you've likely triggered millions of cycles on 'machines that you've never heard of'.

arethuza12y ago

"I've never seen an RPC system that I really liked."

I must admit I've never seen a message passing system that I really liked either :-) Mind you that's possibly because of times making stuff work in environments where someone made the decision "you shall use message passing for all inter-system communication" even when it wasn't always the best option.

These days my practical test for a remote API is whether I can stand using it through cURL - if I can happily do stuff from the command line then the chances are that code to do stuff won't be too insane.

jacquesm12y ago

I liked QNX, and I'm currently playing around with Erlang. (Erlang has tons of warts, but it gets enough of the moving parts just right that I find it interesting.)

gritzko12y ago

Recently I was talking with a guy doing CRDT research. His past background was something CPU design related. I always considered a CPU a Newton/Turing ideal machine. I was surprised to know that it feels more like a distributed system. Due to high frequencies, events that happen in one part of CPU are unknown to other parts for quite a while, i.e. so many ticks later that they have to act semi-independently.

cemerick12y ago

Hi, author here. :-) Thank you for the fun anecdote and kind words.

Hopefully we can collectively get better at addressing these problems.

ChuckMcM12y ago

Absolutely there is more fun to be had. I clearly remember that sort of "ah ha" moment when I figured out that data structures could be computation. That took me from a loop that could not operate fast enough on the data, to one where the data set had some precomputation done on it and the loop only had to 'finish' it for various conditions and was plenty fast. Suddenly large vistas of "wow" open up. The posting from Julia's blog about how computers are really fast, same sort of experience for her. Suddenly a new understanding, the world shifts, and now you have a whole bunch of new insight to throw at problems. We can't help but get better at addressing problems.
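The comment does not say which data structure was involved, so the following is only an assumed example of the same pattern: a prefix-sum table precomputes partial results once, and each later query just "finishes" the computation with a subtraction instead of looping over the data.

```python
from itertools import accumulate


def build_prefix(values):
    """Precompute running totals; prefix[i] is the sum of values[:i]."""
    return [0, *accumulate(values)]


def range_sum(prefix, lo, hi):
    """Sum of values[lo:hi] in constant time, using the precomputed table."""
    return prefix[hi] - prefix[lo]


values = [3, 1, 4, 1, 5]
prefix = build_prefix(values)
print(range_sum(prefix, 1, 4), sum(values[1:4]))
```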

I believe it was Leslie but it might have been Butler Lampson who mentioned you could stomp on a bunch of ants and the colony still worked fine. Ants are a great example of a durable distributed system that is robust in the face of massive amounts of damage. When you start thinking about computers like that it makes you realize you can build 100% uptime systems after all. The implementation of that property (individual machines are junk, collectively they are unstoppable) was done really well inside Google's infrastructure. They got to watch it in action when a colo facility they had clusters in caught fire.

logn12y ago· 6 in thread

REST is less platform dependent than SOAP/RPC. I see that as the main benefit. JSON is easier to work with than XML. The whole idea of service oriented architectures is that users don't need to care about the tech stack details of your service. REST and JSON do a better job of realizing that vision than SOAP/XML. I don't think anyone's claimed that REST is a design pattern to end all woes. Maybe we haven't given due thought to what new design patterns (or data structures, architectures, etc) are emerging these days, and in that light, the article presents a lot of interesting pointers.

virmundi12y ago

Actually JSON makes it damn near impossible to uniformly implement REST. You need HATEOAS. This means that there has to be a semantic of following links in resources. JSON lacks this ability. ATOM or RSS, both XML, have linking. Heck, XML at a language level supports document linking. HTTP + JSON != REST.

logn12y ago

Interesting. I looked into HATEOAS more and found this link to a PayPal API:

https://developer.paypal.com/docs/integration/direct/paypal-...

I think I'll try to follow REST better. Anyhow, I still prefer JSON. To me it's a more concise way to explain data, with nice syntax for arrays and maps. I'd rather jump through a few hoops to implement true REST than try to craft verbose XML schemas for simple things like arrays and maps. ATOM and RSS I think are good for their main use cases but aren't as generic a syntax as JSON.

icebraining12y ago

That's what JSON-LD and other formats are for. As a mediatype, application/json is certainly useless for REST, but there's nothing wrong with using it as an encoding for more semantically relevant formats.

jalfresi12y ago

Whilst I agree with you, there is nothing stopping someone defining link structures in JSON documents, coining a new media type e.g. JSON+Link and boom, problem solved.

virmundi12y ago

Well, I'm responding to you and to your siblings. You're right. You can create a new MIME type and the problem is solved. Fortunately, as the sibling comments pointed out, there is an extension. What I want to see happen is that JSON + Links becomes a standard, ideally a W3C one. Otherwise we're into the old XKCD comic: https://xkcd.com/927/

deathtrader66612y ago

If you want document linking, doesn't JSON-LD [1] work?

1 - http://json-ld.org/

richm4412y ago· 6 in thread

I stopped at the point where he claimed that APIs were always synchronous; this wasn't even true in the 80s. For example, Xlib is a rather well-used API and is asynchronous (there are many others).

koide12y ago

Two paragraphs later he addresses that point, calling that support limited in current API designs.

richm4412y ago

Not really, he talks about HTTP, which wasn't really designed for that purpose. There are plenty of protocols that were. Does this actually have anything to add that isn't covered in http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Comput... ? If so, then I'll read further.

pjscott12y ago

It has some very interesting stuff about CRDTs, which is definitely worth a look.

dllthomas12y ago

My recollection was that Xlib is an unfortunately synchronous library written for the wonderfully asynchronous X protocol.

bitwize12y ago

My understanding was that the X protocol's asynchrony was hardly wonderful, and made syncing with vblank and pixel-perfect frames difficult, which motivates its abandonment in favor of the synchronous, local-host-only Wayland protocol.

dllthomas12y ago

Certainly, the X protocol's asynchronous nature isn't without some downsides (though I think you could address the vblank sync without discarding it). However, I maintain that Xlib itself was a synchronous interface in front of an asynchronous protocol - which gives us the worst of both worlds and motivated Xcb.

mantrax512y ago· 5 in thread

So he favors exposing a standard set of distributed data models instead of having APIs.

What a horrible idea.

Exposing implementations is bad because implementations change.

Exposing implementations is bad because as you expose the intricacies of your data model to your client (which he claims is a benefit) you in turn obscure and hide the intricacies of your business domain, which will surely not allow you to patch a service's distributed data tree in an arbitrary fashion.

It's in essence like having SQL as your underlying data model, and replacing your API with an open read/write/delete access to your SQL server to the entire world, and hoping everyone will run the right queries and all will be all right.

It won't be all right.

APIs will become more asynchronous and eventually all APIs will be seen as protocols, that don't necessarily follow a simple request/response pattern.

But they'll remain in the form of abstract commands and queries modeled after the business domain of the application, and not the underlying data model of it.

derefr12y ago

> It's in essence like having SQL as your underlying data model, and replacing your API with an open read/write/delete access to your SQL server to the entire world, and hoping everyone will run the right queries and all will be all right.

I find it kind of amusing that this was the original purpose of having an "SQL server": letting people (e.g. auditors) submit arbitrary queries, so you won't have to anticipate what exactly they'll want to do with your data. (Write-access was intended to be segregated to particular database users writing to particular tables, though--basically parallel to using WebDAV with HTTP Basic Auth.)

mantrax512y ago

It was, yes, and to this day read-only SQL access to certain tables is not that bad of a practice to allow for report-generating apps within a company.

However the idea of exposing SQL databases publicly as an approach never took hold for many reasons we're today aware of. And the idea of public write access is ridiculous right from its premise.

The anti-API rant of this author shows us that those who don't know their history are doomed to repeat it.

cemerick12y ago

Hi, author here.

APIs already necessitate the use of "standard sets of data models", except such "standardization" takes place over and over for each provider of a particular type of service. Further, APIs themselves have incompatible changes that flow from their underlying transport mechanisms (changing URLs, etc).

Right now, you're sharing data with "clients" that end up depending upon the particular details of that data and its (probably impoverished) representation. IMO, might as well own up to it and address that instead of thinking that you're building anything other than siloed services that demand a high degree of client-server coupling.

Changing data models is a fact of life. I'd much rather have a data medium that accounted for that from the start than a set of folklore about which services accept which data, and in what formats. "Patching" of extant data is not necessary (though certainly possible, depending on all sorts of factors); things like views are hardly new, and can be leveraged at every level of the system to match old (or new!) shapes of data with services that expect new (or old!) data.

You say that "APIs will become" something. Their defining feature is their manifestation in our programming languages and libraries, not their semantics with regard to the network. Network APIs have been kicking around for 30-40 years, web APIs for 20 years now. I don't think we should expect much new at this point. I'd rather look towards approaches that have something substantial to say about the fundamental problems in question.

Shorel12y ago

I believe it is not having SQL as your data model; it is having a Git repository as your data model.

Git has the lattice eventual consistency the article talks about.

About opening the data to the entire world, just thin about GitHub public repositories.

The missing issue is how to deal with merge conflicts.

Shorel12y ago

Errata: thin should be think

kylebrown12y ago· 4 in thread

Distributed APIs are a big part of Ethereum. I think the Merkle tree of the bitcoin blockchain (and the Patricia tree of the Ethereum blockchain) might even qualify as a semilattice.

In fact, it's by the physics of information theory that a cryptographic blockchain solves the consensus problem. Specifically, information theory emerges from the laws of thermodynamics: Maxwell's demon is essentially what secures one's private keys from brute-force cracking attempts.

I'd like to see a comparison of how the blockchain solves the CAP problem, alongside CRDTs. Are they not both solutions to the same problem?

marktangotango12y ago

FYI information theory entropy and physics entropy really aren't the same thing:

http://physics.ucsd.edu/do-the-math/2013/05/elusive-entropy/

kylebrown12y ago

Well that seemed excessively pedantic IMHO. It actually didn't touch much on information theory, and where it did, many of the comments disagree. I'll cite the Landauer limit[1] as what (yes, arguably) connects the entropy of information theory to the entropy of physics.[2]

Also, I only mentioned physics because the article did, quoting Lamport "Most people view concurrency as a programming problem or a language problem. I regard it as a physics problem."

Unfortunately the article didn't elaborate any more on the precise type of physics problem in question (maybe Lamport does elsewhere), whether the physics of computational complexity or the physics of information theory, or something else. But even those two sub-fields have many connections and similarities (as does pretty much everything in physics and math; such connections are the bread-and-butter of theoreticians).

1. http://en.wikipedia.org/wiki/Von_Neumann-Landauer_limit

2. http://en.wikipedia.org/wiki/Entropy_in_thermodynamics_and_i...

kylebrown12y ago

Here's an article which discusses Lamport's view: "The physics of distributed information systems"[1]. The first sentence: "This paper aims to present distributed systems as a new (interesting) area of applications of statistical physics, and to make the case that statistical physics can be quite useful in understanding such systems."

It has several mentions of statistical physics, but (curiously) no mentions of entropy. It does however discuss the Byzantine Generals problem, which of course is the problem the bitcoin blockchain solves.

1. http://iopscience.iop.org/1742-6596/473/1/012017/pdf/1742-65...

kylebrown12y ago

Okay, last one. Most concise counter-argument: http://en.wikipedia.org/wiki/Brute-force_attack#Theoretical_...

Derived from the Landauer limit.
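The arithmetic behind that argument can be sketched as follows, assuming a 128-bit key, a temperature of 300 K, and at least one irreversible bit flip per key tried. The function name and defaults are illustrative, and the estimate ignores real hardware, which is many orders of magnitude less efficient than the limit.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, exact value in the SI definition


def landauer_energy(bits, temperature_k=300.0):
    """Lower bound in joules for 2**bits irreversible bit flips at this temperature."""
    energy_per_flip = BOLTZMANN * temperature_k * math.log(2)
    return (2 ** bits) * energy_per_flip


print(f"{landauer_energy(128):.2e} J")
```

At 300 K this comes out on the order of 10^18 joules just to step a counter through every 128-bit value, before doing any work to test each key.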

jwingy12y ago· 2 in thread

CALM and CRDTs are interesting stuff.

That being said, I feel like the author is conflating the specific implementations of modern APIs with the concept of an API, which I see as simply some (somewhat standardized) interface to a system which you don't own. Those seem like two different problem domains to me, but perhaps I'm arguing over a different definition of APIs than the one the author is talking about....

cemerick12y ago

Hi, author here.

(Somewhat standardized) interfaces are _fine_. My contention is that you can have an interface shared by disparate actors without the problematical bits of "APIs" (both in spirit and in their particular current best materializations), which provide no useful data model constraints, do not acknowledge the realities of the network, and inherently couple client and server.

The point is that you can have a shared "interface" over _data_, in exactly the same way as producers and consumers share shapes/types of messages routed via queues — except that there are ways (CRDTs being one) to extend that dynamic so that data can be replicated along any topology, and shared and reacted to by N actors, not just a consumer downstream of your producer.

I hope that clarifies. :-)

mantrax512y ago

I think you got it exactly right. It feels like the author got a little too excited about CRDTs and forgot all the other principles of good system design (clear, stable interfaces, low coupling, single responsibility, and so on).

ddp12y ago· 2 in thread

I believe it was Leslie Lamport who said, "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."

cmeiklejohn12y ago

The reference is here:

http://research.microsoft.com/en-us/um/people/lamport/pubs/d...

cemerick12y ago

Dammit, thank you. I've even read that exact message before. :-/ Post updated!

cmeiklejohn12y ago· 1 in thread

I also covered a variety of similar issues when discussing that offline rich-web applications are perfect for CRDTs, because you are effectively building a distributed system, in my EmberConf 2014 talk [1][2] called Convergent/Divergent.

[1] http://confreaks.com/videos/3311-emberconf2014-convergent-di...

[2] https://speakerdeck.com/cmeiklejohn/divergent

* edited to reformat list.

cmeiklejohn12y ago

Also, very relevant:

"A Note on Distributed Computing" 1994, Sun Microsystems Technical Report

* http://lambda-the-ultimate.org/node/1450

* http://dl.acm.org/citation.cfm?id=974938

baldeagle12y ago· 1 in thread

TL;DR: APIs have issues with concurrency and latency, amongst others. Use Consistency As Logical Monotonicity (CALM) or Conflict-free Replicated Data Types (CRDTs) instead. Here is a little about how CRDTs work. btw: speech in NY on the 15th of May.

lstamour12y ago

Thanks for the summary. Via Google, ended up at http://www.slideshare.net/jboner/the-road-to-akka-cluster-an... which I've bookmarked to watch at a future date, since I'm new to these concepts.

josephschmoe12y ago· 1 in thread

Many API users are not knowledgeable to the intricacies of network programming. This could definitely use an executive summary at the top with the following info: "What will these new libraries that replace APIs offer to single point API users?"

cemerick12y ago

(Author here.) As a start, hopefully removing the blinders that make statements like "many API users are not knowledgeable to the intricacies of network programming" oh so true. That that statement can be made without popular incredulity only reinforces the point that modern network API technologies have largely been built to sustain the illusion that there is no network, and you're just making a method call somewhere. Insert this-plt-life GIF here. :-P

Building systems with things like CRDTs and tools and languages that support CALM will allow people using point-to-point APIs to continue to do the things they do now, but remove much of the incidental complexity from the equation. An example would be that, when you are relying upon N replication mechanisms to move CRDT state or operations from _here_ to _there_, you don't need complex timeout, retry, and backoff mechanisms to compensate for the realities of what's connecting the two parties. The message will arrive when it arrives, which is exactly the only guarantee that you can make in the general case of someone talking to external services.
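
To make the replication point concrete, here is a minimal sketch of a state-based grow-only counter (G-Counter), one of the simplest CRDTs. It is an illustration under assumptions, not code from the article or from any particular library: the class and method names are hypothetical, and the "network" is just snapshots passed around by hand. Because `merge` takes a per-replica maximum, applying the same snapshot twice, or applying snapshots in a different order, gives the same result, so a lost, duplicated, or late message needs no special handling beyond eventually sending state again.

```python
from typing import Dict


class GCounter:
    """State-based grow-only counter; names here are illustrative only."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def snapshot(self) -> Dict[str, int]:
        # A copy of local state, suitable for shipping to other replicas.
        return dict(self.counts)

    def merge(self, remote: Dict[str, int]) -> None:
        # Per-replica max is commutative, associative, and idempotent.
        for replica, count in remote.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


if __name__ == "__main__":
    here, there = GCounter("here"), GCounter("there")
    here.increment(2)
    there.increment(3)

    early = here.snapshot()
    here.increment(1)
    late = here.snapshot()

    # Deliver out of order and with a duplicate: newer state first, then stale, then again.
    for message in (late, early, late):
        there.merge(message)
    here.merge(there.snapshot())

    print(here.value(), there.value())
```

Both replicas end up reporting the same total regardless of delivery order, which is the property that lets the transport be "send it again sometime" instead of a carefully tuned retry policy.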

anuraj12y ago· 1 in thread

This is a network programmer's view. System and network programmers concern themselves with systems and topologies. For an application programmer, both need to be abstracted away, and only the business logic is important. The money is at the top of the pyramid now, hence the proliferation of APIs.

cemerick12y ago

The point of the piece, in large part, was to emphasize that we're all network programmers. If you're whacking away with APIs and couldn't care less about the broader system and its topology, please stay away from me. ;-)

APIs are proliferating because their coupling around client/server process and data representation makes for high switching costs and thus sweet vendor lock-in.

dreamfactory212y ago· 1 in thread

Hmm, the article seems to be based on some false assumptions. I'd argue that the whole point of REST as an architectural style is to be stateless and async. Of course you would use an ESB of some kind rather than point-to-point if you want to protect yourself from failure of a solution component; REST lends itself well to that, or to building error handling into the client. And isn't 'turning operations into data' what we are doing by switching from a verb-based model to a noun-based one?

cemerick12y ago

Hi, author here. REST has its set of semantics, but (a) I don't think they're particularly useful for building computational services with, and (b) it's for all practical purposes predicated on HTTP, which carries a lot of baggage. Each _request_ is stateless (barring things like sessions^H^H^H^H^H hack workarounds), but clients and servers certainly are not; and, how one maintains that state and orchestrates further REST interactions based on an intermediate response is entirely on the implementer, _every single time_ a service or client is built/used.

I'd personally much prefer communication and computational primitives that can just as easily be used for a point-to-point interaction as they can be used to _build_ an ESB (enterprise service bus, I believe you mean?) if that's what I want.

I don't think nouns vs. verbs are a useful distinction. Turning operations into data is a first step, but all data is not equivalent. Some representations lend themselves to composition such that you can represent essentially arbitrary structures (sets, graphs, trees, multimaps, etc), but most (including the common ones of JSON and XML) do not. Likewise, some data representations allow for commutative operations so as to reconcile concurrent actors' activity, but most (again including JSON and XML) do not.
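
A small sketch of what "commutative" buys you here, with the caveat that it is only an illustration: the function names and the tag data are invented, and plain overwrite stands in for how a typical JSON document update behaves. The first merge replaces whole values, so the result depends on which concurrent update lands last. The second treats each value as a grow-only set and composes those sets inside a map, so merging in either order yields the same structure.

```python
from typing import Dict, FrozenSet

TagMap = Dict[str, FrozenSet[str]]


def overwrite_merge(current: dict, incoming: dict) -> dict:
    """Last-writer-wins per key, as with a naive JSON document update."""
    return {**current, **incoming}


def union_merge(current: TagMap, incoming: TagMap) -> TagMap:
    """Map of grow-only sets: merge key by key using set union."""
    keys = current.keys() | incoming.keys()
    return {
        key: current.get(key, frozenset()) | incoming.get(key, frozenset())
        for key in keys
    }


if __name__ == "__main__":
    # Two actors concurrently tag the same (hypothetical) post.
    alice: TagMap = {"post-1": frozenset({"funny"})}
    bob: TagMap = {"post-1": frozenset({"insightful"}), "post-2": frozenset({"spam"})}

    # Order matters for overwrite: one actor's tag for post-1 is lost either way.
    print(overwrite_merge(alice, bob) == overwrite_merge(bob, alice))

    # Order does not matter for the union-based merge.
    print(union_merge(alice, bob) == union_merge(bob, alice))
```

Grow-only sets are about the weakest useful example; supporting removals needs more machinery (tombstones or unique tags per add), which is where the CRDT literature referenced elsewhere in this thread comes in.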

ryanobjc12y ago· 1 in thread

CRDTs are absolutely fascinating, but sometimes I really wonder. It seems like you throw words like 'semi-lattice' around ...

Also, there is one particular element of eventual consistency that bothers me: all these eventually consistent algorithms aren't how high-powered neural nets will work. Our brain is highly eventually consistent, but it computes without ever needing these algorithms.

bm136212y ago

I think you're taking the neural net _model_ of the brain too strictly.

iadapter12y ago

Of course APIs that serve as synchronous endpoints to distributed systems are a leaky abstraction. But it's not the only one of its kind; there's also Guaranteed Message Delivery [1].

I find the philosophy behind Akka in this context a better fit - embrace that networks are unreliable and build your app around this limitation accordingly [2]. The cost is that it results in more work for the developer just like with the usage of CRDTs.

[1] http://www.infoq.com/articles/no-reliable-messaging

[2] http://doc.akka.io/docs/akka/2.1.0/general/message-delivery-...


sagargv12y ago

Joel Spolsky has written along similar lines, arguing that it's important to know what is happening beneath abstractions.

http://www.joelonsoftware.com/articles/LeakyAbstractions.htm...

rooted12y ago

I think a distributed system is better defined as a system where timing becomes an issue for the coordination of its components.
