When I worked at Google I sat down one day across from one of the gRPC engineering leads, who was talking about the things they were doing for the then-current generation of gRPC. I asked if I could ask some questions about it; they agreed, and I then dissected their design in half a dozen ways it would fail, some merely irritating, some critical at scale. They were amazed that I had thought about this topic so deeply, as it was all “state of the art” and I was, nominally, “old.” I pointed out that I had been the ONC RPC architect at Sun in the ‘80s during the first RPC wars, and that while the implementations had changed, messaging as a form of procedure call has some fundamentally bad properties. These challenges manifest in all aspects of RPC, from marshaling data, to rendezvous, to delivery reliability and guarantees. Andy Birrell at DEC SRC and Leslie Lamport had written dozens of papers looking at these challenges in systems small and large. There were literally decades of solid research that the engineer in front of me at the cafeteria that day was re-discovering from first principles.
RPC protocols from Sun, DEC, Microsoft, OSI, SOAP, the IETF, and the Open Group have run at this problem again and again and come up with different solutions, each with its own set of warts. Good for some things, not great for others. But at this point there are plenty of options.
What is missing from Buf’s material is what I might call the “Chesterton’s fence” material that dives into why all of these previous versions were insufficient and how their new version of gRPC will solve all those problems without adding new wrinkles.
I think it is great that they are trying to improve the state of the art, but I would feel better about it if they also demonstrated they understood what had come before.
Maybe I'm just another clueless millennial developer who doesn't understand the history of the 80's or whatever, but I've never been able to understand this claim that RPC is broken. There are a lot of assertions that everyone knows RPC was broken because smart people in the 80's said so, but... no one has ever been able to give me a concrete reason why.
RPC, at least as I've always known it, really just boils down to request/response protocols. You send a request, you get a response. While this is admittedly not the only possible networking pattern, it is the dominant one across almost all distributed systems I've worked with. HTTP itself is request/response -- it's basically the same thing.
All gRPC and Protobuf are doing that is different from HTTP is they're using a binary protocol based on explicitly-defined schemas. The protobuf compiler will take your schemas and generate convenient framework code for you, so you don't have to waste your time on boilerplate HTTP and parsing. And the binary encoding is faster and more compact than text-based encoding like JSON or XML. But this is all convenience and optimization, not fundamentally different on a conceptual level.
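To make the compactness point concrete, here's a toy sketch (not real protobuf tooling, just a hand-rolled varint encoder in the style of the wire format) comparing a single integer field on the wire against its JSON equivalent:

```python
import json

def encode_varint(n: int) -> bytes:
    # Protobuf-style base-128 varint: 7 payload bits per byte,
    # high bit set on every byte except the last.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_int_field(field_number: int, value: int) -> bytes:
    # Field key = (field_number << 3) | wire_type; wire type 0 is varint.
    return encode_varint(field_number << 3) + encode_varint(value)

binary = encode_int_field(1, 150)        # 3 bytes on the wire
text = json.dumps({"id": 150}).encode()  # 11 bytes for the same datum
```

Same datum, roughly a quarter of the bytes; a convenience and an optimization, exactly as described.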
Neither HTTP nor RPC protocols have ever pretended to solve higher-level distributed systems concerns of fault tolerance, network partitions, reliable delivery, etc. Those are things you build on top. You need a basic mechanism to send messages before you can do any of that.
What, exactly, is the magical non-RPC approach of the 80's that we're all missing? Can you explain the alternative?
EDIT: Also, like, the entirety of Google is built out of services that RPC to each other, but the 80's called and said that's wrong? How am I supposed to take this seriously?
"non-RPC", as best I can interpret it, means "broadcasting" useful messages / FYIs without much out-of-band coupling and listening for interesting messages. You don't know who's gonna receive the message, what they'll do with it, "when" they'll act on it.
RPC is inspired by "procedure call" on a single CPU, which is the complete opposite. in a "procedure call" you know exactly the implementation you're gonna get, when it will be executed, etc.
You can find glimpses of this in lots of companies, when there's heavy use of a message bus like Kafka. Protobufs as "messages" instead of mere procedure "call" arguments.
What do you think?
The basic premise is that you specify the interface and use tooling to build skeleton code that makes the code the user writes look like any other code they write, and yet it might magically be running on half a dozen machines.
Of course there is an actual difference between invoking a “procedure call,” which is simply a program counter change on the same stack you had before, and one where the parameters provided are marshaled into a canonical form so that the destination can reliably unmarshal and correctly interpret them; where the step that had been done by the linker resolving one symbol in your binary is now an active agent using yet another protocol at the start of execution to resolve the symbols and plumb in the necessary networking code; and where the execution itself may happen exactly as expected, or happen multiple times without you knowing it has done so, or might not happen at all.
The minimalist camp, of which I consider myself a member, says “No, you can’t make these seamless, they really are just syntactic sugar that lets you specify a network protocol.” In that simple world you acknowledge, and plan for, any part of the process failing. Your code has failure checks and exceptions that deal with “at most once” or “at least once” semantics, and you write functions rather than procedures, idempotent when you can, to minimize the penalty of trying to maintain the illusion of procedure call semantics in what is in fact a network protocol implementation.
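A minimal sketch of what "plan for any part to fail" looks like in code; the function names are hypothetical, and `TimeoutError` stands in for a lost request or reply:

```python
import time

def call_at_least_once(fn, attempts=3, delay=0.0):
    # "At least once" semantics: on a timeout we cannot know whether the
    # remote procedure actually ran, so we retry. That is only safe if fn
    # is idempotent: running it twice must equal running it once.
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except TimeoutError as exc:
            last_exc = exc
            time.sleep(delay)
    raise last_exc

# An idempotent remote operation: "set balance to 100" can be retried
# safely, whereas "add 100 to balance" cannot.
store = {}
def set_balance():
    store["balance"] = 100
    return store["balance"]
```

The caller is forced to confront the failure mode instead of pretending the network call is a local one.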
But there is another camp, and from the material Buf has put out they seem to be in that camp, which is “networking is hard and complicated, but we can make it so that developers don’t need to even know they are going over a network. Just use these tools to describe what you want to do and we’ll do all the rest.”
My experience is that obfuscating what is going on under the hood to lower the cognitive load on developers breaks down when trying to distribute systems. That is especially true for languages that don’t explicitly allow for it. The projects/ideas/companies that have crashed on that reef are numerous.
And there is this part: “All gRPC and Protobuf are doing that is different from HTTP is they're using a binary protocol based on explicitly-defined schemas. The protobuf compiler will take your schemas and generate convenient framework code for you, so you don't have to waste your time on boilerplate HTTP and parsing. And the binary encoding is faster and more compact than text-based encoding like JSON or XML. But this is all convenience and optimization, not fundamentally different on a conceptual level.”
I agree 100% with that statement, and that is exactly what ONC RPC does, and that is exactly what ASN.1 does, and that is exactly what DCE does. That same wheel, again and again. So what I was suggesting originally is that Buf should try to explain what they are doing that these other systems failed to do, and in that explanation acknowledge the reasons this wheel has been re-invented so many times before, and then explain how they think they are going to make a more durable solution that lasts for more than a few years.
May 2020 - $1M Pre-Seed
Sept 2020 - $3.7M Seed
April 2021 - $20.7M Series A
Dec 2021 - $68M Series B
How can so many rounds be condensed so quickly for a business like this? Is the number of Homebrew downloads (37k as advertised on the home page) a metric that can lead to an 18 month ramp to Series B now? I think there's a trend at play here I'd love to hear more about.
(Disclosure: I was the maintainer of Protobuf who put together the first open source release at Google, and I made a small investment in buf early on.)
If someone wants to use json.com to create a company to promote json - DM me.
Shoot me an email, mine's on my HN profile.
If I'm deciphering the parent's comment correctly, probably with an HTTP POST to https://json.com/json {"json": "I love JSON!"}
--- btw @opendomain: the twitter handle on the site is outdated.
I read the company's primary blog post, https://buf.build/blog/api-design-is-stuck-in-the-past, about "schema driven development" and agree with a lot of it. Which is why I'm a huge fan of GraphQL and related completely free open source libraries, where I define my API endpoints with a strongly typed yet easily evolvable schema, and auto-generate my Typescript types from my GraphQL definitions.
$93 million is just nuts to me.
And before anyone goes all ‘but what about Dropbox?’ when scrutinizing this idea... Dropbox was never really made for technical people; this is aimed squarely at people who know what JSON means, so, technical.
I don't understand this comparison. Apollo raised $130M this past summer -- doesn't seem that different to TFA. Is that also nuts to you?
The Protocol Buffers and gRPC ecosystem are also completely free open source libraries. Replace GraphQL with Protobuf and your post is still correct.
Yes, absolutely. I love the Apollo open source libs, and I can currently see how many customers would choose to pay for their services, but yes, I think $130 million is also nuts.
Note I did preface my comment with "I don't understand finance at all and never will." so I'm certainly not saying I'm right here.
1. Reduced bandwidth ingress
2. Automatic tracing of PII through your system
3. Developer-controlled ops stuff (annotating an RPC as cacheable, etc.)
4. Automated tracing instrumentation
5. Message streaming (gRPC streams are amazing)
I can think of a whole host of features that can be built off of protos (I've even built ORMs off of protobufs for simple things [0]). The value prop is there IMO. HTTP + JSON APIs are a local minimum. The biggest concern, "I want to be able to view the data that is being sent back and forth," is a tooling consideration (`curl` isn't showing you the voltages from the physical layer; what you see is decoded). Buf is building that tooling.
You can't detect all breaking changes automatically. A field can subtly shift semantics at the API level in a way that breaks a workflow for some downstream consumer somewhere.
OpenAPI and other API description languages give a clear and unambiguous description of an API that can auto-generate clients just fine. Binary JSON/gzipped JSON is frequently very space efficient too. I'm happy to grant the rest of your points, but I cannot see that much value here from an SME perspective. Using tracing and other advanced techniques requires the right knowledge, and I don't think that's common in smaller orgs.
My company uses gRPC and it's an absolute nightmare but not so much to the point where we'd use a company like this to add on MORE costs to our infrastructure.
It baffles me people choose buzzword technology because "ex-googler" or whatever when 99% of companies that choose it will NEVER hit the scale it was meant for. Best of luck to the sales team. They'll be the driving force I'm sure.
REST is fine for 99% of companies. Long live REST.
Article quote:
> We just closed a $68M Series B co-led By Lux and Tiger Global, with participation from Greenoaks Capital Partners, Lightspeed, Addition, and Haystack.
Good luck to buf.build selling a revamped WSDL and getting everyone to adopt it.
Also does anyone here use them and have any thoughts about their product?
Well, they say the best time to take investment is when you don't need it. If you wait until you need it then the terms will be worse. If investors are offering you money when you don't need it, it may be the best time to accept it.
The sequence of raises here looks like a fairly normal sequence for a growing startup, except that they happened much closer together than would be typical. The terms aren't shown, but assuming they are in line with a typical sequence, this is a great outcome for buf as it gives them lots of room to build their vision without needing to stress over money for a while.
> Also does anyone here use them and have any thoughts about their product?
FWIW, long ago I was the maintainer of Protobuf at Google, including putting together the first open source release. I like what buf is doing -- enough that, full disclosure, I made an angel investment in their seed round.
There's a huge amount of room for better tooling around Protobuf. Binary and strongly-typed protocols require strong tooling to be usable, but with tooling they can be much better than dynamic and text-based approaches. Like, the fact that the protocol is binary shouldn't make it any harder for a human to read it, because your tools should decode it for you on-demand as easily as you could `cat` a text file. Protobuf historically has had sort of the bare minimum tooling and required a lot of ad hoc copying proto files around between projects to get anywhere, which was a pain. A registry seems like the right first step to making things easier, but I'm really excited about what can be done after that... once you have strong type information, you can have tools to dynamically explore APIs, trace communications, etc.
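As a sketch of what "your tools should decode it for you" means: the protobuf wire format is self-describing enough to walk without any schema, which is essentially the trick behind `protoc --decode_raw`. This toy decoder (handling only varint and length-delimited fields, the two most common wire types) illustrates the idea:

```python
def read_varint(buf: bytes, i: int):
    # Decode a base-128 varint starting at offset i; return (value, next offset).
    shift, result = 0, 0
    while True:
        b = buf[i]
        i += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i
        shift += 7

def decode_raw(buf: bytes):
    # Walk the wire format and report (field_number, wire_type, value)
    # tuples without consulting any schema.
    i, fields = 0, []
    while i < len(buf):
        key, i = read_varint(buf, i)
        field, wire = key >> 3, key & 0x7
        if wire == 0:                       # varint
            value, i = read_varint(buf, i)
        elif wire == 2:                     # length-delimited (bytes/string/submessage)
            length, i = read_varint(buf, i)
            value, i = buf[i:i + length], i + length
        else:
            raise ValueError(f"wire type {wire} not handled in this sketch")
        fields.append((field, wire, value))
    return fields
```

Layer the type information from a registry on top of a decoder like this and you get field names, enum labels, and nested message structure for free; that's the tooling gap being described.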
I know this might not be the best way to ask but have they considered creating proto rules for Bazel? The existing proto + gRPC story is pretty unfortunate.
Hmm interesting, I need to go read some books, gimme a few weeks to come up with an intelligent response, thanks for your insight
Based on their website, they solve the following problems:
- a central schema registry. Even if that's something you actually want, it's not a problem that requires $93M to solve, or a commercial company to operate
- communicating schema changes primarily via human-oriented sources like handwritten documentation or emails. I mean, sure, if you are a masochist (or a sufficiently inefficient org), you might do just that. The rest of us here in the 21st century can check the schema into a Git repo instead.
- schema drift on the client end. Tough, this is what happens when you write software. Adding a third party won't help here.
- dependency management. For APIs? I cannot imagine a single case where that would help and your API isn't already a monstrosity.
I do agree with what you're saying from my perspective as an average swe that doesn't do anything highly specialized...
I wonder though, what would the workflows look like for a swe, product, ops person even, where schema changes get passed around and edited so much, (kind of making some assumptions here, comparing central schema registry to crm use cases), that this would be necessary...
Wonder what the shape of the problem looks like...
A binary format may save some bandwidth and be slightly harder to reverse engineer, at the cost of not being easily introspectable out of the box during development.
I don't think there is enough value to sell something.
I hope it gains traction and cargo-culting companies with bored engineers start using them, so hopefully the next company I work with won't have some terribly complicated and unusable GraphQL but just protobufs.
For those who still want / need binary protocols and schemas, look at FlatBuffers or Cap'n Proto instead. At least they are capable of representing domain structures properly.
My previous commentary: https://news.ycombinator.com/item?id=18190005
> This article appears to be written by a programming language design theorist who, unfortunately, does not understand (or, perhaps, does not value) practical software engineering.
I'm not the author, but they mention their prior industrial experience with protobufs at Google, among other unnamed places.
I'm not a PL theorist either, and I see that you don't fully understand the problems of composability, compatibility, and versioning and are too eager to dismiss them based on your prior experience with inferior type systems. And here's why I think it is the case:
> > This is especially true when it comes to protocols, because in a distributed system, you cannot update both sides of a protocol simultaneously. I have found that type theorists tend to promote "version negotiation" schemes where the two sides agree on one rigid protocol to follow, but this is extremely painful in practice: you end up needing to maintain parallel code paths, leading to ugly and hard-to-test code. Inevitably, developers are pushed towards hacks in order to avoid protocol changes, which makes things worse.
You are conflating your experience with particular conventional tooling with the general availability of superior type systems and tooling out there. There is high demand for utilising their properties in protocol designs today, where most of the currently popular protocols hamstring type systems for no good reason (no productivity gain, no performance gain, no resource utilisation gain).
Version negotiation is not the only option available to a protocol designer. It is possible to use implicit-for-the-client, explicit-for-the-developer strategies for schema migration. It is also possible to semi-automate inference of those strategies. Example: [1]
> This seems to miss the point of optional fields. Optional fields are not primarily about nullability but about compatibility. Protobuf's single most important feature is the ability to add new fields over time while maintaining compatibility.
There are at least two ways to achieve compatibility, and the optional fields that expand a domain type to the least common denominator of all encompassing possibilities is the wrong solution to this. Schema evolution via unions, versioning, and migrations is the proper approach that allows for strict resolution of compatibility issues with a level of granularity (distinct code paths) you like.
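A hypothetical sketch of that approach (type and field names invented for illustration): each schema version is a distinct concrete type, the wire type is the union of supported versions, and an explicit migration maps every old version onto the current one:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class UserV1:
    name: str                # v1 stored the full name as one field

@dataclass
class UserV2:
    first_name: str          # v2 split the name into two fields
    last_name: str

# The wire union of all supported schema versions.
User = Union[UserV1, UserV2]

def migrate(u: User) -> UserV2:
    # One explicit, testable upgrade path per old version; no field of the
    # current type ever has to become optional to absorb history.
    if isinstance(u, UserV1):
        first, _, last = u.name.partition(" ")
        return UserV2(first_name=first, last_name=last)
    return u
```

The compatibility question becomes "does `migrate` cover every member of the union," which a type checker can answer, instead of "which combination of optional fields is actually populated."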
> Real-world practice has also shown that quite often, fields that originally seemed to be "required" turn out to be optional over time, hence the "required considered harmful" manifesto. In practice, you want to declare all fields optional to give yourself maximum flexibility for change.
This is false. In practice I want a schema versioning and deprecation policies, and not ever-growing domain expansion to the blob of all-optional data.
> It's that way because the "oneof" pattern long-predates the "oneof" language construct. A "oneof" is actually syntax sugar for a bunch of "optional" fields where exactly one is expected to be filled in.
This is not true either, and it doesn't matter which pattern predates which. Tagged unions are neither a language construct nor syntax sugar; they are a property of type algebra, where you have union- and product-compositions. Languages that implement type algebra don't do it just to add another fancy construct, they do it to benefit from the mathematical foundations of these concepts.
> How do you make this change without breaking compatibility?
You version it, and migrate over time at your own pace without bothering your clients too often [1]
You can draw your own conclusion based on the provided arguments and some additional exploratory work. Someone else's opinion is good but optional and is not always as insightful as your own discoveries.
IMO, generating gRPC code in multiple languages is pretty tedious to set up and maintain. Buf has the potential to replace / free up a lot of time for a small team of people maintaining this sort of thing in house.
- person who helps manage a protobuf monorepo.