Nothing mystical about it
I also published a space-efficiency benchmark of those same formats (https://arxiv.org/abs/2201.03051) and ended up creating https://jsonbinpack.sourcemeta.com as a proposed technology that does binary serialization of JSON using JSON Schema.
Sure - it wasn't all guns and roses, but overall it rocked.
However, it is still heavily under development and not ready for production use. Definitely looking for GitHub Sponsors or other type of funding to support it :)
It adds an extra layer of complexity most people don't need.
You need to compile the protobufs and update all services that use them.
It's extra software for security scans.
Regular old HTTP/1 REST calls should be the default.
If you are having scaling problems, only then should you consider moving to gRPC.
And even then I would first consider other simpler options.
> You need to compile the protobufs and update all services that use them.
You need to update all the services when you change your REST API too, right? At least protobuf generates your code automatically for you, and it can do it as part of your build process as soon as you change your proto. Changes are backwards compatible, so you don't even need to change your services until they need to change.
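That backwards compatibility comes straight from the wire format: readers skip tags they don't recognize. A plain-Python sketch (no protobuf dependency, field numbers are made up for illustration) of an "old" reader ignoring a field added by a newer writer:

```python
def decode_varint(buf, pos):
    """Decode a protobuf varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def parse_known(buf, known_fields):
    """Decode only known field numbers; skip everything else."""
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        field, wire_type = tag >> 3, tag & 7
        if wire_type == 0:                        # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:                      # length-delimited
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError("wire type not handled in this sketch")
        if field in known_fields:
            fields[field] = value
    return fields

# field 1 = varint 150, field 2 = string "hi" (added later by a new writer)
msg = b"\x08\x96\x01\x12\x02hi"
old_reader = parse_known(msg, known_fields={1})
# The old reader parses cleanly, silently skipping the unknown field 2.
```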
The only real advantages gRPC and protobufs have are speed and reduced data transmission.
And hey, fair enough man, if those are your bottlenecks.
And by doing that we've added extra layers and it ended up slower than it would have been had we just used regular rest.
Furthermore, now we need to keep Envoy up to date.
Occasionally they break their API on major versions, and their config files are complicated and confusing.
So, imo, gRPC should only be used for service-to-service communication where you don't want to share the code with a UI and speed and throughput are very, very important.
And the speed of HTTP/1 is rarely the bottleneck.
gRPC supports HTTP/1 and can be mapped to a RESTful API (e.g. https://google.aip.dev/131).
I re-used the dissector for my Dnstap fu, which has since been refactored to a simple composable agent (https://github.com/m3047/shodohflo/tree/master/agents) based on what was originally a demo program (https://github.com/m3047/shodohflo/blob/master/examples/dnst...) because "the people have spoken".
Notice that the demo program (and by extension dnstap_agent) converts protobuf to JSON: the demo program is "dnstap2json". It's puzzlingly shortsighted to me that the BIND implementation is not network aware; it only outputs to files or Unix sockets.
The moment I start thinking about network traffic / messaging the first question in my mind is "network or application", or "datagram or stream"? DNS data is emblematic of this in the sense that the protocol itself supports both datagrams and streams, recognizing that there are different use cases for distributed key-value store. JSON seems punctuation and metadata-heavy for very large amounts of streaming data, but a lot of use cases for DNS data only need a few fields of the DNS request or response so in practice cherry picking fields to pack into a JSON datagram works for a lot of classes of problems. In my experience protobuf suffers from a lack of "living off the land" options for casual consumption, especially in networked situations.
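The "cherry picking fields into a JSON datagram" approach can be sketched in a few lines of Python. The field names and addresses here are hypothetical, not what dnstap2json actually emits:

```python
import json

# Hypothetical cherry-picked fields from a DNS response; a real dnstap
# record carries much more, but many consumers only need these few.
record = {
    "ts": 1700000000.123,
    "client": "192.0.2.10",
    "qname": "example.com.",
    "qtype": "A",
    "rcode": "NOERROR",
}
payload = json.dumps(record, separators=(",", ":")).encode()

# Small enough to ship as a single UDP datagram, so a consumer can
# "live off the land" with nothing more than a socket and json.loads:
#   sock.sendto(payload, ("collector.example.net", 5354))  # hypothetical
assert len(payload) < 1400
```

The JSON punctuation overhead is real, but for a handful of fields it stays well under typical datagram limits.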
Honestly the biggest failing for those guys was not making a good JavaScript implementation. Seems C++ ain't enough these days. Maybe Emscripten works? Anyone tried it?
https://news.ycombinator.com/item?id=25585844
kenton - if you’re reading this - learn the latest ECMAScript or Typescript and just go for it!
I mean, if I had infinite time, I'd love to! (Among infinite other projects.)
But keep in mind Cap'n Proto is not something I put out as a product. This confuses people a bit, but I don't actually care about driving Cap'n Proto adoption. Rather, Cap'n Proto is a thing I built initially as an experiment, and then have continued to develop because it has been really useful inside my other projects. But that means I only work on the features that are needed by said other projects. I welcome other people contributing the things they need (including supporting other languages) but my time is focused on my needs.
My main project (for the past 7 years and foreseeable future) is Cloudflare Workers, which I started and am the lead engineer of. To be blunt, Workers' success pays me money, Cap'n Proto's doesn't. So I primarily care about Cap'n Proto only to the extent it helps Cloudflare Workers.
Now, the Workers Runtime uses Cap'n Proto heavily under the hood, and Workers primarily hosts JavaScript applications. But, the runtime itself is written in C++ (and some Rust), and exposing capnp directly to applications hasn't seemed like the right product move, at least so far. We did recently introduce an RPC system, and again it's built on Cap'n Proto under the hood, but the API exposed to JavaScript is schemaless, so Cap'n Proto is invisible to the app:
https://blog.cloudflare.com/javascript-native-rpc
We've toyed with the idea of exposing schemaful Cap'n Proto as part of the Workers platform, perhaps as a way to communicate with external servers or with WebAssembly. But, so far it hasn't seemed like the most important thing to be building. Maybe that will change someday, and then it'll become in Cloudflare's interest to have really good Cap'n Proto libraries in many languages, but not today.
It does with minor hacks, I have C++ application compiled with Emscripten using CapnProto RPC over WebSockets. That is, if you are mad enough to write webapps in C++...
My gripe with Cap'n Proto is that it is inconvenient to use for internal application structures: either you write boilerplate to convert from/to application objects, or you deal with clunky Readers, Builders, Orphanages, etc. But then again, I probably went too far by storing Cap'n Proto objects inside a database.
To me wire format implies framing etc, enough stuff to actually get it across a stream in a reasonable way. For pb this usually means some sort of length delimited framing you come up with yourself.
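The usual do-it-yourself framing for protobuf is a varint length prefix per message. A minimal sketch of that (plain Python, framing arbitrary bytes rather than real protobuf messages):

```python
import io

def encode_varint(n):
    """Encode a non-negative int as a protobuf-style varint."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def write_frame(stream, msg_bytes):
    """Write one length-delimited frame: varint length, then payload."""
    stream.write(encode_varint(len(msg_bytes)))
    stream.write(msg_bytes)

def read_frame(stream):
    """Read one frame back; return None on clean end of stream."""
    length = shift = 0
    while True:
        b = stream.read(1)
        if not b:
            return None
        length |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
    return stream.read(length)

buf = io.BytesIO()
write_frame(buf, b"first encoded message")
write_frame(buf, b"second")
buf.seek(0)
frames = [read_frame(buf), read_frame(buf), read_frame(buf)]
```

Every project ends up writing some variant of this, which is exactly the complaint: the framing isn't part of the format.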
Similarly, pb doesn't have a canonical file format for multiple encoded buffers.
For these reasons I rarely use pb as an interchange format. It's great for internal stuff, and fine if you want to do your own framing or file format, but if you want to store data and eventually process it with other tools, you are better off with something like Avro, which does define things like the Object Container Format.
I've been using protobuf for a (non-web) hobbyist project for some time now and find it fairly straightforward to use, especially when working across multiple implementation languages. For me, it seems to be a nice middle-ground between the ease of JSON and the efficiency of a hand-rolled serialization format.
Mind you, I can see why people used to weakly typed languages would prefer to just slam everything into JSON.
I can't say the wire format has ever been a problem for me directly. Newer formats have reduced some CPU overheads, but haven't pulled it all together the way official protobuf and gRPC ecosystems have.
From what I've seen the biggest problem with the wire format is that the framing for a nested message requires a varint size. You don't know how many bytes to set aside for that integer until you know how many bytes the nested message will serialize to, and this applies recursively. Without a hacky size cache [1], you get exponential runtime. Even BSON did better here; its nested document framing is fixed-size so it can always go back and patch it later with just an external size stack, no need for an intrusive size cache.
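The size-before-content dependency can be made concrete with a small sketch. Each wrapping layer here assumes a single-byte tag; the point is that the length prefix's own width depends on the child's size, which must be computed before anything is written:

```python
def varint_len(n):
    """Bytes needed to encode n as a protobuf varint."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size

def wrapped_size(payload_len, depth):
    """Size after wrapping a payload in `depth` nested messages,
    each adding a 1-byte tag plus a varint length prefix."""
    size = payload_len
    for _ in range(depth):
        size = 1 + varint_len(size) + size
    return size

# The outer length prefix can't be emitted until the inner message's
# size is known, and its width shifts as that size crosses 127:
assert wrapped_size(127, 1) == 129   # 1 tag + 1-byte length + 127
assert wrapped_size(128, 1) == 131   # length now needs 2 varint bytes
```

A serializer therefore either computes sizes bottom-up first (caching them), or serializes children into temporary buffers; a fixed-width prefix like BSON's can instead be patched in place after the fact.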
There are still benefits to the wire format, especially over JSON. For example, you get real 64-bit unsigned integers, and you can disable varint encoding for them (fixed64). It gives you a lot of opportunities for both accuracy and efficiency.
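The fixed64-vs-varint trade-off is easy to demonstrate: a varint shrinks for small values but costs up to 10 bytes for a 64-bit value with the top bit set, while fixed64 is always 8 bytes.

```python
import struct

def encode_varint(n):
    """Encode a non-negative int as a protobuf-style varint."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

big = 2**63  # a top-bit-set uint64, e.g. a hash or random ID
assert len(encode_varint(big)) == 10       # varint worst case for 64 bits
assert len(struct.pack("<Q", big)) == 8    # fixed64: always 8 bytes
assert len(encode_varint(300)) == 2        # varint wins for small values
```

So for fields that are usually large (hashes, IDs), fixed64 is both smaller and cheaper to decode; for counters that are usually small, varint wins.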
The bad news is that normal idiomatic use of protobuf and gRPC infect your code. It's designed for the code to be generated in very predictable standard ways, but those aren't necessarily the ways you actually want. Even if you decide to isolate proto to a corner of your project and use your own model the rest of the time, the transformation between proto types and your own types can cost you more memory allocations and copy more memory instead of sharing existing memory. So if you care about performance, you often have to design a whole project around protobuf end-to-end, infecting your code even more than usual.
With JSON in either Go or Rust, you can make your own custom types that serialize to JSON and these types instantly feel and work first-class. You own how the schema is mapped to code; often the only living schema is defined in code anyway. In most cases you can use your own types throughout your project and serialize them to JSON as needed without JSON itself infecting your code. This helps even more if you also involve formats like BSON because they can all coexist just fine for the same types, unlike protobuf which insists on its own types and generated code.
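The comment is about Go and Rust, but the pattern is language-agnostic; the same "you own the schema mapping" idea looks like this in Python (type and field names are illustrative):

```python
import json
from dataclasses import dataclass

@dataclass
class Point:            # our own first-class application type
    x: float
    y: float

class AppEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Point):
            return {"x": obj.x, "y": obj.y}   # we decide the wire shape
        return super().default(obj)

doc = json.dumps({"origin": Point(0.0, 1.5)}, cls=AppEncoder)
```

The application type stays in charge; JSON is just one view of it, and another format could be layered onto the same type without touching it.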
Even if you fully embrace protobuf throughout your project, there are other problems and limitations with the generated code. For example, in official protobuf for Go, there's no way to avoid heap allocating each individual nested or repeated message, and there's no way to avoid an absurd 5-machine-word overhead in every single message. (Hopefully it will be brought down to 3 soon, but it's been 5 for years.)
If you're designing a proto schema around these problems, you can seriously compromise readability and maintainability just to work around poor implementation decisions in Go protobuf itself. I'm guilty of that kind of optimization, but when my team saw the benchmark numbers they agreed it was worth it. This is not the kind of decision you want to have to make in a technical project, but I emphasize that it can still be the right choice in many circumstances.
The prost crate for Rust is not official but already gives you more control over your schema. You can technically use your own Rust code instead of the generated code, though I don't see anyone actually doing this and it doesn't seem to be encouraged. In any case, my biggest issue with prost is that it makes it difficult (or perhaps currently impossible) to share sub-messages with Arc [2], which, on the other hand, is trivial in Go using plain pointers. While prost avoids allocations in more cases than Go protobuf, my experience has been that it avoids copies less, and some of those copies require allocations anyway.
I'm encouraged that Google's upcoming Rust library seems to be modeled after the C++ one and not the Go one. I haven't seen the latest work on it but I trust it's in a good place given how many collective decades of experience in protobuf implementation are going into it.
In summary, for a project that was explicitly designed for efficiency, in practice it can limit your code in ways that hurt efficiency more than they ever helped. And while generating code for many languages is a handy feature, that generated code is unlikely to be what you want, and the more you embrace proto throughout your project the more places you pay efficiency and maintainability tolls.
[1] https://github.com/protocolbuffers/protobuf-go/blob/1d4293e0...
[2] Not within one message to be sent to a client, because that information would be redundant. More for sharing some nested messages across bigger messages sent to multiple clients.
The thing that pisses me off about protobuf is that the wire format doesn’t distinguish between different types of binary data: if it said “this is an object binary” then we could decompose it, even if we didn’t have the protocol definition. As it is, could be a string or an array or an object.
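The ambiguity is easy to show concretely: wire type 2 ("length-delimited") is used for strings, bytes, and nested messages alike, so the same bytes decode under multiple schemas:

```python
# Field 1, wire type 2 (length-delimited): tag byte (1 << 3) | 2 = 0x0A.
# These four bytes are valid under either of these (illustrative) schemas:
#   message A { string s = 1; }                  with s = "\x08\x01"
#   message B { message Inner { int32 n = 1; }
#               Inner inner = 1; }               with inner.n = 1
wire = b"\x0a\x02\x08\x01"
tag, length, payload = wire[0], wire[1], wire[2:]
assert tag == (1 << 3) | 2       # field 1, wire type 2
assert payload == b"\x08\x01"    # a valid UTF-8 string AND a valid message
```

Without the .proto definition, a decoder can only guess whether a length-delimited field is text, raw bytes, or a sub-message; that's what makes schemaless decomposition unreliable.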
For short-lived RepeatedFields this is a non-issue: they die in the Gen0 heap and the GC just shrugs them off. But the cost is definitely there, and it is felt with all these new vector DB libraries passing 1536-long buffers of F32s.
Please don't add to the confusion around the term REST. These days most people just mean they use the GET/POST/PUT/DELETE verbs specifically, which is just using the HTTP protocol itself, no REST about it.
https://htmx.org/essays/how-did-rest-come-to-mean-the-opposi...
Run very fast from it, unless you have a VERY good reason to use it.
And on the performance side, parsing our data takes ~1 second in JSON vs ~0.03 seconds with protobuf (in python).
I do see how it could be premature optimization, as JSON is even quicker to get up and running, and the overhead of bigger payloads and parsing costs isn't relevant until you've achieved some scale.
I'm sure it works well for Google.
Also the support tools are lacking generally compared to say JSON or SQL or Python or any other technology.
Bugs were hard to diagnose -- I'm assuming if you were a gRPC pro this would be OK.
Perhaps my use case was far simpler.