Go Protobuf: The New Opaque API

(go.dev)

287 points · 1y ago · 212 comments

dpeckett1y ago

To be honest I kind of find myself drifting away from gRPC/protobuf in my recent projects. I love the idea of an IDL for describing APIs and a great compiler/codegen (protoc) but there are just so many idiosyncrasies baked into gRPC at this point that it often doesn't feel worth it IMO.

Been increasingly using LSP-style JSON-RPC 2.0. Sure, it's got its quirks and is far from the most wire/marshaling-efficient approach, but JSON codecs are ubiquitous and JSON-RPC is trivial to implement. In fact, I recently even wrote a stack-allocated server implementation for microcontrollers in Rust https://github.com/OpenPSG/embedded-jsonrpc.
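A minimal sketch of how small a JSON-RPC 2.0 server loop can be in Go, using only `encoding/json`. The `Handler` type, the `add` method and the use of -32000 for handler failures are illustrative assumptions; -32700 (parse error) and -32601 (method not found) are the spec's own codes. It skips batch requests and does not validate the `"jsonrpc": "2.0"` member.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Request and Response mirror the JSON-RPC 2.0 envelopes.
type Request struct {
	JSONRPC string          `json:"jsonrpc"`
	Method  string          `json:"method"`
	Params  json.RawMessage `json:"params,omitempty"`
	ID      json.RawMessage `json:"id,omitempty"` // nil when absent (a notification)
}

type RPCError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type Response struct {
	JSONRPC string          `json:"jsonrpc"`
	Result  any             `json:"result,omitempty"`
	Error   *RPCError       `json:"error,omitempty"`
	ID      json.RawMessage `json:"id"`
}

// Handler is a hypothetical method signature: raw params in, result out.
type Handler func(params json.RawMessage) (any, error)

// Dispatch decodes one request, runs the matching handler and encodes the
// reply. It returns nil for notifications, which must not get a response.
func Dispatch(handlers map[string]Handler, in []byte) []byte {
	var req Request
	if err := json.Unmarshal(in, &req); err != nil {
		return encode(Response{JSONRPC: "2.0", ID: json.RawMessage("null"),
			Error: &RPCError{Code: -32700, Message: "Parse error"}})
	}
	resp := Response{JSONRPC: "2.0", ID: req.ID}
	if h, ok := handlers[req.Method]; !ok {
		resp.Error = &RPCError{Code: -32601, Message: "Method not found"}
	} else if result, err := h(req.Params); err != nil {
		resp.Error = &RPCError{Code: -32000, Message: err.Error()}
	} else {
		resp.Result = result
	}
	if req.ID == nil {
		return nil
	}
	return encode(resp)
}

// encode assumes handlers only return JSON-encodable results.
func encode(r Response) []byte {
	out, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return out
}

// add is an example handler taking a two-element array of integers.
func add(params json.RawMessage) (any, error) {
	var xs []int
	if err := json.Unmarshal(params, &xs); err != nil {
		return nil, err
	}
	if len(xs) != 2 {
		return nil, errors.New("add expects two numbers")
	}
	return xs[0] + xs[1], nil
}

func main() {
	handlers := map[string]Handler{"add": add}
	fmt.Println(string(Dispatch(handlers, []byte(`{"jsonrpc":"2.0","method":"add","params":[2,3],"id":1}`))))
}
```

Everything transport-related (stdio framing as in LSP, WebSockets, a UART) stays outside `Dispatch`, which is much of why JSON-RPC is so easy to embed.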

Varlink (https://varlink.org/) is another interesting approach. There are reasons why they didn't implement the full JSON-RPC spec, but their IDL is pretty interesting.

elcritch1y ago

My favorite serde format is MsgPack, since it can be dropped in as an almost one-to-one replacement for JSON. There's also CBOR, which is based on MsgPack but has diverged a bit and added a data definition language too (CDDL).

Take JSON-RPC and replace JSON with MsgPack for better handling of integer and float types. MsgPack/CBOR are easy to parse in place directly into stack objects too. It's super fast even on embedded. I've been shipping it for years in embedded projects using a Nim implementation for ESP32s (1) and later made a non-allocating version (2). It's also generally easy to convert MsgPack/CBOR to JSON for debugging, etc.
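To make the integer-typing point concrete, here is a hedged Go sketch of MessagePack's integer and float64 encodings, written against the format spec with `encoding/binary` (Go 1.19+ for the `Append*` helpers). It only emits the signed integer formats; a complete encoder would also use the unsigned ones, and in practice you would use an existing MessagePack library rather than hand-rolling this.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// AppendInt encodes v with the smallest signed MessagePack integer format.
// (A complete encoder would also use the unsigned 0xcc-0xcf formats.)
func AppendInt(b []byte, v int64) []byte {
	switch {
	case v >= 0 && v <= 0x7f:
		return append(b, byte(v)) // positive fixint: the byte is the value
	case v >= -32 && v < 0:
		return append(b, byte(v)) // negative fixint: 0xe0..0xff
	case v >= math.MinInt8 && v <= math.MaxInt8:
		return append(b, 0xd0, byte(v)) // int8
	case v >= math.MinInt16 && v <= math.MaxInt16:
		return binary.BigEndian.AppendUint16(append(b, 0xd1), uint16(v)) // int16
	case v >= math.MinInt32 && v <= math.MaxInt32:
		return binary.BigEndian.AppendUint32(append(b, 0xd2), uint32(v)) // int32
	default:
		return binary.BigEndian.AppendUint64(append(b, 0xd3), uint64(v)) // int64
	}
}

// AppendFloat64 always emits the 9-byte float64 format (0xcb), so 1.0 stays
// a float on the wire instead of becoming indistinguishable from the integer 1.
func AppendFloat64(b []byte, f float64) []byte {
	return binary.BigEndian.AppendUint64(append(b, 0xcb), math.Float64bits(f))
}

func main() {
	fmt.Printf("% x\n", AppendInt(nil, 1000))    // d1 03 e8
	fmt.Printf("% x\n", AppendFloat64(nil, 1.0)) // cb 3f f0 00 00 00 00 00 00
}
```

Because each value carries its own type byte, a decoder never has to guess whether `1` was meant as an integer or a float, which is the ambiguity plain JSON numbers leave open.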

There's also an IoT-focused RPC based on CBOR that's an IETF standard, and a time series format (3). The RPC is used a fair bit in some projects.

1: https://github.com/elcritch/nesper/blob/devel/src/nesper/ser...
2: https://github.com/EmbeddedNim/fastrpc
3: https://hal.science/hal-03800577v1/file/Towards_a_Standard_T...

bccdee1y ago

What I really like about protobuf is the DDL. Really clear schema evolution rules. Ironclad types. Protobuf moves its complexity into things like default zero values, which are irritating but readily apparent. With JSON, it's superficially fine, but later on you discover that you need to be worrying about implementation-specific stuff like big ints getting mangled, or special parsing logic you need to set default values for string enums so that adding new values doesn't break backwards compatibility. JSON Schema exists but really isn't built for these sorts of constraints, and if you try to use JSON Schema like protobuf, it can get pretty hairy.
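The "big ints getting mangled" problem is easy to reproduce with Go's standard `encoding/json`: decoding into `any` turns every number into a `float64`, which cannot represent 2^53 + 1. `Decoder.UseNumber` is one standard-library workaround. The `id` field name below is just an example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeLoose decodes into map[string]any, where every JSON number becomes a
// float64, so integers above 2^53 silently lose precision.
func decodeLoose(s string) (float64, error) {
	var v map[string]any
	if err := json.Unmarshal([]byte(s), &v); err != nil {
		return 0, err
	}
	f, _ := v["id"].(float64)
	return f, nil
}

// decodeExact asks the decoder to keep numbers as json.Number (their original
// text) and then parses the integer itself.
func decodeExact(s string) (int64, error) {
	dec := json.NewDecoder(strings.NewReader(s))
	dec.UseNumber()
	var v map[string]any
	if err := dec.Decode(&v); err != nil {
		return 0, err
	}
	n, ok := v["id"].(json.Number)
	if !ok {
		return 0, fmt.Errorf("id is %T, not a number", v["id"])
	}
	return n.Int64()
}

func main() {
	const body = `{"id": 9007199254740993}` // 2^53 + 1
	loose, _ := decodeLoose(body)
	exact, _ := decodeExact(body)
	fmt.Printf("%.0f %d\n", loose, exact) // 9007199254740992 9007199254740993
}
```

No error is raised in the loose case, which is exactly why this kind of bug tends to surface late.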

Honestly, if protobuf just serialized to a strictly-specified subset of json, I'd be happy with that. I'm not in it for the fast ser/de, and something human-readable could be good. But when multiple services maintained by different teams are passing messages around, a robust schema language is a MASSIVE help. I haven't used Avro, but I assume it's similarly useful.

perezd1y ago

The better stack rn is buf + Connect RPC: https://connectrpc.com/ All the compatibility, you get JSON+HTTP & gRPC, one platform.

jeffrallen1y ago

Software lives forever. You have to take the long view, not the "rn" view. In the long view, NFS's XDR or ASN.1 are just fine and could have been enough, if we didn't keep reinventing things.

jcmfernandes1y ago

I'm using connectrpc, and I'm a happy customer. I can even easily generate an OpenAPI schema for the "JSON API" using https://github.com/sudorandom/protoc-gen-connect-openapi

rochacon1y ago

ConnectRPC is very cool, thanks for sharing. I would like to add 2 other alternatives that I like:

- dRPC (by Storj): https://drpc.io (also compatible with gRPC)

- Twirp (by Twitch): https://github.com/twitchtv/twirp (no gRPC compatibility)

bbkane1y ago

Buf seems really nice, but I'm not completely sure what's free and what's not with the Buf platform, so I'm hesitant to make it a dependency for my little open source side project ideas. I should read the docs a bit more.

crabmusket1y ago

> I love the idea of an IDL for describing APIs and a great compiler/codegen (protoc)

Me too. My context is that I end up using RPC-ish patterns when doing slightly out-of-the-ordinary web stuff, like websockets, iframe communications, and web workers.

In each of those situations you start with a bidirectional communication channel, but you have to build your own request-response layer if you need that. JSON-RPC is a good place to start, because the spec is basically just "agree to use `id` to match up requests and responses" and very little else of note.
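A rough Go sketch of that "agree to use `id`" layer: a client that tags each outgoing request with a fresh id and routes incoming messages back to the waiting caller. The `send` hook and the integer-only ids are assumptions made to keep it short; real JSON-RPC also allows string ids, and this version has no timeouts or cancellation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Message is a simplified JSON-RPC-style envelope with integer ids only.
type Message struct {
	ID     int             `json:"id"`
	Method string          `json:"method,omitempty"`
	Params json.RawMessage `json:"params,omitempty"`
	Result json.RawMessage `json:"result,omitempty"`
}

// Client adds request/response matching on top of any bidirectional channel
// (WebSocket, iframe postMessage, worker port). send is a hypothetical hook
// for whatever transport is in use.
type Client struct {
	mu      sync.Mutex
	nextID  int
	pending map[int]chan json.RawMessage
	send    func([]byte)
}

func NewClient(send func([]byte)) *Client {
	return &Client{pending: make(map[int]chan json.RawMessage), send: send}
}

// Call sends a request and blocks until a message carrying the same id is
// delivered.
func (c *Client) Call(method string, params any) (json.RawMessage, error) {
	p, err := json.Marshal(params)
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.nextID++
	id := c.nextID
	ch := make(chan json.RawMessage, 1) // buffered so Deliver never blocks
	c.pending[id] = ch
	c.mu.Unlock()

	out, err := json.Marshal(Message{ID: id, Method: method, Params: p})
	if err != nil {
		return nil, err
	}
	c.send(out)
	return <-ch, nil
}

// Deliver is fed every incoming message and routes it to the waiting caller.
func (c *Client) Deliver(in []byte) error {
	var m Message
	if err := json.Unmarshal(in, &m); err != nil {
		return err
	}
	c.mu.Lock()
	ch, ok := c.pending[m.ID]
	delete(c.pending, m.ID)
	c.mu.Unlock()
	if !ok {
		return fmt.Errorf("no pending request with id %d", m.ID)
	}
	ch <- m.Result
	return nil
}

func main() {
	var c *Client
	// A loopback "transport" that echoes params back as the result.
	c = NewClient(func(req []byte) {
		var m Message
		_ = json.Unmarshal(req, &m)
		reply, _ := json.Marshal(Message{ID: m.ID, Result: m.Params})
		_ = c.Deliver(reply)
	})
	res, _ := c.Call("echo", "hi")
	fmt.Println(string(res)) // "hi"
}
```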

I've been looking around for a "minimum viable IDL" to add to that, and I think my conclusion so far is "just write out a TypeScript file". This works when all my software is web/TypeScript anyway.

dpeckett1y ago

Now that's an interesting thought. I wonder if you could use a modified subset of TypeScript to create an IDL/DDL for JSON-RPC, then compile that schema into implementations for various target languages.

IggleSniggle1y ago

Typia kinda does this, but currently only has a TypeScript -> TypeScript compiler.

crabmusket1y ago

Yeah that's what I'd look into. Maybe TS -> JSON Schema -> target language.

girvo1y ago

Same; at my previous job for the serialisation format for our embedded devices over 2G/4G/LoRaWAN/satellite I ended up landing on MessagePack, but that was partially because the "schema"/typed deserialisation was all in the same language for both the firmware and the server (Nim, in this case) and directly shared source-to-source. That won't work for a lot of cases of course, but it was quite nice for ours!

hansvm1y ago

> efficiency

State of the art for both gzipped json and protobufs is a few GB/s. Details matter (big strings, arrays, and binary data will push protos to 2x-10x faster in typical cases), but it's not the kind of landslide victory you'd get from a proper binary protocol. There isn't much need to feel like you're missing out.

jlouis1y ago

The big problem with Gzipped JSON is that once unzipped, it's gigantic. And you have to parse everything, even if you just need a few values. Just the memory bottleneck of having to munch through a string in JSON is going to slow down your parser by a ton. In contrast, a string in Protobuf is length-encoded.

5-10x is not uncommon, and that's kissing an order of magnitude difference.

hansvm1y ago

> have to parse everything, even for just a few values

That's true of protobufs as much as it is for json, except for skipping over large submessages.
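To illustrate both points, here is a hand-rolled Go sketch of the protobuf wire format using only `encoding/binary`: strings carry a varint length prefix, so a reader can jump over them (or over a whole submessage) without scanning the bytes, yet locating a given field still means decoding every tag before it, in order. The helper names are hypothetical; real code would use `google.golang.org/protobuf`.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Wire types from the protobuf encoding spec (the low 3 bits of each tag).
const (
	wireVarint = 0
	wireI64    = 1
	wireLen    = 2
	wireI32    = 5
)

// appendVarintField writes tag (num<<3 | 0) followed by the value as a varint.
func appendVarintField(b []byte, num int, v uint64) []byte {
	b = binary.AppendUvarint(b, uint64(num)<<3|wireVarint)
	return binary.AppendUvarint(b, v)
}

// appendStringField writes tag (num<<3 | 2), a varint length, then the bytes.
func appendStringField(b []byte, num int, s string) []byte {
	b = binary.AppendUvarint(b, uint64(num)<<3|wireLen)
	b = binary.AppendUvarint(b, uint64(len(s)))
	return append(b, s...)
}

// findString scans records in order until it reaches field want. Strings and
// submessages are skipped in one jump thanks to their length prefix, but every
// tag and varint before the target still has to be decoded sequentially.
func findString(b []byte, want int) (string, error) {
	errTrunc := errors.New("truncated message")
	for len(b) > 0 {
		tag, n := binary.Uvarint(b)
		if n <= 0 {
			return "", errTrunc
		}
		b = b[n:]
		num, typ := int(tag>>3), tag&7
		switch typ {
		case wireVarint:
			if _, n = binary.Uvarint(b); n <= 0 {
				return "", errTrunc
			}
			b = b[n:]
		case wireI64, wireI32:
			size := 8
			if typ == wireI32 {
				size = 4
			}
			if len(b) < size {
				return "", errTrunc
			}
			b = b[size:]
		case wireLen:
			l, n := binary.Uvarint(b)
			if n <= 0 || l > uint64(len(b)-n) {
				return "", errTrunc
			}
			if num == want {
				return string(b[n : n+int(l)]), nil
			}
			b = b[n+int(l):] // skip the whole payload without reading it
		default:
			return "", fmt.Errorf("unsupported wire type %d", typ)
		}
	}
	return "", fmt.Errorf("field %d not found", want)
}

func main() {
	msg := appendVarintField(nil, 1, 150)
	msg = appendStringField(msg, 2, "hello")
	msg = appendStringField(msg, 3, "world")
	fmt.Printf("% x\n", msg)
	fmt.Println(findString(msg, 3)) // world <nil>
}
```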

> memory bottleneck

Interestingly, JSON, gzipped JSON, and protobufs are all core-bound parsing operations. The culprit is, mostly, a huge data dependency baked into the spec. You can unlock another multiplicative 10x-30x just with a better binary protocol.

> 5-10x is not uncommon

I think that's in line with what I said. You typically see 2x-10x, sometimes more (arrays of floats, when serialized using the faster of many equivalent protobuf wire encodings, are pathologically better for protos than gzipped JSON), sometimes less. They were aware of and worried about some sort of massive perf impact and choosing to avoid protos anyway for developer ergonomics, so I chimed in with some typical perf numbers. It's better (perf-wise) than writing a backend in Python, but you'll probably still be able to measure the impact in real dollars if you have 100k+ QPS.

imtringued1y ago

Yeah this is something people don't seem to want to get into their heads. If all you care is minimizing transferred bytes, then gzip+JSON is actually surprisingly competitive, to the point where you probably shouldn't even bother with anything else.

Meanwhile if you care about parsing speed, there is MessagePack and CBOR.

If any form of parsing is too expensive for you, you're better off with FlatBuffers or Cap'n Proto.

Finally there is the holy grail: Use JIT compilation to generate "serialization" and "deserialization" code at runtime through schema negotiation, whenever you create a long lived connection. Since your protocol is unique for every (origin, destination) architecture+schema tuple, you can in theory write out the data in a way that the target machine can directly interpret as memory after sanity checking the pointers. This could beat JSON, MessagePack, CBOR, FlatBuffers and capnproto in a single "protocol".

And then there is protobuf/grpc, which seems to be in this weird place, where it is not particularly good at anything.

lowbloodsugar1y ago

Except gzip is tragically slow, so crippling protobuf by running it through gzip could indeed slow it down to json speeds.

hansvm1y ago

"gzipped json" vs "protobuf"

ajross1y ago

That's sort of where I've landed too. Protobufs would seem to fit the problem area well, but in practice the space between "big-system non-performance-sensitive data transfer metaformat"[1] and "super-performance-sensitive custom binary parser"[2] is... actually really small.

There are just very few spots that actually "need" protobuf at a level of urgency that would justify walking away from self-describing text formats (which is a big, big disadvantage for binary formats!).

[1] Something very well served by JSON

[2] Network routing, stateful packet inspection, on-the-fly transcoding. Stuff that you'd never think to use a "standard format" for.

bboygravity1y ago

Add "everything that communicates with a microcontroller" to 2.

That means potentially: the majority of devices in the world.

thadt1y ago

Perhaps surprisingly, I think microcontrollers may be a place where Protobufs are not a bad fit. Using something like Nanopb [1] gives you the size/speed/flexibility advantages of protocol buffers without being too heavyweight. It’ll be a bit slower than your custom binary protocol, but it comes with quite a few advantages, depending on the context.

[1] https://github.com/nanopb/nanopb

malkia1y ago

Apart from being a text format, I'm not sure how well JSON-RPC handles doubles vs. long integers and other types, where protobuf can be directed to handle them appropriately. That is a problem in JSON itself, so you may need to encode some numbers using... "string"
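In Go's standard library, the usual way to do exactly that is the `,string` struct-tag option, which carries an `int64` through JSON as a quoted decimal. Protobuf's canonical JSON mapping makes the same choice for 64-bit integer fields. The `Event` type and helper names below are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical message with a 64-bit id. The ",string" option makes
// encoding/json write the int64 as a quoted decimal and parse it back, so
// clients that read JSON numbers as doubles cannot round it.
type Event struct {
	ID    int64   `json:"id,string"`
	Ratio float64 `json:"ratio"`
}

func encodeEvent(e Event) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

func decodeEvent(s string) (Event, error) {
	var e Event
	err := json.Unmarshal([]byte(s), &e)
	return e, err
}

func main() {
	s, _ := encodeEvent(Event{ID: 9007199254740993, Ratio: 0.5})
	fmt.Println(s) // {"id":"9007199254740993","ratio":0.5}
	e, _ := decodeEvent(s)
	fmt.Println(e.ID) // 9007199254740993
}
```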

dpeckett1y ago

I'd say the success of REST kind of proves that's something that for the most part can be worked around. Often comes down to the JSON codec itself, many codecs will allow unmarshalling/marshalling fields straight into long int types.

Also JS now has BigInt types and the JSON decoder can be told to use them. So I'd argue it's kind of a moot point at this stage.

mirekrusin1y ago

Also json parsers are crazy fast nowadays, most people don't realize how fast they are.

Cthulhu_1y ago

While true, it's still a text and usually http/tcp based format; data -> json representation -> compression? -> http -> tcp -> decompression -> parsing -> data. Translating to / from a text just feels inefficient.

jeffbee1y ago

Protobuf 3 was bending over backwards to try to make the Go API make sense, but in the process it screwed up the API for C++, with many compromises. Then they changed course and made presence explicit again with the proto3 `optional` keyword. Now they are saying Go gets a C++-like API.

What I'd like is to rewind the time machine and undo all the path-dependent brain damage.

sa461y ago

When I was at Google around 2016, there was a significant push to convince folks that the proto3 implicit presence was superior to explicit presence.

Is there a design doc with the rationale for switching back to explicit presence for Edition 2023?

The closest docs I've found are https://buf.build/blog/protobuf-editions-are-here and https://github.com/protocolbuffers/protobuf/tree/main/docs/d....

akshayshah1y ago

Best bet is likely https://github.com/protocolbuffers/protobuf/blob/main/docs/f..., which predates editions.

jeffbee1y ago

I was only there for the debate you mentioned and not there for the reversal, so I dunno.

IX-1031y ago

I wasn't there for the debate, but was there for the reversal. I don't remember there being anything explicitly said about it. The only thing I can think of is that I know of some important projects that couldn't migrate to proto3 because of this implicit field issue. So some people were still writing new code with proto2.

jcdavis1y ago

> it screwed up the API for C++, with many compromises

The implicit presence garbage screwed up the API for many languages, not just C++.

What is wild is how obviously silly it was at the time, too - no hindsight was needed.

kubb1y ago

It was but when the wrong fool gets a say, they will mess a perfectly good thing up for everyone.

Organizations often promote fools who don’t second guess their beliefs and think they have it all figured out.

dekhn1y ago

I work mainly in Python, it's always seemed really bad that there are 3 main implementations of Protobufs, instead of the C++ being the real implementation and other platforms just dlopen'ing and using it (there are a million software engineering arguments around this; I've heard them all before, have my own opinions, and have listened to the opinions of people I disagree with). It seems like the velocity of a project is the reciprocal of the number of independent implementations of a spec because any one of the implementations can slow down all the implementations (like what happened with proto3 around required and optional).

From what I can tell, a major source of the problem was that protobuf field semantics were absolutely critical to the scaling of Google in the early days (as an inter-server protocol for rapidly evolving things like the search stack), but it's also being used as a data modelling toolkit (as a way of representing data with a high level of fidelity). And those two groups, along with the multiple language developers who don't want to deal with native code, do not see eye to eye and want to drive the spec in their preferred direction.

(FWIW nowadays I use pydantic for type descriptions and JSON for transport, but I really prefer having an external IDL unrelated to any specific programming language)

lmm1y ago

> It seems like the velocity of a project is the reciprocal of the number of independent implementations of a spec because any one of the implementations can slow down all the implementations (like what happened with proto3 around required and optional).

Velocity and stability/maturity are in tension, sure. I think for a foundational protocol like protobuf you want the stability and reliability that come from multiple independent implementations more than you want it to be moving fast and breaking things.

elcritch1y ago

Additionally calling C++ from other languages is a pain and you're forced to make a bridge C API. Doing so from Go is even less than ideal from what I gather. It requires using cgo and forces Go to interface with C call stacks, slows down the compilation, etc.

charleslmunger1y ago

You'll be thrilled to hear about upb then, which was designed to be embeddable to power other languages without a from-scratch implementation - and now powers python protos.

https://github.com/protocolbuffers/protobuf/tree/main/upb

sbrother1y ago

I still use proto2 if possible. The syntactic sugar around `oneof` wasn't nice enough to merit dealing with proto3's implicit presence -- maybe it is just because I learned proto2 with C++ and don't use Go, but proto3 just seemed like a big step back and introduced footguns that weren't there before. Happy to hear they are reverting some of those finally.

ein0p1y ago

Nice to see my comments on their proto3 design doc vindicated, lol. There were a lot of comments on that doc, far more than what you'd usually see. Some of those comments dealt with the misguided decision to basically drop nullability (that is, the `has_` methods) that proto2 had. The team then just deleted all the comments and disabled commenting on the doc and proceeded with their original design much to the consternation of their primary stakeholders.

boulos1y ago

That's not how I remember it. I thought proto3 was all about JSON compatibility. No?

the_gipsy1y ago

> syntax = "proto2" uses explicit presence by default

> syntax = "proto3" used implicit presence by default (where cases 2 and 3 cannot be distinguished and are both represented by an empty string), but was later extended to allow opting into explicit presence with the optional keyword

> edition = "2023", the successor to both proto2 and proto3, uses explicit presence by default

The root of the problem seems to be Go's zero values. It's like putting makeup on a pig: you get rid of null panics, but the null-ish values are still everywhere; you just have bad data creeping into every last corner of your code. No amount of validation can make up for the lack of decoding errors. And it's not even a case of runtime errors instead of compile-time errors, which can be kept in check with unit tests to some degree. It's just bad data and defaulting to carry on no matter what, like PHP back in the day.
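The difference between the quoted presence modes shows up in the accessors. Below is a hand-written Go imitation of the Get/Set/Has/Clear surface the Opaque API provides for explicit-presence fields; the unexported fields and the bitmask layout are assumptions for illustration, not actual protoc-gen-go output.

```go
package main

import "fmt"

// User imitates, by hand, the accessors for one explicit-presence string
// field. The presence bitmask is an illustrative assumption.
type User struct {
	name     string
	presence uint32 // bit 0 set once name has been assigned
}

const nameBit = 1 << 0

// GetName is safe on a nil receiver and returns "" when unset.
func (u *User) GetName() string {
	if u == nil {
		return ""
	}
	return u.name
}

func (u *User) SetName(v string) {
	u.name = v
	u.presence |= nameBit
}

// HasName separates "explicitly set to empty" from "never set", the
// distinction implicit presence cannot make.
func (u *User) HasName() bool {
	return u != nil && u.presence&nameBit != 0
}

func (u *User) ClearName() {
	u.name = ""
	u.presence &^= nameBit
}

func main() {
	u := &User{}
	fmt.Println(u.HasName(), u.GetName() == "") // false true
	u.SetName("")
	fmt.Println(u.HasName(), u.GetName() == "") // true true
}
```

Under implicit presence only `GetName` would exist, and both states above would look identical to the caller.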

9rx1y ago

> It's like putting makeup on a pig: you get rid of null panics

How so? In Go, nil is the zero value for a pointer and is ripe for panic just like null. Zero values do not avoid that problem at all, nor do they intend to.

bbatha1y ago

I'll give you that nil is a fine default for pointers and pointer-like things (interfaces, maps, slices). It's mostly fine to use the empty string. However, 0 has semantic meaning for just about every serialized numeric type I've ever encountered. The zero value also does really poorly for PUT-style APIs: "did the user forget to send this, or did they mean to set this field to the empty string?" is very poorly expressed in Go and often has footguns around adding new fields.
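The PUT problem in plain `encoding/json` terms: a `string` field cannot distinguish "not sent" from "sent as empty", so a common workaround is a pointer field. This is a hedged sketch with a hypothetical `UpdateUser` body; note that it still conflates an explicit JSON `null` with an absent key, since both leave the pointer nil.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UpdateUser is a hypothetical PUT body. Nickname is a plain string, so an
// omitted field and an explicit "" decode identically. Name is a pointer:
// nil means the client did not send it (or sent null).
type UpdateUser struct {
	Name     *string `json:"name"`
	Nickname string  `json:"nickname"`
}

func describeName(body string) (string, error) {
	var u UpdateUser
	if err := json.Unmarshal([]byte(body), &u); err != nil {
		return "", err
	}
	switch {
	case u.Name == nil:
		return "name not sent", nil
	case *u.Name == "":
		return "name cleared", nil
	default:
		return "name set to " + *u.Name, nil
	}
}

func main() {
	for _, body := range []string{`{}`, `{"name":""}`, `{"name":"ada"}`} {
		d, _ := describeName(body)
		fmt.Println(body, "->", d)
	}
}
```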

the_gipsy1y ago

That's just that they picked a worse case of zero value for slices and maps, presumably for performance gains.

9rx1y ago

The slice type is an implicit struct, in the shape:

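    // Conceptual layout of a slice header (compare reflect.SliceHeader)
    struct {
        data uintptr // address of the backing array
        len  int     // number of elements in use
        cap  int     // capacity of the backing array
    }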

Which is usable when the underlying memory is set to zero. So its zero value is really an empty slice. Most languages seem to have settled on empty slice, array, etc. as the initialized state just the same. I find it interesting you consider that the worst case.

Maps are similar, but have internal data structures that require initialization, thus cannot be reliably used when zeroed. This is probably not so much a performance optimization as convention. You see similar instances in the standard library. For example:

    var file os.File
    file.Read([]byte{}) // panics; file must be initialized first.

tsimionescu1y ago

It should be noted that slices and maps are at completely opposite ends in how they behave in relation to nil in Go. A nil slice is just an empty slice; there is no operation you could do with one that will fail if done with the other. In contrast, a nil map only supports reads: lookups, len, range and delete all behave as on an empty map, but any attempt to store into it will panic.
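A small sketch exercising both types' zero values. It shows that a nil slice can be appended to, and that a nil map still answers reads (`len`, lookups, `range`, `delete`) and only panics when you store into it.

```go
package main

import "fmt"

// nilSliceAppend shows that a nil slice behaves like an empty one:
// append allocates a backing array on demand.
func nilSliceAppend() (wasNil bool, length int) {
	var s []int
	wasNil = s == nil
	s = append(s, 1)
	return wasNil, len(s)
}

// nilMapReads shows that len, lookups, range and delete all work on a nil map.
func nilMapReads() (length, value int) {
	var m map[string]int
	for range m {
	}
	delete(m, "k")
	return len(m), m["k"]
}

// nilMapWritePanics reports whether storing into a nil map panics.
func nilMapWritePanics() (panicked bool) {
	defer func() { panicked = recover() != nil }()
	var m map[string]int
	m["k"] = 1
	return false
}

func main() {
	fmt.Println(nilSliceAppend())    // true 1
	fmt.Println(nilMapReads())       // 0 0
	fmt.Println(nilMapWritePanics()) // true
}
```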

delusional1y ago

I don't think the reason for zero values has anything to do with "avoiding null panics". If you want to inline the types, that is avoid using most of your runtime on pointer chasing, you can't universally encode a null value. If I'm unclear, ask yourself: What would a null int look like?

If what you wanted was to avoid null-panics, you can define the elementary operations on null. Generally null has always been defined as aggressively erroring, but there's nothing stopping a language definition from defining propagation rules like for float NaN.

the_gipsy1y ago

Sorry, I don't follow you. If you don't have zero values, you either have nulls and panics, or you have some kind of sum type à la Option<T> and cannot possibly construct null or zero-ish values.

Is there a way to have your cake and eat it too, and are there real world examples of it?

delusional1y ago

You're thinking in abstract terms; I'm talking about the concrete implementation details. Take C, just as an example: an int can never be NULL. It can be 0, and compilers will sometimes tell you it's "uninitialized", but it can never be NULL. All possible combinations of bit patterns are meaningful ints.

Pointers are different in that we've decided that the pattern where all bits are 0 is a value that indicates that it's not valid. Note that there's nothing in the definition of the underlying hardware that required this. 0 is an address just like any other, and we could have decided to just have all pointers mean the same thing, but we didn't.

NULL is just a language construct, and as a language construct it could be defined any way you want. You could define your language such that dereferencing NULL would always return 0. You could decide that doing pointer arithmetic with NULL would yield another NULL. Once you realize that it's just language semantics and not fundamental computer science, you realize that the definition is arbitrary, and any other definition would do.

As for sum types: you can't fundamentally encode any more information into an int. It's already completely saturated. What a sum type does, at a fundamental level, is bundle your int (which has a default value) with a boolean (which also has a default value) indicating whether your int is valid. There are some optimizations you can do with a "sufficiently smart compiler", but like auto-vectorization, that's never going to happen.
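The "value plus a validity flag" layout can be written directly in Go. `Optional` and `Some` are hypothetical names; `database/sql.NullInt64` is an existing standard-library type with the same two-field shape (`Int64`, `Valid`). On typical 64-bit platforms the flag plus alignment padding doubles the size of an `int64`.

```go
package main

import (
	"database/sql"
	"fmt"
	"unsafe"
)

// Optional spells out the layout described above: the value plus a separate
// validity flag. Its zero value means "absent".
type Optional[T any] struct {
	Value T
	Valid bool
}

func Some[T any](v T) Optional[T] { return Optional[T]{Value: v, Valid: true} }

// sizes compares a bare int64 with its flagged forms (8 vs 16 bytes on
// typical 64-bit targets).
func sizes() (plain, optional, nullInt uintptr) {
	return unsafe.Sizeof(int64(0)), unsafe.Sizeof(Optional[int64]{}), unsafe.Sizeof(sql.NullInt64{})
}

func main() {
	var none Optional[int64]
	fmt.Println(none.Valid, Some[int64](0).Valid) // false true
	fmt.Println(sizes())
}
```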

I guess my point can be boiled down to the dual of the old C++ adage: Resource Acquisition Is NOT Initialization. RAINI.

bborud1y ago

No, there isn't. It is just other versions of the same problem with people pretending it is somehow different.

People generally like to complain about NULL/nil whatever, but they rarely think about what the alternatives mean and what arrangements are completely equivalent. No matter what you do, you have to put some thought into design. Languages can't do the design work for programmers.

TheDong1y ago

There is a way to have your cake and eat it too: Rust.

In Rust, you have:

    let s = S{foo: 42, ..Default::default()};

You just got all the remaining fields of 'S' set to "zero-ish" values, and there's no NPEs.

The way you do this is by having types opt in to it, since zero values only make sense in some contexts.

In Go, the way to figure out if a type has a meaningful zero value is to read the docs. Every type has a zero value, but a lot of them just panic with a nil pointer dereference or do something completely nonsensical if you try to use them.

In Rust, at compile time you can know whether something implements `Default`, so you can know if there's a sensible zero value, and you can construct it.

Go doesn't give you your cake, it gives you doc comments saying "the zero value is safe to use" and "the zero value will cause unspecified behavior, please don't do it", which is clearly not _better_.

masklinn1y ago

> or you have some kind of sum type à la Option<T> and cannot possibly construct null or zero-ish values.

Option types specifically allow defaulting (to none) even if the wrapped value is not default-able.

You can very much construct null or zero-ish values in such a language, but it’s not universal; types have to be opted into this capability.

tech1321y ago

From what I remember, proto3's behavior happened to map to Objective-C, and iOS Maps coincidentally happened at around the same time, so they could be loud.

It was partially reverted with proto3 optional and fully reverted finally. Go's implementation happened to come around the same time as proto3 so allowed struct access, despite behaving quite differently when accessing nil fields. That is also finally reverted. Hopefully more lessons already learned from the Java days will come sooner than later going forward...

knodi1y ago

Yes, as much as I love Go and love working with it every day, these inner workings of Go with zero values have been a design issue that comes up again and again and again.

parhamn1y ago

It's interesting, to everyone but the mega shops like Google, protobuf is a schema declaration tool. To the mega shops it's a performance tool.

For most of my projects, I use a web-framework I built on protobuf over the years but slowly got rid of a lot of the protobufy bits (besides the type + method declarations) and just switched to JSON as the wire format. http2, trailing headers, gigantic multi-MB files of getters, setters and embedded binary representations of the schemas, weird import behaviors, no wire error types, etc were too annoying.

Almost every project I've tracked that tries to solve the declarative schema problem seems to slowly die. It's a tough problem and an opinionated one (what to do with enums? sum types? defaults? etc.). Anyone know of any good ones that are chugging along? OpenAPI is too resty and JSONSchema doesn't seem to care about RPC.

danans1y ago

> It's interesting, to everyone but the mega shops like Google, protobuf is a schema declaration tool

There are lots of other benefits for non performance-oriented teams and projects: the codegen makes it language independent and it's pretty handy that you can share a single data model across all layers of your system.

If you don't care about the wire format, the standard JSON representation makes it pair well with JSON-native databases, so you can get strict schema management without the need for any clunky ORM.

Cthulhu_1y ago

That's assuming JSON native databases are fine for your use case though, but in practice it's only good for storing documents that don't need to be edited/queried much by the backend storing them.

danans1y ago

> in practice it's only good for storing documents that don't need to be edited/queried much by the backend storing them

Why aren't they good for that? They can have very high write throughput, they don't require ORMs, and they can be indexed and queried using standard database methods like the SQL language.

You can even enforce strict schemas on them if you want to, just as you would with an RDBMS.

notyourwork1y ago

At scale, the performance gains can be dramatic. For example, moving a JSON web service to CBOR, I was able to squeeze 15% more throughput out of existing hardware. When you’re dealing with hundreds, if not millions, of requests per minute, this can be financially prudent.

parhamn1y ago

I am curious about this: what's the difference between the fastest JSON libraries (e.g. simdjson) and protobuf?

shepherdjerred1y ago

From Amazon: https://smithy.io/2.0/index.html (internally known as Coral)

bobnamob1y ago

nit: Coral and smithy are not comparable.

Coral is a schema definition language, yes. But it’s also a full rpc ecosystem.

Smithy at this point is only really an IDL that (in most cases, at least before I left) is “only” used to generate Coral models and then transitively Coral clients and services. The _vast_ majority of Amazon is still on “native” Coral

tinthedev1y ago

> OpenAPI is too resty

I'm curious as to why would you think that?

There's a bit of boilerplate in there if you want to use it for a naive implementation, but I don't find it exceedingly resty.

From my POV, using protobuf as a schema declaration tool (as opposed to being a performance tool) is blind follower behaviour. Getting over all the hurdles doesn't seem worth it for the payoff, and it only becomes less valuable when compared to all the OpenAPI tooling you could be enabling instead.

This being for a web-based problem, where we're solving schema declaration.

parhamn1y ago

> I'm curious as to why would you think that?

Huh? From the Swagger site:

"The OpenAPI Specification defines a standard interface to RESTful APIs..."

From their OpenAPI Initiative:

"The OpenAPI Specifications provide a formal standard for describing HTTP APIs"

Not sure I care to understand your POV more given this obvious bit was missed by you.

kubb1y ago

I hate this API and Go's handling of protocol buffers in general. Preparing test data for it, especially, makes for some of the most cumbersome and unwieldy files you will ever come across. Combined with table-driven testing, you have thousands upon thousands of lines of data with unbelievably long identifiers that can't be inferred (e.g. in array literals), usually copy-pasted around and slightly changed. Updating and understanding all of that is a nightmare, and if you miss a comma or a brace somewhere, the compiler isn't smart enough to point you to where, so you get lines upon lines of syntax errors. But being opaque has some advantages for sure.

GeneralMayhem1y ago

I find that the best way to set up test cases, regardless of language, is usually to use string constants in the proto text format (https://protobuf.dev/reference/protobuf/textformat-spec/). For arrays, and especially for oneofs, it's way less verbose than how things are represented in Go, C++, or Java, and generally at least on par with the Python constructors. Maps are the only thing that suffer a bit, because they're represented as a list of key-val pairs (like in the wire format) instead of an actual map. Your language's compiler won't help you debug, but the parse-text-proto function can point to the source of the issue on a line/character level.

With Go generics - and equivalent in most other languages - you can write a 5-line helper function that takes a string in that format and either returns a valid proto value (using the generic type param to decide which type to unmarshal) or `t.Fatal()`s. You would never do this in production code, but as a way to represent hand-written proto values it's pretty hard to beat.
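A hedged sketch of that helper's shape. To stay standard-library-only it swaps `encoding/json` in for the proto text format; with protobuf-go you would constrain the pointer type to `proto.Message` and call `prototext.Unmarshal` instead. `fatalfer`, `recorder` and `Point` are hypothetical names; `*testing.T` satisfies the `fatalfer` interface.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fatalfer is the subset of *testing.T the helper needs.
type fatalfer interface {
	Helper()
	Fatalf(format string, args ...any)
}

// mustParse turns a hand-written literal into a fresh *T or fails the test;
// the type parameter chooses what to unmarshal into. encoding/json stands in
// for the proto text format in this sketch.
func mustParse[T any](t fatalfer, text string) *T {
	t.Helper()
	v := new(T)
	if err := json.Unmarshal([]byte(text), v); err != nil {
		t.Fatalf("bad test literal: %v", err)
	}
	return v
}

// recorder is a tiny stand-in for *testing.T that records failures.
type recorder struct {
	failed bool
	msg    string
}

func (r *recorder) Helper() {}

func (r *recorder) Fatalf(format string, args ...any) {
	r.failed = true
	r.msg = fmt.Sprintf(format, args...)
}

// Point is an example message type.
type Point struct{ X, Y int }

func main() {
	r := &recorder{}
	p := mustParse[Point](r, `{"X": 1, "Y": 2}`)
	fmt.Println(*p, r.failed) // {1 2} false
}
```

In a real test you would pass the `*testing.T` directly, e.g. `in := mustParse[Point](t, "...")`, so a malformed literal stops the test at the offending case.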

kubb1y ago

Unless someone with authority in your workplace makes a rule against doing that…

GeneralMayhem1y ago

If your problem is humans making arbitrary and nonsensical decisions about how you can do your job, then you have a non-technical problem, and it's unlikely that any technical solution will solve it.

atombender1y ago

The generated Go code situation has always been wild to me. For example, every message embeds protoimpl.MessageState and a bunch of other types, which contain mutexes. That means proto structs cannot be copied or compared byte-for-byte like normal Go structs can.

For several years I used the GoGo Protobuf SDK. It was vastly superior to the awful Javaesque Go code that the official compiler generated. It allowed structs to be pure data structs, was much more performant, and supported a bunch of options to generate native-feeling, ergonomic Go code.

But the Google team refused to partake in any such improvements, and GoGo was shut down as the burden of following the upstream implementation became too big. [1]

I'm not an expert, but as far as I understand, the extra struct junk is mostly to avoid having a parallel set of types for metadata (including reflection). It's unclear to me why these can't simply be generated as internal types with some nice API on top.

Clearly the new field metadata adds to this extra information, and the Go team is moving in the opposite direction of what I thought the future was — they're doubling down on stuffing metadata into the structs, and making the structs bigger and even less wieldy.

I understand how this might make things more performant, but I was hoping this sort of thing could be solved with the type system, especially now that we have generics. For example, surely lazy field access could be done like this:

    // LazyProto[T] is hypothetical: it would decode the submessage on first Get()
    type Info struct {
      User LazyProto[User]
    }

    userName := info.User.Get().Name

[1] https://x.com/awalterschulze/status/1584553056100057088
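One way the generic `LazyProto` idea above could look, as a self-contained sketch. `LazyProto`, `NewLazy` and the JSON payload are hypothetical stand-ins; a real version would hold protobuf wire bytes and call a protobuf decoder. Here `Get` also returns an error, since lazy decoding can fail on first access.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// LazyProto holds a submessage's encoded bytes and decodes them on the first
// Get. encoding/json stands in for a protobuf decoder.
type LazyProto[T any] struct {
	once sync.Once
	raw  []byte
	val  T
	err  error
}

func NewLazy[T any](raw []byte) *LazyProto[T] {
	return &LazyProto[T]{raw: raw}
}

// Get decodes at most once, even with concurrent callers, and caches the
// result (or the error).
func (l *LazyProto[T]) Get() (*T, error) {
	l.once.Do(func() {
		l.err = json.Unmarshal(l.raw, &l.val)
		l.raw = nil // the encoded bytes are no longer needed
	})
	if l.err != nil {
		return nil, l.err
	}
	return &l.val, nil
}

type User struct{ Name string }

// Info holds a pointer because LazyProto contains a sync.Once and must not
// be copied.
type Info struct {
	User *LazyProto[User]
}

func main() {
	info := Info{User: NewLazy[User]([]byte(`{"Name":"gopher"}`))}
	u, err := info.User.Get()
	fmt.Println(u.Name, err) // gopher <nil>
}
```

Making first access safe for concurrent readers puts a `sync.Once` back into the wrapper, so some per-message synchronization state reappears even in this design.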

lalaithion1y ago

We put test inputs and outputs in testdata/test_name.in.textpb and testdata/test_name.out.textpb, respectively. Way nicer than defining both your inputs and your desired outputs in go code, even compared to not using protobuf at all, to the point where we occasionally write some proto definitions just for test inputs and outputs.

alienchow1y ago

The testing practice I've seen is to have a testdata/ directory with a bunch of textprotos for different test cases. If you're using Bazel, just include the entire directory glob as data dependency for the unit tests. The test tables are essentially just appropriately named textproto filenames that are unmarshaled into the proto message to be tested.

Then again I've also seen people do these thousand line in-code literal string protos which really grind my gears.

throwaway8943451y ago

I haven't used protocol buffers, but in general any kind of code generation produces awful code. I much prefer generating the machine spec (protocol buffers, in this case) from Go code rather than the other way around. It's not a perfect solution, but it's much better than dealing with generated code in my experience.

MobiusHorizons1y ago

Generated code tends to look very formulaic, but it doesn’t have to be unreadable. As a primitive it is incredibly powerful, and can be easier to maintain than alternatives. You definitely need good build tooling though. Ideally you won’t need to look at the generated code and can infer the interface from the input file.

throwaway8943451y ago

It's pretty hard to generate formulaic Go code while ensuring that, for example, methods don't conflict with exported member names, or, when you're making identifiers from multiple words, that the identifier for `FooBar Baz` does not conflict with the identifier for `Foo BarBaz`, since both of these would naturally be rendered in Go as `FooBarBaz`. Or even how you model an optional type: in Go, an idiomatic optional may use nil for reference types, -1 for nonnegative integers, an empty string for nonempty strings, or a `(T, bool)` tuple. You can definitely make a code generator that does a decent job at modeling all of these things, but I've never seen one; they usually try to pick a rule that works for all cases, like `Foo_BarBaz` or using an extra indirection for optionals, which means no case is idiomatic.
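The `FooBar Baz` / `Foo BarBaz` collision is easy to reproduce with a naive name-joining rule. `goName` and `goNameSafe` are hypothetical helpers, much simpler than what a real generator does, and the underscore variant only keeps these two names apart; it is not a general fix.

```go
package main

import (
	"fmt"
	"strings"
)

// goName joins name parts into one exported Go identifier by capitalising
// each part, which is the "natural" rendering.
func goName(parts ...string) string {
	var b strings.Builder
	for _, p := range parts {
		if p == "" {
			continue
		}
		b.WriteString(strings.ToUpper(p[:1]) + p[1:])
	}
	return b.String()
}

// goNameSafe keeps the part boundaries visible with an underscore, trading
// idiomatic naming for fewer collisions.
func goNameSafe(parts ...string) string {
	out := make([]string, 0, len(parts))
	for _, p := range parts {
		if p != "" {
			out = append(out, strings.ToUpper(p[:1])+p[1:])
		}
	}
	return strings.Join(out, "_")
}

func main() {
	a := goName("FooBar", "Baz")
	b := goName("Foo", "BarBaz")
	fmt.Println(a, b, a == b) // FooBarBaz FooBarBaz true

	fmt.Println(goNameSafe("FooBar", "Baz"), goNameSafe("Foo", "BarBaz")) // FooBar_Baz Foo_BarBaz
}
```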

strawhatguy1y ago· 9 in thread

Great, now there's an API per struct/message to learn and communicate throughout the codebase, with all the getters and setters.

A given struct is probably faster for protobuf parsing in the new layout, but the complexity of the code probably increases, and I can see this complexity easily negating these gains.

secureOP1y ago

> Great, now there's an API per struct/message to learn and communicate throughout the codebase, with all the getters and setters.

No, the general idea (and practical experience, at least for projects within Google) is that a codebase migrates completely from one API level to another. Only larger code bases will have to deal with different API levels. Even in such cases, your policy can remain “always use the Open API” unless you are interested in picking up the performance gains of the Opaque API.

jrockway1y ago

I always used the getters anyway. Given:

   message M {
     string foo = 1;
   }
   message N {
     M bar = 2;
   }

I find new(N).Bar.Foo panicking pretty annoying. So I just made it a habit to write n.GetBar().GetFoo() anyway. If n.GetBar().SetFoo() works with the new API, that would be an improvement.
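The getter chain survives where the field chain panics because Go lets you call a method on a nil pointer receiver, and Open-API generated getters check for nil before touching the field. The types below are hand-written stand-ins shaped like that generated code for the two messages above, not actual `protoc-gen-go` output.

```go
package main

import "fmt"

// Hand-written stand-ins for
//   message M { string foo = 1; }
//   message N { M bar = 2; }
type M struct{ Foo string }
type N struct{ Bar *M }

// GetFoo is safe on a nil *M: it returns the zero value instead of panicking.
func (x *M) GetFoo() string {
	if x != nil {
		return x.Foo
	}
	return ""
}

// GetBar is safe on a nil *N and may itself return a nil *M.
func (x *N) GetBar() *M {
	if x != nil {
		return x.Bar
	}
	return nil
}

func main() {
	n := new(N)
	fmt.Printf("%q\n", n.GetBar().GetFoo()) // ""

	defer func() { fmt.Println("recovered:", recover() != nil) }() // recovered: true
	foo := n.Bar.Foo // n.Bar is nil, so this dereference panics
	fmt.Println(foo) // never reached
}
```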

There are some options like nilaway if you want static analysis to prevent you from writing this sort of code, but it's difficult to retrofit into an existing codebase that plays a little too fast and loose with nil values. Having code authors and code reviewers do the work is simpler, though probably less accurate.

The generated code's API has never really bothered me. It is flexible enough to be clever. I especially liked using proto3 for data types and then storing them in a kv store with an API like:

   type WithID interface { GetId() []byte }

   func Put(tx *Tx, x WithID) error { ... }
   func Get(tx *Tx, id []byte) (WithID, error) { ... }

The autogenerated API is flexible enough for this sort of shenanigan, though it's not something I would recommend except to have fun.
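A sketch of that shenanigan, with the `Tx` parameter dropped and an in-memory map standing in for the kv store (both simplifications). `store`, `newStore` and `Widget` are hypothetical; a real store would have to marshal the message on `Put` and know which concrete type to unmarshal into on `Get`, which this sketch sidesteps by keeping values in memory.

```go
package main

import (
	"errors"
	"fmt"
)

// WithID is satisfied by any message with a bytes `id` field, since the
// generated getter for it is GetId() []byte.
type WithID interface{ GetId() []byte }

// store is an in-memory stand-in for the kv store.
type store struct{ m map[string]WithID }

func newStore() *store { return &store{m: map[string]WithID{}} }

func (s *store) Put(x WithID) error {
	if len(x.GetId()) == 0 {
		return errors.New("empty id")
	}
	s.m[string(x.GetId())] = x
	return nil
}

func (s *store) Get(id []byte) (WithID, error) {
	x, ok := s.m[string(id)]
	if !ok {
		return nil, errors.New("not found")
	}
	return x, nil
}

// Widget is a hand-written stand-in for a generated message type.
type Widget struct {
	Id   []byte
	Name string
}

func (w *Widget) GetId() []byte {
	if w != nil {
		return w.Id
	}
	return nil
}

func main() {
	s := newStore()
	if err := s.Put(&Widget{Id: []byte("w1"), Name: "sprocket"}); err != nil {
		panic(err)
	}
	x, err := s.Get([]byte("w1"))
	if err != nil {
		panic(err)
	}
	fmt.Println(x.(*Widget).Name) // sprocket
}
```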

hellcow1y ago

I'd recommend transforming protobuf types to domain types at your API boundary. Then you have domain types through the whole application.

AYBABTME1y ago

I found that this ends up being a giant amount of useless code, and a ton of memory allocation noise, that only satisfied my desire for elegance. I've given up that approach and just use protobuf types throughout as the base type. I got sick of writing dumb conversion funcs.

mrbadguy1y ago

It’s fairly mindless boilerplate for sure, but it does mean that when something happens that causes a change like this protobuf update, the change in your codebase is isolated to the interface between it and your code, i.e. your dumb conversion funcs. Otherwise you end up with the problem the original commenter had.

It’s good to isolate your dependencies within the code :)

mxey1y ago

At which point I lose all the benefits of lazy decoding that the accessor methods can provide, so I could just decode directly into a sensible struct, except you can’t with Protobuf.

mort961y ago

Accessor methods aren't for lazy decoding but for more efficient memory layouts.


matrix871y ago

I've done this; it only makes sense to me if you're trying to recycle some legacy code that's already using the domain types. Otherwise there's a bunch of extra conversion logic and unnecessary copying; it feels like an antipattern.

mort961y ago

I mean calling it "a new API per message" is a bit of an exaggeration... the "API" per message is still the same: something with some set of attributes. It's just that those attributes are now set and accessed with getters and setters (with predictable names) rather than as struct fields. Once you know how to access fields on protobuf types in general, all message-specific info you need is which fields exist and what their types are, which was the case before too.

kyrra1y ago· 6 in thread

The opaque API brings some niceties that other languages have, specifically about initialization. The Java impl for protobuf will never generate a NullPointerException, as calling `get` on a field would just return the default instance of that field.

The Go Open API did not do this. For many primitive types, it was fine. But for protobuf maps, you had to check whether the map had been initialized before adding entries to it in Go code. Meaning, with the Opaque API, you can just start adding items to a proto map in Go code without thinking about initialization (as the Opaque impl will init the map for you).

This is honestly something I wish Go itself would do. Allowing for nil maps in Go is such a footgun.
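The nil-map footgun in plain Go: reading from a nil map is fine and returns the zero value, but assigning into one panics. `tryInsert` is a hypothetical helper that recovers from that panic so the behaviour can be printed.

```go
package main

import "fmt"

// tryInsert reports whether assigning into m panicked.
func tryInsert(m map[string]int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	m["k"] = 1
	return false
}

func main() {
	var m map[string]int // nil: declared but never made

	// Reads and len are fine on a nil map and yield zero values.
	fmt.Println(m["missing"], len(m)) // 0 0

	// Writes are not.
	fmt.Println(tryInsert(m)) // true

	m = make(map[string]int)
	panicked := tryInsert(m)
	fmt.Println(panicked, m["k"]) // false 1
}
```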

ynniv1y ago

> The Java impl for protobuf will never generate a NullPointerException, as calling `get` on a field would just return the default instance of that field.

This was a mistake. You still want to check whether it was initialized most of the time, and when you do the wrong thing it's even more difficult to see the error.

kyrra1y ago

Depends on your use. If you are parsing a message you just received, I agree that you want to do a "has" check before accessing a field. But when constructing a message, having to manually create all the options is really annoying. (I do love the java builder pattern for protos).

But I do know the footgun of calling "get" on a Java Proto Builder without setting it, as that actually initializes the field to empty, and could cause it to be emitted as such.

Such are the tradeoffs. I'd prefer null-safety to accidental field setting (or thinking a field was set, when it really wasn't).


usrnm1y ago

It's so fun to watch go devs rediscover all the patterns that they so happily threw out in the beginning. It's like watching a person grow up from a sunny little kid to a mature disgruntled alcoholic.

rad_gruchalski1y ago

An alternative explanation is that non-go people got their hands on go and complain that go is not x or y.

Like with generics. Now they’re in go. They’re not great, they have some sense but they may as well not exist as far as I’m concerned. I find them pretty useless anyway without lower and upper type bounds.

mxey1y ago

Which non-Go people brought generics into Go?


the_gipsy1y ago

> The Java impl for protobuf will never generate a NullPointerException, as calling `get` on a field would just return the default instance of that field.

This is NOT the solution lmao

g0ld3nrati01y ago· 6 in thread

Just curious, why do you use protobuf instead of flatbuffers?

akshayshah1y ago

The whole FlatBuffers toolchain is wildly immature compared to Protobuf. Last I checked, flatc doesn’t even have a plugin system - all code generators had to be upstreamed into the compiler itself.

mort961y ago

Isn't protoc mostly the same? I mean I know the code generators are separate binaries (which is quite annoying frankly) but protoc needs to know of them all upstream to expose options like --go_out and --go_opt, right?


majormajor1y ago

If you work with both, the ergonomic advantages of protobufs become apparent quickly, starting the first time you nest things a few times. Unless you are very, very frequently not going to deserialize your entire messages, and so can get huge benefits from the better selective deser of only what a given consumer cares about at a certain time, I find using flatbuffers hard to justify.

tonyhart71y ago

Yeah, idk why we didn't just send a binary data representation and therefore eliminate the entire serialize and deserialize part.

tsimionescu1y ago

Which binary data representation? If I'm sending a Java object, do you think a C program will be able to just use it? Or for that matter, do you think two different C++ implementations, maybe on different platforms, will use the same binary representation of a class object?

tonyhart71y ago

just need a standard for that


lakomen1y ago· 6 in thread

GraphQL won the race for me. gRPC is no longer relevant. Too many hurdles, no proper to-and-from web support. You have to use some 3rd-party non-free service.

nicce1y ago

Aren’t their usecases completely different?

lmm1y ago

No, they're both possible choices for your basic client-server communication layer that you build everything else on. (I mean, technically gRPC rather than protobuf, but protobuf is the biggest part of gRPC).

_cenw1y ago

Intersects quite heavily if you're defining a schema for your API

revskill1y ago

What are the differences?

lakomen1y ago

Not at all.

pensatoio1y ago

Yes. The use cases are very different, as far as these things go. To say otherwise is borderline misinformation.

You can build services internally with gRPC and serve a public graphQL API that aggregates them.


cyberax1y ago· 4 in thread

BTW, if you care so much about performance, then fix the freaking array representation. It should be simple `[]SomeStruct` instead of `[]*SomeStruct`.

This one small change can result in an order of magnitude improvement.

aktau1y ago

It's true that this would perform better, and greatly reduce allocations. But:

  - Messages (especially opaque ones) are not supposed to be copied. The
    recommendation is to use `m := &mypb.Message{}`.
  - This would make migrating to use the opaque API more difficult, if the
    getters don't return the same type as the old open API fields, much more
    code needs to be rewritten, or some wrapper that allocates a new slice on
    every get.
  - Users expect that `subm := m.GetSubMessages()[2] ;
    m.SetSubMessages(append(m.GetSubMessages(), anotherm)) ; subm.SetInt(42) ;
    assert(subm.GetInt() == m.GetSubMessages()[2].GetInt())`. This would not be
    the case if the API returned a slice of values.
  - ...

Effectively, a slice of pointers is baked into the API, and the way people use protocol buffers in Go. For these reasons, it's not clear to me this would end up performing better or causing less work.

If we had returned an iterator (new in Go 1.23) instead of an actual slice, then it would've been possible to vary the underlying representation (slice-of-pointers, slice-of-value-chunks, ...). But there are other downsides to that too:

  - Allocations when passing iterators to functions that expect a slice.
  - Extra API surface for modifying the list (append, getn, len, ...).

Not that clear of a win either.

Another thing that could be considered is: when decoding, allocate a slice of values ([]mypb.Message), *and* a slice with pointers (or do it lazily): []*mypb.Message. Then initialize:

  for i := range valuel {
    ptrl[i] = &valuel[i] // TODO: verify that this escape doesn't cause disjoint allocations.
  }

That might be beneficial due to grouping allocations, and the user would be none the wiser.
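The slice-of-values-plus-slice-of-pointers idea can be sketched outside protobuf with a plain struct. `Msg` and `decodeList` are hypothetical, and the loop body stands in for real decoding. Here every `&values[i]` points into the single backing array of `values`, so the list costs two allocations rather than one per element, and the aliasing expectation from the earlier bullet still holds after an `append`.

```go
package main

import "fmt"

type Msg struct{ N int }

// decodeList allocates one contiguous []Msg plus one []*Msg whose entries
// point into it, so callers still see the []*Msg shape of the existing API.
func decodeList(n int) []*Msg {
	values := make([]Msg, n)
	ptrs := make([]*Msg, n)
	for i := range values {
		values[i].N = i // stand-in for decoding element i
		ptrs[i] = &values[i]
	}
	return ptrs
}

func main() {
	l := decodeList(3)
	sub := l[2]
	l = append(l, &Msg{N: 99}) // may reallocate the pointer slice; the Msg values do not move
	sub.N = 42
	fmt.Println(l[2].N == sub.N) // true
}
```

One trade-off: while any single element pointer is reachable, the garbage collector keeps the whole `values` array alive.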

cyberax1y ago

> - Messages (especially opaque ones) are not supposed to be copied.

So?

> - This would make migrating to use the opaque API more difficult

The opaque API is stupid to begin with. Now the objects are no longer threadsafe. You can't just read a message in one thread and process it in two different threads.

> Users expect

Then don't expect this. If you're breaking the API, then at least break it in a way that makes it better afterwards.

secureOP1y ago

> Now the objects are no longer threadsafe. You can't just read a message in one thread and process it in two different threads.

This is not correct. The Opaque API provides the same guarantees as before, meaning you can read a message in one goroutine and then access it (but not modify it) from other goroutines concurrently.

aktau1y ago

> > - Messages (especially opaque ones) are not supposed to be copied.

> So?

If you have a []mypb.Message, and range over it in the normal way:

  for _, m := range msgs {
    // Use.
  }

That makes a copy of the struct. This is not supported in general for the opaque API, even though it appears to work for standard use cases. The representation is meant to be opaque.
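The copy is easy to see with a plain struct; an Opaque API message adds the further problem that copying its internal state is unsupported. `Msg`, `scaleCopies` and `scaleInPlace` are hypothetical names for illustration.

```go
package main

import "fmt"

type Msg struct{ N int }

// scaleCopies ranges by value: each m is a copy, so msgs is left unchanged.
func scaleCopies(msgs []Msg) {
	for _, m := range msgs {
		m.N *= 10
	}
}

// scaleInPlace takes each element's address, so the writes land in msgs
// and no struct is copied.
func scaleInPlace(msgs []Msg) {
	for i := range msgs {
		m := &msgs[i]
		m.N *= 10
	}
}

func main() {
	msgs := []Msg{{N: 1}, {N: 2}}
	scaleCopies(msgs)
	fmt.Println(msgs) // [{1} {2}]
	scaleInPlace(msgs)
	fmt.Println(msgs) // [{10} {20}]
}
```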


cyberax1y ago· 3 in thread

Thanks. I hate it.

Now you cannot use normal Go struct initialization and you'll have to write reams of Set calls.

oefrha1y ago

Like the sibling said, there's a complementary _builder struct generated with a Build() method. For instance, for the sample message in the blog post, here's the public API of the generated _builder:

  type LogEntry_builder struct {
   BackendServer *string
   RequestSize   *uint32
   IpAddress     *string
   // contains filtered or unexported fields
  }

  func (b0 LogEntry_builder) Build() *LogEntry
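Roughly how a builder like that maps onto an opaque message, as a hand-written imitation: unexported fields, a presence bitfield, setters that flip the bit, and a `Build()` that only sets fields whose builder pointer is non-nil. This is a simplified sketch of the pattern under those assumptions, not the code `protoc-gen-go` actually generates; the bit layout and method set are made up for illustration.

```go
package main

import "fmt"

// LogEntry imitates an Opaque-API message: unexported fields plus a
// presence bitfield recording which fields were explicitly set.
type LogEntry struct {
	presence      uint8
	backendServer string
	requestSize   uint32
}

const (
	hasBackendServer uint8 = 1 << iota
	hasRequestSize
)

func (x *LogEntry) SetBackendServer(v string) {
	x.backendServer = v
	x.presence |= hasBackendServer
}

func (x *LogEntry) SetRequestSize(v uint32) {
	x.requestSize = v
	x.presence |= hasRequestSize
}

func (x *LogEntry) GetBackendServer() string {
	if x == nil {
		return ""
	}
	return x.backendServer
}

// HasRequestSize distinguishes "set to 0" from "never set".
func (x *LogEntry) HasRequestSize() bool {
	return x != nil && x.presence&hasRequestSize != 0
}

// LogEntry_builder follows the shape quoted above: a nil pointer field
// means "leave this field unset".
type LogEntry_builder struct {
	BackendServer *string
	RequestSize   *uint32
}

func (b0 LogEntry_builder) Build() *LogEntry {
	x := &LogEntry{}
	if b0.BackendServer != nil {
		x.SetBackendServer(*b0.BackendServer)
	}
	if b0.RequestSize != nil {
		x.SetRequestSize(*b0.RequestSize)
	}
	return x
}

func main() {
	server := "backend-1"
	e := LogEntry_builder{BackendServer: &server}.Build()
	fmt.Println(e.GetBackendServer(), e.HasRequestSize()) // backend-1 false
}
```

The builder's pointer fields are what let it express presence: a nil pointer and a pointer to zero end up as different messages.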

cyberax1y ago

So they managed to screw up even that. The naming system is not idiomatic Go.

You still will need to create temporary objects (performance...) and for an unclear gain.


xyse531y ago

It's not in the post but when this was rolled out internally at Google there was a corresponding builder struct to initialize from.

abtinf1y ago· 3 in thread

This looks like an attempt to turn Go into Java/C#.

I certainly won’t allow this to be used by the engineering teams under me.

Zababa1y ago

I don't think it is. Effective Go says that Go doesn't provide automatic support for getters and setters but there's nothing wrong with providing them yourself. Since in that case they are actually doing something (checking/updating the bitfield that contains the presence of each field), it makes sense to use them.

They are called `GetFoo()` instead of the idiomatic `Foo()`, but that is to ensure compatibility with the API where the fields are directly exposed as `Foo`, which also makes sense.

pensatoio1y ago

Why? I'm going to encourage my engineers and other teams to use it. Using this API would 100% have prevented bugs created by accessing the generated structs directly, especially in the presence of an optional value.

cpuguy831y ago

It's an attempt to provide a much more efficient and harder-to-misuse implementation to a project used in tons of places.

remram1y ago· 2 in thread

> version: 2, 3, 2023 (released in 2024)

I call this Battlefield versioning, after the Battlefield video game series [1]. I bet the next version will be proto V.

[1]: in order: 1942, 2, 2142, 3, 4, 1, V, 2042

Cthulhu_1y ago

Oh but it's "syntax proto2 / 3", but "edition 2023" and beyond will supersede "syntax".

remram1y ago

syntax 2, syntax 3, edition 2023, flavor V, generation 1

matrix871y ago· 2 in thread

I recently used code-gen'd protobuf deser objects as the value type for an in-memory db and was considering flattening them into a more memory-efficient representation and using bitfields. That was for Java though; not sure if they are doing the same thing there.

Glad to see this change, for that use case it would've been perfect

fofoz1y ago

Have you considered this? https://flatbuffers.dev/

matrix871y ago

at the time, no, but this would've been perfect :/

tonymet1y ago· 2 in thread

Why is code generation underutilized? Protobufs and other Go tooling are great for code generation. Yet in practice I see few teams using it at scale.

Lots of teams creating rest / json APIs, but very few who use code generation to provide compile-time protection.

kevmo3141y ago

Code generation leaves a layer of abstraction between the API and the actual implementation which works great if that code generation is bug-free but if it's not, you're like... totally fucked. Most commonly people say you can read the generated code and step backwards but that's like saying you can read the compiled JavaScript and it's basically open source. That layer of abstraction is an underrated mental barrier.

Of course, code generation is still practical and I'm a lot more likely to trust a third-party writing a code generator like protobufs, OpenAPI specs, etc, but I would not trust an internal team to do so without a very good reason. I've worked on a few projects that lost hundreds of dev hours trying to maintain their code generator to avoid a tiny bit of copy/paste.

kccqzy1y ago

Code generation is under utilized because most people don't have a build system good enough for it. Traditional make is fine: you just define dependencies and rules. But a lot of people want to use language-specific build systems and these often don't have good support for code generation and dependency tracking for generated code.

Yet another subtlety is that when cross-compiling, you need to build the code generation tool for the local target always even though the main target could be a foreign architecture. And because the code generation tool and the main code could share dependencies, these dependencies need to be built twice for different targets. That again is something many build tools don't support.

alakra1y ago· 2 in thread

Is this like the FlatBuffers "zero-copy" deserialization?

mort961y ago

I'm not done reading the article yet, but nothing so far indicates that this is zero-copy, just a more efficient internal representation

kyrra1y ago

Nope. This is just a different implementation that greatly improves the speed in various ways.

neonsunset1y ago· 2 in thread

The absolute state of Go dragging down the entire gRPC stack with it. Oh well, at least we have quite a few competent replacements nowadays.

aktau1y ago

Can you be specific? I'm curious.

neonsunset1y ago

Of course not because you wouldn't listen :)


Naru411y ago· 2 in thread

Why not just use a naive struct from the beginning? memcpy is the fastest way to serialize into a form that we can use in an actual running program.

schmichael1y ago

The article goes into great detail about the benefits of an opaque API vs open structs. Somewhat unintuitively, open structs are not necessarily the “fastest”, largely due to pointers requiring heap allocations. Opaque APIs can also be “faster” due to lazy loading and avoiding memcpy altogether. The latter appears in libraries like FlatBuffers but not here, IIRC.

akira25011y ago

> memcpy is the fastest way

To bake endianness and alignment requirements into your protocol.

favflam1y ago· 1 in thread

Oh, this is great. I just did an implementation in gRPC in Go whereby I had to churn through 10MB/s of data. I could not implement any kind of memory pool and thus I had a lot of memory allocation issues which led to bad memory usage and garbage collection eating up my CPU.
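For scratch buffers, the standard library's `sync.Pool` is the usual way to reuse memory across requests. A minimal sketch, assuming byte buffers are what is being churned; `handle` is a hypothetical per-message handler, and pooling the decoded protobuf messages themselves is a separate matter this does not address.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so each message does not allocate a
// fresh one.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// handle borrows a buffer, uses it, and returns it to the pool.
func handle(payload []byte) int {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // drop contents but keep capacity for the next user
		bufPool.Put(buf)
	}()
	buf.Write(payload)
	return buf.Len()
}

func main() {
	total := 0
	for i := 0; i < 3; i++ {
		total += handle([]byte("0123456789"))
	}
	fmt.Println(total) // 30
}
```

Because `Reset` keeps capacity, one very large message can leave a large buffer in the pool; some code drops oversized buffers instead of putting them back.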

alecthomas1y ago

This is probably what you want: https://github.com/planetscale/vtprotobuf

h4ch11y ago

Surprisingly I saw this on the front page mere minutes after deciding to use protobufs in my new project.

Currently I'm not quite sold on RPC since the performance benefits seem to show up at a much larger scale than what I am aiming for, so I'm using a proto schema to define my types and using protoc codegen to generate only JSON marshaling/unmarshaling + types for my golang backend and typescript frontend, with JSON transferred between the two using REST endpoints.

Seems to give me good typesafety along with 0 headache in serializing/deserializing after transport.

One thing I also wanted to do was generate SQL schemas from my proto definitions or SQL migrations but haven't found a tool to do so yet, might end up making one.

Would love to know if any HN folk have ideas/critique regarding this approach.

tuetuopay1y ago

I can’t wait to try this new Protobuf Enterprise Edition, with its sea of getters and setters ad nauseam. /s

However I can get behind it for the lazy decoding which seems nice, though I doubt its actual usefulness for serious software (tm). As someone else already mentioned, an actual serious api (tm) will have business-scope types to uncouple the api definition from the implementation. And that’s how you keep sane as soon as you have to support multiple versions of the api.

Also, a lot of the benefits mentioned for footgun reductions smell like workarounds for the language's shortcomings. Memory address comparisons, accidental pointer sharing and mutability, enums, optional handling, etc. are already solved problems and where something like Rust shines. (Disclaimer: I run several gRPC APIs written in Rust in prod)


bccdee1y ago

perezd1y ago

The better stack rn is buf + Connect RPC: https://connectrpc.com/ All the compatibility, you get JSON+HTTP & gRPC, one platform.

jeffrallen1y ago

Software lives forever. You have to take the long view, not the "rn" view. In the long view, NFS's XDR or ASN.1 are just fine and could have been enough, if we didn't keep reinventing things.

1 more reply

jcmfernandes1y ago

I'm using connectrpc, and I'm a happy customer. I can even easily generate an OpenAPI schema for the "JSON API" using https://github.com/sudorandom/protoc-gen-connect-openapi

rochacon1y ago

ConnectRPC is very cool, thanks for sharing. I would like to add 2 other alternatives that I like:

- dRPC (by Storj): https://drpc.io (also compatible with gRPC)

- Twirp (by Twitch): https://github.com/twitchtv/twirp (no gRPC compatibility)

bbkane1y ago

1 more reply

crabmusket1y ago

> I love the idea of an IDL for describing APIs and a great compiler/codegen (protoc)

Me too. My context is that I end up using RPC-ish patterns when doing slightly out-of-the-ordinary web stuff, like websockets, iframe communications, and web workers.

I've been looking around for a "minimum viable IDL" to add to that, and I think my conclusion so far is "just write out a TypeScript file". This works when all my software is web/TypeScript anyway.

dpeckett1y ago

IggleSniggle1y ago

Typia kinda does this, but currently only has a Typescript -> Typescript compiler.

crabmusket1y ago

Yeah that's what I'd look into. Maybe TS -> Json Schema -> target language.

girvo1y ago

hansvm1y ago

> efficiency

jlouis1y ago

5-10x is not uncommon, and that's kissing an order of magnitude difference.

hansvm1y ago

> have to parse everything, even for just a few values

That's true of protobufs as much as it is for json, except for skipping over large submessages.

> memory bottleneck

> 5-10x is not uncommon

imtringued1y ago

Meanwhile if you care about parsing speed, there is MessagePack and CBOR.

If any form of parsing is too expensive for you, you're better off with FlatBuffers and capnproto.

And then there is protobuf/grpc, which seems to be in this weird place, where it is not particularly good at anything.

lowbloodsugar1y ago

Except gzip is tragically slow, so crippling protobuf by running it through gzip could indeed slow it down to json speeds.

hansvm1y ago

"gzipped json" vs "protobuf"

1 more reply

ajross1y ago

[1] Something very well served by JSON

[2] Network routing, stateful packet inspection, on-the-fly transcoding. Stuff that you'd never think to use a "standard format" for.

bboygravity1y ago

Add "everything that communicates with a microcontroller" to 2.

That means potentially: the majority of devices in the world.

thadt1y ago

[1] https://github.com/nanopb/nanopb

1 more reply

malkia1y ago

dpeckett1y ago

Also JS now has BigInt types and the JSON decoder can be told to use them. So I'd argue it's kind of a moot point at this stage.

5 more replies

mirekrusin1y ago

Also json parsers are crazy fast nowadays, most people don't realize how fast they are.

Cthulhu_1y ago

1 more reply

jeffbee1y ago· 13 in thread

What I'd like is to rewind the time machine and undo all the path-dependent brain damage.

sa461y ago

When I was at Google around 2016, there was a significant push to convince folks that the proto3 implicit presence was superior to explicit presence.

Is there a design doc with the rationale for switching back to explicit presence for Edition 2023?

The closest docs I've found are https://buf.build/blog/protobuf-editions-are-here and https://github.com/protocolbuffers/protobuf/tree/main/docs/d....

akshayshah1y ago

Best bet is likely https://github.com/protocolbuffers/protobuf/blob/main/docs/f..., which predates editions.

jeffbee1y ago

I was only there for the debate you mentioned and not there for the reversal, so I dunno.

IX-1031y ago

1 more reply

jcdavis1y ago

> it screwed up the API for C++, with many compromises

The implicit presence garbage screwed up the API for many languages, not just C++

What is wild is how obviously silly it was at the time, too - no hindsight was needed.

kubb1y ago

It was but when the wrong fool gets a say, they will mess a perfectly good thing up for everyone.

Organizations often promote fools who don’t second guess their beliefs and think they have it all figured out.

dekhn1y ago

(FWIW nowadays I use pydantic for type descriptions and JSON for transport, but I really prefer having an external IDL unrelated to any specific programming language)

lmm1y ago

elcritch1y ago

charleslmunger1y ago

You'll be thrilled to hear about upb then, which was designed to be embeddable to power other languages without a from-scratch implementation - and now powers python protos.

https://github.com/protocolbuffers/protobuf/tree/main/upb

sbrother1y ago

ein0p1y ago

boulos1y ago

That's not how I remember it. I thought proto3 was all about JSON compatibility. No?

the_gipsy1y ago· 13 in thread

> syntax = "proto2" uses explicit presence by default

> edition = "2023", the successor to both proto2 and proto3, uses explicit presence by default

9rx1y ago

> It's like putting makeup on a pig, your get rid of null-panics

How so? In Go, nil is the zero value for a pointer and is ripe for panic just like null. Zero values do not avoid that problem at all, nor do they intend to.

bbatha1y ago

2 more replies

the_gipsy1y ago

That's just that they picked a worse case of zero value for slices and maps, presumably for performance gains.

9rx1y ago

The slice type is an implicit struct, in the shape:

    struct {
        data uintptr  
        len  int      
        cap  int      
    }

    var file os.File
    file.Read([]byte{}) // panics; file must be initialized first.

1 more reply

tsimionescu1y ago

1 more reply

delusional1y ago

the_gipsy1y ago

Sorry, I don't follow you. If you don't have zero values, you either have nulls and panics, or you have some kind of sum-type á la Option<T> and cannot possibly construct null or zero-ish values.

Is there a way to have your cake and eat it too, and are there real world examples of it?

delusional1y ago

I guess my point can be boiled down to the dual of the old C++ adage. Resource Allocation is NOT initialization. RAINI.

1 more reply

bborud1y ago

No, there isn't. It is just other versions of the same problem with people pretending it is somehow different.

TheDong1y ago

There is a way to have your cake and eat it too: rust.

In rust, you have:

    let s = S{foo: 42, ..Default::default()};

You just got all the remaining fields of 'S' set to "zero-ish" values, and there's no NPEs.

The way you do this is by having types opt in to it, since zero values only make sense in some contexts.

In rust, at compiletime you can know if something implements default or not, and so you can know if there's a sensible zero value, and you can construct it.

3 more replies

masklinn1y ago

> or you have some kind of sum-type á la Option<T> and cannot possibly construct null or zero-ish values.

Option types specifically allow defaulting (to none) even if the wrapped value is not default-able.

You can very much construct null or zero-ish values in such a langage, but it’s not universal, types have to be opted into this capability.

1 more reply

tech1321y ago

From what I remember, proto3 behavior happened to map to objective c since iOS maps coincidentally happened at around the same time so they could be loud.

knodi1y ago

Yes, as much as I love Go and love working with it every day. This inner workings of Go with zero-values has been an design issue that comes up again and again and again.

parhamn1y ago· 9 in thread

It's interesting, to everyone but but the mega shops like Google, protobuf is a schema declaration tool. To the megashops its a performance tool.

danans1y ago

> It's interesting, to everyone but but the mega shops like Google, protobuf is a schema declaration tool

Cthulhu_1y ago

That's assuming JSON native databases are fine for your use case though, but in practice it's only good for storing documents that don't need to be edited/queried much by the backend storing them.

danans1y ago

> in practice it's only good for storing documents that don't need to be edited/queried much by the backend storing them

Why aren't they good for that? They can have very high write throughput, they don't require ORMs, and they can be indexed and queried using standard database methods like the SQL language.

You can even enforce strict schemas on them if you want to, just as you would with an RDBMS.

notyourwork1y ago

parhamn1y ago

I am curious about this, whats the difference between the fastest json libraries (e.g simdjson) and protobuf?

1 more reply

shepherdjerred1y ago

From Amazon: https://smithy.io/2.0/index.html (internally known as Coral)

bobnamob1y ago

nit: Coral and smithy are not comparable.

Coral is a schema definition language, yes. But it’s also a full rpc ecosystem.

tinthedev1y ago

> OpenAPI is too resty

I'm curious as to why would you think that?

There's a bit of boilerplate in there if you want to use it for a naive implementation, but I don't find it exceedingly resty.

This being for a web-based problem, where we're solving schema declaration.

parhamn1y ago

> I'm curious why you would think that.

Huh? From the Swagger site:

"The OpenAPI Specification defines a standard interface to RESTful APIs..."

From their OpenAPI Initiative:

"The OpenAPI Specifications provide a formal standard for describing HTTP APIs"

Not sure I care to understand your POV more given this obvious bit was missed by you.

kubb1y ago· 9 in thread

GeneralMayhem1y ago

kubb1y ago

Unless someone with authority in your workplace makes a rule against doing that…

GeneralMayhem1y ago

If your problem is humans making arbitrary and nonsensical decisions about how you can do your job, then you have a non-technical problem, and it's unlikely that any technical solution will solve it.

atombender1y ago

But the Google team refused to partake in any such improvements, and GoGo was shut down as the burden of following the upstream implementation became too big. [1]

    type Info struct {
      User LazyProto[User] // decoded on first Get()
    }

    userName := info.User.Get().Name

[1] https://x.com/awalterschulze/status/1584553056100057088
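The comment doesn't show how `LazyProto` would work, so here is a hypothetical sketch of one way to build it: keep the raw bytes and decode them on the first `Get`, guarded by `sync.Once` so concurrent first reads are safe. `encoding/json` stands in for a protobuf decoder to keep it self-contained, and `Get` also returns an error, which the one-liner above glosses over.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// LazyProto is a hypothetical wrapper: it holds a sub-message's encoded bytes
// and decodes them only on the first Get. encoding/json stands in for a
// protobuf decoder so the sketch is self-contained.
type LazyProto[T any] struct {
	raw  []byte
	once sync.Once
	val  T
	err  error
}

// NewLazy wraps encoded bytes without decoding them.
func NewLazy[T any](raw []byte) *LazyProto[T] {
	return &LazyProto[T]{raw: raw}
}

// Get decodes on first use and returns the cached result afterwards;
// sync.Once makes concurrent first calls safe.
func (l *LazyProto[T]) Get() (*T, error) {
	l.once.Do(func() {
		l.err = json.Unmarshal(l.raw, &l.val)
	})
	return &l.val, l.err
}

type User struct {
	Name string `json:"name"`
}

// Info holds a pointer so copying an Info never copies the sync.Once.
type Info struct {
	User *LazyProto[User]
}

func main() {
	info := Info{User: NewLazy[User]([]byte(`{"name":"gopher"}`))}
	user, err := info.User.Get()
	if err != nil {
		panic(err)
	}
	fmt.Println(user.Name) // prints gopher
}
```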

lalaithion1y ago

alienchow1y ago

Then again, I've also seen people write these thousand-line in-code literal string protos, which really grind my gears.

throwaway8943451y ago

MobiusHorizons1y ago

throwaway8943451y ago

strawhatguy1y ago· 9 in thread

Great, now there's an API per struct/message to learn and communicate throughout the codebase, with all the getters and setters.

A given struct is probably faster for protobuf parsing in the new layout, but the complexity of the code probably increases, and I can see this complexity easily negating these gains.

secureOP1y ago

> Great, now there's an API per struct/message to learn and communicate throughout the codebase, with all the getters and setters.

jrockway1y ago

I always used the getters anyway. Given:

   syntax = "proto3";

   message M {
     string foo = 1;
   }

   message N {
     M bar = 2;
   }

I find (new(N)).Bar.Foo panicking pretty annoying, so I just made it a habit to write m.GetBar().GetFoo() anyway. If m.GetBar().SetFoo() works with the new API, that would be an improvement.
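The getter chain is safe because protoc-gen-go's getters for the open struct API check for a nil receiver before touching the field. The types below hand-write that pattern for `M` and `N` (real generated structs also carry internal state fields). `directFoo` uses `recover` only to show that plain field access panics.

```go
package main

import "fmt"

// M and N hand-write the shape of the generated open structs.
type M struct {
	Foo string
}

type N struct {
	Bar *M
}

// Generated-style getters return the zero value on a nil receiver,
// which is what makes chaining safe.
func (x *M) GetFoo() string {
	if x != nil {
		return x.Foo
	}
	return ""
}

func (x *N) GetBar() *M {
	if x != nil {
		return x.Bar
	}
	return nil
}

// directFoo reads the field directly and reports whether that panicked.
func directFoo(n *N) (foo string, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	return n.Bar.Foo, false
}

func main() {
	n := new(N)
	_, panicked := directFoo(n)
	fmt.Println(panicked)                   // prints true: n.Bar is nil
	fmt.Printf("%q\n", n.GetBar().GetFoo()) // prints "": getters tolerate nil
}
```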

The generated code's API has never really bothered me. It is flexible enough to be clever. I especially liked using proto3 for data types and then storing them in a kv store with an API like:

   type WithID interface { GetId() []byte }

   func Put(tx *Tx, x WithID) error { ... }
   func Get(tx *Tx, id []byte) (WithID, error) { ... }

The autogenerated API is flexible enough for this sort of shenanigan, though it's not something I would recommend except to have fun.
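The trick works because every generated message with a `bytes id` field gets a `GetId() []byte` method and so satisfies `WithID` with no extra code. Below is a hypothetical in-memory version: `Tx` is just a map and `Widget` hand-imitates a generated message. A real store would serialize with `proto.Marshal`, and its `Get` would need to know which concrete type to unmarshal into. This sketch sidesteps that by storing values directly.

```go
package main

import (
	"errors"
	"fmt"
)

// WithID is satisfied by any generated message with a `bytes id` field,
// because protoc-gen-go emits a GetId() []byte getter for it.
type WithID interface {
	GetId() []byte
}

// Tx is a hypothetical in-memory stand-in for a key-value transaction.
type Tx struct {
	data map[string]WithID
}

// ErrNotFound is returned by Get for unknown IDs.
var ErrNotFound = errors.New("not found")

// Put stores x under its own ID.
func Put(tx *Tx, x WithID) error {
	id := x.GetId()
	if len(id) == 0 {
		return errors.New("missing id")
	}
	tx.data[string(id)] = x
	return nil
}

// Get returns the record stored under id.
func Get(tx *Tx, id []byte) (WithID, error) {
	x, ok := tx.data[string(id)]
	if !ok {
		return nil, ErrNotFound
	}
	return x, nil
}

// Widget hand-imitates a generated message declared with `bytes id = 1;`.
type Widget struct {
	Id   []byte
	Name string
}

func (x *Widget) GetId() []byte {
	if x != nil {
		return x.Id
	}
	return nil
}

func main() {
	tx := &Tx{data: map[string]WithID{}}
	if err := Put(tx, &Widget{Id: []byte("w1"), Name: "sprocket"}); err != nil {
		panic(err)
	}
	got, err := Get(tx, []byte("w1"))
	if err != nil {
		panic(err)
	}
	fmt.Println(got.(*Widget).Name) // prints sprocket
}
```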

hellcow1y ago

I'd recommend transforming protobuf types to domain types at your API boundary. Then you have domain types through the whole application.
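Here is a minimal sketch of such a boundary, assuming a hypothetical `pbUser` in place of a generated message. The conversion only calls getters, so it would look the same against the open or the opaque API, and code past `userFromProto` never sees a protobuf type.

```go
package main

import (
	"fmt"
	"time"
)

// pbUser is a hypothetical stand-in for a generated message; only its
// getters are used, as they would be with the opaque API.
type pbUser struct {
	name          string
	createdUnixMs int64
}

func (x *pbUser) GetName() string {
	if x != nil {
		return x.name
	}
	return ""
}

func (x *pbUser) GetCreatedUnixMs() int64 {
	if x != nil {
		return x.createdUnixMs
	}
	return 0
}

// User is the domain type the rest of the application works with.
type User struct {
	Name      string
	CreatedAt time.Time
}

// userFromProto is the single place where protobuf types are translated.
func userFromProto(p *pbUser) User {
	return User{
		Name:      p.GetName(),
		CreatedAt: time.UnixMilli(p.GetCreatedUnixMs()).UTC(),
	}
}

func main() {
	u := userFromProto(&pbUser{name: "ada", createdUnixMs: 86_400_000})
	fmt.Println(u.Name, u.CreatedAt.Format("2006-01-02")) // prints ada 1970-01-02
}
```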

AYBABTME1y ago

mrbadguy1y ago

It’s good to isolate your dependencies within the code :)

mxey1y ago

At which point I lose all the benefits of lazy decoding that the accessor methods can provide, so I could just decode directly into a sensible struct, except you can’t with Protobuf.

mort961y ago

Accessor methods aren't for lazy decoding but for more efficient memory layouts.
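With accessors, the struct no longer has to expose one pointer per optional field (the old `*uint32`, `*string` style, each a separate allocation when set). The hand-written sketch below stores scalars inline and tracks presence in a single bitmap. This is similar in spirit to the bit-field presence tracking the opaque API describes, but `logEntry` is an illustration, not protoc-gen-go output.

```go
package main

import "fmt"

// Presence bits for the optional fields of logEntry.
const (
	hasRequestSize uint8 = 1 << iota
	hasBackendServer
)

// logEntry is a hypothetical, hand-written message: fields are unexported,
// stored inline, and their presence lives in one bitmap instead of nil pointers.
type logEntry struct {
	presence      uint8
	requestSize   uint32
	backendServer string
}

func (x *logEntry) HasRequestSize() bool {
	return x != nil && x.presence&hasRequestSize != 0
}

func (x *logEntry) GetRequestSize() uint32 {
	if x != nil {
		return x.requestSize
	}
	return 0
}

func (x *logEntry) SetRequestSize(v uint32) {
	x.requestSize = v
	x.presence |= hasRequestSize
}

func (x *logEntry) ClearRequestSize() {
	x.requestSize = 0
	x.presence &^= hasRequestSize
}

func (x *logEntry) HasBackendServer() bool {
	return x != nil && x.presence&hasBackendServer != 0
}

func (x *logEntry) GetBackendServer() string {
	if x != nil {
		return x.backendServer
	}
	return ""
}

func (x *logEntry) SetBackendServer(v string) {
	x.backendServer = v
	x.presence |= hasBackendServer
}

func main() {
	var e logEntry
	e.SetRequestSize(0) // an explicit zero is "present", unlike a never-set field
	fmt.Println(e.HasRequestSize(), e.GetRequestSize(), e.HasBackendServer()) // prints true 0 false
}
```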

matrix871y ago

mort961y ago

kyrra1y ago· 6 in thread

This is honestly something I wish Go itself would do. Allowing for nil maps in Go is such a footgun.
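The asymmetry being called a footgun is that a nil map behaves like an empty map for reads but panics on writes. Here is a minimal demonstration, with `recover` used only to report the panic:

```go
package main

import "fmt"

// readNilMap reads from a map that was never initialized with make.
func readNilMap() (value, length int) {
	var m map[string]int // nil map
	return m["missing"], len(m)
}

// writeToNilMap reports whether assigning into a nil map panics.
func writeToNilMap() (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	var m map[string]int // nil map
	m["a"] = 1           // runtime panic: assignment to entry in nil map
	return false
}

func main() {
	fmt.Println(readNilMap())    // prints 0 0
	fmt.Println(writeToNilMap()) // prints true
}
```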

ynniv1y ago

The Java impl for protobuf will never generate a NullPointerException, as calling `get` on a field would just return the default instance of that field.

This was a mistake. You still want to check whether it was initialized most of the time, and when you do the wrong thing it's even more difficult to see the error.

kyrra1y ago

But I do know the footgun of calling "get" on a Java Proto Builder without setting it, as that actually initializes the field to empty, and could cause it to be emitted as such.

Such are the tradeoffs. I'd prefer null-safety to accidental field setting (or thinking a field was set, when it really wasn't).

usrnm1y ago

rad_gruchalski1y ago

An alternative explanation is that non-Go people got their hands on Go and complain that Go is not X or Y.

mxey1y ago

Which non-Go people brought generics into Go?

the_gipsy1y ago

> The Java impl for protobuf will never generate a NullPointerException, as calling `get` on a field would just return the default instance of that field.

This is NOT the solution lmao

g0ld3nrati01y ago· 6 in thread

Just curious, why do you use protobuf instead of FlatBuffers?

akshayshah1y ago

The whole FlatBuffers toolchain is wildly immature compared to Protobuf. Last I checked, flatc doesn’t even have a plugin system - all code generators had to be upstreamed into the compiler itself.

mort961y ago

majormajor1y ago

tonyhart71y ago

Yeah, idk why we didn't just send the binary data representation and eliminate the entire serialize/deserialize step.

tsimionescu1y ago

tonyhart71y ago

Just need a standard for that.

lakomen1y ago· 6 in thread

GraphQL won the race for me. gRPC is no longer relevant: too many hurdles, and no proper support to and from the web. You have to use some third-party, non-free service.

nicce1y ago

Aren’t their use cases completely different?

lmm1y ago

_cenw1y ago

Intersects quite heavily if you're defining a schema for your API

revskill1y ago

What are the differences?

lakomen1y ago

Not at all.

pensatoio1y ago

Yes. The use cases are very different, as far as these things go. To say otherwise is borderline misinformation.

You can build services internally with gRPC and serve a public GraphQL API that aggregates them.

cyberax1y ago· 4 in thread

BTW, if you care so much about performance, then fix the freaking array representation. It should be simple `[]SomeStruct` instead of `[]*SomeStruct`.

This one small change can result in an order of magnitude improvement.
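The allocation count is the easiest part of that claim to see. The minimal sketch below uses a hypothetical `item` type instead of a generated message and counts heap allocations per decode with `testing.AllocsPerRun`, which can also be called outside tests. It measures allocations only. GC and cache effects are not captured, and whether the overall gain reaches an order of magnitude depends on the workload.

```go
package main

import (
	"fmt"
	"testing"
)

// item is a hypothetical stand-in for a decoded repeated sub-message.
type item struct {
	ID   int64
	Size uint32
}

// Package-level sinks force the slices onto the heap, like decoded data
// that outlives the decode call.
var (
	sinkVals []item
	sinkPtrs []*item
)

// decodeValues builds a []item: one allocation holds every element.
func decodeValues(n int) {
	vals := make([]item, n)
	for i := range vals {
		vals[i].ID = int64(i)
	}
	sinkVals = vals
}

// decodePointers builds a []*item: one allocation for the slice plus one
// per element.
func decodePointers(n int) {
	ptrs := make([]*item, n)
	for i := range ptrs {
		ptrs[i] = &item{ID: int64(i)}
	}
	sinkPtrs = ptrs
}

// allocCounts reports average heap allocations per call for each layout.
func allocCounts(n int) (vals, ptrs float64) {
	vals = testing.AllocsPerRun(10, func() { decodeValues(n) })
	ptrs = testing.AllocsPerRun(10, func() { decodePointers(n) })
	return vals, ptrs
}

func main() {
	vals, ptrs := allocCounts(100)
	fmt.Printf("[]item: %.0f allocs, []*item: %.0f allocs\n", vals, ptrs)
}
```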

aktau1y ago

It's true that this would perform better, and greatly reduce allocations. But:

  - Messages (especially opaque ones) are not supposed to be copied. The
    recommendation is to use `m := &mypb.Message{}`.
  - This would make migrating to the opaque API more difficult: if the
    getters don't return the same type as the old open API fields, much more
    code needs to be rewritten, or some wrapper is needed that allocates a new
    slice on every get.
  - Users expect that `subm := m.GetSubMessages()[2] ;
    m.SetSubMessages(append(m.GetSubMessages(), anotherm)) ; subm.SetInt(42) ;
    assert(subm.GetInt() == m.GetSubMessages()[2].GetInt())`. This would not be
    the case if the API returned a slice of values.
  - ...

  - Allocations when passing iterators to functions that expect a slice.
  - Extra API surface for modifying the list (append, getn, len, ...).

Not that clear of a win either.

Another thing that could be considered is: when decoding, allocate a slice of values ([]mypb.Message), *and* a slice with pointers (or do it lazily): []*mypb.Message. Then initialize:

  for i := range valuel {
    ptrl[i] = &valuel[i] // TODO: verify that this escape doesn't cause disjoint allocations.
  }

That might be beneficial due to grouping allocations, and the user would be none the wiser.
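A runnable version of that idea, with a hypothetical `msg` type in place of `mypb.Message`: one backing array of values plus one slice of pointers into it. Callers still get `[]*msg` and keep the aliasing behavior described in the third bullet, while the per-element allocations disappear. One trade-off is that while any single pointer is reachable, the whole backing array stays alive. Messages appended later are also allocated separately.

```go
package main

import (
	"fmt"
	"testing"
)

// msg is a hypothetical stand-in for a generated message type.
type msg struct {
	n int64
}

func (m *msg) SetInt(v int64) { m.n = v }
func (m *msg) GetInt() int64  { return m.n }

var sink []*msg

// decodeGrouped allocates one backing array of values plus one slice of
// pointers into it, so callers see []*msg but elements share one allocation.
func decodeGrouped(count int) []*msg {
	values := make([]msg, count)
	ptrs := make([]*msg, count)
	for i := range values {
		values[i].n = int64(i)
		ptrs[i] = &values[i] // points into the shared backing array
	}
	return ptrs
}

// groupedAllocs reports average heap allocations per decodeGrouped call.
func groupedAllocs(count int) float64 {
	return testing.AllocsPerRun(10, func() { sink = decodeGrouped(count) })
}

func main() {
	ms := decodeGrouped(3)
	ms[2].SetInt(42)
	fmt.Println(ms[2].GetInt())     // prints 42
	fmt.Println(groupedAllocs(100)) // small, and does not grow with count
}
```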

cyberax1y ago

> - Messages (especially opaque ones) are not supposed to be copied.

So?

> - This would make migrating to use the opaque API more difficult

The opaque API is stupid to begin with. Now the objects are no longer threadsafe. You can't just read a message in one thread and process it in two different threads.

> Users expect

Then don't expect this. If you're breaking the API, then at least break it in a way that makes it better afterwards.

secureOP1y ago

> Now the objects are no longer threadsafe. You can't just read a message in one thread and process it in two different threads.

This is not correct. The Opaque API provides the same guarantees as before, meaning you can read a message in one goroutine and then access it (but not modify it) from other goroutines concurrently.

aktau1y ago

> > - Messages (especially opaque ones) are not supposed to be copied.

> So?

If you have a []mypb.Message, and range over it in the normal way:

  for _, m := range msgs {
    // Use.
  }

That makes a copy of the struct. This is not supported in general for the opaque API, even though it appears to work for standard use cases. The representation is meant to be opaque.
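The copy is easy to see with a hypothetical `message` type standing in for a generated one. In the two-value `range` form, `m` is a copy of each element, so setters called on it never reach the slice. Indexing addresses the element in place.

```go
package main

import "fmt"

// message is a hypothetical stand-in for a generated (opaque) message.
type message struct {
	count int
}

func (m *message) SetCount(v int) { m.count = v }
func (m *message) GetCount() int  { return m.count }

// rangeCopySet ranges by value: m is a copy, so the slice is unchanged.
func rangeCopySet(msgs []message) {
	for _, m := range msgs {
		m.SetCount(1)
	}
}

// indexSet addresses each element in place, so the slice is updated.
func indexSet(msgs []message) {
	for i := range msgs {
		msgs[i].SetCount(1)
	}
}

func main() {
	msgs := make([]message, 2)
	rangeCopySet(msgs)
	fmt.Println(msgs[0].GetCount()) // prints 0
	indexSet(msgs)
	fmt.Println(msgs[0].GetCount()) // prints 1
}
```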

cyberax1y ago· 3 in thread

Thanks. I hate it.

Now you cannot use normal Go struct initialization, and you'll have to write reams of Set calls.

oefrha1y ago

Like the sibling said, there's a complementary _builder struct generated with a Build() method. For instance, for the sample message in the blog post, here's the public API of the generated _builder:

  type LogEntry_builder struct {
   BackendServer *string
   RequestSize   *uint32
   IpAddress     *string
   // contains filtered or unexported fields
  }

  func (b0 LogEntry_builder) Build() *LogEntry
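To make the builder pattern concrete without depending on generated code, here is a hand-written imitation using two of the three fields. It has exported pointer fields where `nil` means unset, plus a `Build()` that copies the set fields into a message with unexported state. It is a sketch, not protoc-gen-go output. With real generated code, `proto.String` and `proto.Uint32` from `google.golang.org/protobuf/proto` would replace the local `ptr` helper.

```go
package main

import "fmt"

// LogEntry imitates an opaque message: unexported fields plus presence flags.
type LogEntry struct {
	backendServer    string
	requestSize      uint32
	hasBackendServer bool
	hasRequestSize   bool
}

func (x *LogEntry) GetBackendServer() string {
	if x != nil {
		return x.backendServer
	}
	return ""
}

func (x *LogEntry) HasRequestSize() bool { return x != nil && x.hasRequestSize }

func (x *LogEntry) GetRequestSize() uint32 {
	if x != nil {
		return x.requestSize
	}
	return 0
}

// LogEntry_builder imitates the generated builder: exported pointer fields,
// where nil means "leave unset".
type LogEntry_builder struct {
	BackendServer *string
	RequestSize   *uint32
}

// Build copies the fields that were set into a new message.
func (b0 LogEntry_builder) Build() *LogEntry {
	m := &LogEntry{}
	if b0.BackendServer != nil {
		m.backendServer, m.hasBackendServer = *b0.BackendServer, true
	}
	if b0.RequestSize != nil {
		m.requestSize, m.hasRequestSize = *b0.RequestSize, true
	}
	return m
}

// ptr is a local helper standing in for proto.String / proto.Uint32.
func ptr[T any](v T) *T { return &v }

func main() {
	e := LogEntry_builder{
		BackendServer: ptr("backend-1.example.com"),
		RequestSize:   ptr(uint32(2048)),
	}.Build()
	fmt.Println(e.GetBackendServer(), e.GetRequestSize(), e.HasRequestSize())
}
```

The whole message is still initialized in one expression, without a separate Set call per field.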

cyberax1y ago

So they managed to screw up even that. The naming system is not idiomatic Go.

You still will need to create temporary objects (performance...) and for an unclear gain.

xyse531y ago

It's not in the post but when this was rolled out internally at Google there was a corresponding builder struct to initialize from.

abtinf1y ago· 3 in thread

This looks like an attempt to turn Go into Java/C#.

I certainly won’t allow this to be used by the engineering teams under me.

Zababa1y ago

They are called `GetFoo()` instead of the idiomatic `Foo()`, but that is to ensure compatibility with the API where the fields are directly exposed as `Foo`, which also makes sense.

pensatoio1y ago

cpuguy831y ago

It's an attempt to provide a much more efficient, harder-to-misuse implementation for a project used in tons of places.

remram1y ago· 2 in thread

> version: 2, 3, 2023 (released in 2024)

I call this Battlefield versioning, after the Battlefield video game series [1]. I bet the next version will be proto V.

[1]: in order: 1942, 2, 2142, 3, 4, 1, V, 2042

Cthulhu_1y ago

Oh but it's "syntax proto2 / 3", but "edition 2023" and beyond will supersede "syntax".

remram1y ago

syntax 2, syntax 3, edition 2023, flavor V, generation 1

matrix871y ago· 2 in thread

Glad to see this change; for that use case it would've been perfect.

fofoz1y ago

Have you considered this? https://flatbuffers.dev/

matrix871y ago

at the time, no, but this would've been perfect :/

tonymet1y ago· 2 in thread

Why is code generation underutilized? Protobufs and other Go tooling are great for code generation, yet in practice I see few teams using it at scale.

Lots of teams create REST/JSON APIs, but very few use code generation to provide compile-time protection.

kevmo3141y ago

kccqzy1y ago

alakra1y ago· 2 in thread

Is this like the FlatBuffers "zero-copy" deserialization?

mort961y ago

I'm not done reading the article yet, but nothing so far indicates that this is zero-copy, just a more efficient internal representation

kyrra1y ago

Nope. This is just a different implementation that greatly improves the speed in various ways.

neonsunset1y ago· 2 in thread

The absolute state of Go dragging down the entire gRPC stack with it. Oh well, at least we have quite a few competent replacements nowadays.

aktau1y ago

Can you be specific? I'm curious.

neonsunset1y ago

Of course not because you wouldn't listen :)

Naru411y ago· 2 in thread

Why not just use a naive struct from the beginning? memcpy is the fastest way to serialize into a form that we can use in the actual running program.

schmichael1y ago

akira25011y ago

> memcpy is the fastest way

To bake endianness and alignment requirements into your protocol.
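The usual alternative is to encode each field explicitly in a fixed byte order, so the wire layout has no padding and doesn't depend on the host CPU. This minimal sketch uses a hypothetical `header` type and `encoding/binary`. `unsafe.Sizeof` is printed only to show that the in-memory struct, which a raw memcpy would ship, is larger than the 5 bytes actually needed.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// header is a hypothetical wire struct. Copying its memory directly would
// ship the host's byte order plus the padding the compiler inserts after A.
type header struct {
	A uint8
	B uint32
}

// encode writes each field explicitly, little-endian, with no padding.
func encode(h header) []byte {
	buf := make([]byte, 5)
	buf[0] = h.A
	binary.LittleEndian.PutUint32(buf[1:], h.B)
	return buf
}

// decode is the inverse of encode; it assumes buf holds at least 5 bytes.
func decode(buf []byte) header {
	return header{A: buf[0], B: binary.LittleEndian.Uint32(buf[1:5])}
}

func main() {
	h := header{A: 1, B: 0x01020304}
	fmt.Println(unsafe.Sizeof(h), len(encode(h))) // in-memory size (typically 8) vs 5 wire bytes
	fmt.Printf("% x\n", encode(h))                // prints 01 04 03 02 01
	fmt.Println(decode(encode(h)) == h)           // prints true
}
```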

favflam1y ago· 1 in thread

alecthomas1y ago

This is probably what you want: https://github.com/planetscale/vtprotobuf

h4ch11y ago

Surprisingly, I saw this on the front page mere minutes after deciding to use protobufs in my new project.

It seems to give me good type safety along with zero headaches serializing/deserializing after transport.

One thing I also wanted to do was generate SQL schemas or SQL migrations from my proto definitions, but I haven't found a tool to do so yet, so I might end up making one.

Would love to know if any HN folk have ideas/critique regarding this approach.
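One possible starting point for such a tool is sketched below. It uses a hypothetical flat `field` type and a hand-picked PostgreSQL type mapping. A real generator would instead walk `protoreflect.MessageDescriptor.Fields()`, for example from a protoc plugin. It would also have to decide how to handle repeated fields, nested messages, enums, and migrations, which this sketch simply rejects or ignores.

```go
package main

import (
	"fmt"
	"strings"
)

// field is a hypothetical, simplified view of a proto field; a real tool
// would read these from protoreflect descriptors instead.
type field struct {
	Name     string
	Kind     string // proto scalar type name, e.g. "string", "int64"
	Repeated bool
}

// sqlTypes maps proto scalar kinds to PostgreSQL column types (assumed dialect).
var sqlTypes = map[string]string{
	"string": "TEXT",
	"bytes":  "BYTEA",
	"bool":   "BOOLEAN",
	"int32":  "INTEGER",
	"int64":  "BIGINT",
	"uint32": "BIGINT", // widened: PostgreSQL has no unsigned integer types
	"double": "DOUBLE PRECISION",
	"float":  "REAL",
}

// createTable renders a CREATE TABLE statement for a flat message and
// rejects kinds it has no mapping for, as well as repeated fields.
func createTable(table string, fields []field) (string, error) {
	cols := make([]string, 0, len(fields))
	for _, f := range fields {
		t, ok := sqlTypes[f.Kind]
		if !ok || f.Repeated {
			return "", fmt.Errorf("field %s: unsupported kind %q (repeated=%v)", f.Name, f.Kind, f.Repeated)
		}
		cols = append(cols, fmt.Sprintf("  %s %s", f.Name, t))
	}
	return fmt.Sprintf("CREATE TABLE %s (\n%s\n);", table, strings.Join(cols, ",\n")), nil
}

func main() {
	stmt, err := createTable("log_entry", []field{
		{Name: "backend_server", Kind: "string"},
		{Name: "request_size", Kind: "uint32"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(stmt)
}
```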

tuetuopay1y ago

I can’t wait to try this new Protobuf Enterprise Edition, with its sea of getters and setters ad nauseam. /s
