There are advantages to being language-specific, but a lot of disadvantages as well (speaking as someone who recently had to write some Elixir code to unmarshal a Ruby object...). It seems hard to introduce this, since you're forcing all communicating services to be Go-based, which is kind of contrary to the independence that microservices usually afford you.
Some of the benefits are simply design goals (e.g., top-level arrays) which could also be achieved in a language-independent protocol. Even the performance questions probably could be. Like, Cap'n Proto I think is designed so that users of the protocol don't have to serialize/deserialize the data, right? They just pass it around and work with it directly.
I can see Rob Pike being frustrated with Protocol Buffers at Google, and I don't begrudge anyone for taking a big shot like this, but I wonder if he's found any success with it.
[1] Maybe there was no suitable one when Go was first created. Nowadays I believe CBOR is the best format for this job.
And if your language of choice doesn't have a CBOR lib, CBOR is fairly easy to implement and writing an encoder/decoder is very fun! I completed my implementation for the Gerbil Scheme language last week [0].
But now the interoperable serialization formats have a lot more energy spent on tools and such, and the formats are better-specified, to the point that you probably want to use them even where interoperability with other stuff doesn't force it.
The thing is that at Google protobufs are used everywhere. Like, absolutely everywhere. Think of all the places they could be, and it's way more. All the tooling understands them, code search and go-to-reference works on them everywhere, they are truly transformational on how many different services with many different implementations interact.
Are they perfect? Far from it. But a Go-only implementation misses almost all the value of protobufs. If I was inventing them for Python I could do better (pickle? maybe not) but the whole point is that they aren't language/ecosystem/use-case specific. If this was Rob Pike's frustration then I can't help but feel he missed the point, or this post is a little disingenuous as to the benefits.
I've not seen Gobs used in Google, but I'd imagine an engineer would need to make a very strong case against using protos between services, regardless of whether both services are in Go.
Disclaimer: an employee at the said moving company.
Back when he did his best work, it was possible for one person to "just write the new thing", without making it fit with anything else. There was nothing else to fit it with. You could invent everything from scratch, and not only was it not a waste of time, if you were good enough it had a chance of being the best fit for purpose.
You could take shortcuts. You could have every part of your system be "odd", because nothing was "even".
That's not true anymore. And the way I see it Pike has not moved on.
Science in general had this switch at one point, too. There was a point where one could know all of science. But it's long gone.
gob was his opinionated way of doing a Go-specific encoding, while also supporting any number of other encodings in the language. Go has incredibly good support for almost every popular encoding there is.
gob has also been used successfully by a number of projects. In many cases, it's a perfectly good way to encode a piece of data that is completely local to a Go program.
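For anyone who hasn't used it, the Go-local case might look roughly like the sketch below. `Point` and `roundTrip` are made-up names for illustration; only the `encoding/gob` calls are real. Note that the value being encoded is a top-level slice with no wrapper struct.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Point is a hypothetical type used only for this illustration.
type Point struct {
	X, Y int
}

// roundTrip encodes a slice of points with gob into an in-memory buffer
// and decodes it back into a fresh slice.
func roundTrip(in []Point) ([]Point, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return nil, err
	}
	var out []Point
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	out, err := roundTrip([]Point{{1, 2}, {3, 4}})
	if err != nil {
		fmt.Println("gob error:", err)
		return
	}
	fmt.Println(out)
}
```

No schema file or code generation is involved; the type description travels in the stream ahead of the data.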
Did he miss the point, or was the point to light a fire under the protobuf team to improve their product? Which they eventually did (e.g. required fields were removed after this post was made).
By the way, just curious what is your opinion on MessagePack?
It panics on malformed input, which is a no-go for us since high availability is really important, and it showed up quite a lot in the performance and memory profiles (roughly 5 times the time and memory of doing the same with JSON).
The code was converting some data to gob, and storing it in the database for later.
We now just do the same but in JSON; it's human-readable and Postgres validates that the data is valid JSON.
And unmarshaling it does not panic.
How is it possible that they were getting malformed input? This was happening in go-to-go communication, or was there some kind of cross-language interop?
Any idea why the performance was so much slower than JSON in your case? The technique described in the OP would seem to make that impossible.
Do you think it's possible the database column type or collation was somehow affecting the gob?
The docs explicitly mention that invalid input will make it panic and that can be confirmed by reading the code or fuzzing the input.
From my understanding, there is no compile-time schema, so everything is done with runtime reflection, and that is bound to not be super fast. Granted, JSON is the same on paper, but I would guess that the JSON package has had more eyes on it and more optimizations.
In our case, everything was using JSON except this one component due to some historical oddity so it was also a win in terms of simplifying.
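If a panic during decoding is the worry, one possible mitigation (a sketch, not a fix for gob itself) is to wrap the decode in `recover` so that a bad row turns into an ordinary error. `Record`, `encode`, and `safeDecode` are hypothetical names. Whether a particular malformed input panics or simply returns an error depends on the input and the Go version; the wrapper treats both the same way.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Record is a hypothetical value stored in the database as a gob blob.
type Record struct {
	ID   int
	Name string
}

// encode returns the gob encoding of r.
func encode(r Record) ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(r)
	return buf.Bytes(), err
}

// safeDecode decodes data into a Record. Any panic raised while decoding
// is converted into an error, so one bad blob cannot take down the process.
func safeDecode(data []byte) (rec Record, err error) {
	defer func() {
		if p := recover(); p != nil {
			err = fmt.Errorf("gob decode panicked: %v", p)
		}
	}()
	err = gob.NewDecoder(bytes.NewReader(data)).Decode(&rec)
	return rec, err
}

func main() {
	data, err := encode(Record{ID: 7, Name: "gopher"})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}

	rec, err := safeDecode(data)
	fmt.Println("intact blob:", rec, err)

	// Simulate corruption by cutting the blob in half.
	_, err = safeDecode(data[:len(data)/2])
	fmt.Println("truncated blob:", err)
}
```

This does nothing for the time and memory cost, and it only protects the goroutine that calls it.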
Okay, but would you rather have it crash or allow for a program to run on the wrong data? Especially if you do that and then say that everything has zero as a default value.
The question remains whether the serialization format should be taking care of that, or a round of parsing later on with a schema on the side; but if you do the former without the latter, you're setting yourself up for deployment nightmares.
Protocols often represent an interface between organizations. Especially when that is the case, you want to be as charitable as possible when accepting input, because getting any issues resolved may very well require getting the two organizations to agree.
Also, as things change over time, an overly strict interpretation when receiving packets will require unnecessary rework in the future, and possibly down time or lost business.
When dealing with protocols, it's generally best to be strict when emitting packets and as tolerant as possible when accepting them.
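To make the tolerant-versus-strict choice concrete, here is a small sketch using Go's `encoding/json`, where the receiver decides how to treat a field it doesn't know about. `Order`, `decodeTolerant`, and `decodeStrict` are hypothetical names; `DisallowUnknownFields` is the real `json.Decoder` option.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Order is a hypothetical message exchanged between two organizations.
type Order struct {
	ID  int `json:"id"`
	Qty int `json:"qty"`
}

// decodeTolerant accepts the payload and silently ignores unknown fields.
func decodeTolerant(data []byte) (Order, error) {
	var o Order
	err := json.Unmarshal(data, &o)
	return o, err
}

// decodeStrict rejects any payload that carries a field Order does not declare.
func decodeStrict(data []byte) (Order, error) {
	var o Order
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&o)
	return o, err
}

func main() {
	// The sender has started emitting a field the receiver has never heard of.
	payload := []byte(`{"id": 1, "qty": 2, "gift_wrap": true}`)

	o, err := decodeTolerant(payload)
	fmt.Println("tolerant:", o, err)

	_, err = decodeStrict(payload)
	fmt.Println("strict:", err)
}
```

The tolerant receiver keeps working when the other side ships a change first; the strict one surfaces the mismatch immediately.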
I think it's more like, 'if you want to deal with the crash in test instead of having to justify why it crashed in prod.'
> Especially when that is the case, you want to be as charitable as possible when accepting input, because getting any issues resolved may very well require getting the two organizations to agree.
I don't think we've been fans of "be rigorous in what you emit and permissive in what you accept" since IE6 showed us the error of our ways. "Be rigorous in your implementation of a permissive spec" is as far as we should go.
Not if it's something sensitive either, where maybe crashing is preferable to running the wrong way.
Also Rob Pike is a great technical writer. Another example of his style is "Effective Go":
Sometimes even when something was posted a few years ago some people just haven't seen it yet.
If you're sure that's all you'll ever have to do, then sure. But unless you're 100% certain that the protocol will never evolve further, having a more complex structure allows it to change in a gradual way.
Nobody knows the future, and preparing for the future is a huge part of software engineering. Sending top-level arrays instead of sending them inside a struct is never the right way.
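As a sketch of the "gradual change" argument, here is what a wrapping struct buys you in gob: a newer sender adds a field, and an older receiver that only knows the original struct still decodes the message. `ListV1`, `ListV2`, and `oldReaderSees` are hypothetical names for illustration.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// ListV1 is the hypothetical original message: the array lives inside a struct.
type ListV1 struct {
	Items []int
}

// ListV2 is a later revision of the same message that adds a field.
type ListV2 struct {
	Items    []int
	NextPage string
}

// oldReaderSees encodes a V2 message and decodes it as an old V1 reader would.
// gob matches struct fields by name and skips fields the receiver lacks.
func oldReaderSees(msg ListV2) (ListV1, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(msg); err != nil {
		return ListV1{}, err
	}
	var old ListV1
	err := gob.NewDecoder(&buf).Decode(&old)
	return old, err
}

func main() {
	old, err := oldReaderSees(ListV2{Items: []int{1, 2, 3}, NextPage: "abc"})
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(old.Items)
}
```

A bare top-level `[]int` has nowhere to put `NextPage`; adding it later means changing the shape of the message for every reader at once.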
While I understand the sentiment, I 100% disagree on the 'never' qualifier.
Note that even protobufs, which don't allow encoding simple types at the top level, still have this debate when deciding whether to encode an array of simple types (inside a struct) or an array of structs (inside a struct). And Google's guidance is to use an array of structs if more data might be needed in the future:
>However, if additional data is likely to be needed in the future, repeated fields should use a message instead of a scalar proactively, to avoid parallel repeated fields.
>// Good: A separate message that can grow to include more fields
https://protobuf.dev/programming-guides/api/#order-independe...
By the way, in Go there are like a million JSON encoders, because a lot of things in the std library are not really coded for maximum performance but more for ease of use, it seems. Perhaps this is the right balance for certain things (ex: the http library, see [6]).
There are also a bunch of libraries that allow you to modify a JSON file "in place", without having to fully deserialize into structs (ex: GJSON/SJSON [7] [8]). This sounds very convenient and more efficient than fully de/serializing if we just need to change the data a little.
--
1: https://github.com/alecthomas/go_serialization_benchmarks
2: https://github.com/golang/go/issues/29766#issuecomment-45492...
--
3: https://github.com/goccy/go-json
4: https://github.com/vmihailenco/msgpack
5: https://github.com/fxamacker/cbor
--
6: https://github.com/valyala/fasthttp#faq
--
For a recent project, I needed a simple key-value store. I was evaluating using a full RDBMS, but I ended up just putting gob files in a directory.
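A directory of gob files can be sketched in a few lines. This is only an illustration of the idea, not the code from that project: `Store`, `Put`, `Get`, and `demo` are hypothetical names, keys are assumed to be safe file names, and there is no locking or atomic rename.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"os"
	"path/filepath"
)

// Store is a hypothetical directory-backed key-value store: one gob file per key.
type Store struct {
	Dir string
}

// path maps a key to its file; keys are assumed to be valid file names.
func (s Store) path(key string) string {
	return filepath.Join(s.Dir, key+".gob")
}

// Put encodes value with gob and writes it to the key's file.
func (s Store) Put(key string, value any) error {
	f, err := os.Create(s.path(key))
	if err != nil {
		return err
	}
	if err := gob.NewEncoder(f).Encode(value); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// Get decodes the key's file into out, which must be a pointer.
func (s Store) Get(key string, out any) error {
	f, err := os.Open(s.path(key))
	if err != nil {
		return err
	}
	defer f.Close()
	return gob.NewDecoder(f).Decode(out)
}

// demo stores a map in a temporary directory and reads it back.
func demo() (map[string]int, error) {
	dir, err := os.MkdirTemp("", "gobkv")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	s := Store{Dir: dir}
	if err := s.Put("counts", map[string]int{"a": 1, "b": 2}); err != nil {
		return nil, err
	}
	var got map[string]int
	if err := s.Get("counts", &got); err != nil {
		return nil, err
	}
	return got, nil
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```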
Calling it "dead" just invites a tedious thread about what the definition of "dead" is, so I won't, I'll just sort of imply it in this sentence without actually coming out and saying it in a clear manner. I would generally both A: recommend against this, not necessarily as a dire warning, just, you know, a recommendation and B: for anyone who is perturbed by the idea of this existing, just be aware that it's not like this package has embedded itself into the Go ecosystem or anything.
Gobs of data - https://news.ycombinator.com/item?id=2365430 - March 2011 (2 comments)
I saw gob more as an experiment that the Go team used to check the reflect package's usability. (Which sucks anyway, by the way.)
I'm surprised it's still in the stdlib. I would have guessed it would have been removed for Go 1.0, because it was already clear then that it was not suitable for anything more than experiments.