Gobs of data (2011)

(go.dev)

89 points · ash · 2y ago · 58 comments


46 comments · 13 top-level

losvedir · 2y ago · 12 in thread

Interesting. I wonder to what extent it's found use at Google over this past decade.

There are advantages to being language-specific, but a lot of disadvantages as well (speaking as someone who recently had to write some Elixir code to unmarshal a Ruby object...). It seems hard to introduce this since you're forcing all communicating services to be Go-based, which is kind of contrary to the independence that microservices usually afford you.

Some of the benefits are simply design goals (e.g., top-level arrays) which could also be done in a language-independent protocol. And even the performance goals probably could be. Like, Cap'n Proto I think is designed so that users of the protocol don't have to serialize/deserialize the data, right? They just pass it around and work with it directly.

I can see Rob Pike being frustrated with Protocol Buffers at Google, and I don't begrudge anyone for taking a big shot like this, but I wonder if he's found any success with it.

lifthrasiir · 2y ago

Yeah, after years of dealing with language-specific serialization formats (and inadvertently learning the internals of several, including Go gob, Python pickle and PHP serialize), I'm over it. And gob is not even a schematic serialization format (i.e., not only do you not need to define a schema beforehand, you can't). There are some interesting ideas, but that's all. Use a well-known schemaless serialization format with some extensibility [1] if you really need one.

[1] Maybe there was no suitable one when Go was first created. Nowadays I believe CBOR is the best format for this job.

packetlost · 2y ago

I'm in the same boat. Not to mention the security concerns that often crop up in (interpreted) language-specific deserialization (I'm looking at you, pickle, thinly veiled `eval()`). I agree that CBOR should generally be the serialization tool of choice for self-describing data (i.e., in places where you might otherwise choose JSON).

And if your language of choice doesn't have a CBOR lib, CBOR is fairly easy to implement, and writing an encoder/decoder is very fun! I completed my implementation for the Gerbil Scheme language just last week [0].
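To give a sense of why a CBOR encoder is approachable, here is a minimal Go sketch (not packetlost's Gerbil implementation) of the piece every CBOR item starts with: the head defined in RFC 8949, which packs a 3-bit major type together with a value or length argument. The function names are hypothetical, and only unsigned integers (major type 0) and text strings (major type 3) are covered.

```go
package main

import "encoding/binary"

// encodeHead writes a CBOR data item head (RFC 8949, section 3): the 3-bit
// major type in the high bits of the first byte, followed by the argument n
// in the shortest form the spec allows.
func encodeHead(major byte, n uint64) []byte {
	m := major << 5
	switch {
	case n < 24:
		// Small arguments fit directly in the 5-bit "additional information".
		return []byte{m | byte(n)}
	case n <= 0xff:
		return []byte{m | 24, byte(n)}
	case n <= 0xffff:
		b := []byte{m | 25, 0, 0}
		binary.BigEndian.PutUint16(b[1:], uint16(n))
		return b
	case n <= 0xffffffff:
		b := []byte{m | 26, 0, 0, 0, 0}
		binary.BigEndian.PutUint32(b[1:], uint32(n))
		return b
	default:
		b := make([]byte, 9)
		b[0] = m | 27
		binary.BigEndian.PutUint64(b[1:], n)
		return b
	}
}

// encodeUint encodes an unsigned integer (major type 0).
func encodeUint(n uint64) []byte { return encodeHead(0, n) }

// encodeText encodes a UTF-8 text string (major type 3): a head carrying the
// byte length, followed by the raw bytes.
func encodeText(s string) []byte {
	return append(encodeHead(3, uint64(len(s))), s...)
}
```

For example, encodeUint(1000) yields the bytes 0x19 0x03 0xe8, which matches the worked examples in the RFC's appendix. Arrays, maps, and tags reuse the same head with different major types.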

[0]: https://github.com/chiefnoah/gerbil-cbor

twotwotwo · 2y ago

Yep, I very much agree with this. It's probably inevitable that languages with reflection grow some kind of language-specific serialization format because a) they can and b) there's often some use case where it looks handy to have a format adapted to the quirks of the language. Plus, when some of these bespoke formats were created, the world hadn't converged as much as it has now around a few common text and binary formats.

But now the interoperable serialization formats have a lot more energy spent on tools and such, and the formats are better-specified, to the point that you probably want to use them even where interoperability with other stuff doesn't force it.

danpalmer · 2y ago

As an engineer at Google, I've seen my opinion on Protocol Buffers change massively. Pre-Google, I found them awkward, the Python language bindings sucked, and I didn't really see the point. I knew a schema for services was a good idea, but protobufs didn't seem like the best option.

The thing is that at Google protobufs are used everywhere. Like, absolutely everywhere. Think of all the places they could be, and it's way more. All the tooling understands them, and code search and go-to-reference work on them everywhere. They are truly transformational in how many different services with many different implementations interact.

Are they perfect? Far from it. But a Go-only implementation misses almost all the value of protobufs. If I was inventing them for Python I could do better (pickle? maybe not) but the whole point is that they aren't language/ecosystem/use-case specific. If this was Rob Pike's frustration then I can't help but feel he missed the point, or this post is a little disingenuous as to the benefits.

I've not seen Gobs used in Google, but I'd imagine an engineer would need to make a very strong case against using protos between services, even if both services are in Go.

yegle · 2y ago

It's called "Larry & Sergey Protobuf Moving Co." for a reason. See the t-shirt design in the video screenshot here: https://isocpp.org/blog/2020/07/cppcon-2019-there-are-no-zer...

Disclaimer: an employee at the said moving company.

knorker · 2y ago

In my opinion this is pretty on-brand for Pike.

Back when he did his best work, it was possible for one person to "just write the new thing", without making it fit with anything else. There was nothing else to fit it with. You could invent everything from scratch, and not only was it not a waste of time, if you were good enough it had a chance of being the best fit for purpose.

You could take shortcuts. You could have every part of your system be "odd", because nothing was "even".

That's not true anymore. And the way I see it Pike has not moved on.

Science in general had this switch at one point, too. There was a point where one could know all of science. But it's long gone.

hellozomo · 2y ago

This is an unnecessary ad hominem attack. He wrote this over a decade ago, in the midst of doing his best work creating Go itself.

gob was his opinionated way of doing a Go-specific encoding, while also supporting any number of other encodings in the language. Go has incredibly good support for almost every popular encoding there is.

gob has also been used successfully by a number of projects. In many cases, it's a perfectly good way to encode a piece of data that is completely local to a Go program.
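A minimal sketch of that "local to a Go program" use: round-tripping a value through encoding/gob with no schema file. The Config type and roundTrip helper are hypothetical; only gob.NewEncoder, gob.NewDecoder, Encode and Decode come from the standard library.

```go
package main

import (
	"bytes"
	"encoding/gob"
)

// Config is a hypothetical type standing in for data that never leaves Go.
type Config struct {
	Name    string
	Retries int
	Tags    map[string]string
}

// roundTrip encodes v with encoding/gob and decodes it into a fresh value.
// No schema is declared: gob derives the wire type from the Go type via
// reflection and sends that description ahead of the data.
func roundTrip(v Config) (Config, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return Config{}, err
	}
	var out Config
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}
```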


danpalmer · 2y ago

Interesting hypothesis. I can't comment on Pike's history here, but at Google there's certainly a noticeable difference between old special cases that were built pre-2015-ish, and the modern world where everything is very cohesive. I get the impression that there was a big push to achieve that, led from around that time by various products. It's hit some areas more than others, but there do seem to be almost no new products building in the way you've described now.

geodel · 2y ago

Well, it's a 2011 post. Pike is retired now; I don't think he cares, or that it matters anyway.


Timon3 · 2y ago

As awkward as Protobufs might be, is there any similar format with so many client bindings? I tried comparing the message formats I could find (e.g. Cap'n Proto, Protobuf, msgpack and others), and Protobuf still seems to have the most supported languages. I'd be happy for any other suggestions!

randomdata · 2y ago

> I can't help but feel he missed the point

Did he miss the point, or was the point to light a fire under the protobuf team to improve their product? Which they eventually did (e.g. required fields were removed after this post was made).


kunley · 2y ago

Nice recap.

By the way, just curious: what is your opinion on MessagePack?

broken_broken_ · 2y ago · 5 in thread

Just finished removing this encoding in our production services.

It panics on malformed input, which is a no-go for us since high availability is really important, and it showed up a lot in the performance and memory profiles (roughly 5 times the time and memory of doing the same with JSON).

The code was converting some data to gob, and storing it in the database for later.

We now just do the same but in JSON: it’s human-readable, and Postgres validates that the data is valid JSON.

And unmarshaling it does not panic.

grose · 2y ago

That's interesting because I've had basically the opposite experience. I used encoding/json with BadgerDB and saw that json.Unmarshal in a hot loop was using about 68% of total CPU time in a profile taken from production. By switching to gob it significantly decreased to around 28% (for gob's decode function). I've read that decoding interfaces in gob is slow[1], maybe that accounts for my difference as I don't have any in this particular struct. Also this was a very read-heavy service, so that could be a major difference as well.
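For context on the interface point: an interface-typed field forces gob to send the concrete type's registered name along with each such value, and the concrete type must be registered with gob.Register beforehand. The sketch below, using hypothetical types Shape, Circle and Drawing, shows that mechanism only; it is not a benchmark and says nothing about how much slower it is.

```go
package main

import (
	"bytes"
	"encoding/gob"
)

// Shape is a hypothetical interface; Circle is one concrete implementation.
type Shape interface{ Area() float64 }

type Circle struct{ R float64 }

func (c Circle) Area() float64 { return 3.14159 * c.R * c.R }

// Drawing holds an interface-typed field, so each encoded Drawing carries
// the registered name of whatever concrete type S holds.
type Drawing struct{ S Shape }

func init() {
	// Without this registration, Encode returns a "type not registered
	// for interface" error for Circle.
	gob.Register(Circle{})
}

// encodeDecode round-trips a Drawing through gob.
func encodeDecode(d Drawing) (Drawing, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(d); err != nil {
		return Drawing{}, err
	}
	var out Drawing
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}
```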

[1]: https://groups.google.com/g/golang-nuts/c/12qhqiG1J70

jjtheblunt · 2y ago

Have you tried the superset of JSON from AWS (Amazon Ion)?

https://amazon-ion.github.io/ion-docs/

abtinf · 2y ago

I've been considering adopting the gob package. I haven't used it before, so I only know what's in the docs -- and all of your claims are surprising to me. Could you share more information?

How is it possible that they were getting malformed input? Was this happening in Go-to-Go communication, or was there some kind of cross-language interop?

Any idea why the performance was so much slower than JSON in your case? The technique described in the OP would seem to make that impossible.

Do you think it's possible the database column type or collation was somehow affecting the gob?

broken_broken_ · 2y ago

The column type was bytea (basically a blob), so it should be stored as-is by the database. The profiling showed the hotspots directly in the gob package.

The docs explicitly mention that invalid input will make it panic and that can be confirmed by reading the code or fuzzing the input.

From my understanding, there is no compile-time schema, so everything is done with runtime reflection, and that is bound to not be super fast. Granted, JSON is the same on paper, but I would guess the JSON package has had more eyes on it and more optimization.
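One hypothesis worth checking (an assumption; the thread doesn't say how the blobs were produced): gob sends a type description the first time a type appears on a stream, which only amortizes when many values share one Encoder. Storing each row as its own gob blob means every blob repeats that description, and every read needs a fresh Decoder to parse it again. The sketch below, with a hypothetical Row type, compares the encoded size of both approaches.

```go
package main

import (
	"bytes"
	"encoding/gob"
)

// Row is a hypothetical record type.
type Row struct {
	ID    int
	Label string
}

// sharedStreamSize encodes all rows on one Encoder, so the type description
// for Row is transmitted once, followed by just the values.
func sharedStreamSize(rows []Row) (int, error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	for _, r := range rows {
		if err := enc.Encode(r); err != nil {
			return 0, err
		}
	}
	return buf.Len(), nil
}

// perRecordSize encodes each row with a fresh Encoder, as when each row is
// stored in its own database column; every blob repeats the type description.
func perRecordSize(rows []Row) (int, error) {
	total := 0
	for _, r := range rows {
		var buf bytes.Buffer
		if err := gob.NewEncoder(&buf).Encode(r); err != nil {
			return 0, err
		}
		total += buf.Len()
	}
	return total, nil
}
```

Size is only a proxy here; measuring decode time would need a benchmark against the real data.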

In our case, everything was using JSON except this one component due to some historical oddity so it was also a win in terms of simplifying.

scottlawson · 2y ago

If the issue is the panic, why not create a wrapper func with recover that presents the same interface you want?

sebstefan · 2y ago · 5 in thread

>[Required fields are] also a maintenance problem. Over time, one may want to modify the data definition to remove a required field, but that may cause existing clients of the data to crash.

Okay, but would you rather have it crash or allow for a program to run on the wrong data? Especially if you do that and then say that everything has zero as a default value.

The question remains whether the serialization format should be taking care of that, or a round of parsing later on with a schema on the side; but if you do the former without the latter, you're setting yourself up for deployment nightmares.
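To make the zero-value concern concrete: gob matches struct fields by name, so when newer code decodes data written by older code, fields the sender never had are silently left at their zero value. The types UserV1 and UserV2 below are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/gob"
)

// UserV1 and UserV2 are hypothetical versions of one record; V2 adds Age.
type UserV1 struct{ Name string }

type UserV2 struct {
	Name string
	Age  int
}

// upgrade encodes a V1 value and decodes it as V2. Age is absent from the
// V1 data, so it stays 0: no error, and no way to tell "missing" from
// "actually zero".
func upgrade(u UserV1) (UserV2, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(u); err != nil {
		return UserV2{}, err
	}
	var out UserV2
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}
```

Catching this requires a validation step after decoding, which is the "round of parsing later on" the comment describes.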

cosmic_quanta · 2y ago

Indeed, I'd rather have the program crash than repeat this nightmare: https://specbranch.com/posts/knight-capital/

tomohawk · 2y ago

If you want to deal with the crash and justify why the system went down because you were more correct than the other guys, then sure.

Protocols often represent an interface between organizations. Especially when that is the case, you want to be as charitable as possible when accepting input, because getting any issues resolved may very well require getting the two organizations to agree.

Also, as things change over time, an overly strict interpretation when receiving packets will require unnecessary rework in the future, and possibly downtime or lost business.

When dealing with protocols, it's generally best to be strict when emitting packets and as tolerant as possible when accepting them.

taneq · 2y ago

> If you want to deal with the crash and justify why the system went down because you were more correct than the other guys, then sure.

I think it's more like, 'if you want to deal with the crash in test instead of having to justify why it crashed in prod.'

> Especially when that is the case, you want to be as charitable as possible when accepting input, because getting any issues resolved may very well require getting the two organizations to agree.

I don't think we've been fans of "be rigorous in what you emit and permissive in what you accept" since IE6 showed us the error of our ways. "Be rigorous in your implementation of a permissive spec" is as far as we should go.

sebstefan · 2y ago

That's the motto for browsers, and I agree with it in that context, but if it's something you control (like services of a distributed application), then not really. You can just make sure the versions match during deployment and save yourself some debugging headaches.

Not if it's something sensitive either, where crashing may be preferable to running the wrong way.

mst · 2y ago

Largely agree, with the addendum that it's a really good idea to collect metrics on how much tolerance your code has been required to show. Whether you need to present those metrics to the sender and ask them to tweak their emissions or simply keep an eye on them is situation-dependent, but having them at all is definitely in the "future you will thank current you" category... and I will absolutely confess that current me has cursed past me for not doing so on more than one occasion, and I can only hope I remember more often in the future ;)
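A sketch of that idea for JSON input in Go, assuming unknown fields are the kind of tolerance you want to measure: decode strictly with json.Decoder's DisallowUnknownFields, fall back to a lenient json.Unmarshal, and count how often the fallback was needed. The Order type, decodeOrder, and the tolerated counter are hypothetical stand-ins for a real message type and metrics library.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// Order is a hypothetical message type.
type Order struct {
	ID  string `json:"id"`
	Qty int    `json:"qty"`
}

// tolerated counts inputs accepted only by the lenient path. A real service
// would export this as a metric instead of a package variable.
var tolerated int

// decodeOrder tries a strict decode first (unknown fields are an error) and,
// if that fails, retries with the default lenient decode, recording that
// tolerance was needed. Input that fails both ways is rejected.
func decodeOrder(data []byte) (Order, error) {
	var o Order
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&o); err == nil {
		return o, nil
	}
	o = Order{} // discard any fields the failed strict pass filled in
	if err := json.Unmarshal(data, &o); err != nil {
		return Order{}, err
	}
	tolerated++
	return o, nil
}
```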

buro9 · 2y ago · 5 in thread

this is quite old, so I'm curious about what triggered it being posted again, has something happened / changed?

ash (OP) · 2y ago

I've posted it because I'm always on the lookout for simple solutions for complex problems, and especially for how these solutions are designed. The post describes the design process well.

Also Rob Pike is a great technical writer. Another example of his style is "Effective Go":

https://go.dev/doc/effective_go

buro9 · 2y ago

yup, and if people are looking for usage I just found a gist that shows how gob handling can be useful (writing to cache that allows the reading back to be castable into the correct structs) https://gist.github.com/pioz/ca5b7a11200f54afbd76dee7acbcc06...

jstanley · 2y ago

Just because you already knew it all doesn't mean everyone else did. I hadn't seen it before.

Sometimes even when something was posted a few years ago some people just haven't seen it yet.

blowski · 2y ago

It's an entirely reasonable question to ask "is there any specific reason why this is being posted today?". If the answer is no, that's fine, but there may be extra context that is interesting and not obvious.

sudhirj · 2y ago

Ten thousand people, to be exact https://xkcd.com/1053/

dmi · 2y ago · 4 in thread

> If all you want to send is an array of integers, why should you have to put it into a struct first?

If you're sure that's all you'll ever have to do, then sure. But unless you're 100% certain that the protocol will never evolve further, having a more complex structure allows it to change in a gradual way.

lsaferite · 2y ago

It was clear, from the post, that they were saying, "If all I need is a simple array, why should I be required to wrap it in a struct?" The whole point (from the post) being that protobuf required structs but gob allowed simpler types _in addition_ to structs.
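For reference, gob does accept a bare slice as the top-level value, and a wrapper struct costs only a type declaration. The generic helper below (hypothetical, requires Go 1.18+) round-trips either form; the Readings wrapper is the form that could later gain fields while previously encoded data still decodes into it.

```go
package main

import (
	"bytes"
	"encoding/gob"
)

// Readings is a hypothetical wrapper that leaves room to add fields later.
type Readings struct {
	Values []int
}

// gobRoundTrip encodes in as the top-level gob value and decodes it back.
// It works for a bare []int as well as for a struct wrapping one.
func gobRoundTrip[T any](in T) (T, error) {
	var buf bytes.Buffer
	var out T
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return out, err
	}
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}
```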

orf · 2y ago

all I need _right now_ is a simple array

Nobody knows the future, and preparing for the future is a huge part of software engineering. Sending top-level arrays instead of sending them inside a struct is never the right way.

lsaferite · 2y ago

> Sending top-level arrays instead of sending them inside a struct is never the right way.

While I understand the sentiment, I 100% disagree on the 'never' qualifier.


Thorrez · 2y ago

dmi knows that. dmi was saying that even if the encoding scheme allows encoding simpler types, it's often not smart to use that functionality, because you won't be able to evolve the format in the future. If you encode a message instead of a simple type, you'll be able to evolve it later as you add more features to your program.

Note that even protobufs, which don't allow encoding simple types at the top level, still have this debate when deciding whether to encode an array of simple types (inside a struct) or an array of structs (inside a struct). And Google's guidance is to use an array of structs if more data might be needed in the future:

>However, if additional data is likely to be needed in the future, repeated fields should use a message instead of a scalar proactively, to avoid parallel repeated fields.

https://google.aip.dev/144

>// Good: A separate message that can grow to include more fields

https://protobuf.dev/programming-guides/api/#order-independe...

azaras · 2y ago · 2 in thread

I hadn't heard of it, but I think it changes so little from Protocol Buffers that it's a waste of time.

bheadmaster · 2y ago

Note that this was written in 2011, while the first mention of "proto3" in protobuf repository was in 2014. So this blogpost probably influenced the development of proto3, which fixed many issues of proto2 (which is referred to as just "protocol buffers" in the blogpost).

icholy · 2y ago

Eh, I like Go and respect Rob Pike, but I seriously doubt gob had any impact on the proto3 design

emmanueloga_ · 2y ago

Someone made a benchmark of serialization libraries in Go [1], and I was surprised to see that gob is one of the slowest, especially for decoding. I suspect part of the reason is that the API doesn't allow reusing decoders [2]. From my explorations, it seems like JSON [3], MessagePack [4], and CBOR [5] are all better alternatives.

By the way, in Go there are like a million JSON encoders, because a lot of things in the standard library seem to be coded not for maximum performance but for ease of use. Perhaps this is the right balance for certain things (e.g. the HTTP library, see [6]).

There are also a bunch of libraries that allow you to modify a JSON file "in place", without having to fully deserialize into structs (e.g. GJSON/SJSON [7] [8]). This sounds very convenient, and more efficient than fully de/serializing if we just need to change the data a little.

1: https://github.com/alecthomas/go_serialization_benchmarks

2: https://github.com/golang/go/issues/29766#issuecomment-45492...

3: https://github.com/goccy/go-json

4: https://github.com/vmihailenco/msgpack

5: https://github.com/fxamacker/cbor

6: https://github.com/valyala/fasthttp#faq

7: https://github.com/tidwall/gjson

8: https://github.com/tidwall/sjson

assbuttbuttass · 2y ago

Gob is a great serialization format! It's super easy to use and supports Go-native types (kind of like Python's pickle).

For a recent project, I needed a simple key-value store. I was evaluating using a full RDBMS, but I ended up just putting gob files in a directory.

jerf · 2y ago

FWIW, this isn't used much by the community. Being a standard library package, it still gets some use of course, but for comparison, encoding/gob shows about 22.5K imports [1] to encoding/json's nearly 800K [2], and whereas you can see in the JSON search an ecosystem of JSON libraries, gob is basically just gob.

Calling it "dead" just invites a tedious thread about what the definition of "dead" is, so I won't, I'll just sort of imply it in this sentence without actually coming out and saying it in a clear manner. I would generally both A: recommend against this, not necessarily as a dire warning, just, you know, a recommendation and B: for anyone who is perturbed by the idea of this existing, just be aware that it's not like this package has embedded itself into the Go ecosystem or anything.

[1]: https://pkg.go.dev/search?q=gob

[2]: https://pkg.go.dev/search?q=json

dang · 2y ago

Not gobs of comments but discussed at the time:

Gobs of data - https://news.ycombinator.com/item?id=2365430 - March 2011 (2 comments)

jeffrallen · 2y ago

I used gob for my first client/server Go program, which was a "make one of something you know about to throw away" new language experiment. It worked, but I quickly turned away from it, because it would never be cross-platform.

I saw gob more as an experiment that the Go team used to check the reflect package's usability. (Which sucks anyway, by the way.)

I'm surprised it's still in the stdlib. I would have guessed it would be removed for Go 1.0, because it was already clear then that it was not suitable for anything more than experiments.

zgiber · 2y ago

It may not be a good tool for communicating between services implemented in different languages. But I’d happily use it to save stuff to disk where a database is overkill.

art_vandalay · 2y ago

I forgot Go was still around. Thanks for reminding me.
