A new ProtoBuf generator for Go

(vitess.io)

296 points · tanoku · 5y ago · 73 comments

30 comments · 9 top-level

jzelinskie · 5y ago · 7 in thread

I hadn't realized that Gogo was in such a bad spot with the upstream Go protobuf changes. There was lots of drama when the changes were made and I guess that overshadowed any optics I had on Gogo.

Making vtprotobuf an additional protoc plugin seems like the Right Thing™, although it's a shame how complicated protoc commands end up becoming for mature projects. I'm pretty tempted to port Authzed over to this and run some benchmarks -- our entire service requires e2e latency under 20ms, so every little bit counts. The biggest performance win is likely just having an unintrusive interface for pooling allocated protos.
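
For readers unfamiliar with the pooling idea, here is a minimal Go sketch using the standard library's `sync.Pool`. The `Request` type and the `getRequest`/`putRequest`/`handle` helpers are hypothetical stand-ins for a generated message and its call sites; this does not reproduce vtprotobuf's generated API, only the general pattern of resetting a message and handing it back to a pool instead of leaving it for the GC.

```go
package main

import (
	"fmt"
	"sync"
)

// Request is a hypothetical generated message type, used only for illustration.
type Request struct {
	UserID string
	Perms  []string
}

// Reset clears the fields but keeps the Perms backing array so that a reused
// message can be refilled without allocating a new slice.
func (r *Request) Reset() {
	r.UserID = ""
	r.Perms = r.Perms[:0]
}

// requestPool recycles *Request values between calls.
var requestPool = sync.Pool{
	New: func() any { return new(Request) },
}

func getRequest() *Request { return requestPool.Get().(*Request) }

// putRequest resets r and hands it back; the caller must not use r afterwards.
func putRequest(r *Request) {
	r.Reset()
	requestPool.Put(r)
}

// handle simulates decoding into a pooled message and returns how many
// permissions it ended up holding.
func handle(userID string, perms ...string) int {
	req := getRequest()
	defer putRequest(req)
	req.UserID = userID
	req.Perms = append(req.Perms, perms...)
	return len(req.Perms)
}

func main() {
	fmt.Println(handle("alice", "read", "write")) // 2
	fmt.Println(handle("bob", "read"))             // 1
}
```

The catch with any such scheme is ownership: nothing may hold on to a pooled message (or a slice inside it) after it has been returned, which is part of why an unintrusive, opt-in interface matters.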

jeffbee · 5y ago

Proto message unmarshal in Go for a small message should be 5 orders of magnitude below 20ms, shouldn't even begin to matter until you are sweating individual microseconds.

AYBABTME · 5y ago

That's true if your program only does a single unmarshal at a time at a leisurely pace. And in a steady state, the garbage left behind by each individual unmarshal call has to be paid for by some poor future request.

I agree it's unlikely the difference here will be solely responsible for tipping the GP's request above 20ms, but the memory problems could reasonably ruin tail latencies.

lttlrck · 5y ago

The significance of 20ms isn't clear so this is hard to judge.

Perhaps they have significant external (network) latency leaving only a few ms budget for the application stack - so they could easily be up against a wall.

morelisp · 5y ago

Until the GC kicks in and steals a full 200usec + a bunch of your throughput...

(Holy shit, who is downvoting this? It's literally the whole article!)

brandmeyer · 5y ago

3% regression in QPS, 20% regression in CPU, and 5% regression in memory usage according to the article. Those are considerably worse than "5 orders of magnitude below".

rapsey · 5y ago

> our entire service requires e2e latency under 20ms

Why are you using Go then?

kodah · 5y ago

20ms is a pretty considerable amount of time WRT E2E transaction time in today's world. Can you expand on your concerns with Go?

shoefindortz · 5y ago · 4 in thread

> Arenas are, however, unfeasible to implement in Go because it is a garbage collected language.

If you are willing to use cgo, google already implemented one for gapid.

https://github.com/google/gapid/tree/master/core/memory/aren...

pjmlp · 5y ago

Not only that, there are other garbage collected languages like D, Nim and C# that offer the language features to do arenas without having to touch any C code.

There is still so much education to do.

p_l · 5y ago

Aren't arenas old news in GC languages in general?

Most of the time, their absence is because general-purpose pools are just as good, or because people simply don't need them that much with a modern GC.

throwaway894345 · 5y ago

Do I misunderstand what arenas are? I thought it was just "allocate this big array as a single allocation rather than N little allocations"? If so, how is that not supported in Go? (e.g., `arena := make([]Foo, 1000000000)`)
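
A sketch of the "one big allocation instead of N little ones" idea, assuming a hypothetical `Node` type and `NodeSlab` helper that hands out pointers from preallocated chunks. This is closer to a typed slab than to a C++-style arena that is freed in one shot: the GC still scans each chunk, and a single live pointer keeps its whole chunk alive.

```go
package main

import "fmt"

// Node is a hypothetical small struct that would otherwise be allocated one at a time.
type Node struct {
	Key   int
	Left  *Node
	Right *Node
}

// NodeSlab hands out *Node values carved from large preallocated chunks,
// turning many small heap allocations into a few big ones.
type NodeSlab struct {
	chunk     []Node // unused remainder of the current chunk
	chunkSize int    // must be positive
	chunks    int    // how many chunks have been allocated so far
}

func NewNodeSlab(chunkSize int) *NodeSlab {
	return &NodeSlab{chunkSize: chunkSize}
}

// New returns a pointer to a zeroed Node from the current chunk.
func (s *NodeSlab) New() *Node {
	if len(s.chunk) == 0 {
		// Allocate a new chunk rather than reallocating the old one;
		// nodes already handed out stay where they are.
		s.chunk = make([]Node, s.chunkSize)
		s.chunks++
	}
	n := &s.chunk[0]
	s.chunk = s.chunk[1:]
	return n
}

func main() {
	slab := NewNodeSlab(1024)
	var head *Node
	for i := 0; i < 3000; i++ {
		n := slab.New()
		n.Key = i
		n.Left = head // build a simple linked chain
		head = n
	}
	fmt.Println(head.Key, slab.chunks) // 2999 3
}
```

Three thousand nodes cost three allocations here instead of three thousand, which is the main win; the trade-off is coarser reclamation.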

dimitrios1 · 5y ago

I can't believe we've managed to have such a lengthy discussion about GC languages and speed without anyone mentioning Rust. Has HN turned a corner?

jen20 · 5y ago · 3 in thread

I'm not sure that the phrasing in the article is particularly fair:

> The maintainers of Gogo, understandably, were not up to the gigantic task.

I'm 99% sure they are "up to" (as in "capable of") doing so, they are just not "up for" it (as in, "will not do it").

Zababa · 5y ago

They could be "not up to" because of lack of resources, probably time and/or money. I think that's what is implied, rather than lack of technical knowledge.

lux · 5y ago

I got the sense that they meant "not willing" but I agree that's one of those English phrases that can easily be misconstrued towards the more negative interpretation.

That said, I love the detailed post and the interesting solution, and the commitment to performance!

jahewson · 5y ago

Yes I assume the author meant “not up for”

jupp0r · 5y ago · 2 in thread

Using CPU utilization as a performance metric can be extremely misleading. My favorite article on the subject is from Brendan Gregg:

http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-...

A much better way to test the influence of the new compiler would be to measure the actual throughput at which saturation is reached (which is what the benchmarks in the gRPC C++ library measure to assess its performance).

dkhenry · 5y ago

There is a fairly robust set of benchmarks that are run to test out performance improvements [1], and macro benchmarks are the ultimate test of holistic improvement. CPU isn't a great proxy, but one of the biggest problems in real-world performance on this specific system (databases in general) is latency. CPU time is a really good proxy for latency, so by looking at CPU time we can get an idea of how the system will respond under "normal" conditions.

[1]: https://benchmark.vitess.io/macrobench

et1337 · 5y ago

In this case the regression also caused a 3% decrease in throughput.

gilgad13 · 5y ago · 2 in thread

Maybe I'm missing something, but my read of golang/protobuf#364 [1] was that part of the motivation for the reorganization in protobuf-go v2 was to allow optimizations like gogoprotobuf to be developed without requiring a complete fork. I totally understand that the authors of gogoprotobuf do not have the time to re-architect their library to use these hooks, but as best I can figure, this generator does not use these hooks either. Instead it defines additional member functions, plus wrappers that look for those specialized functions and fall back to the generic ones if they are not found.

For example, it looks like pooled decoders could be implemented by setting a custom unmarshaller through the ProtoMethods[2] API.

I wonder why not? Did the authors of the vtprotobuf extension not want to bite off that much work? Or is the new API not sufficient to do what they want (thus failing some of the goals expressed in golang/protobuf#364)?

[1]: https://github.com/golang/protobuf/issues/364

[2]: https://pkg.go.dev/google.golang.org/protobuf@v1.26.0/reflec...
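
The dispatch pattern described in this comment (call a specialized method if the type has one, otherwise fall back to the generic path) can be sketched in Go with an interface type assertion. `fastMarshaler` and `MarshalFast` are hypothetical names, and `encoding/json` merely stands in for protobuf's reflection-based encoder; this is not vtprotobuf's or protobuf-go's actual API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fastMarshaler is a hypothetical interface implemented by types that have
// generated, specialized serialization code.
type fastMarshaler interface {
	MarshalFast() ([]byte, error)
}

// Marshal prefers the specialized method and falls back to a generic,
// reflection-based encoder (encoding/json stands in for it here).
func Marshal(v any) ([]byte, error) {
	if fm, ok := v.(fastMarshaler); ok {
		return fm.MarshalFast()
	}
	return json.Marshal(v)
}

// Plain has no generated code, so it takes the generic path.
type Plain struct{ N int }

// Generated carries a hand-written stand-in for generated code.
type Generated struct{ N int }

func (g *Generated) MarshalFast() ([]byte, error) {
	return []byte(fmt.Sprintf(`{"N":%d,"fast":true}`, g.N)), nil
}

func main() {
	a, _ := Marshal(&Plain{N: 1})
	b, _ := Marshal(&Generated{N: 2})
	fmt.Println(string(a)) // {"N":1}
	fmt.Println(string(b)) // {"N":2,"fast":true}
}
```

The limitation the comment hints at is that this check lives in the wrapper, so only code routed through the wrapper gets the fast path; a hook registered through the library's own method table would apply to every caller of the standard API.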

alecthomas · 5y ago

I haven't looked in more detail, but one blocker is that `ProtoMethods() *methods` returns a private type, making it effectively unimplementable outside this package.

zeeboo · 5y ago

So, I thought this at one point, too. But it turns out that methods is a type alias to an unnamed type, so there's no package level privacy issues: https://github.com/protocolbuffers/protobuf-go/blob/v1.26.0/...

sa46 · 5y ago · 1 in thread

Funny timing: I've just written most of a TypeScript generator for protobufs. I learned about some fun corners of protobufs I didn't expect while trying to pass the protobuf conformance tests [1] (which this one passes; that's no mean feat!).

- If you write the same message multiple times, protobuf implementations should merge fields with a last write wins policy (repeated fields are concatenated). This includes messages in oneofs.

- For a boolean array, you're better off using a packed, repeated fixed64 (if wire size matters a lot). Protobuf bools use varint encoding, so an unpacked repeated bool needs at least 2 bytes for every boolean: 1+ for the tag and wire type and 1 byte for the 0 or 1 value. With a packed repeated fixed64, you'd encode the tag and length in 2 varints, and then you get 64 bools per 8 bytes.

- Fun trivia: Varints take up a max of 10 bytes but could be implemented in 9 bytes. You get 7 bits per varint byte, so 9 bytes gets you 63 bits. Then you could use the most significant bit of the last byte to indicate if the last bit is 0 or 1. Learned by reading the Go varint implementation [2].

- Messages can be recursive. This is easy if you represent messages as pointers, since you can use nil. It's a fair bit harder if you want to always use a value object for each nested message, since you need to break cycles by marking fields as `T | undefined` to avoid blowing the stack. Figuring out the minimal number of fields needed to break all cycles is an NP-hard problem called the minimum feedback arc set [3].

- If you're writing a protobuf implementation, the conformance tests are a really nice way to check that you've done a good job. Be wary of implementations that don't implement the conformance tests.
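
The merge rule in the first bullet can be illustrated with a deliberately tiny hand-rolled decoder in Go. Assumptions: a hypothetical message `Msg` whose field 1 is a scalar `int64` and whose field 2 is an unpacked `repeated int64`, with only varint (wire type 0) fields supported. Decoding two concatenated encodings into one value shows the scalar keeping the last value seen and the repeated field accumulating both lists.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Msg is a hypothetical message: field 1 is a scalar int64, field 2 an
// unpacked repeated int64. Only varint (wire type 0) fields are handled.
type Msg struct {
	ID   int64
	Tags []int64
}

// appendVarintField appends a tag (field number << 3 | wire type 0) and a varint value.
func appendVarintField(b []byte, num uint64, v int64) []byte {
	b = binary.AppendUvarint(b, num<<3)
	return binary.AppendUvarint(b, uint64(v))
}

// Marshal encodes every field; repeated elements each get their own tag.
func (m Msg) Marshal() []byte {
	b := appendVarintField(nil, 1, m.ID)
	for _, t := range m.Tags {
		b = appendVarintField(b, 2, t)
	}
	return b
}

// Unmarshal merges the encoded fields into m without clearing it first:
// a scalar keeps the last value seen, a repeated field appends.
func (m *Msg) Unmarshal(b []byte) error {
	for len(b) > 0 {
		key, n := binary.Uvarint(b)
		if n <= 0 {
			return errors.New("malformed tag")
		}
		b = b[n:]
		if key&7 != 0 {
			return errors.New("only wire type 0 is supported in this sketch")
		}
		v, n := binary.Uvarint(b)
		if n <= 0 {
			return errors.New("malformed varint")
		}
		b = b[n:]
		switch key >> 3 {
		case 1:
			m.ID = int64(v) // last write wins
		case 2:
			m.Tags = append(m.Tags, int64(v)) // concatenate
		default:
			// Unknown varint field: skipped here; real implementations keep it.
		}
	}
	return nil
}

func main() {
	first := Msg{ID: 1, Tags: []int64{10, 20}}
	second := Msg{ID: 2, Tags: []int64{30}}
	wire := append(first.Marshal(), second.Marshal()...)

	var got Msg
	if err := got.Unmarshal(wire); err != nil {
		panic(err)
	}
	fmt.Println(got.ID, got.Tags) // 2 [10 20 30]
}
```

A real implementation applies the same rule recursively to nested message fields (they are merged, not replaced) and must also accept packed encodings for repeated scalars.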

[1]: https://github.com/protocolbuffers/protobuf/tree/master/conf...

[2]: https://github.com/golang/go/blob/master/src/encoding/binary...

[3]: https://en.wikipedia.org/wiki/Feedback_arc_set#Minimum_feedb...
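
A rough Go sketch of the boolean-packing trick and the size arithmetic behind it. `packBools`, `unpackBools`, and the size helpers are hypothetical; the sizes assume a field number between 1 and 15 (so the tag fits in one byte) and count only that one field. It also shows why a 64-bools-per-8-bytes figure assumes a fixed-width encoding such as fixed64: a varint-encoded 64-bit value with its top bit set takes 10 bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// varintLen returns how many bytes binary.PutUvarint uses for v.
func varintLen(v uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], v)
}

// packBools packs bools into 64-bit words, least significant bit first,
// as they might be stored in a packed repeated fixed64 field.
func packBools(bs []bool) []uint64 {
	words := make([]uint64, (len(bs)+63)/64)
	for i, b := range bs {
		if b {
			words[i/64] |= 1 << (uint(i) % 64)
		}
	}
	return words
}

// unpackBools reverses packBools; n is the original element count, which
// has to be sent separately (for example in another field).
func unpackBools(words []uint64, n int) []bool {
	out := make([]bool, n)
	for i := range out {
		out[i] = words[i/64]&(1<<(uint(i)%64)) != 0
	}
	return out
}

// Encoded sizes for n bools in a single field numbered 1-15 (1-byte tag).
func unpackedBoolSize(n int) int { return 2 * n } // tag + 1-byte varint per element
func packedBoolSize(n int) int   { return 1 + varintLen(uint64(n)) + n }
func packedFixed64Size(n int) int {
	payload := 8 * ((n + 63) / 64)
	return 1 + varintLen(uint64(payload)) + payload
}

func main() {
	bs := make([]bool, 1000)
	for i := range bs {
		bs[i] = i%3 == 0
	}
	words := packBools(bs)
	back := unpackBools(words, len(bs))
	fmt.Println(len(words), back[999] == bs[999]) // 16 true
	fmt.Println(unpackedBoolSize(1000), packedBoolSize(1000), packedFixed64Size(1000)) // 2000 1003 131
	fmt.Println(varintLen(^uint64(0))) // 10
}
```

For 1,000 flags that is roughly 2,000 bytes unpacked, about 1,000 bytes as packed bools, and about 130 bytes as packed fixed64 words, at the cost of hand-written packing code on both ends.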

nly · 5y ago

The varint format also isn't as dense on average as it could be, and it allows non-canonical encodings, i.e. you can encode any integer in multiple ways (up to 9 or 10 bytes).

The solution for this is to subtract 1 from the integer every time you encode a byte (since the existence of the next byte you're adding already indicates that the intermediate value isn't 0)
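
The subtract-one idea can be sketched in Go as a little-endian, LEB128-style varint in which every continuation byte also implies "at least one more unit", so each integer has exactly one encoding. `encodeBijective` and `decodeBijective` are hypothetical names and this is just one possible layout under that assumption, not a drop-in replacement for protobuf's wire format. (Git's pack format uses a similar trick, big-endian, for OFS_DELTA offsets.)

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeBijective writes v little-endian, 7 bits per byte, high bit = "more
// bytes follow". After each non-final byte the remaining (shifted) value is
// reduced by 1, because the presence of a following byte already implies >= 1.
func encodeBijective(v uint64) []byte {
	var out []byte
	for v >= 0x80 {
		out = append(out, byte(v)|0x80)
		v = (v >> 7) - 1
	}
	return append(out, byte(v))
}

// decodeBijective reverses encodeBijective, returning the value and the
// number of bytes read, or (0, 0) if the input ends mid-value.
func decodeBijective(b []byte) (uint64, int) {
	var v uint64
	var shift uint
	for i, c := range b {
		d := uint64(c & 0x7f)
		if i > 0 {
			d++ // add back the 1 the encoder subtracted
		}
		v += d << shift
		if c&0x80 == 0 {
			return v, i + 1
		}
		shift += 7
	}
	return 0, 0
}

func main() {
	for _, v := range []uint64{0, 127, 128, 16511, 16512, 1 << 40, ^uint64(0)} {
		enc := encodeBijective(v)
		got, n := decodeBijective(enc)
		fmt.Println(v, len(enc), got == v && n == len(enc))
	}

	// The same two bytes under both schemes.
	x, n := binary.Uvarint([]byte{0x81, 0x00})
	fmt.Println(x, n) // 1 2 (a padded, non-canonical encoding of 1)
	y, m := decodeBijective([]byte{0x81, 0x00})
	fmt.Println(y, m) // 129 2
}
```

Because no byte sequence is wasted on padding, the two-byte range extends to 16511 instead of 16383, which is where the slight density gain comes from.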

flakiness · 5y ago · 1 in thread

I wonder what Google thinks about the v2 performance. It's well known that protobuf processing is a heavy tax on their data centers [1]. It's hard to imagine they would just leave it slow. Or would they?

[1]: https://research.google/pubs/pub44271/

justicezyx · 5y ago

There was a project to develop an ASIC (probably bundled inside a NIC) to do protobuf parsing. At some point Sanjay made a change to the proto API that rendered that project less appealing.

Disclaimer: Google had a lot of internal stuff it considered important to its core technical competencies. For example, nothing about Google's Paxos APIs and infrastructure, networking, etc. was open-sourced.

PostThisTooFast · 5y ago · 1 in thread

Is there one for Kotlin yet? It's pretty pathetic that Google's own protocol lacks native support for its most popular operating system.

hn_go_brrrrr · 5y ago

Yes: https://developers.google.com/protocol-buffers/docs/kotlintu...

n0x1m · 5y ago

The biggest current problem with Go and ProtoBuf is Swagger support when using it for API responses. Enums are not supported, for example. The leniency of protojson can't be used in other languages that build on top of the Swagger docs.
