Making vtprotobuf an additional protoc plugin seems like the Right Thing™, although it's a shame how complicated protoc commands end up becoming for mature projects. I'm pretty tempted to port Authzed over to this and run some benchmarks -- our entire service requires e2e latency under 20ms, so every little bit counts. The biggest performance win is likely just having an unintrusive interface for pooling allocated protos.
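For anyone wondering what "an unintrusive interface for pooling" might look like, here is a minimal sketch built on `sync.Pool`. The `Relationship` type and the `GetRelationship`/`PutRelationship` helpers are made-up names standing in for a generated message; this is not vtprotobuf's generated API, just the general shape of the idea.

```go
package main

import (
	"fmt"
	"sync"
)

// Relationship is a hypothetical stand-in for a generated protobuf message.
type Relationship struct {
	Resource string
	Subject  string
	Caveats  []string
}

// Reset clears the message but keeps the slice's backing array for reuse.
func (r *Relationship) Reset() {
	r.Resource = ""
	r.Subject = ""
	r.Caveats = r.Caveats[:0]
}

// relationshipPool hands out previously allocated messages when it has any.
var relationshipPool = sync.Pool{
	New: func() interface{} { return new(Relationship) },
}

// GetRelationship returns an empty message, reusing a pooled one if possible.
func GetRelationship() *Relationship {
	return relationshipPool.Get().(*Relationship)
}

// PutRelationship resets the message and returns it to the pool.
// The caller must not touch r afterwards.
func PutRelationship(r *Relationship) {
	r.Reset()
	relationshipPool.Put(r)
}

func main() {
	r := GetRelationship()
	r.Resource = "document:readme"
	r.Caveats = append(r.Caveats, "only_on_tuesday")
	fmt.Println(r.Resource, len(r.Caveats))
	PutRelationship(r)
}
```

The catch with any scheme like this is ownership: once a message goes back into the pool, nothing else may hold a reference to it or to its slices.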
I agree it's unlikely the difference here will be solely responsible for tipping the GP's request above 20ms, but the memory problems could reasonably ruin tail latencies.
Perhaps they have significant external (network) latency leaving only a few ms budget for the application stack - so they could easily be up against a wall.
(Holy shit, who is downvoting this? It's literally the whole article!)
If you are willing to use cgo, google already implemented one for gapid.
https://github.com/google/gapid/tree/master/core/memory/aren...
There is still so much education to do.
Most of the time, their absence is because general-purpose pools are just as good, or because people simply don't need them that much with a modern GC.
> The maintainers of Gogo, understandably, were not up to the gigantic task.
I'm 99% sure they are "up to" (as in "capable of") doing so, they are just not "up for" it (as in, "will not do it").
That said, I love the detailed post and the interesting solution, and the commitment to performance!
http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-...
A much better way to test the influence of the new compiler would be to measure the actual throughput at which saturation is reached (which is what the benchmarks in the C++ gRPC library measure to assess their performance).
For example, it looks like pooled decoders could be implemented by setting a custom unmarshaller through the ProtoMethods[2] API.
I wonder why not? Did the authors of the vtprotobuf extension not want to bite off that much work? Is the new API not sufficient to do what they want (thus failing some of the goals expressed in golang/protobuf#364)?
[1]: https://github.com/golang/protobuf/issues/364
[2]: https://pkg.go.dev/google.golang.org/protobuf@v1.26.0/reflec...
- If you write the same message multiple times, protobuf implementations should merge fields with a last write wins policy (repeated fields are concatenated). This includes messages in oneofs.
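- For a boolean array, you're better off using a packed, repeated fixed64 (if wire size matters a lot). Protobuf bools use varint encoding, meaning an unpacked repeated bool needs at least 2 bytes per element: 1+ for the tag and type and 1 byte for the 0 or 1 value (a packed repeated bool still spends a full byte per value). With a packed repeated fixed64, you'd encode the tag and length in 2 varints, and then you get 64 bools per 8 bytes. A varint-encoded int64 can't promise that, since a word with its high bits set takes up to 10 bytes.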
- Fun trivia: Varints take up a max of 10 bytes but could be implemented in 9 bytes. You get 7 bits per varint byte, so 9 bytes gets you 63 bits. Then you could use the most significant bit of the last byte to indicate if the last bit is 0 or 1. Learned by reading the Go varint implementation [2].
- Messages can be recursive. This is easy if you represent messages as pointers since you can use nil. It's a fair bit harder if you want to always use a value object for each nested message, since you need to break cycles by marking fields as `T | undefined` to avoid blowing the stack. Figuring out the minimal number of fields needed to break all cycles is an NP-hard problem called the minimum feedback arc set[3].
- If you're writing a protobuf implementation, the conformance tests[1] are a really nice way to check that you've done a good job. Be wary of implementations that don't run the conformance tests.
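To make the boolean-array point concrete, here is a rough sketch of the bit packing itself. `PackBools` and `UnpackBools` are names invented for this example, and it assumes the resulting words are sent as a fixed-width 64-bit repeated field, with the element count carried in a separate field.

```go
package main

import "fmt"

// PackBools stores flags[i] in bit i%64 of word i/64.
func PackBools(flags []bool) []uint64 {
	words := make([]uint64, (len(flags)+63)/64)
	for i, f := range flags {
		if f {
			words[i/64] |= uint64(1) << (uint(i) % 64)
		}
	}
	return words
}

// UnpackBools reverses PackBools. n is the original number of flags, which
// has to travel separately because the last word may be only partly used.
// n must not exceed 64*len(words).
func UnpackBools(words []uint64, n int) []bool {
	flags := make([]bool, n)
	for i := range flags {
		flags[i] = words[i/64]&(uint64(1)<<(uint(i)%64)) != 0
	}
	return flags
}

func main() {
	flags := make([]bool, 130)
	flags[0], flags[64], flags[129] = true, true, true

	words := PackBools(flags)
	back := UnpackBools(words, len(flags))
	fmt.Printf("%d bools -> %d words (%d payload bytes)\n",
		len(flags), len(words), 8*len(words))
	fmt.Println(back[0], back[1], back[64], back[129])
}
```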
[1]: https://github.com/protocolbuffers/protobuf/tree/master/conf...
[2]: https://github.com/golang/go/blob/master/src/encoding/binary...
[3]: https://en.wikipedia.org/wiki/Feedback_arc_set#Minimum_feedb...
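The 9-byte varint trivia can be sketched like this: the first 8 bytes carry 7 bits each plus a continuation bit, and a 9th byte, if present, carries the remaining 8 bits raw. `PutUvarint9` and `Uvarint9` are hypothetical names, and the format is deliberately not wire-compatible with protobuf or with `encoding/binary`; it only illustrates the idea.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// PutUvarint9 encodes x into buf using at most 9 bytes and returns the
// number of bytes written. buf must have room for 9 bytes.
func PutUvarint9(buf []byte, x uint64) int {
	i := 0
	for i < 8 && x >= 0x80 {
		buf[i] = byte(x) | 0x80 // low 7 bits plus continuation bit
		x >>= 7
		i++
	}
	// Either a normal terminating byte (high bit clear), or the 9th byte,
	// which holds the last 8 bits with no continuation bit at all.
	buf[i] = byte(x)
	return i + 1
}

// Uvarint9 decodes a value written by PutUvarint9 and returns it together
// with the number of bytes read, or (0, 0) if buf is too short.
func Uvarint9(buf []byte) (uint64, int) {
	var x uint64
	for i := 0; i < len(buf) && i < 9; i++ {
		b := buf[i]
		if i == 8 {
			return x | uint64(b)<<56, 9 // 9th byte: all 8 bits are payload
		}
		x |= uint64(b&0x7f) << (7 * uint(i))
		if b < 0x80 {
			return x, i + 1
		}
	}
	return 0, 0
}

func main() {
	std := make([]byte, binary.MaxVarintLen64)
	alt := make([]byte, 9)
	for _, v := range []uint64{0, 127, 128, 1 << 56, math.MaxUint64} {
		n := PutUvarint9(alt, v)
		got, _ := Uvarint9(alt[:n])
		fmt.Printf("%d: standard=%d bytes, 9-byte scheme=%d bytes, round-trips=%v\n",
			v, binary.PutUvarint(std, v), n, got == v)
	}
}
```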
The solution for this is to subtract 1 from the integer every time you encode a byte (since the existence of the next byte you're adding already indicates that the intermediate value isn't 0)
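Reading that as the usual "bijective" varint trick, a sketch could look like the following. `EncodeBijective` and `DecodeBijective` are made-up names, and the decoder does no overflow checking on hostile input. The point is that a continuation byte already implies "there is more", so the remaining value is shifted down by one and no two byte sequences decode to the same number.

```go
package main

import "fmt"

// EncodeBijective appends x to dst, subtracting 1 from the remaining value
// each time a continuation byte is emitted.
func EncodeBijective(dst []byte, x uint64) []byte {
	for x >= 0x80 {
		dst = append(dst, byte(x)|0x80)
		x = x>>7 - 1
	}
	return append(dst, byte(x))
}

// DecodeBijective reverses EncodeBijective and returns the value and the
// number of bytes read, or (0, 0) if buf ends before a terminating byte.
func DecodeBijective(buf []byte) (uint64, int) {
	var x uint64
	for i := 0; i < len(buf) && i < 10; i++ {
		payload := uint64(buf[i] & 0x7f)
		if i > 0 {
			payload++ // undo the subtraction made by the encoder
		}
		x += payload << (7 * uint(i))
		if buf[i] < 0x80 {
			return x, i + 1
		}
	}
	return 0, 0
}

func main() {
	for _, v := range []uint64{0, 127, 128, 16511, 16512} {
		enc := EncodeBijective(nil, v)
		got, _ := DecodeBijective(enc)
		fmt.Printf("%d -> % x -> %d\n", v, enc, got)
	}
}
```

With a plain varint, the bytes `80 00` are a redundant two-byte encoding of 0; under this scheme they mean 128, which is why two bytes reach up to 16511 instead of 16383.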
Disclaimer: Google had a lot of internal stuff they considered important to their core tech competencies. For example, nothing was open sourced about Google's Paxos APIs and infrastructure, networking, etc.