FlatBuffers: a memory efficient serialization library (opens in new tab)

(google-opensource.blogspot.com)

126 pointsrw12y ago62 comments

62 comments

39 comments · 6 top-level

kentonv12y ago· 22 in thread

Huh. So Google is releasing a competitor to Cap'n Proto. As the former maintainer of Protobufs (at Google) and author of Cap'n Proto (after Google), I'm pretty surprised that I hadn't heard about this. I also don't recognize any of the names, so this is not from the people who were working on Protobufs at the time I left.

I'm the main competitor, so take me with a grain of salt here.

The docs don't look very detailed, but taking a quick look through...

> "On purpose, the format leaves a lot of details about where exactly things live in memory undefined, e.g. fields in a table can have any order... To be able to access fields regardless of these uncertainties, we go through a vtable of offsets. Vtables are shared between any objects that happen to have the same vtable values."

Hrm, that sounds like a lot of complexity and a big performance hit. In order to read a field from a struct, you have to do a vtable lookup to find the offset? Maybe you can still get that done in an inline accessor, but it will make the optimizer sad.

How is de-duping of vtables accomplished? It doesn't look like the API exposes any details about vtables, meaning you'd have to do it magically behind the scenes, which seems expensive?

It looks like the authors may have been trying to maintain the concept of "optional fields" from Protobufs, where if you don't set an optional field, it takes zero bytes on the wire. But the main use of optional fields in practice is to implement unions -- i.e. a list of fields where only one should be filled in at a time.

Cap'n Proto's solution to this was just to build unions into the language, something Protobufs should have done a long time ago.

> "FlatBuffers relies on new field declarations being added at the end, and earlier declarations to not be removed, but be marked deprecated when needed. We think this is an improvement over the manual number assignment that happens in Protocol Buffers."

That's not an improvement at all. Declarations should be ordered semantically, so that related things go together. More importantly, even if you say that they can't be, developers will still do it, and will accidentally introduce incompatibilities. To make matters worse, it tends to be hard to detect these issues in testing, because your tests are probably build from the latest code. Simply adding field numbers as in Protobufs and Cap'n Proto has proven an effective solution to this in practice.

Very surprised to see Google get this wrong, since they have more experience with this than anyone.

> Benchmarks

I couldn't find the benchmark code in the github repo, so it's hard to say what they may be measuring here. FlatBuffers presumably shares Cap'n Proto's property that encodes and decodes take zero time, which makes it hard to create a meaningful benchmark in the first place. Indeed the numbers they put up kind of look like they're timing a noop loop. I can't see how the "traverse" time could be so much better than with protobufs, which makes me think the code might be written in such a way that the compiler was able to optimize away all the traverse time for FlatBuffers because it thought the results were unused anyway -- a mistake that's easy to make.

But I'd love to see the code to verify. Hopefully I just missed it when digging through the repo.

Aardappel12y ago

Hi. I designed most of FlatBuffers, so let me see if I can clarify:

Clearly, FlatBuffers seeks a different design tradeoff than Cap'n Proto. We wanted to retain all the flexibility of Protobufs, while having the advantages of the "zero parsing / zero allocation" approach. That brought us to the vtable design.

In use cases where performance matters, i.e. you are churning through a large amount of data, you will be accessing the same vtable repeatedly, and the cost of the extra indirection (and the code associated with it) will be masked by the memory latency of the main data you're going through. It will be (close to) "for free". Even in cases where that isn't the case, whether or not it is able to match Cap'n Proto's accessor code is less interesting than it being endlessly faster than Protobuf and most other serializers out there, given that it is just as flexible.

vtables are de-duped on the fly during construction. It is typically not expensive.

Note this tiny overhead holds for "tables", we also have naked "structs", which do not have the indirection overhead, but have no forwards/backwards compatability. Useful for things like 2d/3d vector types, colors, and other small POD types that are used a lot.

FlatBuffers also has unions, which you should use instead of optionals when appropriate. We feel optionals have a lot of uses beyond just mere unions and forwards/backwards compatibility, however. Game objects can have a LOT of fields, many of which are often at their default value, and thus not stored on the wire. This gives significant compression. The zero-byte compression in Cap'n Proto is cool, but we prefer to not have to use additional buffers when reading. Optionals also give a lot of design freedom, i.e. you can add a field that you know is only needed for very few instances without fear of bloating your binaries, as an alternative to "subclassing", or indeed unions.

On field declaration order: to each his own, I guess. That said, the omission of explicit field id's is something that's handled at the level of the schema compiler, we could easily allow optional id declarations in the schema for people who want the flexibility of declaring things in an arbitrary order. Noted.

We should clean up the benchmark code, so it can be included, yes. It is not running a noop loop, we made sure of that. I haven't verified what makes Protobuf traversal that much slower, I am guessing it's causing a memory allocation somewhere.

blt12y ago

Structs are a welcome improvement. Using Protobuf, I was forced to move from a 3d vector type to an array of doubles to get packed=true performance. Small loss of expressiveness, no big deal, but I can imagine it getting a lot worse for heterogeneous and/or more complex structs. Was a shock to see a dynamic allocation for every vector in my 1000-element array. (I understand why this is necessary given the spec of Protobuf though.)