I'm seeing few specifications or conversions regarding endianness, so I'm guessing that's out of scope for this project. It seems almost completely backwards incompatible, and I'm not too sure about their security validations. I don't think this and FlatBuffers are competing in the same space, really.
I definitely believe this is fast, it's as close to a memcpy to a network packet as you can get. I'd be wary to use this on external data in any native language without any kind of fuzzing first.
That said, I do like the way the generators work.
I’d recommend not using the word “proven” here. In computer science this word typically refers to a mathematical proof. In this case it seems that you ran a regular benchmark for some schemas.
I’d also like to see more of what the benchmark actually does. A typical trade-off of these formats is how much work you do up-front vs on-demand, e.g. accessing a field that comes after multiple variable-length fields. Here it’s possible during “decoding” to make sure all fields can be accessed in O(1), or you can do nothing and then compute the field location every time you access it. Whether the benchmark accesses the field once or ten times will make a huge difference.
In general: If you’re just telling me that it’s 10 times faster without explaining why I will be skeptical.
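To make the trade-off concrete, here is a minimal sketch (a hypothetical layout, not Karmem's actual wire format): two variable-length fields stored back-to-back, each with a 1-byte length prefix. The lazy accessor re-walks the buffer on every call, while an up-front decode pass builds an offset table once so later accesses are O(1).

```go
package main

import "fmt"

// lazyField walks the buffer from the start on every call: O(n) per access.
func lazyField(buf []byte, idx int) []byte {
	off := 0
	for i := 0; i < idx; i++ {
		off += 1 + int(buf[off]) // skip length prefix + payload
	}
	n := int(buf[off])
	return buf[off+1 : off+1+n]
}

// decodeOffsets does the walk once up front; every later access is O(1).
func decodeOffsets(buf []byte, numFields int) []int {
	offsets := make([]int, numFields)
	off := 0
	for i := 0; i < numFields; i++ {
		offsets[i] = off
		off += 1 + int(buf[off])
	}
	return offsets
}

func main() {
	// Two fields: "foo" then "hi".
	buf := []byte{3, 'f', 'o', 'o', 2, 'h', 'i'}
	fmt.Printf("%s\n", lazyField(buf, 1)) // hi

	offs := decodeOffsets(buf, 2)
	off := offs[1]
	fmt.Printf("%s\n", buf[off+1:off+1+int(buf[off])]) // hi
}
```

A benchmark that reads each field once mostly measures the walk either way; one that reads fields ten times heavily rewards the up-front pass.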
This is just one of those quirks of human language. Yes, it's occasionally annoying that similar words have very disparate meanings, but English in particular is never going to shake them off.
It’s pretty normal to use “proved” in place of “observed” or “measured” in day-to-day speech though. Usually a few repeatable measurements are enough proof (evidence) for most people to consider something “proven”.
Edit: I forgot the proofread meaning pointed out by the sibling comment.
Flatbuffers trades off encoding speed, programmer ergonomics and binary size (it produces many bytes and it's awkward and still pretty slow to encode) for decoding speed (almost a no-op if you forego buffer verification, which you shouldn't most of the time). Imho it's not a good choice for network wire formats, but for storage it's pretty good.
Like, ok, it's 10x faster unzipping than another obscure language-dependent format, but how is that better than Perl Storables or Python pickles or Ruby serializers other than being "faster"?
How do I call this from Java or .NET, and why would I do this other than to make everyone I work with miserable by adopting yet another format?
Languages
Currently we are focused on WebAssembly, and because of that these are the languages supported:
AssemblyScript
Golang/TinyGo
~Swift/SwiftWasm~
Zig
C

Where is the FlatBuffers native C (or C++) implementation of the benchmark? Are memory allocations avoided/excluded in the benchmark?
In my experience with Cap'n Proto, the vast majority of the time the zero-copy feature is pointless. The Cap'n Proto C++ APIs are extremely unergonomic, so 99% of the time you end up copying the data into your own nice C++ structures anyway, completely giving up zero copy.
I've used Capnp quite a lot and I really wouldn't recommend it. It's quite old and complex and the unpleasantness of the API alone is enough to put me off. I would pick Protobufs every day for small amounts of data.
For large amounts you are better off with SQLite or DuckDB.
There's also not much out there comparable to its capability passing RPC either.
Agree on the zero copy though, and the RPC framework does malloc a lot.
* Its design goals and rationale
* How those decisions are translated into the actual performance
* What is the trade off made to achieve that
* Why should/shouldn't anyone else use it
Rather than just a vague performance claim that it's ten times faster than something else. This isn't just for this specific library; it applies to any library seeking a broader audience.

> kmparser: implement id generator
> That is the first step to implement Unions/Interfaces, it's also useful to know what is the expected message type to decode.
I don't see any other mention or plan about DU's in the repo or metadata. I'm curious what their position is on it.
[1]: https://github.com/inkeliz/karmem/commit/626e6d3b380eb5236c9...
Case in point: I used Rust for 1-2 years and am now on a project in Go. Even though Go fits my style and my use case better, I miss enums so much, both the std lib types like Result and Option and the custom ones.
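The usual Go workaround, for anyone curious, is a "sealed" interface with one struct per variant (names below are illustrative). It works, but unlike Rust's `match`, the compiler won't check the type switch for exhaustiveness:

```go
package main

import "fmt"

// Result emulates a sum type: a private method "seals" the interface so
// only variants defined in this package can implement it.
type Result interface{ isResult() }

type Ok struct{ Value int }
type Err struct{ Msg string }

func (Ok) isResult()  {}
func (Err) isResult() {}

func describe(r Result) string {
	switch v := r.(type) {
	case Ok:
		return fmt.Sprintf("ok: %d", v.Value)
	case Err:
		return "error: " + v.Msg
	default: // needed at runtime; Rust would reject a missing arm at compile time
		return "unknown"
	}
}

func main() {
	fmt.Println(describe(Ok{Value: 42}))  // ok: 42
	fmt.Println(describe(Err{Msg: "boom"})) // error: boom
}
```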
These questions may not matter for every use case (e.g. you ship a single binary from a single codebase) but I think that clearly defining these rules opens up a lot of very cool use cases that are otherwise prohibited.
> In other words: you can't edit the description of one inline struct without breaking compatibility.
and
> Tables: Tables can be used when backward compatibility matters. For example, tables can have new fields appended at the bottom without breaking compatibility.
Otherwise, I agree it's rather unclear exactly what you can do with tables.
So perhaps a generic message serialization library is too slow for its use case: WASM's data types are just ints and floats, so the parsing code can't behave like it would on a native CPU with things like bytes and C structs?
It would have been great if they had disclosed links to issues regarding out-of-bounds access for things like Protobuf or Flatbuffer.
https://rkyv.org/
https://github.com/jamesmunns/postcard
postcard seems like it would be particularly strong for the wasm use case as it produces small messages that are light in memory.
My anecdotal experience ties out with those FWIW.
10x "faster" than that is something targeting an FPGA, and I don't see any Verilog in the repo.
Come on folks, #1?