The current offerings (Thrift, ProtoBuffs, Avro, etc.) tend to have similar opinions about things like schema versioning and very different opinions about things like wire format, protocol, and performance tradeoffs. Bond is essentially a serialization framework that keeps the schema logic the same but makes things like wire format and protocol highly customizable and pluggable. The idea is that instead of deciding ProtoBuffs isn’t right for you, tearing it out, and starting over with Thrift, you change just the parts you don’t like and keep the underlying schema logic.
In theory, this means one team can hand another team a Bond schema, and if the second team doesn’t like how it’s serialized, fine: they just change the protocol, and the schema doesn’t need to change.
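To make the "same schema, different protocol" idea concrete, here is a minimal Python sketch. It is not Bond's API or wire format: `Field`, `SCHEMA`, `JsonProtocol`, `BinaryProtocol` and `roundtrip` are hypothetical names, and the schema only supports int32 and string fields. The point is that both protocols consume the same schema object, so changing the bytes on the wire touches only the protocol.

```python
import json
import struct
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Field:
    name: str
    kind: str  # "int32" or "string" in this sketch


# One hypothetical schema, shared by every protocol below.
SCHEMA: List[Field] = [Field("user_id", "int32"), Field("name", "string")]


class JsonProtocol:
    """Text protocol: field names travel on the wire."""

    def write(self, schema: List[Field], record: Dict[str, object]) -> bytes:
        return json.dumps({f.name: record[f.name] for f in schema}).encode("utf-8")

    def read(self, schema: List[Field], data: bytes) -> Dict[str, object]:
        obj = json.loads(data)
        return {f.name: obj[f.name] for f in schema}


class BinaryProtocol:
    """Compact protocol: values in schema order, no names on the wire."""

    def write(self, schema: List[Field], record: Dict[str, object]) -> bytes:
        out = bytearray()
        for f in schema:
            if f.kind == "int32":
                out += struct.pack("<i", record[f.name])
            else:  # length-prefixed UTF-8 string
                raw = record[f.name].encode("utf-8")
                out += struct.pack("<I", len(raw)) + raw
        return bytes(out)

    def read(self, schema: List[Field], data: bytes) -> Dict[str, object]:
        record, pos = {}, 0
        for f in schema:
            if f.kind == "int32":
                (record[f.name],) = struct.unpack_from("<i", data, pos)
                pos += 4
            else:
                (length,) = struct.unpack_from("<I", data, pos)
                pos += 4
                record[f.name] = data[pos:pos + length].decode("utf-8")
                pos += length
        return record


def roundtrip(protocol, schema: List[Field], record: Dict[str, object]) -> Dict[str, object]:
    # Swapping `protocol` changes the bytes on the wire; `schema` is untouched.
    return protocol.read(schema, protocol.write(schema, record))
```

Both protocols round-trip the same record; for `{"user_id": 42, "name": "ada"}` the binary one produces 11 bytes and the JSON one 30.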
The way this works, roughly, is as follows. For most serialization systems, the workflow is: (1) you declare a schema, and (2) the tool generates a set of source files that de/serialize data, which you add to your project and compile into the programs that need to serialize and deserialize data.
In Bond, you (1) declare a schema, and then (2) instead of generating source files, Bond generates a de/serializer using the metaprogramming facilities of your chosen language. So customizing your serializer is a matter of using the Bond metaprogramming APIs to change the de/serializer you’re generating.
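As a rough Python analogue of "generate the de/serializer at runtime instead of emitting source files", the sketch below builds Python source text specialized for a schema, compiles it with `exec`, and returns the resulting function. This is an illustration under assumptions, not how Bond works internally (the thread notes that Bond's C# side emits code at runtime while its C++ side uses compile-time templates). `generate_deserializer` and the `(name, struct format code)` schema shape are hypothetical, and only fixed-size `struct` fields are handled.

```python
import keyword
import struct
from typing import Callable, Dict, List, Tuple

# Hypothetical schema: (field name, struct format code) for fixed-size fields.
Schema = List[Tuple[str, str]]


def generate_deserializer(schema: Schema) -> Callable[[bytes], Dict[str, object]]:
    """Emit Python source specialized for `schema`, compile it, return the function."""
    names = [name for name, _ in schema]
    if not names:
        raise ValueError("schema must have at least one field")
    # Field names become local variables, so they must be valid, non-clashing identifiers.
    if not all(n.isidentifier() and not keyword.iskeyword(n) and n != "_layout" for n in names):
        raise ValueError("field names must be valid Python identifiers")
    source = "\n".join([
        "def deserialize(data):",
        f"    {', '.join(names)}, = _layout.unpack(data)",
        "    return {" + ", ".join(f"{n!r}: {n}" for n in names) + "}",
    ])
    # The binary layout is computed once, when the deserializer is generated.
    namespace = {"_layout": struct.Struct("<" + "".join(code for _, code in schema))}
    exec(source, namespace)  # runtime code generation instead of a generated .py file
    fn = namespace["deserialize"]
    fn.source = source  # keep the generated text around for inspection
    return fn
```

Customizing the deserializer then means changing `generate_deserializer` (for example, emitting an extra validation line) rather than hand-editing generated files.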
"By design Bond is language and platform independent and is currently supported for C++, C#, and Python on Linux, OS X and Windows."
Versus Thrift:
"language bindings - Thrift is supported in many languages and environments: C++, C#, Cocoa, D, Delphi, Erlang, Haskell, Java, OCaml, Perl, PHP, Python, Ruby, Smalltalk"
I got around 300k msgs/s throughput with msgpack-d-rpc.
It is about time, considering that Microsoft Research has been one of the main funders of work on the Haskell compiler.
This makes it pretty "unportable": it's the same kind of dependency as the Java VM. How can I distribute code that uses this library with a dependency like that? Ask people to download the whole GHC?!
Unfortunately, for libraries that are meant to be embedded in third-party code, the reality beyond C/C++ is pretty harsh. For full applications the situation is different, but for embedded libraries it matters. Even though I liked the solution for something I'm doing, I had to pass because of this small detail, and I'm too busy to write a parser in C++ to make it more portable in source-code form. So I had to go back to protobuf :/
I've been toying with the idea of using something like PB, Cap'n Proto, or now Bond to define and track schema changes and to centralize marshaling/serializing logic. I'm not concerned about having RPC. Does this sound like crazy talk? Does anyone else track schemas against schemaless data stores?
(I also like the idea of not having to ship JSON everywhere if I don't want to.)
A few things:
- ElasticSearch is definitely not schema-less, but it can try to generate a schema (aka "mapping") for you if you don't give it one: http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/c...
- ElasticSearch has tons of ways to customize the data you get back, so, unless you really don't want the ES cluster crunching things for you, you can do a lot of the transformation server-side. You can go so far as to have your own type + mapping for e.g. a report, which sources data from another type and transforms it: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...
- This covers both why the schema can't, by nature, be dynamic (so the argument of "schema-less / dynamic schema" is BS in practice IMO), and how to get data out of one index and into another (e.g. your "report" index, which does scripted transformation).
- Another idea would be to use the scripting module to write a custom "view": http://www.elasticsearch.org/guide/en/elasticsearch/referenc...
- You can use Groovy, mvel, JS, or Python for scripts. If you combine this with how ES lets you do "site plugins", you could make a JS + CSS + HTML site that is actually served by the ES cluster, interacts with it, and generates reports or whatever, all without additional infrastructure. Example: https://github.com/karmi/elasticsearch-paramedic
What am I overlooking? What information is known at runtime that isn't already available at build time? (And no, "the exact CPU/memory/etc. the code runs on" is not a valid answer. This is C# code, so there is always a runtime that handles that stuff.)
1) In some scenarios you have information at runtime that allows you to generate much faster code. The canonical example is untagged protocols, where the serialized payload doesn't contain any schema information and you get the schema at runtime. Bond supports untagged protocols (like Avro) in addition to tagged ones (like protobuf and Thrift), and the C# version generates an insanely fast deserializer for untagged protocols.
2) It allows programmatic customizations. If the work is done via codegen'ed source code, then the only way for the user to do something custom is to change the code generator to emit modified code. Even if the code generator provides the ability to do that, such customizations are very hard to maintain. In Bond, serialization and deserialization are composed from more generic abstractions: parsers and transforms. These lower-level APIs are exposed to the user. As an example, imagine that you need to scrub PII from incoming messages. This is a bit like deserialization, because you need to parse the payload, and a bit like serialization, because you need to write the scrubbed data. In Bond you can implement such an operation from those underlying abstractions, and because you can emit the code at runtime, you don't sacrifice performance.
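Point 1 above can be illustrated with a small Python sketch comparing a tagged encoding, where every value carries its field id, with an untagged one, where only values are written and the reader must be handed the schema at runtime. The layouts (`<uint16 id><int32 value>` pairs versus bare int32 values) and all function names are hypothetical and are not any of Bond's actual protocols. The sketch only shows why knowing the schema up front lets the untagged reader fix its whole layout in advance and skip per-field tag dispatch.

```python
import struct
from typing import Callable, Dict, List, Tuple

# Hypothetical runtime schema: (field id, field name); all fields are int32 here.
Schema = List[Tuple[int, str]]


def write_tagged(schema: Schema, record: Dict[str, int]) -> bytes:
    # Every field carries its own id on the wire: <id: uint16><value: int32>.
    return b"".join(struct.pack("<Hi", fid, record[name]) for fid, name in schema)


def read_tagged(schema: Schema, data: bytes) -> Dict[str, int]:
    # The reader inspects every tag and looks up which field it belongs to.
    by_id = {fid: name for fid, name in schema}
    out = {}
    for fid, value in struct.iter_unpack("<Hi", data):
        if fid in by_id:  # unknown ids are skipped, a rough stand-in for version tolerance
            out[by_id[fid]] = value
    return out


def write_untagged(schema: Schema, record: Dict[str, int]) -> bytes:
    # Only values, in schema order; the reader must be given the same schema.
    return struct.pack("<" + "i" * len(schema), *(record[name] for _, name in schema))


def make_untagged_reader(schema: Schema) -> Callable[[bytes], Dict[str, int]]:
    # With the schema known at runtime, the whole layout is fixed up front,
    # so each read is a single unpack with no per-field tag dispatch.
    names = [name for _, name in schema]
    layout = struct.Struct("<" + "i" * len(schema))
    return lambda data: dict(zip(names, layout.unpack(data)))
```

For a three-field record the tagged payload is 18 bytes and the untagged one 12. The tagged reader also copes with a reader schema that omits a field, which the untagged reader cannot do without knowing the exact writer schema.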
BTW, Bond lets you do something similar in C++. The underlying metaprogramming mechanism is different (compile-time template metaprogramming instead of runtime JIT), but the principle that serialization and deserialization are not special but are composed from more generic abstractions is the same.
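The parser/transform composition from point 2 can be sketched in Python, minus the runtime code generation that gives Bond its performance. Everything here is hypothetical and is not Bond's API: a JSON payload stands in for the wire format, `parse` plays the parser, and `ToObject`, `ToJson` and `ScrubPii` are transforms. Deserialization, serialization and PII scrubbing all come out as a parser driving different transforms.

```python
import json
from typing import Set


class Transform:
    """Hypothetical transform interface: receives one event per field."""

    def begin(self) -> None: ...

    def field(self, name: str, value: object) -> None: ...

    def end(self) -> object: ...


def parse(payload: bytes, transform: Transform) -> object:
    """Parser: decodes a (JSON, for this sketch) payload and drives `transform`."""
    transform.begin()
    for name, value in json.loads(payload).items():
        transform.field(name, value)
    return transform.end()


class ToObject(Transform):
    """Deserialization = parser + a transform that builds an in-memory object."""

    def begin(self):
        self.obj = {}

    def field(self, name, value):
        self.obj[name] = value

    def end(self):
        return self.obj


class ToJson(Transform):
    """Writing side: a transform that emits fields into an output payload."""

    def begin(self):
        self.parts = []

    def field(self, name, value):
        self.parts.append((name, value))

    def end(self):
        return json.dumps(dict(self.parts)).encode("utf-8")


class ScrubPii(Transform):
    """Wraps another transform and replaces PII fields before forwarding them."""

    def __init__(self, inner: Transform, pii_fields: Set[str]):
        self.inner, self.pii_fields = inner, pii_fields

    def begin(self):
        self.inner.begin()

    def field(self, name, value):
        self.inner.field(name, "<redacted>" if name in self.pii_fields else value)

    def end(self):
        return self.inner.end()
```

With `msg = b'{"user": "ada", "email": "ada@example.com", "count": 3}'`, `parse(msg, ToObject())` deserializes it, while `parse(msg, ScrubPii(ToJson(), {"email"}))` produces a payload whose `email` is `"<redacted>"`: the scrubber simply slots in between the parser and the writer. A real implementation would also need nested structs, lists and the remaining field types.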
The main content scrolls horizontally on portrait monitors and underlaps the transparent fixed div they use for navigation.