Bond – An extensible framework for working with schematized data

(microsoft.github.io)

99 points · dons · 11y ago · 41 comments

26 comments · 8 top-level

sdave · 11y ago · 6 in thread

how does it compare to protobuf, thrift?

joncfoo · 11y ago

Quoting apc @ https://lobste.rs/s/7w6p95/msft_open_sources_production_seri...

The current offerings (Thrift, ProtoBuffs, Avro, etc.) tend to have similar opinions about things like schema versioning, and very different opinions about things like wire format, protocol, performance tradeoffs, etc. Bond is essentially a serialization framework that keeps the schema logic stuff the same, but makes tasks like wire format, protocol, etc., highly customizable and pluggable. The idea is that instead of deciding ProtoBuffs isn’t right for you, tearing it down, and starting over with Thrift from scratch, you just change the parts that you don’t like, but keep the underlying schema logic the same.

In theory, this means one team can hand another team a Bond schema, and if they don’t like how it’s serialized, fine, just change the protocol, but the schema doesn’t need to.

The way this works, roughly, is as follows. For most serialization systems, the workflow is: (1) you declare a schema, and (2) they generate a bunch of files with source code to de/serialize data, which you can add to a project and compile into programs that need to call functions that serialize and deserialize data.

In Bond, you (1) declare a schema, and then (2) instead of generating source files, Bond will generate a de/serializer using the metaprogramming facilities of your chosen language. So customizing your serializer is a matter of using the Bond metaprogramming APIs to change the de/serializer you’re generating.
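To make the "one schema, pluggable protocol" idea concrete, here is a minimal Python sketch. It is not Bond's API and not how Bond generates code: `Record`, `JsonProtocol`, `CompactBinaryProtocol`, and `roundtrip` are hypothetical names, and the binary layout is invented purely for illustration. The point is only that application code depends on the schema, while the wire format is a swappable parameter.

```python
import json
import struct
from dataclasses import asdict, dataclass
from typing import Protocol


@dataclass
class Record:
    # The schema: declared once, with no opinion about the wire format.
    id: int
    name: str


class WireProtocol(Protocol):
    """Anything that can turn a Record into bytes and back."""

    def write(self, obj: Record) -> bytes: ...

    def read(self, data: bytes) -> Record: ...


class JsonProtocol:
    """Text protocol: self-describing and human-readable."""

    def write(self, obj: Record) -> bytes:
        return json.dumps(asdict(obj)).encode("utf-8")

    def read(self, data: bytes) -> Record:
        return Record(**json.loads(data))


class CompactBinaryProtocol:
    """Made-up binary layout: little-endian int64 id, uint32 length, UTF-8 name."""

    _HEADER = struct.Struct("<qI")

    def write(self, obj: Record) -> bytes:
        name = obj.name.encode("utf-8")
        return self._HEADER.pack(obj.id, len(name)) + name

    def read(self, data: bytes) -> Record:
        rec_id, n = self._HEADER.unpack_from(data, 0)
        start = self._HEADER.size
        return Record(rec_id, data[start:start + n].decode("utf-8"))


def roundtrip(obj: Record, proto: WireProtocol) -> Record:
    # Application code depends only on the schema; the protocol is a parameter.
    return proto.read(proto.write(obj))
```

Swapping `JsonProtocol()` for `CompactBinaryProtocol()` changes the bytes on the wire but leaves `Record`, and any code that uses it, untouched. Note that this is exactly the "common in-memory classes, pluggable wire format" design discussed in the next comment.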

kentonv · 11y ago

That's cool... but, from what I can tell (correct me if I'm wrong), Bond accomplishes this by using common classes for in-memory objects which have no relation to the wire format, and then simply invoking a pluggable wire format at parse/serialize time. This lets you plug in previous-generation serialization protocols like Protobuf, Thrift, or Avro, but probably won't allow you to plug in a next-generation zero-copy protocol like Cap'n Proto, SBE, or FlatBuffers, where the in-memory data structure and the wire format are one and the same. If you want to try one of them, you'll still have to rewrite all your code, unfortunately.
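A toy Python sketch of the zero-copy distinction: the message buffer itself acts as the object, and accessors read fields at fixed offsets instead of decoding into a separate class first. `PointView`, `PointBuilder`, and the fixed 8-byte layout are hypothetical and far simpler than Cap'n Proto, SBE, or FlatBuffers (no pointers, no variable-length fields, no bounds checks).

```python
import struct
from typing import Union


class PointView:
    """Read-only accessor over a buffer laid out as little-endian int32 x, int32 y.

    Nothing is decoded up front: each property reads its field straight from
    the buffer, so the buffer itself plays the role of the in-memory object.
    """

    SIZE = 8

    def __init__(self, buf: Union[bytes, bytearray, memoryview]) -> None:
        self._buf = memoryview(buf)  # a view, not a copy

    @property
    def x(self) -> int:
        return struct.unpack_from("<i", self._buf, 0)[0]

    @property
    def y(self) -> int:
        return struct.unpack_from("<i", self._buf, 4)[0]


class PointBuilder:
    """Writes fields in place into the buffer that will be sent as-is."""

    def __init__(self) -> None:
        self.buf = bytearray(PointView.SIZE)

    def set_x(self, value: int) -> None:
        struct.pack_into("<i", self.buf, 0, value)

    def set_y(self, value: int) -> None:
        struct.pack_into("<i", self.buf, 4, value)
```

Because the view shares memory with the builder's buffer, a later `set_x` is visible through the view immediately. That is also why such formats don't slot in behind a "plain class + pluggable codec" design: application code has to be written against the accessors.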

nly · 11y ago

Thrift has pluggable protocols. It comes with 'compact' (protobuf-like), 'dense', 'binary' and json out of the box. It also has pluggable transports and multiple server implementations (threaded, async, etc). I'm personally not seeing any innovation here... I think they just wanted their own version of Thrift such that they could ignore the languages they don't care about.

bradleyankrom · 11y ago

One key differentiator is the limited set of languages Bond currently supports:

"By design Bond is language and platform independent and is currently supported for C++, C#, and Python on Linux, OS X and Windows."

Versus Thrift:

"language bindings - Thrift is supported in many languages and environments C++ C# Cocoa D Delphi Erlang Haskell Java OCaml Perl PHP Python Ruby Smalltalk"

thesnider · 11y ago

After struggling with thrift's Go binding (it happily generated broken Go code with the Aurora project's thrift file), I'm now skeptical that any of the others really works. I've never encountered a more frustrating project in this space.

_asummers · 11y ago

Or something like CBOR or JSONB?

nly · 11y ago · 4 in thread

No RPC? Disappointing. There are so few choices for C and C++ programmers with regard to battle-tested, easy (read: code generation for decode and dispatch), language-agnostic RPC.

sapek · 11y ago

We are planning to release cross-platform RPC support, but it just wasn't ready yet and we didn't want to hold up the core release for it.

bradleyankrom · 11y ago

Have you tried any of the MessagePack RPC implementations? I haven't but I'm curious.

yawniek · 11y ago

I recently evaluated msgpack-rpc and Thrift for a small side project. Surprisingly, it turned out that msgpack was not only much faster but also way easier to use (for lots of small messages).

I got around 300k msgs/s throughput with msgpack-d-rpc.

nly · 11y ago

I haven't, although it's really nice that they chose to adopt the Thrift IDL. From a cursory glance however, it doesn't look like the code generator produces any dispatch code. Atm you're still going to need to write a tonne of boilerplate.

gregwebs · 11y ago · 3 in thread

The Bond compiler is written in Haskell: http://blog.nullspace.io/bond-oss.html

It is about time, considering that Microsoft Research has been one of the main funders of work on the Haskell compiler.

oscargrouch · 11y ago

I saw this yesterday and was pretty happy about it, but then I saw that the compiler was written in Haskell..

This makes it pretty "unportable", because it's the same kind of dependency as the Java VM. How can I distribute code with this library with a dependency like that? Ask people to download the whole GHC?!

Unfortunately, for libraries that are meant to be embedded in third-party code, the reality beyond C/C++ is pretty harsh. For full applications the reality is different, but for embedded libraries... Even though I liked the solution for something I'm doing, I had to pass because of this small detail. I'm too busy to write a parser in C++ to make it more portable in source-code form, so I had to go back to protobuf :/

sapek · 11y ago

You need Haskell only to build the Bond compiler. Once you do that, you get a native, stand-alone executable for your system (you don't need Haskell to run the Bond compiler). You use the Bond compiler as part of build process for programs using Bond. Programs using Bond don't have any Haskell dependency.

cbd1984 · 11y ago

Can you please explain how you connected Haskell with Java in your mind? I'm honestly curious.

leetrout · 11y ago · 2 in thread

Slightly OT- I'm working with data sets that might change, but not often if at all, which are provided by Elasticsearch. I'm processing the raw data in Flask (API), munging, joining, and dropping what I don't want going out to the world.

I've been toying with the idea of using something like PB, Cap'n Proto, or now Bond to define and track schema changes and centralize marshaling / serializing logic. I'm not concerned about having RPC. Does this sound like crazy talk? Does anyone else happen to track schemas against schemaless data stores?

(I also like the idea of not having to ship JSON everywhere if I don't want to.)
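A minimal sketch of the idea in plain Python: declare the schema once, normalize raw documents against it, and keep the outgoing serialization in one function. It uses `dataclasses` rather than Bond, Protobuf, or Cap'n Proto, and `Article`, `from_document`, and `to_public_json` are hypothetical names; `tags` is assumed to be a field added in a later revision of the schema.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Article:
    """Hypothetical schema for documents pulled from a search index."""

    id: str
    title: str
    tags: tuple = ()  # assumed to be added later; the default keeps old documents loadable

    @classmethod
    def from_document(cls, doc: dict) -> "Article":
        # Keep only fields the schema declares; anything else (internal
        # bookkeeping, fields we don't want to expose) is dropped here.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in doc.items() if k in known})


def to_public_json(article: Article) -> str:
    # The single place where the outgoing representation is decided;
    # swapping JSON for a binary format would only touch this function.
    return json.dumps(asdict(article))
```

A document missing a required field such as `title` raises `TypeError` at load time, which is usually the point: schema drift shows up at the boundary instead of deep inside the API code.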

seanp2k2 · 11y ago

TL;DR by using more of the available features in ElasticSearch, you can probably replace all of your external app with ElasticSearch.

A few things:

- ElasticSearch is definitely not schema-less, but it can try to generate a schema (aka "mapping") for you if you don't give it one: http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/c...

- ElasticSearch has tons of ways to customize the data you get back, so, unless you really don't want the ES cluster crunching things for you, you can do a lot of the transformation server-side. You can go so far as to have your own type + mapping for e.g. a report, which sources data from another type and transforms it: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

- This covers both why the schema can't, by nature, be dynamic (so the argument of "schema-less / dynamic schema" is BS in practice IMO), as well as how to get data out of one index and into another (e.g. your "report" index which does scripted transformation).

- Another idea would be to use the scripting module to write a custom "view": http://www.elasticsearch.org/guide/en/elasticsearch/referenc...

- You can use Groovy, mvel, JS, or Python for scripts. If you combine this with how ES lets you do "site plugins", you could make a JS + CSS + HTML site which is actually served by the ES cluster, which interacts with it and generates reports or whatever all without additional infrastructure. Example: https://github.com/karmi/elasticsearch-paramedic

rch · 11y ago

I'm doing something similar at work. I'd be happy to chat about it if you drop me a line sometime.

ziedaniel1 · 11y ago · 2 in thread

It's cool that the .NET version actually JITs specialized serialization and deserialization code at runtime. This is one place where managed languages really shine, because emitting bytecode is easier and more portable than emitting, say, raw x86. It's also safer -- the runtime can verify the memory safety and type safety of the code.
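A rough Python analogue of generating a specialized deserializer at runtime. Bond's C# version emits .NET code; this sketch instead builds Python source from a schema and compiles it with `compile`/`exec`, which is a different mechanism and lacks the verification ziedaniel1 mentions (`exec` trusts its input, so the schema must be trusted). `SCHEMA`, `generic_read`, and `compile_reader` are hypothetical names.

```python
import struct

# Hypothetical schema, only known at runtime: (field name, struct format code).
SCHEMA = [("id", "q"), ("score", "d"), ("flags", "H")]


def generic_read(schema, data):
    """Interpreted reader: walks the schema on every call."""
    out, offset = {}, 0
    for name, code in schema:
        fmt = "<" + code
        (out[name],) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
    return out


def compile_reader(schema):
    """Generates Python source specialized to `schema`, compiles it once, and
    returns the resulting function. The generated reader does a single unpack
    with a precomputed layout instead of looping over the schema."""
    names = [name for name, _ in schema]  # assumed to be valid identifiers
    layout = struct.Struct("<" + "".join(code for _, code in schema))
    src = (
        "def read(data):\n"
        + "    " + ", ".join(names) + ", = _layout.unpack_from(data, 0)\n"
        + "    return {" + ", ".join(f"{n!r}: {n}" for n in names) + "}\n"
    )
    namespace = {"_layout": layout}
    exec(compile(src, "<generated reader>", "exec"), namespace)
    return namespace["read"]
```

Both readers return the same dictionary; the generated one simply pays the cost of interpreting the schema once, up front, rather than on every message.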

Someone · 11y ago

Is it? To start that JIT process, you need to have a class in your code that the compiler for the .NET version generated. Disk space is cheap nowadays, even on mobile, so I do not see a big disadvantage to generating the deserialization code at the same time the source code for the class gets generated (and if you do things that way, you avoid the one-time delay, and you don't need the code that generates those serializers in your application).

What am I overlooking? What information is known at runtime that isn't already available at build time? (And no, "the exact CPU/memory/etc. the code runs on" is not a valid answer. This is C# code, so there is always a runtime that handles that stuff.)

sapek · 11y ago

There are two advantages to generating code at runtime:

1) In some scenarios you have information at runtime that allows you to generate much faster code. The canonical example is untagged protocols, where the serialized payload doesn't contain any schema information and you get the schema at runtime. Bond supports untagged protocols (like Avro) in addition to tagged ones (like protobuf and Thrift), and the C# version generates insanely fast deserializers for untagged protocols.

2) It allows programmatic customizations. If the work is done via codegen'ed source code, then the only way for a user to do something custom is to change the code generator to emit modified code. Even if codegen provides the ability to do that, it is very hard to maintain such customizations. In Bond, serialization and deserialization are composed from more generic abstractions: parsers and transforms. These lower-level APIs are exposed to the user. As an example, imagine that you need to scrub PII from incoming messages. This is a bit like deserialization, because you need to parse the payload, and a bit like serialization, because you need to write the scrubbed data. In Bond you can implement such an operation from those underlying abstractions, and because you can emit the code at runtime you don't sacrifice performance.
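The tagged/untagged distinction in point 1 can be sketched in Python. This is a simplified illustration, not Bond's, Protobuf's, or Avro's actual encoding (Avro, for instance, resolves the writer's schema against the reader's). `SCHEMA` and the four functions are hypothetical; the tag here is a uint16 field ordinal plus a one-byte type code.

```python
import struct

# Hypothetical schema shared by writer and reader: ordinal -> (name, format code).
SCHEMA = {1: ("id", "q"), 2: ("score", "d")}


def write_tagged(record, schema):
    """Tagged: each value is preceded by its ordinal and type code."""
    out = bytearray()
    for ordinal, (name, code) in schema.items():
        out += struct.pack("<Hc", ordinal, code.encode("ascii"))
        out += struct.pack("<" + code, record[name])
    return bytes(out)


def read_tagged(data, schema):
    """Self-describing payload: fields unknown to this reader are skipped."""
    out, offset = {}, 0
    while offset < len(data):
        ordinal, code = struct.unpack_from("<Hc", data, offset)
        offset += 3
        fmt = "<" + code.decode("ascii")
        (value,) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
        if ordinal in schema:
            out[schema[ordinal][0]] = value
    return out


def write_untagged(record, schema):
    """Untagged: values only, in schema order. Smaller, but opaque without the schema."""
    fmt = "<" + "".join(code for _, code in schema.values())
    return struct.pack(fmt, *(record[name] for name, _ in schema.values()))


def read_untagged(data, schema):
    # With the schema in hand, the whole layout is known before reading a byte.
    fmt = "<" + "".join(code for _, code in schema.values())
    names = [name for name, _ in schema.values()]
    return dict(zip(names, struct.unpack(fmt, data)))
```

For `{"id": 42, "score": 1.5}` the tagged payload is 22 bytes and the untagged one 16. The untagged reader is a single fixed-layout unpack, which is the kind of specialization that becomes possible once the schema is known at runtime.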

BTW, Bond allows you to do something similar in C++. The underlying meta-programming mechanism is different (compile-time template meta-programming instead of runtime JIT), but the principle that serialization and deserialization are not special but are composed from more generic abstractions is the same.
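The principle that serialization, deserialization, and an operation like PII scrubbing are all compositions of a parser, optional transforms, and a writer can be illustrated with a toy event pipeline. This does not show Bond's actual parser/transform APIs in C# or C++; `parse`, `write`, and `scrub_pii` are hypothetical, JSON stands in for the wire format, and only flat messages are handled.

```python
import json
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, object]  # (field name, value)


def parse(payload: bytes) -> Iterator[Event]:
    """Parser: turns a serialized payload into a stream of field events."""
    yield from json.loads(payload).items()


def write(events: Iterable[Event]) -> bytes:
    """Writer: turns a stream of field events back into a payload."""
    return json.dumps(dict(events)).encode("utf-8")


def scrub_pii(events: Iterable[Event], pii_fields: frozenset) -> Iterator[Event]:
    """Transform: forwards every event, masking the values of PII fields."""
    for name, value in events:
        yield name, ("<redacted>" if name in pii_fields else value)


# Deserialization and serialization are just particular compositions...
def deserialize(payload: bytes) -> dict:
    return dict(parse(payload))


def serialize(obj: dict) -> bytes:
    return write(obj.items())


# ...and so is scrubbing: parse -> transform -> write, no hand-written codec.
def scrub_message(payload: bytes, pii_fields: frozenset) -> bytes:
    return write(scrub_pii(parse(payload), pii_fields))
```

The scrubber never materializes a typed object; it reuses the same parser and writer that (de)serialization uses, which is the maintainability argument made above.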

a_c · 11y ago · 1 in thread

How would this compare with Apache Thrift?

sapek · 11y ago

See https://news.ycombinator.com/item?id=8868045

sapek · 11y ago

There have been a lot of questions about how Bond compares to Protobuf, Thrift and Avro. I tried to put some information on this page: http://microsoft.github.io/bond/why_bond.html

drivingmenuts · 11y ago

And yet they still can't build a web page that isn't a shitshow.

Main content has horizontal scroll on portrait monitors, which underlaps the transparent fixed div they used for navigation.