http://code.google.com/p/protobuf/
I need to move some data faster and with less parsing on either side of the transmission, and these seem like a good choice.
http://msgpack.sourceforge.net/
Make sure you benchmark simple JSON. It might be enough.
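A quick way to get a baseline is something like the sketch below. It uses only the Python standard library; the `payload` dict is a made-up stand-in (an assumption, not anyone's real data), so swap in a sample of your actual messages before drawing conclusions.

```python
import json
import timeit

# Hypothetical payload; replace with a representative sample of your own data.
payload = {
    "id": 12345,
    "name": "example",
    "values": list(range(100)),
    "nested": {"flag": True, "ratio": 0.5},
}

encoded = json.dumps(payload)


def bench(fn, number=2000):
    """Return the average seconds per call over `number` runs."""
    return timeit.timeit(fn, number=number) / number


encode_s = bench(lambda: json.dumps(payload))
decode_s = bench(lambda: json.loads(encoded))

print(f"size:   {len(encoded)} bytes")
print(f"encode: {encode_s * 1e6:.1f} us/call")
print(f"decode: {decode_s * 1e6:.1f} us/call")
```

If those numbers are already well inside your budget, the extra build steps of a binary format may not pay for themselves.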
I also have some benchmarks of Python deserialization of JSON, and talk about alternatives here: http://blog.metaoptimize.com/2009/03/22/fast-deserialization...
I encourage you to listen to jokull, and find out if JSON really is too slow for your needs. My advisor taught me that if you keep your data in an easy-to-read format, you're more likely to catch bugs in the output, merely because you have first-class tools for inspecting the file and are more likely to do so.
That said, I agree with your overall sentiment: certainly do look at JSON as well. Protobuf is overkill for a lot of things, and JSON keeps things simpler.
In another case, more of an experiment, I used them to serialize game data and sent it using enet. This proved very flexible and easy to change/add things, and the packets were extremely compact.
Pros:
* Read/Write access to data from C++ or Python
* Generated APIs were easy to work with
* Very compact representation
* ASCII dump version very useful for debugging
* More error checking than something like JSON (e.g. it tells you if you leave out a required field)
Cons:
* Adds some build steps, can be more of a headache to maintain (compared to JSON or something)
* API can't parse the ASCII version, bad for config data or other stuff that might want to be human readable (vs. XML or JSON)
* Generally requires copying your data into the protobuf struct, and then packing, rather than going straight from your "native" format into a packed buffer.
* Adds a bit more complexity
* Not as lightweight as JSON
For what you're doing, I would recommend them.
They're great for "structure" style data, a little weird for array-style. For example, one of the things I was storing was a 4x4 matrix, and I resorted to making a struct with 16 members such as m_00, m_01, etc., which worked fine and stored it compactly but was a little weird. I don't think there's a way to have a float[16] or something like that. I could be wrong, maybe there's a better way to do this.
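For what it's worth, protobuf does have `repeated` fields, so a matrix can be declared as a single field along the lines of `repeated float m = 1 [packed=true];` instead of 16 named members. The sketch below hand-encodes what such a packed field looks like on the wire, using only Python's `struct` module. The helper name `encode_packed_floats` and the choice of field number 1 are illustrative assumptions, and the helper only handles the simple case where the tag and the length each fit in one byte.

```python
import struct


def encode_packed_floats(field_number, values):
    """Hand-encode a packed repeated float field (illustrative helper)."""
    # Each float is a 4-byte little-endian IEEE 754 value.
    payload = struct.pack(f"<{len(values)}f", *values)
    # Tag byte: field number shifted left by 3, OR'd with wire type 2
    # (length-delimited).
    tag = (field_number << 3) | 2
    # Simplifying assumption: tag and length each fit in a single byte.
    if tag >= 0x80 or len(payload) >= 0x80:
        raise ValueError("this sketch only handles one-byte tag and length")
    return bytes([tag, len(payload)]) + payload


# Row-major 4x4 identity matrix as a flat list of 16 floats.
identity = [1.0 if r == c else 0.0 for r in range(4) for c in range(4)]

blob = encode_packed_floats(1, identity)
print(len(blob))  # 66: 1 tag byte + 1 length byte + 16 * 4 payload bytes

# Decoding is the reverse: skip the two header bytes, unpack 16 floats.
decoded = list(struct.unpack("<16f", blob[2:]))
```

So the array form costs about the same space as 16 separate float members, without the m_00-style naming.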
Generally, these days I use one of three formats. I am very happy to have outgrown XML.
protobuf -- for hierarchical, nested data, if it needs to be compact and accessed from different languages
JSON -- for quick and dirty stuff, when the format needs to be flexible (or when I need to use JavaScript)
GTO -- for large sets of structured data. (www.opengto.org)
Another approach to consider is using a text format (XML, JSON), then running it through fast compression like QuickLZ. This has the benefit of not having to change the program much more than a call to compress/decompress.
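A rough sketch of that idea in Python follows. QuickLZ is not in the standard library, so this swaps in `zlib` at its fastest compression level as a stand-in; the `pack`/`unpack` helpers and the sample `records` list are made up for illustration.

```python
import json
import zlib

# Hypothetical sample data; real savings depend heavily on your payloads.
records = [
    {"id": i, "name": f"item-{i}", "active": i % 2 == 0} for i in range(1000)
]


def pack(obj):
    """Serialize to compact JSON, then compress (zlib level 1 = fastest)."""
    text = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return zlib.compress(text, 1)


def unpack(blob):
    """Decompress, then parse the JSON back into Python objects."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


raw = json.dumps(records).encode("utf-8")
blob = pack(records)

print(f"plain JSON: {len(raw)} bytes")
print(f"compressed: {len(blob)} bytes")
```

The rest of the program keeps working with plain dicts and lists; only the send and receive paths change.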
* Serializing data is OK, but parsing takes quite a bit of time, especially for large requests (I am talking milliseconds).
* PBs always require a copy from your internal app data to their structures. I couldn't find a way to avoid that.
* They use variable-length encoding, which might be a good option if a large percentage of your data is integers.
In our experience, don't use them if you are sending within your corporate network, as packing and parsing take more time than the reduced data transfer saves. They might be a good option if you are sending data across slow networks.
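To make the variable-length encoding point concrete, here is a minimal sketch of the base-128 varint scheme that protobuf uses for unsigned integers: 7 bits of payload per byte, least significant group first, with the high bit set on every byte except the last. The function names are my own, not part of any protobuf library.

```python
def encode_varint(n):
    """Encode a non-negative int as a base-128 varint."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


def decode_varint(data):
    """Decode a single varint from the start of `data`."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


for value in (1, 300, 2**32 - 1):
    print(value, len(encode_varint(value)), "bytes")
```

Small values take one byte, while a full 32-bit value takes five, so the savings only show up when most of your integers are small.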
Some metrics show that Thrift performs better than PBs. Thrift also provides the option of using different protocols. If performance is the prime criterion, JSON + zipping should be a good option. It also avoids the intermediate step of generating marshaling code.
As long as there are libraries for the languages you're using, it's not a big deal. I'd recommend solving the problem and moving on; in the serialization format wars, the real victim is productivity.
What I did for an app was encode a kind of JSON in PBs:
http://pwpwp.blogspot.com/2009/08/storing-json-as-protocol-b...