So this:

"users": [
    {"first": "Homer", "last": "Simpson"},
    {"first": "Hank", "last": "Hill"},
    {"first": "Peter", "last": "Griffin"}
],

becomes:

"users": [
    ["first", "last"],
    ["Homer", "Simpson"],
    ["Hank", "Hill"],
    ["Peter", "Griffin"]
],

or, rotated into column orientation:

"users": {
    "first": ["Homer", "Hank", "Peter"],
    "last": ["Simpson", "Hill", "Griffin"]
},
then the information about the original structure can be restored by a set of object paths (such as ["users"]) which need to be "rotated" from column to row orientation, saving a few characters in the process.

Though I see that the supposed advantage of this system is that it can handle any shape of data, not just data with a fixed schema. I've been trying to figure out how trang [http://www.thaiopensource.com/relaxng/trang.html] does its schema-inference trick (it turns a set of XML files into a RELAX NG schema). If you have a schema for a JSON file, that knowledge can be applied to algorithmically create really efficient transformations.
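That row-to-column rotation and its inverse are only a few lines when the rows share a fixed schema. A minimal sketch (function names are mine, not from any RJSON implementation):

```javascript
// Rotate an array of uniform objects into column orientation, and back.
// Assumes every row has the same keys (a fixed schema) -- which is
// exactly the case this trick exploits.
function toColumns(rows) {
  const cols = {};
  for (const key of Object.keys(rows[0])) {
    cols[key] = rows.map(row => row[key]);
  }
  return cols;
}

function toRows(cols) {
  const keys = Object.keys(cols);
  return cols[keys[0]].map((_, i) => {
    const row = {};
    for (const key of keys) row[key] = cols[key][i];
    return row;
  });
}

const users = [
  {first: "Homer", last: "Simpson"},
  {first: "Hank",  last: "Hill"},
  {first: "Peter", last: "Griffin"}
];
const rotated = toColumns(users);
// rotated = {first: ["Homer","Hank","Peter"], last: ["Simpson","Hill","Griffin"]}
```

The saving comes from writing each key once per column instead of once per row.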
test.json = 285 bytes
test.rjson = 233 bytes (18% smaller)
test.json.gz = 205 bytes (28% smaller)
If you are able to bundle an RJSON parser, why not just bundle an existing, well-understood and well-tested compression scheme such as http://stuartk.com/jszip/ or https://github.com/olle/lz77-kit/blob/master/src/main/js/lz7... instead?
Using the order-2 precise model on this page I get 190 bytes -- and that is still a generic, non-JSON-aware model: http://nerget.com/compression/
Along these lines - shipping a schema with the data payload is Avro-like ... which is also questionable in terms of efficiency when compared with gzip/LZO.
`Content-encoding: gzip` anyone?
EDIT: or as a comment above states, compress (gzip/deflate) it yourself. Not the most elegant, but if space is an issue.
I had to do this for an application which streamed several hundred data points per second to a browser client. Both data size on the network and parsing time in the browser were my biggest issues. Since it was an HTML app I had to use either JSON.parse or custom parsing written in JavaScript, the second option being too slow to be viable. I ended up with something based almost entirely on JSON arrays, where the client knows what the different items in each array mean. With his example it would look something like this:

[7, ["programming", "javascript"], ["Peter", "Griffin", "Homer", "Simpson", "Hank", "Hill"]]

So in other words there's just enough overhead to make it parsable by JSON.parse, but otherwise you only transfer the actual data.
Note that I wouldn't recommend this option unless you really hit a wall where you realize that this in fact is a bottleneck.
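Decoding a format like that is trivial once both ends agree on the layout out-of-band. A sketch of what the receiving side might look like (the field meanings for positions 0 and 1 are my guesses at the example above):

```javascript
// Decode [id, tags, flatNames] where flatNames is first/last pairs.
// The layout is agreed on out-of-band -- that's the whole trick:
// no keys on the wire, just JSON.parse-able arrays.
function decode(packet) {
  const [id, tags, flat] = packet;
  const users = [];
  for (let i = 0; i < flat.length; i += 2) {
    users.push({first: flat[i], last: flat[i + 1]});
  }
  return {id, tags, users};
}

const msg = JSON.parse(
  '[7,["programming","javascript"],["Peter","Griffin","Homer","Simpson","Hank","Hill"]]'
);
const decoded = decode(msg);
// decoded.users[0] -> {first: "Peter", last: "Griffin"}
```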
I agree with Too that a specialized protocol will always win, but IMO RJSON decreases structural entropy almost to zero without the need to debug your own protocol.
{"users": [
    {"first": "Homer", "last": "Simpson"},
    {"first": "Hank", "last": "Hill"},
    [2, "Peter", "Griffin"]
]}
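A toy decoder for the back-reference idea, under my reading of it: an object registers its key set, a later array [n, v1, v2, ...] reuses the key set registered under n (using the key count as the index, which matches the 2 above), and a leading 0 escapes a literal array. The real RJSON has more rules than this sketch models:

```javascript
// Toy decoder for RJSON-style back-references (my assumptions, not the
// actual spec): plain objects register their key set under their key
// count; [n, v1, ...] reuses the key set registered under n; [0, ...]
// is treated as an escape for a literal array.
function decodeBackrefs(items) {
  const keysets = {}; // key count -> last key set seen with that count
  return items.map(item => {
    if (!Array.isArray(item)) {
      keysets[Object.keys(item).length] = Object.keys(item);
      return item;
    }
    if (item[0] === 0) return item.slice(1); // escaped literal array
    const keys = keysets[item[0]];
    const obj = {};
    keys.forEach((k, i) => { obj[k] = item[i + 1]; });
    return obj;
  });
}

const restored = decodeBackrefs([
  {first: "Homer", last: "Simpson"},
  [2, "Hank", "Hill"],
  [2, "Peter", "Griffin"]
]);
// restored is the original three user objects
```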
There was a tester linked from the site. Looks like he added the [0, to handle that case. It comes to:
{
"users": [
{
"first": "Homer",
"last": "Simpson"
},
[
2,
"Hank",
"Hill"
],
[
0,
2,
"Peter",
"Griffin"
]
]
}

What if I look at your API output and assume it's plain JSON (say it only contains unique items, so nothing got packed), but it's actually RJSON? Or the other way around?
The most important thing when adding another layer to a protocol is identification.
So, please, put the whole altered object into a rjson root node so it's clear what we're dealing with.
The ideal place to handle this is (as pointed out above) content negotiation - specifically with a media type like 'application/rjson+json'.
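A sketch of what that could look like on the client side. Note that 'application/rjson+json' is the suggestion above, not a registered media type, and rjsonUnpack stands in for whatever decoder the library actually ships:

```javascript
// Pick a parser based on the declared media type, so plain-JSON
// clients and RJSON-aware clients can share one endpoint.
function parseBody(contentType, body, rjsonUnpack) {
  const data = JSON.parse(body); // both forms are valid JSON
  if (contentType.startsWith('application/rjson+json')) {
    return rjsonUnpack(data); // only unpack when the server said so
  }
  return data;
}
```

The server side would mirror this by honoring the request's Accept header before answering with the packed form.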