So this:

"users": [
    {"first": "Homer", "last": "Simpson"},
    {"first": "Hank", "last": "Hill"},
    {"first": "Peter", "last": "Griffin"}
],

becomes:

"users": [
    ["first", "last"],
    ["Homer", "Simpson"],
    ["Hank", "Hill"],
    ["Peter", "Griffin"]
],

or, rotated into column orientation:

"users": {
    "first": ["Homer", "Hank", "Peter"],
    "last": ["Simpson", "Hill", "Griffin"]
},
then the information about the original structure can be restored by a set of object paths (such as ["users"]) which need to be "rotated" from column to row orientation, saving a few characters in the process.

Though I see that the supposed advantage of this system is that it can handle any shape of data, not just data with a fixed schema. I've been trying to figure out how trang [http://www.thaiopensource.com/relaxng/trang.html] does its schema-inference trick (it turns a set of XML files into a RELAX NG schema). If you have a schema for a JSON file, that knowledge can be applied to algorithmically create really efficient transformations.
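That row-to-column rotation and its inverse are only a few lines when the rows share a fixed schema. A minimal sketch (function names are mine, not from any RJSON implementation):

```javascript
// Rotate an array of uniform objects into column orientation, and back.
// Assumes every row has the same keys (a fixed schema) -- which is
// exactly the case this trick exploits.
function toColumns(rows) {
  const cols = {};
  for (const key of Object.keys(rows[0])) {
    cols[key] = rows.map(row => row[key]);
  }
  return cols;
}

function toRows(cols) {
  const keys = Object.keys(cols);
  return cols[keys[0]].map((_, i) => {
    const row = {};
    for (const key of keys) row[key] = cols[key][i];
    return row;
  });
}

const users = [
  {first: "Homer", last: "Simpson"},
  {first: "Hank",  last: "Hill"},
  {first: "Peter", last: "Griffin"}
];
const rotated = toColumns(users);
// rotated = {first: ["Homer","Hank","Peter"], last: ["Simpson","Hill","Griffin"]}
```

The saving comes from writing each key once per column instead of once per row.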
test.json = 285 bytes
test.rjson = 233 bytes (18% smaller)
test.json.gz = 205 bytes (28% smaller)
If you are able to bundle an RJSON parser, why not just bundle an existing, well-understood and well-tested compression scheme such as http://stuartk.com/jszip/ or https://github.com/olle/lz77-kit/blob/master/src/main/js/lz7... instead?
Using the order-2 precise model on this page I get 190 bytes -- and that is still a generic, non-JSON-aware model: http://nerget.com/compression/
Along these lines - shipping a schema with the data payload is Avro-like ... which is also questionable in terms of efficiency when compared with gzip/LZO.
`Content-encoding: gzip` anyone?
EDIT: or as a comment above states, compress (gzip/deflate) it yourself. Not the most elegant, but if space is an issue.
I had to do this for an application which streamed several hundred data points per second to a browser client. Both data size on the network and parsing time in the browser were my biggest issues. Since it was an HTML app I had to use either JSON.parse or custom parsing written in JavaScript, the second option being too slow to be viable. I ended up with something based almost entirely on JSON arrays, where the client knows what the different items in each array mean. With his example it would look something like this:

[7, ["programming", "javascript"], ["Peter", "Griffin", "Homer", "Simpson", "Hank", "Hill"]]

So in other words there's just enough overhead to make it parsable by JSON.parse, but otherwise you only transfer the actual data.
Note that I wouldn't recommend this option unless you really hit a wall where you realize that this in fact is a bottleneck.
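Decoding a format like that is trivial once both ends agree on the layout out-of-band. A sketch of what the receiving side might look like (the field meanings for positions 0 and 1 are my guesses at the example above):

```javascript
// Decode [id, tags, flatNames] where flatNames is first/last pairs.
// The layout is agreed on out-of-band -- that's the whole trick:
// no keys on the wire, just JSON.parse-able arrays.
function decode(packet) {
  const [id, tags, flat] = packet;
  const users = [];
  for (let i = 0; i < flat.length; i += 2) {
    users.push({first: flat[i], last: flat[i + 1]});
  }
  return {id, tags, users};
}

const msg = JSON.parse(
  '[7,["programming","javascript"],["Peter","Griffin","Homer","Simpson","Hank","Hill"]]'
);
const decoded = decode(msg);
// decoded.users[0] -> {first: "Peter", last: "Griffin"}
```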
I agree with Too that a specialized protocol will always win, but IMO RJSON decreases structural entropy almost to zero without the need to debug your own protocol.
{"users": [
    {"first": "Homer", "last": "Simpson"},
    {"first": "Hank", "last": "Hill"},
    [2, "Peter", "Griffin"]
]}
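A toy decoder for the back-reference idea, under my reading of it: an object registers its key set, a later array [n, v1, v2, ...] reuses the key set registered under n (using the key count as the index, which matches the 2 above), and a leading 0 escapes a literal array. The real RJSON has more rules than this sketch models:

```javascript
// Toy decoder for RJSON-style back-references (my assumptions, not the
// actual spec): plain objects register their key set under their key
// count; [n, v1, ...] reuses the key set registered under n; [0, ...]
// is treated as an escape for a literal array.
function decodeBackrefs(items) {
  const keysets = {}; // key count -> last key set seen with that count
  return items.map(item => {
    if (!Array.isArray(item)) {
      keysets[Object.keys(item).length] = Object.keys(item);
      return item;
    }
    if (item[0] === 0) return item.slice(1); // escaped literal array
    const keys = keysets[item[0]];
    const obj = {};
    keys.forEach((k, i) => { obj[k] = item[i + 1]; });
    return obj;
  });
}

const restored = decodeBackrefs([
  {first: "Homer", last: "Simpson"},
  [2, "Hank", "Hill"],
  [2, "Peter", "Griffin"]
]);
// restored is the original three user objects
```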
There was a tester linked from the site. Looks like he added the [0, to handle that case. It comes to:
{
"users": [
{
"first": "Homer",
"last": "Simpson"
},
[
2,
"Hank",
"Hill"
],
[
0,
2,
"Peter",
"Griffin"
]
]
}

What if I look at your API output and assume it's plain JSON (say it only contains unique items, so nothing got packed), but it's actually RJSON? Or the other way around?
The most important thing when adding another layer to a protocol is identification.
So, please, put the whole altered object into a rjson root node so it's clear what we're dealing with.
The ideal place to handle this is (as pointed out above) content negotiation - specifically with a media type like 'application/rjson+json'.
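A sketch of what that could look like on the client side. Note that 'application/rjson+json' is the suggestion above, not a registered media type, and rjsonUnpack stands in for whatever decoder the library actually ships:

```javascript
// Pick a parser based on the declared media type, so plain-JSON
// clients and RJSON-aware clients can share one endpoint.
function parseBody(contentType, body, rjsonUnpack) {
  const data = JSON.parse(body); // both forms are valid JSON
  if (contentType.startsWith('application/rjson+json')) {
    return rjsonUnpack(data); // only unpack when the server said so
  }
  return data;
}
```

The server side would mirror this by honoring the request's Accept header before answering with the packed form.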