The only three formats I've found so far that are readable and well supported are XML, JSON, and YAML. XML is too hefty and wasteful. YAML has had a bad history of insecure encoders and decoders but overall is my favorite data format. However, it still has the downside of needing a special decoder since browsers don't support it, and it requires specific indentation for its hierarchical data format which is wasteful in its own way.
That just leaves JSON in my opinion. It's easily understood and read, and native browser and Node.js encoding and decoding is more than fast enough.
I've always found this pretty semantic. Obviously tool support for reading ASCII or UTF-8 encoded text is very strong, but it's all binary, so it's a question of tools. I do find JSON the most palatable of the text-encoded formats, but I'd adore it if a solid binary replacement gained favor. Unfortunately I think a lot of the protocols that have had pretty good energy behind them (pbuffers, thrift, hessian, etc.) end up either going the compiled-stubs route, or bundling RPC, or both.
I'd really like to see a binary protocol with solid type support (please God, let it define a built-in datetime) that can be written and read dynamically (without a header file/stub).
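To make the datetime complaint concrete, here is a minimal sketch of what JSON users typically do today, assuming the common convention of writing timestamps as ISO 8601 strings. The convention is not part of JSON itself, and `encode_with_datetimes` is a made-up helper name.

```python
import json
from datetime import datetime, timezone


def encode_with_datetimes(value):
    """Serialize to JSON, writing datetimes as ISO 8601 strings (a convention, not part of JSON)."""
    def fallback(obj):
        # json.dumps calls this only for objects it cannot serialize itself.
        if isinstance(obj, datetime):
            return obj.isoformat()
        raise TypeError(f"not JSON serializable: {type(obj).__name__}")
    return json.dumps(value, default=fallback)


stamp = datetime(2013, 5, 6, 12, 30, tzinfo=timezone.utc)
encoded = encode_with_datetimes({"created": stamp})
decoded = json.loads(encoded)

# The reader gets a plain string back and has to know, out of band,
# that this particular field is a timestamp.
restored = datetime.fromisoformat(decoded["created"])
```

The type information is lost on the wire, which is the gap a format with a built-in datetime would close.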
I keep seeing this but having looked at hundred megabyte large JSON objects or XML files, some might as well be binary to my eyes.
The page you linked to doesn't support your argument, BTW. It tries to assert that sexprs support hashes - by extending the syntax with a reader macro!
This line shows the author's confusion: "S-expressions are more powerful (because of the duality of code and data)". The guy is confusing a format that strictly should not have behaviour beyond constructing a data structure, with Lisp more generally. If you allow the data structure to contain code that further interprets the data structure, the complexity of sanitizing input greatly increases.
It probably doesn't fit your "well supported" criterion I suppose, but how about Clojure's EDN? I've been meaning to try it out more myself.
Every time I run across JSON examples, I see REBOL without the elegance. The two languages are related of course. REBOL strongly influenced the design of JSON.
ref: On JSON and REBOL - http://www.rebol.com/cgi-bin/blog.r?view=0522 (HN - https://news.ycombinator.com/item?id=5654895)
So why did you even comment in the first place?
In this case, I'd say JSON not being the fastest it could possibly be doesn't matter too much. As a result, pointing that out isn't particularly valuable, but could be a bit more valuable if there was an easy-to-adopt solution also being proposed.
There are language specific serializations if you never leave that language environment. There are protobufs, ASN.1, messagepack, thrift, bert, avro.
I personally am not rushing to replace it, but I can see the point he is making.
I can live with that. Any solution that people come up with, there will be folks who can point out some sort of flaw in it. Sometimes the flaws are worth paying some attention to, when fixing them might lead to some tangible benefits. It's difficult to imagine a slightly less CPU intensive replacement for JSON bringing any tangible benefits with it, though.
HTML doesn't handle URIs any differently. They're just string attributes on elements (generally).
Is the problem that there aren't clients that can generally consume the JSON and find the hyperlinks? Is the idea that the JSON should be able to be rendered as a site kind of like the whole xml/xslt idea?
I guess I just don't get what the problem is here? What would built-in uri's look like?
Javascript assumes all numbers are double-precision FP, yes?
I don't think JSON requires the number representation to be double-precision, that can be up to the parser to decide how to represent a number. I think most people misrepresent this because the number implementation in JavaScript is double-precision, but that's not necessarily true for JSON. Am I wrong about that? Does JSON require the number format to be double-precision?
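A quick way to check this is with a parser that isn't JavaScript's. The sketch below uses Python's standard `json` module, which maps JSON integers to arbitrary-precision `int`. The field name `id` is just an illustration.

```python
import json

# 2**53 + 1 is the first positive integer an IEEE 754 double cannot represent.
big = 2**53 + 1

text = json.dumps({"id": big})
roundtrip = json.loads(text)["id"]

# Python's parser keeps the integer exact.
exact = (roundtrip == big)

# Forcing the same value through a double drops the low bit,
# which is what a double-only parser would hand you.
as_double = int(float(big))
```

So the precision limit comes from the decoder, not from the text on the wire.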
One huge benefit of JSON over msgpack or something similar is READABILITY.
This is not to say that the arguments are wrong, but I think you may not understand why Erlangers feel the way they do until you realize that Erlangers also don't generally get to experience the advantages of JSON, either; when you get none of the advantages and only the disadvantages, anything looks bad, no matter how good it may be in the abstract.
This is a particularly rich irony in light of the fact that Erlang's native data model is conceptually the closest language I know to natively using JSON as the only data type. Erlang has no user-defined types, and everything is an Erlang term... imagine if JS worked by only allowing JSON, so, no objects, no prototypes, etc. The entire language basically works by sending "ErlON" around as the messages, including between nodes. It's just that there's no good mapping whatsoever between Erlang's "ErlON" and JSON.
Parsing the binary blobs for interop is annoying; see, for example, the Wings file format, made almost entirely of Erlang serialized objects:
http://en.wikibooks.org/wiki/Wings_3D/User_Manual/Wings_File...
This is a glimpse of the last place on earth where relying on US ASCII is considered a positive good.
If JSON did actually specify UTF-8 and provided a sensible escape mechanism for the whole of unicode, this would be an improvement, because it would turn JSON into an actual standard data-exchange format, rather than a textual encoding of its infoset.
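For reference, here is roughly what the existing escape mechanism looks like in practice, using Python's standard `json` module. The `\u` escape only takes four hex digits, so a character outside the Basic Multilingual Plane is written as a UTF-16 surrogate pair. The sample string is arbitrary.

```python
import json

text = "caf\u00e9 \U0001F600"  # "café" followed by an emoji outside the BMP

# Default output is pure ASCII: non-ASCII characters become \uXXXX escapes,
# and the emoji becomes two escapes (a UTF-16 surrogate pair).
ascii_only = json.dumps(text)

# The same value written as raw UTF-8 bytes instead of escapes.
utf8_form = json.dumps(text, ensure_ascii=False).encode("utf-8")

# Both spellings decode to the same string.
same = json.loads(ascii_only) == json.loads(utf8_form) == text
```

Whether the surrogate-pair detour counts as "sensible" is the point under debate.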
Erlang's internal serialization format, meanwhile—and all the language's native pattern-matching constructs—are built on what are basically "generators" that consume a (possibly-infinite) binary stream, and lex tokens directly out of it. JSON doesn't really work with this approach; what you end up doing is having one generator that consumes the binary stream and emits codepoints, and then another generator that consumes codepoints and emits tokens. This introduces a lot of intermediate allocations and message-passing—whereas, with most Erlang protocol handlers, your TCP handler passes you a slice of a VM-managed shared binary, and then you just pass around and re-slice that slice.
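The two-stage pipeline described above can be sketched with Python generators (not Erlang, and not how jsx or any real library does it). Stage one turns byte chunks into codepoints, stage two turns codepoints into tokens. The tokenizer only handles a tiny JSON subset and ignores string escapes; `codepoints` and `tokens` are illustrative names.

```python
import codecs


def codepoints(chunks):
    """Stage 1: consume byte chunks, emit decoded characters one at a time."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        # The decoder buffers incomplete multi-byte sequences across chunks.
        yield from decoder.decode(chunk)
    yield from decoder.decode(b"", final=True)


def tokens(chars):
    """Stage 2: consume characters, emit (kind, text) tokens.

    Simplified on purpose: no escape handling inside strings, no validation.
    """
    buf = []           # characters of the string or bare literal being built
    in_string = False
    for ch in chars:
        if in_string:
            if ch == '"':
                yield ("string", "".join(buf))
                buf, in_string = [], False
            else:
                buf.append(ch)
        elif ch == '"':
            in_string = True
        elif ch in "{}[],:" or ch.isspace():
            if buf:  # a number, true, false or null just ended
                yield ("literal", "".join(buf))
                buf = []
            if not ch.isspace():
                yield ("punct", ch)
        else:
            buf.append(ch)
    if buf:
        yield ("literal", "".join(buf))


# A two-byte UTF-8 character split across chunk boundaries still decodes.
result = list(tokens(codepoints([b'{"n\xc3', b'\xa9": 1', b'2}'])))
```

Every character crosses a generator boundary before the tokenizer sees it, which is the per-character overhead the comment is pointing at.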
https://github.com/talentdeficit/jsx
Erlang has been receiving and sending data. It just likes to deal with well-defined binary messages, and it likes to encode/decode them into its own representations (records, terms) at the boundary where they come in and leave the system.
Going with JSON first makes sense, if you need to create an alternate more efficient protocol in the future you'll at least have a few more data points to use when selecting something new like msgpack/protobuffers/thrift or whatever.
..But only the JSON format will have any users, and I still have to deal with anything I find objectionable about JSON.
The problem isn't really JSON - JSON is an exceptional format for what it is supposed to do - the problem is that Erlang was created for a specific purpose and that purpose wasn't to vend out strings/JSON over HTTP.
JSON strings are translated to binaries. There is no reason to keep repeating "oh, integers=strings". Maps will help.
> The problem isn't really JSON
I think the author disagrees. He talks about the problems of "JSON". There are a few he notes:
* No binaries.
* No way to have hyperlinks
* Limited floating point number implementations
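On the "no binaries" point, the usual workaround is to smuggle bytes through a string. Here is a minimal sketch assuming base64; that choice is a convention between sender and receiver, nothing JSON defines, and the `blob` field name is made up.

```python
import base64
import json

payload = bytes([0, 255, 16, 128])  # arbitrary bytes, not valid text

# Raw bytes are rejected outright by the encoder.
try:
    json.dumps({"blob": payload})
    rejected = False
except TypeError:
    rejected = True

# Workaround: base64 the bytes into an ASCII string first.
encoded = json.dumps({"blob": base64.b64encode(payload).decode("ascii")})

# The receiver has to know that this field is base64 and undo it.
decoded = base64.b64decode(json.loads(encoded)["blob"])
```

It works, but the payload grows by about a third and the receiver needs out-of-band knowledge of which strings are really binaries.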
JSON doesn't solve every problem. It does solve the problem, quite elegantly, of creating a 'endianless' interchange format that is easily consumed, constructed, and debugged.
Oh, wait....
Oh, wait, you haven't read the post.
A general purpose serialization format that requires per-char processing is a terrible pick.
Except that JSON actually just specifies the number type as an arbitrary precision decimal number.
Many implementations use floating point numbers when decoding JSON, but that is not inherent in JSON.
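As an illustration that the float behaviour belongs to the decoder, Python's standard `json` module lets you swap in a decimal type via `parse_float`. The field names below are invented for the example.

```python
import json
from decimal import Decimal

text = '{"price": 0.1, "total": 0.3}'

# Default decoding: binary floating point.
as_float = json.loads(text)
float_sum = as_float["price"] * 3        # 0.30000000000000004

# Same text, decoded as exact decimals instead.
as_decimal = json.loads(text, parse_float=Decimal)
decimal_sum = as_decimal["price"] * 3    # Decimal('0.3')
```

The document is byte-for-byte identical in both cases; only the in-memory representation changes.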
My biggest complaint about numbers in JSON is that floating point numbers often get turned into integers when encoding/decoding with some implementations (e.g. (float)2 gets encoded as just 2, and when decoded it is an integer rather than a floating point number).
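A small sketch of that round-trip problem, using Python's standard `json` module as the decoder. JavaScript's `JSON.stringify`, for instance, writes `2.0` as `2`, so the literal strings below stand in for what different encoders might emit.

```python
import json

# What the decoder returns depends purely on how the number was spelled.
from_bare = json.loads("2")      # int
from_point = json.loads("2.0")   # float

# Python's own encoder keeps the decimal point, so it round-trips...
py_encoded = json.dumps(2.0)

# ...but if the other side wrote a bare 2, the float-ness is gone.
# One blunt workaround: decode every integer literal as a float.
forced = json.loads("2", parse_int=float)
```

The values compare equal either way, but code that branches on the type will see a difference.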
This is precisely the reason it is used so heavily. Easily readable format to reason about. And incredibly slow is a pretty relative term, when you're waiting on database calls or doing other complex logic that is orders of magnitude slower, who cares how "slow" json parsing is.
> It has to be valid UTF-8, meaning it's incredibly slow to validate.
I think being able to embed all sorts of different characters and languages is, again, a plus. See argument above about performance.
> Its numbers representation is double-precision floating-point, meaning it's incredibly imprecise and limited.
This argument I don't get. I'm pretty sure I might be missing something, but in my experience you can just put in a plain old integer and any parser in any language will extract it as an exact integer. Nobody is converting a "1" in JSON into a double/float. Maybe somebody can elaborate on what the author might have actually meant?
So, really, the argument all boils down to: it's slow and wasteful. Well, while that's true, I think it's pretty much been established time and time again that Moore's law has made it possible to value programmer time over CPU overhead, to a reasonable extent (i.e. until the overhead you're adding outpaces Moore's law and makes infrastructure particularly expensive). If you have a format that is human readable, easy to understand, and simple, that helps tremendously in software development, and it would take order-of-magnitude performance hits to really make it a bad tradeoff (and even then, if you weren't getting a lot of traffic, who would care?), not just 2x or 3x.
We live in a web based world, and that world is fundamentally based on a text based protocol (http) and text based messaging formats. There are plenty of valid and good reasons why it happened the way it did.
*Final Note: I would like to see some empirical evidence comparing the CPU cycles and energy that JSON wastes against what would be used if all messages were sent with msgpack or something like it instead. While my own inclination is that the number would be dwarfed by the overall energy used in computation, I would prefer to see evidence rather than conjecture if you're going to make a point like that.
> Maybe somebody can elaborate and what the author might have actually meant?
I think JSON numbers are not necessarily double-precision, at least I don't think I've ever seen them specifically described this way. In JavaScript, however, numbers are double-precision, there are no real integers in JavaScript. So maybe the author misrepresented JSON numbers because JavaScript numbers are double-precision. But that isn't necessarily the case AFAIK.
"On a tangent: IMHO most binary serialization formats like BSON, MessagePack or Protocol Buffers are misdesigns. They add lots of complications to save a couple of bytes. But that does not translate into a substantial improvement of the parsing speed compared to a heavily tuned parser for a much simpler text-based format."
[1] http://www.freelists.org/post/luajit/Adding-assembler-code-t...
It may not be perfect, but compared to the complexity of the XML world and the opaqueness of binary formats JSON is a very pleasant compromise.
Shame about it not having comments though.... :-)
> * It has to be valid UTF-8, meaning it's incredibly
> slow to validate.
If you make basic mistakes about what the format allows, the rest of your comment is liable to be junk.
* It's text-based, meaning it's readable and easy to parse.
* It has to be valid UTF-8, no hassle with other formats.
* Its numbers representation is double-precision floating-point, I can choose what I want.
Who cares? Use what you want and what solves your problem in the best way.