`python3 -m json.tool somefile.json` or `cat foo.json | python3 -m json.tool` will print it in "one line per node" format. The --sort-keys switch (available since Python 3.5) gives sorted objects as well.
(I have actually encountered an order-dependent JSON-subset parser before, but to my mind, that code is broken.)
The main apps I've seen that depend on JSON structure are for hashing, which would also be broken by whitespace / linebreak variances in pretty-printers.
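To make the hashing point concrete, here is a minimal Python sketch. The helper name `canonical_hash` is made up for illustration, and this is not a full canonicalization scheme (number formatting, for one, is left to `json.dumps`). It shows that hashing the raw text is sensitive to layout, while hashing a re-serialized form with sorted keys and fixed separators is not.

```python
import hashlib
import json


def canonical_hash(text: str) -> str:
    """Hash JSON text after normalizing key order and whitespace.

    Hypothetical helper, for illustration only.
    """
    value = json.loads(text)
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


compact = '{"b":1,"a":[1,2]}'
pretty = json.dumps(json.loads(compact), indent=4)  # same data, different layout

# Raw text hashes differ because the bytes differ.
raw_differs = (
    hashlib.sha256(compact.encode("utf-8")).hexdigest()
    != hashlib.sha256(pretty.encode("utf-8")).hexdigest()
)

# Canonical hashes agree because layout and key order are normalized away.
canonical_same = canonical_hash(compact) == canonical_hash(pretty)
```

The trade-off is that anything relying on the original member order is deliberately ignored by this kind of hash.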
Something like:
open file.json | select colors | each { ^echo $it.hex }
is much nicer than jq
json_pp < somefile.json
The only thing I don't like is that it doesn't process command-line arguments; you have to pipe the file in. It is also fairly strict: I've run into a number of malformed JSON files that it rejects but other parsers would accept. Naked TRUE/FALSE values are one thing it hates, and they are super common, especially from places like Google.
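For comparison, Python's standard `json` module is strict about the same thing. This small sketch does not model json_pp itself; it only shows that an upper-case `TRUE` is rejected by a spec-following parser while lower-case `true` is accepted.

```python
import json

# Lower-case literals are what the JSON grammar allows.
strict_ok = json.loads('{"enabled": true}')

# Upper-case TRUE is not a JSON literal, so a strict parser refuses it.
try:
    json.loads('{"enabled": TRUE}')
    rejected = False
except json.JSONDecodeError:
    rejected = True
```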
{
  "colors": [
    { "color": "black", "hex": "#000", "rgb": [ 0, 0, 0 ] },
    { "color": "red", "hex": "#f00", "rgb": [ 255, 0, 0 ] },
    { "color": "yellow", "hex": "#ff0", "rgb": [ 255, 255, 0 ] },
    { "color": "green", "hex": "#0f0", "rgb": [ 0, 255, 0 ] },
    { "color": "cyan", "hex": "#0ff", "rgb": [ 0, 255, 255 ] },
    { "color": "blue", "hex": "#00f", "rgb": [ 0, 0, 255 ] },
    { "color": "magenta", "hex": "#f0f", "rgb": [ 255, 0, 255 ] },
    { "color": "white", "hex": "#fff", "rgb": [ 255, 255, 255 ] }
  ]
}
Numbers to the right make it much more pleasant to my eyes:
{
  "colors": [
    { "color": "black",   "hex": "#000", "rgb": [   0,   0,   0 ] },
    { "color": "red",     "hex": "#f00", "rgb": [ 255,   0,   0 ] },
    { "color": "yellow",  "hex": "#ff0", "rgb": [ 255, 255,   0 ] },
    { "color": "green",   "hex": "#0f0", "rgb": [   0, 255,   0 ] },
    { "color": "cyan",    "hex": "#0ff", "rgb": [   0, 255, 255 ] },
    { "color": "blue",    "hex": "#00f", "rgb": [   0,   0, 255 ] },
    { "color": "magenta", "hex": "#f0f", "rgb": [ 255,   0, 255 ] },
    { "color": "white",   "hex": "#fff", "rgb": [ 255, 255, 255 ] }
  ]
}
You mean:
{
  "colors": [
    { "color": "black"  , "hex": "#000", "rgb": [   0,   0,   0 ] },
    { "color": "red"    , "hex": "#f00", "rgb": [ 255,   0,   0 ] },
    { "color": "yellow" , "hex": "#ff0", "rgb": [ 255, 255,   0 ] },
    { "color": "green"  , "hex": "#0f0", "rgb": [   0, 255,   0 ] },
    { "color": "cyan"   , "hex": "#0ff", "rgb": [   0, 255, 255 ] },
    { "color": "blue"   , "hex": "#00f", "rgb": [   0,   0, 255 ] },
    { "color": "magenta", "hex": "#f0f", "rgb": [ 255,   0, 255 ] },
    { "color": "white"  , "hex": "#fff", "rgb": [ 255, 255, 255 ] }
  ]
}
That's an incredibly XML-ified version of a color table. I can clearly see the tags now. Can't just do a lookup of a color; instead I would have to iterate over the members or store it in a different data structure.
Why even use JSON? Blech.
"colors": { "red": {"rgb": "fff"}, ... }
And before you argue that dictionaries can still be iterated in order, you better check the sibling threads where people are arguing you shouldn’t rely on that.
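A rough sketch of the re-keying being argued for, in Python. The two-entry document is a cut-down copy of the table above and `by_name` is just an illustrative variable name. The list form needs a scan to find a color; the keyed form is a direct lookup.

```python
import json

doc = json.loads("""
{
  "colors": [
    { "color": "black", "hex": "#000", "rgb": [0, 0, 0] },
    { "color": "red",   "hex": "#f00", "rgb": [255, 0, 0] }
  ]
}
""")

# Re-key the list by color name so a lookup no longer needs a scan.
by_name = {
    entry["color"]: {k: v for k, v in entry.items() if k != "color"}
    for entry in doc["colors"]
}

red_hex = by_name["red"]["hex"]
```

Python dicts happen to keep insertion order, but a consumer written against another parser may not see the same order, which is the caveat raised above.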
color   | hex  | rgb ║
red     | #f00 | [3] ║
black   | #000 | [3] ║
yellow  | #ff0 | [3] ║
green   | #0f0 | [3] ║
cyan    | #0ff | [3] ║
blue    | #00f | [3] ║
magenta | #f0f | [3] ║
white   | #fff | [3] ║
It would be nice to inline the rgb column here.
This seems to be a bad idea. The JSON language spec has ORDERED object members. But the order is arbitrary (precisely the one given in the JSON string) and does not have to be lexicographic.
Sorting the object members by default would introduce problems whenever the order matters to the consumer of the JSON.
False. “An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.” [emphasis added][0]
The normative text has the "real" answer, and the real answer is that it's basically undefined behavior. It starts by saying "The names within an object SHOULD be unique", and then elaborates:
An object whose names are all unique is interoperable in the sense that all
software implementations receiving that object will agree on the name-value
mappings. When the names within an object are not unique, the behavior of
software that receives such an object is unpredictable. Many implementations
report the last name/value pair only. Other implementations report an error
or fail to parse the object, and some implementations report all of the
name/value pairs, including duplicates.
JSON parsing libraries have been observed to differ as to whether or not they
make the ordering of object members visible to calling software.
Implementations whose behavior does not depend on member ordering will be
interoperable in the sense that they will not be affected by these
differences.
https://tools.ietf.org/html/rfc8259#section-4
{
  "foo": "this is a comment about foo",
  "foo": "actual value of foo that overwrites the comment"
}
The trick is that the second value of foo overwrites the first. But, clearly, sorting would wreak havoc here (if the value was used in the sort key). ;)
A fun fact about MongoDB is that it will actually store that JSON, with both duplicate keys. The implication is that whatever MongoDB client you're using, that maps Mongo data to dictionaries/maps, is not capable of representing all valid MongoDB documents. It's important to recognize that Mongo may be storing data your client will not be able to access.
I learned this when the Python client was showing one value for a key, and the Ruby client was showing another value for the same key, and neither client was showing the whole document.
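The duplicate-key behaviour is easy to observe with Python's standard `json` module. A minimal sketch: by default the last pair wins, while `object_pairs_hook` exposes every pair in document order. The `reject_duplicates` hook is a hypothetical helper, not something the library ships.

```python
import json

text = '{"foo": "this is a comment about foo", "foo": "actual value of foo"}'

# Default behaviour: the last duplicate wins and the first is silently dropped.
as_dict = json.loads(text)

# object_pairs_hook receives every name/value pair in document order,
# so duplicates stay visible.
as_pairs = json.loads(text, object_pairs_hook=list)


def reject_duplicates(pairs):
    """Hypothetical hook: fail loudly instead of silently dropping a pair."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys in object: {keys}")
    return dict(pairs)
```

Passing `object_pairs_hook=reject_duplicates` to `json.loads` turns the "unpredictable" case from the RFC into an explicit error.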
> An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.
"whenever the order matters to the consumer of the JSON" should be never.
More pragmatically, regardless of what the spec says, a ton of JSON tooling assumes the order doesn't matter and relying on it would be a big mistake.
That's the great thing about specs -- if you don't like what one says, there's always another to support your position. :)
Agreed. But this does not mean that a tool should break it.
My assumption would be that
fn(parse(pretty_print(someJSONString)))
should always evaluate to the same as fn(parse(someJSONString))
(for all functions fn)
It's unclear: At one point it says "An object is an unordered set of name/value pairs." while in the actual grammar it is ordered:
object
    '{' ws '}'
    '{' members '}'
members
    member
    member ',' members
https://www.ecma-international.org/wp-content/uploads/ECMA-4...
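The round-trip expectation raised earlier in the thread (pretty-printing should not change what any consumer sees) can be checked with a small Python sketch. The names `pretty_print` and `ordered_pairs` are illustrative, and parsing into lists of pairs is just one way to make member order observable.

```python
import json


def pretty_print(text: str) -> str:
    """Re-serialize JSON text with indentation, leaving member order alone."""
    return json.dumps(json.loads(text), indent=2)


def ordered_pairs(text: str):
    """Parse while keeping each object's members as an ordered list of pairs."""
    return json.loads(text, object_pairs_hook=list)


original = '{"zebra": 1, "apple": {"y": true, "x": null}}'

# A pretty-printer that only adds whitespace preserves order as well as values.
order_kept = ordered_pairs(pretty_print(original)) == ordered_pairs(original)

# Sorting keys keeps the dict-level meaning but changes what an
# order-sensitive consumer sees.
sorted_text = json.dumps(json.loads(original), indent=2, sort_keys=True)
same_mapping = json.loads(sorted_text) == json.loads(original)
order_changed = ordered_pairs(sorted_text) != ordered_pairs(original)
```

Note that this `pretty_print` goes through a dict, so it would still collapse duplicate keys; a formatter that wants the invariant for every possible fn has to work on the token stream instead.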
The whole idea of pretty notation is automatically inserting non-significant whitespace to make it look nice. Step 2, "one line per node", inserts spaces and newlines. Step 4, "human style", strategically removes some of those so the lines look nice: the 2nd level dict has lots of content, so it was split across multiple lines, while the 3rd level dict has less data, so it all fits on one line.
As opposed to this, Tree Notation is all about single canonical representation. So whitespace is significant, and you can never add or remove it to make output look nicer. You do whatever your schema tells you, and I hope you like many short lines.
So what they are really talking about is just pretty code. Their favorite examples utilize alignment (tree notation does that better; every tree doc is isomorphic to a spreadsheet and you don't have to align things to the left spine, and there are grid langs that don't do that).
The colors et al are called "secondary notations" and again Tree Notation can't be beat. Adding secondary notations is simple. Here's an example: https://www.youtube.com/watch?v=vn2aJA5ANUc
And the source for that homepage is here: https://github.com/treenotation/treenotation.org
Always open to PR!
That said, it is just another pun on the text as art thing. In that it doesn't really scale, and you are going to upset someone by not having a codified tool for automatically doing this. (I don't recall seeing align-regex in any popular tool.)
foo.bar.baz = 10
.biz = 12 // foo.bar.biz
..boz.baz = 31 // foo.boz.baz
etc. It basically combines really brittle context-sensitive grammar production with complete lack of greppability.
[ { foo: a bar: 123.45 }
  { foo: abc bar: 6.7 } ]
It's too ugly for humans (too many quotes, too many escape characters, and no comments) and too texty for machines.
Treating the colons as white space, as you've done with the commas, will move you one step closer to The Correct Answer™.
SEN is new. After dealing with JSON broken by missing commas or by a trailing comma at the end of an array, and with some of the team using JavaScript, this was a way of sucking in the broken JSON and fixing it.
Postel tried to warn us.
> ...nice having the extra reminder that the left side of the colon is a key and the right a value.
Totally. IMHO: whitespace, formatting, delimiters are for humans. The parsers can do without. With some exceptions, like your examples of quoting strings to remove ambiguity.
Valid JSON:
{
"key1": "hello",
"key2": "world"
}
You could "trick" a JS formatter into formatting it by wrapping it with a fake function, etc. Some minimal valid JS:
json({
  key1: "hello",
  key2: "world",
});
MongoDB has some JS libraries that use similar tricks to use JS parsers for their shell query format (which is similar to JSON). For example, around line 597: https://unpkg.com/browse/ejson-shell-parser@1.1.1/dist/ejson...
JSON.stringify(JSON.parse(require('fs').readFileSync('myfile.json', 'utf8')), null, 2);
Remark and Unified are some well-known projects that wooorm maintains.
Is that a convention I'm not aware of? Seems a little obtuse and unnecessary, why not just accept two arguments? One less arbitrary usage detail to remember.
At this point I honestly take XML over JSON where I have a choice because of CDATA and comments.
I mean, at least before one has proven a tool's ubiquitous use, use a longer name.
jq just got lucky but I don't think it was because of its name ;).
I like the idea that the incompatible format is off by default.
How does SEN deal with numbers encoded as strings? Is it something like .4? That's a bit confusing.
YAML is “a superset of JSON”, yes, but there are two separate meanings to that:
• YAML has alternative syntactic sugar for expressing the same underlying JSON-equivalent semantics (sort of the same as Avro being canonically a binary compact expression of underlying JSON — in both cases, libraries for the codec expect JSON-encodable data structures as #encode input, and produce JSON-encodable data structures as #decode output)
• YAML has its own semantics (like node type annotations, or references) that JSON doesn’t have, such that documents that use these are no longer transposable into JSON.
I love bullet point #1. I hate bullet point #2.
Personally, I wish there was a name for the reduced subset of YAML that is still a “syntactic superset of JSON”, but which has none of the extended semantics of bullet-point #2.
Many systems that “consume YAML” already actually require their documents to be this “strictly-JSONifiable YAML”! Kubernetes, for example: it might seem to expose a YAML manifest API, but actually, internally, it does everything in JSON. All the resources in k8s etcd are stored in canonicalized JSON. The k8s controller just prettifies that JSON to YAML on its way out to you; and uglifies it back to JSON when you send it in. Which means that any YAML features that don’t survive that translation, can’t be used.
IMHO, if YAML hadn’t been designed with any extended semantics, but instead had strictly targeted being a “sugared alternative encoding of JSON”, I think everyone would have switched to sending YAML in place of JSON a long time ago. Browsers would have likely added YAML parsing as well.
But those added semantics are just so much extra work for everybody. Type annotations are the source of so many vulnerabilities in programs that were unaware their input could “reach in and do things” through those types; and yet many YAML parser libs don’t have any flag to restrict them from decoding these type annotations (i.e. no way to “defuse the bomb”). References change the entire way you have to write a YAML parser, disallowing some types of parsing grammar altogether, meaning you might no longer have access to the first-class parsing solution of your language runtime; meaning that for many runtimes, the YAML codec lib for that runtime is much slower (and more memory-intensive!) than the JSON codec lib for the same runtime. Etc.
Honestly, if we could all agree on a name for “strict, JSONifiable YAML”, and create libraries that only parse/validate/accept that subset of YAML while rejecting the higher-level semantics, those libs—and that interchange format—would be immediately more popular than YAML. The time for this to happen hasn’t passed! We still have a chance!
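One way such a "strict, JSONifiable" check could look, sketched in Python on already-decoded data. Everything here is an assumption for illustration: `is_json_compatible` is a made-up helper, it runs after decoding rather than inside a YAML parser, and it treats a container reachable more than once as the trace of an alias, which depends on how the decoder represents references.

```python
import math


def is_json_compatible(value, _seen=None) -> bool:
    """Return True if `value` uses only the JSON data model.

    Hypothetical helper: accepts dicts with string keys, lists, strings,
    finite numbers, booleans and None. Rejects anything else, including
    containers reachable more than once (shared or cyclic references).
    """
    if _seen is None:
        _seen = set()
    if value is None or isinstance(value, (bool, str, int)):
        return True
    if isinstance(value, float):
        return math.isfinite(value)  # NaN and Infinity are not JSON numbers
    if isinstance(value, (dict, list)):
        if id(value) in _seen:  # same container seen twice: alias or cycle
            return False
        _seen.add(id(value))
        if isinstance(value, dict):
            return all(
                isinstance(key, str) and is_json_compatible(item, _seen)
                for key, item in value.items()
            )
        return all(is_json_compatible(item, _seen) for item in value)
    return False  # sets, dates, custom objects, and so on
```

A real implementation would more likely refuse tags and aliases at parse time, which also avoids ever constructing the risky objects.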
Pretty JSON is inevitably for either logging or config files, and YAML is better at both of those.
1. First notice that there is a world of difference between what users want and what they are willing to achieve. Know this more than anything else. People will ask for all kinds of shit, and.... A wish list is not a fully explored business requirement with known sub-tasks and test cases. A simple ask can become something worthy of a different independent project.
2. Too subjective. Everybody has subtly different personal preferences. In some cases the inability to support some edge case of some language will cause certain users to have an emotional episode. WTF. This is free software providing a convenience that you can easily live without.
3. A lot of work. You have to be very clear about what language, grammar, class of languages, or other variety of characters you are willing to support. For example, there is HTML, and then there are about a billion trillion different HTML template schemes, each with its own syntax, and inside that syntax is a wildly different language than the surrounding HTML.
4. Carve out a measurable portion of your life. This is an investment of time you will never get back. Writing a code beautifier is far more work than it sounds. First, you need a parser. If one does not exist for the language you wish to support in the language or format of your tool you will need to write one. Be careful though, because that parser will have to support conventions that are unique to beautification and not necessarily useful elsewhere. In the case of the HTML example above you will need multiple different parsers that can achieve a nesting of parse trees or achieve harmony of a uniform parse tree beloved by all languages. This is achievable, as I have done it, but good luck.
5. Maintenance. There are always new edge cases, new languages, new grammars, new features and your users will want them all. Set hard boundaries.
------
With the amount of work required you will begin to ask yourself some basic life questions:
Does this tool bring me more money or a better job? Does it bring me prestige AND satisfy a craving for attention? Does it improve my work, as in other real work outside your beautification tool?
In my case, for a while, the tool did allow me access to better jobs with increased pay. It demonstrated I could do things many other developers could not and that I was willing to dedicate some absurd amount of effort to something people actually used. But that will only take your career so far, after which you are just spinning your wheels and burning time.
When I got further in my career I realized I wasn't beautifying my code ever. I had no need for the tool I was maintaining and despite continuous maintenance by me the tool started to decay, because the requirements had grown out of control and I was no longer an end user.
Your final example is just approaching a JSON -> YAML converter. If your complaint about your chosen human readable serialization format is that it isn't human readable enough, then switch to something more inherently human readable instead of writing tools to temporarily transform it.