Tips on adding JSON output to your CLI app

(blog.kellybrazil.com)

183 points · kbrazil · 4y ago · 110 comments

73 comments · 19 top-level

1vuio0pswjnm74y ago· 12 in thread

Honestly I do not really understand this idea because, AFAIK, JSON was designed for Javascript in a web browser. By and large, Javascript-enabled web browsers expect access to generous amounts of memory. This is not the case for the common UNIX userland programs. These programs do not expect large amounts of memory and many are written with the intent that they may be used to process text line-by-line. This JSON idea reminds me of Windows "PowerShell". Microsoft actually has to limit how much memory can be used by the shell. Why is that.

https://devblogs.microsoft.com/scripting/learn-how-to-config...

One of the things I like most about the UNIX userland is that I can use small programs to edit very large files, without needing lots of memory. I want programs that are designed to accommodate the possibility of line-by-line processing.

If the intent is to make output network friendly, maybe something like netstrings is useful. Easy to parse. Low memory footprint.

Seems to me this JSON idea is not designed to improve performance, agility or resource efficiency but to ignore the UNIX example in favour of a different, slower, approach that is perceived as easier for some people to use. Namely those who do not want to spend the time to learn how to use an existing, faster solution with lower resource requirements.

alexiaya4y ago

This isn't some sort of philosophy debate that you're trying to make it out to be. The output of a lot of tools quite simply isn't in a machine-friendly format, and it can be a nightmare to try to write a parser for them yourselves.

You are misinterpreting the Unix philosophy. It's fine to use a bunch of sed, awk, grep, etc. when you're either transforming text or processing already well-structured data. But trying to write a full-fledged parser for something with only human-readable output, especially as a shell script, definitely goes against that philosophy. Congratulations, you've managed to piece together 50 commands in a pipeline and create a monstrosity that's far from the minimalist philosophy.

In fact, I would argue that by using `jc` together with `jq` you can actually create some nice pipelines for parsing the data that will be much more in line with the Unix philosophy.

Nobody ever said this was designed to improve performance, but I have a hard time believing your claims about it being significantly slower, which are not backed up by any source. Most likely, eliminating the JSON conversion would be at most an unnecessary micro-optimization. But if your code was truly performance-critical, you wouldn't be piecing it together with shell pipelines that cause a bunch of unnecessary forks, you'd write it in something like C instead.

And the "JSON was designed for the web browser" argument doesn't hold much water either. You're several decades too late for that; JSON is extremely ubiquitous and used in a lot of non-browser contexts. Sure, some people depending on their needs may use other formats like XML or protobuf, but JSON is still very common.

cormacrelf4y ago

Yes. The performance point boiled down to this:

> These programs do not expect large amounts of memory and many are written with the intent that they may be used to process text line-by-line

Which is only a problem if you are being very silly, don't choose NDJSON (newline-delimited JSON) and instead shove 10GB of data in a big [] array that the parser has to read in all at once. Almost every single JSON library can do NDJSON already. One of the most heavily used JSON-over-stdio applications is the Language Server Protocol, which uses JSON-RPC 2.0 and is entirely NDJSON. Same for about 15 different log-yeeting tools. Nobody has ever suggested switching LSP to plain text for performance reasons, only lower-overhead binary formats that don't throw out everything gained by having structure at all.

Large memory use by JSON is not something inherent to the encoding that plain text is somehow immune to. All sorts of CLI programs read stdin in all at once, and you don't see plain text getting slammed for exorbitant memory use.

In the context of the original post, `jc` etc, we're talking about essentially a constant sized output that's just much easier to parse, so the complaint is not relevant to those at all.
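
To make the NDJSON point concrete, here is a minimal Python sketch of line-by-line processing. It assumes each input line is one complete JSON document. The `iter_ndjson` helper and the sample records are made up for illustration and are not taken from `jc` or any tool mentioned in the thread.

```python
import io
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-blank line.

    Only the current line is held in memory, so memory use depends on the
    largest single record and not on the size of the whole stream.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# io.StringIO stands in for sys.stdin or an open file (hypothetical data).
sample = io.StringIO('{"host": "a", "ms": 12}\n{"host": "b", "ms": 40}\n')
slow = [rec["host"] for rec in iter_ndjson(sample) if rec["ms"] > 20]
print(slow)
```

The same loop works on `sys.stdin` in a pipeline, which is the line-by-line behaviour the parent comment was asking for.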

acmj4y ago

> But if your code was truly performance-critical, you wouldn't be piecing it together with shell pipelines that cause a bunch of unnecessary forks, you'd write it in something like C instead.

You are underestimating the power of unix tools. A chain of unix tools can match or exceed the performance of C programs written by average programmers. That is a true beauty of unix and partly why it is still relevant today. The author has little idea about performance and doesn't understand how unix works; otherwise he wouldn't make arrogant claims like:

> With jc, we can make the linux world a better place until the OS and GNU tools join us in the 21’st century!

talideon4y ago

> AFAIK, JSON was designed for Javascript in a web browser

It wasn't. It was _inspired_ by JS's syntax (and that of Python), but wasn't designed for it. Crockford designed it as a lightweight data exchange format that used a familiar syntax. Quoting from the json.org website itself:

> JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.

JSON isn't terribly difficult to parse, nor does it require "generous amounts of memory". Shy of something like s-expressions, it's about as straightforward as you can get when it comes to structured data.

Netstrings are really useful but they encode strings, not structured data.
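
For readers who have not met netstrings, a rough Python sketch of the framing (`<length>:<bytes>,`) follows. The function names are illustrative only. Note that what comes back is an opaque byte string: any structure inside it still needs its own format, which is the limitation described above.

```python
from typing import Tuple


def encode_netstring(payload: bytes) -> bytes:
    """Frame payload as <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(data: bytes) -> Tuple[bytes, bytes]:
    """Return (payload, remaining bytes) for the first netstring in data."""
    length_text, sep, rest = data.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(length_text)
    # The byte right after the payload must be the terminating comma.
    if len(rest) < length + 1 or rest[length:length + 1] != b",":
        raise ValueError("payload is truncated or missing trailing comma")
    return rest[:length], rest[length + 1:]


framed = encode_netstring(b"hello world!")
payload, remaining = decode_netstring(framed)
print(framed, payload, remaining)
```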

gizdan4y ago

JSON in shell might not be faster (or it might be, I've not benchmarked), but it certainly is more efficient to do select and filter and whatnot using something like jq. It's not about network, it's about making the output more predictable when running a script. I've lost count of the number of times I've tried to capture a specific part of the output of a command only to be tripped up by an edge case like spaces or something else.

pjmlp4y ago

Powershell idea isn't new, it is how REPL worked across all Xerox PARC workstations.

While Windows isn't a whole language OS like those, .NET and COM get pretty close to it, and that is what PowerShell knows about, instead of raw text.

This is what is missing across most traditional UNIX shells, integrate raw text, UNIX IPC (and newer ones like D-BUS/gRPC), shared libraries, structured data, into a single REPL experience.

kbrazilOP4y ago

In actuality most CLI output is quite small and can easily be represented as a single JSON document. There are a few commands that can produce huge amounts of output and those are good candidates for using JSON Lines as noted in the article.

JC has streaming parsers that lazily output JSON lines for these types of commands. (ls, ping, vmstat, iostat, etc.)
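
As a rough illustration of the lazy approach (this is not JC's actual implementation, just a sketch with a hypothetical `rows_to_json_lines` helper and made-up sample data), a generator can turn a whitespace-separated table into JSON Lines one row at a time:

```python
import json
from typing import Iterable, Iterator


def rows_to_json_lines(lines: Iterable[str]) -> Iterator[str]:
    """Treat the first non-blank line as column names, then lazily emit
    one JSON document per later row. Values are left as strings."""
    header = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if header is None:
            header = fields
            continue
        yield json.dumps(dict(zip(header, fields)))


# Hypothetical vmstat-like input; in a pipeline this would be sys.stdin.
table = ["r b free\n", "1 0 2048\n", "0 0 2040\n"]
for doc in rows_to_json_lines(table):
    print(doc)
```

Because each row is emitted as soon as it is read, a consumer can start working before the producing command has finished.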

jiggawatts4y ago

The default behaviour of PowerShell is to stream objects one by one, the same as typical UNIX shells. In principle, it can process unlimited amounts of data on a single pipeline with a small, fixed amount of memory. The exceptions are commands like Sort-Object, which do require everything to be held at once in RAM. In theory, it could do an offline sort like the UNIX "sort" command does, but the issue is that that might break some scripts that rely on .NET objects that aren't serializable. If you're super keen, it would be possible to add this feature and develop a "Sort-ObjectOffline", at the risk that very rarely it might shred some objects...

The problem with JSON is that it does not support streaming by default. It's possible to use non-standard JSON-like formats to work around this, but then you're no longer using JSON!

sixothree4y ago

> The problem with JSON is that it does not support streaming by default.

ndjson is worth knowing about. We use it for things too large to stream.

https://github.com/ndjson/ndjson-spec

1vuio0pswjnm74y ago

"The problem with JSON is that it does not support streaming by default."

This summarises the problem I have with JSON more succinctly. It was not designed for streaming, thus "it does not support streaming by default".

Non-standard, line-oriented JSON formats are usable, although as a user I cannot see how they offer any significant improvement over previous approaches with fewer brackets, braces, colons, commas and quotes (BBCCQ). Consider the BSD utility mtree or the BSD-version of stat. These have options to output text in "shell-friendly" formats,1 minus all the BBCC and excessive Q. Sure, people could add options to utilities to output XML, or line-oriented JSON, but generally they don't. Why is that. Perhaps there is a reason.

You said it best: "It's possible to use non-standard JSON-like formats to work around [JSON's limitations], but then you're no longer using JSON!"

Maybe JSON is just about hype or something. An attractant for today's "developers". This would explain why I am just not attracted by it.

https://man.netbsd.org/mtree.8

https://man.netbsd.org/stat.1

tester7564y ago

for me it is:

readable,

handy since probably every language has libs that work with it fine,

there's a lot of tools that work with jsons e.g generating code classes from json,

it's insanely popular,

really easy to learn

wruza4y ago

JSONL is the answer for arrays. “[…50mb…]” is too big to be processed in a streaming mode, but particular items usually cannot be split anyway, and “{…}\n50mb more” is what you need.

enriquto4y ago· 8 in thread

If you ever need to fight against this annoying json trend (e.g., when your tool only emits certain information in stupid ungrepable json), consider filtering the output through a gron so that it becomes saner.

redact2074y ago

Jq makes life easier. I'd say for complex output, using filters is more readable than using awk to extract random positions of substrings

trulyme4y ago

Thank you, sounds very useful for quick queries: https://github.com/TomNomNom/gron

hencoappel4y ago

Grep is for simple text, jq allows much more powerful searching/selection. Don't get me wrong, I believe it should be a choice and not force JSON on users, but for some it's useful.

dylan6044y ago

why is JSON ungrepable?

    grep key file.json | awk -F: '{print $2}'

if you're already searching for a key, seems like you're just wanting the value.

granted, i hardly ever (have i actually ever??) interact with JSON this way, so i'm not exactly familiar with pitfalls.

fragmede4y ago

The pitfall is that JSON has zero guarantees for how often line breaks do and don't occur, and is often used to represent hierarchical data. Grepping for 'key: foo', and some liberal use of -A and -B may find you what you're looking for, but grep is simply the wrong tool for that job. (And how do you handle a key with newlines in it?) jq [0] is the right tool, but jq's syntax is its own, and is harder to use (unless you use it regularly).

[0] https://stedolan.github.io/jq/

b3morales4y ago

When it's not formatted, just emitted as a single line.

(Granted grep still works, but...not nicely.)

nrclark4y ago

JSON is greppable if all you need is a simple key-value from a known format and indentation. It's much harder if you don't know the indentation/line breaks, or if it's whitespace-free, or if your key can ever appear in your data.
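
A small Python sketch of those failure modes, using made-up data and a toy `grep` function that only approximates what `grep key` does: the same document serialized two ways gives very different line-based matches, while a parser gives the same answer for both.

```python
import json

# Hypothetical document: "key" appears as a top-level key, as a nested key,
# and inside a string value.
data = {"name": "key holder", "key": "abc", "nested": {"key": "xyz"}}
pretty = json.dumps(data, indent=2)
compact = json.dumps(data, separators=(",", ":"))


def grep(text: str, needle: str) -> list:
    """Line-oriented substring match, roughly what `grep needle` does."""
    return [line for line in text.splitlines() if needle in line]


pretty_hits = grep(pretty, "key")    # three lines, only one is the key we want
compact_hits = grep(compact, "key")  # one line: the entire document
parsed_value = json.loads(compact)["key"]

print(len(pretty_hits), len(compact_hits), parsed_value)
```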

throwawayboise4y ago

Or just:

    awk -F: '/key/ {print $2}' file.json

simonw4y ago· 6 in thread

I hadn't seen jc before (by the author of this piece: https://github.com/kellyjonbrazil/jc ) - what a great idea! It has parsers for around 80 different classic Unix utilities such that it can convert their output to JSON.

    ~ % dig example.com | jc --dig | jq
    [
      {
        "id": 61315,
        "opcode": "QUERY",
        "status": "NOERROR",
        "flags": [
          "qr",
          "rd",
          "ra"
        ],
        "query_num": 1,
        "answer_num": 1,
        "authority_num": 0,
        "additional_num": 1,
        "opt_pseudosection": {
          "edns": {
            "version": 0,
            "flags": [],
            "udp": 512
          }
        },
        "question": {
          "name": "example.com.",
          "class": "IN",
          "type": "A"
        },
        "answer": [
          {
            "name": "example.com.",
            "class": "IN",
            "type": "A",
            "ttl": 85586,
            "data": "93.184.216.34"
          }
        ],
        "query_time": 29,
        "server": "10.0.0.1#53(10.0.0.1)",
        "when": "Sun Dec 05 15:12:08 PST 2021",
        "rcvd": 56,
        "when_epoch": 1638745928,
        "when_epoch_utc": null
      }
    ]

EdwardDiego4y ago

Oh Christ yes, it supports lsof! The output from lsof has always been hard to script up.

aasasd4y ago

Turns out I can cross a todo item off my list, because I wanted to make exactly the same thing.

Now, to find a query tool with a saner language than Jq...

kbrazilOP4y ago

If you like python you might check out jello[0]. I basically wrote it to give you the power and simplicity of python without the boiler plate in a jq-like form-factor. Jello also allows you to use dot-notation instead of dict bracket notation, so it does make things easier on the command line.

Also, there is jellex[1], which is a TUI wrapper around jello that can help you build your queries.

[0] https://github.com/kellyjonbrazil/jello [1] https://github.com/kellyjonbrazil/jellex

ZeroGravitas4y ago

What kind of thing are you trying to do?

jq can get pretty deep but for most things in this area I'm not sure how it could be improved upon, but would be interested in hearing alternatives.

https://github.com/fiatjaf/jiq

Is a realtime feedback wrapper which I find useful when crafting one-off command line uses for jq and it starts getting crazy.

pdimitar4y ago

Have you checked these?

- https://github.com/antonmedv/fx

- https://github.com/antonmedv/gofx

kbrazilOP4y ago

New parser contributions for JC are always welcome!

b3morales4y ago· 6 in thread

Another DON'T, silently switching between JSON and human-readable depending on whether the output destination is a pipe. Just an extra hassle when I'm writing my downstream command. Or could be phrased as a DO: give the user a switch to pick the output format, if you have both.

GauntletWizard4y ago

I started to try to make an argument for pipe autodetection, but I just can't. It seems like a useful feature but is actually a trap. Shell scripts that are going to rely on json output should always explicitly specify that they want json output, and giving them the ability to shortcut that by autodetecting a pipe will only make it easy to ignore that - And then break when some other format comes into vogue. Human readable output should generally be the default, unless the programs are explicitly designed as only part of a pipeline.

Having multiple outputs is a great feature, though. I'm especially fond of tooling in Kubernetes that allows you to nicely pipe things in and out in multiple formats.
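
A minimal Python sketch of the explicit-switch approach, assuming a hypothetical tool called `mytool` with a made-up record list: human-readable text is the default, and JSON is produced only when asked for, with no pipe detection anywhere.

```python
import argparse
import json


def render(records, fmt):
    """Format records as JSON or as tab-separated text."""
    if fmt == "json":
        return json.dumps(records)
    return "\n".join(f"{r['name']}\t{r['size']}" for r in records)


def main(argv=None):
    parser = argparse.ArgumentParser(prog="mytool")
    # The output format is chosen by the user, never inferred from isatty().
    parser.add_argument("--format", choices=["text", "json"], default="text")
    args = parser.parse_args(argv)
    records = [{"name": "foo.txt", "size": 512}]  # placeholder data
    return render(records, args.format)


print(main(["--format", "json"]))
```

A script that depends on JSON then spells out `--format json`, so it keeps working regardless of where its output is going.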

strictfp4y ago

Pretty much every linux tool pulls these shenanigans. I hate it when there's no flag to control output.

lytedev4y ago

What about things like ANSI color codes?

MereInterest4y ago

My preference would be to leave them in if they're the default, but to have an option to switch entirely to a more machine-readable format. It's a single line of sed to strip them out[0], and it's a bigger pain to figure out why a program has different behavior when debugging.

[0] https://superuser.com/a/380778

foodstances4y ago

Hope it supports NO_COLOR (http://no-color.org/)

    env NO_COLOR=1 ...
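
For tool authors, a short Python sketch of honoring that convention. The helper names are hypothetical; the rule implemented is the one described on no-color.org, where `NO_COLOR` being present and non-empty disables color.

```python
import os


def color_enabled(environ=None) -> bool:
    """Return False when NO_COLOR is set to any non-empty value."""
    environ = os.environ if environ is None else environ
    return not environ.get("NO_COLOR")


def paint(text, code="31", environ=None):
    """Wrap text in an ANSI SGR sequence unless color is disabled."""
    if color_enabled(environ):
        return f"\x1b[{code}m{text}\x1b[0m"
    return text


# Passing a dict keeps the example independent of the real environment.
print(repr(paint("error", environ={"NO_COLOR": "1"})))
print(repr(paint("error", environ={})))
```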

b3morales4y ago

Good question; those are more reasonable to me since they aren't visible as characters and don't change the structure. So if I'm looking at the colorized output I can still use that as the basis for a sed or awk script operating on the non-colorized version.

throw0101a4y ago· 5 in thread

See also libxo:

> The libxo library allows an application to generate text, XML, JSON, and HTML output using a common set of function calls. The application decides at run time which output style should be produced. The application calls a function "xo_emit" to produce output that is described in a format string. A "field descriptor" tells libxo what the field is and what it means.

* https://github.com/Juniper/libxo

Then add an "--output-format" option.

GordonS4y ago

A +1 from me on something like `--format` - pipe auto-detection feels unnecessary and like an inevitable footgun.

As just one example, the Azure CLI defaults to human-readable output, but has an "output" parameter so you can have JSON if you want - I've never once wanted any kind of format auto-detection, and I have to say that I still don't.

talideon4y ago

Yup! In fact, there's only one good reason to do automatic pipe detection, and that's if your tool normally outputs ANSI escape codes, which aren't something you want in something going to a pipeline.

hnlmorg4y ago

The problem with auto-detection is that in POSIX-like shells it only works for the simplest of problems (eg `ls` becoming a single column list when piped) because ultimately everything is treated by the shell as a whitespace-delimited list and treated by the OS as an untyped stream of bytes.

However more modern shells fix this problem with having typed pipelines and builtins written to understand more than just a flat file of bytes.

Take _murex_ for example (disclaimer, I'm the author of that shell):

  » jobs
  PID   State      Background  Process  Parameters
  2104  Executing  true        exec     sleep 9000000
  2240  Executing  true        exec     sleep 9000000

It's readable but what if I wanted to pass it as a table?

  » jobs | cat
  ["PID","State","Background","Process","Parameters"]
  [2104,"Executing",true,"exec","sleep 9000000"]
  [2240,"Executing",true,"exec","sleep 9000000"]

ok, so it auto-detects it is running as a pipe and outputs it as a jsonlines table. That would be annoying in Bash. But with a type aware shell, that shell knows it's a jsonlines table, eg

  » jobs | debug | [[ /Data-Type/Murex ]]
  jsonl

...but what can we do with a jsonlines table? Well you can select individual columns:

  » jobs | [ PID State ]
  [
      "PID",
      "State"
  ]
  [
      "2104",
      "Executing"
  ]
  [
      "2240",
      "Executing"
  ]

run SQL against it

  » jobs | select * where PID > 2200
  ["PID","State","Background","Process","Parameters"]
  ["2240","Executing","true","exec","sleep 9000000"]

iterate through each row

  » jobs | foreach proc { if { =$proc[0]>2200 } then { echo $proc } }
  [2240,"Executing",true,"exec","sleep 9000000"]

or even just convert it into another format, like CSV

  » jobs | format csv
  PID,State,Background,Process,Parameters
  2104,Executing,true,exec,sleep 9000000
  2240,Executing,true,exec,sleep 9000000

...or YAML...

  » jobs | format yaml
  - - PID
    - State
    - Background
    - Process
    - Parameters
  - - "2104"
    - Executing
    - "true"
    - exec
    - sleep 9000000
  - - "2240"
    - Executing
    - "true"
    - exec
    - sleep 9000000

And it all just works without you having to think or even know what data format is traversing the pipeline.

However unfortunately none of this is possible with Bash. And thus the majority of tools are forced to be dumb to compensate.

foodstances4y ago

libxo is integrated into FreeBSD and many of its core utilities, so structured output is supported out of the box there.

rurban4y ago

perfect for the systemd island. Horrors for everyone else.

Output should be readable, not structured.

IgorPartola4y ago· 5 in thread

    "kb_read_s": 0.12

This worries me. JSON doesn’t have support for fixed point math, does it? When will some random POSIX tool spit out scientific notation at me.

Also, if you just output a flat schema, is there much of a point in this vs just:

    cpu: 0.2
    kb_read_s: 0.12

The difference is that you can use a JSON parser vs splitting on new lines and colons?

I do like the idea of JSON output as an option but before every bug and mistake gets canonized as POSIX or some other standard can we at least talk about the output format for a bit?
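
On the number question, a small Python sketch with a made-up document. The JSON grammar does allow exponent notation, so a tool could emit `1.2e-1`. A conforming parser reads it like any other number, and a consumer that cares about decimal precision can ask for `Decimal` instead of binary floats.

```python
import json
from decimal import Decimal

# Hypothetical tool output mixing exponent and plain decimal notation.
doc = '{"kb_read_s": 1.2e-1, "total": 0.10}'

as_float = json.loads(doc)
as_decimal = json.loads(doc, parse_float=Decimal)

print(as_float["kb_read_s"])     # parsed as a binary float
print(as_decimal["total"])       # Decimal keeps the digits as written
```

So the format itself carries only decimal text. Whether that becomes a float or a fixed-point value is the consumer's choice.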

zeroimpl4y ago

You'll prefer JSON the minute it becomes:

    cpu: 0.2
    kb_read_s: 0.12
    mac_addr: 10:AA:FF:00:55:66

IgorPartola4y ago

    dict((key.strip(), val.strip()) for key, _, val in (line.partition(':') for line in text.split('\n') if line.strip()))

Also who says we can’t have a universal parser for this format just like we have for JSON? Not everyone needs to write the one liner like above, just use the libtextformat.parse(text) or whatever we would call it.

EuAndreh4y ago

  cut -d: -f2-

toomanybeersies4y ago

> JSON doesn’t have support for fixed point math

Plain text doesn't have support for numbers at all, which isn't much of a solution.

IgorPartola4y ago

Right. So if we replace plain text then let’s maybe do better than JSON or not do it at all.

Let’s put it this way: if I proposed XML as the substitution for plain text, would you rather keep plain text or switch to XML?

wruza4y ago· 4 in thread

CLI and JSON would be amazing if terminals made a step forward too. Because both raw json and triangled in-browser console json logging just suck for daily reading. A new terminal could either detect patterns or use explicit cues in json to format structures, and show raw data on demand. E.g. this json5: [{_repr:"ls:file", name:"foo.txt", type:"text/plain", size:512, access:"664", …}] could be presented as usual ls does, but processed further as json. A whole lot of representations (and editors) – including ui-based – could be added to the system (e.g. /usr/local/share/repr/ls:file (+x)) to format any sort of data, instead of formatting it in-program with getopt and printf. And when there is no repr file, well, you still have triangles mode. We’re too stuck with text=text=text idiom. Structure=text=ui would be so much better.

(I’m aware of powershell and am ignoring it consciously)

cwalv4y ago

This kind of reminds me of the ipython display protocol

https://carreau.github.io/posts/29-JupyterCon-DisplayProtoco...

laumars4y ago

Such a shell already exists

https://github.com/lmorg/murex

You’d have to learn a new shell syntax but at least it’s compatible with existing CLI tools (which Powershell isn’t)

wruza4y ago

I was talking about terminal [emulators], not a shell. A shell has nothing to do with how the output stream is displayed. Murex is more like powershell in this regard, which is on/around the level of implementation that I personally find inappropriate.

nicoburns4y ago

Nushell does at least some of this

nyuszika7h4y ago· 3 in thread

I don't really understand the point about flattening.

> This way I can easily filter the data in jq or other tools without having to traverse levels.

How is doing `jq '.cpu.speed'` any harder than doing `jq '.cpu_speed'`?

IMO as long as you aren't going insane with nesting levels, it's actually better to have a proper structure than dumping everything into an ugly flat object.

kbrazilOP4y ago

The article is not advocating flattening willy nilly. The point is to have bias for flatter structures to make it easier for the user, but of course not all structures can or should be flat. On the other hand, don’t over-engineer your data structure so it makes finding things difficult.

Grabbing an attribute is not necessarily any harder in a deeply nested structure, but filtering based on multiple deeply nested attributes in different branches can make a query quite complex.

nyuszika7h4y ago

> The article is not advocating flattening willy nilly.

It certainly seems like it does when the first example for flattening is oversimplifying an already simple structure that doesn't really need flattening. Maybe that was not meant to be a serious example but rather just for ease of understanding, but then the article should have probably said so.

> filtering based on multiple deeply nested attributes in different branches can make a query quite complex

Can you elaborate on this please? Maybe I'm just too tired to think clearly at 1 am, but I don't see how filtering is any harder. You would just do something like `jq '.foo | select(.bar.baz >= 42 and .qux.moo.asd == "abc")'`.

anyfactor4y ago

Deep nesting makes it harder to access data in a systematic way.

If you are accessing deeply nested data, that means you have to account for the existence of keys at every layer. If cpu exists, then see if speed exists, then access speed. Nothing wrong with deep nesting as long as you can guarantee a key and data will be generated, but more often than not when the data is not being generated the JSON data and the key will simply not exist.

And people do get carried away with nesting. Also it is nice to have core information available at the surface level of JSON file.
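
The layered existence checks described above can be sketched in Python like this. `dig` is a hypothetical helper and the `cpu`/`speed` keys are only the example from this thread; it walks nested dictionaries and falls back to a default as soon as any level is missing.

```python
def dig(obj, *keys, default=None):
    """Walk nested dicts, returning `default` as soon as a key is missing."""
    current = obj
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


full = {"cpu": {"speed": 3.2}}
partial = {"cpu": {}}

print(dig(full, "cpu", "speed"))
print(dig(partial, "cpu", "speed"))
print(dig({}, "cpu", "speed", default=0))
```

With a flat `cpu_speed` key the same lookup is a single `.get("cpu_speed")`, which is the convenience being argued for. Whether that outweighs losing the structure is the disagreement in this sub-thread.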

nickysielicki4y ago· 2 in thread

I don't know if it's just me but I've found jq and/or json output of little relevance to my day-to-day command line usage. I will almost always reach for `python -c` before I reach for jq -- for better or worse.

kbrazilOP4y ago

That’s why I created jello[0]. You get the power of python without the boilerplate so the experience is closer to jq with python syntax.

[0] https://github.com/kellyjonbrazil/jello

nickysielicki4y ago

Starred and bookmarked, will definitely reach for this the next time I need to work with json in a pipeline. Thanks for the link! Cool project.

gumby4y ago· 1 in thread

If you’re going to make a schema, which is a good idea, then make a command line option that emits it.

kbrazilOP4y ago

Yep, in JC you can see the schema for any parser like so:

    $ jc -h --dig

agjmills4y ago· 1 in thread

Imagine if http APIs had similar output to lsof or df -h - nobody would write a script to use them! JSON makes a lot of sense, but a human-parseable format is also needed.

hnlmorg4y ago

You're conflating two different concerns:

1. human review

2. scripting / automation

In case one, human readable formats are obviously preferable. But the moment you need to script a command, you want it in a machine readable format.

A perfect example of the difference between the two is how badly spaces in file names are handled. Granted, POSIX deserves a lot of the blame here too.

setheron4y ago· 1 in thread

When I was at Amazon a long time ago, there was a suite of tools that would use structured text.

I think it was called "recs"

kseistrup4y ago

There's also GNU recutils: https://www.gnu.org/software/recutils/

lukeholder4y ago

MacOS Monterey added a CLI tool for speedtesting, and I noticed they have a

  -c: Produce computer-readable output

So I tried it out:

  ~ networkquality -c | jq '{dl: (.dl_throughput / 1000000), ul: (.ul_throughput / 1000000)}'

awesome!

  {
    "dl": 176.861488,
    "ul": 6.742952
  }

(Starlink in Perth, Western Australia)

umvi4y ago

In addition to an option for writing output as JSON, consider also adding an option for streaming output to stdout. Those two features were added to GCC9 gcov and are what enabled me to write a tool that parallelizes coverage report generation.

In practice this enabled generating coverage reports orders of magnitude faster than traditional gcov wrappers like lcov

anyfactor4y ago

My opinion about JSON is that-

1. You don't need to categorize every piece of data

2. You don't need to include everything in a single JSON file.

Deep nesting JSON is very annoying. The key-value pair structure of JSON is simply being abused at this point. Also, I really don't appreciate using numerical values as keys. Please use a list.

gorgoiler4y ago

I took a leaf out of zfs’s book and make all my apps’ output look like zfs/zpool.

Optional header, selectable columns, one line per record, machine readable (raw) vs human readable numbers.

I’ve nothing against JSON output but I just don’t need it when you can print out two columns, select on the first, and print the second.

  $ users -H -o name,hair |
  > awk '$1 == "gorgoiler" {print $2}'
  gray

Admittedly, that awk invocation is so commonly used it could probably be a lot more terse. Also, this whole house of cards collapses when you have data containing spaces.

giaour4y ago

This is one area where I find Powershell preferable. It allows commands to return structured data in a standardized way, which really helps interop between programs from different publishers

lunfard0004y ago

come to the powershell-side, we have ConvertFrom-Json

billpg4y ago

This would be a great option for find/xargs. -print0/-0 do the job for each other but then there's a disconnect with a user being able to view the list of files.

j / k navigate · click thread line to collapse

110 comments

73 comments · 19 top-level

1vuio0pswjnm74y ago· 12 in thread

https://devblogs.microsoft.com/scripting/learn-how-to-config...

If the intent is to make output network friendly, maybe something like netstrings is useful. Easy to parse. Low memory footprint.

alexiaya4y ago

In fact, I would argue that by using `jc` together with `jq` you can actually create some nice pipelines for parsing the data that will be much more in line with the Unix philosophy.

cormacrelf4y ago

Yes. The performance point boiled down to this:

> These programs do not expect large amounts of memory and many are written with the intent that they may be used to process text line-by-line

In the context of the original post, `jc` etc, we're talking about essentially a constant sized output that's just much easier to parse, so the complaint is not relevant to those at all.

acmj4y ago

> But if your code was truly performance-critical, you wouldn't be piecing it together with shell pipelines that cause a bunch of unnecessary forks, you'd write it in something like C instead.

> With jc, we can make the linux world a better place until the OS and GNU tools join us in the 21’st century!

talideon4y ago

> AFAIK, JSON was designed for Javascript in a web browser

Netstrings are really useful but they encode strings, not structured data.

gizdan4y ago

pjmlp4y ago

Powershell idea isn't new, it is how REPL worked across all Xerox PARC workstations.

While Windows isn't a whole language OS like those, .NET and COM gets pretty close to it, and that is what PowerShell knows about, instead of raw text.

This is what is missing across most traditional UNIX shells, integrate raw text, UNIX IPC (and newer ones like D-BUS/gRPC), shared libraries, structured data, into a single REPL experience.

kbrazilOP4y ago

JC has streaming parsers that lazily output JSON lines for these types of commands. (ls, ping, vmstat, iostat, etc.)

jiggawatts4y ago

The problem with JSON is that it does not support streaming by default. It's possible to use non-standard JSON-like formats to work around this, but then you're no longer using JSON!

sixothree4y ago

> The problem with JSON is that it does not support streaming by default.

ndjson is worth knowing about. We use it for things too large to stream.

https://github.com/ndjson/ndjson-spec

1vuio0pswjnm74y ago

"The problem with JSON is that it does not support streaming by default."

This summarises the problem I have with JSON more succinctly. It was not designed for streaming, thus "it does not support streaming by default".

You said it best: "It's possible to use non-standard JSON-like formats to work around [JSON's limitations], but then you're no longer using JSON!"

Maybe JSON is just about hype or something. An attractant for today's "developers". This would explain why I am just not attracted by it.

https://man.netbsd.org/mtree.8

https://man.netbsd.org/stat.1

tester7564y ago

for me it is:

readable,

handy since probably every language has libs that work with it fine,

there are a lot of tools that work with JSON, e.g. generating code classes from JSON,

it's insanely popular,

really easy to learn

wruza4y ago

JSONL is the answer for arrays. “[…50mb…]” is too big to be processed in a streaming mode, but particular items usually cannot be split anyway, and “{…}\n50mb more” is what you need.

enriquto4y ago· 8 in thread

redact2074y ago

jq makes life easier. I'd say that for complex output, using filters is more readable than using awk to extract substrings at arbitrary positions.

trulyme4y ago

Thank you, sounds very useful for quick queries: https://github.com/TomNomNom/gron

hencoappel4y ago

Grep is for simple text, jq allows much more powerful searching/selection. Don't get me wrong, I believe it should be a choice and not force JSON on users, but for some it's useful.

dylan6044y ago

why is JSON ungrepable?

    grep key file.json | awk -F: '{print $2}'

if you're already searching for a key, seems like you're just wanting the value.

granted, i hardly ever (have i actually ever??) interact with JSON this way, so i'm not exactly familiar with pitfalls.

fragmede4y ago

[0] https://stedolan.github.io/jq/

b3morales4y ago

When it's not formatted, just emitted as a single line.

(Granted grep still works, but...not nicely.)
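
To make that concrete, here is a small sketch that emulates a bare `grep` with a line filter and runs it over the same document in pretty-printed and single-line form. The sample document is invented for illustration.

```python
import json

doc = {"name": "example.com.", "ttl": 300, "note": "ttl: see docs"}

pretty = json.dumps(doc, indent=2)
compact = json.dumps(doc, separators=(",", ":"))


def grep(pattern: str, text: str) -> list:
    """Return the lines of text that contain pattern, like a bare grep."""
    return [line for line in text.splitlines() if pattern in line]


# Pretty-printed: the key's line matches, but so does a value that happens
# to contain the same text.
pretty_hits = grep("ttl", pretty)

# Single line: the only possible "match" is the entire document.
compact_hits = grep("ttl", compact)

# Parsing is indifferent to layout.
ttl = json.loads(compact)["ttl"]

print(len(pretty_hits), len(compact_hits), ttl)  # 2 1 300
```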

nrclark4y ago

throwawayboise4y ago

Or just:

    awk -F: '/key/ {print $2}' file.json

simonw4y ago· 6 in thread

    ~ % dig example.com | jc --dig | jq
    [
      {
        "id": 61315,
        "opcode": "QUERY",
        "status": "NOERROR",
        "flags": [
          "qr",
          "rd",
          "ra"
        ],
        "query_num": 1,
        "answer_num": 1,
        "authority_num": 0,
        "additional_num": 1,
        "opt_pseudosection": {
          "edns": {
            "version": 0,
            "flags": [],
            "udp": 512
          }
        },
        "question": {
          "name": "example.com.",
          "class": "IN",
          "type": "A"
        },
        "answer": [
          {
            "name": "example.com.",
            "class": "IN",
            "type": "A",
            "ttl": 85586,
            "data": "93.184.216.34"
          }
        ],
        "query_time": 29,
        "server": "10.0.0.1#53(10.0.0.1)",
        "when": "Sun Dec 05 15:12:08 PST 2021",
        "rcvd": 56,
        "when_epoch": 1638745928,
        "when_epoch_utc": null
      }
    ]
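
Once the output has this shape, a script can pick fields by name instead of by column position. A small sketch using an abbreviated copy of the output above (the `a_records` helper is made up for illustration):

```python
import json

# Abbreviated copy of the `jc --dig` output shown above.
raw = """
[
  {
    "id": 61315,
    "status": "NOERROR",
    "answer": [
      {
        "name": "example.com.",
        "class": "IN",
        "type": "A",
        "ttl": 85586,
        "data": "93.184.216.34"
      }
    ]
  }
]
"""


def a_records(dig_json: str) -> list:
    """Return the data field of every A record in every response."""
    return [
        answer["data"]
        for response in json.loads(dig_json)
        for answer in response.get("answer", [])
        if answer["type"] == "A"
    ]


print(a_records(raw))  # ['93.184.216.34']
```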

EdwardDiego4y ago

Oh Christ yes, it supports lsof! The output from lsof has always been hard to script up.

aasasd4y ago

Turns out I can cross a todo item off my list, because I wanted to make exactly the same thing.

Now, to find a query tool with a saner language than jq...

kbrazilOP4y ago

Also, there is jellex[1], which is a TUI wrapper around jello that can help you build your queries.

[0] https://github.com/kellyjonbrazil/jello

[1] https://github.com/kellyjonbrazil/jellex

ZeroGravitas4y ago

What kind of thing are you trying to do?

jq can get pretty deep, but for most things in this area I'm not sure how it could be improved upon. I would be interested in hearing alternatives, though.

https://github.com/fiatjaf/jiq

Is a realtime feedback wrapper which I find useful when crafting one-off command line uses for jq and it starts getting crazy.

pdimitar4y ago

Have you checked these?

- https://github.com/antonmedv/fx

- https://github.com/antonmedv/gofx

kbrazilOP4y ago

New parser contributions for JC are always welcome!

b3morales4y ago· 6 in thread

GauntletWizard4y ago

Having multiple outputs is a great feature, though. I'm especially fond of tooling in Kubernetes that allows you to nicely pipe things in and out in multiple formats.

strictfp4y ago

Pretty much every linux tool pulls these shenanigans. I hate it when there's no flag to control output.
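
For what it's worth, an explicit format flag is cheap to add. A minimal sketch of a hypothetical listing tool with `-o/--output` (the flag name and row fields are assumptions, loosely modelled on the Kubernetes-style `-o` mentioned above):

```python
import argparse
import json


def render(rows: list, fmt: str) -> str:
    """Render rows (a list of dicts) as aligned text or as JSON."""
    if fmt == "json":
        return json.dumps(rows)
    return "\n".join(f"{row['name']:<12}{row['size']:>8}" for row in rows)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="hypothetical listing tool")
    # Explicit flag: the output format is chosen by the caller, never guessed.
    parser.add_argument("-o", "--output", choices=["text", "json"], default="text")
    return parser


rows = [{"name": "a.txt", "size": 10}, {"name": "b.txt", "size": 2048}]
args = build_parser().parse_args(["--output", "json"])
print(render(rows, args.output))
```

Keeping the rows as plain dicts until the last step means the text and JSON views cannot drift apart.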

lytedev4y ago

What about things like ANSI color codes?

MereInterest4y ago

[0] https://superuser.com/a/380778

foodstances4y ago

Hope it supports NO_COLOR (http://no-color.org/)

    env NO_COLOR=1 ...
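
On the application side, honoring it takes a few lines. A minimal sketch, assuming the convention described on no-color.org (the variable being present and non-empty disables color) plus a terminal check; the function names are made up.

```python
import os
import sys
from typing import Mapping, TextIO


def use_color(stream: TextIO, environ: Mapping[str, str]) -> bool:
    """Decide whether to emit ANSI color codes on stream."""
    # NO_COLOR: present and non-empty means "no color", whatever the value.
    if environ.get("NO_COLOR"):
        return False
    # Otherwise only color interactive terminals, not pipes or files.
    return stream.isatty()


def paint(text: str, enabled: bool) -> str:
    """Wrap text in a red ANSI escape sequence if color is enabled."""
    return f"\x1b[31m{text}\x1b[0m" if enabled else text


print(paint("error: something failed", use_color(sys.stdout, os.environ)))
```

Passing the stream and environment in as arguments keeps the decision easy to test without touching the real terminal.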

b3morales4y ago

throw0101a4y ago· 5 in thread