It's not unique in that regard. 'sed' is Turing complete[1][2], but few people get farther than learning how to do a basic regex substitution.
[1] https://catonmat.net/proof-that-sed-is-turing-complete
[2] And arguably a Turing tarpit.
Closest I've come, if you're willing to overlook its verbosity and (lack of) speed, is actually PowerShell, if only because it's a bit nicer than Python or JavaScript for interactive use.
I think it might be more cognitive load than it's worth to expect everyone, en masse, to learn another single-line, punctuation-driven language to perform everyday tasks with.
I suspect my use cases are less complex than yours. Or maybe jq just fits the way I think for some reason.
I dream of a world in which all CLI tools produce and consume JSON and we use jq to glue them together. Sounds like that would be a nightmare for you.
Here's an example of my white whale, converting JSON arrays to TSV.
cat input.json | jq -S '(first|keys | map({key: ., value: .}) | from_entries), (.[])' | jq -r '[.[]] | @tsv' > out.tsv
<input.json jq -S -r '(first | keys) , (.[]| [.[]]) | @tsv'
<input.json # redir
jq
-S # sort
-r # raw string out
'
(first | keys) # header
, # comma is generator
(.[] | # loop input array and bind to .
[ # construct array
.[] # with items being the array of values of the bound object
])
| @tsv' # generator binds the above array to . and renders to tsv
cat input.json | jq -r '(first | keys) as $cols | $cols, (.[] | [.[$cols[]]]) | @tsv'
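For illustration (a hypothetical sample of my own): if input.json is [{"legs":6,"name":"ant"},{"legs":4,"name":"cat"}], the $cols variant prints a sorted, tab-separated header row followed by the matching value rows:
legs	name
6	ant
4	cat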
That whole map and from_entries throws it off. It's not a good fit for what you're doing: @tsv expects a bunch of arrays, whereas you're producing a bunch of objects (with the header also being one) and then converting them to arrays. That is an unnecessary step and makes it a little harder to understand.
That world exists and is mature already (PowerShell).
jq is the CLI I like the most, but sometimes even I struggled to understand the queries I wrote in the past. celq uses a more familiar language (CEL)
# Common Expression Language
The Common Expression Language (CEL) implements common
semantics for expression evaluation, enabling different
applications to more easily interoperate.
## Key Applications
- Security policy: organizations have complex infrastructure
and need common tooling to reason about the system as a whole
- Protocols: expressions are a useful data type and require
interoperability across programming languages and platforms.
I think my personal preference for syntax would be Python’s. One day I want to try writing a query tool with https://github.com/pydantic/monty
$ cat package.json | dq 'Object.keys(data).slice(0, 5)'
[ "name", "type", "version", "scripts", "dependencies" ]
https://crespo.business/posts/dq-its-just-js/
No more fiddling around trying to figure out the damn selector by trying to track the indentation level across a huge file. Also easy to pipe into fzf, then split on "=", trim, then pass to jq
I was working a lot with Rego (the DSL for Open Policy Agent) and realized it was actually a pretty nice syntax for jq-type use cases.
Of course, this doesn't matter now, I just ask an LLM to make the query for me if it's so complex that I can't do it by hand within seconds.
This and other reasons are why I built: https://github.com/dhuan/dop
You don't have to use my implementation, you could easily write your own.
Sure there are 0.000001% edge cases where that MIGHT be the next big bottleneck.
I see the same thing repeated in various front end tooling too. They all claim to be _much_ faster than their counterpart.
9/10 whatever tooling you are using now will be perfectly fine. Example: I use grep a lot in an ad hoc manner; on really large files I switch to rg. But that is only in a handful of cases.
The difference between 2ms and 0.2ms might sound unneeded, or even silly, to you. But somebody, somewhere, is doing stream processing of TB-sized JSON objects, and they will care. This news is for them.
People would say, "Why use this when it's harder to read and only saves N ms?" He'd reply that you'd care about those ms when you had to read a database from 500 remote servers (I'm paraphrasing. He probably had a much better example.)
Turns out, he wrote a book that I later purchased. It appears to have been taken over by a different author, but the first release was all him and I bought it immediately when I recognized the name / unix.com handle. Though it was over my head when I first bought it, I later learned enough to love it. I hope he's on HN and knows that someone loved his posts / book.
https://www.amazon.com/Pro-Bash-Programming-Scripting-Expert...
Also, performance improvements on heavily used systems unlock:
Cost savings
Stability
Higher reliability
Higher throughput
Fewer incidents
Lower scale-out requirements.
For example, doing the dangerous thing might be faster (no bounds checks, weaker consistency guarantees, etc.), but it clearly tends to be a reliability regression.
So why not compare that case directly? We'd also want to see the performance of the assumed overheads, i.e. how it scales.
That's crazy to think about. My JSON files can be measured in bytes. :-D
Either way, I really doubt there will ever be a significant number of people who'd choose jq for that.
It’s the same sentiment as “Individuals don’t matter, look at how tiny my contribution is.”. Society is made up of individuals, so everybody has to do their part.
> 9/10 whatever tooling you are using now will be perfectly fine.
It is not though. Software is getting slower faster than hardware is getting quicker. We have computers that are easily 3–4+ orders of magnitude faster than what we had 40 years ago, yet everything has somehow gotten slower.
Out of curiosity, have you read the jq manpage? The first 500 words explain more or less the entire language and how it works. Not the syntax or the functions, but what the language itself is/does. The rest follows fairly easily from that.
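To give a flavour of that (my own minimal example, not taken from the manpage): every filter takes an input and produces a stream of outputs, and | feeds one filter's outputs into the next:
echo '[{"a":1},{"a":2}]' | jq '.[] | .a'
1
2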
Say a number; make a real argument. Don't just wave your hand and say "just imagine how right I could be about this vague notion if we only knew the facts"
If I/you were working with JSON of that size where this was important, I'd say you probably need to stop using JSON and move to some other binary or structured format... so long as it has some kind of tooling support.
And further, if you are doing important stuff in the CLI that needs a big chain of commands, you should probably be writing a program to do it anyway...
That's even before we get to the whole "JSON isn't really a good data format" point... and there are many better ways, the old ways or the new ways. One day I will get to use my XSLT skills again :D
"Fast enough" will always bug me. "Still ahead of network latency" will always sound like the dog ate your homework. I understand the perils of premature optimization, but not a refusal to optimize.
And I doubt I'm alone.
> 9/10 whatever tooling you are using now will be perfectly fine
Are you working in frontend? On non-trivial webapps? Because this is entirely wrong in my experience. Performance issues are the #1 complaint of everyone on the frontend team. Be that in compiling, testing or (to a lesser extent) the actual app.
Either the team I worked at was horrible, or you are from Google/Meta/Walmart where either everyone is smart or frontend performance is directly related to $$.
I completely agree with your statement on that - however, you're not addressing the point he makes, which kinda makes your statement unrelated to his point.
99.99% of all performance issues in the frontend are caused by devs doing dumb shit at this point
The frameworks' performance benefits are not going to meaningfully impact this issue anymore, hence no matter how performant yours is, that's still going to be their primary complaint across almost all complex rwcs
And the other issue is that we've decided that complex transpiling is the way to go in the frontend (TypeScript) - without that, all build-time issues would magically go away too. But I guess that's another story.
It was a different story back when eg meteorjs was the default, but nowadays they're all fast enough to not be the source of the performance issues
I don't think I remember one case where jq wasn't fast enough
Now what I'd really want is a jq that's more intuitive and easier to understand
Unfortunately I don’t recall the name, but there was something submitted to HN not too long ago (I think it was still 2026) which was like jq but used JavaScript syntax.
Opencode, ClaudeCode, etc, feel slow. Whatever make them faster is a win :)
The vast majority of Linux kernel performance improvement patches probably have way less of a real world impact than this.
I'm sure there are reasons against switching to something more efficient–we've all been there–I'm just surprised.
You could probably do something similar for a faster jq.
For about a month now I've been working on a suite of tools for dealing with JSON, written specifically for the imagined audience of "people who like CLIs or TUIs and have to deal with PILES AND PILES of JSON and care deeply about performance".
For me, I've been writing them just because it's an "itch". I like writing high performance/efficient software, and there's a few gaps that it bugged me they existed, that I knew I could fill.
I'm having fun and will be happy when I finish, regardless, but it would be so cool if it happened to solve a problem for someone else.
> The query language is deliberately less expressive than jq's. jsongrep is a search tool, not a transformation tool-- it finds values but doesn't compute new ones. There are no filters, no arithmetic, no string interpolation.
Mind me asking what sorts of TB json files you work with? Seems excessively immense.
If you work at a hyperscaler, service log volume borders on the insane, and while there is a whole pile of tooling around logs, often there's no real substitute for pulling a couple of terabytes locally and going to town on them.
Fully agree. I already know the locations of the logs on-disk, and ripgrep - or at worst, grep with LC_ALL=C - is much, much faster than any aggregation tool.
If I need to compare different machines, or do complex projections, then sure, external tooling is probably easier. But for the case of “I know roughly when a problem occurred / a text pattern to match,” reading the local file is faster.
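As a rough sketch of that workflow (the log path and search string here are hypothetical), a fixed-string search in the C locale keeps grep on its fastest path:
LC_ALL=C grep -F 'connection reset' /var/log/myservice/app.log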
Sometimes those will actually need to process through a bunch of data unexpectedly.
Sometimes those will be run on a loop - once per second, N per minute (etc), and the results will be used to monitor a situation until a bug is fixed or a spike in load is resolved or a proper monitoring program/metric can be deployed.
Sometimes those are to investigate a pegged CPU, and the amortized lower runtime across all the tasks on the CPU is noticeable.
We run our machines hot and part of the reason we can do that is being in the habit of choosing lower-cost (in cycles) tooling whenever we can. If I can spend a little time and effort learning a tool that saves a bunch of CPU in aggregate, it's a win. When the whole company does it, we can spend a lot less on hardware than it costs in engineer time to make these decisions.
Another way of putting it is: it's a type of frugality (not cheapness, just spending wisely). If you save a dollar once, it's nothing. If you have a habit of saving a dollar every time the opportunity arises, it adds up quickly. By having a habit of choosing more performant tools, you're less likely to hit a case where you wish you had used more performant tools, and you're practiced at it when the need for pure parsimony arises, so it's less painful.
- Someone likes tool X
- Figures that they can vibe code an alternative
- They take Rust for performance or FAVORITE_LANG for credentials
- Claude implements a small subset of features
- Benchmark the subset
- Claim win, profit on showcase
Note: this particular project doesn't have many visible tells, but there's a pattern of overdocumentation (17% comment-to-code ratio, >1000 words in README, Claude-like comment patterns), so it might be a guided process.
I still think that the project follows the "subset is faster than set" trend.
Usually, a perceptive user/technical mind is able to tweak their usage of the tools around their limitations, but if you can find a tool that doesn't have those limitations, it feels far superior.
The only place where ripgrep hasn't seeped into my workflow, for example, is after the pipe, and that's just out of (bad?) habit. So much so that sometimes I'll foolishly do rg "<term>" | grep <second filter>, then proceed to do a metaphorical facepalm in my mind. Let's see if jg can make me go jg <term> | jq <transformation> :)
(Honestly, who even still writes shell scripts? Have a coding agent write the thing in a real scripting language at least; they aren't fazed by the boilerplate of constructing pipelines with Python or whatever. I haven't written a shell script in over a year now.)
Prioritizing SEO-ing speed over supporting the same features/syntax (especially without an immediately prominent disclosure of these deficiencies) = marketing bullshit
A faster jq except it can't do what jq does... maybe I can use this as a pre-filter when necessary.
But every now and then a well-optimised tool/page comes along with instant feedback and is a real pleasure to use.
I think some people are more affected by that than others.
Obligatory https://m.xkcd.com/1205
However, as someone who always loved faster software and being an optimisation nerd, hat's off!
If you don't mind me asking, which yq? There's a Go variant and a Python pass-through variant, the latter also including xq and tomlq.
A bit of a fun fact: there's a quote by Farah where he said that the language and semantics of the tool he was writing, didn't really "click in" until he was well into writing it :-) I myself have been on occasion pulling my hair out trying to wield `yq`'s language, there's some inconsistencies here and there which I think are related to the novel nature of the language (not novel to everyone but it's uncommon even for those well versed with e.g. SQL). `jq` suffers from similar woes, but to a lesser degree.
Had to spend some effort setting up completions, and there are also some small rough edges around command discoverability, but anyway, much better than the previous oh-my-zsh setup
Ideally, I wish it also had a flag to force users to write type annotations, plus compiling scripts to static binaries and a TUI library; then I'd seriously consider it for writing small apps. But I like and appreciate it in its current state already
Also, there are lots of charts without comparison so the numbers mean nothing...
It looks like jaq has already progressed much further in the right direction than jsongrep has just started in the not-quite-as-right direction.
Second, some comments on the presentation: the horizontal violin graphs are nice, but all tools have the same colours, and so it's just hard to even spot where jsongrep is. I'd recommend grouping by tool and colour coding it. Besides, jq itself isn't in the graphs at all (but the title of the post made me think it would be!).
Last, xLarge is a 190MiB file. I was surprised by that. It seems too low for xLarge. I daily check 400MiB json documents, and sometimes GiB ones.
[0]: https://catalog.data.gov/dataset/?res_format=JSON
[1]: https://catalog.data.gov/dataset/crimes-2001-to-present
The whole tool would be like a few dozen lines of C++ and most likely be faster than this.
> Jq is a powerful tool, but its imperative filter syntax can be verbose for common path-matching tasks. jsongrep is declarative: you describe the shape of the paths you want, and the engine finds them.
IMO, this isn't a common use case. The comparison here is essentially like Java vs Python. Jq is perfectly fine for quick peeking. If you actually need better performance, there are always faster ways to parse JSON than using a CLI.
It's just sparkling memory safe high performance software
It does some kind of stack forking which is what allows its funky syntax
https://github.com/jqlang/jq/issues/1826
So any replacement candidate should also be benchmarked with something like hyperfine "jq .a <<< '{"a": 10 }'" . This one-liner does not work as written but should illustrate the idea.
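One way to make it work (a sketch of my own; tiny.json is a made-up file name) is to keep the JSON in a small file and sidestep the nested quoting entirely:
echo '{"a": 10}' > tiny.json
hyperfine 'jq .a tiny.json'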
Also please just use jshon if you need to just extract specific value from some small JSON. jshon uses way less resources by any conceivable metric.
Everything that can be rewritten in Rust will be rewritten in Rust.
jq is supposed to fit into other bash scripts as a one-liner. That's its superpower. I know very few people who write regex on the fly either (unless they were using it every day); they check the documentation and flesh it out when they need it.
Just use Claude to generate the jq expression you need and test it.
Basically, the double jump to find values in the heap is what slows down these tools the most.
Nice write up. I will try out your tool.
Also "jg" reads very similar to "jq", and initially I thought he was talking about "jq" all along, and I was like: where can I see the "jasongrep" examples? Threw me off for a minute.
https://news.ycombinator.com/item?id=47542182
The reason I was interested, was adding the new tool to arkade (similar to Brew, but more developer/devops focused - downloads binaries)
The agent found no Arm binaries.. and it seemed like an odd miss for a core tool
If the arm64 version was on homebrew (didn’t check if it is but assume not because it’s not mentioned on the page), I’d install it from there rather than from cargo.
I don’t really manually install binaries from GitHub, but it’s nice that the author provides binaries for several platforms for people that do like to install it that way.
To address the concern anyway: I'm sure it will soon be available in brew as an arm binary.
[1]: https://github.com/micahkepe/jsongrep/releases/tag/v0.8.0
$ cat sample.json | jg -F name
I would humbly suggest that a better syntax would be:
$ cat sample.json | jg .name
for a leaf node named "name"; or
$ cat sample.json | jg -F .name.
for any node named "name".
But I will admit, the new syntax makes a lot more sense.
Some bits of the site are hard to read: in "takes a query and a JSON input", the query is in white and the background of the site is very light, which makes it hard to read.
For example, web pages sometimes contain inline "JSON". But as this is not a proper JSON file, jq-style utilities cannot process it
The solution I have used for years is a simple utility written in C using flex^1 (a "filter") that reformats "JSON" on stdin, regardless of whether the input is a proper JSON file or not, into stdout that is line-delimited, human-readable and therefore easy to process with common UNIX utilities
The size of the JSON input does not affect the filter's memory usage. Generally, a large JSON file is processed at the same speed with the same resource usage as a small one
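Purely to illustrate the general idea (this is a gron-style sketch of my own; json2lines is a made-up name, and the author's actual output format may well differ), line-delimited output looks something like:
printf '{"user":{"name":"ann","ids":[1,2]}}' | json2lines
user.name = "ann"
user.ids[0] = 1
user.ids[1] = 2
Output in that shape is then trivial to process with grep, cut, or awk.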
The author here has provided musl static-pie binaries instead of glibc. HN commenters seeking to discredit musl often claim glibc is faster
Personally I choose musl for control not speed
1. jq also uses flex
Just added this new tool to arkade, along with the existing jq/yq.
No Arm64 for Darwin.. seriously? (Only x86_64 darwin.. it's a "choice")
No Arm64 for Linux?
For Rust tools it's trivial to add these. Do you think you can do that for the next release?
netstrings has no such issues
Yes, "cmd <file" is more efficient for the computer but not for the reader in many cases. I read from left to the right and the pipeline might be long or "cmd" might have plenty of arguments (or both). Having "cat file | cmd" immediately gives me the context for what I am working with and corresponds well with "take this file, do this, then that, etc" with it) and makes it easier for me to grok what is happening (the first operation will have some kind of input from stdin). Without that, the context starts with the (first) operation like in the sentence "do this operation, on this file (,then this, etc)". I might not be familiar with it or knowing the arguments it expects.
At least for me, the first variant comes more naturally and is quicker to follow (in most cases), so unless it is performance sensitive that is what I end up with (and cat is insanely fast for most cases).
<file command
which is equivalent to command <file
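A concrete instance (input.json and the filter are hypothetical):
<input.json jq '.name'
which behaves the same as jq '.name' <input.json or jq '.name' input.json.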