Well, considering that jq development was halted for five years and only recently revived, it's no wonder that bug reports, both well-known and new ones, have been sitting there all that time. I bet the maintainers will get up to speed and slowly but surely clear the backlog that has built up.
That's not to take away from jaq by any means; I just find the jq-style syntax uber hard to grok, so jql makes more sense for me.
Everyone seems to want to invent their own new esoteric symbolic query language, as if everything they do is a game of code golf. I really wish everyone would move away from this old Unix mentality of extremely concise yet not-self-evident syntax and do more like the PowerShell way.
With somewhat tabular data, you can use sqlite to read the data into tables and then work from there.
Example 10 from https://opensource.adobe.com/Spry/samples/data_region/JSONDa... (slightly fixed by removing the ellipsis) results in this interaction:
sqlite> select json_extract(value, '$.id'), json_extract(value, '$.type') from json_each(readfile('test.json'), '$.items.item[0].batters.batter');
1001|Regular
1002|Chocolate
1003|Blueberry
1004|Devil's Food
sqlite> select json_extract(value, '$.id'), json_extract(value, '$.type') from json_each(readfile('test.json'), '$.items.item[0].topping');
5001|None
5002|Glazed
5005|Sugar
5007|Powdered Sugar
5006|Chocolate with Sprinkles
5003|Chocolate
5004|Maple
Instead of "select" this could also flow into freshly created tables using "insert into" for more complex scenarios.

I personally don't understand why people aren't willing to learn instead. It's not hard to sit down and pick up a new skill, and it's good to step out of one's comfort zone. I personally hate PowerShell syntax; brevity is the soul of wit, and PS could learn a thing or two from bash and "the Linux way".
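As a sketch of that "insert into" route: the same extraction can land in a real table. This uses Python's bundled sqlite3 module rather than the sqlite3 shell, since readfile() exists only in the shell, so the JSON is passed as a query parameter instead; the data is a trimmed, hypothetical copy of the batters example above.

```python
import json     # only used to serialize the sample document
import sqlite3

# Trimmed sample mirroring the structure of the example JSON above.
data = {"items": {"item": [{"batters": {"batter": [
    {"id": "1001", "type": "Regular"},
    {"id": "1002", "type": "Chocolate"},
]}}]}}

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE batter (id TEXT, type TEXT)")

# json_each (from SQLite's JSON1 functions, bundled in most builds)
# expands the array; INSERT ... SELECT lands the rows in a real table.
con.execute("""
    INSERT INTO batter (id, type)
    SELECT json_extract(value, '$.id'), json_extract(value, '$.type')
    FROM json_each(?, '$.items.item[0].batters.batter')
""", (json.dumps(data),))

for row in con.execute("SELECT id, type FROM batter ORDER BY id"):
    print(row)
```

From there, joins, aggregates, and further inserts work as with any ordinary table.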
We seem obsessed with molding the machine to our individual preferences. Perhaps we should obsess over the opposite: molding our mind to think more like the machine. This keeps a lot of things simple, uncomplicated, and flexible.
Does a painter wish for paints that were more like how he wanted them to be? Sure, but at the end of the day he buys the same paint everyone else does and learns to work with his medium.
- https://steampipe.io/docs/sql/querying-json#querying-json # example with the AWS Steampipe plugin (I think this is a wrapper around the AWS Go SDK)
- https://hub.steampipe.io/plugins/turbot/config # I think this lets you query arbitrary JSON files.
(edited to try to fix the bulleting)
> do more like the PowerShell way
I just checked the GitHub page [1] for Microsoft PowerShell. It looks like it's written in C# and available on Win32/macOS/Linux, wherever .NET is now supported. Do you use PowerShell only on Win32, or on other platforms as well?

> Everyone seems to want to invent their own new esoteric symbolic query language
Can you give an example of something that PS has built in for text processing, instead of a proprietary symbolic query language?

You could ask the same with respect to XML too: why XPath/XSLT instead of SQL?
The problem is that SQL isn't that convenient when you're querying data with a free-form, recursive schema. Especially the latter, because recursive queries in SQL are just not pithy. I say this as someone who loves SQL.
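To illustrate how un-pithy that gets: walking every node of a JSON document, which jq spells as the single token `..`, takes a full recursive CTE in SQLite. A sketch assuming the bundled JSON1 functions, with path building simplified to object keys; the document is made up.

```python
import sqlite3

doc = '{"a": {"b": {"c": 1}}, "d": 2}'
con = sqlite3.connect(":memory:")

# Recursively descend through every node of the document, then keep
# only the integer leaves. jq would express this as `.. | numbers`.
rows = con.execute("""
    WITH RECURSIVE walk(path, value, type) AS (
        SELECT '$', ?, json_type(?)
      UNION ALL
        SELECT walk.path || '.' || je.key, je.value, je.type
        FROM walk, json_each(walk.value) AS je
        WHERE walk.type IN ('object', 'array')
    )
    SELECT path, value FROM walk WHERE type = 'integer'
""", (doc, doc)).fetchall()
print(rows)
```

Correct, but a far cry from two jq tokens.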
N.B. those aliases are not created by default on *nix
It's pipeline-based and procedural, but you can be very declarative in data processing
'|={"b""d"=2, "c"}'
this appears to be something like jq's: 'select(."b"."d" == 2 or ."c" != null)'
which is obviously longer, but I think I prefer it; it's clearer? (Actually it would be `.[] | select(...)`, but I'm not sure something similar isn't true of jql too without trying it. I don't know if the example is intended to be complete, and I don't think it affects my verdict.)
You're not alone. ChatGPT (3.5) is terrible at it also, for anything non-trivial.
I'm not sure if that's because of the nature of the jq syntax, but I do wonder.
Sadly 99% of what I do with jq is “| jq .”
That's a lot of dependencies..
Also re "lots of dependencies": This is kind of unavoidable in Rust because the stdlib is deliberately very lean, and focuses on basic data structures that are needed for interop (e.g. having common string types is important for different libraries to work together with each other) or not possible to implement without specific compiler support (e.g. marker traits or boxing). Contrast this with Go where the stdlib contains things like a full-fledged HTTP server and regex engine. It's easy to build things in Go with a rather short go.mod file, but only because the go.mod file does not show all the stdlib packages that you're using.
SQL is a much more natural language if the data is somewhat tabular.
It is somewhat similar to LINQ in C#, although SQL is more standardised, so I like it more. Also, it would be fantastic to have in-language support for querying raw collections with SQL. Even better: to be able to transparently store collections in SQLite.
It is always sad to see code that takes some data from a DB or elsewhere and then does simple processing using loops or the stream API. SQL is a much higher-level and more concise language for these use cases than Java/Kotlin/Python/JavaScript.
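For now, the "query raw collections with SQL" wish can be approximated, if not transparently, by round-tripping through an in-memory SQLite database. A Python sketch with made-up data:

```python
import sqlite3

# A plain in-language collection...
orders = [("alice", 30), ("bob", 12), ("alice", 5)]

# ...queried with SQL by loading it into an in-memory SQLite table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)", orders)

# Grouping and filtering that would otherwise be loops and dicts.
totals = con.execute("""
    SELECT customer, SUM(amount) FROM orders
    GROUP BY customer HAVING SUM(amount) > 20
    ORDER BY customer
""").fetchall()
print(totals)  # [('alice', 35)]
```

The copy in and out is the part that true in-language support would make disappear.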
I've noticed that what I'm creating are DAGs, and that I'm constantly restarting them from the last-successfully-processed record. Is there a `Make`-like tool to represent this? Make doesn't have SQL targets, but full-featured DAG processors like Airflow are way too heavyweight for gluing together shell snippets.
Also hey, been a while ;)
Edit: I stand corrected, the latest spec (rfc8259) only formally specifies the textual format, but not the semantics of numbers.
However, it does have this to say:
> This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision.
In practice, most implementations treat JSON as a subset of JavaScript, which implies that numbers are 64-bit floats.
However what you say is good practice anyway. The spec (RFC 8259) has this note on interoperability:
> This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.
Are you sure? Looking at https://www.json.org/json-en.html I don't see anything about 64 bit floats.
Many implementations will produce higher precision, but parse as float64 by default. A maximally compatible JSON system should always handle arbitrary precision.
Also, what!! Hey! Miss you man.
> Use decimal number literals to preserve precision. Comparison operations respect precision, but arithmetic operations might truncate.
I tried to do `echo *json | rush -- jaq -rf ./this-program.jq {} | datamash ...` and in that context I don't think it's appropriate to try to get artistic with the tty.
The cause of the errors, for whatever it's worth, is that `jaq` lacks `strftime`.
$ ./jaq-v1.2.0-x86_64-unknown-linux-gnu -sf aoc22-13.jq input.txt
Error: undefined filter
╭─[<unknown>:30:18]
│
30 │ ╭─▶ "bad input" | halt_error
31 │ ├─▶ end;
│ │
│ ╰───────────────── undefined filter
────╯
and (after commenting out halt_error) it's slower than both jq and gojq:

$ time jq -sf aoc22-13.jq input.txt
6415
20056
real 0m0.023s
user 0m0.010s
sys 0m0.010s
$
$ time gojq -sf aoc22-13.jq input.txt
6415
20056
real 0m0.070s
user 0m0.030s
sys 0m0.000s
$
$ time ./jaq-v1.2.0-x86_64-unknown-linux-gnu -sf aoc22-13.jq input.txt
6415
20056
real 0m0.103s
user 0m0.065s
sys 0m0.000s
aoc22-13.jq is here https://pastebin.com/raw/YiUjEu2n
and input.txt is here https://pastebin.com/raw/X0FSyTNf

- yq changed its syntax between version 3 and 4 to be more like jq (but not quite the same, for some reason)
- yq has no if-then-else https://github.com/mikefarah/yq/issues/95 which is a poor design (or omission) in my opinion
So yq works when you need to process YAML; it can even handle comments quite well. But for pure JSON processing, jq is the better tool.
Is this wrong behavior from jq, or some artifact of how the floating-point spec is defined: surprising, but faithful to IEEE 754 nonetheless?
It may be more verbose, but I never have to google anything, which makes a bigger difference in my experience
Not really in "production", but I have a lot of small-ish shell scripts all over the place, mostly in ~/bin, and some in CI (GitHub Actions) as well.
$ echo '{"a": 1, "b": 2}' | jaq 'add'
3
Construct an array from an object in two ways and show that they are equal:
$ echo '{"a": 1, "b": 2}' | jaq '[.a, .b] == [.[]]'
true
But I just looked at jql and I liked it even less. The pedantry about requiring all keys in selectors to be double quoted is, um, painful for a CLI tool.
I think they're kind of stuck in development; even the Mule engine only has one active developer, judging from the GitHub commits...
You learn something new every day. Does anyone have any idea why this might be happening? Seems like more than just a bug...
`jq` is a really powerful tool and `jaq` promises to be even more powerful. But as a system administrator, most of the time that I'm dealing with JSON files, something that behaves more like grep would be sufficient.
It converts your nested JSON into a line-by-line format, which plays better with tools like `grep`.
From the project's README:
▶ gron "https://api.github.com/repos/tomnomnom/gron/commits?per_page..." | fgrep "commit.author"
json[0].commit.author = {};
json[0].commit.author.date = "2016-07-02T10:51:21Z";
json[0].commit.author.email = "mail@tomnomnom.com";
json[0].commit.author.name = "Tom Hudson";
https://github.com/tomnomnom/gron
It was suggested to me in HN comments on an article I wrote about `jq`, and I have found myself using it a lot in my day to day workflow
It flattens the structure. And makes for easy diffing.
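The flattening itself is simple enough to sketch. A minimal gron-like generator in Python (the real gron also handles identifier quoting, arrays of mixed depth, and an ungron direction):

```python
import json

def gron(value, path="json"):
    """Yield gron-style assignment lines for a nested JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, item in value.items():
            yield from gron(item, f"{path}.{key}")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for i, item in enumerate(value):
            yield from gron(item, f"{path}[{i}]")
    else:
        yield f"{path} = {json.dumps(value)};"

doc = {"commit": {"author": {"name": "Tom Hudson"}}}
for line in gron(doc):
    print(line)
# json = {};
# json.commit = {};
# json.commit.author = {};
# json.commit.author.name = "Tom Hudson";
```

Because every leaf carries its full path, plain grep, sort, and diff all work on the output.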
yq -o=props my-file.yaml
The idea is that you get awk/grep like commands for operating on structured data.
Since JSON is JavaScript Object Notation, an obvious non-special-snowflake language for such expressions on the CLI is JavaScript: https://fx.wtf/getting-started#json-processing
It does require justifying a move to a completely different shell, but the way you deal with data there is not restricted to manipulating JSON; it also covers the output of many commands, so you end up with one unified piping interface for all these structured-data manipulations, which I think is neat.
But jq's strength is its syntax; the difficulty is the semantics.
These little one-off unique syntaxes that I'm never going to properly learn are one of my favourite uses of ChatGPT.
Or https://github.com/AtomGraph/JSON2XML which is based on https://www.w3.org/TR/xslt-30/#json-to-xml-mapping
It even looks like we could use an XSLT 3 processor with the json-to-xml function (https://www.w3.org/TR/xslt-30/#func-json-to-xml) and then use XQuery or stay with XSLT 3.
Now I have to test it.
(: file json2xml.xq :)
declare default element namespace "http://www.w3.org/2005/xpath-functions";
declare option saxon:output "method=text";
declare variable $file as xs:string external;
json-to-xml(unparsed-text($file))/<your xpath goes here>
java -cp ~/Java/SaxonHE12-3J/saxon-he-12.3.jar net.sf.saxon.Query -q:json2xml.xq file='/path/to/file.json'

For example, with this query body:

for $price in json-to-xml(unparsed-text($file))/map/map/number[@key="price"]
return $price + 2
For the following JSON document:

{
"fruit1": {
"name": "apple",
"color": "green",
"price": 1.2
},
"fruit2": {
"name": "pear",
"color": "green",
"price": 1.6
}
}
The call to json-to-xml() produces this XML document:

<?xml version="1.0" encoding="UTF-8"?>
<map xmlns="http://www.w3.org/2005/xpath-functions">
<map key="fruit1">
<string key="name">apple</string>
<string key="color">green</string>
<number key="price">1.2</number>
</map>
<map key="fruit2">
<string key="name">pear</string>
<string key="color">green</string>
<number key="price">1.6</number>
</map>
</map>

I simply gave up understanding the whole thing, and restored the balance in the universe by rewriting it in Perl.
But it's such a painful language to look at.
‘cat’ your JSON file and describe what you want: that, I think, should be the way to go.
We have so many JSON query tools now, it's insane.
Another likely reason is that it seems a motivation for jaq is improving on jq's performance. Any low-hanging fruit in the jq implementation was likely picked a long time ago, so further gains there are likely to be hard-won. Writing a brand-new implementation allows trying out different ways of implementing the same functionality, and using a different language known for its performance helps too.
Using a language like Rust also helps with the goal of ensuring correctness and safety.
There are two classes of performance problems:
- implementation issues
- language issues
The latter is mainly a problem in `foreach`, along with some missing ways to help programmers release references (via `$bindings`) that they no longer need.
The former is mostly a matter of doing a variety of bytecode interpreter improvements, and maybe doing more inlining, and maybe finding creative ways to reduce the number of branches.
The closest I’ve gotten is to wrap the APIs with GraphQL. This achieves joining, but requires strict typing and coding the schema and relationships ahead of time, which restricts query flexibility for unforeseen edge cases.
Another is a workflow automation tool like n8n which isn’t as strict and is more user-friendly, but still isn’t very dynamic either.
Postman supports chaining, but in a static way with getting/setting env variables in pre/post request JS scripts.
Bash piping is another option, and seems like a more natural fit, but isn’t super reusable for data sources (e.g. with complex client/auth setup) and I’m not sure how well it would support batch requests.
It would be an interesting tool/language to build, but I figure there has to be a solution out there already.
open http://… | select * where …
# FROM can be omitted because you’re loading a pipe
https://murex.rocks/optional/select.html

[1]: https://github.com/jinyus/related_post_gen
[2]: https://github.com/jinyus/related_post_gen/blob/main/jq/rela...