undefined | Better HN

0 pointsvog8y ago0 comments

The so-called "plain text formats" are also just representations of complex objects. Often these formats are ad-hoc and hard to parse correctly. Multiple text files in a file system hierarchy (e.g. /etc) are also just nested structures.

So in principle, treating those as complex object structures is the right way to go. Also, I believe this is the idea behind D-Bus and similar modern Unix developments.

However, what's hard is to provide good tooling with a simple syntax that is simple to understand:

* Windows Registry and Power Shell show how not to do it.

* The XML toolchain demonstrate that any unnecessary complexity in the meta-structure will haunt you through every bit of the toolchain (querying, validation/schema, etc.).

* "jq" goes somewhat into the right direction for JSON files, but still hasn't found wide adoption.

This appears to be a really hard design issue. Also, while deep hierarchies are easier to process by scripts, a human overview is mostly achieved by advances searching and tags rather than hierarchies.

A shell needs to accomodate for both, but maybe we really just need better command line tools for simple hierarchial processing of arbitrary text file formats (ad-hoc configs, CSV, INI, JSON, YAML, XML, etc.).

0 comments

vidarh8y ago

I lean towards wanting all my tools to support at least one common structured format (and JSON generally would seem the best supported) these days. It'd be great if there was a "standard" environment variable to toggle JSON input/output on for apps. If you did that, then e.g. a "structured" shell could just default to turning that on, and try detecting JSON on output and offering according additional functionality.

That'd be a tolerable graduated approach where you get benefits even if not every tool support it.

With "jq", I'm already increasingly leaning on JSON for this type of role.

It'd need to be very non-invasive, though, or all kinds of things are likely to break.

skissane8y ago

I wish there was a way for pipes to carry metadata about the format of the data being passed over the pipe, like a MIME type (plain text, CSV, JSON, XML, etc). Ideally with some way for the two ends of the pipe to negotiate with each other about what types each end supports. I think that sort of out-of-band communication would most likely need some kernel support.

Maybe an IOCTL to put a pipe into "packet mode" in which rather than just reading/writing raw bytes you send/receive Type-Length-Value packets... If you put your pipe in packet mode, and the other end then reads or writes without enabling packet mode, the kernel can send a control message to you saying "other end doesn't support packet mode" and then you can send/receive data in your default format. Whereas, if both ends enter packet mode before reading/writing, then the kernel sends a control message to one end saying "packet mode negotiated", and which triggers the two ends to exchange further control messages to negotiate a data format, before actually sending the data. (This implies pipes must be made bidirectional, at least for control packets.)

zdkl8y ago

As counterpoint if only for the sake of my wetware memory, I'd rather have only a few pipes being as agnostic and generally capable as possible than complex plumbing.

I don't mean to argue there is no value in specialised adapters, but I believe the default should be as general as can be. Let me worry about parsing/formats at the application level and get me simple underlying pipes. If I need something specific I should be prepared to dig into docs and find out what I need anyway, so default text vs install or explicitely configure your system to use something else seems like a sane feature/complexity segregation in the general case.

EDIT: good quote from another thread to illustrate my point: > Removing the responsibility for reliable communication from the packet transport mechanism allows us to tailor reliability to the application and to place error recovery where it will do the most good. This policy becomes more important as Ethernets are interconnected in a hierarchy of networks through which packets must travel farther and suffer greater risks.

replace Ethernet with pipes and the point still has merit IMO. Lifted off of https://news.ycombinator.com/item?id=14675115

Too8y ago

Programs can determine if the receiving output is a TTY or not and enable e.g. colored output accordingly, sounds like something similar.

valarauca18y ago

This is called a "Unix Domain Socket" they're 20+ years old.

vram228y ago

It's an innovative and interesting idea. But just to play devil's advocate for a minute, for this to get widespread uptake, the way it is implemented would have to be agreed on as being the right way, by a lot of people (i.e. users). And there could be lots of different ways of implementing what you describe, with variations, which makes it more difficult (though not impossible) to get accepted widely, compared to the existing plain pipes and passing data as text around between commands, because not much variation is possible there.

jtolmar8y ago

I'd prefer a "crash if receiver doesn't support sender's MIME type" model for debugability.

Which could still be ergonomic if all the common piping programs supported some specific collection of formats and there was a relatively robust format-converting program.

ori_b8y ago

> It'd be great if there was a "standard" environment variable to toggle JSON input/output on for apps.

Look at FreeBsd's libxo. It's supported by most of the base system.

drothlis8y ago

It looks like libxo does the output in structured format, but what about input? (For chaining commands together with pipes). Did the FreeBsd porting work[1] implement the input side with libxo, independently of libxo, or not at all?

(Not to diminish libxo --it looks pretty cool, and I didn't know about it before-- just curious.)

[1]: https://wiki.freebsd.org/LibXo

1 more reply

vidarh8y ago

That looks very interesting, though I actually think that standardising on the command-line/env variable API is more important than an implementation. It's the moment you can "automagically" "upgrade" a pipe to contain structured data for apps that support it that you get the most value.

But given that it's there, I'll certainly consider it when writing tools, and consider supporting the command-line option/env var even if/when I don't use the implementation...

ibotty8y ago

Amazing! I did not know that.

vogOP8y ago

Nice idea! Maybe this issue is similar to that of command completion, but could also result in a similar mess. Every command has its "-h" or "--help", showing the syntax, available options and so on. But since even that is not really standardized, separate command-line completion rules are written for every command. And for every shell.

skissane8y ago

Define a standard format for describing command syntax. I'd suggest basing it on JSON, but you could use XML or Protobuf or Thrift or ASN.1 or Sexprs or whatever. Embed in executables a section containing the command syntax description object. Now when I type in a command, the shell finds the executable on the path, opens it, locates the command description syntax section (if present), and if found uses that to provide tab completion, online help, smarter syntax checking, whatever–if you've ever used OS/400, it has a feature where it constructs a fill-in-form dialog based on the command syntax descriptions, which you can call up at the press of a function key. (Obviously the shell should cache this data, it shouldn't reread /bin/ls every time I type "ls".)

3 more replies

vidarh8y ago

I still miss AmigaOS, where command options parsing was largely standardised ca. 1987 via ReadArgs() (barring the occasional poor/direct port from other OS's) [1]. It wasn't perfect (it only provided a very basic usage summary), but it was typed, and it was there.

[1] http://www.pjhutchison.org/tutorial/cmdline_arrgs.html

1 more reply

codesnik8y ago

jq is awesome. "json lines" data format (where each record is json without literal newlines, and newline separates records) nicely upgrades unix pipes to structured output, so you can mix old unix tools, like head, tail etc with jq. maybe what's needed is some set of simple json wrappers around core utils to increase adoption.

subnixr8y ago

> "json lines" data format (where each record is json without literal newlines, and newline separates records)

Good point, but jq handles this already. If the payload is an array of object, simply `jq -c '.[]'` to get an object per line

gregopet8y ago

jq can't handle big numbers, it silently mangles them, beware! Was bitten by this twice already..

kemayo8y ago

That's per the spec[1], enjoyably:

    Note that when such software is used, numbers that are
    integers and are in the range [-(2**53)+1, (2**53)-1]
    are interoperable in the sense that implementations will
    agree exactly on their numeric values.

2^53 is only 9007199254740990, so it's not too hard to exceed that, particularly in things like twitter status ids.

The recommended use is to have big numbers as strings, since it's the only way to reliably pass them around. (Yes, this is kind of horrible.)

[1]: https://tools.ietf.org/html/rfc7159#section-6

1 more reply

fnord1238y ago

>A shell needs to accomodate for both, but maybe we really just need better command line tools for simple hierarchial processing of arbitrary text file formats (ad-hoc configs, CSV, INI, JSON, YAML, XML, etc.).

Start with record based streams first. Text streams requires [buggy] parsing to be implemented everywhere. It should be possible to have escaped record formats that allow the right side of the pipe to use AWK style $1, $2, $3, etc.

After removing the need to parse the fields, the next priority, imo, would be to introduce integral types so the actual data itself doesn't need to be parsed. u32 on the LHS can just be 4 bytes and then read out on the RHS as 4 bytes. This could save a lot of overhead when processing large files.

Only then would I want to get into hierarchies, product types, sum types, etc.

moe8y ago

This is dead on.

Half of all parsing work consists of splitting things into records, by lines, delimiters or whitespace. That's where the great escaping headache begins.

In a better shell the following command would Just Work™:

> find | rm %path/%filename

int_19h8y ago

Powershell does two things right: it uses structured output, and it separates producing the data from rendering the data.

It also does one thing wrong: it uses objects (i.e. the stuff that carries behavior, not just state). This ties it to a particular object model, and the framework that supports that model.

What's really needed is something simple that's data-centric, like JSON, but with a complete toolchain to define schemas and perform transformations, like XML (but without the warts and overengineering).

still_grokking8y ago

> What's really needed is something simple that's data-centric, like JSON, but with a complete toolchain to define schemas and perform transformations, like XML (but without the warts and overengineering).

What would it be? Is there a real alternative to XML with those features out there? I don't think so.

When you would want to have the features of XML and would design it from scratch I'm quite sure it would have the complexity of XML again.

Usually implementing some functionality yields every time the same level of complexity regardless of how you implement it (given that none of the implementations isn't out right stupid of curse).

int_19h8y ago

> When you would want to have the features of XML and would design it from scratch I'm quite sure it would have the complexity of XML again.

I don't think so. The problem with the XML stack is that it has been designed with some very "enterprisey" (for the lack of better term) scenarios in mind - stuff like SOAP. Consequently, it was all design by committee in the worst possible sense of the word, and it shows.

To see what I mean, take a look at XML Schema W3C specs. That's probably the worst part of it, so it should be readily apparent what I mean:

https://www.w3.org/TR/xmlschema11-1/ https://www.w3.org/TR/xmlschema11-2/

The other problem with XML is that it's rooted in SGML, and inherited a lot of its syntax and semantics, which were designed for a completely different use case - marking up documents. Consequently, the syntax is overly verbose, and some features are inconsistent for other scenarios - for example, if you use XML to describe structured data, when do you use attributes, and when do you use child elements? Don't forget that attributes are semantically unordered in XDM, while elements are ordered, but also that attributes cannot contain anything but scalar values and arrays thereof.

Oh, and then don't forget all the legacy stuff like DTD, which is mostly redundant in the face of XML Schema and XInclude, except it's still a required part of the spec.

I guess the TL;DR version of it is that XML today is kinda like Java - it was there for too long, including periods when our ideas of best practices were radically different, and all that was enshrined in the design, and then fossilized in the name of backwards compatibility.

One important takeaway from XML - why it was so successful, IMO - is that having a coherent, a tightly bound spec stack is a good thing. For example, with XML, when someone is talking about schemas, you can pretty much assume it's XML Schema by default (yes, there's also RELAX NG, but I think calling it schema is a misnomer, because it doesn't delve much into semantics of what it describes - it's more of a grammar definition language for XML). To transfer XML, you use XSLT. To query it, you use XPath or XQuery (which is a strict superset). And so on. With JSON, there's no such certainty.

The other thing that the XML stack didn't quite see fully through, but showed that it could be a nice thing, is its homoiconicity: e.g. XML Schema and XSLT being XML. Less so with XPath and XQuery, but there they had at least defined a canonical XML representation for it, which gives you most of the same advantages. Unfortunately, with XML it was just as often a curse as it was a blessing, because of how verbose and sometimes awkward its syntax is - anyone who wrote large amounts of XSLT especially knows what I'm talking about. On the other hand, at least XML had comments, unlike JSON!

Hey, maybe that's actually the test case? A data representation language must be concise enough, powerful enough, and flexible enough to make it possible to use it to define its own schema and transformations, without it being a painful experience, while also being simple enough that a single person can write a parser for it in a reasonable amount of time.

1 more reply

skywhopper8y ago

The brilliant thing about the status quo is that it already does accomodate both line-based and structured pipelines. There's no reason we should expect to use the same set of commands for both purposes. In fact, we most definitely shouldn't. A tool like jq is a great example of this: it works great for JSON, and it doesn't attempt to do more. When we want to work with JSON in a pipeline, we pull out jq. When we want to work with XML, we pull out xmlstarlet or the like. When we want to work with delimited columnar data, we can use awk or cut. I'm failing to see what's missing from an architectural point of view.

zdkl8y ago

> in principle, treating those as complex object structures is the right way to go

I do not see this as a given. It's a matter of abstraction vs performance tradeoff and that is highly subjective. Unless you pioneer a new standard form for complex object notation this'll just end up back to a format flamewar.

(And if we go that way, I'd argue for s-exprs or -dare I say it- xml)

mos_basik8y ago

On phone so unsure if it's been mentioned in the thread already, but I've been enjoying learning a js library JSONata [0] after finding it included in Node-RED.

I remember finding jq a few weeks ago and thinking "wow, this will probably come in handy for a specific kind of situation" and filing it mentally for later use, but I haven't used it for anything yet so I'm not super familiar with the extent of it's features.

I have been using a lot of JSONata one liners to replace several procedural functions that were doing data transforms on JSON objects, and I'm very impressed. It's a querying library but it's Turing complete - it has lambdas, it can save references to data and functions as variables, etc.

It also seems relatively new/unknown; I've found hardly any blogs or forums mentioning it. The developer is active - he fixed a bug report I submitted in less then a day.

I'd love to have that kind of functionality in a CLI tool. Maybe jq is equally powerful, I don't know.

I haven't had time to run any performance analysis on JSONata and haven't found anyone else online who's done any yet. I'm very curious how its queries compare to efficiently implemented procedural approaches.

0: http://jsonata.org/

elcritch8y ago

Thanks! Will have to try out `jsonata`, as `jq` never clicked for me (too complex). Alternately I've been using a node.js program `json` [1] which has a basic but straightforward cli which covers 95% of my daily needs. I believe I found after trying to figure out the tool Joyent.com uses in there SmartOS sysadmin setup. Not sure if it's the exact same tool, but it's very similar.

As an example, basic manipulation is just:

`echo '{"age":10}' | json -e 'this.age++' #=> {"age": 11}`

1: https://github.com/trentm/json

still_grokking8y ago

For XML there are xml-coreutils[1].

[1] http://www.lbreyer.com/xml-coreutils.html

merb8y ago

btw. one of my favorite config is HOCON. it allows ini style and json style.https://github.com/typesafehub/config/blob/master/HOCON.md

agumonkey8y ago

sexp ?

ps: I kinda like powershell (the few hours I toyed in it

eecc8y ago

S-expressions

j / k navigate · click thread line to collapse

0 comments

vidarh8y ago

That'd be a tolerable graduated approach where you get benefits even if not every tool support it.

With "jq", I'm already increasingly leaning on JSON for this type of role.

It'd need to be very non-invasive, though, or all kinds of things are likely to break.

skissane8y ago

zdkl8y ago

As counterpoint if only for the sake of my wetware memory, I'd rather have only a few pipes being as agnostic and generally capable as possible than complex plumbing.

replace Ethernet with pipes and the point still has merit IMO. Lifted off of https://news.ycombinator.com/item?id=14675115

Too8y ago

Programs can determine if the receiving output is a TTY or not and enable e.g. colored output accordingly, sounds like something similar.

valarauca18y ago

This is called a "Unix Domain Socket" they're 20+ years old.

vram228y ago

jtolmar8y ago

I'd prefer a "crash if receiver doesn't support sender's MIME type" model for debugability.

Which could still be ergonomic if all the common piping programs supported some specific collection of formats and there was a relatively robust format-converting program.

ori_b8y ago

> It'd be great if there was a "standard" environment variable to toggle JSON input/output on for apps.

Look at FreeBsd's libxo. It's supported by most of the base system.

drothlis8y ago

(Not to diminish libxo --it looks pretty cool, and I didn't know about it before-- just curious.)

[1]: https://wiki.freebsd.org/LibXo

1 more reply

vidarh8y ago

But given that it's there, I'll certainly consider it when writing tools, and consider supporting the command-line option/env var even if/when I don't use the implementation...

ibotty8y ago

Amazing! I did not know that.

vogOP8y ago

skissane8y ago

3 more replies

vidarh8y ago

[1] http://www.pjhutchison.org/tutorial/cmdline_arrgs.html

1 more reply

codesnik8y ago

subnixr8y ago

> "json lines" data format (where each record is json without literal newlines, and newline separates records)

Good point, but jq handles this already. If the payload is an array of object, simply `jq -c '.[]'` to get an object per line

gregopet8y ago

jq can't handle big numbers, it silently mangles them, beware! Was bitten by this twice already..

kemayo8y ago

That's per the spec[1], enjoyably:

    Note that when such software is used, numbers that are
    integers and are in the range [-(2**53)+1, (2**53)-1]
    are interoperable in the sense that implementations will
    agree exactly on their numeric values.

2^53 is only 9007199254740990, so it's not too hard to exceed that, particularly in things like twitter status ids.

The recommended use is to have big numbers as strings, since it's the only way to reliably pass them around. (Yes, this is kind of horrible.)

[1]: https://tools.ietf.org/html/rfc7159#section-6

1 more reply

fnord1238y ago

Only then would I want to get into hierarchies, product types, sum types, etc.

moe8y ago

This is dead on.

Half of all parsing work consists of splitting things into records, by lines, delimiters or whitespace. That's where the great escaping headache begins.

In a better shell the following command would Just Work™:

> find | rm %path/%filename

int_19h8y ago

Powershell does two things right: it uses structured output, and it separates producing the data from rendering the data.

It also does one thing wrong: it uses objects (i.e. the stuff that carries behavior, not just state). This ties it to a particular object model, and the framework that supports that model.

still_grokking8y ago

What would it be? Is there a real alternative to XML with those features out there? I don't think so.

When you would want to have the features of XML and would design it from scratch I'm quite sure it would have the complexity of XML again.

Usually implementing some functionality yields every time the same level of complexity regardless of how you implement it (given that none of the implementations isn't out right stupid of curse).

int_19h8y ago

> When you would want to have the features of XML and would design it from scratch I'm quite sure it would have the complexity of XML again.

To see what I mean, take a look at XML Schema W3C specs. That's probably the worst part of it, so it should be readily apparent what I mean:

https://www.w3.org/TR/xmlschema11-1/ https://www.w3.org/TR/xmlschema11-2/

Oh, and then don't forget all the legacy stuff like DTD, which is mostly redundant in the face of XML Schema and XInclude, except it's still a required part of the spec.

1 more reply

skywhopper8y ago

zdkl8y ago

> in principle, treating those as complex object structures is the right way to go

(And if we go that way, I'd argue for s-exprs or -dare I say it- xml)

mos_basik8y ago

On phone so unsure if it's been mentioned in the thread already, but I've been enjoying learning a js library JSONata [0] after finding it included in Node-RED.

It also seems relatively new/unknown; I've found hardly any blogs or forums mentioning it. The developer is active - he fixed a bug report I submitted in less then a day.

I'd love to have that kind of functionality in a CLI tool. Maybe jq is equally powerful, I don't know.

0: http://jsonata.org/

elcritch8y ago

As an example, basic manipulation is just:

`echo '{"age":10}' | json -e 'this.age++' #=> {"age": 11}`

1: https://github.com/trentm/json

still_grokking8y ago

For XML there are xml-coreutils[1].

[1] http://www.lbreyer.com/xml-coreutils.html

merb8y ago

btw. one of my favorite config is HOCON. it allows ini style and json style.https://github.com/typesafehub/config/blob/master/HOCON.md

agumonkey8y ago

sexp ?

ps: I kinda like powershell (the few hours I toyed in it

eecc8y ago

S-expressions

j / k navigate · click thread line to collapse