He taught an entire course on XML, which he calls a "great meta-example on how to deal with semi-structured data"? And his only defense of XML over JSON is... it's worked ok for some file formats?
The only point in this whole article is that XML is not well-suited for RPCs, though he fails to argue that it's well-suited for anything else.
One argument is that XML is better than JSON for use cases like XHTML, where you heavily mix tags and content. I get the feeling XML wasn't really made for this case, though, it was made for the JSON-like case. Processing XHTML with E4X (the ECMAScript for XML standard) is painful, and XML libraries in general assume your document basically consists of a tree of tags, maybe with text nodes at the leaves.
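To make the "tree of tags, maybe with text at the leaves" point concrete, here's a minimal sketch using Python's standard `xml.etree` (my example, not from the thread): mixed content doesn't appear as ordinary child nodes, but gets split across the parent's `.text` and each child's `.tail` attribute.

```python
# Mixed content (tags and prose interleaved) is awkward in tree-oriented
# XML libraries: the text is scattered across the parent's .text and each
# child's .tail rather than living in the tree as regular nodes.
import xml.etree.ElementTree as ET

p = ET.fromstring('<p>Then help me translate this into <b>XML</b> please</p>')

print(p.text)     # text before the first child: 'Then help me translate this into '
print(p[0].text)  # text inside <b>: 'XML'
print(p[0].tail)  # text after </b>: ' please'
```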
I was expecting some argument invoking the power of DTDs and XSLT or whatever else, or the original point of XML that people overlook, and all I got was an extremely weak defense of XML from someone who taught a whole course on it.
Back in 1997, XML was "SGML for the Web." It was a way to pass around structured, plain-text, human-readable documents that did not require expensive, buggy, incomplete parsers.
It then got misapplied as an RPC transport encoding, and tools vendors were more than happy to start pushing specs, such as W3C Schemas, that demanded the use of tools.
It started out simple but, as these things go, got hijacked. The fault, though, lies with the misapplication, not with XML itself.
Sadly, some sensible early formats were left behind. XML-RPC's serialization is a bit verbose but otherwise quite similar to JSON. Somehow that got turned into SOAP and then eventually the WS-* tar pit of complexity.
Likewise, XML as a configuration file language can be quite elegant, almost like a literate-programming version of common .ini or .conf files. But instead of a simple flat document of variables, XML config files in the wild end up with deeply nested structure that adds dubious value and makes the files far less human-friendly.
XML itself, with the possible exception of namespaces and a few other features, is quite simple. I totally agree it's the applications that have gotten out of hand, particularly in areas where XML is used as structured data exchange rather than document markup.
Now that we have JSON, there is no longer any excuse.
XML is good for exactly what it stands for: an extensible markup language. It's good for dealing with semi-structured data, especially when you have to deal with data from multiple domains.
Have you ever used SGML (other than HTML)? If so, then you'd likely agree that XML is a superior standard. But I'm guessing that you have not, because for some reason you believe that XML was created for data serialization.
DTDs and XSLT _are_ useful aspects of XML, and I doubt the author is unaware of them. Rather, the author assumed too much of the readers' understanding of the history of XML and the nature of semi-structured data.
How about this:
{"tag": "p", "class": "content", "text": ["Then help me translate this into ", {"tag": "span", "class": "highlight", "text": ["JSON"]}, " please"]}
And that's just a small example where you can see all the start and end tags on one screen. Now change the example to insert a hyperlink, say, around the word "help". How easy is it to change?
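To see just how easy (or not), here's a Python sketch of that edit, using plain dicts and lists to stand in for the JSON above (the `"#"` href is a placeholder, not from the thread): you have to split the text string and splice a new node into the list, where in XML source you would simply type `<a href="...">help</a>` inline.

```python
# Wrapping one word of a text node in a link means splitting the string
# and splicing a new object into the list. (Layout mirrors the JSON above.)
node = {"tag": "p", "class": "content",
        "text": ["Then help me translate this into ",
                 {"tag": "span", "class": "highlight", "text": ["JSON"]},
                 " please"]}

first = node["text"][0]
before, _, after = first.partition("help")
link = {"tag": "a", "href": "#", "text": ["help"]}  # "#" is a placeholder href
node["text"][0:1] = [before, link, after]           # replace one item with three

print(node["text"][:3])
# ['Then ', {'tag': 'a', 'href': '#', 'text': ['help']}, ' me translate this into ']
```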
Yes, this is true. The point of using XML is when you have data where you know the structure of some parts, but not others. This is true of most things that begin life as prose, and then have some structure added to them later. It is a point between "bag of words" information retrieval, and SQL queries, that requires a different approach.
"I get the feeling XML wasn't really made for this case, though, it was made for the JSON-like case."
No, this is false. XML is awful for the JSON like case. What would make you think that XML was created for it?
You could argue that XML documents are complex and cannot be described as simple comma-separated values. Maybe, but so many XML documents are just there to store simple key/value data.
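A sketch of what that common case looks like (my own illustrative document, not from the thread): a flat XML file that is really just a key/value table, which one loop over the children flattens into a dict.

```python
# Many real-world XML files are flat key/value stores dressed up in
# angle brackets; a single comprehension recovers the table.
import xml.etree.ElementTree as ET

doc = """<config>
  <host>example.com</host>
  <port>8080</port>
  <debug>false</debug>
</config>"""

settings = {child.tag: child.text for child in ET.fromstring(doc)}
print(settings)  # {'host': 'example.com', 'port': '8080', 'debug': 'false'}
```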
And now, we have "jsawk" (https://github.com/micha/jsawk) for parsing JSON under your terminal...
That's not a bad thing. The UNIX philosophy encourages you to avoid those things if you don't need them. It's very powerful. But when you actually, factually need them, you're not going to get very far with UNIX tools. That's OK; it is neither an indictment of UNIX nor of the data. Different tools are called for.
the only real complaint I have is that xsl, being itself xml, is pretty verbose and can be tedious to write.
also the whole "using xml to define a transformation on some other xml" thing is so overly meta as to induce a massive brain hemorrhage out of my nose and all over my desk.
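For instance, the classic XSLT "identity transform" — a stylesheet that copies its input through unchanged — already takes this much XML:

```xml
<!-- The XSLT 1.0 identity transform: copy input to output unchanged.
     Even a no-op needs a stylesheet element, a namespace declaration,
     a template, and an explicit copy/apply-templates pair. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```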
That's laws of natural selection at work.
I know a guy who deployed a Java application on servers with 64MB of memory, and he did it back before the JIT compiler was any good. It was performant and got the job done. He's not unique: lots of performant Java applications were built on hardware that was tiny compared to today's hardware. But for some reasonable meaning of "everybody," everybody writes horrible bloated Java code that requires costly hardware to run.
I've used simple, practical XML web services -- in fact, we have several running at work, and when adding or changing functionality, dealing with the XML aspect is a rounding error compared to implementing the application logic. But for some reasonable meaning of "every," everybody writing enterprise XML web services creates overengineered, overcomplex, finicky interfaces that require ongoing error-prone tweaking of DOM or SAX code.
Sometimes when everybody's getting it wrong, that just means "it" has proved irresistible to stupid people and PHBs. It doesn't mean a sensible, tasteful engineer won't be able to use it correctly. Ditching a technology because stupid people love to misuse it may be a good fashion choice, and it may be a good way to influence hiring if you don't have more direct influence, but there's no engineering justification for it.
And don't forget that for some reasonable meaning of "everybody," everybody who has tried Lisp programming has become horribly lost and failed to accomplish anything with it. (This may be less true since Lisp is rarely taught in colleges nowadays, but it was true at some point in time.)
http://searchyc.com/submissions/xml?sort=by_date
First result.
I think one of the people who best understood XML was Erik Naggum; at least, few have explained it as eloquently: