A brief XML rant

(atl.me)

129 points · alsothings · 13y ago · 73 comments

54 comments · 17 top-level

masklinn · 13y ago · 8 in thread

> do not use template languages to generate XML.

Small correction: do not use text template languages (Jinja, Mustache, ERB, raw PHP, Smarty, FreeMarker, what have you) to generate XML; ERB seems to be the one used here, judging by `<%= display_date %>`. There are templating languages whose primary use case is to generate markup (including XML)[0] and (unless they're broken to uselessness) they should guarantee the output is valid XML.
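To make the text-template versus markup-aware distinction concrete, here is a minimal Python sketch using only the standard library. `string.Template` stands in for a text template language and `xml.etree.ElementTree` for a markup-aware generator; neither is what Posterous actually used, and the title value is made up.

```python
import xml.etree.ElementTree as ET
from string import Template

title = "Fish & Chips <daily>"  # made-up value containing markup-significant characters

# Text template: the value is pasted in verbatim, markup characters and all.
naive = Template("<item><title>$title</title></item>").substitute(title=title)

# Tree builder: the library owns the markup and escapes text content when serializing.
item = ET.Element("item")
ET.SubElement(item, "title").text = title
built = ET.tostring(item, encoding="unicode")


def is_well_formed(doc: str) -> bool:
    """Return True if the standard-library XML parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(naive, is_well_formed(naive))  # the bare "&" breaks parsing: False
print(built, is_well_formed(built))  # True
```

The template emits whatever it is given; the tree builder escapes on serialization, so its output stays well-formed by construction.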

> Schema-design-wise, the content:encoded and excerpt:encoded element names are deeply suspect, as if someone looked at RSS 2.0, squinted, shrugged, and invented their own ad hoc analogous namespace prefix, rather than understanding the role of elements in XML.

They seem to be using WordPress's WXR import/export format, hence the wp-namespaced elements. The "content" and "excerpt" namespace garbage comes straight from there, according to http://ipggi.wordpress.com/2011/03/16/the-wordpress-extended...

> <content:encoded> Is the replacement for the restrictive Rss <description> element. Enclosed within a character data enclosure is the complete WordPress formatted blog post, HTML tags and all.

> <excerpt:encoded> This is an unknown elementThis is a summary or description of the post often used by RSS/Atom feeds..

Considering the cottage industry of WordPress interaction, it was probably a good move to shoot for interop (it should allow Posterous exports to be imported directly into WordPress?). Not sure they succeeded, though.

[0] genshi for instance http://genshi.edgewall.org/

timdorr · 13y ago

   There are templating languages whose primary use case is to generate markup 
   (including XML)[0] and (unless they're broken to uselessness) they should 
   guarantee the output is valid XML.

Since they are using Rails, they should be using Builder for this: http://api.rubyonrails.org/classes/ActionView/Base.html#labe... https://github.com/jimweirich/builder

masklinn · 13y ago

> Since they are using Rails, they should be using Builder for this

Indeed. It's really odd that they munged together an XML export in ERB when Builder exists. Does it have some sort of breaking issue with namespaces or something that could explain the choice?

purephase · 13y ago

RABL does a pretty decent job at generating XML too.

the_mitsuhiko · 13y ago

While I do not recommend that people generate XML with Jinja2, it's actually not too bad at it. It will escape properly for you automatically and unlike many other solutions in Python it actually supports streaming.

</biased response>

masklinn · 13y ago

> It will escape properly for you automatically and unlike many other solutions in Python it actually supports streaming.

True and true, but it does not guarantee the output XML will be valid: as far as Jinja's concerned it's all just text, is it not? Genshi also supports streaming (using `serialize`), will also properly escape everything and, using the default XML serializer, ensures the output is valid XML.

(edit: I want to note that I wasn't trying to put down Jinja; it's just the first text-based template language I thought of when trying to write down a list. It's a fine templating language, just not for generating XML.)
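A sketch of the point that escaping alone is not a well-formedness guarantee. `xml.sax.saxutils.escape` stands in for an engine's autoescaping (it is not Jinja's API), and the `render` function is hypothetical: the value is escaped correctly, yet a one-character typo in the template text still produces something no XML parser will accept, and nothing complains until someone parses it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def render(title: str) -> str:
    """Hypothetical text template: escapes the value, but the template has a typo."""
    # The interpolated value is escaped, as an autoescaping engine would do...
    # ...yet the template text closes <title> with </titel>, and nothing checks that.
    return "<item><title>" + escape(title) + "</titel></item>"


doc = render("Fish & Chips")
try:
    ET.fromstring(doc)
    parsed = True
except ET.ParseError as err:
    parsed = False
    print("not well-formed:", err)
```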

TazeTSchnitzel · 13y ago

  Error on line 2: Closing tag for non-existent opening tag "biased"

  Error on line 2: Closing tags cannot have attributes

adamtaro · 13y ago

Agreed on your stipulation on Genshi and the like.

And thanks for the further reverse engineering of the likely intent of the export. I wouldn't disagree with most of the WP-centric design choices. But attempting to run through a real XML parser might've been a good choice as well. (And I note there's a fair bit of complaint on the WP forums about the difficulty of using the data for import.)

bambax · 13y ago

Just generated an export from Posterous. It's not just namespaces. The XML files contain unescaped HTML entities (`&nbsp;`, for example). What a mess.
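Why that breaks parsing: `&nbsp;` is an HTML entity, and XML predefines only five (`&lt;`, `&gt;`, `&amp;`, `&apos;`, `&quot;`), so without a DTD declaring it a conforming parser has to reject the reference. A small Python check with a made-up fragment; the numeric reference `&#160;` is one way to express the same character.

```python
import xml.etree.ElementTree as ET

# Hypothetical export fragment containing an HTML-only entity.
fragment = "<item><title>Hello&nbsp;world</title></item>"

try:
    ET.fromstring(fragment)
    nbsp_ok = True
except ET.ParseError as err:
    nbsp_ok = False
    print(err)  # the parser reports the undefined entity

# A numeric character reference for U+00A0 is valid in any XML document.
fixed = fragment.replace("&nbsp;", "&#160;")
title = ET.fromstring(fixed).find("title").text
```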

gizzlon · 13y ago · 6 in thread

He has a few valid complaints (by a few I mean one), but this is really not that bad compared to a lot of the XML floating around. No reason to be shocked.

"There are no namespace declarations. No self-respecting XML parser will have anything to do with this XML data."

I don't get this comment. I have never seen an XML parser that would refuse to parse XML without a namespace.

Am I missing something? Or is that just mindless hyperbole?

masklinn · 13y ago

> Am I missing something? Or is that just mindless hyperbole?

Note that the document uses namespaces but does not declare them. In Python, both ElementTree and lxml will blow up when they encounter the first undeclared prefix (dc, from dc:creator).
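A minimal reproduction with Python's standard-library ElementTree, using a hypothetical fragment shaped like the export (only the `dc:creator` element name comes from the thread):

```python
import xml.etree.ElementTree as ET

# "dc:creator" is used, but no xmlns:dc declaration appears anywhere.
undeclared = "<item><dc:creator>alice</dc:creator></item>"

try:
    ET.fromstring(undeclared)
    namespace_ok = True
except ET.ParseError as err:
    namespace_ok = False
    print(err)  # namespace processing rejects the unbound prefix
```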

gizzlon · 13y ago

Ah, you're right, I did miss something =)

Still nothing to be "shocked" about, though.

bambax · 13y ago

You can't process XML that uses namespaces without a namespace declaration. A namespace prefix is just a shorthand for the namespace itself.

prefix:name-of-element doesn't mean anything by itself, you need to know what 'prefix' stands for.

As it is, this XML is not parsable; it's not well-formed and therefore it shouldn't even be called XML; it's just text with random tags thrown in.

It is, indeed, quite shocking.
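To illustrate "a prefix is just a shorthand": once the prefix is declared, a namespace-aware parser such as ElementTree replaces it with the namespace URI. The Dublin Core URI below is the conventional one for `dc:creator` in RSS; the export never said which URI it meant, so treat it as an assumption.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # conventional Dublin Core elements URI

declared = (
    '<item xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:creator>alice</dc:creator>"
    "</item>"
)
root = ET.fromstring(declared)
creator = root[0]

# ElementTree reports the expanded name as {namespace-uri}local-name; the prefix is gone.
print(creator.tag)  # {http://purl.org/dc/elements/1.1/}creator

# Lookups go through a prefix -> URI mapping that the caller chooses.
print(root.find("dc:creator", {"dc": DC}).text)  # alice
```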

laurent123456 · 13y ago

Maybe it's just me, and it's probably wrong, but more than once I have pre-processed XML data by replacing every "namespace:tag" with "namespace-tag" so that I can easily parse the XML without having to care about namespaces. I've never been convinced that this feature has much use anyway.
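For what it's worth, that pre-processing hack might look like this in Python. The regex and `strip_prefixes` are hypothetical helpers, not a general solution: they only rewrite tag names (not prefixed attributes), and they will also rewrite anything tag-like inside CDATA, such as embedded HTML.

```python
import re
import xml.etree.ElementTree as ET

# A prefixed name right after "<" or "</", e.g. "<dc:creator" or "</dc:creator".
PREFIXED_TAG = re.compile(r"(</?)([A-Za-z_][\w.-]*):([A-Za-z_][\w.-]*)")


def strip_prefixes(text: str) -> str:
    """Rewrite prefix:name tags as prefix-name (a hack: ignores attributes and CDATA)."""
    return PREFIXED_TAG.sub(r"\1\2-\3", text)


raw = "<item><dc:creator>alice</dc:creator><wp:post_id>42</wp:post_id></item>"
root = ET.fromstring(strip_prefixes(raw))
print([child.tag for child in root])  # ['dc-creator', 'wp-post_id']
```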

masklinn · 13y ago

Technically you "can", if you manage to find a non-namespace-aware XML parser: it'll parse `prefix:name` as the element name `prefix:name`.

> As it is, this XML is not parsable

It's parsable with a non-namespace-aware XML parser (ignoring tagsoup parsers, since we're pretending this is supposed to be an XML document).
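Python's `xml.parsers.expat` shows this directly: a parser created without a `namespace_separator` does no namespace processing, so the colon is just part of the name. The fragment is hypothetical.

```python
from xml.parsers import expat

undeclared = b"<item><dc:creator>alice</dc:creator></item>"

seen = []
# ParserCreate() with no namespace_separator leaves namespace processing off,
# so "dc:creator" is simply an element name that happens to contain a colon.
parser = expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: seen.append(name)
parser.Parse(undeclared, True)
print(seen)  # ['item', 'dc:creator']
```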

obviouslygreen · 13y ago

"Worse things have happened" is a very tempting and unfortunate dismissal... I think we all do it, but when something is broken, is it really that important what else has been broken that may have been worse?

Yes, shit happens, and it's never going to stop happening. Not in the face of all the misguided idealism in the world. But is being punched in the face OK because the puncher didn't use brass knuckles? If he did, is it still OK, because people have been shot in the face, and that's a lot worse than being punched?

Anyway, I thought the same about namespacing until that was addressed in a more constructive reply. So thanks for asking that question. :)

kaoD · 13y ago · 6 in thread

Who uses XML in 2013 anyways?

duaneb · 13y ago

What would you recommend to replace XML that handles arbitrary trees, namespaces, attributes, and tools that are built on this, e.g. XSLT?

I don't think XML is amazing, but it still has its place.

kaoD · 13y ago

Put your torches out, it's just a joke :)

icebraining · 13y ago

Anyone who wants to interoperate with software not written in 2013?

kaoD · 13y ago

Shame on them.

function_seven · 13y ago

Skechers (http://www.skechers.com/). Go View Source on that.

rrreese · 13y ago

In 2013 XML is widely used. What alternatives would you suggest?

stblack · 13y ago · 3 in thread

I don't see any problem with this XML that can't be easily overcome.

The comment about GMT-offsetting the date is particularly pithy, assuming the blog in question isn't about ephemerides. By and large, blog posts have dates. If you desperately need an hour offset from GMT, one might suggest this is your edge case because, by and large, it doesn't matter.

Count me among those who would argue that the omission of a schema is a blessing.

I've wasted whole f*cking days of my life wrangling with so-called "non-amateur" XML. Invariably this was over-bloated XML with schemas that did nothing to help the discoverability and the processing of the data. Plain and simple, XML is over-spec'd and many data publishers, aided by their inflexible toolsets, pushed their XML beyond reason.

Be careful what you wish for.

I would take this XML, map it, iterate it, done! End of story. I don't think there's much to complain about here.
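On the GMT-offset point, a small Python sketch of what is actually lost. RSS dates use the RFC 822 style, which `email.utils.parsedate_to_datetime` parses; the dates themselves are made up. Without an offset you get a naive datetime, which cannot even be ordered against an offset-aware one.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

no_offset = parsedate_to_datetime("Tue, 12 Mar 2013 10:00:00")
with_offset = parsedate_to_datetime("Tue, 12 Mar 2013 10:00:00 -0700")

print(no_offset.tzinfo)                      # None: 10:00 in which zone?
print(with_offset.astimezone(timezone.utc))  # 2013-03-12 17:00:00+00:00

try:
    no_offset < with_offset
except TypeError as err:
    print("cannot compare:", err)
```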

masklinn · 13y ago

> Count me among those who would argue that the omission of a schema is a blessing.

TFA didn't ask for a schema; TFA asked for namespace declarations, because they're kind of necessary to parse namespaces with a namespace-aware XML parser. That has zero relation to a schema. He only mentioned it in passing because `content:encoded` and `excerpt:encoded` make very little sense... schema-wise (not "in an XML Schema document").

> I would take this XML

You can't "take this XML" because it's not XML. Once you know it's not XML you can "take this tag soup", shove it into a tagsoup library (maybe with some encoding-guessing beforehand) and hope things come out about right at the other end (with no assurance that this is the case; you're deep in GIGO land at this point), but you can't "take it and map it".

obviouslygreen · 13y ago

As someone who has used BeautifulSoup very happily without considering its etymology... is "tag soup" an actual term or just a very apt description you're using?

[edit: A quick search, which I should have conducted instead of posting this, shows this has at least been used before, and enough not to be deleted by Wikipedia editors for lack of notability. That's pretty funny.]

adamtaro · 13y ago

Hi, TFA here.

Masklinn is right: I didn't ask for a schema. I didn't ask for anything. I somehow expected well-formed XML in a directory full of .xml files. That's 90% of what the rant is about.

I wasted a handful of fscking years of my life editing a significant international standard that used a peculiar dialect of W3C XML Schema. I know from schema over-design. I'm just talking about understanding the bare basics of XML and seeing that 'excerpt:encoded' might not convey what you think it means when set next to 'content:encoded'.

And it indeed took less time to hack together a solution to extract the information I needed (yay sed!) than it did to write this quick rant. That's not the point. The point is that the hacks and work-arounds should have been unnecessary. It's passing the savings of having one careless dev on as a cost to countless others having to deal with the data downstream.

paulnechifor · 13y ago · 3 in thread

This isn't really criticism of XML, though. You can do a good job of screwing up in any language or format.

adamtaro · 13y ago

It was not intended as a criticism of XML at all. XML is a perfectly cromulent standard. It is a criticism of amateurish use of XML.

egeozcan · 13y ago

Which is everywhere (XHTML, anyone?)

dscrd · 13y ago

XML is so complex and obtuse that one can hardly blame the practitioners for misusing it.

tlarkworthy · 13y ago · 3 in thread

It would take 0.5 days' work to get that into any format you desire, so I don't think it fails its purpose.

jerf · 13y ago

No. Once you screw up encoding, the information is generally gone. It's not just a matter of munging, it's often a matter of having to grovel over the entire file, by hand, correcting things.

Programmers seem to love to think that encoding errors are a joke, but they aren't. The data is gone. That's a big deal. Why are you even writing a program in the first place if it's just going to output unrecoverable gibberish? So you can throw the onus on the user to figure it out?

And that's to say nothing of trying to recover the date.
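A quick Python illustration of why encoding mistakes destroy data rather than just garble it. The producer's encoding (cp1252) is an assumption for the example, since the article could not determine the real one: once bytes are decoded with the wrong codec and errors are replaced, distinct characters collapse into U+FFFD and cannot be recovered.

```python
original = "café naïve"
raw = original.encode("cp1252")  # assumed producer encoding: one byte per accented letter

# The consumer guesses UTF-8 and replaces undecodable bytes.
decoded = raw.decode("utf-8", errors="replace")
print(decoded)  # both accented letters are now U+FFFD

# Different source characters collapse to the same replacement character...
collapsed = {c.encode("cp1252").decode("utf-8", errors="replace") for c in "éèï"}
print(len(collapsed))  # 1

# ...so no re-encoding can bring the original bytes back.
print(decoded.encode("utf-8") == raw)  # False
```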

mikeash · 13y ago

It drives me bonkers. Use UTF-8. Use other encodings only when talking to systems that require it, and use those other encodings only when actually reading or writing the data. Translate to UTF-8 at the earliest opportunity, and translate from UTF-8 at the last possible moment, and only if you must.

This isn't the 90s. This stuff is basically solved now, except people can't be bothered to use the solution.
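The discipline described above (decode at the input boundary, work in Unicode, encode to UTF-8 at the output boundary) looks roughly like this in Python. The function and file names are hypothetical, and Latin-1 stands in for whatever legacy encoding a system hands you.

```python
import tempfile
from pathlib import Path


def convert_legacy_file(src: Path, dst: Path, legacy_encoding: str) -> None:
    """Decode once on the way in, process str, encode UTF-8 once on the way out."""
    text = src.read_bytes().decode(legacy_encoding)  # earliest opportunity: bytes -> str
    cleaned = text.replace("\r\n", "\n")             # all processing happens on str
    dst.write_bytes(cleaned.encode("utf-8"))         # last possible moment: str -> UTF-8


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "legacy.txt"), Path(tmp, "clean.txt")
    src.write_bytes("Crème brûlée\r\n".encode("latin-1"))
    convert_legacy_file(src, dst, "latin-1")
    result = dst.read_bytes()
```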

ohwp · 13y ago

Let's say you could have earned $100 per hour instead of writing your own "parser". Then suddenly 0.5 days (four working hours) is $400.

bazzargh · 13y ago · 2 in thread

One thing that bugs me about this is the use of CDATA. CDATA sections are just-about ok in hand-crafted xml, but in machine generated xml, they are absolutely pointless, and usually hint that the coder doesn't know what they're doing.

For example, the author thinks that the content inside the CDATA is escaped, but in fact it isn't necessarily: in this case they're including chunks of HTML which may contain more CDATA sections, and of course those don't nest (you need to terminate and restart the CDATA section). I've also seen examples where the enclosing encoding and the encoding of the CDATA section were incompatible.

The worst thing is specs with CDATA sections in examples. Junior devs bend over backwards to use things like XSLT's disable-output-escaping to get a character-for-character match in test results, and then wonder why their code breaks in production.
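A sketch of the nesting problem in Python. A CDATA section ends at the first `]]>`, so embedding content that already contains one requires splitting it across two sections; `cdata_section` is a hypothetical helper showing the usual `]]]]><![CDATA[>` trick, compared against the naive wrap.

```python
import xml.etree.ElementTree as ET


def cdata_section(text: str) -> str:
    """Wrap text in CDATA, splitting any embedded ']]>' across two sections."""
    # Close the section between ']]' and '>' and reopen it: ']]>' -> ']]]]><![CDATA[>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical post body that itself contains a CDATA section.
html = "<p>see <![CDATA[a && b]]> here</p>"

doc = "<content>" + cdata_section(html) + "</content>"
round_trip = ET.fromstring(doc).text  # adjacent sections are joined back together

naive = "<content><![CDATA[" + html + "]]></content>"
try:
    ET.fromstring(naive)  # the section ends early, leaving a stray </p>
    naive_ok = True
except ET.ParseError:
    naive_ok = False
```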

gav · 13y ago

Outside of a few special cases (such as wanting to make embedded content in XML human editable) CDATA should be treated as a big warning flag that the author of the code that generated the XML doesn't really understand what they are doing.

There's always the issue that one day ']]>' will somehow sneak in and everything will break.

The key is using a tool to generate the XML that will transparently handle things like escaping correctly instead of using templating tools designed for text or HTML output.
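For comparison, a sketch of the "let the tool handle it" approach with ElementTree: set the text and let the serializer escape it. The body string is contrived to contain `]]>` and a stray CDATA opener; neither needs special handling.

```python
import xml.etree.ElementTree as ET

body = "<p>Tom & Jerry</p> ]]> <![CDATA[ oops ]]>"

content = ET.Element("content")
content.text = body  # no CDATA, no manual escaping
xml_text = ET.tostring(content, encoding="unicode")

print(xml_text)  # every <, > and & is escaped, so ']]>' cannot end anything early
print(ET.fromstring(xml_text).text == body)  # True
```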

mnarayan01 · 13y ago

I'm not sure making "XML human editable" should really be considered a special case.

nanoscopic · 13y ago · 2 in thread

"There are no namespace declarations. No self-respecting XML parser will have anything to do with this XML data."

I would argue that any self-respecting XML parser should parse it just fine and shouldn't demand that the namespaces be defined at all.

"...invented their own ad hoc analogous namespace prefix, rather than understanding the role of elements in XML"

I don't think you understand the base concept of XML much. It is meant to be a generic container to hold whatever you want. XML in and of itself doesn't enforce node naming. Sure if you are talking about the official spec it does, but people pretty much globally use whatever node names they want. Don't have a cow.

"I haven’t been able to determine the intended encoding of the files"

Well maybe you should look into a parser that just parses as is without attempting to use some specific encoding.

Check out XML::Bare on CPAN for Perl. It will parse pretty much anything you throw at it, in any encoding. It leaves it up to you, the user, to decide what to do with the data after parsing.

masklinn · 13y ago

> I would argue that any self-respecting XML parser should parse it just fine and shouldn't demand that the namespaces be defined at all.

The XML Namespaces specification unambiguously requires that a namespace be declared:

> The namespace prefix, unless it is xml or xmlns, MUST have been declared in a namespace declaration attribute in either the start-tag of the element where the prefix is used or in an ancestor element (i.e., an element in whose content the prefixed markup occurs).

A self-respecting XML parser would follow the spec. A namespace-aware XML parser must fault on undeclared namespaces.

Most XML parsers are namespace-aware.

> I don't think you understand the base concept of XML much.

Pot, meet kettle.

> XML in and of itself doesn't enforce node naming. Sure if you are talking about the official spec it does

Don't you feel like you're contradicting yourself a bit there?

> Well maybe you should look into a parser that just parses as is without attempting to use some specific encoding.

So he should look into parsers which do not parse XML and have no issue mangling the content? What are they going to do, assume the encoding is ASCII-compatible anyway and go to town? How wonderfully Anglo-centric.

> Check out XML::Bare on CPAN for Perl.

XML::Bare is an XML parser in the same sense that XHTML interpreted as text/html is an XML document: not in any way, shape or form. And if that's what you're shooting for, don't pretend to suggest an XML parser; suggest a recovering "soup" parser instead, something like html5lib or BeautifulSoup.

But herein remains the issue: I expect Posterous advertised their export as XML files, not as "encoding-deficient tag soup" (which it apparently is). I'm sure TFA would have had no expectations if he'd been told he got garbage in, and would have relied on tagsoup-parsing and encoding-guessing (using whatever libraries for doing so are available in his language of choice).

As it stands, he did have the pretty basic and undemanding expectation that he could shove supposedly-XML files into an XML parser and get data.

sanderjd · 13y ago

You seem to know a lot about the XML specification. More than your parent and certainly more than me. That's great, and following specifications is good and all, but citing the spec as requiring that "a namespace-aware XML parser must fault on undeclared namespaces" does not give me any sense for why I would want it to. Put another way - what does the namespace declaration and halting error due to its omission accomplish for me?

Failing to define a content type is obviously dumb, but I can't seem to get riled up about leaving off namespace declarations.
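One concrete answer to "what does the declaration accomplish": by itself the prefix is an arbitrary local label, and the declared URI is what identifies the vocabulary. A Python sketch with ElementTree; `urn:example:other` is a made-up URI standing for an unrelated vocabulary.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Two producers pick different prefixes for the same namespace URI...
a = ET.fromstring(f'<item xmlns:dc="{DC}"><dc:creator>alice</dc:creator></item>')
b = ET.fromstring(f'<item xmlns:x="{DC}"><x:creator>alice</x:creator></item>')

# ...and a third reuses the "dc" prefix for something unrelated.
c = ET.fromstring('<item xmlns:dc="urn:example:other"><dc:creator>bob</dc:creator></item>')

print(a[0].tag == b[0].tag)  # True: same URI means same element, whatever the prefix
print(a[0].tag == c[0].tag)  # False: same prefix, different vocabulary
```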

fpgeek · 13y ago · 1 in thread

> Get off my lawn, you kids.

Isn't that what they were doing?

dylangs1030 · 13y ago

Upped for giving me a chuckle in the midst of some very heated XML discussion :)

TheAnimus · 13y ago · 1 in thread

I'd just like to take a moment to mention Nested Comments.

Oh, if I had £1 for every time I've had to sift through lines and lines of code because I can't just comment out an element. I just can't comprehend why they'd need to reserve -- inside a comment.

masklinn · 13y ago

> I just can't comprehend why they'd need to reserve -- inside a comment.

It's because the feature was inherited from SGML, first for commenting in element declarations (e.g. <!ELEMENT -- this is an element>) and then generalized to the whole document: in SGML, the grammar for a comment is

    comment declaration =
        MDO ("<!"), (comment, ( s | comment )* )?, MDC (">")
    comment =
        COM ("--"), SGML character*, COM ("--")

HTML, as an SGML application, theoretically inherited this feature (most UAs don't really implement it correctly, so it's not exactly safe to use sequences of dashes inside a comment; browsers may or may not toggle commenting). See http://www.howtocreate.co.uk/SGMLComments.html for a more extensive explanation, especially in relation to browsers (SGML-compliant comment handling used to be part of early Acid2, before being removed because it was a stupid idea).

Meanwhile XML took half of it, threw the rest away, and called it a day.
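What XML kept is the restriction: the string `--` must not occur inside a comment. That is also why an element that already contains a comment cannot simply be wrapped in another comment. A quick check with Python's expat-backed ElementTree (the documents are made up):

```python
import xml.etree.ElementTree as ET


def well_formed(doc: str) -> bool:
    """Return True if ElementTree's parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


plain = "<r><!-- fine --><a/></r>"
double_dash = "<r><!-- SGML would see -- two comments here --></r>"
# "Commenting out" an element that contains a comment puts "--" inside the outer comment.
nested = "<r><!-- <a><!-- inner --></a> --></r>"

print(well_formed(plain), well_formed(double_dash), well_formed(nested))  # True False False
```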

daGrevis · 13y ago · 1 in thread

Probably they used regexes to parse it. :)

_kst_ · 13y ago

http://stackoverflow.com/a/1732454/827263 for those who haven't seen it.

LoneWolf · 13y ago · 1 in thread

Am I the only one bothered by the extremely oversized xml snippets? Or is it just me?

Chrome 25.0.1364.97 m

adamtaro · 13y ago

It's a pretty new redesign. I use Chrome myself, but shoot me a screenshot? hello at article_domain

westi · 13y ago

To be fair to the Posterous Team they are doing a good job of fixing the bugs in the export as they are reported to them.

Hopefully they will get all of them fixed before the final close down.

If you want an easy way to get your Posterous export file cleaned up into a more valid XML file, feel free to use the Import from Posterous option over at WordPress.com - http://en.support.wordpress.com/import/import-from-posterous...

We've spent some time on writing code which cleans up the XML file so that it can be imported into WordPress successfully.

You can then export a clean WXR file and import elsewhere much easier - http://en.support.wordpress.com/export/

peterkelly · 13y ago

There's nothing wrong with invalid XML - why is everyone complaining? C compilers should similarly take a stab in the dark about what the programmer meant if they encounter invalid syntax as well. And those linking errors always annoy me - it should just pick the closest matching symbol if the specified one can't be found.

Sami_Lehtinen · 13y ago

So it seems that we prefer XML which is easy to read. I have seen these files way too often. Like: <xml><item><key>1</key><value>Something</value></item><item....></xml>

Then you have to combine whatever keys and values are in the item tags. I've found these to be very annoying files to handle, especially when the key is X3 and the value is 83d: you have to look up every combination in some kind of mapping, because none of them tells you anything directly. At least it's easy to create files that fulfill the schema, because the complexity is pushed out of the XML level. Often these files are created by "upgrading" CSV to XML: let's just call the column number the key, and then put whatever is in that column into the value tag. Yes, attributes could be used, but often aren't.

Then you have to know that if key X contains value Y, you also need to look for key Z, and hopefully it contains value N or whatever.
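Mechanically, flattening that key/value shape into a lookup table is short with ElementTree (a sketch over a made-up document in the shape described above); the hard part, knowing what `X3` means, still lives outside the file.

```python
import xml.etree.ElementTree as ET

# Made-up document in the key/value shape described above.
doc = """<xml>
  <item><key>1</key><value>Something</value></item>
  <item><key>X3</key><value>83d</value></item>
</xml>"""

root = ET.fromstring(doc)
pairs = {item.findtext("key"): item.findtext("value") for item in root.iter("item")}
print(pairs)  # {'1': 'Something', 'X3': '83d'}
```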

niggler · 13y ago

It's ironic how many problems (large and irritating enough to justify blog posts or public spats) could have been avoided if someone had bothered to test beforehand.

If someone had done a trial export, they would immediately have seen the missing dates.
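A trial-export smoke test along those lines could be as small as this Python sketch. The `check_export` helper is hypothetical, and `pubDate` is the RSS 2.0 element name, assumed here because the thread does not show which date element the export was supposed to contain.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_export(xml_text: str, date_tag: str = "pubDate") -> List[str]:
    """Return the problems found in one exported file (an empty list means it looks OK)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    for index, item in enumerate(root.iter("item")):
        # findtext returns None when the element is missing, "" when it is empty.
        if not (item.findtext(date_tag) or "").strip():
            problems.append(f"item {index}: missing <{date_tag}>")
    return problems


good = "<rss><channel><item><pubDate>Tue, 12 Mar 2013 10:00:00 -0700</pubDate></item></channel></rss>"
missing_date = "<rss><channel><item><title>Hi</title></item></channel></rss>"
print(check_export(good))          # []
print(check_export(missing_date))  # ['item 0: missing <pubDate>']
```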

icedchai · 13y ago

Yes, it's crap, but it would take a few minutes to clean this up with a couple of sed scripts to turn ns:tag into ns_tag or something to make it parseable.

Or you could prepend some fake namespace declarations.
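The fake-declaration route might look like this in Python: wrap the content in a root element that declares every prefix, then parse normally. The prefix-to-URI table is an assumption (the `dc` and `content` URIs are the conventional RSS ones; the other two are placeholders), and the wrapper only works if the file has no XML declaration or DOCTYPE at the top, which would first need stripping.

```python
import xml.etree.ElementTree as ET

# Assumed prefix -> URI table; the export never says what its prefixes mean.
FAKE_NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "content": "http://purl.org/rss/1.0/modules/content/",
    "excerpt": "urn:example:excerpt",  # placeholder
    "wp": "urn:example:wp",            # placeholder
}


def parse_with_fake_namespaces(fragment: str) -> ET.Element:
    """Wrap the body in a root element that declares every prefix, then parse it."""
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in FAKE_NAMESPACES.items())
    return ET.fromstring(f"<wrapper {decls}>{fragment}</wrapper>")


root = parse_with_fake_namespaces("<item><dc:creator>alice</dc:creator></item>")
creator = root.find("item/dc:creator", FAKE_NAMESPACES)
print(creator.text)  # alice
```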
