
wfunction 9y ago

Is it possible to show what XSLT is and why it's useful in like 5 minutes? I've always wanted a transformation language of some sort, but I've never managed to figure out XSLT (probably because I've never needed it) so I don't know what problems it solves or doesn't solve.


fatihpense 9y ago

For example, it is very easy to wrap some XML in other XML. Selecting XML nodes with XPath is also powerful. You don't have to write boilerplate Java etc. However, its template logic has a learning curve, and it is only useful for XML-related work.

cturner 9y ago

     I've always wanted a transformation language of some
     sort, but I've never managed to figure out XSLT
     (probably because I've never needed it) so I don't
     know what problems it solves or doesn't solve.

You'd be familiar with SQL. SQL is a declarative language for interacting with a relational structure. You say what you want and from where. It outputs to a table. Or, with significant effort, to more complex forms.

XPath + XSLT are declarative languages for interacting with a tree structure. You say what you want, and how you want the results laid out. They're particularly useful in integration scenarios, where you need to take data in horror format X and transform it into the completely different horror format Y.

Example A: you have a directory full of XML files that represent streets, estates, houses, buildings and apartments across the nation. The depth of data in these nodes is inconsistent: houses are generally top-level; apartments are nested within buildings within estates. You need to (1) select the one- or two-bedroom homes that are in a particular set of postcodes, and (2) capture some facts about each of those homes in a completely different XML format.

XPath is useful for finding the things. XSLT is your tool for manipulating the results from the XPath query into the output document format.
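To make the "finding the things" half of Example A concrete, here is a rough Python sketch. The element and attribute names (`house`, `apartment`, `postcode`, `bedrooms`, `id`) are invented for illustration, since the example above doesn't specify a schema. It uses the standard library's `xml.etree.ElementTree`, which only supports a subset of XPath, so the postcode check is done in plain Python; a full XPath engine could likely express the whole selection in a single expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: houses at the top level, apartments nested
# inside buildings inside estates. All names here are assumptions.
SAMPLE = """<nation>
  <house id="h1" postcode="1010" bedrooms="2"/>
  <estate>
    <building>
      <apartment id="a1" postcode="1010" bedrooms="1"/>
      <apartment id="a2" postcode="2020" bedrooms="3"/>
    </building>
  </estate>
  <house id="h2" postcode="3030" bedrooms="1"/>
</nation>"""


def find_homes(root, postcodes, bedroom_counts=("1", "2")):
    """Return house/apartment elements at any depth that match."""
    matches = []
    for tag in ("house", "apartment"):
        for count in bedroom_counts:
            # ".//tag" searches all descendants, so nesting depth
            # does not matter; the predicate filters on an attribute.
            path = f".//{tag}[@bedrooms='{count}']"
            for home in root.findall(path):
                if home.get("postcode") in postcodes:
                    matches.append(home)
    return matches


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for home in find_homes(root, {"1010"}):
        print(home.tag, home.get("id"))
```

The point is that the query doesn't care how deep an apartment is buried; that is the part XPath gives you for free.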

Example B: a vendor sends you accounting data. They are set up to send you one nasty format only. You have a third-party internal finance system that requires a separate specific format.

Source example:

    <document>
        <account name="1234">
            <payment date="20150808" ccy="USD" amt="500" />
            <payment date="20150810" ccy="USD" amt="600" />
            <payment date="20150810" ccy="NZD" amt="700" />
        </account>
    </document>

Destination example:

    <document>
        <date value="20150808" account="1234">
            <trans>USD 500</trans>
        </date>
        <date value="20150810" account="1234">
            <trans>USD 600</trans>
            <trans>NZD 700</trans>
        </date>
    </document>

You could definitely knock something up that did this transform in python or perl. Particularly if you were confident where the newlines would be. There are situations where this makes sense: XML tooling is not as strong in those platforms as Java/C#, and you may want colleagues to be able to maintain this stuff without them having to learn entirely new technology stacks.
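For a sense of what the "knock something up" version might look like, here is a minimal Python sketch of the Example B transform using an XML parser rather than newline guessing. It is an illustration under assumptions, not a recommended implementation: the function name `regroup_by_date` is made up, and the date is stored in an attribute called `value` on each `<date>` element, because a well-formed element needs some attribute name there.

```python
import xml.etree.ElementTree as ET

SOURCE = """<document>
    <account name="1234">
        <payment date="20150808" ccy="USD" amt="500" />
        <payment date="20150810" ccy="USD" amt="600" />
        <payment date="20150810" ccy="NZD" amt="700" />
    </account>
</document>"""


def regroup_by_date(source_xml):
    """Regroup per-account payments into per-date <date> elements."""
    src = ET.fromstring(source_xml)
    out = ET.Element("document")
    groups = {}  # (date, account name) -> output <date> element
    for account in src.findall("account"):
        for payment in account.findall("payment"):
            key = (payment.get("date"), account.get("name"))
            group = groups.get(key)
            if group is None:
                # First payment seen for this date/account pair.
                group = ET.SubElement(
                    out, "date", value=key[0], account=key[1]
                )
                groups[key] = group
            trans = ET.SubElement(group, "trans")
            trans.text = f"{payment.get('ccy')} {payment.get('amt')}"
    return out


if __name__ == "__main__":
    print(ET.tostring(regroup_by_date(SOURCE), encoding="unicode"))
```

This is fine at this size. The trouble starts when the selection and grouping rules multiply, which is where the hand-rolled version begins to reinvent XPath and XSLT.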

However, once you're dealing with a complex problem, XSLT+XPath are what you want. If you wrote Perl or Python to do this, your Perl or Python would evolve into 80% of a slow, ill-conceived, badly implemented ripoff of the Apache XPath+XSLT implementation. And you'd run into all kinds of problems with edge-case stuff like Unicode.

If I was building an editorial pipeline for a newspaper or publisher, it'd be XML+XSLT all the way. But there are a lot of places where I would avoid XML and not need XSLT.

XML is flawed for the domain where it gets the most action: system APIs. XML encourages complex, monolithic, document-separated interfaces. To correct for this, the community has layered yet more complex schema systems on top of it.

System interfaces should steer towards being tight, flat, specific, discoverable and stream-oriented. System interfaces with those qualities are easier to build and maintain and learn.

In place of XML, I prefer the approach below. At the start of your feed, assert the interface you think the other person should be receiving on. Then send messages over those vectors.

    # i lines assert the interface (emphasis: this is an assertion, /not/ an IDL)
    i ccy h
    i account h name
    i trans h date account_h ccy_h amount
        i leg trans_h amount
    #
    # now send your data stream over those vectors
    ccy USD
    ccy NZD
    account account/1234 "John Smith"
    trans trans/0 20150808 account/1234 USD 500
    trans trans/1 20150810 account/1234 USD 600
    trans trans/2 20150810 account/1234 NZD 700
        leg trans/2 200
        leg trans/2 500

If the receiver disagrees with the interface, then it errors at startup and not half way through the stream.
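As a rough illustration of that behaviour, here is a toy Python reader for the format above. To be clear, this is not the solent implementation and doesn't use its API; the function name `read_interface_script` and the `expected` parameter are made up for this sketch. It checks each `i` line against what the receiver expects as soon as it is read, then maps each data line onto the field names asserted for its vector.

```python
import shlex

SCRIPT = '''
# i lines assert the interface
i ccy h
i account h name
i trans h date account_h ccy_h amount
    i leg trans_h amount
#
ccy USD
ccy NZD
account account/1234 "John Smith"
trans trans/0 20150808 account/1234 USD 500
trans trans/1 20150810 account/1234 USD 600
trans trans/2 20150810 account/1234 NZD 700
    leg trans/2 200
    leg trans/2 500
'''


def read_interface_script(text, expected=None):
    """Return (interface, messages); raise ValueError on a mismatch.

    `expected` is an optional dict of vector name -> field list that
    the receiver believes in (a hypothetical convention).
    """
    interface = {}
    messages = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        tokens = shlex.split(line)  # keeps "John Smith" as one token
        if tokens[0] == "i":
            name, fields = tokens[1], tokens[2:]
            if expected is not None and expected.get(name) != fields:
                # Disagreement surfaces before any data is consumed.
                raise ValueError(f"interface mismatch on {name!r}")
            interface[name] = fields
        else:
            name, values = tokens[0], tokens[1:]
            if name not in interface:
                raise ValueError(f"undeclared vector {name!r}")
            fields = interface[name]
            if len(values) != len(fields):
                raise ValueError(f"wrong field count for {name!r}")
            messages.append((name, dict(zip(fields, values))))
    return interface, messages


if __name__ == "__main__":
    _, msgs = read_interface_script(SCRIPT)
    for name, record in msgs:
        print(name, record)
```

Because the `i` lines sit at the top of the feed, a mismatch raises before the first data line is touched.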

In this format, tree structures are possible. But you have to work for them. This nudges interfaces towards flat forms that are more greppable and awkable.

Imagine a complex business where all the interchange formats were captured in this interface script. A studious non-developer could quickly learn to really dance with it and think in terms of their data flows. They could discover things, and respond to emergencies with a text editor. You could give trusted users access to a kind of power that is rarely shared with non-developers. Users who have worked on systems like this talk of them in hushed tones that acknowledge the respect and power that was shown to them.

With XML it's harder to make reliable inferences about the schema, and harder to debug entry errors. For this reason you generally can't trust end-users with it.

Why have I gone through this? Because: if you're careful about designing your serialisation mechanisms, you can get further along before you need to resort to XSLT.

There are python3 parsing and producer mechanisms for interface script at github.com/cratuki/solent in package solent.util.interface_script (or: pip3 install solent). It wouldn't be much work to write Java/C# SAX interfaces to it.

wfunction (OP) 9y ago

Thank you for taking the time to write this!!
