Fun(ctional programming) with fold-left and transform

Over on StackOverflow someone asked how to apply an XSLT 1.0 stylesheet for merging two XML documents to all XML files in a directory. Both the question and the answer there use a shell script that runs the stylesheet with Saxon on two files, then merges the result with a third file, and so on.

However, XPath 3 provides the uri-collection function to process a sequence of files, and XPath 3.1 even offers a transform function to run a transformation directly from XSLT/XPath, so I thought it should also be possible to solve the problem completely in XSLT 3.0. In addition, the algorithm (process a sequence of input files, apply the merge transformation to each file in turn, accumulate the result) looked like an opportunity to use the fold-left function, which XSLT 3.0 supports because it is part of the XPath 3 functions and operators specification.

The original XSLT 1.0 stylesheet takes a primary input document and a parameter named with that provides the URL of the second file to be merged. The transform function provided in XPath 3.1 allows us to call such a stylesheet by passing in a single map argument with three entries: stylesheet-node for the stylesheet, source-node for the primary input, and stylesheet-params for the parameters. The value of stylesheet-params is a further map containing the single with parameter as an xs:QName -> xs:string key -> value pair. To encapsulate that I have come up with the following function mf:merge, which takes the primary input document as a node and the secondary input document URL as a string, and assumes the already loaded stylesheet is available as a global variable $merge-sheet:
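The listing itself is missing here, so what follows is only a sketch reconstructed from the description above, not necessarily the original code. The namespace URI bound to mf, the parameter merge-sheet-uri with its default merge.xslt, and the function parameter names are assumptions made for illustration.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="#all"
    version="3.0">

  <!-- Assumption: location of the original XSLT 1.0 merge stylesheet,
       resolved relative to this stylesheet -->
  <xsl:param name="merge-sheet-uri" as="xs:string" select="'merge.xslt'"/>

  <!-- The merge stylesheet is loaded once and reused for every call -->
  <xsl:variable name="merge-sheet" as="document-node()"
      select="doc($merge-sheet-uri)"/>

  <!-- Merges the document given as a URI into the node passed in first -->
  <xsl:function name="mf:merge" as="document-node()">
    <xsl:param name="input-node" as="node()"/>
    <xsl:param name="merge-uri" as="xs:string"/>
    <xsl:sequence select="
      transform(map {
        'stylesheet-node'   : $merge-sheet,
        'source-node'       : $input-node,
        'stylesheet-params' : map { QName('', 'with') : $merge-uri }
      })?output"/>
  </xsl:function>

</xsl:stylesheet>
```

The parameter name is built with QName('', 'with') because the with parameter is in no namespace.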


The transform function not only takes a map argument, it also returns a map. Inside the mf:merge function we simply look up the main transformation result in that map under the key output, using the lookup operator ?output, and return it.

With that setup, what's left is to process a sequence of input files, passing in the first file as a node to the above function, the second as a URL and then to merge the result with the third file and so on. That is where fold-left comes in handy as follows:
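A minimal sketch of such a function, reconstructed from the description in this post rather than copied from the original listing. It is a fragment meant to sit in the same stylesheet as the two-argument mf:merge, so it assumes the xsl, xs and mf prefixes are declared there. The xs:string+ type is my assumption, chosen to rule out an empty list of files.

```xslt
<!-- Fragment: belongs inside the xsl:stylesheet that declares the
     two-argument mf:merge and the xsl, xs and mf prefixes. -->
<xsl:function name="mf:merge" as="node()">
  <xsl:param name="input-uris" as="xs:string+"/>
  <!-- sequence to process, initial value, combining function -->
  <xsl:sequence select="
    fold-left(
      tail($input-uris),
      doc(head($input-uris)),
      mf:merge#2
    )"/>
</xsl:function>
```

Both functions can share the name mf:merge because XSLT distinguishes functions by arity.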


As you can see, that function simply takes a sequence of input file URIs and calls fold-left with three arguments: the tail of the sequence as the sequence to process, the loaded first file doc(head($input-uris)) as the initial value, and a named function reference mf:merge#2 to the function shown earlier.
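To make the order of the calls visible, here is a small stand-alone stylesheet that replaces the real merge with a stand-in that only builds a string. The file names a.xml to d.xml are made up for illustration. Like the real code it relies on higher-order functions, so in Saxon 9.8 it needs PE or EE.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

  <xsl:output method="text"/>

  <!-- Start at the initial template, e.g. with Saxon's -it option -->
  <xsl:template name="xsl:initial-template">
    <!-- The inline function stands in for mf:merge#2 and records each call -->
    <xsl:sequence select="
      fold-left(
        ('b.xml', 'c.xml', 'd.xml'),
        'doc(a.xml)',
        function($merged, $next) { concat('merge(', $merged, ', ', $next, ')') }
      )"/>
  </xsl:template>

</xsl:stylesheet>
```

The output is merge(merge(merge(doc(a.xml), b.xml), c.xml), d.xml): the first file is merged with the second, that result with the third, and so on, which is exactly what the shell script did step by step.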

The whole XSLT 3.0 stylesheet then looks as follows:
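Putting the pieces described in this post together, a complete stylesheet might look like the following sketch. Only file-selection-pattern and its default are taken from the post. The mf namespace URI, the parameters input-dir and merge-sheet-uri with their defaults, and the use of the initial template as entry point are assumptions, and the listing is a reconstruction, not the original.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="#all"
    version="3.0">

  <!-- Assumptions: directory with the input files and location of the
       original merge stylesheet, both relative to this stylesheet -->
  <xsl:param name="input-dir" as="xs:string" select="'input/'"/>
  <xsl:param name="merge-sheet-uri" as="xs:string" select="'merge.xslt'"/>

  <!-- Saxon-specific query part selecting the files in the directory -->
  <xsl:param name="file-selection-pattern" as="xs:string"
      select="'?select=*.xml'"/>

  <xsl:variable name="merge-sheet" as="document-node()"
      select="doc($merge-sheet-uri)"/>

  <!-- URIs only; the documents are loaded later, one merge at a time -->
  <xsl:variable name="input-uris" as="xs:string*"
      select="uri-collection($input-dir || $file-selection-pattern)"/>

  <xsl:output indent="yes"/>

  <!-- One merge step: run the XSLT 1.0 stylesheet through transform() -->
  <xsl:function name="mf:merge" as="document-node()">
    <xsl:param name="input-node" as="node()"/>
    <xsl:param name="merge-uri" as="xs:string"/>
    <xsl:sequence select="
      transform(map {
        'stylesheet-node'   : $merge-sheet,
        'source-node'       : $input-node,
        'stylesheet-params' : map { QName('', 'with') : $merge-uri }
      })?output"/>
  </xsl:function>

  <!-- All merge steps: fold over the remaining URIs -->
  <xsl:function name="mf:merge" as="node()">
    <xsl:param name="input-uris" as="xs:string+"/>
    <xsl:sequence select="
      fold-left(tail($input-uris), doc(head($input-uris)), mf:merge#2)"/>
  </xsl:function>

  <xsl:template name="xsl:initial-template">
    <xsl:sequence select="mf:merge($input-uris)"/>
  </xsl:template>

</xsl:stylesheet>
```

With these assumptions the stylesheet has no primary input document and is started at the initial template, for example with Saxon's -it command line option. If the directory contains no matching file, the call fails with a type error because mf:merge expects at least one URI.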


As you can see, it uses the uri-collection function with a Saxon-specific collection URI extension, supplied through <xsl:param name="file-selection-pattern" as="xs:string" select="'?select=*.xml'"/>, to read in the URIs of all .xml files in a folder, and then simply has to call mf:merge($input-uris).

The stylesheet can be run with Saxon 9.8 PE or EE. Unfortunately it does not work with 9.8 HE, for two reasons: first, higher-order functions like fold-left and named function references are not supported in HE; second, the original stylesheet is an XSLT 1.0 stylesheet, which 9.8 HE does not support. So to use the original stylesheet with 9.8 HE we would need to edit the original merge code to use version 3.0, and we would need to rewrite the functions to implement the recursion that fold-left provides without using a named function reference.

For the time being, here are three sample input documents to be merged, the original XSLT 1.0 stylesheet, and the result that Saxon 9.8 creates when executing it:


As promised in the first edit of this post, below I also show how to implement the same use of transform for Saxon 9.8 HE, avoiding fold-left and a named function reference and instead implementing the recursion in a user-defined function:
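Again the original listing is missing, so this is a sketch of how such a stylesheet could look, not the original code. The function name mf:merge-all, the mf namespace URI, and the parameters input-dir and merge-sheet-uri with their defaults are assumptions made for illustration.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="#all"
    version="3.0">

  <!-- Assumptions: input directory and location of the merge stylesheet -->
  <xsl:param name="input-dir" as="xs:string" select="'input/'"/>
  <xsl:param name="merge-sheet-uri" as="xs:string" select="'merge.xslt'"/>
  <xsl:param name="file-selection-pattern" as="xs:string"
      select="'?select=*.xml'"/>

  <xsl:variable name="merge-sheet" as="document-node()"
      select="doc($merge-sheet-uri)"/>

  <xsl:variable name="input-uris" as="xs:string+"
      select="uri-collection($input-dir || $file-selection-pattern)"/>

  <xsl:output indent="yes"/>

  <!-- One merge step, unchanged in spirit: call transform() -->
  <xsl:function name="mf:merge" as="document-node()">
    <xsl:param name="input-node" as="node()"/>
    <xsl:param name="merge-uri" as="xs:string"/>
    <xsl:sequence select="
      transform(map {
        'stylesheet-node'   : $merge-sheet,
        'source-node'       : $input-node,
        'stylesheet-params' : map { QName('', 'with') : $merge-uri }
      })?output"/>
  </xsl:function>

  <!-- Explicit recursion in place of fold-left: $merged is the
       accumulated result, $remaining-uris the files still to merge -->
  <xsl:function name="mf:merge-all" as="node()">
    <xsl:param name="merged" as="node()"/>
    <xsl:param name="remaining-uris" as="xs:string*"/>
    <xsl:sequence select="
      if (empty($remaining-uris))
      then $merged
      else mf:merge-all(
             mf:merge($merged, head($remaining-uris)),
             tail($remaining-uris))"/>
  </xsl:function>

  <xsl:template name="xsl:initial-template">
    <xsl:sequence select="
      mf:merge-all(doc(head($input-uris)), tail($input-uris))"/>
  </xsl:template>

</xsl:stylesheet>
```

The recursive function calls mf:merge directly by name, so no function item is ever passed around and the higher-order function feature is not needed.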



Make sure you edit the original merge XSLT 1.0 stylesheet to use version="3.0" to allow running it with Saxon 9.8 HE. This concludes this blog post.

