Using accumulators to number items in a streamable way

In both XSLT 1 and 2 you can use the powerful xsl:number instruction to count and number nodes. That instruction is of course also supported in XSLT 3, but if you want to use xsl:number with streaming you will find that its streamability is rather restricted:
 xsl:number can be used for formatting of numbers supplied directly using the value attribute, and also for numbering of nodes in a non-streamed document, but it cannot be used for numbering streamed nodes
This article shows that, using accumulators, you can instead write a stylesheet that counts and numbers nodes in XSLT 3 in a way that works both with and without streaming.

As an example, let's take a recent post on StackOverflow that wants to count and number the row and col elements in the input sample:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <row>
    <col>v11</col>
    <col>v12</col>
    <col>v13</col>
    <col>v14</col>
  </row>
  <row>
    <col>v21</col>
    <col>v22</col>
    <col>v23</col>
    <col>v24</col>
  </row>
</root>

and transform it to a flat structure with the row and col index as attributes on a data element:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <data row="1" col="1">v11</data>
  <data row="1" col="2">v12</data>
  <data row="1" col="3">v13</data>
  <data row="1" col="4">v14</data>
  <data row="2" col="1">v21</data>
  <data row="2" col="2">v22</data>
  <data row="2" col="3">v23</data>
  <data row="2" col="4">v24</data>
</root>
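
For comparison, here is a minimal sketch of how the same result could be produced without streaming, using xsl:number as in XSLT 1 and 2. It is not taken from the StackOverflow post; it is only meant to show the conventional approach that the accumulators replace, and it assumes the input has exactly the root/row/col structure shown above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Non-streaming sketch: numbers row and col with xsl:number -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes"/>
  <xsl:template match="root">
    <xsl:copy>
      <!-- select only the col elements so whitespace text nodes are not copied -->
      <xsl:apply-templates select="row/col"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="col">
    <data>
      <!-- position of the enclosing row among its row siblings -->
      <xsl:attribute name="row"><xsl:number count="row"/></xsl:attribute>
      <!-- position of this col among its col siblings -->
      <xsl:attribute name="col"><xsl:number/></xsl:attribute>
      <xsl:value-of select="."/>
    </data>
  </xsl:template>
</xsl:stylesheet>
```

Both xsl:number calls have to look at siblings that come before the current node, which is what makes this approach unsuitable for numbering streamed nodes.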

Using accumulators we can continue to use pattern matching (as with xsl:number) to match certain nodes in a document, and we can store a primitive value such as an xs:integer count in the accumulator. For the row elements we simply need to define an accumulator that matches them and write a rule that increments the value each time such an element is parsed:

<xsl:accumulator name="row-count" as="xs:integer" initial-value="0" streamable="yes">
  <xsl:accumulator-rule match="row" select="$value + 1"/>
</xsl:accumulator>

For the col elements we want to count the occurrences inside each row, so here we need two rules: the first resets the accumulator to 0 each time a row is parsed, and the second increments the accumulator value each time a col with a row parent is parsed:

<xsl:accumulator name="col-count" as="xs:integer" initial-value="0" streamable="yes">
  <xsl:accumulator-rule match="row" select="0"/>
  <xsl:accumulator-rule match="row/col" select="$value + 1"/>
</xsl:accumulator>

Taking all this together, we have a declarative way of storing and incrementing the counters, and we only need to read the values with the function accumulator-before in the template matching a col element:

<xsl:template match="col">
  <data row="{accumulator-before('row-count')}" col="{accumulator-before('col-count')}">{.}</data>
</xsl:template>
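
To make the interplay of the two accumulators concrete, here is the sample input again, annotated with the values the two counters take as each element starts. This is a hand-written trace, not processor output. It relies on accumulator rules firing at the start of a node by default, so the value that accumulator-before returns in the col template already includes the increment for that col.

```xml
<root>
  <row>              <!-- row-count becomes 1, col-count is reset to 0 -->
    <col>v11</col>   <!-- col-count becomes 1, giving row="1" col="1" -->
    <col>v12</col>   <!-- col-count becomes 2, giving row="1" col="2" -->
    <col>v13</col>   <!-- col-count becomes 3, giving row="1" col="3" -->
    <col>v14</col>   <!-- col-count becomes 4, giving row="1" col="4" -->
  </row>
  <row>              <!-- row-count becomes 2, col-count is reset to 0 -->
    <col>v21</col>   <!-- col-count becomes 1, giving row="2" col="1" -->
    <col>v22</col>   <!-- col-count becomes 2, giving row="2" col="2" -->
    <col>v23</col>   <!-- col-count becomes 3, giving row="2" col="3" -->
    <col>v24</col>   <!-- col-count becomes 4, giving row="2" col="4" -->
  </row>
</root>
```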

The whole stylesheet then looks as follows:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    expand-text="yes"
    version="3.0">
  <xsl:mode streamable="yes" use-accumulators="#all" on-no-match="shallow-skip"/>
  <xsl:output indent="yes"/>
  <xsl:accumulator name="row-count" as="xs:integer" initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="row" select="$value + 1"/>
  </xsl:accumulator>
  <xsl:accumulator name="col-count" as="xs:integer" initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="row" select="0"/>
    <xsl:accumulator-rule match="row/col" select="$value + 1"/>
  </xsl:accumulator>
  <xsl:template match="root">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="col">
    <data row="{accumulator-before('row-count')}" col="{accumulator-before('col-count')}">{.}</data>
  </xsl:template>
</xsl:stylesheet>

As you can see, the mode and the accumulators are declared as streamable, so this stylesheet works with streaming given an XSLT 3 processor like Saxon 9.8 EE. The nice thing about this approach is that an XSLT 3 processor that doesn't support streaming, like Saxon 9.8 HE or PE, simply ignores the requests for streaming, uses the accumulators, and runs the stylesheet in a conventional way, giving the wanted result.

To summarize, to count and number nodes in XSLT 3 we can use accumulators and write a stylesheet that exploits streaming for huge documents with a streaming processor like Saxon 9 EE but falls back to conventional processing with a non-streaming processor like Saxon HE or PE.

Comments

  1. Brilliant post. I played with these examples and went ahead to add additional scenarios, and was able to add summation logic for each "col" node by creating a new accumulator that can select "$value + current()".

  2. Thanks for your post. I was searching for some alternative of using Saxon to find counts. Your example using Accumulator worked like butter.

