Using accumulators to number items in a streamable way
In both XSLT 1 and 2 you can use the powerful xsl:number instruction to count and number nodes. That instruction is of course also supported in XSLT 3; however, if you want to use xsl:number with streaming, you will find that its streamability is rather restricted:
"xsl:number can be used for formatting of numbers supplied directly using the value attribute, and also for numbering of nodes in a non-streamed document, but it cannot be used for numbering streamed nodes."
On the other hand, you will see in this article that with accumulators you can write a stylesheet that counts and numbers nodes in XSLT 3 in a way that works both with and without streaming.
As an example, let's take a recent question on StackOverflow that asks how to count and number the row and col elements in this input sample:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <row>
    <col>v11</col>
    <col>v12</col>
    <col>v13</col>
    <col>v14</col>
  </row>
  <row>
    <col>v21</col>
    <col>v22</col>
    <col>v23</col>
    <col>v24</col>
  </row>
</root>
The goal is to transform it into a flat structure, with the row and col indices as attributes on a data element:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <data row="1" col="1">v11</data>
  <data row="1" col="2">v12</data>
  <data row="1" col="3">v13</data>
  <data row="1" col="4">v14</data>
  <data row="2" col="1">v21</data>
  <data row="2" col="2">v22</data>
  <data row="2" col="3">v23</data>
  <data row="2" col="4">v24</data>
</root>
Using accumulators, we can continue to use pattern matching (as with xsl:number) to match certain nodes in a document, and we can store a primitive value such as an xs:integer count in the accumulator. So for the row element we simply need to define an accumulator that matches the row elements, with a rule that increments the value each time such an element is parsed:
<xsl:accumulator name="row-count" as="xs:integer" initial-value="0" streamable="yes">
  <xsl:accumulator-rule match="row" select="$value + 1"/>
</xsl:accumulator>
For the col elements we want to count the occurrences inside each row, so here we need two rules: the first one resets the accumulator to 0 each time a row is parsed, the second one then simply increments the accumulator value each time a col inside a row parent is parsed:
<xsl:accumulator name="col-count" as="xs:integer" initial-value="0" streamable="yes">
  <xsl:accumulator-rule match="row" select="0"/>
  <xsl:accumulator-rule match="row/col" select="$value + 1"/>
</xsl:accumulator>
Taking all this together, we have a declarative way of storing and incrementing the counters, and we then only need to access the values with the function accumulator-before in the template matching a col element:
<xsl:template match="col">
  <data row="{accumulator-before('row-count')}" col="{accumulator-before('col-count')}">{.}</data>
</xsl:template>
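As the restriction quoted at the start says, xsl:number is still allowed in a streamable stylesheet when it only formats a number supplied through its value attribute. The following is a possible variation of the col template, not part of the original solution, in which the accumulators do the counting and xsl:number merely formats the results. It assumes the two accumulators and the streamable mode declared in this post; the format pictures "01" and "a" are illustrative choices:

```xml
<!-- Hypothetical variant of the col template: the accumulators supply
     the numbers, xsl:number with value="..." only formats them, which
     is the kind of xsl:number use that remains streamable -->
<xsl:template match="col">
  <data>
    <xsl:attribute name="row">
      <!-- format="01" gives zero-padded numbers: 01, 02, ... -->
      <xsl:number value="accumulator-before('row-count')" format="01"/>
    </xsl:attribute>
    <xsl:attribute name="col">
      <!-- format="a" gives alphabetic numbering: a, b, c, ... -->
      <xsl:number value="accumulator-before('col-count')" format="a"/>
    </xsl:attribute>
    <!-- reading the col's string value consumes it, as in the original template -->
    <xsl:value-of select="."/>
  </data>
</xsl:template>
```

With the sample input, the first cell would then come out as <data row="01" col="a">v11</data>. The XPath 3.0 function format-integer would be an alternative that fits into the original attribute value templates, e.g. col="{format-integer(accumulator-before('col-count'), 'a')}".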
The whole stylesheet then looks as follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    expand-text="yes"
    version="3.0">
  <xsl:mode streamable="yes" use-accumulators="#all" on-no-match="shallow-skip"/>
  <xsl:output indent="yes"/>
  <xsl:accumulator name="row-count" as="xs:integer" initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="row" select="$value + 1"/>
  </xsl:accumulator>
  <xsl:accumulator name="col-count" as="xs:integer" initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="row" select="0"/>
    <xsl:accumulator-rule match="row/col" select="$value + 1"/>
  </xsl:accumulator>
  <xsl:template match="root">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="col">
    <data row="{accumulator-before('row-count')}" col="{accumulator-before('col-count')}">{.}</data>
  </xsl:template>
</xsl:stylesheet>
As you can see, the mode and the accumulators are declared as streamable, so with an XSLT 3 processor that supports streaming, like Saxon 9.8 EE, this stylesheet works with streaming. The nice thing about this approach, however, is that an XSLT 3 processor that doesn't support streaming, like Saxon 9.8 HE or PE, simply ignores the requests for streaming, still uses the accumulators, and runs the stylesheet in the conventional way, giving the wanted result.
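For comparison, here is a sketch of the classic approach such a conventional run could also use: a col template based on xsl:number, written for the same input. Unlike the accumulator version it only works in a mode that is not declared streamable, because numbering a node means counting its preceding siblings (and here those of its parent row), which a streaming processor has already passed:

```xml
<!-- Non-streaming sketch only: use this in a mode WITHOUT streamable="yes" -->
<xsl:template match="col">
  <data>
    <xsl:attribute name="row">
      <!-- select=".." numbers the parent row among its row siblings -->
      <xsl:number select=".." count="row"/>
    </xsl:attribute>
    <xsl:attribute name="col">
      <!-- numbers the col among the col children of its row -->
      <xsl:number count="col"/>
    </xsl:attribute>
    <xsl:value-of select="."/>
  </data>
</xsl:template>
```

Both templates produce the same data elements for the sample; the accumulator version just keeps working when streaming is switched on.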
To summarize, to count and number nodes in XSLT 3 we can use accumulators and write a stylesheet that exploits streaming for huge documents with a streaming processor like Saxon 9 EE, but falls back to conventional processing with a non-streaming processor like Saxon HE or PE.
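Streaming is not limited to the principal input document. As a further sketch, not taken from the original post, the same mode and accumulators could be applied to a document read with the XSLT 3 instruction xsl:source-document, started from the initial named template xsl:initial-template. The file name input.xml is a hypothetical placeholder; the use-accumulators attribute lists both counters explicitly so that accumulator-before can be called on the nodes of that document:

```xml
<!-- Hypothetical entry point added to the stylesheet above; the transformation
     is started with the initial named template instead of a source document -->
<xsl:template name="xsl:initial-template">
  <!-- href="input.xml" is a placeholder; streamable="yes" requests streaming,
       a non-streaming processor reads the document into a tree instead -->
  <xsl:source-document href="input.xml" streamable="yes"
                       use-accumulators="row-count col-count">
    <!-- the context is the document node; the unnamed streamable mode
         with its root and col templates does the rest -->
    <xsl:apply-templates/>
  </xsl:source-document>
</xsl:template>
```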
Brilliant Post. I played with these examples and went ahead to add additional scenarios and was able to add summation logic of each "col" node by creating a new accumulator that can select "$value + current ()"
Thanks for your post. I was searching for some alternative of using Saxon to find counts. Your example using Accumulator worked like butter.