Using accumulators to number items in a streamable way
Both in XSLT 1 and 2 you can use the powerful xsl:number instruction to count and number nodes. That instruction is, of course, also supported in XSLT 3. However, if you want to use xsl:number with streaming, you will find that the streamability of that instruction is rather restricted:
"xsl:number can be used for formatting of numbers supplied directly using the value attribute, and also for numbering of nodes in a non-streamed document, but it cannot be used for numbering streamed nodes"
On the other hand, you will see in this article that using accumulators you can write a stylesheet to count and number nodes in XSLT 3 in a way that works with both streaming and non-streaming.
As an example, let's take a recent post on StackOverflow that wants to count and number the row and col elements in the input sample and transform it to a flat structure with the row and col index as attributes on a data element:
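To make the task concrete, here is a hypothetical input of that shape. Only the element names row, col and data come from the description above; the root element name and the cell values are assumptions made purely for illustration.

```xml
<root>
  <row>
    <col>a</col>
    <col>b</col>
  </row>
  <row>
    <col>c</col>
    <col>d</col>
    <col>e</col>
  </row>
</root>
```

The flat result would then look roughly like this, with each data element carrying the position of its row and the position of the col inside that row:

```xml
<root>
  <data row="1" col="1">a</data>
  <data row="1" col="2">b</data>
  <data row="2" col="1">c</data>
  <data row="2" col="2">d</data>
  <data row="2" col="3">e</data>
</root>
```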
Using accumulators, we can continue to use pattern matching (as with xsl:number) to match certain nodes in a document, and we can store a primitive value such as an xs:integer count with the accumulator. So for the row element we simply need to define an accumulator matching those row elements and write a rule that increments the value each time such an element is parsed:
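A minimal sketch of such a declaration could look like the following fragment. The accumulator name row-count is a hypothetical choice, and the fragment is assumed to sit as a top-level declaration inside an xsl:stylesheet element that binds the xsl and xs prefixes to the usual XSLT and XML Schema namespaces.

```xml
<!-- Hypothetical name "row-count"; starts at 0 for each document -->
<xsl:accumulator name="row-count" as="xs:integer"
    initial-value="0" streamable="yes">
  <!-- Fires when the start of a row element is reached;
       $value holds the accumulator's current value -->
  <xsl:accumulator-rule match="row" select="$value + 1"/>
</xsl:accumulator>
```

Inside an accumulator rule the variable $value is implicitly declared and refers to the value before the rule fires, so the select expression only has to describe the new value.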
For the col elements we want to count the occurrences inside each row, so here we need to write two rules: one that resets the accumulator to 0 each time a row is parsed, and a second that increments the accumulator value each time a col inside a row parent is parsed:
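The two rules might be sketched as follows. Again the name col-count is hypothetical, and the fragment assumes an enclosing xsl:stylesheet with the xsl and xs prefixes declared.

```xml
<!-- Hypothetical name "col-count"; counts col elements within the current row -->
<xsl:accumulator name="col-count" as="xs:integer"
    initial-value="0" streamable="yes">
  <!-- Rule 1: reset the counter whenever a new row starts -->
  <xsl:accumulator-rule match="row" select="0"/>
  <!-- Rule 2: increment for each col that is a child of a row -->
  <xsl:accumulator-rule match="row/col" select="$value + 1"/>
</xsl:accumulator>
```

Because the reset happens at the start of every row, the value seen at the first col of each row is 1 again, which gives the per-row numbering.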
Taking all this together, we have a declarative way of storing and incrementing the counters, and then only need to access the values using the function accumulator-before in the template matching a col element:
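Assuming two accumulators with the hypothetical names row-count and col-count, one counting row elements and one counting col elements per row, such a template could be sketched like this (a fragment that belongs inside an xsl:stylesheet with the xsl prefix declared):

```xml
<xsl:template match="col">
  <!-- accumulator-before returns the value after the start-phase rules
       for the current col element have been applied -->
  <data row="{accumulator-before('row-count')}"
        col="{accumulator-before('col-count')}">
    <!-- Copy the text content of the col element -->
    <xsl:value-of select="."/>
  </data>
</xsl:template>
```

Reading the accumulators does not consume any input, so the template only consumes the col element once, through xsl:value-of, which keeps it compatible with streaming.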
The whole stylesheet then looks as follows:
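A minimal sketch of a complete stylesheet along these lines is shown below. The accumulator names, the template for the outermost element, and the choice of on-no-match="shallow-skip" are assumptions made for illustration rather than a reproduction of the original listing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

  <!-- Streamable default mode; nodes without a matching template are skipped,
       but the children of unmatched elements are still visited -->
  <xsl:mode streamable="yes" on-no-match="shallow-skip"
      use-accumulators="row-count col-count"/>

  <xsl:output indent="yes"/>

  <!-- Counts row elements in document order (hypothetical name) -->
  <xsl:accumulator name="row-count" as="xs:integer"
      initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="row" select="$value + 1"/>
  </xsl:accumulator>

  <!-- Counts col elements inside the current row (hypothetical name) -->
  <xsl:accumulator name="col-count" as="xs:integer"
      initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="row" select="0"/>
    <xsl:accumulator-rule match="row/col" select="$value + 1"/>
  </xsl:accumulator>

  <!-- Keep the outermost element, whatever its name, as the result wrapper -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Flatten each col into a data element carrying both counters -->
  <xsl:template match="col">
    <data row="{accumulator-before('row-count')}"
          col="{accumulator-before('col-count')}">
      <xsl:value-of select="."/>
    </data>
  </xsl:template>

</xsl:stylesheet>
```

For an input whose outermost element contains two row elements with two and three col children respectively, the data elements would carry row="1" col="1", row="1" col="2", then row="2" col="1" through row="2" col="3".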
As you can see, the mode and the accumulators are declared as streamable, so this stylesheet works with streaming given an XSLT 3 processor like Saxon 9.8 EE. The nice thing about this approach, however, is that an XSLT 3 processor like Saxon 9.8 HE or PE that doesn't support streaming simply ignores the requests for streaming, uses the accumulators, and runs the stylesheet in a conventional way, giving the wanted result.
To summarize, to count and number nodes in XSLT 3 we can use accumulators and write a stylesheet that exploits streaming for huge documents with a streaming processor like Saxon 9 EE but falls back to conventional processing with a non-streaming processor like Saxon HE or PE.
Brilliant post. I played with these examples and went ahead to add additional scenarios, and was able to add summation logic for each "col" node by creating a new accumulator that can select "$value + current()".
Thanks for your post. I was searching for some alternative of using Saxon to find counts. Your example using Accumulator worked like butter.