Saturday, July 3, 2010

Overcome Slow Performance in SSIS with SQL FOR XML Clause

SQL Server Integration Services has a significant performance issue with large result sets returned by SQL'ss FOR XML clause. The problem is due to memory consumption and the way SSIS's data flow engine handles row-based results. When SSIS receives a data stream, it buffers a portion of the stream, performs the necessary operations on the rows in the buffer, clears the buffer, and repeats the process until there is no more data remaining in the stream. The problem with the way FOR XML returns data is that it doesn't return rows, but rather a result set with a single row, which is seen by SSIS as a large Binary Large Object ( BLOB).

The initial reaction many developers have to this issue is to decrease the DefaultBufferSize property of data flow task. Unfortunately, this won't help at all since the buffer is not able work with fractional rows, which means the buffer(s) will continue to fill beyond their threshold until a whole number of rows is reached. With a single row, this means the entire stream will be loaded into memory. SQL 2005 and above provide a solution to this problem with the ability to shred the xml data type into individual rows. Compare the result returned by the below queries.

DECLARE @XML xml

SET @XML = (SELECT
UserID,
FirstName,
LastName
FROM
Users
FOR XML PATH('User'))
SELECT @XML AS UserXML

results with a single row

DECLARE @XML xml

SET @XML = (SELECT
UserID,
FirstName,
LastName
FROM
Users
FOR XML PATH('User'))


SELECT nref.query('.') AS UserXML
from @XML.nodes('//User') AS R(nref)

results with multiple rows SSIS can efficiently buffer the result set returned by the second query, which will enable large sets to be processed without issue.

No comments:

Post a Comment