Should I Replace My FOR XML PATH String Merges with String_agg?

By Jeffry Schwartz | Function

Mar 07

If you are looking for major performance gains, the answer is no.  However, if you are using SQL Server 2017 and developing new code, you should consider using the new String_agg function that is available on SQL Server 2017 because code that uses this function is so much more readable and easier to write and maintain.  This article compares the performance of the two methods while explaining the differences in the code.  The examples provided employ lists of SQL Server table names that are merged into a dynamic SQL clause that can be used in where or join clauses.

Testing Strategy

Since list length is often a concern when splitting or merging items, various numbers of records were merged and timed.  A relatively small table was chosen to minimize the impact of I/O, and caches were purged before the execution of each method.  Since the individual execution times were very small, the pertinent code was placed within a 10,000-iteration loop.  The tested numbers of records to be merged into a string were as follows:  225, 450, 675, 900, and 1125, and the loop code is shown below.

declare @NumRows int = 1125 -– # of rows to be merged
declare @strSeparator nvarchar(20) = ' or '
declare @MaxLoop int = 10000
declare @LoopInx int = 0
declare @strFilter nvarchar(max) = ''
set nocount on
while @LoopInx <= @MaxLoop 
begin
-– String merge code is inserted here 
	set @LoopInx += 1
end
set nocount off
go

Each test was performed within its own batch on an otherwise idle system.  Run-time performance was captured using an Extended Events session, which was configured to capture sqlserver.sql_batch_completed events ONLY to minimize overhead.  The session definition, start, and stop code are shown below.  This session collects information for completed batches that ran longer than 10 microseconds and were executed from database #7.

use master;
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='SQLRxPerformanceMonitoring')
    DROP EVENT session SQLRxPerformanceMonitoring ON SERVER
GO

 -- replace database_id(s) with list
CREATE EVENT SESSION SQLRxPerformanceMonitoring ON SERVER 
ADD EVENT sqlserver.sql_batch_completed(
SET collect_batch_text=(1)
ACTION(sqlserver.server_instance_name, sqlserver.database_id, sqlserver.session_id, sqlserver.client_pid, sqlserver.client_app_name, sqlserver.database_name, sqlserver.server_principal_name, sqlserver.client_hostname, sqlserver.tsql_frame, sqlserver.tsql_stack, sqlserver.sql_text, sqlserver.request_id, sqlserver.query_hash, sqlserver.query_plan_hash, package0.event_sequence)
    	WHERE ([duration] > (10) and (database_id = 7)) -- values must be hard-coded
) -- replace database_id
ADD TARGET package0.event_file(SET filename= N'E:\SQLRx\SQLScriptOutput\DevPerf.xel')
WITH (MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY=5 SECONDS, MAX_EVENT_SIZE=0 KB, MEMORY_PARTITION_MODE=NONE, TRACK_CAUSALITY=OFF, STARTUP_STATE=OFF)
GO

-- entry exists when session is active
if not exists(select * from [master].[sys].[dm_xe_sessions] where [name] = 'SQLRxPerformanceMonitoring')
	alter EVENT SESSION SQLRxPerformanceMonitoring ON SERVER STATE = START

-- entry exists when session is active
if exists(select * from [master].[sys].[dm_xe_sessions] where [name] = 'SQLRxPerformanceMonitoring')
	alter EVENT SESSION SQLRxPerformanceMonitoring ON SERVER STATE = STOP

The Code

Two different T-SQL methods were used for testing.  The FOR XML PATH solution incorporates the reverse and stuff functions along with FOR XML PATH.  The String_agg solution is much simpler, but is only available on SQL Server 2017 currently.

The FOR XML PATH method has been documented thoroughly over the years in many articles, so these explanations will not be repeated here.  However, two points are worth mentioning.  The first one is that the @strSeparator (in this case ‘ or  ’) string will reside at the end of the constructed string (in this case @strFilter).  Since the last occurrence of @strSeparator must be removed to prevent a syntax error, characters must be dropped from the end of the string.  Although it may seem obvious that four characters (the length of @strSeparator) should be removed, actually only three need be removed because the end of the constructed string is “r” not “ “.  Therefore, the length argument of the left function specification is the length of @strFilter minus three.

The second point of interest concerns the parentheses that surround the select statement.  They may seem unnecessary, but they are important because without them, the following errors occur:

Msg 156, Level 15, State 1, Line 26
Incorrect syntax near the keyword 'select'.

Msg 102, Level 15, State 1, Line 28
Incorrect syntax near ')'.

The second method, String_agg, is much simpler to implement, but does have a restriction, as shown below:

Msg 9829, Level 16, State 1, Line 62
STRING_AGG aggregation result exceeded the limit of 8000 bytes. 
Use LOB types to avoid result truncation.

This limitation can be circumvented by converting the [Full Table Name] column to an nvarchar(max) variable, so that the overall string length no longer matters.

The code for the two methods is shown below:

-- FOR XML PATH
select @strFilter = reverse( stuff( reverse(
	(select 'FullTableName = ' + '''' + [Full Table Name] + '''' +
@strSeparator as [text()]
from IndexColumnAnalysisList 
where RecID <= @NumRows for xml path('')
)
) -- first reverse
	,1,1,'') -- stuff
	) -- second reverse
Set @strFilter = left(@strFilter, len(@strFilter) - 3) – drop off last ‘ or ’
-- FOR XML PATH

-- String_agg
select @strFilter = string_agg('FullTableName = ' + '''' + 
cast([Full Table Name] as nvarchar(max)) + '''', @strSeparator)
	from IndexColumnAnalysisList
	where RecID <= @NumRows
–- String_agg

Test Results

Results of the tests are shown in the Figure 1 below.  Clearly, there was very little difference between the two methods, although the number of elements directly affected performance of both methods. Durations were equal to processor times and proportional to the number of records that were merged.  The query plans for the two methods were quite similar and shown in Figure 2 and Figure 3, respectively.

Figure 1: Average Execution Time

Figure 2: FOR XML PATH Execution Plan

Figure 3: String_agg Execution Plan

Conclusion

In summary, existing T-SQL code need not be replaced.  However, if possible, new development that will run on SQL Server 2017 or higher should use the String_agg function for simplicity, improved readability, and easier maintenance.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

 

About the Author

  • […] Jeffry Schwartz compares the performance of STRING_AGG in SQL Server 2017 versus the tried and true method of FOR XML PATH for string concatenation: […]

  • >