How Indexing Affects Deletion Queries

Aug 10

The Problem

Many articles concerning SQL Server discuss how record insertion overhead increases with each additional index. They discuss b-tree manipulations and page splits in addition to leaf and non-leaf levels. However, few discuss the fact that deletion overhead increases as well, especially when large numbers of records are deleted by individual queries. Recently, I was working with a client who regularly purged large numbers of records from tables that ranged in size from large to gigantic. For example, one table contained over 6.5 billion records. I added an index (4^th overall) to one table expressly for the purpose of expediting the large deletion process, and the deletion run ran longer, despite using the new index! To determine how the numbers of indices and records to be deleted interact, I conducted an experiment to test several combinations. The specifics of the tests and their corresponding results are summarized below.

Test Table Creation & Load

To determine deletion/index behavior, a generic table was constructed and filled with 20 million records. Each record contained an identity column, an ID column, a text column, and 47 metric columns whose random values ranged between 1 and 1,000,000,000. The large number of table columns was used to insure SQL Server would choose an index option when appropriate. To minimize duplication of column values and create a uniform distribution of values, the ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase formula was used to generate values that were as random as possible, where @ModBase was set to one billion. Three indices were created initially: a clustered index that used the identity column as its only key, a nonclustered index that used DupID as its only column, and a nonclustered index that used Metric43 and Metric14 as keys with Metric01, Metric02, Metric03, and Metric04 as included columns. The scripts for the creation, loading, and initial indexing of the table are shown below.

— ##############################################################

— Create test table

— ##############################################################

drop table FewDuplicates;

CREATE TABLE FewDuplicates (

IDCol bigint identity (20000000,1),

DupID bigint,

MyText varchar(10),

Metric01 bigint, Metric02 bigint, Metric03 bigint, Metric04 bigint,

Metric05 bigint, Metric06 bigint, Metric07 bigint, Metric08 bigint,

Metric09 bigint, Metric10 bigint, Metric11 bigint, Metric12 bigint,

Metric13 bigint, Metric14 bigint, Metric15 bigint, Metric16 bigint,

Metric17 bigint, Metric18 bigint, Metric19 bigint, Metric20 bigint,

Metric21 bigint, Metric22 bigint, Metric23 bigint, Metric24 bigint,

Metric25 bigint, Metric26 bigint, Metric27 bigint, Metric28 bigint,

Metric29 bigint, Metric30 bigint, Metric31 bigint, Metric32 bigint,

Metric33 bigint, Metric34 bigint, Metric35 bigint, Metric36 bigint,

Metric37 bigint, Metric38 bigint, Metric39 bigint, Metric40 bigint,

Metric41 bigint, Metric42 bigint, Metric43 bigint, Metric44 bigint,

Metric45 bigint, Metric46 bigint, Metric47 bigint

)

— ##############################################################

— Load original table

— ##############################################################

declare @DupID bigint = 1

declare @NumRecs bigint = 20000000

declare @ModBase bigint = 1000000000

truncate table FewDuplicates

set nocount on

while (@DupID <= @NumRecs)

begin

insert into [dbo].[FewDuplicates] (

[DupID], [MyText]

,[Metric01], [Metric02], [Metric03], [Metric04], [Metric05], [Metric06]

,[Metric07], [Metric08], [Metric09], [Metric10], [Metric11], [Metric12]

,[Metric13], [Metric14], [Metric15], [Metric16], [Metric17], [Metric18]

,[Metric19], [Metric20], [Metric21], [Metric22], [Metric23], [Metric24]

,[Metric25], [Metric26], [Metric27], [Metric28], [Metric29], [Metric30]

,[Metric31], [Metric32], [Metric33], [Metric34], [Metric35], [Metric36]

,[Metric37], [Metric38], [Metric39], [Metric40], [Metric41], [Metric42]

,[Metric43], [Metric44], [Metric45], [Metric46], [Metric47]

)

VALUES (

@DupID,‘my text’,

ABS(cast(CHECKSUM(NewId()) as bigint)) % @ModBase,

)

set @DupID += 1

end — group option loop

set nocount off

— ##############################################################

— Create indices on the test table

— ##############################################################

CREATE UNIQUE CLUSTERED INDEX [ci_RecID] ON [dbo].[FewDuplicates]

(

[IDCol] ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

CREATE NONCLUSTERED INDEX [ix_DupID] ON [dbo].[FewDuplicates]

(

DupID ASC

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

CREATE NONCLUSTERED INDEX ix_CombinedIndex ON [dbo].[FewDuplicates]

(

[Metric43],

[Metric14]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric03],

[Metric04]

)

WITH (fillfactor = 100, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

Queries that Perform Varying Numbers of Record Deletions

To illustrate the issue, six queries were created that deleted various numbers of records and they are listed in Table 1. The deletions were designed to use the ix_CombinedIndex index, thereby emulating the client situation in which the optimal index for deletions was used. The only differences involved the numbers of records deleted and these are highlighted in the table below for easier comparison. The selected values were chosen so that small, medium, and large numbers of records would be deleted.

Table 1: All Six Deletion Queries

The plan for Query #4 with three indices is shown below in Figure 1 and it is fairly simple, as one would expect. Things become more complicated with additional indices, as discussed in the next section.

Figure 1: Query Plan for Deletion Query #4 with Three Indices

Indices that Cause Additional Deletion Overhead

To examine the effects of additional indices on the deletion queries, up to eight were added to the table using different keys. None of the indices was used for direct data access and the index definitions are shown below, followed by the commands to delete them. Only 4 of these indices were used for the 7-index test, whereas all 8 were used for the 11-index test.

— create indices

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber1 ON [dbo].[FewDuplicates]

(

[Metric01],

[Metric02]

)

INCLUDE (

[Metric03],

[Metric04],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber2 ON [dbo].[FewDuplicates]

(

[Metric03],

[Metric04]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber3 ON [dbo].[FewDuplicates]

(

[Metric05],

[Metric06]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber4 ON [dbo].[FewDuplicates]

(

[Metric07],

[Metric08]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber5 ON [dbo].[FewDuplicates]

(

[Metric09],

[Metric10]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber6 ON [dbo].[FewDuplicates]

(

[Metric11],

[Metric12]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber7 ON [dbo].[FewDuplicates]

(

[Metric13],

[Metric14]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric44]

)

CREATE NONCLUSTERED INDEX ix_AddedIndexNumber8 ON [dbo].[FewDuplicates]

(

[Metric15],

[Metric16]

)

INCLUDE (

[Metric01],

[Metric02],

[Metric43],

[Metric14]

)

— drop indices

drop INDEX ix_AddedIndexNumber1 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber2 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber3 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber4 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber5 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber6 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber7 ON [dbo].[FewDuplicates]

drop INDEX ix_AddedIndexNumber8 ON [dbo].[FewDuplicates]

Figure 2 shows the plan for the same query that was shown in Figure 1, but this time with 11 indices on the table instead of 3. Clearly, the plan is MUCH more complex and the effects of the additional indices are obvious. This suggests what the test metrics will ultimately confirm: work grows demonstrably as the number of indices increases.

Blog_20170810_3

Figure 2: Query Plan for Deletion Query #4 with 11 Indices

Test Results

A total of 18 tests were conducted. Three index configurations were used with the following numbers of indices: 3, 7, and 11. Queries #1 – #6 were run against each configuration. As shown in Figure 3 through Figure 6, most of the metrics are comparable until the 100,000-record level is exceeded, at which point great divergence occurs. One of the most interesting findings involves writing, whose curve shape differs completely from those of the other metrics. These figures graphically illustrate why many deletions of small-to-moderate numbers of records are not often noticed as the number of indices increases. However, it also illustrates clearly how the size of the deletion and the number of indices can combine to negate the improvement provided by a specialty deletion index.

Figure 3: Deletion Queries – Duration (Seconds)

Figure 4: Deletion Queries – CPU (Seconds)

Figure 5: Deletion Queries – Reads

Figure 6: Deletion Queries – Writes

Conclusion

This article illustrated a situation in which the amount of work performed by deletion queries increased dramatically as the number of records to be deleted increased beyond approximately 100,000. Although this particular value is obviously applicable only to the test case, the overall message is clear – as the number of records to be deleted increases well beyond 100,000, the work performed by the query increases dramatically. In addition, this situation worsens considerably as the number of indices spanning a table increases. Therefore, although the overhead associated with deleting low numbers of records may not be noticed as indices are added, the performance of queries that delete large numbers of records will degrade noticeably as the number of indices increases. In some cases, the overhead may negate any improvement that might be gained by adding an index whose purpose is to expedite the deletion process. Therefore, before considering adding an index to improve deletion performance, insure that the batch of deleted records is not too large and the number of indices on the table is small.

For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!

Deletes And Indexes – Curated SQL says:

August 11, 2017 at 7:00 am

[…] Jeff Schwartz looks at the performance cost of indexes when it comes to deleting rows: […]

Ray Herring says:

August 12, 2017 at 10:52 am

This is the exact use case that led me to develop an extensive and automated process to manage updatable, partitioned views. (note: not Table Partitions).
The process is essentially a date based, sliding window that automatically creates a new “sub-table” for the upcoming time period and then recreates the updatable view (removing the oldest sub-table and adding the new subtable). The removed table is then Truncated and Dropped (or in some cases moved to a history database).
Using this approach we can quickly (less than 1 minute) remove more than 200GB from the system.
In our case Enterprise Edition is not an option, but when I studied Partitioned tables I was not confident they would provide a significant improvement over our partitioned view approach.