In Parts 1 through 3 of this blog series the author discussed various aspects of SQL Server indices such as:
Part 4 covers the following topics:
The query optimizer determines the appropriate execution plan for a query. “The SQL Server Query Optimizer is a cost-based optimizer. It analyzes a number of candidate execution plans for a given query, estimates the cost of each of these plans and selects the plan with the lowest cost of the choices considered. Indeed, given that the Query Optimizer cannot consider every possible plan for every query, it actually has to do a cost-based balancing act, considering both the cost of finding potential plans and the costs of plans themselves.”
“Cost-assessment of each plan – While the Query Optimizer DOES NOT GENERATE EVERY POSSIBLE EXECUTION PLAN, it assesses the resource and time cost of each plan it does generate. The plan that the Query Optimizer deems to have the lowest cost of those it’s assessed is selected, and passed along to the Execution Engine.” Source: http://www.simple-talk.com/sql/sql-training/the-sql-server-query-optimizer. Therefore, the best plan is NOT GUARANTEED and too many choices yield indeterminate results.
Statistics contain information about distribution of values in one or more columns of table or indexed view. The Query Optimizer uses statistics to estimate the number of rows that a particular query operator will generate. Outdated statistics can cause a query to use a suboptimal index or perform a full scan, so it is prudent to insure that they are up-to-date as much as possible. Large numbers of insertions or key modifications can cause them to be skewed and inaccurate with respect to the current data because even when statistics are updated automatically they may lag behind the actual data. The AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are automatically on by default in Tempdb and often on by default in many other databases. Please see the following article for a complete discussion of statistics and when SQL Server determines that they are sufficiently out-of-date to warrant updating: https://msdn.microsoft.com/en-us/library/dd535534(v=sql.100).aspx.
Out-of-date statistics can be determined with very little overhead by using the stats_date (object_id, index_id) function and the sys.indexes view. Since creating or updating statistics can be expensive depending upon table size, sample size, and frequency, several options are available to keep the overhead at a reasonable level and these are shown below.
Balancing the need for updated statistics and the costs of creating or updating them can cause customers to turn off all automatic options and then update them manually using the preceding commands. However, sometimes these good intentions can cause surprisingly out-of-date statistics as shown in Table 4 below. This customer did not intend for this situation to occur, but as the reader can see clearly from the example, the statistics of many large tables were not updated for over a year. Obviously, this is an undesirable situation, especially since many thousands of records were inserted into some of these tables daily.
Larger numbers of indices create exponentially more query plan possibilities. When too many choices exist, the Optimizer will give up partway through and just pick the best plan thus far. As more indices are added the problem worsens and compilation times, i.e., processor times, increase to a point. This can be illustrated best by reviewing an actual customer example. In this case, one table had 144 indices attached to it and several others had between 20 and 130 indices. The queries were quite complex with as many as fifteen joins, many of which were outer joins. Query and index tuning were impossible because query performance was often counterintuitive and sometimes nonsensical. Adding an index that addressed a specific query need often made the query run worse one time and better the next. Note: Cached query plan issues, e.g., parameter sniffing or plan reuse were not problems in this case. The only solution was to tear down ENTIRE indexing structure and rebuild it with SQL Server’s guidance and nine days’ worth of production queries. Table 5 summarizes the results of the index restructuring project. The performance of 98 percent of the queries was comparable to or better than it was when the large numbers of indices were present.
The original application was built in Microsoft Access and each office used its own version of the database. The company decided to consolidate all of the Access databases into a single corporate-wide SQL Server database and implemented most of the indices that had been used within Access. No formal or cohesive indexing strategy was ever implemented. Desperation combined with bad luck and extremely complex queries created a perfect environment for this kind of problem to arise. The list below summarizes the key points regarding how they got themselves into this mess.
Every record insertion creates overhead maintenance work because all non-subset indices are updated for every insert. Page splits may be involved if a page has been filled already. Upper index levels may be affected and may split as well. A page split is defined as follows: “When a new row is added to a full index page, the Database Engine moves approximately half the rows to a new page to make room for the new row. This reorganization is known as a page split. A page split makes room for new records, but can take time to perform and is a resource intensive operation. Also, it can cause fragmentation that causes increased I/O operations.” Locks are held while all this happens, and fillfactor and insertion rate affect frequency of this activity. Further research can be conducted via http://msdn.microsoft.com/en-us/library/ms177459.aspx.
Increased overhead leads to lower transaction throughput, increased blocking, and increased deadlocking. These worsen if queries do not use the Nolock query hint because Page and Row locks are acquired and potentially held longer. Use of Nolock can greatly reduce these problems for selection queries — do not forget to add the Nolock hint to join clauses. Research transaction level isolation for further information. In high volume record insertion cases, page latch wait time can become a major constraining factor.
The next article in this blog series will cover how to detect indices whose functionality is duplicated by other indices. Until then….happy tuning and please contact us with any questions.
For more information about blog posts, concepts and definitions, further explanations, or questions you may have…please contact us at SQLRx@sqlrx.com. We will be happy to help! Leave a comment and feel free to track back to us. Visit us at www.sqlrx.com!
Please log in again. The login page will open in a new window. After logging in you can close it and return to this page.