Tempdb
: Since there is only one tempdb for each SQL Server instance, this can be a performance and a disk space bottleneck. The number of times my applications have slowed (or even stopped working altogether) due to the tempdb being full! Like other databases, tempdb needs monitoring for space issues usually caused by excessive DDL and/or DML operations.
A combination of any of the above can prove fatal for applications and in the main, your userbase. For instance, if a large table is missing a useful index, this will cause the query to slow down. As a result of that, it also puts heavy pressure on the I/O subsystem to read the unnecessary data pages and on memory which has to store these pages in the cache.
So how do I know which one is it?
I liken it to �driving into work� � you drive to work using the same, trusted route that you�ve been traveling for years. But over time new, quicker routes may have become available and you don�t realize how much quicker it could be if you simply changed the way you traveled in. Or, more likely, unwanted roadworks may result in temporary diversion routes.
You need to find the quickest & most convenient route to your desk & that much needed first coffee!
All good literature will tell you it�s vital to have some idea of what you deem as �normal performance�. By normal I mean acceptable. You need to have some kind of baseline, upon which to measure against.
i.e. what does your database run like, how does memory, CPU and I/O operate when all your users are happy and the overall system is �hunky dory�??
If you�re reading this and you have no performance problems (yet!), then collect some baseline information � now. It will aid the diagnosis process.
You might find that the problem is a resource that is running near capacity and that SQL Server cannot support the workload in its current configuration. To address this issue, you may need to add more processing power, memory, or increase the bandwidth of your I/O or network channel.
But, before you take that step, it is useful to understand some common causes of resource bottlenecks.
CPU Bottlenecks
People may automatically think � I�ll just throw money at the server and add some more and/or faster processors - that will fix it. But before rushing around the January sales, you should see whether your existing CPUs are tuned properly in the first place.
Processor bottlenecks usually happen suddenly (and mostly unexpectedly).
Because of either poor configuration, poor design or not using the most optimal query plans.
The Processor:% Processor Time counter in Perfmon (System Monitor) is a good place to start � if the value is high (and Microsoft deem +80% to be high here) then yes you have a CPU bottleneck. For more information on System Monitor counters see Books Online.
There are various performance monitoring tools on the market to continually keep an eye on counters such as these and allows the DBA to be alerted to any potential causes for concern. Read the article
http://www.gre-sqlserver-solutions.com/MonitoringTools.html
as an example of their use.
Some other potential causes:
Schedulers
Another symptom of CPU bottlenecks is a high number of runnable tasks. You can monitor the SQL Server schedulers using the sys.dm_os_schedulers view to see if the number of runnable tasks is typically nonzero. A nonzero value indicates tasks have to wait for their time slice to run.
Recompilations
When a batch or remote procedure call (RPC) is submitted to SQL Server, before it begins executing, the server checks for the validity and correctness of the query plan.
If one of these checks fails, the batch may have to be compiled again to produce a different query plan. These are known as recompilations.
And going back to my �driving to work� metaphor, if a route continually changes and you keep having to alter your routine or if roadworks continue to hinder the journey, it will soon become a hassle!
Recompilations used to be a bigger problem in SQL 2000 because when it recompiled a stored procedure, the entire stored procedure was recompiled, not just the statement that triggered the recompile.
Thankfully this has been improved in SQL 2005 (only the statement that caused the
recompilation is compiled�not the entire procedure). This obviously uses less CPU
bandwidth and results in less contention overall.
Perfmon can also be a very effective tool in diagnosing recompilations:
The SQL Statistics object provides counters to monitor compilation and the type of requests that are sent to an instance of SQL Server. You must monitor the number of query compilations and recompilations in conjunction with the number of batches received to find out if the compiles are contributing to high CPU use.
The key data counters to look are as follows.
-
SQL Server: SQL Statistics: Batch Requests/sec
-
SQL Server: SQL Statistics: SQL Compilations/sec
-
SQL Server: SQL Statistics: SQL Recompilations/sec
Ideally, the ratio of SQL Recompilations/sec to Batch Requests/sec should be very low unless users are submitting ad hoc queries.
You could then use another tool - SQL Server Profiler to find out information such as which stored procedures were being recompiled.
If the PerfMon counters indicate a high number of recompiles, the recompiles could be contributing to the high CPU consumed by SQL Server. We would then need to look at the profiler trace to find the stored procedures that were being recompiled. The SQL Server Profiler trace gives us that information along with the reason for the recompilation. You can use the following events to get this information.
Again, for more in-depth information, on what to monitor in the trace see SQL Server Books Online.
There are numerous resolutions to excessive recompilations, they include:-
� Batches should have qualified object names (for example, dbo.mytablename) to avoid recompilation and to avoid ambiguity between objects.
� Run Database Engine Tuning Advisor (DTA) to see if any indexing changes improve the compile time and the execution time of the query.
� Check to see if the stored procedure was created with the WITH RECOMPILE option. If a procedure was created with the WITH RECOMPILE option, in SQL Server 2005, we may be able to take advantage of the statement level RECOMPILE hint if a particular statement within that procedure needs to be recompiled. This would avoid the necessity of recompiling the whole procedure each time it executes, while at the same time allowing the individual statement to be compiled.
Inefficient query plans
When generating an execution plan for a query, the SQL Server optimizer attempts to choose a plan that provides the fastest response time for that query.
Now I�m no statistician so for an excellent white paper that describes in detail how the SQL Server optimizer uses statistics, see
http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx
This white paper discusses how the optimizer uses statistics, best practices for maintaining up-to-date statistics, and some common query design issues that can prevent accurate estimate cardinality and thus cause inefficient query plans.
Consider the following options if you have detected inefficient query plans.
-
Tune the query with the Database Engine Tuning Advisor to see if it produces any index recommendations.
-
Check for issues with bad cardinality estimates.
Are the queries written so that they use the most restrictive WHERE clause that is applicable? Unrestricted queries are resource intensive by their very nature.
Run UPDATE STATISTICS on the tables involved in the query and check to see if the problem persists.
� If it is not possible to modify the schema or the query, SQL Server 2005 has a new plan guide feature that allows you to specify query hints to add to queries that match certain text.
As mentioned at the start of this article, I have not been able to mention all the
possible causes. There could potentially be many other reasons why you are
experiencing CPU bottlenecks (poor use of cursors being another one).
Memory Bottlenecks
For a comprehensive insight into how SQL Server memory operates see SQL Svr Books Online.
But to get you started here are some general steps that will help you troubleshoot memory errors:-
1. Start collecting performance monitor counters for SQL Server: Buffer Manager, SQL Server: Memory Manager.
2. Verify the memory configuration parameters (sp_configure), min memory per query, min/max server memory, awe enabled, and the Lock Pages in Memory privilege. Watch for unusual settings. Correct them as necessary. (Account for increased memory requirements for SQL Server 2005!)
3. Check for any nondefault sp_configure parameters that might indirectly affect the server.
4. Observe DBCC MEMORYSTATUS output and the way it changes when you see memory error messages.
5. Check the workload (number of concurrent sessions, currently executing queries).
Below are 4 of the most common memory-related error messages as described in official Microsoft SQL Server technical documentation:-
701 - There is insufficient system memory to run this query.
Causes
This is very generic out-of-memory error for the server. It indicates a failed memory allocation. It can be due to a variety of reasons, including hitting memory limits on the current workload. With increased memory requirements for SQL Server 2005 and certain configuration settings (such as the max server memory option) users are more likely to see this error as compared to SQL Server 2000. Usually the transaction that failed is not the cause of this error.
Troubleshooting
Regardless of whether the error is consistent and repeatable (same state) or random (appears at random times with different states), you will need to investigate server memory distribution during the time you see this error. When this error is present, it is possible that the diagnostic queries will fail. Start investigation from external assessment.
Possible solutions include: Remove external memory pressure. Increase the max server memory setting. Free caches by using one of the following commands: DBCC FREESYSTEMCACHE, DBCC FREESESSIONCACHE, or DBCC FREEPROCCACHE. If the problem reappears, reduce workload.
802 - There is insufficient memory available in the buffer pool.
Causes
This error does not necessarily indicate an out-of-memory condition. It might indicate that the buffer pool memory is used by someone else. In SQL Server 2005, this error should be relatively rare.
Troubleshooting
Use the general troubleshooting steps and recommendations outlined for the 701 error above.
8628 - A time out occurred while waiting to optimize the query. Rerun the query.
Causes
This error indicates that a query compilation process failed because it was unable to obtain the amount of memory required to complete the process. As a query undergoes through the compilation process, which includes parsing, algebraization, and optimization, its memory requirements may increase. Thus the query will compete for memory resources with other queries. If the query exceeds a predefined timeout (which increases as the memory consumption for the query increases) while waiting for resources, this error is returned. The most likely reason for this is the presence of a number of large query compilations on the server.
Troubleshooting
1. Follow general troubleshooting steps to see if the server memory consumption is affected in general.
2. Check the workload. Verify the amounts of memory consumed by different components.
3. Check the output of DBCC MEMORYSTATUS for the number of waiters at each gateway (this information will tell you if there are other queries running that consume significant amounts of memory).
4. Reduce workload if possible.
8645 - A time out occurred while waiting for memory resources to execute the query. Rerun the query.
Causes
This error indicates that many concurrent memory intensive queries are being executed on the server. Queries that use sorts (ORDER BY) and joins may consume significant amount of memory during execution. Query memory requirements are significantly increased if there is a high degree of parallelism enabled or if a query operates on a partitioned table with non-aligned indexes. A query that cannot get the memory resources it requires within the predefined timeout (by default, the timeout is 25 times the estimated query cost or the sp_configure �query wait� amount if set) receives this error. Usually, the query that receives the error is not the one that is consuming the memory.
Troubleshooting
1. Follow general steps to assess server memory condition.
2. Identify problematic queries: verify if there is a significant number of queries that operate on partitioned tables, check if they use non-aligned indexes, check if there are many queries involving joins and/or sorts.
3. Check the sp_configure parameters degree of parallelism and min memory per query. Try reducing the degree of parallelism and verify if min memory per query is not set to a high value. If it is set to a high value, even small queries will acquire the specified amount of memory.
I/O Bottlenecks
SQL Server performance depends heavily on the I/O subsystem. Unless your database fits into physical memory, SQL Server constantly brings database pages in and out of the buffer pool. This generates substantial I/O traffic. Similarly, the log records need to be flushed to the disk before a transaction can be declared committed. And finally, SQL Server uses tempdb for various purposes such as to store intermediate results, to sort, to keep row versions and so on. So a good I/O subsystem is critical to the performance of SQL Server.
Access to log files is sequential except when a transaction needs to be rolled back while access to data files, including tempdb, is randomly accessed. So as a general rule, you should have log files on a separate physical disk than data files for better performance.
If you have a slow I/O subsystem, your users may experience performance problems such as slow response times, and tasks that abort due to timeouts.
You can use the following performance counters (as referenced from Microsoft literature) to identify I/O bottlenecks:-
� PhysicalDisk Object: Avg. Disk Queue Length represents the average number of physical read and write requests that were queued on the selected physical disk during the sampling period. If your I/O system is overloaded, more read/write operations will be waiting. If your disk queue length frequently exceeds a value of 2 during peak usage of SQL Server, then you might have an I/O bottleneck.
� Avg. Disk Sec/Read is the average time, in seconds, of a read of data from the disk. Any number
Less than 10 ms - very good
Between 10 - 20 ms - okay
Between 20 - 50 ms - slow, needs attention
Greater than 50 ms � Serious I/O bottleneck
� Avg. Disk Sec/Write is the average time, in seconds, of a write of data to the disk. Please refer to the guideline in the previous bullet.
� Physical Disk: %Disk Time is the percentage of elapsed time that the selected disk drive was busy servicing read or write requests. A general guideline is that if this value is greater than 50 percent, it represents an I/O bottleneck.
� Avg. Disk Reads/Sec is the rate of read operations on the disk. You need to make sure that this number is less than 85 percent of the disk capacity. The disk access time increases exponentially beyond 85 percent capacity.
� Avg. Disk Writes/Sec is the rate of write operations on the disk. Make sure that this number is less than 85 percent of the disk capacity. The disk access time increases exponentially beyond 85 percent capacity.
Note :- When using above counters, you may need to adjust the values for RAID configurations using these formulas (as referenced from Microsoft literature)
Raid 0 -- I/Os per disk = (reads + writes) / number of disks
Raid 1 -- I/Os per disk = [reads + (2 * writes)] / 2
Raid 5 -- I/Os per disk = [reads + (4 * writes)] / number of disks
Raid 10 -- I/Os per disk = [reads + (2 * writes)] / number of disks
You should not rely on one counter to determine a bottleneck; look for multiple counters to cross check the validity of your findings.
You can also identify I/O bottlenecks by examining latch waits and using the DMV sys.dm_os_wait_stats (See Books Online for description and examples)
When you have identified an I/O bottleneck, what can you do about it?
� Check the memory configuration of SQL Server. If SQL Server has been configured with insufficient memory, it will incur more I/O overhead. You can examine following counters to identify memory pressure
� Buffer Cache hit ratio
� Page Life Expectancy
� Checkpoint pages/sec
� Lazywrites/sec
� Increase I/O bandwidth.
� Add more physical drives to the current disk arrays and/or replace your current disks with faster drives. This helps to boost both read and write access times. But don't add more drives to the array than your I/O controller can support.
� Add faster or additional I/O controllers. Consider adding more cache (if possible) to your current controllers.
� Examine execution plans and see which plans lead to more I/O being consume. It is possible that a better plan (for example, index) can minimize I/O. If there are missing indexes, you may want to run Database Engine Tuning Advisor to find missing indexes
Tempdb
Tempdb �globally stores both internal and user objects and the temporary tables, objects, and stored procedures that are created during SQL Server operation�.
As mentioned at the start of my paper, tempdb can be a performance and disk space bottleneck. The tempdb can become overloaded in terms of space available and excessive DDL/DML operations. This can cause unrelated applications running on the server to slow down or fail.
Some of the common issues with tempdb are as follows:
� Lack of space. Happens all the time at place of work!!
� Slow queries (thanks to the I/O bottleneck in tempdb)
� Excessive DDL operations leading to a bottleneck in the system tables.
� Allocation contention.
Again prevention is better than cure. You need to monitor the space in tempdb. You can use the following performance counters to monitor the amount of space tempdb is using.
� Free Space in tempdb (KB). This counter tracks free space in tempdb in kilobytes. Administrators can use this counter to determine if tempdb is running low on free space.
SQL Server 2005 has tried to alleviate some of the space problems that SQL 2000 had by attempting to �shrink� parts of tempdb. During the shrink process, the longest running transactions that have not yet generated any row versions are marked as victims. This frees up the version space used by them. Message 3967 is generated in the error log for each such victim transaction. If a transaction is marked as a victim, it can no longer read the row versions in the version store or create new ones. Message 3966 is generated and the transaction is rolled back when the victim transaction attempts to read row versions.
If the shrink of the version store succeeds, then more space is available in tempdb. Otherwise, tempdb still continues to run out of space.
Another way in which SQL 2005 has help improve the situation with contention in tempdb is to �cache� local temp tables. Before frequent creating and dropping of temporary tables would incur more contention.
Again you can also monitor Perfmon counters for any unusual increase in the temporary objects allocation/deal location activity. (as referenced from Microsoft literature)
� SQL Server:Access MethodsWorkfiles Created /Sec
� SQL Server:Access MethodsWorktables Created /Sec
� SQL Server:Access MethodsMixed Page Allocations /Sec
� SQL Server:General StatisticsTemp Tables Created /Sec
� SQL Server:General StatisticsTemp Tables for destruction
Slow-Running Queries
You know the symptoms � because your phone doesn�t stop ringing from all those irate & frustrated users! Slow or long running queries can be a common problem.
Sometimes though, like me, you have no control on the application code and cannot do things like putting in more reasonable WHERE clauses to avoid an influx of full table scans. Poor application design can result in blocking and bad query plans (choosing that wrong way into work again !)
Majority of causes for slow running queries is the maintenance or lack of maintenance in fact, of indexes. In addition to missing indexes, there may be indexes that are not used.
So the motto here is to regularly maintain your indexes.
To conclude
I have only briefly touched upon some common causes of a badly performing SQL Server database. There are many more possible reasons why your users are pulling their hair out and those performance-monitoring tools are constantly throwing up red alerts.
There is a lot of literature out there describing, in finite detail sometimes, on subjects like the impact dodgy indexes have on your queries or the logic behind memory & I/O bottlenecking. Here at www.gre-sqlserver-solutions.com we can offer performance diagnosis and expert advice, assisting in the resolution of many common day-to-day issues.
So, don�t just sit there and let your databases die a slow & painful death, contact us today at
http://www.gre-sqlserver-solutions.com/ContactUS.html
.
You�ll be satisfied as your databases sit there purring nicely and it may even put smiles back on your users faces!
SQL Server Performance Problems - Contact us