Monday, January 31, 2011

Query Tuning - At grassroot level


Query Tuning Steps

Most of the DBA’s new to query tuning would wonder where to start in order to tune a query. I hope this article would probably guide them in understanding the steps to begin with.


Step 1:
Run the query in Management Studio and view the actual execution plan. To view the execution plan, press Ctrl+M and then execute the query in SSMS.

Step 2:
In Execition plan next to results ,Check if there are any table scans or Clustered index scan or Index scan involved in the execution plan. If yes, then you should analyze that table’s info thoroughly in the execution.

Step 3:
Identify the actual rows in the table where there is scan involved. If the table is slightly larger i.e. greater than 2000 rows I would suggest you to check if there are proper indexes in the table. If the table has less than 2000 records table scan wouldn’t be a problem and I would rather prefer a table scan on those tables.

Step 4:
If there is already an index you have to analyze why the optimizer preferred a Clustered index scan or an Index scan rather than Seeks. The reason may be due to fragmentation or outdated statistics or due to the least selectivity or the query cost.

Step 5:
The following query will give the exact % of fragmentation in the indexes for a particular table. The below query will display the fragmentation status in the table “Person.Address” in Adventureworks database.

SELECT CAST(DB_NAME(database_id) AS varchar(20)) AS [Database Name],
CAST(OBJECT_NAME(object_id) AS varchar(20)) AS [TABLE NAME], Index_id, Index_type_desc, Avg_fragmentation_in_percent, Avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID('AdventureWorks'),OBJECT_ID('person.address'),NULL,NULL,'Detailed')

If the avg_fragmentation_in_percent is > 40% rebuild the index (using Alter index rebuild command) to eliminate fragmentation. It’s recommended to have a rebuild index job for all the tables scheduled to run on a weekly basis. Please NOTE that rebuilding an index is an expensive operation and ensure that it’s done only during OFF-Production hours.


Step 6:
If the indexes are fine, then check the statistics. Sometimes the index will be fine but the query would still continue to be slow since the optimizer wouldn’t be able to use the correct indexes due to outdated statistics. The following query gives the last time when the statistics for an index was last updated.

SELECT Name AS Stats_Name, STATS_DATE(object_id, stats_id) AS Statistics_update_date
FROM sys.stats
WHERE object_id=OBJECT_ID('person.address')

The statistics should be updated either weekly or daily or on alternate days depending on the frequency of modifications in the table. The more frequent the table is modified the more frequent the statistics should be updated. Sometimes for high transactional tables you can schedule a job to update the statistics on a regular basis.
Please NOTE that rebuilding the index will automatically update the statistics as well. Hence avoid updating the statistics if you are rebuilding the index.



Step 7:
If you see any key lookups happening in the execution plan, make use of Included columns to create a covering Nonclustered index to avoid expensive lookup operation. This will help in improving the query performance as the logical reads would be reduced to a great extent.

Step8:
Ensure that each table has a clustered index preferably on primary key columns (by default there is one unless you explicitly mention Nonclustered) or on Identity columns. The clustered index should always be defined on unique valued columns like primary keys or identity.

Step9:
If you have a composite index, ensure to have the most selective field (the ones which have unique values) as the leading column in the index.

Step10:
If you couldn’t tune the query further or if you are clueless, try to use Database Tuning Advisor (DTA). Provide the SQL query as input file and run the DTA. It will provide a list of recommendations to reduce the query cost.

Please do NOT blindly implement the suggestions doing so would certainly improve the query performance but you would end up creating numerous indexes which will be difficult to maintain during maintenance operations. You have to take the call of creating indexes as suggested by DTA, check whether the index will be used in most cases or if you can rewrite the query to make use of the existing indexes.

Step11:
While tuning stored procedures you need to ensure that the query plan for stored procedures is cached. The following query will help in providing the caching info for the stored procedures.

SELECT usecounts, cacheobjtype, objtype, [text]
FROM sys.dm_exec_cached_plans P
CROSS APPLY sys.dm_exec_sql_text(plan_handle) S
WHERE cacheobjtype = 'Compiled Plan' AND objtype='Proc'
AND [text] NOT LIKE '%dm_exec_cached_plans%'
AND S.DBID=11
--MENTION THE DATABASE ID FOR THE RESPECTIVE DATABASE (USE SP_HELPDB TO GET THE DBID)


The value of usecounts will increase every time you run the same stored procedure.If there is a problem in caching check if there is any SET options as most of them will cause a recompile in query plan. Also the plan will be flushed out every time you run DBCC Freeproccache or DBCC FlushprocinDB. Never use both of them in production environment as it will remove the cache for all the procedures and they (SP) will have to be recompiled the next time they are run.

If you suspect there might be some problem in the query plan, you can try to use WITH RECOMPILE option which will recompile the particular stored procedure every time it runs and see how the performance is.


CREATE PROC Test
WITH RECOMPILE
AS
Statement 1
Statement 2

Step12:
Finally if all the above options are fine and the query couldn’t be tuned, try to rewrite the query. In few cases as soon as you view the query such as the ones below we need to rewrite the query:

  • Creating a view with TOP 100% in order to include the ORDERBY clause in view definition where the view will not be sorted unless we explicitly sort the view by issuing
<!--[endif]-->
Select * from view order by column1 –Result will be sorted
Select * from view Result will NOT be sorted
Thus there is a extra cost involved in sorting by using the ORDER BY clause in view definition even though the result is NOT sorted. Hence we should avoid ORDER BY in view definition and instead use it as Select * from view order by column1
  • <!--[endif]-->Using correlated sub queries will cause RBAR – Row by agonizing Row and will affect the performance.
  • Avoid using Scalar functions in select statements and instead use Inline or Table valued function. Since Scalar function behaves like a cursor we need to avoid it being referenced in the Select statement

Changing server collation


How to change server collation

    The most clear and easy way is to completely reinstall the sql server instance. In sql server 2000 and earlier versions there is rebuild master utility - rebuildm.exe which has very clear interface but there are many cases when this utility hangs and you lose your master database forever.
   Starting from sql server 2005 rebuild master utility is integrated into installation package. So you need to load your installation drive and run from command line the next command
setup.exe /q /ACTION=RebuildDatabase /INSTANCENAME=MSSQLSERVER /SAPWD="password"
/SQLSYSADMINACCOUNTS="BUILTIN\ADMINISTRATORS" /SqlCollation=Latin1_General_CI_AS
(type this command in a single line in your cmd console)
Adjust this params for your needs - INSTANCENAME - put the instance name you want to change collation of.  INSTANCENAME=MSSQLSERVER - default instance. SAPWD - new sa password, SqlCollation - new sql server instance collation. This query may be used to rebuild master of sql server 2005 or to rebuild master of sql server 2008
Consider that after rebuild you will have a fresh sql server instance. You will need to attach or restore your databases, recreate logins and so on.
How to change database collation
    To change the database collation you can use  alter database operator. For instance to change collation of previously created database collationtest you can run the script
alter database collationtest collate Cyrillic_General_CI_AS

Script that returns all your databases' collations
select name, collation_name
from sys.databases
 Query that returns your sql server instance's collation
select SERVERPROPERTY('collation')

Saturday, January 29, 2011

Script to find the job details and timings

/* script to find the server jobs and their details
This script is for SQL server 2005
Author: Arun S Prasad
Mail    : arprasaad@gmail.com
Developed for thedbadelight.blogspot.com
*/

set nocount on
create table #joblisting(jobname varchar(100), enabled varchar(10), description varchar(100), frequency varchar(200),lastrundate varchar(100),Lastrun_status varchar(100))

declare @name varchar(100)
declare @enabled varchar(100)
declare @description varchar(100)
declare @freq_type varchar(100)
declare @freq_interval varchar(100)
declare @freq_subday_type varchar(100)
declare @freq_subday_interval varchar(100)
declare @freq_relative_interval varchar(100)
declare @freq_recurrence_factor varchar(100)
declare @active_start_date varchar(100)
declare @active_end_date varchar(100)
declare @active_start_time varchar(100)
declare @active_end_time varchar(100)
declare @schedule_description varchar(8000)
declare @rundate varchar(100)
declare @status varchar(10)
declare @jobid varchar(100)

DECLARE job_cursor CURSOR
FOR
select a.[name],
a.enabled,
a.description,
c.freq_type,
c.freq_interval,
c.freq_subday_type,
c.freq_subday_interval,
c.freq_relative_interval,
c.freq_recurrence_factor,
c.active_start_date,
c.active_end_date,
c.active_start_time,
c.active_end_time
from msdb..sysjobs a,
msdb..sysjobschedules b,
msdb..sysschedules c
where a.job_id=b.job_id and b.schedule_id = c.schedule_id

OPEN job_cursor

FETCH NEXT FROM job_cursor
INTO   @name,
  @enabled,
  @description,
  @freq_type,
  @freq_interval,
  @freq_subday_type,
  @freq_subday_interval,
  @freq_relative_interval,
  @freq_recurrence_factor,
  @active_start_date,
  @active_end_date,
  @active_start_time,
  @active_end_time


while @@fetch_status = 0
begin

EXEC msdb.dbo.sp_get_schedule_description
@freq_type = @freq_type,
@freq_interval = @freq_interval,
@freq_subday_type = @freq_subday_type,
@freq_subday_interval = @freq_subday_interval,
@freq_relative_interval = @freq_relative_interval,
@freq_recurrence_factor = @freq_recurrence_factor,
@active_start_date = @active_start_date,
@active_end_date = @active_end_date,
@active_start_time = @active_start_time,
@active_end_time = @active_end_time,
@schedule_description = @schedule_description out

select @jobid = job_id from sysjobs where name = @name
select @status=run_status, @rundate=convert(varchar(30),max(run_date))+' '+convert(varchar(30),max(run_time)) from sysjobhistory where job_id = @jobid group by run_status

insert into #joblisting values
(@name,@enabled,@description,@schedule_description,@rundate,case @status when 1 then 'Success' else 'Failure' end)

FETCH NEXT FROM job_cursor
INTO   @name,
  @enabled,
  @description,
  @freq_type,
  @freq_interval,
  @freq_subday_type,
  @freq_subday_interval,
  @freq_relative_interval,
  @freq_recurrence_factor,
  @active_start_date,
  @active_end_date,
  @active_start_time,
  @active_end_time

end

close job_cursor
deallocate job_cursor
--change line number 108 as select * from #joblisting to get the complete detail of the job. Only our requirement is met by this script.
select jobname as JOBNAME,frequency as SCHEDULE from #joblisting

drop table #joblisting

Monday, January 10, 2011

Dynamic Management Views and Functions

Dynamic management views and functions return server state information that can be used to monitor the health of a server instance, diagnose problems, and tune performance.
Dynamic management views and functions return internal, implementation-specific state data. Their schemas and the data they return may change in future releases of SQL Server. Therefore, dynamic management views and functions in future releases may not be compatible with the dynamic management views and functions in SQL Server 2008.
There are two types of dynamic management views and functions:
  • Server-scoped dynamic management views and functions. These require VIEW SERVER STATE permission on the server.
  • Database-scoped dynamic management views and functions. These require VIEW DATABASE STATE permission on the database.

Most Important DMVs

1)  sys.dm_exec_sessions and sys.dm_exec_requests. These two combine effectively as the replacement of sysprocesses

2)  sys.dm_exec_query_stats. Current performance statistics for cached objects and queries

3)  sys.dm_os_wait_stats. Find out where the waits are occuring in the engine. Replaces DBCC SQLPERF('WAITSTATS')

4)  sys.dm_db_missing_index_details. See what indexes could have helped performance of previous queries. A must to look at for any
   server that is having performance problems

5)  sys.dm_exec_sql_text. A dynamic management function to get the text of a SQL query based on its SQL handle

6)  sys.dm_os_ring_buffers  - for advanced use


TIP:

To “join” the use of a function like sys.dm_exec_sql_text you need to use the CROSS APPLY operator.
For example, here is a query to get the “input buffer” for sessions found in sys.dm_exec_requests:

Select session_id, s2.text from sys.dm_exec_requests
Cross apply sys.dm_exec_sql_text(sql_handle) s2

Don’t run this query in the context of a database in a compatibility mode < 90. It will fail because the APPLY operator is new to SQL 2005

The following query allows you to identify what SQL statements are currently executing. This information can be useful in debugging the cause of both long running queries and blocking, and also showing the execution progress of a stored procedure or batch of SQL.

SELECT [Spid] = session_Id
            , ecid
            , [Database] = DB_NAME(sp.dbid)
            , [User] = nt_username
            , [Status] = er.status
            , [Wait] = wait_type
            , [Individual Query] = SUBSTRING (qt.text, er.statement_start_offset/2,
                   (CASE WHEN er.statement_end_offset = -1
                        THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
                    ELSE er.statement_end_offset END - er.statement_start_offset)/2)
            ,[Parent Query] = qt.text
            , Program = program_name
            , Hostname
            , nt_domain
            , start_time
      FROM sys.dm_exec_requests er
      INNER JOIN sys.sysprocesses sp ON er.session_id = sp.spid
      CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) as qt
      WHERE session_Id > 50               -- Ignore system spids.
      AND session_Id NOT IN (@@SPID)      -- Ignore this current statement.
      ORDER BY 1, 2


Column name
Data type
Description
spid
smallint
SQL Server process ID.
ecid
smallint
Execution context ID used to uniquely identify the subthreads operating on behalf of a single process.
dbid
smallint
ID of the database currently being used by the process.
nt_username
nchar(128)
Windows user name for the process, if using Windows Authentication, or a trusted connection.
status
nchar(30)
Process ID status. For example, running and sleeping.
wait_type
bigint
Current wait time in milliseconds.
Individual Query
varchar
SQL Statement currently running.
Parent Query
varchar
Routine that contains the Individual Query.
program_name
nchar(128)
Name of the application program.
Hostname
nchar(128)
Name of the workstation.
nt_domain
nchar(128)
Microsoft Windows domain for the client, if using Windows Authentication, or a trusted connection.
Start_time
datetime
Time when the request is scheduled to run.


Yet we often hear confusion about their use.  People ask, " there are so many of them—which are the most important ones?  Where do I begin?"
You can begin right here.
I have drafted a team of five first-round picks which comprise DMV All-Stars Dream Team. They are simple to use & provide a rich source of useful information.
The chosen queries provide information on the following topics:
·         Expensive Queries (CPU, reads, frequency, etc.)
·         Wait Stats
·         Virtual File Stats (including calculations for virtual file latency)
·         Plan Cache
·         Blocking (real-time)

I have provided backward-compatible versions for three of them. The first has no equivalent in SQL Server 2000.  The fifth has an equivalent, but would require a big post all its own.

These are among the most frequent DMVs I use—that's why I put them in one place.  I hope you find them helpful as well.  There’s not much unique in what I’ve provided (though I do have a surprise for a forthcoming post).  In fact, you're likely to recognize all of them.  Some of them I've liberated right out of BOL (2005 2008).  (And sh-h-h-h—don't tell anyone but I've copied-&-pasted one directly from the SQL Server Customer Advisory Team's (SQL CAT) Best Practices Toolbox).  Yet, the point is, ladies & gents, goyles & geeks, you have here in one place many of the DMVs you’ll need for much of your routine work.

There were many worthy queries which I have not included.  Perhaps someday I’ll draft a team of bench warmers which might include, e.g., DMVs for fragmentation.  If you don’t like my picks, nominate DMVs for your own  team.


--<<<<<<<<<<----------------------------------------------------------------->>>>>>>>>>--
--DMV_All-Stars.sql
--Table of Contents
--1. expensive queries
--2. wait stats
--3. virtual file stats (& virtual file latency)
--4. plan cache interrogation
--5. real-time blockers
--<<<<<<<<<<----------------------------------------------------------------->>>>>>>>>>--
--Weasel Clause: This script is provided "AS IS" with no warranties, and confers no rights.
--  Use of included script samples are subject to the terms specified at
--  http://www.microsoft.com/info/cpyright.htm
--<<<<<<<<<<----------------------------------------------------------------->>>>>>>>>>--

--1. expensive queries
    --text *and* statement
    --usage: modify WHERE & ORDER BY clauses to suit circumstances
SELECT TOP 25
      -- the following four columns are NULL for ad hoc and prepared batches
      DB_Name(qp.dbid) as dbname , qp.dbid , qp.objectid , qp.number
    --, qp.query_plan --the query plan can be *very* useful; enable if desired
    , qt.text
    , SUBSTRING(qt.text, (qs.statement_start_offset/2) + 1,
        ((CASE statement_end_offset
            WHEN -1 THEN DATALENGTH(qt.text)
            ELSE qs.statement_end_offset END
                - qs.statement_start_offset)/2) + 1) as statement_text
    , qs.creation_time , qs.last_execution_time , qs.execution_count
    , qs.total_worker_time    / qs.execution_count as avg_worker_time
    , qs.total_physical_reads / qs.execution_count as avg_physical_reads
    , qs.total_logical_reads  / qs.execution_count as avg_logical_reads
    , qs.total_logical_writes / qs.execution_count as avg_logical_writes
    , qs.total_elapsed_time   / qs.execution_count as avg_elapsed_time
    , qs.total_clr_time       / qs.execution_count as avg_clr_time
    , qs.total_worker_time , qs.last_worker_time , qs.min_worker_time , qs.max_worker_time
    , qs.total_physical_reads , qs.last_physical_reads , qs.min_physical_reads , qs.max_physical_reads
    , qs.total_logical_reads , qs.last_logical_reads , qs.min_logical_reads , qs.max_logical_reads
    , qs.total_logical_writes , qs.last_logical_writes , qs.min_logical_writes , qs.max_logical_writes
    , qs.total_elapsed_time , qs.last_elapsed_time , qs.min_elapsed_time , qs.max_elapsed_time
    , qs.total_clr_time , qs.last_clr_time , qs.min_clr_time , qs.max_clr_time
    --, qs.sql_handle , qs.statement_start_offset , qs.statement_end_offset
    , qs.plan_generation_num  -- , qp.encrypted
    FROM sys.dm_exec_query_stats as qs
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) as qp
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
    --WHERE...
    --ORDER BY qs.execution_count      DESC  --Frequency
      ORDER BY qs.total_worker_time    DESC  --CPU
    --ORDER BY qs.total_elapsed_time   DESC  --Durn
    --ORDER BY qs.total_logical_reads  DESC  --Reads
    --ORDER BY qs.total_logical_writes DESC  --Writes
    --ORDER BY qs.total_physical_reads DESC  --PhysicalReads   
    --ORDER BY avg_worker_time         DESC  --AvgCPU
    --ORDER BY avg_elapsed_time        DESC  --AvgDurn    
    --ORDER BY avg_logical_reads       DESC  --AvgReads
    --ORDER BY avg_logical_writes      DESC  --AvgWrites
    --ORDER BY avg_physical_reads      DESC  --AvgPhysicalReads

    --sample WHERE clauses
    --WHERE last_execution_time > '20070507 15:00'
    --WHERE execution_count = 1
    --  WHERE SUBSTRING(qt.text, (qs.statement_start_offset/2) + 1,
    --    ((CASE statement_end_offset
    --        WHEN -1 THEN DATALENGTH(qt.text)
    --        ELSE qs.statement_end_offset END
    --            - qs.statement_start_offset)/2) + 1)
    --      LIKE '%MyText%'

--<<<<<<<<<<----------------------------------------------------------------->>>>>>>>>>--
--2. wait stats
--2000
--DBCC sqlperf(waitstats,CLEAR)               --re-initialize waitstats
DBCC sqlperf(waitstats)
--2005
--DBCC sqlperf('sys.dm_os_wait_stats',CLEAR)  --re-initialize waitstats
SELECT * , (wait_time_ms - signal_wait_time_ms) as resource_wait_time_ms
    FROM sys.dm_os_wait_stats
    ORDER BY resource_wait_time_ms DESC
    --ORDER BY wait_time_ms DESC
    --ORDER BY signal_wait_time_ms DESC
    --ORDER BY waiting_tasks_count DESC
    --ORDER BY max_wait_time_ms DESC

--<<<<<<<<<<----------------------------------------------------------------->>>>>>>>>>--
--3. virtual file stats
--2000
SELECT DB_NAME(DBID) as DbName
    , (IoStallMS/(NumberReads + NumberWrites)) as IsStall
    , * FROM ::fn_virtualfilestats(-1,-1)
--2005
SELECT
    -- virtual file latency calcs
    IsReadStall
        = CASE WHEN num_of_reads = 0
               THEN 0 ELSE (io_stall_read_ms/num_of_reads) END
  , IsWriteStall
       = CASE WHEN io_stall_write_ms = 0
              THEN 0 ELSE (io_stall_write_ms/num_of_writes) END
  , IsStall
      = CASE WHEN (num_of_reads = 0 AND num_of_writes = 0)
             THEN 0 ELSE (io_stall/(num_of_reads + num_of_writes)) END
  , LEFT(mf.physical_name,2) as Drive
  , DB_NAME(vfs.database_id) as DB
  --, mf.name AS FileName
  , vfs.*
  , mf.physical_name
  FROM sys.dm_io_virtual_file_stats(NULL,NULL) as vfs
    JOIN sys.master_files as mf ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
  --WHERE vfs.file_id = 2 -- log files
  ORDER BY IsStall DESC
  -- ORDER BY IsReadStall DESC
  -- ORDER BY IsWriteStall DESC

--<<<<<<<<<<----------------------------------------------------------------->>>>>>>>>>--
--4. plan cache interrogation
-- note: sys.dm_exec_cached_plans is diminutive version of syscacheobjects
    -- no dbid, setopts
-- we want reusable code, absence of ad hoc SQL
-- we want relatively few rows with low usecounts
--2000  
SELECT cacheobjtype , objtype , usecounts , pagesused , dbid , sql
    FROM master.dbo.syscacheobjects
    WHERE cacheobjtype = 'Compiled Plan'
    ORDER BY usecounts DESC
    --ORDER BY sql
--2005
SELECT c.cacheobjtype , c.objtype , c.usecounts , c.size_in_bytes , t.dbid , t.text
    FROM sys.dm_exec_cached_plans as c
        CROSS APPLY sys.dm_exec_sql_text(plan_handle) as t
    WHERE c.cacheobjtype = 'Compiled Plan'
    ORDER BY c.usecounts DESC
    --ORDER BY t.text

--<<<<<<<<<<----------------------------------------------------------------->>>>>>>>>>--
--5. real-time blockers
    --Report Blocker and Waiter SQL Statements
    --http://www.microsoft.com/technet/scriptcenter/scripts/sql/sql2005/trans/sql05vb044.mspx?mfr=true
    -- SQLCAT BPT
SELECT
    t1.resource_type as lock_type
  , db_name(resource_database_id) as DB
  , t1.resource_associated_entity_id as blkd_obj
  , t1.request_mode as lock_req          -- lock requested
  , t1.request_session_id as waiter_sid-- spid of waiter
  , t2.wait_duration_ms as waittime
  , (SELECT text FROM sys.dm_exec_requests as--- get sql for waiter
        CROSS APPLY sys.dm_exec_sql_text(r.sql_handle)
            WHERE r.session_id = t1.request_session_id) as waiter_batch
  , (SELECT SUBSTRING(qt.text , r.statement_start_offset/2
          , (CASE WHEN r.statement_end_offset = -1
                THEN LEN(CONVERT(nvarchar(MAX), qt.text)) * 2
                ELSE r.statement_end_offset END - r.statement_start_offset)/2)
        FROM sys.dm_exec_requests as r
            CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) as qt
            WHERE r.session_id = t1.request_session_id) as waiter_stmt    --- this is the statement executing right now
   , t2.blocking_session_id as blocker_sid -- spid of blocker
   , (SELECT text FROM sys.sysprocesses as p       --- get sql for blocker
            CROSS APPLY sys.dm_exec_sql_text(p.sql_handle)
            WHERE p.spid = t2.blocking_session_id) as blocker_stmt
FROM sys.dm_tran_locks as t1
    JOIN sys.dm_os_waiting_tasks as t2
        ON t1.lock_owner_address = t2.resource_address




Script Repository: SQL Server 2005