Below is my list of the top 15 things I believe developers should do as a matter of course to tune performance when coding. These are the low-hanging fruit of SQL Server performance – they are easy to do and often have a substantial impact. Doing these won’t guarantee lightning-fast performance, but it won’t be slow either.
- Create a primary key on each table you create and unless you are really knowledgeable enough to figure out a better plan, make it the clustered index (note that if you set the primary key in Enterprise Manager it will cluster it by default).
- Create an index on any column that is a foreign key. If you know it will be unique, set the flag to force the index to be unique.
- Don’t index anything else (yet).
- Unless you need a different behaviour, always owner qualify your objects when you reference them in TSQL. Use dbo.sysdatabases instead of just sysdatabases.
- Use set nocount on at the top of each stored procedure (and set nocount off at the bottom).
- Think hard about locking. If you’re not writing banking software, would it matter that you take a chance on a dirty read? You can use the NOLOCK hint, but it’s often easier to use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED at the top of the procedure, then reset to READ COMMITTED at the bottom.
- I know you’ve heard it a million times, but only return the columns and the rows you need.
- Use transactions when appropriate, but allow zero user interaction while the transaction is in progress. I try to do all my transactions inside a stored procedure.
- Avoid temp tables as much as you can, but if you need a temp table, create it explicitly using Create Table #temp.
- Avoid NOT IN, instead use a left outer join – even though it’s often easier to visualize the NOT IN.
- If you insist on using dynamic sql (executing a concatenated string), use named parameters and sp_executesql (rather than EXEC) so you have a chance of reusing the query plan. While it’s simplistic to say that stored procedures are always the right answer, it’s also close enough that you won’t go wrong using them.
- Get in the habit of profiling your code before and after each change. While you should keep in mind the depth of the change, if you see more than a 10-15% increase in CPU, Reads, or Writes it probably needs to be reviewed.
- Look for every possible way to reduce the number of round trips to the server. Returning multiple resultsets is one way to do this.
- Avoid index and join hints.
- When you’re done coding, set Profiler to monitor statements from your machine only, then run through the application from start to finish once. Take a look at the number of reads and writes, and the number of calls to the server. See anything that looks unusual? It’s not uncommon to see calls to procedures that are no longer used, or to see duplicate calls. Impress your DBA by asking him to review those results with you.
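Several of the tips above (owner-qualifying objects, SET NOCOUNT, relaxing the isolation level, and sp_executesql for dynamic SQL) fit naturally into a single stored procedure template. The sketch below is illustrative only – the procedure and the dbo.Orders table are made-up names, not part of any sample database:

```sql
-- Hypothetical procedure pulling several of the tips together.
CREATE PROCEDURE dbo.GetRecentOrders
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;                                   -- suppress "n rows affected" messages
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; -- dirty reads acceptable for this report

    -- Owner-qualified names (dbo.Orders, not Orders); only needed columns returned
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;

    -- Dynamic SQL via sp_executesql with named parameters (not EXEC of a
    -- concatenated string), so the parameterized plan can be reused
    DECLARE @sql NVARCHAR(200);
    SET @sql = N'SELECT OrderID FROM dbo.Orders WHERE CustomerID = @cust';
    EXEC sp_executesql @sql, N'@cust INT', @cust = @CustomerID;

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;   -- reset at the bottom
    SET NOCOUNT OFF;
END
```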
It’s all about indexing
Pretty much everyone with even a little database experience agrees that indexes are crucial for developing efficient database applications. Yet in practice, indexing gets neglected or isn’t implemented properly. Often, senior-level software engineers develop applications that lack proper indexes (see “Four steps to proper table indexing” below).
Without proper indexes, SQL Server has to scan all data in the table to find matching rows. Imagine how difficult it would be to look for certain words in a book without having an index at the end. You would have to read the book cover to cover every time you needed to find something. But that’s essentially what SQL Server has to do anytime you look for data in a large table that’s not indexed. As a rule, you should index columns that are included in the WHERE clause.
Careful what you COUNT(*)
Here is another coding inefficiency developers commit: “If the result of SELECT COUNT(*) is greater than 0, do something.” If you look at it more closely, you’ll see that what we want is “If at least one row meets the condition, do something.” The problem with using COUNT(*) here is that the SQL query will count all the rows that match; instead, we can stop it after finding the first match using the EXISTS query. So keep this subtle difference in mind and count everything only when you need the actual count. Use EXISTS when all you need is to check for the existence of at least one matching row.
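In T-SQL the difference looks like this (a sketch against a hypothetical dbo.Orders table):

```sql
-- Inefficient: counts every matching row just to test for existence
IF (SELECT COUNT(*) FROM dbo.Orders WHERE CustomerID = 42) > 0
    PRINT 'Customer has orders';

-- Better: EXISTS stops scanning at the first matching row
IF EXISTS (SELECT 1 FROM dbo.Orders WHERE CustomerID = 42)
    PRINT 'Customer has orders';
```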
Get queries, data types in sync
When writing Transact-SQL and ADO.NET code that uses parameters, it is important that you match data types in the queries with column data types in the tables. In some cases SQL Server can do an implicit data type conversion for you (e.g., convert a number to a string). But what you may not realize is that while the conversion of the data types will not affect the results, SQL Server may not be able to use the existing index on the column being queried.
For an example, let’s turn to Microsoft’s AdventureWorks sample database. In the Employee table, there is a column called NationalIDNumber; it is defined in the database as the variable-length data type NVARCHAR(15). The column is indexed, so SQL Server should be able to perform a quick lookup if you search for a particular ID. Now look at these two queries that return the same result:
SELECT * FROM [HumanResources].[Employee]
WHERE NationalIDNumber = N'233069302'

SELECT * FROM [HumanResources].[Employee]
WHERE NationalIDNumber = 233069302
Figure 1 shows the execution plans for both queries. (In each window, “Query cost” shows what percentage of execution time the query took, and percentages for each step are displayed below the query lines.)
Figure 1. This execution plan shows that when the data types match (top window), SQL Server performs an index seek to find matching data. When the data types don’t match in the second query, SQL Server uses a less efficient index scan.
In the first query, we search using an NVARCHAR string, N'233069302', so the data type in the query matches the data type in the table. SQL Server uses the existing index on the NationalIDNumber column and performs an index seek. That gives us a quick lookup.
In the second query, we search using an integer, 233069302, so there is a type mismatch between the query and the data type in the table. Consequently, SQL Server converts the NationalIDNumber value in each row from NVARCHAR to INTEGER to perform the comparison, and it scans all the rows in the table. It still uses the index, because it’s faster to scan all national IDs in the index pages as opposed to data pages, but a scan requires much more processing to deliver the same result.
In this particular database, the difference in execution is not very large (54% versus 46% of the combined execution time of the two queries), but if the table contained millions of rows, we could be talking about an execution time of valuable seconds instead of milliseconds. The difference in the query is subtle, but it’s an easy mistake to make if the programmer assumes the data in the table was stored as an integer and doesn’t double check.
Indexing your databases makes the applications faster and more responsive, and efficient coding cuts down on processing time. When you are using small databases and your front-end applications aren’t accessed by many concurrent users, inefficiencies aren’t noticeable. But as the data piles on and your applications become more heavily used, the database and the applications can slow down to the point where they’re not usable, users will get timeouts and your customers will be unhappy. Following these coding and database efficiency practices and coding your applications with scalability in mind will ultimately improve SQL Server performance.
Four steps to proper table indexing
The importance of indexing for speedier applications cannot be stressed enough. Use this indexing checklist as a guideline for making sure your tables are indexed properly.
- Create a clustered index. To help SQL Server organize data more efficiently, each table should have a clustered index, which determines the order of data in a table. Choose your ordering method carefully (by last name, ID, date of birth) because you can only have one clustered index.
- Create a primary key. Each table should have a primary key to implement entity integrity. In many cases the primary key is a good candidate for the clustered index, but not always: when you frequently do range searches on dates, for example, it may be better to create the clustered index on the date column instead.
- Create unique constraints. If you have a group of columns in which each row should have a distinct value, enforce this requirement in the table design. You could implement it as a unique index or a unique constraint. The latter option is encouraged in SQL Server Books Online because it makes the objective clearer.
- Create indexes on foreign keys. You’ll want to create an index on your foreign keys so that child records can be found quickly. But don’t blindly create an index on each foreign key in the table. Take column selectivity into account and index only columns that will contain numerous distinct values. For example, indexing the OrderID column in the OrderDetails table makes sense, because the column is selective; it will have as many distinct values as the number of orders. But if you have a column like OrderStatusID, you will only have a few distinct values across the table, so the index won’t likely be used.
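As an illustration of the selectivity point (the table and column names here are hypothetical, echoing the example above):

```sql
-- OrderID is selective: it has as many distinct values as there are orders,
-- so an index on this foreign key is worthwhile
CREATE NONCLUSTERED INDEX IX_OrderDetails_OrderID
ON dbo.OrderDetails (OrderID);

-- By contrast, a low-selectivity column such as OrderStatusID (only a
-- handful of distinct values) would rarely benefit from its own index.
```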
It is very important to choose the appropriate column order for each index; a poor order can leave the index unused. The first rule is that the most selective columns should go first. This rule is sometimes misread to mean that every index should have the table’s most selective column as its leading column. Suppose you have a table with three columns: ID, fname and lname. ID is the most selective column and carries the clustered index. Now you want to create a non-clustered index on fname and lname, and lname is more selective than fname. Place lname first and fname second, and don’t put the ID column in the index at all. Be very careful which column you place first, because SQL Server keeps a histogram only for the first column of an index. That means SQL Server knows the actual distribution of values only for that leading column. If the first column is not selective, the index may not be used.
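For the hypothetical three-column table above, the composite index would be declared like this:

```sql
-- lname is more selective than fname, so it leads;
-- SQL Server keeps a histogram only for the leading column (lname)
CREATE NONCLUSTERED INDEX IX_Person_LName_FName
ON dbo.Person (lname, fname);
-- ID, as the clustered key, is already carried in every
-- non-clustered index, so it is not listed here.
```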
Here I will show you some basic examples that demonstrate how indexes can affect query performance. For test purposes I have created the table [Person].[Person_Test] in the AdventureWorks2008R2 database. It has the same structure as [Person].[Person]; I created a copy because I don’t want to change the existing table.
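One simple way to create such a copy (my assumption – the article doesn’t show the script) is SELECT … INTO, which copies the structure and data but none of the indexes, leaving a heap:

```sql
-- Copies columns and rows of Person.Person, but no indexes or constraints
SELECT *
INTO [Person].[Person_Test]
FROM [Person].[Person];
```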
If you run a query against a table with no indexes, SQL Server does a Table Scan, reading every row to determine whether any of the records have a last name of “Brown”. As you can see in the picture below, this query has an Estimated Subtree Cost of 2.84525. This value represents the query optimizer’s total cost for executing this query and all operations preceding it in the same subtree. The lower the number, the less resource-intensive the query is for SQL Server.
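The query under test is presumably something along these lines (the exact statement is not shown in the article):

```sql
SELECT *
FROM [Person].[Person_Test]
WHERE LastName = N'Brown';
```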
In this example we will create a non-clustered index on the LastName column:
CREATE NONCLUSTERED INDEX [IX_Person_Test_LastName]
ON [Person].[Person_Test] ( [LastName] ASC )
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
      IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
      ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
GO
When you run the query again, SQL Server uses the index to do an Index Seek (instead of a Table Scan), followed by a RID Lookup to fetch the actual data rows from the heap. In the picture below you can see an Estimated Subtree Cost of 0.299353, which indicates that this query performs much better with the index (in this case a non-clustered index on the LastName column).
Next I will demonstrate how a clustered index affects performance. Before executing the query, you have to create a clustered index on the LastName column.
CREATE CLUSTERED INDEX [IX_Person_Test_LastName_Clustered]
ON [Person].[Person_Test] ( [LastName] ASC )
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
      IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
      ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
GO
When this query runs, SQL Server does an Index Seek. The great thing is that the clustered index points directly to the actual data pages. As you can see, the Estimated Subtree Cost is only 0.0155815. Using a clustered index is the fastest access method for this type of query.
In this example we are requesting only the LastName column. Since this query can be satisfied by the non-clustered index alone, SQL Server does not need to access the actual data pages. For this query the Estimated Subtree Cost is only 0.0033832. As you can see, this is even better than the previous example.
To take this a step further, the output below is based on having a clustered index on LastName and no non-clustered index. You can see that the subtree cost is the same as when returning all of the columns, even though we are selecting only one column. So for this query the non-clustered index performs better.
When you run this query, SQL Server uses the index to do an Index Seek, followed by a RID Lookup to get the actual data. In the picture below you can see an Estimated Subtree Cost of 0.29934.
Table with non-clustered index on lastname column including firstname (selecting LastName and FirstName)
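The query being measured here is presumably along these lines (the article doesn’t show the exact statement):

```sql
SELECT LastName, FirstName
FROM [Person].[Person_Test]
WHERE LastName = N'Brown';
```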
In this example you will see how a covering index can improve query performance. The following script creates the covering index:
CREATE NONCLUSTERED INDEX [IX_Person_Test_LastName_Include_FirstName]
ON [Person].[Person_Test] ( [LastName] ASC )
INCLUDE (FirstName)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
      IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
      ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
GO
In this query we are requesting two columns (LastName and FirstName). The query can be satisfied by the non-clustered index alone, because the index now carries the data for both columns. The Estimated Subtree Cost is 0.0033832, which is much lower than in the previous example.
In this example I will show you how to create a filtered index and how this type of index can improve performance. Consider the following query:
SELECT LastName
FROM [Person].[Person_Test]
WHERE ModifiedDate < '2005-01-01'
If there is no filtered non-clustered index on the table, SQL Server does a full Table Scan. If the table holds millions of records, that can take a long time.
To improve query performance you can create a filtered non-clustered index:
CREATE NONCLUSTERED INDEX [IX_Person_Test_LastName_Filtered_ModifiedDate]
ON [Person].[Person_Test] ( [LastName] ASC )
WHERE ModifiedDate < '2005-01-01'
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
      IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
      ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
GO
As you can see, the Estimated Subtree Cost is much lower with the filtered non-clustered index than without it.
When you create indexes, the default options are often used, and they create the index in ascending order. This is usually the most logical choice, but in some cases it is not the best one. For example, when you create an index on ColumnA of TableA using the default options, the newest data ends up at the end. That works perfectly when you want the data in ascending order, from the oldest at the top to the most recent at the end. But what if you need the most recent data at the top? In that case you can create the index in descending order. In the following examples I will show you how to create indexes in different orders and how they affect query performance. All of the examples use the [Purchasing].[PurchaseOrderHeader] table of the AdventureWorks2008R2 database.
In this first example we are just requesting the OrderDate column of [Purchasing].[PurchaseOrderHeader].
You can see that Estimated Subtree Cost is 0.0380656. But what will happen if we use Order By clause? Let’s try to investigate.
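The two queries being compared are presumably along these lines (the article shows only the plans, not the statements):

```sql
-- Without ORDER BY: rows come back in whatever order the storage engine reads them
SELECT OrderDate
FROM [Purchasing].[PurchaseOrderHeader];

-- With ORDER BY: forces a Sort operator in the plan until a suitable index exists
SELECT OrderDate
FROM [Purchasing].[PurchaseOrderHeader]
ORDER BY OrderDate ASC;
```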
When we use the ORDER BY clause, SQL Server has to sort the requested data. As you can see, the Sort operation is the most resource-intensive operation in this case, and the overall Estimated Subtree Cost goes up. To improve this query’s performance we can create a non-clustered index on the OrderDate column.
CREATE NONCLUSTERED INDEX [IX_PurchaseOrderHeader_OrderDate] ON [Purchasing].[PurchaseOrderHeader] ( [OrderDate] ASC )
If we run the query again, we will see that the Estimated Subtree Cost is better than before, because the query optimizer no longer performs a Sort operation; it just does an Index Seek.
But what if you need the data in descending order? You can apply the same technique with a descending index, and it affects performance in the same way as an ascending index does.
When you run a query that sorts records in descending order, you can see that the Estimated Subtree Cost is the same as in the ascending case without an index. To improve the query’s performance, you can create a non-clustered index with the descending option:
CREATE NONCLUSTERED INDEX [IX_PurchaseOrderHeader_OrderDate]
ON [Purchasing].[PurchaseOrderHeader] ( [OrderDate] DESC )
WITH (DROP_EXISTING = ON)

(DROP_EXISTING = ON replaces the ascending index of the same name created earlier.)