Since I’ve been giving some advice to others recently I thought I would share tips for dealing with large SQL Server scripts. Which are based on my own experiences over the years.
Of course, we all have our own ideas about what a large SQL Server script is. Some people would say it’s hundreds of lines of code and others would say thousands.
In reality, there are lots of resources available to work on query performance. In this post I am sharing tips for dealing with large SQL Server scripts based on my own experiences.
Because this post is a bit long, below are some quick links to each section.
- Break it down
- Execution Plans
- Missing indexes
- Copy and paste
- Pseudo tables and indexes
- Order by statements
- Final word
My first tip below definitely applies if you are working with a script that is thousands of rows long.
If you have been given a script that is thousands of lines long from a developer, the first thing I would ask is if they can split it up.
I say this because a lot of developers who write long scripts tend to have come from various backgrounds. Hence, some of them are used to developing on other programming languages.
So, they do not always appreciate that SQL is a set-based language. In addition, they do not always appreciate SQL Server is optimised for set based queries.
For example, one giveaway that they are used to writing in another programming language is that you find one or more GOTO statements in the code. Which allows you to jump to another part of the code.
Now, I remember using the GOTO syntax on my first computer many years ago. It has it’s uses in many programming languages. However, I personally feel that SQL Server is not one of them.
With this in mind I’m not going to share the link for the GOTO statement in this post. Because personally I think it makes the code less readable and I want to discourage it’s use.
I recommend using calls to other stored procedures or functions instead. With this in mind, I will share the link for how to execute a stored procedure here. So that everybody is aware of best practices.
Explain to whoever is responsible for the code the below benefits they will get from breaking the code down:
- Makes the code more readable.
- Smaller queries can be optimised better.
- Reduces possibility of changes causing problems.
- Makes it easier to identify issues caused by code changes.
- Easier to identify which areas require performance tuning.
However, bear in mind that breaking it down will take some time on their part. So, you might have to wait until they have some free time.
In the meantime, you may have to look into it yourself and recommend changes. If you are going to do this then I suggest you keep reading this post.
If possible, look at the estimated or actual execution plans for the script.
Of course, the larger the query the higher the chance the estimated execution plan is not the same as the actual one.
In addition, there is another thing to consider if testing on another server. Which is that the plans created may differ if the environment is not the same.
Bare these things in mind if using the estimated execution plan.
If it’s a very long query you will probably want to view the plan in Plan Explorer by SentryOne. Which will make it more readable and help to identify issues quicker.
I discussed Plan Explorer in a previous post about free SQL Server tools here.
Focus on what is taking up the most percentage in the plan. Especially if it’s something like an index scan or a key lookup. Which brings me nicely to my next section.
Most of the time when you look at the execution plans the issue is fairly obvious. For example, one certain part of the query is around ninety five percent and is doing either an index scan or a key lookup.
Now, key lookups are not always bad. However, they are not always good either. In reality, if a query is doing a key lookup for columns in the same table you may want to test a new covering index.
Of course, there are many different methods you can use to identify potentially missing indexes. Personally, I use Glenn Berry’s missing indexes script which I discussed previously here.
In fact, it’s very useful to use Glenn’s script if the query has run since the last server reboot. Because, usually if the script you are looking to improve takes a while it is being run outside of the working day.
So, you can see which potentially missing queries were identified during the time your long script was run.
Of course, there are other documented methods you can use to identify missing indexes. At the end of the day a lot of it is down to personal preference.
Another key point I want to make is for you to remember that filtered and Columnstore indexes exist. Because introducing either of these can make a significant difference to long running queries.
In reality, I have seen forum posts go on for pages without mentioning either of these index types. So, I just wanted to remind you all here.
Of course, there are times when just reviewing the execution plan will not always show the main problem. On some occasions, you may know which section of the script is causing a problem, but you cannot find the issue.
If this piece of SQL does not rely on anything too complex earlier in the script I recommend that you copy and paste that section to a new query window.
Afterwards, you can focus on improving that section by itself. Because the issue may be more complex than a missing index.
If you query has a large select statement in it joining multiple tables it’s worth looking to split that query up to make it perform better. One method I like to use is something I call pseudo tables.
Which is where I create a temporary table of one or more of the tables used in the join beforehand to improve the query performance.
For example, say you have the below SQL which is taking a while:
SELECT A.col1,A.col2,B.col3,B.col4,c.col5 INNER JOIN B on A.col1=B.col3 INNER JOIN C on B.col4=c.col5 WHERE a.col1='a' and B.col4='B'
What you can do to speed things up is run a SELECT INTO statement to create a temporary table containing a subset of these results. In effect, creating what I like to call a pseudo table as below.
SELECT B.col3,B.col4,c.col5 INTO #tmptable FROM B INNER JOIN C on B.col4=c.col5 WHERE B.col4='B' SELECT A.col1,A.col2,T.col1,T.col2,T.col1 INNER JOIN #tmptable T on A.col1=T.col1 WHERE a.col1='a'
I have used this method myself in the past and it has made significant difference. Now, before anybody says anything the above example could probably be improved with indexes.
However, I used it to give you an example of inserting a subset of a query into a temporary table first.
In fact, if the same subset is queried later in the query you can take it one step further. By creating what I call a pseudo index on the temporary table.
For example, if a subset of this result set is queried for all rows where col5 equals c you can create the below filtered index:
CREATE NONCLUSTERED INDEX IX_col4_col5 ON #tmptable (col4,col5) WHERE col5 = 'c'
Again, this is purely a basic example and a good indexing strategy may be better for this particular query.
However, combining result sets into a temporary table and then creating a covering index on certain columns can be very powerful.
In reality, a few people in our industry have different terms for these temporary tables and indexes. I call them pseudo. However, what matters is the outcome not the name.
If you have performance issues with a large SELECT statement that has one or more joins in it this is definitely something I recommend.
My last tip for dealing with large SQL Server scripts is about ORDER BY statements. Whilst going through the query look at what ORDER BY statements are being used. Because quite often I find a lot of these statements early on the query when they are not actually needed.
For example, they are used in a SELECT statement to return results. Which are then used later in the query which does not required the results ordered.
If the ORDER BY is used in the final results, check where the output is going. Because it may not be needed or quicker to do in the final destination.
For example, if the results are being copied in Excel why order it. Because the end user can easily do that in Excel themselves.
I hope my tips for dealing with large SQL Server scripts helps some of you.
Like I said earlier the purpose of this post was to share my own experiences. In reality, there are plenty of other resources available for more in-depth performance tuning methods.
Of course, if you have any views of this you are welcome to add a comment.