Code Optimization

Interesting things in software development and code optimization

MS SQL - Speed up Order By with OFFSET FETCH paging

UPDATE:

So adding nonclustered index, as the MS SQL Execution Plan suggests, seems reduced the query time even more, here is the suggested nonclustered index:

USE [myDB]

GO


CREATE NONCLUSTERED INDEX NONCLUSTERED_INDEX2_tblPictures

ON [dbo].[tblPictures] ([RatingTotal],[shape])

INCLUDE ([PictureID],[OwnerID],[PictureTypeID])

GO

but using only nonclustered index, without my #temp table technic, seems does not help too much.


Hello,

This time I'm going to share my solution on how I did reduce time of my SQL query with Order By clause by almost x2 times.

Firstly let me show my original query:

select @rowstotal = count(*)

From [dbo].[tblPictures]

where (@OwnerId = 0 OR @OwnerId = [OwnerID])

and (@FilterBy = 0 OR @FilterBy = PictureTypeID)

and (@Shape = '' OR @Shape = [shape])



Select

@rowstotal as TotalCount

, PictureID

, OwnerID

, PictureName

, [Description]

, Description2

, Description3

, aspectRatio

, [min]

, [max]

, [percent]

, thumbWidth

, thumbHeight

, processState

, keywords

From [dbo].[tblPictures]

where (@OwnerId = 0 OR @OwnerId = [OwnerID])

and (@FilterBy = 0 OR @FilterBy = PictureTypeID)

and (@Shape = '' OR @Shape = [shape])

Order By RatingTotal Desc

OFFSET @pageNumber ROWS FETCH NEXT @pageSize ROWS ONLY;


So to be able to calculate pages we have to know the total number of rows in the DB. Each row will contain TotalCount - total number of rows


(pay attention, that the techinc COUNT(*) OVER () as TotalCount is slower than just select @rowstotal = count(*) From that I use in my queries )


Then we select all needed columns with the order and paging technic.

Everything looks great, simple and fast unless you got more than 300 000 rows in the table.

Main problem here is that the Order By clause takes 98% of the whole stored procedure and in my case it takes 5 sec in total.

How to speed it up? 

I did google a lot of posts and almost everyone suggests to use indexes or nonclustered indexes, or other things.

As I'm not DBA and do not know a lot about all these things I decided to check what if I would select only one column instead of all of them? 

And when I got my 40 rows for one page I would select the rest of columns?

I did write some test query and had been surprised that it took almost x2 time less than before.


So here is the new query:

IF OBJECT_ID(N'tempdb..#tempPage', N'U') IS NOT NULL

DROP TABLE #tempPage;


create table #tempPage

(

TotalCount int null

, PictureID int

, OwnerID int null

, PictureName nvarchar(500) null

, [Description] nvarchar(500) null

, Description2 nvarchar(500) null

, Description3 nvarchar(max) null

, aspectRatio decimal(18,2) null

, [min] money null

, [max] money null

, [percent] decimal(18,2) null

, thumbWidth decimal(18,2) null

, thumbHeight decimal(18,2) null

, processState varchar(50) null

, keywords varchar(500) null

)


insert into #tempPage

Select

null --@rowstotal

,PictureID

,null

,null

,null

,null

,null

,null

,null

,null

,null

,null

,null

,null

,null

From [dbo].[tblPictures]

where (@OwnerId = 0 OR @OwnerId = [OwnerID])

and (@FilterBy = 0 OR @FilterBy = PictureTypeID)

and (@Shape = '' OR @Shape = [shape])

Order By RatingTotal Desc

OFFSET @pageNumber ROWS FETCH NEXT @pageSize ROWS ONLY


Update t

Set t.OwnerID = p.OwnerID

,t.TotalCount = @rowstotal

,t.PictureName=p.PictureName

,t.[Description]=p.[Description]

,t.Description2=p.Description2

,t.Description3=p.Description3

,t.aspectRatio=p.aspectRatio

,t.[min]=p.[min]

,t.[max]=p.[max]

,t.[percent]=p.percent

,t.thumbWidth=p.thumbWidth

,t.thumbHeight=p.thumbHeight

,t.processState=p.processState

,t.keywords=p.keywords

From #tempPage as t

INNER JOIN [dbo].[tblPictures] as p on p.PictureID = t.PictureID;

select * from #tempPage;

DROP TABLE #tempPage;


 Now we use a temp sql table to select only IDs firstly and then we update all 40 rows and set all other columns' values.

Also do not forget to drop temp table before and after to avoid existing table errors.


So this is the way that reduced my query time from 5 seconds to almost 2 seconds in total.


(SQL Server 2012 version: x64 11.0.3156.0)


Of course, if I would add nonclustered index, as MS SQL Graphical Execution Plan suggests, it may be even faster, but to be able to add them we have to understand what it is and how to use it. So maybe next step will be to learn and add nonclustered index ;)



Thank you and see you soon :)

 







1vqHSTrq1GEoEF7QsL8dhmJfRMDVxhv2y



SQL COUNT(*) is slow - how to increase the performance

Hi friends,

today I'm going to show you one possible way to speed up the COUNT(*)  performance in case when you need to know the total amount of rows of your select query with paging.

Here is the example of my first and slow query:

Select

    COUNT(*) OVER () as TotalCount

    , ID

    , OwnerID

    , Name

    , [Description]

    , keywords

From [dbo].[tblData]

where pState = N'a' and [status] <> 'P'

    and (@FilterBy = 0 OR @FilterBy = TypeID)

Order By DateAdded Desc

OFFSET @p0 ROWS FETCH NEXT @p1 ROWS ONLY;


this query did take from 4 seconds up to 9 seconds selecting 315000 rows from about 450000 rows -

that is really slow. It seems like OVER () overloads the query itself (but this is just my guess).

Also converting from nchar to char takes a lot of time, so I did change it (notice the N letter before string)

After some time of thinking and playing with that around, I did come to the following solution:

select @rowstotal = count(*)

From [dbo].[tblData]

where pState = 'a' and [status] <> 'P'

     and (@FilterBy = 0 OR @FilterBy = TypeID)


select

      @rowstotal as TotalCount

    , ID

    , OwnerID

    , Name

    , [Description]

    , keywords

From [dbo].[tblData]

where pState = 'a' and [status] <> 'P'

    and (@FilterBy = 0 OR @FilterBy = TypeID)

Order By DateAdded Desc

OFFSET @p0 ROWS FETCH NEXT @p1 ROWS ONLY;


(pay attention that to calculate total number of rows to be selected you do not need ordering. I think count(*) with ordering works much slower)


Now this query takes up to 1 second to select the same 315000 rows from about 450000 that is minimum 4 times faster :) 


I'm not sure if this is really best solution as I'm not a DBA master :)

but in my case it sped up the query enough.


Thank you.


1vqHSTrq1GEoEF7QsL8dhmJfRMDVxhv2y



Remote SQL Backup to local PC

Hello friends,


Each of us had issue with remote SQL backup files. Not with files itself but actually how to get bak files from remote sql server to local PC.

I googled a lot about this question (and I'm sure you did it as well) and did not find any solution. To be honest, there was no way to do it till yesterday :)

Yesterday, I did face with this problem again and seems I did find reliable solution or at least a chance to do it but have to say that it is not 100% working solution, it is about 99% and I will describe why at the end of this post.

So, in general it looks like this:

- execute an sql script to make backup of database;

- create a temp db with table and column of type varbinary type;

- get the *.bak file and insert it into the temp table;

- stream this row to your local pc and save as file;

- drop temp table and db;


That's it. Sounds like not very complex task but I can say that there may be some problems and you will have to solve them, and some problems even may be not possible to solve and then you are in the 1% who is not luck :(


Now lets take a look into each step more precisely, so first step is to create backup. But there are some problems, first problem is that any binary type field can include up to 2^31-1 bytes that is almost 2GB - 1 byte in size, so we will split our backup onto a few files, and there is the second problem - SQL Server supports splitting up to 64 files. So if our backup size is more than about 128GB I think that would be not possible :(

Ok, we have a few GB database and it is enough to split it onto up to 64 bak files (even less than 64), lets do it.

Lets calculate a size of our database: 

USE MyDB

SELECT CAST(SUM(size) * 8. / 1024 AS BIGINT)

FROM sys.master_files WITH(NOWAIT)

WHERE database_id = DB_ID() GROUP BY database_id

this will return long value in MB.

Next step is to create our bak files:

BACKUP DATABASE MyDB TO DISK = N'MyDB_tmp_1.bak'

,DISK = N'MyDB_tmp_2.bak'

,DISK = N'MyDB_tmp_3.bak'

,DISK = N'MyDB_tmp_4.bak'

,DISK = N'MyDB_tmp_5.bak'

WITH NOFORMAT, NOINIT, NAME = N'MyDB-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

This will backup our db into default backup location on the server.

Now its time to create temp db and table, or you may use the same db if you have no permission to create a new db:

IF db_id('TempDB') IS NULL

begin

create database [TempDB];

end

else

begin

use master;

ALTER DATABASE [TempDB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

drop database [TempDB];

create database [TempDB];

end

use [TempDB];

create table Temp (filename nvarchar(512), [file] varbinary(max));

also we need to know the default backup path:

SELECT TOP 1 physical_device_name

FROM msdb.dbo.backupset b

JOIN msdb.dbo.backupmediafamily m ON b.media_set_id = m.media_set_id

WHERE database_name = '{0}'

and backup_finish_date >=N'{1:yyyy-MM-dd}'

and backup_finish_date < N'{2:yyyy-MM-dd}'

ORDER BY backup_finish_date DESC

Now we have everything we need to insert each file into the temp table and download them one by one,

lets insert file:

INSERT INTO [{0}].dbo.Temp([filename], [file])

SELECT N'{1}' as [filename], * FROM OPENROWSET(BULK N'{1}', SINGLE_BLOB) AS [file]

Here we may have another problem - you may get error that you have no permission to BULK INSERT and this is real problem as well, so you will finish here or you may try to upload a web app (if you use web hosting) and code it to add to the table as byte array.

Now everything is ready to download the file, but pay attention its better to use streaming instead of default batch reading:

SELECT * FROM TempDB.dbo.Temp WHERE [filename] = N'{0}'

and C# code:

sqlCmd = new SqlCommand("SELECT * FROM [" + tmpDBName + "].dbo.Temp WHERE [filename] = N'" +

string.Format(bakFileName, this.defaultBakPath, sqlConnection.Database, i) + "'", sqlConnection);


sqlCmd.CommandTimeout = sqlConnection.ConnectionTimeout;


SqlDataReader sqldr = sqlCmd.ExecuteReader(System.Data.CommandBehavior.SequentialAccess);

sqldr.Read();

string fileName = sqldr.GetString(0);

System.IO.FileStream file = new System.IO.FileStream(System.IO.Path.Combine(this.localPath, System.IO.Path.GetFileName(fileName)),

System.IO.FileMode.OpenOrCreate, System.IO.FileAccess.ReadWrite);

long startIndex = 0;

const int ChunkSize = 1024 * 32; //32 KB block

byte[] buffer = new byte[ChunkSize];

while (true)

{

long retrievedBytes = sqldr.GetBytes(1, startIndex, buffer, 0, ChunkSize);

file.Write(buffer, 0, (int)retrievedBytes);

startIndex += retrievedBytes;

if (retrievedBytes != ChunkSize)

break;

}

file.Close();

sqlCmd.Dispose();

ok, we have got first file so now we need to repeat the same by deleting each downloaded row in the temp table and inserting next one file and so on.

Finally, delete everything you don't need anymore - temp database and table:

DELETE FROM [TempDB].dbo.Temp

use master; ALTER DATABASE [TempDB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; drop database [TempDB];

Voila! You have your remote SQL backup file on your local PC. Cool!

I have written a simple C#.NET application that will do it all for you, but, please, make sure y ou have the following permissions:

- you have BULK INSERT permission or complete admin rights

- your backup files in total size less than 128GB

- some other problems that I have not faced with yet


You can extend it by deleting bak files on a disk, by calculating size of bak file to split it onto less number of bak files, and more.

If you have any comment - you are welcome.

Thank you.


RemoteToLocalSQLBackup.zip (9.4KB)
RemoteToLocalSQLBackup_src.zip (13.8KB)

1vqHSTrq1GEoEF7QsL8dhmJfRMDVxhv2y