Home > SQL Challenge > When ROW_NUMBER() OVER (PARTIION BY … ORDER BY …) function is not appropriate

When ROW_NUMBER() OVER (PARTIION BY … ORDER BY …) function is not appropriate

ROW_NUMBER() function has been available since SQL Server 2005. There are many good articles out there about this function. There are many bad ones out there too. (So be careful about what you reading. Always create test cases for your data to make sure you get the results you expect.)

ROW_NUMBER() function can be used with or without the PARTITION BY clause, but it must have the ORDER BY clause. It’s the usage with the PARTITION BY clause that interests me the most. I use ROW_NUMBER() with the PARTITION BY clause in cleansing the raw data with unwanted duplicates.

Here is a simple example. In this example, the device is the same device, but with two different inventory date and database id. We need to clean this kind data by removing the one that has older inventory date or smaller database id.

image

Here is what you can do to achieve that:

select    *
from    (    select    (other columns)
            ,        ranked = row_number() over
                        (    partition by DeviceName
                            order by LastInventoryDate desc, DatabaseID desc )
            from    myRawTable ) a
where    ranked = 1

image

 

With the above logic, the first record will be removed. This logic will be applied to every device.

Recently, while working on a SQL Reporting Services report, I realized that I need to use the RANK() function instead to meet the reporting requirement.

The requirement goes like this:

1) Data will come from 4 different sources. Using each asset’s operating system to determine which source file to use since these 4 sources files overlap assets.

2)  However, within each source file, we want ALL the hits, not just the unique hits.

These two requirements cry out for the RANK() function with the PARTITION BY clause.

image 

With the RANK() function, data from the different source file will get a different rank number, but will get the same rank number if they are from the same source file, which will cause them to be removed all together or be kept all together.

Categories: SQL Challenge
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: