Saturday, August 17, 2013

Learn You Some SQL For Great Good - Unlearn What You Have Learned OR How Select Logically Works (the SELECT clause--OVER Ranking clause)

"The rich ruleth over the poor,
and the borrower is servant to the lender.
He that soweth iniquity shall reap vanity:
and the rod of his anger shall fail.
"
-- Proverbs 22:7-8

RANKing OVER data sets


We are still on the SELECT clause.

  1. FROM
  2. ON
  3. JOIN
  4. WHERE
  5. GROUP BY
  6. WITH CUBE or WITH ROLLUP
  7. HAVING
  8. SELECT
  9. DISTINCT
  10. ORDER BY
  11. TOP
We are going to talk about what I think is one of the most useful functionality available in modern SQL, the OVER clause.  For this post we'll look at the ranking function of RANK.

The easy way to think of the OVER clause is that it is a GROUP  BY applied to the data set being SELECTed (note this is not what is actually happening, but it is the easy way to think about it).

RANKing


Given the following table (or data set) of playwrights:


CREATE TABLE playwrights (  id [int] IDENTITY(1,1) ,name [varchar](100) ,birthYear [int] ,birthEra [char](2));


INSERT INTO playwrights
(name, birthYear, birthEra)
VALUES
  ('Oscar Wilde', 1854, 'AD')
 ,('Euripides', 480, 'BC')
 ,('Aeschylus', 525, 'BC')
 ,('Sophocles', 497, 'BC')
 ,('William Shakespeare', 1564, 'AD')
 ,('Jacopone da Todi', 1230, 'AD')

;

We see that the playwrights have a natural ranking by birthEra and birthYear, so if we wanted to rank them by when they were born we could do something like the following for the BC era playwrights:

SELECT
   *
  ,RANK() OVER (ORDER BY birthYear DESC) AS rank
  FROM playwrights
  WHERE birthEra = 'BC'
;


(on SQL Fiddle)

Looking at the results we see that the rank returned is ordered by the the birthYear in descending order, for the BC era this is what we want.

Now for AD era we want the birthYear to be ranked in ascending order, like this:

SELECT
   *
  ,RANK() OVER (ORDER BY birthYear ASC) AS rank
  FROM playwrights
  WHERE birthEra = 'AD'
;


(on SQL Fiddle)

If we want to combine the results into one, we can UNION the two sets and do a RANKing over the results:

SELECT
   RANK() OVER (ORDER BY birthEra DESC, year_rank ASC) AS rank
  ,*
  FROM (

SELECT
   *
  ,RANK() OVER (ORDER BY birthYear DESC) AS year_rank
  FROM playwrights
  WHERE birthEra = 'BC'

UNION ALL

SELECT
   *
  ,RANK() OVER (ORDER BY birthYear ASC) AS year_rank
  FROM playwrights
  WHERE birthEra = 'AD'

) AS needed_for_sub_select
;


(on SQL Fiddle)

The sub select will allow us to rank each era in the correct way (BC by DESC and AD by ASC).  We combine each result from the two sub selects with a UNION ALL, this gives us one big data set which is piped to the outer SELECT statement.  Given this big data set we can now do an OVER clause on the UNION of the two sets in the sub select with a RANKing by the birthEra descending (so that BC is before AD) followed by year_rank.  This is a fairly clean and readable way to solve the problem (plus it is very fast, more on that later).