Sunday, July 21, 2013

Learn You Some SQL For Great Good - Unlearn What You Have Learned OR How Select Logically Works (the GROUP BY clause)

"A gang of cynics can upset a whole city;
a group of sages can calm everyone down"
-- Proverbs 29:8

Grouping by Groups


We are now onto GROUP BY in our trip down the SELECT clause.

  1. FROM
  2. ON
  3. JOIN
  4. WHERE
  5. GROUP BY
  6. WITH CUBE or WITH ROLLUP
  7. HAVING
  8. SELECT
  9. DISTINCT
  10. ORDER BY
  11. TOP

Counting


As Celia said to Rosalind, "It is as easy to count atomies as to resolve the propositions of a lover" (As You Like It, Act III, Scene II, Lines 225-226).  Counting is one of those skills that you use through out your life, in many different ways and for many different reason.  SQL provides the means to count members of sets through use of the COUNT aggregate.

If we just wanted to get the total number of records we could just use the following:



SELECT COUNT(*)
  FROM plays
;

This would give use the total number of plays in our set (26 in the set we have been working with).

It would be useful if we could apply an aggregate like to groups of object so that we get things like the total count of plays for each playwright in our database.  Luckily SQL give us the GROUP BY clause to solve just this kind of problem.

The GROUP BY takes the data given to it by the FROM and WHERE clauses and creates sets of data based on attribute it is given to group on.  Say we wanted to get the total number of plays for each playwright in our database.  We can use the following SQL to get this count:



SELECT
   a.name AS playwright
  ,COUNT(*) AS number_of_plays
  FROM plays AS p
  INNER JOIN playwrights AS a
    ON p.playwrightId = a.id
  GROUP BY a.name
;

In the query above each play is associated to its playwright, this set of data is given to the GROUP BY which is going to create data sets based on whom the playwright is for the play.  For each one of these sets from the GROUP BY clause, we are going to get the name of the playwright and the COUNT of the number of plays they have in our database.

If we want to we could get just the number of plays each playwright that was born in AD has using the following:



SELECT
   a.name AS playwright
  ,COUNT(*) AS number_of_plays
  FROM plays AS p
  INNER JOIN playwrights AS a
    ON p.playwrightId = a.id
  WHERE a.birthEra = 'AD'
  GROUP BY a.name
;

This will just give use the number of plays for Oscar Wilde and William Shakespeare because the WHERE clause is applied to the data set before the GROUP BY gets the data set, so the GROUP BY only sees Oscar Wilde and William Shakespeare.  This is similar to why what is given to the SELECT statement outside of an aggregate function like COUNT, must be in the GROUP BY clause, otherwise it is not available to the SELECT, but we getting a head of ourselves.

As always the SQL examples can be found on SQL Fiddle.