Sunday, August 25, 2013

Developer Freedom is a Blank Text Editor

"Though age from folly could not give me freedom,
It does from childishness."
-- Shakespeare, Antony and Cleopatra
Act I Scene III Lines 57-58

"Ye shall know the truth, and the truth shall set you free."
-- John 8:32

"Freedom! Horrible, horrible freedom!"
-- Simpsons 3F31

For a software developer there is nothing as freeing or as scary as a blank text editor.  Just looking at it is both liberating and intimidating at the same time.




With a blank text editor you can create anything and everything.  This is how Google and Facebook got their start.  The software used on the Space Shuttle, Unix, and even the Internet itself all got their starts from a blank text editor.  Think about this next time someone tries to sell you a tool to limit your freedom.

Saturday, August 17, 2013

Learn You Some SQL For Great Good - Unlearn What You Have Learned OR How Select Logically Works (the SELECT clause--OVER Ranking clause)

"The rich ruleth over the poor,
and the borrower is servant to the lender.
He that soweth iniquity shall reap vanity:
and the rod of his anger shall fail."
-- Proverbs 22:7-8

RANKing OVER data sets


We are still on the SELECT clause.

  1. FROM
  2. ON
  3. JOIN
  4. WHERE
  5. GROUP BY
  6. WITH CUBE or WITH ROLLUP
  7. HAVING
  8. SELECT
  9. DISTINCT
  10. ORDER BY
  11. TOP

We are going to talk about what I think is one of the most useful pieces of functionality available in modern SQL, the OVER clause.  For this post we'll look at the ranking function RANK.

The easy way to think of the OVER clause is as a GROUP BY applied to the data set being SELECTed (note this is not what is actually happening, since OVER keeps every row instead of collapsing them, but it is the easy way to think about it).
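To make the GROUP BY comparison concrete, here is a minimal sketch that runs both forms side by side. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or newer is needed for OVER), uses the six playwrights from this post, and borrows PARTITION BY, which this post does not cover, purely to mirror the GROUP BY column.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE playwrights (
  id INTEGER PRIMARY KEY,
  name TEXT,
  birthYear INTEGER,
  birthEra TEXT
);
INSERT INTO playwrights (name, birthYear, birthEra) VALUES
  ('Oscar Wilde', 1854, 'AD'),
  ('Euripides', 480, 'BC'),
  ('Aeschylus', 525, 'BC'),
  ('Sophocles', 497, 'BC'),
  ('William Shakespeare', 1564, 'AD'),
  ('Jacopone da Todi', 1230, 'AD');
""")

# GROUP BY collapses the six rows into one row per era.
grouped = conn.execute("""
SELECT birthEra, COUNT(*) AS eraCount
  FROM playwrights
  GROUP BY birthEra
  ORDER BY birthEra
""").fetchall()

# OVER computes the same per-era count but keeps all six rows.
windowed = conn.execute("""
SELECT name, birthEra, COUNT(*) OVER (PARTITION BY birthEra) AS eraCount
  FROM playwrights
  ORDER BY id
""").fetchall()

print(grouped)
for row in windowed:
    print(row)
```

The grouped query comes back with two rows (one per era), while the windowed query comes back with all six playwrights, each carrying the count for its era.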

RANKing


Given the following table (or data set) of playwrights:


CREATE TABLE playwrights (
  id [int] IDENTITY(1,1)
 ,name [varchar](100)
 ,birthYear [int]
 ,birthEra [char](2)
);


INSERT INTO playwrights
(name, birthYear, birthEra)
VALUES
  ('Oscar Wilde', 1854, 'AD')
 ,('Euripides', 480, 'BC')
 ,('Aeschylus', 525, 'BC')
 ,('Sophocles', 497, 'BC')
 ,('William Shakespeare', 1564, 'AD')
 ,('Jacopone da Todi', 1230, 'AD')

;

We see that the playwrights have a natural ranking by birthEra and birthYear, so if we wanted to rank them by when they were born we could do something like the following for the BC era playwrights:

SELECT
   *
  ,RANK() OVER (ORDER BY birthYear DESC) AS rank
  FROM playwrights
  WHERE birthEra = 'BC'
;


(on SQL Fiddle)

Looking at the results we see that the rank returned is ordered by birthYear in descending order; for the BC era this is what we want.

Now for the AD era we want the birthYear to be ranked in ascending order, like this:

SELECT
   *
  ,RANK() OVER (ORDER BY birthYear ASC) AS rank
  FROM playwrights
  WHERE birthEra = 'AD'
;


(on SQL Fiddle)
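If SQL Fiddle is not handy, the two per-era rankings can be reproduced locally. This is only a sketch: it assumes Python's built-in sqlite3 module (SQLite 3.25 or newer) in place of SQL Server, names the rank column year_rank, and adds an outer ORDER BY so the rows print in rank order.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE playwrights (
  id INTEGER PRIMARY KEY,
  name TEXT,
  birthYear INTEGER,
  birthEra TEXT
);
INSERT INTO playwrights (name, birthYear, birthEra) VALUES
  ('Oscar Wilde', 1854, 'AD'),
  ('Euripides', 480, 'BC'),
  ('Aeschylus', 525, 'BC'),
  ('Sophocles', 497, 'BC'),
  ('William Shakespeare', 1564, 'AD'),
  ('Jacopone da Todi', 1230, 'AD');
""")

# BC years count down, so the largest birthYear is the earliest birth.
bc_ranked = conn.execute("""
SELECT name, birthYear, RANK() OVER (ORDER BY birthYear DESC) AS year_rank
  FROM playwrights
  WHERE birthEra = 'BC'
  ORDER BY year_rank
""").fetchall()

# AD years count up, so the smallest birthYear is the earliest birth.
ad_ranked = conn.execute("""
SELECT name, birthYear, RANK() OVER (ORDER BY birthYear ASC) AS year_rank
  FROM playwrights
  WHERE birthEra = 'AD'
  ORDER BY year_rank
""").fetchall()

for row in bc_ranked + ad_ranked:
    print(row)
```

In both result sets rank 1 goes to the playwright born first: Aeschylus for BC and Jacopone da Todi for AD.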

If we want to combine the results into one, we can UNION the two sets and do a RANKing over the results:

SELECT
   RANK() OVER (ORDER BY birthEra DESC, year_rank ASC) AS rank
  ,*
  FROM (
    SELECT
       *
      ,RANK() OVER (ORDER BY birthYear DESC) AS year_rank
      FROM playwrights
      WHERE birthEra = 'BC'

    UNION ALL

    SELECT
       *
      ,RANK() OVER (ORDER BY birthYear ASC) AS year_rank
      FROM playwrights
      WHERE birthEra = 'AD'
  ) AS needed_for_sub_select
;


(on SQL Fiddle)

The sub selects let us rank each era in the correct direction (BC by DESC and AD by ASC).  We combine the two results with a UNION ALL, which gives us one big data set that is piped to the outer SELECT statement.  Over this combined data set the outer OVER clause RANKs by birthEra descending (so that BC comes before AD) and then by year_rank.  This is a fairly clean and readable way to solve the problem (plus it is very fast, more on that later).
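The combined ranking can be checked end to end with a small sketch. As assumptions: it runs on Python's built-in sqlite3 module (SQLite 3.25 or newer) rather than SQL Server, the outer rank is named overall_rank, the outer SELECT lists its columns explicitly, and a final ORDER BY is added only so the printed order is deterministic.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE playwrights (
  id INTEGER PRIMARY KEY,
  name TEXT,
  birthYear INTEGER,
  birthEra TEXT
);
INSERT INTO playwrights (name, birthYear, birthEra) VALUES
  ('Oscar Wilde', 1854, 'AD'),
  ('Euripides', 480, 'BC'),
  ('Aeschylus', 525, 'BC'),
  ('Sophocles', 497, 'BC'),
  ('William Shakespeare', 1564, 'AD'),
  ('Jacopone da Todi', 1230, 'AD');
""")

# Rank each era in its own direction, UNION ALL the two sets,
# then rank the combined set by era (BC first) and per-era rank.
timeline = conn.execute("""
SELECT
   RANK() OVER (ORDER BY birthEra DESC, year_rank ASC) AS overall_rank
  ,name
  ,birthYear
  ,birthEra
  FROM (
    SELECT *, RANK() OVER (ORDER BY birthYear DESC) AS year_rank
      FROM playwrights
      WHERE birthEra = 'BC'
    UNION ALL
    SELECT *, RANK() OVER (ORDER BY birthYear ASC) AS year_rank
      FROM playwrights
      WHERE birthEra = 'AD'
  ) AS needed_for_sub_select
  ORDER BY overall_rank
""").fetchall()

for row in timeline:
    print(row)
```

The six playwrights come back ranked 1 through 6 in birth order, from Aeschylus to Oscar Wilde.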

Sunday, August 4, 2013

Learn You Some SQL For Great Good - Unlearn What You Have Learned OR How Select Logically Works (the SELECT clause)

"Though thanks to all, must I select from all. The rest
Shall bear the business in some other fight,"
-- Shakespeare, Coriolanus
Act I, Scene VI, Lines 81-82

SELECT information on selections


We are finally on the term SELECT.  Yep, this is normally the first thing covered when one learns about SQL, but as you now know SELECT is logically the 8th thing that gets looked at when doing a SELECT SQL query.

  1. FROM
  2. ON
  3. JOIN
  4. WHERE
  5. GROUP BY
  6. WITH CUBE or WITH ROLLUP
  7. HAVING
  8. SELECT
  9. DISTINCT
  10. ORDER BY
  11. TOP

SELECT has a lot of different things going on, so we'll spend a bit of time on SELECT.  For this post we'll just cover projections.

Projections OR how to impress people when talking about SELECT


A projection is a fancy relational algebra term for something that is very easy to understand.  A projection can be thought of as the list of things (columns) that one wants from a set.

Given the following tables (or data sets):

CREATE TABLE plays (
  id [int] IDENTITY(1,1)
 ,name [varchar](100)
 ,playwrightId [int]
);

CREATE TABLE playwrights (
  id [int] IDENTITY(1,1)
 ,name [varchar](100)
 ,birthYear [int]
 ,birthEra [char](2)
);

We can say that we are only interested in the name column (or attribute) of the plays table with the following query.



SELECT name
  FROM plays
;

(on SQL Fiddle)

This will limit our results to just the names of the plays in the plays table.
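Here is a small sketch of that projection in action. It assumes Python's built-in sqlite3 module in place of SQL Server, and the three sample plays (and their ids) are made up for illustration since the post does not list the contents of the plays table.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plays (
  id INTEGER PRIMARY KEY,
  name TEXT,
  playwrightId INTEGER
);
-- Hypothetical sample rows, not taken from the post.
INSERT INTO plays (id, name, playwrightId) VALUES
  (1, 'Medea', 1),
  (2, 'Hamlet', 2),
  (3, 'The Importance of Being Earnest', 3);
""")

# Project only the name column; ORDER BY id just keeps the output stable.
cursor = conn.execute("SELECT name FROM plays ORDER BY id")
projected_columns = [column[0] for column in cursor.description]
play_names = [row[0] for row in cursor.fetchall()]

print(projected_columns)
print(play_names)
```

The table has three columns, but the result set only has the one we asked for.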

Likewise we can join the two tables together and say which columns (attributes) we want from each.



SELECT
   p.name AS play
  ,w.name AS playwright
  FROM plays AS p
  INNER JOIN playwrights AS w
    ON p.playwrightId = w.id
;

(on SQL Fiddle)

Notice we just do not speak of the things we do not want.  SQL is a declarative language (yep, even Forbes thinks declarative programming is a good thing), which means we describe what we want and not how to get it.

Also notice the AS term, which allows us to alias things.  If we did not qualify the columns with the aliases p and w for the plays and playwrights tables, we would not be able to distinguish between the plays' name column and the playwrights' name column, or between the id on plays and the id on playwrights.

SELECT
   name AS play
  ,name AS playwright
  FROM plays
  INNER JOIN playwrights
    ON playwrightId = id
;

(on SQL Fiddle)

We get the following error:

Ambiguous column name 'id'.: SELECT name AS play ,name AS playwright FROM plays INNER JOIN playwrights ON playwrightId = id
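The same failure can be reproduced outside of SQL Fiddle. This sketch assumes Python's built-in sqlite3 module as a stand-in for SQL Server, so the wording of the error differs from the message above and SQLite may name a different column first; the point is only that the unqualified query is rejected.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plays (id INTEGER PRIMARY KEY, name TEXT, playwrightId INTEGER);
CREATE TABLE playwrights (
  id INTEGER PRIMARY KEY, name TEXT, birthYear INTEGER, birthEra TEXT
);
""")

def column_names(table):
    # PRAGMA table_info returns one row per column; index 1 is the name.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}

# Columns that exist in both tables cannot be used unqualified in the join.
shared_columns = column_names("plays") & column_names("playwrights")

try:
    conn.execute("""
    SELECT name AS play, name AS playwright
      FROM plays
      INNER JOIN playwrights
        ON playwrightId = id
    """)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(sorted(shared_columns))
print(error_message)
```

Both id and name show up in shared_columns, which is why neither can appear in the query without saying which table it belongs to.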

The database management system has no way to tell the plays' id from the playwrights' id (the same goes for the two name columns).  To fix this we can do the following.

SELECT
   plays.name AS play
  ,playwrights.name AS playwright
  FROM plays
  INNER JOIN playwrights
    ON plays.playwrightId = playwrights.id
;

(on SQL Fiddle)

By saying which table each column comes from, we are able to use the name and id columns in the query above.  This is a bit wordy, which is why we have the AS term.  AS allows us to alias the full name of the table as a shorter one.  Let's look back at the original query.

SELECT
   p.name AS play
  ,w.name AS playwright
  FROM plays AS p
  INNER JOIN playwrights AS w
    ON p.playwrightId = w.id
;

(on SQL Fiddle)

In the above query we have aliased plays AS p and playwrights AS w.  Now when we go to use the playwrights' id we can simply say w.id; likewise p.name is the plays' name column and w.name is the playwrights' name.  Notice that we further alias the projections p.name AS play and w.name AS playwright, which changes the attribute names on the result set of the query to play for p.name and playwright for w.name.
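To see the renamed attributes for ourselves, here is a minimal sketch of the aliased query. It assumes Python's built-in sqlite3 module in place of SQL Server, and the sample rows (with explicit ids so the join lines up) are made up for illustration; an ORDER BY is added only to keep the output stable.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plays (id INTEGER PRIMARY KEY, name TEXT, playwrightId INTEGER);
CREATE TABLE playwrights (
  id INTEGER PRIMARY KEY, name TEXT, birthYear INTEGER, birthEra TEXT
);
-- Hypothetical sample rows, not taken from the post.
INSERT INTO playwrights (id, name, birthYear, birthEra) VALUES
  (1, 'Euripides', 480, 'BC'),
  (2, 'William Shakespeare', 1564, 'AD'),
  (3, 'Oscar Wilde', 1854, 'AD');
INSERT INTO plays (id, name, playwrightId) VALUES
  (1, 'Medea', 1),
  (2, 'Hamlet', 2),
  (3, 'The Importance of Being Earnest', 3);
""")

cursor = conn.execute("""
SELECT
   p.name AS play
  ,w.name AS playwright
  FROM plays AS p
  INNER JOIN playwrights AS w
    ON p.playwrightId = w.id
  ORDER BY p.id
""")

# The column aliases become the attribute names of the result set.
result_columns = [column[0] for column in cursor.description]
rows = cursor.fetchall()

print(result_columns)
for row in rows:
    print(row)
```

The result set reports its columns as play and playwright, not as two columns both called name.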