Saturday, November 16, 2013

Working with Multiple Values in a Column with SQL Server

"Any hard lesson that may do thee good."
-- Shakespeare, Much Ado About Nothing
Act I, Scene I, Line 272

SQL is a good tool when you have large sets of relational records--whether or not these kinds of relationships exist in the real world is not the aim of this post.  However, SQL is not very good at normalizing multiple records in a single attribute.

I know, an example would be good right about now.


As we see above Attribute 2 has three different values in it (Value 1, Value 2, and Value 3).

Say we want to do the following.

We want to take the Order File's Order Records and place the Order ID and others into the Order Table while placing the Order ID and Flags into the Order Flag Table.

Well we have a problem, how are we going to take the Flags and normalize them into the Order Flag Table?

Say the Flags are separated by spaces so Flags looks like this in an Order Record:


How would we do this in SQL?  Hmmm, well SQL is not really good at this kind of process so we need to do a kind of hack.

In SQL Server 2012 (should work in 2008 too) we can transform the Flags into XML that looks like this:

CHS
XMS
Once the data looks like this we can SQL Server's built in XML processing to create a virtual table to obtain the values from.

We can set up a test case using the following a CTE.

;WITH cte(value, id) AS (
  SELECT *
    FROM (
         VALUES
          ('HERP DREP CHS XMS', 1)
         ,('HELLO WORLD', 2)
       ) AS x(a, b)
)
SELECT *
FROM cte

Which looks like this in SQL Fiddle:


We see that we have two records:  ('HERP DREP CHS XMS', 1) and ('HELLO WORLD', 2).

Now lets take the Flags and transform them using an XML trick.


;WITH cte(value, id) AS (
  SELECT *
    FROM (
         VALUES
          ('HERP DREP CHS XMS', 1)
         ,('HELLO WORLD', 2)
       ) AS x(a, b)
)
SELECT
  flags.flag.value('.', 'varchar(50)') AS Flag
 ,id AS "Order ID"
 ,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS "KEY"
FROM (
  SELECT
     CAST('<flag>' + REPLACE(value, ' ', '</flag><flag>') + '</flag>' AS XML)
    ,id
  FROM cte
) AS f(x, id)
CROSS APPLY f.x.nodes('flag') AS flags(flag)


We now have this in SQL Fiddle.

http://sqlfiddle.com/#!6/aed15/2

This gives us each Flag with its Order ID (Herp with 1, Derp with 1, CHS with 1, XMS with 1, Hello with 2, and World with 2).

This is almost what we want but we need an unique Key for each Flag across all of the records.  Now in the ETL tool I was using when I faced this problem this week, it needed an unique Key for each of the Flag records that would come out of the SQL used, even though it would be mapping this unique Key to an Identity column in SQL Server.  Yes, this is very dumb but

When tools do dumb things, one must make sure that one does not do dumb things himself. 
-- Me, this blog post

With this in mind, let us create an unique Key in a smart way for each Flag record coming out of our SQL hack.


This gives us the following in SQL Fiddle.

http://sqlfiddle.com/#!6/aed15/3

Now we have an unique Key!

This gives us each Flag with its Order ID and Key (Herp with 1 and 1, Derp with 1 and 2, CHS with 1 and 3, XMS with 1 and 4, Hello with 2 and 5, and World with 2 and 6).

Nice now we can get back to delivering real value to our customers and not just figuring out tools and strange data.

If you want to see other examples using CTEs and SELECT NULL see these other post by me.
http://comp-phil.blogspot.com/2013/02/the-case-of-not-having-unique-identity.html
http://comp-phil.blogspot.com/2013/02/how-to-remove-duplicate-records-using.html