Sunday, February 10, 2013

The Case of Not Having an Unique Identity

"Always remember that you're unique. Just like everyone else."
-- Alison Boulter

Every now and then it happens, you get a data set that has been corrupted and does not have a unique identity column.  What to do?  Well you can copy the data into a temp table that has an identity, truncate the table that it is in, and then copy the data from the temp table back into the original table.  This will work but it would take a long time to execute on a large data set and it assumes you can run truncate (otherwise you would have to run delete and fill the transaction log).

Your other option would be to come up with a statement that would assign a unique identity to each row and then use this to update the table.  Luckily the windowing function ROW_NUMBER can be use for just this task.

Say you have the following table:


CREATE TABLE dbo.PurchaseOrderDetail
(
    PurchaseOrderID int NOT NULL,
    LineNumber smallint NOT NULL,
    ProductID int NULL,
    UnitPrice money NULL,
    OrderQty smallint NULL
);


And the following data:


INSERT INTO dbo.PurchaseOrderDetail
(
    PurchaseOrderID,
    LineNumber,
    ProductID,
    UnitPrice,
    OrderQty
)
VALUES (1, 110, 201, 10.99, 3),
       (1, 112, 2, 5.98, 1),
       (1, 113, 21, 11.99, 2),
       (1, 220, 1, 1.99, 5),
       (1, 210, 1, 1.99, 6),
       (1, 410, 201, 10.99, 7),
       (1, 50, 201, 10.99, 31),
       (1, 500, 301, 99.99, 1),
       (1, 116, 21, 11.99, 1),
       (1, 100, 2, 5.98, 1),
       (1, 10, 201, 10.99, 1);


As you can see there is an issue, the PurchaseOrderID has been corrupted.  Its value is always 1.  It should have a unique identity.  Let's use the ROW_NUMBER function in a CTE to give us an unique identity for each row.


WITH cte
AS
(
  SELECT
     PurchaseOrderID
    ,ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num
    FROM dbo.PurchaseOrderDetail
)


We use a little trick of ORDERing BY SELECT NULL since we do not care about order in which we assign the row numbers and we need to have an ORDER BY for ROW_NUMBER.

Now all we need to do is use this CTE in an UPDATE to change the PurchaseOrderID to ROW_NUMBER value.


WITH cte
AS
(
  SELECT
     PurchaseOrderID
    ,ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num
    FROM dbo.PurchaseOrderDetail
)
UPDATE cte
  SET PurchaseOrderID = num;


With this one simple statement we have just assigned a unique identity to the PurchaseOrderID.  Now quickly ALTER the table and make it a primary key, so this does not happen again!

This whole solution can be found on SQL Fiddle here.