Saturday, February 23, 2013

Upserting in SQL Server

"There is nothing more uncommon than common sense."
--- Frank Lloyd Wright

"Less is more."
-- Ludwig Mies van der Rohe

It happens, your on a project and some architect comes up with some "great" idea.  Something along the lines of, "Instead of deleting data we'll do a "soft" delete, that way if we need to unefface the data will not have to go to the backed up source data or the data warehouse or one of the many other copies we have".  There you are, stuck living with faking a deletion of data, adding unnecessary complexity  to your system.



One of the great things about a "soft" delete is now you need to check if the record already exist as a "soft" deleted record before you can insert any new data.  Great.  So how do you do this?

The wrong way to do it would be to do a SELECT and then either an INSERT or UPDATE based on the results of the SELECT.  This causes two different interaction with the database.

The better way to handle a "soft" delete is to use a MERGE for an Upsert.

Say you have the following tables.


CREATE TABLE dbo.stage
(
  id [int] NOT NULL,
  first_name [varchar] (100) NOT NULL,
  last_name [varchar] (100) NOT NULL
);

CREATE TABLE dbo.party
(
  id [int] NOT NULL,
  first_name [varchar] (100) NOT NULL,
  last_name [varchar] (100) NOT NULL,
  is_deleted [char] DEFAULT('N') NOT NULL
);


With the following data.


INSERT INTO dbo.stage
(
  id
 ,first_name
 ,last_name
)
VALUES
  (1, 'Mike', 'Harris')
 ,(2, 'Kelsey', 'Harris')
 ,(3, 'Jack', 'Harris')
;

INSERT INTO dbo.party
(
  id
 ,first_name
 ,last_name
 ,is_deleted
)
VALUES
  (1, 'Mike', 'Harris', 'Y')
 ,(2, 'Kelsey', 'Hopkins', 'N')
 ,(4, 'Tom', 'Smith', 'N')
;


Staging
IDFIRST_NAMELAST_NAME
1MikeHarris
2KelseyHarris
3JackHarris

Party with "soft" delete
IDFIRST_NAMELAST_NAMEIS_DELETED
1MikeHarrisY
2KelseyHopkinsN
4TomSmithN

Looking at the data above, we see that the record with ID value of 1 has been "soft" deleted.

Say we want to merge the staging data into the Party data.  Assuming we have a full data set, we can "delete" any of the records not found in the staging data.

In order to merge the staging data into the Party data with "soft" deletes we would use the following MERGE statement.


MERGE dbo.party AS TARGET
USING dbo.stage AS SOURCE
  ON TARGET.id = SOURCE.id
WHEN MATCHED
  AND (TARGET.first_name <> SOURCE.first_name
       OR TARGET.last_name <> SOURCE.last_name)
   OR TARGET.is_deleted = 'Y' THEN
  UPDATE
    SET TARGET.first_name = SOURCE.first_name
       ,TARGET.last_name = SOURCE.last_name
       ,TARGET.is_deleted = 'N'
WHEN NOT MATCHED BY TARGET THEN
  INSERT (id, first_name, last_name)
  VALUES (SOURCE.id, SOURCE.first_name, SOURCE.last_name)
WHEN NOT MATCHED BY SOURCE THEN
  UPDATE
    SET TARGET.is_deleted = 'Y'
;

The statements in bold are what make this MERGE statement use "soft" deletes.  On the WHEN MATCHED we want to include an OR TARGET.is_deleted = 'Y' (or whatever kind of delete indicator you are using), this will allow for an UPDATE to take place when the record that is being inserted is a "soft" deleted record.  The WHEN NOT MATCHED BY SOURCE THEN must do an UPDATE with SET TARGET.is_deleted = 'Y' (or whatever kind of delete indicator you are using) instead of the typical DELETE.

Party with "soft" delete
IDFIRST_NAMELAST_NAMEIS_DELETED
1MikeHarrisN
2KelseyHarrisN
4TomSmithY
3JackHarrisN

In the table above, Mike has been un-"soft" deleted, Kelsey's last name has changed to Harris, Tom has been "soft" deleted, and Jack has been inserted into the table.

Well, that is how one does a "soft" delete in SQL Server.

I used SQL Fiddle to test this out, the sample code can be found here.