Sunday, November 9, 2014

Distinctly Preserving Order in Clojure

"Tell me, you heavens, in which part of his body
Shall I destroy him? – whether there, or there, or there? –
That I may give the local wound a name,
And make distinct the very breach whereout
Hector's great spirit flew: answer me, heavens!
"
-- Shakespeare, Troilus and Cressida
Act IV, Scene V, Lines 242-246

4clojure problem number 56 states:

Write a function which removes the duplicates from a sequence. Order of the items must be maintained.

The trick is you cannot use distinct.

We are giving the following test cases:

(= (__ [1 2 1 3 1 2 4]) [1 2 3 4])

(= (__ [:a :a :b :b :c :c]) [:a :b :c])

(= (__ '([2 4] [1 2] [1 3] [1 3])) '([2 4] [1 2] [1 3]))

(= (__ (range 50)) (range 50))

Let's try using group-by and get the first value of each group.

(fn [xs] (->> (group-by identity xs) (map first)))

This works fine for all but the last test case.  It seems that group-by does not preserve order.

(type (group-by identity (range 50)))
;; clojure.lang.PersistentHashMap

The PersistentHashMap in Clojure is implemented as a wide-tree with each node having up to 32 children (for more information see this excellent post by Karl Krukow or this excellent StackOverflow answer from Daniel Janus).

At this point we need to figure out a way to reorder the output from the group-by and first to what was given in the original input.  We can use sort-by but we need to use something that will give us the original input, this is were .indexOf saves the day.

We can use Java's .indexOf wrapped in a function to give sort-by an order which corresponds with the original input.  Using .indexOf will allow us to resort the output of the group-by followed by the first mapping to order that was in the original input, giving us this final function which preserve order and gives distinct values.

  (fn [xs] 
      (->> 
           (group-by identity xs) 
           (map first) 
           (sort-by #(.indexOf xs %))))

This passes all of the 4clojure's test cases.