Sunday, January 10, 2016

Data Structures Matters

"I see thee yet, in form as palpable"
-- Shakespeare, Macbeth
Act II, Scene I, Line 31 

One of the katas we like to use at work while interviewing candidates is the Roman Numeral kata.  It is a simple kata in which you create a function which takes an integer and gives you back a string representing the roman numeral representation of the number.  It is fairly simple to explain but is deep enough to get a good idea of how the person works.

Here is a possible solution to the Roman Numeral kata in C#.



Looking at this solution we see a few things.  The solution is using C# 6.0 syntax but beyond that we see three major things.

  1. translation are being applied using the higher order function Fold (in C# Aggregate)
  2. mutation of the input value is used to maintain state through iterations
  3. translations are held in a Dictionary
Let us look at the Dictionary data structure which is the main focus of this post.  In general a Dictionary is not ordered, now some implementations are ordered in the order in which the key value pairs are inserted (this is the case with the C# implementation), but this is not a property of the Dictionary data structure (see also the NIST entry).

Here is a possible solution to the Roman Numeral kata in Clojure.


Looking at this solution we see a few things.
  1. translations are being applied using the higher order function Fold (in Clojure reduce
  2. hash maps are used to maintain state through iterations
  3. translations are held in a vector of vectors
We see that we cannot use a Dictionary (called maps in Clojure) to hold the translation.  Clojure's HashMap uses Phil Bagwell's Hash Array Mapped Trie which makes no claim about order, in fact we see that maps (and sets) are unordered while vectors (lists and seq) are ordered.  Since our algorithm requires that the translations be in the order we insert them we must use a data structure which preserves this order.  Thus we must used something that is ordered like a vector.

How about F#?  Here is a possible solution to the Roman Numeral kata in F#.


Looking at this solution we see a few things.
  1. translations are being applied using the higher order function Fold (using F#'s List.fold)
  2. tuples are used to maintain state through iterations
  3. translations are held in a List
For the same reasons that we found in the Clojure solution we cannot use a Dictionary (called a Map in F#) since they are ordered by F#'s generic comparison and not in the ordered in which they were inserted.  Luckily we can use a List which is ordered.

There you have it, the Roman Numeral kata in three different languages.  Despite being a simple kata we have learned something about the Dictionary data structure, mainly that it does not have an order and thus should not be used in cases in which order matters.