Sunday, January 25, 2015

Reducing Map to Reduce OR All You Need is Fold

"That would reduce these bloody days again"
-- Shakespeare, Richard III
Act V, Scene V, Line 36

Intro


Have you heard the tale of the madman who came running into the town late at night, screaming that all you need is fold?

In Graham Hutton's paper "A tutorial on the universality and expressiveness of fold" it is shown that we can reduce recursive functions into a definition using fold.  Let's take a look reducing map to a fold.

(O-the tale of the madman is about me and not Dr. Hutton.)

Map


Map goes by a few different names, to those who speak Clojure, Haskell, or Underscore.js it goes by the name Map, to those which speak C# it goes by the name Select, those who speak C++ call it Transform, ... we could go on like this for a bit.


The point is that idea of the higher order function Map goes by different names in different language but performs the same function of taking a collection and applying a function against each member and placing it into the resulting collection.


The easiest example that actual does something is to apply an increment function with a Map.  Looking above we see that when we Map the increment function against a collection with the members of 1, 2, 3, 4 we get a resulting collection with the members of 2, 3, 4, 5.

In Clojure and C# it would look like this:



Fold


Fold goes by a few different names too, to those that speak Clojure or Underscore.js it goes by the name Reduce, to those which speak C# it is known as Aggregate, in Haskell it is known as Fold, in C++ it goes by the name Accumulate, ... we could go on like this for a while.  Again the point is that these names are all for the same idea of the higher order function Fold.


A Fold takes a collection and applies one function against each member of the collection and an accumulated result (also known as a memorize).

A very simple example of Fold would be summing.


We see in the image above that we are taking the collection 1, 2, 3, 4 and summing it up to the value of 10.  We do this by taking member 1 and the seed value of 0 and adding them together to get 1.  Next, we take the accumulated value of 1 and add the next member 2 to it giving us 3.  Then, we take the accumulated value of 3 and the member 3 add them and get 6.  Last, we take the accumulated value of 6 and the member of 4 giving us the final result of 10.

In Clojure and C# this would look like the following:



Mapping Map to Fold


... and birds go tweet, what is the point?  Glad you asked, we see with both Map and Fold some similarities.

  1. applying a function against each member of a collection
  2. building a resulting collection
Let's look at the first Map example again.  We see that we applying a function against each member of a collection and are accumulating a resulting collection, this sounds very similar to Fold.


In fact I would say that if we had an empty collection in which we accumulated the results of applying the function against the members of the collection we would have the same functionality.  That was a lot of words lets have some code.

In Clojure and C# we would have the following:


We see that both Map and the Fold give the exact same result.  In both the Clojure and C# code we see that we need an initial empty collection to store the resulting collection in and place the result of applying the function against a member into the resulting collection (using conj in Clojure and Add in C#).  For the C# code since we are using a List to accumulated the resulting collection instead of an array we need to reshape the resulting collection using the ToArray method.

By looking at this simple example we see that we can reduce Map to a Fold and thus all we need is Fold.  :)

Sunday, January 11, 2015

The city of Mapcat

"I have forgot the map."
-- Shakespeare, Henry IV Part I
Act III, Scene I, Line 5.1

Mapcat is an interesting higher order function which is not talked about much.

"You've never danced like this before
We don't talk about it"
-- Milky Chance,
Stolen Dance

The simplest case of Mapcat is when you have a collection of collection and you want to map over the collection of collection yielding a single level of collection.


In the image above we see that we have a collection of two collections (top and bottom) and a function which maps and cats (mapper catter) when these are Mapcatted that yield a result which is a single collection.  The size of the result from the Mapcat is the size of the collection's collections, in the image above we see that we have 4 members of the first collection and 2 members in the second collection giving us the resulting collection with 6 members.

The simplest example of Mapcat would be using the identity function (a function which just returns whatever is given to it).


We see that collection which contains a collection we 1, 2, 3, and 4 and another collection with 20 and 10 when Mapcatted with the Identity function will yield a resulting collection with 1, 2, 3, 4, 20, and 10.



We see in C# we can use the SelectMany to perform this simple Mapcat, while Clojure has a function called mapcat.

This is nice but how about a simple example which actual does something?


How about incrementing the values?



We see that in both C# and Clojure the lambda in the Mapcat is given a collection, so in both cases we need to have a function which can work with a collection.  We have a choice in both C# and Clojure we can either have a Map in the lambda or we can chain the result by removing a level with the identity function follow by a simple Map with an incrementing lambda function.  In Clojure we can do chaining using the double arrow macro (thread last) ->> or just nest the calls.

Saturday, January 3, 2015

On Zipping

"With those legions
Which I have spoke of, whereunto your levy
Must be supplyant: the words of your commission
Will tie you to the numbers and the time
Of their despatch.
"
-- Shakespeare, Cymbeline
Act III, Scene VIII, Lines 12.2 - 16.1

Zipping is a very simple concept from higher order functions which does not get talked about much.

With a zip you have a number of collection which you wish to apply a function against each matching member in order to produce a single "zipped" collection.

In general:

GIVEN a function f

FOREACH x_i in X, y_i in Y, ...
  r_i = f(x_i, y_i, ..)

GIVING r_i in R

The function f is the "zipper", while X, Y, ... are the collections being zipped with R being the zipped result.

A simple example with 2 collections:



Let's use Add as the "zipper", [1, 2, 3] as the first collection, and [10, 20, 30] as the second collection.


We see that this gives us the resulting collection of [11, 22, 33].

What should happen if the first or second collection has a different number of elements?



OR


We'll look at what C#, Clojure, and Haskell do in these cases.



We see that they each have a resulting collection as large as the smallest input collection and simply just ignore the rest of the input in the larger collection.  There you have it a very simple introduction to Zip.

Wednesday, November 26, 2014

Reflections on "One Hacker Way"

"If the sovereign can realize the difficulty of being a sovereign, and the minister realize the difficulty of being a minister, then the government will be well ordered, and the common people will strive diligently after Virtue."

-- The Counsels of the Great Yu
Book of Documents as translated by James Legge

I am interested in anything which gets people's emotions on fire.  When I saw the twitter firestorm around Erik Meijer's presentation "One Hacker Way" I knew I had to see it for myself.


One Hacker Way - Erik Meijer from Reaktor on Vimeo.

It has been a few days since that viewing and overall I have to say he has some good points.

Is this the best presentation I've ever seen?  No.  Is every minute the best possible thing you could do we your time?  No.  Could it make you think about the way you do your work?  Most likely yes.

I got three points from the talk:

  1. We talk too much about code and do not write enough of it.
  2. Code which delivers value to someone is the ultimate goal, everything else is either giving context to the code or noise.
  3. We need to seek real feedback and adjust.
We talk too much about code and do not write enough of it.  Erik starts off the presentation asking his audience at a developer conference how many of them check in code last week?  He then tells those that did not that maybe they should leave.  Harsh but to the point.

There is a lot of talk about code in our industry, there are whole conferences and alliances around how best to get others to write code.  Do not take my word for it, read this blog post titled "Agile is Dead" by Dave Thomas an Agile Manifesto signer.  In his post Dave talks about how the term Agile has become meaningless, instead of following the principles leading to the Agile Manifesto a whole industry has formed to sell Agile.

"When the relationship between superiors and subordinates becomes disordered, at first the subordinates usurp the actuality (i.e. real power), but continue to preserve the name.  Once the usurpation has lasted for some time, though, the name is appropriated and usurped as well."

-- Wang Fuzhi
Commentary on Confucius Analects 13.14
Translated by Edward Slingerland

The term Agile has been usurped and has become meaningless.  At one time it did have a meaning and here is what it was:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan


-- Manifesto for Agile Software

Instead of prompting the left side of the manifesto the usurpers are selling the right side.  I believe this is is what Erik calls a disease of our industry.  Instead of talking about how to write code which delivers value and seeks feedback faster we are having people talk about tools, certifications, and overhead.

Code which delivers value to someone is the ultimate goal, everything else is either giving context to the code or noise.  This goes straight to the heart of the Agile Manifesto: "Working software over comprehensive documentation" and "Customer collaboration over contract negotiation".  If we are professionals then we need to focus on what the job is.  What is that job being asked of us?  Delivering working software with the features found through customer collaboration.  

We are not paid to move stickies across a board, we are paid to deliver working software which meets the customers needs.  This is what is meant by "Individuals and interactions over processes and tools" in the Agile Manifesto, tools and processes are to be used to reach the goal of working software which delivers value to the customer.  If you find that having stickies helps you in achieving this goal then use them.

"Our highest priority is to satisfy the customer through early and continuous delivery of valuable software."

-- Principles behind the Agile Manifesto

We need to "Responding to change over following a plan".  We need to seek real feedback and adjust.

"Business people and developers must work together daily throughout the project."

"At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly."


As Heraclitus would say, everything is in motions, everything is becoming, nothing is constant except change itself.  If we cannot change along with our customer's needs then we are dead.

Is "One Hacker Way" the greatest call to action ever?  No.  Do I agree with everything in it?  No.  Do I agree with most of it?  No.  Did I find the three main points valid?  Yes.


Sunday, November 9, 2014

Distinctly Preserving Order in Clojure

"Tell me, you heavens, in which part of his body
Shall I destroy him? – whether there, or there, or there? –
That I may give the local wound a name,
And make distinct the very breach whereout
Hector's great spirit flew: answer me, heavens!
"
-- Shakespeare, Troilus and Cressida
Act IV, Scene V, Lines 242-246

4clojure problem number 56 states:

Write a function which removes the duplicates from a sequence. Order of the items must be maintained.

The trick is you cannot use distinct.

We are giving the following test cases:

(= (__ [1 2 1 3 1 2 4]) [1 2 3 4])

(= (__ [:a :a :b :b :c :c]) [:a :b :c])

(= (__ '([2 4] [1 2] [1 3] [1 3])) '([2 4] [1 2] [1 3]))

(= (__ (range 50)) (range 50))

Let's try using group-by and get the first value of each group.

(fn [xs] (->> (group-by identity xs) (map first)))

This works fine for all but the last test case.  It seems that group-by does not preserve order.

(type (group-by identity (range 50)))
;; clojure.lang.PersistentHashMap

The PersistentHashMap in Clojure is implemented as a wide-tree with each node having up to 32 children (for more information see this excellent post by Karl Krukow or this excellent StackOverflow answer from Daniel Janus).

At this point we need to figure out a way to reorder the output from the group-by and first to what was given in the original input.  We can use sort-by but we need to use something that will give us the original input, this is were .indexOf saves the day.

We can use Java's .indexOf wrapped in a function to give sort-by an order which corresponds with the original input.  Using .indexOf will allow us to resort the output of the group-by followed by the first mapping to order that was in the original input, giving us this final function which preserve order and gives distinct values.

  (fn [xs] 
      (->> 
           (group-by identity xs) 
           (map first) 
           (sort-by #(.indexOf xs %))))

This passes all of the 4clojure's test cases.