Tuesday, February 8, 2011

Functional Thinking Part 1: Looping Alternatives

I intend this to be the first in a multi-part series explaining some of the advantages of functional programming.  For all these parts, I'm generally going to be using Scala, because of its similarity to Java (which should be fairly well-known).  That said, I'm not going to try get too Scala-specific.  These techniques should transfer over to other functional languages easily, and may even transfer over to non-functional languages. 

This part is going to be on various alternatives to the general imperative looping constructs (for( ;; ){}, while(){}, and do{} while();).

In imperative languages, the aforementioned basic looping constructs are your bread and butter.  They do just about anything and everything related to looping.  There are literally an infinite number of problems that require looping of the sort, yet they can all be broken down into one of several simple constructs.  Cool.  In a Zen sort of way.

However, this implies that there is a lot of overlap between the different problems.  Inevitably, this leads to repetitive code, which leads to bloat, lost time, and boredom.  We can do better.

"If you don't want looping constructs, then you must mean you want recursion.  That doesn't always make things prettier, you know!"  Hmm...not quite.  Recursion is kind of the functional equivalent of the for, while, etc. loops.  Both its strength and its weakness is that it is so general.  With recursion, it might be easier to infer code correctness, but it won't help on the repetition.

What kind of repetition?  Well, lets consider loops that iterate over elements of lists/arrays.  This is a very common problem.  So common, in fact, that many languages have ways to shorten up the loops that must traverse them.  For instance, Java has the for-each loop, like so:

// assume "list" is a List of Integers
for( Integer current : list ) {}

This is much cleaner than a more general while loop that grabs elements from an Iterator, not to mention shorter.  But this isn't what I'm talking about.  I'm talking about what people then use the loop body for.  Consider the following:

// assume "list" is a List of Integers
for( Integer current : list ) {
  if ( current.intValue() > 3 ) {
    ...
  }
}

The nested "if" adds a bit of complexity.  We go over every element of the list, but we only want to do the "real" body for every element that is greater than 3.  This is known as filtering, where we filter out only those elements which have some property that is of interest (in this case, only those > 3).  This is a very common operation.

In a typical imperative language, this is often taken as-is.  Yeah, the if forces the code to become more indented, and it's a little long-winded, but there isn't a lot that can be done with it.  However, in functional languages, where there is typically a means of very concisely writing and passing functions as parameters, this is not so.  One can instead write code for a general filtering loop, which will filter out all the items that match a given predicate function.  

In Scala, this operation is known as, well, filter.  The above Java code fragment instead becomes:

list.filter( _.intValue() > 3 )

This is clearly much shorter.  This returns a new list in which each element is greater than 3.  

Generally, the type of the filter operation is "( (A -> Boolean ),  [A] ) -> [A]", where "A" is a type, and "[A]" refers to a list of A.  The general contract is that the size of the returned list is <= the size of the original list, and that the order has been preserved.

Consider another general problem.  Given a list, one performs some processing over every element of the list.  This processing results in the generation of a new list of the same size, where the nth element of the new list was generated by processing the nth element in the original list.  

Explained with a bit more Java code:
// assume "list" is a List of Integers
List< Integer > newList = new LinkedList< Integer >();
for( Integer current : list ) {
  newList.add( new Integer( current.intValue() + 1 ) );
}

"newList" results from adding one to each element of the given list "list".  Granted, this is a simple, pretty useless example, but this is also a common real-world operation.  

In functional programming land, this type of operation is known as "map", as one establishes a mapping between input and output.  In scala, the above Java code looks like:
list.map( _.intValue() + 1 )

This returns a new list in the same way as before, only in a much clearer way.  The type of the map operation is usually ((A -> B), [A]) -> [B], where one creates a list of the return type of the given function.  The general contract is that the resulting list is the exact same length as the input list, and that the nth element in the result list resulted from applying the given function to the nth element in the first list (i.e. ordering is preserved).  

There is a third type of general loop: a reduction.  This is when one processes over every element of a list, processing each element in a certain way.  Once all the looping is complete, there is a single result.  For instance, summing the elements in a list is a reduction.  Getting the maximal/minimal element in a list is also a reduction.  

Reductions sound tricky.  They are different across the board.  However, many times these can be simplified, too.  Consider the following Java code, which sums all the elements of the given list:

 // assume "list" is a List of Integers
int retval = 0;
for ( Integer current : list ) {
 retval += current.intValue();
}

This code is surprisingly concise for what it does.  For one, 0 acts as an initial value.  It doesn't change anything to add zero, and it doubles as a default value for the case where there are no elements in the list.  It sounds like this would be difficult to generalize.

However, it doesn't actually add this much more complication: it just means the addition of a parameter.  Any general operation must take an initial value as a parameter.

In functional programming, this is called a fold.  One can fold both right and left, referring to the order in which we traverse elements.  (One can go left to right, a left fold, or right to left, a right fold.  Left folds start at the front of the list, and right folds start at the back of the list.)  The following Scala code is an equivalent version of the above Java snippet:

list.foldLeft( 0 )( _ + _ )

This simply takes 0 as an initial value, and instructs it to add each element together.  The item to the left of the "+" refers to the result as we are building it, and the item on the right is the current list element.  The type of the fold operation is generally ( (A,  A) -> A, A, [A] ) -> A, where one is given a function that reduces two elements into one element (of the same type), an initial value, and the list, returning a single element.  

Note that there is very similar operation called reduce, which doesn't take an initial element.  When there is only one element in the list, it returns that element.  When there are none, this is considered an error, and Scala will throw an exception under such conditions.  This is good for when there is no good default value, and one expects there to be at least some data.
This has just scratched the surface on functional looping constructs.  Scala alone has several more, and I bet that there are a variety of other, less common ones that are not implemented in Scala.  Maybe I'll go into detail about those later.

As a brief aside, to any C# fans, some of this may sound quite familiar to certain operations in LINQ.  You're absolutely right.  "filter" is equivalent to LINQ's "where", "map" is equivalent to "select", and "reduceLeft" is equivalent to "aggregate".  LINQ is actually based on functional programming ideas, which is why such operations are there.  Pretty cool, and pretty useful.

No comments:

Post a Comment