Saturday, February 25, 2012

Anatomy of a One-Liner: FizzBuzz in Haskell

In the NashFP group, a gang of functionally inclined Nashville area geeks recently organized by Bryan Hunter, we decided to share implementations of the classic FizzBuzz problem in various functional languages. I took my first shot in Scheme and then decided it was time to dust off my Haskell. On my first attempt I waded through the syntax enough to get something working, but it included several named functions and was clearly more code than was necessary to solve the problem. As I began to look for ways to shorten it, I remembered that a year or two ago Bryan had written a FizzBuzz implementation in Erlang that was short enough to tweet. Obviously this became my new goal.

I managed to whittle it down below 140 characters, but its readability suffered quite a bit. I thought it might be fun to dissect it here bit by bit, and it might even come out shorter and more readable. I keep learning new tricks in Haskell and in the general functional approach that offer new opportunities to make my code shorter and more streamlined. You might learn some of these here, or if you have any pointers to offer me in comments, I would love to learn some from you as well.

In case you're not familiar with FizzBuzz, it is a programming exercise in which the goal is to list the integers from 1 to 100, replacing those divisible by 3 with the term "Fizz," those divisible by 5 with the term "Buzz," and those divisible by both 3 and 5 with the term "FizzBuzz."

For this exercise I will be using my favorite IDE, the REPL, in this case GHCI, the Glasgow Haskell Compiler Interactive.

Let's start with the input range. We ultimately want to go from 1 to 100, but let's start with 1 to 15 just for brevity of output while we evolve this thing.
Prelude> [1..15]
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
To apply a function to each member of this list, we could use the map function or a list comprehension, which should be shorter, so let's try that. We'll start with a simple identity comprehension, which will do nothing for us. This step is just about making ourselves a place to do some work.
Prelude> [x | x <- [1..15]]
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
Let's try the Fizz substitution for multiples of 3. Note that the else case needs the show function to convert the integer to a string, since Haskell lists must be homogeneous.
Prelude> [if mod x 3 == 0 then "Fizz" else show x | x <- [1..15]]
["1","2","Fizz","4","5","Fizz","7","8","Fizz","10","11","Fizz","13","14","Fizz"]
Now for the Buzz case for multiples of 5:
Prelude> [if mod x 3 == 0 then "Fizz" else if mod x 5 == 0 then "Buzz" else show x | x <- [1..15]]
["1","2","Fizz","4","Buzz","Fizz","7","8","Fizz","Buzz","11","Fizz","13","14","Fizz"]
This works up through 14, but I hate to add another case, and it's already getting too long due to repetitive code. Under normal circumstances I might look to extract a method here, but we're working on a one-liner, so let's try another way. The factor mappings must be stated, and the most concise way I know is as a list of tuples.
Prelude> [(3,"Fizz"),(5,"Buzz")]
[(3,"Fizz"),(5,"Buzz")]
Now we need a way to iterate through this list, run the divisibility test, and make the appropriate substitutions. First off we need to give the mapping list a chance to dance with each element in the list of integers. We can do this by tucking it into a tuple with the integer inside the comprehension.
Prelude> [(x,[(3,"Fizz"),(5,"Buzz")]) | x <- [1..15]]
[(1,[(3,"Fizz"),(5,"Buzz")]),(2,[(3,"Fizz"),(5,"Buzz")]),(3,[(3,"Fizz"),(5,"Buzz")]),(4,[(3,"Fizz"),(5,"Buzz")]),(5,[(3,"Fizz"),(5,"Buzz")]),(6,[(3,"Fizz"),(5,"Buzz")]),(7,[(3,"Fizz"),(5,"Buzz")]),(8,[(3,"Fizz"),(5,"Buzz")]),(9,[(3,"Fizz"),(5,"Buzz")]),(10,[(3,"Fizz"),(5,"Buzz")]),(11,[(3,"Fizz"),(5,"Buzz")]),(12,[(3,"Fizz"),(5,"Buzz")]),(13,[(3,"Fizz"),(5,"Buzz")]),(14,[(3,"Fizz"),(5,"Buzz")]),(15,[(3,"Fizz"),(5,"Buzz")])]
The resulting list can now be used as the input for another list comprehension, where we can take another shot at applying our divisibility tests. Once again we will start with an identity comprehension just to get our terms in place before we put them to work. Since each element of the list is a 2-tuple, we can go ahead and give each member a variable, f for factor and n for name.
Prelude> [(x,[(f,n) | (f,n) <- [(3,"Fizz"),(5,"Buzz")]]) | x <- [1..15]]
[(1,[(3,"Fizz"),(5,"Buzz")]),(2,[(3,"Fizz"),(5,"Buzz")]),(3,[(3,"Fizz"),(5,"Buzz")]),(4,[(3,"Fizz"),(5,"Buzz")]),(5,[(3,"Fizz"),(5,"Buzz")]),(6,[(3,"Fizz"),(5,"Buzz")]),(7,[(3,"Fizz"),(5,"Buzz")]),(8,[(3,"Fizz"),(5,"Buzz")]),(9,[(3,"Fizz"),(5,"Buzz")]),(10,[(3,"Fizz"),(5,"Buzz")]),(11,[(3,"Fizz"),(5,"Buzz")]),(12,[(3,"Fizz"),(5,"Buzz")]),(13,[(3,"Fizz"),(5,"Buzz")]),(14,[(3,"Fizz"),(5,"Buzz")]),(15,[(3,"Fizz"),(5,"Buzz")])]
Now we can add a filter to the new comprehension to get only the pairs whose factor evenly divides the integer.
Prelude> [(x,[(f,n) | (f,n) <- [(3,"Fizz"),(5,"Buzz")],mod x f == 0]) | x <- [1..15]]
[(1,[]),(2,[]),(3,[(3,"Fizz")]),(4,[]),(5,[(5,"Buzz")]),(6,[(3,"Fizz")]),(7,[]),(8,[]),(9,[(3,"Fizz")]),(10,[(5,"Buzz")]),(11,[]),(12,[(3,"Fizz")]),(13,[]),(14,[]),(15,[(3,"Fizz"),(5,"Buzz")])]
That's closer, but that comprehension is still returning the factor-name pair, and we're only interested in the name, so let's make that adjustment.
Prelude> [(x,[n | (f,n) <- [(3,"Fizz"),(5,"Buzz")],mod x f == 0]) | x <- [1..15]]
[(1,[]),(2,[]),(3,["Fizz"]),(4,[]),(5,["Buzz"]),(6,["Fizz"]),(7,[]),(8,[]),(9,["Fizz"]),(10,["Buzz"]),(11,[]),(12,["Fizz"]),(13,[]),(14,[]),(15,["Fizz","Buzz"])]
This is starting to bear some resemblance to the result we're after, but notice that our Fizzes and Buzzes are still coming out in lists. We might grab the head of the list, but the FizzBuzzes have two elements, and we don't want to drop one. We need to concatenate them, which we can do with the concat function.
Prelude> [(x,concat [n | (f,n) <- [(3,"Fizz"),(5,"Buzz")],mod x f == 0]) | x <- [1..15]]
[(1,""),(2,""),(3,"Fizz"),(4,""),(5,"Buzz"),(6,"Fizz"),(7,""),(8,""),(9,"Fizz"),(10,"Buzz"),(11,""),(12,"Fizz"),(13,""),(14,""),(15,"FizzBuzz")]
Now we have a list of tuples, each containing one value that we want and another that we don't. First we'll add a show to the x to make them the same type so we can compare them.
Prelude> [(show x,concat [n | (f,n) <- [(3,"Fizz"),(5,"Buzz")],mod x f == 0]) | x <- [1..15]]
[("1",""),("2",""),("3","Fizz"),("4",""),("5","Buzz"),("6","Fizz"),("7",""),("8",""),("9","Fizz"),("10","Buzz"),("11",""),("12","Fizz"),("13",""),("14",""),("15","FizzBuzz")]
Now that the elements in the tuples are the same type, we can use the max function to make the choice between them. This works because strings compare lexicographically: the empty string sorts before everything, and every digit character sorts before "B" and "F", so max always returns the substitution when there is one and the number when there isn't.
Prelude> [max (show x) (concat [n | (f,n) <- [(3,"Fizz"),(5,"Buzz")],mod x f == 0]) | x <- [1..15]]
["1","2","Fizz","4","Buzz","Fizz","7","8","Fizz","Buzz","11","Fizz","13","14","FizzBuzz"]
Now all that's left to do is to remove some readability spaces to squeeze it down and blow up our input range to give us 100 inputs instead of 15.
Prelude> [max(show x)(concat[n|(f,n)<-[(3,"Fizz"),(5,"Buzz")],mod x f==0])|x<-[1..100]]
["1","2","Fizz","4","Buzz","Fizz","7","8","Fizz","Buzz","11","Fizz","13","14","FizzBuzz","16","17","Fizz","19","Buzz","Fizz","22","23","Fizz","Buzz","26","Fizz","28","29","FizzBuzz","31","32","Fizz","34","Buzz","Fizz","37","38","Fizz","Buzz","41","Fizz","43","44","FizzBuzz","46","47","Fizz","49","Buzz","Fizz","52","53","Fizz","Buzz","56","Fizz","58","59","FizzBuzz","61","62","Fizz","64","Buzz","Fizz","67","68","Fizz","Buzz","71","Fizz","73","74","FizzBuzz","76","77","Fizz","79","Buzz","Fizz","82","83","Fizz","Buzz","86","Fizz","88","89","FizzBuzz","91","92","Fizz","94","Buzz","Fizz","97","98","Fizz","Buzz"]
78 characters. Not too bad. Please let me know if you see where I could shave off some more.
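The same trick translates almost directly to other languages. Here is a sketch in Python (my own translation, not part of the original exercise); it works because Python strings also compare lexicographically, so the empty string and digit strings sort before "Buzz" and "Fizz":

```python
# One comprehension over the inputs, a nested one over the
# (factor, name) pairs, and max to choose between the number
# and the concatenated substitutions.
fizzbuzz = [max(str(x), "".join(n for f, n in [(3, "Fizz"), (5, "Buzz")] if x % f == 0))
            for x in range(1, 101)]
print(fizzbuzz[:15])
```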

Saturday, January 28, 2012

Roman Numerals Kata in Ruby, Take II

Last week I posted a screencast of my take on the Roman Numerals kata in Ruby. It was Corey Haines who encouraged me to do that, and he was gracious enough to take the time to watch it and offer some suggestions. We both thought it might be interesting if I did another iteration of it and discussed the changes. His two main concerns were that (1) it would have been nice to see the pattern extracted sooner and (2) there must be a way to introduce the zero case without blowing up all the tests.

Here is the first video for reference:



It would have been nice to see the pattern extracted sooner. In the first version, I let quite a few similar cases accumulate before making the inconsequential adjustments necessary to cultivate the uniformity that opened the door to bringing them together in the loop body (about 6 minutes in). The kata process of solving the same problem over and over has led me to a question I have not faced before: how much of the knowledge acquired in previous runs is it fair to apply in the current one? When do we rehearse the discovery process, and when do we simply benefit from the discovery? I've got a feeling I could spend the rest of my life pondering this question.

There must be a way to introduce the zero case without blowing up all the tests. This happens around 6:42. I told Corey that this one was a matter of honesty. I knew that introducing the recursion without the zero case would blow everything up because I had seen it happen before. But I thought coding for it without first seeing the test failure felt like cheating, like skipping the first step in the red-green-refactor dance. After giving it more thought, I see where I was off. Only new tests should be red. The red that I allowed to happen was not a new test but a bad refactoring. Yes, test coverage is our safety net while we refactor, but still we should never intentionally fall off the trapeze.

It was short-sighted to introduce the recursion without the trivial exit case in the first place. We all know that such a case is a necessary part of recursion, and it is not cheating to introduce it in a refactoring without waiting for everything to blow up first. I introduced the recursion much earlier this time (1:45). Granted, I knew unfairly that was the direction I would go eventually, but still I coded only what was necessary to get to green. This change in approach addressed both problems; it allowed the pattern to be extracted earlier, and I never broke all my tests. As an added bonus, this also enabled me to shave three minutes off the time.
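For anyone who hasn't worked the kata, the shape it converges on looks roughly like this. This is a Python sketch with names of my own choosing; the Ruby version in the screencast differs in its details:

```python
# Recursion over a table of value/numeral pairs. The zero case is
# the trivial exit case discussed above; without it the recursion
# never terminates.
PAIRS = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

def to_roman(number):
    if number == 0:
        return ""
    for value, numeral in PAIRS:
        if number >= value:
            return numeral + to_roman(number - value)
```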

Here is the new version, set to the fourth movement of Mendelssohn's Piano Trio No. 2 in C minor:

Saturday, January 21, 2012

Geek Ballet: Roman Numerals Kata in Ruby

For a while now I've been a fan of code kata screencasts set to music. The musical accompaniment accentuates how beautiful code can be, and frames it to be appreciated as an artistic endeavor, even though it is still a logical exercise. I think of it as geek ballet. Corey Haines first introduced me to this art form, and he recently encouraged me to try one of my own, so here it is. It was more work than I expected, but it was good fun, and I learned a lot. In the same way that writing about any subject forces you to understand it better, demonstrating how to practice forces you to practice more.

In this video I'm using Ruby with RSpec, AutoTest, and Growl notifications. The music is the fourth movement of my favorite symphony, Dvorak's 9th in E minor.

Enjoy.

Sunday, January 8, 2012

Set-Based Operations: They’re Not Just For Databases

What do you think of when you hear the term set-based operations? I have always thought of that as a database concept. Set-based operations address or operate on multiple data elements, seemingly in parallel, as opposed to iterating through and executing operations one by one. I was introduced to this concept in the context of SQL in relational databases, and it was a struggle at first. My brain was more naturally predisposed to think in terms of iterating over a list, doing one thing at a time.

I remember first running into this mismatch of approaches writing my first SQL Server triggers. (Let's pretend for the moment that this is not a holy war topic.) I intuitively wanted to do a particular thing with each row that was inserted into a table. I wanted to deal with the inserted column values as scalar values.

update inserted_row set full_name = @first_name + ' ' + @last_name

My problem was that since an INSERT can affect multiple rows, and a trigger fires once per INSERT, an INSERT trigger has to handle all the inserted rows at once. So of course the mechanism provided is the INSERTED virtual table rather than a group of variables like I wanted.

update people
set full_name = people.first_name + ' ' + people.last_name
from inserted join people on inserted.id = people.id

I had to learn to think in terms of sets instead of items. In my group of very young and inexperienced developers, there were several who really struggled with making this leap. This was in the good ole days before SQL Server supported cursors. (Sorry, I'm really not trying to incite a riot here.) So for quite a few years there, I thought of executing operations on multiple data elements as being done in a set-based manner in SQL,

select sum(salary) from employees

and iteratively in the other languages I used, such as VB, JavaScript, C#, etc.

var total = 0.0;
foreach (var employee in employees)
    total += employee.Salary;

Then along came lambda expressions. This is of course faulty chronology, since lambda expressions were introduced well before I was born. But it was years later that dear Ruby introduced me to the beauty of lambdas. They afford us the ability to specify what needs to be done with each element without bothering ourselves with the particulars of crawling through lists.

employees.map{|employee| employee.salary}.inject(:+)

And of course they found their way into .NET as well.

employees.Sum(employee => employee.Salary)

This syntax is more concise, more declarative, more about intention than implementation. This is all good, but is it anything more than syntactic sugar? I'm a syntactic sweet-tooth myself, but there is more value here than that. We have separated the concern of getting the data we want, from that of dealing with a particular looping construct or iteration implementation. The same lambda can be used regardless of whether we want to iterate through the list serially, spawn threads to execute in parallel, wrap the operations in a transaction, or whatever.
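As a quick sketch of that separation (Python here, with made-up employee data), the same function can be handed to a serial map or to a thread pool's map without changing a character:

```python
from concurrent.futures import ThreadPoolExecutor

employees = [{"name": "Ada", "salary": 50000.0},
             {"name": "Grace", "salary": 60000.0}]
get_salary = lambda e: e["salary"]  # hypothetical accessor

# Serial iteration
serial_total = sum(map(get_salary, employees))

# Parallel iteration with threads; the lambda is untouched
with ThreadPoolExecutor() as pool:
    parallel_total = sum(pool.map(get_salary, employees))
```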

I have recently been programming some matrix math, which has got me thinking about set-based or matrix-based operations as opposed to the more conventional iterative approach. Your geek core might be harder than mine, but I had not used matrices and vectors since high school. In case you don't want to go do any additional reading on matrices, here are a few insultingly simple examples that should be fairly intuitive to read:

1. Multiply a matrix and a scalar:

| 3  1  5 |                   | 6  2 10 |
| 0  2  2 |    *    2    =    | 0  4  4 |
| 6  3  4 |                   | 12 6  8 |

2. Multiply a column vector and a row vector:

| 2 |                             | 2  4  6 |
| 0 |    *    | 1  2  3 |    =    | 0  0  0 |
| 4 |                             | 4  8 12 |

3. Multiply a row vector and a column vector (note matrix multiplication is not commutative):

                    | 2 |
| 1  2  3 |    *    | 0 |    =    | 14 |
                    | 4 |

I was pleasantly surprised to find matrix support libraries for pretty much every language I thought to check for them. Just for fun, let's look at two ways to solve each of these simple examples, each with a different language.

Example 1 in Ruby


Iterative approach:
matrix = [[3, 1, 5], [0, 2, 2], [6, 3, 4]]
for i in 0..2
  for j in 0..2
    matrix[i][j] *= 2
  end
end
# matrix => [[6, 2, 10], [0, 4, 4], [12, 6, 8]]
Matrix approach using the core Matrix library:
require 'matrix'
matrix = Matrix.rows([[3, 1, 5], [0, 2, 2], [6, 3, 4]])
p matrix * 2 #=> Matrix[[6, 2, 10], [0, 4, 4], [12, 6, 8]]

Example 2 in C#


Iterative approach:
int[] column = {2, 0, 4};
int[] row = {1, 2, 3};
var product = new int [3, 3];
for (var i = 0; i < 3; i++)
    for (var j = 0; j < 3; j++)
        product[i, j] = column[i] * row[j];
// product => {{2, 4, 6}, {0, 0, 0}, {4, 8, 12}}
Matrix approach using the Math.NET Numerics library:
var column = new DenseMatrix(3, 1);
column.SetColumn(0, new double[] {2, 0, 4});

var row = new DenseMatrix(1, 3);
row.SetRow(0, new double[] {1, 2, 3});

var product = column * row;
// => 2,4,6
//    0,0,0
//    4,8,12

Example 3 in Python


Iterative approach:
row = [1, 2, 3]
column = [2, 0, 4]
product = 0

for i in range(3):
    product += column[i] * row[i]

# product => 14
Matrix approach using the NumPy library:
import numpy as np

row = np.matrix('1 2 3')
column = np.matrix('2; 0; 4')
product = row * column #=> [[14]]

You can see how this approach eliminates a layer or two of loops from your code as well as the accumulation code of adding up sums or whatever. There may not be a huge benefit in these simple cases, but when addressing problems of more realistic complexity this can work wonders for both readability and performance. I have used these matrix constructs mostly in the context of algebraic summations over matrix elements, which are used heavily in various machine learning algorithms such as linear regression and neural networks.
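To give one example of those summations (toy numbers of my own), here is the linear-regression cost function computed both ways in Python with NumPy; the loop and the running total collapse into two vector operations:

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # design matrix, bias column first
y = np.array([2.0, 3.0, 4.0])
theta = np.array([0.5, 1.0])                        # intercept and slope

# Iterative: one row at a time, accumulating squared residuals
total = 0.0
for i in range(len(y)):
    total += (X[i].dot(theta) - y[i]) ** 2
cost_loop = total / (2 * len(y))

# Set-based: the loop and the sum disappear into vector operations
residuals = X.dot(theta) - y
cost_vec = residuals.dot(residuals) / (2 * len(y))
```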

The next time you start to write some looping code, take a second to think about whether there might be another way to go. Chances are that you can accomplish the same thing with less code that will read better and often run faster.

Saturday, December 31, 2011

Problems Already Solved

Somewhere I recently heard someone say something like, "Programming is easy; all the interesting problems have already been solved by mathematics." I was a bit taken aback by this at first, but it got me thinking about what problems I am solving when I write code. When I'm programming, am I more often addressing a real-world problem, or some kind of software problem? How much of my code is spent on converting, transforming, serializing, deserializing, logging, formatting, validating, and error handling? Usually too much to be very interesting. It's all plumbing, but what I really want is water.

If I look at the systems I have worked on professionally, an alarming number of them do exactly the same thing, regardless of the business or the problem domain. It's just setting up a place to store data, a way to present it to users, and a way to translate it back and forth. There is still the question of where and how to build in domain logic and intelligence, and that's where most of the interesting debates take place, but this is still a woefully small slice of the pie. Maybe this is why I love writing command line programs. Come to think of it, maybe this is also why I love writing unit tests. There's no user interface, no data persistence, just logic. Okay, you're right, there is still setup and teardown plumbing, but it's not too hard to keep that stuff separate and out of the way. I also love working through the Project Euler problems. The programs feel pure. Just find the answer, and that's it.

I recently completed an online course in machine learning from Stanford Engineering. I loved it. It covered common applications of machine learning such as reading human handwriting, recognizing objects in images, and recommending movies and music. These applications are driving major features of Gmail, Facebook, Amazon, Netflix, and Pandora, to name a few. I was surprised to find that most of the course focused on math. The programming part was not much more than an afterthought. We did programming exercises in a language called Octave, but the work could have been done in any language. These problems are solved by math, not by programming. The programming just does the math. It's a lot like the role your calculator plays in your calculus class. You solve the problems; the calculator just does the math.

I was intrigued by the scene in The Social Network where Zuckerberg has the idea and hacking skills to slam Facemash together, but he has to ask his buddy Eduardo Saverin, not a code slinger, for the "key ingredient": a ranking algorithm. The problem has already been solved by mathematics.

There are countless opportunities in the world today to make a dent in the universe by applying some existing mathematical solution to a modern problem. I'm looking for one.

Sunday, December 4, 2011

Compassion-Driven Development


I have been developing software for a pretty long time now. I have worked on a lot of systems in a lot of different environments. In several of these places my orientation process included a visit to the place where the end users were, a call center or client site or whatever, to see what they did and how they used the system. Oddly enough, this usually happened once when I first started, but pretty much never again.

I recently wrapped up a project that afforded me an opportunity I had never had before as a software developer. I sat with my end users every day. I lived with them. This may sound like a bad idea, and in many cases it might be, but in this particular context I loved it. I have never had such a tight feedback loop. I could push out a new feature and within five minutes hear someone down the row yell, "Hey, look at this!"

Not only did it give me the chance to experience their reactions to my work, but more importantly it gave me the chance to see what they needed. While we geeks love to complain about users asking too much, sometimes they ask too little. Being in the midst of these folks gave me the chance to see them doing what they did every day. They were working with a legacy system that left a lot to be desired in the way of user experience, and they had come to accept far too many inconveniences as facts of life. They were doing so many manual tasks that were easy to incorporate into the system. They had built processes around design flaws and browser limitations.

They may have had ideas, complaints, and requests, as users always do, but they often overlooked the most obvious things. There is pain in every job, and sometimes users can't quite imagine how software improvements could relieve it. Or maybe they've lived with the pain for so long that it no longer occurs to them that they're wasting time and energy. Being with these folks every day as they worked gave me the chance to say to them, "You should not have to do this."

I call this Compassion-Driven Development.

Sunday, September 4, 2011

C# REPL

I started a new job recently that has taken me from a primarily interpreted, dynamically typed existence to one that is primarily compiled and statically typed. More specifically, I have gone from Ruby back to C#. This transition is only around my day job; I still love dynamic languages. But since I spend a good bit of time at work, this has affected a significant chunk of my programming life.

So what is my primary pain point coming back to the .NET world after a couple of years of peace, love, and Volkswagen minibuses? No surprises here; it's Visual Studio. In my day-to-day work I do everything I possibly can in VIM. So light, so immediate. Please don't make me open Visual Studio. It is so slow in everything that it does, even just opening and closing. You know, I'm not always creating a multi-tiered enterprise solution with three UI alternatives and seven WCF services. I very often want to write just one or two lines of code to clarify my understanding of some language behavior or some detail of the framework. It is at these times that I most resent waiting two minutes for the IDE to open. And please don't make me create a new solution just to run one line of code. Can't I just get straight to the language for a second?

One thing that Ruby got me addicted to was the instant feedback afforded by irb. Irb stands for "interactive Ruby." It is nothing unique; it's just Ruby's REPL. If you're not familiar with this term, a REPL is a read-evaluate-print loop. It's a command line shell in which you can type a line of code and have it execute immediately and print the result. For you Visual Studio folks, it's kind of like the Immediate Window, but it doesn't require a paused executing application.

Of course the REPL was not born with Ruby. I think it originated with LISP (before I was born), but you get one with Python, Scala, Haskell, Perl, Prolog—lots of languages. Modern web browsers with good debugging tools provide a REPL for JavaScript, but I prefer the one you get with Node. You might notice that these are all non-Microsoft languages, but my first experience with a REPL was in BASIC, and it was there in the version that came with MS-DOS back in the early 80s.

What is so special about a REPL? If programming is the way we communicate to a computer what we want it to do, must there be so much ceremony around saying anything at all? Writing an application is like writing a book. A small program might be more like a personal letter. But what if you just want to say something, or ask a single question? What's the largest integer value? Can you implicitly cast a float to a Boolean? What happens if I try to add a decimal and a null? What date is ninety days from today? Can you square a list of integers in one line of code? This is where the REPL shines. It's more like a conversation.
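Most of those questions really are one-liners once you have a prompt. In Python's REPL, for instance (the C# answers differ, but the spirit is the same):

```python
import datetime
import sys

print(sys.maxsize)                                          # the largest native index value
print(datetime.date.today() + datetime.timedelta(days=90))  # what date is ninety days out
print([x * x for x in [1, 2, 3, 4, 5]])                     # square a list in one line
```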

If you're like me, spoiled by a REPL in another environment and now spending your days in .NET, you've got to be mourning the loss of that instant feedback. If this is a luxury you have never had, I'm telling you now what you've been missing. The good news is that there is a C# REPL in Mono. I don't know why I have not heard more talk about it, but it is a big deal to me. It is totally worth installing Mono for, even if you use it for nothing else.

Just head over to mono-project.com and install Mono. You don't even need MonoDevelop (although it is worth checking out). Add Mono's bin folder to your path, and you've suddenly got this at your disposal:


C# in a REPL. No Visual Studio. No project. And if that's not cool enough for you, it's Mono, so it works on OS X and Linux, too. Use wisely. Enjoy.