Saturday, January 28, 2012

Roman Numerals Kata in Ruby, Take II

Last week I posted a screencast of my take on the Roman Numerals kata in Ruby. It was Corey Haines who encouraged me to do that, and he was gracious enough to take the time to watch it and offer some suggestions. We both thought it might be interesting if I did another iteration of it and discussed the changes. His two main concerns were that (1) it would have been nice to see the pattern extracted sooner and (2) there must be a way to introduce the zero case without blowing up all the tests.

Here is the first video for reference:

It would have been nice to see the pattern extracted sooner. In the first version, I let quite a few similar cases accumulate before making the inconsequential adjustments necessary to cultivate the uniformity that opened the door to bringing them together in the loop body (about 6 minutes in). The kata process of solving the same problem over and over has led me to a question I have not faced before: how much of the knowledge acquired in previous runs is it fair to apply in the current one? When do we rehearse the discovery process, and when do we simply benefit from the discovery? I've got a feeling I could spend the rest of my life pondering this question.

 There must be a way to introduce the zero case without without blowing up all the tests. This happens around 6:42. I told Corey that this one was a matter of honesty. I knew that introducing the recursion without the zero case would blow everything up because I had seen it happen before. But I thought coding for it without first seeing the test failure felt like cheating, like skipping the first step in the red-green-refactor dance. After giving it more thought, I see where I was off. Only new tests should be red. The red that I allowed to happen was not a new test but a bad refactoring. Yes, test coverage is our safety net while we refactor, but still we should never intentionally fall off the trapeze.

It was short-sighted to introduce the recursion without the trivial exit case in the first place. We all know that such a case is a necessary part of recursion, and it is not cheating to introduce it in a refactoring without waiting for everything to blow up first. I introduced the recursion much earlier this time (1:45). Granted, I knew unfairly that was the direction I would go eventually, but still I coded only what was necessary to get to green. This change in approach addressed both problems; it allowed the pattern to be extracted earlier, and I never broke all my tests. As an added bonus, this also enabled me to shave three minutes off the time.

Here is the new version is set to the fourth movement of Mendelssohn's Piano Trio No. 2 in C minor:

Saturday, January 21, 2012

Geek Ballet: Roman Numerals Kata in Ruby

For a while now I've been a fan of code kata screencasts set to music. The musical accompaniment accentuates how beautiful code can be, and frames it to be appreciated as an artistic endeavor, even though it is still a logical exercise. I think of it as geek ballet. Corey Haines first introduced me to this art form, and he recently encouraged me to try one of my own, so here it is. It was more work than I expected, but it was good fun, and I learned a lot. In the same way that writing about any subject forces you to understand it better, demonstrating how to practice forces you to practice more.

In this video I'm using Ruby with RSpec, AutoTest, and Growl notifications. The music is the fourth movement of my favorite symphony, Dvorak's 9th in E minor.


Sunday, January 8, 2012

Set-Based Operations: They’re Not Just For Databases

What do you think of when you hear the term set-based operations? I have always thought of that as a database concept. Set-based operations address or operate on multiple data elements, seemingly in parallel, as opposed to iterating through and executing operations one by one. I was introduced to this concept in the context of SQL in relational databases, and it was a struggle at first. My brain was more naturally predisposed to think in terms of iterating over a list, doing one thing at a time.

I remember first running into this mismatch of approaches writing my first SQL Server triggers. (Let's pretend for the moment that this is not a holy war topic.) I intuitively wanted to do a particular thing with each row that was inserted into a table. I wanted to deal with the inserted column values as scalar values.

update inserted_row set full_name = @first_name + ' ' + @last_name

My problem was that since an INSERT can affect multiple rows, and a trigger fires once per INSERT, an INSERT trigger has to handle all the inserted rows at once. So of course the mechanism provided is the INSERTED virtual table rather than a group of variables like I wanted.

update people
set full_name = people.first_name + ' ' + people.last_name
from inserted join people on =

I had to learn to think in terms of sets instead of items. In my group of very young and inexperienced developers, there were several that really struggled with making this leap. This was in the good ole days before SQL Server supported cursors. (Sorry, I'm really not to trying to incite a riot here.) So for quite a few years there, I thought of executing operations on multiple data elements as being done in a set-based manner in SQL,

select sum(salary) from employees

and iteratively in the other languages I used, such as VB, JavaScript, C#, etc.

var total = 0.0;
foreach (var employee in employees)
    total += employee.Salary;

Then along came lambda expressions. This is of course faulty chronology, since lambda expressions were introduced well before I was born. But it was years later that dear Ruby introduced me to the beauty of lambdas. They afford us the ability to specify what needs to be done with each element without bothering ourselves with the particulars of crawling through lists.{|employee| employee.salary}.inject(:+)

And of course they found their way into .NET as well.

employees.Sum(employee => employee.Salary)

This syntax is more concise, more declarative, more about intention than implementation. This is all good, but is it anything more than syntactic sugar? I'm a syntactic sweet-tooth myself, but there is more value here than that. We have separated the concern of getting the data we want, from that of dealing with a particular looping construct or iteration implementation. The same lambda can be used regardless of whether we want to iterate through the list serially, spawn threads to execute in parallel, wrap the operations in a transaction, or whatever.

I have recently been programming some matrix math, which has got me thinking about set-based or matrix-based operations as opposed to the more conventional iterative approach. Your geek core might be harder than mine, but I had not used matrices and vectors since high school. In case you don't want to go do any additional reading on matrices, here are a few insultingly simple examples that should be fairly intuitive to read:

1. Multiply a matrix and a scalar:

| 3  1  5 |                   | 6  2 10 |
| 0  2  2 |    *    2    =    | 0  4  4 |
| 6  3  4 |                   | 12 6  8 |

2. Multiply a column vector and a row vector:

| 2 |                             | 2  4  6 |
| 0 |    *    | 1  2  3 |    =    | 0  0  0 |
| 4 |                             | 4  8 12 |

3. Multiply a row vector and a column vector (note matrix multiplication is not commutative):

                    | 2 |
| 1  2  3 |    *    | 0 |    =    | 14 |
                    | 4 |

I was pleasantly surprised to find matrix support libraries for pretty much every language I thought to check for them. Just for fun, let's look at two ways to solve each of these simple examples, each with a different language.

Example 1 in Ruby

Iterative approach:
matrix = [[3, 1, 5], [0, 2, 2], [6, 3, 4]]
for i in 0..2
  for j in 0..2
    matrix[i][j] *= 2
# matrix => [[6, 2, 10], [0, 4, 4], [12, 6, 8]]
Matrix approach using the core Matrix library:
require 'matrix'
matrix = Matrix.rows([[3, 1, 5], [0, 2, 2], [6, 3, 4]])
p matrix * 2 #=> Matrix[[6, 2, 10], [0, 4, 4], [12, 6, 8]]

Example 2 in C#

Iterative approach:
int[] column = {2, 0, 4};
int[] row = {1, 2, 3};
var product = new int [3, 3];
for (var i = 0; i < 3; i++)
    for (var j = 0; j < 3; j++)
        product[i, j] = column[i] * row[j];
// product => {{2, 4, 6}, {0, 0, 0}, {4, 8, 12}}
Matrix approach using the Math.NET Numerics library:
var column = new DenseMatrix(3, 1);
column.SetColumn(0, new double[] {2, 0, 4});

var row = new DenseMatrix(1, 3);
row.SetRow(0, new double[] {1, 2, 3});

var product = column * row;
// => 2,4,6
//    0,0,0
//    4,8,12

Example 3 in Python

Iterative approach:
row = [1, 2, 3]
column = [2, 0, 4]
product = 0

for i in range(3):
    product += column[i] * row[i]

# product => 14
Matrix approach using the NumPy library:
from numpy import *

row = matrix('1 2 3')
column = matrix('2; 0; 4')
product = row * column #=> [[14]]

You can see how this approach eliminates a layer or two of loops from your code as well as the accumulation code of adding up sums or whatever. There may not be a huge benefit in these simple cases, but when addressing problems of more realistic complexity this can work wonders for both readability and performance. I have used these matrix constructs mostly in the context of algebraic summations over matrix elements, which are used heavily in various machine learning algorithms such as linear regression and neural networks.

The next time you start to write some looping code, take a second to think about whether there might be another way to go. Chances are that you can accomplish the same thing with less code that will read better and often run faster.