TDD Vs. math formalism: friend or foe?

It is not uncommon to oppose the empirical process of TDD, together with its heavy use of unit tests, to the more mathematically based techniques, with the “formal methods” and formal verification at the other end of the spectrum. However I experienced again recently that the process of TDD can indeed help discover and draw upon math formalisms well-suited to the problem considered. We then benefit from the math formalism for an easier implementation and correctness.

It is quite frequent that maths structures, or more generally “established formalisms” as Eric Evans would say, are hidden everywhere in the business concepts we need to model in software.

Dates and how we take liberties with them for trading of financial instruments offer a good example of a business concept with an underlying math structure: traders of futures often use a notation like ‘U8’ to describe an expiry date like September 2018; ‘U’ means September, and the ‘8’ digit refers to 2018, but also to 2028, and 2038 etc. Notice that this notation only works for 10 years, and each code is recycled every decade.

The IMM trading floor in the early 70's (photo CME Group)

In the case of IMM contract codes, we only care about quarterly dates on:

  • March (H)
  • June (M)
  • September (U)
  • December (Z)

This yields only 4 possibilities for the month, combined with the 10 possible year digits, hence 40 different codes in total, over the range of 10 years.

How does that translate into source code?

As a software developer we are asked all the time to manage such IMM expiry codes:

  • Sort a given set of IMM contract codes
  • Find the next contract from the current “leading month” contract
  • Enumerate the next 11 codes from the current “leading month” contract, etc.

This is often done ad hoc with a gazillion of functions for each use-case, leading to thousands of lines of code hard to maintain because they involve parsing of the ‘U8’ format everytime we want to calculate something.

With TDD, we can now tackle this topic with more rigor, starting with tests to define what we want to achieve.

The funny thing is that in the process of doing TDD, the cyclic logic of the IMM codes struck me and strongly reminded me of the cyclic group Z/nZ. I had met this strange maths creature at school many years ago, I had a hard time with it by the way. But now on a real example it was definitely more interesting!

The source code (Java) for this post is on Github.

Draw on established formalisms

Thanks to Google it is easy to find something even with just a vague idea of how it’s named, and thanks to Wikipedia, it is easy to find out more about any established formalism like Cyclic Groups. In particular we find that:

Every finite cyclic group is isomorphic to the group { [0], [1], [2], …, [n ? 1] } of integers modulo n under addition

The Wikipedia page also mentions a concept of the product of cyclic groups in relation with their order (here the number of elements). Looks like this is the math-ish way to say that 4 possibilities for quarterly months combined with 10 possible year digits give 40 different codes in total.

So what? Sounds like we could identify the set of the 4 months to a cyclic group, the set of the 10 year digits to another, and that even the combination (product) of both also looks like a cyclic group of order 10 * 4 = 40 (even though the addition operation will not be called like that). So what?

Because we’ve just seen that there is an isomorphism between any finite cyclic group and the cyclic group of integer of the same order, we can just switch to the integer cyclic group logic (plain integers and the modulo operator) to simplify the implementation big time.

Basically the idea is to convert from the IMM code “Z3” to the corresponding ‘ordinal’ integer in the range 0..39, then do every operation on this ‘ordinal’ integer instead of the actual code. Then we can format back to a code “Z3” whenever we really need it.

Do I still need TDD when I have a complete formal solution?

I must insist that I did not came to this conclusion as easily. The process of TDD was indeed very helpful not to get lost in every possible direction along the way. Even when you have found a formal structure that could solve your problem in one go, even in a “formal proof-ish fashion”, then perhaps you need less tests to verify the correctness, but you sure still need tests to think on the specification part of your problem. This is your gentle reminder that TDD is not about unit tests.

Partial order in a cyclic group

Given a list of IMM codes we often need to sort them for display. The problem is that a cyclic group has no total order, the ordering depends on where you are in time.

Let’s take the example of the days of the week that also forms a cycle: MONDAY, TUESDAY, WEDNESDAY…SUNDAY, MONDAY etc.

If we only care about the future, is MONDAY before WEDNESDAY? Yes, except if we’re on TUESDAY. If we’re on TUESDAY, MONDAY means next MONDAY hence comes after WEDNESDAY, not before.

This is why we cannot unfortunately just implement Comparable to take care of the ordering. Because we need to consider a reference IMM code-aware partial order, we need to resort to a Comparator that takes the reference IMM code in its constructor.

Once we identify that situation to the cyclic group of integers, it becomes easy to shift both operands of the comparison to 0 before comparing them in a safe (total order-ish) way. Again, this trick is made possible by the freedom to experiment given by the TDD tests. As long as we’re still green, we can go ahead and try any funky approach.

Try it as a kata

This example is also a good coding kata that we’ve tried at work not long ago. Given a simple presentation of the format of an IMM contract code, you can choose to code the sort, find the next and previous code, and perhaps even optimize for memory (cache the instances, e.g. lazily) and speed (cache the toString() value, e.g. in the constructor) if you still have some time.

In closing

Maths structures are hidden behind many common business concepts. I developed an habit to look for them whenever I can, because they always help make us think, they help question our understanding of the domain problem (“is my domain problem really similar in some way to this structure?”), and of course because they often offer wonderful ready-made implementation hints!

The source code (Java) for this post is on Github.
Follow me on Twitter!
Photo: CME Group

Read More

Manipulating things collectively

There is great power in being able to manipulate collective things as one single thing. It gives you simplicity, hence control. You can focus your attention on it and reason about it, even though behind the hood it is made of many parts. The composite thing is kept simple, therefore you can also deal with several of them at a time. This would not be possible if you had to deal with every part they are made of, because it would be overwhelming.

There exists many strategies to deal with collective things as if they were one single thing: statistics, multiple selection, groups, classifications and super-signs.

Statistics

Statistics is probably the most obvious way to deal with collective things, when the things can be expressed as numbers. Historically it has been used with great results in physics, thermodynamics in particular.

It is all about extracting a few macro properties that we can reason on instead of the whole set of data:

  • number of elements
  • mean, deviation, moments, percentiles, etc.
  • regression, clusters
  • total property: total weight, total volume, total price

Multiple selection

Many software applications enable you to select multiple elements at a time in order to apply one operation to each element:

Arman accumulation
Arman accumulation
  • When sending an email, you can select multiple addresses to send to
  • In a word processor, you can select several words, several paragraphs, or even all the document to copy, paste or apply formatting to each element
  • In a spreadsheet, you can select multiple rows or columns to apply operations to, and you can also repeat formula for each row or column

The selected elements can be of the same kind or not. However for multiple selection to be useful, they must share at least something in common: the capability of being copied or pasted, or the fact that they are specific for a particular user.

Functional programming and the three higher order functions map, fold and filter address very well how to apply operations collectively to many elements.

Groups

When multiple selections are often needed, you can create groups. We can consider a group to be a multiple selection made explicit. You create a group and you explicitly add elements to it. Common examples of groups:

  • Mailing lists are named groups of email addresses
  • Vectors in maths

As for multiple selection, the elements in a group must share something in common. For example, they must all have a price. Elements of various kinds can be grouped if they relate to something common, for example  the set of various data (name, address, phone number, preferred colour and date of birth) specific for a user is called a user profile.

A group is extensional. The elements in the group may or may not know they belong to a group.

Java packages are groups, and they are declared within the same file as the elements they refer to. Java classes also group fields and methods under one name.

The Composite pattern suggests to group objects that share the same interface into a Composite that also shares the same interface. The intent is to manipulate the collective set of objects as if it was one single object, i.e. without knowing it is collective.

Classification

You get control over multiple things if you just classify them. Given several flowers, if you classify them into categories, then you can talk about several flowers collectively without having to enumerate each of them: the category is a way to refer to several flowers with just one name.

Classifications enable intensional grouping. This means that groups are defined not by the set of their elements, but by a condition (predicate) to be satisfied. The condition can test for the category of something (is this animal a bird?), or test for its attribute (is this car red?).

Of course abstraction is one particular way to classify.

Java modifiers (private, public, abstract, interface etc.) classify Java elements, and can be used to refer collectively to them, as in “let’s generate the Javadoc for every public elements”.

Super signs

Super-sign
Super-sign

There are elements that exhibit a special property when considered together as a whole. For example, the ink dots on the paper can be seen as letters. Letters next to each other can be seen together as words, which again can be seen together as sentences, and then again up to the novel. Collective arrangements of multiple things that together exhibit a property are called super signs.

This phenomenon is related to emergence, and only exists for a given observer if he can recognize the super sign.

In science in general, we use models to account for the collective behavior of several elements, typically objects with measurable properties, and forces in action.

In a Java program, idioms and patterns can be considered super signs for those who know them.

Conclusion

Manipulating multiple things in a simple way really matters, it is a life saver.

In software development it is paramount because it is a lever you use to manage tons of data with no effort. The art is to find the way you think about collective things that reduces the most your effort.

I already mentioned this topic in previous posts: group together things that go together, don’t make things artificially different, and my definition of abstraction, because abstraction is an essential way to refer to different things in what they share in common.

Read More