TDD Vs. math formalism: friend or foe?

It is not uncommon to oppose the empirical process of TDD, together with its heavy use of unit tests, to the more mathematically based techniques, with the “formal methods” and formal verification at the other end of the spectrum. However I experienced again recently that the process of TDD can indeed help discover and draw upon math formalisms well-suited to the problem considered. We then benefit from the math formalism for an easier implementation and correctness.

It is quite frequent that maths structures, or more generally “established formalisms” as Eric Evans would say, are hidden everywhere in the business concepts we need to model in software.

Dates and how we take liberties with them for trading of financial instruments offer a good example of a business concept with an underlying math structure: traders of futures often use a notation like ‘U8’ to describe an expiry date like September 2018; ‘U’ means September, and the ‘8’ digit refers to 2018, but also to 2028, and 2038 etc. Notice that this notation only works for 10 years, and each code is recycled every decade.

The IMM trading floor in the early 70's (photo CME Group)

In the case of IMM contract codes, we only care about quarterly dates on:

  • March (H)
  • June (M)
  • September (U)
  • December (Z)

This yields only 4 possibilities for the month, combined with the 10 possible year digits, hence 40 different codes in total, over the range of 10 years.

How does that translate into source code?

As a software developer we are asked all the time to manage such IMM expiry codes:

  • Sort a given set of IMM contract codes
  • Find the next contract from the current “leading month” contract
  • Enumerate the next 11 codes from the current “leading month” contract, etc.

This is often done ad hoc with a gazillion of functions for each use-case, leading to thousands of lines of code hard to maintain because they involve parsing of the ‘U8’ format everytime we want to calculate something.

With TDD, we can now tackle this topic with more rigor, starting with tests to define what we want to achieve.

The funny thing is that in the process of doing TDD, the cyclic logic of the IMM codes struck me and strongly reminded me of the cyclic group Z/nZ. I had met this strange maths creature at school many years ago, I had a hard time with it by the way. But now on a real example it was definitely more interesting!

The source code (Java) for this post is on Github.

Draw on established formalisms

Thanks to Google it is easy to find something even with just a vague idea of how it’s named, and thanks to Wikipedia, it is easy to find out more about any established formalism like Cyclic Groups. In particular we find that:

Every finite cyclic group is isomorphic to the group { [0], [1], [2], …, [n ? 1] } of integers modulo n under addition

The Wikipedia page also mentions a concept of the product of cyclic groups in relation with their order (here the number of elements). Looks like this is the math-ish way to say that 4 possibilities for quarterly months combined with 10 possible year digits give 40 different codes in total.

So what? Sounds like we could identify the set of the 4 months to a cyclic group, the set of the 10 year digits to another, and that even the combination (product) of both also looks like a cyclic group of order 10 * 4 = 40 (even though the addition operation will not be called like that). So what?

Because we’ve just seen that there is an isomorphism between any finite cyclic group and the cyclic group of integer of the same order, we can just switch to the integer cyclic group logic (plain integers and the modulo operator) to simplify the implementation big time.

Basically the idea is to convert from the IMM code “Z3” to the corresponding ‘ordinal’ integer in the range 0..39, then do every operation on this ‘ordinal’ integer instead of the actual code. Then we can format back to a code “Z3” whenever we really need it.

Do I still need TDD when I have a complete formal solution?

I must insist that I did not came to this conclusion as easily. The process of TDD was indeed very helpful not to get lost in every possible direction along the way. Even when you have found a formal structure that could solve your problem in one go, even in a “formal proof-ish fashion”, then perhaps you need less tests to verify the correctness, but you sure still need tests to think on the specification part of your problem. This is your gentle reminder that TDD is not about unit tests.

Partial order in a cyclic group

Given a list of IMM codes we often need to sort them for display. The problem is that a cyclic group has no total order, the ordering depends on where you are in time.

Let’s take the example of the days of the week that also forms a cycle: MONDAY, TUESDAY, WEDNESDAY…SUNDAY, MONDAY etc.

If we only care about the future, is MONDAY before WEDNESDAY? Yes, except if we’re on TUESDAY. If we’re on TUESDAY, MONDAY means next MONDAY hence comes after WEDNESDAY, not before.

This is why we cannot unfortunately just implement Comparable to take care of the ordering. Because we need to consider a reference IMM code-aware partial order, we need to resort to a Comparator that takes the reference IMM code in its constructor.

Once we identify that situation to the cyclic group of integers, it becomes easy to shift both operands of the comparison to 0 before comparing them in a safe (total order-ish) way. Again, this trick is made possible by the freedom to experiment given by the TDD tests. As long as we’re still green, we can go ahead and try any funky approach.

Try it as a kata

This example is also a good coding kata that we’ve tried at work not long ago. Given a simple presentation of the format of an IMM contract code, you can choose to code the sort, find the next and previous code, and perhaps even optimize for memory (cache the instances, e.g. lazily) and speed (cache the toString() value, e.g. in the constructor) if you still have some time.

In closing

Maths structures are hidden behind many common business concepts. I developed an habit to look for them whenever I can, because they always help make us think, they help question our understanding of the domain problem (“is my domain problem really similar in some way to this structure?”), and of course because they often offer wonderful ready-made implementation hints!

The source code (Java) for this post is on Github.
Follow me on Twitter!
Photo: CME Group

Read More

Domain-Driven Design: where to find inspiration for Supple Design? [part1]

Domain-Driven Design encourages to analyse the domain deeply in a process called Supple Design. In his book (the blue book) and in his talks Eric Evans gives some examples of this process, and in this blog I suggest some sources of inspirations and some recommendations drawn from my practice in order to help about this process.

When a common formalism fits the domain well, you can factor it out and adapt its rules to the domain.

A known formalism can be reused as a ready-made, well understood model.

Obvious sources of inspiration

Analysis patterns

It is quite obvious in the book, DDD builds clearly on top of Martin Fowler analysis patterns. The patterns Knowledge Level (aka Meta-Model), and Specification (a Strategy used as a predicate) are from Fowler, and Eric Evans mentions using and drawing insight from analysis patterns many times in the book.Analysis Patterns: Reusable Object Models (Addison-Wesley Object Technology Series)

Reading analysis patterns helps to appreciate good design; when you’ve read enough analysis patterns, you don’t even have to remember them to be able to improve your modelling skills. In my own experience, I have learnt to look for specific design qualities such as explicitness and traceability in my design as a result of getting used to analysis patterns such as Phenomenon or Observation.

Design patterns

Design patterns are another source of inspiration, but usually less relevant to domain modelling. Evans mentions the Strategy pattern, also named Policy (I rather like using an alternative name to make it clear that we are talking about the domain, not about a technical concerns), and the pattern Composite. Evans suggests considering other patterns, not just the GoF patterns, and to see whether they make sense at the domain level.

Programming paradigms

Eric Evans also mentions that sometimes the domain is naturally well-suited for particular approaches (or paradigms) such as state machines, predicate logic and rules engines. Now the DDD community has already expanded to include event-driven as a favourite paradigm, with the  Event-Sourcing and CQRS approaches in particular.

On paradigms, my design style has also been strongly influenced by elements of functional programming, that I originally learnt from using Apache Commons Collections, together with a increasingly pronounced taste for immutability.

Maths

It is in fact the core job of mathematicians to factor out formal models of everything we experience in the world. As a result it is no surprise we can draw on their work to build deeper models.

Graph theory

The great benefit of any mathematical model is that it is highly formal, ready with plenty of useful theorems that depend on the set of axioms you can assume. In short, all the body of maths is just work already done for you, ready for you to reuse. To start with a well-known example, used extensively by Eric Evans, let’s consider a bit of graph theory.

If you recognize that your problem is similar (mathematicians would say isomorphic or something like that) to a graph, then you can jump in graph theory and reuse plenty of exciting results, such as how to compute a shortest-path using a Dijkstra or A* algorithm. Going further, the more you know or read about your theory, the more you can reuse: in other words the more lazy you can be!

In his classical example of modelling cargo shipping using Legs or using Stops, Eric Evans, could also refer to the concept of Line Graph, (aka edge-to-vertex dual) which comes with interesting results such as how to convert a graph into its edge-to-vertex dual.

Trees and nested sets

Other maths concepts common enough include trees and DAG, which come with useful concepts such as the topological sort. Hierarchy containment is another useful concept that appear for instance in every e-commerce catalog. Again, if you recognize the mathematical concept hidden behind your domain, then you can then search for prior knowledge and techniques already devised to manipulate the concept in an easy and correct way, such as how to store that kind of hierarchy into a SQL database.

Don’t miss the next part: part 2

  • Maths continued
  • General principles

Read More

Degrees of freedom analysis

The concept of degrees of freedom looks so relevant to software development that I am wondering why it is not considered more often. Fortunately Michael L. Perry dedicates a full section of his blog to that concept. In this post I will quote a lot, please consider that as a sign of enthusiasm.

A common concept in maths

The concept of DOF is central in solving systems of linear equations, and in the main post of his series, Michael L. Perry starts by focusing on mathematical system of linear equations:

A mathematical model of a problem is written with equations and unknowns. You can think of the number of unknowns as representing the number of dimensions in the problem.

Depending on the number m of equations compared to the number n of unknowns (variables), there are several cases:

  1. m > n: there is no solution, the problem is over-constrained
  2. m = n: there is only one solution
  3. m < n: the system is under-determined system, and the dimension of the solution set is usually equal to n ? m, where n is the number of variables and m is the number of equations.

A common concept in mechanics

3 legs are enough to be stable, but tables usually have 4 legs anyway.
3 legs are enough to be stable, but tables usually have 4 legs anyway.

The concept of DOF is prevalent in mechanics. In particular, a system with more internal constraints than the total possible number of DOFs has no solution. However in practice it can still work, provided some of the bodies are not absolutely rigid.

The table is often given as an example, because it only needs three legs to be stable on the ground, but usually has 4 legs. This only works because the table is not fully rigid, and can accommodate the small imperfection of the ground.

Not yet common in software

In his post, Michael L. Perry explains in practice how to analyse software using DOF. First find the unknowns:

To identify the degrees of freedom in software, start by defining the unknowns. These are usually pretty simple to spot. These are the things that can change. In a checkbook program, for example, each transaction amount is an unknown, as are the the account balance and the color used to display it (black or red).

Then find out the constraints between the DOFs.

Next, define the equations. These are the relationships between the unknowns that the software has to enforce. In the checkbook, the balance is the sum of all transaction amounts. And the color is red if the balance is negative or black otherwise.

Finally:

Subtract to find your degrees of freedom. One amount per transaction (n), one balance, and one color gives n+2 unknowns. The balance sum and the color rule give us two equations. n+2-2 = n degrees of freedom, one per transaction.

What for?

Quoting again Michael L. Perry (across various posts in the DOF category):

Understanding the degrees of freedom in the software helps to create a maintainable design.

Adding independent data to a system increases its degrees of freedom. Adding dependent data does not. Adding an immutable field does not.

You want no more degrees of freedom in the system than the problem calls for.

The concept of degree of freedom is remarkably useful to help distilling the domain down to the essential variable parts and the constraints between them. Any extra independent data can only create opportunities for bugs.

Read More