TDD Vs. math formalism: friend or foe?

It is not uncommon to oppose the empirical process of TDD, together with its heavy use of unit tests, to more mathematically based techniques, with “formal methods” and formal verification at the other end of the spectrum. However, I experienced again recently that the process of TDD can indeed help discover and draw upon math formalisms well-suited to the problem at hand. We then benefit from the math formalism for an easier implementation and better confidence in its correctness.

It is quite frequent that maths structures, or more generally “established formalisms” as Eric Evans would say, are hidden everywhere in the business concepts we need to model in software.

Dates, and the liberties we take with them when trading financial instruments, offer a good example of a business concept with an underlying math structure: traders of futures often use a notation like ‘U8’ to describe an expiry date like September 2018; ‘U’ means September, and the digit ‘8’ refers to 2018, but also to 2028, 2038 etc. Notice that this notation only works over 10 years, and each code is recycled every decade.

The IMM trading floor in the early 70's (photo CME Group)

In the case of IMM contract codes, we only care about quarterly dates on:

  • March (H)
  • June (M)
  • September (U)
  • December (Z)

This yields only 4 possibilities for the month, combined with the 10 possible year digits, hence 40 different codes in total, over the range of 10 years.

How does that translate into source code?

As a software developer we are asked all the time to manage such IMM expiry codes:

  • Sort a given set of IMM contract codes
  • Find the next contract from the current “leading month” contract
  • Enumerate the next 11 codes from the current “leading month” contract, etc.

This is often done ad hoc, with a gazillion functions for each use-case, leading to thousands of lines of code that are hard to maintain because they parse the ‘U8’ format every time we want to calculate something.

With TDD, we can now tackle this topic with more rigor, starting with tests to define what we want to achieve.

The funny thing is that in the process of doing TDD, the cyclic logic of the IMM codes struck me and strongly reminded me of the cyclic group Z/nZ. I had met this strange maths creature at school many years ago, and I had a hard time with it by the way. But now, on a real example, it was definitely more interesting!

The source code (Java) for this post is on Github.

Draw on established formalisms

Thanks to Google it is easy to find something even with just a vague idea of how it’s named, and thanks to Wikipedia, it is easy to find out more about any established formalism like Cyclic Groups. In particular we find that:

Every finite cyclic group is isomorphic to the group { [0], [1], [2], …, [n − 1] } of integers modulo n under addition

The Wikipedia page also mentions the concept of the product of cyclic groups in relation to their order (here, the number of elements). This looks like the math-ish way to say that 4 possibilities for quarterly months combined with 10 possible year digits give 40 different codes in total.

So what? It sounds like we could identify the set of the 4 months with a cyclic group, the set of the 10 year digits with another, and even the combination (product) of both looks like a cyclic group of order 10 * 4 = 40 (even though the addition operation will not be called like that). So what?

Because we’ve just seen that there is an isomorphism between any finite cyclic group and the cyclic group of integers of the same order, we can just switch to the integer cyclic group logic (plain integers and the modulo operator) to simplify the implementation big time.

Basically the idea is to convert the IMM code, say “Z3”, to the corresponding ‘ordinal’ integer in the range 0..39, do every operation on this ‘ordinal’ integer instead of the actual code, and only format back to a code like “Z3” when we really need it.
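As an illustration, here is a minimal sketch of that conversion (the names are mine, not necessarily those of the actual project on Github):

// Minimal sketch: map an IMM code like "Z3" to its 'ordinal' in 0..39 and back.
// The quarter letters H, M, U, Z map to 0..3; the year digit completes the ordinal.
public final class ImmCode {
    private static final String QUARTERS = "HMUZ"; // March, June, September, December

    // "Z3" -> 3 * 4 + index of 'Z' = 15
    public static int toOrdinal(String code) {
        int quarter = QUARTERS.indexOf(code.charAt(0));
        int yearDigit = Character.getNumericValue(code.charAt(1));
        return yearDigit * 4 + quarter;
    }

    // 15 -> "Z3"; the modulo keeps any shifted ordinal within the 40-element cycle
    public static String fromOrdinal(int ordinal) {
        int n = Math.floorMod(ordinal, 40);
        return "" + QUARTERS.charAt(n % 4) + (n / 4);
    }

    // The 'next' contract is simply +1 in the integer cyclic group
    public static String next(String code) {
        return fromOrdinal(toOrdinal(code) + 1);
    }
}

Every operation (next, previous, adding n quarters) then boils down to an addition modulo 40: for example next(“Z9”) wraps around to “H0”.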

Do I still need TDD when I have a complete formal solution?

I must insist that I did not come to this conclusion that easily. The process of TDD was indeed very helpful in not getting lost in every possible direction along the way. Even when you have found a formal structure that could solve your problem in one go, even in a “formal proof-ish” fashion, you perhaps need fewer tests to verify correctness, but you surely still need tests to think through the specification part of your problem. This is your gentle reminder that TDD is not about unit tests.

Partial order in a cyclic group

Given a list of IMM codes, we often need to sort them for display. The problem is that a cyclic group has no total order: the ordering depends on where you are in time.

Let’s take the example of the days of the week that also forms a cycle: MONDAY, TUESDAY, WEDNESDAY…SUNDAY, MONDAY etc.

If we only care about the future, is MONDAY before WEDNESDAY? Yes, except if we’re on TUESDAY. If we’re on TUESDAY, MONDAY means next MONDAY hence comes after WEDNESDAY, not before.

This is why we unfortunately cannot just implement Comparable to take care of the ordering. Because we need a partial order that is aware of a reference IMM code, we have to resort to a Comparator that takes the reference IMM code in its constructor.

Once we identify that situation with the cyclic group of integers, it becomes easy to shift both operands of the comparison relative to 0 before comparing them in a safe (total order-ish) way. Again, this trick is made possible by the freedom to experiment given by the TDD tests: as long as we’re still green, we can go ahead and try any funky approach.
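Here is a minimal sketch of that trick, reusing the hypothetical ImmCode helper above: the comparator shifts both codes relative to the reference before comparing plain integers.

// Minimal sketch of a reference-aware comparator: both operands are shifted so
// that the reference code becomes 0, then compared as plain integers.
public final class ImmCodeComparator implements java.util.Comparator<String> {
    private final int referenceOrdinal;

    public ImmCodeComparator(String referenceCode) {
        this.referenceOrdinal = ImmCode.toOrdinal(referenceCode);
    }

    @Override
    public int compare(String left, String right) {
        return Integer.compare(shiftFromReference(left), shiftFromReference(right));
    }

    // Distance from the reference, always in 0..39, i.e. "how far in the future"
    private int shiftFromReference(String code) {
        return Math.floorMod(ImmCode.toOrdinal(code) - referenceOrdinal, 40);
    }
}

For example, with “U5” as the reference, “Z5” sorts before “H6”, exactly as a trader would expect.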

Try it as a kata

This example also makes a good coding kata that we tried at work not long ago. Given a simple presentation of the format of an IMM contract code, you can choose to code the sort, find the next and previous codes, and perhaps even optimize for memory (cache the instances, e.g. lazily) and speed (cache the toString() value, e.g. in the constructor) if you still have some time.

In closing

Maths structures are hidden behind many common business concepts. I have developed a habit of looking for them whenever I can, because they always make us think, they help question our understanding of the domain problem (“is my domain problem really similar in some way to this structure?”), and of course because they often offer wonderful ready-made implementation hints!

The source code (Java) for this post is on Github.
Follow me on Twitter!
Photo: CME Group


Collaborative Construction by Alberto Brandolini

Alberto Brandolini (@ziobrando) gave a great talk at the last Domain-Driven Design eXchange in London. In this talk, among many other insights, he described a recurring pattern he had seen many times in several very different projects: “Collaborative Construction, Execution & Tracking. Sounds familiar? Maybe we didn’t notice something cool”

Analysis conflicts are hints

In various unrelated projects we see similar problems. Consider a project that deals with building a complex artifact like an advertising campaign. Creating a campaign involves a lot of different communication channels between many people.

On this project, the boss said:

We should prevent the user from entering incorrect data.

Sure, not wanting corrupt data is reasonable: you don’t want to launch a campaign with wrong or corrupt data! However the users were telling a completely different story:

[the process with all the strict validations] cannot be applied in practice, there’s no way it can work!

Why this conflict? In fact they are talking about two different processes, without noticing it. Sure, it takes the acute eyes of a Domain-Driven Design practitioner to recognize that subtlety!

Alberto mentions what he calls the “Notorious Pub Hypothesis”: think about the pub where all the bad people gather at night, the one you don’t go to if you’re an honest citizen. The hypothesis comes from his mother asking:

Why doesn't the police shut down this place?

Actually there is some value in having this kind of place: since the police know where all the bad guys are, it makes it easier to find them when needed.

In a similar fashion, maybe there’s also a need somewhere for invalid data. What happens before we have strictly validated data? Just like the bad guys who exist whether we like it or not, there is a whole universe outside of the application, in which the users prepare the advertising campaign over more than one month, with lots of emails and many other types of communication, all of it untraceable so far.

Why not acknowledge that and include this process, a collaborative process, directly into the application?

Similar data, totally different semantics

Coming from a data-driven mindset, it is not easy to realize that just because the data structures are pretty much the same does not mean you have to live with only one representation in your application. Same data, completely different behavior: this suggests different Bounded Contexts!

The interesting pattern recurring in many applications is a split between two phases: one phase where multiple stakeholders collaborate on the construction of a deliverable, and a second phase where the built deliverable is stored, can be reviewed, versioned, searched etc.

The natural focus of most projects seems to be on the second phase; Alberto introduced the name Collaborative Construction to refer to the first phase, often missed in the analysis. Now we have a name for this pattern!

The insight in this story is to acknowledge the two contexts: one for the collaborative construction, the other for managing the outcome of the construction.

Looks like “source Vs. executable”

During collaborative construction, it’s important to accept inconsistencies, warnings or even errors, incomplete data, missing details, because the work is in progress, it’s a draft. Also this work in progress is by definition changing quickly thanks to the contributions of every participant.

Once the draft is ready, it is then validated and becomes the final deliverable. This deliverable must be complete, valid and consistent, and cannot be changed any more. It is there forever. Every change becomes a new revision from now on.

We therefore evolve from a draft semantics to a “printed” or “signed” semantics. The draft requires comments, conversations, proposals, decisions. On the other hand the resulting deliverable may require a version history and release notes.

The insight that we have these two different bounded contexts in turn helps dig deeper into the analysis, to discover that we probably need different data and different behaviors for each context.

Some examples of this split in two contexts:

  • The shopping cart is a work in progress, that once finalized becomes an order
  • A request for quote or an auction process is a collaborative construction in search of the best trade condition, and it finally concludes (or not) into a trade
  • A legal document draft is being worked on by many lawyers, before it is signed off to become the legally binding contract, after the negotiations have happened.
  • An example we all know very well: our source code in source control is a work in progress between several developers, and then the continuous integration compiles it into an executable and a set of reports, all immutable. It’s ok to have compilation errors and warnings while we’re typing code. It’s ok to have checkstyle violations until we commit. Once we release, we want no warnings and every test to pass. If we need to change something, we simply build another revision; each release cannot change (unless we patch, but that’s another gory story)
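To make the split tangible in code, here is a minimal sketch along the lines of the advertising campaign story above (the class names are mine, not from the talk): a mutable draft that tolerates incomplete data, and the immutable, validated deliverable it produces once finalized.

// Collaborative Construction side: a mutable draft that accepts missing data.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CampaignDraft {
    private final List<String> channels = new ArrayList<>();
    private String name; // may stay null while the draft is being worked on

    void rename(String newName) { this.name = newName; }
    void addChannel(String channel) { channels.add(channel); }

    // Validation only happens at the boundary between the two contexts
    Campaign finalizeDraft() {
        if (name == null || channels.isEmpty()) {
            throw new IllegalStateException("The draft is not complete yet");
        }
        return new Campaign(name, channels);
    }
}

// Outcome side: complete, valid, consistent, and never changed again.
final class Campaign {
    private final String name;
    private final List<String> channels;

    Campaign(String name, List<String> channels) {
        this.name = name;
        this.channels = Collections.unmodifiableList(new ArrayList<>(channels));
    }

    public String name() { return name; }
    public List<String> channels() { return channels; }
}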

UX demanding

Building software to deal with collaborative construction is quite demanding with respect to the User Experience (UX).

Can we find examples of Collaborative Construction in software? Sure: think about Google Wave (though it did not end well), Github (successful but not ready for normal users who are not developers), or Facebook (though we’re not building anything useful with it).

Watch the video of the talk

Another note, among many others I took away from the talk, is that from time to time we developers should ask the question:

what if the domain expert is wrong?

It does happen that the domain expert is going to mislead the team and the project, because he’s giving a different answer every day, or because she’s focusing on only one half of the full domain problem. Or because he’s evil…

Alberto in front of Campbell's Soup Cans, of course talking about Domain-Driven Design (picture Skillsmatter)

And don’t hesitate to watch the 50-minute video of the talk, to hear many other lessons learnt, and also because it’s fun to listen to Alberto talking about zombies while talking about Domain-Driven Design!

Follow me (@cyriux) on Twitter!


What’s your signal-to-noise ratio in your code?

You write code to deliver business value, hence your code deals with a business domain like e-trading in finance, or the navigation for an online shoe store. If you look at a random piece of your code, how much of what you see tells you about the domain concepts? How much of it is nothing but technical distraction, or “noise”?

Like the snow on tv

I remember that TV reception used to be quite unreliable long ago: you’d see a lot of “snow” on top of the interesting movie. As in the picture below, this snow is actually noise that interferes with the interesting signal.

TV signal hidden behind snow-like noise

The amount of noise compared to the signal can be measured with the signal-to-noise ratio. Quoting the definition from Wikipedia:

Signal-to-noise ratio (often abbreviated SNR or S/N) is a measure used in science and engineering that compares the level of a desired signal to the level of background noise. It is defined as the ratio of signal power to noise power. A ratio higher than 1:1 indicates more signal than noise.

We can apply this concept of signal-to-noise ratio to the code, and we must try to maximize it, just like in electrical engineering.

Every identifier matters

Look at each identifier in your code: package names, classes and interfaces names, method names, field names, parameters names, even local variables names. Which of them are meaningful in the domain, and which of them are purely technicalities?

Some examples of class names and interface names from a recent project (changed a bit to protect the innocent) illustrate that. Identifiers like “CashFlow” or “CashFlowSequence” belong to the Ubiquitous Language of the domain, hence they are signal in the code.

Examples of classnames as signals, or as noise

On the other hand, identifiers like “CashFlowBuilder” do not belong to the ubiquitous language and are therefore noise in the code. Just counting the number of “signal” identifiers over the number of “noise” identifiers can give you an estimate of your signal-to-noise ratio. To be honest, I’ve never really counted to that level so far.

However, for years I’ve been trying to maximize the signal-to-noise ratio in the code, and I can demonstrate that it is totally possible to write code with a very high proportion of signal (domain words) and very little noise (technical necessities). As usual, it is just a matter of personal discipline.

Logging to a logging framework, catching exceptions, a lookup from JNDI and even @Inject annotations are noise in my opinion. Sometimes you have to live with this noise, but every time I can live without it, I definitely choose to.

For the domain model in particular

All this discussion mostly focuses on the domain model, where you’re supposed to manage everything related to your domain. This is where the idea of a signal-to-noise ratio makes the most sense.

A metric?

It’s probably possible to create a metric for the signal-to-noise ratio, by parsing the code and comparing it to a ubiquitous language “dictionary” declared in some form. However, and as usual, the primary interest of this idea is to keep it in mind while coding and refactoring, as a direction for action, just like test coverage.
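For the sake of illustration, a very naive version of such a metric could simply count, among the identifiers of the domain model, those that belong to a declared ubiquitous language dictionary (a minimal sketch, with hypothetical names):

// A very naive sketch of the metric: ratio of identifiers that belong to the
// declared ubiquitous language over those that do not.
import java.util.List;
import java.util.Set;

class SignalToNoise {
    static double ratio(List<String> identifiers, Set<String> ubiquitousLanguage) {
        long signal = identifiers.stream().filter(ubiquitousLanguage::contains).count();
        long noise = identifiers.size() - signal;
        return noise == 0 ? Double.POSITIVE_INFINITY : (double) signal / noise;
    }
}

// e.g. the identifiers CashFlow, CashFlowSequence, CashFlowBuilder against the
// dictionary {CashFlow, CashFlowSequence} give a ratio of 2.0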

I introduced the idea of the signal-to-noise ratio in code in my talk at DDDx 2012; you can watch the video here. Follow me (@cyriux) on Twitter!

Credits:

TV noise picture: Some rights reserved CC by massimob(ian)chi


Key insights that you probably missed on DDD

As suggested by its name, Domain-Driven Design is not only about Event Sourcing and CQRS. It all starts with the domain and a lot of key insights that are too easy to overlook at first. Even if you’ve read the “blue book” already, I suggest you read it again as the book is at the same time wide and deep.

You got talent

A new natural language that makes heavy use of your thumbs

Behind the basics of Domain-Driven Design, one important idea is to harness a huge talent we all have: the ability to speak. This talent for natural language can help us reason about the considered domain.

Just like multi-touch and tangible interfaces aim at reusing our natural dexterity with our fingers, Eric Evans suggests that we use our language ability as an actual tool to try modelling concepts out loud, and to test whether they pass the simple test of being useful in sentences about the domain.

This is a simple yet powerful idea. No need for any extra framework or tool: one of the most powerful tools we can imagine is already there, wired in our brain. The trick is to find a middle way between natural language in all its fuzziness and an expressive model that we can discuss without ambiguity, and this is exactly what the Ubiquitous Language addresses.

One model to rule them all

Another key insight in Domain-Driven Design is to identify -equate- the implementation model with the analysis model, so that there is only one model across every aspect of the software process, from requirements and analysis to code.

This does not mean you must have only one domain model in your application; in fact you will probably get more than one model across the various areas* of the application. But it does mean that in each area there must be only one model shared by developers and domain experts. This clearly opposes some early methodologies that advocated a distinct analysis model followed by a separate, more detailed implementation model. It also leads naturally to the Ubiquitous Language, a common language between domain experts and the technical team.

The key driver is that the knowledge gained through analysis can be directly used in the implementation, with no gap, mismatch or translation. This assumes of course that the underlying programming language is modelling-oriented, which object-oriented languages obviously are.

What form for the model?

Text is supplemented by pictures

Shall the model be expressed in UML? Eric Evans is again quite pragmatic: nothing beats natural language to express the two essential aspects of a model: the meaning of its concepts, and their behaviour. Text, in English or any other spoken language, is therefore the best choice to express a model, while diagrams, standard or not, even pictures, can supplement it to express a particular structure or perspective.

If you try to express the entirety of the model using UML, then you’re just using UML as a programming language. Using only a programming language such as Java to represent a model exhibits by the way the same shortcoming: it is hard to get the big picture and to grasp the large scale structure. Simple text documents along with some diagrams and pictures, if really used and therefore kept up-to-date, help explain what’s important about the model, otherwise expressed in code.

A final remark

The beauty of Domain-Driven Design is that it is not just a set of independent good ideas on why and how to build domain models; it is itself a complete system of inter-related ideas, each useful on its own but also supplementing the others. For example, the idea of using natural language as a modelling tool and the idea of sharing one single model for analysis and implementation both lead to the Ubiquitous Language.

* Areas would typically be different Bounded Contexts


Domain-Driven Design: where to find inspiration for Supple Design? [part1]

Domain-Driven Design encourages analysing the domain deeply, in a process called Supple Design. In his book (the blue book) and in his talks, Eric Evans gives some examples of this process; in this post I suggest some sources of inspiration and some recommendations drawn from my own practice to help with this process.

When a common formalism fits the domain well, you can factor it out and adapt its rules to the domain.

A known formalism can be reused as a ready-made, well understood model.

Obvious sources of inspiration

Analysis patterns

It is quite obvious in the book that DDD builds on top of Martin Fowler’s analysis patterns. The patterns Knowledge Level (aka Meta-Model) and Specification (a Strategy used as a predicate) come from Fowler, and Eric Evans mentions using and drawing insight from analysis patterns many times in the book (Analysis Patterns: Reusable Object Models, Addison-Wesley Object Technology Series).

Reading analysis patterns helps to appreciate good design; once you’ve read enough analysis patterns, you don’t even have to remember them to improve your modelling skills. In my own experience, I have learnt to look for specific design qualities such as explicitness and traceability in my designs as a result of getting used to analysis patterns such as Phenomenon or Observation.

Design patterns

Design patterns are another source of inspiration, though usually less relevant to domain modelling. Evans mentions the Strategy pattern, also named Policy (I rather like using the alternative name to make it clear that we are talking about the domain, not about a technical concern), and the Composite pattern. Evans suggests considering other patterns, not just the GoF patterns, and seeing whether they make sense at the domain level.

Programming paradigms

Eric Evans also mentions that sometimes the domain is naturally well-suited to particular approaches (or paradigms) such as state machines, predicate logic and rules engines. The DDD community has since expanded to include event-driven as a favourite paradigm, with the Event Sourcing and CQRS approaches in particular.

On paradigms, my design style has also been strongly influenced by elements of functional programming, which I originally learnt from using Apache Commons Collections, together with an increasingly pronounced taste for immutability.

Maths

It is in fact the core job of mathematicians to factor out formal models of everything we experience in the world. As a result it is no surprise we can draw on their work to build deeper models.

Graph theory

The great benefit of any mathematical model is that it is highly formal and comes ready with plenty of useful theorems, depending on the set of axioms you can assume. In short, the whole body of maths is work already done for you, ready for you to reuse. To start with a well-known example, used extensively by Eric Evans, let’s consider a bit of graph theory.

If you recognize that your problem is similar (mathematicians would say isomorphic, or something like that) to a graph, then you can jump into graph theory and reuse plenty of exciting results, such as how to compute a shortest path using Dijkstra or A*. Going further, the more you know or read about your theory, the more you can reuse: in other words, the lazier you can be!

In his classical example of modelling cargo shipping using Legs or using Stops, Eric Evans could also refer to the concept of a Line Graph (aka edge-to-vertex dual), which comes with interesting results such as how to convert a graph into its edge-to-vertex dual.

Trees and nested sets

Other common enough maths concepts include trees and DAGs, which come with useful notions such as topological sort. Hierarchy containment is another useful concept that appears, for instance, in every e-commerce catalog. Again, if you recognize the mathematical concept hidden behind your domain, you can then search for prior knowledge and techniques already devised to manipulate the concept in an easy and correct way, such as how to store that kind of hierarchy in a SQL database.
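As a tiny illustration of hierarchy containment (the names are mine), a category in a catalog can answer whether it sits under another one just by walking up its parents:

// Minimal sketch of hierarchy containment in an e-commerce catalog: each category
// knows its parent, and containment is a simple walk up the tree.
final class Category {
    private final String name;
    private final Category parent; // null for the root of the catalog

    Category(String name, Category parent) {
        this.name = name;
        this.parent = parent;
    }

    boolean isWithin(Category ancestor) {
        for (Category current = parent; current != null; current = current.parent) {
            if (current == ancestor) {
                return true;
            }
        }
        return false;
    }

    @Override
    public String toString() { return name; }
}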

Don’t miss the next part: part 2

  • Maths continued
  • General principles


Your cross-cutting concerns are someone else's core domain

Consider a domain, for example an online bookshop project that we call BuyCheapBooks. The Ubiquitous Language for this domain would talk about Book, Category, Popularity, ShoppingCart etc.

Business Domains

From scratch, coding this domain can be quite fast, and we can play with the fully unit-tested domain layer quickly. However, if we want to ship, we will have to spend several times more effort because of all the extra cross-cutting concerns we must deal with: persistence, user preferences, transactions, concurrency and logging (see non-functional requirements). They are not part of the domain, but developers often spend a large amount of their time on them; by the way, middleware and Java EE almost exclusively focus on these concerns, through JPA, JTA, JMX and many others.

To a first approximation, our application is made of a domain and of several cross-cutting concerns. However, when it is time to implement the cross-cutting concerns, each of them becomes the core domain -a technical one- of another dedicated project in its own right. These technical projects are managed by someone else, somewhere not in your team, and you would usually use these specific technical projects to address your cross-cutting concerns, rather than doing it yourself from scratch with code.

Technical Domains

For example, persistence is precisely the core domain of an ORM like Hibernate. The Ubiquitous Language for such project would talk about Data Mapper, Caching, Fetching Strategy (Lazy Load etc.), Inheritance Mapping (Single Table Inheritance, Class Table Inheritance, Concrete Table Inheritance) etc. These kinds of projects also deal with their own cross-cutting concerns such as logging and administration, among others.

Logging is the core domain of Log4j, and it must itself deal with cross-cutting concerns such as configuration.


In this perspective, the cross-cutting concerns of a project are the core domains of other satellite projects, which focus on technical domains.

Hence we see that the very idea of core domain Vs. cross-cutting concerns is essentially relative to the project considered.

Note, for the sake of it, that there may even be cycles between the core domains and the required cross-cutting concerns of several projects. For example there is a cycle between a (hypothetical) project Conf4J that focuses on configuration (its core domain) and that requires logging (as a cross-cutting concern), and another project Log4J that focuses on logging (its core domain) and that requires configuration (as a cross-cutting concern).

Conclusion

There is no clear and definite answer as to whether a concept is part of the domain or whether it is just a cross-cutting concern: it depends on the purpose of the project. There is almost always a project whose domain addresses the cross-cutting concern of another.

For projects that target end-users, we usually tend to reuse the code that deals with cross-cutting concerns through middleware and APIs, in order to focus on the usually business-oriented domain, the one that our users care about. But when our end-users are developers, the domain may well be technical.


Big battles are won on small details

Small details matter because you deal with them often. Any enhancement you make thus yields a benefit often, hence a bigger overall benefit. In other words: invest small care, get big return. This is an irresistible proposal!

Every single step matters

Examples of small design-level details that I care about because I have experienced great payback from them:

  1. Using Value Objects rather than naked primitives
  2. One argument instead of two in a method,
  3. Well-thought names for every programming element
  4. Favour side-effect free methods and immutability as much as possible
  5. Keeping the behaviour close to the related data
  6. Investing enough time to deeply distill each concept of the domain, even the simplest ones

Ivan Moore has an excellent series of blog entries on this approach: programming in the small.

All these details emphasize that code is written once then used many times. The extra care at time of writing pays back at time of using, each time, again and again. Each enhancement that minimises brain effort at time of use is welcome, because software design is a matter of economy.

Other kinds of “details” that I care about involve the human aspects of crafting software: being on site, face-to-face communication rather than electronic media, respect and consideration at all times, always celebrate achievements, etc. Because ultimately, it also boils down to people that feel like building something together.


Principles for using annotations

Deciding where and how to place annotations is not innocent. The last thing we want is to create extra maintenance effort because of the annotations. In other words, we want annotations that are stable, or that change for the same reasons and at the same time as the elements they annotate. This article suggests some good practices on how to design annotations.

Annotations are location-based

A special kind of wall annotation

Language annotations, or even good old xDoclet tags, make it possible to augment program elements with additional semantics, which can be used to configure tools, frameworks or containers.

Configuration is now increasingly done through annotations spread all over the project elements. The key advantage is that the location of the annotation directly references the program element (interface, class etc.), as opposed to configuration files that must reference program elements using awkward and error-prone qualified names such as “com.mycompany.somepackage.MyClass”, which are also fragile under refactoring.

For example, we can annotate an entity to declare how it must be persisted, we can annotate a class to declare how it must be instantiated by a Dependency Injection framework, and we can annotate test methods to declare their purpose.

If not placed and thought through carefully, annotations can make your code harder to maintain. This happens when annotations are placed in the “wrong” place, or when they introduce undesirable coupling, as we will see.

Dependencies still matter

The question of coupling between elements of the code base is also relevant for annotations. That the coupling is done via an annotation rather than plain code does not make it more acceptable.

We want to group together things that change together. As a consequence, put your annotations on the elements that change with the annotations.

In particular, when the annotation is used to declare a dependency:

Only annotate the dependent element, not the element depended on

If you use Dependency Injection and you want the class MyServiceImpl to be injected everywhere the interface MyService is used, then Guice offers the annotation @ImplementedBy:

@ImplementedBy(MyServiceImpl.class)
interface MyService { ... }

This annotation is a direct violation of the advice above, since it makes a pure abstraction (an interface) aware of an implementation, whereas the regular dependency should be the other way round: only the implementation must depend on the interface.

I must however acknowledge that the annotation @ImplementedBy is quite convenient anyway for unit tests, to declare a default implementation for the interface. And it was made just for that, as described in the Guice documentation, along with a warning:

Use @ImplementedBy carefully; it adds a compile-time dependency from the interface to its implementation.
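The alternative that respects the advice above is to declare the binding outside of the interface, typically in a Guice Module, so that only the wiring code knows about the implementation:

// The binding lives in a module: the interface stays unaware of its implementation.
import com.google.inject.AbstractModule;

public class MyServiceModule extends AbstractModule {
    @Override
    protected void configure() {
        bind(MyService.class).to(MyServiceImpl.class);
    }
}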

Favor intrinsic annotations

Another annotation on the wall in Paris

If you want to declare that a service is stateless, you cannot get it wrong: just put the annotation @Stateless on its interface. This is straightforward because being stateless is a truly intrinsic property. It also makes perfect sense to annotate a method argument with the @Nullable annotation, as the capability to expect null or not is really intrinsic to the method.

On the other hand, a service interface does not really care about how it is called. It can be called by another object (local call) or remotely, through some remote proxy. The object is not intrinsically local or remote in itself.

The point is that the decision to consume the service locally or remotely does not belong to the service, in itself, but depends on each client. It actually belongs to each use-case considered.

Said another way, specifying @Remotable or @Local directly on the service would require the developer of the service to guess how it will be used!

Intrinsic properties really belong to the element and therefore are stable, as opposed to use-case-specific properties that vary with the particular case of use. Hence, if we want stable annotations:

Only annotate an element about its intrinsic properties, and avoid annotating about use-case-specific properties.

Annotations as pointcuts

Let’s consider the example of an accounting service in a bank. Only selected categories of staff can access this service. We can use annotations to declare its security configuration:

@RolesAllowed({"auditor", "bankmanager", "admin"})

The problem with that approach is that it couples the service directly to the user roles defined elsewhere. As a consequence, if we want to change the user roles (we now need to add the role “externalauditor”), we will have to review every security annotation and change them. On the other hand, if we want to change the access policy (which happens every time new senior management comes into place), we will also have to change annotations all over the code. How can we improve that?

We can improve the situation by going back to the business analysis on the topic and separating what’s intrinsic from what’s not. In other words, we want to find out how a business analyst came up with the security roles for the service.

Rather than specifying the need for security in terms of allowed user roles, we can instead declare the facts: the service is “sensitive” and is about “accounting”:

@Domain(Accounting)
@Confidentiality(Sensitive)
And now a beautiful car annotation

Then we can define expressions that use the declared annotations (which are now stable because they are intrinsic) to select elements (here, services) and associate them with allowed user roles. These rules should be defined outside of the elements they apply to, typically in configuration files.

Thanks to the annotations that already capture half of the security knowledge, expressing the rules becomes much simpler than doing it method by method. So next time senior management changes and decides that, from now on, “every service that is both Confidentiality(Sensitive) and Domain(Accounting) is only allowed to corporate-officer roles”, you just have to update a few rules expressed in terms of domain and confidentiality level, rather than list many methods.

The mindset is very similar to AOP where we first define pointcuts, and then attach advices to them. Here we use annotations as an alternative way to declare the pointcuts.
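As a minimal sketch of what such intrinsic annotations could look like in Java (the names are illustrative, not from an actual framework):

// Illustrative intrinsic annotations; the mapping to allowed user roles lives
// elsewhere, typically in a configuration file or a dedicated rules module.
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

enum DomainArea { ACCOUNTING, TRADING }
enum Level { PUBLIC, SENSITIVE }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Domain { DomainArea value(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Confidentiality { Level value(); }

// The service only declares facts about itself, never who may call it
@Domain(DomainArea.ACCOUNTING)
@Confidentiality(Level.SENSITIVE)
interface AccountingService { /* ... */ }

// Elsewhere, a single rule selects every SENSITIVE + ACCOUNTING element and maps
// it to the allowed roles, instead of listing roles on each method.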

Conclusion

Annotations are very efficient for declaring properties about program elements directly on the elements. They are robust to refactoring and easier to use than specifying long qualified names in XML files.

To get the best of annotations, we still need to consider the coupling they can introduce, in particular with respect to dependencies. If a class should not know about another, its annotations should not either.

Annotations are much more stable (less likely to change) when they only relate to intrinsic properties of the elements they are located on. When we need to configure cross-cutting concerns (security, transactions etc.), annotations can be used to declare the half of the knowledge that is really intrinsic to the elements, in the same spirit as pointcuts in AOP.

All that leads to the acknowledgement that even though annotations can be of huge value, in practice there is still a case for configuration files to complement them. In this approach, annotations on elements declare what belongs to the elements, while each use-case-specific configuration file makes use of the annotations and is, as a result, much simpler.


Degrees of freedom analysis

The concept of degrees of freedom looks so relevant to software development that I wonder why it is not considered more often. Fortunately Michael L. Perry dedicates a full section of his blog to that concept. In this post I will quote a lot; please consider that a sign of enthusiasm.

A common concept in maths

The concept of DOF is central to solving systems of linear equations, and in the main post of his series, Michael L. Perry starts by focusing on mathematical systems of linear equations:

A mathematical model of a problem is written with equations and unknowns. You can think of the number of unknowns as representing the number of dimensions in the problem.

Depending on the number m of equations compared to the number n of unknowns (variables), there are several cases:

  1. m > n: the problem is over-constrained and usually has no solution
  2. m = n: there is usually exactly one solution
  3. m < n: the system is under-determined, and the dimension of the solution set is usually equal to n − m, where n is the number of variables and m is the number of equations.

A common concept in mechanics

3 legs are enough to be stable, but tables usually have 4 legs anyway.

The concept of DOF is prevalent in mechanics. In particular, a system with more internal constraints than the total possible number of DOFs has no solution. However in practice it can still work, provided some of the bodies are not absolutely rigid.

The table is often given as an example, because it only needs three legs to be stable on the ground, but usually has 4 legs. This only works because the table is not fully rigid, and can accommodate the small imperfections of the ground.

Not yet common in software

In his post, Michael L. Perry explains in practice how to analyse software using DOF. First find the unknowns:

To identify the degrees of freedom in software, start by defining the unknowns. These are usually pretty simple to spot. These are the things that can change. In a checkbook program, for example, each transaction amount is an unknown, as are the account balance and the color used to display it (black or red).

Then find out the constraints between the DOFs.

Next, define the equations. These are the relationships between the unknowns that the software has to enforce. In the checkbook, the balance is the sum of all transaction amounts. And the color is red if the balance is negative or black otherwise.

Finally:

Subtract to find your degrees of freedom. One amount per transaction (n), one balance, and one color gives n+2 unknowns. The balance sum and the color rule give us two equations. n+2-2 = n degrees of freedom, one per transaction.
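Translated into code, this analysis suggests storing only the independent data (the transaction amounts) and deriving the rest; here is a minimal sketch of the checkbook, with names of my own:

// Minimal sketch: only the n transaction amounts are independent data; the
// balance and the display color are derived on demand, never stored.
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

class Checkbook {
    private final List<BigDecimal> transactions = new ArrayList<>(); // n degrees of freedom

    void add(BigDecimal amount) { transactions.add(amount); }

    BigDecimal balance() { // dependent data: computed from the transactions
        return transactions.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    String color() { // dependent data as well: derived from the balance
        return balance().signum() < 0 ? "red" : "black";
    }
}

Storing the balance or the color as extra fields would only add dependent data that must be kept consistent by hand, exactly the kind of redundancy that breeds bugs.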

What for?

Quoting again Michael L. Perry (across various posts in the DOF category):

Understanding the degrees of freedom in the software helps to create a maintainable design.

Adding independent data to a system increases its degrees of freedom. Adding dependent data does not. Adding an immutable field does not.

You want no more degrees of freedom in the system than the problem calls for.

The concept of degrees of freedom is remarkably useful to help distill the domain down to its essential variable parts and the constraints between them. Any extra independent data only creates opportunities for bugs.


Design your value objects carefully and your life will get better!

Many concepts look obvious because they are used often, not because they are really simple.

Small quantities that we encounter all the time in most projects, such as a date, a time, a price, a quantity of items, a min and a max… are hardly a subject of interest for developers, “as it is obvious that we can represent them with primitives”.

Yes you can, but you should not.

I have experienced through different projects that every decision to introduce a new value object, or to improve an existing one, even a very small and trivial one, was probably the design decision that yielded the most benefit in every further development, maintenance or evolution. This comes from the simple fact that using something simpler “all the time” just makes our life simpler all the time.

We often don't pay attention to the usual chair, however it was carefully designed.

As a reminder, value objects are objects whose equality is solely defined by their data; they have no identity of their own. Using value objects brings many typical benefits of Object Orientation, such as:

  1. One value object often gathers several primitives together, and one thing is always simpler that a few things
  2. The value object can be fully unit-tested: this gives a tremendous return, because once it is trusted there will simply be no bug about it any more
  3. Methods that encapsulate even the simplest expression behind a good name tell exactly what they do, instead of forcing us to interpret the expression every time (think of isEmpty() instead of “size() == 0”; it does save some time, each time)
  4. The value object is a place to put documentation such as the corresponding unit and other conventions, and to have them enforced. Static creators make creation easy and self-explanatory, can enforce the validations and help select the right unit (percent, meter, currency)
  5. The method toString() can just tell in a pretty-formatted fashion what it is, and this is precious!
  6. The value object can also define various abilities, such as being Comparable, being immutable, be reused in a Flyweight fashion etc.

And of course, when needed, it can evolve without breaking the code that uses it. I could not imagine a team claiming to be really agile without at least the ability to evolve its most basic concepts in an easy way.
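As a minimal sketch (illustrative, not from a particular project), even a tiny Quantity value object already exhibits most of the benefits listed above:

// Minimal sketch of a value object: immutable, equality defined by its data,
// self-explanatory creation and formatting.
import java.util.Objects;

public final class Quantity implements Comparable<Quantity> {
    private final int count;
    private final String unit; // e.g. "items", "%", "m"

    private Quantity(int count, String unit) {
        this.count = count;
        this.unit = unit;
    }

    // Static creator: self-explanatory, and a single place to enforce validation
    public static Quantity of(int count, String unit) {
        if (count < 0) throw new IllegalArgumentException("count must not be negative");
        return new Quantity(count, Objects.requireNonNull(unit));
    }

    // A good name instead of "count == 0" scattered across the code base
    public boolean isEmpty() { return count == 0; }

    // Side-effect free: returns a new value instead of mutating this one
    public Quantity add(Quantity other) {
        if (!unit.equals(other.unit)) throw new IllegalArgumentException("unit mismatch");
        return new Quantity(count + other.count, unit);
    }

    @Override public int compareTo(Quantity other) {
        if (!unit.equals(other.unit)) throw new IllegalArgumentException("unit mismatch");
        return Integer.compare(count, other.count);
    }
    @Override public boolean equals(Object o) {
        if (!(o instanceof Quantity)) return false;
        Quantity that = (Quantity) o;
        return count == that.count && unit.equals(that.unit);
    }
    @Override public int hashCode() { return Objects.hash(count, unit); }
    @Override public String toString() { return count + " " + unit; } // pretty and precious in logs
}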

The first time, it takes some extra time to build a value object, as opposed to jumping to a few primitives; but being lazy actually means doing less in the long run, not just right now.
