Agile France 2016 – Decentralized Architecture

At Agile France 2016 in Paris I ran an open-space session on Decentralized Architecture. This was an opportunity to collect various perspectives on a topic that I’ve been thinking about and discussing with colleagues over the last few years.

Architecture Definition Ambiguity

We started with a quick survey with all the attendees, who were asked to give “examples of architecture”, in any form and in their own interpretation of the word.

Going through the examples, the very first realization was that in practice there are many different visions of architecture:

  • “Preventing accidental complexity”
  • “Ensuring consistency”, “avoiding doing the same thing twice”, “coordination between multiple applications”
  • “The norms & blueprints: REST API, Service-Orientation…”
  • “The Main Decisions, especially the structuring decisions”
  • “The Vision, Consistency, Technical Debt management”
  • “The Coding guidelines: testable code, ability to extend…”
  • Psychological comfort: “We expect the architect to comfort the team and reassure them on the quality of their decisions”

Some of these definitions focus on the purpose, others on the solutions. We don’t mean the same thing at all when we use the word “architecture”. The definition of architecture is disturbingly ambiguous.

But this little survey was also a way for many to express their frustrations with doing architecture, and in particular with the architects.

The Architecture Frustration

The bottleneck effect

The most obvious complaint was that architecture as usually done by a central architecture group is a source of delays:

“architects slow us down by several weeks on any new project”.

As a result, it’s not uncommon for projects to game the rules to avoid going through the architects. For example, if every project beyond 100 man-days has to go through a mandatory architecture review, then a team with a genuine 120 man-day project may try to split it into two 60 man-day projects to go under the threshold.

Conflicting schools of thought

The other complaint is that “The architects impose arbitrary rules we disagree with, and that are detrimental to our project”.

Perhaps the team is just wrong, in which case the architects are trying to improve things but fail to convince. Or perhaps the architects are outdated in their worldview. In any case, when the teams and the architects follow different schools of thought, you can expect confusion, miscommunication and distrust.

Activity over Role

At this point of the session, another big realization was that we should not confuse the activity of “doing architecture” with the role of “the architect”. Taking care of the architecture is what matters most, regardless of whether you have the role of architect or not. This was the point put forward by my colleague Ly-Jia (@Ly_Jia) in her past talks on the topic.

Architecture as an activity over Architect as a role

Of course, when a team does not know how to architect (the activity), then they call an architect (the role) for help, who will bring their skills to solve the issues.

Architect as the Decision-Maker

While we mentioned before that some teams try to avoid dealing with their architects, others actually wish they had access to architects for advice, or even to make the decisions for them.

“We have no architect and we should have one to reduce our current mess”

“Without architect, it’s chaos.”

“We expect the architect to take the important decisions that are vital to the project”.

In this view, the architect is the decision-maker for the team. The team is then supposed to follow the decisions. The architects should check the conformity after the work is done, even if they usually have no time to do that.

The architect as the decision-maker comes at a cost: it slows you down. This is hard to accept when you want short time-to-market. If you knew how to make decisions yourself in the team, you could go faster.

When the team knows how to do architecture

When the team has all the skills then it’s perfect, at least for their project. Decisions are local, fast, and well-informed.

“We don’t have an architect and it’s fine”.

This is an ideal situation we should nurture and encourage. However in practice not every team has all the skills. For example when surveying what’s important about architecture during the session at the conference, hardly anyone mentioned Data Authority as a concern.

Teams are probably not self-sufficient in all necessary skills. They probably need training and external help from people more skilled in architecture.

Architect as the Trainer on Architecture Skills

Many teams need training in architecture, and the architect is often the right person to teach that to the teams. In this view, the architect does not take decisions, but acts as a trainer, until there’s no need for training any more. It’s a biodegradable role, but with many teams, some turnover and the evolution of architecture skills, don’t expect that role to become useless anytime soon.

In fact this role of training the team already happens to some extent and informally during the meetings with the architects, even when the architects are the decision-makers. If they explain the decision-making and the rationale behind the decisions, then the attendees can learn how to reason like the architect. And perhaps next time the team can decide on its own in confidence.

Codifying the Architecture Skills

In a company I worked for, I used to collect every principle behind our decision-making in a list we called “the Codex”. This was an attempt to codify the way architects make decisions.

There were rules like “always know where the authoritative source of data is”, or “don’t ever talk about solutions before you have stated the problem”, or “YAGNI: don’t build a framework, build only what you need (it’s up to the next project to decide if they want to extract a framework from your stuff)”.
The main problem with the Codex was that it was quickly growing bigger and bigger. Last time I looked there were 30+ principles listed.

Even if everything can work fine locally thanks to skilled teams, perhaps the global consistency can still be endangered by the local ignorance of the bigger context.

Taking care of the architecture within the team can work at small scale. At larger scale, we need specific skills and more importantly we need specific visibility over many applications: this cross-system knowledge becomes the rare resource, only known by a few people. In bigger companies the central architects usually play this role of bringing this visibility to local teams to help them make informed decisions. But they remain bottlenecks on the path of the project delivery.

Bottleneck-free Architecture

Communities of Practice are a standard solution to give all teams visibility on the overall state of the system. For example, delegates from each team meet in a 2-hour meeting every 2 weeks to exchange knowledge of the global architecture, the current stakes and challenges, what already exists somewhere else, etc. It’s also a place to harmonize what they call architecture and what is desirable under this name.

In this view, the architecture can be seen as “the invisible frame, defined by consensus, which states what we like the system to be like”, as Jonathan (@jonathanperret) expressed it.

In addition to Communities of Practice, tools can help too. For example, GitHub Enterprise and its built-in search engine offer a simple way to explore the large-scale galaxy of projects by searching for keywords. This is helpful to find out who else is working on “Refund policies”, in order to coordinate with them. And there is more we can do in this area to make the bigger context accessible to a larger audience.
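
As an illustration, here is a minimal sketch (in Java, using the standard HttpClient) of how such a keyword search could be scripted against the GitHub code search API. The host name, the token handling and the “Refund policies” keyword are assumptions to adapt to your own installation.

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    public class WhoElseWorksOnThat {
        public static void main(String[] args) throws Exception {
            // Hypothetical GitHub Enterprise host and token; adjust to your installation
            String apiRoot = "https://github.example.com/api/v3";
            String token = System.getenv("GITHUB_TOKEN");
            String keyword = URLEncoder.encode("Refund policies", StandardCharsets.UTF_8);

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(apiRoot + "/search/code?q=" + keyword))
                    .header("Authorization", "token " + token)
                    .header("Accept", "application/vnd.github+json")
                    .GET()
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            // The JSON response lists the repositories and files mentioning the keyword,
            // which is enough to find out who to talk to
            System.out.println(response.body());
        }
    }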

Large-Scale Governance

At larger scale, governance becomes a topic that matters. We want to manage the legacy, decide which components to decommission and which components to invest in to make them more useful.

At this level we often talk about a “master plan” (“schéma directeur”). This is usually done by committee, and it can take a lot of time to define. Such a master plan aims at optimizing the whole information system: its consistency, the data consistency, the investments in relation to the strategy, etc. This is again a bottleneck, and a large-scale one indeed.

If we want to avoid this bottleneck, then we want to favor local architects, embedded within teams or within small departments. They spend most of their time working with their colleagues, doing work and ideally writing code as well. And they also meet regularly with their peers to exchange and coordinate on company-wide blueprints, in a federated fashion.

Tools can help as well. Registry-like documentation, aka “application catalogues”, can for example enable anyone to get an overview of the overall system and its components.
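
To make the idea concrete, here is a tiny sketch of what one entry of such a catalogue could look like as code; the fields are purely illustrative, not any standard.

    import java.util.List;

    // One entry of a hypothetical applications catalogue; the fields are illustrative only
    public record CatalogueEntry(
            String applicationName,
            String owningTeam,
            String businessCapability,    // e.g. "Refund policies"
            List<String> dependsOn,       // names of the applications it calls
            String decommissionTarget) {  // e.g. "2026-Q4", or null if none planned
    }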

However at large scale these documents are unlikely to be accessible to anyone but architects, as they are so complicated, with tons of boxes and arrows, or huge lists of recommendations, all with specific jargon.

Another problem at this scale is that the quantity of information to consider to reason on the architecture vastly exceeds the ability of any single brain. It’s just not possible for an architect, let alone for each team member, to be aware of every application, SLA and other concerns in a big company.

Coordination-less Architecture

Going a bit more radical at this point: is it really so obvious that this desirable “consistency” is always a good thing? Did anyone ever really investigate this question? Similarly, is waste necessarily a bad thing?

When we talk about all the expected benefits of architecture we easily see the value (remove duplication, standardize all the things, decide on THE unique tool for each function…) but we hardly see all the costs.

In particular most people ignore or underestimate the huge coordination cost associated with consensus on large-scale decisions. Most people also ignore or underestimate the cost of one-size-fits-all kind of decisions.

For example, contrast how standards are made by consortia, such as CORBA or the various WS-* standards, with Internet standards, which require that at least two working implementations already exist before a standard is chosen.

In the second approach we accept some waste (work being done more than once) for the sake of proving that a good approach is really a good one. In the first approach, we try to be perfect upfront, at the expense of the coordination cost of a long and expensive process. This is waste too, just a different kind of waste.

Once we accept some waste, we can embrace some diversity too. “Amazon now has one single deployment tool, but it eventually emerged from a diversity of 3 or 4 that used to be accepted for years”. It was OK to have more than one tool for a given functionality. In particular, the choice of the standard tool did not delay the delivery of any project.

Of course this doesn’t mean that diversity should be infinite either. “At Spotify they run meetings to limit the number of technologies currently in use”. But it’s an evolving thing. Anyone should be able to try a new technology for a while, which may or may not become part of the standard technology list later.

On the governance of the applications, which is also an evolving thing, we could learn to renew each part more often. We could put an expiration date on each component. Or limit their maximum size to ensure they can be rewritten in a short period of time (microservices anyone?). This is in contrast with master plans that aim to predict the future and try to specify the “target” of the information system. In fact I don’t know any sensible architect these days who still believes you could specify the target of the information system in detail.

Emerging Architecture from Local Decisions

Going further, at large scale you don’t really have a choice in the way you do architecture. Consider Amazon, which now has 1400 services: at this scale, you just can’t do architecture the old-fashioned way, it’s impossible.

But you know that at Amazon each team lives by a limited number of strict rules that are absolutely mandatory: “2 pizza team”, “API-first” etc, as defined by Jeff Bezos in his famous memo. Few rules, strict, and promulgated by a dictator.

The challenge is to identify a set of dictatorial rules that do not constrain too much the freedom to deliver wonders, and that let the architecture emerge from purely local decisions.

It’s a challenge when you start a new company, and it’s another challenge when you already have a large legacy, both legacy systems and legacy culture. And nobody talks about the companies where the boss also decided the rules and which failed.

Are Micro-Services about self-organizing teams?

The same morning at Agile France, Emmanuel Gaillot (@egaillot) and Yannick Francois (@ya_f) had proposed a game on microservices where many teams had to build a game from many services, without any formal or central coordination. It was very interesting to experience how the many teams self-organize and how they find ways to avoid coordination as much as possible.

Ingredients for Emergence

How do we find the few rules that everybody has to abide by and that enable emergent properties out of a large system? How do we deal with a number of parts we can’t keep in our heads?

We don’t have all the answers, but I’d mention 3 main orientations: Purpose, Options, and Automation.

Purpose-Oriented

A rule that will apply to anyone at any scale should be a direct expression of a company goal or a direct consequence of its vision. And it should be abstract enough to be interpreted differently in various contexts.

Option-Oriented

One key example of a general rule is “create (cheap) options for the future”. This one is so general that it could work pretty much everywhere. Perhaps that’s what Jeff had in mind with his API-First principle: “Whatever you build must give us the option to sell it to the outside world if we wish to”.

Conformity Automation

The third aspect is automation. If you really want fault-tolerance, then you should test for it. If you can’t do that by hand on a large system, robots can. This is what Netflix famously achieved with their Simian Army. They literally managed to automate a large part of checking their Architecture conformity with bots:

  • Latency Monkey induces artificial delays.
  • Conformity Monkey finds instances that don’t adhere to best-practices and shuts them down.
  • Doctor Monkey taps into health checks to remove unhealthy instances.
  • Janitor Monkey searches for unused resources and disposes of them on the cloud.
  • Security Monkey finds security violations or vulnerabilities and terminates the offending instances.
  • 10-18 Monkey detects configuration and run time problems in instances serving customers in multiple geographic regions.
  • Chaos Gorilla is similar to Chaos Monkey, but simulates an outage of an entire Amazon availability zone.

This automation forces everyone in each team to own the concerns enforced by the Simian Army, which also happen to be direct requirements. That’s a good thing. We’re not that far from a genuine Test-Driven Architecture at this stage.
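
This is not Netflix’s actual tooling, just a toy sketch of the underlying idea of a conformity bot: walk through the running instances and flag every one that violates a named rule. The Instance type and the rules themselves are made up for the example.

    import java.util.List;
    import java.util.function.Predicate;

    // Made-up instance description; a real bot would query the cloud provider's API
    record Instance(String id, boolean inAutoScalingGroup, int healthCheckPort) {}

    class ConformityBot {

        // Each conformity rule is just a named predicate on an instance
        record Rule(String name, Predicate<Instance> isConform) {}

        private final List<Rule> rules = List.of(
                new Rule("must belong to an auto-scaling group", Instance::inAutoScalingGroup),
                new Rule("must expose a health check port", i -> i.healthCheckPort() > 0));

        void audit(List<Instance> runningInstances) {
            for (Instance instance : runningInstances) {
                rules.stream()
                     .filter(rule -> !rule.isConform().test(instance))
                     .forEach(rule -> System.out.printf(
                             "Instance %s violates rule: %s%n", instance.id(), rule.name()));
                // A real bot would notify the owning team, or shut the instance down
            }
        }
    }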

Closing

Thanks everyone for the discussions in this session!


How to fail an Open Space session

And the answer is: when you try to change the format until it’s not really an open space session any more.

Since 2011 I’ve been running and facilitating dozens of open space sessions at many conferences across various countries: at the Software Craftsmanship Meetup in Paris, at Socrates Germany, Socrates France, at Bucharest’s ITAKE, Agile France and so forth. It always worked fine, except twice: precisely the two times I tried to mix the open space format with a more structured one.

A word on Open Space technology

Open Space technology is a beautiful format. Basically it’s all about people interested in a common topic joining to discuss it together, usually with a flip chart. It comes with rules that actually remind participants that they are the actors of what is going to happen, and these rules shape the expectations:

  • Whoever comes is the right people
  • Whenever it starts is the right time
  • Wherever it is, is the right place
  • Whatever happens is the only thing that could have, be prepared to be surprised!
  • When it’s over, it’s over (within this session)

And there’s the most important rule : the “Law of two feet”:

If at any time during our time together you find yourself in any situation where you are neither learning nor contributing, use your two feet, go someplace else.

Note that there is no moderator role, just a facilitator role to recall the rules and organize the very basics like time and space. Anyone can hijack a session, and the law of two feet is supposed to counterbalance that.

The form does not promise much and there’s no guarantee you’ll be happy, but because it’s fully interactive you can do the steering yourself by suggesting, reacting, asking questions or answering directly. In practice this means you’re quite likely to be happy at the end. Except when you take risks and try to change the format.

A Tale of Two Disasters

At Socrates Germany 2014 I had explained monoids to many attendees. At the same Socrates the next year in Soltau I proposed a session to go further, which I called the ”Monoid Safari”. It was supposed to be a collective exploration to identify more examples of monoids in the business domains of the attendees.

I came prepared with a large slide deck “just in case”. I had it ready just in case I had to remind people of what a monoid is, or to show some of the examples of monoids I had already collected and described in the past. I wanted an open space session, but I had a full content for a talk as a backup. It should have been safe.

This one wasn’t really a disaster, but it was not a success either. Most attendees had no idea what a monoid was, and they came to learn that. I quickly found out that my plan was not working, so I tried to explain monoids quickly in order to carry on to the safari. But questions kept coming: “do you have examples?”. “Oh yes I have some, here they are”. So I jumped to the most appropriate slides to show some of the examples, quickly, still hoping to come back to the safari right after.

A few attendees got it fast and suggested interesting examples, but not many. At the end of the session, I was disappointed by the safari, while the overall feeling of the audience was that I could have explained the concept of monoids better. #fail

More recently, at DDD Europe in Brussels I had proposed an ambitious session called ”Bounded Contexts: the Illustrated Bestiary”. I consider myself knowledgeable in DDD and Bounded Contexts, which also means I have open questions on the edge of this topic. I thought this conference dedicated to DDD was the best place to find knowledgeable peers to discuss that and try to answer some of the puzzles. So I proposed this session, precisely because I do not have all the definitive answers.

There was an abstract, which was not meant for everyone:

From one entry in the Blue Book, Bounded Contexts have since evolved into a bestiary of mythical creatures like Bubble Context, Interchange Context or even “Uber-Context”, each with different rationales that somehow challenge their initial definition. We now have recurring relationships between them like in the Collaborative Construction pattern. We now know that Bounded Contexts are a solution thing, not a problem thing, but some confusion remains, and we can make the matter even more confusing with crazy questions like ”Can Bounded Contexts be nested?”, ”Are Aggregates mini Bounded Contexts?’ or ”Is it useful to say that legacy UI and DB layers are their own Bounded Contexts?”.

During this semi-structured Open-Space session, every attendee can contribute examples or feedback, ask questions and share their ideas and opinions on this topic. Contributions in any form (slides, pictures, code…) are also welcome prior to the session and will be credited.

Given the topic is very abstract I had in mind to run for 1 hour, no more. And I was expecting just a few attendees. But the program generously planned two hours, and more than 40 people came in early, sitting in a very large circle!

Again I came prepared with a slide deck covering each example and concept mentioned in the abstract, “just in case”. From the start it appeared that many attendees did not clearly know what a Bounded Context was, and they came to learn that more in depth. So I started to explain, quickly. As I felt tied to the abstract, I tried to cover all the questions, quickly. It was uncomfortable switching between lecture style and open space style every 5 minutes. At this pace I was also speaking too fast, which combined with my French accent is a recipe for disaster. And it was a disaster!

The feedback from the attendees was quite harsh, and it illustrates how the expectations were not clear: it was not a talk, and it was not an open space either:

  • “Even for an open session it could have been beter structured. At the end, I do not know if I learned something.”
  • “Although it was open space, it seemed like the speaker didn’t prepare anything. The crowd was waiting for some definitions, explanations to open the discussions, but these never came…”
  • “Speaker clearly did not master the topic. Also, the open space form was unknown to most of the participants, it was bland”
  • “The speaker wanted to organise a structured open space, but this approach had three problems: the speaker was not very understandable,  did not reign in the discussion, and he let loud-mouth participants walk all over him”
  • “Due to the fact, that there were no experts, no questions were answered. Instead I left with more questions than before. The content was with e.g. “bubble context” very specific and it was hard to discuss with the others without having knowledge about this.”

Some did appreciate, but still a disaster:

  • “Very interesting topic. However, the group was way too big to have a useful conversation. Left the session after 30 minutes.”

Some offered suggestions:

  • “be a bit more clear on what the format of the open space is”
  • “I dont think it was the problem of the speaker. But he expected a MUCH higher level of the audience. Wasnt really prepared to give a lecture about bounded contexts. A warning sign “only for experienced attendees” on the schedule would have help immensively.”
  • “Split the group into multiple small groups to discuss the topics”

It’s called a training?

Thinking about it, the format I was looking for exists and I know it well: it’s just called a training, not an open space. And that’s funny, since that’s already the way I do trainings! Although it may look conversational, it remains actively directed by the trainer. In the ideal situation, the trainer does not have to talk that much, as the learning happens thanks to exercises, discussions and coding. But still, this is not open space.

However a training assumes that the trainer knows the topic completely. So what do we do when we want to create opportunities to discuss topics precisely to clarify open questions and to explore the boundaries of our knowledge? When there’s a prerequisite level of knowledge just to listen and follow what’s happening? So far the best opportunities I’ve found were in conference hallways, during random discussions with speakers and attendees, unfortunately always too short and usually without taking notes. Advanced topics can of course be discussed during open spaces, as regularly happens at various Socrates un-conferences, but it’s usually in small groups, the topics are broadly defined, e.g. “Combinators in FP”, and the answers usually exist in the literature.

I’ll try again, but without pretending it’s open space when it’s not really. Or as a pure open space, hence without any commitment or hard promise, to relax the expectations. I’ll probably put myself in danger again from time to time though :)

Make Money vs Reduce Risks dichotomy

In sports, football for example, players have only one goal in mind: score, score, again and again, as often as possible. Close to them, but not too close, the referees have only one goal in mind: quickly detect every violation of the rules of the game, and sanction it.

The players know the rules, yet we still need an antagonistic role, the referees, to keep the game fair. It is never perfect, but it is not plain chaos either.

Many mature businesses have chosen a similar structure. There is a role to make money, as much money as possible, and another role to keep risk under an acceptable level.


In finance, this pattern is visible at several levels. Traders and sales people in the front office focus on making money, while officers in the risk department closely monitor their activity to check they don’t go too far. We hear a lot in the news when traders go too far. We don’t hear much when the risk people go too far, yet reducing risk usually hurts profitability in the short term.

This pattern occurs again between the banks, which want to make money, and the regulators, who are supposed to protect the country and the customers. Whether the regulators do a good job or not is not the point of this article; my point is that there is a common business pattern here.

When there is a common business pattern, and when the business is heavily supported by software systems, does this mean there is a corresponding pattern in the software itself? I believe there is, a bit like a generalized Conway’s Law. The corresponding software pattern is: when the business has an obvious antagonism like “Making Money vs Reducing Risk”, then it probably calls for two distinct Bounded Contexts in the corresponding software.

This dichotomy is not a rule, it is just a heuristic suggesting there may be a need for two distinct Bounded Contexts.

The question of who the key decision maker is probably shapes everything. I learnt that a few years ago in a course with @ziobrando. In particular, when two management hierarchies are involved, even if their visions coincide right now, it’s unlikely that both visions will evolve the same way over time. This is a reason to split the solution into two Bounded Contexts that will evolve independently. So if you have a Trading department and a Risk department, you’re in this situation.

Modeling in the two contexts

Making money typically involves good commercial relationships and a competitive pricing expertise, plus enough speed to react to opportunities.

The software systems for that typically manage the business one deal at a time. They often need to be real-time, or at least fast enough not to lose impatient customers. Sometimes we may even accept trading calculation accuracy for speed, for example by using floating-point calculations instead of Big Decimals, or an approximation instead of the exact formula.

Software systems to support making money need to help the sales people be fast, for example with rich defaulting of the input values.

By contrast, software for the officers who want to reduce or control risk often computes risk metrics out of a lot of deals. It may be fraud analysis, or stress tests simulating a market crisis. It is often just computing the overall risk taken by summing up the numbers from each deal. Some do that in real time too, but usually it can accommodate much slower paces: on-demand, daily, weekly or even monthly.


Sometimes the competition is so tight that risk control becomes the key differentiator for making money between competitors. In this situation risk control has become another miniature sub-domain within the domain of selling and pricing. Still, it has its own risk-oriented perspective on the business, and it is like a delegation of responsibility from the risk officers to the front office people and their trading bots. Even in this situation there will also be a full-featured domain of risk control outside, with the corresponding software in its own Bounded Context.

A developer example

DevOps is the classic example in software development: developers want to release often to deliver more value, while ops people know that each release comes with risks for production, so they traditionally prefer to release less frequently. “No release, no risk” would be ideal for them.


In this scheme, developers and ops teams use different tools and don’t monitor the same indicators. When they get closer, as in DevOps, the ops usually delegate some risk control to the development team and their automated testing tools, but they keep their own expertise and their specific tooling.

Many thanks to @ziobrando and @mathiasverraes for the early feedback and some complements incorporated into the text.


The Actual vs Plan Dichotomy

I’ve once worked for the IT department of a large restaurant network. They had a secret sauce, and this wasn’t one you could eat.

In a restaurant kitchen, when a customer orders a dish from the menu, you have to cook it. You take the recipe with its list of ingredients and their quantities, and you prepare the dish by following the instructions. In theory, if the recipe says you need a quarter of a lemon, then you should be able to cook 4 dishes with one lemon. Unfortunately, when you analyze the actual consumption of ingredients over a day, you realize that in practice you actually made 20 dishes with 6 lemons. That’s 3.3 dishes per lemon! Life is not as nice as theory; in real life you have waste. A quarter of lemon falls on the floor and you can’t use it. A lemon was too small so you had to use a “bigger quarter”, and there are only 3 bigger quarters in one lemon. And the last lemon could not be used completely because the number of dishes actually ordered is not a multiple of 4.


The restaurant network I worked for was very good at analytical accounting. This was their secret sauce. They were actually monitoring the usage of lemons, among other things. That was key to their profitability, because businesses like restaurants have quite narrow margins. It was important for them to understand the gap between the recipe and the actual cooking of the dishes.

It turns out that tracking the gap between what was planned (the recipe) and what is actually done (the cooking) is important for lots of businesses in all domains.


In manufacturing, you have the bill of materials, the BOM, that lists what is necessary to build a product. The BOM states that you need 12 screws to close the enclosure. However in practice you may lose one screw and end up consuming 13, not 12. If the tracking report shows this situation happens often, and if the screws are expensive enough, then some engineers will want to improve something on the production line to prevent it. You may automate, or make the screwdrivers magnetic.

In electronic hardware manufacturing, you build silicon wafers that will yield thousands of chips. That is the theory, before the quality control where you discover how many of them are defective. This kind of manufacturing is expensive, so you want to monitor that.

The point so far is that mature businesses have lots of competitors, and they often compete on continuous optimizations of the delivery process. You need tracking for that. That’s a recurring business problem, and you need software for that.

So how do we design such software in this situation?

The answer is that you most likely have two distinct Bounded Contexts: one for describing the plan, and another for tracking the gaps between the actual and the plan. And even if the two contexts often seem to deal with the same real-life concepts, they have in fact no reason to look similar.

The one context approach

At first, “the recipe from the book” and “the recipe as I’ve actually done it” look like just the same thing. That’s why a common approach is to mix both models into one. I think this is a poor hack.

For example, project management tools focus on the theoretical plan, and then add extra fields to track the time actually burnt, directly on the elements of the plan. This may work as long as it’s very simple. The further we move into complexity, the less sense it makes to keep the two models aligned. With a TODO list, “Done!” might be enough, but as the tracking becomes more complex, it will grow its own tracking language.

Different intentions

Modeling the recipe and tracking the actual cooking of the dish look similar, but they are totally different in their intentions. The recipe aims at telling how to prepare the dish, including instructions on how to stir or how to cut the vegetables. The tracking focuses on waste, on errors, or on improving the purchase process or the supply chain management. That’s a strong reason to call for different points of view.

Other differences

For tracking you may not be able to observe all details. The recipe talks about quarters of lemon, but a quarter of a lemon does not exist on the market. You can only measure the number of lemons you bought, and the number of remaining lemons at the end of the day.

What actually happens during cooking, during a manufacturing process, or during a project can be totally different from the plan. Perhaps there was no more lemon, so the cook exceptionally had to substitute some lemon juice, an ingredient that is not even in the recipe. A good tracking model should accommodate the tracking of this kind of event.

On a similar note, for steering purposes, a coarse level of granularity is often enough. You can neglect details from the process. Perhaps a weekly or monthly inventory of the remaining ingredients is enough. Tracking the actual consumption of ingredients dish by dish would be a total waste of time!

Modeling

Modeling the plan, the bill of material, the manufacturing steps etc. is not the challenging part of the Actual vs Plan Gap Tracking relationship.

Typically the tracking context needs its own distinct model, a totally different one from the planning model. For tracking cargo shipments, for example, you’d probably track loading and unloading events at various ports: the tracking model would be a journal of events. For tracking the consumption of materials, the tracking model would be a time-stamped snapshot of the current inventory.
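
Here is a small sketch of these two flavors of tracking model; the types and field names are illustrative only.

    import java.time.Instant;
    import java.util.Map;

    // Tracking as a journal of events, e.g. for cargo shipments
    record CargoEvent(String cargoId, String eventType, String port, Instant occurredAt) {}

    // Tracking as a time-stamped snapshot, e.g. for ingredient consumption
    record InventorySnapshot(Instant takenAt, Map<String, Integer> remainingQuantityByIngredient) {}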


The tracking model may link to the plan. This case is a bit like the Knowledge Level pattern (Fowler): the plan is the Knowledge Level, and the tracking of the actual production would be the Operation Level. Here the Knowledge Level defines the ideal case, but the actual behavior at the Operation Level will regularly diverge from the ideal, because this discrepancy is precisely what we want to track.


The tracking can also live in its own bubble, with no link to the plan. If that’s the case, a separate reconciliation mechanism will do the comparison to highlight the gaps.

A developer example

As usual, as developers we are already familiar with this actual vs plan dichotomy: it’s just called testing.

A test defines the plan: that’s the expected value, or the expected behavior on a mock. A test also observes the actual result of the code and tracks every gap between them in the test results report. It’s obvious that the model of the plan and the model of tracking the actual results are different, and use a totally different language.

In this area, simulation testing also uses a reconciliation mechanism a posteriori to track the discrepancies.

Another example is simply software project planning and tracking. With software projects we know very well that forcing a match between plan and execution is a really dumb idea.

Conclusion

The key thing in this stereotypical relationship is to be fully aware of the two distinct contexts. Avoid the trap of mixing both contexts, unless it is a deliberate and conscious decision. Usually recognizing the two contexts will lead to the conclusion that distinct models are needed, and this will make everything simpler.

Many thanks to @ziobrando and @mathiasverraes for the early feedback and some complements incorporated into the text.


Canary tests

Canary tests are minimal tests to quickly and automatically verify that everything you depend on is ready. You run canary tests before other time-consuming tests, and before wasting time investigating your code when the other tests are red. If the canary test fails, you know you have to fix something in the environments first.

This idea of Canary test is different from the Canary Deployment. In Canary Deployment you deploy to a small fraction of your users to check everything’s fine before rolling out to more users.

Save time by checking what should be always OK

Canary tests check for the obvious and frequent sources of issues, such as:

  • connectivity to network: firewall rules ok, ports open, proxy working fine, NAT, ping below a good threshold
  • Databases and middleware are up
  • disk quota for logs not almost full
  • every needed login and password is valid
  • installed software available in the right version: dll installed, registry set-up, environment variables set, user directories all exist, the frameworks and OS versions are fit, timezone and locale are as expected
  • reference data integrity and consistency (dates, valuations…) are ok
  • Database schema and audit of applied scripts are as expected
  • Licences are not expired (there is usually a way to check that automatically)
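
As a sketch of what a couple of these checks could look like in code, assuming JUnit 5 and hypothetical hosts, ports and paths to adapt to your own environment:

    import static org.junit.jupiter.api.Assertions.assertTrue;

    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.nio.file.FileStore;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import org.junit.jupiter.api.Test;

    class CanaryTest {

        @Test
        void databaseHostIsReachable() throws Exception {
            // Plain TCP connectivity check against a hypothetical database host and port
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress("db.example.com", 5432), 2000);
                assertTrue(socket.isConnected(), "database port should be reachable");
            }
        }

        @Test
        void logDirectoryIsWritableAndNotFull() throws Exception {
            // Hypothetical log path; checks write permission and remaining disk space
            Path logDir = Path.of("/var/log/myapp");
            assertTrue(Files.isWritable(logDir), "log directory should be writable");
            FileStore store = Files.getFileStore(logDir);
            assertTrue(store.getUsableSpace() > 100L * 1024 * 1024,
                    "at least 100 MB should remain for logs");
        }
    }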

Canary tests should run regularly, ideally before any expensive tests like end-to-end tests. Of course you also want to run them whenever there is trouble somewhere, before wasting time on manual investigations in your code when the expected environment is not fully available.

Even at the code level, a canary test is just a trivial test to verify that the testing framework works correctly, as mentioned by Marcus on his blog:

	assertTrue(true)

Don’t forget to verify that your tests can fail too!

Simple and low-maintenance

The canary test tools should not assume much from the application. They must be independent from new developments to be as stable as possible. They should require little to no maintenance at all.

One way to do that in practice is to simply scan configuration files for every URL, password and just ping them one by one against a predefined time threshold. Any log path mentioned in the configuration files can be scanned and checked for the required write permissions and available disk space. Any login and password can be checked, even though this may be more complicated.
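
For example, here is a rough sketch of that practice, not tied to any particular framework; the config directory, the .properties extension and the 2-second threshold are all assumptions:

    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.util.stream.Stream;

    public class ConfigUrlCanary {

        private static final Pattern URL_PATTERN = Pattern.compile("https?://\\S+");

        public static void main(String[] args) throws Exception {
            // Scan every .properties file under a hypothetical config directory
            try (Stream<Path> files = Files.walk(Path.of("config"))) {
                List<Path> propertiesFiles = files
                        .filter(p -> p.toString().endsWith(".properties"))
                        .toList();
                for (Path file : propertiesFiles) {
                    for (String line : Files.readAllLines(file)) {
                        Matcher matcher = URL_PATTERN.matcher(line);
                        while (matcher.find()) {
                            ping(matcher.group(), 2000);
                        }
                    }
                }
            }
        }

        private static void ping(String url, int timeoutMillis) {
            try {
                HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
                connection.setConnectTimeout(timeoutMillis);
                connection.setReadTimeout(timeoutMillis);
                System.out.printf("%s -> HTTP %d%n", url, connection.getResponseCode());
            } catch (Exception e) {
                System.out.printf("%s -> NOT REACHABLE (%s)%n", url, e.getMessage());
            }
        }
    }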

Canary tests are documentation too

Doing canary tests may require explicit declarations of expectations, e.g. an annotation AssumedPermission(‘777’) to declare the permissions required on the files referenced in the configuration files. Alternatively you may rely on a Convention Over Configuration principle: for example, every log.*.path variable is assumed to be a log path, to be checked against some predefined expectations like being writable and having enough disk quota.

When you add canary tests, this automation itself is a form of documentation that makes assumptions more explicit.

You could export a report of every canary test that has been run into a readable form that can become part of your Living Documentation.


Øredev 2013 – What you probably missed

Øredev 2013 was last week, and it was fantastic!

Sharing knowledge

Øredev is in Malmö, Sweden. It’s very close to Copenhagen, so you can fly there and then take a 20-minute train to Malmö.

It’s a fantastic conference, totally vendor-neutral (that’s very important). It’s big yet friendly, with a mix of well established topics and more experimental ideas. This year the theme was “The Arts”, and as a result it was deliberately provoking or weird in some aspects, and that is a good thing!

For me the highlights of the conference were the radical ideas brought by two guys with some experience in the business: Woody Zuill and Fred George (I’ll come to it in a minute). I also enjoyed a lot how Jessica Kerr @jessitron manages to make alternative ways of thinking more accessible and attractive for developers using mainstream languages like Java or C#. Unfortunately I missed @Bodil’s talks because the room was too packed to even open the door…

Before the conference you can attend 1-day trainings, and I decided to attend the Value-Driven Product Development course by JB Rainsberger @jbrains. It’s a very good course, more advanced and probably not for beginners. I knew a lot about BDD and had attended other courses already, yet I still learnt a lot during this workshop. I missed the other talks from JB, but I want to watch the videos since I had very good feedback from other attendees.

It was interesting to listen to experience reports (New Frontiers For In-House Legal Practice by Kate Sullivan, Data @ King – How we are able to analyze 100M DAU by Mats-Olov Eriksson, Curiosity killed the cat, but what kills curiosity? by Ann-Marie Charrett @charrett, Less is more! – when it comes to art and software by @JimmyNilsson) with anecdotes and honest accounts of successes, failures and evolutions of mindset.

Radical Ideas

The Øredev program committee likes to take risks and challenge the way we think about software, as demonstrated by Woody’s and Fred’s talks, but also through the talk Code as a crime scene by Adam Petersen Tornhill @AdamTornhill. Adam tries to reuse forensic methods from crime investigations to help with large legacy code bases. He built the tool CodeMaat to visualize likely aggressions on the code base based on these ideas.

More radically, Woody Zuill @WoodyZuill talked about the Mob Programming approach his team has been practicing for some time now. He does not claim you should do the same; he explains that this approach is just the result of doing more of the good things found during retrospectives. His team found that working together on one task at a time, on one single machine, was good, so they decided to do that all the time. You must watch his talk: Mob Programming, A Whole Team Approach (Roy “Woody” Zuill). It includes a time-lapse video and is very interesting. It also challenges the way we think about work. What if the actual “work” was what happens between what we usually call “work”?

Very radical too, Fred George @fgeorge52 talked about his approach of Programmer Anarchy, “because that’s what it is”. He has now replicated the experiment at two different companies, including a rather traditional one (the Daily Mail newspaper), and is starting again in yet another. Again, he does not claim you should do the same, just that it works for them. Again using the power of retrospectives, they got rid of every role except the customer and the programmer roles. They don’t use the usual software craftsmanship practices like testing and refactoring. However they take great care of the business domain, just like a trader and a developer working closely together can end up giving suggestions to each other, in both directions. As Fred says: “Power to the programmer!”

This approach works thanks to the use of Micro-Services. This style of architecture in itself is also a bit radical, with a “rapid”, an ordered bus of all the events of the whole system, and a lot of very small, cohesive, disposable micro-services that listen and publish to the bus. You can copy-paste a service to create another, you can rewrite a service rather than make changes, you can plug your new service directly into production! It may sound chaotic but in my opinion this style is disciplined indeed.

Woody gave another talk, No Estimates: Let’s explore the possibilities (Roy “Woody” Zuill). It’s really a beautiful talk, thanks to the beautiful illustrations by his wife Andrea. Woody does a great job at making us question our need for estimates, what it really means and how it can harm. More importantly, he suggests that estimates are an obstacle to delivering something truly wonderful!

I was lucky to spend some time talking with Woody and Fred, and what they do is very exciting. It’s a paradox, but both still really follow the agile values, despite taking huge liberties with the usual principles and practices. Both Fred and Woody also know a lot about object-oriented principles and made sure their teams were skilled in that too. However, in each case the experiments are also biased by the very presence of outstanding developers like Fred or Woody!

Testing is not just checking

Software development requires a mix of many different skills. Some of the important skills revolve around testing. At Øredev you could listen and talk to some of the most notable representatives of the testing community: Heuristics of Testability (James Bach) @jamesmarcusbach, Regression Obsession (Michael Bolton) @michaelbolton, Balancing ATDD, GUI Automation and Exploratory Testing (Michael Larsen) @mkltesthead, Curiosity killed the cat, but what kills curiosity? (Ann-Marie Charrett) @charrett. Other talks (The Beauty of Minimizing Effort and Maximizing Creativity While Integrating Performance Throughout the Lifecycle by Scott Barber and The Psychology of Testing by Miško Hevery) were also about testing.

I realized that testing is much much more than just checking facts. There is a whole universe of testing practices that you are probably not even aware of, and most of this universe cannot and should not be automated.

Software development is a creative job!

As part of the theme “The Arts” some talks were not about software development. I really loved the talk Shakespeare in Dev (Thomas Q Brady) and the opening keynote of the second day “The Creativity (R)Evolution” by Denise Jacobs @DeniseJacobs. Denise managed to trigger the desire to write, talk and share insights from many attendees in the room during her keynote!

My talk

I was excited to talk at Øredev on Friday after lunch: Refactor your specs! (Cyrille Martraire). The room was almost full, which may suggest that the topic is of interest to many. As a speaker I loved the professionalism of the staff doing the video, sound and organization all around, so that everything runs smoothly for everyone. Thanks a lot to you all! Overall my talk was well received and I got many good questions and very good feedback. As I said, this talk is just the beginning of a conversation that will go on, so feel free to contribute.

All the Øredev videos are available on this page: http://oredev.org/2013/videos (still not complete at the time of writing), so have fun and enjoy them all! Also have a look at the #oredev hashtag on Twitter for more quotes, and don’t forget to follow me at @cyriux on Twitter!



TDD Vs. math formalism: friend or foe?

It is not uncommon to oppose the empirical process of TDD, together with its heavy use of unit tests, to the more mathematically based techniques, with “formal methods” and formal verification at the other end of the spectrum. However I experienced again recently that the process of TDD can indeed help discover and draw upon math formalisms well-suited to the problem at hand. We then benefit from the math formalism for an easier implementation and better confidence in its correctness.

It is quite frequent that maths structures, or more generally “established formalisms” as Eric Evans would say, are hidden everywhere in the business concepts we need to model in software.

Dates and how we take liberties with them for trading of financial instruments offer a good example of a business concept with an underlying math structure: traders of futures often use a notation like ‘U8’ to describe an expiry date like September 2018; ‘U’ means September, and the ‘8’ digit refers to 2018, but also to 2028, and 2038 etc. Notice that this notation only works for 10 years, and each code is recycled every decade.

The IMM trading floor in the early 70's (photo CME Group)

In the case of IMM contract codes, we only care about quarterly dates on:

  • March (H)
  • June (M)
  • September (U)
  • December (Z)

This yields only 4 possibilities for the month, combined with the 10 possible year digits, hence 40 different codes in total, over the range of 10 years.

How does that translate into source code?

As a software developer we are asked all the time to manage such IMM expiry codes:

  • Sort a given set of IMM contract codes
  • Find the next contract from the current “leading month” contract
  • Enumerate the next 11 codes from the current “leading month” contract, etc.

This is often done ad hoc with a gazillion functions for each use case, leading to thousands of lines of code that are hard to maintain because they involve parsing the ‘U8’ format every time we want to calculate something.

With TDD, we can now tackle this topic with more rigor, starting with tests to define what we want to achieve.

The funny thing is that in the process of doing TDD, the cyclic logic of the IMM codes struck me and strongly reminded me of the cyclic group Z/nZ. I had met this strange maths creature at school many years ago, I had a hard time with it by the way. But now on a real example it was definitely more interesting!

The source code (Java) for this post is on Github.

Draw on established formalisms

Thanks to Google it is easy to find something even with just a vague idea of how it’s named, and thanks to Wikipedia, it is easy to find out more about any established formalism like Cyclic Groups. In particular we find that:

Every finite cyclic group is isomorphic to the group { [0], [1], [2], …, [n − 1] } of integers modulo n under addition

The Wikipedia page also mentions a concept of the product of cyclic groups in relation with their order (here the number of elements). Looks like this is the math-ish way to say that 4 possibilities for quarterly months combined with 10 possible year digits give 40 different codes in total.

So what? It sounds like we could identify the set of the 4 months with a cyclic group, the set of the 10 year digits with another, and that even the combination (product) of both also looks like a cyclic group of order 10 * 4 = 40 (even though the addition operation will not be called like that). So what?

Because we’ve just seen that there is an isomorphism between any finite cyclic group and the cyclic group of integers of the same order, we can just switch to the integer cyclic group logic (plain integers and the modulo operator) to simplify the implementation big time.

Basically the idea is to convert from the IMM code “Z3” to the corresponding ‘ordinal’ integer in the range 0..39, then do every operation on this ‘ordinal’ integer instead of the actual code. Then we can format back to a code “Z3” whenever we really need it.
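
Here is a minimal sketch of that conversion in Java (the helper names are mine, not necessarily those of the code on GitHub), assuming the quarterly month codes H, M, U, Z and a single year digit, hence 40 codes in total:

    public final class ImmCode {

        private static final String MONTH_CODES = "HMUZ"; // March, June, September, December

        // "U8" -> ordinal in 0..39: year digit * 4 + index of the month code
        public static int toOrdinal(String code) {
            int month = MONTH_CODES.indexOf(code.charAt(0));
            int yearDigit = Character.getNumericValue(code.charAt(1));
            return yearDigit * 4 + month;
        }

        // ordinal -> "U8", the inverse conversion
        public static String fromOrdinal(int ordinal) {
            int normalized = Math.floorMod(ordinal, 40); // stay within the 40-code cycle
            char month = MONTH_CODES.charAt(normalized % 4);
            int yearDigit = normalized / 4;
            return "" + month + yearDigit;
        }

        // The "next contract" is now just +1 modulo 40
        public static String next(String code) {
            return fromOrdinal(toOrdinal(code) + 1);
        }
    }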

Do I still need TDD when I have a complete formal solution?

I must insist that I did not come to this conclusion that easily. The process of TDD was indeed very helpful in not getting lost in every possible direction along the way. Even when you have found a formal structure that could solve your problem in one go, even in a “formal proof-ish fashion”, then perhaps you need fewer tests to verify the correctness, but you sure still need tests to think about the specification part of your problem. This is your gentle reminder that TDD is not about unit tests.

Partial order in a cyclic group

Given a list of IMM codes we often need to sort them for display. The problem is that a cyclic group has no total order, the ordering depends on where you are in time.

Let’s take the example of the days of the week that also forms a cycle: MONDAY, TUESDAY, WEDNESDAY…SUNDAY, MONDAY etc.

If we only care about the future, is MONDAY before WEDNESDAY? Yes, except if we’re on TUESDAY. If we’re on TUESDAY, MONDAY means next MONDAY hence comes after WEDNESDAY, not before.

This is why we unfortunately cannot just implement Comparable to take care of the ordering. Because we need a partial order that is aware of a reference IMM code, we have to resort to a Comparator that takes the reference IMM code in its constructor.

Once we identify that situation with the cyclic group of integers, it becomes easy to shift both operands of the comparison to 0 before comparing them in a safe (total order-ish) way. Again, this trick is made possible by the freedom to experiment given by the TDD tests. As long as we’re still green, we can go ahead and try any funky approach.
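
A sketch of such a reference-aware Comparator, reusing the toOrdinal helper from the previous sketch and the cycle of 40 codes:

    import java.util.Comparator;

    public class ImmCodeComparator implements Comparator<String> {

        private final int referenceOrdinal;

        public ImmCodeComparator(String referenceCode) {
            this.referenceOrdinal = ImmCode.toOrdinal(referenceCode);
        }

        @Override
        public int compare(String left, String right) {
            // Shift both operands so that the reference code becomes 0,
            // then the usual total order on integers applies
            int shiftedLeft = Math.floorMod(ImmCode.toOrdinal(left) - referenceOrdinal, 40);
            int shiftedRight = Math.floorMod(ImmCode.toOrdinal(right) - referenceOrdinal, 40);
            return Integer.compare(shiftedLeft, shiftedRight);
        }
    }

Sorting a list of codes with new ImmCodeComparator("H6") would then list them in the order they occur from March 2016 onward.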

Try it as a kata

This example is also a good coding kata that we’ve tried at work not long ago. Given a simple presentation of the format of an IMM contract code, you can choose to code the sort, find the next and previous code, and perhaps even optimize for memory (cache the instances, e.g. lazily) and speed (cache the toString() value, e.g. in the constructor) if you still have some time.

In closing

Maths structures are hidden behind many common business concepts. I have developed a habit of looking for them whenever I can, because they always make us think, they help question our understanding of the domain problem (“is my domain problem really similar in some way to this structure?”), and of course because they often offer wonderful ready-made implementation hints!

The source code (Java) for this post is on Github.
Follow me on Twitter!
Photo: CME Group


DDD is back in Paris with a brand new Meetup group!

The first DDD Open Forum of the brand new Paris DDD meetup was last night, hosted by Arolla, and it was good to meet again after a long time with twenty-some Paris DDD aficionados!

@tjaskula, the organizer of this new group, opened the evening with a welcome introduction. He also gave many suggestions of areas for discussion and debate.

A quick survey revealed that one third of the participants were new to Domain-Driven Design, while another third was, on the other hand, rather comfortable with it. This correlated with a rather senior audience, with only one attendee having less than 5 years of experience and many 10+ year developers, including developers with 22 and 30 years of experience, and still coding! If you work in Paris, I guess you know them already…

It was an open space session, so we first proposed a lot of topics for discussion with post-its on the wall: how to sell or convince about DDD, introduction on concepts, synchronizing between contexts…

We all decided to start with a walk through of the fundamentals of DDD: Bounded Contexts, Ubiquitous Language, Code as Model… It was great to have this two-way knowledge transfer between seniors and juniors, in an interactive fashion and with lot of questions, including some rather challenging and skeptical ones! There was also some UML bashing of course.

We concluded by eating Galettes des Rois, together with cider and beer, and a lot of fun. Thanks everyone for your questions and contributions, and see you soon on next meetup!

The many proposals for discussion


Surface-area over volume ratio – a metaphor for software design

There’s a metaphor I had in mind for a long time when thinking about software design: because I’m proudly lazy, in order to make the code smaller and easier to learn, I must do my best to reduce the “surface-area over volume ratio” of the software.

Surface-area over volume ratio?

I like the Surface-area over volume ratio as a metaphor to express how to make software cheaper to discover and learn, and smaller to maintain as well.

For a given object, the surface-area over volume ratio is the amount of surface area per unit volume. For buildings and for animals, the smaller this ratio, the less the heat loss during the winter, hence a better thermal efficiency.

Have you ever noticed that huge warehouses were always cool even during the summer when it’s hot? This is just because in our real 3D world the surface-area over volume ratio is much smaller when the absolute size of the building increases.

The theory also says that the sphere is the optimal shape with respect to this ratio. In fact, the more “compact” the object, the lower the ratio; or, the other way round, we could define the compactness of an object directly by its surface-area over volume ratio.

A dodecahedron, a volume that approximates a sphere with just 2D facets (Wikipedia picture)

What about software design?

Let’s consider that each method signature of each interface is part of the surface-area of the software, because this is primarily what I have to learn when I join the project. The larger the surface-area, the more time I’ll need to learn, provided I can even remember all of it.

Larger surface is not good for the developers.

On the other hand, the implementation is part of what I would call the volume of the software, i.e. where the code really does its stuff. The more volume, the more powerful and richer the software. And of course the point of Object Orientation is that you don’t have to learn all the implementation in order to work on the project, as opposed to the interfaces and their method signatures.

Larger volume is good for the users (or the value brought by the software in general)

As a consequence we should try to minimize the surface-area over volume ratio, just like we’re trying to reduce it when designing a green building.

Can we extrapolate that we should design software to be more “compact” and more “sphere”-like?

Facets-like interfaces

Reusing the same interface as much as possible is obviously a way to reduce the surface-area of the software. Adhering to interfaces from the JDK or Google Guava, interfaces that are already well-known, helps even more: in our metaphor, an interface that we don’t have to learn comes for free, like a perfectly insulated wall in a building. We can almost omit it in our ratio.

To further reduce the ratio, we can seize every opportunity to reuse a minimal set of common interfaces as much as possible, even across unrelated concepts. At the extreme of this approach we get duck typing in dynamic languages. In our usual languages like Java or C#, we must introduce additional small interfaces, usually with a single method.

For example in a trading system, every class with an isInCurrency(Currency) method can implement a common CurrencySpecific interface. As a result, a lot of processing (filtering etc.) on anything related to currencies in some way can be done on all these classes without any other knowledge about them, beyond their currency-specificity.
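
A minimal sketch of this idea in Java; the isInCurrency(Currency) method and the CurrencySpecific interface come from the example, while CashFlow, Instrument and the CurrencyFilter helper are hypothetical names just to make the sketch self-contained:

    import java.util.Currency;
    import java.util.List;
    import java.util.stream.Collectors;

    // The single-method facet shared across otherwise unrelated concepts
    interface CurrencySpecific {
        boolean isInCurrency(Currency currency);
    }

    // Two unrelated classes exposing the same facet
    class CashFlow implements CurrencySpecific {
        private final Currency currency;

        CashFlow(Currency currency) {
            this.currency = currency;
        }

        @Override
        public boolean isInCurrency(Currency other) {
            return currency.equals(other);
        }
    }

    class Instrument implements CurrencySpecific {
        private final Currency quotedIn;

        Instrument(Currency quotedIn) {
            this.quotedIn = quotedIn;
        }

        @Override
        public boolean isInCurrency(Currency other) {
            return quotedIn.equals(other);
        }
    }

    // Generic processing only needs to know about the facet, not the classes behind it
    class CurrencyFilter {
        static <T extends CurrencySpecific> List<T> inCurrency(List<T> items, Currency currency) {
            return items.stream()
                        .filter(item -> item.isInCurrency(currency))
                        .collect(Collectors.toList());
        }
    }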

In this example, the currency-specificity we extracted into one interface is like a single facet over a larger volume made of several implementations. It makes our design more compact: it will be easier to learn while offering a rich set of behaviors.

The limit of this approach of putting a lot of implementation code under the same interfaces is that sometimes it simply makes no domain sense. Since code is primarily meant to describe the domain without causing confusion, we must be careful not to go too far. We must also take great care when sharing interfaces between bounded contexts, as there’s a high risk of excessive coupling.

Faceted artwork (picture from http://reinierdejong.wordpress.com)

Yet another metric?

This metric could be measured by a tool; however, the primary value is not in checking the figures, but in thinking about and taking care to make the design easy to learn (less surface-area) while delivering a lot of valuable behavior (more volume).

Follow me on Twitter!


Collaborative Artifacts as Code

A software development project is a collaborative endeavor. Several team members work together and produce artifacts that evolve continuously over time, a process that Alberto Brandolini (@ziobrando) calls Collaborative Construction. Regularly, these artifacts are taken in their current state and transformed into something that becomes a release. Typically, source code is compiled and packaged into some executable.

The idea of Collaborative Artifacts as Code is to acknowledge this collaborative construction phase and push it one step further, by promoting as many collaborative artifacts as possible into plain text files stored in the same source control, while everything else is generated, rendered and archived by the software factory.

Collaborative artifacts are the artifacts the team works on and maintains over time, thanks to the changes made by several people through a source control management system such as SVN, TFS or Git, with all their benefits like branching and versioning.

Keep together what varies together

The usual way of storing documentation is to put MS Office documents into a shared drive somewhere, or to write random stuff in a wiki that is hardly organized.

Either way, this documentation will quickly get out of sync because the code is continuously changing, independently of the documents stored somewhere else, and as you know, “Out of sight, out of mind”.

We now have better alternatives

Over the last few years, there have been several changes in software development. Github has popularized the README.md overview file written in Markdown. DevOps brought the principle of Infrastructure as Code. The BDD approach introduced the idea of text scenarios as a living documentation and an alternative for both specifications and acceptance tests. New ways of planning what a piece of software is supposed to do have appeared, such as Impact Mapping.

All this suggests that we could replace many informal documents with their more structured alternatives, and that we could have all these files collocated within the source control, together with the source code.

In any given branch in the source control we would then have something like this:

  • Source code (C#, Java, VB.Net, VB, C++)
  • Basic documentation through plain README.md and perhaps other .md files wherever useful to give a high-level overview on the code
  • SQL code as source code too, or through Liquibase-style configuration
  • Living documentation: unit tests and BDD scenarios (SpecFlow/Cucumber/JBehave feature files); a small example follows this list
  • Impact Maps (and any other mind maps), which may be written as text and then rendered via tools like text2mindmap
  • Any other kind of diagram (UML or general-purpose graphs), ideally defined in a plain-text format and then rendered through tools (Graphviz, yUml)
  • Dependency declarations as manifests (Maven, Nuget…) instead of documentation on how to set up and build manually
  • Deployment code as scripts or Puppet manifests for automated deployment instead of documentation on how to deploy manually
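
As a made-up illustration of the living-documentation item in the list above, here is a small unit test whose class, method names and assertions spell out a business rule in plain words; the rule and the names are purely hypothetical, the point being that such tests live in source control next to the code and can be rendered into a readable documentation site:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    // A hypothetical business rule, only here to make the example self-contained
    class LateReturnPenalty {
        int penaltyInCentsFor(int daysLate) {
            return daysLate <= 0 ? 0 : 50 * daysLate; // 50 cents per late day
        }
    }

    public class LateReturnPenaltyTest {

        // The test names read like sentences of the documentation
        @Test
        public void charges_50_cents_per_day_when_a_book_is_returned_late() {
            assertEquals(150, new LateReturnPenalty().penaltyInCentsFor(3));
        }

        @Test
        public void charges_nothing_when_a_book_is_returned_on_time() {
            assertEquals(0, new LateReturnPenalty().penaltyInCentsFor(0));
        }
    }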

Plain Text Obsession is a good thing!

Nobody creates software by directly editing the executable binary that the users actually run, yet it is common to directly edit the MS Word document that will be shipped in a release.

Collaborative Artifacts as Code suggests that every collaborative artifact should be text-based to work nicely with source control, and to be easy to compare and merge between versions.

Text-based formats shall be preferred whenever possible, e.g. .csv over .xls, .rtf or .html over .doc; otherwise the usual big PPT files must go to another dedicated wiki where they can be safely forgotten and become instantly deprecated…

Like a wiki, but generated and read-only

My colleague Thomas Pierrain summed up the benefits of this approach: the documentation will

  • always be up-to-date and versioned
  • be easily diff-able (text files, e.g. in Markdown format)
  • respect the DRY principle (with the SCM as its golden source)
  • be easily browsable by everyone (DEV, QA, BA, Support teams…) in a read-only, readable wiki-like web site
  • be easily modifiable by team members in a well-known and official location (as easy as creating or modifying a text file in the SCM)

What’s next?

This approach is nothing really new (think about LaTeX…), and many of the tools we need for it already exist (Markdown renderers, web sites to organize and display Gherkin scenarios…). However, I have never seen this approach fully applied in an actual project. Maybe your project is already doing that? Please share your feedback!

UPDATE: My colleague Thomas Pierrain wrote a post on this idea:  http://tpierrain.blogspot.fr/2012/11/its-really-time-for-us-to-dry-our-apps.html
