
#ImpactMetrics

What’s the point?

It’s estimated that the human race spends over $3.5 trillion annually on IT. The world’s largest companies invest billions in tech-driven change each year. But do we know how effectively we’re investing that money? Do we know if those investments are adding the intended – indeed, any – value? Do we know if our software deliveries are hitting the mark, or missing the point?

Feedback, introspection and adaptation based on learning are critical in improving things; measurements matter. To drive continuous improvement, mature software development teams will focus on 3 categories of measurement above all others – flow of work, quality of work and value.

Measurements of flow and quality have become more consistent over time, with teams monitoring lead and cycle time, throughput, MTTR, SLO adherence / error budgets and so on in similar ways.

True value measurements, however, are still generally absent from these metrics systems, and from those of larger corporate investment governance structures. Teams often resort to proxies for value: story points delivered, relative value points or feature throughput, for example.

Others don’t even go this far, and rely instead on ‘software delivered on time and on budget’ as a primary indicator of successful work. There are myriad problems with this. The dysfunction surrounding estimation is one (discussed in another post on #RangedEstimates). Another is that the simple act of delivering software (or making process change, or any other type of change) doesn’t actually prove that anything good happened. All that meeting ‘software delivery’ milestones tells us with confidence is that some software was delivered.

There are reasons for the absence of commonly observed value metrics.

Measuring value isn’t straightforward; value lags any activity which was intended to deliver it, and is often only loosely correlated with change delivery. Revenue increases or expense reductions may not show themselves for months or quarters after a product is enhanced, and other factors, such as changes in market conditions or competitor actions, can obscure the causality between input (effort spent), output (software) and outcome (value).

So what is Value?

Meaningful prioritisation requires us to estimate the value of a given piece of work. Investment propositions are generally thought of as single-step hypotheses…

“If we invest to deliver x, we believe we will realise value y”

…where y could be a revenue increase, an expense reduction, or some other tangible value which would make the investment in x seem worthwhile. This could be a tiny feature, deliverable in a matter of hours or the opening up of a whole new product or market, delivered over multiple weeks, months or quarters.  

In truth, there is always something in between the delivery of change and value being realised. There are always at least two separable hypotheses. The missing link in between ‘change’ and ‘value’ is important to understand if we’re interested in leading indicators of progress towards value.

We’ll call the missing link ’impact’, and rewrite our value statement as follows…

“If we invest to deliver x, we believe there will be an impact, y” (first hypothesis)

“When y happens, we believe that we’ll be able to realise value z” (second hypothesis)

Impact is the thing which links change to value. Impact Metrics are what we measure to determine if the first hypothesis was true.

[Graphic: ImpactMetrics]

(thanks @annaurbaniak_me for the graphic)

Impact Metrics

Impact Metrics are a way of calling out the expected impact that an investment will make on the world, from which value might be derived. These metrics should be observable and measurable, ideally in an automated way. These metrics should be a primary data point when considering whether progress is being made, or not. 

One of the principles in the Agile Manifesto states “Working software is the primary measure of progress”. Good teams will extend the meaning of ‘working’ beyond ‘it compiles’ or ‘it doesn’t give me an HTTP 500 when I click the Submit button’, to mean ‘it solves the problem for which it was designed’.

Or maybe it doesn’t. That will inevitably happen. Impact Metrics provide insight here, too. Delivery of the wrong solution “on time and on budget”™ helps no-one, but is often celebrated as a success. Realising that the awesome updates we just made to our online store actually reduced – rather than increased – customer conversion rates might hit hard, but is obviously important if we’re to get to better outcomes.  

Test Driven Products

It’s interesting to observe how Impact Metrics (done well) yield similar benefits for products and projects to the ones Test Driven Development (done well) yields for codebases:

  • TDD requires you to think hard about the problem you’re trying to solve before embarking on any change, such that you can articulate clearly the expected outcome of the change
  • TDD requires that you make the success criteria of the work you’re about to embark on transparent, explicit and measurable
  • TDD requires that you create a system that is inherently testable
  • TDD makes the success (or failure) of an implementation against those tests immediately transparent to all
  • Over time, TDD creates a system of feedback to show if new changes break expected legacy behaviour in other parts of the system

All of these benefits apply when Impact Metrics are used in the right way, as you must…

  • …define the desired impact of an investment or change up front, including how it will be measured, and describe how these measures are expected to change over time
  • …make these measures transparent; these are the impact that everyone is working to achieve, and from which someone will derive his or her value
  • …start to monitor these measures, ideally in an automated way and prior to any change being delivered in order to understand the baseline
  • …pay attention to the impact metrics as change is delivered (ideally in frequent, small batches) to see if they start to move in the predicted direction – remember that this is our real measure of progress, and is an important source of feedback and learning
  • Note that Impact Metrics often have value beyond the completion of the specific change which introduced them. They should be maintained as you would a normal test suite, discarding them only when they no longer assert something which has relevance

This last point is especially interesting. In the future, something may change which reverses the impact you delivered previously, and the original value may start to deteriorate. This could be due to something we did, or some external factor. Keeping these metrics ‘live’ is a good way of ensuring you can see when your product / value stream is ‘broken’, in the same way a good build radiator does for a code base.

This leads us to a world of Testable Products or Projects, where real progress towards desired outcomes can be measured and made transparent, and where deviations against expected or predicted behaviour can be alerted to and acted upon.
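To make the test-suite analogy a little more concrete, here is a minimal sketch of an impact metric treated like an assertion: a baseline captured before any change, a predicted direction, and a check that can be re-run as each small batch is delivered. The class name, the field names and the conversion-rate figures are all hypothetical, chosen purely for illustration; a real setup would read observations from whatever monitoring you already have.

```python
from dataclasses import dataclass


@dataclass
class ImpactMetric:
    """One observable measure tied to the first hypothesis (change -> impact)."""
    name: str
    baseline: float          # value observed before any change was delivered
    expected_direction: int  # +1 if we predict an increase, -1 for a decrease

    def check(self, observed: float, tolerance: float = 0.0) -> str:
        """Compare a new observation to the baseline, much like a test assertion."""
        # Positive movement means the metric moved the way we predicted.
        movement = (observed - self.baseline) * self.expected_direction
        if movement > tolerance:
            return "moving as predicted"
        if movement < -tolerance:
            return "moving against prediction"
        return "no measurable movement"


# Hypothetical example: we predicted the store update would raise conversion.
conversion = ImpactMetric("checkout conversion rate (%)", baseline=2.4, expected_direction=+1)
print(conversion.check(2.9))
print(conversion.check(2.1))
```

Keeping a check like this running after the original change has shipped is what lets it behave like a build radiator for the product: if a later change or an external factor reverses the impact, the result flips.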

We’re all standing on the shoulders of others…

None of the above is new; indeed, the best technology-led product companies in the world do this routinely.

What I’m constantly amazed by, however, is how many large enterprises lack this level of attention to detail in measuring the impact of the change investments they make, or in electing whether to continue to apportion funds to ongoing investments.

Individuals and Interactions over Processes and Tools

I’ve seen Impact Metrics make a material difference to the types of conversation teams have with their customers, business partners and sponsors; how teams understand their customer needs, and how real progress is measured and celebrated.

Impact Metrics get the team and sponsors aligned on what the real purpose of the work is. Most businesses don’t care about Epics, Features, Story Points, MVPs, PSPIs, Release Trains or Iterations. They use these terms only because we’ve asked them to. They care about sales, active customer or user numbers, conversion rates, product or order lead times, levels of customer complaints or satisfaction, social media sentiment, idle inventory, product defect / return rates and anything else which drives the thing they care about more than anything else – sustainable positive margin between revenue and expense.

Teams are not paid to deliver software, they are paid to deliver impact from which value can be derived. Software done well (small batches, delivered continuously) can be a great tool for this, but on its own is not the point. As Dan North often says, if you can achieve impact without having to change software, you are winning at software (h/t @tastapod).

There are other interesting side effects I’ve observed in teams using these techniques. Shaving functional scope to hit arbitrary dates occurs less often, because doing so visibly diminishes the transparently measured impact. If you’re delivering in frequent increments, you’ll know early whether the full desired impact is more or less likely to occur, and can make decisions much sooner on how to modify your approach or investment.

Impact Metrics also create a cleaner separation of accountabilities between delivery teams and business sponsors. This may seem bad, or even controversial. However, bear with me…

Practically speaking, it’s often difficult or unrealistic for a delivery team to be genuinely accountable for resultant value in a corporate environment, even if they desire that accountability. There are often other important cogs in the system which determine whether the impact – if delivered – actually gets fully converted into value; sales and marketing teams are an obvious example.

It’s also important that the customer / business sponsor / stakeholder feels accountable for the resultant value of an investment. In my experience, it helps if sponsors have real skin in the game – the accountability for turning delivered impact into value best sits firmly on the shoulders of the person accountable for the financial performance of a product or business area.

A delivery team should always be aware of, be able to question, challenge and ultimately understand the connection between impact and value. But once that’s accepted, the team can focus on delivering impact – not software. The business / product owner can focus on ensuring that the value is subsequently derived from impact as it gets made. Everyone can see, from how impact metrics move as changes are made, whether the first hypothesis is still holding true, and adjust. Maybe the impact is harder to get to than first thought; it might make sense to stop investing. 

This separation of accountabilities may be undesirable in some circumstances, such as product companies whose primary product is technology. But in corporates whose business is enabled by technology, rather than being technology, it can be beneficial. Feel free to argue this, or indeed any other point – all debate welcome 🙂

Scaling Impact Metrics

There is one common problem for Enterprises when it comes to applying Impact Metrics – aggregation.

Large companies often collect metrics in a consistent way across businesses and aggregate them, to provide ‘Top of the House’ views. Impact Metrics don’t suit this type of aggregation, as they are necessarily domain specific.

To anyone looking at how this model might apply at company scale, I’d offer this advice:

  • Ensure that, for any material investment you’re making, as well as capturing “Spend x, Make z”, you also capture “the impact of spending x will be demonstrated by impact measures y1, y2, y3”
  • Ensure every investment you make has an identified business / product owner or sponsor who is accountable and will stand behind the second hypothesis “If impacts y1 – y3 occur, I will be able to derive value z”
  • Identify how many investments can’t articulate either of these, and understand whether that’s a concern – you may have a case of underpants gnomes

If you do have to produce aggregated enterprise measures, make visible the percentage of investments (and the total spend of those investments) for which there is no transparent measure of impact / success, or for which there is no accountable party willing to stand behind the link between delivered impact and value. Go and ask some searching questions of those investments.
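As a rough illustration of that aggregated view, the sketch below computes the share of investments, and the share of spend, that have either no impact measures or no accountable owner for the second hypothesis. The `Investment` record, its field names and the sample portfolio are assumptions made up for this example, not a description of any particular portfolio tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Investment:
    name: str
    spend: float
    impact_measures: List[str] = field(default_factory=list)  # y1, y2, y3
    value_owner: Optional[str] = None  # who stands behind "impact y -> value z"


def unaccounted(investments: List[Investment]) -> Tuple[float, float]:
    """Return (share of investments, share of spend) lacking measures or an owner."""
    gaps = [i for i in investments if not i.impact_measures or i.value_owner is None]
    total_spend = sum(i.spend for i in investments)
    count_share = len(gaps) / len(investments) if investments else 0.0
    spend_share = sum(i.spend for i in gaps) / total_spend if total_spend else 0.0
    return count_share, spend_share


# Hypothetical portfolio: only the first entry states both hypotheses.
portfolio = [
    Investment("store redesign", 100.0, ["conversion rate"], "head of e-commerce"),
    Investment("platform migration", 300.0),
    Investment("new data feed", 100.0, ["orders processed per day"]),
]
count_share, spend_share = unaccounted(portfolio)
print(f"{count_share:.0%} of investments, {spend_share:.0%} of spend")
```

The two numbers are deliberately reported together: a small number of unaccounted investments can still represent most of the money.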

For everything else, make positive trend towards impact the key measure of progress. The teams can deal with everything else.

I’d love to hear others’ views (or even better – experiences), here or on Twitter @hamletarable. No silver bullets offered here, just another tool for the toolbox which has helped me, and others I’ve observed, deliver better outcomes in their contexts. Keen to understand if this applies in the context of others, and where it doesn’t.

Other posts I love on this topic

https://dannorth.net/2013/07/05/are-we-nearly-there-yet/ – by the awesome @tastapod

https://gojko.net/2012/05/08/redefining-software-quality/ – by the equally awesome @gojkoadzic

https://www.infoq.com/articles/impact-portfolio-management-agile – by awesome ex teammates @TommRoden and @13enWilliams

 

 


#RangedEstimates

This blog post was prompted by an enjoyable Twitter discussion earlier this year. You can state an opinion in 280 characters, but it’s hard to convey more detailed rationale using that medium, hence this post.

The topic of the discussion, and of this post, is estimation, specifically Ranged Estimates.

Let’s get a few things out of the way early on…

  • Estimates are not facts. The dictionary definition is “an approximate judgment or opinion”. They are nothing more than today’s best guess, on which subject…
  • Estimates are often asked for & given at the time we know least about the thing we’re estimating; this wouldn’t be so bad, but for the fact that…
  • Estimates are very often treated as commitments or contractual agreements; they carry weight and become ‘sticky’, which is a problem because…
  • Estimating delivery dates for software on a horizon of more than a week or two out is somewhere between hard and impossible; a problem compounded because…
  • For some reason (I have looked, but can’t find the source) when we talk about software estimates, we forget that an estimate is “an approximation” and instead translate it to a specific date…this problem is exacerbated by the fact that…
  • Every project or programme tracking tool commonly used in software development today [1] represents an estimate as a single point in time – a specific date. All of the above means that…
  • Estimates give an impression of understanding to a high degree of precision which cannot be relied upon, meaning…
  • When Estimates (regularly) turn out to be wrong, people question the value that they bring (whilst often fiercely criticising those who gave them). When combined with the fact that…
  • Estimates aren’t free – you have to invest time in producing these unreliable, inaccurate comfort blankets which come back to bite you – means that…
  • People are spending time searching, experimenting, trying to find a better way.

Who can blame them?

One such group of people have created the #NoEstimates movement. In the words of one proponent:

#NoEstimates is a hashtag for the topic of exploring alternatives to estimates for making decisions in software development. That is, ways to make decisions with ‘No Estimates’.

My experience is that it is not estimation or estimates themselves, but the way in which they have been used for decades which is at fault. Here I offer some practical ideas which acknowledge the dysfunction laid out above, but equally respect the realities of human-centric delivery systems where some level of estimation is valuable in most contexts.

Driving Better Conversations

My ambition is to help drive better conversations through looking at estimation differently: conversations richer in content, deepening shared understanding, improving mutual trust and transparency, ultimately producing better outcomes.

But first, I want to offer 3 reasons why I feel avoiding discussing estimates is the wrong approach in most circumstances.

1) Estimates themselves may be of questionable worth but there is value in estimation as a conversation

Specifying, writing, building, testing, deploying and supporting software is hard. Good software teams obsess about flow and value, and how to increase both. Flow is hampered if there is misalignment within a delivery system. Exploring the task at hand together can help expose differences in mental models, expected outcomes or intended approaches.

Commentary from #NoEstimates advocates suggests that they would agree with much of the above, but say that such discussions can happen without talking about estimates themselves. That seems to me to be both true and unnecessary – why deliberately avoid discussing a tangible data point which might expose a fundamental misunderstanding between two parties (“You see this as 2-3 weeks work, I see it as 2-3 months – what are the differences in our mental models which cause such variance”)?

So, what’s the harm in talking about estimates as part of that discovery and alignment process? Maybe it has something to do with the following…

2) Estimates very often become a vehicle for dysfunction (though they are passive in themselves and cannot therefore be a root cause of dysfunction)

The stories I have on the dysfunction associated with estimates would be common to many: pressure to provide delivery dates when little is really understood of the work at hand, little willingness to adjust based on new information as work progresses and understanding changes, corners cut to hit arbitrary dates, celebration of delivery dates met whilst value is rarely mentioned let alone measured, teams ‘encouraged’ (coerced) to work long hours / weekends / spend time away from their families and friends to meet ‘their commitments’…the list goes on.

All of this is doubtlessly damaging, and many, myself included, have suffered at its hands. To address this in a systemic way, we must accept that there is work to be done by multiple parties – those providing the estimates, and those requesting them.

Those requesting estimates often have valid needs, but are using the same broken models that were handed down to them by the generation before. They need to be given different tools if we are to drive better outcomes through better conversations.

For those of us at the heart of such deliveries, we haven’t done enough to help wean these groups off of the tools of the past, and we haven’t given them something sufficiently better to work towards.

I’m not convinced that ignoring the needs of those who use estimates as part of a decision-making process, or to align dependent or related activities, is a constructive way to move us all forwards together.

And on the subject of need….

3) Estimates are often required by others – even if not deemed highly valuable for team effectiveness – and others matter in a team sport

Significant software change rarely happens in isolation. There may be operations or support team processes to update, user training to be undertaken, manuals and other web content to be updated, marketing campaigns to align to and so on. If you’re a marketing manager looking to place adverts for your new product line, having a rough view of when the next release may be available to customers is useful to you.

Then there is the fact that product managers / sponsors need to make investment / prioritisation decisions. Prioritising by formulae like Weighted Shortest Job First (Reinertsen) doesn’t work without some idea of “Shortest” and “Weight”; estimated cost vs estimated value can’t be determined without some indication of the relative investment required to generate a potential return. A delivery model where cost / value hypotheses can be tested quickly and cheaply by releasing small batches of functionality is preferable; we all prefer to make more frequent, smaller bets if we can, to accelerate understanding. But even knowing which experiments to invest in – these experiments aren’t free either – can be better informed by a view on the estimated scale of the relative opportunities.
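To show why Weighted Shortest Job First cannot run without estimates, here is a minimal sketch. In its usual form the score is a weight (Reinertsen's cost of delay) divided by the job's duration, so both inputs have to be estimated somehow, even if only relatively. The candidate names and numbers below are invented for illustration.

```python
def wsjf(weight: float, duration: float) -> float:
    """Weighted Shortest Job First score: higher means 'do this sooner'."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return weight / duration


# Hypothetical candidates: (relative weight, relative duration), both estimates.
candidates = {
    "new data feed": (8, 4),
    "checkout redesign": (13, 13),
    "report export": (3, 1),
}

# Rank candidates from highest to lowest score.
ranked = sorted(candidates, key=lambda name: wsjf(*candidates[name]), reverse=True)
for name in ranked:
    print(f"{name}: {wsjf(*candidates[name]):.1f}")
```

Remove either column from the table and the ranking cannot be computed, which is the point: the formula is only as good as the estimation conversation behind its inputs.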

#RangedEstimates

So let’s suggest how we could look at estimates in a different way, to maybe drive better conversations.

1) Discuss and provide estimates as a range, not a fixed date

I’m not talking about tasks / stories / small shippable units which you want flowing out frequently every few days to a couple of weeks. I’m talking about what most people would call ‘Epics’, ‘Milestones’, ‘Major Release Candidates’ – some larger outcome or value-creating event which is (ideally) realised by the combined delivery of a succession of smaller, distinct, valuable features – the key points in a roadmap where people’s lives become noticeably better. I’ll talk more about what to estimate and what to measure in a future post.

After exploring these higher order goals through the process of breaking them into smaller, value-creating, independently deployable features as you usually would, estimate the whole epic as a team using ranges, asking…

“What is the earliest date we think this outcome could be arrived at, with the wind at our backs and the stars aligned?” (let’s call this the left hand date)

and…

“What is the longest we think this outcome could realistically take to achieve, if the world was out to get us?” (let’s call this the right hand date)

…this gives you a range.

Then, for each of those questions ask and capture why they are what they are; which stars would have to align, and how, in order for the left hand date to be hit? In what ways could the world conspire against us to push delivery towards the right hand date?

These are your levers. They are your constraints, or accelerants. They are the macro things which may impact your flow positively or negatively. You just made them transparent; everyone can see them, and begin to understand what they can do to impact them. Not just the team – this now gives options to others who have a vested interest in the success of the work. It describes ways in which they can help. More on that below.
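One way to picture what gets captured is the sketch below: a left hand date, a right hand date, and the levers behind each. The `RangedEstimate` class, its field names and the example dates are hypothetical, invented here for illustration; they do not describe any existing tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class RangedEstimate:
    outcome: str
    left: date    # earliest date, with the wind at our backs
    right: date   # latest realistic date, if the world was out to get us
    accelerants: List[str] = field(default_factory=list)  # why the left date could be hit
    constraints: List[str] = field(default_factory=list)  # what could push us to the right date

    def __post_init__(self) -> None:
        if self.left > self.right:
            raise ValueError("left hand date must not be after right hand date")

    def width_days(self) -> int:
        """Size of the range in days."""
        return (self.right - self.left).days

    def statement(self) -> str:
        """Render the range together with its levers as one sentence."""
        best = ", ".join(self.accelerants) or "the stars align"
        worst = ", ".join(self.constraints) or "the world is out to get us"
        return (
            f"{self.outcome}: if {best}, we could be done as early as "
            f"{self.left.isoformat()}; if {worst}, it could take as long as "
            f"{self.right.isoformat()}."
        )


# Hypothetical example with made-up dates and levers.
feed = RangedEstimate(
    "Onboard new data feed",
    left=date(2024, 3, 1),
    right=date(2024, 7, 1),
    accelerants=["the feed's data quality is good"],
    constraints=["the feed needs extensive clean-up"],
)
print(feed.statement())
print(feed.width_days(), "days between left and right hand dates")
```

Storing the levers alongside the dates is the design choice that matters here: the dates alone invite a commitment, while the levers invite a conversation.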

It’s important to acknowledge that not all levers are knowable up front, especially early in the process. But this doesn’t mean that none of them are, and to suggest as much is to do a disservice to the skill and experience of your team members. Don’t overweight them, but don’t ignore them either.

2) Understand that the size of the range is a measure of confidence

A team I once worked with gave an estimate for a piece of work of ‘between 2 and 6 months’. The sponsor was incredulous at first – “Why such a big variance? Aren’t you just adding padding to give yourself contingency?”. A large range points to a lack of confidence on the part of the team. Making this visible is useful, as it gives everyone the ability to ask better follow-up questions about what’s driving that uncertainty. Presenting a fixed date means others may miss this opportunity for discussion (how many, when pressed, would simply communicate 6 months and watch Parkinson’s law play out?).

3) Convey the nuance of the range to those who care about your change

The response to the sponsor’s challenge in the above was to talk about the levers. “Well, we know that if the data quality on the new feed we’re implementing is good, we can get this done in around two months based on what we’ve done before. But the worst feed we onboarded took over 4 months just to clean up before we could even start to think about getting it live”. The response? “Ok, when do you think you’ll know whether the feed has data quality issues? If it takes more than 4 months, it’s not worth investing in right now” – “Probably in a week or two” – “Ok, let me know when you get there, and let’s determine whether we should continue”. In this particular instance, the data quality of the new feed was pretty good and the team delivered the feed in a little over the 2 month left hand estimate.

The quality (value) of the conversation was materially improved from the historical norm of ‘deliver x, and it must be within 4 months’. In this instance, had the data quality been poor, the team may have fought to get ‘something’ working to the 4 month date and then carried debt into subsequent pieces of work. The business would have made a bad decision without knowing it; the work may have even looked like a success as ‘something’ would go live on a date which made economic sense on the surface, but the true picture would have been different. And no one would have been able to see it, or had the opportunity to change course.

4) Let ranges move – expect them to and embrace it when it happens

As teams make progress on a piece of work, the task at hand tends to become clearer and uncertainties become more certain…sometimes, the opposite is true. The left and right hand dates will move about, the range may shrink (confidence increasing) or may grow (uncertainty increasing). Let this happen, make it transparent and talk about why. The value is in the conversation – the joint understanding of the challenges being presented over time, and the approaches being taken to overcome them.

This movement is crucial as it gives teams the confidence to not over-invest in up front estimation, as you are not committing to either date; teams become more comfortable providing ‘today’s best guess’, and why. 
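If you keep each revision of the range, the movement itself becomes something you can show and talk about. The sketch below compares the first and latest snapshots of a range and reports whether it is narrowing or widening; the `range_trend` function and the sample dates are assumptions for illustration only.

```python
from datetime import date
from typing import List, Tuple

# (as_of, left hand date, right hand date), in chronological order
Snapshot = Tuple[date, date, date]


def range_trend(history: List[Snapshot]) -> str:
    """Compare the width of the latest range with the width of the first one."""
    if not history:
        raise ValueError("need at least one snapshot")
    widths = [(right - left).days for _, left, right in history]
    first, last = widths[0], widths[-1]
    if last < first:
        return "narrowing (confidence increasing)"
    if last > first:
        return "widening (uncertainty increasing)"
    return "unchanged"


# Hypothetical history: two weeks in, the team has learned more about the work.
history = [
    (date(2024, 1, 8), date(2024, 3, 1), date(2024, 7, 1)),
    (date(2024, 1, 22), date(2024, 3, 8), date(2024, 4, 15)),
]
print(range_trend(history))
```

A widening result is not a failure signal in itself; it is a prompt to ask what was learned since the last snapshot.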

5) Don’t force the estimate of a large thing to equal the sum of the estimates of smaller things

It’s tempting, when you break up a large piece of work, to estimate the resultant smaller things, sum those estimates and feel that this should equate to the size of the original larger piece. I don’t advise doing this – it’s often wasteful and misleading. There is work in the gaps; complex systems development cannot be decomposed to that level of precision. If using story points at a story level helps you, then by all means do it, but understand that summing those points to arrive at a single point of precision for the delivery date of the larger goal won’t give you a reliable outcome. Reconciling these views may add value, but don’t couple them.

6) Educate relevant parties on what this approach offers for them, from their perspective rather than yours

#RangedEstimates enables those who have a vested interest in the work to be more active in driving better outcomes; to have more skin in the game, to understand how they can help influence an organisation to achieve the outcomes they want.

I’ve met very few managers who I felt were not well-intentioned. I have met many without a good understanding of the negative impact they can have on the outcomes of software investments and on team health, or of what positive organisational levers they could pull to help. “My job is to hold the team to account”….no, the team can do that themselves. Your job is to ensure that the best outcomes are met. Understanding the levers can help you understand what you can do to help improve those outcomes.

7) Understand that this is what we’ve always done….internally…

Here’s the big secret – we all do this, all of the time. When asked to estimate anything, we think in terms of dependencies, work to be done, knowns, uncertainties, contingencies and we translate to ranges automatically.

But at the end of this process, we’re often asked to translate this into a specific date. So, an estimate of ‘some time in Q3’ inevitably becomes 30th September, and Parkinson’s law takes over, meaning that the 30th September is now the earliest date the work will complete…and often, it will be past this.

So what do you do in this situation? Provide the left hand date? That’s the most optimistic date; if we do that, we won’t meet those expectations very often. So the right hand date? That’s pretty pessimistic – although sometimes realistic. I’d feel uneasy about giving that answer every time, as in most cases I will deliver sooner. So the midpoint? An 80% mark?

Estimation is, and always has been, “an approximate judgment or opinion”. All that’s being proposed here is to make that approximation and its drivers transparent to enable better conversations; to respond to the need for estimates with real estimation, to show a preference for accuracy over artificial precision.

“If a, b and c are true, we could be done as early as x, however if d, e and f are true, it could take as long as y”

Doesn’t sound so hard when you say it like that, does it?

Caveats

This may bounce off your stakeholders / sponsors / customers like a rubber ball, in which case, you’ve probably got work to do anyway. In these situations, #NoEstimates is unlikely to help either. Some people just want a specific date, and then want to drive you to it. In that situation, it pays to remember that we all have a choice: comply, educate or get out of Dodge. As a good friend of mine often laments, “a business gets the IT it deserves”.

These thoughts are not presented as a silver bullet, or even new thinking. I’ve used this technique to good effect with teams at multiple companies, but my sample size is far from statistically significant. By posting this, I’m hoping for two things – to fuel further objective, rational debate around a problem area I feel strongly about, and to encourage others to use #RangedEstimates and add their experiences to the debate. I don’t believe in universally applicable process when it comes to pretty much anything (context is always crucial). But if #RangedEstimates can be a stepping stone towards better conversations between technologists and those they interact with, that’s enough for now.

Lastly, I can’t post this without a call to all makers of tools in this space. Please give me the ability to represent this uncertainty, and all of the nuance which comes with it, in your tools. By forcing the input of estimates as specific dates, you are reinforcing a system which we all know is broken. You can be part of the solution, if you allow us to capture and share the realities of what we do, rather than force us to represent a level of precision which is disingenuous at best.

Better conversations may happen all around the world if you do.


[1] At least, the ones I’ve seen – please point me to examples where this isn’t the case. ‘Excel’ doesn’t count 🙂