In this example, we’ll consider how to measure the impact of changes or investments that are intended to increase revenue.
At first glance, you’d think that this is an easy and obvious candidate for clear outcome measurement. After a new feature is deployed, did revenue go up or down? Data from ordering systems can be collected and analysed in or near real time, so it shouldn’t be a difficult graph to draw.
However, making a change and then measuring revenue alone is rarely going to tell the whole story. Many factors can move revenue, regardless of the success or otherwise of a given change. Revenue can be affected by seasonality, competitor activity, advertising campaigns, macroeconomic factors such as rising or falling consumer confidence, and even the time of the month (just before or after common paydays, for example).
So, even though increased revenue is the outcome we want, what are the impacts which might be a) measurable, and b) reasonably predictive of revenue increases? Good impact metrics must be both.
In the original article, we discussed that one of the things which makes impact measures difficult to apply is that impact is always context specific, and therefore difficult to learn from a recipe or by following a set process. The answer to the question ‘what should we measure?’ is always: it depends.
To fill in the gaps between output (say, a new feature being released) and outcome (increased revenue), we need to explore the initial hypothesis that led to the change in more detail, always seeking to identify where we might be able to measure something which equates to a meaningful impact.
- Was the new feature intended to bring in brand new customers, for example, by implementing a local language version in an overseas market? If so, customer sign-up rates in that market would be one measure to look for, followed by the shopping behaviour and order volumes of those new customers.
- Was the change intended to increase the number of orders placed by the existing customer base, for example, by streamlining the checkout process? In that case, order completion rate might be the right measure. Online retailers invest huge amounts of time experimenting with just this one process.
- Was the change intended to increase average order size, rather than the overall number of orders, for example, by implementing a product recommendation algorithm? In that case, average order size might be a useful thing to measure. How often customers purchase suggested products might be another.
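To make the candidate measures in the list above a little more concrete, here is a minimal sketch of how three of them might be calculated from checkout data. The `Checkout` record and its field names are hypothetical, invented purely for illustration; a real ordering system will have its own schema.

```python
from dataclasses import dataclass


@dataclass
class Checkout:
    """One checkout session. All field names here are hypothetical."""
    customer_id: str
    completed: bool         # did the customer finish placing the order?
    order_value: float      # value of the order; 0.0 if abandoned
    recommended_items: int  # suggested products included in the order


def order_completion_rate(sessions):
    """Share of started checkouts that ended in a completed order."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s.completed) / len(sessions)


def average_order_size(sessions):
    """Mean order value across completed orders only."""
    completed = [s for s in sessions if s.completed]
    if not completed:
        return 0.0
    return sum(s.order_value for s in completed) / len(completed)


def recommendation_uptake(sessions):
    """Share of completed orders containing at least one suggested product."""
    completed = [s for s in sessions if s.completed]
    if not completed:
        return 0.0
    return sum(1 for s in completed if s.recommended_items > 0) / len(completed)


# Made-up sample data, purely to show the functions being called.
sessions = [
    Checkout("a", True, 40.0, 1),
    Checkout("b", False, 0.0, 0),
    Checkout("c", True, 60.0, 0),
    Checkout("d", True, 20.0, 2),
]

print(order_completion_rate(sessions))
print(average_order_size(sessions))
print(recommendation_uptake(sessions))
```

None of these numbers means much in isolation; they become useful when tracked before and after a change, or compared between groups of customers.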
The examples above are more specific than measuring revenue alone, and therefore deal with some of the revenue ‘noise’ discussed earlier. However, teams may want to go further and test out their hypothesis on just a subset of their customer base, keeping others on the ‘old’ version as a control group to test against. This technique is usually called A/B testing, and it is a close cousin of canary releasing, where a change is rolled out to a small subset of users first. Plenty of far more qualified people than me have written extensively on the subject – I’d recommend you start here if you’d like to know more.
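As a rough illustration of the control-group comparison described above, the sketch below compares order completion rates between a control group and a test group using a two-proportion z-test. This is one common way to check whether a difference is likely to be more than noise, not the only one, and the function name and the sample figures are my own assumptions rather than anything from a real system.

```python
from math import sqrt
from statistics import NormalDist


def compare_completion_rates(control_completed, control_total,
                             test_completed, test_total):
    """Two-proportion z-test on order completion rates.

    Returns (uplift, z, p_value), where uplift is the test group's
    completion rate minus the control group's, and p_value is two-sided.
    """
    p_control = control_completed / control_total
    p_test = test_completed / test_total
    uplift = p_test - p_control

    # Pooled rate under the assumption that the change made no difference.
    pooled = (control_completed + test_completed) / (control_total + test_total)
    std_err = sqrt(pooled * (1 - pooled) * (1 / control_total + 1 / test_total))
    if std_err == 0:
        # Every session completed (or none did) in both groups: no evidence either way.
        return uplift, 0.0, 1.0

    z = uplift / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return uplift, z, p_value


# Made-up figures: 1,000 checkouts in each group.
uplift, z, p_value = compare_completion_rates(400, 1000, 450, 1000)
print(f"uplift={uplift:.3f} z={z:.2f} p={p_value:.4f}")
```

A small p-value suggests the difference between the groups is unlikely to be chance alone, but it says nothing about why the difference exists, so it is best read alongside the original hypothesis.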
It’s worth noting that just because an impact wasn’t observed, it doesn’t mean that the change itself was badly designed, but it does mean that, on its own, it didn’t generate the impact expected. Maybe the new language version of the site works really well, but without sufficient local advertising, not enough people in that market know about it. Maybe the new checkout process really is streamlined, but the release coincided with some slowdown in the back end which led more people to abandon their transactions. Maybe the products being recommended really are of interest to the users, but the UX to add these products to the order is clunky and non-obvious, so most are ignored.
Lack of impact won’t always tell you why, but it will provide an inescapable data point which tells you that the expected impact of the change you made, for whatever reason, isn’t being achieved. And that on its own is an important thing to know.
Why is this important?
Good engineering teams are not motivated by simply delivering features, but by solving problems and meeting end-customer or stakeholder needs. Businesses, likewise, don’t invest in producing great technology for the sake of it. As wondrous a thing as technology is, ultimately it is a vehicle to help someone, to solve a problem or to otherwise meet a real or perceived need, be that powering the world’s financial markets, redefining entire industries such as music and video distribution, inventing whole new ones such as mass social media, or simply enabling us to switch off for 20 minutes by crushing a few imaginary candies.
Teams optimise how they are organised, how they communicate, how they interact with customers or product teams, how they design, write, test, scan, deploy and monitor software so that they can turn product ideas into real world feedback quickly, safely and consistently.
But more than this – they care about the value being created, and learn to pivot quickly if an idea turns out to be a bad one. They seek to understand customer needs, and play an active role in helping develop new ways to meet and exceed them. And they look for data to balance opinion, anecdote and conjecture. They want to know, rather than assume, that their time and effort is delivering the intended impact, and when they learn that it isn’t, they change course.
Ultimately, it’s about how you define success. Is success releasing software “on time, on scope and on budget”? Maybe. Maybe not. What if the change was delivered perfectly, on time and as requested, but customer orders dropped rather than increased? I wouldn’t call that a success. What if the change was delivered a few weeks late, but over the course of the first two months exceeded the expected impact and brought in 10 times the cost of the few weeks’ slippage? Provided that there was enough cashflow to bridge the gap, I’d call that a win.
How important this all is really comes down to whether you define success through activities (things you did), or outcomes (things you achieved).