Framework agnostic capacity planning at scale

Nick Brown
Published in ASOS Tech Blog
9 min read · Dec 7, 2023

How can you consistently plan for capacity across teams without mandating a single way of working? In this blog I’ll share how we are tackling this in ASOS Tech…

What do we mean by capacity planning?

Capacity planning is an exercise teams undertake to plan how much work they can complete (in terms of a number of items) in a given sprint, iteration or other time period. Sadly, many teams go into incredible detail with this, getting into specifics such as the number of hours available per individual per day and the number of days of holiday and, even worse, using story points:

When planning on a broader scale and at longer-term horizons, say for a quarter and across teams, the Scaled Agile Framework (SAFe) and its Program Increment (PI) planning appear to be the most popular approach. However, its use of normalised story points is quite rightly criticised for abusing the intent of story points (whatever your views on them may be) and, crucially, for offering teams zero flexibility in choosing how they work.

At ASOS, we pride ourselves on being a technology organisation that allows teams autonomy in how they work. As Coaches, we do not mandate a single framework/way of working as we know that enforcing standardisation upon teams reduces learning and experimentation.

The problem that we as Coaches are trying to solve is aligning on a consistent understanding of capacity, and a consistent way to calculate it across teams, whilst avoiding mandating a single way of working and staying aligned with agile principles. Our current work on this has led us down the path of taking inspiration from the work of folks like Prateek Singh in scaling flow through right-sizing and probabilistic forecasting.

Scaling Simplified: A Practitioner’s Guide to Scaling Flow, by Prateek Singh (eBook, Amazon.co.uk)

How we are doing it

Right-sizing

Right-sizing is a practice where we acknowledge and accept that there will be variability in the sizes of work items at all levels. What we focus on is, depending on backlog level, understanding what our “right-size” is. The most common type of right-sizing a team will do is taking the 85th percentile of their cycle time for items at story level, and using this as their “right-size”, saying 85% of items take n days or less. They then proactively manage items by comparing their Work Item Age to their right-size:
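As a minimal sketch of this idea in Python, a team could derive its right-size from cycle time data and flag in-progress items whose Work Item Age has passed it. The cycle times, item names and dates below are made up, and the nearest-rank percentile and the age calculation are assumptions for illustration rather than what any particular tool does.

```python
from datetime import date


def percentile_nearest_rank(values, pct):
    """Smallest observed value with at least pct% of observations at or below it."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


# Hypothetical cycle times (in days) of recently completed stories.
cycle_times = [1, 2, 2, 3, 3, 4, 5, 5, 6, 7, 8, 8, 9, 11, 12, 14, 15, 18, 21, 30]
right_size_days = percentile_nearest_rank(cycle_times, 85)  # 15 for this data

# Hypothetical in-progress stories and the date each one was started.
in_progress = {
    "Story A": date(2023, 12, 4),
    "Story B": date(2023, 11, 24),
    "Story C": date(2023, 11, 15),
}
today = date(2023, 12, 7)

# Work Item Age is taken here as elapsed calendar days since the item was started.
ages = {title: (today - started).days for title, started in in_progress.items()}
over_right_size = [title for title, age in ages.items() if age > right_size_days]

print(f"85% of stories finished in {right_size_days} days or less")
print("Older than the right-size:", over_right_size)
```

With this sample data only Story C (22 days old) is older than the 15 day right-size, so it is the one to talk about first.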

However, as we are looking at planning for Features (since this is what business stakeholders care about), we need to do something different. Please note, when I say “Features”, what I really mean here is the backlog hierarchy level above User Story/Product Backlog Item. You may call this something different in your context (e.g. Epic), but for simplicity in this blog I will use the term “Feature” throughout.

I first learnt about this method from Prateek’s “How many bottles of whiskey will I drink in 4 months?” talk from Lean Agile Global 2021. We visualise the Features completed by the team in the last n weeks, plotting them on a scatter plot with the count of completed child items (at story level) and the date the Features were completed. We then add in percentiles to show the 50th/85th/95th percentiles for size (in terms of child item count), typically taking the 85th percentile for our right-size:

What we also do is visualise the current Features in the backlog and how they compare to the right-size value (as we give this to teams ‘out of the box’, we default to the 85th percentile for our right-size). This way a team can quickly understand, of their current Features, which may be sized correctly (i.e. have a child item count lower than our right-size), which might be ones to watch (i.e. are the same size as our right-size) and which need breaking down (i.e. are bigger than our right-size):
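The Feature-level version can be sketched in the same way. The snippet below is only an illustration: the child item counts and Feature names are invented, and `percentile_nearest_rank` and `classify` are hypothetical helper names, not part of any template.

```python
def percentile_nearest_rank(values, pct):
    """Smallest observed value with at least pct% of observations at or below it."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


# Hypothetical child item counts of Features completed in the last n weeks.
completed_sizes = [3, 4, 5, 5, 6, 6, 7, 8, 8, 9, 10, 10, 11, 12, 12, 13, 14, 15, 18, 25]
size_percentiles = {p: percentile_nearest_rank(completed_sizes, p) for p in (50, 85, 95)}
right_size = size_percentiles[85]  # 14 child items for this data


def classify(child_count, right_size):
    """Compare a backlog Feature's child item count with the right-size."""
    if child_count < right_size:
        return "right-sized"
    if child_count == right_size:
        return "one to watch"
    return "needs breaking down"


# Hypothetical Features currently in the backlog and their child item counts.
backlog = {"Feature A": 6, "Feature B": 14, "Feature C": 22}
for name, child_count in backlog.items():
    print(name, child_count, classify(child_count, right_size))
```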

Please note: all Feature names are fictional for the purpose of this blog

Note that the title of the Feature is also a hyperlink for a team to open the item in their respective backlog tool (Azure DevOps or Jira), allowing them to directly take action for any changes they wish to make.

What will we get?

Now we know what our right-size for Features is, we need to figure out how many backlog items/stories we have capacity for. To do this, we are going to run a Monte Carlo simulation to forecast how many items we will complete. I am not planning to go into detail on this approach and why it is more effective than other methods such as story points, mainly because I (and countless others!) have covered this in detail previously. We will use this to allow a team to forecast, to a percentage likelihood, the number of items the team is likely to complete in the forecasted period (in this instance 12 weeks):
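For readers who have not seen one before, a "how many" Monte Carlo simulation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the logic of any specific tool: the weekly throughput history is invented, each simulated week simply resamples one historical week, and an 85% likelihood is read as "85% of simulated runs completed this many items or more".

```python
import random


def simulate_totals(weekly_throughput, weeks, runs=10_000, seed=None):
    """Simulate `runs` possible futures by resampling past weekly throughput."""
    rng = random.Random(seed)
    return sorted(
        sum(rng.choice(weekly_throughput) for _ in range(weeks))
        for _ in range(runs)
    )


def items_at_confidence(sorted_totals, confidence):
    """Item count that at least `confidence`% of the simulated runs reached or beat."""
    index = (100 - confidence) * len(sorted_totals) // 100
    return sorted_totals[min(index, len(sorted_totals) - 1)]


# Hypothetical items completed per week over the last 12 weeks.
weekly_throughput = [3, 5, 8, 6, 4, 7, 9, 2, 6, 5, 7, 4]

simulated = simulate_totals(weekly_throughput, weeks=12, runs=10_000, seed=42)
forecast = {c: items_at_confidence(simulated, c) for c in (50, 70, 85, 95)}

for confidence, items in forecast.items():
    print(f"{confidence}% likelihood of completing {items} items or more in 12 weeks")
```

Note that a higher confidence level gives a lower (more cautious) item count, because more of the simulated futures have to reach it.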

It is important to note here that the historical data used as input to the forecast should contain the same mix of conditions as the future you are trying to predict. You also need to understand the variability in your system and whether it is the right amount or too much (check out Dan Vacanti's latest book if you want more information on this). Given nearly all our teams are stable and dedicated to an application/service/part of the journey, this is generally a fair assumption for us to make.

How many Features?

Now that we have our forecast for how many items, as well as our right-size for our Features, we can calculate how many Features we have capacity for. Assuming we are using our 85th percentile, we would do this via:

  1. Take the 85th percentile value in our ‘What will we get?’ forecast
  2. Divide this by our 85th percentile ‘right-size’ value
  3. If necessary, round this number down
  4. This gives us the number of ‘right-sized’ Features we have capacity for
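The steps above amount to a single integer division. A small worked example, using made-up numbers (a forecast of 100 items and a right-size of 12 child items) and a hypothetical helper name:

```python
def feature_capacity(forecast_items, right_size):
    """Number of right-sized Features that fit into the forecast item count."""
    if right_size <= 0:
        raise ValueError("right_size must be a positive number of child items")
    return forecast_items // right_size  # floor division rounds down


forecast_items = 100  # hypothetical 85th percentile 'What will we get?' value
right_size = 12       # hypothetical 85th percentile Feature size (child items)

capacity = feature_capacity(forecast_items, right_size)
print(f"Capacity for {capacity} right-sized Features")  # 100 / 12 = 8.33, rounded down to 8
```

Rounding down is the cautious choice: a third of a Feature is not something stakeholders can use.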

The beauty of this approach is that, unlike other methods which provide a single capacity value with no indication of the risk involved in that calculation, it allows teams to play around with their risk appetite. Currently this is set to 85%, but what if we were feeling more risky? For example, if we’ve paid down some tech debt recently that enables us to be more effective in delivery, then maybe 70% is better to select. Do you know of new joiners and people leaving your team in the coming weeks, and therefore need to be more risk averse? Then maybe we should be more conservative with 95%…
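To make the effect of risk appetite concrete, here is a rough Python sketch that recalculates Feature capacity at 70%, 85% and 95%. It assumes only the forecast confidence changes while the right-size stays fixed, which is one possible reading of the approach. The throughput history and right-size are invented, and the simulation is a simple resampling of past weeks.

```python
import random


def simulate_totals(weekly_throughput, weeks, runs, seed=None):
    """Resample past weekly throughput to simulate item totals for the period."""
    rng = random.Random(seed)
    return sorted(
        sum(rng.choice(weekly_throughput) for _ in range(weeks))
        for _ in range(runs)
    )


def items_at_confidence(sorted_totals, confidence):
    """Item count that at least `confidence`% of the simulated runs reached or beat."""
    index = (100 - confidence) * len(sorted_totals) // 100
    return sorted_totals[min(index, len(sorted_totals) - 1)]


weekly_throughput = [3, 5, 8, 6, 4, 7, 9, 2, 6, 5, 7, 4]  # hypothetical history
right_size = 12  # hypothetical right-size, in child items per Feature

totals = simulate_totals(weekly_throughput, weeks=12, runs=10_000, seed=7)
capacity_by_confidence = {
    confidence: items_at_confidence(totals, confidence) // right_size
    for confidence in (70, 85, 95)
}

for confidence, features in capacity_by_confidence.items():
    print(f"At {confidence}% confidence: capacity for {features} right-sized Features")
```

The more risk averse the team is, the fewer Features it commits to. The capacity can only stay the same or go down as the confidence level rises.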

Tracking Feature progress

When using data for planning purposes, it is also important that we are transparent about progress with existing Features and when they are expected to complete. Another part of the template teams can use is running a Monte Carlo simulation on their current Features. We visualise Features in their priority order in the backlog along with their remaining child count, with the team able to select a target date, percentile likelihood and, crucially, how many Features they work on in parallel. For a full explanation of this I recommend checking out Prateek Singh’s Feature Monte Carlo blog which, combined with Troy Magennis’ multiple feature forecaster, was the basis for this chart. The Feature Monte Carlo then shows, depending on the percentage confidence chosen, which Features are likely to complete on or before the selected date, which will finish up to one week after the selected date, and which will finish more than one week after the selected date:
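A simplified Feature Monte Carlo can be sketched as follows. This is not the implementation behind the chart or the tools it was based on. It is a rough illustration that assumes weekly throughput is resampled from history, that each week's completed items are shared one at a time across the highest-priority unfinished Features up to the WIP limit, and that the target is expressed as a week number rather than a date. All data and function names are hypothetical.

```python
import random


def simulate_completion_weeks(features, weekly_throughput, wip_limit,
                              runs=2000, max_weeks=104, seed=None):
    """features: (name, remaining child items) pairs in priority order.
    Returns {name: sorted completion weeks, one per simulated run}."""
    rng = random.Random(seed)
    weeks_by_feature = {name: [] for name, _ in features}
    for _ in range(runs):
        remaining = [count for _, count in features]
        finished_in = [0 if count == 0 else None for count in remaining]
        week = 0
        while None in finished_in and week < max_weeks:
            week += 1
            capacity = rng.choice(weekly_throughput)
            while capacity > 0:
                # Only the top `wip_limit` unfinished Features are worked on.
                active = [i for i, left in enumerate(remaining) if left > 0][:wip_limit]
                if not active:
                    break
                for i in active:
                    if capacity == 0:
                        break
                    remaining[i] -= 1
                    capacity -= 1
                    if remaining[i] == 0:
                        finished_in[i] = week
        for (name, _), finished in zip(features, finished_in):
            weeks_by_feature[name].append(max_weeks if finished is None else finished)
    return {name: sorted(weeks) for name, weeks in weeks_by_feature.items()}


def week_at_confidence(sorted_weeks, confidence):
    """Week by which `confidence`% of the simulated runs had finished the Feature."""
    rank = (confidence * len(sorted_weeks) + 99) // 100
    return sorted_weeks[max(rank, 1) - 1]


def status(week, target_week):
    if week <= target_week:
        return "on or before target"
    if week <= target_week + 1:
        return "up to one week after"
    return "more than one week after"


features = [("Feature A", 12), ("Feature B", 20), ("Feature C", 9)]  # hypothetical
weekly_throughput = [3, 5, 8, 6, 4, 7, 9, 2, 6, 5, 7, 4]             # hypothetical
target_week = 6

focused = {n: week_at_confidence(w, 85) for n, w in
           simulate_completion_weeks(features, weekly_throughput, wip_limit=1, seed=1).items()}
parallel = {n: week_at_confidence(w, 85) for n, w in
            simulate_completion_weeks(features, weekly_throughput, wip_limit=3, seed=1).items()}

for name, _ in features:
    print(name, "WIP 1:", status(focused[name], target_week),
          "| WIP 3:", status(parallel[name], target_week))
```

Running the two scenarios side by side makes the cost of parallel work visible: with a WIP limit of 1 the top priority Feature never finishes later than it does when three Features are in flight at once.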

Again, the team can play around with the different parameters here to understand which is the determining factor. In almost all cases, the answer is to limit your work in progress (WIP): stop starting and start finishing!

Aggregating information across teams

As the ASOS Tech blog has shared previously, we try to gather our teams at a cadence for our own take on quarterly planning (titled Semester Planning). We can use the techniques above to make clear what capacity a team has and, based on their current Features, what may continue into another quarter and/or have scope for reprioritisation:

Capacity for 8 ‘right-sized’ features with four features that are being carried over (with their projected completion dates highlighted)

Within our technology organisation we work with a Team > Platform (multiple teams) > Domain (multiple platforms) model — a platform could therefore leverage the same information across multiple teams (in a different section of a Miro board) to present their view of capacity across teams as well as leveraging Delivery Plans to show when (in terms of dates) that capacity may be available:

Domains are then also able to leverage the same information, rolling this info up one level further for a view across their Platform(s):

One noticeable addition at this level is the portfolio alignment value. This is where we look at what percentage of a Domain’s work is linked to our overall Portfolio Epics. These portfolio items ultimately represent the highest priorities for ASOS Tech and in turn directly align to strategic priorities, something which I have covered previously in this blog. It is therefore very important that we are aware of, and strike the right balance between, feature delivery, the needs of our platforms and tech debt/hygiene.
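As a simple illustration of what such a value could look like, the sketch below counts the share of a Domain's Features that are linked to a Portfolio Epic. Counting Features (rather than, say, child items) is an assumption, and the data and field names are made up.

```python
# Hypothetical Features for one Domain; portfolio_epic is None when not linked.
features = [
    {"title": "Feature A", "portfolio_epic": "Epic 1"},
    {"title": "Feature B", "portfolio_epic": None},
    {"title": "Feature C", "portfolio_epic": "Epic 2"},
    {"title": "Feature D", "portfolio_epic": "Epic 1"},
]


def portfolio_alignment(features):
    """Percentage of Features that are linked to a Portfolio Epic."""
    if not features:
        return 0.0
    linked = sum(1 for feature in features if feature["portfolio_epic"] is not None)
    return 100 * linked / len(features)


alignment = portfolio_alignment(features)  # 3 of 4 Features linked
print(f"Portfolio alignment: {alignment:.0f}%")
```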

These techniques allow us to present a data-informed, aligned view of capacity across our technology organisation whilst still allowing our teams the freedom in choosing their own way of working (aligned to agile principles).

Conclusion

Whilst we do not mandate a single way of working, there are some practices that need to be in place for teams/platforms to leverage this, these being:

  • Teams and platforms regularly review and move work items (User Stories, PBIs, Features, Epics, etc.) to in progress (when started) and done (once complete)
  • Teams regularly monitor the size (in terms of number of child work items) of Features
  • At all levels we always try to break work down to thin, vertical slices
  • Features are ‘owned’ by a single team (i.e. not shared across multiple teams)

All teams and platforms, regardless of whether they use Scrum, Kanban, XP, DevOps, blended methods, etc., should be doing these things already if they care about agility in their way of working.

Hopefully this blog has given some insight on how you can do capacity planning, at scale, whilst allowing your teams freedom to choose their own way of working. If you are wondering what tool(s) we use for this, we have a Power BI template that teams can download, connect to their Jira/Azure DevOps project and get the info. If you want, you can give this a go with your team(s) via the GitHub repo here (don’t forget to check against the pre-requisites!).

Let me know in the comments if you have any other approaches for capacity planning that allow teams freedom in their way of working…

About Me

I’m Nick, one of our Agile Coaches at ASOS. I help guide individuals, teams and platforms to improve their ways of working in a framework agnostic manner. Outside of work, my newborn son keeps me on my toes agility-wise. Here’s a recent photo of him and my wife Nisha (to my left) meeting the team…

ASOS are hiring across a range of roles in Tech. See all our open positions
