In Part 1 of our two-part “You Can’t Measure What You Can’t See: Driving Outcomes through Value Stream Maps” series, we covered how to create a value stream map and took a high-level look at the four metrics to track at each stage of your value stream. In this installment, we’re going to dig into how you can turn this data into insights to increase your DevOps maturity.
Innovation speed is the number one reason companies adopt DevOps. However, the Software Delivery Performance Matrix shows that speed isn’t the only benefit. In fact, the highest performing DevOps teams balance speed with quality, increasing not only how fast they work, but also how well they work.
The question becomes, “how do you optimize speed and quality in parallel?” Luckily, value stream maps give you the data you need to address both. The strategies detailed below will help you get started on increasing both speed and quality in your development lifecycle. Remember, it’s important to focus on one area of improvement at a time—if you focus on everything, you focus on nothing. While elite organizations have high performance across the board, they got there by making step-wise improvements over time. Treat your journey the same way; marginal improvement in one area will ultimately have an outsized impact on overall business outcomes.
Side note: If you haven’t mapped out your value stream, we recommend starting your optimization journey here—understanding your value stream is the essential first step to realizing speed and quality performance gains. Everything is theory until you have the data—without it, you can’t accurately identify inefficiencies, reduce waste, or drive improvement.
1: Reduce lead time.
Lead time—the time it takes to release a feature to production after development is complete—is the first indicator of speed. Remember, your team isn’t only delivering work, you’re delivering capabilities that increase business value, so the longer you delay releasing your work, the longer it takes the business to realize value.
Understanding your current lead time is step 1. Step 2 is benchmarking it against industry peers, your team’s previous performance, or even other development teams in your organization. These comparative metrics give you a baseline for where your team stands and also help you identify what lead time you should be able to achieve. The top 13% of Salesforce organizations have a lead time of less than one hour, while high performing teams have a lead time of less than one day. How do they do it?
While there are many factors, two strategies for reducing lead time may include:
- Writing smaller user stories. DevOps experts recommend promoting work upstream at least daily to ensure this flow is steady, but if user stories are too big, developers can’t promote as frequently. Smaller, bite-sized chunks of work can typically be completed, reviewed, and tested faster because they are generally less complex and touch fewer related but tangential pieces of metadata at once. This makes the flow of work more consistent and digestible, thereby reducing time to value for any one user story.
- Reducing the number of stories developers work on at once. We’ve all been told time and again that multitasking is far less effective than working on one task at a time. The same is true for development teams: developers should focus on finishing and delivering one user story before starting another. When developers focus on pushing one piece of work through rather than splitting their time between multiple user stories, they help keep the entire development lifecycle moving forward.
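To track whether these strategies are working, you need lead time as a number you can trend over time. Here’s a minimal sketch in Python, assuming you can export dev-complete and production-release timestamps from your work-tracking or CI/CD tool (the field names `completed_at` and `released_at` are hypothetical placeholders for whatever your tool provides):

```python
from datetime import datetime
from statistics import median

def lead_times_hours(stories):
    """Hours between dev-complete and production release for each story."""
    return [
        (s["released_at"] - s["completed_at"]).total_seconds() / 3600
        for s in stories
    ]

# Example export: one story released a day after completion, one within the hour.
stories = [
    {"completed_at": datetime(2023, 5, 1, 9, 0), "released_at": datetime(2023, 5, 2, 9, 0)},
    {"completed_at": datetime(2023, 5, 3, 14, 0), "released_at": datetime(2023, 5, 3, 15, 0)},
]

# Median is more robust than the mean when a few stale stories skew the data.
print(median(lead_times_hours(stories)))  # → 12.5
```

Tracking the median (or a percentile) per sprint gives you the baseline to benchmark against the “less than one day” mark mentioned above.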
2: Increase deployment frequency.
Deployment frequency is the second indicator of speed, as it measures how often a team releases work to production. Higher performing organizations are far more likely to integrate developers’ changes on an ongoing basis, and when those changes are integrated more frequently, they are likely to be less complex and less costly.
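Deployment frequency is also straightforward to compute from a list of release dates. A minimal sketch, assuming you can pull production deployment dates from your pipeline’s history (the sample dates below are illustrative only):

```python
from datetime import date

def deploys_per_week(deploy_dates, start, end):
    """Average production deployments per week over a reporting window."""
    weeks = (end - start).days / 7
    in_window = [d for d in deploy_dates if start <= d <= end]
    return len(in_window) / weeks

# Four production deployments over a two-week window.
deploys = [date(2023, 5, 1), date(2023, 5, 3), date(2023, 5, 8), date(2023, 5, 12)]
print(deploys_per_week(deploys, date(2023, 5, 1), date(2023, 5, 15)))  # → 2.0
```

Plotting this per window shows whether your cadence is actually trending upward rather than spiking around release crunches.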
However, if teams just began moving faster and merging code into production more often, quality could plummet. Instead, increasing deployment frequency is best accomplished with guardrails provided by DevOps methodologies and tools.
Strategies that can contribute to increased deployment frequency include:
- Release smaller batches. You can think of deployment frequency as somewhat analogous to lead time; where lead time can be decreased by working on smaller units of work (user stories) and fewer at a time, deployment frequency can be increased by releasing smaller batches of work at each deployment. These smaller batches allow teams to have better and more effective test coverage, and to roll back changes more easily if necessary.
- Automate testing and deployments. When processes are repeatable and follow a prescribed set of steps each time, they should be automated rather than manual. By moving to automated processes, your team will be able to move faster while reducing potential for human error.
You might notice that teams who deploy more frequently tend to see lower risk and faster time to value. That’s because while each of these strategies can contribute to increased deployment frequency, they may also help to ensure increased quality at the same time. Building higher levels of security and compliance into your DevOps processes ultimately enables a faster, more reliable, and more consistent value flow from IT to end user.
3: Decrease change failure rate.
Ideally, errors would never make it to production to disrupt any business process or workflow. But, as this is real life and every part of the development lifecycle requires human input at one point or another (either doing the work or setting up the automated systems and tests), errors do happen. The first quality indicator—change failure rate—measures the percentage of production releases that cause deployment errors which result in business disruptions. This measurement helps illustrate to delivery leaders how their work affects the business.
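The metric itself is a simple ratio. As a sketch, assuming you can count total production releases and the subset that caused a business-disrupting failure over the same period:

```python
def change_failure_rate(total_releases, failed_releases):
    """Percentage of production releases that caused a business-disrupting failure."""
    if total_releases == 0:
        return 0.0  # avoid dividing by zero when a window has no releases
    return 100 * failed_releases / total_releases

# Example: 3 disruptive failures out of 40 releases this quarter.
print(change_failure_rate(40, 3))  # → 7.5
```

The judgment call is in the numerator: agree up front on what counts as a “failure” (rollback, hotfix, incident ticket) so the rate is comparable across teams and quarters.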
Decreasing change failure rate and maintaining a low number is critical for the business—not just in terms of value creation or loss in the moment, but for overall end user trust and business continuity. Customer exposure to bugs and any amount of downtime due to breaking changes can be incredibly costly.
That said, lowering this number can be complex because it’s tied to factors across people, process, and product. Let’s take a look at a few methods for addressing each:
- People: First, identify the manual steps in your development process, and whether that work can be automated. Where there is more manual work—i.e., manually packaging or testing high volumes of work—there are greater chances for error. These teams often see the highest change failure rates. Automating as many of these processes as possible (such as automated regression testing and CI/CD) can remove a large portion of breaking changes due to human error.
- Process: Remember our discussions about how to increase innovation speed and velocity? These are process optimizations. By increasing deployment frequency and reducing deployment sizes, QA teams are better able to test the work that comes across their desks, meaning errors in code are more likely to be caught. Additionally, implementing quality gates—such as disallowing changes made directly in production—will go a long way toward reducing failures.
- Product: Tech debt is notorious for causing ongoing problems. Keep your codebase scalable and current: refactor regularly so code stays simple and old, brittle code gets removed.
Change failure rate needs to be as close to zero as possible. Decreasing it mitigates risk and is essential for maintaining business value over time.
4: Decrease mean time to recovery.
When errors are deployed, how long does it take your team to troubleshoot, fix, or rollback those changes? The answer to that question is your mean time to recovery. You can think of change failure rate and mean time to recovery as a pair. Failures should happen as infrequently as possible, but when they do happen, your team should be able to fix them quickly.
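As with the other metrics, mean time to recovery falls out of two timestamps per incident. A minimal sketch, assuming your incident tracker can export when a failure was detected and when service was restored (the field names here are hypothetical):

```python
from datetime import datetime

def mttr_hours(incidents):
    """Mean hours from failure detection to recovery across incidents."""
    durations = [
        (i["recovered_at"] - i["detected_at"]).total_seconds() / 3600
        for i in incidents
    ]
    return sum(durations) / len(durations)

# Example export: one two-hour outage and one one-hour outage.
incidents = [
    {"detected_at": datetime(2023, 6, 1, 10, 0), "recovered_at": datetime(2023, 6, 1, 12, 0)},
    {"detected_at": datetime(2023, 6, 5, 9, 0), "recovered_at": datetime(2023, 6, 5, 10, 0)},
]
print(mttr_hours(incidents))  # → 1.5
```

Reviewing this alongside change failure rate keeps the pair honest: a low failure rate with a high MTTR still means long, costly disruptions when things do break.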
A short mean time to recovery is essential for many of the reasons discussed above—not only does downtime potentially erode customer trust, but it also contributes to low productivity and even lost revenue when systems are unreliable. Clearly, your organization wants to avoid all of these outcomes.
Here are a few practices to think about in order to decrease mean time to recovery:
- Version control. When a failure occurs, are your developers able to identify the root cause of the issue in a timely manner? CI/CD tools with version control provide an audit trail of all the changes included in a release, which can help developers track down what went wrong more efficiently. Seeing the difference between the “original” state and the current failure state can also help to indicate what a potential fix might include, or serve as a baseline for what state to roll back to while fixes are in progress.
- Rollback drills. Speaking of rollbacks—practice makes perfect. Consider running rollback drills with your team so they understand how to “undo” a production failure, which types of changes to roll back, and how to roll fixes forward when appropriate.
Of course, speed and quality are intertwined. If your team has reduced lead time and is deploying frequently, the odds that you can also reduce your mean time to recovery are greater because it’s likely that the code developers will need to assess is less complex and can therefore be fixed faster.
Speed + Quality = Elite Performance
Teams who regularly monitor development metrics perform better, but more effective performance isn’t based on monitoring metrics alone. The crucial step is turning insight into action and implementing strategies that help your team improve on areas of weakness and scale areas of strength.
There are many ways to approach a problem and optimize a team’s performance, but realizing performance gains in all areas is more likely to occur when a comprehensive CI/CD tool is implemented and common DevOps practices are followed.
Speed at the price of quality is no longer a tradeoff. With a strong DevOps culture and value stream mapping bringing visibility to your processes and their effectiveness, it’s clear that you can “optimize for stability without sacrificing speed.”