Key Engineering Metrics in Software Delivery
Engineering metrics, such as lead time and deployment frequency, help teams understand their overall engineering performance. Most importantly, they provide organizations with an objective way to measure and improve their software delivery. They help teams quickly identify bottlenecks and inefficient processes in their development pipeline and create a plan to improve their daily work.
Why are engineering metrics important in software production?
Tracking and measuring the right metrics can guide teams along the path to improving their DevOps and engineering performance, as well as help them create a happier and more productive work environment.
Other benefits of improving visibility into engineering metrics include:
- Better planning: Teams can use data to drive engineering decisions and make software delivery more predictable. They can better estimate project timelines and the resources needed to complete development tasks.
- Development visibility: Highly visible and widely trusted metrics align team members on their goals. Creating a shared view of reality makes it easier for teams to uncover potential issues and iterate on their fixes faster.
- Efficient delivery: By analyzing and minimizing delays in the delivery pipeline, teams can improve the flow of work from development to production.
- Improved developer experience: By identifying and improving areas of weakness in the development life cycle, engineers will experience less frustration and fewer blockers during their daily work. They have access to the right tools and avoid unnecessary bureaucracy, handoffs, and disruptions.
Ultimately, engineering metrics—when combined with a culture of psychological safety and transparency—can improve team productivity, development speed, and code quality.
Characteristics of useful engineering metrics
Great engineering metrics should follow the ARCS framework:
- Actionable: They provide insight into possible solutions for challenges facing the team. Great metrics not only shed light on obstacles and inefficiencies, they also offer teams a clear path for experimentation and improvement.
- Relevant: Metrics should be important to the long-term goals of the team and organization. Teams should avoid vanity metrics and reduce noise by compiling the most meaningful metrics for their organization.
- Continuous: Continuous improvement is a core idea in Lean methodology that advocates for incremental improvement in an organization’s performance through continuous measuring and learning. Goals should be continuously updated and teams should look to improve the status quo. Real-time, continuous metrics provide faster feedback and make data actionable sooner.
- Shared: It’s important that everyone on a team or in an organization has access to their team’s metrics. Highly visible metrics help teams to create a shared view of reality, meaning they can collaborate with transparency and trust using a single source of truth.
A secondary framework, ORCA, explains how teams should treat the underlying data powering their engineering metrics:
- Objective: The goal should be to eliminate or minimize bias while giving teams a fair and repeatable way to understand their engineering systems. Objectivity also requires teams to recognize the limits of their metrics and understand what context those metrics don’t show.
- Reliable: Teams should trust their metrics to measure what they say they are measuring. They should be automated and repeatable to minimize manual work and reduce reporting errors.
- Collective: Engineering metrics should measure organizations and their systems, rather than individuals. They should aggregate data and avoid personally identifiable data. Micromanaging individuals creates a culture of distrust, fear, and stress; over the long-term, it negatively impacts team happiness and productivity.
- Auditable: Engineering metrics should rely on clear definitions with transparent data sources and algorithms. Every team member should be able to follow clear paths between raw data and metrics in their dashboards. For example, you should be able to trace lead time from each commit to its deployment to production.
Without guardrails in place and a deep understanding of how to use their metrics for team improvement, engineering metrics can have unintended consequences. At their worst, they can feel irrelevant to the goals of the company. They can also be untrustworthy, competitive, or hidden from teams. Over time, teams create a culture of distrust and fear if they feel they’re being judged against inaccurate, unfair, or highly subjective metrics.
That’s why engineering metrics work best when combined with a strong culture of psychological safety, learning, and transparency. Choosing the right metrics and implementing them in the right way can empower teams to continuously improve their craft.
Introducing DORA metrics: DevOps metrics explained
Many of the engineering metrics used in software development today were developed by DORA, short for DevOps Research and Assessment, a team of Google researchers tasked with uncovering the most important traits of high performing engineering organizations.
The world-renowned DORA team publishes the annual State of DevOps Report, an industry study surveying software development teams around the world. Over the last few years, DORA’s research has set the industry standard for measuring and improving DevOps performance.
What are DORA metrics and why are they important?
The DORA metrics combine measures of development velocity (lead time and deployment frequency) and development quality (mean time to recovery and change failure rate). By combining these metrics, teams can understand how changes in product stability affect development throughput, or vice versa.
The DORA metrics have helped standardize engineering metrics in the tech industry. Engineering teams can consistently and accurately benchmark their progress against other companies in the market. Over the last few years, the DORA metrics have shown a widening performance gap in software development. Compared to the lowest performing teams, elite engineering teams have:
- 973x more frequent deployments
- 6570x faster lead times
- 6570x faster time to recover from incidents
- 3x lower change failure rates
According to the DORA metrics, improving engineering performance requires teams to both increase the speed of their deployments and improve the stability of their software.
What are the four key DORA metrics?
There are four key DORA metrics developed by Google to measure engineering performance:
- Lead time for changes: the time elapsed between committing code and deploying it to production
- Deployment frequency: how often an organization deploys to production
- Change failure rate: the percentage of deployments causing failures in production
- Time to restore service: how long it takes an organization to recover from failures in production, more commonly known as mean time to recovery
Lead time for changes and deployment frequency help teams understand their development velocity, including their continuous integration and deployment capabilities. Change failure rate and time to restore service measure code and product quality by tracking outages, downtime, and other production issues.
Lead time for changes: DORA metric explained
Lead time for changes, often referred to simply as lead time, is the time required to complete a unit of work. It measures the time between the start of a task—often creating a ticket or making the first commit—and the final code changes being implemented, tested, and delivered to production.
Why is lead time for changes important?
Lead time is a powerful metric for understanding points of friction and bottlenecks within the development pipeline. It provides insight into how long it takes for teams to complete their work and how quickly they deliver value to their customers.
In many ways, lead time is a powerful proxy for friction or frustration during the software development life cycle.
Short lead times mean an organization rapidly designs, implements, and deploys new features and updates to their customers. Code moves efficiently through the delivery pipeline, from first code to first review to deployment. Teams provide developers with the time and tools needed to build and test their work, while also providing resources needed to safely and easily deploy code once it's merged and approved.
Long lead times indicate delays during the development process. Developers could be overburdened with meetings or distractions. They may be using outdated tooling, battling technical debt, or wrangling difficult merge conflicts.
Long lead times can also indicate delays and bottlenecks during the deployment stage. In this situation, developers are able to easily merge their changes to the main branch, but deployments are unsafe, risky, or require too much coordination between different teams or team members.
How to calculate lead time for changes
There are several ways to calculate lead time depending on how your team marks the beginning and end of each unit of work. Moreover, lead time can be calculated for any unit of work, such as a story, task, or feature.
The DORA metrics measure lead time as the time it takes to get a commit into production. To measure lead time, calculate the time elapsed for each commit between its creation and its release. Find the median time across commits to reveal an organization-wide lead time.
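As a rough illustration of the calculation described above, the following Python sketch computes a median commit-to-deploy time. The function name `median_lead_time` and the sample timestamps are hypothetical; in practice the timestamp pairs would come from your source control and deployment tooling.

```python
from datetime import datetime, timedelta
from statistics import median


def median_lead_time(changes):
    """Median time between commit creation and release.

    `changes` is a list of (committed_at, deployed_at) datetime pairs.
    """
    if not changes:
        raise ValueError("need at least one deployed commit")
    # Work in seconds so the median is computed on plain numbers.
    seconds = [(deployed - committed).total_seconds() for committed, deployed in changes]
    return timedelta(seconds=median(seconds))


# Hypothetical sample: three commits released 2, 5, and 26 hours after creation.
sample = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 11, 0)),
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 15, 0)),
    (datetime(2024, 3, 5, 8, 0), datetime(2024, 3, 6, 10, 0)),
]
print(median_lead_time(sample))  # 5:00:00
```

Using the median rather than the mean keeps a single slow commit, such as the 26-hour one here, from dominating the result.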
The DORA team divides teams into four categories based on their lead time:
- Elite: less than one hour
- High: between one day and one week
- Medium: between one month and six months
- Low: more than six months
The DORA metrics use the first commit as the starting point because it can be easily and objectively measured using source control data and it captures most of the development work required for a task.
Some engineering teams prefer to measure lead time as the time between creating a task, often in Jira or Clickup, and deploying related changes to production. Calculating lead time beginning when you create a task helps teams understand the full feature life cycle, including the design and planning stages. However, lead time can be skewed by large ticket backlogs or project management techniques. It can also require manual work to link tasks with commits or involve developer workflow changes to tag pull requests with special task-related labels.
How to improve lead time for changes
To improve lead time, you should first identify your team’s most significant time constraint during the development life cycle. For example, if your team requires a full day to deploy changes to production but only an hour for code reviews, it’s more effective to start exploring ways to improve the release process, rather than trying to decrease review time by a few minutes.
There are a few common causes of delays to consider when improving lead time:
- Decrease the number of handoffs: Too many team dependencies or handoffs between team members can delay important work. Each handoff requires team members to context switch, schedule meetings, or wait for free time. Self-service environments, pre-approved tooling, and up-to-date documentation can greatly reduce the number of handoffs required to write and test code.
- Limit work in progress: Too much work in progress means developers must balance multiple competing priorities, forcing them to break focus and fragmenting their time. Limiting work in progress can improve focus and speed.
- Avoid late stage rework: Late stage rework can be a sign of changing requirements, poor planning, or lack of early testing. Rework is especially time-consuming because it requires rewriting code after hundreds or thousands of lines have already been written. To avoid rework, teams should provide earlier and faster feedback to developers—often with shift-left testing and continuous integration to quickly flag potential code issues.
- Improve automation: Automated workflows can help reduce manual workloads required to get code into production, freeing up DevOps engineers’ time to focus on other improvement tasks. Automated tests and shift-left security can also reduce the time your team spends on repetitive tasks and identify issues earlier in the development process when they are easier to fix.
Deployment frequency: DORA metric explained
Deployment frequency is a measure of how often your organization deploys code to production. It helps teams understand the deployability of their codebase and the health of their CI/CD pipelines. When teams have a higher deployment frequency, they maintain their codebase in a deployable state and have the necessary automations and workflows to quickly and easily deploy code to production.
Why is deployment frequency an important metric?
Releasing to production more frequently is often an indicator that a team consistently ships new features and product updates, providing value to their customers faster. By enabling more frequent deployments, teams enable faster feedback and faster time to market, while avoiding painful backlogs and delays.
However, if releases are too frequent, quality issues may arise without automated and robust testing. To avoid releasing low quality code to production, it’s important to measure deployment frequency alongside other software stability metrics.
Low deployment frequency typically indicates delays in the delivery pipeline, either before merging code into the main branch or during the deployment step.
Complex merge conflicts, often caused by tightly-coupled architecture or long-lived feature branches, can decrease the number of changes merged into the main branch. Change approval boards and slow reviews can also create a bottleneck of changes when developers try to merge them. With fewer changes to production code, teams deploy less frequently.
Even after developers merge their code into the default branch, painful or complex deployments can lower deployment frequency. Slow builds and flaky tests can delay deployments or push teams to avoid deployments altogether. When deployments are needlessly complex, teams often wait to deploy code on specific days of the week with a dedicated deployment team—creating a significant choke point in the development pipeline.
How to calculate and use deployment frequency
According to the DORA team, deployment frequency measures the number of deployments in a given time period. For example, elite teams deploy code multiple times per day, while low performers deploy code fewer than once every six months.
DORA divides teams into four categories based on their deployment frequency:
- Elite: multiple times per day (on-demand)
- High: between once per week and once per month
- Medium: between once per month and once every six months
- Low: fewer than once every six months
Engineering teams can also calculate deployment frequency based on the number of developers. Simply take the number of deployments in a given time period and divide by the number of engineers on your team to calculate deployment frequency per developer. For example, a high performing team might deploy to production three times per week per developer.
Although less widely accepted, some teams measure deployment frequency as the number of opportunities to deploy to production compared to the actual number of deployments. For example, if your team merges four pull requests into the main branch, but only deploys those changes after the final merge, then your deployment frequency would be 25% (one deployment divided by four opportunities).
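The two alternative calculations above can be sketched in a few lines of Python. The function names and the sample counts are hypothetical and only illustrate the arithmetic.

```python
def deployments_per_developer(deployments, developers):
    """Deployments in a time period divided by the number of engineers."""
    if developers <= 0:
        raise ValueError("developers must be positive")
    return deployments / developers


def deployment_opportunity_rate(deployments, opportunities):
    """Percentage of deployment opportunities (e.g. merges) that were deployed."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return 100 * deployments / opportunities


# Hypothetical week: 15 deployments by a team of five engineers.
print(deployments_per_developer(15, 5))    # 3.0

# The example from the text: four merged pull requests, one deployment.
print(deployment_opportunity_rate(1, 4))   # 25.0
```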
How to improve deployment frequency
Improving deployment frequency requires teams to keep their codebase in a deployable state more often. To improve deployment frequency, teams should:
- Decrease batch size: Deployment frequency is often considered a proxy for batch size—the size of a single, discrete unit of work. Large, unwieldy code changes can create delays and make deployments less frequent. Smaller units of work are easier to merge and deploy because they are simpler to review and less likely to conflict with other ongoing changes to the codebase.
- Adopt trunk-based development: Trunk-based development is the practice of merging small and frequent code changes into a single branch, known as the trunk branch. By avoiding long-lived feature branches, teams can consistently merge new changes into their main branch.
- Practice continuous integration: Continuous integration and deployment is an agile methodology that encourages teams to merge and release their software more frequently. It requires teams to implement consistent workflows and tooling for building and testing their software, so developers can easily integrate and validate their changes. As a result, developers can deploy their changes more frequently.
- De-risk deployments with continuous deployment: Better CI/CD pipelines also make deployments safer and more reliable by decreasing the manual work needed to release changes to production environments. On high performing teams, deployments are streamlined with automated and self-service CI/CD tooling, often allowing anyone on the development team to safely release code.
Mean time to recovery (MTTR): DORA metric explained
MTTR, short for mean time to recovery and also known as time to restore service, is the time required to get an application back up and running after production downtime, degraded performance, or outage. It measures the average time between services failing and being restored, highlighting the lag between identifying and remediating issues in production.
Why is MTTR important?
Production failures will inevitably occur at every engineering organization. They are important for organizational learning because they provide teams with the opportunity to continuously improve their systems and workflows.
To minimize potential issues during the learning process, teams should have systems in place to quickly identify, triage, and resolve issues in production. As with technical debt, teams should also set aside time required to improve the long-term stability of their application and reduce MTTR.
Low MTTR indicates teams can quickly mitigate incidents in production. Incident response teams have the tools to alert the right team members, analyze incidents with ample telemetry, and quickly deploy fixes.
High MTTR threatens the customer experience. It can indicate a buggy, unstable, or unusable product. While a lack of new features or product updates can sometimes drive customers to competitors over the long-term, high MTTR can threaten the user experience of existing customers in the short-term.
How is MTTR calculated?
Mean time to recovery is the time between the start of an issue in production and the end of the incident in production.
To calculate MTTR, track the total time spent on unplanned outages, then divide by the number of incidents. The result will be the average time per incident.
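A minimal Python sketch of this average, assuming each outage has already been recorded as a duration (the function name and sample durations are hypothetical):

```python
from datetime import timedelta


def mean_time_to_recovery(outages):
    """Total unplanned outage time divided by the number of incidents."""
    if not outages:
        raise ValueError("need at least one incident")
    return sum(outages, timedelta()) / len(outages)


# Hypothetical incidents lasting 30, 90, and 60 minutes.
outages = [timedelta(minutes=30), timedelta(minutes=90), timedelta(minutes=60)]
print(mean_time_to_recovery(outages))  # 1:00:00
```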
An incident must be both high-stakes (it impacts customers) and urgent (it needs to be resolved immediately) to be considered a failure. Cyberattacks, DNS issues, and network outages are all considered failures; however, a chart missing a few pixels of padding would likely not be important when calculating MTTR, unless it's causing severe usability problems.
DORA divides teams into four categories based on their mean time to recovery:
- Elite: less than one hour
- High: less than one day
- Medium: between one day and one week
- Low: more than six months
It’s important to remember that MTTR does not measure the severity of incidents or the number of affected customers, which can be impacted by the timing of the outage, the use of feature flags, and the degree to which performance has degraded. As a result, some teams use a weighted average when calculating their MTTR. For example, teams might double the time spent resolving incidents during peak hours when calculating MTTR compared to incidents during non-peak hours.
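One possible reading of the peak-hours example is sketched below: each peak-hour incident counts for double its duration, and the adjusted total is still divided by the number of incidents. This is an assumption about how the weighting is applied; a strict weighted mean would divide by the sum of the weights instead. The function name and the `peak_weight` parameter are hypothetical.

```python
from datetime import timedelta


def peak_weighted_mttr(incidents, peak_weight=2.0):
    """Average recovery time with peak-hour incidents scaled by `peak_weight`.

    `incidents` is a list of (duration, during_peak) tuples.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    total = timedelta()
    for duration, during_peak in incidents:
        # Assumption: peak-hour time is multiplied, off-peak time is unchanged.
        total += duration * (peak_weight if during_peak else 1.0)
    return total / len(incidents)


# Hypothetical incidents: one 60-minute peak outage, two 30-minute off-peak outages.
incidents = [
    (timedelta(minutes=60), True),
    (timedelta(minutes=30), False),
    (timedelta(minutes=30), False),
]
print(peak_weighted_mttr(incidents))  # 1:00:00 (the unweighted average is 0:40:00)
```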
How to improve MTTR
Reducing MTTR can help teams save time and minimize firefighting. There are two main ways to decrease MTTR:
- Reduce the number of severe issues in production
- Resolve issues in production faster
First, think long-term when improving MTTR. Teams should uncover the underlying causes of application failures and implement preventative measures to avoid similar failures in the future. Ultimately, the best way to improve MTTR in the long-term is to prevent recurring issues rather than rely on quick fixes and band-aids.
Teams can adopt a principle known as DRI—don’t-repeat-incidents. After each incident, they should run blameless retrospectives to identify their causes and develop a strategy to prevent similar incidents. Teams should set aside time for long-term fixes during their sprint or project planning.
Second, teams should create robust incident management plans. Creating strategies for handling outages before your team experiences production failures can improve response times and reduce stress during an outage.
Incident management plans should improve the efficiency of your responses and help your team find production failures faster. They require insights, data, and telemetry—all coordinated in a timely fashion for the right people.
A great incident management plan requires teams to:
- Assign an incident commander. They can be the first person to respond to an issue, or even a rotating or permanent role. The incident commander is responsible for coordinating response activities and sharing information between team members. For example, many incident commanders will create temporary channels in Slack or Teams for each incident to streamline team collaboration.
- Invest in cross-training and knowledge transfer. Create runbooks and continuously update documentation so anyone on a team can respond to an outage effectively. The goal is to reduce dependencies on only a few team members during incidents and empower every engineer to assist if needed.
- Improve visibility. Teams need to quickly find what’s causing an outage, create hypotheses for a fix, and test their solutions. They can shorten this process with real-time data flows, better context, and robust monitoring using tools like DataDog and Monte Carlo.
- Fine-tune alerting thresholds. Many teams only alert people when there’s an outage; however, teams can adjust their alerts to notify team members when certain metrics are approaching dangerous thresholds. With more advance notice, teams can prepare a fix while the situation is deteriorating but before it reaches catastrophic failure. At the same time, teams should avoid alert fatigue by suppressing alerts during maintenance and routing alerts to the right teams or engineers.
- Practice failure with chaos engineering. Chaos engineering helps teams identify areas of weakness in their incident response plans and provides an opportunity to rehearse their incident management—e.g. how to collaborate, fine-tune alerting tools, and so on.
- Use feature flags and blue-green deployments. Teams can quickly roll back or turn off problematic changes with feature flags. They can test changes on a small group of customers before fully rolling out their changes, limiting the scope of any production failures. With blue-green deployments, they can switch traffic back to the previous environment if a new release misbehaves.
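To make the feature flag idea concrete, here is a minimal sketch of a percentage rollout with a kill switch. It is not the API of any particular feature flag product; the function names, the flag name, and the hashing scheme are all assumptions chosen for illustration.

```python
import hashlib


def rollout_bucket(flag, user_id):
    """Map a user to a stable bucket from 0 to 99 for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag, user_id, rollout_percent, kill_switch=False):
    """Return True if the flagged change should be shown to this user."""
    if kill_switch:
        # Turning the flag off disables the change for everyone immediately.
        return False
    return rollout_bucket(flag, user_id) < rollout_percent


# Hypothetical usage: expose a change to roughly 10% of users.
enabled_users = [u for u in ("user-1", "user-2", "user-3")
                 if is_enabled("new-checkout", u, rollout_percent=10)]
```

Hashing the flag name together with the user ID keeps each user in the same bucket across requests, so raising `rollout_percent` only adds users and never flips existing ones back and forth.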
Other definitions of MTTR
Although MTTR is one of the most widely used metrics in engineering, there is often confusion about what the “R” represents. In addition to mean time to restore, there are two other usages of MTTR: mean time to respond and mean time to resolve.
Mean time to respond is the time between first alert and fix. It excludes alerting lag time and measures the efficiency of your team’s response after they have been notified of an issue.
Mean time to resolve (or resolution) is the time to detect, diagnose, and fix an incident, including the time required to improve long-term performance. It measures the time required to fix an issue in production, as well as the time required to implement additional measures to prevent the issue from occurring again.
Understanding the different meanings behind the ‘R’ in MTTR is important, as each option has slightly different meanings within software engineering. The most common usage of MTTR refers to mean time to restore, although all three metrics provide additional context for your team’s incident response.
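The difference between the three definitions is easier to see when they are computed from the same incident timeline. The sketch below assumes four timestamps per incident; the field names, and the choice to measure restore and resolve from the start of the issue, are illustrative assumptions rather than a standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started_at: datetime    # when the issue began in production
    alerted_at: datetime    # when the team was first alerted
    restored_at: datetime   # when service was restored
    resolved_at: datetime   # when follow-up prevention work was finished


def _mean(durations):
    return sum(durations, timedelta()) / len(durations)


def mttr_variants(incidents):
    """Return mean time to respond, restore, and resolve for the incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    return {
        # Respond: first alert to fix, excluding alerting lag.
        "respond": _mean([i.restored_at - i.alerted_at for i in incidents]),
        # Restore: start of the issue to service restored.
        "restore": _mean([i.restored_at - i.started_at for i in incidents]),
        # Resolve: start of the issue to long-term prevention work completed.
        "resolve": _mean([i.resolved_at - i.started_at for i in incidents]),
    }


# Hypothetical incident: 10 minutes of alerting lag, restored after one hour,
# prevention work finished a day after the issue began.
incident = Incident(
    started_at=datetime(2024, 3, 4, 10, 0),
    alerted_at=datetime(2024, 3, 4, 10, 10),
    restored_at=datetime(2024, 3, 4, 11, 0),
    resolved_at=datetime(2024, 3, 5, 10, 0),
)
print(mttr_variants([incident]))
```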
Change failure rate explained
Change failure rate is the rate at which deployments to production lead to incidents. While MTTR measures your team’s ability to mitigate incidents, change failure rate measures your team’s ability to prevent issues from ever reaching production.
Why is change failure rate important?
Production failures and incidents are an important part of organizational learning. Even high-performing teams will experience production outages and downtime. Ultimately, the goal of measuring change failure rate is not to blame or shame teams; instead the goal is to learn from failure and build more resilient systems over time.
Engineering teams can achieve lower change failure rates by building more robust CI/CD pipelines with better automation, achieving higher test coverage, and encouraging frequent code reviews. Teams should strive to catch bugs and potential issues earlier in the development cycle before they reach production environments.
How is change failure rate calculated?
To measure change failure rate, calculate the percentage of deployments that cause production failures. For example, if your team had five releases in a week and two of them caused outages, the change failure rate would be 40%.
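The same arithmetic as a short Python sketch (the function name is hypothetical):

```python
def change_failure_rate(deployments, failed_deployments):
    """Percentage of deployments that caused a failure in production."""
    if deployments <= 0:
        raise ValueError("deployments must be positive")
    if not 0 <= failed_deployments <= deployments:
        raise ValueError("failed_deployments must be between 0 and deployments")
    return 100 * failed_deployments / deployments


# The example from the text: five releases in a week, two of which caused outages.
print(change_failure_rate(5, 2))  # 40.0
```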
It’s important to remember that change failure rate does not measure failures that are caught by testing and patched before deployment. Change failure rate only refers to incidents after deployments.
According to DORA, change failure rate tends to be similar among high, medium, and low performing teams, but decreases significantly for elite teams:
- Elite: 0% to 15%
- High: 16% to 30%
- Medium: 16% to 30%
- Low: 16% to 30%
Change failure rate is often considered to be one of the more difficult metrics to calculate because it requires teams to label each release and incident, then manually associate them. Some services, such as CircleCI, allow teams to use failed builds and workflows as an automated and simple proxy for change failure rate.
How do you improve change failure rate?
There are three main ways to improve change failure rate:
- Provide faster and earlier feedback: Reducing change failure rate requires teams to provide faster and earlier feedback to engineers during development. Teams should strive to improve and strengthen code quality checks, while keeping them lightweight and easy to use. Teams can also create better feedback loops with shift-left testing, test automation, and more robust testing environments. Earlier and automated testing flags problematic code changes and pull requests before they are released to production and when they are easier and cheaper to fix. A strong culture of code reviews and team feedback can also identify design and integration issues that may go undetected by automated testing.
- Pay down technical debt: Teams can also consistently pay down technical debt to avoid unexpected behavior when pushing new changes. Legacy code can add complexity to the development process, allowing hidden or obscure bugs and issues to interfere with your most recent releases.
- Invest in loosely coupled architecture: Similarly, teams can invest in loosely coupled architecture. In a loosely coupled system, components are detached and able to operate somewhat independently. For example, a headless content management system detaches a blog’s frontend from its content storage, but enables data to be transferred through APIs. Whether in a loosely structured monolith or microservices, loosely coupled architecture reduces the probability that changes in one part of the codebase unexpectedly create issues in other components.
Engineering metrics and DORA summary
The DORA metrics are most useful as a positive catalyst for change in an engineering organization. They serve as the fuel—not the engine—for continuous improvement. The ‘engine’ comprises collaboration, experimentation, psychological safety and organizational learning.
Whenever teams identify potential areas for improvement, they should devise and test their solutions, sharing what they learn with their team members at each step. They should continuously revisit their DORA metrics to validate their progress towards building a highly effective engineering organization.
Useful engineering metrics outside of DORA
To improve visibility, engineering managers and leaders should consider other metrics beyond the DORA metrics as well.
Cycle time
Many engineering leaders in software development use cycle time and lead time interchangeably.
Similar to lead time, cycle time measures the amount of time from work start to delivery. Shorter cycle times indicate faster time to market, while long cycle times indicate delays and inefficiencies in delivering new features.
Lead time is typically more flexible in how teams define it. Some engineering leaders argue that lead time includes the time a task waits between being created and developers beginning work on it, in addition to the time required to build and release it. Cycle time, however, refers specifically to the time between beginning work on a task and releasing it, excluding any time the task waits before work begins.
However, as previously mentioned, the DORA team defines lead time as the total time between creating a commit and releasing it to production.
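The task-based distinction between lead time and cycle time can be sketched as follows. This assumes each task records when it was created, when work started, and when it was released; the `Task` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Task:
    created_at: datetime    # ticket created
    started_at: datetime    # developer begins work
    released_at: datetime   # change released to production


def task_lead_time(task):
    """Task creation to release, including time spent waiting in the backlog."""
    return task.released_at - task.created_at


def cycle_time(task):
    """Start of work to release, excluding time spent waiting in the backlog."""
    return task.released_at - task.started_at


# Hypothetical task: created Monday, started Wednesday, released Friday.
task = Task(
    created_at=datetime(2024, 3, 4, 9, 0),
    started_at=datetime(2024, 3, 6, 9, 0),
    released_at=datetime(2024, 3, 8, 9, 0),
)
print(task_lead_time(task))  # 4 days, 0:00:00
print(cycle_time(task))      # 2 days, 0:00:00
```

The two-day gap between the results is the time the task spent waiting before anyone started on it.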
Rework
Rework measures the amount of code churn that happens at different points in the development pipeline. In other words, it tracks how often code is rewritten or removed.
Rework early in the development cycle can show rapid prototyping and experimentation. Developers build, break, and fix their code. Late stage rework, however, can be a sign of changing requirements or a lack of early testing. Rework late in the development cycle is often costlier and more complex to fix, negatively affecting team velocity.
Open/close rate
Open/close rate is a metric that measures how many issues in production are reported and closed within a specific timeframe. It can help teams understand how the stability of their product is changing over time; increasing open rates can indicate growing technical debt and an increase in the number of bugs in production.
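One simple way to compute this, assuming each issue is recorded with an opened date and an optional closed date, is to count both events inside the same window. The function name and sample data are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple


def open_close_counts(
    issues: List[Tuple[date, Optional[date]]], start: date, end: date
) -> Tuple[int, int]:
    """Count issues opened and issues closed between start and end (inclusive)."""
    opened = sum(1 for opened_on, _ in issues if start <= opened_on <= end)
    closed = sum(
        1 for _, closed_on in issues
        if closed_on is not None and start <= closed_on <= end
    )
    return opened, closed


# Hypothetical issues as (opened_on, closed_on) pairs; None means still open.
issues = [
    (date(2024, 1, 2), date(2024, 1, 5)),
    (date(2024, 1, 3), None),
    (date(2023, 12, 20), date(2024, 1, 4)),
    (date(2024, 1, 10), date(2024, 2, 2)),
]
print(open_close_counts(issues, date(2024, 1, 1), date(2024, 1, 31)))  # (3, 2)
```

When the opened count stays above the closed count across several windows, the backlog of production issues is growing.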
Mean time between failures (MTBF)
MTBF is a metric that measures the time between unexpected incidents or failures. MTBF is used to track the reliability and availability of production environments. To calculate MTBF, subtract the number of hours of downtime from the total number of operating hours, and divide the result by the number of incidents. For example, if an application has two failures and two hours of downtime during an 8-hour workday, the MTBF would be 3 hours (subtract two hours from eight, and divide by two incidents).
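The worked example can be checked with a short Python sketch that divides the hours the application was actually running by the number of incidents. The function and parameter names are hypothetical.

```python
def mean_time_between_failures(operating_hours, downtime_hours, incidents):
    """Hours the system was actually up, divided by the number of failures."""
    if incidents <= 0:
        raise ValueError("incidents must be positive")
    if not 0 <= downtime_hours <= operating_hours:
        raise ValueError("downtime_hours must be between 0 and operating_hours")
    return (operating_hours - downtime_hours) / incidents


# The example from the text: an 8-hour workday, two hours of downtime, two failures.
print(mean_time_between_failures(8, 2, 2))  # 3.0
```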