Engineering Velocity: A Complete Guide to DORA Metrics and Measuring Team Performance

How do you measure the productivity of a software team? If you answer "lines of code" or "hours worked," you're optimizing for the wrong things. These metrics encourage gaming, don't correlate with value delivery, and destroy morale.
High-performing technology organizations focus on outcomes, not output. The industry standard for measuring software delivery performance is the DORA (DevOps Research and Assessment) framework, backed by years of research across thousands of organizations.
Why Traditional Metrics Fail
Let's examine why common productivity metrics are counterproductive:
| Metric | Why It's Used | Why It Fails |
|---|---|---|
| Lines of Code | Easy to measure | Incentivizes verbose code; 100 lines could be 10 |
| Hours Worked | Visible effort | Burnout; presence ≠ productivity |
| Story Points Completed | Tracks velocity | Point inflation; estimates ≠ value |
| Bugs Fixed | Shows activity | Incentivizes creating bugs to fix |
| PRs Merged | Shows output | Encourages small, trivial PRs |
The fundamental problem: these metrics measure activity, not impact.
Introduction to DORA Metrics
DORA metrics emerged from six years of research by DevOps Research and Assessment (now part of Google Cloud). The research surveyed over 32,000 professionals worldwide and identified four key metrics that predict software delivery performance AND organizational performance.
The framework groups the four metrics into two complementary categories:

| Category | Metric | Question It Answers |
|---|---|---|
| Throughput (speed of delivery) | Deployment Frequency | How often do we deploy? |
| Throughput (speed of delivery) | Lead Time for Changes | How long from code to production? |
| Stability (quality of delivery) | Change Failure Rate | What percentage of changes fail? |
| Stability (quality of delivery) | Mean Time to Restore (MTTR) | How fast do we recover? |
The Four DORA Metrics Explained
1. Deployment Frequency
Question: How often does your organization successfully release to production?
What it measures: The speed at which your team can deliver value to customers.
| Performance Level | Frequency |
|---|---|
| 🏆 Elite | On-demand (multiple times per day) |
| 🥇 High | Between once per day and once per week |
| 🥈 Medium | Between once per week and once per month |
| 🥉 Low | Between once per month and once every six months |
Why it matters:
- Smaller, more frequent batches reduce risk
- Faster feedback loops
- Earlier value delivery to customers
- Easier troubleshooting when issues arise
How to improve:
- Automate testing and deployment
- Break features into smaller increments
- Implement feature flags for safe releases
- Reduce batch sizes
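To make the measurement concrete, here is a minimal sketch of how deployment frequency could be computed from a list of successful production deployment timestamps. The `deploy_times` input and its source (for example, an export from your CI/CD system) are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone

def deployment_frequency(deploy_times: list[datetime], days: int = 30) -> float:
    """Average successful production deployments per day over a trailing window.

    deploy_times: timezone-aware UTC timestamps of successful deployments
    (assumed to come from your CI/CD system's records for this sketch).
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    recent = [t for t in deploy_times if t >= cutoff]
    return len(recent) / days

# Example: 23 deployments in the last 30 days is roughly 0.77 per day,
# which falls in the "High" band (between once per day and once per week).
```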
2. Lead Time for Changes
Question: How long does it take for a commit to go from code to production?
What it measures: The efficiency of your entire software delivery pipeline.
| Performance Level | Lead Time |
|---|---|
| 🏆 Elite | Less than one hour |
| 🥇 High | Between one day and one week |
| 🥈 Medium | Between one week and one month |
| 🥉 Low | Between one month and six months |
Why it matters:
- Low lead time indicates a healthy, automated pipeline
- Faster response to customer needs
- Quick bug fixes and security patches
- Higher developer satisfaction
Components of Lead Time:

Code ──► Review ──► Build ──► Test ──► Deploy ──► Production

Total lead time spans the whole path from commit to production. A typical breakdown:

| Stage | Share of Lead Time | Note |
|---|---|---|
| Code Review | 40% | Often the bottleneck |
| Build/Compile | 10% | |
| Automated Testing | 20% | |
| Manual QA | 20% | Eliminate if possible |
| Deployment | 10% | |
How to improve:
- Automate everything possible
- Parallelize test suites
- Implement trunk-based development
- Reduce PR review wait times
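As with deployment frequency, lead time is easy to compute once commit and deployment timestamps are captured. A minimal sketch, assuming each change record carries both timestamps (the field names are illustrative):

```python
from datetime import datetime
from statistics import median

def lead_time_hours(changes: list[dict]) -> float:
    """Median hours from commit to production for a set of changes.

    Each change is assumed to look like:
        {"committed_at": datetime, "deployed_at": datetime}
    with values taken from your VCS and deployment records.
    """
    hours = [
        (c["deployed_at"] - c["committed_at"]).total_seconds() / 3600
        for c in changes
    ]
    return median(hours)
```

Reporting the median rather than the mean keeps a single long-lived branch from dominating the number.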
3. Change Failure Rate (CFR)
Question: What percentage of deployments cause a failure in production?
What it measures: The quality of your changes and the robustness of your testing.
| Performance Level | Failure Rate |
|---|---|
| 🏆 Elite | 0-5% |
| 🥇 High | 5-10% |
| 🥈 Medium | 10-15% |
| 🥉 Low | 16-30%+ |
What counts as a failure:
- Production incidents requiring remediation
- Rollbacks
- Hotfixes
- Failed deployments
- Degraded service requiring intervention
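Once the team agrees on that definition, the arithmetic is straightforward. A minimal sketch, assuming the counts come from your deployment and incident records:

```python
def change_failure_rate(total_deployments: int, failed_deployments: int) -> float:
    """Change Failure Rate as a percentage.

    failed_deployments: deployments that led to an incident, rollback,
    hotfix, or other remediation, per the team's agreed definition.
    """
    if total_deployments == 0:
        return 0.0
    return 100 * failed_deployments / total_deployments

# Example: 2 failures across 23 deployments is about 8.7%, the "High" band.
```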
Why it matters:
- Speed means nothing if you're breaking things constantly
- Customer trust and satisfaction
- Team morale and on-call burden
- Regulatory compliance
How to improve:
- Comprehensive automated testing
- Feature flags for canary releases
- Better pre-production environments
- Post-incident reviews and learning
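Of the improvements above, feature flags are often the quickest win because they separate deployment from release. A minimal sketch of a percentage-based rollout gate; the flag names, user IDs, and inline rollout percentage are illustrative, and real systems usually delegate this to a dedicated flag service:

```python
import hashlib

def flag_enabled(flag_name: str, user_id: str, rollout_pct: int) -> bool:
    """Deterministically bucket users so a flag can roll out gradually.

    The same user always lands in the same bucket for a given flag, so a
    canary can grow from 1% to 100% without users flapping between paths.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < rollout_pct

# Example: expose a new checkout flow to 5% of users first.
if flag_enabled("new-checkout", user_id="user-123", rollout_pct=5):
    ...  # new code path
else:
    ...  # existing behavior
```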
4. Mean Time to Restore (MTTR)
Question: How long does it take to restore service when a failure occurs?
What it measures: Your team's ability to detect, diagnose, and recover from incidents.
| Performance Level | Time to Restore |
|---|---|
| 🏆 Elite | Less than one hour |
| 🥇 High | Less than one day |
| 🥈 Medium | Between one day and one week |
| 🥉 Low | More than one week |
Why it matters:
- Failure is inevitable; recovery speed is what matters
- Minimizes customer impact
- Reduces stress on teams
- Shows operational maturity
MTTR Breakdown:

Incident ──► Detection ──► Triage ──► Fix ──► Deploy ──► Verify

Key components:

| Component | Primary Driver |
|---|---|
| Mean Time to Detect (MTTD) | Monitoring quality |
| Mean Time to Acknowledge | On-call processes |
| Mean Time to Diagnose | Observability, documentation |
| Mean Time to Fix | System complexity |
| Mean Time to Deploy | Pipeline speed |
| Mean Time to Verify | Testing confidence |
How to improve:
- Invest in observability (monitoring, logging, tracing)
- Create runbooks and incident playbooks
- Practice incident response (game days)
- Implement easy rollback mechanisms
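Measuring MTTR only requires a start and a restore timestamp per incident. A minimal sketch, assuming incident records are exported from your incident tracker (the field names are illustrative):

```python
from datetime import datetime
from statistics import median

def mttr_minutes(incidents: list[dict]) -> float:
    """Median minutes from incident start to service restoration.

    Each incident is assumed to look like:
        {"started_at": datetime, "restored_at": datetime}
    """
    durations = [
        (i["restored_at"] - i["started_at"]).total_seconds() / 60
        for i in incidents
    ]
    return median(durations)
```

Despite the "mean" in the name, many teams report the median so that one long outage does not swamp the signal.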
DORA Performance Benchmarks
Based on the 2023 State of DevOps Report:
| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment Frequency | Multiple per day | Daily to weekly | Weekly to monthly | Monthly to semi-annual |
| Lead Time for Changes | < 1 hour | 1 day to 1 week | 1 week to 1 month | 1-6 months |
| Change Failure Rate | 0-5% | 5-10% | 10-15% | 16-30% |
| Mean Time to Restore | < 1 hour | < 1 day | 1 day to 1 week | > 1 week |
Key insight: Elite performers are NOT trading speed for stability. They deploy more frequently AND have lower failure rates.
Implementing Metrics Without Toxicity
Metrics can be weaponized. If you punish teams for high Change Failure Rates, they will stop deploying. If you reward Deployment Frequency, they will deploy empty commits. Here's how to implement metrics constructively:
Principle 1: Measure Teams, Not Individuals
Software is a team sport. Individual metrics create:
- Competition instead of collaboration
- Incentives to game the system
- Blame culture
Instead: Measure at the team or product level. Celebrate team improvements.
Principle 2: Use Trends, Not Absolutes
A single snapshot tells you nothing. Focus on:
- Are we better than we were last month?
- Is the trend improving?
- What changed to cause improvement or decline?
Good dashboard example: a lead time trend for the last six months, showing a steady drop from roughly 10 hours at the start of the year to about 4 hours by June. The direction of the line is the point, not any single reading.
Principle 3: Context Matters
Not all teams are the same:
- A platform team rewriting a core database engine should have a lower deployment frequency than a frontend team tweaking UI text
- A greenfield project will have different metrics than legacy maintenance
- Different domains have different risk profiles
Instead: Compare teams to their own history, not to each other.
Principle 4: Focus on Removing Friction
Use metrics as a compass, not a report card:
| If This Metric is Poor | Investigate These Areas |
|---|---|
| Low Deployment Frequency | Manual processes, fear of change, large batch sizes |
| High Lead Time | Slow builds, review bottlenecks, manual testing |
| High Change Failure Rate | Inadequate testing, poor environments, rushed changes |
| High MTTR | Poor observability, missing runbooks, complex systems |
Anti-Patterns to Avoid
Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."
| Anti-Pattern | What Happens | Better Approach |
|---|---|---|
| Targeting Deployment Frequency | Empty deploys, split PRs artificially | Focus on reducing batch size |
| Punishing CFR | Teams stop deploying, hide incidents | Blameless post-mortems |
| Gamifying metrics | Competition, gaming, burnout | Team-level improvement focus |
| Public shaming | Fear, hiding problems | Private team discussions |
Building a DORA Dashboard
Here's what a healthy DORA dashboard looks like:
Team: Platform Engineering (last 30 days)

| Metric | Current | Change vs. Prior Period | Performance Level |
|---|---|---|---|
| Deployment Frequency | 23/month | ↑ 15% | High |
| Lead Time for Changes | 4.2 hours | ↓ 22% | Elite |
| Change Failure Rate | 8.7% | ↓ 3% | High |
| Mean Time to Restore | 45 minutes | ↓ 15% | Elite |

Trend: overall improvement. A CFR spike in Week 2 was caused by a database migration and addressed in a post-mortem.
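If you build a dashboard like this yourself, the period-over-period arrows are simple to derive. A small sketch; the record shape and the prior-period numbers below are made up to reproduce the arrows in the example dashboard:

```python
from dataclasses import dataclass

@dataclass
class DoraSnapshot:
    """One team's DORA metrics for one reporting period (illustrative shape)."""
    team: str
    deployments_per_month: int
    lead_time_hours: float
    change_failure_rate_pct: float
    mttr_minutes: float

def trend_pct(current: float, previous: float) -> float:
    """Percentage change versus the previous period (drives the up/down arrows)."""
    if previous == 0:
        return 0.0
    return 100 * (current - previous) / previous

this_month = DoraSnapshot("Platform Engineering", 23, 4.2, 8.7, 45)
last_month = DoraSnapshot("Platform Engineering", 20, 5.4, 9.0, 53)  # assumed values
print(round(trend_pct(this_month.lead_time_hours, last_month.lead_time_hours)))  # -22
```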
Tools for Measuring DORA
| Tool | Type | Best For |
|---|---|---|
| Sleuth | Commercial | Complete DORA tracking |
| LinearB | Commercial | Engineering metrics |
| Jellyfish | Commercial | Engineering intelligence |
| Four Keys | Open Source | Google's reference implementation |
| Haystack | Open Source | GitHub-focused |
| Custom + DataDog | DIY | Flexible, integrated |
Beyond DORA: Additional Metrics
While DORA covers software delivery, consider these complementary metrics:
Developer Experience Metrics
| Metric | What It Measures |
|---|---|
| Developer Satisfaction | Survey-based team health |
| Onboarding Time | Time for new developers to ship |
| Context Switching | Interruption frequency |
| Technical Debt Ratio | Time spent on maintenance |
Quality Metrics
| Metric | What It Measures |
|---|---|
| Test Coverage | Breadth of automated testing |
| Escaped Bugs | Bugs found in production |
| Code Review Turnaround | Time to review PRs |
| Security Vulnerabilities | Open security issues |
Implementation Roadmap
Month 1: Foundation
- Set up deployment tracking
- Instrument CI/CD for lead time (see the sketch after this list)
- Define what constitutes a "failure"
- Establish incident tracking
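A simple way to start the instrumentation is to emit a deployment event at the end of every production deploy. Here is a rough sketch for a GitHub Actions deploy job; the metrics endpoint URL, the SERVICE_NAME variable, and the payload shape are assumptions for this example, while GITHUB_SHA is the commit SHA that GitHub Actions provides:

```python
# deploy_event.py - run as the final step of a production deploy job.
import json
import os
import subprocess
from datetime import datetime, timezone
from urllib import request

sha = os.environ.get("GITHUB_SHA", "unknown")  # provided by GitHub Actions

# Commit timestamp from git history (ISO 8601): the start of the lead time clock.
committed_at = subprocess.check_output(
    ["git", "show", "-s", "--format=%cI", sha], text=True
).strip()

event = {
    "service": os.environ.get("SERVICE_NAME", "unknown"),  # hypothetical variable
    "sha": sha,
    "committed_at": committed_at,
    "deployed_at": datetime.now(timezone.utc).isoformat(),
    "status": "success",
}

# Ship the event to wherever your metrics live (this endpoint is illustrative).
req = request.Request(
    "https://metrics.internal.example/dora/deployments",
    data=json.dumps(event).encode(),
    headers={"Content-Type": "application/json"},
)
request.urlopen(req)
```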
Month 2: Visibility
- Create team dashboard
- Establish baseline measurements
- Share with teams first (before sharing with management)
- Identify obvious improvement areas
Month 3-6: Improvement
- Focus on one metric at a time
- Run experiments to improve
- Document what works
- Celebrate improvements
Ongoing
- Regular retrospectives on metrics
- Adjust targets as team improves
- Resist pressure to use for performance reviews
- Keep focus on team improvement
Key Takeaways
- Measure outcomes, not output: Lines of code and hours worked don't correlate with value
- Use DORA metrics: Deployment Frequency, Lead Time, CFR, and MTTR are research-backed
- Elite teams are fast AND stable: Speed and quality are not trade-offs
- Measure teams, not individuals: Software is a team sport
- Trends over absolutes: Compare teams to their own history
- Context matters: Different teams have different constraints
- Remove friction: Use metrics as a compass to find bottlenecks
- Avoid weaponization: Metrics as punishment destroy trust and performance
Focus on removing friction. If Lead Time is high, invest in faster builds. If MTTR is high, invest in better monitoring. The metrics are a compass, not a report card.
Want to improve your engineering team's performance with data-driven insights? Contact EGI Consulting for an engineering metrics assessment and improvement roadmap tailored to your organization.