The On-Call Rotation Design That Actually Reduces Burnout

On-call burnout is an engineering retention problem disguised as a scheduling problem. Engineers who are chronically on-call, who lose sleep regularly, who cannot make weekend plans, who feel like they are always one alert away from a two-hour incident, leave. Not immediately, but steadily. The turnover is expensive and the knowledge loss is worse. Fixing the schedule without fixing the underlying alert quality is the most common mistake teams make.

"On-call is broken at most companies not because engineers are weak or uncommitted, but because organizations tolerate alert volumes, undefined runbooks, and unclear ownership that would be unacceptable in any other professional domain. You would not tolerate a surgeon being paged 40 times a night for non-emergencies. Why do we accept it for engineers?"

— Charity Majors, Co-founder and CTO, Honeycomb.io, in a widely circulated essay on on-call culture (2023)

Rotation length tradeoffs

Weekly rotations are the norm, but they are often wrong for small teams. With a four-person rotation on weekly shifts, each engineer is on-call for one full week every month. If the on-call experience is consistently bad, that is one week out of four where that engineer's experience at your company is miserable.

Shorter rotations, such as daily or two-day shifts, distribute the pain more evenly but create handoff overhead and reduce each engineer's ability to build context on ongoing incidents. Longer rotations, such as two-week shifts, reduce handoff frequency but make a bad on-call stretch significantly worse for the engineer holding it. The best rotation length is the one that minimizes the frequency of the worst experiences, not the one that optimizes for operational convenience.
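To make the tradeoff concrete, here is a minimal sketch that compares shift lengths for a fixed team size. It assumes a plain round-robin rotation with no swaps, overrides, or follow-the-sun coverage, and the `RotationLoad` class and its property names are hypothetical, invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RotationLoad:
    """Per-engineer load for a simple round-robin rotation (illustrative model)."""

    team_size: int
    shift_days: int

    @property
    def on_call_fraction(self) -> float:
        # Share of calendar time each engineer spends on call.
        return 1 / self.team_size

    @property
    def rest_days_between_shifts(self) -> int:
        # Days off call between the end of one shift and the start of the next.
        return self.shift_days * (self.team_size - 1)

    @property
    def handoffs_per_year(self) -> float:
        # Team-wide handoffs; each one is a chance to lose incident context.
        return 365 / self.shift_days


if __name__ == "__main__":
    # Assumed example: the four-person team discussed above.
    for shift_days in (1, 2, 7, 14):
        load = RotationLoad(team_size=4, shift_days=shift_days)
        print(
            f"{shift_days:>2}-day shifts: "
            f"{load.on_call_fraction:.0%} of time on call, "
            f"{load.rest_days_between_shifts} days between shifts, "
            f"{load.handoffs_per_year:.0f} handoffs per year"
        )
```

Under this model the share of time on call depends only on team size. Shift length changes how long a single bad stretch lasts and how often context has to be handed over, which is the tradeoff worth tuning.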

Shadow shifts for onboarding

New engineers should shadow on-call before holding it alone. Shadow shifts have two purposes: training the new engineer on incident response patterns, and creating a second opinion during incidents that reduces the new engineer's anxiety about making wrong calls without support. Three shadow shifts before solo shifts is a reasonable minimum for most systems. Shadow shifts also surface runbook gaps, because new engineers ask questions that experienced engineers stopped asking years ago.

Runbook coverage as a prerequisite

You should not have any alert in your rotation that does not have a runbook. Not a great runbook, but a runbook: a document that says this alert fires when X, here are the three things to check first, here is the escalation path if those do not resolve it. Measure runbook coverage monthly. If 30 percent of your alerts have no runbook, roughly 30 percent of your on-call burden falls on the engineer's improvised judgment at 2 a.m.
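One way to measure coverage is sketched below. It assumes you can export your alert definitions as a mapping from alert name to runbook link, with a missing or empty link meaning no runbook. The function name `runbook_coverage` and the alert names and URLs are hypothetical.

```python
from typing import List, Mapping, Optional, Tuple


def runbook_coverage(alerts: Mapping[str, Optional[str]]) -> Tuple[float, List[str]]:
    """Return (fraction of alerts with a runbook, sorted names of alerts without one)."""
    # An alert counts as uncovered when its runbook link is None or empty.
    missing = sorted(name for name, runbook in alerts.items() if not runbook)
    if not alerts:
        # No alerts defined: nothing is uncovered.
        return 1.0, missing
    covered = len(alerts) - len(missing)
    return covered / len(alerts), missing


if __name__ == "__main__":
    # Hypothetical export of alert name -> runbook URL.
    alerts = {
        "api_latency_high": "https://wiki.example.com/runbooks/api-latency",
        "disk_usage_high": "https://wiki.example.com/runbooks/disk-usage",
        "queue_backlog": None,
        "cert_expiring": "",
    }
    coverage, missing = runbook_coverage(alerts)
    print(f"Runbook coverage: {coverage:.0%}")
    print("Alerts without a runbook:", ", ".join(missing))
```

This counts alert definitions equally. Weighting each alert by how often it fires would be a closer proxy for the burden that actually lands on the on-call engineer.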

Alert fatigue metrics

Track alert volume per on-call week, by alert name, over time. An alert that fires 40 times in a week and requires five minutes of investigation per fire is about 3.3 hours of on-call work from a single alert. The fix is almost always to repair the underlying condition the alert is monitoring, not to adjust the alert threshold. Noisy alerts are a proxy for unresolved system instability. Fix the system problem or accept a noisier rotation.
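The tracking described above can be sketched in a few lines. This assumes alert history is available as `(alert_name, fired_at)` pairs and that an on-call week lines up with an ISO calendar week; the function names and the sample events are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def weekly_alert_counts(events: Iterable[Tuple[str, datetime]]) -> Counter:
    """Count fires per ((ISO year, ISO week), alert name)."""
    counts: Counter = Counter()
    for name, fired_at in events:
        iso = fired_at.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        counts[((iso[0], iso[1]), name)] += 1
    return counts


def triage_hours(fires: int, minutes_per_fire: float) -> float:
    """Hours of investigation a single alert costs at a given fire count."""
    return fires * minutes_per_fire / 60


if __name__ == "__main__":
    # Hypothetical alert history.
    events = [
        ("disk_usage_high", datetime(2024, 1, 1, 2, 15)),
        ("disk_usage_high", datetime(2024, 1, 3, 4, 40)),
        ("api_latency_high", datetime(2024, 1, 4, 23, 5)),
        ("disk_usage_high", datetime(2024, 1, 8, 1, 30)),
    ]
    # Noisiest alert-weeks first.
    for (week, name), fires in weekly_alert_counts(events).most_common():
        print(week, name, fires)

    # The example from the text: 40 fires at five minutes each.
    print(f"{triage_hours(40, 5):.1f} hours")
```

Sorting by fire count puts the alerts that deserve a fix to the underlying system at the top of the list.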

On-call compensation matters

Engineers who are on-call outside business hours should be compensated for it. This is both fair and practical. Engineers who feel on-call is uncompensated labor resent it, which compounds the burnout effect of the scheduling and alert quality issues. On-call compensation can be monetary or time-in-lieu. Either works. Not acknowledging it is the mistake.

📊 By the numbers

| Metric | Finding | Source |
| --- | --- | --- |
| Engineers who report on-call as a significant burnout factor | 58% | DORA State of DevOps Report, 2023 |
| Average alerts per on-call shift (teams with poor hygiene) | 37 alerts per shift | PagerDuty State of Digital Operations, 2023 |
| Reduction in after-hours pages with structured rotation design | Up to 40% | Google SRE Book practices, 2023 |