I was discussing with some old friends recently about on-call - not specific incidents, more how to shap the burden across a large team. Someone mentioned in their company a few engineers had stitched together AI-assisted workflows that made their rotations noticeably lighter. Triage summaries, draft runbooks, little automations that shaved minutes off repetitive steps. The pace of invention sounded healthy. The mood in the room was not celebration. It was uneven relief.

If you have led at any scale, you have felt that asymmetry before. Individual excellence quietly becomes a patchwork. Some people move fast in private toolchains; others carry the same load they carried three years ago, only now the baseline expectation has drifted upward. The question is not whether experimentation is good. It is whether the org is buying leverage, or buying a lottery ticket with a few winners.

The two futures

There are two tempting stories engineering leaders tell themselves about AI tooling, and both contain truth.

The first story is innovation-first. Let people move. Let them wire assistants into their editors, their terminals, their incident channels. Let a thousand flowers bloom. The upside is real: fast iteration, local fit, morale for people who like building their own leverage. The downside is just as real, but slower to show up on a slide. Skills do not transfer. Prompts live in private notes. Scripts assume one person’s mental model of the system. You get tool sprawl, inconsistent guardrails, and a soft version of “works on my machine” - except the machine is a workflow, not a binary.

The second story is standardization-first. Pick a platform, bless a template, route everyone through the same guardrails. Collective productivity rises. Onboarding gets easier. Security and compliance have something finite to audit. The downside is also familiar. Shared tooling converges on the median use case. Edge innovation feels like bureaucracy. The people who were fastest alone now wait on a committee, or they route around the official path and you inherit shadow IT anyway.

So the managerial problem is not picking a camp. It is timing and design. If you standardize too early, you freeze the wrong abstraction. If you standardize never, you celebrate local genius while the team pays a collective tax.

What “good” looks like for a team

When I say “good” here, I do not mean “impressive demos.” I mean durable slack that is broadly accessible.

A team is winning when a new engineer can get sensible defaults in a day, not when three heroes have bespoke automations nobody else can operate. A team is winning when incident learning compounds in shared artifacts, not when the best postmortem draft lives in one person’s chat history. A team is winning when review and on-call load stay bounded as scope grows - because leverage is composable, not because a few people are quietly carrying the org on caffeine and pride.

The second-order effects show up in places leaders pretend not to care about until they hurt.

Reviewer fatigue rises when every change arrives in a different AI-shaped package - some diffs readable, some sludge, some hiding risky shortcuts behind plausible language.

Support burden shifts to the few people who understand how the unofficial stack actually works.

Incident fairness erodes when “easy” rotations are unevenly distributed because only some people bought themselves relief.

Planning fragments when token spend and tooling subscriptions multiply in parallel, each justified locally, none visible as a single steering problem.

None of that is an argument against experimentation. It is an argument against mistaking individual throughput for organizational resilience.

A deliberate hybrid

My view is simple enough to state and hard enough to execute: start pulling a thin spine together now, and leave explicit lanes open for the next thing to graduate.

Pulling together does not mean freezing the universe. It means choosing a small golden path - one supported assistant stack or integration pattern, one documentation habit for prompts and runbooks, one lightweight evaluation habit before something becomes “how we work.” You are not trying to capture every clever hack on day one. You are trying to make the median engineer less alone on a bad night, and to make the clever hacks legible enough that they can be adopted or rejected on purpose.

Leaving the door open means designing mechanisms, not vibes. Time-boxed experiments with a named sponsor. A simple RFC-to-adopt path for workflows that prove out. A supported default with an escape hatch for teams that truly need something else - but with a higher bar for handoff quality, not for novelty.

The graduation criterion I care about is boring on purpose: shareability. If it cannot be run, reviewed, and handed off by someone who did not invent it, it is not a team standard yet. It might still be valuable as a personal accelerator. It should not silently become load-bearing infrastructure.

I also care about ownership rotation for the shared layer. Nothing ossifies innovation faster than a guild that becomes a gate. If the people who maintain the golden path never rotate, you get politics. If they rotate too chaotically, you get churn. Managers earn their keep in that balance.

Finally, revisit defaults on a rhythm - quarterly is enough for many teams - so “standard” does not mean “2019’s best idea fossilized in policy.” Standards exist to reduce coordination cost, not to win arguments forever.

What I would not compromise

Even while I argue for convergence on a spine, there are places I get conservative fast.

Safety and secrets handling should not be a free-for-all, no matter how clever the automation. Observability and auditability still matter when a model is drafting the steps a human will execute under stress. Handoff quality matters when someone else is holding the pager in six hours.

Those are not anti-AI positions. They are anti-magic positions. The teams that scale treat AI like infrastructure: versioned, reviewable, replaceable.

Questions I am carrying into my own staff conversations

Where is your org already paying a sprawl tax - in review time, in incident variance, in onboarding, in budget lines that only finance can see?

If you had to standardize one thing next week, what would earn the widest collective relief with the smallest loss of experimentation surface?

And who gets to bring the next breakthrough inside the fence - not as a hero narrative, but as a repeatable upgrade the whole team can trust?

I am betting on a hybrid because the alternative is false precision. Pure freedom feels virtuous until cohesion frays. Pure control feels responsible until innovation routes around you. The job of leadership here is to make the default path good enough that most people choose it - and to keep a deliberate door open for what comes next.