Running Blameless Postmortems That Actually Improve Things
A production incident happens. The team scrambles to fix it. Then comes the postmortem — and in most organizations, it is a waste of time. People deflect responsibility, action items are vague, and the same type of incident happens again three months later.
I have run dozens of postmortems. The ones that actually improve things follow a specific structure.
The Ground Rules
Before anyone speaks, I set three ground rules. First, no blame. We are analyzing the system, not the person. Humans make mistakes because systems allow them to. Second, assume good intent. The engineer who pushed the bad config was not careless — they were operating in a system that did not prevent the error. Third, action items must be specific, assigned, and timeboxed. "Improve monitoring" is not an action item. "Add alerting for API latency exceeding 500ms by end of next sprint" is.
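The third rule is the easiest one to check mechanically. Here is a minimal sketch of that check in Python. The `ActionItem` class, the `problems` helper, the owner name, the date, and the five-word threshold are all assumptions made for illustration, and word count is only a crude stand-in for "specific".

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    description: str
    owner: Optional[str] = None   # one named person, not a team
    due: Optional[date] = None    # the timebox


def problems(item: ActionItem, min_words: int = 5) -> List[str]:
    """Return the reasons an item fails the 'specific, assigned, timeboxed' rule."""
    found = []
    # Crude proxy for "specific": very short descriptions are usually vague.
    if len(item.description.split()) < min_words:
        found.append("too vague")
    if not item.owner:
        found.append("unassigned")
    if item.due is None:
        found.append("no deadline")
    return found


vague = ActionItem("Improve monitoring")
specific = ActionItem(
    "Add alerting for API latency exceeding 500ms",
    owner="dana",                 # hypothetical owner
    due=date(2024, 6, 14),        # hypothetical end-of-sprint date
)

print(problems(vague))     # ['too vague', 'unassigned', 'no deadline']
print(problems(specific))  # []
```

A check like this will not tell you whether an action item is the right one. It only catches the items that nobody could ever be held to.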
The Structure
I use a five-part structure. Timeline: what happened, minute by minute, based on logs and evidence. Impact: what broke, for how many users, for how long. Root cause: not the surface trigger but the underlying system condition. Contributing factors: what made the impact worse or delayed the response. Action items: concrete changes to prevent recurrence.
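If you keep postmortems as structured records rather than free-form documents, the five parts map onto a simple template. The sketch below is one possible shape, not a standard. The `Postmortem` class, its field names, and the sample incident data are invented for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Postmortem:
    timeline: List[str] = field(default_factory=list)              # evidence-backed events, in order
    impact: str = ""                                               # what broke, for whom, for how long
    root_cause: str = ""                                           # underlying system condition
    contributing_factors: List[str] = field(default_factory=list)  # what worsened or delayed things
    action_items: List[str] = field(default_factory=list)          # concrete changes

    def missing_sections(self) -> List[str]:
        """Names of sections that are still empty, in template order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical draft with only the first two sections filled in.
draft = Postmortem(
    timeline=["14:02 config change deployed", "14:05 error-rate alert fired"],
    impact="Checkout unavailable for 23 minutes",
)

print(draft.missing_sections())
# ['root_cause', 'contributing_factors', 'action_items']
```

Listing the empty sections before the meeting ends is a cheap way to notice that the group stopped at the trigger and never reached the root cause.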
The Facilitation
The hardest part is creating genuine psychological safety in the room. I start by sharing my own mistakes: "I missed this risk during planning" or "I should have escalated earlier." When the leader is vulnerable first, others follow.
I also separate the timeline reconstruction from the analysis. First we agree on what happened. Then we discuss why. Mixing these two phases is how postmortems devolve into arguments.
Follow-Through
The postmortem is worthless if action items die on the vine. I track every action item in Jira with a dedicated label and review completion weekly. If an item is blocked, I escalate it. The team needs to see that postmortem outputs are treated as first-class work, not afterthoughts.
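The weekly review itself can be reduced to a small filter over the tracked items. The sketch below works on plain in-memory records and deliberately does not call Jira. The `TrackedItem` class, the status strings, the issue keys, and the dates are all hypothetical. Flagging overdue items as well as blocked ones is my assumption here, an extension of the rule stated above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class TrackedItem:
    key: str      # tracker issue key (hypothetical)
    status: str   # "open", "blocked", or "done"
    due: date


def needs_escalation(items: List[TrackedItem], today: date) -> List[str]:
    """Return keys of unfinished items that are blocked or past their due date."""
    flagged = []
    for item in items:
        if item.status == "done":
            continue
        if item.status == "blocked" or item.due < today:
            flagged.append(item.key)
    return flagged


items = [
    TrackedItem("PM-1", "done", date(2024, 6, 1)),
    TrackedItem("PM-2", "blocked", date(2024, 6, 30)),
    TrackedItem("PM-3", "open", date(2024, 6, 7)),
    TrackedItem("PM-4", "open", date(2024, 6, 28)),
]

print(needs_escalation(items, today=date(2024, 6, 14)))
# ['PM-2', 'PM-3']
```

In practice the records would come from whatever export or query your tracker provides for the postmortem label. The point is that the review produces a short, explicit list rather than a general sense that things are moving.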
This is how you build a learning culture. Not with posters about continuous improvement — with consistent, structured follow-through after every failure.