Showing posts from February, 2019

Who to blame for all your problems

Conducting Blameless Postmortems

This post is based on my talk at PyCascades 2019.

To start, what is a postmortem? There are two common uses of the term: a document detailing what happened during an incident, and a meeting to review an incident, usually resulting in the creation of the postmortem document. This post is focused on the meeting, but I'll also have some recommendations for the document.

Why Run Postmortems?

Why do we conduct postmortems, anyway? Production broke, we fixed it, call it a day, right?

Holding postmortems helps us better understand how our systems work -- and how they don't. If your system is complex (and it probably is), the people who work on it have an incomplete and inaccurate view of how it works. Incidents highlight where these gaps and inaccuracies lie. Reviewing incidents after the fact will improve your understanding of your systems. By doing this as a group and sharing what you found, you...