Incident-Driven Learning

I recently had an interesting chat with some friends about our experiences in the industry and some of the fun work we’ve been doing.

One topic that caught my attention was the costly mistakes we’ve made in the past: mistakes that hurt the systems we were responsible for and, on a bad day, could have cost us our jobs. For instance, one friend shared how a fintech company almost went bankrupt because of a newly introduced bug, and another mentioned how a mobile app crash affected a share of their customers. I remember almost losing clients’ data to a hack earlier in my career (premium tears).

One thing we all had in common was that we learned from those experiences, and they became the foundation for much of the knowledge we’ve acquired. We were fortunate to work in environments that approached mistakes in a blameless manner.

I can still recall the fear and panic during my probation when I took a Tier 2 system offline for a few minutes; it wasn’t funny. Thankfully, a senior colleague calmly took responsibility for resolving the issue, which kept me at ease throughout the process.

More recently, I’ve been “privileged” to be partly responsible for incidents, learning how not to break things or, better yet, how to build systems that seldom fail. The most empathetic engineers I’ve met are those with several such experiences. No wonder they are approachable and eager to help: they understand that even the best of us make mistakes, regardless of tenure or rank.

In summary, embrace your mistakes and learn from them. Treat them as badges of honor in the trade. Not everyone is privileged to break a production system (haha).

PS: Which of these is easier to accomplish? (a) Avoid breaking a system, or (b) avoid building a system that breaks.