You’re a fraud analyst, customer success manager, product manager, or trust & safety analyst. You’re in charge of making sure your company’s operations run smoothly, and that there isn’t a major fraud event or customer loss on your watch. How do you use your company’s data to manage an operational process?
Google’s site reliability lead wrote a classic engineering piece on his philosophy of DevOps alerting. We’ve interviewed 100+ leaders about business alerting, at fintech, healthcare and marketplace companies ranging from 2 to 4,000 people.
We found several common patterns, along with some great best practices and failure modes to avoid. We share these lessons in this series about alerting.
If you take away only three things from this article, remember:
- Alerts should be real
- Alerts should be urgent
- Alerts should be actionable
Alerts should be real
Monitoring is a trade-off between your time and system complexity on one side and your confidence that it’s working well on the other. Ensure alerts are designed by the right stakeholders (Read: What you need to get started). While it might seem counterintuitive, over-monitoring is harder to solve than under-monitoring (Read: How to decide what to monitor).
Alerts should be actionable
There are several ways to ensure you’re crafting good alerts (Read: How to craft a good alert), but at a minimum, alerts should make it clear what the next steps are. Otherwise, you risk having different responses to the same alert depending on who’s on call, or worse, inaction.
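As a rough illustration of "make the next steps clear", here is a minimal Python sketch. The `Alert` class, its field names, and the example alert text are all hypothetical, not part of any particular alerting tool; the only idea it shows is refusing to send an alert that has no next steps attached.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alert:
    """A hypothetical alert record; the field names are illustrative only."""
    title: str
    summary: str
    next_steps: List[str] = field(default_factory=list)

    def render(self) -> str:
        # Refuse to send an alert that does not tell the responder what to do.
        if not self.next_steps:
            raise ValueError(f"Alert '{self.title}' has no next steps")
        steps = "\n".join(
            f"{number}. {step}"
            for number, step in enumerate(self.next_steps, start=1)
        )
        return f"{self.title}\n{self.summary}\nNext steps:\n{steps}"


if __name__ == "__main__":
    alert = Alert(
        title="Chargeback spike",
        summary="Chargebacks doubled in the last hour",
        next_steps=[
            "Pause payouts for flagged merchants",
            "Notify the fraud lead",
        ],
    )
    print(alert.render())
```

Writing the steps down once, alongside the alert, is one way to get the same response regardless of who is on call.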
Alerts should be urgent
Only add alerts for something that is, or will imminently be, a user-facing issue you need to address immediately. Alerts are not the way to display regular business metrics or cases (Read: How to manage non-alert items).
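The urgency test above can be sketched as a small routing rule. This is only an assumption about how one might encode it: the `Destination` names and the `route` function are hypothetical, and "dashboard" stands in for whatever non-alert channel you use for regular metrics and cases.

```python
from enum import Enum


class Destination(Enum):
    ALERT = "alert"          # interrupts whoever is on call
    DASHBOARD = "dashboard"  # hypothetical non-alert channel, reviewed on a schedule


def route(user_facing_now_or_imminently: bool, needs_immediate_action: bool) -> Destination:
    """Send an item to the alert channel only if it passes both urgency checks."""
    if user_facing_now_or_imminently and needs_immediate_action:
        return Destination.ALERT
    # Everything else (regular metrics, routine cases) stays out of the alert channel.
    return Destination.DASHBOARD


if __name__ == "__main__":
    print(route(True, True))    # user-facing issue needing action now
    print(route(False, False))  # routine weekly metric
```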
Finally, the best alerting systems manage themselves! (Read: How to set up a self-managing system)
The next series of articles will walk you through what we learned about:
- What you need to get started
- How to decide what to monitor
- How to craft a good alert
- How to manage non-alert items
- How to set up a self-managing system
Don’t want to read the whole series and just want to monitor your business without engineers ASAP? Use LogicLoop to start writing flexible, scalable rules that trigger alerts, reviews and actions. You can jump straight into the documentation here.