Alex Bramley on The Art of SLO, Part 1
Alex Bramley talks to Sven Johann about the basics of service level objectives. They begin with terminology (SLI, SLO, SLA, error budget), look at the cost of outages, and discuss what reliability has to do with customer happiness. They go on to discuss why 100% reliability is the wrong target and what the right target might be. Alex then explains how to get started with collecting data about your system’s behaviour. They close the first part of this series by looking into latency SLIs.
Show Notes
- SRE Workbook
- Implementing Service Level Objectives by Alex Hidalgo
- The Calculus of Service Availability
- Art of SLO Workshop
- Google Customer Reliability Engineering blog
- Consequences of SLO violations
- Applying the escalation policy
- An example escalation policy
Chapters:
- 00:00:15 Welcome and intro
- 00:02:14 Terminology: SLI, SLO, SLA
- 00:09:05 Cost of a (cloud provider) outage
- 00:11:22 Reliability and customer happiness
- 00:20:19 Error Budgets
- 00:26:31 100% reliability is the wrong target
- 00:37:44 Collecting data
- 00:54:31 Latency SLIs
- 01:09:53 Outro