Published on Mar 15, 2018
In this video, Seth Vargo and Liz Fong-Jones discuss how the SRE discipline reduces tension over velocity/stability between product teams and system operators by quantifying risk and employing error budgets. Striving for 100% availability in a service isn't just impossible, it's unnecessary. Maximizing stability limits how quickly new features can be delivered to users. Extreme availability produces diminishing returns as user experience becomes dominated by less reliable components like cellular networks or WiFi. While we want to reduce the risk of system failure, we also have to accept risk in order to deliver new products and features.
In the SRE discipline, an error budget is the prescriptive, quantitative measure of how much risk a service is willing to tolerate. Error budgets are a byproduct of the SLOs (Service Level Objectives) agreed upon between product owners and systems engineers. Risk and error budgets relate directly to many DevOps principles. Error budgets make explicit that "accidents are normal" by quantifying accidents and risk. Error budgets also enforce that "change should be gradual", because non-gradual changes could quickly break the SLO and prevent further development for the quarter. This is why we say class SRE implements DevOps.
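As a rough illustration of how an error budget falls out of an SLO, here is a minimal Python sketch. It assumes a request-based availability SLO measured over one window such as a quarter. The `ErrorBudget` class, its field names, and the sample numbers are hypothetical and are not taken from the video.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical request-based error budget for one measurement window."""

    slo: float            # agreed availability target, e.g. 0.999 for 99.9%
    total_requests: int   # requests served during the window
    failed_requests: int  # requests that violated the SLO during the window

    @property
    def allowed_failures(self) -> int:
        # The budget is whatever the SLO leaves over: (1 - SLO) * total.
        return round((1.0 - self.slo) * self.total_requests)

    @property
    def remaining(self) -> int:
        # Negative means the budget is overspent.
        return self.allowed_failures - self.failed_requests

    def can_release(self) -> bool:
        # Assumed policy: risky changes continue only while budget remains.
        return self.remaining > 0


# Illustrative numbers only: a 99.9% SLO over one million requests.
quarter = ErrorBudget(slo=0.999, total_requests=1_000_000, failed_requests=400)
print(quarter.allowed_failures)  # 1000
print(quarter.remaining)         # 600
print(quarter.can_release())     # True
```

Under these assumptions, a single abrupt change that fails 1,200 requests would overspend the budget at once, while the same change rolled out gradually could be stopped after consuming only a small part of it.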