LISA17 - Failure Happens: Improving Incident Response in Large-Scale Organizations

Published on Nov 15, 2017

Damon Edwards, Rundeck, Inc.
@damonedwards

Deployment is a solved problem. Yes, there is still work to be done, but the operations community has successfully proven that we can both scale deployment automation and distribute the capability to execute deployments. Now, we have to turn our attention to the next critical constraint: What happens after deployment?

We all know that failure is inevitable and is coming our way at any moment. How do we respond quickly and effectively to those failures? What works when there is just a small set of teams or an isolated system to manage will quickly break down when the organization grows in size and complexity. On the other hand, what has been commonly practiced in large-scale enterprises is proving to be too cumbersome, too silo-dependent, and simply too slow for today's business needs.

How do we rapidly respond to incidents and recover complex, interdependent systems while working within an equally complex and interdependent organization? How does operations embrace the DevOps- and Agile-inspired demand for speed and self-service while maintaining quality and control?

This talk examines the trial-and-error lessons learned by some forward-thinking enterprises who are currently streamlining how they:

Resolve incidents
Reduce friction between teams
Divide up operational responsibilities
Improve the quality of their ongoing operations

See how these companies are rethinking how and where operations happens by applying Lean and DevOps principles mixed with modern tooling practices.

This talk will:

Dissect examples of operational incidents from inside actual large enterprises
Identify the common organizational and technical anti-patterns that prevent quick and effective incident resolution and interfere with organizational learning
Discuss emerging design patterns and techniques that remove the friction and bottlenecks while empowering teams (highlighting publicly referenceable work shared with the DevOps community)

View the full LISA17 program: https://www.usenix.org/lisa17/program
