Loading...

LPC2018 - Task Migration at Scale Using CRIU

157 views

Loading...

Loading...

Transcript

The interactive transcript could not be loaded.

Loading...

Rating is available when the video has been rented.
This feature is not available right now. Please try again later.
Published on Dec 3, 2018

url: https://linuxplumbersconf.org/event/2...
speaker: Victor Marmol (Google), Andy Tucker (Google)


The Google computing infrastructure uses containers to manage millions of simultaneously running jobs in data centers worldwide. Although the applications are container aware and are designed to be resilient to failures, evictions due to resource contention and scheduled maintenance events can reduce overall efficiency due to the time required to rebuild complex application state. This talk discusses the ongoing use of the open source Checkpoint/Restore in Userspace (CRIU) software to migrate container workloads between machines without loss of application state, allowing improvements in efficiency and utilization. We’ll present our experiences with using CRIU at Google, including ongoing challenges supporting production workloads, current state of the project, changes required to integrate with our existing container infrastructure, new requirements from running CRIU at scale, and lessons learned from managing and supporting migratable containers. We hope to start a discussion around the future direction of CRIU as well as task migration in Linux as a whole.

Loading...


to add this to Watch Later

Add to

Loading playlists...