LPC2018 - Task Migration at Scale Using CRIU

Published on Dec 3, 2018

url: https://linuxplumbersconf.org/event/2...
speakers: Victor Marmol (Google), Andy Tucker (Google)


The Google computing infrastructure uses containers to manage millions of simultaneously running jobs in data centers worldwide. Although the applications are container-aware and designed to be resilient to failures, evictions caused by resource contention and scheduled maintenance events can reduce overall efficiency because of the time required to rebuild complex application state. This talk discusses the ongoing use of the open source Checkpoint/Restore in Userspace (CRIU) software to migrate container workloads between machines without loss of application state, allowing improvements in efficiency and utilization. We’ll present our experiences with using CRIU at Google, including ongoing challenges supporting production workloads, the current state of the project, changes required to integrate with our existing container infrastructure, new requirements from running CRIU at scale, and lessons learned from managing and supporting migratable containers. We hope to start a discussion around the future direction of CRIU as well as task migration in Linux as a whole.