Okay, hello everyone and thank you for coming. My name is Eyal Shenitzky and I'm a software engineer at Red Hat on the oVirt storage team. Hi, I'm Daniel Erez, an engineer at Red Hat and a contributor to the oVirt project. So thank you for coming, and today we're going to talk about "Back to the Future" and how we can do incremental backup using oVirt.

So, the agenda for our time journey today: we'll start by reviewing the old supported backup APIs in oVirt, then we'll see the new incremental backup API, then we'll dive under the DeLorean's hood and see how it works underneath, and we'll show how to use it with the engine and imageio backup APIs.

So let's start. How many of you have used oVirt or know about oVirt? Can you please raise your hands? Okay, I see most of you, which is great, thank you. So this is oVirt from 10,000 meters above. We decided to show here only the main components that are related to backup. We have the engine, which is the management application, a Java application, and underneath it we have the hypervisor, the host that runs the virtual machines, and of course we have the virtual machine disks. We also have an external backup application, external to oVirt, that wants to back up those VMs, especially the disks. So how can it be done?

So let's review the old supported backup APIs. We go back in time. The oldest solution is the backup appliance. The year is 2014, even before I was born. The backup application actually runs inside a proxy VM. The flow is: take a VM snapshot, then attach that specific snapshot to a backup VM. Once you attach the snapshot to the backup VM, inside that proxy or backup VM you copy the entire disk, detach the disk snapshot from the backup VM when you are done, and remove the snapshot. Fine, let's move forward in time. We have a newer way: the image transfer API.
If some of you were in the Bareos presentation, they are using this as their backup solution. The year is 2017. You can now download and upload snapshots using imageio. imageio is a subproject in oVirt for communicating with the engine and transferring disks between the oVirt engine and your environment, and it has been available since oVirt 4.0. So what does the flow look like? You of course need to take a snapshot. When the snapshot is ready, you download the snapshot as a qcow2 disk. Once you're done, you can remove the old snapshot. In order to restore a backup that you've taken, you prepare the disk for restore in the backup application. For example, if you have a chain of snapshots, you rebase all those snapshots one on top of another, or do any other kind of preparation you need. Once you're done, you can upload the disk or the chain of snapshots in raw or qcow2 format, and create a VM from the uploaded disk.

So those were the old supported backup APIs. Now, Daniel will tell you about the new incremental backup API.

Okay, let's see the incremental backup API. First, fasten your seatbelts, we're going to jump to the future. So the year is next year. We are still working on it; it's not done yet. It will be a tech preview in oVirt 4.4. It will require upstream libvirt and QEMU 4.2, and it does incremental backup via changed block tracking, using the NBD protocol.

So let's see the full backup flow at a very high level. First, we start the backup; there is a simple API for that. Second, we download the disks in raw format. We don't care about the source format, we get it in raw. And last, we stop the backup. And there is no step 4. Simple API, simple flow. oVirt does everything under the hood. Actually, libvirt and QEMU do everything, but oVirt manages all of it. Okay, let's go over the incremental backup flow. It's very similar. First, we start the incremental backup.
Second, we download the changed blocks, again in raw format; we don't care about the source format. And third, we stop the backup. So again, very simple. Now, let's see how we do a restore from an incremental backup. First, we need to prepare the disk for restore: the backup application needs to create a single file in raw format. Then we upload the disk again. And then we simply attach the disk to the new VM, and we can use the VM as it was before.

Now, let's see the advantages of incremental backup. We have quite a few of them. First, copying changed blocks. This speeds up incremental backup, or actually enables incremental backup, by copying only the blocks that changed since the last backup. Next, the extents API. For full backup, we have a new extents API for exposing data extents and zero extents, so we can skip the zero extents and download only the data extents. Next, we are snapshot-free now. No need to create or delete a snapshot, so all the complex operations of managing a snapshot, which can be quite complex as we know, are not needed anymore. Next, raw guest data. We can download the data in raw format, as the guest sees it. We can convert it later if we want, but raw supports streaming easily, so this is more efficient. And last but not least, we have an improved imageio client, which is a library that enables us to upload and download disks in both formats, even including the backing files.

Now, let's see the limitations. First, we support only qcow2 disks; we need the features of qcow2 to get the extents. Next, it's available only for running VMs for now. We may support non-running VMs later on, but this is a current limitation. Also, the extents API is not useful for raw preallocated disks, since for raw disks we get only a single extent, so we can't really do incremental there. Now, Eyal will show us what's under the hood.

Okay, great. Thank you, Daniel. So we saw how incremental backup is going to work at the higher level.
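The extents advantage described above can be sketched in a few lines. This is an illustrative client-side filter, not oVirt code: given extent lists shaped like the ones the imageio daemon reports (the exact field names here are assumptions for illustration), we keep only the ranges worth transferring.

```python
# Illustrative sketch: decide which ranges to download from the extents
# reported by the imageio daemon. Field names ("start", "length", "zero",
# "dirty") follow the talk's examples and are assumptions, not a spec.

def ranges_to_download(extents, context):
    """Return (start, length) ranges that must actually be downloaded."""
    keep = []
    for ext in extents:
        if context == "zero" and ext["zero"]:
            continue   # a hole: no need to download zeros
        if context == "dirty" and not ext["dirty"]:
            continue   # unchanged since the checkpoint: skip for incremental
        keep.append((ext["start"], ext["length"]))
    return keep

# Full backup: skip the zero extents, download only the data extents.
zero_extents = [
    {"start": 0,     "length": 65536, "zero": True},
    {"start": 65536, "length": 65536, "zero": False},
]
print(ranges_to_download(zero_extents, "zero"))

# Incremental backup: download only the extents that changed.
dirty_extents = [
    {"start": 0,     "length": 65536, "dirty": True},
    {"start": 65536, "length": 65536, "dirty": False},
]
print(ranges_to_download(dirty_extents, "dirty"))
```

For a full backup we skip the zero extents; for an incremental backup we keep only the dirty ones, which is exactly why only the changed blocks travel over the wire.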
So let's dive under the DeLorean's hood and see how libvirt and QEMU actually do those operations. At the base of incremental backup, we have checkpoints. You can think of a checkpoint as a lightweight snapshot. When the user chooses to create a backup, a checkpoint is created in the libvirt layer. A checkpoint actually tracks the metadata of that specific backup. For example, you can see here that the user created the backup on January 1st, and this checkpoint was created for that backup. You can see which backup this checkpoint belongs to and which disks participated in that specific backup.

So time moves forward. This is the first backup in the chain. The day after, the user chooses to create a new backup, so a new backup and a new checkpoint are created underneath it, and now this checkpoint has a parent, which is the first checkpoint, and the same disks participate in the specific backup. Fine. The day after that, the user chooses to create another backup, another checkpoint is created, and now we have a chain. The chain of checkpoints is persisted on the host as long as the VM is alive. In case the VM goes down, or crashes, or something happens to it, we need to redefine those checkpoints, because libvirt is no longer aware of them. So we persist all this information in the engine database; the engine will redefine those checkpoints, and libvirt will again be aware of what happened.

So we have checkpoints, and we know how they work on the libvirt side, but checkpoints actually just track metadata. What actually gives us the ability to know which bits changed? This is done by dirty bitmaps. As we said earlier, for each backup we create a checkpoint in libvirt, and for each checkpoint we also create a dirty bitmap at the QEMU level. So as you can see here, we have a disk, sda, and the user creates a backup. libvirt creates a checkpoint for that backup, and a dirty bitmap is created for the disk.
Each entry in the bitmap marks a specific cluster in the disk. In the example here we can see that the disk is clean: nothing was written by the guest, so the bitmap is also clean. No bits were turned on and no changes were made by the guest. So time moves on, and the guest writes data to the second cluster. The second bit inside the dirty bitmap is turned on and marks that this cluster is now dirty: the guest wrote data to it. Let's move forward; the guest writes data to the fourth and the fifth clusters. We can see that even though the fourth cluster is not fully changed, it is still marked as dirty in the dirty bitmap.

Fine, now we create a new backup. So a new checkpoint is created, and a new dirty bitmap is created in QEMU. The bitmap of the first checkpoint is now deactivated, and we can see here that no changes have been made to the disk since, so the dirty bitmap of the new checkpoint is still clean. If the user wants to download the backup at this stage, they will get the data that was tracked since the first checkpoint.

Okay. So now QEMU can report to us which specific disk clusters got dirty. If we want, we can get a map of the disk, of the extents, and QEMU will report which extents, or which clusters, got dirty. So the first extent marks that the first cluster didn't get dirty, so dirty will be false, and the second cluster did get dirty, so QEMU tells us dirty will be true. This is the section that we actually need to download.

So we know how it works. Let's see how we can use it with the oVirt backup API. The example here uses the Python SDK, which is layered on top of oVirt's REST API. In order to start a full backup, you just need to fetch the VM service using the Python SDK, and under the specific VM service we have the backups service. So we just need to create a new backup using the add method and specify the specific disks that are going to be included in that backup.
Once we've done that, we get a backup object, and we wait for the status of the backup to be ready. Once the backup is ready we can move on, and if you remember, for each backup a checkpoint is created underneath it in libvirt, so we need to fetch this specific checkpoint ID in order to provide it later, when we track the difference. So we fetch the checkpoint ID from the backup, and we can start a transfer session with imageio. We create a new transfer using the image transfer service, and we specify the specific disk that we want to download, and of course the direction of the transfer: download. Since this disk is part of a backup, we need to specify which specific backup it is, and the format must be raw.

Okay, once we have initiated the transfer, we wait for the transfer session to be ready, and when the transfer is ready we can fetch the transfer URL. This URL points you directly to the host, to the imageio daemon, so the transfer and the download will be more efficient, and you can actually download the backup disk; Daniel will elaborate more about this later on. So we have the full backup in our backup application, and we can finalize the transfer and finalize the backup session. Fine.

So we have a full backup; we can do an incremental one, right? But before that, we need to filter the disks. Only qcow2 disks are supported for incremental backup, and the user must mark the specific disks that they want to participate in an incremental backup. This is marked by setting the disk's backup property to incremental, so we filter the disks that support incremental backup and that we want to download, and we can start a new backup, but now it's an incremental backup. So we provide the disks that we filtered before, and we provide the from_checkpoint_id that we fetched from the full backup that we did earlier, and the same flow goes on again. So now Daniel will show you how to interact with the imageio backup API in order to download the backup disk. Okay.
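The disk-filtering step just described can be sketched with plain dicts standing in for the SDK's disk objects (the field names are illustrative, not the exact SDK attributes):

```python
# Sketch of the filtering step: only qcow2 disks that were marked for
# incremental backup are eligible. Plain dicts stand in for the SDK's
# disk objects; "format" and "backup" are illustrative field names.

def filter_incremental_disks(disks):
    return [
        d for d in disks
        if d["format"] == "qcow2" and d["backup"] == "incremental"
    ]

disks = [
    {"id": "disk-1", "format": "qcow2", "backup": "incremental"},
    {"id": "disk-2", "format": "raw",   "backup": "none"},
    {"id": "disk-3", "format": "qcow2", "backup": "none"},
]

eligible = filter_incremental_disks(disks)
print([d["id"] for d in eligible])  # only disk-1 qualifies
```

Only the disks that pass this filter are handed to the incremental backup call, together with the from_checkpoint_id of the previous backup.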
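Stepping back to the checkpoints and dirty bitmaps from the under-the-hood section: the mechanics can be condensed into a toy model. This is purely illustrative, not libvirt or QEMU code, and the 64 KiB cluster size is an assumption; it just shows how each backup freezes the previous bitmap, starts a clean one, and how "what changed since checkpoint X" merges the bitmaps from X onwards.

```python
# Toy model of checkpoints and dirty bitmaps (illustration only).
# Each backup creates a checkpoint; each checkpoint owns a dirty bitmap;
# a guest write dirties every touched cluster in the ACTIVE bitmap only.

CLUSTER = 64 * 1024  # assumed cluster size: one bit per 64 KiB cluster

class Disk:
    def __init__(self, name, clusters):
        self.name = name
        self.clusters = clusters
        self.checkpoints = []  # oldest first; the last one is active

    def start_backup(self, checkpoint_id):
        # New checkpoint: the previous bitmap freezes, a clean one starts.
        self.checkpoints.append((checkpoint_id, [False] * self.clusters))

    def write(self, offset, length):
        _, bitmap = self.checkpoints[-1]
        for i in range(offset // CLUSTER, (offset + length - 1) // CLUSTER + 1):
            bitmap[i] = True  # a partial write still dirties the whole cluster

    def dirty_clusters_since(self, checkpoint_id):
        # Merge the bitmaps from the given checkpoint onwards.
        ids = [cid for cid, _ in self.checkpoints]
        merged = [False] * self.clusters
        for _, bitmap in self.checkpoints[ids.index(checkpoint_id):]:
            merged = [a or b for a, b in zip(merged, bitmap)]
        return [i for i, dirty in enumerate(merged) if dirty]

sda = Disk("sda", clusters=6)
sda.start_backup("cp-1")              # first backup, January 1st
sda.write(1 * CLUSTER, CLUSTER)       # guest dirties the second cluster
sda.write(3 * CLUSTER, CLUSTER + 10)  # fourth cluster plus a bit of the fifth
sda.start_backup("cp-2")              # next day's backup; cp-1 bitmap freezes
print(sda.dirty_clusters_since("cp-1"))  # clusters changed since cp-1
print(sda.dirty_clusters_since("cp-2"))  # nothing written since cp-2 yet
```

Note how the write that only partially touched the fifth cluster still marks it dirty, matching the slide's example, and how a query from the older checkpoint merges every bitmap created since it.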
So, how do we download the disk? We could download the entire disk, including the parts without any data, but that's not very efficient. We need to find a better solution, and we have one. So how do we do this? First, we send an OPTIONS request to the imageio daemon on the actual resource. In an example response we see that this resource supports the extents API, zero and flush, and we can optimize the connection if we want to use multiple connections.

Now let's say we want to download a 100-gigabyte image. How many gigabytes of the 100 are data? Well, we don't really know; it can contain a lot of zeros, but we don't really care about the zeros, so we don't need to download them. So how do we do it? First we get the zero extents with a GET request, again on the resource, simply specifying context=zero. In an example response we can see that the first extent is zero, so we don't need to download it, we can simply skip it; the second one isn't zero, so we shouldn't skip it.

Now, which blocks changed since the last backup? For incremental backup we can tell by getting the dirty extents. To get the dirty extents we send a GET request to the resource's extents again, just now with context=dirty; this is for incremental. In this example we can see that the first extent is dirty, so we should back it up, download it, and the second one isn't dirty, so we shouldn't download it.

So, getting the data: how do we get the data extents? We use HTTP range requests. Again, we invoke a GET request on the resource and specify a Range header with offset and length.

Okay, so now, how do we speed up the restore? Can we speed it up? We simply upload only the data extents. How do we do it? We send a PUT request on the resource, specify Content-Range with offset, length and size, and we simply need to send a single request for each extent.

Okay, now for the zeros. What about the zero extents? We can use the zero feature to zero the extents on storage. So how do we do it?
We send a PATCH request on the resource with operation "zero", specifying offset and size, and that's it. So finally, flushing: is my data really on storage? Well, we don't know; we need to flush first to ensure it. Under the hood this will invoke an fsync call. So how do we do it? We send a PATCH request on the resource with operation "flush". It will invoke fsync under the hood, and this is done synchronously.

Okay, was all of that too complicated? Well, probably, but luckily we have a much better solution: the client library. So what are the cool kids doing these days? Apparently riding hoverboards and using our amazing client library. So what is the imageio client library? It's basically a way to write your own time machine. Easy. It can be a reference implementation for using the imageio API, you can build your own backup solution with it, and it's good for testing your backup APIs. We have lots of examples of it in the codebase and in the Python SDK.

Okay, let's see the pipeline. You basically interact with the client, the imageio client. For upload, the imageio client reads the data extents from the disk using qemu-nbd and sends the extents to the imageio daemon, again via qemu-nbd. For download, it's simply the other way around: the imageio daemon reads data from the source with qemu-nbd and sends it to the client, which stores it in the destination. So we have qemu-nbd all the way.

Okay, so let's see how to do a full backup with the client. We invoke the download API, specifying the transfer URL and the name of the file. The default is qcow2 format, just because it's more efficient. Okay, so for incremental backup we just need to specify incremental=true, and that's it. Okay, how do we prepare the disk for restore? We need to rebase the images one on top of the other to have a single image to upload; we incrementally rebase them on top of the last full backup. Okay, how do we use the client? We invoke the upload API, again with the image that we've created and the transfer URL. Okay, do we have some
time for a demo? Yes, okay. So thank you, and now we can see a short demo of how to do a full backup using the Python SDK and the client library. In the example here we have one VM, a Fedora 30 VM. At the beginning, we just mark the disk as enabled for incremental backup so we can use it for backup. We open a console just to create a directory; since this directory is created before the backup starts, it will be included in the downloaded disk. We are using the Python script that we prepared for you as an example, and we start a full backup for the specific VM ID. The backup has started. We open a console again to create a new directory; that directory, the "after" directory, will not be included in the downloaded disk because it was created after the backup started. So we created the backup, it is ready, and we can create and start an image transfer. The transfer session is ready now, so we can start downloading the disk. This is the URL of the transfer, which goes directly to the daemon, and the image will be stored in the path that we provided as one of the arguments to the script. The backup disk was downloaded, and the backup was successfully finalized. We can see here that we have the disk; we just create another qcow2 layer on top of it in order to not destroy the data, and we can start QEMU on top of it. So we have a Fedora 30 VM, we log in, and we can see that we have only the first directory, not the second one that was created during the backup.

Okay, great, so that was it. For more information you have the following links here for imageio and the imageio client, the SDK examples page in oVirt, and QEMU, and of course the oVirt site if you want to investigate it more. Thank you. So we have a little bit of time for questions if you have any. Yes?
Yes, the question was: when we start the backup process, does QEMU run some backup script or something through the QEMU guest agent? Yes, we use the QEMU guest agent, but we also freeze the VM on our side using the hypervisor, VDSM, so we make sure that the data will be consistent. And of course, when the backup starts, we just thaw the VM again, so the freeze operation is quite short. More questions? Okay, fine, I guess that's it. Thank you very much.