Okay, so let's start. I'm Arik Hadas, a software engineer at Red Hat, working on the oVirt project. In this session I'll present a new feature that was introduced in oVirt 3.3 called RAM snapshots. We'll start with a quick overview of the oVirt architecture and volume types. Then we'll review what snapshots are used for and how they were implemented in oVirt. And then we'll get into the details of the RAM snapshots feature: how it improves snapshots in oVirt, how it was implemented, and how it can be used.

So, oVirt is a platform for managing virtual data centers. It is composed of hosts. At the bottom of each host you can see the hypervisor. The hypervisor is the process that sets up the virtual environment; it basically simulates real hardware for the virtual machine. Each host is installed with VDSM. VDSM stands for Virtual Desktop and Server Manager. It is a Python application that serves as an agent on the host. It is responsible for reporting information about the virtual machines on the host and for carrying out operations it gets from the engine, which we'll see in a second. VDSM interacts with the virtual machines using libvirt. libvirt is a library that exposes a simple API for operations that are performed by hypervisors.

The next component is the engine, or the backend as it is sometimes called. The engine is responsible for monitoring the whole environment, all the virtual machines and the hosts. In the context of this session, it is important to understand that the engine stores the information about the whole environment, persists it in a database, and is responsible for executing operations that come from the outside world.

How does the outside world interact with the system? There are a couple of clients. One of them is the WebAdmin. The WebAdmin is a web application that provides a way to do advanced operations intended for administrators, such as defining new networks, adding new storage, setting user permissions, and so on.
Another client is the user portal. It's another web application, which provides a way to do operations intended for users, such as running VMs, connecting to VMs using a console, and so on. And there are two more technical clients: the command-line interface and the REST API.

Now let's talk about volumes. A virtual disk is composed of elements that can be either files or block devices, depending on the type of storage that holds its data. Those elements are called volumes. There are two types of volumes in oVirt. Raw volumes are the simplest format; they contain plain binary data. And there are COW volumes. Specifically, when we talk about COW volumes in oVirt we refer to QCOW2, which stands for QEMU copy-on-write version 2. COW volumes contain only part of the data, only the data that was changed. Let's demonstrate it.

A typical image looks like image 1. It is composed of a chain of volumes where the base volume can be raw or COW, and all the rest of the volumes are COW volumes. In general, and we'll see some exceptions later on, the last volume is the active volume. It is the volume that the VM is set to work with; the VM works directly with that volume. When we write data to a disk like image 1, the data is stored in the last COW volume, assuming that it is the active volume. Since volumes are created initially empty, a COW volume contains only the data that was changed on the disk while it was the active volume. It also means that all the rest of the volumes in the chain are basically read-only, because they never change.

When the hypervisor tries to read data from a specific place on the disk, it first tries to read it from the last volume, again assuming it's the active volume. Each COW volume contains a metadata section that specifies what data the volume actually contains. So when we try to read from the disk, we first check whether the data is contained in the active volume.
If it's there, we read it from that volume. Otherwise, we try the parent volume. So in our example, we try volume 3. If it's not in volume 3, we try volume 2, and if it's not in volume 2, then volume 1.

Let's see some of the uses of snapshots in oVirt. The typical use is backup and restore. We want to save the state of the VM at a certain point in time and be able to restore that state later on. Let's take, for example, a VM that has one disk, image 1, which contains one volume, volume 1. The engine stores a table called the snapshots table. For simplicity, we'll think of it as a table that has two columns: one that specifies the name of the snapshot entry, and a second, called volumes, that contains the volumes associated with that entry. Initially, for each VM, there is one entry in the snapshots table called the active VM entry. The volumes associated with the active VM entry are the volumes that will be the active volumes the next time the VM runs, or they are the active volumes if the VM is already running.

When we create a snapshot, what we basically do is add a new volume for the disk and set it as a child volume of the previously active volume. In our case, we add volume 2 as a child of volume 1. The snapshots table changes. The new volume we added is associated with the active VM entry, and a new entry is added. Let's say that our snapshot is called snapshot 1. So a new entry called snapshot 1 is added, and it is associated with all the volumes that were active volumes before. In our case, volume 1. It makes sense, because volume 1 basically contains the data of image 1 at the moment the snapshot was created.

Now let's say that we want to restore the data that was saved in the snapshot. We do it with the preview operation. When we preview snapshot 1, we add a new volume, which is set as a child volume of the volume that is associated with the snapshot. In our case, volume 3 is added as a child volume of volume 1.
The active VM entry is cloned to a previous active VM entry, and the volume we added is associated with the active VM entry, meaning that volume 3 will be the active volume the next time the VM runs. Of course, you can do the preview operation only when the VM is not running.

It is called preview because the user can undo the operation. Undoing the operation reverts the state to how it was before the preview. You can see that we didn't remove any volume and we didn't remove any data from the snapshots table, so it is reversible.

When we are sure that we won't need to undo the operation and we want to stick with the data of the snapshot, we can commit to the snapshot. Committing to the snapshot removes all the unused volumes. Those are the volumes that are not in the chain of the active volume; in our case, volume 2. In the snapshots table, we remove all the entries that were associated with volumes that were removed. In our case, the previous active VM entry is removed. The next time we run the VM, the active volume will be volume 3, and since it is initially empty, it basically means that all the data will be the data that was stored in volume 1.

We also use snapshots in oVirt to implement stateless VMs. A stateless VM is a VM that always starts with the same state, regardless of changes that were made on a previous run. Let's demonstrate it on the same VM we were talking about. We add a new temporary volume for each disk. In the snapshots table, the active VM entry is cloned to a stateless snapshot entry, and we associate the newly added volumes with the active VM entry. When the VM is powered off, we revert to the state it was in before we ran it, as shown on the left side of the slide.

The third use of snapshots in oVirt is as part of the LSM process. LSM stands for live storage migration.
It is the process of moving a disk from one storage domain to another while the VM uses the disk. First we create a snapshot. So if the disk has one volume, volume 1, it will now have two volumes, as we already saw. Then we clone the structure of the disk from the source to the destination. Since the disk has two volumes in the source, it will have two volumes in the destination. And next, we synchronize the data of the volumes. Now note that synchronizing volume 2 is a difficult task, because the VM might write to the disk at the same time. So we need help from the hypervisor for the synchronization. We won't get into the details of how we do that; it's out of the scope of this session, but it's important to understand that it is a difficult task. All the rest of the volumes in the chain, as we already saw, are read-only volumes, so they can be synchronized with a simple copy command. So by taking the snapshot at the beginning, we basically reduce the amount of data that is more difficult to synchronize.

There are two types of snapshots in oVirt. Offline snapshots, or regular snapshots, are snapshots that are created when the VM is not running. How do we do that? VDSM adds a new volume for each of the disks and, as we saw, sets them as child volumes of the previously active volumes. The next time we start the VM, it will use those volumes as the active volumes.

The second type is a live snapshot. A live snapshot is a snapshot that is created while the VM is running. Again, VDSM creates a volume for each disk, but now we need an additional step where the VM is actually switched to use the new volumes we created. It looks like this. The engine invokes the snapshot verb in VDSM. Methods, or operations, in VDSM and libvirt are called verbs. And VDSM calls the virDomainSnapshotCreateXML verb in libvirt. The virDomainSnapshotCreateXML verb gets three parameters.
The ID of the VM that we are going to take the snapshot for, and two additional parameters, XML and flags, that specify the snapshot properties: what the snapshot will contain and how the creation process should look. The first is in XML format, and the second is a set of flags, Boolean values.

There are three types of snapshots in libvirt. The first type is a disk snapshot. Those are snapshots that contain only the data on the disk. There are two kinds of such snapshots. Internal: the structure of QCOW2 volumes supports keeping the data of multiple snapshots, along with the data of the disk, inside the same volume. So in an internal disk snapshot, the snapshot data is saved in the original volume of the disk. And external snapshots are snapshots that are created by adding a new volume, as we saw in the previous slides.

The second type of snapshot in libvirt is a memory state. Those are snapshots that contain only the data of the memory. There are two kinds of that snapshot. Piggyback, where, in a similar way to internal disk snapshots, the data is saved in the original disk volume; again, this volume must be a QCOW2 volume. And external, which means that the memory data is saved in a separate volume.

And there is the system checkpoint. The system checkpoint is a combination of the two snapshot types I mentioned before. Those are snapshots that contain both the state of the disk and the state of the memory.

So in libvirt terms, the live snapshots in oVirt before RAM snapshots were external disk snapshots. We prefer external over internal because internal snapshots work only when the disk volume is a QCOW2 volume, and we want to be able to take snapshots of raw volumes as well. And there is no downtime for the VM while creating external disk snapshots. In terms of the parameters to the virDomainSnapshotCreateXML verb, the XML parameter contains, for each disk, the volume that should become the active volume for that disk. And we are using the following flags.
Disk-only, since by default libvirt creates a system checkpoint and we want only the disk data. The reuse-external flag: as we saw, VDSM creates the volumes, and we want libvirt to use the volumes that were already created, so we need to pass this flag. The no-metadata flag: libvirt stores some metadata for the snapshot, and we don't need this metadata, so we pass this flag. And the quiesce flag, which basically means that the VM stops using the volumes in a more managed way.

Okay, so all the snapshots we've seen so far contained only the data of the disk. It means that when we revert to such a snapshot, the VM must be booted in order to run. A RAM snapshot contains the state of the memory along with the state of the disk. So when we revert to a RAM snapshot, we can also restore the memory state, and what we get is almost the same as it was when the snapshot was created. If you were logged in to the VM, you will be logged in again. All the open applications will be open again in the same state. The clipboard content will be the same, and so on. One exception, though, is that TCP connections will probably have timed out.

Now, the process of saving the memory of a VM is not new in oVirt. We already do that in the hibernation command. When we hibernate a VM, its memory data is saved in a separate volume. The main difference between hibernation and RAM snapshots is that after hibernating a VM we can't use it until we run it again and it is restored from hibernation. With a RAM snapshot, we save the memory as part of the snapshot, and the user can still use the VM and is able to restore that state later on.

So how do we create RAM snapshots? VDSM again creates a volume for each disk, and it now creates two additional volumes. Then the engine invokes the snapshot verb in VDSM. VDSM then saves the properties of the VM in one of the additional volumes. Let's call it the VM properties volume. And then it invokes the virDomainSnapshotCreateXML verb in libvirt.
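As an aside, the XML and flags parameters for the disk-only live snapshot described earlier can be sketched as follows. This is a minimal illustration, not VDSM's actual code. The SnapshotFlag enum is a hypothetical stand-in for libvirt's VIR_DOMAIN_SNAPSHOT_CREATE_* constants, with placeholder numeric values. The helper names and paths are made up for the example, and the XML assumes file-based volumes.

```python
import enum
import xml.etree.ElementTree as ET


class SnapshotFlag(enum.IntFlag):
    """Hypothetical stand-ins for libvirt's snapshot creation flags.

    The numeric values are placeholders; real code would take the
    constants from the libvirt bindings instead.
    """
    DISK_ONLY = enum.auto()    # do not save memory, only disk state
    REUSE_EXT = enum.auto()    # use volumes that were created beforehand
    NO_METADATA = enum.auto()  # do not keep libvirt snapshot metadata
    QUIESCE = enum.auto()      # ask the guest to flush before switching


def build_disk_snapshot_xml(new_volumes):
    """Build a <domainsnapshot> document for an external disk snapshot.

    new_volumes maps a disk name (for example "vda") to the path of the
    volume that should become the active volume for that disk.
    """
    root = ET.Element("domainsnapshot")
    disks = ET.SubElement(root, "disks")
    for disk_name, volume_path in new_volumes.items():
        disk = ET.SubElement(disks, "disk", name=disk_name, snapshot="external")
        ET.SubElement(disk, "source", file=volume_path)
    return ET.tostring(root, encoding="unicode")


def live_disk_snapshot_flags(quiesce=True):
    """Combine the flags the talk lists for a live disk-only snapshot."""
    flags = (SnapshotFlag.DISK_ONLY
             | SnapshotFlag.REUSE_EXT
             | SnapshotFlag.NO_METADATA)
    if quiesce:
        flags |= SnapshotFlag.QUIESCE
    return flags


if __name__ == "__main__":
    print(build_disk_snapshot_xml({"vda": "/example/image1/volume2"}))
    print(live_disk_snapshot_flags())
```

With the libvirt Python bindings, the resulting XML and the real flag constants would be passed to the domain's snapshotCreateXML method; that call is left out here so the sketch runs without a hypervisor.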
Now, how does creating a RAM snapshot look in libvirt? In libvirt we create a system checkpoint, because we also save the memory state. We save it as external, in a separate volume, which is, by the way, a new capability in libvirt; it was only recently introduced. This volume is the second additional volume VDSM created. The XML parameter now also contains the path to that second volume, where the memory should be saved, and the flags look a bit different. We don't want to use the disk-only flag, because we want a system checkpoint, and the quiesce flag isn't used because it is not supported for system checkpoints.

And let's talk a bit about the live flag. In oVirt 3.3 we don't use the live flag. It means that the VM is paused; it is frozen for the whole process of dumping the memory, and then of course it is up again. With the live flag, the VM is active for almost the whole process. It is frozen only for a short time. Now, the drawback of using the live flag is that the volume where we save the memory will be bigger; we will need to save more data in the general case. And in the worst case, the snapshot creation process might not converge. I still mention this flag here because we are planning to use it in the near future, I hope, with proper handling of the issues I mentioned.

Now let's see how the operations we already saw change when it comes to RAM snapshots. First, we can see that the snapshots table has an additional column called memory. This column contains the memory volumes, as we'll see later on. The memory that is associated with the active VM entry is the memory that will be restored the next time we run the VM. So when we create a RAM snapshot, two separate volumes are created, a VM properties volume and a memory state volume. Those are the two additional volumes we saw in the sequence diagram. Let's say that memory 1 represents those two volumes.
Memory 1 is associated with the entry that represents our snapshot. When we preview snapshot 1, we associate the memory that is associated with the snapshot with the active VM entry. So the next time we run the VM, memory 1 will be restored. And the commit operation stays the same.

Now let's say that we want to run this VM, which was committed to a RAM snapshot, as stateless. Basically, what we want is that the next time we run the VM the memory will be restored, and the time after that, this memory will be restored again. So how do we do that? We associate the memory that was associated with the active VM entry before the operation with the new active VM entry.

What does the process of running a VM look like now that we have RAM snapshots? First we check the status of the VM. If it is suspended, it means that it was hibernated before, and we treat it as a VM that should be resumed from hibernation. If it's not suspended, we check the memory that is associated with the active VM entry. If there is no such memory, we run the VM as a regular VM that is down. If there is memory associated with the active VM entry, we do the following steps. First, VDSM loads the VM properties from the properties volume and modifies them. We'll see in a second why we need to modify them. Then libvirt restores the memory from the memory state volume with the modified VM properties. And as soon as the engine detects that the VM is up and might change the disk, it clears the memory from the active VM entry. It is important to do so, since we want to prevent a situation where we later restore the same memory with a disk state that is not coherent with that memory state. Such a state might lead to data corruption.

So one property that we always need to change in the VM properties is the active volumes. And let's see why.
When we created snapshot 1, the active volume of image 1 was volume 1, and it was saved that way in the VM properties. Now, after committing to snapshot 1, as we've already seen, volume 3 is the active volume of image 1. By default, if we don't change the properties, the VM will have the same properties it had when we created the snapshot. So we must change the VM properties to replace the active volume of image 1, from volume 1 to volume 3.

Let's see how it looks in a sequence diagram. The engine invokes the restore verb in VDSM. VDSM then loads the VM properties and modifies them. And then it invokes the virDomainRestoreFlags verb in libvirt. The virDomainRestoreFlags verb gets four parameters. Again, the ID of the VM that we are going to work on. The volume that holds the memory that should be restored. The flags parameter; we don't use it, so we won't talk about it. And the XML parameter, which is important because it lets us override some of the VM properties when we restore the VM from a memory snapshot. As we've already seen, we always need to change the active volumes. Sometimes we might even need to change additional settings that are related to the data center or cluster. We need to do that when we restore the VM from a RAM snapshot and the VM resides in a different data center or cluster than the one it was in when we created the snapshot. So we need to change additional settings to match the new environment the VM is in.

We can think of the XML parameter as if we have three layers. One layer is the guest level: what the operating system inside the virtual machine sees. It can be memory size, disks, and so on. Another layer is the host level: what really exists, how it really looks in the hardware. And there is the virtualization layer, which bridges those two layers. The XML parameter lets us make changes in the host-level layer. Those changes are abstracted by the virtualization layer for the operating system inside the virtual machine.
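The active-volume rewrite described above can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary layout of the saved VM properties and the function name update_active_volumes are hypothetical and do not reflect VDSM's real data format.

```python
import copy


def update_active_volumes(saved_properties, current_active_volumes):
    """Return a copy of the saved VM properties with current active volumes.

    saved_properties: dict with a "disks" list, where each disk has
        "image" and "active_volume" keys (hypothetical layout).
    current_active_volumes: maps an image name to the volume that is
        the active volume for that image right now.
    """
    updated = copy.deepcopy(saved_properties)
    for disk in updated["disks"]:
        image = disk["image"]
        if image in current_active_volumes:
            # Replace the volume recorded at snapshot time with the
            # volume that is active after the preview/commit.
            disk["active_volume"] = current_active_volumes[image]
    return updated


if __name__ == "__main__":
    # Properties as they were saved when snapshot 1 was created.
    saved = {
        "name": "example-vm",
        "disks": [{"image": "image1", "active_volume": "volume1"}],
    }
    # After committing to snapshot 1, volume 3 is the active volume.
    print(update_active_volumes(saved, {"image1": "volume3"}))
```

The function works on a copy so that the properties stored with the snapshot stay untouched; the same snapshot can then be previewed again later with a different active volume.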
To understand how it is possible to restore a VM from a RAM snapshot in a different data center than the one it was in when the snapshot was created, we need to talk about the export and import operations. Export is the process of copying the VM into a special place called the export domain. This export domain can later be attached to a different data center or cluster, and then the VM can be imported. It means that it is copied from the export domain to a storage domain in that data center or cluster.

In general, when we export or import a VM, its snapshots are also copied. In the case of RAM snapshots, their memory volumes are also copied, of course. The snapshots are not copied when the copy collapse option is set. When the copy collapse option is set, all the volumes of the disk are merged into one volume. Sometimes we set copy collapse just because we want to get rid of the snapshots, because they take up space in the storage and reads are slower. Sometimes we have to use copy collapse, for example, when we import or export between different storage domains. All the volumes of the disk are merged. In terms of the snapshots table, we can see at the bottom of the slide that all the entries are removed except the active VM entry, which remains. Note that the memory that was associated with the active VM entry is copied even when the copy collapse option is set. If you want to ensure that your memory is copied on export and import, you need to make sure that it is associated with the active VM entry, not just with a RAM snapshot.

Let's see some screenshots that demonstrate how it looks. Those screenshots were taken from the WebAdmin, but it looks the same in the extended view of the user portal, which means that only users with the right permissions can see those tabs. In the first screenshot, we can see the snapshots tab, and we can see that it now has a memory column.
The memory column contains a read-only checkbox for each entry that indicates whether or not the entry is associated with memory. Initially, for a VM that was just created with no snapshots, the active VM entry is not associated with memory and the checkbox is not set.

When you want to create a snapshot, you get the following dialog. It contains a checkbox that lets you choose whether or not to save the memory as part of the snapshot, which is, by the way, set by default. This option is shown only when creating live snapshots. When it's not a live snapshot, when the VM is not running, it isn't shown. I created two snapshots for that VM, snapshot 1 and snapshot 2. Snapshot 1 contains memory, so its memory checkbox is set, and snapshot 2 does not. You can see the tasks tab at the bottom of the slide, and we can see that when I created snapshot 1 there were two additional steps for creating volumes. Those are the VM properties volume and the memory state volume we saw.

When we choose to preview a RAM snapshot, we get the following dialog, which has a checkbox that lets you choose whether or not the memory should be restored. It's set by default. If you unset it, only the state of the disk will be restored and you will need to boot from the disk. At the bottom of the slide, you can see how the snapshots tab looks after previewing a RAM snapshot. By the way, you can see here the undo operation we talked about, and the commit. Actually, all the operations. Here we can see how it looks after committing to the RAM snapshot: the memory checkbox of the active VM entry is set. When I run the VM that was committed to the RAM snapshot, the memory that is associated with the active VM entry is cleared, as we already discussed.

In terms of the REST API, the XML that represents a snapshot contains an additional attribute called persist_memorystate, which indicates whether or not the snapshot contains memory.
When we create a snapshot, we use the persist_memorystate attribute to indicate whether or not the created snapshot should contain memory. Of course, for an offline snapshot this attribute is ignored. When we invoke preview or restore (restore is an operation that exists only in the REST API; it is a combination of preview and an immediate commit), we use the restore_memory attribute to indicate whether the memory should be restored as well, or only the state of the disks.

That's it. Questions?

About the architecture? Yes, yes. The engine is a Java application, and it stores its data in a PostgreSQL database. Other questions?

Okay, just a second. So here is the wiki page of the feature, where you can see more details about it. The mailing list, where you can reach us if you have any questions. My nick on IRC, and my email. Thank you.