So welcome, and thank you for coming. My name is Eyal Shenitzky, and I'm a software engineer at Red Hat on the oVirt storage team. And I'm Fred Rolland, a senior software engineer; I used to work with Eyal on the same team, and I'm currently working on multi-cluster management.

Great, so let's start. Today we're going to talk about the new managed block storage in oVirt, and how you can offload all your storage operations to the storage backend itself. How many of you have used, or are now using, oVirt? Can you please raise your hands? Nice, thank you, I see quite a lot of hands.

So oVirt is an open source virtualization platform that allows you to manage and orchestrate virtual machines. As you can see here in the diagram, we have the oVirt engine, which is the management application, written in Java. In oVirt you can have multiple data centers, each data center can contain multiple clusters, in each cluster you can have several hypervisors, and of course on each hypervisor you can run multiple VMs. Also, each data center must have shared storage and virtual networks.

Since we already mentioned shared storage: in oVirt we support the traditional storage domain types. For block storage domains we support iSCSI and Fibre Channel, and for file-based storage domains we support NFS, POSIX, GlusterFS and local storage that resides on the host itself.

In order to create disks for VMs we need to perform some operations, for example creating snapshots, cloning disks, migrating disks from one storage domain to another, creating templates, and so on. Those operations are quite complicated and require a lot of synchronization, for example communicating with LVM and maintaining the QCOW chain. Storage migrations take a lot of compute from the hypervisor, which is costly when it is also running VMs. In order to perform those operations we must lock the disks to prevent simultaneous operations on the same disk, and because we maintain the operations ourselves, they take quite a lot of time to complete. Also, when we use traditional storage domains we lose the option to use more advanced storage features like deduplication and compression.

So how can we solve all the problems we just mentioned with the traditional storage domains? Maybe we can imagine someone that can do most of the storage operations for us: we just call it to perform the operation, we don't care how it is done, we just get the result. And if it lives in the storage backend itself, it could even be implemented better than we are doing it: we reduce complexity, and we reduce time, because the operations are done on the storage side itself, so they are much faster. We still need to lock the disk, but for a shorter amount of time, because all the operations are faster.

Actually, in oVirt we had such a solution before: the old generation of the Cinder integration. Cinder is the block storage service of OpenStack; some of you may have used it before, can you please raise your hands? Okay, not a lot, which is good, because it is currently deprecated. We are not supporting it in the new version due to some authentication changes in OpenStack, and it had a few limits. For example, in the old solution
we supported only Ceph as a backend, and in order to use it you needed to deploy your own OpenStack environment.

Okay, so how can we have someone else do all the hard work for us? That is always the best solution, I guess. What if we could get all the goodies from Cinder without needing a fully deployed OpenStack? So let me introduce the Cinder library, aka cinderlib. Cinderlib is a Python library that gives us an object-oriented abstraction of the Cinder drivers, and it allows us to use those drivers without any OpenStack deployment. You won't need Keystone for authentication or RabbitMQ for messaging, none of the services; just a simple Python library that gives you access to all the storage drivers.

This library was developed by Gorka, a Red Hat engineer working on the Cinder team, and he actually started working on this solution in order to implement a CSI driver. CSI is the Container Storage Interface, which is an API in Kubernetes to provide persistent storage to your container workloads. So it's kind of cool to see different projects using the same code base; that is the spirit of open source, right?

A few words about Cinder drivers. Does anybody here know about Cinder drivers? Okay, so what are Cinder drivers? They are implementations of an API by the storage vendors themselves. For example, they need to implement all the provisioning for volumes and for snapshots. The good thing is that the vendors know their own APIs, so if they have smart APIs inside their hardware, they know which ones to use; they just need to implement the right API towards Cinder. There are about 80 supported Cinder drivers in Cinder, and when I say supported I mean that there are active developers, but also a CI subsystem that validates every patch in the Cinder driver, and in Cinder itself, against real hardware. So we won't have patches that break a specific storage backend.

This is an example of a storage configuration that you would put inside the Cinder configuration. It's a kind of key-value setup, and only one of the parameters is common between the different vendors, which is the volume driver: the name of the Python class that implements the driver. All the other parameters are really specific to the vendor, so you won't see any RBD pools inside the NetApp driver or the Lenovo driver or whatever. All of the parameters are documented in the OpenStack documentation, including all the defaults and all the parameters that are available.

Okay, cool. So now let's see how we can use cinderlib. It is actually really, really simple. Just import it as a Python library, and then initialize a backend. The backend is the storage driver abstraction. It's quite simple: you give the name of the driver, in this example the LVM volume driver, and then all the parameters that it needs, for example the volume group that the volumes will be created on. By the way, you can have multiple backends in the same code; it works fine.

Next, we want to create a volume. Very simple: just call an API on the backend, giving the size and the name of the volume, and you can give an ID or whatever you want. The next step, you have a volume and you want to use it, so we need to attach it. Very simple, just call a method on the volume, volume attach. In this example it is a local attach, which means the volume is attached on the operating system that the code is running on. Once we have the volume attachment, you actually get the path, attach.path, which gives you the full /dev path on the operating system, and you can start to write to and read from the volume.
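To make that sequence concrete, here is a minimal sketch of it. It assumes a host with an LVM volume group named cinder-volumes; the driver parameters shown are illustrative, since each driver takes its own set of options, documented in the OpenStack driver configuration reference.

```python
import cinderlib as cl

# Initialize a backend: the storage driver abstraction. Only volume_driver is
# common to all vendors; the remaining parameters (volume group name, etc.)
# are driver-specific and illustrative here.
lvm = cl.Backend(
    volume_backend_name='lvm',
    volume_driver='cinder.volume.drivers.lvm.LVMVolumeDriver',
    volume_group='cinder-volumes',   # assumed pre-existing volume group
)

# Create a 1 GB volume on that backend.
vol = lvm.create_volume(size=1, name='demo-volume')

# Local attach: the volume is attached on the machine this code runs on,
# and the attachment exposes the device path we can read from and write to.
attachment = vol.attach()
print('Volume is available at', attachment.path)

# Detach and clean up when done.
vol.detach()
vol.delete()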
Snapshots are also really simple. Just keep in mind that it differs between the different drivers: in Ceph it would be quite fast, I guess, but maybe with the LVM volume driver it takes some more time. What else do we have? We can extend the volume, delete the volume, of course create and delete snapshots, but also clone: cloning from an existing volume and cloning from a snapshot. These are all the building blocks that we need to improve our oVirt disk operations.

Okay, great. So now we have the magic, this external worker that we dreamed of; we can offload all the storage operations to the storage backend itself. All we have to do is teach oVirt how to use it. So let's see. In order to use cinderlib, we have several constraints. The first one is that we need access to the storage management API: cinderlib needs to communicate with the storage itself using the storage management API, which can reside on a different network. We also need a method of metadata persistency: we are creating volumes and snapshots, extending volumes, cloning and deleting them, and we need to know which operations have been done and persist that. Also, in the previous example that Freddy showed you we performed a local attach, but in oVirt we need to perform a remote attach or detach of the volume for a virtual machine: the virtual machine runs on the hypervisor, while cinderlib runs on a different machine, so we need to allow that to happen.

In light of those constraints, we decided on the following architecture to integrate cinderlib into oVirt. Basically, we added a new cinderlib executor, a set of Java classes whose sole purpose is to run Python code using a cinderlib client. The cinderlib client wraps cinderlib as a backend, and cinderlib communicates directly with the storage management API. This solves the first constraint, communicating directly with the storage management API using cinderlib. The second constraint is the metadata persistency. The engine already has a database, a PostgreSQL database; we just added a new schema, the cinder database, and we provide cinderlib with all the connection info so it keeps all its metadata persisted in the PostgreSQL database that the engine uses. So we solved the metadata persistency, and we solved the external management communication to the storage API.
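As an aside on that persistency point: cinderlib itself ships metadata persistence plugins, so when used standalone it can be pointed at a database instead of the default in-memory store. A rough sketch, assuming the documented persistence_config option and an SQLite URL purely for illustration (oVirt instead keeps this metadata in its own schema inside the engine's PostgreSQL database):

```python
import cinderlib as cl

# Ask cinderlib to persist volume/snapshot metadata in a database instead of
# the default in-memory store. The connection string is illustrative; any
# SQLAlchemy-style URL (for example a PostgreSQL one) works the same way.
cl.setup(persistence_config={
    'storage': 'db',
    'connection': 'sqlite:///cinderlib-metadata.sqlite',
})

# From here on, backends and the volumes created on them are recorded in that
# database, so a later run of the program can find and reuse them.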
As you can see here, all the communication is done without the hypervisor being involved: all the storage operations happen and the hypervisor doesn't know about them. The only operation that needs the involvement of the hypervisor is when we run a VM; then we need to perform a remote attach and detach of the volume for the virtual machine, and for that we use os-brick, which is a Python package created by OpenStack. Freddy will elaborate more about this later.

So now that we know how we integrated cinderlib inside oVirt, let's see how we use it. In order to add a managed block storage domain, we just select a new storage domain, and as you can see we added a new domain function: the managed block storage. We must differentiate between the new storage domain, the managed block storage, and the traditional storage domains, because the implementations are quite different behind the scenes. Also, we must keep the UI as open as possible: as Freddy mentioned, we support over 80 different drivers, each driver can have its own parameters, and the only common parameter is the volume driver itself. So as you can see here, we have the volume driver and the set of parameters needed for adding Ceph as a backend. You also have the option to add parameters in the driver sensitive options section: if you are inserting your passwords or something like that, you can add them there and all those parameters will be kept encrypted in the engine's database.

Once you have inserted all your parameters and pressed OK, the engine will do a connectivity check to the managed block storage and validate that it can be reached. The new managed block storage domain is not monitored yet, so this small validation is required in order to verify that we have communication with the storage itself.

Great, so we added the new managed block storage domain, and now we can add a new managed block storage disk. Quite simple: just select a new disk, select the new managed block storage type, and select the size. Basically, that's it, you have a new managed block storage disk.

Okay, cool. So now all the operations for provisioning, snapshots, everything, are done on the engine side, so the hypervisor is not doing anything. But we are in the business of running VMs, right? So we need to see what is different, why the flow is different now, mainly because the storage is not attached to the host all the time. With NFS and iSCSI the storage domain is always attached to the host as part of oVirt's monitoring; in our case the storage is not attached to the host, so we need to add a few steps before running the VM to be able to actually connect to the disk and write data.

Okay, so let's take a look at what is new in this flow. First, we need some additional parameters from the host itself; we call this the connector information. As part of an existing API we added a call to the hypervisor to gather this information, and we store it inside the engine DB, so the data is there, ready for when we want to run the VM. So what is this connector information? We are using os-brick to get it in the same format that Cinder uses. It is a set of parameters that we get from the host, for example the initiator IQN that will be used for the iSCSI connection, an IP address, and of course whether we are using multipath. All of this is stored in the DB, and we will use it in the next step.
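To give an idea of what that connector information looks like, here is a rough sketch of gathering it with os-brick on the host. The privilege helper and the IP address are illustrative, and the exact call oVirt's host agent makes may differ.

```python
from os_brick.initiator import connector

# Gather the host-side parameters that Cinder-style tooling expects: the
# iSCSI initiator IQN, the host's IP, whether multipath is in use, and so on.
connector_info = connector.get_connector_properties(
    root_helper='sudo',      # helper used for privileged commands (assumed)
    my_ip='192.0.2.10',      # illustrative host address
    multipath=True,
    enforce_multipath=False,
)

# Typical keys include 'initiator', 'ip', 'host', 'multipath', 'os_type' and
# 'platform'. oVirt stores this dictionary in the engine database so it is
# ready when a VM needs to run on that host.
print(connector_info)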
Okay, so now the user has configured a disk of the new type and wants to run the VM. First, the engine decides on which host the VM is going to run. It has different algorithms: sometimes the user selects the host they want to run on, sometimes it just checks which host has more CPU or memory available, anything like that. Once we have the host we want to run on, we want to expose the volume to that host. Not all the LUNs on your backend storage are available to everyone; there are access rules and things like that which the management API needs to open so that the host can actually see the LUN or the disk.

Connecting the volume is done by the engine, using the connector information mentioned before, taken from the DB. Again, we pass the volume ID and the Cinder driver parameters to cinderlib; cinderlib calls a connect method on the storage management API and gets the connection information back into the engine database. So what is the connection information? It is everything the host needs in order to connect to a specific LUN. For iSCSI it will be the LUN ID, of course the IQN and the portal, and also whether the target is already discovered or not. For RBD it will be a little different: keyrings, a password, and whatever else the host needs in order to connect.

Cool, so now we have the connection information and we actually want to have the volume available on the host, so we need to attach the volume. There is a new API on the hypervisor for attaching the volume: it gets the connection information from the engine and again uses the os-brick library to physically attach the volume to the host. Once everything is attached, we get the actual path on the operating system, something like /dev/dm-25, which is the path where the LUN is available. But in oVirt we actually work with multipath, mainly for migrating VMs between different hosts, where we want to keep the same path on all hosts, so we also provide the device-mapper path, which is the multipath path, back to the engine.
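On the host side, that attach step with os-brick might look roughly like the sketch below. The connection properties dictionary is illustrative for iSCSI (in oVirt the real one comes back from cinderlib's connect call via the engine), and the target details are made up.

```python
from os_brick.initiator import connector

# Connection info as a storage backend might return it for an iSCSI volume.
# All values here are placeholders for illustration.
connection_properties = {
    'target_portal': '192.0.2.20:3260',
    'target_iqn': 'iqn.2010-10.org.openstack:volume-demo',
    'target_lun': 1,
    'target_discovered': False,
}

# Build an iSCSI connector that uses multipath, then attach the volume so it
# shows up as a block device on this host.
conn = connector.InitiatorConnector.factory(
    'ISCSI', root_helper='sudo', use_multipath=True)
device_info = conn.connect_volume(connection_properties)
print('Attached at', device_info['path'])   # e.g. a /dev/mapper/... path

# ... the VM runs against that path; later, detach the volume again:
conn.disconnect_volume(connection_properties, device_info)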
From this point we get back to the main flow we had before: the engine builds the VM XML with all the parameters and the path of the disk and sends it to the hypervisor, which uses libvirt and QEMU to run the VM. And that's it, we have a running VM. Stopping a VM is actually the same thing, just the other way around: you stop the VM first, so writes to the volume stop, then we go back to the engine to disconnect the volume, and eventually we finish the whole flow.

So what's next? There are a lot of storage features in oVirt, and not everything is implemented yet for managed block storage. For example, we have a list here of things we want to do, like live storage migration, meaning migrating the storage of a VM that is running, without interfering with the user's workload. Disaster recovery: for example, with NFS and iSCSI all the data, the OVF, the XML of the VMs, is also stored on the storage, so you can disconnect the NFS mount, mount it on a different engine, and you will be able to recover everything. That is not currently implemented for managed block storage. Storage migration between different types of storage during cold migration, when the VM is down: actually, this one should be quite easy, just attach the destination and the source and move the bits between them. One of these, as we mentioned, is quite important for us, and there are some more points listed here.

Regarding packaging, we are working on having everything included as part of the default oVirt installation. There are currently some manual steps, but hopefully we will get it finished in the next release. Here are the links to the things we talked about: cinderlib, the OpenStack documentation with all the drivers, and the feature page if you want to deep dive into the oVirt feature.

Okay, any questions? Yes? I will repeat the question. So, about the example where I'm showing the LVM driver: maybe it was not a good example, but it is exactly the same for Ceph or for a NetApp driver or Kaminario or whatever other driver. It is an abstraction, so you won't see what happens behind the scenes. The snapshot really depends on the implementation: I guess that with Ceph we know how it works, but for the LVM driver we would need to check what the code is doing. It could be, like you said, the same way we currently do it in oVirt. Yeah, I agree, so maybe we could make a better example next time. But the drivers are open source: you can go and check the implementation of each driver, see what it is using, and decide whether it makes sense for you; maybe you don't want to use it.

Yes, any more questions? Yes? This feature is currently in tech preview. It was released in 4.3, but it's still in tech preview with a lot of missing parts, so we are currently trying to provide a way to add it to the default installation, and then hopefully we will add more features to support it. We have already seen some oVirt users on the mailing list using this feature, and we got some good feedback. For now, it is actually the fastest way to use Ceph with oVirt.

Yes, anyone else? Yes, please. What do you mean by overlap? Well, the question is what the difference is between OpenStack and the oVirt direction, and the overlap of a few features. So here we actually want to reuse some code; we like to do that in open source, right, we don't want to rewrite everything again. The idea is to use something that is widely used by a lot of users and has proven to be a good solution for a lot of customers who are happy with it, and we want to be able to use all the good stuff that is in there. Currently OpenStack and oVirt are two different virtualization solutions, but eventually we are using the same ideas regarding working with disks, so most of the time we will be on the same page on this kind of stuff. Okay, anybody else? Thank you.