Welcome everybody, sit down. We're a few seconds late for our talk. It's our first non-Gluster and non-Ceph-related talk, so that's nice. So yeah, welcome, Harshita.

Hi. I have to speak a bit louder. Can I have a seat, please? Hello. Cool. So, hi everyone. My name is Harshita, and today I'm going to give a talk on running ZFS in user space. I'm a developer advocate, kind of a newbie in that area, and a software engineer. I work at a company called MayaData. We basically work on containerized storage solutions for Kubernetes stateful applications. In simple terms, we provide storage in containerized form.

So, today we are going to talk about these four bullets: first, what is uZFS? Why uZFS? How we use it in our storage engine, which is called cStor, and the key components included in the storage engine.

So, what is uZFS? uZFS is basically a modified version of ZFS: it comprises the changes we made to run ZFS in user space instead of kernel space, plus the changes required to make it work as a cStor replica. cStor, as I mentioned, is a storage engine, and it basically comprises three things. The first is the zrepl binary; I'll come back to what zrepl is exactly. The second is the implementation of the replica API, which we use for the IO operations plus the rebuilding, replication, and cloning parts. The third is the management binaries, that is, the changes we made in the zfs and zpool binaries.

So, why uZFS? As we know, ZFS provides excellent, advanced file system capabilities, and we tried to use those capabilities, which normally run on an individual server, to make a storage engine which is containerized and which can run on-premise and on multi-cloud platforms. The benefits, or you could say perks, of uZFS are: there's no kernel dependency, as it runs completely in user space; rebuilding, the extra feature we added on top of the ZFS features which are already there, such as snapshots and clones; and it is containerized, meaning it is loosely coupled and API driven.

Now, before going into the details of how we used it and what changes we made, I'll give you the basic architecture of the storage engine. At the bottom there are disks, block devices, and on top of them we create a pool, which is called a cStor pool; in this case it is a mirrored one. On top of the cStor pool we create our volumes, and for volume replication we create a volume replica on each node where a cStor pool is running. Those replicas are managed by a target pod, which can run anywhere. On top of it, we run a stateful application which uses a PVC to create a PV and basically writes to an iSCSI device on the node where the application is running. So the application can run on any node, but on that node an iSCSI client (initiator) should be present, so that we can create an iSCSI device and communicate with the iSCSI target, which is where the iSCSI server side is running.
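As a rough mental model of the topology just described — purely a sketch; the struct and field names here are mine, not the actual OpenEBS/cStor types or CRDs — the target fans each block IO out to one volume replica per cStor pool:

```c
/* Toy model of the cStor topology: block devices -> a mirrored pool per
 * node -> one volume replica per pool -> a target that fans IO out.
 * All type and field names are made up for illustration. */
#include <stdio.h>

#define MAX_REPLICAS 3

typedef struct {
    const char *node;          /* node the pool (and its replica) lives on */
    const char *block_devices; /* devices backing the mirrored pool */
} cstor_pool_t;

typedef struct {
    const char  *volume;              /* PV the application writes to */
    cstor_pool_t pools[MAX_REPLICAS]; /* one volume replica per pool */
    int          replication_factor;
} cstor_target_t;

int main(void)
{
    /* Target pod: receives block IO from the iSCSI initiator on the
     * application's node and replicates it to every volume replica. */
    cstor_target_t target = {
        .volume = "pv-demo",
        .pools = {
            { "node-1", "sdb+sdc mirror" },
            { "node-2", "sdb+sdc mirror" },
            { "node-3", "sdb+sdc mirror" },
        },
        .replication_factor = 3,
    };

    for (int i = 0; i < target.replication_factor; i++)
        printf("replicate IO for %s -> replica on %s [%s]\n",
               target.volume, target.pools[i].node,
               target.pools[i].block_devices);
    return 0;
}
```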
So, this target receives block IO from the iSCSI client whenever the application writes anything to the device, and the target's job is to replicate that block IO to the volume replicas on all three nodes — or any number of nodes, depending on what replication factor you need. The second piece is the cStor replica, which receives the IO and basically writes to the local storage, which is our block devices.

So, how did we leverage uZFS to build our storage engine? The first thing is the zfs and zpool CLIs. As we know, the zpool and zfs CLIs work in user space and communicate with ZFS running in kernel space via an ioctl call, which is a system call. But our goal was to run the storage engine in user space, so the approach we took is ioctl redirection: we make ZFS not do the ioctl call, but instead create a Unix domain socket server. The zfs CLI now runs as a Unix domain socket client, and on the uZFS side — the zrepl binary, where ZFS runs in user space — the Unix domain socket server is running. So, with ioctl redirection, as I said, the zfs CLI's Unix socket client talks to the server running inside zrepl, and instead of ioctl calls we make Unix socket calls — not system calls, socket calls — between these two components.

Now what happens is that the zfs CLI will send a command — for example, to create a pool it will send a zpool command — and this command will be sent along with the ioctl information. The CLI runs in a container, as the whole storage engine is containerized: the CLI binaries run in a cStor pool management container, which is a sidecar of the cStor pool pod, and it sends the command information plus the IP address of the target. I will come back to what the target is and how we use it to do the IOs. Next, when zrepl receives the information, it runs the same functions that ZFS runs in the kernel space case.

So, what is the iSCSI target? The iSCSI target is used to do the IO operations and the syncing of the data across the replicas, and it communicates with zrepl. Two kinds of connections are made: first, a data connection to send the IOs; second, a management connection, which is basically used for the registration of the replicas. So, for example, when a zvol is created, it will make the management connection to the target and register itself with the target, so the target is aware of which replicas are there.

The next thing is the implementation of the replica API. The main change we made for zpool to be able to run commands against zrepl is the ioctl redirection. The second thing: to be able to use the zvol, we skip the device creation part, because to create a device node you have to go into the kernel space side. So instead we use the zvols directly to do the IOs, the replication, the cloning, and everything. The IO operations are received at the libcstor side, which is basically a layer that runs on the zrepl side to do the required API-specific calls. We also created a wrapper over the DMU layer — a ZFS layer, the Data Management Unit — so that we can send the IO operations along with metadata.
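Before moving on, here's a minimal, self-contained sketch of the ioctl redirection described a moment ago — not the actual uZFS code; the socket path, the `uzfs_cmd_t` struct, and the function names are all made up for illustration. The client plays the role of the zpool/zfs CLI in the management container, and the server loop plays the role of zrepl:

```c
/* Sketch of ioctl redirection: the CLI serializes the command and writes
 * it to a Unix domain socket instead of calling ioctl() on /dev/zfs.
 * All names here are illustrative, not the real uZFS/zrepl symbols. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#define UZFS_SOCK_PATH "/tmp/uzfs.sock"   /* hypothetical socket path */

typedef struct {
    unsigned long cmd;        /* would-be ioctl number, e.g. "create pool" */
    char          args[256];  /* stands in for the serialized arguments */
} uzfs_cmd_t;

/* Client side: what the zpool/zfs CLI does instead of ioctl(). */
static int uzfs_send_cmd(unsigned long cmd, const char *args)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    uzfs_cmd_t req = { .cmd = cmd };
    int fd, rc = -1;

    strncpy(addr.sun_path, UZFS_SOCK_PATH, sizeof(addr.sun_path) - 1);
    strncpy(req.args, args, sizeof(req.args) - 1);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;
    write(fd, &req, sizeof(req));   /* replaces ioctl(zfs_fd, cmd, &zc) */
    read(fd, &rc, sizeof(rc));      /* server's return code */
    close(fd);
    return rc;
}

/* Server side: what zrepl does — receive the command and dispatch it to
 * the same handler kernel ZFS would have run for that ioctl. */
static void uzfs_serve(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    int srv = socket(AF_UNIX, SOCK_STREAM, 0);

    strncpy(addr.sun_path, UZFS_SOCK_PATH, sizeof(addr.sun_path) - 1);
    unlink(UZFS_SOCK_PATH);
    bind(srv, (struct sockaddr *)&addr, sizeof(addr));
    listen(srv, 8);

    for (;;) {
        int cli = accept(srv, NULL, NULL);
        uzfs_cmd_t req;
        if (read(cli, &req, sizeof(req)) == (ssize_t)sizeof(req)) {
            int rc = 0;  /* here the real code calls the ZFS handler */
            printf("server got cmd %lu, args \"%s\"\n", req.cmd, req.args);
            write(cli, &rc, sizeof(rc));
        }
        close(cli);
    }
}

int main(void)
{
    if (fork() == 0) {        /* toy setup: run the server in a child */
        uzfs_serve();
        _exit(0);
    }
    sleep(1);                 /* crude wait for the server to bind */
    return uzfs_send_cmd(1 /* e.g. ZFS_IOC_POOL_CREATE */, "cstor-pool mirror");
}
```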
And what is that metadata? Along with the IO operations, the target sends a piece of metadata, which is kind of an incremental IO number. It is stored in our volumes in a ZAP attribute, and it is used for rebuilding in case of a failure.

So, for example, let's talk about the whole flow. The user creates a cStor pool pod. How will they create it? There's a YAML in which they provide the block devices and say, "I want a mirrored kind of pool." Then cStor pool pods will come up, one on each node. In the cStor pool management container, the zpool and zfs binaries are running, and the command for creating a zpool will be sent to zrepl, which is running in another container in the same pod, asking it, "create a zpool for me." The pool will be created then. The next thing, after the pool is created, is that we want to create the volumes. So the same binaries will send the zvol command along with the target address — the target being the same thing I told you about, which is used to send the IO operations. The zvol will be created, and zrepl will make the management connection to the target to register the volume. And when the target sees the volume is healthy — and who says the volume is healthy? the volume itself sends the signal, "I am healthy" — then the IO operations will start coming from the application to the iSCSI device, which is on the same node, then to the target, then to the replicas.

So, suppose a volume replica comes up; every replica's state is degraded at first. It communicates with the target, and the target will tell it, "start rebuilding yourself." But how will it rebuild? As I said, the ZAP attribute contains the — IOCTL, sorry, not IOCTL — IO number, the incremental number. The IO number is basically a number which we send incrementally along with the data, the block data, which is approximately 512 bytes. The replica whose IO number is the largest, or the highest, received the latest IO sent to the volume. So, suppose volume three has the highest IO number: it means the latest IO was sent to that replica, and the target will say, "okay, this is the volume from which you need to rebuild." So the volume will start rebuilding, and once every volume is healthy — basically we use an n/2 + 1 quorum factor, which means, suppose there are three volumes, at least two of them should be healthy for the IOs to come in.

And then, libcstor: it's basically a library we use to do the API-specific operations on the volumes; these are the API operations performed through the libcstor API. And the components we talked about: the cStor target — I already told you what it is — exposes the iSCSI device to the application. As I said, the iSCSI initiator should be running on the node where the stateful application is running, and it's the job of the target to expose that disk to the application so the application can write to the device. It also runs as a container, and it's also used to send the data and to sync — what I mean by syncing is that every replica should have the same data. The cStor pool is a logical storage pool comprised of the physical devices, the block devices you mention while creating the pool; a pool is created on a node from a set of devices. And the management container is the container where the binaries are running.
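Going back to the rebuild flow for a moment: to make the IO-number and quorum logic concrete, here's a small sketch under the same assumptions as above — the struct and function names are mine, not the libcstor API. The target admits IOs only while at least n/2 + 1 replicas are healthy, and a degraded replica rebuilds from the healthy replica holding the highest IO number:

```c
/* Sketch of the rebuild-source selection and the n/2 + 1 quorum rule.
 * Names are illustrative, not the actual libcstor API. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint64_t    io_num;  /* incremental IO number kept in the ZAP attribute */
    bool        healthy;
} replica_t;

/* Quorum rule from the talk: at least n/2 + 1 replicas must be healthy
 * for IOs to be admitted. */
static bool quorum_met(const replica_t *r, int n)
{
    int healthy = 0;
    for (int i = 0; i < n; i++)
        healthy += r[i].healthy;
    return healthy >= n / 2 + 1;
}

/* The healthy replica with the highest IO number saw the latest write,
 * so a degraded replica rebuilds from it. */
static const replica_t *rebuild_source(const replica_t *r, int n)
{
    const replica_t *best = NULL;
    for (int i = 0; i < n; i++) {
        if (!r[i].healthy)
            continue;
        if (best == NULL || r[i].io_num > best->io_num)
            best = &r[i];
    }
    return best;
}

int main(void)
{
    replica_t replicas[] = {
        { "replica-1", 1041, true  },
        { "replica-2",  998, false },  /* degraded: needs to rebuild */
        { "replica-3", 1057, true  },  /* highest IO number: latest data */
    };
    int n = 3;

    printf("IOs allowed: %s\n", quorum_met(replicas, n) ? "yes" : "no");
    printf("rebuild from: %s\n", rebuild_source(replicas, n)->name);
    return 0;
}
```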
And in the same pod there is another container where zrepl is running, which is basically uZFS, that is, ZFS running in user space. The Unix domain socket server is created on the zrepl side, and the client is created on the management binary side. To directly contact the developers who have done this work, you can join the Slack channel, and there are blogs written by the developers. And there's a GitHub repo for our storage engine, which is cStor. Thank you. Any questions?

Thank you indeed. Sorry, we don't have time for questions anymore. If you have questions for Harshita, please find her somewhere else and ask her your questions. And if you join the Slack channel, the developers will be directly in touch with you; they'll be better placed to answer the questions, as they wrote the code. Thanks.