Hi, welcome to this session, where we'll talk about Longhorn. My name is Sheng Yang. I'm an engineering director at SUSE, and I'm the original author of Longhorn. As you may know, Rancher, the original company behind Longhorn, was acquired by SUSE three years ago. I hope at least some of you here know what Longhorn is, but just to recap: Longhorn gives you persistent storage support for any Kubernetes cluster. It's easy to install, easy to use, easy to maintain, and it's open source with no strings attached. You can find out more details at longhorn.io.

There are a few things we took to heart as design principles when we built Longhorn. Obviously, when you design a new storage product from the ground up, there has to be some focus, and some compromises to be made. Longhorn focuses on three things: reliability, usability, and maintainability.

On the reliability side, Longhorn volumes are backed by block devices created by the Longhorn storage layer and are crash consistent. We have also built in multiple layers of protection against data loss. Within the cluster, you have the in-cluster snapshot mechanism, which can periodically snapshot your volumes whenever you need it. And there is an out-of-cluster backup mechanism built in, with no third-party software required, that lets you back up your data outside the cluster to an S3 or NFS endpoint. So if the whole cluster goes down and you cannot find any copy of your data, say your data center in the worst case went up in flames, you still have a copy of your data somewhere else to keep you safe. More than that, Longhorn itself has been built to be very resilient. If you have a scenario where the whole cluster goes down, say your control plane, Kubernetes, everything is down, then as long as you still have the hard drives, Longhorn can recover your full data from them, without going through a very complicated recovery procedure; it's very simple. Those are the kinds of protection we added based on our experience with previous products, and that is the biggest focus for Longhorn.

The second focus for Longhorn, as for all products and projects from Rancher Labs, is usability. From day one, Longhorn was designed to be simple to use and simple to deploy. You can deploy Longhorn with a one-click installation: you can do a Helm chart install, you can do a YAML apply, and if you are using Rancher, you can also install it from the Rancher catalog. Any of these ways works, and by default Longhorn detects whatever space is available on your disks and can utilize it immediately after deployment. You definitely have a lot of options to customize later, but we're really proud of this; a lot of effort went into automatically detecting the environment and deploying with the best options for you. We also provide a UI out of the box, and the UI is in fact the way to operate a lot of Longhorn functionality, like snapshots, backups, and node scheduling; many advanced functions are exposed through both the UI and the API. Those things make Longhorn stand out as easy to adopt in a complicated Kubernetes world.

The last thing Longhorn really wants to achieve is maintainability. Longhorn itself, as you will see later, is in fact very easy to understand.
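Before I go on, to make that one-click install concrete, here is roughly what the Helm path and the plain YAML apply look like (a minimal sketch following the longhorn.io docs; the pinned version is just an example, pick the release you want):

```
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system --create-namespace

# or the plain YAML apply:
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.5.1/deploy/longhorn.yaml
```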
Coming back to maintainability: because Longhorn is easy to understand, when something goes wrong it's also easy for DevOps to understand why it's happening and what is happening, and then easy to recover from, even in the worst-case scenario. And from very early days, maybe five years ago, Longhorn already started to support upgrades without interrupting the workload. So you have peace of mind to do your upgrade whenever you have a little bit of downtime, downtime in the sense that you don't have much other work to do; it doesn't require workload downtime. In fact, we can see from our public telemetry data that most Longhorn users are on the latest release, and that's all thanks to our non-interruptive upgrade process.

All right, so what's the latest? The latest Longhorn feature release, 1.5, came out a few months back. It's enterprise-grade distributed storage software for Kubernetes. We went from "distributed block storage software" to "distributed storage software". It's a subtle differentiator, because Longhorn now supports ReadWriteMany storage as well, and in the future, as I'll show in the demo very soon, we're going to support object storage through the S3 API too. You can install it with one click on any Kubernetes cluster, whether it's running on bare metal, on VMs, or in the cloud; it all works as long as it's Kubernetes. It supports x86_64 and arm64, and, contributed largely by community users, we have also been adding support for the IBM mainframe (s390x).

On the key functionality side, we have feature parity with most enterprise storage offerings in the market: thin provisioning, snapshots with schedules, expansion, clone, end-to-end encryption, which is critical for lots of customers, and cross-AZ (cross-availability-zone) scheduling. That last one is mostly for cloud providers like AWS: the EKS control plane is highly available across AZs, but an EBS volume is always attached to a single availability zone, which can take your workload down when that AZ goes down; Longhorn helps you solve that problem as well. In recent releases we have added bit rot detection and trim, which helps tremendously with your storage space management. ReadWriteMany support is now also GA, and data locality ensures a replica can be colocated with your workload to give you the best performance and availability.

On the disaster recovery side, Longhorn was built with backup and restore from day one, with incremental backup and incremental restore when needed, to give you the best efficiency and safety for your data outside the cluster. We have also supported cross-cluster disaster recovery volumes with a defined RPO and RTO. This enables users to have a main cluster and a backup cluster, and to constantly ship replica data from the main cluster to the backup cluster in a synchronized way. So if the main cluster goes down, you can spin up the backup cluster within a couple of minutes, without waiting for all the data to be restored from your backup store.

On the operational side, we provide an intuitive UI out of the box and Longhorn upgrades without interrupting the workload.
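Many of these per-volume behaviors are driven by StorageClass parameters; a minimal sketch, with parameter names per the Longhorn docs (the diskSelector value assumes you have tagged some disks "nvme", which is hypothetical here):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-nvme
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  dataLocality: "best-effort"   # try to keep one replica on the workload's node
  diskSelector: "nvme"          # only schedule replicas on disks tagged "nvme"
  staleReplicaTimeout: "2880"   # minutes before a failed replica is cleaned up
```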
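On the backup side, the backup target is exposed as a Longhorn setting, and the schedules are RecurringJob custom resources. A minimal sketch, assuming a hypothetical S3 bucket (credentials would live in a separate secret referenced by the backup-target-credential-secret setting):

```yaml
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  name: backup-target
  namespace: longhorn-system
value: "s3://my-longhorn-backups@us-east-1/"   # hypothetical bucket
---
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: backup-daily
  namespace: longhorn-system
spec:
  task: backup            # use "snapshot" for in-cluster snapshots instead
  cron: "0 2 * * *"       # 02:00 every day
  retain: 7               # keep the last 7 backups per volume
  concurrency: 2          # how many volumes to process in parallel
  groups:
  - default               # applies to volumes in the default group
```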
We have also recently added Prometheus metrics support, and storage tags let you customize how you want your storage to be scheduled: volume one can be created on the NVMe disks, while another volume that you want just for cold storage can be pinned to the spinning disks instead; that's what the diskSelector in the StorageClass sketch above drives.

On Kubernetes support: Longhorn is in fact built on top of Kubernetes, and Kubernetes is all we do, right? So obviously we support all the protocols there, the CSI (Container Storage Interface) and the extensions to CSI like the snapshotter; volume expansion is there as well. We have also added automatic recovery for Kubernetes-managed workloads. That is one place where Longhorn, built on top of Kubernetes, uses its features to complement Kubernetes. As you might know, when you have a stateful workload running on a node and that node goes down, you would mostly expect the pod running on it, and its volume, to get reallocated to another node, but unfortunately that is not the default Kubernetes behavior. There's a reason behind it, but many customers find it very annoying, and Longhorn in fact has the means to help users recover better; I'll show a sketch of that setting after this section.

For 1.5, we have a huge milestone feature, even though it's in preview: Longhorn engine V2, based on a new technology called SPDK. I'm going to talk more about this later.

All right, so as I mentioned, Longhorn is very simple, and if you haven't seen this before, I can quickly go through it and everybody will understand how it works. You have two nodes, each with a disk, a CPU, and RAM, forming a Kubernetes cluster, and you have one pod requesting a volume. In this case Longhorn creates two replicas on the two different nodes, using the two different disks, and creates one engine connecting to those two replicas, providing a block device to the pod to serve the volume. This is essentially just a simple RAID 1 architecture, and Longhorn can go from one replica up to ten replicas, which is an arbitrary limit we put there; we don't know of anyone actually doing that, but that's how flexible it is. And if you have a second pod with its own volume, Longhorn creates a separate set of replicas and an engine for that new volume as well.

There are two advantages to this design. First, as you can see from the data paths, the lines and arrows there, even though the volumes reside in the same system, their data paths are not intertwined. That means even if one replica goes down, at most one volume is affected, and if that volume has another replica still working, there is no interruption at all. Even if one engine goes down, in the worst case just one volume is affected; all the other volumes in the system are not involved in any way. The other thing you can see is that on the engine side, we always colocate the engine with the workload pod. So in the case where the node goes down, which is the most common case storage has to deal with, the engine will be down, but that's because the pod, the workload itself, is down as well. In that case Kubernetes relocates the pod, and the engine follows it and points to the remaining available replicas.
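To tie this to something concrete: the flow I just walked through starts from nothing more than an ordinary PVC against a Longhorn StorageClass. A minimal sketch, with a hypothetical claim name:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-volume             # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce               # a plain Longhorn block volume
  storageClassName: longhorn    # the default class Longhorn ships with
  resources:
    requests:
      storage: 2Gi
```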
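And here is the automatic-recovery knob I mentioned a moment ago, as I recall it from the Longhorn settings reference (the value shown is one of the allowed policies; the default is "do-nothing"):

```yaml
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  name: node-down-pod-deletion-policy
  namespace: longhorn-system
# force-delete StatefulSet/Deployment pods stuck on a down node so that
# Kubernetes can reschedule them, instead of leaving them Terminating
value: "delete-both-statefulset-and-deployment-pod"
```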
But traditionally, those engine and replica processes would need to be scheduled by the storage system itself, which gets complicated. Now, with Kubernetes' help, those components can be easily scheduled and orchestrated by Kubernetes, which gives you a way to really do this microservice-based storage engine.

Let me go into a little more detail on how the engine works. Now you can see three nodes with different kinds of disks; the black ones are the data disks and the gray ones, think of them as the root disks. You have pods with their volumes, replicas created by what we call the instance manager, which is an aggregation of the processes on a node, and an engine created for the first, second, and third volume; all good. But what if one node goes down? Pods B and C on node two are not impacted, because their engines automatically cut communication to the replicas on the failed node. Pod A gets relocated by Kubernetes along with its volume, which prompts Longhorn to create another engine on node three, even though node three has no local disk for this volume. The engine on node three figures out which replicas of this volume remain, connects to them, and everything is restored. That is how failover happens in Longhorn at the block level: it takes milliseconds for an engine to cut off a failed replica, and it takes a couple of minutes for Kubernetes to decide to relocate the pod, and that part is entirely up to how Kubernetes wants to do it.

All right, a little bit on the community side. We are in fact counting 9,900-plus worldwide live nodes right now, and I had hoped that by the time I gave this talk we would reach 10,000, but when I checked this morning we were about 100 nodes short. We still maintain a very high growth rate, 50 percent plus, and you can find those adoption metrics at metrics.longhorn.io; you can see from the picture there that we've had quite a run from, I think, 2019 to now. We have more than 3,000 users in the Slack channel and 5,000-plus stars on GitHub.

Really, the most interesting thing I want to talk about is the upcoming Longhorn 1.6 release. As I mentioned before, we are working on the SPDK-based engine, called Longhorn engine V2, built on SPDK and NVMe over Fabrics technology. SPDK is the Storage Performance Development Kit, developed by Intel originally for its Optane drives. It is very high performance, using polling instead of interrupts to get the maximum IO possible. By basically rewriting the Longhorn engine on top of SPDK, we have achieved near-native performance with the V2 engine. For the 1.6 release, which is due in January next year, the core feature functionality, like snapshots and backups, will be available. We are also introducing the object storage volume, which provides you an S3 endpoint to use within Longhorn directly. You can find more roadmap information on the Longhorn GitHub wiki roadmap.
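If you want to play with the V2 preview, my understanding from the current docs is that it sits behind a setting plus a StorageClass parameter. This is a sketch only; the exact parameter spelling has shifted between preview releases (the 1.5 preview used backendStoreDriver), so treat the names as assumptions and check the release notes for your version:

```yaml
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  name: v2-data-engine          # opt-in flag for the SPDK-based engine
  namespace: longhorn-system
value: "true"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-v2
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  dataEngine: "v2"              # "backendStoreDriver: v2" in the 1.5 preview
```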
Now let's talk about how engine V1 differs from V2. In engine V1, you have this stack at the workload level; I hope it's big enough for everybody to see. The application writes to the file system, which in turn talks to the iSCSI block device created by Longhorn. The iSCSI initiator talks to tgt, the user-space iSCSI framework we use to expose the block device, and tgtd in turn talks to the Longhorn engine using a customized API and protocol over a Unix domain socket; that means the engine is on the same node as the workload. The engine in turn uses a customized protocol over TCP to talk to the replicas on different nodes, and each replica accesses sparse files on a file system, and that file system of course accesses the block device in the end. If you have multiple replicas, you simply repeat that path for each replica. So that's what engine V1 looks like.

In engine V2, which is based on SPDK, we have dramatically shortened the data path. Obviously the application still writes to the file system; nothing you can change there. But instead of iSCSI, we use the NVMe over Fabrics protocol to expose the block device. It's still on the same node, but we use SPDK, which is also a user-space target, to provide the block device, and we have rewritten the Longhorn engine, in C, to be embedded inside SPDK. That has greatly reduced, in fact eliminated, the communication that previously happened between tgtd and the Longhorn engine; now it's just in-process communication, and with that we have achieved zero copy from the moment SPDK receives a packet to the Longhorn engine fanning out to the replicas. The Longhorn engine talks to the Longhorn SPDK target on the other nodes using NVMe over TCP as well, with the Longhorn replica logic embedded there too, also rewritten, so the Longhorn replica now accesses the raw block device instead of going through a file system: one layer less, to reduce the overhead. If you have multiple replicas, that side of the architecture remains the same.

But there's an even more interesting question: what about those distributed databases that want only one replica, but still want the best performance plus the ability to keep your snapshot and backup mechanisms? That's where Longhorn engine V2 also brings a dramatic improvement, with the local replica mode, where you have one replica colocated with your workload. In this case the application writes to the file system, which goes to the NVMe over Fabrics-backed block device; that's all the same, but the SPDK target shortcuts everything inside it, so the embedded Longhorn engine can talk to the Longhorn replica directly in-process, and the replica talks to the local raw block device directly. That dramatically shortens the data path, and we have achieved zero copy from the point a packet enters the SPDK target to reading or writing the data on disk. This way you get near-native performance, and we still maintain the same features for the volume: snapshots, backups, expansion, everything available to a normal volume is available in this local replica mode. This is best suited for distributed stateful workloads that don't require high availability from Longhorn.
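As a sketch of how you would express that single-replica, colocated mode through StorageClass parameters today (names per the Longhorn docs):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-strict-local
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"         # the workload handles its own replication
  dataLocality: "strict-local"  # keep the only replica on the workload's node
```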
Such workloads maintain themselves; they do the sharding themselves, but you still want a consistent way to manage all your workloads, distributed or non-distributed, so a distributed database or data store is very well suited for this use case.

So what performance do we see? On IOPS, the blue bar is native performance, the red bar is the V2 engine, and the yellow bar is the V1 engine. You can see that for both read and write, in both the local replica mode and the three-replica mode, the V2 engine has improved dramatically over the V1 engine, and we are in fact continually improving it; there is definitely more to be gained there. And you can see near-native performance not only for the single replica; if you look at the read data, it's there too. On the throughput side, because Longhorn reads from multiple replicas, we can aggregate throughput from multiple nodes at the same time. That means, as you can see from the sequential-read and random-read numbers, engine V2 throughput is as much as triple what is available from the local disk. Now, really the most important thing: latency is king. Latency is king for storage. With Longhorn engine V2 we have achieved a single-node latency overhead of less than 30 microseconds for local volumes. Obviously if you go over the network it will be slightly higher, but it's still a vast improvement over the V1 engine, and in fact maybe the fastest out there in the market as well.

All right, that's all for the V2 engine. The other thing I want to talk about for the 1.6 release is the object storage volume. We are providing an S3 endpoint out of the box, because our customers really want a unified storage solution rather than piecing different things together to provide an S3 object storage endpoint. Starting with 1.6, we are utilizing a project called s3gw, started within SUSE and based on the RADOS Gateway from Ceph, to implement this object storage volume. In 1.6 we are declaring this feature experimental; we have already implemented GET/PUT/DELETE, multipart upload, object versioning, and object locking, and more features will be coming.

The object storage volume looks like this. We are working with the Ceph upstream to have the RADOS Gateway adapt to this approach. The RADOS Gateway is in fact very mature; it has long been the provider of the object store and S3 API layer for Ceph, and we already knew that the S3 API layer would be hard to maintain ourselves, so we are taking it from the upstream RADOS Gateway to ensure the best quality there. The RADOS Gateway has a driver abstraction layer, called Zipper; below that it is normally Ceph-specific, and we have replaced that part with the s3gw file-based driver backend, which sits on top of a Longhorn block volume. The Longhorn block volume, in turn, lets us provide the whole package to the end user as the Longhorn object storage volume. You then have S3 clients connecting to it, and you can create as many object storage volumes as you want.
In the end, each of these is backed by a Longhorn block volume.

Okay, so due to the time limit I'm going to do a very short demo; give me a second to switch, and I really hope this works. What I'm going to demo here is the new object storage volume feature. This is Rancher, which you might be familiar with, and this is a simple demo cluster we created. If you go to the Longhorn tab here, you see the general overview for the Longhorn volumes; note there are three nodes here and a couple of volumes, nothing new, but there is a new tab called Object Storage. This is how you create an object storage volume: click Create Object Store and put in the name, size, access key, and secret key, and there are a few options for the endpoint and data placement; this is essentially exactly the same as for a Longhorn volume itself, and you can even specify which engine you want. Once it's created, for example I have two here, you can in fact explore what is inside that object store endpoint. The administration UI lets you jump in with your access key and secret key, and you can see that we have one bucket in total, called bucket-one. I haven't put anything in there, but now we can.

What I'm going to do now is switch back to Rancher. I have a pod running here, and this pod is running s3cmd, which I think most people may be familiar with. I can exec into this pod, and I hope this is big enough to see; it's communicating directly with the object storage I created before. If we take a look at what the configuration looks like, you can see there are the access key and secret key, which, well, I'm not going to highlight those, and this is the endpoint: test-object-store.longhorn-system.svc.cluster.local, which is the object store name addressed using internal Kubernetes DNS. One setting you might need to pay a bit more attention to is the bucket hostname format, which you put in like this; then encryption no, GPG no, and HTTP for now, we're going to add a guide on how to do it over HTTPS later, no proxy, and test credentials yes. Okay, it has successfully verified the connection, and I don't need to save this setting because the existing one is good enough. Is that big enough? Okay, let me do a little window work here.

All right, so now if I run the s3cmd list command, I can see bucket-one there, and obviously there's nothing in it yet. So now I'm going to put something in there. This is another project Rancher is working on, Harvester, an open-source hyperconverged infrastructure product, and I'm just going to grab a file from it; Harvester is, by the way, also based on Longhorn for its storage layer. Let me grab a file that's big enough to make a dent. Okay, so the download is complete; next I'm going to upload it. Okay, it's done; let me see if it's really there. All right, it's really there. So how can we know that this is right? Let me rename the original source file, and now I'm going to download it back. Okay, it's there. Let me run a checksum command. Yeah, they're identical.
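For reference, here is roughly what that demo boils down to on the s3cmd side (the endpoint name is the one from the demo; the bucket and file names are just the ones I used, and treat the exact config keys as a sketch):

```
# essentials of the ~/.s3cfg that "s3cmd --configure" writes:
#   host_base   = test-object-store.longhorn-system.svc.cluster.local
#   host_bucket = test-object-store.longhorn-system.svc.cluster.local
#   use_https   = False    # HTTP for now; HTTPS guide to come

s3cmd ls s3://bucket-one                            # empty at first
s3cmd put harvester.iso s3://bucket-one/            # upload a big file
s3cmd get s3://bucket-one/harvester.iso copy.iso    # fetch it back
md5sum harvester.iso copy.iso                       # checksums should match
```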
So we have successfully uploaded and downloaded using the well-known s3cmd, and once the object is there, you can also see it from the bucket view, which now shows the new size once I refresh it. You can see the file is there, along with a few details in the UI, and you can do more with it from the UI as well. Okay, so that's my quick demo, and I'm going to switch back.

Okay, so that concludes my presentation and demo. Any questions? Yes, 1.6 will be available in January 2024; that's our current timeline. And if you're asking when the Longhorn engine V2 will reach GA quality, that is likely going to be late next year, and it may slip beyond that, because we have rewritten the whole engine in C, and it's going to take some time for us to stabilize it. Okay, so we have the Longhorn booth in the project pavilion of the showroom, and if you have any other questions, feel free to find me there; I will be there until the show closes. All right, thanks everyone.