Good morning. Good afternoon. We'll just wait a few more minutes for a few more people to join. Hello everyone. We're just waiting a couple of minutes for a few more people to join.

All right, I guess we can start. Grinton has made his apologies; he's tied up with something else. We have two items on the agenda today, and our first topic is the Pravega project. Oh, hi, Luis. So, Pravega presented to us a few meetings ago; I've put the YouTube recording in the list, because Pravega presented to the SIG before they actually made an incubation proposal. Amy has said that they can skip the TOC 3R step, so it can go straight to the SIG presentation or recommendation. My personal recommendation at this stage is that we should recommend this to move to the TOC, and I'm asking the SIG leads, or any of the other SIG chairs, to weigh in on the Pravega project. I think the proposal they have put together is particularly strong. So unless we have any specific objections, I'd recommend that we move this to the TOC.

Should we put some comments in the PRs, or where are we capturing those for the TOC?

We're going to put them in the doc. Well, if we want to move ahead, yeah, we should put a recommendation in the PR, I imagine.

Yeah, okay. I mean, I think it's strong, I think it should move on, and I think it enhances the landscape, so I'm for it. I just wanted to know what our process was moving forward.

All right. So, Derek, I'm not sure if you're on the call, but we'll give it a couple of days to collect any more feedback, and then make an official recommendation by the end of the week, if that works for you.

Yeah, that sounds great. Really looking forward to it. Also, about two days ago I added a comparison with Kafka to the proposal, so look for that. Alex, you recommended maybe also adding some provisional use cases from the website. The proposal is getting a little long, so I don't know if we should add those, but if you have any other recommendations about the proposal as you finish your reviews, we're certainly open to that.

I think it looks fine, honestly. There's enough detail there, and the comparison with Kafka is really brilliant. That must have taken quite a bit of research and work, so we appreciate that.

Thank you. Yeah, a lot of reviews as well; I wanted to be accurate. So thank you, and looking forward to this other presentation today.

Awesome. Okay. Did anybody else want to comment on Pravega before we move on?

So the next item on the agenda, and I expect this to take the majority of the rest of the call, is the presentation of Piraeus Datastore; I'm hoping I'm pronouncing that right. This is a Kubernetes cloud-native storage project that builds on the DRBD project. I believe Philipp is on the call, and he's going to be presenting.

Hello, Alex, this is Philipp speaking. I'm ready to go.

Fantastic. Do you want to share your screen?

Yes, I will do that. I assume my slides are visible?

We can see it. Thank you.

Okay, then I will fly over the first half of this huge slide deck really quickly, because I assume you are all very familiar with it. The first part is about software building blocks in the Linux kernel. They are all used by Piraeus Datastore, which is why I have them here in the slide deck. And the first building block is LVM.
And I'm pretty convinced all of you know what LVM does: physical volumes go into a volume group, and out of that we get LVs and snapshots. It has been around for what must be like 30 years now, right? A few years later it got the capability to manage thinly allocated logical volumes, and that's the thin pool driver. With that, a regular LV becomes a thin pool, and out of that you can create thinly allocated logical volumes and snapshots of those. And those snapshots are really efficient if you take multiple snapshots of a single origin.

Then there is Linux RAID. Linux software RAID provides all the RAID levels, from striping and mirroring to RAID 5, 6 and so on, and it even has two front ends these days, but they go to the same backend implementation in the kernel.

Then there are a number of implementations for using two block storage tiers where one is a cache for the other. The two major ones in the mainline kernel are dm-cache and bcache; they both serve about the same purpose. And there is a third one, called dm-writecache, which was built with the purpose of putting PMEM in front of, let's say, NVMe drives.

There is also deduplication on a few kernels, on RHEL 7.5 and later, and also CentOS 7.5 and later kernels. Where is that coming from? It's called VDO, Virtual Data Optimizer, and it comes from an acquisition by Red Hat. Maybe one day they will manage to bring it to the upstream kernel; right now this is a RHEL or CentOS technology only.

Then there are all sorts of targets and initiators. These days the only relevant one is LIO, which provides iSCSI and all its relatives, and the new kid on the block is NVMe over Fabrics, where we have target and initiator implementations in the upstream kernel, also arriving in the recent distributions.

A question, off the top of your head: do you know which kernel version supports NVMe over Fabrics now?

I don't have it off the top of my head, but the five-something kernels will have NVMe over Fabrics. And when we talk about the major distributions, I know it's in RHEL 8.1. Maybe also in 8.0, but I'm not sure off the top of my head.

Thank you.

Yeah, ZFS on Linux is also worth mentioning. It's not in the upstream kernel, but it ships with the Ubuntu distribution, so you find a number of Ubuntu users out there who prefer ZFS over other technologies, over LVM or LVM with thin provisioning. And it's not only a file system: it also has a complete replacement for LVM built in that is capable of doing thin provisioning, it has its own spin on the RAID idea called RAID-Z, and it also brings caching, using SSDs as a cache for slower storage tiers. For my part, I will only look at the, let's say, volume management aspects of it; I don't care about the file system aspect of it.

And then, here at LINBIT we are very much focused on DRBD, so let me explain what that is in a few minutes. When we look from a very, very high altitude, you can imagine it as being like a RAID-1 between a local block device and an initiator, where on another machine, behind the target, is a second copy of your data. But that is just to give you an idea. It is implemented as a device driver, a virtual block device driver for Linux. It is a block device, with definitely something here and definitely something there. The moment you open it or you mount it, it promotes to primary and starts to replicate everything you write to it.
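Before the DRBD walkthrough continues below, a quick illustration of the LVM thin-provisioning flow just described. This is a minimal sketch that drives the standard lvm2 command-line tools from Python; the device, volume group, and volume names are placeholders, and it assumes root privileges on a host with lvm2 installed.

```python
import subprocess

def run(cmd):
    """Echo and execute a command, raising on failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Physical volumes are grouped into a volume group.
run(["pvcreate", "/dev/sdb"])
run(["vgcreate", "vg0", "/dev/sdb"])

# With the thin pool driver, a regular LV becomes a thin pool.
run(["lvcreate", "--type", "thin-pool", "-L", "100G", "-n", "pool0", "vg0"])

# Thinly allocated logical volumes come out of the pool.
run(["lvcreate", "--type", "thin", "-V", "20G",
     "--thinpool", "pool0", "-n", "data1", "vg0"])

# Snapshots of a thin volume are cheap, even many from one origin.
run(["lvcreate", "--snapshot", "--name", "data1-snap1", "vg0/data1"])
```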
The moment you unmount it, it demotes to secondary, and you're free to mount it on the other side; the direction of the replication will be reversed at that moment.

That is usually used when you have one application, let's say a database, using two block volumes with different characteristics. You have an Oracle database, and you have your database logs on the fast NVMe drive and your table spaces on the slow drive. If you mirror two volumes concurrently within a consistency group, it makes sure that the writes are never reordered, and the two volumes are always at the same logical point in time on the replication target.

So far my slides only had two nodes, but it is a multi-node technology, so you could mirror from one primary to two secondaries concurrently. Each of the replication links can be synchronous, or it can be configured to be asynchronous, and that can even be switched at runtime.

It supports a diskless mode. That means the primary, the node where you're accessing your data, doesn't need to have a local replica of the data; it is also capable of shipping the read requests. In this case, where it has two nodes that actually have copies of the data, it will do a kind of load-balancing scheme between the two nodes for read requests, and for write requests it sends the write request to both nodes concurrently. The application running on the primary is, of course, shielded from any failures. Let's say one secondary goes away while a read request was being processed on it; the primary will then reissue the read request to the other node and deliver the data to the application, so the application is shielded from this error. When the node comes back, it is automatically reintegrated and gets a re-sync of the writes it missed in the meantime. So you can imagine the diskless primary as being an iSCSI initiator that has the luxury of being connected to two targets concurrently.

All of that comes from a background of building high-availability systems, and we have been working on that for nearly 20 years now. What we did recently is optimize for the case where our metadata is located on PMEM or NVDIMMs, because then we have the luxury that we can update it in smaller units than full blocks; cache-line granularity is what you get then. And on the roadmap we are planning to look into erasure coding, but this is still very, very deep work in progress.

Okay. So far I told you about all these storage building blocks in Linux, including DRBD. They can be combined as you need them: you can use logical volumes from LVM as backing devices for DRBD, you can put the VDO deduplication below the LVM, or you can slip a dm-crypt encryption between DRBD and LVM, and so on. You can combine them on the data plane as you need. The complicated part is that most of these things bring their own management tool, and this is the point where LINSTOR comes into the picture. The idea of LINSTOR is that it builds on all those storage building blocks and gives you a unified API for them. It is a distributed application you run on a bunch of generic nodes; the only requirement is that these nodes run the Linux kernel. It can then fulfill your volume requests, like you say: I need a new volume, it should be two-way replicated, it should be this size, and I also give it a name.
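As a rough sketch, here is what such a volume request could look like against the LINSTOR REST API that comes up next. The endpoint paths and payload fields follow the published v1 API as best understood; the controller address and all names used here are placeholders, so treat the details as assumptions rather than a definitive reference.

```python
import requests

# 3370 is the LINSTOR controller's default REST port; host is a placeholder.
BASE = "http://linstor-controller:3370/v1"
NAME = "demo-vol"

# 1. A resource definition is the cluster-wide name for the volume.
r = requests.post(f"{BASE}/resource-definitions",
                  json={"resource_definition": {"name": NAME}})
r.raise_for_status()

# 2. A volume definition carries the requested size (here 10 GiB, in KiB).
r = requests.post(f"{BASE}/resource-definitions/{NAME}/volume-definitions",
                  json={"volume_definition": {"size_kib": 10 * 1024 * 1024}})
r.raise_for_status()

# 3. Auto-place asks the controller to pick nodes for two replicas.
r = requests.post(f"{BASE}/resource-definitions/{NAME}/autoplace",
                  json={"select_filter": {"place_count": 2}})
r.raise_for_status()
```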
And it can do that for you. To the user it exposes a REST API, and on top of that REST API we have built various connectors. One is for the Kubernetes world, and I will put a focus on that. We also have connectors for OpenStack, OpenNebula, Proxmox, and XCP-ng in the works.

Then maybe let's look at an example. In a hyper-converged architecture, I'm looking here at, let's say, six nodes that have enough memory and CPU to run the workload. In the case of Kubernetes these are containers, pods, of course, so please excuse that they are labeled VM here. Each node also has built-in storage devices. Now, if LINSTOR gets a request for a new persistent volume, and let's assume our policy here is that we want volumes two-way replicated in order to protect against the failure of one of those nodes, we try to place one replica on the same node where the workload is running, and place the second replica somewhere else. Right now, for the orange part and the black part, the layout is in the optimal state. That gives us that all the read requests can be carried out locally, not touching the network at all, giving the best performance and reducing load on the network. Only for writes do we need to send something over the net.

Now, in case a VM gets live-migrated or a pod is moved, the situation looks a little different. Now it is a three-node DRBD setup, with a diskless primary here where the workload is running, and two storage nodes. So now all the reads are also shipped over the network. But it takes a single LINSTOR command, or a time-triggered policy, and given we have enough available storage space here, LINSTOR can allocate a new logical volume here and add it to the DRBD configuration. DRBD will start to copy over all the blocks, re-sync everything over, and when that is finished, LINSTOR will remove the now redundant third copy (redundant in the sense of the policy we're using in this example). So this was a walkthrough by example for a hyper-converged architecture. LINSTOR can also deal with architectures where you have disaggregated storage nodes and compute nodes, your workload nodes.

Now I want to touch quickly on the software architecture of LINSTOR. It comes with two main components. One is called the satellite; the satellite could also be called the LINSTOR node agent. It runs on all the nodes, either storage or workload nodes, and it is our agent there to execute local commands. It doesn't have any local state, nor does it need any configuration, so you just install it and start it; in the context of Kubernetes, it's simply a stateless container. The controller is the central part. It establishes connections to all its satellites to do something useful. In traditional LINSTOR setups, the controller would be stateful; it has an embedded SQL database. In the context of Kubernetes, it can put everything into an etcd key-value store, and then the controller itself is also stateless and can easily be moved around. I should mention here that the structure we are seeing on this slide is only the control structure. It has nothing to do with the data path; the data path is DRBD, and it is independent of this. That means we can stop and start the satellites or the controller, and we can even upgrade the controller and the satellites, the complete LINSTOR system, and all the existing volumes, all the existing persistent volumes that are in use, continue to do IO while we do that.
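To make the "follow the workload" step from the walkthrough above concrete, here is a minimal sketch driving the linstor command-line client from Python. The node, resource, and storage-pool names are placeholders, and in practice you would wait for the DRBD re-sync to finish before removing the extra replica.

```python
import subprocess

def linstor(*args):
    """Run a linstor client command against the active controller."""
    subprocess.run(["linstor", *args], check=True)

# The pod moved to node-c, which is currently a diskless DRBD primary.
# 1. Allocate a real replica there; DRBD re-syncs the blocks in the background.
linstor("resource", "create", "node-c", "demo-vol", "--storage-pool", "pool1")

# 2. After the re-sync has finished (check "linstor resource list"), drop the
#    copy that is now redundant under the two-replica policy.
linstor("resource", "delete", "node-a", "demo-vol")
```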
Hey Philipp, sorry if you're going to cover this in some later slides, but I was just wondering: how are the control plane decisions made? Is it within the LINSTOR controller or the LINSTOR satellite, and is it a centralized function somehow?

What do you mean by control?

So, for example, when you get the request for a new volume, what makes the decision on placement, how to decide which node it should live on, that kind of thing?

Okay, yeah, now I got it. Yes, that kind of decision is found in the controller, and then the controller hands out the tasks to the satellites.

So is there sort of one controller that's a leader for the cluster, so to speak?

Yes, there's only one active controller. The grayed-out controllers here are just, like, standby positions.

I see. Yeah.

And that often leads to the question: what is the scalability of the system? Currently the biggest installation we have operates with about 500 satellites and one controller, and so far it behaves very smoothly, and we expect that this can scale to 1,000 or 2,000 easily.

Okay, thank you.

The other things we see on the slide are the REST API, and then there is a little client library and the CLI program, with which you can inspect the whole LINSTOR system. What kind of challenges can it solve for you? For example, data placement. It supports you in tagging your nodes with chassis numbers, room numbers, rack numbers, and then referring to these tags in your policy, so you could express a policy like: always place replicas in different chassis, but in the same rack, things like that. And this placement policy recently got a, let's call it, multi-dimensional aspect, so it can take into consideration available storage space and all your constraints based on labels, but also other metrics like available bandwidth on your NIC or available bandwidth to the backend storage. Literally arbitrary things you want to take into account.

That sounds extremely powerful. How are those policies exposed to Kubernetes?

They are LINSTOR objects, and in the Kubernetes world you have a storage class; the storage class maps one-to-one to a so-called resource group in LINSTOR, and on the resource group you express all this policy stuff.

Okay.

I'm not sure if at this point every property of such a LINSTOR resource group can be addressed through the storage class YAML, but yeah.

All right, thank you.

I have a question; maybe you are going to cover this later. Since this project is called the Piraeus Datastore project, but I haven't seen a slide that shows Piraeus yet, I'm just wondering if that's coming up later.

Yeah, that's coming in two slides.

Okay, sorry. Because so far it's actually not Kubernetes-specific up to this point, right?

Okay. So yeah, the connectors: Kubernetes and some others, but let's ignore the others. For Kubernetes, we joined forces with DaoCloud, because at LINBIT our main expertise is Linux and DRBD and all the very low-level stuff, and the people at DaoCloud helped us a lot to understand what is actually necessary for the Kubernetes world. So we began with packaging all these components into containers. We have now module loader containers for DRBD, we have prepackaged containers for the satellite and the controller, and now even an operator. We started with deployment by YAML files, and that is now starting to be deprecated in favor of the operator. And here comes in: what is Piraeus Datastore?
Piraeus Datastore is a packaging of LINSTOR and DRBD, publicly available on Docker Hub and Quay; the development happens on GitHub. And it has all those components: the LINSTOR controller and satellite, the operator, and so on. The CSI driver, of course; I forgot to mention that, right, the CSI driver. And recently we finished the work on a Stork plugin. Stork is a Kubernetes scheduler extension that allows you to collocate your workload with replicas of your storage, so we are working with the Portworx people to get that merged into Stork upstream. And yeah, that's pretty much about it.

So here is my slide that reminds me how the mapping goes: a Kubernetes storage class is a LINSTOR resource group, and a Kubernetes persistent volume is a resource with a single volume on the LINSTOR level. And then I have a case study and a summary slide here that tries to summarize the scope of it.

Really, just a few quick questions. I feel like we covered a lot of the detail of what's happening in the data plane with LINSTOR and the DRBD foundation, but I'm still a bit fuzzy, and maybe we haven't covered enough detail, on how this operates in a cloud-native world, whether you're operating hyper-converged or otherwise. How do placement decisions happen? How do failovers happen? What does the CSI integration look like? Is that something you have a bit more detail on, perhaps?

I don't have slides on that, but maybe you can guide me a little bit with questions, and I see that Moritz is also here on the call; maybe Moritz can help me with answering a few of the questions.

That's fine. I mean, it was a pretty quick turnaround between asking for a presentation and actually presenting, so it's fine if you don't have everything on a slide. But I was wondering if you could go into a little more detail on some of the Kubernetes integration aspects. You mentioned that there is a controller, but what does the controller do in terms of the satellite? Does it configure LVM and set up the DRBD connections? And how does it manage, for example, a node failing or something like that? Is that logic part of the controller, or is that something else?

Yeah, the controller, the LINSTOR controller, has the database, the overview of what is in the cluster: all the nodes, all the volumes, all these objects. And that's still not Kubernetes-specific, right? The LINSTOR controller would be the same if it were used in another environment.

Right. And the Kubernetes-specific parts, well, the CSI driver, right? So all the control is through the CSI driver. Do you have another operator that communicates with Kubernetes directly, or is it all through the CSI driver? It sounds like it's Stork and the CSI driver.

Right. So it all goes, basically, by using the storage class to decide what resource group to use, and then everything that happens is controlled by the resource group, however that is configured.

No, I meant, I think it's Stork and the CSI driver that are the interfaces.

Oh, Stork, okay. So, Stork is a scheduler, right?

Stork is not just a scheduler extension; it also deals with snapshots and group snapshots, movement of data. It deals with all the data lifecycle services that Kubernetes does not.

Okay. So there could be a lot there. I guess what we're asking is this:
We don't know the interaction between this project and the cloud services that Kubernetes provides. I guess that's the question, right?

Yeah, it's not clear, because as I was saying, I see the pieces but don't see too much about it, I mean, in the slides. I went there to look, and it looks like there's an operator, but it's not quite clear to me how. I think the question is: what is being presented, and what is the request to include in the CNCF?

Right, yeah. It seems like it's the Piraeus... sorry, how do you pronounce that?

I say Piraeus.

Oh, Piraeus. Okay.

That might be from my German language background. So, what we bundle under Piraeus is the CSI driver, the operator, also the YAML files, the Helm charts. And, well, the Stork bits, but is Stork related to the CNCF?

Not yet.

Not yet, yeah. And we also have plans to build a failover controller, and that is also very Kubernetes-specific, so that is also going into this Piraeus project.

Okay, so it sounds like we need to understand more about what this operator does.

Hello. This is Alex from DaoCloud; we are helping LINBIT to do the Piraeus project. Is it okay for me to add something, to make it more clear about the project scope?

Oh, of course. Sorry, go ahead.

Yeah, I think we can give you a hand on this one. May I share the screen for two slides?

Yes, no problem.

Okay, one second.

Alex, I can share it; I have it here.

Yeah, please share the screen with the two slides I sent you today.

So I think what is being asked about is the difference between Piraeus and LINSTOR, and this slide is about that. Generally, this is a stack similar to what you see with Rook plus Ceph plus NooBaa, sort of. What Piraeus does is the containerization and orchestration of the LINSTOR components; now that's done by the operator. We also have the CSI driver, and it will also contain things like failover for RWO volumes, and also the connection to the Stork scheduler. So all the Kubernetes components will be inside the Piraeus project. But LINSTOR is actually the storage system that does the clustering; the lifecycle management, meaning create and delete volume, resize volume, and volume monitoring, is all done by LINSTOR. And DRBD here is for the block replication. So this is the stack: the backend control flow is in LINSTOR, the data path is in DRBD plus the LVM volumes underneath DRBD, and all the front-end control that operates with Kubernetes is in Piraeus. If you can go to the next slide.

So generally, what we want to do is contribute this three-part stack: DRBD is actually already within the Linux kernel, LINSTOR I think is in discussion about entering SODA, and we want to give Piraeus to the Cloud Native Computing Foundation.

Okay. I think that, at least from my point of view, I would like to see more of what the architecture of Piraeus is and how it interacts with Kubernetes as a whole. I think that's what's missing here.

Yeah, agreed. I would agree with that.
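As an aside, the storage-class-to-resource-group wiring discussed above might look roughly like the following, rendered here as a Python dict for a Kubernetes manifest. The provisioner string matches the LINSTOR CSI driver's name; the parameter key naming the resource group is an assumption and may differ between driver versions.

```python
import yaml  # PyYAML

storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "linstor-replicated"},
    # The LINSTOR CSI driver's provisioner name.
    "provisioner": "linstor.csi.linbit.com",
    "parameters": {
        # One StorageClass maps one-to-one to a LINSTOR resource group,
        # which carries the placement policy (replica count, rack rules, ...).
        "resourceGroup": "two-replicas-split-racks",  # assumed parameter key
    },
}
print(yaml.safe_dump(storage_class))
```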
I mean, I think we need some basic structure to kind of say: look, when you use the Piraeus operator, for example, it implements the LINSTOR controller and the LINSTOR satellites, and then, when a PVC or a volume request is issued, what process does Piraeus implement? And also, if you're making the comparison with Rook, I'd like to kind of understand sort of...

Yeah, I think Piraeus is closest to the concept of Rook. Actually, it's a collection of all the storage operators, the CSI driver, and also other things like the scheduler, that make a storage system like LINSTOR cloud-native. That is the Piraeus scope, and LINSTOR is the actual storage system, again with DRBD as the underlying data path technology.

And that's fine, but I'd like to understand what the Piraeus operator actually manages, or what the plans are.

Maybe, maybe I can just go ahead...

It's really hard to understand; the microphone... If that was Moritz, you have a serious audio problem.

I suspect it was, yes. Okay, that's a pity. Moritz's audio is not working, and Moritz is the main guy behind the Piraeus operator, so he could give the most insight off the top of his head.

That's okay. That's all right. It sounds like maybe you need to prepare a little more background on the Piraeus operator and the Piraeus functionality specifically. I think we got a good understanding of LINSTOR and DRBD, which is great, and thank you for that. But I don't think we quite understood the functionality of Piraeus and what you're planning on building with Piraeus. So perhaps if you want to discuss that, that would be the most helpful; and if you're not prepared today, that's fine as well, we can schedule it in for a future date.

Alex, let me try to answer that from the top of my head as well as I can. I also shared the webpage where it is on GitHub, and it gives you a walkthrough, but let me try to summarize it in my own words. The purpose of the Piraeus operator is to run the LINSTOR controller in containers. If requested, if necessary, it will run the DRBD module loader containers on your cluster. If requested, it will run etcd in your Kubernetes environment and point the LINSTOR controller to that etcd instance. It will also get the CSI driver and the CSI components in place. If requested, it can also do detection of unused local storage devices and turn them into LINSTOR storage pools automatically. And I think it can also, if requested, bring the Stork scheduler extension into the cluster. Off the top of my head, that is what it does.

I think that's helpful to set the scene, but I suspect it would be helpful to get a presentation on Piraeus specifically, perhaps at a future date, and we can do this in a coming meeting. Can I just quickly ask: is the intention to apply for sandbox, or are you looking to possibly apply for incubation?

Let's take it step by step. The first goal is to get it into the sandbox. And maybe after that, but, you know, step by step.

Okay.
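For orientation, here is a purely hypothetical sketch of what driving the Piraeus operator could look like, based on the task list above: one custom resource asking for the controller, satellites, module loader, etcd, and CSI pieces. The CRD kind, API group, and every field name are invented stand-ins for illustration, not the operator's actual schema.

```python
import yaml  # PyYAML

# Everything below is illustrative; consult the Piraeus operator's own
# documentation on GitHub for the real custom resource definitions.
piraeus_cluster = {
    "apiVersion": "piraeus.linbit.com/v1",   # assumed API group/version
    "kind": "LinstorCluster",                # assumed CRD kind
    "metadata": {"name": "piraeus"},
    "spec": {
        "drbdModuleLoader": True,        # run DRBD kernel module loader containers
        "etcd": {"deploy": True},        # run etcd, point the controller at it
        "csi": {"deploy": True},         # install the CSI driver components
        "autoCreateStoragePools": True,  # turn unused local disks into pools
        "storkScheduler": {"deploy": False},  # optionally add the Stork extension
    },
}
print(yaml.safe_dump(piraeus_cluster))
```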
That's fine. In that case, what I would recommend: it's possible, obviously, to make a sandbox application and go to the TOC directly, but I would strongly recommend that we structure the presentation slightly more thoroughly, so that we can get a better understanding of what Piraeus is going to cover and what some of the plans for Piraeus are, because I think there's a bit of a gap in understanding with the team. So that would be useful for the next steps.

Absolutely. Okay, so we will assemble a presentation with that in focus, and when it's ready, I will send a mail to the SIG mailing list and ask for some time in one of the meetings.

That would be fantastic. Thank you so much.

Thank you for giving me the stage.

You're very welcome, and we look forward to hearing further details in future. Does anybody have any other questions or comments that they would like to raise at this point? Otherwise, I think we can give everybody a few minutes back.

I don't have anything.

All right. Thank you, everyone. Have a good rest of your day. Bye. Bye bye.