All right, well, let's kick it off and get going. I'm not sure if we're gonna get too many more people than this today. Not a huge audience, a nice cozy little group. Thank you all for joining. This morning on the agenda we've got a couple of things, and if anybody has any extra topics they wanna bring up or just generally discuss, feel free to add them to the agenda. I asked Saad a few weeks ago to come to the group today to chat about what's going on with CSI. The most recent release is 0.3, and it looks like we're chugging steadily forward towards getting it stabilized in Kubernetes and possibly a 1.0. So I asked Saad to chat about what's the latest with 0.3. Hopefully we'll have that discussion, and then we can chat about other things people may have questions about for CSI. After that, Alex has also been working a little bit on the white paper, so we can chat about the progress there and then open it up for other discussions. So with that, let me hand it over to Saad to review CSI 0.3.

All right, thanks Clint. So yeah, 0.3 of the CSI spec came out a few weeks ago. To put this into context, the first version of CSI, 0.1, was released at the end of Q4, then in Q1 we bumped it to 0.2, and now in Q2 we've bumped it to 0.3. For those who've been wondering why we're not yet at 1.0, the idea was to get the spec to a state we're happy with and then try to implement it, both as CSI drivers and on the CO side for Kubernetes, Mesos, et cetera, to discover any issues with the spec and use that to revise it. The first implementation of CSI on the Kubernetes side was in the Q4 release, Kubernetes 1.9, which came out with alpha support for CSI using CSI 0.1. In Q1, for Kubernetes 1.10, we bumped that support to beta with CSI 0.2, and we're now maintaining 0.2. Between 0.1 and 0.2 there were breaking changes, so if you had a driver written, you had to actually go in and update it. Between 0.2 and 0.3 we've tried to minimize any breaking changes, and we've basically eliminated them.

I can talk about some of the changes in 0.3. The big things that went in were, one, the snapshot API. The snapshot API is designed to create, delete, and list snapshot objects. We already have create volume and delete volume, and this follows very similar patterns. You can take a look at the PR if you're curious about exactly what it looks like; the RPCs are fairly straightforward, though the details, of course, took some time to decide.
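(Editor's note: as a rough illustration of the create/delete/list pattern Saad describes, here is a minimal Go sketch. The type and method names are paraphrased for the example, not the real generated CSI bindings, and the in-memory implementation exists only to show the idempotent-create behavior the spec's volume RPCs already follow.)

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Paraphrased shapes -- illustrative, not the real generated CSI bindings.
type Snapshot struct {
	ID             string
	SourceVolumeID string
	CreatedAt      time.Time
}

type CreateSnapshotRequest struct {
	SourceVolumeID string // the volume to snapshot
	Name           string // idempotency key chosen by the CO
}

// SnapshotController mirrors the create/delete/list pattern that
// CreateVolume and DeleteVolume already follow in the spec.
type SnapshotController interface {
	CreateSnapshot(ctx context.Context, req CreateSnapshotRequest) (Snapshot, error)
	DeleteSnapshot(ctx context.Context, snapshotID string) error
	ListSnapshots(ctx context.Context) ([]Snapshot, error)
}

// inMemory is a toy implementation: creating twice with the same Name
// returns the same snapshot, matching the idempotent create pattern.
type inMemory struct{ byName map[string]Snapshot }

func (m *inMemory) CreateSnapshot(_ context.Context, req CreateSnapshotRequest) (Snapshot, error) {
	if s, ok := m.byName[req.Name]; ok {
		return s, nil // idempotent repeat of an earlier request
	}
	s := Snapshot{ID: "snap-" + req.Name, SourceVolumeID: req.SourceVolumeID, CreatedAt: time.Now()}
	m.byName[req.Name] = s
	return s, nil
}

func (m *inMemory) DeleteSnapshot(_ context.Context, snapshotID string) error {
	for name, s := range m.byName {
		if s.ID == snapshotID {
			delete(m.byName, name)
		}
	}
	return nil
}

func (m *inMemory) ListSnapshots(_ context.Context) ([]Snapshot, error) {
	out := make([]Snapshot, 0, len(m.byName))
	for _, s := range m.byName {
		out = append(out, s)
	}
	return out, nil
}

func main() {
	var c SnapshotController = &inMemory{byName: map[string]Snapshot{}}
	s, _ := c.CreateSnapshot(context.Background(), CreateSnapshotRequest{SourceVolumeID: "vol-1", Name: "nightly"})
	fmt.Println("created", s.ID)
}
```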
The second feature was topology, in particular availability topology. The idea here is that a particular volume may not be equally available to all nodes across a cluster, and we want some way to express that: when we provision a volume, to influence where that volume is going to be available, and after the volume is provisioned, to understand where it was provisioned and use that information to do intelligent scheduling on the CO side. So if you look at this API, we have a new plugin capability called accessibility constraints. If you support it, then you can have optional topology requirements on the create volume call. These topology requirements basically act as constraints limiting where a volume can be accessed from. Note that while this is designed to allow you to specify things like regions, zones, and racks, the way it's expressed in the API is that we never actually refer to region, zone, or rack. Instead, it's an opaque key-value pair where the key can be whatever makes sense for your storage system.

So one of the challenges is: how does the CO discover what the keys and values are? For that, what we did is extend the information we get for a given node to return the labels that should be applied to that node. So if you look at node get info, it will now return a list of topology labels that apply to the node. Then, when a CO decides it wants to schedule to a particular node, it can take those labels and send them back in on the create volume request to ensure the volume will be accessible from that node. And on the create volume response, you get back the list of topologies that the volume is accessible from. So this went into the spec in 0.3, both snapshots and topology. On the Kubernetes side, we have not finished integrating with this behavior yet; we're planning to do that for the coming release, 1.12.

And then some of the other smaller features: we had node get ID, which allowed us to retrieve the ID of a node as understood by a storage system. What we realized with the topology work and some of the other smaller PRs was that there's additional information for a node that we want, not just the ID. So we expanded this call into node get info, but in order to maintain backwards compatibility, we left the node get ID call as is. We plan to remove it before 1.0, but for now we'll leave it in for backwards compatibility. We also clarified the probe calls. There used to be a probe call on both the controller and the node, but it was not very clear what the purpose of the node call was, so we clarified the purpose as being essentially just a health check. You can read that PR if you're curious about the details. And node get info, in addition to reporting topology, also reports volume limits, meaning the maximum number of volumes that can be attached to a given node. This information can be used by the CO to make more intelligent scheduling decisions and not attach too many volumes of a given type to a given node.
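(Editor's note: to make the node-info and topology round trip concrete, here is a minimal Go sketch. The struct shapes and field names are paraphrased for illustration, not the real generated CSI bindings; the actual spec expresses topology as repeated opaque key/value segments.)

```go
package main

import "fmt"

// Paraphrased shapes; topology is an opaque map of key/value segments,
// e.g. {"com.example/zone": "z1"}, never a hard-coded region/zone/rack field.
type Topology struct {
	Segments map[string]string
}

type NodeGetInfoResponse struct {
	NodeID             string   // the node's ID as the storage system knows it
	MaxVolumesPerNode  int64    // attach limit the CO can schedule against
	AccessibleTopology Topology // labels the CO should apply to the node
}

type CreateVolumeRequest struct {
	Name              string
	CapacityBytes     int64
	RequisiteTopology []Topology // constraints: where the volume must be accessible
}

func main() {
	// The CO first learns the node's topology labels from node get info...
	node := NodeGetInfoResponse{
		NodeID:             "node-1",
		MaxVolumesPerNode:  16,
		AccessibleTopology: Topology{Segments: map[string]string{"com.example/zone": "z1"}},
	}

	// ...and when it wants a volume reachable from that node, it echoes
	// those labels back as a constraint on the create volume request.
	req := CreateVolumeRequest{
		Name:              "pvc-1234",
		CapacityBytes:     10 << 30, // 10 GiB
		RequisiteTopology: []Topology{node.AccessibleTopology},
	}
	fmt.Printf("CreateVolume %q constrained to %v\n", req.Name, req.RequisiteTopology[0].Segments)
}
```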
So those were the big changes that went into 0.3. There were a bunch of smaller clarifications and corrections as well. For the most part, between 0.2 and 0.3 the changes were additive, nothing breaking, so your 0.2 driver should be compatible. But if you get a chance, do take a look at the 0.3 spec and try to update your driver to take advantage of some of this new functionality. That's all I have. Do you have any questions for me?

Yeah, I do. Let me wait for the group, though, to see if anyone else has any questions. So, oh, go ahead.

I was just gonna ask: one of the things I've been following is the topology API bits, because I think that's gonna be interesting as we're seeing a lot more people deploy larger Kubernetes clusters that reserve different nodes for different purposes. Is that kind of where you're seeing the topology question come into play?

So the use cases actually vary a lot. If you look at cloud providers, they're naturally segmented into zones and regions, and the storage they make available, their block virtual disk storage, tends to not be available globally; it's usually zonal. Until now we had no way to express that in the CSI spec, so that was a big gap we needed to fill, and a big driver. On-prem, there are all sorts of different segmentations of a cluster that you could have, things like racks, and storage systems may not be equally available to all the nodes; a system might be available only in a certain rack, a certain subdivision of the cluster. That was another driver. And then there are clusters getting very large: when you have a very large cluster, it does make sense to have some sort of natural division inside it. So that is a use case too. It wasn't the primary use case when we started looking at topology, but this should help with it as well.

Got it, thanks.

Saad, what are the big milestones coming up, and what do you see as the path to getting to 1.0?

Yeah, so we want to get to 1.0 right around when the COs are beginning to land their GA releases of CSI. For Kubernetes, we're targeting Q4, possibly slipping into Q1, so based on that, we anticipate CSI going 1.0 by the end of the year. Part of going 1.0 is going to be coming up with things like a certification process for CSI drivers; that's something we need to start looking into. We have a sanity testing suite: if you have a CSI driver, you can run this set of tests against your driver as a sanity check to make sure it's implementing the spec correctly. For a certification suite, we want to expand that and maybe have more requirements that must be checked off. Beyond that, we also want to become an official member of the CNCF. Today the project is still run independently and is not officially part of the CNCF, even though that is the goal. To get that ball rolling, we need to get our legalese in order, which we're going through at the moment, making sure we've got all the CLAs and things like that from contributors. Once we have that, we'll get the ball rolling on getting inducted into the CNCF. And in terms of the spec itself, for the most part we're becoming more and more comfortable with the APIs we have, and we're going to continue to expand them. For the Kubernetes implementation, for example, we're going to try to implement support for ephemeral local volumes this quarter. So far we've focused mostly on remote persistent volumes, but we also want to use CSI to support things like, hey, I want to inject some identity into a pod, where the lifecycle of that volume is the lifecycle of the pod and there's no attacher, so how do I make that happen? We're going to focus on that use case this coming quarter and see if any changes bubble up into the spec. So those are the things to look forward to before 1.0.
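(Editor's note: the sanity suite Saad mentions lives in the kubernetes-csi/csi-test repository and can be driven from an ordinary Go test against a running driver endpoint, roughly as below. The import path and config fields have changed across releases, so treat the exact names here as assumptions to verify against the version you use.)

```go
package mydriver_test

import (
	"testing"

	// Assumed import path; newer csi-test releases have reorganized this package.
	"github.com/kubernetes-csi/csi-test/pkg/sanity"
)

func TestCSISanity(t *testing.T) {
	// Point the suite at your driver's gRPC endpoint; the driver must
	// already be running and listening on this socket.
	config := &sanity.Config{
		Address:     "unix:///tmp/csi.sock", // assumed field names -- check your csi-test version
		TargetPath:  "/tmp/csi-mount",
		StagingPath: "/tmp/csi-staging",
	}
	sanity.Test(t, config) // runs the spec-conformance checks against the driver
}
```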
Do you know, has Docker made any official commitment to a CSI implementation within their engine?

Yeah, I actually talked to them at DockerCon a few weeks ago. They're very much on board with CSI in terms of helping guide the design; they're one of the approvers on the approver list and have been very involved with what the API looks like. In terms of actually implementing it, basically what they said was that they're not certain yet, based on where their product is going. I believe Docker is trying to be an orchestrator of orchestrators, and Kubernetes is one of the orchestrators that it supports. So the indication they gave me was that if the support for CSI in these underlying orchestrators like Kubernetes is solid, then they don't need to actually go and implement it at their layer. They're taking a wait-and-see approach to whether they need to re-implement this at their layer or not.

Do you know who the Docker representative is at this point?

I think it was Brian, Brian Goff, but Brian left and he's been replaced by Deep. Let me see if I've got his information; it should be in the approver list. Okay, it's Deep Debroy.

Okay. All right. Does anybody else have any questions? Excellent update, Saad. Thanks so much for jumping on. I really appreciate that.

Thank you. And thank you very much for your attention and what you guys have done for the CSI project. I think it's very important for all the COs and very important for customers using cloud native technologies. So it's excellent stuff. Thank you.

Great, thanks. All right, next item on the agenda. Alex, if you wanted to chat a little bit about what we did with the white paper.

Sure. So just to recap, the goals of the white paper we're trying to put together are to clarify terminology and to provide examples and information so that, realistically, an end user would be able to compare and contrast different technology areas with regard to specific attributes of the system, like availability, scalability, consistency, performance, durability, and those sorts of things. I spent a bit of time thinking about how we would do this, because originally we broke the document down in terms of block stores, file systems, and object stores, following on with key value stores, databases, et cetera. And the more I talked about it, the more confused I was getting. The conclusion I came to was that most storage systems are formed out of a ton of layers. Maybe 10 years ago we could make reasonable assumptions that, for example, what we used to call a SAN array presenting block storage would have a certain set of attributes in terms of performance or availability, and a file system or a shared file system would have a certain set of capabilities or attributes. But the more I talked about it, the more I realized that it's actually less the data interfaces that define the attributes and more the underlying layers within the storage system or storage service. So I'm proposing that we break this down differently, and I've put some of it into the document already. Do you guys have access to the document? If not, I'll stick it into the chat.

Yeah, if you throw it in the chat, that'd be great.

There you go. So, since we want to be able to compare and contrast and give users the ability to make decisions based on this, I said, okay, let's put some high-level points down to define what the attributes of a storage system are. That covers availability, which is the ability to do things like failover, and covers things like redundancy and data protection.
Scalability is a bit of a subjective term, but I've tried to define it in terms of the ability to scale the number of clients that can access an interface, the number of operations that can be put through an interface, and the ability to scale the number of components in the storage system to facilitate either of those. Then there's performance, which again can be subjective, but I've tried to define it in terms of things like latency, the number of storage operations done per second, and the amount of throughput. Consistency I've tried to limit to operations: the consistency of the data after a write, update, or delete operation, and the delays that can happen between performing those operations and the data actually getting committed to the non-volatile store. And then there's durability, which includes some of the functions of data protection and redundancy, but also covers areas like checksums and bit rot and these more advanced areas, which probably need a bit more defining.
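(Editor's note: purely as an illustration of the attribute vocabulary just described, and with field names invented for the sketch rather than taken from the white paper, here is one way the comparison axes could be captured so two systems can be scored side by side.)

```go
package main

import "fmt"

// Illustrative only: invented fields mirroring the attribute definitions above.
type Availability struct {
	Failover   bool   // can the data access interface move between nodes?
	Redundancy string // e.g. "3 replicas", "erasure coded 8+3"
}

type Scalability struct {
	MaxClients      int  // clients that can access an interface
	MaxOpsPerSecond int  // operations that can be put through it
	ScalesOut       bool // can adding components raise either limit?
}

type Performance struct {
	LatencyMicros  int // per-operation latency
	IOPS           int // storage operations per second
	ThroughputMBps int // sustained throughput
}

type Durability struct {
	Checksums       bool // end-to-end checksumming / bit-rot detection
	ProtectedCopies int  // replicas or parity copies protecting the data
}

// StorageProfile scores one system on the shared axes.
type StorageProfile struct {
	Name         string
	Availability Availability
	Scalability  Scalability
	Performance  Performance
	Consistency  string // e.g. "strong", "read-after-write", "eventual"
	Durability   Durability
}

func main() {
	p := StorageProfile{
		Name:        "example object store",
		Consistency: "read-after-write",
		Durability:  Durability{Checksums: true, ProtectedCopies: 3},
	}
	fmt.Printf("%s: consistency=%s, copies=%d\n", p.Name, p.Consistency, p.Durability.ProtectedCopies)
}
```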
Then, following on, if we look at the next page, there's the data access interface. For each of these sections, what I've tried to do is say what that section defines and what it influences. So, for example, the data access interface defines how the data is stored and consumed by applications, and that influences the interfaces you use to manage and provision that storage. It influences availability, in terms of moving the data access interface, or access to it, between nodes. It influences performance, because protocols and things like that affect it, and it does influence scalability. I've tried to group these into two sets: one around volumes, and one around storage services that you access through some sort of API layer. Volumes covers things like block, file systems, and shared file systems, and application APIs covers things like object stores, key value stores, and databases. Obviously these are "including but not limited to" kinds of lists, right?

After that, I was thinking, well, what are the next set of layers that come into this? So I tried to define the storage stack, the layers that contribute to these attributes, because most storage systems are, as we discussed, no longer defined just by the storage interface. For example, you can have a file system interface that's backed by an object store, and in that case the file system interface isn't really defining anything around scalability or durability; that's actually defined by the backend object store. And then you get into even more complex scenarios where, for example, you could have a file system like Gluster providing a block interface, which then provides a front-end file system interface on top of that block interface. With all of those possible layers, I've tried to limit it to the layers we see in normal use.

Starting at the very top, there's what the container or the application sees, which defines how the data interface is actually consumed by the container: things like the volume namespace and the networking required. Then there's an orchestrator and host operating system level, covering things that virtualize these interfaces within either the orchestrator or the operating system: things like volume managers, bind mounts, and overlay file systems with regard to volumes, or things like service meshes and load balancers that can affect those application interfaces. One layer down is what the transport looks like, meaning how the data interface actually talks to whatever the storage system is: you could be accessing local storage or remote storage, and the remote storage could be point-to-point or it could be hyperconverged. Then there's the actual storage topology, which defines the architecture of how the storage system runs, with options like centralized systems, distributed systems, or maybe sharded systems like Vitess, for example. There's potentially another layer at the virtualization, hypervisor, or perhaps cloud provider level, where resources are either accessed directly or the virtualization layer provides some sort of mapping, pooling, connectivity management, or failover. Then there's how the actual data is protected within the storage system: some of the obvious stuff like RAID, mirroring, replicas, and erasure coding, and we can beef that up as well. There are probably also data services which get applied at different levels within the stack; the most obvious ones are things like replication or snapshots, which can happen at the application API layer, like with a database, at the volume layer, or at the block layer, so I think it's worth defining what those options are. And then, just to complete it, there's where the data actually sits, in some sort of physical or non-volatile layer. The main reason for including this is that it's often used in service definitions from cloud providers or in products from commercial companies, where we talk about in-memory caches or non-volatile memory or SSD tiers or spinning disk, to give a little comparison in terms of comparative speeds and differences.
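(Editor's note: as a rough illustration of the layered breakdown Alex walks through, and with layer names and the example system invented for the sketch, each layer can be treated as an independent axis, so a product write-up becomes a choice at each layer rather than a re-explanation of, say, erasure coding per product.)

```go
package main

import "fmt"

// Layer enumerates the stack levels described above, top to bottom.
type Layer int

const (
	ApplicationView  Layer = iota // what the container sees: volume namespace, networking
	OrchestratorHost              // volume managers, bind mounts, overlay FS, meshes, balancers
	Transport                     // local, remote point-to-point, hyperconverged
	StorageTopology               // centralized, distributed, sharded
	Virtualization                // hypervisor/cloud mapping, pooling, failover
	DataProtection                // RAID, mirroring, replicas, erasure coding
	DataServices                  // replication, snapshots, applied at various layers
	PhysicalMedia                 // in-memory cache, NVM, SSD, spinning disk
)

// A system description becomes a choice at each layer. For example, a
// hypothetical file interface backed by a distributed object store:
var example = map[Layer]string{
	ApplicationView: "shared file system mount",
	Transport:       "remote, point-to-point",
	StorageTopology: "distributed object store",
	DataProtection:  "erasure coding 8+3",
	PhysicalMedia:   "SSD",
}

func main() {
	// Scalability and durability come from the object store layer here,
	// not from the file interface on top -- which is the point of the breakdown.
	fmt.Println("protection:", example[DataProtection])
}
```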
So I know that was quite a long rant, but I'm very much interested in hearing your feedback on this, in terms of how we can flesh it out. And if you're happy with this kind of layout, I can start fleshing out the different sections in the layers, and then we can flesh out the actual sections that give examples of individual block stores, file systems, object stores, databases, and that sort of thing, without having to constantly re-explain how that file system does data protection, or how that block store does replication, or how that database does scalability, or whatever else, because we've already defined the generic terms available in each of the layers up front. Comments, questions?

So I think it's actually a really interesting perspective, and something new that I haven't heard in our discussions so far. What it comes down to is that it's complicated. Looking at it from a bottom-up perspective, or the way Quinton laid it out originally in the document, provides some information about a general storage service and what capabilities it has, or how you'd expect it to act. What you've laid out here is more of a top-down vision: you've got a storage service, but you don't know what's backing it, and the storage service's capabilities are all going to change based on what the storage service is actually built on. So to me it's an interesting way of looking at it, but I'm wondering whether the information, or the complexity of it, is valuable to people. It might be. So that's my first comment on it.

Yeah, it's a balance. I think we can keep the sections high level enough that we don't need to go into intimate intricacies, but it allows people to actually make decisions based on availability or consistency or whatever else, based not just on how they access the storage but also on what the different layers in that storage actually provide. You often hear myths like, oh well, file systems are slower than block, or object stores are higher latency, or whatever else, but actually it does depend on how you're accessing the object store, what its protection or erasure coding is, what services sit on top of it, whether there are load balancers, and all of those sorts of things.

Yeah, and tied to my comment, and I think what you're saying here too, is that the more details that are exposed about how the service is built, the less cloudy it is at the end of the day. You don't go to a public cloud and get information about how their SQL service is built on the back end; what you get is an SLA, and what you know is, hey, if I have a certain type of structured data, this is the type of service I need to use to be efficient for my application. Everything else about how the thing is built is hidden. So on one hand, the more information we expose, maybe the less relevant it is to the general audience looking to stand up and build an application that relies on cloud data services, because they'll just never get the information that helps them understand how that service was built. But on the other hand, for someone who's looking to be highly dynamic in how they manage their applications, and who does have their own storage team that can help them understand how something was built and make sure it's optimized and set up in the right way, I think that's also interesting.

Yeah, so I thought about that too. If we limit the landscape to just cloud services, that's probably less exciting to most people, because a number of people are coming to this discussion in terms of different projects they're also looking to consume, whether it's something like Rook or Gluster or Ceph, or perhaps something like MinIO or Vitess or CockroachDB, or any number of key value services that have been proposed in these talks. And we need to define, okay, Vitess is sharded, so what does sharded actually mean? Or if we say MinIO uses erasure coding, well, what does that actually mean, and why would you want to use it? So I came at this from: let's define the terminology, and then we can use that to define the products and services without having to explode that information into each and every example. I don't know if that makes sense.
Does anybody else have any comments about it?

Sorry, it took me a while to unmute, but I had one thought that you prompted me to think of, Clint, when you got into the idea that some characteristics make these less cloudy. To some extent, I think people tend to pigeonhole these stateful storage solutions into the categories that existed in the legacy, pre-cloud world, putting them in categories like relational database or whatever. But since you made the point that people buy these by service level agreements, I think it might be valuable to just inventory the potential service level agreement characteristics you'd want to use to evaluate a solution: how many IOs can it do per second? What is the actual guarantee for resiliency, in terms of fault tolerance within a failure domain? What is the backing storage's ability to manage replication across failure domains, including geo regions? Just give people a clue as to the different axes on which they should potentially evaluate a cloud storage system.

Yeah, absolutely. Because when you define those attributes, all sorts of things come into play. For example, if you're using, say, an active-active database as your way of storing data, then perhaps your availability is influenced by some of the upper layers: there might be a load balancer, there might be an ingress controller, there might be a service mesh, whatever else, which is actually what defines your availability or your failover capability. That's why I was trying to put it into layers, so that people could see those layers and see which areas, like consistency and performance and failover and whatever else, are affected by each layer.

Well, Alex, I like it. I think it's a good update and a good direction. I know we haven't gotten a response from Quinton in a couple of weeks, and it might be worth hanging back a bit to get some more direction before we dive into it further. But if we approach this from the top-down perspective, I think it's more modern, more valuable, and more oriented towards cloud. And if the white paper can be succinct enough while approaching it top down, I like that direction. What I'm worried about is just having too much information and too much to discuss in the white paper, where it gets watered down or it's just not possible to get it done in any reasonable timeframe. So my only concern is scope creep, if we approach it the way you're discussing while also including the information Quinton's putting in there.

I agree. I mean, I think it's probably gonna take me two or three weeks to flesh out some of the layers. But once we have that, structurally we could have examples of file, block, object, database, et cetera, at the front, and have the layers as just a way of defining the terminology, perhaps structured like a glossary or an appendix, if we need to show the product examples first. I was just getting really confused trying to work through the product examples and figure out how to talk about them without talking about how all the different layers can influence what that thing can do.
Yeah, and I agree with that too. I think it's the right place to end up, because the reality of each implementation of any of these higher-level storage services is that there's lots of complexity underneath it, and it impacts the actual service. So I see how you want to end up there. Yeah.

All right, so what I'll do is I'll email Quinton, and obviously if anybody in the group has any feedback, I'm more than happy to take it; just feel free to comment on the document as well. And I'll try to have more of it done for the next meeting.

Okay, excellent. Thanks, Alex. That's great work; I appreciate you jumping on that. All right, so we've got about 20 minutes left. Does anybody have anything else they wanted to chat about? I'm also interested in other topics folks may want to hear about. I did have the SNIA group reach out; I think they possibly wanted a spot to chat about Redfish, so that might be an interesting one coming up. I know I see you out there, so I don't know, Portworx or Alex, like StorageOS, if you guys want to do an update on what's going on and what you're seeing, that could be another interesting one to get on the agenda. Anybody have any other interesting things, or things they want to see covered, or groups we should be reaching out to for presentations?

Actually, that might be an interesting prospect. Obviously Portworx and StorageOS are busy developing CSI implementations, but we're also contributing to a number of the CSI sanity tests to automate driver testing and that sort of thing. So I might ping the guys at Portworx and see if we can get a session, perhaps in a month's time or something like that, to talk about where we are with CSI and practical implementations and that kind of thing.

Yeah, that could be interesting.

Sorry, just a specific question: were you asking whether there are any other interesting projects we should discuss here?

Well, I think there's a few things. One, I think we're all working together on CSI updates and implementations and things to help CSI, so that might be interesting for the storage group. But another thing is: where's Portworx at? Where's StorageOS at? What are we seeing in the industry? What are your customers actually interested in at this point with storage and cloud native? That might be another interesting topic. And then, other than that, what other storage projects should we be reaching out to that we wanna hear more about in our forum here?

Got it, that's a great set of questions. Let me think about that and maybe send it by email.

Fair enough. Okay, does anybody have anything else? All right, we'll call it a meeting. Thank you, everyone. Take care. Bye. Bye.