We're going to recap some of the things that have happened over the last six months and hopefully go into a little bit of what we expect to be coming. I'm Sean McGinnis, the PTL for Cinder. Hi, I'm Jay Bryant, Cinder subject matter expert for IBM; I've been working on Cinder since Grizzly and I'm a core member. Hi, I'm Xing Yang from EMC; I'm a core reviewer in Cinder and Manila. Sorry, I can't see any of you.

So I'm assuming if you're here you probably know what Cinder is, but just to make sure everyone has the base: Cinder is the OpenStack project for persistent block storage. It is not a storage service; it does not do the actual IO itself. It allows you to manage different storage backends through a nice abstract interface, allowing cloud users to just consume resources. It's a plug-in architecture: we have many drivers, and this allows you to configure Cinder to use whatever supported type of storage you want on the back end. We don't dictate what storage device actually provides that IO and that persistence. So we will be going through some of the key features that were implemented in the last six months. This isn't everything that was done, but most of the key things that we thought you might find interesting. With this, I will hand it off.

So I will talk about replication. In Liberty, we added support for replication v2.0. That was an admin API that supported volume-level replication. But when we tried to implement it in drivers, we ran into some problems. One problem was confusion over the definition of managed versus unmanaged replication support. The other problem was concern over what to do if we replicate some volumes but leave the other volumes behind. So at the Cinder Mitaka meetup, we decided we wanted to take a step back. We wanted to look at how to address the DR (disaster recovery) scenario first, and then come back and look at how to allow a tenant to have more control over replication with finer granularity, that is, for a tenant to be able to replicate a group of volumes, things like that.

So in Mitaka, we disabled v2.0 and added support for replication v2.1. That is an admin-facing API that allows us to fail over an entire backend when disaster strikes. We have three admin APIs. The first one is failover host; it allows you to fail over one backend to the replicated target device. Then there is freeze; freeze basically puts the backend host into a read-only state. That means the volumes are still accessible, but you can no longer create new resources or delete resources until the admin issues the thaw command. And thaw just puts the backend back to normal, so you can create and delete resources again.

We need to make some configuration changes in cinder.conf. There's a replication_device option; it's a multi-valued dictionary option, and the admin can use it to specify multiple replication target devices. There is one standardized, required key, the backend ID; the other keys are vendor-specific. We also need a volume type with the extra spec replication_enabled set to true. And in capabilities reporting, the driver needs to report replication_enabled and replication_targets.

So the flow is like this: the admin configures the backend for replication and then creates a volume type that supports replication. Since we support failing over the whole backend, it's possible that you don't need a volume type with replication enabled, because you may want to replicate all the volumes. Then the tenant creates a volume with replication enabled, and disaster strikes, so the admin fails it over. Now the driver points to the replicated target device. After failover, all the volumes that were replicated are still accessible to the tenants, but the volumes that were not replicated are not accessible.
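To make that concrete, here is a minimal sketch of what the configuration might look like. The backend section name and the vendor-specific keys after backend_id are hypothetical; backend_id is the only standardized key, and replication_enabled is the standard extra spec mentioned above.

```ini
# cinder.conf: illustrative backend section (section name and the keys
# after backend_id are hypothetical; they vary by vendor driver).
[backend_a]
volume_backend_name = backend_a
replication_device = backend_id:target_1,san_ip:192.168.1.20,san_login:admin,san_password:secret
```

And a volume type that requests replication, using the standard extra spec:

```console
$ cinder type-create replicated
$ cinder type-key replicated set replication_enabled='<is> True'
```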
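The admin-side flow, sketched with the cinder CLI, might then look like this (the host@backend name is a placeholder, and freeze/thaw are the optional read-only guard described above):

```console
$ cinder failover-host cinder@backend_a   # point the backend at the replication target
$ cinder freeze-host cinder@backend_a     # read-only: no create/delete until thawed
$ cinder thaw-host cinder@backend_a       # back to normal operation
```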
So we have a few drivers that added replication v2.1 support: Dell Storage Center, EMC VMAX, HPE 3PAR and LeftHand, Huawei, IBM Storwize, XIV, and Pure. SolidFire also added it in Newton.

So, backups. We already have full and incremental backup support and non-disruptive backup support in Cinder. In Mitaka, we added support for backing up a snapshot. If a driver has implemented the attach snapshot interfaces, then we will attach the snapshot, back it up, and detach the snapshot. If the driver has not implemented the attach snapshot interfaces, by default we create a temporary volume from the snapshot, attach the temporary volume, back it up, detach it, and then delete the temporary volume.

We already have quite a few backup drivers. In Mitaka, we added a backup driver for Google Cloud Storage. This driver, along with Swift, POSIX, NFS, and GlusterFS, all inherit from the chunked backup driver, which stores data in chunks. That means this new Google Cloud Storage backup driver can also support incremental backups of volumes created on any backend.

We also have a feature to decouple Cinder backup and volume nodes. Previously in Cinder, the backup service had to run on the same node the volume service was running on, which meant you could not really scale backup, even if you added more backup nodes. Now, with this new feature, we can run the backup and volume services on different nodes, so we can improve performance and add more nodes to scale the backups.
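As an illustration of the Google Cloud Storage backup driver just mentioned, a minimal configuration sketch might look like this (the bucket, project, and credential values are placeholders; check your release's documentation for the exact option names):

```ini
# cinder.conf: illustrative settings for the GCS backup driver
[DEFAULT]
backup_driver = cinder.backup.drivers.google
backup_gcs_bucket = my-cinder-backups
backup_gcs_project_id = my-gcp-project
backup_gcs_credential_file = /etc/cinder/gcs-credentials.json
```

```console
$ cinder backup-create --incremental --name nightly <volume-id>
$ cinder backup-create --snapshot-id <snapshot-id> <volume-id>  # snapshot backup, if your client supports it
```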
Today, we can only support active-passive HA for cinder-volume. This is because there are local file locks in the volume manager, and there are also non-atomic state transitions in the APIs that may cause race conditions. So there has been a lot of work going on trying to address this, trying to support active-active HA with cinder-volume. In Mitaka, we added support for distributed locks based on the Tooz abstraction layer, and you can use Redis or ZooKeeper as a Tooz backend.

We also merged the spec to remove API races; the work to implement the spec is in progress. In the Cinder API, there are a lot of places where we first check the database: we look at the status of the resource, and we also check other conditions. For example, we check whether the volume is attached or not and whether the volume has snapshots, and we decide whether to go ahead and start an operation. If all the conditions are met, we go ahead and make a change in the database, change the status of the resource, and start the operation. However, there is a window between the time you check the database and the time you make the change, and that can lead to race conditions. So the proposal is to use compare-and-swap to do atomic conditional updates on DB models and versioned objects; if there are deadlocks, we just retry. We want to make sure that we only update the database when all the conditions are met.
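Here is a minimal sketch of that compare-and-swap idea, in plain SQLAlchemy against a hypothetical volumes table (this is not Cinder's actual code, just the shape of the technique): the condition check and the status change happen in a single atomic UPDATE, so a concurrent request cannot slip in between them.

```python
from sqlalchemy import create_engine, text

# Hypothetical demo database and schema; Cinder's real models differ.
engine = create_engine("sqlite:///demo.db")

def begin_delete(volume_id: str) -> bool:
    """Move a volume to 'deleting' only if it is still in a deletable state."""
    with engine.begin() as conn:
        result = conn.execute(
            text(
                "UPDATE volumes SET status = 'deleting' "
                "WHERE id = :id AND status = 'available' "
                "AND attach_status = 'detached'"
            ),
            {"id": volume_id},
        )
        updated = result.rowcount
    # updated == 0 means the conditions no longer held: we lost the race,
    # so the caller should not start the delete operation.
    return updated == 1
```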
In Cinder, one volume node can support multiple backends; however, one storage backend can only be managed by one volume node. There is no concept of a cluster that allows a group of volume nodes to manage the same storage backend. So the job distribution spec proposes to introduce this concept of a cluster and to modify the concept of host, so that we allow the API and the scheduler to distribute jobs to a group of volume nodes. If one node goes down, another node in the cluster can still carry on and manage the volumes.

There's also a cleanup spec that proposes to add a workers table. In the workers table, we will have a resource type (the resource could be a volume, a snapshot, or a backup), an ID of the resource, and an ID of the node that is working on that resource, to make sure cleanups are done properly when there are failed jobs.

In drivers, some drivers have local file locks, and those may be a problem for active-active HA. It is up to the driver maintainers to decide whether those local file locks need to be replaced with distributed locks. And in the volume manager, there are a few operations that use local file locks: attach volume, detach volume, delete volume, and delete snapshot. The proposal is to use a locking mechanism based on the workers table added by the cleanup spec, and to use compare-and-swap to make sure we are managing the locks properly on that workers table.

If one node loses its connection to the database but is still running, it may continue to work on a job that has already started, but another node in the cluster may think that node is down and try to do cleanups, and that will cause problems. The data corruption prevention spec is trying to address that problem, trying to prevent multiple nodes from accessing the same resource at the same time.

os-brick is a subproject under Cinder. It has the initiator code that is used by both Cinder and Nova for attaching and detaching volumes. In Mitaka, we added the brick cinderclient extension library, which also uses os-brick. We use that to allow us to attach and detach volumes without Nova, and that makes volume operations with bare metal a possibility. In Mitaka, we added a CLI for get-connector; there are two other CLIs, local-attach and local-detach, which are still work in progress.
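A quick sketch of what that standalone use looks like (the package is python-brick-cinderclient-ext; get-connector was the command available at the time, and local-attach and local-detach are shown only as the work-in-progress commands just mentioned):

```console
$ pip install python-brick-cinderclient-ext
$ cinder get-connector            # report this node's connector info (initiator, IP, and so on)
# Planned, still in progress at the time of this talk:
$ cinder local-attach <volume-id>
$ cinder local-detach <volume-id>
```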
So now I'll hand it over to Jay. Thank you, Xing. So something to comment on here with Cinder is that we've been trying to not let perfection be the enemy of progress. For instance, the active-active HA functionality is something we've been incrementally working on implementing and improving.

Another example of that would be multi-attach. This has been a request from our customers for quite some time. Why? Well, at the moment, if you go and try to attach a volume to more than one node, more than one instance, it says: sorry, it's already connected, we can't do that. So we want to make it possible for you to have multiple instances reading from the same volume, either read-write or read-only, depending on what your back-end storage supports. That way you can have workloads running on multiple instances accessing the same data, either for high availability or, if it's doing shared processing, so that one of your instances can hit a fault without affecting your running application.

With multi-attach, we're able to do that now. This has required changes to the cinder client, to change the checks as to whether we can attach when a volume is already connected to an instance, and on the server side as well to work with that; we were able to get those in in Mitaka on the Cinder side. Now the next challenge is Nova, getting it to recognize that Cinder is now able to support that functionality. It's a little more complicated than we hoped, because when you detach, Nova needs to know, depending on which back-end storage system you're using, whether it needs to detach it from the instance or not, and obviously that depends on whether it's still attached elsewhere. So we're working hard to get the rest of that functionality in, so that we can hopefully see this during Newton. And again, we're working on getting the volume drivers to report up which ones do support multi-attach.

Rolling upgrades is another one; I think we've got a pretty good stab at this in place with Mitaka. This is something that has been requested as we move towards more big, enterprise-level functionality. Right now, if you want to upgrade from one release to the next, you need to take all of your services down for Cinder, and that can take more time than our users want. With rolling upgrades, we make it possible to take down one service at a time: you can upgrade your API service, you can upgrade your scheduler service, and then you can move on and do the volume service, and make sure that each of those pieces works in your environment as you're doing the upgrade.

So we've basically had to enforce backward compatibility from one level to the next. Mitaka will be backward compatible with Liberty, and we use the Oslo versioned objects functionality to do that, kind of following what Nova has been doing as well, to achieve the ability to do rolling upgrades. We make sure that Mitaka is backward compatible with Liberty and, obviously, with Newton we'll do the same thing. And we've made it so that the database also has to be backward compatible: if you're doing something that's not backward compatible, it's going to have to span multiple releases, so that people are able to get to a point where it will be backward compatible. And we don't allow any ALTER or DROP DB operations during the upgrade.
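As a rough sketch of what that looks like operationally (service names vary by distribution, so treat these as placeholders; upgrade the code for each service before restarting it):

```console
# One service at a time, verifying the cloud still works between steps:
$ systemctl restart cinder-api        # after upgrading the cinder-api code
$ systemctl restart cinder-scheduler  # after upgrading the scheduler code
$ systemctl restart cinder-volume     # after upgrading each volume service
```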
Microversions is something we've been really working towards, and we're excited that we're getting there on that functionality. Previously, and I'm sure the operators here know this, moving from one version of the API endpoint to the next has been a challenge to make happen. Because, well, if we were on v1, we were comfortable; we knew it worked. We wanted that new functionality in v2, but were we ready to go there yet? So we're trying to make it easier for you to make that decision to get the latest functionality, and not have it be a major step. We went from v1 to v2, and we hopefully have gotten everybody there. It's taking time. Do we still have anybody running v1 out there? Oh, we do. OK, all right. So this will hopefully, as we move forward, get you to the point where you're more comfortable upgrading.

So with microversions, what it does is negotiate between the client and the back-end server as to what functionality it will support. The client sends a header and gets a response back, and with that we're able to determine whether we can support function x, y, z that you want to run in the API. That way, we're able to incrementally add new functions without having to go to a whole new endpoint version. We were hoping to do it without having to move from v2 to v3, but that wasn't possible, given that even the simple implementation of microversions was considered a significant change to the API. So we added the v3 endpoint, but it's functionally the same as v2; it just adds in the support for microversions. It will then add new functionality incrementally as we add new microversions for new API features.

For background here, these links will be out on the slides. If you want to better understand microversions, there are the specs from the API working group; this is a movement that is not just in Cinder but across all the OpenStack projects. Then there's an explanation of how we've implemented it in our microversion spec. And then there's a good article by Sean Dague about the experiences Nova had, how they're using it, and what the process of getting there was like. We've used those experiences and learned from them in our implementation for Cinder.
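Roughly, that negotiation looks like this on the wire (the endpoint, project ID, and token are placeholders; the header follows the API working group convention, and the response carries the same header to tell you what the server supports):

```console
$ curl -s http://controller:8776/v3/$PROJECT_ID/volumes \
       -H "X-Auth-Token: $TOKEN" \
       -H "OpenStack-API-Version: volume 3.0"
```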
And finally, the last thing I'll cover here is some updates to the Fibre Channel drivers. We had a lot of comments that working with the current zone names was difficult: they prepended "openstack" in front of the WWPNs for the target and the host, all just run together, and that made it more complicated to debug and to know what zones you were really using. So now, for your initiator-target zones, you've got your host name and host WWPN in there, along with the storage system, so it's easier to figure out what zone you're really using. For those of you using Fibre Channel, that should make life a little easier.

And then with Brocade, we added the ability to do virtual fabric support. They have the physical networks, and under that they can partition further into virtual fabrics. Before Mitaka, you weren't able to control those with Cinder; you weren't able to use those zones. But now you can go into the cinder.conf file and add the ID of the virtual fabric that you want to use, and the zone drivers will then be able to use it. And it supports doing multiple zones. So that's a good enhancement for our Fibre Channel users.
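A hedged sketch of what that zoning configuration might look like (the fabric section name, address, and credentials are placeholders, and the exact option names, in particular the virtual fabric ID option, may differ by release):

```ini
# cinder.conf: illustrative Brocade FC zone manager setup
[fc-zone-manager]
zone_driver = cinder.zonemanager.drivers.brocade.brcd_fc_zone_driver.BrcdFCZoneDriver
fc_fabric_names = fabric_a

[fabric_a]
fc_fabric_address = 10.0.0.50
fc_fabric_user = admin
fc_fabric_password = secret
fc_virtual_fabric_id = 2
```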
And that's all I'm going to cover. I'll hand it back over to Sean to talk about Newton and where we're hoping to go. Thanks, Jay.

A couple more things on Mitaka yet. Just want to point out that we have some more back-end volume drivers added: Coho, Disco, Fujitsu, Nexenta, and Tegile. You'd think at some point we'd have them all implemented, but we keep getting a few more. So now we're up to over 70. I hate to say an exact number, but I think the last time I checked it was 78, getting pretty close to 80. So here's a list of everything. Hopefully I got everything; if I missed a vendor, I apologize. There's a lot there. We tried to denote the protocol, so if you're looking at implementing storage for your OpenStack deployment and you have a certain protocol you're looking for, hopefully this is useful for you, and you can see what might be an option as you go out shopping. Or if you have existing infrastructure, hopefully whatever you're using is part of this list.

So what's coming in Newton? I have a disclaimer on the next slide, but before I even show this slide, I want to say this is all stuff we're just talking about. Some of this we do know with a pretty high level of confidence that we will implement, but some of it we're still discussing. The latter half of this week is all the design summit, so there will likely be changes to some of our plans over the next few days. With that said, this is just what we think we're going to be doing in the next release.

So, we pointed out we have replication v2.1. We really simplified things to try to clearly address one use case well before we tried to address every use case. That was kind of the issue before: everybody had a slightly different idea of what they wanted to accomplish with replication, and that made it difficult to actually come up with something usable. So now we have that base. Generic volume groups are partially related to that, and group replication (Tiramisu is our code name for it) will be building on that foundation of replication, now that we have a base level of functionality and multiple vendor support. We want to add new stuff on top of that to enable more use cases.

There's still work with rolling upgrades. We're calling it a tech preview or a beta; I think we might have found one or two little things yet. The support merged already in Mitaka, but there hasn't been a lot of time to try different deployments and different scenarios, so we're going to continue working through that and make it as solid as we can. The HA work we've heard several people asking for, but that's another one where we've got a lot of the building blocks in place, and it's going to take us a little while before we get everything done and well tested, to make sure it's solid. Microversions: again, the foundation is in there; we still have some cleanup work to do, moving things around, making it easier, as we add more functionality, to expose it through microversions.

Extend attached volume: we've got some work in there ready for that. This is another one that takes some coordination between Cinder and Nova. It seems like a simple thing: with most vendors, you right-click, say extend, and give it a new size. Well, if you actually want that to work for an end user going through Horizon or the CLI, you need some coordination between the block storage and compute: you need to rescan the device on the Nova compute node, and you need to do things in the guest. So there's some work there.

Ironic: things like the os-brick client extension. We're looking at ways we can make Cinder usable beyond just Nova, and things like Ironic are one of the areas where you can use Cinder as more of a general software-defined storage control plane and do things without having to have extra services involved.

And the one I'm really happy to see coming, that we've talked about for a long time and are finally able to get some traction on, is better async operation error reporting.
A lot of times right now, you try an operation and for whatever reason it fails, and you've got to call up your admin, and they've got to go look in the logs and trace through, and maybe they have to change their log level and restart the service; and since we don't have HA right now, that's not the ideal scenario. So what we are hoping to have is an easier way where, if something goes wrong, there's a way we can report back and you can find out what happened without having to go through all that hassle.

All right. So most of these things were covered in the release notes, and if you go out there, the release notes are very detailed now. Great work by our infrastructure folks: the release notes management and the way we capture release notes as we go has really helped, I think. They're very detailed, hopefully not too detailed, but everything you need is there, with different sections on upgrade impact, new features, bug fixes, things like that. I definitely recommend taking a look at that first link to see what's out there.

Launchpad is where we do all of our bug tracking and blueprint management. So as end users, I would ask: if there's anything you see that doesn't look right, go to launchpad.net/cinder and file a bug. Worst case, we can say, no, you messed up, something wasn't right there. But unless we know about the issues people are running into, we can't fix them. So anything you see, feel free to go out there and file a new bug. If you're interested in what's coming up, there's the blueprints section; there will be a link across the top for blueprints. That's where folks go to submit a blueprint for new features, new functionality they want to add to Cinder. Most of the time when you file a blueprint, it's because you plan on actually writing the code for it, but you could certainly file a blueprint for something you'd like to see and then follow up with us; there are plenty of folks in the community who are just looking for something to work on. So if you're not a coder and there's something you want in Cinder, put your thoughts out there, and I'm sure someone would probably want to pick it up.

Our Cinder wiki has quite a bit of information, and it's a little more developer focused, so as an end user I don't know if there's anything too useful there, but there's certainly a lot of information. And then finally, the Cinder YouTube channel. Starting in Vancouver, a couple of summits ago, with the summit and the mid-cycle design sessions, we actually started recording those and streaming them on YouTube. So this week you'll probably be able to stream live and see what we're talking about, but they're also captured out there, so you can go back and see, and we can also go back and see, what we said at the time, which has come in handy a few times. So, with that, I'll ask Xing and Jay back up and open the floor for Q&A.

Good question; I'm not sure if I misunderstood or not. You guys are not supporting Ceph replication; are there plans for that? RBD? Yeah, sorry, I'll recap once he's answered here. Okay. So it's being looked into. There are no definite plans yet, and maybe no code started, but it's something they need to look into, to see how they would support it in Ceph. Okay, thanks.

My question might be misplaced, but just for the upcoming stuff: I know previously there was some work done in terms of integrating Glance and Cinder a little bit more, so we could have Glance images stored as Cinder volumes and then eventually boot Nova instances directly off those volumes in sort of a copy-on-write mode, kind of like the way Ceph does it today. It wasn't mentioned in the upcoming stuff; is that maybe because most of the work to do that is in Glance and Nova? I believe the Glance team has implemented some of that support already; I'm not sure of the latest there. On Cinder, okay. So I actually don't know the exact status of the blueprint, but I know that one person was working on that actively last time; I don't know if it finally got merged in Mitaka. I'm sorry? It did get merged, okay. Okay, so is that functionally complete now? Does anybody know? I'd say one thing on the Cinder side related to that: it was actually in Liberty that we implemented caching, image caching, at the Cinder level. So there's still that performance hit the initial time you use an image, where it has to copy it over and convert it, but hopefully, if everything's working well there and there are no changes, then you should be able to use that already-copied image and just boot up another volume. Yes, yeah. Cinder and Glance, yeah, later this week; I'd have to look at the schedule.

So, the allocated snapshot API; sorry, in a snapshot API, do you have support for providing allocated blocks, incrementals? So, for getting back the size of that snapshot, is that your question? No. Let's say that I had a fresh Cinder volume, and I wrote some data and then took a snapshot. The snapshot is probably the size of the volume, but I don't want to get all the blocks in the snapshot that weren't written to. So I think in that case it's going to depend on your backend storage and how they've implemented it. You know, like with Storwize we have FlashCopy, so it will only do an incremental based off the base image, but we do have some drivers that don't support the ability to do that, and they do a full copy of the volume for the snapshot. So it really just depends on what your backend storage is and how they've been able to implement it. So there's no plan to get that in the API? If you're asking about the API, we do not have an API for that. You basically want the changed blocks returned through an API; would you not have that exposed?

Okay, my question would be that Nova supports an API that allows efficient polling of resources, so you can get back the servers that have changed after a certain point in time. Are you planning to support that kind of efficient API polling, or notifications, from Cinder? There was a spec, I thought; someone was looking at being able to query just changes since, or older than, a certain time. Yeah, that's what it is. Oh, it's kind of a cross-OpenStack thing. Yeah, I haven't seen any patches recently for that, but I believe people are talking about it. Thank you.

Hi, one question. Do you know if there's any work going on to make it possible to share volumes between tenants, somewhat similar to what you can do with a Glance image? Public, like we have the public tenants, or what? Like a public image, something like that, or more specifically between a group of tenants. So, share volumes, like make a volume kind of shared. Yeah, transfer is one thing, but...
So we have the nested quota thing, right? That's one way you can kind of share: if you have a parent project and you have children projects, you can make them share the same resources. That's one possibility. But we don't have... I think there is a spec that's been there for a long time, but we had a lot of discussion on whether to support it; I think it's public snapshots, something like that. But we didn't really implement it. And now we have nested quotas, so I think that should be able to accommodate what you want to achieve. Okay, thank you. And actually, that's a good note, just on the nested quotas item: people tried to use that in the past, and we found out it wasn't working as expected. So we got a lot of good improvements into Mitaka for that. It's, again, an ongoing work in progress, but it's functioning much closer to what the spec says. So, sorry, is there another question?

A question on events management: whenever there is a creation of a volume or an attach of a volume, is there any way that we can manage it by events? Notifications, right? So notifications, we already have. If you're asking for a notification when we create a volume: we will send notifications when we create a volume, when we delete a volume, and all of that. We have that already. So if that's what you're asking for, it is already there, and Ceilometer can actually pull the results and use them. Is that what you're asking for, or are you asking for more than that? Actually, I was asking about the instance of the volume, whatever is created, the complete instance of the volume: whenever we create a volume, will there be a complete list, like what kind of volume it is and all, in the notifications we get? Yeah, so when we create a volume, we do send a notification, and it does show the details of that volume. That is there, yeah.

So my question is on the freeze thing, when we do the replication. When we do the freeze, we are not allowing any operation on that particular volume, since it is read-only; so does it affect things for some time? So with freeze, if the volume is failed over, that particular volume, you can still use it; it's just that you cannot create any new resources or delete resources. So you can still kind of read it or write to it. So can I still take a clone of it during the copy? I'm sorry? Can I clone it? Those are kind of disabled, all of those other operations. But you can do a thaw, and then... basically this is just a kind of protection, so the admin can make sure that everything's in good condition. After that, you can do it, right? After the thaw, you can do it. Okay, yeah.

Do you foresee any challenges between cross-platform backends? So, for example, if I have a NetApp backend on one side and am trying to do HA to another Cinder driver, do you see challenges there, or does it just work based on the current API support? What exactly are you trying to do between those two backends? You can move from one backend to the other; you can do that, kind of generic... But for HA, or active-active HA, do you think... You mean to have two different drivers managing the same volume? I don't think we can do that, because the volume is kind of specific to your driver, right? The volume is hosted on your driver. You can't say, okay, it's hosted on a NetApp storage and also on an EMC storage; we don't have that capability. Unless you say, okay, I back it up in one place; that we can do, but... Right, so you back up on a clone and... Yeah, backup is on a different device, where you back it up to Swift or something else. That's a different backup device; that's not your volume driver backend, right? So two volume driver backends, if those are completely different, then we cannot share between them. But I remember seeing someone trying to ask whether you can make a call from one driver to the other, and there's a blueprint or something out there. But right now we can't really do that.

So I've noticed with iSCSI and LVM, if you push it really hard, it just kind of falls over and dies. Is it actually considered a real driver, or is it more of an experimental tool? It is a real driver. I guess all of the other drivers have vendors that are supporting them, so unfortunately I think that means the LVM driver sometimes doesn't get as much attention as it should. But if you're seeing issues with it, I had the link for Launchpad; if you could file a bug and attach any kind of logs that you have, that's something we definitely want to fix. Yeah, thanks. All right, thank you.