The series evolved from sessions that the PTLs held at each summit to give updates on their projects, and we converted them into webinars to extend that reach beyond the summit. With us today, we have John Dickinson, PTL for Object Storage, and Robert Collins, PTL for Provisioning. Each one will speak for 15 to 20 minutes, and then we can take your questions in between and at the end. So with that, I will get going and pull up John's presentation. John, when you're ready, we're ready for you.

Thank you very much. Happy to be here, and I'm glad that we get to participate in this webinar series. One of my favorite parts of every summit, until we started this webinar series, was always giving the State of the Project talk and being able to basically brag on the community and show what's going on, and doing that in the webinar series is becoming the new way to do that. It's really exciting. So thank you very much. My name is John Dickinson. I'm the Project Technical Lead for OpenStack Object Storage. You can find me online: my Twitter handle is @notmyname, you can send me an email at me@not.mn, and I'm in various other places. So today I want to talk about what we've been doing since the Icehouse release, what you can expect in Juno, and what's going on there. So next slide.

The basic reality that we're all familiar with is that storage is a problem and it has to be solved. There is a ton of data that we're all generating and that our users are all generating, and the applications that we're writing are now cross-platform, as in moving across different devices, and users are demanding that their data be portably accessible from their desktops, from home to the office, to phones, to tablets, to all of these things. All of these new applications, in addition, are just generating a tremendous amount of storage with documents, videos, and user-generated content. It doesn't matter where you are or what you're writing, this is a problem that has to be solved, so we have to have a good solution for it. Next slide.

The change that's happened here is that, when confronted with this massive problem of storage growth and these storage requirements, we realize we have to change a little bit about how storage is provisioned and how it is deployed in an organization. It used to be that you'd write an application and provision some storage for it, then the next application comes along and you do the same thing, and that gives you a lot of siloed storage. That makes it a little bit tricky to take advantage of the economies of scale of pooled storage; silos are hard to provision and use effectively. So what we need is some sort of scalable infrastructure to deal with that. And ideally, you want to be able to do that in a way that doesn't have a particular hardware lock-in, so you can change and grow and adapt over time. Next slide.

The other thing we realize is that we also need a lot of agility with our storage. From the ops perspective, you want to be able to automatically work around failures. You want to be able to do seamless upgrades. You need to be able to do capacity adjustments on the fly. And then, for the organization overall, you have to be able to match the storage to the use case. When you have a use case that says this is going to be high throughput, or this is going to be large volumes of data, your storage system is going to work best when you're able to actually configure your storage to specifically solve those problems.
And so what we need is not to make the applications worry about what particular hard drive you're storing data on, or how to actually store the data effectively to protect against operational error, hardware failure and things like that. You need a system where your storage can decouple your data from the actual storage media it is on. This is what Swift does. This is why Swift is useful and why you get such great economies of scale and throughput and efficiency when using Swift as a storage engine for your growing unstructured data. So next slide.

So overall, as a summary, what does Swift provide? Looking at three different audiences: if you're the enterprise IT organization, what you get is a shared pool of storage with a flat namespace that allows you to consolidate your storage requirements and take advantage of the cost savings you can get from that. The lack of lock-in keeps it very flexible as far as how you're going to grow your storage going forward. Now if you're the ops person, then your day-to-day operations are smoothed out by running Swift. You're able to do things like automatic failure handling, where failure is a normal and expected thing that Swift automatically works around and adjusts for, as well as smooth capacity adjustments, meaning that you can change the amount of storage provisioned, both up and down, without having to turn off the system or have any sort of end-user client downtime. And then you're able to do in-place rolling upgrades of your entire cluster, again without having any kind of client downtime or interruption of service. These sorts of things are built into the system, and for someone who is running, operating and deploying Swift, they make the day-to-day operations easier. Now as for the application use case, the end-user applications that we all know and love and use to actually take pictures and upload them someplace, to watch videos and things like that: Swift takes the hard problems of storage away from the application. Those hard problems generally all have to do with scale. Where do you put stuff? How do you manage access to it? Since Swift offloads those hard problems away from the application, the application developers can think about making the app awesome, and not have to worry about dealing with the nuances of storage or wrestling with the limitations of a particular protocol or the storage media itself. So those are the things that you get from running Swift, from the application perspective, from the enterprise perspective and from the ops perspective.

So what's going on in Swift? What have we been working on, and what are we looking at working on over the next several months as we finish up the OpenStack Juno cycle? There are a couple of major things that I'm going to talk about, and these are storage policies and erasure codes in Swift. Next slide.

Storage policies, I think, are the biggest thing that has happened in Swift since the whole project was open-sourced. Storage policies allow you to exactly match your storage to your use case and organize your storage. In a nutshell, what storage policies give you is that, given the global hardware footprint of your Swift cluster, you're able to choose what subset of hardware is storing your data, and then how the data is stored across that subset of hardware.
If you know something about the internals of Swift, storage policies allow for multiple object rings in the same cluster. So let me go into a little more detail about how that actually works and some of the use cases, to really bring that home. First off, storage policies introduce a slightly new API element, and it's extremely simple. Next slide.

Storage policies are configured by the deployer, the operator of the cluster, and so a cluster operator may create and provision, say, a gold, silver and bronze storage policy, or maybe the tuna fish and cupcake storage policies, or whatever else they want to do. And then when the client creates a new container inside of Swift, the client will send one extra header, the X-Storage-Policy header, giving the name of the storage policy they want that container to use. And that's it. Nothing else has been changed in the API to support this, aside, I guess, from some reporting information, which means that all existing clients continue to work with Swift seamlessly and there are no backwards-incompatible changes here. And all new clients that want to take advantage of this functionality now have the ability to easily create new containers in different storage policies.

So let's talk a little bit about the use cases on the next slide, and how I see storage policies already being used, how they've been talked about being used, and maybe some possibilities for the future. There are a few really interesting things that I can think of for storage policies. Well, I guess to back up just a little bit: remember, storage policies allow you to choose your subset of hardware and how to store your data across that subset of hardware. So first, with that first part, you can choose your subset of hardware based, for example, on a geographic region or a particular performance tier, you know, SSDs versus spinning hard drives. And when you choose how you store your data across a particular set of hardware, you can choose the replication policy that is used. By default, we normally recommend that people use a triple-replication scheme inside of Swift, but maybe that's not right for every use case. Perhaps you want something that is just two replicas. And I think this is the first obvious example of where storage policies become really useful. Imagine you're doing video transcoding. You've got your gold-master image that's, you know, multiple gigabytes, stored in your Swift cluster, and it's really important that you keep that with both high durability and availability, so you want to put it in a triple-replicated policy. But if you're transcoding that video and you want to make it available, for example, on different mobile devices, at different bit rates, things like that, it's really nice to keep those copies around so you don't have to recompute them on every request. And yet if you lost a particular transcoded copy, it's not the end of the world, because you can recreate it. So in that case, you can save some money, especially on your storage costs, by provisioning a reduced-redundancy storage policy that just uses two replicas. In that way, you can store all of the transcoded video, or perhaps your image thumbnails or something else that's, again, recreatable, in a reduced-redundancy storage policy.
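To make the client side of that concrete, here is a minimal sketch; the endpoint, account, token and policy names are illustrative, but the X-Storage-Policy header is the mechanism described above:

```
# Create a container that uses a reduced-redundancy policy. Only the extra
# X-Storage-Policy header differs from a normal container PUT.
curl -i -X PUT \
  -H "X-Auth-Token: $TOKEN" \
  -H "X-Storage-Policy: reduced-redundancy" \
  https://swift.example.com/v1/AUTH_demo/transcoded-videos

# A container created without the header keeps working exactly as before and
# lands in the cluster's default policy.
curl -i -X PUT \
  -H "X-Auth-Token: $TOKEN" \
  https://swift.example.com/v1/AUTH_demo/gold-masters

# The policy a container uses is reported back when you HEAD the container.
curl -I \
  -H "X-Auth-Token: $TOKEN" \
  https://swift.example.com/v1/AUTH_demo/transcoded-videos
```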
And so this is something that I expect people to very quickly pick up and deploy, and it's something that you could use today with storage policies in Swift. Another very obvious case with storage policies is taking advantage of being able to choose that subset of hardware and designating it either on a regional basis or potentially on a performance basis. On the regional side, you can imagine that if you've got a globally distributed Swift cluster, which you've been able to do for about a year now in Swift, you could have an East Coast and a West Coast United States location, and also perhaps an EU and an Asia location. What Swift has done up until this point is simply distribute your data, with as many replicas as you have configured, across that entire global footprint. But not all data needs to be accessed equally in all places, and some data only needs to be in certain locations, because of potential laws and rules and regulations and things like that. So one thing you can do is configure different storage policies to designate different regions. For example, you could say: I've got a set of branch offices, or maybe edge locations throughout the world, and I need to make sure that I have a copy of my data on the edge. Say I have an office in New York and an office in LA, but my headquarters is in Denver. In that case, maybe you have data that is stored and accessed just in New York, and different data that's stored and accessed just in LA. But you can create a storage policy such that the data stored in New York will also always have a copy in the home office in Denver, and likewise the LA office will always have a gold-master copy, so to speak, in the home office in Denver. What this means is that the data pertinent to a particular geographic location, New York or LA here, lives in that location, which gives you good locality of access, increasing throughput and reducing latency. But it also means that you have your copy back at the home office in Denver, without having to go all the way to LA to store and access that data every time you need it. So it really gives you much more efficient control over where in the world your data is placed. Obviously, you can imagine some other things, like saying I need certain sets of data that, for example, only live inside of Germany and never leave, never traverse the Atlantic and come to the United States, or something like that. So those are some examples of how you could carve up your Swift cluster to enable storage policies across different geographies. And you could clearly do the same thing using different performance tiers or SLA requirements on different kinds of hardware.
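On the deployer side, here is a rough sketch of what the two use cases above could look like in a cluster's configuration; the policy names, part power, IPs, ports and weights are made up for illustration:

```
# swift.conf: define the policies (policy 0 is the existing default ring)
[storage-policy:0]
name = gold
default = yes

[storage-policy:1]
name = reduced-redundancy

# Each policy gets its own object ring; policy 1 lives in object-1.builder.
# Two replicas instead of three for the reduced-redundancy policy:
swift-ring-builder object-1.builder create 10 2 1

# Add only the devices (the subset of hardware) this policy should use,
# e.g. region 1 for the branch office and region 2 for the home office:
swift-ring-builder object-1.builder add r1z1-192.0.2.11:6000/sdb1 100
swift-ring-builder object-1.builder add r2z1-192.0.2.21:6000/sdb1 100
swift-ring-builder object-1.builder rebalance
```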
Speaking of different kinds of hardware, there have been some other very interesting things happening in the community along the lines of storage policies. Several companies, HP and Red Hat included, have been developing plugins for Swift that let them take advantage of either a particular storage engine, for example Gluster or Ceph, or of particular hardware, in the case of HP, and they offer something over and above, or just something that is not available, out of the box with Swift. And so what you can do is, for example, configure a storage policy that says this set of data inside of Swift is going to live on this particular set of hardware. That allows the vendors to come in and offer either their own open or proprietary extensions to Swift and take advantage of the technologies they are developing. And then, what other kinds of things can we do with storage policies? I think this is where you come in, and this is where I'm excited to see how people end up using these powerful tools. There are obvious things that people can do along the lines of encryption and compression and things like that. But I think there are also some very interesting things I've heard about; say I need a storage policy that offers, for example, very low jitter for video streaming. That's the kind of thing where you could preferentially store, say, video content in your low-jitter storage policy and not have to worry about it in your storage policy that's designed for backups. Overall, I'm really excited about the possibilities that storage policies allow. This is something that actually just last Friday was merged into Swift. We're currently doing some community QA work on it right now, and it is going to be available to the world within the OpenStack Juno cycle. So next slide.

Now the next big thing that is happening inside of Swift, and one of the major things that we're working on for the rest of the Juno cycle, is creating a storage policy type that is non-replicated storage. We're going to allow deployers to set a storage policy that erasure-codes data. If you're not familiar with erasure coding, basically it's a way to take your data, break it up into chunks, mathematically compute a few extra chunks, and then store all of those chunks in such a way that you can survive the loss of quite a few of them. The real advantage is that although it does cost some extra CPU to compute the needed bits for erasure codes, you can get very high durability without, say, the overhead of triple replication. So you may even be able to get down to, say, a 1.4 to 1.5x overhead rather than a full 3x overhead for triple replication. This is an active area of work, and it will consume a lot of time within the Juno cycle. I hope it makes it into the Juno release, but we're not really committed to that date right now; it's more likely that it's going to be closer to the end of this year. In a very practical sense, one reason I'm really happy about it is that it's a better fit for some common Swift use cases, specifically when people are storing very large data that doesn't have very high access rates, things like backups and image storage, especially when it comes to integrating with the other OpenStack projects. Next slide.
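For a feel of the arithmetic, and of the kind of pluggable library this work builds on (PyECLib, which comes up in the Q&A below), here is a rough Python sketch; the parameters and backend name are illustrative assumptions, not the final Swift integration:

```python
# Rough erasure-coding sketch using PyECLib (parameters are illustrative).
from pyeclib.ec_iface import ECDriver

# 10 data fragments plus 4 parity fragments: any 10 of the 14 can rebuild
# the object, so up to 4 fragments can be lost.
ec = ECDriver(k=10, m=4, ec_type='jerasure_rs_vand')

data = b'x' * (1024 * 1024)            # a 1 MiB object
fragments = ec.encode(data)            # 14 fragments, each roughly 1/10th the size
recovered = ec.decode(fragments[:10])  # decoding needs any k of the fragments
assert recovered == data

# Storage overhead is (k + m) / k = 14 / 10 = 1.4x, versus 3x for triple
# replication: the 1.4 to 1.5x figure mentioned above.
print((10 + 4) / 10.0)
```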
So we'll be working on these things quite a bit, and they're obviously not the only things that we'll be doing for the rest of the Juno cycle. Some of the other active areas of work include specifically looking at some performance and efficiency improvements, and we've got some exciting things in progress there. And overall, we've got a great community that has been built over the years and continues to participate, but we would love for you to get involved as well. If you have any questions or want to get involved, please reach out to me, or find us in #openstack-swift on Freenode IRC. Take a look at the OpenStack Getting Started pages, and swift.openstack.org is where you can find a lot of the Swift developer documentation.

So I think I have some time for a couple of questions. I'm just looking over the chat now. Great, you can see whether I get the answers right or not. Yes, storage policies are in master right now. The use case for transcoding with storage policies would be, for example, a reduced-redundancy storage policy: you would store your gold-master video at triple replication, and then you could store the transcoded copies in reduced redundancy, which just allows you to save some space. And then that last question there is asking which sort of erasure codes we'll be supporting, and Robert was absolutely right. There's a library being developed in conjunction with this effort called PyECLib, which essentially is a pluggable interface to different erasure code libraries. The advantage of this is that it means we're not just supporting one particular erasure code. And given the realities of the legal landscape around erasure codes, it means that those who want to use something they may have access to, but that isn't generally available, can absolutely take advantage of that. One interesting piece around this is that Intel is an active participant within the Swift community and is working on integrating their ISA-L erasure code library, which is optimized for their processors. So basically it's a very pluggable thing. We'll have some things that work out of the box for you, but it's also configurable if you need something else. Are there any other questions before we move on to Robert and then get to the end?

I think you've got them all. Great. Okay, well then thank you, John, and if we have more questions we can see if we have time at the end as well. Thank you very much. Yeah. Okay, Robert, one second. Let's get you loaded up. There we go. Great. All set. When you're ready, Robert.

Thank you. Right, like John, I'm very glad to be giving this update. It's actually really nice to be able to take the time after the summit to pull our thoughts together, because the summit is just so crazy busy. So I'm Robert Collins, the project technical lead, or program technical lead, for TripleO, and you can reach me on Twitter or at my corporate email there, or the email I use on the list is fine as well. Although, you know, if you want my attention, probably corporate email; it gets way less mail, and I like it that way, so that's not necessarily an invite to spam me. Next slide, please. In fact, probably skip over that one; it's just, you know, sort of a placeholder. Much more boring slides than John's, I apologize for that, but for an update I think I would have run several hours if I tried to do an in-depth slide deck. So instead, I'm just going to run through bullet points.

So in Icehouse, we had working deployment of clouds: fully functional, but not with all the bells and whistles. We had Debian, Fedora, RHEL, SLES and Ubuntu support and all the major components, Nova, Glance, Heat, et cetera. We deployed using nova-baremetal, and we had experimental Ironic support. Single control node, a single-node control plane, no HA. And one of the big things that we got done during the Icehouse cycle was getting CI checking of everything going into the tree up and running.
We didn't have comprehensive coverage, but it had come a long way from "yeah, it worked for me, we'll get it checked in." Getting good CI in place is a key part of maturity in the OpenStack community. Unlike most of the projects in OpenStack, we can't use the DevStack VM tooling, because we're doing the deployment itself and DevStack is a different deployment technology; we wouldn't actually be testing anything that's relevant to TripleO if we did that. So we've got a custom setup that depends on running things that look like bare metal. They boot off PXE, they have dedicated bare-metal networks, and that's had a bunch of investment. Long term, and this is something that we want to try and do during Juno, we want to be able to turn any production Nova environment into an emulated bare-metal test rig. We need to add some features to Nova to support this, and that will let us reduce the friction involved and get us more into the sort of mainline test infrastructure.

So in Juno, we're specifically expecting to bring on two new regions, the HP2 region and the Huawei region. These are being donated in much the same way that the regular Infra VMs are donated, as part of these companies' contributions to OpenStack. We want to increase the robustness of our CI, so we need better services in each region. We need mirrors of the distros that we build on, and caches of PyPI and tarballs and all of that sort of stuff. These things make a huge difference in the reliability and performance of TripleO. We recommend deployers put them in place to make their local builds fast, so we need to have them for CI as well. We're looking to collaborate with Infra on exactly how that's done. Maybe it's deployed by Infra, run by Infra, or maybe they're deployed as part of the TripleO plumbing in that cloud. We desperately need to increase our coverage. We need to run Tempest on each deployed cloud, and we need to test that the upgrades really do work. We already catch some interesting things; we caught a Neutron bug last week because we've got multi-node testing, and all of our tests are multi-node. And once we get upgrade testing in place, I'm sure it will bring similar benefits. It will be somewhat different to Grenade, and hopefully we'll find different edge cases. We're also looking to increase our integration with the regular Infra data mining on CI. Now, that's mainly about conserving resources. We've got a limited number of resources, and we often have queue times when we've got a lot of activity going on, a lot of development going on. If people are retesting to find out whether something was a spurious failure or a genuine failure, it consumes a significant amount of resources. So if we can say quite accurately "this is a spurious failure, recheck it" versus "this is a genuine failure," that's useful, because we can then say only ever recheck if you're absolutely sure it's spurious. And the other thing it does is give us guidance about what spurious failures we need to track down and fix, and what race conditions we're running into. We also need to monitor the regions, because we had a whole bunch of trouble with the HP1 region due to Mellanox firmware versions causing packet loss, and it would drop off. So we had to take it out of the Infra nodepool, and eventually we were just not running tests at all there, so it wasn't making good use of the hardware. So we want to get on top of those things and be able to pinpoint when things go wrong in a much more aggressive fashion than we have been.
It's pretty sysadmin-y stuff, to be honest. Lastly, and this is what I mentioned just before, we want to do QuintupleO: we want to run TripleO on Nova, on OpenStack.

So the next big thing we've got over the Juno cycle is bringing our HA story along. In this iteration we're aiming for tolerance of a single component failure, so we're not aiming for auto-healing and we're not aiming for two-node failures. The plan is that we're going to default to a three-node control plane. We'll let you explicitly ask for a single node, but otherwise we'll give you HA; that'll be the default. All services (I say all, but I think Rabbit is going to be the exception, so everything but Rabbit, and but local SSH to the machines), everything else we will access via a virtual IP only. HAProxy will be listening on the virtual IP, stunnel will be listening on the local node, and the actual service itself will only be listening on localhost. So that gives us quite a good security story. Everything will be SSL, which is great from a security perspective; it's perhaps not the best thing from an HTTP performance perspective. So if there are things where people really want to run a plain-HTTP version of something (Swift public objects comes to mind, perhaps), we'll have to have that particular service listening on the node-local IP, not just on localhost. But at the moment, we don't intend to do that. We're going to see how far we can get with a single design. Galera is going to be the initial implementation of the clustered database. Like everything else, we're extremely happy for it to be pluggable, but we in the community are just going to get one up and running and then move on to other interesting problems. We'll still be able to deploy a single-node control plane. But that's really about the size of that. Next slide, please.
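A rough sketch of that traffic path, with made-up addresses and a single Keystone endpoint standing in for "all services"; this is illustrative only, not the actual TripleO elements or templates:

```
# haproxy.cfg on the control-plane nodes: the only externally reachable
# listener is on the virtual IP (192.0.2.10 here).
listen keystone
    bind 192.0.2.10:5000
    mode tcp
    server control0 192.0.2.21:5000 check
    server control1 192.0.2.22:5000 check
    server control2 192.0.2.23:5000 check

# stunnel.conf on each control node: terminate SSL on the node's own address
# and hand the connection to the service on loopback.
cert = /etc/stunnel/control0.pem
[keystone]
accept  = 192.0.2.21:5000
connect = 127.0.0.1:5000

# keystone itself binds only to loopback, e.g. bind_host = 127.0.0.1 in its
# configuration, so it is never directly reachable from the network.
```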
So for upgrades: TripleO, because it deploys to physical machines and because we're using golden images, which is the natural idiom of an OpenStack environment, image-based snapshots and so on, needs some way of preserving the data when we upgrade the software on a machine. So we put that data into a separate partition mounted at /mnt/state. We need to finish making sure that all of the data we want to persist across an upgrade is actually being recorded on /mnt/state rather than somewhere in /var/lib or something like that. We're going to be getting rolling upgrades. We've got a lot of that in place, but again, this is something we'll be finishing during Juno. So when you do an upgrade, it won't take the whole cluster down; it will take one node at a time. And the upgrade will only proceed if RabbitMQ and Galera are in sync, so if we have an inconsistency in cluster state, it will stop and let somebody fix it before it continues. It's not entirely clear how we're going to do the really precise sequencing that Nova needs for upgrades. From an ops perspective, they have a clear but non-trivial story, and we need to figure out how to get Heat to be able to work that state machine. There are multiple different ways that we want to be able to do upgrades eventually. We'd like to be able to do it online, where we don't reboot the node: we just take the image, unpack it on a Glance server somewhere and then rsync it down, so that lots of machines can benefit from it being unpacked just once. We want to be able to do multicast upgrades and things like that. But right now, the very first iteration is just going to be a Nova rebuild with preserve-ephemeral, which will preserve that /mnt/state partition for us. It's quite likely we'll also be able to get a first iteration of the no-reboot story in place. That's particularly useful for the case where we're dealing with a security bug in, say, Nova or something: we don't need to reboot the kernel, we just need to restart the service with a new version of the code on it. The first iteration of it won't be pretty, but it will be a lot faster than a full rebuild. Next slide, please.

So our third big thing is that we need to get much broader coverage. We don't currently have scripts to deploy Trove, Solum or Mistral. I think Designate has just gotten into incubation now, so we need to add that to the things we support installing. And we need to add monitoring as a default thing in the environment, rather than something that people add in as an external facility; it's kind of a core thing for any of our clouds. Next slide.

So Tuskar is the TripleO API. It's a stateless server that aims to provide, as a service in a central place, the logic that we currently run on an administrative node. That allows multiple administrators to admin the same cluster without having local state of their own that they can lose or have to synchronize between machines, or having a dedicated machine that's not installed through the same tooling they use for everything else. So we hope that during Juno, Tuskar will be mature enough that we can replace the ad hoc scripts we've got that bring up the undercloud and the overcloud, and instead use Tuskar for that. Tuskar's got a new, leaner model. It had some state before; we've stripped that back, and we're going to make extensive use of Heat provider templates to encapsulate the various roles you have in the cloud. So the control plane will be a provider template, the hypervisor will be a provider template. We're pretty excited about this, because it will finally get us to the point where we're collaborating directly on Heat rather than layering stuff on top of it. One of the fundamental founding principles of TripleO was to have a virtuous circle, where we deploy OpenStack as an application on OpenStack and improve OpenStack at the job of deploying applications as complex as itself. We've done that very well in some areas and not so well in others, and Heat is one of the ones where we haven't done as well as we'd like to. Next slide, please.
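To give a feel for the provider-template mechanism mentioned above: a Heat environment file maps a resource type name onto a template, so a whole role becomes a single resource in the top-level template. The resource name, file names and properties here are invented for illustration, not the actual Tuskar or TripleO templates:

```yaml
# environment.yaml: map a custom resource type onto a provider template
resource_registry:
  My::Cloud::ControlPlane: control-plane.yaml

# top-level template (excerpt): the whole control-plane role is one resource,
# and its implementation can be swapped by pointing the registry at a
# different provider template.
resources:
  control_plane:
    type: My::Cloud::ControlPlane
    properties:
      image: overcloud-control
      flavor: control
      node_count: 3
```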
So there are a couple of other interesting things, and the last one is the one I'm in some ways most excited about. First, we're going to get rid of these undercloud and overcloud terms. The terms that seem to be gaining consensus are "deployment cloud" and "workload cloud". The deployment cloud is the one that you use to deploy your machines, so it's an administrator-owned cloud, and administrators are the people that use it. The workload cloud is the cloud that you deploy using the deployment cloud, for end users; if you're a public cloud, that would be your actual public cloud. But you might also deploy test clouds, which are distinct from the deployment cloud, and they'll still be workload clouds; the particular workload is just the testing you're doing.

The last one is the fun one. This is where we've taken advantage of a bunch of work that's been done in Nova over the last year, since we first started looking at things. It's now possible to run multiple hypervisors that have very different models for allocating resources. KVM, Docker and LXC all look pretty much the same from an API perspective: you can subdivide a machine, you can split out CPUs and memory in just about any way you want. It's all effectively virtualized; it's certainly all subdividable. But nova-baremetal or Ironic don't look the same at all, because they have to allocate a full machine every time and you can't overcommit; allocation is machine-sized in that story. Because we can now run them under the same Nova scheduler, we can run on the deployment cloud a single Keystone, Nova, Glance, Swift, Neutron and two different nova-computes: we can run the Ironic nova-compute and we can run a KVM nova-compute. And that lets us take all of the nodes that we needed to put into the overcloud, the workload cloud control plane, and run them virtualized. So instead of having a minimum footprint for HA of three machines for your deployment cloud and three machines for your control plane, and then, you know, additional machines for your first hypervisor, you can say: I've got three machines for my undercloud, they have enough KVM capacity on them to run the overcloud control plane, and my fourth machine is my first hypervisor. So for small environments, this is a huge reduction in overhead. For big environments, it will get lost in the noise. And yes, that all now runs in VMs.

Great. Thanks, Robert. Any questions there? I can't see them from presentation mode; pulling it back now. I can't see any in the chat. I was hoping to return the favor to Robert, but nobody asked any questions for me to answer. Okay. That's impressive, though. You can answer questions about each other's projects. Does anyone else have other questions? I can unmute the line as well. You are now unmuted. Any other questions for John or Robert? Going, going. Okay, well, great. When we finish the webinar, you'll receive an email, and you can send the Foundation questions too, and I can get that information back to Robert and John as well. So if that's it, then I think we'll conclude. And thanks again, John and Robert, for your time. Really appreciate it. Thanks for the opportunity. Sure. And these webinars will be posted to the Foundation channel by the end of the week as well, if anyone would like to listen to them again. Have a great day, everyone. Thank you.