All right, we're going to start a little late here, the last session ran over, but let's get going so we're not wasting any time. What we're going to talk about is Cloud Foundry on OpenStack: what we've gone through as IBM running our Cloud Foundry platforms on OpenStack, the challenges we ran into, and how we worked through them.

If you're not familiar with OpenStack, and I'd be surprised, it's an open source IaaS cloud operating system. We use it internally for our infrastructure-as-a-service platforms, and plenty of other companies use it too. The key pieces of OpenStack, what we call the core services, are Nova, Neutron, Swift, Cinder, Keystone, and Glance: compute, networking, object storage, block storage, authentication, and image storage, respectively. These are the core components of OpenStack you'd need if you were running Cloud Foundry on it.

Obviously we're at Cloud Foundry Summit, so hopefully you have a pretty good idea what Cloud Foundry is. IBM as a whole has been involved in Cloud Foundry for quite some time now, both on our public Bluemix platform and on our private platforms. If you're familiar with the way Cloud Foundry works, it uses BOSH as the deployment and lifecycle management tool; this is how you deploy and run Cloud Foundry. The way you deploy it on different platforms is through the CPI, the Cloud Provider Interface. The CPI tells BOSH how to talk to the underlying infrastructure. There are CPIs for AWS, CPIs for VMware, and there's a CPI for OpenStack, along with some others. At IBM we've created one specifically for SoftLayer, and I know there are some newer ones out there to deploy directly on bare metal and things like that.

When you're doing a Cloud Foundry deploy, there are a bunch of different components, and BOSH is the mechanism that puts them together. One of the key things you need is what's called a stemcell: the base OS image that gets deployed onto the cloud and that all the other pieces are put on. Then there's the release, which contains the software packages, configurations, and all the components that make up the version of Cloud Foundry you're going to deploy. It's all tied together in a manifest, which tells BOSH what to deploy, where, what the networks are, all those kinds of things. Put together, that's how BOSH deploys Cloud Foundry onto an underlying IaaS.

Now, on to the problems we ran into, not just at IBM, but what we've seen and heard from talking to other Cloud Foundry teams trying to deploy on OpenStack. The key ones: instability, especially with the earlier releases of OpenStack. Different APIs, API changes, API performance, all those types of things, and they change between releases. Some people would deploy an OpenStack that wasn't even a consistent release, so they'd use some services from one version and some from another. Capacity: OpenStack uses flavors, similar to AWS instance types, and the predefined VM sizes aren't always ideal for Cloud Foundry out of the box. What happens is you end up with a lot of wasted resources because you're using the next flavor size up.
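(An illustrative aside, not from the talk: one way around the default flavors is simply to define your own. Here's a minimal sketch with the openstacksdk Python client; the DEA size matches the talk's four-core, 32 GB machines, but the other names and numbers are placeholders.)

```python
# Minimal sketch (illustrative, not IBM's actual tooling): define custom
# Cloud Foundry-sized flavors with openstacksdk so VMs aren't forced into
# the next oversized default flavor. Only the DEA size comes from the talk;
# the rest are placeholder numbers.
import openstack

conn = openstack.connect(cloud="my-cloud")  # assumes a clouds.yaml entry

# name: (vcpus, RAM in MB, root disk in GB)
cf_flavors = {
    "cf-dea":    (4, 32768, 100),
    "cf-router": (2, 4096, 20),
    "cf-core":   (4, 16384, 60),
}

for name, (vcpus, ram, disk) in cf_flavors.items():
    conn.compute.create_flavor(name=name, vcpus=vcpus, ram=ram, disk=disk)
    print(f"created flavor {name}")
```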
Networking: if you're doing something like OpenStack, you're probably already doing some sort of software-defined networking or network encapsulation, and then you're throwing Cloud Foundry on top of that. How does that all play together? Then there's enterprise software. Cool, we deployed our Cloud Foundry, our DEAs or Diego or whatever, but we need these other pieces. Say we want a commercial software package, like Oracle, as a data service behind it. That may not have been supported on OpenStack. That was another challenge we saw.

Really, the idea with CPIs and the way BOSH works is to make deployments generic, so you can just swap out the CPI and deploy the same thing to different IaaS layers. The problem is the IaaS layers are different enough that you can't quite get there. The other thing we ran into: we had customers who said, well, we want both. We need IaaS for some of our stuff and Cloud Foundry for the rest, all on the same layer. Then it's a question of how you handle permissions. For example, the credentials you give to BOSH can't step all over the VMs that users are creating individually, and vice versa. HA: what works, what doesn't, how do we provide availability for all the Cloud Foundry components as well as the OpenStack components? And the constant release cycle: OpenStack does major releases twice a year, and Cloud Foundry has a pretty aggressive release cycle as well. How do we get it deployed and then keep up with those releases?

Someone from IBM ran a survey asking people what they were running into with Cloud Foundry on OpenStack. What's your level of experience? You can see most had some experience. How much difficulty did you have? The biggest answer was significant difficulty deploying Cloud Foundry on OpenStack. As we dug into that a little more, did you have to customize anything? It was almost 50-50 on making customizations to get their environment deployed. And the two biggest issues with BOSH, as you can see: instability in the OpenStack environments they were working on, and the difficulty of getting the initial setup going. Then we asked which versions they were using (this survey is a little old, but you can see most were a couple of releases behind) and what type of OpenStack it was: locally managed, remotely managed, all those kinds of things. Most of these were locally managed OpenStack environments.

Talking about that real quick: what we use at IBM, our OpenStack, is called Blue Box. It's a managed OpenStack offering. We take care of all of that for the clients, we operate it, and all they do is consume it. Whether it's in a SoftLayer data center or in a customer's own data center, we remotely manage it, upgrade it, keep it running, troubleshoot it, all those types of things. And we provide all the usual OpenStack services in a highly available fashion. How do we do that? We have a tool called Ursula, which is open source. It's basically a wrapper around Ansible that lets us deploy these OpenStack environments very quickly and in a very controlled manner. We get very good repeatability out of them, and also upgradeability.
So, for example, the latest full release of OpenStack is Mitaka, which came out earlier this year. All of our customers' clouds have already been upgraded to Mitaka, and you can see from the survey that some people are only just starting to look at doing that. The only way we've been able to do it is automation and orchestration through our tooling. So when Blue Box was acquired by IBM last year, the Bluemix team got to see a very good, stable OpenStack environment they could start deploying to, and that's when we started putting the two together.

If you're not familiar with Bluemix: when I first came to IBM, I thought Bluemix was just IBM's Cloud Foundry, but there are a lot of other components to it, all the backing services and additional things like that. The core of it is still Cloud Foundry, though. It's a very large environment we offer as a public-facing Cloud Foundry platform, and, where we're focusing here with OpenStack, as dedicated and private offerings: single tenant, either in a SoftLayer data center or in the customer's data center. And Bluemix can get pretty complex once you start pulling in those non-Cloud-Foundry things. Besides the base Cloud Foundry pieces, we have things like the container service, GitHub Enterprise, and all these other components. So it goes beyond the basic BOSH install of the Cloud Foundry components; it's all the other pieces around it.

So what did we learn? First, the best place to start: there's good information on the Cloud Foundry Foundation's website for validating your OpenStack environment, some basic things to check and make sure you have in place before you even try to deploy. There's also a newer project called the OpenStack Validator, which you can run against your OpenStack environment. It tests a bunch of the API calls the CPI would make anyway and makes sure they work before you go and deploy. Depending on the size of your environment and the performance of your equipment, Cloud Foundry can take a while to deploy, so you don't want to start a deployment that's just going to fail. You can run this tool first as a kind of pre-flight check, and it will tell you: hey, we couldn't delete a disk, we couldn't delete a VM, there are some bad permissions, things like that.

Sizing. I mentioned the default flavors earlier. What we've done is gone with custom flavors to match the different components of Cloud Foundry. You can see them here for the DEAs, routers, core nodes, and service gateways; we prefix these flavors with CF, and each one specifies how many gigs of RAM and how much disk. You can see the counts for the different roles, and on the right side, the amount of persistent disk. This is what we use internally as the base sizing for a starting Cloud Foundry core deployment. It uses a good bit of resources to get started, but we've found these settings and sizes give us the most efficient use of the underlying OpenStack environment.

Now we get to the scalability piece, which is usually the next question: cool, we got it up and running, so how do we scale from here?
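(Before the scaling answer, an illustrative aside: the OpenStack Validator project is the real pre-flight tool here, but to make the idea concrete, this is a rough Python sketch, using openstacksdk, of the kind of API checks such a tool exercises. The cloud name is a placeholder.)

```python
# Rough sketch of validator-style pre-flight checks (the real tool is the
# OpenStack Validator project mentioned above; this only illustrates the
# idea): exercise the API calls a deploy needs before starting a long one.
import openstack

conn = openstack.connect(cloud="my-cloud")  # assumes a clouds.yaml entry

def check(label, fn):
    try:
        fn()
        print(f"PASS: {label}")
    except Exception as exc:
        print(f"FAIL: {label} -> {exc}")

def volume_roundtrip():
    # create, wait for, and delete a 1 GB test volume
    vol = conn.block_storage.create_volume(size=1, name="preflight-test")
    conn.block_storage.wait_for_status(vol, status="available")
    conn.block_storage.delete_volume(vol)

check("list flavors", lambda: list(conn.compute.flavors()))
check("list images", lambda: list(conn.image.images()))
check("create/delete volume", volume_roundtrip)
```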
So on the public cloud side, we've proven Cloud Foundry can scale extremely large, to tens of thousands of machines. But what about in a private OpenStack environment? This is where we found that base consumption you see in blue; that's where we get started: 32 cores, 700 gigs of RAM, six terabytes of disk, and 1.5 terabytes of persistent disk. That's your starting point, and as we've scaled these customer environments, we've worked out what the growth blocks look like. From that initial starting point, you get one terabyte of application memory. Depending on how many gigabytes or megabytes you give each runtime, that's how you chop it up, but it gets you a terabyte.

The assumption we make, and this is all based on DEAs, not Diego, is that each DEA is a four-core, 32-gig machine. Every time we add 28 additional machines, we get another terabyte of application capacity, and we round that up to an even 32. So we say every terabyte of application memory is 32 DEAs, which makes the numbers easy to work with when we do sizing. Each block of 32 DEAs also uses about 12 terabytes of datastore capacity. As we get larger, we start growing other things: more aggregators, and for every three terabytes added, another API worker and another Go Router. Once you get into the services, those are really service-dependent. Whether it's Redis or MySQL or whatever you're using for your backing services, sizing is based entirely on that individual service; some of the services we use are lightweight and some are really heavyweight.

Another thing we've seen: especially during a deploy, BOSH can hit the OpenStack API really hard, asking for resources and configuring things, and there's no rate limiting on its side. So we make sure the API limits for OpenStack are turned up, so we don't get timeouts. We don't want the BOSH deployment failing for lack of API resources.

Another thing we've run into is name-based security groups. In OpenStack you can create security groups, and they have a UUID, but you can also give them a name. When you reference one by name, the first thing OpenStack has to do is go across the message bus to the database to find the UUID of that security group before it can do anything with it. That adds a lot of extra overhead, whereas if you reference security groups by UUID, you skip that extra transaction. We saw that reduce a lot of overhead on the back end.

The next one is Neutron. On the private side, for Blue Box, we use Linux Bridge with provider networks and VXLAN, but whether you're on that or Open vSwitch, depending on the networking technology you're using, you have to be careful with your MTUs. By default, if you're using a standard MTU of 1500 on the physical host and you haven't changed it, the encapsulation already cuts into that. You need to turn the instance MTU down, to something like 1400 or 1460 depending on whether it's GRE or VXLAN.
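(To make that block arithmetic concrete, here's the sizing rule as a small worked calculation. The constants are the ones from the talk; the helper itself is just illustrative.)

```python
# The talk's DEA sizing rule as a worked calculation: each DEA is a
# 4-core / 32 GB VM, rounded to 32 DEAs per terabyte of application
# memory, and each 1 TB block adds roughly 12 TB of datastore usage.
DEA_VCPUS = 4
DEA_RAM_GB = 32
DEAS_PER_TB_APP_MEMORY = 32       # 32 x 32 GB is roughly 1 TB of app memory
DATASTORE_TB_PER_BLOCK = 12

def footprint(app_memory_tb):
    deas = app_memory_tb * DEAS_PER_TB_APP_MEMORY
    return {
        "deas": deas,
        "vcpus": deas * DEA_VCPUS,
        "ram_gb": deas * DEA_RAM_GB,
        "datastore_tb": app_memory_tb * DATASTORE_TB_PER_BLOCK,
    }

# e.g. growing to 3 TB of app memory: 96 DEAs, 384 vCPUs, 3072 GB of RAM
print(footprint(3))
```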
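(And on the name-based security group point, one practical workaround is to resolve names to UUIDs once, up front, and put the UUIDs in the manifest. A minimal openstacksdk sketch; the group names are hypothetical.)

```python
# Sketch: resolve security group names to UUIDs once, ahead of time, so the
# manifest can reference UUIDs directly and later API calls skip the
# name-to-UUID database lookup. Group names here are hypothetical.
import openstack

conn = openstack.connect(cloud="my-cloud")

for name in ("cf-internal", "cf-router"):
    sg = conn.network.find_security_group(name)
    if sg is None:
        raise SystemExit(f"security group {name!r} not found")
    print(f"{name} -> {sg.id}")  # paste these UUIDs into the manifest
```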
You'll run into problems with MTUs anyway, even outside of Cloud Foundry, once you start putting networks inside of networks inside of networks. You can also change the compute scheduler driver to balance placement based on load rather than using the default scheduler, which is obviously a key piece when you're standing up a lot of DEAs.

On the BOSH and Cloud Foundry side, NATS timeouts can be a challenge when you have a lot of components talking. If you're not familiar, NATS is the message bus for Cloud Foundry. Increasing the timeouts makes it a bit more resilient to these issues, because it will wait a little longer for components to come back. As the environment grows, this is something you can start running into pretty quickly.

Isolating components with multiple networks is where you can get more efficient with your network allocation. At the smaller end it's not as important, but as it scales, this is where you start running into problems: delays and congestion that make all these pieces that need to talk to each other start to struggle.

Also, for anything communicating inside the OpenStack environment: if you're not familiar with OpenStack, it has the concept of private networks and floating IPs, where floating IPs are the public IPs used to talk to the outside world. The best practice is that any pieces communicating with each other should stay on the private network. Say you're using ELK as your logging destination, so everything from Loggregator and Cloud Foundry logs to it. If that traffic goes out through floating IPs, it's going through the Neutron router, so you're adding processing that has to happen and more load on your cloud. If the components can be directly connected inside the private network, you cut out a lot of extra overhead, because they communicate directly without the additional routing hop. And if both endpoints are in the OpenStack environment with floating IPs on both, you're now hitting the router in both directions. So having them talk over the private network saves you a lot of overhead.

Some of this is common sense, like don't open ports you don't need, keep things to a minimum; that's basic computer hygiene. But then you get into certificates, which can always be a challenge with anything. If you're using self-signed certs, you have to include the location of the CA that signed the cert in your manifest. I'd recommend not using self-signed certs at all; it's so easy to get certs now with things like Let's Encrypt that using self-signed certs is just asking for problems. The other thing: don't use full admin credentials in your BOSH manifest. When you write the manifest, you have to give it OpenStack credentials so it can go build all of this. If you give it full admin credentials, it can literally do anything to your underlying OpenStack cloud: delete entire networks, delete entire machines, whatever.
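(On that point about keeping component traffic on the private network, here's a small illustrative audit, not from the talk, that lists which VMs have floating IPs at all. It's handy for spotting components that don't need one, which is exactly where the next point goes.)

```python
# Illustrative audit: list every server and any floating IPs attached to
# it, to spot components that don't actually need one. In the Nova address
# list, each entry is tagged 'fixed' or 'floating' via 'OS-EXT-IPS:type'.
import openstack

conn = openstack.connect(cloud="my-cloud")

for server in conn.compute.servers():
    floating = [
        addr["addr"]
        for addrs in (server.addresses or {}).values()
        for addr in addrs
        if addr.get("OS-EXT-IPS:type") == "floating"
    ]
    if floating:
        print(f"{server.name}: floating IPs {floating}")
```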
And minimize your use of floating IPs as much as possible. Not every single node needs a floating IP, only the things that actually communicate with the external network. If you're using a load balancer, for example F5, HAProxy, or DataPower, obviously it needs a public-facing IP address. But all the individual CF components don't need public IPs. And when I say public, I just mean outside the OpenStack cloud; that could still be on your private internal network.

Another challenge is how some of these components get out to the internet for new releases, or even during the initial deployment. In the manifest you tell BOSH which release to use, and it goes out, pulls it in, and deploys it. If you're behind a firewall or proxy that limits access to the internet, you're going to have a problem, because BOSH won't be able to pull its components in. And depending on the components you're deploying, some of them are compiled and built locally too, so it has to pull those pieces down as well. From an OpenStack perspective, you don't need a public floating IP to get to the internet, but here's what catches a lot of customers: they say, we need to give this VM access, it has this private IP address, so let's give it outbound 443 access. And it still doesn't work. The reason is that if the VM doesn't have a floating IP, then from the rest of your network's perspective, all of its source traffic is coming from the gateway IP. So what you actually have to let out over 443, or whatever port, is the gateway IP.

Also, Cloud Foundry out of the box does not support SSL packet inspection. A lot of larger companies have a certificate they've trusted to inspect all the SSL traffic leaving their network; basically, they're doing a man in the middle. Cloud Foundry does not handle that. In those cases, when it needs to go out to the internet, customers have generally had to whitelist the Cloud Foundry environment: don't do SSL packet inspection on these IPs, because it just won't work otherwise. The only thing inspection will work with is a certificate signed by a public internet authority, and in my experience most customers don't have that. They have a self-signed internal cert, because they can push that CA cert to all their laptops and they don't want to pay for an external cert. And this is exactly the kind of thing that breaks.

As I mentioned earlier, optimize capacity by making sure your OpenStack flavors match your Cloud Foundry components. The default flavors in OpenStack are very similar to the AWS ones, m1.small and all that, so we create a set of flavors specifically for Cloud Foundry, prefixed with CF, with the right machine sizes for the different roles. That way we can get the sizes down as much as possible.

The other thing you'll run into, if you're not familiar with this part of OpenStack, is the concept of the metadata service. With cloud-init, when a VM boots, it connects to an internal HTTP address and pulls down its configuration.
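(A quick illustrative sketch on that SNAT behavior: the source address your firewall actually sees for VMs without floating IPs is the Neutron router's gateway IP, so that's the one to allow outbound. This just looks those gateway IPs up with openstacksdk.)

```python
# Sketch: list each Neutron router's external gateway IPs, which are the
# source addresses upstream firewalls see for VMs without floating IPs,
# since their outbound traffic is SNATed through the router.
import openstack

conn = openstack.connect(cloud="my-cloud")

for router in conn.network.routers():
    gw = router.external_gateway_info or {}
    ips = [fip["ip_address"] for fip in gw.get("external_fixed_ips", [])]
    print(f"{router.name}: gateway IPs {ips or 'none'}")
```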
The metadata service is also where you can feed the VM additional details about what to do when it boots up: hey, when you boot, run this script, things like that. Some people have it turned off and use a thing called config drive instead, where rather than going to an HTTP address, the VM mounts what looks to it like a CD-ROM drive with the config on it. You actually have to configure that in your BOSH manifest, otherwise the deploy will fail because BOSH can't figure out how to tell the VMs what to do. That's something we ran into with customers: deploys weren't working, and it turned out the metadata service was turned off.

One of the bigger challenges is if you're going to share the environment with non-Cloud-Foundry workloads. For some of our customers, the OpenStack is there only to run Cloud Foundry; in other cases, they want to run other workloads and have Cloud Foundry also. The challenge goes back to credentials: admin is too much, but the individual tenant admin roles are not enough. You really can't use any of the out-of-the-box roles. What we generally recommend is a specialized role just for BOSH to use for Cloud Foundry: essentially a tenant admin that also has some higher-level abilities, like changing flavors. Quotas are a big one; modifying quotas is really an admin thing, not a tenant admin thing, and that's something we run into too. So you end up creating a specific role just for deploying Cloud Foundry.

Like I mentioned, OpenStack has two major releases a year at a minimum, with point releases in between, and then it's a continuous deployment model, so there may be hotfixes, bugs, or patches that need to go in as well. How do we keep that rolling without taking down the Cloud Foundry environment? In the earlier days, moving from one release to another meant standing up a new environment and migrating. Luckily, OpenStack has supported in-place upgrades for a number of releases now. What we've found is we can stagger the upgrades: with the controllers running redundant services, we upgrade one, flip over to the other, and keep the impact to a minimum. The other place we run into this is the hypervisors themselves; that's where you end up rebooting nodes because there's a kernel patch or a hypervisor change. There, making sure of the order in which the Cloud Foundry services, and the nodes they're on, come back up is important, so you keep your instances running without application-level outages. If you have enough DEAs or Diego nodes, the idea is that even if you lose an app instance, you have other ones still running behind the Go Router, so it shouldn't even be noticeable.

Most of the minor upgrades, a little fix here, a little fix there, are basically just a service restart. It's usually only a couple of seconds and generally goes unnoticed by Cloud Foundry. The same goes for config changes, when we find later that a parameter has to be changed; for example, when we first ran into that API rate limiting issue, making those changes was again just a restart of the service, a couple of seconds, generally not noticed.
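(Going back to the dedicated-role point for a moment: creating and assigning such a role through the API is the easy half. A hedged sketch below, with hypothetical names; the permissions the role actually grants, like quota and flavor changes, are mapped operator-side in each service's policy configuration.)

```python
# Sketch: create a dedicated Keystone role for BOSH and assign it to a
# BOSH service user on the Cloud Foundry project. All names here are
# hypothetical, and what the role is allowed to do (e.g. quota and flavor
# changes) is granted separately in each service's policy files.
import openstack

conn = openstack.connect(cloud="my-cloud")  # needs admin credentials

role = conn.identity.create_role(name="cf-bosh-deployer")
project = conn.identity.find_project("cloud-foundry")
user = conn.identity.find_user("bosh")

conn.identity.assign_project_role_to_user(project, user, role)
print(f"granted {role.name} to {user.name} on {project.name}")
```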
The key takeaway I'd give you here: automate everything. Automation, automation, automation. It's the only way we're able to keep our OpenStack releases upgradable and supportable across the number of environments we run, and it's the same thing with Cloud Foundry using BOSH. We put all those pieces together: our Ursula tool deploys the OpenStack, then we use an OpenStack tool called Rally to run testing against it once it's done, to make sure it's ready to go and run all those validation pieces. Then on the Cloud Foundry side, there's a Ruby gem called Fog, which basically discovers the OpenStack environment and feeds that into our Bluemix Cloud Foundry deployment tool, which builds the BOSH manifest. That way there's no chance of mistyped credentials or anything like that; it's discovering everything live, finding and checking all those components. When we go to do the deploys, it creates all the VM configs on the fly. All the things we've talked about, the specific flavor sizes and everything like that, this automation tooling builds for us automatically. It pulls down the stemcell for the OpenStack environment, generates the manifest, deploys MicroBOSH, does the whole thing, and away it goes. That's how we've been able to do this in a very repeatable fashion. Customer environments were sometimes challenging, because depending on the underlying IaaS they gave us, deploying Cloud Foundry presented different problems. Now we get this very repeatable, well-understood OpenStack environment, which makes deploys much quicker.

So, to wrap up, this is how we deploy it: managed OpenStack and managed Cloud Foundry. On the Blue Box side, we use a thing called Site Controller, which deploys, manages, upgrades, and handles all the OpenStack pieces. On the Bluemix side, it's a similar concept we call Relay: basically one local machine that kicks off all the other pieces. With just those base components, we can build the entire OpenStack environment and the entire Cloud Foundry environment on top of it, in a fully automated fashion.

Again, why customers like Cloud Foundry on OpenStack: you're getting an open PaaS with Cloud Foundry and an open IaaS with OpenStack. Both have very strong communities around them, with a lot of contributors and a lot of sponsors besides IBM in both spaces. The ability of the open source community to work together and build these things, like the OpenStack CPI, which is now leveraged not just by Bluemix but by Pivotal and Atos and a whole bunch of other companies deploying on OpenStack, gives everyone that fully open IaaS-and-PaaS solution. Meeting the installation requirements for Cloud Foundry is very straightforward once you've learned some of these lessons, and it gets much easier as you do your deploys. We've seen the two of them together make a real difference in our private Cloud Foundry environments: a predictable OpenStack layer for our clients' environments means good, clean Cloud Foundry installs, and then a good customer experience, right?
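(One last illustrative sketch: the discovery step above uses the Ruby Fog gem; this is the same idea in Python with openstacksdk, querying the live cloud for the facts a manifest generator needs so nothing is hand-typed.)

```python
# Sketch of the discovery idea (the talk's tooling uses the Ruby Fog gem;
# this shows the same concept with openstacksdk): pull live facts from the
# cloud for a manifest generator, so credentials and IDs are never typed in.
import json
import openstack

conn = openstack.connect(cloud="my-cloud")

facts = {
    "networks": {n.name: n.id for n in conn.network.networks()},
    "flavors": {f.name: f.id for f in conn.compute.flavors()},
    "images": {i.name: i.id for i in conn.image.images()},
}

# feed this into whatever templating builds the BOSH manifest
print(json.dumps(facts, indent=2))
```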
At the end of the day, it's about the user's, the developer's, experience. They want a predictable Cloud Foundry environment, Cloud Foundry needs a predictable IaaS environment, and we get all of that in an open source, easy-to-use package. And we are out of time. So, any questions?