Well, good afternoon everyone. I hope you all had a great day today. My name is Bill Hanckert. I'm an IT architect at IBM, and this is Chino. Hey, how you doing? Thanks for attending. Chino Sahu, also an architect, and Bill will kind of take you through some context of what we do in IBM before we jump into the meat of the presentation. So hopefully I'm going to keep you guys awake. I know it's been a long day, it's 5:30, and we got stuck with the last session, but let's try to keep it a little bit interesting. I'm going to go through the slides a little quickly, and then we'll have some Q&A — if you have questions, we'll get through them. I'm going to walk out this way just because I can't really click and point from over there. Is this thing working here? No, it isn't working. Sorry about that. Oh, here we go. So what we'd like to get out of this session is, obviously, to understand why we have a use case for centralized image management within OpenStack — what drove us to this — and a little bit of an overview of the IBM Hybrid Development Cloud. This is the hybrid development cloud team within IBM. This is not internet-facing; this isn't SoftLayer. We're an internal team within cloud that supports the various development, support, and test teams within IBM. We provide services to them. And some of the technologies that we've used to develop this toolchain: Jenkins, Packer, IBM Endpoint Manager for security, and Microsoft KMS, which we use for auto-registration of our Windows images. So a little bit of context here on the mission. We started in 2015 for all the IBM development teams within the cloud, analytics, and security units that needed to leverage OpenStack — we were a fairly large VMware environment, and OpenStack was coming along. We wanted to provide a method to have continuous delivery, to have them unbounded, to give them OpenStack: here's your project, here's your images, here's your flavors, here's your network.
Go forward and deliver. And we wanted to step out of the way; we didn't want to inhibit them. So as OpenStack is growing in leaps and bounds within the company, we wanted this project to basically have something to deliver to them and then step out of the way — have something common, something that would be global. We have four locations that we're in today: Littleton, Mass; RTP, North Carolina; Hursley, England; and Toronto, Canada. So we have global teams, and the need for commonality across all these teams was paramount, because a developer could be working on projects in multiple locations. Just to add real quick — two fundamental business objectives. We wanted to ensure our user base, the development teams that build software and build services, is enabled to do their development in a hybrid context: do their dev and test on-prem, and then have the ability to promote that workload or that offering out into our public cloud. And then also enable them with the tools and self-service capabilities to do continuous delivery: deliver more frequently, deliver in an automated way, adopt more of the DevOps practices that we're trying to champion within our company and really across the industry. Thanks, Chino. So I'm going to give you a bit of an overview of our cloud here, and I'm going to use the laser pointer to point out some of the things. Obviously we have our OpenStack dashboard — and for those of you whose view I'm in, I'm sorry. So here's our dashboard. We use Jenkins. We use the ELK stack and Grafana over here — basically our logging pages. And one of the things that we did with our users was this lobby project. We created a lobby project in OpenStack on all the clouds, and we told users they had sort of a try-and-buy: go in there, kick the tires. Do you like it? Is it going to work for you? As they adopted it, then we'd engage them, create projects, and start onboarding them to our environment.
And we use Slack. Right here we have GitHub. We have a product called BluePay — that's a custom application within IBM that we use to charge the clients that come in to leverage our cloud resources. You can kind of think of it as OpenStack-plus as a service: we used OpenStack as the base and then built a lot of functionality around and on top of it, automating a lot of our business processes that are specific to corporate tools and corporate processes. And today we're going to focus on the image piece of this — the image customization and the image replication, and syncing that across all of our clouds. So just a little bit of an overview of our environment here. The right-hand side would be within the IBM intranet. Obviously we're using GitHub, and we have OpenStack in these four different locations. We use Spectrum Scale, which is GPFS, for storage, with Cinder, Nova, and Glance; Open vSwitch for networking; the ELK stack, Grafana, Kibana, and Nagios monitoring the network and the OpenStack infrastructure, getting alerts when certain services within OpenStack aren't responding — a good little tool. Right here is our security tool within IBM, which we have to ensure that our images, when they're on the IBM intranet, are compliant: they're patched, they have the appropriate antivirus, they have the appropriate password policies, and they meet our business guidelines. This is our ticketing system, which we use for support — IBM CloudDesk. Basically, a user has a problem, they submit a ticket; pretty much similar to other ticketing systems. Bluepages and Blue Groups — this is our LDAP authentication. We've wired in the previous screen — I can show you right here — we basically wire all that in: that's my IBM intranet ID going into our corporate Bluepages directory. We don't have to use any on-prem LDAP, we don't have to use any Active Directory; we have everything within the corporate directory that we can use.
It's a lot easier for users. Also, it's a lot easier for us with these Blue Groups, because we do use Blue Groups — LDAP groups — to lock down and segregate projects from each other and give people access within the OpenStack infrastructure. As I showed you previously, BluePay is a usage and chargeback tool that we use, right over here. Basically, we have the interconnects into our software labs. We have dedicated links from each one of the labs and locations that don't traverse the IBM intranet. Typically, the IBM intranet in each location goes through a Blue Coat proxy somewhere out to the internet; these links go directly into the pods within the software labs. We use high-speed file transfers to move data from the labs — folks have content, images, operating systems, anything could be out there that they need to move data from for their continuous delivery model. Let's see, right here: our Cloud Registry, Blue Box, and — let's see, I can't really see that — yeah, IT SaaS, the same thing here. So that's really an overview of our environment. Pretty large, a little complex, but that's the lay of the land. Like I said, we're focusing mainly on the image repo, which we're going to talk about a little bit more, but I just wanted to give you a bit of an overview of how we're using OpenStack within our environment. All right, so the problem here is images — obviously, images are what our users are going to use. And as we've all seen with patches — everybody loves to patch, right? — the OS gets non-compliant, and things happen out there. VMs sit out there, people don't patch, don't put antivirus on, problems happen, and then we sometimes have issues. So as you see on the slide, when it first started out we have a nice pretty house, and then we have the run-down house with an old microbus sitting next to it, which isn't good. So we needed to basically address this issue.
I mean, I'm as guilty as the next person — I was creating my own images at my location and promoting them to my cloud, Chino was doing the same, and we all kind of gathered together and said, listen, we shouldn't be doing this. We should have one central team managing and coordinating this, where we can save time and reduce duplication of effort. Yeah, I mean, I think the message here is: you've got to take care of your images, you've got to take care of your instances. We don't want people to do it by hand, so let's build a tool, build automation, and then take care of the automation, instead of having to take care of the images themselves. Okay. Thanks, Chino. And so the case for change, as I said: eliminate the duplication of effort, improve how we deliver the images, avoid the images getting out of date and out of patch, and obviously provide that consistent user experience across all clouds. Like I said, we're in four locations. We want to ensure that if a developer logs in to Hursley, England and uses an Ubuntu 14.04 image, it's the same user experience as when they log into the RTP cloud. So we try to keep that level of consistency the same across all the clouds. Okay, and as you see right here, this is just a snippet of all the permutations of the different operating systems and what packages people may potentially want on them. We'll talk about that a little bit in the next few slides. I mean, our users are no different than most users: they're going to want a base OS image plus something else. And so instead of having them just import their own image into their project, we want them to follow the standard process, which will end up spitting out an image that is compliant out of the box, has what they need, and will be maintained over the long term by the automation that we have in place.
But you'll see that this slide really illustrates the problem: there are just so many different combinations that it's a scale problem, and we want to standardize the approach. Okay, thanks, Chino. So here we are at a high level. The solution — and luckily today we have one of our developers here who helped develop this product, which is really neat. So I'm going to stand over this way and go over it at a high level right here — whoops, sorry, I keep hitting the wrong button. So really what it is: we want to publish a compliant OS every 30 days to an image catalog. That image update gets detected, we do an image sync to the different clouds in the different locations, and then if a customer has a requirement to say, hey, I want Java, I want DB2, I want Nginx, they make that request, and that triggers the synchronization, and then the image customization happens automatically. The image repository right here is available throughout the IBM intranet. Folks can sync to that — there's a global storage area where they can pull those different images, and there's a cron task that runs every night to pull the data down. So here is the start of the manufacturing process for the base image, and I can go over here. Basically we start with an ISO, like an Ubuntu ISO. We bring it in, and we load it into Glance. Then the Cinder create — we do a nova boot, we create the VM, and we create a volume with that. And then that gets published into Glance. So once that's in the Glance repository — as you see right here, here's the VM right there, and there's the Glance image repository right there — the Jenkins job gets kicked off right here. Basically the Jenkins job talks to the master, and in this case, for example, we're probably going to be building Ubuntu. So the job starts off, and it does a clone of the image.
It starts Packer, the orchestration tool, with its OpenStack builder. And then we go through these different steps as we walk through the manufacturing of each one of these pieces. Some of these processes are optional, but some can be run, and then we run a test. We then copy the metadata. Now, this right here — the IT SaaS registration and scan — this is our security tool, where we need to register our images within our security tool. We also need to scan them for any vulnerabilities or open ports. And then, essentially, what happens in our IT SaaS tool is that a floating operating system record gets created, and the security system now says that this is a valid image, it's compliant. The end user that eventually gets this image will be compliant from day one once they get it. So then, once that's complete — if there's a failure, obviously we get a notification; we use Slack, so the build team gets notified of that. It is then published up to GSA. GSA is our IBM Global Storage Architecture. It is based on GPFS; it's a clustered file system, available all around the world. This is where the cron job pulls from to the different locations around IBM. So I'm going to just stop right there — right here is kind of a busy chart. Questions right now? I'll take some questions. Yes? Yes, we can do the state check. Yes, that is, yes. Steve, if you could field the question. Yeah, that's right, we start with the basics — so it's a bare operating system. Yeah, okay. Any other questions? Okay. I know it's a bit of an eye chart here, but we're trying to break down the methodology that we used. So, the image customization. This is the point where, if an image was published and the end user wanted, say, some different package on it, we could trigger another job that would install Java, and then it would be delivered to their project. They would have a customized image for their usage.
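To make the sync step concrete: the nightly cron job's core decision is just "what does the published GSA catalog have that this regional Glance doesn't?". Here's a minimal sketch of that logic — the function name, the name-to-checksum catalog shape, and the sample values are illustrative assumptions, not the actual tooling:

```python
def images_to_pull(catalog, local_glance):
    """Given the published catalog and the images a regional Glance
    already holds (both as name -> checksum maps), return the image
    names this region still needs to download."""
    return sorted(
        name for name, checksum in catalog.items()
        if local_glance.get(name) != checksum
    )

# A freshly re-published base and a brand-new customization both differ
# from what the region has, so both get pulled; the RHEL image is current.
catalog = {
    "ubuntu-14.04-base": "c3ab8f",   # re-published this 30-day cycle
    "rhel-7.2-base": "2fd4e1",
    "ubuntu-14.04-java": "9b71d2",   # new customized image
}
local_glance = {
    "ubuntu-14.04-base": "7d793a",   # stale checksum
    "rhel-7.2-base": "2fd4e1",
}
print(images_to_pull(catalog, local_glance))
# -> ['ubuntu-14.04-base', 'ubuntu-14.04-java']
```

Anything the comparison flags gets downloaded from the global storage area and loaded into that region's Glance, which is why every cloud converges on the same image set within a day of a publish.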
And once again, as you see through here, the job runs the Packer plugin, sends notification through Slack, and then it puts in some of these job params here. The user actually defines what they want in their image in a GitHub repo that we share. They define what they want on the image, and then that gets pushed as the inputs through the Jenkins job to build the image and push it out to the cloud. Question? That is correct, yes. Yeah, there's a customization job that can run if they want something separate. Because at the onset, we didn't want to hit every permutation — we wanted to provide a base image, and then it's up to the developer what he or she wants. But we knew there was the need, because of course it's not going to just be straight-up Ubuntu; they're going to want other things. So we have the ability for them to do that. It's all user-initiated. They go through and define what they want in Jenkins, Jenkins automates that process, and 20, 30 minutes later they have that image in their OpenStack tenant. Was there another question? Oh, I'm sorry. Go ahead, yes. Go ahead. Sorry. Yes? You mean for updates? If you assign a floating IP — well, I mean, it can NAT outbound, right? So if it's in there, it can pull updates to the base. The thinking was to keep it so that for 30 days it's compliant — that's our policy. And then after 31 days, if there were patches, they do have the ability to pull patches down, you know? Yes? Yeah — if you want the latest version of whatever software is customized on your image, you'll have to go through and recreate the image. Okay. Wait, sorry — still working? Yeah. So, a couple of slides back, you had a security scan somewhere? Yeah, right over here. So when the customers come in and they modify the image to add whatever they want, do you go back into that flow and rerun the scan on that custom image? Yes, yes.
So if they install Apache, or if they install some down-level version of something — which we don't want them to do, and which we ensure they don't — then the scan would pick that up. And regardless, once they have that software installed on their image and they're out there testing, after a certain cycle there's a scan that our security system has to run, at a certain point in time, on that instance to ensure that they're compliant. Well, we do — yes, we do. We ensure that what they get is a compliant image, and the record in our security system indicates that at the date of manufacture this image was made compliant. Moving forward, once they run it with a nova boot, it's always compliant while it's out there. When they provision, they know that they're getting an image that is fully compliant with our security specifications. Sorry — there was a gentleman right there who had a question. Is it 30 days, Steve? The question is about versions — the time period we keep the image? Yeah, every 30 days we'll update it. So we tend to keep, like, N minus one — you're talking about major releases — and as long as there's demand for a version and it's still being supported, we'll provide it. Yeah. I mean, it's definitely a management nightmare. Your management cost is going to be a function of the number of images, so we try to minimize that number while still maximizing what users need. Question? Yeah — the previous slide, where you start with the ISO? Yes. Is that a typo, or is it manual? Yeah, all right — I'm extrapolating. Yeah, but it's only done once. So as you see right here, the major OS release is done once, and then once that's in the repository, it's patched on top of that. So we don't have to do that again.
So, I mean, if 17.04 comes out, they're going to do that manual process once more. But then when 17.04.1 comes — one time per major OS release, yes, yes. Any more questions? I'll just crank along here. I think I'll stand here. All right, so here's a little look behind the scenes at Packer: the JSON template that's used to create the different components. We have the type — what it is, OpenStack — the different regions, the availability zones, and we go down the list: the networking, the floating IP. So this is what gets, excuse me, baked into the image as it gets created. And just another look behind the scenes at the Jenkins job that gets kicked off to run Packer. And then notification, once again, using Slack: whether the job failed or succeeded, it goes into Slack so we can ensure the functionality is there. So on the next slide — in the previous slides we had some IBM Endpoint Manager. If you've heard of BigFix, this is an IBM security tool that, optionally, the end user can install on their image, and IBM Endpoint Manager will go out and maintain patching and compliance. It has a fairly large portfolio of software to do a lot of this work; it's used in IBM against a lot of our images and does a lot of stuff behind the scenes to keep long-lived images current. We'd only use this on a long-lived, floating-IP-addressed instance that's running out there. And it handles more than just OS patching — it'll handle any middleware or software that requires patching. It doesn't have 100% coverage, but it's pretty close. And it's really key to keeping instance-level compliance ongoing for those rare cases where there are long-living, long-running VMs. We tend to try to drive users to minimize the number of long-running VMs — OpenStack is all about dynamic workloads: provisioning when you need to and destroying when you're done. Question? So, it's within our security tool.
The end user has the ability, if they deem it necessary, to have this function run against their machine. If they just want to set it and forget it, then that will go off and do the patch management. As for the change-control part of that — well, that's what IT SaaS is for. The security tool, IT SaaS, will basically make a notation that on this day, IBM Endpoint Manager applied these patches to this instance. So in case of an audit, we basically have the historical record of how this instance was patched. So that change gets recorded on the endpoint when the IBM Endpoint Manager action happens. Right. So for these — we classify things a little bit with our security classification — these would be considered test and development machines, where the change-management rigor is not that of a production-facing machine. I mean, if it's a critical patch, they'll get nag notes and it will eventually get escalated — it'll go to their manager, the manager's manager, up through the chain. We have a tool that does the change management, and every image and instance is linked to it. So any patch that needs to be applied — if it's a sev-one critical patch and it doesn't get patched, it'll eventually get escalated to their management. And let's see what else — right. So obviously our Windows machines — our Windows instances, I might say — are managed by a KMS server. They come online and automatically hit the KMS server to register. As people know, with Windows, doing unattended Sysprep and all that stuff can get kind of gnarly; this handles some of the key management for us, which is pretty good. I think it helps a lot of the folks that are using Windows to register these things. And, as you see the numbers there: one KMS, about 150,000 instances. We're basically churning and burning Windows instances when we're doing tests, so we need to re-register constantly.
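The KMS registration step in the bootstrap can be pictured as two commands a Windows first-boot script runs with the standard `slmgr.vbs` activation client: point it at the KMS host, then request activation. A hedged sketch of how a bootstrap could assemble those commands — the host name is a placeholder, and the real setup lives inside the Sysprep/bootstrap process:

```python
def kms_activation_commands(kms_host, kms_port=1688):
    """Build the slmgr command lines a Windows first-boot script would
    run: register the KMS endpoint, then request activation (/ato)."""
    slmgr = r"C:\Windows\System32\slmgr.vbs"
    return [
        ["cscript", "//B", slmgr, "/skms", f"{kms_host}:{kms_port}"],
        ["cscript", "//B", slmgr, "/ato"],
    ]

# Placeholder host, not the real KMS endpoint.
for cmd in kms_activation_commands("kms.example.ibm.com"):
    print(" ".join(cmd))
```

Baking the KMS host into the image this way is what lets thousands of short-lived test instances activate against the one server without any per-instance key handling.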
And so the outcome: we have common, compliant images across the clouds. No compliance cost for 30 days — so you get a 30-day get-out-of-jail-free card, and after that, if they continue to keep the image, they'll have to maintain it within our security tool and patch it. We integrate with our compliance system, which is key, because we can't have machines that are unpatched and insecure running on our network. As you see, we have an automated framework to customize the specific stacks. And the user experience from the image perspective is the same across all geographies, so we have the same look and feel. And the reduced cost associated with that: on average, we're saving around 16 man-hours per week in each location, because we don't have everybody doing it — we have a central team that handles this for all the different geos, removing that duplication of effort across the sites. That frees up the local IT teams to do other things and support the other functions within the team. And here are some numbers about BigFix: some of the different locations it's in within the IBM Corporation, and some of the patches and what it's doing across the different geos — a fairly robust environment there. And — questions? Yes. Yes. It's via a request — it's a Jenkins job that they go through. They log in to Jenkins, to the UI that we have in Jenkins for them. Yep. Right, it's a different menu as opposed to — yep. Other questions? It's a post-process that happens: when the machine is actually built, it has knowledge of the KMS server and it knows where to register. Pardon? I'm sorry — that was through, I believe, the Sysprep preparation, when the image boots. Yeah, it's in the bootstrap process. Other questions? Yes. No, they're always registered within our security system. Because — right, because we actually have jobs.
So we have a listener that listens within the framework of OpenStack — there's a message queue — and it basically says, oh, there's a new Windows instance, I'm going to register it. It has the floating operating system record associated with it; it belongs to this person; it's in there. And each user within the development teams has their own login to the security portal, so it would show up on their list: okay, now you have a new instance within the security portal. I mean, compliance is not just about being safe and secure — there's a lot of emphasis on audits, on making sure everything's documented. Anything that touches the network, we're required to capture: that this VM, by this person, with these characteristics and this OS, touched the network — mostly for audit reasons. It's less about being secure, to be honest. Plus, we want to ensure that we're not going to have any malicious activity on the network — it obviously goes into the product, so we want to make sure there are no bad things going on. Yes. Yes. We don't send our images out there; they're only within the Blue network. They're not out in the software labs. Within the software labs, they have their own methodology of provisioning — if you ordered a server out there, a Red Hat server, they would provision it for you; they have their own provisioning system. And I mean, if you could just get bare metal and install OpenStack, it would be a different story — we could use our own process out there. But if we just ordered bare metal, or if we ordered, like, a Windows, a CentOS, or a RHEL machine out there, they'll automatically provision it for you. They have their own Satellite servers, their own yum servers, their own Kickstart servers for all that stuff.
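The listener just described is essentially a consumer on the OpenStack notification queue that reacts to instance-create events for Windows images. A simplified, illustrative sketch of the per-message decision it could make — the event and payload field names follow Nova's usual notification shape, but the exact fields and the registration handoff to the security system are assumptions, not the team's actual code:

```python
def should_register(notification):
    """Return the (instance_id, owner) to register with the security
    portal, or None if the event isn't a new Windows instance.

    Assumes Nova-style notifications: an 'event_type' string plus a
    'payload' dict carrying instance and image metadata.
    """
    if notification.get("event_type") != "compute.instance.create.end":
        return None
    payload = notification.get("payload", {})
    if payload.get("image_meta", {}).get("os_type") != "windows":
        return None
    return payload["instance_id"], payload["user_id"]

# Example message with made-up identifiers.
event = {
    "event_type": "compute.instance.create.end",
    "payload": {
        "instance_id": "1a2b3c",
        "user_id": "jdoe",
        "image_meta": {"os_type": "windows"},
    },
}
print(should_register(event))
```

Everything the function returns would then be handed to the security-system registration job, which is how each new Windows instance ends up on its owner's list in the portal without anyone filing anything by hand.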
Today it's separate, but we've got a work item to try to host SoftLayer images — public cloud images — within our on-prem cloud, so they can literally port them over. But today it's recreate-the-workload in the public cloud; we don't really have a good workload-portability method, whether it's VMs or containers, from on-prem to public cloud. It's more about enabling the connectivity between the two. Any more questions? Well, thank you for your time. I hope you enjoy your event tonight, and thanks for taking the time to listen to us.