Okay, sounds like it's time for us to start. Hello, everyone. Welcome to this talk. I'm Igor Bolotin, chief architect of Symantec OpenStack Cloud. With me today is Richard Bush. Hello, I'm Richard. I'm a technical director at Symantec. I was previously at Google doing image work, and now I'm at Symantec doing something similar, something a little different. Let's start. So let's talk a little bit about the environment in which we find ourselves. Like many enterprise companies moving into the cloud for the first time, we find that you start with a lot of existing applications that run on a variety of environments: mostly bare metal, some virtualized, a variety of operating systems, some new, some old, quite frequently very old. You find different Linux flavors, you find different Windows versions, and then the company starts to move into the cloud. Why do they move into the cloud? Well, I'm not going to talk about why cloud here; we all know why. As a first baby step, people move into cloud virtual machines. Maybe they're not rewriting applications yet, maybe just moving into the cloud, because it's easy to get a VM on the cloud. And by the way, you can also consume some cloud platform services, and we provide those services. It's just the first step. The long-term vision? Well, we're not going to talk about the long-term vision here either. But the very first question you ask when you go into the cloud is: who is responsible? Who is responsible for managing all of these different operating systems, all of these different images that you've got on your cloud? If you go to a public cloud, the responsibilities are very clear there. The cloud provider is responsible for the provider images, and that's all. Everything else is the responsibility of the tenants, the users of the public cloud. When we go into a private cloud, that responsibility is not as clear. Why isn't it clear?
Because, first of all, we're all at the same company. And the responsibility for securing our cloud is a shared responsibility. Everyone is responsible: both the cloud team and the users of the cloud. We do need to provide custom images. It's not enough to build just provider images and be done. We need custom images for efficient operation. I can't really go and install hundreds of packages every time I need to build a new virtual machine, so I build images that are customized for a specific project. But then it's a huge overhead for everyone to maintain these images. Can you hear me, actually? Yeah, okay, good. Now, new vulnerabilities are discovered, and they're discovered very frequently. Just a couple of weeks ago another vulnerability was discovered, and another vulnerability needed to be fixed. And then everyone is scrambling; everyone is going and trying to patch VMs and trying to patch images. We have different needs in different areas of our cloud. In development, we sometimes need things that we cannot allow people to use in production. And all of that brings us to the point that the responsibility really is shared. And all of these tools and processes, all of this governance, needs to be provided as a service to our users. If we don't provide it as a service, it means the users need to do it themselves. Not the best thing in a private cloud. So now we try to figure out the governance model. And there are a couple of extremes we can go to. We can choose not to do that much governance; the governance can be a little relaxed. That's what typically happens when the cloud team is focused on building a developer cloud first: a cloud for developers, a cloud for new applications to be built, without a lot of attention paid to the security and governance of that environment. But there are risks associated with it. Everyone goes and starts building their own images.
Every developer has a preferred operating system to work with. And yes, I need that specific version of the operating system. So you start to see hundreds of different images doing different things. There is no established process for patching or for scanning, and very frequently developer VMs are just not scanned and not patched. The worst thing that happens is you put applications on them, and now you have publicly accessible systems that are vulnerable to very basic threats. You get all of the agility, but not that much security. You can go to the other extreme, and been there, done that: the picture is not pretty when you go too far on governance. And by the way, that's what you get when you focus on production first. You get a lot of attention to security. You get really hardened systems. You have very few images, and users are only allowed to use those images. No custom images. You don't get any root or sudo access on the VM that you created. You run some very strict configuration management tools, and by the way, they overwrite everything you do that is not supposed to be on the VM, which makes it extremely difficult to develop in that environment. It's too hardened. It's too difficult. It's impossible to work with. You get security, but you lose agility. So how do we build a system that doesn't fall into either extreme, neither too lax nor too strict? How do we build a balanced system that helps us achieve agility together with security? That's the question, and that's the approach we're trying to take here at Symantec with Symantec Cloud. These are the components we're trying to address for the governance of images in the private cloud. It's about defining clearly, for different types of images, who owns each image and who is responsible for maintaining it. It's about defining very clearly who can and cannot do what needs to be done with these images.
Upload, download, add, delete, make public: you need to define all of that extremely carefully. Capacity management: the cloud is there, but does it have the capacity to store everything that everyone puts in? There are different approaches to building images. You can do it ad hoc: you create an image, you snapshot it, you make it available; you need to make some changes, you make the changes and make it available again. Or you can build it the right way, and we will talk about what the right way is. Vulnerability and patch management, and configuration management: these aspects need to be addressed in order to make the environment secure and at the same time give people the ability to work in it. So let's start with the different types of images. What types of images are we talking about? Let's start with public images. Every cloud has provider images. One thing is very clear: the only public images are the images that we provide. As the Symantec cloud team, we build these images and make them available to everyone in the cloud. The only images that you see as public images are images that we provide. Those are the official images. Everyone can use them. These images are hardened, these images are scanned, they are secure, and we provide them for everyone to use. These are the base images. There are also platform images: images that we use to build our platform services. And people are more than welcome to use those too. They are a little different in what's inside; Richard will talk about this a bit later. Richard will also talk about how we build these images and how we internally open source them. Now, when we talk about our public images, a naming convention is a very basic thing. But interestingly, our users were the first to point out to us that the naming convention we picked at the beginning was not the most convenient one.
At the beginning, we gave them images and we put the version, actually the date, in the image name. We said, here, you know exactly when that image was built, and whenever we publish a new image, we will give you an image with a new name and you can switch. And they said, no, we don't want that. Because that means every time you publish a new image, we need to go and change our scripts and our integrations. We need to go and figure out what the latest available image is. And yes, we need to use your latest image, because that's the most secure one. So we changed that. Now our images have a very simple name: just the name and the major version. You have base Ubuntu 14.04 or base CentOS 7. You don't have all of that complexity in the name, and people can rely on it. On the other hand, we do store information about when the image was built, and what happened with that image, in its properties. Everyone can go and look at what's in the image, but it's no longer in the name. What this also means is that whenever we roll out a new image, we essentially need to replace the previous one. We no longer keep multiple versions of public images in our cloud. We keep one, and it's the latest one. The previous one needs to be archived. What does archived mean? Today it means we essentially take the public bit off; the image becomes private. We don't delete images; deleting images causes other problems. We keep all of the images that we created, and we keep them at least until all of the VMs that were created from these images disappear. Well, are terminated. Once all of the VMs are gone, then we can go and delete the image, and that's the cleanup process that needs to happen. Because at the end of the day, you don't want hundreds and hundreds of images in your cloud. Even if they are private images, it's still overhead.
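To make the naming idea concrete, here is a minimal sketch. The property keys such as build_date are my invention, not Symantec's actual schema; the talk only says the build details live in image properties rather than in the name:

```python
from datetime import date

def public_image_record(distro, major_version, build_date):
    """Stable, date-free image name; build details go into properties."""
    return {
        "name": f"base-{distro}-{major_version}",   # e.g. base-ubuntu-14.04
        "visibility": "public",
        "properties": {
            # Users' scripts key on the stable name; auditors read these.
            "build_date": build_date.isoformat(),
        },
    }

img = public_image_record("ubuntu", "14.04", date(2015, 5, 18))
```

Rolling out a new build then just means replacing the image behind the same stable name, so nobody has to chase a new name in their scripts.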
Now let's talk about private images, because we do have private images in the cloud. Everyone is allowed; everyone can create a private image, and people can share these private images with the rest of the company. Well, not exactly with the rest of the company, because the default sharing that we have in OpenStack is a direct share with some other tenant. If you want to share your image, you go and specify exactly whom you want to share it with. Those are shared images. Now, here's a new capability that is coming: community shared images. These are images that you can actually advertise to the community. You can say, hey, I created this image; this is a great image with some really cool features that my cloud team didn't provide for me, but I can provide it to the rest of the teams in the organization. Community sharing is actually coming in Liberty; it didn't make it into Kilo. We do have it in our cloud, the development on it was completed, but that's a different conversation. So this is coming in the next release: community visibility. You can go and read the blueprint. It's a very well-thought-through blueprint, and it gives us a capability that we really, really need. The important thing, though, is that even though these images are shared with the rest of the cloud, they are very explicitly not maintained by the Symantec cloud team. So when something happens with such an image, it's the responsibility of either the user to figure out what's wrong, or of the image owner to help that user fix the problem. Now, if they come to us and ask for help, we will help, but we might need to ask them to reproduce the problem on an official image, because we can't always help with somebody else's image. That also creates a very different responsibility for patching. It's no longer the responsibility of the cloud provider to patch the images when a new vulnerability comes out.
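The three sharing models, public provider images, direct member sharing, and the upcoming community visibility, can be sketched roughly like this. This is a simplified illustration, not the actual Glance logic, and the dict fields are assumptions:

```python
def visible_to(image, tenant_id):
    """Who can see an image under the sharing models described.

    'public' and per-tenant member sharing exist in Glance v2 today;
    'community' matches the blueprint the talk refers to.
    """
    if image["visibility"] == "public":
        return True                      # provider images: everyone
    if image["visibility"] == "community":
        return True                      # advertised to all, but not cloud-team maintained
    if tenant_id == image["owner"]:
        return True                      # your own private image
    return tenant_id in image.get("members", [])  # direct share
```

The key operational difference is not visibility but maintenance: a community image is discoverable by all tenants while support stays with its owner.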
It's now the responsibility of the image owner to remediate. What we do is define an SLA for remediation. If it's a critical vulnerability, the SLA is very clear about how fast that image needs to be fixed. And if it's not fixed within the defined SLA, the image will be disabled. Well, that's sometimes negotiable. We know that some images cannot simply be disabled, because that could cause bigger problems. But it's generally understood that it's the owner's responsibility to fix it. Now, who can create images? If it's the developer cloud, the developer class of service in your cloud, everyone can build and upload images to their own projects. They can use these images, and they can share them with other tenants, as developers. But when we come to production, you need an explicit role in production to be able to share images, starting from just uploading an image and sharing it in production. Because not everyone should be allowed to do this. Moreover, if you're trying to make an image public: well, we don't allow people to make images public. That's the whole point. The only people allowed to make an image public are the cloud image administrators. That's you, by the way, right? Yes. Okay. Just to be clear. Let's talk a bit about capacity management. The cloud is big, right? We all know it has infinite capacity. Well, it's an illusion of infinite capacity, because in reality capacity is always finite. We only have so much space. We only have so much compute to run our workloads on. And that means that if we give everyone the ability to build, upload, and create images and snapshots and store them, that can grow really fast, and we need to be able to manage it. How do we do it? Well, just as we define quotas for the number of VMs, the number of cores, and the amount of RAM, we also define a quota for the storage that can be used for images and snapshots.
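As a rough illustration of the storage quota idea, byte-based accounting and the field names are my assumption; the talk doesn't say how the quota is actually enforced:

```python
GiB = 1024 ** 3

def within_image_quota(tenant_usage_bytes, new_artifact_bytes, quota_bytes):
    """Check whether a tenant may store another image or snapshot.

    Mirrors the per-tenant quotas for cores/RAM described in the talk,
    applied to image and snapshot storage. Purely illustrative.
    """
    return tenant_usage_bytes + new_artifact_bytes <= quota_bytes
```

A 15 GiB snapshot would be rejected for a tenant with 40 GiB used against a 50 GiB quota, for example.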
But what was also very interesting, what we found, was that defining a quota does help, but what helps even more is defining expiration. So rather than just blindly applying the quota, you can say that when you create a snapshot, it automatically expires after some time. And then it will be archived, cleaned up, removed from the environment. This works really well with VMs. It works really well with snapshots. And yes, of course, you have to give people the ability to go back and mark one: don't expire that one, I really need it. I know I'm not supposed to have this never-expire flag on everything, but I have a VM here that provides some critical service, and I really, really don't want it to expire automatically. The same goes for snapshots: I built that service and I have that snapshot as a backup; I want it not to expire automatically. So we have that capability. We also define different policies for how you clean up. Do you want to clean up when you run out of space, or when you run out of time? Maybe both. What are the priorities? Which images or snapshots need to be deleted first? That requires a bit more choice, and we do provide choice to our customers. We don't just lock them into one way of doing things. You have a choice: you can select whether we automatically delete when it expires, or let it live a little longer if there's space. The important thing is that after a while it's still going to be deleted, and we reclaim the space that we need for other customers, other tenants. Now, another little tidbit on capacity. People join the company. People leave the company. People leave the company constantly. And what you find is that you have terminated accounts, and you need to figure out what to do with the resources that were owned by those people. That includes VMs, that includes networks, that includes images. Now, why is it important to address that explicitly?
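The expiry-with-override idea might look like the following sweep. The pinned flag models the "don't expire this one" override; the field names and the 30-day default are illustrative assumptions, not Symantec's actual policy:

```python
from datetime import datetime, timedelta

def expired_snapshots(snapshots, now, ttl_days=30):
    """Pick snapshots due for archival: past their TTL and not pinned."""
    cutoff = now - timedelta(days=ttl_days)
    return [s["id"] for s in snapshots
            if not s.get("pinned", False) and s["created_at"] < cutoff]

now = datetime(2015, 5, 18)
snaps = [
    {"id": "old",    "created_at": datetime(2015, 3, 1)},
    {"id": "backup", "created_at": datetime(2015, 3, 1), "pinned": True},
    {"id": "fresh",  "created_at": datetime(2015, 5, 10)},
]
```

Here only "old" would be swept: "backup" is pinned and "fresh" is inside its TTL. A space-pressure policy could run the same sweep with a shorter TTL.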
Well, if a guy created an image and shared it with the community, and now it's available for everyone to use and he just left the company, who is responsible now for fixing that image if something goes wrong, or if there is a new vulnerability that needs to be addressed on that image? So the approach is very simple. Either somebody needs to take over the resource, or it needs to be taken down, removed. You can't have resources in the environment, images, VMs, anything, whose ownership is not clear. There can be an organized way of changing the ownership: the guy can transfer ownership to someone else before he leaves. Or there might be a process for when the guy has already left; sometimes termination can be very quick, and then somebody else needs to take over the account afterwards. So a notification will go out. It will go to the guy, the girl, doesn't matter; it will go to the manager, to the owner of the project in which the image resides, and to the owners of VMs that depend on that image. Now, if it's unclaimed after a grace period, remove it. Don't leave it. Because that's actually a risk. It's not just about using up capacity. There is a security risk associated with uncontrolled resources, resources that are not owned by anyone in the company. Let me hand over to you for how we do it. Yes. So we want to set up a continuous integration, continuous deployment system for our images. We want to have the image templates stored in Git, as well as any unique files that might go into the image, the list of packages, all those details, essentially the instructions for how to build the image, all stored in Git, all put under a review system like Gerrit. And then images are actually built by a service account, not by the users themselves; the user interacts with the system to say, look, build me this image. And then the system goes and pulls it out of Git, builds the image, and uploads it to Glance.
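The flow just described, a service account pulling the reviewed template from Git, building, then uploading to Glance, might be sketched like this. The callables stand in for Git, the builder, and the Glance client; none of this is the actual Symantec pipeline code:

```python
def build_image(request, fetch_template, build, upload):
    """CI-style image build driven by checked-in templates.

    fetch_template(repo, ref) -> template dict (from Gerrit-reviewed Git)
    build(template)           -> image bytes
    upload(image, metadata)   -> what the image service returns
    All three callables are placeholders for the real services.
    """
    template = fetch_template(request["repo"], request["ref"])
    image = build(template)
    # Provenance metadata: every published image traces back to Git.
    metadata = {"git_repo": request["repo"], "git_ref": request["ref"]}
    return upload(image, metadata)
```

Because only the service account can upload, users cannot sidestep the pipeline, which is what makes the provenance guarantee hold.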
This way you can be assured that whatever images are there are guaranteed to have come from checked-in source code. So there's no, well, what actually went into this image? And once you have that technology, you can also put all kinds of metadata on the image: the changelist number or the branch that it came from, the time, and other details that might be interesting at some point down the track. You can then also set up an automated build system where every night you build an image. You just say, we're using the same set of parameters, but fetch from the upstream vendor. For instance, if you have an Ubuntu or CentOS image, you say, just build me the new nightly image and upload it to Glance, and that way it becomes available for use. And that way you can keep on top of getting updates from vendors without actually having to do any work. It's the cheap way: for all new VMs, you get the latest updates. Then we want to do vulnerability and compliance scans on the image, and I'll talk about that later. So that's part of the pipeline. You can also do regression testing. So part of this pipeline is that an image is not actually made available until it has passed a number of stages. It has passed regression testing, so you can see: does it boot? Can you log in? The basics. And then you can extend that and say, well, there may be other things. There's a whole set of daemons that may be in the image; can I check that they're all up and running correctly? Then you also have the compliance tests and the vulnerability scans. Basically, these are all gatekeeping points, and an image is not actually available for use until it passes all of them. You can define policies on exactly what level of goodness an image has to reach before you make it available to people.
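Those gatekeeping points can be sketched as a sequence of stages that must all pass before an image becomes available. The shape of the stages, named predicates over the image, is my illustration, not the real pipeline:

```python
def qualify(image, stages):
    """Run an image through ordered gatekeeping stages.

    stages: list of (name, check) where check(image) -> bool, e.g.
    regression tests (does it boot? can you log in? are the daemons
    up?), the compliance suite, and the vulnerability scan.
    """
    results = {}
    for name, check in stages:
        results[name] = check(image)
        if not results[name]:
            return False, results   # fail fast; image stays unavailable
    return True, results
```

An image that fails an early stage never reaches the later ones, and the per-stage results can be attached to the image for inspection.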
Then we can publish images, and actually control the publication so that we can use it, essentially, for deployment. Because you don't want an event where you push an image, it passes your tests, okay, but then there's some danger lurking in the image, some fatal bug that just wasn't caught during testing. And I think you all know that testing doesn't catch all the bugs. And then you make that image available globally across your entire fleet, and then you have some other event that causes a widespread number of VMs to restart. They'll restart with the new image and, well, basically, you're out of luck. So you want to tie the publication into canaries and different zones, to control how quickly this thing can get out if you have something like a fleet-wide VM restart. Setting up this workflow hides a lot of that work from people. So we use this for building our images, and then we can make it available as a service for our customers within Symantec: if you want to build an image, you can use all these technologies and get all these good things for free. So what do we do for our image qualification? We run Qualys, which basically looks for known vulnerabilities; essentially, it's a large database check. And we also run an internal compliance suite to see, basically, have you hardened this image? In many cases it's just: is it sufficiently annoying that it passes what we think a well-hardened image looks like? We think it's going to be secure. So again, with these tests, and the policies that you can define, you can say: okay, publish only if it's, say, at least 80% secure, because you don't actually get perfect security. But at least you can say: we think there are no critical security vulnerabilities, only low-risk ones, so this is good enough to be published.
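A publication policy along those lines, no finding above an allowed severity, could look like this minimal sketch; the severity scale and the finding format are assumptions:

```python
SEVERITY = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def publishable(findings, max_allowed="low"):
    """Publication gate: no scan finding may exceed the allowed severity.

    Encodes the 'no critical vulnerabilities, only low-risk ones' rule
    from the talk; each finding is assumed to carry a 'severity' field.
    """
    limit = SEVERITY[max_allowed]
    return all(SEVERITY[f["severity"]] <= limit for f in findings)
```

Different environments can run the same gate with a different max_allowed, stricter for production, looser for the developer cloud.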
And so you can define that in your policies and make it available, and similarly for the compliance scans as well as the regression testing. And clearly you would not want to publish an image as public until it has passed all these tests. Another thing you need to do is continually rescan all the images. Just because an image was published two weeks ago and it was great then, right now, for that same image, that same set of bits, new vulnerabilities are being discovered, and it's like, oh, okay, we need to patch that. So we need to know there's a problem. We need to always be rescanning our entire repository of published images, which is another reason why you have to do cleanup; otherwise you just have an ever-growing list of images to scan, and scanning can take a while. Once you find there's a critical vulnerability, you mark the images as vulnerable, and then you have to start sending notifications to people. You can present the state of every image in the user interface. So you can say, well, it's basically red, green, yellow, whatever you use for indicating how good an image is. And then both the image owner and anyone running a VM that uses that image need to be notified that there may be a potential problem. Because you're doing scans, and because the whole image building pipeline is fully automated and actually run by a service account, you can attach links to the vulnerability scans for an image to that image's metadata. So if you need to drill down, just how bad is this image? You get the email, you know that you're using this image: well, how bad is it really for me? Then you can go look and say, okay, maybe it's not a problem, or maybe I can live with it for just a couple more days. We can summarize all the results and include those, and we run audits on that. We basically defined an SLA, and we talked about that.
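The rescan-and-notify loop might be sketched like this. scan and notify are placeholders for the scanning service and the email system, and the image fields are illustrative:

```python
def rescan_catalog(images, scan, notify):
    """Periodic rescan of already-published images.

    scan(image)   -> list of findings, each with a 'severity' field
    notify(who, image) models the emails to image owners and VM owners.
    """
    flagged = []
    for image in images:
        findings = scan(image)
        if any(f["severity"] == "critical" for f in findings):
            image["state"] = "vulnerable"     # e.g. shown red in the UI
            image["scan_report"] = findings   # linked from image metadata
            for who in [image["owner"], *image.get("vm_owners", [])]:
                notify(who, image)
            flagged.append(image["name"])
    return flagged
```

The attached scan_report is what lets a VM owner drill down and decide whether the finding actually affects them.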
Essentially, you have a certain amount of time to fix an image that's deemed to be critically vulnerable. So we have a set of steps about who gets notified: image owners, VM owners. They get emails, they get nag emails, and so it escalates to a certain point, and at some point the hammer comes down and the images just get disabled. At the end of the day, we reserve the right to disable an image, and to disable VMs, that we deem may be a risk to the whole cloud. For configuration management, we basically have two flavors. We have the base image, which just works out of the box: it boots up, networking works, you can log in, user accounts are set up through LDAP, and that's it, you're on your own with that image. You manage it yourself, you update packages. That's great for people who want to do it themselves. Then we have another flavor, the opinionated images: the images that we actually use for our own platform services. If we have load balancing services or database services, we need images to run those VMs, and these are the images we use. These are at the moment controlled by Puppet, so there's a whole list of packages that we think you need, a whole set of configuration, and that might not work for some of our customers. So they have the choice: they can go bare-bones base and do it themselves, or they can take our images, and we support both. And then, through the community image feature, people can take base images and extend them, and if they decide to take up the maintenance burden, the support role of helping their fellow employees, they can do that as well. And that's basically it. We have a little bit of time for questions. If your scan finds that an image is vulnerable, and someone has a VM running off that image and they've already patched their own VM, are they still subject to having their VM taken down?
So that's a good question, and we didn't cover that, but implicit in it is that we not only have to scan images, we also have to start scanning VMs. We're not at that point yet, and it's more expensive, but yes, we'll have to do it, and we'll have a similar sort of process. Really, it doesn't make any difference whether it's the image that's vulnerable or some particular VM that has drifted away from what's patched and good. So yes, at some point we'll have to take those ones down too, with notification. It's interesting, because typically what you find is that the VMs get patched first. You find a critical vulnerability, you go and patch the VMs. Quite frequently the images are forgotten in that process, and you end up with every single VM in the environment patched, and a lot of images sitting there unpatched. Then you spin up a new VM and you get back the vulnerability that you thought you had already fixed. So that's why we're more focused on getting the images right. So for private images, unless they subscribe to your full service, how do you actually know whether they are vulnerable or not? We scan the images. For private images, you're referring to. So even for private images, you also scan them? Because I thought the image owner is solely responsible for those images. So while they're responsible for the content of the images, to actually publish an image, even for themselves, requires going through our pipeline, and the pipeline runs all the checks. So, for instance, they couldn't publish an image that didn't pass the basic regression tests. If it didn't boot, they wouldn't be able to upload it to Glance, because they don't have direct access to uploading. That's done through the service account that provides all the gatekeeping roles. Does that make sense? Hi. You mentioned Puppet at the end there. Can you say a little bit more about how you set up Puppet? Do you have one per tenant, or is it managed by the cloud services team?
It's in flux, actually. We have one sort of base Puppet module that we're using for all our machines, and there's a colleague here who has very graciously offered to redo all that. So we're still learning what the best way of using Puppet is, and it's even possible we might not be using Puppet in the longer term. In a previous life I was using a system that actually pushed images, not Puppet, and it worked very, very well, so I'm kind of biased towards that. I have another question. Do you have any particular tool recommendations, in particular for image building and/or image testing? Not the vulnerability one, the functional testing? Yes. For image building, at the moment we're using Packer. Actually, I'm not too fond of that either. I just recently joined and I'm sort of inheriting a bunch of things. Well, it's just really slow. Even to build a basic image takes something like five minutes on a VM, and I can do that in 30 seconds with, say, debootstrap, and it's essentially the same operation. It's just that you're not spinning up a VM, and you're not emulating a human pushing buttons on an ISO installation CD and all that kind of stuff. I can understand why they've done it, but I'm really hoping we can find better ways of doing it. So the tool we're currently using is Packer, but that's just a plug-in; I'll be happy to rip it out and put in something else that's much faster. And you had another question too, I think, about testing. Yes. That's likely to be, at this stage, a framework that we'll write, with plug-ins. Basically a framework where you can say: okay, look, here we've got an image; I need to run it through some kind of testing or scanning regime. So, you know, automatically spin up VMs, provide plug-ins so people can put in regression testing modules, a bunch of which we'll write ourselves, and then also plug into our scanning services, because most of it is the same problem, right?
It's like: I've got an image, I need to put it into a VM and explore it in various ways. I'll lower this. I think you usually unscrew something there and it slides down, or you can pull them out. That's better, thank you. First of all, thank you very much, a very informative session, appreciated. I had a question about auditing and compliance, and whether you can expand more on your experience with auditing. I'd be particularly interested in whether you were subject to any sort of compliance standard, and if you can talk more about it. Also, did you find that the OpenStack services log enough events to help you from an auditing perspective, and what are some of the use cases you ran into? That would be great. On the auditing: we absolutely are in scope for various compliance regulations, and our friends at the Symantec global security office provide us with all of the requirements that need to be met, including the tools that need to be run as part of the scanning in order for us to meet the audit requirements. Now, we also find that the default logs that the OpenStack components produce are not sufficient to meet compliance requirements. However, with the pyCADF, or rather, now it's the keystonemiddleware audit middleware, which we have enabled on our services, we get events that we can collect and store for the audit trail necessary for compliance needs. Okay, thank you. I had a question regarding licensing of the software that's in the images. What's stopping somebody from putting copyrighted material in an image and making it community? And also, on the flip side, for a good use of this: what does the Symantec cloud have for charging, so that if you have licensed software being distributed in your image, is there a way you can charge so that the owners of that license get their money for the instance hours based on that image? Very good question.
We don't charge our users for pretty much anything today; it's a private cloud. However, it's a very good question, and we do need to start thinking about licensed software. We have an internal process at Symantec for controlling what software can and cannot be used, both commercial and open source, and the service teams, the teams who actually work on building the products within Symantec, need to make sure they don't use anything that is not allowed. But it's one of the services that we probably need to start working on.