It's Halloween. The maintainers of OpenSSL, one of the most widely used software libraries on the internet, have just announced a critical vulnerability. There are rumors and scraps of information on security blogs. Everybody in security circles seems to be talking about it, but we don't have any hard facts yet. And in my mind, I'm trying to figure out what's going to happen next. It's pretty uncommon for OpenSSL to have an issue like this; it's an example of why we encourage you not to roll your own crypto. They're just less likely to make mistakes than most of us are. And as I'm trying to figure out what's going to happen next, it's bringing me back to the last time I dealt with a serious OpenSSL vulnerability. Heartbleed was one of the most iconic vulnerabilities of my career. At the time that happened, years and years ago, I was building intrusion detection systems in a windowless SCIF at a SIGINT station back home. And I remember the gravity of finding out that millions of servers could have their private keys leaked, their secrets decoded, and their identities stolen. And as I go over the information in these rumors, I'm trying to think: is this one going to be like that one?

So at work, we've got a team ready. I'm checking Slack. I've got other stuff that I'm supposed to be doing, but I can't help myself. I'm just looking at threat intel and these news feeds. I've got what I assure you is a very reasonable number of tabs open, and I'm refreshing the site over and over and over again. And then the site goes down. It just couldn't deal with the horde of traffic from so many people trying to find out what's going to happen next. I switch to another tab. Finally, the announcement comes. As I read the description, I'm thinking to myself: what's vulnerable? How can an attacker control this? What could even fit in an overflow that size? And after all of this work and all of this worry, I'm starting to realize that it's a dud.

So how was your Halloween? How many people here were involved in this? Well, when we started looking at this, we were in the dark. But we can shed some light on it now. It actually started way back in September of 2021, when a new major version of OpenSSL came out. A great presenter once told me never to show code during a talk, so I'm sorry for this. It's probably a capital crime. Just bear with me. This is a bug in a Punycode decoder. Punycode is a way of taking text that's not in the Latin character set, not English, and making it sort of English-ish. It made it more familiar and easier to deal with for the people who developed this software. The decoder takes text and moves it to a new place in memory, and on the highlighted line we can see that if it has written more than the maximum it was supposed to, it stops. Otherwise, it keeps going. This is a pretty common type of bug called an off-by-one error. In other words, it says: don't stop until we've moved one more chunk of memory than we should have. Oops.

So more than a year passes. It's now October of 2022, and the vulnerability is first discovered. My team and the public don't know anything about it at this time, but our omniscient narrator does, so we can go take a look at it. The bug takes a chunk of memory, and in this case one chunk too many, a four-byte unsigned integer, and moves it into another place. It's supposed to fit in that decoded buffer, but it takes up just a little bit too much space. This is called a buffer overflow.
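To make the off-by-one concrete without reproducing the real C, here's a minimal Python model of that flawed bounds check. The buffer size, the guard, and the "memory" layout here are all invented for illustration; the actual decoder is considerably more involved.

```python
# A toy model of the off-by-one, not the actual OpenSSL code.
# "Memory" here is a decode buffer followed by one adjacent slot
# that the decoder should never touch.

MAX_OUT = 4                                  # capacity of the buffer
memory = [0] * MAX_OUT + ["adjacent data"]   # buffer + neighboring slot

def broken_decode(code_points):
    written = 0
    for cp in code_points:
        # The flawed guard: it bails only once we have ALREADY written
        # more than the maximum, a `>` where a `>=` was needed.
        if written > MAX_OUT:
            return False
        memory[written] = cp   # when written == MAX_OUT, this write
        written += 1           # spills into the adjacent slot
    return True

broken_decode([1, 2, 3, 4, 5])
print(memory)  # [1, 2, 3, 4, 5]: the adjacent slot got clobbered
```

With the correct `>=` guard, the fifth write would never happen; with `>`, we move exactly one more chunk than we should have.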
So if the buffer overflows into instructions that the processor will later dutifully execute, then we have what's called a remote code execution, or RCE. On the other hand, if the overflowed memory is just read by something else later, it's probably only going to cause a crash, and we call this a denial of service, or DoS. I'll show a toy sketch of this distinction in a moment. There's an additional failure mode for DoS that was found a little bit after this, where you can fill that payload with a whole bunch of dots, and that makes it much more likely that you can trigger the overflow. But you're only overflowing it with dots, so there's no risk of RCE, only a higher likelihood of DoS. And again, we didn't know any of this at the time.

The very first that anyone heard of it was on October 25th, when the OpenSSL project team announced a new update. They said it was a security update at the highest severity level, which is critical. And right from the start, we had some issues; there were confounding factors that made this harder to deal with. For one, they announced it without giving a CVE number or a name. There was a sort of unofficial nickname, but people were reluctant to use it, so everyone I was coordinating with was just calling it "the OpenSSL update." That made it difficult to communicate because, as it turns out, on the exact same day they were releasing another OpenSSL update for a different version, and it was totally unrelated. It had nothing whatsoever to do with this vulnerability.

A few days after the announcement, things really started to pick up steam. We'd kicked off incident response, queued up work teams, and started asset inventory and dependency checks. We got our first notifications from paid vendors, and the activity and work and effort going into this kept increasing and increasing until we finally hit peak anxiety the morning after Halloween. At this point, there were conversations on social media; I was seeing a lot about it on Twitter. Well, Mastodon was mostly silent, as was the custom at the time, but people were panicked elsewhere. And I think one comment on a security blog actually captured the zeitgeist: this person described it as "nasty, nasty, nasty." Of course, this was before we had any information about what it was or what it did.

So November 1st comes around, and finally we get the details. It's downgraded from critical to high severity, and we're getting a good look for the very first time. We can start to map it out and figure out the defense. Now, most of our defense actually happens to the left or to the right of this point on the timeline, but we want to move as much as possible to the left for next time. With the details out in the open, we can map out how this works. This is a simplification, but it gets the gist of it: an attacker crafts a poison certificate; it gets decoded, and it overflows. If it puts a nasty instruction on the stack, we have RCE. If it puts a little chunk, or a whole bunch of dots, somewhere else, then maybe we have a DoS.

So we've got a roadmap for our vulnerability, but we still need to know what's vulnerable, and I think this is one of the most important parts. For us at Shopify, we had thousands of images, hundreds of thousands of pods, and thousands of people, and we had to find out who owns these and what's in them. We need an asset inventory, and we need to know what those assets are made of.
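Before we get to inventory, here's that toy sketch of the RCE-versus-DoS distinction in Python. The "slot" and how the program uses it are invented for illustration; in reality the outcome depends on how the compiler happens to lay out memory.

```python
# Toy illustration of RCE versus DoS, not how a real stack works.
# One extra write lands in a slot the program will use later; what
# happens next depends entirely on HOW the program uses that slot.
import struct

def use_overflowed_slot(slot, used_as):
    if used_as == "jump target":
        # If the program later jumps to whatever is in the slot, and
        # an attacker chose those bytes, they steer execution: an RCE.
        slot()
    else:
        # If the slot is merely read as data, attacker bytes are most
        # likely nonsense that makes the program fall over: a DoS.
        struct.unpack("<I", slot)

def attacker_payload():
    print("attacker code runs here")

use_overflowed_slot(attacker_payload, "jump target")  # "RCE"
use_overflowed_slot(b"..", "read as data")  # struct.error: the script
                                            # deliberately ends in a
                                            # crash, i.e. the DoS case
```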
So has anyone heard anything about SBOMs lately? Just me, okay. Is anyone sick of hearing about SBOMs lately? Yeah, you're in the wrong room. I think the reason there's this sort of fatigue around SBOMs is that you put all this effort into creating this registry of what your artifacts contain, and then you want to know what these SBOMs can actually do for you. SBOMs aren't new; they popped up eons ago, originally, I think, mostly for tracking licensing. But here is what SBOMs have done for me lately. Before the vulnerability was even described, we had a query, shared by my colleague, that was able to tell us exactly what was affected and exactly where it was running. And this was, as you can see, on October 31st, before we really knew the details of the vulnerability. That is what SBOMs have done for me lately. I'll show a sketch of what that kind of query can look like in a moment.

We saw the scary red map of badness, and now that we know where this is running, we can try to figure out our mitigations. As it turns out, to be vulnerable you need to be running OpenSSL, and there are a lot of alternatives now, like BoringSSL. You need to be running version 3, and there were still lower major versions being maintained and very commonly used, so those weren't affected at all. You need to convince something to decode the payload, which, as it turns out, for servers comes after chain validation, so you would have to already trust a poison certificate before you even start processing this. You also only get that four-byte overflow, and in most cases it lands in memory nothing dangerous uses; depending on how the compiler lays things out, it's possible that something bad could happen, but it's very unlikely. I haven't seen this exploited in the wild at all, and even if you've got those dots, it just crashes the program. It's still a heavy cost for the attacker to get a system to process this, so we can just add more compute power temporarily until we can block it.

So I saw that some of you said you worked on this. How many people at your company were working on it? For anyone here, was it more than 10? More than 20? More than 40? Yeah, the impact of this misalignment is in the time and effort spent. Does anyone have a truly unlimited budget, or unlimited headcount, or unlimited time? Even if you're not constrained by your policies, there are intrinsic limits: you can only interview and select and hire and onboard so fast. And I think we can all agree that the fiscal outlook for tech in 2023 doesn't look like it did a couple of years ago. More companies are expecting us to be more deliberate; we need to justify our expenditures more and more. I think that's what matters most to corporations and investors, but here's what I care more about. There's a more significant risk: the time and effort spent on this could have been spent automating away the things developers are likely to get wrong, the mistakes that lead to security misconfigurations. We could be building out detection for a more severe vulnerability, or we could be working on outreach.

When we see these things, we need to use the available information and weigh the opportunity cost against the risk of compromise. What's the business impact? What's the user impact? In our case, that's millions of merchants who depend on us for their livelihood, and of course there's user privacy to consider. That's why we had dozens of people working on this when it was first announced. And that's why I would urge you, instead of just saying this was a waste of time and trying to YOLO it next time, to prepare ahead of time so that the effort comes before rather than after.
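Here's the kind of query I mean: a minimal sketch in Python, assuming your SBOMs are CycloneDX JSON files sitting in a directory. The paths and matching logic are invented for illustration; the affected range, 3.0.0 up to but not including 3.0.7, is from the advisory.

```python
# Hypothetical sketch: walk a directory of CycloneDX JSON SBOMs and
# flag artifacts that bundle an affected OpenSSL (3.0.0 through 3.0.6).
import json
from pathlib import Path

def is_affected(version: str) -> bool:
    # Affected: 3.0.x with x < 7 (fixed in 3.0.7).
    if not version.startswith("3.0."):
        return False
    try:
        return int(version.split(".")[2]) < 7
    except (IndexError, ValueError):
        return False

for sbom_path in Path("sboms/").glob("*.json"):
    sbom = json.loads(sbom_path.read_text())
    for component in sbom.get("components", []):
        if component.get("name") == "openssl" and is_affected(
            component.get("version", "")
        ):
            # In practice you'd join this against deployment data to
            # see where each affected image is actually running.
            print(f"{sbom_path.name}: openssl {component['version']}")
```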
And just to be very clear, I don't hate OpenSSL at all. I think they actually did a really good job on this. They announced the patch very quickly after it was reported, in a release of its own. At first it seemed like it could cause an RCE, and they responded to new information by dropping the severity to high. If you were frustrated by the way this was handled, I would urge you to consider what's happened in the past when other vendors have minimized the impact of a vulnerability after it was discovered or reported, and how disastrous the consequences of that were.

The most important thing, if we want to reduce the effort after these things come out, is to have an upgrade ritual at a regular cadence. I like automation; something like Dependabot is great here. You need to have exception lists, because there are going to be things that you aren't able to upgrade. Review those often, and include the stakeholders in that process so that they are continually justifying the existence of this out-of-date software. Track the time since the last update for all of your assets. If you can do that, you can look for the outliers, and you know exactly where the pain points are going to be next time; I'll show a small sketch of that idea below. Then map out special mitigations for the exceptions.

I really like old software at home. I do not like old software at work. In a previous role, I was consulting for a company that had an electron microscope, and it only had drivers for Windows NT. I hated that machine. But there was no way I was going to convince this client to get rid of it. It had proprietary drivers, the company that made them was out of business, and they're not going to throw away a perfectly good electron microscope because Shane told them to. You might have a different example. Maybe your boss insists on reliving their glory days on the slopes of SkiFree or playing Chip's Challenge. Maybe they insist that the Windows Entertainment Pack just isn't the same unless it's running on a 486 with Windows 3. As security practitioners, we should try our very best to get people to do the best thing. Sometimes we can only convince them to do something. Like, hey, maybe get that machine off of the company network. Track what they're doing and why.

And usually I would encourage you to build detection for any vulnerability that comes out. I still think that's a good idea, and my go-to is usually Falco. Falco is great, but it's only going to read the first 80 bytes of a buffer by default. You can modify the snaplen with an option to change that, but it's probably not worth the additional overhead in this case. Instead, let Falco look for other suspicious activity from existing or as-yet-undiscovered vulnerabilities, just not this one. If you've got some sensitive system that you can't upgrade and you really need to mitigate this threat, or you're getting hammered with a DDoS from those dots we talked about earlier, you can set up a Suricata IPS, use another eBPF detection tool that's geared toward network traffic sniffing, or just use Fluentd or something like it to aggregate logs from affected systems and then alert on that poison payload when it's detected; a sketch of that follows below as well.

We saw this timeline, and we can replace the dates on it with different ones for just about any vulnerability, and we're going to do this over and over and over again.
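First, the "track time since last update" idea. This is a minimal sketch; the inventory, the dates, and the 90-day threshold are all made up for illustration, and in practice this data would come out of your asset inventory system.

```python
# Hypothetical sketch: given an asset inventory with last-patched
# dates, surface the outliers that will hurt you next time.
from datetime import date

inventory = {
    "storefront-api": date(2022, 10, 28),
    "legacy-billing": date(2021, 3, 2),
    "edge-proxy": date(2022, 9, 15),
}

STALE_AFTER_DAYS = 90
today = date(2022, 11, 1)

for asset, last_update in sorted(inventory.items(), key=lambda i: i[1]):
    age = (today - last_update).days
    if age > STALE_AFTER_DAYS:
        # These are tomorrow's pain points: put them on the exception
        # list with an owner, or schedule the upgrade now.
        print(f"{asset}: {age} days since last update")
```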
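And second, the log-aggregation alert. Here's a hypothetical last-resort detector, assuming your aggregator can pipe log lines containing certificate fields to a script. The patterns are rough heuristics for suspicious Punycode, not a precise signature for this CVE.

```python
# Hypothetical sketch: flag log lines with suspicious Punycode labels.
import re
import sys

# Very long xn-- (Punycode) labels are rare in legitimate certificate
# fields, and the DoS variant leans on long runs of dots. Both
# thresholds here are guesses; tune them against your own traffic.
LONG_PUNYCODE = re.compile(r"xn--[a-z0-9-]{64,}", re.IGNORECASE)
DOT_RUN = re.compile(r"\.{16,}")

def suspicious(line: str) -> bool:
    return bool(LONG_PUNYCODE.search(line) or DOT_RUN.search(line))

for line in sys.stdin:  # e.g. piped from Fluentd or your aggregator
    if suspicious(line):
        print(f"ALERT: possible poison payload: {line.rstrip()}",
              file=sys.stderr)
```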
There are definitely things that we'll need to do again on the right side of our timeline, but the inglorious work on the left side is much more impactful in addressing these vulnerabilities, and it has the greatest effect on security. So if we want to be prepared for the next threat, whether it's "nasty, nasty, nasty," as one commenter said, or another dud, we can't blame maintainers for not predicting the impact on our environments right from the start. We can't presume to be some sacred group that just issues edicts to developers telling them what to do, and we certainly can't treat every new risk as a sensational surprise. What we can do is map and threat model our environment, build bridges with builders, and encourage developers to follow best practices by working with them to do so. We can start preparing right now. If we do that, we can uphold our responsibility to our employers, our stakeholders, our customers, and the community. We can protect their privacy, we can earn their trust, and we can be responsible with their resources. Maybe, just maybe, we can even preserve our conference budgets.