Hi, everyone. Welcome to DEF CON. We're going to talk about something that has been talked about before, but not in ways that I think have been as useful or as helpful as they should have been. We're going to look at the CRASHOVERRIDE event from 2016, but reevaluated not just in terms of what happened, but based upon what was involved in the environment and what the adversary was actually trying to do, which is a protection-focused attack on electric transmission.

So, first off, who the hell am I? My name is Joe Slowik. I have an interesting background in that I do not have a computer science background or whatnot. I actually dropped out of philosophy graduate school at the University of Chicago way back in 2005. Since then, I've bounced around. I was a U.S. Naval officer going back to 2009, then went to the Department of Energy, working at Los Alamos National Laboratory for a few years, and now I work at Dragos, which is an ICS security product company. We've got some stuff set up over here, and we can meet there after this if anyone wants to ask questions. Through all of that I've transitioned from a more offensive-minded role to a very much defense-focused one over the years. But enough about me, because you all don't care about me; you care about what we're talking about here today.

So first, as a point of background, we'll review the CRASHOVERRIDE event so we're all on the same page, then talk a little bit about what process integrity means in an electric transmission environment within the context of protective relaying for electric operations. Based upon that, we'll reevaluate CRASHOVERRIDE as a protection-focused attack, centered on protective relaying operations, and then look at what the implications of that event are for future defense and future attacker behaviors.

So first off, CRASHOVERRIDE. Everyone knows more or less about the string of Ukraine attacks, just how often Russia has been making Ukrainians feel miserable going back many years now, including back-to-back electric disruption operations in 2015 and 2016. 2016 featured a more automated mechanism to enable that disruption, through a piece of malware called CRASHOVERRIDE by Dragos and Industroyer by ESET, because, God forbid, we all have the same fucking name for things. It was interesting because CRASHOVERRIDE really represented one of the few instances, or so it seemed, of an adversary completing the entire kill chain for an ICS attack operation. What we had was a penetration of the ICS environment from enterprise IT that resulted in the adversary getting sustained, controlled access to the control system environment. From that access they were able to pivot out and gain the ability to implant malware on the SCADA environment, on the control systems for the actual transmission gear, and time it to cause breakers to open at a specified point in time to induce an outage. Paired with that, there was a wipe and system-disabling event, somewhat similar to what happened in 2015, that looked like it was designed to inhibit recovery and make recovery more difficult by wiping project files, among other items, and then rendering the machines unusable.
And then finally, there was an attempt at a denial of service against the Siemens SIPROTEC protective relays in the environment, but no one really bothered to look at that in very much detail, which in hindsight was a very grievous mistake on our end.

But didn't we already fucking talk about this like a couple years ago? And the answer is, yeah, we kind of did. There was a Black Hat presentation on this a couple years ago when this happened, and it provided the highlights, but we really didn't get too deep into it, which is why I've done a lot of work on this since then. I did a presentation at VirusBulletin last year, with a follow-up paper, that went into the methodologies leading up to the deployment and execution of CRASHOVERRIDE, so if you're interested in this topic, I recommend reading that. But this isn't something we just closed the book on, and we shouldn't have closed the book on it. This is really a seminal event for electric utility operations in terms of cyber attack scenarios.

Because while CRASHOVERRIDE is interesting, people sort of discounted it: well, it didn't seem like that much bigger of a deal than 2015. And there were some failures, as we started to get access to actual data and malware samples involved in the event. The event itself targeted hundreds of RTUs for manipulation, whereas it seemed like it only impacted a handful, because a lot of the communication deployed by the adversary either ignored or was ignorant of specific implementations by the vendors that produced the targeted equipment, such as stateful protocol behavior, something my colleague Dan Michaud-Soucy has presented on with another colleague of mine. So talk to him about that afterwards. It all resulted in a smaller impact, so you had a lot of press reporting like, oh, this was just a test, it's kind of a damp squib, no big deal.

And if we look at 2015 compared to 2016, we can see some overlaps, but there are some fundamental differences in how these developed. In 2015, we had very manual interaction with the control systems in question, where the adversaries harvested credentials, VNC'd into workstations, and then manually manipulated gear in the environment. You can pull up the YouTube video of the Ukrainian operator freaking out and taking a video with his phone of the mouse moving and people opening breakers and whatnot. In 2016, and this is a pretty big fucking deal, the interactions, instead of being manual, are encoded in the software itself. The reason that's a big deal is that when you start doing that, and do it properly, you get attacks that can scale. That means I can deploy, set and forget, and do that across tens or hundreds of sites nearly simultaneously, where I don't need one operator per operation, but rather one team deploying in advance for later execution.
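To put one concrete picture on that statefulness point: even the generic IEC 60870-5-104 spec, before any vendor quirks, requires a STARTDT activation handshake before an outstation will accept commands at all, and vendor stacks add their own expectations on top. Below is a minimal, hedged sketch of that baseline state machine in Python; the outstation IP, common address, and so on are made-up placeholders and reflect nothing from the actual victim environment. The point is simply that a client that doesn't track session state gets its traffic dropped on the floor, which per public reporting is close to what happened to chunks of the malware's output.

```python
import socket
import struct

# Hypothetical outstation address; 2404/tcp is the standard IEC 104 port.
RTU = ("192.0.2.10", 2404)

STARTDT_ACT = bytes([0x68, 0x04, 0x07, 0x00, 0x00, 0x00])  # U-frame: STARTDT act
STARTDT_CON = bytes([0x68, 0x04, 0x0B, 0x00, 0x00, 0x00])  # U-frame: STARTDT con

sock = socket.create_connection(RTU, timeout=5)

# Step 1: activate data transfer. A compliant outstation silently
# discards ASDUs that arrive before this handshake completes, which is
# exactly the kind of state handling a fire-and-forget client misses.
sock.sendall(STARTDT_ACT)
if sock.recv(6) != STARTDT_CON:
    raise RuntimeError("outstation never confirmed STARTDT; session not active")

# Step 2: only now send an I-format APDU. This one is a benign general
# interrogation (C_IC_NA_1, type 100) asking the outstation to report
# its point states: field layout per the common spec, little-endian,
# with a 3-octet information object address of 0 and QOI 20 (station-wide).
tx_seq, rx_seq = 0, 0
asdu = struct.pack(
    "<BBBBHBBBB",
    100,      # type ID: C_IC_NA_1, general interrogation
    1,        # variable structure qualifier: one information object
    6,        # cause of transmission: activation
    0,        # originator address
    1,        # common address of ASDU (placeholder station address)
    0, 0, 0,  # information object address 0
    20,       # QOI: station interrogation
)
apci = struct.pack("<BBHH", 0x68, 4 + len(asdu), tx_seq << 1, rx_seq << 1)
sock.sendall(apci + asdu)
print("interrogation sent; a real master now parses the responses")
```

Real masters also have to handle sequence-number bookkeeping and TESTFR keepalives, which is more state on top of this. The gap between "looks right in a protocol doc" and "works against a particular vendor's stack" is exactly where the 2016 operators tripped.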
But then it came to attack impact. In 2015, you disrupt electricity distribution across three companies in Ukraine and then inhibit recovery. 2016 went after electric transmission, which implicitly should be a much bigger impact, but as we'll see in a second, it didn't actually amount to that, and the attempts to inhibit recovery and impact protection systems largely didn't work out as intended. From a success standpoint, 2015 was kind of the bigger deal: from a traditional battle damage assessment standpoint, more customers were impacted for a longer period of time than in 2016. So if you compare the two by any simple metric, it almost looks like 2016 was a regression from what 2015 accomplished. But if we look at it that way, we're missing the point. In a lot of ways, there was silence in the post-attack scenario as far as what was actually achieved and what it meant. There was significant attention to the malware from a sexy reverse engineering standpoint (whoa, we got ICS malware, let's see what it does) and to the immediate impact (oh, the lights went out in Ukraine in December again), but very little exploration of what CRASHOVERRIDE really meant in terms of possibilities and intentions, irrespective of what the adversary was able to achieve. And that's what we're here to talk about today.

Because when we look at the post-attack comparison, things are actually quite distinct between what was attempted in the two events. In 2015, we had some asshole actions like modifying the firmware on serial-to-Ethernet converters to disrupt communication, then deploying a fairly commodity wiper built on the KillDisk framework to wipe workstations and HMIs for the purpose of inhibiting recovery. So: manipulate the distribution environment, wipe the workstations, and back off, and the Ukrainians will start buzzing around, although they went to manual restoration to get operations back online. And there's a chair on the aisle open right there in case anyone needs a seat. In 2016, things worked out a little differently, because we had a file- and service-targeting wiper on the impacted SCADA workstations. But given how it was launched, and the greater context of the event itself, it wasn't so much that the adversaries were looking to inhibit recovery; based on the workstations that were targeted, it produces a loss-of-view and loss-of-control condition on the ICS monitoring environment, which we'll get into in a second. And then finally, an attempted protective relay denial of service.

So I've said "protective relay" a number of times, and unless you're a power engineer or work at a utility, you're probably wondering, what the fuck is he talking about? Protective relays get into an aspect of process safety, protection, and fundamental process integrity in industrial control environments. On process integrity: you'll hear people say availability is the most important part of the CIA triad in ICS. That's bullshit, and I will happily arm wrestle or argue with people out in the hallway about this one, because no sane operator will operate, generate, produce, whatever, if they can't verify the integrity and safety of their operations. When we talk about what's actually happening in my environment, we're talking about validating outputs, so that the widgets are actually coming out the way they're supposed to.
We're talking about long-term operational stability: that I know I'm going to fall within scheduled maintenance windows, and that process wear and tear is going to be around what I'm expecting. But most important of all is safety: that I know I'm not going to over-pressurize a line, or overcurrent a line in my electric distribution or transmission environment, so that I can maintain the overall integrity of that operational environment and either not cost myself a lot of money or not potentially kill someone.

Because the thing is, in the popular conception of ICS attacks, what gets posted in crappy New York Times or Bloomberg articles, everyone wants to look at an ICS attack as: you turn off the power, or you blow up the plant, or you destroy centrifuges. I'm going to have a much deeper dive into all this coming up in October for those really interested in this topic, because the thing is, that's amateur hour shit. What you're really after, for a very potent cyber attack in an ICS environment, is a more subtle effect where you undermine operator integrity and understanding of what's going on in the control system environment, leading to potentially catastrophic and difficult-to-diagnose-and-fix outcomes. So: degrading the process into a hard-to-diagnose condition. Why are my widgets not coming out like they're supposed to anymore, or why are the centrifuges failing at an increased rate? How do you investigate that if your environment lacks integrity in monitoring? Or inducing defects or a lack of reliability in the environment, or, as we saw attempted in 2017 with the TRISIS (also called TRITON) event, targeting process safety within the control system environment.

So how does that relate to electric transmission operations? We don't have spinning turbines or oil and gas flaring or whatever, but we do have lots of power flowing over lines, and one of the major revolutions in electric operations over the last 20 to 30 years is the introduction of digital protective relays, which autonomously, and at machine speed, monitor and control electric transmission operations to prevent a fault in the line, due to overcurrent, phase out of whack, or other scenarios, from resulting in physical damage or potential cascading outages as defects propagate within the grid environment. (I'll sketch the basic logic these relays implement in a second.) If we look at the implications of targeting protective relays as part of overall electric operations, primarily at a transmission or generation level, it's that we can create a hazardous situation for personnel and equipment. You can burn through transformers because you have too much power flowing, or cause lines to sag because of that overcurrent situation, resulting in a fault when a line hits a tree. You laugh about this, but that's exactly how the 2003 Italian and Northeastern blackouts happened: lines were overloaded, sagged a little bit, hit a tree, and then it cascaded on from there when individual stations started going into self-protection mode.
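For those who've never touched a relay, here's a quick hedged sketch of the workhorse function, inverse-time overcurrent protection (ANSI 51): the further above its pickup setting the measured current goes, the faster the relay trips the breaker. The curve constants below are the published IEEE C37.112 "moderately inverse" values; the pickup and time-dial settings are arbitrary illustration, not anything from a real environment.

```python
# Inverse-time overcurrent (ANSI 51) trip curve, IEEE C37.112
# "moderately inverse" characteristic:
#     t = TDS * (A / (M**P - 1) + B), where M = I_measured / I_pickup
A, B, P = 0.0515, 0.1140, 0.02

def trip_time_seconds(i_measured: float, i_pickup: float, tds: float = 1.0) -> float:
    """Seconds until the relay trips the breaker, or inf if below pickup."""
    m = i_measured / i_pickup
    if m <= 1.0:
        return float("inf")  # no overcurrent: the relay just watches
    return tds * (A / (m ** P - 1.0) + B)

# Twice pickup clears in a few seconds; ten times pickup in about a
# second. Silently removing this protection matters most at exactly the
# moment a line is re-energized into an overloaded system.
for mult in (1.5, 2.0, 5.0, 10.0):
    print(f"{mult:>4}x pickup -> trip in {trip_time_seconds(400 * mult, 400):.2f} s")
```

With that picture in mind: hollowing out the relay is what turns an ordinary overload, which would normally be cleared in seconds, into sustained stress on lines and transformers.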
But you could also induce, along those lines, islanding among affected substations, as every substation enters an every-man-woman-and-substation-for-themselves scenario to protect local assets, cutting itself off from the rest of the grid. That sheds more load onto the remaining infrastructure, which puts even more stress on it, producing situations where another fault occurs and things cascade onward from there. Or, the really scary scenario if you start getting into protective relays at generation: manipulation of protective relays in a generation environment gets you into a spot where you can induce an Aurora-like effect on generating equipment. Aurora was a manipulation of frequency relative to electric generation operations that results in a generator rocking back and forth between frequency extremes, inducing physical damage. Again, these are topics we could talk about for hours elsewhere, so we can go into detail later.

In looking at CRASHOVERRIDE, then: we actually had a far more complex attack attempted than what we observed in the environment. Yes, we had an outage, but if you look at the malware itself, the outage was intended to span the entirety of the victim transmission station; the goal was a widespread transmission outage, wiping transmission completely from the targeted substation. After that, we had an attack on the state of the workstations controlling the field devices in question, and not just to wipe them to make recovery more difficult. From 2015, the Russians, or whoever attacked the substation, knew the Ukrainian operators would go out to the field to restore operations, so wiping the workstations wasn't going to inhibit recovery; they knew what the Ukrainians would do. Instead, in this case, the wiper was designed to prevent the operators from accurately gauging and understanding the state of the system at the time of the outage, while they were rushing to repair it.

Thus, right after that wiping operation takes place, the Siemens SIPROTEC denial of service that was criminally ignored in past analysis becomes very important. The way the SIPROTEC denial of service works is that it abuses a semi-legitimate function designed to place the victim SIPROTEC into a firmware update mode. You clear the underlying logic from the device, but from physical observation it still appears powered on, network accessible, and for all intents and purposes functioning, unless you actually have view into the device itself, which, with all of your workstations no longer accessible, you don't. So we have this removal of transmission protection on a de-energized line. The immediate effect is nothing: I have no power flowing through this line, I don't need protection, big fucking deal. But that loss of view makes it difficult to ascertain that the protection is lost, so in the rush to restore operations at the transmission site by flipping those switches in the breaker yard, you've now created the conditions for potentially hazardous effects on an unprotected line that you're re-energizing.
An immediate potential impact would be an overcurrent situation: as the line is restored in the middle of a massive transmission outage across multiple sites, it becomes so overloaded as to produce either a sag event that results in a fault as the line contacts something grounded, or overloading of transformers within the transmission station that produces physical damage, or increased wear and tear leading up to physical damage. Longer term, because there are, as we'll get into, many other things in play here, you at least create the condition where any subsequent fault in the line, while the Ukrainians were working on restoring visibility into the environment, results in similar situations, even if you don't get that immediate overcurrent scenario.

But for all of this, mistakes were made. The attackers were neither brilliant nor as successful as they wanted to be, because, among other items, and we hinted at this already, there was improper protocol communication within the event itself. Hundreds of devices were targeted, but few were impacted, because the specific protocol implementations in play often did not align with what was designed in the malware. So as much as we want to say, ooh, scary modular malware targeting lots of systems, quite frankly the operators fucked up how they designed it, either because they lacked the proper testing equipment, tested on the wrong equipment, or didn't do proper QA on what they were deploying into the environment, including ignoring things like stateful communication requirements for specific protocols, or ignoring the distinction between general protocol specs like IEC 104 and IEC 61850 versus vendor-specific implementations of those specifications.

The SIPROTEC denial of service implementation was really interesting, though, because they got it right: the denial of service, as coded, would have worked. However, the malware included hard-coded addressing for the four SIPROTECs within the victim environment, and in designing the communication from the executable (named DoS.exe in the environment) to the victim SIPROTECs, someone forgot endianness for IP address socket creation. As a result, the IPs were read in backwards, and since they were hard-coded, there was nothing the operators could do at the time of deployment.
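Here's a minimal sketch of that class of bug. Socket APIs expect 32-bit addresses in network byte order (big-endian); if code on a little-endian host builds the address as a native integer and skips the byte-order conversion (htonl() in C), every octet comes out reversed on the wire. The address below is a made-up placeholder, not one of the actual victim relays.

```python
import socket
import struct

intended = "10.15.1.69"  # hypothetical target; the real ones were hard-coded

# inet_aton gives the four octets in network (big-endian) order.
packed = socket.inet_aton(intended)

# Correct: hand those bytes to the socket layer as-is.
print("intended:", socket.inet_ntoa(packed))    # 10.15.1.69

# Bug: read the bytes into a host-order (little-endian) integer and
# write them back out without converting, i.e. byte-swapped.
host_int = struct.unpack("<I", packed)[0]
swapped = struct.pack(">I", host_int)
print("shipped :", socket.inet_ntoa(swapped))   # 69.1.15.10
```

In its native habitat this is the classic C mistake of assigning sin_addr.s_addr from a host-order integer without htonl(); the Python above just makes the swap visible. Because the targets were baked into the binary, there was no fixing it in the field.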
So while we had, and this is an incomplete excerpt of the UDP packet that puts a legacy SIPROTEC (this has since been patched) into a firmware update mode, this is what we get if we look at it in Wireshark: 172.16.something.something. So yes, your APTs are human too. Never forget that.

But again, while we can laugh at the lack of success and what failed to occur, as we look at this in greater detail, based on what was in the environment and what it was designed to do, we get a really scary scenario, much more complex and much more worrying than what we saw in 2015. Now we're in situations that go beyond just trying to turn off the lights for a couple of hours, to: no, I want to induce physical damage at the transmission substation to cause an outage lasting days, weeks, months, maybe even longer, depending on how quickly you can replace the gear in question. And not only that: because of how this environment was laid out, you also create the possibility that Yevgeny or Yuri or whoever the poor bastard is that goes off to the yard to start closing breakers manually puts himself in a dangerous situation when something arcs as he connects it, with everything flowing through the one line after hundreds have been disabled.

There are some caveats to this, though, and this is important to note; it's why everyone talks about the scary ICS scenarios, yet at the same time it's really hard to pull this off effectively, as shown in this instance, because they fucked it up. For one, it's impossible to completely know what ELECTRUM, the adversary involved here, intended as attacks and effects. These are all very educated guesses based on log data and actual malware samples from the environment. So, caveat: this may not actually be how things were designed, but based upon everything that was deployed, the attack flow, and the sequence of events, this is my professional assessment of what was intended. Aside from that, I think a lot of people lose sight of, or ignore, the fact that SCADA and ICS do not exist in a vacuum. Yes, we have an increasingly digitized landscape in how processes are monitored and controlled, as well as for safety implications, but there are also fundamental physical safeguards layered on top of that, which engineers, who are a lot better at what they do than we are at trying to understand what they do, have put in place over years of experience, to make sure that situations like this, if things do fail, at least manifest in a way that's controllable or reasonably safe. So other redundancies within the transmission yard and within transmission operations may have prevented some of the cascading effects sought by the CRASHOVERRIDE event. Thus the same scenario, depending on exactly what equipment you have in your environment and what physical safeguards or backup relaying technologies exist, could play out dramatically differently in Ohio or in Germany than it did in Ukraine. That said, it is important to recognize that while all these redundancies exist, we see multiple examples of just squirrels, poor tree trimming, lightning, high wind, etc. causing massive outages, like the two events in 2003, the Denmark-Sweden outage, and other large-scale power events, where all of these systems are in play, no one's maliciously trying to manipulate anything, and yet shit goes haywire.
Okay, so, future considerations. Really, I only have two minutes left? Oh shit. Okay. Future considerations: we're seeing integrity attacks on the rise. We've gone from Stuxnet to CRASHOVERRIDE to TRISIS; these things have rolled out over time, and the possibilities include various ways of manipulating manufacturing, electric generation and distribution, and oil and gas production. From an electric utility standpoint, we get various scenarios that play out across safety, integrity, and reliability within the electric grid, scenarios that are both hard to diagnose and, unless you have proper monitoring and defenses in place, hard to defend against. There are some examples of attacks here; I could explain these further, but I thought I had 30 minutes, not 20, my bad. Looking at this from a defense and detection standpoint, the important thing is that these are not attacks we're going to identify or defend against just by throwing some blinky box into the environment and hoping it catches bad shit because it looks weird. Instead, we need mechanisms that align traditional IT-based visibility, increased host-based visibility into increasingly digitized environments, and fundamental knowledge about both how the process operates and what state it's in.

So, for the CRASHOVERRIDE event... done? Thank you, Sydney. Great, I can pontificate on this a little more. If you look at CRASHOVERRIDE, how it played out, and the state of the environment at the time of reconnection, the reason this attack was enabled is that the operators lacked the visibility and the knowledge about the state of their industrial environment that would let them make intelligent decisions about how to safely restore it. The loss of visibility induced by the attack on the engineering workstations, combined with the massive outage and the loss of protection, put the operators in a position where they were rushing to restore operations, to get back to availability, without understanding what their process integrity was, because their ability to ascertain it had been inhibited by the attack.

The way to get around that, from a security monitoring and defense perspective, is to start taking a holistic approach to how industrial environments operate, combining the security viewpoint with the process viewpoint. In this environment, and I'm not trying to victim-shame here or whatever, if they had been doing things differently or better: detecting that they had seen a significant IT-based event coupled with the control system environment, and then even just having the knowledge that, huh, why do I have this one-off communication to my SIPROTECs at this point in time, should have flagged to the operators: okay, I don't know what state the relays are in right now, I don't know whether someone communicated with my relays or to what end, so maybe I want to make sure that Yevgeny, as he's driving out to the transmission site, holds off a little until we've got a better idea of what's going on. (Below is a toy sketch of that kind of correlation check.)
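Everything in this sketch is hypothetical: the asset list, the alert format, and the assumption that the relay-management traffic worth flagging rides UDP 50000 (the SIPROTEC management port abused by the CVE-2015-5374 denial of service). A real deployment would sit on top of your SIEM, historian, and passive OT monitoring rather than two Python lists, but the decision logic is the part that matters.

```python
from datetime import datetime, timedelta

# Hypothetical inventory: protection relays that should never see ad hoc
# management traffic. In reality this comes from your asset database.
PROTECTION_RELAYS = {"10.20.1.11", "10.20.1.12", "10.20.1.13", "10.20.1.14"}
RELAY_MGMT_PORTS = {50000}  # 50000/udp: SIPROTEC port abused by CVE-2015-5374

CORRELATION_WINDOW = timedelta(hours=48)

def correlate(it_alerts, ot_flows):
    """Pair IT intrusion alerts with relay-bound OT traffic seen soon after.

    it_alerts: list of (timestamp, description) from the enterprise SIEM.
    ot_flows:  list of (timestamp, src_ip, dst_ip, dst_port) from passive
               OT network monitoring.
    """
    findings = []
    for flow_ts, src, dst, port in ot_flows:
        if dst not in PROTECTION_RELAYS or port not in RELAY_MGMT_PORTS:
            continue
        for alert_ts, desc in it_alerts:
            if timedelta(0) <= flow_ts - alert_ts <= CORRELATION_WINDOW:
                findings.append(
                    f"{flow_ts}: {src} -> {dst}:{port} touches a protection "
                    f"relay within {CORRELATION_WINDOW} of IT alert '{desc}'. "
                    "Relay state unverified: hold off field restoration."
                )
    return findings

# Toy data in the rough shape of the 2016 event: an IT-side intrusion,
# then a one-off UDP burst at the relays right around the outage.
alerts = [(datetime(2016, 12, 17, 20, 0), "suspicious lateral movement, IT DC")]
flows = [(datetime(2016, 12, 17, 23, 50), "10.20.5.9", "10.20.1.12", 50000)]
for finding in correlate(alerts, flows):
    print(finding)
```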
This really gets into the question of fundamental cyber hygiene for industrial control systems, which is a combination of visibility, correlation, and root cause analysis. Visibility: you cannot examine, you cannot debug, you cannot otherwise fix what you cannot see. Layering in multiple types of visibility into these environments, as hard and potentially expensive as that sounds, is the only way we get around these issues; CRASHOVERRIDE took advantage of limited IT visibility in a damaged or degraded ICS environment to enable a potential protection-attack scenario. But then correlating those events is neither easy nor typical either, because in many unfortunate cases our engineers and our IT personnel live in separate worlds. Someone's from Venus, someone's from Mars, y'all can figure out who's who, but the sides don't talk or work together enough to make sure that seeing communication to the relays and seeing an IT-based intrusion get combined in a way that enables operators to make intelligent decisions about what's going on. And finally, from an analysis standpoint, it's about being able to go one level further: okay, I have a massive transmission outage. Why? What data do I have available to make subsequent decisions from? How do I gather that data? How do I perform forensics? How do I perform this correlation? Do I have people trained and able to make these sorts of decisions in this environment? It's only by addressing these three fundamental mechanisms that we get to a point where complex, hard-to-diagnose, but also hard-to-properly-execute events can be defended against and responded to.

I don't know if this particular slide deck will be made available, but there are some resources and citations there. As a note, I will have a white paper that goes across all of this, instead of me yelling at you very quickly for 20 or 25 minutes while you're all trying to make it out. It'll be available on the Dragos website on Thursday, I think; I need to check on that. It goes into all of this in greater detail, with more background on protective relaying operations, and lays out the exact attack scenario some more. So if you're really interested in this topic, I recommend looking at that when it comes out, after everyone's recovered from their DEF CON.

With that, I do have time for questions, because Sydney can't tell time. But, um, what's going on? Either no one understood what I said, or... oh, what's up? Ah, yes, the asterisk there. While the interactions were encoded in malware, they weren't properly encoded in all the malware. This is one of those places where differentiating between a lab environment and the field matters: for example, software suites that can emulate ICS communications, so you don't have to go out and buy, or buy off eBay, actual ABB or SEL equipment to test on, don't always function or process traffic the same way as the actual gear out in the field. So while everything may have looked hunky-dory in the lab in Moscow or wherever CRASHOVERRIDE was put together, how it actually worked in the environment was not appropriate, thus leading to a much smaller impact than what was desired. Yep?

Audience question: thank you for a very good speech. For 2015, 2016, CRASHOVERRIDE, you mentioned several times that there were some attacks that were executed. What's the motivation behind it? Clearly it comes more or less from Russia, against Ukraine. But have they been holding back, since we're not seeing anything else, or aren't there more existing
penetrations in other places, getting ready for action? That's a very interesting question. If you attended my Skytalk earlier this week, you know where I stand on this. Okay, so the question was: this seemed like an attack that didn't work, or had limited impact, or whatever; have we seen anything since, and what is the state of this sort of event now? My professional take is that if you've paid attention to the news from 2017 onward, you've seen sustained, very deliberate efforts to build up access to and knowledge of electric utility operational environments in the United States, the United Kingdom, Germany, and a few other places, and based on U.S. government analysis it ties back to Russian state interests. Yet we haven't seen a CRASHOVERRIDE in Iowa or Birmingham or wherever else in the world. Why is that? Well, I think what we're seeing is that Ukraine represents a permissive environment for operations. Unfortunately for the Ukrainians, no one's coming to their defense or saying to the Russians, you guys can't do that anymore, there'll be consequences. That sucks for them. It's good for us on our side, though, because if someone caused a massive blackout with physical damage that could be traced back to a cyber attack in New York City, I'd like to think there'd be consequences. But what we are seeing are adversaries developing all the prerequisites, in terms of access and environmental knowledge, such that at a time of their choosing in the future, if that access is maintained and that knowledge is accurate, something like a working CRASHOVERRIDE could be developed and deployed in those environments to facilitate an attack. That's an entirely separate topic; we can talk at the bar for a couple hours on that one, it's a fun subject in my mind. I hope that answers some of your question. But yeah, we haven't seen people go away; we just haven't seen people complete the kill chain through to a final disruptive effect. Okay, anything else? Yep?

Audience question: how deliberate do you think it is that, the way it was designed, it left them in a state where, if they weren't careful, they would inflict damage on themselves? Maybe it's kind of their own fault, like, why are you hitting yourself? Do you think that was very deliberate, or...?

Okay, if I understand the question, for everyone in the room: it seems like the attack was designed so that the victims were almost impacting themselves as a result of responding to it. And I think that was very intentional, because based on lessons learned from the 2015 event, the adversaries knew that in an emergency the utilities would work as quickly as possible to restore operations, no matter what was involved, and by inducing a loss of view and loss of control at a logical level, they could take advantage of a degraded environment, so that in the rush to follow those procedures, unintended and unforeseen side effects would manifest, undermining the fundamental integrity and safety of the victim environment. That's why making this happen requires a lot of moving parts and understanding, not just of the technical side of how the utility operates, but also the softer side: what are their TTPs, their policies and procedures for restoration, and how can I make sure the effect I'm delivering
manifests itself in the right way in light of what those procedures are. So that's why, with ICS attacks, when everyone's like, oh look, I found this vulnerability, I'm gonna throw zero-days at this: okay, cool story, bro. That's not how this shit works in reality. Trying to take down an ICS requires fundamental knowledge and understanding of the industrial process in question to get anything beyond a short-term disruptive effect, unless you get lucky.

I think I'm out of time at this point; someone's gonna tackle me. I'll be hanging out right over here and bouncing around the ICS Village, so if anyone wants to talk about this on the side, I'm more than happy to, especially if you bring one of these. So, yeah, just let me know, look for that white paper, and thanks, everyone, for listening to me.