Greetings! My name is Stefan Stevenson-Mo, and today I'm going to discuss how to detect attackers on your ICS network using your own measurements, with a technique known as state estimation.
A little bit about me. I have about eight years of experience in infosec. I started my career with a power utility, a big one in the South. I'm educated as a mechanical engineer, but I've taught myself pretty much everything I know in infosec, and I currently work as a penetration tester for Coalfire Federal.
A little bit of background on ICS attacks. Attacks on critical infrastructure are a thing now, as I'm sure everyone is aware. As control systems continue to adopt the common technologies used in IT, they open themselves up to all the vulnerabilities that normal IT systems are prone to. The big difference is that these systems are hooked up to massive machines, things that can kill people and cause major disruptions to society. So that's what we'll be talking about today.
For the purposes of this talk, I'm going to divide attacks into two types. One is hard and fast, which we will not really be discussing. Think DDoS, ransomware: things that are noisy, cause a lot of chaos suddenly, and are very obvious. Instead, we're going to be talking about low and slow attacks. Think Trojans, backdoors, anything like Stuxnet, anything where the attacker is trying to be stealthy and remain undetected.
So, traditional intrusion detection. You've got your IDSs, your firewalls, your antivirus, DDoS mitigation, that sort of thing. Those are all commonly used in IT, but most utilities, in my experience, pretty much stop there. A lot of these security products focus on stopping attackers at the IT network before they can pivot onto the OT network, or on making it impossible to pivot by being disciplined about strict segmentation between the two networks, so they can't get through.
The shortcoming of these products and this approach is that it doesn't always work, as we all know. It doesn't really embrace the defense-in-depth strategy. The problem with most of these technologies is that you can't put a lot of them onto OT networks with devices like RTUs, PLCs, and ladder logic controllers. One reason is that they disrupt network performance, and most engineers won't let you put them on there because they'll mess with things like reliability, latency, and jitter on the network. Even if you could put them on, regulators like NERC will often certify systems together as a unit. Any kind of change, like adding an IDS or changing an antivirus signature, could require your system to be recertified.
Is there an answer to this? Is there really any better way? Well, I think there is. It's physics. The basic idea is that the laws of physics don't change, and neither do most ICS systems. They're pretty much static. Once a power system is set up, the physics behind how it works changes very little, and often the system itself changes very little. So it gives you, as a security professional, the opportunity to look at more of a whitelisting model, which I know makes many people cringe. But in this case it's appropriate, because a lot of these systems are very static. You can think of state estimation as a sort of numerical whitelisting model.
So, a little bit of background on state estimation. It's a technique that was originally developed in the power industry around the 1960s. It was not developed for security, because no one was thinking about that back then. It was a way of ensuring reliability on the grid and preventing future blackouts by detecting garbage data more quickly. But under certain circumstances, we can look at it as a security technique.
So the basic idea behind state estimation is this. You take raw measurements from your sensors and run them through a state estimator, which gives you state variables. The state variables feed into your physics model, which is similar to the state estimator, to give you the measurements you would have predicted. You then subtract those from the raw measurements you got and plug the result into a chi-squared distribution function, which gives you a probability that your measurements are either garbage or being tampered with.
There's no way around it: this talk is very math heavy. I had to learn everything about infosec on my own, so infosec people can learn some math now. My chance for revenge.
In more mathematical terms, we have a bunch of measurements, which we write as the vector z, and we are trying to find state variables x that we can describe those z's in terms of. Because this is a numerical method, we need more z's than x's for it to work. If the number of measurements is equal to or less than the number of state variables, this doesn't work, because our system is either fully constrained or underconstrained, and we can't make a prediction about whether our data is good or bad.
To put that another way, we build a system of functions, h of x, made up of our state variables, that give us our measurements. Since we have more measurements than variables, we have n plus m equations and only n unknowns. With more equations than unknowns and noisy measurements, there is generally no single x that satisfies every equation exactly, which means this becomes an optimization problem. For the purposes of this talk, we're going to use the least squares method. There are other ways of solving this as an optimization problem, but least squares is the most widely used. Our goal for least squares is to minimize J, which is nothing more than the sum of the squares of the differences between the expected measurements and the actual measurements.
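For readers without the slides, the setup just described can be written out as follows. This is a sketch in notation assumed here rather than taken from the slides: N = n + m is the number of measurements, n is the number of state variables, and e is the measurement error.

```latex
% Measurement model: each measurement is a function of the state plus an error
z_i = h_i(x) + e_i, \qquad i = 1, \dots, N, \qquad x \in \mathbb{R}^{n}, \qquad N = n + m > n

% Least squares objective: sum of squared differences between actual and predicted measurements
J(x) = \sum_{i=1}^{N} \bigl( z_i - h_i(x) \bigr)^{2}
```

The m extra measurements are the redundancy that makes the check possible; with m = 0 the estimator can fit any data exactly and J carries no information.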
Oftentimes we'll put a weight on each measurement that describes how accurate it is. It's given by one over sigma squared, sigma being the standard deviation of the measurement.
The way we minimize J is the way we minimize anything else: by setting the derivative with respect to the state variables equal to zero. Because we're taking a derivative, we also need the derivative of h, which is the Jacobian. It's nothing more than the matrix of partial derivatives of each h with respect to each of our x variables. I'm not going to go through the entire derivation, for the sake of time, but essentially what you end up with is an iterative method. You have an initial value that you guess, x0, you subtract some delta, and that gives you a next value, x1, which you then feed back in as x0 to repeat the algorithm over and over until the change between iterations is next to zero.
Once you figure out J, you then figure out the probability that the sum of the least squares accurately reflects the data you're getting. You get this from the chi-squared distribution by putting in your J as zeta, along with your degrees of freedom. As you can see on this chart, the more degrees of freedom you have, the cleaner an answer you get. With only one degree of freedom, the curve doesn't really tell you a whole lot. Whereas if you have 10, you get a much cleaner solution, with your most likely case being either close to zero or close to one. You take one minus that probability, which gives you the likelihood that your measurements are consistent with the model. A result near one means your measurements fit almost perfectly, and a result near zero means there's no way those measurements can be accurate, so they are either garbage or have been tampered with. Obviously, for this to work, we want lots of accurate measurements with low standard deviations.
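The chi-squared step can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the tool used in the talk: the function names `chi2_cdf` and `fit_probability` are made up here, and the CDF is computed with a textbook series for the incomplete gamma function rather than a statistics library.

```python
import math

def chi2_cdf(j, dof):
    """Chi-squared CDF, via the series for the regularized lower incomplete gamma."""
    if j <= 0:
        return 0.0
    a, x = dof / 2.0, j / 2.0
    term = 1.0 / a          # first term of the series
    total = term
    n = 1
    while term > total * 1e-15:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def fit_probability(j, dof):
    """One minus the CDF: how plausible a J this large is if the data is good."""
    return 1.0 - chi2_cdf(j, dof)

# Degrees of freedom = number of measurements minus number of state variables.
for j, dof in [(0.5, 3), (4.16, 2), (20.76, 3)]:
    print(f"J = {j}, dof = {dof}: fit probability = {fit_probability(j, dof):.6f}")
```

A value near one says the residuals are about what the sensor noise alone would produce; a value near zero says something other than noise is in the data. With a library such as SciPy available, the same number should come from its chi-squared survival function.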
So, some basics about power, because we're talking about this in the context of power systems, which is what it was invented for. We primarily work in phasors. We don't really work much in the time domain, because phasors make the math a whole lot easier. Otherwise you'd have to work with complicated sines and cosines and stuff like that, which is really messy. You can think of a phasor as a vector that rotates counterclockwise and has a magnitude and a phase offset relative to other vectors. As you can see, Drake says use phasors, don't work in the time domain, because it makes your math a whole lot easier than having to figure out voltage drops across inductors and capacitors using a lot of complicated calculus.
That does have consequences for our math, though. We can no longer use the simple equation power equals voltage times current. That does not work anymore. Instead we have three concepts: apparent power, real power, and imaginary (reactive) power. Your apparent power is your voltage times the complex conjugate of the current, and it breaks into two components. Your real power is what you think of when you think of electricity, and your imaginary power can be thought of as power that is being bounced back into the grid because of things like inductances and capacitances on your load. The beer analogy is often used as a way of thinking about it, and optimizing this is a big part of being a transmission engineer.
So, some basics on how the grid works. The power plant makes the power, low voltage is stepped up to high voltage, it's transmitted at high voltage to a transmission substation, then to a distribution substation, then to small transformers on poles, then to your house. A potential attack that we would worry about on a simplified grid like this is an attacker going after something like a numerical relay.
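As a quick illustration of the phasor arithmetic above, here is a minimal Python sketch that computes apparent, real, and reactive power from a voltage and a current phasor. The numeric values are hypothetical per-unit numbers chosen for the example, and the helper names are assumptions.

```python
import cmath
import math

def phasor(magnitude, angle_deg):
    """Build a phasor as a complex number from a magnitude and an angle in degrees."""
    return cmath.rect(magnitude, math.radians(angle_deg))

def complex_power(v, i):
    """S = V * conj(I). The real part is real power P, the imaginary part is reactive power Q."""
    return v * i.conjugate()

# Hypothetical values: voltage at 0 degrees, current lagging the voltage by 30 degrees.
v = phasor(1.0, 0.0)
i = phasor(0.5, -30.0)

s = complex_power(v, i)
p, q = s.real, s.imag
print(f"apparent |S| = {abs(s):.3f}, real P = {p:.3f}, reactive Q = {q:.3f}")
```

A lagging current, typical of an inductive load, gives a positive Q here. Multiplying the voltage and current magnitudes alone would give only the apparent power, which is why the simple power equation no longer suffices.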
I'm not going to explain how an attacker would do that, because we're all adults here, and malware on something like a numerical relay running Windows CE is not too far-fetched. For our purposes, we're worried about an attacker who is very subtly trying to change a setting in either the potential transformer or the current transformer, which would give a bad voltage or current reading. This could be bad because it could cause a relay to not break when the voltage conditions are dangerous, or it could cause it to break and cause disruptions when conditions are not dangerous. Neither of which is ideal. So we're going to see if we can use state estimation to detect this in a simple example.
This is about the simplest example I could come up with for state estimation. A real-life state estimation would be way more complicated than this. Here we have a generator and a bus, two readings on one relay for real and reactive power, two readings on the other relay for real and reactive power, a voltage reading on the second bus, and a transmission line between the two buses. To make the math easier, we're going to assume that the power company is giving us the voltage at the first bus, so we don't need a sensor there, that the voltage is one, and that we know it with 100 percent certainty. We call this a virtual measurement. We're not taking this measurement; it's just given to us. These are the other measurements that our system is getting, and we want to know if they are legitimate or if they've been hacked.
First we need a set of equations to describe the physics behind this system. Our first z is just the voltage, so h1 equals V2. It's pretty simple.
For the others, we have four power measurements: real power 1-2, reactive power 1-2, real power 2-1, and reactive power 2-1. These are nothing more than the formula we saw earlier, with the real and imaginary components taken respectively. Using Ohm's law, we can get the current without measuring it directly, just by taking the difference between the two voltage phasors and dividing by the impedance of the line. Through substitution we get the power from bus one to bus two, and then we take its real component to get the real power and its imaginary component to get the reactive power at bus one. Then we do the same thing to get the real and imaginary power at bus two.
Now that we have our equations, and we know V1 for a certainty, we can cancel it out, and our equations form this nice h matrix. Now all of our measurements are expressed in terms of two unknowns: the magnitude of voltage two, and delta two, which is the phase offset between voltage one and voltage two. Since delta is just an offset, we can arbitrarily call one of the angles zero, which again makes our math easier. So we only have two state variables that we need to know, as opposed to three.
So now for optimization. As we recall from earlier, our goal is to minimize J, which is simply the h values minus the z values, squared and added together. Substituting the h we just got, it expands out to this, which is big, long, and nasty.
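Since the slides with the equations are not in the transcript, the two-bus model and the least squares iteration can be sketched as below. Several things here are assumptions made for illustration: the line impedance `Z_LINE`, the measurement order `[V2, P12, Q12, P21, Q21]`, the sensor standard deviations, and the synthetic "true" state. The sketch also uses a numerical Jacobian instead of the analytic one from the talk, and writes the update as adding a correction, which is the same iteration with the opposite sign convention.

```python
import cmath

Z_LINE = complex(0.01, 0.1)   # hypothetical line impedance (per unit)
V1 = complex(1.0, 0.0)        # bus 1 voltage, assumed known: magnitude 1, angle 0

def h(x):
    """Predicted measurements [V2, P12, Q12, P21, Q21] for state x = (delta2, |V2|)."""
    delta2, v2_mag = x
    v2 = cmath.rect(v2_mag, delta2)
    i12 = (V1 - v2) / Z_LINE          # Ohm's law on the line
    s12 = V1 * i12.conjugate()        # complex power leaving bus 1
    s21 = v2 * (-i12).conjugate()     # complex power leaving bus 2
    return [v2_mag, s12.real, s12.imag, s21.real, s21.imag]

def jacobian(x, eps=1e-6):
    """Numerical Jacobian of h by central differences (5 rows, 2 columns)."""
    cols = []
    for k in range(len(x)):
        hi, lo = list(x), list(x)
        hi[k] += eps
        lo[k] -= eps
        cols.append([(a - b) / (2 * eps) for a, b in zip(h(hi), h(lo))])
    return [[cols[k][i] for k in range(len(x))] for i in range(len(cols[0]))]

def estimate(z, sigma, x0=(0.0, 1.0), iterations=10):
    """Weighted least squares by Gauss-Newton. Returns (state estimate, J)."""
    x = list(x0)
    w = [1.0 / s ** 2 for s in sigma]          # weights are 1 / sigma^2
    for _ in range(iterations):
        r = [zi - hi for zi, hi in zip(z, h(x))]
        H = jacobian(x)
        # Normal equations (H^T W H) dx = H^T W r, solved directly for two states.
        a11 = sum(wi * row[0] * row[0] for wi, row in zip(w, H))
        a12 = sum(wi * row[0] * row[1] for wi, row in zip(w, H))
        a22 = sum(wi * row[1] * row[1] for wi, row in zip(w, H))
        b1 = sum(wi * row[0] * ri for wi, row, ri in zip(w, H, r))
        b2 = sum(wi * row[1] * ri for wi, row, ri in zip(w, H, r))
        det = a11 * a22 - a12 * a12
        x[0] += (b1 * a22 - a12 * b2) / det
        x[1] += (a11 * b2 - a12 * b1) / det
    r = [zi - hi for zi, hi in zip(z, h(x))]
    return x, sum(wi * ri ** 2 for wi, ri in zip(w, r))

# Synthetic check: noise-free measurements from a made-up true state.
true_state = (-0.05, 0.98)
sigma = [0.01] * 5
z_clean = h(true_state)
x_hat, j_clean = estimate(z_clean, sigma)

# Tamper with the third measurement by 10 standard deviations.
z_bad = list(z_clean)
z_bad[2] += 0.1
_, j_bad = estimate(z_bad, sigma)
print(f"clean J = {j_clean:.3e}, tampered J = {j_bad:.2f}")
```

With consistent measurements the estimator recovers the state and J is essentially zero; shifting a single reading leaves a J that the chi-squared test would flag. Real measurements carry noise, so a clean J would be small rather than zero.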
Then, using the algorithm that we came up with earlier for least squares, and using this weight matrix, which is nothing more than a bunch of weights in a diagonal matrix to make the matrix multiplication easier, we get this for the Jacobian of our h equations.
We now run the algorithm. The nice thing about the least squares algorithm is that it's self-correcting, so you can start with a rough guess and it's going to lead you in the right direction. We start with an initial guess of zero for the angle and one for the voltage. We run it, and for our first iteration we get this for x1 and this for our residuals. While these residuals are low, we can probably do better. It's always a good idea, when you have a nonlinear system, to run the algorithm at least twice. So we use this as our input for our x's, run the algorithm again, and for our second iteration we get this for our x values and this for our residuals. This looks a lot nicer.
We then normalize these residuals by dividing by the standard deviation, and we get this. It's the sum of the squares, so we square all of our normalized residuals and get this. We add them all together and we get J equals 20.76. Now we use that as input for our chi-squared chart, using 20.76 as our input for zeta and our degrees of freedom to pick which line we should follow, and we get something that's up here. For that number of degrees of freedom, that's a pretty clean answer: our data is definitely being tampered with, and somebody is messing with one of the values that our relay is reading.
Okay, well, that's great, but now how do we discover who the culprit is? It's actually pretty simple. We can determine which sensor was bad simply by throwing out measurements and seeing if that lowers our J. If it lowers our J, then it's likely that that
measurement was the one messing this up. But as you can see on this chart, when you throw out measurements, you lose degrees of freedom. So now instead of operating on the three line, we're operating on the two line. If you don't have lots of redundant measurements, you often won't get a clean answer.
Using the previous example, we start with measurements that we think might be bad, and there's really no way to do it other than guessing. We start off with z4 and chuck that out. We get J equals 4.16, and plugging it back into the chi-squared equation, our new probability is 12.5 percent. So it could be that one, but it's kind of unlikely. Then instead we try chucking out z5, and we get 18.21 for J, which corresponds to 0.01 percent from the chi-squared function. So it's probably not that. Then instead we try z2, and we get 1.626 for our J, which gives us a 44.34 percent probability that the estimates now make sense. As I said before, with two degrees of freedom you often won't get a clean solution, but it's most likely that z2 was the measurement that had been attacked or compromised, since removing it raises the probability so dramatically. This tells us that the attacker was most likely manipulating the reactive power reading on bus one. Again, with higher redundancy and cleaner data, we'd get a more definitive answer on which measurement is being attacked.
So, in summary, state estimation. Step one, create a mathematical model of the system whose states you're trying to analyze. Then calculate the least squares estimates for the states you're trying to estimate, based on the given measurements. Sum the squares to give you J. Calculate one minus the chi-squared test output, where v is your degrees of freedom. Then look at the output and see if your measurements make sense.
So, the benefits of state estimation. One, you're using measurements that you already have. That's probably the biggest one. So as far
as defense in depth goes, this doesn't require doing any tricks to add normal commercial off-the-shelf IT security technology onto an OT system, which is a big plus. And if you work in the power industry, you already have these: ABB Spider, GE load flow, Siemens Spectrum. There are lots of commercially available state estimators out there for the power industry that, again, were not designed for security but could potentially be used for security.
It also makes life much more difficult for an attacker who wants to be stealthy. If an attacker is going after your sensors, especially if you have lots of accurate, redundant measurements, they need to be much more careful about how they manipulate those measurements to avoid detection. And a big one, too, is that you can use it on legacy analog equipment by doing some digital conversion.
So, future work, because this isn't really something that's being used for security anywhere that I've seen, and if somebody knows better, please correct me. What would be the best way to build a state estimator? We have ones that are commercially available, but maybe it's a better idea to integrate the data we already have into a data solution like Splunk or Elastic or some other SIEM. Or maybe it's better to build it as something like a Splunk app or an Elastic Stack app. There's really no telling. It's a purely numerical technique, so it could be deployed in any number of ways. There are lots of options, so it'll be interesting to see what we discover.
It can be used anywhere. Because it's a numerical technique, for any system that's roughly stable and follows the laws of physics, you can use state estimation as a technique for detecting an attacker. It's an ICS intrusion detection technique, not
just useful for the power industry. The only real limitation is that the system needs to be mostly static. Once things start changing faster than about 10 hertz, you need dynamic state estimation. There's research into that, but it's by no means a well-established technique.
So, is state estimation a silver bullet? Of course not. We're all adults here; we know that there are no silver bullets. It does have drawbacks. It requires a well-understood, documented physical model of the system and lots of redundant measurements. And you're only able to determine whether the data is garbage; you're not able to determine whether the data is malicious. This is why it works best in industrial environments, where you know that your sensors are reliable to a certain degree because the manufacturer has told you the mean time to failure or a certain standard deviation, and the sensor is what it says it is and not something else. You need to be able to trust that your sensors work, to a certain degree, before using this technique.
Additionally, integration of state estimation into security is pretty much non-existent at this point. Most SOC teams that are defending physical systems often do not know the physics or how the actual physical system they're protecting works. Even if they did, there aren't any SIEMs that make it easy to build state estimation into them, and the state estimators that do exist have no easy way of being integrated into security products.
So of course the answer is teamwork. Engineers who understand state estimation and the physics behind what they're doing need to work with the blue teamers. Teamwork and leadership are what fix this problem, because often you already have people who understand this stuff. They just need to work together, and by working together they'll make all of this safer.
So, special thanks to Professor Sockies
at Georgia Tech. I used some of his work in this presentation. And that's it. I hope you've enjoyed this presentation, and I hope you learned a lot.
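As a postscript, the leave-one-out numbers quoted in the talk can be checked with a few lines of Python. The J values below are the ones stated in the talk; the only thing assumed is the closed form for the chi-squared test at two degrees of freedom, where one minus the CDF reduces to exp(-J/2). The names are made up for this sketch.

```python
import math

# J after discarding one measurement at a time, as quoted in the talk.
j_after_drop = {"z4": 4.16, "z5": 18.21, "z2": 1.626}

def fit_probability_2dof(j):
    """1 - chi-squared CDF for exactly 2 degrees of freedom, which is exp(-J/2)."""
    return math.exp(-j / 2.0)

probabilities = {name: fit_probability_2dof(j) for name, j in j_after_drop.items()}

# The measurement whose removal makes the rest most consistent is the prime suspect.
suspect = max(probabilities, key=probabilities.get)

for name, prob in probabilities.items():
    print(f"drop {name}: J = {j_after_drop[name]}, fit probability = {prob:.2%}")
print("most likely bad measurement:", suspect)
```

The results agree with the 12.5, 0.01, and roughly 44 percent figures from the talk, and the ranking points at z2.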