Hello, DEF CON! We're very excited to have you here. Today we're going to talk about a very long research project we've conducted over the past few years, exploiting OPC UA in every possible way. So, let's start. My name is Sharon Brizinov. I'm a vulnerability researcher at Claroty Team82. I also have a DEF CON Black Badge. Woo! With me is Noam Moshe. He's also a senior researcher on our team. And we would like to say thank you very much to the other researchers who worked on this project, Uri Katz and Vera Mens. So thank you very much. Today we're going to talk about a very long research project we've conducted on OPC UA security. Basically, we researched dozens of OPC UA protocol stacks and different products, and we found some core issues in their implementations. So it's not like we found a one-off vulnerability in one product; rather, we were able to develop multiple attack vectors that exploit multiple products that support OPC UA. All in all, we found around 50 CVEs and we developed 12 unique generic attacks that we gathered into one huge framework, and we're going to release it as open source. We also released a fuzzer, an OPC UA fuzzer, that many vendors are currently using as part of their security development process, and they also found a couple of bugs with it. So I think it was very helpful. And finally, a big thanks to ZDI. At least some of this research was very much incentivized by Pwn2Own. ZDI has its Pwn2Own competitions, and they also have a specific ICS category in which they emphasized OPC UA. So the cash prize obviously gave us a good incentive to research OPC UA, find bugs and get 200K. So today we're going to talk about what OPC UA is, cover some protocol stack implementations, and go over bits and bytes, which is a bit boring. We're going to cover the research methodology: how we approached this project, how we researched different protocol stacks, etc.
And afterwards we're going to show you some cool vulnerabilities and exploits we were able to find, and finally release our OPC UA exploit framework. So let's start with: what is the problem? Why was OPC UA created in the first place? In the past, let's say we had a physical process, in this case a water tank where we want to keep track of the water level. So we have a PLC with some kind of logic that keeps track of the water level using sensors, and these are configured as variables. The sensor is reading where the water level is, and we have a variable that changes during the process. Now, if we wanted to monitor this process, for example from an HMI or a SCADA server, we had to use the specific proprietary ICS protocol in order to communicate with the PLC to read and write these tags, values or variables. And obviously it's not very convenient, because if you want different products to communicate with it, you're very limited. So OPC UA was introduced in order to have a unified way to communicate between different devices and products within a SCADA network. Now everything can communicate over OPC UA and we do not have to be limited by a specific ICS protocol. This is why many vendors joined in, and today most vendors support OPC UA in various ways, from servers to clients to support in PLCs and SCADA servers. So it became kind of the new standard and it's very popular today. In essence, the protocol was created as a unified way to exchange data between industrial devices. We have the server, which stores the variables, for example the water level variable. And we have clients that read or write these variables to monitor the process, or also to alter and modify the process. It's very widely used, by almost any vendor. So let's go over a bit of the history of OPC UA. OPC UA was created by the OPC Foundation, and the first specs go back to 2006.
And it was created based on lessons learned from OPC Classic, OPC DA. So in the past there was a different protocol with a similar purpose, but it wasn't good enough. It lacked platform independence, so it was very tied to Windows, and it was not scalable and not secure. So the OPC Foundation created OPC UA and also created very thorough and deep specifications to make sure everybody is using it correctly, and also to cover all the topics that anyone needs to know about OPC UA. For example, how objects are created in OPC UA is found in a specification, and how security works is also found in a specification. So if anyone wants to implement OPC UA, obviously they need to read the specifications, and they're pretty good, so they're fairly easy to read and follow. Now, to accelerate the use of OPC UA, the OPC Foundation created three main protocol stacks: a Java protocol stack, a .NET protocol stack and an ANSI C protocol stack. Basically, they tried to create some kind of SDKs or core libraries that other products could easily integrate into their software to add support for OPC UA. So in essence, to expedite adoption, the OPC Foundation created the first OPC UA protocol stacks. And today there are hundreds, maybe thousands, of products using OPC UA, from servers to clients to protocol gateways, and they can all be found on the OPC Foundation website. When we started this research we also went over this website and looked at the different products, and it's very convenient to see which products support OPC UA. Now, the problem is that most products rely heavily on the base protocol stacks that the OPC Foundation created. If we look at some of the top products today that implement OPC UA, we can see that many of them are using the OPC Foundation core libraries. For example, the OPC Foundation .NET stack is actually the very first OPC UA .NET protocol stack.
So the vendors created products that use the core libraries and added a little bit of their own touch or their own code to modify them. But in essence they're still using the same protocol stack as the core library. So we wanted to find vulnerabilities in the base protocol stacks, so that we would not exploit just one product; rather, we would exploit all of them, or at least a big portion of them. To do this we created a very long list of different OPC UA protocol stacks and products and we tried to divide them into different categories. We picked different protocol stacks written in anything from C to C++, Python and Java, basically any modern programming language, and we divided all of these products into categories, where each category is based on a very similar protocol stack. This way, if we found a bug or a vulnerability in the core library of a specific category, we could exploit all of the products that use the same core. So before we dive into the vulnerabilities themselves, let's quickly go over how OPC UA is implemented. In OPC UA we have this concept called nodes. In OPC UA everything is a node. For example, our water level variable is a node of type variable and it has its own data type, float. So basically it's a variable that has a value of type float. In OPC UA we also have the concept of namespaces. Namespaces are kind of containers for nodes, and we can have different namespaces for different purposes. For example, we can have a namespace with all the base nodes, and we can create our own namespace that extends different objects for our purposes. Nodes live inside namespaces, so nodes are identified by a namespace ID and a node ID. If we want to refer to a specific node in the entire address space, we need to specify the namespace and the identifier. Now, since the specifications are very, very thorough and detailed, they also tell us how to encode these nodes.
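To make the namespace-plus-identifier idea concrete, here is a minimal sketch of how a numeric node ID might be serialized. It assumes the compact numeric forms of the OPC UA binary encoding (Part 6 of the specification): a leading encoding byte, followed by the namespace and the identifier in little-endian order. The function name `encode_numeric_node_id` is hypothetical and not taken from any particular protocol stack.

```python
import struct


def encode_numeric_node_id(namespace: int, identifier: int) -> bytes:
    """Encode a numeric node ID in the smallest binary form that fits.

    Assumed layout (OPC UA binary encoding, numeric node IDs only):
      0x00: two-byte form   -> 1-byte identifier, namespace is implicitly 0
      0x01: four-byte form  -> 1-byte namespace, 2-byte identifier
      0x02: numeric form    -> 2-byte namespace, 4-byte identifier
    """
    if not (0 <= namespace <= 0xFFFF and 0 <= identifier <= 0xFFFFFFFF):
        raise ValueError("namespace or identifier out of range")
    if namespace == 0 and identifier <= 0xFF:
        return struct.pack("<BB", 0x00, identifier)
    if namespace <= 0xFF and identifier <= 0xFFFF:
        return struct.pack("<BBH", 0x01, namespace, identifier)
    return struct.pack("<BHI", 0x02, namespace, identifier)


# A node in namespace 2 with identifier 1001 fits the four-byte form:
# one encoding byte, one namespace byte, two identifier bytes.
print(encode_numeric_node_id(2, 1001).hex())  # 0102e903
```

A parser on the receiving side has to make the reverse decision based on the first byte, which is exactly the kind of branching a fuzzer or a reviewer would want to exercise.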
So for example, the specification tells us that we can encode the namespace with a single byte and the identifier with two bytes. We actually looked at the specification in order to understand how to implement it ourselves, so it was very beneficial for us to read the specification and understand this. In OPC UA we also have the concept of services. Services are our way to interact with the server, and we can invoke different services on different nodes. For example, if we want to read a variable or a tag, we can use the read service to read its value. If we want to write to it, we use the OPC UA write service. So services are our way to interact with the server, and to implement OPC UA you actually need to implement a lot of these services. To sum up this crash course with our water tank example: we want to monitor this process, so the water level in our OPC UA model will be a variable of type float, and our HMI will continuously read this variable over OPC UA using the read service. Sorry for the boring stuff; let's move on to some more exciting stuff. We researched OPC UA for a very long time, and when we set out to start this research we needed to come up with a plan. We started by buying two Intel NUCs, which are very powerful computers, well, somewhat powerful. And we installed VMware ESXi on those NUCs. We did this because we wanted to install many, many products from different categories: OPC UA servers, OPC UA clients, OPC UA protocol gateways and also different protocol stacks. So we needed a lot of different virtual machines in order to install all of these products so we could research their binaries or review the code. And to do this we started to build our own client.
So we wanted a way to interact with these products, a way for us to kind of poke these servers to reach certain code paths, and we decided very early in the research to build our own protocol stack. We wanted to build a client that we could easily modify or change, and play with the different packets. We also wanted hands-on experience with OPC UA. We wanted to better understand how objects are created and how services are used. So we created our own client to make sure we really understand how OPC UA is used. After building our client, the next step for us was to have some fuzzers running passively in the background. We wanted something to fuzz all of the products that we installed, and we had dozens of products installed in our environment, so we created our network fuzzer. The network fuzzer was based on the boofuzz framework, which makes it very convenient to write network fuzzers. We implemented six different services, including read, browse, write, etc., and we fuzzed them all on all the products. So we had dozens, or even more than dozens, of protocol fuzzers trying to exploit or at least find crashes in all the products that we installed. It was actually very beneficial because it helped us find a couple of bugs. We also released it as open source, so now vendors are using it as part of their security development cycle. We also created coverage-based fuzzers. At one point we found the ANSI C OPC UA stack, the source code of a protocol stack implementation. We took it and created some harnesses with libFuzzer and AFL, and we ran them for a couple of weeks. Unfortunately they did not find any bugs, mostly because Kaspersky did the same thing a few years ago. But it was still very helpful for us because we built up a large corpus, which gave us a way to reach certain code paths within these products.
Finally, we needed a way to control all the running fuzzers. So we built a Slack bot that monitors all the different fuzzers. Whenever we hit a crash, we received a notification and also a screenshot. It was very helpful for monitoring the hundreds of fuzzers running in our infrastructure, and it also helped us understand what's going on and keep getting the status at any time. Next, while all of our fuzzers were running in the background, we wanted to move on and do some manual research. So we turned to the specification and started to look for esoteric and complex features and mechanisms. Basically, we asked ourselves: what will developers overlook when implementing the specification? Just to give you an idea of what that means, let's go over this example, from the message header specification. In OPC UA we have a very strict way to send OPC UA messages, and part of these messages is the header. Inside the header we have a flag saying whether the message we're sending is complete or whether it's chunked. So we can take a very, very big, long message, divide it into chunks and send one chunk at a time. In the header we have a way of notifying the server whether this specific packet is an intermediate chunk or the final one, after which it needs to process all the chunks we sent. So we asked ourselves: what happens if we send the server a chunk and another chunk and another chunk, without ever sending the final chunk? What will happen? Will the server terminate the session? Will the server crash? These are the types of features we tried to research and explore. So let's move on to the cool stuff: the vulnerabilities and exploits. And we'll start with the denial of service scenarios.
Obviously, denial of service is a big deal when we're talking about ICS or SCADA, because shutting down a server might mean shutting down an entire factory. So obviously we wanted to research and explore the denial of service scenarios. For example, what happens if we're able to crash the OPC UA server? And to do this we did not just want to think about a single crash or something that our fuzzer found. We approached it in a very thorough way, I would say, because we defined categories for ourselves, for example uncontrolled memory management, and we followed these categories to find specific attacks within each of them. So when we reverse engineered the code or did code review, we tried to think: what will cause uncontrolled memory management? Or what will cause a thread deadlock? And we followed the code paths that we believed would get us to that point. Take chunk flooding as an example, which I started to describe earlier. As I mentioned, in the OPC UA header we have the chunk type. This is the flag I was referring to, and it has two main values, C or F. C means this specific packet is a chunk and it's part of a longer chain of chunks. F means this is the final chunk, and now you need to process all the previous chunks. If we look at different servers, for example the OPC UA .NET stack, we will see an if statement checking for "is final", meaning: is the flag F? If so, stop receiving more chunks and start processing the entire message. So what we did was simple. We sent a lot of chunks and just never sent the final chunk, and eventually the server crashed. And we used this vulnerability more than once, so it was not a one-off vulnerability.
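The chunk mechanism and the missing limit can be sketched in a few lines. This is a simplified model and not code from any real stack: it assumes only the 8-byte OPC UA TCP message header (3-byte message type, 1-byte chunk type, little-endian 32-bit size) and ignores the secure channel headers that follow it in real traffic. The names `Reassembler`, `make_chunk` and `ChunkLimitExceeded` are hypothetical. The point is the cap on buffered chunks: without the `max_chunks` and `max_bytes` checks, a peer that only ever sends `C` chunks makes the buffer grow without bound.

```python
import struct
from typing import Optional

# Simplified header: message type (3 bytes), chunk type (1 byte), size (UInt32).
HEADER = struct.Struct("<3scI")


class ChunkLimitExceeded(Exception):
    """Raised when a peer buffers more intermediate chunks than we allow."""


def make_chunk(chunk_type: bytes, body: bytes) -> bytes:
    """Build a chunk; the size field covers the header plus the body."""
    return HEADER.pack(b"MSG", chunk_type, HEADER.size + len(body)) + body


class Reassembler:
    """Collects 'C' chunks until an 'F' chunk arrives, with hard limits."""

    def __init__(self, max_chunks: int = 16, max_bytes: int = 1 << 20) -> None:
        self.max_chunks = max_chunks
        self.max_bytes = max_bytes
        self._parts: list[bytes] = []
        self._size = 0

    def _reset(self) -> None:
        self._parts = []
        self._size = 0

    def feed(self, chunk: bytes) -> Optional[bytes]:
        """Return the full message on the final chunk, otherwise None."""
        _msg_type, chunk_type, size = HEADER.unpack_from(chunk)
        if size != len(chunk):
            raise ValueError("size field does not match chunk length")
        if chunk_type not in (b"C", b"F"):
            raise ValueError("unsupported chunk type in this sketch")
        body = chunk[HEADER.size:]
        self._parts.append(body)
        self._size += len(body)
        # The check that a flooded server is missing: refuse to buffer forever.
        if len(self._parts) > self.max_chunks or self._size > self.max_bytes:
            self._reset()
            raise ChunkLimitExceeded("too many chunks without a final chunk")
        if chunk_type == b"F":
            message = b"".join(self._parts)
            self._reset()
            return message
        return None
```

With `max_chunks=4`, four intermediate chunks are accepted and the fifth raises `ChunkLimitExceeded`, at which point a server would typically close the connection rather than keep allocating.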
We actually used this vulnerability against different protocol stacks, because apparently many developers did not think about what happens if they receive a lot of chunks, and we were able to exploit and actually crash a lot of different OPC UA servers. Another example, from our use-after-free category, is method calling from a dead session. It turns out that in OPC UA we have a way to invoke methods remotely. We can configure a method in the OPC UA server. For example, here we have a method that multiplies two nodes, and these nodes are obviously of type integer or float. And we have a way to invoke these methods remotely by sending a method call. Now, we looked at the specification and we noticed something very interesting. The specification actually covers the case where method calls are issued from a session, a lot of different methods are sent, and then the session is terminated. The specification says that the server should not return an answer to the client because the session is terminated. Obviously, we thought to ourselves: what happens if the developers did not implement this correctly? What will happen if we send a very long list of methods, for example 255 methods in an array, for the OPC UA server to start processing, and we terminate our session in between? So what does it look like? We prepare a lot of methods to send to the server. We send all these methods to the server. The server starts processing all the methods, and in between, we terminate our session. Now the server continues to process all the methods, and finally it needs to send the result back to the client. However, the session is gone, and if developers do not implement this correctly, they will try to dereference a nonexistent session, which will result in an access violation.
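The defensive pattern for this scenario can be modeled in a few lines. This is a toy sketch, not code from any real server: the `Server` and `Session` classes are hypothetical, and the `close_after` parameter merely simulates the client closing its session while the batch is still being processed. The idea is that the worker re-resolves the session by ID when it is ready to respond, instead of holding on to a reference that may have been freed, and silently drops the results if the session is gone.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    session_id: int
    outbox: list = field(default_factory=list)  # responses queued for the client


class Server:
    def __init__(self) -> None:
        self.sessions: dict[int, Session] = {}

    def create_session(self, session_id: int) -> Session:
        session = Session(session_id)
        self.sessions[session_id] = session
        return session

    def close_session(self, session_id: int) -> None:
        self.sessions.pop(session_id, None)

    def run_calls(
        self,
        session_id: int,
        calls: list[tuple[int, int]],
        close_after: Optional[int] = None,
    ) -> bool:
        """Process a batch of 'multiply' calls; return True if a reply was queued."""
        results = []
        for index, (a, b) in enumerate(calls):
            results.append(a * b)
            if close_after is not None and index == close_after:
                # Simulates the client terminating its session mid-batch.
                self.close_session(session_id)
        # Look the session up again at response time instead of trusting
        # a reference captured before the batch started.
        session = self.sessions.get(session_id)
        if session is None:
            return False  # session is gone: discard the results, send nothing
        session.outbox.append(results)
        return True


server = Server()
server.create_session(2)
# 255 calls in one request, session closed after the 11th call has run.
print(server.run_calls(2, [(3, 4)] * 255, close_after=10))  # False
```

In a memory-unsafe implementation the equivalent of the final lookup is a raw pointer dereference, which is where the access violation described above would come from.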
So again, this scenario was based on something we read in the specification and thought would be interesting to implement. And by the way, all of these attack vectors are implemented in our exploit framework, which you can access through GitHub, and we'll show you later how to access it. Okay, so denial of service is okay. I mean, we can crash servers in SCADA networks, but that's not a big enough deal. We wanted to have a way to do remote code execution. We wanted a way to control the OPC UA server and maybe modify these tags. So, for example, not just crashing the OPC UA server, but actually changing the water level from zero to 100 and changing how the factory and the physical process behave. This is much more interesting. So we decided to research PTC Kepware. This is very popular software. It's one of the industry-leading OPC UA servers, used in the biggest manufacturing lines, including oil rigs, wind farms, et cetera. It's Windows based, 32-bit, implemented as a service, and we researched this product for quite some time. By the way, we also let our fuzzer fuzz this product for quite some time. And one day, at night, we received a notification from our Slack bot that there was a crash. At the beginning we were very skeptical that it was a real crash, but our researchers, Uri and Vera, started to look at it and they discovered it had something to do with string manipulation. So they dug even deeper into what exactly happened in the process of converting strings, and they discovered something very interesting that we'll cover now. In OPC UA, we need to encode our strings in some way and transfer them over the network. So if we, for example, have a tank ID or a tank location name in our example, then we need to encode the string in some way, as you can see here, and send it over the wire to the OPC UA server.
Now, in OPC UA, all the strings are UTF-8 encoded. But some of the servers use UTF-16 internally. For example, Kepware uses UTF-16 to encode strings. So whenever Kepware read a string from the packet, it tried to convert it from UTF-8 to UTF-16, and that is where we noticed the crash. So there was something wrong in the conversion from UTF-8 to UTF-16. Before I explain what happened, let's go over what UTF-8 and UTF-16 encoding are. UTF-8 is a type of encoding that represents symbols as bytes. For example, the letter A is 0x41. But if we have other symbols whose code is bigger than 0x7F, we need another byte to represent the symbol. For example, this funny looking A, À, is represented as C3 80. So some symbols are represented as a single byte and some symbols are represented as two bytes, or three bytes, or even four bytes. And whenever Kepware received an OPC UA message with a string, it tried to work out how many symbols are within this string. Why? Because it needed to know how much memory to allocate for the UTF-16 conversion. So for example, if we send this string, Kepware will try to work out how many symbols are inside. It knows that 41 means a one-byte symbol, and C3 means a two-byte symbol because it is above 7F. And so it calculates how many symbols we have. For example, here we have one symbol represented with one byte, another symbol represented with one byte, another one with one byte, and then a symbol, this funny looking A, with two bytes, until it reaches a null terminator. Whenever it reaches a null terminator, it stops, and it knows exactly how many symbols there are so it can convert them to UTF-16. But what happens if we send just the C3 at the end? Kepware will think it's one symbol represented with two bytes and will jump two bytes. So let's see how it happens.
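The counting logic described here can be modeled as follows. This is a sketch of the behavior as described in the talk, not the vendor's actual code: the `heap` buffer, the "SECRET" neighbor data and all function names are hypothetical, and the lead-byte classification is deliberately simplified. The unsafe counter trusts each lead byte and only stops at a null terminator; the checked counter works from the string's declared length (OPC UA strings on the wire carry a length prefix) and rejects a sequence that would run past the end.

```python
def utf8_sequence_length(lead: int) -> int:
    """Bytes a symbol claims to occupy, judged only by its first byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    return 4  # simplification: any other lead byte is treated as 4 bytes


def count_symbols_unsafe(memory: bytes, start: int) -> tuple[int, int]:
    """Count symbols until a null terminator; returns (symbols, bytes_walked)."""
    pos = start
    symbols = 0
    while memory[pos] != 0x00:
        pos += utf8_sequence_length(memory[pos])  # may jump over the terminator
        symbols += 1
    return symbols, pos - start


def count_symbols_checked(data: bytes) -> int:
    """Count symbols within a known-length string, rejecting truncation."""
    pos = 0
    symbols = 0
    while pos < len(data):
        step = utf8_sequence_length(data[pos])
        if pos + step > len(data):
            raise ValueError("truncated UTF-8 sequence at end of string")
        pos += step
        symbols += 1
    return symbols


# Our string "AAA" + a lone C3, its terminator, then unrelated neighbor data.
heap = b"AAA\xc3\x00" + b"SECRET\x00"
print(count_symbols_unsafe(heap, 0))  # (10, 11)
```

The string itself is only four bytes long, yet the unsafe walk covers eleven bytes, because the lone C3 swallows the terminator and the scan continues through the neighboring data until the next null byte.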
So we have the 41, which is one symbol, one byte; another 41, one symbol, one byte; another one. And then we have C3. C3 in Kepware's logic means jump two bytes. So it will jump over the null terminator, and we start walking along the heap. This means we keep walking along the heap until we reach another null terminator, and we can leak data. So we actually used this vulnerability to leak a lot of data from the heap. But we were also able to leverage this further. First of all, to leak data from the heap we used the read service. If we want to read variables or tags from the OPC UA server, we use the read service and just specify the node ID and namespace ID, and we can read the tag information. But if the node ID is encoded with C3 at the end, then whenever Kepware tries to convert it, it starts to read data from the heap. So we used it to leak a lot of data from the heap, and this way we were able to leak pointers and defeat ASLR. We also did kind of the opposite. We used the write service in order to write memory onto the heap. For example, we wrote a tag that ends with C3, and whenever Kepware tried to convert it from UTF-8 to UTF-16, we started to overwrite the heap. So now we had an out-of-bounds read to leak pointers and defeat ASLR, and we also had an out-of-bounds write to construct our ROP chain and eventually get remote code execution. Our researcher Uri was able to turn this into a full chain. First, he used the leak primitive to leak a lot of data from the heap. Then he calculated some addresses that were needed to create our ROP chain, overwrote some information on the heap, and finally triggered the bug. He got full remote code execution. Great. So we covered the denial of service scenarios and also remote code execution scenarios in OPC UA servers.
Let's see how to get remote code execution in clients. Okay. Thank you very much, Sharon. Like Sharon said, at this point we had exploited many different products. It started with OPC UA servers, and also some OPC UA protocol gateways, but that's when we thought to ourselves: let's try and look at clients as well. Now, what is the usual suspect, the immediate thing you think about whenever you're talking about exploiting OPC UA clients? Well, it might look like this. We have an OPC UA client connecting to a rogue or malicious OPC UA server. And whenever it does, it tries to read, write, or interact with different tags. And somehow, by returning malicious data, the server is able to exploit the OPC UA client and execute code on it. Now, when we realized this is the attack scenario, we thought to ourselves: so far we had looked mostly at logical bugs and memory corruptions. And these are very, very tough, meaning they take a lot of time to research and fully exploit, and it's not that easy. I mean, just the memory corruption Sharon showed you just now, to fully exploit it on a Windows 10 machine takes months and months, and many of our researchers, like Uri, worked on it for a very long time. However, we thought to ourselves, maybe there's an easier way to exploit OPC UA clients, one that is less relevant to OPC UA servers. And that's when we looked at two very, very popular OPC UA clients: Inductive Automation Ignition and Softing dataFEED edgeAggregator. Now, these are kind of the big names in OPC UA clients. These are SCADA and data servers that also have the functionality to connect to an OPC UA server and read and write tags on it. However, one more thing they have in common is that they are both web based, meaning they are used from a browser: a user connects to the Inductive Automation Ignition server using a web browser. And that's when we thought to ourselves, yeah, web browsers are a little bit easier.
What can we do in them? Well, of course, the main functionality in OPC UA clients is reading, writing, or subscribing to tags, meaning I want to access a variable, so I'll read or write it. That's when we thought to ourselves, let's say this scenario happens. We have our OPC UA client connecting to our malicious server and trying to read a tag, like in Sharon's example, trying to get the water level. Well, whenever it tries to read a tag, our server simply says, yeah, sure, here you go, read the tag. However, instead of returning an actual value like a float number or a string, we return a simple XSS script tag, which the client then takes and inserts into its DOM. Now, whenever it is inserted into the DOM, it's simply executed, and we have the ability to execute code in the context of the client's browser. And as it turns out, both of these products actually have an XSS vulnerability in the tag reading and writing functionality, meaning we were able, as you can see here, to pop an alert in Softing and, of course, in Ignition as well. Now, XSS is pretty cool, but I guess all you can do with it is maybe rickroll the user. I mean, it's not remote code execution per se. That's when we thought to ourselves: how can we take it further and actually leverage our XSS vulnerability into full-on remote code execution on the Inductive Automation Ignition and Softing edgeAggregator servers? Well, as it turns out, we chained multiple vulnerabilities in both servers in order to achieve remote code execution. So, let's take a look at both of these exploitation chains and see how easy it was in comparison to the full-fledged memory corruption vulnerability we showcased before. In the case of Ignition, one of the main functionalities the server offers is the ability to import, basically upload, a new project. Now, in this project, we can set up something called gateway events. What's a gateway event, you might ask?
Well, basically it's a callback script that we can add that will be executed whenever a certain thing happens. In this case, we're talking about a script being executed every few seconds on a scheduled event. Now, Inductive Automation Ignition chose to allow users to add Python code to their gateway scripts, meaning that by simply uploading our callback script, we are able to execute arbitrary Python code and achieve remote code execution on the Inductive Automation Ignition server. Now, this was a whole chain starting from the XSS in the OPC UA client read. However, we also wanted to look at Softing and try to exploit some functionality there. Well, in Softing, we don't have a project upload. However, we have something that could be similar, a backup restore, meaning we want to restore an old configuration of the server, go back in time and basically have the old settings. Well, whenever we perform a backup restore, we actually upload a zip file to the server, and the server basically unpacks it and loads XML and configuration files from this zip. Well, we looked at the Softing server and found a few vulnerabilities involving Zip Slip and path traversal, meaning we had the ability to write files anywhere on the system whenever we invoked the backup restore. And by uploading a shared object that executes code whenever it is loaded, we are able to execute code on the server, allowing us to basically control it and have remote code execution on both servers. Now, these exploitations started from the OPC UA vector, meaning we had the ability to execute code in the context of the browser. However, by chaining it with a few different vulnerabilities, we are able to achieve full-on remote code execution on the client, allowing us to exploit it as well and achieve full-fledged remote code execution. Now, everything we showed you is pretty cool and pretty extensive.
However, during this entire research process, we developed our own exploitation framework that we're going to share with you today. Like Sharon showed you, we researched dozens and dozens of different protocol stacks and we actually discovered over 50 unique CVEs. I think we are getting closer to 60 at the moment. And we developed a lot of what we call attack concepts, meaning one vulnerability, one kind of logic flaw, that could affect multiple different servers and multiple different products. That's why we created our own OPC UA client and our own OPC UA exploitation framework, which you can feel free to clone and use on your own. This framework is pretty extensive and contains the entire knowledge base from our research over the last few years. We hope it will be useful, and it is released as an open source project. So feel free to download it from our GitHub account. In this framework, you'll find the full PoC scripts and exploitation techniques for all the different attack scenarios we developed, like the chunk flooding that Sharon showed you, like the Kepware vulnerability involving the UTF-8 bad allocation and bad string concatenation, and of course dozens and dozens of other vulnerabilities we uncovered over the last three years. So feel free to use it, feel free to share it, and use it in your own server environments, etc. So let's sum everything up and look at the entire process we went through. Like Sharon told you, during the last three years we researched OPC UA heavily, mainly for the Pwn2Own contest, but also on our own, looking at different products, different OPC UA implementations, protocol stacks, code bases, etc.
We found over 50 vulnerabilities across different products and protocol stacks that affect dozens and dozens, and actually probably thousands, of different OT environments. We worked closely with the vendors to fix every vulnerability, disclosed them to the vendors and made sure that these environments are safer and more secure. We actually gave them early access to our OPC UA exploitation framework, giving OPC UA vendors and developers the ability to test their products against the different attack techniques we developed and look for bugs in their own code base. And we are happy to say that a lot of them used it and actually found new vulnerabilities using our OPC UA exploitation framework. So please do use it as well, in your OT environments, in your products, etc. Use it, add to it; it is open source, so of course you can contribute to it as well. And thank you very much for attending our talk on OPC UA. Thank you very much.