Hello. If you haven't been living under a rock for the past few years, you have probably noticed that the age of AI is upon us. It's almost everywhere. I mean, we have driverless cars and driverless taxis, and trains are running themselves, oftentimes better than normal drivers. We also have tools that are able to produce art, or at least pretty pictures, and also these Lovecraftian horrors on the other side. We also have language models and chatbots that are able to pass the Turing test, that are able to converse with people, that are great at summarizing text, and all the other fancy stuff.

So it's natural to ask: what is the state of AI in cybersecurity? I mean, where are we on this scale that runs up to homicidal artificial maniacs? This largely depends on who you are asking. If you ask the companies that are selling AI-powered systems for cybersecurity, they will tell you they're swiftly approaching the top of that scale. If you ask anybody else, mostly the researchers in this area, you'll probably get the answer that even the Roomba is overselling our current capabilities.

What's the reason for that? Just a few numbers to begin with. Tesla's new beta driverless mode has driven more than 250 million kilometers, and every time something happens, it gets uploaded to their servers and is used to refine their models. Stable Diffusion, which is producing those pretty pictures, used 160 million images to train its model. GPT-3, the obsolete one by now, used 45 terabytes of text data to train and produce the model.

What do we have in security? Well, mostly limited data sets. Not so long ago there was a stretch of 10 years when, for training machine-learning-powered systems,
we were all using the same old data sets, and the situation really hasn't changed that much, because there are not enough data sets, they are not, let's say, varied enough, they do not cover all the use cases that we want to focus on, and they don't really reflect the real-world situation. Also, we don't really have environments in which to train those autonomous cybersecurity agents.

So the situation is, in a way, quite bad. But I assume most of you here are developers, so what is the natural reaction? Do something about it: create some environment where you can train those agents, where you can work on that cybersecurity. But there's a good reason why the state of these tools, and of, let's say, cybersecurity autonomy, is where it is, and it's because it's not a really easy problem.

For starters, if you want to create an environment where you want to train something, you first have to decide which paradigm to use. Will you just simulate everything? Will you be using Docker containers? Will you use fully virtualized networks, or some hybrid approach? Going back to those large numbers from a few slides ago, just imagine that you want to play out hundreds, thousands, and millions of different scenarios. You suddenly realize that you really can't work with virtualized environments, because you don't have the hardware and power to run so many different scenarios, to restart all these different scenarios, to prepare the hardware. You also usually don't have all those vulnerable machines that you want to try, and so on. So it's a problem to get all the stuff that you need, and even in emulation you just hit the same brick wall: you just can't run it all. So the only way to prepare the environment for training autonomous cybersecurity tools is to have some kind of simulation environment.

So, does any one of you recognize some of these pictures?
Yes, I see some heads nodding. The reason I'm showing them is that these are OpenAI Gym environments that are used for training machine learning algorithms. The problem is that you can't really use them for cybersecurity. I mean, this is not how you secure your network. And this is the problem: all those environments that are available are just simple abstract problems. They don't reflect what cybersecurity is about. Everything is connected and intertwined, the domain is really large, and it's really, really hard to come up with an environment that is able to reflect that complexity.

But let's say you somehow get your hands on that type of environment, one where you can train and which reflects the complexity of the domain. Just for the sake of the argument, say that you have this. Is it all that you need for training autonomous agents, or for having working autonomous cybersecurity agents? The answer is: not really. This is just a high-level description of all the problems that probably need to be solved before we can even think about letting autonomous cybersecurity agents loose. Some of this may not be applicable to, let's say, the normal domain, since this was done for the army, but most of it is valid even in a civilian context.

But one step at a time. In this talk I would like to guide you through the decisions that one has to make when creating a simulation environment that you want to use for training autonomous cybersecurity agents that will be usable. So not something that's just doing some abstract stuff, but something that can later be deployed and used in a real-world context.

It starts with, let's say, one easier decision, which is to choose what modeling approach you will use, whether it will be some kind of discrete-event simulation,
Markov processes, or what have you. This is actually one of the few questions that is not really that important, because you can probably get away with anything. The harder part is choosing the abstraction that you want to use.

For example, I'll start with the network model. Let's say that you want to create an autonomous agent that will be able to somehow guard or attack (this doesn't really matter) some kind of network. Then you probably have to have some representation of the network in that environment. So, it's just a network. What is it? A collection of nodes that are connected by some connections. But is this enough to decide what to do? You probably need to do some refinements. You have to say: OK, there are two types of nodes; we have something on the edge, and then we have active network devices. But on the edge there are different types of machines, like workstations and servers, and you somehow need to account for that in the model. And how about the IoT stuff? You have cameras that are connected, you have printers, and these printers are good vectors for attacks. How about BYOD devices, laptops, mobile phones, whatnot? And all the stuff in the cloud, will you account for that in the model? You have to, because that's just how it is right now. So when you want to model how the infrastructure looks at the network level, you have to account for the current realities. This is a bit complicated, and it needs to be detailed enough for the autonomous agent to decide, and to decide correctly.

But let's say you decide on some kind of infrastructure. Now there is another part: how do you model those nodes that are in the infrastructure? Will the type of the node, like PC, phone, printer, or whatever, be enough? Or will you have to include more, like what operating system is running on it and what services are running on it?
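
To make the modeling questions above concrete, here is a minimal Python sketch of a network model with typed nodes, an operating system field, and services. All names in it (`NodeKind`, `Service`, `Node`, `Network`) are illustrative assumptions and are not taken from any particular simulator; it only shows one possible level of abstraction.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, FrozenSet, List, Set


class NodeKind(Enum):
    """Hypothetical node categories, following the ones named in the talk."""
    WORKSTATION = "workstation"
    SERVER = "server"
    NETWORK_DEVICE = "network_device"  # active network devices
    IOT = "iot"                        # cameras, printers, ...
    BYOD = "byod"                      # laptops, mobile phones
    CLOUD = "cloud"


@dataclass
class Service:
    name: str
    version: str


@dataclass
class Node:
    node_id: str
    kind: NodeKind
    operating_system: str = "unknown"
    services: List[Service] = field(default_factory=list)


@dataclass
class Network:
    nodes: Dict[str, Node] = field(default_factory=dict)
    # Undirected links, each stored as a frozenset of two node ids.
    links: Set[FrozenSet[str]] = field(default_factory=set)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def connect(self, a: str, b: str) -> None:
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both endpoints must be added before connecting")
        self.links.add(frozenset((a, b)))

    def neighbors(self, node_id: str) -> Set[str]:
        """Return the ids of all nodes directly linked to node_id."""
        result: Set[str] = set()
        for link in self.links:
            if node_id in link:
                result |= link - {node_id}
        return result


# Small example: a workstation and a printer behind one router.
net = Network()
net.add_node(Node("ws1", NodeKind.WORKSTATION, "Windows 10"))
net.add_node(Node("r1", NodeKind.NETWORK_DEVICE))
net.add_node(Node("printer", NodeKind.IOT, services=[Service("ipp", "2.0")]))
net.connect("ws1", "r1")
net.connect("r1", "printer")
```

Even a model this small already fixes several of the decisions listed above: which node kinds exist, and whether the operating system and services are part of a node at all.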
And if you think about what different exploits look like, and how the tools that use them work, you'll probably need to give it something more to work with. For some exploits you will need to have at least some kind of model of the file system. You may need some model of memory, and for some even lower-level exploits you may need to model the communication between buses. It's your decision, and you always have to think about what the agent will do and what it will be capable of doing.

How about the connections between those nodes? You can decide whether you will be simulating the medium, so that there's a difference between, for example, going over the air and over the wire. Will you be modeling properties like the bandwidth of the connections? That's needed if you want your agent to, for example, be able to work with DoS attacks, because then you need to model what the connection looks like. And how about protocols? There are many different attacks on protocols that abuse their structure or whatnot, so you probably need to include that, or not. It really depends on what you expect your agent to do.

And how about the users? That's another can of worms. I mean, do you even want to have users in your simulations, or do you just say: OK, there are no users and we're just securing or attacking the infrastructure? But if you decide that you want to have users, will there be just some kind of user types, or will they have different identities that are, for example, linked to different data on those nodes? Are the users active? Are they doing something, or are they just reacting to, let's say, external stimuli? Do they produce some traffic? Are they just working eight hours a day? Do you want to model
their behavior as the day goes on? There are really hundreds of considerations that you have to make, and each of these considerations will affect what the agent will be able to do. I'm talking about both attackers and defenders, because if you don't model something, then the agent is not able to react to it or act on it.

So yeah, hundreds of considerations, and what I've described so far is still only the passive side. You're describing the environment where it happens. But when you're trying to create a simulation that's workable, you also need to think about the active side. You need to think about what the agents can do.

For example, and I'm talking now about attackers, how do you model how the attacks unfold? For quite a long time, this was the default attack model that was considered in research publications: there's an attacker and there's a target, and that's all. Something happens in between. Then there were, let's say, DDoS attacks, or other attacks that require coordination: more attackers, one target, still nothing complicated. But then you have things like this. That's just a representation of what Stuxnet did, and you see that the attack path, and usually the attack path of any APT or more complex malware, is much more complicated. So when you're considering how to model the attack, you need to enable the agent to do all this stuff if you want it to be able to act on it, and act reasonably.

That's another layer of complexity, because the actions you give to the agent need to be expressive within that model. They need to have an impact, and the agent must be able to learn that impact. Again, that's something you need to incorporate in the model, because when you're implementing it you need to be able to provide the correct outputs for the agent's inputs. So what will you do? Will you just use
some kind of abstract actions? This is what is mostly being done currently: there are only a few abstract actions, like, say, OK, I'm scanning, or, I don't know, I'm exfiltrating keys, and that's all. But if you have ever used the tools that carry out those attacks, the actions are usually much more complex. You have to set a lot of parameters for those actions to work. You can also use some of the attack and defense taxonomies that are available, or you can use something of your own.

This is just for illustration. These are attack and defense frameworks, D3FEND and MITRE ATT&CK, that provide some kind of structure to possible attack and defense actions. As I say, this is just high-level. There are also many different techniques linked to each of those categories. If you decide you want to simulate that, then you need to go through each of these actions, possibly each of those techniques, and implement how those techniques are reflected in your model. So it is quite a lot of stuff that probably needs to be done. You get this for free when you have a fully virtualized environment, but as I said earlier, you really can't do that because you don't have the hardware. So you get kind of stuck between a rock and a hard place with this.

I will just briefly introduce one of the simulation environments, one that I started the development of and which is currently the work of many people. I never wanted to do this. I just wanted to produce the agents. But instead I spent a couple of years doing this, and it's only recently that I got it into a state where I can use it to train something usable. So, that's the simulation environment.
It's just one combination of all the possible parameters, all the possible decisions that I described earlier. It's a discrete-event simulator that supports message passing. It's mostly used for simulation of lateral movement, which means that the infrastructure is somewhat simulated and there are services simulated as running on those nodes. There's not that much complexity, but it provides a fairly comprehensive authentication and authorization framework. It works with vulnerabilities that have CVE identifiers, tied to Metasploit and so on. So it's being done in a way that the agents trained in the simulation can then be moved to the real world.

It integrates with different machine learning toolkits. It enables you to use different behavioral models, so you can plug in different behaviors, or different actions that the agents can do, and then you can compare how this works. It even enables integration with stuff running outside of the simulation, so that you don't need to reimplement everything in the simulation. You can have your IDS or IPS running outside of it, and there's just a bridge between those two: you take the messages that are running in the simulation, convert them to something that the IDS or IPS understands, let it process them, and get back the response. So it's trying to reduce the effort that the user of the system needs to make.

I'm saying "it is" because it's open source. You can download it, you can try it, you can play with it. The easiest way to find it is just to look at PyPI. There's also documentation. It's a research project.
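
As an illustration of what "a discrete-event simulator that supports message passing" means in practice, here is a minimal Python sketch. It is not the code of the environment described in the talk; the names `Message`, `Event`, `Simulator`, and `echo_service` are hypothetical, and the design (a time-ordered priority queue of messages delivered to per-node handlers) is only one common way to build such a simulator.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Message:
    src: str
    dst: str
    payload: str


@dataclass(order=True)
class Event:
    time: float
    seq: int  # tie-breaker so events at the same time keep insertion order
    message: Message = field(compare=False)


class Simulator:
    def __init__(self) -> None:
        self.now = 0.0
        self._queue: List[Event] = []
        self._seq = itertools.count()
        self._handlers: Dict[str, Callable[["Simulator", Message], None]] = {}
        self.log: List[Tuple[float, str, str, str]] = []

    def register(self, node_id: str, handler: Callable[["Simulator", Message], None]) -> None:
        """Attach a handler that is called for every message delivered to node_id."""
        self._handlers[node_id] = handler

    def send(self, message: Message, delay: float = 1.0) -> None:
        """Schedule delivery of a message `delay` time units from the current time."""
        heapq.heappush(self._queue, Event(self.now + delay, next(self._seq), message))

    def run(self, until: float = float("inf")) -> None:
        """Process events in time order; simulated time jumps from event to event."""
        while self._queue and self._queue[0].time <= until:
            event = heapq.heappop(self._queue)
            self.now = event.time
            msg = event.message
            self.log.append((self.now, msg.src, msg.dst, msg.payload))
            handler = self._handlers.get(msg.dst)
            if handler is not None:
                handler(self, msg)


def echo_service(sim: Simulator, msg: Message) -> None:
    """A toy simulated service: answers 'ping' with 'pong' after 0.5 time units."""
    if msg.payload == "ping":
        sim.send(Message(msg.dst, msg.src, "pong"), delay=0.5)


sim = Simulator()
sim.register("server", echo_service)
sim.register("attacker", lambda s, m: None)  # the agent would observe the reply here
sim.send(Message("attacker", "server", "ping"))
sim.run()
```

Because nothing happens between events, the simulator can skip idle time entirely, which is part of what makes running very many scenarios affordable compared with virtualized environments. A bridge to an external IDS or IPS, as mentioned above, could in principle be a handler that forwards messages out of the simulation, but that part is not sketched here.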
So the documentation is incomplete, as it usually is, but you can start doing something with it. We have big plans for this, and the development still continues.

To name a few things that we're working on: one is tied to the unavailability of data sets, and that is that we're building a mechanism that can create realistic scenarios. You just provide it with some constraints on what you want to have in the scenario, for example, I want to have this large infrastructure with some of those services or nodes running, and it will construct a vulnerable infrastructure that has some attack paths running through it, and it is then able to instantiate it.

The other important thing is that we are working on the transition from simulation to emulation. This is one big problem in this area: when you have an agent that is trained in the simulation, you then need to somehow move it outside, and this transition is complicated, because in a simulation you're working with abstract actions. You try to move it to the real world and somehow this doesn't transfer. In the real world you have to use different tools. You have to give the agent the ability to affect the environment. So that's something that we've been working on, and we have some working prototypes in which the agent can directly use what it has been trained on and is just moved to the real world. It still thinks that it's inside a simulation environment.
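
One way to picture this transition is an agent that always issues the same abstract action, with interchangeable backends deciding what that action means. The Python sketch below is only an assumption about how such a layer could look, not the project's actual mechanism; `Action`, `Observation`, `SimulatedBackend`, `CommandBackend`, and the tool name `hypothetical-scanner` are all made up for illustration, and the command backend only builds an argument list and hands it to a caller-supplied runner.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


@dataclass
class Action:
    name: str
    target: str
    params: Dict[str, str] = field(default_factory=dict)


@dataclass
class Observation:
    success: bool
    detail: str


class Backend(Protocol):
    def execute(self, action: Action) -> Observation: ...


class SimulatedBackend:
    """Answers actions from an in-memory model of the infrastructure."""

    def __init__(self, open_ports: Dict[str, List[int]]) -> None:
        self.open_ports = open_ports

    def execute(self, action: Action) -> Observation:
        if action.name == "scan":
            ports = self.open_ports.get(action.target, [])
            return Observation(bool(ports), ",".join(str(p) for p in ports))
        return Observation(False, "unsupported action")


class CommandBackend:
    """Translates the same action into a command line for an external tool.

    The tool name and flag are placeholders; `runner` is supplied by the caller
    and is responsible for actually running the command and parsing its output.
    """

    def __init__(self, runner: Callable[[List[str]], Observation]) -> None:
        self.runner = runner

    def execute(self, action: Action) -> Observation:
        if action.name == "scan":
            return self.runner(["hypothetical-scanner", "--target", action.target])
        return Observation(False, "unsupported action")


def agent_step(backend: Backend) -> Observation:
    """The agent's code is identical regardless of which backend it talks to."""
    return backend.execute(Action("scan", "10.0.0.5"))
```

The point of the design is that the agent sees the same `Action` and `Observation` types in both cases, so from its perspective nothing distinguishes the simulated run from the real one.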
Nothing changes for it, but it's doing real-world work on the outside.

This may be too much for you, and I understand that, because it's really at the beginning. You may, for example, want it to be available as a service that already creates those scenarios, that enables you to play with those agents, that perhaps has visualization and other things, or maybe just the big green button that does all the stuff. We're working on that too. This is just a sneak preview of what we're working on. It's the AI Dojo project that we're doing as a collaboration between Masaryk University and ČVUT. This should be an integrated environment for the development of this type of technology, with premade agents that you can use, for example, if you want to train users with agents that they can collaborate with or fight against, and it should provide you with all the analytics and all the control. But that's going to take some time.

So this is the current state. Either wait a few years, and then we'll have something, and maybe my presentation will be different by then, or you can just try to code something and see where it gets you. That's all from me for today. Thank you.

We actually implemented multiple models, and what we learned was this. I'm talking about a model from research; there was some kind of taxonomy of attack actions, and it turned out that on paper it sounded good, but when we tried to implement it, we just found a lot of edge cases where it was not working. It was not complete.
It was overlapping. So yeah, we're still trying to find a good model that's comprehensive enough, but is implementable in the simulation and translates well to the real world.

People are trying to use language models to drive those agents, and it kind of, sort of works. Those language models are usually noisy. They produce output that sometimes works and sometimes does not. If you use one, for example, to drive a penetration testing tool, it works in a way, and there's no problem with it, because even if you do wrong actions, nothing bad happens. But you don't want to use it to drive the defense, because you can't really work with, let's say, an 80% success rate. So yeah, people are using it and it produces some results, but I think going the last mile, to have it as something that you can trust, is going to take much longer.

This is a bit beyond my domain, and at the risk of exposing a lot of ignorance here, what I was wondering about was this: after you have this simulation-trained thing and you're trying to protect something in the real world, does it have to know what it's protecting? In other words, does it need to understand the configurations of all the machines, the open ports, and everything else? It seems to me the simplest way to gather that information is to have something like a privileged crawler that can draw information from the different machines to find out about their configurations. But if it's doing that, it's on your network, and if that thing gets compromised...

Yeah, that's a hard problem. What usually happens in the simulation world is that those agents are provided with a global overview of the state, something like what you described, and that's what they are trained on. I mean, I'm not aware of anybody building a defender that takes care of the whole infrastructure and doesn't have access to this.
So yeah, the first time it gets compromised, it's going to get ugly. But then, on the other hand, I'm working mostly on attackers, so I'm OK with that.

AI Dojo, it's probably 10 people.

Are you securing some funding or support from some of the big companies that are certainly interested in this type of work?

Actually, not big companies. This research originally started as a research working group at NATO, and we got some support for developing prototypes of those types of agents. But no, we're actually not contacting large companies, because fortunately we still have funding from public sources, and we want to have everything as open as possible, and we're not sure how that would change with companies.

Well, that's something I would like. I would like to have those attackers and defenders fighting each other and bettering themselves. One of the reasons is that what I see in the literature is that the attackers are usually pretty stupid. They're not doing much of the complicated work. Then there's a perception that attackers are like this, and the defenses are being tailored against that type of attacker. So I would like to have even better attackers, so that the defenders can get really better, and to move it all to a higher level.

Yeah, we also have one, and it is another step. Right now we have the simulation, and we have a way to transfer it to emulation. We have some kind of configuration that describes how the infrastructure with all the agents looks in a simulated environment, and we have a way to transform it into a Dockerized environment. But all those configurations are pretty easily transferable.
So those agents that need to, for example, interact with humans, we'll also be moving to a fully virtualized environment.

Before we all disperse, I've been asked to inform everybody of a few things. Number one, there is going to be a social event today, so you can pick up your tickets and check in during the lunch period. Second, there is going to be a way to give feedback, so you can say whether you liked the event or not. Third, after the social event, around 21:30 I think, there's going to be a fireworks and drone show. So if you want to watch, you can join it. It's public. It's not funded or in any way endorsed by this organization, but it's a nice event anyway. And finally, there's going to be a win-win-win session tomorrow where you can win some swag. That was just on the screen. So yeah, you can win some swag for answering questions. So yeah, that's kind of it.