Next we have Ivan with DeepPhish: Simulating Malicious AI. We'd like to thank our sponsors Endgame, Silent, Sophos, and Tinder. And without further ado, here's Ivan. Okay, so last talk before lunch, let's see how it goes. My name is Ivan Torroledo, or Iván Torroledo in my language, and I'm going to give a talk that we call DeepPhish: Simulating Malicious AI. First of all, who am I? I am the lead data scientist on the research team at Cyxtera. I studied physics and economics. In addition to the data science research that I do, I also like to use Fortran and do a lot of parallel computing. And finally, I am from Colombia. However, I hate coffee. Yeah, it's a sad story for a Colombian, but here is my Twitter account if you want to follow me.

So let's start. I like to open this talk with this figure because it captures the motivation we had at the beginning of this project. Here we have the Google Trends results for the keywords cybersecurity and AI. As you can see, in the last three or four years there has been a dramatic increase in interest in the relationship between these two topics. So if I had to summarize our current scenario, it would be this: everyone is talking about AI in cybersecurity. Knowing this, at the beginning of the year we started to look into the newest cybersecurity marketing trends, and we found several news stories wondering how AI could be applied to cyber attacks. If this were a Terminator movie, people would be wondering whether Skynet is already among us, trying to destroy the world. But despite all this news and marketing noise, we have no real evidence that this is actually happening. So the real question we first needed to ask is: how might attackers enhance their campaigns and attacks using AI? Okay, so as a research team, we decided to start testing this question and this hypothesis.
By analyzing the idea more deeply, we realized that the first question we needed to answer was exactly that: how might attackers enhance their campaigns and attacks using AI? To start testing this question, we first needed to define a use case, and we chose phishing campaigns and phishing attacks. The main reason behind this decision is that almost 91% of cyber attacks and cybercrime start with a phishing email. Phishing is one of the most important tools an attacker has, and if attackers start using AI, they would probably begin by improving their phishing attacks.

So once we defined the question and the use case, we could start building an experiment. We created an experiment with three steps, and we called it simulating malicious AI. In the first step, we wanted to identify some individual threat actors. Why do we need to do that? Basically, the main goal of this project is to understand the effective patterns of phishing attackers and try to improve on them using AI. Since we cannot study attackers directly, we must learn about them through their attacks. To achieve this, we collected a database of almost 1.1 million phishing attacks coming from PhishTank, and we set out to analyze this data.

So how do you analyze one million phishing attacks? We started by looking for the most common domains in our database. That led us to this first domain, nailerandticks.com. This domain belongs to an online store that had obviously been compromised at some point. In the whole database, we found 406 URLs using this same domain. Now, to check whether this set of URLs belonged to the same attacker, we wanted to verify that they were targeting the same institution. We did a visual check of the screenshots available on PhishTank, and we confirmed that they were indeed targeting the same institution: in this case, a major Brazilian bank called Bradesco.
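The domain-frequency analysis described above can be sketched in a few lines of Python. This is only an illustration: the URLs below are hypothetical stand-ins for the PhishTank data, reusing nailerandticks.com as the example domain from the talk.

```python
from urllib.parse import urlparse
from collections import Counter

# Hypothetical sample of phishing URLs (the real dataset held ~1.1M from PhishTank).
urls = [
    "http://nailerandticks.com/bradesco/login.php",
    "http://nailerandticks.com/seguro/atualiza.php",
    "http://other-site.example/paypal/verify.php",
]

# Count how often each domain appears; a frequently repeated domain
# suggests one actor reusing the same compromised host.
domains = Counter(urlparse(u).netloc for u in urls)
print(domains.most_common(1))  # -> [('nailerandticks.com', 2)]
```

From here, pulling all 406 URLs that share the top domain is just a filter over the database.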
So we can say that this set of URLs belongs to the same attacker, because they were targeting the same institution. Now, what happens if the attacker is not using only one domain? It's a fair question, so we looked for this strategy across the whole database. By analyzing the data, we realized that there were some keywords commonly used throughout it, so we collected these keywords to define the attacker's strategy. Checking this strategy against the whole database, voila: we found 105 additional domains using the same strategy. The first thing we then needed to do was verify that these were targeting the same institution. So again we did the visual check and, voila, we confirmed that they were also targeting the same institution. In the end, we can say that this whole set of URLs belongs to the same attacker, because they were targeting the same institution and using the same strategy.

Okay, cool. We just uncovered one threat actor in our 1.1 million phishing attacks. We kept this process going for a while, and we found additional domains using other strategies and targeting other institutions. For example, in this case we found URLs targeting a Canadian bank, TD Canada Trust. In the end, we uncovered several threat actors using this approach. That was the first step.

Now, once we had identified some threat actors, we wanted to evaluate how effectively they could bypass our detection system, and indeed, our own AI detection system. How did we do it? First of all, of course, we needed to define our detection system. For this, we used an AI classification model that we had created previously. It is a model based on LSTM neural networks: by analyzing each URL, the model learns the effective patterns and strategies that each URL contains, and in the end it produces a probability that the URL is being used for phishing attacks.
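The keyword-strategy expansion from one domain to 105 can be sketched as a simple filter. The keywords and URLs here are hypothetical placeholders, not the actual strategy recovered from the dataset.

```python
# Hypothetical keyword strategy extracted from one actor's known URLs.
keywords = {"bradesco", "seguro", "atualiza"}

# Hypothetical candidate URLs from the rest of the database.
urls = [
    "http://compromised-a.example/bradesco/seguro/index.php",
    "http://compromised-b.example/atualiza/conta.php",
    "http://unrelated.example/blog/post-42",
]

# Keep URLs matching at least one strategy keyword; matches on new
# domains are attributed to the same threat actor.
same_actor = [u for u in urls if any(k in u.lower() for k in keywords)]
print(len(same_actor))  # -> 2
```

The visual screenshot check then confirms that the matched URLs target the same institution.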
That's the whole intuition behind this model. Okay. So once we had defined our own AI detection system, we started to measure how effectively the attackers could bypass it. To measure this, we defined something we call the effectiveness rate: the percentage of URLs that are able to bypass our detection system. For the whole database, we found that the effectiveness rate was 0.24%. Now, for threat actor number one, whom we had just uncovered, the effectiveness rate was a little higher: 0.69%. Finally, for threat actor number two, the effectiveness rate was 4.91%. So in the end, we can say that even though these two threat actors were a little more effective at bypassing our detection system than the average attacker in the database, we are doing a really good job overall. We can say: yes, we are winning the battle. Cool, so that was the second step.

Now let's see the final and most interesting step, the third one. In this step, we wanted to improve the phishing attacks of each attacker using AI. Let's see how we achieved this. We created an algorithm called DeepPhish. Here is how it works. First of all, the algorithm divides each attacker's data into a set of non-effective phishing URLs, the ones our detection system was able to catch, and a set of effective URLs that bypassed our detection system. Taking this last set, we transform the data, encoding it into a mathematical representation so that we can feed it into an AI model. So what kind of model did we use?
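The effectiveness rate can be written as a tiny function over the detector's scores. The threshold and the model scores below are hypothetical assumptions for illustration, not the talk's actual numbers.

```python
def effectiveness_rate(phishing_scores, threshold=0.5):
    """Percentage of known-phishing URLs the detector fails to flag.

    phishing_scores: the detector's phishing probability for each URL
    that is actually a phishing attack; a score below the threshold
    means the URL bypassed detection.
    """
    bypassed = [p for p in phishing_scores if p < threshold]
    return 100.0 * len(bypassed) / len(phishing_scores)

# Hypothetical scores for 8 known-phishing URLs; one slips under 0.5.
scores = [0.99, 0.97, 0.91, 0.88, 0.95, 0.30, 0.99, 0.93]
print(effectiveness_rate(scores))  # -> 12.5
```

Computed this way over the real data, the whole database scored 0.24%, actor one 0.69%, and actor two 4.91%.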
Again, we used an LSTM model, or Long Short-Term Memory. The whole intuition behind this model is that we give one URL to the model, the model captures all the patterns in the URL, and then it starts producing new characters; in the end, we collect these characters to build new URLs following the same patterns as the previous URLs. Okay, so that's the AI model that we used. Once we have this trained model, we are able to start producing new URLs in the following way. First, we give the model a seed, for example a segment of a URL that the attacker already has, and the model starts producing new characters to create new paths. Then we filter those paths to keep the valid ones, and by joining this output with a set of compromised domains that the attacker should have, in the end we are able to create something we call a synthetic URL, with this form. So, to summarize, we created an algorithm that is able to analyze the data we give it and produce new URLs following the same patterns. That's the whole idea behind this model.

Okay, so that was the experiment. Let's see the results. As you remember, in the traditional way, threat actor number one had an effectiveness rate of 0.69% and the second one 4.91%. So what happens when the attackers start using AI? Boom. We can see that for threat actor number one, the effectiveness rate increases from 0.69% to 20%, and for threat actor number two, the effectiveness rate increases to 36%. With these results, we can say that if an attacker starts using AI the way DeepPhish does, they will be able to bypass our detection systems much more effectively than before. Yeah, that's the experiment and that's the conclusion. If DeepPhish could say something, it would say something like this: you are doomed. Yeah. Okay, but don't panic; we keep improving.
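The generation loop from this third step, seed in, characters out, join with a compromised domain, can be sketched as follows. To keep the sketch self-contained, a hypothetical bigram table stands in for the trained LSTM's next-character softmax; everything here (table, seed, domain) is an assumption for illustration, not the real DeepPhish model.

```python
import random

# Toy stand-in for the trained character-level LSTM: given the last
# character, return a distribution over the next character. In DeepPhish
# this would be the LSTM's softmax output over the full context.
NEXT = {
    "l": {"o": 1.0},
    "o": {"g": 1.0},
    "g": {"i": 1.0},
    "i": {"n": 1.0},
    "n": {"/": 1.0},
}

def sample_path(seed, length, rng):
    """Extend the seed one sampled character at a time."""
    text = seed
    for _ in range(length):
        dist = NEXT.get(text[-1], {"/": 1.0})
        chars, probs = zip(*dist.items())
        text += rng.choices(chars, weights=probs)[0]
    return text

rng = random.Random(0)
path = sample_path("l", 5, rng)          # generated path: "login/"
# Join the generated path with a compromised domain the attacker holds
# to form a synthetic URL (domain is hypothetical).
url = "http://compromised-shop.example/" + path
print(url)  # -> http://compromised-shop.example/login/
```

In the real algorithm, a filtering step would then discard malformed paths before the synthetic URLs are assembled.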
So the next step in this experiment is to include other AI tools, for example adversarial learning or generative models. By including these models, we will be able to anticipate how attackers may use AI to enhance their attacks, and by anticipating this, we can say that we will keep winning the battle against malicious AI, and against Skynet, for another year. So thanks so much.