Good afternoon. My name is Xiao Jingjiao. Today I will talk about how to leverage open-source information to mount effective adversarial attacks and defenses against different models. Before I start, I'd like to introduce who we are. We are Northwest Security, a team of independent security researchers and data engineers. Right now we have five members, from left to right: Chen Li, Yang Chuan, Li Yang, Xiao Jing, and Li Wei. We have diverse backgrounds, but we share one interest: applying AI techniques to solve security problems. We have been fascinated by the ideas coming out of the interaction between AI and security, so we have been actively looking for opportunities to learn more in that direction. We are also one of the teams invited to the CAAD CTF, the first CTF competition on adversarial attacks and defenses. As a new team attending a CTF for the first time, we learned a lot, from the beginning of preparation to the end of the competition, so this talk is mainly about sharing what we learned from this event.

This presentation has two parts. In the first part, I will talk about how we evaluated open-source attacks and defenses. In the second part, I will introduce the design of our defense for the CTF.

Due to limited time and resources, we didn't try to develop our own attack and defense methods. Our strategy was simply to leverage publicly available information and open-source code. To select among those methods, we had to define criteria, and we defined them based on the requirements of the CTF. There were three factors we considered: speed, transferability, and strength. For the defense, the CTF has a hard requirement on how fast the image classifier must run, two seconds per image, so we made that the top priority. Also, because we could only submit one defense, transferability was very important; we even sacrificed some strength to enhance transferability. On the attack side, there was no limitation on what kind of attack we could use or how many times we could attack, so we basically only considered the strength of the methods.

That strategy turned out not to be one hundred percent correct. Having just finished the CTF, we learned that speed matters for attacks too: we implemented some strong attacks, but they took so long to run that we barely got to use them, maybe one or two tries per run. Next time we can improve on that.

We use the target rate to evaluate the strength of a method: the percentage of images misclassified as the target class. Keeping those criteria in mind, we started our evaluation by building a baseline, a benchmark of basic attacks against basic defenses. For the attacks we used CleverHans, a Python library built to benchmark machine learning systems against adversarial attacks; it is essentially a collection of different adversarial attack methods. For the defenses, Google provides pre-trained weights, and they even provide adversarially trained models, meaning models trained on both original images and adversarial images. There are two popular architectures: Inception V3 and Inception ResNet V2.
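To make the target-rate metric concrete, here is a minimal sketch. It is an illustration under assumptions, not our competition code: torchvision's pretrained Inception V3 stands in for the TensorFlow checkpoints we actually used, and `images` / `targets` are assumed to be a batch of candidate adversarial images and the attacker's chosen target classes.

```python
# A minimal sketch of the target-rate metric (assumptions noted above).
import torch
from torchvision.models import inception_v3

model = inception_v3(pretrained=True).eval()  # stand-in for the TF models

@torch.no_grad()
def target_rate(images: torch.Tensor, targets: torch.Tensor) -> float:
    """Fraction of images classified as the attacker's target class."""
    preds = model(images).argmax(dim=1)
    return (preds == targets).float().mean().item()
```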
To do the evaluation, we needed a test dataset. The test dataset we used is provided by CAAD: one thousand fresh images labeled with the one thousand ImageNet classes. If you look at the right-hand side, the distribution of target class versus true label, you can see that the width of the distribution along the X direction is very consistent, which means the target classes are uniformly sampled from the one thousand classes. In other words, this is an average-case scenario: evaluating a method on this dataset gives an average estimate of its performance, neither the best case nor the worst case.

Using those two libraries, we started our evaluation. This table shows the target rate of four attacks against three defenses. All four attacks use the exact same gradient-based method, the Basic Iterative Method (BIM); the difference is the model used to calculate the gradient. The attack is computed from the gradient, and the first three attacks use three different models corresponding to the three defenses: Inception V3, adversarially trained Inception V3, and ensemble adversarially trained Inception ResNet V2. So in each column, against the matching defense, we are doing a white-box attack, and the white-box attacks are very effective. For example, the first one, BIM based on the Inception V3 model, reaches about an 89% hit-target rate. Even the second and third columns, the two defenses that were adversarially trained, are still vulnerable to white-box attacks, with hit-target rates around 75% and 85%.

But we can also see that although the white-box attack is very strong, its transferability is very bad: it only works against the model it was computed on. If you look at the first row, it works against Inception V3, but against the other two defenses the hit-target rate is zero; it doesn't work at all. We can improve that with ensembling. The last attack ensembles the three models together: we compute three different adversarial images, one per model, then average them and use the averaged image as the final attack. By doing that, we keep a high hit-target rate while also improving transferability; it works against all three defenses.

One thing I forgot to mention: because these attacks and defenses are all publicly available, when we analyze their performance we treat them as a lower bound. Any attack or defense we pick must be better than these methods.
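As a concrete illustration of the targeted BIM attack and the ensemble averaging just described, here is a minimal sketch, assuming PyTorch models and images normalized to [0, 1]; the hyperparameters are illustrative, not the values we used in the CTF.

```python
# A minimal sketch of targeted BIM plus ensemble averaging (illustrative
# hyperparameters; images assumed normalized to [0, 1]).
import torch
import torch.nn.functional as F

def bim_attack(model, x, target, eps=16 / 255, alpha=1 / 255, steps=10):
    """Targeted BIM: iteratively step toward the target class."""
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad, = torch.autograd.grad(loss, x_adv)
        # Descend the loss so the target class becomes more likely.
        x_adv = (x_adv - alpha * grad.sign()).detach()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # stay within the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)              # stay a valid image
    return x_adv

def ensemble_attack(models, x, target):
    """One adversarial image per model, then a pixel-wise average."""
    advs = [bim_attack(m, x, target) for m in models]
    return torch.stack(advs).mean(dim=0).clamp(0.0, 1.0)
```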
We also tried some strong defenses, such as the Guided Denoiser (its author is actually sitting in the audience) and Random Padding, which took first and second place in the 2017 NIPS adversarial defenses competition. You can see those two defenses are very strong: against all four baseline attacks, the hit-target rate is simply zero; the attacks don't work at all. So our strategy for attack selection was to make sure we had a corresponding attack for every single defense we could find, since there was no limit on how many times we could attack. On that basis, we built a broad set of attacks for the CTF.

This diagram is a polar plot of the hit-target rate. Different colors stand for different attacks, and the six defenses sit at six angular positions. This needle-like branch pattern tells us two kinds of information. The length of a branch is the value of the hit-target rate, which is the strength of the attack. The number of branches tells us how good the transferability of the method is, because more branches means a single method can attack more defenses. Throughout our evaluation we used this plot to assess methods, trying to balance strength against transferability.

These are the methods we implemented. Against the Guided Denoiser, because the authors open-sourced everything, we implemented a white-box attack, which is very effective: about a 98% hit-target rate. The other defense, Random Padding, randomly resizes the input image and adds random padding around it, exploiting the fact that CNN-based image classifiers are robust to small random transformations: for an original image, adding random transformations barely changes the classification result, but for an adversarial image, those random transformations strongly degrade the influence of the attack. That's the idea of Random Padding, and we adopted a method to implement a targeted attack against it, which I'll come back to.
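Here is a minimal sketch of the random resizing and padding idea, under assumptions: the sizes loosely follow the published Random Padding entry (resize a 299-pixel input to a random size below 331, then pad randomly out to 331), and `model` is any classifier that accepts the padded tensor. It is a sketch of the idea, not that team's code.

```python
# A minimal sketch of the random resizing + padding defense (sizes are
# assumptions loosely following the published entry).
import random
import torch.nn.functional as F

def random_pad_defense(model, x, out_size=331):
    """Classify an NCHW tensor x after a random resize and random padding."""
    r = random.randint(x.shape[-1], out_size - 1)      # random new size
    x = F.interpolate(x, size=(r, r), mode="nearest")  # 1. random resize
    pad = out_size - r
    left, top = random.randint(0, pad), random.randint(0, pad)
    x = F.pad(x, (left, pad - left, top, pad - top))   # 2. random padding
    return model(x)                                    # 3. classify
```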
Now I will talk about some strategies that apply to both attack and defense. The first row, on the defense side, is a very popular one called ensemble adversarial training: we not only train the model on both original images and adversarial images, we also generate those adversarial images with several different methods. Doing this ensemble adversarial training effectively improves the transferability of your defense. On the other hand, for the attack, you can apply ensemble adversarial attacking, which likewise improves the transferability of your attack method. The same idea works on both sides.

The second idea, on the defense side, is gradient masking. Most popular attacks are gradient-based, meaning you somehow have to obtain the gradient of the target model, and from that gradient you generate your attack. Gradient masking tries to hide that gradient information: the defense can make the gradient non-smooth, so it becomes stepwise, or make the gradient vanish entirely, becoming very small, or explode, becoming very large. When the attacker uses that corrupted information, it guides the calculation in the wrong direction, and that is how these defenses achieve their purpose. But on the attack side, researchers have proposed corresponding attack strategies. If you make the gradient stepwise, you create a lot of local minima, and one way an attack can escape those local minima is to add a random perturbation to each iteration of the calculation: when you get stuck in a local spot, you randomly jump out. That's the idea of random perturbation. Some attacks use gradient smoothing: when they obtain the gradient information, they apply Gaussian smoothing to it, and then everything works for the attack again. A third, newer method is called Backward Pass Differentiable Approximation (BPDA). The idea is that if we cannot get the real gradient from the network, we approximately calculate one: we don't use the target model's true gradient, we come up with a close approximation and use that as our gradient to achieve the attack.

The third defense strategy starts from the observation that an adversarial attack is adversarial noise, and a very natural solution to noise is filtering: just get rid of the noise. That's the basic idea of the third strategy: use some image-processing method to filter the noise out. But again, from the attack side, people have figured out how to attack this kind of defense: whatever the defense does, the attack does the same thing. When I calculate the attack, I include the filtering, or whatever else is inside the defense, in the iterative calculation, so the attack takes everything into account. We actually used this method to implement our attack against Random Padding: when calculating the adversarial image, we also added the random padding into the iterative calculation.

The last one is, I think, the only defense idea with no effective attack against it, which is detection-only methods. They do not enhance the robustness of the system; they just add another module that detects whether an adversarial attack is happening, and if it is, we raise an alert or simply return something random. That's the idea of detection.

That's all about the evaluation. We did a very extensive evaluation of many attacks and defenses.
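To make the "fold the defense into the attack" idea concrete, here is a minimal sketch of that approach against Random Padding. It reuses the hypothetical `random_pad_defense` and the BIM-style update from the earlier sketches and averages the loss over several random draws of the transformation, in the spirit of expectation over transformation; it is not our actual competition code.

```python
# A minimal sketch of folding the randomized defense into the attack loop
# (reuses the hypothetical random_pad_defense and bim-style update above).
import torch
import torch.nn.functional as F

def attack_random_padding(model, x, target, eps=16 / 255, alpha=1 / 255,
                          steps=50, draws=4):
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Average the loss over several random draws of the resize/padding,
        # so the perturbation survives whatever transform is used at test time.
        loss = sum(F.cross_entropy(random_pad_defense(model, x_adv), target)
                   for _ in range(draws)) / draws
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv - alpha * grad.sign()).detach()
        x_adv = x + (x_adv - x).clamp(-eps, eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv
```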
Based on those learnings, we came up with this design for our defense. The defense includes two modules. The first is the big rectangle on the left-hand side, which is our image classifier, except that it actually contains two image classifiers. We run those two classifiers in parallel and put a difference filter at the end. When an adversarial image comes in, it gets classified by both classifiers: one is the CV2 filter plus Random Padding, and the other is the Guided Denoiser. The two labels output by the classifiers are fed into the difference filter, which compares them. If the two labels agree with each other, we assume the attack failed and we return that label as the true label. If they contradict each other, we simply return label zero or a random label. That's the idea of the first module.

I want to spend a little time on the CV2 filter. CV2 is the name of the OpenCV package in Python, so this is also an image-processing idea. When we read a lot of defense code, we saw people doing image processing in many different ad-hoc ways, and we thought: why not just use a professional image-processing package? OpenCV is a very famous open-source library dedicated to image processing. It provides many sophisticated functions, it is implemented in C, so it is very fast, and it is very easy to use: adding such a filter to any existing defense is as simple as one line of code. What we use is called bilateral filtering, an edge-preserving, noise-reducing smoothing filter. The four images show the effect: the top-left one is the original adversarial image, and after three passes through the filter you get the bottom-right one, where the funny pattern in the background, the adversarial perturbation trying to confuse the classifier, has been smoothed out while the edges are still preserved. Image classifiers essentially learn those edges, so if you maintain the edges, this filter won't hurt your classification accuracy.

To prove that, we applied the CV2 filter to the Guided Denoiser defense and tested it against three different attacks. This time the colored patterns stand for defenses, and the top-left one is the white-box attack we implemented against the Guided Denoiser. You can see the yellow one, where we added the CV2 filter to the Guided Denoiser: it effectively reduces the hit-target rate from 96% down to about 45%. So the CV2 filter is an effective method. The reason we put the other two attacks on the other sides of the plot was to check whether the CV2 filter degrades transferability, and it turns out it does not. We believe this CV2 filter is a very effective way to strengthen a defense, because it is very fast and very flexible; you can apply it to any existing system. That's the first module.
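The filtering step itself really is close to a one-liner. Here is a minimal sketch; the parameter values are illustrative assumptions, not the ones we tuned for the CTF, and as I mention later, badly chosen parameters can strongly degrade clean accuracy.

```python
# A minimal sketch of the CV2 bilateral filtering step (parameter values
# are illustrative assumptions; poorly chosen ones hurt clean accuracy).
import cv2
import numpy as np

def cv2_denoise(image: np.ndarray, passes: int = 3) -> np.ndarray:
    """Run an 8-bit HxWx3 image through bilateral filtering several times."""
    for _ in range(passes):
        image = cv2.bilateralFilter(image, d=9, sigmaColor=75, sigmaSpace=75)
    return image
```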
A successful defense using the first module relies on an assumption: that no attack is strong enough to fool or confuse both classifiers. But that's actually not true, and we knew it wasn't, because during testing some attacks could fool both of them, the CV2-plus-Random-Padding classifier and the Guided Denoiser. In that case both of them output exactly the same label, but that label is wrong, which means the first module fails. That's where the second module comes into play: the k-means filter.

The k-means filter is based on a machine learning method called k-means clustering. The assumption here is that the k-means clustering algorithm is robust against neural-network-based adversarial attacks. To verify that, we conducted a small experiment. We took one thousand original images, generated one thousand adversarial versions of them, and fed all two thousand images into the k-means clustering, which automatically groups the images into clusters. We then compared the cluster of each original image against the cluster of its adversarial version, and we found that 94% of them matched. So those adversarial attacks don't affect the accuracy of the k-means clustering, and we can trust this method.

Based on that, let me describe how we implemented the k-means filter. The top diagram describes how we generate a mapping. The whole competition is based on the ImageNet dataset, which has a thousand labels, but when we do the k-means clustering it is impractical to use a thousand clusters, so our k-means is trained to group images into ten clusters. How can you map a thousand different ImageNet labels onto ten clusters? Here's how we do it: after training the k-means model, we run a separate test dataset through it. We know the true ImageNet labels of those test images, and after clustering we also know their k-means clusters, so together those generate a mapping, basically a table telling us which cluster corresponds to which ImageNet labels.

Once that mapping table is ready, we finalize the logic, which is very simple. When an adversarial image comes in, it first goes through the first module I just discussed, which produces an ImageNet label. That ImageNet label is converted into a k-means label using the mapping table: basically a ten-element list, because we only have ten k-means clusters, where the ten elements are the probabilities of each cluster. For example, the first list might say we think this label is most likely in cluster nine and least likely in cluster four. At the same time, the attack image itself is fed directly into the k-means model, which generates another ten-element list; this one might say the image is most likely in cluster two and least likely in cluster nine. Those two lists contradict each other; they don't agree. That's when we conclude the attack actually succeeded, so we output label zero or a random label. Only when the two lists match, passing the checking logic we set, do we let the original label come through. That's the idea of the k-means filter.
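Here is a minimal sketch of that consistency check, under assumptions: scikit-learn's KMeans, flattened feature vectors standing in for whatever image representation is clustered, random stand-in data so the snippet runs (a real calibration set would cover all thousand labels), and a simple argmax comparison in place of our actual checking logic.

```python
# A minimal sketch of the k-means filter (stand-in data and a simplified
# argmax match test; not our actual checking logic).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in data: in the real pipeline these are image feature vectors
# plus the ImageNet labels of a held-out calibration set.
train_x = rng.random((2000, 64))
calib_x, calib_labels = rng.random((1000, 64)), rng.integers(0, 1000, 1000)

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(train_x)

# Mapping table: for each ImageNet label, how often it lands in each cluster.
mapping = np.zeros((1000, 10))
for label, cluster in zip(calib_labels, kmeans.predict(calib_x)):
    mapping[label, cluster] += 1

def kmeans_filter(image_vec, imagenet_label):
    """Pass the label through only if it is consistent with the clustering."""
    if mapping[imagenet_label].argmax() == kmeans.predict(image_vec[None])[0]:
        return imagenet_label
    return int(rng.integers(0, 1000))  # contradiction: return a random label
```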
That's all about the design of our final submission for the defense. This is the same kind of polar plot, showing the hit-target rate of our defense against some strong attacks. You can see that for all six attacks the hit-target rate is close to zero, so we believed our defense was effective and had good transferability. It turned out not to be perfect; we still lost some points, and I think the k-means module in particular still has a lot of room for improvement, so that might be a direction we pursue.

In summary, I talked about how we selected a set of attacks aiming at different defenses, and about our defense, which comprises image processing, a classifier difference filter, and a k-means filter. With that, I'll finish my talk. Any questions?

[Audience question, partly inaudible.] You mean, is the noise random or not? Yes, that's the main way to achieve the attack. You mean the attack method? The attack, mathematically, is just solving an optimization problem, so it totally depends on how you define your problem. There are attacks that adjust only one pixel and still achieve their purpose of changing the classification. There is also research proposing adversarial patches: a patch with a particular pattern such that when you put it on anything, the image classifier misclassifies that image. So by defining your problem, you can control where you apply the noise; it is still noise, but you control where it goes.

[Audience question:] From an image perspective, I see how the attacks and defenses work. How can you translate the same concepts to, let's say, a piece of malware that you're trying to process through AI or ML to classify as benign or malicious? How would we modify the piece of malware, with noise or by adding new processes, to tamper with the algorithm? Is it literally the same way you would modify the image, or is it completely different?

So your question is whether the same methods can be applied to other AI applications. I think fundamentally they are the same: it's still a kind of optimization problem, and you just try to solve that optimization problem. In practice, though, when you attack other kinds of systems, there are a lot of factors that have to be taken into account. For example, everything I'm talking about today is in the digital domain; everything happens inside the computer. If you want to attack in the physical domain, say you put something on a washing machine so that it gets misclassified as a dryer, you have to take into account the lighting, the viewing angle, all those factors. That makes the problem more complicated, but I think fundamentally they are the same.
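To illustrate the point above, that where the noise goes is just part of how the optimization problem is defined, here is a minimal, hypothetical sketch (not something from the competition): a targeted attack restricted to a chosen region by a binary mask, reusing the BIM-style update from the earlier sketch.

```python
# A hypothetical sketch: restrict the perturbation to a region via a mask.
import torch
import torch.nn.functional as F

def masked_attack(model, x, target, mask, alpha=1 / 255, steps=100):
    """Targeted attack where `mask` (same shape as x, values 0/1) fixes
    which pixels the perturbation is allowed to touch."""
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad, = torch.autograd.grad(loss, x_adv)
        # Zero the update outside the allowed region.
        x_adv = (x_adv - alpha * grad.sign() * mask).detach().clamp(0.0, 1.0)
    return x_adv
```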
[Audience comment about filtering out the noise.] Yeah, actually that's a very good point, and there are a lot of different ways to apply the same idea. What we are trying to do is get rid of the bad noise, the adversarial noise, while still keeping the information that is valuable to the classifier, because you cannot just get rid of everything; that would also mess up your classifier. Even for this CV2 filter, I didn't show it here, but if the parameters aren't set correctly, the accuracy of the classifier strongly degrades.

And that's a good point about the frequency domain. One of our team members is actually working in that direction: instead of doing image filtering, he is looking at the frequency domain to tell whether an image is real or not. That's another way to do it.

Also, even when you carefully set up your filtering algorithm, it still affects classifier accuracy. I read a paper where the researchers use another neural network to calculate something called the class activation region. When the classifier looks at an image, there are regions of interest and regions of no interest, and that neural network figures out which region matters most for the final classification. Once you know that region, you apply the image filter everywhere except that region: you don't touch it, you just filter everything else out. Yes, but that's going to be slow.

[Moderator:] Okay, nobody else is going to ask, so I have one more. You talked about processing images through different filters to increase the accuracy, right? One of the things you mentioned was comparing adversarial versus original images so you have a baseline you can use. Does the accuracy improve the more images you feed into your dataset, or is there a certain point where, no matter how many images you throw at it, the accuracy just peaks?

I think that depends on what kind of algorithm you are using. For example, when you train the classifier, if you take into account both pure original images and images after filtering, then the model has already accounted for that factor, and doing the image filtering later won't hurt the classifier. As for feeding in more images, I'm not sure I fully understand, but you're talking about training, right? For training, that's actually the key point of adversarial training.
Adversarial training mixes original images and adversarial images, but we can also mix original images with images that have gone through an image filter, right? By doing that, when we train the classifier, it learns that some images are originals and some images have been through some filtering, but it can still identify them as the same. That's how you build your training dataset.

[Follow-up:] But say you had unlimited time: does the accuracy continue to improve the more you feed it, or is there a certain point where it just peaks, no matter how much more you train?

I think that's limited by the neural network, not by the image filtering. Basically, I think it's going to reach a peak; I don't think you can just keep improving forever. Okay. Thanks.