Hello, I would like to talk about our paper, "Timing Black-Box Attacks: Crafting Adversarial Examples through Timing Leaks against DNNs on Embedded Devices." I am Tsunato Nakai from Mitsubishi Electric Corporation. This slide shows a quick overview of my talk. We focus on adversarial examples, AEs. We introduce a black-box attack that crafts AEs based on differences in processing time according to the input data. The problem is that crafting AEs normally needs training data, the model architecture and parameters, a substitute model, or the model's output scores. Our attack crafts AEs by using only the processing time of a DNN on an embedded device. As our contributions, we propose a novel class of attack that crafts AEs by using side-channel leaks. We identify two relationships: between the processing time and the number of activated nodes, and between the number of activated nodes and AEs. And we clarify the cause of our attack by implementing a countermeasure that prevents the timing leaks. AEs are inputs with small perturbations added to the input data of a deep learning model to cause misclassification. AEs are easy to exploit and have a large impact on the real world. For example, attaching a small sticker to a road sign as an AE can cause an automatic driving system to misrecognize a stop sign as a speed limit sign. This table shows a taxonomy of methods for crafting AEs. The methods are mainly classified into white-box attacks and black-box attacks. White-box attacks use trained-model information such as the architecture and parameters of the DNN; reverse-engineering countermeasures such as model encryption protect this model information. Black-box attacks use only the input and output data of the DNN. From the viewpoint of the attacker, black-box attacks are more serious than white-box attacks because they require less attacker knowledge, but it is more difficult for black-box attacks to craft effective AEs. Among black-box attacks, attacks using transferability can craft effective AEs.
These attacks build a substitute model from the input and output data and then perform a white-box attack using the substitute model's information. But transferability depends on the kind of trained model, and trained models with deeper hidden layers are more robust against transferability. Attacks using the model score are another kind of black-box attack; they focus on changes in the output probabilities. A confidence-reduction countermeasure, which masks the output probabilities, is effective against them. The point is that crafting AEs needs some model information. So our motivation is: can we get such information from side-channel leaks to craft AEs? Related work uses trained-model information: some works reveal the model architecture and parameters from side-channel leaks, such as electromagnetic radiation, on embedded devices. Existing black-box attacks, in particular, use the output probabilities. Our approach is a black-box attack using side-channel leaks. So the research question is: can an attacker craft AEs using side-channel leaks even if the output probabilities are masked? This slide shows the threat model. We focus on DNNs on embedded devices. DNNs on embedded devices have attracted interest because of distributed and real-time processing on site. An ENISA report mapping the assets and threats of AI notes physical attacks on hardware. In the case of embedded devices, attackers can measure side-channel information such as the processing time. We assume two security functions on the embedded device: model encryption and confidence reduction. The attacker measures the processing time to craft AEs by using the input data of the device and the side-channel leaks. Finally, the attacker crafts AEs on a target device and then inputs the AEs to other devices for misclassification. Now I explain our approach to crafting AEs using side-channel leaks. In related work, Batina et al. reported that some types of activation function have different processing times depending on the input data. For example, a ReLU function is typically implemented like this source code.
Depending on the input data, the ReLU function shows a two-cycle difference in processing time. We define T_A as the processing time when the node is activated and T_NA as the processing time when it is non-activated; T_A is longer than T_NA. Batina et al. also showed that other activation functions, such as sigmoid, have data-dependent timing leaks. From the data-dependent timing leaks of the activation functions, we can observe changes in the number of activated nodes of a DNN. For example, for a DNN with ReLU, we can describe the processing time T: for input data X1 and X2, T(X1) = 4·T_A + 5·T_NA + α and T(X2) = 8·T_A + 1·T_NA + α. Because of the timing leak of the ReLU function, T(X2) is longer than T(X1). So a DNN with more activated nodes is more time-consuming. Next, I explain how increasing the number of activated nodes affects the output of the DNN because of the increasing number of propagated values. For example, for a DNN with input data X1 and X2, if the number of activated nodes increases, the output probability of the correct label is affected more than the output probabilities of the other labels. Next, we simulated the connection between the number of activated nodes and the output probability of the correct label. The simulation uses a basic MLP model for the MNIST dataset on a Linux computer. We focused on the activated nodes in the first layer, which is sensitive to the input data of the model, and manipulated the number of activated nodes: the simulation roughly controlled whether each node took an activated or non-activated value. This graph shows the relationship between the change in the number of activated nodes over all layers and the mean output probability of the correct label: the output probability decreases as the number of activated nodes increases. In the simulation, this graph shows the relationship between the change in the number of activated nodes over all layers and the number of successful attacks.
That is, it shows the misclassification rate. The decrease in the output probability causes misclassification and increases the number of successful attacks. From the simulation, we identified the two relationships: between the processing time and the number of activated nodes, and between the number of activated nodes and AEs. So the strategy of our attack is: first, we add a small perturbation to a part of the input data; then, we measure the processing time of the prediction; finally, we craft AEs from the timing leaks. The AEs cause misclassification on other devices that have the same DNN model as the target device. We demonstrated our attack on MCUs. This figure shows the setup of our experiment. We used the uTensor framework to deploy the DNN models on the MCUs: a basic MLP model for the MNIST dataset and a CNN model for the CIFAR-10 dataset, on two MCUs. This graph shows the histogram of successful attacks on the MLP model on the MCU. The data show the perturbation bound at which misclassification is caused. We compared with random noise under the same conditions, without the use of the output probabilities. The graphs indicate that our attack tends to craft AEs with smaller perturbations than random noise. This graph shows the histogram of successful attacks on the CNN model on the MCU; again, the data show the perturbation bound at which misclassification is caused, compared with random noise under the same conditions. These graphs also indicate that our attack tends to craft AEs with smaller perturbations for the CNN model. Next, we analyzed the cause of our attack by using a constant-time implementation against the data-dependent timing leaks. We used constant time to prevent the timing leaks that depend on the input data of the activation function: we modified the activation function to consume the same processing time regardless of the input data, and deployed the countermeasure using inline assembly code on the MCU.
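The idea of the countermeasure can be sketched in portable C. Note that the paper's countermeasure uses inline assembly on the MCU; this branchless C version is only an illustration of the principle, and a real implementation would check the generated assembly, since a compiler may re-introduce branches.

```c
#include <stdint.h>
#include <string.h>

/* Branchless ReLU sketch: both the activated and non-activated cases
 * execute the same instructions, so the processing time no longer
 * depends on the input data. */
static float relu_ct(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float's bits  */
    /* mask = 0xFFFFFFFF when the sign bit is clear (x >= 0), else 0. */
    uint32_t mask = ~(uint32_t)(-(int32_t)(bits >> 31));
    bits &= mask;                     /* zero out negative inputs      */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```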
These graphs show the histogram of successful attacks on the MLP model on the MCU with the countermeasure. The data show the perturbation bound at which misclassification is caused. The graphs indicate that the countermeasure prevents our attack by removing the timing leaks. Now I discuss the limitations of our attack. The first is the activation function. This talk focuses on the ReLU function; the paper also shows results for the sigmoid function, against which our attack is also successful. Our attack depends on the type and implementation of the activation function. The second is other platforms such as GPUs or TPUs. We focused on MCUs, but there are neural-network accelerators such as GPUs and TPUs. The paper shows an evaluation on a Jetson Nano GPU and a Coral Edge TPU: we could not observe the timing leaks because of the parallelization of the activation functions on the GPU and TPU. To conclude my talk: we proposed a novel black-box attack that crafts AEs by using differences in processing time according to the input data of DNNs on embedded devices. We identified two relationships: between the processing time and the number of activated nodes, and between the number of activated nodes and AEs. We demonstrated that our attack depends on the processing time of the activation function. And we clarified the cause of our attack by implementing a countermeasure, an activation function with constant time, to prevent the timing leaks. Thank you for watching.