Hello, everyone. This is Anita Aghaie. In this talk, I would like to present our research work, which is about the inconsistency of simulation and practice in delay-based strong PUFs. This is joint work with my advisor, Amir Moradi. First, I would like to give you a brief introduction to PUFs and their applications, and why we use them. In order to limit secure memory usage and avoid attacks on non-volatile memories, physical hardware primitives such as physically unclonable functions (PUFs) were introduced to the world of embedded devices two decades ago. Such functions supposedly provide a secure hardware platform with lightweight features, for example low energy consumption or a small area footprint. These device fingerprints use a challenge-response mechanism, which is the main approach of these primitives, with a unique behavior for each PUF. In more detail, a stable, unique, and unpredictable response is returned for a random challenge, creating a set of challenge-response pairs (CRPs). This means that each PUF, under ideal conditions, generates a reliable, uniform, and unique response, but not a predictable one. I should highlight that our main focus in this work is on delay-based strong PUFs. In the PUF area, we face several challenges, of which we will talk about just two. The first one is machine learning (ML) attacks. According to recent publications, ML attacks, or let's say pure ML attacks, are identified as the most common and effective threats, and they can be performed with different classifier algorithms. As you can see in the left figure, learners such as logistic regression, support vector machines, and decision trees, or black-box learning algorithms like artificial neural networks (ANNs), are usable for PUF learning.
Actually, the goal of PUF modeling is to create a model of a strong PUF that generates the same responses as the PUF for the same given challenges. Afterwards, we focus on the next challenging point, which is the hardware implementation of PUFs, especially the delay-based ones. The most difficult challenge is to achieve an implementation with physical characteristics not far from the ideal conditions. Along the same lines, delay-based PUFs are prone to environmental noise such as temperature and supply voltage variations, which has motivated researchers to deal with various hardware implementations of delay-based PUFs. Now, I think it is a proper time to talk about our motivation and contribution. The main question is: are all ML attacks in the simulation domain consistent, with the same accuracy and success rate, in the real world? To answer this question, we examined the most common delay-based PUF architectures, like the Arbiter PUF, and their evaluation metrics, such as uniformity (or its opposite, bias), as well as other metrics like uniqueness and reliability, which can affect a PUF's robustness against modeling attacks. Along this line, we chose as our case study the Interpose PUF (iPUF) architecture from CHES 2019, and its classical LR-based attack, the splitting iPUF attack, which was presented at CHES 2020. Our practical investigation relies on real datasets collected from more than 100,000 various iPUF implementations on a 100-FPGA cluster, which we will explain in more detail in the next slides. One of the most common strong PUF primitives is the Arbiter PUF, which is applied in many delay-based PUF architectures. As shown in this figure, a trigger signal passes through n stages, each controlled by a challenge bit. At the end, an arbiter, a flip-flop, decides the final response of the Arbiter PUF depending on which delay line is faster. For successful PUF modeling, we need a proper PUF model.
As you can see in this equation, this is the commonly used linear delay model of the Arbiter PUF. In this equation, there is a vector of weights, and these weights represent the physical characteristics of each Arbiter PUF switch stage. You can also see the feature vector, which is derived from the given challenge C. At the end, the arbiter is represented as a sign (unit step) function that decides whether the response should be 0 or 1. This additive linear model can be extended to other delay-based PUF primitives, like the XOR Arbiter PUF. In order to increase the complexity of the PUF model and boost the ML resistance, the XOR Arbiter PUF has been applied as a fundamental element as well. We will talk about the XOR Arbiter PUF and the Arbiter PUF in our case study architecture, the iPUF, later. Now, it is time to give a brief background about the two common learners, LR and ANN, which are the most common in strong PUF modeling. The goal of LR, logistic regression, like other modeling algorithms, is to find the best-fitting, optimized model, which means a high prediction accuracy and also a minimum error rate between the predicted output and the real output. So, by applying a proper activation function and enough iterations, as you can see here, an efficient model can describe the relationship between the output, or dependent variable, which here is the PUF response, and a set of independent variables, which here are the PUF challenges. Besides classifier algorithms like the mentioned logistic regression, ANNs, or multilayer perceptrons (MLPs), are also able to learn PUF models with the same goal. Such a network consists of an input layer, an output layer, and several hidden layers, which can be extended or reduced depending on the complexity of the model.
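As a minimal sketch of this linear additive delay model, here is a toy Python version. The feature map is the standard parity transform of the challenge bits; the weights are hypothetical values drawn at random, standing in for the process-dependent stage delays of a real Arbiter PUF instance:

```python
import numpy as np

def feature_vector(challenge):
    """Parity feature map: an n-bit challenge c in {0,1}^n becomes
    Phi in {-1,+1}^(n+1), where Phi_i is the product of the
    remaining +/-1-encoded challenge bits and Phi_n = 1."""
    c = 1 - 2 * np.asarray(challenge)            # encode 0/1 as +1/-1
    return np.concatenate([np.cumprod(c[::-1])[::-1], [1.0]])

def arbiter_response(w, challenge):
    """Linear additive delay model: the arbiter outputs the sign of
    the accumulated delay difference w . Phi(c), mapped to 0/1."""
    return int(np.dot(w, feature_vector(challenge)) > 0)

# hypothetical 8-stage instance with random weights
rng = np.random.default_rng(0)
w = rng.normal(size=9)                           # n + 1 weights
r = arbiter_response(w, [0, 1, 1, 0, 1, 0, 0, 1])
```

A k-XOR Arbiter PUF is then simply the XOR of k such responses for the same challenge, which is what makes the model more complex to learn.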
In this work, we have chosen a composite PUF architecture which has a more complex model compared to basic elements like the Arbiter PUF. As illustrated here, this Interpose PUF (iPUF) architecture has two XOR Arbiter PUF layers: the top layer, an x-XOR Arbiter PUF, and the bottom layer, a y-XOR Arbiter PUF. The one-bit response of the top layer, the x-XOR Arbiter PUF, plays the role of the interpose bit within the challenge set of the bottom layer. As a result, n+1 challenge bits are given to the bottom layer, the (n+1)-bit y-XOR Arbiter PUF, to generate the single-bit final response. In the original iPUF paper, it has been shown that the classical LR or MLP attacks are not completely successful against the (x,y)-iPUF. For instance, it has been shown that the classical LR attack reaches at most 75% prediction accuracy on different iPUF variants, which is not an efficient accuracy. However, the CHES 2020 splitting iPUF attack applies a divide-and-conquer method to successfully break different Interpose PUF variants, up to the (1,9) and (8,8) versions. In a nutshell, this new LR-based splitting attack divides the (x,y)-iPUF into its two XOR Arbiter PUF components. As a first step, the attacker chooses random uniform interpose bits for the (n+1)-bit challenges of the bottom layer. Then the attacker applies a classical LR attack on the bottom layer, as a separated XOR Arbiter PUF, and stops at a rough model accuracy, with a threshold of around 65% prediction accuracy. Then the attacker filters the CRP sets by choosing the correct interpose bit, which is guessed by this not-very-accurate LR model of the bottom layer. Afterwards, in the next step, the attacker applies the classical LR attack on the top layer to achieve high accuracy. And here again, the attacker comes back and applies the classical LR attack on the bottom layer with the correctly predicted interpose bits. This loop, or iteration, can happen multiple times to achieve high accuracy.
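To make the loop concrete, here is a toy, self-contained sketch of this divide-and-conquer idea for a simulated (1,1)-iPUF. All parameters here (n = 16, 30,000 CRPs, three refinement rounds, scikit-learn's logistic regression as the learner) are illustrative choices of mine, not the authors' implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 16                                          # toy challenge length

def phi(C):
    """Parity features for a batch of challenges, shape (m, k+1)."""
    s = 1 - 2 * C
    return np.hstack([np.cumprod(s[:, ::-1], axis=1)[:, ::-1],
                      np.ones((len(C), 1))])

def linear_puf(w, C):
    return (phi(C) @ w > 0).astype(int)

# simulated (1,1)-iPUF: n-bit Arbiter PUF on top, (n+1)-bit below;
# the top response is interposed in the middle of the bottom challenge
w_top = rng.normal(size=n + 1)
w_bot = rng.normal(size=n + 2)
mid = n // 2
def ipuf(C):
    t = linear_puf(w_top, C)
    return linear_puf(w_bot, np.insert(C, mid, t, axis=1))

C = rng.integers(0, 2, size=(30000, n))
r = ipuf(C)

def lr_attack(C, y):                            # classical LR on parity features
    return LogisticRegression(max_iter=2000).fit(phi(C), y)

# step 1: random interpose guesses give a rough bottom-layer model
bot = lr_attack(np.insert(C, mid, rng.integers(0, 2, len(C)), axis=1), r)
for _ in range(3):                              # back-and-forth refinement
    p0 = bot.predict(phi(np.insert(C, mid, 0, axis=1)))
    p1 = bot.predict(phi(np.insert(C, mid, 1, axis=1)))
    ok = (p0 == r) ^ (p1 == r)                  # exactly one guess fits
    # step 2: learn the top layer from the recovered interpose bits
    top = lr_attack(C[ok], (p1[ok] == r[ok]).astype(int))
    # step 3: relearn the bottom layer with the predicted interpose bits
    bot = lr_attack(np.insert(C, mid, top.predict(phi(C)), axis=1), r)

t_hat = top.predict(phi(C))
acc = (bot.predict(phi(np.insert(C, mid, t_hat, axis=1))) == r).mean()
```

The real attack targets XOR layers rather than single Arbiter PUFs and uses a tuned LR implementation, but the alternation between the two layers is the same.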
This splitting LR attack will be applied in our method to see how different it can be between the simulation and practice domains. Before that, I would like to explain a bit about the FPGA implementation of delay-based Arbiter PUFs and their variants. Since our main goal is to analyze the effect of different PUF metrics on the efficiency of ML attacks, especially in the experimental domain, we need to investigate our assumptions on real hardware implementations. Here we focus on FPGAs due to their reconfigurability and affordability on the market. As presented in the left figure, you can see our experimental setup, which consists of 100 instances of Basys 3 boards, on which a Xilinx Artix-7 FPGA is integrated. To explain our PUF implementation, we need to say a bit about the FPGA structure. The general FPGA architecture consists of three types of modules, one of which is the configurable logic block (CLB), as shown in the right figure. Each of these blocks has two slices, and each slice contains four LUTs, or let's say BELs (basic elements), named BEL A, BEL B, BEL C, and BEL D. It is noteworthy that in our implementation, we realized each switch stage by two 5-input LUTs (lookup tables). We have applied two kinds of placement patterns in our implementation: the first is the random pattern, and the second is the fixed pattern. In the random pattern, one of the BELs is randomly selected for the LUTs in each slice, as explained; this means we have 4^n cases. We refer to this strategy as random placement. The next one is the fixed pattern. In this strategy, we can place only one LUT in each slice, for instance just BEL B, as you can see in the right figure. We also have the option to place two LUTs in each slice by fixing the pattern of used BELs, like A and B, or A and C, and so on; six different placements are provided this way. The final option is to occupy all four LUTs in one slice.
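The fixed-pattern options just described can be counted with a short sketch. I am assuming here that all four single-BEL choices count as fixed patterns (the talk names only BEL B explicitly), which makes the totals consistent with the design counts mentioned later:

```python
from itertools import combinations

bels = ("A", "B", "C", "D")                 # the four BELs in one slice

singles = [(b,) for b in bels]              # one LUT per slice: 4 options
pairs = list(combinations(bels, 2))         # two LUTs per slice: 6 options
full = [bels]                               # occupy all four LUTs: 1 option
fixed_patterns = singles + pairs + full     # 4 + 6 + 1 = 11 fixed designs

random_cases = 4 ** 64    # random placement: 4 BEL choices per stage, n = 64
```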
This means that all of the BELs are used. We should mention that it is very challenging to achieve similar routing for the two delay lines of the PUF. Applying some constraints, such as fixing the placement or keeping the hierarchy, can mitigate such challenges. As the case study, we focus on a 64-bit (1,5)-iPUF as the target PUF primitive. We followed all placement strategies and constructed 1011 implementation designs, which means that we have 1000 FPGA bitstreams with random placement and the rest with fixed patterns, based on our mentioned strategies. Before explaining which design has been chosen for the rest of our observations, and why, we should briefly recall several PUF metrics that are effective for PUF functionality. In the following, three important PUF metrics relevant to delay-based PUF designs are restated, which are considered in our practical analysis as well: uniformity, reliability, and uniqueness. For the sake of brevity, here I focus on uniformity, which is our main criterion in this work. Considering a stateless PUF primitive with a single-bit response, this parameter estimates the proportion of ones in the PUF responses. The ideal value for uniformity, similar to uniqueness, is 50%, as it reflects the unpredictability of the response of a stateless single-bit PUF primitive. In this regard, focusing on a single device, one of those 100 FPGA boards, we examined all of the mentioned 1011 designs, collected 1000 CRPs, where the challenges are selected uniformly at random, and extracted the uniformity of all Arbiter PUF instances, as well as of the Interpose PUF final responses, individually for each design. To finalize one of the designs as the best one, we considered uniformity, since the reliability and uniqueness metrics are mostly the same. This table lists the uniformity of four finalists.
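For reference, the uniformity and uniqueness metrics just defined can be sketched in a few lines of Python (uniformity as the fraction of ones; uniqueness as the average pairwise fractional Hamming distance across devices):

```python
import numpy as np

def uniformity(responses):
    """Proportion of ones in a PUF's single-bit responses over
    uniformly random challenges; the ideal value is 0.5."""
    return float(np.mean(responses))

def uniqueness(responses_per_device):
    """Average pairwise fractional Hamming distance between the
    response strings of different devices; the ideal value is 0.5."""
    R = np.asarray(responses_per_device)
    dists = [np.mean(R[i] != R[j])
             for i in range(len(R)) for j in range(i + 1, len(R))]
    return float(np.mean(dists))

print(uniformity([0, 1, 1, 1]))                  # a biased instance: 0.75
print(uniqueness([[0, 0, 1, 1], [0, 1, 1, 0]]))  # ideal case: 0.5
```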
For each placement strategy, random or pattern-based, we selected two finalists. The first finalist, Random 1 and Pattern 1, is the one with the best average uniformity over all Arbiter PUF instances, while keeping the uniformity of the final response in the range of 40% to 60%. The second finalist, Random 2 and Pattern 2, is the one with the best uniformity of the final responses. We have chosen the design identified as Random 1, in the first row of this table. Using the mentioned FPGA cluster, we evaluated the chosen design on all 100 devices, collecting 1 million CRPs per device. We present the results of the evaluation of our chosen (1,5)-iPUF design in terms of reliability, uniqueness, and uniformity. It can be seen that this design enjoys a high reliability, but not a very ideal uniqueness, which is because of the small process variation in FPGA implementations. We can also see an almost normal distribution for the uniformity of the individual Arbiter PUF instances, lying in the range of 31% to 54%. It is important to mention that the uniformity of the final iPUF response keeps its good range of 45% to 55% in the chosen design. Here, we wanted to investigate the effect of non-uniformity on real datasets under the iPUF attack. So next, we investigate how the existing splitting LR attack performs on a real dataset. Based on the simulation results given in the previous CHES paper, the splitting iPUF attack should be able to successfully break this iPUF variant, the (1,5)-iPUF, meaning with a high probability and a success rate of one, using 500,000 CRPs, either noisy or noiseless. We conducted this attack on our 100 FPGA devices with the corresponding iPUF variant, using the design that we chose and mentioned earlier, with the larger dataset of 1 million CRPs per device, whose overall reliability is 96%.
We then performed the original splitting iPUF attack on our noisy and noiseless CRP sets, leading to an accuracy of at most 72%, as you can see here; mostly, the prediction accuracy does not go beyond 72%. It is worth mentioning that we used the pypuf library, version 0.0.7, for the splitting iPUF attack. Since the splitting iPUF attack does not achieve an adequate prediction accuracy when the targeted PUF is slightly biased, or non-uniform, our attention goes to other learners, like ANNs or MLPs, which are also provided in the mentioned pypuf library. Here, in this graph, we show the results of the ANN attacks conducted on the same non-uniform real datasets. At first, we applied a naive ANN as a black-box attack, which achieves a higher prediction accuracy compared to the splitting iPUF attack; the prediction accuracy of this attack is in the range of 83% to 93%. On the other hand, by substituting the old learner of the splitting iPUF attack presented in the previous CHES paper with a new learner, namely an ANN learner, we achieve a more stable learner behavior, leading to a higher and narrower prediction accuracy range of 89% to 93%. This means that our learner-substitution solution can overcome this non-uniformity effect on real datasets. Note that in our ANN splitting attack, we have not changed any other attack settings except replacing the learner. To be more concrete, over the entire set of attacks conducted on all 100 FPGA devices, the naive ANN achieves a higher prediction accuracy compared to the original splitting iPUF attack, while it has less accuracy on approximately half of the devices when compared to our adapted ANN splitting attack. It can be seen that our ANN splitting attack not only leads to a higher prediction accuracy but also runs considerably faster compared to the naive ANN.
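The learner substitution itself is mechanically simple. As an illustration only, here is an MLP trained on simulated Arbiter PUF CRPs over the parity features; the layer sizes, iteration budget, and dataset size are my own illustrative choices, not the settings used in our attacks:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
n = 16

# simulated Arbiter PUF CRPs on parity features
C = rng.integers(0, 2, size=(20000, n))
s = 1 - 2 * C
Phi = np.hstack([np.cumprod(s[:, ::-1], axis=1)[:, ::-1],
                 np.ones((len(C), 1))])
w = rng.normal(size=n + 1)
r = (Phi @ w > 0).astype(int)

# a small MLP used as a drop-in replacement for the LR learner
mlp = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=300,
                    random_state=0)
mlp.fit(Phi[:16000], r[:16000])
acc = mlp.score(Phi[16000:], r[16000:])      # held-out prediction accuracy
```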
This shorter runtime is due to separating the learning process of each XOR Arbiter PUF layer, thereby giving the learner a chance to converge efficiently to the proper model in the back-and-forth process. In order to verify our results in the simulation domain and find out the reason behind such an accuracy loss, we repeated the same attack on the same (1,5)-iPUF variant, and also on other variants, using simulated CRP sets with various amounts of non-uniformity in the Arbiter PUF instances of both the top and bottom layers. As you can see in both figures, where we consider (1,k) and (k,k) models, in the simulation domain the designers and attackers usually consider the ideal point, which is 50% uniformity, and it brings a good, high prediction accuracy. But as the model gets more complex and the non-uniformity grows, though not very harshly, you can see that the prediction accuracy decreases. We would like to investigate and verify why this non-uniformity, or bias, can influence the PUF learning process. We have identified two parameters: the bias parameter and the bias-variance trade-off. The first one is the bias parameter, which has also been mentioned in previous publications, but we want to highlight that this bias parameter should always be considered in the additive delay model of the PUF, which we mentioned earlier. A bias term beta should be added to this model to obtain an accurate PUF model, but it is worth mentioning that this beta has a different distribution compared to the weights, which are the representatives of the physical characteristics. Another effective parameter is the bias-variance trade-off. During regularization and minimization of the loss function, or error rate, there is an effective parameter, the so-called bias and variance of the learned weights, which should also be considered with respect to the type of the selected logistic function and the estimation approach on the datasets.
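Written out, this extends the additive delay model from earlier in the talk; a sketch of the bias-extended model, using the same symbols (weights w, feature vector Phi of challenge C, and beta as the bias term):

```latex
% Arbiter PUF additive delay model extended with a bias term \beta,
% which follows a different distribution than the weight vector w:
r \;=\; \operatorname{sign}\!\left( \vec{w}^{\,\top} \Phi(C) \;+\; \beta \right)
```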
The bias and variance of the fitted model generally relate to minimizing the loss function, which is a quantifying measure showing how bad it is to make an error of a particular size or direction in the learning process. As you can see in this graph, based on where the learner ends up, we can encounter an underfitting, well-fitted, or overfitting model, which depends on the bias and variance values. In more detail, over the iterative batches, this trade-off can cause the error rate not to be minimized; consequently, the learner stays in a local minimum instead of finding the global minimum. Now, I would like to conclude our talk with a couple of takeaway points. Non-uniformity in the output of a PUF with a complex architecture can turn a previously successful ML attack into a model with less prediction accuracy. Also, it is worth mentioning that a high prediction accuracy on delay-based PUFs in simulation should be verified in practice as well. There is no fixed rule about which modeling approaches or supervised learners are more capable, or stronger, in PUF learning; it depends on the PUF architecture and other metrics as well. Finally, it is worth mentioning that an accurate model of the PUF, including noise and bias, brings the simulation experience very close to the real domain. Thanks for listening.