Hello everyone, my name is Tianlin Huo, and today I will be talking about our paper, Bluethunder: a two-level directional predictor based side-channel attack against SGX.
The hardware-based trusted execution environment is a promising technique for enabling secure computation. Intel's SGX has drawn significant attention in recent years because of its strong security guarantees, which enable a variety of new applications such as secure data analysis. However, recent research demonstrates that SGX is vulnerable to the following side-channel attacks: page-table-based attacks, cache-based attacks, and branch-prediction-unit-based (BPU-based) attacks. Compared to the other two kinds of attacks, BPU-based attacks are getting considerable attention. First, BPU-based attacks can defeat most current SGX protection approaches. Second, BPU-based attacks can identify instruction-level control flow information inside an enclave, while the other attacks can only recover page-level or cache-line-level secrets.
The pattern history table (PHT) is a main component of the BPU, and it can also be abused to conduct side-channel attacks against SGX. BranchScope is the latest PHT-based attack against SGX. By manipulating hash collisions in the bimodal predictor, this attack can extract fine-grained information from a target enclave. However, BranchScope needs to execute a large number of branches to activate the vulnerable predictor, which causes considerable training overhead during the attack and limits its temporal resolution. If the target application runs fast, it is hard for this attack to keep up with it, and as a result several secret bits may be missed.
In this paper, we propose Bluethunder, a new PHT-based attack that can reveal the fine-grained control flow of an enclave program running on real SGX hardware. To the best of our knowledge, this is the first side-channel attack on the second-level directional predictor.
Several new techniques are also proposed in this paper. First, we present a new method for constructing a given branch history with only 93 branches. Using this method, we recover the branch history information after an enclave interrupt. Second, we propose a novel detection technique that enables the attacker to monitor the enclave's actions at a high temporal resolution without caring about when the target instruction in the enclave is executed. Third, Bluethunder is the first attempt to use SGX interrupts for fixing the branch history. By interrupting the enclave just before the target branch, the attacker can ensure that the branch history is the same each time the target branch in the enclave program is executed. We also implement Bluethunder on real SGX hardware with the latest patches and evaluate this implementation on several case studies. Experiments show that our attack outperforms the latest PHT-based side-channel attack by 52 times. Specifically, in the RSA attack, Bluethunder extracts the whole private key with 96% accuracy by attacking only once.
Now we describe the background. The BPU is an optimization in modern pipelined processors. By predicting the likely execution path of a process, it speeds up instruction fetching. The figure on the right side illustrates one possible design of the BPU, which has several main components, in particular the branch directional predictor and the branch target buffer. The branch directional predictor predicts the jump direction of a branch instruction, while the branch target buffer predicts the jump destination address of that branch. The bimodal predictor and the second-level predictor are the two main branch directional predictors in modern processors.
The bimodal predictor exploits the observation that the direction of a branch is related to the previous history of that same branch, while the second-level predictor assumes that the outcome is also affected by the jump history of other branches. When a branch is first encountered, the BPU usually chooses the bimodal predictor. However, if this predictor mispredicts several times, the BPU selects the second-level predictor instead. Both predictors rely on a table called the PHT for their predictions. The PHT is a table containing several n-bit saturating counters. When a branch arrives, one PHT entry in either the one-level or the two-level predictor is selected for prediction. After the branch is actually executed, the counter in the corresponding PHT entry is updated. A finite state machine describing the update of an n-bit saturating counter is given in the right figure. If the most significant bit of the counter value is zero, the counter is in not-taken mode, which means that it predicts not taken. Otherwise, the counter is in taken mode and it predicts taken. For example, a 3-bit saturating counter with the value 1 is in not-taken mode, while another 3-bit counter with the value 4 is in taken mode. The predictor is then updated according to the actual direction of the branch. If the branch is actually taken, the corresponding PHT counter is incremented by 1; otherwise, it is decremented by 1.
We assume a standard SGX threat model in which the attacker has full control over the operating system. First, we assume that the attacker has access to the source code or binary of the target enclave program. By analyzing these resources, the attacker can obtain the detailed behavior of the enclave, such as its control flow. Second, we assume that the attacker and victim programs are co-resident on the same physical core.
This is because the BPU is shared between logical cores but separate for each physical core. Third, the attacker is able to measure the misprediction information of her own branches. Both the performance monitoring counters and the RDTSCP instruction are available for this measurement. Fourth, the attacker can interrupt the enclave just before the target branch, ensuring that no other jumps are taken before the target branch executes. This can be achieved precisely, since the attacker controls the OS.
Now we turn to present Bluethunder. This is the overview of our attack. The Bluethunder attack aims to obtain the fine-grained control flow of an enclave program by manipulating the second-level predictor. In general, the attack consists of two stages: activating the second-level predictor and leaking secrets.
In stage one, we activate the second-level predictor. We force the predictors other than the second-level predictor to shut down by imposing several mispredictions on them, ensuring that the second-level predictor is eventually activated and used by the target core. Executing the target branch with a carefully designed direction vector, the TBDVQ, misleads the BPU into making such mispredictions easily. This stage is executed only once during the attack.
In stage two, we leak secrets using the second-level predictor. Two steps are involved: constructing collisions and abusing collisions. First, we establish PHT entry collisions between the victim and attacker processes. Then, by probing the state changes of these colliding entries with the TBDVA vector, the attacker can infer the control flow of the target enclave, as well as the secrets in it. This stage must be executed for each detection during the attack.
We activate the second-level predictor based on the following observation.
The second-level predictor is chosen by the BPU only when the bimodal predictor cannot predict well. As a result, it is viable to shut down the bimodal predictor by imposing several mispredictions on it. Any direction vector that does not match the bimodal predictor's prediction pattern can force it to mispredict. The prediction and update rules of the bimodal predictor are shown on the right side. From these rules, we can conclude that the bimodal predictor predicts correctly all the time only when all the execution directions are taken or all are not taken. Based on this conclusion, we construct an efficient execution direction vector named TBDVQ for shutting down the bimodal predictor. The vector is constructed as follows. First, we define a TBDV as an L-bit vector with randomized 0 or 1 bits. In this vector, 0 means that the conditional branch is not taken, while 1 means that it is taken. The purpose of the TBDV is to mistrain the bimodal predictor. Since this vector is generated randomly, it can hardly match the prediction pattern of the bimodal predictor. The TBDVQ vector simply repeats the TBDV n times, and it is designed to quiesce the bimodal predictor as well as to train the second-level predictor.
The figure on the right side is an example of constructing a TBDVQ. Here L and n equal 3 and 4, respectively. First, we randomly generate a 3-bit TBDV, which is 0, 1, 1. Then we repeat this TBDV 4 times to obtain a 12-bit TBDVQ vector. The left table illustrates the misprediction results of executing a branch with this 12-bit TBDVQ vector. It can be seen that three mispredictions are made. If the TBDVQ is long enough, we can eventually quiesce the bimodal predictor.
After activating the second-level predictor, we move on to manipulating this predictor for side-channel attacks.
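As a concrete illustration of the TBDVQ construction described earlier, here is a minimal Python sketch. The function names `make_tbdv` and `make_tbdvq` are hypothetical and not taken from the paper's code. Re-drawing constant vectors is an added assumption, motivated by the remark that an all-taken or all-not-taken pattern is the one case the bimodal predictor always gets right.

```python
import random


def make_tbdv(length, rng):
    """Draw a random L-bit direction vector (0 = not taken, 1 = taken)."""
    if length < 2:
        raise ValueError("need at least 2 bits to avoid a constant vector")
    while True:
        tbdv = [rng.randint(0, 1) for _ in range(length)]
        # Assumption: reject all-0 and all-1 vectors, because a constant
        # pattern is one the bimodal predictor can follow without error.
        if 0 < sum(tbdv) < length:
            return tbdv


def make_tbdvq(tbdv, repeats):
    """Repeat the TBDV n times to obtain the TBDVQ."""
    return list(tbdv) * repeats


# The example from the talk: L = 3, n = 4, TBDV = 0, 1, 1.
example_tbdvq = make_tbdvq([0, 1, 1], 4)
print(example_tbdvq)
```

The sketch only builds the direction vector. Executing a real branch along these directions and counting its mispredictions requires native code and is not modeled here.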
First, we need to construct entry collisions between the victim and the attacker. The branch history and the address of the target branch are the two main elements that influence PHT indexing. To verify whether the PHT indexing of the two-level predictor is affected only by these two elements, we carry out an experiment in which both the victim and the attacker run the same piece of code containing a large number of taken branches. The result is given in the left table. If the attacker process runs alone, only 4 of the 11 predictions are incorrect, whereas when the attacker runs together with the victim, 9 mispredictions occur. This difference shows that the predictions for the attacker's branches can be influenced by the victim and that entry collisions have been established. We repeat these tests several times and the results are always the same. We also reduce the number of taken branches, and the results show that 93 taken branches are enough to ensure that the branch history is fixed.
However, although it is easy to infer the address of the target branch, it is quite complicated to recover the branch history of the victim process, since its value varies in different contexts. Also, some pieces of the executed code are secret to us. To deal with these difficulties, we send interrupts to the enclave in order to fix the branch history on the enclave's core. This works because the interrupt handling path contains enough jumps to force the branch history into a fixed state. By making use of published tools such as SGX-Step, we can interrupt the enclave precisely. To deal with the problem that some pieces of code are secret to us, we need to reconstruct the branch history after an interrupt. Now we move on to describe how to recover this branch history with only 93 jumps.
Our history reconstruction is based on the observation that, on Intel processors, the branch history update can be controlled by manipulating the last two bits of the branches' destination addresses. The right figure is an example. We vary the jump destination of the first jump instruction in the attacker's code while fixing its source address. We find that PHT collisions can be established only when the last two bits of the jump's destination have a fixed value, such as 01. This result means that two bits are enough to represent the effect of one jump on the branch history. We make use of this result to recover the enclave's branch history after an interrupt.
First, we add 93 jump instructions, called R-jumps, to the attacker's code. The source addresses of these jumps are fixed upon insertion, while their destination addresses can be changed. We also add 93 jump instructions, called T-jumps, and a target branch to both the victim's code and the attacker's code. Note that the source and destination of each T-jump in the victim should be the same as those of the corresponding T-jump in the attacker's code. Now the two target branches share the same predictor entry. Then we delete the last T-jump instruction in both processes and change the destination of the last R-jump instruction in the R-jump sequence. By checking whether an entry collision has been established, we can infer when the branch histories of the two processes are the same. According to the conclusion drawn above, four attempts are enough to recover the destination of the last R-jump in the attacker's program. It can be seen that our first test fails. Next, we delete the penultimate T-jump instruction in both processes and recover the jump destination of the penultimate R-jump by checking again. We repeat this step 93 times in total, and finally the branch history after an enclave interrupt can be reconstructed.
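The search just described can be sketched as a small Python simulation. Everything here is a simplified model under stated assumptions: the history is treated as a window of the last 93 two-bit jump footprints, the names `make_collision_oracle` and `reconstruct_history` are hypothetical, and the oracle stands in for the real collision check, which on hardware would be a misprediction measurement on the target branch.

```python
import random

HISTORY_LEN = 93      # number of jumps that fix the history (from the talk)
T_FOOTPRINT = 0       # assumed 2-bit footprint shared by all T-jumps


def make_collision_oracle(victim_history):
    """Simulated stand-in for 'do the two target branches collide?'."""
    def collides(r_jumps, t_count):
        tail = [T_FOOTPRINT] * t_count
        # Only the most recent HISTORY_LEN jumps influence the index.
        victim_window = (victim_history + tail)[-HISTORY_LEN:]
        attacker_window = (r_jumps + tail)[-HISTORY_LEN:]
        return victim_window == attacker_window
    return collides


def reconstruct_history(collides):
    """Recover the hidden history one jump at a time, last jump first."""
    r_jumps = [0] * HISTORY_LEN
    probes = 0
    # Removing one more T-jump exposes one more (older) history element.
    for t_count in range(HISTORY_LEN - 1, -1, -1):
        position = t_count  # index of the R-jump that just became visible
        for low_bits in range(4):  # four candidates for the last two bits
            r_jumps[position] = low_bits
            probes += 1
            if collides(r_jumps, t_count):
                break
        else:
            raise RuntimeError("no candidate produced a collision")
    return r_jumps, probes


rng = random.Random(1)
hidden = [rng.randrange(4) for _ in range(HISTORY_LEN)]  # unknown to attacker
recovered, probes = reconstruct_history(make_collision_oracle(hidden))
print(recovered == hidden, probes)
```

In this model the attacker needs at most four probes per position, so at most 4 × 93 collision checks in total, which mirrors the "four attempts are enough" argument in the talk.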
Once we are able to access the target entry in the second-level predictor, we try to abuse this entry to conduct Bluethunder attacks. Although it is known that the main component of the second-level predictor is a PHT with several n-bit saturating counters, the value of n is still unknown. To determine n, we execute an attacker process alone, containing 93 training branches and one target branch. The TBDV we use has L1 bits of T followed by L1 bits of N, where L1 is set to be big enough for this test. We repeat this TBDV 100 times to generate the final TBDVQ for the target branch. By executing the target branch with this vector and checking its misprediction results, we can infer the exact value of n. The test result is presented below. After the target branch is executed four times in the taken or not-taken direction, the two-level predictor is trained. In other words, n is 3.
Another challenge in abusing PHT entry collisions is that current BPU-based attacks usually require the attacker and victim processes to execute in a specific order, which limits the temporal resolution of the attacks. For example, in most attacks, the attacker needs to train the BPU first; then the victim executes the target branch and affects the state of the target BPU entry; and finally the attacker probes the BPU changes and infers the victim's secrets. To increase the temporal resolution of our attack, we adjust the direction of the target branch dynamically. We set the execution direction of the target branch to be the opposite of the predicted direction. An example is shown in the table below. If the saturating counter in the corresponding second-level PHT entry gives a taken prediction, the attacker sets the direction of the target branch to not taken. Otherwise, the attacker sets the direction of the target branch to taken. We call the direction vector that the attacker executes the TBDVA vector.
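The "always go against the prediction" rule can be explored with a toy Python model of a single 3-bit saturating counter. This is a sketch under assumptions that are not from the paper: the attacker is assumed to learn each prediction only after the branch has executed, so each direction is the opposite of the most recently observed prediction; the counter is assumed to start at 0; and the names `run_tbdva` and `victim` are hypothetical.

```python
def predict_taken(counter, bits=3):
    """Taken mode when the most significant bit of the counter is set."""
    return counter >= (1 << (bits - 1))


def update(counter, taken, bits=3):
    """Saturating increment on taken, saturating decrement on not taken."""
    if taken:
        return min(counter + 1, (1 << bits) - 1)
    return max(counter - 1, 0)


def run_tbdva(steps, victim=None, bits=3):
    """Simulate the attacker's target branch for a number of executions.

    victim maps a step index to the direction (True = taken) that the
    victim executes on the shared entry just before that attacker step.
    """
    victim = victim or {}
    counter = 0        # assumed initial state of the shared PHT entry
    direction = True   # the first direction is arbitrary
    trace = []
    for step in range(steps):
        if step in victim:
            counter = update(counter, victim[step], bits)
        predicted = predict_taken(counter, bits)
        trace.append("T" if direction else "N")
        counter = update(counter, direction, bits)
        # Assumption: the prediction is observed after execution, so it
        # steers the next direction rather than the current one.
        direction = not predicted
    return "".join(trace)


print(run_tbdva(26))
print(run_tbdva(26, victim={14: True}))
print(run_tbdva(26, victim={14: False}))
```

In this toy model the victim-free run settles into a repeating TTTNNN pattern after a short warm-up. A single taken execution by the victim shortens the following run of T's relative to the N's, and a single not-taken execution lengthens it. Real hardware adds noise and indexing details that the model ignores.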
By executing the target branch with this vector, our attack no longer requires the sequential order described before. We find that, without the victim executing, the TBDVA eventually reaches a stable state that repeats the basic sequence TTTNNN. The explanation of this phenomenon is presented in the first table. However, if the attacker process runs together with the victim process, this stable state is broken. By analyzing the irregular sequences, the attacker can deduce the fine-grained control flow of the victim process. The relationship between the possible irregular sequences and the victim's actions is presented in the second table. If the number of N's is larger than the number of T's, the victim has executed the target branch in the taken direction. Otherwise, the victim has executed the target branch in the not-taken direction.
Now we turn to evaluating the Bluethunder attack. We launch Bluethunder against the mbed TLS library, which is a popular choice for encryption and decryption operations in SGX-based environments. The RSA algorithm in this library is vulnerable to control flow attacks. By detecting the execution directions of branches in it, the attacker is able to uncover the victim's private key. This figure shows the execution directions of the attacker's target branch, which are affected by 10 bits of the private key. Take the first bit as an example. The three red spots mean TTT, and the four blue spots mean NNNN. We can therefore conclude that the victim's execution direction is T and the secret bit is 1. With the help of post-processing analysis scripts, we can reliably recover the whole private key. We also compare Bluethunder with BranchScope.
Now we present the countermeasures. We divide them into two categories: software ones and hardware ones. One software-based approach against Bluethunder is removing conditional branches in the target enclave.
However, this approach is usually algorithm-specific, and applying it to general applications is challenging. A more practical countermeasure is auditing, which requires no changes to processor designs. However, selecting the threshold for differentiating between benign programs and adversarial programs is difficult. As for hardware countermeasures, one mitigation is to randomize the PHT indexing logic, which makes it harder for the attacker to construct entry collisions in the predictor. However, extra hardware components, such as buffers, may be required. Another solution is to prevent the prediction of sensitive branches. Although this method can protect the victim process from side-channel attacks, it cannot protect against covert-channel attacks. We may also double the BPU logic or flush the PHT state whenever the context changes. However, these mitigations either require a more complex BPU design or increase the running overhead of processors. You can get more information about the following aspects in the paper.
In this paper, we propose a new PHT-based attack against SGX named Bluethunder, which leverages the second-level directional predictor as a side channel. By exploiting hash collisions in the predictor, this attack can precisely identify fine-grained control flow inside an enclave. We implement Bluethunder and evaluate this attack in two case studies. Results show that even when several defensive techniques are in use, Bluethunder attacks are indeed practical and pose a serious threat to SGX. In particular, our implementation is able to derive the RSA private key with more than 96% accuracy in a single run, and its speed outperforms the latest PHT-based side-channel attack by 52 times. We stress that hardware-based countermeasures against Bluethunder need to be taken in order to prevent large-scale exploitation of Bluethunder. Thank you.