Hi everyone, I'm Manuel and I'm here to present my latest work, ROTed, Random Oblivious Transfer for Embedded Devices. This work was developed in collaboration with Pedro Branco, Luís Fiolhais, Paulo Martins, Paulo Mateus and Leonel Sousa. And here is my email, so feel free to send me an email if you are interested or want to know more. So our goal for this work was to develop a highly efficient random oblivious transfer protocol. A random OT is just like a one-out-of-two OT, but the parties don't get to choose their inputs: the sender doesn't choose the messages, and the receiver doesn't choose the choice bit. For instance, OT extension protocols are widely used to improve the efficiency of most cryptographic protocols requiring many OT executions, and allow for the conversion of a few base OTs into many outputs using only cheap cryptographic primitives as overhead. However, in the malicious setting these base OTs are required to be random OTs, and most related art has focused on designing one-out-of-two OTs, which require adaptation before being used in these protocols. So, by directly using random OTs, we are able to improve the efficiency of the protocols without requiring an expensive black-box conversion from OTs to random OTs. Finally, as we'll see, optimizing this primitive has very relevant applications in embedded systems, the Internet of Things, desktop and server applications, and more. Our work has three main contributions, which we will analyze next in this presentation. First, our random OT protocol. It is a three-round protocol and uses techniques similar to an RLWE key exchange, where the sender and the receiver perform two partial executions of the key exchange, but only in one of them is it possible to reconcile a shared key. So, while the sender will have two messages, the receiver will only be able to share one of them, from which it will get its OT message.
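To make the functionality concrete, here is a minimal Python sketch of what an ideal random OT hands to each party. The function name `ideal_random_ot` and the 16-byte message length are assumptions made for illustration; the sketch models only the input/output behaviour of the functionality, not the RLWE-based protocol itself.

```python
import secrets

MSG_LEN = 16  # hypothetical message length in bytes, chosen for illustration


def ideal_random_ot(msg_len: int = MSG_LEN):
    """Ideal random OT: neither party chooses anything.

    The sender receives two uniformly random messages (m0, m1).
    The receiver receives a uniformly random bit b and the message m_b.
    """
    m0 = secrets.token_bytes(msg_len)
    m1 = secrets.token_bytes(msg_len)
    b = secrets.randbits(1)  # uniformly random choice bit

    sender_output = (m0, m1)
    receiver_output = (b, (m0, m1)[b])
    return sender_output, receiver_output


sender_out, receiver_out = ideal_random_ot()
```

The receiver learns nothing about the other message, and the sender learns nothing about `b`; a real protocol has to achieve the same without a trusted party.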
The sender, meanwhile, won't be able to tell which one the receiver got, which gives the security of the OT. Also, both the sender and the receiver are able to influence both of the messages as well as the choice bit. And so, if one of them is honest, the distribution of the outputs will be uniformly random, which gives the security of the random OT. Finally, a proof by simulation in the random oracle model is provided to ensure UC security. Secondly, we implement our novel protocol in C++, execute it on an Intel x86 server-class processor and on four ARM application-class processors, and benchmark it against the current state of the art, achieving speeds at least one order of magnitude faster. Finally, we integrate the protocol in an open-source private set intersection framework and evaluate the real-world implications of our protocol, as a random OT is not very interesting to the real world by itself. Here, we manage to get executions up to 6.6 times faster than when using the related art. Again, our protocol has three rounds, and we will go through each of them slowly. In the first round, both the sender and the receiver generate an RLWE sample, which will be used for the key exchange. The receiver also samples the random string R, which is used to obtain a common random string (CRS) between the parties by querying the random oracle on R. The receiver chooses a bit C which, by mixing the sample with an output from the random oracle, defines which message is real and which is random. Note that both are indistinguishable by the RLWE assumption. Finally, the random strings T0 and T1 will be mixed into the output messages of the protocol to ensure that the sender won't have any control over the distribution of those messages, and they must be committed to by the receiver at the start so that it cannot change them once it knows its message. In the second round, the sender receives the message from the receiver and computes the CRS to recover both the real and the random key exchange messages.
Using its secret key, the sender then reconciles a key from each of them. Remember, only the reconciliation using the real message will yield a shared key. It then samples a bit A, which is its contribution to enforce that the output choice bit of the receiver is uniformly random, as well as a random string U that must be included in both messages to force them to be uniformly random as well. The sender sends its key exchange public message and commits to both reconciled keys by querying the random oracle on them. In the last round, the receiver uses the information from the sender to reconcile the one shared key, while the choice bit B is derived from the XOR of A and C. The third condition here is necessary so that the UC simulation is indistinguishable even if no commitment on the reconciled key matches. Note that this never happens when the protocol is executed honestly. Finally, the commitments on the random strings T0 and T1 are opened by the receiver and checked by the sender, and the final outputs are computed using all the exchanged information. We will now go through the intuition on why the protocol is secure. We start by considering a malicious sender. Malicious adversaries can deviate arbitrarily from the protocol. First, a malicious sender can't learn the bit B: while it has two key exchange messages, a real one and a uniformly random one, it cannot distinguish which is which, as this is exactly breaking the RLWE assumption. Which message is which is given by the bit C, chosen uniformly at random by the honest receiver and unknown to the sender. Knowing B requires knowing C, as B equals A XOR C. Moreover, the malicious sender cannot bias the distribution of the messages, as they come from the random oracle and the query must include the random nonces T0 and T1, sent by the honest receiver at the end of the protocol. In the case of a malicious receiver, it cannot learn both messages, as this would require it to find both shared keys.
But knowing both shared keys would require finding the RLWE secret from a uniformly random sample, namely the random key exchange message. Furthermore, the malicious receiver cannot bias the distribution of the choice bit B, as it results from the XOR of the bit C with the bit A, chosen uniformly at random by the honest sender and unknown to the receiver. And it cannot bias the distribution of the output message, as it comes from the random oracle and its query must include the random nonce U, sent by the honest sender. In order to prove universally composable (UC) security, we must construct a simulator that simulates every adversary while remaining undetected by an external environment. For this proof, we work in the random oracle model, which allows the simulator to observe and program queries to the oracle. So first, to simulate a corrupted sender, the simulator programs the oracle H1 such that it is able to reconcile both shared keys. By making this slight change, the simulator is not detected, as distinguishing two RLWE samples from one RLWE sample and one random one requires breaking the RLWE assumption. Then, with both keys, the simulator can extract the bit A from the malicious sender, using the sender's queries to the oracle H2, which it observes. Finally, the simulator only has to program the outputs of the random oracle H2 so that, when the right messages are queried, it replies with the messages it got from the ideal functionality. Since these are random, as we are doing random OT, these replies are indistinguishable from a normal reply of the oracle, and the simulation is done. Now the simulation for the corrupted receiver. The simulator programs the random oracle such that the bit B it got from the ideal functionality is enforced by setting the bit A to 0 or 1. A very relevant detail here is showing that the corrupted receiver can't query the oracle on both keys.
But if the corrupted receiver could find both shared keys, then it could derive the shared key from only the public information of the key exchange, or find the RLWE secret of a random sample. Finally, the simulator only has to reply to the query on H2 with the message it got from the ideal functionality when asked the respective query, concluding this simulation. There are two edge cases in a UC simulation proof. First, when no party is corrupted, the adversary only observes the transcript. But the transcript of this protocol carries no information that allows it to distinguish which setting it is in. This can be proven from the fact that the inputs and outputs of a random oracle are uncorrelated, and from the RLWE assumption. The other edge case is when both parties are corrupted, and here the adversary generates all the messages by itself. So the proof is concluded and the protocol is indeed UC-secure. We'll now address the implementation details. As stated before, we implemented our novel protocol in C++, resorting to external libraries for some specific operations. First, Gaussian sampling was implemented using NFLlib. We assumed there was a shared region of memory that was periodically populated with random data, so the protocol only needed to read data from memory instead of generating random numbers on the fly. Then, random oracles were implemented by hashing the inputs and using the output of the hash as a seed for a pseudorandom generator. This generator was then used to produce the output of the random oracle. For sampling polynomials, rejection sampling required extensive calls to the underlying hash function, for which we chose BLAKE3, as it is one of the most efficient cryptographic hash functions to date. The NTT was also implemented using NFLlib, but it needed to be extended to support ARM with NEON SIMD.
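The hash-then-expand construction of the random oracle described above can be sketched as follows in Python. This is a minimal sketch under stated assumptions: BLAKE2b from the standard library stands in for BLAKE3, the counter-mode generator `prg_stream` is an illustrative stand-in for the actual pseudorandom generator, and the modulus `Q`, the degree `N` and all function names are hypothetical choices for illustration, not the parameters of the implementation.

```python
import hashlib

Q = 12289  # hypothetical NTT-friendly prime modulus (assumption, not from the talk)
N = 16     # hypothetical, deliberately tiny polynomial degree


def prg_stream(seed: bytes):
    """Yield pseudorandom bytes by hashing seed || counter (illustrative PRG)."""
    counter = 0
    while True:
        block = hashlib.blake2b(seed + counter.to_bytes(8, "little")).digest()
        yield from block
        counter += 1


def random_oracle_poly(label: bytes, data: bytes, n: int = N, q: int = Q):
    """Map (label, data) to n coefficients that are uniform in [0, q).

    Step 1: hash the input to obtain a seed.
    Step 2: expand the seed with the PRG.
    Step 3: rejection-sample so that no modular bias is introduced.
    """
    seed = hashlib.blake2b(label + b"|" + data).digest()
    stream = prg_stream(seed)
    mask = (1 << q.bit_length()) - 1  # keep only as many bits as q needs
    coeffs = []
    while len(coeffs) < n:
        # Build a 16-bit candidate from two PRG bytes, then mask it down.
        candidate = (next(stream) | (next(stream) << 8)) & mask
        if candidate < q:  # reject values outside [0, q)
            coeffs.append(candidate)
    return coeffs


poly = random_oracle_poly(b"H1", b"example input")
```

Because rejected candidates cost extra generator output, the number of hash calls grows with the polynomial degree, which is why the speed of the underlying hash function matters.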
We also considered the transmitted polynomials and the outputs of the random oracles to be already in the NTT domain in order to improve efficiency. Finally, it is important to note that our implementation used 16% more memory than the current state of the art. We are now ready to analyze our implementation results. First, in absolute numbers. In these tables, we can see the results of implementing our random OT in comparison to the state of the art. Note that the protocols we compare against were first converted to random OTs. We can see that there is a clear improvement when using our random OT in all of the implementations. Substantial improvements are obtained when using vector instructions instead of serial instructions, and even more substantial ones when going from the ARM Cortex-A53 in-order architecture to the ARM Cortex-A72 out-of-order architecture. In the end, we are able to achieve around 27,000 random OTs per second on the Apple M1 application-class processor and around 37,000 random OTs per second on the Intel server-class processor. As discussed before, we integrated our random OT protocol in a state-of-the-art open-source private set intersection framework. Since the framework only supported x86 architectures, our results are limited to that platform. Our goal here was to show that our random OT proposal is significant even for the real world, and the private set intersection use case was selected for its many direct applications. In the end, we achieved very substantial improvements in terms of speed, while our memory drawback became negligible when considering the requirements of the entire private set intersection protocol. We now provide a relative comparison between our random OT proposal and the state of the art for all architectures. On the left, one can see the results for the four ARM application-class processors and, on the right, the results for the Intel server-class processor.
In order to perform this relative comparison, we used PVW as the baseline. Again, we can see that our proposal improves substantially on the current state of the art in all of the examples. To provide more insight into this comparison, profiling PVW showed that almost 50% of the execution time was spent performing point multiplication, an inherently sequential operation. Moreover, RLWE, due to its ring structure, benefits considerably from SIMD instructions. Once again, we see that vector instructions speed up the protocols considerably, by around 30%. But changing the execution back-end from in-order to out-of-order provides even more considerable speed-ups of over 100%. An interesting case is the AVX2 implementation, which is faster than the AVX-512 implementation. Indeed, the RLWE AVX-512 implementations were not able to fill the vector. Therefore, length checks had to be added in the NTT loop, leading to more missed branch predictions. When integrating all random OTs in the private set intersection framework, similar conclusions arise. Again, we used PVW as the baseline in this relative comparison, and again, our proposal was the fastest. Our proposed random OT, when used in the private set intersection framework, was at least twice as fast as the related art, reaching up to 6.6 times faster. Here, AVX2 was again the fastest for the same reasons as before, while the relative improvements were about the same whether comparing two protocols with vector implementations or comparing serial implementations. As previously stated, the memory requirements of the private set intersection framework far exceed those of the random OT, meaning that in this application the slight memory drawback is essentially inconsequential.
To conclude, we have developed a novel random oblivious transfer protocol whose security is based on the RLWE assumption and that is proved UC-secure in the ROM, a crucial factor for protocols that are meant to be used in composition with other protocols. We achieved speeds of up to 37,000 random OTs per second on an Intel x86 server-class processor and up to 5,000 random OTs per second on ARM application-class processors suitable for integration in a system on a chip. By using vector instructions, we are able to improve speed substantially, by around 40%. In comparison with the previous state of the art, we achieved speeds one order of magnitude faster, and these results are obtained on diverse platforms suitable for embedded systems, the Internet of Things, desktop and server applications, and others. We also showed the practical interest of our random OT by integrating it in a very relevant private set intersection application, whose applicability ranges from contact discovery to remote diagnosis and contact tracing, among others. Moreover, using our random OT in this application results in speeds at least twice those of the previous state of the art, reaching up to 6.6 times faster. Future work will address ultra-low-power devices, as these have different requirements and will need more specific optimizations. Thank you.