Okay. So, first of all, thank you very much for attending this talk, and welcome to Video-Based Cryptanalysis: Extracting Cryptographic Keys from Video Footage of a Device's Power LED. I will start by introducing ourselves. My name is Ben. I'm a Black Hat board member, I do some freelancing, and I'm a postdoc at Cornell Tech. I have a PhD in security. Together with me is Ofek, who is an M.Sc. student at Ben-Gurion University of the Negev. This work relies on two papers that we've recently published, Video-Based Cryptanalysis and Optical Cryptanalysis; you can find them online if you just look for them. Moreover, no prior knowledge of cryptography is required to understand this talk. We tried to keep it as simple as possible so the entire audience will be able to enjoy it. Some of the specifics of the implementations I will not discuss in this talk; you will be able to find them in the papers. Now, with that in mind, let's have some fun.

So here is a question for the audience: what do you associate with the term cryptanalysis? Some of you probably associate servers, quantum computers, and data centers; in general, you probably tend to associate a high level of computing capability with the term cryptanalysis. Others might associate specialized hardware, for example oscilloscopes, as the hardware that is needed to conduct cryptanalysis. Supply-chain attacks are probably also associated with the term, and the names of some intelligence agencies might come to mind as well. However, I tend to believe that the vast majority of you do not associate this iPhone with the term cryptanalysis, because smartphones in general have weak computing capabilities, they are very popular devices, and they cannot be used to apply any complex attack. And the vast majority of you do not consider an entity who owns a smartphone as somebody who is interested in recovering cryptographic keys. Here is our message for today: think again. By the end of this talk, you will understand that power LEDs pose a great risk to information confidentiality, and that video cameras, whether they are security cameras or the video cameras of a smartphone, provide the infrastructure needed to exploit this risk. By the end of this talk, we will have discussed how to recover cryptographic keys from a device by obtaining video footage of its power LED.

But first, let's try to answer where the idea of using a video camera to recover secret keys from a device's power LED came from. This actually takes us to 2014, when the Visual Microphone was published by a group of researchers from MIT. These researchers demonstrated a speech-recovery technique using a video camera, where they analyzed the movement of an object. It was demonstrated on various objects, but the most iconic demonstration was the one with the bag of chips: they were able to analyze the movement of a bag of chips in the video footage and recover the speech played near it. They demonstrated it with two types of video cameras. The first was a professional high-speed video camera that provides 20,000 frames per second. The second was a regular video camera, for which they exploited the rolling shutter; we'll discuss the rolling shutter later in this talk.
In 2016 I started my PhD, and a few years later we published the Lamphone attack. Lamphone is a method to recover speech using a photodiode. A photodiode is the sensor that you can see on the right side: an optical sensor that converts light into electricity — not into pictures, but into electricity — and its output then needs to be digitized with an analog-to-digital converter. We showed that attackers can analyze the vibrations of a desk light bulb, such as the one you can see in the picture, by obtaining optical measurements with the photodiode, and then recover the speech played by the speakers placed nearby. We demonstrated speech recovery from various distances, up to 35 meters away. You can see the experimental setup at the bottom: the photodiode was mounted on a telescope, the telescope was directed at the light bulb placed above the desk, and the speakers were used to play the speech. Now I want to play the recovered speech to you, in order to convince you that speech can be recovered at very high quality. Guys from the audio booth, can I play it? Okay, let's try. "We will make America great again." This is the original. This is from 15 meters... 25 meters away. Now, the last one was recovered 35 meters away from the light bulb. And bear in mind, this was recovered from light measurements obtained via the photodiode that was directed at the vibrating light bulb.

Interestingly, in one of the experiments we conducted, we found that not only can we recover speech by obtaining optical measurements from the light bulb, we can also recover speech from the power LED of the speakers that we used to play the sound near the light bulb. This actually led us to publish an additional piece of research that we named the Glowworm attack. Glowworm is the exact same technique — we recovered speech using optical measurements obtained by an optical sensor, the photodiode — but in this specific case we analyzed the intensity of the speakers' power LED while speech was played by the speakers, and recovered the speech out of it. And again, you can see the exact same threat model: the photodiode is mounted on a telescope, and the telescope is directed towards the power LED. You can see it here; it's the green LED of the speakers. And we again recovered speech at very high quality. I will play the recovered speech to you; this time it was recovered from five meters up to thirty-five meters away. As you can see, the quality of the speech deteriorates with distance, but then again, it was recovered from optical measurements obtained by the photodiode, taken from the power LED of the speakers.

Now, this is very surprising in a way, because some of you probably wonder why the intensity of a speaker's power LED can even be used to recover speech. In order to answer that, we conducted an experiment where we took USB speakers and played a frequency scan from zero to four kilohertz through them. We obtained two traces. The first trace was a power trace that we obtained via an oscilloscope connected to the USB connector of the speakers.
And the second trace was an optical trace obtained by directing the photodiode towards the LED of the USB speakers. You can see them at the bottom. We made several interesting observations, but I think the most interesting among them was that the power consumption correlates with the intensity of the speakers' power LED. Now, this is very nice and very interesting, but can this correlation between the intensity of the power LED and the power consumption also be seen in additional speakers, or is it limited to this specific model that we used? So we conducted a few additional experiments: we repeated the exact same experiment with different models of speakers, among them Sony speakers, a Google assistant, JBL, and Creative. You can see that the correlation appears in each and every one of them. These, by the way, are spectrograms obtained from the optical measurements. As you can see, the intensity of the power LED of various speakers correlates with the power consumption in the range of zero to four kilohertz. This is what we consider a universal phenomenon.

Now, some of you still probably wonder why we see this correlation. The fact is that in various electrical circuits, the integrated power LED is connected directly to the power line. Moreover, dedicated means intended to decouple the LED's intensity from the power consumption are either not integrated into the circuit at all, or in some cases they are integrated but ineffective. As a result, the power consumption of the device, which essentially determines the power supplied to the integrated power LED, affects the intensity of the LED, and due to this fact the optical measurements reflect the power consumption of the device.

At this specific point in time, I decided to leverage these findings to conduct cryptanalysis. The idea was to take previously suggested cryptanalytic attacks that relied on power traces, and this time apply them using optical measurements obtained from the power LED. This led us to Optical Cryptanalysis. It is a paper that is about to be presented at CCS, but you can find it online. Optical cryptanalysis is a key-recovery technique that recovers the key from a device using optical measurements obtained from the power LED of the device with a photodiode. As you can see, we directed the photodiode towards the LED of a Raspberry Pi. One of the first observations we made is that the intensity of the power LED of various devices — not only the Raspberry Pi; there are additional devices as well — correlates with the power consumption of the device over a much wider spectrum than we initially thought. We had analyzed zero to four kilohertz; you can see that in this specific case the correlation reaches up to 500 kilohertz, which was the upper limit of our equipment. The potential is actually much greater than 500 kilohertz.
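To make this correlation concrete, here is a minimal Python sketch of how one could quantify it, assuming a power trace and an optical trace have already been captured and resampled to a common rate. The file names, sampling rate, and spectrogram parameters are placeholders, not values from the papers.

```python
import numpy as np
from scipy import signal

# Hypothetical inputs: a power trace (from the oscilloscope on the USB
# line) and an optical trace (from the photodiode aimed at the power
# LED), both already resampled to the same rate.
fs = 8_000                                   # samples/s; covers 0-4 kHz
power_trace = np.load("power_trace.npy")     # placeholder file names
optical_trace = np.load("optical_trace.npy")

# Remove DC offsets so the correlation reflects fluctuations, not bias.
p = power_trace - power_trace.mean()
o = optical_trace - optical_trace.mean()

# Pearson correlation between the two traces.
r = np.corrcoef(p, o)[0, 1]
print(f"power/optical correlation: r = {r:.3f}")

# Spectrograms like the ones shown on the slide: a frequency scan should
# appear as the same rising diagonal line in both.
f_p, t_p, S_p = signal.spectrogram(p, fs=fs, nperseg=256)
f_o, t_o, S_o = signal.spectrogram(o, fs=fs, nperseg=256)
```

For a frequency scan like the one played through the speakers, the same rising tone should appear in both spectrograms, and the Pearson coefficient summarizes how closely the LED intensity tracks the power line.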
Interestingly, we conducted an additional experiment where we took GnuPG, ran it on the Raspberry Pi, and obtained optical measurements from the power LED using the photodiode. As you can see in the spectrogram on the right, we can distinguish between the decrypt operations and the sleep operations of the device, which we consider idle. Moreover, as you can see, we can detect the beginning and the end of the cryptographic operations performed by the CPU. This is very bad for information confidentiality, because it opens a door to conducting the timing cryptanalytic attacks that were demonstrated in the past, but this time using optical measurements obtained by a photodiode directed at the power LED of the device. We used this understanding to recover RSA, ECDSA, and SIKE keys; again, you can find the details in the paper.

Now, this is very interesting: having the ability to recover cryptographic keys using optical measurements instead of power traces. However, the primary disadvantage of this entire method is the fact that photodiodes aren't commonly used sensors. The vast majority of you do not own a photodiode, or at least not a photodiode that can be used to recover these cryptographic keys. Moreover, in order to obtain optical measurements from a photodiode, attackers must connect it to a dedicated analog-to-digital converter in order to sample the photodiode's output. So the idea we had back then was to go from a photodiode to a video camera: instead of using a photodiode, we will use a video camera to obtain video footage of the power LED of the device. Now, with that in mind, I will let Ofek discuss the threat models and the rolling shutter.

Okay. So let's discuss the threat models. Our objective is to perform cryptanalysis and recover secret keys from a device's power LED using video cameras instead of photodiodes. We have two different threat models. The first one is close video acquisition: it uses a smartphone's video camera, it targets any type of power LED, and it requires physical access to the device. The second threat model is over-the-internet video acquisition: it uses an internet-connected video camera, it targets only type-two power LEDs — LEDs that turn on when an operation is executed and turn off when the operation stops being executed, and which are commonly used in smart card readers — and it can be applied remotely over the internet using a hijacked video camera.

So far, we have used photodiodes, which are analog sensors that can be sampled at a few gigahertz. Video cameras record 60 or 120 frames per second, and a sampling rate of 60 frames per second is not sufficient to perform cryptanalysis. Or is it? We used to think that a picture is the result of one atomic snapshot taken by the video camera. However, 99% of video cameras use a rolling shutter to record video, so a single picture is the result of multiple snapshots taken at different times by the video camera. A video camera uses a rolling shutter to scan an object vertically or horizontally and combines the scanned pieces into a single frame. Only a few rows are captured atomically each time.
And the entire picture consists of multiple snapshots of an object. Here you can see an example of a picture captured using a rolling shutter. On the left there is a moving object, and on the right there is the captured image. As you can see, because each couple of rows is captured at a different time, distortions appear in pictures of fast-moving objects. Such distortions would not appear if the picture were taken as one atomic snapshot of the object.

Now, let's describe an experiment we conducted to visualize the effect of the rolling shutter. First, we programmed an Arduino Uno to turn its power LED on and off every 250 microseconds, and here is what we can see. We cannot see the LED turning on and off at 4 kilohertz, because the sampling rate is only 60 frames per second. Second, an extra lens was used to direct the camera of a Samsung Galaxy S22 Ultra at the Arduino's power LED so that the video of the LED fills the entire frame, and here is what you can see. Because the LED turned on and off a few times during each frame, there are red rows that were captured while the LED was on and black rows that were captured while the LED was off, and this is why we see those black and red stripes across each frame. So, by filling the frame with the view of the LED, we exploit the video camera's rolling shutter to increase the number of measurements obtained at different times. This allows us to detect the 4 kilohertz flicker. Theoretically, by exploiting a video camera's rolling shutter, we can increase the sampling rate by three orders of magnitude: from the FPS rate, 60 measurements per second, to the rolling-shutter rate, 60K measurements per second. In reality, there is a delay, a transition time, between consecutive frames. The transition time between frames is a period during which no object is captured by the video camera; you can see in the figure that between two consecutive frames there is a transition time. This must be taken into account when analyzing the video footage of the power LED. Now, I will let Ben discuss the ECDSA key recovery.

Okay. So, in order to discuss the ECDSA key recovery, we need to briefly discuss the Minerva attack. The Minerva attack was published three years ago, and one of the most interesting revelations the researchers made was that various cryptographic libraries apply runtime optimizations in their code. I'm referring to the secp256r1 scalar-multiplication code in their ECDSA implementations: the researchers found that the number of iterations of the main loop is determined by the number of leading zeros in the nonce. As a result, there is a time dependency between the number of leading zeros in the nonce and the execution time of the main loop — and, essentially, the ECDSA signing time. The Minerva attack introduced a technique to recover the ECDSA key by analyzing a set of ECDSA signatures and their associated signing times; you need about 4,000 signatures and their signing times in order to recover the complete key. Now, the original Minerva attack required the attackers to obtain timing measurements by querying the CPU, and essentially it required the attackers to compromise the host machine.
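Before moving on, here is a toy, self-contained Python simulation of the dependency Minerva exploits. The loop below is a stand-in written for illustration, not code from any cryptographic library; it just shows that the iteration count, and hence the running time, reveals the number of leading zeros in the nonce.

```python
import secrets

N_BITS = 256  # nonce size for a P-256-style curve

def toy_sign_loop(nonce: int) -> int:
    """Toy stand-in for the scalar-multiplication main loop: it runs
    once per bit of the nonce, so nonces with more leading zeros need
    fewer iterations and finish faster. This mirrors the dependency
    Minerva exploits; it is NOT real library code."""
    work = 0
    for _ in range(nonce.bit_length()):  # leaks via the iteration count
        work += 1                        # stand-in for a point double/add
    return work

# The attacker collects ~4,000 (signature, signing-time) pairs, keeps the
# fastest ones (most leading zeros in the nonce), and feeds the implied
# bounds on the nonces to a lattice solver to recover the private key.
for _ in range(5):
    k = secrets.randbits(N_BITS)
    zeros = N_BITS - k.bit_length()
    print(f"leading zeros: {zeros} -> loop iterations: {toy_sign_loop(k)}")
```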
The authors were unable to demonstrate the attack remotely over the internet due to the noise added by the network latency; the Minerva attack is very sensitive to errors in the timing measurements. You are about to see it demonstrated over the internet now. Without getting into too many specifics and details of the Minerva attack, you can think about it as a black box: the black box receives a set of signatures and their associated signing times, and returns the ECDSA key. So, essentially, what we need to answer is: how can the ECDSA signing time of a signature be estimated from video footage? If we can answer that, we will be able to recover the complete ECDSA key using video footage.

This was the experimental setup in our experiment. We placed an internet-connected video camera and directed it at the power LED of a smart card reader. Inside the smart card reader there was a smart card with an ECDSA key, and we wanted to recover the ECDSA key from the smart card using video footage of the power LED of the smart card reader, captured by this internet-connected video camera placed 16 meters away from the desktop. I want to show you the video. On the left side is the smart card reader that we used; we actually focused on its right power LED. Let me show you now. You can see that we zoom into the power LED of the smart card reader. Interestingly, the smart card reader actually provides an indication of whether an operation is taking place or not, and this is done by changing the color of the LED: anytime it's blue, the smart card is idle; anytime it's off, the smart card is being used to sign. So, as you can see in the video on the right, there are differences between black and blue, which indicate whether the smart card is currently being used to sign or not.

The picture on the left is the result of averaging each row in a frame to a single value and arranging the values in a time series; we did this for the red, green, and blue channels. You can see that, based on the blue channel, we can distinguish between the sign operations, where the power LED is off, and the idle periods, where the color of the power LED is blue. Now we need to answer how we can calculate the signing time out of this. In order to calculate the ECDSA signing time of a signature, we extract the series of frames associated with each signature from the video; you can see examples of them at the bottom of the slide. We kept only the series that contain the indication of the switch at both the beginning and the end of the signing, meaning series in which the beginning and the end of the signing were captured inside frames, like the series you can see on top. Each series that lacks one of these indications of the switch at the beginning or the end of the signature was filtered out. You can see that in the frames of the first series, the color changes inside the frame between blue and black and vice versa; this is what we consider the indication that the operation started and ended.
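Here is a minimal sketch of the frame processing just described, assuming the video has already been cropped so the LED fills the frame. The function names and the threshold value are illustrative, not from the paper; the idea is to average each row into one value per channel, classify rows as signing or idle from the blue channel, and keep only series whose boundary frames contain a switch.

```python
import numpy as np

def row_series(frame: np.ndarray) -> np.ndarray:
    """Average each row of an RGB frame (H x W x 3) to a single value
    per channel, yielding an (H, 3) array ordered by capture time,
    which is what the rolling shutter gives us."""
    return frame.astype(np.float64).mean(axis=1)

def sign_mask(frame: np.ndarray, blue_threshold: float = 40.0) -> np.ndarray:
    """Classify each row as 'signing' (LED off -> dark blue channel) or
    'idle' (LED on -> bright blue channel). The threshold is an
    illustrative value; in practice it would be tuned per setup."""
    blue = row_series(frame)[:, 2]       # blue channel, one value per row
    return blue < blue_threshold         # True where the card is signing

def has_transition(frame: np.ndarray) -> bool:
    """True if the LED switched state inside this frame, i.e. the frame
    contains the start or the end of a signing operation. Series whose
    first or last frame lacks such a switch are filtered out."""
    mask = sign_mask(frame)
    return bool(mask.any() and (~mask).any())  # both row types present
```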
And these are the only series that we kept. Now, first of all, we calculated T1, which in our case is the execution time of all of the fully black frames that you can see here. This is done just by counting the number of fully black frames and multiplying it by a constant, the frame period. That gives you T1. T2 is the execution time within the first frame, the one in which, as you can see here, the color changed from blue to black. We calculated it as follows: we counted the number of rows associated with the sign — you can see them here, they are black — and divided it by the total number of rows in the frame. This gives us the relative position of the switch within the frame. We multiplied it by the scanning time and added the transition time.

Now, so far Ofek discussed the scanning time and the transition time, but he did not mention how you can determine them. So let's discuss how we can find S, which stands for the scanning time, and T, which stands for the transition time. Remember the experiment that we did at the beginning with the flickering LED of the Arduino? You can repeat that exact experiment, take just one of the frames from the video, count the number of changes between black and red in the frame, and multiply it by the time that the flicker was on or off — in this specific case, the on and off periods were equal. Doing so gives you the scanning time of the video camera. And the beauty of it is that once you get the scanning time out of this experiment, you get the transition time for free, by deducting the scanning time from the frame period.

Now, with that in mind, and returning to T2, you have S and T and everything needed to calculate T2. T3 is calculated in the exact same way, but in this specific case we do not add the transition time, because the ECDSA sign ended in the middle of the frame, so the transition time happened after the sign; there is no need to add it to T3. And the signing time is the sum of T1, T2, and T3. Now, the beauty of it is that if you do this for enough signatures — about 4,000 signatures — this is the result that we got: we were able to recover the ECDSA key, using the Minerva script that the authors published, and again using an internet-connected video camera directed at the power LED of a smart card reader. Thank you.

Moreover, there are many additional smart card readers available to purchase on Amazon that are vulnerable to this attack. The distance between them and the video camera may vary based on the intensity of their power LED, but we were able to recover the same ECDSA key from the smart card by inserting it into five additional smart card readers, all of them available online.
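Putting the ECDSA pieces together, here is a sketch of both the calibration and the signing-time estimate, under the same assumptions (and with the same illustrative threshold) as the previous snippet. F denotes the frame period, 1/FPS, so the transition time is T = F - S.

```python
import numpy as np

FPS = 60
F = 1.0 / FPS     # frame period = scanning time S + transition time T

def sign_mask(frame: np.ndarray, blue_threshold: float = 40.0) -> np.ndarray:
    # Same per-row signing/idle classification as in the previous sketch.
    return frame.astype(np.float64).mean(axis=1)[:, 2] < blue_threshold

def scanning_time_from_flicker(flicker_frame: np.ndarray,
                               half_period_s: float = 250e-6) -> float:
    """Calibrate S from one frame of the Arduino experiment: count the
    black<->red stripe switches in the frame and multiply by the flicker
    half-period (the LED's on and off periods were equal)."""
    red = flicker_frame.astype(np.float64).mean(axis=1)[:, 0]
    stripes = (red > red.mean()).astype(np.int8)   # red rows vs. black rows
    return int(np.count_nonzero(np.diff(stripes))) * half_period_s

def signing_time(series: list[np.ndarray], S: float) -> float:
    """Estimate one ECDSA signing time from a kept series: the first and
    last frames contain the blue->black and black->blue switches, and any
    frames in between are fully black. Assumes the series has >= 2 frames."""
    T = F - S                       # transition time comes for free
    first, *middle, last = series
    rows = first.shape[0]

    t1 = len(middle) * F            # fully black frames: one period each
    # T2: fraction of the first frame scanned while signing, plus the
    # transition into the next frame.
    t2 = (np.count_nonzero(sign_mask(first)) / rows) * S + T
    # T3: fraction of the last frame scanned while signing; the sign
    # ended mid-frame, so no transition time is added here.
    t3 = (np.count_nonzero(sign_mask(last)) / rows) * S
    return t1 + t2 + t3
```

Feeding roughly 4,000 such (signature, estimated signing time) pairs into the Minerva tooling that the authors published is what completes the key recovery described above.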
Now let's discuss recovering a SIKE key from a device. In order to understand how to recover a SIKE key from a device, we need to briefly discuss the Hertzbleed attack. The Hertzbleed attack was introduced a year ago — its authors also presented it at Black Hat — and it introduced a new timing attack to recover a complete SIKE key from a server. Hertzbleed, unlike the Minerva attack that I showed you before, is not the result of the implementation of the code. It's the result of the execution on the device: the code may be resilient to timing attacks, but the execution actually reveals something, in terms of execution time, about the data being processed by the device. This happens due to dynamic voltage and frequency scaling — maybe you are more familiar with the acronym DVFS — which yields different execution times based on the data being processed by the device. The researchers showed that for each bit index, they can craft a dedicated cryptogram that relies on the indexes that were already recovered (it's an adaptive attack, for those of you familiar with cryptography), and determine whether the value of the bit under attack is the same as the value recovered at the previous index or changed. This is all done based on a timing threshold. Now, the original attack was implemented using timing measurements obtained by querying the API of a server over the internet: they deducted the request time from the response time. And again, due to the noise added by the network latency, it took a few days to recover the keys.

Now, this was our attack, and this is its experimental setup. We tried to recover the secret SIKE key from a Samsung Galaxy S8. The Samsung Galaxy S8 is the smartphone that you can see on the right side — not the smartphone that you can see on top. The Samsung Galaxy S8 was the device that held the SIKE key. Interestingly, in this specific case we tried to recover the secret key from the device by obtaining video footage not of the Samsung Galaxy, but of the power LED of the speakers that were connected to the same USB hub that was used to charge the Samsung Galaxy. And, to make it completely outrageous, we did it with an iPhone: we used an iPhone to try to recover the key from the Samsung Galaxy S8. On the right side, you can see the video footage of the power LED of the speakers — you can see that it is green; it is taken from the speakers that you can see on the left side. Between the speakers and the smartphone you can see a lens, which was used to fill the entire frame with the view of the LED. I want to show you the video. You cannot see anything, right? It seems like a completely innocent green video. Moreover, I will argue that you cannot even see the difference between the idle periods and the SIKE operations performed by the Samsung Galaxy S8; they cannot be detected by the naked eye. But here is something very interesting. We executed eight consecutive iterations of SIKE operations, where each iteration consisted of 1,100 SIKE operations, on the Samsung Galaxy S8, while obtaining video footage of the power LED of the connected speakers. By averaging each frame into a single value, arranging the values in a time series, and zooming into the green channel, you can see that we can detect the beginning and the end of the eight iterations. This actually gives you the ability to calculate the iteration time based on the number of values between each two peaks that you can see here. But then again, another question that we need to answer is: how accurate is this calculation?
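Here is a sketch of that calculation, with illustrative peak-detection parameters; the minimum-over-eight-runs refinement used in the decision function is the one explained next.

```python
import numpy as np
from scipy.signal import find_peaks

FPS = 60  # video frame rate

def iteration_times(frames: np.ndarray) -> np.ndarray:
    """Durations of the SIKE iterations: average every frame (H x W x 3)
    to one value per channel, take the green channel, find the peaks
    that mark iteration boundaries, and convert peak spacing to seconds.
    The peak parameters are illustrative and would be tuned per setup."""
    green = frames.astype(np.float64).mean(axis=(1, 2))[:, 1]
    peaks, _ = find_peaks(green, distance=5, prominence=1.0)
    return np.diff(peaks) / FPS

def decide_bit_change(per_run_times: list[float], threshold: float) -> bool:
    """Hertzbleed-style decision for one key bit, using the minimum time
    across the eight repeated runs (the minimum is far less noisy than
    any single run). Whether 'below threshold' means 'same as the
    previous bit' or 'changed' depends on the crafted cryptogram, per
    the Hertzbleed procedure."""
    return min(per_run_times) < threshold
```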
So, on the left side, you can see the distribution of the execution time based only on the first iteration. We applied the attack using only the first iteration, and as you can see, it's very noisy: for the two cases — whether the bit has the same value as the previous index or a changed value — you cannot set a threshold that will let you accurately distinguish between them. However, when you use instead the distribution of the minimum execution time among these eight iterations — what you're seeing in the middle — you can actually set a threshold that allows you to distinguish between the two possible values of the key bit with very high accuracy: 99% accuracy. Now, 99% accuracy is good, but it's not perfect; it is not error-free. As a result, we had to use an error detection and correction mechanism throughout the key-recovery process in order to recover the complete key. And as you can see, this is actually what we did: we recovered the complete SIKE key from the device using video footage obtained from the connected speakers. Then again, we also integrated the error detection and correction mechanism that was suggested by the researchers who presented the Hertzbleed attack.

Okay, so let's briefly discuss the limitations of video-based cryptanalysis. The first limitation is the fact that the attack mostly targets weak IoT devices — I would say up to the level of a smartphone. I would be very surprised if somebody could take the exact same video cameras that we used and recover secret keys from the power LED of servers or laptops. Moreover, as we discussed earlier, due to the transition time between each two frames, during which the power LED isn't captured in any frame, the distribution of the sampling is only semi-uniform. It is not exactly uniform, which is another issue that we need to resolve when recovering the secret keys.

Now, let's discuss the takeaways. I think that by now you are convinced that power LEDs are much more informative than you initially imagined. In some cases, attackers can use the information modulated over the power LED to recover speech; in other cases, it can be used to recover cryptographic keys, as we showed in this talk. Moreover, the potential of the attack is actually much greater than what we showed in this work. We focused on what we consider ubiquitous cameras, not professional cameras: we used security cameras and smartphone video cameras to recover the secret keys. Attackers in reality can use professional video cameras that provide a higher rolling-shutter rate, better bit depth, and enhanced zoom capabilities, and by doing so they will probably be able to recover secret keys from a wider range of devices, probably from even greater distances. This is, I think, the most interesting insight or takeaway from this talk: we expect that more and more devices will be exposed to video-based cryptanalysis. This is actually the result of two interesting facts. First of all, video cameras' specifications are continuously improving each and every year, following Moore's law; you can see how the sensitivity of the sensors shown in this picture has improved significantly throughout the years.
Moreover, the number of IoT devices with weak CPU power — and I'm referring to sensors, smart cards, robotic vacuum cleaners, smartphones, and even video streamers — is continuously increasing each and every day, and their deployment is expected to grow throughout the next years as well. As a result, we will have much-improved video cameras over the years, collocated with a greater number of weak IoT devices, and due to this fact we believe that more and more devices will be exposed to video-based cryptanalysis each and every year.

This is the final takeaway, and it's mostly meant for perspective. It might be easier for this audience to compromise the target device with malware and exfiltrate the key over the internet than to apply the attack that I've just described. However, bear in mind that video-based cryptanalysis is intended to extract keys from non-compromised devices, and the beauty of it is that it uses popular equipment, not what we would consider specialized equipment like the equipment I discussed at the beginning.

Now, a few other things that I would like to mention. This research was recently awarded the Pwnie Award for the best cryptographic attack of 2023. You can find additional details if you look for video-based cryptanalysis online. And with that in mind, thank you very much for attending this talk. I will be happy to take any questions from the audience. Thank you.