Alright, thanks everyone for your patience while we worked through the technical delays. Now I'd love to introduce Jesus and David on behavioral biometrics and context analytics; if we could all give them a huge round of applause. Hello everyone, thank you for joining us despite the hangover. Today we are presenting how we reimagined the risk-based authentication system. Our work analyzes information from the device and also from the user: from the device, web-based fingerprinting, and from the user, behavioral biometric patterns. My name is Jesus Solano, lead data scientist, and this is David Camacho, lead data architect. We both work at Cyxtera Technologies on the research team. To start, let me ask you something. Aren't you bored of having to remember long and complex passwords? Yes, you are. I am too. How many of you have been in this situation? I think most of you. Okay, you. We have been using passwords for the last 60 years, but they are not secure anymore, and to be honest, they have not been secure for a long time. The question is why? There are two main reasons. First, passwords don't validate user identity. Second, passwords were originally meant to be difficult for other humans to guess. But today we are not dealing with humans guessing passwords; we are dealing with computers guessing passwords, so they are not secure anymore. For instance, imagine that you are setting up the password for your Yahoo email account. Even if you choose this long password, you are not safe. Why? Because it is very easy for a computer to guess. And especially if you are like these kinds of people, you are not safe enough on the web. Now, how are online services protecting us on the web? How are they protecting our identity? Do you know? Well, that's an important question indeed. Think about your cell phone, your email accounts, your corporate accounts. How are they protecting our data and our identity?
What are they doing, and what do you have to do to protect them? Definitely many of you, if not all of you, have been in this situation, looking at this kind of thing and trying to match all the requirements, and you end up with a long password that you cannot remember, or you just put it into some password manager. That's hard, and it's difficult to remember, right? To reinforce password protection, the model we use right now, the system asks us to set up some security questions or challenge questions. But these questions have another problem: they also demand a lot of our memory, and we have two options. If we want to remember them, we just put the truth, literally the truth of what they're asking us. But that information may be publicly available, and it's easy to guess. If not, we create some random, made-up answer, but then it's hard to remember. So again, you throw it into LastPass or any password manager that is able to store these answers. Finally, they enforce security using systems like CAPTCHA. But CAPTCHA can only differentiate between humans and computers, or bots, and in this picture it is a robot arm that is able to pass an anti-robot test. So it's definitely not secure enough. We actually need something that protects our identity and focuses on something beyond secrecy. In recent years, the cybersecurity industry came up with three main approaches. The first one is passwordless login. The idea with passwordless login is to replace your password with something that is dynamically created, like an OTP, but it's attached to your device, and your device can also be stolen. The other way to do this is to use biometric information, like face recognition or fingerprint recognition. However, these systems can be easily fooled.
We saw in a previous presentation that you could just add some noise to an image and the system would be fooled. There is also research showing that you can just 3D-print a mask of your face and it will fool most face recognition systems. Another important approach is continuous authentication. Continuous authentication is not meant for the login itself, I mean, not for the moment you type your password; it's meant for the use of the system. It tries to understand how you interact with the system and stop anyone who is not you from doing any possible harm. However, this approach needs a long time window, and that opens a huge window for the attacker to be successful. And finally, there is connection behavior. What connection behavior does is try to understand how you connect to the system in terms of connection, timing, location, or even your device. However, there are two particular problems here. The first one is that if you are very stable in your behavior, like you always connect from your office with the same IP, the same machine, during working hours, then the moment you are traveling or connecting from home, you will get a false positive, and the system will annoy you with a second-factor authentication or something. And if you travel a lot, anything will look normal for you, so you'll always be exposed and you will never get an alert, right? So the main goal of our project, of our proposal here, is to successfully validate the user's identity: avoid relying only on secrecy, on passwords that are not revealed, and start checking whether people are who they claim to be. Since the early 2000s, there have been two main approaches to this problem: first, context-based authentication, and second, behavioral biometrics authentication. In the first one, we can create a profile of the user from their connection patterns.
That means that we record the browser, the operating system, and also the time of connection, and we can recreate a pattern of the user from these connections. With this pattern, you want to differentiate the user from other users, and you can help the user enhance their security. On the other hand, we have behavioral biometrics. Here, you want to learn how the user moves the mouse and how the user types. However, to create a good profile of the user from these unique patterns, you need very long time frames to analyze. That means you cannot analyze the user at login time. You can only analyze the user if you get more than 30 seconds of interaction, but a common login takes less than 25 seconds. This usually works very well for continuous authentication but not for static authentication. Moreover, when we deal with this information, we are dealing with sensitive information: if you are recording keystrokes, then you are recording the password or your credit card number, which is not good. So you want to anonymize the data. But when you anonymize the data, the most typical thing that happens is that your model loses predictive information, so you have to make a trade-off between privacy and model performance. However, all of this is on the theoretical side. In real life, the data is complex and behavioral patterns are very complex. How complex? Well, as many of us may know, in real life things don't often happen the way they were supposed to, right? We make a lot of assumptions, and they are just not fulfilled in real life. To create a robust model, something usable in real life by real users, we need real data that brings us closer to something actually usable. For that, we collected a dataset containing 320 hours of human-computer interaction.
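One common way to ease the privacy problem described above, shown here as an illustration and not as the speakers' own feature set, is to keep only keystroke timings (how long each key is held, and the gap between keys) and never store which keys were pressed. The `KeyEvent` and `timing_features` names are hypothetical. Dropping key identities is exactly the kind of trade-off the speakers mention: the password never leaves the capture step, but per-key-pair information is lost.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class KeyEvent:
    key: str        # available at capture time, deliberately never copied into the features
    down_ms: float  # keydown timestamp in milliseconds
    up_ms: float    # keyup timestamp in milliseconds


def timing_features(events: list[KeyEvent]) -> dict[str, float]:
    """Summarise typing rhythm using timings only, discarding which keys were pressed."""
    if len(events) < 2:
        raise ValueError("need at least two keystrokes")
    ordered = sorted(events, key=lambda e: e.down_ms)
    # Dwell time: how long each key is held down.
    dwell = [e.up_ms - e.down_ms for e in ordered]
    # Flight time: gap between releasing one key and pressing the next
    # (negative when keys overlap, which is itself a typing habit).
    flight = [b.down_ms - a.up_ms for a, b in zip(ordered, ordered[1:])]
    span_ms = max(e.up_ms for e in ordered) - ordered[0].down_ms
    return {
        "dwell_mean": mean(dwell),
        "dwell_std": pstdev(dwell),
        "flight_mean": mean(flight),
        "flight_std": pstdev(flight),
        "keys_per_second": 1000.0 * len(ordered) / span_ms,
    }


# Hypothetical login: the typed characters never appear in the output.
login = [KeyEvent("p", 0.0, 100.0), KeyEvent("a", 150.0, 230.0), KeyEvent("s", 300.0, 390.0)]
features = timing_features(login)
print(features["dwell_mean"], features["flight_mean"])  # 90.0 60.0
```

Mouse movements could be summarised the same way (speeds, pauses, curvature) without recording what was clicked.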
The dataset was collected in a gamified environment, where real people used real computers to perform day-to-day tasks, such as writing emails, browsing the Internet, or creating documents. For the context data, or device identification, we collected data from more than 2 million real users, and this information was collected using our company's product. So we know that it's real data from out there. But to test our hypothesis about how each individual approach works for static authentication, we had to test them first. So we created some features out of this data and tried a random forest algorithm. Our results show that the AUC of the random forest for behavioral biometrics in static authentication, or at login time if you want, is 0.79. What does it mean? It means that almost 80% of the time, we can differentiate between an attack and a real user trying to log in. However, failing 20% of the time in the real world means a lot. On the left, you can see a shaded area in the graph. This shaded area shows the best possible range of thresholds you could select for this algorithm. As you can see, the crossing point, which would be the best one you can choose, shows 80% accuracy and 80% recall. What does it mean? It means that if you're 80% accurate, you can tell eight out of ten times when an attack is occurring. However, two out of ten times, you will let an attacker pass. That's a lot, and it could mean a lot of losses and a lot of problems. So the conclusion with this algorithm is that behavioral biometrics by itself may be useful, maybe even enough, but you have to select your threshold very, very carefully, and it depends on your data. In the best case, you have an accuracy of 0.72, or 72%, but if you move the threshold just a little, you will see an important decrease in your accuracy and also in your recall.
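The tension described here, a single AUC that summarises every possible threshold versus the precision and recall you actually get at the one threshold you deploy, can be sketched without any ML library. The scores below are made-up numbers, not the talk's data; label 1 marks an attack, and reading the talk's "accuracy" at the crossing point as precision is an assumption. The function names are hypothetical.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random attack (label 1) scores above a
    random genuine login (label 0); ties count as half."""
    attacks = [s for s, y in zip(scores, labels) if y == 1]
    genuine = [s for s, y in zip(scores, labels) if y == 0]
    if not attacks or not genuine:
        raise ValueError("need both attacks and genuine logins")
    wins = sum(1.0 if a > g else 0.5 if a == g else 0.0 for a in attacks for g in genuine)
    return wins / (len(attacks) * len(genuine))


def precision_recall_at(labels: list[int], scores: list[float], threshold: float) -> tuple[float, float]:
    """Flag every score >= threshold as an attack and measure the outcome."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def crossing_threshold(labels: list[int], scores: list[float]) -> float:
    """Candidate threshold at which precision and recall are closest."""
    def gap(t: float) -> float:
        p, r = precision_recall_at(labels, scores, t)
        return abs(p - r)
    return min(sorted(set(scores)), key=gap)


labels = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.3]
print(roc_auc(labels, scores))
print(crossing_threshold(labels, scores))  # 0.4
```

Shifting the threshold from 0.4 to 0.35 or 0.7 in this toy example already moves precision or recall by a sixth or more, which is the sensitivity the speakers warn about.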
So every time you move your threshold just a little, you will be missing more and more attacks. Now, testing the same random forest algorithm on context-based authentication, we see that it has better performance in general. We have an AUC value of 0.82, close to 80% again, just a little higher; it is also sensitive to the threshold, but it's more stable. This means that, in general terms, people have a normal, stable behavior that can be predicted, but if you look at the plot on the left, you will see that the threshold is really low. That means you are probably missing a lot of attacks, or you will confuse a lot of different machines. Also, the known weakness of this approach is that the context can be easily mimicked or replicated. So these kinds of algorithms only work as a decision criterion; they are not definitive, and they don't really protect our identity. As you can see here, it's more stable, but the results are not definitive: we can catch 75% of the attacks, but 30% of the attacks just bypass it. So, to finally understand the problem we're trying to solve here, we need to answer this question: who are we defending against? Who are we protecting our users from? Let us introduce three types of attacks: the simple attack, the context attack, and the physical attack. For the simple attack, imagine a hacker anywhere in the world who is trying to log in to your email account. In the context attack, imagine a more sophisticated hacker who is trying to log in to your email account while replicating your device. And finally, in the physical attack, imagine you are tired and you want to get some coffee; you get up and go for the coffee, and in that time a colleague takes your machine and exfiltrates some important data.
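Why a context-only model struggles against the context attack can be seen with a deliberately simple frequency-based profile. This is an illustrative stand-in, not the random forest the speakers trained: each login is scored by how unfamiliar its browser, OS, IP prefix, and hour bucket are relative to the user's history. The attribute set and class name are hypothetical.

```python
from collections import Counter

# Hypothetical attribute set; the talk mentions browser, OS, time, location and device.
ATTRIBUTES = ("browser", "os", "ip_prefix", "hour_bucket")


class ContextProfile:
    """Per-user counts of how often each context attribute value has been seen."""

    def __init__(self) -> None:
        self.counts = {a: Counter() for a in ATTRIBUTES}
        self.logins = 0

    def update(self, login: dict[str, str]) -> None:
        for a in ATTRIBUTES:
            self.counts[a][login[a]] += 1
        self.logins += 1

    def risk(self, login: dict[str, str]) -> float:
        """0.0 when every value is the user's usual one, 1.0 when all are unseen."""
        if self.logins == 0:
            return 1.0  # no history yet: treat everything as unknown
        familiarity = [self.counts[a][login[a]] / self.logins for a in ATTRIBUTES]
        return 1.0 - sum(familiarity) / len(ATTRIBUTES)


usual = {"browser": "Firefox", "os": "Linux", "ip_prefix": "10.0.0", "hour_bucket": "09-12"}
profile = ContextProfile()
for _ in range(10):
    profile.update(usual)

simple_attack = {"browser": "Chrome", "os": "Windows", "ip_prefix": "203.0.113", "hour_bucket": "00-03"}
context_attack = dict(usual)  # attacker replicates the victim's device and connection
print(profile.risk(simple_attack), profile.risk(context_attack))  # 1.0 0.0
```

The simple attack scores the maximum risk, while the replicated context scores zero; a physical attack from the victim's own machine would also score zero. The same profile shows the "stable user" problem from earlier: this user logging in from home at night would look as risky as an attacker.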
Now, let us show how the previous models behave against these three types of attacks. First, the context-based model. It performs very well against the simple attack because the attacker is on a different machine, so we can recognize the hacker. But in the context and physical attacks, we cannot recognize the hacker, because the connection looks just like yours, so the model misses the attack. On the other hand, behavioral biometrics only performs well in the simple and context attacks, but it doesn't perform very well in physical attacks because the person is different. So today we are proposing a new model that combines information from those models, creating a combined approach that performs very well against all three types of attacks. The question is how? How do we combine these models? Well, take the information from the behavioral biometrics and the information from the web-based fingerprinting, and train a machine learning algorithm for each one, in this case a random forest. Then you have an output for the behavioral biometrics and an output for the web-based fingerprinting. If you combine these outputs, you can create an enhanced security model. How is this combination done? We performed a sensitivity analysis and then combined the outputs of these machine learning algorithms in a parametric linear combination. Our results are very good compared to the single models. Here you can see the AUC, and it is almost 21% better than the single models. You can also see that the receiver operating characteristic curve, or ROC curve, is smoother compared to the single models. Here the dashed line is the coin flip, which means you are just guessing; the closer you get to the point (0, 1), the better the scenario. As you can see, the green line is the best of our three models, and that is the combined model. To be more specific, let me show some statistics about our model.
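The combination step can be sketched as follows, assuming the parametric linear combination has the form w·p_context + (1 − w)·p_behavior and that the sensitivity analysis means sweeping w over a grid and scoring each value by AUC. The exact parametrization and selection criterion the speakers used are not stated, and the probabilities below are made-up outputs of the two per-source models, chosen so that each single model misses one ranking the other gets right.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random attack (label 1) outscores a random genuine login."""
    attacks = [s for s, y in zip(scores, labels) if y == 1]
    genuine = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if a > g else 0.5 if a == g else 0.0 for a in attacks for g in genuine)
    return wins / (len(attacks) * len(genuine))


def combined_risk(p_behavior: float, p_context: float, w: float) -> float:
    """Linear combination of the two models' attack probabilities, weight w on context."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * p_context + (1.0 - w) * p_behavior


def sweep_weights(labels, p_behavior, p_context, steps=10):
    """Sensitivity analysis: AUC of the combined score for each weight on a grid."""
    results = []
    for i in range(steps + 1):
        w = i / steps
        scores = [combined_risk(b, c, w) for b, c in zip(p_behavior, p_context)]
        results.append((w, roc_auc(labels, scores)))
    return results


# Made-up per-model outputs (label 1 = attack).
labels = [0, 0, 0, 1, 1, 1]
p_behavior = [0.2, 0.5, 0.3, 0.9, 0.4, 0.6]
p_context = [0.1, 0.2, 0.6, 0.8, 0.7, 0.35]
results = sweep_weights(labels, p_behavior, p_context)
best_w, best_auc = max(results, key=lambda r: r[1])
print(results[0], results[-1], (best_w, best_auc))
```

Here w = 0 (behavior only) and w = 1 (context only) each reach an AUC of 8/9, while intermediate weights separate every attack from every genuine login. In practice the weight would be chosen on held-out data rather than the data used to score it.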
First, precision is higher, and for the real user that means we are reducing friction, because we are reducing false positives. If you reduce false positives, it is easier for a legitimate user to log in to the website. On the other hand, we are reducing false negatives. That means we are increasing security, because it becomes very difficult for a hacker to access the system. Finally, if you take these two metrics together with accuracy, we are creating a more robust model: more robust, while also reducing friction for the user. In conclusion, our model outperforms the single models by combining the unique patterns from behavioral biometrics with the information from context-based authentication. In this way, we reduce the friction a final user sees when trying to access the system, and we increase the security of the legitimate user. As you may notice, we are only testing static authentication, that is, windows of less than 30 seconds. However, if you think of these 30 seconds as a time window and aggregate consecutive windows, we can straightforwardly extend this approach to the continuous authentication problem. So this model works well not only for static authentication but also for continuous authentication. Well, to close this talk, I want to leave you with three things to take home. The first one is that the security landscape has changed and evolved over many years, and the focus of security nowadays should be on who you are, on identity, not just on secrecy, because we've been working with an old model that is easy to fool. Second, behaviors are little details that tell a lot about us.
Focusing on these little details will increase our security a lot, because, first of all, it's hard to replicate those little details, and second, because they tell who the real person is and who is not. And finally, I hope we could convince you that security is not the complete opposite of user experience. In this case, we're not just increasing login security but also reducing the annoyance of having to perform a second-factor authentication, of having to remember long and complex passwords, or of trying to pass a simple challenge that even a machine can pass. So that's everything we have today. We hope you liked it, and we're ready to take some questions. Thank you.
Actually, here the attacks are artificially generated. That means that an action from one user serves as an attack against another user, because all users perform the same actions in the dataset. So you can treat this user as an attacker of that user.
The short answer here is yes, we use it. However, the main difference here is that behavioral biometrics on its own is not enough for static authentication, I mean for login time, less than 15 seconds, while you're typing your password and so on. But if you use behavioral biometrics with a slightly different feature extraction and combine it with the device identification and connection behavior, you will get better accuracy, because what makes our approach better in the results is the combination rather than any single approach.
As with most machine learning algorithms, an output saying that something is an attack is just an actionable alert. That means that when an attack is detected, the system will take an action based on, maybe, the threshold or the alert level. So think about detecting a 70% probability of attack. That seems suspicious but not necessarily malicious, right?
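The tiered response in this answer, where a moderate risk triggers a second factor and a very high risk (the speakers go on to mention 99%) simply denies the login, can be sketched as a small policy function. The 0.70 and 0.99 cut-offs come from the speakers' examples; the `Action` enum and `decide` function are hypothetical, and real cut-offs would be tuned on data.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    STEP_UP = "second_factor"  # ask the user for a second authentication factor
    DENY = "deny"


def decide(risk: float, step_up_at: float = 0.70, deny_at: float = 0.99) -> Action:
    """Map a combined attack probability to an action; the cut-offs are illustrative."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be a probability")
    if step_up_at > deny_at:
        raise ValueError("step_up_at must not exceed deny_at")
    if risk >= deny_at:
        return Action.DENY
    if risk >= step_up_at:
        return Action.STEP_UP
    return Action.ALLOW


for r in (0.20, 0.70, 0.995):
    print(r, decide(r).value)
```

A better-separating model means fewer legitimate users land in the step-up band, which is the friction reduction the speakers describe.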
So you may want to annoy the user with a second-factor authentication, but that happens a lot less often than usual, or than with the individual approaches, and if you have something like a 99% probability of attack, you could just deny the action.
Okay, let me repeat the question. Basically, the question is whether, if we replace passwords with this approach, hackers will have another attack vector. Is that right? Okay, that's an excellent question. When you log in, one important thing is that there is some noise: every time you type your password or move your mouse, you won't do it exactly the same way. However, you have habits, right? We're trying to detect those habits, and if someone tries to imitate them, it will be very, very hard, because the time is so short and the differences can be so noticeable. And of course, any system that uses a machine learning algorithm will need to be retrained, because a model won't last forever, right? At the end of the day, one of the cool things that maybe we didn't show, but that is important here, is that we could use small interactions to start identifying people, and as attackers and actual people try to log in to the system, we could keep training it with more and more data, which enables better recognition and profiling of the users. So in the end, we don't necessarily need to replace passwords per se; we could just use this as a better identity double-check. Maybe that's part of our approach.
We don't create one model; we create two models, one for each source, and the features for each model are different: one uses context-based features and the other uses behavioral biometrics features. Each model gives one machine learning output, and then we create a risk-based model, which means we sum up the outputs into a final output. This output is probabilistic, as they say, and it gives us a level of risk that this connection is legitimate or maybe an attack. We did a sensitivity analysis and found that the model contributing the most in this case is context-based authentication, because we have more data for it, but also because connection patterns are easier to capture than behavioral biometrics.
Well, basically your question, if I get it right, is what happens when a hacker can bypass each individual model and then combine them, like a conjoined attack. Okay, so first of all, we showed earlier that there are three categories of attack; there is a fourth category that we didn't show because it's part of our future work. And here's the thing: for a hacker, bypassing even one single model individually requires a lot of work, in terms of investigating information to replicate your machine or actually having physical access to your machine. The other part is what hackers would have to do to replicate a valid behavior of yours, because, as I said before, you don't produce exactly the same values each time you log in, but you will have close enough values, and that difference is what I'm trying to measure, right?
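The "close enough values" idea can be illustrated with one classic detector from the keystroke-dynamics literature, the scaled Manhattan distance. This is offered only as an example of the concept; the speakers did not say they use it. Each feature's deviation from the user's enrolment mean is divided by how much that user normally varies on it, so a habitual wobble costs little while an unfamiliar rhythm costs a lot. The feature layout and values are hypothetical, and an accept/reject cut-off on the distance would still need tuning.

```python
from statistics import mean


def build_template(samples: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-feature mean and mean absolute deviation from a user's enrolment samples."""
    if len(samples) < 2:
        raise ValueError("need at least two enrolment samples")
    columns = list(zip(*samples))
    means = [mean(col) for col in columns]
    # Fall back to a tiny spread if a feature never varied, to avoid dividing by zero.
    spreads = [mean(abs(x - m) for x in col) or 1e-9 for col, m in zip(columns, means)]
    return means, spreads


def scaled_manhattan(template: tuple[list[float], list[float]], sample: list[float]) -> float:
    """Sum of per-feature deviations, each scaled by this user's usual variation."""
    means, spreads = template
    return sum(abs(x - m) / s for x, m, s in zip(sample, means, spreads))


# Hypothetical features per login attempt: [mean dwell ms, mean flight ms].
enrolment = [[90.0, 60.0], [100.0, 70.0], [95.0, 65.0]]
template = build_template(enrolment)
genuine = scaled_manhattan(template, [96.0, 64.0])
impostor = scaled_manhattan(template, [130.0, 20.0])
print(round(genuine, 2), round(impostor, 2))
```

The genuine attempt is not identical to any enrolment sample, yet its distance stays small, while the impostor's rhythm lands far outside the user's normal variation.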
Now, someone who takes enough time, or sets out to do both, is definitely mounting a targeted attack, which is less probable and kind of out of scope right now, because that's part of our future work. So the short answer here would be: yes, it's definitely possible that if someone manages to bypass both models individually, a conjoined attack could bypass our current work fairly easily, but we haven't created that kind of scenario yet.
I think that's all the time we have. Thank you for joining us.