Global internet penetration has increased tenfold in the last 18 years. At the same time, as mentioned, the landscape of attacks has continued to grow, and it was estimated that in the last year they cost about $600 billion. So this is a very important and yet hard problem to solve, because the amount of data that you can get from the network is enormous. We always call it finding a needle in a haystack. And since network behavior is highly dynamic, and attackers keep coming up with new forms of attacks, it's like finding a needle in a haystack that can change its color and size.

One common way to do anomaly detection is to set a load threshold: if your load goes above a certain threshold, it's probably an attack. But is this a satisfactory solution? It's very hard to set all these thresholds, and high load doesn't necessarily imply malicious traffic. So while this problem is very challenging, and having so much data is itself problematic, that data can actually be turned into an asset if you convert it and use it as input to a machine learning algorithm, which can hopefully tell you a lot more about what is going on in your network.

So in this NUS-Singtel cybersecurity lab, we have been looking at the large amount of data that Singtel has collected and trying to use it to solve our problem. We have been designing different techniques so that we can automate an organization's capability to detect attacks.

Here is a very simple flow chart of the usual form of machine learning, which is called supervised learning. You assume that you have a large amount of ground truth, or labeled, data: you know that these flows are attack traffic, and you feed them into your machine learning algorithm to learn from. The issue is that you normally don't have that much labeled data. What you can do instead is clustering.
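As a minimal sketch of the clustering idea, here is a tiny hand-rolled k-means over two made-up flow features (hour of day and mean packet size). The features, data, and cluster count are illustrative assumptions, not taken from the system described in the talk:

```python
# Minimal k-means sketch for grouping network flows by simple features.
# The features (hour of day, mean packet size) and the toy data are made
# up for illustration; a real system would use many more features.
import random

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Move each center to the mean of its assigned points.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = tuple(sum(col) / len(cl) for col in zip(*cl))
    return centers, clusters

# Toy flows: (hour of day, mean packet size in bytes).
daytime_web = [(h, 900 + d) for h in (9, 10, 11) for d in (-50, 0, 50)]
night_scans = [(h, 60 + d) for h in (2, 3, 4) for d in (-10, 0, 10)]
centers, clusters = kmeans(daytime_web + night_scans, k=2)
```

On this toy data the two clusters separate cleanly; the point of the talk is precisely that real traffic often does not split this neatly.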
Basically, you look at features such as the day of the week, the time of day, and the addresses, and try to figure out how to cluster them. Hopefully, you get nicely separated data like this, but it doesn't happen all the time; quite often you get data that is very hard to cluster. So now the question becomes: can you design an algorithm that can detect unknown or even new attacks by processing a large amount of data in real time?

One of the approaches we have been looking at is the variational autoencoder, or VAE for short. The VAE is an unsupervised deep learning technique that doesn't require ground truth data for training. What it does is try to learn the distribution of the dominant, normal data, and based on that it can get a sense of what is normal.

To make the discussion more concrete, here are some of the features that you can collect from the network: source and destination network addresses, traffic or application types, packet sizes, and timing. You can convert these into many possible variations, look at them in different ways, and come up with many features.

So how can the VAE help you? First, the VAE estimates the distribution of the normal traffic. Based on that, you can derive a distance, and that distance helps you decide whether a certain set of data is normal or not, instead of you setting just a load threshold. Also, due to processing limitations, you can usually select only a small subset of the features to feed into your machine learning algorithm. What the VAE can do is that, instead of making the user select the right set of features, you can feed a much larger number of features into the machine learning algorithm, and it should tell you which features are the right ones to use.

So what role can such a scheme play in a wider context? One thing we have looked at is whether it can generate fingerprints.
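The distribution-plus-distance idea above can be sketched without a deep learning framework. In this sketch, a simple per-feature Gaussian stands in for the distribution a VAE would learn, the "distance" is the negative log-likelihood under that model, and the detection threshold is calibrated on normal data rather than hand-picked as a load limit. All feature names and numbers here are made-up assumptions:

```python
# Sketch of distribution-based anomaly scoring. A real system would learn
# the distribution with a VAE; here an independent per-feature Gaussian
# stands in for the learned model so the example stays self-contained.
import math

def fit_gaussian(rows):
    """Fit an independent Gaussian per feature to 'normal' training rows."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    variances = [max(sum((x - m) ** 2 for x in col) / n, 1e-6)
                 for col, m in zip(zip(*rows), means)]
    return means, variances

def score(row, means, variances):
    """Negative log-likelihood: the 'distance' from normal behavior."""
    return sum(0.5 * math.log(2 * math.pi * v) + (x - m) ** 2 / (2 * v)
               for x, m, v in zip(row, means, variances))

# Toy normal traffic: (packets per second, mean packet size in bytes).
normal = [(50 + i % 5, 800 + 10 * (i % 7)) for i in range(100)]
means, variances = fit_gaussian(normal)

# Calibrate the threshold from normal data instead of hand-picking a load limit.
threshold = max(score(r, means, variances) for r in normal) * 1.1

probe = (55, 64)  # tiny packets at a normal rate, e.g. a scan
is_attack = score(probe, means, variances) > threshold
```

Note that the probe's packet *rate* is normal; a plain load threshold would miss it, while the distance flags it because the packet-size feature is far outside the learned distribution.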
That means, by looking at the data, can the system give you a fingerprint of known data, data that you have seen before? Once you have this, detection and resolution become much easier. Another thing we would like to do is this: since we have fingerprints, and if those fingerprints are stable, is it possible to find new attacks by identifying new fingerprints? Finding new attacks is always very difficult, so if the system can be used to find new fingerprints, that is a capability that is really missing from existing techniques.

In conclusion, what I have presented is a way to incorporate unsupervised deep learning techniques into an overall framework for network intrusion detection. And of course, these techniques will have to be used in conjunction with many other techniques, as well as human input. Thank you.