We will now look at a real example of a flow-based intrusion detection system that has been developed at the University of Twente: SSHCure. SSHCure is a flow-based intrusion detection system dedicated to detecting hosts compromised by SSH dictionary attacks.

Let's first have a look at why we need flow-based intrusion detection. Several factors can limit the processing speed of an IDS, especially as line speed and link load increase. First, the network card of the IDS host limits the number of packets that can be captured and processed. Second, the detection engine is often implemented in software. And third, the detection engine needs a certain processing time to identify an attack, which depends on the complexity of the detection algorithms. In addition, we should consider that line speeds are still increasing. For example, the link connecting the University of Twente and the Dutch research and education network SURFnet runs at 40 gigabits per second. The Energy Sciences Network, ESnet, in the US has recently upgraded to 100 gigabit per second links.

The question is then, what are the alternatives to DPI? One way to cope with increasing line speed and network load is to work with aggregated data, such as network flows. We have already seen, when discussing the taxonomy, that network flows provide a few desirable characteristics. For example, since no payload is available, they are more privacy friendly than DPI. Also, algorithms using flows are not affected by encryption. Scalability is another advantage of flows. Finally, flows are already widely deployed in backbone networks for monitoring and accounting. The disadvantage is clearly the lack of payload, meaning that any algorithm designed to work on flows needs to cope with this. We will now see an example of flow-based intrusion detection. Let's introduce SSHCure.
SSHCure is an intrusion detection system developed at the University of Twente that targets SSH dictionary attacks. It might seem unlikely that SSH attacks are still successful nowadays, but we have found that this is in fact the case. A study conducted by the Ponemon Institute reported that in 2014 more than 50% of the surveyed organizations had been involved in a security incident involving SSH. Sadly, there are still quite a lot of vulnerable hosts out there. An SSH dictionary attack aims at gaining unlawful access to hosts that run an SSH daemon. What allows an attacker to gain access is that the host typically has an account with weak credentials. Possible credentials can be found in so-called dictionaries, pre-compiled lists of usernames and passwords. SSHCure performs detection in a completely flow-based fashion. Moreover, SSHCure focuses on compromise detection, meaning that the goal is not only to identify an attack, but to identify and report on a successful attack, which we call a compromise. Compromise detection is high on the wish list of security operators because it lowers the number of alerts that need to be checked manually.

Let's then have a look at what SSH looks like when we have only flow data available for our analysis. We can observe two aspects. First, if we consider the source and destination IP addresses, and in particular the flows from an attacker to the targets, a clear pattern emerges. In this picture, you can observe the evolution in time of an SSH dictionary attack against the University of Twente network. On the y-axis, we have a host identifier. On the x-axis, we have time. The picture shows that this attack progresses in three phases. First, the attacker scans the target network looking for SSH daemons. Second, the attacker intensively contacts the hosts in a subset of the target network address space. This is when the attacker brute-forces the SSH daemon using a dictionary file.
It does so efficiently, only for hosts that have been previously identified. Finally, we observe some residual traffic, either to the attacker or to a small subset of the hosts in the network.

If we then consider how many packets and bytes were exchanged, we obtain this picture. On the x-axis, we have time again. On the y-axis, we have instead the number of packets per flow exchanged during the attack. In this example, this number varies between 1 packet per flow and roughly 16 packets per flow. However, what is important is that we have a confirmation that the attack progresses through three main phases. Keep in mind that although the attack pattern remains the same, attacks can show some variability, for example in the number of connections or in the packets-per-flow values.

Let's now see how SSHCure tackles the detection of this attack. SSHCure models the behavior of an SSH dictionary attack in three states. A scan state, during which the attacker identifies possible victims. A brute-force state, in which the attacker tests the dictionary against possible victims. And finally, a compromise state, which implies that at least one of the username and password combinations in the dictionary was successful. An SSH dictionary attack can start either in the scan state or directly in the brute-force state. The latter happens when the scan has been performed at an earlier time or by a different host. From the brute-force state, the attack will either end directly, if it was unsuccessful, or progress to the compromise state and finally end.

We have seen that for this SSH dictionary attack we are able to identify a clear pattern at the flow level, that is, using only flow information about the time evolution of the attack. SSHCure uses this pattern as a way to identify the attack state. However, in practice, we have seen that only the scan and brute-force states can be identified quite easily. The compromise state presents a lot more challenges.
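To make the three-state model described above a bit more concrete, here is a minimal sketch in Python. It only encodes the transitions mentioned in the lecture (start in scan or brute-force, brute-force to compromise or end, compromise to end). The names `Phase`, `ALLOWED`, and `follows_model` are made up for this illustration and are not taken from SSHCure itself.

```python
from enum import Enum


class Phase(Enum):
    SCAN = "scan"
    BRUTE_FORCE = "brute-force"
    COMPROMISE = "compromise"
    END = "end"


# An attack may start in the scan state or directly in the brute-force state.
START_PHASES = {Phase.SCAN, Phase.BRUTE_FORCE}

# Transitions named in the lecture; anything else is treated as invalid here.
# (Assumption: a scan that is never followed by brute-forcing is not modeled.)
ALLOWED = {
    Phase.SCAN: {Phase.BRUTE_FORCE},
    Phase.BRUTE_FORCE: {Phase.COMPROMISE, Phase.END},
    Phase.COMPROMISE: {Phase.END},
    Phase.END: set(),
}


def follows_model(phases):
    """Return True if a sequence of phases respects the three-state model."""
    if not phases or phases[0] not in START_PHASES:
        return False
    # Check every consecutive pair of phases against the transition table.
    for current, nxt in zip(phases, phases[1:]):
        if nxt not in ALLOWED[current]:
            return False
    return True
```

For example, a sequence that goes scan, brute-force, compromise, end is accepted, while a sequence that jumps from scan straight to compromise is rejected, because a compromise is only reachable through the brute-force state.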
Traffic from the target host back to the attacker, for example, could be caused by tools like fail2ban, which locally reject connections from possibly offending hosts, while we mistakenly think that we are dealing with a compromised host. This is where domain knowledge comes in. To better characterize attacks, the developers of SSHCure have embedded into the detection algorithm a set of behaviors upon compromise that can be observed at the flow level when studying a set of attack tools. The picture above summarizes these findings. The analysis of attack tools shows that when a compromise takes place, the attack tool can either continue the dictionary attack against the compromised target or abort it. In addition, a tool can either keep the successful connection open or log out immediately.

When SSHCure analyzes data, it does so in five-minute intervals, each of which we call a data chunk. The tool behavior can either be observable within the same data chunk, like in the first and third columns of the picture, or span multiple chunks, like in the middle column. SSHCure actively monitors flow data related to hosts that are involved in a brute-force phase and searches for instances of the highlighted behaviors. If any of the six options in the picture is identified, the attack has resulted in a compromise. This has proven to be a fortunate design choice that keeps the false positive rate low while still being able to identify successful attacks.

There are a few takeaway messages that we can bring home after studying SSHCure. First of all, we should not underestimate an attack. Several examples in this course show that no matter how simple or old an attack is, as long as poorly managed hosts exist, the attack can still be powerful. This applies to SSH dictionary attacks as well. Secondly, flow-based approaches largely benefit from specific domain knowledge.
One can argue that this is a general rule in intrusion detection, but I would add that it is even more the case if you work with aggregated data. Lastly, the future of intrusion detection is in compromise detection, in the sense that operators and security specialists are ready to move toward a situation in which the number of alerts is kept small and the confidence that an attack was successful is high. If you are interested in more information about SSHCure, have a look at the SSHCure GitHub repository.
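As a closing illustration of the five-minute data chunks mentioned earlier, here is a small Python sketch of how flow records could be grouped into such chunks. This is only an assumption about how the chunking might look, not the actual SSHCure code. The flow record layout (a dictionary with a `start` timestamp in seconds, plus `src` and `dst`) and the function name `chunk_flows` are hypothetical.

```python
from collections import defaultdict

# Five-minute data chunk, expressed in seconds.
CHUNK_SECONDS = 5 * 60


def chunk_flows(flows):
    """Group flow records by the five-minute interval in which they start.

    Each flow is assumed to be a dict with an integer 'start' timestamp
    in seconds (a hypothetical layout chosen for this sketch).
    Returns a dict that maps the chunk start time to its list of flows.
    """
    chunks = defaultdict(list)
    for flow in flows:
        # Round the start time down to the beginning of its chunk.
        chunk_start = flow["start"] - flow["start"] % CHUNK_SECONDS
        chunks[chunk_start].append(flow)
    return dict(chunks)
```

With this grouping, a behavior that starts near the end of one chunk and continues in the next shows up in two different entries, which is why a detector working on chunks has to keep track of hosts in the brute-force phase across chunk boundaries.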