In this set of slides, we will look into network scans. We have seen in this week's introduction that the basic mechanism behind a network scan is getting the target to answer a probe packet. The type of answer, or the lack of an answer, can provide valuable information to the scanner. For example, based on the behavior of TCP, the scanner can try to infer whether a port is open, in other words whether a service is active on that host, or, vice versa, whether a port is closed and no service is running.

Let's now have a look at some different types of scans. Depending on the type of scan, different information can be gathered. The simplest type of information one might want to gather is which hosts are active. To do this, a scanner will progressively contact all hosts in a target set, which can be a target network or a list of IP addresses, and probe them. There are several ways to do this, in the sense that several protocols can be used for a host scan and tell you which hosts are alive. A very simple scan in this case is the so-called ping sweep, in which the scanner sends ICMP echo request packets to the targets and waits for a reply. The advantage is that such a scan is really simple. The disadvantage is that ICMP echo packets can be ignored by the target or filtered by a firewall, so such a scan might be unreliable.

A different type of information one might want to gather using a scan is which ports are open on a host. This information can be considered a good approximation of which services are active on that host. One can use either TCP or UDP for a port scan; in a nutshell, there are pros and cons to choosing either of these protocols. A TCP scan uses TCP responses, like SYN-ACK packets or RST (reset) packets, to infer whether a port is open or closed. It is considered a quite reliable way of scanning because TCP is connection-oriented, but it might be expensive for the scanner to initiate a TCP connection with multiple destinations.
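The full-handshake TCP scan just described can be sketched with the standard socket API. This is a minimal illustration, not a production scanner; the function name `tcp_connect_scan` is invented for this sketch:

```python
import socket

def tcp_connect_scan(host: str, port: int, timeout: float = 1.0) -> bool:
    """Attempt a full TCP handshake; True means the port accepted it (open)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno on failure
        # (e.g. ECONNREFUSED, which corresponds to receiving a RST).
        return s.connect_ex((host, port)) == 0
```

Because this uses the operating system's regular TCP implementation, it needs no special privileges, but it completes (and then tears down) a full connection for every open port it finds, which is exactly the cost mentioned above.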
UDP port scanning, on the other hand, is simpler but more unreliable. It needs more guessing to identify open ports, because the scanner will either wait for an error message, for example an ICMP destination-unreachable message, or for a legitimate response from a certain service (DNS, for example, if a DNS request was sent). If no message arrives, the scanner will assume that a UDP service is running on the probed port: the service has received the probe but, not knowing what to do with it, has simply thrown it away. Of course, several things could go wrong during this inference process: ICMP might be filtered, and UDP probes or responses might get lost.

In the following, we focus on a couple of examples of TCP port scanning. The simplest way of scanning a host using TCP is to rely on the operating system's network functions, that is, to use the regular implementation of TCP. A scanner can, for example, use a regular TCP handshake to check if a port is open. It will send a SYN packet, receive a SYN-ACK answer if the port is open, and then complete the handshake with an ACK. Alternatively, the scanner will receive a RST packet if the port is closed. So far, this is the regular TCP behavior.

You might have noticed, however, that the scanner does not actually need to complete the TCP handshake to have all the information it needs. From the scanner's perspective, receiving a SYN-ACK or a RST packet is all that is needed. If the scanner interrupts the TCP handshake after receiving an answer from the target, we talk of a SYN scan or half-open scan. The scanner will interrupt the handshake with a RST packet. The disadvantage here is that the scanner needs its own custom implementation of TCP, since the operating system's stack would complete the handshake on its own. A scanner can also try to elicit a negative answer. For example, in a Christmas tree scan, the scanner sends a packet with the FIN, URG and PSH flags set. The packet will trigger a RST answer if the port is closed, or it will be ignored by the operating system if the port is open.
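The inference rules for these scan types (SYN, Christmas tree, UDP) can be summarized as a small decision function. This sketch captures only the logic, not the packet handling; the function name and the response labels are invented for illustration:

```python
def infer_port_state(scan_type: str, response):
    """Infer a port state from the (possibly absent) response to a probe."""
    if scan_type == "syn":
        # SYN probe: SYN-ACK means open, RST means closed, silence is ambiguous.
        return {"syn-ack": "open", "rst": "closed"}.get(response, "filtered")
    if scan_type == "xmas":
        # FIN/URG/PSH probe: a RST means closed; silence means the OS ignored
        # the probe, which suggests open (but the probe may also be filtered).
        return "closed" if response == "rst" else "open|filtered"
    if scan_type == "udp":
        # UDP probe: ICMP port-unreachable means closed, a service reply means
        # open; silence is ambiguous (open service? lost probe? filtered ICMP?).
        if response == "icmp-port-unreachable":
            return "closed"
        if response == "service-reply":
            return "open"
        return "open|filtered"
    raise ValueError(f"unknown scan type: {scan_type}")
```

Note how often the answer is ambiguous: only the SYN scan gives a clear open/closed signal from a single response, which is why TCP scanning is considered the more reliable option.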
When performing a scan, you can either search for a specific open port in a set of IPs or, vice versa, try to identify all open ports on a single host. In the first case, we talk about a horizontal scan, while in the second, about a vertical scan. Vertical and horizontal scans can also be combined into a block scan.

Scans are typically the first step of a more articulated attack, and in practice a scan and a different type of attack, like password guessing, are often combined. Let's see an example. In this picture, you have a representation over time of an SSH dictionary attack that we have measured at the University of Toronto campus. On the x-axis there is time; on the y-axis we have a numerical identifier of the IP addresses. A dot indicates that there has been a communication between the attacker and the host. An SSH dictionary attack aims at finding SSH accounts with an easy-to-guess username-password combination. To do this, the first step is to identify hosts with an active SSH service. This is a scan on port 22, which can be clearly seen in this picture as a line. Then the attack directly progresses to brute-forcing username-password combinations; this is what happens in the second part of the picture.

Scans can be quite noisy from a traffic-pattern point of view, which can make them quite easy to detect. And since the scanner will most likely use one or a small set of hosts to perform the scan from, scans can also be easily filtered. Scanners can apply several techniques to try to remain under the radar, like scanning at a very slow rate or distributing the scan. Sometimes more complex techniques are in use. Let's see an example. This picture is a planar representation of the /8 network block used for this specific measurement. The representation uses the so-called Hilbert curve. By using a Hilbert curve, IPs that are close in an address sense will also be close in a Euclidean sense. Each point in the picture represents an IP address.
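The difference between horizontal, vertical, and block scans lies only in how the (IP, port) target pairs are enumerated. A small sketch using Python's standard `ipaddress` module, with hypothetical helper names:

```python
import ipaddress
from itertools import product

def horizontal(network: str, port: int):
    """One port across many hosts (e.g. port 22 over a whole subnet)."""
    return [(str(ip), port) for ip in ipaddress.ip_network(network).hosts()]

def vertical(ip: str, ports):
    """Many ports on a single host."""
    return [(ip, p) for p in ports]

def block(network: str, ports):
    """Combination of both: every port on every host."""
    hosts = [str(ip) for ip in ipaddress.ip_network(network).hosts()]
    return list(product(hosts, ports))
```

The SSH dictionary attack described above starts with `horizontal(network, 22)`: one port of interest, swept across the whole target network.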
The colored points are IP addresses that have been contacted during a distributed scan of the SIP protocol performed by the Sality botnet. The scan seems random. However, analysis of the traffic generated during the scan shows that the scan was actually performed in a sequential manner, but in reverse-byte IP order: if you reverse the order of the bytes in the IP addresses, the scan appears to progress sequentially, and it is highly structured. In this case, scanning in reverse byte order is used to obfuscate the scan and distribute the probes in a seemingly random way.

So far, we have somehow assumed that a scan is an attack. This was considered true in the past, but it is nowadays a too restrictive definition of such a mechanism. In truth, there are legitimate reasons to perform a scan, for example a network administrator who needs to test connectivity to end hosts. Scans are also often used as a data source for research; in this respect, network scans fall into the category of active measurements. For example, a project by, among others, the University of Southern California uses ping sweeps to map which parts of the IPv4 address space are in use. Researchers at the University of Michigan have developed ZMap, a tool that allows efficient internet-wide scans and is used for gathering information about several protocols. For example, they used it for studying the impact of the Heartbleed SSL vulnerability in 2014.

It might seem that the malicious or legitimate nature of a scan is then only a matter of intention. I do not believe this is the case. The fact that one can scan the internet does not imply one should. The research community is constantly reflecting on the ethical implications and the possible consequences of active measurements. In addition, it is considered best practice to adhere to a few rules.
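The reverse-byte-order trick can be illustrated in a few lines: addresses that are consecutive in the scanner's internal order land far apart in the normal address space, since the least significant byte becomes the most significant one. The helper name is invented for this sketch:

```python
def reverse_bytes(ip: str) -> str:
    """Reverse the four bytes of a dotted-quad IPv4 address."""
    return ".".join(reversed(ip.split(".")))

# The scanner walks addresses sequentially in its internal order...
internal_order = ["77.0.0.0", "77.0.0.1", "77.0.0.2"]
# ...but on the wire consecutive probes hit entirely different /8 blocks.
on_the_wire = [reverse_bytes(ip) for ip in internal_order]
# on_the_wire == ["0.0.0.77", "1.0.0.77", "2.0.0.77"]
```

An observer who sorts the probed targets numerically sees no pattern, while reversing the bytes (the transformation is its own inverse) makes the sequential structure reappear.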
For example: limit the network hindrance both at the source and at the destination, and be as clear as possible about the goal of your study, stating that it is done for research purposes. Despite this, some targets might not want to be part of the scan. A proper study should therefore include an opt-out mechanism, allowing networks to be excluded from the scan. Last, once the data are collected, researchers need to take care in deciding if and how the data can be shared with others. Data sharing is crucial for the research community, but it might endanger the targets, and measures like responsible disclosure should be taken so that this does not happen.