In this set of slides, we will give some more details on network flows. We have previously seen that network flows are defined as sets of packets with common properties that pass the same observation point. We have also seen that, while a flow is an abstract concept, in practice we measure flow records, that is, a flow is split into parts for measurement purposes. We will now go more in depth into the topic of flow monitoring by looking first at the flow metering and exporting process, and then at the exported data. The typical architecture for flow monitoring is based on the following stages. In the packet observation stage, packets are captured and pre-processed. The following stage consists of the metering operation, which aggregates packets into flows, and the export operation, which exports flow records to collection devices. In the data collection stage, flows are collected from the exporting process and stored for later use. Finally, flows are analyzed in the data analysis stage. In practice, different devices take care of the stages that we just mentioned. An example is given in this picture. It is important to note that the packet observation stage and the flow metering and exporting stage are often combined in the same device. In the picture, this can happen either in a dedicated probe, like flow probe 1 or flow probe 2, or in a forwarding device, like a router at the entrance of a network, for example. These devices export flow records and send them to the flow collectors: devices dedicated to collecting and storing flow records, and often also to pre-processing them for later analysis. After this stage, flow records are available for manual or automatic analysis. Let's now look at some of these stages in more detail. In the packet observation stage, packets are captured at an observation point and pre-processed. The first step in this stage is packet capture, in which packets are read from the line.
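The staged architecture just described can be sketched as a small pipeline in which each stage consumes the output of the previous one. This is a minimal illustration only: the stage functions, the packet tuples, and the use of (source, destination) as the flow key are simplifying assumptions, not a real implementation.

```python
# Toy sketch of the flow-monitoring pipeline:
# packet observation -> flow metering -> export -> collection -> analysis.

def observe(packets):
    """Packet observation: capture and timestamping (already done here)."""
    return packets

def meter(packets):
    """Flow metering: aggregate packets into flows keyed by (src, dst)."""
    flows = {}
    for ts, src, dst, size in packets:
        pkts, byts = flows.get((src, dst), (0, 0))
        flows[(src, dst)] = (pkts + 1, byts + size)
    return flows

def export_and_collect(flows):
    """Export to, and storage at, the flow collector (a pass-through here)."""
    return flows

def analyze(flows):
    """Data analysis, e.g. accounting: bytes per flow."""
    return {key: byts for key, (pkts, byts) in flows.items()}

packets = [(0.0, "10.0.0.1", "10.0.0.2", 100),
           (0.1, "10.0.0.1", "10.0.0.2", 200)]
report = analyze(export_and_collect(meter(observe(packets))))
print(report)  # {('10.0.0.1', '10.0.0.2'): 300}
```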
After this, the packet receives a timestamp, that is, the time at which the packet was captured is recorded. Accurate timestamping is crucial for further processing. Packet capture and timestamping are always performed in this stage. After that, the packet observation stage progresses through a set of optional steps. If only a number of bytes of each packet is recorded, we talk about truncation. Last, packet sampling and filtering rules define which packets are included in the measurement and which are disregarded. In the flow metering and export stage, packets are aggregated into flows, and flow records are exported. In the metering process, packets are aggregated into flows. This is done using information elements that define the structure of a flow. Flows are stored in the flow cache. The metering process also adheres to a set of expiration rules to decide when a flow record should be removed from the cache and exported. The flow metering process uses a cache for accounting flows. Each time a packet is observed, the flow metering process checks whether the cache already contains an entry for that flow, and if not, it creates one. Cache entries also need to be removed from the cache. This happens if a flow has been active for longer than a certain timeout (the active timeout), or if a flow does not account any packets for some time (the inactive timeout). For TCP traffic, cache entries are expired if TCP FIN or RST packets are observed. Expiration can also happen due to resource constraints at the device. Occasionally, the entire cache of the device can be flushed. After the flow metering process, flow sampling and filtering are a way to reduce the number of flow records that are exported towards a flow collector. Flow sampling and filtering are operations equivalent to packet sampling and filtering. However, note that at this stage these operations are applied to entire flows and not to single packets.
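The cache behavior and expiration rules described above can be illustrated with a small sketch. The flow-key format, the record layout, and the timeout values are assumptions chosen for the example; real devices use vendor-specific defaults and configurable timeouts.

```python
ACTIVE_TIMEOUT = 120.0    # seconds; example value, normally configurable
INACTIVE_TIMEOUT = 15.0   # seconds; example value, normally configurable

class FlowCache:
    """Toy flow cache illustrating the expiration rules of the metering
    process: active timeout, inactive timeout, and TCP FIN/RST expiry."""

    def __init__(self):
        self.cache = {}       # flow key -> flow record
        self.exported = []    # records handed to the exporting process

    def on_packet(self, key, now, size, tcp_flags=""):
        # Create a cache entry on the first packet of a flow.
        rec = self.cache.get(key)
        if rec is None:
            rec = {"start": now, "last": now, "packets": 0, "bytes": 0}
            self.cache[key] = rec
        rec["last"] = now
        rec["packets"] += 1
        rec["bytes"] += size
        # For TCP traffic, a FIN or RST packet expires the entry.
        if "F" in tcp_flags or "R" in tcp_flags:
            self._expire(key)

    def sweep(self, now):
        # Periodically expire entries on active/inactive timeouts.
        for key, rec in list(self.cache.items()):
            if now - rec["start"] >= ACTIVE_TIMEOUT:      # active timeout
                self._expire(key)
            elif now - rec["last"] >= INACTIVE_TIMEOUT:   # inactive timeout
                self._expire(key)

    def _expire(self, key):
        self.exported.append((key, self.cache.pop(key)))

fc = FlowCache()
key = ("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")
fc.on_packet(key, 0.0, 100, tcp_flags="S")
fc.on_packet(key, 1.0, 200, tcp_flags="F")
# The FIN expired the flow: one record with 2 packets and 300 bytes exported.
```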
Last, the exporting process takes care that flow records ready to be exported are included in an IPFIX message and encapsulated in a transport-protocol packet. Typically, exporting is done over UDP, but TCP is also possible. After the packet observation and flow metering processes, flow records are collected and used for data analysis. Data collection is done at the flow collector. The flow collector receives, stores and pre-processes flow data from one or more exporters. Pre-processing tasks can include, among others, data compression, anonymization or filtering. Data analysis is the final stage of the flow monitoring process. This stage depends on the goal for which the flow data is collected. Examples of data analysis that can be performed using flow data are accounting and reporting, and security and performance monitoring. Now that we know how the flow monitoring process works, it may be tempting to consider it as a black box. The process takes network traffic as input and gives network flows as output. The implicit assumption here is that network flows are equivalent to network traffic, in the sense that they give us a faithful representation of the status of the network. Our studies have shown, however, that this assumption is far too simplistic. Our experiments have shown that the flow monitoring process, like any other measurement process, can introduce measurement errors or, more precisely, measurement artifacts. In our research, we have systematically investigated measurement artifacts introduced by flow monitoring appliances. We have identified artifacts such as imprecision in when a flow is removed from the cache, missing flags in TCP flows or flags present in non-TCP flows, imprecision in the byte counters, and gaps in the exporting process. We believe it is very important for researchers developing flow-based analysis solutions to be aware of these artifacts.
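Exporting flow records over UDP can be sketched as follows. This is a toy encoding only: a JSON payload stands in for the binary IPFIX message format (defined in RFC 7011), and the default collector address is an assumption; 4739 is the IANA-registered IPFIX port.

```python
import json
import socket

def export_flows(flows, collector=("127.0.0.1", 4739)):
    """Serialize flow records and send them to a collector over UDP.
    UDP gives no delivery guarantee, which is one reason gaps can
    appear in the exporting process."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        payload = json.dumps(flows).encode()
        sock.sendto(payload, collector)
        return len(payload)
    finally:
        sock.close()

flows = [{"src": "10.0.0.1", "dst": "10.0.0.2", "packets": 2, "bytes": 300}]
sent = export_flows(flows)
```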
In this way, the effect of the artifacts can be taken into account when the analysis software is developed. Let's have a look at some of those artifacts in more detail. We tested a group of six flow devices from three different vendors. The first five devices are routers, while device number six is a dedicated flow probe. Our test group included a variety of hardware configurations and software versions for widely used devices. We investigated the artifacts by sending crafted traffic with specific characteristics with respect to flow duration, flags, and packet size to each one of these devices, and then checking the quality of the exported flows. One artifact that we studied is the fact that TCP flows are exported without TCP flags. More specifically, a subset of Cisco devices only rarely exports flags for TCP flows. This artifact affects 99.6% of the flow records we observed. This is the majority, but not all, of the flows. This percentage is due to the internal implementation of these devices. For efficiency reasons, most packets are switched in hardware, in which case TCP flags are not accounted in the flow-metering process. A small percentage of packets is instead switched in software, where the flags are accounted for. It is important to note, however, that for those devices the TCP flags, although not exported, are correctly used for deciding when a flow record needs to be exported. Another artifact related to TCP flags is the unexpected presence of the ACK flag in some non-TCP flow records. This artifact is due to the fact that some devices use the TCP ACK flag field for discriminating packet fragments from non-fragmented traffic. As a consequence, some non-TCP traffic is exported in flows that have the TCP ACK flag set. The artifact has been observed in a group of Cisco devices and it affects roughly 1% of the flow records in our test traces. In this video, we have looked in detail into the flow-monitoring process.
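Analysis software can guard against the two flag artifacts just described with a simple sanity check over received flow records. The record layout used here (dictionaries with `proto` and `tcp_flags` fields) is an illustrative assumption, not a real collector API.

```python
def find_flag_artifacts(records):
    """Flag flow records matching the two TCP-flag artifacts:
    TCP flows exported without any flags, and non-TCP flows
    carrying a TCP ACK flag."""
    suspicious = []
    for rec in records:
        if rec["proto"] == "tcp" and not rec["tcp_flags"]:
            suspicious.append((rec, "tcp-without-flags"))
        elif rec["proto"] != "tcp" and "ACK" in rec["tcp_flags"]:
            suspicious.append((rec, "non-tcp-with-ack"))
    return suspicious

records = [
    {"proto": "tcp", "tcp_flags": set()},            # artifact: no flags
    {"proto": "udp", "tcp_flags": {"ACK"}},          # artifact: ACK on UDP
    {"proto": "tcp", "tcp_flags": {"SYN", "ACK"}},   # clean record
]
issues = find_flag_artifacts(records)
```

A check like this lets the analysis stage discard or down-weight affected records instead of silently misinterpreting them.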
We have learned about the different stages that constitute the flow-monitoring process, from packet observation to data analysis. We have then seen that flow records cannot be considered a one-to-one representation of the network traffic, and we have pointed out that the flow-monitoring process, like any measurement process, introduces measurement errors. Those artifacts, due to implementation decisions, might impact the quality of the data and especially the behavior of analysis processes that rely on flow data. It is therefore important to be aware of these issues and to develop flow analysis software that is able to cope with these situations.