Welcome everyone to my presentation about network flow analysis using NetFlow protocols and tflow2, a piece of software I wrote last winter. It's nice to see so many people in this room; I was expecting maybe 10 or 20, and I see many more.

Let me go through the agenda. First I'll introduce myself. Then: what is a flow, what is NetFlow, and why would I even want to collect flow information? What NetFlow protocols are out there on the market that you can use to get flow information from your network? Then I'll talk about how I tried to find usable collection and analysis solutions; there wasn't much out there that was usable for me, to say the least. I'll cover tflow2 in more detail, down to implementation details in the Go language, including what turned out to be fast and what was too slow in Go's default packages. And at the end there will be a demo of tflow2, so you can see it in action.

So, who am I? My name is Oliver Herms; you may know me as takt. I'm a network engineer, or software engineer, or somewhere in between. At the moment I'm working for AS51324, EXARING AG; you may know our product, waipu.tv, it's very good. Until recently I worked for Google, AS15169, in network operations. That's not enough networking for me, so I also work on AS201701, the Freifunk Rheinland backbone. I hold a pile of certifications, CCNP and so on, blah blah, you might not care about that.

So let's go deeper into the actual topic: what is a flow? A flow in a network is a unidirectional sequence of packets that share certain properties. What does that typically mean?
Every packet has properties: a source IP address, a destination IP address, a source MAC address, a destination MAC address, a source port, a destination port, and so on. There's lots of information that is not the payload but metadata, used to forward the packet to the right destination. A flow is identified by an n-tuple; typically source IP, destination IP, protocol, source port and destination port. That's mostly what people agree a flow is. Put another way, it's pretty much a communication flow between two sockets on two machines. All packets in the network belong to a flow. A flow might be just one packet, or millions of packets over time; it can live for a second, or for hours, days, weeks, months, even years. If you keep an SSH connection open for a year, that is one network flow: one TCP connection, identified by the four- or five-tuple of source IP, destination IP, source port, destination port and protocol.

So what's NetFlow? NetFlow is a family of protocols used to analyze network traffic. Imagine your network with traffic going through it, and all you can see is a counter.
You can see counters: how much traffic is going from one interface to another. Maybe you notice an interface is full, or you're dropping packets, and you want deeper insight into what is actually going on there. That's what NetFlow is for. The device, the router or switch, keeps track of which flows it forwards packets for, and counts the packets and bytes, i.e. the volume of the traffic it forwarded. NetFlow was originally invented by Cisco, by the way.

So which components are involved when you use NetFlow? In the middle you have a router, and the router forwards packets for a user or whatnot. As I said, the router tracks which flows are going through it, and these flows are periodically exported in special packets sent to a destination: the NetFlow collector. The collector reads in the packets and so receives all the metadata about the flows that passed through the router. What the collector does is either keep that information in RAM for some time, or store it on long-term storage like disks or SSDs. But the raw data isn't really what you want; what you want is analysis. You want to look into it: where is traffic going, where is it coming from, when did it increase or decrease? There are certain reasons why you want this.

So why would you collect this metadata?
The biggest reason for me personally, as an ops person, is troubleshooting. Whenever you have congestion in the network, a link is full, the only thing you see is that the link is full. You have no idea where the traffic comes from, no idea where it goes, and no idea how to mitigate, if you don't have NetFlow information. You could sniff on the interface with tcpdump or something and see what is actually going through, or make a guess that traffic is going from here to there. But it turns out that on big routers you have no tcpdump, at least not for the forwarding plane.

Or take a DDoS attack. Your links are full again, and the question is: who's under attack, which IP address? You need to know that so you can install, say, a blackhole route, so the traffic gets dropped at the perimeter and doesn't congest your backbone network.

Another reason is capacity planning. You want to know: if this link goes down, where does the traffic actually go?
So let's say you have a link from one point to another, and a backup link somewhere else, and you want to know whether that backup link is big enough. There are two ways: you can either look at where the traffic on the utilized link is going, or you can just shut the link down and see what happens. In a bigger network with users on it, the second is not the way you want to do it.

The third reason, which might be important to some of you, is security, because flow data gives you some insight into who is communicating with whom on the network. You may see network flows that should not be there, like SSH sessions between two machines. Say you have a VLAN with regular office workers in your company and an engineering VLAN, and suddenly you see SSH connections from the regular VLAN into your production network where all your servers are. Those should not exist; maybe it turns out that someone failed to program the firewall rules properly. I've seen things like that, and they were found using NetFlow. Of course you can also look at the logs on the servers themselves, but then you have to visit all the servers if you don't have centralized logging; flow data is a quite easy way to check whether you have any illegal flows in the network.

So, there are different NetFlow versions out there. To make it short: NetFlow versions 1 through 8 only support IPv4, so I didn't consider using them at all. They also only support a fixed set of attributes, which means that if anything in your network environment changes, say you add MPLS and want to see which label was used to forward a packet, and your NetFlow version doesn't support MPLS, you're screwed. You have to wait for a new NetFlow version.
That's how it was in the past. But luckily they then invented NetFlow version 9. Version 9 works with a templating system that allows a variable set of attributes to be carried in the packets. That means the router doesn't just send data packets saying "there was a flow from X to Y with this many bytes"; it also sends templates. A template says: this is template number 123, and it contains this field and that field, and they mean this and that. When a data packet is sent out, it references the template, and you need the template to make sense of the data; otherwise the data is completely pointless to you. That's how NetFlow version 9 works. It supports both IPv4 and IPv6. The template system makes it somewhat complex, because you have to keep track of the templates; you cannot just read the packet and say "at offset 12 bytes there's the source IP address". That's not how it works in NetFlow version 9.

And then there's IPFIX.
It's also known as NetFlow version 10, kind of; it's basically the same thing. It uses different wording and a slightly different header, and I'm actually not sure why it was invented; maybe someone in the room knows, especially in the front row. I implemented tflow2 with NetFlow version 9 support in the first place, and then people came to me and said: you know what, I have a nice Juniper MX router and it only supports IPFIX. Okay, I'll talk about that in a few minutes in more detail.

So, deeper into version 9, because that's what I implemented first. It's defined in RFC 3954. The general packet format is: you have a header, followed by a variable number of template flow sets, data flow sets and option template flow sets. The template flow sets contain the templates I just mentioned; the data flow sets contain the data that refers to a template. They can be in different packets; they don't need to be in the same one. The router typically sends the templates out, say, every half hour (you can usually configure that on the router), while the data packets come in a stream, popping up every few milliseconds.

The options carry things like sampling information. If you use sampling in your network, you say: I don't want to see every flow, sample every 16th, or 64th, or thousandth packet. If you have a lot of flows you may not want to export all of them, because it's a lot of information, hard to handle on your machines, and it takes a lot of resources to export them all in a big network. So you can configure sampling, and the NetFlow packet will tell you, in the options, that this data is actually sampled.
So if you want to interpret it and see how much traffic is actually flowing, you have to multiply by the sampling factor that was configured on the router, which it tells you in this options thing. Unfortunately options are not supported in tflow2 yet, but that's a detail.

The NetFlow version 9 header is quite simple. The first field is a 16-bit version number, usually set to 9; for IPFIX it is set to 10. Then there's a count field, also 16 bits, that tells us how many records there are in this packet, so we know how long it is and where it ends. There's a sysUptime field, the uptime of the source of the NetFlow packets in milliseconds; I don't know why that's there, I never needed it, it just exists. The next field is more interesting: unix seconds, the timestamp when this NetFlow data was emitted by the router, so you know at what point in time this information was valid. Then the sequence number: all packets emitted by the router are sequenced, which allows you to see whether any packets have been dropped, because NetFlow data is usually transmitted over UDP. You send UDP packets from the router to a remote host, and they can get lost if your network is congested. Again, what you can do then, if you want to estimate how much traffic there actually was: if you see that every second packet is lost, you can just multiply all your traffic by two and you have an idea of the real volume. Finally there's a source ID that identifies an observation domain. I think it can be used if you have VRFs on your routers, so that the VRF ID is carried in there; in all my setups it's set to zero, at least. tflow2 doesn't take care of the source ID; if you want to run multiple instances, you have to run multiple binary instances, multiple processes.
tflow2 doesn't differentiate on the source ID at the moment.

So the header is easy, but here comes the template flow set format, and this looks a bit ridiculous. All these sets inside the packet have the same kind of header again: the very first two fields are a flow set ID and a length. A flow set ID of zero means this is a template set; a non-zero value means it's a data set. The length tells us how long the set is. Inside the template set comes the template itself: there's a template ID, say 256, then a field count saying how many fields this template has, and then the fields. For example: field type number 1, which might mean "IPv4 source address", four bytes long; the next field might be "IPv6 source address", 16 bytes long. That is how the template describes what the NetFlow data packet will look like, and the data packet refers back to this template ID, 256 in this case. That's how it works.

Then comes the data flow set format. The data flow set again starts with the flow set ID and length, the common header between the two. Here the flow set ID tells us which template we have to use to decode this set, and the length tells us how long this flow information is, so we know where the next set starts. That's basically how we get the information out of the packets.
Tracking templates requires keeping some state, and I don't like state, but it makes the protocol very flexible; I think it's a good solution.

There's also IPFIX, as I said. It's mostly the same as NetFlow version 9; the packet headers differ slightly, and what differs mostly is the wording. Getting from my NetFlow version 9 packet decoder to an IPFIX packet decoder took me less than two hours: I copied the thing, renamed some variables to match the RFC, removed one header field, compiled it, started it, and it just worked. It was that simple.

Comparing IPFIX with NetFlow version 9: there's a "packet header" in NetFlow version 9 and a "message header" in IPFIX. Isn't that amazing. Then there's a "template flow set" versus a "template set", and a "data flow set" in NetFlow versus a "data set" in IPFIX. It's exactly the same thing.

The biggest difference is actually the overall packet header format. On the left side you can see the NetFlow version 9 packet header, on the right side IPFIX. All the fields have been renamed, and the interpretation of count versus length is a bit different: count means "how many records do I have", while length means "how many bytes follow this field". The sysUptime field was simply removed in IPFIX, seemingly because nobody needs it, as I said. Unix seconds became "export time", the sequence number is the same, and the source ID was just renamed to "observation domain ID". It's all the same.

The same goes for the template flow set: at the very top left it says flow set ID = 0, while in IPFIX it says set ID = 2. So in IPFIX the template set is not ID zero but ID two; that's the only difference. The data flow set format is again basically the same, just with different wording, flow set ID versus set ID; the length field and all the rest are identical.

A short excursus:
There's a protocol called sFlow. sFlow is a lie, because it doesn't take care of flows at all. What it actually does is sample frames or packets and add information about interfaces and such. Basically, the device, say a switch, receives a frame; it picks one out of every N, whatever you configure, say one out of a thousand, at random. It checks the ingress interface and the egress interface on the switch, copies the whole packet up to a certain size, cuts off the rest, encapsulates it and sends it to a collector. That is what sFlow does, as far as I know; correct me if I'm wrong. But it doesn't aggregate anything into flows; it's not tracking the flows passing through the device. That makes it very cheap for device vendors to implement, but accuracy-wise I don't think it's the best approach. sFlow is currently not supported in tflow2.

So why did I actually start the tflow2 project? As I said in the introduction, I work on AS201701, the Freifunk Rheinland backbone, and I used to work at Google, where I was used to fancy tools for analyzing NetFlow data, tools that tell me where traffic is going from and to in case I have congestion. We're not suffering from much congestion in Freifunk Rheinland at the moment, but things happen; we've had DDoS attacks and such, and then you want to know where traffic is actually going. So I needed a solution to find that out.

The Freifunk Rheinland backbone consists of six routers that forward around one million flows per minute. A flow per minute means: a flow that started before the one-minute window and ended after it, one that started and ended within the window, or one that just ended in the one-minute
window, or one that just started in it. I don't actually know how long-lived these flows are, but if I export all flows on the routers every 60 seconds, I get one million data sets out of our routers.

As some of you might know, we use Linux on the routers; they're just cheap x86 machines running Debian and bird. So how do I get NetFlow information out of such a box? We use ipt-NETFLOW, which is a kernel module, an extension for iptables. You basically say `iptables -A FORWARD -j NETFLOW`, packets go into this module, and the module keeps track of all the flows and exports the data to the collector. However, the Linux kernel is not keeping track of ASN information; there's no information about autonomous systems, so you don't know from which ASN the data came or to which ASN it went. It's just not there, so ipt-NETFLOW cannot export it, and I somehow had to augment the data later on. I was trying to find a solution that allowed me to do that, and there was simply none.

We wanted 100% flow sampling, for accuracy; well, of course, if you design something new, you aim for high targets. My idea was that all of these one million flows per minute should be processable on one regular, fairly cheap machine, say a quad-core i7 with 64 GiB of RAM; it should be able to cope with that load. We needed an efficient solution, because Freifunk Rheinland doesn't have too much money to throw at getting NetFlow information for us engineers in case we need it. So efficiency was a must.

So I looked at the available solutions; or rather "solutions", because none of them was a solution for me. The classic thing to use is nfdump and NfSen. nfdump is basically a daemon;
nfcapd, I think it's called, runs and collects NetFlow or IPFIX packets and dumps them into binary files. Then you have a command-line tool, and you can fire queries against that data on disk; nothing is served from RAM, it always comes from disk. And it reports text, not graphs over time. So you take, say, a five-minute window, make a query, and it tells you: there was this flow in this five-minute window, it did five megabytes of traffic. If you want the data for the five minutes before that, you have to run another query for another time window, which tells you there were another five megabytes. If you want to see traffic over time, you have to run lots of queries and build the graphing yourself, because it's not there. What you can do in NfSen is put in recording rules: when data comes in, run it through this logic, and if the flows match these criteria, write the data into RRD files, and then you get a graph; an ugly RRD graph, to me at least. NfSen is the web frontend for it; it's written in PHP, I won't comment on that. Setting it all up is a pain in the ass.

Then there's this pmacct thing. I had a look at it; it was very complex to set up. And the fact that it throws all the information into general-purpose databases like MySQL, Postgres or MongoDB, while I have one million inserts per minute, which is about 16,000 per second, quite a lot, and just one machine to get the job done, made me doubt that this was possible with these resources and this software.

As for commercial solutions, I had a look at what's on the market, but no hands-on experience. We found them all too pricey, or too inflexible, and they're just not open source.

So what was the first approach at my own solution?
My very first Go program, in early 2016, was a NetFlow-to-Prometheus gateway that simply exported all the attributes as labels. One needs to know that Prometheus doesn't like to keep track of too many labels, because every combination of labels is its own time series, and I was basically killing it with too many time series in the database. It just fell over within a few seconds.

Next approach: a NetFlow-to-MySQL gateway. I defined a single MySQL table with an index on every column, because you want to run queries later on, and without indices every query reads the whole table. So: a single table, plenty of columns, lots of indices, and I just inserted data into the database. I wasn't even running any queries yet, and the CPU was immediately at 100% and the box fell over. So that didn't work either.

Third approach, the first big Go program I wrote; it was not that trivial: tflow version 1. It stored all incoming flow information in RAM only. The database was made of AVL trees; an AVL tree is a binary tree that balances itself, so in case I insert sorted data, the tree stays balanced and doesn't degrade into a linked list, which is very important for the performance of queries and inserts. The structure was like this: there was one tree for the source IP addresses, and each node held a value, say the IP address 8.8.8.8, plus a pointer to the next AVL tree, which kept track of the next attribute, say the destination IP address 1.2.3.4, and inside that node there was another tree for the next attribute, and so on. So we had, let's say, 12 attributes.
You had to traverse up to 12 trees. It worked; it wasn't the fastest, but also not as bad as the Prometheus solution or the MySQL solution, and it was very memory-efficient. But I never released it to the public, because all the code lived in one package and things like that; it wasn't well designed. And I hadn't been using it for quite some time, so I put it in the corner and said: we will not use this in production.

Then came the fourth approach, tflow version 2, and this one was aimed at a public release. I thought: okay, now I have some experience with Go, I think I can find a better way to get things done. Last year my colleagues had been so good to me that they gave me all the on-call shifts for the winter, especially on the weekends. So I was sitting in Dublin, it's raining outside so I didn't care too much about that, sitting inside taking care of the network. Usually nothing happened, so I had plenty of time to focus on my problem and start hacking on tflow2. I redesigned the database layer and added a nice web interface; well, "nice", there are some people in the room who won't agree with that. I eventually implemented a file storage layer to keep flow information not only in RAM but also on disk, and I added an annotation layer to add BGP information, like the autonomous system numbers that are missing from the ipt-NETFLOW module.
So the flows actually carry ASN information. It also turned out that I had to re-implement the NetFlow version 9 decoder I had been using, and later on I implemented IPFIX.

So what does tflow2 actually look like? This is the general design. At the very top you see the router, which sends NetFlow packets via UDP. There are two modules that receive NetFlow or IPFIX packets: the NetFlow version 9 server and the IPFIX server. Well, guess what they do: the version 9 server takes care of NetFlow version 9 and the IPFIX one takes care of IPFIX. So there are two open UDP sockets receiving data. From there, via a Go channel, the data goes into the annotator layer. The annotator can route flows through the bird annotator or not; you control that with a command-line flag when you start the tflow2 binary. There's one piece missing in the picture: the bird annotator talks to a bird instance, and bird is a routing daemon that has the ASN information for all the prefixes, including the prefixes of the IP addresses we are trying to get information for.

Once the annotator is done, it updates the stats package: by the way, I received a NetFlow or IPFIX packet, and it was this big. So you can actually monitor what tflow2 is doing in the background; that data is exported via HTTP and can be scraped with Prometheus. But that's not all the annotator does: of course it also forwards the data to the database layer. The database layer keeps the data in memory, organized in a way that lets us run fast queries, but that is also not too expensive to insert into, because as I mentioned we have to add data around 16,000 times a second, which is actually not trivial. It's not a workload you want to put on a general-purpose database.
As I said, it's not a workload for a general-purpose database.

So, there were quite a few issues during development, actually plenty of them, because everything I did at first was too slow. The first thing I noticed: I ran a query on the database asking for all traffic on one interface, no further criteria, just "give me all traffic on our DE-CIX interface". It said 800 Mbit/s. But the other monitoring, based on the interface counters on the routers, said 2.4 Gbit/s were flowing. Hmm. Where are all the flows going? Either ipt-NETFLOW is lying, or, more probably, my software is wrong. So I started digging around in my software, checking whether any of my calculations were going wrong; they weren't. Later I got the idea: maybe I'm not reading the NetFlow packets from kernel space fast enough, and the socket buffer is overflowing. I tried to figure out how to check that, and it turned out there are the /proc/net/udp and /proc/net/udp6 files, and when I checked them they showed plenty of dropped packets. That explained why my data was wrong.

So you think: just read faster, how hard can it be? I parallelized the workload: multiple goroutines all reading from the same socket. But my CPU was already at 100%, so that didn't help. I increased the buffer size; that helped for a short moment, but not long-term. It could have helped for spiky loads, but in fact data was simply arriving in the buffer faster than I was draining it, constantly. So I started profiling: what is tflow2 actually spending its CPU time on? Why is it so slow?
The profiling turned out to point at this NetFlow decoder I had found on GitHub, from fln, a package called nf9packet, and it didn't perform well: 90% of the CPU time was going into that package, one I didn't even implement myself. I just wanted it to decode a NetFlow packet and hand me a struct with all the data. Yeah: 90% of CPU time. Unfortunately I threw all the profiling graphs away; the nice graphs, I would have loved to show them to you, but I was too lazy to reproduce them, check out old versions of the software, run them against production again and profile them. So I'm sorry about not having the graphs.

What the profiles showed was that nearly all of that CPU time then went into the encoding/binary package, which is a package from the standard library of the Go distribution. It has a function, a method, called Read, that was being used to read data from a byte buffer into a struct. Doesn't sound too bad, huh? It's the standard way of doing things in Go. But I had a look into this binary package to see what it's actually doing, and in the first line of that function it says: reflect. What does that mean?
Reflect means this is a function with a parameter that accepts anything, whatever data type, and Go is a type-safe language. It takes whatever you throw in, but then it needs to find out what it actually is. You give it a struct and say: this is the struct I want filled with this binary data; but the function doesn't know your struct, so it has to break it up and look at what's inside. That's what reflection is used for, and reflection is slow as hell. Remember, I had to do this a few thousand times per second. This sucks.

So I thought about how to re-implement this, and I googled for ways to decode packets without this binary.Read function. There were mailing list threads where people said: no no, this is the proper way to do it, this is how you have to do it, there is no other way. Well, there is another way; it might not be supported very well, but it exists, so I did it. I re-implemented the whole package with the same external interface, because my software was already using it and the interface was fine, but the internals I implemented in a much cheaper way: instead of reflection and binary.Read, it simply casts the data. This increased performance at least tenfold; I don't remember exactly how much better it became, but it was at least 10x. Suddenly I was able to process all incoming flows without dropping a single one; /proc/net/udp showed no more drops, which was a good achievement already.

So how does this decoder work? As I said, we're casting. When you read from a socket in Go, what you actually get is a byte slice, but you can't get the raw address of the data a byte slice points to, because the byte slice
is actually a struct that has unexported fields, and one of those unexported fields is a pointer to an array. That pointer address is exactly what I need, but I can't get at it because it's unexported. So I can't work with the byte slice directly. What I do instead is copy the data into a fixed-size array, because from a fixed-size array I can get the address straight away using the unsafe package. Then I just have a pointer to some memory, and I can use the unsafe package to cast the shit out of it, basically. I have a struct and I say: here are 1500 bytes, take the first few hundred for this struct, this struct is exactly that many bytes long, and just reinterpret them. It's like putting a template on something and cutting the pieces out. And this is without copying any data. This is how one would implement it in C anyway.

So that's why the NetFlow decoder performs so well: it's not copying any data around and not using any reflection. And I guess there's not much room to make it any faster than it is; if you have ideas on how to make it even faster, feel free to send me pull requests.

Let's talk about the database layer next. So what's actually in a flow? How do I represent a flow? This is what a flow looks like; there's plenty of data in it. You might be wondering about certain data types here. For example, let's take the protocol field: it's a uint32. But in an IP header the protocol field is an 8-bit field. Why am I wasting so much memory here?
The reason is that this data structure actually comes from a protocol buffer. Protobuf is a binary format from Google that I use to serialize the data onto disk, because it allows me to do that with very little CPU overhead, and it doesn't offer me anything smaller than uint32. Well, I could use a bytes field, but there is no uint8. Maybe one day I can change it to use a bytes field, since that's 8 bits as well, but for the moment that's how it is. Wasting memory, I know, but this was the best way to implement it for now.

So this data needs to be stored in a way that we can find it quickly, and also so that it can be added to the data structures without too much CPU overhead, because again, we're doing this a few thousand times a second. The main variable that holds the whole database is of the type up there called flowsByTimeRtr. It's a map of maps; a map in Go is basically a hash, so it's a hash of a hash. The first index, an int64, is a timestamp: what time are we talking about for this flow? The next index is which router reported the flow. That's represented as a string, and the reason is that an IP address, the net.IP type, is a byte slice, and a byte slice cannot be used as a map key in Go. I've heard you can use a struct that wraps the byte slice, but I haven't implemented that yet, so for the moment it's a string. Probably not the best solution, but fast enough for now. And what this thing points to is a TimeGroup, and that's where it gets interesting: for each and every attribute that we have in our flows, there's a field in there that's a map keyed by the type of that attribute.
So for example, let's take the protocol: the map is keyed by the protocol number, so a flow with protocol 6 gets inserted under 6, and the same value is used later to access the map. A flow comes in; we have a timestamp, we have a router, so we find this TimeGroup thingy. Then it says the protocol is 6 or 17. Let's say it's 6, so we access index 6, and there's a pointer to an AVL tree. Why is that, and what are we actually storing in the AVL tree? What we store in the tree is a pointer to the flow itself. That means all flows that fulfill the criteria, same timestamp, same router, same protocol, get thrown into this one tree. So if you come with a query asking: can you give me, for this timestamp and this router, all flows that were UDP? I can find the tree in constant time and traverse it in linear time. You can't make it much faster for querying; that's why it's organized like this.

So why did I go for a tree? For those who don't know: inserting into a tree has a runtime of O(log n), where n is the amount of data already in the tree, and searching takes O(log n) as well. Wouldn't a hash map be better? With a hash map you can insert in constant time and find things in constant time as well. Sounds better. No, it's not better in this case, at least not with the map implementation in Go, because the hash map has to be copied when it grows above a certain size. You ask Go for a hash map and you add data; it fits until a certain point, then the map becomes too full, too many collisions pop up, and the Go runtime realizes it needs more space for the map. What it does is copy everything over to a bigger memory allocation. That costs CPU time, and it's spiky.
It doesn't happen all the time: you add a thousand elements, five thousand, ten thousand, and suddenly, boom, it has to copy everything and the CPU goes to 200%. And during that time I was again failing to decode the packets coming in. Of course I could have increased the buffer space once more, but I just went for a tree instead, which gives the program more stable behavior.

So for a query, as I said: you come and ask, here's timestamp 1234, here's the router ID, blah blah, and I want everything with source address 8.8.8.8. Let's forget the last three attributes down there and say the query stops at 8.8.8.8. We have this map of maps that lets us find the tree containing all flows with source 8.8.8.8. So what you're asking for is basically that one subtree at the top left. Query done; simple thing.

But usually you have multiple criteria you want to query for. In this example you want to query for all flows that came from source address 8.8.8.8, whose protocol was UDP (you don't care about ICMP, TCP, GRE or whatnot, you just want the UDP ones), whose destination ASN is 123, and whose ingress interface on the router has interface number 5. That's your query criteria. So how does it work? Again, we can find these four trees: for each of these four attributes we have a tree representing all flows that fulfill that criterion. We have four trees, and we have to find the common elements of the four. So what we do is take the left tree and traverse it, and for each element we find in it, we go to the second tree, do a lookup, and check: is this flow also in this tree?
If it is, it goes into the intermediate result tree down there in the middle left. If it is not, you skip it and go to the next one. Once you're done with those two, you do the same with the two trees at the top right: you traverse the left one, take every element, do a lookup in the right one, see if it's there or not, and build the intermediate result tree. Then you have two intermediate result trees, you find the common elements of those again, and then you have the final result: all the flows that fulfill all the criteria you asked for.

So how does this perform? Looking up in a tree, as I said, is O(log n). How often do you do that? For two trees, n times, so it's O(n log n). And you do it in parallel: what tflow2 actually does is intersect the top-left pair of trees in one goroutine and intersect the top-right pair on other CPU cores at the same time. Then it takes the two results and builds the final result tree with the common elements.

So how long does it take? O(n log n). It could be faster with maps again, but the problem, as I said, is that the maps get copied around and use too much CPU. I would have to implement the map myself and somehow make it better. But here's the issue: when you use a hash map, you have to reserve some memory for it, and the amount of data that goes into these trees can differ a lot. It can be one host somewhere on the internet that nobody talks to, just one guy with one packet, and it gets its own tree with a single entry. And there are other IP addresses with plenty of entries, because there were plenty of flows, some Google frontend web server or something, and that has thousands of entries. So what is the ideal size to reserve, in terms of memory, to avoid copying around?
I don't think I'll find a better solution than what's in the Go runtime anyway, so probably the best answer is to go for a tree.

As I mentioned in the beginning, NetFlow is based on this templating thing from version 9 onwards. You receive the templates on a regular basis, and they explain how the data has to be interpreted. So the program has to keep track of the templates it has seen, and whenever a flow comes in, there's a template ID, and with that template ID it has to look up in a cache: what is the template, so it can actually read the data. The first approach was: well, let's have a map, use a string as the index, and put the template records in there. To access it I need to build a string, the template key, and as you can see, what I did in the beginning was use the fmt package and Sprintf to create that string. To find the correct template I needed three pieces of information. First, the router ID, because template IDs are only unique per router, so I need to know which router we're talking about. Then the source ID, because you can have multiple sources reported in the NetFlow header; for us it's always zero, in my use case at least, but it needs to be in there. And the template ID itself. So I concatenated these three things. And yeah, guess what?
This was too slow. It was eating up a lot of CPU time and really lowering the speed of adding data to the database. So the second approach was nested maps. I just said, okay, let's have a map of uint32 (the router ID) to a map of uint32 (the source ID) to a map of uint16, which is the template ID, and put the template records in there. This spares me converting the router's IP address from net.IP into a string in the first place; it saves me converting a uint32 and a uint16 into strings; it saves me concatenating the strings; and it saves me allocating memory for the string. So I'm basically not doing anything extra: I just take the data I have anyway, reinterpret it as integers, and use it to access a map in constant time. Much faster.

File storage layer next. At some point I thought: keeping the stuff in RAM for half an hour or an hour isn't enough; maybe I want to keep it on disk for some time. My first approach was to use the encoding/gob package. Gob is Go binary; it's the default way to write binary data to disk in Go. Turns out, again, it's too slow, for the same reason as the binary.Read thing: it uses reflection. So the second approach was to try Google protocol buffers, and protocol buffers were more than 50% faster, using 50% less CPU time, basically. And the nice thing is, it can and will be reused for the upcoming gRPC API for automated queries against the tflow2 system. At the moment tflow2 only supports queries from the web interface, but soon there will be an API that you can use with gRPC to query data out of tflow2. And gRPC uses protocol buffers.
So having the protobuf there already makes the API a very low-hanging fruit to implement.

Now, having solved the CPU time issue, I had a new problem: we have about seven gigabytes of data per day coming in, raw and uncompressed. Writing it is okay; seven gigabytes over a day isn't much, a few kilobytes per second maybe, or even less. But the problem is when you run a query that touches a day of data: you have to read seven gigabytes from disk, and that can easily take at least two minutes. Two minutes was too long for me. And seven gigabytes per day is also very limiting in terms of how long I can keep information on disk, because disk space is not endless, at least not at Freifunk Rhineland.

So what did I do? I started compressing the data, because thanks to all my previous optimizations I had some CPU time left over, and CPU time was cheaper than more disk I/O. I thought: maybe we can compress it and then write it to disk; that makes the data smaller, so it needs less time to write and to read. It turned out the effectiveness was a factor of 3.5: instead of seven gigabytes per day I was writing two gigabytes per day. Which means a query over a whole day doesn't take two minutes but something like 20 or 30 seconds, which I found quite good. And the CPU cost of the compression and decompression was actually not very high; I couldn't even see it in the CPU graphs on the server. It came at basically zero cost, let's say.

I also implemented the annotation layer to get the ASN information I mentioned in the beginning. There's only the bird annotator for now, but if you have other sources of information that you want to put on your flow records, to make queries by things like customer IDs,
project IDs or whatnot, this is the point where you can hook in your own module to add more metadata to the flows. At the moment, though, the whole tflow2 system is not implemented in a way that makes it easy to add new attributes; it would mean editing all the types and structs in the program. I think I'll rework this quite soon to make it more flexible, as flexible as the NetFlow protocol itself, because right now the set of supported attributes is fixed in tflow2. But I guess I will change that quite soon.

Anyway, for now there's just the bird annotator. What it does is talk to a bird instance over the Unix domain socket, the same one the birdc command line interface uses when you interact with bird. This is done on purpose, to get detailed information on routes. For example, you have a flow coming from 8.8.8.8 and you want to see which prefix in the routing table was actually used to forward that packet. So we run a "show route for 8.8.8.8" against bird, and bird tells you: here's the prefix, and by the way, here's the AS path, and here's the destination ASN. It does all these lookups and puts the information together; to be precise, it gathers the source ASN of the flow, the destination ASN, the next-hop ASN (so the ASN of the next router the packet was sent to), the source prefix, and the destination prefix. And of course it caches the results, because you don't want to run to bird all the time saying "show route this, show route that" a few thousand times a second; that wouldn't perform very well. So we cache this.

So, tflow2: state-of-the-art flow analysis? I'm not sure. For me it's the best available open source, easy-to-set-up solution at the moment, let's say. It was my first released Go project.
You can find it on github.com/taktv6/tflow2. Originally I released it on github.com/google/tflow2, but since I haven't been a Google employee for, yeah, nearly a month now, I forked the project into my own repository. It includes very efficient NetFlow decoders; it keeps flows in RAM for a configurable amount of time and stores them on disk in a very efficient Google protocol buffer format. We have a usable web-based user interface now for easy creation of queries, and we can annotate flows with BGP information from bird. The only downside I see at the moment is that the API is still somewhat ugly, but gRPC is coming to the rescue soon.

We have some known issues at the moment. For example, people keep poking me: hey, what about sFlow? And I'm like, yeah, sFlow isn't supported yet, but maybe it will be soon; I have to take a closer look at the protocol, but I don't think it would be too hard to implement, so that's coming for sure, quite soon.

One real issue: when you reboot a router, the IDs of the interfaces can change, and tflow2 is not keeping track of that. tflow2 only reads the interface ID, and that interface ID, as reported by the router, is what gets saved to disk. So maybe at this moment interface five is eth0; you reboot the router, and suddenly interface five is actually eth1. tflow2 doesn't notice that, and when you query for interface five, what you see is that in one moment there was this traffic and in the next moment that traffic, because it represents different interfaces. So you have to be careful when reading this data across a router reboot. I have quite some ideas for how to work around these problems, but we're not there yet. Ah, good idea.
Yeah, that could work, because that would tell us the mapping from the router: which physical interface currently has which interface ID. Good point.

The API that we have is quite ugly, there's no documentation for it, and there's no guarantee of stability, as it says in the README anyway. So don't be surprised if with the next version your queries against the API don't work anymore. And in some places there are suboptimal data types. But for the moment I think it's fair enough, because it works quite well, and much better than everything else I'd seen before.

So, lessons learned during development. If you program for performance: make sure you avoid reflection. Make sure you avoid memory allocations and copying data around; every copy operation costs you CPU time, and if you have to allocate memory, that costs even more. Avoid converting data, especially to strings: if you can just use values as they are to index a map or something, don't convert them just to access data. It doesn't make sense. And I think it's always worth it to compress data before you write it to disk. It's always faster: CPUs these days are so fast, and disks are still so slow, that it makes sense to always compress. And use the unsafe package if necessary. This trick of casting the information out of a byte array requires me to break the type safety of Go, and that definitely needs the unsafe package. If you need performance, don't be shy about making use of the unsafe package, even though it is called unsafe. You just have to be careful about what you're doing and test it well, and then it's fine. And Go is a great language for parallel workloads, distributed across multiple CPUs on the same machine.
Make sure you make use of that wherever you can if you aim for performance.

And now I want to do a small demo of tflow2, based on the Freifunk Rhineland backbone. No worries: you will not see any flow information showing particular IP addresses, because of data protection and such. I'm not allowed to show that to you, but I can show you some other metadata. What I want to show you is a breakdown on our DE-CIX interface: where is traffic actually coming from, from which autonomous systems, who are the peers that send us traffic on the DE-CIX port, and maybe also where it is going to, to which ASN.

So let's say eth2, VLAN 4081; this is the ingress interface. So as the router we are asking for... let's reload this. This makes no sense. So we go for the Frankfurt border router, we select the interface, eth2 VLAN 4081, and we want a breakdown by source autonomous system number. We want the top 15 flows, and the rest gets aggregated into one big chunk. And then you run it.

We see this fancy graph that tells us we have around 1.8 gigabits per second of traffic at the moment on the DE-CIX interface. And we can see that one AS is clearly dominating the ingress traffic into the Freifunk Rhineland backbone, at least on the DE-CIX interface, and that is AS15169, which is nobody less than Google. So the people on the Freifunk network are just using Google services all the time.

What we can also do is go here and say: we only care about this one, and what we actually want is a breakdown by protocol. So now we only see traffic from Google, and we see which protocols are actually used. We can see it's mostly protocol 6, and protocol 6 in IP is, as far as I know, TCP. And 17 is UDP, and protocol 1 is actually ICMP. So here we can now see what types of traffic we have. What we can also do is add another breakdown.
Let's say source port. And what we see now is that the top combination is protocol TCP with source port 443, because this is all HTTPS connections. People are probably watching YouTube, over HTTPS over TCP over IP over our DE-CIX port. And this is the kind of thing you can easily see using tflow2.

What we're actually analyzing here is a big chunk of data: I think currently we have something like 50 gigabytes of flow data in RAM, and that's data for up to one hour. Currently we're not saving data to disk for the Freifunk, because we don't need it. For us it's enough to be able to look back one hour, see from where to where traffic was actually going, and then forget about it. So no worries, we're not saving your data for too long: we save it for the time window we actually need, like one hour, and that's it. It's kept in RAM, so if the power of the machine goes, it's all gone; we don't know anything anymore.

So, thank you for your attention. You can find the program on GitHub. I'm open to feature requests; if there's anything missing that would really help you, feel free to send me your feature request. And if you want to send me pull requests, I would appreciate that even more, obviously. If you have any questions or comments, I'd invite you to share them with me now. No questions? Zero? Well, okay, then thank you.

Thanks a lot. It's actually been pretty interesting to hear, since I'm working on an open source project, unreleased so far, which does pretty much exactly the same thing. It's not quite as far along as yours, but it's a working prototype. One thing I've been wondering is how you're handling incomplete flows. Like, how is the NetFlow export handling incomplete flows: is the router sending them on completion, or is it sending them every few seconds?

You mean flows that haven't been terminated yet?

Yeah, exactly.

So it is exporting them every 60 seconds.
That's how we have it configured: even if the flow is still active, it gets exported, and then the byte counter and the packet counter get reset, so the next export doesn't report the bytes from the previous time window again.

I see. So in tflow2 you're just aggregating it as it comes in?

Exactly. Yeah, I'm not aggregating them over time; for every timestamp I save the data individually, so I can graph it over time.

So you're not merging them later, once the flow is complete?

No.

I see. One interesting observation has been that a large percentage of our traffic is actually in long-lived TCP sessions, and my first approach was to merge them afterwards, but it quickly got too resource-intensive to go back and merge them later.

So you really wanted to see how much data one very long-lived flow was actually transmitting; you're interested in the volume of the traffic?

Yeah, exactly. It really skewed the statistics, since it was all accounted for at the moment the flow completed. So yeah, really interesting. Thank you.

You're welcome.