I'll use the working keyboard. I have a hobbled laptop: the internal keyboard shared its insides with a glass of orange juice two days ago, and the talk had to be rewritten subsequently. So here we go.

The title, obviously, is Fashionably Late. I'll explain; actually, you'll probably figure out over time what I mean by "fashionably late" with regard to round trip times. In any case, this is a discussion mostly of what's going on in the murky underwaters of networks that have real load on them and experience real problems. I found, through a couple of discussions with folks at school, that the research crowd generally doesn't regard analyzing round trip times as anything more interesting than fancy histograms, smoothing, that sort of thing. I thought it was more interesting than a lot of folks did. So I decided to sit down, get some gear, and start pinging stuff really fast.

So we're going to talk today about something I'd colloquially refer to as active probes, an entire field, if you will, of ways to analyze what's going on in a network. That is, when you don't have something you can continuously observe, you need to go get the measurements routinely, over and over and over again. Hence the name: active probe.

OK, so, generally, and I've almost talked halfway through this already: we're going to go quickly over the concepts behind active probing. What you can expect from it, things you have to be aware of, things you just can't do with it; basically, known problems that are well understood, but that I want to communicate before I display any kind of real data. I also want to discuss the operational and practical aspects of doing this kind of analysis work and data gathering. We'll also talk about how to identify, looking at some graphs from an enterprise network, basically my school, what features you want to look for in your own graphs if you do some sampling, and what these things really mean to network administrators or network engineers. I'll also broach a larger topic: this entire talk is really an argument that something we sample on a routine basis, like network delay, should be looked at as a signal. It's not just a datum; it's a signal, so to speak, a data set, and if we interpret it as such, we find there are a lot more interesting things to be seen, or at least I find there are, than any other way. And we'll of course have the required conclusion slide, which is useless.

So what are active probes, really? Essentially, we want, in this case, to determine the round trip delay between hosts. Why that's important is what we'll see later. But essentially, we have to actively probe the network itself to determine its state, over and over again. We could try to infer it instead. We could look at a web server chained through routers to a dozen or so client machines, ramp TCP between the server and the clients, FTP transfers and that sort of thing, and try to gauge the gap that exists between packets, or the time it takes to see eight packets go out and one ACK come back if we know the congestion window we're operating at; basically, use existing sessions to infer the delay between hosts. Those methods can certainly work.
But I don't think they can reveal the delay state 300 times a second; maybe two times a second you could determine it by watching TCP. I want to know the delay state, inclusive of all the hops between me and a node, much more rapidly. So an inferential, flow-based approach, watching pairs of hosts communicating, I don't think is the appropriate way to go. Another problem is that we want to look at specific characteristics between a source host and a destination, or a collection of destination nodes, and flows in the real world don't work the way we'd necessarily want our experiment to go. We wouldn't know that a person is going to keep downloading an ISO the entire time; they may start it and the transfer may subsequently error out, they may time out, they may drop off the internet entirely. We can't base any kind of routine measurement on those types of traffic flows for more than maybe ancillary data gathering. We also want to know that we'll have enough data to look at. We may see that on any average pipe from an ISP or a dial provider, the transfer time for any one person's TCP activity, web surfing maybe, is 200 or 300 milliseconds, and it may involve a couple hundred packets, or three or four. There's just no way to know we'll have enough long-lived traffic to make any use of.

As far as pros of active probes: the sampling is very straightforward and very easy. You write a program, or use existing ones, that just spews packets out, and you wait for the replies, at least in this case. There's relatively low overhead; you're not hoping the TCP stack can do its segmentation, do its math to increment sequence numbers, and all the other algorithmic overhead that just sending an ICMP echo request doesn't have. You can also control the probe characteristics yourself. You can control the rate of sending, whereas TCP, unless you make modifications to it, will attempt to ferry data between its buffer and the far host's buffer as quickly as possible without, of course, incurring too much loss. We don't want that; we want a committed information rate between hosts, not a maximum information rate.

Downsides: if we probe quickly, we obviously add load to the link. We have to gauge our own probing with respect to how much bandwidth we know to be available between the hosts we care about, or can at least reasonably guess is available. We could very easily overrun a constrained link and get useless samples as a result, or cause undue amounts of error in our results, unless we gauge this carefully. I've found in generic testing that 100 to 200 packets per second is not all that damaging, especially if they carry 8-byte ICMP echo request payloads; we typically don't see more than 5 or 6 kB a second on a link when I do tests like these.
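To make that overhead figure concrete, here's a quick back-of-the-envelope in Python. The header sizes are the standard untagged-Ethernet, IPv4, and ICMP ones; the payload size and probe rates are the ones from the talk.

```python
# Rough wire overhead of an ICMP echo probe stream at the rates discussed.
ICMP_HEADER = 8    # bytes: type, code, checksum, identifier, sequence
IP_HEADER = 20     # bytes, IPv4 with no options
ETH_HEADER = 14    # bytes, untagged Ethernet

PAYLOAD = 8        # the 8-byte echo payload mentioned above

for pps in (100, 200):
    wire = pps * (PAYLOAD + ICMP_HEADER + IP_HEADER + ETH_HEADER)
    print(f"{pps} pps -> {wire / 1000:.1f} kB/s of echo requests")
# 100 pps -> 5.0 kB/s, 200 pps -> 10.0 kB/s; the replies add roughly
# the same again in the opposite direction.
```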
One last issue is that when you want to send packets on a known interval, it's not necessarily true that your host's IP stack will honor those requests. Generally, ICMP behaves best here, out of ICMP, TCP, and UDP. There can be plenty of well-intentioned machinery going on that buffers enough data to send a much larger chunk more efficiently; there can be a lot of algorithms in your way. With ICMP, at least, you're at the resolution of a userland process dumping some data somewhere, which on an unloaded machine with recent kernels, as I've noticed, really isn't too bad. So we don't have to go to the extreme of constructing, or even configuring, a real-time OS to run our sampling applications, although those should certainly provide even more accurate results.

I also want to focus a little on something akin to what Ofir was referring to earlier, using ICMP header details to nail down, that is to fingerprint, a host's characteristics. In the way a machine replies to these rapid-fire pings, you can definitely find that for a lot of systems, replying to ICMP is not a big priority. That's actually a very good trait, but it's also the very trait worth looking for. If you had five router vendors, with different engineers in each camp, they're certainly not going to decide how to deprioritize ICMP replies the same way five times. You may see some routers use a scheduler that on a specific interval always replies to management traffic, which may include whatever ICMP messages are queued to be sent at that time, routing messages, that sort of thing. The main focus in a router, obviously, is to pass user data, the things that make the ISP money. Pinging a router doesn't make an ISP money; transiting customer data does. So between Cisco and Foundry and Juniper, there are certainly going to be differences in how they handle management traffic. And while some may say that's the very reason we don't want to look at this data, I think the opposite: it should be very unique to each router vendor, and possibly within a classification or category of router as well.

OK, so when you look at network delay, what you'll see upon sampling a path between, say, yourself and yahoo.com is that most of the values come back as some sort of low random noise, a background noise of sorts, and then you'll see larger items, spikes in delay. I'd categorically split the measurements you observe into two groups: small scale events that contribute to the average delay you see, the background noise; and large scale events that contribute to the actual peak delay you may observe, the spikes throughout your sample set. Those two elements are present in the same sample data but come from distinct sources, so I'd always separate them.

I should discuss a little how we're doing this. Here are some example commands I've found to work reliably well. time is a great command to see how long its arguments take to execute, and generally, on anything a gigahertz or faster, you'll find that asking for an 8 millisecond wait really means about 10 milliseconds, reliably, between ping repetitions. This takes some guesswork to nail down: you ping with a set, known packet count, say 1,000 or 2,000 packets, and time how long it takes. The rate at which each packet is actually sent, I've found, has noise below 10 or 12 microseconds.
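As a minimal sketch of that calibration, assuming a Linux-style ping(8) (note that sub-200 ms intervals usually require root, and the target address here is a placeholder): wall-clock a run of a known packet count and divide out the per-probe interval actually achieved.

```python
# Time a fixed-count ping run and derive the send interval the OS
# actually delivered, per the calibration described above.
import subprocess
import time

def actual_interval(host: str, count: int = 1000, nominal_s: float = 0.008) -> float:
    t0 = time.monotonic()
    subprocess.run(
        ["ping", "-q", "-c", str(count), "-i", str(nominal_s), host],
        stdout=subprocess.DEVNULL,
    )
    return (time.monotonic() - t0) / count  # seconds per probe, as achieved

# e.g. a nominal 8 ms wait coming out to roughly 10 ms per probe:
print(actual_interval("192.0.2.1"))
```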
That noise, the self-scheduling noise of your own system, is very, very far below the actual noise you'll see from a router with less CPU than you, or from a busy link. So to anyone wondering whether your own local machine's impact is present in the data: it's probably a very small portion of anything. Once you know what kind of interval gets you the sampling rate you want, how many packets per second, you can just drop the data to a text file and parse out the stuff you don't want. I just run it through awk and say, give me the value times 1,000, so I have microseconds instead of milliseconds. Pretty simple.

So then, once we have data, we want to make some use of it, and here we jump into a kind of operational setting. We'll look, in whatever graphing application we have, at a time series, just a time and amplitude graph, to pick apart the large scale events and the small scale events. We'll also spend time looking for other events that are similar across all of our samples. If you're trying to determine, say, a path impairment somewhere between yourself and some other customer, where you don't manage the routers in between, you may find you can go to each router's IP stack, ping it rapidly, spot the consistencies between the routers, if they're all the same vendor for example, and then find any differences quite obviously.

Some things you'll find in typical data that are of great importance, and great telling, are shark fins, or what a researcher at the University of Wisconsin-Madison called shark fins. This person, Jim Gast, did a lot of this work; he hasn't published his thesis yet, so I couldn't reference anything. He piled through something called the Surveyor dataset, from a project that lasted from, I believe, 1998 to late 2001 or 2002, which involved 60 machines on the public internet pinging each other two times a second and tracerouting every five minutes. It was a full mesh, each of the 60 nodes pinging every other node. With all that cross-sectional data through the network, the public internet at large, you could find similar events that gave you the impression that at some point in the network, some interface or some link got seriously overloaded, and you saw the ripple effect, so to speak: you saw congestion on that link, and you saw other links react to it and become congested, and so on and so forth. The effect was like dropping a pebble on a water surface and watching the waves propagate. The characteristic Jim named, these events where the delay rises significantly and quickly and then trails off quickly, is the shark fin; that's what they look like.

The second characteristic you'll find, when you look at a frequency-decomposed version of the time series data, is specific rings, ring tones so to speak, in the way the delay varies over time. To visualize these things, the two packages I commonly use are Baudline and SIGVIEW. Baudline is a much better Unix package: it works on anything I've tried with FreeBSD and Linux kernels, runs in X beautifully, and has 24-bit visual output, so the FFT results look pretty decent and aren't very aliased.
SIGVIEW for Windows is a little less useful, but it's also free, for a limited time at least, and can do essentially the same things Baudline does in terms of viewing time data or frequency data.

OK, so, long awaited: here are the effects of pinging a Foundry BigIron. This is 100 packets a second, and the time scale you see on the bottom is pretty condensed; we're running through about 10 or 12 seconds. The staircase pattern, and I wish I had a laser pointer to point these features out, is heavy toward the left side. It becomes lighter and lighter as we go through time, and then tends to shift downward again in terms of the values the delay settles on as we sample. That edge is, presumably, the decision point in the router of whether to reply during this interval or the next one. Again, as I was alluding to before, it seems this router vendor has a specific interval at which they reply to management traffic, and this reveals it. Now, this is zoomed in. The delay we see is pretty consistent, minus the random noise and whatnot, but in the previous feature, if you look at the very center, there's an upward-going peak and a downward-going peak, and you can spot that exact same feature in almost the left third of the graph as well.

Another router I had a chance to test was a Cisco 7513. This one is handling three or four hundred LAN segments that come in over multiple OC3s from all over the campus. Recently a feature was enabled to export all the sFlow data, or NetFlow data rather, and the spikes you see on a one-second interval are presumably that activity on the router CPU: some scheduled event every second to free memory, push things across the management interfaces out to a sample-receiving machine, and so forth. If you were to look at this dataset without examining any finer details, you might think the staircase pattern, or any really interesting content, isn't there, hidden by the large deviation in delay; but if you magnify the graph, you'll see the Cisco has a staircase pattern similar to the Foundry's. So again there's a notion of a scheduler in the router, which gives the impression, at least to my eye, that there's an interval on which it can emit a packet; you will always get a response in either that interval or the next one in time. And as my sending clock slowly drifts through time and doesn't quite harmonize with the reply clock in the router, I'll see a variation where I'm hitting it on the money, and then, a few degrees of phase later, off the money, waiting exactly one more interval for the reply. As I harmonize with the router, the staircase pattern grows and then almost immediately decays into noise, and that could very well be the noise of my own sending clock. Going back a slide: ideally I wouldn't have that grass, so to speak, above and below the trend line, but that could easily be caused by a random delay in a queue somewhere on the link, or by my own sending interval being a little bit ahead or behind and missing the router's interval just enough to deviate above or below.
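A toy model of that staircase effect, with entirely made-up numbers: if the router only services management replies on a fixed tick, and the sender's interval doesn't divide that tick evenly, the wait-for-next-tick component of each RTT walks through phase and wraps, producing the sawtooth edges.

```python
# Hypothetical scheduler-tick model of the staircase pattern.
import math

TICK = 0.004    # imagined management-scheduler period, 4 ms
SEND = 0.0125   # send interval: 3.125 ticks, so 1/8 tick of drift per probe
BASE = 0.0002   # fixed wire and processing delay, 200 us

for i in range(20):
    t = i * SEND
    next_tick = math.ceil(t / TICK) * TICK      # reply deferred to next tick
    rtt = (next_tick - t) + BASE
    print(f"probe {i:2d}: rtt = {rtt * 1e6:6.0f} us")
# The residual wait shrinks by TICK/8 each probe, then wraps by a full
# tick at a phase boundary: the descending ramps and sharp edges of the
# staircase.
```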
And here is another example, a T1-connected router. I was curious in general whether these harmonics I was observing were only something somebody sitting on top of a large pipe, like gigabit Ethernet into a university core switch, could see. I was curious whether lower speed links would still have enough, what's the word, temporal resolution to convey these fine-detail changes. And sure enough, another friend of mine who's a service customer, bless his heart, or maybe an outpouring of sympathy, has a 2624 on a T1 from them, and the same kind of harmonization with, presumably, that router's timer for replying to management traffic also exists. So it seems that even though this sample is very noisy, the T1 possibly having more traffic on it than a big fast pipe at school, the trend lines still exist.

OK, so that was the first interesting observation I made while doing these kinds of samples. The second came after a hint from the other researcher, Jim Gast. The data he had been looking at, where these other anomalies occurred, was at a much lower rate, two samples a second; I'm sampling at 100 or 200 a second, so I figured maybe I should be looking for something else in the data. And sure enough, the same kinds of spikes in delay occur in these high-rate samples. As you can see in this graph, we have two of these events zoomed in on: the leading one contains a little noise, then a significant rise in delay, then a fall-off, then some additional noise, and then another event similar to the first. And looking in closer, we can see a very interesting feature of these graphs.

Now, I haven't really explained why the delay spikes like this, but my best theory on this occurrence is that in the typical communication of a bunch of hosts accessing some resources through a constrained pipe, you have the effect of TCP multiplexing onto a narrow bandwidth. The amount of transit you could see from all the nodes in your local network probably vastly exceeds the connectivity you have to your ISP; say, Fast Ethernet over a T1. So with TCP active on 10, 20, 30 hosts, or maybe even a couple hundred on a larger campus, you may find that the rate each person sees from their machine to a far host is dynamic. TCP is always going to try to ramp up and get more bandwidth until it sees loss, and then fall off quickly. The goal of the congestion window in TCP is not to denial-of-service yourself, but to get as much as you can, up to the ceiling, or as near the ceiling as you can approach.

The interesting dynamic, though, if you observe the delay through the same links TCP is communicating over, is that right at the onset of a queue filling, certain special things happen in a router. Queuing in routers is an interesting discipline that I haven't spent more than a few months researching, but from what I understand, a lot of people these days implement something called random early detection, which is a discipline that says, on a router's output interface: if more data comes in over a certain interval than my link rate can carry, sense that threshold before the queue fills, and drop data from the queue at a certain percentage.
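A minimal sketch of that RED-style drop decision, with illustrative thresholds rather than any vendor's defaults: once the averaged queue occupancy crosses a low-water mark, a growing fraction of arrivals is dropped at random, well before the queue is actually full.

```python
# RED-flavored drop decision for an arriving packet (illustrative numbers).
import random

QUEUE_BYTES = 16 * 1024       # the 16 kB queue in the example below
MIN_TH = 0.4 * QUEUE_BYTES    # start dropping above 40% average occupancy
MAX_TH = 0.8 * QUEUE_BYTES    # drop everything above 80%
MAX_P = 0.1                   # at most 10% random drops before MAX_TH

def red_drop(avg_queue_bytes: float) -> bool:
    if avg_queue_bytes < MIN_TH:
        return False          # plenty of room: enqueue everything
    if avg_queue_bytes >= MAX_TH:
        return True           # past the high-water mark: drop it all
    # linear ramp of drop probability between the two thresholds
    p = MAX_P * (avg_queue_bytes - MIN_TH) / (MAX_TH - MIN_TH)
    return random.random() < p
```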
So if you had, say, a 16 kilobyte queue onto a Fast Ethernet pipe, your router, if it sensed that some additional burst of data came through and put the queue over the edge, might try to drop packets randomly within that 16 kilobytes, to introduce uniform loss to all the speakers. The goal is to make the TCP senders quiet down a little, not fill the queue up, and not experience huge jitter or huge delay. Well, the problem is that that works, but it takes at least a round trip time, and then some, for the senders to back off. So while the queue doesn't fill tremendously, it still fills a little more than it usually does, and we observe that here. If I'm pinging an end host 100 times a second, that's a packet every 10 milliseconds that I'm going to send out and hopefully receive. So every 10 milliseconds I'm getting an idea of the state of some collection of buffers in this network, and the interesting result is that when that queue becomes nearly full, I miss samples for a brief window of time, say 100-some milliseconds, or whatever round trip time those senders have between themselves. And then I get my reply back at a very, very high delay; in this case it's 1.4 times 10 to the fifth microseconds, which is pretty high for one sample. And as the queue gradually becomes less full, as the TCP senders get it that they should back off, having seen some loss, the delay drops again. So by looking for these shark fins, one can probably assume that you indeed saw a congestion event. You can maybe characterize what kind of event it was, whether it was a drop-tail queue or a random early detection queue, which are just two sides of the same coin, both ways to make TCP senders back off; or you might be able to say something about how aggressively the queuing disciplines have been tuned.

The artifact this graph shows, interestingly enough, is on the rising, leading edge of the shark fin. It has also been affectionately called, by Jim, the stutter-up. That's where a few of the senders get it initially, because they may have lower round trip times. You may have a mix of traffic involving 10 or 20 or 30 users on cable or DSL who are 10 or 15 milliseconds away, and they realize or notice the loss quicker. The rest of your users may be much further away, 300 or 400 milliseconds, and they don't get it just yet. So what you might see, with that kind of mix of traffic, is the delay temporarily going back down for a sample or two after the initial rising edge. But the other senders haven't backed off yet, the queue fills over the brim once more, more loss has to be induced, and then we see another significant rise, in this case from about four milliseconds all the way up to our peak value. And then, of course, as the queue finally becomes less abused, the senders having gotten the message in time, the delay goes back down. That stutter-up can indeed reveal a very interesting dynamic about the mix of hosts behind whatever buffer got heavily overrun.
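As a rough sketch of how you might pull shark fins out of a sample series automatically (the thresholds here are guesses, not tuned values): flag samples far above a running baseline, then walk forward to where the delay drains back down.

```python
# Naive shark-fin detector over an RTT series, in microseconds.
def find_fins(rtts_us, window=200, factor=5.0):
    """Return (start, end, peak_us) ranges of candidate congestion events."""
    fins = []
    i = window
    while i < len(rtts_us):
        base = sorted(rtts_us[i - window:i])[window // 2]   # running median
        if rtts_us[i] > factor * base:
            j = i
            while j + 1 < len(rtts_us) and rtts_us[j + 1] > 1.5 * base:
                j += 1                     # walk forward as the queue drains
            fins.append((i, j, max(rtts_us[i:j + 1])))
            i = j + 1                      # resume past this event
        else:
            i += 1
    return fins
```

A stutter-up would show up inside one of these ranges as a dip of a sample or two right after the leading edge.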
Some other stutter examples: you can see that the top left graph has a stutter-down. This could be, again, a condition where aggressively tuned RED induced loss non-uniformly, and most of the communications leveled out quickly, but a few took no loss and were still able to ramp up enough to cause an additional small ramp in delay. The other graphs show other variations on the theme. The bottom right shows what I'd call queue resonance, where several hundred or a thousand hosts were active over that same pipe, all ramped up uniformly after the queue event occurred, hit the ceiling one more time to a lesser degree, and eventually settled back out over a few seconds.

OK, so looking at the time series data, just sampling delay and graphing it over time, is a pretty straightforward way to pick out some elements of what's going on. But a more interesting angle on the analysis is the motivation to find a better way to visualize what's going on in that small scale event category. Large scale events are pretty obvious and visually self-describing, but the random changes in delay that are always there may be full of a few more interesting characteristics, maybe tied to the host's IP stack a bit more closely. So it's worth the effort, worth the time, to look at the data through the eyes of Fourier transforms. And in setting out to analyze this delay as a spectral breakdown, we also want to make sure we're not doing a couple of bad things. We don't want to sample too slowly to catch important events; and if we are sampling too slowly, is a rapidly changing event being missed by our sampling, giving us the false impression that it happens at an even lower frequency? Much like sampling an audio signal too slowly, where you hear weird machine, robot-like tones; maybe you've heard this on AM radio when you don't quite match the carrier's beat. The same kind of thing can presumably happen in sampling network events. But I can show a couple of graphs that support the notion that we indeed aren't aliasing ourselves by sampling slowly.

So the first graph we see here is a run of 900 seconds going left to right. The reason there are no scales is SIGVIEW: it's for Windows, and apparently you don't want actual information in your graphs, so you can't turn on labels. Left to right is about 900 seconds, which comes to about 250 samples of delay per pixel on a screen about 1024 pixels wide, so you can get an idea of it. Vertically, the top line, the purple line, is near zero hertz, near no frequency of change, and the bottom of the graph is near 50 hertz. Looking through time, left to right, the color of the graph, from darker yellow through brighter red to bright blue and purple, is intensity: yellow is the background, lowest intensity; purple is the hottest signal we're seeing on the network. The delay being sampled is between a machine I had and a router whose name, or identifying characteristics at least, I'll keep anonymous from the campus's point of view. You'll see that approximately 10, 20, and 30 hertz are frequencies, rates of change, for that sampled delay consistently through time. You'll also see some spreads or bursts of harmonics that go from near zero hertz all the way up to 50, just some random junk caused by queues being filled, a few bars in the graph.
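A minimal version of that kind of waterfall view, assuming you've saved one RTT in microseconds per line to a file (the filename and FFT parameters here are placeholders): treat the series as a signal sampled at the probe rate and take windowed FFTs down its length.

```python
# Spectrogram of an RTT series sampled at 100 probes per second.
import numpy as np
import matplotlib.pyplot as plt

PROBE_HZ = 100                       # probe rate = effective sampling rate
rtts_us = np.loadtxt("rtts.txt")     # one RTT (microseconds) per line
rtts_us = rtts_us - rtts_us.mean()   # remove DC so 0 Hz doesn't swamp the view

# Frequencies are only resolvable up to PROBE_HZ / 2 (Nyquist), which is
# why the 100 pps runs here top out at 50 Hz and the 200 pps runs at 100 Hz.
plt.specgram(rtts_us, NFFT=256, Fs=PROBE_HZ, noverlap=128)
plt.xlabel("time (s)")
plt.ylabel("frequency (Hz)")
plt.show()
```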
Presumably, if one were to characterize saturating traffic load conditions well enough, with say a mix of FTP, NFS, AFS, a host of common protocols, you might find that the way the delay changes and produces those bursts of spectra differs in character depending on the transfers that are going on. So at a hundred packets a second, you could maybe see, if your queues are tuned properly, what kind of traffic is causing real outages or real service problems in your network, and at the same time see what's occurring in the response rates of the IP stacks you care about. So at least in this graph, we can see those three clear harmonics through time, and the dynamics that changed as we sampled for those 900 seconds.

The next graph I couldn't fit on one slide with text, but the description is the same as the first, except I sampled at 200 hertz. I used Baudline on Unix, not SIGVIEW, to do the decomposition. Those top three lines are again the 10, 20, 30 hertz components, but now we have twice the headroom: the bottom of the graph is 100 hertz of energy, so to speak, in the delay. And we see two new components this time, lines around 60 and maybe around 75 or 80 hertz; you can kind of see them toward the middle and the bottom sixth of the graph. I'm not sure where those come from. This is a pretty fast router, but those certainly weren't there before. So with more sampling, presumably, one should be able to build a library, so to speak, of commonly known harmonics from the IP stacks of devices in operation.

So the time-over-frequency breakdowns are interesting for looking at dynamics through a sample period. But what if you just want to look at the pure characteristics of the schedulers, the per-packet characteristics, for a long run of samples all at once? That's the place for a plain two-dimensional, standard Fourier decomposition. This graph is a Windows box sitting on the same lab network on a 100 megabit switch, and we see that the variations in its rate of reply are very, very characteristic and interesting: there are rises and peaks everywhere in the graph that you don't see in other kinds of samples. This next one is another Cisco router on campus, looked at through the same medium up to the interface, with essentially the same processor as the previous routers, and we see strong components at 10, 20, and 30 hertz and a combination of components at every single frequency interval, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, going down toward the bottom of the graph. That, again, is repeatable and very characteristic of Cisco hardware of that era.

Now, as far as host OSes that aren't Windows go, the Unixes are a lot quieter in the harmonics they produce and more consistent in how they reply to ICMP. This is a FreeBSD 4.8 box, actually running -STABLE, if I recall, and it's unlike the Windows hosts and the routers in the way it responds: it seems to reply as quickly as it can afford to, barring other activity on the machine, maybe interrupt activity, that sort of thing. Linux was very dirty: very characteristic harmonics, excuse me, and again unlike the previous examples of routers or other host OSes. This seems to be a robust general fingerprint. This is actually an average of two different hosts behind that same T1.
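One way that library idea might work in practice, purely as a sketch (the device names and data files here are invented): reduce each probe run to a normalized magnitude spectrum and score an unknown trace against the library by cosine similarity.

```python
# Spectral-signature library lookup for probe traces (hypothetical files).
import numpy as np

def signature(rtts_us, nfft=4096):
    spec = np.abs(np.fft.rfft(rtts_us - np.mean(rtts_us), n=nfft))
    return spec / np.linalg.norm(spec)   # unit-length magnitude spectrum

library = {
    "cisco-7500-era":  signature(np.loadtxt("cisco7513.txt")),
    "foundry-bigiron": signature(np.loadtxt("bigiron.txt")),
}

def identify(rtts_us):
    """Name the library entry whose spectrum best matches this trace."""
    sig = signature(rtts_us)
    return max(library, key=lambda name: float(np.dot(sig, library[name])))
```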
So the characteristics you see there are representative of Linux 2.4.20 under typical use behind a T1. The general takeaway from those sorts of methods, I think, would certainly be that the hosts you sample and the devices you test do indeed create interesting and, I'd dare to call them, rich harmonics. The delay you experience per sample is not just a consistent random function; it's very, very deterministic. Also, links and queues will generally create bursts of harmonics, versus hosts creating consistency. Links and queues react differently to dynamism than hosts ever would. A host will still try to reply to ICMP as quickly as it can while it's transiting 20 or 40 megabits, to the best of my ability to test; that's just how it seems to be. Routers are completely different, and queues and whatnot are as well. A queue would look roughly the same at 20 megabits and 30 megabits, but it would certainly look a lot different at 80 megabits if the link it were emptying into were just 79 megabits of bandwidth. A very, very interesting observation from that.

And we can probably argue that to take this kind of research, this kind of observation method, forward will need a lot of hosts in a controlled environment, to really separate what's happening at the link layer, the large scale events, from the consistency of the small scale events. Right now I'm working with WAIL at the University of Wisconsin, basically on an invitational basis at this point, to use their equipment to eventually characterize how a GSR 12000 looks, how a VXR looks, how DOCSIS equipment looks in terms of high-rate active probes hitting its management interface. Eventually we should be able to find pretty consistent types of replies from these devices between categories of routers, classes of routers, and vendors. But even more interesting, I think, is that even if you change how the ICMP implementation works, what fields are set, what flags are set in the packets, the way the scheduler works probably will not change drastically. You could change the cosmetics, as Ofir referred to them, the individual packet attributes he's basing a lot of his decisions on; I'm basing mine, in this case the delay rates, on something a lot more fixed. It's doubtful that Cisco would ever change a scheduler; they may simply change the way ICMP replies look. This should prove to be a long-term, robust method, if we can sample enough devices to really test that.

And as general conclusions, I would certainly say that the internal state of devices is the thing we're measuring here; but we measure the combination of its self-busyness, if you will, the internal state of a router, with the effect of its scheduler. The way the links themselves appear is obviously different, and those two categories of influence, those two spheres of influence, are certainly separate. And also, when we see a congestion event, queue delay rapidly rising and falling off again, the rate at which the fall-off occurs should tell us something else about the path between that congestion event and your machine: that edge rate, the rate of decay on that delay spike, should correspond to the bandwidth of the most constrained link between you and that hop.
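The arithmetic behind that claim, elaborated in the next paragraph, is essentially the packet-pair idea: probes that sat back to back in the draining queue come home spaced by their own serialization time on the bottleneck link. A sketch with illustrative numbers:

```python
# Bottleneck estimate from the spacing of replies that drained back to back.
def bottleneck_bps(probe_wire_bytes: float, reply_gap_s: float) -> float:
    """Serialization time of one probe on the bottleneck gives its rate."""
    return probe_wire_bytes * 8 / reply_gap_s

# e.g. 50-byte probes coming back about 260 us apart suggests roughly a T1:
print(bottleneck_bps(50, 260e-6))   # ~1.54e6 bits/s
```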
If a queue fills up with your probes back to back, it's as if you had emitted the packets with zero delay between them, even though you actually emitted them with 10 milliseconds between them. So if they're back to back in a queue because they haven't been sent yet, and the queue finally empties, they should be played back onto the wire nearly back to back. And the slope of the fall-off I see off that queue would be steeper if the bandwidth were higher, or much more gradual if the bandwidth were lower. So over time, again, we should be able to determine the dynamics between myself and the queuing events, based on link bandwidth we know or link bandwidth we want to detect. Ideally, if we could make one hop get really, really busy, say by hitting it from another path into it, you might be able to determine, based on the fall-off rate from that router being busy, how much bandwidth it had to the router upstream from it, and so on and so forth, until you could disambiguate the path pretty cleanly. Other programs like pchar, or packet-pair dispersion utilities like pathrate, don't really go per-link in enough detail at this point to tell you what the interconnection bandwidths necessarily are. Watching the queue-emptying rate should be darn near close to the true link bandwidth. And the last point would be that by using Fourier decomposition techniques, we can see the richness in the variations in the small scale time domain, or the small scale amplitude domain, and have it exposed in a way that's more uniform and more specific to the host we're targeting than the time series views alone.

So that would probably be everything I wanted to say about the topic. Are there any questions or commentary? Yes. Sure. So the question, which you probably all heard, was: what hurdles are in the way of making this actually work on a real network to fingerprint hosts? That would mostly be access to equipment at this point. The results you see here are things we've tried on a Sunday morning at 3 a.m., a Saturday at 6, a Monday at 1:30 in the afternoon, and the same sorts of harmonics were seen outside of the burst usages. That is, you'd see periods of near saturation, that sort of thing, but the general trends were still there. So really, at this point, I think the sampling methods and the analysis techniques are pretty solid. I would just need more access to gear, plain and simple, to build a good enough characterization. There are a lot of things that need to be toggled. On the Cisco gear, I don't even know if these devices were running CEF or not; Cisco Express Forwarding could change the types of replies I'm seeing very drastically. So this really needs to be looked at quite thoroughly to make a large enough sample, to generate a large enough dictionary to name a host, so to speak, in future probes. But that certainly isn't impossible. So that would be the big one; at least in this lab the gear exists, and I basically need to do the work.

Yes? I have not tested that scenario specifically yet. I've only used Linux as an end host, and I wouldn't even know where to begin in gauging how its IP stack would look temporally on the wire. That's a good question, thank you. Yes? What was the method? Switching methods? Oh, okay.
So the first question was: do I think I can reveal the action of someone changing the type-of-service field in the IP packet? I'm not sure it would be revealed. At this point, a lot of the core routers in the United States don't honor the type-of-service bits anymore; they forward the packet as it came in, and they'd probably drop it just the same if they had to. In the voice-over-IP tests I've done, with trunks from California to New York and back and forth, the user impact of honoring type of service hasn't, in my experience, been very great. I know that within a lot of enterprises, DiffServ, and things that tag packets on ingress at the borders so they can be acted on through the core, are done. I can't say that I've seen that in action on the public internet at large, and consequently, I'm not sure a high-rate probe would reveal it, unless you were maybe modulating the type-of-service bits per packet and doing tests with type of service zero, one, two, three, four, five, to see if there was any distinct difference. That would probably be a great analysis method. I haven't tried it in the backbone yet, based on, as I just mentioned, some assumptions that may not be true, so thank you.

The second question was, oh, I'm sorry, refresh my memory. Sure, switching methods. I have seen some minimal differences with T1s that I know to be frame relay somewhere between the end host I'm sampling and my machine. A lot of times it's ATM to some aggregation router that breaks out the frame relay to the customer's equipment. With the combination of ATM and frame relay as media access methods, I think the link's effects on the temporal skew are pretty minimal. It has a lot more to do with queues, and a lot more to do with the IP stack itself, so I'm not sure you could cleanly detect it in anything more than maybe seeing whether you have multiple paths. Specifically, with CDDI or FDDI between Cisco routers, I've found that with certain protocols, IPX being one, you don't get perfect sequencing: two or three packets get transposed. If you were looking at that kind of thing on one of these graphs, you'd see a very, very noisy output. I haven't yet tested the robustness of these methods with CDDI or FDDI, namely because we don't have the gear anymore, but knowing a lot about the MACs that are out there, you probably should see some large differences between them.

Yes? Generally, ethernet switches are by and large the quietest devices I've passed traffic through; a crossover cable doesn't look much different. How much time do we have left? Five. Hit me up after the show and I can show you some other graphs online. Sure, as far as knowing what type of switch is in place, you'd have to have a good enough sample set, with actual cross traffic on the backplane of the switch, to disambiguate that. I've found that if a switch has just two hosts on it, that's the most ideal condition: you get pretty darn near wire rate, with a few microseconds of forwarding delay, and that's pretty consistent. But if you have cross traffic, that's where things get very, very interesting, because how a vendor multiplexes access to the backplane bus, even if it's a virtualized matrix on a core ASIC somewhere, is still per clock cycle, still a rigid function that's going to be exposed by high-frequency samples. So with testing, it should be visible.
Would I go out on a limb and say I've done more than Cisco? Unfortunately not. I just don't have a lot of gear. It would be very interesting to go through Xylan, Bay, Lucent, all the big names whose switches you'd actually find in colo facilities or core environments, and do a good cross-sectional study of the effects of port pairs with cross traffic present; but not yet.

So there was one more, is it over here? Yes, sure. So the reason that got in there was that when I was writing my presentation, we had a 2550 with layer three. And with layer three enabled specifically, that device induces different types of variations at high frequencies when the ports you're testing through are in the SPAN group. I had originally not disambiguated that: the problem was, I thought that on any chassis with a SPAN port configured for any traffic, this would be revealed by a pair of hosts communicating through it. It turns out it's only when the port you're on is being spanned. The effects are basically higher frequency variations, some wispy noise so to speak, 70 to 80 hertz typically. That is detectable, so to speak, but it's not robust, and that's why I didn't include it in my presentation. You have to have a really quiet environment to see the two sides of it being on and off. So unfortunately it's not really detectable from, say, an OC3 through an aggregation router in some larger center, through a colo, to that device.

Yes? Well, we couldn't turn it on and off in the university's core routers. But in the lab, on our test LAN specifically, I've seen on other routers that they typically have these rough events on one-second or one-minute intervals that do some cleanup work. The Cisco I showed, with the big spikes orders of magnitude above the average low latency, those samples were presumably from the NetFlow export. Other elements that I didn't show, but you're welcome to see when we're done, show one-minute intervals of something happening on the management interface, and that something turned out to be OSPF SPF calculations on a Riverstone. There's a bigger one every hour, too, that even drops packets, which folks at other networks have reported; I haven't observed that myself, but it would certainly show up in these samples as well. But yes, depending on how the CPU and the router are architected, each individual ICMP probe could be a new flow, and if that's a heavyweight operation, it's going to compete for CPU with a routing protocol that, you'd think, should have more priority. So that should be visible in a very interesting way, but it needs to be disambiguated more. I just haven't had the opportunity to turn this protocol on, that one off, do this redistribution with that protocol, that sort of thing, yet. But that's certainly in the works, and maybe next year we'll have it. Any others? Thank you very much for coming.