Welcome to another edition of RCE. Again, this is Brock Palen. You can find us online at RCE-cast.com, where you can find links to subscribe, the entire back catalog, and links to all of our Twitters and blogs and all that sort of fun stuff. Once again, I have Jeff Squyres of Cisco Systems, one of the authors of Open MPI. Jeff, thanks again for your time.

Hey, Brock. Yeah, it's nice to be out in front of our recording schedule again. We've actually got a few queued up and we're back on a regular schedule. This is good. We got kind of screwed up over the Supercomputing and holiday season, and it's taken us a little while to recover, but now we're back, and this is a good feeling. What do we got for today?

So today we're going to be looking at one of the things people love to blame their problems on: networks. How do we actually test networks, and what are the toolkits for supporting that?

Yeah, yeah, you work for a network provider, and people love to blame the network. So we're going to be talking to Jason today from ESnet, one of the contributors to perfSONAR, which is what we're going to be talking about. So Jason, why don't you take a moment to introduce yourself.

Hi guys. So I am Jason Zurawski, and as Brock said, I'm a network engineer from ESnet, and ESnet is the Department of Energy's network provider. We connect up all the federal labs, and starting over the last year, we've started peering directly with universities to support our science missions. We're a mission network, which means that we care deeply about being able to deliver on the process of science for all of our different end users. Because of that, making sure networks work is actually one of our main jobs. So the thing that we'll be talking about today, perfSONAR, is something I've been involved in for about 10 years. It really helps us make sure that we're running networks properly and gives us the tools that we need to support the scientists.

So give us an idea of what perfSONAR is.

So simply stated, perfSONAR really is supposed to be measurement middleware, and I'll try to unpack that a little bit. We already have lots of tools that can tell us how a network works. We have things like ping, or things like iperf, that can give us a network metric, and these are usually invoked by a user sort of on demand from their end system to some other remote location. We also have lots of passive monitoring tools as well, like SNMP, so that we can get counters from different network devices. But what we found is that there's not a way to share this information very cleanly. A test that somebody makes on their own, they keep that information, and it's not widely shared with other users. So having measurement middleware, something that enables this sharing, has been a huge win in this space, because now we can invoke tests on different end sites, we have a way to share them in the middle, and then we have a common platform that allows us to visualize and alarm. So when we have a problem that somebody has perceived on the network, we can identify it a lot quicker and hopefully get to a resolution a lot faster. And I think the last thing that perfSONAR really is, is a tool to raise user expectations. Many people aren't aware of how well they could be performing on a network with whatever tasks they're trying to do, and perfSONAR helps to level the playing field with regards to that.

So if you're the provider for all the DOE labs, it seems like you're one organization between all of it.
So you mentioned the capabilities of sharing the results and things like that. Why is that important to you?

Well, just sort of to noodle on that a little bit, and this goes a little bit beyond just DOE labs. If we think about what a common science use case is within the DOE, it may originate from a lab where we have an instrument, something like a light source or a genomic sequencer or something like that. There are going to be users sitting at universities, or even commercial providers, anywhere else in the entire world, that are going to need to have access to that information. So they're going to need to use a network, and the network is just as much an important instrument of science these days as the actual device that's creating this data. This network needs to be running cleanly and efficiently, and there are going to be many domains involved in the problem. To really, fully understand it, we need to have visibility, and this visibility comes from a tool like perfSONAR. We can set up these regular end-to-end tests across domains, and we can then look at the results and either (a) determine that there isn't a problem, because if things are looking okay, then there shouldn't be a problem for the end user, or (b) get in there and debug things when we start to detect that there really is something going on.

So beyond ESnet and ESnet's network between the DOE labs, there are also people like Internet2, and XSEDE uses Internet2, and a bunch of local networks, and they use perfSONAR and such. So there are a lot more people involved with perfSONAR. Can you give us a little bit of the history of it?

Sure. So the original idea of perfSONAR came out of a couple of different efforts, probably in the early part of the 2000s. There were really five different efforts that were looking at a way to create a standardization on network measurement. There was an effort in Europe called the GÉANT2 JRA1 project, which was funded by the EU, and it had 10 members from the different NRENs located in Europe. There was the Internet2 End-to-End Performance Initiative, which was in the US R&E space. There was the Global Grid Forum, which was trying to just standardize measurement formats, not necessarily looking to develop software. There was ESnet, which was trying to roll out a new network at the time. And then there was also the RNP network in Brazil, which had its own monitoring effort called MonIPÊ. Each of these sort of met within that span of the early 2000s, and they decided that, you know, going it alone was silly, so it was time to really combine into one effort. The original software was released shortly thereafter, I think probably in about 2004, and it was originally written in Java. Since then, over the past 10 years, there have been many more implementations by these core partners as well as others who have received soft funding. And this has created what we call today the perfSONAR toolkit, which is sort of the flagship product that we release. This is a complete Linux distribution that also has all the perfSONAR tools, some glue in the middle to help with the administrative tasks, but it really gets to the core of what the mission should be, which is evaluating network performance and trying to evaluate it in such a way that people can understand it.

All right, so let's get into some of the details then. So you say there's a whole Linux distribution around your common core. What are some of these core tools?
I mean, what kind of things are you testing?

So it's going to be some of the basic tools that many people are familiar with, something like traceroute or tracepath, the ability to look at a layer-3 hop count between two different end sites. We also have things like ping as well, a way to measure round-trip latency. But there are a couple of other tools that people may not be as familiar with. One of them is called OWAMP. This is the One-Way Active Measurement Protocol; it's written up in an RFC (RFC 4656). Using a tool like this, we're able to measure each direction of a test independently, which is different than a traditional ping test, and we also get a lot more information out of it. We get packet loss, we get packet duplication, we get out-of-orderness, latency, and jitter, which is the variation in arrival time between packets. A test like this is crucial to figure out whether there actually is a problem on the network. It runs in an active manner, it's constantly run, and we're able to see micro behaviors on the network, things like microbursts and congestion that may appear in a very, very quick period of time, something that we may not be able to see with a traditional tool like SNMP monitoring, because that's designed to poll on a slower interval, and that sort of masks over the micro behaviors of what networks may be doing. There are other tools included, like BWCTL and iperf3, that measure achievable bandwidth. All of these base tools sit at the lower layer; they're the ones that actually perform the measurements. On top of that we have this sharing and archival layer that stores the different pieces of data. And then we have the presentation layer on top that allows the user to configure the tests, but also allows them to visualize the results and feed them into an alarming system like Nagios if they choose to do so. So really the entire Linux distribution is meant to isolate the task of measurement, put it onto a specific machine that sits within the network somewhere, and simplify your life so that you really don't need to worry about all these different things, which has been a huge win for operations staffs.

So you mentioned a dedicated machine running these tests between the perfSONAR machines, testing the links between them. But what if I actually want to get to my end desktop? If I'm a researcher in my lab and I'm wondering why I can't get to my storage very fast or something like that, and I want to test the network, does perfSONAR help with that?

Yeah, there are two ways that it could. So if we're talking about a desktop or a laptop, it's probably a system that has something like a web browser available. There are two tools that we've packaged on the toolkit for a long time. One of them is called NDT, which is the Network Diagnostic Tool, and the other is NPAD, which is a tool from the Pittsburgh Supercomputing Center. Both of these are designed to be servers that run on the toolkit instance. So some provider would, of course, have to set this up and maintain it, somebody like Internet2 or ESnet perhaps. The user can then visit this website via their browser and get a pretty good indication of what performance would be to their host. And what this is actually testing is your host, and therefore your browser and your operating system's ability to push and pull information over the network.
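For listeners who want to poke at those base tools by hand, here is a minimal sketch of what the invocations look like; the hostname is a placeholder, and the one-way test assumes the far end is running the matching OWAMP daemon.

```bash
# Hostnames are placeholders; the far end must be running the matching daemons.
traceroute ps.example.net        # layer-3 path and hop count
ping -c 10 ps.example.net        # round-trip latency

# OWAMP: reports one-way delay, loss, duplication, and reordering for each
# direction independently (requires owampd on the remote side).
owping ps.example.net
```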
So with a test like that, if your host is improperly tuned, if the network interface is not properly configured on that host, all the other things that may be sitting between that user and that server, they'll come out. We normally don't recommend that people try to do science to the desktop; I think that's a trend that's eventually going to be phased out. All of the perfSONAR tools are also available as RPMs, so if you have a cluster node or you have a storage resource or something like that, you have the ability to install these things via those RPM packages as well and get almost the same benefits as if you were installing a full toolkit.

Now let me ask a derivative question off of that. So, diagnosing problems within a single organization, not necessarily at the user level: let's say a network administrator says, I think I'm getting weird performance over to that segment over there. Is this something where such an administrator could install two instances of perfSONAR and just run tests between them across their own local network?

Absolutely. One of the common cases we find is that people want to have better instrumentation on their local or their metro-area networks. So we find that people do have many of these instances deployed within a small geographical footprint. There are several campuses in the United States that have taken this approach, and they're at the point now where they have a perfSONAR node available within every single building and they're constantly running tests between them. There is a little bit of a caveat there, though. We have to be careful with some of these tests. A test like an achievable bandwidth test, something like iperf that people may be familiar with, is designed to flood the network; it's designed to see how much bandwidth you can actually get. When we run this in a local setting, it can actually be quite destructive, because it fills up local buffers and can cause congestion within that local setting. And while it is providing a useful measurement for you as you run it, it could be causing problems for other people on the network as well. So when running within a LAN environment, we often recommend that you do some of the lighter-weight tests, something like a ping, to just get measurements of latency or packet loss. In particular, we do recommend OWAMP, because it is the best at detecting packet loss, duplication, out-of-ordering, any of those different things. So within a local setting, that's what we recommend people try to do.

Now, what kind of platform do you need to run a server, particularly within this kind of LAN environment? And I know we're digressing a little bit from our original topic, but this is one particular use that I think many of our users might find useful. Could I run it on something as simple as a Raspberry Pi, for example, or a virtual machine?

So the Raspberry Pi and the other smaller micro-machines are an emerging area that we're still looking into, and we have a lot of community interest in this topic as well, so there are a couple of others looking into it too. Right now, we've found that you have to be kind of careful with the hardware itself. And the reason is that a tool like OWAMP is heavily reliant on timekeeping from a daemon like NTP. To get an accurate measurement of latency, you need to have a very accurate representation of time.
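Since OWAMP's one-way numbers are only as trustworthy as the clock behind them, a quick sanity check of NTP on the measurement host might look like the sketch below; the exact output format varies by NTP version, and ntpstat may not be installed everywhere.

```bash
# Is NTP running and actually synchronized?  In ntpq's peer listing, a '*'
# marks the peer currently selected as the sync source.
ntpq -p

# One-line "synchronised to ..." summary, if the ntpstat utility is present.
ntpstat
```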
So what we found is that smaller machines that may not have a battery on board, or may not have the ability to control their timing very accurately, can suffer a little bit with regards to the measurement, and we end up with a higher error rate. That's not to say that they can't still do the job of looking at packet loss numbers or giving you a quick and dirty result, but we're still trying to figure out which platform of those smaller machines would work best. The same sort of thing goes for virtual machines. We've found that some hypervisors in particular will try to push time updates on the backplane between the hypervisor and the virtual hosts themselves, and this can actually cause a lot of havoc for NTP, because it doesn't understand why time is being skipped ahead or skipped back, and that'll cause the daemon to turn off. So we've had a bit of a challenge trying to get the virtualization aspect to work well with accurate measurement. By far the best way to do this is still with a dedicated server. You can get by in a pinch with a desktop or a laptop, and we hope to have a better answer about the smaller machines in the next couple of years.

So what about the test case where, you know, we have more and more users, you talked about not doing it at the desktop, but there's even the extreme case where we have more and more users from campuses that are using DOE and NSF resources halfway across the country or halfway around the world. Can I use perfSONAR to test those links too? Obviously, if I'm a local administrator at a university, I don't control the perfSONAR machine at the other end, even if they have one. So how can this work?

Sure. So one of the guiding principles, and still one of the most important things that we do today, is that we wanted measurement to be open and a lot easier to perform. I think many of us can probably remember the bad old days where we had to try to schedule running an iperf test with somebody that was three or four continents away. This may have involved lots of emails, or perhaps even phone calls in the middle of the night, to try to find a range of time when it was okay for us to run a potentially destructive test and they gave us permission to do so. perfSONAR really simplifies all that. We can simply look inside of a directory service that we've maintained, which has a listing of all the public servers that people have decided to advertise and make public. We can identify that, oh, I'm sitting at Michigan and I need to test to some XSEDE site. We get the name of that server, and there's a pretty simple policy that comes with every single perfSONAR node. It's an open policy. It allows you to run a test that's not very, very long, perhaps about 30 seconds, but it allows you to do that test by default. If somebody chooses to go in and change the policy so that they turn off that open access on their perfSONAR node, of course there's going to need to be some backdoor communication with them to get them to open it back up again. But by default, you can set up tests to anybody else that you can find via that lookup service and hopefully get a better idea of what the end-to-end performance is between you and their network.
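As a rough sketch of what that default, roughly-30-second open test looks like from the command line: the remote hostname below is a placeholder for something you would pull out of the lookup service, and the exact flags can vary a bit between BWCTL versions.

```bash
# Throughput test toward a remote public perfSONAR node; -t sets the test length
# in seconds, and -c names the remote host that acts as the receiver.
bwctl -c ps-bandwidth.example.org -t 30

# If the remote side's policy rejects the request, bwctl reports the denial; the
# node's advertised contact address is the right place to ask about access.
```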
So you mentioned a test can be like 30 seconds, but how do you make sure you don't have multiple bandwidth tests going to a single box, where they all get a bad number for what the link can actually deliver? And also, how do you prevent people from basically using your perfSONAR box for a denial of service?

So by default, the BWCTL tool, which is the one that runs these heavier bandwidth tests, runs in a serial fashion, so it only allows one bandwidth test at a time, and other requests that come in will be queued and will eventually kick off when the queue happens to drain out a little bit. In all of our testing, and we've gone through several security audits and years of operational experience at this point, we've never seen anybody use the perfSONAR nodes for a denial of service. The only thing we've noticed is that if you make a lot of requests in a short period of time, you could of course cause the queue to grow to the point where it becomes less useful, but that's not causing any additional traffic on your network; it's just making that tool a little bit less useful because it can't honor real tests that are coming in. So thus far we've not really had any issue where the tests themselves have been causing a lot of additional load on the network.

Now, going at this in a slightly different direction, you mentioned sharing of data between perfSONAR sites and stuff like that. Can you tell us a little bit about the aggregation and how the data is put together to create a cohesive picture of performance on the network?

So on every single perfSONAR node, if we're talking about the toolkit in particular, the user has the ability to configure these regular tests. They can set up a bandwidth test to some other remote location, they can set up a latency test. All the data from these localized tests is stored within a Cassandra database on the machine itself. This is rotated over time so that it doesn't grow infinitely, but we do have all of the results from these tests in there, both the raw results and some aggregation numbers. There are basic visualizations that come with the toolkit that allow you to overlay what the effects of bandwidth and packet loss and latency look like between these different end sites, to get a fuller picture. If we're looking to see what the aggregated performance is for a larger network, something like ESnet or something like Internet2 or XSEDE, we need to rely on some other tools that sit off to the side. We have a dashboard application, for example, that allows us to identify a handful of nodes, let's say 10 or so, and set up a matrix of tests between them. We can then get a visualization, just in terms of stoplight colors, of whether things are performing well between these different end sites. This can then be plugged into something like a Nagios instance to tell us when things are going bad. There are some other emerging efforts to try to do a little bit more in the way of analysis, and this is probably one of the dangers of open source for any project out there: we don't have a lot of experts in statistical analysis and data crunching. So we would certainly love to have more visualizations available, and more analysis tools that can aggregate these things from many different locations into one. We just haven't been able to identify the person that can help with that yet. But soft funding has been very good, and a lot of people have NSF and DOE funding to look at this problem, so that we can get a little bit closer to automatic detection.
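As a rough illustration of the programmatic access that sits behind those dashboards: recent toolkit releases expose the stored results over the measurement archive's HTTP interface. The hostname below is a placeholder, and the path and query parameter are assumptions that may differ by toolkit version, so treat this purely as a sketch.

```bash
# Ask a toolkit host's measurement archive for its stored measurement metadata as JSON.
# "toolkit.example.edu" is a placeholder; the /esmond/perfsonar/archive/ path is an
# assumption based on recent releases, and older installs may expose data differently.
curl -s "http://toolkit.example.edu/esmond/perfsonar/archive/?format=json" | head -c 2000
```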
Along those lines, when I run a test with a remote site, is the test symmetrical? Do they get results as well?

No, they don't get results if you've configured the test on your local instance. Your local instance would store them; the other side only gets notice that a test has been run, via the logs, so there wouldn't be any actual act of storing data on their end.

So I've gone through a couple of perfSONAR tutorials at conferences and such, which were quite good, and I found it kind of interesting the types of problems that have been found. Can you describe some of the most common issues that perfSONAR helps people discover?

And I think that I may have even taught one of those tutorials, at an XSEDE session long ago. So I believe there's a lot that we can talk about in this space, but the most common problem that we're facing today really comes down to the buffering of network devices. When somebody plugs in one of these devices on their local campus and they invoke some tests right away, they may notice that everything looks really, really poor, and this may be because the location they plugged it into locally has a lot of issues. Perhaps they've plugged it into a 1U data center switch that doesn't have the ability to tune the interfaces, and when we start up a 10-gig test originating locally and trying to get out through the wide-area links, we're going to overtax the buffers on that first-hop switch almost immediately, and that's going to impact our TCP performance to everything in the outside world. So once we discover this, of course, we go in there and we fix it: perhaps we move the host, perhaps we tune the buffers if we have the ability to do so, and then we may start to turn up other issues. Damaged or kinked fiber is a common one. Congestion from fan-in of multiple resources into a single device is another thing we see quite often. A lot of issues are related to the actual configuration of networks more than the construction, and this is one of the dangers of relying on default configurations. If we have science use cases, things that are going to require these large flows that have to intermingle with more of an enterprise use case, we're going to have this sort of mismatch of requirements, and that's going to cause us to rethink the architecture a little bit.
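Buffer and interface tuning on a switch is vendor-specific, but the host-side counterpart, the kind of tuning ESnet documents on its Fasterdata site, usually amounts to a handful of sysctl settings. The values below are illustrative only, not a recommendation for any particular host; the right numbers depend on link speed, round-trip time, and available memory.

```bash
# Illustrative host-side TCP buffer settings for a high-bandwidth, high-latency path.
sudo tee -a /etc/sysctl.conf <<'EOF'
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
EOF
sudo sysctl -p    # load the new settings
```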
And maybe later we can talk about Science DMZs, but before that, what is the normal user experience? What do most places do now if they aren't proactively doing this testing? Do they just assume it's as fast as it is?

Well, it's hard for me to say what they're assuming, but a very common thing that we've heard is this: if we look at what a local network looks like and we run some local tests, due to the way that TCP performs when we have a short round-trip time, we're not really going to turn up a lot of these issues. So a common thing I used to hear years ago is, oh, I ran iperf from my cluster node to my border network and it showed as being 10 gigs, so therefore everything must be okay. What we miss in that explanation is that, yes, that's about a one-millisecond, maybe a two-millisecond path, and we're able to get a full 10 gig of performance, but that's only because TCP can recover very quickly on that path, even in the presence of packet loss of multiple percentage points. When we start to go a little bit further away, that's when we start to see these issues. So TCP is a fickle sort of protocol, in that we need to exercise a longer path just so that we can turn up problems that may be on the shorter portion of that path. Those that are not using perfSONAR may have other solutions in place, but often they're not going to be end to end. And if we have science being end to end, and this is the common thing we're seeing these days, science doesn't occur just within a single domain, it's really a worldwide sort of thing, we need to have this end-to-end visibility to really get to the bottom of these things.

So you're talking about doing these long-range, end-to-end tests. How do I even find a long-distance endpoint? You know, an XSEDE site, or however they name theirs, Japanese sites, things like that. I mean, how do I even find out? Do I have to call them up and ask, hey, what's your perfSONAR node?

Well, you know, that's how we used to have to do these things. We had to identify (a) the networks that we were going to be traversing and then (b) the people we could talk to in each of them. So, for all the people who do have social anxiety, there's good news: we have automated this procedure a little bit. Of course, it could still be automated a little bit more, but what we have right now is that for every node that comes online, the person who brings it up has the ability to configure it such that it registers into a directory, and this directory has a programmatic API. It also has a couple of visualizations built on top of it. So if you're looking for a node that happens to be on the APAN network in Japan, you simply search for APAN-JP and you should get a couple of results back that allow you to pick out and figure out which test you want to invoke over there. If the user wants to keep a node private, they have the ability to do so as well, but as of earlier this week when I looked at the numbers, we have about 1400 public nodes available in about 300 different domains around the world. So there is pretty wide coverage of these. It's not on every single path; eventually we will get to the point where we have visibility on almost every single link, at least the major ones, and then we can get in there and really debug a lot of these harder issues. But the coverage is good enough, and I think it was Metcalfe's law that says a product is really only useful when you have deep coverage within the marketplace like that. So we're finally at the point now where perfSONAR is useful by default, because there are options out there to use it.
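To put some rough numbers on that short-path-versus-long-path point, here is a back-of-the-envelope sketch using the well-known Mathis et al. estimate of steady-state TCP throughput; the MSS and loss rate are illustrative values, not measurements from any real path.

```bash
# Mathis et al. rule of thumb:  rate ~= (MSS * 1.22) / (RTT * sqrt(loss))
# Same tiny loss rate (1 packet in 1,000,000), MSS = 1460 bytes, two different RTTs:
echo "(1460*8*1.22) / (0.001 * sqrt(0.000001))" | bc -l   # 1 ms campus path  -> ~14 Gbit/s
echo "(1460*8*1.22) / (0.080 * sqrt(0.000001))" | bc -l   # 80 ms WAN path    -> ~178 Mbit/s
# The same loss that is invisible on a short LAN test caps a cross-country TCP flow.
```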
So just to be clear, these public nodes that show up in the lookup service are generally available on the public internet?

If they show up in the lookup service, they have to at least be reachable. That's not to say that the user may not have put a policy in place that restricts how long a test you can run, or whether you can run a test at all. Every single node also has a contact address associated with it. So, for example, if Brock is sitting at Michigan and wants to invoke a test to one of the XSEDE sites, he can find it via the lookup service and try to invoke the test from his local machine. If he finds that there's an error, and the error comes back and says permission denied or something like that, he can contact the owner and ask why. He can say, you know, is there a restriction in place? If there is, that person can then, of course, add a whitelisting rule to allow him to do so. But our default case is to leave them open, and we find that almost every single node that's out there, unless it's been explicitly denied, will allow somebody to invoke a test to it.

All right, now a follow-up on that. Take, say, a large organization in a small area, like a school system, right, that has a lot of different points of presence within a metro area and whatnot. Is it possible, and is this a typical or an atypical use case, for every one of these points of presence within, say, a city to set up a perfSONAR node and exchange data and exchange tests between each other, but as kind of a closed perfSONAR network that they don't make public? So even though these are on the general internet, they're not available to, you know, outsiders, so to speak, outside of the organization.

Yeah, for that use case we find that people usually just use RFC 1918 space when they configure these things. That way they don't show up in the public infrastructure, and they can only be used for internal testing. That's by far, I think, the easiest way to (a) protect the resources from people who shouldn't be testing to them, and (b) also make sure that your resources are able to give you accurate information about your local network setup.

Now, just for our listeners who don't know all their RFC numbers by heart, can you define what that one is?

Sure. RFC 1918 is the definition of private address space. For example, people who have a home network device that they've gotten from their provider, it's going to hand out 192.168 addresses to all the devices you may have in your house: your phones, your microwaves for some reason, your computers, anything that requests an IP address is going to be given this private address space. Via the magic of network address translation, you can still get to the outside internet and download things or upload things, whatever you choose to do, but your host is sort of protected by this wall that doesn't allow it to be advertised fully to the wide-open world.

So you've mentioned a couple of different ways in which you distribute the toolkit, which is a collection of all these tools, iperf, BWCTL, OWAMP, and then there's a web interface and there's a Cassandra database. You've mentioned that it's a full Linux distribution, I've also seen a Dockerfile out there, and then you also mentioned RPMs. What's kind of the point of providing it in so many different forms, and which way would you recommend somebody operate it?

Well, you know, it comes down to choice, and it's really what the community was asking for.
We started with the toolkit, and the original incarnation of this was actually using Knoppix, for people who can remember such a thing, which was a Debian-based distribution that was meant to be put onto live CDs. So we started with the toolkit as the all-enclosed thing, because it was easy for us to support. We were able to package all the different products, we were able to make the glue that put them all together, and it got it out there in the easiest possible way: it was a single thing people had to download, and there was minimal configuration. This was a lot better than trying to configure things from source and build them and all these other harder things that people can remember from the bad old days of Linux. What we found is that people then said to us, okay, well, that's great, but we don't use Knoppix within our shop, or then, when we eventually switched over to CentOS, they said, we don't use CentOS, we're a Debian shop, or we're a Scientific Linux shop. So by making the RPMs available, including the source RPMs, people can download and build this for whatever installation they happen to be using within their own environment, and this allowed them to build it into their configuration management systems a lot easier.

So as it stands today, we still release the toolkit as the flagship product, and that's what we recommend people use if they want a dedicated machine that is just used for measurement. This is what a provider should probably download; this is what a campus should use if they're going to put it on the border or next to their high-performance computing resources. If we're just talking about a scientist, somebody who just has a very small cluster, for example, that they're privately using, they probably don't need an entire toolkit, so they can just use the RPMs and put those on one of their storage nodes or one of their compute nodes, and get the benefit of being able to kick off a live test every now and then, just to verify that things are working well. As for Docker, this is something that just started as a project somebody was playing with, and we found that it was a pretty good distribution method as well. It's not one of the officially supported releases at this point, but within the next couple of releases we expect it probably will be. We're going to be exploring other distributions as well, in particular Debian. What we found is that the smaller machines, one of the questions that was asked earlier, a lot of these are built on Debian-based distributions, you know, Raspbian and all the other derivatives that run on top of them. So to get the tools onto these, we're going to need to support Debian a little bit more cleanly, and that would mean making the packages available for that distribution.

So you said something there. We've been talking a lot about having the perfSONAR box and having scheduled tests so we can have views into the network all the time, but you mentioned, for a small user, just installing the RPM and kicking off an on-demand test. So you can kind of install just a command-line bwctl and not have to have a web interface and an archive and stuff, just so you can test?

Yeah, absolutely. All the RPMs are available via the same RPM repository that the toolkit is available in, so really it just comes down to installing that repository package and then doing a yum install bwctl, and then you're able to get both the daemon and the client packages so that you can invoke these things on demand. It doesn't include all that extra stuff on top, and sometimes you don't need that.
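A minimal sketch of that on-demand setup on an RPM-based host; the repository-setup step and exact package names vary by release, so check the perfSONAR install documentation rather than treating these literally.

```bash
# Assumes the perfSONAR RPM repository has already been added to the host
# (the repo-setup package name differs between releases).
sudo yum install bwctl      # pulls in the bwctl client and the bwctld daemon
sudo yum install owamp      # optional: one-way latency tools (owping / owampd)

# One-off, on-demand throughput test to a remote perfSONAR host (placeholder name):
bwctl -c ps.example.edu -t 20
```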
If we're just talking about being able to verify performance between our cluster node and somebody else's, pretty much all we need to do is kick off the on-demand test; we don't need the additional layer of visualization on top of it.

Does ESnet or anyone make almost like an appliance, a piece of hardware that's known to already have the tunings from Fasterdata and all these things in place, where I can literally buy it, plug it in, and use it like an appliance?

So this is an often-asked question, and to really appreciate the answer I have to go back in history a little bit, to when we were still offering the live CD product. The live CD, for people who may have used it or were familiar with it, was meant to be almost a complete solution, where you just downloaded the CD, stuck it in the tray (or a USB key, both ways were supported), booted it, and then everything was ready to go. Now, because it was on media that was not writable, it wasn't able to be updated, but it still gave you all of the same functionality. It included all the hooks to make the tunings of the operating system, so you did get the benefit of those Fasterdata tunings. But what we found is that people would look at this and call it an appliance, and from the standpoint of trying to secure one's network and do due diligence for all the things that are going on, this was actually quite a bad thing, because the software wasn't being updated. A couple of years, not even a couple of years, a couple of months ago, people may be familiar with the Shellshock bug that came out, and this was devastating to the perfSONAR community, because a lot of people had these old live CDs, or perhaps they just had hosts that were not well patched, and these machines were taken down by this attack because they were not watched very closely. So by passing around an appliance, you actually get into the mentality that, oh, I don't need to babysit it as much, or I don't need to care about it as much, and this is quite a dangerous way to think. We want the perfSONAR nodes to be thought of in one of two different ways, and credit to one of the users on the mailing list for coining this phrase, but we have to treat them as either cattle or pets. In terms of cattle, we're not afraid if they get owned; we can simply just reinstall them and go on with our day. If they're pets, they need to be integrated deeply into our infrastructure, and that means they need configuration management, they need security policy applied to them. All of these different things point to not wanting these to be an appliance, and that's one of the reasons why you probably will not see some of the national providers, or even the partners in the perfSONAR project, really release this kind of complete solution, because it would send the wrong message. We still want people to treat them as servers that they own and love, to a certain extent, so that they become useful and safe.

So we've talked through a bunch of use case scenarios of how perfSONAR can be used: in a single site, across a city, across the world, things like that. What's the strangest use of perfSONAR that you've heard of, something that you didn't necessarily design the system for, but it turned out to work fairly well for somebody?

So we've had a lot of interesting use cases, and I think this is one of the beauties of open source software.
You know, we can design a product that we think should work a certain way, and somebody can download it, tweak it, add to it, whatever they think they need to do, and they come out on the other end with probably a better use case than we could have ever imagined. One of the things we'd always said is that we want all these nodes to be open, and they're only really useful when people can find them and use them for their individual purposes. One of our customers, I guess customers is probably the wrong word, one of our collaborators, which runs a fairly extensive national network, wanted to use this on a VPN, and we said, well, we're not really sure if things will work out okay if they don't have the ability to talk to the outside world, but you can certainly try. So in the end, they ended up deploying close to a hundred of them across a nationwide network that was all VPN, didn't even have external access, and the only complication they ran into was figuring out how to get software updates to them. They were all able to work internally rather well. They were set up to use only the lightweight tests, and they provided a lot of useful information to people who were using this VPN service, in particular those that were connecting through very, very small links, you know, even less than a dial-up modem or a DSL connection. So that was something we hadn't thought about, but it worked rather well. Some of the other interesting use cases we've identified over the past couple of years: campuses have just purchased hundreds of these and have really tested the scalability of the tools. We didn't think that the databases and the various tools would be able to support simultaneous tests from more than a hundred instances, and they're holding up fine, in fact producing lots of useful information that is fed into real-time systems. So people who are comfortable getting in there and playing with the APIs and learning how the storage infrastructure works have been able to produce some nice visualizations and nice tools on top of it.

Now, let me throw in one thing here, because you just brought up a couple more instances of a use case that I wanted to talk about but figured we'd run out of time for, so I'm going to take that as license. You were talking about, within a single organization, having tens or hundreds of perfSONAR nodes. What's a typical methodology for running that? Do you have a central perfSONAR that initiates all the tests and keeps all the results locally, and then all the other perfSONAR instances are really just endpoints to run tests against? Or what's common?
So I think that is probably the most common: somebody will set up a central node that is just the orchestrator of the tests. It maintains the configuration file of all the things that need to be run, and then all the information is stored there as well, so that's where the main database is. This host probably has a little bit more guts than some of the testers may have: it has more main storage, it probably has a lot more RAM. All the beacons then can be very, very lightweight; they don't need a whole lot of onboard storage, because they make the test and the results of that test are then sent back to that central instance. The other way you can do this, if you have a lot of these things, is when you find that trying to do a full, you know, 100-by-100 mesh, or even beyond that, is starting to become challenging. People have then split it into maybe a couple of centralized nodes, sort of the old star network design, where we have one managing the tests for a small handful, another cluster that does the same thing, and then maybe a couple of those clusters testing between each other. So depending on how far you try to scale, you may need to break it up in that manner, but for the most part we do encourage the centralization, because it puts the data in the same place and it also allows you to simplify the construction of what those beacons are going to look like a little bit.

So what language are you writing most of this system in?

It's kind of gone through a transition over the past couple of years. It started as being almost all in Java, back in the early part of the 2000s. A little bit after that, we started writing most of our things in the Perl programming language, with a couple of Java pieces as well. There are some tools written in Python and Django, and we still have some that are C and C++. So if we look today, I'd say that all of those languages are still present for different tools in here, and everything is still licensed under open source licenses. The licenses do vary, but Apache is probably our most common, and we make everything available via our project websites so that people can participate if they choose to do so.

Now, a question I like to ask, just being a developer myself, when we talk to other development projects, is: what version control system do you use, and why?

So this has also gone through an evolution. We started originally using CVS, as probably many people did, and when CVS started falling out of favor I think we migrated to Subversion. But we were using both of these just on local servers; we were young enough and silly enough to think that we could just manage this all ourselves, and as the project grew we found that that started to become a bit of a challenge. Nobody wants to be a full-time administrator just making sure that your version control system is working properly. So the solution we adopted, probably around 2007 or 2008, was to go to Google Code, and this allowed us to have the wiki, the issue tracker, the Subversion repo, everything in place, so that we didn't have to run a private wiki, a private bug tracker, or the repo anymore. Only this last year, Google Code announced that it was going to be ceasing operations, so we've had to rethink our approach, and within the last month we completed our migration over to GitHub. So at this point all of the perfSONAR products are over there, and also some of the related projects that we affiliate with, like iperf3 and NDT; all of these are now located over at GitHub.
So what have you got coming for the future? What are some things you'd like to see done, and what's in the pipeline?

So this is a question that I think it's fair for me to turn around, because perfSONAR really is a bit of a community project, and we rely heavily on what the community tells us: what we're doing well, what we're doing badly, and ways we can try to improve. At all the conferences we attend, including things like Supercomputing, there's always going to be a public-facing session where we ask people to come and visit and say, hey, what can perfSONAR do for you that we aren't doing right now? So we have a couple of things on deck that we think are valuable, and many of these come from community feedback, but we're certainly open to more and we do encourage people to send suggestions on the mailing lists. Some of the things we've identified as being important over the next year to probably two years: looking at these smaller systems that we've talked about, whether they're something like a Raspberry Pi or anything else that's small form factor and inexpensive versus a traditional server. And because of that, there are going to be a couple of related things. We want to do auto-configuration, for example: we just want the node to come online, be able to phone home, and get a list of tests that it needs to perform, because many of these systems are going to be headless and deployed within, you know, dark wiring closets around a campus environment, and it's going to be infeasible for somebody to be carrying around cables and a monitor to talk to them. So we want them to have the intelligence to do what they need to do automatically. Along with that, we're going to need to really be exploring these different Linux distributions that we've not traditionally released software for. Some of the other affiliated projects of perfSONAR, not really a part of the mainline, are looking at more intelligent ways to use this monitoring data. There are at least two CC*IIE grants that I'm aware of, this is the NSF solicitation for campus cyberinfrastructure that was released over the past couple of years, and at least two of these are looking at ways to use perfSONAR to guide intelligent data movement and also automatically deduce problems within the network. So we do look forward to the results of those research efforts, because we're pretty sure that once they're able to prove out their results and write their papers, they can be rolled into the mainline of perfSONAR so that everybody can benefit from them.

Good. Okay, Jason, thanks a lot for your time. Where can people find the perfSONAR toolkit, and how can they get involved?

So everything is available via our main website, www.perfsonar.net. We have just recently gone through a redesign, so that should feature just about anything that somebody is going to need to find. It has a link to our source code repository at GitHub, and it has a list of all our mailing lists. We are, and will always be, an open source project that relies heavily on the input of others. So whether you just want to send us feedback on a product that you need us to examine, or if something isn't working the way it should be working, please, the lines are open and we're always interested in hearing about that. If you want to contribute and be a part of the effort, that's great as well.
We've had a lot of important people over the years come and contribute pieces of their time to make some of these products better, and we will certainly be relying on that as we go into the future.

Okay, Jason, thanks again.

Thank you.

Thanks, Jason. All right.