Welcome. Hi. My name is Ben Kochie. I'm one of the maintainers of the Prometheus SNMP Exporter. SNMP is a networking protocol that's used to manage and gather data from network devices, typically routers, switches, that kind of thing. It's very old, but fortunately the data model it uses maps very well onto Prometheus metrics: the OID trees can be mapped into metrics, the data is indexed in tables, and the indexes can be mapped to labels. This works out really well.

So I've got a couple of old Juniper switches. They're in a switch stack, and there are quite a lot of ports and a lot of data to gather. So let me start up a quick scrape. Well, that's going. That's taking a while. I've turned on SNMP Exporter debug logging, so let's see how long it takes to gather this data. Well, it's still going.

While we're waiting, let's take a look at the SNMP configuration that I've added to my Juniper switch. There's some stuff I've left out, but this is the interesting bit that helps improve performance. The first thing I did was add an interface filter, which drops some of the data that I don't actually need to gather from the device. There are a number of subinterfaces, and the filter is a little bit cryptic, but basically it drops the subinterface data from the output of the switch. The second thing I've done is add a stats cache that caches the data for 29 seconds, which is designed to match the scrape interval. So if I hit the device twice, from two different Prometheus instances, it'll serve cached data, which should be much faster than pulling the raw data from the switch. But I wanted to make sure I didn't cache longer than one scrape interval, so I made it one second shorter than the actual scrape interval.

Let's see how that walk is doing. Okay, so that walk completed, and it took 22 seconds.
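The two device-side tweaks described above can be sketched as a Junos `[edit snmp]` stanza like the following. This is an illustrative reconstruction, not the speaker's exact config: the filter regex and the assumption of a 30-second scrape interval are mine.

```
snmp {
    filter-interfaces {
        interfaces {
            /* drop logical subinterfaces such as ge-0/0/0.0; regex is illustrative */
            "\.[0-9]+$";
        }
    }
    /* cache SNMP stats for 29 seconds, one second under a 30s scrape interval */
    stats-cache-lifetime 29;
}
```

The cache lifetime is deliberately just under the scrape interval so every scrape still sees fresh counters, while a second scraper arriving within the same interval gets the cached copy.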
Well, that's not bad, but it's not great. So let's see if we can figure out why, and how to improve it. We've got two subtree walks here in the debug log: one of them took 12 seconds, the other took 8.9 seconds. That pretty much matches up with the default IF-MIB module. This is the walk configuration that I've asked the device to produce data for, and the ifTable and the ifXTable come from this IF-MIB. As you can see here, these two tables contain a lot of subtrees.

So the first thing we can do is see what happens if we split that out. I've built an expanded config that walks each of those subtrees individually. Let's run that scrape. So here's the expanded IF-MIB, and let's see what happens when we load it. We'll wait for those logs to finish. All right, that's going a little bit faster. Well, sort of. It's still taking somewhere on the order of 500 to 600 milliseconds per subtree to gather all this data. So we haven't really improved the speed by making the walks more granular. It must be something about the amount of data that makes it take so long to produce those metrics.

So the next thing we can do is simply stop ingesting data we don't need. Here's a generator config that I've created that gathers only exactly what I need from the device: the high-capacity counters for all the basics. And then I created a second config that gathers all the error counters and a couple of other things like admin status, oper status, and port speed. And once this is done producing data... yeah, so that still took 24 seconds. It definitely wasn't any faster. So let's see what happens if I gather only my mini config. Let's wait for that walk to run and see how long it takes. That looks like it completed. Well, that was much, much faster.
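A trimmed-down pair of snmp_exporter generator modules along these lines might look like the following sketch. The module names and the exact object list are my assumptions, not the speaker's published files; the point is that `walk` lists only the specific IF-MIB objects you need instead of whole tables.

```yaml
modules:
  # minimal module: just the 64-bit (high-capacity) traffic counters
  if_mib_mini:
    walk:
      - ifHCInOctets
      - ifHCOutOctets
      - ifHCInUcastPkts
      - ifHCOutUcastPkts
    lookups:
      - source_indexes: [ifIndex]
        lookup: ifDescr

  # second module: error/discard counters plus status and port speed
  if_mib_errors:
    walk:
      - ifInErrors
      - ifOutErrors
      - ifInDiscards
      - ifOutDiscards
      - ifAdminStatus
      - ifOperStatus
      - ifHighSpeed
    lookups:
      - source_indexes: [ifIndex]
        lookup: ifDescr
```

Running the generator over this file produces an `snmp.yml` in which each module walks only those subtrees, so a scrape touches far fewer OIDs than the default `if_mib` module.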
I wonder why the system log seems to be a little bit lagged. But let's see if we can get it to produce more data. There we go. Yeah, so that walk only took six seconds.

So the big trick, if gathering data is too slow, is: turn on SNMP Exporter debug logging, examine all the subtree walks to find out whether any specific subtree is fast or slow, and then reduce the amount of data that you're gathering. Thanks. If you want to see these configurations, I've put them up on my GitHub, in my tools repo under the SNMP exporter directory. I have a lot of example configs there. Thank you.
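The debugging workflow described here boils down to a couple of commands. These are illustrative only: the port is the exporter's default, but the target address, config path, and module name are placeholders, and timings obviously depend on your device.

```
# start the exporter with debug logging so per-subtree walk timings are printed
./snmp_exporter --log.level=debug --config.file=snmp.yml

# time a scrape of a single module against one device
time curl -s 'http://localhost:9116/snmp?target=192.0.2.10&module=if_mib_mini' >/dev/null
```

Comparing the logged walk durations before and after trimming the module makes it easy to see which subtrees are worth keeping.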