Internet scanning, or how to become your own ISP. Please give a big round of applause to Johannes Klick.

OK, that's perfect. Yeah, welcome to the talk: "Internet Scanning, Challenges and New Approaches, or How to Become Your Own ISP." My name is Johannes Klick. I concentrate on internet scanning, and my second favorite topic is industrial control systems and OT hacking. In the past I have discovered several vulnerabilities in industrial control systems and in the firewalls that protect them, and I have given talks at PHDays in Russia and at Black Hat USA about how to find industrial control systems on the internet and then compromise them. I have also published at academic conferences like the ACM Internet Measurement Conference.

So what is the motivation for this talk about setting up an infrastructure for scanning the internet repeatedly? My problem was that Shodan.io does not provide raw data for free, and as a researcher you are very interested in raw data, because otherwise you cannot be sure whether the data you get is comprehensive or has been manipulated. The next problem was that Shodan.io does not provide clean snapshots of the internet. They scan the internet, but they do not delete old results; they just keep adding new ones. With dynamic IP addresses in mind, this means you may find the same server twice, or multiple times, in the database. So you have no really clean data. Censys is very good; it started as an academic project, but now it is a startup, and since they are a startup they also do not provide the raw data for free. They also have some inconsistencies in their database, as I will show in a moment. The next point to think about: the people running these platforms know what you are looking for. What are they doing with this data, and who might be interested in it? That is why it is better to scan the internet on your own.

Here is a good example from Shodan. For Telnet, Shodan reports roughly 5 million Telnet hosts. If you look at Censys, they say it is approximately 3.1 million, and our own scans also show 3.1 million Telnet hosts on the internet. Shodan has far more systems in its database, caused by dynamic IPs and by not providing clean snapshot scans. That is why, if you click on some host in Shodan, you will very often find it is not reachable. You probably know this if you work with Shodan.

The next point: here you can see inconsistencies in Censys. On the left side, I asked for the number of all hosts that responded via HTTPS on port 443. On the right side, I made a very similar request: give me all hosts that returned an HTTP status code via HTTPS on port 443, a status code being 200 OK, 400, 403, and so on. You need a full HTTP handshake to get a status code, so both requests should yield the same number, but they differ: 41 million on the left versus 35 million for essentially the same request. That is a big difference. It shows there are inconsistencies, and that is why I could not use this data for my research, and I had no access to the raw data. That was my final motivation.
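Incidentally, once you control the raw data, this kind of consistency check is just a pair of queries. Here is a minimal sketch using the Python Elasticsearch client; the index and field names are assumptions about how one's own scan documents might look, not Censys' actual schema:

```python
# Hypothetical consistency check on one's own snapshot data: the count of
# hosts that answered on 443 should match the count that returned an HTTP
# status code, since a status code requires a completed handshake.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

answered = es.count(index="scan-https", body={
    "query": {"term": {"port": 443}},
})["count"]

with_status = es.count(index="scan-https", body={
    "query": {"bool": {"must": [
        {"term": {"port": 443}},
        {"exists": {"field": "http.status_code"}},
    ]}},
})["count"]

# On a clean snapshot both numbers agree; a gap like the 41M vs. 35M
# above points to stale or partially merged scan results.
print(f"answered: {answered}, with status code: {with_status}")
```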
In the first quarter of this year there was a new project called Packetel. The Packetel people said: there is no raw data, so we will scan the internet ourselves, just using Masscan from one single source IP within a time span of two hours, and then provide the data to everybody for free. Then they got into trouble with Spamhaus. Spamhaus is very touchy: they have a lot of blackhole IPs on the internet, and if those receive one or two SYNs, they say you are a really bad botnet, you are malicious, and they put you on a spam list or block list. That is exactly what happened to Packetel. Their ISP said: this is your virtual private server, we will kill it, you cannot scan anymore. There were some further stories about Packetel; they tried to make Spamhaus angry, and in the end Brian Krebs doxxed them and they shut their service down. A very fun story.

OK, so this talk is about how to set up an infrastructure for long-term, repeated scans. I am very interested in the distribution of vulnerable or exotic network services, for instance industrial control systems or building management systems, over time and across different autonomous systems and BGP prefixes. This talk explains how to set up such a framework. In the first 50% we will talk about the setup itself, and in the second 50% about the results of our scans.

So, a bit of history. The first try was what probably everybody in this room has done: just take a fast scanner and run it on the internet from a single source IP. That is a really bad idea. You will get blocked very quickly, you will receive a lot of abuse messages, and your virtual private server provider will kill your server. Second try: rent about 20 virtual private servers for 4 to 10 bucks per month, maybe globally distributed. The results get better because you are using more source IP addresses, but you still have the big problem of abuse messages: your service provider receives them, assumes you are malicious or infected, and tells you to clean your server, or you get kicked out like the Packetel guys. The third try was to rent a /29 from an ISP to get our own whois database entry with our own abuse mail contact. We thought, OK, now everything will be fine. But the problem was that the automated intrusion detection systems do not report only to the abuse address listed in the whois entry; they also send messages to the maintainer of the IP address block, meaning our direct ISP. So we had the same problem again: our ISP got told that we were malicious, that we were running a botnet, and so on, and we got into a lot of trouble. We managed a lot of abuse messages on our own, but in the end the discussions with the ISP were too complicated.

So the final solution: we got our own ISP. We became a RIPE member and got our own autonomous system and a complete /22 network, that is, 1,024 IP addresses. The sign-up fee is approximately 2,000 euros, and the membership fee is about 1,400 per year. Then we rented two different co-location spaces and got a further /29 from another ISP.
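For scale: a /22 really is only 1,024 addresses, and Python's standard library is enough to plan how to spread them across scan nodes. A small sketch, where the prefix and the per-node split are placeholder assumptions:

```python
import ipaddress

# Placeholder prefix standing in for one's own RIPE /22 allocation.
allocation = ipaddress.ip_network("10.0.0.0/22")
print(allocation.num_addresses)  # 1024

# Split the allocation into /26 ranges, e.g. one per scan node, so SYNs
# (and the resulting IDS reports) spread over many source addresses.
for node_range in allocation.subnets(new_prefix=26):
    print(node_range)  # 16 ranges of 64 source IPs each
```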
Then we bought a server for 30,000 euros for all the big data management. And we use auto-replies for our abuse emails: if somebody feels annoyed by our scans and sends an abuse message, they get an automatic reply explaining what we are doing, why we are doing the research, and how they can get excluded from our scans by being put on our blacklist. The result is that abuse messages dropped by approximately 90%, and the IDS messages telling us we had been blocked also dropped massively. We can now handle all abuse messages on our own. This shows that using your own /22, with about 1,000 source IP addresses for scanning, works really well.

So what does the infrastructure look like? First we have a front end, where you can say: OK, I want to scan this IP address space. Very often we just enter 0.0.0.0/0, meaning the complete internet. This goes to the scan master. The scan master looks at which scan algorithm we have chosen, takes the IPv4 address space, cuts it into small pieces, randomizes the IP addresses within those pieces, and sends them to the scan node cluster. The scan node cluster scans the internet and sends the data to the aggregator. Furthermore, it is very important that if a scan node fails, the work package it was working on, a range of IP addresses, gets reassigned by the scan master to another scan node. This is a kind of fail-safe. People using ZMap or distributed ZMap scanners know the problem: a ZMap node breaks, and then you have to do manual work, figure out what the last scanned address was, reconstruct the state of the scan permutation, and somehow resume. If you want to set up a global scan infrastructure, you need a scan master that manages the scans, checks that everything is working, and runs a recovery procedure if it is not.

The scan node cluster itself consists of several scan nodes in two different co-location centers. First we have a SYN scanner, written in Go, that just sends masses of SYNs out onto the internet. Then we have an application scanner, a customized ZGrab version. ZGrab is a very good tool that supports a lot of protocols, and we added some further ones. All the data goes to the aggregator. The task of the aggregator is deduplication: if a scan node failed and another node took over its unfinished work, you may have duplicate results, and the aggregator merges everything and makes sure duplicates do not end up in the database.

Then there is data enrichment, which is very important: just scanning the internet is not sufficient. You have all the ports and the application data, like TLS certificates and so on, but you have to enrich it with further data. We have access to the whois database objects: if you run whois 8.8.8.8, you see which company owns the address block, and often some information about what the block is used for. We also add reverse DNS and GeoIP, that is, where the IP is approximately located. And we use BGP prefix information: to which announced BGP prefix does this IP belong, and which autonomous system does it belong to.
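A minimal sketch of this enrichment step, assuming the pyasn library with an offline-built routing database plus standard-library reverse DNS; the database file name and the output fields are placeholders, and GeoIP and whois lookups would hook in the same way:

```python
import socket

import pyasn  # pip install pyasn; database built offline from a BGP RIB dump

asndb = pyasn.pyasn("ipasn.dat")  # placeholder path to the pyasn database

def enrich(ip: str) -> dict:
    """Attach origin AS, announced BGP prefix, and reverse DNS to an IP."""
    asn, prefix = asndb.lookup(ip)
    try:
        rdns = socket.gethostbyaddr(ip)[0]
    except OSError:
        rdns = None  # no PTR record
    return {"ip": ip, "asn": asn, "bgp_prefix": prefix, "rdns": rdns}

print(enrich("8.8.8.8"))  # e.g. {'ip': '8.8.8.8', 'asn': 15169, ...}
```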
As the storage solution we have chosen Elasticsearch, because it is one of the best databases I have ever seen; since version 7.3 it is very good. And here you can see the protocols we currently support. We concentrate very strongly on industrial protocols, but we also support the normal ones. Most of the protocols come from ZGrab itself, but we added some special ones, like the Phoenix Contact protocol and a few others.

Now some numbers and settings. As you probably know, IPv4 has approximately 4.3 billion addresses. Very often people send probes to all of them, but only about 2.8 billion of these addresses are actually routed on the internet, so 2.8 billion is the maximum you can reach. If you get some kind of BGP feed, whether from your own BGP router, a feed to a BGP router, or CAIDA data, you can reduce the number of TCP SYN packets by 35%, because you only send SYNs into the routed IP space. You do not need to scan the private address ranges and so on.

The first thing to consider is that a TCP SYN packet is approximately 70 bytes on the wire. 70 bytes multiplied by 2.8 billion IP addresses means that a single TCP SYN scan sends about 200 gigabytes of data into the internet. The interesting point is that at most about 2% of all those SYN packets will get a reply. It is a very easy calculation: the most used protocol on the internet is HTTP, and we see about 56 million HTTP hosts; divided by 2.8 billion routed IP addresses, that is 2%. So you have a hit rate of just 2%, and 98% of all the TCP SYN packets sent to the 2.8 billion addresses are overhead.

The next question: one SYN or two SYNs? This is a constant discussion, so we tested it. Sending doubled SYNs makes no sense, because the results do not double; the difference is negligible from my perspective. You may receive slightly better results on some protocols, but on SSH we even received slightly worse results, so for me it is within the standard deviation. Compared to Censys, you can see that our results are really comparable, so this kind of setup works. Interestingly, Censys scans from a /23 while we used a /24, meaning half as many source IP addresses, and our results are even slightly better. But keep in mind that Censys has been running as a service much longer, they scan the internet much more, and they probably have a bigger blacklist.

What about speed? People often ask: can I scan at maximum speed? I do not recommend it. If you look at the data from our paper, you can see that we scan at approximately 70 Mbit/s, taking 6 to 7 hours. If you instead scan from a single source IP within a time period of 2 hours, as everybody likes to do, your data will be degraded by 10% to 30%. In the worst case, you will see only two-thirds of the whole internet.
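These figures are easy to sanity-check. A back-of-the-envelope calculation of the traffic volume, the sweep duration at the rate used above, and the HTTP hit rate:

```python
ROUTED_IPS = 2.8e9   # routed IPv4 addresses
SYN_BYTES = 70       # approximate on-the-wire size of one TCP SYN

total_bytes = ROUTED_IPS * SYN_BYTES
print(f"{total_bytes / 1e9:.0f} GB per full SYN sweep")   # ~196 GB

RATE_BITS = 70e6     # 70 Mbit/s, the scan rate from the talk
hours = total_bytes * 8 / RATE_BITS / 3600
print(f"{hours:.1f} hours per sweep")                     # ~6.2 hours

HTTP_HOSTS = 56e6    # observed HTTP hosts
print(f"hit rate: {HTTP_HOSTS / ROUTED_IPS:.1%}")         # 2.0%
```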
Scanning strategies, a very interesting point. Imagine a new remote zero-day exploit arrives on Earth, and you want to scan the internet repeatedly, not every week but every day, or every 10 or 20 hours. Then it is a very good idea to scan the complete internet once and afterwards re-scan only the BGP prefixes with at least one responding host in them. This saves you an additional 25% to 50% of the routed IPv4 address space and scan time. I call those BGP prefix hitlists; I have written a paper about it, published at the Internet Measurement Conference. If you are interested, please have a look, it is very interesting.

The next problem: we are our own ISP, and suddenly the police showed up. No, they did not accuse us of malicious activity; they asked: who is the owner of this IP address, who has used it? Because we are our own internet service provider, the request came directly to us. We talked with them and said: we are the owners of the IP address, and we used it for internet scans. They were somewhat taken aback, because normally you are not supposed to learn that there is a police investigation about you. But we explained that we are IT security researchers, that we are doing good things, that we share the information and inform affected people, and so on. For them it was OK, and we never heard about it again. But in the first moment, reading that email, well, it was really something.

Hardware setup: very important if you want to do a lot of internet scans, including application-protocol scans and data enrichment. The data volume differs a lot between protocols, and the numbers are already really big: a single HTTPS scan is about 700 gigabytes, HTTP is 300 gigabytes, and Telnet is only about 2 gigabytes. Therefore we bought a big server with two AMD EPYC 7551 CPUs and one terabyte of RAM. We use 50% of the memory for the Elasticsearch heaps themselves, and the other 50% just for caching. Then we have 40 terabytes of SSDs, because whatever cannot sit in RAM needs to be stored on SSD, since we need a lot of I/O, and we run them in RAID 0 purely for performance. For long-term storage, and for copying data out of the indices running on the RAID 0 SSDs, we have about 72 terabytes of HDDs running in RAID 5. Our first problem when we got the server: we started htop and could not find our processes, because we had so many CPUs. With hyperthreading it is 128 logical CPUs, and first we had to figure out how to configure htop to see our own processes.

OK, the next interesting point: Elasticsearch, and some tweaking. Most people make a really big mistake: they have a lot of memory and run only one Elasticsearch node. The problem is that if your node allocates more than about 26 gigabytes of heap, the Java virtual machine uses ordinary 64-bit pointers instead of special compressed 32-bit pointers; Java does some pointer magic there. And when it uses 64-bit pointers, each pointer load consumes twice the memory bandwidth, because a 64-bit pointer is twice as big as a compressed 32-bit one. So we divided Elasticsearch into several nodes, each with a maximum heap of 26 gigabytes. This gave us a throughput increase of approximately 30%, unbelievable, just by splitting the database into several processes. If you are interested in how Java does its memory management with Elasticsearch, there is a very good article about it; have a look.
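The resulting node layout can be planned with simple arithmetic. A sketch of the sizing rule just described; the exact heap ceiling varies by JVM, and roughly 26 GB is the value used here:

```python
TOTAL_RAM_GB = 1024   # RAM in the scan-data server
HEAP_CEILING_GB = 26  # stay below the JVM's compressed-oops threshold

heap_budget = TOTAL_RAM_GB // 2          # the other half is OS page cache
nodes = heap_budget // HEAP_CEILING_GB   # one Elasticsearch process each
print(f"{nodes} nodes x {HEAP_CEILING_GB} GB heap = "
      f"{nodes * HEAP_CEILING_GB} GB, {heap_budget} GB left for caching")
# -> 19 nodes x 26 GB heap = 494 GB, 512 GB left for caching
```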
So now let's get to the results. What we see here is Kibana, the front end for Elasticsearch. Here we have an interactive way to get a rough overview of the autonomous systems and the structure of the internet. First, you can see that the most HTTP hosts are hosted by Amazon. Then we can look at the next biggest autonomous system: Akamai, one of the biggest content delivery networks on Earth. In the second ring we see the BGP prefixes: Amazon has a /12 here and announces this area, also a /12 there, a /13 there, and so on. In the third ring, for this specific announced Amazon prefix, a /12, we can see the service distribution inside the prefix.

And that is how I understand the internet. The internet is not about nations and countries; it is about autonomous systems, their prefixes, and the internal structure of those prefixes: which services do they run? Here you can see very clearly that they run Apache, nginx, and some other servers. And you can see a big difference between Amazon and Akamai: at Akamai, every server in the prefix has the same server banner, always "AkamaiGHost". You get no information about which web server they actually use, whether their own, Apache, nginx, or whatever; they have simply changed the banner. Here you can see the difference between two autonomous systems: Akamai is a content delivery provider, Amazon is more of a hoster, and that is why Amazon has a much more heterogeneous structure.

Next, we clicked on Amazon and said: OK, let's look into Amazon, how is it structured in detail? Here you can see the different prefixes, and again the service distribution, a rough overview of the top 10 or 12 prefixes. Now we add the whois information, and this is where it gets very interesting. From the inner circle outwards we have the autonomous system, then the announced prefix, and now the whois description of each whois prefix. We can see how Amazon structures its network: they have a /12, and inside it, for example, a /26 registered in the whois database. We can also see the customer; here it is a customer called Actor Incorporation. You can walk around the circle, see the different Amazon customers, and see how Amazon structures its network. You can look at a BGP prefix, see customer XYZ with whois prefix XYZ, and then look at the service distribution inside that whois prefix. You are cutting the network into finer slices and getting a good understanding of the structure of the complete autonomous system.
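Under the hood, such a sunburst is just a nested terms aggregation over the enriched documents: autonomous system, then announced prefix, then server banner. A sketch with hypothetical index and field names:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(index="scan-http", body={
    "size": 0,  # we only want the aggregation buckets, not the documents
    "aggs": {"as": {"terms": {"field": "as_name", "size": 10},
        "aggs": {"prefix": {"terms": {"field": "bgp_prefix", "size": 10},
            "aggs": {"banner": {"terms": {"field": "http.server", "size": 10}}},
        }},
    }},
})

for asys in resp["aggregations"]["as"]["buckets"]:  # one ring segment per AS
    print(asys["key"], asys["doc_count"])
```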
Here, for instance, we found Zoom, the well-known video conferencing system. We can see which services they use, so let's click on it and ask: how is the Zoom infrastructure that is hosted on Amazon structured? They run a lot of services whose banner is just "Zoom", but they also run some nginx 1.3.7 and some nginx without any version information. So using autonomous system information, BGP prefixes, and whois information gives you a very good overview of the structure of an autonomous system, and of the internet itself.

Moving on, here is another example: a look into OVH, one of the three biggest European hosters. Here we can also see the BGP prefixes; let me switch to presentation mode. We can see they use this /16, whose whois entry says it is for dedicated servers, and then a /18 out of that /16 for dedicated servers. Somewhere else we can see a /24 out of another /16 used for failover IPs. So we get infrastructure information: how they organize themselves. That is maybe interesting for an attacker, or for somebody else, right? We can also see which services they run, and that all the failover IPs have the same web server. And we can ask which web servers OVH uses overall: mostly Apache, in various versions, as you can see in this list sorted top-down by count. Very interesting.

Now a completely different autonomous system and a completely different protocol: SSH at GoDaddy. We can see that GoDaddy does not reveal anything about its internal structure: all the whois blocks are the same, and they carry the same name as the autonomous system itself. And, very interestingly, on about 90% of all their servers they run the exact same OpenSSH version. You can tell that GoDaddy is a web hoster; they manage their network in a completely different way than OVH, Amazon, or Akamai do.

Using this approach, we have looked for different customers. If you search the whois descriptions, you can enumerate a lot of Amazon customers, which is very interesting: we have Cisco, Samsung, Zoom, and so on. Next, we can look at how Amazon structures its EC2 infrastructure, all the cloud stuff. We see a lot of hosts in Asia Pacific, because they have a very big /16 there. And we can see that they use airport abbreviations for the locations of the EC2 clusters: we can see Dublin, then CDG, which stands for Charles de Gaulle, that is Paris, France, then FRA, which is Frankfurt, Germany, and so on. Using the whois information, you get a lot of infrastructure information about autonomous systems and companies: how their networks are structured, which parts are used for what, and what they do with their subnets.

Then you can do further interesting analyses. I asked: please give me all Microsoft IIS 5.0 servers. Who knows which operating system that is? It is Windows 2000. Then I told my database: give me all the TLS subject common names. So now we have a list of domains that are running on Microsoft IIS 5.0 on Windows 2000, a very, very old system, about 20 years old.
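As a sketch, such a query only needs a banner filter plus the certificate fields; the index and field names are again assumptions about the enriched documents:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(index="scan-https", body={
    "size": 100,
    "query": {"term": {"http.server": "Microsoft-IIS/5.0"}},  # Windows 2000
    "_source": ["ip", "tls.subject_cn", "tls.validity_start"],
})

for hit in resp["hits"]["hits"]:
    print(hit["_source"])  # domain names and certificate start dates
```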
Most people would say: OK, these are probably old services somebody forgot to shut down. But then look at the right side: the start of the certificate validity period. We can see that these servers are being kept running on purpose, because old, forgotten servers do not get new SSL certificates, right? People are running Windows 2000 servers, 20-year-old software, unsupported for 10 or 15 years, and providing them with fresh certificates. It is unbelievable to me that people do this, and it is also remarkable that these Windows systems are still running at all.

OK, now let's look at the dark side. We can also do some further analyses. I told my database: give me an overview of everything in Austria whose whois description looks like a government agency and that presents a TLS certificate from Fortinet. In other words, we are looking for Fortinet firewalls, which are highly vulnerable, at Austrian government agencies. And we found a lot: the descriptions are German and read like the administrative office of district one, district two, and so on. We can see that the issuer organization of the certificate is Fortinet, and on the right side you can see very nicely the different firewall versions. That was a scan from 2017. Then I scanned the internet again one year later, and now you can see they have replaced the Fortinet firewalls with Sophos appliances. Very interesting what you can see using time-differential analysis.

If you scan for industrial control systems, and this is a scan for S7comm and Modbus, you can see a very clear distribution, and there is a strong correlation with economic power. If you look at the former GDR region here, there are not that many industrial control systems; there are more in the south and the west. You have a lot of industrial control systems in northern Italy, a big economic area, and far fewer in southern Italy. The same goes for Scotland compared to England. It is very interesting that you can derive economic power from internet scans. And we can see it is a really big problem that, all over the world, people put industrial control systems on the internet. Here are some numbers: since S7comm is the Siemens protocol, it makes sense that Germany has the most Siemens industrial control systems, entirely plausible.

Now the next analysis. Here I looked for Heartbleed-vulnerable servers in April 2017 in Afghanistan. The autonomous system is the Afghan Telecom government communication network, and look at the whois description. I have never seen such detailed whois information: it is the government communication network for connecting 34 provinces and 357 districts to Kabul, Afghanistan. And now look at the issuer of the certificate: it is a pfSense firewall. So if you do this kind of analysis and combine the data with other data sources, you very quickly find very interesting results. I am sure the CIA has been doing this for years, right?
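Both the Austrian and the Afghan examples boil down to the same pattern: a keyword match on the whois description combined with a filter on certificate metadata. A hypothetical sketch, where the keyword, index, and field names are purely illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(index="scan-https", body={
    "query": {"bool": {"must": [
        # Illustrative keyword for district authorities in whois text.
        {"match": {"whois.description": "Bezirk"}},
        # Appliance vendors often issue their own TLS certificates.
        {"term": {"tls.issuer_org": "Fortinet"}},
    ]}},
    "_source": ["ip", "whois.description", "tls.issuer_org"],
})

for hit in resp["hits"]["hits"]:
    print(hit["_source"])
```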
OK, now we get to the really bad part. People put all kinds of things into the whois information. They even write: this is power plant XYZ. I cannot believe people do that. Here we can see a bigger one, a coal-fired power plant, and we have its VPN endpoint via HTTPS. This power plant has a capacity of approximately 500 megawatts, supplying something like 1.5 million people; it is not a small power plant, right? Now look: the scan date is August 2018, and we can see the certificate became invalid in July. OK, it is one month, but in the end this is critical infrastructure with an invalid VPN endpoint certificate. I did a rescan, I informed them, I emailed them, I contacted them, and I never got a reply. I rescanned in November, and the certificate was still out of date. Then I did some new scans and found they finally updated the certificate in April 2019. It is fucking critical infrastructure; how can people not care about it? I live in Germany, and I am not amused by that.

Now another, further very interesting analysis. Another power plant. Where is it? Perfect. First we search for "Kraftwerk", which is German for power plant. Here we only have the autonomous system name, the BGP prefix name, and the server banner; that is the information you would get from Censys or Shodan, because they do not use whois information. Now let's enable the whois information, and we get much more interesting results: we see RWE, a very big German energy company, and a power plant called Niederaußem. Looking at Wikipedia, what is Niederaußem? Oh, it is just the second-largest coal-fired power plant in Germany, with roughly four gigawatts of capacity. Very interesting, but what is this service? Oh, a LANCOM 7111 VPN gateway. LANCOM, hmm, for critical infrastructure, OK. Let's look: ah, there is an end-of-service device list. Maybe it has only just gone out of support, maybe it is a year old, I do not know. Let's check how long this system has been unsupported. And here you can see it: the LANCOM 7111 has been unsupported since May 2010. It is a nine-year-old, unsupported VPN gateway, and this scan is from mid-August, about one week old, so we can still see it live on the internet. Very interesting. Thanks.

And it gets better, right? Let's see which cipher suites it supports. Going back to the LANCOM device: oh, it is the most secure cipher we have in the world, TLS with RC4, right? Unbelievable. For such an old system it is not surprising, but still: we have a critical infrastructure law here in Germany, energy companies spend a lot of money on it, so how is it possible that there is a ten-year-old VPN gateway with an RC4 cipher sitting on critical infrastructure? I just do not get it. I have a lot more data, and I could talk for another 20 or 30 minutes, but my wife says I have to stop spending so much time on this data.
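Before the summary: findings like the expired power-plant certificate reduce to a simple range query over the enriched snapshot. A sketch, again with assumed index and field names:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(index="scan-https", body={
    "size": 50,
    # Certificates whose validity window had already ended at scan time.
    "query": {"range": {"tls.validity_end": {"lt": "now"}}},
    "_source": ["ip", "whois.description", "tls.validity_end"],
    "sort": [{"tls.validity_end": "asc"}],
})

for hit in resp["hits"]["hits"]:
    print(hit["_source"])  # longest-expired endpoints first
```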
So now let's go to the summary, and then I would like to discuss the results with you, OK? Summing it up: using raw data from internet scans is very important, and you have to enrich it with BGP, whois, and protocol-specific information. Only that gives you the power to do complete infrastructure analyses of the internet. It is very important for understanding the topology and the structure itself, and if you do your own scans with data enrichment, it becomes very easy to identify all the websites or vulnerabilities that belong to a company, to a critical infrastructure, or to government agencies.

Furthermore, how to do this? Here is my small cheat sheet. Get your own AS with a /22, meaning about 1,024 IP addresses. Scan no faster than about 70 Mbit/s. Use one SYN; two SYNs make no sense, because your scan takes about twice as long, you send too much data, and you get more entangled with IDS systems. Scanning the internet in two hours from a single source IP is not a good idea, because you will only see about 70% to 90% of the internet. You can do it, but then you know the data is not very reliable. Only scan BGP prefixes that are actually routed; that saves you about 35% of your SYN traffic and time. And if you want very fast, repeated scans, for example when a remote-code-execution vulnerability has just been released, use BGP prefix hitlists: scan the internet once, then re-scan only the prefixes that contained at least one host. That saves you a further 25% to 50% of the routed IPv4 address space. Thank you very much. If you want to reach me, just email me. I am also living here at the camp; look for the giraffe and the big tent. And if you are interested in scanning, please contact me. I know a lot of people who scan on their own and do not talk to each other, and that is why I am here. I would really like to set up a kind of scanning community: in the USA they have one, and here in Germany we do not. So please come around with a beer, and we can talk about how to scan; I have some further interesting information as well. Yeah, thank you very much.

So thank you, Johannes, for a super interesting talk. We have some time for questions. If you have any questions, please; we will have a microphone shortly, but in the meantime we will take a question from the internet.

Yes, the internet wants to know whether it is possible to access the scan data, or where you publish your scan results or research results.

Yeah, in the end I am not publishing the raw results, because this talk was aimed at explaining how you can set up your own infrastructure. You do not need as much money as we spent; the 30,000-euro server is because I want to scan a huge number of different protocols, so you can size it down. The aim of this talk was to show you how to do it on your own. But I do want to share the information with penetration testers, because they need this kind of data to map the complete attack surface of the companies they are dealing with. In the end, I do not know what people would do with all this data, and I do not want to be the guy who handed it to the bad guys. That is why I only want to provide it to some penetration testers.

OK, a question from the audience.

Hello, Naif from Italy. I am doing a project focused on Italy, doing massive scanning, and I came upon an interesting problem that you may already have figured out. Basically, those kinds of scans are not aggressive vulnerability analysis; they basically create a software inventory from the services you can connect to and speak to.
The problem I came up against is that, from a scan, I would like to determine the vulnerability surface, meaning derive some kind of indicator of whether a service is vulnerable or not. The very simplistic approach is matching against a CVE database via application version fingerprinting, but I have been told by multiple people that the number of false positives is pretty high. Do you have any insight on that?

Yeah, you especially need to look at Debian. The Debian distribution very often does this: if there is a vulnerability in, say, version 1.1.1 of a certain web server, they backport a patch but do not increase the minor version number of the service. So what you see is only a potential vulnerability. That is a big problem. The only way around it would be to actually try to exploit it, but you cannot do that as a researcher, and the bad guys are in a better position there. So we can only say what is potentially vulnerable. Yeah, I totally agree with you: it is a big problem, and I have no solution for it right now.

Next question. Thanks for the talk. Have you thought about scanning IPv6 prefixes?

Yeah, we have a current research project on that. Maybe next year. Thank you.

Any more questions from the audience or from the internet? OK, looks like we are done. So again, a big round of applause to Johannes. Thank you.