Okay, we're good. Excellent. Our next talk is being held by Fabian Mihailovic. He's going to tell us about vulnerable web applications. So without further ado, please welcome Fabian.

So, hi everybody. The topic today is "Don't scan, just ask", basically a new approach to identifying web applications during network reconnaissance and actually exploiting them. For this, a colleague of mine, Richard, and I developed a tool called SpiderPik. Basically, the tool tries to model the business relationships of a company and identify web applications in order to attack them, and to reduce some of the problems we have with classical network reconnaissance.

The table of contents: I'll give an overview of what we do today, what restrictions that has and what enhancements could be made, then a presentation of the tool we developed. Sadly, we won't release it due to German law, but we will explain how it works, and you should be able to implement it yourself pretty easily, I hope. Then I will show you some statistics from a real-life scenario, how it could look and how it works in practice. And at the end we have the conclusion.

So what do we have at the moment when it comes to classical network reconnaissance? Basically, you have the databases, you know, the NIC and RIPE databases. If you want to attack a certain company, you can look up IP ranges in those databases; you can have a look there and see information on the domains and networks. Of course, we have reverse DNS: we can try to resolve additional host names if we have an IP of the company, and hence maybe access some web applications. If you have access to a DNS server of the company which is misconfigured, you might have luck and perform a DNS zone transfer to identify new targets as well. Then we have the classical information gathering: just Google for it, search forums, search social networks. Or, for example, you could send packets to the systems, analyze how they are routed and which hops they take, and hence maybe identify new routers and components which might be hosted in additional network segments, and thereby identify new networks as well. Or you could brute-force DNS names and all that kind of stuff (a small sketch of such a reverse lookup follows below).

So that's nothing new. That's been there for years; we used it to identify targets when we have a company in scope. And you have tools for it, like the standard Linux tools whois and dig and all that kind of stuff, or Maltego, which is a pretty nice tool that sums up all these tasks in one simple-to-use tool in which you can perform them with just a few clicks.

However, that's not how business works in reality. You could say we have one target, let's just call it company A, with some nice web servers, databases, whatsoever. And you have satellite companies which are connected to that company: branch offices, partner firms from acquisitions and so on. And you might have service providers, sure, like payment providers which take your payments. You might have separate web hosters, because you don't host everything yourself. There might be service providers that ship your wares or manage your warehouse and so on. So that's how reality looks: you have one target, but you have many satellite firms and connected service providers. You have a whole mesh. In reality, what we see with network reconnaissance is that we basically concentrate on that one target. We look up netblocks, IP addresses, URLs. That's basically what we work with.
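Just to illustrate what this classical, purely technical reconnaissance looks like in code, here is a minimal Perl sketch of a reverse DNS sweep over a known netblock. The 192.0.2.0/24 range is only a placeholder, and in practice you would combine this with WHOIS lookups, zone transfer attempts and DNS brute forcing as described above; this is not part of the tool presented in the talk.

```perl
#!/usr/bin/perl
# Minimal sketch: reverse-resolve every address in a known netblock to
# find additional host names. 192.0.2.0/24 is just a placeholder range.
use strict;
use warnings;
use Socket;

for my $last (1 .. 254) {
    my $ip   = "192.0.2.$last";
    my $name = gethostbyaddr(inet_aton($ip), AF_INET);
    print "$ip\t$name\n" if defined $name;   # every PTR record is a potential new target
}
```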
And if we do that for a company, and we know what to look for because we are looking for company A, we hopefully get all of that information. For the other companies connected to it, we hopefully identify most of the information as well. But as long as you don't know which service providers and other companies are connected to the whole context, you probably won't see that whole part. You will just see this, and maybe some of those, but you won't see all the service providers and companies connected to the one in scope.

So that brings us to the restrictions we have at the moment. The scans are based solely on a technical level: we really concentrate on IPs, netblocks, URLs and so on. We don't take any business relationships into account. And we might actually get an incomplete view, because you can only search the databases for what you already know. If you know that company A is named A, you can only search for A; you will probably never look up service provider B. So not all targets are identified, and if you are an attacker, you probably don't go just for the one company in scope if you can attack company B, which provides you with the same result. If you are interested in, let's say, the classical example of credit card data, and it is hosted on company B as well, and company B is an easier target, then just attack company B; why attack company A?

So, again, we don't take any business relationships into account. The enhancement might be to build a logical network of systems that are somehow connected, independent of the actual owner of each system, just based on the goal you want to reach, on getting your job done. For that we developed a tool we call SpiderPik, and that's what this talk is about: what an approach of going new ways could look like.

SpiderPik tries to identify web applications in an automated manner, and tries to do that on a business level, on a pure keyword level. We structured it in four steps. In the first step you model the business, then you perform certain search queries, you calculate and rank the results, and at the end you exploit them.

During the modeling phase, we try to model the services and relationships of the company in scope. For that we basically just use a keyword list, a rated one, with details like tax numbers, imprints, product names and all that kind of stuff. The interesting part here is that the whole basis we use for the search from now on consists only of, or can consist only of, usual business terms of the company. You don't have to provide any technical details at all.

Then we perform the search, where we query search engines in an automated manner. We sum up the results, we perform WHOIS queries and check them as well, and we crawl the identified web pages and check for external links to new web pages. At the end we sum it all up, remove duplicates, and build a sorted list in which we score how probable it is that a web page is part of the company in focus. Finally, we pass the results to an exploitation framework and hopefully get access to the systems, or at least some nice vulnerabilities out of it.

So let's dig into the four steps. The idea is: web applications connected to a company probably contain things like an imprint, independent of the host or service provider. They probably contain terms like product names and all that kind of stuff. And we have keywords that are bad, like product names.
Probably you will just end up with Amazon, which hosts hundreds of thousands of those products. And you have terms which are very specific, for example a tax number. Not every web page will contain the tax number of company A; it will probably only appear on the company's own web pages, in their imprint.

So what we did is quite simple. As a first step we just assembled a list in which you have the business terms and assign each a score. You have significant terms and non-significant ones: company names are rated very low, while, for example, a specific imprint or tax number gets quite a high score. In practice it turns out that with around 200 terms you can already work pretty well, and that a scoring from 1 to 100 should be sufficient.

The next step is actually the main part. We take the keyword list and go to different search engines, Google, Bing, Yahoo, it's basically up to you, and start to search for the keywords, interpret the results and build up a result set. Basically we use a table in which the fully qualified domain name serves as the identifier, because that is an actual target, and if you use it as the identifier you can already remove duplicates. If you search for a term, let's say you searched for the company's imprint and identified it on some page, we add that row and assign the score of nine which was assigned to that particular keyword. If you then search for a product name and end up with the same FQDN again, with a new page, we add the new score on top. So the score increases: based on the keywords which are referenced to a certain FQDN, the score goes up, and the more relevant keywords a website matches, the higher its score gets.

The next step is that we resolve the IP and perform WHOIS queries for the FQDN and the IP. If the WHOIS information contains terms from the list as well, or maybe even the company name, then it's definitely a hit, and we can already mark it as such, because if the WHOIS information contains the company name, it is probably hosted by them. And you can add additional steps. For example, I wrote a module for a company that you know only operates within the European market: you can simply add a filter and only take systems into account which are hosted within Europe. So you can easily add additional steps to fit your business.

When we now come to the implementation, as I said, it's pretty easy. The main component can be written in about 500 lines of Perl code; that's what we did, because it was quite simple and straightforward. We integrated Google and Bing as search engines. We didn't even use a database; we just kept all the results in memory and wrote dumps to disk from time to time, because it runs quite a while, and if it crashes or goes down for some reason you don't want to lose all the results. Querying the NIC and RIPE databases is something you should take into account: every database provides its data in a different format, so you have to keep that in mind when parsing. And you have to keep in mind that you can't query as fast as you would like to, because the servers will block you. So you have to decide whether to rotate servers or throttle your queries. However, since you get 100,000 or 200,000 results, if you throttle by just one second the tool will end up running for days or even weeks. So you should try to rotate servers. A small sketch of how this search, scoring and WHOIS step can look is shown below.
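To make that step more concrete, here is a minimal Perl sketch of the search-and-scoring idea, under a few assumptions that are not part of the original tool: the rated keyword list lives in a tab-separated keywords.txt file, the search goes through Google's Custom Search JSON API (YOUR_KEY, YOUR_CX and the lang_de language restriction are placeholders you would have to supply), and WHOIS data comes from the standard whois command-line client. This is not the actual SpiderPik code; it only illustrates how the FQDN-keyed score table and the WHOIS hit check can be built.

```perl
#!/usr/bin/perl
# Sketch only: accumulate search-engine hits into an FQDN-keyed score table,
# then mark entries whose WHOIS record contains one of the keywords.
use strict;
use warnings;
use LWP::UserAgent;
use URI;
use URI::Escape qw(uri_escape);
use JSON qw(decode_json);

my $ua = LWP::UserAgent->new(agent => 'spiderpik-sketch/0.1', timeout => 20);

# Rated keyword list: one "score<TAB>term" per line, scores from 1 to 100.
my %keywords;
open my $fh, '<', 'keywords.txt' or die "keywords.txt: $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($score, $term) = split /\t/, $line, 2;
    $keywords{$term} = $score if defined $term;
}
close $fh;

# Hypothetical Google Custom Search call; filter=0 disables result filtering,
# lr restricts the language (you would repeat this per language and per engine).
sub search_urls {
    my ($term) = @_;
    my @urls;
    for (my $start = 1; $start <= 91; $start += 10) {    # the API pages 10 results at a time
        my $res = $ua->get(
            'https://www.googleapis.com/customsearch/v1'
            . '?key=YOUR_KEY&cx=YOUR_CX&filter=0&lr=lang_de'
            . "&start=$start&q=" . uri_escape($term));
        last unless $res->is_success;
        my $items = decode_json($res->decoded_content)->{items} or last;
        push @urls, map { $_->{link} } @$items;
        sleep 1;                                         # stay below the query rate limit
    }
    return @urls;
}

my %results;    # FQDN => { score => ..., terms => [...], hit => 0|1 }
for my $term (keys %keywords) {
    for my $url (search_urls($term)) {
        my $fqdn = eval { URI->new($url)->host } or next;          # the FQDN is the identifier
        $results{$fqdn}{score} = ($results{$fqdn}{score} || 0) + $keywords{$term};
        push @{ $results{$fqdn}{terms} }, $term;                   # repeated hits just add up
    }
}

# WHOIS check: a keyword (for example the company name) in the record marks a sure hit.
for my $fqdn (keys %results) {
    my $whois = lc qx(whois $fqdn 2>/dev/null);
    $results{$fqdn}{hit} = grep { index($whois, lc $_) >= 0 } keys %keywords;
    sleep 1;                                             # throttle, or rotate WHOIS servers
}

# Print the sorted, scored list, highest score first.
for my $fqdn (sort { $results{$b}{score} <=> $results{$a}{score} } keys %results) {
    printf "%6d  %-40s %s\n", $results{$fqdn}{score}, $fqdn,
        $results{$fqdn}{hit} ? '[whois hit]' : '';
}
```

In the real tool the same result table is also fed from Bing and from the external links found by the crawler, and optional modules, like the Europe-only hosting filter, work on top of it.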
When it comes to the search engines, for Perl at least there are many modules. Sadly, none of them really worked; many of them were deprecated or quite old. But using curl and regular expressions you can easily implement the search engine integration yourself. Google, for example, provides the Custom Search API that can be used. You can do 1,000 queries per day for free; however, that won't be sufficient for a real search. So if you want to use the service efficiently, you have to buy a search key, which is charged by the number of search queries you execute. For a normal list with 250 keywords, that results in around 200 dollars. One important thing to consider when implementing this is different languages: if you search in Germany, you will get different results than if you search in Spain. So you have to search in different languages to get all the results you want. And you have to disable the filters, so that Google doesn't just remove some of the results.

Same for Bing. You have the Bing API, which needs an AppID in order to use it, but you can request one at the Bing developer center for free, so you can use Bing as a search engine completely for free; you just have to throttle your queries. Here we don't have languages, we have markets, but it's basically the same: you have to consider different markets, and you will get different results here as well. And you have to disable the adult filter, because otherwise some results might get removed. Then you can easily just use those URLs; it's pretty straightforward. Just use curl and request those URLs: you put your query in here, you get the number of results per page, you can use a pointer to say where in the result set you want to be, you define the language, which is basically that parameter, and that kind of stuff. Same for Bing. Both of them can be integrated pretty easily.

The next step: now we have a scored list with the search results, and we can assume that, if the approach works, at the front of the list we will have web pages linked to the company. Web pages that are connected to a company probably contain links to further company web pages; they contain external links. So what we can do now is take the top of the result list and just crawl it. That way we are able to identify new web pages, which we can simply append to the list as well. Furthermore, if we crawl those pages, we already perform the first part of the vulnerability assessment, because we can get all the forms, URLs and form fields that are on each page and store them in the database. So the crawling part is already done.

In order to do this, we used a framework called Peter. Right here I have to give credit: the framework is a private framework, developed over multiple years by different security consultants working at various companies, and we just made a few slight modifications to it in order to integrate SpiderPik. So that one won't be released either, and it has not been written entirely by me. But basically, the crawler is written in Perl, goes through the site in order to perform the first part of the VA and collect external links, and it uses the database structure you see here. For each website you get the external URLs; for each internal URL it stores the content and the forms on that page, which are interesting for the actual attacks later on.
It also stores the form fields assigned to the forms, the HTML comments, the email addresses and all that kind of stuff. With that information you can already do quite a lot: you can check which web page takes credit card data, which web page contains a login, which page uses which technology, how much attack surface a page has, how many forms it contains. You already get all of that information.

So now we are at a point where we searched for pages, got them, crawled them, appended the additional results to the list, and have a scored list. If you plot it as a graph (it's not scientific at all, basically trial and error with some assumptions), you get this curve. This is the graph I made for the real-life scenario you will see later on; I guess it was around 130,000 results. One axis is the number assigned to each result, the other is the score, and you can see the shape of the function. We have very many pages with only one or two score points; they just contained one term, maybe a product name, and we can say they are probably irrelevant, so we don't take them into account anymore. Then we have pages with a very high score; that's probably Amazon or similar pages which match all the links and search terms you look for, so you find them in every search query. So you can cut it down to the area in between, and you can see it's only about 300. Out of 100,000 or 200,000 results we can cut it down to around 300 that actually look promising, with a score that lies between a very high one and a very low one, and those we have to look at manually. Besides assembling the keyword list, that is the only manual step: you have to take that list and click on each page, but for 300 pages that can be done within minutes; it doesn't take too long.

So now we have targets, and we have the information from our crawler in the database, so we know the form fields and all the dynamic parts of the web applications. What we can do now (and again, Peter wasn't written entirely by me, it's partly other people's work) is run Peter against those web pages. There are fuzzers for cross-site scripting and SQL injection, which are run against the form information stored in the database. We automatically run web application fingerprinting, which takes MD5 hashes of the websites and checks whether it's a known product like WordPress or whatever. We do cookie and session analyses, as well as fire up third-party tools like sqlmap, nmap and Nikto, run them against the targets, and at the end aggregate everything into one report. That's the interesting part: at the end we get a report with vulnerabilities out of keywords. At the beginning we never provided any IP or any domain, we just put in business terms, nothing technical, and at the end we get actual vulnerabilities out of it. That's pretty cool.

If you run that against a big, well-known company, you might get the following results. We used 287 keywords and ran it for seven days, including the actual vulnerability assessment. It could be optimized, but I think it's okay; you can fire it up and it runs.
The costs were 200 dollars, but that is just due to Google; if you don't integrate Google and just use Bing, or maybe Yahoo (I don't know whether Yahoo is free), you can basically get away with zero dollars. We identified about 150,000 unique FQDNs, and that's the point of the tool. It's not rocket science what we do here, but you can't do it manually; you can't perform all those searches and check everything, so the idea of the tool is basically to take a lot of that work off you. I applied a filter, since the company only operates within the European Union, and filtered out all systems not hosted within the EU, and we came up with these. You can see right here the roughly 300 systems that I took and checked manually, and in the end I found 223 applications connected to that company, and not just on company systems, but on external systems like customer services, or marketing campaigns for which marketing agencies had created their own pages, and so on. What shows that it works pretty well is that the official web pages were within the top 10 for the different countries, so it shows we are going in the right direction. And vulnerabilities? Yes, we identified some.

So that brings me to the conclusion. Classical network reconnaissance is good, but it doesn't take all aspects into account. At the moment we operate on a very technical level; we should elevate it to a more business level and really look at what the network of a company within its business looks like, and try to attack that. That's what SpiderPik was developed for. Note, and I say it again, that based purely on a keyword list you get actual vulnerabilities, and besides assembling the keyword list and checking the result set, you don't have to do anything; everything is completely automatic. Furthermore, SpiderPik provides you with some interesting information, for example how many results Google finds, how many Bing finds that Google didn't, how they correlate and so on. So it's pretty interesting to see how the search engines behave. Which brings me to the final conclusion: companies should not only focus on protecting their own systems, because as an attacker you will take the easiest way. So you should choose your service providers with caution, and you should try not to spread sensitive information across multiple systems if possible. I have to give credit to Richard, who developed the idea and the tool along with me. And now it's up to you for questions. Thank you very much for your interest.

Thanks very much. I think we have time for one or two short questions. Can you please put your hand up if you have one? Okay, you go there, I'll get to this one. Put your hand up. Yeah.

Hello. Will you, in a future version, try to add a module that accesses existing databases like business registries and so on, to map out the companies if you have a specific target in mind, for business intelligence purposes or the like?

That's a very good idea; I never came up with it. What we actually did to develop the keyword list was to do it manually: we looked up tax numbers on official web pages, we looked up product names while assembling the list. But that's indeed a very good idea; you could even automate it and just provide the company name, or a.com, and derive the rest. I will consider that for the next version.
Cool. I would just like to ask everyone who is leaving the room now to please do it quietly and move out the front door, so we can let new people in from the back. We have another question back there; please put your hand up again. No, first row, right here? Okay, we'll get to you. In the meantime we have another question from the IRC, then I'll come to you. nethunter80 asked whether the company that was tested was Sony.

No comment. No comment there.

Okay. Even if you can't release the tool, do you intend to provide it as a service?

Sorry? Even if you're unable to release the tool, are you going to provide it as a service?

I haven't thought about that yet. It might be possible, yes, to actually implement a web page where you can upload your keyword list and in the end get the results. That might be possible, but I haven't thought about it. I will keep it in mind, but at the moment nothing like that is planned.

Okay, that's all the time we've got, unfortunately. Thank you again, Fabian. Again, if you