We had a lightning talk scheduled, but the speaker didn't turn up; he had some issues. We asked a bunch of people, but no one else turned up either. So, Kunal will be speaking about a tool called Datasploit, an open source intelligence framework. Over to you, Kunal.

Hello, good evening, guys. Am I audible? Okay, better now. So, I'm here to present Datasploit, an open source intelligence tool. We are the three main contributors to Datasploit.

So, what is Datasploit? It's an automated OSINT framework. Basically, we have four major modules: domain, email, username and IP. You feed in any of these, say a domain like test.com, an email ID, an IP address or a username, and it fetches information from the many data sources that we have configured behind each module. It's written completely in Python, and currently it either prints to the console or exports the output in JSON format. The major use case for this would probably be a pen test: let's say for a domain you get a list of subdomains, and you might be interested in finding which of those subdomains might be vulnerable.

So, why Datasploit? A lot of information is openly available: usernames and passwords that might be accidentally pushed to GitHub, addresses, email IDs, phone numbers, credentials. All this information might be out there accidentally, or sometimes on purpose. So, in real time, as you can see, these are some screenshots of how that information shows up. If you do a Google search for, let's say, site:pastebin.com, you'll find database leaks, credentials that someone has dumped and then posted online.
Then you can do an "index of" search, using those Google search terms to find open directory indexes on different websites. Then there's finding credentials on GitHub that have been accidentally posted.

As I mentioned, we have four main components. These are basically the four main scripts that you use to trigger the various data sources: domain OSINT, email OSINT, IP OSINT and username OSINT. Then, the work in progress for us is company scoping, basically building a whole profile for a company; active modules, actually passing that information to active attack modules; CVE data aggregation, for the CVE IDs that you get from modules; and auto-generated reports.

As I mentioned, we have multiple data sources. As you can see, for domain we have about 10 to 15 data sources. This is how the data flows. From a domain, you can get a lot of email IDs from the email harvester module. Those get passed to the email module, which can again fetch a lot of information from its data sources. Say it returns enumerated usernames: you can pass those usernames down to get details, profile pictures, hashtags, the conversations the user has been posting. Similarly, from a domain you can get IP addresses, which you can pass down to find out which ports are open and which services are accessible.

So, how do you set it up? We have two modes. One is the manual mode, which uses a Git clone to set it up as a standalone tool. How do you do it? You do a Git clone and install the requirements. There's a config.py that holds all your API keys, because most of the data sources we have configured require an API key before you can use them.
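
As a rough illustration of the config.py idea mentioned above, here is a minimal sketch of checking which data sources have an API key filled in before running anything. The setting names and values are hypothetical placeholders, not Datasploit's actual configuration keys.

```python
# Hypothetical config.py-style settings. The names below are illustrative
# assumptions and do not come from the Datasploit source.
CONFIG = {
    "shodan_api": "",
    "github_access_token": "abc123",  # placeholder value, not a real token
    "fullcontact_api": "",
}


def configured_sources(config):
    """Return the setting names that have a non-empty value."""
    return sorted(name for name, value in config.items() if value.strip())


def missing_sources(config):
    """Return the setting names that are still empty."""
    return sorted(name for name, value in config.items() if not value.strip())


if __name__ == "__main__":
    print("ready:", configured_sources(CONFIG))
    print("skipped (no key):", missing_sources(CONFIG))
```

A check like this lets a run skip the sources that have no key instead of failing halfway through.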
Then we have a structured set of folders, each holding its own scripts. As you can see on the right, we have folders called domain, email, IP and username. So the domain folder has the scripts specific to domain OSINT. We also recently added pip support, so it can be installed as a Python package and used as a library in your projects. Then we also have a Docker implementation: you just do a docker pull and you have Datasploit running.

Then, how do you write modules? It has recently been converted to a framework. There's a single file called template.py in each folder. It has three major functions: a banner, a main and an output. Banner displays some text about the module. Main holds the main logic of the module, whatever processing needs to be done. And finally, the data returned from main is what gets passed to output, for printing on the console in a human-readable way.

We also have very nice step-by-step documentation on how to set things up in both the manual and automated ways, and on how to write your own modules. And then, the work in progress: currently we support JSON. We are planning to support HTML reports and text files, like email and subdomain lists, that can be passed to other tools.

Okay, so, the demo. Just to add to that, the documentation also has step-by-step guidelines on how to generate the API keys and how to sign up for the different services that we use. It tells you exactly which things you have to copy and put into config.py: the sites you need to visit, how to sign up for them, where to get an API key, and where to paste it in config.py.

Okay. So, let's say we have an IP address. You just pass it to the IP OSINT script. You can see it first searches Shodan.
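
Going back to the module layout described above (a banner, a main and an output function), the following is a minimal sketch of what such a module could look like. It is not the project's actual template.py: the function bodies, the `run` helper and the placeholder logic are assumptions made for illustration.

```python
def banner():
    """Return a short line describing what the module does."""
    return "[+] Splitting the domain into labels (placeholder logic)"


def main(domain):
    """Do the module's work and return plain Python data.

    A real module would query a data source here; this placeholder only
    splits the domain name so the example stays self-contained.
    """
    labels = domain.lower().strip(".").split(".")
    return {"domain": domain, "labels": labels, "tld": labels[-1]}


def output(data):
    """Print the data returned by main() in a human-readable way."""
    text = "\n".join(f"{key}: {value}" for key, value in data.items())
    print(text)
    return text


def run(domain):
    """Hypothetical driver: banner first, then main, then output."""
    print(banner())
    data = main(domain)
    output(data)
    return data


if __name__ == "__main__":
    run("test.com")
```

Keeping main free of printing is what allows the same module to be used from the console and as a library call that returns a dictionary.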
So, Shodan has no information. Then VirusTotal, to check whether or not this IP address was listed as compromised, and then some WHOIS information to get the ASN, the autonomous system number, to find more related IP ranges for this.

Before we go ahead: suppose you're looking into someone and you've got their email address. What kind of information would you like to collect about them? Any pointers? Anything else you would like to find out about a person, given an email ID? Yeah. Good. Sorry, what was that? True name. Good, exactly.

So, what we're trying to do with Datasploit is no rocket science. These are the things we would do manually, going to one site after another and finding information. We're just collecting a bunch of such sites, calling their APIs and getting the results sorted, so it all happens in an automated manner. There is a difference that Datasploit brings in comparison to other tools. A lot of other tools are really good, but one thing I really miss in them is an automated approach. You have to manually pick and use things, and sometimes you don't really know which modules you should use. For a person who does not do OSINT on a daily basis, it becomes very difficult. That's the problem we're trying to solve here: we're trying to do everything in an automated manner. So even if you do not know anything about OSINT, just configure your config.py and it will do things for you. It gathers the relevant information and displays it all in a nicely formatted way.

As he mentioned, let's say you get a domain, test.com, and you need to find out what information is available. We have all this information: PunkSpider, the DNS records that are there.
The email addresses harvested for that domain. These email addresses might interest you: let's say you find a first name, last name combination, then you can figure out what format the organization uses for its employees' addresses, and then you can actually brute force them. After this, you get basically everywhere this domain has been mentioned on GitHub. So, let's say, files where there might be usernames and passwords, or AWS keys. You can see the specific file where this domain name was seen. Then some page links, and then Pastebin. As you can see, for test.com you can have something like an email ID and then a password. This email ID and password will quite possibly work on other websites that you find related to this email ID.

Finding things on GitHub and Pastebin is something I always do in my pen tests. Whenever I'm doing a pen test and come across a domain name, I always make sure that I search for it on GitHub and Pastebin. You never know: sometimes you might directly get a DB connection string, or you might get a domain credential. Then you don't even need to exploit a vulnerability. It gets pretty easy for me.

Then that domain gets searched on Shodan for the ports that are open, as you can see. So you might find something like SSH open on an IP address related to test.com, which might be exploitable. And then all that information: ports 80 and 443, what services are running, what the geolocation is. That's a lot of information. Then the subdomains. As I mentioned, it might be interesting for a pen tester to find subdomains that might be exploitable.

And not only for a pen tester: even if you're running a company, or managing something for a company, you can check your perimeter.
Sometimes what happens is there are legacy subdomains, like abc.xpig.com, and you are not aware of them. Most legacy code is very much vulnerable, and it gets very easy to pop a shell out of it. So it's probably worth your while checking all the domains that are lying around.

Then we also search that domain for any WikiLeaks documents related to it. Then ZoomEye, similar to Shodan, gets all the IP-related information: the services that are running, the header information returned by those IPs.

Then, the next module we have is email... no, that's the wrong one. Pass the username. Yeah. So, let's say you get a username. Keybase is a very good source of information, because a user actually has to verify their own accounts there. So the information you get here is the profiles that really belong to the same user. You won't find mismatched data or false positives. So this is a very good way to find all the accounts that are listed for this user. Let's say you previously found this username on Pastebin, with a username and password combination in a paste. You can go to these sites and try that out. Obviously, that opens up a very good opportunity for you. Another thing this could be used for: a lot of people register their devices and leave the device names as they are. You can actually use those device names to launch a phishing campaign or trick the users into doing something stupid.

So, as I mentioned, we also allow you to... Sorry, yeah, question? So, we're not grabbing that from anywhere else. The person generally verifies their devices on Keybase. Keybase has an open API; you don't even need authentication for it. So it's pretty easy to get that out. Sure. Okay.
Yeah, that's a valid point. As of now, we are completely relying on information which exists online. As of now, no. We thought of doing that, but it becomes really tricky, because we would have to do a lot of correlation. We would have to match images and stuff. Right now we are making a best effort, but we are trying to work on those things. Exactly. Those things are on the roadmap. So, you can schedule things? Right now, no, we don't have that. As of now.

As I mentioned, we also support Datasploit as a library. If you do a simple import datasploit, you have it available to use in your own Python project. I'll just show a quick demo of it. Username, that's our main module for... oh, sorry, username. And then, let's say, the module in there is username_gitscrape. Then we have a main function, which each module needs to implement, and you pass in the actual username. Let's say I'll pass in my colleague's. There's a typo in the username... What's happened to your keyboard? I don't know, he messed it up. Okay. So, he hasn't defined any API keys for it, but if he had, you would have the data returned to you as a native Python object, probably a dictionary, a list, whatever is relevant to that module. For git scrape, you'll probably find the repositories that the user has, and in those you'll find the commits. So you can find out which are the most active repositories the user contributes to, and probably in there you can find something like a username and password combination. Yeah. And then, the last one. So, similarly...
So, rather than using the consolidated scripts, domain OSINT or email OSINT, the individual modules themselves are callable. You can just run python, then the domain folder and the specific module you need, say domain_subdomains.py, and pass in the domain name, and you'll get all the subdomains for it. For subdomains, right now we are using DNSdumpster, Netcraft and certificate transparency from Google. We plan to introduce DNS walking later on; that's in the plan.

Yeah, yeah. Most of the APIs have throttling, so they obviously return something like rate limit exceeded. And here's the script called emailOsint.py. Is that... No? Okay, I'm sorry. Sorry, was that your... So, we can't, but obviously we can get in touch with the specific... Yeah, exactly.

So, about emailOsint, this is what we do. The very first thing, it checks some basic things. For example, if you are running a phishing campaign, it will tell you whether it's a disposable email address or not, and whether this domain can receive emails or not. So if you are trying to send emails, it will tell you those things. It will fetch information from Clearbit, which tells you a bunch of things about the person: the name, the tentative location, the profile and so on. So, as you mentioned, for an email you get the true name and the usernames. Then it goes to FullContact. We are not relying on just one source; FullContact is another source we use. It tells you the name, the designation of the person, social profiles. We have about 500 profiles that we check for, and we are trying to get more. And then Twitter, and it will download all the images from these respective places.
What you can do is use these images for a Google reverse image search and so on, to get more information. Then it goes and checks Have I Been Pwned, whether this email ID was part of a compromise or not. As you can see, my email ID was part of three compromises. So you can go to other places where you can get these database dumps. You might get cracked hashes or passwords, and there's a chance that I'm using the same password somewhere else. So it's an easy win. It checks for pastes in other places. Right now my email is expired, so, yeah. That's probably what you were asking, right? SlideShare: well, the API sent us a forbidden. Otherwise, any slides which I have published, and associated subdomains for me. So this is the information you get from an email ID.

I think that's it for the demonstration. We have a Twitter account for Datasploit, where we post tweets and announcements. The same feed goes to Facebook too. This is a small roadmap that we have in mind. We are not lazy; we'll try to get this coded.

It's an ad hoc tool, so you run it as and when you want. You'd probably want to integrate it with automated tools, so that you can track incremental changes and find the difference between a previous run and the next run, right? Yeah.

Yeah, we can. I mean, there's no point writing the same code for Maltego, because we are writing it in our tool anyway. What we can do is write code that generates output which could be used directly by Maltego. That would be a better way. Otherwise there's no point in writing the feed, right? So, yeah, we can work along those lines. So far the tool has been covered in multiple places, but I don't want to brag about it right now.
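
The roadmap idea above, finding the difference between a previous run and the next run, can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes each run is exported as JSON mapping a category name to a list of findings, which is a hypothetical shape and not Datasploit's documented output format. The sample data is made up.

```python
import json

# Hypothetical JSON exports from two runs; the structure is an assumption.
PREVIOUS_RUN = '{"subdomains": ["www.test.com", "mail.test.com"], "emails": ["a@test.com"]}'
CURRENT_RUN = '{"subdomains": ["www.test.com", "dev.test.com"], "emails": ["a@test.com"]}'


def diff_runs(previous, current):
    """Return the findings added and removed per category between two runs."""
    changes = {}
    for category in sorted(set(previous) | set(current)):
        old = set(previous.get(category, []))
        new = set(current.get(category, []))
        added, removed = sorted(new - old), sorted(old - new)
        if added or removed:
            changes[category] = {"added": added, "removed": removed}
    return changes


if __name__ == "__main__":
    result = diff_runs(json.loads(PREVIOUS_RUN), json.loads(CURRENT_RUN))
    print(json.dumps(result, indent=2))
```

Categories with no change are left out of the result, so an empty dictionary means nothing moved between the two runs.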
The point I wanted to raise is how you can contribute. You can test the tool. You can write blogs about the tool. You can promote the tool. And since it's now a framework, we encourage you to go through the framework and implement the functions yourself: write modules and send a pull request. We'll try to integrate it. So, we are the core contributors. I'm Shubham, this is Kunal, and Newton is missing; he's in the other room. If you've got any questions, please shoot. That's it. Thanks.