a few minutes for questions from the audience, okay? Over to you, Arjun. Go ahead.
Hello. Good afternoon, everyone. My name is Arjun. I'm a security engineer at Target Corporation, and I'd like to thank the organizers for giving me a flash talk slot here to talk about my idea. I definitely appreciate that. You must be wondering what a security professional is doing on a big data analytics platform. So, a couple of disclaimers. First, I've primarily been in the infosec field for a long time, and I've only recently started my journey into ML and data analytics. Second, I'm going to talk about one of the projects I'm working on. It's called MudPipe, which is an acronym for Malicious URL Detection for Phishing Identification and Prevention. I'm trying to figure out ways to apply data analytics and ML in the information security field to solve an infosec problem. It's not that there aren't commercial tools out there to do this. We have Blue Coat, we have VirusTotal, and things like that. But I'm a great fan of open source, and I'm trying to develop something internal to the organization. To set the context: phishing is the fraudulent practice of sending emails that look legitimate but aim to compromise user information, whether for identity theft, credential theft, or stealing sensitive payment information. From an organization's perspective, you have three drivers: the end user, the security operations team, and the business. For the business, 91% of cyber attacks actually happen through phishing, so it's really important for the business to protect the organization against that. They currently rely on commercial software to do it.
And the security ops team spends a lot of time looking through phishing indicators and malicious URLs to try and block them. So I considered three approaches to building a solution. The first was a whitelisting approach, the second was a rule-based approach, and the third was an ML-based approach. With the whitelisting and rule-based approaches, I felt things got a little complex: getting the data was a challenge and keeping it updated was a challenge, which is why I took the route of an ML-based solution. Getting a dataset is always a challenge. So I looked through some of the GitHub repos for related projects, I looked at the UCI dataset, and I was able to come up with a list of about 30 parameters that cybersecurity operations teams typically use to determine whether a URL is malicious or not. I included those as features in the model. Then I labeled all the records, about 10,000 of them, and that was my basis for building a supervised learning model. I started off with logistic regression, and the results were encouraging. Some of the features you typically use are: Does the URL have an IP address? What does the domain registration look like? Is the certificate expired? Is it a new one? What kind of HTML content does the page have? Does it submit data through a form? And so on. As for the pros and cons I observed: the pros, of course, are that it's an inbuilt solution and that it provides a more proactive approach to security. The cons, I felt, were that there is still an opportunity to do some fine-tuning on false positives, and also the need to keep the rule sets updated. So to conclude, I would say that I'm still getting started on this journey.
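To make the feature-based, supervised approach described in the talk concrete, here is a minimal sketch. It is not the speaker's code. The four URL features, the toy labeled URLs, and the function names (`host_is_ip`, `extract_features`, `train`, `predict_proba`) are all assumptions chosen for illustration, and the logistic regression is a small pure-Python version rather than any particular library.

```python
import ipaddress
import math
from urllib.parse import urlparse


def host_is_ip(url):
    """Return True when the URL's host is a literal IP address."""
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url):
    """Map a URL to a small numeric feature vector (illustrative features only)."""
    parsed = urlparse(url)
    return [
        1.0 if host_is_ip(url) else 0.0,           # IP address used as the host
        1.0 if "@" in url else 0.0,                # '@' present anywhere in the URL
        1.0 if parsed.scheme != "https" else 0.0,  # not served over HTTPS
        min(len(url), 200) / 200.0,                # URL length scaled to [0, 1]
    ]


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, epochs=500, lr=0.5):
    """Fit logistic regression weights (bias first) by batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            xb = [1.0] + x
            err = sigmoid(sum(w * v for w, v in zip(weights, xb))) - y
            for i, v in enumerate(xb):
                grad[i] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict_proba(weights, url):
    """Estimated probability that the URL is malicious under the toy model."""
    xb = [1.0] + extract_features(url)
    return sigmoid(sum(w * v for w, v in zip(weights, xb)))


# Hand-made toy data: (url, label) with 1 = malicious, 0 = benign.
TOY_URLS = [
    ("http://192.168.10.5/login/verify", 1),
    ("http://user@203.0.113.7/update-account", 1),
    ("http://secure-login.example.test/@confirm", 1),
    ("https://www.example.com/", 0),
    ("https://docs.example.org/guide", 0),
    ("https://shop.example.net/cart", 0),
]

if __name__ == "__main__":
    X = [extract_features(u) for u, _ in TOY_URLS]
    y = [label for _, label in TOY_URLS]
    w = train(X, y)
    for candidate in ("http://198.51.100.23/bank/login", "https://www.example.com/about"):
        print(candidate, round(predict_proba(w, candidate), 3))
```

A real model along the lines of the talk would use the roughly 30 features and 10,000 labeled records mentioned above, including signals such as domain registration and certificate status that need network lookups and are left out of this sketch.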
And it's a great opportunity for me to come and talk about this at a big data conference, because if I'm able to engage with people who have deeper expertise on the ML and big data side, I think this definitely has the potential to be open sourced.
Okay, Arjun, stay, stay, don't go. It's your chance for questions, people who have done something like this or tried something. Yeah, go ahead.
Yeah, hi, this is quite an innovative approach. So I wanted to ask: can whatever you have done be extended to something else? For example, identifying PII or PSI data, because that's also something similar, where you whitelist or blacklist certain parameters?
That's a very good question. When you look at identifying PII or sensitive data, typically we look to the domain of DLP solutions, data loss prevention solutions. There, I feel a rule-based approach would also work, where you can have patterns and whitelisting based on those. But I definitely think there's an opportunity for deploying an ML-based solution there as well. The challenge there, again, is getting a dataset. I've actually done a little bit of research on that, and getting a dataset is a challenge, so I'm in the process of creating more of a dataset. But that's really a good thought. If you have any thoughts on that, we can discuss it further.
Excellent. One more question, if anyone has one. We have time. Okay. Well, the usual applause for Arjun. Thanks, everyone.
Okay, next, Vishal from Freshworks. His topic is another thing: insights from sales conversations. So you have