Hey there, welcome everyone. We are kicking off the Q&A session here with Ayoub, who spoke about "Only Takes a Spark: Popping a Shell on 1,000 Nodes," which was pretty awesome, pretty cool. A good way of thinking about how to scale everything up: not just hacking one system, but how can you use that to really spread out? I heard this is also your first time speaking at DEF CON. Is that correct?

Exactly, yeah.

So we have actually... yeah, go ahead.

No, I was gonna say, we have a tradition here at DEF CON which we call "shoot the noob." Really it's just the tradition of, you know, taking a shot on stage for your first talk. It doesn't always have to be liquor, right? People mistake that. But we just want to do that with you now. So I want to welcome you in the traditional DEF CON fashion. Cheers!

Oh boy, okay. Okay. Oh, you've got to pour it first. Wait for it. Oh, I've got to pour it first. Yeah, there you go. Okay. There we go. Cheers. Awesome.

So now that you've been officially indoctrinated into DEF CON, and DEF CON Safe Mode... One of the first questions I'd ask: you recorded the talk a little bit ago. What have you been working on since? Anything new that you've come up with, anything you found, anything that's changed since the talk was given?

It has nothing to do with Spark, actually. I was working on Spark a few months ago now, but since then I completely switched subjects and topics. I've been working on AWS a little bit. I actually did a tool that reflectively loads DLLs and executables in memory, but I wrote it using Golang. So I completely abandoned that area of research and switched over to something completely different. We'll see what comes up next.

Good for you. That's a pretty good way to go: you get some recognition for one thing and then you move off to something totally cool, totally new.
It's excellent. Yeah, we like this idea of discovering new stuff.

Excellent. So is that a thing you find yourself doing often in your own research? Do you want to get a nice breadth of experience in a lot of different things, or how deep do you go before you feel like you're comfortable with what you've learned?

Basically, my idea is: go as deep as you need to start to understand how it works. Before this, I was looking into mainframes, and that took me I think two years of going through these 900-page documents published by IBM, and these obscure systems, and obscure forums actually talking about it. So I did that with mainframes, and then I let it go and switched over to other areas. During that work, I think Spark came up and I was like, what the hell is this thing? I dug into it and found there was not much research being done on it, and I thought, well, you know, nobody's really done it, and that's how it all started. It's just going after the next shiny thing, trying to understand how it works, and once you understand how it works you try to bend its rules to do whatever you want with it, and hopefully write a tool about it, give a talk about it, and then move on to the next stuff. How deep? Just enough to understand it, really.

So what was it about Spark, other than the fact that it was new and there wasn't other research? What was it that really dragged you into this one? Because it's an interesting system, as you showed us in your talk. What was the thing that made you decide: okay, fine, this is where I'm going to spend the next year of my life?

Right, so I was working on the offensive side, and then I switched to the blue side, and I was helping a company secure their systems, etc. And they were all on these new shiny platforms, if you will. Everything was on AWS, multi-region, CI/CD.
I mean, they didn't click a single button. They pushed everything to code and everything was deployed and scaled and so on. Very, very sexy stuff. So I was trying to help them secure that stuff, and after six or seven months we thought we did a pretty good job locking down pretty much everything that was supposed to be locked. There were no Windows machines, by the way; that's why we got to that state. But anyway.

And then I was talking with this data scientist, just trying to understand what they do, and he mentioned something about Spark. And I was like, what is that? And he told me, oh, it's something that we use to make calculations and parse data. I mean, what do you mean, parse data? How many machines do you have? And he's like, oh, actually I have 200 machines. Like, what do you mean, you have... He's like, oh, I spawn a cluster of 200 machines, and that guy over there spawns a cluster of 500 machines, etc., etc. So basically the company had thousands of machines running sporadically, if you will. And I was like, okay, talk me through it, please. And he showed me how he launched the job and how he made some calculations, and I was like, oh my god. And that's where all the interest in Spark came in, basically.

That's awesome. That's a great place to start. You start seeing, oh, a couple of commands and all of a sudden there are this many machines doing something. I want to know that too.

Yeah, he was writing Python code in a Jupyter notebook, where, you know, there was authentication obviously, and he was writing code and it executed over multiple machines, and I was like, dude, this is amazing. And so I dug into it, and yeah, that's where it all started.

Yeah, all right. So it's a pretty specific setup that you're talking about in your presentation. How common is the setup that you're speaking about?
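The notebook workflow he describes (write a few lines, have them fan out over hundreds of workers) is Spark's partition/map/reduce model. As a rough, pure-Python sketch of that shape, with illustrative function names that are not Spark's API, and with real PySpark distributing the partitions across cluster workers rather than a local loop:

```python
from functools import reduce

def partition(data, n):
    """Split data into roughly n slices, the way Spark partitions a dataset."""
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]

def run_job(data, map_fn, reduce_fn, workers=4):
    """Toy stand-in for a Spark job: each 'worker' maps over its own
    partition and reduces it, then the partial results are combined."""
    partials = [reduce(reduce_fn, map(map_fn, part))
                for part in partition(data, workers)]
    return reduce(reduce_fn, partials)

# e.g. summing squares over a pretend "cluster" of 4 workers
total = run_job(range(1, 101), lambda x: x * x, lambda a, b: a + b)
print(total)  # 338350
```

The point of the sketch is just the division of labor: the notebook only expresses the map and reduce functions, and the cluster manager decides which machines run which partitions.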
And how many other ways are there to configure this that might have more research opportunities?

Well, the thing is, a Spark cluster can be set up in very different ways. The cluster manager, so basically the component that is responsible for linking applications to workers, that one is replaceable. You can put many different components there. In my talk I only briefly demonstrated the case where it's Spark itself that is doing the orchestration. But you can have other setups where it's actually YARN, a component from the Hadoop framework, that is doing all the orchestration. And I think in 3.0 you can even have Kubernetes doing the work, so you have all your pods coming up to do the work, which is much sexier. And the one that I showed, the one that bypasses authentication, that one only works in Spark standalone mode, so when it's actually Spark that's doing all the stuff. If you have YARN in front, it's a completely different story. It's a completely different protocol, based on some Hadoop stuff. It's a completely different beast. It listens on a different port, etc. The tool that I released, Sparky, handles it, but I didn't go much deeper into it, so there's definitely some area of research there to go into. And that's the default mode when you're using the AWS managed service called EMR. If you're using EMR and you spawn a cluster using their service, it will by default use this YARN mode. So it will spin up YARN, and then the work will be done by Spark workers.

How widespread is it? What's the ratio?
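For reference, the three setups discussed map to different master URL schemes in Spark's own documentation. A quick summary, with placeholder hostnames (7077 is the standalone master's default port):

```python
# Cluster-manager modes discussed above, keyed to the master URL scheme
# each one uses when submitting an application (hosts are placeholders):
MASTER_URLS = {
    # Spark standalone: Spark itself orchestrates. This is the mode
    # where the auth-bypass replay discussed in the talk applies.
    "standalone": "spark://master-host:7077",
    # Hadoop YARN does the orchestration (the default on AWS EMR);
    # a different protocol on different ports.
    "yarn": "yarn",
    # Spark on Kubernetes: the work runs in pods.
    "kubernetes": "k8s://https://k8s-apiserver:6443",
}

print(MASTER_URLS["standalone"])  # spark://master-host:7077
```

These are the strings you would pass to `spark-submit --master` or to a session builder; which one is in play determines which attack surface, and which of the vulnerabilities discussed, is actually reachable.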
From the studies that I saw, it was around 50/50, maybe 40/60 depending on which website you look at. But basically, the more traditional companies that had Hadoop before will stick to YARN, whereas the new ones that have the luxury of starting a data cluster from scratch will maybe opt more for the Spark standalone cluster.

Okay, that makes sense. And 50/50 means there's stuff out there both to study elsewhere and to try and hit with the research that you've done. So, interesting.

Yeah, I think so. And, I would say, look, here's the thing: the infosec community is so focused on Windows. I find that there's so much other great stuff to talk about and to research. You know, in Windows, if you want to make a breakthrough, you have to go through 20 years of past research and try to find something new, whereas in these big shiny new technologies, well, it's right there. It's like the buffer overflows of 1999: it's right there. So there's a lot of opportunity there to be taken.

That's awesome. Yeah, and I'll ask a question too. You mentioned how much more efficient Spark is than Hadoop. Would you say companies that are still heavily relying on Hadoop are behind the times? Should they be looking at Spark, or is there better security there? What's your take?

Oh my god. Well, Hadoop has that one thing going for it: it actually handles Kerberos authentication. Now, whether that's a good thing or not, that's up for debate. But no, not really. My mantra, really... I work in blue team.
So this is, you know, the blue team talking, basically: production should be boring, to paraphrase what other people have said. Production should be boring. So if you have a Hadoop cluster of like 3,000 machines, and it does the work, and it's fun and everything is fine, then by all means continue. Same if you have a mainframe that does what you need to do. I've seen a post, actually, that emulates what Spark does using only shell commands. It works, it's much cheaper, and it runs on a single machine. You don't have all the partitioning and the shuffling, you don't have all the network latencies, you don't have all that overhead that makes it very, very slow in comparison with keeping everything in memory and just working on a big slice of an object. So, I mean, hey, whatever works for the company, with that particular set of talent, in that particular set of circumstances, with that data: go for it. That's my opinion.

Yeah, that's awesome. And you mentioned you have that blue team mindset. So I guess, how would you try to detect what you were doing during these attacks? Are there any particular tidbits you'd give to a company that's trying to secure their Spark instance, other than making sure it's patched?

Very interesting question. Here's the thing: I never thought about that. That's how decorrelated the two worlds are for me. It's like, I do blue team in the morning, I do red team at night, and that's how far apart they are.
I never thought, hey, it would be good to release some YARA rules to detect this stuff. It never crossed my mind. Future research, there you go.

No, but to me it's easier to patch. I know that a lot of people will not be able to patch, though. To me, the first thing you try is to patch. If you can't patch, then you want to detect, then you want to mitigate it somehow, so you need to isolate it, with network firewall rules or something like that, to basically isolate it from the rest and reduce your exposure, if you will. Now, if you can't do that either, well, you've got to detect it somehow.

How would you detect it? If you have some advanced correlation in place, I think you can detect it. I'm talking specifically about the exploit of replaying that single serialized object that triggers command execution. One way to detect that one is that, unlike other Spark interactions, in that interaction you only send one command, instead of the whole charade of: hey Spark, hello; which version are you running, master? Yes, I'm gonna just run the application. Okay.
Here are the workers, and so on. You have like 80 messages going around, but in that exploit you only have one message. So if you have a tool that's advanced enough to make these kinds of correlations, to say that, oh, this IP address, or this session, only initiated that single communication, that single packet, then maybe it's suspicious, because it didn't follow up with all the other stuff. That could give you a hint. You can also track the OnOutOfMemoryError errors that Spark throws on your machines. That may be a bit noisy, but it could get you going. But then again, a hacker could find another way to trigger the execution, because I only gave one example, using the OnOutOfMemoryError trigger; there could be another way. Yeah, those are the two main things to look for, off the top of my head.

Sure. We got a follow-up question from pc2: "Really well done talk. It looks like you did the magic on the Spark standalone, like we were mentioning there. Did you get it working with YARN too? Not a big deal if not, but I imagine it's possible. Also, 100 percent: production should be boring. Amazing."

Oh my god, okay, so here's the real story. I found this thing on Spark and I was like, oh my god, this is amazing, I feel so great. You know what? I'm gonna follow up, I'm gonna see how it works on YARN. So I brought up YARN, did the installation, made it work with Sparky, right? So, let's look for some vulns. I fired up Wireshark and I saw the traffic, and I was like, oh my god, I don't want to touch that. That's how foreign it was. That's how much more difficult YARN is than Spark standalone. It's really, really more difficult. And I looked at it and I was like, okay, I don't want to look into it.
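Stepping back to the detection heuristic from a moment ago: a normal Spark handshake is a long exchange of RPC messages, while the replay exploit sends essentially one. That can be sketched as a simple per-session message count over connection logs; the record shape, field names, and threshold here are invented for illustration:

```python
from collections import Counter

# Hypothetical log records: one (source_ip, session_id) pair per RPC
# message observed toward the Spark master.
rpc_messages = [
    ("10.0.0.5", "sess-1"), ("10.0.0.5", "sess-1"),
    ("10.0.0.5", "sess-1"), ("10.0.0.5", "sess-1"),  # normal chatter;
    # a real handshake produces dozens of these ("like 80 messages")
    ("203.0.113.9", "sess-2"),  # a single message, then silence
]

def suspicious_sessions(messages, min_expected=2):
    """Flag sessions with fewer messages than a normal Spark handshake
    would produce (the exploit replays one serialized blob and stops)."""
    counts = Counter(messages)
    return sorted({sess for (ip, sess), n in counts.items() if n < min_expected})

print(suspicious_sessions(rpc_messages))  # ['sess-2']
```

In practice the threshold would be tuned against real traffic, and as he notes, this only catches the known trigger; an attacker can vary the pattern.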
That's really what went through my mind when I tried to do that on YARN. Now, as for a similar vulnerability: I couldn't trigger it. I was playing around with YARN a little bit, but I couldn't trigger the same vulnerability. Does that mean it's not vulnerable in some way? Of course not. But by that time I was like, oh, YARN is just too much for me, I'm not gonna waste my time on it, I've got a bunch of other stuff. Maybe if you feel a connection with YARN, you can dig into it. Please do, because like I said, there is nothing out there on YARN; I didn't find anything. So probably there is some stuff to be found in YARN.

Well, that makes a lot of sense. So if somebody decides that they do want to pursue the YARN side and would like to ask you more questions, are you available for either consultations or for answering questions that people come up with when they're doing this on their own?

Yeah, of course. My Twitter is open.

Right at the end of this we'll have you post any contact information you'd like people to have access to in the Track One channel, and you can be there for people.

Yeah, definitely.
I mean, when you look at this stuff, like when you look at the Spark source code (it's beautifully written, by the way), it's written in Scala, or some of it is, and it can be very daunting; the syntax is weird. And when you look for information on the web that talks about Spark, there's very little. I think there are one or maybe two blog posts dedicated to the internals of Spark. Everybody talks about how shuffling works and how it partitions data and stuff, but nobody talks about how it works inside. So yeah, there are a lot of things to figure out, and some of it is simple stuff once you explain it, but getting that information takes a little bit of time, because nobody talks about those internals. So yeah, if you have any questions, please do not hesitate. Just hit me with a DM or ping me or whatever.

Cool. Yeah, and one of the things I thought was interesting too, through what you discovered: I was like, oh man, I wonder if he's going to report this to Apache, and then, oh look, you did. So I guess, how did that interaction go? How long did it take them to fix it? Was it a good interaction? Any tips for other people having to reach out to companies like that?

Okay. I sent them the vuln I think on December 24th or something like that. Very bad of me. Then I got a response from the Spark security team and they said, yeah, okay, we'll look into it, etc. But then I didn't receive a response after that, and then my talk at TROOPERS got cancelled. And I think one week before, because I was rehearsing a little bit, I decided, you know what? Let me just write them again to see where they are on this. And I got a response from the same guy saying, oh, you know what?
We looked it over and we discarded it because it's not interesting. I was like, okay. Let me rephrase. Oh, and when I reported it, I sent them an email with the proof of concept: Python code that actually executes the exploit fully. So the proof of concept was there; it was a really detailed email. And yeah, the guy said they discarded it, and I didn't understand. So I wrote a second email saying, are you sure you want to do that? Because you write in the documentation that when authentication is enabled, there is trust between this component and that component, etc. So you're kind of breaking this trust by allowing this vulnerability to go on, etc., etc. And then I get a response: oh my god, I'm so sorry, we made a mistake. We thought you were talking about the RPC endpoint that we already got a report for. Now, indeed, this is a dangerous vulnerability, etc. I mean, it happens. I do triage myself and sometimes I get it wrong, so this is normal. And then they went on to fix it. So all in all it took, I think, eight months; but counting from when they acknowledged that it was a vulnerability, I think it took like three months.

Gotcha. So persistence was key.

Well, I mean, my goal was not to just get it out there, to just publish it. I didn't release the tool; I didn't even talk about it. I think I talked about it to two people who were real experts in Spark, to validate that I was not saying bullshit, and I asked them not to disclose it. That's about it, because my intent was not to just, you know, release it. I am pro disclosure, but everybody should make their own choices, and it depends on my mood. So for that one I decided, you know what, let's just keep it on the low key anyway. And I think three weeks after I disclosed it, they released a complete fix, because it was not impacting only one function; it was impacting two others as well.
So they rewrote a class, etc. So we worked on the fix. Well, I worked on the original fix and then they took it over, because they're more competent than me. The thing that was really long was the release process, because they had the fix three weeks after I reported it, but from, let's say, the beginning of April they waited until July, and the new version, 2.4.6, to actually release the fix. So that's the part that was long: not actually fixing it, just releasing it. But yeah.

It's out there, and we're all more secure for your efforts. So it's, you know, better than having no one ever look in that area.

Yeah, definitely. And now that we're talking about that, there was an interesting point during this whole Spark adventure, and I think I briefly touched upon it during the talk: I spent a couple of hours, even days, trying to understand how the RPC stuff worked, only to find out a couple of days later that somebody had already published the whole thing. Yeah, that was very interesting. I think it happens to a lot of people: you find yourself drawn into that code base, trying to understand how it works, when all you have to do is google the right keyword to get the actual exploit. And when you do, you feel a sense of frustration. But it's part of the game.

Yeah. And that's pretty good advice for all of us who are into finding vulnerabilities and figuring out how to report them: that enumeration step applies to the software that you're working on too, and you have to read the state of the art of what people already know. That's a good thing to reinforce.

Exactly.

Awesome.
Well, so tell us a little bit more about the next thing that you're moving on to. It sounds like you already have an idea of what you're researching next.

Actually, I wrote down a small plan. Not a plan, but bullet points of what I want to do later. I just released this tool called reflect-pe, which reflectively loads an assembly and a PE object, and shout out to... get the name right, please...

If needed you can post it in the Track One channel at the end and I will post it.

Okay, so he goes by the handle of Ropnop. Sorry. But yeah, I wrote a tool called reflect-pe, which reflectively loads a PE executable in memory, and Ropnop wrote basically the tool to reflectively load .NET assemblies. So, Ropnop: great tool. And why did I do that? This is what I'm working on right now, and why I did it is something I find interesting: when you look at code to do the same thing in PowerShell and other languages, it's a huge script of like 2,000 lines of code, no tests whatsoever, and you just launch it, and if it doesn't work, well, too bad. You can't debug a 2,000-line PowerShell script. So I decided to write it from scratch using Golang, with testing and good dev practices, etc., so that anybody can just look at the code and understand what's going on, what the steps are, and where it didn't work. So this is what I'm really working on: trying to make a tool that will actually be easy to understand for people who are just getting into it and want to understand how reflectively loading something works on Windows. So that's the thing taking most of my time right now.

And my next big thing is Kubernetes. I really want to get into it. All of the talks that I saw, like 90 percent of them, talk about how to abuse a Kubernetes setup with no authentication, no RBAC.
No, nothing. I want to see what there is to exploit, you know, once you have all this security hardening in place. Is it possible to bypass it? I don't know: is it possible to bypass namespace isolation, access controls, etc., etc.? I really want to dig into it, because again, like I said, I really look out for these niche topics where there is not much talk, where there are not many tools available, and try to focus on them, and hopefully something comes out of that research. So, yeah.

That's awesome. That's awesome. Yeah, so we're kind of wrapping up here at the end of the time. Is there anything else you want to share with the group? Again, we can post your contact information, but is there anywhere you want to point anyone to before we end here?

No, nothing, except, basically: thank you for this. Thank you to the Spark team, first of all, because honestly, when you look at the code base, you understand how talented these people are, and it was really a pleasure and a privilege to actually have interacted with some of you, especially some of you that I saw on stage talking about it. I was pretty excited, though I tried to hide it a little bit. So thank you all.

But again, for people who want to go into infosec and break into it, give talks, do research, etc., my only advice is basically: look out for all these small niche topics that are basically abandoned by everybody, and go deep until you feel you understand it, and then just go crazy with it, and most surely something will pop out.

Awesome. Awesome. That's great advice. I appreciate that. Yeah, and I was going to say just one more thing, since you did actually do a live demo, right?
The one thing you're missing out on, not having the live DEF CON crowd, is, you know, a round of applause for having a demo work on stage. So since you actually did pull that off, we wanted to give you a nice round of applause, and get that in the Discord chat too. Just, you know, thank you for getting that working and everything.

Thank you very much.

Cool. So I think that's it for us. So, again, thank you for joining us, and thanks for the talk, and we look forward to what you have coming up next. Hopefully we'll see you next year at DEF CON.

Thank you very much. Thank you. Bye.