All right, aloha and welcome to our talk on uncovering stolen algorithms in commercial products. I'm Patrick Wardle. I am the founder of the Objective-See Foundation. And today I'm stoked to be co-presenting with a good friend and former colleague, Tom McGuire, who is an instructor at the Johns Hopkins University. So today we're going to talk about what I believe is a systemic issue affecting the community, and that's the unauthorized use of stolen algorithms in widespread commercial products. Figured if it could happen to us, it could happen to others, maybe even happen to you. So first we'll talk about the victim application, and then we'll talk about how we were able to find and prove that other commercial products were using its closed-source algorithms, and then how we were able to ultimately resolve this situation. So first let's talk about OverSight, the victim, whose closed-source algorithms, as I mentioned, were stolen by at least three separate, unrelated commercial entities for profit and gain. So in this part of the talk, I want to discuss OverSight, specifically its internals and its algorithms, as this is important to understand. This is relevant when we dive into showing the fact that this code was ultimately stolen. Now OverSight is a pretty straightforward, pretty simple utility written by yours truly. It was released in 2016, initially as a closed-source application. That's important because we'll see the infringement occurred when corporations actually reverse engineered the binary and re-implemented the algorithms. Its goal is pretty simple. It just seeks to alert you anytime anything accesses your mic or your webcam, and also to identify what process is responsible for this action. This was kind of the killer feature. No other tools at the time had the ability to identify the active or responsible process that was accessing the mic or the webcam.
Now OverSight was designed predominantly to detect stealthy malware that got onto your system, perhaps via a zero-day or some other infection mechanism. And when that malware would attempt to access the mic or the webcam, that gave OverSight the ability to detect it and throw up an alert. So on the slide, we can see an example of it detecting malware. And for the rest of the malware on the slide, OverSight was able to detect it with no prior knowledge of the actual malware. Because again, it was just alerting anytime anybody accessed the mic or the webcam. Turns out OverSight is pretty good at detecting zero-day vulnerabilities too, specifically those that relate to webcam or mic access, remote Zoom bugs, or even zero-days that Mac malware was exploiting. So again, on the slide, we have some examples of malware with zero-days or other vulnerabilities. And again, OverSight can detect that because it doesn't care how the mic or the webcam is accessed, just that it was. Finally, OverSight also played a pivotal role in uncovering good apps behaving badly. My favorite example was that it was able to uncover and prove for the first time that Shazam on macOS was actually still listening, even if you, the user, turned it off. Yikes. All right. So back to OverSight's killer feature, which is the ability to identify what process is accessing the mic or the webcam. For a security tool, this is obviously a must-have feature, right? If the process, the application accessing the mic or the webcam, is Zoom, Skype, FaceTime, that's fine, right? That should be allowed. Maybe don't even alert the user. But if it's some malware or some other unrecognized program, obviously you would want to alert the user. Now, you might be thinking, well, yeah, this is a great feature. Why didn't other tools have this capability? And the answer was it's actually very difficult to implement and to achieve. So on the slide, we have a few lines of code.
It's very easy to determine that the mic or the webcam was activated. macOS provides a notification for this that you can register for. But that notification doesn't tell you who did it. So now let's look at exactly how OverSight went about identifying the active process. It's a bit involved and leverages a bunch of undocumented features of the operating system. But again, it's important to understand this so that when we look at commercial products, we can prove without a doubt that they stole the code directly and verbatim from OverSight. So OverSight has three steps that it performs in order to identify the active or responsible process. Step one is enumerating Mach messages. So when an application wants to access the mic or the webcam, under the hood, kind of behind the scenes, the low-level APIs and frameworks will actually send a Mach message to the camera or mic daemon. So with this observation, I said, okay, cool, I can just enumerate Mach messages and see who was sending Mach messages to the camera or mic daemon to ascertain what process is responsible. Turns out you can't do that directly, you need special entitlements, but there's a command line utility that ships with macOS called lsmp, and it has the correct entitlements. So what OverSight does is simply execute lsmp. It's very straightforward: it spawns a child process, then reads everything from standard out and parses that. Now this parsing is a little complex, or at least a little involved, because the output from lsmp is not designed to be read programmatically. But we'll see when we talk about the commercial products that this is one of the indications, because they parsed it in exactly the same way. Now that list from lsmp might not be just one process, there might be several processes, so OverSight had to take other actions to narrow it down to the one exact process, the active process accessing the mic or the webcam.
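Step one can be sketched roughly as follows. This is a minimal illustration in Python, not OverSight's actual code. It assumes only what the talk states: lsmp is run as a child process, and the interesting values sit between parentheses in its output. The function names, the use of `-p` to select a target process, and the sample text are assumptions made for illustration, and the sample text is made up rather than real lsmp output. A regular expression is used here for brevity, although the talk notes later that OverSight itself parsed the text without one.

```python
import re
import subprocess

# Assumption: a PID shows up as a run of digits inside parentheses, e.g. "(456)".
PID_IN_PARENS = re.compile(r"\((\d+)\)")


def pids_in_parentheses(text):
    """Return the set of integers that appear inside parentheses in `text`."""
    return {int(match) for match in PID_IN_PARENS.findall(text)}


def run_lsmp(target_pid):
    """Spawn lsmp as a child process and return whatever it wrote to standard out.

    macOS only, and it may need elevated privileges. The `-p <pid>` argument
    is an assumption about how the target (for example the camera daemon)
    would be selected.
    """
    result = subprocess.run(
        ["lsmp", "-p", str(target_pid)],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# Made-up text standing in for lsmp output; the real format has more columns.
SAMPLE = """\
port-a  send  (101) ExampleChat
port-b  send  (202) ExampleRecorder
port-c  recv  (101) ExampleChat
"""

if __name__ == "__main__":
    print(sorted(pids_in_parentheses(SAMPLE)))  # [101, 202]
```

The result is a set of candidate PIDs rather than a single answer, which is why the later steps are needed to narrow it down.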
So for step two, I observed that in the I/O registry there are several undocumented key-value pairs that contain a list of PIDs, and that list included the active process which was accessing the mic or the webcam. Here's the code to do that. It's pretty straightforward because we can access the I/O Kit registry directly and basically just query these key-value pairs, specifically the IOUserClientCreator under the IOPMrootDomain. Again, these are undocumented keys, and again that's relevant when we talk about proving equivalency. We also have to do some parsing to then pull out the PIDs as well. Finally, because this list might have several PIDs as well, we do one final thing if we don't have one single process, and that is we sample the candidate processes: read their remote memory, look at their stack traces, and look at the APIs they're actively calling. Now again, macOS doesn't allow you to do this directly, you need special entitlements, but lucky for us there's a command line utility called sample that we can execute against the target remote process that we think might be accessing the mic or the webcam. This will give us a stack and thread trace, and what we do is specifically look for the CMIOGraph::DoWork function, which is related to reading frames off either the mic or the webcam. So via these steps OverSight was very accurately able to identify the process that was responsible for accessing the mic and the webcam, and again, since it was free, it became very popular. Unfortunately this popularity came at quite a cost. So now let's talk about how OverSight was torn apart and its secrets shamelessly stolen for commercial gain. First though, you might be wondering, how did this even happen? Was Patrick's computer hacked? Was the source code stolen? Now it turns out it was far easier, right? The binary is distributed, so anyone with basic reverse engineering skills could reverse engineer the OverSight binary and reconstruct its algorithm.
So from a technical point of view, trivial. From an ethical and legal point of view, not really that cool, again considering this was stealing from a free tool and then utilizing it for commercial gain. Also it's worth noting that OverSight's algorithm is first and foremost very unique. If you Google a lot of the strings or the actions it takes, there are zero hits. Also it's kind of janky, right? First and foremost I'm a security researcher, not a software engineer. And to give credence to this claim, we'll see that when Apple pushed out an update it triggered a lot of bugs in OverSight. Yikes. All right, so how did this all begin? How did I even think that someone was stealing my code? Well, actually I never thought that this would happen. Maybe I'm naively optimistic. So I was actually looking at a binary for a client that had been flagged by some antivirus products. It turns out it wasn't malware, it was one of these kind of suspect security tools. But as I was looking at it I noticed it was executing the lsmp binary, and I'm like, that's strange, I do that, and I haven't really heard of other people doing that. And the more I looked into this product, the more similar it looked to OverSight, especially when I then read their marketing material that said they provide the ability to monitor the mic and the webcam. And at that same time Apple pushed out some updates to macOS which horribly broke OverSight, embarrassingly. So on the top of the slide we have some bug reports that people submitted, basically saying, hey Patrick, you've got to fix your tool. So I did what any programmer does, Googled for fixes, and I found other users complaining about similar issues. And I was like, man, sorry y'all, I didn't realize this was so widespread. But reading the forums, the users had come to the conclusion that the bug belonged to another product, another tool. And so I grabbed those tools, did some analysis, and it turned out that again they looked very similar to OverSight.
At this point I decided to do some more proactive hunting. I wrote a simple YARA rule. YARA rules are normally used for detecting malware, but of course we can use them to detect other binaries as well. So I whipped up a simple YARA rule to basically detect OverSight's algorithm and then ran it across the internet, and again found some interesting hits that, when I triaged them, looked very similar to OverSight. So at this point I had a handful of commercial products that at first glance appeared very similar to OverSight. Of course, though, we had to dig deeper to prove without a doubt that this code in these commercial products came directly from OverSight. So I'm going to hand this over to Tom, and he's going to talk about how we were able to prove equivalency between OverSight and these commercial products.
All right, thanks Patrick. So first I just want to give a shout out to my wife and daughters, their birthdays are this week. Thank you. Thank you. So I've looked at a lot of Patrick's code over the years as well. For this section, when you look at the slides, on the left hand side is OverSight and on the right hand side are the products. As he mentioned, there were three products that we needed to investigate a little bit further. We only have time to go over two of them, so I'll try to go through them a little quickly since we're running short. OverSight's algorithm, as Patrick mentioned, has three unique steps to it. And as we go through the first two, I think you could make an argument that maybe someone was running these a little bit, but when we get to the third step, to me, having done a lot of reverse engineering over a number of years, I think we can reasonably conclude that this was taken from OverSight. So on the first slide here, this is the first product.
We have the lsmp parsing, and as Patrick mentioned there's a little bit of jankiness going on in his code, so he basically just tries to parse this by looking for what's in between the parentheses. I think there could be some optimization here with a regex, but you don't see that here. You see literally the same processing going on within OverSight as there is in this second product. We also see this in the method names as well, so if you look at the method names, they're very similar. So I think that's an interesting point, but it's not necessarily conclusive of taking the algorithm from one component to another. Another unique aspect of OverSight was looking at the I/O registry. When you search for these specific key-value pairs that are used in OverSight, you don't really get a whole lot of hits, so maybe after the talk we'll see quite a few more. When we look at this, again we see the same exact steps going on in OverSight as in this particular product. Right down, looking through this, again looking at the method names, this is a little telling that this could have come from another source. And for the final one for the first product, this aspect of it is the sampling. To me the sampling component of OverSight makes it very unique, especially when we tie it back to those other two components. And for a security tool that's free and open source, doing the sampling is fine, right? You're trying to eliminate those false positives and figure out which specific process has that access to the camera or microphone. But the sampling is a little invasive, right? It pauses the process, has to go through the backtrace, and provides that for you. And again, OverSight is looking for CMIOGraph::DoWork, which is really the worker thread for accessing the cameras and microphones.
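The sampling check described here can be sketched as a plain substring search over a captured trace. This is a hedged illustration in Python, not code from OverSight or from either product. The function names are hypothetical, the exact spelling of the symbol is an assumption based on how it is spoken in the talk, the trace text in the usage note is made up, and the `sample` invocation (a PID, a duration in seconds, and `-file` to choose the report path) is an assumption about the macOS utility's command line.

```python
import os
import subprocess
import tempfile

# Assumed spelling of the function the talk refers to as "CMIO graph do work".
MARKER = "CMIOGraph::DoWork"


def trace_shows_capture(trace_text, marker=MARKER):
    """True if the marker function appears anywhere in the captured trace."""
    return marker in trace_text


def narrow_candidates(candidates, traces):
    """Keep only the candidate PIDs whose trace contains the marker.

    `candidates` is an ordered list of PIDs from the earlier steps and
    `traces` maps PID -> trace text. PIDs without a trace are dropped.
    """
    return [pid for pid in candidates if trace_shows_capture(traces.get(pid, ""))]


def sample_process(pid, seconds=1):
    """Run the macOS `sample` utility against `pid` and return its report text.

    macOS only. Returns an empty string if no report file was produced.
    """
    with tempfile.TemporaryDirectory() as tmp:
        report = os.path.join(tmp, "trace.txt")
        subprocess.run(
            ["sample", str(pid), str(seconds), "-file", report],
            capture_output=True,
            check=False,
        )
        try:
            with open(report) as handle:
                return handle.read()
        except FileNotFoundError:
            return ""
```

With made-up traces such as `{101: "Thread_1\n  CMIOGraph::DoWork()", 202: "Thread_1\n  CFRunLoopRun"}`, calling `narrow_candidates([101, 202], traces)` keeps only PID 101. Because the check is just a search for one distinctive string, finding that same string in another product's binary is a strong hint about where its logic came from.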
So when we see this in the backtrace, that is evidence that this particular process is currently accessing one of those two. So it gives a way for OverSight to extend the first two steps with this third one and eliminate all those false positives. And if you look, this is exactly the same, right? CMIOGraph::DoWork. So it's very suspicious. So I think we can really conclude that this algorithm, as it's written, especially in the three aspects of it, came directly from OverSight. And again, we have the same drill here. We look at the second product. Look at the lsmp processing. In this case, there's a little bit of regex. So we give them a little kudos that they're trying to improve the performance. Maybe they saw some bugs in it and wanted to improve that. Great. But when we continue further, right, it's not just that one aspect. It's tying these together to get a better picture of what's actually going on. With the I/O registry, again, we see exactly the same processing here, with the IOPMrootDomain going down to the same key-value pairs that we see in OverSight. And finally, in this product as well, we see the sampling. Right? And sure, there are a few differences here. They do the sampling a little differently with respect to timing. Maybe that was just for improvements in performance because they are a commercial product, right? They don't want to hog all the resources, so they have to optimize that a little bit. But we see the DoWork method here, right? So this is sort of a dead giveaway. And as I tell my students, one of the things you want to do is look at the binaries, right? That gives us the ground truth. And so when we're comparing these two binaries, this gives us pretty good confidence that this algorithm was taken from the OverSight components. So I'm going to turn it back over to Patrick for some, hopefully, better news.
All right, thanks, Tom.
So at this point, we clearly have pretty indisputable proof that these products directly copied from OverSight. So now the question becomes, what do we do? How do we turn this into a happy or a happy-ish ending? Now, obviously, I was going to reach out to the perpetrators, but I quickly learned there was somewhat of a winning approach. First and foremost, I found it was really important to define or articulate exactly what you wanted. Did you just want money? Did you want to disparage them and flame them on Twitter? Did you want them to remove the code, or open source their tools? Knowing what you want was important, because they were always going to ask. Then, and this is important too, create irrefutable proof. The code comparisons Tom walked through are, to me, 100% obvious, but understand that you're probably going to be talking to lawyers or an intellectual property team that probably doesn't have such an in-depth technical understanding. So, provide those code comparisons, because they probably will go to the engineering team, but also understand you have to have something more high level. So say, hey, Google this string that's in my product and your product, there are zero hits. Please give me an explanation of why it's in both. Also, speaking of lawyers, I highly recommend getting your own. If nothing else, I found that when you talk to the other companies, the commercial entities who stole this, and you mention you have a lawyer, they take you far more seriously. I was lucky enough to work with the EFF. The EFF, if you don't know, is an amazing non-profit organization that provides free legal resources to security researchers and other non-profits. So, a big thanks and shout out to the EFF. Finally, it's probably wise to reach out professionally versus flaming them on Twitter. Also then, it's good to know what corporations want, and this is generally an amicable solution.
Also, and this was something of a learning experience for me, I really found that the majority, if not all, of the cases were the result of a single, arguably rather naive developer reverse engineering and stealing the code from OverSight, versus the malice of an entire corporation. When I first kind of figured this out, I was like, you know, these companies, they're evil, they're stealing from my non-profit. But once I gained a deeper understanding, what generally happened was that a developer had been tasked to implement a feature, they went out, couldn't figure it out, reverse engineered my tools, and then, you know, no one asked them, where did you get that from? Also then, what do corporations generally want? There are two main things. They want to cover themselves legally, and often this is achieved via a licensing agreement, sometimes retroactively as well. And also, they're very interested in not being disparaged, and for both of these they are often willing to provide financial compensation. So let's look at some win-win resolutions that came out of this, noting that all three of the corporations we approached eventually fully admitted fault and said, yes, we stole from you, which was kind of nice. The first company really was very quick to recognize their fault, so we see in the email they acknowledged it, saying, hey, yeah, wow, it's really not cool. They also took some steps, saying, we're gonna remove the code from our products that infringed upon your utility, and then we're gonna provide you financial compensation if you can give us a license. So kind of win-win. The other company had a similar win-win response: they clearly acknowledged the issue, which is really nice, removed the code, and then made a nice donation to the Objective-See Foundation. So let's wrap this up with some takeaways. First and foremost, if you're a developer, don't be naive like me. Don't assume your code won't be stolen, right?
I mean, I thought, hey, my tool is closed source. Someone would have to really premeditatedly reverse engineer it and then steal it verbatim, but it happened, right? Also, be proactive. The corporations aren't gonna come to you saying, hey, we stole your code. So maybe use some of the methods we talked about today and create YARA rules for your signatures. If your product has a very unique feature, maybe keep an eye on the competition, and if they implement that same feature, reverse engineer it and see how they do it. The bug reports we showed were kind of an interesting, neat way as well, which allowed us to uncover some other perpetrators. For corporations, it's really important to educate your employees on the topic, even just to reiterate that stealing code is not okay. I mean, I thought this would be obvious, but apparently that's not the case. And again, if you're a corporation, this will avoid serious legal issues and optics issues. Also, I think it's wise for corporations to implement various internal procedures to detect this. Or perhaps when a developer implements a really cool new feature, ask, where did you get this from? I worked at a variety of large companies and no one ever asked this question. I'm not gonna steal code, but still, having that extra question might have avoided some of these cases in the first place. And then finally, it's really important for the corporations to be amicable when someone reaches out. There were a few scenarios where they were a little defensive and really disagreed with the results. Once I showed up with the lawyers, they quickly changed their tune. And again, ultimately when they admitted fault, I was like, well, we could have solved this way more amicably up front. So that's a wrap. Before we jump into Q&A, I just wanna recognize the amazing companies that support the Objective-See Foundation. I also wanted to thank DEF CON for having us talk and, of course, all of you for attending our talk.
So I think we have a few minutes for Q&A. If not, I will be around here afterwards. Also, I'm gonna be at No Starch Press's table, signing my new book at 3:30. So if you wanna come grab my signature to steal my identity, I will see you there.