Hey, guys. Welcome to the talk: A Picture Is Worth a Thousand Words, Literally: Deep Neural Networks for Social Stego. Let me do a quick intro for both of us first. I think we have orthogonal skill sets that complement each other nicely for this kind of talk. I'm a data scientist at ZeroFOX, so I work in social media security. Before this I did my PhD, where I studied biological neural networks that were a lot more detailed than the kind I'll talk about today. More and more, though, I've been looking into how to use these types of networks to study big data, and social media data specifically. And here's Mike. Thanks, Phil. My name is Mike Raggo, and I've been doing research in steganography for many, many years. Chet Hosmer, another gentleman who's here today and with whom I presented at Skytalks and later at the Wall of Sheep, and I authored a book called Data Hiding a few years ago. We tried to explore new, genuinely groundbreaking methods rather than write yet another stego book, and this talk is a spin-off of that work, which we'll tie in. I'm super honored to be back. I last spoke here at DEF CON 12, so I'm maybe a bit of an old-school graybeard. At that time I spoke on stego and authored a tool in VB called StegSpy, and I'll go into a bit more detail on how that bridges to what we're discussing today. When we looked at the theme for DEF CON 25 this year, unintended uses of technology, Phil and I had already been brainstorming this topic, and we thought, wow, some of the research we had already started would fit in perfectly with the conference.
So, to leverage that expertise, with me more on the stego side and Phil more on the ML side, we put our two heads together and figured we could create a really cool presentation, some really cool research, and ultimately a really cool tool as well. I'm sure the majority of you are familiar with 2600 Magazine, The Hacker Quarterly. I've been collecting it since at least '97, and the magazine predates that. If you look at some of the covers under different types of light, you'll actually find hidden messages on the front pages. So if you've got an archive of the old issues and you're hanging out one night having a few beers, it's kind of cool to break them out and see what messages you can find. Here's our agenda. We'll go through the evolution of steganography to bring you up to speed, focusing on what led up to our idea of using social media for covert communications and on some of the cool research by others that inspired ours. Then we'll get into DIY, or do-it-yourself, social stego. I'll walk you through a lot of the testing we did to vet the various methods for hiding data across social networks, in images and now in audio and video, and the different insertion techniques we employed. Then Phil will take over and get into deep neural networks for social stego and red- and blue-teaming approaches, and we'll collaborate on the wrap-up and some real-world use cases. One of the things I always come back to, especially when I present at forensics conferences, is taking a step back and asking what covert communication actually is. If you refer back to the U.S.
Department of Defense Orange Book from 1985, it describes a covert channel as any communication channel that can be exploited by a process to transfer information in a manner that violates the system's security policy. If you take a step back and look at steganography, or covered writing, and secret communications before the internet era, you had The Codebreakers by David Kahn. Anybody here ever read The Codebreakers? Really, really good book. It came out in 1967, with a revised edition in 1996, and it takes you through the history of covert communications dating back to ancient times, with the Egyptians, the Romans, and the Chinese. From it you can get a lot of ideas for ways to communicate covertly in the digital world. I refer back to it daily; I keep it right on my desk. It really inspires me to step away from the digital side and ask what methods they were using way back when. Looking at the internet era and its evolution, Niels Provos and many others were analyzing stego in the 90s and early 2000s, and lots of apps were coming out for various operating systems that let you apply different steganographic techniques. These focused primarily on hiding in images, whether JPEGs, GIFs, or other formats. You could additionally employ crypto: not only hide the message within an image using a variety of techniques we'll talk about next, but also encrypt it and even disperse it across the image. People have certainly expanded on that since, and there's a plethora of different ways to do this that we'll cover next. As things progressed, people started looking at the mobile side of this.
There are more than a thousand mobile apps that let you use steganographic techniques to hide a message, hide a picture within a picture, or hide content in audio and video. Chet Hosmer and I have also presented here over the last few years at both Skytalks and the Wall of Sheep, covering techniques for hiding within different video formats, and later today we're presenting on covert TCP, covert UDP, and covert Wi-Fi. So there are a lot of ways to apply steganographic techniques to hide information, and that's what led up to what we're covering today on the social side. Let me revisit some of the work that inspired us. One app, OpenPuff, expands on what we've already described: I have an image in which I can hide content. Maybe I mess around with the metadata, maybe I append to the file, maybe I use an LSB technique, a DCT technique, or something else entirely. OpenPuff said, I'm going to hide my content across multiple images, and I'm going to throw off the forensic investigator by creating decoy files as well. So if you were the investigator with all of the images, trying to piece the original message back together, you'd be thrown off by some of the pictures carrying decoy data. That makes it extremely difficult not only to identify that hidden content is there, but also to put it all back together or to perform steganalysis to reveal the original hidden content. A few years back we also saw Operation Shady RAT, when the research surrounding it was released, right?
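The multi-carrier idea can be illustrated with a toy sketch. Be clear that this is not OpenPuff's actual format or algorithm. The sketch spreads a length-prefixed payload over a key-selected subset of carrier slots and fills every other slot with random decoy bytes of the same size. In a real tool, each slot would then be embedded into a separate cover image. The shared integer `key` and both function names are hypothetical.

```python
import os
import random
import struct
from typing import List


def split_with_decoys(payload: bytes, n_carriers: int, n_real: int, key: int) -> List[bytes]:
    """Spread payload over n_real of n_carriers equal-sized slots; the rest are decoys."""
    if not 0 < n_real <= n_carriers:
        raise ValueError("need 0 < n_real <= n_carriers")
    data = struct.pack(">I", len(payload)) + payload  # 4-byte length prefix for recovery
    size = -(-len(data) // n_real)  # ceiling division: bytes per slot
    # The shared key decides which slots are real and in what order they are read.
    real_slots = random.Random(key).sample(range(n_carriers), n_real)
    slots = [os.urandom(size) for _ in range(n_carriers)]  # decoys by default
    for i, slot in enumerate(real_slots):
        piece = data[i * size:(i + 1) * size]
        slots[slot] = piece + os.urandom(size - len(piece))  # pad so all slots match
    return slots


def recover(slots: List[bytes], n_real: int, key: int) -> bytes:
    """Re-derive the real slot order from the key and reassemble the payload."""
    real_slots = random.Random(key).sample(range(len(slots)), n_real)
    data = b"".join(slots[s] for s in real_slots)
    (length,) = struct.unpack(">I", data[:4])
    return data[4:4 + length]


slots = split_with_decoys(b"meet at dawn", n_carriers=6, n_real=3, key=1234)
print(recover(slots, n_real=3, key=1234))  # b'meet at dawn'
```

Padding every slot to the same length is deliberate. If decoys and real pieces differ in size, the decoys stop being convincing.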
One way that malware was weaponized was a callback to a WordPress site, or other sites for that matter, to fetch updated instructions hidden within an image, which it would parse and extract to update its command-and-control information. There was also some great research done at SANS that expanded on alternate data streams in the Windows NT operating system to create stealth alternate data streams. Some of the things built into the NT operating system, like reserved device names such as LPT, can be exploited with alternate data streams in a much more stealthy manner. Bridging that to today, there are lots of protocol exploits for this as well. I did some research on hiding data within a smartwatch, which I presented at DEF CON Demo Labs, and on hiding data within MP3s, which we'll come back to at the very end of the presentation when we discuss where our research is heading. So here's how we break steganography into categories, many of which we explored for this research. Linguistic stego modifies the text by adding words, additional text, or misspelled words; these and other linguistic approaches let you hide information in a very simple way. We've seen this employed on Twitter, other social media, and Pinterest: you post something to one of these networks, and although to us it looks like a bunch of words that don't make much sense, the intended recipient, a bot or something else, extracts the pieces it's interested in. From an image standpoint, there are a lot of methods you can employ. In JPEGs, for example, you have EXIF or JFIF metadata at the beginning of the file.
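Parsing EXIF takes a fair amount of code, because EXIF is an APP1 segment that wraps a TIFF structure. For that reason, this minimal sketch uses the simpler JPEG COM (comment) segment, marker `FF FE`, as a stand-in. It lives in the same header region as EXIF and JFIF. The sketch inserts a COM segment right after the start-of-image marker, then walks the header segments up to start-of-scan to read it back. The function names are hypothetical. As the testing described later shows, this only survives on networks that don't strip metadata.

```python
import struct
from typing import List

SOI = b"\xff\xd8"  # JPEG start-of-image marker


def add_comment(jpeg: bytes, message: bytes) -> bytes:
    """Insert a COM (FF FE) segment carrying `message` right after SOI."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG: missing SOI marker")
    if len(message) > 65533:
        raise ValueError("one segment holds at most 65533 payload bytes")
    # The 2-byte big-endian length field counts itself plus the payload.
    segment = b"\xff\xfe" + struct.pack(">H", len(message) + 2) + message
    return SOI + segment + jpeg[2:]


def read_comments(jpeg: bytes) -> List[bytes]:
    """Walk header segments until start-of-scan and collect COM payloads."""
    comments, i = [], 2
    while i + 4 <= len(jpeg) and jpeg[i] == 0xFF:
        marker = jpeg[i + 1]
        if marker == 0xDA:  # SOS: entropy-coded image data follows
            break
        (length,) = struct.unpack(">H", jpeg[i + 2:i + 4])
        if marker == 0xFE:
            comments.append(jpeg[i + 4:i + 2 + length])
        i += 2 + length  # skip marker (2 bytes) plus the segment body
    return comments
```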
Some of that metadata is written when you take a picture on your smartphone. You post it to social media, and unless the network removes the metadata, it could include your location, the phone the picture was taken with, the date and time, and a variety of other information. Those same metadata fields can also be used to hide data. There are other techniques too: you can append data beyond the end-of-file marker, or use a least significant bit technique or a frequency-domain technique. We'll talk about those in more detail, including how we used them within social media. We've also done lots of research across audio and video. Remember that an MP3 typically has a copy of the album cover, a JPEG, embedded within it. If I can hide stuff in an image, why can't I hide it within the JPEG album cover embedded in the MP3 itself? And the people who author a lot of these stego programs may additionally employ different cipher techniques: Vigenère, which, for those of you who know, was used at Cisco for a long time, XOR, and many other types of crypto. All right, so what did we actually do in our research? Social networks offer a variety of images you can target, and that's exactly what we did in our testing. We took an ethical approach, but asked: to what extent can I hide information in a profile image or a background image, or in images I post or in an album, book, or collection I create? Can I do that over DM? Or can I post a link that points to another site where the image with the hidden content resides, and have it rendered within the social network on that particular post or page?
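Here is a minimal sketch of the two cipher families just named, for pre-scrambling a payload before hiding it. The first is repeating-key XOR, the byte-level cousin of Vigenère. The second is a classic A-Z Vigenère. Both functions are illustrative and the names are hypothetical. These ciphers are obfuscation rather than real encryption, because a repeating key is easy to break.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; applying it twice with the same key restores the data."""
    if not key:
        raise ValueError("key must be non-empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Classic Vigenere over A-Z; other characters pass through without using key letters."""
    out, j = [], 0
    for ch in text.upper():
        if "A" <= ch <= "Z":
            shift = ord(key[j % len(key)].upper()) - ord("A")
            if decrypt:
                shift = -shift
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
            j += 1
        else:
            out.append(ch)
    return "".join(out)


print(vigenere("ATTACKATDAWN", "LEMON"))  # LXFOPVEFRNHR
```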
When you analyze these image formats as potential carriers, it's important to determine how much data you can hide in them and what compression methods they employ. Among JPEG, PNG, TIFF, GIF, and BMP files, PNG, GIF, and BMP (and usually TIFF) are lossless, while JPEG is lossy. Even with GIF, though, one might argue that although its originally patented compression is lossless, you can still lose data when an image is converted to GIF, because the format is limited to a 256-color palette. Bottom line, a lot of these image formats use compression, and much of that dates back to the early days of the web: a smaller file rendered much quicker when you visited the page over a modem connection. Today there's a lot of published information about those compression techniques, which can be targeted, and about how the formats are structured in terms of metadata, file markers, and other characteristics. So in our research we asked: of all of these, which can we actually model in our testing and test against each social network and all the variables we outlined before? Least significant bit, for example, lets you set just the least significant bit of each byte to a zero or a one to match your message bits, either sequentially across the file, dispersed, or at specific locations repeated throughout the file. The recipient, who may use the same program or technique, extracts all of those least significant bits and puts them back together to reveal the original ASCII text, reassemble a hidden image, or recover whatever else was hidden. Another technique we employed relies on the fact that, when social networks render a JPEG file, they often look for the end-of-file marker, which in a JPEG is FF D9.
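The LSB description above can be made concrete with a minimal NumPy sketch. It assumes an 8-bit image array and a 4-byte big-endian length header; both are illustrative choices, not a standard. With `key=None`, bits go into consecutive channel values. With an integer key, they are dispersed over positions chosen by a key-seeded NumPy permutation, which the recipient reproduces with the same key. The function names are hypothetical.

```python
from typing import Optional

import numpy as np


def _positions(n: int, key: Optional[int]) -> np.ndarray:
    """Sequential positions, or a key-seeded shuffle when dispersing the bits."""
    if key is None:
        return np.arange(n)
    return np.random.default_rng(key).permutation(n)


def lsb_embed(pixels: np.ndarray, message: bytes, key: Optional[int] = None) -> np.ndarray:
    """Write a 4-byte length header plus `message` into the LSBs of a uint8 image."""
    flat = pixels.astype(np.uint8).ravel().copy()
    payload = len(message).to_bytes(4, "big") + message
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    if bits.size > flat.size:
        raise ValueError("message too large for this cover image")
    pos = _positions(flat.size, key)[: bits.size]
    flat[pos] = (flat[pos] & 0xFE) | bits  # clear each LSB, then set it to the message bit
    return flat.reshape(pixels.shape)


def lsb_extract(pixels: np.ndarray, key: Optional[int] = None) -> bytes:
    """Read the 32-bit length header, then that many bytes of LSBs."""
    flat = pixels.astype(np.uint8).ravel()
    pos = _positions(flat.size, key)
    length = int.from_bytes(np.packbits(flat[pos[:32]] & 1).tobytes(), "big")
    bits = flat[pos[32 : 32 + 8 * length]] & 1
    return np.packbits(bits).tobytes()
```

Each channel value changes by at most one, which is why plain LSB is imperceptible. It is also why any recompression that touches those values destroys the message.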
What we found in a lot of instances was that we could do something as simple as appending content after the end-of-file marker. Some of these social networks completely ignored the extra content when rendering: all you see is what comes before the end-of-file marker. Yet survivability was 100%, because when we downloaded the file after uploading it, the appended content was still there. That leads to my last point, a technique I personally had never tried before: upload the image and see to what extent the network jams it, recompresses it, or strips the metadata, then take the downloaded file, hide content in it, and post it back. Does the social network see it as a file it has already processed, already in its own format, and leave it alone, or does it recompress it again? That was also part of the testing. Here's the high-level testing workflow: we applied these hiding techniques to images, uploaded them to a variety of social networks, then downloaded them and analyzed the differences and which content did or did not survive. This spreadsheet breaks down the results of the initial testing. As you can see, for Pinterest, Slack, and others, we went through the process: try the profile photo, post an image as part of a post, try the background image. Does a picture residing in an album, a collection, or a book have any impact one way or the other? And then the round trip: I uploaded the file with the hidden content, and the network recompressed it and removed the metadata, destroying, or essentially jamming, what we had hidden. But if I then download it, modify it, and repost it, does the content survive?
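The append technique, and the round-trip check that goes with it, fit in a few lines. This is a minimal sketch, assuming a trailer layout of payload, then a 4-byte length, then a 4-byte tag. The tag `STEG` and the function names are hypothetical. Putting the length and tag at the very end means the payload can itself contain `FF D9` without confusing the extractor. If a network re-encodes the file or truncates it at the end-of-image marker, `extract_payload` returns `None`. That is the survive/doesn't-survive check in miniature.

```python
import struct
from typing import Optional

EOI = b"\xff\xd9"  # JPEG end-of-image marker; renderers stop reading here
MAGIC = b"STEG"    # hypothetical tag identifying our trailer


def append_payload(jpeg: bytes, payload: bytes) -> bytes:
    """Append payload after the image data, followed by its length and a tag."""
    if EOI not in jpeg:
        raise ValueError("no JPEG end-of-image marker found")
    return jpeg + payload + struct.pack(">I", len(payload)) + MAGIC


def extract_payload(data: bytes) -> Optional[bytes]:
    """Return the appended payload, or None if the trailer is gone (e.g. stripped)."""
    if len(data) < len(MAGIC) + 4 or not data.endswith(MAGIC):
        return None
    (length,) = struct.unpack(">I", data[-len(MAGIC) - 4 : -len(MAGIC)])
    start = len(data) - len(MAGIC) - 4 - length
    if start < 0:
        return None
    return data[start : start + length]
```

In the testing workflow, you would run `extract_payload` on the downloaded copy of each uploaded image and record a yes or no per network and per image placement.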
With Pinterest and Slack, for example, we were able to post images and hide content in a number of ways: insertion techniques (prepend, append, or within a portion of the file that's ignored), modifying the metadata, and least significant bit. Anywhere you see a "yes" highlighted in green is where a method succeeded: we uploaded to the social network and then simulated a recipient treating it like a dead drop and downloading it. What's interesting is that we're using one of the most open forums there is, where information is shared for everybody to see, and taking the equal and opposite approach: secretly hiding data in plain sight, where nobody notices it even while looking right at it. Lastly, we started exploring MP3s with Tumblr, and I'll come back later to how we successfully hid content there, because that's the next stepping stone in our research. At this point I'll turn it over to Phil. Cool. Thanks, Mike. To build off that, we now have a lot of research and results on how we can post images, download them, and see what effects the different social networks have in this round trip. For the cases where a social network applies compression or some other back-end re-rendering of the image, we want to find a way to keep the ability to implant stego, upload the image, download it, and have the message survive. That's where deep learning comes in, and I'll talk about that in a minute. First, zooming out a bit: why are social networks such nice conduits for steganography? We all know this. They're massive. An incredible amount of content pours across social networks every second, right?
The scale is out of control: four, almost five billion pieces of content shared on Facebook per day, hundreds of hours of video uploaded to YouTube every minute, 500 million tweets per day, and so on. With so much content pouring across, it should be fairly trivial to hide some piece of data in that huge stream. And even though it's public, a recipient can take that data from the sender, decode it, and understand it while everyone else doesn't, because they have a special key or a particular way of decrypting the message. On top of that, social networks themselves are evolving. Initially, with Facebook, Twitter, and MySpace, we humans communicated with each other largely through text. It was very simple: 140 characters, and we got the message across. More and more now, though, and the older social networks are catching on to this, we communicate mostly through images. There are a lot of reasons for this, and images generate a lot more engagement. On networks like Instagram, Snapchat, Pinterest, and Tumblr, the primary avenue of communication is an image, whether it's a meme or a photo I take on stage and send out to my network; we're living more in the moment, and we want to share it instantly with other people. Beyond being heavily trafficked and public, social networks provide APIs so developers can share content through the apps they build. So if I have an account, it's fairly trivial to write code that automatically uploads an image to a social network and then downloads it again.
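The automated upload side can be sketched as a loop that posts images one at a time with randomized gaps between posts. That is the "for loop with time jitter" approach described later in the talk. No real social network API is assumed here. `upload` is a placeholder for whatever client call posts one image and returns an ID or URL. The delays and the function name are illustrative.

```python
import random
import time
from typing import Callable, Iterable, List, Optional


def upload_with_jitter(
    paths: Iterable[str],
    upload: Callable[[str], str],  # placeholder for a real client's "post image" call
    base_delay: float = 30.0,
    jitter: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Post images one at a time, waiting base_delay +/- jitter seconds between posts."""
    rng = rng or random.Random()
    posted = []
    for i, path in enumerate(paths):
        if i > 0:  # no need to wait before the first post
            sleep(max(0.0, base_delay + rng.uniform(-jitter, jitter)))
        posted.append(upload(path))
    return posted
```

Injecting `sleep` and `rng` is a design choice that lets you dry-run the schedule instantly, for example with `sleep=waits.append`.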
If you're worried about attribution, creating fake accounts is pretty trivial on all the social networks; anyone can assume some identity. And when it comes to the more malicious type of steganography, which I'll get into in a little bit, if you're an IT person at a company, social networks look completely benign from a network or forensics perspective. Interacting with a social network doesn't raise any red flags; it's almost expected, since people post on social media at work all the time. Beyond these characteristics, there are a lot of examples of this happening in the wild, which I'll go over now. I've split them into black hat versus white hat. The most prominent black hat example was HAMMERTOSS, discovered by FireEye a few years ago and allegedly the work of the Russian group APT29. Once the malware was installed on a machine, it would look for specific Twitter users, and if a user existed, it would look for a hashtag and a URL in that user's latest post. The URL typically pointed to a GitHub page containing an image, and within that image was steganography. So the attacker employed all these layers of obfuscation just to retain command and control and communicate with the infected machines.
There are examples beyond Twitter, too. More recently, the group Turla, again allegedly, did this with comments on Britney Spears's Instagram, so they're getting pretty creative in how they maintain their command-and-control infrastructure. On the white hat side, some pretty smart people at Endgame presented a way last year to deliver PowerShell code through Instagram images using discrete cosine transform steganography, again as a way to maintain command-and-control infrastructure and stay in contact with infected computers or workstations. I'd also like to point you to a pretty cool tool called Secretbook by Owen Campbell-Moore. It's a Chrome extension that made it super easy to put a message into a Facebook image, upload it to the network, and then download and decrypt it to recover the message on the other side. The way he did this, and the way the folks behind the Instagram research did it, was to look at the quantization tables Instagram and Facebook were using and essentially reverse engineer them. Once they knew the quantization table, that is, what the JPEG algorithm was doing behind the scenes, they could transfer data predictably and reliably through the social networks, even though the networks tend to clobber the images with heavy compression. Other heuristic discrete cosine transform schemes exist as well. Another reason social networks are nice conduits for social stego is that, from a machine learning perspective, it's really easy to get data.
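Here is a toy sketch of why knowing the quantization step matters. This is not Secretbook's or Endgame's actual scheme. It hides one bit in the parity of a quantized DCT coefficient of an 8x8 block. Because the embedded coefficient sits exactly on a quantization level, the small error from rounding back to integer pixels (at most 8 here) stays under half a step, so the bit still decodes. Re-quantizing with the same step would also leave it in place.

The coefficient position `(2, 3)` and the step `q=24` are made-up illustrative values, not any network's real table. Real JPEG uses a per-coefficient table plus chroma subsampling and entropy coding, all of which this sketch ignores.

```python
import numpy as np

N = 8  # JPEG operates on 8x8 blocks


def dct_matrix(n: int = N) -> np.ndarray:
    """Orthonormal DCT-II basis matrix; row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


D = dct_matrix()


def dct2(block: np.ndarray) -> np.ndarray:
    return D @ block @ D.T


def idct2(coeffs: np.ndarray) -> np.ndarray:
    return D.T @ coeffs @ D


def embed_bit(block: np.ndarray, bit: int, pos=(2, 3), q: float = 24.0) -> np.ndarray:
    """Force the quantized coefficient at `pos` to have parity `bit`."""
    c = dct2(block.astype(np.float64))
    m = int(np.round(c[pos] / q))
    if m % 2 != bit:
        # step to the neighbouring quantization level closest to the original value
        m = m + 1 if c[pos] / q > m else m - 1
    c[pos] = m * q
    return np.clip(np.round(idct2(c)), 0, 255).astype(np.uint8)


def extract_bit(block: np.ndarray, pos=(2, 3), q: float = 24.0) -> int:
    c = dct2(block.astype(np.float64))
    return int(np.round(c[pos] / q)) % 2
```

A practical scheme would spread many such bits over many blocks and add error correction, because the network's true table may not match your guess.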
As a data scientist, you need access to data, and to labeled data, and social media makes that very easy because it provides permissive APIs. I can take a bunch of images on my local machine, upload them to an album in bulk, and then download them. Or I can do it the hard way: write a for loop on my local machine and add some time jitter, because uploading at perfectly regular intervals is predictable and the network might take notice. Either way, I post the images to the social network one by one and then download them at the click of a button from a Facebook album, or from an album on any other social network. To get back to the workflow Mike introduced: we have a pre-upload image, our cover image, that we upload to the social network; then we download it and want to recover the message we stored in it before uploading. Some of the networks Mike identified, for example Pinterest, Google+, Slack, and Flickr, don't do anything to the image when you upload it, so there's no reason you can't just do LSB out of the box, append to the file, or change the metadata. That's not interesting to me. I care more about networks like Instagram, Facebook, and Tumblr that compress the image. I want to isolate those and show that, despite these alterations, I can still embed a message before upload and recover it after download. That's the challenge I posed for us. And it's a challenge because of the different jamming techniques the social networks employ when content is uploaded to their back-end servers,
and they do this for a few different reasons. The most obvious is that, when they serve content, they want users to have a seamless experience: as you scroll through a timeline or through albums, you want to look at images uninterrupted and have them render on the fly. Because this is an expensive operation, especially on a mobile device, they compress the image to make it smaller, and a smaller image can be served more quickly and conveniently. It's not just compression, either. They use a lot of other techniques, like low-pass filtering and, as Mike said, stripping the metadata. They might even convert the file type, so if you upload a PNG, the network might convert it to a JPEG. There are other alterations too, like alpha compositing. In short, these networks perform a slew of operations, and I want to show that despite them, we can still create a message that survives transit through the social network. This is a pretty fundamental figure that I'll talk about for a few minutes. When you take an image, implant stego or some hidden message in it, upload it to the social network, and download it, you have two images: the pre-upload image and the downloaded image. So we took a bunch of images like this and looked at what was actually happening to the individual pixels as they transited the social networks. This is a pre/post pixel difference histogram: if I compare the pre-upload and downloaded images pixel by pixel, what is the
difference in RGB value I'm seeing from pre to post during transit? The peak here, for Tumblr, Facebook, Twitter, and the other networks we saw compressing images, is centered at zero, which means the majority of pixels aren't changing, right? That's good news: even though compression and other operations are happening, we might be able to predict in advance which pixels won't change. But this is a really hard task. I can do it for a single image or a few images, but I want to know, before I embed the stego, which image locations, which pixels, are most embeddable, meaning least likely to be changed by the network's compression and other processing. So, from a machine learning perspective, we take a bunch of images and label every pixel in a binary fashion. Pixels that don't change, with zero difference between the pre-upload and post-download images, get labeled with a one: those are prime carrier locations that we want to target with the message we embed. All the other pixels, where there are slight differences between pre-upload and download, we toss away, because if we change some bit there to store part of a message, it's going to be altered, right? You can scale this up. There are a lot of image libraries, and we used a combination of those and some of our own images to select a large set of samples, because the algorithms I'll talk about in a little bit need a lot of data to learn which pixels within an image are most likely to survive transit. And we can automate the
uploads and downloads using the API functions. So in the end, say we start with 50,000 images: we have 50,000 pre-upload images and 50,000 post-download images, and we can diff them and create the labels as I described, giving us a binary mask for each image. Then the question is: great, we have labels for a bunch of different images showing which locations are most likely to survive social network transit, but how do we predict that for yet-unseen images? To do that, we want to use a neural network. Classic neural networks, like this simple one with a single hidden layer, don't scale well to images. With the width of the image, the height of the image, and the three RGB channels, you end up with an unmanageable number of weights that would take way too long to compute. Convolutional neural networks, a class of algorithms that dates back further but really took off in the 2010s, let us encode the properties of images into the network architecture itself. Instead of single-dimensional hidden layers, we're now dealing with multidimensional hidden layers. I won't dwell on this too much, because it's probably outside the scope of the talk, but, somewhat like the human visual system, the network applies convolutions, or filters, at each layer, and each layer responds selectively to activations in the previous one. This has proven really effective for computer vision tasks like object classification and facial recognition; a lot of the big companies use it at scale now and sell it as products too. But like
I said before, we want to pose this as a binary classification task for each individual pixel. Given an unseen image, I ask of each pixel: is it likely to stay the same when I upload the image to the social network and download it again, or is it likely to change, in which case I should toss it away and not store part of a message there? This task is akin to image segmentation, which I'll go into in more detail on the next slide: you want to create a binary mask for each image that selects the pixels most likely to keep your message intact and deselects the ones that aren't. There are good reasons to frame it this way. You can think of the social network's compression pipeline as a function applied to the image, and feed-forward networks have very nice properties that let you approximate these kinds of functions. So we set up a model built in Python with Keras on a TensorFlow back end; if anyone is interested in more technical details, come find me afterward. We trained on a GPU, using a 23-layer neural network with ReLUs and a contracting-then-expanding structure that looked like these networks. Finding the pixels least likely to be changed when you upload an image to a social network is akin to identifying the pixels that belong to, say, objects in an image. On the left-hand side is an image from DeepMask, where the goal is object recognition: you want to select the pixels most likely to belong to objects. On the right-hand side is a picture from U-Net, which is more of a biological use case: you have cells, or you might be interested in analyzing ultrasound or doing some kind of cancer screening
or cancer detection. More and more of these tasks are being accomplished by neural networks and other automated techniques, and less so by surgeons or doctors. The idea here is that you have specific cells you want to isolate: you want to separate the foreground of the image from the background. The same idea can be applied to identifying the pixels that are least likely to be clobbered, or altered, during social network transit. It's not as pretty, since you're not identifying objects anymore: on the right-hand side you see the base image, and on the bottom you see the pixels that are most likely to survive social network transit, so these are the ones we want to select for embedding our hidden message. And it works to some extent: after training on a bunch of different images, we're able to predict which pixel locations in an unseen image can survive the round trip. There are several caveats to this. First, we impose constraints on the neural network so that, instead of willy-nilly embedding a ton of data, we make sure the pre- and post-upload images don't look noticeably different. If you go too far in that direction, you get into the zone of watermarking, where people put so much data into the image to make sure it survives compression. We don't want to do that; we want the message to stay imperceptible, so that a human doesn't notice it's stored inside the image. This affects capacity to some extent, but there are different constraints you can encode in these algorithms to make sure the changes don't show up for a human; for example, you can use MS-SSIM or peak signal-to-noise ratio (PSNR) to impose that constraint. And then the results we were looking
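As an illustration of the imperceptibility constraint, the sketch below computes the peak signal-to-noise ratio between a cover image and a modified copy and checks it against a threshold. The 40 dB cutoff and the function names are hypothetical choices for this example; in practice a constraint like this (or MS-SSIM) would more likely be folded into the model's training objective than checked after the fact.

```python
import math
from typing import List, Tuple

Image = List[List[Tuple[int, int, int]]]


def psnr(cover: Image, stego: Image, max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized RGB images."""
    if len(cover) != len(stego) or any(len(a) != len(b) for a, b in zip(cover, stego)):
        raise ValueError("images must have the same dimensions")
    squared_error, n = 0, 0
    for row_a, row_b in zip(cover, stego):
        for px_a, px_b in zip(row_a, row_b):
            for ca, cb in zip(px_a, px_b):
                squared_error += (ca - cb) ** 2
                n += 1
    if n == 0:
        raise ValueError("empty image")
    mse = squared_error / n
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def is_imperceptible(cover: Image, stego: Image, min_psnr_db: float = 40.0) -> bool:
    """Hypothetical acceptance test: keep a modification only if PSNR stays high."""
    return psnr(cover, stego) >= min_psnr_db


# Usage: changing every channel by 1 gives MSE = 1, i.e. 20*log10(255) dB.
print(round(psnr([[(0, 0, 0)]], [[(1, 1, 1)]]), 2))  # 48.13
```

Higher PSNR means a smaller average change, which is why the constraint pushes the payload toward busy regions where small changes hide well.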
at (and I've yet to quantify this) show that the learned pixel locations most likely to survive social network transit with a message correspond to regions of the image that tend to be more complex and busier. That's because of the constraint we imposed: we want to minimize the visual difference between the pre- and post-upload images. So what's the novelty here? Spatial steganography, and by spatial I mean that you're actually flipping bits in the pixels, doing LSB or even the last two bits or whatever you want, traditionally has more storage capacity, so you can embed larger payloads in your image compared to frequency-based stego, where you encode the message inside the discrete cosine transform (DCT) coefficients. However, spatial stego has typically been thought of as intolerant of compression or any other kind of alteration, and here we're trying to show that that's not necessarily the case. Previous ad hoc approaches were based on: okay, I have a bunch of images and a network, let me just try it, see what happens, and present the result of what's actually going on. We want to create a feedback loop instead: use the data from the other side to inform future predictions. In principle, although the results I'm showing today come from Tumblr, this should generalize across social networks that use compression. The nice thing is that you don't need to know the implementation details of what's going on behind the scenes; you don't need to know in advance that the social network imposes a specific type of compression, or within a certain range. You can let the data and the machine learning algorithm do that work for you. And then, just
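A minimal sketch of spatial (LSB) embedding restricted to the pixels a mask marks as likely survivors, together with the matching extraction step. It assumes the sender and receiver can derive the same mask (for example, by sharing it out of band); the talk doesn't say how that synchronization was handled. Writing only to the lowest bit of the blue channel is an arbitrary choice, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[int]]


def _positions(mask: Mask) -> List[Tuple[int, int]]:
    """Raster-order (row, col) coordinates of the pixels selected by the mask."""
    return [(y, x) for y, row in enumerate(mask) for x, keep in enumerate(row) if keep]


def embed_bits(image: Image, mask: Mask, bits: Sequence[int], channel: int = 2) -> Image:
    """Return a copy of `image` with `bits` written into the least significant
    bit of `channel` at mask-selected pixels (hypothetical helper)."""
    positions = _positions(mask)
    if len(bits) > len(positions):
        raise ValueError("payload exceeds the capacity of the masked pixels")
    out = [list(row) for row in image]
    for bit, (y, x) in zip(bits, positions):
        px = list(out[y][x])
        px[channel] = (px[channel] & ~1) | (bit & 1)  # clear the LSB, then set it
        out[y][x] = tuple(px)
    return out


def extract_bits(image: Image, mask: Mask, n_bits: int, channel: int = 2) -> List[int]:
    """Read `n_bits` back from the same mask-selected positions."""
    return [image[y][x][channel] & 1 for y, x in _positions(mask)[:n_bits]]


# Usage: only the top-left and bottom-right pixels are predicted to survive.
cover = [[(100, 100, 100), (100, 100, 100)], [(100, 100, 100), (100, 100, 100)]]
mask = [[1, 0], [0, 1]]
stego = embed_bits(cover, mask, [1, 0])
print(extract_bits(stego, mask, 2))  # [1, 0]
```

Unselected pixels are left untouched, which trades capacity for the payload's chance of surviving the upload.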
to contextualize this: I got up on stage last year and gave this slide. The idea is that a lot of past thinking in information security has applied machine learning to defense, whether network intrusion detection, spam filtering, or antivirus, so people tend to associate machine learning with detecting what the bad guys are doing. But last year I was up on stage with my colleague John Seymour, and we talked about a way to generate messages on Twitter that people were much more likely to click on. The idea was that you could train a neural network on people's preferences, likes, and interests from their Twitter timelines, and deliver them a payload in a message that looks a lot like something they might be interested in clicking on. So you get the effectiveness and high accuracy of spear phishing without the high manual labor associated with it, and you can scale it up to the level you would see with phishing. The overarching theme here is that red team, or offensive, machine learning is on the rise. There's a growing number of examples in the literature: the stuff we worked on, last year with micro-targeted social engineering on Twitter and this year with steganography, but also password cracking and CAPTCHA subversion, and Hyrum Anderson gave a talk recently about antivirus evasion. These things are being employed more and more, and in fact offensive machine learning is easier than defensive machine learning: you don't necessarily need to go out and get a lot of labeled samples to do it effectively. Here I was able to automate the labels just based on the differences between pre- and post-upload images, and with the work we did last year on micro-targeted social engineering we
didn't even need labels; it was unsupervised in nature, so we let the network spit out a tweet that looked exactly like something that person might have posted previously. On top of that, success matters less for the red team: if I go out 100 times and succeed once, that's great, while for the blue team it's the exact inverse, since it has to stop all 100. So there's a slew of these factors conspiring to make attacks easier and to make machine learning a viable way to carry them out, on top of the retreating barriers to entry. But I don't want to worry people here. I think red team, or offensive, machine learning is a positive development for the community; it's going to keep us honest. If attacks become more statistical in nature, that's going to make your defenses more robust and fortify them in the long run. People like Elon Musk, who tend to be more fear-mongering about AI, might have other ulterior motives for doing that, but I think in the long run this is going to be a really nice development for security, one that will only improve it, and the faster this is realized, the better off we'll all be. Yeah, so from a forensic standpoint, performing steganalysis is quite difficult. So does this sort of simulated offensive approach, testing out from an ML perspective all the different characteristics and all the different methods of hiding data that we've outlined so far, allow you to get ahead of the people who may actually be looking to exploit them maliciously? And from that, can you learn other ways to jam or prevent those techniques? I think those are some of the things to consider here. Looking at the general use cases, and we're almost out of time here, from a data exfiltration standpoint, if somebody is
communicating covertly by posting these images, they look very benign. Whether it's a medical environment, a government environment, or something else for that matter, when people post images to social media you may be observing that on the network and seeing what they posted, and it may look very benign, but as we've demonstrated, these techniques circumvent the majority of IDSs, malware protection systems, and other types of security products, so this remains a big threat and a big risk. Furthermore, it makes a perfect dead drop: hiding in plain sight, like hunting for the brick an item was hidden behind in Zelda on the Nintendo, only applied digitally. It's also been demonstrated in the wild, and there are real cases like HAMMERTOSS and others in which this was weaponized for command and control (C2). One other thing I'll mention, in terms of privacy: when we post those images, to what extent is that data stripped away, and conversely, how can it be further used to communicate covertly? Cool. Yeah, and you can also think about this in terms of bypassing censorship. More and more, governments throughout the world, a lot of Western governments too, are imposing restrictions on what can and cannot be posted on social media. This was one of the ideas emphasized in the Chrome tool talk, the one by Owen Campbell-Moore: these types of hidden messages let you bypass online censors and still get your message across to the people you want to reach. And lastly, one of the things we wanted to emphasize here is that we just want to raise social media security awareness in general. A lot of people may not even be aware that when you upload an image from your phone or from
your camera, the metadata or other identifying characteristics might still be there, and this might be a really easy way for governments to track you, your location, and other types of metadata surrounding you. So just to wrap up: we've started to expand into video and audio. One example is that some sites will let you upload audio but convert it from MP3 to MP4, while on others you can upload an MP3, and a lot of MP3s have a field for an embedded JPEG (the ID3 attached-picture field). So could you hide information within that JPEG inside the MP3, upload it to social media, and have it survive? In our test cases so far, yes, it has, so you could certainly leverage this from an audio, and even a video, standpoint with MP4s. And really quickly, in terms of mitigations: we're not presenting an undefeatable technique here, and there are things that can be done. You can imagine more sophisticated, dynamic jamming techniques, for example switching the quantization tables more often, and there are different ways to detect steganography that are well vetted in the literature. So that's it; here are some summary points, and we'll be around for questions after this if anyone's interested in talking about it. I'm going to release some code in the next few weeks that lets you play with steganography on different social networks and automate it through your user account. In a lot of ways this is a work in progress, so if anyone is interested in these types of techniques, come find us and let's collaborate on these ideas later. Thanks. Great, thanks everyone.
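One way to picture the dynamic jamming idea: re-quantize every upload with a step chosen at random, so an uploader can't learn a fixed set of safe pixel locations. This sketch works directly on pixel values as a simplified stand-in for switching JPEG quantization tables (which act on DCT coefficients, not pixels); the function name and the candidate steps are hypothetical. Because every candidate step is even, all quantized values end up with a zero low bit, which wipes out an LSB payload.

```python
import random
from typing import List, Optional, Sequence, Tuple

Image = List[List[Tuple[int, int, int]]]


def jam(image: Image, steps: Sequence[int] = (2, 4, 6, 8),
        rng: Optional[random.Random] = None) -> Tuple[Image, int]:
    """Floor-quantize every channel with a step picked at random per upload.

    Floor quantization keeps values in 0..255 and always yields a multiple of
    the chosen step; with only even steps, every LSB becomes 0.
    """
    rng = rng if rng is not None else random.Random()
    step = rng.choice(steps)
    jammed = [[tuple((c // step) * step for c in px) for px in row] for row in image]
    return jammed, step


# Usage: a seeded RNG makes the chosen step reproducible for testing.
out, step = jam([[(101, 57, 255), (3, 0, 128)]], rng=random.Random(0))
print(step, out)
```

The randomness is the point of the design: a fixed step would simply become one more transformation the neural network could learn to route around.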