All right, so yeah, we've got a lot of full slides today, so I'm going to try to go through most of them fast, and the ones that I think have the more intricate details we'll cover a little more in depth. For those of us who might not be super familiar, or who need a little refresher, I'm going to introduce a little bit about stego, steganography, and eventually get to this point where we're applying some old techniques to new technologies. That's basically what this project I've been working on is: essentially coming up with a file system that operates within other data. And we'll take it to the next level and consider the ramifications of doing something like this, as well as, you know, what if somebody's already doing it?

So just to get things going: stego might be something as simple as somebody posting a classified ad that says, hey, wanted, a single white female, whatever, right? That's all entirely code that really means something else. Or it might be something more subtle, something that's part of another message. Say you've got, inside the little tiny period on a printed letter, a bunch of little tiny ones and zeros that are real subtle to the human eye, but if you know what you're looking for you can decode those, and there's your message.

The more interesting ways to implement stego, at least from where I'm coming from, would be using something you might be a little more familiar with, like taking an image or some other content and finding some unused space in that media, something that might be reserved for comments or metadata, something about the data instead of the actual content itself. Or maybe you can tweak it a little bit so that the original content is still represented, it still describes the original thing it was describing, but you've tweaked it enough that it actually has another subtle meaning.

Usually when most people think of stego, they think of it that way. They think of, okay, we've taken the least significant
bit, the little tiny piece of information that doesn't make as much of a difference as the rest. Say we're talking about an image: a pixel, and the very, very least significant bit that represents that pixel's color. We're going to do things a lot differently than that, but it's important to understand that somehow you can have data that isn't as important as the other data, and that's what we can leverage, either to hide more data in, such as a serial number for watermarking, or to actually exceed our normal storage capacity, which is the way I'd like to think of it.

So, like I was describing, classical digital steganography: you can do that a couple of different ways. Here's how I like to think of it, even though there are much more clever and interesting ways to do it. Say you've got some number representing the color, or say it's an audio sample and the numbers represent the magnitude of that audio sample. If you just round off that number, then as you're going through the media an even number could be a zero and an odd number could be a one. Real simple. You don't have to think about really complicated encodings here. So that's what I'm basing the rest of this talk on, this very simple description of steganography. Yes, you can get much more complicated, you can get much more clever, you can use other techniques to compress more data into the data you've got.

This particular project I've actually been considering for a couple of years now, with the growth of what's unfortunately called the Web 2.0 new media environment. Sites that let you upload videos are a prime target. You can cram a lot of data in a video file; they take a lot of data, and we can leverage that. I also started considering the ramifications of what we consider viral marketing: somebody downloads a video and forwards it around. Well, wouldn't it be great if I wanted to have my data mirrored by innocent by... excuse me, not innocent bystanders,
I shouldn't say, but, you know, by the random public. Sure, you can maintain my backups for me. And then there's the actual marketing aspect of it, with its privacy concerns. Okay, what if somebody's already tracking these videos with some sort of watermarking under the hood while people are forwarding them around? One, how would you do it, and two, what kind of issues come into play because of it?

One of the issues with this kind of mechanism: say you take a video and upload it to, let's pick on YouTube today, youtube.com. You upload it and it will convert it. It resamples things, compresses the audio, and does other things. And if you just have your secret code encoded with something simple like that least-significant-bit mechanism I mentioned, your code will actually be destroyed, mostly because of the compression and the tweaking and the changing that they do to the video. So that's probably the biggest issue that I'm going to address today: how do you make that steganographically encoded data survive one, maybe even two or three or four conversions between different formats?

We can do it a couple of ways. We can try to pick something that, when we upload it to the system that's hosting it, is not going to get as much conversion. Okay, it's not going to tamper with the bits too much. If we can pick that from the get-go, we'll be doing pretty well. And if we also include enough redundancy of our message in the host media itself, we'll be in pretty good shape. So this is the kind of thing we start thinking about when we talk about distributing this stego. Keep in mind too, though, that there are other ways to encode data.
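As an editorial aside, the even-is-zero, odd-is-one rounding described earlier in the talk can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the samples are taken to be 8-bit values (0 to 255), and the names `embed_bits` and `extract_bits` are hypothetical, not anything from StegoFS.

```python
def embed_bits(samples, bits):
    """Return a copy of `samples` whose even/odd parity spells out `bits`.

    Assumes 8-bit samples (0..255); an even value reads as 0, an odd value as 1.
    """
    if len(bits) > len(samples):
        raise ValueError("not enough samples to carry the message")
    out = list(samples)
    for i, bit in enumerate(bits):
        if out[i] % 2 != bit:
            # Nudge the value by one step, staying inside the 0..255 range.
            out[i] += 1 if out[i] < 255 else -1
    return out


def extract_bits(samples, count):
    """Read back the first `count` hidden bits from the sample parities."""
    return [s % 2 for s in samples[:count]]


pixels = [200, 13, 77, 254, 255, 0, 18, 91]   # made-up sample values
message = [1, 0, 1, 1, 0, 1, 0, 0]
stego = embed_bits(pixels, message)
print(stego)
print(extract_bits(stego, len(message)))
```

No sample moves by more than one step, which is why the change is hard to see, and also why a lossy re-encode, which shifts values by far more than one, wipes the message out.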
I like to use the misspelling in blog comments, right? If you spell "the" as T-E-H, that could represent a one, and for all the zeros, just regular English text, or whatever language you want to pick; the regular spelling could represent a zero. So to the human eye that doesn't look like a message is encoded, but if you're expecting it, you can decode the message. Of course, the conversion problem here would be if you're submitting it through an online form and it does the spell check for you. Or, for example, using double spaces instead of single spaces: when your browser renders HTML, if it's not a non-breaking space it will just collapse that white space into one. To the human eye it doesn't matter; you download the page, though, and now you can see that there are extra bytes there.

So we can use these in combination. Maybe we even alternate between the classic media messages encoded in image, audio, and video. Or maybe we just consider this an out-of-band type of encoding to represent the structure that we're storing the actual data in; this is how we can find the data we're looking for. Of course, it's pretty hard to prevent and to police all the blog comments, right? So more likely than not it's not going to be tampered with. Talking about the upload, the conversion is usually not going to be as damaging, so it's a good candidate for the structure of such a file system.

So to throw a few ideas out there: StegoFS, as it's still, I guess I should call it pre-alpha.
It's still, well, I wouldn't want to hand it out, which is why I don't have a link to download it yet, because I still have a lot of work to do on it. But it does do a few things. It does use URLs, and the first version had a doubly linked list. So it would actually have a URL that pointed to this block, a URL that pointed to the previous block, and a URL that pointed to the next block. That way, if you were navigating through, you could find, okay, yet more data, yet more data, and now you can tie everything together. So your URL could be a blog comment and then the next one could be maybe a video or whatever. Originally, that's how we were doing it.

But for the example I'm going to show here, I'm actually going to use a Flash video on YouTube. I actually picked one that was already up there, one that I anticipate nobody will see, so there's less funny business going on. I go ahead and encode one bit per pixel and store some extra bits to check that those bits haven't been converted.

Generally what I'm talking about here is what we call forward error correction, right? Especially in the last couple of weeks there have been some interesting news articles, a little bit of press coverage about forward error correction. There was a thing a few days ago, it was either Digg or Slashdot, talking about the Viterbi engineering school, named after Viterbi, the guy who came up with a lot of the forward error correction. We use this stuff in data communications all the time. It's not new. The one I'm actually going to use, which I'll show in a minute, is a Hamming code, based off of Dr.
Richard Hamming's work, because I think that's the easiest one to understand. My point, though, is that if we lose one or two pixels, one or two bits, because they've resampled the video or they've stamped a watermark of their own over the top of it, we can still fill in the gaps.

We can also extend this across things. Consider that I might have multiple frames of video, for example. I could have the message mirrored, repeated through multiple frames. Now, maybe my error correction isn't good enough on one frame; if I average it with the next frame, I might be able to figure out what it was supposed to be.

So if you want to look up some more information, and it's late in the afternoon and your eyes are rolling back in your head or whatever, here's where I suggest looking. Wikipedia's entry for Hamming code is really good, and you can look up some of these other terms. These techniques are being revisited quite a bit now because of the progression in quantum cryptography, quantum computing, and other things, so it's a hot topic now. I'm using a stego-in-media example, but the error correction techniques are definitely alive and well today.

So for our Hamming code, I'm going to encode one byte.
Okay, it's going to be hexadecimal FF. For this example, each of the underlines that I have in this first bullet point is going to be a parity bit. The first one, in the very first position, is going to check every other bit, starting the count with itself. And this is going to be an even-parity Hamming code, so we're going to want an even number of ones.

All right, so if we take FF, that's essentially eight ones in binary. To figure out what our first check bit should be, we look at every other bit. Of course, we'll come back to the check bit itself later. We skip a bit; the next bit, the very first bit of the data that we're actually storing, is a one, so we've got one so far. We skip another one, which happens to be a blank so far. The next bit is a one, and so on. We end up with five ones, and that's odd, but we're doing even parity, so we need to make it even, and we place a one in our check bit. Now if any one of those bits has been flipped to a zero, we can detect that.

But the key to this forward error correction is in the layering of the different check bits, okay? Take the second one, for example. The second check bit is going to check two and three, six and seven, ten and eleven. The same thing applies: we add it up, we happen to get another five ones, so we store another one. I'm not going to go through everything else on the slide, but the point is we've got this pattern.

So how do we actually detect and correct when something's wrong? Real simply: we go through and add up the bits that each check bit covers. If we check and find that bit two, which is supposed to check these other ones, doesn't add up, and the eighth location also doesn't add up, we add those positions together: two plus eight. It turns out bit ten is the one that was damaged. And that's just the very simplest way I can describe forward error correction to you.
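As an editorial aside, the walkthrough above corresponds to a standard 12-bit, even-parity Hamming code over one data byte, which can be sketched as follows. The function names are hypothetical, and placing the most significant data bit first is an assumption; the talk does not say which bit order the slide used, and for FF it makes no difference.

```python
PARITY_POSITIONS = (1, 2, 4, 8)                 # check bits sit at the powers of two
DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)    # the eight data bits fill the rest


def hamming_encode(byte):
    """Encode one byte as a 12-bit even-parity Hamming codeword (list of 0/1)."""
    code = [0] * 13                             # index 0 unused; positions are 1..12
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (byte >> (7 - i)) & 1       # most significant bit first (assumed)
    for p in PARITY_POSITIONS:
        # Check bit p covers every position whose number has bit p set.
        ones = sum(code[pos] for pos in range(1, 13) if pos & p)
        code[p] = ones % 2                      # make the covered group even
    return code[1:]


def hamming_decode(bits):
    """Return (byte, syndrome); a nonzero syndrome is the position that was repaired."""
    code = [0] + list(bits)
    syndrome = 0
    for p in PARITY_POSITIONS:
        ones = sum(code[pos] for pos in range(1, 13) if pos & p)
        if ones % 2:                            # this check "doesn't add up"
            syndrome += p
    if 1 <= syndrome <= 12:
        code[syndrome] ^= 1                     # flip the damaged bit back
    byte = 0
    for pos in DATA_POSITIONS:
        byte = (byte << 1) | code[pos]
    return byte, syndrome


codeword = hamming_encode(0xFF)
print(codeword)            # [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1]
damaged = list(codeword)
damaged[9] ^= 1            # damage position ten (list index 9)
print(hamming_decode(damaged))   # (255, 10)
```

With position ten damaged, the checks at two and eight are the ones that fail, and their sum points straight at the bad bit, which matches the example from the slide. A codeword with more than one flipped bit will generally decode to the wrong byte in this plain form.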
We're using 12 total bits to describe eight bits of data, and that can let us detect a couple of mistakes and always correct a single one. Now, we can keep going; there are extra tricks, which make it more complicated, to correct more than that many bits. But now if one of these pixels has been tweaked in our stego'd image frame from the stego'd video, we might be able to correct it. Now notice I didn't add 12 bits to the image. I'm just rounding off the choice ones I want to.

One thing I like to consider in doing this is how to extend it from other material. See, we have a history of really poor steganographic techniques. A lot of them are pretty bad. If you just Google and try to find a stego tool, it's probably not going to be very good unless you pick something really clever. There are easy ways to detect a lot of these tools, or let's say to detect what appears to be a use of the tool, by doing things like looking for near-duplicate colors. If we've tweaked just that one and zero throughout the whole image, then there are going to be a lot of colors that are almost the same but not quite. And yes, I'm overgeneralizing, but you get the idea. It's easy to detect. And then because we're putting in a pattern that changes the pattern that's already there, the compressibility changes for the host media
we're looking at. So of course we can just go real low-key and only encode in very select places, and it'll be a lot less likely not just to be detected but also to be damaged. If you're not replacing everything and just replacing some things, especially on really quiet frames where there's not a whole lot of color in the frame, you're going to want to be more subtle so it's less likely, like I said, to get detected, but also less likely to be damaged when things start getting compressed and the online engine, as in this example, goes through and tries to merge colors: oh, this gray is close to that gray, so let's make it the same gray so the video takes up less space, right? That's kind of the way it works. If we encode very sparsely, we're not going to have that damage happen in that way.

So again, remember, we can store things in the data itself, and we can store data about our data. And what could we store? We could store something like a serial number by using stego. Say I upload this video to YouTube and I want to track it. Once someone's downloaded it, to find out where they got it from I can look at their copy, compare it to the ones I encoded, and I know which one they had. But the other thing I want to expand on is: what if we use a visual watermark
to hide our data in? That is, you know how you go to YouTube or MySpace or whatever, and there's the YouTube logo in the bottom of the video? Well, what if we make a YouTube-looking image to hide our stego in? See, when you use something like YouTube, most of these things do a matte layer in the player that isn't actually encoded in the original media stream, so if you were just to download it, there's no logo and no watermark from YouTube there. So if you're viewing it, YouTube's logo is over the top of your stego'd one, but when you download it, there it is. So there are techniques like that, and what I'm actually going to demonstrate is a watermark that's kind of like a barcode. This was an interesting topic to prepare for, because how do I show you something that's designed to hide? It was a real hard one to think up. We'll get there shortly, though.

All right. Mostly what I've been covering so far is making one frame survive conversion, that is, one image, one point in time. Take for example a video you're uploading that's at 30 frames per second, and the engine you're uploading to decides, hey, let's take it down to 15 frames per second. Well, that's going to chop out every other frame that you have. And if the ratio between the two frame rates is not evenly divisible, now you're going to have a problem where new artifacts may be created when they take two frames and average them together. So to make our encoding survive the resampling, we're going to want to mirror the encoding between the frames. That way we can survive dropped frames, and we can survive that extra bit of information that sometimes gets added when people resample at a non-divisible rate.

So just to illustrate that, why don't I pull up... Here is a video I uploaded to YouTube last week. I have up here in the upper left the watermark. Now,
I know you can't see that, so I have a magnifier I'm going to bring up as I start playing this. All right, this is the upper left corner of the video as it's playing. This is the same FF encoding that I showed you with the Hamming code. We actually have three semi-transparent bits, and then we have a black bit, which is going to represent that zero, and then the pattern continues. But I've actually done it three times in a row. As the video is playing here, we can average those frames ourselves and figure out what each spot is most of the time. Okay, this is gray most of the time. See, right there it was pretty damaged for that second. But I can take three different independent samples across the whole video and figure out what is most likely to be the ones and zeros, and now my FF encoding has basically survived across everything.

All right, so that's the idea. Now, I've just done this in the black area so it's a little easier to see, and I used 50% transparent white, but you can choose the colors very carefully and hide it in a more appropriate spot that's part of the image and not up in an easy-to-find place. Okay, that's just for demonstration purposes.
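As an editorial aside, the averaging step described above might look something like this sketch. It assumes each frame has already been reduced to one brightness value (0 to 255) per watermark cell, that a gray cell reads roughly 128 and a black cell roughly 0, and that 64 is a sensible cut-off between them. Those numbers and the function names are illustrative guesses, not values taken from the StegoFS demo.

```python
def average_cells(frames):
    """Average each watermark cell's brightness across all sampled frames."""
    return [sum(column) / len(column) for column in zip(*frames)]


def cells_to_bits(averages, threshold=64):
    """Gray-ish cells (at or above the threshold) read as 1, black-ish cells as 0."""
    return [1 if value >= threshold else 0 for value in averages]


def majority_vote(copies):
    """Combine repeated copies of the same bit pattern, one vote per copy."""
    n = len(copies)
    return [1 if 2 * sum(column) > n else 0 for column in zip(*copies)]


# Three samples of the same 12-cell pattern; two of them have one damaged cell.
clean = [128, 128, 128, 0, 128, 128, 128, 0, 128, 128, 128, 128]
smeared = list(clean)
smeared[3] = 90            # a black cell brightened by compression
dimmed = list(clean)
dimmed[0] = 20             # a gray cell darkened in another frame

bits = cells_to_bits(average_cells([clean, smeared, dimmed]))
print(bits)
```

Averaging before thresholding lets a single badly damaged frame get outvoted by its neighbors. The same `majority_vote` helper could be applied to the three side-by-side repeats inside one frame, and whatever survives can then go through the Hamming check.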
So that's what the StegoFS tool encoded the byte FF with. All right, so we've talked about that. Now, it's not super obvious, although if you know what you're looking for you could see a lot of it. You might do some other interesting things. One of the things I was contemplating, and I wish I had more time to do this, is to use the look of old-time reel-to-reel film that shows patterns and streaks in the film. I'd use those and actually have them represent a barcode as the video is going by. You can do all sorts of techniques like that to store this. You don't have to make it as blatantly obvious as the demo; I just had to make it so pretty much everybody could see it today.

You can also add parity across the frames. I've just talked about mirroring the frames so far, but you could put parity on the frames themselves, or a Hamming code in yet another dimension, across the frames, as a different way. Take for example here, we've got two bytes encoded. It's just an XOR example, not real creative, nothing new, just something we're doing in a different way that I haven't really seen anybody do before in this dimension.

I still have to rewrite a lot, because I haven't used the actual libraries; what I had so far was just command-line scripts glued together. So between ImageMagick and FFmpeg, here's what you could do to replicate exactly what we did here. We're going to run the convert command to create the code: the gray, because it was 50 percent transparent white, and the black, using black fill points. These points here are the zeros, in the first line, to create an intermediate image. Now, this is the watermark image
we will actually use. And then what I do is create another image that has it three times, and then I go ahead and make sure I've enforced the transparency and create my actual watermark, which goes to FFmpeg with the video hook (vhook) library, which is getting rewritten. It'll be interesting to see the newer version that comes out after Google Summer of Code. This particular one I ran on Windows with FFmpeg and not on anything else, and it works quite well to create the encoded MPEG. At that point we upload it. To read it, of course, you've downloaded it, and I'm not going to bother showing you scripts to download, but it downloads it, strips out everything, saves it as individual PNG files, and goes through and peels off where we've placed that watermark. Again, I think the more interesting use would be to make it look like the YouTube logo or something that would be hidden, overlaid by the matte version of the YouTube logo. And then we can go through our averaging, our parity checks, and whatever. I didn't have enough space to show you everything on the slide, of course. So this is basically what StegoFS in its current state does inside of a FUSE module, FUSE being the Filesystem in Userspace.

So to reiterate: we have that frame as an image, and we've peeled that off. If it's damaged in that particular frame, we can try to repair it inside the frame, or try to repair it by referencing the next frame or the previous frame, or some other out-of-band encoding, right? We still have the audio, which we haven't been playing with. We might have other things we've stored somewhere else. Maybe we've mirrored the entire video somewhere else with a different provider; we've uploaded one to YouTube, one to MySpace.
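As an editorial aside, the cross-frame XOR parity mentioned earlier in the talk can be sketched like this: one extra frame carries the XOR of the payloads in the other frames, so any single lost frame can be rebuilt from the survivors. The helper names and the sample payloads are hypothetical, and the sketch assumes every frame carries a payload of the same length.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("payloads must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity_block(blocks):
    """Build the parity payload: the XOR of every frame's payload."""
    return reduce(xor_bytes, blocks)


def recover_missing(surviving_blocks, parity):
    """Rebuild the one lost payload from the surviving payloads plus the parity."""
    return reduce(xor_bytes, surviving_blocks, parity)


frames = [b"\xff\x0f", b"\x12\x34", b"\xab\xcd"]   # made-up two-byte payloads
parity = parity_block(frames)

# Suppose the middle frame was dropped or mangled by resampling.
rebuilt = recover_missing([frames[0], frames[2]], parity)
print(rebuilt == frames[1])
```

This only repairs one missing frame per parity group, and the decoder has to know which frame was lost. Pairing it with the per-frame Hamming check is one way to tell which frame to distrust.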
Okay, we can prevent detection, or at least make it a little less obvious, if we don't just obviously choose those colors in that spot. We can also extend it in many other ways. Another aspect of this is that I don't have to just store the URLs, right? I could ask Google to find them for me, or I could make it a little more dynamic in another way. Trying to extend the concept, there are some other interesting things we can do.

By the way, I've already got a link to this presentation. The one on the disc for the con is not complete; there have been a lot of changes since then, so you're probably going to want to download it. I've already got it up, if you want to download the full PowerPoint, at bluenotch.com/resources.

All right, so you can hide stuff inside of videos. Yeah, what's new? Well, we're trying to do it in a way that's easily accessible, inside of a file system format. Now, you're not going to be interactively editing a file very well; it's not going to be practical to edit a file, right? You're going to have maybe multiple videos. It's going to have to download them, and if you edit a file and write it, it's got to upload, and you've got to wait five to 20 minutes, depending upon the provider, for them to compress it, re-encode it, and make it available so that you even know where it is and can reference it in the file system. So it's not something you interact with like a normal file system, but we use the file system as a mechanism to store things in there, and StegoFS can do all that stuff we need for us. Extending it, though, what else could we do?
Well, video uses keyframes. Basically, when you're skipping ahead in the video with the progress bar, or whatever you want to call it, the way you can do that is because they've keyed in frames that show approximately where in the video you are. They're not really content. If you wanted to, you could make a video with no keyframes. I think you might have to have one to start with; I'd have to read the standards to see if you can even start without any. But you could use, say, one keyframe: if the next keyframe is within less than five seconds, that's a zero; if the next keyframe is over ten seconds away, that's a one. Now, you're not going to store a whole lot of data there, but you haven't manipulated the video at all. All you've done is change the way people can seek in the video. And of course there are other things. There's metadata inside the video; the MPEG standards allow you to put in some comments, and there are other things you can do to hide data. But I think the one I'm probably going to try next is to make a watermark of my own look like another company's watermark, so that when their watermark is overlaying mine you're not going to see my watermark, but when you actually download it and view it, you see mine, which you think is theirs. So I think that's probably the way I'll have the next demonstration go.

The other thing to consider is that it's not just the media content. We can use anything we can signal and reread. Think of dynamic RAM: every time you read DRAM you have to rewrite it. Well, what if we had something like a spoofed DNS request, and then whether the reply comes back, or how it resolves against our DNS server, represents a zero or a one? Now, I'm not talking about tunneling, or using DNS as a covert channel as you've probably heard of it before, which is a great technique. I'm still thinking along the timing realm, right?
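As an editorial aside, the keyframe-spacing idea described above (a gap under five seconds is a zero, a gap over ten seconds is a one) can be sketched as a pair of helpers working on a list of keyframe timestamps in seconds. The function names and the particular short and long gap values are assumptions made for illustration; extracting real keyframe timestamps from a video file is left out.

```python
ZERO_MAX = 5.0     # a gap shorter than this reads as 0
ONE_MIN = 10.0     # a gap longer than this reads as 1


def bits_to_keyframe_times(bits, short_gap=2.0, long_gap=12.0, start=0.0):
    """Choose keyframe timestamps whose spacing spells out `bits`."""
    times = [start]
    for bit in bits:
        times.append(times[-1] + (long_gap if bit else short_gap))
    return times


def keyframe_times_to_bits(times):
    """Read bits back from the gaps; anything between the thresholds is None."""
    bits = []
    for earlier, later in zip(times, times[1:]):
        gap = later - earlier
        if gap < ZERO_MAX:
            bits.append(0)
        elif gap > ONE_MIN:
            bits.append(1)
        else:
            bits.append(None)    # ambiguous gap, treat as an erasure
    return bits


times = bits_to_keyframe_times([1, 0, 1, 1])
print(times)
print(keyframe_times_to_bits(times))
```

Leaving a dead zone between five and ten seconds gives some tolerance if a re-encode shifts keyframes slightly, at the cost of carrying only one bit per gap.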
I can spoof a DNS request and see the result, or shoot, don't spoof it, just do it. If the result comes back in less than, you know, five seconds, or more than ten seconds, again, those could be zeros and ones all day long. Now, that's a lot of overhead for encoding ones and zeros, but you're not actually storing anything anywhere. The problem you would have is that you'd have to constantly refresh that data, right? You'd constantly have to resend the DNS traffic to keep that media live. But now on the wire you've got a file system that's never actually stored anywhere.

So that's one of the things I hope to add to StegoFS before I release it. It's still very pre-alpha quality. Especially with the video sites, it changes so often that it's really difficult to squeeze the most out of it. Actually, last week the demo I showed you with the YouTube video never once showed any kind of variation. It always was the same. I thought that would be a very interesting demonstration, but it looks like I just hard-coded that on every frame. Anyway, it's a moving target. So I hope to add in some of these extra features, but you get the idea: we can still extend this into other forms. Anything we can store, we can store more stuff in, is what I'm saying.

Now, as I started to work through some of this material, I stopped to think about other companies' watermarks, and I thought, well, what if they're already doing this?
It's pretty easy to encode in a watermark and serialize things. I know people who do forensic investigations, and I do too, but I mean some of the people I know have encountered and have been investigating people who have had material that's been serialized in a steganographic type of way. But what about the viral marketing aspect, with people trading things around? Maybe they're not using a full file system version, but they're still storing those blocks, and they're using redundant blocks, redundant bits. It's not so much, ooh, now they're doing that; yeah, we probably were paranoid about that already. But what I found is that I can tell how they got it. That is, if you download something from YouTube, I can tell, because my number, that pattern I showed you, looks different when it comes from YouTube than from MySpace. So I can tell where you got it from. And if I make my pattern robust enough, which I've been playing with a little more, I can tell that you downloaded it from YouTube first, then re-uploaded it to MySpace, and then recorded it to an AVI file. I'm trying to see how far I can take this before it gets crazy. But that started to freak me out a little bit, and the more I work with it, the more I find I've overcomplicated the problem. It's so unbelievably easy to do that. So, you know, there are some large privacy concerns here.

So basically I just tried to apply some old techniques to what we, unfortunately, call new media, considering the impact of some of the problems we deal with today. That is, we're running out of space in a lot of ways. I mean, as a security professional type of person, I can never keep enough tracking information, and sometimes those relationships are what I'm really concerned with, not the actual serial number.
I just want to know how you got it, right? It's kind of like what the Paterva guys did with Maltego, right? The relationships are the data. That's what we really want, not so much the data itself. Yeah, it's important, but sometimes the relationship is more important. So, considering this aspect, we can use those relationships. For example, I talked about the relationship with respect to time, with the keyframes and the DNS traffic, to encode those ones and zeros. So even though we've physically only got so much space in storage to store something, we can actually store more than we could physically store on the device itself that way, if we pick the relationships appropriately. And that's not a foreign concept either: think of the relationships between hashes and rainbow tables and the chains and everything that goes on there. I'm not going to get into that, but essentially what I'm talking about is creating a reduction function of the data and the relationship itself with steganographic techniques. So it's easy to consider how you might be able to extend this to other current problems, like some of the issues we have with quantum computing: if somebody reads something, they've tampered with the data, right? It changes when they read it. Forward error correction is very important there.

I have a few things. I'm eventually going to put up a link on bluenotch.com/resources where you can download the Python module, once I add in some more things. But I do have a copy of the presentation there that's current. If you want to look at some of the Flash video source: originally I wrote it in Perl for the older FLV codec standards, but since YouTube moved to MP4, I thought that was starting off on the wrong foot. So I just dropped it all and started over, getting more in-depth with Python as the framework
I'm using, and using MP4 for my tests. But these techniques of course apply to other formats. If you want to look up the Hamming code, probably the easiest place to start is this Wikipedia article. And I did run across somebody else doing something a little similar. There's a VirtualDub filter, which you can look at at this URL. I could never get it to work, but it's an interesting concept. They show a good demonstration of hiding a message and then adding the redundancy that would survive converting from an AVI into another format. So it's basically a different way of explaining some of the things I've explained today.

That's pretty much all I had. Q&A is in 104, right? So go over there for questions, unless you want me to play the rest of the Hackers video trailer. No? Yeah? All right, I don't know if the sound works very well in here, but why not, just for fun. Thanks, guys.

Maybe you can see a little better now in full screen; you see the watermark I've got up in the upper left. Yeah, so, all right, I'll let it play, I guess. Maybe that would make everybody clear out of the room, if I play the Hackers trailer. All right. Yeah, I tried to find something no one else would be viewing, so that when I was doing my test I could tell if I was the only one going and fetching these files. So obviously no one ever watched this besides me. So I'm taking over a TV network.

All right, I'm not going to play any more; my hard drive's pegged for some reason. So all right, that's all I've got. Thanks. I'll see you in 104.