Alright. So, my name's Mike. Thank you for coming to the talk. Sorry for the long delay there. They make us test our VGA-to-HDMI converters in the green room, but it turns out that the green room equipment is not the same as the equipment in this room. So, it didn't work. Anyway, I'm going to talk about loading code from a copier. Now, I've mentioned this title several times during the weekend, you know, with the blue badge, and people ask me all the time what I'm talking about. So I have to disabuse some notions right away. I'm not infecting the printers or scanners I'm talking about here by messing with their firmware. I'm using them as designed, as a scanner: moving documents from the scanner to a target workstation on a closed network and interpreting those documents in a way that drops binary files onto the target machine. So, I just wanted to show you that right away. This is definitely an insider attack. This is something I worked on to load arbitrary tools on a closed network. Here's what I'm going to go over. It's a phased attack, and each phase changes the amount of data I get per page from the scanner onto the machine, from basic text analysis all the way up to getting about 80 kilobytes of data per page onto a target machine. So those are the different phases. We're going to go through all of that together. So, the way this all started is I was at work on a closed network, and it had a collaboration portal on the network — a SharePoint-based kind of thing. And it had these text entry boxes like you can see here. And it didn't take me long to discover that the data was being validated client-side with some JavaScript. And I was like, oh, well, I bet I can beat that and put some cross-site scripting attacks on my collaboration portal at work. But that's easy to do when you're at home, right?
When you're using your own machine, you would use something like Tamper Data or Burp Suite to intercept the call and modify it after it's been through the JavaScript validation. But I didn't have any of those tools available to me. So I kept trying to think through how I would make this happen. Like I said: Tamper Data or Burp Suite to intercept the POST call — don't have those. I could forge the POST call, but I didn't have curl or wget available to me. And eventually I came to the conclusion that what I really wanted to know was how to put whatever tool I wanted on this machine without making anybody mad. Without getting caught, really. So that's where I ended up on this particular problem. So these are the conditions I had to work with. I had a closed network — sort of, right? There are no truly closed networks anywhere, because they're basically not useful. But this is, for all intents and purposes, a closed, secure network. On this network, the USB ports are secured and monitored. Sometimes they're physically locked. CD use is secured and monitored — typically from a writing standpoint, not so much a reading standpoint, but nonetheless it is monitored. There is an endpoint security system on my workstation. It's generating logs for everything I do, down to the mouse click, I guess, I don't know. But only certain things it logs are going to draw attention from any kind of security people, right? So I wanted to avoid those things. There does exist a data transfer point between a less secure network that's closer to the internet and this secure network. But I didn't know how it worked. I didn't know what it logged, what rules it had for scanning, or who it alerted. And I wasn't really in the mood to keep poking at it to see what I could figure out, raising my noise level until I got through what I wanted to do. Because I didn't want to get fired. So I didn't want to use that.
And basically it's a Windows and Microsoft Office environment. These are the tools I had available when I got right down to it. I had Microsoft Office, which provides access to Visual Basic for Applications (VBA). I had professional-level printers and scanners that can print and scan at a very fine level, which is really useful for what I was doing. And Adobe Acrobat with optical character recognition (OCR) is what I used. Alright, so first is getting Excel into attack mode. And this is just turning on developer mode in Excel. Now, we all get those little pop-ups that say, hey, don't run the macros, or asking whether you want to approve these macros. But if you're the insider writing the macros, that's kind of pointless, right? And I call it Excel attack mode because inside of Excel you can write arbitrary script. And Excel with VBA can modify files at the byte level. And not only that, you can call arbitrary DLLs, with arbitrary functions, with arbitrary inputs to those functions. And that's an awful lot of arbitrary for any insider to have available as an attack surface. So I call out putting Excel into attack mode. And it's not hard to do, and I'm sure you all know how to do it: you just go to File, Options, Customize Ribbon, turn on the Developer checkbox, and you get a new ribbon on your Microsoft Excel page. You click the ribbon there, you click Visual Basic, and you now have access to a fully functional integrated development environment on your workstation. Now I think the important point here is you're an unprivileged user and you now have an integrated development environment. And I know in many places the users who are developers, who have the ability to write binaries, get extra monitoring, extra scrutiny. But the point is every user on a Microsoft Office-based network can do this. And it's probably not being watched. So this is what I call phase zero: getting set up.
Now the next thing you want to do is get an arbitrary script into Excel. And the way I do that is by printing it and scanning it, basically. There are some tricks to it. Let me show you here. This is a Mac, so I'm going to mess this up, but that's all right. Let's see. So this is the script that I'm going to talk about a little later — this is the phase one script of the attack. And one thing you'll notice is there's no indentation, because indentation in the OCR messes up the order of execution in the script. So that's not super useful. And a lot of other things will go wrong when you do this. Now let's see. The Mac's beating me. All right. If my Windows machine were here, I would do this live: I would just cut and paste this whole thing. So basically you scan this on your work scanner. You have it emailed to you — that's typically how the documents get to you. You OCR it. You select all, and then cut and paste it into Visual Basic. Let's see what happens if I do that here. Of course Visual Basic isn't turned on here, because this is not my machine. And I don't know how to do it on a... yeah, I don't know how to do it on a Mac. All right. So we're not going to do the scripts. Okay. I have some samples in my presentation, though. So let's go back to my presentation. Nope. Nope. So we're not going to drop out of that anymore. Okay. All right. So I talked about how you do it. You can print it down to about a one-point font. You scan it. Nope, no demo time, so let's skip it. All right. These are the screenshots from a previous briefing I did on this. So when you drop it into the Excel Visual Basic editor, it doesn't work exactly right. You can see here that these lines are all comment lines, and the comment delimiter has fallen off. So that's one kind of error. Let's see. Another common one is right here: it gets rid of an equals sign.
That happens quite a lot. Let's see if I can find any of the function-flow ones. Nope, I don't see it. Other kinds of weird errors happen. Sometimes it interprets ones as L's. So I have a function called CalculateChecksum1ByteExclusiveOr, and the OCR changed it to LByteExclusiveOr. But it did that for every instance of that word, so it basically still worked even though it changed the name of the function. That was kind of a happy failure. But you have to watch out for all the changes in the program flow. And once you go through and re-edit your stuff, you still find more errors. When you go ahead and hit F5 to run it, you can see there's one highlighted right there. The value is kind of in the middle of nowhere there, and I'm not exactly sure where that came from on this one. But it'll help you fix it. And the bottom line is you can do this. You can get an arbitrary script into place using the scanner without too much of a problem. Now you could also type them in. If you took out the comment lines, the hex magic stuff I'm going to talk about in a second isn't that long — it's only a few pages. But if you had a really long, complicated script, you could get it in this way. All right. So the goal is to use those methods I just talked about to make a script that will take an arbitrary file and encode it in binary — sorry, encode it in hex — and make it so you can print it out really nicely, then take those pages to work and scan them. And why did I go with hex? Well, I did a bunch of experiments. I found that with hex encoding I could get down to a much smaller font size than with Base64 — from 12.8 — to get more data on the page. I didn't have any word-length errors, meaning when the OCR ran through the document it interpreted the length of the words correctly. Whereas with Base64, over 10% of my words got their length messed up — missing symbols or added symbols. Transcription errors.
I didn't have any transcription errors in my initial experiments. It decoded every hex code correctly. Whereas with Base64 there were a ton of errors. Now, other experiments showed me that there are errors in hex encoding, but they're usually one-for-one and usually really easy. It's like an 8 goes to an S, and it always does that — it always interprets 8s as S's. So it's easy to fix, and it's also easy to realize that an S is not a valid hex character, so if you see an S it's actually supposed to be an 8. With Base64 that won't work, because almost every typeable character is included in the Base64 alphabet, and so you can't tell where your errors are. You don't know what's going wrong. So I didn't like Base64 encoding even though it gave me a lot more data per page. So this is what it looks like when you encode a file. The hex attack script, which I would have loved to show you running in real time, creates this and generates two columns. This is the data column — the information in the file — and this is the two-byte exclusive-or checksum, which I'll talk about in a little bit. And then you just export those as a CSV file, print them, and you can take these pages, scan them, and transfer your data into your closed network — as long as the secretary is not watching you scan. So I realized the hex code wasn't going to be perfect. I was going to have errors, so I built this compact exclusive-or checksum into it. It needed to be really small, because every byte I give over to my checksums is another byte I lose in data, and I needed to get as much on a page as possible. So I went with this two-byte exclusive-or, taking a gamble that I wouldn't have that many collisions between failure modes, so the data would work. And it did work.
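The encoder described above — a hex data column plus a compact two-byte exclusive-or checksum per line, dumped as CSV for printing — can be sketched roughly like this. This is not the actual VBA script from the talk, just a Python approximation; in particular, the way the two checksum bytes are folded (even-indexed bytes into one byte, odd-indexed into the other) is an assumption, since the talk doesn't spell that detail out.

```python
def xor16(chunk: bytes) -> int:
    """Fold a chunk into a 2-byte XOR checksum (assumed scheme:
    even-indexed bytes XOR into the high byte, odd into the low)."""
    hi = lo = 0
    for i, b in enumerate(chunk):
        if i % 2 == 0:
            hi ^= b
        else:
            lo ^= b
    return (hi << 8) | lo

def encode_for_printing(blob: bytes, width: int = 32):
    """Yield (hex data column, 4-hex-digit checksum column) rows,
    ready to export as CSV and print."""
    for off in range(0, len(blob), width):
        chunk = blob[off:off + width]
        yield chunk.hex().upper(), f"{xor16(chunk):04X}"
```

On the decode side you recompute the checksum for every scanned line; any mismatch flags that line for manual repair, and `bytes.fromhex` on the repaired data column recovers the original file.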
And when you run the code, if it can't match the checksums, it'll give you this little "data is corrupt, cannot decode the data" message. And then it'll highlight the offending line in red. And I'm going to have a hard time showing you what I usually show. Now, what you typically have to do here — let's see, I'll do this in a second — you'd think it'd be a pain in the butt to find these broken lines in your printout, but it really isn't. You just take this exclusive-or value, find it in your Adobe document, and find that line. And after you do this a few times, you realize there's a pattern to the failures. There are certain symbols that show up, like tildes and stuff like that. And any dots that happen between the lines of your actual printout will cause errors. So you learn to find them very fast. It doesn't take very long to fix even a large amount of hex data using this method. And now, since I'm briefing at DEF CON and I was warned that I have to have pictures of cats: if you were to decode this hex code, it generates this picture of an ocelot. This is something I was working on at work. I didn't want to actually drop a binary file, but I figured a formatted file would work. So that's what that one does. Now when I really took this to the next step and was going to use it to drop my DLL in place, I discovered really quickly that it didn't work as well as I thought. I had quite a bit of error. Although it's only about 1% error, that's still a lot of problems to fix. And so I discovered all these kinds of errors that you see here. B turns to 8 a lot. 1 to L, 5 to S, these kinds of things. And some of these are pretty bad. A B to 8, that's bad, because both B's and 8's are valid hex characters. 1 to L is not a problem. 5 to S is not a problem. D to 0 or O, that can be a problem. And 6's get changed. So I came up with some alternative characters that actually show up in the printouts: I used a hash mark for B and a question mark for D.
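That substitution trick, plus the one-for-one auto-repairs, might look something like this in outline. A hypothetical Python sketch, not the talk's VBA — and the repair table is illustrative only: the talk saw 8→S in one experiment and 5→S in another, so the actual table has to be calibrated against your own scanner/OCR pairing.

```python
HEX_DIGITS = set("0123456789ABCDEF")

# Stand-ins for the hex digits that OCR confuses with OTHER VALID
# digits: '#' for B and '?' for D, as in the talk.
ENCODE_SUBS = {"B": "#", "D": "?"}

# Undo the stand-ins, then auto-repair common one-for-one OCR slips
# (S->5, L->1, O->0 here; calibrate against your own scanner).
DECODE_SUBS = {"#": "B", "?": "D", "S": "5", "L": "1", "O": "0"}

def ocr_safe(hex_line: str) -> str:
    """Rewrite a hex line with OCR-distinct stand-in characters."""
    return "".join(ENCODE_SUBS.get(c, c) for c in hex_line.upper())

def ocr_repair(scanned_line: str) -> str:
    """Map a scanned line back to hex, fixing the known confusions."""
    return "".join(DECODE_SUBS.get(c, c) for c in scanned_line.upper())

def invalid_chars(line: str):
    """Hex's small alphabet makes residual errors self-evident: any
    character still outside 0-9/A-F needs a manual look. (Base64's
    64-character alphabet swallows errors, which is why it lost out.)"""
    return [c for c in line if c not in HEX_DIGITS]
```

The checksum column then catches whatever the auto-repairs miss.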
I just chose them because they didn't look like anything else, so I thought they would OCR pretty well. And I was right — they worked really well. And I auto-replace the other major errors, and then I put strong visual indicators in the decoding to show you where your problems are. The only thing I can show you about that right now is the one you already saw, the red one. But when I did this with my actual DLL, I only had 1 manual correction in 1,210 lines of text. That's about 19 pages of encoded text. So it worked out really well. I don't think I can show you. Maybe I can try to show you. Yeah, so here is... there we go. Okay. Alright, so you can see here, here's the two-byte exclusive-or. And here's the data line with the question marks for the D's and the hash marks for the B's. I don't think I can find any easy-to-see errors real quick — I'm not going to find one fast enough. But it'll scan pretty well. Alright, does anyone know how to make PowerPoint come back to the slide you just left? Say again? I'm on it. Alright, there you go. Thank you. Alright. Okay, so the hex attack is really super reliable. You really can get data very easily onto a machine, and it's pretty much not going to fail. And if you really had to, you could enter it by hand. You could type in those hex lines if you really wanted to. Now, that gets kind of tedious after 19 pages. But if you didn't have a scanner available, you could do this and still get arbitrary binaries on your system. The bad part is the data density: only about 3.6 kilobytes of data per page. And I put some common tools up here. Between PowerSploit and Mimikatz, that would be something like 200 pages of data you'd be trying to scan at work. So that would probably raise some flags. That's a little too much. And there's no exfiltration compression advantage.
If you wanted to remove a binary file from this closed network, print it out in hex code, take it home, and recreate it, you wouldn't get any kind of real compression. If that file was 3.6 kilobytes long, it'd print at about a page, and you're not getting any real benefit unless it's an unprintable document. So I needed to do better. So I got to thinking: how could I possibly put more data on a page? If only there were some technology somewhere that would allow me to encode data, black and white, two-dimensionally, on a piece of paper at the pixel level. What could I possibly use? Well, yeah. So it didn't take me too long to figure out that there's an awful lot of 2D barcode stuff out there. And so I ran some barcode experiments. First I practiced with Data Matrix codes. I wanted to see how small I could get them. I just took this big one you see here and kept shrinking it in PowerPoint and saving it as an image, until the lines between the data bits started to blur and it wouldn't work anymore — just trying to see how small I could get it on a page that way. But I kept thinking about it. And with the amount of error correction built into most two-dimensional barcodes, I was only going to get about 25 kilobytes of data per page. They have about 60% error correction, depending on the barcode. But that's because they're designed for machine purposes: for low light, for weird orientations, especially with cell phones. And that's a different design problem than the one I've got, where I'm basically taking a sheet and putting it on a scanner that scans very well in a perfect environment, and I control the orientation from the get-go. So I thought, well, maybe I can make it better. I took some features from these barcodes: timing lines, to help locate the data, and Reed-Solomon forward error correction.
But I was like, I can make it better for my purposes. And so, lo and behold, I generated the 8.5-by-11-inch big barcode. And that's what it looks like. And with that, I can get about 85 kilobytes of data per page. And this is what it looks like up close when you zoom in. It has a timing line on all four sides, and it has the data — I call it the data meat — in the middle. And if I print that image at about 72 dots per inch, I can get about 88 bytes of information across a single pixel line. And each of these is a bit, right? I mean, that's an off bit; those are on bits. And I get about 85 kilobytes of data on a page. So I was pretty happy with that. And so, interpreting it: I start with a raster scan going across the image until I find the top-leftmost timing mark, and then I stop. From there, I do a thing which I technically call wiggle fit, where I've got my mask and I put it over the timing mark that I found, and I just keep moving it around until I find the most-black position. Because you can see that when they scan, the edges get pretty fuzzed out. Man, that was cool — the thing got all big. Anyway. So I wanted to find the most-black part. That's what it does: it moves the mask around, finds the placement with the most dark, picks a center point, and moves across to the next timing mark, and it just finds the center of that next timing mark. And it works very well. And I do this on all four sides. And at the end, I end up with this, where each of the centers is indicated. And then you end up with a grid of intersections. For each of these lines — matching this mark with the one all the way at the bottom makes a straight line; this guy here makes a straight line — I just calculate the intersections. At each intersection is a data pixel. And I pick the data off that pixel and decide whether it's an on bit or an off bit. And it works fairly well. I do get some errors.
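As a rough illustration, the wiggle-fit step — slide a mask around a found timing mark, keep the placement that covers the most dark pixels, take its center — could be sketched like this. This is a toy Python version on a 0/1 pixel grid; the mask size and search radius are made-up parameters, not values from the talk.

```python
def wiggle_fit(img, x0, y0, size=5, radius=3):
    """Slide a size x size mask around (x0, y0); return the centre of
    the placement covering the most 'on' (dark) pixels, which should
    be the centre of a fuzzed-out timing mark."""
    h, w = len(img), len(img[0])
    best_dark, best = -1, (x0, y0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = x0 + dx, y0 + dy
            # count dark pixels under this mask placement
            dark = sum(img[r][c]
                       for r in range(y, y + size)
                       for c in range(x, x + size)
                       if 0 <= r < h and 0 <= c < w)
            if dark > best_dark:
                best_dark, best = dark, (x, y)
    # centre of the winning mask placement
    return best[0] + size // 2, best[1] + size // 2
```

After centering the marks on all four sides, opposite marks define straight lines, and each line intersection indexes one data pixel.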
I didn't expect it to be perfect. On my first test runs, I ended up with something like this. This is a heat map. All the black is bits that were read correctly in my scan. Now, these red ones here are bad bits. And there are a couple of little outliers — there's one here, there are a couple over here. This is what I really expected it to look like: since I started in the upper left, I figured it would start getting bad by the bottom right. It turns out I wasn't really correct. When I scan the full 8.5-by-11 document, I get this big heat problem in the middle here. And — stop that. So the big problem here is there's a lot of error. Can't see the red marks? Yeah. Okay. So imagine red marks where I'm circling. I was afraid you weren't going to be able to see it when I was thinking of doing this brief. Sorry. So there's a bunch of red marks and they're kind of clustered. Now, the problem is I have to adjust my error encoding on the big barcode to handle the worst error, not the best error. So if you were able to see it, you would be amazed at how clean it is up here, and you would be astounded at how nice it is around here. But you see this giant red blob in the middle, and that's what I have to base my error correction on, which causes a lot of data loss to parity bytes. So I knew I needed error correction; it wasn't going to work without it. So I wanted Reed-Solomon forward error correction. And it turns out I didn't understand Reed-Solomon forward error correction at all. I didn't understand the math behind Galois finite fields either. So I was like, well, I don't want to do this stuff from scratch — I'm just going to find a library. There are lots of libraries out there for erasure correction and forward error correction. Except, upon testing, I discovered that the majority of the forward error correction ones I found just don't work. I don't know who's writing these opaque-API libraries — I couldn't figure them out, and I actually contacted university professors and they couldn't figure them out either.
But — stop it. If you're going to put something out there, make sure it works. But there are a lot of erasure correction libraries out there, so I decided to try erasure correction and see if I could use it. Now, the problem is erasure correction is for a data stream where you're missing data — data that doesn't make it to the receiver. That's what it's really for. And it works a bit like this. You have some data and you separate it into blocks. You assign parity bytes to each block. And then if one of those blocks turns up missing, you use the parity bytes in the remaining blocks to recreate the missing block. That's how erasure correction works. Now, my problem is not missing data; my problem is corrupted data. So I decided, well, what if I did a checksum, and if the checksum didn't match, I consider that block dead and I just take it out? So that's what I did. I got my block of data and my parity data, and then I've got my checksum for the whole thing. And if one of the blocks turns bad or the checksum is bad, then I ignore that block and try to recreate it. But it didn't work. I had too many checksum collisions, so I was actually trying to recreate the missing data using corrupted data — and the math still runs and generates a corrupted result, a corrupted file. So it just didn't work. So I knew I had to go do forward error correction. Forward error correction is for corrupted data. You have a word of data. You separate it into bytes. You add parity bytes to that data. If two of your bytes go bad, you can use two parity bytes to find the bad data and then two parity bytes to correct it, and it works very well. And this is what I needed. But like I said, the problem was there weren't any working libraries out there for me to use. So I had to write one. Much against my will. But I found this really good Python-based library at Wikiversity, and line for line I basically recreated it in C — well, C++.
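Backing up a step: that failed erasure-style attempt — parity over a group of blocks, drop any block whose checksum fails, rebuild it from the rest — can be sketched with the simplest possible erasure code, a single XOR parity block. This is my simplification for illustration; the talk doesn't name the exact scheme it tried.

```python
def make_parity(blocks):
    """Parity block = byte-wise XOR of all equal-length data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for j, b in enumerate(block):
            parity[j] ^= b
    return bytes(parity)

def rebuild(blocks, parity, dead):
    """Recreate the one block (index `dead`) flagged bad by its
    checksum, by XORing the parity with all surviving blocks."""
    out = bytearray(parity)
    for i, block in enumerate(blocks):
        if i != dead:
            for j, b in enumerate(block):
                out[j] ^= b
    return bytes(out)
```

This only works if the checksum reliably flags every corrupted block. The failure mode described above: with a short checksum, a corrupted block occasionally still matches (a collision), the "surviving" inputs are silently wrong, and the XOR math happily reconstructs garbage — which is why true forward error correction was needed instead.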
Until I got the thing working, there was a lot of debugging and pain and suffering involved. But I finally got it working. And that's what I had to do to get Reed-Solomon forward error correction working for my big barcode. Alright. So, because of the big heat map of error in the middle that I told you about — the one you couldn't see, you just have to trust me — I needed about 45% error correction for it to work. Which means I only get about 47 kilobytes of data per page. But that's still an order of magnitude better: if you want PowerSploit, you can get it in 18 pages versus 232. So you can really start moving some data now. You have a good kind of compression advantage over the previous methods. And the demo is awesome. It really is. I show you how it all works. I show you how you use the script and the DLL to create the barcodes and to interpret them. And I do live drops of everything. So, yeah. It was really good in my room. You guys should have been there last night. Alright. So I decided to give myself a grade on how this went for me. My goal was to install PowerSploit on a machine that didn't have it, using these methods, and not using magnetic media. So, some grades. Interpret a page-size barcode: yeah, I could do it. The Reed-Solomon encoder/decoder: I was able to make it work eventually. There's a yellow mark there — I'm going to talk about that in a second. I built the library. I call it sideloading: I was able to get the payload decoded onto my target machine, except because it was like 18 pages of data, I just used a portion of PowerSploit, so it was only three pages long. So I only gave myself a yellow on that — or I guess an orange. The hex encoder works. I was able to replace the library using the OCR method. And I was able to write my DLL and hex-encode it onto my target machine so I could read my big barcodes. It all worked, after much effort. You'll have to take my word for it.
So the POC was a success. The kind of stuff I learned from this: standard office tools provide a lot of power to the user that maybe you're not fully aware of. Basically, every user can code. The system is not secure. The bottom line is any user on a Microsoft Office-based machine can code, and that is a big attack surface to pay attention to. And a determined insider can do it. And you can use innocuous input/output systems for creative purposes that weren't intended and that no one's really monitoring. No one's really monitoring the print and scan load, even on the secure network I was using. And they're not watching for information to come in this way. So it provides kind of a hole to try and squeeze through. Alright, some future research. I'd like to reduce the size of the big-barcode sideload DLL. It was 19 pages of hex code; I'd like to make that a lot smaller. Size optimization is not really my thing, but that's something I could work on. The error rates: I ran an experiment adding more timing lines to my big barcode, thinking it would help with the error rates for reading it. And I was 100% incorrect — it actually messed it up. I still don't know why. It doesn't make any sense. But I'd like to improve the error rates so I can use less error correction. But this next line is the real key. If I can use 2^16 Reed-Solomon encoding, I can do a lot better. Reed-Solomon coding over 2^8 means that your code words are 255 bytes long, and that has to include your parity bytes. So you have to base your parity on the amount of error you're expecting in each 255 bytes. And because of the invisible heat map, I have to plan for the high-error areas, not the really nice areas. 2^16 Reed-Solomon coding means I can have a code word about 131 kilobytes long, which is longer than my page. And I only get about 1% error across that page as a whole.
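Back-of-the-envelope, the payoff works out roughly like this — assuming the standard Reed-Solomon cost of two parity symbols per corrected symbol error; the exact overheads in the actual implementation may differ.

```python
# GF(2^8): code words max out at 2^8 - 1 = 255 one-byte symbols, so
# parity must be sized for the WORST error rate seen in any single
# 255-byte window -- the hot spot in the middle of the page.
n8_bytes = 2**8 - 1                  # 255 bytes per code word
worst_local_parity = 0.45            # fraction lost to the hot spot
usable8 = n8_bytes * (1 - worst_local_parity)

# GF(2^16): code words up to 2^16 - 1 two-byte symbols, i.e. about
# 131 KB -- longer than a whole page -- so parity only has to cover
# the ~1% error rate averaged over the entire page.
n16_bytes = (2**16 - 1) * 2          # 131070 bytes per code word
page_error = 0.01
parity16 = 2 * page_error            # 2 parity symbols per error
usable16 = n16_bytes * (1 - parity16)
```

Under these assumptions, roughly 55% of each page is usable with GF(2^8) versus about 98% with GF(2^16) — hence "the real key."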
So I wouldn't need very many parity bytes at all if I could use 2^16 Reed-Solomon encoding. But I couldn't get that math to work, and it also runs much, much slower, so running experiments to debug it was taking me too long. So I didn't keep pursuing that, but if I got it working, it would improve the amount of data I could put on each page by quite a bit. If I could add colors to the big barcode instead of just black and white — I did a four-color experiment, so that I'm only using four blips per byte instead of eight. I was able to get it to work, but there was a lot of error in decoding color from a scan, quite frankly. But I think it's an area for future research. Also, I got really excited about using Excel to mess with things. Though Visual Basic for Applications is kind of a pain, it is powerful. The ability to write at the byte level means you can do anything with it you want. Making a hex editor out of Visual Basic for Applications would be super easy; I started that a little bit. A steganographic decoder — I did that already, so I could send myself stuff at work. That's easy to do. Restoring the command prompt: if you're on a machine where the command prompt is locked down by security policy, you can make that work again. You can do that with Excel. I don't know for sure, but I think you can get away with some reflective DLL injection as well, messing with the way Excel calls DLLs. I don't think any of this stuff is earth-shatteringly new. People have been writing macro viruses forever, and they're all back in vogue now. But this is from the perspective of an insider being able to do these things to your machine — things I think you need to watch out for. I don't think I can show you much more, unfortunately. Let's see. I really wish I could show you the demo. Here's some stuff that looks like it's left over from when I was practicing. Let's see if I can open this here real quick.
You guys are watching me mess up this guy's computer right here. What the heck's the text editor? There you go. This thing here — I don't know if you guys can read it, and I don't know if I can zoom in. Nope. Say again? It's amazingly hard to hear people down there. I don't know if you can read it or not. A little bit. So this DAT file gets dropped when you do an encoding with the big barcode. These are the important parts here: you have to have this encoded data length, and you have to have the MD5 sum, in order to decode it with the big barcode on the other side. You have to provide those as inputs to your script. That's important. When you decode the DLL, it also drops this file here, which is a prototype for using the DLL. Visual Basic is very, very picky about how DLLs are called and used, and this gives you the prototype for it. This is all supposed to be in the materials that are delivered with the brief. That's really about it. My machine was too old to use the super fancy screens. That's kind of all I've got. Any questions? Thank you guys very much.