 OK, so welcome to my talk. My name is Angèle Bertini. I'm the author. I do reverse engineering and visual documentation. The title of this talk comes from my work in the PoC||GTFO publication as the funky file format polyglot, OK? So this talk is about files. What are the usual file categories? It depends if you're a newbie, a user, a dev, or a hacker. But in general, people are just interested in exploiting file formats, and valid files are considered boring. But I still think the important point is that the limits between, can you see the top colors? No? There's "valid" written here. The colors are weird, no? It's supposed to be red here, OK? Weird colors. So the problem is that the frontier between valid files and corrupted ones is not clearly defined, and I play with it. So let's take an example. Here is a valid file, just to show the kind of valid files that I like to try. It's not exploiting anything, but it's maybe not a standard file. So this is a JPEG picture that might ring a bell. And it's also a Java file, because why not? That's not really complex, but you can play further. If you apply AES to this JPEG picture, then you get a PNG picture. So that was encryption with AES. If you decrypt it with Triple DES, then you get a PDF. If you encrypt the same file once again with AES, but with a different key, you get a Flash video. I could go on and on, getting crazy with the proof of concept. I thought I could do a whole talk with a single file, but maybe that wouldn't be a real talk. So at least I hope that by now you're convinced that I'm just a normal guy, and I just like to play with binary. I also like to explain and represent binaries, and maybe you've seen my posters in the building. So this is a picture of them on the top floor. They were printed thanks to Kurt.
And so I play with binary, and I also like to represent it visually on posters for everybody. All these posters are free to download at pics.corkami.com. And if you want to order a print on a pillow, an iPhone case, or whatever, it's prints.corkami.com. OK, so oops. No, it doesn't work anymore. What's wrong? Yeah, so let's not go too deep into the technical details, let's go back a bit to the fundamentals, and let's talk about cars. So how do you identify a car? What would be the possible ways to identify a car? And of course, we can apply the same model to files somehow. Is it by the head? Is it by the body shape? Is it by the sound? Well, you can see that identifying a file could be done in similar ways. In practice, here is an early file type identifier from French technology. So basically, you look at the head. I had fun drawing the guillotine myself. So typically, the file type is identified by a signature called a magic, which is fixed and enforced at offset zero. Some have a meaning, some don't. Most file formats have a magic signature at offset zero. Some don't, like some archive formats, particularly ZIP, which is also used in many other formats, like APK, JAR, and others. Some compressors actually enforce a signature at offset zero. And PDF, which I like to abuse, theoretically has to start at offset zero, but in practice the signature only has to be within the first kilobyte of the file. So that's how I could abuse it. We'll see that later. I could abuse PDF files a lot. An important point for ZIP: it's not that ZIP doesn't enforce a signature anywhere. ZIPs are actually written backward, from the end. This is for old-school reasons. When you were writing a ZIP file on the fly across multiple floppies, it would write the last information on the last disk, and that would minimize the floppy swaps. So basically, ZIP actually enforces that the first structure to be checked is near the end of the file.
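The three identification rules just described (a magic at offset zero, PDF's signature anywhere in the first kilobyte, and ZIP's first structure located from the end) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the table of magics is a tiny hand-picked sample, and `identify` and `tiny_zip` are hypothetical helper names, not part of any real tool.

```python
import io
import zipfile

# A tiny, hand-picked sample of signatures that must sit at offset 0.
MAGIC_AT_ZERO = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF8": "gif",
}

# ZIP End Of Central Directory record: 22 bytes plus an optional comment
# of up to 65535 bytes, so it always sits near the end of the file.
EOCD_MAGIC = b"PK\x05\x06"
EOCD_WINDOW = 22 + 0xFFFF


def identify(data):
    """Return every file type whose signature rule matches (hypothetical helper)."""
    found = [name for magic, name in MAGIC_AT_ZERO.items()
             if data.startswith(magic)]
    # PDF: in practice the signature is accepted within the first 1024 bytes.
    if b"%PDF-" in data[:1024]:
        found.append("pdf")
    # ZIP: the first structure to check is searched for near the end.
    if data[-EOCD_WINDOW:].rfind(EOCD_MAGIC) != -1:
        found.append("zip")
    return found


def tiny_zip():
    """Build a one-file ZIP in memory so the example needs no files on disk."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("hello.txt", "hi")
    return buf.getvalue()


if __name__ == "__main__":
    print(identify(b"\x89PNG\r\n\x1a\n" + tiny_zip()))
    print(identify(b"junk before the header\n%PDF-1.4\n"))
```

Unlike the usual tools, this sketch does not stop at the first match, which is why a single blob can come back with more than one type.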
The thing is, it's actually not respected all the time. I did a talk on ZIP schizophrenia. If you want more details, you can check it later. A few formats are bound to hardware. Usually, when you have a memory range to be executed by a special chip, they don't want a header there. So basically, TAR, ISO, MBR, even TGA, they start directly with the data. And optionally, they have a header that is later in the memory space. So those formats have an excuse not to have a magic at offset zero, because they are bound to some hardware. But in general, a good magic signature should be enforced at offset zero and be unique. And if you create a new file format, please respect this rule, because otherwise it can lead to a few abuses that we'll see now. So if you think about how a standard parsing tool acts, it just checks the magic, then it chooses a path, and it will never return and try something else. It found the signature: oh, I chose this path, it must be this file type. And it will ignore any other file type that could be included in the same file. So, another common yet important property that is useful for abuses: you see a cow. There is something coming next, but you definitely see a cow. Because you can see a complete cow, then there is a cow, and then there's something coming next. It's still a cow, it's still a valid cow, right? It means that whatever you put after it, you see the full cow, so you think it's a cow and something else. File formats typically define a terminator that says, this is the end of my file format, and once you have met the terminator, there's nothing left to parse. So with these two abuses, file formats not enforced at offset zero and file formats allowing something to come next, you can just stack them up like the animals of Bremen, and you end up with a file that has several file types. So this is an example of a Jar Jar Bink polyglot.
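The stacking idea can be tried with nothing but Python's standard `zipfile` module, which locates the archive from the end and so tolerates data placed in front of it. This is a sketch under assumptions: `FRONT` is made-up placeholder bytes standing in for a format with its magic at offset zero that ignores appended data (it is not a real video), and `make_zip` and `stack` are hypothetical helper names.

```python
import io
import zipfile

# Placeholder "front" file: made-up bytes standing in for a format that has
# its magic at offset 0 and ignores whatever follows its own data.
FRONT = b"FRNT" + b"\x00" * 60


def make_zip(files):
    """Build a ZIP archive in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, content in files.items():
            z.writestr(name, content)
    return buf.getvalue()


def stack(front, archive):
    """Stack two files: the front one owns offset 0, the ZIP owns the end."""
    return front + archive


polyglot = stack(FRONT, make_zip({"Hello.class": b"not a real class file"}))

# The ZIP reader finds the archive from the end and never looks at FRONT.
with zipfile.ZipFile(io.BytesIO(polyglot)) as z:
    names = z.namelist()
    payload = z.read("Hello.class")

if __name__ == "__main__":
    print(polyglot[:4], names, payload)
```

A parser for the front format would read the same bytes from offset zero and stop at its own terminator, so each tool sees only its own file.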
So Bink is a special game-oriented video format, and I chose a random picture to be displayed by the video. So you just create your Bink file and then you append a JAR file, which is just a ZIP, and as we saw, ZIP doesn't enforce starting at offset zero. So that's why a Jar Jar Bink polyglot is possible. Now, another kind of file polyglot is when you have a host and a parasite. So if your cow keeps a frog in its mouth, then it can speak frogish and cowish. The outer one leaves space for the inner one. Okay, a more realistic example: here is our cow with its various data chunks, and if your cow swallows a micro SD, then it's still a valid cow, even if it contains foreign data that is tolerated by the stomach. So as an example, I did this file, which was HTML, Java, a Windows executable and a PDF in the same file. It's interesting because HTML launching Java that drops a PE, or a PDF exploiting and dropping a PE, are two valid infection chains, and those two infection chains are present in the same file. And because I built the file entirely myself, here you can see that the PDF part of the document is actually inside the Java. So it's not just stacking stuff together, you put one format inside the other. Another example, one that's actually used for pentesting: basically it's a valid picture, here you see the black line which is the picture, and it's also valid JavaScript. You abuse the header so that it starts a JavaScript comment, you close the comment, and then you put your JavaScript, so it's valid JavaScript and a valid picture. You can break a lot of stuff with that. It's also available in BMP flavor for your pentesting purposes. So as I said, this kind of host and parasite exploitation trick already exists in the wild. It was represented in some famous movies, and it's just that the inner part uses some unallocated space left or made possible inside the outer file format.
So as I said, I worked on PoC||GTFO. If you're not familiar with this very nice publication, it's really interesting to read, but so is the file itself. So issue two was bootable, booting a Berliner-Spargel OS. It's also a ZIP and a valid PDF. With issue three I got a bit crazy. It was a valid radio message and a JPEG, and if you encrypt it you get a PNG and a ZIP. Issue four was a valid TrueCrypt container, a PDF and a ZIP. And I had just created this when, two days later, TrueCrypt was discontinued. Issue five was an ISO, a PDF and a Flash file. So the ISO boots a Tetris game, which was explained in the article, and you have the Flash which was rickrolling the audience. And issue six, the latest issue that was out last month, is also a TAR, a PDF and a ZIP. And this is one of the examples. If you open it with a lot of PDF readers, they just see, oh, it starts with a TAR, and like in the earlier picture, the reader says it's a TAR, I'll open it as a TAR, and they never see the PDF. Imagine that with a security tool, and maybe you win or lose depending on your side. So, oh yeah, a few other interesting polyglots. So Java and JavaScript: you can do it at the source level, using the source parser of the Java compiler, so it's Java and JavaScript in the same source file. Or you can do the same at the binary level, so that you can tell your friends that Java is equal to JavaScript, and yes, now it's proved. Okay, so not polyglots anymore, but still worth playing with because they always have funny results: if you make extreme files, like way too small or way too big, they tend to bypass filters. So, an analogy, which is actually a hoax. The farmer got denied a permit to build a horse shelter, so he just built a giant table to protect his horse, and he doesn't need a permit for that. So that's actually a hoax, but you can feel it's almost real, right? And if you do it the other way, you make a valid PDF file for Adobe Reader, and that's a complete file.
That is usually too small for software to consider that it can be valid. Then they just reject it and they will not parse it as a PDF. So you can bypass scanners and security protections by creating a file that is too small to be likely valid. Or you can do the opposite. You can make a huge file. So here, with a PE with 64K sections, it was directly crashing all the debuggers and other tools, because they were trying to allocate everything. Even worse, every section was fully executed. Even though they are physically empty, all of them were taking some space in memory, and they were all executed, and it actually takes a few seconds on a modern computer to run, even though it does nothing, but a lot of nothing. So not only is it slow to execute natively, but you also crash a lot of analysis tools with similar files. So now we have shown how to combine file types, but we can also abuse the parsing. How do you parse a cow? This is how a user sees a cow. So how do people parse cows? Well, you all have an image of cow parsing. This is how a dev could parse a cow. But it turns out that not everybody agrees, and this is how another dev sees a cow. So these are the French beef cuts of a cow, the official beef cuts, and these are the Brazilian ones. So you see, it's the same data with a different parser: different interpretations for different implementations of cow parsing. It would have been too easy otherwise. Mankind really doesn't just suck with computers, but also with cow parsing. So as you see, the same cow can be seen in completely different ways. I mean, okay, the head is still the head, luckily, but the parts are different because the standards are different. So if you abuse that, you can, for example, create a PDF that has three different trailers. The trailer defines the root element of a PDF.
So this is the same file, and with three different viewers, it gives you three random pictures. Because two readers, Chrome and the MuPDF-based readers, are not respecting the standard like Adobe does, they see a different root element and so a completely different document to be parsed. So basically it's one file, but three different documents, and each reader doesn't see the two other documents at all. It's not a trick with a conditional "if" that detects the reader version or anything. Or another one, a bit different here, which is actually abusing a feature: you have a PDF that shows something, and when it's time to print, it shows something completely different. And here it's not a lack of respect for the standard, it's actually a part of the standard, but unknown to most people. Not security by obscurity, but basically the standard is too complex, with a lot of features that are unused, not so useful for everybody, or not security-oriented, I'd say. Or this is a presentation I did last year. It was my first binary inception: the same file was the PDF viewer and the PDF slides. So basically the file was viewing itself, and at the time, the people were watching the slides, but they were actually already watching the demo, because it was the file running on itself. And it was also a Java file and JavaScript with Mario, okay, because why not? And the same file was also schizophrenic, so you would have a different document when opening it with a different viewer. So it was combining a bit of everything in one proof of concept, and this time it was not written by hand, it was really generated, and that's now what we do for PoC||GTFO. It's a makefile that combines everything. It's not me manually crafting the file until the end, and we also care about compatibility. That's the problem.
Okay, so another problem for security in general is that you have unexpected parsers. This one is not mine, it's lcamtuf's, I don't know how to pronounce it, who basically found an exploitation via the strings command, so that's a CVE. You would expect that strings, the command line tool, doesn't parse anything but just looks for strings, but no, it's actually calling parsers, and it was exploitable, and it's a CVE, and now he also did that with less. So the problem is that not only do you have different parsers, but you also have parsers in unexpected places. So don't run strings on an unknown file, don't run less on an unknown file, don't do anything basically, because you never know, especially if the file comes from me. Who opened my slides? Okay, just a little parenthesis on metadata. You know people like to attribute: oh, there's a Chinese string here, oh, it must be China or North Korea. Yeah, why not? So, cows and metadata: because you cannot see the head easily, you just brand the cattle with a branding iron, and the problem is that these brands can also be faked or patched into another symbol, like when you extend the sign on the cow to look like something else. And the conclusion is that attribution is hard. I don't really care about attribution, but still, we did a proof of concept with a real branding iron. We didn't have a cow to check metadata modification live, but just for PoC||GTFO's sake, I asked Moonin to actually forge a branding iron for the presentation. That's me. I'm a normal guy. Okay, now let's move on a bit from file types to crypto stuff. The important thing is that usually, when you encrypt a file, you think that the result is encrypted in the sense that it looks random. So the result of encrypting a file is usually thought of as being random, but that's wrong, as you saw in the introduction: the result of encryption can be a valid file.
So I'll try to introduce that quickly without all the advanced details. I did another presentation on that before. So basically let's take two fake file formats: we have a data file format and we have a text file format. The properties that are important are that data has an end terminator, and what comes after this end terminator is ignored, so data tolerates appended data. And text tolerates comments, I'll just take normal comments. So as soon as the source format tolerates appended data, and the target format has a way to host parasite data, then you can apply the trick. So basically, if you encrypt with AES, you get something random in general. You cannot control both what you have as input and what you have as output, because that's, yeah, encryption. AES is still not broken, to my knowledge. But the thing is, AES is a block cipher, it just works on blocks, and if you work on a file, then it needs a mode of operation. Usually what people know about this is that if you use the bad ECB mode, then you can still see the penguin, and you know that this is bad encryption. So if you use a mode that just takes every block and encrypts them independently, then identical blocks give the same results, so it's not good encryption basically. One of the modes, CBC mode, uses an extra parameter, the initialization vector, that you initially XOR with the first plaintext block. Then, after encryption by AES with a given key, you get the first cipher block. The thing is, you can do this operation backward: AES encryption can be reversed by decrypting, and the XOR can be reversed too. So basically, if you define the first plaintext block and the first cipher block, and the key is defined once and for all, then you can craft the initialization vector that will actually make this block encrypt into that block.
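The relation behind that crafted initialization vector is short: in CBC the first cipher block is C1 = E(P1 xor IV), so IV = D(C1) xor P1. Here is a minimal Python sketch of it. One loud assumption: to stay runnable with only the standard library, it uses a toy 16-byte Feistel cipher built on SHA-256 as a stand-in for AES. It is not AES and not secure, but the IV relation is the same for any block cipher in CBC mode. All function names are hypothetical.

```python
import hashlib

BLOCK = 16  # same block size as AES


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key, i, half):
    # Round function of the toy Feistel cipher (a stand-in, NOT AES).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]


def encrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, xor(left, _f(key, i, right))
    return left + right


def decrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = xor(right, _f(key, i, left)), left
    return left + right


def cbc_encrypt(key, iv, data):
    out, prev = b"", iv
    for i in range(0, len(data), BLOCK):
        prev = encrypt_block(key, xor(data[i:i + BLOCK], prev))
        out += prev
    return out


def cbc_decrypt(key, iv, data):
    out, prev = b"", iv
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        out += xor(decrypt_block(key, block), prev)
        prev = block
    return out


def craft_iv(key, first_plain, wanted_cipher):
    # C1 = E(P1 xor IV)  =>  IV = D(C1) xor P1
    return xor(decrypt_block(key, wanted_cipher), first_plain)


key = b"any fixed key..."
plain = b"SOURCE-FILE-HDR!" + b"rest of the data"    # two 16-byte blocks
wanted = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8           # chosen first cipher block
iv = craft_iv(key, plain[:BLOCK], wanted)
cipher = cbc_encrypt(key, iv, plain)

if __name__ == "__main__":
    print(cipher[:BLOCK] == wanted, cbc_decrypt(key, iv, cipher) == plain)
```

Only the first output block is under control this way; every later block is whatever the cipher produces.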
So now we can control one block of output, and the rest we don't control anymore. Okay, but at least now we can craft an initialization vector so that the first block is something that makes the file valid, and we still have control of something. Okay, now what about the random rest? What comes next? We don't control it anymore because it's a result of AES, and we don't control any parameters anymore. We cannot manipulate that. So here we use the comment feature of the target text format, so that our initialization vector makes the first block start a comment. So the rest will be ignored. Okay, so we have chosen the initialization vector so that the encrypted text starts with the magic signature and then starts a comment, and the garbage is ignored. Now if we take this file, close the comment, and append the data we want, then this file is correct, and it's equivalent to the file we wanted to have as the result of our encryption. Now the thing is, if we just decrypt it with the same initialization vector, we get back the initial blocks, because these blocks only depend on the blocks before them, not on the next blocks. So we get back the original data file, and then we have something random that we don't control, but it's after the end terminator. So this is ignored, and this is still a valid data file as we wanted, and this is the text file which has the content that we want. It's not exactly the original file, but from a parsing perspective it's exactly the same file. It just uses a comment and some garbage. So that's basically the trick of what is called AngeCryption, and that's what I used in different ways with the PDF and Flash videos in the introduction. And because in AES-CBC each block only depends on what comes from the previous block, this will indeed encrypt correctly into what we wanted.
So now we can encrypt this file with AES into this other file because we control the initialization vector, but it's perfectly normal AES, and AES-CBC is seen as secure. So it's not a problem. It's not that AES is broken. It's not that CBC is bad like ECB. It's just standard, normal. It was part of the specs all along, because the file formats tolerate extra data and appended data. So that's the layout of the files before and after encryption. And you can even try it at home with just OpenSSL, for example, if you want to entertain your kids or your friends. You don't need much. And you should try. It's very good. Go to bed. No, okay, I'll let you try. Okay, another kind of polyglot that is a bit artistic, but it's interesting because sometimes advanced file processors just look for the body: they saw a head, they saw a JPEG header, and they said, oh, here is my JPEG data, now let's skip forward. So this is a JPEG, a ZIP and a PDF. The PDF shows the image, the JPEG is the image, and the ZIP contains the image, but the image is present only once. So basically you put three headers and you make them point to the same data. So if by any chance an advanced tool was just checking the bodies, then it would just see one file type and ignore the others. So this is the layout. I'm not sure it's really visible now, but basically you have the layout of the file with JPEG, PDF and ZIP. JPEG starts first because it enforces the magic at offset zero. Then come the PDF and the ZIP, and the image data is only present once. You still need to abuse the formats a bit. So for example, you have a part of the PDF structure inside the ZIP comments, because it was not made to be done like this initially. Tough problems. So okay, a different one. This is a picture of a cat, a proof of concept, and it's a BMP that is not compressed. BMP has a funny characteristic that lets you define the bit mask of each color. And if you map it to 32 bits, then you can have bits of free space.
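The free space comes from simple bit arithmetic: if the three color masks of a 32-bit pixel only cover 16 bits, the other 16 bits are never displayed. The Python sketch below shows one possible layout. The masks (RGB565 in the upper half, a signed 16-bit sample in the lower half) are an assumption chosen for illustration, not necessarily the layout of the actual proof of concept, and `pack_pixel` and `unpack_pixel` are hypothetical helpers.

```python
import struct

# Assumed colour masks for a 32-bit pixel: together they cover only the
# upper 16 bits (RGB565), so the lower 16 bits are never displayed.
RED_MASK, GREEN_MASK, BLUE_MASK = 0xF8000000, 0x07E00000, 0x001F0000
FREE_MASK = 0x0000FFFF


def pack_pixel(r5, g6, b5, sample):
    """Pack a colour and a signed 16-bit audio sample into one little-endian dword."""
    colour = (r5 << 27) | (g6 << 21) | (b5 << 16)
    return struct.pack("<I", colour | (sample & FREE_MASK))


def unpack_pixel(raw):
    """Recover both readings of the same four bytes."""
    (value,) = struct.unpack("<I", raw)
    r5 = (value & RED_MASK) >> 27
    g6 = (value & GREEN_MASK) >> 21
    b5 = (value & BLUE_MASK) >> 16
    # Reinterpret the free 16 bits as a signed PCM sample.
    (sample,) = struct.unpack("<h", struct.pack("<H", value & FREE_MASK))
    return (r5, g6, b5), sample


if __name__ == "__main__":
    raw = pack_pixel(31, 63, 0, -1234)
    print(raw.hex(), unpack_pixel(raw))
```

An image viewer applying the masks only ever looks at the colored half of each dword, which is what leaves the other half available for another payload.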
So for each double word, you have 16 bits that you can control. What can you do with that? Well, you can put some sound in there so that you can play the picture. Seriously. I won't do the demo because that would explode your ears. I mean, well, I could, but you would not like me for that. But okay, you have sound. Initially we put some actual music into the BMP, and it was playable. It's not steganography, because you can play it directly from a sound player. It's just raw PCM. But we went further. If you consider the BMP as raw PCM and you encode a picture in the sound so that it's viewable via SoX as a spectrogram, then you can have another picture in the picture when reading it as sound. So never forget to open your favorite picture in a sound player. And you have all the gels here. And Philip actually went further, and he did it with three channels, each encoding one RGB channel of a picture. So that's the actual spectrogram view of the sound data that is integrated in this picture. So it's represented with both lines: this is image and this is sound. Okay, another kind of artistic file. This time you have two heads of the same type for the same body. And of course, it's not steganography once again, because the data doesn't need any extra extraction. It's usable directly. But it's interesting. So this one I'll do live. So this is an RGB picture. Does it work? Yes. Ooh, it's a bit small. So where is it? Whoa. This is the picture. So it's a PNG picture, and it's an RGB picture. So let's show it again. So basically the data is made of triplets of bytes, for the red, green and blue colors. Okay. The trick with that picture is that we, wow, does it work? Yes, it works. We added a palette, a random palette. And basically the trick is that when you have picture data for a palette, then each byte is an index into the palette.
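The two readings of the same bytes can be shown with a toy example in Python. This is only a sketch of the principle: the three-entry `PALETTE` and the six data bytes are made up, and the helper names are hypothetical. A real file would also need both headers and carefully adjusted color values.

```python
def as_truecolour(data):
    """Read the bytes as (red, green, blue) triplets, one pixel per three bytes."""
    usable = len(data) - len(data) % 3
    return [tuple(data[i:i + 3]) for i in range(0, usable, 3)]


def as_indexed(data, palette):
    """Read the very same bytes as palette indices, one pixel per byte."""
    return [palette[index] for index in data]


# Made-up palette: index 0 is black, 1 is white, 2 is red.
PALETTE = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]

data = bytes([0, 1, 2, 2, 1, 0])

if __name__ == "__main__":
    print(as_truecolour(data))        # two nearly black pixels
    print(as_indexed(data, PALETTE))  # six pixels: black, white, red, red, white, black
```

Changing a channel value by one or two is invisible in the truecolor reading, but it selects a completely different palette entry in the indexed reading, which is where the second picture hides.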
So the idea is that you adjust each RGB value, red, green, blue, so that it actually maps to a different color in the palette, so that it's a valid RGB picture, but it's also a valid paletted picture. Yeah, I probably won't do it live. So basically you have a second picture that is stored in the same data via the palette. And in this case, this is the picture, and this is a barcode inception, because you have a QR code and a Data Matrix code inside. So depending on your reader, you will see one or the other. The danger also depends on whether you scan directly or just swipe, because if you swipe, it will see the smaller one first. So you can see the Data Matrix here. It works better with a white line, so you can notice it if you are trained. But if you didn't know, then maybe check twice. I mean, feel free to scan it. Yeah, you can trust me. No worries. Okay, I also worked with famous cryptographers, and they created a collision for a modified version of SHA-1. So this is the full SHA-1, all the rounds, but SHA-1 has five constants, internal constants, and you just modify four of them so that it looks as secure as SHA-1, but we can actually control something and get a collision. Okay, the collision rules are complex and it gives you this. Okay, so you have these two blocks that collide, like, oh, really impressive. And the rules are a bit complex: at most three consecutive bytes without a difference, and in every dword, only the middle two bytes have no differences. Okay. And this takes between 15 and 30 hours to compute on 80 cores. So, whoa, this is a modified SHA-1 collision, but it's not exactly super impressive, right? So, okay, my task was to abuse that with a valid file format.
And JPEG has a nice property: it has a very short signature, and then it has several markers, E0, E1, and E2, that are all valid, and we just abuse the length so that we can combine two pictures. These are the question marks, we don't control them. But at least we can put something at the end. The other good thing is that the length is not too long, it's just a word, not a double word. So at the end of this length, which was generated by the cluster so we don't control it, you can put the start of the next image. The result is a bit more visual: two random pictures. So they actually collide with the modified SHA-1. And this will just work with most JPEG sizes, I think, with any JPEG, and that was just before the final, so yeah. Just a coincidence. And of course, because the problem is that the backdoor only gives you one collision block, not as many collisions as you wish, it's also interesting to turn this collision into a multi-type, polyglot collision, so that we have not only a collision but a collision across various file types, so that the backdooring is potentially more efficient. It doesn't mean SHA-1 is broken, but it was certainly an interesting experience from a file format perspective. Okay, this one is a real demo. You probably know the Pwnie Awards, and the Pwnie Awards have different categories, and one of the categories is the best song, which I'm not sure is pwning a lot of things exactly. So Melissa won the Pwnie Award. Will it run, can you open, no, yeah. Yes, can you open, no, can you open? Yeah, so I made a PDF with her picture with the pony and the lyrics, okay? But Melissa, so 0xabad1dea, also can, yeah, well, I'm not sure if I should disclose. Can I see, yeah. What happened, I hope I have sound. Oops, where are you? So this is Nintendo music as a polyglot in the PDF.
So you have the music, the lyrics. Oops, why did it stop, I don't know. Now you have a good proof of concept. You have the picture, the lyrics and the sound. I don't know why it plays so fast. So obviously, never forget to open your PDF in your favorite console emulator. Actually, I went further, and maybe you remember this picture, if you're old enough. I mean, I'm young, but yeah. And in a similar way, as you can expect once again, this is a PDF document, and the PDF document is a valid Super Nintendo and Mega Drive ROM with their funding logos up there. Okay, so I still have plenty of time, do I? Yeah, what? Well, I don't know, well, let's, oh yeah, do I do the bonus first? I mean, I still have a lot of time, right? Yeah, oh yeah, so another one, I'll go back to this. Oh yeah, well, I'll do the conclusion. I can do the conclusion after the bonus again. Yeah, okay. So the conclusion: don't forget what you learned today. Open your PDF in a hex editor. Open your pictures in a sound player or a console emulator. Just apply any cipher, just in case, and double check what you printed. So, a more serious piece of advice for today. For security reasons, don't do anything. And for research reasons, try everything. And especially if you say that you've got something, stop the marketing, and just stop blaming people, oh, they got owned, because the people blaming others for getting owned are usually people who just want to sell their security solution. And yeah, PoC: prove it or get the fuck out, because it's annoying to see all the people saying, oh yeah, we have this, but we cannot prove it or anything. That's really annoying, I think. As you can see, I like openness: all the proofs of concept of this deck are public. And they will be on my website. A bit more seriously, for file formats, there are many abuses of the specs in many ways, as you can see, but the specs themselves are often wrong or misleading.
The thing is, there is no one who steps in and says, okay, now we want to have something like a secure ZIP or a secure PDF. We just leave it to the people who originally created the spec, maybe they update it, and then we follow them blindly. And we have, how do you say, a reaction of the InfoSec community when there is an exploitation, but even though the specs suck, there is nothing where I could say, okay, now let's enforce, I don't know, something like "ZIP secure", something that would be more restricted for security, and not kept under the control of a company that is just marketing its professional product, which is still not really secure, you know? A bit like public reviewing, exactly like for crypto ciphers: format specs don't have this, and the specs are usually really misleading. And there are very few public parsers and even fewer dissectors, like parsers that really understand what the file format is about, not just the structures and the double words and everything. And when something does get made, it goes the wrong way, as usual with mankind. For example, standard tools like Office or Adobe Reader have a secondary parsing mode where they say, oh, this file looks like it's corrupted, maybe I could recover it. And you can see that they have a secondary mode that is even more lax than the official one. It really puts the file back together: oh, now it's valid, okay, I'll execute it, and suddenly you have something running that shouldn't be valid at all. It just got recovered. It's good for the user, but for security, it's not. Or sometimes it's actually very annoying: for example, WinRAR has a different parsing mode when it's viewing the file and when it's extracting. So yeah, what you see is not what you print, what you list is not what you extract, and so on. Yeah, very difficult.
And once again, this was a kind of overview talk on the possibilities, but for the technical details, check my previous talks, because I went into the encryption trick in detail, TrueCrypt and everything, or my articles in PoC||GTFO. Thanks a lot to everybody. And I have some bonus, but maybe first, oh yeah, so that's it. So do you have any questions? So if you've got any questions, please line up at the microphones. And yeah, let's start with microphone two. From your experience, do you think it is possible to write a file parser that will properly decode something as seemingly easy as a GIF file? Because Google, a couple of years ago, decided they couldn't do it. For Gmail, when they wanted to display images, they wanted to sanitize the byte stream, and finally they decided they couldn't do it, so they changed their model so that it runs in a different security context. So do you think it's possible to write a parser that is clean, that can produce a cleaned-up version of a file? People are trying that. I'm not trying personally. I would first like the specs to be a bit more reasonable. But no, I don't know about the formal possibility of this and everything. What I see is that when the specs say this buffer should be null, the parsers never say, oh, if there is any non-null byte here and I'm in secure mode, let's return and say, no, this should be null, so let's be a bit German and strict. And, but yeah. Okay, then microphone one please. So what would your concise advice be for someone, say, designing a new binary file format? I mean, it seems to me it's: start with a simple header, make sure you check that there's no garbage at the end, and then that's that. Well, first it depends on whether your file format is made of pointers, like it's made to be executed by an OS, or whether it's a sequence of structures, like images. But yeah, for those OS formats, it's difficult to enforce that because the loader evolves.
And then people have their own interpretation with the compiler. I was thinking that enforcing the actual content is more for data file formats. With the OS, usually you have one standard loader that no one knows fully, but it is really what defines the standard, because it's not like everybody likes to write their own ELF loader for no reason.
Sounds like we need a how-to, thanks.
Okay, microphone three.
Hi, first of all, thanks for the talk and also thanks for your work in PoC||GTFO. I have one question. First of all, where can I download this presentation? And secondly, how many programs should I try it with?
I'll let you find out. There are a few extra spoilers in PoC||GTFO, but I have my secrets.
Next one is microphone one.
You mentioned that in the PDF spec there are basically two separate parsers, kind of, one for viewing and one for printing, and that sounds like a really bad idea. Do you know why that is? Is it for historical reasons?
No, it's not actually the same. In this case it's not two parsers, it's about the requirements of the screen or the requirements of the printer. You enable or disable some content, and it's not a discrepancy, it's a part of the specs. So the printing schizophrenia is actually the only one that is official. It's layers: you make one layer appear by default for printing and the other for viewing, and because people are not used to enabling or disabling layers, you can abuse that. But to be honest, a few days ago, with a manually edited PDF, I accidentally found a different schizophrenia with Chrome printing under Linux, where suddenly a parameter was ignored. I didn't have the time to experiment with that further. And this time it was true schizophrenia: what was on the screen was different, and it wasn't a feature.
I mean, it's a feature for me, but thank you.
Microphone three.
Thanks again for the talk, and I have a question. How did you find out about all the possibilities with the different parsers? How did you find out what you can exploit? Did you just read the specs and see, okay, I can put a comment there and I can append there and it still stays valid, and then I can combine these two formats, or did you just do exhaustive testing?
It's a part of my workflow when I'm doing a poster. I read the specs a bit, just enough that I can create a file manually. But to be able to explain it in a clear way and make it small, I need to be sure that I know why each byte is there, in case I can remove those bytes and make the file smaller so that it fits on the poster. In the end, I actually created most of these files manually, so I have total control of the files. That's why I could mix the Java and PDF all together, because they are all written in x86 assembly. And then I can easily experiment: what happens if I change the pointer here, if I suddenly add a buffer? Do I get a blue screen or a different result? So it's definitely not exploitation research. It's because I want to be sure what each byte is for, for the clarity of the final poster. Consequently, I can manipulate every structure of the file freely. Many of those tricks were discovered by accident, like opening the file in a different viewer and getting a crash or something. But it's not active fuzzing or exploitation. I just read the part of the specs that I need for my limited understanding, and I don't go through the whole specs myself.
Okay, thanks.
Okay, then we've got a question from the internet.
Yeah, this question is actually two questions, or kind of a combined question.
Somebody wants to know whether there are any countermeasures, and if there are, how you could detect that somebody did this kind of advanced binary magic.
What are the countermeasures and, sorry?
And whether you can detect if somebody did this stuff to a file.
Well, you can still check whether there is appended data after the end of the file. You could also check whether a buffer is big and unused, not referenced anywhere in the source. I'm thinking about FX's Blitzableiter work on Flash. It was a sanitizer for Flash files, really rewriting the Flash files in a clean way. And as far as I know, no one was really interested, even though it was a fully working tool. People just want to open the files anyway. So this work should really be done at the specs level and not as an extra tool. So there are countermeasures, but even when they are well done, people don't use them.
Okay, microphone two.
Hi, I would like to know whether you have ever tested how your files behave in a forensic environment like X-Ways, EnCase, FTK or something like that?
Not really. I heard of funny results with various security tools, but I'm not trying actively. I expect surprises, especially if you see my previous talk that was focused on file schizophrenia, where you have a ZIP file that was parsed in four different ways depending on the tools. But I don't try that actively, for lack of time.
Okay, thanks. Then we've got another question from the internet.
Yeah, the question is, do you think we need to return to raw and plain text, ASCII and ASCII art for textual representation?
No, absolutely not. It's just that, if you think about it, when you have a spec and it says "this is reserved and should be zero", how many parsers are actually saying there's something wrong because it's not zero?
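As a rough illustration of the two checks mentioned here (reserved fields that should be zero, and data appended after the end), the following is a minimal sketch in Python, not a tool from the talk. It uses the 14-byte BMP file header as an example because that header has two reserved 16-bit fields and a declared file size. The function name strict_bmp_problems is hypothetical, and the sketch looks only at the file header, not at the rest of the image.

```python
import struct

def strict_bmp_problems(data: bytes) -> list[str]:
    """Return the reasons a strict parser would reject this BMP file header.

    Hypothetical helper for illustration: only the 14-byte file header is checked.
    """
    if len(data) < 14:
        return ["file is shorter than the 14-byte BMP file header"]

    # Little-endian layout: 2-byte magic, 4-byte declared file size,
    # two 2-byte reserved fields, 4-byte offset to the pixel data.
    magic, declared_size, reserved1, reserved2, _pixel_offset = struct.unpack_from(
        "<2sIHHI", data, 0
    )

    problems = []
    if magic != b"BM":
        problems.append("magic is not 'BM' at offset 0")
    if reserved1 != 0 or reserved2 != 0:
        # The format marks these fields as reserved, so strict mode refuses any payload here.
        problems.append("reserved fields are not zero")
    if declared_size != len(data):
        # A mismatch can mean truncation or data appended after the declared end.
        problems.append(
            f"declared size {declared_size} does not match actual size {len(data)}"
        )
    return problems

# Usage: a bare header that is consistent, then one with a stuffed reserved
# field and appended bytes.
clean = struct.pack("<2sIHHI", b"BM", 14, 0, 0, 14)
tampered = struct.pack("<2sIHHI", b"BM", 14, 0x4141, 0, 14) + b"appended payload"
print(strict_bmp_problems(clean))
print(strict_bmp_problems(tampered))
```

A check this strict may reject files that ordinary viewers open without complaint, which is the trade-off raised in the answer: the enforcement would have to be accepted at the specs level for people to use it.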
Maybe I'm old fashioned, but as soon as there is such a field, I can write whatever I want in there. And as long as I can allocate a buffer, I can put whatever I want in there. So no, not going back, but at least not being afraid to enforce a few things, like you have for a safe crypto algorithm, where there are public reviews before things go public, I think.
Okay, any more questions?
No, just a few bonuses that I had. Okay, so bonus stage. The abstract of this talk was initially ASCII only, because an abstract needs to be ASCII only, and it was a PDF/TAR polyglot with some ASCII art. That's probably why people were afraid to actually check my abstract in the first place. But the Fahrplan removed all the newlines, so I went back to a standard abstract. But that's the file name of the TAR archive, okay. Solar Designer did a great keynote a few months ago, and the title was "Is InfoSec a Game?" The keynote was a game, for which he used an old engine and some very nice graphics that can ring a bell, including a dollar one. So the whole keynote was a game that he played through, and he made all the interactions. I don't know if you can see the goto, fail, peek, poke, exploit, patch. But it's a bit difficult for people to just enjoy his game, because they would have to run it in DOSBox and go through the game without really knowing what to do, and not everybody has the time. So he created screenshots of the whole game, and I wrote by hand the PDF that contains all the screenshots in the original resolution, bundled with the actual game, so that you can run the game from the PDF, because why not? That's a good way to distribute it as a single file with everything, without any hurdle. Yeah, just quines. When people think of artistic and valid files, they usually think of quines. I don't do that much, but just in case. So a quine is just a file that prints its own source.
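To make the definition concrete, here is a minimal sketch of a quine in Python, not the PE file shown in the talk. The two-line program stored in QUINE prints exactly its own source. The names QUINE and run_and_capture are illustrative helpers, used only to run the quine and compare its output with its source.

```python
import contextlib
import io

# The quine itself, stored as text: two lines of Python followed by a newline.
# %r inserts the quoted form of the string s, and %% becomes a single %.
QUINE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"

def run_and_capture(source: str) -> str:
    """Execute Python source in a fresh namespace and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()

# A quine's output is byte-for-byte identical to its source.
print(run_and_capture(QUINE) == QUINE)
```

Saving the contents of QUINE to a file and running that file gives the same result: the output is the file itself.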
So basically, this is a PE file that prints its own source, but once again, I don't use linkers. I create the whole header structure myself. Quines are very sexy, but using a compiler or a linker is cheating by my standards. Then you have quine relays: basically you have an ELF that prints the source of a PE, and a PE that prints the source of an ELF. But I'm really a little player here, because there's a Japanese guy who did that with 50 languages. Oh yeah, a few other encryption proofs of concept, like the initial one. So you encrypt this into this and so on. I had fun with random pictures once again. Then you can also combine. So this is a polyglot with a hash collision that is also schizophrenic, because if you think about it, it's always possible. It's more artistic, and that's about it for today. Thanks for your attention.