Hello and welcome everybody to the next talk after the break. It's so great that you're all here to learn about QRtistry. And I'm so glad to introduce Henryk Plötz to you, who usually speaks about security at Congress, but this time he brought you a great tool that he will release and explain to you. I hope you're all here to make QR codes better and more beautiful, because nowadays they are still ugly most of the time, and next year at Congress I really want to see some great stickers with great QR codes. So give it up for Henryk.
Thank you. Yeah, thank you for the introduction. It's actually good that you said next Congress, because it's not quite finished. But I will give you some building blocks for a tool, and most of the talk will consist of me explaining to you how a QR code is constructed, which you will need in order to manipulate it. You might or might not know what a QR code is; before the talk I met a couple of people who didn't even know what a QR code was, that it is some kind of bar code. QR stands for quick response. It was developed in 1994 for the automotive industry in Japan in order to store more data than a conventional one-dimensional bar code. It has since been standardized as an international standard, ISO 18004, first published in 2000 and updated in 2006; the 2000 standard is no longer valid. QR codes have wide adoption in Asia. They have been using this stuff for ages now, but especially in the last year or two I've seen a lot of codes in Europe, and I was especially glad to see so many QR codes at this Congress compared with last year, where there were basically none. One of the reasons why this is now getting more and more adopted is that there are free reader applications for most smartphones.
Even for not-so-smart phones: my old Nokia, for example, didn't have that many applications, but there was a bar code reader built in. I was always confused because this thing wouldn't read any EAN, the 1D bar code that you find on every product. It was called a bar code reader, but it turns out it was a QR code reader, and it could read a QR code just fine. Apart from the standard use, which I'm only going to talk about a little, people have been using QR codes in a variety of creative ways. Most of you might be familiar with the one on the right. That is one of the first examples of a QR code that isn't just for machine consumption but is also nice to look at for humans. That was in 2008, for the BBC. It turns out that this isn't actually a very sophisticated mechanism. It only looks like this because the BBC letters are just overlaid on the code. You could have drawn, I don't know, a pony or a unicorn in there, and from the bar code reader's perspective that would have been the same. Those letters look like they belong to the code, but they don't. I'm always citing my sources: this example I got from the great Russ Cox, who has a very similar project that isn't as wide in scope. I'm going to reference that again later on. He also referenced another blog entry from 2009. Like I said, people in Japan and the rest of Asia have been using QR codes for quite some time now. Those are Disney advertisements, which also contain a perfectly readable bar code that looks like it might include some of the Disney characters, but they are not really part of the bar code. This is the project by Russ Cox; QArt, he calls it. He did really great things and is able to encode pictures in most of the pixels. Those two codes are readable. They are fully valid QR codes. There is no trickery, no error correction involved. I'm not going to explain fully how that works; I'm just referencing it. You can read up on it at the first link.
As for these examples, like I said, the pictures aren't properly integrated. They are just painted over the code. That works because a QR code can handle a lot of damage. Codes might get smudged, and there might be coffee stains on them. That's why there's error correction in there. If you look at this code and this code, a human might not be able to see the difference that easily, but I've inverted a lot of the pixels, which means they are damaged. They don't encode what they should encode. But a QR code reader, if anyone has one you can try it, reads this one just fine. It's encoded at the highest error correction level, and at the highest error correction level the code can be damaged up to 30%. This one might be a little bit easier. Here, instead of inverting the area, I painted over it. I could have drawn a logo in there; I just made it black. Of course, now it's not 30% damage. It's only 15% damage, because on average half of the painted area, so 15% of the pixels, was already black. Over-painting is in wide use. One of the most recent examples I particularly liked was a project called SenseDorm. This is their logo. It already looks like a code. And this is their QR code. It does look like the logo is embedded in there. It's not. It's just painted over. With my tools, you could see the original code that was painted over. Another example: I'm not showing too many examples because most of them are advertisements, and I don't want to advertise. I might advertise for this project because it's cool. They also have a code that uses over-painting. Like I said, the other method to do fun stuff is the QArt project by Russ Cox. He does a different trick. He noticed that if you have an HTTP URL, it doesn't matter if there is a very long fragment at the end that doesn't really exist. So this is the actual HTTP URL, and this, after the hash, is a fragment identifier, which the browser is going to look for but won't find, and then just ignores.
And the face you see in the code below is actually just encoded in this number sequence. He has done a lot of math in order to calculate the number sequence that will encode this face. With this technique, he can control both data and error correction pixels, though not all of them simultaneously. A QR code, for example, has 26 code words in total. 16 of those are data words; 10 are error correction words. Of course, you can only choose the data, so at most you can control 16 words in total, for example 10 data words and 6 error correction words. So this was the introduction and a little bit of motivation for why I want to be able to modify more than that, because as you can see, there's still a lot of structure in there that he can't control. It turns out that a QR code, as you know and love it, looks simple but isn't. I call it deceptively simple. The basic unit of a QR code is called a module; that's simple enough. A single pixel in ISO speak is called a module. But a lot of the other stuff also has some function. Those big squares in the corners are called the finder patterns. They need to be there in order for the QR code reader to find the code. And what I hadn't noticed before, and I'm not sure if any of you has noticed this: there is a timing pattern over here. Those pixels always alternate. You usually don't notice that; even though humans are normally pretty good at identifying patterns, I didn't see it before. The ISO standard specifies the encoding procedure. Another thing I didn't know before is that there are four different encoding modes: numeric, alphanumeric, 8-bit bytes, and kanji. Like I said, this originates from Asia. Then you have to determine the version; I'll tell you a little bit about that later. The data is then encoded into a bit stream, and padding is added. As you will see later, all the mode switching is done in four-bit units. Everything is bit-based.
Lengths are sometimes nine bits long, sometimes 11 bits long. So everything after that has to be split into code words. To these code words, an error correction code based on Reed-Solomon is then added. Afterwards, and this is also very confusing and something I didn't know before, the data and the error correction words are interleaved. So even though the code is filled starting from one particular position, you won't find your data in order from that position; your data is scattered all over the place. I'll show a picture later. Then those special modules, those special pixels, are placed. They're called function modules. Another thing I didn't know before is that a mask is applied. There are eight different masks that are overlaid on the code in order to deliberately obscure features you would normally see in the code. Of course, if you know that beforehand, you can adjust your data so that this doesn't bother you, like Russ's project does. But if you just encode, for example, a sequence of spaces, and I'm not sure if anyone has ever tried that, you would normally expect to see some pattern in the resulting code. You don't, because there's masking applied. And at last, format and version information is added so that the decoder can find out what kind of code it has. Like I said, the data has to be encoded. There are four basic modes. The most obvious one, and the one that was originally used in the automotive application, is the numeric encoding mode, where you can only encode digits. It works by encoding three digits into 10 bits. They just use that as an integer from 0 to 1023, ignore the last 24 values, and have three digits. Alphanumeric mode uses a substitution table. They have 45 characters in this table, and they encode two entries of the table into 11 bits. 8-bit bytes is just plain 8-bit bytes, as you would expect.
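The numeric and alphanumeric packing described here can be sketched in a few lines of Python. This is only an illustration, not code from the speaker's tool: the function names are made up, and the handling of leftover characters (two digits into 7 bits, one digit into 4 bits, a single alphanumeric character into 6 bits) is taken from the standard rather than from the talk.

```python
# The 45-character substitution table of alphanumeric mode.
ALPHANUMERIC_TABLE = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def encode_numeric(digits):
    """Pack decimal digits: 3 digits -> 10 bits, 2 -> 7 bits, 1 -> 4 bits."""
    widths = {3: 10, 2: 7, 1: 4}
    bits = []
    for i in range(0, len(digits), 3):
        group = digits[i:i + 3]
        bits.append(format(int(group), "0{}b".format(widths[len(group)])))
    return "".join(bits)


def encode_alphanumeric(text):
    """Pack table indices: 2 characters -> 11 bits, a trailing one -> 6 bits."""
    bits = []
    for i in range(0, len(text), 2):
        pair = text[i:i + 2]
        if len(pair) == 2:
            value = 45 * ALPHANUMERIC_TABLE.index(pair[0]) + ALPHANUMERIC_TABLE.index(pair[1])
            bits.append(format(value, "011b"))
        else:
            bits.append(format(ALPHANUMERIC_TABLE.index(pair), "06b"))
    return "".join(bits)


print(encode_numeric("01234567"))
print(encode_alphanumeric("AC-42"))
```

These functions produce only the payload bits; the mode indicator and the character count that precede them in the bit stream are left out.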
And there's a kanji mode that I've never seen used, which, according to the spec, uses 13 bits per two-byte character. I don't know how they are mapped. I don't speak Japanese. There's also something fun called extended channel interpretation, ECI. With this, you can switch between different interpretations. You use the same basic modes that normally encode ASCII data or normal 8-bit byte characters, but you can make them mean something different. The most important one for most of you will probably be UTF-8, which has an ECI number, but there are a couple of other interpretations. I haven't explored them at all. I'm guessing you can also add some degrees of freedom by just choosing a different one, though it is doubtful that all readers can read that. There's another interesting feature I've never seen used, which is structured append. If all your data doesn't fit into one code, or your encoding region isn't square, you can chain up to 16 codes together. All of these QR codes will then carry a checksum, and each of the codes tells you, I am code 1 out of 5, I am code 2 out of 5, and so on. So your reader will then prompt you to scan all the other codes, and afterwards it chains the data together. The reader that's most often used, which might be the bar code reader for Android, doesn't support that. I was somewhat shocked, but my old Nokia phone, for example, does. I have an example on the next slide. Another mode indicator is FNC1. I'm not going to talk about that. That's bar code stuff. There's a lot of special stuff. This is a structured append example right out of the spec. If you want to try it with your reader, the four codes at the bottom have the same semantic meaning as the one code at the top. It's just an alphabet. But the Android reader, for example, can't read the four codes at the bottom as one.
It will just show you the partial sequence that was encoded in each of them. If you try that with an old Nokia phone, the phone will happily tell you that this is code one out of four and will prompt you to scan all the other codes. So how much can you store in a QR code? Quite a lot. The code size and storage capacity depend on something called a version, which was very confusing to me at first. There are 40 versions, but they are not versions as you would say in software speak. A version is just the size of the code. They do have something that we would normally refer to as a version: those are called models. There's a model 1 and a model 2 code. We are always using model 2 codes. Model 1 codes are deprecated and not in use anymore. So this is the version 1 code. It stores 152 data bits, and some of these may be error correction bits. This is a version 2 code. As you can see, something happens when going from version 1 to version 2: an alignment pattern appears. Those are the finder patterns in the corners. Version 1 only has these. Version 2 also has an alignment pattern. Then it just gets bigger and bigger. At some point, here in version 7, additional alignment patterns appear. Another change is that version information is now added. Version 7 stores about 1,200 bits, version 10 about 2,000 bits, and it goes up to version 40, which can store about 23,000 data bits at this error correction level. This one, you can try it, just stores the letters QR, and the rest is padding. I didn't need all those bits. In the spec, there are pages and pages of tables. One of these tables tells you how much storage capacity there is in each version of the code. Version 40, for example, like I said, has 23,000 bits, which makes something like 7,000 digits or almost 3,000 8-bit characters. Like I said earlier, data and error correction are interleaved.
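Interleaving, as mentioned here, means that the code words are not written block after block but column by column across the blocks. A minimal Python sketch, assuming the error correction words have already been computed elsewhere; the function names and the toy block contents are hypothetical and not taken from the speaker's tool:

```python
from itertools import zip_longest


def interleave(blocks):
    """Take the first code word of every block, then the second, and so on.

    Blocks may differ in length; exhausted blocks are simply skipped.
    """
    sequence = []
    for column in zip_longest(*blocks):
        sequence.extend(word for word in column if word is not None)
    return sequence


def final_sequence(data_blocks, ec_blocks):
    """Interleaved data code words followed by interleaved error correction words."""
    return interleave(data_blocks) + interleave(ec_blocks)


# Toy blocks whose numbers mimic the 1, 11, 21, 2, 12, ... ordering.
data_blocks = [[1, 2, 3], [11, 12, 13], [21, 22, 23, 24]]
print(interleave(data_blocks))
```

Because each block's code words end up spread over the whole symbol, a local smudge damages a little of every block instead of wiping out one block completely.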
Here's another table from the specification. These are code words. A code word is just an 8-bit byte. The data code words are arranged here, from there to there. So if you had Hello World, this would be an H, E, L, L, O, and so on. For each of these rows, the error correction is calculated. And then, for inclusion into the code, they go by columns. They go from here to there, then there, those, and so on. On a following slide, I'll show an example. When laying out the code words, you start at the bottom right. So this is the first one, the second one, the third, and so on. It goes up, then down again, in a zigzag pattern. And this is another debug output from my code that tells you which code word is which. This is the first. This is the second. As you can see, this is the interleaving. I'm not sure how many of those there are, but this may be 1, and this may be 11, 21, 31, 41, then 2, 12, and so on. So this is the interleaving in action. Like I said before, and this is something most people don't know, there's masking involved. This is so that a QR code reader that looks for a QR code can just look for the finder pattern. That's why it's called that. It doesn't have to worry about finding a finder pattern in the data. Even if you encode some data that would lead to a finder pattern appearing in the data, you would not see it in the QR code, because there are eight different masks that are just XORed onto the code. There's an algorithm in the specification that tells you to try all those masks in order, calculate how bad the resulting code is, and then choose the least bad one. Here's another debug output from my program that tells you where the function modules are. We already saw the finder patterns in these corners. Right next to them is the format information. Those are the Fs. It's here, and there's a second copy down here and over here.
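The masking step described in this passage can be sketched as follows. The eight conditions are the mask patterns from the standard; everything else is an assumption made for illustration. In particular, `toy_penalty` is a made-up stand-in that only counts equal horizontal neighbours and is much simpler than the penalty rules in the specification, and `is_function` is a hypothetical callback marking modules that must not be masked.

```python
# A module at (row, col) is flipped when the condition of the chosen mask holds.
MASK_CONDITIONS = [
    lambda r, c: (r + c) % 2 == 0,
    lambda r, c: r % 2 == 0,
    lambda r, c: c % 3 == 0,
    lambda r, c: (r + c) % 3 == 0,
    lambda r, c: (r // 2 + c // 3) % 2 == 0,
    lambda r, c: (r * c) % 2 + (r * c) % 3 == 0,
    lambda r, c: ((r * c) % 2 + (r * c) % 3) % 2 == 0,
    lambda r, c: ((r + c) % 2 + (r * c) % 3) % 2 == 0,
]


def apply_mask(matrix, mask_id, is_function=lambda r, c: False):
    """XOR the mask onto all non-function modules; applying it twice undoes it."""
    flips = MASK_CONDITIONS[mask_id]
    return [
        [bit ^ 1 if flips(r, c) and not is_function(r, c) else bit
         for c, bit in enumerate(row)]
        for r, row in enumerate(matrix)
    ]


def toy_penalty(matrix):
    """Hypothetical badness score: number of equal horizontal neighbours."""
    return sum(a == b for row in matrix for a, b in zip(row, row[1:]))


def choose_mask(matrix):
    """Try all eight masks in order and keep the least bad one."""
    return min(range(8), key=lambda m: toy_penalty(apply_mask(matrix, m)))


blank = [[0] * 6 for _ in range(6)]  # a featureless data area
print(choose_mask(blank))
```

With this toy score, a completely uniform area is best served by the checkerboard mask, which fits the observation in the talk that uniform input does not produce a visibly uniform code.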
So even if you cut off one of these corners, it will be okay. There's version information here and here. So even if you cut off one of those corners, it will be okay. And those are the alignment patterns and the timing pattern. To show you how many degrees of freedom you have, it's best to use an example. I'm just encoding the text hello32C3. Using these mode indicators, I would split this into two different sections. One uses 8-bit bytes, because the normal alphanumeric mode doesn't have lowercase characters. So I'm using 8-bit bytes for those characters and alphanumeric for those characters, and then there's a terminator. This would be the resulting bit stream. You start at the top left, just read this as a bit stream, add padding, and cut it into 8-bit bytes. Most often, if you look at the debug output, it will start with 64. If you look at the previous slide, that's the 8-bit byte mode indicator. And this would be the structure that results. Those are the data words. Then I calculated the Reed-Solomon error correction code. Luckily, I didn't have to do that by myself, because I suck at math, and there was a Python library that I could use. So I did that. Yay, open source. Then you place this into the matrix, and this is the code that would result, or rather one of the possible codes. It turns out that you can create quite a lot of different codes that are semantically equivalent but look nothing like each other, which is the entire reason why I'm here. All of these codes, you can try that with your reader, encode the exact same sequence, hello32C3. But they don't look anywhere near close to each other in the encoded version, and they also don't look particularly similar in the debug version. This at the top is with masking applied, and this at the bottom is without masking applied. Now, if you could control all of these elements, you could make the code look like whatever you want.
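The hello32C3 example can be reproduced roughly as follows. The talk does not say which version and error correction level the slide used, so this sketch assumes a version 1 code with 16 data code words (the 26/16/10 split mentioned earlier), where the character count is 8 bits wide in byte mode and 9 bits wide in alphanumeric mode. The pad bytes 0xEC and 0x11 come from the standard; the function names are invented for this sketch.

```python
ALPHANUMERIC_TABLE = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"
MODE_ALPHANUMERIC = "0010"
MODE_BYTE = "0100"
TERMINATOR = "0000"


def byte_segment(data):
    """Mode indicator, 8-bit character count, then one byte per character."""
    return MODE_BYTE + format(len(data), "08b") + "".join(format(b, "08b") for b in data)


def alphanumeric_segment(text):
    """Mode indicator, 9-bit character count, then 11 bits per character pair."""
    bits = MODE_ALPHANUMERIC + format(len(text), "09b")
    for i in range(0, len(text), 2):
        pair = text[i:i + 2]
        if len(pair) == 2:
            value = 45 * ALPHANUMERIC_TABLE.index(pair[0]) + ALPHANUMERIC_TABLE.index(pair[1])
            bits += format(value, "011b")
        else:
            bits += format(ALPHANUMERIC_TABLE.index(pair), "06b")
    return bits


def to_codewords(bits, capacity):
    """Add the terminator, pad to a byte boundary, then fill with pad bytes."""
    bits += TERMINATOR[:max(0, min(4, capacity * 8 - len(bits)))]
    bits += "0" * (-len(bits) % 8)
    codewords = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    pad_bytes = [0xEC, 0x11]
    filled = 0
    while len(codewords) < capacity:
        codewords.append(pad_bytes[filled % 2])
        filled += 1
    return codewords


stream = byte_segment(b"hello") + alphanumeric_segment("32C3")
codewords = to_codewords(stream, capacity=16)
print(codewords)
```

The first code word comes out as 64, matching the debug output mentioned in the talk: the byte mode indicator 0100 followed by the top four bits of the character count. Splitting the text differently, for example encoding everything in byte mode, gives a different code word sequence for the same decoded text, which is where the degrees of freedom come from.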
My grand plan for the future is to have a graphical user interface where you can just click on a module and it will tell you what you have to change in order to change this module. I'm not there yet, but I have most of the infrastructure behind it, which is able to tell you which of these modules has which function and can help you with decoding and encoding so that you can get your desired result. In order to do this, you need something like an assembler. The interface for the assembler is not quite ready yet, but as you would do with a software assembler, you give it very low-level information about what it should do, and then it assembles the code for you, and you can modify it on a very low level. These are the non-standard dimensions along which you can customize a code. You can play with the encoding modes. You can split the text into segments with different encoding modes, which will be semantically equivalent. You can insert extended channel interpretation indicators at random points if you want to. What you can also do, though I haven't tried it yet, is to use the padding. As you may have noticed before, there is a terminator: the last mode indicator is zero, which means this is the end of the data, and afterwards comes padding. Instead of encoding the whole picture into the data as Russ did, you could just encode it into the padding, and I'm guessing most decoders would decode that just fine. What you can also do, and this is rather standard, is change the content into something that is not semantically equivalent. For URLs, you can just try lowercase and uppercase. And the cutest way, I think, would be to just paint what you want into the code, see what it decodes to, maybe fix the error correction, and if it decodes to something that is a valid domain name, you can just register that and have it redirect to your actual site. That's what I'm doing next. Currently my business card has a pretty standard code, but I'm guessing I will change that soon.
You can combine all those techniques with the existing techniques by Russ, and once you are as close as possible to your desired code, you can just paint over what is left. Everything has error correction. Like I said, the data has, and you can choose that, roughly 7 to 30% error correction. The format information has 30% error correction: there are five bits encoded into 15 bits. The version information has 30% error correction: those are six bits encoded into 18 bits. So you can change basically everything, as long as you don't change too much, and it will read just fine. The code I've just pushed to GitHub. Like I said, it's not very useful yet. I'm still working on it. And I think that's most of the time we have; there are four minutes left. Oh yeah, you can try to scan that.
Thank you so much, Henryk. We still have time for questions, so if you have any, please come to the microphones. We have them here and also in the back. We also have a signal angel, and I'm checking whether we have some questions from the internet. We have one, so please, signal angel.
Thank you. One user in the IRC states that Denso Wave has an FAQ on qrcode.com, and it states that they may decide to exercise patent rights against codes with colors or illustrations.
I know that there is a variant of QR codes that uses colors for information. I'm not sure if they refer to that. But yeah, fuck patent law. I mean, it's Germany.
Thank you. We have one question at microphone two.
I'm not exactly sure, you talked about the masking: how can we decode the masking? I mean, you said encoding works by choosing the best, but how does it work for decoding?
So the masking is encoded in the function modules, in the format information. Over here, the first two bits encode which error correction level we have. I forgot what the mapping is, but those first two bits encode the error correction level, and the next three bits encode which mask is used. Those bits are not masked.
So you just read these bits, or there's a mirror copy, I think, over here. You just read these bits, see which mask was used, and unmask the code. It's XOR, so you can just apply it again.
Thank you.
So I don't see any further questions. Thank you again, Henryk. I hope you are all motivated to help him with the project and make cooler QR codes.
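The format bits from the last answer can be sketched in Python as follows. This is an illustration based on the standard rather than on the speaker's code: the five data bits (two for the error correction level, three for the mask) are extended to 15 bits with a BCH code, and although the data mask is not applied to them, the standard does XOR the 15-bit word with a fixed pattern. The level mapping that the speaker could not recall is included as well. The function names are hypothetical, and `decode_format` does no error correction.

```python
FORMAT_GENERATOR = 0b10100110111   # generator polynomial of the (15, 5) BCH code
FORMAT_XOR = 0b101010000010010     # fixed pattern XORed onto the format bits
EC_LEVEL_BITS = {"L": 0b01, "M": 0b00, "Q": 0b11, "H": 0b10}


def encode_format(ec_bits, mask_id):
    """Build the 15-bit format word from 2 level bits and 3 mask bits."""
    data = (ec_bits << 3) | mask_id
    remainder = data << 10
    # Polynomial division over GF(2); the remainder is the 10 check bits.
    for shift in range(4, -1, -1):
        if remainder & (1 << (shift + 10)):
            remainder ^= FORMAT_GENERATOR << shift
    return ((data << 10) | remainder) ^ FORMAT_XOR


def decode_format(word):
    """Read level bits and mask id from an undamaged format word."""
    data = (word ^ FORMAT_XOR) >> 10
    return data >> 3, data & 0b111


word = encode_format(EC_LEVEL_BITS["L"], 0)
print(format(word, "015b"))
print(decode_format(word))
```

Once the mask id is known, the reader applies the same mask pattern to the data area a second time; because masking is an XOR, this restores the unmasked code words.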