And now, our next speaker is Jan, who is a computer science student in Aachen, and he will talk to us about circumventing video identification using augmented reality. Please welcome our speaker.

Hello. We are talking about video identification, so let's first find out what video identification is. Video identification is a process to identify a person remotely, so that we know he is who he says he is. This is done using a video chat, like on Skype, between a call center operator and a user. Example use cases are opening a bank account, taking out a loan, or buying SIM cards or insurance, and in Germany, for some of these use cases, this is even required by law.

Now I come to the story of how I got the idea for what I'm going to show you. I needed to create a bank account for me and my flatmates, and because I'm lazy, I did it online. So I went to a bank's website, entered my information and was greeted by a call center operator, who asked me for my information again to verify that it's me. And then, still happy, I was asked to present my ID card. And then we ran into trouble, because they wanted to see some watermarks on it. I was on my laptop and the lighting in my room was not perfect, so I had to unplug it, move around, do some dancing until they could see the watermarks. In between, the Wi-Fi dropped, of course, so I had to do it twice. In the end I got it working, but I noticed, okay, this is not fun. I was annoyed and got thinking, okay, this stuff can surely be faked somehow. This is not really secure, just checking my ID card using a webcam. This is not what it's made for. So I began tinkering around, and I'm going to show you what I found out.

Now, back to the workflow of video identification. The German institution for financial oversight gave out a letter with the exact workflow, what they have to ask and how.
And they consider the identification successful if three watermarks are correctly identified on webcam screenshots, if the text is correct, if the text matches the personal information I gave on the form, and if the image on the card matches the person in the webcam feed. Additionally, I have to move a finger in front of the card and I have to move my hands in front of my face. I am not sure why. And the agent is supported by tools, meaning when he sees the watermark, he takes a screenshot for reference. That's why you have to hold the card for a few seconds in exactly the right position and angle. So that's it for video identification.

What we are going to do now is, I'll show you how we can fake this webcam footage, but only as a proof of concept, meaning no perfect results. You'll see some small errors. And I only show still images, while I promise that video footage works as well, but the lighting conditions have to be exactly right. The workflow I'm going to go through is: first we create fake ID card images, like textures, templates, and then in the second part we use these textures to render the card realistically.

And as I'm not allowed by law to create fake ID cards with real people, I have brought my stuffed animal here. This is Hank. He's an octopus. A small octopus shouldn't have an ID card, he never had one, but we'll make some for him. And to manage your expectations, I'll give you a small hint of what we'll end up with. This is me holding his ID card with a watermark, clearly a genuine real card. His name is on it, even a signature, everything fine.

So now let's go on and talk about how we generate the textures. We scan some publicly available documents. We find out how to remove the sample text, we find out which fonts are used on the real documents, we'll talk about the watermarks, we'll talk about how to generate our own watermarks, and then we'll talk about rendering the card.
Okay, so there's a law called the Personalausweisverordnung and it, of course, contains some sample images, small on the page, but if you right-click one and press Save as, you get some high-res sample images to use. Awesome, without any watermarks, everything correct. We can work with that. Same for the backside. But now we don't have the watermarks, and if you scan your ID card, you get garbage, basically. This is not good. The Bundesdruckerei, the company responsible for actually producing the ID cards, gave out a nice flyer with a list of all watermarks, visible and invisible ones, and now it's up to you: guess what happens if you right-click on the sample image and press Save. Okay, so we got a nice sample image. We have a map of all watermarks on the ID card. We can use this as a baseline to work through the watermarks. Same for the backside.

We now do a quick recap of the watermarks, especially those with images on this flyer from the Bundesdruckerei. We have repeatable background patterns. We have microscopic prints, which you don't see on the webcam. We have UV print, which you never see on the webcam. We have a variable color print, which you may see but won't see on the webcam. We have the contrast inversion hologram. And did you notice that this is pretty high resolution? So we can just redraw it in GIMP or Photoshop and have our own sample watermark to work with. Next, we have some images like the holographic portrait, which I'll show you later how to fake, and which is the only one I'll show how to fake, because the others are homework for you. And then we have the eagle and the hexagons above the photograph, of which you only see some. And at 8, you see a microphone, which is basically word art because no document is perfect without word art.

And next, we go to the backside. Again, repeatable background patterns. Microscopic print, which you won't see. UV fibers, which you never see.
Some embossings, you can feel them, but the webcam can't feel, so you won't see them. We later need to remove this text there, and if it's too hard, we just use the address change sticker and put it on top of it, and then it's no problem for us because the sticker is very easy. Remove the text there, no problem. And we also have this tilt image. The effect is like from a Happy Meal at McDonald's when you get some tilt image there. It's exactly the same. And finally, the machine-readable area, but that's not really a security feature. Oh, and the security thread personalized with your own text, but it's so small you don't see it on the webcam.

Okay, so to sum it up: from all the security features, we remove those which you basically don't see on a webcam, and then we remove those that are basically already in the pictures I showed you, the sample images. And now we are left with seven security features which we really need to take care of, because you can see them on a webcam and they're dynamic. When you tilt the card, they have to change somehow, and this is basically the hard part of what you really need to fake.

And to sum the security features up, only some features can be verified. UV light is not available for video identification. And when we, the bad guys, want to fake a video identification, we can, of course, enforce a low resolution, make bad image quality like compression artifacts, make the footage blurry, or create bad lighting conditions. Or when we have some trouble, we can drop some frames. No problem, bad internet. So we are on the easy side of the webcam.

Okay, now we have the sample textures, and we need to remove the text there. How hard could it be? Okay, we have repeatable patterns all over the ID cards, which means we just use the clone stamp from Photoshop and clone all the text away completely. It's just this simple step, no magic involved.
You just clone the stuff beside it and draw over it, and in the end you won't notice, because text will be over there, and you can blur the stuff, and it doesn't matter, really. And I won't show you the real template, because this is not legal for me, but this is basically the process. Normally there would be Erika Mustermann, but I removed the mask there just to show that's the way it works.

Next, we need to identify the fonts actually used, and there are two fonts really important for us. The first font is also used in the machine-readable area. It's standardized, you can look it up on Wikipedia. It's called OCR-B, and it's used in all the areas marked in red. So we already have that font, and we can simply draw over it, since we removed the sample text before.

Now we have another font, which is used for the personal information. It's not the same: if you look at the number 3, do it at home, you can see that it's different. It's not the same font, and it's not documented. If you Google it, no chance. But there are reverse font search engines, and if you take a good sample, crop it out of the image and put it into such a search engine, you find a font called saxMono Regular, and this matches the font they used on the ID card. If you try to draw over the real ID card, you'll notice they changed some stuff, so they have customized this saxMono Regular font, for example the spacing between the characters. But this is all stuff you don't see on the webcam, so who cares.

And now we have the fonts, and now we talk about the watermarks. If you look at this comparison, the blue area is what we'll fake, and the area on the left is the photo, the base photo. You'll notice the dark areas of the photo are not really considered in the watermark, but the bright areas, they are really green in the watermark.
So we know, okay, they maybe have a black-and-white image, where dark areas are not really considered in the watermark and the bright ones are. And we see those eagles on the left side of the watermark, in the blue rectangle, and those are not considered in the watermark either, so they must be black in the base texture. And this is what I've come up with. The eagles you can simply draw using GIMP; you have a high-res image here, which you can use. So you draw the eagles, and you generate your picture, and that's basically it for generating a watermark base texture.

Now we'll go on with how to actually implement the watermark. And it's very, very easy. I use the Phong lighting model; it's the first lighting model you learn in a computer graphics lecture. I use its specular component: if you have a blue porcelain cup and you shine a light on it, the white highlights are the specular part. If the specular part is above a specific threshold, I basically fade in my watermark, and this already gives this nice effect. I didn't have to do more, it's like three lines of code, and you're done.

Now I show you the base textures, which I generated for my stuffed animal here. On the left side, you don't really see that this is faked, or that I removed text beforehand, because this fine repeatable print is so small that it's no issue for us.

So to sum it up: the information to build baseline texture templates is public, the information about the fonts used is public, the repeatable background patterns can be reproduced with the clone stamp, and embedding high-resolution images in PDFs is maybe not a good idea for ID cards, but yeah, okay.

Now we go on with actually rendering the ID cards in our video stream, and there are five steps.
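As a rough illustration of the specular fade-in described above, here is a minimal Python sketch. It is not the speaker's shader: the function names, the threshold of 0.6, the shininess of 32 and the linear fade are all assumptions chosen for readability. Only the idea (compute the Phong specular term, fade the watermark in once it exceeds a threshold, weighted by a grayscale watermark mask) comes from the talk.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def phong_specular(normal, to_light, to_viewer, shininess=32.0):
    """Phong specular term: (R . V)^shininess, with R the mirrored light direction."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_viewer)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:  # light is behind the surface
        return 0.0
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
    return max(dot(r, v), 0.0) ** shininess


def watermark_alpha(specular, threshold=0.6):
    """0 below the (assumed) threshold, then a linear fade up to 1."""
    return min(max((specular - threshold) / (1.0 - threshold), 0.0), 1.0)


def shade(base_rgb, hologram_rgb, mask, specular, threshold=0.6):
    """Blend the hologram colour over the base texture.

    mask is the grayscale watermark texture value in [0, 1]:
    0 where the watermark should never show, 1 where it shows fully.
    """
    a = watermark_alpha(specular, threshold) * mask
    return tuple(round((1.0 - a) * b + a * h) for b, h in zip(base_rgb, hologram_rgb))


# Card facing the light and the camera head-on, then the camera tilted by 45 degrees.
head_on = phong_specular((0, 0, 1), (0, 0, 1), (0, 0, 1))
tilted = phong_specular((0, 0, 1), (0, 0, 1), (1, 0, 1))
print(shade((200, 200, 200), (0, 255, 0), 1.0, head_on))
print(shade((200, 200, 200), (0, 255, 0), 1.0, tilted))
```

In a real renderer this would run per pixel in a fragment shader, with the mask sampled from the black-and-white watermark texture described earlier.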
We first have to detect it somehow: we have to create a dummy card, which we detect in the stream, actually detect the card, remove the skin, if we move a finger in front of our fake card, and then we have to put everything together. This is the easy part, basically.

So there's something called ArUco markers. They're like QR codes, which you can arrange on a piece of paper, on a plane, and you can detect them individually. If you know how the board is laid out, which marker is where, and you detect a few markers, you can calculate how the board is positioned in the image. And you can use this for getting the orientation. If you take the image on the right, we have the pose, and if you want to render something in 3D over it and you would just do it, it would look like the image on the left, where the hand is missing. This is bad, because you would fail the video identification. But if you detect the skin somehow in front of the board, you would not render over it, and this would look pretty realistic, and this is what we've done.

Now I'll show you my fake ID cards, which I've made. Can we get the other camera, please? This is my first prototype, made out of cardboard, very simple. It has markers, it's like 1, 2, 3, 4, it's like numbers, those markers are different. And if I change the side, I have markers as well, so I can show the back side of my ID card later. And then I ordered a printed plastic card online, 10 bucks, and this is what I've worked with since. It's exactly the same size as the real ID card. Okay, now we can go back.

Now we have our marker boards, and we can start really faking our video feed. On the left you can see the image we'll work with, on the right you can see what the detection algorithm put out. So we have the pose, and because I've made the card, I know how large the card is, and we can work with this to detect the skin in front of the card. This is the next important step, because we don't want to render over the skin.
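To make the "we know the layout, so we can calculate where the board is" step a little more concrete, here is a small Python sketch that fits a planar homography from four known card points to their detected image positions. This is a simplified stand-in, not the speaker's code: a real pipeline would take the corner positions from an ArUco detector and usually estimate a full 3D pose with camera calibration. The card size is the standard ID-1 format, while the image coordinates and all function names are made up for illustration.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(card_pts, image_pts):
    """3x3 homography mapping four card-plane points to four image points."""
    a, b = [], []
    for (x, y), (u, v) in zip(card_pts, image_pts):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(h, x, y):
    """Map a point on the card plane (in mm) to image pixel coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# ID-1 card size in millimetres (assumed to match the printed dummy card).
CARD_W, CARD_H = 85.6, 53.98
card_corners = [(0, 0), (CARD_W, 0), (CARD_W, CARD_H), (0, CARD_H)]
# Hypothetical pixel positions, as a marker detector might report them.
detected = [(100, 120), (420, 90), (450, 300), (80, 280)]

H = homography(card_corners, detected)
print(project(H, CARD_W / 2, CARD_H / 2))  # where the card centre lands in the image
```

With the mapping known, any texture coordinate on the card can be placed in the frame, which is what makes rendering over the dummy card possible. It also gives the card outline needed for the boundary mask used in the skin detection step.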
So with this information, on the top you can see the card boundary mask, which basically tells me, okay, in this area of my source image there is the card, and only inside this area do I try to detect skin. And I do this basically by looking at the colors, the individual pixel colors, and checking if the color is within a specified range of a specific color space which is good for detecting skin. After that I blur my detection image and detect the contour of my finger, because if I didn't do this, sometimes the finger would have holes in the skin mask, and my finger does not have holes, so this would not match reality.

Okay, now we are basically coming to the end, and this is the image composition pipeline. On the top left we start with the source image. Then we go down to the marker detection, so we know how the card is positioned in the image, and then we do two things with the known information of where the card is. First we render it, as I have shown you in part one, and then we create a card boundary mask to do skin detection on the source image, only within the card boundary mask. And then we put everything together. We take the source image and say, okay, write the rendered card over it, but only where there's no skin, and this is basically it. This is the process.

And the final result, what you can see here, is that there's still some work to do. The card has a black border around it, but this is only a Photoshop issue. And you can see at my finger it's a bit fuzzy. This is also an issue, and in the video stream this would be fuzzy all the time, it would change its fuzziness and stuff like this. So there's still some work to do. And the card looks too good; you would need to blur it or something, and you would need to add some more lighting effects. Also the generated marker board is not perfect according to the paper which I've got it from.
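The colour-range skin test and the final composition rule described above ("write the rendered card over the source, but only where there's no skin") could look roughly like this in Python. The talk does not name the colour space or the thresholds, so the YCbCr conversion and the Cb/Cr ranges below are assumptions picked because they are commonly used for this kind of test. The blur and contour-filling step is left out to keep the sketch short. Images are plain nested lists of RGB tuples.

```python
def is_skin(rgb):
    """Very rough skin test in YCbCr; the ranges are illustrative assumptions."""
    r, g, b = rgb
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return 77 <= cb <= 127 and 133 <= cr <= 173


def compose(source, rendered, card_mask):
    """Overlay the rendered card on the source frame.

    A pixel is replaced only if it lies inside the card boundary mask
    and the source pixel there does not look like skin.
    """
    out = []
    for src_row, ren_row, mask_row in zip(source, rendered, card_mask):
        out.append([
            ren if inside and not is_skin(src) else src
            for src, ren, inside in zip(src_row, ren_row, mask_row)
        ])
    return out


# One-row toy frame: bright card pixel, skin-coloured pixel, dark background pixel.
source = [[(240, 240, 240), (224, 172, 105), (10, 10, 10)]]
rendered = [[(0, 128, 0), (0, 128, 0), (0, 128, 0)]]
card_mask = [[True, True, False]]
print(compose(source, rendered, card_mask))
```

Restricting the skin test to the card boundary mask, as the talk describes, keeps the face and the rest of the frame untouched. Fixed colour thresholds like these are also exactly the part that breaks under changing lighting.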
I need to order the markers differently. They are 1, 2, 3, but you want to order them so that the neighbors are different. Also, the skin detection does not work reliably in varying lighting conditions. Here at the Congress I have not found good lighting conditions for creating a sample video. You need indirect light, so this light here would be very bad, so a demo would not be possible. And I would need to implement more watermarks.

So in conclusion: video identification, the process of verifying a card with a webcam in between, is not really the way it was meant to be. Cards are not meant to be verified with a webcam in between. And with enough preparation, video identification can surely be bypassed, because for this I only took two months of work in my spare time, and I'm really just a normal student with no special knowledge in any of it. And I've noticed the general trend is that you can't really trust any video at all. And that's it for my talk. Thank you.

Thanks for the great talk. We have a lot of time for questions, so please line up at the microphones and ask away. Also, if you're on the stream, you know the drill: in the IRC or on Twitter, you should be able to ask questions. Mic number two, please. That's you, yes.

Hello, my name is Pavel. I really like your presentation. I have a question. Don't you plan to release, I don't know, some kind of generator for these videos, so everybody can use it?

I think this would be illegal.

For academic purposes, of course.

Well, you have the blueprint. I think it would not be hard to build it yourself. It would take a week, maybe. So just take it away. I'm not stopping you.

But it's also not your fault. Mic number four, please.

Did you ever use any community sourcing for the watermark sources, or did you do a lot of independent research?

I didn't understand the question.

When you were identifying your watermarks, you had like one, two, three, all the, you know, numbered.
Did you go through, did you use the community to help you identify that, or did you just do completely independent research?

I used the flyer from the Bundesdruckerei, and they did that. They numbered the watermarks. So I did not number them.

Thank you.

Mic number one, please.

Wouldn't the skin detection be easier if the example card were in green and blue, so that there's a clear color difference between the skin and the card?

Yes, that's exactly the case. This is future work, like testing different colors. But when I ordered this sample card, I ordered exactly one, and the printer, which is an online printing service, blocked ordering just one card after I ordered it. Maybe they made a loss or something. So I had no chance to order more cards cheaply. It would have cost me 100 euros instead of 10 euros. So that's future work.

Thank you.

Mic number two, please.

Thanks for your interesting talk. In the last slide, you mentioned that video can't be trusted, and I'm thinking about neural networks and deepfakes or deep learning. Do you think this process, or this webcam video, could be faked with deepfakes, so that you don't have to print a real card and render that stuff, but simply show a video and replace the card in the video?

I think a card is too structured to be faked with a neural network. Maybe you can augment this workflow. For example, you could use neural networks to detect the skin in front of the card. Maybe this is more reliable than my approach of just looking at the pixel colors.

Number four, please.

Hi, thanks for your interesting talk. Two small questions. Have you tested it yet in the video stream? What happens if you turn the card around? Because I would suspect that you see a small image with your sample numbers on the card in that moment. And the second question: have you asked any government officials or the Bundesdruckerei yet what they think about the security of video ident?
Okay, for the first question: the card has different markers on the front and the back, so yeah, I can turn it around and see the backside. But in the process of turning it around, of course, when the card is at a very steep angle to the camera, I cannot detect the markers. But we have a really bad internet connection, and we just drop the frames. They won't notice. And I have not talked to any officials about this yet.

Number three, please.

Thank you for your talk. How do you make the lighting change when the card moves? You seem to have used a static image that you just inserted into the detected card, so how do you still make the lighting change and render it?

The card is rendered using OpenGL, so I can use all OpenGL features for this. And in my shader, I can simply... Well, the shader computes the lighting independently for each frame, so I can move the card around and the lighting adjusts accordingly, but I couldn't show this in a still image.

Number one, please.

You've made a pretty impressive effort with trying to delete the letters and recognizing the pattern behind them, I think that's great. Have you tried the existing tools? Even Adobe Acrobat does a very excellent job sometimes. I'm surprised by the OCR and the text replacement, so...

No, I just did the manual approach, because I saw, okay, it works well and it works reliably, and I only need to do it one time, so that was okay for me.

Acrobat does an excellent job with the letter replacement in OCR. I want you to take a look at the full version of Acrobat, yeah.

Okay, thank you.

Number four, please.

Hey, so when you actually do video verification, if you combine that with also checking with a third party, are you able to also generate valid identifiers and have them... Are you able to swap the photo out and then still fool people?

Well, that depends on the third party they use for verification. Of course, we don't know what they use, so we could not fake this then.
I also only used a random identifier for the ID card because I can make them up reliably and correctly.

Mic number one, please.

Have you done any kind of research on how you can replace your live face too in the sample? So you can, for instance, open a bank account as your mother or something like that?

I think you could basically do the deepfakes approach of just replacing the face, and then use the face for the generated textures as well. Or I thought about maybe using something like the Unreal Engine to replace the background and the person as well. It's reliable enough, and you can blur the image and stuff like this, so in future work you could try this. You can make the image noisy, blurry, low resolution, and then you can also use a computer-generated person.

Do we have any more questions? And also, does the internet have any questions?

The internet has been uncharacteristically quiet.

No? OK. Number one, please.

Yeah, maybe from your experience, what would now be your recommendation regarding this video identification approach? So, OK, don't use it at all maybe, of course, but as you seem to know how to fake it, what would be really hard for you?

Well, this is the work of two months, so if you take a team of people who know what they're doing and let them work for a year, they can make it perfect. And there's basically nothing on a webcam that would be really hard to fake. My recommendation would be to use anything but video identification.

OK, please thank our speaker.