Does that seem like a challenging subject to you? Digest::SHA is broken. It is going to affect a lot of people. There is a problem with Digest::SHA, and I want to show it to you, because it will hurt you if you don't know about it. Is there a solution? There is also a solution. It involves thinking, and that's always hard, but it is really necessary. Is that one included? No, no. I had asked for that.

So let's start with a little bit of history. Maybe you recognize this. In the very early days, computers were very slow, and communication went over telephone lines and copper wires. You needed a lot of protection against bit flips, bits flipping during communication. One of the easiest ways to protect yourself, or at least to detect errors, is to add parity. Here you see seven-bit character communication. These are the data bits, 0, 0, 1, 1, and then you get the parity bit. Parity is a little trick. With even parity, the bits you want to communicate, including the parity bit, add up to an even number. So this is 0, and the parity is 0: an even number. And here you have 1, 2, 3, 4, 5, 6, so it's even again, and it's correct. Either you use even parity to see whether a flip has happened, or you use odd parity, where you add up all the bits and the sum should be an odd number. So the parity bit is used to detect whether a bit flip has happened. You cannot correct a bit flip this way. It's just a transmission error, and you have to resend or something like that. This is fairly low-level communication.

Then you've got CRC codes, which are a lot smarter. CRC codes can not only check whether bits have flipped but also correct them, although correcting takes some time. Here you have a sequence you want to transmit, and then you get some number. These are very smartly chosen numbers, usually 32 bits. And you do a division. I don't know what it is called; a long division.
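The parity check described above can be sketched in a few lines. This is only an illustration in Python (the talk itself is about Perl); the function names and the seven data bits are made up for the sketch and do not come from the slides.

```python
def parity_bit(data_bits, even=True):
    """Return the bit that makes the total number of 1s even (or odd)."""
    bit = sum(data_bits) % 2       # 1 if the data has an odd number of 1s
    return bit if even else 1 - bit


def parity_ok(word, even=True):
    """Check a received word (data bits followed by the parity bit)."""
    return sum(word) % 2 == (0 if even else 1)


data = [0, 0, 1, 1, 0, 1, 0]        # seven data bits (illustrative values)
word = data + [parity_bit(data)]    # eight bits on the wire

one_flip = list(word)
one_flip[2] ^= 1                    # a single flipped bit is detected

two_flips = list(one_flip)
two_flips[5] ^= 1                   # a second flip hides the first one

print(parity_ok(word), parity_ok(one_flip), parity_ok(two_flips))
```

The last case shows the limit of the scheme: parity notices an odd number of flipped bits, says nothing about which bit flipped, and misses an even number of flips.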
Long division, to get a number, just like you did in primary school. Dividing in binary is, by the way, very simple. A long division is much easier in binary than in decimal, because the divisor either fits or it doesn't. So it's really easy to implement as well, if you want to implement a division yourself. For a square root it's even simpler, because then you just take two bits at a time. So what you get is this: here is the real message to be transported, and here you get the remainder, which you transport as well. The receiver gets these bits too and checks them. If there is a mistake, if bits have flipped, then you can fix them, although that can be compute-intensive if a lot of bits have failed. But if only one or two bits have failed, you can just try out a few bit differences. So it's all about the safety of transport and storage. You see CRC codes used on disk blocks and in all kinds of other places, just to have a check. So you can fix errors, and the other thing is that you can do it both ways.

Crypto checksums are, in my opinion, part of this same process. Same idea, and you see them in your applications as well. The main difference with CRC codes is that this is a one-way approach. You take a data stream and compute the crypto checksum on it, and you send it along. Then you can find out whether there were transmission problems, whether bits were left out. You can prove that the data has not been changed, because it's very difficult to send another text stream with the same signature. MD5, for instance, is known to be broken. Well, it is still difficult, but you can produce a second text message with different contents which has the same MD5 checksum. For SHA-1 that is still very hard, but if you have enough computing power then you can maybe, with some effort, produce a second text. It's relatively safe.
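The binary long division behind a CRC, as described earlier in this part, can be sketched as follows. This is a toy in Python, assuming a 4-bit generator instead of the usual 32-bit one; the helper name `gf2_remainder` and the sample message are illustrative, not taken from the talk. At each position the divisor either "fits" (the leading bit is 1) or it doesn't, and subtracting is just XOR.

```python
def gf2_remainder(bits, poly):
    """Remainder of binary long division where 'subtracting' is XOR."""
    work = [int(b) for b in bits]
    divisor = [int(b) for b in poly]
    n = len(divisor) - 1                    # number of remainder bits
    for i in range(len(work) - n):
        if work[i]:                         # the divisor "fits" here
            for j, d in enumerate(divisor):
                work[i + j] ^= d
    return "".join(str(b) for b in work[-n:])


POLY = "1011"                               # toy generator, not a real CRC-32
message = "11010011101100"

# Sender: append zero bits, divide, and transmit message plus remainder.
crc = gf2_remainder(message + "0" * (len(POLY) - 1), POLY)
codeword = message + crc

# Receiver: dividing the whole codeword must leave a zero remainder.
flipped = codeword[:4] + ("1" if codeword[4] == "0" else "0") + codeword[5:]

print(crc, gf2_remainder(codeword, POLY), gf2_remainder(flipped, POLY))
```

The sketch only detects the flipped bit. Repairing it would mean searching for a bit pattern that brings the remainder back to zero, which is the expensive part mentioned in the talk.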
So this is the logical step from the previous two slides: you have these calculations and the checksum, but there is no correction, because it's a one-way function. You would have to try it out, which is quite expensive: all possible bits, and that's not really doable. A very important use is content identity. For instance, if you have Git and you have a lot of changes in your Git repository, then by calculating a cryptographic checksum as identifier you get a unique identification for many, many lines of changes. The idea is still that it's quite hard to make two texts resulting in exactly the same checksum. So this is a very important use.

Hashing is especially focused on this one-way function. For instance, a password is encrypted with MD5 or SHA-1 or something else, and that is also a kind of identity, but usually these strings are quite short when you store them. You don't want to store passwords in clear text, so you calculate the MD5. If someone else comes with the same password, it results in the same MD5, but it's very hard to find a different password which also produces the same MD5. So, for instance, these are hashed passwords in the shadow file. This is an indication of which algorithm is used, and this is the MD5 or SHA-1, and the password is nowhere in the text.

Could you close the door in the back? Sorry? Could you close the door in the back? Okay.

So this is where I use it, for instance, among other places: I use it on XML. And crypto signatures in XML, it's horrible. [unclear]
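The one-way property used for passwords and content identity above can be illustrated with Python's standard `hashlib`. This is a minimal sketch: the helper `sha1_hex` and the sample strings are made up here, and a bare unsalted SHA-1 is shown only to demonstrate "same input, same digest", not as a way to store real passwords.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


stored = sha1_hex(b"correct horse")           # kept instead of the password

print(sha1_hex(b"correct horse") == stored)   # same bytes give the same digest
print(sha1_hex(b"correct horsf") == stored)   # one changed character does not
print(sha1_hex(b"abc"))                       # 40 hex digits, input not visible
```

Note that the function takes bytes, not text: the digest is defined over a sequence of bits, which is the point the rest of the talk turns on.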
What you get is a SOAP message, an XML message to speak between applications. The message has a header and a body, just like in HTTP. The body is the real information, and the header carries some additional information, in this case a crypto signature about the body, so that no one can tamper with the body. So here the body has an ID, in the header I have a signature, and the signature references the body. Signing this body takes two steps. First you have a digest: it calculates the SHA-1 over the whole XML body. Second, this header itself is signed with a certificate. So it's a two-step process, starting with a SHA-1 over the body.

Well, there's one very horrible extra complication in XML. You can, for instance, change the order of the attributes in the XML, or you can change the namespace declarations, just use a different prefix for a certain namespace. Here I use the prefix wsu, but I could just change the name. I left the prefix declarations out of this example. So it's a three-step process. The first step is canonicalization. C14N is the usual abbreviation, because it's such a difficult name to pronounce: C14N means C, then 14 characters, then N. Canonicalization tries to bring the XML message into a standard form. [unclear] Canonicalization will sort all the attributes alphabetically, fill in defaults, and all those kinds of things. The weird thing is that signatures of XML messages are on the level of the information that is in there, and not on the bytes. Canonicalization is a horrible process, and it's very difficult to write applications that get canonicalization working for all the libraries, so you must be very careful about what you're doing.

So you get the canonicalization, then you get the digest of the canonicalized, C14N version of the XML, and then you get the crypto signature. And if any bit in any of
these three steps is incorrect, then it doesn't work. One bit is very important. It's the same as for CRC and for parity: if one bit flips, it's broken, it doesn't work anymore.

So this worked, and the crypto signatures worked, until one of our customers complained that it had stopped working. It didn't work on his system. He spent three days on it, and I spent one day trying to figure out what the difference between our two systems was. What's the bug? One bit had flipped, and apparently it was a new version of Digest::SHA.

Well, this is Digest::SHA. If you look at the manual page, it has two uses: either you use the functional interface or you use the object-oriented interface. It speaks about data, which is a sequence of bits. Here I can add data, or add data from a file. This one is for when you have a string containing 1s and 0s: add_bits. And then you get the base64 digest, which is used in the XML messages. So what I have to do in the XML is: I have this element, the body of the message, I canonicalize it, I call Digest::SHA on it, and then I take the base64 format. And that worked, until Digest::SHA 5.8. You will see it soon, because if you now install a new version of Linux or so, you will get a new version of Digest::SHA, and it will break.

I forgot this. In my Perl programs I like UTF-8. I do everything in UTF-8, and I forgot that SHA is of course defined on bits and not on UTF-8. SHA is calculated over bits. If you look in the new Digest::SHA manual, you see a new chapter on Unicode and side effects. Let's enlarge it a bit: "Be aware that the digest routines silently convert UTF-8 input into its equivalent byte sequence in the native encoding." It's what it calls a downgrade. That means that it translates the UTF-8 into Latin-1.

[Audience] Is this UTF-8 encoded, or is this the utf8 encoding that it talks about? Because both are called UTF-8 in our community, which is highly confusing, but it would mean something entirely different. Because if
the UTF-8 flag is on, then it does exactly what everything in the Perl internals does.

[Speaker] Yes, but in that case this documentation is broken, or incredibly confusing. "This side effect influences only the way Perl stores the data internally." I thought that SHA was used to communicate passwords and my XML messages, so if one of the two parties just changes the bits, then it doesn't work anymore.

Here I made some examples, with the checksums for Digest::SHA before 5.82. You see that I can make two versions which produce the same checksum and two versions which are different, and after that version there are also two which are the same. So the newer version does a downgrade: it downgrades UTF-8 to Latin-1, and when a UTF-8 character is too large, when it doesn't fit in this byte sequence, then it just cannot downgrade it. So you can imagine what I did here. The first one is pure Latin-1, just straight, without any UTF-8 character. The second one is the very simple case of Latin-1 as UTF-8. This is where I have cleanly encoded from UTF-8 into bytes. And this one, for instance, is an A with a composed accent on it. In some environments, for instance, I have an application where people upload files to a web server. On Windows and Unix, if you type an accented A, then you get one character, but if you use a Mac you get two characters: you get the A and the combining accent. You see the difference in size.

What is even worse: here I have this string, then I call my routine. Before the routine my string has a different length, and after it the length is 5.

[Audience] Yes, because you're using bytes. You requested to peek into Perl's internals; that's what the bytes module does.

[Speaker] No, that's not the bytes module doing it. This is a downgrade.

[Audience] Yes. If you had used utf8 instead, and your source code were UTF-8 encoded, then before and after it would be 5.

[Speaker] No, no, this is a downgrade in the routine, which downgrades the string, the original string, and keeps it
that way. There is a bug report about that. It really mutilates my constant string in my main program, and if it contains wide characters, it croaks.

[Audience] So you won't have this problem if you encode [unclear] before the SHA or base64 call, because of the copy?

[Speaker] Yes.

[Audience] Maybe go back to the previous slide. In this case you should put the UTF-8 encode step in between. You're skipping that step.

[Speaker] Yeah, that's my problem. [unclear] Well, the problem is that calculating SHA sums has nothing to do with characters; it has to do with bytes. You should be very, very careful with the bits. And there is a reason why they do the downgrade, because in Perl you have these byte strings, and accidentally they may get upgraded to UTF-8. For example, you apply a regular expression with a UTF-8 string, and then both will be UTF-8. So accidentally you may upgrade the Latin-1 string into a UTF-8 string. It's very rare to have those accidents. But I don't think this is the solution. Unicode and character strings have nothing to do with SHA sum calculation. You can better croak when you get a UTF-8 string while calculating the SHA sum, so that you can trace back where this accidental upgrade happened, than just silently do something and calculate something and then find that no one understands where your SHA sums come from.

So actually, for some things it's still useful in Perl. If all your applications are Perl, then it's maybe still okay. But if anyone starts writing .NET applications to check your passwords, then it doesn't work anymore, and you cannot compute back, because it's a one-way function. You can check for transmission problems as long as it's between Perl programs. So as long as you stay in Perl, it doesn't hurt. But SHA is a bitwise operator. And it's not only Digest::SHA; MIME::Base64 has been broken this way as well. It's the same approach, but I think it's the wrong way. Downgrade is just Perl. I have suggested
this: because the manual still says that it expects data here, but it's a string, stop doing this one, and have an add_string for people who still think it's sane to treat this as strings and so on, but then don't forget normalization in this process, which is totally ignored now. And have a separate add_bytes, which is compatible with the old versions of Digest::SHA and which just croaks on these. So that's what I suggested. Even this wish of mine was just rejected. So that's what I want to tell you: if you are using Digest::SHA, please recheck whether you nicely encode or use byte strings.

[Audience] If you do that, do you still run into any of these problems? Because then you don't ignore that there are characters [unclear]. Downgrading, I'm not sure if it is the best solution, but it is what everything in the internals does. For example, if you use print on a file handle that doesn't have an encoding layer, it will downgrade, and it will warn if you have a wide character. It might not be the right solution, but it is at least consistent with the rest.

[Speaker] But SHA is a bit operator, and you're not doing this on bits.

[Audience] Otherwise you'd have to copy the string to even figure out whether it is properly encoded or not.

[Speaker] It should be described as a bit operator, nothing to do with strings. No downgrade, no upgrade. [unclear]

[Audience] The issue is that Perl has strings that can contain more than just bytes, and that's what you're feeding it, and that's why it breaks. But it can't check for this without very expensive operations, and typically SHA is also used on large messages, like 4 gigabytes in size. And if you had to check everything without downgrading [unclear]. It's not easy in Perl currently to check whether a utf8 string contains only code points below 256.

[Speaker] It shouldn't take utf8. It should only accept bytes.

[Audience] That's the issue: whether or not the utf8 flag is internally on does not mean that
you don't have a byte string. It does not indicate the type of the string. Perl doesn't have this concept.

[Speaker] I know we don't have it, so this should be blocked.

[Audience] Croaking if the flag is on would be a different mistake.

[Speaker] But silently changing shit is also wrong.

[Audience] But this is what the Perl string model is.

[Speaker] I know how the string model works, but this is not a string. You should block people from using it and treating this as strings.

[Audience] And then the solution would result in people presenting talks that it is broken. It would still be broken, but in a different way.

[Speaker] It's bad that people get burned when they do something wrong.
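The core of the disagreement above is that one piece of text has several byte representations, and each gives a different digest. The sketch below shows this in Python rather than Perl, so it does not reproduce Digest::SHA's downgrade behaviour. The helper `sha1_of_bytes` is hypothetical and only mirrors the "accept bytes only, refuse strings" behaviour the speaker argues for.

```python
import hashlib
import unicodedata


def sha1_of_bytes(data):
    """Hypothetical strict helper: hash bytes only, refuse text."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("expected bytes; encode the text explicitly first")
    return hashlib.sha1(data).hexdigest()


text = "\u00e4"                                   # a-umlaut as one code point
decomposed = unicodedata.normalize("NFD", text)   # 'a' plus a combining mark

variants = {
    "latin-1": text.encode("latin-1"),            # 1 byte
    "utf-8 (NFC)": text.encode("utf-8"),          # 2 bytes
    "utf-8 (NFD)": decomposed.encode("utf-8"),    # 3 bytes
}
digests = {name: sha1_of_bytes(raw) for name, raw in variants.items()}

for name, raw in variants.items():
    print(name, len(raw), digests[name])
```

All three digests differ, although a reader would call all three inputs the same character. Two parties therefore have to agree on both the encoding and the normalization form before comparing checksums, whichever way a library handles character strings.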