Hi, I'm David. I started using Scratch when I was 11 years old, and I wasn't very good at it back then. I was doing more of the cat spinning on the spot that was mentioned earlier. But now I'm back 13 years later with a bit more programming experience under my belt, kind of returning to my origins to see what I can actually do with Scratch now.
By day I'm a security researcher specializing in software reverse engineering. I'm particularly interested in file formats, protocols, serialization and parsing, among lots of other things. And I also enjoy breaking weak cryptography. But I'm not a cryptographer or a mathematician, so my mathematical knowledge is actually pretty weak. So this might be interesting for other people in the same boat, to see how I get by without actually understanding some of the maths.
The first thing that people ask when I tell them that I've implemented some cryptography in Scratch is: why? Why on earth have you done this? Why would you do that? And the first answer, and the shortest answer, is because I can. And that's the only answer really, but I felt like I had to make up some other answers after the fact, because otherwise I'd seem a bit insane.
So one answer is that reverse engineering is all about spotting patterns in code to figure out what it's doing. The only way you can spot a pattern is if you've seen it before, and the best way to have seen it before is to implement the algorithm yourself. So implementing algorithms from scratch improves your reverse engineering skills. And implementing things in Scratch is just an added level of difficulty really. It means you can't cut any corners. You really do have to implement everything from scratch, with no cheating using libraries and things like that. The next reason is that I want to demonstrate that cryptography isn't this magic black box that does stuff. You can actually construct it yourself with primitive components like Scratch.
And finally, I'd like to show that Scratch is a real programming language, because a lot of people like to say it's just a toy, it's not a real programming language. But there's really no reason that it wouldn't be a real programming language. You can do anything in Scratch that you can do with a computer. It might just be a bit more tedious.
So my goals for this project. The first is to have reasonably fast cryptography. Scratch itself is quite a slow language. It's interpreted, and the interpreter runs in JavaScript, so there are a lot of layers of slowness going on. But I don't want to be waiting all day for a small message to encrypt, so I want it to be reasonably fast. And of course, I want it to be reasonably secure, but I probably wouldn't rely on it anyway. It is more of a prototype than something you would actually want to use. And finally, I really want it to be correct. I want my implementation to match the specifications to the letter, so that if I encrypt something in Scratch, I could decrypt it in Python, so that it's compatible with other implementations.
And what do I mean by modern cryptography? Because that's a bit vague, deliberately so. For my project, I want to have standardized ciphers that are actually used in the real world, because I think that makes it more interesting. And also, I don't want to use any known vulnerable ciphers. Something like RC4, which you might have heard of, is pretty old at this point. There are multiple published vulnerabilities in it. So I wouldn't want to use a cipher like that, even though it'd be very easy to implement in Scratch.
So the algorithms that I chose to implement: the first one is ChaCha20-Poly1305, which is an authenticated symmetric encryption algorithm. I'll explain what that means a bit later. But I chose it because it's secure and fast, it's standardized, and it's used in TLS 1.3.
Also, the X25519 key exchange algorithm: also secure and fast, also standardized, and also used in TLS 1.3. That's interesting because TLS is what underpins HTTPS, used by web browsers. So every time you load a web page over HTTPS, there's a chance that your browser is using these same ciphers to encrypt the web page.
And finally, as a hash algorithm, I implemented BLAKE2s. Also fast, also standardized, and it's used in WireGuard, which is a very modern VPN protocol. One nice thing about it is that it only uses 32-bit integers. A lot of other hash functions like to use 64-bit integers, which, as I'll explain later, are annoying to do in Scratch. Another reason I picked it is that it shares a lot of code with ChaCha20, so I can copy and paste a lot of my implementation.
At some point, I would like to implement a signature algorithm. I haven't done that yet, but when I do, it will probably be Ed25519. I don't know how it works yet, so I can't implement it, but one day I will.
So I'll give just a quick overview of ChaCha20. It's a symmetric encryption algorithm, which means that you can encrypt a message using a key and then decrypt the message at some later point using the same key. That might be after you've sent it to someone else, or it might be saving data to your hard drive, something like that. It was published in 2008 by Daniel Bernstein. And that's interesting because Scratch was published in 2007, so this is actually more modern than Scratch itself, even though it might seem a bit old in the grand scheme of things. It's an example of what's known as an ARX construction, which means it only uses three core operations: addition, rotation, and XOR. I'll explain what those mean a bit later, but it's ultimately quite simple. Only three things I need to worry about. And, as I mentioned earlier, it uses 32-bit numbers, which work nicely in Scratch.
And it's a stream cipher, which means it works by producing a pseudo-random sequence of bytes based on the key, which it then combines, using XOR, with the bytes of the message that you want to encrypt.
I did say this is from scratch, so I'm going to start off with a very basic question: how do computers represent numbers? As humans, we're used to using the digits 0 through 9. And then when we want to write numbers higher than 9, we use more digits, so ten is written 1, 0. Whereas binary only uses the digits 0 and 1. We call a binary digit a bit, and there are eight bits in a byte. And commonly you'll see other lengths of numbers: 32-bit, for example, or 64-bit, or even more are frequently used in computing.
Before I give an example of binary, I will remind you all how decimal works, because you kind of take it for granted. You've got the number 137 on the left there. The first column is worth 100, the second column is worth 30, and the last column is worth seven. And if you add those three components together, you end up at the number 137. Binary is exactly the same concept, but we've just got zeros and ones. In this example, the number 137 is 1, 0, 0, 0, 1, 0, 0, 1. The first one is worth 128, the next one is worth eight, and the last one is worth one. And if you add them together, you will also get 137. So hopefully you can see how that translates across conceptually.
I said that ChaCha20 was an ARX construction, so we've got addition, rotation, and XOR. The first of those three is addition, and that's exactly what you'd expect it to be, except there's one catch. Because we're dealing with 32-bit integers, you can only represent numbers less than two to the power of 32, otherwise you run out of bits.
So what happens is, when a computer adds two 32-bit numbers together, if the result is too big, it just chops off the higher bits, which results in it wrapping around to a smaller number again. For example, if I had the number that's one less than two to the power of 32, and then I added one, I would end up back at zero. Or if I added two, I would end up at one. That's called wrap-around.
The next of the three in ARX is rotation. For rotation, you look at the bits of the number. In this example, I've got the number 23 in bits. And you shift all the bits one way or another. Here I'm shifting to the left by one, so all the digits move to the left by one, and then the digit that would fall off the end comes back round to the start. If I was shifting by two or three, then there'd be two or three digits wrapping around back to the start. So in this example, I had 23, I rotated it left by one, and the resulting bits are equal to 46 in decimal.
And finally, we've got XOR. Fundamentally, XOR is a function that takes a pair of bits as input and outputs one bit. If both of the input bits are the same, then the output is zero. And if they're not the same, then the output is one. Bitwise XOR, which is what I mean when I say XOR in general, is where you have two numbers, you take their bits, and you apply that XOR function to each pair of corresponding bits. In this example, I've got seven being XORed with four. You can see in the first column I've got zero XOR zero becoming zero, because they're both the same. And in the last column, you've got one being XORed with zero becoming one, because they're different. That results in 0011 at the end, which is equal to three. So seven XOR four in this example is three. There are only four bits in this example, but the exact same thing is done with 32 bits as part of ChaCha20.
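The three operations above, plus the binary place values for 137, can be sketched in a few lines of Python. This is only an illustration: the helper names `add32` and `rotl32` are made up for this sketch and are not from the talk's slides.

```python
MASK32 = 0xFFFFFFFF  # 2**32 - 1, the largest value that fits in 32 bits


def add32(a, b):
    """Add two 32-bit numbers, chopping off any bits above the 32nd."""
    return (a + b) & MASK32


def rotl32(x, n):
    """Rotate a 32-bit number left by n bits (0 <= n < 32)."""
    return ((x << n) & MASK32) | (x >> (32 - n))


print(format(137, "08b"))     # 10001001, i.e. 128 + 8 + 1
print(add32(2**32 - 1, 1))    # 0 (wrap-around)
print(add32(2**32 - 1, 2))    # 1
print(rotl32(23, 1))          # 46
print(rotl32(0x80000000, 1))  # 1 (the top bit comes back round to the start)
print(7 ^ 4)                  # 3 (Python's built-in bitwise XOR)
```

Python integers never overflow on their own, which is why the mask is needed to imitate 32-bit hardware.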
So those three core operations, adding, rotating, and XOR, are combined together to make the whole cipher. ChaCha20 has 16 32-bit integers called the state, and they're arranged in this four by four grid. Now, the key is only 256 bits, or 32 bytes, but there are 512 bits worth of state. So the key only occupies half of it, and it occupies the middle two rows, split up into 32-bit chunks for each number. At the top row, we've got this phrase. It says "expand 32-byte k", split across four words. That is an example of what cryptographers call a nothing-up-my-sleeve number. It's basically just a completely arbitrary number, and that's their way of proving that they're not doing anything sneaky with it. They're not trying to introduce any vulnerabilities into the cipher somehow. And on the lower row, you've got the counter and the nonce. The nonce is quite important. It has to be unique for every message that you encrypt. If it's not, then the cipher becomes vulnerable and you could potentially crack some or all of the messages. And there's also the counter, which I'll explain in just a bit.
Once you've got the state set up, there are 20 rounds, hence the name ChaCha20. In each round, something called the quarter round function is applied to the columns of the state. Or rather, on every other round it's applied to the columns, and on the other rounds it's applied to the diagonals. That quarter round function looks like this. This is the same function represented in two different ways. On the left, we've got a kind of pseudocode representation. It's not quite valid C syntax, but it's close enough.
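The slide's pseudocode isn't reproduced in this transcript. As a stand-in, here is a rough Python sketch of the state layout and quarter round described above. It assumes the RFC 8439 variant of ChaCha20 (a 32-bit counter followed by a 96-bit nonce); the function names are invented for this sketch. It also includes the final step of adding the initial state back in, which the talk explains next.

```python
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(s, a, b, c, d):
    """Mix the four state words at indices a, b, c, d, in place."""
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 7)


def initial_state(key, counter, nonce):
    """Flatten the 4x4 grid: constants, 8 key words, counter, 3 nonce words."""
    constants = struct.unpack("<4I", b"expand 32-byte k")
    key_words = struct.unpack("<8I", key)      # key must be 32 bytes
    nonce_words = struct.unpack("<3I", nonce)  # nonce must be 12 bytes
    return list(constants + key_words + (counter,) + nonce_words)


def chacha20_block(key, counter, nonce):
    """Produce one 64-byte block of keystream."""
    init = initial_state(key, counter, nonce)
    s = list(init)
    for _ in range(10):  # 10 column rounds + 10 diagonal rounds = 20 rounds
        quarter_round(s, 0, 4, 8, 12)   # columns
        quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14)
        quarter_round(s, 3, 7, 11, 15)
        quarter_round(s, 0, 5, 10, 15)  # diagonals
        quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13)
        quarter_round(s, 3, 4, 9, 14)
    # Add the initial state back in so the rounds cannot simply be undone.
    out = [(x + y) & MASK32 for x, y in zip(s, init)]
    return struct.pack("<16I", *out)
```

Each call yields 64 bytes of keystream; incrementing the counter gives the next block for longer messages.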
And on the right, we've got this kind of, I actually don't know what the name of this diagram is, but it's a diagrammatic representation of the same operations, where the yellow box represents addition, the blue circle is XOR, and the green box is rotation. So you do 20 rounds of that. And then finally, you add the initial state into your current state. The reason that's done is to make sure that the cipher isn't invertible. If that wasn't done, you could undo the quarter round functions and derive the key. So that's quite an important step there at the end.
After all that's been done, you've thoroughly mixed up all the data that was in your initial state, and now you've got your pseudo-random values. This is what you XOR with the message that you're trying to encrypt. And to decrypt it again, you XOR it with the exact same data, and you'll end up back where you started. Something I didn't mention earlier about XOR is that if you XOR two things together and then XOR the result with one of your inputs, you'll always end up with the value of the other input. So XOR is reversible like that.
So enough cryptography theory. Now it's time to talk a bit about Scratch. I'm sure you've all heard of Scratch at least to some extent, but you might not realize quite how popular it is. There have been over 100 million projects shared on the Scratch website. By contrast, GitHub has about 150 million public repositories, which is of course more, but not very much more. I would never have guessed they were in the same order of magnitude. And Scratch is open source, which means I don't need to reverse engineer it, fortunately. The source code for the runtime is on GitHub, so you can look at how the internals work, which is very useful. And I'm just going to give a quick demo of the capabilities of Scratch. It won't take very long, hopefully. So I've got a little test project up here.
On the right here, you've got your sprites and you've got your stage. The stage is where the sprites are positioned. And on the left pane, you've got your scripts. You write code by dragging these blocks around, and you've got all the standard control flow that you'd expect from any other programming language. You've got loops, you've got if statements, you've got variables, you've got lists, and you've got basic mathematical functions: addition, subtraction, et cetera. And yeah, I'll make the cat spin around, since we were talking about that earlier. There you go. So that's an example of the kind of thing you can do in Scratch.
Although in theory you can do anything in Scratch, because it's Turing complete, it has quite a lot of limitations when it comes to trying to implement cryptography. Because it's running on JavaScript under the hood, all the numbers in Scratch are 64-bit double-precision floats. I'll explain what that means in detail a bit later, but it's not a 32-bit integer, so we can't just drop in the algorithm and have it work. It also lacks XOR and it lacks rotation, which might sound like a bit of a problem, but I'll go into how we address those shortcomings in just a bit. And of course, you've got to use your mouse. Making longer scripts gets very tiring. You've just got to do miles and miles of dragging to get your blocks into position, and it's very tedious in comparison if you're a fast touch typist. And also there's no version control. When I write code normally, I'm very used to committing it to Git. When I can't do that, I feel pretty lost, and feel like I'm about to lose all my changes at any point, which of course you can.
So before I get into how I overcame those difficulties, I will explain how floating point numbers work. As I said, Scratch's variables are stored as floating point numbers.
And if you're familiar with scientific notation in physics, it's a very similar concept. Rather than writing that the speed of light is 300 million meters per second, we write three times 10 to the power eight. Floating point numbers are basically exactly the same concept. You've got the exponent, which is the green section of that diagram, and the fraction, which is the red section of that diagram. So in the speed of light example, three would be the fraction and eight would be the exponent. And there's one extra bit that says whether it's a positive or negative number. So that can represent arbitrary numbers. You can have fractions. You can have really big numbers. But there's one catch. If your integer goes above two to the power 53, because there are 52 bits for the fraction, then you start losing precision. What that means is the accuracy of your maths will start to drift from the correct values, which is a big problem if you're trying to do something precise like cryptography. But fortunately, because 32-bit numbers only go up to two to the power 32, that's actually not much of a problem for us. As long as we make sure that we keep our numbers rounded to whole numbers, we won't run out of precision.
So the next challenge is bitwise XOR. Of course Scratch has no XOR operator, but as a bit of a cheat, we can make a list of every possible result. We can make a list of every pair of numbers and what the result is if you XOR them together. Every pair of 8-bit numbers, that is. The list for 8-bit numbers is around 65,000 entries long (65,536, to be exact). So of course I wrote a Python script to generate that list. And then you can implement that in Scratch, as shown there. Scratch lists are only one-dimensional, but we've got a two-dimensional table here, so there's a little bit of arithmetic to convert the indexes.
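Both points above can be checked with a short Python sketch. Python floats are the same 64-bit doubles that Scratch uses. The table layout here (row `a`, column `b`, flattened to index `a * 256 + b + 1`) is an assumption about how the slide's lookup is arranged, and `scratch_item` is a made-up helper that imitates Scratch's 1-based list indexing.

```python
# Doubles hold every whole number exactly only up to 2**53.
print(float(2**53) + 1 == float(2**53))       # True: the +1 is silently lost
print(float(2**32 - 1) * 2 + 1 == 2**33 - 1)  # True: 32-bit sums stay exact

# One entry for every pair of 8-bit values: 256 * 256 = 65,536 entries.
XOR_TABLE = [a ^ b for a in range(256) for b in range(256)]


def scratch_item(index, lst):
    """Imitate Scratch's 'item (index) of list' block, which counts from 1."""
    return lst[index - 1]


def xor8(a, b):
    """XOR two 8-bit numbers using only arithmetic and a list lookup."""
    return scratch_item(a * 256 + b + 1, XOR_TABLE)


print(len(XOR_TABLE))  # 65536
print(xor8(7, 4))      # 3
```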
And that just has a plus one on the end because Scratch's lists start counting at one instead of zero, which is a bit annoying if you're used to them starting at zero. Now that's for 8-bit numbers, but for ChaCha20, we need to XOR 32-bit numbers together. And this is where it gets a bit crazy. We need to split our 32-bit inputs into 8-bit chunks, then XOR those chunks together separately using the lookup table, and then recombine them into one 32-bit result.
The method for splitting the numbers up is called shifting and masking. Shifting is very much like the rotation that I mentioned earlier, except the bits that fall off the end don't come back to the start, they just disappear. Shifting left is actually equivalent to multiplying by a power of two. So five shifted left by two is the same as multiplying five by four. And conversely, five shifted right by two is the same as dividing five by four, but you need to round down once you're done, otherwise you would end up with a fractional number, and you want a whole number at the end. So if you round down five divided by four, it's exactly the same as doing a bit shift right by two. And that can be done in Scratch very easily.
So with the XOR lookup tables, the shifting and masking, oh, actually I haven't covered masking yet. Masking is quite a common operation in bitwise logic. If you want to extract some portion of a number, for example the last eight bits, you might AND it with 255, which essentially extracts the last eight bits of the number. Scratch doesn't have a bitwise AND operator, but in a pinch we can use the modulo operator, which fortunately it does have. So if you take a number modulo 256, you extract the last eight bits of the value.
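Putting those pieces together, here is a minimal Python sketch of a 32-bit XOR that restricts itself to operations Scratch has: multiplication, division, rounding down, modulo, and list lookup. The function names are invented for this sketch, and the Scratch project itself may arrange the steps differently.

```python
import math

# 8-bit XOR lookup table, as described earlier (0-based here, unlike Scratch).
XOR_TABLE = [a ^ b for a in range(256) for b in range(256)]


def shift_right(x, n):
    """Shift right by n bits: divide by 2**n and round down."""
    return math.floor(x / 2**n)


def shift_left(x, n):
    """Shift left by n bits: multiply by 2**n."""
    return x * 2**n


def low_byte(x):
    """Mask off the last eight bits using modulo instead of AND."""
    return x % 256


def xor32(a, b):
    """XOR two 32-bit numbers one byte at a time via the lookup table."""
    result = 0
    for i in range(4):
        a_byte = low_byte(shift_right(a, 8 * i))
        b_byte = low_byte(shift_right(b, 8 * i))
        result += shift_left(XOR_TABLE[a_byte * 256 + b_byte], 8 * i)
    return result


print(shift_left(5, 2), shift_right(5, 2))        # 20 1
print(hex(low_byte(shift_right(0x12345678, 8))))  # 0x56
print(xor32(0xDEADBEEF, 0x12345678) == 0xDEADBEEF ^ 0x12345678)  # True
```

The bytes never overlap after shifting back into place, so plain addition is enough to recombine them.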
And for example, if I wanted to get the second-to-last eight bits, I could shift it first and then do the modulo, and I would get the second lot of eight bits of the number. So combining all that together, XOR, masking and shifting, you have this horrible abomination of Scratch code. It doesn't even fit on one slide. It's going off to the right. I zoomed out all the way and it still wouldn't fit. So as you can see, this is pretty unwieldy. I didn't want to have to write a whole program with hundreds of blocks like this, so I decided that I needed to automate this process somehow.
I'm sure you've all seen the XKCD comic about writing tools. You'll always spend more time writing the tools than you spend actually using them, and you'll probably never finish them either, which is very true, but I still did it.
When you save Scratch scripts to a file, they are stored inside a file with a .sb3 extension, which is basically just a zip file in disguise. Inside that zip file is a JSON file, which contains a list of all the blocks in your scripts, with all their attributes: what their arguments are, what their position is, and what other blocks they're connected to. So rather than dragging and dropping blocks around with a mouse, I could, in theory, write some code to generate this JSON file with all the blocks already in it, and that's exactly what I did.
I made a Python library that I called Boiga for generating Scratch code. The reason I called it that is because there's apparently a species of snake called Boiga forsteni, which supposedly is nicknamed the cat snake. I'm not quite sure why, because I don't see the resemblance myself, but that's why I picked the name: the Scratch mascot is a cat and the Python logo is a snake, so you put the two together and you've got a Boiga. So this library that I made is a bit weird.
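To make the file format described above concrete, here is a minimal Python sketch that writes two connected blocks into a `project.json` inside an in-memory zip archive, then reads the opcodes back out. It is not Boiga's API, and the archive is deliberately incomplete: a project that Scratch will actually load also needs a stage, costumes, and metadata. The block IDs `b1` and `b2` and the helper `list_opcodes` are made up for illustration.

```python
import io
import json
import zipfile

# Two blocks: "when green flag clicked" followed by "say Hello, world!".
blocks = {
    "b1": {"opcode": "event_whenflagclicked", "next": "b2", "parent": None,
           "inputs": {}, "fields": {}, "shadow": False,
           "topLevel": True, "x": 0, "y": 0},
    "b2": {"opcode": "looks_say", "next": None, "parent": "b1",
           "inputs": {"MESSAGE": [1, [10, "Hello, world!"]]},
           "fields": {}, "shadow": False, "topLevel": False},
}
project = {"targets": [{"isStage": False, "name": "Sprite1", "blocks": blocks}]}

# An .sb3 file is a zip archive with project.json at the top level.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("project.json", json.dumps(project))


def list_opcodes(sb3_bytes):
    """Return the opcode of every block in every target of an .sb3 archive."""
    with zipfile.ZipFile(io.BytesIO(sb3_bytes)) as archive:
        data = json.loads(archive.read("project.json"))
    return [block["opcode"]
            for target in data["targets"]
            for block in target["blocks"].values()]


print(list_opcodes(buf.getvalue()))
```

Generating thousands of blocks is then just a matter of building a bigger dictionary in a loop, which is the tedium a generator library can hide.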
It's not quite converting Python code to Scratch code. What it is is a library for expressing Scratch code through Python syntax. So it's actually more powerful than translating Python into Scratch, because you can write a program to generate other programs. Awesome.
So I'm just going to show you a quick example. That horrible XOR example I showed you before can actually be implemented fairly concisely in Python. This might look more complicated perhaps, but it's much easier to work with, because you don't have to drag and drop a million blocks around. So this is an example of code written with the library. Actually, I might zoom in a bit so you can see it. There we go. Here I'm declaring two Scratch variables, and I XOR them together and make the cat say the result. And here I'm doing all the shifting and masking and lookup tabling that I talked about earlier, and it's all implemented through Python syntax. You'll notice that I'm using the Python bit shift operator here. Under the hood, that gets translated into the Scratch blocks just like I described earlier. And if I run my script, which hopefully still works, if it hasn't exploded, I hope not. I ran the wrong script. And if I open that in Scratch, this code was just generated by the script that I just ran. And if I click that little green flag, actually I'll make it full screen, you can see the Scratch cat says hello world, and then XORs two numbers together and tells us the result.
So now that I can write Scratch code much more easily, effectively in Python, the sky's the limit really. This is what I came up with. What you're looking at here is all the ciphers I mentioned in the initial slide. We've got ChaCha20-Poly1305, we've got the, gosh, I forgot now, the BLAKE2s hashing, and we've got the X25519 key exchange algorithm. Now I wish I had more time to explain what every bit of this does in detail. So if you want to know, find me afterwards and ask me.
But for now I will just show you it working, because I think it is very cool. The code that you just saw on the screen there was generated by this Python script, and it's actually pretty short. There are only 94 lines of Python in this file, but it's all calling into libraries that I wrote, again using this library. So I've effectively written a cryptography library for Scratch that you can use with my tool to embed cryptography within any Scratch project. And so if I generate the Scratch code just now and open it in Scratch, it takes a little bit to load, because there are almost 4,000 blocks in here. I'm going to shrink that down a bit. Scratch has this handy function called clean up blocks that artfully arranges your blocks into a giant list, so I can scroll through it all there. But actually, if we get down to the bottom, this is the main loop, in effect, or the main function even. And if I zoom in, it's doing fairly readable things. Let's see, we're initializing a random number generator. We're getting some random bytes. We're performing an X25519 scalar multiplication, which is part of the key generation process, et cetera, et cetera.
So now I will show you it actually doing something. Here we go. What it just did here is it generated an X25519 key pair. It did a Diffie-Hellman key exchange against a public key that I embedded within the program, generated a shared secret, and then derived a session key from the shared secret. And now if I type something in here, like hello, it encrypts it very quickly. I'm sure you just noticed that was a blink of an eye. So I'm quite glad that it was that fast after all that. And obviously in this demo, the key is being printed on the screen. You probably wouldn't want to do that in any real application. So if anyone wants to try decrypting this message, you should be able to use standard tools to decrypt it. So yeah, that's it. Thank you for listening.