All right, hello folks, and thank you for coming to this talk. I know I'm the last thing standing between you and a bunch of awesome gatherings this evening, so it means a lot that you're here. Also, before we start, I just wanted to mention, when you're leaving, I'd appreciate it if you give a thank you to the AV folks in this room. They have been doing a great job, and all the folks in all the rooms, so definitely give them a shout out when you're leaving today. This talk is called Registries After Dark Part Two: Distributed Random Access Merkle DAGs. My name is Dan Mangum. I work at Upbound, and I'm a maintainer of the Crossplane project, and unfortunately, Jason is unable to be here this week, but he worked with me on preparing this and definitely had a large part in it. I think he might actually be watching virtually, so if you're also watching virtually, feel free to bug him about anything I get wrong in this presentation. Like I said, this is part two, and hopefully the clicker works, yep, it looks like it does. So the first part of this was given at KubeCon North America a few months ago with John Johnson, and in that presentation, we talked about really how to get things out of container registries. We specifically focused on the advantages of pulling by digest and the general structure of how a registry works. In this talk, we're gonna be focusing a lot more on how to get things into the registry, specifically some things that you may not expect to put into a registry, and why or why not different things would be a good idea to store in a container registry. I'm gonna be moving pretty fast. There's a lot of content in this, so stick with me, and I'll try to make it interactive as well, so feel free to pull out your laptop if you're interested in that. So we're gonna take a bit of an interesting pattern in this talk. We're gonna start off back at the very beginning, if you will, and talk about how computers work in general.
So if you're sitting in this room or you're watching virtually, you probably have a laptop in your bag, or maybe you're watching this on a laptop. And even a laptop, which is a pretty similar machine between all the ones we have, is very different in the internals, right? There's different components that make up the machine. You have discrete processors. You might have a CPU, a GPU. If you have maybe an M1 Mac, you have a number of processors, as well as memory, all integrated on the same chip. If you have a PC like me, you have more discrete components that are talking to each other. So even in very similar machines, there are lots of different heterogeneous components. However, even at the wider scale of types of computers, this heterogeneity can still be described through a logically similar model. And typically the way we do that is through five general components here. In this talk, we're only gonna be talking about the first three of them, although input and output are obviously very critical, especially today in a very networked world. We're gonna be focusing on control, data path and memory. And we'll start off with control and data path. These are frequently referred to as the components that make up the CPU. So it's the brain or the decision making, the thing that actually does the work in your computer. And broadly, these two components operate on two different types of data, really. Instructions and the data that instructions operate on. Data is simply any information that can be stored and retrieved, right? You can put it in memory somewhere, you can take it back out. And the instructions tell us when we want to take it in and out of memory, bring it into the processor, as well as what we wanna do with it, right? How do we transform it? How do we perform operations on it? 
To understand how we kind of got to this place where we have logically similar components in computers, it's helpful to go back, and this is, we could go back way farther, but we'll stop around the late 40s, early 50s, with John von Neumann, who I'm sure many of you have likely heard of. And around that time, and for actually a number of decades after that, there were computers that had memory, right? They could store data, whether that was magnetic core memory or earlier than that, maybe a vacuum tube array, something like that. There was the ability to store data, and that greatly enhanced what we could use computers for. However, programming a computer was a different operation that was done outside of the actual execution of the machine. This was typically done via paper tapes or later punch cards. Some folks in this room might actually have experience with using these kinds of technologies. Looking at you, Jared. And what von Neumann realized in the late 40s when he was working on the EDVAC, which was the successor to the ENIAC, an early computer, was that we could actually store the instructions, not on paper tape, not on something that was outside of the machine, but also in memory just like our data. And not only could we store them in memory just like our data, we could store them in the same memory as our data, so the same actual components. This became known as the stored program concept. It's kind of a foundation of a lot of computing we do today. It's why you can actually write code on your machine, compile it, store it in memory, and then run it as an executable. This is all because of the stored program concept. And von Neumann's specific observation, that you could store the instructions in the same memory as the data, is known as the von Neumann architecture, frequently talked about in contrast to something like the Harvard architecture, which has discrete memories for instructions and data, but still stores them both in memory.
So if we're gonna store instructions in the same way that we store data, we need a way to encode them and define them. And this is known as an instruction set architecture, or an ISA. I'll use the term ISA throughout the rest of the presentation to refer to this. And over the last 50 or so years, there have kind of been two tracks of instruction set architectures: complex instruction sets, or CISC, and reduced instruction sets, or RISC. You're likely very familiar with complex instruction sets. If you've ever heard of x86, that is the most popular complex instruction set. Essentially the goal of a complex instruction set is to give you higher level operations that allow you to achieve more functionality with a single instruction. And when instruction sets were first being defined, this made a lot of sense because programmers were actually writing assembly directly, right? So now we have higher level languages that give us these really expressive capabilities, but earlier on we tried to actually jam more of that into the capabilities of the hardware. Reduced instruction sets have also been around for quite a while. I mentioned M1 Macs earlier; they're actually ARM machines, so they use a reduced instruction set. And it, as you might expect, is kind of the converse of that. So you have fewer instructions that are simpler, easier to implement in hardware, but you have to execute more of them to achieve the same kind of outcomes as you would with a complex instruction set. Both of these are in service of making your computer faster. That's actually all computer architecture is, just finding new ways to apply physics essentially to different computing domains. And this is the general equation that we use. It's pretty simple. If you hold the clock rate constant, the number of instructions in a program and the average cycles per instruction is gonna dictate how long it takes us to execute it. So CISC and RISC both want to bring down the CPU time.
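The general equation being referred to here is the classic processor performance equation, reconstructed from the description (the slide itself isn't reproduced in this transcript):

```latex
\text{CPU time} \;=\; \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}
             \;=\; \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}
```

Holding the clock rate constant, execution time is the product of how many instructions run and how many cycles each one takes on average.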
And since the two factors that we can impact here are instructions and CPI, they take different approaches to making your program faster. So a complex instruction set is going to reduce the number of instructions required to execute a program, required to execute any logic you're trying to implement, thus bringing down the overall time. And a reduced instruction set is gonna increase the number of instructions required to execute a program, but bring down the average clock cycles per instruction. So different approaches, both trying to achieve similar goals. Over time we've started to lean more towards using RISC architectures for a variety of reasons, including power consumption and hardware complexity. If you follow me anywhere, you know I'm a big fan of RISC-V. I'm not gonna stand up here and say it's the best instruction set ever, but I will say it's a modern instruction set. And what I mean by a modern instruction set is that it's taking into account the current status quo of computing. Over the last few decades, we've started to see a decline in Moore's Law, which is essentially the idea that you can put more transistors in a chip every couple of years, thus getting performance for free, if you will. You can just buy a new machine and it'll be faster. Since that started to decline, we've needed to be more creative and create domain-specific accelerators for a variety of different computing domains. There have also been just more computing domains that have come into vogue and have become important in our business applications. And there's been an explosion of devices. We have more embedded devices all over the place. There are tiny embedded devices in fields and agriculture. And we also have, at the other end of the spectrum, high performance computing and data center servers and that sort of thing.
So the way that RISC-V wants to propose a solution for this is to have a single instruction set architecture but employ the common engineering principle of making it modular and extensible, so it can apply to this wide variety of domains that we have. The way it does this is it defines some base instruction sets. The four that it defines in its spec are listed here. RV32I and RV32E are the 32-bit integer base and embedded base instruction sets. And then RV64I and RV128I are the corresponding integer base instruction sets for 64-bit and 128-bit. The RISC-V organization, which by the way, this is all open source, which is another kind of embrace of a more modern instruction set, defines standard extensions. So things that you'd need to do pretty much any of the operations we do today with our higher level languages, because as you'll remember, right, we're compiling down complex logic into these simpler instructions. So we have multiplication and division, floating point operations, atomics, memory fences, those sorts of things defined. But there's also the ability to add custom extensions. So if you're a vendor and you make some custom hardware, you have some proprietary IP, you can build your own compiler or you can modify an existing one. And this slide is a little bit intimidating, but please don't get up and leave. We're not gonna go through exactly what this means, but I think it conceptually helps us think about what a RISC instruction set is and how it differs from a CISC one. So these are the six core instruction formats in RISC-V. They are the way that we encode our instructions as data in memory. And I wanna point out two specific things. One, if you see an operand, such as the opcode or rd or rs1 or something like that, in any instruction format, it's always in the same location, right? It's always the same bits in the instruction format.
That makes it a lot easier to decode these instructions in hardware because you can just generically pull out these arguments and then select the ones you need with a multiplexer, which allows you to, based off the opcode, decide what you wanna do with the bits that are present in the instruction. The other thing you'll notice is they're all the same length. Every instruction in all of those formats is 32 bits, with the exception of the compressed extension, which shortens them to 16 bits, but otherwise they're all the same length, which is in contrast to something like x86. If you've ever tried to parse x86 instructions, it is very upsetting. I would not recommend doing it. They have lots of different lengths and they depend on kind of unorthodox things. So anyway, just wanted to point that out. We don't have to understand exactly what's happening here. We're gonna see a more applicable example here in a few moments. I also wanted to give examples of some of what these base instructions are, just to tie maybe some of our high level languages down to what actual operations happen in a CPU. So we have our base things like addition and subtraction, and variants for adding a word, which is 32 bits, versus a double word, which is 64 bits. We also have load and store operations. Another important thing to point out about RISC architectures is they are also known as load-store architectures. That means that anytime you access memory, that's a discrete operation. You don't actually operate on the memory you access. In order to be able to actually modify data, you have to first load it into a register, then you can perform a modification on it and then store it back. And that's what we mean by a load-store architecture. So these are two categories of instructions. Another one that's not shown here is things like control flow, changing the program counter and jumping around. So that was a very quick overview. I also wanna give an extremely quick overview of memory hierarchy.
If there are any computer architects in the audience, I will take all your ridicule afterwards, but there are two big ideas in memory, which is a gross simplification. The first idea is that data that was accessed recently is likely to be accessed again. A good example of this is storing a value in a variable and then performing some operation on that variable. And the second one is that data that lives close to data that was accessed recently is likely to be accessed soon. The canonical example for this is iterating through an array or something like that. So, values that live close together in memory. We refer to the first one as temporal locality, so it has to do with time. And the second one, spatial locality, has to do with how close together things are. These are two big ideas and they drive our memory hierarchy, which we'll show here in a second. So our memory hierarchy, once again in a very simplified view, looks something like this. Registers live in the CPU. In RISC-V, we have 32 general purpose registers in the base instruction set. They have specific purposes, which we'll once again see in a moment, but that's where the fastest operations can happen. Then we have cache. We typically have multiple layers of cache. In most modern computers, you'll have three levels of cache. And then we'll have RAM, which is what you configure when you buy your laptop on whatever website you're getting it from. These are made with different physical components. So like I said, the registers live in the CPU, the cache is static random access memory, and the RAM is dynamic random access memory. We unfortunately do not have time to get into what those individually mean. But the big idea here is that as you move down the memory hierarchy, you can store more things, but also as you move down, things get slower. What's not mentioned here is as you move down, things also get cheaper, which might make sense given their other attributes.
Okay, so now we're actually gonna move to doing something to illustrate this a little bit. So there's gonna be a lot of live demoing for the rest of this presentation, and the odds that things go wrong are extremely high, so I expect you all to bear with me here. We are going to look at a very simple program. We are declaring three 32-bit integers. We are assigning values to two of them. We are then calculating a sum, and we are returning that sum. This is a weird program. I know you probably don't wanna return a non-zero exit code from main here, but this is just to illustrate some of the instructions that are gonna be generated. So let's go ahead and compile this for RISC-V here. So I'm gonna use my RISC-V cross compiler, and I'm gonna specify that we want to use a subset of the generic standard extensions. We're just gonna use integer and double precision floating point. So I can say that with rv64id, and we're gonna compile main.c, and we'll call the output kubecon. All right, so we have kubecon there. And let's look at what that actually looks like in our main function. You'll notice there that I didn't compile with any optimization, so if you were actually compiling this with any type of optimization, this would all get factored away. It would just be a simple function. It would probably get inlined because it just returns five every time, but we're intentionally not doing that here. So let's take a look at this. We'll disassemble the instructions, and we wanna remove any pseudo-instructions and actually just look at exactly what it gives us. And I'm gonna search for our main symbol here. All right, so this is the body of our main function. The first three instructions there are what's known as a function prologue. What we're doing there is setting up our stack frame. So we're first moving our stack pointer down 32 bytes. The stack grows downwards in RISC-V.
We're then taking s0, which is the frame pointer register, and we are storing that on our stack, at the top of our stack. That frame pointer is actually from the calling code, from crt0. So we're basically just saying, we're gonna save this and reinstate it for you when we return. And then we're going to set our own frame pointer to the top of the stack. So our frame pointer is now up here, 32 bytes above the stack pointer. This is what we're gonna use for our stack for the remainder of our procedure here. Then we're gonna go through a variety of operations. You're gonna see we're going to load two into one of our argument registers. We're gonna store that on the stack right below where we stored that frame pointer. Then we're gonna store the other argument, and then we're gonna load them back in, add them together, and return the sum. Then there's some function epilogue there at the end as well. So like I said, there's no need for us to be putting things on the stack in this case. We could keep them all in registers, which would be optimal, but because I compiled with no optimization so we can see more instructions, this is what it looks like. For this presentation, I actually wrote a RISC-V emulator in Rust because I wanted us to be able to see exactly what was happening. And lots of emulators don't actually give you really good insight into instructions and just kind of assume you know what's going on, which I don't like very much. So hopefully this one will do a bit of a better job. So with each instruction that we get, it's gonna describe what the actual name of the instruction is, as well as logically what's happening here. It's also gonna tell you the format, the exact binary of it, and what the operands are. So in this case, we are adding an immediate, which is negative 32, so we're essentially subtracting 32 here from our stack pointer, creating our stack frame. You can look at the registers if you want.
This is kind of hard to see on the screen, but you'll see that we have a frame pointer loaded in there at 0x80010. And we have our stack pointer set to 0x80000. So we're gonna subtract 32 from that. E is execute; it's gonna let us know what happened. Now we're gonna move to the next instruction. And this one, we're gonna store that frame pointer on our stack. So this is a memory access, so we should expect for this to take a little bit more time. And as you'll see here, it is hanging a little bit, but it should let us know, yep, that the value that was in s0, which is our frame pointer, was now written to the address that was specified as a 24 byte offset from our stack pointer. We could also step through a few other instructions here. Once again, any of them that are acting on registers are going to complete much faster, and any of them that are storing to memory are gonna be a little bit slower. So I am going to stop doing that for just a moment because I wanna show you one other thing. Let me make this one a little bit bigger and let's see what we have here. Well, I don't know what that "play me" thing is. Did y'all see that earlier? That looks like it's new. Let's take a look at what that is. I'm a little concerned. Jason, Jason, can you hear us? Are you there? Hey Dan, I'm sorry to interrupt everyone. Hi, I'm Jason from the talk. Dan, I'm sorry to interrupt. This is Registries After Dark, right? Am I in the right room? Thanks for the super interesting deep dive into how computers work and everything, but I can't help but notice we haven't mentioned OCI registries yet. And I kind of thought that was the point of the talk. So if it's all right, I think I'm gonna come in there and take over for a little bit and try to get us back on topic. So how's this related to OCI registries? Well, I'm glad you asked. There was an interesting thing you mentioned. You said storing and retrieving data is the foundation of all computing.
I thought that was a good, maybe a good segue into OCI registries. After all, OCI registries let you store and retrieve data. You can put a blob, you can get a blob. Everything works out. If you squint, clients even do a form of caching with temporal locality. If you pull a blob and then pull it again, a lot of clients will just say, hey, I have a copy already here on disk. You don't need to fetch it again. So that's kind of like caching, if you think about it. You may say, but OCI registries are just for container images, right? Well, typically, historically, sure. But it doesn't have to be that way. In a lot of cases, registries don't care what your blobs are. If you put anything you want in there, they'll allow it. And in fact, in a lot of cases, registries don't even know what your blobs are. They're just going to accept them and store them for you. So yeah, you can put anything in there. Why would you want to do that? Well, content addressable storage is a cool concept. You can apply meaningful tags for humans. You can see Registries After Dark, part one, for a lot more information and a lot of interesting demos about storing things content-addressably and with tags in registries. OCI provides a consistent, portable API. It's widely implemented across a lot of clouds, AWS, Google, everywhere. And even on-prem, you can run a registry locally. And in all those cases, auth is mostly normal, is the best I guess I could say about it. So like my friend Drake here would say, you can keep your S3 and your vendor lock-in. It's OCI for me every single day. And we are not even alone in storing a lot of other stuff in OCI registries. There are a lot of projects these days that are storing YAML of all kinds in container registries. Tekton tasks can be pulled from registries, Crossplane packages, Helm charts, tons and tons and tons of other things. You can even sign contents and attach SBOMs and attestations to those things using Cosign. And tons of Wasm thingies.
I didn't go to Wasm day, but I hear Wasm is all over OCI registries these days. In the whole ecosystem of stuff around OCI registries, anything in an OCI registry can be signed. You can say, I'm Jason and I approve this thingy, and here's a signature that proves that I'm me. Those signatures are actually also OCI thingies, which means they're also portable across registries and clouds and on-prem, with auth and everything. So that's pretty cool. I would even go so far as to say, because registries can be sort of arbitrary storage, in theory, you could store your program's memory in an OCI registry, but that's, but then that's crazy. Then that's not what you're doing, is it? Then, oh no. Oh no. Oh no. Well, I didn't think that Jason was gonna join us, but it turns out that he seemed to have dropped a little video for us. And he also kind of spoiled what I was doing, so I'm gonna have to talk to him after this. But Jason is in fact right. When we were storing to memory there, that was actually writing to an OCI registry. In this case, it was Docker Hub. And so let's kind of step through what's really going on here. So most RISC architectures follow some variation or some evolution of this pipeline here. It starts off with fetching the instruction. Then you need to decode it, figure out what we need to do. You need to execute it, and then potentially access memory if it's a load or store operation, or write something back to a register. So in this case, we were fetching an instruction. We're looking specifically at that store double word operation there. We were fetching that from just local memory. We read in an ELF file and we're just parsing that. So we didn't actually pull that from the registry. We decoded it, and that looks something like this, where we broke it down into those components we saw on the format slide earlier, and then we combined them together and found out what registers we were accessing.
So for instance, rs2 is x8, which is the eighth register, or the ninth one, since they're zero-indexed. And that's the frame pointer. We also had rs1 pointing to the stack pointer, and then we had our immediate value, each of these in binary. Another thing that you'll notice here is RISC-V is a little-endian architecture, meaning the least significant byte is stored at the lowest memory address. Next, we executed it. In this case, execution was just performing that calculation of adding 24 to our stack pointer, and getting the memory address of where we wanted to store this double word. And then we didn't have any write back in this case, but we did have a memory access, and it looked something like this, with those layers included as well. So this is an OCI manifest here for an image, and you'll see that our single layer is only eight bytes, because we are storing a 64-bit double word, or an eight-byte double word, and we have the hash there. You'll also see that we told a really big lie and said this is a file system image layer, and it's a tarball. I don't know many eight-byte tarballs, but that's what we said it was. So let's actually see if that's really what happened. So just before, while Jason was talking, I went ahead and I posted a tweet that tells how you can follow along with this, and actually pull these images. This is a public repository, so you're welcome to do the same as I do here. But the link to the actual repository is this one, if you wanna look at the tags, and if you want to pull down a hashtag KCEU22, you can pick any of the tags there, and those will be memory addresses that we actually wrote to. And so that's what I'm gonna do now. Let's make a new directory, we'll call it image-surgery, and we'll go in. All right, we don't have anything here. I'm gonna use crane. You may have some trouble with Docker if you're pulling down with it, because Docker is not gonna like that we lied about that tarball.
But crane is much more accommodating for us here. So what did I say it was? KCEU22, and let's see if we can find one of those memory addresses we wrote. How about that first one that we were walking through? Looks like we wrote it to this address. So that should be the tag that we have, and I'll put that just in a tarball. Let's see if that works. It looks like it did. All right, we have a tarball here, and I'm gonna go ahead and untar it. All right, and it looks like we have what we would expect from an image. We have our manifest here. We have a gzipped tarball layer, and we also have our config. In this case, our config is not very useful. I just stuck Rimu in there, which is the name of the emulator. So that's gonna make things unhappy. But if we try to look at our gzipped tarball, that is not a gzipped tarball at all. It's just a data file. So let's see what data is in there. So we're gonna pipe that to a hexdump, and I'm gonna say this is little-endian, and we want to group by eight bytes here. And what we're gonna see is we have 80010, and we were wanting to store our frame pointer at this address. Let's see if that is our frame pointer. Yep, it does look like that's our frame pointer, and I think we performed some other operations as well. Let's see if our tags are present here. Yep, it looks like we have a few other memory addresses as well. And if we wanted to go through and continue executing this program, we could continue to have as much fun as we've been having this whole time and continue to store new things in our registry. So we should see a new tag pop up here, and we do. So we can continue to share this memory with everyone who is likely watching this all over the world and very enthralled. So why are we doing this, right? This is ridiculous. You probably should never do this. So why are we doing it? Well, one of the things we want to illustrate is everything is just bits, right? And it's just data on a wire.
And frequently we think of things in different contexts, because it's useful for structuring it in our brain. But in reality, we're just passing information around. And Jason mentioned a few different applications of putting non-image things in a registry. I work on Crossplane. We do that quite a lot, and we're big fans of doing it. So I hope this kind of makes you think a little bit about what things might be good to put in a registry that you wouldn't normally think of. There's also distribution. It allowed for this cool demo where y'all could pull it down and look at it and go see it on Docker Hub if you wanted to. It also was free, for me at least. I was able to push to Docker Hub, and hopefully we didn't abuse their rate limit too much. But registries are available and relatively cheap, so it didn't take a lot of effort. I didn't even create that repository beforehand; it just got created on push. So lots of cool things there. But we obviously don't want to do this. This doesn't make any sense. Why doesn't it make any sense? Well, we're trying to emulate a program and the network is very slow. Obviously, I was stepping through it in this case. If we tried to actually execute it, this would be a massive, massive bottleneck. Also, we don't really need distribution here. I hope that maybe some people looked at the memory contents there, but you really didn't need to. I don't need to share the memory of a process I'm debugging via a container registry. So it's not a great application of what it can give us. And the last one, and this is the one that folks don't usually think about as much: are you taking advantage of the data structure of a registry? The subtitle of this talk was Distributed Random Access Merkle DAGs. A Merkle DAG is essentially a directed acyclic graph, but one where each node is identified by the hash of its content, which includes the hashes of the nodes it links to.
In this case, we're not getting a lot of benefits from that because we have very tiny payloads and we're not really reusing them very much. That being said, in theory, you would be able to reuse the same layer if you pushed the same memory address. But in general, not a great use case for it. So what I hope you kind of walk away from this with, because I know I'm probably cutting into Q and A here, is that there are some great applications. Jason listed a few of them, and there is good reason to use the registry for some things, but there's also good reason not to. So be judicious in your usage. And with that, I'm open to any thoughts, comments, or questions, and expressions of utter disgust. I'm not really surprised, honestly. All right, well, I did go all the way up to the end. I am more than happy to talk about any of this. When Jason and I were making this presentation, this was about 5% of the total content that was in it, and we've been systematically cutting it down, which I think was still kind of a fire hose. But I would love to talk to anyone who found this interesting or wants to talk more about RISC-V or container registries or anything like that. But once again, thank you for coming, and I hope you have a great rest of your KubeCon.