All right. Good morning or evening, depending on your time zone. We're here for a presentation about the s390x architecture, something that I've never heard about. But I mean, that's why we're here: we want to learn something new. So I'm excited to hear about this stuff from Claudio Imbrenda and Janosch Frank. Thank you. So welcome to our introduction to the mainframe CPU architecture. We are both KVM engineers, so we have a lot of insight into the architecture, because we have to support all the machine-specific parts in the Linux hypervisor KVM. And we have seen so many interesting instructions and so many interesting things in this architecture that we'd like to give you an overview. So we'll start with some history, because the mainframe as we know it started out in the 60s. Then we'll go over the general CPU, most of the registers, interrupts, instruction formats, a lot of memory management and the peculiar memory protection, and at last some I/O. Yeah. It all started in the 60s with the S/360, which was the first abstract architecture. It was a nice idea to have an abstract architecture independent from the implementation, so you could sell a cheap mainframe, and if the company grew, they could just buy a more expensive mainframe and re-use the software. It was a brilliant idea. And already in the 60s it started with 32-bit registers, 24-bit addresses, and, with the Model 67, an MMU and paging already. So it was already quite advanced. In the 70s they decided that maybe 24 bits were not enough, so: 31-bit addressing mode. Yes, 31. And some other innovations, but we're not going to cover those. In the 90s, more extensions, ESA/390, and access register mode, which we will discuss. And finally, in 2000, the 64-bit version, also called z/Architecture or IBM Z, or... the name changes every time there's a new machine, basically. So 64-bit registers, 64-bit addresses, and the current machine is the z14, introduced in 2017, so it's quite recent. This is, yeah, the history.
Let's see, let's see, do you know how to count? Let's start with the z900 and z990. So the 900 ones are the big machines, the really big machines with a lot of cores. The 800 ones are the small machines. Then they went up and added 90 for the next generation. Yeah. And then we have the z9. EC stands for Enterprise Class, the big ones. Again, BC is the Business Class. And what comes after the z10? Because that was the number of cores, obviously. Kind of, I mean. Yeah, let's go. And at least the numbers 12, 13, 14, you know, that kind of makes sense. Yeah. Every time a new machine comes out, we ask ourselves what they will come up with as a new numbering scheme. They have been quite quiet lately. Yeah, let's hope for the best for the next one. Oh, let's start with instructions. CONVERT UTF-8 TO UTF-32. You have this instruction. It has two operands, R1 and R2. It says somewhere that they have to be even-odd register pairs, so actually it's like four registers. R1 has to be an even register, and R2 also has to be an even register. The even register is the address, and the register after it is the length. And what happens is that the UTF-8 string in the source operand is converted to UTF-32 and stored in the destination operand, until the source runs out, or until we fill the destination, or until a CPU-determined number of characters have been converted, in which case you get a condition code and you can just loop on it until you finish. And of course there's a whole family of instructions to convert between all the possible combinations of UTF-8, 16 and 32. And the M3 field is used for deciding what to do in case you get an invalid character. As you can see, it's a very CISC architecture. Yes. So, registers. We have 16 general registers. We have 16 control registers. We have 16 access registers. We will talk about the access registers later.
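As an aside: the convert-and-loop pattern just described, where the CPU may stop after a CPU-determined number of characters and software simply retries, can be sketched in Python. This is an illustrative simulation, not the actual instruction semantics; the `chunk` parameter stands in for the CPU-determined limit.

```python
# Sketch of the condition-code retry loop used with the convert-UTF
# instructions: conversion may stop partway through, so software loops
# until the whole source operand has been consumed.
def convert_utf8_to_utf32(src: bytes, chunk: int = 4) -> bytes:
    """Simulate CONVERT UTF-8 TO UTF-32, 'chunk' code points per pass."""
    out = bytearray()
    text = src.decode("utf-8")
    pos = 0
    while pos < len(text):                     # loop while "CC says partial"
        for ch in text[pos:pos + chunk]:       # CPU-determined unit of work
            out += ord(ch).to_bytes(4, "big")  # UTF-32, big-endian
        pos += chunk
    return bytes(out)

print(convert_utf8_to_utf32("Aé".encode("utf-8")).hex())
```

The same loop-on-condition-code idiom comes back later with the move instruction used for copy_to_user.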
We have one program status word, which we will discuss later. One prefix register, which is 18 bits; we will discuss that later, too. The floating point control register: we will not discuss this, but it's a register where you can choose things like the rounding mode of floating point operations and that kind of stuff. It's not really important. 16 floating point registers, which are partially aliased with the 32 vector registers, and this drawing shows how the aliasing is performed. So the first half of each of the first 16 vector registers is aliased to the floating point registers, which is an interesting choice. Oh, yes. Another crazy instruction. VECTOR MULTIPLY AND SHIFT DECIMAL. So in one of these vector registers you have a sequence of binary coded decimal numbers, so you have one digit per nibble. Yes. Binary coded decimal. Vector instructions. Why not? Because, I mean, you need many digits, and the vector registers are big enough, right? I mean, come on. It's absolutely logical. So this does a multiplication between the second and third operands, which is then shifted right by the number of digits specified in the fourth operand, which is an immediate value, and placed at the first operand location. And the operands are in the signed packed decimal format, because, I mean, it's complex, but basically the last digit is not a digit, it's the sign. And here's an interesting thing: V1, V2 and V3 are only 4 bits long. But we have 32 vector registers, so we need 5 bits, right? That's what this field is for. It contains, in order, the highest bit of the first, second, third and fourth operands; in this case we don't have a fourth one. So if you use register 16, the four-bit field here is zero and the corresponding bit over here is one. All the other instructions have the same format but only have 16 registers, so you only need four bits, but here you need five, so put it there. Why not? What does M5 even do?
I don't remember. Yeah, these are the kinds of operations. Oh, the program status word. Yes, this is a combination of what you'll find in other architectures as the flags register and the instruction pointer. Just one thing: it's called program status word, but it's actually not even a word, it's longer, but who cares? And it contains, of course, the instruction address and some other interesting bits. For example, the most important is the DAT bit, which is used to switch paging on and off. The I/O mask, to mask I/O interrupts. The external mask, to mask external interrupts. The PSW key, which is something we will cover later. The machine check mask, which can be zero, of course. I mean, your CPU might literally be burning, but maybe you're doing something more important and you do not want to be disturbed, so you set that to zero and you continue. The wait state: when the wait state bit is on, the CPU will stop running and will wait for interrupts. Other architectures have an instruction for that. We don't; we have enough instructions already, I guess. P is the problem state bit, which basically means user space. The name actually comes from the 60s. It's called problem state because at the time you didn't really have real-time operating systems; you had a scheduler that would just schedule batch tasks, and P means you are now processing a task instead of... Then address space control, which we will cover later. Condition code. The condition code is two bits. Many architectures have specific flags to mark the outcome of an arithmetic operation: for example, the result is zero or is not zero, or there was an overflow, or all kinds of stuff. We don't have those bits. We only have two bits, so basically we have four different possible values for the CC field. Every instruction has a specification of which condition codes it gives out. Of course, it's consistent between arithmetic and floating point operations.
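Going back to the signed packed decimal format for a moment: the multiply-and-shift behavior can be sketched in Python. The `pack`/`unpack` helpers and the operand widths here are illustrative, and the assumption that the shift simply truncates the dropped digits is ours, not taken from the instruction description.

```python
# Sketch of signed packed decimal as used by the vector decimal
# instructions: one digit per nibble, the last nibble is the sign
# (0xC positive, 0xD negative).
def unpack(p: bytes) -> int:
    digits = []
    for b in p:
        digits += [b >> 4, b & 0xF]
    sign = -1 if digits[-1] == 0xD else 1
    return sign * int("".join(str(d) for d in digits[:-1]))

def pack(value: int, length: int) -> bytes:
    sign = 0xD if value < 0 else 0xC
    digits = [int(d) for d in str(abs(value)).rjust(2 * length - 1, "0")]
    nibbles = digits + [sign]
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))

def multiply_and_shift(a: bytes, b: bytes, shift: int, length: int = 8) -> bytes:
    """Multiply two packed decimals, then drop 'shift' rightmost digits."""
    prod = unpack(a) * unpack(b)
    mag = abs(prod) // 10 ** shift        # truncate the shifted-out digits
    return pack(-mag if prod < 0 else mag, length)

x = pack(1234, 4)   # four bytes: 00 01 23 4C
y = pack(-5, 4)     # four bytes: 00 00 00 5D
print(unpack(multiply_and_shift(x, y, 1)))   # 1234 * -5 = -6170, shifted once
```

Note how the sign travels in the last nibble, which is why the talk says "the last digit is not a digit".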
But many instructions just say: oh, if something weird happens, we will come back with this condition code, and then you have to handle it. The program mask is to mask some exceptions, relating mostly to floating point; not very important. EA and BA are the addressing mode bits, because you can still have programs that need to run in 24-bit mode, in which case wrapping around needs to be performed at 24-bit boundaries, not at 31 or 64. So you need to have the right bits there, otherwise things don't work. Should we say why 31 bits? Basically they used the highest bit to choose between 31-bit mode and 24-bit mode; that's why it's 31. But for 64-bit mode, thankfully, they didn't do that. Yeah. Oh, yeah. ROTATE THEN OR SELECTED BITS. So you have two registers. You take the second register and rotate it by the number specified by one of these immediate values. Then you only consider the bits starting from I3 up to I4, so not the whole register, but only those bits. And then you perform an OR of that, and the result may replace the selected bits of the first operand, because, I think, the highest bit here indicates whether you just want to do the operation to see what the condition code is, or whether you actually want to write the result back into the register. Yeah. Every architecture has a stack, right? Like push, pop, you know, for passing... No, we don't have a stack architecturally. It's just an ABI convention; on Linux, I think we just use register 15 as a stack pointer, but you can use anything. I mean, there is no push instruction. There is no call instruction. Well, there is a call instruction, but it does not push on the stack; it just writes the return address into a register, and you can choose which register. And there is no return instruction. You just jump back to the address that, you know, you came from. It's up to you to save the return address. Interrupts.
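Before moving on to interrupts, here is a sketch of the ROTATE THEN OR SELECTED BITS behavior just described. Bit 0 is the leftmost bit, as in the Principles of Operation; the test-only flag in the top bit of I3 is as described in the talk, but the exact condition-code rule used here (selected bits of the result all zero versus not) is an assumption of this sketch.

```python
# Sketch of ROTATE THEN OR SELECTED BITS: rotate the second operand
# left, OR only bits i3..i4 of the rotated value into the first operand,
# and optionally (test-only mode) just set the condition code.
MASK64 = (1 << 64) - 1

def rot_left(value: int, amount: int) -> int:
    amount %= 64
    return ((value << amount) | (value >> (64 - amount))) & MASK64

def rosbg(r1: int, r2: int, i3: int, i4: int, i5: int):
    test_only = bool(i3 & 0x80)          # top bit of I3: don't write back
    start, end = i3 & 0x3F, i4 & 0x3F
    rotated = rot_left(r2, i5)
    sel = 0
    for bit in range(start, end + 1):    # big-endian bit numbering
        sel |= 1 << (63 - bit)
    result = r1 | (rotated & sel)
    cc = 0 if (result & sel) == 0 else 1
    return (r1 if test_only else result), cc

# Rotate 1 left by 8 -> 0x100; select bits 48..63 (the low 16 bits).
print(rosbg(0, 0x1, 48, 63, 8))          # the OR lands in the low half
```

A whole family of instructions (AND, OR, XOR, insert) shares this rotate-then-operate-on-selected-bits shape.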
We have six interrupt classes. Machine check: hardware failure. Supervisor call: yeah, syscall, a program needs to call the operating system to do stuff. Program interrupts, which means exceptions. External interrupts: some time-related things, some other inter-CPU things; it's not proper I/O, but it's also not... I mean, it's weird. I/O, then, of course. And the restart interrupt, which is raised when the restart button is pressed on the console. We don't really have a physical console anymore, but you go to the web interface and you click restart. Interrupts can be masked in the PSW. As we saw, there are bits in the PSW to mask some of these interrupts. Some interrupts are floating, meaning that they are just pending and can be delivered to any CPU that is running, but most are directed to one CPU, most notably program exceptions: it's the CPU that generated the exception that has to handle it. Yeah, this is the list of program interrupts. It's sometimes very nice, because you know exactly what happened: you have a very specific code that says, for example, that it was not a page fault, it was a translation specification exception, because there was some other issue in the page table, but the page was not missing, right? Yeah, I guess we can just move on. So if you want to translate these into other architectures' exceptions: the first one is operation, which means you used an illegal instruction. Privileged operation: well, you used a privileged instruction without having the rights to execute it, but you tried. Execute: we will come to that later, maybe. Protection is virtual memory protection. Specification is when you gave a wrong... yeah, I'm not doing all of them. It's if you gave some special subcode, like in an immediate value or in a register value, which is currently not available in the instruction.
Or, as with the instruction I presented before that wanted an even register: if you put an odd register, you get a specification exception, for example. And then, as you see, we have lots and lots of translation exceptions, because we also have a lot of virtual memory management. Janosch. So what happens when an interrupt is delivered? Well, on normal architectures, you say: okay, we push the return address on the stack, and maybe if we have some parameters, we push them on the stack too, and then we jump to the interrupt handler. We don't have a stack. So instead, we have a vector of return addresses. You save the old PSW, so the instruction pointer and all the flags, in an interrupt-specific location. You load the new PSW from another interrupt-specific location, so that's your interrupt vector. And the interrupt parameters are also saved in an interrupt-specific location in memory. And here you can see exactly where in memory those things are saved or read from. This is in a low memory area, which is why it's called low core. And it's called core because back then it was actually core memory. Plus, the facility list is stored somewhere there. Yeah. So you just store everything in a fixed memory location, read everything from a fixed memory location, and go do your business. When you're finished with the interrupt, you just reload the previous PSW and everything is good, right? Does anybody see a problem with this approach? You want to have interrupts disabled when you are handling interrupts? It's okay; that is another problem, though. Does anybody see? Yes? Bingo. Do you remember the prefix register? So, the prefix contains the 18 highest bits of a 31-bit address aligned to an 8-kilobyte boundary, so the 13 lower bits are zero. 18 plus 13 is 31, yes. And what happens is that all these addresses are real addresses, which are translated to absolute addresses by means of prefixing.
So what happens is basically that this 8-kilobyte area specified by the prefix register is swapped with the lowest 8 kilobytes of memory. So when you think you're accessing the lowest 8 kilobytes, you're instead accessing somewhere else, specified by the prefix register. And each CPU has its own prefix register. So the lowest 8 kilobytes of memory for each CPU map to different blocks of physical memory, which is at this point called absolute memory. And that's how it works. And the funny thing is that this prefixing was already present in the 60s. I actually have the book, the instruction manual for the CPUs of an S/360, and they already had prefix registers, exactly for this purpose. So we have three different address types. Absolute is what really goes to memory, a term which is seldom used. Then there is real, which is like absolute, but before prefixing, not after. And virtual is before DAT translation. So first you have a virtual address, you translate it to a real address, and then finally to an absolute address, and that goes to memory. OK, let's come to the fun part, the parts from the 60s. Back then they didn't really use virtual memory management, but they still needed protection for physical pages. So every physical page has a storage key associated with it, and that storage key is not in addressable memory. You can't manipulate it in memory; you have instructions for that, and only the kernel can do that. As I said, each physical page has a key. And the key is four bits of key ID, a fetch protection bit, and a reference and a change bit, which give a read indication and a write indication. And every time you try to access a physical page, the key in the PSW will be matched against this storage key. So the hardware has to fetch the key and compare it to the PSW key before the physical page can actually be accessed. If they do not match, you're not allowed to store.
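Going back to prefixing for a moment: the real-to-absolute swap can be sketched like this. The prefix value and addresses are illustrative; the point is that low core and the prefix area trade places, and everything else passes through unchanged.

```python
# Sketch of prefixing: each CPU's prefix register swaps its private
# 8 KiB area with the lowest 8 KiB of absolute storage.
PREFIX_SIZE = 8 * 1024          # 8 KiB, so the 13 low bits of the prefix are zero

def real_to_absolute(real: int, prefix: int) -> int:
    block = real & ~(PREFIX_SIZE - 1)
    if block == 0:              # low core -> this CPU's prefix area
        return prefix | (real & (PREFIX_SIZE - 1))
    if block == prefix:         # the prefix area -> low core (the swap)
        return real & (PREFIX_SIZE - 1)
    return real                 # everything else is untouched

prefix = 0x40000                # an illustrative 8 KiB-aligned prefix value
print(hex(real_to_absolute(0x0120, prefix)))    # low core lands at 0x40120
print(hex(real_to_absolute(0x40120, prefix)))   # prefix area comes back low
print(hex(real_to_absolute(0x9000, prefix)))    # untouched
```

This is why every CPU can use the same fixed low-core interrupt locations without stepping on the other CPUs.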
And if the fetch protection bit is on, you're also not allowed to read from it. Wonderful thing. In Linux, we haven't used them in ages; virtual memory management is just easier. There's a special key, which is the zero key, and that matches everything. So even if you have another key ID in your PSW, you can access key zero pages. Then we come to virtual memory, which is called dynamic address translation, DAT. The page size is 4K, and we have five levels of translation tables and a byte index of 12 bits. As you can see here, they are evenly spaced until you come to the page tables and the byte index. As in every other virtual memory architecture, you index translation tables until you index bytes. Each entry in these translation tables is 64 bits. And we are actually able right now to exercise all of the 64 bits of address space. So we're not doing it the Intel way of adding a few bits at a time and saying: oh, all right, I got a few more bits of virtual memory and physical memory. We can do the full thing, which is important, because we are currently at 30 terabytes of memory, what we call storage, in one machine, which is a lot. A lot of translation tables. The highest one is called region first. The region tables, which go down to the region third, have a table type to distinguish them, and a table length. So you don't actually need a full four pages for each region table; you can just say, well, I only need one or two, and go on from there, which is quite nice. Usually on other architectures, page tables occupy exactly one page. As you noticed, we have 11-bit indexes and 4K pages, so we can need up to four pages per level, which is why we have this table length, and the TF, the table offset, as well. Then we have the segment table, which is basically a special region table; it has a table type of zero. And then the last one is the page table, as you know it. We have EDAT, which means we have one-megabyte huge pages and two-gigabyte huge pages.
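The way a 64-bit virtual address splits across those five table levels plus the byte index can be sketched as follows: four 11-bit region/segment indexes, an 8-bit page index, and a 12-bit byte index (4 × 11 + 8 + 12 = 64), which matches the "evenly spaced until you come to the page tables" remark. The field names are ours.

```python
# Sketch of how DAT carves up a 64-bit virtual address.
def dat_indexes(addr: int) -> dict:
    return {
        "region_first":  (addr >> 53) & 0x7FF,   # 11 bits
        "region_second": (addr >> 42) & 0x7FF,   # 11 bits
        "region_third":  (addr >> 31) & 0x7FF,   # 11 bits
        "segment":       (addr >> 20) & 0x7FF,   # 11 bits
        "page":          (addr >> 12) & 0xFF,    #  8 bits
        "byte":          addr & 0xFFF,           # 12 bits
    }

ix = dat_indexes(0x0000_0000_7654_3FFF)
print(hex(ix["segment"]), hex(ix["page"]), hex(ix["byte"]))
```

The 8-bit page index is why page tables are the odd ones out: 256 entries of 8 bytes is only 2 KB, half a page, while the 11-bit levels can need up to four pages (2048 entries × 8 bytes), hence the table-length field.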
So we can have huge segment table entries and huge region third table entries. If you want to set up an address space, you have to use an ASCE, an address space control element. They are stored in control registers 1, 7 and 13. So we have how many spaces? Three. More or less. They have a designation type, which tells you which region table or segment table comes as the first table. So we can actually have only two tables if we want to: we can have just a segment table and a page table, or we can have one segment table with huge page entries, and then we only have one table. Quite nice. So we have four address spaces. You forgot one thing. Go back. One interesting thing is the R bit, because you can still specify that an address space is a real space. So even though you have enabled paging, you can still use real addresses in a specific address space, because why not? Yeah. Nobody uses that, but we have to support it. So we have three address spaces. The first two are for user space: the primary space and the secondary space. The thing with the secondary space is that instruction addresses come from the primary space, but data comes from the secondary space. And that's actually how the Linux vDSO is implemented. So it's quite nice to have a second data space where you can grab data from, which isn't in your primary space. And then we have the home space, which is basically kernel space. A user space program is not able to go into kernel space. The spaces are controlled via the PSW bits, and as Claudio said earlier, you can set that on and off if you are in privileged mode. So if I understood this correctly, you did not do any address space layout randomization for the vDSO object? That's a good question. We have to look that up. Okay. That's a Linux-specific question anyway; nothing to do with the architecture specifically.
You see, one interesting thing: I said we have three address spaces, but there's one more thing, the access register mode. Have fun. Okay. So, do you remember, I told you we have 16 access registers. Each of these access registers indirectly designates an ASCE, an address space control element, so basically an address space. When you are in access register mode and you reference data using a general register (many instructions have a base register plus an offset), the number of the base register is used to index the access registers. So if you're using register one, the content of register one is added to the offset, but also the first access register is used to choose an address space, and then that address is accessed using that address space. If you're using the fifth register, the content of the fifth register is added to the offset, and then the fifth access register is used to identify the address space which will be used to access the address you just generated. If you're thinking: why? Well, this was introduced in the 90s, when the architecture was still 31 bits, and memory was already bigger than two gigabytes, so they needed a way to access more than two gigabytes of memory, for example for large databases. And that's why we now have these, and we cannot get rid of them. And of course, there is an ALB, the ART lookup buffer; it's like a TLB, but for address spaces. It translates this address space number, with a two-level table structure, into a proper ASCE, which is then the root of the page table tree. So there's not only the TLB to take care of, but also the ALB, because, you know. And access register zero always indicates the primary space, because, I guess. So yes, you can fetch data from 16 different virtual address spaces at the same time if you want, but with great power comes... yeah. And this is how it works, kind of.
So you have the instruction, with, as I said, the base register and the displacement. You take the content of the register plus the displacement, and you get an address, like normal. But you also take the register number, index the access registers, do a lookup of that, and you get an address space, and then that address space is used to translate the virtual address into a real address. Then, of course, you need to do prefixing. So you can have seven layers of tables before you get to an address, if you really want to. Yes, lots of time. So we have this interesting stuff with different address spaces, right? The kernel has its own address space and user space has its own address space. So how do you move stuff from kernel space to user space? I mean, the kernel needs to read from user space, do things, and then write back to user space. In the Linux kernel, there are functions called copy_to_user and copy_from_user. On other architectures, they do things. On our architecture, it's exactly this instruction. What happens is that the first operand is replaced by the second operand; this is basically a move operation. And the bits in general register zero determine the address space control modes and protection keys that are used to access the first and second operands. So you can say: move from this address space to that other address space. I'm not going to go through the whole documentation. And the interesting thing is that it moves at most 4K of data. If you have more than that to move, you get a condition code, and you basically loop on the condition code until you're done. So we have copy_to_user in one instruction. Because why not? We don't really have memory-mapped I/O. We have channel I/O. Very interesting stuff. Normal channel I/O is basically disks, tapes, 3270 consoles, and punch cards. Network is a bit of a special I/O; there are separate instructions for that. We have PCI Express.
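Going back to the copy_to_user instruction for a moment, the loop-on-condition-code pattern looks roughly like this in Python. This is a simulation of the software pattern only: `src` and `dst` stand in for buffers in two different address spaces, and the specific condition-code value used for "partial completion" is an assumption of the sketch.

```python
# Sketch of the copy_to_user pattern: the move instruction handles at
# most 4 KiB per execution and reports via the condition code whether
# there is more to do, so software just loops on it.
CHUNK = 4096

def mvcos_once(dst: bytearray, src: bytes, off: int) -> int:
    """Move up to 4 KiB; return CC 0 when done, nonzero otherwise."""
    end = min(off + CHUNK, len(src))
    dst[off:end] = src[off:end]
    return 0 if end == len(src) else 3

def copy_between_spaces(src: bytes) -> bytes:
    dst = bytearray(len(src))
    off, cc = 0, 3
    while cc != 0:                  # loop on the condition code until done
        cc = mvcos_once(dst, src, off)
        off += CHUNK
    return bytes(dst)

data = bytes(range(256)) * 40       # 10240 bytes: needs three iterations
print(copy_between_spaces(data) == data)
```

The same "partial completion, retry" idiom appeared earlier with the convert-UTF instructions; it is a recurring design on this architecture.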
We have crypto cards on their own bus. And we have text consoles in ASCII and EBCDIC. Yes. Wonderful. And we still do IPL, which means boot, from virtual punch cards sometimes. Sometimes regularly. Yes? Repeat the question, repeat the question. We don't have punch card devices, as far as I know; we have virtual devices for that. So I'm actually not sure if you can put an old punch card reader from the 60s onto a new machine. It would be interesting to try. Each channel has a 16-bit identifier which identifies the device. And this channel executes channel programs. They consist of channel command words, and they basically tell the device: well, I want some I/O from this disk, how much I/O I want, and where to put it. We have instructions which tell the channel to do the I/O, and at the end we get I/O interrupts on every CPU that's enabled for I/O interrupts, and they can then choose to take it or leave it. So normally we don't have I/O interrupts directed at specific CPUs. Yes. This is a very nice instruction: LOAD LOGICAL AND ZERO RIGHTMOST BYTE. You have a memory operand: a base register, an index register, and a 20-bit signed displacement. This, by the way, is the lower part of the displacement, and this is the high part of the displacement. Yeah. In the end, you just read four bytes from memory and put them into a register. You zero out the highest bytes, but also the lowest byte. So basically, you're reading three bytes from memory and putting them in a register. And fun fact: it is unpredictable whether an access exception is recognized for the rightmost byte of the second operand. So if your access is not aligned and the last byte is on a different page, maybe with different permissions, you don't know if you will trigger the protection or not. Maybe, maybe not. I mean, you're not getting the byte anyway, because it's going to be zero, right? Time? Time.
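The net effect of that instruction is simple enough to show in a couple of lines. A sketch, ignoring the unpredictable-access-exception part, which by its nature cannot be simulated:

```python
# Sketch of LOAD LOGICAL AND ZERO RIGHTMOST BYTE: four bytes are read
# from memory into a 64-bit register, the upper half of the register is
# zeroed, and so is the rightmost byte -- three useful bytes in the end.
def llzrgf(memory: bytes, addr: int) -> int:
    word = int.from_bytes(memory[addr:addr + 4], "big")  # 32-bit fetch
    return word & 0xFFFFFF00       # high 32 bits are already zero here

mem = bytes.fromhex("11223344556677")
print(hex(llzrgf(mem, 1)))         # reads 22 33 44 55 -> 0x22334400
```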
So there are some special dedicated instructions for timing. There's the TOD clock, the time-of-day clock, which is increased in real time; it's a global clock. And then you have the CPU timer, which is decreased only when the CPU is dispatched. You basically cannot run an operating system on bare metal... although, kind of, you can, some, but not all of them, but it's not supported anyway. You can't. Yeah, you can't. So this timer is only decreased when the CPU is dispatched, so it indicates the CPU time that has really been used, and it generates an interrupt when it reaches zero. And then you have a clock comparator, which instead compares a value, which you load with a special instruction, against the current TOD clock. When the TOD clock goes past it, you get an interrupt. This is used, for example, for scheduling and for many other timing functions, because this is actually real time instead of CPU time, which might depend on how loaded the machine is in general. Instructions? The first two bits of an instruction indicate its length. So it's not like Intel, where you have no idea how long an instruction will be. You have two-, four-, and six-byte instructions, and just a couple of different instruction formats with a couple of variants. I'm not going through them. Let's go to the branch instructions instead. There are three main types of branch instructions. You have branch and link, which is basically used for subroutine calls. It's an unconditional jump: you jump to the destination and you save the return address, so that you can jump back there when you're done. This is the subroutine call. Branch and count: you give a register, the register is decreased, and you jump if it's not zero. It's for loops. And then branch on condition, which is conditional or unconditional or a no-op, depending on how you use it. You have a four-bit bitmask of condition codes.
I said the condition code is two bits, so it has four different values. And here you have a four-bit bitmask: one mask bit for each possible value of the CC. If there is a match, then you jump; otherwise not. So if the bits are all ones, it always matches: an unconditional jump. If some are one and some are zero, it's a conditional jump. If they are all zero, you never jump: it's a no-op. Yeah, speaking of which, BRANCH ON CONDITION. This is exactly what I said just now: you have the target, the M field is the bitmask, and you jump if the condition matches. Except that if the M field is a specific value and the register is zero, then this is not a branch instruction anymore, but is used as a checkpoint synchronization or just a serialization instruction instead, like memory barriers. Why use a branch instruction for that? But it's there. That's how memory fences are done. Oh, EXECUTE. Yes, and I checked, this was actually already there in the 60s. You have an address, again base register plus index register plus fixed displacement. And what happens is that the instruction at that address is executed. It's not a jump; you're not jumping there. You're just executing that one instruction and then continuing. Bonus: the R1 field is a register, and the lowest byte of the register is ORed with the second byte of the target instruction before it's executed. Stop laughing. I was told we are using that in libc. And the current address is the one after the execute. So if the target instruction is a call instruction, the return address is the one after the execute instruction, so you continue after the execute instruction. If it's a normal jump, then you just jump. But if it's a, you know, return, it's... yeah. I have no idea why they needed this, but it's there. And if you try to execute another execute instruction, you get an execute exception, which is the one we mentioned before, of course. Floating point.
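Before the floating point part, here is the condition-mask rule from BRANCH ON CONDITION as a one-liner. The convention is that the leftmost mask bit corresponds to CC 0; a sketch in Python:

```python
# Sketch of the four-bit condition mask: each mask bit corresponds to one
# of the four condition code values, leftmost bit for CC 0, and the
# branch is taken if the bit for the current CC is set.
def branch_taken(mask: int, cc: int) -> bool:
    return bool((mask >> (3 - cc)) & 1)

print(branch_taken(0b1111, 2))   # mask 15: always taken (unconditional)
print(branch_taken(0b0000, 2))   # mask 0: never taken (a no-op)
print(branch_taken(0b1000, 0))   # "branch on CC 0", e.g. branch on equal
```

This is how one generic branch instruction covers unconditional jumps, every conditional variant, and no-ops, all with the same opcode.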
We have hexadecimal floating point. Don't laugh: this was before the IEEE standard existed. Imagine, in the 60s you already had 64-bit floating point. Not bad. And then binary floating point, which was actually introduced quite recently, and decimal floating point in hardware, which is used by banks for bank stuff. But it's there; there are decimal floating point instructions. STORE FACILITY LIST EXTENDED. So on Intel, you have this CPUID instruction, where you have to put some strange values in some strange registers, and then you get something back, some bits here and there, and you have to guess what each bit means, which feature it means. Here, instead, you just give an address, and the CPU will write into that memory area a sequence of bits. And that's all your features. Most of your features. Yeah, most, yeah, right. The fun thing is that it can be up to two kilobytes, but it's not specified whether the extra bits are actually stored or not. So the CPU is allowed to store only the bits that are actually set and not go further, which means you have to zero everything first if you want to check. Yes, any other interesting things? I don't think so, right? Oh, this is not a privileged instruction, by the way; it can be run by user space. Oh! And this is basically Godzilla. It has 15 pages of documentation, and we'll have a look at it, yeah. So this is PERFORM LOCKED OPERATION. It's used to do some interesting atomic operations, like compare and swap. Although we do have a separate compare and swap instruction, this can do that as well, because why not? It has two implied registers: register zero and register one are always used for locking. And there is a function code, which is used to determine which function to perform, and then you have these fields, B2 and D2 and B4 and D4, which point to some memory areas where there's a list of where to find the operands for these instructions.
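Stepping back to the facility list for a moment: checking a facility bit in the stored list looks like this. Bits are numbered left to right within each byte (bit 0 is the most significant bit of byte 0), and, as the talk notes, the buffer has to be zeroed before the store, since the CPU need not write bits beyond the installed facilities. The helper name is ours.

```python
# Sketch of testing a bit in the facility list stored by
# STORE FACILITY LIST EXTENDED.
def facility_installed(fac_list: bytes, bit: int) -> bool:
    byte, shift = divmod(bit, 8)
    if byte >= len(fac_list):
        return False                    # beyond what was stored: not set
    return bool(fac_list[byte] & (0x80 >> shift))

buf = bytearray(2048)                   # zero everything first, then "STFLE"
buf[0] = 0b1100_0000                    # pretend facilities 0 and 1 exist
print(facility_installed(bytes(buf), 1))
print(facility_installed(bytes(buf), 2))
```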
We have compare and load, compare and swap, double compare and swap, compare and swap and store, compare and swap and double store, compare and swap and triple store. And we have those on a lot of operand sizes, up to 128 bits, because reasons. Yeah, if you want to have fun, this is a list of other interesting instructions that we didn't have the time to present; you can look them up yourself. Throughout the presentation, you have seen this "PoP" and then some numbers. This is a reference to the chapter and page number in the Principles of Operation, which you can download freely if you're good at Googling, or you can register on the IBM website and get it anyway. It contains basically everything we have presented here and many more things, including the description of every instruction. So if you want to have a look at PERFORM LOCKED OPERATION, all 15 pages of it, it's in there. 2,000 pages of fun. Yup. If you really want to play with it, well, you can buy a mainframe. There have been some people who actually did that. There was a college student in the US who did that. Recently, some people from the UK bought an old mainframe out of mines somewhere. Nomek. Nomek? Yeah, I forgot it. There are the zPDT tools from IBM, which are development tools, basically an emulator on x86. It will be hard for you to get it, I guess, and I don't know how much it costs. But there's also QEMU with TCG, so you can run Linux on s390x, on QEMU, on your Intel machine if you want to. We actually have some Linux distributions that run on the mainframe, namely Debian and Ubuntu, Red Hat and Fedora, which we are currently using for development on our machines, and SUSE Linux. All right, thank you very much for this interesting presentation. We still have 10 minutes for questions, so feel free to ask. What's the total number of instructions in the set? 1,500. I counted them. Okay. With a script. I didn't. Okay.
So, going back to the storage keys for physical addresses: you mentioned that the storage keys are not used because you prefer protection via virtual pages and have been doing so for ages. Do you know of any system that still uses the storage keys? All of the... Either alone or in combination with the virtual addresses. Yeah, yeah. All of the IBM operating systems. So they all do this? Yeah, yeah. Okay. On Linux we don't use it, because we don't need it; we can do everything with paging. Other architectures do it with paging, and so Linux is made to use paging, basically. There's no need to use storage keys. It would have been hard to include it into the Linux common code, yes. And Linus Torvalds would probably come screaming at you, and rightfully so. All right, thank you. Any more questions? So, with all these instructions, this is all microcode, I guess, on the CPU, because, I mean, you couldn't possibly implement all of this in hardware, or can you? Let me answer this one. So, many instructions are actually, surprisingly, implemented in hardware. Some instructions are microcoded, but most of the complex instructions are not in microcode but in millicode, which basically means they're implemented in software, kind of. As in, there is a special millicode mode in which the most complex instructions are implemented, or in some cases the fast path is in hardware and the slow path is in millicode. And this is nothing terribly new, by the way, because the DEC Alpha, and, what was it before, the VAX, also had a similar thing called PALcode. So it's not really a big innovation. But most instructions are actually implemented in hardware, and these crazy ones, the most complex ones, are in millicode.
And millicode is weird because, of course, you cannot use the instructions that are implemented in millicode, but at the same time you have some extra instructions that you can only use in millicode, to do things that you normally are not supposed to do. And they have a lot of registers, and you can actually concurrently upgrade millicode while the machine is running, which is very nice if you want to fix your CPU architecture while it's running. Can we users update the millicode, or can only IBM do this? Honestly, we never had a problem, but I think that's done by IBM. I've done a lot of development in the last few weeks with firmware developers, and I actually have no idea how they bring the code onto the machine, because I always ask them to update it. So I don't know if you're allowed to; I don't think so. Yeah, my guess is no. I don't know if you would be able to in any way, because you would need to have special access, yes. So there have been many instructions, and most of them have been commented along the lines of, well, you can do this, but you could have just used a simpler one. My question is: is there any instruction that comes to your mind where you would say it's a particularly good one, one that maybe you would wish to have on other architectures? Well, MVCOS is one of the best. Copy to user, MVCOS? Move Character with Optional Specifications. If you want to do a memmove from one address space to another address space, that's a very good one. And yeah, that means that basically our implementation of copy_to_user and copy_from_user is literally a loop over this instruction until it's done, instead of doing crazy things with page tables and stuff. So yeah. And we also have, I mean, we have instructions that basically solve one specific problem, and that problem might have arisen in one of the four or five operating systems the mainframe currently supports. Or supported. Or supported.
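The "loop over this instruction until it's done" idea can be sketched like this. `fake_mvcos` and its 4 KB chunk limit are made-up stand-ins for illustration; the real MVCOS is s390x-only and moves data between address spaces, which plain C cannot express:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for MVCOS: copies at most CHUNK bytes per call
 * and reports how many it moved, mimicking an instruction that transfers
 * a bounded amount and leaves the rest for the caller to retry. */
#define CHUNK 4096  /* assumption for this sketch */

size_t fake_mvcos(void *dst, const void *src, size_t len)
{
    size_t n = len < CHUNK ? len : CHUNK;
    memmove(dst, src, n);
    return n;
}

/* The copy_to_user shape described above: just loop over the move
 * instruction until the whole buffer has been transferred. */
size_t copy_all(void *dst, const void *src, size_t len)
{
    size_t done = 0;
    while (done < len)
        done += fake_mvcos((char *)dst + done,
                           (const char *)src + done,
                           len - done);
    return done;
}
```

The appeal is that the loop body stays trivial: no page-table walking or address-space switching in the kernel code, because the instruction itself handles the cross-space access.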
So it depends on which operating system it came from, how usable it currently is for Linux or another operating system. We have interesting things like pauseless garbage collection, Java support functions, and all that stuff. So we can do crazy, crazy things. And we're in the fortunate position to actually be able to just phone up the architecture people and say, well, Linux needs a new instruction, we need you to add it. I've been told that's actually how this copy-to-user instruction was introduced. And PLO was also introduced that way, because one operating system didn't want to solve one particular problem in software, so they solved it in the processor. It wasn't Linux, just to make that clear. All right, any more questions? Well, then I still have one. If I want to see one of those living fossils, where do I need to go? Are there some places, enterprises, museums that still use these? It's some time ago that I was there, but the University of Karlsruhe has a z10 in their underground floor. Yeah, I think KIT has a cooperation with IBM. But also in the laboratory in Böblingen where we work, we most often have the newest ones in the glass box. And we also have a museum with some of the old ones, which, if you know the right persons, you can visit. They have working S360 mainframes. But sometimes the glass boxes are not there, which means that when there's some kind of event somewhere, sometimes they get moved to the event. So if there is an event with IBM mainframe stuff, maybe, if you're lucky, you can get to see the inside of a mainframe there. Yeah, the KVM Forum would be one, and the Open Source Summit sometimes. Okay, I think that's the time to wrap it up. Once again, thanks a lot for this cool presentation, and have fun for the rest of the day.