First of all, I'm going to go into what AAP is. This is what it looks like, a sort of high-level block diagram. It's a Harvard architecture, so the code and the data are completely separate. We've got a variable number of registers, which I'll talk about a bit more later, varying between four and 64 registers. They're 16 bits wide each, and all of the operations work on 16 bits. We've got 16 bits' worth of data memory, so 64 kilobytes of data memory, and 24 bits of code memory, which is word-addressed, so we've got a maximum of 16 megawords of addressable code. At the moment the architecture looks very sane: it's a RISC architecture, and it's load-store, so if you want to do anything you have to load from memory into your registers, do the operation there, and push it back. At the moment the instruction set is relatively straightforward; there's nothing that surprising in there. The purpose of this architecture is mainly to form a good basis. It's simple at the moment, and it's going to become more complicated as we add new things and try new things out.

That's basically what I said just now. I want to go into a bit about the motivation, about why we're doing this. As I said, our main business is working on compilers and the like, so why come up with an ISA, why come up with an architecture? There are a number of reasons. One is that it's very good for training: if we've got new employees, or if we want to help out the community, it's very useful for someone learning to write compilers to have something to work on that doesn't involve digging deep into the x86 or ARM compilers, which are massive and complicated. Having a good basis which is well documented and open is very useful from that perspective.

For us, it's also very useful for experimenting with new features, and this ties into the main point. Customers come to us wanting a compiler for their new architecture, and we can't necessarily talk about that architecture. The compiler and the tools for it can't necessarily be upstreamed, so we can't necessarily have that architecture in-tree in the main source code for LLVM, GCC, and similar. But we would like to have something in-tree which represents that architecture, because otherwise we're working out of tree: we're on our own, we have to maintain everything, we have to make sure that nothing in the main-tree compiler breaks anything we're doing, and we don't have an in-tree architecture through which we can push our changes back and give something to the community. That's the main reason we want to do this, and from that perspective the experimentation side matters too.

By having an architecture which we can play around with and modify, if a customer has some strange or interesting feature in their architecture, we can add it to ours, implement it there, demonstrate it can be done, and then push that into the main tree where that's allowable. It's good practice, and it demonstrates we can do it.

I just want to quickly go into a couple of the problems. The aim is to address problems we've seen in compilers: when a compiler stumbles on an architectural feature that isn't represented in-tree, we want to fix that. So I'm going to talk about a couple of the things we've had to address even in this relatively simple architecture so far. The main premise is that compilers often assume your architecture is boring and straightforward. They're not used to very weird architectures; things like DSPs throw a spanner in the works and are more difficult to support.

As an example in AAP: our data memory is only 16 bits, and our registers are 16 bits, so a single register can reference all of our data memory, and that's all well and good. Our code memory, however, is too big. We can't reference all of it, so function pointers can't reach all of our code memory through a single register; our function pointers can't live in a single register. We then have to pair up registers, and that in itself isn't a problem; compilers handle that fine. The difficulty comes when you want to do both: some things we want to reference with one register, some things with two. When we're using function pointers we want two registers; when we're referencing data we want one. However, the compiler tends to assume that one pointer size fits all: you have one pointer type, and it's assumed to be big enough for everything. That leaves us a couple of choices. You can use the smaller size, because it's more efficient, and try to reference everything with it, but then you can't reach all of your code, which is not helpful; sometimes you can get around that with a workaround or a hack, but that's not ideal. The alternative is to use a register pair everywhere, so all of your data is referenced with two registers, and that's mind-blowingly inefficient, so you don't want that either. Overall, the actual problem comes down to fixing this assumption in the compiler: it's an assumption that doesn't hold for us, and it doesn't hold for our customers, so it needs to be fixed.
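To make that concrete, here's a minimal C sketch of the two pointer widths being described. The type names are invented for illustration; they aren't from the AAP toolchain.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative model of AAP's two address spaces (sizes from the
   talk; the type names are made up for this sketch).             */
typedef uint16_t data_addr_t; /* one 16-bit register reaches all
                                 64 KB of data memory             */
typedef uint32_t code_addr_t; /* a 24-bit code address needs a
                                 pair of 16-bit registers         */

int main(void) {
    /* Compilers tend to assume one pointer size fits all; here a
       function pointer genuinely needs more bits than a data
       pointer, so that assumption breaks.                        */
    printf("data pointer: %zu bytes, code pointer: %zu bytes\n",
           sizeof(data_addr_t), sizeof(code_addr_t));
    return 0;
}
```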
Another issue we come across: we're a load-store architecture, so we can only do calculations on values in the register file. We have to load things in first, and once we've run out of registers we have to shuffle values between registers and memory; we can't operate directly on memory. The code to do that shuffling, the spilling and reloading of registers, is all produced by the compiler. So we've got our obvious example, which is c + d. We've got some stuff in registers already, and we've got the values here. We first have to store what's already there, so store one register, then a second register, out to memory. Then we load in the values we actually want to do the calculation on, which are c and d. And then, finally, we actually do the add.
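Here's a rough C model of that sequence, with the compiler-generated steps spelled out in comments. The function, variable names, and register assignments are invented for illustration, not actual AAP compiler output.

```c
/* Model of the spill/reload traffic around a single c + d.
   Pretend r2 and r3 already hold the live values a and b and
   there are no free registers left.                          */
int add_example(int a, int b, int c, int d) {
    volatile int spill_a, spill_b; /* stack slots for the spills;
                                      volatile keeps the stores  */
    spill_a = a;            /* 1. store r2 out to memory      */
    spill_b = b;            /* 2. store r3 out to memory      */
    int rc = c;             /* 3. load c into r2              */
    int rd = d;             /* 4. load d into r3              */
    return rc + rd;         /* 5. add r2, r3 -- the only
                                  instruction doing real work */
}
```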
Now, in all of that, we've got five instructions to do the add, and four of them are completely useless; they're not doing anything interesting. The add is the actual guts, where the operation happens, but we've got to do all of this shuffling of stuff around it to make it happen, and you want to minimise that, basically. Compilers do a reasonably good job of this, but it's quite a hard problem: it's difficult to compute how to spill and restore registers, when to push values out to memory and when to pull them back. The problem we have is that compilers often assume you have quite a lot of registers, at least eight or 16, and with eight or 16 registers it doesn't matter quite as much if you don't use them all effectively. If you waste one register out of eight, that's not so bad; but if you've only got four registers, as we're aiming to support with AAP, you've lost a quarter of your register space just by using it inefficiently. So we're addressing that too: handling the case where everything is much more constrained than targets are typically used to.

Those are a couple of the problems we've seen. There's a whole bundle of other, smaller things we've encountered, but these are the kinds of issues we want to fix in compilers, and we want to fix them in a way that lets us put those fixes into the main source tree and have them supported and maintained there.

So, back to the high-level view, and where AAP has come from. It started out at FOSDEM last year, and that was Simon's work: Simon did the initial ISA design and the initial compiler, and he brought that up incredibly quickly. We did a big bundle of development in this gap here, and then, almost exactly a year ago, we had a high-school student come in to do an FPGA implementation, which runs on a DE0-Nano. That was Dan Gorringe, and it's also available. We talked about it at ORConf, and we also talked about it at FOSDEM this year. Just before the end of last year, Simon started working on the simulator, which is what he's going to talk about now.

As for the current state of things: our ISA is at version 2.1, with a few, mainly relatively small, changes. We've got a compiler, which is Clang/LLVM based. We've got a basic debugger. We've obviously got the FPGA implementation; sadly we don't have a portrait of Dan, but he left this on the whiteboard for us. And we've got the simulator, which Simon's going to talk about now.