So, we're here to present our work on LuaJIT for new platforms, which was sponsored by... I suppose many of you are familiar with LuaJIT, a JIT compiler for the Lua programming language; its current release version is 2.0.4. It supports many... well, not that many platforms, but most of the popular ones. Among others, there is support for 32-bit MIPS and ARM, but most noticeably, support for MIPS64 and ARM64 is missing. And basically, that was our task: to make LuaJIT runnable on these platforms. So, we started with the MIPS64 interpreter. That's the first thing that we built. Then we continued with the ARM64 JIT, because the ARM64 interpreter was already available. We developed the ARM64 JIT in collaboration with a company, and these changes became available a few months ago on the LuaJIT development branch on GitHub. And now, as of a few days ago, we have finished the work on the MIPS64 JIT, and we submitted the patch with the changes to the maintainer for review and upstreaming. So, this is a little announcement. Next, we will start working on MIPS64 soft-float.

So, a little bit about the changes between the two main versions, 2.0 and 2.1. The current release version supports only 32-bit GC references, which is suboptimal for many 64-bit architectures. This problem is solved by the newly introduced GC64 mode, which allows pointers of up to 47 bits, meaning that you can now use memory from the lower 47 bits of the address space instead of only the lower 32 bits. At first, GC64 mode was implemented only for the interpreter on the x64 and ARM64 platforms. Not long after that, the first JIT port of that mode became available, for x64, and we basically used it as a reference in our work.
So, a little bit about the differences. Before GC64 was introduced, all LuaJIT values, which are 64 bits long, were represented with 32 bits for the type and 32 bits for the pointer. Why is that? Well, as you know, Lua is a dynamically typed language, so there needs to be a way to distinguish the types, and LuaJIT uses NaN-tagging for that purpose. As you know, the IEEE standard for floating-point numbers defines a range of values as NaNs (not-a-number), and some of the values in this range are free to use, so LuaJIT uses them for tagging, meaning encoding the type of the Lua object alongside the pointer. And, as we said, in the earlier version there were 32 bits for the type. Of course, there were exceptions, like light userdata on 64-bit platforms, where the pointer is extended to 47 bits, for example. And, of course, when the Lua object is a number, all 64 bits are used to represent the double value. What the new GC64 mode introduced is that all pointers can now be 47 bits long, and out of the other 17 bits, four are used for tagging, that is, for the type of the object, which is enough, because LuaJIT uses up to 16 different types of values. The remaining 13 bits, the most significant ones, are all set to one, and they are kept that way so that number values can still be recognized. So, that's basically it for the layout. Now let's see what the main changes for GC64 mode were. In most parts of the code, there are three basic operations for handling values: you often need to extract the pointer, set the type of the pointer, or extract the type. Here we can see how these basic operations are done on ARM64, starting with pointer extraction.
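To make the layout concrete, here is a minimal C sketch of a GC64-style value and the three basic operations. It assumes exactly the layout described above (13 ones, a 4-bit tag, a 47-bit pointer). Storing the tag as its complement mirrors how LuaJIT numbers its internal types, but the tag values and the helper names (`get_ptr`, `set_type`, `has_type`) are hypothetical, not LuaJIT's API. On ARM64 these roughly correspond to an AND with the 47-bit mask, an ADD with the tag register shifted left by 47, and a shift right by 47 followed by a compare.

```c
#include <assert.h>
#include <stdint.h>

/* GC64-style 64-bit value, as described in the talk:
     bits 63..51  13 bits, all set to one
     bits 50..47  4-bit type tag
     bits 46..0   47-bit pointer
   Tag numbers used with these helpers are illustrative only. */
#define PTR_BITS 47
#define PTR_MASK ((UINT64_C(1) << PTR_BITS) - 1) /* lower 47 bits set */
#define TOP_MASK UINT64_C(0x1FFFF)                /* 17 bits above the pointer */

/* Upper 17 bits for tag t: the complement ~t truncated to 17 bits, which is
   13 ones followed by the 4-bit field (15 - t). */
static uint64_t itype_bits(unsigned tag) {
  assert(tag < 16);
  return ~(uint64_t)tag & TOP_MASK;
}

/* 1. Pointer extraction: one AND with the 47-bit mask. */
static uint64_t get_ptr(uint64_t v) {
  return v & PTR_MASK;
}

/* 2. Pointer tagging: add the type bits shifted left by 47. The pointer only
   uses the low 47 bits, so the addition never carries into the tag. */
static uint64_t set_type(uint64_t ptr, unsigned tag) {
  assert((ptr & ~PTR_MASK) == 0);
  return ptr + (itype_bits(tag) << PTR_BITS);
}

/* 3. Type check: shift right by 47 to isolate the upper 17 bits, then
   compare with a constant. */
static int has_type(uint64_t v, unsigned tag) {
  return (v >> PTR_BITS) == itype_bits(tag);
}
```

For example, `set_type(0x123456789AB, 4)` produces a value whose upper 13 bits are all ones and whose low 47 bits are still the original pointer.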
There is a defined constant with the lower 47 bits set, and we can use it in an AND instruction to extract just the pointer value. Pointer tagging is basically the reverse process: we put into a register a constant that defines the type, and then we use it in the next instruction, an addition. This register is used as an operand that is shifted left by 47 bits within the same instruction, to place the type bits in the proper location, while the lower 47 bits of the original register, which holds the pointer, are kept. For type checking, you only need to isolate the upper 17 bits, which is done with a shift-right instruction, and after that compare with a constant to determine whether the pointer is of the desired type. Something similar is done for MIPS64. There we use instructions from the Release 2 instruction set, which is really useful, because for these operations we can use a much smaller number of instructions, one or a couple. For pointer extraction, we have a special MIPS extract instruction; for pointer tagging, there is also a special insert instruction; and type checking is done more or less the same way as on ARM64: shift right by 47 bits to isolate the type, and then compare. So, when someone wants to port LuaJIT to a new platform, they basically just need to take care to implement these three basic operations in the most efficient way, with as few instructions as possible. Okay, so now I will go a little bit into the source code and explain what someone needs to do if they want to port LuaJIT to another architecture, all under the assumption that an interpreter is already implemented. So, you have an interpreter, and you want to implement the JIT.
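The MIPS64 Release 2 variant relies on bit-field extract and insert instructions. The C functions below are a hypothetical model of what such instructions compute (they are not the instructions themselves, and the names are made up), assuming fields with 0 < size < 64. With them, pointer extraction is a single extract of bits 0..46, and tagging is a single insert of the 17-bit type into bits 47..63.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a bit-field extract: `size` bits of x starting at bit `pos`,
   zero-extended. Assumes 0 < size < 64 and pos + size <= 64. */
static uint64_t bf_extract(uint64_t x, unsigned pos, unsigned size) {
  assert(size > 0 && size < 64 && pos + size <= 64);
  return (x >> pos) & ((UINT64_C(1) << size) - 1);
}

/* Model of a bit-field insert: replace `size` bits of dst at bit `pos`
   with the low `size` bits of src. Same assumptions as above. */
static uint64_t bf_insert(uint64_t dst, uint64_t src, unsigned pos, unsigned size) {
  assert(size > 0 && size < 64 && pos + size <= 64);
  uint64_t field = (UINT64_C(1) << size) - 1;
  return (dst & ~(field << pos)) | ((src & field) << pos);
}

/* The GC64 operations expressed with the two primitives. */
static uint64_t gc64_get_ptr(uint64_t v) {
  return bf_extract(v, 0, 47);
}

static uint64_t gc64_set_type(uint64_t ptr, unsigned tag) {
  /* ~tag truncated to 17 bits: 13 ones followed by the 4-bit tag field. */
  return bf_insert(ptr, ~(uint64_t)tag, 47, 17);
}

static uint64_t gc64_top17(uint64_t v) {
  return v >> 47; /* isolate the type bits; a compare follows */
}
```

In hardware terms, MIPS64 R2 provides separate variants of extract and insert (for example DEXTM and DINSU) for fields wider than 32 bits or starting above bit 31; the ISA manual gives the exact operand rules.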
One of the first things you need to do is implement some pieces in the interpreter, because some parts of the interpreter are only used when you have a JIT: things like entry to and exit from JIT code, detection of hot code that needs to be JIT-compiled, and so on. You need to define the instructions for your architecture, the registers it uses, and so on. You need to implement the emitters for instructions: there are different types of instructions, such as instructions that use three registers, or two registers and a constant, and you want to implement emitters for all of them. Then there is the IR-to-machine-code transformation: LuaJIT emits its own IR for Lua code, and that IR needs to be transformed into machine code for your architecture, and this is where you do that. And you also want to implement the disassembler for your architecture, which is just what it sounds like. So, the first thing is the interpreter. As I said, hot code detection, exits from JIT code, entry into JIT code, etc. are all things you need to implement. This is the function that handles exits from JIT code: when something happens in JIT code, so it finishes or it fails, you want to go back to the interpreter, and almost every example you write is going to go through this function. So you want to get this function implemented properly at the very beginning. In this part, first of all, you want to save the state of the JIT code: you save all the registers that you use and put them on the stack. Those three dots stand for some omitted code; that code just stores all of the registers to the stack. You also want to make sure that you extract the exit number and the trace number properly; how that is done differs for almost every architecture.
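A rough C model of the first steps of an exit handler, assuming a hypothetical scheme in which exit stubs are placed consecutively, `STUB_SIZE` bytes apart, and each one reaches the handler with a link address pointing just past itself. The struct and function names are illustrative, not LuaJIT's; a real handler is written in the VM's DynASM assembler, and its exit-number scheme is specific to each architecture.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_GPR 32   /* hypothetical register-file sizes */
#define NUM_FPR 32
#define STUB_SIZE 4  /* hypothetical: one 4-byte instruction per exit stub */

/* State dumped to the stack on a trace exit (illustrative layout). */
typedef struct {
  uint64_t gpr[NUM_GPR];
  double fpr[NUM_FPR];
  uint32_t exitno;   /* which exit of the trace was taken */
  uint32_t traceno;  /* which trace was running */
} ExitStateSketch;

/* Recover the exit number from the link address left by the stub:
   it points just past the stub that was taken. */
static uint32_t exitno_from_link(uintptr_t stubs_base, uintptr_t link) {
  assert(link > stubs_base && (link - stubs_base) % STUB_SIZE == 0);
  return (uint32_t)((link - stubs_base) / STUB_SIZE) - 1;
}

/* First steps of the handler: save every register, then record which exit
   of which trace was taken. A real handler would then hand this state to the
   VM, which restores the interpreter state and resumes interpretation. */
static void exit_handler_sketch(ExitStateSketch *ex, const uint64_t gpr[NUM_GPR],
                                const double fpr[NUM_FPR], uintptr_t stubs_base,
                                uintptr_t link, uint32_t traceno) {
  memcpy(ex->gpr, gpr, sizeof ex->gpr);
  memcpy(ex->fpr, fpr, sizeof ex->fpr);
  ex->exitno = exitno_from_link(stubs_base, link);
  ex->traceno = traceno;
}
```

Saving the registers first matters because the exit and trace numbers, together with the saved values, are what the VM needs to rebuild the interpreter's view of the Lua stack.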
Then there is the target file, which is a file with a lot of defines and macros. You want to define all the instructions you are going to use, with their complete encodings; you can usually find those in the architecture's reference manual. You want to define the instruction fields: the destination register goes in one position of the encoding, the source registers in other positions, and so on; those are the instruction fields. You also want to define the registers: how many general-purpose registers your architecture has, how many registers the floating-point unit has, which registers are used for function calls, etc. This is an instruction emitter. We have two functions here; these are the simple ones. The first function emits an instruction with three register operands, D, N and M. You pass it the assembler state, an instruction and three registers, and it merges all that information and writes it at the machine code pointer. The branch emitter emits a PC-relative branch or jump instruction. It just needs to subtract the current PC from the target address and encode the result. If you're not familiar with this, it may not make much sense, but it's not that complicated. This is a slightly more complex emitter. It emits a load from, or a store to, a pointer: you pass it a pointer and a register, and you want the register to hold the data that the pointer points to. This one is for ARM64, and ARM has some specific instructions; for instance, it has a PC-relative load. So you first ask: can I do a PC-relative load? If you can, you do a PC-relative load. Otherwise, you try something different, a GL-relative load, where GL is LuaJIT's global state. If that is not possible either, you allocate a register for the pointer, load the address into it, and then load from that register. That's the slow path. Then there is the IR assembler: it has a function for every LuaJIT IR instruction.
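Below is a compact C sketch of a target-file fragment and the emitters described above, for ARM64. The ADD and B encodings and the Rd/Rn/Rm/shift field positions follow the A64 instruction set; the macro names, the `AsmStateSketch` struct and `choose_load` are hypothetical stand-ins, not LuaJIT's definitions. As in the talk, code is written backwards, so the machine code pointer `mcp` is pre-decremented. `choose_load` only models the decision order (PC-relative, then GL-relative, then via a register), using the offset ranges of the A64 LDR (literal) and LDR (unsigned immediate) forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t MCode; /* one A64 instruction word */
typedef unsigned Reg;

/* Target-file fragment (hypothetical names, A64 encodings). */
#define A64_ADD_X 0x8b000000u /* ADD Xd, Xn, Xm{, LSL #imm6} */
#define A64_B     0x14000000u /* B label, imm26 counted in words */

#define F_D(r)   ((uint32_t)(r))       /* Rd: bits 0..4    */
#define F_N(r)   ((uint32_t)(r) << 5)  /* Rn: bits 5..9    */
#define F_M(r)   ((uint32_t)(r) << 16) /* Rm: bits 16..20  */
#define F_LSL(n) ((uint32_t)(n) << 10) /* imm6: bits 10..15 */

typedef struct {
  MCode *mcp; /* code grows backwards: mcp points at the last emitted word */
} AsmStateSketch;

/* Emit an instruction with three register operands D, N, M. */
static void emit_dnm(AsmStateSketch *as, uint32_t ins, Reg d, Reg n, Reg m) {
  assert(d < 32 && n < 32 && m < 32);
  *--as->mcp = ins | F_D(d) | F_N(n) | F_M(m);
}

/* Emit a PC-relative branch: offset = target - PC, in instruction words. */
static void emit_branch(AsmStateSketch *as, uint32_t ins, MCode *target) {
  MCode *p = as->mcp - 1;       /* address the branch will occupy */
  ptrdiff_t delta = target - p;
  assert(delta >= -((ptrdiff_t)1 << 25) && delta < ((ptrdiff_t)1 << 25));
  *--as->mcp = ins | ((uint32_t)delta & 0x03ffffffu);
}

/* Decision order of the load-from-pointer emitter. */
typedef enum { LOAD_PC_REL, LOAD_GL_REL, LOAD_VIA_REG } LoadKind;

static LoadKind choose_load(uintptr_t pc, uintptr_t gl, uintptr_t p) {
  intptr_t dpc = (intptr_t)(p - pc);
  /* LDR (literal): word-aligned, signed imm19 words, about +/-1 MiB. */
  if ((dpc & 3) == 0 && dpc >= -((intptr_t)1 << 20) && dpc < ((intptr_t)1 << 20))
    return LOAD_PC_REL;
  intptr_t dgl = (intptr_t)(p - gl);
  /* LDR Xt, [Xn, #imm]: unsigned imm12 scaled by 8, so 0..32760. */
  if ((dgl & 7) == 0 && dgl >= 0 && dgl <= 4095 * 8)
    return LOAD_GL_REL;
  return LOAD_VIA_REG; /* slow path: put p in a register, then load */
}
```

For example, `emit_dnm(&as, A64_ADD_X, 0, 1, 2)` writes 0x8b020020, which is `add x0, x1, x2`; OR-ing `F_LSL(47)` into the opcode gives the shifted-operand ADD used for GC64 tagging.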
The ones presented here are the minimum and maximum IR instructions. They are implemented in one function, and the individual instructions are defined as macros that call it. So when LuaJIT encounters a MIN IR instruction, it goes to the asm_min function. There are different paths for floating-point and for integer minimum and maximum, because you want to do different things and use different registers. This example shows the integer minimum and maximum. Minimum and maximum are three-operand instructions: there is the result and two values, left and right, and you want to know which one is the minimum (or the maximum). So you allocate the registers; all these functions that start with ra_ are register allocator functions. You allocate the destination register, the left register and the right register, and you emit two instructions, a compare and a conditional select. The values are compared first and then one is conditionally selected: for a maximum, if left is greater you select left, and for a minimum, vice versa. As you can see here, the conditional select is emitted first and then the compare. That's because LuaJIT emits code backwards. If you take a look here, you can see that the machine code pointer is decremented, so the code is generated backwards. Don't let this confuse you. The other thing you want to pay attention to is optimizations. There are probably some architecture-specific optimizations you can do. For ARM64, these are some of the optimizations we implemented. For example, when you write a + b * c in Lua, you can actually get that compiled to just one instruction. And Lua is a dynamically typed, interpreted language. Getting all of that into one instruction is pretty impressive to me, because in interpreted languages you usually get a lot of instructions for an expression like that.
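A sketch of integer min/max code generation in the backwards-emitting style described above, assuming the registers have already been allocated (the `ra_` step is left out) and the values are signed 64-bit integers. CMP is encoded as SUBS with XZR as the destination, and CSEL picks its first source when the condition holds. The names are hypothetical, not LuaJIT's `asm_min_max`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t MCode;

#define A64_CMP_X  0xeb00001fu /* CMP Xn, Xm (SUBS XZR, Xn, Xm) */
#define A64_CSEL_X 0x9a800000u /* CSEL Xd, Xn, Xm, cond */
#define CC_LT 0xbu             /* signed less than */
#define CC_GT 0xcu             /* signed greater than */

typedef struct { MCode *mcp; } AsmStateSketch; /* hypothetical */

static void emit(AsmStateSketch *as, MCode ins) { *--as->mcp = ins; }

/* dest = (left OP right) ? left : right, with OP '>' for max, '<' for min.
   Emitted backwards: CSEL first, then CMP, so CMP lands at the lower
   address and executes first. */
static void asm_min_max_sketch(AsmStateSketch *as, unsigned dest, unsigned left,
                               unsigned right, int is_max) {
  uint32_t cc = is_max ? CC_GT : CC_LT;
  assert(dest < 31 && left < 31 && right < 31); /* 31 would encode XZR */
  emit(as, A64_CSEL_X | (right << 16) | (cc << 12) | (left << 5) | dest);
  emit(as, A64_CMP_X | (right << 16) | (left << 5));
}
```

After `asm_min_max_sketch(&as, 0, 1, 2, 1)`, reading the buffer forward gives `cmp x1, x2` followed by `csel x0, x1, x2, gt`, that is, x0 = max(x1, x2).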
There are also interesting optimizations concerning loading constants; you want to look at those. And you probably want to look into popular compilers like GCC and LLVM to find ideas for these optimizations. Now for the MIPS64 part: as I already mentioned, we did the interpreter port and the JIT port as well. Today we won't go into much detail about the interpreter port, except to mention that all the changes were related to the value-tagging operations mentioned earlier, and the JIT changes are pretty much the same kind of changes that were done in the ARM64 port. The next thing we are going to do, MIPS64 soft-float, will need a few changes specific to that port, because in its current state LuaJIT doesn't support any 64-bit soft-float architecture. To enable that, we need to make small changes in the split pass, precisely, disable splitting 64-bit IR instructions into 32-bit ones, and after that continue with the normal work we planned: handling floating point properly, meaning making sure that we don't emit any floating-point instructions and that we call the soft-float functions in the proper manner. So that's all of it. Here you can see the benefits of this work: these slides present the results of benchmarks run on four platforms. The red bar represents how many times LuaJIT is faster than the Lua 5.1 interpreter, and the blue bars show the difference between the LuaJIT interpreter and LuaJIT in JIT mode. So that's it. Any questions?

Q: You said that for 64-bit soft-float you're going to divide the operations into 32 bits and then call the functions. Are you going to call 32-bit soft-float routines?

A: The question is whether, for soft-float, we're going to divide the operations. The answer is that in the current state of the code, soft-float is already handled by splitting: 64-bit instructions are divided into two 32-bit instructions. We want to remove that, because it's no longer necessary.

Q: Is there any specific problem with your architectures?

A: Well, there are specifics in the interpreter
and in the JIT. In the JIT, there are a few things we have to take care of, but nothing special. In the interpreter, there are more things to take care of, because you have to implement the calling convention for that architecture: how you use the stack, how you pass structs. It can get complex. I guess that's it.
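To illustrate what the split pass mentioned in the MIPS64 soft-float part does on 32-bit targets, and why it is unnecessary on a 64-bit target, here is a value-level C sketch of a 64-bit integer add split into two 32-bit operations with a carry. It is only an analogy: LuaJIT's split pass rewrites IR instructions rather than C values, and the names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit integer held as two 32-bit halves, as on a 32-bit target. */
typedef struct { uint32_t lo, hi; } Split64;

static Split64 split64(uint64_t x) {
  Split64 s = { (uint32_t)x, (uint32_t)(x >> 32) };
  return s;
}

static uint64_t join64(Split64 s) {
  return ((uint64_t)s.hi << 32) | s.lo;
}

/* One 64-bit ADD becomes two 32-bit operations: add the low halves, then
   add the high halves plus the carry out of the low add. */
static Split64 add_split(Split64 a, Split64 b) {
  Split64 r;
  r.lo = a.lo + b.lo;
  uint32_t carry = r.lo < a.lo; /* unsigned wrap-around signals a carry */
  r.hi = a.hi + b.hi + carry;
  return r;
}
```

On a 64-bit target such as MIPS64, a single register already holds the whole 64-bit value, so this pairing only adds instructions; that is the motivation for disabling the splitting there.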