All right. So, RaptorJIT: a fast, dynamic systems programming language. First off, hi, I'm Max. I'm an open source systems hacker, currently dabbling in high-performance networking applications.

I suppose I should start with a quick rundown of Lua, just in case you don't know it. Lua is a simple, minimalistic, high-level language. It has Scheme-ish semantics and a Pascal-esque syntax. It has first-class functions, multiple return values, prototype-based object orientation, and it's surprisingly flexible. Its central data structure is the table, which is a hybrid between a hash map and a sparse array. The canonical implementation of Lua is called PUC-Lua, which is a simple embeddable interpreter intended to be embedded in C and C++ applications, among others.

So, here's an extended hello world to get you somewhat familiar with the language. I'm not going to explain anything about it; I'm just going to let you stare at it a little and hope it's somewhat obvious. (A small sketch in the same spirit appears below.)

LuaJIT is an alternative implementation of Lua. It implements a dialect of Lua 5.1-and-a-half-ish, plus some extra goodies. Note that I'm saying it's a dialect, not just an older version. As its name suggests, it comes with a just-in-time compiler, and that compiler is really, really impressive: it achieves performance competitive with C. It's also a really good language for expressing programs that are close to the metal, thanks to its built-in language support for accessing and operating on C data. And I personally think that LuaJIT is a good systems language. By systems language I mean that it's a good language to replace C: an application that you would previously have written in C, you could also write in LuaJIT, and you would get very far with that.

Here's an example of some LuaJIT code showing off its ability to poke at low-level data. You can see that the language has built-in primitives for expressing C data, such as a C struct, and you can access that data and its fields as if they were native Lua objects. To illustrate: if p here were instead a Lua table containing a C array and a Lua number, the code wouldn't change at all. You could still access it the same way, and you could still copy to the C array contained in the Lua table. (A sketch of this pattern also appears below.)

So, RaptorJIT is a fork of LuaJIT, and its goal is to be a really good systems programming language. With RaptorJIT, we want to do a couple of things. First, we want to simplify the implementation and improve maintainability. We also want to improve the compiler, specifically for heavy-duty server applications. We have a very narrow use case: we are systems hackers, we want to write systems applications, we want a systems language, and with a narrower set of optimization targets, we think we can improve the compiler even more. Especially, we want to eliminate performance pitfalls, meaning small changes that have a big impact on performance, as well as unexpected JIT behavior. We want to make the JIT compiler easier to understand and to use, and the bottom line is that we want to provide more reliable performance. Right now LuaJIT performance is great; we just want to make it more reliable, and maybe even better in some cases.

Right, RaptorJIT also adds new features. Among those is a low-overhead profiler and matching introspection tools for that data.
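Since the hello-world slide itself isn't part of this transcript, here's a minimal sketch in the same spirit, purely illustrative and not the speaker's slide, showing the features just mentioned: tables, first-class functions, and multiple return values.

    -- Not the actual slide: a small taste of Lua.
    local function minmax(t)
      local min, max = t[1], t[1]
      for _, v in ipairs(t) do
        if v < min then min = v end
        if v > max then max = v end
      end
      return min, max                 -- multiple return values
    end

    local greet = function(name)      -- functions are first-class values
      return "hello, " .. name
    end

    local point = { x = 1, y = 2 }    -- a table used as a record
    local lo, hi = minmax({ 3, 1, 4, 1, 5 })
    print(greet("world"), lo, hi, point.x)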
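And here's a minimal sketch of the FFI pattern described above, using LuaJIT's documented ffi library. The actual slide code isn't in the transcript, so the struct layout and field names here are made up for illustration.

    local ffi = require("ffi")

    -- Declare a C struct (hypothetical fields, for illustration).
    ffi.cdef[[
      struct point { char name[16]; double x; };
    ]]

    local p = ffi.new("struct point")
    ffi.copy(p.name, "origin")  -- copy a Lua string into the C char array
    p.x = 1.0                   -- assign a struct field like a table field
    print(ffi.string(p.name), p.x)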
And hopefully there are many more features to come, and that's where you come in, I guess.

So, in order to simplify and maintain the code, we are doing a big spring clean. Here's a pull request titled "big bang: remove all the features that I can live without". And yeah, it says merged; that's the purple icon on the left there. It removed support for all architectures except x86-64, because at the moment that's the only thing we care about, and maintaining one architecture is enough. It removed Windows support, and it removed LuaJIT's 32-bit heap mode. This allowed us to get rid of a lot of #ifdefs (we're trying to get rid of all #ifdefs, because we don't like #ifdefs), and it resulted in a total code reduction of roughly 50%, which I think is a big win.

The LuaJIT interpreter used to be handwritten assembly, duplicated for each specific architecture, and we have almost completed rewriting that virtual machine interpreter in C. We hope that this will make it easier to port and change the language implementation. The rationale behind this change is that we spend most of our runtime in compiled code, meaning in compiled traces. For our use case, high-performance networking, spending any significant time in interpreted code is out of the question anyway, because that would be way too slow. So for us, a fast interpreter doesn't really buy anything; we don't need an interpreter that's heavily optimized. Our idea is: we're going to make the implementation easier to maintain, skip the optimizations that we don't benefit from, and instead make it easier for your code to stay compiled and not fall back into the interpreter.

We're also looking to remove complex optimizations that don't carry their own weight anymore. Here, we removed a special-case fast path for string interning. On the next slide, I'm paraphrasing from the pull request, which reads somewhat like this: this "fast path" (in air quotes) was bad because it was a tricky custom memcmp routine that needs to be maintained. It turned out to be slower than the stock memcmp on modern x86, which was the "slow path". And it led to confusing performance behavior, where unrelated memory allocation could bias whether a new string took the fast path, and impact overall performance. In the description, Luke concludes that the fast-path code was written 10 years ago and a lot has happened since then. He goes into how the CPU architecture, the operating system (Linux, in our case), and even compilers like GCC have really evolved in that time, and goes on to say: I think the optimization had simply bit-rotted.

What I want to stress here is that this is not to bash this individual optimization or anything. It's just to say that if you have optimizations that are kind of smart and try to outperform certain components in your system, you have to account for the cost of maintaining them and the work of continuously verifying that they still actually work and still actually make your program faster, or pull their weight. (Some general background on string interning follows below.) Okay, so I'm running a little bit out of time here.

Right, so RaptorJIT wants to improve the JIT compiler. To understand the setting here, I guess I should explain that LuaJIT acts as a drop-in replacement for PUC-Lua. It has a really fast JIT compiler, and it also has a fairly fast interpreter.
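For context on why a memcmp routine shows up in string interning at all, here's a bit of general Lua background, not anything RaptorJIT-specific: Lua interns every string, so two equal strings are the same object and string equality is effectively a pointer comparison. The price is paid when a new string is created, because its bytes must be compared against existing candidate strings, and that byte comparison is where memcmp comes in.

    -- General Lua behavior (not RaptorJIT-specific): strings are interned,
    -- so equal strings are one object and == is effectively a pointer compare.
    local a = "hello " .. "world"  -- built at runtime
    local b = "hello world"        -- a literal
    print(a == b)                  -- true: both refer to the same interned string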
So, if the JIT compiler for some reason fails to compile a code path, it drops into an interpreter that is still many times faster than the default, canonical Lua implementation. In any case, if you have a Lua program that you used with PUC-Lua, you're going to get big performance improvements if you run it with LuaJIT, and you're going to be happy. And we think that we can do better for a narrowed set of use cases with RaptorJIT. So, to summarize: LuaJIT is for existing PUC-Lua code bases, and RaptorJIT is for new Lua applications written with a just-in-time compiler in mind.

In case you don't know how a tracing just-in-time compiler works, I will give you a super-reduced explanation. A tracing JIT interprets code. When it hits a branch, it checks whether that branch is hot. If it's not hot, it increments a hot counter for that branch and continues interpretation. When the branch is hot, it starts recording a trace, and then again continues interpretation after recording the trace. The next time around (which is not shown here, because this is very simplified), instead of hitting hot branches, you would start executing the compiled traces of hot code.

With RaptorJIT, instead of treating the compiler like a secret sauce that you throw onto your Lua programs to make them magically faster, we want to foster a culture of understanding the JIT compiler. With that understanding, we want to formulate design goals and implement them, and this should, in turn, make the JIT compiler easier to understand and, most importantly, easier to leverage.

The issue referenced below talks about avoiding high-impact, medium-generality optimizations. I touched on this before. We use this term for optimizations with the following behavior: you make a small change to your program and you get a big change in performance. The optimization is not general enough for you to freely make changes to your program that are relevant in scale and scope, but it's high-impact, because you take a big hit if you fall off the optimized path. That is why we want to avoid unreliable compiler behavior, where small changes to your program cause big changes in performance.

Also, LuaJIT quite aggressively blacklists code paths that fail to compile, a policy that favors short-running programs. The programs we target are all long-running, so, just as a policy change, we spend way more time trying to find good traces, and we're fine with that. We just want to make sure that we find the best traces for the program, because we know it's going to run for a long time. As a first step in that direction, we updated the just-in-time compiler's statistics for trace selection to match our target workloads; I'm not going to go much into that.

On another note, LuaJIT doesn't actually consider the time domain when selecting traces. A branch could become hot because it was executed for the hundredth time after an hour of runtime. So hotness is not actually frequency, but rather this abstract idea of some counter that at some point overflows, and I think that's really unintuitive behavior. I think that maybe RaptorJIT should consider the time domain and only compile code which is actually executed frequently. We did some experiments with that, and the results so far were positive. That's just to show you the kind of hacking that we're doing on this fork. (A toy sketch of the time-domain idea follows below.)
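Here's a toy sketch of that time-domain idea in plain Lua, purely illustrative and nothing like RaptorJIT's actual internals: a branch only counts as hot if its counter overflows within a time window, so a branch that slowly accumulates a hundred executions over an hour never triggers compilation. The threshold, window, and function names are all made up.

    -- Illustrative pseudocode, not RaptorJIT internals: hotness with a time window.
    local HOT_THRESHOLD = 100   -- hypothetical count threshold
    local WINDOW = 1.0          -- hypothetical window, in seconds
    local state = {}            -- per-branch { count, start }

    local function on_branch_taken(branch)
      local now = os.clock()
      local s = state[branch]
      if s == nil or now - s.start > WINDOW then
        s = { count = 0, start = now }  -- counter went stale: reset it
        state[branch] = s
      end
      s.count = s.count + 1
      if s.count >= HOT_THRESHOLD then
        print("branch is hot, record a trace at:", branch)
        state[branch] = nil
      end
    end

    -- A branch executed in a tight loop becomes hot; a rare one never does.
    for i = 1, 1000 do on_branch_taken("loop-head") end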
Right, we also added new features. We added a low-overhead profiler. The intention is to have that profiler always turned on: it will always collect profiling data, even in your production applications, which you can then grab while the application is running and display in this front end that we wrote, to find which traces take the most time, what the problem is, and how they got created. This is to show that the tool we wrote is a very visual tool. So, this is a dependency graph of the intermediate representation (IR) instructions, where you can see: this is the loop body, and this is the head that's executed before entering the loop.

Yeah, there's lots of experimentation going on. We've experimented with a trace barrier primitive that stops traces from crossing a given line of code and forces a new trace to be created there, if at all. We liked that one. We experimented with a "jit unlikely" primitive, which would take too long to explain now; we didn't like it. And one thing I want to touch on is this "jit seal" primitive. What it would basically allow you to do is declare, or constify, a table at runtime, and that would give the compiler a superpower: it could treat tables created after configuration changes as constant, and optimize based on their contents. I've just got a few minutes left, so I want to take some questions.

All right. Yeah, I guess this one's important; I just have to mention it in a minute. This is something that LuaVela has apparently already implemented, and something we're really interested in, because it removes another high-impact, medium-generality optimization. Check out the slides if you want to know more about that. Right, there's research on JIT compilers in general: there's new literature, new science happening that really fixes some basic things. We could implement those things, and we're open to experimenting with that.

And one of my personal goals would be to have safe foreign-function memory access, because all the type information for foreign types, meaning C data, low-level data, is available at runtime. So there's nothing that really stands between us and making that type-safe. And the compiler is really good at optimizing these checks, so that's something that I would like to do.

Right, so thank you for your attention. Please get involved; we are on GitHub. I would be pleased if I got some of you interested in this project, but also, if you're into JIT hacking, I think this is a cool place to start. And yeah, if you have any questions, please. Yes?

Yes? No, I don't think so. I mean, maybe some bug fixes, but I'm not aware of any. Yeah, I mean, we have a very, very specific goal, and we're willing to separate and split. We want to cooperate with other folks and exchange ideas, but we're really open to just going wild on it. Anybody else? Yes?

Yeah, I think so. When I started working on the RaptorJIT project, I actually wasn't aware of LuaVela, but I just recently reread some of their presentation slides. To me, it seems that the projects are very similar in spirit, and they have some very specific features where they both want the same thing. And I hope that in the future there will be a strong collaboration between these two projects, because some things that we want are already implemented in LuaVela, and maybe the other way around, even.
I think that LuaVela, for example, also wants to try out the C interpreter, stuff like that, yeah. Anyone else? All right, thank you very much. Thank you very much. Yes.