Hi everybody, my name is Rahul. I'm the CTO of TypeLead, a company building a programming language called Eta, which is based on Haskell and runs on top of the Java Virtual Machine. Today I'm going to talk about something we've been working on recently: another way to work with concurrency on the JVM. There are existing libraries and frameworks that solve a similar problem, and I'm going to show why the way we're doing it is much more powerful.

First, a bit of an overview of this project. It started out last year under the name GHCVM, as part of a Haskell Summer of Code project, where my mentor was Edward Kmett. Soon after the summer ended, I really wanted to continue working on it: I saw a huge future in it, and the response from lots of people on Twitter was great. My wife also saw how much response the project was getting, and she decided to work full time with me on it and make it a success. That's when we started TypeLead, the company that takes it forward. We also decided to give it a better name, since GHCVM is a very lame name, and called it Eta. Just recently we got an investment from Techstars, a small investment, but enough to sustain us for quite a while, so we're in New York for a couple of months.

Now let's talk about Eta itself. What is it? It's a pure, lazy, statically typed language that runs on the JVM, as I said before, and it's a fork of GHC. What this means is that Eta has access to all of GHC's major type system features and all of its major optimizations, the whole GHC optimizer. The only thing that changes is the backend, the part that takes GHC's intermediate representation and generates Java bytecode. Another cool part is that we can compile Haskell packages out of the box: even though we haven't spent that much time building libraries, we already have access to a bunch of them, because we support the same syntax. We also have a strongly typed FFI. One of the tricky bits of having a pure language is that you have to be able to interact with an impure ecosystem, in this case the Java ecosystem, and you need to do it in a way that doesn't break purity, otherwise there's no point in using the language in the first place. We have a solution for that; I won't be talking about it in this talk, but you can find information about it online.

One of the main focuses of this project, which came out of a burning need to use Haskell everywhere, at work and in industry in general, is that to get there we need to, one, focus on user experience: tooling and IDE support all have to be great. And we also have to focus a lot on performance, both compile-time performance and runtime performance, both of which are important for using any technology in business.

Most of the talk today will be focused on ways you can do concurrency. We'll first look at the ways you do concurrency in existing languages, mostly other JVM languages, though it applies to languages in general. And we'll start with a definition, taken from Wikipedia, of what concurrency actually is.
Concurrency refers to the ability of different parts or units of a program, algorithm, or problem to be executed out of order or in partial order without affecting the final outcome. What that means is you should be able to have parts of your program running simultaneously, with the order in which their instructions run interleaved. I'll show a diagram to make that a bit clearer.

One of the basic, fundamental ways to handle concurrency, provided for you by many languages and operating systems, is the thread. There are different types of threads: OS threads, the native kernel threads the operating system provides; green threads, implemented on the user side by an application runtime; and fibers. I'll be talking about the advantages and disadvantages of each of these models.

Let's start with OS threads. People are used to programming sequentially: this instruction runs after that instruction. When you move to the concurrent world, it's very easy to keep using that same sequential thinking, and threads allow you to do that. You have two threads, each a sequence of instructions, but the order in which those instructions run can be interleaved. A single processor executing multiple threads will execute some instructions from the first thread, switch over to the second thread and execute some more, then switch back to the first thread, or to other threads if they exist. That's what this diagram illustrates. OS threads are primarily scheduled by the operating system; users don't have much control over them. There are a couple of syscalls for influencing scheduling, but not much control. Threads are expensive to create and context-switch. And as I said, the thread abstraction is available in almost every language, except JavaScript on Node.js, which is single-threaded. Another requirement of using threads is synchronization: a lot of the time you'll have different points of execution accessing the same area of memory, and you have to do it in a way that keeps memory consistent, otherwise you get the dreaded concurrency bugs.

Now let's go to another category, where you multiplex multiple user-level threads onto kernel threads. This is typically referred to as the M-to-N threading model, where M is typically greater than N, and N is the number of processors you have. You'd probably want to know why you'd use such a threading model, and the point of it is to implement asynchronous, non-blocking architectures. Say you have a web server. A web server processes a request and sends back a response, right? So what happens while processing that request? It blocks on some input/output, maybe connecting to another server, like calling the API of another web service. When that happens, it blocks that thread; it can't do any work until that data comes through. And the problem is you can't actually accept a new request while this thread is blocked.
The point of these architectures is to let you handle many requests at a time: you suspend a point of execution when it's blocked on some input/output, and you process other requests in the meantime. Even in this model, though, you still have to deal with synchronization.

Now let's talk about green threads. This is one implementation of the M-to-N threading model, and it's preemptive. What this means is that some application runtime is scheduling these threads for you, and you don't have much control over when they're context-switched. Fibers you can think of as the dual: the user has control over when to context-switch the lightweight thread. It gives you a bit more flexibility, but in exchange you now have to actually worry about when to context-switch and when not to. It's a trade-off.

There's also an alternative model, the event loop, where you have a single-threaded architecture and a single event loop that goes through all the asynchronous methods you've registered as handlers. On the JVM you do this with Netty, and Scala has a great number of libraries that handle this problem as well, like Akka. And there's Node.js. Typically you can get pretty high performance out of these kinds of architectures. The cool part is, since it's single-threaded, you don't have to deal with race conditions or any of that, because only one bit of code is running at a given time, which means you won't have any conflicts over shared memory anymore. So this is great: if it gives you high performance, you probably want to use it for everything, right?

But wait, no. It's not that simple, because you end up with code that looks like this. Don't focus too much on the actual content of the code, just look at the basic structure. It's an asynchronous program that counts word occurrences. One thing you'll notice is that the code starts branching out and drifting further and further to the right. This is a very simple program, so you can at least see it on one screen, but when it becomes more complicated, it's going to be much, much messier than this. In exchange for good performance, you've traded away simplicity.

One nice way to solve that is the concept of futures and promises. With futures and promises you can program in a functional style and get nice, tight code. But now you've gone from sequential code, which is a bit easier to understand, to all the craziness of functional style with flatMaps and maps. So is there a way to go back to the nice, old sequential style and keep the benefits of asynchronous programs? Well, yeah: async/await, a concept that was introduced in C# and has since branched off into other languages. It's a pretty cool solution: you get a nice, clean sequential program that's easy to understand. But our problems aren't over yet, because async/await is not first-class. You can't, in general, pass async functions to other functions and treat them as values, which is the basic core of functional programming.
So you get examples where you'd think the program should compile, but it doesn't, and the reason has to do with how async/await is actually implemented: it's a compiler transformation into a state machine, which is why there are so many restrictions on it. You also sometimes get unexpected semantics: you should be able to abstract out that slow-calc future, but it turns out that actually changes the meaning. And this is not a problem with a specific implementation of async/await. The one I'm showing as an example is Scala's async library, and the problem is not the library, which is implemented very nicely. It has nothing to do with Scala, and nothing to do with the way you're coding. The problem is async/await itself. It's a nice solution, and it handles the common case, but when you want to abstract your code, when your codebase becomes larger, it becomes very, very difficult, and you start having to repeat yourself over and over again. We don't want that.

So what do we want? We want a first-class way to handle this kind of asynchronous code. Another problem with asynchronous code is that it becomes hard to debug: when you get an exception in a normal sequential program, you get a nice stack trace that tells you how you got to that exceptional state, but you lose that when you go into the asynchronous world. Yet another problem with abstractions is that they might actually make your program slower, so we want an abstraction that's powerful but still gives us good performance. We want to write reusable code that works in a lot of different cases. We want to iterate fast, and when you have to write the same thing in different places, you can't iterate as fast, right? And code becomes more complicated when you can't abstract, so we also want simpler code.

Before we get to how we solved this in Eta, I'll introduce the basic core of the solution: "sequenceables". That's a mnemonic I use when I talk about monads, to make it easier to understand what a monad actually means, because "monad" is an abstract term that not many people know, and it's easier if you talk about it with a term people can relate to. A sequenceable is something that can be sequenced, right? And that's what a monad helps you do. The whole point of the sequenceable part is just to put that association in your head, so you know what a monad means from now on; I'll just say "monad" from here. So: a monad is a general-purpose abstraction for handling computations that can be sequenced. Another colloquial phrase for monads is "programmable semicolons". Take this piece of Java code: each statement is terminated by a semicolon, right? Monads are an abstraction that lets you abstract over that semicolon and do extra work at the termination of each statement. Monads are based in category theory, which is why they have such an opaque name, and category theory is the factory for principled, powerful, and reusable abstractions. Developing new theories about monads may require a PhD; using monads doesn't.

Before I continue, I also want to dispel some myths going around about monads. Monads are not impure; there's nothing about monads that is impure. And monads have nothing to do with side effects. They're just another design pattern, a nice, pure abstraction that you can use, and it just so happens that they're typically used to model side effects in a pure functional language.

So what's the definition of a monad? A monad is a type class with two methods. For those who don't know what a type class is, it's like an interface in Java, except much, much more powerful. A monad is a higher-kinded type with two operations, return and bind, where bind is written >>=. A nice way to think about this is that monads abstract over the concept of a callback: think of bind's first argument as the core action and its second as the callback. Bind is the operation that lets you sequence things: it waits for the result of the first action, then passes it to a continuation that decides what to do with that result and gives back another value.
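In code, the class looks roughly like this (a simplified sketch; the real class in Haskell's and Eta's base has superclasses and a couple more methods):

```haskell
-- Simplified sketch of the Monad type class (the real one in base has
-- superclasses and extra methods).
class Monad m where
  return :: a -> m a                  -- wrap a pure value in the monad
  (>>=)  :: m a -> (a -> m b) -> m b  -- "bind": run the action on the left,
                                      -- feed its result to the callback-like
                                      -- continuation on the right
```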
Now I'll talk about the IO monad, which is the standard monad you have to use in Eta in order to do anything useful. An action in the IO monad is a description of a computation that can return a value of type A. As a first approximation, you can think of it as a function that takes unit and returns an A, a function that takes a void argument and returns an A. This is not exactly true, because it would invalidate a lot of the assumptions of pure functional programming, but it's a way to start thinking about what it actually means.

Then there's an almost exact definition. It's not exact because the real definition is much more efficient: this one returns a boxed ordered pair and the real one doesn't. But in it you see an abstract type called RealWorld. Typically, when you have a type, you can create values of that type: if you have a numeric type, you can write down numbers and the compiler will take them. RealWorld, though, is a type built into the compiler whose values the user cannot construct. What this allows is for the compiler to sequence your operations in the correct order, and it prevents the compiler from optimizing where it shouldn't be optimizing.

Here's a quick example of how you program in the IO monad: downloading and processing music files. You have a MusicFile type, which is just a string of bytes, and then a couple of functions. downloadMusicFile takes a string and returns a music file in the IO monad, which means it transforms the RealWorld: it returns a new RealWorld value after you execute the function. And then you have another function, mergeMusicFiles, which merges two music files, combining them into one file and saving it somewhere on your file system.
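The slide code isn't captured in this transcript, but a sketch looks like the following. The RealWorld comment shows the approximate definition from above; the music functions are hypothetical names for this example, stubbed so the sketch compiles:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

-- Conceptually:  newtype IO a = IO (RealWorld -> (RealWorld, a))
-- (the real definition uses unboxed tuples and is more efficient)

type MusicFile = ByteString

-- Hypothetical functions for the example; stubbed so the sketch compiles.
downloadMusicFile :: String -> IO MusicFile
downloadMusicFile _url = pure "fake mp3 bytes"

mergeMusicFiles :: MusicFile -> MusicFile -> FilePath -> IO ()
mergeMusicFiles a b path = BS.writeFile path (a <> b)

-- Explicit binds: note the callback style, drifting to the right.
mergeBind :: IO ()
mergeBind =
  downloadMusicFile "1.mp3" >>= \song1 ->
    downloadMusicFile "2.mp3" >>= \song2 ->
      mergeMusicFiles song1 song2 "merged.mp3"

-- The same program in do notation.
mergeDo :: IO ()
mergeDo = do
  song1 <- downloadMusicFile "1.mp3"
  song2 <- downloadMusicFile "2.mp3"
  mergeMusicFiles song1 song2 "merged.mp3"

main :: IO ()
main = mergeDo
```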
So now, all we know about the IO monad is that it's a monad, and if it's a monad we can use those two methods, right? Say we have two music files, 1.mp3 and 2.mp3, that we want to merge. The mergeBind version in the sketch above is how you'd write it, and the style should look very familiar: it's just like callbacks, just like we saw earlier in the Scala example. And just like with callbacks, it keeps expanding to the right, making it harder to read. Because this is such a common problem, and because monads are used so much in languages like Haskell and Eta, there's a thing called do notation. You can think of do notation as the most general form of async/await; async/await is a special case of it. The two versions in the sketch are actually identical; the only difference is that the do version is written in the standard imperative, sequential style, so it's easier to understand what's going on. If you look at that code, it's very easy to see that you're downloading a music file, you get two songs, and then you pass them as arguments to the next function to get the result. It's a lot easier to reason about.

So now let's talk about how we got around to implementing fibers in Eta. The concurrency story for Eta had been tricky for a long time. Haskell uses green threads, and implementing green threads on the JVM is a very difficult task: it can interfere with JIT optimizations, and there are all sorts of problems with it. So we'd been struggling with how to get a nice lightweight-thread solution on the JVM for a long time, and we always assumed it would be something at the compiler or runtime-system level. With the right inspiration, we were able to find a solution that provides a fiber implementation at the language level itself, with only a few runtime extensions.

We have a Gitter channel for Eta, and it all started when a guy named Alberto said one thing. Alberto, whom I'll introduce later, is the author of the Transient framework, a framework for doing reactive programming in Haskell, and it has a very interesting implementation. He made this comment: "I was fantasizing about a base IO-plus monad which makes a continuation available for the programmer, so a lot of effects can be created, including threading." Something about what he said instantly sparked a light bulb, and I started hacking. This was on a Wednesday or Thursday, and by that weekend I had a working implementation of fibers. You can find more information about Alberto here.

I just want to briefly show some of the power of the framework he built. This is an example of a distributed program written in, what, six lines? It has two core primitives, wormhole and teleport. wormhole creates a gateway into another machine; in this case the machine is referred to as node, and inside that machine these instructions get run. (Can everybody see this, by the way?) So on the node machine we print hello world, we grab the process ID, and then we teleport. What teleport does is bring you back to the current machine. So wormhole creates a gate and puts you on the other side, and when you teleport you come back to the original side and can continue programming.
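The slide isn't reproduced here, but in the style of Transient (the transient-universe package) the program looks roughly like this. The function names are from memory and approximate; treat this as a sketch of the idea rather than exact library code:

```haskell
-- Sketch in the style of the Transient framework (transient-universe).
-- Names are approximate, from memory; not guaranteed to match the library.
import Transient.Move (Cloud, Node, local, teleport, wormhole)
import Control.Monad.IO.Class (liftIO)
import System.Posix.Process (getProcessID)

helloThere :: Node -> Cloud ()
helloThere node = wormhole node $ do
  local $ liftIO $ putStrLn "hello world in node"  -- runs on the remote node
  pid <- local $ liftIO getProcessID               -- grab the remote process ID
  teleport                                         -- jump back to the caller
  local $ liftIO $ print pid                       -- back on the original machine
```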
The cool part about this is that it allows you to program distributed systems, especially distributed-system protocols, in a sequential style. We wanted to get something like this inside Eta, but the problem is that it requires a lightweight threading system to work nicely. That's when I came up with the concept of a Fiber monad: a monad for working with computations that can suspend at any time. In terms of implementation it's actually exactly like the IO monad; the only difference is the bind implementation. What bind does is maintain a continuation stack, and this stack is very efficient because we represented it with a mutable Java array. Now, when I say the word "mutable" it's going to raise red flags with every one of you. It should. But if you actually look into it, it works out: it doesn't cause any major semantic problems, and it's very efficient.

So this is the basic Monad interface: it gives you a return, which takes a pure value and puts it in the Fiber monad, and a bind, which lets you sequence fiber computations. There are some other instances too. I don't have time to go deep into the other abstractions, Functor, Applicative, and Alternative, so I've just put basic comments on what they do. Functor lets you transform the output of a computation. Applicative lets you parallelize, and in this particular case it really does parallelize the computation: it lets you spin off fibers that do parallel computations and come back with a compound result. And the Alternative instance means that if the first fiber fails, it automatically runs the second fiber.

There are also two more primitives, the ones down here: yield and block. These are the very low-level primitives that let you construct all sorts of concurrency models. yield terminates the current computation at that point and automatically adds the remaining computation to the run queue. block is a simpler version of yield that skips the part where it adds the remainder to the global run queue. I'll talk about this more in the second half of the talk, where I cover the internals. (In response to a question: yes, that's possible, but it's future work; we haven't done it yet.)

So how do you spawn a fiber? We provide a forkFiber primitive: you give it a fiber computation, and it spawns it from the IO monad. And let's talk very quickly about MVars. You can think of an MVar as a single-element bounded channel, meaning it can carry only one element at a time; if the channel already holds an element and you try to put a new one in, the operation blocks. We already had these operations in the IO monad in Eta, where they block the current thread, which doesn't give you as much throughput. But we also provide implementations of these operations in the Fiber monad, and the cool part is that in the Fiber monad they're non-blocking: they suspend the current fiber and allow other fibers to run.
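Putting those pieces together, the interface described so far looks roughly like this. These signatures are my reconstruction from the talk, not necessarily the exact eta-fibers API:

```haskell
-- Reconstructed interface sketch; signatures approximate the eta-fibers API
-- as described in the talk.
data Fiber a  -- abstract: like IO, but bind maintains a continuation stack

instance Functor     Fiber  -- transform the result of a computation
instance Applicative Fiber  -- run independent fibers in parallel
instance Monad       Fiber  -- sequence fiber computations

yield :: Fiber ()  -- suspend here; the rest of the computation goes on the
                   -- global run queue
block :: Fiber ()  -- suspend here without re-queueing (resumed explicitly)

forkFiber :: Fiber () -> IO ThreadId  -- spawn a fiber from the IO monad

takeMVar :: MVar a -> Fiber a        -- non-blocking: suspends the fiber
putMVar  :: MVar a -> a -> Fiber ()  -- non-blocking: suspends if already full
```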
Here's a very simple example using the pieces introduced so far, fibers and MVars. This program creates two fibers: one says "ping" and sends a message to the other, the other responds with "pong", and they keep going, keeping a counter as they go. This is the core function. It takes a source MVar and a sink MVar; the source is the input to the fiber and the sink is its output, and the two fibers are connected in a cycle: the input of each fiber is connected to the output of the other. The function tries to grab some input, and if no input is available, it just suspends that fiber and allows other fibers to run, so it's genuinely non-blocking.

And the cool part is, you can't tell the difference from the code. In other languages with async/await and that kind of thing, you see an async keyword that says, okay, this is non-blocking. Here, the only way to distinguish asynchronous from synchronous is through types: if it's in Fiber, it's most likely non-blocking; if it's in IO, it's blocking. That's the power of the abstraction. Then the function just prints a message, sends a unit value to the other fiber, and increments, so it goes into what is essentially an infinite loop.

Here's the main function that runs the example. This part tells the runtime system to use only one thread. The runtime is pretty configurable; you can configure how big the thread pool is and that kind of stuff. In this case we configure it to one thread precisely to demonstrate that all these operations are non-blocking, because you'll still see stuff always happening. Then comes the basic initialization: you fork two fibers, you activate the ring from the main thread here, and you wait about a second to see how far it goes. Let me run the example so you can see it. As you can see, it's alternating, and I've cut it off at one second; it would otherwise go on forever, but this way the program at least terminates. The cool part is that it lets you write asynchronous code as if it were synchronous, without extra syntax overhead like async or await or anything like that.

After seeing all this, you've probably understood that fibers are the fundamental primitive out of which you can implement efficient concurrency mechanisms. So what can you build with fibers? You can build actors on top of this. You can build reactors: there's an alternative concurrency model called the join calculus, a purely functional concurrency model, and I've colloquialized it to "reactor" because that gives you an intuition for how it works; it lets you program reactive systems. It lets you do non-blocking IO applications; I haven't actually shown you the primitives for non-blocking IO and all that, it's still under development, but we'll get it done eventually. And you can also use it for simple asynchronous tasks: say you're writing a crawler and want to download HTML files from multiple websites, you could program stuff like that as well.
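For reference, here's a sketch of the ping-pong program as described, using the reconstructed API from above plus a hypothetical liftIO for Fiber; treat it as pseudocode for the real example:

```haskell
-- Sketch of the ping-pong example; assumes the reconstructed Fiber API above
-- and a hypothetical liftIO :: IO a -> Fiber a.
import Control.Concurrent (threadDelay)
import Control.Concurrent.MVar (MVar)
import qualified Control.Concurrent.MVar as IOMVar  -- blocking IO versions

pingpong :: String -> MVar () -> MVar () -> Int -> Fiber ()
pingpong msg source sink n = do
  takeMVar source                    -- suspends this fiber; never blocks a thread
  liftIO (putStrLn (msg ++ " " ++ show n))
  putMVar sink ()                    -- hand the token to the other fiber
  pingpong msg source sink (n + 1)   -- loop forever

main :: IO ()
main = do
  a <- IOMVar.newEmptyMVar
  b <- IOMVar.newEmptyMVar
  _ <- forkFiber (pingpong "ping" a b 0)
  _ <- forkFiber (pingpong "pong" b a 0)
  IOMVar.putMVar a ()                -- inject the first token to start the ring
  threadDelay 1000000                -- let it run for about a second, then exit
```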
Now I'll talk a little about where this is going and what kinds of APIs we're going to add eventually. First, an API to convert callback-based Java APIs into fiber APIs. Many Java APIs take handlers, and you want to be able to use those APIs in this new Fiber monad, right? So we're going to provide primitives for that. We're going to support java.util.concurrent's Future: so many libraries use Java's standard Future type, and we want a way to integrate with it in a non-blocking fashion. In Eta we currently have software transactional memory, but it blocks: if the conditions of a transaction don't hold and you retry, it blocks the current thread instead of letting other transactions run. So we're going to have a Fiber API for software transactional memory as well. And, as you asked, there will be an API for catching exceptions, like in a normal sequential IO program, except it will take into account the fact that the code is asynchronous.

Another thing we want is the ApplicativeDo extension from GHC 8. As I showed before, the Applicative instance for Fiber actually parallelizes the fibers. So take something like the music example from earlier and replace IO with Fiber, so it's a Fiber computation. What ApplicativeDo does is convert that do block into an applicative expression and automatically parallelize it. To the user it looks like you're just doing things sequentially, right? But it actually finds the most efficient way to run the computation, which in this case is to parallelize the two downloads, because they don't depend on each other.

Where do we want to take it from there? We want to be able to handle blocking calls. Any asynchronous system lives with the threat of a call that actually blocks and just cripples the system, so we need a way to fork blocking calls off into a separate thread pool and then just wait on them. Then, remember the wormhole and teleport examples I showed you? That's the distributed-systems angle: we want to support that API as well, implementing wormhole and teleport inside Eta with fibers as the core. We also want to add mechanisms for fault tolerance: combining the exception API with the distributed setting, an exception thrown on another system can propagate back up to the original system, giving you a nice fault-tolerance mechanism. And the Transient library has lots of other really cool APIs; you should check it out.

There's another aspect we're going to address, which is tooling. A big problem with asynchronous frameworks is being able to debug when something goes wrong. The way fibers are implemented right now, whenever you yield it actually throws an exception, and that exception captures the stack trace of your program. So what we can do is make a simple Java agent that captures those stack traces and gives you the entire sequence, so you'll know exactly what happened. In the standard build I've actually turned stack traces off for performance reasons, but this Java agent will turn them on, doing a bit of rewriting to enable stack-trace capturing, so you can get nice debugging output. That's why it's a Java agent.
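To make the ApplicativeDo point from a moment ago concrete, here's a sketch, reusing the hypothetical music functions from the earlier example, here assumed to live in the Fiber monad:

```haskell
{-# LANGUAGE ApplicativeDo #-}
-- With ApplicativeDo, do-statements that don't depend on each other desugar
-- to Applicative combinators. If Fiber's Applicative runs both sides in
-- parallel (as described above), this sequential-looking code is
-- parallelized automatically.
mergeFibers :: Fiber ()
mergeFibers = do
  song1 <- downloadMusicFile "1.mp3"  -- independent of the next line, so the
  song2 <- downloadMusicFile "2.mp3"  -- two downloads desugar (roughly) to
                                      -- ... <$> dl "1.mp3" <*> dl "2.mp3"
  mergeMusicFiles song1 song2 "merged.mp3"
```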
Currently the development of Eta fibers is going on in a separate repo, eta-fibers-dev, and the cool part is that we were able to implement this without many runtime extensions. The reason is a concept called foreign primitive operations in Eta, which lets you extend the runtime system using Java: you write some Java methods that extend the Eta runtime. This allows you to build all sorts of primitives and makes interop easier. Right now it's a separate library, but eventually we're going to include it in the standard library. The standard library story is still open, because we have the issue of Haskell's base versus Eta's base; we'll figure that out, but whatever standard library we end up with, this will be in it eventually. There are some basic instructions if you want to play around with it; I'll upload these slides.

Now for the second half of the talk: the performance aspect of these fibers. This is a nice API, and it's a lot cleaner to deal with, but only if it gives you the performance you need. You program asynchronously in the first place to get performance; if you don't get the performance, you're not going to use this in the first place, right? So I'm going to talk about how I was able to optimize this down to very tolerable speeds. To understand the performance aspects, I'm going to cover three components: the runtime, the HotSpot JIT compiler, and the benchmark I ran to test how fast these things are.

First, a quick overview of the runtime. It has a concept called capabilities, which are execution contexts that can run lazy functional programs. This is similar to how GHC does it, though the semantics are slightly different. You can effectively think of a capability as being attached to an OS-level thread, so think of it as an OS thread. You also have thread state objects, TSOs, which are lightweight; they're just Java objects that contain all the information about an execution. Now, the runtime is actually a bit different from how GHC handles concurrency: we have what's called a global run queue, which is a work-stealing deque. Imagine four capabilities sharing this work-stealing deque: each capability running its code adds work to the deque, and the capabilities that aren't doing anything start grabbing TSOs from it. The reason we do it this way is to avoid inserting checkpoints into normal user code: for work pushing, you need checkpoints at some points in your code to know when you have work to push. As I said before, the runtime is pretty configurable; there are lots of configurable parameters, but we just haven't gotten around to documenting them yet. We'll get around to it.

Now let's look at the JIT compilation aspect. The way it works is that you have JVM bytecodes, and they're interpreted at runtime.
As the code is interpreted, the JIT compiler learns which code is being run the most and where the performance bottlenecks are. The methods that are run the most are called hot methods, and they get optimized and inlined by the JIT. The JIT can also de-optimize: say it assumed a certain structure of your code, and those assumptions turn out to be wrong later in your program; it will de-optimize and then optimize again. There are also different levels of JIT compilers, especially in HotSpot, which is Oracle's JIT compiler. There's C1 and C2: C1 is the client compiler, whose generated assembly code is just okay, and C2 is the server compiler, which does more aggressive optimization.

I'll cover very briefly some of the kinds of optimizations the JIT can do. One is null-check elimination. Null checks are very common in Java code, where you check whether something is null before you call a method on it. What the JIT will do is take out that branch altogether, call the method directly, and install a thing called an uncommon trap, so that when a null pointer exception actually does happen, the trap catches it and the code gets de-optimized. There's also branch prediction. Suppose one branch is the only branch that ever gets taken in your program: the JIT will take out the other branch completely, so instead of even doing a branch, execution goes directly to that code, which is very efficient. At the same time, just as before, there's an uncommon trap, and if by chance the other branch does get taken at some point in the future, it de-optimizes and then applies a different optimization. When both branches are being taken, it figures out which branch is more frequent and makes that one the fast path, avoiding the extra branch.

Then there's inlining. This is the mother of all the optimizations the JIT does, and it has many parameters to tune, so I won't go too much into it, but these are the two most important ones: MaxInlineSize and FreqInlineSize. MaxInlineSize is the maximum bytecode size of a method up to which the JIT will inline it in the ordinary case; FreqInlineSize is the larger limit that applies when the method is hot. These values are actually platform-dependent, and you can query them using flags; I'll upload the slides so you can try it.

And there's one more cool optimization: type profiling. The problem with interface methods and virtual methods is that you can have many implementations of a given interface or class, so you'd have to do a lookup and verification every time you call the method. What the JIT does instead is type profiling: it evaluates all the types that actually get passed at the call site, and if it's a single type all the time, it inlines the method of that specific implementation, making the call a lot faster. It does the same thing for virtual methods.
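The slide with the flags isn't captured here, but MaxInlineSize and FreqInlineSize are real HotSpot flags, and a standard way to query their platform-dependent defaults is:

```
java -XX:+PrintFlagsFinal -version | grep -E 'MaxInlineSize|FreqInlineSize'
```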
Now, say another type comes along: there's a concept called bimorphic call sites and all that, which I won't go into, but the basic point is that it will de-optimize and re-optimize again.

So let's talk about the thread-ring benchmark. This is what I used to measure how fast Eta fibers are with respect to the other alternatives on the JVM. Basically, it measures context-switch time. The way this benchmark works, you have a ring of threads, you start it by passing a message to one of the threads, and the message gets sent around the loop as many times as it takes to reduce its value to zero. In this case the initial value of the message is one million, so you're passing the message around one million times in a ring of size 500. This really stresses the context-switch aspect of any threading implementation. The benchmark is up on GitHub, so you can run it on your own machines, and it's done using JMH, the Java Microbenchmark Harness. This benchmark had actually already been done by someone else, so I forked it and added the Eta implementation, as well as a Vert.x one, for comparison.

Here are the results. Akka takes around 650 milliseconds and Eta fibers take around 950, so the performance difference is around 1.5x. That makes sense: Akka is specialized for actors, and actors have a specific structure that Akka is designed to optimize, while fibers are more general, since you can build just about anything out of them. Taking a 1.5x performance hit in exchange for flexibility is not a bad deal. The interesting part is this: how many of you have heard of Quasar? It's a JVM library that does bytecode instrumentation at runtime, adding code so that any Java method can be suspended. Eta fibers get about the same performance as that bytecode-generation approach, and in our case we don't do any bytecode generation at all: we get the performance through compile-time optimizations on the code.

Now let's look a bit deeper at the results. The way JMH works is that it runs many iterations, to take into account that JIT optimization happens over time. If you look at the warmup iterations, Akka took around six seconds at the beginning and then got very fast. Eta's start time is actually a lot faster, almost three times faster, but then it optimized down not quite as far as Akka; part of the reason, which I'll discuss, has to do with lazy evaluation. Here's a way to look at what was actually being inlined: if you pass these three flags to java when you execute your program, you get a nice trace of all the methods that were optimized during your program. Here's the trace; I'm running out of time, so I'll just quickly zip through it. This was one of the hot methods, the method that did the logic of decrementing the message value and sending it to the next thread, and here's a decompiled form of that method. The calls you see here, pushing the next continuation and so on, are the part that maintains the continuation stack in the Fiber monad. And that's it.
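(The "three flags" aren't captured in the transcript; a standard HotSpot set that produces this kind of inlining trace is shown below, where MyProgram stands in for your own main class:)

```
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining MyProgram
```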
So: Eta fibers give you a composable alternative to the async/await problem, and for once the performance is not bad; it's almost as good as Quasar fibers. Another conclusion is that lazy FP on the JVM is actually not as slow as people think it is; it's actually pretty fast, and it's just a matter of continuing to work on making it better and better. And that's it. Oh, before questions start: if you want to get involved in the development of Eta, or just keep up with what's going on, we have a Gitter channel, which is the most active channel; Twitter is also pretty active, and we have a Google group if you want to have an asynchronous discussion. Thanks.

[Answering audience questions.] I guess you could say most of the bind implementation is in Java. You can't really do it in Eta itself, because it's working with mutable arrays, right? It's a mutable continuation stack, and it interacts with the runtime system: you have to be able to push to that global run queue when you yield. Taking care of all that logic has to be done on the Java side, because the runtime itself is written in Java. But you could eventually move it all to the Eta side as well, if you provide more primitives.

I guess it's similar to that; the only difference is that Haskell has green threads by default, so they don't have to deal with any of these problems. This is a JVM language, so you have to deal with that kind of stuff.

You do, actually. Oh, I think forkFiber actually returns a ThreadId. Yeah, it returns a ThreadId; I'll fix that slide. You don't have to think of it as something new; it's just a way to get that green-thread functionality from Haskell onto the JVM without all the overhead, you know?

I personally just use haskell-mode in Emacs. But we are working on it: we currently have a guy working part-time on building the IntelliJ plugin for Eta. So it is being worked on; it's just a matter of time.

Okay, we'll talk about that offline. But the timeline, okay, I'll just discuss what we're prioritizing right now. The main priority is the user side: getting nice applications up for all these patched libraries, getting nice tutorials written, and developing this fiber API further. We also want to support Template Haskell, which is going to take some time, so we want to get it done as soon as possible. We're focusing on those kinds of things and on getting more libraries supported. As for fibers, we haven't figured out where they fit in yet; we'll probably spend at least a day a week on them and gradually build them up, but we have lots of things to do, and it's a small team.

I haven't actually tried, but I think you can have quite a lot. Given that the context-switch time is about the same as Quasar's, whatever Quasar can handle, you could probably get the same amount, more or less. And yes, this is 650 milliseconds to switch fibers one million times.