Good afternoon. Welcome to my presentation. I am going to talk about our experience of porting .NET Core to Unix. I am a member of the team at Microsoft that was doing the porting, and the talk will be pretty low level. So the plan is to have 30 minutes of presentation, then five minutes of demo, and at the end I will have five minutes left for questions. So if you can leave your questions until the end, that would be nice, but obviously, if you want to ask a question in the middle of my talk, please do so. So, is there anyone who doesn't know what .NET is? Okay. Anyway, .NET is a versatile application development platform that supports multiple modern languages. It uses, I should say, managed languages, which means it uses garbage collection for memory management. The languages that Microsoft supports or develops directly are C#, F#, VB.NET, and C++/CLI, but there are many other languages supported by third parties, like Python and others. And .NET Core, which I am going to talk about, is basically a new version of .NET. It was created primarily for ASP.NET, with a slimmed-down framework layer, and it was developed so that we can deliver great performance for .NET web applications. But it's still very versatile, and it works well for other purposes too, like console applications and now for cross-platform development. The porting of .NET Core to Unix started in November 2014, and after a few months we went public on GitHub; that was in February 2015. When I say we went public, it means it's live development in the open source community, not just a snapshot of the state of what we have. All the runtime is shared with the full desktop framework, and we take contributions from both Microsoft people and from the community, and we have had a lot of contributions from the community over the year. The final release is planned for later this year, so for now it's still a work in progress.
There are still some rough edges. We are feature-complete at this point, but we need to work more on the user-friendliness of some tools and things like that. Here, on the right side of the slide, you can see the stack of a .NET application. The orange parts are written in managed code, which means in one of our managed languages. On top there is the application. Below that are the framework libraries, which depend on special platform abstraction layers for the framework and on the managed runtime. The managed runtime depends on the native runtime, and the native runtime depends on the PAL, which is the Unix-specific part; I'll talk about that on the next slide. Underneath there are the platform libraries like libc, librt, libunwind, libssl, liblttng, and so on. So obviously the first part that we needed to create was the platform abstraction layer, or PAL. We started building it on top of a PAL that Microsoft had already created in the past for Unix systems, for Silverlight, but we had to make a lot of modifications to it. The PAL basically emulates a subset of the Windows APIs, because the whole CLR was originally written for Windows, so obviously it uses a lot of Windows APIs: for file operations, console I/O, time, synchronization primitives — waiting for events, locks, and so on — for threading, for loading executable files (because we use the same file format on Windows and Unix, the PE file format, for managed applications, we need a special loader), for virtual memory management, a memory heap for general-purpose memory allocations, and for string operations, environment, locale, and others. Besides emulating the Windows layer, it also implements some low-level things that we needed for exception handling — I mean hardware exception handling — and native stack unwinding, both of which I'll talk about a little bit later.
As I said, it's present only on non-Windows systems, because on Windows we just use the Windows APIs, and it is the only part of the whole stack that is allowed to use platform header files, because it needs to be a complete abstraction for the rest of the system. And as I showed on the previous slide, we have multiple PALs: one is for the runtime, and we have several more for the framework libraries, so that we can version them separately, because the framework is developed separately from the core system. So first I'll talk about the basic differences between Windows and Unix that we had to tackle somehow. First is the compiler, obviously. On Windows we use the Microsoft Visual C++ compiler, and on Unix we decided to use the Clang compiler, which is a stricter C++ compiler, especially in C++ template handling and in its sensitivity to various not-so-clean things in the sources. So the first step was basically to make the code build with Clang. It was quite nice that this also cleaned up the sources, and Clang is a very nice compiler; we didn't have any problems with the compiler itself, except for a few little assembly tricks that we needed to do. The compilers also have different ways to express structure alignment — I mean memory structure alignment — inlining, exports, and thread-local variables, which are __declspec in Microsoft's compiler, __attribute__ in Clang, or pragmas that differ, so all of these had to be tackled somehow.
Then the biggest difference, or a bigger difference, was in WCHAR, the wide character handling, because on Windows WCHAR — and the .NET character type — is UTF-16, meaning 16 bits, while on Unix wchar_t is a 32-bit type. And since we pass a lot of strings between the managed code and the runtime, we cannot just translate them, for performance reasons, so we couldn't use the standard wchar library functions for string manipulation and had to re-implement them. The same goes for string formatting; some printf-style formatting characters also have different meanings on Windows and Linux. There is also a difference in the long type in C and C++: on Windows it's 32 bits, even on 64-bit systems, which is called the LLP64 data model, while on Unix it's 64 bits (LP64). Then, on Windows we use ETW eventing, which is an eventing system for high-performance events where you want to log, for example, how the GC behaves, how often it does certain things; and on Linux we found a very nice replacement for that, the LTTng library, which we have incorporated into the system. And quite an interesting thing was the FlushProcessWriteBuffers function on Windows, which ensures that all processors running threads of a certain process flush their write buffers. It is used by the GC to ensure the visibility of changes in thread state made by multiple threads to other threads, without the burden of having memory barriers everywhere. And there was no equivalent of such an API on Unix — or there is no equivalent of such an API on Unix in general.
So recently a syscall called sys_membarrier was introduced into the Linux kernel — I think it was in 4.3-rc1 — which we can use in the future, and we plan to. But in general on Unix we had to use a trick. Basically, the way to achieve flushing the buffers is to make the system send the processors an inter-processor interrupt, which causes the flush of these buffers. To trigger this interrupt, we have a dirty memory page where we change its protection from read-write to read-only; in that case the system has to send an inter-processor interrupt to the other processors so that they don't have stale records in their translation lookaside buffers, which would otherwise allow them to write to memory that one processor has marked as read-only. So that's why this interrupt has to be sent and the processors have to flush the cached mapping. This is the trick that we use on OS X and on Linux for the time being. There were also other challenges on the side of the framework libraries, but I won't talk about them because I was involved with the core system, so let's move on. I guess the biggest challenge was exception handling. If you look at the stack on the right side, which is the stack of some managed application — or could be the stack of a managed application — you can see at the bottom of the stack there are some native code frames, which are basically the hosting application plus the runtime. It calls into some managed code, which creates some stack frames; the managed code can call into native code again, where you have some native code frames, and it can call managed code again, and so on. Before I go into the details of how we did that, let me just explain how exception handling works. Exception handling works in two passes. In the first pass, it goes from the top of the stack, frame by frame, until it finds a handler for the exception.
And when it finds the handler for the exception — which is a catch in C++, basically — it starts the second pass. In the second pass it walks the stack again, but now it destroys all the objects on the stack — that is, calls their destructors in C++ — until it gets to the place where the exception is handled, and it reclaims the stack. So basically, when the exception is handled, say in this frame, then all of this part of the stack is gone. I should say one more thing: you need to have a way to walk the stack, a way to go from one stack frame to the next one. That is easy for the native frames, because the compiler that compiled the native code generated the unwind information for you; it's stored in some data sections in the executable file, and the platform-specific unwinder can unwind it. For the managed code it's different, because it's generated code, generated by the just-in-time compiler. On Windows we didn't have any problem with that, because Windows has centralized exception handling that allows you to register any function with the exception handling code: you can basically specify so-called unwind information for each function, and the system then knows how to move to the next frame — meaning how to get the instruction pointer and the stack pointer of the next frame and how to restore some registers. On Unix, however, there is no such support for dynamic registration of this information. So what we had to do in the first pass was use libunwind, which is a library available on Unix. There are actually two libunwinds: one is libunwind — the package is called libunwind8 — and the other is the libunwind from the LLVM project, which was not available as a separate library at the time when we started porting. So we used the former, which originally came from HP, if I remember correctly.
And this library basically understands the DWARF unwind information that's in the executables on Linux and can unwind the native frames. For the managed frames, we used a copy of the Windows unwinder; the just-in-time compiler generates Windows-style unwind information, and we can use that. So in the first pass, we walk the managed frames; when we hit a native frame, we switch to libunwind and walk the native frames, and so on, until we find the handler for the exception. Then in the second pass, we do a similar thing for the managed code frames: we walk them, and for each frame we call a function in the runtime that's responsible for destroying the objects in that frame and doing all the other needed things in the managed frames. Once we hit the native frames, we need to switch to some other mechanism, and we decided to unwind these native frames by the standard C++ exception handling mechanism. Basically, we change the processor context to the first such frame — or one frame below; we created a helper frame below the native frames — and throw an exception, and it is handled by the standard C++ compiler unwinding. At the boundary with the managed code, we have a special catch: in case the exception wasn't caught within these native frames — it can happen that there is a handler right in the middle — we catch it there and switch back to our managed stack unwinder, the copy that I've talked about, and we can then find the handler somewhere further down. So the native frames are unwound by the C++ exception handling, and the managed frames are unwound by our code. As for hardware exceptions: on Windows, they are handled the same way as software exceptions — the structured exception handling system on Windows handles them in exactly the same way.
On Unix, hardware exceptions generate signals, so we just catch the signals and then run our exception handling routines exactly the same way as we do for software exceptions; there is just that little step in between. The next thing that was different was the calling convention on the AMD64 processor, which is the main processor architecture that we support right now. It required changes in the just-in-time compiler, because we wanted it to generate code compliant with the calling convention that's specified for Unix. Then we had some assembler helper functions that obviously had to be changed. Then there were changes in reflection invocation, because reflection invocation goes through the native runtime, and in P/Invoke, which is a way to call native code from managed code, and in delegate invocation. The differences are in the way that parameters can be passed in registers: on Unix, compared to Windows, you have two more general-purpose registers that you can use for passing parameters, RDI and RSI, and four more floating-point registers, the XMM ones. Obviously, the just-in-time compiler needs to know that and generate the proper code. It needs to know that, for example, the `this` pointer is always the first parameter, so it has to go into RDI, not RCX, and things like that. Then there are different callee-saved registers, which are registers that the callee is required to preserve for the caller: if a caller calls a callee and the callee wants to modify or use some of these registers, it needs to save them and restore them before it returns. There are two fewer callee-saved registers on Unix, because they are actually used for parameters. The biggest difference is in passing structures by value. On Windows, you can only pass structs that are one, two, four, or eight bytes long, and they can be passed only in a single register.
All of the others are passed by explicit reference, where you have a pointer in a register to the structure that's stored on the stack. On Unix, structs up to 16 bytes long can be passed and returned in one or two registers, and it can be a combination of general-purpose and XMM registers. This caused the most changes in the runtime, because it affected a lot of places. It's difficult to explain briefly, but you can imagine that where before you had a single register index for passing a parameter, now you can have two, and they can be interleaved with the floating-point registers; it's a complication. Larger structures are passed on the stack by implicit reference, which means there is no pointer in a register, but rather a fixed offset into the caller's frame, and that's where the structure is stored. Similarly for return values: on Windows, the return value can be in RAX for one-, two-, four-, or eight-byte values, or in the XMM0 register for floats, doubles, or __m128 types. On Unix, you can use RAX and RDX, or XMM0 and XMM1, and you can also return structures in the same way as you can pass them to a function. Obviously, on both systems, if the return value doesn't fit into the registers, the caller allocates space on the stack and passes a pointer to it in the first argument register. Then, runtime suspension is another very important feature, which is needed for the GC. It basically ensures that no thread is running managed code at the time when the runtime is suspended, so that the GC can walk the stack and find the objects that are on the stack, so it knows that those objects, and the objects referred to from them, are still alive. The way it works is that you set some global flag in the runtime that says the suspension now begins, and then there are three ways the actual suspension happens.
One is that there is a barrier at the boundary where managed code calls native code. What this barrier does is that when the native code returns to that place, it just stops there, waiting for the suspension to end. If the thread was not running native code, then we use two other ways. One way is what we call hijacking the return address: we look up the return address of the current function and change it on the stack to point to our own function, which does a similar thing as the barrier — it waits until the suspension is done. For certain functions this wouldn't work very well; just imagine a function with a long loop — you don't really want to wait until the loop ends for the GC to kick in. So for these functions, we redirect the context of the thread immediately to our function. But the function has to be specially prepared for that so that you can actually do it. On Windows, and also on OS X, we use thread suspension for this. But on most Unixes, there is no thread suspension API. The way it works with thread suspension is that we suspend the thread and read its context, meaning the processor registers. We check if it's in native code; if it is, we just let it run, because it will eventually hit the barrier, and it will never walk back into the managed code again. For the other two cases: for hijacking, we just modify the return address and resume the thread, and once it returns from the function, it goes to our function that waits. And for the last case, we really modify the context — the instruction pointer — to point to our function. On most Unixes, as I said, there is no suspension API, so we use real-time signals: we send them to the threads that we need to suspend, and the signal interrupts the thread and runs a handler.
And in that handler, we read the context — it's part of the parameters that we get with the signal — and check whether the thread was running in native code. If it was, we just return from the handler and let it run. If we can hijack the return address, we do that and again return from the handler. And in the third case, we just wait in the handler until the suspension ends. So that's runtime suspension. The other part is the hosting API that we created so that any native application can host the .NET runtime and execute managed code. There was a hosting API in the Windows version, obviously, but it was based on COM, and we figured that wouldn't be a nice way to do it on Linux or on any Unix. So we introduced a simple flat API with just four functions that allow you to host managed code. There's a function for initialization of the CLR — the Common Language Runtime — then one for shutdown, and then you have two functions to execute actual managed code. The first executes an assembly: you give it an assembly, which is basically an executable managed file, and it runs its main function; when that returns, this function returns. That's it. And you can call it multiple times if you need to. The other one creates a delegate; it was added basically on demand from some people on GitHub who said: okay, we have a game engine, we want to use managed code for our game logic, but we want to use native code for the graphics — for the engine itself, obviously. So this last API allows you to create a function pointer to a managed function. It's not literally a function pointer to it; it's a pointer to a stub function that does some work underneath and then calls into the managed function. But you can create as many of them as you want and then call any managed function in your application.
And you can call them as many times as you want, until you shut down the whole system. These APIs use standard C types for parameters, so you don't have to include any weird headers or any weird Windows type definitions and things like that. And when we did that, we decided it was a good idea to use it on Windows as well, so that application developers can use the same hosting APIs if they want applications that work on both systems. So now it's time for a short demo. How much time? 10 minutes, okay, cool. So I hope everything will work. Let me open CentOS here. Okay, I need to make it a little smaller. So let me show you how we can get the CoreCLR — I mean, the .NET Core framework — and how you can compile and run a simple application. Let me create a demo folder here. I have somewhere here the link for the... it seems I'm not connected. Okay, so let me actually skip this installation step. So there is an application, this program; it's a very simple hello world. So tell me, what should I put here, just so that... What? Okay, all right. So now we have this dotnet tool that has several subcommands for compiling, for running, and also for creating a distribution of your application. This is not the only way you can build things, but it's the easiest to start with. So I'll say dotnet compile. I say dotnet run. Oh, sorry, bin. Yeah. Yeah, there is... Here you go. Thank you. Let me go back to the presentation. Okay, so you will be able to get this presentation later, and it has the link to get the tools. We obviously plan to have some better distribution means, so we will create an RPM package with all this stuff. We already have packages for Ubuntu, but for Red Hat — I was hoping for it to be ready for this conference, but unfortunately it was not. So you need to get this tarball and just untar it, and you will get the binaries in there.
And the way I've created... You can create a simple Hello World project: there is this helper command, dotnet new, which creates a simple Hello World program for you. Then you need to restore the project dependencies — the managed assemblies that it needs to use — and it pulls them down from the Internet into a local subfolder. Then you basically do dotnet compile and dotnet run, and you can use dotnet publish, which takes everything that's needed for the application and puts it into a folder; you can then copy the folder wherever you want and run just the application. There is an executable named after the application; you just run it and it works. So this is the end. Now it's time for questions, if you have some; I'll be happy to answer. It depends on what API you emulate. If you emulate GetEnvironmentStrings, there is very little overhead, except for the fact that we have to convert the environment strings from 8-bit characters to 16 bits. But the difficulties are in areas like locks, semaphores, mutexes, or, for example, waiting for multiple events, which are things that pthreads — which we use on Unix for threading — doesn't support. But I cannot really... Well, I should put it this way: we have run some benchmarks of web server code comparing Windows and Linux, and we are basically on par, in some cases a little bit faster on Linux, even with all this. Yeah, we have to maintain some... For virtual memory allocations we maintain some data structures; there is one unfortunate thing, which is that, for example, the virtual memory freeing function on Windows doesn't require you to supply the size of the region, while on Unix you have to supply it, so we have to store it somehow and remember it for all the allocations. That was one of those things. But even for threads we have some bookkeeping that we need to... So, yeah. There was another question, if I may... Yeah. Yeah. Yeah, it was a very limited version. It basically was...
I think it was the stuff that was originally used for Silverlight, and then it was... But what was made public — the GC, as you said, or the jitter — was limited, because at that point we weren't that open-source-friendly a company and we just threw out the... You could take a look at whatever you wanted, but it was no live development. It is. Yeah. Everything in the runtime is exactly the same. There are ifdefs for the full desktop runtime, because it has some additional functionality that is not in the core profile, but it is the same. Because the CoreCLR is quite big — I mean, this is the native runtime, and it has these native functions, I mean the Windows function calls, all over the place, and we still want the same runtime... It's not a Linux port; the runtime on GitHub is the Windows one as well — it compiles for both. So we couldn't just replace them with something else, right? Yeah. You mean using some other effort. We already had that from Rotor and Silverlight. The PAL existed; we just modified it. So when we started porting, we didn't have to create the whole PAL again. I don't think so. I think it works pretty well and there's no reason to, but if anyone from the community wants to change something, they are free to do that. Sure, everyone is welcome to participate. Yeah. Okay. Wine. Well, for managed applications — well, this doesn't have any UI stuff, obviously. It would be kind of weird to try to simulate the Windows UI on Linux, because it would look weird. On the other hand, there are some community people that basically created an emulation of WinForms on Unix, so that it can work on top of that. But Wine is for native code, right? It can run anything, while this is just for managed code. So I don't think it's a replacement in any way. It's mainly for... Okay, so...
Yeah, it would be nice. There is... the SQL connection, the Entity Framework stuff — that exists for SQL Server, and we're also working with the Oracle and MySQL people to make sure there's a .NET Core database driver. And for the application itself, if you go to dotnet.github.io, there's a page there about porting, and there's a tool you can run that you can just point at your code. Because there are some API differences... Well, it's mostly similar in the framework API, but we did take the opportunity to make some changes to the API. So there are some changes there that you'll need to pick up, because it's more modular and some things have moved around. Normally about 9%. It depends on all the dependencies you've got, and whether they have a core version as well. And we've only just got xUnit over, but... So we ran out of time. So thank you very much for your attention. And if you have any more questions, I'll be happy to answer them outside of this room. I'll be here till the evening, so if you see me, don't hesitate to ask; I will be in front of this room. So if you have questions right away, I'll be happy to answer. Thank you. And I have something for the first three questions.