All right, we're ready to start. We will have Thomas, a cryptographer working on BearSSL, amongst other things, and working with NCC Group, doing his presentation now. So please join me in welcoming Thomas. Hello. So this is something that I've been thinking about for 12 years or so, and it's now time to show it. It's about, as the title says, secure programming for embedded systems. So this first requires a good definition of what I mean by embedded systems. I have pictures. On the left, embedded systems; on the right, not embedded systems. On the right, you will notice a smartphone and a home router. For the purposes of this talk, these are big, large, not-embedded systems. I'm not going to talk about these. On the left, you will recognize an ESP32 module: a small microcontroller with some RAM, some flash, and Wi-Fi capabilities. There is an actual smart toaster, a real one — I mean, it's really a connected toaster which can send notifications to your phone over Bluetooth, for some reason. And OK, the kitten is not embedded, but the collar is a GPS tracker: it receives GPS signals to know where it is, and it broadcasts a radio signal to inform the owner — or, more accurately, the property — of the kitten where the cat actually is. So that's also the same kind of system: small, constrained in power — electrical power, computing power, memory, and so on. In practice, what I mean by an embedded system is a small microcontroller, typically from the ARM Cortex-M line or equivalent. Not a lot of RAM — we're talking 64 kilobytes or less than that. Not a lot of ROM, which is usually flash. It has some network connectivity and does not have an operating system; it runs on so-called bare metal. And there are very strong constraints on everything, for instance the computing power, or the physical size of the device, or its thermal dissipation, because you don't want to cook your cat.
And in all of these, the CPU constraints are actually not that important. In a typical application for that kind of device, the CPU is not fast, it's slow, but that's not the issue for the large majority of the code. The RAM constraints are more of a problem. Another example of an embedded system is this badge; it's typically the class of hardware I'm talking about. So these constraints have a lot of consequences on what is called software security. The first one is that you don't have a memory management unit, so there is no protection on pages of RAM — in fact, there is not enough RAM to have pages. All the RAM can be read by the code; all the RAM is accessible, can be read and written. There's no isolation. And that has some consequences. For instance, take a null pointer dereference. On a normal computer or a smartphone, if you try to dereference null, you get some sort of exception — segmentation fault, general protection fault, something. On that kind of hardware, you just access RAM at address 0, and it just works, if there's RAM there. Or if there's no RAM, the write goes nowhere — but it does nothing else, it does not interrupt your application. You don't have address space layout randomization, because there's no MMU to actually randomize things. You don't have guard pages against stack overflows. Normally with C code on big computers, if you allocate too much on the stack, you get a general protection fault, and so on. No such thing here. So this implies that if you want to develop code for that, you have to abandon hope of using general recursive algorithms, where a function calls itself over and over, because these allocate frames on the stack, and it's very hard to limit how much space that will consume depending on input data from the outside. This means that if you have a recursive algorithm, you usually have a possible stack overflow, which here is an overflow into the rest of RAM. So it's a good attack opportunity.
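As an illustration (mine, not from the talk): instead of a recursive descent whose depth is driven by attacker-controlled input, embedded parsers typically use an explicit iteration with a hard depth cap, so the worst-case stack usage is a known constant. A minimal C sketch:

```c
#include <stddef.h>

#define MAX_DEPTH 8  /* hard cap, chosen arbitrarily for this sketch */

/* Count the maximum nesting depth of '(' ... ')' in a buffer, refusing
   input that nests deeper than MAX_DEPTH instead of recursing into it.
   Returns -1 on over-deep (hostile) input. */
int nesting_depth(const unsigned char *buf, size_t len)
{
    int depth = 0, max = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '(') {
            if (++depth > MAX_DEPTH) return -1;  /* bounded, no recursion */
            if (depth > max) max = depth;
        } else if (buf[i] == ')') {
            if (depth > 0) depth--;
        }
    }
    return max;
}
```

Any input nesting deeper than the cap is rejected up front, instead of silently eating an unbounded amount of stack.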
Another consequence is that you really do not have room, and in particular no room for large stacks or for multiple stacks. If you have a device that must do several things in parallel, you cannot just run threads, because you don't have an operating system to schedule threads — and even if you write one, you don't have the RAM to run several concurrent threads. Realistically, if you're doing C code for small microcontrollers, you need something like 4 kilobytes per thread for its stack, and if you have 32 kilobytes of RAM in total, you cannot afford a lot of these. And it turns out that the C language, which is the usual language for that kind of endeavor, has a tendency to consume more stack space than you would like — and it does not tell you. I've got an example. This specific function is from the source code of last year's badge for NorthSec, so it's an embedded system. This function allocates on the stack a 256-byte array, which is used later on to assemble a message. There are two things I want to point out here. One is that the 256 bytes are allocated when the function is entered, but they're not used while the first three function calls — graphics calls that fill the screen, set the cursor, and set the text background color — are made. And yet they are allocated. So this common idiom in C programming, of simply declaring variables at the start, implies some unnecessary memory consumption when calling these sub-functions. For the other one, let's have a look at the compiler output, which is here. I've compiled it, then disassembled it, and that's ARM code in Thumb mode. At the start, in red, are the two stack allocations. The first one saves a bunch of registers, because the calling convention says that these registers should be saved by the callee, not by the caller. That means 24 bytes just for the six registers which are stored. And then there is an extra allocation for the 256-byte array, which turns out to be 264 bytes. Why 264?
Because when it's going to call the sprintf function, it will pass six parameters to that function, and per the ABI, the first four go in dedicated registers, but the next two go on the stack. So the C compiler not only allocates the RAM right from the start, even though it knows the array won't be used at the start of the function, only later on, but it also allocates more than was asked for. That's the kind of transformation that the compiler does because it's usually a good idea on big machines, but on small microcontrollers, it's more of a problem. So, there are several languages for embedded development. Usually it's C. Why C? Because it works everywhere. Every microcontroller vendor also provides a development kit which includes a C compiler, usually GCC or something based on LLVM — I have seen things based on other compilers, even Visual Studio. C works everywhere, but it has some hidden automatic costs that are hard to control; in fact, it's hard, in your source code, to keep strict control of how much space is allocated on the stack. It's well known that it's not memory safe at all. There are no checks on array accesses. There are raw pointers. There is manual allocation and deallocation when you use malloc, so you can have double frees, use after free; you can leak memory. There is a type system, but it's not strongly enforced, so you can do type punning: you take an object in RAM and you reinterpret the bytes as something else. And then you have a whole lot of aliasing issues, which means that some code which works may cease to work when you optimize more aggressively or when you change the compiler version. It's a real mess. And still, you have to support some sort of C, because it's a C world, and the SDK you have, which provides the necessary code for accessing the hardware, usually offers a C API. So whatever you do, you have to interoperate with C. Another candidate is Java Micro Edition.
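A common mitigation for the pattern just described — sketched here in C with hypothetical names (gfx_prepare and format_and_draw are mine, not the badge's actual API) — is to push the large buffer into its own function, so the 256 bytes are only live while the sprintf-style call runs. Note this is only a partial fix: an inlining compiler may merge the frames again, so in practice a noinline attribute may also be needed.

```c
#include <stdio.h>

/* Hypothetical stand-in for the badge's preliminary graphics calls
   (fill screen, set cursor, set colors). */
void gfx_prepare(void) { /* ... */ }

/* Isolating the large buffer in its own function means the 256 bytes
   are on the stack only while format_and_draw() runs, not during the
   earlier graphics calls. Returns the formatted length. */
int format_and_draw(int value)
{
    char tmp[256];                       /* large frame lives only here */
    int n = snprintf(tmp, sizeof tmp, "value = %d", value);
    /* ... hand tmp to the display driver ... */
    return n;
}

int show_value(int value)
{
    gfx_prepare();                       /* small frame during this call */
    return format_and_draw(value);
}
```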
It's an old thing, which was running on phones back when phones were not smart — I mean, that's like 20 years ago. It has a garbage collector, it has strong types, it is known to be memory safe. On the right is a screenshot from Oracle's site, because Oracle owns Java now. And it says that if you want to use Java, you need at least 128 kilobytes of RAM and one megabyte of flash — and you still need to provide an operating system on top of that. And even that is optimistic: there is a footnote that says the actual footprint will vary based on target device and use case. So it's like the mileage on a car — it may change. Usually, you cannot use that on the smaller microcontrollers; it's just too big. Another candidate is a fashionable language: Go. If you try to compile a hello world in Go on your computer, you will get something like a three-megabyte binary; for some reason, it turns out to be large. But there are people working on something called TinyGo, which pretends to be Go, but for microcontrollers. And it has, right now, some limitations. Officially, the support for goroutines — which are the threads of the Go world — is weak, which is a euphemism for saying that it does not work. And you can have maps — I mean, if you do some Go development, you use a lot of maps; it's a very basic data structure, just to organize things. In general, you can have up to eight entries in a map in TinyGo, no more. So it's kind of restrictive. I mean, I don't want to blame the people who are writing TinyGo; they're trying to do something which is not really feasible, so they do their best. But it's not because it's Go that it's magical — the RAM constraints are still there. They even have a garbage collector which works on ARM and on other platforms: it allocates, but it does not collect. So memory just accumulates.
Okay, it's not easy to write portable code which does garbage collection, but starting with Go does not make things easier. It's a language which has been designed for larger systems and which tends to assume that there is space — and on this hardware, there's no space. Another fashionable language: Rust. There is "embedded Rust"; they really intend it. I mean, there's a dedicated website, a lot of documentation, and they are running specialized conferences and so on. It's supposed to be as memory safe and as safe for everything as Rust in general. You can have a heap, or you can work without a heap, but then it still has some stack greediness, like C, and even more so, because it wants to do a lot of magic things. And the magic is the selling point of Rust — and it's exactly what I do not want. I want to have a good notion, when I develop, of where my stuff is, my data, sometimes cryptographic keys. I don't want data to be automatically copied because the compiler decided that it was necessary to make a copy just to maintain the strict "sharing but no concurrent modification" semantics of Rust. Or, sometimes, the compiler will unroll loops, because unrolling loops gives speed, and speed is good — but it uses a lot more code space, and it can imply a lot more stack allocation. So it's a good effort, but then again, starting with Rust will not magically provide a better match with the RAM constraints. It provides the memory safety of the language, but it does not solve the issue of memory usage. Now, this one — Forth — is very old. It's from the early 1970s; it's older than me, which is already quite a feat. And it's more a philosophy than a language. If you talk to people who are doing something with Forth, they tell you that you are not supposed to use the language, you are supposed to reimplement it.
And it actually works, but you end up with something which is very close to the hardware, very non-portable — and you have absolute control of everything that happens in RAM, so you can save a lot there. As far as safety is concerned, it's less safe than C, which is again quite a feat. So I take it as an inspiration for a lot of concepts, but certainly not a thing to use as-is. So given the state of existing programming languages, I just told myself: okay, they all suck. So I'm going to design my own programming language, with my own compiler, just to do better than the whole world — if it works. And the language I need must be able to interoperate with C. I must be able to compile it to some sort of C code, so that it's portable right away. It must be non-magic, a bit like Forth and very unlike Rust, but I want it to be memory safe, like Rust and very unlike Forth. And it needs to do something differently from other languages, because I want to address the huge RAM constraints, and existing languages are just not designed for that. So, I've got a success story with something which I call T0, which is the predecessor of T1. You may have heard that I've written an SSL library called BearSSL, which is optimized for embedded systems. It does SSL/TLS with a lot of features. It supports 45 cipher suites. It is very small, both in RAM and in ROM, but it supports the whole protocol. It's supposed to be uncompromising with regard to security, and so on. So it's something that works, so far. It has no dynamic memory allocation — there's no malloc call. In fact, there are only three external function calls, which are memcpy, memmove, and strlen; it requires nothing else. It does not know what an operating system is. It is, in fact, a computational library that does not know what a network socket is.
It expects you to provide the bytes which have been received from the network, and then it does the SSL magic and tells you what's in there. And it runs as a coroutine. A coroutine is like a thread, except that it does not run concurrently. It has a state in which it knows where it is within a complete handshake — or, inside the handshake of SSL, where it is within the validation of the X.509 certificate chain sent by the server. And when it expects incoming bytes, it can be interrupted, and then you can jump back into it, just where it was, when the new bytes are available. I've got a schematic here. On the left is what your application is supposed to do. The application is supposed to do all the Wi-Fi and IP stuff, and also provide the application data which is to be sent through the tunnel, and retrieve — and do something with — the application data that's been obtained. And then the whole of BearSSL runs as a sort of state engine, into which you input bytes and which gives bytes back. It does not perform network calls, it does not block; it just tells you: okay, at that point I have some bytes to send, here they are, call me back when the bytes are gone. Internally, it uses other coroutines: one for parsing all the handshake messages of TLS, which are complicated. And that one will itself use another coroutine, which parses the certificate chain from the peer — from the server, typically, to the client. And the certificate chain can be up to 16 megabytes, as per the standards. It's uncommon, but even OpenSSL, for instance, allocates up to 100 kilobytes. And I don't want to allocate 100 kilobytes — I don't have 100 kilobytes. So I must parse everything in a streaming fashion. And this requires the X.509 parser to be able to absorb bytes as they come, and to give me back control of the CPU when the next bytes are not there yet.
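The streaming idea can be illustrated with a toy incremental parser in C (my sketch, far simpler than BearSSL's actual decoder): it consumes a 2-byte big-endian length header and then the payload, in whatever chunk sizes the network delivers, keeping only a few words of state between calls.

```c
#include <stddef.h>
#include <stdint.h>

/* Explicit parser state: the caller can feed bytes whenever they
   arrive and regains control in between, coroutine-style. */
typedef struct {
    int hdr_got;        /* header bytes seen so far (0..2) */
    uint32_t length;    /* decoded record length */
    uint32_t consumed;  /* payload bytes consumed so far */
} rec_parser;

void rec_init(rec_parser *p) { p->hdr_got = 0; p->length = 0; p->consumed = 0; }

/* Feed a chunk; returns 1 when a full record has been consumed, 0 if
   more bytes are needed. *used reports how many bytes were taken. */
int rec_feed(rec_parser *p, const uint8_t *buf, size_t len, size_t *used)
{
    size_t i = 0;
    while (i < len && p->hdr_got < 2) {         /* absorb the 2-byte header */
        p->length = (p->length << 8) | buf[i++];
        p->hdr_got++;
    }
    while (i < len && p->consumed < p->length) { /* absorb payload bytes */
        p->consumed++;                           /* real code would process buf[i] */
        i++;
    }
    *used = i;
    return p->hdr_got == 2 && p->consumed == p->length;
}
```

The whole record is never buffered; the state fits in a dozen bytes no matter how large the record is.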
So I need to have, in fact, several coroutines, and each coroutine, being like a thread, would need its own stack — and I don't have the RAM for the stacks. So here comes the magic. The magic is to write my own Forth. I told you that I don't want to do that, and yet I did — but I changed a few things. For instance, everything in Forth is about doing your development on the target system. I mean, it was designed by an astronomer who wanted to be able to patch his code at night, on his telescope — a 1970s telescope, so that was not a powerful machine. I don't do that. Instead, I have a separate compiler, which is implemented in C#, and which runs on a big system where I have RAM, where I have a lot of computing power, where I can do a lot of optimizations; it's a separate compilation step. Then, as in Forth, it compiles to something which is called threaded code. It's not native code, and it's something I'm going to detail — if you remember only one thing from this talk, it should be how to write threaded code. It's a very nice way to generate virtual-machine interpreted code which is relatively efficient and which is extremely compact in ROM. In that virtual machine, there are two stacks: one for handling function parameters, and the other for handling local variables of functions and instruction pointers. And these stacks are very small. In fact, right now, I limit the stack size to 128 bytes each, so that's small. And I can actually guarantee, with the compiler, that they are never overflowed. In total — I have figures afterwards — I use 168 bytes of stacks, and 168 bytes, I can allocate that; I can afford that. So here's something. This is T0 code, so it looks like Forth, and if it's the first time you see that, you should be slightly disgusted and confused. But it actually makes sense.
It's a postfix notation, which means that when you are doing operations, you put the operands first and then the operation that acts on them. So you don't compute "one plus two": you push one, then you push two, then you do a plus on both. And everything happens from left to right, so the order of operations is very clear — that's part of the no-magic. Here, you have three functions. In gray are comments. The ones in parentheses are not just comments: they are also a bit of a type system which I've implemented in T0. This one says that the function read8 expects one value on the stack on entry and returns two values. All values are 32-bit words. The names I use there are just comments, but the compiler knows that this function should return one more value than it had at the start, and it verifies that. So here, read8 — what it does is that it tries to read one byte, and the value it has on input is the maximum number of bytes it can still read. That's for parsing nested structures in certificates, where each sub-element has a specific length: even if there are more bytes in the certificate after that, you must enforce that you're not reading bytes beyond that limit. So here, it duplicates the limit, and the conditional compares the copy with zero, consuming the duplicate. If the limit is zero, then this code will be executed; otherwise, it will jump to just after the then. And this code is basically: that's a problem, fail. It's an error code, whose value is 36, and it just stops everything — everything in that coroutine. Otherwise, what does it do? It subtracts one from the current limit, and then it calls another function, which is not shown here, which does the actual byte reading. Underneath are two functions which use read8 and want to read two bytes and interpret them as a 16-bit value, in big-endian or in little-endian — because, of course, both are used in certificates and in TLS, so you have to have both.
And what you must remember from here is that, in that kind of code, executable code is mostly a sequence of function calls. Sometimes you push a value — like the 8 here, it pushes a value — but about everything else is a call to a sub-function. So threaded code is about representing each interpreted function as a sequence of pointers to the structures that represent the called functions. Here it is — the worst schema you'll see today, and everything is in there. First, I have the two functions which I just showed, read8 and read16be, and I'm concentrating on the second one. On the left, I've represented what it does; it's just a symbolic representation showing that it basically calls other functions: read8, left-shift, swap, read8 again, rot — which is something that rotates the three top values on the stack — and plus, which adds two values. And it has two special words: const, which just pushes a constant value, and ret, which exits the current function. Then, next to it, is the representation in memory of that function. Each box is a pointer, or a pointer-sized word. Each call is actually a pointer to another structure in memory, which is again a sequence of pointers, and whose first element is a pointer to native code — C code, assembly, whatever — that executes the function. So let's first look at, for instance, const. Here, the read16be function has a pointer to const. The interpreter, which runs the virtual CPU, is just a loop. It has an IP, an instruction pointer — you can imagine it as a dedicated register, or a variable in RAM; it's simply a pointer to the next instruction to execute. It reads the pointer to const into a scratch variable called W. Then it follows that pointer, and at the first slot of the pointed-to structure, it expects a pointer to some native code, which is run.
So here it will just run const, which uses the instruction pointer to read the next slot — which contains the 8 — and push it on the data stack. Afterwards, it just loops. From the C point of view, you're not accumulating function calls; it's flat, it does not allocate stack space. Now, for a normal function which is itself interpreted, it does the exact same thing. When it reaches read8, it sees a pointer to the structure, and the first field of that structure is a pointer to code — and what that code does is the magic of the threaded code: it saves the current instruction pointer on the system stack, and then changes it to point to just the next slot. So it's calling a function: remembering, on the system stack, where it was, and modifying the instruction pointer to point to the code of the called function — which is here read8, which does its stuff. And at the end of read8, when it reaches the ret, the ret just restores the saved instruction pointer. So when you run that small loop, which is basically your virtual CPU — from the C point of view, just a single loop that does not call anything — then by virtue of reading these functions, following the pointers, and executing things, it's really running the code, and the function calls, and the functions that those functions call, and so on. This is called indirect threaded code, because there are two indirections for each instruction. It's not especially fast, but it's very simple. And with that kind of technique, you can have complex code which allocates very little space on your stack. Here there are two stacks, which are manually implemented — they are not the system stack. There is one where you put the instruction pointers, and each is just four bytes; you know exactly how much space you consume on it. There's no compiler automatically saying: oh, I need some extra space to put parameters for a function, and allocating more.
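Here is a minimal, self-contained C sketch of that indirect-threading scheme (my reconstruction, not BearSSL's actual source; the word names are invented): a flat loop reads a pointer into W, runs the native code it designates, and enter/ret manage the manual return stack.

```c
#include <stdint.h>
#include <stddef.h>

/* A word is a pointer to native code, plus (for interpreted words) a
   body: a sequence of pointers to other words. */
typedef void (*native_fn)(void);
typedef struct word { native_fn code; const struct word **body; } word;

static int32_t dstack[32]; static int dsp;       /* data stack */
static const word **rstack[32]; static int rsp;  /* return stack (saved IPs) */
static const word **ip;                          /* instruction pointer */
static const word *w;                            /* word being executed */
static int running;

static void push(int32_t v) { dstack[dsp++] = v; }
static int32_t pop(void) { return dstack[--dsp]; }

/* Primitives: the native code does the work directly. */
static void do_add(void)   { int32_t b = pop(); push(pop() + b); }
static void do_const(void) { push((int32_t)(intptr_t)*ip++); } /* literal in next slot */
static void do_ret(void)   { ip = rstack[--rsp]; }             /* back to caller */
static void do_halt(void)  { running = 0; }

/* The magic of threading: entering an interpreted word saves IP on the
   manual return stack and points IP at the word's body. */
static void do_enter(void) { rstack[rsp++] = ip; ip = w->body; }

static const word const_w = { do_const, NULL };
static const word add_w   = { do_add,   NULL };
static const word ret_w   = { do_ret,   NULL };
static const word halt_w  = { do_halt,  NULL };

/* An interpreted word "add5": push 5, add, return. The literal 5 is
   stored directly in the body, as const's operand. */
static const word *add5_body[] = {
    &const_w, (const word *)(intptr_t)5, &add_w, &ret_w
};
static const word add5 = { do_enter, add5_body };

/* The virtual CPU: a single flat loop; the C stack never grows,
   whatever the call depth of the threaded code. */
static int32_t run(const word *entry)
{
    const word *boot[2] = { entry, &halt_w };
    ip = boot; rsp = 0; running = 1;
    while (running) { w = *ip++; w->code(); }
    return pop();
}
```

Each nested call costs exactly one slot (four bytes) on the manual return stack, and nothing on the C stack.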
That kind of thing does not happen here. And the data stack is where all your values — such as the 8 — go, and this one, again, is something where you know exactly how many values you push on it, because that's where everything goes. Then there is an extra trick that you can play on top of that. What I just explained is indirect threaded code; token threaded code is the same thing, except for this: a function was a sequence of pointers, and we're talking about 32-bit pointers, four bytes for each pointer — that's a lot, and we want it to be more compact. In practice, when you have a number of functions that call each other, you don't have four billion such functions; you have a relatively limited set of functions. In my X.509 decoder, I have 200-something. And 200-something is, in fact, less than 256, so I can fit each pointer in one byte, with an extra indirection: instead of having pointers, I have single bytes, and each byte is an index into a table which contains the actual pointers. At that point, each instruction is basically one byte. And that saves a lot of code space, at the expense of a bit more CPU — but as I said, that's not really an issue. So, when I compile my code: the compiler is written in C#; I run it on Linux with Mono, and it works, it just works. Here, I give it two T0 files to process, and it tells me: okay, the maximum stack usage is 17 words for the data stack and 25 words for the return stack — the return stack is where the instruction pointers go. So a total of 42 words, which is 168 bytes. It also tells me the size of all the code that implements the X.509 certificate validation: with the decoding of the names, which are deeply nested structures, and the decoding of public keys, and the decoding of the validity dates — it implements all the rules of the proleptic Gregorian calendar; you don't want to know what that means. And it does all the transcoding: it can extract names which are in UTF-16 or UTF-8 or Latin-1 and so on.
And it transcodes everything to UTF-8, verifying that it's really valid; it explores the subjectAltName extension; it explores the basic constraints. It does a lot of stuff, and it does all that in 2836 bytes. There's a bit of extra C code, but the complete compiled thing fits in about six kilobytes. So, to show you how it looks — where did I put that? — okay, this is the code which has been automatically generated. It contains a lot of C code; these are basically interfaces between the T0 world and the C world, and this part I wrote manually. And then here, these are the instructions. The actual code written in T0 is there; this has been computed by the compiler. So it's basically bytes. There are some symbolic values, but with a macro that says that, for instance, this one is just a single byte. So these are the 2836 bytes. Then this is the indirection table for the tokens: token zero is the function at address 0, then address 5, address 10 — these are offsets within the other table, of where each function's code starts. So you've got all of them here. And I am skipping the macros. Here is the instruction pointer — it's just a local variable. And the virtual CPU here is just a for loop. What it does is read the next token. If the token is below a computed value, it's a call to a primitive, a native function, and then it's a switch: that's where I put all the small pieces of code which go to the C world — I've got 60-something of them here. Otherwise, it's an "enter": it means, okay, it's an interpreted function, so it does what was shown on the previous slide as enter — it just enters the next function. So there's only a single C function which really interprets all the code. And what that means is that I can do some complicated processing while saving a lot of RAM. I know exactly how much RAM I'm using, and it's not a lot. And I have strong guarantees.
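The shape of that generated interpreter can be sketched like this in C (a simplification of what was just described; the token values and names are mine, and only the primitive/switch half is shown — real T0 output also dispatches higher tokens to interpreted functions through the offset table):

```c
#include <stdint.h>

static int32_t ds[32]; static int sp;    /* data stack */

/* Token values are assumptions for this sketch. */
enum { T_LIT, T_ADD, T_MUL, T_HALT };

/* Token-threaded interpreter: each instruction is a single byte
   indexing a dispatch, instead of a full 4-byte pointer. */
int32_t run_tokens(const uint8_t *code)
{
    const uint8_t *ip = code;            /* IP walks one byte at a time */
    for (;;) {
        switch (*ip++) {
        case T_LIT:  ds[sp++] = (int8_t)*ip++; break;  /* next byte = literal */
        case T_ADD:  sp--; ds[sp - 1] += ds[sp]; break;
        case T_MUL:  sp--; ds[sp - 1] *= ds[sp]; break;
        case T_HALT: return ds[--sp];
        }
    }
}
```

A whole program is just a byte array, which is why the real thing fits thousands of instructions in a few kilobytes of ROM.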
I have strong guarantees, which are computed and verified at compile time, on stack usage. And since there's no malloc call, I know that I won't leak memory, and I know it won't overflow its stacks. So it has some sort of memory safety — but not everything. Not everything, because it works, but it's not actually memory safe. In fact, two and a half years ago, there was an actual buffer overflow in that code, on a malformed certificate which was found with a fuzzer — which was relatively impressive, because it means that a fuzzer can actually follow that kind of code automatically. It was a type confusion: I was decoding a value as a 32-bit unsigned integer, and then interpreting it as a signed integer to check that it was not larger than a buffer size. And of course, three-billion-and-something is negative as a signed 32-bit value, so it was "not larger" than the buffer — but when trying to read three-billion-and-something bytes into a buffer, it was a bit large. So at that point I thought: okay, it works, but I need something better. Why do I need something better? For a number of reasons, including that one. And also, I want to implement TLS 1.3, and TLS 1.3 has a few awkward characteristics. One of them is that there is no longer any guarantee that the server will send the certificate chain in the right order; in fact, it may explicitly send extra certificates, and do that in any order. So now I cannot really do TLS 1.3 if I just process certificates as they come — I have to buffer them. So I need some memory allocation, which I want to control. And I want to be memory safe, so I will need a garbage collector. And because of the initial issue I had with T0, I also want a type system. When I had chosen the name T0, I had chosen it for three reasons. One, it was free: there was no language called T0, as far as Google knows. Two, the first letter is the first letter of my name, so that's good.
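The bug class is worth spelling out. Here is an illustrative C sketch (mine, not the actual BearSSL code); note that the unsigned-to-signed conversion is formally implementation-defined in C, but on the two's-complement targets in question, three billion becomes a negative number and slips past the signed comparison.

```c
#include <stdint.h>

#define BUF_LEN 64  /* hypothetical buffer size */

/* The bug class described: a length decoded as unsigned, then compared
   as signed. 3000000000 is negative as int32_t, so the check passes
   and the subsequent copy overflows the buffer. */
int length_check_buggy(uint32_t decoded_len)
{
    return (int32_t)decoded_len <= BUF_LEN;   /* WRONG: signed compare */
}

int length_check_fixed(uint32_t decoded_len)
{
    return decoded_len <= BUF_LEN;            /* unsigned compare */
}
```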
And three, if I name something T0, I have room for subsequent versions: T1, T2, and so on. So here is T1. T1 is a work in progress — it's not complete yet, but there's already enough that I can show you without shame. So what is it? It's like T0, so it keeps the beautiful Forth-like syntax, which actually has some advantages: you can do generic metaprogramming with it. It's something which comes from the Forth world: as the source code is being processed, the source code can just hijack the process right away. That is, you can define a function which is invoked as the compiler runs, and which can then take over the parsing of the remaining source code, or give it back. So any domain-specific language you want, any syntax you want, can be implemented from within the language itself. In fact, the loop constructs, the equivalents of the switch or the for of C, are done in the language itself, just by hijacking the source code and reinterpreting it. So it's something where you can define your own syntax on the go — which is abominable when you work with coworkers, but extremely powerful when you're alone. So I want something like that, but also memory safe, so it should check array bounds. It should also have controlled memory allocation, with an automatic memory management that protects against double free or use after free; there should be no explicit free, and that means that there should be a garbage collector. And it can work, because a garbage collector can in fact be very compact in RAM. It can even help, because if you have a language with a rich, strong type system, then it's unambiguous whether each word in RAM is a pointer or not, which allows the garbage collector to move objects in RAM — which means that it does not suffer from fragmentation. And if you try to do C-style malloc and free as you go, in a small amount of RAM, fragmentation is going to kill you.
Fragmentation is when your memory becomes Swiss cheese: it has holes, a lot of small holes, but no hole big enough for the new allocation you want to make. If you can squeeze the holes out, just by moving the objects and compacting them, then you can defeat fragmentation. A garbage collector can give you that, so I want that. I want a rich type system, to avoid all the type punning, which was an issue. I also want some sort of object-oriented support. Object-oriented programming is not really a language feature; it's a development philosophy, a way of designing and architecting your code. And the language can provide a few facilities which help with that. If you try to do object-oriented programming in C, you end up with function pointers — which works, but which is absolutely not comfortable. So I wanted something a bit easier to manage. I also wanted some extra features to make it a really good general-purpose programming language: some sort of namespaces, and the possibility of splitting an application into several modules that talk to each other. Memory safety — I've talked about it several times already, but I've not told you what it means. It means a lot of things, and you never get all of them. For instance, you want no uncontrolled type punning. You want no buffer overflows, of course. In the context of dynamic memory allocation, you don't want use after free or double free. Another facet of memory safety is that you want guarantees against stack overflow, since on small hardware you cannot detect it with an MMU — you don't have an MMU, you can't have the guard page. So you want some guarantee from the compilation system. You would like some guarantee against over-allocation, because you really don't have an infinite heap. You don't want memory leaks. As part of memory safety, some also count the prevention of concurrent writes. Rust does that.
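To make the compaction idea concrete, here is a toy C sketch (mine, not T1's actual collector): allocations are referenced through handles rather than raw pointers, so the "collector" can slide live blocks together and fix up the handle table — the same squeezing-out-the-holes move a real compacting GC performs using its type information.

```c
#include <stddef.h>
#include <string.h>

#define HEAP_SIZE 256
#define MAX_OBJS  16

static unsigned char heap[HEAP_SIZE];
static size_t top;                       /* bump pointer */
static struct { size_t off, len; int live; } objs[MAX_OBJS];
static int nobjs;

/* Allocate len bytes; returns a handle (index), or -1 if out of room. */
int gc_alloc(size_t len)
{
    if (top + len > HEAP_SIZE || nobjs == MAX_OBJS) return -1;
    objs[nobjs].off = top; objs[nobjs].len = len; objs[nobjs].live = 1;
    top += len;
    return nobjs++;
}

/* Mark dead; the space is only reclaimed at compaction time. */
void gc_free(int h) { objs[h].live = 0; }

/* Slide all live objects down and update their handles: the holes
   ("Swiss cheese") are squeezed out, and top drops accordingly. */
void gc_compact(void)
{
    size_t dst = 0;
    for (int i = 0; i < nobjs; i++) {
        if (!objs[i].live) continue;
        memmove(heap + dst, heap + objs[i].off, objs[i].len);
        objs[i].off = dst;
        dst += objs[i].len;
    }
    top = dst;
}
```

With 256 bytes of heap, allocating two 100-byte objects, freeing the first, and then asking for a third 100-byte object fails — the free space exists, but as a hole. After gc_compact(), the same request succeeds.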
In Rust, if you try to write Rust code, you will meet your friend, the borrow checker. The borrow checker is the enemy at the start; I mean, you really fight the borrow checker. That's how they present it. And the documentation of Rust says that in the end, at some point, you just give up. You cease to fight; you embrace the borrow checker, which means that it still stings, but somehow you have admitted that it's normal. And it's part of the memory safety of Rust, but, for instance, it's not part of the memory safety of Go or Java, which do not have this feature, and which are still said to be memory-safe languages. So memory safety is an aggregate term for a lot of properties, and you will get some of them in any given language. And if you get enough of them, you can pretend that it's memory safe. It's never absolutely safe, but you want some of this. So in T1, I try to achieve memory safety with the maximum of compile-time checks, because compile-time checks happen on my machine, where I can change the code. If there's a problem, I can fix it, and it's a powerful machine. I want to have the fewest possible runtime checks, because runtime checks, when they fail, happen on the device which is deployed, so it's too late to fix things: the device can just throw up its hands and give up. The runtime checks which are unavoidable are the checks on array bounds. In full generality, you cannot prove that arbitrary code will not try to access an array beyond its boundaries. Sometimes you can, but you still need some dynamic checks. And automatic memory management is also a very dynamic system, because you cannot, in all generality, compute all the allocations and deallocations that should happen at exactly the right time. If you could do that, you would have solved the halting problem, and the halting problem is undecidable. So instead, you need some runtime system, which is called a garbage collector.
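One of the compile-time guarantees the talk asks for, a bound on worst-case stack usage, becomes computable exactly once recursion is forbidden: the call graph is then a DAG, and the bound is its longest path weighted by frame sizes. Here is a hedged sketch in Java, not the T1 compiler's actual code; the function names and frame sizes are invented.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Bounding worst-case stack usage at compile time: with no recursion,
// the call graph is a DAG, so the worst case is the longest path,
// computed here by a memoized depth-first traversal.
public class StackBound {
    // Stack frame size (bytes) of each function, and its callees.
    static Map<String, Integer> frame = Map.of(
        "main", 32, "parse", 64, "emit", 48, "util", 16);
    static Map<String, List<String>> calls = Map.of(
        "main", List.of("parse", "emit"),
        "parse", List.of("util"),
        "emit", List.of("util"),
        "util", List.of());
    static Map<String, Integer> memo = new HashMap<>();

    static int maxStack(String f) {
        Integer cached = memo.get(f);
        if (cached != null) return cached;
        int worstCallee = 0;  // deepest stack any callee can reach
        for (String c : calls.get(f)) worstCallee = Math.max(worstCallee, maxStack(c));
        int total = frame.get(f) + worstCallee;
        memo.put(f, total);
        return total;
    }

    public static void main(String[] args) {
        // main -> parse -> util is the deepest chain: 32 + 64 + 16.
        System.out.println(maxStack("main"));  // 112
    }
}
```

A recursive function would make this graph cyclic, and no finite bound would exist, which is exactly why the language rules recursion out.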
And it may fail with an out-of-memory exception or equivalent when it just cannot cope, but usually it works well. As compile-time checks, I want the maximum stack size. I want some sort of escape analysis, because I want to be able to allocate some objects on the stack and be sure that they won't be accessed after that space has been released; this is called escape analysis. It's important for safety, but it requires some sort of smartness from the compiler. I want all method lookups to be resolvable, because all my function calls become method calls which depend on the runtime types of the objects and values I'm handling. So I want the compiler to make sure that I will not fail with an exception that says: no, for these objects, there is no callable method. I don't want that to happen at runtime. And I also want to have some statically allocated constant objects which will go in flash. So the compiler should make sure that at no point am I trying to write data into flash, because the write won't actually happen. So this should be prevented, and I can't have a dynamic check for that, because I don't have an MMU. So here is an example of Java code, to explain what I mean about object-oriented programming and why, in that aspect, Java technically sucks. On the left is Java code. You've got two classes called A and B, where B extends A, and each of them defines two methods called foo, one taking an A parameter and one taking a B parameter, and each has a print statement that says exactly which one you've called. And then, at the bottom, you've got something which has two local variables of type A, but filled with references to objects of type B. It's correct, because B extends A, so you can write a reference to an object of type B into a variable of type A. And then it's calling, on one of them, the method foo with the other as parameter. And this will print "foo(B, A)". To decide which class's foo method is called, it uses the runtime type of what is in x.
It's a variable of type A, but it's really an object of type B, so it's calling a method of B. But to decide which of the two methods in B it's calling, it uses, this time, the static type, which is the type of the variable, and not the type of the object which is in it. And you have that in Java: you have two type systems for method lookup. One uses the runtime type, and the other uses the static type computed from the expressions. And it mixes them, and when you write code, you have to live with that. And I don't want to live with that; I want something better. So here is the equivalent in T1. I'm defining a structure A, and a structure B which extends the structure A; in fact, which is a subtype of A. There's a subtlety, but anyway. I define the four methods, and they are defined outside the structures. This one says: foo is called if, at that time, the stack contains two objects of type A, and then it should do that: print a string. Then the three other methods, for all combinations of A and B. And here I'm calling new to create some objects of type B. I'm writing them into local variables, just to mimic what happens with the Java code. And the local variables don't have a type associated with them, because there is no static type analysis on the expressions. There's no notion of "this variable has type A". It's a local variable which, at this point, will contain an object of type B. And then, when the foo method is called, it will use the runtime type for both parameters. There's nothing special about the first parameter. So it's more generic object-oriented programming than Java or C# or almost any other programming language out there. I mean, it's like I've decided to be more Javaist than Java. And here, I say there is no explicit static type analysis. However, I want the compiler to do everything at compile time. So, in fact, there is a lot of static type analysis, but it's not explicit.
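The Java example discussed above, written out in full: overload selection among the two foo methods uses the static type of the argument at compile time, while virtual dispatch on the receiver uses its runtime type, so the call prints "foo(B, A)". (The methods return strings here so the result can be checked; the slide's version prints directly.)

```java
// Java uses the STATIC type of the argument to pick among overloads
// (at compile time), but the RUNTIME type of the receiver for virtual
// dispatch. Mixing the two gives the surprising "foo(B, A)".
class A {
    String foo(A x) { return "foo(A, A)"; }
    String foo(B x) { return "foo(A, B)"; }
}

class B extends A {
    @Override String foo(A x) { return "foo(B, A)"; }
    @Override String foo(B x) { return "foo(B, B)"; }
}

public class Dispatch {
    public static void main(String[] args) {
        A x = new B();  // static type A, runtime type B
        A y = new B();
        // Receiver x: runtime type B is used, so B's method runs.
        // Argument y: static type A is used, so the foo(A) overload is selected.
        System.out.println(x.foo(y));  // prints foo(B, A)
    }
}
```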
I want the compiler to be able to work out, when it sees that code, that when it's calling foo, the two objects will have type B and there is a matching method which will be called; and preferably even to work out at compile time that this will be that foo method only, so that it will be a direct call, and the three other methods won't even be included in the generated code, because they are not called. So, I had to define some types; you have seen some structures. I have decided that every value is a pointer. I mean, even a plain integer is a pointer. It's just a formalistic trick: I say that when you have a value five, it's not five, it's a pointer to a virtual instance which incarnates the value five. This allows me to say that everything is a pointer, and it contrasts with Go, where you usually have value types. If you define a struct in Go and you have a value of that structure, Go will put it on your stack. If you pass it to a sub-function, it will copy it, so it takes more space on the stack. Instead, in T1, everything is a pointer. So if you want a copy, you have to ask for it, and this is part of the process of keeping the developer, which is myself, aware of all the memory which is allocated. So I've got a number of types, including plain integers and modular integers. Modular integers are the usual 16-bit, 32-bit, 64-bit integers; you know them. And there's no null pointer. I just don't want null pointers. I mean, everything is a pointer, but there's no null; I just don't like it. And if there's no null, there's no null pointer dereference, so that's not a problem. But of course, you can have some structures with uninitialized fields. And then the idea is not to return a null in that case, but to trigger an exception, to raise an error; in fact, it kills the complete coroutine. So it's a way of preventing null values from spreading everywhere. This is not a completely novel thing.
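The "everything is a pointer, copies are explicit" rule can be illustrated in Java, where objects are likewise always handled through references. This is a sketch, not T1 code; the Point type and its copy method are made up for the example.

```java
// An object passed to a function is never copied implicitly (unlike a
// Go struct value), so the callee mutates the caller's object, and a
// copy has to be requested explicitly.
public class References {
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        Point copy() { return new Point(x, y); }  // the explicit copy
    }

    static void move(Point p) { p.x += 1; }  // mutates the caller's object

    public static void main(String[] args) {
        Point p = new Point(0, 0);
        move(p);                  // no hidden copy: the same object is changed
        System.out.println(p.x);  // 1
        Point q = p.copy();       // the copy is asked for explicitly
        move(q);                  // only the copy moves
        System.out.println(p.x + " " + q.x);  // 1 2
    }
}
```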
I mean, there are other languages without null. One of them is OCaml. For the plain integers, I've decided to go the Ada way. If you have some integers and you do an operation which goes out of the range of representable values, you've got basically six ways to handle that. In C#, Java, and Go, it's guaranteed to use modular arithmetic: the result is truncated to 32 bits, and that's it. In Ada, an old language, you get an exception if you have an overflow. This destroyed the Ariane 5 first flight, an expensive rocket, but usually it's still a good idea. In Rust, you can have both; it depends on whether you compile with or without debug. So much for safety. In Python or Scheme, at least most Scheme implementations, when you go out of range, you don't go out of range: it just automatically switches to big integers, with dynamic memory allocation. Which is nice, but it requires dynamic memory allocation. JavaScript does something weird, which is to use floating point instead of integers. So when values go out of range, they become approximate, and at some point they become infinite, which does not make for really strong code. And in C or C++, if you go out of range, it's undefined behavior, so anything goes. So I've chosen the Ada way, because it seems to be the safest among those that don't require dynamic memory allocation. OK, this one; I've got two minutes. OK. Here's the secret weapon that makes the whole thing work: all the type analysis which is performed by the T1 compiler is not done on a per-function basis; it's done on the whole program. The idea is that here, you've got a function called triple which is declared to be invokable on any object. What it does is duplicate the value twice and call the plus function twice, which is supposed to add the value to itself if it's an integer; if it's a string, it will just concatenate it. So triple can work with integers, modular integers, and strings, but not with everything. Not everything can be added.
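Two of the six overflow policies can be demonstrated side by side in Java: the built-in + wraps modulo 2^32 (the C#/Java/Go default), while Math.addExact throws an exception on overflow, which is the Ada-style checked behavior that T1 adopts for plain integers.

```java
// Modular (wrapping) arithmetic versus checked arithmetic in Java.
public class Overflow {
    static int wrapping(int a, int b) { return a + b; }              // wraps mod 2^32
    static int checked(int a, int b) { return Math.addExact(a, b); } // throws on overflow

    public static void main(String[] args) {
        System.out.println(wrapping(Integer.MAX_VALUE, 1)); // -2147483648 (wrapped)
        try {
            checked(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected"); // the error is caught, not silent
        }
    }
}
```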
And the compiler does not mind that I said it can be called with any object, because it does not try to imagine what would happen with every possible matching type. It just wants to make sure that I'm not calling it with a type for which it would not work. So here, when it sees the main function, it follows what the main function does. It sees that it calls triple, and that at that point there will be a modular 32-bit integer on the stack. So it analyzes the triple function with a stack that contains a 32-bit integer, and it works. Then, in the second line of main, it sees that it calls triple again, but this time with a string, so it does the analysis again, this time with a string. And it still works. And crucially, it does not try to merge both: it considers the two triple calls to be independent of each other. So it does not try to imagine a triple function which would have a stack with three values, each of which could be an integer or a string. It never asks itself what would happen if it tried to add a string to an integer, because that does not happen, and the compiler works out that it does not happen. So everything is fine. And when generating code, it will know, for each of these call sites, which plus function is to be called, and it won't even have to look at the runtime type of the objects, because the compiler will know, at each point, what is really the type of things on the stack. This implies one detail: it means that the language has generics. In fact, every function is generic. So you can tell that to Go programmers, who don't have generics. In T1, there are generics; I mean, you can't do anything without generics. It's all there. It also means that we really can't tolerate recursion, because that would mean an infinite analysis tree, and an infinite tree takes an infinite amount of time to compile, so no. And this analysis also allows escape analysis and the detection of writes to constant instances. So, the current status; that's my last slide.
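A sketch of what the whole-program analysis effectively produces for the triple example: one generic definition, analyzed once per call site, yields two independent specializations, written out by hand here in Java (illustrative only; in T1, the compiler derives this specialization automatically from a single generic triple).

```java
// The single generic definition of `triple` (duplicate twice, apply
// `+` twice) analyzed once per call site: one specialization where
// `+` is integer addition, one where it is string concatenation.
// The two never mix, so "string + integer" is never even considered.
public class Triple {
    static int tripleInt(int x) { return x + x + x; }        // `+` = addition
    static String tripleStr(String s) { return s + s + s; }  // `+` = concatenation

    public static void main(String[] args) {
        System.out.println(tripleInt(7));     // 21
        System.out.println(tripleStr("ab"));  // ababab
    }
}
```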
So, I put it on GitHub two days ago; I just pushed the stuff. What you will find there is a 55-page document which is a specification of the language, with a rationale and a lot of explanations of why it's designed that way. And there is an incomplete bootstrap interpreter and compiler. It's a first compiler whose destiny is to be trashed at some point; it's just there so I can write a second compiler, which will be the final one, written in T1 itself. So I need a bootstrap for that. That first bootstrap has a working interpreter. It does the whole-program analysis, but the code generator is not finished. So I have to finish that, and add some sort of standard library with, for instance, lists and sorted maps, with the memory allocation and a small garbage collector, which I already have but which is not integrated. And then I will have to rewrite the T1 compiler in T1, because that's the tradition. When you design your language, you do that; I mean, it's unavoidable. And it's a very good test bed: if it works, if the compiler can compile itself, then you know that a lot of the things you implemented actually work. So you're welcome to follow that URL and try to read my prose, and be awed or disgusted, depending on your taste. We're almost out of time, but we might have time for one or two questions. Why do you want to rewrite your T1 compiler in T1? Why do I want to rewrite it? OK, first, because it's very traditional to write a compiler for a language in that language itself. It's a way to demonstrate to other developers that the language works. I want to test that the compiler works, and having a whole complicated code base written in the language is a very good test bed. And it also helps me validate that I did not get the design wrong. I mean, I'm trying to design the language features so that they work well for the developer, so it's a good way to verify that the language actually allows writing code efficiently.
And for that, I am the measure of all things. I mean, I'm writing this for me. You're welcome to participate, but for all aesthetic questions, where there's no really good rational answer, I make things as I please. And I want to know if I will be pleased with it. Well, thank you very much. Thank you.