Okay, so I gave this presentation one month ago and I ran badly out of time; I've got a lot of stuff to say and I know that some of it will not fit. That's not a problem: I've moved things around so that today I will say the things that I did not say last month. And I will talk about BearSSL, which is an SSL library. So the outline: first the question of why another SSL library; then I talk about constant-time implementations and all the challenges of implementing SSL in constrained RAM, because that's the focus of that implementation; then a few considerations on X.509 certificates, which are pretty horrible; and then, if time allows, I'll try to describe a lot of existing SSL attacks and how an implementation can strive to avoid them, and finally why SSL is the kind of protocol which is in need of replacement, but that's maybe for another day. So about two years ago I woke up one morning and told myself: today I will write another SSL library. It took me a little more than a day, but now it's done. In order to understand why, first a couple of slides as a refresher on what SSL is. SSL is now known as TLS; I'm using the term SSL to designate the whole family. The change from SSL to TLS is a change of ownership: the SSL versions were owned by Netscape and the TLS versions are designed by the IETF. So it's more a question of marketing than technology; that's why I use SSL for all versions, though in practice SSL 1 and 2, and even SSL 3, won't be implemented. So whenever I say SSL you can feel free to understand it as TLS. SSL is basically plumbing. Whenever two machines must talk together over potentially hostile networks, they need some sort of protection against passive eavesdropping and active attacks, and SSL is the most well-known protocol to do exactly that. That is, it takes as input some reliable transport for bytes: bidirectional streams.
And by reliable I mean that, if there's no attack, ultimately all bytes will go to the right destination, with no duplicates and in the right order. So you can imagine a TCP connection: that's exactly what it does. In a non-hostile context you get your bytes from client to server and back, and then SSL provides security on top of that. That is, it takes that reliable transport and provides a reliable transport which will reliably detect alterations and will provide confidentiality. In HTTPS, the S does not stand for SSL; it stands for secure. But the first S of SSL is secure, so it really is HTTP inside SSL. In other protocols SSL must be invoked explicitly; for instance in SMTP you've got the STARTTLS command, and so on. So it's an important element for transport security. It does not do everything in security: if your server has SQL injection vulnerabilities and it's powered by SSL, then you'll get SSL-powered SQL injection vulnerabilities. SSL won't solve that. But if you do not have such a reliable secure transport protocol, a lot of things that you would like to do are doomed from the start. So it's important to have it. This schematic is the core of SSL: whenever a client and a server want to talk, there is this handshake, which is a protocol with messages exchanged in both directions that ultimately agree upon cryptographic algorithms and keys to use to encrypt the application data. What you must understand at this point is that the client is the one that talks first and sends the ClientHello, which basically says what the client can do; then the server will answer with a ServerHello, which says what the client and server will do. That is, the client offers a number of cryptographic algorithms and the server chooses, and so on. The messages with a star are optional, so it depends: the server may ask, for instance, for a client certificate, but it may also not ask for one.
So the implementation must be able to react to the flow of messages as they are exchanged. It's a relatively complex protocol, and to get it right you have a number of cases to take care of. Now, these devices are technically known as "things": a rising category of objects which are Internet-connected. On the upper left you've got something which is called a Kevo lock; it's basically a door lock that you can open with your phone. So you understand that there are some security issues, so that only your phone may open the door to your home. On the upper right there is a security camera, which you put, again, in your home, and which will propagate pictures from the inside of your home to the world in general; you generally want that to be restricted in some way. Other things are the light bulbs on the lower left, which can be controlled from your phone, even from a very remote place. So that, for instance, here in Montreal, if I had those light bulbs I could light up my house from here, and my house is in Quebec, 250 km away. And people do that; I don't really know why, but they do that. And on the lower right is yet another connected thing, called a Moocall; it's the green part. That device is meant to warn the farmer that a new calf is about to arrive. That is, the cow will soon give birth, and when the interesting things happen, the farmer must find the cow and get the veterinarian and some help so that it goes smoothly. So basically that cow is Internet-connected, and will talk to the outside world and also reveal its location. The cow really is on the Internet; it's a server. So all those things need to do some secure communications, so you would like to run some SSL on them. Unfortunately there was no SSL library which could do all the stuff listed here. That is, a library which is really correct and secure, and which works on systems with very little RAM; I'm not talking megabytes here, but rather 25 or 30 kilobytes.
It should be small also in terms of code size. It should be able to run on systems which do not have an actual operating system; just like your NorthSec badge, that kind of system is bare hardware, and with no operating system the code must issue commands to the hardware directly. So a library that runs on that kind of hardware must do without the luxury of having threads, or system calls to open files, and so on. And since it should run on embedded systems, it should work with the tools which are used on embedded systems, and this is basically a C world. We are not talking about Java or Rust or anything which is more meant for bigger computers and not yet ported to small systems. There are a lot of existing SSL libraries: OpenSSL of course, and its derivatives like LibreSSL and BoringSSL, but also smaller ones, including mbed TLS, which was called PolarSSL, and which is meant for embedded systems. But it's not as small as we would like: it requires about 50 kilobytes of RAM and it needs some OS support. So I decided to write a new one, basically to show the world how all such things should be done. There's a lot of hubris in that; I'm totally aware of it. So I wrote it from scratch; I wrote it in C, mostly (there's a twist we'll see later on). And I wanted it to fit in 25 kilobytes of RAM and about 20 kilobytes of code. That one is not a success: I am at 21 kilobytes of code, but I'm working on it. And I wanted it to use no dynamic memory allocation whatsoever, so there's not a single malloc call in the whole thing. It uses static-size buffers provided by the caller and does everything within those. And this means that I can claim with full certainty that this library is one of the very few libraries that have correct memory management. I mean, since I don't allocate anything, it's obvious that I did not forget to free anything; there's no memory leak.
It may look like a gimmick, but it's important. When you have your system which is attached to the cow and just walking through the fields, you don't want it to crash or need a reboot because it has run out of RAM. So having correct memory management is an important feature. It's a bit extreme to completely ban dynamic memory allocation, but it works. I also wanted it to have a state-machine API. A state-machine API means that you're not using the library with a simple read call, telling the library: connect, do your stuff, and then give me the bytes; because that would be blocking. Instead, it's a state engine into which you push data that you would like to send, or bytes that you received from the network; and it provides bytes to send to the network, or application data which has been received and decrypted. We'll see later on what it looks like, but it's meant for better integration in a system which does not have an operating system, does not have threads, and still must do several things more or less simultaneously. Then I wanted some extra goals, because why not. I wanted all the cryptography to be pluggable, so that new implementations could be substituted for specific platforms. I wanted all the code to be clear and documented, and possibly to serve as an educational tool, so that the source code could be shown to potential students and we could tell them: read that, you will learn. And of course I wanted it to be open source and reusable. It's an important characteristic, in that there are some small SSL libraries which have rather unfriendly licenses. It's a bit inflammatory to say, but I mean they are usually either GPL or proprietary, or both. So if you want to reuse them in your system you must either pay or show your own code. And in practice a number of vendors of embedded hardware don't want to buy things, because they like to keep their money, of course.
But they don't want to show their code either, so there was a need for a library which would be MIT-licensed, so that anybody could reuse it without thinking about these problems. And of course I needed to support many algorithms, because the point of using SSL is to be able to interoperate with existing implementations: it should be able to connect to existing servers which have various configurations. Finally, the library should work on small systems, but also on big systems, and it should work reasonably well. So there's a lot of cryptographic algorithms that I have implemented. It supports RSA up to 4096 bits. This is a hard limit, because it's the size of an internal buffer, but arguably anything beyond 2048 bits or so is useless, unless you want to impress an auditor; it won't bring more practical security, because RSA at 2048 bits is already far beyond what can be broken with existing algorithms and technology. I also support elliptic curves: the NIST curves and the new fancy Curve25519 (the X, as in X25519, means Diffie-Hellman over that curve). Also AES and ChaCha20 encryption, and so on. And I included some support for 3DES and SHA-1, just in the interest of interoperability, but I keep them at the bottom of the list. Whenever a client and server support several cipher suites there's a preference order, so I put 3DES at the very bottom of the preference order, so that it gets used only when it's 3DES or nothing. And arguably 3DES encryption is better than no encryption at all. Now, constant-time cryptography. It's a relatively new thing; it appeared around 2005, really in earnest. The first attacks related to timing side channels were published in 1996, I think. It then developed, and it has now become a rather constant theme: having good algorithms is a nice thing.
But when you implement them, there is a possibility that some secret data, especially the secret key, might leak through the speed of execution of the algorithm, or through how it impacts the speed of execution of other operations on the same system. A lot of interesting demonstrations have been done in the lab, but there has never been, as of now, an actual use of a timing-related leak to crack a system observed in the wild. We have working demonstration code, and it seems important; but right now, real attackers, who do this for profit, don't do that. They will, one day, but they still have easier methods in practice. When the last SQL injection has finally been killed, then attackers will try to break cryptography, and they will try these timing attacks. Timing attacks are a subclass of side-channel attacks. Side-channel attacks are about leaks of information outside of the abstract model of the execution of the algorithm. The algorithm has a mathematical description which says: you give this as input, you will get that as output. And every piece of information which flows to the physical world outside of that abstract model is a side channel. There are many possible side channels, in particular energy consumption; that's called power analysis, and it's a very efficient tool for trying to break the cryptography in smart cards. But one of these channels, the exact timing of operations, has the unique property of being observable remotely. If you are trying to do power analysis on a smart card, you have to be physically in the same room as the smart card; you must interact with it. But a timing attack can be done from another country. And anything which does SSL must be using a network of some sort, because otherwise SSL does not make sense at all. So anything which does SSL is a potential target for timing attacks.
Execution time can vary for several reasons; there are in fact several classes of leaks related to timing. The algorithmic timing attacks come up basically every time you get an "if" in your code: if you execute some part or not depending on secret data, then the execution time will vary depending on the condition of that "if". The second class is the cache-based attacks. Whenever your CPU does memory accesses, for instance lookups in tables, or in code paths, that access may take a varying amount of time depending on whether the data was in cache or not. And even when the data was not in the cache at that point, so the access is a cache miss, it may evict some other data from the cache; that implies a timing leak for some other, possibly non-cryptographic, operation which will be done later on the same hardware. So you can have indirect leaks. Finally, there are some specific operations, in particular integer divisions, and in some cases multiplications, which are not constant-time, and whose execution time depends on the data being processed. These three classes of potential timing leaks must be avoided when you are trying to implement a cryptographic algorithm and you want a constant-time implementation. So let's look at the case of RSA. A classic RSA implementation is a modular exponentiation: you use a square-and-multiply exponentiation, in which you do a lot of modular squarings, plus some extra multiplications which correspond to the bits equal to one in the secret key. So this is a conditional path: if the secret key bit is one, you do an operation that you don't do if it's zero. The overall execution time will leak some information on that secret data. It's a textbook case of a timing attack, and that's what was published about 20 years ago.
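To make the leak concrete, here is a minimal sketch of my own (an illustration, not BearSSL code): a square-and-multiply on a toy 64-bit modulus with a secret-dependent branch, next to a branchless variant that always performs the multiply and selects the result with a mask. Note that real constant-time code would also avoid the `%` operator, since division itself is not constant-time (Montgomery multiplication is the usual tool); it is kept here only to keep the sketch short.

```c
#include <stdint.h>

/* Leaky square-and-multiply: the 'if' on a secret exponent bit makes
   execution time depend on how many bits of the key are set. */
uint64_t modpow_leaky(uint64_t x, uint64_t e, uint64_t m) {
    uint64_t r = 1 % m;
    for (int i = 63; i >= 0; i--) {
        r = (r * r) % m;            /* squaring: always done */
        if ((e >> i) & 1) {         /* secret-dependent branch! */
            r = (r * x) % m;        /* extra multiply leaks timing */
        }
    }
    return r;
}

/* Constant-time variant: always compute the multiply, then select the
   result with a bit mask, so the instruction stream never depends on e.
   (The '%' here is still a leak in real life; see the note above.) */
uint64_t modpow_ct(uint64_t x, uint64_t e, uint64_t m) {
    uint64_t r = 1 % m;
    for (int i = 63; i >= 0; i--) {
        r = (r * r) % m;
        uint64_t t = (r * x) % m;           /* always performed */
        uint64_t bit = (e >> i) & 1;
        uint64_t mask = (uint64_t)0 - bit;  /* all-ones if bit == 1 */
        r = (t & mask) | (r & ~mask);       /* branchless selection */
    }
    return r;
}
```

Both functions compute the same values (for moduli below 2^32, so the products fit in 64 bits); only the second has a data-independent instruction stream.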
One solution, which has been employed by many libraries, is to do some random masking, according to these equations that you can read if you like mathematics; and if you don't like mathematics I won't explain them to you, because you would be bored. The problem with that masking is that it requires a source of randomness, and randomness is a hard requirement, especially on embedded systems. Another solution, whenever there is a conditional operation to perform, is to always perform it, and then do a constant-time choice between the unmodified data and the data modified by the operation: choosing one, but always reading both. In terms of code it looks like this: a constant-time lookup in a table. Here, in that T2 variable, I've got a lot of precomputed values, and I'm selecting one of them based on a secret value called bits, which is exactly what its name says: a sequence of bits from the key. I'm reading the complete table, but I'm using a special masking value, so that I do a bitwise AND with everything I read, with a mask which will be zero almost all of the time, except for the one value which I need. So the memory access pattern is always the same: I'm always reading the whole table, but in the end the value I get is just one of them. From the point of view of the cache, all the memory accesses have been performed, and nothing depends on the secret bits, so nothing has leaked. Then I use the value which I recovered here. These are the squarings: I multiply the value X by itself; it's modular, so there's a lot of mathematical machinery behind it, but it's a squaring; and here I multiply the result of all these squarings by the value I've looked up from the table. So this is completely constant-time, and I have an RSA which does not leak information through timing and which does not require a source of randomness. RSA was relatively simple in that sense.
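The masked table read just described can be sketched like this. The names `T2` and `bits` are taken from the description above, but the code itself is my own illustration, not the actual library source: every entry of the table is read, and a mask that is all-ones only for the secretly-selected index picks the wanted one, without any branch.

```c
#include <stdint.h>
#include <stddef.h>

/* Constant-time table lookup: read ALL entries of T2, AND each with a
   mask that is all-ones only when i equals the secret index 'bits'.
   The memory access pattern is independent of the secret. */
uint32_t ct_lookup(const uint32_t *T2, size_t len, uint32_t bits) {
    uint32_t r = 0;
    for (uint32_t i = 0; i < len; i++) {
        uint32_t diff = i ^ bits;
        /* nz is 1 when i != bits, 0 when they match: for any nonzero
           32-bit value, (diff | -diff) has its top bit set. */
        uint32_t nz = (diff | (0u - diff)) >> 31;
        uint32_t mask = nz - 1;        /* all-ones on a match, else 0 */
        r |= T2[i] & mask;             /* keep only the wanted entry */
    }
    return r;
}
```

Every iteration performs the same loads and the same arithmetic; only the data, not the control flow or addresses, depends on `bits`.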
Now let's talk about other cache-based attacks, and especially AES. AES is an encryption system which is well designed; mathematically it's elegant; but a normal implementation will use lookup tables, and the slots which are read depend on secret data at each point. Some lab demonstrations have shown that it is possible, with two processes running on the same system, or even on the same hardware from different virtual machines, where one of them does some encryption and the other process knows neither the plaintext nor the ciphertext, for the second process to still work out the secret key which is used. So the key leaks, even across virtual machines. Of course, on small embedded systems I won't have several virtual machines, but you can get the same kind of problem if some encryption has taken place and the hardware then responds to a network query which exercises the same memory, the same cache. One possible defense against cache-based attacks is micro-architectural: basically, making extra accesses so that all cache lines are populated. It's trickery that depends on your precise knowledge of how the cache works on the hardware you are using. It's hard to do, because you usually do not have an exact description of how your hardware works; the hardware vendor won't tell. The best you can have is the hardware manuals from Intel or AMD, which give you a lot of information, some of which is wrong. And of course, once you've written your code, even if it's nice and touches all the cache lines and does not leak on the specific hardware you developed on, a new version, a new sub-version of the hardware is produced; you move your code to that new version, and it does not work anymore. Functionally it will keep on working, but it will leak information, and you won't know it.
That's why I say it's a fragile way of doing things. The truly constant-time way is to perform no secret-dependent memory access at all; that is, the address of a memory access won't depend on anything secret; and likewise no secret-dependent conditional jumps, and so on. One tool for that is called bit slicing, which was introduced by Biham about 20 years ago now. It's a sort of new way of implementing an algorithm, by thinking about it as if it were a hardware circuit. For instance, if you've got a 20-bit value, instead of putting it in one variable, you spread it over 20 variables: you take one bit in each variable, and then everything you implement consists of bitwise operations like XOR and AND, as if you were taking a real circuit and translating the individual gates into operations. As an illustration, suppose that in some algorithm you've got two 6-bit values; you must XOR them together and then rotate the result left by one bit. On the left is the classical implementation: you do the XOR, then a rotate, with some shifting, masking and OR to do it. The bit-sliced version is on the right. You see that you now have six bitwise XORs, because you've spread all six bits into independent variables, so you have to do the six XORs. On the other hand, the rotate has disappeared, because it's merely a problem of routing data, that is, of taking the right variable. You see that the results are numbered: the result variables Z go from 0 to 5, but the inputs come in rotated order, because that renumbering embodies the rotate. On the face of it, we just replaced five operations on the left with six operations on the right, so it does not look like a big gain; but there is an important speedup in the bit-sliced version, which is that it's parallel. My variables here are not single bits: they are words of 32 or 64 bits. So whenever I do the XOR between X0 and Y0, I'm actually doing 32 or 64 XORs.
So what I'm doing in the bit-sliced version is 32 or 64 parallel operations. That is, what I do on the left with five opcodes, I'm doing 32 or 64 times over with six opcodes. So it's in fact extraordinarily faster, if the operation I have to perform can be made parallel: if I have many identical operations to perform simultaneously, then with the bit-sliced representation I can do them efficiently. Moreover, it's necessarily constant-time, because since the instances are done in parallel, there's no way that the opcode I submit to the CPU can depend on the data, since it must work for all of them. It has been called orthogonalization of the data, because if you represent it in matrix notation it's a transposition: you move the data in such a way that what you thought of as several variables becomes several bits within one variable, and vice versa. So it looks good, but it also has some problems: it will use more RAM, more code (it can make really big code), and some operations, such as the lookup tables which are used in AES, become very complicated circuits; it's not a matter of five or six XORs but rather a few hundred of them. And finally, there are some contexts where you do not have several operations to perform in parallel. CBC encryption (not decryption, but encryption) is typical: when you do CBC encryption, you process data block by block, and the output of the encryption of one block is used as input for the next block, so you cannot do them simultaneously, since each one depends on the result of the previous one. However, it's possible to use some mixed strategies, in which you exploit internal parallelism, and you have that in AES. AES is layered as a number of rounds, and each one contains a number of operations; one of them is applying the S-box. The S-box is an 8-bit-to-8-bit lookup table, and you've got 16 identical S-boxes that work on the 16 bytes of the state, in parallel.
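The XOR-and-rotate illustration above can be written out in C. This is a minimal sketch under my own conventions (bit j of instance k lives in bit k of word j), not library code: the XOR becomes six word-wide XORs, each processing up to 32 instances at once, and the rotate costs nothing, since it is just a renaming of the output words.

```c
#include <stdint.h>

/* Bit-sliced (XOR then rotate-left-by-1) on 6-bit values: 6 words hold
   up to 32 parallel instances, one bit position per word. */
void xor_rotl1_sliced(const uint32_t x[6], const uint32_t y[6],
                      uint32_t z[6]) {
    uint32_t t[6];
    for (int j = 0; j < 6; j++)
        t[j] = x[j] ^ y[j];       /* 6 XORs, up to 32 instances each */
    /* rotl1 is pure wiring: output bit 0 takes input bit 5, and
       output bit j takes input bit j-1. */
    z[0] = t[5];
    for (int j = 1; j < 6; j++)
        z[j] = t[j - 1];
}

/* Helpers to place/extract one instance in lane k of the sliced form.
   slice_set assumes the destination words start zeroed. */
void slice_set(uint32_t w[6], int k, uint32_t v) {
    for (int j = 0; j < 6; j++)
        w[j] |= ((v >> j) & 1u) << k;
}
uint32_t slice_get(const uint32_t w[6], int k) {
    uint32_t v = 0;
    for (int j = 0; j < 6; j++)
        v |= ((w[j] >> k) & 1u) << j;
    return v;
}

/* Demo: run one instance through the sliced pipeline and read it back. */
uint32_t demo_xor_rotl1(uint32_t a, uint32_t b) {
    uint32_t x[6] = {0}, y[6] = {0}, z[6];
    slice_set(x, 0, a);
    slice_set(y, 0, b);
    xor_rotl1_sliced(x, y, z);
    return slice_get(z, 0);
}
```

The slicing/unslicing helpers are the "transposition" cost; in a real cipher you pay it once at the edges and keep the data sliced throughout the rounds.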
So that's a good opportunity to try some bit slicing, which will compute these 16 S-boxes in parallel. That's exactly what I'm doing, and I'm not showing the code, for the reason that it's very long: it won't fit on the screen. But in the end I get an AES which is constant-time; slower than a table-based AES, but not terribly slower. On a 64-bit machine, we're talking about 55% of the speed of a normal AES, so we are down to about 80 megabytes per second on a big PC, which is really sufficient for many uses. And this specific AES implementation was included, about two weeks ago, in the OpenBSD system; so it's now the default AES implementation in OpenBSD. For constant-time cryptography, there are still some tricky opcodes that must be avoided. We talked about memory accesses and conditional jumps. Integer division is a big no-no, because it's never constant-time. On some architectures, such as ARM, there's no integer division opcode at all: division is a function, so if you want to know whether it's constant-time, you have to look at what the C compiler produces for a division; and it's not constant-time. You must take some care with shifts and rotations: when you shift or rotate some data, the execution time may depend on the shift amount; not on the data which is shifted or rotated, but on how much you are shifting it. Famously, the Pentium 4, which is already 15 years old, was a processor which did not have that hardware piece called a barrel shifter, and a rotation by some number of bits could take longer for a higher number of bits. This impacts some cryptographic algorithms where there are rotations by non-fixed amounts, especially RC5, if you remember that one. And finally there are integer multiplications: on most PC processors they are constant-time; on some others they are not. And on some embedded systems, especially the ARM Cortex-M3, multiplication is supposedly constant-time, but it's not.
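Avoiding those opcodes in practice means building everything from a small toolkit of primitives that use only operations which are constant-time on common CPUs: no branches, no division, no variable-distance shifts on suspect hardware. Here is a sketch of such a toolkit; the names are my own, not the library's API.

```c
#include <stdint.h>

/* 1 if a == b, else 0, without branching: (d | -d) has its top bit
   set exactly when d is nonzero. */
uint32_t ct_eq(uint32_t a, uint32_t b) {
    uint32_t d = a ^ b;
    return 1u ^ ((d | (0u - d)) >> 31);
}

/* 1 if a < b (unsigned), else 0: look at the borrow bit of the
   64-bit subtraction a - b. */
uint32_t ct_lt(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a - (uint64_t)b) >> 63) & 1u;
}

/* Select x if ctl == 1, y if ctl == 0, without a conditional jump. */
uint32_t ct_mux(uint32_t ctl, uint32_t x, uint32_t y) {
    uint32_t mask = 0u - ctl;     /* all-ones if ctl == 1 */
    return (x & mask) | (y & ~mask);
}
```

Comparisons return 0 or 1 (never a boolean produced by a branch), and the mux turns that bit into a data-only selection; larger constant-time routines are composed from pieces like these.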
That is one case where the manual from the hardware vendor is wrong: it states things which do not match reality. So I'm trying to keep track of which architectures have constant-time multiplication; you are invited to read that page, and to contribute if you have some extra information on the subject. Another point which I would like to talk about is streaming versus buffering. A lot of writing an SSL library is about being able to process data as it comes, in a streaming fashion. For instance, consider the first message, the one sent from the client to the server: the ClientHello. What you should see in this description is that it's a nested structure. It contains some sub-structures, including one which is called extensions, which is a list of a number of sub-structures, and it's all nested. Similarly, the client will need to validate the certificate from the server, and here is a description of the format of a certificate. It's a very small part of the description, but there again you've got a SEQUENCE, a structure which contains other structures, which themselves contain other structures; and in the encoding, each sub-structure has its own length, contents and sub-elements. What must be understood here is that if you want to analyze such a structure, it's much easier if you have it all in one piece, because then you can just work through it with functions and sub-functions that parse the various pieces. That's the solution which is called buffering: basically, you allocate some RAM to get a complete message, or a complete certificate, in one go. That's exactly what OpenSSL does. But the maximum message size is about 16 megabytes, because the length of a message in SSL is encoded over 3 bytes. So it could be up to 16 megabytes, and theoretically you could send to a server a message header saying: OK, you'll get 16 megabytes, just allocate that.
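The 3-byte length field can be sketched as follows. This is an illustrative helper of my own (the constant and the error convention are assumptions, not the library's): it decodes the 24-bit big-endian length from a handshake message header, and shows why a buffering implementation must cap what it is willing to allocate.

```c
#include <stdint.h>

/* A handshake message length is encoded over 3 bytes, so a peer can
   declare anything up to 2^24 - 1 = 16777215 bytes (~16 MB).
   Illustrative cap, in the spirit of the 64 KB limit mentioned above. */
#define MAX_HANDSHAKE_LEN 65536u

/* Decode the declared length; return 0xFFFFFFFF if it exceeds the
   local cap (i.e., the message should be rejected, not buffered). */
uint32_t parse_handshake_len(const uint8_t hdr[3]) {
    uint32_t len = ((uint32_t)hdr[0] << 16)
                 | ((uint32_t)hdr[1] << 8)
                 |  (uint32_t)hdr[2];
    return (len > MAX_HANDSHAKE_LEN) ? 0xFFFFFFFFu : len;
}
```

The point is that the declared length is attacker-controlled: trusting it blindly turns a 3-byte header into a 16 MB allocation request.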
Since that has a rather high potential for denial of service, OpenSSL enforces a smaller and saner limit: it will refuse any message which would require more than 64 kilobytes. Now, remember, I wanted no malloc, and to fit in 25 kilobytes of RAM; so buffering is a bit out of the question for BearSSL. Another way to process the data is exactly what you would do in modern languages, such as (well, not so modern) Java or C#, or more modern ones, in which you have basically a stream abstraction. You can write code which reads bytes, and whenever it needs some extra bytes to parse a structure, it just calls a read method to get the next byte. Of course, if the next byte has not yet been made available by the network, this is a blocking call: the thread which is waiting for the next byte will wait, and other threads will run until everything has happened. But this needs blocking operations and threads, so basically an operating system, and each thread will have its own stack. So you have a big computer with RAM and an operating system; it's not a solution which is applicable to my context. What I would really want is something known as coroutines. A coroutine, you can imagine it as another thread, but it does not run concurrently with your own code: you have several coroutines, and at any time only one of them is progressing; when it needs to call another, the CPU jumps to the other one and won't come back until that other one has yielded. If you have coroutines, you can make a state-machine API in which the decoding engine is invoked by pushing some bytes as they are received; it will return, but still be alive as a coroutine, telling you: give me the next bytes when you have them. So if your library is implemented as a state machine, you can stream data, and the library itself is purely computational: it does not perform input/output; it reacts to data you push, and it produces some other data.
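The push/acknowledge pattern of such a state-machine engine can be sketched with a toy in C. This is my own illustration, not the real BearSSL API (whose functions are named along the lines of br_ssl_engine_sendapp_buf and br_ssl_engine_sendapp_ack): the engine hands out a pointer into its own buffer plus the room available, the caller copies bytes in and then acknowledges how many it wrote, and no call ever blocks or performs I/O.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Toy engine: a single fixed buffer, no allocation, no blocking. */
typedef struct {
    uint8_t buf[64];
    size_t used;
} toy_engine;

/* "Where can I write, and how much room is there?" */
uint8_t *toy_sendapp_buf(toy_engine *eng, size_t *len) {
    *len = sizeof(eng->buf) - eng->used;
    return eng->buf + eng->used;
}

/* "I wrote n bytes, do your stuff." (A real engine would now
   encrypt/process them; the toy just records them.) */
void toy_sendapp_ack(toy_engine *eng, size_t n) {
    eng->used += n;
}

/* Demo: push "hello" through the engine, return the room remaining. */
size_t toy_demo(toy_engine *eng) {
    size_t len;
    uint8_t *p = toy_sendapp_buf(eng, &len);
    memcpy(p, "hello", 5);
    toy_sendapp_ack(eng, 5);
    toy_sendapp_buf(eng, &len);
    return len;
}
```

Because the engine only exposes buffers and acknowledgements, the caller stays in charge of all actual I/O, which is exactly what makes the pattern usable without threads or an operating system.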
Then, even if you do not have an actual operating system, you can handle several things at the same time, because none of them will be blocking. Now, the main problem is that in C you do not have coroutines. It's possible to do some dirty hacks to make a sort of coroutine with setjmp and longjmp, but it's rather fragile; it needs bigger stacks, and it won't work on small systems, because it requires a little bit of extra support from the system. So this is what the API looks like. There are four pairs of functions, and each pair will give you the address of a buffer. This one, sendapp_buf, means the buffer where you can write application data that should be encrypted into the SSL tunnel and sent to the peer. So br_ssl_engine_sendapp_buf will return the buffer where you can write such data, and will write into len how many bytes can be accepted at that time. And when you have written some bytes in that buffer, you call the other one to tell it: OK, I've written len bytes, do your stuff. So there are four pairs: the first two are for the plaintext application data, and the other two are on the other side of the engine, that is, the bytes that must go on the network. From the caller's point of view, none of these calls is ever blocking, and it's the job of the caller to talk to the Wi-Fi, Bluetooth, serial line, or whatever mechanism you use to move bytes around. OK, I do not have coroutines, because it's C, so the solution was obviously to invent a new programming language; because why not, at that point. And OK, it's not pure madness, because it is basically derived from a very old language known as Forth, which was invented in the 70s to drive automated telescopes. An astronomer called Charles Moore invented it.
So my language, which I call T0 (the name has absolutely no meaning whatsoever), is basically a Forth dialect, which I then mixed with some things which are absolutely non-Forth. For instance, it has separate compilation; in true Forth, everything is interpreted and runs on the same hardware, which is both the development system and the production system at the same time. Having separate compilation is not something which is normally done in Forth, but I still did it; and since I was already playing with two languages, I wrote the compiler in a third one, which is, in this case, C#. And basically, well, it looks like that. OK, it's a bit barbaric, but it's a stack machine, so each operation, each word that you see here, is basically about pushing something on the data stack or calling a function. This excerpt is about processing bytes from the peer which arrive in special SSL messages called alert messages. What must be understood here is that each word translates to a sort of opcode in an interpreted system, and fits in one byte. So this code will take about 40 bytes of code space. It's very compact, and with some experience it actually reads like a normal programming language. You have to think with the stack, and it's reverse Polish notation. So, for instance, that dup is duplicating the top of the stack; 1 is pushing another value; this operator takes the two values on top of the stack and checks whether they differ or not; the if does a conditional jump based on the result, which skips to the then; after the then, execution continues. One thing that we completely lose is that in C you've got some type analysis: the compiler will remember which variables are ints or pointers and so on, and if you take an integer and use it as a pointer, it will emit a diagnostic, that is, a warning or an error. When you're doing Forth, everything is a 32-bit word, so you don't have that.
But still, I've implemented some sort of type analysis, or rather a stack depth analysis. That is, the compiler is able to determine the stack depth at each point of the code, and it can do that statically. So in my case, for the piece which parses X.509 certificates, it is guaranteed that it won't use more than 17 slots on the data stack and 25 on the return stack, because it's a two-stack machine. So I've got strong guarantees of memory usage, and these are actually stronger than what I can get with C. So I can claim that it's in some way a bit safer to code in that Forth dialect than to do it in C. Now, safer does not mean safe. So for instance, this is an excerpt where I had an actual bug which could lead to a real buffer overflow and remote code execution and so on. And it's a bug which was discovered, amazingly, by an automated fuzzer. And the bug was just here: that 7 used to be an F. What I was doing was decoding a length, which is an element of the encoding of a certificate, to know how many bytes were there. And I wanted it to fit in a 32-bit variable, so I was checking for overflow. But I was checking for overflow assuming an unsigned 32-bit value, while in the rest of the code I was using that length in a signed comparison. And since there is no difference between signed and unsigned in my small type system, the compiler did not see the problem. So the fuzzer produced a length which decoded to about 3.5 gigabytes, which is huge, and there were not that many bytes afterwards; but when interpreted in a signed way, it was a negative value. So now all the rest of my code, which was perfectly okay for reading up to hundreds of bytes or even kilobytes into a buffer, would accept data much larger than that buffer, because the comparison considered the size to be negative. So this brings us to that atrocity called X.509 certificates.
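That class of bug is easy to reproduce in plain C. This is a hypothetical reconstruction of the bug pattern, not the actual T0 code: an overflow check done on the unsigned interpretation does not protect a later signed comparison.

```c
#include <stdint.h>

/* Hypothetical reconstruction of the bug class: a decoded length is
   range-checked as unsigned, then consumed in a signed comparison. */

/* The check: "fits in 32 bits", taking the unsigned view. A value
   like 0xE0000000 sails through... */
int length_check_unsigned(uint32_t len)
{
    return len <= 0xFFFFFFF0u; /* should have capped at the signed range */
}

/* ...but the consumer compares it as signed, so 0xE0000000 becomes a
   large negative number and every "len <= bufsize" test passes. */
int fits_in_buffer_signed(uint32_t len, int32_t bufsize)
{
    return (int32_t)len <= bufsize; /* BUG: signed comparison */
}

/* The fix: keep the comparison unsigned (or reject lengths above the
   signed maximum at decode time, i.e. turn that F into a 7). */
int fits_in_buffer_fixed(uint32_t len, uint32_t bufsize)
{
    return len <= bufsize;
}
```
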
So for better or worse, and mostly worse, SSL is married to X.509 certificates, and in the new TLS version they are trying to change that somehow. Basically it's a key distribution system in which clients know the public keys of some root authorities, and can use that to validate certificates, which are signed objects that provide knowledge of the public key of a specific system. BearSSL has pluggable support for X.509 certificate validation, and I've made two implementations; one of them, which I call "known key", might be a better idea for an embedded system. The known-key engine is a very simple implementation which simply and completely discards the incoming certificates and uses a hardcoded public key. Certificates are meant for the server to publish the server's public key to the client, in such a way that the client can make sure that the public key is the right one without any prior knowledge. Now, in closed systems (and many embedded systems are closed in some way, in that they already know which servers they are going to talk to), prior knowledge is easy. That is, it is possible that the system has some prior knowledge of the server's public key, in which case decoding the incoming certificate is completely useless. So the known-key implementation does just that; it's very efficient. Now, I've done another one, which I call "minimal", which does actual certificate validation, with all the signatures and name matching and extension decoding and so on. It has some restrictions, because it's small. I'm reasonably proud to say that I could implement that certificate validation in less than 7 kilobytes of code. So it's rather compact, but it cannot do everything that could be expected, in particular because it does not allocate extra RAM, so it cannot even remember a single full certificate at a time.
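The pluggability can be pictured as a small vtable through which the SSL engine talks to whichever X.509 engine was configured. This is a simplified sketch of the idea with invented names, not BearSSL's actual interface:

```c
#include <stddef.h>

/* Simplified sketch of a pluggable X.509 engine: the SSL engine only
   sees this vtable; known-key and minimal are two implementations. */
typedef struct x509_engine_ {
    /* feed certificate bytes as they arrive from the peer */
    void (*append)(struct x509_engine_ *ctx,
                   const unsigned char *buf, size_t len);
    /* returns 0 on success, at which point a public key is available */
    int (*end_chain)(struct x509_engine_ *ctx);
} x509_engine;

/* "Known key" flavor: ignore every byte and always succeed, because
   the server's public key is hardcoded in the device. */
typedef struct {
    x509_engine vtable;  /* must be first, so casts work */
    size_t ignored;      /* just to show the bytes are discarded */
} knownkey_engine;

static void kk_append(x509_engine *ctx, const unsigned char *buf, size_t len)
{
    (void)buf;
    ((knownkey_engine *)ctx)->ignored += len; /* discard */
}

static int kk_end_chain(x509_engine *ctx)
{
    (void)ctx;
    return 0; /* trust the hardcoded key, not the chain */
}

void knownkey_init(knownkey_engine *eng)
{
    eng->vtable.append = kk_append;
    eng->vtable.end_chain = kk_end_chain;
    eng->ignored = 0;
}
```

A "minimal" engine would plug into the same two entry points but do real streamed parsing and signature verification inside `append` and `end_chain`.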
It must process the certificate bytes as they come, and never has the full certificate, so it also cannot reorder certificates: it must validate the certificate chain exactly as it is sent by the server. It supports a set of trust anchors; that is, the caller, the application which uses the library, will configure it with the certificate authorities it trusts. And then it will check the following things: that all along the chain, the subject and issuer names match; that the validity dates are correct with regard to the current date. With a small difficulty here, because knowing the current date means that you have a clock, and not every embedded system has a clock. So the API is that the application can push the current time at which to do the validation, if it's available; but if it's not available, you have no choice but to accept any certificate as if it were not expired. Basic Constraints is about distinguishing between certificate authorities and non-certificate-authorities, and Key Usage is a restriction extension which says that a given public key in a certificate may be used only for signatures, or only for encryption. I also extract names from the Common Name and from the Subject Alternative Name, because an SSL client in a web context must validate that the server's certificate is not only a valid certificate, but also the right certificate for the server it expects: that is, a certificate which contains the expected server name, as known by the client from the URL in use. So I'm extracting that information, and I'm doing all the Unicode decoding and so on, the UTF-16 surrogate pairs; there's a lot of complexity, and it's all done in T0. And I'm supporting the simplest case of wildcard names, when the name starts with a star, like *.google.com. What this does not support are these small items: basically revocation, both CRL and OCSP. It won't try to download extra certificates; it will validate exactly the path received, in that order, and nothing else.
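The "leading star only" wildcard rule fits in a few lines of C. This is my own sketch, not the library's exact code; real matching must also normalize case and character encodings first:

```c
#include <string.h>

/* Match a server name against a certificate name, supporting only the
   simplest wildcard form: a leading "*." component, as in
   "*.google.com". The star absorbs exactly one non-empty label. */
int name_match(const char *cert_name, const char *server_name)
{
    if (cert_name[0] == '*' && cert_name[1] == '.') {
        /* skip exactly one label of the server name */
        const char *dot = strchr(server_name, '.');
        if (dot == NULL || dot == server_name)
            return 0;           /* no label to absorb the star */
        return strcmp(cert_name + 1, dot) == 0;
    }
    return strcmp(cert_name, server_name) == 0;
}
```

Note that with this rule, `*.google.com` matches `mail.google.com` but neither `google.com` nor `a.b.google.com`, which is the conservative reading of the wildcard.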
It doesn't support Name Constraints; up until about two years ago nobody supported Name Constraints, but still, it does not. And certificate policies: I have yet to find a certificate authority that understands what certificate policies are about, because none of them produces anything compliant with the standard. And yet it's safe, because whenever there's an unsupported extension which is marked critical, the validation will stop, because that's exactly what X.509 mandates. So you won't be able to make it accept a path that it should not. The problem is only in the other direction: there are paths which are valid but use features that are not supported, and they will be rejected. So normally here I would talk about a number of attacks on SSL, but of course, as predicted, I don't have time, so I'll just show a summary. Here are 12 kinds of attacks which have been published and described against SSL. And if you want to write an SSL library, you have to understand what all those threats are about and what to do about them. So, we talked about timing attacks. POODLE basically means that you should not use SSL 3.0, because there's an unfixable problem in the padding for CBC encryption. So BearSSL does not support SSL 3.0; it would be relatively easy to support it, but no, I won't do it. CRIME is interesting in that it's a fundamental, information-theoretic point: encryption is very good at hiding data contents, but not at hiding data length. And compression is a mechanism by which the length varies depending on the data contents. So compression, which was optionally included in SSL, necessarily leaks information about the data, since it makes the length vary, and the encryption won't hide that. So the only way out of that issue is not to support compression. So be it: no compression. DHE parameters are about the usage of ephemeral Diffie-Hellman. And in fact it's very hard for the client to validate whether the ephemeral Diffie-Hellman parameters from the server are correct or not.
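The CRIME principle can be demonstrated without any real compressor. This toy of mine models a single LZ-style back-reference: when attacker-controlled input repeats part of the secret, the "compressed" output shrinks, and that length difference survives encryption.

```c
#include <string.h>
#include <stddef.h>

/* Toy illustration of the CRIME principle: encryption hides content
   but not length, and compression makes length depend on content. */

/* length of the common prefix of two strings */
static size_t shared_prefix(const char *a, const char *b)
{
    size_t n = 0;
    while (a[n] && b[n] && a[n] == b[n])
        n++;
    return n;
}

/* Model: if the guess repeats a prefix of the secret, an LZ-style
   back-reference replaces it, shrinking the output by the match
   length (when the match is long enough to pay for itself). */
size_t toy_compressed_len(const char *secret, const char *guess)
{
    size_t total = strlen(secret) + strlen(guess);
    size_t match = shared_prefix(secret, guess);
    return match > 2 ? total - match : total;
}
```

An attacker who can inject guesses next to a secret cookie just watches the ciphertext length: the guess that compresses best shares the longest prefix with the secret, so the secret can be recovered byte by byte.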
So I took the extreme position of not supporting DHE at all, only the elliptic-curve variants, which are more efficient anyway: smaller, faster, and more future-proof. And for instance, in the Microsoft world, IIS never supported the DHE cipher suites, and they survived; I mean, Microsoft still exists. So it's not suicide not to support DHE. So I have to hurry. These slides are already up on my website, so you can see my splendid schematics and the whole story, with a lot of explanation of how things are done. These attacks, such as BEAST, have been published with nice acronyms over the years; that's a current trend. Five minutes? Okay. And for each of them, there is a countermeasure. So for instance, it is possible to implement TLS 1.0 securely, even in contexts where the BEAST attack may apply: it's called the 1/n-1 split. So there is a current fashion of saying that TLS 1.0 must be disabled as well, and it is expected that next year PCI DSS will enforce a ban on TLS 1.0. It's not a bad idea to use newer versions, but it's still possible to use TLS 1.0 securely; it requires some tricks, which I have implemented, and that works reasonably well. And last one: there are a number of cipher suites which have been defined in SSL, and especially the export cipher suites, which were meant to comply with the export regulations in the United States before the year 2000, and which were weak, purposely weak, so that they could be broken. So there's a simple solution for weak cipher suites: don't support them. So I don't. But a number of existing libraries still support weak cipher suites, and they have been broken, and that led to real attacks. So, I've got two or three minutes left to just say that SSL sucks. It's a bad protocol. It has problems, especially records. Everything you send in SSL is a bunch of records; it's organized as records, and each record is the unit of encryption.
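The 1/n-1 split itself is simple: instead of sending n plaintext bytes as one CBC record, send the first byte alone, then the remaining n-1; the MAC and padding of that first record act as a fresh, unpredictable IV for the rest. A sketch of the splitting decision (my own illustration, not the library's exact code):

```c
#include <stddef.h>

/* Sketch of the 1/n-1 record split used to make TLS 1.0 CBC safe
   against BEAST. Fills out_len[] with the plaintext length of each
   record to send, and returns how many records there are. */
size_t one_n_minus_one_split(size_t n, size_t out_len[2])
{
    if (n <= 1) {           /* nothing to split */
        out_len[0] = n;
        return n ? 1 : 0;
    }
    out_len[0] = 1;         /* 1-byte record: its MAC and padding act
                               as an unpredictable IV for the rest */
    out_len[1] = n - 1;     /* then the remaining n-1 bytes */
    return 2;
}
```

The per-record overhead of the extra 1-byte record is the price paid for keeping TLS 1.0 CBC usable; with TLS 1.1 and later, records carry an explicit random IV and no split is needed.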
That is, whenever encryption is applied, it's on a whole record, and an integrity check, a MAC, is applied on the whole record. So from the receiving side, you must buffer the complete record in order to verify the MAC before being able to process the data which is in it. So a client or server must have the ability to buffer at least 16 kilobytes of incoming data. So if you remember, I wanted to do everything in 25 kilobytes of RAM: more than half of that is just the buffer for a single incoming record. And if you want a full-duplex policy, that is, being able to send and receive records more or less at the same time (and you must be able to do that to support HTTP, because HTTP says that you can send several requests in a row), then you'll need two buffers, 16 kilobytes in each direction, so 32. So for embedded systems which are constrained in RAM (for instance, the badge you have around your neck has 16 kilobytes of RAM only), this is a problem. There is an extension which has been defined so that the client may ask for smaller records, but it is badly defined, so it's not very usable, and even less supported: for instance, OpenSSL does not support it. And because it's client-driven, the server has no occasion to enforce a smaller record size. So I am working with some other people on defining a new extension which will allow smaller records, but it's not done yet, and it will be some years before it's actually deployed anyway. There is also a lot of historical cruft from older versions of SSL. There is renegotiation, which is a remnant of older versions. And TLS 1.3 is being defined right now, but with an emphasis on the web: it's defined with features that require more RAM, so it's not for embedded systems. There are the cookies and session tickets, which can force the client to be able to allocate up to 64 kilobytes of RAM on demand. And it's really designed to be a companion to HTTP/2, which is meant not for embedded systems but for web browsers.
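The buffering constraint comes straight from the record format: a 5-byte header carries a 16-bit length, and a record may hold up to 16384 bytes of plaintext, so the receiver must be able to hold that much before the MAC can be checked. A minimal header parser as a sketch (the expansion allowance is the TLS 1.2 ciphertext bound; real code must enforce the per-ciphersuite limit):

```c
#include <stdint.h>
#include <stddef.h>

/* TLS record header: type (1 byte), version (2), length (2, big-endian).
   The receiver must buffer 'length' bytes before it can verify the
   MAC, hence the 16 kB minimum buffer per direction. */
#define TLS_MAX_PLAINTEXT 16384
#define TLS_MAX_EXPANSION 2048   /* allowance for MAC, padding, IV */

int parse_record_header(const uint8_t hdr[5],
                        uint8_t *type, size_t *rec_len)
{
    *type = hdr[0];                    /* 23 = application data */
    *rec_len = ((size_t)hdr[3] << 8) | hdr[4];
    /* reject oversized records before committing to buffer them */
    if (*rec_len > TLS_MAX_PLAINTEXT + TLS_MAX_EXPANSION)
        return -1;
    return 0;
}
```

This is also why a length-limiting extension matters for small devices: cap the record plaintext at, say, 1 kB and the two 16 kB buffers shrink accordingly, but both sides must agree on the cap beforehand.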
So the prediction, not only mine but also from other people such as Peter Gutmann, is that TLS is in the process of forking: the new version is for the web, and TLS 1.2 will remain, basically forever, for the non-web applications. So if we wanted to do better, we would have to start from scratch with a new protocol, with fewer options, an easier-to-parse encoding of messages, standardization on smaller buffers, and so on. I've got some ideas for writing a new protocol, but I know it's a terrifying amount of work, so I don't want to do that right now. Anyway, just proposing a new protocol would not be sufficient to make it accepted. The main strength of SSL is that it's already there, so people can start using my library and it will interoperate. Whereas defining a new protocol is nice, but ten years from now it still won't be used.