My name is David Brown. I work for Linaro in security, and I wanted to give a little talk about something that's been bubbling around in my head for the past six months or so, which I've been calling "IoT TLS: Why It's Hard." A bit of background: I work on the Zephyr project. We've pulled in Mbed TLS, we spend a lot of time trying to figure out what's wrong and what's insecure, and we go through a lot of demo apps. On my first pass through all of them, none of them were actually secure. So I wanted to give a talk explaining what actually makes this hard and why it isn't a trivial thing.
So to start with, what is IoT? I searched for that exact expression and found a definition that's maybe as good as any other. Not everybody agrees on what it means, but the basic idea is things that are connected to the internet, hence the term. The thing to really take away from the definition is that we're talking about things connected to the network that talk to each other without human intervention. That's the big difference, at least by this definition: unlike a computer you would sit at, these devices are on their own.
Now let's break that down a little. It's the Internet of Things, so we have a bunch of things. These are things I could easily find 3D models of on the internet. They may not be very typical, but they are things that get connected to the internet, and they now have the ability, or some reason, to communicate with other devices over the network. What comes with that is the criminal: there's much more opportunity now for these devices to be used for nefarious purposes. If you go to Google, type in "IoT hacks," and look around (there's another talk that really goes into depth on this), you'll find articles like "Five worst examples of IoT hacking and vulnerabilities in recorded history."
When I typed that search in, I got 1.8 million results for just the phrase "IoT hacks." So what are these worst examples? The Mirai botnet is probably the most famous: devices that were used to attack name servers on the internet simply because they were connected and vulnerable. Then there are a lot of really nasty ones: hackable cardiac devices, baby heart monitors, webcams. The Jeep one, if you want to look into it, is really scary. The attack needed multiple systems in the car, but the end result was that the attackers could control the vehicle's steering and brakes remotely, without needing any other access.
So clearly security is important. I don't know if you know who James Mickens is. He gave a keynote at the recent USENIX Security Symposium, and his comment was that IoT security is not interesting. His reasoning is that, as far as he's concerned, there's nothing different about IoT: we've learned over the past 20 or 30 years how to do security, and we should just do the same things. I don't necessarily agree with him, hence this talk. He also says that TLS is the only good thing we have, and I don't completely agree with that either. But the point stands that we have solutions; we have ways of solving these security problems, and we just need to know how to apply them. The IoT space brings in a lot of developers who are used to devices that aren't connected to anything. Now those devices are talking to other devices, all these vulnerabilities come out, and we need to learn how to do this right.
That brings us to the topic of this talk. What are we talking about? The word IoT covers a vast range, including large devices. When a lot of people think of IoT, they think of the Raspberry Pi. It typically has a couple of gigabytes of memory, there's flash storage on it, and it runs at gigahertz speeds.
They're pretty much ordinary computers, just small. This isn't really the point of my talk, because if you want to do TLS on a device like this, it's running Linux: link against OpenSSL and write your application the way you would on a desktop OS. As Mickens says, it's not very interesting. There are a lot of IoT devices like this, such as my wonderful smart fridge. For things plugged into mains power, there's no reason not to put a Raspberry Pi, or even a full desktop machine, inside, and a lot of these smart fridges do. Just use TLS on them, and do what you would with any regular connected device.
At the other extreme is a class of really tiny IoT devices. I'm thinking of things with tens of kilobytes of memory, maybe a hundred kilobytes of flash, and a CPU that runs at 10 megahertz. The example I gave here is a light bulb. This is not the kind of thing you can really run TLS on: we don't even have enough memory for the minimum buffer TLS needs to hold a record, let alone space to deal with an RSA key. Security is still important for these devices, but it's a little beyond the scope of this talk. What you typically do is have these devices use something much simpler to talk to an edge device or gateway. That's usually a machine running Linux or something like it, which uses OpenSSL and talks to the rest of the cloud and the internet using the things we've already learned how to do.
So what are we interested in? The device I want to focus on in this talk is the toaster. Yes, I picked it because it was a pretty nifty model I found without having to do any work, but there really is an IoT toaster; somebody makes one. It's a completely pointless device, but it exists. The idea here is the area in the middle, where we have maybe hundreds of kilobytes of memory, maybe a megabyte of flash, and a 10 to 100 megahertz CPU.
We can't just say we can't do TLS because we don't have enough memory. We can, but it's hard. Another way to put it is that we can barely do TLS: we have enough resources, but only just. That's what I want to focus on in this talk, the class of devices right in the middle where we can do TLS, but it's hard.
So let's talk a little about TLS. How does TLS work? Or, as people younger than me would say, "How does TLS?" Networking stacks are typically divided into layers. Think of the thing on the left as an IoT device with its little network stack: an application, maybe speaking HTTP, MQTT, or some other protocol, then a TCP stack, IP, and the hardware physical interface. The thing on the right might be a Linux machine in the cloud with the same kind of stack. But notice the green box, which is the kernel boundary on the Linux machine. Our IoT device may not have a kernel and user space; it's small, and that's a lot to ask of it. So on the Linux side these might be system calls, while on the device they may just be function calls. Either way, this is an insecure network protocol. We can send HTTP data, but it's completely interceptable. There's no security.
So where do we put TLS? We just stick it in there. This is how it's done: it's how OpenSSL works, and how most of these libraries work. We hack up the HTTP library so that, instead of just calling open or socket, accept or connect, and read and write, we still make those calls but wedge calls into the TLS library between them. We do the same thing on both sides, and now we're communicating over TLS and everything's secure, right?
So what's going on when you do this? As I said, we have a socket connection. It's a stream, and data goes back and forth. It's reliable in the sense that it retransmits packets and disconnects if something goes wrong, but it isn't secured with cryptography in any way. So TLS sits in the middle.
These little boxes are the packets sent back and forth over the TCP connection, carrying enough information to establish a secure connection. An interesting hint: when I wanted to come up with this diagram, I searched the internet for "TLS handshake" and found dozens and dozens of diagrams, all different and mostly wrong. This one is probably wrong too, but it's based on a packet dump of a TLS connection between my machine and, I think, Google, so this is the data that actually went back and forth. There's some flexibility in how this works, but here's the basic idea.
The client sends a Client Hello packet, which lists the cipher suites it's willing to use and includes a random number. The server comes back with its own random number and the cipher suite it chose, and it sends over a server certificate. There's a key exchange part, and then the server says it's done with its part of the hello. The client does its part of the key exchange. Then there's this magical packet called Change Cipher Spec, which marks the point where we start encrypting according to the negotiated cipher suite. The client sends its encrypted handshake message, the server does its own Change Cipher Spec and encrypted handshake message, and everything after that is protected by that cipher suite.
There's a lot in here. What's important is that the cipher suite is how the two sides agree on how they're going to communicate, and it describes a lot: a particular algorithm for encryption, one for authenticating the packets, and how we authenticate who the other party is. These are the three important parts of communicating securely. We need to keep the traffic from being observed by other parties, we need to know who we're talking to, and we need to make sure the data hasn't been tampered with on its way back and forth.
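The cipher suite name itself spells out those jobs. As a rough illustration (a hypothetical helper, not part of any TLS library), the sketch below splits a TLS 1.2-style IANA suite name into key exchange, authentication, bulk cipher, and trailing hash. It only handles the common `TLS_<kx>_<auth>_WITH_<cipher>_<hash>` shape; TLS 1.3 names such as `TLS_AES_128_GCM_SHA256` negotiate key exchange and authentication separately, so it rejects them.

```python
from typing import NamedTuple, Optional

# Hash names that can end a TLS 1.2-style suite name.
KNOWN_HASHES = {"MD5", "SHA", "SHA256", "SHA384"}


class CipherSuite(NamedTuple):
    key_exchange: str          # how the shared secret is agreed (e.g. RSA, ECDHE)
    authentication: str        # how the server proves who it is (e.g. RSA, ECDSA)
    bulk_cipher: str           # how application data is encrypted (e.g. AES_128_GCM)
    mac_or_prf: Optional[str]  # record MAC (CBC suites) or PRF hash (AEAD suites); None if absent


def parse_tls12_suite(name: str) -> CipherSuite:
    """Split a name such as TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 into its parts."""
    if not name.startswith("TLS_") or "_WITH_" not in name:
        raise ValueError(f"not a TLS 1.2-style suite name: {name!r}")
    kx_auth, cipher_hash = name[len("TLS_"):].split("_WITH_", 1)

    kx_parts = kx_auth.split("_", 1)
    key_exchange = kx_parts[0]
    # A single token (e.g. "RSA" or "PSK") means one mechanism does both jobs.
    authentication = kx_parts[1] if len(kx_parts) > 1 else kx_parts[0]

    head, _, tail = cipher_hash.rpartition("_")
    if tail in KNOWN_HASHES:
        return CipherSuite(key_exchange, authentication, head, tail)
    # e.g. AES_128_CCM_8 carries no trailing hash name.
    return CipherSuite(key_exchange, authentication, cipher_hash, None)


if __name__ == "__main__":
    for suite in ("TLS_RSA_WITH_AES_128_CBC_SHA",
                  "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"):
        print(parse_tls12_suite(suite))
```

For AEAD suites like AES_128_GCM, integrity comes from the cipher mode itself and the trailing hash names the PRF; for CBC suites the trailing hash is the record MAC.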
The reason cipher suites are negotiated at all is that sometimes we find weaknesses in them. That's one of the difficulties with IoT: we don't have space in our code for a nice large TLS library, so we usually have to pick the one cipher suite we want. If that suite becomes vulnerable and servers refuse to use it anymore, now we can't talk to anybody. The conclusion from that is to make sure your software can be updated: when a vulnerability is discovered, you need to push out a new version that supports another, robust cipher suite.
There are also these random numbers. I could probably give a whole talk on random numbers; in fact, you could attend a whole conference on them, and it's a topic you could make a career out of if you felt like it. What it amounts to is that random numbers are critical to the security of these protocols. With the cipher suites commonly used for web connections, the client's randomness is the most important thing. In the typical cipher suites that use RSA key exchange, the secret that all of the session keys are derived from is generated by the client, so its randomness is what protects the conversation. If your client doesn't generate good random numbers, or they're somewhat predictable, somebody intercepting your communication can pretty easily figure out what's going on. If you go back to that search for IoT hacks, this is one of the biggest problems: a lot of devices out there don't generate very good random numbers, because it turns out that's hard. What it really comes down to is that if you're building an IoT device, make sure the SoC you're using has support to help you generate these random numbers. It needs an entropy source, something that can generate enough unpredictable data that an attacker can't simply guess it and decode your traffic. The scary thing is that if you don't generate good random numbers, everything looks just fine. The packets don't look any different.
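To make the "somewhat predictable" point concrete, here is a minimal sketch, assuming a TLS 1.2 RSA key exchange where the client's premaster secret is 2 version bytes followed by 46 random bytes. The hypothetical `weak_premaster` seeds Python's non-cryptographic `random` module from a 16-bit value, standing in for a device that only has a boot counter or timer to seed from, and `recover_seed` simply tries every seed. A real eavesdropper never sees the premaster secret directly, since it travels RSA-encrypted; they would test each candidate by deriving the session keys and checking whether they decrypt the captured traffic. The direct comparison here just keeps the sketch short.

```python
import random
import secrets

PREMASTER_VERSION = b"\x03\x03"  # TLS 1.2 client_version at the start of the premaster secret


def weak_premaster(seed: int) -> bytes:
    """BAD: a deterministic PRNG seeded from only a few bits of 'entropy'."""
    rng = random.Random(seed)
    return PREMASTER_VERSION + rng.getrandbits(46 * 8).to_bytes(46, "big")


def strong_premaster() -> bytes:
    """Better: 46 bytes from the operating system's CSPRNG."""
    return PREMASTER_VERSION + secrets.token_bytes(46)


def recover_seed(observed: bytes, seed_bits: int = 16):
    """Try every possible seed; with 16 bits that is only 65,536 candidates."""
    for candidate in range(2 ** seed_bits):
        if weak_premaster(candidate) == observed:
            return candidate
    return None


if __name__ == "__main__":
    device_seed = 0xBEEF  # hypothetical low-entropy seed, e.g. a timer value at boot
    captured = weak_premaster(device_seed)
    print("seed recovered:", recover_seed(captured) == device_seed)
    print("work: 2**16 =", 2 ** 16, "versus 2**128 =", 2 ** 128)
```

Both functions return 48 bytes that look equally random; the only difference is how many possibilities an attacker has to try.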
The key exchange doesn't look any different either. It's not until somebody actually looks into it and figures out what's going on that breaking it becomes trivial. Say you have a 128-bit key: guessing it should take 2^128 computations, or about half that on average. If your random numbers only draw on a few bits of entropy in your system, maybe an attacker only has to do 2^16 work to try every possibility. You've suddenly gone from something that can't be solved within the remaining lifetime of the universe to something that can be solved in a few minutes, and your communications are completely insecure.
The other thing is the server certificate. There are two parts to this, and I'm only showing one of them here: the client can present a certificate too. That isn't commonly done on the web because, well, certificates are hard for people to manage, but it makes more sense for an IoT device, and a lot of cloud providers do exactly that. A notable variation is Google's IoT cloud service, where, instead of a client certificate in the TLS handshake, the device has to sign a token and send it over. The important thing is that the server sends a certificate saying who it is: usually a domain or address, a validity period, and then a signature. The next place a lot of IoT clients are vulnerable is that they don't check the certificate. If you don't check it at all, you're completely vulnerable, because anybody can generate a certificate and sign it, and if you don't care whether it matches the host you're connecting to, you don't even know who you're talking to. The certificate also has to be signed by somebody you trust, which means your root certificates have to be stored on your device. We'll get to why that's challenging in a minute.
I should move on, because I do have more slides. Just to summarize what I was talking about: the cipher suite agreement is important.
Verification of the certificate is not optional. In a great hallway conversation, somebody pointed out to me that TLS done incorrectly is worse than not using it at all. At least with no TLS, you know your communication is insecure. There's nothing like the false sense of security of "we threw TLS at it, so everything is secure," when it's no more secure than if you hadn't used it at all.
So what makes this hard? We're talking about memory-constrained devices, so a big thing is memory. The server certificate is on the order of kilobytes, and when you have tens of kilobytes of memory, or maybe 100 kilobytes, that's a lot. We need to store the server certificate. TLS also has buffers: the TLS spec allows records of up to 16 kilobytes, in both the send and receive directions. There is an extension that allows a smaller size to be negotiated, but I've never seen it implemented. I tried poking around a bunch of servers and haven't found one that would let me negotiate a smaller size, including services that are supposed to be talking to IoT devices. They still assume I have 32 kilobytes of RAM readily available to buffer the protocol.
Another one is time. Several things take time: gathering randomness can take time, and RSA operations are expensive. We need enough computational resources to make the connection, and we'll get more into that.
One thing I missed on memory: I mentioned the server certificate. The way this works with TLS is that there are certificate authorities, a set of trusted entities that validate who hosts on the internet are and sign their certificates with digital signatures. Your web browser has a list of the ones that are currently trusted, and every time you update your browser, it updates that list. Right now the list is about 200 kilobytes, which on one of these small IoT devices is a quarter of my code space just for certificates.
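As a back-of-the-envelope check on the buffer numbers, here is a small sketch (hypothetical helper names) of the RAM needed for one receive and one send buffer, each sized for a full record. It uses the 2^14-byte maximum plaintext fragment and 5-byte record header from the TLS spec, and the four fragment sizes the `max_fragment_length` extension (RFC 6066) can negotiate. Per-record cipher overhead is left as a parameter you have to supply for your own suite; for AES-GCM in TLS 1.2, for example, it is 24 bytes (an 8-byte explicit nonce plus a 16-byte tag). A strict receiver may want extra headroom, since TLS 1.2 permits up to 2048 bytes of ciphertext expansion.

```python
RECORD_HEADER = 5       # content type (1) + protocol version (2) + length (2)
MAX_FRAGMENT = 2 ** 14  # 16,384 bytes: the largest plaintext fragment TLS allows

# max_fragment_length extension (RFC 6066): negotiated code -> fragment size in bytes
MFL_SIZES = {1: 2 ** 9, 2: 2 ** 10, 3: 2 ** 11, 4: 2 ** 12}


def io_buffer_bytes(fragment_len: int, per_record_overhead: int = 0) -> int:
    """RAM for one receive plus one send buffer, each holding a full record.

    per_record_overhead is an assumption you supply for your cipher suite
    (explicit nonce, authentication tag, padding); it defaults to 0 here.
    """
    one_record = RECORD_HEADER + fragment_len + per_record_overhead
    return 2 * one_record


if __name__ == "__main__":
    print("default:", io_buffer_bytes(MAX_FRAGMENT))
    for code, size in MFL_SIZES.items():
        print(f"max_fragment_length={code} ({size} B):", io_buffer_bytes(size))
```

With the default record size this comes to about 32 KB before counting any certificate or key material, which is why a server that won't negotiate a smaller fragment hurts so much on a device with around 100 KB of RAM.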
A good approach is for the vendor or cloud provider to say, "You need to use this root certificate; we'll always sign with this one." Back to the Google example: the best I could get out of them was a 60-kilobyte list of certificates, which is still pretty crazy. What ends up happening is that you pick one, and then you make sure you can update your device when they have to move to a new certificate, because these do expire; they have validity ranges. And, as I mentioned, randomness is a critical part of doing all of this.
So what's going on with these APIs? This is another area where things get hard. Here's the way TLS usually works, based on OpenSSL or Mbed TLS, which are very similar. You start with init calls, a whole bunch of functions that are too small to print readably on a slide in a presentation like this. It doesn't really matter exactly what they are. You have to initialize your random number generator and the SSL stack. You have to configure things. You have to initialize the certificate chain you're using. You have to initialize your entropy source and feed entropy from it. You have to set the hostname you're talking to, and on and on. The thing is, if you get one of these wrong, leave one out, or don't configure it correctly, everything still works. It's just that, oh, we're now not actually verifying the certificate, or oops, we forgot to feed it randomness and we're always using the same keys. So that's messy; we'll come back to it in a minute.
Then we have sockets. We open a connection using whatever socket API we have, and then we use this set_bio call, which hooks the underlying transmit and receive functions into the TLS stack, so that the application can call TLS read and TLS write and the library knows which functions to call to transmit data.
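Python's standard `ssl` module shows both points from this slide in a few lines. `create_default_context()` turns on certificate and hostname verification by default, so forgetting a step can't silently disable it (the commented-out lines are the kind of change that quietly does). `SSLContext.wrap_bio()` with two `ssl.MemoryBIO` objects is roughly Python's counterpart to set_bio: TLS reads from and writes to buffers that your own code moves to and from the network. This is a sketch, not how Mbed TLS or Zephyr do it; `make_client_context` and `start_handshake` are hypothetical names, and `example.com` is only a placeholder hostname.

```python
import ssl


def make_client_context(cafile=None) -> ssl.SSLContext:
    """Client context with certificate and hostname verification switched on.

    cafile: optional path to a PEM file holding only the root(s) you chose to
    trust (e.g. the single root your cloud provider signs with). If None, the
    platform's default trust store is loaded instead.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Anti-pattern: this still "works", it just stops checking who you talk to.
    #   ctx.check_hostname = False
    #   ctx.verify_mode = ssl.CERT_NONE
    return ctx


def start_handshake(ctx: ssl.SSLContext, hostname: str):
    """Hook TLS to in-memory buffers instead of a socket."""
    incoming = ssl.MemoryBIO()  # bytes received from the network get written here
    outgoing = ssl.MemoryBIO()  # bytes TLS wants sent on the network appear here
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the ClientHello is queued in `outgoing`, and TLS now needs
        # the server's reply before it can make progress.
        pass
    return tls, incoming, outgoing


if __name__ == "__main__":
    ctx = make_client_context()
    tls, incoming, outgoing = start_handshake(ctx, "example.com")
    client_hello = outgoing.read()
    print(len(client_hello), "bytes to send; record type", hex(client_hello[0]))
```

Nothing is actually sent here: the ClientHello just sits in `outgoing`, and it is the caller's job to write it to a socket and feed the server's reply into `incoming`. The `SSLWantReadError` is the same "wait for the underlying transport" signal that non-blocking TLS APIs return.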
We start with a handshake and then get into the application. What's important to realize is that when the application calls TLS read or TLS write, that can result in more than one read or write on the socket. There's a protocol sitting above the socket API, above the TCP stream itself. When you do a read, maybe I have to write some data because something is pending; or when I wrote data, maybe I'm waiting for a handshake message back before I can write the next data. It's the stuff sitting on top that matters. In the end, this means the library is pretty much designed to be called from one thread. If you want to receive and send at the same time, you call these read and write functions in non-blocking mode. They return saying, "I need to wait for the underlying socket to have data," so you go off and call poll on the socket, and when data arrives, you go back and try them again. That matters simply because that's how the API is designed.
So this is messy. Everybody has to do this. Every time you write an app, you do all of this, and you get it wrong, pretty much guaranteed. You say, "Okay, I'm just going to copy from some other app," except they got it wrong too. And maybe you aren't keeping up with their bug lists to see that they fixed how they were using it, and they weren't keeping up with the app they copied from and the bug or vulnerability that was found in that one.
So it would be nice to get rid of all of this, and there are two ways we can do it. The first I'm calling stream abstraction. If you've ever done TLS from something other than C or C++, in Python, Go, Java, Rust, pick any of these more modern languages, this is what they do.
They abstract the socket interface behind some kind of stream object, a general way of transmitting data, and when you want a TLS connection, you go through the same abstraction; you just start out by saying you want it to run over TLS. This is really clean.
The other way is to bury the TLS connection under the socket API. This has been done. Note that it isn't really done in Linux: you can find a presentation on doing it there, but the full TLS protocol isn't in the kernel; that was a project somebody was working on. It does have the advantage of keeping the same API, but realistically, in terms of layering, it's the wrong place. If you're familiar with the way the layers are described, TCP is the transport layer and TLS is more like the presentation layer, sitting between the transport and the application, which is why OpenSSL did it the way it did. The real problem there is that the API is convoluted and complicated, not that it's in the wrong place.
As far as Zephyr is concerned, well, naturally we chose the second approach. There are reasons for this; it isn't just "let's do it the wrong way because that's cool." There was already support for offloading, which does something similar: maybe we move TCP to a separate piece of hardware, and maybe that hardware also supports TLS, so it makes sense to do it there. The other really interesting reason is that abstractions are scary. I wish this weren't true, but in the embedded space, sometimes people see the word "stream" and it's frightening. So let's use the socket API, because that's really familiar; otherwise it might look like classes, which might be too much like C++, and that would scare us.
So what's the problem with this? Remember I mentioned that complicated API that assumes you can't call back into it, that everything is done from a single thread? Well, the POSIX API isn't like that.
You have a socket, you do a connect, and you have send and receive, which are syscalls, and you can call them from different threads. A lot of protocols, MQTT for example, work like this: you have to wait for incoming data while also periodically sending data. That gets messy; it doesn't really work with that TLS API. So to implement this (and, as far as status goes, this isn't done yet), you have to do some really complicated stuff. You have the socket API with connect, send, and receive, but when you do a send, it has to call down into TLS, because we're not writing our own TLS stack; we're using Mbed TLS. That isn't re-entrant, so we call into it under a mutex or something similar so that we never re-enter it. If it comes back and says it would block, we queue ourselves up to wait, and then return. The important thing down here is that when a network packet actually comes in, we wake everybody up so they can figure out whether there's more to do. That's needed in the general case. The other way to do this, the current way, is to use the socket API together with poll, the same way you would on a Linux machine, and only call these functions when they would return data. Either of those works.
So where are we now, as far as the effort is concerned? Sockets with TLS underneath were merged not too long ago. The idea is that setsockopt has been extended with some extra options for giving it certificates, effectively replacing that huge block of initialization calls. The settings that are real data you need to provide, such as which root certificate to trust, whether you have a client certificate, and the hostname you're connecting to, get passed through; then you make a call that sets the socket up as a TLS connection, and it does all the complicated stuff under the hood. There's more coming in Zephyr.
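A minimal sketch of that general-case scheme, written in Python rather than against Mbed TLS, and assuming the TLS object raises `ssl.SSLWantReadError` when it needs more data (as `ssl.SSLObject` does). One condition variable's lock serializes every call into the non-reentrant TLS object; a caller that hits "would block" waits on it, and the receive path wakes every waiter when a packet arrives. `SerializedTLS` and its method names are hypothetical, and flushing the outgoing buffer to the network is left out to keep it short.

```python
import ssl
import threading


class SerializedTLS:
    """Let several threads share one non-reentrant TLS object.

    `tls` is anything whose methods may raise ssl.SSLWantReadError when more
    network data is needed (e.g. an ssl.SSLObject from wrap_bio), and
    `incoming` is the buffer received bytes are written into (e.g. the
    matching ssl.MemoryBIO).
    """

    def __init__(self, tls, incoming):
        self._tls = tls
        self._incoming = incoming
        self._cond = threading.Condition()  # its lock guards every call into `tls`

    def call(self, op, *args):
        """Run op(tls, *args) under the lock, sleeping while TLS wants more data."""
        with self._cond:
            while True:
                try:
                    return op(self._tls, *args)
                except ssl.SSLWantReadError:
                    self._cond.wait()  # releases the lock until someone notifies

    def on_network_data(self, data: bytes) -> None:
        """Called by the receive path when a packet arrives: feed it, wake waiters."""
        with self._cond:
            self._incoming.write(data)
            self._cond.notify_all()
```

With a real `ssl.SSLObject`, a reader thread might call `shared.call(lambda t, n: t.read(n), 1024)` while another thread sends with `shared.call(lambda t, d: t.write(d), payload)`. Because `wait()` releases the lock, a thread blocked waiting to receive doesn't stop another thread from sending, which is the MQTT-style pattern described above. The poll-based alternative avoids waiting inside TLS entirely by only calling into it once the socket reports data.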
Right now, Zephyr's networking stack is in an intermediate, transitional stage. The stack previously worked at a kind of weird fragment level: you would request data to be sent, and when you received data, you'd get a bunch of pointers to all the pieces of your received packet. The idea was to avoid extra copies of the data. We pretty much decided that, as with the TLS API, we can do this properly, so let's just implement send and receive through the socket API. There's work underway to change the rest of the code in Zephyr to use it, which means things like MQTT, time, and JWT. We're in the process of converting those to the new API, but the pending changes aren't merged yet.
So that's what I have on TLS, why it's hard, and what we're doing about it in Zephyr. I wanted to leave a couple of minutes for questions about anything we covered. We do have a microphone, so please use it, so that your question ends up on the recording and I'm not just answering silence. So, opening it up for questions.
You touched on this briefly when you were talking about certificates and how important it is to check them, but if you don't have an RTC, how do you establish the initial time in a safe way?
Well, the simple answer is you don't, and that's not the best answer. The approach Google uses, this whole JSON Web Token scheme, is even stricter: you're supposed to know the time to within about five minutes, because you sign the token with a timestamp and it's valid for about an hour. Certificates, by contrast, are valid on the order of days or months, so if your time is within a day or two you're probably okay. It is important to check the timestamps on certificates, but it's not as important as the other aspects; it's less of a vulnerability. There are weak certificates out there, but that's not where most of the attacks have happened. What it really comes down to is that we haven't come up with a good solution for secure time. You can put it in hardware if you've got an RTC that works, but if you don't have one, the best you can do is ask somebody on the internet what time it is and hope they aren't lying to you. I haven't seen a good solution to that; even the NTP maintainers I've talked to haven't offered one. It would be nice to have a good solution.
I wouldn't say it's a good solution, but what we do is assume that the root certificates already on the box are within their validity period. Then we make a TLS connection to another web server and take the time it tells us: I'm trusting that server anyway, so if it tells me the time, I'll go with its timestamps. I'm not sure it's such a good idea, but maybe there's nothing better.
Actually, if you get the time from the same server whose certificates you're checking and that entity is compromised, it can lie about the time as well, so in a way you're validating its certificates against its own clock. So it's not a good idea; I think the time server has to be a different entity.
But that doesn't always help either. If you're able to do a man-in-the-middle attack, you can intercept anything, and in the places these IoT devices get installed, that's something you just have to assume can happen. That's the challenge, and pretty much everybody is stuck there. There is work on secure NTP, but it's not really going anywhere currently because it's difficult. Anything else, question-wise? Well, thank you for coming. Oh, we do have a question.
Does DTLS help with anything, or are the problems equally hard? Is it just TLS over UDP, and that's it?
I wouldn't say DTLS helps with anything other than the problem it's trying to solve, which is how to do this over UDP, and if anything it makes things slightly worse. There's some extra complexity: TCP already solves some nice problems, such as dealing with certain denial-of-service attacks, and suddenly you don't have that helping you anymore, so DTLS has to do extra work to protect against those kinds of attacks. Other than that, it's very similar to doing TLS. We haven't gotten to the point of implementing it yet; this was basically the first stage, and we do want to do DTLS, since there are protocols that need it.
I have a question. From what you're saying, until now nobody has started working on, let's call it, a small version of OpenSSL, something that can be used on IoT devices. For the medium devices you were talking about earlier, is nobody really working on a library that can run TLS on these devices securely, so that you don't have to copy from other apps?
Are you talking about the medium devices or the smaller ones? For the small ones, I don't think anyone should bother that much with TLS itself, but there are actually things you can do with the smaller devices. You can offload your network stack to another processor, a dedicated device that handles that part for you: you ask it to make a TCP connection to a host, and it does that. Now you have to make sure that that's secure. We worked with one offloaded device that, it turns out, doesn't actually check the root certificate of the server you're talking to, so it's pretty pointless to use that device. As far as libraries go, yes, there are libraries out there that address different aspects of the problem. I'm most familiar with Mbed TLS. Zephyr also has the TinyCrypt library, which doesn't do TLS, but for the smaller devices you may need a lower-level protocol anyway, with specific simplified versions that are only for talking to an edge device. So we do have code; we just need to use it, and we need to make it easier to use. I think the real problem is that when people design a TLS API, they look at what OpenSSL did and do basically the same thing. That's why we wanted to put it in the socket layer, so that it's easier to use, and possibly easier to use correctly. Does that help?
Yes, it does. As a developer, you always rely on Java's support or on OpenSSL, so we don't really think about it.
Yes, and it's a good thing that you can do that. It just gets harder when you don't have enough memory to really do that. All right, anything else? Otherwise, thank you for coming.