OK, so we're going to get started. Thanks, everybody, for coming today. I'm David Tardini. I'm a principal software engineer and engineering lead at Microsoft. I work on a project called Azure Sphere. Azure Sphere is an end-to-end solution for building secure IoT devices. Today I'm going to talk about some of the design principles that we've been thinking about. The talk is called the seven properties of highly secure IoT devices. And my top-level summary would be: if you don't want your device to be pwned, say, five to ten years from now, and you're connecting it to the internet, you should pay attention to this talk. So as I'm sure all of you know, microcontroller units are everywhere. There's something like 9 billion MCU devices built and deployed every year. And what's happening now is that many are being connected or will be connected to the internet. This started a while ago. This picture here is of an MCU with a radio on die. This is from 2014. So you could buy these in volume back in 2014 and have Wi-Fi on die. 2014 is like forever ago in computing terms. So this is just accelerating, and eventually we believe that most devices will be connected to the internet. And there are compelling reasons to do this, which is that connected devices create profoundly better experiences. So I'll give you an example. Suppose you have a refrigerator. You could be a consumer. You could be a business, something like a restaurant. How do you know that the compressor in your refrigerator needs to be replaced? Well, the old way is that you come in one day and, if you're a restaurant, your refrigerator has died, your food is spoiling, and you can't serve your customers. If you're a consumer, you might come home and find your ice cream has melted or your food has gone bad overnight. The new way would be auto diagnosis.
The new way is you get a message or a letter or a phone call from your refrigerator manufacturer, who says, we noticed that something's going wrong with your refrigerator. Let's take care of this. And you might wonder, well, why do you need to connect this thing to the internet? And the reason is that things like machine learning and aggregating large amounts of data may allow the manufacturer to observe patterns that they couldn't really see or predict ahead of time. So you can gather data and then you can statistically observe what kinds of failures are occurring in the field. And this is a much better experience, especially if you're a business, to be able to monitor your equipment and get ahead of things going wrong. The problem is, what happens when you connect your device to the internet? I didn't say this, Dr. James Mickens at Harvard said this: the internet is a cauldron of evil. So all kinds of bad things can happen to you. These are just headlines that we've grabbed from the papers. You see all kinds of things, like security experts warn of dangers of connected home devices. Your smart fridge may kill you. It can be used for ransomware. It can be used for spying. Somebody could pwn your fridge, for example. So to make this concrete, I'm going to go over an example of an attack that's actually happened using IoT devices, which is the Mirai botnet attack. This happened October 21st and 22nd in 2016. And what people did is they used everyday devices. We're talking about things like webcams and baby monitors, and they took out the internet on the east coast for basically a day. They used about 100,000 devices, and what they did was not rocket science. A lot of these devices had bad default passwords, and so you could just use the bad password to take over the device.
There was no early detection that this stuff was going to happen, and worse yet, there was no remote update capability for these devices once they were taken over. So you couldn't just go out as a manufacturer and say, I'm going to fix this. I'm going to take care of this. The Mirai attack really highlighted some risks, and it really caught attention. It actually ended up in the New York Times. On the first day it was in the technology section of the New York Times, about how hackers used new weapons to disrupt major websites, as in take out the east coast internet. By the second day it had moved up to the politics section, and it said new era of internet attacks powered by everyday devices. So there are a few risks to pay attention to here when you connect your device. The first is that device security is now really a socioeconomic concern. And this is illustrated by this story moving from technology to politics. So this is a concern that government is going to pay attention to, and it's a concern for society. The second thing is the attack relied on well-known weaknesses. So this wasn't somebody mounting zero-day attacks. This wasn't a nation state. These were people using things like bad default passwords. Future attacks could be much larger. So this is where, if you extrapolate, things could look much worse. Imagine what would happen if somebody used 100 million devices. Suppose somebody was able to take over a device in almost every house in the U.S., for example, and use it. You could clearly take the internet out, and you could take the internet out for a long time. And future attacks could have much worse effects. This one disrupted access to websites and it disrupted internet usage. You can imagine that hackers could take an entire product line and brick it in a day. Or they could take a product line and actuate a device and cause property damage or loss of life.
So connecting your devices to the internet is a serious and challenging issue. I would actually claim that there's a moral concern here, which is that people could really get hurt. This isn't turn off your PC and reboot it. This is more serious. And the question is, how do you build a secure device and what are the best practices? So Microsoft has been on the front line for more than 25 years now of protecting customers and their devices. Here we have a timeline of various events that happened on the internet as well as things that Microsoft has been doing. I mean, all of you, I'm sure, at some point have used Windows. Windows has been a big target for hackers. Microsoft was around when Trojans first started appearing on the internet. We've been through DoS attacks, denial-of-service attacks. We've been through people developing worms for mobile devices. And as things have gotten progressively tougher, Microsoft has responded to this. So now if you look at Windows 10, if you talk to people who do attacks for a living, they say it's a pretty difficult environment to operate in. So the question is, what lessons have we learned from 25 years of being on the front line of trying to defend customers and their devices? The first thing that you need to take away is that your code is going to have bugs. So here I've shown a graph. This is grabbed from a government website, the National Institute of Standards and Technology. And this shows security vulnerability causes from 2008 to 2017. And I know when you put a graph up like this, everybody is probably trying to read it. The top blue line is actually buffer overruns and buffer errors, if you're wondering what that line is that's shooting off there. But the interesting thing is to look at the different kinds of attacks there are. There are authentication issues, there are buffer overrun issues, there's SQL injection, there's cross-site scripting, there's improper access control.
It's just really easy to get code wrong, and you have to accept that your code is going to have bugs. The second thing I think that you need to accept is that your device will be hacked, because hackers are smart, they're creative, they're persistent. You have to assume that after you deploy your device there is a possibility that it will be hacked. There are some lessons to take away from this. The first thing is that security is foundational. You have to build it in from the beginning. So trying to graft security on as an afterthought isn't going to work. More importantly, there are actually concrete things you can do to build more secure devices. And here I have a list of seven properties of highly secure IoT devices. If you think there should be eight, I'm interested in hearing about it. These are the seven based on what we've distilled from our experience. There's actually a white paper online. These slides are online as well, so you don't have to try to memorize these seven things. You can go read our white paper or you can just look at the slide deck here. The things that you need include a hardware root of trust, defense in depth, a small trusted computing base, dynamic compartments, certificate-based authentication, some way of failure reporting, and renewable security. And I'm going to talk through what each one of these things means. So a hardware root of trust: you want hardware to protect your device identity. In practice, what this means is that on your hardware, you want unforgeable cryptographic keys that are generated and protected by the hardware. You might ask, why does it have to be done in hardware? Because if you do it in software, at some point somebody is going to steal the private key and then you have a problem. The hardware should actually hold the private key, and the public key can be used to establish the identity of the device to others. Now at some point you have to register the device.
This can be done sometime during device manufacturing. But with the private keys in the hardware and protected by the hardware, you have to mount a physical attack on the hardware to get those private keys. And in many cases you would probably destroy the hardware trying to extract those private keys. You also want the hardware to secure software boot. In other words, in practice we want a boot ROM that's going to ensure that your OS loader has the integrity that you expect, that it is the OS loader that you expect. And once you have ensured the integrity of the OS loader, then the OS loader itself can ensure the integrity of the other software that's loaded. So you have a chain of trust here. And you want hardware to attest to system integrity. So the question is: yes, when the device booted, it was running your software. Now it's been running for a day. Is it still running your software? Is it still running your OS? You want hardware to do this. And I think it's really important to understand that there does need to be a hardware root of trust. If you try to do this in software, it ends up being defeated. If somebody can compromise the software, they can basically compromise the identity of your device. The second thing that you want, and people who have some security background are going to say of course this is what you want, is a small trusted computing base. So the trusted computing base for software is the software that ensures the security of the system. The very first thing is you want it to be as small as possible. And it's pretty simple. Less code equals fewer bugs. You also want to reduce the attack surface. You want to make it harder for people to get in. In practice, you actually want to use something like TrustZone or have a security monitor or a hypervisor sitting underneath your kernel. And the rationale is pretty simple.
The kernel is very big, and so trusting the kernel to be completely free of bugs is a risky thing. You can have a smaller security monitor that ensures that things stay isolated and that critical hardware stays protected. You also want dynamic compartments and defense in depth. And I think this picture here shows it all. It's an old-style castle, and the castle is surrounded by layers of defenses. So you want compartments so that if somebody gets in, you can limit the reach and impact of the security breach. In an RTOS, if somebody gets in, they sort of own the RTOS. If you have proper processes, they might own that process, but that doesn't give them access to the entire system. This requires hardware and software support for processes. You need some form of memory protection. The second thing you want is defense in depth. So again, if an attacker gets in, you want to be able to defend against the attack. And this means multiple layers of defense. Compartments or processes give you one way of defending against a breach. You also want multiple mitigations. For example, you might want internal firewalls on buses. You certainly want a network firewall. You want protection against ROP attacks, such as ASLR, and you want no-execute bits and those kinds of things. So, passwords. Passwords for devices are problematic. It's hard to get consumers to set them in the first place. They can be stolen. They can be hacked. They require administration, and part of that is getting them to be set. And if you go back to the Mirai botnet attack, part of it was just that there was a bad default password. But even having a default password is a bad idea, because if it's on all devices I can just find out what it is and then make use of it. The interesting thing is that with the measures that I just described before this, we know who the device is. And we also know that it's in a good state.
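As a rough illustration of how the "good state" part can be established, here is a minimal sketch of the chain of trust described earlier in the talk: a fixed root value checks the OS loader, and each stage carries the expected measurement of the stage after it. Everything in it is an assumption made for illustration. The `Stage` model, `build_chain`, and `first_untrusted_stage` are hypothetical names, and plain SHA-256 digests stand in for the signature checks a real boot ROM would perform. It is not the Azure Sphere boot format.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Stage:
    """One boot stage (hypothetical model, not a real image format)."""
    name: str
    code: bytes
    next_digest: Optional[str]  # measurement this stage expects of the next one

    def measure(self) -> str:
        # Hash the code together with the expected digest of the next stage,
        # so tampering with either one changes this stage's measurement.
        h = hashlib.sha256()
        h.update(self.code)
        h.update((self.next_digest or "").encode())
        return h.hexdigest()


def build_chain(images: List[Tuple[str, bytes]]) -> Tuple[Optional[str], List[Stage]]:
    """Build stages from (name, code) pairs, last stage first.

    Returns the root digest (what the boot ROM would hold) and the stages.
    """
    stages: List[Stage] = []
    next_digest: Optional[str] = None
    for name, code in reversed(images):
        stage = Stage(name, code, next_digest)
        next_digest = stage.measure()
        stages.insert(0, stage)
    return next_digest, stages


def first_untrusted_stage(root_digest: str, stages: List[Stage]) -> Optional[str]:
    """Return the name of the first stage that fails verification, or None."""
    expected: Optional[str] = root_digest
    for stage in stages:
        if expected is None or not hmac.compare_digest(stage.measure(), expected):
            return stage.name
        expected = stage.next_digest
    return None
```

Calling `build_chain` on a loader, kernel, and application image and then passing the result to `first_untrusted_stage` reports no failure. Swapping in a modified kernel makes the check stop at the kernel, before anything after it would be run.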
We can ensure that it's booted and is running the software that it is expected to run. And what this allows us to do is use certificate-based authentication instead of passwords. So the idea is that you have a trusted authority. In the picture here, I just showed an arrow going one way. Let's see, my pointer is not working so great, but there's an arrow going one way. In fact, there would be some kind of protocol where you establish that the device is who you expect it to be and that it's in a good state. And then from that, a trusted authority can issue the device a certificate, and that can be used to communicate with services. You can make the certificate short-lived so that if a compromise happens and the software is compromised, within a short time period you're going to lose the ability to access services, or certainly we're going to know something is wrong. So once you have those measures in place, you also need failure reporting. Failure reporting is gathering information from devices at scale to detect when something is going wrong. So the question is, if your devices are being hacked, do you know they're being hacked? Do you know something is going wrong? For example, from some of the stuff we see at Microsoft, zero days can be detected because some of them don't just result in taking over systems. They might provoke a crash, for example. So you might just see an abnormal number of crashes in a component that you weren't seeing before. So some attacks themselves cause crashes. You can also observe things like probing, unusual activity happening. And finally, it gives you insight into the raw fodder for some attacks, which is software flaws. The fact that your software crashes is probably an indication that something is wrong. So once you have failure reporting, though, you need what we call renewable security.
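As one hedged illustration of what failure reporting at scale might look for, the sketch below flags components whose crash count today is far above their own recent history, the "abnormal number of crashes in a component" case mentioned above. The function names, the data layout, and the threshold of three standard deviations are assumptions chosen for the example. They do not describe how Microsoft's telemetry works.

```python
from statistics import mean, pstdev
from typing import Dict, List


def is_crash_spike(history: List[int], today: int, threshold: float = 3.0) -> bool:
    """True if today's count is more than `threshold` standard deviations
    above the historical mean for this component."""
    if len(history) < 2:
        return False  # not enough data to call anything abnormal
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly flat history: any increase at all is unusual.
        return today > mu
    return (today - mu) / sigma > threshold


def components_to_investigate(
    history: Dict[str, List[int]],
    today: Dict[str, int],
    threshold: float = 3.0,
) -> List[str]:
    """Return, in sorted order, the components whose crash counts spiked today."""
    return sorted(
        name
        for name, count in today.items()
        if is_crash_spike(history.get(name, []), count, threshold)
    )
```

A component with no history is never flagged here, which is a deliberate simplification. A real system would likely treat crashes in a brand-new component as interesting in their own right.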
So you need to be able to update the device to address these security threats. And so you need to have cloud infrastructure that allows you to update devices. You also have to be watching what's going on and actually updating your software. And you have to have the technical ability to do the software update and, for example, prevent attacks like rollback attacks. So meeting the seven properties is challenging. It's not easy to do. You have to make sure that you've met all of the properties, because in the security realm you're really only as good as your weakest link. You have to have the operational capability to recognize and do something about threats when you see them. And then you actually have to have the ability to distribute patches and update devices. And what we see in the MCU space, for example, is that a lot of devices are just missing update. If you look at phones and you look at PCs, that's standard practice nowadays. So now I'm going to switch gears. It's hard to do this, but I'm going to talk about how you can actually realize the seven properties with Azure Sphere. This is a case study meant to show you that you can actually realize these seven properties. So as I mentioned, Azure Sphere is an end-to-end solution for securing connected devices. It consists of certified chips, so chips with a hardware root of trust. It consists of an OS and what we call the Azure Sphere Security Service, which is a cloud-based security service for updates and failure reporting. So the certified chips are chips that come with a built-in hardware root of trust. And the hardware root of trust is based on Microsoft's learnings from securing three generations of Xbox consoles. Azure Sphere actually defines a template for secured chips. I'll talk about the first chip that is shipping, which is the MediaTek MT3620. And this isn't vaporware, this isn't slideware, there are devkits available now.
You can go to this link if you want to find out how to get one. So the MT3620 has an internet connection. It has a Cortex-M processor, which is what you would traditionally see on a lot of MCUs, and this is for real-time software. But then it has what we call the Microsoft Pluton subsystem. The Microsoft Pluton subsystem actually provides the hardware root of trust. So the Pluton subsystem is where the private keys are generated and maintained, for example. And then, of course, this is a crossover MCU, so we have a Cortex-A processor in here for general-purpose computing. This chip has four megabytes of SRAM on it and 16 megabytes of flash. So once you have this chip, of course, you need an OS to run on it. So I'm going to go over our OS architecture. Our OS architecture consists of a number of layers. At the bottommost layer, of course, is the Azure Sphere chip, which is providing you with your hardware root of trust. Then there's a minimal runtime, the Pluton runtime, which is just mediating or providing access to the services on the Pluton subsystem. Then we have a security monitor. We have a custom Linux kernel. It's highly cut down to fit into the memory footprint we've got. On top of that, we have on-chip cloud services. In other words, we provide update, authentication, and connectivity. And then finally, on top of that is your application. And we have two kinds of application containers. We have programs that will run on the Cortex-M, and programs that will run on the Cortex-A. So in addition to having a hardware root of trust and having a software stack that embodies best practices, we have what we call the Azure Sphere Security Service. The Azure Sphere Security Service is a cloud-based service for failure reporting as well as updates.
It also provides you with certificate-based authentication for communication, so you can be sure that your device is who it says it is when it talks to your service. So, some important things about this. The first is that software updates are mandatory. The OS has to be updated. For devices, there might be some timeframe in which you can defer the update. But at the end of the day, you have to update the OS on the device. And we also provide hands-free deployment of updates to applications. So you can basically press a button and your application will be distributed to your devices. So the takeaway that you should have from this is that device security is like a stool. It really requires three legs. And if you remove any one of those legs, if you remove the hardware root of trust, you remove the secured OS, or you remove the secure cloud service which provides updates, you're going to end up on the floor. So, some key takeaways. Security is foundational. You need to design and build it in from the start. I suggest that you use the seven properties when evaluating security claims. Is your device really secure? And how will it stand up over time? Read the white paper. It's on the internet. And if you want to see a real example of this in practice, check out Azure Sphere. It's an example of realizing the properties. So with that, I have a few minutes for questions. I'm happy to take some questions right now. I can defer to Ryan, who would know that answer. Yeah, I mean, the integrity of the system relies on the boot ROM, which checks to make sure that you have a signed loader. And from there, you can establish a chain of trust. Let's see. So obviously the Linux kernel is free software and open source. I would say most of the software is free software and open source. The security monitor is not open source at this point. And the boot loader is not open source.
But the kernel itself, you can go and download our kernel from the internet. It's right there. And a lot of the libraries that we're using for the application programming environment are open source as well. Okay, any other questions? I'm sorry, what was the... That's a tough one. I mean, Spectre, I'm sure, is what's on your mind. So there are some things in the chip that I didn't mention, first of all, which is that there are internal firewalls and things like that. So there's more protection in there than I mentioned. At the end of the day, you really need to try to physically isolate things. So my thoughts are basically that this is where having a secure core comes in. Why do we have a separate core for the subsystem as opposed to relying on a general-purpose chip? It's because that's harder to attack. So I think thinking about hardware, and thinking in your hardware design about points where secrets could be exposed or information could be leaked, is an important thing. And I think physical separation is a good thing. Let's see, that's a little complicated to explain here. But basically, your device calls home. Your device has to regularly call home, call to the security service. And with Azure Sphere, by the way, you only have to use the security service for the security parts. If you want to hook the thing up to Amazon's cloud, you can. But your device has to call home regularly, and then we have to issue a new certificate. So it is an interesting challenge, and we're issuing lots of certificates. But the key thing is that your device calls home to the security service and establishes its identity, that is, the security service establishes that it is who it claims to be. It's a registered device and it's running the software that it's expected to run.
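The call-home flow described in this answer can be sketched roughly as follows: the service only issues a short-lived credential to a registered device that reports the expected software, and the credential stops working once it expires. This is a toy model under loud assumptions. An HMAC tag stands in for a real certificate signature, the one-day lifetime is invented, and `issue_certificate`, `certificate_valid`, and `SERVICE_KEY` are hypothetical names. A real service would use asymmetric keys and hardware-backed attestation rather than a self-reported digest.

```python
import hashlib
import hmac
from typing import Dict, Optional, Set

SERVICE_KEY = b"demo-only-service-key"   # assumption: stand-in for the authority's signing key
CERT_LIFETIME_S = 24 * 60 * 60           # assumption: the talk does not give a lifetime


def _tag(device_id: str, expires: int) -> str:
    payload = f"{device_id}|{expires}".encode()
    return hmac.new(SERVICE_KEY, payload, hashlib.sha256).hexdigest()


def issue_certificate(
    device_id: str,
    software_digest: str,
    registered: Set[str],
    expected_digest: str,
    now: int,
) -> Optional[Dict[str, object]]:
    """Issue a short-lived credential, or None if the device must not get one."""
    if device_id not in registered:
        return None  # unknown device
    if not hmac.compare_digest(software_digest, expected_digest):
        return None  # unexpected software: the device needs an update first
    expires = now + CERT_LIFETIME_S
    return {"device_id": device_id, "expires": expires, "tag": _tag(device_id, expires)}


def certificate_valid(cert: Dict[str, object], now: int) -> bool:
    """Check the tag and the expiry time of a credential."""
    expected = _tag(str(cert["device_id"]), int(cert["expires"]))
    return hmac.compare_digest(expected, str(cert["tag"])) and now < int(cert["expires"])
```

Because the credential expires on its own, a device that stops calling home, or that can no longer show it is running the expected software, loses access to services without anyone having to revoke anything.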
One of the things that can happen is that, you know, the usual concern would be that your device sits in a box for like two years and then somebody powers it up. So you may find when you call home that in fact you do need to do an update. That's basically how it works. I know they're working on the baseline. I'm not super familiar with it, so I don't think I can give you a good answer on that. But I'm happy to talk about that offline. Are you talking about the baseline that ARM is working on, by the way? I think, hopefully, we're involved in some of those discussions, but I'm happy to talk with you offline about that. Well, so they're not directly connected to the internet. I think you would have to make sure there's a transitive root of trust there. In a factory setting, of course, you may have devices that are internal. I think the important thing would be making sure there's a transitive root of trust, and there are some hard problems in there about who's applying the updates and things like that. The thing that I would say is it's important that you have a root of trust and that you also make sure the updates are actually happening. Because the natural thing in that kind of setting would be to say, well, we don't really need to update. The other comment I would have, let's see, I mean, there's this scenario where you have some other device, maybe you're on a secure network or something of that form. I think that things like the attack surface may change. You might be able to assume that the attack surface is a little bit more limited. But I think the principles in general are going to hold true there. Okay, so I'll be around. Any more questions? Okay, thank you, guys. Thanks for attending. We'll be around up here to chat after the talk.