Hello, my name is John Kubiatowicz. I'm one of the faculty here at UC Berkeley, working with Joey Gonzalez, Anthony Joseph, Ken Goldberg, and Ken Lutz. I want to tell you a bit about a project of ours that is fundamentally changing the way people interact with their information when building large distributed applications. As motivation, consider an application I'm sure you're all familiar with: FaceApp. Suppose you have an image of yourself and you want to see what you would look like older. You pull this application onto your phone and probably grant it full access to all of your photos, which might already be a little worrisome. Then what does it do? It sends those images, and who knows what other data, to some cloud service somewhere, and hopefully you get back an aged version of your face. But have you thought carefully about what's going on here? Along the way there may be many listeners, and there may be malicious elements in the cloud service itself. By the time your data has been packaged up, sent to the service, and returned as an aged image, copies of your packets may have been grabbed in multiple places. And the service itself may be squirreling away your images and other private data for future reference, and you have no control over that. As a flip side, you might ask yourself a different question: when you use information that somebody else has given you, how do you know it's even authentic? I like to give this example of what happened in 2015, when a team of researchers figured out how to exploit a firmware update bug to take control of a Jeep remotely over the Sprint network. They were able to make it speed up, slow down, and veer off the road entirely, all remotely, which is a little scary given the complexity of cars today.
And really, this is the rise of machine-to-machine communication, which is pretty much everywhere. Robotic systems use models that are generated elsewhere by machines. Firmware updates, safety protocols, navigation systems: all of these are handled in a machine-to-machine fashion. The internet of things, which has a wide variety of definitions, pretty much takes sensor readings, processes them, and automatically acts on them in cyber-physical systems. This is pretty worrisome, because in all of this you might ask: do you know where the data came from? That's a provenance question. And do you know that it's still ordered correctly and that none of the bits have been changed? That's an integrity problem. These are all offshoots of what I like to call the rise of fake data, which is exactly like fake news, but worse, because it's data that gets automatically acted on. You corrupt some data in one place, and suddenly a car goes off the road, or a remote factory starts making something it wasn't supposed to. So our approach is about changing the conversation for how people interact with their information, and the inspiration here is shipping containers. Before 1956, shipping containers didn't exist. The way things were shipped from one port to another was that longshoremen would play Tetris with the items on the ship, and they'd be unpacked at the other side. When shipping containers were invented, all of a sudden you had containers with a standardized size, locked in a standard way, each with a unique ID. As a result, all of the ships, trains, trucks, cranes, and the rest of that infrastructure could handle one format, and you could ship something from any place in the world to any other place with no problem, because of that standardization. So can we use this idea to help? The answer is yes.
We have something we call a data capsule, which is basically a piece of standardized metadata (that's the green piece here) that the infrastructure understands how to deal with. Within it are potentially encrypted transactions. There's no constraint on what they look like, except that they're all linked together, kind of like a git tree or a hash history, and they're all signed in a way that lets you verify where the data came from. You can think of it as a small piece of a blockchain, and you can put anything in it. The underlying infrastructure can now move these capsules anywhere in the world in a standardized way, assuming the owner of the data authorizes it. So, for example, imagine we have a bunch of training data for how a robot might grasp things. A model gets trained up in the cloud and put into a data capsule. It's now secure, because it's encrypted and signed by the producer. Then it's shipped off to the edge, where the only place it can be unpacked and used is a secure execution environment that holds the keys for unpacking it. We know it's authentic because we can check the signatures, and any updates can only happen in authorized environments as well. So basically, our approach is a platform approach. The data-capsule version of the ships, trains, trucks, and cranes is something we call the global data plane, which allows these data capsules to float freely, assuming they're authorized. The global data plane has been designed in a way that allows many service providers to interact to provide service. And above the global data plane, it's not necessary for application writers to use the data capsules directly.
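To make the hash-history idea concrete, here is a minimal sketch in Python. The record format and field names are my own illustration, not the actual capsule wire format, and an HMAC with a shared demo key stands in for the producer's real public-key signature:

```python
import hashlib
import hmac
import json

SECRET = b"demo-key"  # stand-in for the producer's private signing key
GENESIS = "0" * 64    # previous-hash value for the first record

def make_record(payload: bytes, prev_hash: str) -> dict:
    """Create one transaction: payload linked to its predecessor by hash,
    then authenticated (here with an HMAC standing in for a signature)."""
    body = {"payload": payload.hex(), "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    tag = hmac.new(SECRET, digest.encode(), hashlib.sha256).hexdigest()
    return {**body, "hash": digest, "tag": tag}

def verify_chain(records: list) -> bool:
    """Re-derive every hash and tag; any reordering or bit-flip fails."""
    prev = GENESIS
    for rec in records:
        body = {"payload": rec["payload"], "prev": rec["prev"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev"] != prev or rec["hash"] != digest:
            return False  # chain broken or contents altered
        expected = hmac.new(SECRET, digest.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(rec["tag"], expected):
            return False  # authentication tag does not match
        prev = rec["hash"]
    return True

# Append two transactions, as a producer would.
capsule = []
prev = GENESIS
for data in [b"model weights v1", b"model weights v2"]:
    rec = make_record(data, prev)
    capsule.append(rec)
    prev = rec["hash"]

assert verify_chain(capsule)       # the untouched chain verifies
capsule[0]["payload"] = b"tampered".hex()
assert not verify_chain(capsule)   # any modification is detected
```

The point of the sketch is only the structure: because each record commits to its predecessor's hash and carries an authentication tag, a consumer can check provenance and integrity without trusting any of the machines the capsule passed through.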
Application writers can instead view the data capsules as file systems, databases, or whatever their favorite storage pattern might be. A physical view of the GDP, which I won't go into in great detail, largely mirrors the internet; that's no surprise. You can have a series of trust domains in different places, like cloud centers, homes, and factories, with transit networks in the middle of the GDP, and the global data plane hosts these data capsules. How does that work? Well, we have switches that can move the data capsules around, as well as queries to them. We can have peering, and we can have name resolvers to help you find capsules. As a result, clients can use these data capsules from anywhere to anywhere else, if they're authorized. We also have a project called Fog Robotics, which is explicitly looking at the model-building example I gave you earlier, where models can be distributed in data capsules and used securely on the edge. So why should you come to Berkeley? Come change the world with us. If we succeed, we change the way that people construct distributed applications: safe computation on data anywhere, any time, but only if I want it to happen. What do we hope to accomplish? We want to exploit the networking effect, pun intended, of standardization and federation to allow basically everybody to use secure information. What do we as faculty do? Networking, distributed systems, machine learning, secure edge computing. We're interested in a variety of new applications, and we really hope you will consider coming to Berkeley. We're excited, and you should be too. Thank you very much, and I hope you have a great day.