Welcome to secure decentralized optimization. Great. So if you're debating whether to be in this room or up in the main hall, we can cut to the chase. My talk will discuss how to solve hard math problems off-chain. How can we actually address problems in machine learning, market clearing, and operations scheduling? How can we use smart contracts to bring those problems into consensus on a global optimum? And then how can we guarantee security, feasibility, and optimality for this type of problem?

The general outline: I'm going to be talking about optimization problems. I don't mean compiler optimization here, but a couple of different classes of problems that I work on. I'm a PhD student at UC Berkeley. Then I'll talk about how we can integrate smart contracts to really improve the application of these optimization tools to a wider set of problems. I come from an energy background, energy markets, and the electricity market specifically, so I'm going to be discussing an application of this that I've researched: an electricity market formulation for small microgrids.

So, optimization, just so that we're all on the same page. Who here has heard of convex optimization before? Oh, good. A few hands. That's great. In general, optimization problems have the goal of minimizing or maximizing some function, the dark curved line here, subject to some constraints, so we can't be in this dark blue area. In general, these are pretty hard. We might end up in a local minimum when we'd like to end up at the global minimum. Convex optimization problems make this a lot easier. We can prove that this type of problem basically looks like a bowl: as long as we follow the slope downwards, we'll end up at the global minimum. So there are a lot of tools that we can bring to bear when we know that a problem is convex. One of the more interesting components of this is that we can think of it in a decentralized optimization form.
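To make the "follow the slope of the bowl" intuition concrete, here is a minimal gradient-descent sketch; the objective, step size, and iteration count are illustrative choices, not from the talk:

```python
def grad_descent(grad, x0, lr=0.1, steps=200):
    """Follow the slope downhill. For a convex objective there is no
    bad local minimum to get stuck in, so this reaches the global one."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# f(x) = (x - 3)^2 is convex ("bowl-shaped"); its global minimum is x = 3.
x_star = grad_descent(lambda x: 2 * (x - 3), x0=10.0)  # converges to ~3.0
```

For a non-convex objective the same loop would happily settle into whichever local minimum is downhill from the starting point, which is exactly the failure mode convexity rules out.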
Conventionally, we just have one system operator who has all of the information: all the data, all the constraints in the system. So even if these are, say, individual households with mobility constraints on their electric vehicles, individual participants with different investment goals for portfolio optimization, or individual factories with constraints on production, one system operator would have knowledge of all those constraints and run a centralized optimization problem. This is great if all of those entities are owned by the same person. It's hell if you end up with people who don't trust each other. To reach a global optimum, you need some way of decentralizing this problem into a trustless, cooperative system.

So decentralized problems break that into a set of local optimization problems, these blue dots, that then coordinate back and forth with an aggregator until they reach consensus on that global optimum. And this is really important: we're actually able to guarantee that we're going to hit the global optimum even though we start from these local problems.

Why would we bother with this? One really beautiful part is that we can take a big, hard problem and break it into a set of analytically solvable sub-problems, so you can get a 1,000x speedup in computation time even while running on the same computing node. Another part is taking a hard problem with, say, greater-than-quadratic compute complexity and breaking it into n smaller problems: even though you now have n problems to solve, you've drastically reduced your total compute time. And then, when there are privacy and trust issues, you don't have to share all your information with the global aggregator. Fortunately, the aggregator, the entity that coordinates between all of these nodes, has a really easy job.
These nodes have hard compute problems subject to local constraints about which they have private information, but in a lot of these formulations the aggregator only has a very simple arithmetic update problem. So even though the local step can take minutes or even hours to compute off-chain on a CPU or a server cluster, the aggregator just adds all those numbers together and broadcasts the update to the rest of the network.

It's sort of like a market. Individuals have a sense of their value functions for a particular good. They can propose how much they'd like to sell or buy. The market operator aggregates all of those together, estimates what the market-clearing price would be, and broadcasts that price out to everybody. People can then update their bids or asks for the good, and when you iterate back and forth like that, you're able to reach consensus. For these convex optimization problems, we can prove that the consensus is the global optimum.

Going from these aggregator-coordinated problems to a fully decentralized model: one of the challenges with this architecture is that the aggregator has to have direct lines of communication with every single node. Not a problem when you just have three nodes here. But for the types of electricity market problems that I consider, where you're trying to control, say, 10 million electric vehicles, you don't want one server trying to talk to 10 million electric vehicles all at once. You need a more decentralized model. So fully decentralized optimization has become a tool that's in vogue, in particular for these really large problems. One of the challenges, though, is that you're only sharing information with your neighbors, and those neighbors might be lying to you if they've been compromised.
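The market analogy above can be sketched as a simple price-coordination loop. This is a toy model, assuming quadratic utilities and a hand-picked step size rather than the talk's actual formulation: each participant solves a private best response to the broadcast price, and the aggregator's update really is just arithmetic on the sum.

```python
def demand(p, a, b):
    """Private best response: maximize a*d - (b/2)*d**2 - p*d over d >= 0,
    which gives d = (a - p) / b, floored at zero."""
    return max(0.0, (a - p) / b)

def clear_market(agents, supply, step=0.5, iters=100):
    """The aggregator's 'easy job': sum the responses, nudge the price
    by the supply-demand imbalance, broadcast, and repeat to consensus."""
    p = 0.0
    for _ in range(iters):
        total = sum(demand(p, a, b) for a, b in agents)
        p += step * (total - supply)  # simple arithmetic update step
    return p

# Two participants with private (a, b) utility parameters, fixed supply.
price = clear_market(agents=[(10.0, 1.0), (8.0, 2.0)], supply=6.0)  # ~5.33
```

Notice that the (a, b) parameters never leave the participants: the coordinator only ever sees aggregate quantities, which is the privacy property the decomposition buys you.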
So while you might be communicating with your neighbor, that neighbor may have had their electric vehicle hacked by somebody, and that hacker is attempting to cause a blackout in the electricity grid, for instance. So we have these questions: how do we identify the nodes that have been compromised? How do we reach stable operation? And can we guarantee feasibility or optimality of the system when nodes have been compromised?

We have bi-directional communication between neighbors, so we only see the messages our neighbor sends us. In this case, the red node has been compromised. Rather than operating normally, the red node could distort its private optimization problem, solving some problem that doesn't actually reflect reality. In a machine learning or data science setting, this could be throwing in junk data, and the network doesn't necessarily know that. It just sees some message from you. It might not have been entirely expecting that message, but it can't know whether that's because you're acting strange or because your upstream participants sent you bad data. So there's difficulty in identifying where the flaw in the system is. A compromised node can also just send a bad signal to its neighbors, unidirectionally. If it knows there are certain problems a neighboring node cannot address, or certain regions where that node would be infeasible, it can send a signal that's always going to be infeasible. That node basically freaks out, is never able to reach consensus or feasibility, and the system fails to converge. A compromised node could also simply inject noise into the system, so that at every update step you have a noisy signal and you fail to converge because of that. So what can we do to prevent these modes of attack? What we'd like to do is bypass any node that might be compromised.
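As a toy illustration of those failure modes (a hypothetical four-node ring with neighbor averaging, not the talk's actual system), a single compromised node that always broadcasts a bogus value drags every honest node away from the true consensus, and no neighbor can tell from the messages alone:

```python
def gossip(values, neighbors, rounds, liar=None, lie=100.0):
    """Each node repeatedly averages its own value with its neighbors'
    messages. A compromised node ("liar") always sends `lie` instead of
    its true state, and its neighbors have no way to detect that."""
    v = list(values)
    for _ in range(rounds):
        msg = lambda j: lie if j == liar else v[j]
        v = [(v[i] + sum(msg(j) for j in nbrs)) / (1 + len(nbrs))
             for i, nbrs in enumerate(neighbors)]
    return v

ring = [(1, 3), (0, 2), (1, 3), (2, 0)]  # 4 nodes in a ring
start = [1.0, 5.0, 9.0, 13.0]
honest = gossip(start, ring, rounds=200)            # all nodes -> 7.0
attacked = gossip(start, ring, rounds=200, liar=2)  # all dragged to 100.0
```

With honest nodes, the symmetric averaging converges to the true mean (7.0); with one liar, the entire network is pulled to the attacker's value, which in a grid-control setting is exactly how a blackout gets engineered.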
If you're trying to make your system redundant or resilient to attacks on up to n nodes, this ultimately means that you need a fully connected graph with secure communication channels. The other way we could address this would be to have some other centralized database that provides a global information layer over all the state variables. Now this is starting to look a little bit more like a blockchain. Taking that information from peer-to-peer communication and having it passed trustlessly between all the peers on the network gives us global insight into the state of all the variables in the network. This allows us to perform security checks on the signals we're receiving from our neighbors and thus identify the nodes that have been compromised. This is really powerful because it means that we no longer have to trust either an aggregator, somebody who's holding the database, or our neighbors, but we get the benefits of both of those.

One of the things that's really interesting here is that we think of blockchains as iterative computations that reach consensus. Decentralized optimization follows the same model: iterating between local problems and an aggregator update step, or the rest of the network, and eventually reaching consensus on the system. The research that I've been doing highlights that similarity to create a combined model, where these local update steps coordinate with a smart contract to process a secure, verifiable update step. Ultimately you reach a solution that is auditable: you can see how you got to that solution, you can prove its optimality and know whether or not there were attacks along the way, and then that solution can be stored on the blockchain for everybody to access. Now let's think about a couple of the ways these different models fail.
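One sketch of the security check that a global information layer enables (the ledger here is just a Python dict standing in for on-chain storage, and all identifiers are hypothetical): each node commits to its state variable every round, and a neighbor's relayed message can be verified against the public commitment instead of being trusted.

```python
import hashlib

def commit(node_id, round_no, value):
    """Commitment to one node's state variable for one round."""
    payload = f"{node_id}:{round_no}:{value:.9f}".encode()
    return hashlib.sha256(payload).hexdigest()

# Global information layer: (node, round) -> commitment.
ledger = {("ev_7", 3): commit("ev_7", 3, 4.25)}

def verify_message(node_id, round_no, claimed_value):
    """Check a neighbor's relayed value against the global commitment."""
    return ledger.get((node_id, round_no)) == commit(node_id, round_no,
                                                     claimed_value)

ok = verify_message("ev_7", 3, 4.25)   # honest relay -> True
bad = verify_message("ev_7", 3, 9.99)  # tampered relay -> False
```

The point of the design is that the check requires no trust in the relaying neighbor and no trust in any single database holder: anyone can recompute the hash against the replicated ledger.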
We first talked about aggregator-coordinated decentralized optimization. It's prone to monopoly distortions: anytime there's a market operator or a scheduler who has their own incentives, and you may not trust those incentives, you wouldn't want to trust a monopoly aggregator. It's also prone to communication dropouts. If there's, say, an outage in the electricity grid, that centralized aggregator wouldn't be online and wouldn't be able to coordinate between the local nodes. Conversely, the fully decentralized models address those problems but expose us to the other weaknesses we just saw. By using a blockchain to provide a global information layer between all of these nodes, we're able to get the benefits of the centralized aggregator model while also avoiding the pitfalls of the fully decentralized model.

So let's take an example. This is where my research is: electricity markets. This may be a field that's unfamiliar to a lot of you, but as we look ahead to larger penetrations of electric vehicles and photovoltaic solar systems, we'll end up with big swings in power flows on the local distribution network. We're going from a model where you can always plan for power consumption, because you know people just have small loads like toasters and electric light bulbs, to a world where the sun might go behind a cloud, everybody's photovoltaic generation cuts off, and at the same time people's electric vehicles are plugging in, leaving massive imbalances in generation. Utilities would like to address these imbalances in a way that minimizes the cost of electricity generation. That cost minimization means we're dealing with an optimization problem. The other challenge, though, is that the electricity network is constrained. We can't just have some generators 100 miles away make up for this.
We need to respect the local constraints in the system, particularly, in what I'm modeling, the distribution network within a neighborhood. One of the challenges is that the utilities have market power, and all of the companies that provide your smart devices may be competitors with each other in the marketplace. So there's a lack of trust, not only between you and the utility that wants to sell you power, but also between all of the participants in this marketplace. You might have a household with a Tesla electric car and a Ford electric car, both competing with each other to sell regulation services on an energy market. You might have different smart thermostats competing to provide energy services on a market while also providing household-level services. We'd like to reach consensus in the way that most equitably compensates each of these devices for the energy services it provides.

Taking the same paradigm I described earlier, we break this big scheduling problem with a bunch of trustless participants into a set of local problems, where each household, or each node on the network as defined by the actual physical constraints of the system, solves a private optimization problem. This is harder than something you would want to run on the blockchain, but a lot cheaper than solving it all as one centralized problem. Those local problems work together through a smart contract that coordinates the update step, eventually reaching consensus on the best schedule for all of these devices. The goal is to minimize the power cost for everybody, provide compensation for all of the devices participating in this network, and do so automatically. So we have a model of our network, the physical network that actually provides these constraints, and that gets fed into these smart contracts.
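As a sketch of what one household's private subproblem might look like (a simplified linear EV-charging problem with made-up numbers; the real formulation in this work also carries network constraints): given the coordinator's broadcast prices, fill the cheapest hours first.

```python
def schedule_ev(prices, energy_needed, max_rate):
    """Minimize charging cost over the horizon, subject to delivering
    `energy_needed` kWh at no more than `max_rate` kWh per hour.
    For this simple linear problem, greedily filling the cheapest
    hours first is optimal."""
    charge = [0.0] * len(prices)
    remaining = energy_needed
    for h in sorted(range(len(prices)), key=lambda h: prices[h]):
        if remaining <= 0:
            break
        charge[h] = min(max_rate, remaining)
        remaining -= charge[h]
    return charge

prices = [0.30, 0.10, 0.05, 0.20]  # $/kWh, broadcast by the coordinator
plan = schedule_ev(prices, energy_needed=5.0, max_rate=3.0)
# -> charges 3 kWh in the $0.05 hour and 2 kWh in the $0.10 hour
```

The charger limit, the energy requirement, and the resulting plan stay inside the household; only the resulting power schedule interacts with the coordination step.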
So we have the optimization problem and the smart meter, and the optimization updates get communicated with smart contracts on the blockchain: one for actually coordinating the update step. Once we reach consensus, that can be bounced over to a smart contract that does billing and any reconciliation between the expected schedule and the actual realized schedule. With that, we're able to identify when we've hit consensus on the global optimum, and we can guarantee the optimality of the system. Along the way, we're performing security checks to make sure that none of these nodes have been compromised in a way that would push us into an infeasible region or cause a blackout, and we ultimately end up with, basically, a self-scheduling, utility-free electricity system.

Just to show some pretty graphs, because we have results to share: we can see how we're actually shifting around the behind-the-meter batteries, the shapeable loads in the purple lines, and deferrable loads like smart washers or smart appliances, moving all of those around to make sure we're taking advantage of all of the photovoltaic energy being pumped into the grid. Across the whole network, we're monitoring the physical constraints of the system at every point in the distribution network. We're making sure that the voltage doesn't go out of acceptable bounds; this is the physical constraint that defines the system. And we're able to converge on a schedule that is only suboptimal in price by 0.4%, so really close to what you would get if a utility were solving all of this privately. We can also guarantee that the constraints are satisfied, that this isn't going to blow up, and that our electricity system is still going to stay stable. So we've solved a hard optimization problem with physical constraints, with an algorithm that respects local privacy and can guarantee the optimality and feasibility of the solution. Challenges and opportunities looking ahead.
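Those two guarantees, feasibility and near-optimality, can be checked mechanically. A sketch with illustrative numbers: the 0.95-1.05 per-unit voltage band and the costs below are examples, not the study's actual data.

```python
def voltages_feasible(voltages, v_min=0.95, v_max=1.05):
    """Physical feasibility check: per-unit voltage at every point in
    the distribution network must stay inside the acceptable band."""
    return all(v_min <= v <= v_max for v in voltages)

def optimality_gap(decentralized_cost, centralized_cost):
    """Relative suboptimality versus the centralized benchmark."""
    return (decentralized_cost - centralized_cost) / centralized_cost

feasible = voltages_feasible([0.98, 1.01, 1.04, 0.97])  # True
gap = optimality_gap(100.4, 100.0)                      # 0.004, i.e. 0.4%
```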
These mirror the three big areas that Vitalik was talking about yesterday. One is privacy. I'm very interested in the sessions tomorrow on zero-knowledge proofs, to identify how we can make sure that the information we're sending back and forth is not sensitive. In the model I just described, the actual power consumption for each household ends up getting stored on the blockchain. There are concerns about, say, a thief being able to identify when somebody is away from home, and thus could break into the house, if they know everybody's expected power consumption. The other part is the potential for data leakage through the iterations back and forth: even if the information you're sending is not fundamentally sensitive, the information that leaks out of the system through how much you're changing your estimate at each step might indicate something about your private optimization problem.

On security, we've been working on provable guarantees of feasibility with untrusted nodes. One big caveat here is that these proofs generally come with much weaker conditions than BFT would give. In these constrained systems, your ability to guarantee feasibility depends on the topology of how the faulty nodes are connected, and because of that dependence, we can't guarantee the same one-third fault tolerance that BFT systems generally have.

On scalability: we're currently seeing convergence in about 20 iterations of the system. Each of those iterations requires variable updates sent to the aggregator smart contract, which performs the update and then broadcasts it back. Each iteration is therefore, at minimum, two to three secured blocks. So in net, this could be a relatively long, slow process.
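A back-of-envelope latency estimate, assuming roughly 15-second blocks (an assumption for illustration; block times vary by chain) and the 2-3 secured blocks per iteration just mentioned:

```python
iterations = 20                  # typical iterations to convergence
blocks_low, blocks_high = 2, 3   # secured blocks per update round
block_time_s = 15                # assumed block time in seconds

latency_low = iterations * blocks_low * block_time_s    # 600 s = 10 min
latency_high = iterations * blocks_high * block_time_s  # 900 s = 15 min
```

Ten to fifteen minutes per full solve is workable for day-ahead scheduling but slow for real-time grid control, which is what motivates the off-chain scaling ideas below.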
That's an area where, again, looking at Plasma or other ways of speeding this up could improve the system. With that, I think we have some extra time. Sure.

One of the general techniques for dealing with blockchain scalability is only going to the blockchain when necessary. Are there equivalents here, where you could determine that some subset of the network is behaving badly and only go to the blockchain in that case?

Yeah, so one area I'm interested in looking at is state channels: identifying which portions of the updates need to be saved to the blockchain and which portions can just be done through state channels, so we don't have all of that constant back and forth on the main blockchain.

You mentioned there are some nodes that might be untrustworthy, but it seems that every node corresponds to a household, and every household has an interest in acting in a trustworthy manner, so why is that an issue here?

If you think of, say, the hack against the Ukrainian electricity grid, where an outside party was interested in just bringing down the whole grid for malicious intent, the attack doesn't have to serve any private household interest; there might be somebody who just wants to cause a blackout. And this is a really unique challenge of this type of work: the private problem of the household depends on constraint information fed to it by all the devices. So somebody could hack an electric car and pass bad information along. Even if a secure enclave does the actual computation, if that secure enclave is getting garbage, then you effectively can't expect that the messages being passed to the network are trustworthy. Great question, though.

Yeah, maybe one last question. Any more questions? What about when the devices are offline? So this is a great question.
I'm interested in looking into models of sharding and being able to split this up. One of the big benefits of the fully decentralized algorithms is that you don't care where there's a break in the system; the separated parts can still reach local optimality. When you have a blockchain, bringing those two basically forked chains back into consensus is something I haven't yet looked at but want to. Thanks, Eric.