Good morning, everybody, and thank you for bearing with us through some technical difficulties. My name is Milind Kulkarni. I am a professor in the Elmore Family School of Electrical and Computer Engineering. I'm also the interim head of the department. So it gives me great pleasure to welcome you all to the Engineering Frontiers Lecture. The Purdue Engineering Frontiers Lecture Series is modeled after our Distinguished Lecture Series, where we invite world-renowned faculty and professionals to Purdue Engineering to encourage thought-provoking conversations and have them share some of their ideas and research with us. And while the Distinguished Lecture Series is planned well in advance, we kind of book a year out, every now and then we have an opportunity to welcome truly distinguished people to Purdue. And so the Frontiers Lecture Series gives us an opportunity to do things a little bit quicker and get things going a little bit faster. So welcome today to the Purdue Engineering Frontiers Lecture Series. To introduce our speaker, I'd like to invite Professor Luna Lu to the stage. She's the Riley Professor of Civil Engineering and the Associate Dean of Faculty for the College of Engineering.

Today I'm truly honored to introduce Professor Vincent Poor. Professor Vincent Poor is the Michael Henry Strater University Professor at Princeton University, where his research interests include information theory, machine learning, network science, and their applications in wireless networks, energy systems, and related areas. Among his many publications in these areas is a recent book, Machine Learning and Wireless Communications, published by Cambridge University Press. Professor Poor is a member of both the US National Academy of Sciences and the US National Academy of Engineering and is a foreign member of the Royal Society and many other national and international academies. He received the IEEE Alexander Graham Bell Medal in 2017 and holds honorary doctorates and professorships from a number of universities in Asia, Europe, and North America. As you can see, we're truly honored and excited to have Dr. Vincent Poor with us as our distinguished lecturer. So please join me in welcoming Dr. Poor to the stage for his talk.

Thank you very much for that kind introduction. It's really great to be here. The last time I was at Purdue was in 2005, so it's been quite a while, and I'm really happy to be back and to have this opportunity to speak. So today, the title of my talk is a little self-explanatory. I'm going to talk about the combination of machine learning and wireless networks. Machine learning and wireless networks are, of course, two of the most rapidly advancing technologies of our time. And neither of these technologies really is new. I mean, wireless goes back to Marconi, and even the modern wireless era goes back to the 1980s. We have 5G now, and the research community is working on 6G. And machine learning also goes back to Marvin Minsky and even before, in the 40s and 50s. And there have been many resurgences of AI and machine learning over the years. In the 90s, there was a huge interest in neural networks and so forth. So these are, in a way, old technologies, but ones that continue to advance very rapidly. I'll say a little bit more about why in a minute.
But one of the things that has happened in recent years, and that we're expecting to happen more as we move towards sixth-generation wireless networks, is the confluence of machine learning and wireless networks. And that's part of what I'm going to talk about today. So when you think about machine learning, by the way, I hope the lighting is OK so people can see this. All right, is that OK, Chris? Fine. OK, so when you think about machine learning and wireless, or when one thinks about machine learning and wireless, there are two sides of that connection. One is the use of machine learning to optimize wireless networks. And it's a very natural application of machine learning, because wireless networking involves many types of things that machine learning is good at. For example, routing and flow control are optimization problems, and channel coding, decoding, and channel estimation are statistical inference problems. So these are things that machine learning is really good at. And wireless networks, of course, generate examples. So if you want to learn from examples, the data rates in wireless networks are now gigabit, so you're getting exemplars very, very fast. And even on the routing side, you're getting packets very, very rapidly. So there are a lot of opportunities to learn, and a lot of problems for which machine learning is well suited. In the past, these kinds of problems were solved using model-based techniques, which have worked very well up till the present. But now networks are becoming a lot more complex. The number of users, the number of terminals, is exploding, and the complexity of the problems is getting much more difficult to handle using model-based techniques. So given the advances in machine learning, it's very natural to apply it in wireless networks.

The other side, which I'm going to talk about today, is the use of wireless networks as platforms for machine learning. And I'll get into the motivation for that later. But basically, terminals, even these, are pretty sophisticated now computationally. And they're also sensors; they gather a lot of data. So there's a lot of data out at the wireless edge, and there's a lot of computation power. And one of the big trends in networking is mobile edge computing, pushing computing out to the edge. And machine learning, of course, is one of the main applications of computing. So it's very natural to think about using the wireless network as a computer for machine learning. And that's part of what I'm going to talk about today. That's, of course, a big area as well. And in the vein of shameless self-promotion, I'll mention this book here, which was also just mentioned, on machine learning and wireless communications, which came out last year. It looks at both sides of this coin, but I'm only going to talk about the second one here today. So, first of all, I'll motivate a little bit further why we want to think about using wireless networks as platforms for machine learning. And I'm going to talk particularly about federated learning, which is a type of learning that was not designed specifically for wireless systems, but which is tailor-made for wireless networks. And I'll talk about two things there: one is scheduling, and the other is privacy. And then at the end, I know there are a lot of students in the audience, so at the end I'm going to talk about some research issues.
And most of what I'm going to talk about is not going to be the most cutting-edge thing. It's just to give you an idea of what some of the problems are. And then at the end, I'll come back and talk about some things that people are working on right now. So that's the program for today, and hopefully I'll be able to keep on time. So let's start with some motivation. Here's a very broadly brushed and perhaps oversimplified view of the state of the art in machine learning today. As I mentioned, machine learning is not new, but what has made machine learning so prevalent now are two things. One is that there's a tremendous amount of data available, collected by devices like this, by sensors, collected on the internet, and so forth, which was not available during the last upsurge in the study of machine learning in the 90s. There were no smartphones, the web browser had just been invented, and so forth. So it wasn't the same culture we have today in terms of technology. And also, the computing power since the 1990s has grown dramatically. The amount of power you can get even in a small device like this, and certainly centralized in the cloud, in centralized web services, and so forth, is tremendous. So a sort of standard view of machine learning is that you have a lot of data, and machine learning algorithms have access to all that data. And this is happening in the cloud or in some data center or something like that. So it's a centralized problem being solved in one place with all the data together. And of course, today you don't even really have to know much about machine learning to use machine learning. There are a lot of tools out there, TensorFlow and PyTorch and so forth. So you really just have to know how to use a computer and have an internet connection, and you can do machine learning. And also, of course, when you train things like deep neural networks on lots of data, the computational complexity is huge, so there's also a lot of specialized hardware for that. NVIDIA, if you watch its share price, you can see why; GPUs are very prevalent. Google also makes TPUs and so forth. So this is, again, a very oversimplified view of the state of the art in machine learning.

There are some problems, though, where this centralized model is not really adequate for certain applications. And some examples are shown here. One is automation: things like autonomous driving, for example, or Industry 4.0, where you have automation of a factory; also mobile apps, first responder networks, and things like that. So these are applications where, first of all, the data that's being used is already out at the edge of the network. It's collected by sensors, smartphones, and so forth, and that data may be plentiful at the edge of the network. And in order to do centralized machine learning, it has to be backhauled through the network. Often, that's a wireless network, and capacity is limited. So backhauling data to the cloud is sometimes impractical. But another, perhaps more important, thing is latency. For these kinds of applications, autonomous driving, factory automation, and so forth, latency is very important. And so the less time you have to spend backhauling data, the quicker you can respond with your models and so forth. Also, privacy is an issue, of course, with mobile apps.
A lot of the data that's being collected is personal data, location, and so forth. And in health care, there are a lot of privacy issues. So the less you have to share data back into the network, the less likely it is that that data will be compromised. And then there's scalability. If you have hundreds or thousands of terminals, say, in an IoT, Internet of Things, network, just to be able to scale machine learning up, it's helpful to be able to do all that out at the edge. And then finally, locality. In a lot of applications, you don't really need to collect data over a widespread area. For example, in autonomous driving, you really just need more local data, so the models can be local. You don't need to build big models back in the cloud. So all of these considerations motivate moving machine learning out to the edge of wireless networks.

So you can think of three network models for machine learning over wireless networks. One is the one I just described, standard machine learning, where data is collected out at the edge of the network by end users, and that data is uploaded into the cloud. So the data is collected there, and machine learning takes place there; models are built. And then either the models are shared back down to the end users, or the models are applied in the cloud, and the decisions, or recommendations, or whatever you're doing, are shared back down. So this is what I called standard machine learning before. Another model is federated learning, which I'm going to talk about today, where the data is collected by end users, and the end users actually want to build, on their own devices, a shared model of something. So they build their own local models based on their smaller data sets, and then share those up with an aggregator, which could be an access point, or a base station, or something else; it could be the cloud, actually. And then the aggregator puts all these little local models together in some way and sends a global model back to the edge, to the end users. And those end users can then update their models, and it can iterate until you get some kind of convergence. So that's federated learning. And then another type of learning at the edge is decentralized machine learning. In decentralized machine learning, again, you have end users who are trying to build a common model, but they don't have an aggregator. So any communication is peer to peer, and any aggregation that takes place has to take place in that setting. There's a lot of interesting work there. I'm not going to talk much about that, although I'll say something about it at the end.

OK, so this is the setup. We're going to talk about the middle model today. Federated learning, by the way, was not invented for wireless networks. Federated learning was invented at Google as a distributed machine learning algorithm. I can't put myself in the minds of the inventors, but they were thinking mainly that this is a way to keep data out on the processors and to share the workload and so forth. But it's tailor-made for wireless networks, because the distributed nature of the computation is naturally there. So I'm going to talk about federated learning over wireless channels, and I'll start out by talking about scheduling.
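To make the aggregation step concrete, here is a minimal sketch of one federated-averaging (FedAvg-style) round. All names, shapes, the toy squared-error loss, and the learning rate are illustrative assumptions, not details from the talk.

```python
# A minimal sketch of one federated-averaging round (illustrative assumptions throughout).
import numpy as np

def local_update(global_weights, local_data, lr=0.01):
    """Each end-user device takes a few gradient steps on its own data."""
    w = global_weights.copy()
    for x, y in local_data:                 # (feature vector, label) pairs
        grad = (w @ x - y) * x              # gradient of a toy squared-error loss
        w -= lr * grad
    return w

def aggregate(local_weights, num_examples):
    """The aggregator forms a weighted average of the uploaded local models."""
    total = sum(num_examples)
    return sum(n / total * w for w, n in zip(local_weights, num_examples))

# One round: devices train locally, upload, and receive the new global model.
rng = np.random.default_rng(0)
global_w = np.zeros(3)
devices = [[(rng.normal(size=3), rng.normal()) for _ in range(20)] for _ in range(5)]
uploads = [local_update(global_w, d) for d in devices]
global_w = aggregate(uploads, [len(d) for d in devices])
```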
So the basic idea, as I said before, is to enable end-user devices to do machine learning without centralizing their data. The data sets remain on the end-user devices where they're gathered, so the raw data never goes up into the network beyond the end-user device where it's collected. Training is done on the end-user devices, so you have multiple end-user devices trying to train a shared model. And then there's federated computation: the local models are uploaded to an aggregator, and that aggregator collects the trained weights or gradients, depending on how you want to do it. The model is updated and aggregated by the aggregator, and then that's sent back down to the end users, who try again, update the model, and iterate back and forth that way until convergence. So this is federated learning.

Now, if you do federated learning over wireless networks, then you have to worry about some other things. In wireless networks, the communication between the end-user devices and the aggregator has to go over wireless channels. So immediately you run into some physical constraints. First of all, the wireless medium is shared and resource constrained. Because it's resource constrained, only a limited number of devices can be selected for uploading models in any given round. And secondly, it's a wireless channel. Those of us in wireless spend our whole careers trying to figure out how to overcome the impairments of the wireless channel. Wireless channels are not reliable because of interference and other things, fading and so forth. So we have to worry about two things when we're doing federated learning over wireless channels. One is how we schedule the devices: how can we choose which devices upload their models at each time, given finite resources? And second, how do the interference and noise affect training? Okay, so those are the two issues I'm gonna address now, looking at how scheduling affects the performance of federated learning in this environment. I'm gonna illustrate that using three scheduling mechanisms.

One of them is random scheduling. The setup here is that we have K end-user devices and N channel resources that can be used. You might think about those as frequency channels or time slots or something like that. But basically, N is less than K. So K might be hundreds and N might be dozens. The access point, the aggregator, has to decide at each update round which N of the K devices to choose. And random scheduling is just what it says: at each update round, you choose at random N out of the K end-user devices to upload their models, okay. Round robin is another standard scheduling mechanism where you take all K users and divide them into groups, so there are K over N groups, and then on each update round you just sequentially go through and select each group, one at a time. And then there's proportional fair, which actually uses some of the physical characteristics of the wireless channel. At each update round, the access point or the aggregator selects the N out of K end-user devices with the strongest signal-to-noise ratios relative to their signal-to-noise ratio history, okay. So what you're looking to do here is to choose the ones that have the best chance of their actual communications getting through at any given time.
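Here is a minimal sketch of the three schedulers just described; K = 100 devices and N = 10 channel resources are chosen purely for illustration, and the SNR inputs to the proportional-fair rule are assumed to be supplied to the aggregator.

```python
# A minimal sketch of random, round-robin, and proportional-fair scheduling (illustrative values).
import numpy as np

rng = np.random.default_rng(1)
K, N = 100, 10                                      # end-user devices, channel resources (N < K)

def random_scheduling():
    """Pick N of the K devices uniformly at random each round."""
    return rng.choice(K, size=N, replace=False)

def round_robin(round_idx):
    """Cycle through K/N fixed groups of devices, one group per round."""
    groups = np.array_split(np.arange(K), K // N)
    return groups[round_idx % len(groups)]

def proportional_fair(current_snr, snr_history):
    """Pick the N devices whose current SNR is largest relative to their SNR history."""
    metric = current_snr / snr_history
    return np.argsort(metric)[-N:]

# Example round using the proportional-fair rule with made-up SNR values.
chosen = proportional_fair(rng.exponential(1.0, K), rng.exponential(1.0, K) + 1.0)
```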
Okay, so what I'm gonna do is compare these three in terms of how they affect the convergence of federated learning over wireless channels. How can we do that? Well, first of all, in order for an end-user device's local model to be part of the aggregation, two things have to happen on a given round. First, it has to be selected: the scheduler has to select that end-user device. And secondly, the data packet containing the model or the gradient that's being sent up to the aggregator has to be transmitted correctly. One way to model that is to say that the so-called signal-to-interference-plus-noise ratio at the aggregator for the transmission from an end user has to exceed a decoding threshold. You can ignore the equation here, but basically there's a measure of the quality of the received signal in an interfering environment, and if that quality is greater than some threshold theta, then the data packet can be decoded, and that model can be added to the aggregation and so forth. Otherwise the packet is lost, and that model is not part of the aggregation, okay? So theta here is a parameter. If theta is really small, that means most packets get through, which means it's a really good link. If theta is large, a lot of packets don't make it through; it's a bad link, okay?

And so how can we compare schedulers? Well, we can look at the number of communication rounds required by the federated learning algorithm to reach a certain level of accuracy. And here I'm gonna talk about a so-called epsilon-accurate solution. So now let me be more specific about the problem we're trying to solve. A typical machine learning problem is one where you're trying to maximize a concave function or minimize a convex function, one way or the other. And if that function is strongly convex, then the so-called dual problem and the primal problem have the same solution. So an epsilon-accurate solution is one where the duality gap, the gap between the dual solution and the primal solution that you compute, is less than or equal to epsilon, okay? I think the main thing to keep in mind is that epsilon is just a measure of the quality of the final solution. So, two parameters, theta and epsilon, so far. All right, so for those three schedulers, it turns out we can get lower bounds on the number of communication rounds needed for achieving epsilon accuracy. And they're given here; don't worry about the details, I just put them here to show you that they exist. But the main parameters here are theta, which we introduced before as the SINR threshold; epsilon, the degree of accuracy that we're trying to achieve; and then alpha, which is the path-loss exponent, because it's a wireless channel, so there are propagation losses that scale according to a path-loss exponent. We abstract the performance of the end users and the machine learning into a precision level beta. And then there's the total number of examples that we're trying to learn from, distributed over all the end users. And these are lower bounds; we don't have upper bounds, only lower bounds, and I'll address that in a minute. But these parameters tell us, at least in terms of lower bounds, what to expect in terms of communication rounds. So don't worry about the formulas, but we can look at a plot of these. And here's a plot of those lower bounds for two different scenarios. The one on the left is the high-SINR-threshold case.
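Here is a minimal sketch of that packet-success model: a scheduled device's update only joins the aggregation if its SINR at the aggregator exceeds the decoding threshold theta. The Rayleigh fading, unit transmit powers, noise level, and path-loss exponent alpha used below are illustrative assumptions, not values from the talk.

```python
# A minimal sketch of the SINR-threshold packet-success model (illustrative assumptions).
import numpy as np

rng = np.random.default_rng(2)

def successful_uploads(scheduled, distances, theta=1.0, alpha=3.0, noise=1e-2):
    """Return the scheduled devices whose update packets are decoded this round."""
    fading = rng.exponential(1.0, size=len(scheduled))                  # Rayleigh power gains
    power = [fading[i] * distances[k] ** (-alpha) for i, k in enumerate(scheduled)]
    received = []
    for i, k in enumerate(scheduled):
        sinr = power[i] / (noise + sum(power) - power[i])               # signal over noise plus interference
        if sinr >= theta:               # packet decoded; this local model joins the aggregation
            received.append(k)
    return received

# Small theta: a good link, most packets get through; large theta: many updates are lost.
```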
So this is a case where many packets are lost. High SINR here means a high SINR threshold, so many packets are lost. On the right, we have a low SINR threshold, so most packets get through. And what you can see here, remember there's random scheduling, round robin, and proportional fair. On the left, where we have a less desirable link, proportional fair works much better. Small is good here; this is the number of communication rounds to converge. So proportional fair works much better than the other two, at least in terms of this lower bound. And then on the right, where we have a really good channel, a really good link, round robin works better than the other two. Okay, and there's some intuition here, because if we have a poorer channel, a poorer link, remember that proportional fair takes into account the received signal-to-noise ratio. So there we care about the signal-to-noise ratio a lot more than if we had a really good link. So proportional fair works better than the other two, which don't take into account anything about the wireless channel. On the right, though, where most packets are gonna get through, what really matters more is that we touch all the data regularly, right? And round robin ensures that you touch all the data equally. So we can see intuitively that if the wireless channel is really good, we don't care so much about it, and if it's really bad, we care a lot about it. That's sort of the takeaway from these graphs.

So these are lower bounds, and of course they're just lower bounds, and this is a very pristine model. What about a real machine learning problem? Well, here we look at the problem of building a support vector machine on the so-called MNIST dataset. That's just a dataset of handwritten digits from zero to nine; it's a very standard test for machine learning algorithms. And what this shows, versus the number of communication rounds, is two things. One is the loss, which is like mean squared error or something like that, and the other is accuracy, the percent correct, for random scheduling in blue and round robin in red. And this is a case with a really good link, so a low SINR threshold, corresponding to the right-hand side of the previous chart. And what you see here is that although both of these converge to the same kind of quality of performance, round robin converges much more rapidly. And if you remember, on the previous page, under these circumstances round robin was much better. So this bears out the same thing in a real machine learning problem. The basic conclusion is that you can't just choose an arbitrary scheduler; the scheduling protocol matters.

So we might ask the question of whether we could optimize the scheduler. And there are a number of ways to do that. Actually, there's a whole cottage industry of scheduling in federated learning. But one thing we might do is the following. Age of information is a metric that is used a lot in modern communication system design. Classically, communication systems were designed to minimize the distortion between what's received and what was transmitted; this is classical Shannon-esque information theory. But in modern applications, particularly sensor networks with energy-limited terminals, you're also interested in the freshness of the data. So age of information is a metric that's applied together with distortion in order to take that into account. And we can use that here.
We can think about the age of our updates. For each end-user device, we can assign an age to its information. If it's chosen by the scheduler on a given round, its age is reset to zero, and for every round that goes by when it's not chosen, its age increments by one. So what we'd like to do is minimize the average age of information. That's one way to optimize scheduling, and it turns out there's a pretty low-complexity algorithm to do that, so we can do it on every round. And here's just another example. This is the same learning problem I mentioned before, a support vector machine on the MNIST data set. And what I'm comparing here, again versus communication rounds, is the age-based scheduler, the one that minimizes the average age, against another optimized scheduler that tries to maximize the number of updates on any given round. And you can see that by minimizing the average age of information, we can do much better. That's the bottom line. So again, as I said, there's kind of a cottage industry of this. There are many, many ways you can schedule. You can schedule only those end users, for example, whose gradients are significant: if you're doing gradient descent to do the machine learning and the gradient is small, there's no need to update the aggregator. So there are a lot of things like that that can be done, and I'm not gonna go into that, but this just gives you an idea that, yes, scheduling is important.

So now I'm gonna shift gears and talk a little bit about privacy. There's another problem that comes up here; I mentioned it earlier. The idea is that when federated learning was first proposed, and for a couple of years after that, it was always said to be privacy preserving, because the data never leaves the end users, right? So if the data never leaves the end users, it's private, right? But then it was shown subsequently, and in retrospect it makes a lot of sense, that you can actually infer things about the data from which a model was built from the model itself. So if you have a machine learning model, you can work back through it and figure out something about the data that was used to train it. And if you think about it from an information-theoretic point of view, well, of course, the model is a function of the data, so it's not independent of the data, okay? And there are some pretty spectacular examples where the data consists of images, and you can look at the model and recreate an image that looks pretty much like the original image, and so forth. So this is a real thing, and it's a concern, because many applications of machine learning, and federated learning in particular, are ones where the end users have private information in their data. So there's a lot of work on this problem as well, and one way, sort of the principal way right now, I think, of looking at this is to use so-called differential privacy to protect the data. And I'll just say a little bit about what that is in case you're not familiar with it.
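Here is a minimal sketch of that age-based rule: ages reset to zero when a device is scheduled and increment otherwise, and each round the scheduler simply picks the N oldest devices. The sizes and the greedy pick-the-oldest heuristic are illustrative assumptions.

```python
# A minimal sketch of age-of-information-based scheduling (illustrative assumptions).
import numpy as np

K, N = 100, 10                          # devices and channel resources, purely illustrative
age = np.zeros(K, dtype=int)            # current age of each device's information

def age_based_schedule():
    """Pick the N devices with the largest age, then update every device's age."""
    chosen = np.argsort(age)[-N:]       # scheduling the oldest keeps the average age small
    age[:] = age + 1                    # every round not chosen increments the age by one...
    age[chosen] = 0                     # ...and a chosen device's age resets to zero
    return chosen
```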
So differential privacy is a definition of privacy that was developed for databases, and the basic idea, I'll try to describe it as best I can, is that if you have two databases that differ only in one piece of private data, but otherwise are identical, then the way you release answers from the database is differentially private if you can't distinguish between those two databases with high probability. So that's roughly what differential privacy is. There are a lot of definitions of privacy, but the nice thing about differential privacy is that it's easy to achieve in practice: you just add noise, okay, and that gives you a certain degree of differential privacy. So that's an approach that can be used in federated learning. You can add noise to what you're uploading to the aggregator, and of course, that automatically creates a trade-off between privacy and learning, because the more noise you add, the worse the learning is gonna be. And we can see that; here's an example, again on the MNIST data set. Now this is a convolutional neural network, but otherwise it's pretty much the same problem. And what's shown here is basically the trade-off between privacy and learning accuracy, okay, for this example. Down below is aggregation time, so again, that's the number of rounds. Differential privacy has a parameter epsilon in it, a different epsilon from the accuracy parameter earlier, and the smaller epsilon is, the more private. Here you can see that no privacy is the top curve, so that's the kind of thing we've been talking about up to this point. And then as you apply more and more privacy, you can see that the algorithm converges to lower and lower testing accuracy, okay. So there's clearly a trade-off, which is to be expected, right?

So there are some things that we can do about this. One is just to note that as you get closer and closer to the optimum, if you're sharing gradients, the gradients get smaller and smaller. So if you add the same amount of privacy noise all along, it's actually having a worse and worse effect, and it also causes you to converge more quickly to a poorer solution; we saw that on the previous page. So one thing you can do, as you iterate through the federated learning algorithm, is to reduce the amount of privacy noise you add over time. And there's an analysis of that. I don't have any charts to show, but there's a paper on that that came out fairly recently, which shows that you can get a much better trade-off between privacy and accuracy by doing that. You can also look at other things. An important thing in machine learning is generalization error: you train on data, you get a model, and you wanna know, did you overfit that model? How well is it gonna work on new data? That's what's measured by this thing called generalization error. And we can use some information-theoretic techniques to get upper bounds on that. This is a bound versus the parameter epsilon that appears in differential privacy. And you can see here that as epsilon gets smaller, which is more private, you get a tighter bound on generalization error. So of course you also get worse accuracy, but at least you're not overfitting. So adding privacy actually gives you another bonus, which is that you get better generalization error.
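Here is a minimal sketch of that idea: each device clips and perturbs its gradient with Gaussian noise before uploading, and the noise scale decays over rounds so that the shrinking late-stage gradients aren't swamped. The clipping bound, noise scale, and decay rate are illustrative assumptions, not values from the talk.

```python
# A minimal sketch of differentially private uploads with a decaying noise scale (illustrative).
import numpy as np

rng = np.random.default_rng(3)

def private_upload(gradient, round_idx, clip=1.0, sigma0=0.5, decay=0.95):
    """Clip a local gradient, then add Gaussian noise whose scale shrinks over rounds."""
    norm = np.linalg.norm(gradient)
    if norm > clip:                              # clip to bound any one device's influence
        gradient = gradient * (clip / norm)
    sigma = sigma0 * (decay ** round_idx)        # privacy noise is reduced as training proceeds
    return gradient + rng.normal(0.0, sigma, size=gradient.shape)
```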
And by the way, the same thing happens with quantization. If you quantize what you transmit up, it also adds privacy and gives you better generalization error, but at the expense of accuracy. So it's a trade-off. So I think I've got about 10 minutes, so let me spend a bit of time talking about some other issues. What I've talked about so far has been pretty pristine, and in real problems there are a lot of other things to think about. First of all, these devices are pretty sophisticated, and there are other edge devices that are less sophisticated, like sensors in an IoT network, but in any case, resources are limited. Your smartphone has more to do than federated learning; it has a lot of other tasks. So resources are limited: things like energy, storage, computational power. That means there are a lot of trade-offs in these problems that I haven't talked about. For example, if you're gonna put a neural network on your phone, how many layers? How should you quantize the coefficients? How many neurons per layer? How much energy can you spend on that? How accurate do you want it to be, and so forth?

Another important issue is the heterogeneity of the devices. For example, when you're doing federated learning, I sort of talked about it like it was a synchronous thing, right? But there are stragglers. Some devices' models converge more quickly; they have more resources to apply. So you start having problems of delay, having to wait for the last device to finish converging, right? So that's a problem you have to worry about. You also have to worry about the fact that the datasets are not all the same. In the examples I gave, I just divided the data equally among the learners, but in fact that's not the case in practice; every device collects its own data. Also, I assumed the data was independent from learner to learner, which is not true in general, because particularly in a sensor network, the data is correlated: people are collecting data on the same phenomenon. So there's a lot of work on this kind of problem. Okay, so I just wanna point that out; these are not things that people have overlooked. Another thing that people have looked at is so-called split learning, where you divide the model. If you have a big model, you put part of it in one place and part of it in another place. So here you would put maybe a bigger model at the aggregator and let the end-user devices learn a smaller model, because of some of these other considerations, and then the aggregator would combine that. There's something called SplitFed, which does split learning with federated learning. Another thing, by the way, is building foundation models: you can build a foundation model from a big data set and then fine-tune it for specific applications. This comes up in language models. You can train a foundation model on all kinds of language data, and then if you wanna train it for French, you just use some French examples and fine-tune it for French. So this is a thing in ChatGPT-type applications. Well, those kinds of foundation models are too big for end-user devices, but you could use the cloud to train a foundation model and then use end-user devices to do the fine-tuning. So those kinds of things are also things people have been looking at, trying to bring large language models to the edge.
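Going back to the quantization point at the top of this passage, here is a minimal sketch of uniformly quantizing an update before uploading it: fewer bits per coefficient means less to transmit (and some incidental privacy), at the cost of accuracy. The uniform quantizer and the 4-bit width are illustrative assumptions.

```python
# A minimal sketch of uniformly quantizing a model update before upload (illustrative).
import numpy as np

def quantize(update, bits=4):
    """Map each coefficient to one of 2**bits uniformly spaced levels."""
    levels = 2 ** bits - 1
    lo, hi = update.min(), update.max()
    step = (hi - lo) / levels if hi > lo else 1.0
    return lo + np.round((update - lo) / step) * step
```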
Another problem is communication efficiency. I talked about the effects of the wireless channel, but one thing you can do is think about using coding. As you iterate through federated learning, the models become more and more correlated, so the problem of transmitting the updates from the end users to the aggregator is a problem of distributed source coding. And because they're correlated, you can do a lot to improve the efficiency of that. So you could aggregate more, you could schedule more end users at a time, because they can use more efficient coding. Also, I mentioned this information-theoretic bound; there's a lot of work on trying to use information theory to understand federated learning. And I mentioned that there are a lot of other scheduling methods. I mentioned the one where we try to minimize the average age of information, but you can also use reinforcement learning to do scheduling, seeing where the model looks best and so forth, and doing reinforcement learning. Another problem is the amount of data. Typically you're not gonna have big data on your end-user device, so the data is sparse. That means it might be useful to try to do some kind of model-based machine learning on the end-user device, so that you don't need as big a model and you don't need as many resources. So things like so-called deep unfolding can be used to put, say, iterative algorithms on an end-user device and use machine learning to train them.

And then finally, I'll just mention security and privacy. I mentioned privacy, but there are also security issues with federated learning, particularly in Internet of Things applications, where you have hundreds of low-complexity, low-security devices. It's very easy for an adversary to compromise some of those devices and then try to do something malicious to upset the learning process. So you need to have techniques for finding anomalies; for example, the aggregator needs to know that everything it's using to aggregate is coming from legitimate parties. There's also the issue of the so-called curious aggregator problem, where you trust the aggregator to do the aggregation, but you don't wanna let it learn your private information. So there are ways of getting around that by doing the aggregation in a distributed manner. You can do sort of decentralized federated learning, where the aggregation is just done peer to peer; you could use blockchain and things like that. So there are other ways of doing that. And then finally I'll mention over-the-air computation, which is another thing. I talked mainly about the idea that you have models at the edge, and you digitize them and send them up in data packets. But another approach to aggregation is to use so-called over-the-air computation, where all the end users use analog communication and transmit in such a way that when the signals get to the aggregator, the superposition property of wireless transmission adds them all together in the air. So the aggregator never sees the individual users' transmissions. Of course, that requires a whole other level of timing synchronization and so forth, but there's a lot of work on this too; it's called AirComp. Okay, well, I think I'm out of time.
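Here is a minimal sketch of the over-the-air idea: every device transmits a pre-scaled analog version of its update simultaneously, and the aggregator receives only the noisy superposition, never the individual transmissions. Perfect synchronization and ideal channel inversion are assumed purely for illustration.

```python
# A minimal sketch of over-the-air computation (AirComp) aggregation (illustrative assumptions).
import numpy as np

rng = np.random.default_rng(4)

def aircomp_aggregate(updates, noise_std=0.01):
    """All devices transmit pre-scaled analog updates at once; only their sum reaches the aggregator."""
    transmitted = [u / len(updates) for u in updates]       # pre-scale so the superposition is the average
    return sum(transmitted) + rng.normal(0.0, noise_std, size=updates[0].shape)
```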
Oh, I've got three minutes, but I'm almost out of time, and I think that's all I wanted to say. Oh, I wanted to mention some papers. There's a lot of work; it's a very active area of research, it's hard to keep up, there are a lot of papers, every day there's something new, but I'll just mention some here. I mentioned this idea of communication efficiency, and there's a special issue of JSAC, the Journal on Selected Areas in Communications, from earlier this year, and this is an overview paper at the beginning of it that talks a lot about some of those methods. There is some work on device scheduling for AirComp as well, and work here also on using compression and quantization and so forth. So, as I said, there are a lot of ways of trying to make things more efficient. There's also a lot of work on privacy and security, and I mention a couple of papers here: here's one on blockchain, and here's one that looks at trying to mitigate latency when you have differential privacy, which affects the convergence and so forth. And then there are some other perspectives: signal processing, information theory, and so forth. And with that, I have one minute, and I'll just say thank you, and I'm happy to take questions if there are any. Thank you. So I think Chris is in charge of this.

You mentioned before, with regard to privacy, adding noise to the submitted weights or gradients. Have you thought about optimizing over the type of added noise, so that it affects the generation of the total model less but makes it more difficult to reverse engineer the data that was used for the generation of that submodel?

I haven't thought about that, but it's a great idea. Normally what I've seen is that people use either Laplace noise or Gaussian noise, but that's a great idea. I don't know if you know of work in that area, but I think it's a very good problem, actually; I hadn't thought about that. But yeah, I think clearly there will be some kinds of noise that are better. I mean, just quantization is a way of adding noise, so that might be better than adding entropic noise. So it's a good question, a good problem. I've been trying to get James to stay for his PhD, so you can probably make that into your topic.

And then for the age of information slide, you might have mentioned this before, but it seemed like the assumed behavior is that if you're a sensor and you're not chosen to submit your gradients or weights, then you just hold the previous measurement, correct?

That's right, yeah.

Okay, so, sorry if you mentioned this, but is there any work on, if you're not chosen, how you aggregate your current results with the next measurement?

I don't know of any, but I think that's another good idea. I mean, here we assume that the data is collected at the beginning, so you're not getting fresh data as you go through, right? You're just getting a fresh model. So that would maybe be an example of where you continue to aggregate: you're saying you can hear the global model coming down, and you can continue to update. But in this model we didn't really consider that possibility. Thank you. All right, who's next? Yeah, here's somebody over here; this gentleman got here first.

You started the talk by showing federated learning and decentralized learning as two paradigms.
Moving forward, do you think they are going to coexist, or do you think things are going to go more towards the decentralized route? And whichever one you think, what are your reasons for believing that research is going to move in that direction?

Well, I think they will exist together, because I think they have different use cases. I mean, if you have a cellular network, you already have a natural aggregator, so there's no real advantage to not using it, unless you have some privacy issues. But in IoT you might have very light infrastructure. You may not have the same ability to carefully control the scheduling and so forth, and then this decentralized model is better. So I think they're different use cases, basically, is what I would say. So I believe they're both useful and they can coexist. Yeah.

In scheduling updates, have any approaches been used other than round robin, picking at random, or maximum SNR? For example, minimizing uncertainty in the models, maximizing mutual information between the measurements and the models, or increasing Fisher information, or some other way of improving knowledge of the uncertainty in the model, and picking the sensors that way?

Yeah, that's a good question. I did mention this one thing where people look at the magnitude of the gradient of the local learner relative to the global model. So in a way, if your gradient is small, you're not adding much to the mix. That's one way of doing it. It's not using Fisher information or anything, but that would be another thing you could do. Also, if you can do some kind of feedback or handshaking before scheduling, you can share all kinds of information, and that could be something you could share. And people also look at the channel quality beyond the signal-to-noise ratio and so forth. So like I said, it is a cottage industry; there's a lot that's been done, and I wouldn't say that no one has looked at that, but I'm not aware of any work right now. So it's a good idea. All right, who's next?

Thank you for your presentation, and I would like to ask a question. As you mentioned, there is a scheduling problem in federated learning, but as we know, we cannot always control whether edge devices can participate in the training. So is it possible for federated learning to allow the clients to participate in the training at will, so that they randomly join the training, rather than being scheduled or controlled by the server, which requires them to send their gradients or models to the server?

Yeah, well, you could think about that as being an example of random scheduling, right, where the randomness doesn't come from the aggregator choosing at random who participates, but from the users themselves saying, okay, I'm gonna participate or not. Of course, then you're not choosing exactly, so it's random scheduling with a random, unknown number of participants, but it is kind of an example of random scheduling. But you're right, it's not like we're all in lockstep with the base station all the time, doing what it tells us to do. So exactly, you can do that. Yeah, it'd be interesting to see what happens if you allow random scheduling with a random number of participants.
Of course, then if you go beyond the amount of channel resources, you're gonna have some other problems, but yeah, that's an interesting scheduler. Another question? Maybe one or two more? Okay. No? Actually, if you don't mind, I could ask you one.

So you got me thinking when you were talking about the training phase. When we think about personalization amongst devices, the edge devices themselves would do the fine-tuning based on the global model from the server, which makes perfect sense. I think it's interesting if you consider that during the inference phase, in a sense, it could sometimes be the opposite, where the devices have maybe the most coarse-grained model that they're doing inference on, and the server would be the one doing the fine-tuning of maybe the least significant bits. So I wonder, I don't know, maybe it's not a question, more of a comment.

Yeah, it's a good comment. I mean, I was thinking more of the complexity issues rather than data issues, but yeah, I think you're right; that could work as well. By the way, since you mentioned inference, there's also an issue of privacy from the inferences that are made, right? So if someone can see what decisions you're making after you've trained the model at the end user, then someone might infer from that something about your private self. That's a totally different kind of privacy, but it comes up in all machine learning, I think, that sort of thing. So yeah, okay.

All right, well, thank you very much, everybody. Thank you so much. Thanks for your questions. Thanks, Chris.