Welcome to the webinar series of the ITU Journal on Future and Evolving Technologies. My name is Alessia Magliarditi from ITU, the International Telecommunication Union. ITU is the United Nations specialized agency for information and communication technologies. ITU allocates frequencies to the services that make use of the radio communication spectrum, develops standards, and assists developing countries in setting up their information and communication infrastructure. ITU and academia share a commitment to the public interest, and this commitment is embodied by the ITU Journal, which offers complete coverage of communication and networking paradigms free of charge for both readers and authors. Our journal accepts submissions at any time on any topic within its scope, and we believe that this webinar series, launched in March this year, will inspire more contributions from researchers around the world. It is my pleasure to open today's webinar with Professor Ness Shroff from the Ohio State University, and we count on your support to make this webinar an interesting experience. Please submit your questions via the Q&A channel, and we will address them to the speaker during the Q&A session. And after the talk and the Q&A, please stay online. We have something special for you: the Wisdom Corner, live life lessons. Professor Shroff agreed to a personal chat, and he will share with us today some lessons learned over the years that might perhaps be useful for some of you. It is my pleasure now to introduce Professor Ian Akyildiz, the moderator of the Q&A session, editor-in-chief of the ITU Journal on Future and Evolving Technologies, and also the president and founder of Truva, from the United States. Professor Akyildiz is Ken Byers Chair Professor Emeritus in Telecommunications at the Georgia Institute of Technology.
He is editor-in-chief emeritus of high-impact journals, is highly cited and at the top of the most prestigious international rankings, and is a visiting distinguished professor at several universities around the world. His current research interests include 6G and 7G wireless networks, hologram communications, the Internet of Bio-Nano Things, molecular communications, intelligent surfaces, and many other subjects. So Professor Akyildiz, the floor is yours to give your remarks and to introduce our speaker. Thank you. Thanks a lot, Alessia. Good morning, good afternoon, and good evening worldwide from Abu Dhabi. I welcome you all to the second season of our ITU Journal on Future and Evolving Technologies webinar series. I have the immense pleasure to introduce to you one of the leading researchers of our era and a true friend of mine, Dr. Ness Shroff. Ness has had a superb career spanning over 25 years. He received his PhD degree from Columbia University in 1994, when Columbia was the leading institution in telecommunications through the Center for Telecommunications Research, led by Tom Stern, Mischa Schwartz, and Tony Acampora. I still remember those years. Ness then joined Purdue University as an assistant professor. He became a full professor in the School of Electrical and Computer Engineering and director of the Center for Wireless Systems and Applications in 2004, and our roads crossed many, many times. One of those crossings was through that center: he invited me to serve on its advisory board, and I visited many times, just as he visited me at Georgia Tech many times, and we met on many other occasions. In July 2007, Ness joined the ECE and CSE departments at the Ohio State University as the Ohio Eminent Scholar Chaired Professor in Networking and Communications.
Ness also held a guest chaired professorship in wireless communications at Tsinghua University, Beijing, China, from 2009 to 2012, and he currently holds an honorary guest professorship at Shanghai Jiao Tong University in China and a visiting position at the Indian Institute of Technology Bombay. Ness's research interests span the areas of communications, networking, computing, storage, cloud, recommender, social, and cyber-physical systems. He is especially interested in fundamental problems in machine learning, design, control, performance, pricing, and security of these complex systems. Ness also serves the community in many capacities. For example, he is editor-in-chief of the IEEE/ACM Transactions on Networking, and he is on the steering committee of the ACM MobiHoc conference, which he has also chaired. He has served many IEEE and ACM conferences, including leading and prestigious ones like INFOCOM and MobiHoc, and he has served on the editorial boards of many journals. Ness is an excellent speaker; accordingly, he has given many keynote addresses, and I have invited him as a keynote speaker at conferences that I organized. He also has many awards. For example, he is an IEEE Fellow and a recipient of the National Science Foundation CAREER Award. His papers have received numerous awards, including IEEE INFOCOM best paper awards in 2006, 2008, and 2016, and best paper of the year awards from several journals. He is overall an excellent researcher, and I have respected him since the beginning. I met him back in 1997, when he was a young fellow, and he asked me to be involved in the technical program committee of, if I recall correctly, INFOCOM. I said, of course. Since then we have been very close friends, and I am really proud that he became one of the top researchers in the field. Ness is on the list of highly cited researchers, among the world's most influential scientific minds in 2014 and 2015.
And as I told you, he has received many, many awards, including the INFOCOM award that he won in 2014. I think I should stop here. As a final note, let me express my sincere thanks to Ness for accepting our invitation and giving this talk, which is entitled "AI-EDGE: Designing Future XG Networks and Distributed Intelligence." Again, Ness, thanks a lot, and the stage is yours. We look forward to your talk. Thank you. Thank you so much, Ian. I greatly appreciate the kind introduction. It's a real pleasure to be here. Let me begin by sharing my screen, and I hope everything works out. Okay, is my screen visible to everyone? Yes, very good. Perfect. All right. So as I said earlier, it's really my pleasure to be here. I'm Ness Shroff, and I'm at the Ohio State University. I'm also the institute director of a very exciting new NSF AI institute called AI-EDGE, which is focused on designing future edge networks, such as XG networks, as well as distributed intelligence. I'm going to go over this outline relatively quickly. I plan to spend roughly half of the talk describing the research agenda of this new institute, primarily because I think it will be of interest to the broader community; I'll talk about some specific open questions in the various thrusts and so on. Then I'll spend roughly the other half of the time on a particular use case study, where I will show how AI, and in particular online learning, is important in improving system performance. So let me begin with a story. While I was preparing for our institute kickoff about a year ago, someone sent me a video where the great Leonard Kleinrock, who by the way is a great friend of Ian Akyildiz, I know, was recalling a conversation that he himself had with another great leader in our field, Richard Hamming, about why so few scientists are remembered for their work.
And Hamming's answer was that they don't work on important enough problems. Taking this lesson to heart, we should all do our best to make a contribution, not merely by writing yet another paper, but hopefully one that will be remembered decades from now. This is how we have charged this institute, and we are all very excited to be part of this journey, which is really an opportunity of a lifetime. So let me begin with the institute vision. As all of you I'm sure know, networking and artificial intelligence are two of the most transformative information technologies. These technologies have helped improve people's lives and have contributed to national economic competitiveness, national security, and national defense. The overarching vision of our research efforts in this area is to exploit the synergies between AI and networks to create not only a research environment but also an education, knowledge transfer, and workforce development environment, one that will help reestablish U.S. leadership in future-generation edge networks and, hopefully, distributed AI for many decades to come. And we hope that this institute vision is ambitious enough to pass the Hamming-Kleinrock test. So this addresses the various challenges that are present in designing future XG systems. The institute is made up of a strong and diverse consortium of research leaders from various universities, industry, and government labs, and the goal is that they will all work collaboratively to realize this overall vision. We have 11 universities in the U.S., and now we have gone global: we have three of the top Korean universities as part of this group, KAIST, SNU, and Korea University, as well as IIT Bombay and IIT Madras. All right. So this figure gives you an idea of the real scope of the research that I'm going to talk about and what we mean by the network edge.
So the tautology is that everything outside of the core of the internet is the edge. This is a very geeky and precise definition, but it doesn't really tell you anything. What we really mean by XG networks and edge networks is all wireless networks, such as camera networks, drone swarms, vehicular networks, robotic networks, cellular networks, et cetera. These are all part of this XG ecosystem and the edge ecosystem. The smaller data centers, not the very large data centers that are part of the core, are also part of the edge. And the reason why I feel that researchers should focus more on the edge is twofold. Number one, this is where we expect most of the explosive growth to take place, thanks to IoT and other devices. And second, the edge is an area that is less affected by legacy standards, so you can do more innovative work that can have a real impact. In this figure, you can see that the bottom layer corresponds to the physical network, while the top layer corresponds to the intelligence substrate, or superstrate, that controls it. Our goal is really to develop this distributed-intelligence top layer. The research plan of the institute is organized across eight thrusts that span two broad symbiotic areas, and I'll talk about each of them in a little more detail. The first area is AI for networks, which corresponds to thrusts one through four, and the second is networks for AI, which corresponds to thrusts five through eight. Foundational research is one of the things that we in academia do really well, and as a theorist that has been much of my focus. But to make sure that our foundational research is grounded in reality, what we've done is develop several important wireless use cases that our research tasks will explore.
And the idea is that these research tasks will be further enhanced and fleshed out by exploring these use cases: the first being combined sensing and networking, the second being the interactions between machines, humans, and mobility, and the third being programmable, virtualized networks. And while these use cases, as I mentioned, are important in their own right, we have already started to think about how to connect them to key experimental platforms: some of them government platforms, like the various PAWR platforms, as well as the industry platforms of the various companies that are part of our consortium. All right. So I'm going to list three differentiators of the institute. This is not really meant to be an advertisement for the institute; rather, these are key differentiators that I feel are critically important for doing work in this area. Right now, it's very fashionable to apply machine learning or AI to everything that you design. But if you look at the plethora of networking papers, for example, that use AI, what they typically do is take an off-the-shelf AI model, usually a neural network, train it, and apply it to a networking problem. And then they say, okay, you see some improvement, and we are done. However, while AI has been enormously successful, and we have a real opportunity to use it to design edge networks, the reality is that the applications AI has been applied to are quite different from networks. Networks are very dynamic, very complex beasts. They require handling constraints which are not typical in AI systems, hard constraints like power constraints, et cetera. You have to account for mobility, for information arrivals and departures, and so on and so forth.
So I strongly believe that simply applying known AI techniques to these networking problems is the wrong approach. Instead, what we want to do is develop new foundational AI that takes into account these various network characteristics, such as power control, interference, scheduling, and network dynamics, as well as the decades of domain knowledge that we as experts in the field have built up. You cannot simply throw all of this away, use a black box, and then declare victory. The other important aspect of this institute and research, which I think is very promising, is that because of this tremendous growth in edge devices, there is no question that the future of AI is going to be in distributed AI. And we can think of this in two ways. One is that we use distributed AI like it is done right now, where your iPhones and your end devices have some app that uses a machine learning tool but doesn't really interact with other devices or with the network. Or you create an environment where these different ML agents on different edge devices can interact with each other through a smart network. Here what we need to do is develop AI-aware networks and network-aware AI in order to unleash the power of collaboration to solve large-scale distributed AI problems, which are going to be very important in the future. And finally, I will briefly mention this notion of a virtuous cycle. As I've mentioned, I'm a theorist, but a critical component of what we will be doing is to help shorten the time scales of interaction between foundations and use case research across multiple disciplines. Hopefully that results in what we call a virtuous cycle, with a cascading impact that dramatically accelerates the time it takes for research to go from foundations to implementation and tech transfer.
And to that end, in year one there have been several examples of works done in this institute that have actually seen large-scale experimentation. So let me briefly introduce the research thrusts. For some of the thrusts, about four or five of them out of these eight, I'll go into a bit more depth about some interesting open research problems; unfortunately, given the time, I can't do it for all of them. All right. So we begin at the physical layer. In the first thrust, the goal is: how do we reengineer the physical fabric itself? How do we expand the constraint set, if you're a mathematician? The idea here is: how can we use physics-based approaches to reengineer the physical fabric, use the fabric as a controllable entity, and expand the capacity region? Let me give you two example research challenges in a bit more depth. The first challenge is how to leverage physical knowledge in order to design a better communications medium. The motivation is that physics-based knowledge has often been used to solve complex problems. It has also recently been used in neural network settings, where you embed meaningful physical models into a neural network in order to speed up prediction. So the big question is: when we are designing our edge networks, can such models be used, for example, to better estimate certain parameters that might be important, such as multipath components and mobility, to make initial guesses, or to provide boundary conditions? Then the exploration that takes place is not unfettered; it's done in a way in which the physics dictates it, so that you converge to the solutions much quicker, right? These are actually quite hard problems.
And initially we had some ideas that, okay, if you start with an initial condition that's closer to the optimal, it will automatically result in much faster convergence. That's not true; there are some very complex issues related to that, and I'm happy to chat about them. The other aspect is: can physics-based models be integrated with machine learning tools to help make better decisions? For example, in online learning, you can use physics-based constraints to do better beam searching, beamforming, et cetera. The second important research challenge that I want to talk about here, and there are of course many in this thrust, is: how do you use this physics-based knowledge to discover efficient codes in communication systems via machine learning? Coding, by the way, is a fundamental building block of communication. The key motivation is that if you look at the history of how codes have progressed over the last 30 to 40 years, they primarily advance along a line of incremental improvements, with occasional breakthroughs that happen because of human ingenuity; think of turbo codes, for example, or think of Gallager's original papers. These breakthroughs happen very rarely, and they typically happen on linear codes. So the big challenge here is: can machine learning be used to expedite discovery, and perhaps even use something like deep neural networks to expand the search space to also include nonlinear codes? And there are some very interesting questions here. For one, once you get into nonlinear codes, the number of codewords is extremely large.
You have generalization issues, like you normally do with machine learning tools, where training at one SNR and perhaps a small block length does not easily generalize to other SNRs and larger block lengths. To overcome these challenges, domain knowledge from communication, coding, and information theory is critical. So knowing the physics of the problem is very critical to making advances. There is some very exciting new work, primarily pioneered by Sewoong Oh and his group at the University of Washington, on designing these low-latency nonlinear codes, a research area that I strongly recommend folks who are interested in the physical layer to look into carefully. The second thrust is really about resource allocation. Here our goal is to develop new AI techniques for optimizing resources under these, hopefully expanded, constraints. There are issues of complexity that I'll talk about, and of how to learn from incomplete network state information, which I'll also talk about. So let's again discuss two examples in a little more detail. Machine learning is a great tool for prediction, but it often needs a lot of data and a lot of information, and it might take time. So a big challenge here is: how does one design low-complexity and, especially, sample-efficient AI network algorithms? The idea is that you want to develop new strategies to learn the environment very quickly and adapt to non-stationary behavior. But you also want to do this when the information you might be getting from your wireless system is not very data-rich. How do you make decisions when the number of samples you're receiving is relatively low? How do you design online learning tools when the computational resources at the end users are relatively limited? And what do you do when the information exchange channel itself is somewhat limited?
So how does one design machine learning tools that have all these desirable features, low complexity in terms of samples, compute, and communication, and that can also handle non-stationary dynamics, hard and soft constraints, the distributed nature of these large network systems, potentially multiple objectives, and so on? There is this wonderful area of Bayesian optimization, which one can develop further to build online tools with these features. It's a fantastic area to work on if you really want to learn about online learning and online reinforcement learning, especially if you're also interested in ensuring that the solutions you develop have provable efficiency and are safe. Another big challenge is what to do when the network state information is incomplete. The practical challenge is that many network control functions in reality cannot wait for all of the data to arrive before a decision needs to be made. For example, you cannot tell a self-driving car, hey, let me wait to make sure I have all the information before I tell you to turn left and not hit that wall, right? You clearly cannot do that. And even for traditional network control, congestion control, for example, needs to be done on the order of milliseconds, or seconds at most, and you cannot wait for all of the data to arrive before you make this decision. So one has to develop multi-scale machine learning tools, and how to do this efficiently, how to do this optimally, is open. For example, one could use local data for real-time control, but then use historical global data to determine the policy in an offline manner.
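To make the sample-efficiency point concrete, here is a minimal toy sketch (my own illustration, not code from the talk or the institute) of a classic UCB1 bandit learner choosing among a few candidate link configurations from noisy throughput feedback; the arm means and noise level are entirely hypothetical.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Play one arm per round and observe a noisy reward. The UCB1 index
    trades off exploitation (empirical mean) against exploration
    (a confidence bonus that shrinks as an arm is sampled more)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:                          # play each arm once first
            arm = t - 1
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = arm_means[arm] + rng.gauss(0, 0.05)   # noisy feedback
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return counts, total_reward

# three hypothetical link configurations with mean normalized throughputs
counts, reward = ucb1([0.3, 0.5, 0.7], horizon=2000)
```

With only a few thousand noisy samples and no model of the environment, the learner concentrates almost all of its plays on the best configuration; the point of the sample-efficiency research agenda above is to get this kind of behavior with far fewer samples, under constraints, and in non-stationary settings.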
And the open questions are, again: do these sorts of approaches perform well, are they efficient, are they near-optimal, are they scalable? All good questions to look at. Thrust three is about dealing with multiple agents, possibly non-cooperative entities. Here the idea is the following: while you may have developed wonderful techniques for optimizing individual agents, things may go quite badly if these optimized agents start to interact. For example, let's take a very mundane but important networking requirement, which is setting handover thresholds. If you use the thresholds that single-agent techniques have set, and you now have multiple base stations interacting, these thresholds might be too low for the distributed, multi-cell system; while a threshold may be perfect for a single cell, it could end up overloading neighboring cells. And of course the problem becomes much more exacerbated when the agents may be adversarial or competitive, where you have, say, Verizon and AT&T competing, and this brings in a very important and flourishing area that people are looking into right now, which involves games and reinforcement learning. I'm not a security person, but security is a very important piece in designing XG networks. Thrust four is really about making sure that security is taken into the design of these networks from the ground up and not as an afterthought. And there are a variety of design problems where machine learning could be useful: how do you analyze network protocol specifications and network protocol implementations, how do you detect anomalies, et cetera, using machine learning?
A very important, practical, but meta question is: how does one use machine learning, or develop machine learning tools, to create automated tools that ensure security? This is an area where quite a bit of work has started in our institute. In a broader context, are there ways in which one can fundamentally characterize security-performance tradeoffs? This is a very open, very exciting problem, and one where perhaps machine learning can help. All right, now we'll move from AI for networks to our second meta-thrust, which has to do with networks for AI. Thrust five is aimed at developing distributed AI tools that are network-aware and can coordinate with the networks themselves by taking into account the constraints that the network has: computation, communication, data, et cetera. Let me again give you a couple of examples. A big challenge in distributed machine learning is distributed optimization. So the question is: can we do network-aware distributed optimization in order to further enhance the performance of the end distributed AI applications? The motivation comes from the following observation. If you look at traditional distributed machine learning, like federated learning, it assumes reliable communication, and typically much of the work assumes communication between a central aggregating server and the workers. However, if you look at edge networks, workers could be communication-constrained devices at the edge, IoT sensors, cameras, et cetera, and the reliability of the connectivity might be suspect. So the question is: what are the fundamental tradeoffs between communication efficiency and making these machine learning applications better, making them converge faster?
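As a toy illustration of this communication/convergence tradeoff (a sketch of my own with made-up numbers, not the actual algorithms from the literature), consider workers that run local SGD on their own simple quadratic losses and synchronize by model averaging, communicating rarely early in training and more often later:

```python
import random

def local_sgd(worker_targets, rounds=60, lr=0.1, seed=0):
    """Each worker k runs local gradient steps on its own quadratic loss
    f_k(w) = 0.5 * (w - c_k)^2. Models are averaged (one 'communication'
    round) every tau local steps, with tau large early in training and
    small near convergence, so most rounds need no communication."""
    rng = random.Random(seed)
    K = len(worker_targets)
    w = [0.0] * K                               # one local model per worker
    comms = 0
    for t in range(rounds):
        tau = 8 if t < rounds // 2 else 2       # adaptive communication period
        for k in range(K):
            grad = (w[k] - worker_targets[k]) + rng.gauss(0, 0.01)
            w[k] -= lr * grad
        if (t + 1) % tau == 0:
            avg = sum(w) / K
            w = [avg] * K                       # synchronize (communicate)
            comms += 1
    return sum(w) / K, comms

# two hypothetical workers; the optimum of the average loss is w = 2.0
w_final, n_comms = local_sgd([1.0, 3.0])
```

The averaged model still reaches the global optimum of the combined loss, but with far fewer communication rounds than synchronizing after every step, which is the essence of the tradeoff being asked about.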
There is very exciting work by Gauri Joshi and others for an important class of problems, stochastic gradient descent, which shows that communication gains can be very substantial if you transmit infrequently when the distributed AI algorithms are far from convergence and transmit much more often when they are closer to convergence. This gives you order-of-magnitude improvements in communication gains. So the questions are: can such insights be extended to more general distributed machine learning tools? Can we design ML algorithms that explicitly take into account network constraints, bandwidth, and delay? And can we further improve performance by not only focusing on communication, but also taking into account computation, compression, coding, et cetera? All right, another big question, which comes up a lot in distributed inference, is this notion of stragglers. First, let me explain that many machine learning problems have been seen to benefit from being distributed over many, many servers. But a big problem, if you talk to, for example, Facebook or Google or Microsoft, is this notion of stragglers. These are servers that are very slow, so their tasks are reported back to the distributor of these machine learning tasks with a substantial delay. Basically, the idea is that these tasks can get slowed down by servers that might be overloaded, might be rebooting, or might have high link delay in an XG network. And these stragglers, because their results come late, can substantially impact the performance of the machine learning tools themselves. Many practical solutions have been proposed, but they have significant limitations. Because of these limitations, an entirely new area has come into being, called coded computation, which allows the final result of this distributed computation to be recovered not from all the tasks, but from a subset of the completed tasks.
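To illustrate the idea behind coded computation (a deliberately tiny sketch of my own, not a production scheme), here is a (3,2) MDS-style coded matrix-vector multiplication in which any two of three worker tasks suffice to recover the full result, so one straggler can simply be ignored:

```python
import numpy as np

def coded_matvec(A, x, fast_workers):
    """Split A into halves A1, A2 and add a parity block A1 + A2, giving
    three worker tasks any TWO of which suffice to recover A @ x.
    `fast_workers` lists which tasks finished; stragglers are ignored."""
    m = A.shape[0] // 2
    A1, A2 = A[:m], A[m:]
    tasks = {0: A1, 1: A2, 2: A1 + A2}
    results = {i: tasks[i] @ x for i in fast_workers}
    if 0 in results and 1 in results:
        return np.concatenate([results[0], results[1]])
    if 0 in results and 2 in results:          # recover A2 @ x from parity
        return np.concatenate([results[0], results[2] - results[0]])
    if 1 in results and 2 in results:          # recover A1 @ x from parity
        return np.concatenate([results[2] - results[1], results[1]])
    raise RuntimeError("need at least two finished tasks")

A = np.arange(12.0).reshape(4, 3)
x = np.array([1.0, 2.0, 3.0])
y = coded_matvec(A, x, fast_workers=[1, 2])    # worker 0 straggles
```

The cost is redundancy: three tasks are dispatched to compute what two tasks' worth of work produces, which is exactly the kind of overhead the next point takes issue with.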
The problem is that coded computation is a research area primarily worked on by information theorists, who understand compression and coding very well but don't understand delay. And so while they resolve the straggler problem, each of the individually distributed ML tasks has a certain code embedded in it, which can make the parallelized computation itself more complex. So all you've done is solve one problem while incurring a problem elsewhere. A big open question, then, is: can coded computation take into account the structure of the machine learning problem, for example sparsity, and be adaptive to this heterogeneous edge? There is some very exciting preliminary work in exploiting sparsity for code design that indeed answers this question, in a preliminary way, with yes, but of course much more work needs to be done. Thrust six is about not only considering the network constraints, but also reengineering the networks so that they can adaptively allocate resources based on the needs of the distributed AI applications. Here you have the opportunity to design network operations for managing both AI-side uncertainty and network-side uncertainty. Thrust seven, as I said earlier, is about the interface between humans, AI, and networks. The motivation for this thrust is very simple: while all of us would love to have fully automated systems, the reality is that humans are going to remain a critical component of these complex systems for a long time. Thrust seven is really about how to develop new collaborative methods across this human-AI-network spectrum to make the systems more efficient than they would be with either humans or machines alone. The last thrust, which is certainly last but not least, and I don't want to diminish its importance, is about making sure that users' privacy is preserved to the level specified.
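One standard way to preserve privacy at a specified level is noise addition via differential privacy, which comes up next. As a minimal sketch (my own toy example, assuming bounded data, not any specific institute mechanism), the classic Laplace mechanism releases an epsilon-differentially-private mean:

```python
import math
import random

def private_mean(values, lo, hi, epsilon, seed=0):
    """Laplace mechanism: release the mean of values clipped to [lo, hi]
    with epsilon-differential privacy. The sensitivity of the mean of n
    bounded values is (hi - lo) / n; the noise scale is sensitivity / epsilon."""
    rng = random.Random(seed)
    n = len(values)
    clipped = [min(max(v, lo), hi) for v in values]   # enforce the bounds
    b = (hi - lo) / n / epsilon                       # Laplace scale
    # sample Laplace(0, b) by inverse-CDF from a uniform in (-0.5, 0.5)
    u = rng.random() - 0.5
    noise = -b * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return sum(clipped) / n + noise

readings = list(range(1, 101))      # hypothetical bounded sensor values
released = private_mean(readings, lo=0, hi=100, epsilon=1.0)
```

Smaller epsilon means stronger privacy but a noisier release; the systems question raised below is how to get such guarantees end to end, across a distributed network, without a trusted curator.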
Because as these ML applications become more and more distributed, privacy concerns are going to become more and more important. So the question is: can we design and control these networks such that they are privacy-aware? Differential privacy is a very important and fast-growing area where you add some kind of noise to ensure that users' data stays private even though it's being shared. Can you do this in a systematic, perhaps even rigorous, fashion in order to facilitate protection from information leakage and attacks, especially when you don't have trusted curators, that is, when you cannot trust an intermediate party to resolve the privacy issues for you? This is a very exciting and important area in distributed AI. All right, so to summarize the first half: there are lots of fun and exciting problems at the intersection of AI and networks, and they should keep us engaged for a long, long time. However, I want to tell you a story from World War II that I hope clearly explains why context matters with data and why data should be used very carefully. In World War II, the Royal Air Force, the British RAF, lost many, many planes to German anti-aircraft fire and decided that the planes needed armor. But then they asked the question: where should you put the armor? And the engineers said, look at the data. And indeed, they came up with a very obvious solution. They looked at planes that returned from their missions, counted up all the bullet holes, and put the extra armor in the areas that had the most bullet holes. This seems like a very logical solution, seems like the correct solution. But in fact, it was wrong. So, does anyone want to take a guess as to why? You can unmute yourself. Actually, they can write questions or respond via the Q&A; they are not allowed to unmute, sorry. That's fine. All right, so let me explain why it was wrong.
The idea was that if a plane made it back safely even though it had lots of bullet holes, say in the wings, that pointed to the fact that bullet holes in the wings are not that dangerous. So in fact, you don't want to armor up the areas that have holes; you want to armor up the areas that do not have holes, like the engine. Why? Because planes with holes in their engines never made it back. So this is an example of why it's so important to use and interpret data carefully, and this is where domain knowledge plays an extremely important role. All right. So now let's discuss a case study where designing new machine learning tools can play a significant role: edge caching. Edge caching is critical for a variety of edge applications and use cases. For example, in autonomous transportation, we may want to cache geographic, congestion, or environmental information (accidents, construction, etc.) at nearby edge caches so that vehicles can make fast decisions. For intelligent or personalized education, we want to cache educational materials, say based on users' interests, to support faster data retrieval. And for edge computing, we may want to cache pre-computed data, for example product embeddings for recommendation services, in order to accelerate computation. So clearly edge caching is important, but the challenges at the edge are very different from those at the core. The core uses conventional caching policies such as least recently used (LRU), which will not work well at the edge because they focus on the most popular, most recently used items. Edge devices typically serve a small group of users with highly individualized and dynamic demands that are not seen at the core. So if an item, let's say a music video or something like that, is requested recently at the edge, it's unlikely to be requested again, unlike at the core.
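For readers less familiar with it, the least-recently-used policy mentioned above can be sketched in a few lines. This is a generic textbook version for illustration, not code from the talk:

```python
from collections import OrderedDict

class LRUCache:
    """Evict the least recently used item when capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None              # cache miss
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop least recently used
```

Note how the policy always keeps the most recently touched items: at the edge, a just-played music video would be retained even though it is unlikely to be requested again soon, which is exactly the mismatch described in the talk.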
So ML can be used to address these challenges by developing efficient caching solutions that predict individual dynamics and appropriately adapt the edge cache contents. The other difference between the edge and the core is that edge caching performance is often impacted by other system components, such as the storage layers between the edge and the core, in a very complex and implicit way. So a key question is: how do you determine this implicit impact and develop caching solutions accordingly? That's going to be the focus of the next 15 to 20 minutes. All right, so let me begin by introducing the notion of a content delivery network with multiple cache layers. The edge caches, closer to the users, typically have a smaller capacity but also a correspondingly small delay. The intermediate caches have larger capacity and larger delay. And the back-end storage consists of very, very large caches that can potentially hold the entire dataset, but have the largest delay. Typically, if you look at a Facebook (or Meta, I guess) type of company, you will find on the order of hundreds of edge caches connected to an intermediate cache, and tens of intermediate caches connected to a back-end data store. This animation shows what happens when you're trying to serve a data request in a content delivery network. Once a user generates a request, if the item is stored in the edge cache, there's a cache hit and you serve it with a small delay. If there is a cache miss, if it's not in the edge cache, you try to serve it from the intermediate cache. And if it's not in the intermediate cache either, you go all the way to the back-end storage. For the purpose of this talk, I'll assume that all of the data is always available at the back-end data storage. So why is the edge caching problem challenging?
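The three-layer request flow just described can be sketched as a simple lookup cascade. The delay values below are illustrative placeholders, not numbers from the talk:

```python
def serve_request(item, edge, intermediate,
                  edge_delay=1, intermediate_delay=10, backend_delay=100):
    """Return the delay to serve `item` in a three-layer CDN.

    `edge` and `intermediate` are sets of cached item ids; the back-end
    store is assumed to hold everything, as in the talk.
    """
    if item in edge:
        return edge_delay          # edge hit: smallest delay
    if item in intermediate:
        return intermediate_delay  # edge miss, intermediate hit
    return backend_delay           # fetched from back-end storage
```

The edge cache's dilemma is visible here: the penalty for an edge miss depends on whether the intermediate cache happens to hold the item, which the edge cannot observe directly.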
Because the delay of fetching the missed data, which from here on we'll call the miss cost, depends very much on where the data is stored: in the intermediate cache or the back-end cache. But whether a data item is stored in the intermediate cache is not really known to the edge cache, because the contents of the intermediate caches depend not only on the data requests from one particular edge cache, but on requests from hundreds of edge caches. So essentially, in this scenario, the miss cost of a data item is unknown to the edge cache and unknown to the end users. The new challenge, then, is how to develop machine learning tools to strategically estimate the miss costs and use them to design efficient edge caching policies, something which is not considered at the core. So let's say our overall goal is to minimize the overall miss cost. If you knew the miss costs and you knew the popularity distribution, then the optimal policy is very, very simple: you cache the items with the largest product of popularity times miss cost. For unknown miss costs, there have been a lot of heuristic solutions developed in the systems literature, most notably the paper on hyperbolic caching at USENIX, where essentially they estimate the miss cost by taking the average of previous observations of missed items, and then cache the items that have the largest product of popularity times estimated miss cost. But there's a key issue with this heuristic solution: it ignores the impact of caching decisions on estimating the average cost in the future. In learning parlance, if you know a little bit of machine learning, it only exploits; it's a greedy solution that does not explore, and this can result in substantial suboptimality.
In fact, we will show that this heuristic solution has linear regret, which is quite bad in terms of performance. So let's look at an example that gives you a deeper idea of the problems with the heuristic solution. Say we have two data items, D1 and D2, with equal popularity, and the edge cache has a capacity of one. Let's say the miss cost is 2 if the item is served from the intermediate cache, and 10 if it is served from the back-end cache. As we mentioned earlier, the contents of the intermediate cache change over time, so let's say the sample costs are as follows. Item D1 is in the intermediate cache for the first two time slots, and after that always in the back-end cache. Item D2 is in the back-end cache for the first two time slots, and after that always in the intermediate cache. Now, if you use the heuristic solution, what will you do? Based on the first two observations, you will cache item D2, because the estimated cost of D1 is 2 and the estimated cost of D2 is 10, and you want to put the item with the larger cost into the edge cache. The problem is that in the future the script is reversed, and the heuristic solution will never replace D2 by D1: since D2 is already cached, its future miss costs will not be observed, and the estimated miss cost of D1 is always less than 10, because the average of its samples, no matter how far you go, will always be less than 10. This gives you an idea of why the heuristic solution is so suboptimal. So in designing a policy, there should be a natural exploration-exploitation trade-off, which is critical in machine learning and online learning algorithm development.
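The two-item example above can be reproduced in a few lines. The cost sequences mirror the ones in the talk; the code is a deliberately minimal sketch of the averaging heuristic (one slot observes each uncached item once), not the paper's implementation:

```python
def greedy_cache_trace(costs_d1, costs_d2, horizon):
    """Run the averaging heuristic on two equally popular items with an
    edge cache of size one; return which item is cached at the end.

    Only the miss cost of the uncached item is observed each slot, which
    is exactly why the heuristic never explores.
    """
    observed = {"d1": [], "d2": []}
    cached = None
    for t in range(horizon):
        for item, costs in (("d1", costs_d1), ("d2", costs_d2)):
            if item != cached:
                observed[item].append(costs[t])
        estimates = {i: sum(o) / len(o) for i, o in observed.items() if o}
        # Equal popularity, so cache the item with the larger estimated cost.
        cached = max(estimates, key=estimates.get)
    return cached

# Item d1 costs 2 (intermediate cache) for two slots, then 10 (back end);
# item d2 is the mirror image, as in the talk's example.
costs_d1 = [2, 2] + [10] * 8
costs_d2 = [10, 10] + [2] * 8
```

Running `greedy_cache_trace(costs_d1, costs_d2, 10)` leaves d2 cached for the entire horizon even after d1 becomes the expensive item: d1's running average stays below 10 forever, so d2 is never evicted.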
Our solution, which I'll show, is guided by online learning principles that balance this trade-off by adaptively learning the miss cost of each item; it's closely related to combinatorial stochastic multi-armed bandit (CMAB) problems. But as I'll show later, there are some new challenges beyond conventional CMAB that one needs to address. All right, so let's quickly go over the model. Say I have a set of M data items, D1 through DM, all of unit size, and my edge cache capacity is K, which of course is much less than M. I'm looking at a discrete-time system, and at each time slot t there is a data request R(t), generated according to some popularity distribution, which for the purpose of this talk we'll take as known; later on I'll mention how this can easily be generalized. The cost to serve the requested item depends on where it's fetched from. If it's fetched from the edge cache, we assume the cost is zero; you could make it some c_E without changing anything, except making the equations look a little more complicated. The cost is c_I if the item is served from the intermediate cache and c_B if it's served from the back-end cache. The only thing you need is that the cost of fetching from the back-end cache is greater than the cost of fetching from the intermediate cache, which is greater than the cost of fetching from the edge cache. Now, as I've said before, the caching decisions of the intermediate cache are not known at the edge cache, because they depend on what's going on at many, many different edge caches. So at time t, let q_i be the probability that item D_i is not stored in the intermediate cache, and let gamma_i be the expected miss cost for this item, given by the simple equation gamma_i = (1 - q_i) c_I + q_i c_B. And let's assume, without loss of generality, that the items D1 through DM are ordered so that the product of the expected miss cost gamma_i times the popularity is decreasing in i.
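With these definitions, the expected miss cost and the full-information optimal cache can be written down directly. A small sketch, with illustrative names:

```python
def expected_miss_cost(q_i, c_int, c_back):
    """gamma_i = (1 - q_i) * c_I + q_i * c_B: with probability q_i the item
    is absent from the intermediate cache and must come from the back end."""
    return (1.0 - q_i) * c_int + q_i * c_back

def optimal_static_cache(popularity, q, c_int, c_back, capacity):
    """With known q_i, cache the `capacity` items with the largest product
    p_i * gamma_i of popularity and expected miss cost."""
    score = {i: popularity[i] * expected_miss_cost(q[i], c_int, c_back)
             for i in popularity}
    return set(sorted(score, key=score.get, reverse=True)[:capacity])
```

This is the "God-like" benchmark the regret will later be measured against; the whole difficulty of the problem is that the q_i's in this computation are unknown in practice.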
And remember that all items are always available in the back-end cache; this could simply mean that we have multiple back-end caches and every item is available in one of them. The system goal is to minimize the expected accumulated miss cost, i.e., the delay, of edge caching. If you look at a policy pi, the expected miss cost achieved by that policy is given by the expected cost of fetching from the intermediate cache plus the expected cost of fetching from the back-end storage. And if you knew the miss probabilities, then the optimal policy is static: as I said before, you always cache the items with the largest product. In this particular case, you always cache the first K items, since the product p_i times gamma_i is decreasing. The key challenge is what to do when these q_i's, these miss costs, are unknown: you need to adaptively learn them, update the cache contents, decide whether or not to load a requested item into the edge cache if it's not already cached, and decide which data item to evict if the edge cache is full. All right, so this shows the connection between our edge caching problem and stochastic combinatorial multi-armed bandit problems. There are lots of similarities, but there is a very important difference, and this is why one has to design new machine learning tools to handle the specific constraints of the problem. The key difference is that there are additional constraints on the action sets for edge caching: caching decisions at time t in our problem depend not only on the request at time t, but also on the caching decisions at time t minus 1, while the action set in the traditional stochastic combinatorial multi-armed bandit problem at time t can be selected arbitrarily. So we cannot directly apply existing analyses to edge caching, and we need to come up with a new solution.
So what we do is propose a Kullback-Leibler lower confidence bound (KL-LCB) based edge caching policy. We want to compute q_i tilde, an estimate of the miss probability at time t, initially set to zero. At each time t we first calculate the sample mean of q_i, and then the estimate q_i tilde of t is calculated as the lower confidence bound based on that sample mean. The reason we underestimate q_i, using a lower confidence bound, is that we want to encourage exploration: we don't want to be locked into a greedy solution, which, as we saw, can be problematic in the future. That's the key idea of the scheme. We then update the cache contents using gamma_i tilde, the estimated miss cost, which is a function of this estimated q_i tilde. So let's say item D_i is requested at time t. If the item is stored in the edge cache, we don't need to update anything. If it is not stored in the edge cache, the policy loads D_i into the edge cache if the product of its popularity times gamma_i tilde, the estimated miss cost, is greater than that product for some item currently in the cache; and if the cache is full, you evict the item that has the smallest product of popularity times estimated miss cost. That's basically the algorithm.
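The lower-confidence-bound computation at the heart of the policy can be sketched, for a Bernoulli miss indicator, as a bisection on the binary KL divergence. The exploration term log t is one common choice in the bandit literature; the exact form used in the paper may differ, so treat this as an illustrative sketch rather than the authors' implementation:

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_lcb(sample_mean, n_obs, t, iters=50):
    """Smallest q below the sample mean with n_obs * KL(mean, q) <= log t.

    Underestimating the miss probability this way encourages exploration,
    which is the key idea behind the KL-LCB policy described above.
    """
    if n_obs == 0:
        return 0.0  # no observations yet; initialized to zero, as in the talk
    target = math.log(max(t, 2)) / n_obs
    lo, hi = 0.0, sample_mean
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if bernoulli_kl(sample_mean, mid) > target:
            lo = mid  # mid lies outside the confidence region; move up
        else:
            hi = mid  # mid is still plausible; tighten downward
    return hi
```

The estimated miss cost gamma_i tilde = (1 - q_i tilde) c_I + q_i tilde c_B then replaces the plain sample average in the load/evict rule. Because the bound tightens as log t grows, a cached item's estimate keeps shrinking until it is eventually evicted and re-explored.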
The performance metric we evaluate the algorithm with is called the regret. The regret of any policy pi over a time horizon n is simply the cost of that policy minus the cost of the optimal policy with full information, as if you were God and knew everything; in this particular case, the optimal policy is the one with known q_i's, which is very simple: you always cache the first K data items. It turns out that, with a little algebraic manipulation, you can write this regret in terms of the expectations of T_i-out and T_i-in, where T_i-out corresponds to the duration of time during which item D_i is not stored in the edge cache, and T_i-in corresponds to the duration of time during which item D_i is stored in the edge cache. And so we have two key theorems. The first is a regret upper bound: for our proposed KL-LCB caching policy, we can show that the regret is logarithmic, and we are able to find the constants. So we show that our policy diverges from the optimal, God-like policy at most logarithmically in time. And then we show that no other caching policy can do better than ours as the time horizon goes to infinity. This shows that the KL-LCB based edge caching policy we develop is asymptotically optimal; and unlike the heuristic solution, the gap between the optimal policy and our policy grows logarithmically, while the gap for the heuristic solution diverges linearly: it has linear regret. I don't have time to go over the proof, but basically what we need to show is that these two terms in red are equal to each other, and I'll skip all of that in the interest of time, since we are running late. This slide shows some numerical results, where you can see the heuristic has linear regret while our policy is
logarithmic. So this is for a fixed popularity distribution, and this is for a popularity distribution more in line with what we see in real web caching situations, for example Facebook's photo caching. Once again you can see our regret grows very slowly while that of the heuristic blows up. Here it's a Zipf distribution, but it's unknown to us; remember that I initially gave you the results assuming a known popularity distribution, but we really don't need to know it, since we can estimate it by the sample mean in our KL-LCB policy, and once again you can see that our policy does much, much better. This slide just shows that the policy is asymptotically optimal, which coincides with our theoretical results. So what are some of the takeaways? The first takeaway is that underestimating q_i, the miss probability, encourages exploration and avoids the issue of the heuristic solution, and using KL-LCB based underestimation we are able to show that this optimizes the amount of exploration you need and appropriately balances the trade-off between exploration and exploitation in order to achieve asymptotically optimal regret. The solution is also generalizable in many different directions: if you don't know the popularity, you can estimate it within our algorithm, and non-identical item sizes are also handleable, where you cache the data items with the largest product of popularity times estimated miss cost divided by item size; that works as well. So the bottom line is that ML techniques can be developed and sharpened to substantially improve edge caching performance. Of course, like anything, we've only scratched the surface of this problem, and there are many, many different directions to look into. For one, how does one jointly optimize edge and intermediate caching when you have limited and delayed information exchange? How do you generalize the proposed solution when you have time-varying
popularity distributions, so you have non-stationarity? Here we can use some work that we have done on non-stationary learning to solve this problem. A big question is how to efficiently implement KL-LCB policies in systems with small computational overhead, especially when this q_i is not Bernoulli, because in that case the form of the estimate using the Kullback-Leibler lower confidence bound becomes a little more complicated. Here one can potentially use a technique called boosting; we have some work in this area aimed at accelerating performance. So anyway, lots of questions; I'll leave you with this. And since there is going to be a follow-up discussion for young researchers, I wanted to point you to two links. I have some career advice and philosophical musings from a couple of keynotes that I gave at NSF for early-career scientists, and on my website (you can Google this) I have some ideas about how to do PhD research, some elements of excellence, which I think are useful both for me as an advisor and, I hope, for the students. And with that, I'll stop here. Thank you very much. Thank you, Ness, excellent talk, and I'm sure the audience appreciated your ideas and your directions. We have some questions; let me go through them. Reinhard from ITU is asking: on your initial slide, slide six, you mentioned that researchers should focus on the wireless edge. Doesn't a lot of innovation also have to happen in the fixed network in order to deliver future or futuristic xG services? (I muted myself.) Yeah, so the answer is that I'm certainly not saying you should limit yourself to the edge, but if you are looking for the biggest bang for the buck, that's where the innovations will have to happen. Because for the core of the network, and I'm talking about the real core of the network, for many, many years there
have been so many techniques that we have provided that can substantially improve the performance of the core, and they have not been adopted. The reason they have not been adopted is that there is too much standardized legacy; the inertial mass against changing these solutions is very, very strong. At the core of the network the focus is on switching: how do you take the inputs and switch them to the outputs as fast as possible? And at least for the last couple of decades, that's been the mantra at the core. Certainly many of the applications that we design may have to run on the core as well, but I think the biggest bang for the buck in terms of innovation is happening at the edge. On the other hand, if you are a researcher working at companies that look into the core, or that own parts of the core of the network, then it's certainly very valuable to push these companies to do innovations in the core, because you can make big changes there as well. And maybe with quantum networks and quantum computation happening, perhaps the core will eventually have to change. Yeah, so it's more of a preference. That would be a long way to go, right, for the quantum network to be in place? Long, yeah. So there is another question by Reinhard: a network operator mentioned in a recent talk that they would sacrifice accuracy of machine learning models if they could instead have more explainability of why machine learning makes certain decisions. I was very surprised at first, because I thought accuracy of predictions would be what one cares about the most. Can you comment on this? (I'm really sorry, I'm outside now, so there is some background noise.) No, no, I can hear you perfectly. Okay, perfect. Yeah, I mean, I think explainability is important.
The question, I think, is more about the amount of sacrifice that you're willing to make in terms of performance. So again, I'm surprised that a company individual is saying that they're willing to sacrifice performance for explainability, but I can understand it to some extent, because I think the problem is the following: if you don't understand what is going on with the machine learning algorithms, if you cannot explain them well, it is quite possible that you may end up with an unsafe action which, while giving you good average performance, can result in catastrophes from time to time. So I suspect that's probably what that person was worried about. There are some more questions. Hong Ji Guo is asking: thank you for the great talk, I agree. For the edge caching problem, did you consider the correlation between data items? When a user requests data item D1, for example, most likely the next request will be a data item with high correlation to D1. Yes, good question, I like it, bravo, very good question. In fact, remember that in the edge caching scenario I put up two questions, two open challenges: one was the implicit characterization that I focused on in this talk, and the other was personalized demands. And in the former line of work, we've done some work where we've actually looked into this question of correlation. It's a very important one. Yeah, and this is also an issue of spatial and also temporal correlations, right? It would be good to look at it from the joint perspective, right? Yes, absolutely, correlations, yeah, and these are open questions that I think one should definitely look at. Sure, yeah. There is also a question by Shah Hosseini: could you please comment on what is the simulation platform for edge caching? Oh, the simulation platform
for edge caching? Just our own internal simulations: we had some event-driven simulations that were designed by the students. Okay. Are there more questions? While I'm waiting, let me ask you something as well. I'm interested in these metaverse applications, which are also very important: you have very large volumes of data and you need to decide where to do, say, point cloud compression, at the edge or at the end, at the source. And you are talking about optimal policies, especially for caching, right? So my question is this: you do a lot of analytical work, which has been really solid and excellent for many decades now, but the question comes about practicality. Can companies utilize these analytical results and somehow deduce what they should do in terms of edge caching and all this edge intelligence? Can you please explain it to us? Yeah, and in fact the interesting thing is that the edge caching problem was actually motivated by an internship that my student did at a caching company. So the problem was very real, and they were very, very interested in seeing something simple. So even though our analytical work has a sort of theoretical heft, the end solution is actually very simple; it's a very implementable policy, and that's what I think we should all be striving for. So I think it's a very good question: there's no point in doing complicated analysis and then coming up with a complicated solution that nobody will use. You want to come up with a solution that's useful. Yeah, and I assume that you will also develop some AI and machine learning boosted solutions, right, in your enterprise, right? That's the center. Are there any other questions? Oh, that was one by Reinhard, I think. He says, if you look at another one
Yeah, yeah, the very last one. Do you want me to read it? Yeah, wait a second, it's in the chat. Yeah, in the chat; I'm going there now. Okay. One more. Oh wow, Reinhard is active today. Time scales differ a lot in a network, from parameters that change on an annual time scale to parameters that change on a millisecond time scale or faster, depending on the use case. Would you say that the shorter the time scale, the more limited the use of machine learning? It depends, for example on what machine learning tools one uses. If one uses, say, deep neural networks or things like that, where you have to do a lot of training, then the answer is yes. If, on the other hand, you use online learning tools in data-rich settings, then even if the time scales are short, as long as you're getting enough samples within those short time scales, you can do things which might be interesting. So I think it depends. Okay, I think that's enough now for the Q&A, and I thank you again from the bottom of my heart. Ness, again, you delivered an excellent talk, as usual, and I will let Alessia take over and ask you her questions. So again, I look forward to seeing you. Okay, thank you very much. Thank you for moderating. Thank you very much, Professor Shroff, for this very informative session. I would like to move to the Wisdom Corner, Live Life Lessons, which is based on the idea of giving a unique, special angle to this webinar series: successful researchers like Professor Shroff today will share with us some lessons learned over the years, and they will guide young students and young researchers in the field of current ICT research. So I would start with the first question, Professor Shroff: which hard-earned life lesson, or failure if we can call it that, would you like to share with us today that might perhaps be useful for the participants attending? Sure, so
I guess what I would probably say is this: never let anyone else define what you should and should not work on. Oftentimes, especially when I was a young researcher, people would say, oh, famous person X worked on this problem and has not solved it, how will you? And that's a very discouraging message, but usually it comes, I found, from critics who are maybe well-meaning but really have not accomplished anything of significance. My own feeling is that breakthroughs happen because you believe in yourself, you believe in the problem, you believe in a mission, and that can take a lot of time. So you just have to ignore the noise and do it. I'll give a very concrete example, an exchange I had with a student, a brilliant student who is now a famous professor in his own right. I had given him a problem which was an open problem, and this was a very smart student; each week he would come and tell me, Professor Shroff, I don't think this problem can be solved, because of X. And I would say, you know, this is a very thoughtful rationale, but in fact you've made this small error in your logic, and therefore I still don't believe that the problem can't be solved. And we went back and forth with this; week after week he would come up with another reason why the problem couldn't be solved. It took a year, and in the end he was so fed up with me telling him that the problem can be solved, and giving him counter-examples for his arguments, that he eventually just did it and solved it. So the point I'm trying to make is that if I had believed from the very beginning that this open problem couldn't be solved, I wouldn't have persevered that hard with the student, right?
So I think that's really important: if you really love what you do, you take the time to do it, and sometimes you're successful, sometimes you're not, but that's the game of research. Thank you, thanks a lot. So, second question: which strengths and capabilities do you believe students should be most focused on developing, and how would you suggest they accomplish this? Yeah, a great question. The first and most important thing is choosing a good problem, a problem that will have a significant impact, either in theory, if that's your interest, or in practice, if that's your interest. This is very, very important: you don't want to waste your time working on incremental problems just because you can publish one more paper. So choosing a good problem is important, and oftentimes being the first actually makes it easier to have a big impact; if you look at some researchers, when they've been first, they've been able to make a big impact and move on. So looking for problems where you are the leader and not the follower is important. The other thing is: be willing to move out of your comfort zone and try new things every few years. If you've been working your entire life on just one class of problems, you haven't really learned anything new after a while. So I think what you want to do is, from time to time, be willing to change fields and move into different areas. And most important for young researchers: learn to communicate effectively, because you really are the most important advocate of your work. It's amazing; we see a lot of papers that are technically not good that somehow get a lot of visibility because the idea is communicated well, and then you see the
reverse as well: you see papers that are technically outstanding that almost nobody pays attention to, because the way the message is communicated is very poorly done. Yeah, very clear, thank you. And specifically, in which fields and on which topics would you recommend students to study? Yeah, so I think it would really be foolish of me to tell people what they should do, but my suggestion is this: find the area that excites you the most, because we are living in very exciting times, and there are so many grand societal problems that we can contribute to, whether it's in big data and AI, or applications like self-driving, intelligent transportation, automation, unlimited clean energy, healthcare, virtual reality, quantum; the list goes on and on. So my suggestion is find something that really excites you to work on, and then go for it. Thank you. And we talked about some hard-learned lessons; tell us about one of your most tangible contributions, one that you believe had a direct impact on your life or on others' lives, that you are very proud of. Thank you; you know, that question is a bit like asking which of your children you like the best, but perhaps, if I think about it, our work on opportunistic scheduling might have had the most impact, since, starting with 3G wireless systems, these opportunistic scheduling algorithms are now part and parcel of every phone, whether it's 3G, 4G, 5G, or 6G, and they will remain so. So that's probably the most impactful if I had to pick one, but I would say that I've enjoyed working on a vast variety of problems. Yeah, I can imagine, actually. I remember, on probably the last slide of today's presentation, you gave us some links; I will definitely look at them for material of interest for young researchers. The first one is career advice and philosophical musings. I'm pretty interested in these
philosophical musings by you if you can share even one with us now you you know it comes to your mind like that oh i i think that uh you know as i as i mentioned right earlier right i think i think the most important uh aspect of doing research and i think the one that gives us the most pleasure is if we really do something that we ourselves are excited about right because let's face it we are in a field that's not we are not becoming multi-millionaires we are not sort of you know this is not our goal right if if that was our goal there are many other things that one can do uh so make sure that whether you're doing it for a phd or make sure that you know whether you're doing it for your career do something that you really love and do it your way right don't be a clone of anybody else you know uh you know you know don't be a clone of the of the greats don't be a clone of your advisors don't be a you know just do it you know in your in your own way you know find what you are really good at and sort of do it that way wonderful thank you just last one i promise just to close if you can share with us if you have in your mind a motto and aphorism or or like a book that you loved a piece of art or music that you would like to share with us that represents you yeah so i guess i sort of you know led led to that maybe frank sanatras i did it my way that's wonderful yeah thank you so much uh it has been a pleasure to have you as our speaker for this webinar today thank you thank you very much and thank you ian for being with us today thank you all participants and do you want to do you want to share probably ian some final words again thanks alessia for leading this life lessons session and again i thank you nes for this excellent technical talk and i'm sure that a lot of young researchers on the list get some stimulations to conduct research on this topic and what i can say is nes keep up your good work you always produce excellent results and thanks again for joining us and 
i ask everybody to submit their paper to our journal and if you have any ideas for special issues please do not hesitate to submit your ideas for a call for special issues and many people are writing great comments about the webinar today nes i'm sure you also read them here and everybody say everybody says great talk and and professor shroff looks the best so and yeah thank you again also to the audience and on the 22nd again online for another webinar with professor medar from mit great yes so we look forward to that too we had always these superstars in our webinars and thank you and have a nice day and night wherever you are i will have a dinner here now in the market fish market in abadabi so thanks a lot thank you alisa for arranging everything uh been a real pleasure and uh looking forward to uh you know you're in from all of you take care thank you thank you so much bye bye bye bye bye bye bye bye