So, welcome, and we've got a very exciting and very diverse panel here on the future of AI hardware. I'm just going to go back to something from when Dario was talking earlier this morning: if he's talking about the energy requirements, I mean the compute requirements, doubling every 3.5 months, and you follow that projection out, I think the estimate right now is that by 2040, AI on this current trajectory will exhaust all of the energy in the world. That obviously is not going to happen, but what we have today is a panel that's going to talk about how AI hardware is going to advance. So I think what we're going to do is bring up each of the panelists, they'll introduce themselves quickly and do a very short, five-minute introduction on their point of view, and then we'll go to Slido and start the conversation. Sound good? Everybody still awake? Awesome. Okay, I think with that we're going to start out. Come on up, Andrew, and if you could, just introduce yourself and your affiliation really quickly, and then we'll... I think I don't need that. Oh, no you don't. Do you need any hardware? No, I'm just going to use this. Okay. Some hardware? That's got huge hardware. Use that, actually. That would be good.

Hello everyone, I'm Andrew McCallum, on the faculty at UMass Amherst, where I also direct the Center for Data Science, and I'm very grateful and happy to be a part of this IBM collaboration. In two days, at a workshop, I'll talk about my research in natural language processing and knowledge base building, but here I'm really here to talk about a sort of side project on understanding energy and policy considerations, which was really instigated by one of my students, Emma Strubell, who's now on her way to a faculty position at Carnegie Mellon. All right, so we're facing some large problems in the world, and AI can be part of the solution by helping us do better evidence-based decision-making in the midst of these problems. There are many sources of that evidence, and one of them actually comes from text. There's a large quantity of information in text, and let's consider for a moment scientific text. As one example, we are working together with an MIT professor of materials science named Elsa Olivetti, starting with hundreds of thousands of research papers in materials science and extracting from them the recipes. Sorry, there's a really bad delay with this remote. The recipes for synthesizing new materials for the car batteries of the future, or better recycling processes, so that once we have this large collection, we can infer perhaps brand-new recipes that have never been tried before and that may have good properties. Understanding this text involves quite a bit of natural language processing, understanding its structure to really understand its deep semantics, and the operation that's mostly needed is called syntactic parsing. The reason it's reasonable to actually try to apply it in this environment now, or to this type of data, is that accuracy has, after much research, arrived at a point where it's really useful. Maybe back in the 1990s, when people were using context-free grammars, we had accuracy less than 75%, but now, thanks to machine learning and now deep learning, we have accuracies up in the 90s. And so now doing natural language processing is not just a matter of academic interest that we write up in papers, but a matter of large-scale processing by massive companies that are trying to run this on the entire web.
And so that makes its efficiency quite important. And what do these models actually look like? Well, they're transformers. These are attention-based models of the kind that Yoshua mentioned at the beginning of the day. So starting with word embeddings for all of the tokens, projecting them into three different spaces through various transformations, and then making an n-squared comparison amongst all of them to decide the attention weights through which information is shared across all of them. Then furthermore, there's not just one attention head, but many attention heads. The results of all of them are then brought together through further processing, and the result is then brought back down to the beginning, where this entire process is run again many, many times. So you can see that it involves a tremendous amount of computation, tens of millions or hundreds of millions of parameters. And there's been a whole proliferation of new machine learning models with larger and larger computation that, for some reason, seem to be named after Sesame Street characters. So there was ELMo a few years ago, which really caused some big gains in natural language processing accuracy. Now, of course, many of us know about this method called BERT. And here I've plotted the scale in GPU hours to train on the horizontal axis, and this is on a log scale, right? So it's really growing very dramatically. Up at the top there is a method, the Evolved Transformer, that I'll be talking about more in a bit. So this uses a tremendous amount of processing power, a tremendous amount of energy, and therefore also emits a lot of carbon, putting carbon into the atmosphere. So AI is not only part of the solution, it's becoming part of the problem. I want to put this into perspective in the following way. If you fly from New York City to San Francisco, you've used almost 2,000 pounds of carbon dioxide. The average American life uses about 36,000 pounds a year. And a model that we in my lab trained and sent to EMNLP, where we were fortunate enough to win the Best Paper Award, the entire research and development of that model took more than two years' worth of an average American's carbon to produce. Something that we weren't really aware of until we thought about this. And if you take this large transformer with neural architecture search, then you get something that looks more like five times the carbon of the entire lifetime use of a car to train, to develop one model. That's the manufacturing of the car, gathering all of its materials, producing the steel, driving it, all of its fuel that entire time. What did this model give us? It gave us a 0.1% improvement on a machine translation task. I don't think the researchers were really thinking about the impact of what they were doing. So let me just wrap up by saying that this uses a tremendous amount of carbon. It also costs quite a bit. These are dollars calculated in terms of cloud computing credits. And hardware can help. If you use, say, a tensor processing unit instead of a GPU, then that helps somewhat. But this is not a solution to the problem; it only makes things incrementally better. All right, so first of all, researchers should just be aware of these issues. And when we publish a paper, we should let the community know how much training time it took and what the sensitivity to hyperparameters was, which affects how many times you need to run it if you're going to apply it to new data.
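To make the attention computation Andrew walks through a bit more concrete, here is a minimal single-head sketch in NumPy. It is purely illustrative, not code from any of the panelists; the toy shapes and randomly drawn weight matrices are assumptions chosen only for clarity.

```python
# A minimal single-head scaled dot-product attention sketch (illustration only).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (n_tokens, d_model) word embeddings.
    Wq, Wk, Wv: projections into the query, key, and value spaces."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # three learned projections
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # n x n comparison of all token pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax attention weights
    return weights @ V                           # share information across tokens

# Toy usage: 4 tokens, model width 8. A real transformer runs many such heads,
# stacks many layers, and repeats this over millions of sentences, hence the cost.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)   # shape (4, 8)
```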
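As a rough illustration of the kind of reporting Andrew is suggesting, here is a back-of-the-envelope sketch that turns logged GPU hours into an energy and carbon estimate. The power draw, datacenter PUE, and grid carbon intensity constants are placeholder assumptions, not figures from the talk; anyone reporting real numbers would substitute their own.

```python
# Back-of-the-envelope training footprint estimate (all constants are assumptions).
GPU_POWER_KW = 0.25          # assumed average draw per GPU, in kilowatts
PUE = 1.5                    # assumed datacenter power usage effectiveness
CO2_LBS_PER_KWH = 1.0        # assumed grid carbon intensity, lbs CO2 per kWh

def training_footprint(gpu_hours, runs=1):
    """Estimate energy (kWh) and carbon (lbs CO2) for `runs` training runs,
    each consuming `gpu_hours` of GPU time."""
    kwh = gpu_hours * GPU_POWER_KW * PUE * runs
    return kwh, kwh * CO2_LBS_PER_KWH

# Example: a model that took 1,000 GPU hours, retrained 10 times during
# hyperparameter search.
kwh, co2 = training_footprint(1_000, runs=10)
print(f"~{kwh:,.0f} kWh, ~{co2:,.0f} lbs CO2")
```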
Some academic researchers are fortunate to have massive computing resources, but others not so much, and it's really quite expensive. We need to think about equity. I'm almost done. Number three, researchers should prioritize and really think in their research about computationally efficient methods. And this applies to researchers in NLP, in machine learning generally, and in systems. We need to think about algorithms, like, instead of grid search for hyperparameters, thinking about smart Bayesian search methods. And I think the following panelists are going to talk more about efficient hardware for doing these things. And then when we come up for the discussion, I'll have more to say about algorithms. With that, I'll close. Thank you. Looking forward to the discussion.

Next we'll hear from Professor Song Han. And Song, if you would come up and introduce yourself briefly and give us your thoughts. Clicker? Yes. OK, great.

Hello, everyone. My name is Song. I'm an assistant professor at MIT EECS, in the Microsystems Technology Laboratories. I joined MIT last year. Before that, I did my PhD at Stanford, where I proposed a technique called model compression, with which we can aggressively compress the size of neural networks by 10 to 50 times without losing accuracy. So the slide is going backwards. Can you switch to the first slide? OK. Yeah, so as the previous talk mentioned, AI has lots of computation requirements. So I proposed a deep compression algorithm, where we prune the redundant parameters and use quantization to shrink the bit width per parameter. As a result, we can reduce the model size by 10 to 50 times. However, that causes another problem: it's hard to run those sparse, compressed models, because the compression brings sparsity, which is not efficient on conventional hardware. Therefore, I proposed the efficient inference engine that can work directly on the compressed small model rather than inflating the model to run inference. So recently at MIT, we are working on making AI efficient with any resource, including both hardware resources and also human resources, especially human resources, which are very expensive. We want to apply AutoML techniques to make AI design better AI. And then for computational resources: for inference with low computational resources, we want to have a small model with small computation, and when you have a lot of computational resources, you want to have scalable distributed training. So we work on this algorithm and hardware co-design. First, we propose new algorithms and adapt them to require less hardware computation. And also, we are designing efficient hardware that provides more computation. And with a human in the loop, we are working on AI-assisted design automation to automatically optimize the neural network and also automatically design the hardware. So this is the big picture: conventional AI requires big data, computation, and engineers. In modern times, we want to do design automation to reduce the amount of engineering effort needed to tune the parameters for both the neural net and the hardware, by using a lot of computational power and using the data. And later, we are also aiming at reducing the amount of computational power needed to train, to do neural architecture search. For example, in our recent work covered by MIT News, we are able to design neural architecture search algorithms that are 200 times more computationally efficient than conventional NAS methods. Actually, previous neural architecture search took more than 48,000 GPU hours.
If you convert that to US dollars, that's roughly 100K US dollars, which is about the cost of supporting a PhD student. So when I had just joined MIT last year, I didn't have those computational resources. So we designed a more efficient algorithm that can do neural architecture search directly on the target hardware, called ProxylessNAS, which takes only about 400 GPU hours, which is more like the cost of having a group lunch or group dinner. So we can shrink the computational resources but deliver equally good models that run fast on your hardware, using AI to design better AI. And we find some structures that the AI found that humans previously didn't find, which is very interesting. And also, talking about when you do have lots of computational resources: IBM, last year, introduced us to the Summit supercomputer, where we are able to train with more than 1,500 GPUs. So previously we didn't have GPUs, and now we suddenly have so many GPUs; can we utilize them well? My student came up with an efficient model that scales really well to 1,536 GPUs. Previously, training those video understanding models took two days; now it takes only 14 minutes, more than a 200 times speedup. And by designing the small models smartly, it takes less networking and less I/O. That's the reason it can be more scalable than those conventional methods. So we have some demos on efficient video understanding that we can show you later. In conclusion, we showed the beauty of efficient model design, with hardware and algorithm co-design, to enable scalable distributed training and efficient inference on those power- and energy-constrained devices. Thank you.

Next, we'll hear from Professor Vivienne Sze. So Vivienne, just introduce yourself briefly. And then, thank you very much.

All right, hello, everyone. I'm Vivienne Sze. I'm a professor in the EECS department at MIT. And so I'm excited to share with you some of the research that we've been doing in our group and how it might relate to the future of computing. In particular, I'm going to emphasize the work that we've been doing on efficient computing for AI and robotics. I just want to highlight that this is work in collaboration with Sertac Karaman, Joel Emer, and Thomas Heldt. So a lot of the exciting algorithms that we're seeing today are happening thanks to a lot of processing in the cloud; we saw just two talks ago the huge carbon footprint that it emits. So there are a lot of computational challenges there. A lot of the research in our group focuses on how we can move the computing from the cloud out to the edge, and then do the computation locally on the device. And there are many compelling reasons to do that. The first reason is communication. You don't want your system to have to rely on very large communication networks, and so if we want this technology to reach people around the world, we want to have less of a reliance on communication and bring the compute locally. Another reason is privacy. There are a lot of exciting applications for AI these days that deal with very sensitive data. So for example, in the healthcare space, when you're collecting a lot of sensitive data, you might not want to push it all to the cloud. So again, if you can bring the processing locally to where the data is being collected, you have much more privacy and also arguably more security.
And then finally, if you're talking about applications where you interact with the real world, so for example self-driving cars or robotics, latency is really important. If you're driving down a highway really quickly, you might not be able to afford to wait for the time it takes to send the data to the cloud, wait for it to be processed, and come back before taking an action. You really want to do the processing locally to meet these latency constraints. So these are all the reasons why we want to move the computing from the cloud into the device itself. Now, the main challenge in doing this is actually power consumption, right? So, let's see. Existing processors consume way too much power. If you're thinking about self-driving cars, often we're talking about up to 3,000 watts of compute. And this is going to be even more of a challenge if you want to shrink it down to something that might be the size of the palm of your hand. So for example, your phone or even smaller robots: on these very small form factor devices, you can't afford to have a very large, it's working, yeah, a very large battery, because of the weight and the size of the battery, so your energy is limited. And then if you look at existing embedded processors that are typically used for this edge processing, they typically burn over 10 watts, whereas on these handheld or small devices, because of the heat dissipation, you want the power consumption to be under a watt. So how do we go about doing that in our group? We look at a kind of holistic, cross-layer design. So similar to what Song mentioned, we look at how we can build new algorithms that are not only accurate but also account for the complexity and the energy cost, right? How do we make them both efficient and accurate? How do we then build new hardware to support these algorithms? So often there's a co-design aspect, where you want to have the hardware in the loop when you're designing the algorithm itself. In terms of designing new hardware, you typically need to design the hardware from the ground up, so you might have heard a lot about domain-specific architectures recently. This is because transistor scaling, Moore's law, is really slowing down, so the only way to get more efficiency is to really strip down the computers and design them specifically for a given task. And really what we focus on here is reducing data movement, as it turns out that moving data is much more expensive than computing on data. It's kind of like communication: communicating data is more expensive than computing on it. So even within the hardware itself, we try to figure out ways to design new compute architectures that reduce data movement. So this is a chip called Eyeriss that I designed with Joel Emer, where we really focused on reducing the power consumption for inference, so it can do inference in under a third of a watt. And then finally, what's also really important is that we want to look at how we integrate this compute hardware into actual systems. So for example, if you're thinking about applications like robotics, you want to look at the interplay of computing, sensing, and actuation, and optimize across the entire thing. You want to take a really holistic approach. And so what can you actually do by enabling all these types of optimizations for energy efficiency? I'll just give you two quick examples.
One is that it turns out there's a wide range of robots out there that actually consume very little power in terms of actuation, right? To interact with the real world. So for example, you can imagine lighter-than-air robots that are kind of like blimps, so lifting them is very cheap; miniature satellites that do deep space exploration, which don't require too much to stay up there; or even very small origami robots you can imagine for healthcare applications. So for all of these things, the actuation power is really low. As a result, the computational power, if you want to make these autonomous, is also very critical. And one of the key first steps that we've taken towards this, in work with Sertac Karaman, is designing hardware for robot localization. If you're going to do autonomous navigation, the first thing you have to do is figure out where you actually are in the world and where you're actually looking. So you can imagine, for example, here's a video of the camera feed coming in, and we're figuring out where the robot actually is in 3D space. So together we built this chip called Navion that really optimizes for this. Again, Navion really focuses on reducing data movement. We're actually able to keep all the data for this navigation process, for this localization process, on the chip itself. And as a result, we're talking about orders of magnitude of energy reduction compared to a general-purpose processor. Another application space that I'm really excited to share with you is this work on monitoring neurodegenerative diseases. We know that dementia affects millions of people around the world. One of the big challenges here is that often the way in which you assess a given disease is very qualitative. So a patient might go to see a doctor and get asked a series of questions; they might be asked to draw a clock. And the specialist will assess them, and it's often very expensive to do that, so it happens very infrequently. It's also very qualitative. And so as a result, we don't get that much data to help analyze the patient. What's been really exciting is that recently it's been shown that eye movements are correlated with these neurodegenerative diseases. And so if you can measure the eye movement, or the eye reaction time, it does show some indication of these diseases. Now, typically right now, these types of measurements are also taken in the clinic, but if we can bring AI onto the phone and do a lot of the processing on the phone, you could actually do these tests on your smartphone itself, get an assessment, and as a result take frequent measurements in the home, and this gives the physician a much richer data set to help assess the patient's state of mind. So this is in collaboration with Thomas Heldt, and I should also mention that we've recently been funded by IBM for this coming cycle. So those are just two examples, but really the key takeaway here is that energy-efficient AI can really extend the reach of AI to a wide range of applications beyond the cloud itself. So it's very promising for making AI ubiquitous, but in order to achieve that, we really need to take a cross-layer approach and look at the algorithms, the hardware, the applications, and the system itself to truly enable all of this. There's more of this on our website if you're interested. Thank you very much.

Thank you. Thank you very much. And finally, we have Jerry Chow from IBM. And Jerry, if you'd just introduce yourself briefly.
Sure, thank you very much. So my name is Jerry Chow. I'm a senior manager of quantum technology at IBM T.J. Watson, and I'm coming to you today on this panel from the other direction: quantum computing. If you listened to Dario's keynote this morning, you heard, effectively, the story of the future of computing in three different parts, right? So you have bits, you have neurons, and then there's the qubits part. And so I want to talk to you a little bit about where that really starts to come in, how deployable systems start to make an impact, even today, in terms of actual access for the community, and really build up the basis for the future of computation. So an interesting thing to ask when we look at computation is, effectively, what are the limitations of the systems that we have today, right? I think a lot of us are here because we are always engaged with where computing takes us in the future. We know that there are easy problems that are handled in polynomial complexity by our traditional bit-based computers. But then we also know that there's a class of problems which are extremely hard, right? These are ones which scale exponentially in the number of features, which scale exponentially in the number of variables, and these include things such as simulating quantum mechanics itself, understanding chemistry, and in particular some types of very deep neural networks with a lot of features. So these problems we know are difficult for traditional computers, and so are we at the limit of what we can do with our computational model? Fortunately, the answer is no, and that's really thanks to these beautiful ideas of bringing quantum mechanics back into information theory. This goes back to the 1970s, when Rolf Landauer and Charlie Bennett were able to really put the quantum mechanics and physics of information into play so that we can find a more fundamental understanding of how to process information. That of course led many other thought leaders to think about how you can actually perform computations using quantum mechanics, to the point that we define a different element for computation rather than just a bit: in fact, a quantum bit, or qubit, which allows us to have these quantum mechanical effects such as superposition and entanglement. It effectively changes the rules of how the information is processed in the underlying computation. And once that occurs, we're able to potentially find ways of extracting more from these systems, more efficiently than you can with traditional computers. So putting it in a pictorial way: in our traditional computers, we have inputs and we have outputs, and we want to perform operations bitwise on those inputs and outputs. But with a quantum computer, using these quantum mechanical principles of entanglement and superposition, we have much richer access to the computational space. It's effectively exponential in size. And so in between these inputs and outputs, you blow up to, say, two to the n states, if you have n qubits. And now the idea of a quantum algorithm is: how do I best use those different paths to interfere, to destructively interfere the wrong solutions and constructively interfere the right solutions, so that at the end you get an advantage in performing the particular computation that you'd like to do? And so really, now this is where we can start looking into those problems that were more challenging.
So finding those ones which scale exponentially, in optimization or chemistry or machine learning, and trying to address them using quantum mechanics. So we are now actually building real systems. At IBM we've been working on building superconducting qubits, which need to be cooled down to extremely low temperatures using cryogenic infrastructure. This is a picture of our most advanced system. It's called IBM Q System One. And it's been engineered so that we get really high performance from the underlying quantum processors, so that we can actually start to do really, really good science on it. And even at this current stage, where we are with these quantum processors, which is still early in terms of the total numbers of qubits, we're already able to start to explore some of these problems that we're talking about. So looking at molecular structure for simple molecules, as well as looking at classification problems: actually performing a quantum support vector machine to carry out a classification task using a simple quantum processor. And these are results that we were able to publish just this past year. Now, the point here, though, is that it goes beyond just the research done in our lab. We're really excited that we are able to have these systems hosted on the cloud for anyone to access. We launched the IBM Q Experience in 2016, which at the time was just a five-qubit processor on the cloud. And now we've expanded to a large number of processors that are available with different levels of access for clients and for public open access. But we've really been excited about the number of users that we've been able to get engaged with quantum computing, as well as the research output that's come out of it: 190-plus papers in just over three years. That's something that I know we wouldn't be able to do just ourselves within a laboratory. And we also see this as an important part of building both the usage side of these systems and the software stack. So a lot of the work that is going into quantum computing today is to actually build the open source software infrastructure. For us, this is called Qiskit. And we have various different levels of APIs so that different end users are able to enter and actually make use of the quantum processors, to learn how to program a quantum system. With that, I just want to end by saying that right now we're in a really exciting phase, where we went through this transition from science into becoming quantum ready, where we want to engage the community and get broader buy-in in terms of learning about quantum, so that in a few years, as we improve these systems and get to the point where we can actually have an advantage over traditional computers, we're able to leverage that and have the community and businesses use it. Thank you.

So we're going to get ready and get our panelists up to the stage. Thank you very much, Jerry. And then I'm going to get a crash course in how to use the crowdsourcing of questions. I guess it comes up here. Excellent. Should we sit in order, or? However you guys would like. I think, however, let's see. Let me get out of the way. No, I think we're all good for my question. The list, the list. Good. I guess we'll... I think we can sit anywhere. That's okay. So if it's okay with people, I'll ask a couple of questions and then I'll go to Slido. Is that all right? Was that allowed? Who's my boss? Okay. Vivienne, I was very taken by...
First of all, thank you all. And I was very taken by the specialized hardware. I'm a hardware geek. I've got more horsepower in my pocket right now... I brought all my specialized stuff so you could sign it. But I've lived in this world for a while, and I know that when you start to do customized hardware, there's always this trade-off between customizing the hardware and sort of optimizing the software, the sort of urge to optimize early. What's the trade-off between general purpose with better software and specialized hardware?

So it's always kind of an open question, the trade-off between flexibility and efficiency. Ideally you'd like everything to be flexible, so you can be future-proof, but there are often costs associated with that. Whereas if you have something very specialized for a given task, then you don't have all that flexibility overhead, and it can be very efficient. So I think it really depends on the type of application. I've worked on a variety of applications throughout my career, starting with video compression. That's a space where energy efficiency is really, really critical. You have video compression on all of your phones, and hard-wiring it is very key, and it actually works out well because there's some form of standardization. You still need a little bit of flexibility on the encoder side for companies to differentiate their products. At the other extreme, you have things like deep learning, which is continuing to evolve. So obviously a lot of flexibility there is very critical if you want to keep up with the new algorithms and the new models. And then there's some place in between: in terms of robotics, you want it to still be efficient, but you want to be able to adapt to, for example, different environments where you navigate. So you still have to have some form of flexibility there. So from my view, it really depends on what the application is and where on that flexibility-efficiency curve you want to be.

Makes good sense. For Andrew, I'm going to actually, is this okay? I'm going to go to the second question, which I was going to ask about when do you think that quantum and, I'm sorry, I was going to ask Jerry this. Oh, I was just like, all of a sudden I went, wait a minute. Jerry, when do you think quantum and AI are really going to intersect, specifically? And what do you think those first steps would be?

Yeah, no, that's a really great question. I think that quantum computing and how it intersects with machine learning is a really, really hot topic right now. There's a lot of research going into how those threads mesh, right? And the whole concept behind it is really: how can we use the fact that quantum computing, these qubits, gives us access to this very, very rich and large state space of exploration? How do we use that to effectively look at machine learning type problems, where you also might have a very rich and deep feature space? Are there connections there where we can in fact find classification planes using the quantum space more efficiently than you would with a particularly deep neural network? So there's a lot of thought about how you would actually map problems with classical data that you want to classify onto these quantum computers. But all I can say right now is that it's an active area of interest, there's a lot of research going into it, and I think we're really excited about it.

Jerry, I had a follow-up question, if that's all right.
Please, that's even better.

So I'm curious about the energy usage of quantum computing. It's clear that quantum computing can do massive searches very quickly, but my guess is that the energy consumption is pretty large, and that it's larger now than it would be had we done the same computation on silicon hardware. Is the crossover going to happen?

So actually, the energy consumption is not very large. Even though we are using these systems to cool the processors down to very low temperatures, that's a fixed amount of power. Effectively, you need to run a couple of cryocoolers, which take up some energy, but it's certainly nowhere near an HPC center's amount of power. And for that matter, we're currently starting to build these systems to the point where the number of qubits within them, that state space, is starting to grow large enough that you can't simulate it on a traditional computer. Any traditional computer. And so as we get towards this realm of, say, 50 to 100 or more qubits, you're really pushing up against that barrier where at least some aspect of the system is beyond what you can simulate with any high-performance computer.

Good news. Andrew, let me ask you, just to go back to the point you made on power, which I was really taken by. Do you foresee a day when we're actually getting a carbon price tag for our algorithms? Like, where we would choose how much hyperparameter search to do? I mean, cost is one thing, right? We know how much it costs. But do you foresee a day where we'll actually know the carbon footprint of our calculations and make different choices?

Yeah, I'm arguing that that's the way researchers should be thinking now, and that they should be reporting figures that would enable carbon to be calculated as part of their current papers. And I think that advances in hardware, well, I think quantum computing is almost just a different animal, and so incredibly wonderful that I'm not sure quite what to say about that, but in silicon computation I think that advances in hardware design, like TPUs and other similar things, are going to be helpful. But in a way, I say with some guilt that the machine learning community will just hungrily consume whatever we're provided with, and I don't really think of that as a major solution. What I think of as a stronger solution is something that actually relates to what Joshua Tenenbaum was talking about earlier: learning that's more human-like, learning that's done in a lifelong fashion. Rather than training a model from scratch for every new problem, we'll be able to build on a lot of training that's already been done, and to learn some new related task we just do a small amount of training on top of that. And for all of the complaints that I had about BERT and how much carbon it's using now, that's already, I think, a small example of a step in that direction, because Google released the trained parameters for their BERT model, and there are many people who just consume those parameters and do a little bit of training on top. I'm arguing that that should happen a lot more in the future.

So to take that point, ask Song, and go to the Slido: when you think about the trade-off between hardware and software, how much do you think hardware can solve this problem versus software algorithms? How do we allocate our effort? I know we have to do both, but where do you see the biggest energy savings being?
Do you think that there's a magic bullet in hardware, or do you think it will come through software and algorithms?

Yeah, that's a good point. I can start by giving an example from my previous work, with deep compression on the algorithm side and the efficient inference engine on the accelerator side. With deep compression, we can make the model an order of magnitude smaller. So that's roughly 10 times, if you think about it. And from the hardware side, we can squeeze everything from DRAM into SRAM. So that goes immediately from about 640 picojoules per word down to only about five picojoules per word, which is roughly two orders of magnitude of savings. So that's just one example: about one order of magnitude of savings from the software, and about two orders of magnitude from the hardware. However, it's not possible to get this DRAM-to-SRAM saving without the model compression. If the model is too large, you still have to access DRAM, and it's not even possible to get those two orders of magnitude of savings from the hardware side. So the co-design is really important.

That's good. Vivienne, one of the things that I think about is that the answer, some of it, is going to come from hardware innovation. I mean, when I think about the edge, I think about the huge diversity in standards and so on. I was an IoT guy, and I kind of wonder, do you envision large-scale deployments of edge-based IoT? What would drive the unification of that? Would it be a single addressable software environment, or do you think it's always going to be "it depends"?

I think it depends. I think that's a very big business question, right? It's a very fragmented space right now, as I understand it, the IoT space. So how to unify it, and what it is that would bring it together, is certainly...

Well, can I push a little harder on that? Do you think that some great software innovation or standardization would drive the hardware people to kind of get in line? A combined cloud-and-edge kind of thing that would be so compelling that people would converge?

If people are using it, then the hardware people will come and provide hardware for it, but it's kind of a chicken-and-egg thing, yeah.

That would be kind of interesting. I don't know what the answer is. You have a...

So I had a related question to tack on the end. Would you say that, on the whole, putting computing at the edge reduces power consumption, or are there economies of scale in having the computation happen mostly in a central location? And I know that for individual systems it could go either way, but I just wondered, is there a broad trend that you could point to?

It certainly saves. One of the most expensive parts of computing at the edge is communication, right? Moving data is always expensive. So moving data far away, even off the chip to the cloud, is expensive. So I think you can factor that in. In addition to the energy, there are a lot of benefits of doing it at the edge, like I was talking about, in terms of latency and privacy. At the edge you also have the benefit, I guess, of diversity, where you can really tailor it towards your particular application, whereas in the cloud it tends to be more general purpose because you have to serve many clients, right? So for that flexibility versus efficiency trade-off, you might make a different decision at the edge, in which case you can be more efficient.

So I'd like to think about these ecosystems, and Jerry, I'll go to you.
And it's been very interesting to see how, you know, in quantum you've worked very hard early on to build that ecosystem, to train people, and the sort of crowdsourcing of ideas. What kinds of insights have you gotten from people outside of your own development area that have influenced the actual evolution of that? And Song, I'll ask you a similar question on hardware too, yeah.

Yeah, that's a great question. Especially with the Q Experience, with the software stack as we've defined it within Qiskit, it allows a broad array of end users to contribute. So we have those that want to sit at the very top, who want to explore, say, new applications and algorithms in chemistry or machine learning or optimization. You have those that want to be in the middle, who want to study some of the nitty-gritty details of what you would call the guts of the software stack. So things like compilers, coming up with new compilers that allow you to translate the end algorithms into a smaller number of the underlying gates. We've seen people develop faster compilers, in fact, than we had in our lab. As well as coming up with ways of characterizing noise and visualizing some of the end states. You see a lot of people looking into trying to benchmark the quality of the underlying processor, because that itself, as these quantum processors grow, becomes more and more difficult a challenge, to benchmark them properly, and you see a lot of people innovating and thinking about ways to do that.

That's really cool. And a somewhat related thing you're doing, Song: you're doing AutoML, right? And so as we start to move towards automation, are you finding that the algorithms are giving you insights that a human designer, a network designer for neural networks, would never have? I mean, are they coming up with something that's non-intuitive?

Yeah, that's a good point. So recently we've been working on AutoML techniques to automatically find new neural network architectures and also hardware architectures. And we did find some things, since there's a super large design space. Humans usually rely on rule-based heuristics, like you should use three-by-three small kernels, bypass layers, inverted bottlenecks, those rule-based heuristics, those primitives. But if you open up the larger design space and just give the AI a reward function, telling it what is good and what is bad, it explores by itself what a good architecture is, one that not only runs fast but also gives good accuracy and low energy. And we found some interesting phenomena. For example, on GPUs, the AI will prefer a large kernel, even a seven-by-seven kernel, which human designers have almost forgotten or ignored, since people think that three stacked three-by-three kernels have only 27 weights, while one seven-by-seven kernel with the same receptive field has 49 weights. But people neglect the fact that the GPU has so much parallelism that you can't even keep it fully utilized; it stays hungry if you run small kernels. As a result, the AI agent decided to use lots of large kernels for GPUs, quite shallow networks but with large kernels, while for CPUs it prefers very deep layers and small kernels. And this is just between CPU and GPU with CMOS technology. Imagine in the future, when we have new technologies and different hardware, maybe the AI-designed architecture will be completely different from human designs. We can come up with heuristics, but it will be even better if we can get those new heuristics from AI.
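To spell out the kernel arithmetic Song refers to, here is a tiny illustrative sketch comparing three stacked three-by-three layers with a single seven-by-seven layer. It is an illustration only, counting weights per channel pair and ignoring channel counts and nonlinearities, and is not the panelists' code.

```python
# Quick check of the kernel arithmetic: same receptive field, different weight counts.
def stacked_receptive_field(kernel, layers):
    # Each extra KxK layer grows the receptive field by K-1.
    return 1 + layers * (kernel - 1)

three_by_three_stack = dict(weights=3 * 3 * 3,                    # 27 weights
                            rf=stacked_receptive_field(3, 3))     # 7x7 receptive field
single_seven_by_seven = dict(weights=7 * 7,                       # 49 weights
                             rf=stacked_receptive_field(7, 1))    # 7x7 receptive field

print(three_by_three_stack, single_seven_by_seven)
# The stack uses fewer weights for the same receptive field, but the single large
# kernel exposes more parallel work per layer, which is why a search can prefer it
# on a GPU that would otherwise sit underutilized.
```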
Just to ask kind of a follow-on question, adapting one of the Slido questions: do you imagine that there will be a time when most models are developed using AutoML?

I think most models can be developed using AutoML, but there are still some challenges. For example, the search cost is very high. We are trading GPU hours for human hours, right? For example, we mentioned the CO2 emission. We calculated that the initial neural architecture search from Google emits 11,000 pounds of CO2. And how much is that? That's like five flights, one person flying from New York to San Francisco five times; that causes 11,000 pounds. How many student dinners is that? That's 100K US dollars, like a PhD year. But we did see some opportunity. My students developed a smarter algorithm by making latency differentiable, so you can directly do back-propagation and search those architectural parameters, learning not only the weights but also the architecture. So we cut the search cost down to only about 50 pounds of CO2 emission. That's about what one human breathes out over two days; that's the amount of CO2 emission. So we still see some promising directions in using an algorithm to design the model.

That's good. Vivienne, there was a question up there that scrolled off, but in terms of the dance between memory and logic, what do you see as the migration toward memory-centric computing? We were talking about it earlier on. How do you see that playing out here?

Yeah, I think there's certainly been a lot of interesting research in the space of processing in memory and processing near memory. Since data movement is very expensive, the idea there is that if I can bring the compute to where the weights are, then I significantly reduce the amount of power that I need to consume. Actually, we have a project sponsored by IBM on this topic as well, and I think this echoes a lot of what Song was mentioning too, which is that one of the challenges there, if you're moving all the processing into memory, is that you might need to think about how to design your algorithms to be more tailored towards this type of processing architecture. So one of the big benefits, and also big challenges, of this processing in memory is that you can build very large arrays of weights to store your data, and that's useful for density reasons, and also to amortize the cost of moving the activations, or the inputs, in and out of the weight array itself. But with the existing networks that people have been using, they've been trying to make them smaller and narrower, and so as a result you can't really fully utilize the full array in processing in memory. So the question is: how can you redesign the algorithms to map better to this particular hardware, once again bringing the hardware into the loop when you're optimizing the algorithms for these even more exciting, future-looking platforms? So I think it's very promising, but it's important to look at it holistically, not just as a piece of hardware.

Oh yeah, I was going to ask you... anything? Go ahead. Actually, I was going to ask a separate question. That's all right, based on the third question that I see up there. Yeah, please.

So I understand that in the past, and maybe hopefully still, IBM was working on analog computing, which I think could be a great efficiency gain for deep learning, and I wondered about its status.
Yeah, let's ask that across the panel. So I'm an analog person. We all are, actually. Just keep that in mind. And there's a lot of promising technology. Tomorrow there's going to be a session, I think session three, where they're actually going to be talking about that. But logic has had a long run. Do any of the panelists anticipate that these kinds of new analog technologies, like analog neural nets, will have a major impact on AI before we move to quantum? Okay, good, got you down. So I know, Song, your group is actually doing analog; you're using AI to optimize analog circuits. What do you think is the promise of non-digital circuits for AI? I mean, we're using 20 watts here and, you know, megawatts there, so is there something?

So basically, deep learning uses multiply and add as its basic operations. So the way to minimize the power is to minimize the amount of switching in digital logic. So with analog devices, if we can reduce the amount of switching and also get the scale that digital circuits can provide, I think we can do some useful stuff, providing both a reasonable amount of precision and a reasonable amount of parallelism, as long as we can do that physically, yeah.

Vivienne, do you see that?

Yeah, so I think it aligns with, basically, so I'm going to read this as analog compute; I'm not exactly sure what analog neural nets are, but analog compute in the context of processing in memory is certainly a space that a lot of people are pushing on, and it offers a lot of promise. There are a lot of challenges too, in terms of the fabrication, the scaling up of the design, and, as we mentioned, the mapping of the algorithm onto that platform, but certainly in terms of density and the opportunity to minimize data movement, it has promise, and there are a lot of new devices.

So with that, if I understand what this thing is telling me, it's a digital-analog readout here, I believe that we've gotten to the end of our time. Could I ask everybody just to give, what do you think the biggest impact in AI technology is going to be as we progress? Just a three-word sentence. Is that too hard? Okay, everybody think for 30 seconds. What would I say? Anyone want to offer? This is The Graduate moment where somebody says, "plastics."

I would reiterate lifelong learning: not training systems from scratch, having strong priors.

Yeah, I'll go with several S's. S is the first letter of my first name, but it also stands for small models for inference, small and efficient; scalable systems for training, go large for training; and specialized silicon, specialize the silicon to make it efficient.

You've got three V's for us, Vivienne.

So what I would add is that, I mean, we talked a little bit today about how quantum computing could make a difference for machine learning. The flip side is actually true. There's a lot of effort, in fact, with the MIT-IBM AI collaboration right now, looking at how machine learning methods can help us understand and characterize the underlying quantum systems better, and from that, develop an understanding of the noise and build better systems for the future.

It's okay. All right, so I'll fit it into one V: vertical integration.
So looking across the entire stack, I think it's really important to look at all the different layers and how they interact, and really think about the application challenges that we're facing, and try to optimize across everything to do that.

Great. I think you all rose to that challenge very well. So thank you all very much. I think that is the end of the program for today. Oh, just my talk. Okay. I'm off. All right. Thank you very much. Thank you very much. That was great. All right. Thank you. Thank you so much.