Good afternoon, and welcome to the Purdue Engineering Distinguished Lecture Series. Today's lecture is hosted by the Elmore Family School of Electrical and Computer Engineering. I'm Dan Zhao, Synopsys Professor of ECE and Associate Head for Resource Planning and Management. Many thanks to our distinguished lecturer, Professor Malik from EECS at UC Berkeley, who delivered an inspiring lecture this morning on robots that can learn and adapt. Now is the panel session of this event, so it's a great pleasure for me to introduce the moderator of this panel, Professor Karthik Ramani. Professor Ramani is the Donald W. Feddersen Distinguished Professor of Mechanical Engineering. He is also a professor in ECE, my department, and a professor in the Department of Educational Studies. In addition, he founded two successful companies and holds 30 patents, many of which have been licensed, and his students, whom he has encouraged, have founded many successful companies. He is as multidisciplinary as one can be, being an expert in artificial intelligence and human-machine interaction himself. Professor Ramani is going to organize a panel that engages our distinguished guest, faculty panelists from multiple departments, and the audience on campus and online, to explore across boundaries and stimulate broad, thought-provoking discussion. So let's welcome our distinguished professor, Professor Ramani.

Thank you so much for the introduction. First, I want to give a very short introduction of Professor Malik. Not only is he an excellent researcher, he has also won excellence-in-teaching awards, and he is a member of the National Academies of Engineering and Sciences and of the American Academy of Arts and Sciences. More importantly, he has done a lot of very pioneering work. Some results that stand out to me in particular are normalized cuts and shape context, which tie into portions of what we are going to talk about on the vision side; and the robotics talk was very inspiring to all of us. Some of the recent work I particularly paid attention to was the 3D pose tracking and the earlier work on object rotation with robot hands. I also want to thank all my colleagues here, David, Dan, Aniket, Chang, and Joy, and also Professor Jeff Siskind, who was the instigator of many of the events happening today, for this nice framing of what the panel is going to do.

With that, here is the plan: for the first 20 or 25 minutes I'm going to pair each of my colleagues with Professor Malik for a three-to-four-minute discussion on each question, and that will frame everything for the audience. After that, we'll have all of you participate and ask questions, so don't mind challenging anybody; this is open territory. But first, to frame the overall direction of the panel: language certainly is a powerful tool for describing and reasoning about a lot of things, perhaps much more than what we can otherwise do. But vision and robotics have also had a lot of attention in the past, and in the last several weeks we have had increased attention on ChatGPT.
Even more important, perhaps, is what got released yesterday by Facebook, the Segment Anything model. One of my students in the audience and I were playing around with it earlier today; it is very much related to what Professor Malik has done earlier. So we have language and semantics on the one side, and on the other side, vision, the physical world we live and work in, and robotics' role in it. Framing things in that perspective, I'd like to start with a question to my colleague David. David does a lot of research in probabilistic understanding, runs a machine learning lab, and has a very open mind. So David, my question to you is: what can we expect from natural language and vision working together at large scale, or are we seriously limited in various ways?

Right. I don't know if I have a full answer to that, but I have two points I thought would be useful here, one philosophical and one practical. Philosophically, to some extent, language has the ability to convey much richer semantics than, say, vision or images. Try, for example, to write a research paper with only images: you can't use captions, you can't use labels, and it will be very hard to convey meaning. On the other hand, images have more precision: if you're trying to describe a missing person, you can just show a picture and everyone can then find them, as opposed to trying to describe every little detail of the person. So from the philosophical viewpoint, they carry very different kinds of information. And then, from the practical viewpoint, the data sets are very different. Natural language has causal notions incorporated in it: we use "because" all the time when we're explaining things. There is logical reasoning already built into the language data sets, because the people who produced them used logical reasoning. That's not explicit in vision; in robotics it might be built into, say, the simulations, things we understand causally about the world, but in language it's already laid bare in the data. So I think that also makes a big, qualitative difference between the types of data sets that we have.

So two key points come from your comments: one is that language is much richer than vision, at least as computer vision sees it today; the other touches on causality, the "why" question of understanding. Maybe, Professor Malik, you can comment. You have a lot of insight into robotics on the one hand, and into human perception and vision on the other. What do you think are the big missing blocks in getting robotics and vision to some level of the accomplishment that LLMs are capable of, or endowed with, today?

Okay, maybe I'll start by making a comment on vision and language. There's a difference between how children acquire their understanding of the visual world and of language, and how LLMs do. Children start out with vision, obviously, but also with accompanying language given by their parents and siblings and so on.
If you look at the process of language acquisition, the words that children acquire in their first one, two, three years are all very concrete words: dog, milk, bottle, mother, hungry, things like that. Words for which you can imagine a person, place, thing, or action; it's all very concrete. Then they start going to school and they start to read, their vocabulary grows, they read books, they acquire the meanings of words in context, and then they learn words like justice or peace or fairness. So there is a point at which what they are acquiring comes just through language, but there's an early stage where the language is grounded very concretely in physical objects and physical actions. There are these beautiful curves showing that at the age of five the vocabulary is mostly concrete, and at the age of fifteen there are many more abstract words than concrete words.

Now, what these LLMs have is meaning acquired from the relationships of words to each other. It connects to what's called the conceptual role theory of meaning: you can think of the words as arbitrary symbols, and it's the relationships which give meaning. Of course, in that sense language is like a reflection of the world, so lots of things about the world show up in language, and that's what is being used by ChatGPT or GPT-4. I want to end with a question: what can we achieve without grounded language, which is the situation LLMs are in, and for what will grounded language be needed? Now, I think grounded language is coming. We have these models like CLIP and BLIP, and now Segment Anything, and you can imagine models emerging over the next couple of years such that, for any image, you will be able to say what every object in it is and know its boundaries. But then I want to see a link between that understanding, which a two- or three-year-old has, and the rest of the language model. That to me is a research question, because I don't think anybody has shown it yet.

It seems it's much easier to teach a robot than a human: two or three years versus your robot learning in no time in simulation. So I would be curious how we can make humans learn faster. But coming back to our points of view, I want to ask Dan to make some comments from your way of thinking. You do a lot of work on natural language in real-world scenarios, especially using them to guide NLU. From your point of view, talking about mental states, morality, and the social side of language, what do you think are the big limitations and differences in the use of language in the physical world by human beings versus machines?

My answer would very much echo what we just heard. In my opinion, it's hard to think about language understanding without somehow mapping it into the real world. Language is a mechanism to communicate: it's there to influence our relationships with other people, to manipulate objects in the world in response to commands. Right now, these large language models define the meaning of language as more language, which somehow seems to miss the communication goal.
If you want to think about an example: the answer to "Can you pass the salt?" is not "yes," it's the movement with the salt in your hand. In the kind of work that I do, specifically in the context of a recent DARPA project, we tried to analyze social situations, specifically across cultures, and we want to understand whether people are offended, whether we somehow acted in the wrong way; so this is a very smart conversational agent that can pick up these kinds of social cues. One of the things we noticed is that language is really not enough. Our version of passing the salt, a sort of programmatic representation of the world, requires understanding that maybe it's not what you say but the context, the social context you are in: you would act differently in a business negotiation than around your group of friends, and you might be more formal with your extended family than with just your mom and dad. Having information that helps contextualize language is really crucial. So, thinking about what the next generation would be, GPT-7, I don't know, there are two big research questions I think we should deal with. The first is how to amend the representation that language uses so that it identifies prominent aspects in a scene, a video, or a real-world scenario. The second is how we derive supervision that can help guide language understanding. For example, in the context of the things we're working on right now: somebody who looks upset, which is an easy supervision signal to get from an image or from vision, is a very strong cue that we've probably used the wrong language. You could also think about where a person is looking as a way to contextualize the space of things they're going to talk about.

Thank you for your comments. Professor Malik, maybe you can comment, connecting some of the issues Dan brought up to your recent work on learning from Internet videos. We also talked earlier today about the kitchen: how far are we from teaching these aspects, social cues, and the ability to interact with humans, in terms of all the things we want robots to do?

Yes. I feel that language is a shadow of the world, but it is not the world. And this is why the excessive reliance on LLMs to do everything is problematic. Yes, the world gets reflected into language and into the text that we can train our systems on, but it's obviously missing grounding, and I think that problem will probably get fixed. But there are more aspects, some of which were just referred to. Theory of mind: do we need an explicit model of that? Do we need a model of just the physics of the world? As human beings we have a basic sense that objects fall, break, et cetera, and that's different from saying that a glass will break. An LLM will produce the sentence "a glass will break," but does an LLM really know that a glass will break? Because that knowledge comes from the observation of something falling and shattering into pieces, whereas a rubber ball will bounce. So what's fundamentally important is the world; language is a shadow of the world. I believe we should get to the world, and language is an adjunct.
Therefore, I believe that eventually we want to learn from video. In fact, I looked this up recently: there are 156 million hours of video on YouTube. If I convert this to seconds, it comes to about half a trillion (156 million hours times 3,600 seconds per hour is roughly 5.6 x 10^11), so half a trillion tokens if you regard one second as a token. This is not that far off from the amount of language used for training these GPT models, which is on the order of a trillion tokens. But what would video have which language won't have? It will have the behavior of the physical world: the behavior of dogs and cats and chairs, how people sit in chairs, how people slice onions, and so forth. And I think that ultimately we should be learning from that. Now, the technology for that is not quite there, because in language we start with a token already, while for images or video we'll have to convert them into something and then build the models on top of that, and that conversion process is complicated. In my group we work on this: we can take videos, detect the people, lift them to 3D, see if somebody is manipulating some object, and try to learn robot behavior from that. It's going to take a little bit of time, but ultimately my goal would be, instead of learning from text, or instead of just learning from text, to also learn from video. And then we should be able to answer questions like: if you've seen the movie Casablanca, why did Ingrid Bergman leave rather than stay in Casablanca? If you want an answer to that question, how are you going to get it? I think that needs video understanding.

Thank you for your comments, especially on 2D to 3D, with or without depth, and on video understanding and description at scale. Just as a data point, I believe GPT used about 0.6 terabytes of data and Segment Anything used 11 terabytes, though of course the features and everything are different. These large models from companies are starting to drive a lot of our thoughts here. So Aniket, your research is on intelligent design of augmented systems and affective systems in AI, and you have thought about robots and locomotion. Are there connections where LLM-type ideas can be applied to learning, path planning, and locomotion for robotics at scale?

Yeah, extending a lot of things already discussed in this panel; I'm at a point where so many things have already been discussed that I'll try to extend to something which hasn't been spoken about so far. We do a lot of human-robot interaction for different applications, like path planning: robots trying to get to a certain position, trying to understand the environment around them. So far, a lot of this work has been about visually modeling the environment: okay, there's a space next to me, this is where I'll go. As we bring more humans into this collaborative space to solve these problems, there will be more interaction with the robots themselves, and as with any interaction, there is bound to be more conversational interaction, more visual interaction, this multimodal way of understanding the world. I think human-robot interaction, human-robot collaboration, teaming, these kinds of areas would really benefit from large-scale language models.
But I think beyond just LLMs: right now we are using large models, so to speak, whether it's the half trillion seconds of video on YouTube or a large language repository. But consider the way you interact. I keep giving this example in my class and in this research: human conversation is way more than just the text, the language, whatever you see as output from ChatGPT or these LLMs. One of the projects we are working on, with our medical school here, is building therapist-like robots. Say you were to ask a friend, "How are you feeling today?" Your friend might answer brightly, "I'm feeling okay," or your friend might answer flatly, "I'm feeling okay." The text is the same in both cases, right? If you have a system that tries to understand just based on text, you would be completely thrown off: the affect, the behavioral state we are trying to infer from the other person, is almost diametrically opposite. One is a very pleasant, happy state, and the other says this person needs some help. So, in order to understand the world better from a robotics point of view, for navigation, locomotion, or even just better interaction: we've always been looking at the problem of whether we can trust the robot, but we also need to understand the reverse question: can the robot trust the human, based on the interaction? There's so much beyond language that we need to bring in, multimodal aspects, in this human-robot interaction space, in order to meaningfully solve the next generation of problems. And when I say robots, I essentially mean agents, which could be graphical avatars or characters, and at the same time could be physical robots able to solve certain tasks in the real world.

Thank you for your comments. Professor Malik, maybe you can make a segue between what's being said and some of your work we saw earlier today: can robots start learning at scale, and if so, how? And the physics you brought up, a simple bouncing ball and being able to understand that from videos: can that be connected to a real ball bouncing and a robot catching it?

Okay, so one way of putting it is: ask not what the robot can do for you, ask what you can do for your robot. I'm not being facetious; I think this is very important. Eventually, for the robots to help us, first we need to help them. The process of robot learning, I feel, has two components, and this is inspired by what we know about how children learn. There is a part of it which is very private and personal, based on your own experience in your own body, because the child is exploring the world: she is looking at a ball, putting things into her mouth, throwing things, doing experiments. There's a book called The Scientist in the Crib, referring to children. Through that they get a sense of their own body, hand-eye coordination, and so on; our walking robots are sort of in that phase. But that is not the only way we acquire knowledge. Human beings can go to the moon, but one person would not get there by trial and error. It is the result of our culture, and culture is not just the last 10 or 20 years; culture goes back 100,000 years, to when some humans first figured out how to light a fire and then transferred the trick to the next generation.
So culture is the transmission of knowledge, from one generation to the other, from parents, from siblings, from everywhere. Long story short: much of what we can do mechanically in manipulation is not learned by raw trial and error; you learn by observing others. So in my view, this is what robotics will be: 10% is what you learn by yourself, and 90% is what you learn by observing humans. My joke is actually quite serious: the robots will have to learn, and the underlying technology for this is analysis of video. And our computers today are terrible at it. We talk about IQ and EQ; in emotional intelligence and social intelligence, our computers are pathetic. They need to be able to do the things you want that therapist to do: you see somebody and you know whether they are despondent or jubilant. We'll get there, but this has to be foregrounded; this has to be a central problem we all work on. Finally, in my work in computer vision, I've worked on many different things in the past, but at this point what I'm totally focused on is perception of humans.

Thank you; that's a very nice segue to bring Joy into the conversation. Joy, your work has been on designing novel machine learning models to improve interpretability, fairness, and robustness, but you also have some interest in the digital-robotics kind of space. Maybe you can comment on the types of applications where you're starting to see we should be able to bring language and vision together in new ways, and on any concerns or other issues you want to raise.

Yeah, thanks for the question. As the panel has already discussed video versus language, and as Professor Malik mentioned, language is like a shadow of the world, using only language for robots to understand the physical world is not enough. So multimodality, bringing visual data into the conversation and integrating visual data with language data, will definitely be a good idea for robots to better understand the physical world, and we can also get better interaction with the machine, so that we learn how robots can help us and how we can help the robots. In my opinion, visual and language data play different roles. Language is definitely good at summarizing high-level, abstract concepts; visual data, in contrast, is good at conveying information in a more intuitive way, especially if we want to describe things concretely or specifically. For example, if we want to give directions from one location to another and we describe them in language, that will be inefficient, because we have to describe going along certain streets for certain miles, passing certain stops, then turning right, and so on. But if we have a map, all we need to do is draw a route on it, and the information is conveyed more precisely, clearly, and neatly. Based on this, I think integration of multimodal data is actually what people are doing: allowing multimodal inputs, not only text inputs but also image and video inputs, so that we can get better interaction.
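As one concrete illustration of the multimodal (image plus text) inputs being described, here is a minimal sketch using BLIP, one of the vision-language models name-checked earlier in the panel. The checkpoint name is the standard public Salesforce release, and the image file is a hypothetical placeholder; this is just a sketch of the idea, not anyone's actual pipeline.

```python
# Minimal sketch of multimodal (image + text) input, using the public
# BLIP captioning checkpoint. The image file is a hypothetical placeholder.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("kitchen_scene.jpg")  # hypothetical input image
# Conditional captioning: the text prompt steers the visual description,
# so the two modalities are combined in a single forward pass.
inputs = processor(images=image, text="a photo of", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(out[0], skip_special_tokens=True))
```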
In terms of applications, based on my own research, I work in healthcare and medical applications. People are pursuing a lot of brilliant ideas: for example, using these models for disease surveillance and for providing virtual assistance in patient care. By integrating multimodal data, the machine can not only see text describing a certain profile but can also see, say, video or images of the patient, learn much more about the patient's status, and then give appropriate, tailored, precise treatment recommendations.

As for concerns, as we get powerful tools: for me, related to my research, I work on trustworthy AI, so I really have concerns about whether the model is being used appropriately, whether the model is replicating stereotypes or making biased or discriminatory decisions in high-stakes areas. And because this tool brings so much power and so many resources, are people from different geographic locations and different areas getting similar access to these powerful tools? Those are fairness concerns we may have, and we may also have concerns about data-leakage issues and privacy. But I would say all of these concerns are opening up new research directions. It doesn't mean we should just stop using these models; rather, we should learn how to use them better, in an appropriate way and with caution, so that they can better serve our lives and we can better help the models; and based on that better interaction, we can use them to make human life better.

Thank you for your comments. Professor Malik, earlier today you made a comment about technology going at breakneck speed, and to some extent you are supporting that, maybe I'm right or wrong; but we all like theory too. So what's your feeling about the many issues that we as researchers care about, interpretability, fairness, robustness, and many other things, versus how fast technology is going and our ability as researchers to have any say in it? Do you think our say is decreasing, or how do you see it evolving?

Yeah. First of all, I think the concerns you expressed about the whole area of responsible AI are very important, and the area is getting more and more prominent. Ten years ago nobody would have cared about it, but people today care a lot, both in academia and in industry; that's a step in the right direction. That doesn't mean we have solved it, and when we are moving so fast, the problems are only increasing. I don't have any special insight into how to solve this, but I think there's a role for everybody: there is some role for government, and some role for university ethics boards, for example around collecting data. Take the example of medicine. Once upon a time you just did an experiment; now you can't do that. There's a review board, and people weigh issues of privacy, harm, and possible benefits. There are trade-offs to be made, but we've decided that that process is important, rather than letting anyone go and do whatever. I think that perspective has to be brought to bear. We are becoming too important to behave like kids.
In the old days we could behave like kids because nothing we did mattered; but we, meaning collectively the AI community, have become too important, and that brings a greater responsibility. We cannot play with matches anymore. That's my high-level view; I acknowledge the need, but I don't have any extra insight beyond anyone else.

Chang, your comments? A lot of your work is on image understanding and representation, but earlier you also showed a lot of interest in, and had opinions on, the role of universities in these new and emerging areas, especially given the generative capabilities we're starting to see become quite powerful. Maybe you can talk a little about that in the context of applications.

Okay, thank you. This is a question I think we all care about, and I also want to know the actual answer, so I'll just try to initiate the discussion. Every story has two sides, and since we are all very excited about ChatGPT, I'll start with the discouraging side. It's very expensive to train large models, as we all know. I haven't seen official numbers, but somebody quoted something like $4.5 million to train GPT-3, and it's probably much more expensive to train GPT-4. In research we are all using Stable Diffusion; Stable Diffusion is the more economical way, but they still say it costs about half a million dollars to train from scratch. For a regular researcher in a university, that is expensive; we probably can't afford it. So most of the university research I see, at least in the past several months, revolves around fine-tuning those models. Of course there's a lot of elegant work, like ControlNet and textual inversion; they're very beautiful. But from the expensive-training perspective, I think the contribution from universities may be somewhat limited.

The good side, the exciting side, in my opinion, is that this large-model perspective allows us to revisit every single vision task. For example, this morning I was reading Facebook's Segment Anything; I read the paper and right away sent it to all my students. I feel it basically redefines segmentation: we were happy with segmentation as it was, but this raises it to a new level. (And most of the senior authors, I think, are Professor Malik's former students.) Then again, you may say, hey, this is a large model from Meta. But we do machine learning, and we know machine learning nowadays is data-driven, so the first question is where the data comes from. I feel that living with these large models also gives us an opportunity to ask: can we use generated, synthesized data for something? I believe there is a great opportunity there. For example, this segmentation work opens up possibilities: maybe we can really think about how to learn visual relationships in an image, a task for which we typically lack data. Also, I believe most university researchers are highly capable, so given the barrier of expensive training, I believe that soon a lot of methods will come out for how to distill those large models into many small models. I think that's probably not too far off.
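A minimal sketch of the distillation idea just described, assuming the standard Hinton-style recipe: a small student is trained to match a large teacher's temperature-softened outputs. The two linear models here are toy stand-ins for a large frozen teacher and a much smaller student, not any particular released model.

```python
# Knowledge-distillation sketch (toy stand-in models): a small "student"
# learns to match a large frozen "teacher"'s softened output distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between temperature-softened distributions;
    # the T*T factor keeps gradient magnitudes comparable across T.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * T * T

teacher = torch.nn.Linear(128, 10)   # stand-in for a large frozen model
student = torch.nn.Linear(128, 10)   # stand-in for a much smaller model
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(100):                 # toy training loop on random inputs
    x = torch.randn(32, 128)
    with torch.no_grad():
        t_logits = teacher(x)        # teacher provides the soft targets
    loss = distillation_loss(student(x), t_logits)
    opt.zero_grad()
    loss.backward()
    opt.step()
```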
Thank you. Do you have any comments?

Yes. I think it's a fair point that today, research on these very large models has essentially become concentrated in a few institutions: Google can do it, Meta can do it, OpenAI can do it, Microsoft can do it, and there are probably a few big Chinese labs that can do it, but the number is small. What does that mean, and how should it go forward? I have a couple of points to make. One is that at the US federal government level, there need to be initiatives to make this possible, and we can look to other sciences for examples. Look at the astronomy people: every university doesn't build its own telescope to send into outer space. They negotiate, they figure out, okay, we're going to send the Hubble telescope; they agree on the specifications of the instrument and the experiments they'll do, and then there's some individual work, but by and large the community has agreed on something. The Large Hadron Collider, the Higgs boson: you can't have every university build its own LHC; you have to agree and work together. Similarly, I think it should be possible here. Four or five million dollars is big when compared to an NSF grant, but compared to the astronomers' telescopes in space, it's nothing. So that's a question of getting organized. I'm arguing for big science, basically.

But I'm also going to argue for small science, because what happens in the companies is that there's a lot of herding: if one company wants to apply technique A, the other company also wants to apply technique A, and everybody chases the winners, because lots of money is at stake. The battle between Google and Microsoft over who controls search, and the ads related to search, is a very big battle. I work for neither of these companies, but my point is that the stakes are high, which means you can't take that many risks. Research thrives on a lot of exploration, not just exploitation, and academia does that very well. So can we connect the two? I think the idea there is that we can democratize the process in some way. Sometimes it's like that large model from Meta: you will be able to use it. Sometimes there will be models which are trained and then modified or fine-tuned. And I hope there will be enough diversity, because it really would be very sad if we limited our research agenda and all did the same thing.

Thank you for your comments. Since we have framed the conversation, I would now like all of the audience to start participating. Please go ahead.

Very exciting panel; thank you all for your wonderful comments. I have a two-part question. The first part echoes some of the discussion about trustworthiness and how to do responsible AI. With any new technology there are always two sides: one side is all excited and advocating it, and the other side says, oh no, this is going to take over the world.
And I think this is something we're seeing right now, particularly with GPTs and some of these very large, capable models. So what is our role in academia in this? I'd like to hear from our panelists how to help move our field forward amid these, you know, polarized views. That's my first question. The second question is related, in a sense. Throughout the day, we've heard a lot about how children learn and what they observe. Part of it is also that they learn not only by observing the world, but also through their parenting, their character building, their social connections, how they develop empathy. So I think there is a missing link, a gap, between how machines currently learn and how humans do; I'd like to hear some opinions from the panelists on this.

Your points are all valid. I want to speak to the first one, about whether we should stop research; I'm on the side of continuing research. There was recently a petition to sign, which I did not sign, so that reveals my position: "don't build beyond GPT-4" is not even a well-formed question, and how would you control it, anyway? There will always be people who will do it. And what does it mean across different dimensions? In robotics we are not yet at even GPT-0.1.

Second, what can we learn from the distinction between how these models are trained and how children learn? That, to me, is very crucial, and I think it's not being paid enough attention. For example, GPT is learning from all the information on the web, which includes lots and lots of garbage, and that is then embodied in it. There are experiments showing all these different personalities that show up: there are all these very laudatory articles, but there are also articles where people have very weird and scary encounters. It has all those personalities in it, the best and worst of humanity. Now, what do we do for kids? Do I take a child of six and expose them to everything on the internet? No. We have mechanisms in our standard schooling; we rate movies and say, okay, when you're a kid of this age, you don't watch this kind of movie; we try to restrict access to pornography. I'm just saying there are lots of different kinds of controls that society has worked out over time. I don't know what the counterpart of that is here, but right now our attitude is more like, let's throw in everything, and I personally think that's problematic.

Dan, the question was closer to your space; maybe you can comment as well.

Yeah, I'd like to comment mostly on the second question.
I think there's something really unsatisfying about the way these language models are trained right now, which is to predict the word that has been artificially taken out. It builds on linguistic rules and distributional regularities, but it misses out on intentionality, on the ability to achieve your objectives using language; and with that comes the objective of not upsetting people and not offending them. So including humans in a tighter way as part of the training loop, I think, is crucial, and with humans would come natural interaction.

Okay. I also have questions. First, I'll thank the panel. My first question: from GPT-2 to the current GPT-4, we can see the trend: more data, bigger model, better performance. Do you think this trend will continue? Is there an upper limit? The reason I'm asking is that my group has been working on model compression; in certain scenarios we may want smaller models, under energy constraints, resource constraints, and so on. Right now we still see quite a lot of room in compression, but we've been wondering whether there is an upper limit. That's the first question. The second question is in a different space; it's related to education. There are many students here, and with ChatGPT people have been trying things like writing papers, writing scientific reports, and so on. Is there a debate out there: should we develop detectors to check whether a report was written by ChatGPT, or do you think ChatGPT can become a new type of calculator that students should master and utilize in their future education? I'd like to know the panelists' opinions. Thank you.

That was going to be part of my big closing question, but that's fine. I've been feeding ChatGPT a lot of our standard mechanics and projectile problems, so I can tell you independently what it does, at least today. Every one of you will have comments, and it's certainly a big question for all of us. Maybe Professor Malik can respond first; at Berkeley I'm sure your colleagues are talking about it. GPT-X: how disruptive is it, in particular to computer science education, to the things undergrads are taught and do and learn over time, and how do you think we can respond? Perhaps all the panelists can also talk about it, especially since in some ways it's intruding into the way we teach, at least into some parts of what happens now in classes.

Again, I don't have any specific research to point to, but I have opinions, like everyone else in this room. My position: I'm on the side of acknowledging it and incorporating it. I would think of it in the following way. If you look at the history of programming from 1950 to '60 to '70, it has always moved in the direction of abstraction and higher-level communication of control. Once upon a time it was machine language. I learned programming around 1976, and I have programmed a machine with paper tape, pushing buttons on the front panel: first you had to key in, by pushing a few buttons, a small program which would load the paper tape, and then it loaded the binary tape, and so on.
We don't need to do that anymore, but in teaching, at least at Berkeley, we still have a class where people have to learn computer architecture and machine language, though 99.99% of them are never going to program in it. Then there's a stage where you program in, you know, Pascal or Python or Java or whatever. And then there is a stage at which you program in ChatGPT. All of those are valid. A similar thing happened in computer graphics: using OpenGL versus knowing how to write a rendering engine yourself. I think it's futile to try to fight this; instead, make people understand each of the stages, and then you can have exams appropriate to each stage. It's like calculators: if your kid is in fifth grade, she needs to learn how to multiply without a calculator. Schools can solve that problem, so why can't we arrange that for certain exams students cannot use ChatGPT? And for other things, if software engineering productivity is improved by ChatGPT, how are you going to hold it back? Did that answer your question?

Partly, yeah. Please go ahead: the first question, the upper limit.

The upper limit: we don't know, this is a research question, but my speculation is that, yes, there will be bigger models. There is as much space at the top as there is at the bottom, which means both are needed. If I think of brains, using biology as an example: an insect like a fly has a million neurons, 10^6; the human brain has 100 billion neurons, 10^11; and these networks have parameters in the trillions, 10^12. And that fly is amazing in terms of what it can do. If you look at the data a child gets, a child is exposed to roughly a million words of language a month; over 20 years, that's 240 months times 10^6, on the order of 2 x 10^8 words, a factor of a thousand less than what these models are trained on. So my guess is that over time we will see all of these points: there will be space at the bottom and at the top.

Very interesting panel. I have a question about what you said regarding how babies learn, one part of it being social. Part of that is understanding, or getting feedback from, an expert on how to do motor control. Can we adopt something similar in robot learning? For example, large language models benefited a lot from RLHF, where there was evaluative feedback simply saying which text generation was better. For robots, would it help to model something similar to RLHF, but using corrective feedback instead of evaluative feedback: hold the robot's hand and provide a kinesthetic correction? Would that improve robot learning from interactions, with a human in the loop giving corrective feedback during learning, rather than just supervised learning from a data set? And another question, related to language models: do language models provide a more orthogonal space to discriminate in than visual features? For example, images of a dog and a fox might be very similar in visual features, but the words "dog" and "fox" are more orthogonal. So is it better to classify directly from visual features, or to learn a vision-to-language embedding and do the classification there? Those are the two questions.
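For reference, the vision-to-language classification the second question describes can be sketched with an off-the-shelf CLIP checkpoint (CLIP was mentioned earlier in the panel): the image is classified by comparing its embedding against the embeddings of candidate text labels. This is a toy zero-shot illustration with a hypothetical input image, not a claim about which approach works better.

```python
# Sketch of classifying an image by comparing it to text embeddings,
# in the style of CLIP. Checkpoint is the public OpenAI release;
# the image file and labels are illustrative.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("animal.jpg")  # hypothetical input image
labels = ["a photo of a dog", "a photo of a fox"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    # logits_per_image holds image-to-text similarity scores
    logits = model(**inputs).logits_per_image
probs = logits.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```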
Let me take the first question; I'll give a very short answer: yes. Meaning, yes, what you're describing should be done, and it is being explored. The robotics field is doing demonstration learning, imitation learning; that area is becoming quite big, and all these variations are being tried and should be tried.

The second question is more about language versus vision. That goes back to a point made earlier, that there are certain things which are more explicit in language. You know the saying, a picture is worth a thousand words; but sometimes one word is worth more than a picture. Both are true. I'll give one fun example: try to teach somebody how to tie their shoelaces, or how to climb stairs, using only language. You'll find it is difficult; you're not allowed to point, you just have to say it. How do you describe the process of tying your shoelaces in language? It's terrible. Visually it's much easier, and motorically it's even easier. But if you want to talk about Ukraine versus Russia and describe the issues there, try doing that with pictures. It doesn't work. So it's both.

I have a question on the software programming aspect of these large language models. As you know, people are trying to generate software code using these models, but it seems there's a blind spot: the reliability or security of the code being generated is still open to question, because these models learn from a preponderance of data, and by definition, security attacks and vulnerabilities are very rare events. Do you see this as possibly another blind spot of these kinds of LLMs?

Yeah. I think we'll figure it out over time, but somebody should still be responsible. If I'm using this to generate ideas for code, then it's my job to review it. It comes down to this: if you have a compiler that works well 99.9% of the time but 0.1% of the time produces wrong code, in the old days we would have regarded that as unacceptable. It's somewhat analogous to that. Thank you.

Good afternoon. It was a great discussion; we've covered a lot of topics, from how a child learns, a very neuroscience perspective, to ethics, trustworthy AI, and programming. So I have a question which kind of ties all of this together. This is something I've been hearing about deep learning, in my limited experience as a graduate student: the applied part is moving much faster than the theory can catch up. Given the recent petition that went viral on Twitter asking to stop language model training, should we reframe that question: instead of just stopping, should we simultaneously increase the degree to which we can explain these models, through explainable AI, through metrics for trustworthiness, and through theoretical explanations, like the earlier question about whether it's just making the space orthogonal? As researchers with so much experience, while I have barely five or six years, do you think we should increase our pace on the theoretical side?
One of the theorists should answer this. Who's the theorist? You're the theorist.

One of the challenges, especially in the trustworthy space, is formalizing what is safe, or what is trustworthy. These are concepts humans don't usually agree on. Especially when you get into the social sciences, there are whole fields of people studying, trying to say what justice is, and people disagree on what justice is. You have to choose which view of justice you take, and half the people in the room will disagree with your choice; then you implement a theory, and it's awesome, but no one agrees with your view of it. So I end up thinking some of these are more social challenges, on the trustworthy part. On the theory part, I'd say people are probably going to use these systems whether or not there's a theory for them; I'd take the pragmatic view that even if you can't explain why a rocket goes into space, if it goes into space, you'll still probably use it. People are going to use ChatGPT even though they don't know why it works. Explaining it is definitely worthwhile in certain applications, but it's a very hard problem, because even explanation runs into philosophy and this notion of language and how people interpret things: people interpret the same sentence differently depending on their background. It can't be solved mathematically in some ways; it ends up becoming a social problem pretty quickly, I think.

Professor Malik, I had several thoughts about what you mentioned, the upper limit and the lower limit, and about the astronomy people agreeing on a single instrument, the Hubble Space Telescope. I think a very interesting development model for software, which speaks to the democratization process and the consensus, would be something akin to open-source standards, because you have a community that argues the technical merits of a certain idea: some ideas get sifted out and filtered, some get adopted, some evolve into something else. I think that would be suitable for machine learning, correct? And I also have an idea: if we look at the fly, with about one million neurons, while ChatGPT has much more, perhaps it might make sense for us to do something like an FPGA but smaller, so we can implement the weights and biases directly in digital logic, instead of passing everything through processors with memory and so on. Maybe that might make sense.

Sure. I think all of those ideas will get explored over time. Okay, thanks.

Hi, I'm Ali, a PhD student in the Department of Civil Engineering. Thank you for all your very interesting discussion. I was told I'm the last person, so I'll try to keep it quick. I'm working on learning from construction-site videos.
I used the Segment Anything model last night and this morning, right out of the box, and the segmentation on my construction video set was incredible: with input points collected from bounding boxes produced by previous, simpler models, I got a much better output. So I have a two-part question. First: AI progress is going so fast that every year there is a big new thing. At this pace, do you feel it is important for us PhD students to focus on applying some of these advances in industry? There is a huge gap in adoption in areas like civil engineering practice. That's my first question. The second is an extension of one of the other questions: do you see any new learning mechanisms coming, so that the models don't keep getting larger and larger? Thank you.

Yeah. On the adoption part, I think it's going to happen the way all new ideas get adopted: usually younger people such as yourself are the ones who lead, and the old fogies will follow. So you should just do what you do. On the second part, about compute, energy, and so on: as I said, I'm not professionally in that area, but I'm an optimist. There's a whole community, the systems people, working on this, and there are no fundamental limits from physics which say it cannot be done at lower energy. So I would expect that people will keep finding better ways; I'd start out optimistic, not pessimistic. Thank you.

Thank you. One of my friends was saying that academia is slower than industry, but compared to divisions like civil engineering, I feel adoption in industry is much slower than what is actually happening in academia. Thank you.

One more question. We have seen the great development of ChatGPT and general AI; it can deal with a lot of tasks, including daily tasks and also some domain-specific tasks. So my question is: should we evaluate this kind of AI model, say a ChatGPT-like model, the way we evaluate a human expert, or against humans dealing with the same tasks? That's my question.

Well, that's what is being done in all their PR literature, right? They make it take the SAT, they make it take the LSAT. But ultimately, there's no money to be made by taking the LSAT, unless it's to cheat. So ultimately, it's going to be tested on particular domains and particular applications, and this will get sorted out commercially. There's always a scientific thing you have to do, which is to prove to people in some way that this is significantly more capable than previous work, and I think the GPT people have shown that. Then, in terms of where and how to apply it, time will tell. Okay, thank you.

Thank you so much. Thank you to all the panelists, David, Dan, Aniket, Chang, and Joy, and especially to Professor Malik for having come to Purdue, for participating in the panel, and for the wonderful talk this morning; I hope some of these conversations will continue during the individual meetings. Thank you to the audience as well for participating. Yes, thank you.