Tilburg University and MindLabs present TAISIG Talks. Welcome. This is the 16th edition of the TAISIG Talks. My name is Pieter Spronck and I will be chairing this talk. I have two guests today, Djegor Shopa and Chris Emery, who are specialists in the area of deep learning. The reason that I wanted to do deep learning for this particular talk is that AI has got a lot of attention in recent years. It is a kind of revolution that started around 2014 to 2016, and the seed of that revolution was deep learning technology. We see this in many places: in many applications, in society, and in research. And because many people know that this is the case, or at least know that AI has become very important, I think it makes sense to discuss at a higher level what deep learning actually is, why it is so important, what it can do for us, and what its limitations are, and then we can delve into several applications and examine those. So welcome. Can I ask you to quickly introduce yourselves? Djegor, go ahead. Thanks for having me, Pieter. My name is Djegor Shopa. I work at the Department of Cognitive Science and AI at Tilburg University, on deep learning applied to language and similar things. I also teach a couple of courses. Chris Emery, I'm a lecturer; from next month on I will also be an assistant professor at the Department of Cognitive Science and AI. My work mostly focuses on the intersection of privacy, security, machine learning, and natural language processing; I'm generally interested in how algorithms can have negative effects on us as a society. OK, that actually fits Tilburg University very well, because we do both the technological side of things and the ethical, legal, and societal aspects of AI. So I'm glad that you're here as well. Can I ask one of you to try to explain, without any slides, what deep learning is? What should we think? You're probably the big specialist that we have in the house. So deep learning is a more recent synonym for neural networks, and deep learning and neural networks are just a type of machine learning technique, where the "deep" comes from the fact that there are usually several layers of processing of the input before the output is produced. And like most machine learning algorithms, deep learning works by learning from examples. You give it some examples of inputs and outputs that you know are correct, and based on these examples the algorithm extracts some rules, or features, or correlations between inputs and outputs, and it is then able to reproduce this function from input to output. The special thing about deep learning is that it does this in several steps: you have, as I said, several layers of processing, where gradually the input is converted into an internal representation which makes it easier to solve the problem, easier to produce the desired output. Another important thing about deep learning is that it is a technique which often allows you to solve a problem in an end-to-end way. For example, if you want to work on something like labeling images, you just give it the input image as the matrix of pixels it consists of, and you give it the output labels that you want, for example describing what is in the image.
And internally, this type of method produces representations which convert the image into some other form that makes it easier to decide what is in it. And it learns directly from the examples: when it makes a mistake, you tell it what the correct label should have been, and it updates its own parameters in a way that makes its outputs more correct over time. So you're not doing it manually; you're not manually defining the different steps of the process. There are different steps internally, but you're not defining them by hand. It's all learned end-to-end. That's a high-level summary. Yeah, but you used some terminology that might not be clear to everybody, because you mentioned neural networks, for instance. I know from when I studied AI that neural networks were technology from the 1980s and 1990s; they were really popular then. And people often say that neural networks are like the human brain, but I assume that is not really the case; the human brain might be a very global analogy. To what extent does that analogy hold for the representations that you talk about in such a neural network? So I think at a very high level there is some similarity to the brains of animals. But the term neural network comes from a more local type of analogy: there are neurons in animal brains which are connected among themselves, so they form a network. One way of thinking about deep learning also involves that kind of analogy: there are parameters in the network, and you can think of these parameters as connections, or weights, between different units. In the past that has been the most common motivation for why we should be doing things in this particular form. But an equally valid perspective is to think about neural networks as differentiable function compositions, or any other type of analogy you can come up with. So there is a historical connection to brains and neurons, but it's a loose analogy, and I don't think it's always the most useful way of thinking about them. But does deep learning always involve a neural network, or can it also be something else? Well, it depends. As long as you have some multidimensional representation which you project to a different multidimensional representation, you can think of this kind of projection from one matrix to another as a neural network. But it's not literally a network; it's just vector algebra in a computer. So literally, in terms of hardware, it's not any type of network, but it can be an analog of one. In that sense, yes, deep learning can almost always be visualized or thought about as a neural network. OK, so it's always a mapping from inputs to outputs, and you learn that mapping; is that what is happening? It's not always like that, but it's very frequently like that, and I think it's the easiest way of thinking about it and coming up with examples. In some cases there are exceptions. But I'm going to turn it over to you; you were one of the first ones introducing it into the department. I don't know. Probably. I think Djegor probably was. OK, the two of you then.
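To make that discussion concrete, here is a minimal sketch of the kind of end-to-end training loop described above, assuming PyTorch. The layer sizes, data, and labels are all made up for illustration, not taken from the talk; the point is only that several layers map input to output, and that mistakes are automatically turned into parameter updates.

    import torch
    import torch.nn as nn

    # Several layers of processing: raw input -> internal representations -> output.
    model = nn.Sequential(
        nn.Linear(784, 128), nn.ReLU(),   # e.g. a 28x28 image flattened to 784 pixels
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, 10),                # e.g. scores for 10 possible labels
    )
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Stand-ins for "examples of inputs and outputs that you know are correct".
    inputs = torch.randn(32, 784)
    labels = torch.randint(0, 10, (32,))

    for step in range(100):
        optimizer.zero_grad()
        predictions = model(inputs)          # forward pass through all the layers
        loss = loss_fn(predictions, labels)  # how wrong were we on these examples?
        loss.backward()                      # gradients: the "differentiable" part
        optimizer.step()                     # nudge the parameters toward correctness

This also shows why the "differentiable function composition" view mentioned above is apt: the model is literally a composition of matrix multiplications and nonlinearities, and gradients flow through the whole composition.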
So, as I said, as far as I know the technology existed already for quite a while before it became this revolution. So why did this happen? Want me to answer? Yeah. Well, I think there were a few things at play. Most of these techniques were developed around, I guess, the 90s; that's where all the seminal work is from. But at the time, computers weren't powerful enough to run these algorithms yet. You can imagine that the larger these networks become, the more space they occupy on whatever PC you're running them on. So while, theoretically, people knew these things were powerful enough to make general representations of particular data, the computers were behind in terms of processing power. So I guess the big names in what is now deep learning were just sitting on this for quite a while, thinking that at some point Moore's law is going to kick in, we'll have this increased computing power, and then we can actually see if this works. And around the time that I started doing my PhD, this got up to steam. There were people curating larger data sets, because that's also something that had been holding it back for some time: sure, you can have these very powerful systems to represent things, but you also need a lot of data to actually train from, right? Smaller data sets don't work that well. So two components, I guess: large data sets and a lot of computing power. I think at the time it was ImageNet, correct me if I'm wrong, that was the first of what we call benchmark data sets. ImageNet is basically a data set of images with labels, like, I don't know, horse. OK, so this is the image. Yes, and there's a horse in the picture. So there are categories, and later people went into describing the picture in a bit more detail, but this was the main thing. And it's nice from an application point of view; you can see it in your phone now: if you search for a particular term, your photo library automatically filters things in this view. That was the goal, at least. And in computer vision, which is the research field that deals with this, people had all these little components that they would combine to do this classification. There wasn't really one main algorithm to do it: there were things to extract features, and then there was an algorithm that would learn from those features and try to do something. And once this data set got more traction, with people actually running neural networks, and being able to do it within a time such that you are still alive once it finishes, they actually got pretty good accuracy, that is, how well can we classify all the objects in these images? And then it was like, wow, OK, it is actually happening that these neural networks have superseded the manual, intelligently designed pipelines, according to the research at least. And that's when things started rolling. People started developing more specialized, tailored things; what we call a feed-forward neural network is the most basic instantiation, typically at least. Because this worked for simple classification, but, as Djegor also mentioned, for images you have different representations, and the same for language, let's say. So people started tailoring specific algorithms to specific types of input or domains: images, text, graphs, this kind of stuff.
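As a contrast with the end-to-end sketch earlier, here is roughly what one of those hand-assembled, pre-deep-learning computer-vision pipelines looked like: a manually designed feature extractor (here HOG, histograms of oriented gradients, via scikit-image) feeding a separate classical learner (here a linear SVM from scikit-learn). The images and labels are fake; this is only an illustrative sketch, not any specific system from the talk.

    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    # Pretend we have 100 grayscale 64x64 images, labeled "horse" (1) or "no horse" (0).
    rng = np.random.default_rng(0)
    images = rng.random((100, 64, 64))
    labels = rng.integers(0, 2, size=100)

    # Step 1: hand-designed feature extraction, chosen by a human, not learned.
    features = np.array([hog(img, pixels_per_cell=(8, 8)) for img in images])

    # Step 2: a separate learning algorithm on top of those fixed features.
    classifier = LinearSVC().fit(features, labels)
    print("training accuracy:", classifier.score(features, labels))

The deep learning shift described above essentially collapsed steps 1 and 2 into a single network that learns its own features directly from pixels.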
And from there on, it's been sort of a race, because these were still old ideas, right, that were implemented. I think only in the last five or six years have people started introducing newer systems that actually prove more powerful than things that were already done in the 90s. So a lot of this initially was a rediscovery of old techniques, due to the availability of compute, like Chris mentioned, but also the availability of large data sets. And large data sets, to a very large extent, were enabled by the rise of the internet and the fact that people are sharing things on it. The way to get millions of images with labels is just an issue of scale: when people are sharing them on social media, you can collect them and use them, and the same goes for text and a lot of other things. So initially this was enabled by these two main factors, and people started applying things like convolutional neural networks or long short-term memory networks to different problems. And they started to work really well, which attracted more attention from other researchers. That's kind of a snowball effect: more people get interested in this, work on it, and start coming up with new techniques and new approaches. Nowadays there are so many people working on this that there are definitely a lot of new ideas and new contributions, not only rediscoveries of techniques from the 80s and 90s but genuinely new developments, which we probably can't keep up with across deep learning in general; but in our own domains it's usually quite visible that there are new things that account for big jumps in the performance of these systems. Yeah, I would like to know slightly more about the technology, because I worked with neural networks when I did my PhD, which is already a while ago. I worked with some master students and we built some neural networks, and usually it was, I don't know, 10 inputs, one output, up to 100 nodes in a network. That was already pretty big, and it took a long time to train. And when you look at these neural networks, every node that you add makes them almost exponentially more complex, because you get all these extra connections. And if you talk about Moore's law, well, it's not a law, it's basically an observation, but it has held, and it says that every year and a half or so computing power doubles in capacity and speed. But I would think that that would not be enough to go, within 10 years, from the relatively small neural networks that I was using to the gigantic networks currently used in deep learning. So I understand the point about the data sets, but I would think that not only the hardware but also the software technology must have seen some changes. There have definitely been developments on the hardware side too, like being able to distribute things over many so-called nodes, over many machines, and over many processors on the same machine. But also in terms of how you design the network: you don't want to just add a node and connect it to everything. Typically you have little modules which are connected internally but only partially connected to the rest of the network, and you design them in a way that lets you do things modularly, so you can distribute them more easily. One example is a kind of recent development called the transformer. Well, not so recent anymore, I guess, but anyway. Not from the 90s, right? The transformer. One of the main contributions it made is that it enabled the scaling of these networks, because it's very easy to parallelize: a lot of the computation can be done in parallel on multiple machines, and that makes it possible, if you have a thousand machines, to train a network with, I don't know, a billion parameters or something like that. So it's a combination of different things: a lot of companies now have access to a lot of computation because that's how they make their business, we have techniques to parallelize things, and we have smart designs that make it possible to run these networks at scale.
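As an illustration of why the transformer parallelizes so well, here is a toy version of its core operation, scaled dot-product self-attention, in PyTorch. All sizes are made up, and the projections are random rather than learned. The thing to notice is that every position in the sequence is handled in a few batched matrix multiplications; unlike in a recurrent network, no time step has to wait for the previous one, so the work spreads naturally across GPU cores and across machines.

    import math
    import torch

    batch, seq_len, d_model = 8, 128, 512
    x = torch.randn(batch, seq_len, d_model)          # a batch of input sequences

    # Projection matrices (learned in a real model; random here for illustration).
    Wq = torch.randn(d_model, d_model) / math.sqrt(d_model)
    Wk = torch.randn(d_model, d_model) / math.sqrt(d_model)
    Wv = torch.randn(d_model, d_model) / math.sqrt(d_model)

    queries, keys, values = x @ Wq, x @ Wk, x @ Wv    # all positions at once
    scores = queries @ keys.transpose(-2, -1) / math.sqrt(d_model)
    weights = scores.softmax(dim=-1)                  # how much each position attends to each other one
    output = weights @ values                         # (batch, seq_len, d_model), again all positions at once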
Okay. And you already mentioned, for instance, image classification, describing what we see in an image. When that came up I thought it was surprising, and when I look back on it nowadays I think, well, that's actually pretty simple. But many more things have been made possible now; can you give a couple more examples of things that we couldn't do 10 years ago? It's always like that: once a machine starts being able to do something, you think it's actually simple. But it's not simple. It's not at all simple. It's very complicated, and that's why it took from the first computers in the 50s until now before it could actually be done. These things are extremely difficult. But we are also not simple, right? Like other mammals, we recognize things in images effortlessly; it's something we're born to do. But again, it's not simple, and we don't really know in detail how we do it, or how a dog does it or a hawk does it. So that's the key thing: we at least know a bit about how the computer does it. We know more or less how a computer does it, right? But it's not something trivial. Another example of something that was not possible, let's say 20 years ago, and is possible now, something I am a bit more familiar with, is machine translation, translating from one language to another, at a level that you can basically understand, where you don't need a person to correct much. These systems still make some small mistakes, and sometimes some big mistakes, really. But for most pairs of languages you can give it a paragraph or even a longer text, and it will output something which is completely fine: it sounds natural and conveys the ideas in the other language, with some small glitches sometimes, but definitely good enough to be completely usable for everyday purposes. Yeah, I know that the university actually sometimes writes a message in Dutch and then just throws it through Google Translate, or another translation tool, which gives you the English version. The funny thing is that it then also translates the undersigned name. Yeah, there are some glitches, but you should have tried machine translation 20 years ago: it was complete gibberish, ungrammatical first of all, and completely crazy. So there has been a huge jump in this time period. And, you know, you always focus on the things which are still missing, and there is definitely much room for improvement, always.
But you also shouldn't lose sight of how much we are actually able to do now that we couldn't before. Other examples? Yeah, I always like to use whatever you have at hand to demonstrate these things. Your phone is an amazing example of all this AI stuff hidden behind what you generally use. Take your camera: the smart optimizations in a smartphone camera all happen with computer vision techniques, or at least part of it does. Then there are things served via the internet, like search engines, where the ranking of items is done by some smart systems. If you look at a video, you can, for example, turn on automatic captioning, which has also made quite a big leap, especially for English, where, if you're a native speaker, it's actually quite accurate. Yeah, I turned it on by accident during my last lecture, and it started subtitling me. Yeah, and there are always quirks, right? But if you compare it to systems we had even five years ago, there have been tremendous leaps in quality, within controlled environments, obviously. And whatever we have in terms of engines for, say, computer game AI has improved quite a bunch. If I compare it to early-2000s game AI, which was mostly rule-based, there has since been research into game AI that is quite competitive with professional players. Self-driving cars are something that, even if not fully functional, we simply couldn't do 10 years ago. No, that definitely has improved. But I actually wanted to bring up that example, because I've been looking a little bit into self-driving cars and how far along they are, mostly because I was pretty annoyed by someone like Elon Musk, who every year claims that next year we will have full self-driving. He's been claiming that since 2015, and today he's still claiming it. But the other car manufacturers who were doing self-driving said, a couple of weeks ago, that full self-driving is probably never going to happen. Now, of course, you can make it happen by just redefining what full self-driving actually means. But it seems there is now a lot of skepticism about whether something like full self-driving is actually possible. Now, you might not be specialists in this, but I would think there are certain applications that are out of reach of something like deep learning. So are there things like that? Are there things we can imagine that are probably not going to happen? Good. Shall I start? Now we get a whole list. So once you start applying AI in real life, where it's more invasive, things start to become a bit problematic, right? You have to actually deal with how noisy the world is. And self-driving is a very good example; I had a brief discussion with Djegor about this just last week. We obviously have laws and such that apply while we drive, and driving is quite an involved cognitive task: you not only have to remember that there are certain rules, and that some things might override certain rules, but you also have to deal with the fact that not everything is a boring highway where cars just drive straight ahead. There's a very wide spectrum of things that might happen.
And the car, if you want it to have full self-driving capability, has to react to all this in a way that is, what? According to the law? Something that is beneficial for the driver? Beneficial for the driver plus everyone around them? Typically we want to avoid things that kill the driver and other people. And these self-driving cars have to operate in an environment where the other things aren't self-driving, so you can't just coordinate everything. Even getting a car out of a parking lot, there's so much that can happen there. Basically anything could be a kid crossing, and the car has to hit the brakes immediately, because if it's an unknown object, then better brake, right? Stopping is the safest thing to do. So from that, it's difficult to imagine this sort of utopian self-driving component within our existing legal systems, where we'd like it to mimic human driving, which we know is flawed. Operating within this very noisy, quite complex system seems to me like quite a reach still. But suppose I would say we can't do it because we don't have the training set: is it that we don't have a data set that you can use to train it? Yeah, I think that's one of the points. So self-driving cars, and other robotics applications, are an example of something quite different from classical deep learning applications, where you classify images, or translate a text, or recognize speech, or do things which are virtual in nature. This is an agent which is acting in the world; it has real effects on the world. So getting the data to train it is more difficult, because things have to happen in real time, and they have real consequences. If you let a self-driving car run around the city to collect data, it can kill or injure someone, and that's not something you want. It's not the same as when an image classifier makes a mistake: there, it just makes a mistake, nothing happens, you tell it what the correct answer is and you keep training. You cannot do exactly the same with a self-driving car. You have to come up with some simulation environment or something like that, and then transfer those skills to the real world. That's one reason, I think, why it's working less well. Another reason is that the threshold for what it means to work well is so high, because human lives are at stake. We have very, very little tolerance for mistakes there. We have much higher tolerance for mistakes when we're translating text: if something is wrongly translated, most of the time it's just, okay, a small glitch, and nothing happens. It's not a small glitch when a car driving around the city kills someone. So for that reason, I think it's just a much harder problem. Yeah, but people also argue, and of course this is very cynical arguing, that people cause accidents as well, and as long as a self-driving car causes fewer accidents than people do... Well, it's hard to measure, right? You would have to let these cars loose to figure out whether they're causing more or fewer. I was actually thinking that self-driving, but now maybe we're talking about the solution, is really only possible if we have exclusively self-driving cars. It would be easier, yeah. For sure. Yeah, but then we get into the question of what the critical point is. There is some sort of worst-case scenario in this, right?
Where everything is optimized against everything else, and the whole system fails and everything starts crashing; that scenario exists somewhere out there. So then we would have to get into, I guess, a more philosophical debate about whether we have fail-safe switches, and whether we are confident enough that they work. But then we're talking about the risk that arises if you automate too much: somebody can hack the system and create a huge problem. That is actually not what I had planned to talk about today, but we can talk about it as well if you want. Since self-driving has now come up a couple of times, and you'd say general robotics is something that deep learning alone is basically not enough to solve: are there other examples you would like to bring up? Because otherwise I would like to speculate more about this. No, I think we can probably think of examples, but it's this general area of things where it's not so easy to get at training data, right? Whenever there is data scarcity of some sort, whether because the application involves interaction with the real world, or because there just isn't enough data at all. In my own domain of language, these applications work very well for common languages which a lot of people speak, like Chinese or English or Spanish; a lot of data has been collected, and it works very well most of the time. But there are a lot of languages that only, say, 100 people speak, and they maybe don't even say that much. People are able to learn a language in that kind of situation, but our deep learning systems currently cannot: they need much more data than humans to get to some useful level of performance. So that's another example, not related to robotics, but with another type of data scarcity built in. Okay, and that leads to a follow-up question that I would have in general: are there things that we could add to deep learning, or maybe add alongside deep learning, which would allow us to do more of these things? In particular, if you talk about a language spoken by only 100 people: if we put a human among those 100 people, after a while they will get better and better at that language. So I can imagine a different kind of learning system which could do that. Is deep learning really dependent on big data sets, or could you... Yeah, current ways of doing deep learning definitely need much more data than a human being in a comparable situation, for language at least, and maybe not only there; also for driving. I think a typical person in their 20s can learn to drive in as little as 20 hours of practice, which is probably a very small fraction of the data an AI system would need. Humans, and many other animals similarly, have had millions of years of evolution which has built certain abilities into our nervous systems. That is not how deep learning works: you define some generic architecture, but other than that it's basically starting from scratch, a blank slate. So you need a lot of training data for these systems to reach some level of performance. So you're asking what could be added: well, of course, we can add more of what are called learning priors.
So, some kind of hard-coded preferences for certain things that would make it easier to learn a language, for example. Because if the structure, the innate preferences, of the deep learning system were more similar to the human brain, then it would also find it easier to learn from smaller amounts of data. But the thing is, this is an idea which has been around for a long time, and it hasn't been easy to implement, because we cannot easily reproduce evolution; it's computationally extremely expensive to do that. We have some workarounds, but so far nothing from this idea has really made a big difference. No, and I can imagine that, since there's so much success with deep learning at the moment, people say: well, I'm just going to build some more applications with deep learning, because I know I'll have success, rather than research things which might maybe give a success in 10 years or something like that. Yeah, I think the low-hanging fruit is scaling. For a lot of languages we do have a lot of data; for a lot of things like biological applications there is a lot of data out there. So the people who are doing that are just doing the things that are already possible with current technology, scaling it up to these big data sets, because they can achieve very good results and very practical, useful things without having to think very hard about new ideas and how to make them work. That is very difficult, right? It doesn't happen every year that someone comes up with a brilliant new idea that revolutionizes some field. Yeah, and just to add: I think if we had a good idea about what to add, we would not share it. Yeah, we wouldn't be sitting here. Now, there have been some recent papers; I remember one recent one by Yann LeCun where he tries to reason about which components will be necessary to get existing deep learning algorithms to the next step. If I remember correctly, because I've mostly seen this fly by on Twitter rather than read it in detail, it's long-standing things like better memory, world-knowledge representations, some sort of teacher-oracle interactions, certain playgrounds, and ways to synthesize data. These are all components that are studied individually, and these ideas, again, are not new; they've existed in AI for quite a while, but a good implementation that has been demonstrated to work isn't there yet. I think all these ideas probably have some value to them; it's the combination, and implementing them into something that actually scales to such large data sets, that doesn't exist yet. Yeah, okay, good. I would like to talk a bit about computational creativity, because that's also driven by these deep neural networks, as far as I can see. We've seen these systems coming up recently: it started with DALL-E and then DALL-E 2, and then we got variations of that which are easily accessible. And we have the language systems like GPT, which has had several iterations. Well, as I said a bit earlier: these things are impressive at first and look less impressive once you examine them. When I've seen them, my first instinct is: oh, this is incredible. And then I look a bit closer and think: actually, it's not that impressive what I see here, and I see some mistakes in it. So what's your idea about this? What are your opinions?
Yeah, I think that's how it works with people: we are impressed at first, and then less impressed when we start to see the little glitches. And when I suddenly see "iStock" in the image, I think: okay, the computer just put an iStock watermark in there. Yeah, but I mean, why wouldn't it, right? Well, it shouldn't; if it understood the image, it would leave that out. But for the computer, this is part of its world, right? You have never told it that these watermarks are something that shouldn't be there. Put yourself in the place of this system: you don't interact with the world in any way other than through this training data, and in the training data some percentage of the images have watermarks on them, saying iStock or whatever, Shutterstock or something like that. So for you, it's a natural part of the texture of many objects, and that's why it does it. So that's maybe not a brilliant example, but there are other things. Sometimes you see a hand with six fingers instead of five; you can argue that that kind of mistake is less justified. And counting objects is often a little bit iffy for these systems. Yeah, and in the newspaper, for instance, they gave some example where they asked for a robot as a painter, and it showed a robot whose arms were not connected to its body. An artist, if they had done that, would have made that decision deliberately; for the computer, it's just generated. Yeah, again, I would speculate that this is related to how these systems interact with the world. They don't interact with the world other than through these images and textual descriptions of images. A person probably has a much better understanding of how things are connected in the world, because they interact with real objects; they have some physical understanding of how objects behave. None of this is present in these systems, other than as a very tenuous side effect of looking at gazillions of images and textual descriptions. I think systems that watch videos would probably be a little bit better, because they would be able to see continuity in time and how things move. Isn't that much harder to analyze and to learn from? No, I think it's just less easy to scale, because instead of a single image you have, depending on the frame rate, between 25 and a couple of hundred images per second, so it's just a lot more data to process. But you have more information, and the more information and the more aspects of the world these systems have access to, the better they will get, and I think these kinds of glitches will become rarer. That's a good question. One of our colleagues was actually interviewed recently in the newspaper, and he remarked on the impressiveness of what we can do now, which is really justified. And he said: well, maybe in 10 years we can generate movies this way.
And I was thinking: I don't see that happening, generating movies. Because, first of all, you cannot just learn a general structure from observing a bunch of movies and then generate a new movie on that basis. And also, movies are fundamentally 3D: even if they're projected in 2D, it's a projection of a 3D world, so the system would have to learn complete objects in some way. So what do you think, would 10 years be enough? Of course, 10 years is so far in the future in AI that you can always say: well, I don't know, or maybe. But then we would just be making predictions; we'd have to meet again in 10 years. Yeah, I hope I'm still alive by then. I would say that generating short cartoons is probably already possible. Yeah, but those are usually 2D. Well, no, you still have occlusion. And yeah, I would say that in 10 years we can definitely have at least shortish, reasonable clips; I would bet that that's going to be possible. I think it also depends on what you classify as a movie. And I just want to tie this to your previous question: there are now a lot of systems, GPT-3 is an example of this, but also, for example, Stable Diffusion, whose generations are all prompt-driven. There's a little text that people enter as a starting point, and then out comes a picture, or more text. And I think we're inclined to not even put the input and output side by side; just the surprise of the things that are generated appeals to us in a way. I think Stable Diffusion was used to win some sort of modern art competition a few weeks ago. And if you look at that picture, it really is visually pleasing; it looks nice indeed. If you zoom in, you see these irregularities, but that's not what we focus on, right? We focus on the main thing, look at it, and think: wow, this is spectacular, this is not something we're used to. There's obviously a lot of copying involved with these algorithms, but you can also argue there's a lot that is new. We humans, as artists, are also inspired by certain genres, styles, techniques, et cetera, and it also takes us time to deviate from those and come up with something completely novel. Whereas these things are basically some sort of magic eight ball or randomizer: give me something in this style. Right now it's very interesting to look at, but it is a bit boring, in the sense that it mostly comes up with abstract, interesting things that only have some sort of quality because you're either impressed by the algorithm or impressed by the fact that it's not like anything a human would produce. And I think with the movie question it would be the same: if you treat it as some sort of avant-garde movie, it's probably quite easy to be impressed. You just pretend that you're some director, you put together this thing, and people will be very impressed. But doing a blockbuster, for that you probably need indeed all the things you mentioned: a stable script, things that follow each other in a logical order, some story going on that reaches a conclusion, a lot of stuff that has to be good, not just coherent.
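For readers who want to see what "prompt-driven" means in practice, here is a minimal sketch using the Hugging Face diffusers library to run a Stable Diffusion model. The model identifier and the GPU assumption reflect what was common around the time of this talk and may well have changed; the prompt is just an illustration.

    # pip install diffusers transformers accelerate torch
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",   # a commonly used checkpoint; may have moved since
        torch_dtype=torch.float16,
    ).to("cuda")                            # assumes an NVIDIA GPU is available

    # The prompt is the "little text people enter as a starting point".
    image = pipe("a robot painting a self-portrait, oil on canvas").images[0]
    image.save("robot_painter.png")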
But I was actually talking this afternoon to a woman who's in game development, she works for a commercial company in that respect, and she told me: look, artists make concepts for us. And I actually know that in the game development cycle, 70% of the money goes into art and only something like 15% into programming. And she said: well, we ask the artists to make concepts, and they now use these kinds of tools, and what usually took them a month they now do in a day. Which made me think: okay, so many artists are going to lose their jobs now. Yeah, or maybe they will just adapt to that and collaborate. Some artists will get better jobs, that's for sure. It's the same with translators and a lot of professions which you can now automate: translators are now probably doing proofreading or post-editing rather than translating from scratch, and it's probably similar with a lot of artists. They won't be manually painting every single frame of a cartoon, but they will give some general instructions and the AI system will fill in the details and the rest. I'm not sure what the economic impact on the profession will be; probably in the short term a lot of people lose their jobs, but in the long term people adapt to technology the way they always have: they start collaborating with the machine in a way that is more productive. Yeah, and we still have a lack of people for jobs; we still have lots of vacancies. It's not bad if we lose a couple of tasks that we don't need to do anymore. So, when I talked to you beforehand and mentioned a couple of applications, you came up with something that I hadn't heard of at the time, but immediately afterwards I started seeing it everywhere, and that was AlphaFold. Actually, somewhere around 2005 or 2006, I think, there was an application called Foldit, where humans collaborated with computers on protein folding, which could be very useful. And what they found is that humans actually did it better than computers, but you could teach computers the same techniques that humans used, and the computers became better as well. And then you mentioned AlphaFold, and I was quite impressed when I heard about it. So can you speak about it? Because if people don't know about it, I think it's worthwhile mentioning. Yeah, I think it's a very interesting case, because the other examples we've been talking about so far... Maybe also talk a bit about the application. ...are things that people can actually do, like recognize images or translate. But this is something that people are maybe not so good at. So the idea is to determine, from the sequence of amino acids in a protein, its three-dimensional structure. Proteins consist of amino acids; amino acids are these little building blocks, of which there are about 20 or so, and they are chained together, from tens to hundreds to thousands of amino acids in a protein. And then the physics and chemistry of these amino acids makes the protein fold into a three-dimensional shape. The three-dimensional shape of a protein is very important because it determines its function: if you have an enzyme, or a drug, or anything functional, that function is basically determined by the three-dimensional structure. What is the exact shape? Does it fit into this particular other protein, and so on? So this is an extremely important thing in biology, in medicine, and in similar applications. And it's a very difficult problem: theoretically, there are exponentially many ways the protein can fold itself. By the classic back-of-the-envelope estimate, with even just three stable conformations per amino acid, a chain of 100 amino acids already has 3^100, roughly 5 x 10^47, possible configurations.
But in reality, typically only a very small number of those are stable, and typically only one of them ends up being the real three-dimensional shape of the protein. And as you mentioned, there had been a lot of work on this problem for many years, with relatively slow progress. But in 2018 there was actually a big jump. There is this competition (CASP) where scientists come up with algorithms which predict the three-dimensional shape from the sequence of amino acids in the protein; every two years they run a benchmark, a competition, and see how well their systems are doing. In 2018 there was one entry which scored much higher than the others, and two years later it got to a score of about 90 out of 100, which is very close to perfect. This system was called AlphaFold, and then AlphaFold 2, and it was developed by people from DeepMind, which is a subsidiary of Google. So this is now at a level that is basically a factor of two better than what we were able to do before: I think before that breakthrough the scores were somewhere in the 40s, and now the score is 90 or a bit higher. And this makes it usable in the very practical applications in biomedicine that I mentioned before. As far as I know, this program has been applied to a lot of protein sequences, their three-dimensional structures have been predicted, and this database has been made available for scientific applications. So I think this is a big thing. People can't easily do this; it's not just doing something that people can already do, only faster. Determining protein structure experimentally is very slow and tedious work, which involves freezing a protein and taking electron microscope images of it and so on; it's long and expensive. Now we have shortcuts, and that's amazing, I think. And, I don't know if you know this, but I can imagine, because such an application predicts what the folding will be: is it always correct? Probably not, because it's 90% accurate. Yeah, there are some cases where it's not; again, it's not exactly my field, I follow the biology but not all the intricate details. But it's definitely much, much better than what we could do before, and much cheaper and faster. So these predictions are, I think, super useful: if you have some candidate proteins, you can narrow them down and then do the microscopy work to determine the actual structure. Right, because that would be my question: you may still have to check them, but you have a pre-selection now, which helps. Okay, anything you want to add to that? No. So, I actually had two questions left, but we are almost running out of time, so maybe we should move toward wrapping up. Can you tell me a bit about your own work for, let's say, the next year or two or three? Not 10 years, because I know you're going to make movies then. Sure, I'll start. So, as I said, I'm generally interested in problematic algorithms. Work that I've done before looks at, for example, how algorithms can infer things about us from our language use. Let's say you write a bunch of social media posts; based on these, an algorithm might be able to predict certain things about you, which might be used for good applications, right?
So if we want to know some demographic information about you, then age and education level might be relevant, and this is an easy way to collect that data, but obviously there are ways this can be used more maliciously. Some direct examples: and I'm not saying these are necessarily accurate or good algorithms, but there is definitely work that tries, for good causes, to predict whether you suffer from some sort of mental disorder or depression based on your language use. These things are relevant to study for certain areas of research, but you might not want to reveal that kind of information. So providing some insight into this, and into ways to not share that information, is relevant. That's what I have been trying to study: ways of changing your language use. But I think there's a much broader issue here. This is a very direct example of how an algorithm might be employed for not-so-nice things, but a lot of it is also hidden. The internet is mostly driven by how we're shown particular content, ads or the ranking of social media posts, and this creeps into things. I wouldn't want to use the word manipulate, but it certainly exposes us to a very limited set of things. We're not really sure what drives that, and I'm sure some of the companies that provide these things don't know either, but it's very understudied how these things affect us if there is some sort of bias in these very large systems. So that's what I'll be studying in the next few years, or at least starting on, because it's a very large endeavor. Very socially relevant. So that's the TL;DR, I guess. Yeah, and for me, my main focus of research is on language, especially spoken language. We mentioned that there's been a lot of progress on many language-related applications, such as translation or speech recognition. However, the way it works is maybe a little bit less than ideal. For example, to train a deep learning speech recognition system, we typically give it a lot of audio with speech in it, together with the corresponding transcriptions, that is, the written version of the same thing. If you give it, I don't know, a few tens of thousands of hours, it will figure it out, and you get a system which transcribes speech. But of course, if you look at how kids learn to speak, they don't need that much information, and they don't need this kind of close supervision: they don't get written and spoken input which closely match, at the same time. They just listen to people chatting, maybe try to say something themselves, and figure out that what people say in a certain situation is often related to that situation in some way, though it's not clear exactly how often, or to which aspects of the situation. So kids learning a language are very good at working with very noisy, very loosely correlated data, and relatively little data: a few thousand hours is basically enough for a kid to start speaking in a reasonable way. And so my colleagues and I are trying to figure out how we can make deep learning systems which approximate some aspects of this human ability.
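As a sketch of the standard supervised setup being contrasted here, pairs of audio and transcription trained with a CTC loss, here is a toy PyTorch version. The shapes, vocabulary, and tiny encoder are all illustrative stand-ins, not a real speech recognizer and not the setup used in the speakers' research.

    import torch
    import torch.nn as nn

    vocab_size, blank_id = 30, 0        # e.g. letters plus a CTC "blank"; hypothetical
    encoder = nn.LSTM(input_size=80, hidden_size=256, batch_first=True)
    classifier = nn.Linear(256, vocab_size)
    ctc_loss = nn.CTCLoss(blank=blank_id)
    params = list(encoder.parameters()) + list(classifier.parameters())
    optimizer = torch.optim.Adam(params)

    # One fake batch: 4 utterances of 200 frames of 80-dim audio features,
    # each paired with a 20-token written transcription.
    audio = torch.randn(4, 200, 80)
    transcripts = torch.randint(1, vocab_size, (4, 20))
    input_lengths = torch.full((4,), 200, dtype=torch.long)
    target_lengths = torch.full((4,), 20, dtype=torch.long)

    hidden, _ = encoder(audio)                          # (batch, time, hidden)
    log_probs = classifier(hidden).log_softmax(dim=-1)  # (batch, time, vocab)
    loss = ctc_loss(log_probs.transpose(0, 1),          # CTC expects (time, batch, vocab)
                    transcripts, input_lengths, target_lengths)
    loss.backward()
    optimizer.step()

The contrast with how children learn is visible in the loss line: it is only defined when audio arrives together with a matching written transcription, which is exactly the close supervision that kids do without.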
And so, going back to movies: we are currently, for example, looking at children's cartoons, and trying to use children's cartoons such as Peppa Pig as a kind of testing ground for that idea. In a cartoon, the characters are speaking, and the cartoon depicts everyday situations. We just give this data to the computer and try to make a system which learns language from that kind of input: the system should understand which concepts are present in a spoken sentence just by listening to these sentences and seeing the scenes in which they were spoken in the cartoon. And hopefully in the future we will move to more realistic movies, and maybe also to more real-world interactions. That is definitely something I'm looking forward to. Okay. Well, thank you. You both just gave me an idea for a follow-up podcast, but then I can ask you who I should invite for that, because we probably should not repeat this one. Is there anything more you want to add? If not, then thank you for being here.