Think about it: everybody walked through the outside — it's very beautiful here — to come into this building. You looked around: trees, moving cloud patterns, the ocean, the grasses, other human beings, people talking on their phones. Of all those things I just mentioned, how many are at thermal equilibrium? None of them. Now, human beings like to surround themselves with dead stuff, so here we are, human beings surrounded by more dead stuff than there is outside. But even in here, of those systems that are interesting — which for most people includes the social media on their phones, but also things like this and the other people in the room — how many are at thermal equilibrium? None of them. So arguably almost every really interesting system is not at thermal equilibrium. Arguably, I would say, one of the most impressive things about equilibrium statistical physics — all statistical physics up through the end of the 20th century — is how much they managed to exploit an approach to understanding the world which is restricted, badly restricted, hobbled; fettered is the word in English. Horses have fetters around their feet so that they can't move. It is fettered in its very foundations. That is what is so amazing about the people who got the Nobel Prize, Parisi and so on: they managed to figure out so much with a purely equilibrium approach, given that the world — everything from cellular biology to neurobiology to computational processes — is massively far from equilibrium. Sometimes people like to call these complex systems; those are the kinds of things that I think are the most interesting, and they have not been able to be addressed using conventional equilibrium statistical physics. Okay, so, let's see, come on, move. Why will this not move? There we go, okay. So this is really a big problem — what do we do? Here, if I had more time to put this together, there would actually be audio in the background, Baroque trumpets riding to the rescue. Over the past several decades, starting maybe in the very late 1990s but really gathering power in the 21st century, there is this field that's sometimes called stochastic thermodynamics. Other people sometimes confusingly refer to it as non-equilibrium statistical physics; the reason that's confusing is that that phrase was used in the 20th century for some very, very simple models of things like percolation and so on. Stochastic thermodynamics has essentially been built by people like Chris Jarzynski, Gavin Crooks, Udo Seifert, and many others, and it is starting to be understood and appreciated by the physics community — these things always take a lot of time to be appreciated by the other scientists. What stochastic thermodynamics is all about is using the core idea of statistical physics — that we need probability distributions to capture all that we don't know, while we do know something about the physical processes going on underneath those probability distributions — and turning it into a set of tools for analyzing systems arbitrarily far from equilibrium. There have now been probably a hundred experiments or so, covers of Nature and whatnot, testing and confirming the different predictions of this theory. Where did my shared screen go?
Okay, I've got a chat here too, but let's see. Okay, so that's all well and good, and it would have been even more impressive with actual trumpet sounds in the background. What do I mean by an interesting system? One set of systems which is interesting — I'm not going to claim this is the only thing the word interesting means — is those that we intuitively think of as doing a computation. Information processing is sometimes used for a more mundane version of it. There are actually very deep questions here. For example, a simple hydrogen atom can be viewed as performing any computation — what's called Turing complete. If you interpret its states in an appropriate way, you can get it to perform any computation in the same way that what's called a universal Turing machine can; we'll get to what that means later on. So given that even a hydrogen atom can be viewed as performing an arbitrary computation, what does it even mean to say a physical system computes? Does every physical system compute, in which case it's kind of a vacuous concept? That's a very, very difficult question. Instead, what we'll be doing in this course is considering the interplay between stochastic thermodynamics — all that stuff with the trumpets — and computers as they are understood by computer scientists. These are really two completely different ways of viewing the world, and we don't know what's in the middle. I do not know, and I would be delighted to have somebody explain to me, how to tell, given just a Hamiltonian for the evolution of a system, or a symbolic dynamics, what computation it is doing — without imposing an interpretation on it, a way of assigning meaning to its states. Intrinsically, what is a computation? I have no idea how to do that. But we do know how to figure out the thermodynamic costs of systems operating off equilibrium, and we also have precise formal definitions of different kinds of computation from computer science. So rather than go after that central problem, what we're going to do in this course is learn some computer science, learn some stochastic thermodynamics, and see what those two can say about one another, okay? So, let's see — I'll get to that at the end. Here's a quick overview of the syllabus. Today, as soon as the pompous windbag that's me gets off the stage, my delightful teaching assistant, Gülce Kardeş, is going to present a review of information theory. Saying this, I feel a little bit like one of those cliché 19th-century magicians on stage about to saw somebody in half. In any case, in the second hour today, I'm going to present a way that you can use information theory to derive equilibrium statistical physics. This is all to equip people with the right set of tools so that we can move forward. Then tomorrow, the first lecture is going to use a more conventional derivation, just to give a review of conventional statistical physics based upon infinite heat baths. After that, I'm going to present continuous-time Markov chain based stochastic thermodynamics. This is the core of this new discipline.
The way that you basically decompose the time derivative of the Shannon entropy to get a whole bunch of different, very powerful bounds on what computational systems can actually do. Some of those bounds are things like what are called thermodynamic uncertainty relations. Some of them come from what is called finite-bath thermodynamics, which is a very intriguing set of approaches. Everything up through here assumes you have an infinite bath that is always at thermal equilibrium. That's the way the statistical physics you learned when you were younger — the 20th-century statistical physics — was done. No bath is truly infinite, or more to the point, no physical system accesses all the degrees of freedom of an infinite bath. Real baths, real reservoirs, are finite. That's what you find when you look inside the innards of eukaryotic cells, the innards of computers like this one, and so on. It turns out that, inspired and motivated by a lot of these kinds of results in continuous-time Markov chain stochastic thermodynamics, you can actually extend them to get some very interesting results for the case where the bath itself is finite. So it's not just the system that's off equilibrium; so is the bath. Then I'm going to show how to use that for quantum thermodynamics, where life gets more slippery, more interesting, richer — in some ways that you would wish it weren't, in some ways that it's very nice that it is. Come on, there we go. Then we'll do a little bit of computer science, a little bit of an overview of the Chomsky hierarchy. This is the most canonical undergraduate introduction to computer science; the standard models of computers are all basically elements of the Chomsky hierarchy. The focus there will be on what's sometimes considered the simplest interesting computational system, called a deterministic finite automaton. Then on to Turing machines — which only has one "s", in case anybody's curious — and what's sometimes called algorithmic information theory, which is basically a way of taking a lot of the concepts of Shannon's information theory, which is going to be reviewed in lecture two, and formulating them in terms of Turing machines instead. The idea is to get away from the need for probabilities completely — to be able to say, if I've just got a single instance of some system's dynamics, how complex is that single instance, as opposed to using Shannon entropy to answer that kind of question for a distribution over instances. Some amazing minds are involved with all this. Conventional information theory, that's of course Shannon; algorithmic information theory, that's Kolmogorov; Turing machines, well, that's Turing. So as a goal for people here: if you want to have your name in front of a machine, all you've got to do is make some brilliant insight on the same scale as Alan Turing did. Anyway, then we'll be presenting the stochastic thermodynamics of computers. Everything up to here is computer science, that stuff is statistical physics and stochastic thermodynamics, and now we join the two in a messy mishmash. First, graphical models, the thermodynamics of Bayes nets, the thermodynamics of what are called composite systems.
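As a preview of the decomposition of the time derivative of the Shannon entropy mentioned above, here is the standard textbook form (e.g. as in Van den Broeck and Esposito's review), written as a sketch rather than anything specific to these lectures. For a continuous-time Markov chain with rates W_{ij} (from state j to state i) and distribution p_i(t),

\dot p_i = \sum_j \left( W_{ij} p_j - W_{ji} p_i \right), \qquad S(t) = -\sum_i p_i \ln p_i,

\frac{dS}{dt} = \underbrace{\frac{1}{2}\sum_{i,j}\left(W_{ij}p_j - W_{ji}p_i\right)\ln\frac{W_{ij}p_j}{W_{ji}p_i}}_{\dot S_{\mathrm i}\,\ge\,0\ \text{(entropy production rate)}} \;+\; \underbrace{\frac{1}{2}\sum_{i,j}\left(W_{ij}p_j - W_{ji}p_i\right)\ln\frac{W_{ji}}{W_{ij}}}_{\dot S_{\mathrm e}\ \text{(entropy flow from the bath)}} .

The non-negative first term is what the various bounds (thermodynamic uncertainty relations, speed limits, etc.) constrain.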
Then, one of the kinds of thermodynamic costs that has been discovered in stochastic thermodynamics is what's called mismatch cost, and here we'll be going over mismatch cost in various circuits — circuits with deterministic gates, circuits with stochastic gates. There might be some work here as well on mismatch cost in deterministic finite automata, what are called Fredkin circuits, and so on. There is some brand spanking new work being done here as we speak — hopefully it'll be completed by then, but that's all on me. Anyway, inclusive thermodynamics: these are finite-bath kinds of models that we can apply to arbitrary computational machines. Basically, it turns out that in this finite-bath thermodynamics there are some assumptions people have to make when they formulate it thinking of it only as applying to physical systems — some weird things that have to be assumed about the initial conditions. But when you start considering computational machines, those are no longer weird assumptions you need just to make the math work; they're actually part of the definition of a computational machine. So it really is a very clean marriage to apply inclusive thermodynamics to computational machines. Anyway, then a little bit on the stochastic thermodynamics of Turing machines specifically, and then the hazing ritual: I am told that everybody here has got to go through the pleasantry of a final exam, and that will be the last two hours of the course. There are hopefully going to be two problem sets — they might end up being one, so all of this is somewhat in flux, but that was at least the original goal — and they have associated due dates. This is all going to be visible on YouTube, I assume. Okay, excellent, because here's just some recommended reading. For stochastic thermodynamics, there's a book that just came online: Naoto Shiraishi. I basically pleaded with him and got him to agree — he's writing a textbook, and the PDF can, right now, before it's actually completed, be downloaded. It's a very, very good textbook. Peliti and Pigolotti have a textbook, which is not downloadable — physical copy only. Sosuke Ito's PhD thesis is a book; you can grab it on the web. Takahiro Sagawa, same kind of thing. I have no idea why that one is indented and this one is not, but in any case, I've got a review article from several years ago — it's kind of big, but it covers a lot of the basics — called Stochastic Thermodynamics of Computation. Christian Van den Broeck — you'll see me refer to VDB often; that's shorthand for Van den Broeck — he and Massimiliano Esposito have a fabulous paper. You can read it in an afternoon. And as I always say — this is just me repeating myself — I never felt like I really understood statistical physics, never felt that it was legal and legitimate, until I learned non-equilibrium statistical physics, stochastic thermodynamics of the type that's in that book and in that paper. It's only, like, an 11-page paper. Before that, I just thought it was, well, frankly, bullshit. But once I read that, I was like, oh, okay, now this makes sense; this is the only way you can go after things. And it's interesting, the sociology of science: why, when they put together non-equilibrium statistical physics, were they able to do it in a so much cleaner way than equilibrium statistical physics? But in any case.
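Just to anticipate the quantity being referred to: in the stochastic-thermodynamics literature (Kolchinsky and Wolpert), the mismatch cost is usually written as the extra entropy production you pay when a process whose dissipation would be minimized on a "prior" distribution q is instead run on an actual distribution p. A standard statement, as a sketch:

\Delta\Sigma_{\mathrm{mismatch}} \;=\; D\!\left(p_0 \,\|\, q_0\right) \;-\; D\!\left(p_1 \,\|\, q_1\right) \;\ge\; 0,

where p_0, q_0 are the initial distributions, p_1, q_1 are their images under the process's dynamics, and D(·‖·) is the KL divergence reviewed later this morning; the inequality follows from the data-processing inequality for KL divergence.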
For conventional statistical physics — what I just impugned — I actually like Sethna's book, Entropy, Order Parameters, and Complexity. A lot of the things that, when I first learned equilibrium statistical physics, I thought were just crap — people making things up, violating the laws of math — Sethna actually goes through cleanly and shows why that particular point is legitimate, and so on. He pays attention to the things that other people just skip over. For computer science theory, there are tons of textbooks; Arora and Barak is a nice introductory one. For Turing machines and algorithmic information theory, there's really no alternative with the same breadth and completeness as Li and Vitányi; there are several editions of it. Chris Moore, a colleague of mine at the Santa Fe Institute, has some lecture notes on deterministic finite automata, which I really like. Hopcroft, Motwani, and Ullman — that's one set of three authors — is kind of the standard textbook from a couple of decades ago. I find Arora and Barak and so on to be easier reading than that one, but I think it's a pretty good one as well. For quantum mechanics, Nielsen and Chuang is the standard textbook. There's a huge amount of material that we're not going to go after in this course, despite how broad it is going to be. All of quantum computation. Notice that — for people who know something about quantum computers — the actual dynamics of a quantum computer is a unitary, which means there is actually no entropy dissipation. Thermodynamics does not really tell you anything about a quantum computer as it's operating. But you've always got to get data from the classical world into your quantum computer and from the quantum computer back out to the classical world, and if you think about it, we ultimately want to be able to move massive amounts of data in and out. All of these fantastic things about AI in the last decade or so are really based upon nothing more than massive amounts of data. Neural nets, frankly — and I can say this because I was one of the first people to work on them — are kind of junky. There's nothing particularly profound there; they're just held together with gum and baling wire and tape. But the reason they've managed to work so well is, A, you can actually implement them on GPUs, specially made processing units, and B, because of all those massive amounts of data. So if we ever want to marry the big-data revolution with quantum computers, we're going to have to figure out a way to get massive amounts of data in and out. That interface between the quantum and the classical world is going to be all about thermodynamics. So even though the quantum computation itself isn't going to involve it, everything around it, everything that allows us to use it, will. And nobody actually knows anything about that yet, by the way — if you want to come up with a PhD thesis, you'd be one of the first. Then other things I'm not going to be talking about, back in stochastic thermodynamics: speed limit theorems, stopping-time theorems, first-passage theorems — Édgar Roldán here at the ICTP, I've been having some really interesting conversations with him, he's really smart, really nice guy, and he's got some results here that I'll be citing. Try now. Okay, all right, great. My bad, as they say, my bad.
Anyway, there's going to be nothing presented in this course on speed limit theorems, first passage, et cetera. All of — go away, you — all of computational complexity theory; a lot of that actually is being done. With — go back to share. What happened here, where did my Zoom go? I just got knocked out of Zoom, it seems. Does anybody remember what the passcode is? 664793, join without video. Okay, moving right along, never mind the technical difficulties. Other things that will not be talked about: computational complexity theory. There's a lot of great stuff on that — and I am muted, I'm supposed to be, go away — being done by somebody whose name sounds like "qubit", a great name for somebody in this field. Unfortunately, he passed away recently, but he's got some great papers. Naoto Shiraishi has some great papers on what's basically called uncomputability — systems that actually reach or exceed the limits of what Turing machines can do, if you interpret them correctly; that goes all the way back to work in the 80s by Pour-El and Richards. Arguments about what the physical Church-Turing thesis means, whether it applies or not, and so on — none of that will be discussed here. There will be nothing about the stochastic thermodynamics of what are called transducers, nothing about the effects of constraints on a system's dynamics — that's work by me with a postdoc and other people — there will be none of that. Nothing about the statistical physics of generative grammars — there's work by Max Tegmark and Eric DeGiuli on that. Highly detailed models of electronic circuits — Massimiliano Esposito, David Limmer — nothing on that; nothing on computation in cells, nothing on coding theory. There will be a little bit on asymptotic equipartition, but very, very little on these kinds of things beyond what's going to be reviewed this morning. Nothing about phase transitions in computational complexity problems. All of these things are not going to be talked about. Anyway — and this is the main thing that I think would be good for the students — here's a very, very partial bibliography, as of about six months ago, of all kinds of papers related to this basically exploding field. People can go through all of that at their leisure. Okay, so then just some operating rules of the road for this course. Please, please, please — I cannot say this enough — interrupt me. For whatever bizarre psychological reasons, I can just talk into a vacuum for a long time, but I am built assuming I'm going to be getting feedback. That's the way that I talk. I need that feedback, or it's just going to be one long drivel stream, okay? If you've got something that you don't understand, there are really good odds that other people in the audience don't understand it as well, so please say it. If you want to impress people who don't matter, then shut up. But if you want to impress the people who do matter, who really understand that, oh yeah, there is a subtle issue there — ask the question. That's the way to impress me, okay? For whatever that's worth. Also, here's a little bit of a warning. All of this is Matteo's fault: I've never given a course of lectures to students in my life. I've had students a lot, and back when I was a TA I guess I gave some kind of lectures and held office hours, but I've never done this, never given a course. I fully expect that there's way too much material here and we probably won't get to all of it. Anyway, be warned. I'm doing my best, so I'm learning as well as you, okay?
So that's even more reason for you to ask questions, because it's the cliché: things are obvious to the person who already knows the material and not obvious to the students. So please, please interrupt me for that reason as well. And that's it for these introductory slides. Gülce will now give a review of some information theory, and then, I guess, in the second half of this — we might already be running long, but I think that's fine — I'll at least start showing how you can use information theory to derive statistical physics, which is the opposite of the way it's done in the conventional introductory textbooks. Any questions at that point? Okay. Yeah, so — come on, go back. So we were thinking to have two problem sets, one that we were going to give out today, and then another one later on. So it's going to be Tuesday. Tuesday, okay. Okay, so those are the other things there. All right, any other questions about logistics or subject matter or the meaning of life, or just what color of dog you should get for your next pet? I've got great insights on that issue as well. By the way, if anybody's concerned — many people will confirm this — I do have the smartest, cutest, most wonderful dog on planet Earth. There's no doubt about that; that's an easy proof, what's called an existence proof. You can do the experimental test. So in any case, okay. Let's see, it's 11:30 now. We're supposed to go to 12:45 today, correct? Mistake, mistake — we'd be here for hours. Let's go to 12:45, because otherwise people will be burnt out; at least this person will be burnt out. Okay, great, thanks — somebody here is paying attention, thank God. So let's go to 12:45. We might not get through everything, but there are 18 hours of these lectures, so it might be that the second half of the material, on using information theory to derive statistical physics, bleeds over into the beginning of tomorrow morning. Okay, everybody fine? Okay, now somebody who actually knows what the hell they're talking about — Gülce, you're up. You see, that's the key to being a professor: you've got to make people think you know what you're doing. You don't have to actually know it, just make people think that you know it. Yeah, yeah. You want this thing. But are we given a break of five minutes, or do you want to go straight through? Can you put it in? No, no, it's here, it's here. Yes, like this, yeah. Okay, that's good. Now I'm going to do it like that. It's okay, it's working. Okay, now I need to drink water because I'm thirsty, but then I'll start. How many people here — sorry — how many people here have taken a course on information theory? Everyone knows a bit of information theory, right? Okay, perfect. So this is going to be a review lecture. What we're going to do is talk about the major contributions of information theory, with references to the subsequent lectures — ranging, for example, to the finite-bath formulation of computational machines. David didn't talk about it, but there is actually a corresponding information-source formulation of them.
So I'm going to put many references in, but do not expect — because we have a finite amount of time and it's one hour of lecture — that we're going to go through the proofs of, say, the noisy channel coding theorem; we're just going to be reminded of it. Okay, that should work. Okay, so if you would like to be reminded of some things about information theory, the Bible, the holy book of information theory, is Cover and Thomas — should I go like that? Okay, it's almost in color. Oopsie, sorry, sorry. Okay. So if you're curious about some very technical detail that we're not going to go through in this lecture, you can always consult that one. Another one I like is from David MacKay; it gives much more of an intuition for what information theory is. But my favorite, favorite one is this one: the original paper, A Mathematical Theory of Communication. So, going back to this point, we can start talking about information theory. Information theory is actually one of those rare, fortunate fields where we can identify its beginning — not an amorphous mass of research and debates. There was some really interesting work done by people like Nyquist and Hartley in the 1920s at Bell Labs, and then Shannon — probably really young at the time when he entered Bell Labs — built his insight on these works, which had used information-theoretic concepts for particular machines, such as telegraphs and so on. But what Shannon did was to take all of these ideas that came before him and turn them into a framework where we can universally discuss communication. So this is one of the major contributions that we're going to go through. So what is communication? Not really a philosophical question; I think it just involves three basic components. First of all, to communicate something, you need to have that something — you encode a message. This is the source level, okay? And it can be waveforms or binary digits or whatever you'd like. Then you need to transmit that message, so you need to have a physical — and this part is important — a physical medium to transmit that message. This is the communication channel. I don't know why this is making that noise; I haven't been using chalk for a long time, so yes, that's it, thank you. And then, after that, of course, you need to have a receiver that will decode that message, right? Okay, so we said that this is all physical, right? And if you have a physical environment, there is something you can't avoid, something that always wins: noise. Environments are noisy and communication is imperfect. So if I send a message from here to here, there is a non-zero probability of making some errors, okay? To tackle this kind of problem, there are two main approaches you can take: a more physical approach, or a system approach.
In the physical approach, what you're doing is trying to improve the physical components themselves — this is expensive and not good. For example, if your circuit is noisy, you can try to obtain more highly reliable circuit components and so on, but it's going to take a lot of time and you still cannot get rid of the noise. So we don't like that. The system approach, though, is neat, and it is what's given in Shannon's information theory. Through the system approach we can define — I'm going to number it zero, because I actually want to talk about two points that are central to information theory — point zero: defining the system approach through a formal architecture of communication systems. This is really important, because what you're doing is taking the physical details out — telegraph, telephone, mobile, I don't care — and putting it into a universal architecture of this form. Your information source generates some message, say a vector s, and then the encoder takes this message and adds redundancy, okay? That gives the transmitted message. Normally, if you consider noisy communication, what you have is just a source, a channel, and a receiver — a destination. The introduction of this computational ingredient, the encoder and decoder, is what allows you to have reliable communication, as we will see in the following minutes. So the encoder adds redundancy, and the channel has inherent noise and transmits this vector t as a received vector r. The decoder knows the redundancy added by the encoder — for example, if the encoder used some function to add the redundancy, the decoder applies the reverse — so that you can keep track, in a smart and neat way, of both the redundancy that you introduced and the inherent noise of the channel. Through this formal architecture you can model whatever you'd like. This is one of the beautiful things about information theory, and it is actually reminiscent of the Chomsky hierarchy: we define a formal architecture so that we can talk about the ultimate limits and bounds on the things we would like to do. In computer science and in the theory of computational complexity, this comes with questions like: can you solve this task given this formal architecture, and can you solve it efficiently? In the case of communication theory — information theory — instead of solving a task of computation, you are solving a task of communication. And these tasks of communication generally come in two classes. One of them is data compression: what are the ultimate limits to the efficient representation of data? This is related to what we know today as the source coding theorem, okay? The other part comes with data transmission, and this is the part concerning channel coding. Okay, so of course it's easy to talk about them, but where is the formal structure? So, just as a spoiler reminder, because we know about it:
So the ultimate limit to data compression is actually captured in terms of the entropy, H. And the ultimate limit to data transmission is captured in terms of the channel capacity, which is given in terms of the maximum of the mutual information. We will discuss more about that, but let me just put them here. Okay, so, as a reminder of how these things go: say you have a random variable X drawn from some distribution p(x), where X takes values in some alphabet. You can write down the Shannon entropy in this form. As usual, it's the measure of the average uncertainty in a random variable, and so on. And there are some cooler descriptions of it: for example, if you want to encode a random variable, it also gives you the average description length that you need. This is for one single random variable; you know how to write the joint entropy as well, and it goes the same way. This is one important thing that we're going to use throughout our lives, probably — it saves lives. Anyway, building on that: we said, okay, if you want to encode some object, a random variable, you need on average H(X) bits, the entropy of this random variable. But it's actually dependent on the distribution that you use to define the random variable, right? So what if you think that this random variable is described by some other distribution q(x) instead of p(x)? Then, okay, the cost of representing X when X is drawn from p(x) is the entropy H(X), but there must be something else: if you do not know what the underlying distribution is, there must be a mismatch cost, right? This mismatch cost is quantified by something we call the KL divergence. It's basically a measure of distance between two probability distributions: it tells you how inefficient you become if you assume a probability distribution q for a random variable X that is actually described by a probability distribution p. So now we've talked about the entropy — we're going to see it again in source coding and so on — and we've defined this one. And one interesting thing we can consider, as we did here, is a joint distribution of random variables: think of a setup where this quantity would be zero and what that would mean, and where this quantity would be non-zero and what that would mean. We know that if you take two independent random variables, you can write their joint distribution as a product distribution, right? So if you plug that into this formula, what you're going to get is zero KL divergence, okay? And this means something: if you have two independent random variables, you're not going to get any mismatch cost, because they have no dependency. But if there is some dependency, you need a quantification — a measure of, say, the uncertainty of a random variable X that is reduced by knowledge of another random variable Y. What do you do? You use exactly this as a measure, and it works well. This is the mutual information. Okay, I won't say more about it for now.
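For reference, here are the standard definitions being written on the board, in the notation of Cover and Thomas (base-2 logarithms give bits):

H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x), \qquad D(p \,\|\, q) = \sum_{x \in \mathcal{X}} p(x) \log_2 \frac{p(x)}{q(x)},

I(X;Y) = D\big(p(x,y) \,\|\, p(x)\,p(y)\big) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)} = H(X) - H(X \mid Y),

so I(X;Y) = 0 exactly when X and Y are independent, matching the remark above.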
Okay, so these are the basic elementary concepts that we're going to use in the rest of this. So, okay, you can go, I guess. Now we can start talking about data compression and source coding. One of the things we want to do — and Shannon considered it, I think, in a brilliant way — is to talk about how reliably we can communicate with some place, or with something: how we can send data in an efficient and reliable way. It brings us back to the question of, for example, should you send this data one time, or two times, or how many times? One of the things that is, I think, sort of stealing inspiration from statistical mechanics and thermodynamics is Shannon's key insight that, to quantify the reliability of this communication, you need to assume n successive, repeated trials of communicating or generating some symbols, okay? For that, he used a concept that at first seems strange and that we're going to take a glance at: the asymptotic equipartition property, okay? This is an analog of the law of large numbers. Just remember what the law of large numbers says: when you have random variables x1 to xn that are generated i.i.d. from a distribution p(x), their sample mean gets really close to the average behavior, okay? What the asymptotic equipartition property says is a version of this for the entropy of these identically distributed random variables instead of just their mean. Okay, I should actually write it — I can take this again, it's not going to be a problem. I'm writing equals, but it's actually convergence in probability; you can check the details in Cover and Thomas. What it's basically saying is: think about all the data sequences that you can generate, okay? Given a finite alphabet, you're asked to generate this kind of a set. And another key insight was that there must be some typical set such that, because this asymptotic equipartition property holds, when you choose a data sequence from this typical set, it has probability roughly two to the minus n H(X) — any data sequence from that set, typically. Based on that, he defined a typical set. I'm going to come back to that; it's actually one of the most counter-intuitive concepts in information theory. I used to have this tendency to think, oh okay, if it's typical then there's probably some threshold of probability, and the set contains the data sequences generated with probability above that threshold. But it's not like that. Sometimes in mathematics, if you would like to come up with a useful theoretical tool, it trumps your intuitive preferences, and the typical set is a perfect example of that. But basically, this is how we most frequently denote the typical set: it contains the elements whose probability of occurring lies between these two bounds, in these parentheses. Sorry, what?
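A compact statement of what is being written on the board, again in Cover and Thomas's notation:

\text{AEP:}\quad -\frac{1}{n}\log_2 p(X_1,\dots,X_n) \;\longrightarrow\; H(X) \ \text{in probability, for } X_i \ \text{i.i.d.} \sim p(x),

\text{typical set:}\quad A_\epsilon^{(n)} = \left\{ (x_1,\dots,x_n) : 2^{-n(H(X)+\epsilon)} \le p(x_1,\dots,x_n) \le 2^{-n(H(X)-\epsilon)} \right\}.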
Okay, yeah, nice, okay. So we write it this way, but if you just take the logarithm and divide by n, what you're going to see is that, by the definition of such a typical set, the typical sequences satisfy the following property. So a typical set is a set of sequences — a subset of all the sequences that you can generate — such that, if you sample from this typical set, the empirical value of the entropy that you get will be incredibly close to the actual entropy over the set of all sequences that you can generate, okay. So, based on that — again, he sort of invoked this for mathematical reasons — he thought that, okay, if we generate all of these sequences, then there must be a typical subset of sequences that occur disproportionately often in the data stream, okay. And by using this asymptotic equipartition property, we can guarantee that typical sets exist for any kind of data source, and that they can be small — as small as it gets. So this is a universal result that gets used. Why do we talk about this? Because it gives an intuitive understanding of source coding. AEP is the abbreviation for this, by the way. And I'm using data compression and source coding in a synonymous manner; they're both about efficient representation, okay. So one of the basic setups is this: we generate all of these sequences that can be generated by a finite alphabet, and then we divide them into the typical set and its complement, the non-typical set. We would like to ask the question: okay, how can we actually encode these? For example — and this picture actually comes from algorithmic information theory — think about a phase space, okay, and suppose you want to discretize this phase space into these points, okay. For each point that you have, in the typical set and in the non-typical set, you would like to encode it as an object by itself: you would like to assign a code word, an expression, to each of these points, okay. One of the results that you can prove — again, Cover and Thomas, chapter two or three or something like that — is that you can actually bound the size of the typical set. This is one of the beautiful things, and it is also one of the things that makes the typical set necessary, because otherwise you might not be able to bound the entire set itself, but you can do it for this guy. So the size is bounded like this, okay. Given this, we now try to come up with a really straightforward coding scheme that would allow us to encode these dots in this discretized space. One thing that we can do is just index them — use some ordering, like the lexicographic ordering. And when we do this, the thing we get is basically: okay, how many bits do I need to encode these points? For this guy over here, just take the logarithm with base two. We will see that if you have this many points in your typical set, then you will need, on average, about nH bits to describe — to encode — this typical set, okay? And you can do the same thing for the non-typical set, but we don't really care about it.
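The bound being referred to, and the counting argument behind the "about nH bits" statement, in the standard Cover and Thomas form (chapter 3):

|A_\epsilon^{(n)}| \le 2^{\,n(H(X)+\epsilon)}, \qquad |A_\epsilon^{(n)}| \ge (1-\epsilon)\,2^{\,n(H(X)-\epsilon)} \ \text{for } n \text{ large enough},

so indexing the typical sequences (say, lexicographically) takes at most n(H(X)+\epsilon) + 2 bits per sequence, including a flag bit to mark typical versus non-typical — which is about nH bits per block for large n.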
Let's say the size of the non-typical part is at most the size of the whole set of sequences, so you're going to need roughly n times the log of the alphabet size bits for those. Yeah, okay, there are some subtleties, but we don't care about that. Okay, so this is actually one of the first results of data compression. If you check in textbooks, the source coding theorem comes in two forms. One of them is the source coding theorem for this rather plain, boring kind of data compression, and it basically says that if you have i.i.d. random variables, then to encode them you can just use nH bits and you're good to go. Below this you cannot encode all the information: it is almost virtually certain, they say, that you lose information. To not lose information, you need at least this amount, okay? So this is the first part of the source coding theorem. But this is actually a bit boring, so what we will do is talk about the real source coding theorem, where we use different coding schemes. Let me just check the time, because I'm talking a lot. Okay — entropy rate; I think I can do that. Okay, just like that. Okay, how is the pace? Give me feedback: is it good, or is it bad? I'm an undergrad — I just need to say this — so please, you can be blunt with me. Culturally speaking, with the academic hierarchy and so on, you have the seniority, so please don't hold back. Okay, awkward, sorry. Okay. So we just stated, with the AEP, that you can use on average nH bits to encode n i.i.d. random variables. But what if you have a growing sequence? Can you still use the entropy itself as a measure? Because we know that if you have x1 to xn with large n, is the joint entropy even going to be bounded? In most cases, no. So for that we need to come up with a measure for a process with growing n, so that we can still talk about how the entropy is changing — and the measure we use for that is the entropy rate, okay? Before defining the entropy rate: we just consider any kind of stochastic process like this, indexed by time — it's a sequence of random variables; you don't have to go deeper than that for this course. And there are, of course, so many interesting properties of stochastic processes, but the ones we are going to consider concern, for example, dependency — because the AEP applies to random variables that are i.i.d., right? But there might be arbitrary dependency between random variables, and they might have some really interesting properties, such as being stationary or non-stationary, and so on. So — okay, everyone knows about stationary processes and Markov processes, so I'm not going to go through that; I'm just going to write down the definitions of entropy rates. Okay, so there are two assumptions that we are making in the following, and these assumptions are also the underlying assumptions of the classical information theory that supports the results generated by Shannon.
But in the research that has been generated since the '80s or so, people actually use more flexible assumptions. Still, this is what we're going to be dealing with in all parts of the lectures. So, assumption one: stationarity of the stochastic process, so that if you have a joint distribution over random variables and you shift in time, the joint distribution does not change. And we're going to consider Markov processes; and to characterize a Markov process without an expensive strategy — we want to be as minimal as possible — we assume that it is time-invariant. It means that our lovely conditional distribution is fixed; it's not going to change. This is the assumption that allows us to characterize a Markov process by two things: the initial distribution and the Markov transition matrix. So we know how to evolve the Markov chain with these, and so on. Okay, so given these, again we come back to this question of how to define a concept such as entropy here. We're going to be talking about the growth of the entropy as n increases, so this is all for large n, okay? This is the limiting behavior. People are smart; they came up with two definitions of an entropy rate, which are guaranteed to be equal to one another, and that's all great, because we can use both of them interchangeably. The most popular one is this one; we denote it with a curly X. I forgot a one over n — sorry. Okay, so this is the one, and it also gives the meaning of: what is the entropy per symbol, per random variable, that you generate as n increases during your stochastic process. And the other one — call it the small h, H prime — is again a limit, but now you're using a conditional entropy: the entropy of Xn given all of the past. They're equal to one another. We use this to obtain a nice formula for the entropy rate of Markov processes. This is, for example — put a reference here — one of the things that we will come back to in the next week of lectures, when we are talking about, probably, potentially, depending on him, Markov information sources. That's why I'm putting it here, so that we won't have to go back to Cover and Thomas, page so-and-so; we can just check what we have here, okay? So we said that these two guys are equal to one another. So if you take it here, what we see is that, if you consider a Markov process, we know that each random variable depends only on the preceding one, right? So basically what you can do for the entropy rate is this guy, okay? And the trick that you can use here is a really simple one: we said that we have the assumptions of stationarity and time invariance, so we can write this as this one, okay? These are the things that we will use next week, if we use them. Okay, so now we can continue with the second part of source coding and data compression, which is the most popular form of data compression, because it is what is actually used in practice.
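The two equivalent definitions being written, plus the Markov-chain formula they lead to (standard results for stationary processes; Cover and Thomas, chapter 4):

H(\mathcal{X}) = \lim_{n\to\infty} \frac{1}{n} H(X_1, X_2, \dots, X_n), \qquad H'(\mathcal{X}) = \lim_{n\to\infty} H(X_n \mid X_{n-1}, \dots, X_1),

and for a stationary, time-invariant Markov chain with stationary distribution \mu and transition matrix P:

H(\mathcal{X}) = H'(\mathcal{X}) = -\sum_{i,j} \mu_i \, P_{ij} \log_2 P_{ij}.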
The problem with the AEP approach — just indexing the typical set and the other sequences that you can possibly generate — is that you're encoding every point by using really large blocks of symbols, which is inefficient, okay? It's doable, but not practical; it's not really used. So one of the things you can consider is asking how to come up with really creative, smart schemes for encoding data, okay? This is where actual source coding comes in. And the gist of source coding is: you're generating data, and you're basically going to assume that the more frequent outcomes of this data should be assigned the shorter codewords, okay? More frequent gets the shorter codeword — smart — and less frequent gets the longer codeword. But still, of course, there is the question of how you're going to assign this. So now there are some definitions we must make. For example, when we talk about a source code, what are we talking about mathematically? A source code C is a mapping from the range of X to the set of finite-length strings over some code alphabet — say, binary. So this is the formal definition of a source code. And for every x, of course, we assign a codeword — the distinction matters: we say that C(x) is the codeword for x, and we will use l(x) to denote its length, okay? So, okay, now there are still some questions we need to consider. We defined the source code; we want to come back to this question of how we can efficiently represent data — let's dig further. If you have a source code and you actually need to do some useful work with it, there are some basic questions you need to ask, and requirements you need to impose. One of them is that your code must be uniquely decodable, because if you're sending data that can be understood in many different ways, you're not communicating efficiently. So this is the requirement of unique decodability. To formalize this mathematically, what we need to consider is an extension of a source code — okay, so here's a very ugly star, sorry — and it's basically the concatenation of the codewords of all the symbols included in the sequence, okay? You then build the definition of unique decodability on this guy: you're saying that if you have different sequences x and x-prime that are not identical, then you're going to have different concatenations — different extended codewords. These are vectors, actually, but let me write it like this. Okay, so that's unique decodability. The other one is that you need to know when your code is actually ending: you need to know when a message comes to an end, and you need to identify it, okay? That makes the second assumption-slash-requirement — it is a requirement — which is prefix coding; we call it prefix coding. And if you use these kinds of codes, if you impose this requirement, there is no loss in performance whatsoever. This is also something that's shown in Cover and Thomas; you can check that.
So basically, a codeword is a prefix of another codeword if you can do something like this: c is a prefix of this one. So you need to define a prefix code such that no codeword is a prefix of another. That's requirement two. And then the third one is not actually a requirement, but something we wish to be satisfied: it's about the efficiency of the data compression, and efficiency in information theory is always captured in this codeword length, l(x). But in this case we're going to be talking about the average code length, defined over the codeword lengths and weighted by some probability distribution. So what we are going to consider is this one. Sometimes we write it like this — I can index it; these are basically the indices of the codewords, okay? Okay, so given this — not to prove it, but maybe we can talk about the Kraft inequality. Let's see, do I have time for that? Okay, how many people know the Kraft inequality? Okay, let's go over the Kraft inequality. So I'm not going to prove it; I'm going to give a very intuitive sort of proof — it's not a real proof, but we will get what it says. So we know that, for example, this is a uniquely decodable code, okay? Let's say that you want to shorten the length of one of the codewords that you're using, but you want to retain unique decodability. How will you do that? Now I'm taking a risk, here you go — this is probably going to be wrong, but something along the lines of this one, okay? What you do here is: at the expense of having a shorter codeword length for one of your codewords, you pay with longer codeword lengths elsewhere, okay? So on average it's going to take more away from you. We can double-check whether this particular code is uniquely decodable, but I'm pretty sure that if you take away one digit somewhere, you're going to have to add more digits elsewhere — it's not going to end up better. So here's the intuitive picture of how it works. Think of a balloon of codewords that you can generate, like an incompressible liquid, okay? You're trying to compress it. Compressing corresponds to trying to assign shorter codeword lengths while still being uniquely decodable, okay? But when you try to compress it in one place, the other parts get expanded, right? Because it's incompressible, there is a bound on what you can do: you cannot make everything shorter and shorter and be happy forever. You have to pay. Wow, that sounds depressing, but yes, exactly. So this is how the Kraft inequality works. Now think of this volume that you cannot compress as being one, okay? The Kraft inequality says this: if you stay inside this ball of volume one with codeword lengths li — if you satisfy this kind of inequality — then you're good to go: then there exist prefix codes, which you can uniquely decode, with those lengths, okay? This is one of the things that we are going to use, I think, for prefix Turing machines. So, based on this — we're not going to prove it — is that okay?
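The inequality being gestured at is the Kraft inequality: for a D-ary prefix code (and in fact for any uniquely decodable code) with codeword lengths l_i,

\sum_i D^{-l_i} \le 1,

and conversely, any set of lengths satisfying it admits a prefix code. Here is a minimal sketch, with a hypothetical toy code chosen just for illustration (not something from the lecture), of how you would check this numerically and compute the average codeword length:

```python
# Toy illustration (hypothetical code and distribution): check the Kraft
# inequality for a binary prefix code and compare expected length to entropy.
from math import log2

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}      # source distribution
code = {"a": "0", "b": "10", "c": "110", "d": "111"}     # binary prefix code

kraft_sum = sum(2 ** -len(w) for w in code.values())     # must be <= 1
avg_len = sum(p[x] * len(code[x]) for x in p)             # expected length L
entropy = -sum(px * log2(px) for px in p.values())        # Shannon entropy H(X)

print(f"Kraft sum = {kraft_sum}")       # 1.0 -> a complete prefix code
print(f"L = {avg_len}, H = {entropy}")  # here L = H = 1.75 bits (dyadic p)
```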
Sorry — the question is about why we need the prefix condition? Because, for example, let's take this one, okay? Think about this code: I'm sending you this message, and you're processing the first digit of the first message, okay? But you don't know whether it's actually coming to an end — because you also have this one here — or whether it's going to continue, okay? You need to be able to say uniquely: okay, now I got a message, and when I stopped reading it, I know what that message is; there is no ambiguity in that message, okay? Is that good? So this is what we mean: we don't want codewords that are prefixes of other codewords. Okay, now — and this, as I tried to emphasize, is not a requirement, it's more like a wish: we want to compress the data as much as possible, and we use this measure, the average codeword length, to understand how well you are compressing the data. Now I'm going to connect it, and in one minute I'm going to come back and ask you whether it's clear. What do we want? The average, right — we want it to be as small as possible. Just give me one minute, because there is a limit, a bound, on how small it can be, and it's given by the entropy, okay? The reason I wanted to check is — I mean, I've taken a semiconductor course that I don't remember anything from, so I just want to make sure this is not new material people are seeing for the first time. Okay, good, thank you, yep. Okay. So we talked about the Kraft inequality, and I encourage you — yes, there's a question there — to go see a proof of how to use the Kraft inequality to derive the source coding theorem in the form we know it, which is as follows. When you're assigning these codewords — you have a whole code, and your objective is to keep the codeword lengths as small as possible — you find a bound, which is the entropy: the lower bound is the entropy. It gives you the minimum number of bits that you need for a description when you're given an object, as we introduced earlier — I said the same thing when I first introduced the source coding theorem. And how good you can get is this: no more, no less. I mean, it's problematic if you want to go beyond that, but this is the source coding theorem. This is one of the things that we are going to use when we talk about — can I take this risk? It's on me — when we talk about, for example, the finite-bath formulation of finite automata, and we're going to see some relation to, I guess — yes, I'm saying it — the computational complexity of finite automata, a computer-science notion of complexity of finite automata, and this kind of code length. So just keep that in mind. So we've covered nearly two out of three. Okay, that's good. Now we're going to talk about noisy channel coding, just as a reminder — I think it's a really profound one. It's always risky to go into channel coding, but let's see how well we can recover it. Okay, so far what we did was just encode data, without actually having it go through a noisy communication channel: we were basically trying to come up with ways of reliably regenerating the data once you compress it.
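The bound on the optimal expected codeword length L* that is being referred to, in the standard Cover and Thomas form (for a D-ary code):

H_D(X) \;\le\; L^{*} = \sum_x p(x)\, l^{*}(x) \;<\; H_D(X) + 1,

where H_D is the entropy taken with base-D logarithms (bits when D = 2).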
But now comes the real question: how well can you communicate? And "how well can you communicate" is a question we capture in a more formal way: what is the ultimate limit on the communication rate, the transmission rate? For that we go back to the formal picture where you have a source, an encoder, a channel, a decoder, and a destination. Now we are at the point where we can talk about how well you can transmit the generated messages and then recover them at the destination. For that we need the definition of a communication channel, and it comes as a triple, which is kind of cute. So we can concentrate on this part: we assume you have messages that are generated and are being transmitted; let me check which letters I can still use, okay, X and Y. This is the data stream being generated, and it contains the codewords. We define the channel as a triple: the source alphabet, which generates these sequences of messages, say a binary alphabet; the reproduction alphabet; and, in between, a conditional distribution. When you have a channel, you're sending messages generated over the source alphabet and you want to recover them in another alphabet, presumably the same binary alphabet, via this random variable Y. What you fundamentally have with a channel is this: think of the picture of A generating messages and sending them through the channel to B. What you'd like to know is not only how well you're generating and transmitting, but how well B receives the message. If you're sending a long, long sequence of messages and errors occur, how well does point B know where those errors are? That's crucial to understand. So we define this conditional dependence, p(y|x), as the thing that characterizes the channel: it tells you how the random variable Y, defined over the reproduction alphabet, depends on the original random variable X, and therefore how well Y encodes the messages. And of course, if we're talking about the dependence between two random variables, we're back to mutual information, right? So at this stage we define something called the channel capacity, and it's defined as C = max over p(x) of I(X;Y), where the maximum is over the probability distribution of X, the distribution defining the encoded messages. I need to emphasize this, because in about five minutes we're going to talk about rate distortion, for roughly three minutes, just before concluding the lecture, and the contrast matters. Here we assume a channel exists, characterized by this conditional distribution that identifies the dependence between the random variables. Okay, let me write this down explicitly.
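Here is a minimal sketch of that maximization (mine, not from the lecture), for the standard textbook example of a binary symmetric channel: its capacity has the closed form 1 - H2(f), and a brute-force scan over input distributions recovers the same number. The crossover probability f = 0.1 is just an example value.

```python
# Sketch (not from the lecture): channel capacity as max over p(x) of I(X;Y),
# for a binary symmetric channel (BSC) with crossover probability f.
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information_bsc(q, f):
    """I(X;Y) for input P(X=1)=q sent through a BSC with flip probability f."""
    py1 = q * (1 - f) + (1 - q) * f           # P(Y=1)
    return h2(py1) - h2(f)                    # I(X;Y) = H(Y) - H(Y|X)

f = 0.1
capacity_closed_form = 1 - h2(f)
capacity_numeric = max(mutual_information_bsc(q / 1000, f) for q in range(1001))

print(f"closed form: {capacity_closed_form:.4f} bits/use")    # ~0.5310
print(f"numeric max: {capacity_numeric:.4f} bits/use")        # same, at q = 0.5
```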
So if I give you this expression, what we see is that these random variables are defined by probability distributions, right? The value of the channel capacity itself depends on the probability distribution over the inputs, once you fix the channel you're using, which is characterized by p(y|x). So when we want to identify the channel capacity, we need to be careful to find the input distribution that actually maximizes this mutual information; that is what makes the communication as reliable, as good, as possible. At this point I might have to check my notes, because I don't want to mess up the notation. Okay, here is the formal definition. Again, what you're doing is sending codes through this communication channel, and we need a bit more machinery to state the noisy channel coding theorem formally. For that we define the code that you're going to send through the channel. It's composed of three components. First, you want to index your codewords, so you need an index set, say {1, ..., M}. Second, you need an encoding function, which takes an index to what you actually send into the channel, a block of channel inputs. Third, you need a decoding function, call it g, which takes you back from the reproduction alphabet, from the received block, to the index set; that's how you reproduce the message that was sent. Okay, so that's the boring but necessary formal definition of a code that you send through a communication channel, and we quantify the rate of this code as R = log2(M)/n, in bits per transmission, or bits per second, whatever your unit of time is; I like seconds, but it's not a big deal. So this is the rate, and it's basically how fast you're sending your messages: if you have a high transmission rate, you're a happy person; if you don't, you're probably an unhappy person. And what the noisy channel coding theorem says is this: if your rate is below the channel capacity, you're good to go, you can transmit your data with an arbitrarily small probability of error, so communication works well. Of course, mathematically speaking there is a much more interesting background in Shannon's paper on how this is derived, with jointly typical sequences; we just talked about typical sets, but we don't have to go into that. And so, again, just as a reminder, the point of the channel coding theorem is that you fix the channel, that is, the conditional distribution that characterizes it, and you solve an optimization problem over the input distribution, okay? Now you can ask the question the other way around, and it corresponds to the following. Say you have a source distribution, p(x), and you want to regenerate these messages, but you're not fixing the communication channel; you're not fixing the conditional distribution p(y|x) that characterizes it.
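As a toy sketch (mine, not from the lecture) of what a code, its rate, and its decoder look like in practice, here is an (M = 2, n) repetition code over a binary symmetric channel. It is deliberately simple and not capacity-achieving: it only trades rate for reliability, whereas the coding theorem guarantees the existence of good codes at any fixed rate below C with vanishing error.

```python
# Sketch (not from the lecture): a 2-message repetition code over a BSC,
# to make the rate R = log2(M)/n and the encode/decode functions concrete.
import math
import random

def encode(index, n):
    """Encoding function f: {0,1} -> X^n, here just n repetitions of the bit."""
    return [index] * n

def channel(bits, f, rng):
    """Binary symmetric channel: flip each bit independently with prob. f."""
    return [b ^ (rng.random() < f) for b in bits]

def decode(received):
    """Decoding function g: Y^n -> {0,1}, here a majority vote."""
    return int(sum(received) > len(received) / 2)

rng = random.Random(0)
f, trials = 0.1, 20000
for n in (1, 3, 5, 11):
    errors = 0
    for _ in range(trials):
        m = rng.randint(0, 1)
        if decode(channel(encode(m, n), f, rng)) != m:
            errors += 1
    rate = math.log2(2) / n              # R = log2(M) / n with M = 2 messages
    print(f"n={n:2d}  rate={rate:.3f} bits/use  P(error)~{errors / trials:.4f}")
```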
So given a source distribution, p(x), and we're going to talk about this in a minute, this is done not by optimizing over p(x) as in the channel coding theorem but, taking the mirror image of it, by fixing p(x) and optimizing over p(y|x). So what is a distortion measure? Can it be anything? That's one of the things we want to understand, and I guess this is going to make sense. Great, okay, I'm happy. We're going to talk about this for about three more minutes, and then I'll give a big picture of what to expect from the following lectures this week and next, based on the concepts we've just reviewed. So, the distortion measure. Say you have some symbol, you send it through a channel, and you regenerate it, but you mis-generate it: you don't recover the symbol that was generated. Formally speaking, again, you have a source alphabet X and a reproduction alphabet; I'll give the reproduction alphabet a hat just so it's distinguished from the noisy-channel case, but X, Y, it doesn't matter. You generate symbols over the source alphabet, you send them, you try to regenerate them over the reproduction alphabet, and you'd like to understand how well you're regenerating them, given a constraint, and that constraint involves the distortion measure. What is it? Formally, again, it's a per-symbol distortion measure: a mapping from the Cartesian product of these two sets, the pairs of original and reproduced symbols, to the nonnegative real numbers. The form of the distortion function is open to discussion, but the most popular one is Hamming distortion: zero if the reproduced symbol equals the original, one otherwise. If you have a sequence of symbols, then one possible thing to do is take the average of all the per-symbol distortions, or, more generally, define some distortion over sequences. One of the interesting things going on here is that we can define codes for rate distortion as well, as we did for the noisy channel, with a similar definition. But the question we ask after defining the codes you send to your communication channel is different: instead of maximizing to get the channel capacity, you ask how small the rate of description can be, and that is an optimization problem of the following form. Why did the maximum change to a minimum? Because now the mutual information is the price you pay, the number of bits you need to describe the source, so you want it as small as possible, subject to the constraint that the reproduced symbols stay close enough to the originals, that the expected distortion stays below the allowed level. One of the things I like about rate distortion theory is that there is not one unique measure of distortion; you're basically trying to come up with a fidelity measure for what you're trying to do. And one thing, okay, this is risky, but here I go.
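To make the mirror image concrete, here is a small sketch (mine, not from the lecture) of the standard textbook closed form for a Bernoulli(p) source under Hamming distortion, R(D) = H2(p) - H2(D) for D up to min(p, 1-p) and zero beyond: the source distribution is fixed, and the optimization over the test channel p(xhat|x) has already been carried out analytically.

```python
# Sketch (not from the lecture): rate-distortion function of a Bernoulli(p)
# source under Hamming distortion. Mirror image of capacity: p(x) is fixed and
# the "test channel" p(xhat|x) is optimized to minimize I(X; Xhat).
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate_distortion_bernoulli(p, D):
    """R(D) for a Bernoulli(p) source with per-symbol Hamming distortion."""
    if D >= min(p, 1 - p):
        return 0.0              # this distortion is reachable with zero bits
    return h2(p) - h2(D)

p = 0.5
for D in (0.0, 0.05, 0.1, 0.25, 0.5):
    print(f"D={D:.2f}  R(D)={rate_distortion_bernoulli(p, D):.3f} bits/symbol")
```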
Yeah, one of the open questions we can ask, for example using stochastic thermodynamics, is to try to come up with the relevant concepts from stochastic thermodynamics, because we actually know how well, or how efficiently, we are doing this in so many setups, and to try to understand whether we can use this rate distortion perspective to connect these physical quantities and informational quantities together, okay? So let's see, okay, we talked about this. Am I missing anything? I think this is fine for now, yes. And, just as the regular teaching-assistant remark: if you would like to have a conversation about any of this, please come and find me, I'm more than happy to discuss. So that's it, thank you.

Okay, to summarize things today: what we wanted to do was bring all these concepts from information theory back to the forefront of people's memory. Cover and Thomas, and all the other sources, Wikipedia for that matter, which I have found to be the holiest of all saints. What we'll do next is use things like de Finetti's Dutch book arguments to answer the question of why we are using probability theory in the first place. We'll then see other, much more axiomatic reasons for why we might want to use entropy to quantify uncertainty. And from that, thanks to a brilliant person called Ed Jaynes, who unfortunately passed away back in 1998, we will derive all of statistical physics essentially from a purely axiomatic foundation, swimming around information theory. These are tools we will be using later on, and they will form the connection between information theory and physics, okay? Please, please, please ask questions, after the lectures, during the lectures, and so on. There's all kinds of calibrating to do here: people have very, very different levels of expertise, and questions make this a much more efficient and much more enjoyable way of doing this, both for you and for us.

Can I say one more thing? We just talked about information theory and so on, and this was only a baby review lecture, but the other lectures, I think, are incredible. What's interesting is that the emergence of these new research fields, the way information theory was emerging in the 1960s, if you go and check the history, resembles how non-equilibrium statistical physics, stochastic thermodynamics, and the theory of computation look right now, okay? So I think ICTP is making a great move in presenting a lecture series on this. So even if this was boring, and even if I stumbled during the information theory review, just remember to come back tomorrow. So that's it.

Okay, so now I think it's time for lunch. For those who want to take the bus, there will be a shuttle bus up here. For those who want to take a walk through the park, it's not raining. I'm going in five minutes; I'll be upstairs and then we can walk together, okay?