Welcome to the first real lecture of this course on data compression with probabilistic models. In the last video I gave an overview of what to expect from this course, and today we're going to jump right in. If you haven't seen the last video, click the link to the playlist in the video description and it will take you right there. Today we're going to see our first class of very simple compression methods, called symbol codes, but before that I want to clarify the problem setting, and this will be the problem setting throughout the entire course.

The problem we'll be dealing with is that of communication over a channel. What do I mean by that? We have two parties, a sender and a receiver. The sender has some message and wants to send it to the receiver. To do that, they first encode the message, which is some transformation applied to it, and then they send this encoded message through what we call a channel. The receiver takes the output of the channel and inverts the encoding step; we call this decoding, and the decoding step outputs a reconstruction of the original message.

This is a very general setup, and a lot of different things can happen at each step along this pipeline, so let's go through each of them in more detail. Let's start with the message. Think of some example messages you might want to send to the receiver. You might send a digital file, like an image file or a text file, but you might also send something more ephemeral, like real-time video; think of a Zoom call or a Skype call. Another kind of message involves no digital data at all: imagine that the two of us are in the same room and that we're talking to each other.
The messages we're then sending are the utterances, the words we pronounce. So these are all different kinds of messages, and they have different kinds of properties. For example, they can be digital or analog: computer files are digital, but when we talk to each other, those are analog messages. Another important property of messages is that they typically have a very clear structure. For example, if you're sending an HTML file to someone, that HTML file has to follow the grammar of HTML. You know that it will start with a doctype, then the string "html" in angle brackets, and so on. This structure that the message obeys leads to certain redundancies: the doctype and the opening "html" string are something you can expect, so in some sense they are redundant information.

So much for the message. Now let's think about the channel. The channel is probably the most abstract thing in this entire pipeline. What do I mean by a channel? You're probably thinking of something like the internet, or more concretely of protocols like TCP or UDP, and that is of course a very important channel, but there is a much bigger variety of channels. The internet itself is actually a very complex system, and if you work at a company that builds networking tools for network infrastructure, then the channel you are concerned with might be something more physical, like fiber optic cables. For another channel, think again of the situation where the two of us are in the same room talking to each other: the channel would then be the sound waves emitted by the speaker and received by the ear of the receiver.
Finally, I want to point out one channel that you may not think of as a channel, but it really makes sense to treat it in the same framework. The three channels we've discussed so far all transmit messages across space, from one point in space to another. But you can also have a channel that transmits messages in time while keeping them at the same point in space, and we would call that a storage device. If you store data on your hard drive or your SSD, you're basically sending it to your future self. It really makes sense to treat storage devices in the same framework, because we will see that storage devices share a lot of properties with the more typical channels you would think of.

So what are these properties? Again, a channel can be digital or analog. TCP and UDP are digital channels, but fiber optics are physical systems where you have to measure an analog quantity, and the sound waves we use when we talk to each other are analog channels too. An important property of channels is that they can be noisy or noise-free. What do I mean by noisy? In a lot of channels, the output isn't exactly the same as the input. A fiber optic cable follows some physical process, so some noise will be introduced along the way, which means that every once in a while you measure something at the output other than what was put in, and you have to come up with a way to deal with that. Same for sound waves: every once in a while when I say something, you will not hear me correctly, or you might even misunderstand me and think I said something different.
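To make the idea of a noisy digital channel concrete, here is a minimal sketch in Python (the lecture doesn't prescribe a language or this particular model; the function name is mine). It models a channel that flips each bit independently with some probability, a toy model commonly known as a binary symmetric channel:

```python
import random

def binary_symmetric_channel(bits, flip_prob, rng):
    """Simulate a noisy channel: each bit is flipped independently
    with probability flip_prob (a 'binary symmetric channel')."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

rng = random.Random(0)
sent = [1, 0, 1, 1, 0, 0, 1, 0] * 100
received = binary_symmetric_channel(sent, 0.05, rng)
errors = sum(s != r for s, r in zip(sent, received))
print(f"{errors} of {len(sent)} bits corrupted")
```

With flip probability 0 the channel is noise-free (output equals input); with a small positive probability, a few bits differ each time, which is exactly the situation the decoder will later have to cope with.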
Finally, a channel typically has some finite transfer rate. In one of the upcoming lectures we will introduce the term channel capacity and define in more detail what it means, but essentially the idea is that a channel cannot transmit an arbitrarily large amount of data in a given time; it can only transfer a finite amount of data per second.

That's it for the channel. Now let's talk about the reconstruction, the version of the message that the receiver reconstructs. There are different kinds of reconstructions: you could again have a computer file, but you could also have some sort of real-time playback, and these have different constraints. In real-time playback, if you don't get the right data in time, you may have to drop some frames; that's not an issue when you receive a file. For another example, consider again the case where the two of us are talking to each other: the message is something I thought of, and the reconstruction is your comprehension of it, so the decoding step is done in your head, and you end up with some comprehension of the thing I wanted to express.

What are the properties of the reconstruction? An important one is that it can be either lossy or lossless. Imagine you're sending an image file. If it is a photo of something, you usually don't expect to get the exact same pixel values back; it's fine if the image is visually very similar to the message that was put in at the beginning, and you might be okay with some minor blurriness or minor differences in color.
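To make the lossless case concrete, here is a minimal Python sketch (my own illustration, not something from the lecture) using the standard-library zlib module: after a compress/decompress round trip the reconstruction is bit-for-bit identical to the original, and for redundant input the compressed form is shorter:

```python
import zlib

# A highly redundant message: repeated text compresses very well.
message = b"the quick brown fox jumps over the lazy dog. " * 100

compressed = zlib.compress(message)
restored = zlib.decompress(compressed)

assert restored == message             # lossless: exact reconstruction
assert len(compressed) < len(message)  # redundancy removed: fewer bytes
print(f"{len(message)} bytes -> {len(compressed)} bytes")
```

A lossy scheme (like JPEG for photos) would instead only guarantee that the reconstruction is perceptually close to the original, trading exactness for a smaller size.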
By contrast, if you have a text file, you usually want lossless transmission: you expect to receive the exact same text file that was put in at the beginning. Another property of the reconstruction is that it can be done in a streaming way (also called progressive) or in bulk. What do I mean by that? A streaming reconstruction is often used for real-time playback: you expect to be able to reconstruct parts of the message while you're still receiving the later parts. If you're streaming video in a Zoom call or a Skype call, you expect to reconstruct the first couple of frames before the later frames have even arrived. By contrast, think of PDF: I believe a PDF file typically has its index at the end, so as far as I'm aware it's hard to read a PDF file unless you have received the entire file. Finally, a reconstruction can be what I would call seekable: even with a bulk reconstruction, depending on how you reconstruct the message, you may not be interested in the entire message, and it may be easier or harder to jump ahead to certain parts of the reconstruction.

So far we've clarified three of the five components of our pipeline. We'll be talking a lot more about the encoding and decoding steps; they will actually be the main part of this entire course. But before we talk about them, let's state the goal we actually want to achieve. Our goal in this entire setup is to transmit the message from the sender to the receiver under two constraints: it should be fast and reliable.
All right, I've now sketched out roughly the setup, the problem setting, and our goal. Now I would like you to think about what this goal means for the encoding and decoding steps, i.e., what they have to do, and I've prepared three questions that I encourage you to think about.

The first question: assume you want to losslessly transmit some message, so the sender has some message and wants to make sure the receiver receives exactly the same message, transmitted over some channel they're both aware of. Let's assume that both the sender and the receiver know the type of the message, so they know whether it's an image, a video, audio, or something like that, and they are also aware of the properties of the channel. But obviously only the sender knows the precise instance of the message; the receiver doesn't know it yet, which is why we have to send it. Now assume the message is represented as some string of n bits; it could be an image file or a text file that you represent as a string of bits, and again both the sender and the receiver know how to interpret these n bits, but for the purpose of this discussion we'll just say it's a string of n bits. Of course the sender has to encode the message, and then the receiver has to decode it. My question is: after encoding, what is the size of the message? The options are: the same size, n bits; more than n bits; less than n bits; or maybe it depends on the precise circumstances, such as what kind of message and what kind of channel it is.

The second question is: what is really the task of the encoder, and of the decoder as well?
For the second question, we first consider the case where the channel is noise-free; think of something like TCP, a channel where you can be pretty sure that the data you get out is the exact same data that was put in. What is really the task of the encoding step? In particular, think about our goal of transmitting over a channel with some finite capacity, and transmitting the data as fast as possible.

Then question three: now let's make it a bit more complicated and assume that the channel introduces some sort of noise; for example, a digital channel might occasionally flip some bits at random. What additional tasks do the encoder and the decoder have to perform? Pause the video at this point and think about these questions, and then I'll discuss them in a bit.

All right, here's what I would have answered. In the first question, assume we have some message of n bits; the receiver knows the type of the message too, but not the specific instance, which is why we have to transmit it. The question was: what is the size of the message after encoding? Does it stay the same, does it grow, does it shrink, or does it depend on the circumstances? My answer would have been: it depends on the circumstances. I'm not going to discuss this right now; we'll come back to it later. Let's look at the other questions first. The second question was: what do the encoder and decoder have to do if we have a noise-free channel and want to transmit a message? In particular, I wanted you to think about redundancies in the data, and about our goal of transmitting the message as fast as we can.
If we want to transmit over a noise-free channel with finite capacity and there are redundancies in the data, then we should try to get rid of them before we send the message over the channel. For example, if you have an HTML file and you know that it will always start with the string "html" in angle brackets, you may want to just strip that out on the encoding side; the decoder then knows that it will not receive the opening "html" string, but it knows it should be there, so it can fill it back in, because it is redundant information. This is a very trivial redundancy, but there are other kinds. In the English language, the letter q is usually followed by the letter u, so you might want to introduce a new virtual letter that stands for the letters q and u together; any time you see q and u together, you transmit just that one letter. This can make transmission more efficient, because you have less data to send over the channel: you have a larger alphabet with more letters, but you will typically have to send fewer letters.

So the answer to question two is that the task is to remove redundancies, and this is called source coding; another term for it is simply compression. The term source coding here has nothing to do with what we typically call source code in the sense of programming; we will understand in a second why it is called source coding.
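As a toy illustration of the "q is followed by u" idea, here is a minimal Python sketch (the function names and the choice of placeholder character are mine, not from the lecture). It treats the pair "qu" as a single new virtual letter, which shortens the transmitted sequence while keeping the mapping perfectly invertible:

```python
QU = "\u0001"  # placeholder for the new virtual letter; we assume
               # this character never occurs in the original text

def merge_qu(text):
    """Source coding step: replace every 'qu' with one symbol."""
    return text.replace("qu", QU)

def split_qu(packed):
    """Decoding step: expand the virtual letter back into 'qu'."""
    return packed.replace(QU, "qu")

msg = "the queen quietly questioned the quality of the quilt"
packed = merge_qu(msg)

assert split_qu(packed) == msg   # lossless round trip
assert len(packed) < len(msg)    # fewer symbols to transmit
print(f"{len(msg)} symbols -> {len(packed)} symbols")
```

This is exactly the trade described above: the alphabet grows by one symbol, but every frequent pair now costs one symbol instead of two.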
Now, question three was: what additional tasks do the encoder and decoder have to perform if the channel introduces some random noise, for example random bit flips? My answer would have been: the additional task is to make sure that, despite the noise introduced by the channel, you are still able to decode the message without errors, and to do that you typically have to add redundancies back in. This is then called channel coding.

Let's take a look back at our pipeline, at the encoder and the decoder. It turns out that, in general, encoding and decoding both have to remove redundancies and add redundancies, so they have to do both source coding and channel coding. And it turns out that you can actually separate these two steps from each other, which is a very non-trivial result. I'm not going to go into detail here, because this is a theorem that we will derive in one of the upcoming lectures. At this point I just want you to understand what this pipeline will typically look like in a real system. In a real system, you start with your message, and the encoding step will typically consist of two stages. The first stage is what is generally called source coding, and it removes redundancies from the data. The reason you want to do this is that by removing redundancies the message becomes shorter, so it will be cheaper to send over the channel.
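The simplest possible example of adding redundancy back in, purely as a sketch of the idea and not a method the lecture itself introduces, is a repetition code: repeat every bit n times before sending, and decode by majority vote over each group of n copies, which corrects up to (n - 1) // 2 flipped bits per group:

```python
def channel_encode(bits, n=3):
    """Repetition code: send every bit n times (adds redundancy)."""
    return [b for b in bits for _ in range(n)]

def channel_decode(coded, n=3):
    """Majority vote over each group of n copies; corrects up to
    (n - 1) // 2 flipped bits per group."""
    out = []
    for i in range(0, len(coded), n):
        group = coded[i:i + n]
        out.append(1 if sum(group) > n // 2 else 0)
    return out

sent = [1, 0, 1, 1]
coded = channel_encode(sent)   # 12 bits on the wire instead of 4
coded[4] ^= 1                  # the channel flips one bit
assert channel_decode(coded) == sent
```

Notice the cost: the encoded message is three times longer than the input, which is why this redundancy is added only after source coding has already made the message as short as possible.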
Another word for this is compression. After source coding, after removing redundancies, you then, seemingly paradoxically, typically want to add some redundancies back in, and this is called channel coding: this step adds redundancies. Now you may ask: why do I first remove redundancies just to add them back in later? The answer is that these are different kinds of redundancies. The redundancies that you remove depend only on the type of the message. If you have an HTML file, these redundancies result from the grammar of HTML; if you have a photograph, the redundancy is that simply listing the color value of every pixel is a very inefficient way to communicate photos, and it is more efficient to speak in more semantic terms, in shapes and things like that. The redundancies you remove here depend only on the message, and that's why this step is called source coding: it only looks at properties of the message, and we will see that it actually requires a probabilistic model of the data source, the source that generates the message. Channel coding, on the other hand, when it adds redundancies back in, only looks at the properties of the channel, and we will see that it requires a probabilistic model of the channel. It adds different redundancies, placed strategically so that even if the channel introduces some noise, you will still be able to decode the original message. So this is source coding and channel coding on the encoder side. On the decoder side, it turns out you can again separate these two parts. You start with channel decoding: this step looks at the message it received from the channel, it takes into account the properties of the channel, and it has to know what the channel
coding process was on the sender side. Using knowledge of these two facts, it can detect when the channel introduced some noise, for example a random bit flip; this step is typically referred to as error correction. After errors have been corrected, we get the compressed message, which we then have to decompress. This is again source coding, now the inverse of the source encoding step, so it is called decompression.

Again, I don't expect you to understand yet why you can always separate the two steps in such a way: why in the source coding step we only have to look at properties of the source that generated the message and can ignore the channel, while in the channel coding step we can ignore the source of the message and only have to take into account properties of the channel. We will prove this later in the course, but for now I want you to get an idea of the typical structure of a communication pipeline.

With this knowledge, we can go back to question one. Question one was: if you have a message you want to transmit, does the message become longer or shorter after encoding? I hope you can now see why I said the answer is "it depends": we have two stages in the encoding step, and one of them, the source coding stage, makes the message shorter because it removes redundancies, while the channel coding stage makes the message longer because it adds back in new kinds of redundancies. Which of these two stages has the bigger effect really depends on your situation. If you have highly redundant data, then you may get extremely high
performing compression, while channel coding may only add a very low overhead, so you would expect the message after encoding to be shorter than the original. On the other hand, if you have a very noisy channel (imagine you're sending a signal to Mars), then you would expect channel coding to add a lot of redundancy so that you can still be reasonably sure that the receiver will get the correct message in the end.

All right, that was the schematic sketch of the communication pipeline. I would now like you to think about more concrete examples of source coding and channel coding. In particular, for each of the examples listed here, I want you to decide whether it is source coding, channel coding, or maybe a combination of both. Then, in question two, I'm giving you a more concrete example: I'm showing you two QR codes. You can try to scan both of them and verify that you get the same text out of both, but you will see that one of the QR codes is much larger than the other. To understand why that is the case, you can try to occlude parts of each QR code, for example this part or this part, and test how much you can occlude in each QR code and still be able to decode the correct message. Think about these two questions and I'll be back in a second.

All right, here's what I would have answered. zip, gzip, and bzip2 are all compression methods; they are universal compression methods that remove redundancies in the form of repetitions that occur in the text, so they are all source coding. MP3, MP4, JPEG, and PNG are also compression methods; they are specialized to particular types of data like audio, video, or images, but again they remove redundancies that are particular to those data types, so they are also source coding. The phonetic
alphabet is something we use when we talk on the phone and want to spell out something that's hard to write, or when the phone connection is poor, so there's a lot of noise, but it's important that every letter is transmitted correctly, for example when providing our name to some official government authority. Here, instead of just saying the name we want to say, like "Robert", we replace every letter, r, o, b, and so on, with an entire word. We are adding a lot of redundancy: when you hear this sequence of words, even if there is some noise on them, you can still make out which word it was and which letter it begins with. So we add a lot of redundancy in order to still be able to communicate over a very noisy channel: this is channel coding.

Morse code, I would argue, is a combination of both. Morse code consists of dots and dashes, long beeps and short beeps, that we send over a very noisy channel. These long and short beeps are optimized so that you can really distinguish them even with a lot of noise; that is a channel coding aspect. But Morse code also takes into account the typical source of the message: people used to transmit telegrams, natural-language messages. When you think, for example, about the English language, each letter occurs with a different frequency, and Morse code assigns shorter code words, shorter sequences of dots and dashes, to letters that occur more often. That is a source coding aspect, because it allows us to compress messages; it makes the messages shorter. So in some sense Morse code really contains both.

Then there's the three-digit CVV on the back of your credit card, the three-digit number that you sometimes
have to provide to prove that it's really your credit card, that you really have it in your hand. The main purpose of the CVV is neither channel coding nor source coding; it's really more of a cryptographic property. But it does also play the role of a sort of checksum: if you mistype one of the digits of your longer credit card number, then very likely the CVV won't match. That is a form of error detection: the website or call center that asks for your credit card number can then ask again, saying "I'm sorry, these numbers didn't match, could you transmit that again?", so they can detect and then correct the error. So the CVV has some channel coding aspect, but really it's more of a cryptographic property.

Emoji, I would argue, are source coding, because emoji allow you to express a sometimes quite complex emotion in just a single character. That makes messages more compact; that's compression, so source coding. And the fact that QR codes like these are still readable even when partially occluded is the prototypical example of channel coding, because that's really error correction. The reason I included this example is that I think it's a really interesting case where you see a very atypical channel, so you can see that the concepts we're learning really apply in a very general setting. For these QR codes, the channel is light being reflected from a surface and going through the optics of your phone camera; that's the channel we're thinking of here. We're not thinking about the internet or Bluetooth or anything like that; we're thinking about an optical channel situated in an everyday environment.

With that, we come directly to the second question. I asked you why these two QR codes have such different sizes
even though they both encode the same data. The reason, as we already discussed, is that QR codes contain some error correction: they add redundancy so that even if your reader can't make out all of the pixels, it can usually still decode the message. When you create a QR code, you can actually decide how much error correction you want, and the more error correction you add, the more resistant the QR code will be to occlusions or misreads. So if you know that your QR code will appear in some very dirty environment, then you should add more error correction than if you know it will appear in, say, a factory where you have precise control over how things operate. I think this is a very interesting example because you can already see some properties of these error correction codes. In a QR code, it's very unlikely that random pixels here, here, and here are flipped; what's more likely is that you have an occlusion, something in front of the code, so the pixels that are misread by a reader are likely close to each other, in some region. The error correction included in a QR code therefore has to be optimized for this kind of expected error. More generally speaking, for error correction you need a model of your channel, and you then optimize the error correction code for that model.

All right. In this course we will not talk too much about error correction; it will come up at some point, but most of the time we will talk only about source coding. So we will assume, in a sense, that we have a noise-free channel: a channel where we can input some compressed data, and the receiver will receive exactly the same compressed data and can then decompress it. Why can we assume that? In the end, all channels that we are going to
use are going to be some physical system, and there's always noise in a physical system. So why can we still use a model of a perfect channel? The reason can be seen precisely in this diagram. Since we can separate source coding from channel coding, and since channel coding sits close to the channel, we can use a common trick in computer science, which is all about building useful abstractions. We can say: yes, on the physical level we have a noisy channel, but we can add some channel coding and then define this group of things as our channel prime, and this channel prime is, for all intents and purposes, noise-free. We will prove in this course, and this is a non-trivial result, that you can actually come arbitrarily close to a perfectly noise-free channel at a finite cost, at least for very long messages.

All right, this sets the scene, and it actually sets the scene for the entire course. Throughout the course we will be considering the situation where a sender and a receiver want to communicate a message, and in order to do that they have to encode it. In particular, we are interested in the source coding part of encoding and decoding: the part that removes redundancies on the encoding side and adds them back in on the decoding side, so that you get a reconstruction, which can be lossless or lossy.

With that, let's now come to our first class of actual compression methods. This first class is called symbol codes, and these are compression methods that are used a lot in practice: for example, gzip and bzip2 all use symbol codes under the hood, combined with some additional tricks. In this lecture we will set up the general framework, and in the next video we will use this general framework
to derive both a theoretical result and an actual algorithm. The theoretical result will be a very fundamental lower bound on the optimal compression performance that you could possibly achieve for lossless compression, and the practical result will be that there are actually algorithms that come arbitrarily close to this lower bound, at least in some cases. That's what we will discuss in the next video, but for this video, let's first set the stage for symbol codes.

The problem setting we're concerned with here is that we want to communicate over a noise-free channel: we assume we already have this abstraction, that channel coding, the error correction, is already taken care of, so we can think of a noise-free channel. We assume that the sender has a message, which I'm going to denote as x underlined, and wants to transmit this message losslessly (so we're considering lossless compression) to a receiver, using as few bits as possible. By lossless I mean that our goal is for the receiver to recover this exact same message, without any deviations.

Just to fix some notation, we denote the encoder as some function C* that takes the message x underlined and maps it to C*(x underlined), which is a string of bits. The asterisk here is called the Kleene star: you take the set of bits {0, 1} and form the set of all strings of arbitrary length that you can build from it. So a message could in principle be mapped to the empty string (a very unusual case), or to a single bit, or to a sequence of bits of arbitrary length. That's what this
notation means. Here we are concerned with bits, so we usually use binary codes, but more generally you can extend this to B-ary codes, where instead of just the zero and one bits you have B possible values on your channel; for example, Morse code, as we will see, is in fact a ternary code. Most of the time, though, we will use B = 2, that is, codes that map your message to a string of bits. Okay, so this is the general setup of lossless compression. Now, what are symbol codes in particular? Symbol codes satisfy all of these requirements but add some additional constraints. The first constraint is on the message: we assume that the message is a sequence of symbols. I denote the message as x underlined and the symbols as x with a subscript i, and we assume that these symbols all come from some discrete alphabet, written as a calligraphic X. So the notation is: the message x underlined is a sequence, a tuple, of symbols x_1, x_2, up to x_k, where the integer k is the length of the message; I will sometimes also write this as x_i for i from 1 to k. Each of these x_i, which we call symbols, is an element of this discrete alphabet, and by discrete I mean that the alphabet is either finite or countably infinite. For the most part you can just think of the alphabet as finite: for example, when you are transmitting natural-language text, the alphabet could be the letters from a to z, plus maybe the space and the full stop. The countably infinite case actually becomes important in some machine learning setups where you want to represent or approximate real numbers, but for the most part you can think of the alphabet as finite. So this is the first constraint that is
specific to symbol codes: the message is a sequence of symbols from one common alphabet. The second constraint, and really the more important one, is that the encoding function C-star has a very simple form: when the message is a sequence of symbols, we just iterate over the symbols, look up what we call a code word for each symbol, each code word being a bit string, and concatenate these code words into one single long bit string. The mapping C, without the star, is what we call the code book: for every symbol in our alphabet, the code book contains a code word. And we make one additional definition: l of x denotes the length of the code word for symbol x, where length typically means the number of bits, or, for a B-ary code, the number of B-ary symbols. All right, these are symbol codes; let's proceed with a couple of examples. The first example I already mentioned: Morse code. Morse code is really a B-ary code with B equal to three, because each letter is mapped to a sequence of dots and dashes, but between letters you also have to add a pause to separate the letters, so your channel effectively has three states. A slightly more modern symbol code is UTF-8: when you have a text file on your computer, say an HTML file or a LaTeX file, it will nowadays typically be encoded in UTF-8. This is again a binary code, and it is simply the standardized way in which letters, special characters, and so on are mapped to bytes, and therefore to sequences of bits. Here the alphabet is the set of all Unicode code points, which you can basically think of as letters, punctuation marks, and all kinds of special characters.
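Since the defining property of a symbol code is just code-word lookup plus concatenation, it fits in a few lines of Python. The code book below is a made-up toy example, not one of the codes from this lecture, and the decoder is deliberately naive: it assumes that no code word is a prefix of another, a condition we will have much more to say about in the next video.

```python
# A symbol code: every symbol has a fixed code word (a bit string),
# and a message is encoded as C*(x) = C(x_1) C(x_2) ... C(x_k).

# Hypothetical toy code book, purely for illustration.
codebook = {"a": "0", "b": "10", "c": "110", "d": "111"}

def encode(message, codebook):
    """Look up each symbol's code word and concatenate them."""
    return "".join(codebook[symbol] for symbol in message)

def decode(bits, codebook):
    """Naive greedy decoder; only works if no code word prefixes another."""
    inverse = {word: symbol for symbol, word in codebook.items()}
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:        # a complete code word has been read
            symbols.append(inverse[buffer])
            buffer = ""
    return symbols

print(encode(["b", "a", "d"], codebook))   # 100111
print(decode("100111", codebook))          # ['b', 'a', 'd']
```

Note that the decoder never needs separators between code words; whether that can work for a given code book is exactly the question raised by the examples below.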
Also included in that alphabet are letters from writing systems other than Latin. The code word of any letter, of any symbol x in the alphabet, is then simply the UTF-8 representation of x, which is standardized. UTF-8 always maps a symbol to a whole number of bytes, so the code word lengths are 8, 16, 24, or 32 bits, that is, one, two, three, or four bytes. So this is a very widely used example of a symbol code. The third example is a somewhat made-up one, but it is one we will use in many of the upcoming videos to play around with symbol codes. It will be a less useful code, but one where we can very easily and intuitively understand some important things that are going on. I am going to call it the simplified game of Monopoly; this is not a standard term, just one I came up with. What do I mean by that? Say we have a pair of dice. We throw the pair, and after the throw we add together the numbers the two dice show, like you do in Monopoly, and write down the sum as the next symbol x_i. For a message of five symbols, we throw the pair of dice a first time and write down the sum, throw them a second time and write down the sum, and so on, five times in total, giving us five numbers. You could do this with regular six-sided dice, but later on we will write out tables for all the combinations you can get, and that would be tedious, so let's make our lives simpler and assume three-sided dice: each die of the pair can only show the number one, two, or three, and therefore their sum can only be two, three, four, five, or six. The only way to get a two is if both dice
are one, that is, 1 + 1. You get a three if one die shows one and the other shows two, or the other way around; and you get a six only if both dice show three, which with these three-sided dice is the only way to get a six. Okay, this may seem like a silly game, but it will really make it easier to think about, and to visually exemplify, a lot of the important concepts that play a role in compression. To see that, let's think about some symbol codes we could design. Assume we have a sequence of five numbers from this alphabet and we want to transmit them to some receiver. The receiver knows that what they are going to receive is such a sequence from our simplified game of Monopoly, but they do not know the specific instance of the sequence, the specific numbers. How could we map such a sequence of numbers to a string of bits in a way that the receiver can then decode? The first thing you might come up with is to construct a code book, which, again, assigns a bit string to each of the symbols: these are all numbers, so they have binary representations; let's just write those down. I have prepared a table down here. In this first code book, call it C1, the binary representation of two is 10, of three is 11, and for four, five, and six you get 100, 101, and 110. This could be a first, naive attempt, but then you notice: I am already starting with a two-bit code word, even though the symbols zero and one are not even possible. So why don't I just subtract two from each symbol and then encode the binary representation of that? This leads to our next candidate code book.
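As a quick aside, you can simulate this game and count how often each sum occurs; this is my own little illustration, not something from the lecture, but it builds intuition that will matter when we choose code-word lengths.

```python
from itertools import product

# All 9 equally likely outcomes of throwing a pair of fair three-sided dice.
sums = [a + b for a, b in product([1, 2, 3], repeat=2)]

# How often each possible sum (two through six) occurs among the 9 outcomes.
counts = {s: sums.count(s) for s in range(2, 7)}
print(counts)  # {2: 1, 3: 2, 4: 3, 5: 2, 6: 1}
```

So the middle sum four is three times as likely as the extreme sums two and six; keep this in mind when we ask which symbols deserve the short code words.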
We will discuss in a second whether that is a useful idea, but let's just do it for now. Writing down the binary representation of the symbol minus two gives code book C2: for the symbol two we get 0, for three we get 1, for four we get two, which in binary is 10, then 11 for five and 100 for six. If you look at this code book, you might already guess that it is probably not going to work. The reason: if I receive 10, how do I know whether it is a four or a three followed by a two? Remember, we are going to concatenate all the code words, so 10 could either be the single code word for four or the concatenation of the code words for three and two. A simple fix you could think of is to pad with leading zero bits. The longest code word here has three bits, so we pad every code word to three bits, which gives code book C3: 000, 001, 010, 011, and 100. Now, whenever we get a message, we can always divide it into chunks of three bits and decode it uniquely, but the encoded message becomes longer. So we can ask: could we not still make the code words shorter for some of the symbols? I am going to propose two more code books, and we will then discuss whether they are useful or not. The first one, C4, assigns 010 to the symbol two, 10 to three, 00 to four, 11 to five, and 011 to six. The alternative, C5, assigns 010 to the symbol two, 01 to three, 00 to four, 11 to five, and 110 to six. Now we have five different candidate code books.
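To get a feeling for the five candidates, here is a small sketch, my own, with the code books transcribed from the board, that encodes one example dice sequence under each code book and counts the resulting bits.

```python
# The five candidate code books, mapping each dice sum (2..6) to a code word.
books = {
    "C1": {2: "10", 3: "11", 4: "100", 5: "101", 6: "110"},   # binary of x
    "C2": {2: "0", 3: "1", 4: "10", 5: "11", 6: "100"},       # binary of x - 2
    "C3": {2: "000", 3: "001", 4: "010", 5: "011", 6: "100"}, # padded to 3 bits
    "C4": {2: "010", 3: "10", 4: "00", 5: "11", 6: "011"},
    "C5": {2: "010", 3: "01", 4: "00", 5: "11", 6: "110"},
}

def encode(message, book):
    """Symbol-code encoding: concatenate the code word of every symbol."""
    return "".join(book[x] for x in message)

message = [4, 3, 4, 6, 4]  # an arbitrary example sequence of five dice sums

for name, book in books.items():
    bits = encode(message, book)
    print(f"{name}: {bits} ({len(bits)} bits)")
```

On this example the padded code C3 needs 15 bits while C4 and C5 need only 11, and C2 looks best of all; but bit count is only half the story, and whether the receiver can unambiguously decode each of these is exactly the question posed next.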
We could use any one of these five code books to encode a sequence of symbols. Now I want you to pause the video again: I would like you to rank these code books by usefulness, that is, tell me which of them is the most useful and which is the least useful. As a reminder, our goal is ultimately not to transmit a single symbol two, three, four, five, or six, but to transmit a sequence of these symbols, and in particular to transmit it in as few bits as possible. So pause the video and think about which is the most useful and which is the least useful code book. All right, let's get back to it; here is what I would have answered. First of all, we essentially already discussed that code book C2 is not useful, because the receiver will not be able to reliably decode the message that the encoder encoded. Suppose the encoder encodes the message consisting of the single symbol four, which is 10; then the decoder might think it is the sequence three, two. Or the encoder may encode the message five, three, which would be 111; then the decoder cannot tell whether this is five, three or three, five. Even if the decoder knows the length of the sequence, it will still not be able to distinguish these two clashing examples. So C2 is certainly not very useful; let me just cross it out. What about C1? For C1 it is not as obvious, but I would argue that it is also not a very useful code. Imagine an encoder that encodes the message consisting of the symbols two, six; the encoded form of this message would be 10 for the two and then 110 for the six. But then the decoder could group the bits in a
different way: it could read off the first three bits and say, okay, this is the symbol five, 101, followed by the symbol two, 10. So again we have two different sequences, and even though each individual symbol is mapped to a different code word, once you concatenate them, two different sequences are mapped to the same encoding, the same bit stream. So C1 is also, I would argue, not a very useful code book. Now, C3 clearly does not have this problem: whenever you get a message encoded with C3, you know that the number of bits in the encoded message is a multiple of three, so you can divide it into blocks of three and decode each block into its original symbol. So C3 is certainly a valid, useful code book, but it is not optimal: it has some overhead and will lead to longer messages than necessary. It turns out that C4 and C5 are also valid code books: you can convince yourself that there is no way to create a clash between the encodings of two different sequences in either C4 or C5. We will discuss the reason in the next video, but if you trust me for a moment, I would argue that C4 and C5 are a bit more useful than C3, because they lead to shorter messages; they are more efficient. All right, that wraps up this first video. In the next video we will derive some ways to see whether symbol codes are useful or not: we will define the property of unique decodability and derive some properties that uniquely decodable symbol codes have to satisfy, properties from which you can immediately see that both of the code books C1 and C2 cannot be uniquely decodable.
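Until we have that machinery, you can at least gather empirical evidence for these claims by brute force; the following sketch, my own and not from the lecture, encodes every sequence of up to four symbols under each code book and looks for two different sequences that share an encoding. Finding no clash up to some length is of course not a proof, which is exactly why the next video develops a proper criterion.

```python
from itertools import product

# The five candidate code books, mapping each dice sum (2..6) to a code word.
books = {
    "C1": {2: "10", 3: "11", 4: "100", 5: "101", 6: "110"},
    "C2": {2: "0", 3: "1", 4: "10", 5: "11", 6: "100"},
    "C3": {2: "000", 3: "001", 4: "010", 5: "011", 6: "100"},
    "C4": {2: "010", 3: "10", 4: "00", 5: "11", 6: "011"},
    "C5": {2: "010", 3: "01", 4: "00", 5: "11", 6: "110"},
}

def has_clash(book, max_len=4):
    """Do two different sequences of up to max_len symbols share an encoding?"""
    seen = {}
    for length in range(1, max_len + 1):
        for seq in product(book, repeat=length):
            bits = "".join(book[x] for x in seq)
            if bits in seen and seen[bits] != seq:
                return True               # e.g. C1: (2, 6) and (5, 2) -> 10110
            seen[bits] = seq
    return False

for name, book in books.items():
    print(name, "has clashes" if has_clash(book) else "no clash up to length 4")
```

The search confirms what we argued: C1 and C2 produce clashes, while C3, C4, and C5 survive, at least up to this sequence length.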
You then do not even have to come up with examples where things clash: you can immediately see that a certain inequality is not satisfied, so these cannot be valid, uniquely decodable code books. We will then derive a quantity that poses a theoretical lower bound on the expected number of bits you need for a randomly drawn message, and we will derive a practical algorithm that can get very close to this bound on the expected length of an encoded message: the famous Huffman coding algorithm. So we will both prove the theorem on the lower bound for lossless compression, a bound that holds for any lossless compression method, not just for symbol codes and not just for Huffman coding, and actually derive a code that in some sense comes close to this lower bound; all of that is in the next lecture. On the problem set, which you can find linked in the video description, you will play around a bit more with these code books, and you will think a little more about how the procedure by which we created these symbols influences the choice of code words; you may then understand why I chose shorter code words for the inner symbols and longer code words for the more extreme symbols. As the second problem on the problem set, you will implement both an encoder and a decoder, a very naive version, but one that works for such symbol codes, and you will be able to test that your implementations actually work on real data. With that, see you in the next video.