So, this is lecture 24. The last thing we saw, I think yesterday we did not have class, but the day before that, was a couple of examples illustrating how the Viterbi algorithm runs. Then I listed a few implementation issues one might face. The last thing I want to mention, to round that off, is a brief computation of complexity: the number of computations required to run the Viterbi algorithm. That is the last piece on the Viterbi algorithm, and then we will move on. The first thing to tabulate is the kinds of computations done in the Viterbi algorithm. Additions, certainly. What else? Multiplications: where do you multiply? When you compute the branch metric you presumably do a multiplication, so that is also a computation to keep in mind. And then comparisons, for finding the minimum. So those are the various operations, and one can go stage by stage and do an exact count, but that is not what we are interested in right now; I just want rough numbers. Thinking in rough terms, how many branch metric computations do you have to do? You have to think of how many stages the trellis has. Suppose you have L stages in the trellis, suppose your input alphabet is X, and suppose your channel M(z) has memory μ; these are my assumptions. Under these assumptions, how many branches, and hence how many branch metrics, do I have? You could do an exact computation, worrying about termination and all that, but let us just do an approximate one. If you look at a complete stage of the trellis, how many branches does it have? First, how many states are there in one stage?
There are |X|^μ states, and out of each state |X| branches come out, so one stage contains |X|^(μ+1) branches and |X|^μ states. That is the simple count to keep in mind. Remember we called this n_s, the number of states, so things will be expressed in terms of the number of states. If you do a rough computation, each branch metric is maybe one multiplication and one addition, something like that; and then as you run the Viterbi algorithm, you do one addition per branch and one comparison per state, that's it. So roughly one or two operations of each kind, additions, multiplications, comparisons, per state, and the computation in one stage will be about 3 to 4 times n_s. You can do an accurate count depending on the details and optimize it somewhat, but it is very difficult to go below 2 or 3 times n_s; so roughly, on the order of the number of states per stage. And what do you do for the total computation? Multiply by L. Did I make a mistake? [A student asks whether one should check for the existence of a branch between two states.] So I am going to assume that the branch structure is already known: there will be an array which holds all the branches. I am not going to look very closely at memory lookup; I am just going to assume it is free. Assume the branches are stored in a sequence and you go through them one after the other; in that case you can take it as free, it is not a big deal. The computations that really matter are the actual additions you do. Typically, if you are implementing the whole thing on one processor, you do have to worry about these memory lookups, but we will not worry about that here; we will only count the actual computations, the additions, multiplications, and comparisons.
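As a quick sanity check, the rough count above can be put in a few lines of Python. This is just a sketch of the scaling; the function name and the constant of 3 operations per branch are my own choices.

```python
def viterbi_op_count(alphabet_size, mu, L, ops_per_branch=3):
    """Rough Viterbi operation count: one trellis stage has
    alphabet_size**mu states and alphabet_size**(mu + 1) branches,
    with a small constant number of additions, multiplications and
    comparisons per branch, and there are L stages in total."""
    branches_per_stage = alphabet_size ** (mu + 1)
    return L * ops_per_branch * branches_per_stage

# Linear in L, exponential in mu:
# viterbi_op_count(2, 2, 1000)  -> 24000
# viterbi_op_count(2, 10, 1000) -> 6144000
```

Doubling L doubles the count, while each extra unit of memory multiplies it by the alphabet size: that is the linear-in-L, exponential-in-μ behaviour.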
It is also very typical in algorithm analysis that people do not look at memory lookup too closely until they come down to final implementation-level detail, so I am going to assume for that purpose that all of this is given to me: I know what computations I have to do, and I do them in sequence. So the total works out to roughly L times some constant, say 3, times |X|^(μ+1), if you want a number. Linear in L is a good thing to have, but exponential in μ is not: if μ becomes 10 or so it is really impossible to run this. So that part causes problems; the linear part is okay. That is the rough notion of computation. The main story: exponential in the channel memory and linear in the sequence length. That is one bit of detail. The last thing we will do on the Viterbi algorithm and MLSD is estimating the probability of error. The setups we have seen so far for estimating probability of error have been really, really simple; we pretty much saw only 2D constellations, where it is trivial to at least write down the probability of error expression, and the pairwise error probability certainly is trivial. With just 2 dimensions everything is intuitive and very nice: you can see where the projection happens and all that. Now we have a really complicated problem for the probability of error: you have a whole sequence running through, your processing is a big sequence of steps running the entire Viterbi algorithm, and it seems terribly complicated to come up with a probability of error. It is indeed complicated, and we will do a nice simple estimate which makes sense, and we will use pairwise error probability
to simplify almost all our calculations. The first thing is that even though you are doing all this sequence detection, ultimately you are detecting one out of several possible sequences. Remember what MLSD does: you have a total of, say, |X|^L sequences. What are these sequences? Data symbol sequences; let me be precise, these are the s_k I am talking about, the different possible sequences of s_k. When a sequence runs through the channel model (remember my channel model: a simple monic minimum-phase model M(z)) it generates certain b_k, the channel output values, which are corrupted by noise, and then I get my z_k. Based on z_k and the different possible sequences, I have to find the closest sequence. That is the way it worked: I had s_k which went through the channel, produced b_k, noise got added, and I got z_k; z_k is being processed by my detector. Keep that in mind. And we had a nice expression for ŝ: ŝ = argmin over a in X^L of ||z − b(a)||², where b(a) is the channel output corresponding to the sequence a. I hope this is roughly the notation I used. So this, in a high enough dimension, can also be seen as a search through a constellation for minimum-distance neighbours. Imagine a huge L-dimensional picture (it is tough to imagine these things for large L) in which each of my sequences is just a point, and my received z is also some point. What do I have to do with z?
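As an aside, the argmin above can be coded directly as an exhaustive search, which is handy for checking a Viterbi implementation on tiny examples. This is a sketch with my own function names, assuming an FIR channel given by its taps h (monic, h[0] = 1) and zeros before the first symbol.

```python
from itertools import product

def channel_outputs(a, h):
    """b(a): outputs of an FIR channel with taps h for the symbol
    sequence a, assuming zeros before the first symbol."""
    return [sum(h[j] * (a[k - j] if k - j >= 0 else 0)
                for j in range(len(h)))
            for k in range(len(a))]

def mlsd_brute_force(z, h, alphabet=(+1, -1)):
    """s_hat = argmin over a in X^L of ||z - b(a)||^2, by exhaustive
    search over all |X|**L sequences."""
    L = len(z)
    return min(product(alphabet, repeat=L),
               key=lambda a: sum((zk - bk) ** 2
                                 for zk, bk in
                                 zip(z, channel_outputs(a, h))))
```

For example, with assumed taps h = [1, -0.5], feeding the noiseless outputs of (+1, -1, +1) back in recovers that sequence. The search is over |X|^L sequences, which is exactly why the Viterbi algorithm is needed for realistic L.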
I have to find the closest b(a). In fact it is not even L-dimensional; it is (L + μ)-dimensional, once you take care of termination and all these things. So in a huge-dimensional space I still have a constellation and I am still doing minimum distance; I can think of everything as vectors and a constellation in a very, very large dimension. So I can still think of pairwise error probability and all that; all those ideas still have to work, but it is much more abstract than drawing a two-dimensional picture and plotting something. Still, essentially the ideas are exactly the same. So that is the vector view. We also had a way of viewing it in the trellis: paths in the trellis, path metrics. That was another way we viewed it, and it is also useful to think of these vectors on the trellis and develop notions of closest vectors, closest paths in the trellis. If you traversed a certain path at the transmitter, you receive noisy values corresponding to it, and then you decode to a certain path at the receiver: you might decode to the actual path, or to a path which is very close to the original path. Now I have to say what I mean by close; so I can think of close paths, not just close vectors. That is the interpretation that is very common when people analyze Viterbi decoders: one always thinks of making errors on the trellis by going to the closest possible path. It is possible to write down notation and do this carefully, and I will also do that, but before that I want you to get some kind of intuition of what is happening. Suppose you have a certain path in the trellis, say the actual transmitted path, the path that corresponds to the
transmission. Suppose I decode to some other path; I will call the transmitted path P, and the decoded path P̂, which is not the same as P. P̂ is also a path in this trellis. Remember what these small circles are that I have drawn here: the states at different times, with the branches connecting them. So if I draw P̂ on top of P, what kind of picture do you expect? It is going to lie on P mostly, assuming reasonable minimum-distance behaviour, but what will happen is that it will deviate away and then join back again, deviate away and join back again. That is the way to visualize what the erroneous path P̂ can be: maybe it followed P like this, then went somewhere else, then came back, and maybe it was like this (I need more states here, so I will put one more; can you see it?). So maybe the erroneous path went off somewhere and then came back. This could be a P̂, just a visualization of what is happening. This is a close path, though I have not really put a number on it yet. But notice that this path P̂, compared with the path P, is identical over many sections and deviates only in certain sections. If I define any distance between P and P̂, the distance will have contributions only at the places where P and P̂ are not the same; where they are the same, the contribution to the distance goes to 0, so I do not have to worry about those parts. This leads to what is called an error event. The definition of an error event in a trellis basically focuses only on these deviating sections. So, an error event in a trellis, I will call it E,
consists of a pair of paths. I was going to write ψ and ψ̂, but I think we already used ψ for states, so I will use φ and φ̂ for paths, where φ and φ̂ are paths on the trellis. Remember, for all this I am assuming a trellis is already given to me: M(z) is given, X is given, I know what my input symbols are, and I have already drawn the trellis. I am doing the error analysis after all of that, so assume all that has already happened. So, φ and φ̂ are paths on the trellis, and now I want this pair to be an error event. What do I want? If φ is the state sequence (ψ1, ψ2, ..., ψn), then φ̂ is (ψ1, ψ2′, ..., ψ′(n−1), ψn): it starts at ψ1 and ends at ψn, and all the middle states should be distinct from those of φ, that is, ψi′ ≠ ψi for 2 ≤ i ≤ n − 1. Is that okay? So φ is a path on the trellis, described by its sequence of states; the sequence of states is also called the state trajectory of φ, by definition. So when do two paths form an error event? When they start at the same place and end at the same place, but in the middle they are somewhere else. That is a very simple definition for an error event. And associated with any error event is a metric; it is also very clean to define the distance of an error event. How can I define it? Suppose I traversed φ: I would have output a certain set of symbols. If I traversed φ̂, I would have output a certain other set of symbols. So what is the natural distance for the error
event? The Euclidean distance between those two output vectors. I need a little more notation for this; roughly, d(E) = sum from i = 1 to n − 1 of |op(ψi → ψ(i+1)) − op(ψi′ → ψ′(i+1))|², with ψ1′ = ψ1 and ψn′ = ψn. Is that fine? Did I make any mistake? So that is the distance, or the metric, I am going to associate with my error event. What do I mean by ψi → ψ(i+1)? It is the branch from ψi to ψ(i+1), and op of that branch is the actual output corresponding to that branch: the branch is labelled input/output, so op is the output label. I am not sure if I used this state-to-state notation for a branch before; I must have used it somewhere in the definition of the Viterbi algorithm. So that is the way to think about it. Notation is one of the biggest enemies you have to conquer when you describe algorithms, and sequence detection in particular; it can get confusing, but ultimately, written down in good notation, it makes for good reading and you can really understand and implement it very well. So it is good to fix a nice notation. All right, so that is my error event. Now suppose I give you a trellis; we will take a simple trellis right now. The first thing is to see if you can enumerate all the error events. In a 2D constellation that was very easy: given one point, you can find all the points to which you can erroneously decode. Now I want to enumerate all my error events; that is one thing you might want to do. But what is more interesting is to enumerate the error events whose distance metric is really, really low. Why am I interested in such error events? They are
most likely to be the source of errors, because in my (L + μ)-dimensional constellation those error events represent the closest neighbours to my transmitted sequence. First of all, remember I need both φ and φ̂; I cannot just be happy with one φ. There are so many possible paths, and for every possible path there can be error events, so enumerating them is a big headache: there are usually several of these, and far too many to enumerate. So let us take an example and try to enumerate them; that is a good thing to do. I will go back to the very simple example I had, the 1/√2 example, but I will get rid of the √2 and simply say the channel is 1 − z⁻¹. It is monic, minimum phase and all that, but it is not normalized; we will skip the normalization. My X is {+1, −1}. So the trellis, you know what the trellis is; it is really, really easy: states +1 and −1, and I will just draw one more stage and then we will terminate. Go ahead and draw the trellis, and let us label it; it is once again easy to label. The output corresponding to the top self-loop (+1 → +1) is 0, the output corresponding to the bottom self-loop (−1 → −1) is also 0, the output corresponding to the transition +1 → −1 is −2, and the output corresponding to −1 → +1 is +2. Is that fine? You can also do a nice pictorial representation of the error events. How will an error event look? There will be an actual path and an erroneous path; join the actual path with a solid line, join the erroneous part with a dotted line, and enumerate the error events. I want you to take a couple of minutes and draw the error events of least metric. Each event has two paths; I will not fix the actual path for you. Consider only very small error events,
right. So what is the smallest number of states you can have in an error event? Can you have one state? Two states? You cannot have two states in an error event with non-zero metric; the distance would be zero, and zero metric is of course not allowed (zero metric you can trivially have, but we want non-zero). The smallest non-trivial error event should have three states in each path, that is, at least two branches: only then can the middle state be something else before joining back. And that happens only if the memory is one; if the memory becomes larger, the minimum number of states also increases. In a way, and I will talk more about it when we see a slightly bigger example, the memory actually buys you distance in these error events. Just write down one or two to get started; you can take φ along the top, for instance. Yes, there are too many of them, very many of them, but it is good to just write down what you think the different ones are. One way to make it simpler is to look at all error events which start at a particular time; say, for instance, error events that start here. You start there, and then find all error events of least metric starting at that state, at that time. There are two possible states where you can start, and then four possible actual paths; see whether each of them corresponds to some alternative path of minimum metric, and proceed like that to make it more systematic. The actual paths are easy to enumerate, starting somewhere; the difficulty is the other path and whether it has least metric or not. So: you start here, in state +1 or −1. Say +1, and the first path I will consider is +1, +1, +1; that seems like a simple
path to consider. What is the alternative path? There is only one, because you have to start there and end there, and the only place you can go wrong is in the middle: the other path has to be the one that goes to −1 and then comes back. And what is the metric for this event? The correct path corresponds to outputs (0, 0), the other to (−2, +2), and the squared distance between these two is (0 − (−2))² + (0 − 2)² = 4 + 4 = 8. So 8 is the metric. The first question: can there be a smaller-metric event? By inspection, and by various other arguments, you can conclude that there cannot. But you can easily imagine that as the trellis becomes complicated, such a question may not be easy to answer; you may not know that 8 is the smallest metric. In this case, though, it is very, very easy. So this is one event. How many such error events are there starting at this point in the trellis? In fact, even this one I can convert into another error event. How? Swap φ and φ̂: I get another error event with the same metric. So if you list them down carefully, there will be 8 different error events, all of them with metric 8. How did I get that number 8? Is that right? I think it will work out to be 8; you can check that, starting at any single point, there are 8 different error events. Now if you try to find the total number of error events in the entire trellis, it is going to be fairly large, because you can start at every single point, and adding 8 each time gives a large number. So, a couple of observations already: enumerating least-weight error events is important, hopefully you agree, but it can be a slow process; it is something you have to pay attention to. What I am not going
to talk about in this class is that it is possible to set up a Viterbi algorithm to find the least-weight error event corresponding to a particular path: given a φ, it is possible to find the least-weight φ̂ by setting up a Viterbi algorithm with suitably defined metrics. You can imagine very easily why this works: for φ̂ also I know the starting state and the ending state, and the metric in the middle, the d(E) I have to minimize, can be written as a sum of individual branch metrics. So you can easily set up another Viterbi to do this; think about it. If you have not thought about the Viterbi algorithm enough, maybe this is not immediately clear to you, but d(E) is actually a sum of several per-branch terms, and each term can be made a branch metric, so you can set up a Viterbi algorithm to do this. I will not talk about this further in class, but it is possible to set up a Viterbi to solve this problem in the general case. That, however, is only for a specific φ; you have to keep changing φ and finding more and more of these events. That is one thing. The other thing that is really true is that there are too many of these events, really very many of them, and if the number of states increases and the number of stages increases, you will have way too many of them. So that is the moral of the story. Another thing: this 8 is some kind of closest, minimum-distance separation in my huge (L + μ)-dimensional constellation; a minimum distance in the large constellation. So what am I going to do next? I wanted to do one more example, but I think one more example would only be a bit more confusing, so I am going to leave it at that; maybe we will see more examples in the assignments. If
there is any question, now is a good time to ask about these error events, enumerating them and all that. Okay, seems reasonable. So now we have to start estimating the probability of error, and we will use these error events in estimating it. I will only say it is an estimate; it is possible to accurately, properly derive these things and show upper and lower bounds. Maybe if we have time I will go through the bounds very quickly, but today I will simply give some expressions and say these are good ways of estimating the probability of error. The first thing one needs to be clear about is that there are two different quantities: probability of sequence error and probability of symbol error. So this is the first question we have to answer: what do I mean by probability of sequence error? Imagine what I am doing here: I am finding ŝ via the MLSD rule. The probability of sequence error is P(s ≠ ŝ). Even if only one symbol in the sequence is in error, I have a sequence error; even if all the symbols are in error, it is still just one sequence error. But what is symbol error? P(s_k ≠ ŝ_k) for a fixed k; that is my symbol error. So in my big (L + μ)-dimensional picture, if I compute the probability of error from one point to another point, is that sequence error or symbol error? Sequence error, because I am thinking of the entire sequence as one point. To compute symbol error, one needs to do careful calculations with the error events. It is possible to do, and not very difficult, but one has to do it carefully: look at all the error events that start at k, or start before k and cause an error at k. All these things you
have to look at very, very closely; it is possible to do that. So what I am going to do, roughly, is define a minimum distance for the trellis, which is the minimum metric over error events, and then based on that provide estimates for both sequence error and symbol error. That is what I am going to do in this class; maybe in the assignments I will frame a problem which deals with accurately computing the symbol error. So the first thing is to define the minimum distance for the trellis. This is not really a proper distance definition: the squared minimum distance I will define as d_min² = min over error events E of d(E). It is a simple enough definition; you can see what is happening: I look only at error events, pick the one that gives the minimum metric, and that metric I call the squared minimum distance. One can then roughly show that the probability of sequence error goes as some constant K times Q(d_min / 2σ), where d_min is the square root of that minimum metric. That is to be expected: you do a pairwise error probability between the error-event path and the valid path, and you get Q(d_min / 2σ). Remember, this is only the pairwise error probability, and this is where your imagination about these points has to come in: the noise is Gaussian in all dimensions, but you project onto the line joining the two points, and once you project onto that line you have only one random variable to worry about, so the pairwise error probability is simply Q(d_min / 2σ). So that you can do. But what will this K be? How many closest points do I have? Not just L + μ; way too many of them. There will be at least L + μ, as someone says, but in fact if μ is large it will also scale with μ; there
are very many of them. Starting at a particular point there are several error events, and you have to worry about all of them, average over all of them; K usually becomes very, very large. It becomes so large that you can show that as L tends to infinity, the probability of sequence error tends to 1. When you are decoding a very, very long sequence, the probability of sequence error becomes 1. Well, L has to become very large for that; in practice you can decode a single long sequence without any problems. So this K becomes large, and sequence error is not a very nice quantity to work with. A better estimate to have is, of course, the probability that s_k ≠ ŝ_k. Here you have to worry about a lot of things, like finite-length effects: because of termination, near the ends things might be different, and you have to do a careful computation. But even here, after you go through a lot of computation, you will see this also becomes approximately some c times Q(d_min / 2σ), and in this case c is a much more manageable number. It is a small number: it does not blow up with L, and it is independent of L even if L becomes very, very large. It does not depend on the number of stages; it will in fact depend on something like the number 8 we had before. Remember, for this example we had 8 different such error events; that number plays a role in finding c, only that number, not the total number of paths in the entire trellis. So these are good estimates, and in practice one does not really worry about computing K and c exactly. So how do you validate this? You estimate it with Q, but how do you actually compute the probability of sequence and symbol error in practice? What is one way of doing it that
does not care about any of this? Just simulate. You run through: it is easy to simulate M(z), it is easy to simulate noise, and it is easy enough to simulate the Viterbi decoder. After that you compare, and you repeat it many times, say a million times, and you can find the probability of sequence error, probability of symbol error, anything you want; you can validate the estimate with this. It will be fairly good at least for moderate-to-high SNR scenarios; there it will be very, very good. So that is the estimate. The crucial thing to find in this is the error event, the minimum-distance error event; once you find it, one can get a feel for the performance. The other thing I want to point out is this d_min: notice that in the previous example the squared d_min suddenly became 8. Where did that come from? It came from the trellis, and from the memory of the trellis. In fact, ISI can be seen as helping you in that way: it is causing some inter-symbol interference, but it is also giving you distance between your constellation points. Do you see that? If I did not have ISI, would I ever get a squared distance of 8 between two distinct constellation points? I would not: I am sending +1 and −1, so my squared distance would only be 4 (a distance of 2). Because I have ISI, I am somehow getting 8. So one might think that maybe ISI will help me do better than the non-ISI case. But that is not true; you can show that it does not work out. At best you can get back to the non-ISI case; you can never beat the non-ISI case. But this notion is quite important: the fact that ISI gives you some kind of distance and separation between the possible sequences, and you might use that to your advantage when you do your detection. You have to keep track of that; that
is kind of important, because as we go along we will see some sub-optimal methods for doing equalization, sub-optimal methods to get rid of ISI, and at that point you should know that just blindly getting rid of ISI can be very lossy, can be very bad, because ISI itself has some advantages, and you have to try to use them when you detect. That is the last comment I wanted to make about this before we proceed. As you can see, I am trying to wind up the Viterbi algorithm; I just do not want to leave any piece hanging before we move to the other part. The last thing I have to mention is what is called soft detection, or a soft detector, for ISI. So far, for the non-ISI case, we looked at both hard detectors and soft detectors. What do hard detectors do? The output is a decoded symbol, that is all: you take a received value, decide which is the closest symbol, and say that is my decoded symbol. That is the hard detector, and we saw how it works. What does a soft detector do in the non-ISI case?
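While you think about that, the contrast can be made concrete with a minimal sketch for the non-ISI case (the function names are mine; AWGN and equal priors are assumed): the hard detector returns only the nearest symbol, while a soft detector returns posterior probabilities.

```python
import math

def hard_detect(z, constellation=(+1, -1)):
    """Hard detector: output the closest symbol, nothing else."""
    return min(constellation, key=lambda s: (z - s) ** 2)

def soft_detect(z, sigma, constellation=(+1, -1)):
    """Soft detector (non-ISI, equal priors): posterior probability
    of each symbol given the received value z in AWGN."""
    w = {s: math.exp(-(z - s) ** 2 / (2 * sigma ** 2))
         for s in constellation}
    total = sum(w.values())
    return {s: w[s] / total for s in constellation}
```

For z = 0.3 both agree that +1 is the more likely symbol, but the soft detector also says how much more likely; the log of the ratio of the two posteriors is an LLR.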
It computes a probability for each bit given the received value. For instance, the quiz problem was a soft-detection question: given a constellation and a received point, what is the probability that a particular bit is 0 given the received value? You might even compute the ratio of that probability to the probability that the bit is 1 given the received value, take the log, and output the LLR (log-likelihood ratio). So in general a soft detector outputs probabilities, while a hard detector outputs only symbol values. That is the distinction, and in the non-ISI case you know how to write down that probability and compute it.

In the ISI case, is the Viterbi detector a hard detector or a soft detector? It is a hard detector: it gives you the symbols, but it does not give you any probabilities for each bit or each symbol. But as I have been saying all along in this course, today's systems typically want soft detectors, with the soft values used further in other parts of the system. So you want a soft detector for ISI. It seems as though building one should be much more complicated than building a hard detector; the surprising thing is that it is only about twice as complicated. Roughly, if you can run Viterbi twice, you can build a soft detector. That algorithm is called the BCJR algorithm, after Bahl, Cocke, Jelinek and Raviv. It is a very interesting algorithm and goes by several names: it is also called the MAP
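For the non-ISI case mentioned above, the soft output has a simple closed form. This is a minimal sketch (function name is mine) for BPSK in AWGN with equal priors, where the ratio of Gaussian likelihoods collapses to LLR = 2z/sigma^2:

```python
import math

def bpsk_llr(z, sigma2):
    """LLR = log P(s=+1 | z) / P(s=-1 | z) for BPSK (+1/-1) over AWGN
    with noise variance sigma2 and equal priors."""
    num = math.exp(-(z - 1) ** 2 / (2 * sigma2))  # likelihood of s = +1
    den = math.exp(-(z + 1) ** 2 / (2 * sigma2))  # likelihood of s = -1
    return math.log(num / den)

# the hard decision is just the sign of the LLR; the magnitude is confidence
print(bpsk_llr(0.5, 1.0))  # 1.0, matching the closed form 2*z/sigma2
```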
algorithm (MAP: maximum a posteriori) and, in one form, the log-MAP algorithm. We will not see the BCJR algorithm in this class; maybe if we have time towards the end we will motivate it a little, but right now we won't. The basic idea is that you run a Viterbi-like recursion twice: you start from the left side and go all the way to the right, and then you start at the right and come back all the way to the left. There are two different types of state metrics to keep track of, plus a branch metric, and at the end of the day you can compute probabilities for each symbol. What you actually compute is the probability that s_k equals, say, +1 given the entire received vector z. Here +1 is just one point in the alphabet X; you can compute this probability for each point. It is a nice algorithm, but not too much magic: if you actually read it, the way it works is very clean and interesting.

There are approximations to the BCJR which make it more implementable, and many of them replace the log of a sum by a maximization. Remember the max-log approximation we did for the soft detector in the non-ISI case; you can do the same thing here, and that is called max-log-MAP. That is the version implemented in many systems today, when people do not implement the full BCJR. So that is the soft detector, and with that I want to conclude optimal detection for ISI and move on towards several suboptimal possibilities. Here is a
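Since BCJR is not covered in class, here is a minimal forward-backward sketch (all names and the toy channel are my own, not from the lecture) for the two-state trellis of a hypothetical channel M(z) = m0 + m1 z^-1 with BPSK symbols. The state is the previous symbol; alpha is the left-to-right state metric, beta the right-to-left one, and gamma the branch metric, exactly the "two passes plus a branch metric" structure described above.

```python
import math

def bcjr_posteriors(z, taps, sigma2, s0=+1):
    """Posteriors P(s_k | z) for BPSK over a two-tap channel
    y_k = taps[0]*s_k + taps[1]*s_{k-1} + noise, known start symbol s0."""
    symbols = (+1, -1)
    n = len(z)
    # branch metrics gamma_k(prev, cur): Gaussian likelihood times prior 1/2
    gammas = []
    for k in range(n):
        g = {}
        for prev in symbols:
            for cur in symbols:
                y = taps[0] * cur + taps[1] * prev
                g[(prev, cur)] = 0.5 * math.exp(-(z[k] - y) ** 2 / (2 * sigma2))
        gammas.append(g)
    # forward recursion (alpha), normalized each step for numerical stability
    alpha = [{s: 1.0 if s == s0 else 0.0 for s in symbols}]
    for k in range(n):
        a = {cur: sum(alpha[k][prev] * gammas[k][(prev, cur)] for prev in symbols)
             for cur in symbols}
        tot = sum(a.values())
        alpha.append({s: a[s] / tot for s in symbols})
    # backward recursion (beta)
    beta = [None] * n + [{s: 1.0 for s in symbols}]
    for k in range(n - 1, -1, -1):
        b = {prev: sum(gammas[k][(prev, cur)] * beta[k + 1][cur] for cur in symbols)
             for prev in symbols}
        tot = sum(b.values())
        beta[k] = {s: b[s] / tot for s in symbols}
    # combine: P(s_k = cur | z) proportional to sum over branches into cur
    post = []
    for k in range(n):
        p = {cur: sum(alpha[k][prev] * gammas[k][(prev, cur)] * beta[k + 1][cur]
                      for prev in symbols) for cur in symbols}
        tot = sum(p.values())
        post.append({s: p[s] / tot for s in symbols})
    return post

# symbols (+1, +1, -1) through M(z) = 1 + z^-1 give noiseless z = (2, 2, 0);
# the posteriors P(s_k = +1 | z) should come out near 1, 1, 0
for p in bcjr_posteriors([2.0, 2.0, 0.0], [1, 1], 0.1):
    print(round(p[+1], 3))
```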
summary before we forget where we are. What have we been doing? The flow in this course has gone like this. Initially we started with an ideal channel, a channel which is flat, and I said I would choose a very simple pulse shape, flat for long enough time; if the pulse is flat for long enough, the channel response looks like a delta to me and I have no ISI at all. That was the first idea. In a practical situation you are given a certain bandwidth; to use that very simple scheme you must first pick a small portion of your available bandwidth which is flat, and then occupy only a very small bandwidth within that, so it is very bandwidth-inefficient.

What was the next step in the evolution? You occupy the entire ideal bandwidth. How? The first thing you try is to avoid ISI, and to avoid ISI while occupying the whole bandwidth you have to satisfy the Nyquist criterion. How do you satisfy it? You choose a suitable pulse shape at your transmitter and receiver, for example the square-root raised cosine. That way you are able to occupy an ideal bandwidth.

Then you move away from the ideal bandwidth to other, larger bandwidths. In that case we saw clearly that you cannot get rid of ISI, but you can have a very good receiver, an optimal projection receiver, which I officially named the whitened matched filter: the matched filter plus the whitener. How did that work? You have s_k going through a transmit filter, then through a channel, then noise is added, giving you r(t). The first
thing you do is filter with a filter matched to h(t), where h(t) is g(t) convolved with c(t). Then you sample at kT. Then you do a spectral factorization of S_h(z), assuming it is possible, and filter with 1/(γ² M*(1/z*)). This produces z_k; the intermediate sampled signal we called y_k. This front end is optimal in the sense that it gives you orthogonal projections in the signal space, and it is called the whitened matched filter, or the WMF front end. (Maybe I did not define that name earlier; strange, I thought I had.)

The nice thing about the WMF front end is that, end to end, the model simply becomes z_k = M(z) s_k + n_k, where n_k is white and Gaussian, and z_k is available for any further processing you might do. Why is the whitened matched filter an optimal front end? Because it is an orthogonal projection: there is nothing more about s_k in the signal space than what the WMF gives you. Any information about s_k is completely contained in z_k. So for any other processing, suboptimal or optimal, that you want to do to recover s_k, the whitened matched filter is an optimal front end; you can put it there and then do any analysis you want. Anything else you might conceivably do, some other filtering instead of h*(-t), some other filtering instead of 1/(γ² M*(1/z*)), can also be analyzed after this front end. Keep that in mind: this is an optimal front end for any analysis of this kind of transmission, because it is an orthogonal projection onto the signal space, and it does not get any better than that. That is the power
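The spectral factorization step above can be sketched numerically. This toy function (my own, using NumPy; it assumes no spectral zeros exactly on the unit circle, so the factorization exists) takes the sampled autocorrelation sequence of h and splits S_h(z) = γ² M(z) M*(1/z*) by rooting the polynomial and keeping the minimum-phase roots, which makes M(z) monic:

```python
import numpy as np

def spectral_factor(r):
    """Factor S_h(z) = gamma^2 * M(z) * M*(1/z*) from the autocorrelation
    sequence r (zero-lag term in the middle). Returns gamma^2 and the
    monic minimum-phase M(z) coefficients."""
    roots = np.roots(r)
    inside = roots[np.abs(roots) < 1.0]        # keep minimum-phase zeros
    m = np.real_if_close(np.poly(inside))      # monic M(z) coefficients
    gamma2 = r[len(r) // 2] / np.sum(np.abs(m) ** 2)  # r_0 = gamma^2*||m||^2
    return gamma2, m

# example: gamma^2 = 4 with M(z) = 1 + 0.5 z^-1 has autocorrelation (2, 5, 2)
g2, m = spectral_factor(np.array([2.0, 5.0, 2.0]))
print(g2, m)  # gamma^2 close to 4, m close to [1, 0.5]
```

The whitening filter 1/(γ² M*(1/z*)) is then built from the γ² and M(z) returned here.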
of this construct, and why it is so useful, at least theoretically. In practice we will do something else, but in theory it is a very good construct, because anything else you might want to study can be studied after it.

So now suppose you want to study equalizers. What is an equalizer? Roughly, anything that detects in the presence of ISI is an equalizer; think of it that way. Any equalizer I want to study, I can study in this discrete-time model rather than in the entire continuous-time model, because this model completely captures everything that is out there. So this is the ISI model we will work with when dealing with equalizers.

So far we have seen only one equalizer, the maximum-likelihood sequence detector, implemented by the Viterbi algorithm. We also saw some analysis for it, and it works fine; if you can implement it, it is the optimal one for sequence error probability. There is also a soft version, which is also very good. One can implement these things, but the one problem is: what happens when the number of taps in M(z) becomes very large? Why would the number of taps in M(z) become large? When your channel has a long tail. And why would you want to use a channel with such a long tail?
Because more bandwidth always means better rate. We will see all of this later, but you are always better off using larger and larger bandwidth. In terms of capacity considerations, you remember the formula: roughly bandwidth times log(1 + SNR). Even for a non-ideal channel, capacity scales with bandwidth, so you want to use more and more bandwidth, even if it means a very bad channel that you have to equalize at considerable cost. Because of that, the number of taps can become large in practice, which makes the MLSD, the Viterbi algorithm, quite impractical to use. If you cannot use it, what suboptimal equalizers can we build? That is what we will see beginning next class. We will work only with this model, and as I argued, anything you can do in the full system with suboptimal equalizers can be studied in this model itself, because the WMF is an optimal front end for everything. We will pick up from here in the next class.
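The capacity argument can be checked numerically. This small sketch (the power and noise values are mine, purely for illustration) evaluates B log2(1 + P/(N0 B)) for a fixed transmit power P: the per-hertz SNR drops as B grows, yet the total capacity keeps increasing, saturating at (P/N0) log2 e.

```python
import math

def awgn_capacity(B, P, N0):
    """Shannon capacity in bits/s: fixed power P (watts) spread over
    bandwidth B (hertz) with one-sided noise PSD N0 (watts/hertz)."""
    return B * math.log2(1 + P / (N0 * B))

# doubling the bandwidth always helps, even though SNR per hertz halves
for B in (1e6, 2e6, 4e6, 8e6):
    print(B, awgn_capacity(B, P=1e-3, N0=1e-9))
```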