Today's class will tie together almost everything you've studied so far: dynamical systems, information theory, Markov chains, hidden Markov models, epsilon machines, information measures of processes, all kinds of stuff. So it's a pretty good capstone, I would think. The next slide should be very familiar from Tuesday: the definition of a process and so on. But we're going to single out different blocks than before. We're going to single out X0 and call it the present; it is the observable that we are seeing right now. Everything before that we'll call the past, and as you already know it's denoted like that. Everything after the present is the future, and we'll denote it like this. We're going to keep this color scheme for the rest of the talk: red is past, blue is present, green is future. And whenever we have mixtures of, say, the present and the future, we'll use some color that's a mixture of blue and green. So the colors are meaningful from here forward. Once again we'll assume ergodicity, stationarity, and discrete sequences, but these can all be relaxed fairly easily. There's been work using these measures in non-stationary and non-discrete settings. I don't know about non-ergodic, but that's easy enough to get around also. Ergodicity basically means there's a single basin of attraction: as long as you wait long enough, the system will explore the entire range of its dynamics. Okay, so given that notation, let us consider the present. We're going to use I-diagrams to get to the heart of what we're talking about here. They're not essential and we're not going to use them extensively, I would say, but this is how we're going to present the information. As you already know, we can represent random variables and the information they contain using Venn diagrams, because of the duality between set-theoretic operations and information-theoretic operations; entropy behaves like set cardinality. So here we have the information contained in the present. We can throw the past onto this diagram: here we go, the entropy of the past. There's some information that the past and the present share, this overlap piece, the mutual information. There's some information that's only in the past and some that's only in the present. And we can also throw the future in here. Very generically, this is the standard three-variable Venn diagram. The only difference is that I've elongated these two as a visual aid to remember that these are infinite random variables in general. They're infinite in length, so typically there's an infinite amount of information in them. If you had a periodic process, that would not be true; you'd have a finite amount set by the period. But as long as there's any entropy rate, any randomness whatsoever, these two will be infinite. So once again we've got the three variables we're interested in, the past, the present, and the future, and they overlap in various ways. One thing to note is this component down here. This is information shared by the past and future but not observable in the present, so it is related to hidden Markov models. If you had just an order-one Markov chain, this quantity would be zero.
But anytime you have a Markov process of order larger than one, or something that can't be represented as a finite Markov chain at all, this will be non-zero. The other thing, and what we're going to focus on, is the decomposition of this blue circle into these four components. Notice there's a piece that's in the present but not related to the past or future, a piece shared by the present and the past but not the future, similarly a piece shared by the present and the future but not the past, and then one shared by all three. All right, so H[X0], the information in the present, is partitioned by the past and future. As I already said, there's this piece over here, which we'll call sigma mu. We're not going to talk about it any more today, but it is an interesting measure in general: it's the mutual information between the past and the future conditioned on the present. Okay, so now we're going to focus on H[X0] and this decomposition, and I'm just going to expand that circle out here. We're going to look at one primary decomposition of it, or two really. The first one we've already seen before, and I'll call it the human decomposition because it's the decomposition induced only by the past, that past pill-shaped region, and it splits the present into two pieces. I call it human because when we study something, we use past observations to inform us about the present; this is the decomposition you would get as any causal being, unable to see the future. So there are two pieces here. The first one, down here, we call rho mu, the mutual information between the past and the present. This is anticipated information: the amount you can tell about the present using just past observations. The second piece you all know and love: the entropy rate, over here. This is the entropy of the present conditioned on the past, so it's whatever information you cannot anticipate about the present given the past. We've already seen that it goes by a number of names: the Shannon entropy rate, the metric entropy, the Kolmogorov-Sinai entropy, and others; it's very widely studied. So the past breaks the present into anticipated and unanticipated information. Now, if we bring the future in, I'm only going to have it divide h mu; I'm not going to worry about how it divides rho mu here. If you're really interested in that, you can read the paper or talk to me afterwards, but I have a reason for splitting it like this. This breaks h mu into two components. We still have rho mu here in the present, but now we have b mu over here, which is the mutual information between the present and the future conditioned on the past. This is unanticipated but relevant information: it's the part of the present you can't anticipate, but it plays a role in future behavior, so it's relevant for temporal structure. And the third piece, r mu, the entropy of the present conditioned on both the past and the future, is the unanticipated and irrelevant piece, in that it does not relate to past or future behavior. It's essentially just noise. We also call it the ephemeral information for that reason: it shows up in the present moment and has no relation to anything that happens afterward or happened before. And b mu, if you allow me to go back a few slides, has a terrible historical name. So b mu is this piece right here.
If you recall the definition of the excess entropy, it's the mutual information between the two halves when you split the time series in half, so there's no separate present. If we roll the present into this piece here, so we have time zero onward, then the mutual information between the past and the future is these three pieces. And if we shift forward one time step, so that X0 is now part of the past and this is our future, then the mutual information between the past and the future is these three pieces. So as we shift in time, this piece of b mu moves over and becomes this piece: a part that was in the future becomes part of the past. If your time series is stationary, these two components are equal in quantity. And since the whole thing is the excess entropy, this piece was originally known as the predictive information rate, because, unfortunately, the people who first studied it were more familiar with the term predictive information than excess entropy. They called it the predictive information rate because it's the rate at which the predictive information, the excess entropy, shifts over while this piece remains static. So, unfortunately, it's called the predictive information rate, but we'll refer to it as the bound information for the rest of this talk. So we have anticipated information, ephemeral information, and bound information. On this diagram we can think about a number of different processes. For a periodic process, there's no entropy rate, so both of these two pieces are zero and all the information is anticipated, which makes sense. If you flip a fair coin, or run any IID process, there's no information shared between an observable and any that came before or after, so all of the information is here in r mu. So periodic processes are dominated by rho mu, IID processes are dominated by r mu, and most other complex processes involve some mixture of all three. We'll see some examples in your homework and later. Most complex processes have some anticipated information, some noise, and some relevant temporal or stochastic structure going on, and that structure is what b mu measures. OK, so the issue here is that to divide h mu with this line, we would need to know what symbols are in the future, which we obviously can't do. So the question is: how do we measure these? And we do it very similarly to block entropies. If you remember, the entropy of a block of symbols is just the Shannon entropy, minus the sum of p log p over the probabilities of those words, those blocks of symbols. On a Venn diagram over, say, a block of length 4, the block entropy counts each of the ways the information can be divided up: there's some information in the first symbol but not the other three, there's some information shared by all four symbols, and so on. The block entropy captures all of that; it doesn't distinguish whether the information sits in only one symbol, in all four, or anywhere in between. This other measure we're going to use, the total correlation, counts things differently. It sums up the entropy of each variable individually, this oval plus this oval plus this one plus this one, and then subtracts off the joint entropy. So anywhere that exactly two variables overlap gets counted once in the total correlation, and areas where three variables overlap get counted twice.
Where four overlap, we count three times, and so on. Areas that lie in only one variable aren't counted at all. So the gray scale here denotes how many times each atom of this Venn diagram is counted toward the measure. Does that make sense? [A question from the audience: here it seems like you only have one, two, three, four, but down there you have one through L.] Yes, this is just an example of what one of these would look like if L were four, or actually if L were five, since the right index here is exclusive. For two variables, this would measure just the mutual information between the two. For three variables, it would count the pairwise conditional mutual informations once and the three-way mutual information twice, and so on. So it works on n variables, and it's been studied for a long time; this measure goes back to the 60s, I think. It's been called either the total correlation or the multi-information in some contexts, but since multi-information is an incredibly vague name, we don't use it. OK, so how do those two measures help us get the decomposition we saw earlier? Well, we already know that the block entropies grow asymptotically as E plus L times h mu, so we can get h mu by looking at the asymptotic slope of the block entropy curve. Similarly, if we plot the block total correlations, they end up having an asymptotic slope of rho mu. So we can do this from data alone, without having to build a model: we take successively longer windows of words, estimate their statistics, plot the block measures, and get the best estimate we can of the slope. That gives us our rho mu and our h mu. We can do a similar thing to get r mu and b mu. For r mu, we use a measure that only includes the information not shared with the other variables: we sum up the conditional entropies, the entropy of X1 conditioned on X2, X3, and X4, the entropy of X2 conditioned on X1, X3, and X4, the entropy of X3 conditioned on X1, X2, and X4, and the entropy of X4 conditioned on X1, X2, and X3. For five variables we would condition each symbol on the other four, and so on. We add all those up and call it the residual entropy. As a dual to that, if we only count the areas where variables overlap, but do not weight them by how many variables overlap, so this area where just X1 and X2 overlap counts the same as this area where X1, X2, X3, and X4 all overlap, we call that the binding information. If we look at how these quantities scale with increasing L, by plotting the block residual entropies and the block binding informations, they again eventually hit a linear asymptote, and those asymptotes have slopes of r mu and b mu, respectively. So we can estimate these quantities from data just like we estimate h mu. Similarly, we can calculate them from epsilon machines, if we have them, using a technique that's a little more involved than calculating the entropy rate; we'll go through that calculation later, so you'll see what it is. The one thing to notice about these block measures is that, unlike the block entropy and the block total correlation, they are not monotonic; they can go up and then back down. So it's a little trickier to tell when you've reached the asymptote than it is with the block entropy and block total correlation. Harder to work with, but they also give you more information.
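Since everything here is defined from block statistics, a minimal sketch of these estimates from a single long symbol sequence might look like the following. This is not the CMPy code used in the class; the golden-mean-style data generator, the function names, and the block lengths are illustrative assumptions, and the slopes are crude finite-L differences rather than true asymptotes.

    from collections import Counter
    from math import log2
    import random

    def golden_mean_sequence(n, seed=0):
        """Toy data source (illustrative only): no two 1s in a row, fair choice otherwise."""
        random.seed(seed)
        seq, prev = [], 0
        for _ in range(n):
            s = 0 if prev == 1 else random.choice([0, 1])
            seq.append(s)
            prev = s
        return seq

    def block_dist(seq, L):
        """Empirical distribution over length-L words (sliding window)."""
        counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    def H(dist):
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    def marginal(dist, keep):
        """Marginalize a word distribution onto the positions listed in `keep`."""
        out = Counter()
        for w, p in dist.items():
            out[tuple(w[i] for i in keep)] += p
        return out

    def block_measures(seq, L):
        d = block_dist(seq, L)
        joint = H(d)                                      # block entropy H(L)
        singles = sum(H(marginal(d, [i])) for i in range(L))
        T = singles - joint                               # total correlation T(L)
        R = sum(joint - H(marginal(d, [j for j in range(L) if j != i]))
                for i in range(L))                        # residual entropy R(L)
        B = joint - R                                     # binding information: R(L) + B(L) = H(L)
        return joint, T, R, B

    seq = golden_mean_sequence(200_000)
    for L in range(2, 7):
        h0, t0, r0, b0 = block_measures(seq, L - 1)
        h1, t1, r1, b1 = block_measures(seq, L)
        # finite-L slope estimates of h_mu, rho_mu, r_mu, and b_mu
        print(L, round(h1 - h0, 3), round(t1 - t0, 3),
              round(r1 - r0, 3), round(b1 - b0, 3))

In practice you would plot these block curves against L and read off the asymptotic slope rather than trusting a single difference, especially for the residual and binding curves, which, as just mentioned, need not approach their asymptote monotonically.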
And since these two quantities add up to the block entropy (on the I-diagram, this piece plus this piece gives this one), that helps us somewhat also. OK, so to make this concrete, let's consider chaotic dynamical systems, in particular the logistic map. You all remember this plot of the Lyapunov exponent of the logistic map as a function of the parameter. What I've done here is cut it off anywhere the Lyapunov exponent was 0 or less, so I'm only considering positive Lyapunov exponents. That's because there's a theorem, Pesin's theorem, which says that the entropy rate of a dynamical system equals the sum of its positive Lyapunov exponents. So in our case, when the Lyapunov exponent is positive it equals the entropy rate, and when the Lyapunov exponent is negative the entropy rate is 0. This gives us a direct translation from our information-theoretic quantity, the entropy rate, to this geometric quantity, the Lyapunov exponent, and that means we can use our information-theoretic tools to analyze this picture. That's exactly what we're going to do. We already know that h mu breaks down into r mu and b mu, so the logistic map's Lyapunov exponent also breaks down into r mu and b mu. Let's see what that looks like: we have one piece here, r mu, and a piece here, b mu. So there's some non-trivial behavior in this decomposition, which I think we saw earlier in the quarter, but very briefly. Like I said, h mu is equal to the Lyapunov exponent, so the Lyapunov exponent is equal to r mu plus b mu. At every parameter value we can do a calculation and figure out how much of the Lyapunov exponent is b mu and how much is r mu, and that tells us how much of the observed dynamics is ephemeral, just noise, and how much is actually correlated behavior, which is indicative of intrinsic internal computation of some kind. We can see there are a few locations, say right here, here, and over here, where b mu appears to go to 0, and the entropy rate is then just equal to r mu. Does anyone know what's going on at these locations? Anyone recognize them? [A guess from the audience: the periodic windows, for example the period-three window.] That's where the Lyapunov exponent goes to 0; but these are locations where the Lyapunov exponent is not 0, it's large, yet the b mu piece becomes 0. Anyone remember? [Another guess: where it's completely chaotic, in some sense.] Do you mean at the band mergings, this one and this one, and then also at the full band, like you mentioned? So for some reason, which we're not sure of yet, at band mergings the observed dynamics become completely ephemeral, and there's no temporal structure other than periodic behavior, because there are unstable periodic orbits there. This can have a few practical meanings. For example, if we were trying to design a random number generator, we would probably want to choose parameters where b mu is 0, because then there would be no correlation between the observed symbols, other than perhaps periodic behavior, which is easy to account for. That's reassuring, because it was r equal to 4, the full band, that was used as a random number generator on the Manhattan Project. So von Neumann chose his parameter for the logistic map wisely.
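To see Pesin's identity at work numerically, here is a small sketch (not from the lecture) that computes the logistic map's Lyapunov exponent and compares max(lambda, 0) with a crude symbolic entropy-rate estimate over the binary itinerary relative to the generating partition at x = 1/2. The trajectory length and block length are arbitrary choices, and the finite-L difference is only a rough stand-in for the true asymptotic slope.

    import math
    from collections import Counter

    def lyapunov_logistic(r, n=200_000, burn=1_000, x=0.3):
        """Lyapunov exponent (bits per step) of f(x) = r x (1 - x)."""
        total = 0.0
        for i in range(n + burn):
            deriv = abs(r * (1.0 - 2.0 * x))
            if i >= burn and deriv > 0:
                total += math.log2(deriv)
            x = r * x * (1.0 - x)
        return total / n

    def symbolic_entropy_rate(r, L=12, n=200_000, burn=1_000, x=0.3):
        """Crude h_mu estimate: H(L) - H(L-1) over the binary itinerary (0 if x < 1/2)."""
        sym = []
        for i in range(n + burn):
            if i >= burn:
                sym.append(0 if x < 0.5 else 1)
            x = r * x * (1.0 - x)
        def block_H(k):
            c = Counter(tuple(sym[i:i + k]) for i in range(len(sym) - k + 1))
            N = sum(c.values())
            return -sum((v / N) * math.log2(v / N) for v in c.values())
        return block_H(L) - block_H(L - 1)

    # Pesin's identity for a 1-D map: h_mu = lambda when lambda > 0, and 0 otherwise
    for r in (3.7, 3.9, 4.0):
        lam = lyapunov_logistic(r)
        print(r, round(max(lam, 0.0), 3), round(symbolic_entropy_rate(r), 3))

At r = 4, for instance, both columns should come out near 1 bit per step.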
Other issues here: we know b mu goes to 0 at the band mergings, and you might have seen a hint of that in your homework. We'll see what b mu means for automata a little later, but you can already think about why this happens at the band mergings. You saw a noisy period-2 behavior there, and noisy periodic behavior is periodic plus ephemeral: there's the period of the internal Markov chain, and on every other step of that periodic chain you have an ephemeral coin flip that doesn't affect which state you go to. So you know that at points with noisy periodic behavior, the information is all periodic or ephemeral, all rho mu and r mu, with no b mu. However, that doesn't help us understand why this happens from the point of view of the map itself, without discretizing and converting to an epsilon machine. We would like a way of looking at the map and determining at which parameters, and why, b mu goes to 0. We don't have that yet, but we are actively working on it. The other question is that there is some serious structure in this r mu and b mu partition; it looks noisy-ish. You might ask how much of that is genuinely from the decomposition, and how much is inherited from the fact that the entropy rate itself is very noisy and bumpy here. To answer that, we turn to the tent map instead, because we know its entropy rate is very simple: it's just the log of the control parameter, so h mu is a very smooth function. Then we can ask: will the boundary between r mu and b mu be smooth, or will it have a lot of features? Any guesses? You might have seen this already, so we'll skip ahead. Well, it has features. It appears to be continuous but non-differentiable, as far as you can tell by zooming in. And again, it goes to 0 at the band mergings, which we can check even better here because we know exactly where they are: the square root of 2, the fourth root of 2, the eighth root of 2, and so on. We can see very clearly that b mu goes to 0 at those values. So it really is the band mergings; it wasn't just coincidence. And there's not too much else to say here, other than that even though the entropy rate is very smooth, the decomposition into r mu and b mu is not. That means the decomposition is picking up something genuinely new; there are new dynamics and new behaviors here. One hint as to what it might be measuring is that the bifurcation diagram itself has a lot of features in here: there are the veils, and the veils cross. There's a lot going on inside the bifurcation diagram that is not reflected in the Lyapunov exponent at all, and we believe this decomposition is picking up some of that behavior. Exactly which features, we don't know, but that's what we believe. [From the audience: there appears to be some self-similarity there too.] Yes, we'll get to that in just a few more slides. Just one more map to look at first, though. You might not have seen this one, the Lozi map. I don't have a picture of it, but this is it here, a two-dimensional map. It reduces to the tent map when b equals 0, so this line here is the tent map. We can see that because when b goes to 0, y goes to 0 (this is y), and we're left with 1 minus a times the absolute value of x, which is essentially the tent map, just shifted and scaled a little. So if this is our entropy rate here, we can see it's relatively smooth in this attractor region.
And it's 0 outside. In fact, let's see: I think up here things go to a fixed point, here and here they diverge to infinity, and I think here it diverges to infinity also. But in this region it has a chaotic attractor. It's very similar to the Hénon map; it's a piecewise-linear version of the Hénon map, not completely linear, but in the same sense that the tent map is a piecewise-linear version of the logistic map. So we can play the same game, break the entropy rate into r mu and b mu, and we get these. This is r mu, at the same scale as that picture, and this is b mu, with the scale adjusted so you can see the features. We see there are two distinct regions, down here and up here, where b mu is maximized. So if we had some system with the Lozi map underlying it, and we were using it to perform some kind of computation for us, and we could harness its dynamics appropriately, we would want to put it in either this area or this area, where it's doing the most intrinsic computation. Because over here, where b mu is small, almost all of the behavior is r mu, and that's just ephemeral: no structure, just random number generation. Whereas in these areas where b mu is large, there's a lot of correlation between the observations, so there's some kind of internal computation and structure being carried out. Next, we actually have a movie of this decomposition. What we'll do is take slices going this way, and the movie, as time progresses, will sweep like this. We'll start here at the bottom, where the attractor is very small, and show pictures just like the ones we saw earlier, like this one: the Lyapunov exponent as a bold blue line, the bifurcation diagram in the background, and then a blue and a green region separating r mu and b mu. And we'll see that for the Lozi map here, with the tent map right there. You can see that period-two behavior that was outside the attractor region. Any questions about the movie or the decompositions before we move on? [What parameter was being varied?] I was changing the parameter b. If we go back to the equation, it's the parameter that x_n is multiplied by to get y_{n+1}; the horizontal axis in the movie was the a parameter, which is right here. Effectively you fix both of those and get some long time series. We pick a particular pair of values, say right here, take a long time series, discretize it, and calculate r mu and b mu using the block residual entropy and binding information curves and their slopes. Then we get the decomposition, and we do that for a wide array of a and b values to build these pictures and the movie. As for b: if you calculate the Jacobian, you'll see that b controls its determinant and therefore the rate of area contraction. [From the audience: so b equal to 0 is the infinitely contracting, one-dimensional limit, and the two extremes are close to area-preserving.] Right, at b equal to 1 and negative 1 it's area-preserving, and you actually get an attractor called the gingerbread man attractor, which you can look up if you want. OK, so we can watch the movie again, since we have plenty of time. You'll notice some features of the bifurcation diagram: there's a discontinuous jump here, where the attractor bleeds out. You can't tell down here, but you can see there's some up here.
And the tent map is where this joins the main attractor, and it's also where all these veils cross at exactly the same point, once we reach the tent parameter b equal to 0. And we'll notice that this band merging here still has b mu equal to 0, but the one that's very fuzzy here does not. All right, so going back to your question about the self-similarity, one thing we can look at is plotting b mu versus r mu over all the different parameter values. For the logistic map we get a picture like this, and we can kind of see those bumps we had before in the self-similarity. If we do the same for the tent map, we get a much cleaner picture like this. So the b mu versus r mu plots are very self-similar. We can also look at rescaling b mu. If we plot b mu divided by h mu (this is for the tent map again), then instead of b mu bumps that are bounded by an envelope like this, we see that they all reach the same height, up to noise. This is a very low parameter value in here, where it's hard to get accurate values, but we can at least see it for the first three self-similar bumps: from the 8-band to 4-band, the 4-band to 2-band, and the 2-band to 1-band regions, these all go to the same height relative to h mu. And if we stretch them horizontally, so this one runs from the square root of 2 to 2, this one from the fourth root of 2 to the square root of 2, and this one from the eighth root to the fourth root, these three are perfectly self-similar, as long as you scale them to unit horizontal width and divide by h mu. You had a question? [Why is that the maximum of b mu?] No, I don't know why that's the maximum. It's at, what is it, 0.2514, I think? [And between which parameter values?] Oh, I don't know that either. So now for the hard part of the talk: we are going to calculate b mu for the tent map at a particular parameter value, analytically, and this is going to tie together almost everything we've talked about so far. We're going to pick this value here. It's called a Misiurewicz point, and it's where the fourth and fifth iterates of the maximum cross, or rather where they're equal. It turns out that point is a fixed point of the map, so the fourth, fifth, sixth, seventh, and all later iterates coincide there. That means there are only four unique veils here, four unique lines: the first iterate of one half, which is the upper bound here, the second, which is the lower bound, the third, which is this veil right here, and the fourth, which is right here. So if we just set f^4(1/2) equal to f^5(1/2) and solve, we get the parameter: setting alpha equal to the cube root of (1 plus the square root of 19/27), we get a equal to alpha plus 2/(3 alpha). And we choose that point in particular because it has a Markov partition. Here's the map at that particular value. We know we have to split it at one half, because that's where the map folds, where the slope changes sign. And here is the invariant distribution, which has two very natural places to cut it, and we notice that this does form a Markov partition: partition element A here maps onto both B and C.
A maps onto B and C, B maps onto D, C also maps onto D, and D maps onto A, B, and C. And because these heights are well defined, the invariant density is just three flat, uniform pieces, we can build the Markov chain fairly easily. We already knew A goes to B and C with equal probability, B goes to D, and C goes to D. D goes to A, B, and C with probabilities 1/(a+1) to A and a/(2a+2) to each of B and C; we know it has to go to B and C with equal probability because they have equal width here. So we get a Markov chain, and we can turn it into a hidden Markov model by attaching a symbol to each edge according to which half of the partition the edge comes from. All of the edges that leave A are in the left half of the partition, so they should output a 0; edges leaving B should also output a 0; edges leaving C should output a 1; and edges leaving D should all output 1s. Putting those symbols on the edges makes this a hidden Markov model that represents the observed process exactly. So far we've relied on just solving things about our dynamical system: we found a Markov partition, and by using the generating partition at one half we turned it into a hidden Markov model for the observed dynamics. OK, but notice this is non-unifilar: from state D we can go to A, B, or C, all on a 1. So this is not unifilar by any means. So then we run the mixed-state presentation on this, and we have to do it carefully. We can't use CMPy for it, because CMPy only uses numerical methods and we're doing this entirely analytically. So I implemented this in Mathematica and was able to calculate the mixed-state presentation, and this is the recurrent portion of it: still four states, but with much more complicated transition probabilities and a different topology altogether. This is the forward epsilon machine for the tent map at that Misiurewicz point, where the fourth and fifth iterates cross. Any questions so far? So we've started with a dynamical system and constructed the epsilon machine completely analytically, no numerical tools whatsoever. We can also construct the reverse epsilon machine using the standard time-reversal operation we've seen already, where we flip the edges and renormalize them according to the invariant distribution, and then run the mixed-state construction again on that. We end up with this. (So we reverse all the edges, and the slide got messed up again: this should be, I think, a/(a+1) and this should be 1/(a+1); it looks like the a's were just dropped when the figure was rendered, so it should have stayed with the PDF. The next slide, the bidirectional one, looks OK, and that's the important one.) Once we have the forward epsilon machine and the reverse epsilon machine, we can calculate the bidirectional machine, which I don't know how much you've talked about. All right, so here it is: the bidirectional machine for the tent map, analytically, at the parameter a we found before, and this is only valid at exactly that one point. OK, so now we have the bidirectional machine, which means we can calculate the excess entropy, because the stationary distribution over this machine gives us a joint distribution over forward epsilon machine states and reverse epsilon machine states.
Once we have that joint distribution over forward and reverse causal states, its mutual information is the excess entropy. So we can calculate the excess entropy exactly, analytically, in terms of the parameter a. But why stop there, when we can also calculate b mu and r mu? Doing so is trickier. If you recall (I'm going to go back for a moment to the past, present, future I-diagram), b mu is this piece and r mu is this piece. The difficulty with calculating them, just like the difficulty with calculating the excess entropy directly, is that this and this are infinite random variables. These circles are infinite in size, so it's very hard to compute the overlaps when we have infinities to work with. What we do to calculate the excess entropy is summarize the past by constructing the causal states, which are the minimal sufficient statistic: we take this infinitely sized circle and replace it with a finite one that preserves all of the overlap with the other variables. The causal states would look something like this; they still capture these overlaps exactly, but they shed the infinite amount of the past that was irrelevant to these variables. We can do the same thing with the future by constructing the reverse causal states at time one, and those would look something like that. Then we have three finite variables that we can use to compute all the overlaps we need. From the bidirectional machine, we know the joint distribution over forward and reverse causal states at the same time gives us the excess entropy, but if we separate them by one time step, we get the distribution needed to calculate r mu and b mu. What we actually need is, for example, the forward causal state at time zero and the reverse causal state at time one. So we look at each edge: on the node the edge leaves, we read off the forward causal state, and on the node it enters, we read off the reverse causal state. Looking at all the edges then gives us a joint distribution over the forward causal state at time zero, the reverse causal state at time one, and the observed symbol X0, and those are exactly the three variables that r mu and b mu are defined in terms of. So from the bidirectional machine we build what's called the edge machine, which gives us a distribution over the forward causal state at time zero, the reverse causal state at time zero, X0, the forward causal state at time one, and the reverse causal state at time one. We marginalize that to keep just the forward causal state at time zero, the reverse causal state at time one, and the symbol, and from that we can calculate r mu. So we have our three values. h mu, we already know, is the log of a, the log of that expression, which is this value. r mu, when we solve this expression right here, turns out to have no logarithms in it at all; we plug in the value of a and evaluate it numerically. And b mu is just h mu minus r mu, which we get there. So what we've done is start with a standard dynamical system, the tent map, isolate a single parameter value, and from that construct a Markov partition, a Markov chain, a hidden Markov model, then an epsilon machine, then a reverse epsilon machine, then a bidirectional epsilon machine.
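To make that last marginalization step concrete, here is a minimal sketch of the computation it feeds: given a joint distribution over the forward causal state at time zero, the symbol X0, and the reverse causal state at time one, it computes h mu as H[X0 | S+], r mu as H[X0 | S+, S-], and b mu as the difference. The toy distribution below is a made-up placeholder, not the tent-map edge machine from the lecture; only the formulas are the point.

    from collections import defaultdict
    from math import log2

    def entropy(dist):
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    def marginal(dist, keep):
        out = defaultdict(float)
        for outcome, p in dist.items():
            out[tuple(outcome[i] for i in keep)] += p
        return out

    def conditional_entropy(dist, target, given):
        """H[target | given] for a joint distribution over tuples."""
        return entropy(marginal(dist, given + target)) - entropy(marginal(dist, given))

    # Joint distribution over (forward state at t=0, symbol X0, reverse state at t=1).
    # Placeholder numbers only; the real values come from the edge machine.
    joint = {
        ('A', 0, 'U'): 0.25, ('A', 1, 'V'): 0.25,
        ('B', 0, 'U'): 0.10, ('B', 0, 'V'): 0.15, ('B', 1, 'U'): 0.25,
    }

    h_mu = conditional_entropy(joint, target=[1], given=[0])     # H[X0 | S+_0]
    r_mu = conditional_entropy(joint, target=[1], given=[0, 2])  # H[X0 | S+_0, S-_1]
    b_mu = h_mu - r_mu                                           # I[X0 ; S-_1 | S+_0]
    print(round(h_mu, 4), round(r_mu, 4), round(b_mu, 4))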
And from that bidirectional epsilon machine, we can calculate h mu, r mu, b mu, the excess entropy, C mu, any of the quantities we're really interested in. So this construction, starting from the map and going all the way through, is a complete example of starting essentially from scratch, with a system we're interested in, and analytically finding all of the information quantities we associate with its dynamics and behavior. And that's actually about it. Any questions? On your homework you're not going to be asked to calculate these quantities by hand because, as you've seen, it's rather difficult and time consuming. We have these functions built into CMPy, so you can use those. [Do you want to step through the homework?] Step through it? Sure, we'll just discuss it without actually doing it. So these methods are built into CMPy: you call m dot residual entropy rate on a machine and you get this, and m dot predictive information rate gives you that last one. And if you're interested, you can read this paper, which talks about how to calculate the measures, though not so much about the last part of the talk, applying them to chaotic systems. The video of the slides won't have the homework on it, but the video of the class will. All right. And this is not the final version of the homework; it'll probably be tweaked before you get it. Shall I just scroll through? Sure. So you're not being asked to do too much. I've written a function called markov machine, which you pass a machine into; it plots the power automaton, and you just need to find, by eye, the longest path through the transient states to reach the recurrent part. You'll do that for the even, the golden mean, the noisy random phase slip, and the random random XOR processes. Then you'll do the same thing with a function called cryptic machine, which, again from Tuesday's lecture, shows you the output of the cryptic-order algorithm, and again by eye you find the longest paths. Do look at what the algorithm outputs, because it says a lot about the process. Then there are a couple of theoretical questions. First, I don't know if co-unifilarity was introduced, but all it means is that if you flipped all the edges, the result would be unifilar. Unifilarity going forward means that from any state the outgoing edges are uniquely labeled: no two outgoing edges carry the same symbol. Co-unifilarity means that among the edges coming into a state, no two carry the same symbol. So the question is: how does co-unifilarity relate to the cryptic order? That's what I want you to think about; in particular, think about what the cryptic-order algorithm does and how co-unifilarity affects it. Then, looking at those four example processes, the even, golden mean, noisy random phase slip, and random random XOR, perhaps figure out which of them are co-unifilar, and see what the output looks like and why. And then, consider any presentation R, so any model, not necessarily the epsilon machine, that is both unifilar and co-unifilar (there's a word missing here that I put in the answer key but not in this version, so I'll fix that) and that does not have duplicate states, meaning no two states are completely identical.
The question is to show that such a presentation, one that is unifilar, co-unifilar, and free of duplicate states, must be the epsilon machine, and that any other model, even a non-unifilar one, has to be larger. As a hint, look at the I-diagram of the past, the future, and the forward and reverse causal states, in particular the forward causal states. The last two questions: the first steps you through calculating the excess entropy for what we call the odd process, which is similar to the even process except that blocks of ones are always odd in length. You'll use a built-in function called build bi machine, which outputs three things: the bidirectional machine and the forward and reverse epsilon machines. You'll look at all three of those, then from the bidirectional machine extract the stationary distribution, calculate the mutual information of that distribution, and you get the excess entropy. Then compare that to the output of the excess entropy function itself. That's easy enough. The last one is a relatively open-ended question. I want you to look at the even and golden mean processes, which are supplied in the worksheet. The golden mean here is slightly different from the one you're used to: it's no isolated ones instead of no isolated zeros, so it's completely isomorphic, you just switch the symbols on the edges. Look at those, and then look at their entropy rates, their ephemeral information, and their bound information using the functions provided. Think about where entropy is generated in an epsilon machine: where does the entropy rate come from, in terms of the structure of the machine? And then consider why two machines with only a very small difference between them end up with different decompositions into r mu and b mu. It's a fairly open-ended question; just put down any ideas you have as to why things might be the way they are. There aren't any particularly wrong answers on this one. In particular, I say "speculate" because I don't know the answer; I have some strong ideas, but no actual answer yet, so if you come up with one, that would be awesome. So yeah, that's the homework. There will be some tweaks between this and what you get, but it should be relatively straightforward, other than the speculation part. Any other questions about the anatomy of h mu, or anything else? Hopefully, even if you might not be able to carry it out yourself, everyone understood all the steps going from the dynamical system to the epsilon machine to the bidirectional machine to actually calculating these quantities analytically, because that's essentially what this entire class has been about: how to go from a dynamical system to understanding how it stores and processes information, which is what these quantities measure in their different ways. So, all right, thank you.