Okay, well, let me get started. I have a batch of announcements. First, there's a new lab on the SageCampi server that goes over the mixed-state calculations. You'll need it for homework 14, which I still have to push up there. There's a homework this week, I'll assign one more next week, and then that's it — then you work on your projects. The lab is, like I said, about mixed states and how to use the computational mechanics Python package that's built into Sage. It goes through various things you can do, but the basic goal is, starting from some presentation, to calculate its mixed states and the transition structure among them. We went through the theory behind this calculation last week; we're going to use it today, so you'll see it in action. It's time-consuming to do by hand once a starting presentation has more than three or four states — in fact it can become effectively impossible — so there's code to help. The lab goes through which libraries you need to import, and then there are roughly two classes of things. One is drawing the mixed-state distributions graphically in their simplex. State distributions are distributions: sets of numbers between zero and one that sum to one, and that means the geometric structure they live in is a simplex. So the state space of the mixed-state presentation, viewed as a dynamical system, is a simplex, and there are a number of simplex-drawing facilities in the package. The first example takes a uniform sample over three random variables — three states, if you like — with sample size 100, and draws it in the appropriate simplex. You can draw the simplex for any number of states or variables in your distribution. It takes a particular perspective: you can always orient a simplex so that the vertices — the pure states, where you're in one of the given presentation states with probability one — sit on the periphery. It's a projection from n dimensions down to two, but it at least gives a reference projection, which is helpful. That's built in. Then the lab shows how to build a model from a string — we've done this before — and produce a machine. Once you make this object, it knows how to draw itself. And there's a new function that builds the mixed-state presentation: you pass in the presentation and a number of keyword options — in this case, show me all the transient mixed states as well as the recurrent mixed states. The example is the even process: A goes to A on a 0 with probability one-half, A goes to B emitting a 1 with probability one-half, and B goes back to A emitting a 1 with probability one. The machines that get returned also know how to draw themselves. So we go from the even process down to what we calculated last time: two transient states, and in this case those transient mixed states are transient causal states. There's an exercise that gives you a presentation — actually the two-state presentation of the fair coin process — for which the mixed-state presentation is not minimal, so the mixed states it returns are not causal states. You have to be a little careful.
As I pointed out, the mixed states are refinements of the causal states, so you may have an extra minimization step to do. You have to be careful, but in many cases — like the even process we just looked at — it works out fine. We can't guarantee it in general, though. Anyway, we recover the recurrent component we gave it, plus these two transient mixed states that we hop through on the way in, which maybe wouldn't have been obvious from the recurrent presentation states. You can also put these mixed states up in a simplex; that's what the next bit of code does. After you've calculated the mixed-state presentation you're interested in, you pass it into a grapher that extracts the mixed states and places them in a simplex of the appropriate dimension. Here we start with a presentation with two states, so we have a 1-simplex — a line. The vertices at the ends of the line correspond to knowing you're exactly in state A or exactly in state B, and any transient states lie in the interior of the simplex. If you remember, one transient mixed state assigned probability two-thirds, one-third to states A and B — your state uncertainty — and the other transient mixed state was 50-50. We calculated those out last time. So those are the mixed states, and when you plot them in the simplex you get this — admittedly somewhat degenerate — picture: a one-dimensional simplex over the two states A and B. The grapher labels the vertices with the state names you gave it in the original presentation. Here's the start state at two-thirds, one-third — there's only one free probability parameter, since the distribution is normalized — and here's the other transient mixed state at one-half, one-half. That's all fairly trivial; you can just look at it and understand it. The interesting thing is that you can go on to a more complicated process. Remember our friend the random-random XOR? You worked through one of the homeworks using just its five recurrent states. So here I pull it out of the machine library — this object is the epsilon machine for the random-random XOR process — draw it, build the mixed states, and plot them. One small thing about how the plotting works right now (it's commented above): you clear the graphics work area first, then draw to it, and then you have to save. It's a bit of Sage rigmarole — just boilerplate. Then I use the simplex-draw function to plot it. So we have the epsilon machine first — the five recurrent states for the random-random XOR — and then the mixed-state presentation, shown as a labeled directed graph over the initial five states. The state labels — 0, 1, 01, and so on — are actually histories; that's notation built into how the original presentation was stored.
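Before looking at the bigger example in detail, here is a minimal from-scratch sketch of the mixed-state update applied to the even process from a moment ago. This is not the CMPy/Sage calls the lab actually uses — the matrices, helper names, and state ordering are my own — but it recovers the (2/3, 1/3) start state, the (1/2, 1/2) transient mixed state, and the two pure states.

```python
import numpy as np

# Even process, states ordered (A, B).  T[x][i, j] = Pr(emit x and go to j | in state i).
T = {
    0: np.array([[0.5, 0.0],
                 [0.0, 0.0]]),   # A --0 : 1/2--> A
    1: np.array([[0.0, 0.5],
                 [1.0, 0.0]]),   # A --1 : 1/2--> B,   B --1 : 1--> A
}

def stationary(T):
    """Stationary state distribution: left eigenvector of sum_x T[x] at eigenvalue 1."""
    M = sum(T.values())
    vals, vecs = np.linalg.eig(M.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()

def update(T, eta, x):
    """Condition the state distribution eta on seeing symbol x; also return Pr(x | eta)."""
    v = eta @ T[x]
    p = v.sum()
    return (v / p if p > 0 else None), p

def mixed_states(T, max_length=8):
    """Breadth-first sweep over words up to max_length, collecting distinct mixed states."""
    eta0 = stationary(T)
    seen = {tuple(np.round(eta0, 10))}
    frontier = [eta0]
    for _ in range(max_length):
        nxt = []
        for eta in frontier:
            for x in T:
                eta2, p = update(T, eta, x)
                if p == 0:
                    continue
                key = tuple(np.round(eta2, 10))
                if key not in seen:
                    seen.add(key)
                    nxt.append(eta2)
        frontier = nxt
    return sorted(seen)

for eta in mixed_states(T):
    print(eta)   # (0,1), (1/2,1/2), (2/3,1/3), (1,0): two pure states, two transient mixed states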
Back to the random-random XOR: we have five states — these are the pure states. When you reach those mixed states, the delta functions, it means you're synchronized. And then we have all these other states and edges: five recurrent states and 31 transient states. It's nice to have code doing this; by hand it would be extremely tedious. You can also just draw the whole thing — it's a machine, it knows how to draw itself as a labeled directed graph. The simplex plot is much handier, but if you have a question about a particular state and how it transitions, you can go find it here: mostly transient states, with the five recurrent states down at the bottom. So we're getting into nontrivial calculations at this point. All the homeworks so far have been things you could more or less do by hand, but for what we're going to talk about this week — time reversibility and irreversibility — we need this mixed-state machinery, and some interesting things happen when we look in reverse time. So there you have it; I think that's what's in the lab. [In answer to a question:] Right — because the mixed-state presentation is unifilar, you stay synchronized; I should say you're synchronized to the mixed states. Unlike these two examples, where we started with the epsilon machine presentation, which is unifilar — and in that case, once you synchronize, you stay synchronized — I could have started with a nonunifilar presentation. Then you can actually gain and lose synchronization: you can reach a vertex and then hop back into the interior. The simple nondeterministic source is a good example of that, and it's in homework 14, which I haven't posted yet; I'll post it this afternoon. A student asks: is it true that with a unifilar source, once you synchronize you stay synchronized? Yes — if you start with a unifilar presentation. Synchronization is a statement about a presentation, about your knowledge of the presentation's state distribution. If it's unifilar, then as soon as you have certain knowledge of which state you're in, each new symbol you read keeps you hopping among the states with certain knowledge of where you are — you stay synchronized. And to the follow-up: no, there might be particular examples where that's true, but in general, with a nonunifilar presentation, you can be at one of the green states in the labeled directed graph — the recurrent states of the presentation — and then hop back into the interior. It's a kind of partial synchronization that you can lose again. Good question. You have the tools, so come up with some examples and try it out; the homework will lead you through calculations along those lines. In particular, here, from the five recurrent states you can calculate the 31 transient ones. Don't trust me on that — verify it for yourself.
Sometimes these mixed state presentations blow up. In fact, they can, even if you start from a finite state non unifiler source, you can actually calculate these mixed states end up with an infinity of them. So, so then we start talking about distribution. Well, this kind of drawing is not terribly helpful. We start talking about distributions of states over this. I really moved up to this next more abstract level, where we're not talking about states and what recurrent states we're in, but this sort of constant condition of ignorance about exactly what state we need state distributions to track things to do optimal prediction. Okay, so the last little segment down here, pretty much the same code as before, except it introduces a parameter to the mixed state builder called max length. So you remember what we do is we when we calculate the mixed states, we start with the start mix state which we observe nothing. That's the asymptotic state distribution over the given recurrent states of the presentation we start with. And then we look at how that gets updated by seeing a zero and seeing the one seeing zero zero seeing zero one one zero one one and so on. We keep looking at longer and longer words, basically until the mixed states stop. I'll cover this again today. But if it's going to be a large and or infinite number, well, the calculations are going to take a while, can't display it. So what we do is you can specify the maximum length of the word that you look at. So this little part of the lab is just a step through. Try looking at the random random XOR just on words of length one, words of length two and then see how it grows. So here I set this to two and that L gets passed in here to this keyword parameter max length equal L. So here's the five state presentation we start with. And then if I just look at length two inducing words zero zero zero one one zero and one one, I just have these basically just a tree, which should make a little bit sense. Remember it was a random random XOR generate two bits time t time t plus one and then the third bit is the XOR of the previous two. So at least it's producing or distinguishing these looks like it looks like a tree, right? All four binary words with equal probability. So but then for the random XOR, you go out to six length six words and above, you recover what I just showed you. Not specifying that parameter, the algorithm does the best to find the complete mixed state presentation. If it's an infinite one, then it it'll kick back and air to you when it runs out of resources. So the homework, one of them asked you to look at the simple nonunifiler source. Turns out there are accountable infinity of states. They grow pretty slowly, basically linearly with L and ask you to make some guesses to what the infinite mixed state presentation is by looking at a series of approximations kind of guess and look at the trend. Okay, so that's the new lab. Oh, you're right. And then so here we had the six mixed states and six, seven mixed states. Here they are. Nobody's synchronized yet. So okay, so that's that. Have fun with that. And like I said, I'll post homework 14 after I want to read over it one more time. Other announcements. Well, okay, so so mark your calendar. We're going to have the project presentations in Martinez, California. That's halfway between here and Berkeley. Easy train access. It's down at colleagues house in Martinez. Very easy access to Amtrak station right there. 
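If you want to mimic the max-length idea with the sketch from earlier, the helper already takes the word length as a parameter; sweeping it shows how the number of distinct mixed states grows. This reuses T and mixed_states() from the even-process sketch above — for the even process the count saturates immediately, whereas for a nonunifilar presentation it can keep growing with L.

```python
# Reuses T and mixed_states() from the even-process sketch above.
for L in range(1, 7):
    print(L, len(mixed_states(T, max_length=L)))
```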
Okay, so that's the new lab — have fun with it. Like I said, I'll post homework 14 after I read it over one more time. Other announcements: mark your calendars — we're going to have the project presentations in Martinez, California. That's halfway between here and Berkeley, easy train access; it's at a colleague's house in Martinez, very close to the Amtrak station. The tentative plan is Saturday, the 1st of June: afternoon and evening presentations and a barbecue. The rest of the quarter is laid out in terms of topics. Next week Ryan will go over crypticity in two lectures, in more detail. Then Chris Strelioff will talk about an issue that has kept coming up: how do we do practical inference — how do I infer an epsilon machine starting from finite data? A very nice technique has been developed that lets us deal with some thorny issues in a rather straightforward way. And then, although there are many applications of computational mechanics, for the third week we have someone coming in from Indiana University, Dowman Varn, who's going to talk about complex materials — complex semiconductors — and how, rather than a time-series prediction problem, we can use computational mechanics to go from x-ray diffraction spectra, experimental data, to inferring the crystalline structure of a material, in this case a class of semiconductors. A nice, concrete application. This is nominally a physics class, so we'll do some traditional crystallography, talk about structure, and get at this newer notion that disordered materials are actually quite ordered in structure if you look at them from the point of view of epsilon-machine analysis. Also, based on Quinn's questions last lecture — in particular, the details of how the mixed state updates the state distribution after seeing a word — I wrote up some short notes, meant to be a crib sheet, that go over mixed states and how you calculate with them in a very synoptic way. Go to the computational mechanics reader, down at the bottom of the page; the notes are called MSP, mixed state presentation. They're not the full mathematical detail of last week's lectures, just why things happen the way they do. I hope that helps clarify what's going on with mixed states and how you calculate them. Okay, so that's that; let me get started on the real material. This week really rounds out the theory of one-dimensional time-series — or one-dimensional spatial-series — computational mechanics. The goal, by Thursday, is to develop a representation, a presentation, of a process that is in some sense time-agnostic. Our notion of time is somewhat subjective: we get to choose the scan direction. As we discussed briefly a couple of weeks ago, what happens when you reverse time is what we're going to talk about today — being honest about how we're scanning the data, acknowledging we could have scanned in the opposite direction, and seeing what we get out of that, which turns out to have some interesting properties. So that's this week: directional computational mechanics, in two lectures. And as I said, we needed last week's results on mixed states to even work with this. The agenda is to review, but set up the notation more carefully than we did a couple of weeks ago — forward and reverse processes, what we mean by them, mostly notation — and also to review which information statistics are symmetric and which are asymmetric when you scan a process in different directions.
Then we'll introduce forward and reverse epsilon machines corresponding to those two scans, and talk about ways of measuring different kinds of irreversibility — or reversibility, if you like. There's a picture I fall back on when I get confused: the joint process lattice. We now have fairly complicated joint distributions over infinitely many variables — observed symbols, and now forward and reverse causal states — so this will be a little roadmap. Everything we ask turns out to be a question about this joint process lattice, usually some conditional or joint distribution over it. Then I'll go through how, when we reverse time, we reverse a given epsilon machine: think of it as a generator that generates the process in forward time, and, starting from that machine, we reverse it and calculate the reverse-time epsilon machine. We'll see if we get through it all — there's a fair amount to cover today, though given what we've done it shouldn't be too difficult, and the first part is review. Again, just to set notation: we write our bi-infinite chain of random variables, past and future, and we look at blocks starting at time t and going forward L steps, or past blocks of length L that come up to, but do not include, time t. Our objects of study are processes distributed according to this huge bi-infinite joint distribution, and the goal is to get around using that directly by building models. So we're going to recast what we've been doing, basically adding little plus signs so we can keep track of the choice of scan direction. What we've been doing — implicitly, on the previous slide and all the previous slides — is writing an index on the random variables, assuming it was time, and having the index increase by one each time we advance. Now we're going to call that not just the process but the forward process, calligraphic P plus, distributed according to the joint distribution with that scan direction. And we have the various quantities: the forward entropy rate h mu plus, which we've talked about a little — the uncertainty in the next symbol given the semi-infinite past. Now we have to talk about the reverse process. I'm still giving you the same lattice of variables, but telling you to scan it the other way. There's a small bookkeeping issue, and at some point I'll probably misstate things — human beings are bad at binary logic, and I'm one of them. Do I call these different variables and order them with an index that increases from left to right, or do I keep the original variables and use an indexing function that decrements as I go? Your choice. The idea is simple: we scan right to left. So we have the reverse process, P minus, and I'm sometimes going to use another variable, Y, where Y simply indexes through X in the opposite direction. Mostly I'll write things in terms of X, because then the contrast between left and right scans is clear, but sometimes the point is handier to make with Y.
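To pin the notation down, a restatement in symbols of the definitions just given, writing $X_{a:b} = X_a X_{a+1} \cdots X_{b-1}$ (so the past $X_{:t}$ ends at $t-1$ and the future $X_{t:}$ starts at $t$):

$$
\mathcal{P}^+ :\ \Pr(\ldots X_{t-1} X_t X_{t+1} \ldots)\ \text{scanned left to right},
\qquad
h_\mu^+ \;=\; \lim_{L\to\infty} H\!\left[\,X_t \mid X_{t-L:t}\,\right],
$$
$$
\mathcal{P}^- :\ \text{the same lattice scanned right to left},
\qquad
h_\mu^- \;=\; \lim_{L\to\infty} H\!\left[\,X_{t-1} \mid X_{t:t+L}\,\right].
$$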
For example, if I define a process over Y and don't tell you which direction it is, it still has an entropy rate: in the Y variable I'm predicting the next symbol given its semi-infinite past. But that past was X's future. So I can write it one way just by applying the definition of h mu to the given process; and, knowing it's the reverse of something, the Y notation turns into the X notation below. The reverse entropy rate of P minus is the uncertainty in — call it the current or previous — symbol given the future going forward; from Y's perspective, that was its past and its next symbol. There's one small contrast to note: because of the way we define past and future, the future starts at time t and goes forward, while the past ends at t minus 1 and does not include X at time t. So there's a little shift by one — just a definitional matter in the indexing. We've played this game before. I have a process sitting in my computer's memory, and I can choose to scan it whichever way I want. In which direction is the process more unpredictable — which direction has the larger entropy rate? We already went through this: neither. The forward and reverse entropy rates are the same. Whether I'm predicting or retrodicting, the average error or surprise is just h mu, and most of the time I'll just drop the plus and minus. We went through the derivation a couple of weeks ago; it's pretty straightforward, mostly an application of stationarity. We look at h mu plus: the uncertainty in the current symbol given the past. To be careful, we condition on finite-length pasts and put the limit out front, so the quantities are well defined. This uncertainty in the next symbol given the past can, as we know, be written in terms of block entropies: it's the two-point slope, the difference between a block entropy of length L plus 1 and one of length L. That's just an information identity. Then we can shift these around — in particular the length-L block — using stationarity: those block entropies are unchanged if I shift the block forward by one. Substituting that back in, remember the little rule: to get from block entropies to a conditional entropy, the variable left out of the shorter block is the one whose uncertainty you're computing, and you condition on the rest. After the shift, one block runs from time 0 back to minus L and the other from 0 back to minus L plus 1, so the variable at time minus L is the one left out; by that heuristic — which is just an information identity — the difference of the two block entropies is the conditional entropy of that earliest variable given the symbols that follow it. Finally I shift everything forward — stationarity again lets me do that — so the free variable sits at time minus 1 and I'm conditioning on the L symbols that follow. Taking the limit, that is the uncertainty in the previous symbol given the future; written in terms of Y, I've recovered exactly the definition of h mu minus applied to the reverse process. Just a little index gymnastics.
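Written out, the chain of equalities is (my reconstruction of the steps just described, same block notation as above):

$$
\begin{aligned}
h_\mu^+ &= \lim_{L\to\infty} H[X_0 \mid X_{-L:0}]
         = \lim_{L\to\infty}\Bigl( H[X_{-L:1}] - H[X_{-L:0}] \Bigr) \\
        &= \lim_{L\to\infty}\Bigl( H[X_{-L:1}] - H[X_{-L+1:1}] \Bigr)
         \quad\text{(stationarity: shift the length-$L$ block forward by one)} \\
        &= \lim_{L\to\infty} H[X_{-L} \mid X_{-L+1:1}]
         = \lim_{L\to\infty} H[X_{-1} \mid X_{0:L}]
         \quad\text{(shift all indices forward by $L-1$)} \\
        &= h_\mu^- .
\end{aligned}
$$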
So I start with this variable, conditioned on its past; I write that in terms of two block entropies; I shift a block over; and now the free random variable is this one, conditioned on its future. That's all that happened — I just shifted things. The result is that the entropy rates are the same. Now, does the amount of information shared between the past and the future depend on the scan direction? We talked about that: it doesn't. The excess entropy is the past-future mutual information, and the idea of the proof is simply that mutual information is symmetric in its arguments. Going through that observation a little more pedantically: the excess entropy of the forward process is the mutual information between the past and the future. To be careful, I go into the past and into the future to length L and pull the limit out front, so the quantity is well defined. By its definition, the mutual information lets me swap the order of the variables — it's symmetric — so I move the past over here and the future over there. But by the definition of the Y variable, X's future is Y's past and X's past is Y's future. Again, index gymnastics: take the limits, and that's the mutual information between Y's past and Y's future, which is just the excess entropy of the reverse process. This isn't deep; it's gymnastics with indices, keeping track in your head of whose time direction you're talking about. So the entropy rate and the excess entropy are symmetric in time — unchanged under a change of scan direction — and if there were some statistical temporal asymmetry, you couldn't use them to detect it. So now: does the stored information differ? To answer that — the stored information being the information we need to keep from the past to do optimal prediction at rate h mu — we need to talk about the epsilon machine of the reverse process.
Okay, so first, notation — again mostly just putting pluses and minuses on things we're familiar with. We have the bi-infinite chain of random variables, scanned with the index increasing, and the analysis we get out of that using the predictive equivalence relation we'll now call the forward epsilon machine, M plus. The equivalence relation, rewritten, I'll call twiddle-plus to make the distinction: we group histories using the epsilon function so that their futures look the same. We go from the given forward process and mod out by this equivalence relation; the resulting equivalence classes are the forward causal states. And we have all our complexity measures for the forward epsilon machine. There's the state-averaged uncertainty in the next symbol — that's the entropy rate; it would have been h mu plus, but we just argued it's the same either way, so drop the plus. There's the statistical complexity: the Shannon information in the causal-state distribution over the forward states. And there's the difference between the state information and the observed mutual information — remember, that was the mystery wedge, which we'll come back to several times; next week is going to be pretty much an analysis of it. That's what we call the crypticity, and there's a forward crypticity: in the information diagram, it was the uncertainty in the current state — here the current forward state — given the future. Similarly for the reverse scan. Now we have twiddle-minus: we group futures according to how they retrodict the past. Two distinct futures are equivalent when the distribution over pasts they induce is the same. The equivalence classes under that relation are the reverse causal states, and we have the transition matrices, defined just as for the forward process but scanning in the opposite direction. That's the reverse epsilon machine, M minus. And we have the corresponding quantities: what you might call the retrodictive entropy rate — using futures to predict the past, the uncertainty in the next symbol scanning in the reverse direction — but that's just h mu; the size of the reverse epsilon machine, C mu minus, the Shannon information over the reverse causal states; and similarly a reverse crypticity, the uncertainty in the current reverse causal state given the past. All of these quantities will come back and help us describe a process in a time-agnostic way.
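In symbols (my restatement; past $X_{:t}$, future $X_{t:}$):

$$
x_{:t} \sim^+ x'_{:t} \iff \Pr\bigl(X_{t:} \mid x_{:t}\bigr) = \Pr\bigl(X_{t:} \mid x'_{:t}\bigr),
\qquad
x_{t:} \sim^- x'_{t:} \iff \Pr\bigl(X_{:t} \mid x_{t:}\bigr) = \Pr\bigl(X_{:t} \mid x'_{t:}\bigr),
$$
with the equivalence classes giving the forward and reverse causal states $\mathcal{S}^+$ and $\mathcal{S}^-$, and
$$
C_\mu^\pm = H[\mathcal{S}^\pm],
\qquad
\chi^+ = H[\mathcal{S}^+_t \mid X_{t:}],
\qquad
\chi^- = H[\mathcal{S}^-_t \mid X_{:t}],
$$
both machines generating the process at the same rate $h_\mu$.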
So, to try to put these things together and hopefully make some sense of them, we have our favorite information-diagram roadmap. First, just the forward process — all familiar, except I'm relabeling with pluses. In the information diagram, the size of the past is the entropy of the past. Then the yellow circle is a subset of the past: the state information. The forward causal states are defined in terms of the past, so whatever information they contain is part of the past, and the size of that circle is the forward statistical complexity. And of course we have the future. When we're doing prediction, we're looking at the uncertainty in the future given the past — that's the red wedge. Remember how information diagrams work: take the entire future, subtract out the part that's determined by the past — that's the conditioning — and what's left over is what's uncertain about the future given the past. The overlap between the past and the future is, as we well know, the excess entropy: it's contained wholly in the future and wholly in the past; it's what's shared. And if it's wholly contained in both, it's also contained inside the state information, the statistical complexity, so we can think of it as the mutual information between knowledge of the current forward state and the future. I think we argued before that this uncertainty in the future conditioned on the past is actually well behaved; in particular, we can condition on the causal states instead, because they're proxies for knowing the past — and sometimes they're finite when the set of pasts would be infinite, which is handy. And we argued that looking one step ahead gives the entropy rate; in fact, by stationarity, the distribution factors into single-symbol conditional distributions given the current state, so each additional step just adds another h mu. So this wedge is very well behaved: we know how it scales. I've been a little improper putting down entropies of semi-infinite chains of random variables — they're typically infinite, so the diagram is technically wrong — but this red part scales nicely: once we know the causal state, extending the future by one symbol adds another h mu of net uncertainty. The way I think about it is that the red region is laminated, or foliated: going from futures of length one to length two to length three just adds a new strip of area h mu each time. That takes care of what was otherwise a technical bugaboo — the meaning was intuitive, but now we can be careful about it. Now, what about the past? Here's the past, and we take out the state information; that leaves the green wedge, whose size is the uncertainty in the past given the forward causal state. A slightly strange quantity, but the diagram puts it there, so we have to acknowledge it exists. And finally there's the last piece, the mystery wedge — what I was just calling the forward crypticity. This is the state information minus the future part. At least graphically the quantity makes sense, even if it's slightly odd to think about: given that I know the future, what's my uncertainty in the forward causal state? There can be some, since the same futures can lead from different causal states. So that wedge is the crypticity, and when you think about it, it's just the difference between the statistical complexity, C mu, and E. The handy interpretation, if you don't want to look at this
possibly confusing conditional entropy, is that it's just the difference between the internal state information and the past-future mutual information. It's a measure of how hidden the process is. There's the superficial view — E, the past-future mutual information, which we get from the statistics of the observed symbols — and then there's the process's internal structure, its state information. The crypticity controls that difference. Now just reverse everything and compare. We're in the Y process: we're trying to retrodict the past given the future. Here's the future, here's the past, and now we have the reverse causal states, formed by grouping futures that do the same job of retrodicting. The purple circle represents the reverse causal states, and its size is the reverse statistical complexity. There's a piece over here, which I've drawn in green, that is the uncertainty in the past given the future — but that's exactly what Y is trying to do, retrodict its past, so it makes sense that the goal hasn't changed; it's symmetric. We can also think of it as the mutual information between the reverse causal states and the past. The same argument as before shows that once we have the reverse causal states — take the past and hack out that piece — we're retrodicting at the optimal rate, so the green part is also foliated into slices of size h mu as we go to longer and longer words. There's the red wedge here — for any stochastic process there will be some uncertainty in the futures given which reverse causal state we're in — and we also have the reverse crypticity: the uncertainty in the current reverse causal state given the past. (I keep flipping between forward and reverse time trying to keep it all sensible, but everything is perfectly symmetric; I just changed plus to minus and reflected the diagram.) So we're actually getting pretty close to understanding most of what there is to know about stationary stochastic processes: we have this epsilon-machine view of the process's information diagram. To summarize: we have E, shared between the past and the future of the process; we have the two state informations, both of which contain E; the difference between the forward state complexity and the shared information is the forward crypticity, and the corresponding statement holds for the reverse crypticity; and we know how these pieces scale. In particular, if we take the future and condition on knowing the forward causal state, the purple and red pieces scale as L h mu — we're doing optimal prediction once we know the causal state — and by the symmetric argument the yellow and green wedges, the retrodiction side, also scale together as L h mu once we know the reverse causal state. So there's a lot we understand about this information-diagram picture of a process.
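Collecting the relations just read off the diagram (E is the excess entropy; the last line is the foliation statement):

$$
E = I[X_{:0};\, X_{0:}],
\qquad
\chi^+ = C_\mu^+ - E,
\qquad
\chi^- = C_\mu^- - E,
$$
$$
H[X_{0:L} \mid \mathcal{S}^+_0] = L\,h_\mu,
\qquad
H[X_{-L:0} \mid \mathcal{S}^-_0] = L\,h_\mu .
$$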
What I want to focus on now are the two remaining mystery wedges; they're critical to thinking about how hidden a process is. In the predictive mode, the wedge is: given that I know my current forward causal state, how uncertain am I about the past that led to it? In Y's case — retrodicting — it's: how uncertain am I about the futures that lead to my current reverse causal state? So what can we say about these quantities, just to drill down a little and bring these results together? In the predictive mode, you can show that the wedge scales linearly once you subtract off the forward crypticity — so the statement just relates how the crypticity and this other piece scale. We've already defined the crypticity, and we know the two pieces together scale as L h mu, so it's simple arithmetic. (Sorry — I was pointing at the wrong side; these are the forward states, because they're part of the past.) So this piece plus this piece scales as L h mu — that's just predicting L steps ahead — and pulling the crypticity out gives the h mu L scaling. The same thing holds in reverse: now we're retrodicting, but looking at the uncertainty in the futures that lead to the current reverse causal state, and the same scaling relationship holds — it grows linearly after subtracting off the reverse crypticity, by the same argument. So at least we know something about what's going on out here: this wedge, and the red one on the other page, behave well. That's not a surprise, because together they scale as L h mu: in either case we have the causal states giving us the context we need to do optimal prediction at error rate h mu. So that's pretty good — we've identified all the pieces. There may be some work left in building intuition for them, but we have the forward state complexity, the reverse state complexity, the shared past-future mutual information, the crypticities — which capture how hidden the process is, the difference between state complexity and E — and the outer wedges, which in the past I'd been denoting with ill-defined entropies over semi-infinite chains of random variables, but which actually scale quite well. All of these quantities are really just questions about a joint process, and there are several different joint processes we could consider. One is: now that I have a presentation of a process, it has states, and there's an internal state process the model implies. So we can talk about the state-symbol joint process, whose realizations are pairs of state and next symbol, state and next symbol. We can refine that: we now have two other kinds of joint process — really the same thing as before — the joint process between forward states and symbols, or between reverse states and symbols. But really we're going to be asking how the forward and reverse processes are related, going a little beyond the information diagram. So what we're really talking about is the joint forward-state, reverse-state, symbol process, where the events are the current forward state, the current reverse state, and the observed symbol. So how are we going to think
about that? Here's a roadmap, or guidebook. This is what we have to deal with: the observed symbols, with the distinction between past and future. Everything up to this point has been about the forward causal states: if I'm at a particular causal state at time one, then this past is a member of its equivalence class; same at time three with its past. And everything I've been doing works in reverse: going the other way, in the reverse process, looking at the reverse states — if I'm down here at time minus two, then this particular future is in the equivalence class, twiddle-minus, of that reverse causal state, and so on. Stepping back: everything we ask is a question about this lattice of random variables. That's also why it can get complicated and confusing — but if you get confused, just write it down and figure out what you're conditioning on and which variables you're uncertain about, and it will become clear. You can even print this out from the slides and color things in while you're doing your homework, just so you don't get confused. You can tell I'm struggling a little to be careful about saying forward and reverse — it's a bit tricky — but if you lay it out like this, think of it as a spatial lattice and you don't have to worry about time. In fact, these could be configurations generated by some spatial system, like the cellular automata we studied, and then there wouldn't be this issue of forward and reverse; it would be a left or right scan, and maybe the prejudices we have about time wouldn't be so confusing. So, for example — the exercises I'm suggesting here: if we have a quantity like "I know my current forward causal state and I'm uncertain about the future," that's the canonical quantity for prediction error: I have my current causal state, and there's some future leading from it. The same goes for knowing the reverse causal state and trying to retrodict the past. And I can do stranger things, like retrodict the past given the current forward state — on this diagram it's just another question about a set of random variables, without the semantic baggage that confuses us — or use the reverse causal states to try to predict the future. Why would you do such a thing? It turns out these quantities are related to the crypticities, which is why we need to lay them out. To be more precise: the forward crypticity is the uncertainty in the current causal state given the future — I see some particular future, and given that knowledge, how uncertain am I about the forward causal state? The reverse crypticity is the uncertainty in the current reverse state — the one reached using futures — given the past. And so on; you can ask all sorts of questions. In fact, today we're going to end up not talking about the observed symbols at all, but about how the forward causal states relate to the reverse causal states. That's the key step we'll develop on Thursday to give what is basically a time-agnostic picture of a stochastic process, from which we can calculate all sorts of things.
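In symbols, the lattice questions just mentioned (same notation as before):

$$
H[X_{t:} \mid \mathcal{S}^+_t]\ \text{(prediction)},\qquad
H[X_{:t} \mid \mathcal{S}^-_t]\ \text{(retrodiction)},\qquad
H[X_{:t} \mid \mathcal{S}^+_t],\qquad
H[X_{t:} \mid \mathcal{S}^-_t],
$$
together with the crypticities, which flip the conditioning:
$$
\chi^+ = H[\mathcal{S}^+_t \mid X_{t:}],\qquad
\chi^- = H[\mathcal{S}^-_t \mid X_{:t}].
$$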
One consequence of all of this — and perhaps at this point it seems like we're flogging a dead horse — is that there can be statistically irreversible processes: namely, those with different forward and reverse statistical complexities. We just reviewed that the entropy rates are the same in both scan directions, and the past-future mutual information, the excess entropy, is time-symmetric essentially by definition. But the stored information, the statistical complexity, can differ. So how do we analyze this? I gave an example before; today we want to actually calculate it. The epsilon machines needn't be time-symmetric — the forward and reverse machines needn't be the same (here "reverse" just means swapping the forward and reverse scans). If the machines aren't the same, then the statistical complexity need not be time-symmetric. The statistical complexity is just a number, so the more detailed comparison is whether the states are the same, or the transitions, or both; the complexity is just a scalar. You could have a large machine with low statistical complexity and a small machine with relatively high statistical complexity, and the numbers could even be equal. So it's a somewhat coarse, but useful, quantitative way of identifying irreversibility. The example I gave was a Misiurewicz parameter setting of the logistic map. Remember, Misiurewicz parameters are settings where the invariant distribution on the interval is nice and well behaved — in particular, the iterates of the maximum are eventually periodic, so there are only a finite number of delta functions in the distribution, unlike the typical parameter setting in the chaotic regime where there are infinitely many. This particular one is where the iterate of the maximum becomes periodic after four iterates; you can use that constraint on the logistic map to calculate exactly what the parameter is. Using a binary generating partition, this is the example I showed before: in the "forward" direction — the direction in which I iterate the map — I get four causal states and no transient states, about 0.8 bits per symbol of entropy rate, and about 1.8 bits of stored information. In reverse, there are three recurrent reverse causal states and one transient state. As a numerical check of our theorem, the entropy rate is the same, about 0.8 bits per reverse step, but the statistical complexity is lower — about a third of a bit lower. So there can be temporal asymmetry; that's proof by one example. Of course, one question is how typical this is; Ryan will talk about how often it occurs, and it is very frequent: if you randomly pick finite-memory processes out of a batch of hidden Markov models, it's quite typical. I can't help pointing out that this is often ignored in statistical analysis. And note this is true for stationary processes — even in statistical equilibrium. There's a lot of discussion in physics about irreversibility and the second law, but that's a different set of questions, about the contrast between microscopic time-reversibility of the dynamics and macroscopic properties, and typically about relaxation toward thermodynamic equilibrium. The processes we're talking about are already in their steady state — non-equilibrium steady states, they're called — and even there we find this asymmetry in time.
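Back to the Misiurewicz example for a moment: a hedged numerical sketch of the kind of calculation mentioned above — solving for a logistic-map parameter r at which the orbit of the critical point x = 1/2 becomes eventually periodic. The preperiod and period used here are illustrative placeholders, not necessarily the ones behind the numbers quoted in lecture.

```python
# Hedged sketch: find r where f_r^(preperiod+period)(1/2) == f_r^(preperiod)(1/2),
# i.e. the critical orbit falls onto a periodic orbit -- a Misiurewicz-type parameter.
def logistic(r, x):
    return r * x * (1.0 - x)

def mismatch(r, preperiod=4, period=1):
    x = 0.5
    orbit = [x]
    for _ in range(preperiod + period):
        x = logistic(r, x)
        orbit.append(x)
    return orbit[preperiod + period] - orbit[preperiod]

def bisect(f, a, b, tol=1e-13):
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "need a sign change on [a, b]"
    for _ in range(200):
        m = 0.5 * (a + b)
        fm = f(m)
        if fm == 0.0 or (b - a) < tol:
            break
        if fa * fm < 0:
            b = m
        else:
            a, fa = m, fm
    return 0.5 * (a + b)

# mismatch(3.5) > 0 and mismatch(3.9) < 0, so a root lies in between; at such a
# root the fourth iterate of 1/2 is a fixed point of the map.
print(bisect(mismatch, 3.5, 3.9))
```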
It's similar to, but not the same as, the discussion of irreversibility in thermodynamics — just a side comment. Anyway, what we've got is that the stored information — the statistical complexity we need to do optimal prediction or retrodiction — depends on the scan direction, while the asymptotic uncertainty, the entropy rate, is the same, and the amount of information shared between the past and the future is the same. The amount of effort — the machine we need to reach optimal prediction and to see that amount of shared information — depends on the direction we scan. Now, the notion of reversibility involved in the physical discussion of thermodynamic irreversibility is, as I was just saying, about the microscopic, detailed dynamics. Think of a box of gas molecules banging around — hard spheres, elastic collisions — and the physics is such that if I change t to minus t, I get essentially the same behavior: different in detail, because I reversed time, but macroscopically the same. The physics is invariant under time reversal. So what would that mean here? If we take the observed sequence to be the "microscopic" realization, what does it mean to reverse time? If we see some word of length n, reversing time just means reading it in the other order: a new word over the Y variable, indexed from zero to n, which in the X variable runs from n back down to zero. So we can define microscopic reversibility in this sense: a process is microscopically reversible when the forward and reverse processes are the same. What does that mean? A process is defined by which words occur and with what probabilities, so both of those have to agree: if some word occurs in forward time, then its reversal must also occur in the forward process, and with the same probability under the forward distribution. That's the comparison: I see a particular sequence with probability, say, 0.2; I reverse the sequence; it must also occur in the forward process with probability 0.2. Then the process is reversible. (Yes — time semantics again.) That's a very detailed view, and you'll see it's a rather restrictive notion of reversibility. What we're mostly going to talk about is a higher-level notion: a process is causally reversible when the forward and reverse stored informations — the statistical complexities — are the same. It's a bit of a coarse-graining; what we really should be asking is whether the two machines are the same, but to have something quantitative we define causal reversibility as equality of the statistical complexities. Note that the previous notion of microscopic reversibility implies causal reversibility: if the forward and reverse processes are the same, their machines are the same; if the machines are the same, their statistical complexities are the same. The opposite isn't true.
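Here is a small sketch of exactly that check — compare Pr(w) with Pr(w reversed) for every word up to some length, with word probabilities computed from the labeled transition matrices. It reuses T and stationary() from the even-process sketch earlier; for the even process every word passes, consistent with what we'll see for it in a moment.

```python
import itertools

def word_prob(T, pi, word):
    """Stationary probability of an observed word: pi T^(w0) T^(w1) ... summed over end states."""
    v = pi.copy()
    for x in word:
        v = v @ T[x]
    return v.sum()

def microscopically_reversible(T, max_len=6, tol=1e-10):
    """True if Pr(w) == Pr(reversed w) for every word w up to max_len."""
    pi = stationary(T)            # helper from the even-process sketch above
    for L in range(1, max_len + 1):
        for w in itertools.product(list(T), repeat=L):
            if abs(word_prob(T, pi, w) - word_prob(T, pi, w[::-1])) > tol:
                return False, w
    return True, None

print(microscopically_reversible(T))   # (True, None) for the even process
```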
(A brief exchange here about what "different processes" means — right, exactly; nothing deep is going on.) What I'm trying to emphasize is this measure of causal irreversibility: the difference between the forward and reverse statistical complexities. It's a nice, crude measure of temporal asymmetry; in the case of that Misiurewicz process on the logistic map, it was about a third of a bit — the difference between the models of forward and reverse time you need for optimal prediction. One consequence: if a process is causally irreversible — the difference is positive or negative, just not zero — then the complexities differ; if the complexities differ, the machines must differ; and if the machines differ, the processes differ. So you can't be causally irreversible and microscopically reversible — such a process must be microscopically irreversible. Anyway, this notion is much coarser. Now I want to show how the crypticity controls reversibility or irreversibility, using this language. The forward crypticity is the uncertainty in the current forward causal state given the future; the reverse crypticity is the uncertainty in the current reverse causal state given the past. If a process is causally irreversible, then these crypticities differ. How do we see that? Either crypticity is just the difference between its stored information and E. E is symmetric in time — the same in both directions — so if the process is causally irreversible the statistical complexities differ, and since E is the same, the crypticities must differ too. So the crypticity is controlling this notion of irreversibility. We've also been thinking about doing retrodiction with the forward states — that showed up before in the information diagram as the wedge hacked out of the past by the forward state information. We showed that it scales the way retrodiction with the reverse causal states does, as h mu times L, except that you have to subtract off the forward crypticity — which is again just the difference between C mu and E in the forward direction. What you see is that, given this relationship, there's a constraint on the lengths at which it can hold: working it through, it can only hold once L exceeds the crypticity divided by the entropy rate, and the smallest integer larger than that ratio is called the cryptic order. In other words, we start doing good retrodiction with the forward states once we go that many symbols into the past. Next week Ryan will talk about how to calculate this and give more interpretation; here I'm just pointing out that there's an interesting constraint on this retrodiction distance.
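My transcription of the relations just stated (the scaling form is as asserted in lecture):

$$
\chi^\pm = C_\mu^\pm - E,
\qquad
H[X_{-L:0} \mid \mathcal{S}^+_0] \;\approx\; L\,h_\mu - \chi^+ \quad (\text{large } L),
$$
which can only hold once $L\,h_\mu \ge \chi^+$, i.e. for
$$
L \;\ge\; \left\lceil \frac{\chi^+}{h_\mu} \right\rceil,
$$
the cryptic order (and symmetrically for the reverse direction with $\chi^-$).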
Another observation: take that outside wedge of the information diagram — the uncertainty in the past given the forward causal state — and add back in the statistical complexity. That quantity has an asymptote: at some L it starts scaling like E plus h mu L (I should say "eventually looks like" rather than "equals"). Think back to the information theory in the winter quarter, where we analyzed how the block entropy comes up to its linear asymptote; now we're concluding that this block-state entropy has a similar asymptote, just from the scaling properties motivated by the information diagram. That's going to be a critical ingredient next week when we talk about the crypticity, the cryptic order, and the Markov order of processes. Remember, when the block entropy finally reaches its E plus h mu L asymptote, that's an indicator that we can do good prediction, and for order-R Markov processes the L at which that happens is just R. So we're going to look at other scalings of the same type — block-state entropy scaling and state-block entropy scaling — as a quantitative way of seeing how the crypticity comes about. The same comments hold, perhaps unsurprisingly, in the reverse direction; I won't go through the whole argument. Now we're using the reverse causal states to predict the futures that led to them — in any stochastic process there's some ambiguity about which particular sequences lead to a given reverse causal state — and the same scaling argument gives a corresponding cryptic order, associated with how well the reverse causal states predict the futures associated with them. The corresponding future-block, reverse-causal-state entropy also has the asymptote E plus h mu L. You can also argue that these two block-state entropies are the same, just by a simple time shift and by thinking about the reverse process — I probably should use the Y variables here. So even though there are distinctions between the forward and reverse epsilon machines, these block-state entropies end up with the same linear asymptotes. Maybe that's not too surprising: the entropy rate and the excess entropy are both symmetric in time, and that constrains the asymptotes of block entropies taken in forward or reverse time to be the same. Okay — a few observations about the crypticity and how we're going to study it. Now let's actually do some calculations: some examples of what happens when you scan forward and scan in reverse and do the epsilon-machine analysis. We'll go through our favorite examples — a snapshot of three or four cases. First the even process and its familiar two-state epsilon machine: after an odd run of 1s, I must see another 1. Let me just state some properties we'll calculate next week: this is not a cryptic process. What does that mean? One way to say it is that the past-future mutual information and the state complexity are the same — C mu equals E. So what happens when we reverse it? We'll see why once we go through the calculation method, but first I want to give
Now let's get down to actually doing some calculations. We'll do one calculation, but first some examples of what happens when you scan forward and scan reverse and do epsilon-machine analysis. We're just going to go through our favorite examples here, a snapshot of three or four cases. We have the even process and its familiar epsilon machine: two states; if I see a one I must see another one. I'll just state some properties that we'll calculate next week: this is not a cryptic process. What does that mean? Well, one way to think about it is that the past-future mutual information and the state information, the statistical complexity, are the same. Okay, so now what happens when we reverse this? We'll see why once we go through the calculation method, but I just want to give you a survey of the different cases that can happen.

So it turns out the reverse epsilon machine is: if I see a one I must see another one. In fact what I've done here is put in the branching probability from state A, parameterizing it, and carried the calculation through including that parameter, so these are exactly the same even as we vary p from 50-50 to whatever. So the reverse machine is the same, and since it's the same it's also not cryptic: its excess entropy and statistical complexity are the same. It's microscopically reversible, because the machines are the same and therefore the processes are exactly the same: if I see a word in forward time and reverse it and ask, under the forward distribution, do I see that word and with what probability, it'll be equal. Or, the other way around: if I see a word scanning one direction, I'll see it scanning the other direction with the same probability. It's also causally reversible, completely symmetric in time. Okay, so that's one case, maybe in some ways the least interesting: oh, okay, it's the same, not temporally asymmetric.

Golden mean process. This one is cryptic; it takes a little calculation to see that, but we'll just take it as a statement for now. What happens when we reverse the golden mean? Your intuition might be that it just has this one restriction, no consecutive zeros, and that shouldn't change if I look in the other direction, still no consecutive zeros. That would be correct, although the golden mean turns out to be a little bit subtle, and you will go through the calculation for this in the homework. So if I reverse it I get the same thing: see a zero, I must see a one. So it's kind of like the even process, except it has this crypticity property: there's a difference between the mutual information, the observed information, and the state information. It's still causally reversible, and also microscopically reversible, because the epsilon machines are the same. So the crypticity doesn't completely drive irreversibility: the previous example had zero crypticity, this one has positive crypticity, and they're both trivially reversible, both microscopically and causally. But we'll see that when we calculate the reverse machine on Thursday; I think the golden mean case is interesting and distinct from the even process case.

Periodic, period-three process ABC, so ABC ABC ABC. Well, okay, we know how to reverse this one: ACB ACB ACB, that's fine. It's not cryptic: E is log base two of three; it's period three, so the state information, with uniform probability over the three states, is just log base two of three bits of Shannon entropy. So it's non-cryptic, but it's microscopically irreversible. Why? Well, if I'm looking at the forward process I can see AB, but under the reverse process I never see AB: I see BA, I see AC, but I never see AB. Wait, let me think about that for a second; that's one of those general claims where you can get into trouble, but I believe it's true.
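That claim is easy to check for yourself; here is a tiny self-contained snippet (my own, not from the course library) that compares the length-two words of the period-three process scanned forward and scanned in reverse:

```python
# Check microscopic irreversibility of the period-3 process ...ABCABCABC...
# by comparing the sets of length-2 words seen forward vs. in reverse.

seq = "ABC" * 100                      # a long sample of the period-3 process
rev = seq[::-1]                        # the same sample scanned in reverse

def words(s, L=2):
    """Return the set of length-L words occurring in the string s."""
    return {s[i:i + L] for i in range(len(s) - L + 1)}

print(sorted(words(seq)))   # ['AB', 'BC', 'CA']
print(sorted(words(rev)))   # ['AC', 'BA', 'CB']
# 'AB' has positive probability forward but probability zero in reverse,
# so the word distributions differ: the process is microscopically irreversible.
```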
Now, intuitively it might seem like you need an alphabet size larger than two to get microscopic irreversibility, because with just zeros and ones it seems like the only kinds of restrictions you can state are things like not seeing consecutive symbols. But the trick is to go to longer words. Right, so imagine that you coded A, B, and C each as a distinct binary word; then it would be clear at the codeword level that the forward and reverse sequences differ, as long as you made the codewords distinct and not something degenerate like all zeros. I think that's a good exercise, and an excellent question, but you have to look at longer words than just single symbols.

Random insertion process. This one: from A we do a biased coin flip, and then we either stick in a one with probability one, or in B we do another coin flip, so we're randomly inserting a bit, zero or one, with probability q, and then returning to A. This guy is cryptic. Anyone want to guess what the reverse machine looks like? I wouldn't say it's obvious. Well, we'll go through that, but maybe I should move ahead here because I want to show you the calculation steps. So here it's four states, and I've actually done it in full generality with the bias parameters p and q carried through exactly, so it holds for the whole family of processes. Four reverse causal states, and it's a little harder to see why it's not the same, compared to the easy-to-see ABC case. And then, because I've given you this, you can go ahead and calculate the state distributions for the forward process and the reverse process in closed form. If you plug in both coins being fair you end up with these numbers: an entropy rate of three-fifths of a bit per symbol, 0.6 bits, and stored state information of about 1.5 bits forward and about 1.8 bits reverse. So we have about a third of a bit of causal irreversibility, and since they're not the same, that process is also microscopically irreversible.

Okay, one more example; I call this the butterfly process. There's a whole zoo of these things, all built into the computational mechanics in Python library, the machine library, if you want to play with them. Five states over a seven-letter alphabet, again not something one is typically going to do by hand, although we did. So what does the reverse machine look like? Again, it's really quite different; there you go. That's why having some computational tools helps with this sort of thing. And again you can go through and calculate: the forward machine had uniform probability on its states, so log base two of five, about 2.3 stored bits of information, and the reverse machine stores about 2.7 bits, so we have roughly 0.4 bits, nearly half a bit, of difference in causally stored information. There's also something we call the cryptic order, the number of steps you have to go backwards from your forward causal state before you start doing good retrodiction; but anyway, the forward crypticity here is about 0.3 bits. Actually, if I know this number, then of course I can calculate E, because the crypticity is the difference between the state complexity and E, and I just calculated it. State complexities are easy to calculate given the presentation: I calculate the crypticity, subtract it off, and I now have a nice analytical calculation of E. And then, since the reverse crypticity is the difference between the reverse state information and E, I can calculate the reverse crypticity, about 0.7 bits. So these things are all related; in this case we just had to calculate this one quantity first, which we'll get to.
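Since "state complexities are easy to calculate given the presentation" keeps doing the work here, this is a minimal numpy sketch of that calculation (my own code, not the course library's API), illustrated on the golden mean machine in its standard presentation; adjust the matrix entries if your convention differs:

```python
import numpy as np

# Golden mean presentation assumed here: state A emits 1 (stay in A) or
# 0 (go to B) with probability 1/2 each; state B emits 1 and returns to A.
# T[x][i, j] = Pr(emit x, go to state j | in state i).
T = {
    "0": np.array([[0.0, 0.5],
                   [0.0, 0.0]]),
    "1": np.array([[0.5, 0.0],
                   [1.0, 0.0]]),
}

def stationary(T):
    """Left eigenvector (eigenvalue 1) of the summed transition matrix."""
    M = sum(T.values())
    vals, vecs = np.linalg.eig(M.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

pi = stationary(T)
C_mu = entropy(pi)                      # statistical complexity H[pi]
# entropy rate: state-averaged entropy of the outgoing (symbol, next-state)
# distribution; for a unifilar machine this is h_mu
h_mu = sum(pi[i] * entropy(np.concatenate([T[x][i] for x in T]))
           for i in range(len(pi)))

print(pi)     # ~ [2/3, 1/3]
print(C_mu)   # ~ 0.918 bits
print(h_mu)   # ~ 0.667 bits
```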
Anyway, this is just a list of cases: the even process, the golden mean, the ABC process, the random insertion process, and the butterfly. Basically anything, well, with some constraints, can happen: a process can be microscopically and causally irreversible and not cryptic, or cryptic, microscopically irreversible, causally reversible, not cryptic, and so on. Lots of things can happen, and we'll show you some more interesting examples as we go forward. Let's see if I can get through this in ten minutes; let's make a Boy Scout try here.

Okay, so there are sort of two questions in all this. First, how do we calculate the reverse machine if I give you the forward machine? I can give you a machine that generates the forward process, and presumably you could reconstruct the reverse machine by looking at the process itself, but is there a direct way of doing this? Second, it will turn out that we would like to relate the forward state process to the reverse state process; that's going to be key to calculating a number of quantities, including the excess entropy, in closed form. The answer is pretty straightforward in a sense, because we did all the work last week with mixed-state presentations. What we're going to do is start with the forward machine and reverse the arrows. Great, that was easy. The problem is that the result can be non-unifilar. Well, okay, when we reverse the arrows we actually have to renormalize the machine so that we have a stochastic process, but even then it can be non-unifilar. So we then use the mixed-state operator to calculate the mixed-state presentation of the normalized reverse machine, and then, like I was emphasizing before, the mixed states may not be the causal states, so sometimes we have to minimize. Okay, we've got about ten minutes; let's see if I can get through this. I don't think so, but we'll try, and then I'll maybe fix it up on Thursday. Just so you can see it, it'll kind of wash over you in ten minutes.

Okay, so the random noisy copy process. That was one of the examples we used last week when we were talking about the mixed-state presentation, so it should be a little bit familiar; this is actually going to fill in some of the numerical details. We flip a coin in A: we go to B on a zero and to C on a one, and in either case we always return to A on the next time step, so it's kind of a noisy period-two. Except that coming back from C we flip another coin with bias q to decide the symbol, while coming back from B we generate a zero. So we have three states; that's the forward presentation. We can write out the two symbol-labeled transition matrices directly from the diagram, and from those we can calculate the asymptotic state probabilities. Fine. Now what we're going to do is run the machine in reverse. How do we do that? Well, the first step is trivial: we just reverse the arrows, which is essentially taking the transpose of the matrices. But then the probabilities of the transitions leaving a state no longer form a normalized distribution; in other words, the rows of the transition matrices aren't normalized. So the trick to constructing a stochastic process for the reverse direction is to form this guy here: tilde-T (I should say tilde, not hat) is the transition matrix for the reverse process. If in forward time we go from state r' to r on some symbol, then for the reverse process we want essentially the transpose, but with one normalization: it's the probability that, in the forward presentation, I came from the previous state given that I'm now in the next state, seeing that symbol. So it's basically the symbol-labeled transition matrices transposed, but normalized by the ratio of the state probabilities of those two states: tilde-T^(x)_{r r'} = pi_{r'} T^(x)_{r' r} / pi_r.
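Here is a small numpy sketch of exactly that reversal step (again my own code, not the course library's); the noisy copy matrices below are my reading of the machine just described, with both coins fair, so take the specific entries as an assumption:

```python
import numpy as np

# Reverse-process matrices:
#   T_rev[x][r, r'] = pi[r'] * T[x][r', r] / pi[r],
# i.e. transpose the symbol-labeled matrices and renormalize by the
# stationary state probabilities.  States ordered A, B, C; p = q = 1/2.
T = {
    "0": np.array([[0.0, 0.5, 0.0],    # A --0--> B
                   [1.0, 0.0, 0.0],    # B --0--> A  (faithful copy)
                   [0.5, 0.0, 0.0]]),  # C --0--> A  (noisy copy, prob q)
    "1": np.array([[0.0, 0.0, 0.5],    # A --1--> C
                   [0.0, 0.0, 0.0],
                   [0.5, 0.0, 0.0]]),  # C --1--> A  (prob 1-q)
}

def stationary(T):
    M = sum(T.values())
    vals, vecs = np.linalg.eig(M.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def reverse_machine(T):
    pi = stationary(T)
    return {x: (pi[None, :] * Tx.T) / pi[:, None] for x, Tx in T.items()}

T_rev = reverse_machine(T)
# Rows of sum_x T_rev[x] are again normalized, but the result is typically
# non-unifilar -- which is exactly why the mixed-state step comes next.
print(sum(T_rev.values()).sum(axis=1))   # -> [1. 1. 1.]
```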
Then, if I just want to look at the internal Markov chain, I sum over the observed symbols as usual. So, that said, here's what it looks like for the random noisy copy. These two matrices are now the symbol-labeled transition matrices for the reverse machine, normalized according to that calculation, and it turns out that the forward stationary distribution is the same as the reverse stationary distribution, which is handy. Now what we have, and I've written it out here, is the machine with the edges reversed. Remember, A in the forward direction went to B and to C; now those are incoming arrows, and I've recalculated them, so A now goes out on a zero to B and on a zero to C or on a one to C. Every transition probability has been recalculated, so they all sum to one going across the rows, but it's now messed up because it's not unifilar. That's where the mixed-state calculation comes in.

So now we have the previous presentation, the random noisy copy in the forward direction, but reversed and renormalized, so it's a proper hidden Markov model again; just reversing the arrows alone doesn't give you that. Now we're going to calculate the mixed-state presentation, and we're going to make this identification: in this case it happens to work out that the reverse causal states, I should say reverse mixed states, but they happen to be the same here, are the set of mixed states induced by seeing these words going in reverse time. That is, each one is our uncertainty in the forward states given that we've seen one of these words. So, just like before with the mixed-state calculation, I look at the null word, words of length one, words of length two, that we see in the opposite direction, or, equivalently, in the forward direction for the reverse machine. We'll get transient and recurrent states, and we're just going to ignore the transient states to keep our lives simple. We start with not having seen anything, lambda; in that case the mixed state is just the pi of the forward process, which we calculated before, that's this guy. Then we calculate the mixed state having seen a zero, or having seen a one. Here are the expressions for that, and also the interpretation, and here's the language thing: my uncertainty in the current causal state given that the next symbol is a zero, which is the same thing as using the reverse presentation and having seen a zero. These are just expressions where I plug the matrices we just calculated into last week's mixed-state update: we're updating the mixed state having seen a zero. We calculate that out, and once all the mess settles out we get this probability vector here, and this piece down at the bottom is just normalizing it so it's actually a distribution. Same thing on having seen a one: work it all out and you get that vector, the mixed state after having seen a one.
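The update being plugged in at each of these steps is short enough to sketch directly; the helper names here are mine, the inputs are numpy arrays, and the comment at the end assumes the T_rev and stationary helpers from the earlier sketch:

```python
import numpy as np

# Mixed-state update: given a mixed state mu (a distribution over the
# presentation states) and the symbol-labeled matrices T[x] of the
# (here: reversed, renormalized) machine, the mixed state after seeing x is
#     mu' = mu T[x] / (mu T[x] . 1),
# i.e. push mu through the symbol matrix and renormalize.

def update(mu, Tx):
    """One step of the mixed-state update for symbol matrix Tx."""
    v = np.asarray(mu) @ np.asarray(Tx)
    total = v.sum()
    if total == 0:
        return None            # the word has probability zero
    return v / total

def mixed_state(word, T, pi):
    """Mixed state induced by an observed word, starting from pi."""
    mu = pi
    for x in word:
        mu = update(mu, T[x])
        if mu is None:
            return None
    return mu

# e.g. mixed_state("0", T_rev, stationary(T_rev)) gives the mixed state
# after having seen a single 0 with the reverse-machine matrices.
```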
All pretty simple, and so on: we just keep going, extending with one more symbol to longer and longer words. Thinking of this almost like the parse tree, we have the two mixed states we just calculated, and we want to know where this one goes if I see a zero or a one, and where that one goes if I see a zero or a one. So that means we calculate the mixed states having seen 00, 01, 10, and 11; again we just have these same expressions, plugging things in, or we could of course collapse the two symbol-labeled transition matrices into the symbol-labeled transition matrix for a word of length two, but let me not go through that. You keep doing this, going to longer and longer words, until the mixed states start repeating. If it's a good day, or the mixed-state gods are smiling on you, that will actually stop after a finite number of words, but it needn't; that's a key point, sometimes these things blow up. In this particular case, and I'll just state it, you can verify it with the computational mechanics in Python stuff, or by hand if you want, for the random noisy copy these two mixed states are the same when conditioning on these two different words. So the machine folds back on itself, and if we start seeing the same mixed state, that's the same condition of knowledge we need to predict the future, so we identify those; they're basically proxies for future morphs that are the same, and we merge those states. So if you do the calculations you end up with three reverse states; call them D, E, F to keep them distinct. And if you're careful to track through the calculation what the mixed states correspond to in the original machine, you can see, again I'm leaving out the details, that the reverse causal state D, which we got by seeing that word, corresponds to this mixed state, and that means D is really associated with being in the forward causal state C. Same thing with E, which corresponds to A. It's really just this new state F that's a mixture: you're confused about whether you're in state B or C.

Then we go through and calculate the transition probabilities (I'm running out of time here, and people are starting to peek in). We want to know, if I've seen 00 and then see another symbol, how the mixed states get updated and with what probability. The mixed-state course notes go over the expression for the transition probability to go from one mixed state to another, and for the random noisy copy we can do this in closed form. We do this after we have all the unique mixed states, namely the unique reverse states: you just go pairwise and calculate the transition probabilities. Tedious, of course, but then the result is this. Again, carrying everything through in closed form, we end up with three reverse states but new transition probabilities. So it's similar in structure to the forward process, but different in the particulars, in terms of the actual transition probabilities. And if you look at it, we have a zero and a one leaving here, a zero and a one leaving here, and a one leaving here: it's a unifilar presentation. We can go calculate entropy rates and statistical complexities, and they all make sense.
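The computational core of what we just did, extending words until the mixed states fold back onto themselves, is compact enough to sketch. This is my own illustration, with hypothetical helper names: it identifies mixed states that agree to numerical tolerance and caps the depth because, as just noted, the expansion needn't terminate:

```python
import numpy as np
from collections import deque

def enumerate_mixed_states(T, pi, max_depth=12, decimals=10):
    """Breadth-first expansion of mixed states over longer and longer words."""
    key = lambda mu: tuple(np.round(mu, decimals))   # merge numerically equal states
    seen = {key(pi)}
    states = [pi]
    queue = deque([pi])
    depth = 0
    while queue and depth < max_depth:
        for _ in range(len(queue)):      # one word length per pass
            mu = queue.popleft()
            for x, Tx in T.items():
                v = mu @ Tx
                if v.sum() == 0:         # word of probability zero
                    continue
                nu = v / v.sum()
                if key(nu) not in seen:  # a genuinely new mixed state
                    seen.add(key(nu))
                    states.append(nu)
                    queue.append(nu)
        depth += 1
    return states

# e.g. enumerate_mixed_states(T_rev, stationary(T_rev)) should fold back onto
# a finite set for the noisy copy example (recurrent states plus transients to prune).
```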
As I hinted at the beginning, the mixed states turned out to be the causal states in this case, and that's not always true. The calculation we just went through does happen to yield the epsilon machine of the reverse process, but when you're calculating mixed states they're not necessarily the causal states. For all the examples up to this point that is true, which is nice and is what often happens, but in general it's not the case. What is true is that the mixed-state presentation is a unifilar presentation, and if you remember way back to when we were talking about the epsilon machine optimality results: if you have a unifilar presentation, it predicts the process, it describes the process, and in addition its partition of pasts is a refinement of the causal-state partition, the causal-state partition being the coarser one with fewer cells. So flip that around: with a unifilar presentation like this, we sometimes have more mixed states than are necessary, so you have to minimize. That's the punch line here. And when you're doing these calculations it's just better to get rid of the transient states: when you calculate mixed states you end up with a bunch of transients, and most of the things we're calculating really refer only to the recurrent mixed states, or recurrent causal states, so you can just drop them, which makes the matrices you're working with smaller. So what you do is get rid of the transient states and then group states together by probabilistic equivalence, which minimizes them: group the mixed states together and you'll get the causal states. That's the end of the procedure: calculate your mixed states; the set of states might be too big and may have lots of transient states; dump the transient states; then minimize the recurrent states using essentially the predictive equivalence relation, merging states when they lead to the same distribution over futures.

Now, there's a little trick in this: if you strip the transients and calculate the mixed-state presentation again, that can also minimize, and now that we have the computational mechanics in Python software, there's software that does this for you; it's an easy next step, just do it again. In fact we're now working on a kind of conjecture, based on some recent work in theoretical computer science and automata theory on the minimization of finite-state machines, that if you reverse, calculate the analog of what I'd call the Nerode-equivalent states, and then reverse again, you end up with a minimal machine. So we're currently working with, in a sense, a much more general minimization algorithm: for any presentation, calculate the mixed states, reverse, calculate the mixed states, reverse, and then the mixed states again, and we believe that will give you the epsilon machine, the causal states. But stay tuned: for all the cases we've tried it works, but we haven't proved it yet.
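A rough sketch of that final merge step, entirely my own: the `trans[s][x] = (prob, next_state)` format and the helper names are assumptions, not the course library's API. It does the usual partition refinement over a unifilar presentation, splitting blocks until states in the same block emit each symbol with the same probability and lead to equivalent states:

```python
def minimize(states, alphabet, trans, tol=1e-9):
    """Merge predictively equivalent states of a unifilar presentation."""
    block = {s: 0 for s in states}          # start with everything in one block
    changed = True
    while changed:
        changed = False
        signature = {}
        for s in states:
            sig = tuple(
                (x,
                 round(trans[s].get(x, (0.0, None))[0] / tol),  # quantized prob
                 block.get(trans[s].get(x, (0.0, None))[1]))    # successor's block
                for x in alphabet
            )
            signature[s] = (block[s], sig)
        relabel = {}
        for s in states:
            relabel.setdefault(signature[s], len(relabel))
        new_block = {s: relabel[signature[s]] for s in states}
        if new_block != block:
            block, changed = new_block, True
    return block   # states sharing a block index get merged into one causal state
```

States left in the same block assign the same probabilities to all futures, so merging them is exactly the predictive-equivalence grouping described above.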
So anyway, one of the key things we want to calculate: last lecture we were talking about the forward process and the reverse process as if they were sort of separate processes, but in fact, as we pointed out with that joint process lattice, we have questions that relate the forward states and the reverse states. For example, one of the things we'd like to calculate is, in a sense, this joint forward-reverse state distribution, and it turns out that the way we were calculating the mixed states, we were actually tracking all of this. So that's going to be the theme today: take a step back, and not just recognize that the forward and reverse processes can differ in their statistical complexity, two different representations, but actually go in the direction of a sort of unified view of any stochastic process, one that is in a sense time agnostic. Rather than having time always increase left to right or right to left, past to future, we will end up today with a way of just talking about joint causal states for the past and the future.

Okay, so what were we doing in calculating the mixed states? We started with, say, the epsilon-machine presentation of the forward process, we reversed it and normalized things to get a stochastic machine, typically non-unifilar, and then we calculated the mixed states to get the reverse machine; and then, if we want the epsilon machine, we maybe minimize. In fact, the minimization conjecture I just mentioned amounts to going around this loop once, starting with any presentation, not even the epsilon machine, and coming back with the epsilon machine of the forward process. Anyway, this is the sort of commuting diagram that describes it, and what we want to do is keep track as we go: we have the forward causal states, we reverse and normalize, we still have the presentation states, and then we calculate the mixed states, which are distributions over those states. In that calculation we can track how the reverse causal states, which are now distributions over the forward causal states, are related to them. So think back, and I'm not expecting you to remember, especially after two days, the details of the calculation we did for the random noisy copy process, but let me just restate things: as we did this, we were keeping track of exactly what we need in order to talk about the joint causal-state process. Again, the recurrent causal states are just the causal states with positive asymptotic probability. The forward random noisy copy process had states A, B, C, three of them in its epsilon-machine presentation; we calculated that there were three reverse causal states; and last lecture we talked about how to calculate the mixed states from those, but those mixed states were distributions over the original presentation states. So, given that I was in reverse causal state D, namely this guy, that really is the conditional distribution of being in reverse state D and asking: what forward state could I be in? There's the distribution. In this example, what we calculated is that states C and D are basically synonymous: we're in the same condition of knowledge about the future when we end up in state D scanning in reverse as when we end up in forward state C scanning forward. So there's a one-to-one mapping. Same thing here, and I'm just rewriting the calculations we did and reinterpreting them: E corresponds to this mixed state, so there's no uncertainty there either; if I look at the joint process lattice and I'm in reverse state E, I know I'm in forward state A. What was interesting was state F: there's some ambiguity. If I'm in reverse state F, there's some uncertainty; I can look up in my lattice and see that sometimes the forward process will be in state B and sometimes in state C. So, in trying to think about this joint process, we've already done the calculation: the mixed states are these conditional distributions. If we want the joint distribution, for example, we just take the asymptotic state probability for the reverse machine and multiply it by this conditional distribution, and that gives us the joint, Pr(S+, S-) = Pr(S-) Pr(S+ | S-), just a basic probability identity. So we know how to track these things, and that's good. Now what we'll do is use this to move in the direction of this more general, time-agnostic view of a process.
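To close the loop computationally, here is that identity as a one-liner (my own code; the D-to-C, E-to-A, F-split-between-B-and-C pattern follows the example above, but the numerical values are illustrative placeholders only, not the lecture's actual results):

```python
import numpy as np

# Joint distribution over (reverse state, forward state):
#   Pr(S- = k, S+ = j) = Pr(S- = k) * Pr(S+ = j | S- = k),
# where the conditional is exactly the mixed state attached to reverse state k.

def joint_state_distribution(pi_rev, mixed):
    """pi_rev[k] = Pr(S- = k); mixed[k] = mixed state (over forward states) for k."""
    return pi_rev[:, None] * mixed

# Placeholder numbers only (reverse states D, E, F; forward states A, B, C):
pi_rev = np.array([0.25, 0.50, 0.25])
mixed  = np.array([[0.0, 0.0, 1.0],   # in D you know you're in forward state C
                   [1.0, 0.0, 0.0],   # in E you know you're in forward state A
                   [0.0, 0.5, 0.5]])  # in F you're split between B and C
P = joint_state_distribution(pi_rev, mixed)
print(P.sum())          # 1.0 -- a proper joint distribution
print(P.sum(axis=0))    # marginal over forward states
```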