Okay, so just some notes on the homework. It was due today, and I usually look at the homework solutions about a day before they're due; this time I also started writing some more code that may be helpful. I sent a message out on the class mailing list: there's a little lab with parse-tree and morph-plotting routines that help show you how to use Sage plotting. This one produces a binary tree. You don't have to rewrite that function so much as maybe change a few things: you just allocate a tree using the Sage DiGraph function. Think of this mostly as boilerplate. This part computes the positions of the tree nodes so the layout looks a little nicer, but again you don't have to worry about that; this function calculates all of it. Basically, once you've allocated the tree, you pass in the digraph, the depth, and this auxiliary array of node positions. Then you set up how it gets plotted: which tree node is the root (number zero), the positions for all the nodes, the orientation. Although we call it a tree, trees are usually drawn going down from the root. There are other things you can change here too, like how you color things. And then finally you show it; that's the graphics command that puts it up on the screen, and you get something like this. You can change the size with those x-width and height parameters if you want. So there you go: labeled edges. I'm still having some font problem with my browser, so I use an exclamation point instead of a vertical bar: "symbol ! probability" instead of "symbol | probability". It's one of those obscure things buried deep in my defaults. Are you getting that too? Yeah, typical font things; I'm using Firefox too. So, new notation: exclamation means "conditioned on." It's a little more dramatic.
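Since the lab code itself isn't reproduced here, a minimal plain-Python sketch of the same tree-allocation idea (the actual lab uses Sage's DiGraph; the function name and layout rule below are my own, hypothetical stand-ins):

```python
# A plain-Python sketch of the tree-allocation idea (NOT the actual lab code,
# which uses Sage's DiGraph; names here are hypothetical).

def allocate_binary_tree(depth):
    """Build a full binary tree as (edges, positions).

    Node i has children 2*i+1 and 2*i+2; the root is node 0.
    Positions lay each level out horizontally so the plot looks tidy,
    with the root at the top and depth increasing downward.
    """
    edges = []                            # (parent, child, symbol) triples
    positions = {0: (0.0, 0.0)}           # node -> (x, y)
    n_internal = 2 ** depth - 1           # nodes that still get children
    for i in range(n_internal):
        level = (i + 1).bit_length()      # depth of node i's children
        x, _ = positions[i]
        dx = 2.0 ** (depth - level)       # horizontal spread shrinks per level
        for symbol, child in ((0, 2 * i + 1), (1, 2 * i + 2)):
            edges.append((i, child, symbol))
            positions[child] = (x - dx if symbol == 0 else x + dx,
                                -float(level))
    return edges, positions

edges, pos = allocate_binary_tree(3)
# A depth-3 full binary tree: 2**4 - 1 = 15 nodes and 14 labeled edges.
```

In the lab, the analogous edge list and position dictionary are what get handed to the DiGraph constructor and its plot method.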
So why not do something maybe more interesting than a full binary tree, something that's helpful for thinking? I made a function where you can pass in a tree and a word, and it'll put the path in, just like I was describing last Tuesday, and build up the tree. This function has the same overall boilerplate: you allocate the directed graph from Sage and set up this dictionary, or array, of positions. That's all boilerplate. You can set the depth, and then I just pass in the tree, the position structure, and the depth I want; the depth has to agree with the word length. It goes through and allocates tree nodes and edges incrementally. Then you plot it using basically the same plot command as before, not really any different. In this case, what I did: for length-4 binary strings there would be 16 of them, and I just pruned out the ones that had consecutive zeros, emulating what the word distribution would be like for the golden mean process. And lo and behold, you get this guy; I guess I should make it smaller. You can see the various morphs and so on. It'll be clearer on your display; it's a bit washed out here because of the video projector. Anyway, so that's that; feel free to modify things. Just as a reminder of the way I programmed all this up in Sage: suppose there's some function you're interested in, like the plotting function, right?
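The pruning step described above can be sketched in a few lines of plain Python (a sketch, not the lab's actual routine):

```python
from itertools import product

# Length-4 binary words: 2**4 = 16 of them. Keeping only those without
# consecutive zeros emulates the support of the golden mean process.
words = [''.join(w) for w in product('01', repeat=4)]
allowed = [w for w in words if '00' not in w]
# len(words) == 16; len(allowed) == 8 (a Fibonacci count)
```

Those 8 surviving words are exactly the paths you'd draw into the pruned tree.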
I allocated some object, a directed graph, and in object-oriented programming every object has a whole list of functions, so-called methods, that operate on that data structure. There's one that Sage attaches to a DiGraph called plot. You just put a question mark after it, execute that code cell, and boom, you get tons of documentation. Sage and Python are pretty well documented: it goes through all the different keyword and positional parameters and gives you example code right in the documentation. Of course, the way we do coding these days is by example, by stealing stuff and adapting. Same thing down here for the DiGraph: what's a digraph, what can I do to it? It explains all of that. So Sage is very handy that way; you can query the documentation right as you're working. I was just going back and forth, stealing some of the example code, modifying and changing it. They have lots of example code, and of course there are whole web pages that document things more systematically: Python 2.7, Sage, matplotlib, all that sort of thing. So anyway, there you go. There are certainly many more little helper routines one could write for this, but hopefully this will get you started. And there is a question on homework 12, assigned today, where I give you a word distribution and you're supposed to figure out what it is and interpret it. That's part of the theme today. We've done all the heavy lifting, both formally and algorithmically, and now we're embarking on a series of lectures that emphasize what it means to discover a new property. So today will be kind of snippets, different observations. At the end of Thursday we're going to end up with yet another puzzle, hopefully counterintuitive but interesting, about time asymmetry, and that's going to lead us to generalize yet again.
Next week there'll be a new concept of state, a generalized notion of causal states; we'll use so-called mixed states. But first, this week is kind of: what did we learn? One of the first things is that, now that we've extracted this representation, or in the lingo the "presentation," of a process from its behaviors, there are questions about what it allows us to measure. That's what I want to talk about today: it gives us a way to make connections with all the information theory we were doing. It's not like we haven't talked about this before, but the question today is interpretation, trying to draw some consequences after all the work of the previous weeks. From information theory, or at least how we adapted information theory to complex processes, we had this list of informational measures, all starting from Shannon's notion of surprise. The canonical measure is how random a process is, how unpredictable: the entropy rate. But then, if you remember, we had this way of looking at the block entropies and concocting, through a systematic procedure of taking discrete word-length derivatives, other information measures. We had a geometric interpretation: we went from the block entropy to the entropy rate, and the entropy rate was the slope of the block entropy.
It's the growth rate, geometrically, of the block entropy: look at sufficiently long blocks and you pull out the intrinsic randomness of a process. We also had the predictability gain and so on, and then, after taking derivatives, we looked at the constants of integration as we came back down the hierarchy. We ran into the excess entropy, which had this very nice expression as the mutual information between the past and the future: how much information does the past share with the future? And since mutual information is a symmetric statistic, it's also the amount of information the future has about the past. We had the total predictability, a generalized measure of redundancy: how much of a raw measurement is truly randomness, and how much of it is hidden structure, pretending to be surprise, but in fact, if you do the right kind of thing, something you can compress out. And then we had the transient information, which led us to think about how an observer makes measurements and comes to know the internal state of a process. So this was all in terms of information theory, block entropies, and the entropy hierarchy. So there's a question, putting last quarter in the context of the last three weeks: how are these related to, for example, the statistical complexity? And a practical question: can we estimate these things if we have the epsilon machine? How can we calculate them?
This is going to lead us to some curious observations. We will come back and answer both of these questions, though it's actually going to take a remarkably long time; hopefully along the way there are interesting observations about what it means to be a complex process, new things we discover. Okay, but let's start at the top with more or less the simplest unifying theme: how random a process is. We have the original definition of the entropy rate given the description of the process in terms of sequences and their probabilities; I've just been formal here. We defined it as the limit of the block entropy divided by L, the Shannon information per symbol. We had other ways of thinking about it: look at enough history and consider the uncertainty in the next symbol given an infinite past. And I did mention, though we'll talk about it a little more, that you can also get this directly from the epsilon machine. The entropy rate of the process is the entropy rate computed from the causal states, and we have an explicit, closed-form solution, which is due to unifilarity. In fact, that's one of the things I want to talk about: what happens if we have other models of a given process that aren't unifilar? Anyway, the entropy rate given the epsilon machine is yet again a state-averaged, now causal-state-averaged, branching uncertainty. We just go to each state; we've calculated the probabilities of the causal states, and if there are any transient states those have zero probability asymptotically, so this is really only positive over the recurrent states. You go to each state and look at how uncertain you are about the next symbol. And since the machine is unifilar, that's equivalent to the uncertainty about going to the next state, right?
So we can use this formula because, as we proved last Thursday, the epsilon machine is unifilar. This is one of the key benefits of using the epsilon machine: we have a closed-form expression for the entropy rate. In fact, I think I said before that it basically boils down to this: we have to use the epsilon machine to calculate the entropy rate, simply because it's unifilar. There might be other presentations, like the prescient rival models, that are also unifilar; we can use those too, but the epsilon machine always works. So again, unifilarity means there's a mapping, basically a one-to-one correspondence, between the measured sequences and the internal state paths, and that's what allows you to do this. What you're really measuring here: we're using the internal states and the state-to-state branching uncertainty. This formula is really just a modification of the entropy rate for a Markov chain, which comes right out of Cover and Thomas, right out of Shannon. We adapt that to a hidden process and get this expression. So now I want to expand a little on this issue of why we need unifilarity, and the nice way to do that is to take one of our workhorse examples, the simple nonunifilar source, as another presentation of a process. Imagine we have a process generated by this mechanism. It has two internal states, and the states branch to each other with fair probabilities, so the internal state sequence is a fair coin over A and B. But what we see is given by the 0 and 1 symbol labels. If I'm in B, I will see a 0 with probability one-half and go to A. We talked before about this property: if I give you the model and tell you you've seen a 0, you immediately know what the internal state is; we're in A. Sort of like the golden mean process.
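As a concrete sketch of that closed-form formula, here is the state-averaged branching uncertainty evaluated on the golden mean process's standard two-state unifilar presentation (my choice of worked example):

```python
from math import log2

# Entropy rate from a unifilar presentation:
#   h_mu = sum over states sigma of  pi(sigma) * H[next symbol | sigma]
# State A (last symbol was 1): emits 0 or 1 with probability 1/2 each.
# State B (last symbol was 0): emits 1 with probability 1.
transitions = {'A': {'0': 0.5, '1': 0.5}, 'B': {'1': 1.0}}
pi = {'A': 2 / 3, 'B': 1 / 3}            # stationary causal-state distribution

def branching_entropy(dist):
    """Shannon entropy (bits) of the next-symbol distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

h_mu = sum(pi[s] * branching_entropy(transitions[s]) for s in pi)
# h_mu == 2/3 bit per symbol: one bit of uncertainty 2/3 of the time.
```

Because the presentation is unifilar, the next-symbol uncertainty in each state equals the next-state uncertainty, which is what makes this average the true entropy rate.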
I know that if I'm in A I'm going to see a 1, so this is a no-consecutive-zeros process. But after having seen a 0, and being certain about what internal state I'm in, as soon as I see a 1 my internal state uncertainty is nearly maximal: I'm in state A with probability one-half or in state B with probability one-half. I've suddenly lost that certainty; the observer has become unsynchronized to the states of this particular nonunifilar presentation. If this were the internal mechanism, I wouldn't know what state I'm in until I saw another 0. Now, if I see more and more 1s, this initial 50-50 uncertainty over the states shifts; the probability moves over here. After every successive 1, my estimate of the state distribution changes a little bit. So to do optimal prediction, I have to track the internal state distribution over these nonunifilar states. Okay, so that's the puzzle. Still a simple model, but how are we going to calculate the entropy rate of this nonunifilar source? There's some process we have, and I can tell you: here, this is the mechanism. Can I use this knowledge, that the output alphabet is binary and I have these symbol-labeled transition matrices, to calculate the entropy rate? It turns out to be remarkably more complicated. So what I want to talk about are three different methods, none of which actually works, but they're reasonable first approaches. And I can point you to papers in the literature that use these methods, so I'm not just setting up straw men to make a point.
Although it's kind of that. So, method A: we could just look at the internal Markov chain, like I said. Okay, but that's a fair coin over As and Bs. That would be the Markov chain transition matrix; we calculate the left eigenvector, the asymptotic state probabilities, and then naively apply the formula I put up there for a Markov chain, the state-averaged state-to-state transition uncertainty. Well, I'm in each state with probability one-half, and from each state I'm uncertain whether I go to A or B, 50-50, so my state uncertainty on the transition is one bit. One bit from A with probability one-half, one bit from B with probability one-half: I get one bit per symbol of output uncertainty. That should strike you as a pretty bad approximation. Why? Because I know there's a restriction on the observed sequences that's not captured here; this says all sequences can occur, it's basically a fair coin. Okay, so method B would say: it's pretty clear, just from looking at the original two-state nonunifilar presentation, that there's some restriction on the observed sequences. I don't see consecutive 0s, so I can write down a Markov chain that describes that restriction and then calculate its entropy rate. This is the golden mean again: if I see a 0, I must see a 1, and after a 1 I have equal branching to produce either a 1 or a 0. We've done this ad nauseam. We know this state occurs with probability two-thirds and this state with probability one-third. So two-thirds of the time I'm here with one bit of transition uncertainty, and one-third of the time I'm here but know exactly where I'm going, so there's no uncertainty about the future. Two-thirds of the time I have one bit of uncertainty.
So that's the entropy rate, two-thirds of a bit, and maybe that's a good description of the original simple nonunifilar source. But we're only taking into account one restriction. We're not, for example, tracking how, after we've seen a 0 and then one, two, three 1s, our guess at the internal state probabilities is changing; we're throwing all of that away. So this is just an approximation, but certainly better than the fair coin of the internal state process. And it's sort of silly, right? This first "entropy rate," and I'm putting that in quotes because it's basically not the entropy rate, the one from the fair coin, is much larger than the entropy rate of the golden mean approximation. But that doesn't make sense, because the entropy of a distribution is always bounded by the entropy of its support, right? The support here was the set of sequences produced by restricting to no consecutive 0s, and that's contrasted with the other approach, where we tried to use the internal state distribution and got one bit. You can't have a process whose distribution has more entropy than the entropy over its support. There's a bound: only when the support has uniform probability does the Shannon entropy of the distribution equal the entropy of the uniform distribution; typically it's less. So the direct use of the simple nonunifilar presentation, the two-state model's internal state transitions, gives us an overestimate of the entropy rate. We can argue that the other way, taking into account no consecutive 0s, capturing the entropy rate of the support, was better. But is that an upper bound or a lower bound?
It's actually kind of tricky; it depends. We have all this internal state information being presented, but that's much more than is in the observed process. So how are we going to calculate this? I give you the matrices; they just happen to be nonunifilar. Method C: we can also try a direct application of the entropy rate formula for hidden processes. Here's that SNS, the simple nonunifilar presentation, again. We go around to each state; each has probability one-half, and we look not at the state-to-state branching uncertainty but at the uncertainty in the symbols. If we're in state A, which happens half the time, we know we're going to see a 1, so there's no uncertainty in the symbols: I predict a 1 and I will be accurate. Over here in B, where I am the other half of the time, I see either a 1 or a 0 with a fair coin flip, so one bit of uncertainty. That leads to, on average, half a bit per symbol. So here's yet another number, considerably lower than the golden mean estimate. The whole thing is just terribly problematic. As long as we're honest with ourselves and say we're doing approximations, that's okay, but basically every one of these methods, A, B, and C, is wrong. And the question is: how do you do it right?
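The three naive estimates can be computed in a few lines; this sketch assumes the SNS transition probabilities as described above:

```python
from math import log2

def H(*ps):
    """Shannon entropy (bits) of a probability list."""
    return -sum(p * log2(p) for p in ps if p > 0)

# Three naive entropy-rate estimates for the simple nonunifilar source,
# each wrong for its own reason (methods A-C from the lecture).

# Method A: entropy rate of the internal Markov chain (a fair coin over A, B).
h_A = 0.5 * H(0.5, 0.5) + 0.5 * H(0.5, 0.5)          # = 1 bit/symbol

# Method B: entropy rate of the support's Markov chain (golden mean:
# no consecutive 0s), with stationary state probabilities 2/3 and 1/3.
h_B = (2 / 3) * H(0.5, 0.5) + (1 / 3) * H(1.0)       # = 2/3 bit/symbol

# Method C: state-averaged SYMBOL uncertainty of the nonunifilar presentation:
# in A you always emit a 1 (no uncertainty); in B it's a fair coin flip.
h_C = 0.5 * H(1.0) + 0.5 * H(0.5, 0.5)               # = 1/2 bit/symbol
```

Three different answers from three reasonable-looking calculations, which is exactly the problem.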
But the lesson is that all of these naive first-cut approximations are really bad, each for its own reason: you just cannot use nonunifilar presentations to calculate the entropy rate. So if someone hands you a hidden Markov model and says they've calculated the entropy rate, that they know how random the process is, I would question them pointedly to find out how they did it. The main point is that you need the epsilon machine, or at least a unifilar presentation, so that when you use these formulas, which basically calculate the entropy rate of the internal state process, the state paths map onto observed symbols in a one-to-one way, and the property of the observed process gets measured properly. We'll get into this later on; what I'm doing here is making some obvious connections back to the information theory, answering simple questions given that we have the epsilon machine, and it's going to lead us to ask for some new techniques. The actual answer is about 0.68 bits, which is pretty close to that two-thirds. But that turns out to be a fluke: the entropy of the golden mean support sequences just happens to be close. It's not actually correct, and we'll learn how to calculate this in closed form, to arbitrary accuracy.
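We'll develop the right machinery later, but as a numerical sketch: after a 0 the observer knows the state is A, and after k subsequent 1s the belief assigns P(B) = k/(k+1), so the next symbol is 0 with probability k/(2(k+1)). Averaging the symbol uncertainty over the stationary distribution of these belief states gives the true entropy rate. (The belief recursion here is worked out from the SNS transition probabilities above; treat it as a sketch of what's coming, not the lecture's derivation.)

```python
from math import log2

def Hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

# Belief state k = "k ones seen since the last zero"; then P(state B) = k/(k+1)
# and the next symbol is 0 with probability p0(k) = k / (2*(k + 1)).
# Seeing a 0 resets to k = 0; seeing a 1 moves k -> k + 1.
K = 200                                    # truncation depth
p0 = [k / (2 * (k + 1)) for k in range(K)]

# Stationary weights by forward iteration: w[k+1] = w[k] * (1 - p0(k)).
w = [1.0]
for k in range(K - 1):
    w.append(w[-1] * (1 - p0[k]))
Z = sum(w)
h_mu = sum((wk / Z) * Hb(p) for wk, p in zip(w, p0))
# h_mu is about 0.678 bits/symbol: close to, but not equal to, 2/3.
```

Note that the true value sits strictly above the golden mean estimate of 2/3 bit and strictly below method A's 1 bit.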
Now, the other curious lesson here: even if we only want to estimate how random a process is, depending on what part of its structure we look at, we get different answers. In other words, to estimate how random something is, you have to know how it's structured, which, said out of context, maybe is intuitive. But in fact there has been a huge amount of work in, say, nonlinear dynamics and statistical physics that focuses only on ways of measuring randomness, without asking whether you have the right structural model, ignoring properties like unifilarity. So this is mostly a cautionary note; we'll get to do it right. Just to hint at what we're going to do: for this simple nonunifilar source we actually end up with a countable infinity of causal states, but we can still work with those. I'll show you how to calculate, from infinite matrices, the state-averaged transition uncertainty, but we'll get there; I have to introduce a few more things first. Okay, so we have the size of the epsilon machine, the statistical complexity. We know how to calculate the asymptotic causal-state probability distribution, and the amount of Shannon information in that distribution is the statistical complexity. Like we said as we were developing things, it's sort of the size of the model: if the causal states were uniformly probable, this would just be the log of the number of states, since the sum of p log p reduces to the log of the number of events when the distribution is uniform. Overtly, it's the Shannon information of the causal states. In other words, a process is rattling around, and I say, oh, it's in state D, and you go:
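A minimal sketch of the statistical complexity as the Shannon information of the stationary causal-state distribution, using the golden mean process's two recurrent causal states as the worked example:

```python
from math import log2

# Statistical complexity: C_mu = H[pi], the Shannon information of the
# stationary causal-state distribution. For the golden mean process the
# two recurrent causal states have pi = (2/3, 1/3).
pi = [2 / 3, 1 / 3]
C_mu = -sum(p * log2(p) for p in pi)
# C_mu is about 0.9183 bits; if the states were uniformly probable it
# would be exactly log2(2) = 1 bit, the log of the number of states.
```
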
Oh, I'm surprised. You're trying to predict the process, and this would be your average uncertainty. But there are other ways of thinking about this. In particular, and this is maybe a semantically subtle point, though in the context of the difficulty we had coming up with the right entropy estimate maybe it's not subtle at all: we're interested in figuring out a property of the process out there in the world, and the claim is that when we use the epsilon machine, the properties we measure, in particular the statistical complexity, are properties of the process itself, not of some approximation. So what we're going to think is that the statistical complexity, in addition to this more mechanistic view, the size of the epsilon machine, the number of causal states of a process, is really a measure of the amount of historical information in the process at every moment in time. There is some amount of the past that is remembered by the process, and the loci of that storage, I'm claiming, are the causal states. That should be somewhat straightforward, because that's how we constructed them: we grouped together the histories that led to equal predictions. Now, a slightly more subtle point, which we'll come to at the end of the lecture: this number is the amount of structure in a process. That's a little bit new, but think back to the introductory lecture this quarter, and also in the winter. I was contrasting how in physics we have measures of disorder, measures of randomness, thermodynamic entropy, temperature, all these things, but there's no similarly systematic way of measuring how structured something is. I can say this is hotter than that, this system has more entropy than that one, degrees of disorder. But is a chaotic pendulum more or less structured than my laptop?
Well, I have a guess, but how would you actually measure that? So this is the first big claim: using the statistical complexity, we can start to compare processes on a new axis. Certainly we can compare their relative randomness; we have the entropy rate. But now there's a new axis, how structured they are, an actually quantitative measure of structure. I want to unpack that a little. What do we mean by structure, organization, regularity? We have a lot of historical baggage, maybe from language and history, built into what we think those mean, and by having this explicit representation we can start to explore some of those meanings. Okay. One of the measures we used a lot in the information theory, sort of the newest thing, was the excess entropy, and if you remember, we had three different definitions for processes. One compared the block entropy of our process to that of a memoryless process with the same entropy rate; the excess entropy was the distance between those curves. We had the convergence definition: we looked at the length-L approximation of the entropy rate, how it converged to the true value, and summed up those components. And then the most intuitive one was the mutual information between past and future. And we'd like to think, if you believe the claim that the epsilon machine is a minimal sufficient statistic and you can calculate anything about a process from it, that we should be able to get this guy. Well, that's a remarkably subtle question; it took almost 15 years to figure out, but I'll give you the answer in less than 15 years. So how do we get this seemingly simple thing, the excess entropy, the past-future mutual information? I have the epsilon machine; it generates the process, so there's got to be some connection. How do we do that?
There are some particular cases, and we went through those way back when we were talking about information theory and the statistical complexity. But now let's compare the excess entropy to the statistical complexity. The punch line is going to be that, other than these special cases where we know the relationship between the excess entropy and the statistical complexity, the past-future mutual information versus the internally stored state information, we have to introduce some more techniques. Okay, so if you remember, we argued that all independent, identically distributed processes have zero excess entropy. If you go through the cases, and you can maybe almost do this in your head now, having done the homeworks, all IID processes generate some kind of full binary tree with one kind of morph. There's actually just one causal state, and since we're always in that one causal state, the state uncertainty, the state information, is zero. So for all IID processes these two quantities are the same, and zero: memoryless. The other simple case we talked about was periodic processes. We argued that the excess entropy was the log of the period and asked: what kind of information is this? It's the phase information. If I ignore transient causal states, you can imagine that for a period-p process the architecture of the recurrent causal states is just going to be one big chain, and there will be p of them.
Again, we now have an explicit representation of what the phase information is: it's the causal state you're in. We have p of those, with uniform probability, so the entropy of p equally likely events is just log p. So for periodic processes and IID processes, the completely predictable and the completely unpredictable, the two extremes, these two quantities are the same: the stored information, the state information, equals the past-future mutual information. Now, an early class of examples that hinted things are a little more subtle is what I call spin chains. These are our block Markov processes: we lay down blocks, words of length R, independently from a distribution. And you can show that the excess entropy is sort of the block entropy, but minus this funny term: the range, the size of the block, times the entropy rate of the process. There's actually a little subtlety here; it's not quite our block process. These really have to be specified by a Hamiltonian and described by transfer matrices. But what I want to point out is that at least in this simple case we could write down a relationship between the state information and the excess entropy. Notice that again it's the range: in a spin system, and I keep saying spin, you have spin coupling over a certain range, nearest-neighbor or next-nearest-neighbor coupling; that's our R. So we have the state information, now discounted by this range of coupling times the entropy rate, the uncertainty per spin. Here I'm just explaining what the formula means without telling you why it's the case; we'll come back to that. It's a little odd, but just contrast: up here the two were equal; down here there's obviously a nontrivial relationship between them. And for, I don't know, eight years, I could calculate this, but I didn't know what it meant.
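To make that formula concrete, here's a sanity check on a range R = 1 case (my choice of example, not one from the lecture): the golden mean process is order-1 Markov, so its causal states are the length-1 histories and C_mu equals the length-1 block entropy.

```python
from math import log2

def H(ps):
    """Shannon entropy (bits) of a probability list."""
    return -sum(p * log2(p) for p in ps if p > 0)

# Sanity check of E = C_mu - R * h_mu on a range R = 1 example.
# The golden mean process is order-1 Markov: causal states are the
# length-1 histories, so C_mu equals the length-1 block entropy H(1).
R = 1
h_mu = 2 / 3                     # golden mean entropy rate, bits/symbol
C_mu = H([2 / 3, 1 / 3])         # = H(1): P(1) = 2/3, P(0) = 1/3
E = C_mu - R * h_mu              # excess entropy, about 0.2516 bits
# E < C_mu: some stored state information never shows up in observations.
```
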
Why was I discounting the state information by the randomness? Very strange. Also, the other thing that's pretty clear here: as long as I have a stochastic process whose entropy rate is a little bit, or a lot, positive, and the range is some positive integer, the excess entropy is less than the state information. So a process can have some state information, but not all of it appears in the observations. And typically, and I should say this holds for general finitely specified hidden Markov models, unifilar and nonunifilar, the statistical complexity is an upper bound on the excess entropy: the state information is an upper bound on the mutual information you see in the observations. How is that the case? How can we understand it? Well, we actually have enough machinery, and the derivation isn't too bad, so let me convince you of this rather general claim: basically, for any process, in the best case what you're observing reflects the internal state information; typically, information is getting thrown away.
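The bound can also be checked numerically from word probabilities alone, estimating E = I[past; future] via the finite-length approximation 2H(L) - H(2L). This sketch uses the golden mean's labeled transition matrices; for an order-1 Markov process the estimate is already exact at small L:

```python
from itertools import product
from math import log2

# Verify E <= C_mu numerically for the golden mean process.
# Labeled transition matrices (rows: from-state A, B; cols: to-state A, B):
T = {'0': [[0.0, 0.5], [0.0, 0.0]],    # emit 0: A -> B with prob 1/2
     '1': [[0.5, 0.0], [1.0, 0.0]]}    # emit 1: A -> A w.p. 1/2, B -> A w.p. 1
pi = [2 / 3, 1 / 3]                    # stationary state distribution

def word_prob(w):
    """P(w) = pi . T[w1] ... T[wL] . 1, left-multiplying pi through the word."""
    v = pi[:]
    for x in w:
        v = [sum(v[i] * T[x][i][j] for i in range(2)) for j in range(2)]
    return sum(v)

def block_entropy(L):
    ps = [word_prob(w) for w in product('01', repeat=L)]
    return -sum(p * log2(p) for p in ps if p > 0)

L = 7
E_est = 2 * block_entropy(L) - block_entropy(2 * L)   # I[past_L; future_L]
C_mu = -sum(p * log2(p) for p in pi)
# E_est is about 0.2516 bits, safely below C_mu at about 0.9183 bits.
```
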
Okay, so here's the few-line sketch of the proof. Remember that the excess entropy is the mutual information between the future and the past, and we can apply our little information identities, again with a caveat. The caveat is this: it turns out the mutual information term is fine even though its arguments are infinite collections of random variables, but I'm going to unpack it this way: I pull out the future, so I have the Shannon uncertainty of the future minus the uncertainty of the future given the past. That's just the mutual information identity. Now, this step is very problematic as written, but I'm only giving a sketch: I should really use finite futures, because then I have a countable number of objects and these quantities don't diverge. There's something odd going on here; it turns out to be mathematically well defined even though we have infinite chains of random variables, but as written I'm potentially subtracting one infinity from another. Let's put that caveat aside for now. The full proof, when you unpack it, puts in finite lengths and takes limits at the end, and shows this is all okay; but to do it properly you really do have to do that carefully. Okay, so the first step: I apply my information identity, the uncertainty of the future minus the uncertainty of the future given the past. But we know that for this second term, for example, I can use either the pasts or the causal states they lead to; those are equivalent, right?
We used this move a lot in the proofs of the optimalities of the epsilon-machine: the causal states are proxies for the pasts, or vice versa. They're just as good — they shield. Either I show you a particular past, or I tell you you're in a particular state: same information. So the uncertainty in the future is the same whether I condition on the past or on the causal state associated with it, and I can substitute that in: the uncertainty of the future minus the uncertainty of the future given the causal states. But that's a mutual information, so I pack it back up again. Now I have the mutual information between the causal states and the future. Fine.

I can unpack that again, this time choosing to pull out the uncertainty over the causal states minus the uncertainty of the causal states given the future. Since that second term is non-negative, this line is upper-bounded by the first term alone: the entropy over the causal states, which is the statistical complexity. So here we have our inequality: the state information, the statistical complexity, is greater than or equal to the excess entropy. I could also just invoke one of the inequalities for the mutual information — the mutual information is always bounded by the entropy of either variable — and we come down to the same thing. In any case, we use the simple property that this conditional entropy is non-negative and subtract it off to get the inequality. Or, if you like to meditate on troubling things, that term is bizarre. What does it mean?
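The chain of identities sketched above can be written out compactly. This is only the formal skeleton — the finite-length truncations and limits that make each step rigorous are suppressed, as noted in the lecture:

```latex
\begin{aligned}
E &= I[\overleftarrow{X};\overrightarrow{X}]
   = H[\overrightarrow{X}] - H[\overrightarrow{X}\,|\,\overleftarrow{X}] \\
  &= H[\overrightarrow{X}] - H[\overrightarrow{X}\,|\,\mathcal{S}]
   \qquad \text{(causal states are proxies for the past)} \\
  &= I[\mathcal{S};\overrightarrow{X}]
   = H[\mathcal{S}] - H[\mathcal{S}\,|\,\overrightarrow{X}] \\
  &\le H[\mathcal{S}] = C_\mu
   \qquad \text{(since } H[\mathcal{S}\,|\,\overrightarrow{X}] \ge 0\text{)} .
\end{aligned}
```

The "bizarre" term controlling the gap is exactly $H[\mathcal{S}\,|\,\overrightarrow{X}]$: the uncertainty in the current causal state given the future.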
I have a future, and I'm uncertain as to what the present causal state is. This is actually going to haunt us for the next two or three lectures. For now, all we need is the bound: that term is a non-negative number, who cares what it means, and we get the inequality. But oddly enough, it's controlling the difference between the internal state information and how much mutual information we can see in the observed sequences. I'm kind of dismissing it, but it's important, and it's absolutely critical to eventually figure out what it means — although initially it's odd: we're retrodicting the current causal state from the future.

So, that was the bound: the excess entropy is bounded above by the state information, the statistical complexity. I gave cases where E and Cμ are the same, and wrote down another case where they're not equal — the spin case. But it turns out our friendly even process is another case where, although it's neither IID nor periodic, the excess entropy and the statistical complexity are the same. I'd have to give you some new techniques to calculate this, but just a little heads-up: we have cases beyond the extremes of periodic and IID processes where the bound I just derived is saturated. Which means that funny term I was just whining about is zero for the even process: I can take the futures and say what causal state I'm in.
Oh, that's interesting. But this leaves open a question. I've given you a bunch of examples of classes where we know the facts of the matter — a non-trivial example with the spin chains, and now the even process, where the bound is saturated. It's very puzzling, so there's obviously something else going on here, deeper questions.

[Question from the audience.] Yes, that's exactly where we're going. I've been talking about a past and then the entropy rate, the uncertainty of the next symbol. That's prediction. What this is actually hinting at is that we have to understand retrodiction: given the future, what was the preceding symbol, or the current state? We have to get much more comfortable with that to answer the question.

But let's drill down a little more on this comparison between the state information stored internally in a hidden process and what we can see informationally in the observed sequences via the mutual information. The extreme interpretation of this Cμ bound on E is what I call the cryptographic limit: there are examples — whole families, infinite numbers of processes — where the observed mutual information is arbitrarily close to zero while the internal state information is arbitrarily big. You might remember last quarter, by way of trying to explain what this past-future mutual information or E was, I kept calling it some kind of memory; I probably even slipped and said "the stored information."
That ain't the case. I usually said apparent stored information: it's what you get in the observed symbols. So in this cryptographic limit, for this class of processes, there's an arbitrary distance between what you observe in your measurements and how the process is structured internally.

Now, the state information is still in the sequences, right? I'm not saying E is equal to zero — if E were exactly zero it would be an IID process, and an IID process has one causal state and Cμ equal to zero. Think of E here as epsilon-small but not zero. So the state information is still in the sequences, oddly enough; it's just not immediately apparent. You have to do a lot of work to pull it out.

The reason I call this the cryptographic limit: it's very similar. As the analyst or the scientist, we're trying to figure out nature's hidden secrets — or, more prosaically, the internal state structure and stored information. It's similar to cryptography: if you have some plain text you want to transmit across the country, you scramble it up so that, to all observers, the symbols you transmit look like randomness. In fact, the closer that transmitted text looks to having zero excess entropy, the better — the harder for the cryptanalyst back at the NSA to figure out what the actual message was. But you're not sending a random sequence, right?
I mean, the goal is that the receiver gets the message decrypted, so there still has to be some little bit of information in the sequence that can be unpacked. That's the analog up here: even though the excess entropy is very, very small, close to zero, the internal state information is still hidden somewhere in the sequences. You could still write down the causal-state equivalence relation and go through all this; it could just be quite hard.

Here's something more visual — one class of processes that satisfies this cryptographic limit. I call them the almost-IID processes. What you should imagine here: this state diagram has eight states. The way I built it up was to start with the fair coin, the single-state model of the fair coin, then split it so I had two states with equal transition probabilities, then split each of those to get four states, and split each of those again. They all had uniform transition probabilities, which means that if I actually applied the equivalence relation they'd all collapse back down to a single state and Cμ would be zero. Obviously I can keep doing the state splitting as far as I want. Once I stop, what I do is go in and draw, uniformly at random between zero and epsilon — where epsilon is as small as you want — a tweak to each transition probability, so that these previously predictively equivalent states, identical in distribution,
now have slightly different future morphs and are distinct. So these are all causal states. Because the transition probabilities are close to one half everywhere, the state probabilities are basically uniform, so the statistical complexity is the log of the number of states: eight in this case, so three bits — or it could be sixteen states, a hundred twenty-eight, whatever. However, you can show that the past and future are very nearly independent, and independence is the criterion for the excess entropy being zero. So the process is really, really close to a fair coin, except for the small variation in the transition probabilities; the future morphs change as you hop around from state to state. This is a construction scheme for a whole class of processes with arbitrarily small past-future mutual information and arbitrarily large state information. So whatever these two things are measuring, they're different.

After all this discussion, there are several consequences to draw out. One: the excess entropy is not the process's stored information. Why do I point this out? This has been a confusion — in fact, it still is a confusion — in several literatures: information theory, machine learning, neuroscience. Just that one class, the almost-IID processes, shows this is the case. There may be more principled reasons, but this is an infinite class of counterexamples. It really is the statistical complexity that's the stored information. And at this point that's a little more intuitive, because we know the causal states are built out of the equivalence relation: grouping pasts that are equally predictive. So part of the problem has been a poverty of language: how are we going to think about these things?
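As a concrete illustration of the construction just described, here's a minimal sketch in Python. The splitting wiring (a de Bruijn-like successor rule) and the function name are my own assumptions; the essential point from the lecture is only that every state emits 0 or 1 with probability within epsilon of one half:

```python
import random

def almost_iid_hmm(k, eps, seed=0):
    """Sketch of the almost-IID construction: start from a fair coin,
    split states until there are 2**k of them, then perturb each
    transition probability by an amount drawn uniformly from [0, eps).
    The particular successor wiring below is an illustrative assumption."""
    random.seed(seed)
    n = 2 ** k
    trans = {}
    for s in range(n):
        p = 0.5 + random.uniform(0.0, eps)  # perturbed prob. of emitting 1
        trans[s] = {1: ((2 * s + 1) % n, p),        # emit 1, move on
                    0: ((2 * s) % n, 1.0 - p)}      # emit 0, move on
    return trans

hmm = almost_iid_hmm(3, 0.01)   # eight states, tiny perturbation
assert len(hmm) == 8
for s, edges in hmm.items():
    p1 = edges[1][1]
    assert 0.5 <= p1 < 0.51     # every state is almost a fair coin
```

With eight distinct causal states and near-uniform state probabilities, Cμ is about log2(8) = 3 bits, while the output statistics are within epsilon of pure coin flips.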
So one approach is to think of E as the apparent information in the measurement sequences. It doesn't refer directly to how many effective states there are or to the organization of the internal process; it's superficial, apparent. My favorite summary is the following: Cμ is the amount of information the process uses — it's a mechanism, like a channel — to communicate E bits of information from the past to the future. We have this superficial question, "how is this past sequence related to this future sequence?" That's E. But now we're looking at some box: there's some internal mechanism actually implementing that relationship, and that's where Cμ comes in. And Cμ can be very large even when the output appears close to a good random-number generator.

We each have to come to terms with this, but these things are different, and there's been a collapse of terminology where we get confused. So think of Cμ as the stored information and E as the superficial thing that we see. Now, of course, the puzzle is that when we're modeling these processes — even to get the causal states — we just worked with what we observed, right? There will be cases where we assume a model and calculate its properties, but we got the causal states from looking at the superficial thing, so in some ways Cμ is a companion of E.

Another way to think about this bound, the fact that these can be different: it's really why we build models of the world. Imagine — maybe it's hard to believe now — that the state information and the observed mutual information were the same thing; in other words, that the state information was always fully present in the observed sequences. Then what's the modeling strategy? You'd just store all your sequences.
That's as good a model as any, right? And there are cases like that: for a periodic process I'd just write down the template — however many ones, whatever it is — and be done with it. But there are other cases where the process is really subtle, quite different.

One way to understand some misinterpretations in the literature: really, they're using observed sequences as proxies for states. What we're saying is that sequences aren't states; sometimes we group sequences together because they're equally predictive. We're breaking that naive assumption. And the other thing, of course, is that a process has internal structure that's not directly expressed by the observed sequences.

Just to drive home this point: there is an extremely widespread misconception about the past-future mutual information — it's amazing to read. It's often called the "predictive information," which is a terrible name; it's actually misleading. Really, the predictive information is the statistical complexity. The statistical complexity is the stored information; it's the information you need to optimally predict — we proved that on Thursday. You can't use less information from the past than that. To try to bridge this terminological gap, what I would suggest is: the excess entropy is the predictable information — the information shared between the past and the future that can be predicted. It's not the information you need to do prediction. I can give you thirty papers from the last half-dozen years that confuse this. It's amazing. One of the lessons here is methodological: be really careful about the language you use to describe the mathematics you're doing and the quantities you're talking about, especially with information theory. It's very attractive, obviously.
I'm a fan But there's something about it that we all get kind of weak in the knees and use these sloppy phrases So Right right Yeah, yeah, and you'd yeah absolutely that that could certainly happen and then you go well jam I really don't care about that bit if it's gonna take me that much effort. That's a different question That's fine. That's okay We'll get to that we can talk about how we would trade off Your prediction air against a model size we can do that and there's this thing called this rate distortion theory that does that So so so hopefully we'll have some time to talk about that But yeah right now again, we're just talking we're just trying to lay the foundations And then the practical issues and how we back off from that so it's true So you'd say for these almost ID processes. Yeah, yeah, they're almost ID so between you and me, you know We're friends. It's ID Well, there might be situations like reconstructing the text that you've transmitted you really want you can't just take E exactly the zero make it a fair coin because then you're throwing a bit, you know the essential bit away that you need so I Really what this is telling you is that the state information when he is very close to zero for the almost ID process So what's happening if there's something about how the state information is being measured how you've labeled the edges in That translation this function from state paths to observe sequences It's taking the internal state information and spreading it out over arbitrarily long words So that's what's going on So the flip side of that is well, okay I only want to describe the statistics up to length 16 words and then maybe it's essentially a fair coin Well, okay, but that's that's an issue of approximation Oh Let's see. Okay. 
This is almost an anticlimax, but here's another kind of bound. If you remember, we have this redundancy per symbol — we had a whole set of different redundancies back when we first introduced the excess entropy. The per-symbol redundancy is the difference between the single-symbol uncertainty (the bias in the coin, if you will) and the actual entropy rate; it's the compressibility of a process. By a very similar proof method, you can show that the statistical complexity is an upper bound on that too. Same method: write out the redundancy for length-one symbols, write in the definition of hμ — the uncertainty in the next symbol given the past — replace the past with the causal states, package it back up as a mutual information between the causal states and the next symbol, and invoke the fact that the mutual information is bounded above by the entropy of either variable. So the redundancy is bounded above by Cμ, very much like the previous proof for E. This redundancy is a quick-and-dirty thing you'll often find used to measure how "complex" processes are — used a lot in statistical mechanics, actually. So it, too, is upper-bounded by Cμ and different from it, and all the previous discussion holds.

Okay. So what we've been doing is compare and contrast. We had these information measures — entropy rate, and we could also talk about transient information, but here we mostly just talked about excess entropy — and how they relate to the more structural view of the epsilon-machine, its statistical complexity. These are different quantities, E and Cμ. Now, to get more at this issue of organization — what is it, exactly?
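Written out, the redundancy bound follows the same pattern as the E bound above — substitute causal states for the past, repackage as a mutual information, and bound by an entropy:

```latex
\begin{aligned}
H[X_0] - h_\mu
 &= H[X_0] - H[X_0\,|\,\overleftarrow{X}]
  = H[X_0] - H[X_0\,|\,\mathcal{S}] \\
 &= I[\mathcal{S};X_0] \;\le\; H[\mathcal{S}] = C_\mu .
\end{aligned}
```

Here $H[X_0]$ is the single-symbol uncertainty and $h_\mu$ the entropy rate, so the left-hand side is the per-symbol redundancy (the compressibility) mentioned in the lecture.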
It's almost easier to talk about amounts of things — quantities — and argue them through: "Cμ is an upper bound on E." It's a more subtle, almost philosophical question what we mean by saying something is structured, has a pattern, has a symmetry, and so on. So what I want to do here is draw a contrast with group theory — with symmetries expressed in terms of groups — and then talk about how the epsilon-machine presentation is a relaxation of the group concept.

Very quickly, and at a high level: when we talk about symmetries, we express them mathematically as what's called a group. The idea is that the organization of an object is defined, in a complementary way, by the list of all the operations you can apply to it such that it comes back to itself. If I have a square and I rotate it 90 degrees, it's a square again; rotate another 90 degrees, and so on. I can make this list — rotate by 90, 90, 90 — and it's always the same thing. There's an invariance: I apply an operation, I transform the object, and when it comes back to itself I note down that this operation left the object invariant. So there's an interesting duality between the list of symmetry operations — the group operations — and what the shape is. At no point do we ever define what the word "shape" means; we just say: it's this list of operations. That's what we meant by shape.

Now, in groups, the actions you take always have inverses: if I rotate my square 90 degrees I get a square, and if I rotate it back 90 degrees I still get a square. That's an important distinguishing property of a group. By way of contrast — and I'll give two examples that show this — the epsilon-machine is called a monoid. It's actually a semigroup
It's actually a semi group With an identity, so let me define these terms so The simplest description of a semi group. It's it's a relaxed group So basically It's a set of operations that don't have a unique inverse. I Mean I feel a little hard to think about if I had a square at least I Rotate it 90 degrees. I rotate it back. It's still a square It's kind of hard to imagine if I did the opposite minus 90 degree rotation. It became something else But what I'll do this with time series so Now there'll be Certain processes for which the epsilon machine semi group or monoid is in fact a group and then it describes an exact symmetry But the point here is that semi groups are very important They in some sense describe generalized or noisy or wild card Cemetery's or regularities not exact Cemetery's so so two examples to kind of contrast this So here we've got one zero one zero one zero. So we have this period to process Okay, and my operation is going to be Translate by one time step Okay, comes back now. I have you know a zero over here and a one here while it doesn't match Well, if I shift it again in time The sequence comes back to itself So shift in time by two is a group operation That describes this period to sequence Okay, so you can translate it in time and we get the same sequence and of course It's easy to now talk about what the epsilon machine is because we're back in terms talking about sequences, right? 
We'd have the two causal states, both with deterministic transitions that produce the sequence, and what the epsilon-machine captures as the pattern is the periodicity in the state sequence: A B A B A B. So that's the case where a time series has a group structure — there's just one operation, shift by two steps.

Now imagine we have a random process described by this epsilon-machine. This is actually the noisy period-two process: there's a fixed zero, then a wildcard (zero or one), a fixed zero, a wildcard, right? Now, the pattern captured here is actually the same as before: the internal state process is periodic, even though the actual output is noisy. We've figured out the hidden symmetry by reconstructing the epsilon-machine — the noisy period-two machine — from this random process. So that's a sense in which there's an internal group structure. But the process itself is described by a semigroup — the full epsilon-machine. The contrast with the previous, exactly periodic sequence: here, if we translate by two steps, we don't get the same sequence back.
We get the same distribution over sequences back, and that's why it's a semigroup. That's what I mean by noisy regularities, noisy symmetries.

[Question.] The operation is concatenation — serial application — of these operations. In the epsilon-machine there are different ways to express this; probably the most obvious is that the semigroup elements are the symbol-labeled transition matrices. When you write out a word, there's some product: I take the 0-matrix, the 1-matrix, the 0-matrix, the 0-matrix, and so on, and you can track which semigroup element you're at by the entries of the product matrix. I go from one element to another by multiplying on, say, the 0-matrix, and I get the new element, which is a product of things. Yes — but no unique inverse, right? The transitions in the sequence itself aren't uniquely invertible.

Now, a maybe more subtle, maybe more important point. Those previous two examples were easy to compare — and I'm obviously not doing real justice to group theory or semigroup theory at all — but there are other things epsilon-machines capture in their shape. We have the set of states and the transition structure, and I've been using English to describe these to you. Why? Because I hope you understand them. But in fact language fails here, especially if I had a seventeen-state machine. I'm not going to have some phrase like "oh, it's noisy period-two," or, in the golden mean case, "no consecutive zeros." I keep using these examples because I can say them in one English phrase and you know what I'm talking about. But what happens if I have some seventeen-state thing?
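The symbol-labeled matrices mentioned above can be made concrete for the golden mean process ("no consecutive zeros"). The fair-coin choice in state A is the usual presentation; treat the specific numbers as illustrative:

```python
import numpy as np

# Symbol-labeled transition matrices for the golden mean process,
# states (A, B): from A, emit 1 (stay in A) or 0 (go to B) with
# probability 1/2 each; from B you must emit 1 and return to A.
T0 = np.array([[0.0, 0.5],    # A --0--> B
               [0.0, 0.0]])   # B cannot emit 0
T1 = np.array([[0.5, 0.0],    # A --1--> A
               [1.0, 0.0]])   # B --1--> A

def word_matrix(word):
    """Semigroup element of a word: the product of its symbol matrices."""
    M = np.eye(2)
    for s in word:
        M = M @ (T0 if s == "0" else T1)
    return M

# The disallowed word "00" maps to the zero matrix: no inverse exists,
# which is exactly why this is a semigroup rather than a group.
assert np.allclose(word_matrix("00"), 0.0)
assert word_matrix("101").sum() > 0      # "101" is an allowed word
```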
Well, language just typically isn't there. As you'll probably notice as we go along, we concoct all these wild names — the butterfly process and so on — just for some sort of labeling. But that doesn't mean those processes aren't structured; it doesn't mean they're not patterned. It's the structure of the machine — the epsilon-machine — that gives us that. So again, for the even process, I use the word "even": the pattern it's capturing is evenness.

The homework assigned today is two sets of word distributions, and you're supposed to reconstruct the epsilon-machine and, based on that, discover something — discover the patterns. After you get the machine, the states, and the transitions, you have to come up with a narrative: figure out the narrative description of the property it captures, of what the pattern is.

Just to make this point — it always either frustrates me or delights me to give this talk about pattern, because you're stuck using language, and the idea really transcends that. It's mostly a mathematical concept: the algebra of a semigroup. Okay, so I think we'll finish up with another aspect of capturing pattern, what I call measurement semantics. And this is again
going to exercise the use of these epsilon-machines and causal states. Measurement semantics is meant to answer a question. We've been thinking about hidden nature out there, observed through some instrument producing a bunch of data; we build a model out of the statistics, the word distribution. But now I want to go back. Imagine that you have a particular model and you've been watching the process for a while — maybe you're even synchronized to the process, you know what state it's in — and at some particular time you see a particular measurement value. The question is: what does it mean to you?

Now, suppose you're a housefly flying around, and you fly over the Oxford English Dictionary, and there's a little serif from a "T" or something like that. What does it mean to you? In that case, probably nothing. But can we make this more precise? There's a contrast I want to draw to pull out the idea of measurement semantics. The idea is that you're looking at the world, or the measurement sequence, in at least two different ways. One is what we call the prediction level: what does that "1" mean in terms of predicting the future? And that's really what Shannon was going for, right?
And he gave us the semantics. In the very first preface or chapter of his original papers, he said that what you should think of as the meaning, when you make a measurement, is how surprised you are. So he came up with the so-called self-information: minus the log of the probability. It was all quite philosophical if you look back at it — the amount of information in a symbol is minus the log of the probability of observing that symbol, and you should think of it as your surprise at getting it. That's very subjective: an effective theory, but subjective.

We have that answer in the epsilon-machine. I'm the agent, I'm looking at the data coming in, I have my model — I've already done the modeling — and I'm tracking it, going from state to state, taking transitions. So how do we answer Shannon's question here, the amount of information in the next symbol? That's just minus the log of the probability of going from my current state, seeing that symbol, and reaching the next state: minus the log of a transition probability. Which of course, if I average, is the entropy rate — no surprise, I'm just recasting familiar things. So Shannon is looking at it at the prediction level — he's just asking how surprised you are — and that is captured in the epsilon-machine by the transition structure.

Okay. So if we have time moving along and some measurement sequence, and we stop at time 11 and see symbol 1 — how much information does that give? Shannon says:
oh, that's something like a conditional entropy — my uncertainty in what I'm going to observe at time 11, given the past from time 10 on back. That's essentially the entropy rate, and I'd calculate about 0.6 bits there, because I know it was coming from a particular process. So it's just the degree of the observer's surprise, predictability. But this doesn't say what that particular occurrence, s11 = 1, means to the observer; it's just "oh, I expected that" or "oh, I'm surprised." So there's something else to do here besides prediction.

The notion of semantics, or meaning, that I want to introduce is based on the idea that there's a tension between two different levels of representation of the same event. Level one is the previous context: the data stream itself, where the event is a measurement in the context of the process's pasts and their degrees of surprise. But the other level is: no, I'm an agent, I built the model, and when I make a measurement it updates my expectations — which state I'm in, which transition I should take. It updates my model. That's a different, model-relative notion. And the claim is that meaning occurs — there's semantic content — when there's a difference between these two levels. So, in the case where I'm an agent with the model, the degree of meaning —
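The prediction-level surprise and its average, the entropy rate, can be computed directly from a machine's transition probabilities. A small sketch using the even process as the example machine (the ~0.6-bit figure in the lecture came from a particular process, so treat these numbers as illustrative):

```python
import math

# Transition structure of the even process: state -> symbol -> (next, prob).
trans = {
    "A": {"0": ("A", 0.5), "1": ("B", 0.5)},
    "B": {"1": ("A", 1.0)},
}
pi = {"A": 2/3, "B": 1/3}    # stationary state distribution

def surprise(state, symbol):
    """Shannon self-information of seeing `symbol` from `state`: -log2 p."""
    _, p = trans[state][symbol]
    return -math.log2(p)

# Averaging the surprise over states and transitions gives the entropy rate.
h_mu = sum(pi[s] * p * (-math.log2(p))
           for s in trans for (_, p) in trans[s].values())

assert surprise("A", "1") == 1.0        # a coin flip: one bit of surprise
assert abs(h_mu - 2/3) < 1e-9           # hμ = (2/3)·1 bit ≈ 0.667 bits
```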
I'll first talk about the quantity. Call it Θ(s): minus the log of the probability of the causal state that you go to, having seen the measurement symbol s. The prediction semantics was the uncertainty in which symbol I was going to see; now I'm saying the degree of meaning is minus the log of the probability of the state I get taken to. And the content, rather than the amount, is the state itself: the state I get taken to is the meaning content. Again, language is starting to fail us here.

We'll go through some examples, but there are some boundary conditions that also make this definition plausible. The epsilon-machine has a unique start state. What's the meaning of that start state? Well, we put all the probability on the start state, so minus the log of that probability — of one — is zero. The amount of meaning is zero. Why? What does the start state mean? We haven't made any measurements yet; it can't have any meaning. That's good — at least the definition is consistent there.

What happens when we're reading a sequence, we're in state B, and we expect to see a one with some probability and go to state A — but in fact I see a zero? What's the meaning of a measurement when it's a disallowed transition? Well, there's a little procedure we follow: the model is incorrect —
you really don't know what's going on; the model said that was disallowed. So you take whatever picture you have of the process — the distribution over the states — and reset it: all the probability goes back to the start state. You're ignorant again, just as when you first started making measurements. The idea is that these disallowed transitions are meaningless. That doesn't mean you're not getting something from them — you're really, really surprised. Shannon says: since the transition probability is zero, they're infinitely informative. "This cannot happen" — then it happens, and you go, whoa. Big surprise. So if the transition is disallowed, you have the log of a zero transition probability, which is infinite: the surprise is high, but the event has no meaning to us. Which is almost a nice way of defining what we meant by "disallowed" in the first place. Those are the boundary conditions — the extreme cases — of this proposed definition of meaning and meaning content.

[Comment: there was something at CERN where they thought they'd seen neutrinos traveling faster than light.] Right, yes. They were really surprised. Shannon would have given them the right measure for their surprise, but then the rest of the scientific community said: that's meaningless. Exactly — they finally figured it out, so it fell back into the regular context. They re-synchronized further down in their machine, right? They didn't reset all of physics to zero, to the start state.
They pondered and pondered, and someone probably did feel that way: if it had been true, they really wouldn't have known what was going on, and they would have had to rebuild everything. But in fact someone figured it out. So this is a somewhat metaphorical description of that episode, but it's the basic idea, and it's sort of curious.

A quick observation: we're talking about minus the log of the probability of the state you get taken to, and if I average that over time, I'm hopping among all the states, visiting them with their asymptotic state probabilities. So the average degree of meaning is the statistical complexity C_μ. Rather than talking about C_μ as the stored information, it's really the total semantic content of the process. Again, it's just a number; it doesn't carry the interpretational content, like "no consecutive zeros", but the number is there. So things close up nicely here: we're not just being Shannon and looking at the branching structure.
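As a quick numerical check of that observation (a sketch of my own, not code from the class materials), the following assumes the recurrent part of the even-process machine discussed below, where state B goes to B or C on a fair coin and C returns to B with certainty, and verifies that the average degree of meaning, minus the expected log of the asymptotic state probabilities, comes out to the statistical complexity C_μ ≈ 0.918 bits:

```python
import math

# Assumed state-to-state transition probabilities for the recurrent states
# of the even process, with the emitted symbols summed out:
# B -> B or C with probability 1/2 each; C -> B with probability 1.
T = {"B": {"B": 0.5, "C": 0.5}, "C": {"B": 1.0}}

def stationary(T, n_iter=200):
    """Power-iterate a uniform distribution to the stationary distribution."""
    pi = {s: 1.0 / len(T) for s in T}
    for _ in range(n_iter):
        new = dict.fromkeys(T, 0.0)
        for s, p in pi.items():
            for t, q in T[s].items():
                new[t] += p * q
        pi = new
    return pi

pi = stationary(T)        # pi(B) = 2/3, pi(C) = 1/3
# Average degree of meaning = statistical complexity C_mu:
C_mu = -sum(p * math.log2(p) for p in pi.values() if p > 0)
print(round(C_mu, 4))     # 0.9183 bits
```

Power iteration is overkill for a two-state chain, but it works unchanged for any of the machines in the course.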
We're also trying to figure out what these states mean and what they tell us. Let me not go through this in a lot of detail, but you can now apply these definitions; maybe you can work through this table offline. Here we've got some process described by this machine, which should look a little familiar: if I see a one, I must see another one, so it's the even process, and I've even put in seeing a zero with probability zero, the disallowed transition. Then you just go through the table.

If I'm in state A, the start state, and I haven't made any measurements, there's no transition yet, and the degree of meaning is zero, because I've made no measurements and all the probability sits on A. That's the case I talked about before. If I'm in state A and I see a one, I come back to A; otherwise I go to state B. So after one symbol the states end up with probability two-thirds and one-third, depending on whether I saw a one or a zero, and the degree of meaning is just minus the log of those probabilities. What's the meaning of that? Well, if I'm staying in the start state, I still don't know which recurrent state I'm in; I'm unsynchronized. So the semantic content of staying in the start state, going around its self-loop, is: to the extent I keep seeing ones, I'm unsynchronized.
Whereas if I see a zero, I'm synchronized; that's the actual meaning there. Now let's see: if I'm in state B, a zero and a one are equally likely, so there's one bit of surprise, says Shannon. But the semantic content is this: if on a one I go to C, that means I've seen an odd number of ones; if I go back to B, I've seen an even number of ones: zero ones, two ones, four ones. Every time I'm in B, it's a statement about the structure of the pasts that could have brought me here. As long as I'm seeing zeros I stay here, having seen either no ones or pairs of ones. That's the structural interpretation of state B.

Then, in state C, if I see a one, first of all that's not even remotely surprising, because it's a given; I expect it, and I go to B, which means an even number of ones. If I'm in C and I see a zero, though, that's disallowed, so I'm infinitely surprised. The semantic interpretation is that I reset the machine and lose synchronization: I go back to the start state. But the measurement is meaningless to us; I don't know what to make of it.

Notice that you can apply this definition of semantic content, as I just did, to a sequence from the even process even while using the wrong model. You remember, you worked this out: the even process actually has four causal states, two transient and two recurrent, so you could go through that table as well; it would be much longer, which is why I didn't do it. So I can be looking at a data stream coming in with these rules, use a completely inappropriate model, and still have an interpretation.
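The table just walked through can be generated mechanically. This is a hedged sketch of my own: the three-state machine, its symbol probabilities, and the asymptotic probabilities π(B) = 2/3, π(C) = 1/3 are all assumptions taken from the discussion above. It prints Shannon's surprise −log₂ p(s) next to the degree of meaning Θ = −log₂ π(σ′) for each transition out of the recurrent states, and marks the disallowed transition as infinitely surprising but meaningless, with a reset to the start state:

```python
import math

# Assumed three-state even-process reader: the recurrent states B and C,
# plus the disallowed zero out of C, which resets to the start state A.
MACHINE = {
    "B": {"0": ("B", 0.5), "1": ("C", 0.5)},
    "C": {"1": ("B", 1.0), "0": ("A", 0.0)},  # zero in C is disallowed
}
PI = {"A": 0.0, "B": 2/3, "C": 1/3}           # asymptotic state probabilities

def table(machine, pi):
    """One row per (state, symbol): Shannon surprise vs. degree of meaning."""
    rows = []
    for state in sorted(machine):
        for symbol, (nxt, p) in sorted(machine[state].items()):
            surprise = math.log2(1 / p) if p > 0 else math.inf
            # Disallowed transition: infinitely surprising but meaningless.
            meaning = -math.log2(pi[nxt]) if p > 0 else None
            rows.append((state, symbol, nxt, surprise, meaning))
    return rows

for state, symbol, nxt, surprise, meaning in table(MACHINE, PI):
    shown = "meaningless, reset to A" if meaning is None else f"{meaning:.3f} bits"
    print(f"{state} --{symbol}--> {nxt}: surprise {surprise:.3f} bits, meaning {shown}")
```

Note that from B both symbols carry one bit of surprise, yet their meanings differ (−log₂(2/3) ≈ 0.585 versus −log₂(1/3) ≈ 1.585 bits), which is exactly the contrast between surprise and semantic content made above.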
I could look at the even-process sequence with the golden-mean epsilon machine and go through this again. You'd be a little more challenged to fill in that one column, the semantic meaning, because you'd be misinterpreting things: with the wrong model you'll see lots of disallowed transitions, so you'll be resetting all the time. But you can still go through, calculate the numbers, and think about what each individual measurement means, even under the incorrect model.

Like the noisy period-two process? Yes, noisy period-two: a zero, and then either a zero or a one, back again. Right, there you're going from A to B and back, so the internal state sequence is periodic, and in terms of semantic content the states are the even and odd phases. At one phase the zero is completely predictable, and at the other you'll have a fair coin flip and you'll be surprised. So just pick some examples and apply these rules to them; it works out just fine.

Okay, I did slip something by, along the lines of what I was saying before about language failing us. The part that probably made the most sense was when I associated a number with the degree of meaning and contrasted it with Shannon's surprise interpretation of a measurement; we're used to talking about amounts of memory, structure, and surprise. But there was one column in the table where I tried to suggest what the meaning content was.
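To see how often the misinterpretation happens, here is a small sketch (the state names S and Z and the reset-to-start convention are my own assumptions) of reading a word with the golden-mean machine, which forbids consecutive zeros; any even-process word containing "00" forces a disallowed transition and a reset:

```python
# Golden-mean reader: consecutive zeros are disallowed.
# S: a zero is allowed next; Z: we just saw a zero, another is disallowed.
GOLDEN = {
    "S": {"1": "S", "0": "Z"},
    "Z": {"1": "S"},
}

def count_resets(word, start="S"):
    """Read `word` symbol by symbol; reset to the start state on each
    disallowed symbol (infinitely surprising, but meaningless)."""
    state, resets = start, 0
    for symbol in word:
        if symbol in GOLDEN[state]:
            state = GOLDEN[state][symbol]
        else:
            resets += 1
            state = start   # ignorant again: all probability back on start
    return resets

# Even-process words can contain "00", which the golden-mean model forbids:
print(count_resets("0110011110"))   # the single "00" gives 1 reset
print(count_resets("110110"))       # also a legal golden-mean word: 0 resets
```

The reset count is a crude measure of how badly the model fits; a long typical even-process sequence read this way keeps resetting, which is exactly the "misinterpreting things" behavior described above.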
That's human language, English in particular, and human language doesn't really let me point at what the meaning is, the actual meaning. The actual meaning turns out to have to do with the semigroup structure, the algebra, that the epsilon machine entails: how it breaks up the sets of pasts and futures. There's an algebra behind this that is really much closer to what I mean by meaning, the semantic content.

Somewhat interestingly, this notion of semantics goes back to the early days of neuroscience, back in the good old days when there was less distance between computer science, information science, and the people working on neural systems. An early reference is Donald MacKay, a famous neurophysiologist of that era. He was trying to think about this stuff up here, neural tissue, and his idea was that meaning is the selection of some anticipated context: I have a palette of possibilities, one of them gets selected, and the thing that gets selected is the meaning. You said "Chicago" to the online reservation system, and it's exactly that. The epsilon machine gives the natural set of contexts: the causal states are the contexts for interpreting the measurements. They're what you're anticipating, because each state makes a statement about the future, about which futures can occur, its future morph. So they're contexts of interpretation.
Exactly; MacKay's discussion of this was very narrative, so this actually gives a mathematical foundation to that notion of an intrinsic semantics of a source.

Now, there's still a lot more to do with this. You can imagine moving through the world with some modeling system that was adaptively changing: as you learned new things, your semantic view of the world would be enriched, and new meanings would appear. We'll come back and talk about an extreme case where the model's causal states don't just get more enriched; they diverge, and you have to come up with a whole new class of models. And you can do that from the process itself: you go from a basically incorrect hypothesis, that it's a finite-state source, to realizing it's an infinite-state source, and you end up with a whole new representation, a new presentation, of the process. That's a huge semantic shift, and it's an example of genuinely creating new meaning.

So that's the story for today. We'll continue on Thursday in a similar vein, exercising what these epsilon machines mean, how we can use them, and what properties they capture. We're also going to talk about projects, so be thinking a little bit about projects; I'll go through some past projects and try to give some guidance. Part of the homework is writing a project proposal, okay?