Today, I'll be talking about two measures of a process that probably haven't really been discussed so far, except perhaps Markov order a little bit. I will be talking about how you calculate the Markov and cryptic orders of a process, and calculate them exactly rather than approximately. We'll go through three different approaches to doing this, each one successively better, better in terms of computational performance and accuracy, and we'll detail what's lacking at each level and what's improved when we get to the next level. So, just for background's sake, you all know what a process is already, but essentially it is a shift space endowed with a shift-invariant measure, which is exactly what you've already learned: bi-infinite strings with probabilities attached. Just to solidify notation, we will denote blocks of symbols with a pair of indices around a colon, where on the left side of the colon we put the starting index of the block, and on the right side we put the index one past the end, because the block is inclusive on the left and exclusive on the right. So, standard interval notation: closed left, open right. We denote a finite block this way with the two indices, but for infinite blocks we leave off the index on the side the block is infinite on. So X_{:-1} means start from negative infinity and go all the way up to but not including negative one, and similarly on the right, X_{2:} means start at two and go all the way to infinity. [Question: I'm afraid of getting lost in the next slide; what does the mu in your process notation mean?] It means a measure. Just standard measure theory stuff. This sigma is the shift operator, and the space is a subset of the full shift. I tried to find the most succinct way of denoting what a process was, and Nick Travers and I thought that this was it. So, furthermore, for the duration of this talk, we will assume that the processes we look at are stationary, ergodic, and discrete, discrete in both time and observed symbols. The notion of Markov and cryptic orders doesn't quite carry over to non-ergodic processes, though the algorithms we talk about will calculate a generalization of those properties in non-ergodic situations, and I don't believe Markov and cryptic orders would really make sense without stationarity. So stationarity is needed, but ergodicity and discreteness can be relaxed somewhat, and the algorithms will still work for generalizations of those ideas. All right. Epsilon machines: you already know what those are, but if you don't, they are Mealy-type hidden Markov models that are minimal and unifilar, unifilar meaning that given a state and an output symbol, there's only one edge leaving the state on that output symbol. The states are causal, meaning that they summarize the past as well as possible; they're the minimal sufficient statistic, as it says right there. So, moving on, the Markov order of a process is the minimum length L such that the probability of the next symbol conditioned on the previous L symbols equals the probability of the next symbol conditioned on all the previous symbols. So, instead of looking at the infinite past, if you only have to look at a length-L past, the Markov order is the shortest length L at which you still have equivalent predictive ability. And because the causal states are the minimal sufficient statistic, we can rewrite that definition: the Markov order is the smallest L such that the entropy of the Lth causal state conditioned on the previous L symbols is equal to zero.
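In symbols, here is a reconstruction of the slide's two definitions in the block notation just introduced (the equations themselves aren't in the transcript, and the cryptic order is included for comparison since it is defined in a moment and differs only in the conditioning block):

```latex
% Markov order: shortest history that predicts as well as the entire past,
% restated via the causal states S_L
R = \min\{\, L : \Pr(X_0 \mid X_{-L:0}) = \Pr(X_0 \mid X_{:0}) \,\}
  = \min\{\, L : H[\mathcal{S}_L \mid X_{0:L}] = 0 \,\}

% Cryptic order: the same, but conditioned on the infinite future
k = \min\{\, L : H[\mathcal{S}_L \mid X_{0:}] = 0 \,\}
```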
So, given that you've seen L symbols, you know exactly what state you're in, knowing exactly because the entropy is zero. [Question: why is it X from zero to L instead of minus L to zero?] Because these processes are stationary, we can place the index we call zero anywhere we want, and writing it this way will make more sense when we define the cryptic order. So I shifted it over, and, following on this definition, the cryptic order is the same equation except that instead of a finite block, we condition on an infinite block. Now, this might be a little weird. The Markov-order definition says: we saw L symbols, and then we know exactly which state the Lth state is. The cryptic-order definition says: given that we see the entire future, at what point do we know exactly what state we're in? We'll see some concrete examples of what this means very soon. But first, the naive approach to calculating the Markov and cryptic orders uses block entropies and block-state entropies; have those been talked about yet? A little bit? As you can see, the block-state entropy is the entropy of not just a block of symbols but also the state induced upon seeing those symbols. Like with the even process: if you see a bunch of ones, you're in a mixture of A and B, but if you ever see a zero, you know exactly what state you're in. So if the block has a zero in it, the state contributes nothing to the entropy, but if the block has no zeros in it, the state does contribute. Okay, so if we plot block entropies for this process, the noisy random phase-slip process: at length one the block entropy is some value, approximately one; length two is some other value here; then length three, length four, length five. And we notice that at length five the block entropy is exactly equal to E + 5 h_mu, so it sits exactly on the asymptote E + L h_mu that we know it eventually limits to. When we're exactly on that asymptote, that length is the Markov order, and from there the curve just keeps following the asymptote. Similarly, the block-state entropies start at C_mu, because with no symbols observed we just have H[S_0], the entropy over the states. We keep this going, and eventually the state no longer contributes any entropy; we're exactly on this asymptote, and we call that length the cryptic order. So, in this case, the cryptic order is three, and the curve follows the asymptote thereafter. [Question: are there any guarantees that the cryptic order exists, meaning it's not infinite?] We do know that if we have a unifilar model that is minimal, or more generally an asymptotically synchronizing model (has that been defined? one where, in the limit of infinite pasts, you know exactly what state you're in), then this curve will eventually limit to this one as we keep observing. [Question: doesn't the cryptic order have to be less than or equal to R, because at length R you know the Lth state by definition?] Yes, it does have to be less than or equal to R in the case of asymptotically synchronizing machines. [That's what you said.] I hadn't said that yet, but okay.
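Since the slide machine's transition probabilities aren't in the transcript, here is a minimal sketch of the naive block-entropy approach using the even process mentioned above; the matrices, names, and loop bound are my own illustrative choices, not the lecture's code.

```python
import itertools
import numpy as np

# Even process epsilon machine, states A=0 and B=1.
# T[x] is the labeled transition matrix for symbol x.
T = {0: np.array([[0.5, 0.0], [0.0, 0.0]]),   # A --0--> A w.p. 1/2
     1: np.array([[0.0, 0.5], [1.0, 0.0]])}   # A --1--> B w.p. 1/2, B --1--> A w.p. 1
pi = np.array([2/3, 1/3])                     # stationary state distribution

def block_entropy(L):
    """H(L) = -sum over length-L words w of Pr(w) log2 Pr(w)."""
    H = 0.0
    for word in itertools.product(T, repeat=L):
        p = pi.copy()
        for x in word:
            p = p @ T[x]                      # push the distribution through x
        pw = p.sum()                          # Pr(word)
        if pw > 0:
            H -= pw * np.log2(pw)
    return H

# The Markov-order test would then ask for the first L with
# H(L) == E + L * h_mu, exactly the equality whose fragility
# is discussed next.
for L in range(1, 8):
    print(L, block_entropy(L))
```

For the even process these values approach, but never reach, their asymptote, which foreshadows the problems below.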
But if you have a model that is not asymptotically synchronizing, for example a version of the even process where you duplicate both states and have them go in a ring, so you never know which copy you're in as you go through because you have exact duplicates of the states, or, say, a period-two process drawn with four states, going zero, one, zero, one, where you'll never know which phase you're in: if you have a machine with that kind of property, where no matter how much you see you'll never know which phase of the machine you're in, then this curve will not hit this asymptote. There'll be a gap between them, a finite amount that we call the gauge information. [Question: even if it's asymptotically synchronizing, one curve might only merge with the other in the limit; that doesn't mean you have a finite Markov order, right?] That's right, I'm not claiming finiteness. If the machine is asymptotically synchronizing, these two curves will approach this asymptote in the limit. For example, the even process never hits this asymptote exactly; you've already seen that, right? It gets close, but it never hits it exactly, and we'll see that this means the Markov order of the even process is infinite, which I believe you had a homework problem on. All right, so this is one approach to finding the cryptic and Markov orders of a process. But there are many problems with it. Let's look at the assumptions we made when we did this. First, we assumed we knew E exactly. That isn't a big problem in many cases: we can get E exactly whenever the forward and reverse machines are finite in size. But if one of them is not finite in size, we can't get E exactly, so drawing that asymptote to find out exactly when we synchronize to it becomes more difficult; still, we can approximate E very well with the mixed-state block-entropy algorithm. Second, we assumed we knew h_mu exactly. That one's relatively trivial: we can calculate it from any unifilar model, so it's not a particularly bad assumption. The more important issue is this: we assumed we could distinguish being exactly on the asymptote from being less than machine precision away from it. This is a very practical concern. For the even process, the block-entropy curve approaches the asymptote exponentially, but it is never exactly on it. Eventually we get to a point where the distance between them is smaller than your computer can resolve, and at that point an automated algorithm doing this would conclude it had found the Markov order, as some finite but large value, whereas we know theoretically it's infinite. So this is an actual practical issue when we're calculating Markov orders. And furthermore, if R were infinite, we can't have our computer continually grinding out block entropies, trying to get closer and closer to an asymptote it will never reach; we have to have some condition for bailing out and saying, well, the Markov order is larger than a billion, so we're going to assume it's infinite. So these are the two biggest issues with the algorithm I showed before, grinding out block entropies and block-state entropies until they hit the asymptotes: one, figuring out exactly when we do hit the asymptote, and two, stopping when we know we won't.
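To make the precision point concrete, here is a toy illustration; the gap 2^-L stands in schematically for the even process's actual exponential approach to its asymptote, so the specific order reported is illustrative only.

```python
import numpy as np

eps = np.finfo(float).eps       # ~2.2e-16 for IEEE-754 doubles
for L in range(1, 80):
    gap = 2.0 ** (-L)           # schematic gap: positive for every finite L
    if gap < eps:
        print(f"naive equality test reports Markov order {L}; truth: infinite")
        break
```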
And knowing we won't hit the asymptote is a difficult problem, or could be, potentially, if it weren't for the stuff we're going to talk about next. So the first hint that there is a very simple solution to this problem: let's take that machine we just had before, the same one that's here, and instead of these one-halves here and here and here and here, let's replace them with p and q. So now we're letting any probability go this way and this way, say 90% and 10%, and this one could be 40%, 60%; we don't know, any p and q. And if we sample those p's and q's at random and plot the block entropies and the block-state entropies, we end up with this. What's plotted is actually each curve minus its asymptote E + L h_mu for that particular set of p's and q's, so we know the curves should go to 0, and we see they do. In fact, no matter which p and q we choose, the block entropies hit their asymptote at 5, and the block-state entropies hit theirs at 3. So that tells us, or at least strongly suggests, that the Markov and cryptic orders of this noisy random phase-slip process are independent of which p and q we put on here. They only depend on the structure of the epsilon machine, not the transition probabilities. Now, it does make a difference if, say, p were 0; then we would always just loop here. So it's not absolutely any p and q, it's any p and q that does not change the structure of what the epsilon machine is, the epsilon machine being minimal. We can't set this edge to 0 and this one to 1 and still call it an epsilon machine, because the epsilon machine would then actually be just the single state that always outputs 0. But so long as you don't hit one of those points where either a symmetry or a zeroed value kills an edge, as long as the topology of the epsilon machine stays the same, the Markov and cryptic orders will be the same. So they really depend on the epsilon-machine topology, not the particular parameters of the process. If this probability were, say, 10^-6, and this one 1 - 10^-6, you might get very close; you can see some of these curves are a lot closer to the asymptote here at length 4 than others. You'll get really close, but you won't actually be at 0 until length 5, unless you've changed the structure by changing the probabilities. Like I said, the Markov and cryptic orders are independent of the probabilities, and therefore they only depend on the topology. So now let's focus on topologies and what they tell us about machines. We're going to switch to a different machine, because this one's a little large, and look at this one over here. And what we're going to do, just for the sake of argument, is consider all words that can be seen by this process, in lexicographical order. So the first word we could observe starts with a 0. Before we've seen anything, we don't know what state we're in; we could be in a, b, c, or d. But if we observe a 0: a on a 0 goes to b, b goes to c, and c goes to d. So after observing a 0, we have to be in b, c, or d. And if we observe another 0: b goes to c, c goes to d, and d can't emit a 0. So now we're in either c or d. And if we observe another 0, we know we're in state d. But what if instead of seeing a 0 at the end, we had seen a 1? Well, then d goes to a on a 1, and we're again in a single state there. And we can keep doing this.
So 0 then 1: after observing a 1 here instead of a 0, d goes to a, and b goes to a, and c cannot emit a 1, so we know we're in state a. And we continue doing this, listing out in lexicographical order all words we can observe. There are other length-3 words that can't be observed, for example you can't see 1, 1, 1 or anything like that, but these are the ones that can be seen. Now let's recall the Markov order criterion that we had before: the smallest L such that the conditional entropy of the Lth state given the length-L word is zero. And remember, I said this holds only if there's no entropy left in what state we're in after L symbols; that's what the conditional entropy being zero means, that there's only one possible state given the word. So let's consider words of length 0. With a length-0 word we could be in any state; we haven't seen a symbol to narrow us down yet. If we consider words of length 1: after observing a 0, we're in three possible states, but after observing a 1, we're in only one possible state. So conditioned on a 1, we know the state is a, but conditioned on a 0, the state could be b, c, or d, with some probability, so this conditional entropy is not 0 yet. Then we keep going; at length 2 it's still not 0. But after seeing a length-3 word, we know we're in d, a, b, c, or a, depending on what that word was. You give me any length-3 word and I can tell you exactly what state the machine is in, and therefore the conditional entropy is 0, and so we know the Markov order is 3. Now, to figure out the cryptic order, remember, the criterion is that we drop the restriction of having seen only up to the state we're considering, and we can see the infinite future. That means that, starting here, I can see exactly which symbols are coming. I just don't necessarily know what state I'm in. But being able to see into the future means I know exactly which paths I can start down that will actually continue going. If I'm going to see 0, 0, 0, I can't start in b, because this path dies here; in fact, if I'm going to see 0, 0, 0, I can't start in b, c, or d. I have to start in a. So conditioning on the word I'm going to see going forward limits the states I can start in. Does that make sense? And so if we highlight the valid paths: for example, if I know I'm going to see 0, 1, 0, I could start in state a and go a, b, a, b, or I could start in state c and go c, d, a, b, but I can't start in b and I can't start in d. So if we consider, at each point, just the states that lie on paths that can actually produce this word, and get rid of the rest, then we play the same game we did before. We consider lengths 0 and 1: there's still entropy here, here, and here. We go one more; there's still entropy here. But then we get here, at length 2: we can only be in state c, only in state d, only in state a, only in state b, only in state b, depending on the word. So the cryptic order is 2. So this gives us a nicer method to calculate the Markov and cryptic orders, but it still has some problems. The first advantage is that we don't need to know E or h_mu at all; those don't matter. It's integer-based, meaning we just need to keep track of which elements are in a set and which ones aren't, rather than keeping track of actual real floating-point values, so we don't need machine precision. But synchronizing words can be of arbitrary length, and there can be an arbitrary number of them, depending on the process.
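Here is a minimal sketch of this set-tracking game in Python. The transition topology delta is my reading of the four-state example above (None marks a symbol a state cannot emit), and the max_len cutoff is an arbitrary bail-out; neither is the lecture's actual code.

```python
# Topology of the four-state example, as read off the walkthrough above:
# delta[state][symbol] is the successor state, or None if that symbol
# cannot be emitted from that state.
delta = {'a': {0: 'b', 1: None},
         'b': {0: 'c', 1: 'a'},
         'c': {0: 'd', 1: None},
         'd': {0: None, 1: 'a'}}

def successors(delta, states, symbol):
    """States reachable from `states` on `symbol` (empty if unobservable)."""
    return frozenset(delta[s][symbol] for s in states
                     if delta[s][symbol] is not None)

def markov_order_by_enumeration(delta, alphabet=(0, 1), max_len=30):
    """Grow all observable words, tracking the set of states consistent with
    each; the Markov order is the first length at which every set is a
    singleton."""
    frontier = {frozenset(delta)}          # before any symbol: all states
    for L in range(max_len + 1):
        frontier = {S for S in frontier if len(S) > 1}
        if not frontier:
            return L                       # every length-L word synchronizes
        frontier = {nxt for S in frontier for x in alphabet
                    if (nxt := successors(delta, S, x))}
    return float('inf')                    # bailed out: presumed infinite

print(markov_order_by_enumeration(delta))  # -> 3, matching the walkthrough
```

Note the bail-out: enumeration alone cannot distinguish a large finite order from an infinite one, which is exactly the problem that remains.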
So we still have infinities to worry about, but we don't have to worry about differentiating very small numbers. So, one problem down, but one still exists. Questions so far? All right. So can we do better? Yes. And the idea is that instead of enumerating synchronizing words like we did before, we can encapsulate them all in an automaton. So let's start with that machine we had before. Did you have a question? We start with the machine we had before, and we start by not knowing which state we're in. We could be in a, b, c, or d, so we put an actual node that represents that set here. Then, if I'm in a, b, c, or d, we already saw that if I see a 1, I know I'm in state a, and if I see a 0, I'm in either b, c, or d. So we can add those in here: if I see a 0, I know I'm in b, c, d, and if I see a 1, I know I'm in a. This should look familiar from the mixed-state algorithm, which is the probabilistic generalization of this construction. So from b, c, d: if I see a 0, I know I'm in c, d, but if I see a 1, I know I'm in a. And from c, d: if I see a 0, I know I'm in d, and if I see a 1, I know I'm in a. And we can encapsulate all the synchronizing words of a machine by doing this type of construction, and we can enumerate those synchronizing words again by looking at all paths through that structure. So our synchronizing words are 0, 0, 0, which takes us from not knowing what state we're in to knowing exactly what state we're in, then 0, 0, 1, then 0, 1, and then 1. Those are the four paths that take us from this node of not knowing what state we're in down to knowing exactly what state we're in, if we see any of those four words. And the longest such word, the longest path from this start node down to these singleton nodes, is the Markov order. So how do we calculate that longest path? Well, we can do it by eye easily, but on a computer it can be more difficult. So we have to utilize an algorithm that two guys, one named Bellman and another named Ford, came up with back in the 1950s. What it does is calculate the shortest path, or the minimum-weight path, from one node to another. Minimum weight meaning: if you had some kind of transportation network with costs on each of the edges, like how much it costs to transport from one city to another, and you want to know the cheapest way to get from any city to any other city, this algorithm would tell you that cheapest way. At first it would seem like the wrong algorithm to use here, because we want the longest path, not the shortest one. But what we can do is put weights of negative one on all the transient edges, the ones that take us from up here down toward these recurrent states, and weights of zero on all of the recurrent ones; then minus the weight of the minimum-weight path becomes the length of the longest path. And so at this point we can use Bellman-Ford, or another algorithm called Floyd-Warshall, named after the two people who invented it, and they will spit out exactly what the length of that longest path is. And the nice thing about them is, well, we'll see in just a moment, actually. So we want the minimum-weight path from a, b, c, d down to a, or b, or c, or d. And notice it doesn't matter which of these states we pick: the minimum-weight path from a, b, c, d to c is equivalent to the one to b, or a, or d, because moving between these recurrent states costs zero.
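Putting the last two ideas into code, here is a sketch under the same assumptions as before, reusing delta and successors from the earlier sketch; it is an illustration, not the lecture's implementation. power_automaton is the subset construction just described, and order_via_bellman_ford applies the minus-one/zero weighting trick, including the negative-cycle check that will matter in a moment.

```python
from collections import deque

def power_automaton(delta, alphabet=(0, 1)):
    """Subset construction starting from total uncertainty.  Nodes are sets
    of machine states; returns a nested dict {node: {symbol: node}}."""
    start = frozenset(delta)
    edges, queue = {}, deque([start])
    while queue:
        S = queue.popleft()
        if S in edges:
            continue                       # already expanded
        edges[S] = {}
        for x in alphabet:
            nxt = successors(delta, S, x)
            if nxt:                        # skip symbols S cannot emit
                edges[S][x] = nxt
                queue.append(nxt)
    return edges

def order_via_bellman_ford(pa):
    """Weight edges leaving non-singleton (transient) nodes -1, others 0;
    minus the minimum path weight from the start node to any singleton is
    the longest synchronizing path, i.e. the Markov order.  If an extra
    relaxation pass still improves a distance, a negative-weight cycle is
    reachable and the order is infinite."""
    nodes = list(pa)
    start = max(nodes, key=len)            # the full state set
    weighted = [(u, v, -1 if len(u) > 1 else 0)
                for u in pa for v in pa[u].values()]
    dist = {v: float('inf') for v in nodes}
    dist[start] = 0
    for _ in range(len(nodes) - 1):        # standard |V|-1 Bellman-Ford rounds
        for u, v, w in weighted:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    if any(dist[u] + w < dist[v] for u, v, w in weighted):
        return float('inf')                # negative cycle: infinite order
    return -min(d for v, d in dist.items() if len(v) == 1)

pa = power_automaton(delta)
print(order_via_bellman_ford(pa))          # -> 3 for the four-state example
```

And indeed it doesn't matter which singleton the minimum lands on, since edges among the recurrent singletons cost zero.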
So from an algorithmic point of view, we can just pick this start node and whichever state we called a here, and that will still tell you the minimum-weight path, equivalently the longest one. If I had chosen d instead, the minimum-weight path would be minus one, minus one, minus one, and then a zero to get to a, or this route. Or if I had chosen b, for example: minus one, minus one, minus one, and then two zero-weight edges to get down here. So it doesn't matter which of these states you choose; the algorithm will work. Now for a few examples of what these types of structures look like for different processes, and then we can see some cases where the order is infinite. Here's one example. I've colored red the transient nodes and transient edges, and the recurrent portion is in black here. We can see the Markov order for this machine is three: one, two, three, or one, two, three. However, for this machine here, the Markov order is infinite, because the longest path from here down to one of these states is infinite in length: I can take this loop as many times as I want. And if we look down here, that loop is equivalent to going back and forth between these two states upon seeing a zero. We won't know whether we're in a or b if we just see a bunch of zeros. The first zero tells us we're not in c, because c would go to b; but if we had been in b, we would have gone to a, and if we were in a, we would have gone to b, and after that point, all zeros just cycle us between a and b, never actually synchronizing us to the machine. And so we never know exactly what state we're in if we only see zeros, similar to the even process, where if all we see are ones, we don't know whether we're in state A or B. So this is a case where we have infinite Markov order. And the nice thing about Bellman-Ford and Floyd-Warshall is that they will detect these negative-weight cycles. So when we weight these all with negative ones, they can tell you that the minimum-weight path is negative infinity, whereas a lot of shortest-path algorithms, like Dijkstra's, don't work when there are negative-weight cycles. All right, so the last example is what we call the Nemo process, and it also has infinite Markov order because of the loop right here. OK, so the Markov order algorithm is very simple. You just build that automaton, which we call the power automaton. It's also the structure that's built when you do what's called the subset construction, which is the standard way to determinize a nondeterministic finite automaton, and it's directly analogous to the mixed-state algorithm, which unifilarizes a hidden Markov model. But the cryptic order algorithm is much more complicated. We start with the power automaton like before, but we're going to modify it in a number of steps, and we're going to have five different statuses for the edges in this machine. So in this power automaton, we're going to color the edges based on their status. An edge can be in a state where we still have to check it (and I'll explain what checking means), a state where it's good, a state where we're adding the edge to the machine, a state where we're removing the edge from the machine, or a state where we're currently checking it. So as we step through here, we'll see the edges colored different colors. The first thing we're going to do is set all the recurrent edges to good; we don't need to check them, they're fine as is. But now let's consider this edge. Let's check this edge. Let's think about what this edge tells us.
It tells us that if we could have been in state c or d, and then we observed a 1, we ended up in state a. But let's look at state a. If I saw a 1 and ended up in state a, I had to have come from either d or b; I couldn't have come from c. So in some sense, this edge is lying. It's lying because it was built going forward only, without any regard for where we ended up. But if we know we ended up in a, we know we came from d; we couldn't have come from c. Does that make sense? So what we're going to do is take this edge out of the machine, because it's not telling the truth when we look backwards. We take it out, and this edge is actually that edge: because we know we came from d and went to a, that's the edge we were actually following. And to maintain that we don't know how we got to this state, that we don't know how we got to d (well, here we do, because there's only one path, but if there were multiple paths the answer could be different), to maintain the fact that we don't immediately know how we got to d, we have to take the edge that went into c, d and redirect it down here. So that means the path was a, b, c, d, b, c, d, d, a rather than a, b, c, d, b, c, d, c, d, a. All we're doing is getting rid of the c while trying to maintain as much of that path as possible. So we cut out this edge, we remove it, but we add this one, which is a clone of it. So now our machine looks like this. So let's check another edge, this one. Well, if I got to d on a 0, I had to have come from c; I couldn't have come from d. So we do exactly the same thing: we remove this edge, because it was actually this one, and this edge becomes that edge. Now we have a node and an edge that can't make it to the recurrent portion, so we can just remove them. So let's keep going. There's this edge to check. It says I was in b, c, or d, saw a 0, and ended up in c. I had to have actually come from b. So once again we remove this, but this edge becomes that edge; or rather, we clone it. OK, let's check this edge now. Well, I had to have actually come from a, so we can take that out. And this time there are no edges that get us to this node, so we don't need to maintain provenance and make a copy of any edges; we can just remove it completely. So then this edge, from b, c, or d into here: we know it came from c. So we remove that edge and add in this green one. Let's check this edge. It says I was here, saw a 0, and ended up here; I had to have actually come from b, so we can remove that. Now this edge, and this is where it starts to get trickier. I was in b, c, or d, saw a 1, and ended up in a. Well, I actually came from either b or from d, but not from c. So what I need to do is create a new node to add to this graph: I actually came from b or d and went down here to a. So this edge is actually that edge, and I make a copy going here. And since b, d is a new node in the graph, I need to consider what it would do on a 0 also. Well, b or d on a 0 goes to c. So I end up with this, and again I can remove the dangling node from the graph. So now we can check this edge here. Well, we know it's actually this one, so it would go here. But if you remember, we already had the edge from a, b, c, d to b on a 0; we already checked and determined that that edge should not go in the machine, so we don't add it again. And now this edge we have verified is good, so we color it blue.
Because we've already checked that b, d to a on a 1 is valid, looking backwards. Continuing on: a, b, c, d to a on a 1. I had to have actually come from b or d, which we already have; that's this edge right here. So we remove it. And then a, b, c, d to b, d on a 0. Well, b on a 0 means I had to come from a, and d on a 0 means I had to come from c. So we need to add a new node again, a, c, with a 0 edge going to b, d. And since we just added that, it's good. And we can delete that old node. And we finally have a structure where all the edges are good. Complicated, yes.
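The core test behind all of this edge surgery, asking whether an edge 'lies' when read backwards, is easy to state in code, even though the full algorithm (cloning, re-routing incoming edges, deleting dangling nodes) is what makes it intricate. A sketch of just that check, with the same delta as before:

```python
def backward_consistent_sources(S, x, T, delta):
    """Of the candidate sources S, keep only the states that could actually
    have emitted x and landed in the target set T.  If the result differs
    from S, the edge S --x--> T 'lies' when read backwards and must be
    replaced by an edge from this smaller set (plus the re-routing steps
    described above)."""
    return frozenset(s for s in S
                     if delta[s][x] is not None and delta[s][x] in T)

# The tricky step in the walkthrough: checking {b,c,d} --1--> {a}.  Only b
# and d can emit a 1 and land in a, so the honest source node is {b,d}.
print(backward_consistent_sources(frozenset('bcd'), 1, frozenset('a'), delta))
```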
[Question: what's the order that you process the edges in?] You can choose any order you want. In CMPy, we actually process the edges connected to the recurrent component first, because that ends up leaving dangling nodes more quickly, so we can remove them and do less processing overall. If you start with the edges connected to the start node, you might have to process a lot more to end up with the same structure in the end. So we maintain a priority queue, prioritized first by whether the edge is connected to the recurrent section, and then by the number of states in the node sets; edges from a, b, c, d to a are checked after ones from just b, d to a, for example. [So you do a topological sort, sometimes?] Yes: we sort first by the size of the 'to' node, smallest first, then the 'from' node as a tie-breaker. So then we do the same thing with Bellman-Ford. We weight the transient edges with a negative 1 and the recurrent ones with a 0, find the minimum-weight path, and that's our cryptic order, just as it was the Markov order before. So here are some other examples, and you'll notice something here: there are two separate transient components. The algorithm can create that (it didn't happen in this past example, but it can in general), in which case you need to consider the longest path from each of these start nodes. We had that a, b, c, d node, and it ended up getting split in making that a, c node; sometimes when you do that, there are other outgoing connections from that node that also need their own copies. So both of these were split from an a, b, c, d node, one on a 1 and one on a 0. And in this case, the cryptic order is 2, because we have a length-2 path right here. This is that same machine we saw earlier that had a Markov order of 3. This one, however, is the machine that had an infinite Markov order, and we see it now has a finite cryptic order of 2. And that's because eventually, after we see a lot of zeros, we might see a 1. Once we see that 1, we know we were in state b; and before we were in state b, we were in state a, and before a, we were in b, going back and forth and back and forth. But we might have seen a 0, 1 to start with, and we don't know whether it was this 0, 1 or that 0, 1, and so we end up with a cryptic order of 2. We do know that the longest ambiguous words we can see are a bunch of zeros. If we eventually see a 1, we know we were here, and we can easily backtrack that 1 and a bunch of the zeros. But if we had started and we saw 1, 0, and so on, this part can't be retrodicted, as we call it. All the zeros we can retrodict, because they're just 0, 0, 0, 0; but we can't retrodict this 0, which is this edge, and we might have seen a 1 before that, which we can't retrodict either, even though it would have been a finite synchronizing word. If the word starts with a 0, or with a 1, 0, we know that... let me go back quite a bit; this is why there are a hundred-something slides. Oh, wrong one. 1, 0 is this path right here. So 1, 0 synchronizes us easily, but we can't back out what path we took to get here. We know we were in a or c prior to the b, and prior to a, c we could have been in c or in b. So we end up with, what was it, b, c to a, c to b. So even though this branch of the machine can be perfectly retrodicted, in that if I came here on a 1, I know I was here, and then some arbitrary number of zeros will take me back, this path cannot be retrodicted. So if we jump forward again through all of these to our last example, which I'll do on the board: when we originally saw it for the Markov order, the power automaton was a, b, c with a self-loop on a 0 and then a 1 going down here. So we went from a very small transient structure, just a single node and two edges, to a larger one with three nodes and four edges. So let me work that out really quick. [board work] Can everyone see this? OK. This is the structure we had before, and this is the recurrent portion. Then, if we do the power-set construction: if I'm in a, b, or c and I observe a 0, I just permute the states, a goes here, b goes here, and c goes here, so we self-loop. And on a 1, we have to be in state a, coming from either c or from a itself. So this is the structure we had before. But once we run the cryptic order algorithm: well, if I was in a and I came there on a 1, I had to have come from either a or c. So we make an a, c node on a 1. And the only way I could have gotten into this node on a 0 is from itself, so we clone this edge and loop like that; and this edge is not necessary, because that one is the actual edge. Now let's consider this edge. If I was in a, c and I got there on a 0, I had to have come from b or c. So far so good. If I'm in b, c and I got there on a 0, I had to have come from a or b. And if I'm in a, b and I got there on a 0, I had to have come from a or c; and we already considered the edge from a, b, c to a, c, so we don't add that back in. Now this is a disconnected component, we get rid of it entirely, and we're left with what's on the screen there. So we went from a very small transient structure to a larger one when calculating the cryptic order. But as you can see, it's also infinite, because we can cycle through these three nodes as much as we want. So that's how the cryptic order is calculated. You'll have some homework problems where we already have the code that will generate this graph for you; you will then run that code to see the structure that results from a number of processes, and you'll find the longest path yourself. John? [So this is an example of finite cryptic but infinite Markov?] This one is infinite cryptic, infinite Markov. The finite one is this one: that one was infinite Markov, finite cryptic. [Question: what about the future consisting of all zeros? Wouldn't you never know which state you were in, given that future?] Ah, yes, but that future occurs with probability 0, so it is irrelevant. So those are three examples.
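As a sanity check on the sketches above, we can run them on this board example. The topology below is my reading of the description (on a 0 the states permute, a to b to c to a, and a 1 can only land in a, coming from c or from a itself); treat it as illustrative.

```python
# Assumed topology of the board example, read off the lecture's description.
nemo = {'a': {0: 'b', 1: 'a'},
        'b': {0: 'c', 1: None},
        'c': {0: 'a', 1: 'a'}}

pa_nemo = power_automaton(nemo)
# The start node {a,b,c} self-loops on 0, a transient negative-weight cycle,
# so Bellman-Ford reports an infinite Markov order.
print(order_via_bellman_ford(pa_nemo))   # -> inf
```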
So where are we now? We need to know the structure of the epsilon machine, but not the transition probabilities. We don't need any information-theoretic properties of the time series: we don't need to know the excess entropy, we don't need to know the entropy rate, anything like that. It's integer-based, in that we only have to worry about sets and transitions, not probabilities, with no floating-point values anywhere. And we encapsulate an infinite number of synchronizing words into a finite structure that has loops, so we don't have to worry about infinitely sized sets either. So we've gone from something that required exact floating-point values, then comparisons of floating-point values, and potentially had us grinding the computer for an infinite amount of time, to something where we don't need to know those floating-point values, we don't need to compare floating-point values, and we don't have to grind for an infinite amount of time, because we ended up with finite structures with a finite number of edges to check. In fact, the number of edges in the power automaton is bounded exponentially: at most the alphabet size times 2 to the number of original states, though in practice the total is much less. For the cryptic order, it probably gets closer to that limit of the alphabet size times 2 to the number of states, because the nodes in the transient structure are elements of the power set of the original states, and that has size 2 to the n, and there can be as many as one edge per symbol going outward, hence the factor of the alphabet size. So lastly, we use Bellman-Ford or Floyd-Warshall to calculate the longest path, and those detect cycles efficiently. So if we have an infinite Markov order or cryptic order, these algorithms will tell us, whereas doing so by hand is relatively difficult. So the last thing I did was to look at all possible six-state epsilon machines with a binary alphabet, calculate their Markov and cryptic orders using these algorithms, and plot each pair. For example, if we had a machine with Markov order 4 and cryptic order 4, we would put a dot here, and the size of each dot is proportional to the number of machines with that pair of values. So the first thing to notice is that almost all the machines are infinite Markov, infinite cryptic, at least by the time we get to six-state machines. And there are a number of other interesting properties. We notice first off that there are, 12 million, is that right? or 1.2 million? 1.2 million machines. If you want to know the exact number, you can look at Ben Johnson's paper on enumerating epsilon machines, or enumerating finitary processes. [Are your dot sizes logarithmic?] No, they are linear: the area is linear in the number of machines, so the radius goes as the square root. When you get to higher state counts, you have to go logarithmic, because otherwise this dot covers everything else, but here I was able to find a linear scaling at which this dot didn't cover any others. And I'll get to that interesting bit in just a moment. So, a number of features here. First off, within the finite-Markov-order area, these dots here, we notice the density tends to lie where the Markov order equals the cryptic order, along this line here, and it falls off away from that, with fewer machines at large but finite Markov orders. Second, we do notice that even when the Markov order is infinite, the cryptic order is finite in a large number of cases. These aren't very small dots; they're the biggest ones other than this one.
Furthermore, there are areas, for example right here, here, and here, where there are no machines at all. So although there is a Markov-13, cryptic-3 machine, there's no Markov-13, cryptic-4 or cryptic-5 machine, but there is a Markov-13, cryptic-6 one. There are gaps in this picture at fixed machine size. We can, of course, find machines that have those Markov and cryptic orders, but within the class of machines of size 6 with a binary alphabet, those machines don't exist; those processes don't exist. Next, the things I find probably most interesting are two related facts. First off, there is a largest finite Markov order, 13, for six-state machines. And that means, because of what Greg pointed out earlier, which is relatively easy to show, that the cryptic order is less than or equal to the Markov order, the largest finite cryptic order is also 13. But if we consider the processes that have infinite Markov order but finite cryptic order, the largest finite cryptic order is 11. There is no infinite-Markov, cryptic-12 machine, even though there are Markov-13, cryptic-12 machines. And this dot is not covering up machines here; there are none with those properties, which is very confusing. So if anyone wants to work on that, there's another project problem. Also, why these gaps exist is unknown. This trend seems to hold for larger machine sizes as well. Almost all, I think 96%, of the six-state machines with binary alphabet are infinite Markov and infinite cryptic, and 98% are infinite Markov overall, so these other dots here account for only 2% of the machines. I think that's actually about it; we're a little early. So, questions?