So this is joint work with Hugo Bazille, Éric Fabre and Blaise Genest, all three from Rennes. Hugo was supposed to come and present the talk here, but he ditched, which means that I get to present it. So I made him give me his thesis slides; Hugo is a PhD student who is just graduating, and this talk is based on his slides. As promised to Benjamin, I have about 15 minutes for the main part, to give you an idea of what the problem is, why it is interesting, why you should care about it, and so on, and in the remaining time I will go into the details. At that point you can stop me whenever you want, all right? So the outline of the talk will broadly be this: I'd like to start with the model, then look at the problem we are considering, our results, and our approaches, and towards the end I'll discuss a little bit of related work, as well as extensions to some more applied settings coming from security. The basic motivation of our model comes from imperfect information: the exact state of an underlying system may not be known, and this could be for various reasons. You may want to hide the state, maybe there is a lot of noise, maybe the sensor only observes at one level, and so on, and there are many examples of such systems. As I told you, these examples are all Rennes-specific. So this is an automatic driverless vehicle which runs around the campus there, and we tried putting our feet in front of it to see whether it would stop, and it did. So somewhere it is doing something. You can have a lot of examples of imperfect information, and the question is: looking only at the observable information coming out of such a system, what can we recover from it? This is a classical kind of problem, and we want to look at it in one specific setting. The generic statement would be: if I am given an observation, can I figure out which system produced it?
So for instance, take a randomly generated sequence. I think all of you know about this sequence, and the question is: which stochastic system produced it? On the right is a deep neural net which generates statements randomly, and on the left is another stochastic process, and we don't know which one produced it; our goal is, for instance, to figure out who could have said this. So with that as the motivation, I think we can start with the real talk. This is going to be the most fun you'll get. More formally, one classical model that people have been looking at is hidden Markov models, where you consider a Markov chain, which is a bunch of states with probabilities over the transitions, and in hidden Markov models there is typically a signal emitted at every state, and very often it is assumed that the signal is stochastic. So there is a probability with which each signal is emitted, and a probability with which each transition is taken, and as always, from any state the outgoing transition probabilities sum to one; classical, right? But we will very soon look at an equivalent formal model, a labeled Markov chain, where we abstract away and consider the labels to be on the transitions rather than on the states. Is that fine? It's a slight modification, and the reason is that in, let's say, the automata theory community, a lot of people have looked at this model, whereas the first one comes more from the motivation of the AI/ML community. So in some sense, one of the goals of this work, as I will say a bit later, is also to bridge the gap between the ways the problem is looked at in the two communities. As written here, the first is often used in automatic control, the second is often used in formal methods, and one can easily see that expressively they are about the same.
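To make the model concrete, here is a minimal sketch of a labeled Markov chain in Python. This is my own illustration, not from the talk: the class name, the representation (one matrix per label), and the two single-state example chains (with the 1/10-vs-9/10 emission probabilities that come up later in the talk) are all assumptions of mine.

```python
import random

# A labeled Markov chain: one matrix per label, so that summing the
# matrices over all labels gives a stochastic matrix.
class LMC:
    def __init__(self, matrices, init):
        self.matrices = matrices  # {label: n x n list of lists}
        self.init = init          # initial distribution over states

    def sample(self, length, rng=random):
        """Sample a run and return the emitted label sequence."""
        n = len(self.init)
        state = rng.choices(range(n), weights=self.init)[0]
        word = []
        for _ in range(length):
            # pick (label, next state) with probability matrices[label][state][next]
            moves = [(lab, t) for lab, M in self.matrices.items()
                     for t in range(n) if M[state][t] > 0]
            weights = [self.matrices[lab][state][t] for lab, t in moves]
            lab, state = rng.choices(moves, weights=weights)[0]
            word.append(lab)
        return "".join(word)

# two single-state chains over {a, b}: A1 emits a with probability 1/10,
# A2 with probability 9/10
A1 = LMC({"a": [[0.1]], "b": [[0.9]]}, [1.0])
A2 = LMC({"a": [[0.9]], "b": [[0.1]]}, [1.0])
```

Sampling from `A1` produces mostly b's and from `A2` mostly a's, which is exactly the statistical difference the talk exploits later.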
You can always, for instance, add one more step and say that if this label is emitted with probability one, I can break the transition and write that probability on the state. I won't go into the exact translation, but it is not very difficult to see: all we are saying is that we can encode it with a partial step in between. So then the question is really this: suppose I tell you that there is a bunch of systems, A1 to Ak. In particular, we will start with two systems, because we can always do pairwise comparisons. I give you two systems and an observation which has been produced. What is an observation? It is the sequence of labels which have been emitted; since we made the assumption that the labels are letters, it is a word over this alphabet. Is that fine? All right, so given an observation, I want you to tell me which of the systems produced it. Is the setting clear? So what we have is a bunch of systems, and for my talk I will keep just two, but one can always extend it. So I have two hidden Markov models, but again I'm going to look only at labeled Markov chains, and I'm starting here, and one of the two has been run. So what is known to the observer? We know the systems, so we know their structure. What we don't know is which choice was made; that's important. So for instance, suppose I see the sequence B, B, B, B, A. What would you conclude? Of course it belongs to A1, because only A1 can emit an A; A2 has no A. But in this example, can you always do that? No: there is a word for which you cannot figure out who produced it. If I just generate a bunch of Bs, then I don't know who produced it. Is this fine? All right. So one can think of different ways of defining the classification problem, and we will look at some variants.
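As a rough sketch of the B, B, B, B, A reasoning (the matrices here are hypothetical stand-ins for the systems on the slide, which I don't have), one can check which chains assign nonzero probability to an observation by a simple forward pass:

```python
def possible_sources(word, lmcs):
    """Return the (1-based) indices of the chains that assign
    nonzero probability to the observed word."""
    def prob(lmc, w):
        matrices, dist = lmc
        n = len(dist)
        for lab in w:  # forward pass: dist <- dist . M[lab]
            M = matrices[lab]
            dist = [sum(dist[s] * M[s][t] for s in range(n)) for t in range(n)]
        return sum(dist)
    return [i + 1 for i, lmc in enumerate(lmcs) if prob(lmc, word) > 0]

# stand-in example: A1 can emit both letters, A2 can only ever emit b
A1 = ({"a": [[0.5]], "b": [[0.5]]}, [1.0])
A2 = ({"a": [[0.0]], "b": [[1.0]]}, [1.0])

only_a1 = possible_sources("bbbba", [A1, A2])  # only A1 can produce the a
both = possible_sources("bbbb", [A1, A2])      # a run of b's stays ambiguous
```

This is exactly the "sure" situation from the talk: one observation pins down the source, while another is compatible with both systems.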
Again, there is a lot of literature on many different variants, and depending on which community you belong to, you may have come across them under different names. One or two of them I will introduce here, but please look at the paper for more references. So, given two labeled Markov chains with languages of observations L1 and L2, one question could be: for every infinite word, does there exist a finite prefix from which I can say whether it belongs to this system and not that one? For every word I want to be able to do this. Is that clear? So this example is not sure-classifiable, because there exists a word for which I cannot figure this out. Sure means I really want to be certain: no matter what comes, I want to be able to distinguish it, and that is not true here. But now look at the probability measure of the set of words for which we cannot decide. In this case, it's a set of probability zero, because as an infinite word it is a single word. So one can immediately define almost-sure classification, where you ask instead: does the set of words that can be classified have measure one? Is this fine? Okay, so these are two questions, but actually we might be interested in something even more. For instance, suppose I take the same language AB on both sides, but on the left-hand side I emit A with probability 1/10 and B with probability 9/10, and on the right-hand side I emit A with probability 9/10 and B with probability 1/10. For this one, what do I do? If I gave you this, what would your best guess be? The languages are the same, right? What would you guess? Well, you would try it, right?
You would sample it: you would look at a long enough run, and if you see a lot of A's, you would conclude it must be from the second system, and if you see a lot of B's, you would conclude it must be from the first, and with high probability you would be right. So that is what we want to capture: this notion of limit-sure classification, where we attach a confidence with which we classify. Is this fine? Okay, so when I use the term classifier, a classifier is nothing but a function that, for any finite word, says whether it belongs to 1 or 2. Indeed, a classifier can make an error; it's not perfect, and it need not be a sure classifier. So the question is: when does a classifier exist, and when it exists, how do we build it? And when I say classifier, it could be one satisfying the first definition, the second definition, or the third definition. It turns out that this problem, unsurprisingly, has received a lot of attention. The sure-classification problem was solved long ago, where it was shown to be decidable in logarithmic space. Essentially, the idea is that you take a product of A1 and A2, synchronizing on the actions, and you check whether there is a reachable loop. Similarly, the problem of almost-sure classification is more recent, but it has been looked at in the language of diagnosis, for those of you who know it; again, I won't go into the details, but there is an easy enough proof which shows that it is PSPACE-complete. So the goal of this work was to look at this other notion of limit-sure classification, and to come up with algorithms and characterizations for it.
So let's define it formally, and while defining it, I'm going to slightly change things. I will say that two labeled Markov chains are limit-sure classifiable if there exists a classifier (remember, a classifier is just a function from Σ* to {1, 2}) such that, if I see a run ρ of one of the processes (and obs(ρ) is just the sequence of labels emitted by ρ), then the probability that I make a mistake, that is, that the classifier says the observation belongs to the other process when it was produced by process 1, goes to 0 as the length of the run goes to infinity. So what I want to say is that as we gather more and more statistics on the observation, we should be able to give the correct answer in the limit. That's what this says. Is that fine? So again, this is one way to define it: the probability of misclassification, of making an error, goes to 0 as the length of the run goes to infinity. Okay, good. More examples. This example has in some sense been constructed to satisfy the property: if the proportion of A's is greater than half, that is, if I see A more than half the time, I report one, and otherwise I report two, and that is my classifier. This classifier has a very common name, at least in the AI and control communities: it is the maximum a posteriori (MAP) classifier, which just says that you compare the posterior probabilities, and if the probability of this one is greater than that one, you return this one. And indeed, what we show is that if there exists a limit-sure classifier at all, then this MAP classifier, which is the classical one, is a limit-sure classifier. Again, for the formal proof I'll let you read the paper; it is not a very difficult thing to prove, but you can see that this is essentially what we are doing.
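A minimal sketch of the MAP classifier for two labeled Markov chains, assuming a chain is given as `(matrices, init)` with one matrix per label (this representation and all names are my own, not the paper's): compare the forward probabilities of the observed word under the two chains and return the more likely source.

```python
def word_probability(matrices, init, word):
    """Forward pass: P(word) = init . M[w1] . ... . M[wk] . 1."""
    dist = list(init)
    n = len(init)
    for lab in word:
        M = matrices[lab]
        dist = [sum(dist[s] * M[s][t] for s in range(n)) for t in range(n)]
    return sum(dist)

def map_classify(word, lmc1, lmc2):
    """Return 1 or 2 depending on which chain makes the word more
    likely (ties broken towards 1)."""
    p1 = word_probability(lmc1[0], lmc1[1], word)
    p2 = word_probability(lmc2[0], lmc2[1], word)
    return 1 if p1 >= p2 else 2

# the 1/10-vs-9/10 example: A1 mostly emits b, A2 mostly emits a
A1 = ({"a": [[0.1]], "b": [[0.9]]}, [1.0])
A2 = ({"a": [[0.9]], "b": [[0.1]]}, [1.0])
```

On a long run dominated by b's this returns 1 and on a run dominated by a's it returns 2, which is exactly the frequency-counting classifier described in the talk.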
So if you know that limit-sure classification is possible, then there is a very simple type of classifier which gives you the answer. That's what we are saying. Now, some observations. First, as we already saw, even though the non-stochastic languages are the same, the chains can still be classified by looking at the statistics, and in this the notion differs from sure and almost-sure classification. On the other hand, if for every observation the probability of producing that observation is the same in both chains, then of course you cannot classify; this is something very trivial. That is, if for all words w in Σ*, the probability of seeing w (I forgot to write it, but this is from a given initial distribution) is equal in the two chains, then the two languages are said to be equivalent. And indeed, if two languages are probabilistically equivalent, then I cannot classify them with a limit-sure classifier. The question is whether the converse is true, and in general it is not. Now, it so turns out, and this is again quite classical and was known a long time ago, that checking equivalence between the languages of two LMCs (actually the original result is stated for probabilistic finite automata) can be done in polynomial time. This is again not very difficult, but I won't go into it; it is not our result. This is the base case: we start from here and ask what else we can do. So I come to the key slide. The results we give in this paper are a new characterization of limit-sure classification, and a polynomial-time algorithm to check it.
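The polynomial-time equivalence check alluded to here can be sketched along the lines of Tzeng's span-based algorithm for probabilistic automata. This is my own simplified reconstruction under my own representation (one matrix per label), not the paper's algorithm: starting from the vector init1 ⊕ (-init2), close its span under the per-letter matrices; the chains give every word the same probability exactly when every vector in that span sums to zero.

```python
from fractions import Fraction as F

def equivalent(lmc1, lmc2, labels):
    """Decide whether two labeled Markov chains assign the same
    probability to every finite word (exact rational arithmetic)."""
    (mats1, init1), (mats2, init2) = lmc1, lmc2
    n1, n2 = len(init1), len(init2)

    def step(v, lab):
        # v . (M1[lab] (+) M2[lab]), block-diagonal product
        M1, M2 = mats1[lab], mats2[lab]
        out = [F(0)] * (n1 + n2)
        for t in range(n1):
            out[t] = sum(v[s] * M1[s][t] for s in range(n1))
        for t in range(n2):
            out[n1 + t] = sum(v[n1 + s] * M2[s][t] for s in range(n2))
        return out

    def reduce(v, basis):
        # forward elimination of v against the (pivot, vector) basis
        v = list(v)
        for pivot, b in basis:
            if v[pivot]:
                c = v[pivot] / b[pivot]
                v = [x - c * y for x, y in zip(v, b)]
        return v

    basis = []
    work = [[F(x) for x in init1] + [-F(x) for x in init2]]
    while work:
        v = reduce(work.pop(), basis)
        piv = next((i for i, x in enumerate(v) if x), None)
        if piv is not None:
            if sum(v) != 0:      # some word gets different probabilities
                return False
            basis.append((piv, v))
            work.extend(step(v, lab) for lab in labels)
    return True
```

The basis can grow to at most n1 + n2 vectors, so the loop runs polynomially many times; `fractions` keeps the check exact, avoiding the float tolerances a numerical version would need.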
And to do this, we actually develop a new notion of stationary distributions over labeled Markov chains. What is classically known is stationary distributions over Markov chains; we have to lift this notion, in what we believe is the right way, to labeled Markov chains, and that is what we use as our primitive. In doing so, we are also able to compare with other notions of opacity and distinguishability which have been studied in other communities. And there we discovered that in 2016 this notion of distinguishability was defined by Stefan Kiefer and, I think, A. Prasad Sistla, and they already gave a polynomial-time algorithm for it. So it turns out that our notion of classification for labeled Markov chains coincides with their notion of distinguishability. It is not completely trivial to see, but you can prove both directions. As a result, we could have just said it is the same as that and stopped there. But nevertheless we wrote this paper, and why was that? First, we had already done the work, and second, it was also the case (sorry, it's not written here) that our technique differs quite a lot from the Kiefer and Sistla technique. Kiefer and Sistla define a metric between probabilistic systems and prove that when that metric equals one, the systems are indistinguishable, whereas our proof does not go through such metric arguments; we use what I would like to call a more first-principles approach, but that's up to you, it depends on what your first principles are. So that's one reason to look at it. There are other reasons also.
In doing so, we also obtained a slightly more efficient algorithm: of course both are polynomial time, because you can encode the problem as a linear program, but the number of variables we require is slightly fewer than Kiefer–Sistla in the worst case. The other reason is this: if you think about it from a security perspective, that is, in a security domain, what would you want to ask? You may want to ask: is this really how an attacker would try to classify? What would an attacker do? If they find that after several attempts they are not able to classify, they would rather take a reset. They would say: okay, forget it, I'm going to start again. If you allow this, you get what we call attack classification, where we extend the model with the ability of the strategy to do resets. Indeed, after a reset the process can decide to run not the first model but the second one; but as long as resets are allowed and only finitely many attacks are required in the limit, we are able to show that this problem, as we have defined it, can still be solved in PSPACE, and in fact it is PSPACE-complete. This shows a big difference from the notion of indistinguishability of Kiefer et al., where the corresponding problem becomes undecidable. So that's, since I won't have time, I've already told you what is on my last slide, so we don't need to go there. All right, in the remaining five minutes I'm going to go into the details, but at this point, if you have any questions, please ask. Okay, great. So how did we start? Well, we started (we didn't actually start this way, but it is probably the easiest way to start) by looking at what happens when you are at a stationary distribution of the underlying Markov chain.
So maybe I will skip this slide and go directly to the second one. Basically, if I take two labeled Markov chains, and if I know that they are ergodic, and if I know that the initial probability mass on all states is strictly positive, then it turns out that the trivial direction we started with actually becomes a two-way implication. Let me explain with an example. The result is, and it turns out this was already known under another definition called opacity, but in this restricted case: when the chains are ergodic, and when the support of the initial distribution is the entire state space, you can show that they are not classifiable if and only if they are stochastically equivalent. So it is true in both directions, and the idea is to look at this example. These are two different LMCs; by drawing two start states here, I just mean that both states have initial probability greater than zero, not that there are two starting states. And if you notice, once I reach the stationary distribution, the observations that I see from it are the same; they cannot be used to distinguish the two chains. So if I assume that I start from the stationary distribution, then stochastic equivalence actually coincides with the limit-sure notion that we are looking at. That was where we started. But you can see that the assumption that all initial states have non-zero probability was crucial. Because suppose I take these two systems: they are exactly the same system, the only difference being that the start state is here in one and there in the other. What is the stationary distribution? Half and half; the stationary distribution is the same for both.
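The half-and-half computation can be sketched numerically; the two-state cycle below is my stand-in for the example on the slide, and since such a chain is periodic, a plain power iteration would oscillate, so the sketch uses Cesàro averaging of the iterates.

```python
def stationary(P, init, iters=10000):
    """Approximate the stationary distribution of the Markov chain P by
    Cesàro-averaging the distributions along a power iteration (plain
    power iteration would oscillate on a periodic chain like this one)."""
    n = len(P)
    dist = list(init)
    acc = [0.0] * n
    for _ in range(iters):
        acc = [a + d for a, d in zip(acc, dist)]
        dist = [sum(dist[s] * P[s][t] for s in range(n)) for t in range(n)]
    return [a / iters for a in acc]

# a two-state cycle, started in the left state in one copy and in the
# right state in the other
P = [[0.0, 1.0], [1.0, 0.0]]
from_left = stationary(P, [1.0, 0.0])
from_right = stationary(P, [0.0, 1.0])
# both averages converge to (1/2, 1/2), even though the very first
# observation distinguishes the two copies
```

This makes the talk's point concrete: the stationary distribution forgets the initial state, so it alone cannot witness classifiability.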
So from the stationary distribution you would conclude that the pair is not classifiable, but actually it is classifiable. Why? Because the first step already tells you: from here, no B is possible in the first step; from there, no A is possible. So indeed this pair is classifiable. In other words, the assumption that initially all states carry some probability mass was crucial. So to extend the result to the general LMC setting, we have to look at states which are not reachable with just one observation, and that is the motivation for defining stationary distributions over labeled Markov chains. How we come up with that is, I would say, one of the nice contributions of the paper. The definition is very elegant, and I would not be surprised if it can be found in some other guise somewhere else, but we searched a lot and did not find it. So, I have a lot more detail, but instead let me just say this: once we define such a notion of stationary distribution over LMCs, what we are able to show is that one cannot limit-surely classify two processes if there is some reachable belief state (a belief is a set of states of the product in which you may currently be) such that from there you see this probabilistic equivalence, and (stated semi-formally) the observations seen up to that point and the observations seen from that point on are the same for both. This is how we extend the result: remember that the earlier theorem had only the equivalence condition, without these two extra conditions, and was not stated over beliefs. We have to define the notion of beliefs, which are sets of states you can be in, then define this notion of stationary distributions, and we are able to show that from that stationary distribution the probabilistic equivalence holds. That is how we lift it to the general case.
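The beliefs used here are the usual posteriors over states in a partially observed Markov chain; as a rough sketch (the representation, function name, and example are mine, not the paper's), one observation updates the belief by a Bayesian step, and an impossible observation, like the "no B possible in the first step" case above, is detected as total probability zero.

```python
def update_belief(belief, matrices, label):
    """One step of belief tracking in a labeled Markov chain: after
    observing `label`, b'(t) is proportional to sum_s b(s) * M[label][s][t].
    Returns None if the label has probability zero under the belief."""
    M = matrices[label]
    n = len(belief)
    new = [sum(belief[s] * M[s][t] for s in range(n)) for t in range(n)]
    total = sum(new)
    return [x / total for x in new] if total > 0 else None

# two-state cycle: state 0 emits a and moves to 1, state 1 emits b and
# moves back (a stand-in example, not the chains on the slides)
cycle = {"a": [[0.0, 1.0], [0.0, 0.0]], "b": [[0.0, 0.0], [1.0, 0.0]]}
```

The support of the belief vector is exactly the "set of states you can be in" from the talk, which is what the characterization quantifies over.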
And that brings me to, yeah, I don't have too much time. As I promised, there is this link with distinguishability: if you look at Stefan Kiefer's distinguishability result, the polynomial-time algorithm there was proved in a LICS 2016 paper, and it itself used as a subroutine a CSL-LICS 2014 paper by Chen and Kiefer. So it is quite a complicated, nontrivial result, and I believe what we have is something as nice if not nicer. That brings me to my ending. I will not describe the security context; I'll just go to the conclusion, say that this is the kind of thing we have done, and open the floor to questions. If the pair is limit-sure distinguishable, the classifier is just the MAP classifier; it's quite easy. The MAP classifier is limit-sure if the pair is distinguishable; if it is not, then, yeah. An obvious idea for dealing with the problem of the initial states might be: maybe the chains are distinguishable if and only if the stationary distributions are the same and they have the same language? Something like that, but no, that is not sufficient; we have a counterexample in the paper. Maybe I'll explain it to you offline. Yeah, that's a good question. Basically what the classifier does is look at the frequency of the behavior: given a word, all you need to do is check whether it is more likely to have been produced here or there, and then you answer one or two. Again, we did not go beyond that; in practice, if you want to construct it efficiently, I don't know what the best way is, but I guess that is a well-studied problem. The reason we did not look into it is that we lost interest once we showed that the MAP classifier was already the best. Before that, we thought we could come up with a different, better classifier.
But then we showed that the MAP classifier is already a limit-sure classifier, and we kind of lost interest in showing anything more. But yeah, I don't know. So the classifier takes as input any finite word and tells you one or two based on that; the property, indeed, is defined in terms of the limit, but it's a probabilistic thing, right? Yes, the property is. Did we consider how many observations you would need before the classifier works? No, but that's a good question, and it would be a nice direction to take. If the question is how quickly you can classify, then maybe the MAP classifier is not the best classifier; that I haven't thought about. What is the likelihood ratio, or how do you estimate it? I don't know the answer to that. Thank you.