So now we have Ryoma Sin'ya from Akita University, who is going to be speaking about a quantitative approach to the primitive words conjecture. Okay, thank you for the introduction. Do you hear me? Yes. Is my voice clear? Okay. So hi, this is Ryoma from Akita University, Japan. I want to say thank you to all the organizers. I suppose that making the timetable of this workshop was not easy, because the time zones of the participants range all the way from Texas to Japan, but they made a perfect timetable, at least for me. Thank you. Okay, so this talk is divided into three main parts, and the first part is largely a survey. I first introduce the notion of primitivity and what the primitive words conjecture is. In this talk, we call a non-empty word w primitive if it cannot be represented as a power of any strictly shorter word. Hereafter we only consider the case where the alphabet A consists of just two letters, a and b; this is actually enough for resolving the general conjecture. And we simply write Q_A as Q, so Q denotes the set of all primitive words over the binary alphabet A. For example, consider the word ababa. It cannot be represented as a power of any shorter word, so it is a primitive word: it is in Q. But if we append the letter b on the right, we obtain the word ababab, which is the cube of ab, three repetitions of the shorter word ab, so it is not primitive. The conjecture is very simple: Q is not context-free. This has been open since 1991, so the conjecture will soon be thirty years old; it was originally posed by Dömösi, Horváth, and Ito sensei. First let me explain why primitivity is important. Primitive words are like the prime numbers among the natural numbers, because we have a unique factorization theorem for words in terms of primitive words: every non-empty word w can be uniquely represented as a power of a primitive word.
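The primitivity test just described is mechanical; here is a minimal sketch in Python (the function name `is_primitive` is mine, not from the talk):

```python
def is_primitive(w: str) -> bool:
    """A non-empty word is primitive iff it is not a power v**k (k >= 2)
    of a strictly shorter word v."""
    n = len(w)
    for d in range(1, n // 2 + 1):          # candidate period length
        if n % d == 0 and w[:d] * (n // d) == w:
            return False                    # w is a proper power
    return True
```

For the talk's example, `is_primitive("ababa")` holds, while `is_primitive("ababab")` does not, since ababab = (ab)³.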
So this gives us a good decomposition of the set of all words over A, denoted A*: any word can be uniquely represented as a power of a primitive word. So Q gives us a good decomposition of the set of all words, just like the prime numbers. And there are several different characterizations of primitivity. One can define primitivity using the notion of conjugation. Consider a word w of the form uv, and denote by u⁻¹wu the conjugate of the original word w by u. Here u⁻¹ represents removing the prefix u from w, and then we add u back as a suffix; we obtain the word vu, which is a conjugate of the original word uv. If u and v are both non-empty, this conjugate is called a proper conjugate. And here is another characterization of primitivity: w is primitive if and only if w is not equal to any proper conjugate of itself. If you think of conjugation as a partial morphism on words, then primitivity just means there is no non-trivial automorphism. So primitivity has a similar flavor to the notion called rigidity, which is important in model theory and finite model theory too. Because primitive words give us a good decomposition theorem, they play a central role in algebraic coding theory and in combinatorics on words. There is also a special subclass of primitive words, called the Lyndon words, which are useful in text compression. So this is a very rough explanation of why primitivity matters. Here is a picture of two people: the one on the left is Ito sensei, and the one on the right is Dömösi sensei. They recently published a monograph which collects numerous results on primitive words and context-free languages, and I will sometimes refer to this monograph. Actually, my title page is inspired by this book; yes, I mimicked it. And here is a third person, Szilárd Fazekas, from Akita University.
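The conjugacy characterization can be checked exhaustively for small lengths: a proper conjugate of w = uv is just a non-trivial rotation vu. A sketch (function names mine), verifying the equivalence on all binary words up to length 10:

```python
from itertools import product

def is_primitive(w):
    n = len(w)
    return not any(n % d == 0 and w[:d] * (n // d) == w
                   for d in range(1, n // 2 + 1))

def equals_some_proper_conjugate(w):
    # a proper conjugate of w = uv (u, v non-empty) is vu, i.e. a rotation
    return any(w[i:] + w[:i] == w for i in range(1, len(w)))

# exhaustive check: w is primitive iff no proper conjugate equals w
ok = True
for n in range(1, 11):
    for t in product("ab", repeat=n):
        w = "".join(t)
        ok = ok and (is_primitive(w) == (not equals_some_proper_conjugate(w)))
```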
His supervisor was Dömösi at the University of Debrecen, Hungary, and he also worked with Masami Ito as a postdoctoral researcher at Kyoto Sangyo University. His main research topic is combinatorics on words and formal languages, and he is familiar with the primitive words conjecture too. I moved to Akita University three years ago; since then I have sometimes discussed combinatorics on words and primitive words with Fazekas sensei, became interested in this conjecture, and then started to survey and study it. So here is a brief history. Let me give a brief survey of the known approaches to the primitive words conjecture. We have a good theorem, the Chomsky–Schützenberger theorem, stating that the generating function of every unambiguous context-free language is algebraic. Using this theorem, one can formally show that Q is not an unambiguous context-free language. This was first shown by Petersen in 1994: he showed that the generating function of the set of primitive words has a certain form, expressible using the Möbius function, and that this generating function is not algebraic. So Q is not an unambiguous context-free language. This is the generating-function approach. But as far as I know, for general context-free languages there is no good theory of generating functions, so I think this approach is very hard to apply to the conjecture. There is also a more formal-language-theoretic approach: constructing a regular language R so that the following condition holds. If we have a context-free language L and a regular language R, then their intersection is always context-free. So if we could construct a regular language R such that Q ∩ R is not context-free, we could conclude that Q is not context-free.
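Behind Petersen's generating function is the standard Möbius-inversion count, a consequence of the unique-factorization property: the number of binary primitive words of length n is p(n) = Σ_{d|n} μ(d)·2^{n/d}. A sketch (function names mine) that cross-checks the formula against brute force:

```python
from itertools import product

def mobius(n):
    """Moebius mu: 0 if n has a squared prime factor, else (-1)^(#prime factors)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def primitive_count(n):
    # p(n) = sum over divisors d of n of mu(d) * 2^(n/d)
    return sum(mobius(d) * 2 ** (n // d) for d in range(1, n + 1) if n % d == 0)

def is_primitive(w):
    m = len(w)
    return not any(m % d == 0 and w[:d] * (m // d) == w
                   for d in range(1, m // 2 + 1))

def brute_count(n):
    return sum(is_primitive("".join(t)) for t in product("ab", repeat=n))
```

For example, p(6) = 2⁶ − 2³ − 2² + 2 = 54.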
Okay, however, the monograph says that, by some results of Katzoni and Katzla, this approach also seems hopeless, because they showed that for many forms of regular languages R, the intersection of R with the set of primitive words is always context-free. These results are now called the Katzoni–Katzla theory, which I will not explain in depth; please see the monograph for details. But we have yet another approach, perhaps the simplest one: pumping-lemma-like techniques. This approach is also hopeless, because Q in fact resists almost all known tests for proving non-context-freeness, including various extensions of the classical pumping lemma. So now let's turn to the main topic of this talk, a quantitative approach to the primitive words conjecture. Let me first introduce the notion of density of a formal language. We call δ_A(L) the asymptotic density of a language L, defined as the limit, as n tends to infinity, of the fraction |L ∩ Aⁿ| / |Aⁿ|. Let's forget about the limit for a moment and consider only the fraction: its value is exactly the probability that a uniformly random word of length n is in L. Then take the limit; that is the asymptotic density of L. Berstel showed that if L is regular, then its density, when it exists, is always a rational number. This fact basically follows from the pretty fact that the generating function of any regular language is rational. For regular languages we have another pretty fact. We call a language null if its density is zero. This fact gives an alternative characterization of non-null regular languages: a regular language L is non-null if and only if it is dense, in the following sense: it intersects the ideal language generated by every word. Here A*wA* is called the ideal language generated by w; it consists of all words containing w as a factor.
Basically speaking, this fact says that two different notions of largeness, a measure-theoretic largeness (non-null) and a topological largeness (dense), are equivalent in the regular case. Let me explain a further fact too. The left-to-right direction, "L is non-null implies L is dense," is true for any language, not necessarily regular, but the converse direction is not true in general. The left-to-right direction is very easy: it can be proved using the infinite monkey theorem. Maybe some participants remember that at last year's CLA I gave a talk whose main topic was an extension of the infinite monkey theorem. I really like this theorem; it is very easy to show and very classical. The infinite monkey theorem says that any ideal language is very large: it is of density one. Now the left-to-right direction can be shown as follows, in contrapositive form. L is not dense means, by definition, that there is some word w such that L does not intersect A*wA*. Then we have the inequality δ(L) ≤ 1 − δ(A*wA*), because the complement of L contains the ideal language A*wA*, which has density one by the infinite monkey theorem. So the density of L is zero. Thus the left-to-right direction is true for general languages. But right-to-left is not true, because we have a very simple counterexample, the semi-Dyck language, which consists of all balanced sequences of left and right parentheses. This language is dense, because any broken sequence of parentheses can be completed by adding left and right brackets so that it becomes balanced; it has no forbidden factor. So this language is dense, but its density is actually zero. We can formally prove this fact using the very basic connection between the Catalan numbers and the number of semi-Dyck words of length n.
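Both halves of this picture can be seen numerically (a sketch under standard counts, not from the talk). For the ideal language A*·ab·A*, a binary word avoids the factor ab exactly when it has the shape bⁱaʲ, so only n+1 words of length n avoid it and the fraction tends to 1, illustrating the infinite monkey theorem. For the semi-Dyck language, the balanced words of length 2n are counted by the Catalan numbers, so the fraction tends to 0 even though the language is dense:

```python
from math import comb

def ideal_fraction(n):
    """|A^n ∩ A* ab A*| / 2^n: a word avoids the factor 'ab' iff every b
    precedes every a (shape b^i a^j), giving exactly n+1 avoiding words."""
    return 1 - (n + 1) / 2 ** n

def semidyck_fraction(n):
    """Balanced bracket words of length 2n (Catalan number C(2n,n)/(n+1)),
    divided by the 2^(2n) words of that length."""
    return comb(2 * n, n) // (n + 1) / 2 ** (2 * n)
```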
So now let's turn to the quantitative properties of the set of primitive words Q. First, I demonstrate that Q is very large: Q is co-null, which means its complement is null; formally, its density is one. This is very easy, and maybe folklore, but I give a proof because it is simple. We show that the complement of Q is null. Each number n has at most 2√n divisors, and if a word w of length n is non-primitive, that is, if w can be represented as a power of a proper shorter word v, then the length of v is at most n/2. This is very basic. So we have the upper bound 2√n · 2^(n/2) for the number of non-primitive words of length n. Here # denotes the cardinality of the set, and the set in question is exactly the set of all non-primitive words of length n. The factor 2√n bounds the number of possible divisors, and the factor 2^(n/2) bounds the number of possible choices of the word v. A simple calculation then gives us the fact that the complement of Q is null, because the fraction 2√n · 2^(n/2) / 2ⁿ tends to zero. Recall that we consider an alphabet of two letters a and b, so the denominator is 2 to the n-th power. Okay, so anyway, this fact is very simple and maybe folklore, but I have not found any literature that points it out explicitly. This fact actually gives us a rough intuition for why Q satisfies the various pumping-lemma-like tests of context-freeness: any pumping sequence cannot escape from Q. Now recall the statement of the classical pumping lemma. If L is context-free, there is some length p, called the pumping length, such that any sufficiently long word u ∈ L, with |u| ≥ p, can be factorized as u = vwxyz, where w and y are the pumped parts.
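The counting bound in this proof, #(non-primitive words of length n) ≤ 2√n · 2^(n/2), can be verified directly for small n by brute force (a sketch, function names mine):

```python
from itertools import product
from math import sqrt

def is_primitive(w):
    n = len(w)
    return not any(n % d == 0 and w[:d] * (n // d) == w
                   for d in range(1, n // 2 + 1))

def nonprimitive_count(n):
    return sum(not is_primitive("".join(t)) for t in product("ab", repeat=n))

# exact count vs. the talk's bound, for lengths up to 14
checks = []
for n in range(2, 15):
    exact = nonprimitive_count(n)
    bound = 2 * sqrt(n) * 2 ** (n / 2)
    checks.append(exact <= bound)
# the bound divided by 2^n tends to 0, so the complement of Q is null
```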
And we can pump these parts arbitrarily many times while preserving membership in L. So if L contains u, and u has a factorization of this form, then we can pump it; and if some pumped word v wⁿ x yⁿ z lies outside L, we can conclude by the pumping lemma that L is not context-free. This is how pumping-lemma-like techniques work. But how about Q? Q is very large; it contains almost all words. So any pumping sequence basically cannot escape from Q, because it is so large. This is a very rough but good intuition for why pumping-lemma-like techniques do not work for this conjecture. Okay, so now I want to introduce another theorem, which is harder to show: every regular subset of Q is actually very small, it is null. This theorem states that while Q itself is very large, every regular subset of Q is not. Intuitively, this means that there is no good lower approximation of Q by regular languages: we cannot have any good approximation from below. I mean, Q has a very complex shape from the viewpoint of regular languages. The proof of this theorem uses basic semigroup theory, including Green's relations and Green's theorem. I think not so many participants are familiar with semigroup theory, but I think I have enough time, so I will give a quick introduction to semigroup theory and explain a proof sketch of this theorem. Okay, now consider a monoid M. M need not be finite; an infinite monoid is fine. For a monoid M, we can define Green's four relations J, L, R, and H as follows, and all of these relations can be defined using the notion of ideals. Elements a and b of M are equivalent with respect to the J relation if and only if the two-sided ideals generated by a and b are equal; this is the definition of the J relation. The L relation is defined using left ideals, and dually R using right ideals, and H is the intersection of the relations L and R.
So basically we can interpret these relations with some graph-theoretic intuition. Consider the J relation: a and b are J-equivalent if and only if there are four elements x, y, x′, y′ such that xay = b and x′by′ = a, which can be interpreted as saying that a and b are mutually reachable from each other by multiplying by other elements on both sides. So a and b belong to the same strongly connected component in the Cayley graph of M. This is the notion of the J relation. L can be interpreted as a left-reachability relation, and R as a right-reachability relation. Now the statement of Green's theorem is as follows. Let M be a monoid and a one of its elements; here H_a denotes the equivalence class of a with respect to the H relation. The theorem states that H_a contains an idempotent element if and only if it is actually a subgroup of M whose identity element is that idempotent. Okay, so this is Green's theorem and Green's relations. Can I ask a question? Of course. By saying a subgroup of M, do you mean a submonoid? No, no: here subgroup means a subset of M that forms a group, in which every element has an inverse; the identity element of this subgroup does not necessarily coincide with M's identity. Are you satisfied? Probably not; maybe it's a very stupid question, but I thought that in a monoid we cannot necessarily take inverses. Right, a general monoid has no inverse elements; a monoid in which every element has an inverse is just a group. So anyway, the right-to-left direction of Green's theorem is trivial, because if H_a is a group then it surely contains its identity element, which is also an idempotent. But left-to-right is non-trivial, and that is actually what Green proved. Sorry, just to follow up on Sergey's question.
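Green's relations and Green's theorem can be checked exhaustively on a small concrete monoid; a sketch using the full transformation monoid on {0, 1, 2} (my own illustrative choice, not from the talk). Every H-class that contains an idempotent turns out to be a group with that idempotent as its identity:

```python
from itertools import product

# the full transformation monoid T_3: all 27 maps {0,1,2} -> {0,1,2}
M = list(product(range(3), repeat=3))

def mul(f, g):                       # composition as monoid product
    return tuple(g[f[x]] for x in range(3))

def left_ideal(a):  return frozenset(mul(x, a) for x in M)   # M·a
def right_ideal(a): return frozenset(mul(a, y) for y in M)   # a·M

def H_class(a):                      # H = L-equivalent and R-equivalent
    return [b for b in M
            if left_ideal(b) == left_ideal(a) and right_ideal(b) == right_ideal(a)]

# Green's theorem, checked exhaustively: an H-class containing an
# idempotent e is closed, has e as identity, and has inverses
group_classes = 0
for a in M:
    H = H_class(a)
    idem = [e for e in H if mul(e, e) == e]
    if idem:
        e = idem[0]
        assert all(mul(x, y) in H for x in H for y in H)          # closure
        assert all(mul(e, x) == x == mul(x, e) for x in H)        # identity
        assert all(any(mul(x, y) == e for y in H) for x in H)     # inverses
        group_classes += 1
```

The H-class of the identity map is the group of units, the six permutations of {0, 1, 2}.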
So do you mean it really is a subgroup rather than a submonoid? Yes, it's really a subgroup; that's my answer. Sorry, could you repeat that? Can you give a small example of such a monoid? Sergey, your connection is bad. Maybe you could write it in the chat window. Yeah, chat is preferable, because my reading skill is better than my listening skill. A small example... I cannot use a whiteboard here, so, okay, anyway, the proof actually shows how to use Green's theorem, so if I explain the proof, maybe you can grasp some intuition. Yes, maybe you can go on, and if it's still unclear we can continue later. Anyway, that was the statement of Green's theorem. Now let me demonstrate how to prove the theorem about regular subsets of Q. Assume that L is a regular language with positive density, and let η be the natural morphism from A* to the quotient monoid, called the syntactic monoid of L. The relation ≡_L here is called the syntactic congruence, and it is defined as follows: two words u and v are equivalent under this congruence if and only if L cannot distinguish u and v by concatenating other words, that is, xuy ∈ L if and only if xvy ∈ L for all x and y. This is the concept of the syntactic congruence. It is actually a congruence, so we can take the quotient. The classical Myhill–Nerode theorem states that L is regular if and only if this quotient is a finite monoid. Now let S be the syntactic image η(L) of the language L. So we consider this situation. The first claim, whose proof details I omit, is that from the assumptions that L has positive density and the syntactic monoid is finite, we can conclude that S contains a J-minimal element T. Here the J-order is defined as follows: a is smaller than b if the ideal generated by a is contained in the ideal generated by b.
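For a concrete feel for the syntactic monoid: for a minimal DFA, the transition monoid, i.e. the set of state-transformations induced by words, is isomorphic to the syntactic monoid of the language (a standard fact). A sketch for a small example language of my own choosing, the binary words containing the factor aa:

```python
# minimal DFA for "binary words containing the factor aa":
# state 0 = no progress, 1 = just read 'a', 2 = found 'aa' (accepting sink)
delta = {"a": (1, 2, 2), "b": (0, 0, 2)}
identity = (0, 1, 2)

def mul(f, g):                  # transformation of uv from those of u and v
    return tuple(g[f[q]] for q in range(3))

def eta(w):                     # syntactic image: the transformation of w
    f = identity
    for c in w:
        f = mul(f, delta[c])
    return f

# generate the whole transition monoid by closure under the generators
monoid, frontier = {identity}, [identity]
while frontier:
    f = frontier.pop()
    for g in delta.values():
        h = mul(f, g)
        if h not in monoid:
            monoid.add(h)
            frontier.append(h)
```

Here the closure yields a six-element monoid, and η(w) determines membership of xwy for every context (x, y), which is exactly what the syntactic congruence captures.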
Claim 1 can be shown using the infinite monkey theorem, but I omit the proof details. The second claim is that the J-minimality of T implies that, for any n, the n-th power of T is J-equivalent to T. So the J-equivalence class of T, the blue rectangle here, contains Tⁿ for every n. Claim 3 is a somewhat semigroup-theoretic claim: the finiteness of the syntactic monoid, together with T and Tⁿ being J-equivalent, implies that they are actually H-equivalent; the H-equivalence class of T is exactly the J-equivalence class of T. And the last claim is a very standard argument in finite semigroup theory: the finiteness of the syntactic monoid implies that for some k, the k-th power of T is idempotent. So we can apply Green's theorem: H_T is a group with identity element T^k; it is exactly a subgroup of the syntactic monoid of L with identity T^k. Now, since T is in the syntactic image S, take a non-empty word w ∈ L from the inverse image of T, and consider the non-primitive word w^(k+1), the (k+1)-st power of w, and compute its syntactic image. Because η is a morphism, we have η(w^(k+1)) = η(w)^(k+1) = T^(k+1). But T^(k+1) is exactly T^k multiplied by T, and T^k is the identity element of the group, so we get T, which is in S. Therefore the non-primitive word w^(k+1) is in L. So this is the sketch of the proof. Okay, I have a few minutes left, so I'd like to conclude my talk with a few open problems. In this talk, we gave a brief introduction to the primitive words conjecture, including some survey and some brief intuition about why this problem is hard to solve. We also described two special quantitative properties of Q: it is very large, but every regular subset of it is very small.
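Two ingredients of this sketch can be seen concretely on transformations (my own illustration, not from the talk): the standard fact behind claim 4 that in a finite monoid every element has an idempotent power, and the final step T^(k+1) = T^k · T = T when T^k is the identity of the group H_T:

```python
def mul(f, g):
    return tuple(g[f[q]] for q in range(len(f)))

def idempotent_power(t):
    """Return (k, t^k) with t^k idempotent. Such a k exists for any element
    of a finite monoid: the powers of t eventually cycle through a group."""
    power, k = t, 1
    while mul(power, power) != power:
        power, k = mul(power, t), k + 1
    return k, power

# a cyclic permutation t: t^3 is the identity (idempotent), and t^4 = t,
# mirroring the step T^(k+1) = T^k · T = T in the proof
t = (1, 2, 0)
k, e = idempotent_power(t)
t_k_plus_1 = mul(e, t)
```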
So Q has a very special form, a special shape. And the third item is just my opinion, which may well be false: I think that for tackling this conjecture, a study of the theory of large context-free languages is important. This is the last slide, three open problems; maybe it is better to discuss these problems in the open problem session, I think. Okay, thank you very much, Szilárd. That's all, thank you. Thank you. Perhaps everybody can unmute and give a round of applause. I will start. I know these things are a bit awkward, but we have time for a few questions. Are there questions? I have a question. We have a few, I see. So is that Olivier? Yes, I am the one speaking. So my question is: you know that asymptotically there is almost the same number of elements in Q as in the free monoid over two letters. Do you know more precisely what the asymptotics of the number of primitive words is? You have this expression; are you able to analyze it? It seems that if we use Dirichlet series, it could be quite easy to understand the meaning of this expression. I only know a rough asymptotic estimate of this generating function. Maybe it can be properly upper bounded by some exponential function, but I don't know a precise asymptotic formula for this language; I only know this equation, and that the density of the complement is zero. Okay. Yes, as Martin said, you can express 1/(1 − 2z) as some product, some Hadamard-type product involving Q on both sides, so it is easy to understand this equation; the equation itself is actually not very difficult. The question is: what is the precise asymptotics? I think it is easy.
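As a numeric footnote to this asymptotics question (my own exploration, not from the talk): using the Möbius count p(n) = Σ_{d|n} μ(d)·2^{n/d}, the defect 2ⁿ − p(n) is dominated at even lengths by the 2^{n/2} term coming from squares, so along even n the ratio (2ⁿ − p(n)) / 2^{n/2} tends to 1, i.e. #Q(n) = 2ⁿ(1 − 2^{−n/2}(1 + o(1))):

```python
def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def primitive_count(n):
    # number of binary primitive words of length n, by Moebius inversion
    return sum(mobius(d) * 2 ** (n // d) for d in range(1, n + 1) if n % d == 0)

# the normalized defect along even lengths approaches 1
ratios = [(2 ** n - primitive_count(n)) / 2 ** (n // 2) for n in range(10, 41, 2)]
```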
It's maybe easy, but my answer is that I don't know a precise asymptotic formula. Okay, because we can use the Mellin transform to get something about this. The Mellin transform, maybe, yes. Okay, yes. But it could be interesting to have more terms in the asymptotic expansion: it's 2ⁿ, that's the dominant part, but it could be useful to understand the next term in the expansion. So clearly it's 2ⁿ times one minus something, but I don't know what. Martin is answering your question right now in the chat. Yes. But may I make a remark about an earlier question? You said earlier that with generating functions we can only approach unambiguous context-free grammars. But at the same time, we can enumerate words from ambiguous grammars, just counted with multiplicity. That's correct. So if a word has several multiplicities, we could potentially use inclusion–exclusion methods to enumerate each word once. It's not a rigorous remark. But the conjecture is that there is no grammar at all that generates this language, right? No, yeah, you're right. And it's difficult to say what the multiplicity of a word is. So yes: from an ambiguous grammar we can compute a generating function, but it does not count the number of words; it counts the number of parse trees, that is, words with multiplicity. Ambiguous means there are several parse trees for a single word, and the number of different parse trees is exactly the multiplicity of the word. Okay, just a comment that we're running close to the start of the next talk.
So maybe you can continue this discussion over the Discord server chat; maybe that's the best way to continue it. Can I attend the open problem session tomorrow? Yes, of course, of course. Because today's open problem session is too late for Japan. It's very late, yes. Yes, it's too late, so maybe I will attend tomorrow's open problem session, and I will send some slides later. Yeah, that would be great. Okay, thank you. Okay, so let's thank the speaker again for this very interesting talk, and you can continue the conversation. Okay, so I'm going to stop recording now.