Two topics. The first is the relation between artificial intelligence and information theory: what information theory can tell us about artificial intelligence. Then I will go into a more involved part about learnability, which is relatively new. Are all problems learnable? That is a question we are all asking ourselves.

First, to continue the discussion we had around your talk. One question that Stephen Hawking asked, to himself and to other people, is this: since artificial intelligence is based on electronic technology rather than biological technology, and is therefore supposed to be faster, what will happen if artificial intelligence supersedes humans? It could be the end of civilization, or even worse, the end of our jobs, of our research. This is a very important topic, more for the young people; for people of my age it is merely interesting. But the first thing information theory tells us is that a system cannot evolve by itself into a more sophisticated, more complex system. Since the future state of the system is a function of the system itself, its entropy cannot grow; the entropy can only decrease. Therefore you cannot expect that a program in a computer which just works on itself, evolving from itself, will do something better than what it was doing before. The only way to make a system, an automaton, create an automaton more complex than the original one is to introduce entropy from outside. This entropy can be randomness, or it can be data, data sets: what artificial intelligence does is collect huge amounts of data and try to convert them into a system better aligned with what it is expected to do on this data. From the point of view of information theory, it is as if you physically added randomness to your original system.

There is an equivalent in the evolution of life. Evolution has gone from simple systems to more complex systems, ourselves included, and some animals are more complex than ourselves. In fact, very surprisingly, birds are more complex than mammals. They are not doing math, but they are more complex; in fact we don't know, maybe we are the ones not doing math. And the mutations are mostly random: mutations come from cosmic radiation and also from viruses. Viruses are not random, but these are the mutations. And of course evolution selects the best species. Here is the main problem: what does "best species" mean? The best is whoever survives the competition, which is a bit tautological. It may be tautological, but you can see the result every day: when you walk in the forest you can see that the system we have arrived at is very complex. The human is somewhere in it. In general we always put the human at the top of evolution, but there is no reason to be at the top of evolution.

So, is it possible to apply this model, the model of the evolution of life, to the evolution of codes, basically to artificial intelligence? How can codes evolve into something more complex, more useful? Is it possible to adapt this evolution of life to code generation? First, as we said, the criterion used for the selection of species is not usable as it stands, because it is vague, or tautological.
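To pin down the entropy argument above in symbols: it is essentially the data-processing inequality, under the assumption that the next state of a closed system is a deterministic function of its current state.

$$ S_{t+1} = f(S_t) \;\Longrightarrow\; H(S_{t+1}) = H(f(S_t)) \le H(S_t), $$

so a closed system cannot gain entropy on its own. Only an outside source $D$ (randomness, or training data), with $S_{t+1} = f(S_t, D)$ and $H(S_{t+1}) \le H(S_t) + H(D)$, lets the complexity budget grow.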
Every time, we learn that a species has evolved because of something completely impossible to imagine. The butterfly's proboscis, its "nose", grows longer because of the flowers: the flowers become deeper, so the butterflies with a short proboscis cannot survive and the butterflies with a longer one do; but then the flowers become deeper and deeper, and at the end we no longer understand the usefulness of the whole project. But regarding codes: can we create a digital ecosystem where we specify rules for the codes to compete against each other? Another question: is it possible to have a large enough ecosystem? If we have only two or three codes competing it won't be very useful; you need something large. Another problem: how can we verify, I mean certify? Imagine that all these randomly generated programs are used to control your car, or the airplane you are travelling in. You would like them, at a minimum, to be certified. Recently a self-driving car, the Google car, ran over a woman, because the case of a woman not being in the right place was not in the program: if it is not in the model, it does not exist, so let's run over it.

But now I would like to compare the power of life with the power of our computers. The rise of artificial intelligence happened because of the application of Moore's law over the last three decades, let's say three or four decades, which brought us tremendous computational power. All the ideas about deep learning and neural networks date from the sixties. But in the sixties there was less computational power in a computer than in this, no, that is already too sophisticated, than in this key, even without the electronic part. So it was meaningless then; one could as well have imagined doing deep learning with a screwdriver. Now it is possible. But let's compare with life. Everybody has tried this; I tried it when I was young. You take a liter of sea water and a microscope, and you look at a small screen. It is an ecosystem in one liter. Bacteria are the main source of mutation, and the assumption is that they generate one kilobyte of new mutated code per liter per day. Why this assumption? In fact it is true; I argued it in my first talk about randomness. So you have one kilobyte of code per liter per day. Life appeared about one thousand billion days ago. The volume of the oceans is about one billion cubic kilometers, and every cubic kilometer is one thousand billion liters. Let's check it again; the math is correct. Therefore the space used by life on Earth is a space of 10^36 kilobytes. And this is a lot; I can tell you this is really a lot. Imagine that you have a magic computer that can store one kilobyte per atom: 10^36 kilobytes would need a mountain the size of Everest. And if you look at the quantity of information created by mankind: this figure is two years old, you have to multiply by 100 of course, but it is still orders of magnitude less information. Life is at 10^36; mankind is around 10^18. In our model this is equivalent to storing all of mankind's information in a grain of sand with a mass of 10^-8 kilograms. So it can be small.
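A quick sanity check of the orders of magnitude, using only the round numbers quoted above (the per-liter mutation rate, and the exact unit on the slide, are assumptions taken from the talk):

```python
# Back-of-the-envelope check of the "information space of life" estimate.
days = 1e12              # life appeared ~one thousand billion days ago
ocean_km3 = 1e9          # ~one billion cubic kilometers of ocean
liters_per_km3 = 1e12    # one thousand billion liters per cubic kilometer
kb_per_liter_day = 1.0   # assumed bacterial mutation rate: 1 kB/liter/day

life_kb = days * ocean_km3 * liters_per_km3 * kb_per_liter_day
# 1e33 kB, i.e. 1e36 bytes: this matches the slide's 10^36 if the unit
# there is bytes (an assumption; the quoted inputs give 10^33 kilobytes).
print(f"life: {life_kb:.0e} kB = {life_kb * 1e3:.0e} bytes")

mankind_kb = 1e18        # the talk's figure for mankind's stored information
print(f"ratio life/mankind: {life_kb / mankind_kb:.0e}")  # ~1e15
```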
But most of the information created by mankind is streaming: videos, and of course books, Wikipedia. If you look at the code, the working code, the executable code, it is much less. Not the number of executable copies after replication, but the number of original executable lines of code created by computer scientists: about 100 gigabytes, which in our comparison is less than a nano-drop, not a drop, a nano-drop. Ah, here, since I was expecting to speak for one hour, I have material that has nothing to do with the discussion. Anyhow, it sheds light on the complexity of life. The human being has a genome equivalent to 6 gigabits of information, and when a new baby arrives (this is my grandpa side speaking), it basically has to share 20,000 coding genes, each coming either from the mother or from the father. Since there are only two possibilities per gene, from father or from mother, the complexity added to mankind by a baby is 20 kilobits. It is less than the information contained in the SMS or the tweet you use to announce the birth of the kid. Don't tell that to your children.

If you take a plant, something strange: plants have DNA which is larger than the DNA of mammals, around 20 gigabits. Nobody knows the reason, of course, but a reason could be this: when there is a change of climate, the animals can move, and the plants cannot move, by definition, so they have to keep genes in reserve. Ah, there is one gene against warming, one gene against freezing, and they activate it. Therefore, every time a species goes extinct, you remove about 20 gigabits from life's complexity, and every time there is a birth, you add only 20 kilobits. So be careful where you walk: don't walk on the grass before checking it is not a species facing extinction.

The temporary conclusion, and it is good news for us at least, is that we cannot expect, even in 2030, that artificial intelligence will be able to sever its ties with mankind. Not yet. To do math, yes, I do believe; but to evolve independently, that is not to be expected. Every time you see an advertisement about the miracles of artificial intelligence, have a small thought for all the engineers, mathematicians and computer scientists who worked to make it happen; it is not only the training of a neural network, there are a lot of things to do in order to get a result. Or the bad news, if you want to get rid of our civilization: of course there are less complex ways to succeed. Ah yes, let me just check the time. Yes, I am not too late, so in case I was too long I still have a little margin.

That is what I just said: what would Shannon say to Turing, since we consider that Turing invented artificial intelligence? He would say: your thing is very great, but there are limitations, be careful. And in fact that would not be fair, because Shannon also worked on artificial intelligence: he designed one of the first machines for maze escape, with a mechanical mouse. Before, I thought it was a real mouse, because in the picture it was not clear. He also designed a mind-reading machine. The mind-reading machine basically takes advantage of the fact that the human brain is not a good source of randomness, in order to predict the player's next move. It works very well.
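As an illustration of the mind-reading idea, here is a toy sketch of the principle (not Shannon's or Hagelbarger's actual design): predict a human's next binary choice from the statistics of what followed each short context of past choices.

```python
from collections import defaultdict
import random

# Toy "mind-reading machine": predicts the next bit of an opponent by
# counting what followed each 2-move context so far. Humans trying to
# play "randomly" are biased, so the machine wins more than 50%.
counts = defaultdict(lambda: [0, 0])   # context -> [count of 0s, count of 1s]
history = []
wins, plays = 0, 200

for _ in range(plays):
    ctx = tuple(history[-2:])
    c0, c1 = counts[ctx]
    guess = 1 if c1 > c0 else 0 if c0 > c1 else random.randint(0, 1)
    # Stand-in for the human: a player who alternates 70% of the time.
    move = (history[-1] ^ 1) if history and random.random() < 0.7 else random.randint(0, 1)
    wins += (guess == move)
    counts[ctx][move] += 1
    history.append(move)

print(f"machine guessed right {wins}/{plays} times")
```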
Of course, that was just an illustration. Now here is an exercise; if you find the talk too boring, you can think about this small exercise instead. It is what Turing might have replied to Shannon. You have a monastery at the top of Everest, and there are monks living there since forever. Their everyday job is to predict tomorrow's weather, by artificial intelligence if you like. The constraint is that God accepts only a finite number of mistakes: the days are infinite in number, and if God counts more than finitely many wrong predictions, he won't be happy. To simplify our lives, the weather is binary: zero for bad weather, one for nice weather. The only data each monk has every day is the infinite sequence of past weather and predictions. The hint: the monks have to make a choice. You can think about it during the talk.

Now we are going to go into more involved research, less philosophical; there will be fewer stories about the evolution of life and things like that. Anyhow, I am still going to talk about cats and dogs, and the question is: is there a limit to the learnability of cats and dogs? Basically, the symbol of the greatest success of artificial intelligence is detecting a cat or a dog in a picture. Of course it does more than that, fortunately. What I would like to do is ask my neural network, my machine learning system, to discover an algorithm, a simple algorithm of course, because I have no algorithm to detect whether there is a cat or a dog in a picture. In fact what I just said is wrong, because the neural network is itself an algorithm, with some mutations. But my question is: what would be the consequence if machine learning were able to discover a simple algorithm? At first sight it is not a very interesting result, because it is completely pointless to use machine learning or artificial intelligence to mimic an algorithm: if you have the algorithm, don't do machine learning, deep learning stuff, just use it. In fact the only interesting failure would be if it gave you evidence about P versus NP: if you ask the machine to discover a polynomial algorithm and it finds only a non-polynomial solution, something which converges very slowly. But in fact the question is not that useless, because if the machine is not able to detect an algorithm, it means that on some problems it will not be able to converge: some problems need the data to be pre-processed by a specific algorithm, and if the machine cannot detect that the data must be processed, you get poor convergence. If the machine is able to detect the algorithm it needs to apply, to mimic the algorithm, you get good convergence, a universal solution, and we are back to the beginning of the talk: a universal machine that solves all problems. So the question is: is machine learning, is artificial intelligence, capable of detecting a simple algorithm? In fact the answer is no: there are problems where deep learning cannot converge to a solution. The candidates: sorting; the Fourier transform; the classic example of the convolution, where a system without the convolution built in has to set many coefficients to zero and does not converge properly; pattern matching, which is something deep learning cannot do very well; tree and graph structures, as a consequence of pattern matching; and parity functions, where it has actually been proven that deep learning cannot converge to a solution. A parity function: you have a sequence of n bits, and the ground truth is one bit, for example the parity of the number of non-zero bits in some subset of positions. If you take a random parity function, then however long you train, your system will give an average error of one half, which means it does not converge and does not even give a clue about the correct answer.
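To make the parity example concrete (the hidden-subset formulation below is the standard one from the literature on failures of gradient-based learning; the exact variant on the slide is my assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
subset = rng.random(n) < 0.5   # hidden random subset defining the parity function

def parity(x):
    """Ground truth: parity (XOR) of the bits of x lying in the hidden subset."""
    return np.bitwise_xor.reduce(x[subset])

x = rng.integers(0, 2, size=n)
print(x, "->", parity(x))
# For a random such target, the gradients at any fixed network carry almost
# no information about which subset was drawn, so a gradient-trained model's
# average error stays near 1/2.
```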
The first important point is the fact that a neural network is a Turing machine. You need to be a little careful, because there is a question about the memory; to be rigorous you need a recurrent neural network. But basically this means that if you have an algorithm, then just by adjusting the weights of your neural network you can implement that algorithm. Of course, machine learning is not programming, it is training: training via stochastic gradient descent. We already had a whole talk about this, so I will be brief. You can see a neural network as a box with weights, which are matrices; we will get to a more sophisticated description, but viewed from outside, with this design, it looks like a coffee machine. All the weights of your neural network are in the machine. You show it a cat, it answers "cat" with probability, say, 90%. You show it another cat, even this cat, and you see this cat is very troubling, but it is a cat; of course you have to challenge the system. You show a dog, and for this dog I don't know what the result will be, so I assume it will say 50% cat, 50% dog. There is no limit to human imagination when it comes to fooling the machine.

But how does it work? You enter an image, and an image is a sequence of numbers; it can be one million numbers, here I just put 10 numbers. The machine produces a prediction with the neural network inside, and knowing the true result, say 56, it computes the difference between the ground truth and the prediction. During the training phase it adjusts the coefficients inside the machine so that the prediction gets closer and closer to the ground truth. Every time, we change the training image, so it is gradient descent, and it is called stochastic gradient descent because the image we select each time is random. To simplify your life: you know the true result, and since the network is just a system of matrices and activation functions, something mathematically completely trivial, you can compute the gradient and then adjust the weights along the negative gradient of the loss function, so that you expect to reduce the average error. On the slide there are two examples of this weight adjustment; the first one is routine and not very interesting.

And now the big question I am going to ask: can we train a neural network to extract the maximum of two numbers? It is not a question of cats, not a question of dogs, not a question of superseding mankind: I have two numbers, I want to extract the maximum of the two. Something very simple. It turns out that the maximum of two numbers has an expression which is itself a neural network, using as activation function the ReLU, the one everybody uses in practice, with a small matrix representation whose output is twice the maximum; a sketch of it is just below.
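Here is the construction, reconstructed from the description (the exact matrices were on the slide, so take these particular ones as an assumption; they use the identities 2·max(a,b) = (a+b) + |a−b| and ReLU(x) − ReLU(−x) = x):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Two-layer ReLU network computing twice the maximum of two numbers:
# W2 @ relu(W1 @ [a, b]) == 2 * max(a, b)
W1 = np.array([[ 1.0,  1.0],    # a + b
               [-1.0, -1.0],    # -(a + b)
               [ 1.0, -1.0],    # a - b
               [-1.0,  1.0]])   # b - a
W2 = np.array([1.0, -1.0, 1.0, 1.0])
# relu(a+b) - relu(-(a+b)) = a + b ; relu(a-b) + relu(b-a) = |a - b| ;
# so the output is (a+b) + |a-b| = 2*max(a, b).

for a, b in [(3.0, -1.5), (-2.0, -0.5), (0.7, 0.7)]:
    out = W2 @ relu(W1 @ np.array([a, b]))
    assert np.isclose(out, 2 * max(a, b))
    print(f"max({a}, {b}) = {out / 2}")
```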
So it satisfies the property that a neural network is a Turing machine, at least for this little problem: it can extract the maximum of two numbers. And with not too much effort, even at this time of the afternoon, you can see how to extract the maximum of more than two numbers: four numbers, and extrapolating, eight numbers. You simply use the specific block that extracts the maximum of two numbers in a recursive, tournament-like way; nothing special, you take log n layers, it is very simple. There are several ways to arrange this network, recursive or not: you can take the maximum of the numbers and separate what is above and what is below, and you can mix these. And in fact this is not innocent: there are many weight combinations of the neural network that give the correct answer, and maybe that is a reason why we will see some problems.

First, how good is gradient descent? First, it is absolutely impossible, even for a simple problem, to get the optimal neural network: you will always end on a local minimum. It is gradient descent: you go down, you go down, and at some moment it stops, because it is in a local minimum, which is not necessarily the global minimum. There are a lot of stories where you shake up the system, you fall, and you land somewhere lower, but it is always the same story: you have little chance of ending at the global minimum. Even worse, since we are in huge dimension, you can have a vicious saddle point, where the dynamics are very unstable: you go one step here, then up, then there, and you just cycle around the saddle point. That is the theory; in practice nobody has seen it. If the local minima are close to each other, in a suitable definition of closeness, it is good; if the local minima are far from each other, it is bad, very bad. So the question is: how can training reach a good weight vector, a good local minimum?

Now I will explain the game. I will take a neural network, but I am not going to take numbers, I am going to take cats. You know what happens if you have a small kitten: the first time you put the kitten in front of a mirror and it sees its own image, it panics. So I do the same with a neural network: I put a neural network in front of another neural network. What is known in this case is simple: the ground truth is given by another neural network, whose weights I select at random and then never touch; a minimal sketch of this mirror game is below.
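What such a mirror game might look like in code: a "student" network trained by plain SGD against a frozen random "teacher" (the architecture, sizes and learning rate here are my choices, not the slide's):

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda x: np.maximum(x, 0.0)
d, h = 10, 16  # input dimension, hidden width (arbitrary choices)

# Frozen random "teacher" network: provides the ground truth.
Wt1, wt2 = rng.normal(size=(h, d)), rng.normal(size=h)
teacher = lambda x: wt2 @ relu(Wt1 @ x)

# "Student" with the same shape, trained by SGD on squared error.
Ws1, ws2 = rng.normal(size=(h, d)), rng.normal(size=h)
lr = 1e-3
for step in range(20000):
    x = rng.normal(size=d)          # stochastic: fresh random sample each step
    z = Ws1 @ x
    a = relu(z)
    err = ws2 @ a - teacher(x)
    # Gradients of 0.5*err^2 w.r.t. the student's weights (backprop by hand).
    ws2 -= lr * err * a
    Ws1 -= lr * np.outer(err * ws2 * (z > 0), x)

test = rng.normal(size=(1000, d))
mse = np.mean([(ws2 @ relu(Ws1 @ x) - teacher(x))**2 for x in test])
print(f"test MSE after training: {mse:.4f}")
```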
As I said, there are many candidates for the roots of the loss function, because there are many permutations of rows and columns that are possible candidates for the optimal neural network; the loss function has a number of roots which is factorial in n, which is big, because with n around a million it is big. So here is the exercise: you take an aquarium of large dimension, and this is the space where you select the roots of your loss function. In reality the roots of the loss function are correlated, but in this toy model you assume they are not correlated at all; it is sufficient for the result. And you define a simple loss function that has nothing to do with the real gradient descent loss, which is this product, for example; we could take something more complicated, but this one is in fact very easy. And you prove that there is basically a quasi black hole effect: gradient descent converges not to one of the global minima but to the centroid of the global minima. Each of these points is a global minimum, but the descent always converges to the centroid, which is not a good local minimum.

There are some experiments, and, oh, it is working, yes; it is very interesting. You are in dimension 10: when you have fewer than 4 roots, it works very well, gradient descent always converges to a global minimum. But if you take 10 roots, for example, then it converges to the centroid, or to something which looks like the centroid but is a bit more complicated; asymptotically, when the dimension increases, it tends to the centroid.

[Question] Is this for the maximum? [Answer] No, no, this is for any kind; we will look later at how it applies to the maximum. This is not specific to the maximum; it is a more general proof, for the training of any neural network whose ground truth is given by another neural network selected at random. And this is the flavour of the proof, at an engineering level of rigor: we can show that with high probability the gradient descent converges to the origin, in fact to the average, the centroid of the roots. You can stare at the formula as long as you like; the point is to prove that the loss function is convex with high probability around the centroid, therefore once you enter the vicinity of the centroid, you never escape. In fact the proof also works when the roots are correlated, which is the case for many matrices. But you have to be careful: our toy loss function is smooth, while the real loss function is not smooth, it is something very complicated, so we cannot claim this holds for the real gradient descent on a neural network. But assume it is true. It tells you that if you converge to the centroid, and the centroid is not zero, then by the law of large numbers you will make an error, but the error will basically fade away compared to the norm of the centroid. I mean: if I take a test vector x and apply it to the real neural network, the result will be close to what I would get with the centroid neural network, with a relative error of order one over the square root of n, n being the dimension. The problem happens if your centroid is zero, or very close to zero. This is the bad news in that case: the relative error you make, compared with the exact prediction, becomes too important; in fact the relative error blows up. And if you look at the maximum: take the average of the coefficients of the matrices, and you see that the average coefficient is zero. So you expect max-finding to be precisely a special case where the centroid is zero, and therefore you expect poor results.
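A quick numerical probe of this toy model as I understood it (the product-of-squared-distances loss and the Gaussian roots are my assumptions; the thing to watch is whether descent stalls near the centroid of the roots rather than reaching a root):

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 20, 12                       # dimension, number of roots (assumed values)
roots = rng.normal(size=(k, d))     # the global minima of the toy loss
centroid = roots.mean(axis=0)

def grad_log_loss(w):
    # loss = prod_i ||w - r_i||^2 ; we descend its logarithm for numerical
    # stability (same descent directions, rescaled step sizes):
    # grad log L = sum_i 2 (w - r_i) / ||w - r_i||^2
    diff = w - roots
    return (2 * diff / (diff**2).sum(axis=1, keepdims=True)).sum(axis=0)

w = rng.normal(size=d) * 3.0
for _ in range(5000):
    w -= 1e-2 * grad_log_loss(w)

print("distance to centroid:    ", np.linalg.norm(w - centroid))
print("distance to nearest root:", np.linalg.norm(w - roots, axis=1).min())
```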
It turns out that many algorithms on signed numbers have this property, which we call the zero-mean-weight property. Here is the training of the max: all these points are nine trained neural networks, randomly initialized, and their position in the plane is given by the vector of their predictions on two test vectors that the training has not touched; the red arrow is the ground truth. It works well for two numbers. For four numbers it converges, but less well, and then for 8, 16, 32 numbers there is this effect, which we assume to be the effect of the zero-mean-weight property, starting to be a problem. To compare, we tried a zero-mean-weight random neural network, not the maximum network, and it turns out to behave very badly in the same way; whereas if you take a non-zero-mean-weight function, it converges, because basically the law of large numbers works in your favour; it works very well.

So basically, if this is true (and, as I said, I used a toy model to show it very quickly), it means that there is a swamp area in learning. If your gradient descent goes into the area I call the swamp area, where some weight statistic is zero (it can be the mean of the weights, it can be the third moment, anything like that), then it does not converge: it can be blocked, and the convergence will be very poor. And if the solution, the optimal network, is in this area, then you can expect that training won't converge; it means your system will not be able to find the maximum of, say, seven numbers.

Now, the conclusion about learnability: there is a kind of equivalence between programming and learning. We know from Turing that program termination is undecidable in general, so we can expect that learning convergence may also be undecidable in general; there are some propositions in this area. The problem is that it is easy to state that a program does not terminate, but there is another difficulty: how to define bad convergence. That is of course another difficulty. But we know that we can prove that some programs terminate; for example the programs controlling your plane or your car are, sometimes, proven to terminate. Therefore, can we hope, in some special cases, to have some way to detect that a problem will not converge well? And is it possible to train a neural network to detect which algorithm should be applied, in order to escape the swamp area? There is also a question that Stefan started to look at: can we train a neural network to detect which physical laws apply to obtain some physical measurements? You have some measurements in astrophysics, but you don't know how to see which sequence of physical laws to apply, or whether there is a new physical law. And this is a very interesting question, because if we consider that a physical law is like a basic algorithm, it may also be difficult for a machine to find the basic physical law. But this is an open question. So that is all I wanted to show you. If you want to know the solution of the problem for the monks, it is one slide further, but I think you know the answer already, so I am not going to go to that slide yet.
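For anyone who wants to play with the claim, here is what such an experiment might look like in its simplest form (the architecture, sample sizes and learning rate are all my guesses; the talk's plots came from its own setup, so results may differ):

```python
import numpy as np

rng = np.random.default_rng(3)
relu = lambda x: np.maximum(x, 0.0)

def train_max_net(n, steps=30000, lr=1e-2):
    """Train a two-layer ReLU net by SGD to output max of n numbers in [-1, 1]."""
    h = 8 * n
    W1 = rng.normal(size=(h, n)) / np.sqrt(n)
    w2 = rng.normal(size=h) / np.sqrt(h)
    for _ in range(steps):
        x = rng.uniform(-1, 1, size=n)
        z = W1 @ x
        a = relu(z)
        err = w2 @ a - x.max()
        w2 -= lr * err * a                           # gradient of 0.5*err^2
        W1 -= lr * np.outer(err * w2 * (z > 0), x)
    xs = rng.uniform(-1, 1, size=(2000, n))
    preds = np.array([w2 @ relu(W1 @ x) for x in xs])
    return np.sqrt(np.mean((preds - xs.max(axis=1))**2))

for n in (2, 8, 32):
    print(f"n={n:2d}: test RMSE = {train_max_net(n):.4f}")
```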
[Question] In your comparison between the evolution of life and AI or machine learning: as you said, life has no goal, no purpose, no direction that could be stated, say, by Darwin. But AI surely has one goal: you tell your machine what you want it to achieve. Would that make a huge difference? Because, as you said, evolution has done a lot of computation with no particular direction; could a goal make AI use less computation and run faster?

[Answer] I agree, and the question I was asking was whether artificial intelligence can get rid of human intervention; that was the basic question. You partly give the answer: if a human is able to give the rules, and it works, then of course it is working. But if you give very vague rules, you will need more computation; and if you don't know the rules at all, you have to give, you know, the famous expression, the command "do what I mean". When you debug a program and at the end you are fed up, you write "do what I mean"; it doesn't work. So in this case, what is the minimal quantity of information needed to have something working? That I don't know. It is not sufficient to say "make me happy".

[Question] Maybe it is naive, but in all these problems where you ask for exact precision, like saying whether two numbers are equal, could an approximation be simpler than the exact maximum problem? If an approximation of the algorithm, for example an approximation of sorting, or of the maximum, would help to find the correct neural network, then I am happy: give me the maximum with an error of, I don't know, 5%.

[Answer] Beyond what could we expect with the present deep learning systems? Take this example: assume that it is impossible for the machine to find the convolution algorithm, an algorithm that computes a convolution, but I know that I need it to recognize a cat, because the ground truth has been designed with a convolution inside. If I train my neural network on pictures of cats without this convolution algorithm inside, it won't work, it won't converge; well, I say that, but I never tried. If you have not the convolution algorithm itself but something very close to it, the network will find the cat at the end, because you just add the error of your imitation of the convolution to the error you would have with the convolution exactly built into the weights. Because the convolution algorithm is itself a neural network: basically you force some coefficients to be 0, and you force the other coefficients to be identical under translation; that is the convolution algorithm. If you force it, it works well. If your system were able to discover by itself that some coefficients must be 0 and the non-zero coefficients identical under translation, then it would find that there is a cat in the picture, a dog in the picture, adding only the error of your approximate convolution. This is just to say that you need to prepare your data and your neural network in order to have good convergence, because if you don't know this at the very beginning, you have to use the convolution. The person who discovered the convolutional structure was very smart, and many people looked for it, but it is something the machine cannot discover by itself.
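The "zeros plus translation-identical coefficients" description above is exactly a Toeplitz (convolution) matrix; a tiny illustration (sizes and kernel are arbitrary):

```python
import numpy as np

# A 1-D convolution is a dense layer with two constraints: coefficients away
# from the band are forced to 0, and the remaining coefficients are identical
# along each diagonal (weight sharing = translation invariance).
kernel = np.array([0.25, 0.5, 0.25])
n = 8
W = np.zeros((n - len(kernel) + 1, n))
for i in range(W.shape[0]):
    W[i, i:i + len(kernel)] = kernel   # same weights, shifted

x = np.arange(n, dtype=float)
assert np.allclose(W @ x, np.correlate(x, kernel, mode="valid"))
print(W)
```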
[Question] If this is true, what happens if you try to learn the max with a network with many hidden units, not just four but hundreds?

[Answer] I see. In fact I tried; it doesn't work, but maybe it is my neural network's fault. What does work is a recurrent neural network, which is another story: it means you inject the data sequentially.

[Question] That may be surprising, because people usually say that if the number of units is less than or comparable to the amount of data, then there can be lots of bad local minima, but as you take many more units you no longer have these bad local minima.

[Answer] If you add many layers you augment the number of... sorry, I did not understand the question.

[Question] It could still be two layers, but many units, many hidden units in that second layer.

[Answer] Many units; you mean increasing the dimension of the container, in that flavour. I don't have a definite answer for that; in fact I have no definite answer, so don't ask any further questions, by the way. I don't have a definite answer, but I would say no: you just increase the number of possibilities.

[Question] Let's ask it this way: most people would say that it should work.

[Answer] In this case you will again go to the centroid, but faster. And if the centroid for the maximum is not good, be careful: if you try to find the maximum of positive numbers, then it converges; but if you do it with signed numbers, then there is the problem. That is my guess for the question: if you increase the dimension of the system, what I feel is that it will still not converge; you just add more unknowns and converge to the same place.

[Question] To converge you require infinite numerical precision, because I could always give you two numbers, seven point zero zero zero zero zero zero one, and you can never get this exactly. For any implementation, any fixed number of neurons, you can always find two numbers which are so close together...

[Answer] No, no, it is not a question of numerical accuracy; my problem is not a question of numerical accuracy. If I show you this: there is no neural network trained by gradient descent that solves the max problem here; it just doesn't exist. If I ask for the maximum of 32 numbers and it gives me this error, I will say it is not a problem of numerical instability, it is a problem that it does not converge.

[Question] But there are only a finite set of numbers...

[Answer] No, no. I train on numbers, something very simple, but it has nothing to do with numerical instability. It is my bet that it does not converge; it does not find the magic.

[Question] It does not have to be exact; there is just a loss function. How close did it get? What does it mean, to be close to the max?

[Answer] Well, you have the standard deviation of the data, and the error should be much less. You don't look at the accuracy of the weights; you look at the accuracy of the answers of your neural network when you test it with real vectors, because two networks may be very far apart in weight space and still give very similar answers.

[Question] What is the loss function for the max of the numbers?

[Answer] The square of the difference between the true max and what the network gives.

[Question] But why... if I compare two numbers that are super small, you don't get a small error, but if you compare two numbers which are super large but still close together...

[Answer] The numbers are between minus 1 and plus 1; it does nothing complicated, I was not cheating like that. I can, but no: I take 32 random numbers, real numbers, uniform between minus 1 and 1. There is no trick here; the test vectors are uniform too. If I look for the maximum of two numbers, it converges to something that gives a correct answer, and if I were rich enough to let it converge for 100,000 years, it would go to arbitrary precision; but I am not asking for that, and I am not rich enough, by the way. I did run it longer: these points are stable, completely stable. You can wait, you can multiply the number of iterations by ten; they are stable, this is the final result, they don't move any more. They move
only because the learning rate is not zero: they just do the classic zigzag around the value, nothing special. But maybe there is a bug; maybe what I think is completely wrong. That is a good subject for research.

[Question] Yes, it was strange for the maximum...

[Answer] I am not trying to convince you, because I am not fully convinced myself, by the way; I feel the same.

[Question] Is it the same as asking whether two numbers are equal?

[Answer] No, no, it is not. I have a function: I enter two numbers and it gives me an answer. If I enter 0.45 and minus 0.3 and it gives me 0.1, I consider that the answer is not acceptable; if it gave me 0.40 or 0.44, I would be very happy. Here the error is really big, really big, and it does not diminish. And if you select another initialization, it is not like in your case, where any initialization goes to the good local minimum: here you take a new initialization and you reach another minimum which is not good either. No, no, I did not explain it well, sorry; I think you are tired. We can continue, but do you want to see the last slide?

Oh, the last slide. So I show the last slide, just to close. I gave the hint, by the way, and it was very simple: the monks just use the axiom of choice. And I gave a slightly wrong hint, because I said that you have the whole sequence of weather and predictions; in fact you only need the whole sequence of weather. With the axiom of choice, the monks can figure it out as follows. A monk takes the sequence of past weather and fills it up to doomsday: he assumes doomsday is tomorrow, and considers that all the weather after doomsday will be bad weather, so he puts zeros. Now consider the relation between two infinite sequences: two sequences are related if they differ in only a finite number of positions. By the axiom of choice, every equivalence class has one representative, and the monks all predict according to the representative of the class of their completed sequence. Since the true sequence of weather is in the same equivalence class as the completed sequence, they all choose the same representative every day; therefore the prediction will fail only a finite number of times, and God will be happy.

Yes, but the trick is that we assume the monks are able to manipulate infinite sequences. It was a toy example to show that information theory alone is not a complete theory: you also need to take into account the computational abilities applied to the data. It was a toy model, exactly; don't ask me questions about this. OK, sorry for being so long.
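For completeness, a compact formal statement of the monks' strategy (my phrasing of the standard axiom-of-choice argument sketched above):

$$ x \sim y \iff |\{n : x_n \neq y_n\}| < \infty, \qquad x, y \in \{0,1\}^{\mathbb{N}}. $$

Fix, by the axiom of choice, one representative $r(C)$ in each equivalence class $C$. On day $n$, having observed the weather $w_1 \dots w_n$, predict $w_{n+1} = r([\,w_1 \dots w_n\,0\,0\,0\dots\,])_{n+1}$. The zero-padded sequence is $\sim$-equivalent to the true weather sequence $w$, so the same representative is used every day, and since $r([w]) \sim w$, the predictions disagree with $w$ only finitely often.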