Okay. So I wanted to cover two topics today related to Markov chains. The first one is LAMP (linear additive Markov processes), which is actually a Markov chain extension, and the second one stays within the classical model of Markov chains, but it's about how to learn a Markov chain in a particular setting. They're both web-motivated.

So let me start with LAMP. There's a paper referenced here, and I can send the slides later if anybody would like to see more details. We're interested in sequences of actions, so user actions or item consumptions, and we're saying that the output at a particular time step can depend on what's happened earlier in the history. As an example, let's say we have a universe of science fiction novels that we can recommend to people on a website like Amazon. We may observe that people who read this first book are likely to read Ender's Game, and likewise we have certain statistics for the other books. Given a sequence of consumptions, we'd like to know what a likely next element is, or, more practically, what element we should recommend to this user given the consumptions they've made so far. This is a classical problem in recommendations. We often wind up augmenting the representation with other information: features about the user, features about the timing of the prior consumptions, maybe details about how engaged the user seemed to be, the session structure, and so forth. We're going to simplify in this setting and talk about something more stylized that looks just like the sequence here.

The simplest approach would be to use a first-order Markov model. In this setting, we'd say that the most recent letter, the letter C right before the question mark, is the most representative of the user's current state, so it should be the most predictive. We'll ignore the rest, and we'll write the likelihood of a particular element following C, maybe based on count statistics like the ones I show here. We can normalize these counts to produce the transition matrix of the Markov chain, and then we have an equation for the evolution of the process.

Generally, though, we think that looking at more history than the single most recent element should provide better models, and in fact there are a lot of approaches to using these long-range dependencies. Maybe the earliest and most pervasive is to use higher-order Markov processes, but these days everybody's talking about deep network sequence models, which learn neural networks to control the evolution of the process. There's also a field around point processes, which further allow you to predict the time of the next consumption and to make statements about the likelihood of a consumption occurring within a particular interval, and many more.

So let's talk about higher-order Markov processes. In this case we have a k-th order Markov process, which is allowed to look at the last k states; based on the k previous characters, the process determines the distribution over the next symbol. The parameter space of this model blows up exponentially as a function of k. So in many cases people instead use variable-order Markov processes, which behave like a first-order process in most situations, but if further data is available after a particular prior symbol, may decide to keep two elements of history instead.
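To make the count-based first-order baseline concrete, here's a minimal sketch in Python. The sequences and symbols are made up for illustration; the point is just that we count observed bigrams, normalize each row into a distribution, and then predict from the most recent element alone.

```python
from collections import Counter, defaultdict

# Toy consumption sequences over items A, B, C (made up for illustration).
sequences = ["ABCAC", "BCACB", "ACBCA"]

# Count observed bigram transitions.
counts = defaultdict(Counter)
for seq in sequences:
    for prev, nxt in zip(seq, seq[1:]):
        counts[prev][nxt] += 1

# Normalize each row of counts into a transition distribution.
transition = {
    state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
    for state, row in counts.items()
}

# First-order prediction: condition only on the most recent element.
def next_distribution(history):
    return transition.get(history[-1], {})

print(next_distribution("ABC"))  # distribution over what follows 'C'
```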
Nonetheless, if at any point you're trying to track an order-D dependency, then you require space that's exponential in D, even in the variable-order model.

Deep neural nets have been very successful for sequence modeling, starting with natural language tasks like translation, and they're now used all over the place, including in the specific recommendation problem I introduced at the beginning. There are a couple of architectures that are most well known. LSTMs have a long history. The primary target of these architectures is the following issue: if you imagine propagating a gradient backwards through a series of recurrences in a network, the gradient may shrink at each step. Over time, in much the same way that higher-order Markov chains require exponential data, backpropagation would also require exponential data, because the gradient vanishes exponentially and you need to see enough examples to make a significant change based on a longer-range dependency. So a number of architectures have been proposed to get around this vanishing-gradient problem, and LSTMs and GRUs are probably the most successful right now. LSTMs have the slightly complicated architecture you can see here on the screen, which controls the flow of errors back to earlier elements and allows the idea that, depending on the gate settings for the current instance, an earlier element, potentially far back in the sequence, can flow directly through the network to influence the current decision; the gradient can similarly propagate backwards without the vanishing issue. It's a heuristic attempt to make that happen. It's slow to train, it can be not so robust in training, and it tends to require a lot of data.

For the approach we're going to talk about today, I want to introduce a technique we're going to use that solves a much simpler problem, based on Herb Simon's copying model from the 1950s. In this setting we have a sequence and we'd like to predict the next element, but rather than saying that a previous element in the sequence will influence our prediction, we say instead that the next element will simply be copied directly from an earlier element. So if the cell with the question mark is the one we'd like to infer, it may be a letter D, and if so, that would be because it was copied from one of the three earlier occurrences of the letter D. The model for solving this problem, inspired by Simon's copying model, is to learn a weight for each previous position in the history: there's a weight associated with copying the most recent element, another weight for the second most recent element, and so forth. Now the probability of consuming D next is just the sum of the weights of all the previous occurrences of D. The generative version of the process says: I look back into the past, select a distance from this weight distribution, and then copy whatever is that many steps back. We'll use this as a kind of black box inside the LAMP process that I want to introduce.

So we want to do exactly the same thing. Here's an example of some states representing restaurants that the user has visited, and we're trying to predict what the next restaurant visit will be. In a first-order setting, we would decide based on Ruth's Chris. In a variable-order setting, we might say that the sequence Morton's, Shakey's, and Ruth's Chris together leads to a particular prediction.
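Going back to the copying model for a moment, here's a minimal sketch of the prediction rule just described. The recency weights here are illustrative rather than learned: the probability that the next element is a given symbol is the total weight of the past positions where that symbol occurs.

```python
from collections import defaultdict

# Recency weights: w[0] is the weight for copying the most recent element,
# w[1] the second most recent, and so on. These values are made up; in
# practice they would be learned from data.
w = [0.5, 0.25, 0.15, 0.06, 0.04]

def copy_model_distribution(history, w):
    """P(next = x) = sum of weights of past positions holding x.
    (Assumes len(history) >= len(w), so the weights sum to one.)"""
    dist = defaultdict(float)
    for i, sym in enumerate(reversed(history[-len(w):])):
        dist[sym] += w[i]  # the element i steps back contributes w[i]
    return dict(dist)

print(copy_model_distribution("DADBD", w))
# 'D' gets w[0] + w[2] + w[4] because it occurs 1, 3, and 5 steps back.
```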
In this new model, we say that based on the past history we use a recency distribution to choose how many steps back into the history to look, and then, based on the state we select there, we pick a next element according to that previous state. So the definition says that a LAMP process is defined by a recency weight vector w and a single stochastic transition matrix M. The way the process works is that the probability of choosing a particular element j as the next state, given the history so far, is a sum over i of the probability w_i that I step from i steps in the past, times one step of the Markov process taken as if I were in the state from i steps ago, x_{t-i}; that is, P(X_{t+1} = j | history) = sum_i w_i * M[x_{t-i}, j].

One can imagine that the user has certain facets of their taste or of their personality, and those represent different hats the user wears. Sometimes the user's in the mood for a dramatic novel, sometimes for something lighter, sometimes for poetry. Depending on the first probabilistic decision, w_i, we determine which hat the user will be wearing by determining which point in the past we emulate, and then, based on where they were at that point in the past, we take one step forward in the Markov chain. So rather than any exponential growth in the parameter set as we look further and further back, the total complexity is just the number of non-zeros of the transition matrix plus k values for the weight distribution, which determines, for the particular task, how much we focus on recent behavior versus how far back into the history we're willing to look.

To learn a LAMP model, rather than just learning the transition matrix, we have to jointly learn both the matrix and the weight distribution w. We do the learning through alternating minimization; the joint learning problem is non-convex, and we don't know of any way to do it faster. There are details on the optimization in the paper.

Okay, so maybe we can do an example here. Here's a Markov process with some transition probabilities that aren't shown, and we're currently in state A. From state A we follow this edge to state C. Now we're in state C and our history is A, C. Now we make a decision about which past state to move from: maybe we flip the coin and decide to move from one element back in our history, moving from A again, and we follow the edge to B. Now our history looks like A, C, B. This time maybe we decide to move from B (it's typically most likely to move from the most recent state), and we follow this edge to state E, so we have A, C, B, E. Now we choose another element from the history distribution: we choose C and move forward to state G. Then we move from A, and move to state B. That last transition is one that had zero probability in the underlying Markov chain, but under the LAMP model it has non-zero probability.

Okay, so with that in mind, we can quickly talk about how expressive this model is. In terms of parameter complexity, we've taken the transition matrix and added just a single weight vector. Even so, it's easy to see that a LAMP process with access to K elements of the history cannot be approximated by a (K-1)-st order Markov process, even though the (K-1)-st order Markov process has unreasonably many more parameters.
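Before moving on, here's a minimal sketch of the LAMP next-state distribution just defined: a recency-weighted mixture of single Markov steps taken from past states. The transition matrix, weight vector, and integer state encoding are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: 4 states, recency window k = 3.
M = np.array([                   # row-stochastic transition matrix (made up)
    [0.10, 0.60, 0.20, 0.10],
    [0.30, 0.10, 0.40, 0.20],
    [0.25, 0.25, 0.25, 0.25],
    [0.50, 0.20, 0.20, 0.10],
])
w = np.array([0.7, 0.2, 0.1])    # w[i]: probability of stepping from i+1 steps back

def lamp_next_distribution(history, M, w):
    """P(X_{t+1} = j | history) = sum_i w[i] * M[history[-(i+1)], j]."""
    dist = np.zeros(M.shape[0])
    for i in range(min(len(w), len(history))):
        dist += w[i] * M[history[-(i + 1)]]
    return dist / dist.sum()     # renormalize while history is shorter than k

def lamp_walk(start, M, w, steps):
    history = [start]
    for _ in range(steps):
        p = lamp_next_distribution(history, M, w)
        history.append(int(rng.choice(M.shape[0], p=p)))
    return history

print(lamp_walk(0, M, w, 10))
```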
However, a LAMP process with K elements of history can be captured entirely by a K-th order Markov process.

Okay, so let's take a look at how this LAMP distribution evolves. At time zero we imagine that we're in some starting distribution pi_0 over the states of the Markov process. At time one we take one step forward by applying the transition matrix M, and now we're in a new distribution, pi_0 M. At time two, with some probability we take a step forward from the current state, in which case we will be in pi_0 M^2, but with the remaining probability we take our step forward from the initial state pi_0, in which case we will again be distributed as pi_0 M. So at time two we're actually a mixture over the distributions pi_0 M^2 and pi_0 M. In the next step the same thing happens again: we can get to pi_0 M^3 if we again flip a coin that moves us forward from the most recent state, we can get to pi_0 M^2 in two ways, and we can get to pi_0 M again by choosing to copy from the start state.

So this suggests that the random variable we really want to study is the exponent of the matrix at time t; call it e_t. The evolution of e_t says that we pick an exponent from some previous time step according to the weight vector w and then add one to it, because we take one step of the walk from there, which applies the matrix one more time. This is similar to a process we've studied elsewhere, in work that's available on arXiv.

Okay, so what's the steady state of this process? If we imagine that it's a k-th order LAMP process, so our weight vector has k elements and we're allowed to look up to k steps back, then we can do a simple batch-wise analysis: if we're at some time step t, look at the last k elements and see what the minimum exponent is; at the next step, t+1, we must be at least one greater. So every time we take k steps, the minimum exponent we see must increase by one. Over time, if we take k times as many steps, we'll be at least as close to the steady state of the underlying Markov chain as the first-order Markov process would be after a full t steps.

The first corollary is that LAMP actually has the same steady state as the first-order Markov process with transition matrix M, but the dynamics can be quite different: the flows we observe at that steady state may not look anything like the flows we would observe if the process were actually a first-order Markov process.

We can tighten this analysis a little bit by looking at the exponent e_t at time t and observing that e_t was generated by choosing something from the history distribution, jumping back that many steps, and taking one step forward. So we can imagine a backwards chain linking from time t to some t - w_1, and from there to t - w_1 - w_2, where each of these w values is chosen from the k-element weight distribution. This is a renewal process, and by the strong law of large numbers for renewal processes, this gives us a characterization of the number of backward steps; the number of backward steps is exactly the number of forward steps in the Markov chain that we've taken by time t, and that gives us a bound on how close we are to the steady state.
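Here's a small sketch of that exponent random variable's evolution: the distribution of e_t is a recency-weighted mixture of the distributions of earlier exponents, shifted up by one. The weights are illustrative.

```python
import numpy as np

w = np.array([0.7, 0.2, 0.1])  # illustrative recency weights, k = 3

def exponent_distributions(T, w):
    """dist[t][e] = P(e_t = e), where e_0 = 0 (we start at pi_0)."""
    k = len(w)
    dist = [np.zeros(T + 1) for _ in range(T + 1)]
    dist[0][0] = 1.0                      # at time 0 the exponent is 0
    for t in range(1, T + 1):
        for i in range(1, min(k, t) + 1):
            # With weight w[i-1], copy the exponent from i steps back and
            # add one (one more application of the transition matrix M).
            dist[t][1:] += w[i - 1] * dist[t - i][:-1]
        dist[t] /= dist[t].sum()          # renormalize while t < k
    return dist

dist = exponent_distributions(20, w)
print("E[e_20] =", sum(e * p for e, p in enumerate(dist[20])))
# The mean exponent grows at rate roughly 1 / (mean jump-back distance),
# matching the renewal-process argument in the talk.
```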
So in expectation, the number of steps we jump back at each point is the mean of the weight distribution, the distribution over how far to jump into the history. That gives us the expected number of forward steps, including in the setting where the weight distribution is unbounded and we're allowed to look arbitrarily far back into history, and it also gives us a way to show concentration bounds. So under LAMP we have a fairly strong understanding of the way mixing works, based on a modification of the way mixing works for first-order Markov processes, but we allow this arbitrary dependence on history, and the hope is that the model will be more accurate at actually predicting the evolution of sequences. That we can test empirically.

We've looked at some data sets: from Wikispeedia, from Reuters, from Last.fm, and from an old location-based service called Brightkite. These are the results in terms of perplexity, which measures the uncertainty remaining in the guess about the next element of the sequence, and we compared against a series of baselines. The LAMP model performs generally pretty well. The language-model-based approaches we looked at are prone to overfitting; we tried a few different smoothing techniques to address the overfitting, and they work pretty well on text data but not so well on behavior data.

We also take a look at the weight distribution that the LAMP learning produced. In some data sets, for instance Brightkite, which has check-ins to particular locations, the LAMP model has learned to use past visits pretty strongly, not just the most recent visit. In Wikispeedia, which is based on navigation paths through Wikipedia, the most recent state is much more important than the other states, and so the LAMP process devolves gracefully to something quite close to a first-order Markov process.

I talked about deep recurrent models like LSTMs. We compared LAMP to LSTMs on these data sets as well, using settings for the LSTM that had been published previously, and we found that with sufficient data the LSTM is much more expressive and is able to outperform LAMP. For some of the smaller data sets, like Brightkite, LAMP does significantly better. For others, like the Last.fm music consumption data, the performance is similar. For text data, where there's real statistical signal both in the vocabulary and the document so far and in what's going on in the current sentence, the LSTMs perform quite a bit better, and as the LSTMs train longer on certain data sets they become better and better and are able to outperform LAMP. But generally, with the same computational effort, LAMP, which builds much stronger assumptions about the data into the model, typically does much better for the same amount of training time.

Okay, so that's part one, on LAMP. Maybe this is a good time to pause in case anybody has questions or comments about the LAMP process. Okay, sounds good. Are the slides visible? Great.

So we're going to change gears a little bit here and talk about some work from the WSDM conference in 2015 on what we call reverse-engineering a Markov chain. This is joint work with Ravi Kumar, Sergei Vassilvitskii, and Erik Vee.
So in data analysis we really like Markov chains, for the same reasons we like them in LAMP: they're very simple, they capture a lot of interactions, the theoretical story is clean and easy to use, and they're highly extensible. They've been used in the web setting in a lot of different ways, including PageRank.

We're going to talk here again about chains related to recommendations. Imagine that I do a search on the web for some topic, maybe I search for the topic of Markov chains itself; I get some results, I follow a link to YouTube, see some videos, and I pick one. I watch the video, and when the video ends I'm shown a series of recommendations for the next video I may choose to watch on the topic. So one could imagine taking a walk through the state space of YouTube videos in this way, and in fact a large number of YouTube watches are derived from these watch-next recommendations. So we have a process that looks like this: I'm on a current video, "Markov chains, part one," and I'm recommended a sequence of other videos I may want to jump to; I jump to one of those, I watch it, and then again I have a new set of recommendations for where to go next. At some point I may terminate this process and jump to something that looks more like a random state, or a state derived from a search, or something else. But while I'm in the flow of going from video to recommended video, I evolve according to something that looks like a first-order process.

As a user of this system, I actually have visibility into two parts of it. One is that I see the recommendations themselves: I can go to YouTube and take a look at what's recommended in response to a particular video; I see each recommendation, and I know its position in the list. The other thing I have is the view counts for each video, so I know the stationary distribution of this evolution. This is a setting that comes up pretty often; in the example we looked at, the items are videos and the stationary distribution information comes from the view counts.

In that setting, we'd like to know why it is that certain videos have a huge number of watches in the system. It could be two things. One is that it could simply be a phenomenal video, and for this reason word got out, everybody knows, and they go and watch the video, maybe recommend it to a friend, and so on. The other is that for some reason the process might have featured a certain video all over the place, including as a recommendation on videos that are watched very frequently, and people just watch it because it's there, but it might not be any good. So we'd like to tease apart the distinction between things that are truly high quality, which is reflected in their having been watched a lot, and things that have a high view count due to their position in the graph. That's one motivation for understanding what the actual underlying process really looks like, because it will give us some sense of how people are choosing one recommended video over another.

Okay, so that motivates this problem of inverting a Markov chain. The problem is: we're given the stationary distribution, and we want to find the Markov chain that generated it. Ordinarily the inference is the opposite: ordinarily we're given the Markov chain and we'd like to find the stationary distribution, which we can do through matrix
techniques like power iteration. In this setting, though, we want to flip that around. We're given a graph G, which we can observe in the YouTube setting; we're also given a steady-state distribution pi, which we can observe as well; and we'd like to output the transition matrix M that generated this steady-state distribution over the graph. The graph essentially represents a constraint: edges that are missing from the graph must be zeros in the transition matrix, but edges that are present in the graph can be given any weight.

Of course, it's not always feasible to do this. I can exhibit a chain here that maybe is not even ergodic, so it's not possible to establish transition probabilities that will yield the given steady state. We define a directed graph as consistent if, for the target steady state we're asked to produce, there is a flow on the graph that preserves that steady state. For example, any strongly connected graph with self-loops is consistent. Consistency is a property defined in the presence of a target steady-state distribution, and the theorem is that for any consistent graph there is a Markov chain with pi as its stationary distribution.

The problem is under-constrained. We have these n constraints on the steady state, but we could have a dense matrix or a fully connected graph with an enormous number of variables; certainly we expect, in this recommendation case, a hefty handful of recommendations from each vertex. So the number of variables we have control over is typically much larger than the number of constraints, and we have to address this in some way. There is a classical approach from 2003, due to John Tomlin, saying that we could simply write this as an optimization problem with a maximum-entropy objective: since we simply don't know anything beyond these constraints, we pick the solution with maximum entropy.

Today we're going to take a different approach. We're going to limit the degrees of freedom of the process, both because we think this actually captures what happens in many settings, and because by learning in this framework we'll be able to induce some other usable information about the vertices of the graph. We will say that every vertex has a score, intended to capture some element of the quality or the attractiveness of the vertex, and the Markov chain will be expressible as a function of the scores. So if we imagine some transition probability from a to c in the chain, we'll say that it depends on the score s_c of the destination vertex c, and possibly also on some properties of the edge, w_ac, from a to c; for example, maybe the w_ac edge weight encodes the fact that c is ranked at the top of all the recommendations when you're on page a.

In the simplest variant of this setting, which we'll spend some time looking at, we can say that the transition probability to a particular destination is proportional to the score of the destination, and the constant of proportionality is determined by what options are available: any edge has probability equal to the score of its destination divided by the sum of the scores of all the alternative destinations from that start point. So already this suggests that the transition
probabilities to a particular vertex are highly contextual. In this example we've got our same graph, from a to either b, c, or d, and I've assigned some scores here. Under these scores, the probability of going from a to c is maybe nine percent or so. On the other hand, if I introduce a new node f which has only c and d as options, then the chance of going from f to c is 91 percent, because the highly desirable node b is no longer present to siphon off traffic.

Okay, so just to formalize what the result actually shows: we're saying that the transition probability depends on the score of the destination and the weight of the edge. The results hold when f is an arbitrary function of the score of the destination and the edge properties, and the transition probabilities are obtained by normalizing the different values of f into a distribution. For our results to hold, we're also going to require some sanity-check properties of f. It should be the case that as the score of a vertex increases, f increases continuously; we don't expect any thresholds in behavior where increasing a score causes a discrete jump in f. We also expect that as the score gets better, f gets better: as the score goes up, the node should become more desirable. And finally, we expect that as we increase the score unboundedly, we can make f arbitrarily large, so that, holding all other scores fixed, with unbounded score the destination vertex becomes more and more desirable until the probability of transitioning to it approaches one. These seem like reasonable sanity-check conditions on the functions f we might consider, and in particular the one we looked at earlier, where the transition probability is just proportional to the scores, has these properties trivially.

So we'll actually take a look at this simple version where transitions are proportional to scores. We can use the edge weight in a few different ways. We can use it for rank information, as I mentioned. We can also use it to encode the similarity between the two nodes: given that you're on vertex a, which is about Markov chains, we're recommending b, c, and d, but c might be particularly desirable because it's also about Markov chains, whereas b moves into some other area of stochastic processes, and so we might use that w_ac edge weight to encode that kind of similarity as well.

So overall, this is our target: we want to know whether items are popular due to high scores or due to their location in the graph. The main theorem we have is the following. Given a consistent input, G and pi, and given a function f that controls the map from scores to the transition matrix, there exists a unique set of scores, one for each vertex, such that pi is the stationary distribution induced by the Markov process you get by applying f to those scores, and these scores can be found in polynomial time. Unique here means unique up to scaling, so there is one free scale parameter, and the stationary distribution we actually match up to plus or minus epsilon.
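Here's a minimal sketch of that score-proportional transition model, with a hypothetical adjacency list and made-up scores; it reproduces the contextual effect from the example, where c's transition probability depends on which competitors share the recommendation list.

```python
# Out-neighbors of each vertex (hypothetical graph) and made-up scores.
graph = {
    "a": ["b", "c", "d"],
    "f": ["c", "d"],
}
scores = {"b": 10.0, "c": 1.0, "d": 0.1}

def transition_probs(graph, scores, u):
    """P(u -> v) = s_v / (sum of scores of u's out-neighbors)."""
    total = sum(scores[v] for v in graph[u])
    return {v: scores[v] / total for v in graph[u]}

print(transition_probs(graph, scores, "a"))  # c gets 1/11.1, about 9%
print(transition_probs(graph, scores, "f"))  # c gets 1/1.1, about 91%
```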
So now we can move into a little bit of discussion about how this works. We say s are the scores, and the pi I'll use here isn't a permutation despite the notation: it's a distribution of probability mass, the current distribution of probability mass over the vertices of the Markov chain. So here's an example. We'll say that q_i(s) is the following quantity: suppose I begin with my probability distributed according to pi; then q_i(s) is the new mass at vertex v_i if I take one step from pi using the current scores s. That is, the current scores and the function f induce a transition matrix, and with that transition matrix I can go from pi to some new distribution; q_i(s) is the new mass at vertex v_i under the current score vector s. So in this picture, given the pi I started with in red and given the scores in blue, the q_i values for that score vector s are written in black.

One more definition we'll need: a node is underweight if its q_i value is bounded below its target; specifically, node i is underweight if q_i(s) is less than (1 - epsilon) times its target value pi_i. The algorithm is going to repeatedly increase the scores of nodes that are underweight, and it won't touch the scores of nodes that are not underweight. So in this picture, the yellow node d is underweight, and so we could increase its score.

Okay, so now we can go through the algorithm. The definitions are listed at the top, and the algorithm works as follows. Start out with all the scores equal, say 1/n; this is the score s_i at time zero. Then repeat the following step: for every vertex, if the vertex is underweight, raise its score until we've closed half of the distance towards the target steady state; otherwise, keep the score unchanged. Let me say that slightly more carefully. Because f has these properties of being unbounded as we increase the score s_i(t), if node i is underweight, meaning q_i(s) is less than (1 - epsilon) pi_i, we increase the score continuously until q_i(s) becomes (1 - epsilon/2) pi_i, and then we stop. So if it happens to be underweight at exactly (1 - epsilon) pi_i, we move until we've closed half of the gap; if it was more underweight, we move a little bit more.

This new value of s_i is guaranteed to exist; this is why we need the properties of being monotone, continuous, and unbounded, and we use the consistency of the graph G to say that there's always enough flow for us to push in order to have this outcome. The only step we ever take is increasing an s_i score, so under this algorithm no score ever decreases. And as soon as the q value for a vertex is below its target pi_i, it will never cross over, because the only operations we ever make are either increasing the score of that vertex, but only until it's closer to pi_i without reaching it, or increasing the score of another vertex, which by the properties of f can never increase the mass at v_i. So once a q value is below its target, it always stays below it.
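Here's a minimal sketch of that inversion loop for the score-proportional case. The toy graph and tolerances are made up, and the continuous score increase is replaced by a crude doubling-plus-bisection search; this is a sketch of the idea, not the paper's implementation.

```python
import numpy as np

def induced_matrix(adj, s):
    """Transition matrix in which P(u -> v) is proportional to score s[v]."""
    P = np.zeros((len(s), len(s)))
    for u, nbrs in adj.items():
        total = sum(s[v] for v in nbrs)
        for v in nbrs:
            P[u, v] = s[v] / total
    return P

def q(adj, s, pi):
    return pi @ induced_matrix(adj, s)   # one step from pi under current scores

def invert_steady_state(adj, pi, eps=0.05, iters=50):
    n = len(pi)
    s = np.full(n, 1.0 / n)              # start with all scores equal
    for _ in range(iters):
        for i in range(n):
            if q(adj, s, pi)[i] < (1 - eps) * pi[i]:   # node i is underweight
                # Raise s[i] until q_i reaches (1 - eps/2) * pi_i.
                lo = hi = s[i]
                while q(adj, s, pi)[i] < (1 - eps / 2) * pi[i]:
                    hi *= 2
                    s[i] = hi
                for _ in range(40):      # bisect back down to the target value
                    s[i] = (lo + hi) / 2
                    if q(adj, s, pi)[i] < (1 - eps / 2) * pi[i]:
                        lo = s[i]
                    else:
                        hi = s[i]
    return s

# Toy strongly connected graph with self-loops (hence consistent for any pi).
adj = {0: [0, 1, 2], 1: [0, 1], 2: [1, 2]}
pi = np.array([0.5, 0.3, 0.2])
s = invert_steady_state(adj, pi)
print(s, q(adj, s, pi))                  # q should approach pi coordinate-wise
```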
Okay, so the key lemma for the convergence proof says that there's some explicit value M such that for all vertices i and all times t, all the scores are below this upper bound M. The sketch of the proof: assume by contradiction that there's some set of scores that in fact grows without bound. Then we know these scores must all belong to underweight nodes, because otherwise they would not grow. And we know that not all nodes can be underweight: the total target mass of the underweight nodes must be below one, because the probability has to sum to one overall and the target steady state is a distribution. So the scores that are growing without bound must take all the probability mass from the scores that are bounded, because of the unbounded property of f, and by consistency the demand that they have must be met, which leads to a contradiction: they would move their probability towards one, when we've argued that their probability must be bounded away from one. There are quite a few technical details in getting that proof to go through, which are in the paper.

Okay, so once we have that key lemma, we know there's a bound M, and we've shown that the scores must increase multiplicatively by a factor of (1 + epsilon/2), just by the way the algorithm is constructed. With the bound M, we can show overall that polynomially many iterations suffice to reach an epsilon-approximation of the steady state.

Okay, so that's the theoretical statement and the algorithm. The algorithm is a little bit weird; it doesn't look like a regular learning algorithm. It has these two asymmetries: only the underweight subset of the vertices ever has its scores touched, and we never reduce a score.

With that in hand, we can take a look at some experimental work. The way we do this is to take a data set of empirical transitions from some system. We take the transition graph, the skeleton of the matrix, and the actual steady state observed in practice for that data set, feed them to our algorithm, and ask it to output the scores and transition probabilities. Then we measure whether we're correctly capturing the way the transitions actually work in the real data.

So we look at some navigation paths through Wikipedia; some restaurant data from restaurant queries to Google; the Entree data set, with Chicago restaurant recommendations; and a data set of comedy videos. The baselines we consider are: an algorithm that transitions proportionally to the steady-state distribution, which seems like a pretty reasonable approximation; a uniform approach, where whatever out-edges are present get uniform weight; PageRank itself, where the transitions are proportional to the PageRank at the node; the MaxEnt-based approach from 2003 that we mentioned before; and our steady-state inversion algorithm. In terms of the root mean square error on this prediction task, with the error normalized to one for the popularity baseline, you can see that the inversion model actually does a better job on all of the data sets at predicting the transitions. This is a convergence graph showing, as a function of the number of iterations, the log-likelihood on one of these data sets, and on the right side the RMS error. You can see that it's an iterative algorithm that adjusts weights according to the underweightness criterion, and by about 15 iterations it has converged pretty effectively.

Okay, so that's the end of the talk. Happy to discuss any comments or questions.

Audience: I didn't understand the W matrix; is that also
fixed as a parameter in the second part, the steady-state inversion?

Speaker: Yes. The skeleton of the matrix is fixed, so we know which entries are allowed to be non-zero, but we are not given the values for the entries. The task is to pick values that lead to the target steady state, and rather than allowing full freedom to do that, we require that the values be picked so that they are consistent with an n-dimensional score vector, according to this function f that maps from the scores to the transitions.

Audience: If I understood correctly, the model you propose uses only first-order Markov chains. Have you tried a second-order or higher-order Markov chain, and seen how well it fits in comparison with first order?

Speaker: I think the question is whether, in the first part of the talk, we'd looked at maybe a second-order Markov chain?

Audience: Well, no; if I understood correctly, the model you're proposing is only first order, right?

Speaker: You mean in the second part?

Audience: Yes, it's a first-order Markov chain there. So my question is whether you have tried a higher order.

Speaker: I see. It's an interesting question; we haven't thought about it. The idea would be to consider something that might have n-squared many parameters. In a sense, we're saying that just learning the first-order transition matrix is under-constrained, and we place this additional constraint to use the scores. Learning a second-order model would be even more under-constrained, and we'd need a way to either use the scores of the previous two elements to predict the next one, or maybe change our representation to have a set of first-order scores versus second-order scores, or something like that. I don't have a good intuition for whether the proof technique would go through in that setting; it seems like it would be quite a bit harder.

Audience: Thank you.

Host: Was there one more question? Oh, no, no more questions. Okay, thank you again for your talk.

Speaker: Okay, thank you very much.