Model nonconvergence is a very common problem in applied research using structural equation models or other advanced statistical models. A convergence problem can be a purely technical issue, or it can point to a problem with the model specification itself, but either way these problems are typically hard for a beginner to troubleshoot. The reason they are hard to troubleshoot is that researchers often don't have a firm grasp of what convergence actually is and in which ways it can fail. To understand how to fix convergence issues, we must start by understanding what it means when a model does not converge.

Let's look at the symptoms first. Why should an applied researcher care? When you run Stata or any other statistical software, you might get output where the computer just goes on forever, printing something called the likelihood iteration after iteration, and never prints results. You could leave it running for days or weeks or months, and unless there is a power outage, the computer will just keep printing the likelihood and never get anywhere. Another thing you might see is error messages, for example about derivatives or about the Hessian: perhaps the Hessian cannot be calculated. Or you might get warnings rather than errors related to the Hessian, printed between the iterations of the maximum likelihood estimation. Or you might get results, but with "not concave" or "backed up" notes in the iteration log. Can you trust the results when you have that kind of notification or warning there?

Nonconvergence is a major problem in the sense that if your model doesn't converge, you generally cannot trust the results. You might get some results, but you don't know whether those results can be trusted at all. There are a couple of exceptions when nonconvergence might not be a problem. The exception I have in mind has three elements. First, you must really understand the reason why the model doesn't converge.
So you understand what the problem is, and you decide not to care about it. Second, you must understand the consequences of nonconvergence: if the nonconvergence only affects one parameter that you're not interested in, you might decide to go with the nonconvergent model. That leads to the third condition: you must be able to say that the consequence of nonconvergence in your particular case has no implications or relevance for your research question. If these three conditions are not all true, you generally should not trust any results from a nonconvergent model.

Let's look at what nonconvergence is in more detail. The symptoms of nonconvergence include a couple of things. When you look at the output, you might see explicit warnings; Stata, for example, tells you that the model has not converged, which is a pretty good indication that there is a convergence problem. Another thing you often see is missing standard errors: a period is Stata's marker for a missing value, so a variance estimate shown with a period has a missing standard error. Another thing you might see is extreme estimates, where some estimates are hundreds of thousands of times larger than any other estimate; the ballpark of some estimates is simply far off from all the others. Then there are the "not concave" messages, or generally any other messages about the Hessian matrix. I'll talk about the Hessian matrix in more detail in another video, but it is central to understanding some of the nonconvergence problems.

To understand the disease that causes these symptoms, we first need to understand what convergence is. We can't understand nonconvergence unless we understand what it means for a model to converge. Here is my list of four things that convergence requires.
A model, or a set of estimates, or an optimization procedure, however you like to say it, converges if four conditions are true. I have written these in terms of maximum likelihood estimation, but they apply to any numerical optimization or numerical estimation technique. First, the estimates must exist for the model in principle. This is statistical or mathematical identification: some models are impossible to estimate. For example, from one correlation you cannot estimate two causal effects; such a model would not be identified. So, does a solution even exist? Second, the estimates must exist for your dataset. It is possible, for example, that the sample size is so small, or that your sample happens to be unusual in some other way, that the estimates don't exist for that sample. They would exist for the full population, but not for that sample. Third, there are all kinds of computational problems. The way the computer works is that it starts with some initial guess of what your parameters might be, then adjusts that guess to make the likelihood as large as possible, and this can fail in a number of different ways. Finally, it is possible, though not that likely, that the convergence check fails. Your computer may tell you that the model has not converged even if the first three conditions are true; we will see this in our conceptual example in a moment. The inverse is also possible: the program may declare that the model has converged even though it has not actually found the optimal solution to your problem.

Let's take a look at the analogy of climbing a mountain. Maximum likelihood estimation, and any other numerical estimation, is quite often explained using this analogy.
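To make the identification condition concrete, here is a small Python sketch of my own (a toy example, not from any particular software): two "effect" parameters that enter the likelihood only through their sum, which is the same situation as trying to get two causal effects out of one correlation. The data and function names are made up for illustration.

```python
import math

# Toy data; the model says each observation has mean a + b.
data = [1.9, 3.2, 2.7, 3.1, 2.6]

def log_likelihood(a, b, sigma=1.0):
    """Gaussian log-likelihood where a and b only appear as a + b."""
    mu = a + b
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (y - mu)**2 / (2 * sigma**2) for y in data)

# Every (a, b) pair with the same sum is equally good: the likelihood
# surface has a flat ridge instead of a unique peak, so the model is
# not identified and no optimizer can pick one "best" pair.
print(log_likelihood(1.0, 2.0))
print(log_likelihood(2.0, 1.0))  # exactly the same value
print(log_likelihood(0.0, 3.0))  # and again the same value
```

Because the data can never distinguish a from b, this failure exists for every possible sample; that is what separates mathematical non-identification from the sample-specific problems discussed next.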
The idea of maximum likelihood estimation is that you start somewhere at the base of the mountain, you move left and right, and your task is to find the peak. For that to work, certain conditions must hold. First of all, there must be a unique peak. If there is no single highest point, for example because there is a flat area at the top, we say that the model is not identified. The same applies if there are multiple peaks of equal height: one point is exactly as high as another, so we cannot say that we have found the highest point, because there is no single highest point.

The second problem is empirical under-identification. It is possible that the peak exists for the mountain, that is, the maximum of the likelihood function does exist, but it does not exist for your sample. Say we take a sample of this mountain, so that some parts are unobserved, and the peak happens to fall in an unobserved part; instead we observe two equally high points on either side of the peak, and we cannot tell which one is higher because they are equally high. There is then no unique peak to be found with these data. If we collect more data and are able to uncover the true peak, the problem goes away. Empirical under-identification is typically a sample problem, though it can also occur for certain populations or combinations of population values; that is not as common as the sample problem.

Then there is the problem of numerical optimization. This is perhaps the easiest to troubleshoot, and one of the more common, easily addressable issues. We must actually find the peak.
It is possible that the peak exists and can be found using these data, but there is still the task of actually finding where it is. The computer tries to find the peak, the maximum of the likelihood function, starting from an initial guess. Say we start at one point and apply the rule "always climb uphill". We go uphill, step by step, and we find the peak. This algorithm works well when we start from that side of the mountain, but what if we start from the other side? We move up a bit, again and again, and then we reach a flat area. On a flat area there is no uphill to climb, so the rule does not tell us where to go next, and the optimization fails.

These kinds of simple optimization problems can sometimes be solved by using another optimization algorithm, another rule in this context. For example, we might have a better rule: look around, move a bit in different directions, and see where the ground starts to rise again. Using the rule "look around and move toward higher ground", we could find the peak. So it might be an algorithmic problem that can be fixed by switching to another algorithm. I sometimes get models to converge this way, but this is perhaps not the most common way of fixing these optimization problems.

The more common problem is that we start so far away that we can't even see the peak. The mountain is so far away that we can't see where it is, and if we can't see the mountain, it is impossible to find the peak. This is called the starting value problem.
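The two climbing rules can be sketched in a few lines of Python. This is my own toy illustration, not the algorithms any particular software uses: a one-dimensional "mountain" with a flat ledge on the way up, a naive always-climb-uphill rule that stalls on the ledge, and a look-around rule (with a made-up probe distance) that gets past it.

```python
def elevation(x):
    """Toy mountain profile: a slope, a flat ledge, then the peak at x = 3."""
    if x <= 1:
        return x
    if x <= 2:
        return 1.0               # flat ledge: the slope is zero here
    if x <= 3:
        return 1.0 + (x - 2)     # uphill again
    return 2.0 - (x - 3)         # downhill past the peak

def climb_uphill(x, step=0.1, n=200):
    """Naive rule: move only if the very next step is higher."""
    for _ in range(n):
        if elevation(x + step) > elevation(x):
            x += step
        else:
            return x             # no uphill in sight: the rule gives up
    return x

def climb_look_around(x, step=0.1, probe=1.5, n=200):
    """Better rule: if stuck, probe further ahead for higher ground."""
    for _ in range(n):
        if elevation(x + step) > elevation(x):
            x += step
        elif elevation(x + probe) > elevation(x):
            x += probe           # jump across the flat ledge
        else:
            return x
    return x

print(climb_uphill(0.0))        # stalls on the ledge, well short of x = 3
print(climb_look_around(0.0))   # reaches the neighbourhood of the peak
```

Switching algorithms in real software is of course more sophisticated than this, but the logic is the same: a different rule for choosing the next step can succeed where the current one stalls.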
The starting values, the initial point from which we start to climb the mountain, must be somewhere in the vicinity of the actual peak; otherwise the computer is very unlikely to find the peak within a reasonable time. That is the starting value problem.

Another problem we might face is that we must know whether we are on the peak or not. How do we know? We must have a rule for that, and a typical rule looks like this: we are on top when the ground is level and curves down in every direction. That works really well for rounded peaks: the top is flat, we can stand on flat ground, and whichever direction we step, we start to go down. The flatness is quantified by the slope, the first derivative, and the slopes along all dimensions of the problem are stored in the gradient vector. If we are in a two-dimensional space, as when we can move north-south and east-west, the gradient vector has two elements: how much does the elevation change when we move a little north, or a little east? The curvature is quantified by the second derivatives, which are stored in the Hessian matrix. The Hessian is a bit more complex: with north-south and east-west directions, we have the curvature going north-south, the curvature going east-west, and the cross curvature going, for example, northeast, so three distinct elements in the Hessian matrix.

When might this rule fail? We might have a mountain with a sharp peak: there is a unique highest point, but the top is not flat ground. If the peak is a sharp angle, it has no defined slope and no curvature, because it is a corner, not a curve.
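The "level ground that curves down in every direction" rule can be written out as a short Python sketch, assuming a smooth two-dimensional mountain of my own invention with its peak at (north, east) = (1, 2). The gradient and Hessian are approximated by finite differences, which is also roughly what software does when analytic derivatives are not available.

```python
def height(n, e):
    """A smooth toy mountain with a unique peak at (n, e) = (1, 2)."""
    return 5.0 - (n - 1.0)**2 - (e - 2.0)**2 - 0.5 * (n - 1.0) * (e - 2.0)

def gradient(f, n, e, h=1e-5):
    """Finite-difference slope in each of the two directions."""
    return [(f(n + h, e) - f(n - h, e)) / (2 * h),
            (f(n, e + h) - f(n, e - h)) / (2 * h)]

def hessian(f, n, e, h=1e-4):
    """Finite-difference curvature: north-south, east-west, and cross term."""
    dnn = (f(n + h, e) - 2 * f(n, e) + f(n - h, e)) / h**2
    dee = (f(n, e + h) - 2 * f(n, e) + f(n, e - h)) / h**2
    dne = (f(n + h, e + h) - f(n + h, e - h)
           - f(n - h, e + h) + f(n - h, e - h)) / (4 * h**2)
    return [[dnn, dne], [dne, dee]]

def at_peak(f, n, e, tol=1e-3):
    """Level ground (gradient near zero) and curving down everywhere."""
    g = gradient(f, n, e)
    H = hessian(f, n, e)
    flat = all(abs(gi) < tol for gi in g)
    # A 2x2 matrix is negative definite iff H[0][0] < 0 and det(H) > 0.
    curves_down = H[0][0] < 0 and H[0][0] * H[1][1] - H[0][1] * H[1][0] > 0
    return flat and curves_down

print(at_peak(height, 1.0, 2.0))  # on the peak
print(at_peak(height, 0.0, 0.0))  # the ground still slopes here
```

The three distinct numbers in the 2x2 Hessian are exactly the three curvatures described above, and "curves down in every direction" corresponds to the Hessian being negative definite.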
The calculation of the slope and the curvature, the gradient and the Hessian, would then fail at that point even though we are on the peak. In the rare instances where you encounter this problem, you may be able to switch to a different convergence check that actually recognizes that you are on top, instead of the computation failing and printing an error message.

Another issue is how close to the peak we must be to say that we are actually on it. Think about a real mountain with a flattish top. There is one unique peak: one grain of sand, one small rock that is higher than all the other pieces of sand and rock. But how would you know that you are standing on that one specific highest spot? Normally we would not require that. We would say that when we are close enough to the peak, when the ground looks flat (it might not be exactly flat) and looks like it curves down in every direction (it might not exactly do so), we declare that we must be on the peak. If we were actually climbing a mountain, we might stand a little way from the true peak and still happily say we are on the peak. In practice we only try to get close, because finding the exact peak would take forever: we could always add more and more decimals to our estimates to make them more precise, but at some point we decide that five decimals is enough and declare convergence rather than chase fifty more decimals of precision.

This relates to something called tolerance. A low tolerance says that we must be very close to the peak before we declare that we have reached it; a somewhat higher tolerance says that anywhere in a wider area counts as being on the peak. By adjusting the tolerance you could even make your starting position fall within tolerance: if the tolerance covered the entire mountain, base included, we would always declare that we are on the peak, which is not very useful. Loosening the tolerance will make your model converge, but it also degrades the quality of that convergence, so adjusting the tolerance should be reserved for when you really know what you are doing. It is possible to get estimates that are bad, where the model has not really converged, but the computer says it has because you made the tolerance so large.

That is the conceptual explanation of what it means for a model to converge, along with some of the problems you might encounter and some solutions to them. After this conceptual explanation of convergence through the metaphor of mountain climbing, we are ready to move on to more specific technical topics: how exactly you identify nonconvergence, and why exactly a model might not converge.
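The tolerance trade-off can be sketched with Newton's method on a one-dimensional toy curve (my own example, not any particular software's implementation): a strict tolerance buys more decimals at the cost of more iterations, and an absurdly loose tolerance "converges" without taking a single step.

```python
# Toy log-likelihood-like curve with its peak at x = 2.
def f(x):
    return -(x - 2.0)**4

def df(x):                        # slope (first derivative)
    return -4 * (x - 2.0)**3

def d2f(x):                       # curvature (second derivative)
    return -12 * (x - 2.0)**2

def newton(x, tol, max_iter=1000):
    """Climb until the slope is smaller than the tolerance."""
    for i in range(max_iter):
        if abs(df(x)) < tol:
            return x, i           # declare convergence after i steps
        x = x - df(x) / d2f(x)    # Newton step toward the peak
    return x, max_iter

x_strict, n_strict = newton(0.0, tol=1e-10)
x_loose,  n_loose  = newton(0.0, tol=1e-2)
x_silly,  n_silly  = newton(0.0, tol=1e9)

print(x_strict, n_strict)  # very close to 2, but many iterations
print(x_loose,  n_loose)   # less precise, fewer iterations
print(x_silly,  n_silly)   # "converged" at the start without moving
```

The last case is the mountain-covering tolerance from the text: the check passes immediately, the software reports convergence, and the "estimate" is just the starting value.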