Yeah, my name is JP. I'm a research scientist at the CryptoEcon Lab. Today I'll present a talk entitled "Gas Consumption in the Filecoin Network: Analysis, Models, and Opportunities."

Just a quick overview of the talk. I'll first introduce the concept of gas and transaction fee mechanisms: why gas is important and why blockchains need it. I will then present an analysis of gas consumption in the Filecoin network. As we will discuss later, gas consumption in the Filecoin network is a random process, so to understand it properly we need to understand the statistical properties of that random process. Afterwards, based on the key insights of this analysis, I will present some mathematical models that we came up with to model gas consumption in the network, and at the end I will finish with some opportunities, some applications of these models.

Okay, so let's get started. What is gas? Gas is a measure of the resources needed for on-chain transactions. In the same way that you have to pay for gas and put gas in your car to make it run, and the consumption of gas in your car depends on how you drive it, whether you're going too fast, too slow, uphill, downhill, whatever, you also need to fuel your transactions in a blockchain, and the amount of gas that your transaction or your message is going to consume depends on the type of transaction. Furthermore, as we know, computing power is a finite resource, which means that at any given time, of all the messages that users want to include in a block, we can only include a handful of them. This poses two questions. The first one is: how are these resources being paid for? And the second one is: how are these transactions being prioritized, how do we choose which messages go on chain? Answering these questions is akin to designing what is called a transaction fee mechanism, which is the incentive part of a blockchain.

There are many types of transaction fee mechanisms, but historically, and for our context, the two most important ones are first-price auctions, which is what the Ethereum network used up until August 2021, and an improvement on it called the EIP-1559 model, which is the one that Filecoin currently uses and has used since its inception, and the one that Ethereum and other networks now use.

So how does this EIP-1559 model work? Well, it has four parameters: one that is protocol-defined and three that are user inputs. The first one is what's called the base fee, which can be understood as the minimum amount of token per gas unit that you pay in order to put your messages inside a block. This base fee is adjusted according to demand, meaning that if the demand for block space is high, the base fee will increase, and if the demand for block space is low, it will tend to decrease. Then there are three user-defined parameters. The first one is the gas limit, which is the maximum amount of gas that a message's execution is allowed to consume. In general, users have an idea of how much gas a message will consume, but for technical reasons it's very difficult to estimate it exactly, so it needs to be overestimated. This is given in units of gas. Users also define a gas fee cap, which is the maximum price, in tokens per unit of gas consumed, that the user is willing to pay; it should of course be above the base fee, which is the minimum you need to pay. Finally, the user specifies a gas premium, which can be understood as a tip to the miners.
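To keep these parameters straight, here is a minimal Python sketch of a message carrying the three user-defined fields, plus the basic check against the protocol's base fee. The field and function names are illustrative placeholders, not actual Filecoin API identifiers.

```python
from dataclasses import dataclass

@dataclass
class Message:
    gas_limit: int       # maximum gas units the execution is allowed to consume
    gas_fee_cap: float   # maximum price (tokens per gas unit) the sender will pay
    gas_premium: float   # tip (tokens per gas unit) offered to the miner

def is_includable(msg: Message, base_fee: float) -> bool:
    """A message only makes sense to include if its fee cap covers the current base fee."""
    return msg.gas_fee_cap >= base_fee
```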
Roughly, this EIP-1559 model works as follows. The user sends a message with a bid given by the max cost, understood as the gas limit times the gas fee cap. Of this bid, some part is going to be burnt and some part is going to the miners. The part that gets burnt is the gas used times the base fee, and the part that goes to the miners is the gas limit times the gas premium.

There are a couple of interesting aspects of this transaction fee mechanism. The first is that it is relatively simple for users to know how much they will pay for a transaction. The second is that blockchains are a team effort, right? All nodes need to validate, the whole network puts effort into it, so by burning tokens you increase the value of all the remaining tokens a little bit, because you're applying deflationary pressure. And for the miners it's fairly simple, in the sense that they just choose to include the messages that maximize their revenue.

So let's look at the base fee in a bit more depth. As I mentioned earlier, the base fee increases or decreases depending on the demand for block space, and it does so according to this formula: the base fee at the next epoch is equal to the base fee at the current epoch times one plus one eighth of the gas used minus the gas target, divided by the gas target. Here the gas target is defined to be half of the block limit, that is, half of the maximum gas that can go in a block, and the gas used of course takes values between zero, if the block was completely empty, and the maximum block size, if the block was completely full. Okay, so if you're good at math, you can see from this equation that if the gas usage was above half of the block size, then the base fee at the next epoch will increase, and if it was below half of the block size, it will decrease at the next epoch. And of course this is important for many reasons, the main one being that it induces incentives around demand for block space: if the base fee is very low, more users will be incentivized to send messages, and if the base fee is quite high, people may rather wait a little bit for it to go down.

So hopefully it's somewhat clear that understanding this base fee is important, because it is in some sense a measure of the demand for gas, or for block space, in the network. As such, it is important to have a feel for it: how to model it, how to use it for making decisions, and so on. However, there's a caveat, and it is that the gas used here is a random process, because it depends on demand, on the state of the network, on the overall sentiment, on whether Elon Musk tweeted about Filecoin, whatever. So it is a random process, and as such it is not trivial to predict or to understand, so one needs to be careful when analyzing it.
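Putting the update rule and the payment split into code, here is a rough sketch of the arithmetic described above, assuming real-valued quantities; the actual protocol works with integer token amounts and has additional details, so treat this purely as an illustration.

```python
def next_base_fee(base_fee: float, gas_used: float, gas_target: float) -> float:
    """EIP-1559-style update: raise the base fee when blocks are fuller than the
    target (half the block limit) and lower it when they are emptier."""
    return base_fee * (1.0 + (1.0 / 8.0) * (gas_used - gas_target) / gas_target)

def split_payment(gas_limit: float, gas_premium: float,
                  gas_used: float, base_fee: float) -> tuple[float, float]:
    """Split an included message's payment into a burned part and a miner tip."""
    burned = gas_used * base_fee          # removed from circulation
    miner_tip = gas_limit * gas_premium   # paid to the block producer
    return burned, miner_tip
```

For instance, with this rule a completely full block (gas used equal to twice the target) raises the base fee by 12.5% for the next epoch, while a completely empty block lowers it by 12.5%.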
So, motivated by this, when we were working on this project we had three goals in mind. The first was to obtain key insights on the statistical behavior of gas usage. The second was to develop probabilistic and data-driven models for gas consumption in the Filecoin network. And the third was to develop a tool set to simulate this gas behavior, in the hope that it can be useful for other projects. The rest of the talk basically tackles each of these goals.

Okay, so let's move on to the analysis. Remember, the first goal was to obtain key insights on the statistical behavior of gas consumption. As an experimental setup, we analyzed roughly six months of block data on an epoch-by-epoch basis, which is around 700,000 data points, each containing several features. And instead of looking directly at the gas usage, this G of t here, we looked at a normalized gas usage, which is just the gas used minus the gas target, divided by the gas target. This is a bit more convenient in the sense that it takes values between negative one and one. When G tilde of t is close to negative one, it means the blocks at that epoch were very empty. When it is close to zero, it means gas usage was close to the gas target, half of the block size. And when it's close to one, it means the blocks were quite full at that epoch.

Okay, so let's start with some statistical properties. Here we're looking at the time series, the histogram, and the autocorrelation function of G tilde of t and of G tilde of t plus one minus G tilde of t: the normalized gas consumption, and the difference between two consecutive values. There are a couple of salient features here. Perhaps the most interesting, or the most telling, is that there's a big peak around one and a very small peak around negative one. This means that it's way more likely to find blocks that are very, very full than blocks that are very, very empty. We can also see that the distribution concentrates around zero, which you can understand as the incentive mechanism doing its job of hitting the gas target. In fact, if you look at the mean and the standard deviation, gas usage stays within that range around 70% of the time. We can also look at the autocorrelation function, which is a measure of how strongly correlated the gas consumption time series is with itself. The fact that this plot decays to zero around a lag of five means that, five epochs from now, the gas consumption won't be much influenced by what the gas consumption is at this exact moment. If we look at the histogram of the differences in gas consumption, the epoch-to-epoch increases and decreases, we can see two interesting features. The first is that it's very symmetric around zero, meaning it doesn't tend to favor over- or under-consumption. The second is that it's concentrated around zero, meaning that on average, or in general, the changes in gas from one epoch to the next tend to be fairly small.

Another thing we looked into was the distribution of the time between epochs of very high demand, when blocks were very, very full; the time between epochs of very low demand, when blocks were essentially empty or mostly empty; and the cool-off times, defined as the time it takes to go from an epoch with very high demand to an epoch with average demand, or conversely, from an epoch with very low demand to an epoch with average demand.
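As a concrete reference for these quantities, here is a minimal sketch of how one might compute the normalized gas usage and its empirical autocorrelation, assuming the epoch-level gas usage is available as a pandas Series; the block gas limit is passed in as a parameter rather than taken from the actual chain data.

```python
import numpy as np
import pandas as pd

def normalized_gas(gas_used: pd.Series, block_gas_limit: float) -> pd.Series:
    """G~_t = (G_t - G*) / G*, with G* = half the block gas limit.
    Values lie in [-1, 1]: -1 means an empty block, 0 is on target, 1 a full block."""
    gas_target = block_gas_limit / 2.0
    return (gas_used - gas_target) / gas_target

def autocorrelation(series: pd.Series, max_lag: int = 20) -> np.ndarray:
    """Empirical autocorrelation of the series for lags 0..max_lag."""
    return np.array([1.0 if lag == 0 else series.autocorr(lag) for lag in range(max_lag + 1)])
```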
There are two main interesting things here. The first is that the probability distributions of these waiting times are all well approximated by exponential distributions. If you're a data scientist, a mathematician, a statistician, whatever, maybe this is interesting to you, in the sense that exponential distributions have many useful properties. One of them is memorylessness, which can roughly be understood as the distribution of these waiting times not depending on the history of the process, so it behaves like an independent process. To put some more specific numbers on it: on average, it takes roughly two epochs to go from an epoch with very high demand to an epoch of average demand, three epochs to go from very low demand to average demand, around 20 epochs between epochs of very high demand, and around 70 epochs between epochs of very low demand. And here an epoch is 30 seconds.

Furthermore, another thing we looked at was the timescale of this distribution. So instead of observing it epoch by epoch, we looked at it over different windows: five minutes, one day, two days' worth of data. As we can see, these distributions differ a bit, because you're taking less and less data, so statistically you're not going to get exactly the same figure, but they have pretty much the same shape, which to me implies that this distribution doesn't change much over time. Which again, if you are into modeling, is quite useful to know, because it means the process is essentially stationary: there might be periods of high demand or low demand, but the overall distribution of gas usage stays pretty much constant across time.

Furthermore, if we focus on gas consumption by message and first divide messages into categories, the first being what some people call control plane messages, those messages that are critical for the functioning of the network, such as PreCommitSector, ProveCommitSector, and SubmitWindowedPoSt, and the second being data plane messages, which include all the rest, we can see that on average the control plane messages take up about 95% of the available block space, while the data plane messages take the rest. In fact, if we look at the 25th percentile, which is 94%, it means it is very, very likely that at least 94% of a block consists of these kinds of messages. Furthermore, if we rank the top 10 messages by gas usage, or by the proportion of the block they occupy, we can see that (by the way, is this easy to read? I'll take that as a yes), cumulatively, the top three messages occupy around 87% of the block, and the top four messages, on average, occupy about 91% of the block. We also looked at the correlation between the gas consumed by these messages. We found that there isn't a very strong correlation between them overall, although there is some correlation among the three main types of messages.
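Tying back to the exponential waiting times above, here is a minimal sketch of how one could extract the times between very-high-demand epochs from the normalized gas series and fit an exponential rate to them. The 0.95 threshold for "very high demand" is an illustrative choice, not the cutoff used in the actual analysis.

```python
import numpy as np

def waiting_times(normalized_gas: np.ndarray, threshold: float = 0.95) -> np.ndarray:
    """Epochs elapsed between consecutive 'very high demand' events,
    defined here as epochs where normalized gas usage exceeds `threshold`."""
    event_epochs = np.flatnonzero(normalized_gas > threshold)
    return np.diff(event_epochs)

def exponential_rate(waits: np.ndarray) -> float:
    """Maximum-likelihood rate of an exponential fit: lambda = 1 / (mean waiting time)."""
    return 1.0 / waits.mean()
```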
Okay, let's move on to the models so that I don't run out of time. Recall that our second goal was to develop probabilistic and data-driven models for gas consumption in the Filecoin network. We created models for the gas usage itself, for the underlying demand process that pushes gas up or down, and for how messages fill a block. And of course, all of these can be used to simulate and to better understand this space.

Perhaps the easiest model is what's called a kernel density estimator. Essentially, you sample from the probability distribution induced by the histogram, and you sample independently from it. It has the computational advantage of being very straightforward. However, it implies that the samples of gas usage you get from this method are independent of each other, which in turn hides some of the underlying dynamics that gas usage might have.

A fancier model is to consider gas usage to be a Markovian process, meaning that the probability distribution of the gas moving into a given state doesn't depend on the whole history of gas, only on the current state. So the consumption at the next epoch depends only on the consumption at this epoch, which in turn defines a conditional distribution. If one were able to sample from this conditional distribution, one could create an algorithm to generate the gas process very easily. This is more likely to capture the hidden dynamics, because you have those conditional probabilities. However, one needs to approximate the conditional distribution, which is not an easy thing to do, and since you're approximating it, you're likely to introduce some errors, which might affect the results.

Another thing we looked into was how to model the demand process: how can we understand this hidden, abstract demand process based on measurements of gas usage? One way of doing so is to assume that demand is itself a Markov chain that can take values in a set of states: very low demand, low demand, medium demand, high demand, or very high demand. If you assume that demand is a Markov chain moving randomly between these states, and that the gas distribution depends on the current state of demand, then there are machine learning and mathematical techniques to infer the transition probabilities from one state to another. So you can estimate how likely it is to go from a state of low demand to a state of very high demand, or from a state of medium demand to a state of low demand. Furthermore, given this, you can obtain histograms for the conditional distributions, you can simulate the demand process, and given the demand process you can simulate the gas consumption and classify the state of demand given your data. You can also use similar methodologies, together with queueing theory and statistical methods, to come up with models of how blocks get filled by messages. However, that's a bit more technical, so I won't go into it here.
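To give a flavor of the demand-state idea, here is a minimal sketch that simulates a five-state demand Markov chain and draws normalized gas usage conditional on the current state. All of the numbers (the transition matrix and the per-state means and spreads) are made up for illustration; in practice they would be estimated from chain data, for example with hidden Markov model tooling.

```python
import numpy as np

STATES = ["very_low", "low", "medium", "high", "very_high"]

# Illustrative transition matrix (rows sum to 1); not estimated from real data.
P = np.array([
    [0.60, 0.30, 0.08, 0.015, 0.005],
    [0.10, 0.60, 0.25, 0.04, 0.01],
    [0.02, 0.18, 0.60, 0.18, 0.02],
    [0.01, 0.04, 0.25, 0.60, 0.10],
    [0.005, 0.015, 0.08, 0.30, 0.60],
])

# Illustrative (mean, std) of normalized gas usage conditional on each demand state.
GAS_PARAMS = {"very_low": (-0.8, 0.10), "low": (-0.3, 0.15), "medium": (0.0, 0.20),
              "high": (0.4, 0.20), "very_high": (0.9, 0.05)}

def simulate(n_epochs: int, rng=np.random.default_rng(0)):
    """Simulate a demand-state path and the normalized gas usage it induces."""
    state = 2  # start in 'medium' demand
    states, gas = [], []
    for _ in range(n_epochs):
        state = rng.choice(len(STATES), p=P[state])      # move the demand chain
        mu, sigma = GAS_PARAMS[STATES[state]]
        g = np.clip(rng.normal(mu, sigma), -1.0, 1.0)    # keep usage in [-1, 1]
        states.append(STATES[state])
        gas.append(g)
    return states, np.array(gas)
```

From a long simulated path one can then recover statistics such as the waiting times and cool-off times discussed in the analysis and compare them against the chain data.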
Okay, and lastly, to finish, the opportunities. Recall that the third goal was to develop a tool set to simulate gas behavior, in the hope that it can be used for other projects. So we have all this machinery; what can we do with it? There are three main directions I can identify. The first one is academic: in the scientific community, machine learning of random dynamical systems is a hot topic right now, so there are a lot of opportunities to investigate there. You can also use it to develop what are called digital twins: instead of having to deploy a beta of your network, you can deploy code that simulates it quite well, and since you have an understanding of how the gas dynamics work, you can use this to probe it and extract information from it. Also, in the mathematical and modeling communities, there are many methods that aim at improving these types of models.

From a planning perspective, since we now have some understanding of gas consumption and gas demand, one could use these methodologies, together with an extra set of assumptions, to model how gas will change once the FVM is implemented, or with InterPlanetary Consensus-type algorithms. And lastly, the third area where one could use these methods is improving current techniques, such as using this data to obtain better gas estimation when a user is sending a gas bid, or using it in a Monte Carlo type of approach to improve robustness with respect to the uncertainties induced by the gas and demand processes. Okay, that's it for me. Cool. Thank you.

So, one thing that I found interesting from your talk was how rapidly gas demand changes. Maybe this wasn't part of your analysis, because maybe you can't see it in the data, but you showed that on average it takes, say, two epochs for gas demand to go from high demand to lower, and three epochs from very low demand back up. Do you have any insight into those averages, like the 70 epochs when demand is super low? Not so much why it's taking that much time, but why it's those particular numbers of epochs?

So, intuitively, there are two parts to it, right? The first one: this 70 epochs is the time between two low-demand events. If we look at this plot, it is not very likely that you will have an almost empty block to begin with, so since that is not very likely, the time between occurrences is larger.

And then, can you go one slide back? Do you have any insight on the first two bullet points, on average, when you say two epochs to go from very high demand back down and three epochs to go from low demand back up, any insight into why those numbers?

It could be for a variety of reasons: either the people who use the network are hypervigilant and react very quickly to changes, or it might just be demand changing rapidly. But yeah, I would have to think about it.

Makes sense. I was just curious whether, as soon as the network is in high demand, it naturally goes back to a lower-demand state, which in a way would mean this EIP mechanism is working as intended: when demand goes high, it pushes it down, and when it goes down, it pushes it back up.