So, the latest Ethereum sharding design, proposed by Dankrad, who is here today, arrived in 2021. This new design unlocks a lot of scaling, but it changes the challenge: instead of having many EVM shard chains, we now shard the data blobs themselves. In the Danksharding protocol there are still some technical challenges we need to solve, though. So before we get to full Danksharding, Proto, who is also here, and many others collaborated and figured out a more feasible solution for scaling in the short term, one that can greatly scale Ethereum with L2 rollups in the very near future. That is Proto-Danksharding. There are many common features between EIP-4844 and full Danksharding, and today we will break these topics down. Both of them use KZG commitments, so Dankrad will introduce the cryptography part first. Both also have blob transactions, so we will introduce what a blob is today. The fee market is also a shared feature. The challenging parts of full Danksharding are PBS and DAS, which we will talk about later. That's the agenda for today; we have a very rich agenda for the next two hours. I will hand it over to Dankrad next to introduce the cryptography in Danksharding. If you can get through all of this, you will know everything.

Thank you, thank you, Sir Wen. Cool, so I will be giving an introduction to KZG commitments. I'll start with the motivation, to understand why we need these advanced polynomial commitments and why we can't do all of this simply with Merkle roots, which we are all quite familiar with. Then I will give a quick overview of the finite fields we use in order to commit to polynomials. I will motivate KZG commitments as a kind of hash of polynomials, then get to the actual meat, which is how KZG commitments work. And finally, because we also use it a lot in our construction now, I will go through a technique which I call random evaluation, a nice trick that you can often use when working with polynomials and that makes a lot of the things you want to do with them much more efficient.

Cool, so let's talk about data availability sampling and erasure coding. What is data availability sampling? The idea is that we have a large blob of data, and what we're working on is scalability. Scalability means that somehow a node has to do less work to achieve the same thing it does today. Right now, every node ensures that all Ethereum blocks are available by downloading all the blocks. That's just an implicit part of the protocol; it seems obvious because right now you also execute the full blocks, but it's one of the things that doesn't scale in the current Ethereum system. So we need a way to reduce this workload, but we want to do it in such a way that we don't lose any of the security this provides, and that's what makes it tricky. The basic idea is: what if we take our data blob and just check that random samples of the data are available? If we do this naively, just taking the data as it is, it does not really work, because even a tiny amount of missing data is potentially catastrophic for a blockchain, and by doing random sampling you can never find out whether a tiny bit is missing.
You can only see whether a major part of the data is missing. So what we need to do, for this technique to work, is encode the data in such a way that having some portion of it, say 50%, is enough to guarantee that all of it is available. The way we do this is by extending the data using a so-called Reed-Solomon code. If you know a little bit about polynomials, a Reed-Solomon code is nothing else but extending the data using polynomials. In this simple example we have four blocks of original data. We take these four values as evaluations of a polynomial: there is always a polynomial of degree three that goes through these four points. Then we can evaluate this polynomial at more points. Because four points always determine a polynomial of degree three, any four of these points are enough to reconstruct exactly the same polynomial; it does not matter which four you have. This is the basis of erasure coding. And now, because of this, the data availability sampling idea actually works: I no longer need to ensure that every single bit of the data is available, I only need to know that at least 50% of the data is available, and then I can always reconstruct everything. Yeah, sure: is there now double the data? Yes, we have double the data. Do we have to check 50% of it? No. That's the trick: we do random sampling. As an example, we query 30 random samples. If the data is not available, the attacker must have withheld at least 50%, because if they published more than 50% to the network, everything can be reconstructed. So if the data is not available, each of these samples, because we used local randomness to query them, has only a 50% chance of succeeding. In aggregate, the probability that all of them succeed is two to the minus 30, which is about one in a billion. This is why it scales: you don't need to query 50% of the data, you only need a tiny number of random samples, and this number is constant, it does not depend on the amount of data.

Okay. Now, what if we take this polynomial, these evaluations, the original data D0 to D3 plus the extensions we computed, and just build a normal Merkle tree on top of it and use the root as our data availability root? The problem is that Merkle roots do not tell you anything about the content of the data; it could be anything. Say an attacker wants to trick our data availability system. They could simply not use the polynomial extension and put random data instead. In coding terms, they have provided an invalid code. What that means is that if you take four different chunks of this data, you would get a different polynomial each time. Consensus is all about agreeing on something, and in this case we wouldn't actually have agreed on anything, because the data is different depending on which samples we got. The only way to make this work is to add fraud proofs to the system, where you prove that someone has provided an invalid encoding. But that isn't great; it has some problems, and it adds a lot of complexity.
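Here's a small sketch, in Python, of the two ideas just described: extending four data points with the unique degree-3 polynomial through them (over a toy prime field), and the probability argument behind querying 30 random samples. The prime and the concrete numbers are illustrative only; the real field has a ~255-bit modulus.

```python
# Sketch: Reed-Solomon extension by polynomial interpolation, plus the sampling math.
P = 97                     # a toy prime field; the real one has ~255 bits

def lagrange_eval(points, x):
    # Evaluate, at x, the unique polynomial of degree len(points)-1 through `points`.
    result = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * (x - xj)) % P
                den = (den * (xi - xj)) % P
        result = (result + yi * num * pow(den, -1, P)) % P
    return result

data = [13, 7, 42, 5]                               # four original chunks d0..d3
points = list(enumerate(data))                      # treat them as evaluations at x = 0..3
extension = [lagrange_eval(points, x) for x in range(4, 8)]   # four extra evaluations

# ANY four of the eight evaluations recover the same polynomial, e.g. the last four:
recovered = [lagrange_eval(list(zip(range(4, 8), extension)), x) for x in range(4)]
assert recovered == data

# If more than 50% of the extended data were withheld, each random sample fails with
# probability at least 1/2, so 30 successful samples happen with probability at most:
print(0.5 ** 30)       # ~9.3e-10, about one in a billion
```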
And particularly in this case, because this is about layer one itself, it would make our system very, very difficult to design, because now validators would basically need to wait for the fraud proof in order to know which block to even vote for. So it would be very impractical to design it this way. The interesting question is: what if we could find some kind of commitment that instead always commits to a polynomial, so that we always know the encoding is valid? Cool. And that is why we will introduce KZG commitments. We need to start a little bit earlier, though, so I will first introduce finite fields for those who are not familiar with them.

Okay. So what is a finite field? To understand what a field is, think about the rational, real or complex numbers, which you have already learned about, and remind yourself that we have basic operations we can do in them: we can add, subtract, multiply and divide, all of them except division by zero, so you are always able to do these operations. And they follow some laws: they are associative, commutative, distributive. I could give the formal laws here, but I don't think that would be the best illustration, because you're already very familiar with these rules from working with rational or real numbers. The big difference for finite fields is that, unlike these familiar fields, which all have an infinite number of elements, they have a finite number of elements. That's quite important, because otherwise we can't encode them with a finite number of bits, which is something we need to be able to do. It also means that each element can be represented using the same number of bits. As an example of how this works, here's a very small finite field, F5. The way it works is you take the five numbers 0, 1, 2, 3, 4 and use the normal integer operations to compute addition, subtraction and multiplication, but whenever you've done that, you take the remainder after division by five; you take the result modulo five. We haven't yet defined how to do division, but if you write down the multiplication table, you'll find that each element actually has an inverse. Take two, for example: two times three is six, and modulo five that's one, so two has an inverse, namely three. The other way around, the inverse of three is two. And for four, four times four is 16, and modulo five that's again one. So just by listing these numbers we've found that every element has an inverse, and the reason is that five is a prime number. Whenever we take these operations modulo a prime number, we get a finite field. And those are our finite fields, except that the fields we're going to work with in practice have a lot more elements: the prime we'll use has 255 bits, so it's a very big number, because we want to be able to represent a lot of numbers in this field. Cool.

Okay, so let's think about hashing polynomials. A quick reminder: what is a polynomial? A polynomial is an expression of this form.
It's a sum over coefficients f_i times terms of the form x to the power of i. The sum has to be finite, a sum from zero to n, where n is the degree of the polynomial, and the other important thing to keep reminding yourself of is that there can never be a negative exponent. You cannot have x to the minus one; there are only terms of the form x to the power of zero, one, two, three and so on. Each polynomial defines a polynomial function, and it's important to distinguish between the two. A polynomial is just an expression of this type, so you could even think of it as a list of coefficients, and it then defines a function. In finite fields, for example, you have the property that the same polynomial function can correspond to many different polynomials, because there is only a finite number of functions but an infinite number of polynomials. You don't have this in infinite fields. The cool property that polynomials have is that for any k points, there is always a polynomial of degree k minus one or lower that goes through all of these points, and this polynomial is unique. The other property is that a polynomial of degree n that is not identically zero has at most n zeros.

Okay. So it would be cool if we could imagine a hash function for polynomials. Let's imagine a hash function that takes a polynomial and hashes it. Okay, that's easy. But it should have an extra property, which is that we can construct proofs of evaluation. Basically, what we want is that for any point z we can evaluate the polynomial, compute y equals f of z, and produce some proof that this is correct. That would be an interesting hash of polynomials that gives us something new. And the hash and the proof should be small in some sense. So here's an idea: what if we just choose a random number, say three, and to hash a polynomial we simply evaluate it at that random number, so we set x equals three? Here are a couple of examples of how that works if we stay in our small field F5, which we introduced before with just those five numbers. If our polynomial is x squared plus 2x plus four, then the hash is nine plus six plus four, which is 19, and modulo five that's four. Here's a second example with a slightly bigger polynomial, again modulo five, and in this case the hash is zero. Okay, that seems a bit silly, just evaluating at one point. But the interesting thing is that if our modulus has around 255 bits, which is what we're going to work with in practice, it's actually extremely unlikely that two randomly chosen polynomials have the same "hash", just like for a normal hash function. So that's an interesting property: it seems like a very simple operation, but in one way it already behaves like a hash. Okay, so if we accept this for now, let's look at some of the things we could do with it. For example, we can actually add two hashes of polynomials: if we have the hash of two functions, hash of f and hash of g, then the hash of the sum of the functions is just the hash of f plus the hash of g. That's because of a homomorphic property, which is trivial if you write it out in polynomials. And the same is true if you multiply two of these polynomial hashes.
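A tiny sketch of this "hash a polynomial by evaluating it at a random point" idea, over the toy field F5 used above, including the additive homomorphism just mentioned; the specific polynomials are just illustrative.

```python
# Toy "polynomial hash": evaluate the polynomial at a fixed random point, mod a prime.
P = 5           # the toy field F5 from the slides; in practice this prime has ~255 bits
R = 3           # the "random" evaluation point

def poly_hash(coeffs, point=R, p=P):
    # coeffs[i] is the coefficient of x**i
    return sum(c * pow(point, i, p) for i, c in enumerate(coeffs)) % p

f = [4, 2, 1]   # f(x) = x^2 + 2x + 4  ->  f(3) = 9 + 6 + 4 = 19 = 4 (mod 5)
g = [1, 0, 3]   # g(x) = 3x^2 + 1      (an arbitrary second example)

fg_sum = [(a + b) % P for a, b in zip(f, g)]

print(poly_hash(f))                                  # 4
# Additive homomorphism: hash(f + g) == hash(f) + hash(g)  (mod 5)
assert poly_hash(fg_sum) == (poly_hash(f) + poly_hash(g)) % P
```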
And that's just because polynomial evaluation itself is homomorphic: you can either first add two polynomials and then evaluate, or evaluate them and then add the results, and the same goes for multiplication. So this hash function would have some really cool properties if we could use it. But there's one problem: if someone knows the random number, they can easily create a collision for this polynomial hash function. While for random polynomials it's very unlikely that they evaluate to the same value, it is very easy to manually craft two polynomials that evaluate to the same value at a known point. So it doesn't quite work as a hash function as we know it. But it would be different if we could somehow put this random number into a black box. Suppose we could find a way of computing with these finite field elements, but instead of giving everyone who wants to evaluate the hash function the actual number, we give them a black box. So assume we have a cryptographic way of putting a number into a black box. Then we give them our random number s, and we also give them s squared and s to the power of three, and so on, but all of them only inside the black box. And the black box needs to have the property that you can multiply what's inside by another number, and you can add two of these boxes, but you cannot multiply two numbers that are both inside black boxes. If we could do that, this would actually work, because now the attacker would not be able to create those two colliding polynomials: they don't know the number, so they cannot craft polynomials that evaluate to the same value at that point. And the cool thing is that elliptic curves give you exactly that. You can think of elliptic curves as a way of creating black-boxed finite field elements, and the finite field you have to use is determined by the curve order of that elliptic curve. So we have an elliptic curve, which we call G1 (why we need the index one, we'll come to later), with a generator, which is a point such that if you keep adding it to itself, it generates the whole curve. The order of the curve is p, the number of points. Then we have the property that x times G1, where x is a finite field element, is basically this black box. The reason is that it is hard to compute so-called discrete logarithms: once x times G1 has been computed, it is difficult to recover x from that point. That's a cryptographic assumption. So if we take two elliptic curve elements, G and H, we can multiply them by field elements, computing x times G; we can add them, G plus H; and we can compute linear combinations like x times G plus y times H. What we can't compute, without solving a discrete logarithm, which is hard, is something like G times H. And so, just as we wanted for the black box, we introduce the notation of x in square brackets with a subscript one, meaning it lives in G1, the first elliptic curve we're going to use (we will later need another one), and we define it as x times G1. So when you see these square brackets, think of them as a prime field element inside this elliptic curve black box.
We can put stuff inside, and there's no easy way to take it back out, but we can do some computations while it's in there. Cool. And with this, we are ready to introduce KZG commitments.

Okay. So what we're going to do is introduce a trusted setup. We assume that someone has taken a random number s and has computed, inside this black box, the powers of s, that is s to the power of 0, 1, 2, 3 and so on, and given them to us. (Forget the second row for now, we'll come to that later.) Now take a polynomial function, as we defined previously: a sum of coefficients times powers of x. We define the KZG commitment as this sum, which we can evaluate by taking the coefficients and replacing x to the power of i with s to the power of i inside the black box. The left-hand side is something we can obviously compute; it's just a linear combination of the elements we were given as part of the trusted setup. And the cool thing is that if you write it out, the result is in effect just f of s evaluated inside the black box. So effectively we've come back to what we said before: it's just a random evaluation, but now we've managed to evaluate the polynomial inside a black box at a secret point. This is what we call the KZG commitment to the function f.

Now, in order to do interesting things with this, we need to introduce elliptic curve pairings. This is where we get our second group: we actually need a total of two elliptic curve groups. A pairing is a function from two elliptic curve groups into a target group, which is a different kind of group; it's actually not an elliptic curve, but that's not too important here. It takes two elliptic curve elements, one in G1 and one in G2, and it has the cool property that it is bilinear. What that means is that you can pull scalar factors out: for example, the pairing of a times X with Z is a times the pairing of X with Z, and the same in the second argument. And if you have a sum, it splits, like a distributive law: the pairing of X plus Y with Z splits into the pairing of X with Z and the pairing of Y with Z, and again the same goes for the second argument of the function. The cool thing is that what we couldn't do before between elliptic curve points, namely multiply two of them, we can now do between pairings: if we have one of our values in the first group and one in the second group, then due to this bilinear property the pairing computes something like x times y in the target group. We define an additional notation for the target group, and then we have this very clean and nice equation: the pairing of x as a black-box element with y as a black-box element is x times y in the target group's black box. This is very important; this is why we really need pairings: they let us do one multiplication. Only one, because afterwards we end up with a target group element that we can't really do anything further with. But it turns out that this is actually enough to do a lot of very useful stuff with elliptic curves.
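Here's a minimal sketch of that commitment step, using the py_ecc BLS12-381 library for the curve arithmetic. The trusted setup is simulated locally by just picking s, which is exactly what you must not do in production; it is only meant to make the algebra concrete.

```python
# Sketch only: a KZG commitment is a linear combination of the trusted-setup
# points [s^0]_1, [s^1]_1, ... weighted by the polynomial's coefficients.
from py_ecc.optimized_bls12_381 import G1, Z1, add, multiply, curve_order

def simulated_setup_g1(s, max_degree):
    # In reality s comes from a trusted-setup ceremony and is then destroyed;
    # here we just pick it so the example runs.
    return [multiply(G1, pow(s, i, curve_order)) for i in range(max_degree + 1)]

def commit(coeffs, setup_g1):
    # C = sum_i f_i * [s^i]_1, i.e. f(s) evaluated "inside the black box"
    commitment = Z1
    for f_i, s_pow_i in zip(coeffs, setup_g1):
        commitment = add(commitment, multiply(s_pow_i, f_i % curve_order))
    return commitment

# f(x) = 4 + 2x + x^2, committed under a toy setup
setup = simulated_setup_g1(s=123456789, max_degree=3)
C = commit([4, 2, 1], setup)
```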
Okay, now let's assume we have two polynomials, f and g, and we commit to both, but we commit to f in G1 and to g in G2, the different groups. Then this pairing lets us compute the product of the two commitments in the target group. So with this really cool polynomial hash that we have defined, if we commit to the polynomials in the right groups, we can multiply two polynomials committed in this way; we can multiply the commitments without even knowing the polynomials themselves. Cool.

Okay, we need one last missing piece before we can fully get to how KZG commitments work and how we construct proofs, and that is quotients of polynomials. Say we have a polynomial f of x and two field elements y and z. Then we can write down the quotient q of x, which is f of x minus y divided by x minus z. This is a rational function, right? A polynomial divided by a polynomial is in general a rational function, so you can view this as just a formal expression. But sometimes the quotient is exact, meaning it actually results in another polynomial. There's a theorem called the factor theorem, relatively elementary, you've probably learned it in school at some point without calling it that, which says that this quotient is a polynomial exactly when f of z equals y. You can kind of see one direction: the denominator x minus z is zero at z, so if the quotient is to be a polynomial, the numerator f of x minus y must also be zero at x equals z, which means f of z equals y. The other direction is slightly more complicated. If we restate this, we get: there is a polynomial q fulfilling the equation (just moving x minus z to the other side) q of x times x minus z equals f of x minus y, if and only if f of z equals y. Okay.

And now we get to how KZG proofs work. If a prover wants to prove that f of z equals y, they compute this quotient q of x, which is f of x minus y divided by x minus z, and send as the proof pi the commitment to q, which is q of s in the black box. To verify this, the verifier takes this quotient commitment, multiplies it by the commitment to s minus z, and checks that this equals the commitment to the original polynomial minus y. This is unfortunately not very readable on this background, but if you write it out in the pairing group, you get q of s times s minus z in the target group on one side, and that has to equal f of s minus y; it is the same as the second equation. The cool thing is we can verify this equation because the pairing allows us to multiply two polynomial commitments. And this way we can verify that the quotient was actually computed correctly. Cool. Yeah. And that is basically how KZG commitments work. The idea is that if you can compute this quotient, you'll be able to produce something that fulfills this equation, and by the factor theorem we mentioned previously, if f of z is not y, then you cannot: the quotient doesn't exist as a polynomial, and we can only commit to polynomials. So here's the recap of KZG commitments: we can commit to any polynomial using a single element in G1.
This is just the evaluation of the polynomial at the secret point s inside the black box. We can open the commitment at any point: we compute f of z, and by computing the quotient q of x we can compute the proof, which is q of s in the black box. And to verify that proof, we use the pairing equation, which shows the verifier that the evaluation is correct. Cool. So that is how KZG commitments work.

Now I want to do something slightly more, a technique that we use quite a lot (we're even using it in EIP-4844), so I want to give a quick introduction to how it works: the random evaluation trick. Okay. Recall that a KZG commitment is nothing but evaluating a polynomial f at a secret point s inside this elliptic curve black box. So in a way, this is already a random evaluation: we've identified the polynomial by a random evaluation, and we found that this is good enough to hash a polynomial in a way that makes it very difficult to create collisions. More generally, this random evaluation trick can be used to verify polynomial identities, and the reason is the Schwartz-Zippel lemma. There is a more general multivariate version, but let me state what it says in one dimension. Take a polynomial of degree less than n that is not identically zero (there is one particular polynomial that is zero everywhere, just all zeros, a very special polynomial; let's say it's not that). Now take a random point z in Fp. Then the probability that f of z is zero is at most n over p, because f can have at most n zeros. This is very useful, because our p is very, very large and our degree is relatively small compared to it. For example, in BLS12-381, p has 255 bits. Say we commit to a polynomial of degree 2 to the 12; then this probability is something like 2 to the minus 240, a very, very small probability.

So here's the first way in which we can use this. We have these transaction blobs that we will define for 4844. They are commitments to polynomials of degree 4,095, so 4,096 points in total. Computing such a commitment is not outrageously expensive, but it is expensive: something like 450 milliseconds. Verifying one KZG proof is quite a lot cheaper, only about 2 milliseconds. So we can use this to our advantage. The idea is this: we have the commitment C and the polynomial f itself, and what we want to verify is that the polynomial we've been given matches the commitment; in this case we have all the data and we have the commitment. The naive way is to just recompute the commitment from the polynomial, but that's expensive. Okay, how can we do it cheaper? We compute a random point, and one way to get a random point is a very cool technique called Fiat-Shamir: we take all our inputs and compute the point as the hash of the commitment and the polynomial. Why is that kind of random? Because if an attacker tries to adversarially craft either C or f, that will always change the point, so it's very hard for them to craft the inputs in a way that breaks our construction. This is a common technique in cryptography to get something random that the attacker cannot control.
So we evaluate the polynomial at this random point z to get y, and then we compute a KZG proof that f of z equals y. Then we just add this proof to our transaction blob wrapper, which is the way we're sending these transactions. To verify, you compute z as this hash, you also compute f of z (which you can do because you have the data), and you check the KZG proof pi. And that's it; it's much, much cheaper than computing the commitment. So that's one way we can use random evaluations to save a lot of work and make things more efficient.

Okay. Here's another way in which we can use this random evaluation technique. ZK rollups use many different proof schemes, and only a handful (I don't know if any of them actually exist right now) will natively use KZG commitments over BLS12-381. So the question is: how do all the others make efficient use of the blob commitments that we want to add with 4844 and then full sharding? Because computing KZG commitments or pairings inside a proof is pretty expensive; that is a very expensive operation in a zero-knowledge proof. So what we do in this case is commit to the data in different ways. We have three different inputs: our blob data, which is the function f itself; C, our blob commitment, which is what we use inside Ethereum for 4844; and another commitment R to the same data, using the ZK rollup's native commitment scheme, because they will have some way of committing to data that works well for their proof system. In this case we take z as a hash of C and R, these two different commitments, we compute y again as f of z, and we add pi as a proof that f of z equals y. And we add a precompile that allows us, in the Ethereum Virtual Machine, to verify the KZG proof pi. So we will know that C is a correct commitment to f, and what remains is to add a proof that R is also a correct commitment to the same data, and the ZK rollup can do that inside its proof. Inside the proof, they also get C and R as inputs, hash them, and compute z. They also have f, because the rollup wants to use the data, so f is completely available to them; they just have to compute y equals f of z (and use some technique to verify that their f is the same one, but there are ways to make this easy), and then they can verify that they have the same data that was committed to through C. So that makes it much easier to use these commitments in ZK rollups. Cool.

And I collected some resources if you want to read further on this. Vitalik wrote a post on elliptic curve pairings. Because there was a lot of interest in it, I wrote some notes on this last part, how to use KZG commitments in ZK rollups. For those who are skeptical and wondering whether we really need this advanced cryptography and trusted setup and so on, Vitalik recently wrote a summary of the difficulties with the alternatives to KZG commitments. And, very similar to this talk, I wrote a blog post about KZG commitments. Then of course, if you want to dive deep, there's the original KZG paper.
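For completeness, here is a hedged sketch of the opening and verification flow just described, again using py_ecc for the pairing check; the quotient is computed by synthetic division, and the challenge point is derived Fiat-Shamir style from a hash. This is illustrative only and ignores all the encoding details of the real EIP-4844 specification.

```python
# Sketch: KZG open/verify for f(z) = y, plus a Fiat-Shamir style challenge point.
import hashlib
from py_ecc.optimized_bls12_381 import (G1, G2, Z1, add, multiply, neg,
                                        pairing, curve_order)

def setup(s, max_degree):
    # Simulated trusted setup: powers of s in G1, plus [s]_2 in G2.
    g1_powers = [multiply(G1, pow(s, i, curve_order)) for i in range(max_degree + 1)]
    return g1_powers, multiply(G2, s)

def commit(coeffs, g1_powers):
    c = Z1
    for f_i, p in zip(coeffs, g1_powers):
        c = add(c, multiply(p, f_i % curve_order))
    return c

def eval_poly(coeffs, z):
    return sum(c * pow(z, i, curve_order) for i, c in enumerate(coeffs)) % curve_order

def quotient(coeffs, z, y):
    # q(x) = (f(x) - y) / (x - z) by synthetic division; exact iff f(z) == y
    a = [c % curve_order for c in coeffs]
    a[0] = (a[0] - y) % curve_order
    q, carry = [0] * (len(a) - 1), 0
    for i in range(len(a) - 1, 0, -1):
        carry = (a[i] + z * carry) % curve_order
        q[i - 1] = carry
    return q

def prove(coeffs, z, y, g1_powers):
    return commit(quotient(coeffs, z, y), g1_powers)       # pi = [q(s)]_1

def verify(c, z, y, proof, s_g2):
    # check e(pi, [s - z]_2) == e(C - [y]_1, [1]_2)
    s_minus_z = add(s_g2, neg(multiply(G2, z % curve_order)))
    c_minus_y = add(c, neg(multiply(G1, y % curve_order)))
    return pairing(s_minus_z, proof) == pairing(G2, c_minus_y)

f = [4, 2, 1, 7]                            # arbitrary illustrative polynomial
g1_powers, s_g2 = setup(s=987654321, max_degree=len(f) - 1)
C = commit(f, g1_powers)
# Fiat-Shamir style challenge: hash the commitment together with the data
z = int.from_bytes(hashlib.sha256(repr(C).encode() + repr(f).encode()).digest(), "big") % curve_order
y = eval_poly(f, z)
pi = prove(f, z, y, g1_powers)
assert verify(C, z, y, pi, s_g2)
```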
And if you scan this QR code, you will get all of these links. Cool. Yeah. So I guess we can take questions on this. The first question, about opening the commitment multiple times and about the trusted setup, is mostly inaudible; the short answer is that the same commitment and the same trusted setup can be used to open the polynomial at as many points as you like. On the question about the probability of hitting a bad point: this is a cryptographic probability, which is why we set the security level to around two to the minus 128. In cryptography we already set our security parameter under the assumption that an attacker will throw a lot of computation at breaking it, two to the 50, two to the 60 or more of computing power, which is much, much more than the protocol itself will ever do, so this is all already covered by the cryptographic construction. You can try, but the probability of randomly hitting such a point is extremely, extremely low, less than two to the minus 200 or something like that; it's so low you can't even really picture it. Very good question. And on the question of how you distinguish something like x to the p minus x from the zero polynomial (they define the same function over Fp): you can't, but that's fine, because we always limit the degree of our polynomials. Our trusted setup only goes up to a certain power, here two to the 12, and inside that space only the zero polynomial is zero everywhere. If you had no limit on the polynomial degree it wouldn't work, but we always have a limit. Okay, I think we're at the end of the question time. Thank you. Thank you.

Excellent, excellent. Thank you, Dankrad, for the math. So, okay. Now we're going to come down from the sky of math and tone it down into the protocol stuff. I'm going to start with a small explanation of how all this math goes into our protocol and how all the extra bandwidth of 4844 travels around and gets verified. Then Proto is going to take it and tone it down into even more practical stuff, like how the L2s are going to use the data. And then Ansgar is going to tone it down even more and basically explain how people pay for this data.

So, this is a graph that shows, for Optimism, an L2, what its costs are. This blue part is the data fees, how much money they are paying for the data they put on chain, and the white part is other costs, but you can see that the blue part, the data, dominates all the costs. So basically 4844 is a mechanism that drastically increases the amount of data people can post on chain, and that's all it is, right? Okay, so what we want to do is increase the amount of data. In this very simple picture, on the left side you can see our data, which we call a blob, because it's a bunch of data that also corresponds to a polynomial.
And on the right-hand side you can see a small thing, a commitment, that represents that data, commits to that data. The rough idea is that the commitment goes on chain forever, whereas the blob is there for a bit and then disappears. That is the high-level strategy for how we increase bandwidth: we commit to data, we keep the commitments forever, but the data itself is ephemeral in a way.

Okay, so let's talk a bit about what this data is, what these blobs are, and how polynomials enter the picture. This is a polynomial; I think by now you're very familiar with it from the last talk. The question is: how do we put data into this polynomial? The basic idea is that you have these coefficients, a1, a2 and so on, and you can put data into each coefficient. So if you have some data, say one, four, one, six, you can put it into the coefficients and make the little polynomial at the bottom. That's a very straightforward way to put data into a polynomial. Now let's think about how numbers like one, four, one, six can represent real data. In our case the numbers are going to be finite field elements, so each coefficient is a number between zero and this insanely huge prime number. That's about 254 bits, which is about 31 bytes per coefficient. So a polynomial with 4,096 coefficients can store about 128 kilobytes by putting the data into the coefficients. So now we know a way to store 128 kilobytes in a polynomial. And that's kind of interesting, because right now rollups don't use anywhere close to that; maybe they use one kilobyte. So we're giving lots and lots of space, maybe even uncomfortably much space, for rollups to put their stuff in. But this is the whole idea of 4844. Of course, in reality we don't put the data into the coefficients, we put it into evaluations and then do interpolation, but that is not so relevant here. The idea is that we encode data into a polynomial, and we have polynomials that correspond to a big amount of data. That's a blob, right? And then we have KZG, which is what Dankrad was explaining for the past 45 minutes, which is basically a black box where you give a polynomial to the black box and it spits out a commitment; the commitment is tiny and the data is big. So you end up posting lots of data plus a small commitment on chain. That's the rough idea.

Just to talk a bit about what the network is supposed to do when this data travels: when you see a commitment that corresponds to lots of data, the network needs to make sure that the data corresponds to that commitment. The basic thing to do there, the basic strategy for the verifier to make sure that someone is not fooling us by giving us a commitment to different data, which would be catastrophic, is to commit to p of x yourself, using the same black box, and then check that the commitment the verifier computed matches the commitment the sender gave you.
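A minimal sketch of that packing step: raw bytes are cut into 31-byte chunks, so each chunk is guaranteed to be smaller than the ~255-bit field modulus, and each chunk becomes one field element of the blob. The constants are the ones quoted above; the exact padding and evaluation-form encoding of the real spec are skipped.

```python
# Sketch: packing raw rollup data into the 4096 field elements of one blob.
BLS_MODULUS = 52435875175126190479447740508185965837690552500527637822603658699938581184513
FIELD_ELEMENTS_PER_BLOB = 4096
BYTES_PER_FIELD_ELEMENT = 31          # 31 bytes < 255 bits, so always a valid field element

def bytes_to_blob(data: bytes) -> list[int]:
    max_bytes = FIELD_ELEMENTS_PER_BLOB * BYTES_PER_FIELD_ELEMENT   # ~124 KiB of payload
    assert len(data) <= max_bytes, "data does not fit in one blob"
    data = data.ljust(max_bytes, b"\x00")                           # naive zero padding
    elements = []
    for i in range(FIELD_ELEMENTS_PER_BLOB):
        chunk = data[i * BYTES_PER_FIELD_ELEMENT:(i + 1) * BYTES_PER_FIELD_ELEMENT]
        element = int.from_bytes(chunk, "big")
        assert element < BLS_MODULUS
        elements.append(element)
    return elements

blob = bytes_to_blob(b"example rollup batch data")
```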
So recomputing the commitment is a pretty straightforward way to verify that a polynomial matches its commitment. But then we have more data and more commitments: a transaction can have lots of blobs, and a block can have lots of those transactions. And that starts getting quite expensive, because it's about 50 milliseconds for each commitment, and it scales linearly, so it ends up being quite expensive, especially for the mempool and that kind of thing. So in the end, what we use is KZG proofs and the whole random evaluation trick that Dankrad taught you before. For each blob and commitment we also include a proof of a random evaluation. Basically, the proof is a helper that lets you do this small, cheap verification. I don't have enough time to go into the details, but the idea is that the proof tells you that the committed polynomial evaluates to y at z; you can also evaluate the polynomial you hold on the left side at z and get some other number, and if the two values match, you are certain that the polynomial matches the commitment. And this is much faster than computing the commitment manually. It's not my intention to go very deep into this; I'm just giving you some idea of how KZG is used in the protocol. So I'm going to stop here, stop with the cryptography, and pass it over to Proto, who is going to go a bit deeper into the actual system.

Hello, everyone. So let's talk about blob usage. With EIP-4844 we're introducing a new transaction type to confirm these blobs on the EVM chain. Something to note is that there's a new concept here: we have a transaction type with data outside of the transaction, and that data is now a responsibility of the consensus layer. Otherwise it's like a regular EIP-1559 transaction, except the transaction contains some pointers, hashes really, that commit to the blob data. This is the transaction in a bit more detail. Something else to note is that it's not RLP-encoded but SSZ, so it merkleizes nicely and is better for layer 2. And note here that we have these data hashes, obtained by hashing the KZG commitments, which in turn commit to the full blob data. These data hashes are available in the EVM through an opcode, whereas the blob data lives outside of the EVM. So the blob content, unlike calldata, is not available in the EVM. Eventually we can prune this blob data: it's not a long-term commitment to store all of it; rather, we are introducing blob data purely for its availability properties. Layer 2 needs this data so that users can sync the latest state permissionlessly, without communicating directly with the sequencer or whichever operator exists on the rollup, and then people can reconstruct the latest state. We can have a different solution for retrieving very old state, like from a month or two weeks ago. So there's this separation of the data and the transaction itself.

This is what the lifecycle looks like. As a layer 2 user, you submit a transaction. Then we have bundling: as a layer 2, we often combine transactions, so you can pack them, compress them, and so on; this is a task for the rollup operator. And then, as a rollup operator, you publish your bundle to layer 1 with this new transaction type.
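The "data hashes" mentioned above are versioned hashes of the KZG commitments. A sketch of how that mapping works in the EIP (a version byte followed by the tail of a SHA-256 of the commitment), with the commitment bytes here being a dummy placeholder:

```python
# Sketch: turning a KZG commitment into the versioned hash carried in the transaction.
import hashlib

BLOB_COMMITMENT_VERSION_KZG = b"\x01"   # version byte, so the scheme can be swapped later

def kzg_to_versioned_hash(kzg_commitment: bytes) -> bytes:
    # versioned hash = version byte || last 31 bytes of sha256(commitment)
    return BLOB_COMMITMENT_VERSION_KZG + hashlib.sha256(kzg_commitment).digest()[1:]

dummy_commitment = bytes(48)            # a real commitment is a 48-byte compressed G1 point
print(kzg_to_versioned_hash(dummy_commitment).hex())
```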
And then, in the transaction pool, we have both the transaction itself and the wrapper data with the actual blob content. The layer 1 beacon proposer creates a block, and the blobs make their way from the transaction pool in the execution layer to the consensus layer. At this point the blobs don't go back into the execution layer; they are just the responsibility of the consensus layer. The peers on the beacon chain sync the blobs, bundled together with the blobs from other transactions, as a sidecar. The execution payload stays on layer 1, whereas the blobs stay available for a sufficient amount of time to secure layer 2, and can then be pruned afterwards. So blob data is bounded.

This is what it looks like at the network level. We have the layer 2 sequencer communicating with the transaction pool, the execution engine communicating with the beacon proposer, beacon nodes syncing the blobs with each other, and then there's the split of the data: the other beacon nodes take the execution payload and process the EVM and everything (fees are processed by everybody), but the blobs stay in the consensus layer until a layer 2 node retrieves them to reconstruct the layer 2 state.

So how do rollups work with this? Dankrad already explained the proof-of-equivalence trick, so I'll just give a simplified overview of how we do this in the EVM. We introduce two new things in the EVM: an opcode and a precompile. The opcode simply retrieves the data hash, this hash that is part of the transaction, just like the hashes in the access list of the Berlin transaction type; it can be retrieved through the opcode and pushed onto the stack. Then there's the precompile, which you can provide with a proof to verify that certain data at a certain position matches the blob content committed to by the data hash. In the case of a ZK rollup, we use this precompile to do a random evaluation and prove that the data the rollup is importing is equal to the data the blob is introducing. This precompile is versioned, so we can change the commitment scheme, and in the future, I hope, we can use it for other kinds of verification as well. Then this is part two. Yes? Can you evaluate the blob itself with the opcode? Right, so going back: all the inputs to the precompile are passed in as calldata. The blob is completely separate; it's not involved in any of this computation. It's just calldata you pass in: a proof, the point you're trying to verify, and the commitment that hashes to the data hash we retrieve from the opcode, and then the precompile verifies everything. Separately, we have the ZK state transition that needs to be verified; that is all ZK-rollup specific, up to you to design. But with the data verified and the ZK proof verified, we can get some outputs that we can persist and then use to enable withdrawals.

Then there's the version for interactive optimistic rollups. Interactive optimistic rollups use a concept called a preimage oracle, where we do not access all the data at the same time, but rather load preimages one at a time. By bisecting an execution trace, we only really have to do a proof for a single step, a single execution of a single VM instruction. And that step might be loading some data.
For example, we start with a layer 1 block header hash, then we retrieve the full block header as a preimage, then we retrieve the transactions by digging into the Merkle tree commitment behind the transactions hash. From the transaction we can get the data hashes, and from a data hash we can get the KZG commitment. At that point it's not a regular hash commitment anymore, but a different type of commitment with the same kind of oracle, where we load one point from the blob that is committed to by the blob transaction. And this way we can load all the data into the fraud-proof VM. Okay, yeah.

Hello everyone. I'm going to talk a little bit about how, now that we will hopefully have this functionality in the future, you can pay for it, but also about something more conceptual: the Ethereum blockchain is already kind of pushing its limits, so where, resource-wise, is the extra space for this? Where does the efficiency gain come from? To understand that, we first have to look in general at how resource pricing on Ethereum works today. This is just my way of categorizing the different resources we have on Ethereum: things like bandwidth, compute, state access, memory, state growth, history growth, and so on; this is a non-exhaustive list, but these are the kinds of things that actually cause effort for nodes while they are processing a transaction. And if you squint at this hard enough, you'll notice that there are basically two different types of resources here, which we call burst limits and sustained limits. Burst limits are things that cause costs right at the moment a block is propagated: the bandwidth to propagate the block, the compute to actually verify it. The critical point is that these have to be bounded in order for blocks to still be propagated in a timely manner, and for nodes to be able to verify them at all; otherwise they might run out of resources. The sustained limits don't matter so much block to block; they are things that accumulate over time, like state growth and history growth, where a single block can't really do too much damage, but over time they make it more and more costly to run a full node. As it turns out, if you look at this, there's some structure to it, and you can reorder it a little: there's usually a relatively good match between a specific burst limit and a specific sustained limit. Bandwidth and history growth kind of correspond: the bigger a block is, the more bandwidth you need to propagate it, but also the more disk space you need to keep it around for history purposes. And similarly with state access and state growth, and so on. Now, specifically for 4844, what we are introducing is a new type of data, so the resources we're talking about here are this first row: on the burst limit side it's bandwidth, how big blocks can get, and on the sustained limit side it's how many resources you need to store the history of Ethereum. And if you have looked into the EIP a little bit, you already know that there is a limit on this history growth.
We introduce a new mechanism where blobs are only stored for a single month. So on the history growth side, yes, there will be some extra requirement for node operators, but it's quite bounded, unlike normal history, which today is stored forever. Even after the nice EIP-4444, history will still be stored for a year, while blobs are only stored for a month. So in terms of sustained limits, 4844 has a very limited impact. The more interesting, and also trickier, side of this picture is the burst limit, the bandwidth. To understand the situation and how 4844 fits in, we first have to remember that today on Ethereum we basically have a single gas price. Whenever you send a transaction, you don't actually specify how much bandwidth you want to use, how much compute, how much memory. You just specify one gas limit and what base fee you're willing to pay, and it all gets mapped down into what you can think of as a single pricing dimension. And that comes with very real trade-offs in terms of resource efficiency. Look at this stylized picture of just two different resource dimensions; they could be data and compute, or data and memory, whatever. The way Ethereum gas works today (and that's purely for simplicity, because one dimension is very simple for users to deal with) is that those two resources compete for space in a block. If a block is very full of compute, there's very little room to put any data in it, and the other way around. If you open Etherscan and look at the detail page of any block, it actually gives you the size of the block, and usually it's something like 50 kilobytes, 100 kilobytes, rarely more than that. But if you look at what a block would look like if it were just full of calldata, which is where all the data comes from today, so if you were all the way on the lower part of the diagram with resource B being data, it could actually be up to one or two megabytes in size. What that means is that we determined in the past that roughly two megabytes per block is safe, and the resource is basically sitting there, but an average block almost completely underutilizes it. And that is again just because it's conceptually simpler for us to price things in one dimension; most of the time we are very far up the slope there. So where do we want to be? What would be the most efficient way of handling resources? That would be this picture: ideally you want these resources to be independently consumable, so that you can consume the highest amount of data that we think the chain can safely handle, and at the same time still do state access, or use memory, or whatever, up to the highest safe amount. There should not be this competitive relationship between them. And this is where 4844 gets its efficiency on the burst limit side. Full sharding, which we'll hear a bit more about after this,
does some really clever things where people only sample the data, so the bandwidth constraint goes way down. But for 4844 there is no fancy trick: everyone still downloads all the data, so it is a very real bandwidth strain. The innovation on the burst limit side is purely about getting to that upper-right point, making it so that the existing resource we already have today is utilized more efficiently. And the way we do this is by going from one-dimensional pricing, which is what we have today, to two dimensions.

And this is how that looks. This is an open PR right now; it's not yet merged, so small details might still change, but I think the general direction is pretty set. The idea is that we introduce what we call data gas. As you can tell from the name, it's not blob gas but data gas: the aspiration is that maybe in the future we can expand this to cover the entire data dimension, but for now it's only used for blobs. And we set it so that one byte of blob costs one data gas. This data gas, importantly, is priced completely independently from normal gas. It has its own EIP-1559-style mechanism, and I see Marius is not very happy about this, because he has to implement it at the end of the day, but this is really important for the EIP, because without it you wouldn't be able to get to this more efficient bandwidth usage.

So what does it look like, and how can you think about it? Well, it's similar to how 1559 already works. This slide is courtesy of Proto; I stole it. Every column here is a separate slot. In this example, the target number of blobs is two and the maximum allowed per block is four. The first block comes in with exactly two blobs, so nothing happens. The next one has three; the red one is one too many, so the price goes up. The next two are at the target again, so the price is stable. Then one block misses a blob, so the price goes back down. So it's just like 1559, which you know and love.

Under the hood it works a bit differently, though, so here's a closer look at the details. First of all, we have a max data gas per block, similar to 1559, and the target is half of that. Blob transactions specify an additional max fee per data gas field: how much they are maximally willing to pay per data gas to have the transaction included. This does introduce a little bit of extra complexity for users, but the users in this case are not ordinary users, they are big rollups, so having to specify one more value is fine; if you can't do that, maybe you shouldn't be in the rollup game. And to keep the complexity to a minimum, we did not opt for a separate tip for this dimension; we just reuse the existing tip.
And then there's one thing where we deviate from 1559 a little bit. In 1559, if demand were to completely crash, one gas could theoretically be valued at as little as seven wei, which is just the minimum below which updates don't go any lower, so a transaction would basically be free. We don't quite want the lowest-demand case for blob transactions to be completely free, so we set a minimum data gas price that is at least somewhat meaningful: about 10 to the minus 5 ETH per block. Of course it's priced in data gas, but if you compute the cost of a full block it comes down to roughly that value. And the last thing, and again this is very technical, so if you just want to understand conceptually how this works, don't worry about it, but if you ever pull up the EIP you might stumble across it and be confused. In 1559 we currently track the base fee directly and update it every block, and after we introduced it, it turned out to be slightly conceptually ugly: the updating has some properties we don't quite love, it's a little path-dependent, these kinds of things. So for this dimension we moved to a conceptually simpler way of tracking: we track the excess data gas that has been used over the existence of the EIP. We have some target amount that we want to be used per block; every block that uses more than that adds to this counter, and every block that uses less than that, while the counter is still above zero, reduces the counter. Yeah, sure: is that a header field? Yes, it's basically a header field, just like the base fee, so this is one additional header field, good question. And you can actually see it here, because I wanted to give you an impression of what calculating the cost looks like with this header field. As you can see, we have these mock functions. To get the fee that a transaction actually has to pay, it depends on the header of the previous block, just as in 1559. You first get the total data gas that the transaction consumes, which is just the data gas per blob times the number of blobs. Then you calculate what is basically the base fee, but we don't call it a base fee, because there's no tip, so the base-fee-versus-tip distinction is unnecessary; we just call it the data gas price. For each block you calculate its data gas price once, and you do that by taking this excess data gas and feeding it through this fake exponential function. That's a nice little tidbit; maybe it's irrelevant, but it's still fun to talk about briefly. Maybe I can already go to the next slide to explain. This is how the pricing develops; it's like 1559, right?
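A hedged sketch of those two pieces, the excess data gas counter and the fake exponential used to turn it into a price. The functions below follow the shape of the mechanism described here; the exact constants in the EIP have changed over time, so the values used are purely illustrative.

```python
# Sketch of the EIP-4844-style blob fee mechanism (illustrative constants only).
DATA_GAS_PER_BLOB = 2**17                              # one blob's worth of data gas
TARGET_DATA_GAS_PER_BLOCK = 2 * DATA_GAS_PER_BLOB      # target: 2 blobs per block
MIN_DATA_GASPRICE = 1                                  # price floor
DATA_GASPRICE_UPDATE_FRACTION = 2225652                # controls how fast the price reacts

def calc_excess_data_gas(parent_excess: int, parent_data_gas_used: int) -> int:
    # The counter goes up when a block uses more than the target and
    # back down (never below zero) when it uses less.
    if parent_excess + parent_data_gas_used < TARGET_DATA_GAS_PER_BLOCK:
        return 0
    return parent_excess + parent_data_gas_used - TARGET_DATA_GAS_PER_BLOCK

def fake_exponential(factor: int, numerator: int, denominator: int) -> int:
    # Integer-only approximation of factor * e^(numerator / denominator),
    # via the Taylor series of the exponential.
    i, output, numerator_accum = 1, 0, factor * denominator
    while numerator_accum > 0:
        output += numerator_accum
        numerator_accum = (numerator_accum * numerator) // (denominator * i)
        i += 1
    return output // denominator

def get_data_gasprice(excess_data_gas: int) -> int:
    # The price is a pure function of the excess counter, so it does not matter
    # when the excess was accumulated.
    return fake_exponential(MIN_DATA_GASPRICE, excess_data_gas, DATA_GASPRICE_UPDATE_FRACTION)

# For example, the fee for a 2-blob transaction at a given excess level:
excess = calc_excess_data_gas(parent_excess=0, parent_data_gas_used=4 * DATA_GAS_PER_BLOB)
fee = 2 * DATA_GAS_PER_BLOB * get_data_gasprice(excess)
```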
So basically, if you were to keep using up all the data space in a block, not just the target, the price would follow an exponential curve and get more and more expensive. And you can see that at around a thousand excess blocks (that's roughly, I don't know, ten minutes or so) you'd already have really expensive blocks if it kept being fully used.

Yeah, Marius again? So the nice thing about this is that it's a pure function of the excess data gas, so it doesn't really matter whether we're at the beginning or later on. And there will probably be a difference: in the beginning it will probably be relatively cheap to use data gas, because the rollups are still in the process of adopting it and there's not that much demand, so for the first month or so we'd probably be in the very zoomed-in left part of this picture. Later on, once it's fully adopted and people use it, we'll be a bit more towards the right. But because it's so reactive, similar to 1559 doing at most a 12.5% update every block, you can go from one of these regimes to the other within about five minutes of high-usage or low-usage blocks. So it's not something where what happened in the past matters immensely: high consumption in the past only means you get something like five minutes of reduced block usage before you're back to your normal price level. So there's no significant accumulation effect or anything.

Right? Sure, no, but the way to think about it is that the price is a pure function of the excess data gas. I put it in terms of excess blocks just to make it easier to think about, but it's tracked in excess data gas. Once you reach something like, say, a thousand excess blocks, that would mean sending one blob already costs 30, and it doesn't matter whether that excess was accumulated over one day or over a year; once the excess data gas field reaches that value, that's what a blob costs. We would of course expect that rollups are not willing to pay that much for blobs, so if for some weird reason there were a spike in demand and the excess shot up to that level, it would quickly come back down and settle somewhere. The excess is not something that keeps growing over time. It's just like the base fee: the base fee doesn't just grow over time, it finds its equilibrium value, sometimes goes up and sometimes goes down temporarily, but it hovers around some 10-to-100-gwei level. Similarly, the excess can go back down: if a block uses less than the target, the number goes down. So it will just find some equilibrium value that corresponds to some equilibrium price and hover around that.

Yeah, I'll keep going; I feel like we're out of time, unfortunately, for the, sorry, fee market section. Come find me after for continued questions.
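A tiny self-contained simulation of the two points just made: the price is a pure function of the excess, so the history that produced a given excess is irrelevant, and a handful of under-target blocks drains a spike back down. All constants here are made up purely for illustration.

```python
# Illustration only: made-up constants, price in arbitrary units.
import math

TARGET = 2             # target blobs per block, as in the earlier example
UPDATE_FRACTION = 8.0  # made-up constant controlling how fast the price reacts

def price(excess_blobs: float) -> float:
    """Price as a pure function of the accumulated excess (blobs above target)."""
    return math.exp(excess_blobs / UPDATE_FRACTION)

def final_excess(blob_counts) -> float:
    """Run the per-block excess update over a history of blob counts."""
    excess = 0.0
    for n in blob_counts:
        excess = max(0.0, excess + n - TARGET)
    return excess

# Two very different histories that end with the same excess...
fast_spike = [4] * 5     # five completely full blocks in a row
slow_drift = [3] * 10    # ten mildly over-target blocks
assert final_excess(fast_spike) == final_excess(slow_drift) == 10
# ...pay exactly the same price afterwards: only the excess matters.
print(price(final_excess(fast_spike)), price(final_excess(slow_drift)))

# And a few empty blocks drain the excess, bringing the price back to the floor.
print(price(final_excess(fast_spike + [0] * 5)))   # exp(0) == 1.0
```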
So anyway, this is basically how we make 4844 work: for history growth it's not a big deal, but for bandwidth we really need to put in work to make it work. And this is where the core innovation of the EIP lies for now, other than it being forward-compatible with full danksharding. For now, this two-dimensional fee market is really why we can do this and why we can utilize existing Ethereum resources more efficiently. And with that, I think we're done with the 4844 part of today and we can move on to full danksharding.

Okay, I now want to introduce the two-dimensional KZG scheme, which we will need for full sharding. Sorry, this is a big jump. So when we do full sharding, why don't we take all the data we want to encode and put it into one big KZG commitment? The reason is that this would require a super node, some powerful node that you probably can't easily run at home unless you have a very good internet connection and want to invest some money into it. You would need this both to construct blocks, where we're probably kind of okay with it, and to reconstruct the data in case there is a failure, and that is an assumption we want to avoid on the validation side. It's more acceptable if a failure just means we can't construct blocks, or we have to make smaller blocks, or blocks without sharded data. But it would be really bad if the absence of a super node could lead to a network split, where some people think data is available and some people think it's not. That's what we want to avoid. So what we want is a construction where, yes, there will be a lot of data in the network, and maybe someone needs to be specialized in distributing that data, but once they've done their job, a very decentralized network of, say, Raspberry Pis at home can guarantee that it will always converge, that it will always be safe, and so on.

Okay. So what if we just use many different KZG commitments, just a list of them? If we do this naively, with many commitments and sampling from each, then we need a lot of samples: instead of the, say, 30 samples we had before, we now need 30 samples per commitment. That would be a lot of samples. But there's another, much cooler way of doing this, where we use Reed-Solomon codes again: we take the M commitments for the actual payload and extend them to 2M commitments. Here's how this works. We have our original data commitments, in this case four commitments, and we define another four commitments that are an extension of these, completely determined by the original data commitments.

And here's the math. We define a two-dimensional polynomial for the data, and it works the same way as before: we interpolate this polynomial, defining it by the data region, the original data that comes from many different transactions that included sharded data. For simplicity I'll just say that row k is the evaluation of this polynomial where we set y equal to the number of the row, y = k. So we evaluate the polynomial at k and get a one-dimensional polynomial: f_k(x) equals this, and if you pull it all together, what you get is again an expression just in these powers of x.
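Restating that definition compactly; the notation here (d, f, f_k) is mine, not necessarily the slides':

```latex
% d_{i,k} is the original data region; f(x, y) is the low-degree bivariate
% polynomial interpolating it, so that f(i, k) = d_{i,k} on the original region.
f_k(x) \;=\; f(x, k) \;=\; \sum_i a_{k,i}\, x^i
% Fixing y = k gives row k as a univariate polynomial in x, i.e. an expression
% purely in powers of x, which is exactly what gets committed to in the next step.
```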
And then we can commit to those row polynomials in our normal KZG way. So we have f_k(s) equals this: we replace the x by s, there's some complicated sum in there, but overall we get one elliptic curve element, this bracketed evaluation, and we call it C_k. Now the cool thing is, if you look at this expression as a function of k, it is also a polynomial; it's just a sum of terms in powers of k. This is very cool, and it means the commitments themselves lie on a polynomial. If we view the commitments, which are elliptic curve points, as a function of k, they are on a polynomial. So before, we started with each row being a polynomial that we commit to; we also have that each column is a polynomial, since this really is a two-dimensional polynomial; but in addition, the commitments themselves lie on a polynomial, in this case of degree three, because they're determined by these four commitments.

So here's how the 2D commitment scheme works: we have 2M row commitments, and, this is the cool thing, anyone who validates these commitments can easily verify that they are on this polynomial, using the random evaluation trick I introduced earlier. We take the first M commitments, interpolate the polynomial through them and evaluate it at a random point, and we do the same for the second M commitments. If these two give the same result, which in this case will be an elliptic curve point, then they are actually on a polynomial of degree M minus one. For those who are interested, there is also a way to do something very similar with a single commitment to the whole 2D polynomial, but I won't go into the details here; there are some downsides, which is why we're not choosing that route.

Cool. And why are we doing this? We keep the properties we already had: we can verify all samples directly against commitments, and no fraud proofs are required. But now we only need a constant number of samples across all these commitments to get probabilistic data availability. And we get the property that if at least 75% of the samples are available, then all the data is available, and it can be reconstructed, and this is the cool part, by validators or other nodes that only observe rows and columns. So nobody in the system ever needs to watch the full square of samples (I'm sure such nodes will exist, but it's not necessary) in order to get these convergence properties. What you'll notice is that this threshold is a bit higher than before: with a single commitment we only needed 50% of the samples to be available, while for the square we need 75%, so the number of samples you need will be a bit higher.

Cool. And what we get with this: I made a proposal, and this is all still under discussion, but one idea for extending this to a full sharding construction is that validators use it by downloading rows and columns; they each choose two of each at random. Then, if a block is unavailable, it can't get more than 1/16 of attestations, so the consensus will automatically never vote for unavailable blocks.
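Here is a toy sketch of that random-evaluation consistency check. To keep it self-contained it uses scalars in a small prime field in place of the elliptic-curve commitment points (in the real scheme the identical Lagrange interpolation is carried out on the curve points C_0 through C_{2M-1}); the modulus and data below are made up.

```python
# Toy sketch of the random-evaluation check on extended row commitments, using
# field scalars instead of curve points. All numbers are made up.
import random

P = 2**31 - 1  # a prime modulus standing in for the scalar field

def interpolate_eval(xs, ys, r):
    """Evaluate at r the unique polynomial of degree len(xs)-1 through the
    points (xs, ys), via the Lagrange formula over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (r - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

M = 4
# A "row commitment polynomial" of degree M-1 = 3 (made-up coefficients),
# evaluated at positions 0..2M-1 to stand in for the 2M row commitments.
coeffs = [7, 11, 13, 17]
def poly(x): return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
commitments = [poly(k) for k in range(2 * M)]

# The check: extrapolate the first M and the last M to the same random point.
r = random.randrange(P)
lhs = interpolate_eval(list(range(M)),        commitments[:M], r)
rhs = interpolate_eval(list(range(M, 2 * M)), commitments[M:], r)
assert lhs == rhs  # consistent: both halves lie on one degree-(M-1) polynomial

# Tamper with one extended commitment and the check fails
# (except with negligible probability over the random point r).
bad = commitments[:]
bad[-1] = (bad[-1] + 1) % P
assert interpolate_eval(list(range(M)), bad[:M], r) != \
       interpolate_eval(list(range(M, 2 * M)), bad[M:], r)
```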
And at the same time, validators can use the full rows and columns they download to reconstruct any incomplete rows or columns; if any samples are missing, they will reconstruct them. And because there are intersections (each validator with two rows and two columns has four intersections), they can help repair the orthogonal rows and columns that are missing samples. As an example, I did a computation showing that with about 55,000 online validators you get guaranteed reconstruction: every sample will always be reconstructed, provided enough data was initially available to do it. And in practice the number of nodes needed is much smaller, because most nodes don't run one validator but tens, and some even hundreds.

Data availability sampling is then just checking random samples on this square. Again we want the probability that an unavailable block passes to be less than 2 to the minus 30, and if you do the math (for an unavailable block each sample succeeds with probability below 3/4, so you need (3/4)^n < 2^-30, which gives n around 73), you find that you need about 75 random samples. The bandwidth to do that in this example, with 512-byte samples, would be about 2.5 kilobytes per second, which is a really nice low number. Cool. Okay, handing it over to Danny.

Okay, so there's a lot of math there, and it's an elegant construction: assuming we can do a constant amount of work for a large amount of data, we can layer it in as something similar to a validity condition on our block tree. We don't consider invalid blocks in our block tree; we don't consider unavailable blocks in our block tree. So the math and the construction are very elegant, but when the rubber meets the road, data availability sampling on the networking layer is actually a non-trivial problem. That's the wrong way; ah, the arrow that goes right is so worn it doesn't look like an arrow anymore.

Okay, so stepping back: why is this hard? Why are we making this problem hard for ourselves? Everyone's seen this. It's not fundamental that scalability, security, and decentralization cannot come together in one system, but it is hard. And it's hard primarily because we want home nodes to be able to run; we want standard computers to be able to validate the system, to get security in aggregate even against a malicious majority of our consensus participants. Again, it's that validity condition: if there's an invalid block and all validators or miners say that's the head, you say, well, that's not really the head, because it is invalid. So users, in the end, define what the network is. Similarly, we want to do that with our bandwidth considerations with respect to data availability, and thus we need to focus on the bandwidth here.

A lot of this is a quick recap; we've been talking about it all day. We need to scale execution and we need to scale data availability. Rollups essentially give us a sort of compression algorithm for the execution of transactions, whether through fraud proofs or validity proofs. For data availability, we use DAS, data availability sampling, or we want to. Data availability, as we've discussed all day, means that no network participant, including a colluding supermajority of full nodes, has the ability to withhold data. Again, it makes data availability a validity condition. That's already the case today, as we noted: you have to download full blocks. But once it's a lot of data and we want those home nodes, it becomes very hard.
Right, so again, we want the amount of work to not really scale as blocks become very large, so that we can scale the network. So, data availability: assurance that data was not withheld, assurance that data was published. Real quick shout-out: Dankrad made most of these slides for another talk, and I'm just reusing them. Important to note, it's not data storage; it's not continued availability. There's a debate about how long the network needs to keep the data available so that people can check it was made available: some people say on the order of, where are we at, like 100 seconds; some people say two weeks. It depends on the use case, and it's more of a UX debate; it comes down to how long people have to be online to get this security guarantee without trusting someone else.

Is it important? I don't think we need to get into that too much. For optimistic rollups and ZK-rollups it's critically important, and who knows, the utility of solving this problem might extend beyond those two types of systems.

So, networking is hard, and we're probably making it even harder on ourselves with some of our assumptions. We could say: okay, we certainly want block producers and consensus nodes to not be fooled by a malicious majority, but maybe we just have a neutral P2P network and we can assume that P2P network is healthy and gives us what we want. This is certainly attractive: it ensures that each node really gets the statistical security. But if we're assuming that the validators can be malicious, a very high fraction of them, at least two thirds, and some people like to say 99%, depending on the construction, then the assumption that the network is neutral is probably not realistic. Well, maybe it's realistic in most scenarios, but if we really want to harden against that majority adversary, we need to think about an attacker-controlled P2P network, up to some threshold, however we define it. Again, this is a lot of exposition of the problem rather than a full solution. If I'm designing data availability sampling, it's probably interesting to think about what a good solution looks like on a neutral network, but when the rubber meets the road, we need to think about what thresholds we can actually harden against on a heavily attacker-controlled P2P network. In that model, certainly some nodes can be fooled, so it ends up being a collective guarantee, depending on the thresholds and how the system is tuned: rather than "no node can be fooled", it's probably going to look like "no more than a certain threshold of nodes can be fooled, maybe for a certain period of time, maybe until the network resolves itself". So this is the likely correct model, but it does make the problem harder.

So, the P2P problem: what are we trying to do here? We want a P2P distributed data structure that can reliably serve samples, so people can do their job of getting them. We want low overhead on nodes from multiple perspectives: on nodes that are pulling down samples, but also, because we potentially want to leverage nodes that are not just validators and not just builders in this distributed P2P structure, on the nodes participating in serving and disseminating the samples as well.
We want to be robust against attacks. One of the really scary things here is liveness attacks, DoS, Sybil attacks, and so on at the network layer, because if a majority of nodes see data as unavailable, either temporarily or permanently, then they cannot follow the chain at all. Again, we want this to be essentially a validity condition: if there's an invalid transaction in a branch, I don't follow the branch; if a branch is unavailable, I don't follow the branch. So that is a very important, critical requirement, but also a terrifying one, meaning it is very important that these P2P structures are hardened, that we understand their failure modes, where they operate, and how they recover after an attack. And we need low latency, on the order of seconds. I have a page of desiderata I'll get into in a second.

There are some distinct challenges when you think about this problem. Dissemination into the P2P structure: we have a lot of data, so how do you efficiently get it into the structure without causing high load on its individual nodes? If every node only needs one-hundredth of the data but they had to touch 50% of the data to get it disseminated, we're missing something. Similarly, we want to support queries of the disseminated data samples for some amount of time, which I'll get to in the desiderata. And validators, with their row-and-column crypto-economic duty, can identify and reconstruct missing data, but we should probably also consider whether the P2P structure itself should be able to identify and reconstruct missing data. So there are two kinds of potential reconstruction we might want. Validators are very incentivized, right out of the gate, to repair and patch anything missing from their rows and columns. But if, say, the P2P structure is supposed to serve data availability sampling for one week, are those validators the same people who will identify and reconstruct missing data later, or is there some other, more distributed and less time-critical method to do so?

There's a handful of actors involved in data availability sampling. Francesco is going to talk about builders and where they fit into the consensus protocol; they're the original source of the data, highly incentivized to get it out, but probably not someone you'd want to rely on in perpetuity. Validators are crypto-economically incentivized actors we can try to leverage in this construction: they have the rows and columns, and they also perform data availability sampling like a user node. And then we have users. Users perform data availability sampling, and hopefully they can also be leveraged in serving samples, making the whole P2P structure more resilient.

Some quick desiderata. If I were building data availability sampling right now, these are some target numbers, but I would also be sweeping these numbers and understanding where things work and where they don't. Data size: 32 megabytes per block, that's per 12 seconds, or, if the slot time were adjusted, per some other number of seconds, call it 16 or 20. With the 2D erasure coding, that ends up being 128 megabytes of data disseminated into the network.
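A quick back-of-the-envelope on those numbers, assuming 512-byte samples and 12-second slots; these are the talk's target figures, not decided parameters.

```python
# Back-of-the-envelope on the desiderata above. Target figures from the talk,
# with an assumed 512-byte sample size and 12-second slots; nothing here is final.
ORIGINAL_DATA    = 32 * 2**20         # 32 MiB of payload per block
EXTENDED_DATA    = ORIGINAL_DATA * 4  # 2D erasure coding doubles each dimension: 128 MiB
SAMPLE_SIZE      = 512                # bytes per sample (assumption)
SAMPLES_PER_NODE = 75                 # roughly enough for ~2**-30 failure odds
SLOT_SECONDS     = 12

print("disseminated per slot:", EXTENDED_DATA // 2**20, "MiB")    # 128 MiB
print("samples in the square:", EXTENDED_DATA // SAMPLE_SIZE)     # 262144, i.e. ~250k
print("per-node sampling bandwidth: %.1f KiB/s"
      % (SAMPLES_PER_NODE * SAMPLE_SIZE / SLOT_SECONDS / 1024))
# about 3 KiB/s, the same ballpark as the ~2.5 kB/s figure quoted earlier
```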
Chunks: there are chunks and we sample the chunks, or there are samples and we sample the samples, on the order of 250,000. You can make them larger, but you still need the same constant number of samples, so you just end up with more overhead per sample. Dankrad said 75, something on that order; essentially we want to drive that failure probability down.

Sampling latency: validators right now really need to make decisions about what they see as the valid and available head on the order of four seconds. That could be tuned, depending on the constructions available to us, but if they could not regularly do data availability sampling on the order of four seconds, we'd have a problem. Users could have a more relaxed requirement, on the order of 12 seconds, a slot, or you could even consider them doing it on the order of epochs and optimistically following the head as available; there's some play in the constructions there.

Validator nodes: 100K is pretty optimistic; we probably have on the order of 4,000 today, so something on that order is the baseline. And then user nodes on the order of ten times that, especially once you start adding lighter-weight nodes, with statelessness in clients that might want to participate in this data availability sampling: 100K to a million user nodes. And if the user nodes cannot participate in serving samples, so that we only relied on incentivized actors like validators, then the load on those actors would actually scale with the number of user nodes they have to serve. So it's probably very important to tie users into the data structure itself.

Bandwidth assumption: I don't know, it's worth discussing. The ethereum.org website suggests a minimum of 10 megabits per second for a full node, and 25 megabits per second for a good one. I don't know who came up with that number; maybe it's a good place to start the conversation.

And then persistence. Obviously, as I said, data availability sampling is not for persistence; it's to ensure that data was made available. But if data was only available for half a second, nobody, or only a very small subset, would be able to prove to themselves that it was made available. So is it two epochs? Is it two weeks? There's much debate here. Ansgar, what was your recent number, is he still here? Okay: 10 minutes and an hour. I think some are more like a week or two weeks, and that actually changes the requirements on nodes, especially in terms of storage. My intuition is that the onlineness requirement for users that want to get their state transitions from ZK-rollups, or users that want to submit fraud proofs for optimistic rollups, is what dictates this. And so I feel like 10 minutes... oh man, I've got to get out of here. Okay, cool. So: debate. An hour seems short.

P2P designs. One easy thing you could do is just say there's a bunch of super nodes in the network; you connect to them, you do your DAS against them, and if they give you the samples you want, then things are available. This is, I believe, Celestia's current design, although I could only claim that statement was true a few months ago; I'm not sure about today. And you could potentially do something similar in Ethereum, where instead of each super node needing to have everything, you leverage Ethereum validators and the rows and columns that they custody. It looks kind of similar.
This is nice: if you connect to one honest super node, you get what you need. But it doesn't really fit the node model, because a node running on the order of one, two, maybe three validators should be able to run on home resources, and with super nodes that's definitely not the case.

DHTs. DHTs are a nice way to distribute data into a distributed data structure across the network, a nice way to find data, and intuitively they seem like a very good direction, a very good start. They fit well because each node holds only a very small amount of data, and you get really nice scalability as you add more nodes to the network: depending on your redundancy factor, you can have similar or even less data per node. But they're prone to liveness attacks. It's really easy to Sybil this thing; it's really, really naive. You just mint node IDs, you fill the tables, and if you're a malicious node, you can just return entries from your table that are full of other malicious nodes. One thing that I think is very promising is looking at secured DHTs. Dankrad has been digging into S/Kademlia, and I believe there may be some others in this room who have looked at other papers about hardened DHTs. As long as you have a Sybil-resistant set, you can all of a sudden get certain guarantees in these constructions. So you could leverage the validator set, or maybe other types of crypto-economic sets, to build hardened DHTs. You could use a standard open DHT for average-case performance and maybe a secondary fallback DHT leveraging the validator set in case of attack. There's some weirdness there, because then you're assuming you have a certain amount of honest validators for this; does that suffice under the malicious-majority construction? Sure, you can probably choose the numbers, but you could also layer other types of crypto-economic sets, proof of humanity, Spruce ID, whatever the hell, all sorts of stuff, and have layered DHTs that are ultimately just fallbacks in the event that the big main DHT starts failing. Validator privacy, and optionality in how they construct their node setups, is probably very important too. I'm definitely over time. Okay, cool, great.

Hi, I'm Francesco, and I'll cover the last bit of this very large topic we've gone over today: proposer-builder separation. I expect most people will be somewhat familiar with it, so this will be a light introduction, nothing advanced, just to give you a picture of how it fits with danksharding, what it has to do with it in general, and how the roadmap for fitting it into the protocol looks.

So first of all, what is PBS? Let's start from the pieces: we have the P, the B, and the S. Block building, the B, is essentially the task of actually creating and distributing execution payloads. We have beacon blocks, but inside them there is the execution payload, which is the valuable part in some sense, the part that actually changes the state of the execution layer, and this is the part that requires some specialization to deal with, whereas the beacon block part is more of a consensus part.
Normally, today, we only think about the creating part, putting together a new execution payload, but the distribution part will also become critical, especially in the context of danksharding, because the distribution will involve the data that is committed to, which is eventually going to be very large. That's why it becomes an important task. And for these reasons, which we'll get into a bit more later, it's a quite specialized activity that we don't really want normal validators to do, because it would increase their requirements more than we're comfortable with.

And then there's proposing. Today, proposing includes both things: the consensus part of making a beacon block and including all the consensus messages in it, attestations, slashing messages, anything critical to the good functioning of the beacon chain, but also putting an execution payload in it. Today it's still possible for anyone to do this by themselves and hold both roles together. But if we ignore the execution payload part, this is really not a particularly specialized role, and we think it's always going to be possible, or we really want it to always be possible, with low requirements, basically what we expect a validator to have today. And the separation is just that these two things are split up: the default would no longer be that a proposer, which is a validator, does both things, but that the proposer does the beacon-block-relevant part, the consensus messages, and some other, specialized actor comes in with the execution payload and, eventually, the distribution of the data.

So why do we want to do this? I've already hinted at it, but it's simply that if we outsource the specialized stuff, we can keep the simple stuff decentralized. We can keep the really consensus-critical things done by a very decentralized validator set, which is a really important goal for Ethereum in general. And what, practically, are the things we want to outsource? The whole day we've been talking about danksharding, and there's nothing really fundamental about sharding that requires this outsourcing; you could imagine other models, and the original sharding model, before the "dank" part, didn't require it. But it's a major simplification, and not just a simplification: it also has consequences for latency. It gives us this really tight coupling between the execution payload and the blobs, and it streamlines the whole process.
And so with danksharding, if we do want these simplifications, we start having something to outsource, because the proposer has to compute these commitments really quickly, which is not easy on normal hardware, and probably the most prohibitive part is the distribution of the data to the network. That would require upstream bandwidth that is not acceptable for validators, probably multiple gigabits, an order of magnitude or more beyond what someone needs today, because the most you might need to distribute is 128 megabytes per block. We don't want to require that.

But again, this is not the fundamental reason. If there were no other reason we needed this separation, we might be a bit more skeptical about danksharding; we might think, well, we don't want these other actors, so why introduce them into the system just to get a simplification? That's not the ethos of Ethereum: we really want everything to be as decentralized and as resilient as possible, and these actors probably do introduce some complexities into that vision. The issue is that danksharding isn't the reason we introduce these actors. The reason is MEV, and that is a fundamental reason; I don't think anyone who has looked into MEV enough thinks there is any other way to go, essentially. The issue is simply that these execution payloads are really valuable, and extracting value from them is a really sophisticated activity from many points of view: algorithmic, infrastructural, potentially requiring very good hardware and a very good connection, since latency really matters, and also access to order flow. Today we can think of order flow, essentially access to mempool transactions, as more or less publicly available to everyone, but it seems very naive to assume that will be the case in the future, and already it's not quite true. Maybe there will always be a public mempool, for censorship-resistance reasons among others, but it's really naive to think everyone will have access to the same raw material, the transactions, to build blocks, and this access to order flow is a huge part of being able to create valuable payloads.

So there are all kinds of reasons why it's just not realistic to think that validators will be able to profitably build their own blocks, and there are these really strong centralization pressures if we don't provide them a way to have someone else do it well, which is the whole point of the separation. There are different ways this could happen. One way is that everyone just stakes with pools, because that's the only way they can extract value, though that actually seems like a scenario we can avoid. We already have PBS today. We usually say PBS and mean in-protocol PBS, where the protocol knows about the separation, has a concept of a builder, and in some sense negotiates this outsourcing, but today we basically have PBS that's just not in protocol. It's called MEV-Boost; probably a lot of you know it. Essentially, what it does is introduce a
trusted third party in between the builder and the proposer: these are the relays. I don't think I have time to go into the details, but essentially, we want builders to not have to trust proposers, and proposers to not have to trust builders; there are reasons for that. So we put a trusted third party in the middle that negotiates the exchange: the proposer wants something from the builder, the builder wants to get something to the proposer, and the trusted third party makes sure the exchange happens in a way where neither of the two parties needs to trust the other. This already exists today, and a lot of Ethereum blocks are built this way. It's the reality; it's not exactly something the Ethereum community set out to make, well, it is something the community made happen, but it was in some sense inevitable, because anyone could always build infrastructure of this kind, and people will use it if it's more profitable for them.

So we already have this, and we care about potentially putting this separation in protocol. As I said, relays really are trusted third parties, and we don't usually like having these sorts of entities in the protocol. They're not critical in some sense, well, if things are set up properly, and I think there are a lot of improvements to be made to the infrastructure that exists today, it's very young infrastructure, but either way there are always going to be some failure modes, or some requirements, that we don't really like from having these parties. One is that you have to whitelist them, because they're trusted, so everyone has to go and configure some list of these entities they're fine trusting. We don't mind builders doing that, but we don't really like validators having to do that, or, well, that's debatable. Anyway, I think there's a future where relays still exist and we just have a full fallback in protocol that is not the default, but that's a conversation for another time. Another thing is that today we don't really have live monitoring of relays: locally, people don't have a way to observe the interactions relays have had with other proposers and then disconnect from a relay if those interactions look suspicious. That's something we could add to really improve the resilience of this whole system, so that people don't need to go on Twitter to find out "this relay is malicious, I'm going to disconnect from it"; it could just happen locally. So there's a lot of room for improvement there.

But still, there's a fundamental, catastrophic scenario that seems unavoidable to me if we only rely on these entities for this outsourcing and have no fallback. Today you always have a fallback. The catastrophic scenario is that all the relays most people are connected to fail for some reason, whatever it is; they're malicious, they're attacked, anything can happen. Today that's fine: once you manage to disconnect, because you realize these people haven't given me blocks however many times I've tried, or because you have this monitoring system,
you just fall back to building your own blocks with geth or whatever other execution client you're running. So our liveness is not really threatened; maybe it's a temporary thing. But with danksharding, and also with statelessness in some sense, say all the validators are stateless so they cannot build their own blocks, or with danksharding they cannot distribute the data, this becomes a threat to liveness. Not exactly: you could still make blocks, you just couldn't put a lot of data into them. But you could argue that this really is liveness; if all the rollups have to stop because they don't have access to data anymore, that is not what we want.

Yeah, so this is one of the current ideas of what it could look like to put it in protocol. I probably can't go into it in much detail, we don't have time, but basically: as I said before, what are the relays? They're just actors that negotiate the exchange. So what do we do if we want to remove these actors? We have the protocol negotiate the exchange, and the protocol in this case is basically the other validators. There's a proposer, there's a builder, and we have the whole rest of the validator set, or more likely some committee, that observes the exchange and, with their attestations, makes sure that if the proposer tries to cheat the builder they fail, and vice versa. It essentially gives us the property that if the proposer accepts some block, or some bid you could say, from a builder, and latency is good, things are fine from a fork-choice perspective and a network perspective, then the proposer will get paid no matter what the builder does. If the builder reveals their block, good, that's the good case; if they don't reveal their block, or they're really late, tough luck for them, they still pay the validator and don't even get their building opportunity.

So that's one design. There's this other design, which is also interesting. Oh yeah, and thanks to Vitalik for all of this; I just took it from many of his ethresear.ch posts. This one you could basically call in-protocol MEV-Boost, because it's really designed to look like MEV-Boost. Again we have this party in the middle, this time more explicitly than before, which in this case is a committee (it also was before, in a sense), and this party again negotiates the exchange. We could think of the party as an availability oracle: its job is to give the proposer guarantees that what the builder sent is available. The proposer will accept a header, basically an offer of "I want to give you this block and pay you this much", and the builder will essentially erasure-code (hopefully, if you've followed the discussion, you know what erasure coding is by now) the execution payload for the committee: erasure-code it, then encrypt it, then split the pieces among the committee, so that if some threshold of the committee is honest and online they will be able to decrypt it, even if not all of the committee is. And the committee signs; essentially, individual members of the committee attest to the fact that they have their piece, so that if you see enough
attestations, and the committee is sufficiently honest, then you know, as a proposer, that this thing will be able to be decrypted and the data will be there. This actually fits quite nicely with these data availability discussions, because that really is the problem here: the proposer is accepting a bid, but the builder doesn't want to reveal what the bid contains, since that's their private, secret information, and we want some guarantee that, even if you don't know what it is, it is going to be there once the time comes, once you've accepted the bid and it's ready to go into the chain. So that's what it looks like.

And lastly, quickly, I want to comment on the censorship-resistance questions around PBS. I think they're fairly well understood. PBS, in or out of protocol (it isn't only an in-protocol question; also today) does degrade censorship resistance, but we already kind of know how to deal with that. There's this concept of inclusion lists, and slight tweaks to it; basically a really wide design space of, roughly speaking, ways for validators, or proposers, but you could just say validators, to make sure that transactions that should go into the chain eventually get into the chain, even if builders don't want that. This is also, by the way, a really important reason why we want decentralization of the validator set, because if you don't have that, you just don't have this option: if you have 100 validators and they don't want something to go into the chain, that's it, there's no automatic way around it, only soft forking or other out-of-band measures, whereas with a decentralized validator set we can always do this.

And inclusion lists are quite simple in some sense. There are disagreements about exactly how they should work, but they're simple today, given the property that it is easy for a validator to say "this transaction is available and this transaction is valid". The validity part becomes a bit harder with account abstraction, so there are some questions there, but we won't go into that; it's not really relevant here. The availability part becomes a bit harder with danksharding, because now there are all these blobs floating around the network, all this data that you're not supposed to download in full; you're only supposed to sample it. Okay, yeah, I'll just finish this thought and then I guess that's it. So with danksharding, determining availability becomes a bit harder, and we would like some kind of sharded mempool construction, so that even for things that have not been included in a block yet, you can still in some way determine that they're available without everyone having to download everything. At this point this might not need to be the default route that all transactions go through, and it probably won't be, for some other reasons I've already hinted at, like it being unreasonable to expect that everything will go through a public mempool, but it is always the fallback for censorship resistance, so we want to have some kind of construction like this. And I think that's it; we're out of time. Hopefully we still have time for some questions for everyone,
but otherwise that's it.

Okay, so we are obviously out of time, but we can still use this room for another 20 minutes, and we have a special guest: Vitalik is here to answer some questions as well. Okay, any questions? Oh, okay, there's a question.

Hello, thank you. How do you approach the topic of multiple relays in this danksharding ecosystem? There are many solutions, and I heard that with PBS it weakens censorship resistance, so how do you approach improving the mempool when there are multiple relays?

Relays are a concept that exists in MEV-Boost, the out-of-protocol PBS; they're not a concept that exists in in-protocol PBS. So the long-term solution is to not need to rely on them.

Hi, the erasure coding and Shamir's secret sharing scheme seem very related. They are; they're the exact same math. Okay.

Is network persistence for the blobs going to depend on finality? I would have expected that to be the case, and it would therefore completely rule out these notions of keeping them for only five minutes. What do you mean by dependent on finality: that if we're not finalizing, we need to keep the blobs for longer? Oh, I see, so like if you're in the middle of an inactivity leak. That probably makes sense. I personally favor blobs being around long enough, at least a month or so, that it's longer than any realistic inactivity leak, but there are different approaches.

Just on the setup: was there a consideration, or is it even possible, or what's the problem, of actually making that a separate system? It reminds me a bit of Swarm as it was integrated into Ethereum nodes. Wouldn't it be possible, with precompiles and the right new EVM opcodes, to make it an independent system?

So the reason we need data availability sampling in consensus, and why it's so different from IPFS and everything else out there, is that we want to actually have consensus on the fact that the data is available. IPFS does not provide that: there are ways to upload files so that some people think they're available and other people think they're not. For regular file publishing that's fine, because if a file is half available you just publish it again, but for rollups you need exact global agreement on which data was published on time and which was not, because to figure out the current state of a rollup you need to know which data blobs to include and which ones to skip over. Yes, so it's tightly coupled with the chain. Next question here.

Hi, for this data blob storage period, are there any thoughts about challenges for validators to prove they keep the data, or is that purely altruistic behavior? I mean, there have been proof-of-custody designs that we've worked on over the years. It's kind of on the back burner, because we know these techniques exist and we know that when the time comes we can probably just stick them in to get extra security.

Yeah, so I wanted to ask about the multi-dimensional fee market. We talked about this excess data gas field, and I was wondering: in an ideal world, if we hadn't already done EIP-1559, would the same construction be wanted for the original kind of gas? Yes. And could this happen, could EIP-1559 be upgraded to that over time?
Okay, cool. I think there are even already thoughts in that direction, so over the medium term we would probably want to, and one of the nice side benefits is that it would also make other improvements to a 1559-like mechanism easier, for example time-based instead of block-based throughput targeting. So yeah, we would probably want to homogenize this over time. Yep.

So I could be wrong in one of these assumptions, but my understanding is that proposer-builder separation was motivated largely by the centralizing effects of MEV and us wanting to keep the proposer set decentralized, and then later, with designs like full sharding, we could utilize the builders, with extra hardware requirements, because they'd be incentivized by the MEV they extract to run these nodes. But then there's also research into completely mitigating MEV...

Okay, so maybe that's the wrong assumption. There are multiple strands of research, and some of them are definitely covering for each other in case the other fails, and some of them are complementary. So there's PBS, which allows proposers and validators to be more decentralized, but at the cost of shifting that centralization to builders. There's a separate strand of research on trying to make builders themselves decentralized internally, so that some kind of protocol could plug into the market and make bids instead of a single actor. And then there's also research on making applications that are MEV-minimized. All three of those exist.

Okay, I guess the question is simply: if we mitigate or heavily minimize MEV, then how do we incentivize the builders? Well, minimizing MEV doesn't mean MEV goes away; it just means we do as much as possible to reduce it. And again, I think anyone who has thought about MEV for some time will come to the conclusion that it's just not possible to assume there won't be an incentive to be a specialized actor; even if all transactions are encrypted, there's always going to be some reason to be the first to touch some piece of state. Okay, so the idea is they'll still be incentivized to run these nodes. Yes, there's always going to be a lot of money to be made by controlling a block, as long as Ethereum is a platform with value flowing through it.

And just to mention, it's not only danksharding that a PBS-like architecture would help. Once we move from Merkle Patricia trees to Verkle trees, to a world where it's easier to run stateless nodes, then one more thing PBS would get us is that normal validators would all of a sudden have way, way lower storage requirements. We don't get that if they still have to create blocks, because then they need the state; but if you only validate and you don't create your own blocks, you leave that to a specialized entity, then you can, as a validator, go stateless. That's one more of the benefits we would get out of this.

So would it make sense to charge for samples, or something like that, to motivate people to keep the data and be able to collect the fees? I think it's definitely a possible construction; you could see a way where you have a specialized sample provider and you pay them. I think the downside is that it makes it much harder to run a node, because now you need to somehow set up this payment infrastructure. So I think it's not ideal, but it's possible.
Hello. Will there be ways, for somebody who wants to make data available, to provide a proof through a smart contract as well, independent of calldata, layer 2 and so on, which are smart-contract specific? Like a generic infrastructure for proving that the data was available?

I mean, that's what this construction does. There will be a transaction type that comes with the KZG commitment and with the guarantee that the data behind it is available. And does there need to be a special opcode in Solidity to run a proof and check that it was actually provided? If you get the commitment, the data was provided, full stop; no extra check is necessary. If you then want to check, inside Solidity, the data behind a commitment, that's extra: you use a historical proof against the commitment and supply it with a transaction. Okay, okay. Any other questions?

So, one way to understand the data distribution in the network: can we basically interpret it like IPFS, with the samples distributed across the network and the validators serving them? Although IPFS definitely has some... IPFS is a very bad way to think about it, because what we're doing is not storage; it's proof that data was not withheld. Yes, but I mean purely from the network perspective, and I know IPFS is vulnerable to some simple attacks, so something would have to be adjusted. Yeah, from the networking perspective of how the thing should get implemented, this thing has much higher requirements in terms of liveness and fault tolerance, in terms of real-time access, and in terms of being able to change in real time what you're accessing. Those are probably the biggest differences in requirements. Okay, great, next question there.

If you don't mind: have you enjoyed your stay in Colombia so far? Sorry? Yeah, no, I have, it's been very fun, thank you. Next. Thank you.

Something else I've been thinking about: if we assume that MEV exists and that people want to, you know, sandwich other people's transactions and things like this, then we have this proposer-builder separation, which makes a lot of sense, and once we have it we start to utilize it to do heavier work like danksharding and things like this. The thing I'd be concerned about is that presently we have the ability for users to simply not do MEV: they just let the transactions come in as they arrive, and they lose a bit of money, but they're kind of genuine people. I'd just be interested to make sure that we don't rule out this person, and that we don't glue together the role of making really specialized, fancy sandwiching blocks with also doing all of the danksharding stuff. I think it would be nice if we can make sure we keep a space for the home user to continue to pack their own transactions and just be, you know, a nice guy.
Yeah, well, this is actually one of those things that I wrote about underneath an ethresear.ch post last week: basically, can we push the autonomy in choosing block contents back to the proposer? And that's a spectrum that could potentially go all the way up to the proposer having an option to make everything. One of the conclusions there was that if we want that kind of proposer-autonomy property, but also the property of potentially low proposer requirements, we might need a third category of actor that does, not ordering or MEV extraction, but basically the entire bundle of computationally expensive stuff: witness creation, state root calculation, in the danksharding future figuring out the polynomial commitments and proofs, broadcasting, and so forth.

I just want to comment that I think people need to stop thinking of MEV only as a bad thing, because you think of sandwiching, but even without any front-running or any sandwiching we still have lots of MEV, and it's actually a necessary part of the system: someone needs to do the arbitrage on the exchanges, someone needs to do the liquidations, someone needs to submit the fraud proofs. All of these are MEV, so don't think we're incentivizing a bad thing here; it's a part of the systems we're building. And I would say, for example, you can eliminate the bad MEV using transaction encryption, but you'll still have loads of MEV left.

Yeah, what I was going to say: but basically, are we then relying on ethical builders? I think there are all kinds of techniques that we're layering on top so that builders can't do the really nasty things.

A question on the PBS: will we apply some slashing mechanism, once we have the separation between builders and proposers, for the different actions they have? Yeah, there are slashing mechanisms. There's a slashing mechanism that slashes the proposer if they make two conflicting blocks. In some of these partial block auction protocols we use EigenLayer, which basically exposes the proposer to extra slashing if they violate the rules of the partial block auction protocol. Builders can get slashed in some contexts too; I forget exactly which ones, but there are definitely a few cases. So there are definitely different forms of slashing to make sure the different participants follow the rules of the protocol. Thank you.