Hello, and welcome to our workshop about running the Beacon Chain Explorer on your own. We will go through how we came up with the idea for the Beacon Chain Explorer, then go briefly through the architecture of the project, and finally show you how you can actually run the explorer yourself. This is Stefan, my colleague, and me, Patrick; we both work for the company Bitfly. First, Stefan will introduce our company and how we came to the idea of building an explorer for the next stage of the Ethereum project.

But before we start with anything: everyone who wants to participate interactively needs to download a few things. Is there anyone who wants to follow along on their laptop? Do you have Docker and Docker Compose installed? Sounds fine. Then can you try to access the repository, so we can see whether we actually made it public?

While you do that, I'll quickly introduce who we are in a bit more detail. Bitfly was founded in 2017, and we've built a range of products; we've been active in the Ethereum space basically since the beginning. We actually created one of the first block explorers, etherchain.org, one of the older ones, but we let it slack a bit, and later on we regretted not putting more effort into it. So when it became clear that Ethereum 2.0, and eventually the merge, was coming, we decided to create another explorer, specifically for phase zero. That explorer ended up being the Beacon Chain Explorer, and it started out as a place where everyone could see all the information about the beacon chain.

Maybe a show of hands, just to get a feel: who has used the explorer before? So about half of you. And who is running a validator? Okay, quite a few of the people who are using it. Basically, we wanted to create a nice place for you to understand what's actually going on with your validator, to see its different states and make sure you don't get slashed.

We also operated one of the bigger mining pools, ethermine.org, which has been retired since the merge, so we've shifted away from mining and are now focusing more on staking. To shill our other products a bit: we now have staking.ethermine.org, where you can stake with less than 32 ETH, and we have Ethpool, where, if you'd rather not run your own validator, you can just upload your validator key and we take care of it for you.

And then a brief history, which I already touched on: the project started at the end of 2019 with the first test networks, so even before the genesis of the beacon chain. Here you can see a nice picture of the genesis event; we created a slot view for it.
Each of those rows, if you don't know, is an epoch, and in one epoch there are 32 slots. The green slots were proposed, the red ones were missed, and the yellow ones were orphaned; it's a nice overview of the first few epochs around genesis. Below it we also have a checklist: how far finalization has progressed, what the participation rate is, and whether the epoch is justified and finalized. And there was a nice rocket for the countdown to genesis, which was also pretty fun. That's how this whole project began. It then exploded more and more: with the merge we tried to add more information, not just the phase-zero data, and it kept growing. This presentation is about how we handled that scale, and it's also a workshop where you can help us find ways to scale better.

So, let's get into the architecture. In the beginning, this is a very simple view of what the Beacon Chain Explorer looked like; I presented a similar slide, I think at EthCC a year ago. We had just a Prysm node and an Erigon node, where the Erigon node was mainly there to get the deposit data from eth1. Then we had an exporter that wrote everything into a Postgres database, which worked really well in the beginning and was easy to work on, and a Golang frontend with some templates that served everything to the end user. Even when I presented it back then, Prysm already needed something like 576 gigabytes of data, the Erigon node 1.4 terabytes, and Postgres already 2 terabytes. I looked at those numbers again for how it's running today, and they're part of the reason we had to scale further.
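The real exporter is much more involved, but here is a minimal sketch of that original single-binary design, assuming a hypothetical fetchEpoch helper in place of the real consensus-client API calls and a hypothetical epochs table in Postgres:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // Postgres driver
)

type epochData struct {
	Epoch         uint64
	Finalized     bool
	Participation float64 // global participation rate, 0..1
}

// fetchEpoch stands in for querying the beacon node (blocks,
// attestations, balances) for one epoch of 32 slots.
func fetchEpoch(epoch uint64) (*epochData, error) {
	return &epochData{Epoch: epoch}, nil
}

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/explorer?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	epoch := uint64(0)
	for {
		data, err := fetchEpoch(epoch)
		if err != nil {
			log.Printf("fetch epoch %d: %v", epoch, err)
			time.Sleep(12 * time.Second) // wait one slot, then retry the same epoch
			continue
		}
		// Upsert so re-running the exporter is idempotent.
		if _, err := db.Exec(
			`INSERT INTO epochs (epoch, finalized, participation)
			 VALUES ($1, $2, $3)
			 ON CONFLICT (epoch) DO UPDATE SET finalized = $2, participation = $3`,
			data.Epoch, data.Finalized, data.Participation); err != nil {
			log.Printf("export epoch %d: %v", epoch, err)
		}
		epoch++
	}
}
```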
The numbers now: we've switched to Lighthouse, which we run with a 32-slot restore point. There are different archive configurations you can choose; depending on how many slots you choose, more information is stored but you can retrieve it more quickly, and we basically want to store everything. That already means a 5-terabyte disk for Lighthouse, Erigon is at 2.1 terabytes, and our Postgres database is huge: the tables and indexes we keep there just don't scale that well, so it's already 10 terabytes. And we don't only run mainnet; we also run the testnets, and the testnets together add about that amount again, even a bit more. It's a huge amount of data, and running it in the cloud is getting expensive, so we're going to talk about how we started migrating away from that.

Now to the scaling challenges during the merge. This is an analytics view from that time: we had 2,267 concurrent sessions of people watching during the merge, because of that nice slot view you saw before, where people could see what was happening during the transition; it was a nice visual way of tracking it. We were sort of prepared, hence this Super Mario mushroom that's supposed to power you up, but the mushroom was overpowered by all the people looking at it, because the way the architecture was designed, it just didn't scale well. Our frontend instances all have to query the same things: if we have five frontend instances, they run the same queries five times, because instead of one layer that updates a shared cache, every frontend instance maintains its own cache. That's one of the things we're trying to improve. Indexing is also very expensive, both in Bigtable, which we'll get to later, and in Postgres; and Postgres has to pull a lot of each table into memory to answer queries, which is often not very efficient, especially with this amount of data.

So what we're doing is this: we started out with one big binary where everything runs, and we're stripping things out of that big binary and isolating them into microservices, so that not every binary does everything and we don't repeat unnecessary work when we scale one of them out. The migration is a bit challenging for us, because we have to manage a lot of the technical debt we already have, plus whatever debt we might be adding by changing technologies.
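One way to attack the duplicated-query problem is request coalescing plus a short-lived cache. Here is a minimal sketch, assuming golang.org/x/sync/singleflight: within one instance, concurrent requests for the same key collapse into a single database query. A shared layer such as Redis would be needed to get the same effect across instances.

```go
package cache

import (
	"sync"
	"time"

	"golang.org/x/sync/singleflight"
)

var (
	group singleflight.Group
	mu    sync.Mutex
	store = map[string]entry{}
)

type entry struct {
	val     any
	expires time.Time
}

// Get returns the cached value for key if it is still fresh; otherwise it
// runs load exactly once, even if many requests ask for key concurrently,
// and caches the result for ttl.
func Get(key string, ttl time.Duration, load func() (any, error)) (any, error) {
	mu.Lock()
	if e, ok := store[key]; ok && time.Now().Before(e.expires) {
		mu.Unlock()
		return e.val, nil
	}
	mu.Unlock()

	// All concurrent callers with the same key share this single call.
	v, err, _ := group.Do(key, func() (any, error) {
		val, err := load()
		if err != nil {
			return nil, err
		}
		mu.Lock()
		store[key] = entry{val: val, expires: time.Now().Add(ttl)}
		mu.Unlock()
		return val, nil
	})
	return v, err
}
```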
Could someone say whether they can access the repository? — Yes, there's a Dockerfile and a docker-compose file in it, and if you just run docker compose pull, it will pull all the images, so by the time we get to the interactive part, everything will be ready. Okay, great. I hope the internet is good; let's pray a bit, because when we tested this before, it took a while.

Here you can see what I touched on earlier about scaling and large tables: in Postgres, in the beginning, we didn't have any partitioning at all. [At this point a group of participants presented their own scaling sketch; one of them walks through it:]

These are lots of assumptions we made without knowing too much of the spec or your specific use case, but let's go. Cloud-first, I think, still makes sense, especially for things like DDoS protection or whatever you might need it for. In terms of the frontend, I don't know if it's a bottleneck, but we thought about clustering or some autoscaling for it, or, going with the web3 ethos, putting it on IPFS or distributing it somehow to make it an undeniable service, or it could just be some autoscaling service on Cloud Run on GCP or whatever you use. So I don't think the frontend would be the bottleneck anyway; depending on what you serve, it could be heavily statically generated, so it shouldn't be a problem.

The second layer is the cache. Our team imagines that most people are looking at past data, so you probably don't need direct access to the database for those requests; you don't have to run real queries against it. You would cache historical data heavily, data from previous blocks that will never change, things that can be pretty much frozen because they're really static and stale, let's call it that. There are probably lots of ways of doing this cache layer: it could be at the DB level, it could be a Redis-style cache, but in any case something in front of the database itself.

Then, at the database level, our friend over there, I forgot his name, had the idea of sharding, maybe sharding based on something like a hash. And since this is mostly data that will live forever and never be touched again, you probably end up with a read-only part with very high throughput for reads and not much in terms of writes, and a write part, which would have more direct access to the actual nodes.

For the nodes, it would probably make sense to run multiple archive nodes if the RPC calls are a problem. I don't know if it makes sense to run lots of different machines for those instead of Infura; Infura is always a problem for our company, so if anyone from those companies is here, talk to me please. Maybe just run your own archive nodes; I'm not sure whether the actual RPC calls are a bottleneck for you, but it's good to have a backup, so at least two archive nodes. And then this part here would just be the write-heavy part, where everything you index updates the database. Thanks so much.

Okay, thanks for participating; I'm going to hang that up.
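To make the suggested cache layer concrete, here is a minimal cache-aside sketch, assuming github.com/redis/go-redis/v9 and a hypothetical loadFromDB callback. Because finalized data never changes, cached entries can be given a very long TTL:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// getBlock implements cache-aside for immutable, finalized data:
// try Redis first, fall back to the database, then populate the cache.
func getBlock(ctx context.Context, rdb *redis.Client, slot uint64,
	loadFromDB func(uint64) ([]byte, error)) ([]byte, error) {

	key := fmt.Sprintf("block:%d", slot)
	if b, err := rdb.Get(ctx, key).Bytes(); err == nil {
		return b, nil // cache hit: no database query at all
	}
	b, err := loadFromDB(slot)
	if err != nil {
		return nil, err
	}
	// Finalized blocks never change, so a long TTL (or none) is safe.
	rdb.Set(ctx, key, b, 30*24*time.Hour)
	return b, nil
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	block, err := getBlock(context.Background(), rdb, 4700013,
		func(slot uint64) ([]byte, error) { return []byte("..."), nil })
	fmt.Println(len(block), err)
}
```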
And this is the solution we came up with, the solution we have right now: parts of the data set we now store in Bigtable instead of Postgres. We figured it's not really necessary to have a relational database for this kind of data, and Bigtable scales really well: we were able to index all of the current eth1 transactions in five hours. We now store the blocks, the validator balances, every transaction, and all of the attestations in Bigtable, and this resulted in much more stable indexing times; you can see it now takes us a really stable 30 seconds to export all the data of one epoch.

And this is where we come to the interactive part of this workshop, for everyone who wants to participate. He's just resetting everything he's done, to start fresh with you. The first thing we want to do is initialize the databases: we bring up Postgres and Bigtable and initialize the Postgres tables and the Bigtable tables. The next thing is to start the eth1-indexer; we called it that because we're not really creative. It first gets the blocks from the execution client, encodes them in protobuf, and stores them in Bigtable; after that it indexes the blocks with all their fields and values, and the transactions as well.

One big thing: we're actually not using Bigtable from Google Cloud here; we're using a Bigtable emulator from the company Bitly. It just emulates the Bigtable API and stores everything in a SQLite database. It won't scale to a thousand requests per second, but for storing and then analyzing the data it's a really good solution, in our opinion.

One caveat: we cannot reach the nodes we set up in the cloud before the workshop; somehow we don't get access, so we are now running the nodes on our laptops, and that's why you see some errors: the nodes are not synced yet. It's not that it doesn't work. So now we have the databases and the eth1-indexer running. The next thing we run is the exporter; it's essentially the same exporter we had before the merge, except now it indexes the blocks from Bigtable that the eth1-indexer just exported. Again, some errors, because the nodes are not synced.

Next is the statistics module, which is also a really important and integral part of the whole project: it aggregates the data of one day and stores the aggregated data in the database, so all of those queries become much faster. And the last thing we start is the frontend, which consists of a cache updater and the frontend itself, the Golang web server that queries all the databases. All the pieces are running now, and we can browse the beacon chain explorer on localhost:8080; the thing is, since the nodes are not synced, it will take some time until any data shows up. There are also some extra scripts we wrote into the main script.

The point is, we see this repository as a way to make the beacon chain explorer available for everyone who wants to explore the beacon chain, so you don't have to trust us and beaconcha.in: you can run it yourself. It will take some time until it's synced, and we will try to make it easier, but we really stand by the decision to make this open source, and we will try to make it available for everyone.
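As an illustration of the eth1-indexer step described above, here is a minimal sketch of writing a protobuf-encoded block into Bigtable with the standard Go client; the table name, column family, and row-key scheme are made up for the example. Pointing BIGTABLE_EMULATOR_HOST at a running emulator (such as Bitly's little_bigtable) makes the same code talk to the emulator instead of GCP:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/bigtable"
)

// storeBlock writes one protobuf-encoded execution block into Bigtable.
// The "blocks" table and "f" column family are assumed to exist already.
func storeBlock(ctx context.Context, tbl *bigtable.Table, number uint64, encoded []byte) error {
	mut := bigtable.NewMutation()
	mut.Set("f", "proto", bigtable.Now(), encoded)
	// Zero-padding keeps row keys sorted numerically for range scans.
	return tbl.Apply(ctx, fmt.Sprintf("block:%019d", number), mut)
}

func main() {
	// With BIGTABLE_EMULATOR_HOST set (e.g. to a local little_bigtable),
	// the client skips real credentials and talks to the emulator instead.
	ctx := context.Background()
	client, err := bigtable.NewClient(ctx, "local-project", "local-instance")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	tbl := client.Open("blocks")
	if err := storeBlock(ctx, tbl, 15537394, []byte("protobuf bytes here")); err != nil {
		log.Fatal(err)
	}
}
```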
We didn't make the decision to go with Bigtable from Google and run it in GCP just to make the scaling issues go away and leave everyone else behind; it's still available for everyone, thanks to amazing open-source projects like the Bigtable emulator from Bitly. And the last script gives you a Postgres database you can tinker with; the script wasn't up to date, but now it is. Is anyone following along? Is it working? Okay, it's working. So with this repository we really show how you can set up the explorer yourself. We know there's still a lot of technical debt we need to solve, but it works for everyone, not just for those with GCP accounts and everything. I think we can go to the next slide.

For anyone using our mobile app or the website, we've got a discount code, bogota50. You can use it to get our premium app services, which are nice if you have a lot of validators, among other things, or our API services; the code works on the website, not directly in the mobile app. And if you're running a validator and don't know about the mobile app, it's pretty nice: it lets you look at your validators on the go and get notifications when you propose a block, so you can see whether you've gotten one of those 30 ETH block rewards or something like that. There are a lot of features.

If you want to contribute, here's an overview of the resources: we have a Gitcoin grant, which is linked below, you can tweet at beaconcha.in if you see any issues, or create issues on the GitHub repository, which is also linked there. Oh, did anyone still want to take a picture? No? Thanks. That's basically it from us; thanks so much for participating and listening. If you have any questions, ask them now.

[Audience question about which network the demo is syncing.] It's one of the testnets; it's a bit smaller, so it should be able to sync in a reasonable amount of time. The wifi isn't the best here, so it could take a while, but it's a lot smaller than Prater, which is Goerli, or mainnet, which are pretty huge right now. Any other questions?

[Audience question about running costs.] Not sure if I should answer that, but it's quite a lot. What is it, five figures? Yeah.

[Audience question about database alternatives.] We looked especially at ScyllaDB, and at other databases too, but in the end we took something that also fits our team: we don't have that many resources, and it would take a lot of man-hours to get into a new technology. For us this looked like the best fit, and right now we are really happy with it. We basically went from an export time of 30-plus seconds for attestation assignments to about 5 seconds, which is a lot better, and there's less contention between different queries now that we moved to Bigtable. That could change once we have everything on Bigtable, since we're just starting to migrate, so I hope it stays good. And layer 2 is coming. Okay, anything else? Thank you very much for listening and for participating again.