So, I would have liked to do a bit of a demo to show off what happens if this runs successfully for you. If you're able to connect to the internet and you're interested in the workshop-style component, I recommend following these steps: basically, just pull down VulcanizeDB, check out the branch we've set up for this process, and run docker-compose up to set up your instance. So, now that that's settled, let me grab a microphone and we'll get started.

So, yeah, we're talking about VulcanizeDB. My name is Rob Mulholand. I'm a lead developer with Vulcanize. I'm also a principal software crafter at 8th Light in Chicago. And I'm a Geth engineer: I've added three lines of code to the Ethereum repo. (Yeah, bring the microphone up. Cool.)

To get started talking about the motivation for VulcanizeDB, I want to start with where we are now with accessing data in the Ethereum ecosystem. From my perspective, there are a few things to notice. One is that we've got these decentralized applications, but largely, in order for these decentralized applications to work, we're relying on centralized data. So here I've visited molochdao.com, and we've got a bunch of network requests going to Infura to access the latest state of the Moloch contracts. And that's pretty cool: it works, and it's awesome that Infura provides this service. But ideally, we might have solutions that didn't depend on one third party to serve that data.

In addition to centralization around access to data, another thing we've got is a lot of duplicated effort. When I've spoken with people about VulcanizeDB in the hallway, one of the most common comments I get is, "Oh, that sounds really great; we had to build something like that in-house to support our own infrastructure." It's amazing to me how many teams have gone through this work of figuring out how to extract data from an Ethereum node, decode it, and put it into a Postgres database. But the fact that so many people are doing it raises the obvious question: why don't we have a shared tool set for that?

I did want to mention that there are a lot of teams doing great work to address these two issues of centralization and duplicated effort; we see a ton of teams out there making progress to improve the situation. I think that's super awesome, and I'm really happy to be here at DevCon to see what other people are doing and learn from them; hopefully we can all learn from each other. But the fact that this many organizations are all working on this shared problem indicates the magnitude of the issue and the rationale for an improved tool set.

So, with that motivation in mind, the question I want to pose is: where are we headed? With VulcanizeDB, I would say our mission is to replace the centralized and bespoke solutions with shared tooling that anyone can run. If we look at the toolbox we've started developing to make this happen, I want to walk through a few different things in our tool chain that you can use right now to start spinning up your own instance of VulcanizeDB and owning your own data, so you can serve it for yourself. I'll walk through each of these in more detail.
The foundation of the process for setting up a VulcanizeDB instance is the header sync process. What the header sync process does is basically take block headers out of your node and put them into Postgres. You can configure a starting block: if you only care about a contract that was deployed at, say, block seven and a half million, then you can start syncing headers into your instance from that starting block. This process will also continually re-validate those headers at a configurable depth. So, depending on your level of concern about reorgs and so forth, you can have this foundation of data that's continually being validated, where data that's no longer on the chain (because of a reorg or something like that) is automatically pruned for you. That way you know you have a consistent record of exactly which headers were on the chain. And importantly, for some of the additional tools I'll discuss, we have a foreign key relationship between a block header and all of its nested data, such that you can cascade-delete any data in your system that was derived from a header that was removed.

The thing I want to talk about today, and the thing the exercise will let you run, is the contract watcher process. We think the contract watcher is pretty neat. Basically, you give us an address for a contract and you tell us the deployment block of that contract, and then we will automatically figure out what the events on that contract are, start pulling them off the chain, and decode them for you. You can run the contract watcher with multiple contracts; we'll create a schema for each one and a table for each event. The events that end up in Postgres are associated with the headers, so the header sync process will remove anything that's no longer valid. This works great for events that are defined in a contract's ABI. And again, if you're able to access the internet and want to set up your own instance, this setup will do that for you: it kicks off the header sync process and the contract watcher, focusing on three contracts from MolochDAO, and it spins up an instance of Postgraphile (which I'll get to) that will let you see that data on localhost:5000 in your browser.

composeAndExecute is another command I want to talk about. This is still very much a work in progress for us. The contract watcher gives you the ability to automatically look at events that are defined in the ABI, but what we've found is that in more complex systems of smart contracts you also have to worry about things like anonymous events, or custom events where the payload doesn't line up: the types you want to decode the payload into in Postgres don't necessarily match what's defined in the ABI, or the topic0 on a given log event is a little different from what you might expect. Our answer for dealing with that sort of thing has basically been to say: hey, you can write your own plugins, and these plugins can take care of that. Another thing that composeAndExecute plugins enable you to do is look directly at storage trie nodes.
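To make the event decoding concrete, here's roughly what that ABI-driven decoding looks like in Go. This is not VulcanizeDB's actual internals, just a minimal sketch using go-ethereum's accounts/abi package (and the []interface{} Unpack signature from more recent go-ethereum releases); the addresses and amounts are made up:

```go
package main

import (
	"fmt"
	"math/big"
	"strings"

	"github.com/ethereum/go-ethereum/accounts/abi"
	"github.com/ethereum/go-ethereum/common"
)

// A one-event ABI, as the contract watcher might fetch it from Etherscan.
const transferABI = `[{"anonymous":false,"inputs":[{"indexed":true,"name":"from","type":"address"},{"indexed":true,"name":"to","type":"address"},{"indexed":false,"name":"value","type":"uint256"}],"name":"Transfer","type":"event"}]`

func main() {
	parsed, err := abi.JSON(strings.NewReader(transferABI))
	if err != nil {
		panic(err)
	}

	// Indexed parameters arrive as 32-byte log topics; an address is the
	// last 20 bytes of the word.
	topic1 := common.HexToHash("0x00000000000000000000000011f4d0a3c12e86b4b5f39b213f7e19d048276dae")
	from := common.BytesToAddress(topic1.Bytes()[12:])

	// Non-indexed parameters arrive ABI-encoded in the log's data blob.
	data := common.LeftPadBytes(big.NewInt(1000000).Bytes(), 32)
	decoded, err := parsed.Unpack("Transfer", data)
	if err != nil {
		panic(err)
	}
	value := decoded[0].(*big.Int)

	fmt.Printf("Transfer from %s, value %s\n", from.Hex(), value.String())
}
```

The contract watcher's job is to do this for every event in the ABI, for every watched contract, and land the results in per-event Postgres tables.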
I'll keep coming back to this during the talk, but I think one thing that's really interesting, and that the community needs to reckon with, is that in order to access historical state you traditionally need to run an archive node. A lot of folks don't want to do that; they want to be able to access their data running just a full node. The way you can do that is to stuff your state into events, which is cool, except it means we've just moved the problem: instead of state load, you now have event load. There have already been proposals floated in the ecosystem to start pruning historical events out of full nodes as well. And the state you get from an event isn't necessarily the state that's actually on a given contract. So one thing we've been working on a lot, with the plugins we're developing, is looking directly at storage on a contract. Traditionally you'd have to do that with an archive node, but we've got some ideas there as well. Anyway, plugins let you look at storage trie nodes, and you can automatically decode the true value of a variable on a contract. Plugins also enable you to make more complex queries.

To crystallize what those complex queries look like, I'm going to jump ahead to Postgraphile. Postgraphile is not something we built, but it is a super awesome tool; we support Benjie on Patreon, and I'd recommend everyone else do the same. What Postgraphile enables you to do is literally just run postgraphile from the command line, and it will inspect your Postgres database, identify the schema, and automatically make that available to you in the browser, which is just super slick. So thank you so much to Benjie; he's also super responsive on Discord. We've found it to be a really easy solution for exposing an API in the browser over the data we have in Postgres. Of course, if you're interested, you can always make queries directly against the Postgres data, but for browser access Postgraphile has been great.

Some things to know about Postgraphile: it automatically discovers relations between your data. For example, I mentioned that the events we decode are associated with a block header that exists in the database, so you can automatically get the header by ID along with the event you're looking up and see metadata about that block header, like what block number it was. Postgraphile also exposes built-in filters and conditions and so forth, which means you can apply those exact tools to your Postgres database via your GraphQL query. It supports subscriptions, with notifications about new data hitting the database. There are computed columns, which enable you to append data, and custom queries, which is what I was mentioning with the more complex queries on the last slide: in a migration you can create a query that aggregates data from, say, multiple events or tables, and those will show up automatically in Postgraphile.

So that's the toolbox as it stands right now. Again, you can try it out; it's totally open source.
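If you'd rather hit that endpoint from code than from the browser, a plain HTTP POST against the GraphQL endpoint works too. Here's a minimal Go sketch; the field names in the query are illustrative, since the actual schema depends on what Postgraphile discovers in your database:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Hypothetical query against an auto-generated event table, pulling the
	// decoded payload plus the associated header via the foreign key relation.
	query := `{
	  allWithdrawalEvents(first: 5) {
	    nodes {
	      receiver
	      amount
	      headerByHeaderId { blockNumber hash }
	    }
	  }
	}`

	body, _ := json.Marshal(map[string]string{"query": query})
	resp, err := http.Post("http://localhost:5000/graphql", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```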
The thing I want to talk about next is what we're driving toward with our upcoming work. The first thing is simplifying our interfaces for plugins. I mentioned that you can write plugins and they do super cool stuff, but our mission there is to make sure that, in order to write a plugin that gets you things like storage trie nodes or anonymous events, you have to write the minimal amount of bespoke code possible for a given smart contract.

Another thing we're working on is client patches to emit storage diffs during a sync. I mentioned that traditionally you need an archive node to access historical state, and that's kind of a bummer; one option that's definitely on the table would be to enable a subscription over the JSON-RPC interface that would just spit out those diffs as they happen: "publish diffs to me if they come from a given smart contract." Part of our open source work is to figure out a way to get this done and upstreamed into Geth and Parity and so forth, so that you can plug that subscription directly into VulcanizeDB, have a plugin that parses the data on the fly, and access state data with a plugin that took, ideally, a minimal amount of code.

Another thing coming down the pipeline is what we call the super node. The idea for the super node is that we will automatically digest all those state and storage diffs, but also blocks, and also proofs for the diffs coming out of a node. Our goal is to publish all of that data to IPFS and have VulcanizeDB serve as a filtering layer, where you can say "I want all of the diffs from contract X," and we give you a list of CIDs; you can then query IPFS for that data, and you get the proofs along with it. So you don't have to trust that we're giving you the correct data, or that the data is valid, because you can check that proof against your own node if you want, to verify the data is in fact what we say it is. Again, we think this is super cool work; hopefully we can share a Docker Compose-style demo that lets you do this at DevCon next year, or maybe even sooner. Those are some of our big priorities to stress, and we always welcome issues and pull requests if people think other priorities are worth chasing down.
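To make the subscription idea concrete, here's a purely illustrative sketch of what consuming those diffs might look like from the client side. To be clear, the "statediff" namespace, its parameters, and the payload shape here are all hypothetical, since these patches aren't upstreamed yet; only the Subscribe call itself is go-ethereum's real rpc API:

```go
package main

import (
	"context"
	"fmt"

	"github.com/ethereum/go-ethereum/rpc"
)

// Hypothetical payload: one changed storage slot for a watched contract.
type StorageDiff struct {
	Contract     string `json:"contract"`
	BlockHash    string `json:"blockHash"`
	StorageKey   string `json:"storageKey"`
	StorageValue string `json:"storageValue"`
}

func main() {
	client, err := rpc.Dial("ws://localhost:8546")
	if err != nil {
		panic(err)
	}

	diffs := make(chan StorageDiff)
	// "Publish diffs to me if they come from a given smart contract."
	sub, err := client.Subscribe(context.Background(), "statediff", diffs,
		"stream", []string{"0x8f8221afbb33998d8584a2b05749ba73c37a938a"})
	if err != nil {
		panic(err)
	}
	defer sub.Unsubscribe()

	for d := range diffs {
		fmt.Printf("slot %s changed to %s at block %s\n", d.StorageKey, d.StorageValue, d.BlockHash)
	}
}
```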
This is the part of the workshop where I would be saying: let's go ahead and try this out, everyone run your own instance of VulcanizeDB on your machine. If we had internet, we even performance-tested a node back home that you could connect to, but I don't think performance is going to be the problem given the internet issues here.

To walk through our stack: VulcanizeDB is written in Go, which is very nice because it enables us to really easily integrate with go-ethereum for things like unpacking logs; Postgres, which I've obviously been talking about; and GraphQL. The setup on this slide is exactly what it says on the board; try it out when you're at the hotel or at home or whatever, I'm definitely interested in feedback on it.

What I want to point out about this setup is our config file. This config file is 20 lines, and it yields parsed events from all three Moloch contracts: the Moloch contract, the GuildBank contract, and the Moloch pool contract. Some of the data at the top I could have put in environment variables to slim it down even more, but I wanted to show you everything you need to do this work.

So what's happening under the hood? You've got this configuration file, and then you're running three of the processes I mentioned. You say: run the header sync process pointed at that config file, starting at the deployment block; run the contract watcher process with that config file; and then the most cumbersome command is the Postgraphile one, and that's our fault, because of the way we set up the schemas for the contracts: "header" basically says it's the header sync process and the contract watcher creating this data, and then it's the contract address. So it's fairly straightforward: populate this data with any arbitrary address you want, and you can point at that schema and it'll be there. There's a little watch flag (-w) there, which is important: you can kick off the Postgraphile process as soon as you kick off the contract watcher, and with the watch flag it'll be watching the schema. You'll perhaps see a warning saying "hey, we don't have anything in this schema right now," and then as the contract watcher starts to populate it, Postgraphile will say "oh, OK, the schema's here, I can load it," and when you refresh your browser you'll have that data exposed in Postgraphile.

So, as a stand-in for the workshop, I've got a demo here. This is a set-up instance: I went through the same steps I'm asking everyone else to do, and this is the Postgraphile interface we get. GraphiQL, which was a flag we passed to Postgraphile, means we get a list of all the available queries on the left-hand side, and in the center we've got a GraphQL interface for those queries. I've selected these two, the ownership-transferred events and the withdrawal events, because we can see that if I ask for them I get 23 events; and if we pop over to the main source of truth, Etherscan, we see that there are in fact 23 such events on the Moloch pool contract. So that's cool: it lines up, it checks out.

Within these queries we can do some cool stuff. I can say I want to see the nodes, and I want to see the parts like the newOwner and previousOwner data; I fire that query, and there it is, ownership was transferred from this address. I can do the same thing with the withdrawal events: I want to see the amount and the receiver. These amounts look weird because they're in wei, but there's a fixed value you can divide by to get something that's a little easier to parse. An easier event to look at for understanding this would be the Ragequit event, because it deals with shares as opposed to a monetary value. So if I want to see sharesToBurn and the memberAddress from the Ragequit events on the Moloch contract, then it's like: OK, these are pretty nice numbers, and we can see here we've got 99 shares burnt. We pop over to the Moloch contract on Etherscan and see: oh, interesting, here's that same address in topic 1 (it's a padded hex value), and if we decode this to a number, it's 99. This event is that event. So you're just getting automatically decoded events. I could walk through all the queries, but I think folks get an idea of what we're doing: you're automatically getting all these events parsed into Postgres with, like, three commands.
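As an aside, that Etherscan sanity check is easy to reproduce with nothing but Go's standard library: an indexed address arrives as a 32-byte padded topic, and a uint256 like sharesToBurn is just a big-endian integer. The values here are illustrative (0x63 is 99):

```go
package main

import (
	"encoding/hex"
	"fmt"
	"math/big"
	"strings"
)

func decodeWord(word string) []byte {
	b, err := hex.DecodeString(strings.TrimPrefix(word, "0x"))
	if err != nil {
		panic(err)
	}
	return b
}

func main() {
	// topic1 from a Ragequit log: a left-padded member address (made-up value).
	memberTopic := "0x00000000000000000000000011f4d0a3c12e86b4b5f39b213f7e19d048276dae"
	member := decodeWord(memberTopic)[12:] // an address is the last 20 bytes

	// data word: sharesToBurn as a 32-byte big-endian integer.
	sharesWord := "0x0000000000000000000000000000000000000000000000000000000000000063"
	shares := new(big.Int).SetBytes(decodeWord(sharesWord))

	fmt.Printf("member 0x%x burnt %s shares\n", member, shares) // ...burnt 99 shares
}
```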
Cool. So, trying to avoid opening my email again, let's talk about customizing this. The idea for the workshop was: first, you can run this Docker Compose setup, which enables you to reproduce exactly what I just demoed. I did want to mention that the database for the thing I demoed ends up being about three gigs, and it also took me about 30 hours to sync on my own wifi, so we continue to work on performance. That cost is amortized over the life of your system: you might have a high up-front cost to get fully synced with all the events that have happened throughout history, but then you're going to stay in sync; if you're running the process continually, you're not going to fall behind. What that means is that you start seeing events immediately for things that happened early in the history of those contracts, but you won't see all the latest ones until your system has finished syncing.

If you want to check out some other stuff, you can write your own config file. Mine was 20 lines to look at three contracts, but you could look at 17 contracts with more lines; that's up to you. You could also run it with an address where the ABI is not published on Etherscan: there's an option to supply the ABI in the config file for a contract if you're not dealing with verified source code. So it's not a constraint of the system that the ABI has to exist on Etherscan; using Etherscan just minimizes how much you have to include in the config.

And if you want to run it locally, not in Docker, you can do the few things we have hidden from you in our compose script: you have to create a database that Vulcanize can connect to; we use Go modules, so you have to turn those on to enable it to build; and you end up with your vulcanizedb binary and should be able to run these things locally.

And then, if folks were able to say "OK, I did this and I'm bored; I made my own config and looked at my own contracts, and I'm bored again; I'm running it locally on my machine, still bored," then option three, which I thought would really take up everyone's time, would be to start building your own plugins. All the plugin architecture is in the libraries/shared folder of VulcanizeDB, and specifically in the factories directory you can see we've abstracted the code that handles the overarching process of syncing anonymous events or syncing storage diffs, so that you just have to write a few small dependencies that basically tell us: OK, you've been given an anonymous event; how does X become Y, something you put in the database? I'll stick around after this talk; I'd love for people to give this a go if we can. But there we are.
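To give a feel for what one of those small dependencies might look like, here's a sketch of the "how does X become Y" piece. The interface and type names here are hypothetical, so check libraries/shared in the VulcanizeDB repo for the real contracts; the factories code would handle fetching, header association, and reorg cleanup around something like this:

```go
package main

import (
	"fmt"
	"math/big"
)

// RawLog stands in for the undecoded log the sync process hands you.
type RawLog struct {
	Topics [][]byte
	Data   []byte
}

// DepositRecord is what we want to land in Postgres.
type DepositRecord struct {
	Depositor string
	Amount    *big.Int
}

// Converter is the piece a plugin supplies: turn one raw log into one row.
type Converter interface {
	ToRecord(log RawLog) (DepositRecord, error)
}

// depositConverter decodes a hypothetical anonymous Deposit event whose
// topic0 holds the depositor address rather than an event signature.
type depositConverter struct{}

func (depositConverter) ToRecord(log RawLog) (DepositRecord, error) {
	return DepositRecord{
		Depositor: fmt.Sprintf("0x%x", log.Topics[0][12:]),
		Amount:    new(big.Int).SetBytes(log.Data),
	}, nil
}

func main() {
	var c Converter = depositConverter{}
	rec, _ := c.ToRecord(RawLog{
		Topics: [][]byte{make([]byte, 32)},
		Data:   []byte{0x63},
	})
	fmt.Printf("%s deposited %s\n", rec.Depositor, rec.Amount)
}
```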
I wanted to say thank you. I definitely have not done this by myself; I have the privilege of speaking up here, but Rick Dudley, Elizabeth, Andy, Gabe, Edward, Ian, Connor, and Guston have all been super instrumental in getting this project to where it is today. We've seen support from the Ethereum Foundation, for which we're tremendously grateful, and also from MakerDAO, so thank you so much to those folks for supporting our effort. We're really excited that hopefully a lot of value can be delivered to the community with some of these tools for caching historical data. And that is pretty much what I've got.

For questions: do you support subscriptions on GraphQL?

Yeah. Postgraphile is just one of the most awesome open source projects; it supports subscriptions out of the box.

Next question: we're just curious about the config file you showed us, where you put in the address. How do you deal with apps that essentially spawn a contract for each individual user, with the same bytecode, so that you end up with thousands of addresses?

Yeah, so you definitely want to write a plugin to deal with that, because what the contract watcher gives you is an up-front facility keyed on a given contract address, which I think has some value on its own, but we're not trying to be super aggressive beyond that with the contract watcher itself. With plugins, what you could do is say: given that I see this event, I want to spin up maybe a new instance of the contract watcher pointed at a field on that event that is an address I care about, or any number of other ways you could implement that. We have code that runs that process, some of Ian's code; there are a couple of different teams that work on VulcanizeDB, so there's another code base that actually does that: it will read a contract to get a contract address and then set up watching for that address.

Similar question: say you just want to listen for a given ABI at whatever address, the typical case being ERC-721, for example, to keep track of all the ERC-721 tokens.

Yeah, so that's actually what I was just talking about: we did an ERC-20 watcher, and obviously you could change it to make it an ERC-721 watcher as well. One thing to expand on that point, too: we've tinkered with a variety of different amounts of base data that you can scan for when you're running VulcanizeDB, like just digesting everything, so that anyone who runs this has everything. But I think the reality is that a lot of people who might get value from VulcanizeDB are not necessarily super stoked to pay the performance cost it takes to digest all of that extra information if they don't need it. So a lot of what I've been demoing here is a pretty lightweight process of syncing headers and syncing events from specific, targeted contracts. There are more tools in the box that enable you to do more heavyweight stuff if you've got the infrastructure and the motivation to do that, and we're hoping the super node can help out with that a lot too.
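Roughly, the plugin approach for factory-spawned contracts boils down to growing the watched address set as factory events arrive. A minimal sketch, with all names hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// Watcher tracks the set of child contract addresses we care about.
type Watcher struct {
	mu      sync.Mutex
	watched map[string]bool
}

func NewWatcher() *Watcher {
	return &Watcher{watched: make(map[string]bool)}
}

// OnFactoryEvent takes the child address extracted from a factory event
// and starts watching it, e.g. each per-user contract a factory deploys.
func (w *Watcher) OnFactoryEvent(childAddress string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if !w.watched[childAddress] {
		w.watched[childAddress] = true
		fmt.Println("now watching", childAddress)
		// here you'd kick off a contract-watcher-style sync for childAddress
	}
}

func main() {
	w := NewWatcher()
	w.OnFactoryEvent("0x1111111111111111111111111111111111111111")
	w.OnFactoryEvent("0x2222222222222222222222222222222222222222")
	w.OnFactoryEvent("0x1111111111111111111111111111111111111111") // deduped
}
```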
I'll go back to you; you have another question. Is there a technical reason for not syncing backwards? Because you mentioned that you start syncing from the starting block, which means it takes you 30 hours to get to the events you probably care about.

Yeah, so the main thing is that the latest blocks are more likely to be removed by the header sync process, so you'd potentially be doing a lot more redundant work. You can totally get around that, if you're interested, by starting the header sync process with a more recent starting block number. I was demoing with the deployment block as the starting block number so that you get the full set of events, but if you really wanted to say "I just want to spin up an instance to show me what happened in the last five days," then you can pass that in as a parameter to the command, a block number where you start caring about stuff, and that's where the sync will happen from.

Right, but say you care about everything; it's just that during those 30 hours you want to do something as well. And you have a block watcher watching for new blocks, so reorgs can be managed just the same way as they are right now; it's just that instead of starting at seven-million-whatever, you walk backwards from the moment you started the block watcher.

Yep. So the short answer is that it's technically harder to do that; generally, Geth doesn't like to go in reverse. If you're really curious about it, we can talk more about it afterwards, but the short answer is Geth doesn't want to go in reverse. That's specifically true with subscriptions as well: part of the idea here is that you could easily have a subscription to events, but those are going to fire as your node is processing the blocks it's going over. So part of the rationale for this setup is that you can use either subscriptions or getLogs queries to progress forward. But I think that's a really cool idea and something we could totally dig into, so I appreciate it.

[Next question, about Docker support.] Yeah, it's kind of a work in progress. I think we had a Dockerfile a long time ago that was pointed at Rinkeby, and we found it to be not super useful for our day-to-day development. Lately there's been a lot more work on that, and we have a lot of progress on an open PR, I think it's called docker-updates, but it's not on master yet. The reason this one is on its own branch is that it's pointed at Moloch; maybe at some point it makes sense to have example Dockerfiles, but we just didn't want it to look like the purpose of VulcanizeDB is watching the Moloch contracts, you know? It does a lot more than that.

I'm curious about the limitations of the plugin API. Currently, from what I understood, plugins act as a map, let's say: they map events or whatever to other events or whatever. Can they act as a reduce? For example, a reduce at the database level, like summing up transfer balances or whatever. And how does that work with reorgs?

Yeah, I think I'm going to go straight to Rick on this one. [Rick:] OK, I mean, yeah, you would be running the reduce, but then, as you pointed out, what you would do is: we'd have a table like we have now, and then we would again have a plugin that reads from that table and writes a new table, and it would know when to recompute the new table. So the first table that I described would have all of the things that could get reorged, and the secondary table would get recomputed.
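A sketch of what that recompute step could look like: keep a base table of reorg-able events, and rebuild the derived (reduced) table from it after the header sync's cascade delete has run. The table names and connection string here are hypothetical, and the Postgres driver is assumed to be lib/pq:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func recomputeBalances(db *sql.DB) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback()

	// Throw away the old reduction...
	if _, err := tx.Exec(`DELETE FROM balances`); err != nil {
		return err
	}
	// ...and rebuild it from whatever events survived the cascade delete.
	if _, err := tx.Exec(`
		INSERT INTO balances (holder, balance)
		SELECT receiver, SUM(amount)
		FROM transfer_events
		GROUP BY receiver`); err != nil {
		return err
	}
	return tx.Commit()
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost:5432/vulcanize_public?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	if err := recomputeBalances(db); err != nil {
		log.Fatal(err)
	}
}
```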
So you can write everything that I just described, because you can do all of these things in plugins; that's how we would have addressed that. When I designed the system originally, I thought most of the utility would be in reduce; that's just not the way that development happened to go.

You said in your presentation that you also decode contract storage. How does that work?

Yeah, so that's work that happens in plugins, because we're actively trying to figure out how we can either do it automatically or ask you to write the bare minimum of code that you need to make it happen. But in a nutshell, the way it works is that you have a mapping of given storage keys to given contract values. That's pretty straightforward for static values on contracts: it's literally just the index of the value on the contract, so you can have a mapping because, say, index one is the total supply or whatever. It gets a little bit more interesting when you start dealing with mappings and dynamic arrays, because generating storage keys for those generally depends on secondary data, like the addresses in a mapping of addresses to balances. So the way we're looking at doing that is basically that you flag some events as "these have addresses that I know for sure show up in this mapping," and then, based on the index of the mapping, we'll automatically generate the keys by hashing each address with that index. That way you can recognize all of the storage diffs coming off a contract after a given event has been synced. And once you know what a key maps to, it's a pretty trivial process of decoding the value associated with that storage key into the appropriate type.

Well, thank you so much for coming out to listen to me talk. Again, we're really excited about this work, we hope it can provide a lot of value for the community, and it's totally open source.
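For reference, the key generation described in that last answer follows Solidity's storage layout: the storage key for mapping[key] at slot index i is keccak256(leftPad32(key) ++ leftPad32(i)). A minimal sketch, with a made-up address and slot:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"

	"golang.org/x/crypto/sha3"
)

func leftPad32(b []byte) []byte {
	out := make([]byte, 32)
	copy(out[32-len(b):], b)
	return out
}

// mappingKey derives the storage key for mapping[addr] at the given slot index.
func mappingKey(addr string, slot byte) string {
	addrBytes, err := hex.DecodeString(strings.TrimPrefix(addr, "0x"))
	if err != nil {
		panic(err)
	}
	h := sha3.NewLegacyKeccak256()
	h.Write(leftPad32(addrBytes))    // the mapping key (an address here)
	h.Write(leftPad32([]byte{slot})) // the mapping's slot index
	return "0x" + hex.EncodeToString(h.Sum(nil))
}

func main() {
	// e.g. if balances is the mapping at slot 1, precompute the key for a
	// member address flagged by an event, then match incoming storage diffs.
	fmt.Println(mappingKey("0x11f4d0a3c12e86b4b5f39b213f7e19d048276dae", 1))
}
```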