I'm Ben Mimichewicz, a software and smart contract engineer who's been in the space since about 2018. Today I'm basically going to do a tutorial where we build a simple blockchain indexer, and along the way explain how indexers work, the different pipelines you can design, and in general how to interact with Geth at sort of an RPC level. Okay, just some quick background about me and how I'd like to run the presentation. As I mentioned, I've been in the space since 2018. I used to work as a consultant at Labrys, an Australian smart contract consultancy, and then as an engineer at Tracer DAO / Mycelium, a perpetual swaps protocol on Arbitrum. I was the team lead at ReputationDAO, a project at Mycelium where we focused on indexing and monitoring oracle systems on chain — making sure oracles are held accountable and that they're functioning properly and accurately, so essentially detecting errors in that pipeline. In terms of my experience in the indexing space, I've been working on indexing for about two years. I'd like to think I know what I'm talking about, but I guess we'll see. For questions, please just shoot up your hand as I go. Okay, Etherscan. Etherscan is probably the most popular indexing tool around. I've spent a lot of time on Etherscan — it's basically a hobby of mine. I'll click through random blocks and see if there's something fun. If you've ever accidentally sent ETH to a contract instead of calling the function and you think nobody saw it: I saw it. Here's one example — oh, there's a screen here. This is just one address I found this morning in a block. It's an EOA that seems to be fanning out ETH to a bunch of addresses extremely quickly, sending out four or five transactions trying to spread out, I think, about 0.8 ETH.
Yeah, it's really worthwhile, if you haven't done so, to explore what's on Etherscan and see how interactions happen on chain. Okay — observability and transparency. I know that right now ZK and privacy in transactions are a big deal, and that's absolutely true; if you want privacy in your transactions, go for it. But on the other side, observability is also highly important. Just as a question: does anybody have any pros they think transparent transactions have over ZK? Yeah, no corruption — exactly, nice. And I'm not just trying to shill transparency, don't worry. I don't work for the SEC or anything like that, I'm not after the taxes, but transparency is still highly important in certain systems. Oracle systems, which I work on, don't really function on zero knowledge. You have to know that your oracles are operating properly and you have to know what each oracle is submitting. Anonymity there reduces the accountability of those oracles, and that can cause your systems to not function the way you hope. One bad oracle transaction can completely crash a market, especially in derivative markets — here's an actual example, also from this morning: Mango Markets lost $100 million last night because of oracle manipulation. So being able to observe what's happening, and the exposure those oracles have to each market, is very important, and it's somewhere transparency and building an indexer are highly useful. OK, what actually is an indexer? The way I define an indexer, and the way I like to think about it, is that it's basically an ETL program: your back end communicates directly with a node, you request data from the blockchain, and you convert or store that data in a way that makes it more accessible or useful to you. So why would I actually want to store it differently than on-chain?
Because the blockchain is a database in and of itself — all the data is always there. The problem is if I want to retrieve specific data, or a range of data, or, God forbid, compute some sort of average across it. To do that, I might have to make hundreds of eth_calls to process one transaction or one event that I care about on-chain. So one really popular reason to make an indexer — which a lot of RPC services like Alchemy and Coinbase now support — is fetching the transaction history of an address. That's not possible when you're interacting directly with the node: you'd have to pass through every single block that address has made a transaction in and then manually look at those transactions. Another really good reason is ephemeral data. Mempool data, and anything you run as a test eth_call, doesn't stay on the blockchain forever. Mempool data in particular is highly useful for arbitrage strategies, but there's actually no consistent history of the mempool. Obviously the mempool also differs from node to node, but having a history from a range of nodes is highly useful if you want to backtest strategies you could have run as an arbitrageur. Gas estimation is another one: if you're making eth_calls that don't actually go through the chain and you want a history of how your contracts will behave when they're finally deployed, creating a mini indexer that runs off your localhost development node is absolutely a valid option. Now I just want to quickly break down a log that happens on chain quite frequently. This is what happens when a Chainlink oracle — well, lots of Chainlink oracles, actually — submits an answer for a price feed on chain. This is the decoded version; I'll show you the encoded version a little later. But even here, not everything is super obvious to the eye, right?
You have the answer at the top, which is obviously the new value the price feed will have — for those curious, this is the price of 1inch — but there are other fields that are really confusing, right? If you look, there's observations, which appears to be an array of rates, and there's observers, which is just this confusing byte array with a weird name. You have the raw report context. Let's look at what's going on. Okay, so step one to creating any indexer — or any observation solution — is that you first have to understand the contracts you're working with. What your ETL really does, and what it should aim to do, is take this raw hex data on chain and give it new meaning or new accessibility. And to do that, you have to know what the contracts are doing already; just copying the data across isn't really enough. On that point, I'll quickly explain how OCR (off-chain reporting) works — that was the Chainlink log we just looked at. And I should say, I don't work for Chainlink. How Chainlink operates nowadays is that most of the aggregation and collection of data happens off chain. A price deviation happens, all these nodes look at their different APIs, and they send all of their answers to a single node. That node then submits all of those answers, plus the final aggregated answer, on chain on behalf of everybody else supporting that price feed. Knowing that, I'll also go through a simple timeline. Say I have a Chainlink feed monitoring the price of Ethereum. When the price of Ethereum dips by a certain percentage, all the Chainlink nodes will know, and they'll start a new round. Every Chainlink node then sends the answer it collected from its APIs to a leader, and that leader aggregates those answers, normally taking a median of some sort.
The leader passes this along to the elected transmitter for that round, and that transmitter sends off a message on chain containing all the information everybody submitted individually, as well as the final answer. The reason I went through this is that by knowing how the system works, we can now make sense of the log we looked at earlier. Okay, so believe it or not, this observers byte array right here is actually a list of the oracles that each of these observations came from: for each of those numbers, there's a corresponding address of the oracle that submitted that answer, and the answer is the number that oracle thought was true. Another thing to note is that the number is very large — if that number were taken at face value, I'd be retired just holding a handful of 1inch tokens. But here I am. So this is another step we have to overcome: we're missing more information, but we know it's available somewhere within the contract. This information is available through a view function in the Chainlink aggregator contract called transmitters, which returns a list of the addresses of each oracle supplying that contract with data. And there's another function called decimals, which tells us by how much we should divide those observations to get the accurate, human-readable price of 1inch in this case. The problem, though, is that these variables can change block by block — they're adjustable. Transmitters change all the time, not only for security reasons, but because certain oracles are better at performing on certain feeds than others. And decimals can change when different markets require more precision out of a price feed. Okay, so this is how we break it down.
So I had my observations encoded there, and the way to break it down — you can find this out by reading the contract and then doing a bit of guesswork — is that if I take that byte array and split it into pairs, each pair can be decoded from hex into a number. That number is the index telling us which item in the transmitters array each observation, in order, came from. So in this case the second observation was from the second oracle (counting from zero): 0x0312EA… responded with 439399…, blah blah blah. So already we're getting kind of complex just deciphering this one single log. We'll go through the steps in code as well, but at minimum, to fetch this data we have to: first encounter this Chainlink log; then parse the data back into regular data types; take out the observers byte array, break it up into pairs, and convert each pair from hex to get a number; call the transmitters function with an eth_call; and then match up which answer goes with which address. Doing this for one log? Not too bad — I probably wouldn't want to do it by hand, but fine. Now imagine you wanted some sort of aggregate data. Imagine I told you: tell me how accurate this oracle was on every single Wednesday for the past six months. Now you're parsing hundreds of logs, making thousands of calls, and if you want this done on demand it's going to be incredibly slow — it might take minutes. On the other hand, I could write an SQL statement like "select average where oracle equals this and time is between this and this". I'm converting hundreds of lines of code and thousands of HTTP requests into one line of SQL. That's the value indexing can bring.
Before we actually get into the indexer, I want to go through the data types Ethereum has — these are our avenues into getting different data. There are actually more than this; this is just a quick summary. Does anybody know by heart what things you can search on Etherscan? Well — blocks, contract addresses, any others? Okay, yeah, transaction hashes, yeah. So: transaction hashes, addresses, block number, block hash. Sorry? ENS names, yes, nice. No, you cannot — not within the top search bar. These are the things Ethereum will naturally index, so you can already query them from the node relatively quickly, and they each have a relationship with each other. If I have a log, I can retrieve the contract address that log was emitted from, and I can retrieve the transaction that log came from; through that, I can retrieve the block number that transaction happened in, and so on — and receipts as well, so you can see how much gas was paid for that transaction. The point is that you migrate across these different data types to collect all the information you need for your indexer. So: dissecting a log. I'm going to go over the structure of logs, and for those curious, the reason I'm concentrating on logs is that there's quite a lot of functionality within the Geth client — and blockchain clients in general — for creating indexers based on logs. Creating them based on transactions is also possible, but a lot more manual; you have less filtering power on that first step. So, how a log works. Logs are constructed out of topics and data, essentially. Topics are also data: topics one to three are fields that you emit in your event and mark as indexed.
And the reason this exists is that instead of building a full indexer, I could potentially just use the blockchain client to look for cases where an indexed field equals a particular value. So I could tell Geth: give me every case within some block range where topic zero equals X and topic one equals this. Topic zero I'll go over in a moment. Something to also note on topics is that you can only index up to three fields. Data, though — you can throw as much stuff in there as you want, and that's highly valuable. A good way to think of logs and data in general is as a print statement for your smart contracts. If you've ever been debugging — we're all programmers here, we hate using the actual debug tool, so we just print "hello" — that's exactly what a log is. We also have the transaction index, which is at what point our transaction appeared within the block, and the log index, which is separate — where this log appeared within the block — and those are normally completely different. There's also removed, which is a bool that signifies whether our log was removed from the canonical chain due to a reorganization. Topic zero. Topic zero is highly useful — topic zero defines the log, I would say. Say I have an event called Transfer that takes a uint256 argument, and I want to look at every single instance on the blockchain where this Transfer event happened. To do that I have to calculate topic zero, and to do that I take the Keccak hash of my event name — Transfer — followed by each data type within the event definition, so here it would be uint256 or whatever. Some important things to note, and I learned this painfully by experience: do not put spaces between the types, because your hash will be wrong.
And don't put the names of the variables within the types either, because the EVM doesn't care about that. Something to note as well: this topic zero is unique per contract. That is to say, you can only have one event with one exact hash definition per contract, but this is not stipulated across the whole chain. So I could say "look for every single log with this topic zero" but actually collect data for four or five contracts doing completely separate things. That's just something to be wary of. Storage. Another really cool thing you can do when you're building an indexer is access the actual contract storage — private variables now become completely accessible to you. One of the main use cases I've used this for is when a contract implements EIP-1967, the proxy pattern. You have a contract that forwards all calls to another contract, and the address of the contract that actually holds the implementation is always in a specific storage slot. I can go into this example later as well, but this is just one hugely powerful example of what you can access if you're building your own indexer. Okay, infrastructure design. This is a bit of a web2 thing, but it's important to think about how you want your indexing application to run. You can make it as complex or as simple as you want. If it's just a regular ETL, you can have your program connected to a node and inserting into a database — but you can do so much more. You could create a hackathon submission that looks for events on Lens, for example, and sends out alerts through EPNS every time someone's profile gets liked. I could scale my application: several nodes with a load balancer between them, so I'm not overwhelming any single node.
I could send my messages, after I've dealt with them, off to an AI analysis, and I could store them across several database nodes. Something I've worked with personally and can recommend is a microservice architecture: you can have Kafka in the middle as a message broker and pass information between all your services and all your different storage points. Maybe you're an arbitrageur and you want to do some analysis — throw that analysis into an in-memory database like Redis for quick access, while simultaneously throwing archive data into your own Postgres database. Database options. Indexing is really powerful because you choose how quickly you want to access which subset of data. These are just a few I threw up. TimescaleDB is extremely interesting, though any time-series database proves this point: the blockchain doesn't directly give you the option to look through transactions within a time range. You can specify a block range, but that's not always ideal — blocks take different amounts of time to create, and you have to do extra legwork to convert from block to time. Timescale offers something really awesome, especially when you're looking at a large dataset like a blockchain: continuous aggregates. These are materialized views you can create for aggregate data. For example, I can calculate the weekly average of how much gas is spent on Ethereum, permanently store that in a view, and access it instantly, as opposed to recalculating it every time. kdb+ is very popular within the MEV community from what I've seen. It's a completely in-memory database and extremely fast — along the same lines as Redis but far more performant. It's used a lot by quant firms, arbitrageurs, and also Formula 1 racing, for in-race analysis and things like that.
Postgres: absolute classic, completely free, a perfect RDBMS. Why would you use anything else? Don't talk to me about MySQL, I hate it. Okay, now I want to go through a quick code walkthrough of a simple application — we can build these indexers in about 150 lines of Go, easy. First I want to explain my language choice, since I've been contested on it in the past. I really like using Go for the back end and interacting directly with Geth and the Geth library. First of all, Go is fast. Geth is probably the most well-maintained client; it obviously doesn't have the features of certain other Ethereum clients, but it is definitely the most well-maintained and it always keeps up with the specifications. Go has a huge bonus in that parallelization is built directly into the language. We're going to be doing a lot of calls through WebSockets or HTTP, and we want to parallelize these, and Go's concurrency primitives make race conditions much easier to avoid. Another huge bonus is that, with the functions we're going to call to extract data, our program is completely portable to every single EVM chain — except, from memory, xDai. So we can make our application as generic as possible and start looking at events on Polygon, Ethereum, Optimism, Arbitrum, whatever you want, without any extra legwork — it's like deploying a smart contract to multiple chains. The reason is that all of these RPC calls are specified in the Ethereum JSON-RPC specification, so any EVM client being created has to fit those specs. Okay, creating a client. This is pretty basic code: I'm creating an HTTP client and feeding in my Alchemy key — that's what the RPC value is there, just a string. I do a bit of error handling, making sure I am in fact connected. And that's really it for this.
There's really not too much to worry about here. Yes — okay, that's a good question, I'll just repeat it. The question was: should you run your own node infrastructure or use an outsourced node like Alchemy? There are several advantages to running your own, and a few disadvantages. Running your own node, beyond just having direct access and being able to manage all your load yourself, means you can put add-ons on top of that node to make the indexing even faster — so when you're doing retrieval, you're indexing on top of an index, which is fantastic. Personally, if I'm running in a production environment, with an application fully running as an indexer and my system depending on it, I would run my own node. I see some node operators here and I feel bad. But for hackathon projects, absolutely use Alchemy — fantastic solution. And not just Alchemy: Infura, whatever. I'm not sponsored, guys; sorry I keep using the same names. WebSockets versus HTTP. Most providers will give you the option of accessing their service through a WebSocket or HTTP. I'll go over this quickly because it's more of a web2 thing. HTTP is normally how your wallet connects to the chain: whenever I make a request, my wallet creates a TCP connection, a message is shot back, and the connection is closed. When you're creating an indexer, you're going to make lots and lots of requests, and the extra latency of a new handshake every single time is actually quite cumbersome. I'll just say straight up: if you can use a WebSocket connection, always use a WebSocket connection. No reason not to, really. Okay, now we're going to look at how to create a query — and by query I don't mean SQL, I mean querying data directly from the node. There's a Geth object called FilterQuery, and it does exactly what you think it does.
It takes an array of addresses from which I want to collect logs, and it takes a list of topics. Before, we talked about topics, how you can index up to three, and topic zero. So you can specify here which topic zero I want to collect data for, and which other topics I want equal to some hex value. I just declare the object, and that's all you really need here — we haven't actually made the call yet, this is just constructing the query. It also takes a block range, which I didn't include for some reason. Next, okay. Now we're actually making the query, and this is where block range becomes very important. There are two ways I can request data from a blockchain node. I can do SubscribeFilterLogs, which establishes a connection to the node and tells it: here are my parameters — whenever this happens on chain, send me a message. This is really useful because it spreads my load on the node; I get these messages one at a time, I can process them, and as long as my application is decently efficient, I'll process everything without any lag. But this isn't always an option, because with SubscribeFilterLogs I can't request historical information — I'll only receive data from the point I subscribed, from the next block onward. I can also do FilterLogs, with which I can request a bunch of logs that have already happened. That's really useful, especially for historical analysis. The thing to be wary of is that it's quite hard on the node if you're asking for a retrieval of megabytes or gigabytes of data, and your application is going to have to keep up as well. It's definitely super expensive if you want to collect the entire history of a particular log across the whole chain. Okay, channels.
This is a Go thing, but it's extremely important to know, and it's one of the reasons I suggest writing these indexing applications in Go. A channel in Go is essentially a pipe: if I'm sending data from process A to process B, I use a pipe. But there are issues with doing that normally, right? You have to deal with race conditions, for one, which is a huge headache, and piping can be quite messy — I don't know how you'd do it, maybe through a bash script; it can get pretty weird. What Go offers is channels, and channels are naturally blocking, so I don't have to worry about message one arriving after message three and screwing up all my analysis. The channel only passes a message to my next process when that process is ready to receive it. The for loop here declares an infinite loop using a select statement, and the select statement says: I've created this logs channel, and I'm waiting continuously until something is sent to it — and the blockchain node is sending data to that channel. I pull out each message as soon as it comes, and select just takes whichever case comes first: if my channel comes back with an error, I deal with that and crash my program; if a log comes first, I go do some processing. Okay, how do I actually process the data? When I request data from the blockchain and get logs back, I get a pretty messy data structure — well, not messy, but not human-readable. What I get back is the log data structure we went over earlier: I have the topics and the data field inside the log. And if I'm pulling a transaction, I similarly have the data field within the transaction, which looks just like the log data. Now, parsing this is pretty complex.
There's natural padding that comes with ABI-encoded values, and I have to convert from hex back into regular values — for Go to interpret, but also so humans can interpret them. I don't really know what's happening when somebody says the value came back as 0x, sixty zeros, and then a three. So what we can do instead is generate an ABI, and this is exactly how Etherscan does it. For those not familiar, an ABI is a specification of all the functions and all the events on a particular contract. I can generate it by taking a contract and using solc — that's the command at the top, `solc --abi` plus the path of the file — and I get a huge JSON spit out, which we'll define as a string and use further along. The ABI is highly useful, but something to note is that it does not appear on chain directly; on chain there's only the compiled bytecode. So you get the ABI by having access to the source code, or sometimes people upload it to Etherscan. What the ABI gives us, as I described before with the Keccak hashing, is all the input types for all the functions and events defined in that contract. So now I can use it to decode all of this hex data that gets blurted out back into regular data types. I create an ABI object — similar to the JavaScript concept — by calling the abi.JSON function from the abi package, which is also part of the Geth module, and passing in the ABI string I defined previously. I check that it is in fact a valid string, which is important, and then I can start unpacking data — that's the second code block there.
I call my object, tell it to Unpack, pass it a string which is the name of my event, and pass in my data, and what it spits out is an array of interfaces. Interfaces here are Go's "unknown type" values: data whose type Go doesn't know yet, so I have to tell Go what the type is. I can find that out by looking at the smart contract code or at the ABI, and even if I make a mistake here, Go will tell me — if I wrongly assert the type, Go will say "actually, you want to type-assert to a string". I don't know why it can't just do it automatically, but life can't be that easy. Okay, working with a database. So what we've constructed now is a system that requests data from the node, either live or through a historical query; I've collected that data, I have a method to unpack it into regular Go data types, and I can convert between them as I wish. Now I want to insert that data somewhere. This is a pretty standard way of interacting with a database in Go, pretty much foolproof, using a library called GORM — as in Go ORM. It actually used to be sponsored by Chainlink. Fantastic library, highly recommended. It's very easy: I define a struct describing what I want my table to look like, throw on some tags for my primary keys, foreign keys, or indexes, and leave that in a module by itself. Then I initiate a connection to my database, which is also just one line of code: I tell it to Open — I have Postgres here, but it supports most popular database management systems — with a GORM config, which you can specify different things in, but if you're not doing something fancy you can pretty much leave it blank. I do some error checking, and then I've got migrations.
For those unfamiliar, migrations are fantastic. If you've worked on a production system and wanted changes to your database, it's basically like a nuclear reactor — two people have to turn their keys at the same time to edit a table. Migrations sidestep that. What GORM does is take the structs I defined before and automatically create or edit the tables I already have, so the code I have is exactly represented in my database. Inserting — we're pretty much done here, guys. All I have to do now is take the structs I defined, pass in my decoded values, and then I have this one-line insert-or-update-on-conflict. What this is essentially doing is sending a message — and it'll be ACID, depending on your database of choice, I guess — out to the database, telling it: if there's no entry, insert it; if there is, update it. And updating absolutely can happen. Although the blockchain is immutable, remember the node is getting new data all the time, and reorgs do happen — so something to watch out for is your block hash changing, and also your block timestamp. ProfileId — oh, sorry, we just have type assertions here, so you can see how these get parsed back into the struct. This is an example I did based on Lens: the profile ID on Lens is an integer representing your user, and then there's the follow NFT, an NFT generated on Lens whenever you follow somebody. I pass these into that struct, call my database's Clauses on-conflict update, and pass in a pointer to the struct. Okay, so that's basically it. I can go through some actual examples as well, but I wanted to do a Q&A. Actually, how much time do I have left? Plenty of time — oh, okay. So, EIP-1967. It's a proxy pattern. Say I want to make changes to a smart contract.
You obviously can't really do that. So what I can do is deploy an EIP-1967 contract and tell it to run all the functions of another smart contract, and I can change which smart contract's logic is actually being executed just by changing that variable. The catch is that there's no standard view function on the 1967 contract exposing that implementation address. But the storage slot where the address is stored is always the same across any EIP-1967 contract, so I can retrieve the address and then query that contract for any information I want. Does that help? Okay, yes. Yes, I will — I wanted to clean it up a bit, but it's submitted within the ETH hackathon. I'll send a link out on my Twitter, which is at the end. How many block confirmations do you wait, or how do you update? I'll just repeat the question: how many block confirmations do you wait before inserting the data into the database, and how do you deal with updates? The great thing about this system is that you don't have to wait at all. Every time there's a reorganization, the logs are re-delivered through the node, and the indexer will automatically update that column for you. So the data in your database will always be as accurate and up to date as it can be. Sorry, yes. So emitting logs is voluntary? Yeah — why is it voluntary? It costs gas to emit events. Why isn't it standardized? I mean, it is standardized in development; it's just that you don't have to deploy a contract with events. And if a contract is deployed with an event, you can create an indexer based on that — I think that's what you're getting at. Oh, yeah, okay, for sure. Sorry, can you repeat the question? I'm lost in time. Oh, okay: why do indexers mostly use logs? That's more my opinion than established fact, I want to say, but the most popular indexing platforms seem to agree.
So the Graph and similar platforms focus on logs quite a bit, and there are so many logs that you have to find the useful ones; that information is hard to come by. If you want to create your own indexer, as I said, you should know the contracts and the system. But you can also make a general solution like the Graph or — I forgot the name, can someone help me with their logo? It's like... Dune Analytics, thank you — like Dune Analytics, where I can do an SQL query on absolutely anything. As for why there are so many events happening: a lot of these events are just shot out for debugging purposes and left in the contract, and you have to decipher them as you go. Any other questions? Yes, contract state. Yeah. So, as before, when I went through storage: if you want to get contract state, there are three main things you can do. The first is access through variables. Every single public variable on a contract automatically has a getter assigned to it, so that's one way to access state. The second is through storage, as I mentioned before. It's kind of tricky to find out exactly where a variable sits in storage, but the Solidity documentation will actually walk you through how storage is laid out based on the contract, and you can work your way through and find the variable. The third is traces. Traces are the individual opcodes a contract is executing, which you can also request from the node. With a combination of those three, you can see absolutely everything a contract does. In terms of creating strategies for collecting that state, I think the best indexers are the ones you make yourself, purpose-built, where you're traversing across these different data sources to create your perfect arrangement of data. I built one on top of Aave a while ago, and that was relatively tricky.
Aave has an EIP-1967 pattern, so we're accessing storage to find that contract. That contract is a pointer to other contracts, so I'm doing an ETH call to access the variables that point to those other contracts. Then I'm calling for the ETH balance of those contracts, I'm also utilizing storage, stuff like that. And at the end you have one really neat row in your database for exactly what you want. Sorry, yes. Do you mean deploying the same contract on two different chains? Okay, so yeah, that is actually important to note. If somebody self-destructs a contract — I'll just repeat the question — if somebody self-destructs a contract, I believe it's possible for another contract to appear at that identical address, and that contract can have different code, so it can potentially break your indexer. What you can do to handle that is have a microservice looking for destruction transactions, matching them with a query in your database to see if they affect one of your indexers, and then making changes to that indexer live. So I'd normally do that through a microservice architecture: one service watching for self-destructs, and if one happens, it sends a message to adjust or pause my indexer for a period. The great thing about the blockchain is the data's always there, so even if you miss a few logs, you can do a historical query and rebuild your dataset. Yes. Yeah, sure. The advantage specific to the Graph, I would say, is that there isn't a lot of functionality — but please do correct me if I'm wrong — to do queries across different subgraphs, at least currently. Okay, that was definitely one use case when I built them. Oh, okay. So number one, I would say in general, is latency: you don't have to wait on any Graph node operators to fill out the graph for you.
Second, I can create a graph — sorry, a dataset — that's infinitely more complex. I mean, I could theoretically do anything on the Graph, but here I have very close control of exactly what my code is doing and what I'm putting through. Third, I have control of my own infrastructure, so I'm not dependent on anyone. The Graph is a decentralized and fantastic protocol, but having your own dataset, in your own database, where you know exactly what's happening, is extremely useful. Fourth, you don't have to pay — it's free. That's some of them, but really my main pitch is that this isn't that much code, and it's not expensive to run; it's mostly just HTTP calls. You can run it for free on any cloud provider, on an e2-micro or whatever the AWS equivalent is — it fits the free tier. Supabase has free Postgres databases. You're running your own infrastructure, creating your own dApp, and it has no cost to you whatsoever. Yeah, anyone else? Yes. So yes, there is a way to handle this: you can subscribe. There's a function called SubscribeNewHead in the Geth client, and every single time there's a new block header or an uncle, you'll get a message there. So you can make a service that watches for uncles; it'll give you the information you need, and then you can look at what that uncle did. The node will re-emit the affected events and set a removed tag on each log. So when your block hash gets uncled — let's assume you detected it properly through SubscribeNewHead — you can re-request that block by its hash, and each log will carry a Removed field, which is a bool telling you whether that log has been deleted or not. If it has been deleted, you can mark that in your database.
You can delete the row, or — depending on your primary key setup (sorry, I said private key) — you can flip a flag to true to mark that the row has been removed and is no longer valid. Okay, looks like we might throw a rave in here instead. Yeah, anyone else? Yes, the hardware you need. I'll answer for a couple of different components. From a database point of view, it depends what sort of service you're running and how quickly you want to retrieve that data, but in terms of storage I'd say maybe 600 to 700 gigs for the database. The more difficult component is if you want to do that historically, for every single block number: you're going to run into hiccups requesting that data from nodes. That gets very expensive if you're doing it through Alchemy or Infura or one of those node providers where you pay per call. And if you're running your own node, I would say one isn't enough if you want it to finish this century. What I would do is run, you know, whatever, 20 nodes, sync them all up with historical data — you can use snapshots now, so it's nice and easy to set that up — then put a load balancer in front and distribute your calls across the different nodes. That's probably the quickest way to do it. The actual indexing is pretty lightweight; I wouldn't really worry about it. The way I normally run it is a Kubernetes cluster, auto-scaling as I need it. Yes. Sorry — long-term performance, and how should you structure the data? Okay. It really depends what your application is doing, but what I'd say is that for long-term use I would recommend a relational database system, so SQL, just because you can get a lot more use out of it. And if you're storing the data long-term, you're probably imagining some use cases you don't have yet.
NoSQL I'd suggest when you need really quick retrieval of that data. In terms of data strategy, a pretty common one I've had success with is time-partitioning my database. For those not familiar, partitioning is like this: imagine you have a huge Excel file. Instead of searching through that one huge file, I break it into, say, five different files, define each by a range, and now I only search the relevant range for a particular transaction. There are lots of time-partitioned databases, but I would time-partition your data if that's relevant, and then create indexes on the standard blockchain fields plus anything of direct interest to you — composite indexes on whatever you want: block number plus the value of some variable plus the topic-zero hash, for example. Another good thing to do is break your events out into separate tables. I think that covers retrieval. Obviously, if your database is getting huge, also split out across different nodes so you're not paying for vertical scaling. Oh, the other one I didn't mention is graph databases. I haven't seen that much use of them, and I haven't used them myself, sorry, but they're also an extremely powerful way to store blockchain data, so you can see the relationships between different addresses, or between, say, oracles and a particular market. Yes — sync for indexing. The different channels — I mean, it's parallelization, but essentially I don't have to wait for one request to get through before I make the next one. So it's very, very useful.
I'd say if you're using the SubscribeFilterLogs method and getting one event at a time, you're probably absolutely fine without a goroutine. But if I'm using the FilterLogs method, requesting a huge number of logs at once, with additional calls to make for each of them, that's where parallelization really helps you out. You'll be able to make thousands, millions of calls at the same time instead of one at a time, cutting the time you need to index from weeks to hours. And yes, you can insert more than one row at the same time. I could run multiple indexing programs — say, block ranges 0 to 100, 100 to 200, 200 to 300, 300 to 400 — and they can all insert separately, so I finish that index far quicker than if I had to go from 0 to 400 in one routine or one program. Yes. Yeah, I mean, look, node providers are fantastic and highly professional, so you're going to get amazing service and not really experience many delays there. But you can set up your own node in GCP and you shouldn't have too much more delay either. The bigger risk of running your own node is if you make a mistake or miss a Geth update: a lot of the manual work around managing your node is done automatically by node providers, whereas here you have to take care of those operations yourself. Sorry, the question's just over here. Okay. Mempool data, yes. Are you trying to do arbitrage or something? No, something else — okay. One thing you can do is run multiple nodes in different regions, because the mempool is not necessarily synced between all nodes at the same time. A popular setup is to run three or four nodes in different regions — they'll have different peers — and aggregate that data yourself through another process. That way you'll get a much more complete view of the mempool at block execution time. Yes.
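(One last sketch, tying back to the goroutine answer: the block-range fan-out can look like this. `splitRange` is a hypothetical helper of mine, and the `FilterLogs` call is left as a comment since it needs a live node.)

```go
package main

import (
	"fmt"
	"sync"
)

// splitRange breaks [from, to) into chunk-sized block ranges so each range
// can be indexed by its own goroutine and inserted independently.
func splitRange(from, to, chunk uint64) [][2]uint64 {
	var out [][2]uint64
	for start := from; start < to; start += chunk {
		end := start + chunk
		if end > to {
			end = to
		}
		out = append(out, [2]uint64{start, end})
	}
	return out
}

func main() {
	var wg sync.WaitGroup
	for _, r := range splitRange(0, 400, 100) {
		wg.Add(1)
		go func(r [2]uint64) {
			defer wg.Done()
			// In the real indexer each worker would call something like
			//   client.FilterLogs(ctx, ethereum.FilterQuery{
			//       FromBlock: big.NewInt(int64(r[0])),
			//       ToBlock:   big.NewInt(int64(r[1])),
			//   })
			// and upsert its own rows without waiting on the others.
			fmt.Printf("indexing blocks %d-%d\n", r[0], r[1])
		}(r)
	}
	wg.Wait()
}
```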
Sorry, I have like five more minutes. Anyone else? No? Okay, fantastic. Yeah, thank you for listening. Here are my details. Thank you.