We're joined today by Amberdata, Aleth.io, CoinGecko, and Santiment. And I'd like to thank the Ethereum Foundation and all the volunteers who have made this week possible. Before we get started with our speaker from CoinGecko, I'd just like to let you know that this is going to be an interactive session. So if you've come here today with data-related needs, challenges, or something that you can contribute to the community, please feel free to write it on the post-it notes that are at the end of your aisle seat, and we'll make sure we get through those today. Yeah. So before we get started, just a quick bit of context.

Hi, everyone. I'm Maksim, the CEO of Santiment. Before I joined crypto in 2016, I spent seven or eight years in the yoga community, actually studying yoga and meditation. And a few things we learned there: one is that there is no distinction between what we have inside and what we see in the world. In yoga we don't make that difference — what I have in here and what I see outside are all the same, all one. And the second thing: we do see a lot of challenging, painful, sometimes ugly things around us in the world, and sitting here, we want to improve that. But the way we improve it is not by destroying things; we just try, step by step, to solve it. And we are here with different projects, and we are all doing one thing together: we're trying to make it better and easier — and I'm talking now on a practical level — for people to work with the data. We're doing it from different angles; it doesn't matter that they're different projects, it's still about doing one thing. And we want to make it better, and to show other people how they can jump on board very fast, how they can be part of what we're trying to change in the world, how all of us can be inclusive and at the same time still keep our focus on what we're doing. It's a good thing to be a little bit different. So everyone who comes on stage will share what they've got, give it to the people out there, and we'll see how it can iterate.

Our first speaker today is TM Lee, the founder of CoinGecko, which is an analytics platform for tracking cryptocurrencies and blockchain assets. Since founding CoinGecko in 2014, TM Lee has closely followed the development of crypto-economics and the real-world applications of blockchain technology. Apart from that, he is a contributing author of two books on the topic of digital currencies published by Elsevier. He holds a Bachelor of Science in Computer Science and a minor in Psychology from Purdue University. Recently, he was also listed on the Forbes 30 Under 30 in 2019. Please join me in welcoming TM Lee.

Thank you. Thank you, Serena, for this opportunity. So for those of you who have followed CoinGecko, basically we are a market data provider. In this case, we had some research work that needed to be done on the Ethereum data set. So this is more about sharing the experience we had in finding all these data points on the Ethereum blockchain, and the options that are out there. So yeah, in this case we are not interested in making transactions or making contract calls that change the state of the Ethereum blockchain. We're just going to get down to exploring the data that is already on the chain.
And typically, the kinds of data that you're interested in are blockchain transactions, smart contracts, how much balance is held in contracts, the events emitted at these addresses, and the events that were logged by the project when something happened.

So when we first got in, the most obvious thing to do is to run a full node. Your options are Geth or Parity. You then have to run the node and have it synced up, which takes a couple of weeks, and then you run some RPC calls and turn that into a CSV or some sort of data format you can consume. This is great — you get full control, you have access to all the data — but it's quite tedious for those of us who just want to get down to the data and get running immediately.

The second option is to use a node as a service. So you can use Infura or Alchemy instead of running the node yourself, and then plug in exactly the same way you would interact with a full node: RPC calls and some code to produce a CSV. This is great too, but again, it might be doing more than what we needed.

So the third option is to abstract everything away and consider using a third-party API service. What these services do is pick out certain key things that people are interested in and abstract them out, so we only interact with a REST API to get JSON or CSV to work with and analyze. A few examples that you could choose from are Etherscan or Amberdata; you'll probably need to pay some sort of subscription for some of these. The great thing is there's no setup, there's no need to work with full nodes, and the data is indexed, so you can just grab it immediately, based on what the third party offers.

The last one here, which is something I think a lot of people are not aware of, and which I'd like to share a little bit more because it's what we ended up using on our side, is the BigQuery data set for Ethereum. I think this announcement was made at the end of last year, when Google had an initiative to index the entire Ethereum blockchain into BigQuery, which makes it really easy to extract certain sets of data. The great thing about Google BigQuery is that the data you extract, which could be as large as you want, can then be integrated with all the other Google services should you need to do so. And you can also export it to CSV or JSON, just like how you would deal with an API. So this is quite nifty. The difference here is that the data updates roughly every six blocks, but this is kind of unofficial, because you can't really find documentation for it — the documentation out there says it updates every 24 hours, but based on my experimentation it updates almost every six blocks, which is good enough, though not real-time. You pay for BigQuery, and it's good for non-developers: if you have statisticians or people who are not technical, they can get down to this without any API setup to pull the data out.

So this is an example. I want to find the addresses that have significant outbound transfers of USDT. I can easily write a BigQuery SQL-like query and make the call, keeping in mind that the charge is $5 per terabyte scanned. So it really depends on how much data you extract. You can be a little bit smart in how you choose the columns you query in order to save costs, but pretty much this is your constraint in terms of cost.
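As a rough illustration of that kind of query — not the exact one from the talk — here is a sketch against the public `bigquery-public-data.crypto_ethereum.token_transfers` table using the google-cloud-bigquery Python client. The USDT contract address and the exact column names are assumptions based on the public dataset's published schema.

```python
# Sketch: addresses with the most outbound USDT transfers (assumed public dataset schema).
from google.cloud import bigquery

USDT = "0xdac17f958d2ee523a2206206994597c13d831ec7"  # USDT contract address, lowercased

sql = f"""
SELECT
  from_address,
  COUNT(*) AS outbound_transfers,
  SUM(SAFE_CAST(value AS NUMERIC)) AS total_value_raw
FROM `bigquery-public-data.crypto_ethereum.token_transfers`
WHERE token_address = '{USDT}'
GROUP BY from_address
ORDER BY outbound_transfers DESC
LIMIT 100
"""

client = bigquery.Client()                 # billing project taken from your credentials
for row in client.query(sql).result():     # this is where you pay per terabyte scanned
    print(row.from_address, row.outbound_transfers, row.total_value_raw)
```

Since the charge is driven by the columns a query touches, selecting only `from_address`, `token_address`, and `value` rather than `SELECT *` is exactly the kind of cost trick mentioned above.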
Then you have your options: you can export it to CSV or JSON if you want to. And this is a visualization from the data points that I extracted. Basically, the large circles that you see — it's not very clear, but the large circles show addresses that have a large amount of outbound transactions, so these may turn out to be exchange addresses, if you will. So these are the kinds of things you can do with BigQuery: you can easily extract the data, pop it into a graph solution, and get some insight out of it.

And then, before I came to DevCon, I think someone was sharing something quite interesting: imagine something like BigQuery, but social, where you can share the queries and the results around. It also acts like a notebook — like a Jupyter notebook, if you have used Python Jupyter notebooks — where you can lay out queries together with some explanation of the data. That service is called Dune Analytics. I highly encourage you to check it out as well, because all this data is available out of the box; you can just quickly make queries and extract it. And there's one example on the FairWin contract, about how that contract affected the price of gas — these are some of the interesting insights that people share. So I highly suggest you guys check it out.

So the key takeaway is that there are plenty of tools to choose from, all the way from having as much control as you want, over to being as easy as possible to get started with. There's always going to be a trade-off, and it really depends on what your objective is. For us, we just wanted to get in, extract some data, and do some analysis, so we went to that side. But if you want more control over the data, you can lean towards the other side of the spectrum. So really, there's no barrier to entry for data analysts or developers to explore the Ethereum data set. Just go out there, go crazy, grab all this data, and analyze whatever you want. Other than that, a little note that CoinGecko has a data API — not on-chain data, though, market data only. You can use it if you like. And that's all I have for now: a quick highlight of the options for exploring the data set. I hope this is useful to you, while we also continue to learn about this space ourselves. Thank you.

Moving on to any questions — I'll try my best to answer them. What are the costs of using BigQuery for you? Like, how much do you pay to do this? Right. So, for example, for this one, when you enter the query in BigQuery, it will actually run an estimation of how much data you will process, at the bottom right. This is before you actually execute it: Google will tell you how many gigabytes or terabytes of data you are scanning, and then you have to do the math yourself. As I said, it's $5 per terabyte. I think the first terabyte is free, so as long as you don't hit that threshold you don't have to pay anything; once you do hit it, you pay that price for every terabyte that you use. So it really depends on what you want to query out of it, but you can stay within the free tier if you are creative enough with the queries. Okay, yep. What if you run it as a public API yourself, with other people accessing it? Yes — I think it could become quite expensive in that case.
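On the cost question: the estimate BigQuery shows before you execute a query can also be obtained programmatically with a dry run, which scans nothing and bills nothing. A minimal sketch, again assuming the google-cloud-bigquery client and the `sql` string from the previous example:

```python
# Sketch: estimate how much a query would scan (and roughly cost) before running it.
from google.cloud import bigquery

client = bigquery.Client()
dry_cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(sql, job_config=dry_cfg)      # returns immediately, nothing is billed

tb = job.total_bytes_processed / 1e12
print(f"Query would scan ~{tb:.4f} TB (~${tb * 5:.2f} at $5/TB)")
```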
Yeah, BigQuery is a data warehousing solution; it's not supposed to be plugged into a front end, I imagine. Most likely, if you're going to use this for an application, it will be background batch processing, where you control the flow of data and the queries, rather than being queried by the outside world. On the Amazon side, I think they charge you for storage but you don't pay per query; for BigQuery it's the opposite — putting data in is free, but every query you make is charged at that $5 per terabyte scanned.

Okay. Another question on the BigQuery side: did you upload your ABI beforehand, and how do you generate the queries? Token transfers are a very standardized thing, but is it possible to get other things out of it? Yeah, so that's why, in terms of control, BigQuery may not give you everything you want. This is just an extra slide: if you go into BigQuery, you can see the list of tables they have already indexed on their side. Blocks is pretty obvious — it's block data. Contracts are the contracts with their bytecode, so I think you need the ABI yourself, on top of that, once you extract the data. Logs are the contract events. Token transfers is the one that we used. Tokens is token metadata — there's not much data in there. Then there's traces, which I haven't used, so I don't really know what's in there. And the last one is transaction data, which is every Ethereum transaction out there. So you are constrained to this set of tables. If this gives you what you want, you can go ahead and use it; otherwise you might have to go up one level, to an API or something like that.

The logs here, I think, will be binary, so you need the ABI to decode them? Yeah, you have to decode them. I'm not sure if you can use it to figure out the function structure of the call — you probably have to do that yourself. So, one example of the kind of data you would get — let me see if I can show anything here — yeah, you're probably going to get it in this form. So if you're a developer, you have to call the web3 functions and decode this into something that humans understand. You have to chop it up.

Are there any other questions? I guess we're good for now. Thank you very much.
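To make that "decode it yourself" step concrete: a raw log is just a list of topics plus a hex data blob, and the event's ABI tells you how to slice it. Here is a minimal hand-rolled sketch for the standard ERC-20 Transfer event — normally web3.py or eth_abi would do this for you, and the sample log values below are made up.

```python
# Sketch: decode a raw ERC-20 Transfer log (topics + data) into readable fields.
# Transfer(address indexed from, address indexed to, uint256 value)
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def decode_transfer(log: dict) -> dict:
    assert log["topics"][0] == TRANSFER_TOPIC, "not a Transfer event"
    return {
        "from":  "0x" + log["topics"][1][-40:],   # indexed args live in the topics
        "to":    "0x" + log["topics"][2][-40:],
        "value": int(log["data"], 16),            # non-indexed args live in the data blob
    }

# Made-up example in the shape a node or the BigQuery logs table returns it
example = {
    "topics": [
        TRANSFER_TOPIC,
        "0x0000000000000000000000002222222222222222222222222222222222222222",
        "0x0000000000000000000000003333333333333333333333333333333333333333",
    ],
    "data": "0x0000000000000000000000000000000000000000000000000000000005f5e100",
}
print(decode_transfer(example))   # value decodes to 100000000 (raw token units)
```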
And just while we're getting set up with the next speaker... yeah, here you go. Now I'd like to introduce Shawn Douglass from Amberdata, a platform for monitoring, searching, and analyzing public and private blockchains. Prior to founding Amberdata, Shawn served as President of Software and CTO at Unified, building and operating the company's rapidly expanding SaaS offerings in cross-platform data management, analytics, and reporting. He has held roles as board member, operating executive, technologist, advisor, and investor. Shawn is a graduate of Harvard Business School. Please join me in welcoming Shawn Douglass.

I'm Shawn Douglass from Amberdata. We combine blockchain data and market data into a single data platform and serve it up as low-latency APIs as well as WebSockets, as well as RPC, so you can connect to the data and understand what's going on on-chain. We support seven blockchains today, as well as about 20 exchanges. So I want to quickly frame what is happening right now with crypto and Ethereum.

So everybody believes this is a massively disruptive opportunity. It's about $300 billion in market cap right now across all of the tokens and digital assets, but we are creating an open financial system, we're creating potentially a new internet, and this is going after a disruptive opportunity that actually disrupts cloud computing, remittance, payments, store of value, fiat currencies. So why does this work, and why is it so disruptive? It's because everybody here in this room is building systems of delegated trust that are radically transparent — that provide transparency that's not available in traditional financial markets — that drive network effects, align incentives, drive behavior, and have a very low barrier to adoption that we can all develop against. I used to be a venture capitalist; you're always looking for things that drive network effects, have social networks, incentivize behaviors. Crypto massively enables that. And crypto equity — crypto economics, I can't talk today — crypto economics is the fundamental reason why this is so massively disruptive: because we can create these mechanisms that incentivize behavior, and because data is an input and data is an output to these systems, we can understand what's going on and we can drive network effects.

So with that said, today, if you think about Ethereum: Ethereum is about $18 billion in market cap — that's like AMD. It's become an industry. Everybody here in this room is creating an industry. If you look at the top 50 tokens trading on Ethereum, that's about $11 billion in market capitalization, and about $17 billion per day trading. That means almost two times the total market cap of those tokens is trading across the top 50. It's not what we were doing last year.

Now I'm going to dig into the data to actually show crypto economics in action and the evolution of what's being built on chain. This chart is pretty interesting, because if you think about the most simple form of crypto economics, it's: I'm going to be a miner, I'm going to mine blocks, I'm going to get rewarded, and that's going to keep the system in balance, keep the system safe. What we can see here is that over time, when the Ice Age set in last year, we started to actually see the block time slow down. There was a lot of congestion in the network, and they had to do the Constantinople hard fork, and once they did that, the system went back into balance. So it's a pretty good depiction of crypto economics in action in its simplest form, proof-of-work mining.

You can also see — if you go back in time, during the big ICO bubble, there was a lot of transaction activity on chain, but really that was mostly single token transfers. Today, though, we're almost back at the same number of transactions on chain as we were previously, except the composition of those is very different. What this is showing here is the actual transactions on chain today: here, these are wallet-to-wallet transfers, and these are smart contract transactions, including smart contracts and tokens.
And Ethereum is starting to be a global computer, and global computers run code, run software, and what we're seeing here is that the actual majority of activity on Ethereum is interactions with smart contracts — it's not just transferring value — which is great, because that's what we're all here for. Now, the whole network runs on gas, and what we're seeing here is that the consumption of gas driving these smart contracts is kind of the lifeblood of what's happening on Ethereum. You're seeing it become true: this is what everybody in this room set out to participate in building, we're actually starting to see it come to fruition, and you can measure the adoption and utilization of smart contracts by gas consumption.

But it gets a lot more interesting. Everybody here has probably built orchestrated systems and microservices and what have you. Well, last year, what you would see is literally a smart contract would execute a transaction, it would transfer a token, and that was it, it was done, right? But what we're seeing now, with things like DEXes, dYdX, 0x, is orchestration across multiple contracts. So this one has a call stack about six calls deep, which means there are a lot of interactions with different smart contracts. Going back in time, there were literally just token transfers over here; now we're seeing orchestration across multiple smart contracts at five, six levels deep. So you're seeing much more complex applications, with interactions and dependencies, being built. It's really an evolution, a maturity, a realization of much more powerful dApps being built.

Back in the day, it was CryptoKitties with its little ecosystem, and MakerDAO with its little ecosystem, and they really were islands unto themselves and didn't interact. However, with the advent of 0x, where you could actually swap one token for another, we can now start to see people connecting independent dApps together, orchestrating transactions across them, enabling token swaps. With Compound coming into the ecosystem, where we start to see lending, we started to see borrowing, and we're seeing leverage being brought into the whole DeFi system. We're now starting to see people doing 250,000 transactions in a single week where they're taking Compound, interacting with 0x, levering up their positions, or doing what have you. So we're seeing much more complex interactions. Additionally, with protocol bridges coming in, you're seeing 300,000 transactions in a week where people are building a protocol bridge between Compound and a CDP protocol, pulling them together. There's also a lot going on with Uniswap, where you have a single token being swapped for another: we're seeing 510,000 transactions in a single week where Uniswap is interacting with DAI. So when you start to look at the complexity of these dApps, and the orchestration across the ecosystem, across transactions, across contracts, it becomes really, really interesting — and then multi-collateral DAI. Just think about where we're going to be a year from now, when you can actually take pieces and parts of smart contracts and start to orchestrate your new idea across them. It becomes incredibly powerful: you start to add scalability and more efficiency, people start to share, and we're building blocks that we can build on top of each other. The data is there for everybody to see.
The thing that makes blockchain and crypto so powerful is that crypto economics allows us to create these systems, incentivize behavior, measure behavior, and then have data as an input and data as an output, so you can measure your system and instrument how things work. As I said earlier, we're a crypto-economic data platform. We have market data, blockchain data, metrics, insights, web services — WebSockets, RESTful APIs — and you can connect to it just like you would connect to a full node via RPC. So feel free to come check out our platform.

Now, speakers and contributors from Aleth.io: we have Adrian and Bogdan from Aleth.io. Aleth.io is a powerful blockchain data analytics and visualization platform. Adrian is a senior software engineer building systems architecture with a focus on reliability and scalability, and Bogdan is a software architect with a love of distributed systems, and he says he plays Lego for money. So please join me in welcoming Adrian and Bogdan.

At Aleth.io, we are trying to bring transparency to the blockchain through our data platform as well. We have an API, we have various tools, but that's not what we're going to talk about now. We are going to talk about how you can get indexed data with no third party required. So this is the graph on Aleth.io's first slide, and somewhere in the middle of it there is a tool that is essentially open source — it's on GitHub. You take it, you take any RPC-enabled Ethereum client and our tool, and what you get is a Postgres database and an API with indexed chain data. So this is something you can set up very easily on your own and control. What you get: you get blocks, you get accounts, you get transactions indexed by account — which is, yeah, something that dApps need — and you get event logs indexed by transaction. You don't get contract messages, also known as internal transactions or traces, and you don't get reorg tracking, so if your chain data gets replaced, you can't see what got reorged or which transactions were removed. And we don't do payload decoding with this either. And I just wanted to actually show it, if my internet works — and it does.
So after cloning the repo — which I'm not going to do right now because it might be problematic — you get this configuration file where you put in a node; in this case I'm going to talk to a Görli node. And you have here a feature for lag: if you don't want to handle reorgs yourself, you can set a lag of 10 and it will just wait for finality, so you get that; but this is the default right now. So once that config is set up, you just do this: it starts a Postgres, it starts a Redis... let's see, everything goes down... I think it's funny seeing all the rows, which is great... it's in the back... okay, there we go, let's try this again. If we try this again with internet, it might actually work. Yay. Okay, so now it has started parsing.

So if I spin up an API and you go to this endpoint here... yay... let's wait for one that has a transaction, because you can look at some of the other endpoints as well... what's happening, not having the transactions... yay, okay. So now that we have an address, this gives you an indexed set of the transactions for that account, and if there were event logs, you could query those as well — event logs. It's just a tool that's very easy to set up, and you can use it to index your own chain and build on it. So if you're developing a dApp and you don't want to pay for any providers and you don't need a lot of data, this is just one command and you get this one cluster. But for more complex things, like internal transactions, and for handling reorgs more dynamically — for example, you can send a query with the date and the hash that you know about, and our API can tell you exactly what got reorged in the meantime and which transactions got rolled back — for all of this, visit Aleth.io and check out the bigger API. For the basic needs, you can just take this tool and use it. Thanks. Questions?

It seems like it would be challenging to use without internal transactions, given that many, many transactions are internal transactions. Is this something you're using, and how do you deal with that? I guess it depends on what your needs are. Of course, at this stage, if you need to understand internal transactions, this wouldn't be as useful. It is open source, so it could be extended to support something like that, but it would lock you into Parity — that's one thing — or something else that gives you traces in a good way, whereas this is built to be super agnostic: use any RPC node and you'd be able to run it. So if you can avoid using internal transactions, this is fine to use; if not, open a pull request. The idea is that we give this to the community and want to see how it can evolve into a bigger mechanism for ingesting more complex things as well. More questions? I can give you sweets. Now you've got one.
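For a sense of what "index transactions by account from any RPC node into Postgres" means in practice, here is a generic illustration of the idea only — this is not Aleth.io's actual tool — assuming web3.py, psycopg2, and a locally running node and database.

```python
# Sketch of the general idea (not Aleth.io's code): pull blocks over plain RPC
# and store transactions in Postgres, indexed by sender and recipient.
import psycopg2
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))   # any RPC-enabled client
db = psycopg2.connect("dbname=chaindata user=postgres")
cur = db.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS transactions (
        hash          TEXT PRIMARY KEY,
        block_number  BIGINT,
        from_address  TEXT,
        to_address    TEXT,
        value_wei     NUMERIC
    );
    CREATE INDEX IF NOT EXISTS tx_by_from ON transactions (from_address);
    CREATE INDEX IF NOT EXISTS tx_by_to   ON transactions (to_address);
""")

LAG = 10                                   # stay a few blocks behind the head, like the "lag" feature
head = w3.eth.block_number - LAG
for n in range(head - 100, head):          # small demo range; a real tool would checkpoint progress
    block = w3.eth.get_block(n, full_transactions=True)
    for tx in block.transactions:
        cur.execute(
            "INSERT INTO transactions VALUES (%s, %s, %s, %s, %s) ON CONFLICT DO NOTHING",
            (tx["hash"].hex(), n, tx["from"], tx["to"], int(tx["value"])),
        )
db.commit()
```

With those two indexes in place, "all transactions for this account" becomes a plain `WHERE from_address = ... OR to_address = ...` query, which is roughly what the per-address endpoint in the demo serves.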
So do you have to sync mainnet, or what can you watch? Yeah, not necessarily mainnet — any private network or testnet. You basically just connect it, specify the node, and that's it. For our use case in the demo we used Görli only. Thank you. Thank you, guys.

Our next speaker is my colleague Valentin Mihov from Santiment. Valentin is the CTO of Santiment, an all-inclusive source of targeted intelligence for Ethereum. Valentin is a software engineer with over 10 years of experience building startups, and he won a silver medal at the International Olympiad in Informatics in 2004. He loves building great products, performance optimization, advanced software like AI, and scalable systems, and when he's not cranking out code he's trekking in the mountains or kite surfing on a beach. Please join me in welcoming Valentin.

So, I hope everything goes according to plan, and I'm actually going to start with the live demo, because it's going to take some time to compile, and I wanted to show you how it looks from end to end. We built this tool — it's very similar to what the guys just showed — for getting your own data analytics up and running in an easy way. The name is a bit of a spoiler, but the idea is to be able to export events out of the Ethereum blockchain, decode them, and have them in a database where you can query them with SQL, so you can build your own API, you don't rely on any third party, and it should be very easy to set up.

So, to start, there is a project generator. You need to install Yeoman, which is a project generator — can you zoom in a bit? — and then this is the generator for this project. Then you create a folder, which is your exporter. So I'm going to start doing that: I'm going to create a folder, and then it's going to ask me for the name of my project, and it's going to start building. Okay. Now, while it builds — why do we actually do this?

So we have a blockchain, and in the blockchain we have some kind of events or transactions coming in, and usually the blockchain has some API, and we need to have some kind of connection to it. When we build a dApp, we need to have an API, because we want to create a UI. For example, if you build Compound, you want to have a website for Compound to be able to see when somebody deposits something, to show "oh, you deposited, here is your balance," or you want to see the current state of the market — how much money has been invested, or, for example, the amount of money being deposited every day. So it's not only about fetching the latest transaction; you also need some kind of aggregation on top of this. And you need that kind of aggregation not only for UI cases but also, for example, if you want to do fraud detection, or whatever your needs are. Usually the full nodes don't provide you with an API for that. What full nodes give you is: give me all the data for a given block, give me the data for a given transaction, or tell me what the latest block is. They don't have any aggregation capabilities; they're not able to do something like "select all the deposits, group them by day, and sum them up," and things like that. So the current APIs that you can rely on are either centralized, like ours, or they're not real-time, like BigQuery, let's say — actually I was quite surprised that right now it's only six blocks behind; before, it used to be 24 hours — but also, as the previous speaker mentioned, you need to have some batching on top of it, so it's not a real-time thing
that you can query all the time, so that's not going to work. So if you are in the situation of "okay, I built a dApp, I want to export the data somewhere, and I just want to run some SQL on top of it to cover all my analytics needs," then there are not so many options out there — but it's great that they exist, and it's amazing.

So an analytics pipeline is going to look something like this: you have a full node, then you have the data extractor, which is going to extract the data from the full node, and then it's going to put it into some kind of query system so you can query it. Why is it hard to decentralize this? It's hard because it relies on a lot of state — this is the biggest problem. The full nodes have the state, but it's organized in a specific manner that depends on how the blockchain works, and it's very hard to build and maintain this state. So the only way we can figure out how to decentralize this is through open source: let's make tools that are open source, and people can use them to extract the data and cover their needs. If you have an easy way to run a tool that extracts the data, puts it into a database, and lets you build APIs on top of it, then everything will be good. Now, why do we do this as a data provider? Well, we provide the more simple types of analytics, and then on top of that we're going to provide some more complicated things that we'll do the APIs for. This is basically why we do it, but this data should be available to everyone, so it's good to have decentralization and cooperation. This is the URL to the project that I'm showing. So let's go back to the demo — oh, it's done. Awesome.

Okay, so everything is working so far. We need to provide a URL to some full node; for this we're going to use Infura, because this is the easy way — let me explain. Okay, this is my Infura project. I'm going to copy the URL to Infura, and I need to put it into my configuration — here; here I need to put "https". Okay. And here I'm going to specify from which block I want to start syncing, this is how many confirmations I want to wait for, so that we don't need to handle reorganizations, and here is some batching. Let me specify this one, because I think there is more action in there, and now I'm going to run it with Docker.

Okay, so now the pipeline is going to start. This project that is generated here uses the ABI of Compound — I found the Compound ABI and the Compound documentation to be very good, so it's very nice to work with. It has started to sync the events now, and it's going to decode all the events and put them into a database, and this is all happening in real time. So let's see what we have here. First — let's look at the code, or maybe open the database. We use ClickHouse as a database; it's a very, very fast analytics database, it's columnar, and we are managing to handle billions of records with it, so most probably it will do some good work for you if you need it. So here — this is the main table, and I'm going to select from events, I'm going to take five records, and what we have is: for every event we have a timestamp, the address the event is coming from, the hash of the block, and here we have the decoded event. In this case we have AccrueInterest — this, as I said, is Compound — and it shows you the borrow index, the interest accumulated for this event, and all that. So if we scroll up, we see we have mostly AccrueInterest here; so we have
also accruals, basically. We put in the ABI, and it's going to extract all the events across all the — in this case I think it's limited to certain addresses, but you can extract across the whole chain — and it's going to, as I said, sync in real time.

Okay, let's look at the code. So this is how this exporter looks; this is basically it. We managed to condense it into these five lines: you need to have the ABI — in this case it's the Compound ABI — you instantiate the class, you say "extract events with this ABI," and that's it. And here is a simple REST API built with micro — it's a JavaScript framework — and right now it gives you the total number of events and the events over time. So I can now go to the console and say curl — I think it's on port 2000 — /api, events over time, and here's the aggregation that I get. It's currently syncing, so it's going to take some time, but yeah, this is how it works, and you can take this thing and deploy it for hosting — you only need Docker, you deploy it on DigitalOcean or something like that — and you get an analytics API for any smart contract. You just need the ABI. Actually, last night I was playing with the dYdX ABI, and it managed to extract it. So here is an overview: the full node is, in this case, Infura; the data extractor is this repo; and the scalable query system is this ClickHouse database. So that's it. If you have any questions, I'd be happy to answer them.

About the events you're interested in — is it only focused on events, or...? This particular project is focused on events, because we wanted to make it as simple as possible. The truth is, this open source tool is using another open source tool which is more low level, and that one we actually use for all kinds of things. For example, I can show you on our GitHub — it's again an open source thing — we use this lower-level library that extracts all the trades from centralized exchanges. It's basically, again, JavaScript; it's a bit longer, like 6,000 lines of code, but it extracts all the trades from these exchanges in real time and pushes them to the database, and then we can analyze them. This is also open source; you can use it.

What would be your idea of what to use this for? Well, I think it can be used for basically anything: from very granular data, where you say "I want the latest transaction, the latest order," or whatever, up to "I want to get some aggregation." With this database in particular, ClickHouse — the reason we use it and not, say, Postgres is that we found it's very fast at doing aggregations and analytics. It's developed by Yandex, so it has a very strong community behind it, and we've been able to run a lot of things pretty much online without any need for batching, like cron jobs or anything like that. I would say you need to be careful, though; there are some trade-offs — like everywhere in software engineering, there are always trade-offs. One of the trade-offs is that with this database, if you want to do some very complicated joins, it might be a bit tricky, especially if you have joins between tables with billions of records — then, yeah, that's not going to work. But if you have a billion records and maybe a million records, that would be fine. So if you want to join all the transactions with all the blocks, it's going to work, but it may be slower if you want to go through the whole history, so you might still want to do aggregation batching.
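As a small illustration of the kind of "events over time" aggregation described above, here is a sketch against ClickHouse using the clickhouse-driver Python client; the `events` table, the `decoded` JSON column, and the `interestAccumulated` key follow the schema described in the talk, but the exact names are assumptions.

```python
# Sketch: "events over time" style aggregation in ClickHouse (assumed table/column names).
from clickhouse_driver import Client

ch = Client(host="localhost")

rows = ch.execute("""
    SELECT
        toDate(timestamp)                              AS day,
        count()                                        AS events,
        sum(toFloat64OrZero(
            JSONExtractString(decoded, 'interestAccumulated')
        ))                                             AS interest_accumulated
    FROM events
    GROUP BY day
    ORDER BY day
""")

for day, events, interest in rows:
    print(day, events, interest)
```

JSONExtractString pulls one key out of the decoded-JSON column, which is the same trick mentioned later in the Q&A for working with the unified schema.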
And the other thing is, the way it works internally, it's possible that the database ends up with some duplicate records, and there is a specific syntax you need to use, which is different from Postgres. To avoid the duplicates you need to use, for example, SELECT ... FROM ... FINAL, and then you don't get them. So there are some catches like that.

Can you start the pipeline with multiple ABIs, or is it one ABI per instance? Currently it's one per instance, but you can share the same database. I mean, if you look at the Docker Compose — this is how the Docker Compose looks — we have Zookeeper and Kafka, which are just storage, ClickHouse is the database, and the exporter is basically the script that's doing the exporting. So you can run multiple exporters here and connect them to the same database, and this server is the API that I showed you. These are the components. What you can actually do is probably merge the ABIs together into a single ABI — a super-ABI — because the ABI is just a list of descriptions, and you can just merge the two lists and then use the super-ABI.

If you have a post-it at the end of your seat row, we'd like you to note down three things: something that you need or are looking for in terms of data supply, any challenges that you're having — you can hash them out here — and anything you can contribute. So if you're looking for opportunities, that type of thing, or if you have a skill you'd like to contribute, share those. And just before we do that, to gauge the level of the audience against what the presenters have presented — it was a little bit entry level, I would say; who's looking for something more advanced?

One thing that hasn't been covered is watching mempool transactions, so depending on what happens you can confirm it and get live streams of it — if anybody knows about that. You can do it on our website, and you can also watch the mempool data on Amberdata. Yeah. Anyone else looking for something specific?

I can contribute something — maybe now is the time — because I'm with eth.events, and we do internal transactions, and we also have all the ABIs, so we do that for function calls. We have over 90% of all function calls covered, and we put that into Postgres, into the database, and into an Elasticsearch cluster, so you can use either one, whatever fits your needs, and grab all the data for free. You can actually do math and time series on the database with that decoded, cleartext data. So if anybody wants to use us, feel free — it's eth.events, and all the data is there; it's your game.

Hey, at the beginning of this year we built a sync of the blockchain database into ClickHouse. It's a repo you can start with one command. We also use ClickHouse as the base for analysis of the data, and you need only one server and about a week to populate it, and then you can use all the data. This database is under the MIT license, and you can use it yourselves.

How do you prove that the data and the metrics are not faked? Do we want to have the speakers back up here? We don't have chairs... or anyone in the audience... So, for this question: that's actually one of the reasons we've been thinking about how you decentralize. If you're using an API from a provider, you can't really be sure — it could be a mistake in the data, it might not even be a deliberate error or anything like that — and the only way, I guess, is to figure out how you
decentralize, so you don't need to trust the third party. Well, through open source: you need to use your own open-source pipeline. So if you want to do that, then using some kind of tool that you run locally, connected to a full node that you trust, would be the way to approach it. I don't think there is any other way that you can really trust.

I think the blockchain is the source of truth, so anybody who aggregates data — like you, or like us, or like Infura — how can you verify that that data is correct? What we do is, for every single API call, in return we give the Merkle proof from our chain data, so you can always compare our on-chain data, which is correct, with the off-chain data. Because a lot of people who aggregate data collect wrong data because of blockchain reorganizations; if you don't continuously re-scrape that and verify it, then you could be serving wrong data. So by serving a proof, you can verify it — that's what we do — because otherwise people are going to, you know, turn it into something else.

But if you serve an aggregation — let's say you want to show the number of transactions per day — there is another interesting question, because you can actually measure that in different ways. I mean, if you talk about the number of transactions for a given token, is it only transfers, or does it include other operations, like approvals and that kind of stuff? If you serve a given block, you can verify it — a block has a hash, so that's fine; if you serve a transaction, we can verify it; but when it comes down to aggregations, it becomes tricky. I want to agree with you: if you count things, you can count them in several ways. For example, the easiest one would be: do we include the genesis block as a transaction or not? That's already a difference of one, and there are many, many more if we go down the rabbit hole. For example, do you count internal transactions differently from normal transactions? Are they just transactions? Exactly — that's the problem: do you put them in a separate table, or include them in the same table?

And also, when you start aggregating the data, you're actually going to find some really weird things. One of the things we found is that during the ICOs, most of these pre-sales — sometimes they don't emit events for the pre-sales on the blockchain. So you start aggregating — for example, you want to find the top holders of a given token — and when you go down the rabbit hole, you actually find that in the smart contract there is a list of addresses that are going to receive a bunch of tokens when the contract is created, let's say, and this is not advertised on the blockchain at all. And the even more interesting thing is that some of these top holders are not even shown on Etherscan. What happens is that the first time you open an address for a given token on Etherscan, they're going to query the full node, figure out that it has some kind of balance of this token, and include it in the top holders. So we've found holders with millions of dollars of tokens that have never even been looked up on Etherscan. Yeah, I mean, it's crazy — so these token distributions are not clean at all. What was that?
Also, it's interesting how they handled the DAO fork — can you explain that more? They changed the balances in the Ethereum client without emitting any transactions. So there is something called accounting, and we actually haven't implemented the accounting yet, so we ask the node for that. But the question of who owns what — what address owns what balance of tokens — or the Etherscan special handling, that is explicitly hard, and we think the most robust way would be to query the node for each block backwards, because the node is the only source of truth that actually has the state. Otherwise, if you do the math yourself, if you compute the state — I think you can say something on that as well? — what we found is that you can't always recompute the state of the node correctly so that it adds up, so you actually have to ask the node. But in order to ask a node, you need an archive node, which is like four terabytes, or two terabytes, of storage. We also run archive nodes — so everybody wants an RPC call to an archive node; don't advertise that one.

If you don't have a transaction or an event, how do you know what address to ask for? Like, you need to ask for the balance, but how do you know what to ask for? You can actually ask the node for the balance of that address. But how — if the balance of this address was created without emitting a Transfer event, and without a transaction, as in the case of the DAO fork, how do you know that you have to ask for this address? Well, then you don't. If you don't know what to ask for, then you don't have the most basic information. Well, but you might be asking: for a given address, I want to get all the balances this address has, for all the tokens that implement ERC-20 — and then it becomes complicated; that's a way harder problem. You want to say, "hey, what's my time series of historical account balances, for all of my tokens, over the last 360 days of changing token balances" — you have to go through millions of blocks, take all of your logs, extract your token values, then go out and get market prices and map that out. It's a massive amount of work. We have that available today. Nice — how did you get it? Did you ask the node for every block? As you, and probably a few other people here, know, we replayed every transaction since the genesis block to extract that and make it easily accessible; the blockchain is not easily searchable.

For us, we don't do it through archive nodes, because we figured out that would be insanely slow. So we basically replayed most of the contracts and found those kinds of discrepancies. Actually, it's very easy to find discrepancies, because when you look not at the top holders but at the small holders: if somebody gave tokens under the table, let's say, and that address gave the tokens to somebody else, you get a negative balance. So when you see a negative balance, you know somebody is doing something funky in the contract, because at the end it has to sum up. You will never catch all the patterns — you can't. Actually, there are contracts where the circulating supply is higher than the total supply reported by the smart contract. It's possible — and I can tell you a couple of tokens where they just don't add up. Essentially you are saying it's not deterministic? It's not that something is broken — there's just something going on somewhere. There are tokens trading on exchanges today that have a circulating supply higher than the total amount: somebody just transferred tokens, minted tokens, without adding them to the total supply — basically, the total supply is just a number, just a field the contract reports.
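One practical way to surface exactly these discrepancies is the negative-balance check mentioned above: rebuild every holder's balance purely from decoded Transfer events and flag anyone whose computed balance ever dips below zero. A rough sketch, assuming you already have decoded transfer rows from one of the pipelines discussed earlier (the field names and sample rows are illustrative):

```python
# Sketch: rebuild ERC-20 balances from Transfer events and flag "impossible" holders.
from collections import defaultdict

ZERO = "0x0000000000000000000000000000000000000000"

def reconcile(transfers):
    """transfers: iterable of {'from': ..., 'to': ..., 'value': int}, in block order.
    Returns final balances and the addresses that ever went negative."""
    balances = defaultdict(int)
    suspicious = set()
    for t in transfers:
        if t["from"] != ZERO:                 # a transfer from 0x0 is a mint
            balances[t["from"]] -= t["value"]
            if balances[t["from"]] < 0:       # spent tokens no Transfer ever gave it
                suspicious.add(t["from"])
        if t["to"] != ZERO:                   # a transfer to 0x0 is a burn
            balances[t["to"]] += t["value"]
    return balances, suspicious

# Made-up rows: 0xbbb sends tokens it never visibly received.
balances, suspicious = reconcile([
    {"from": ZERO,    "to": "0xaaa", "value": 100},
    {"from": "0xbbb", "to": "0xccc", "value": 40},
])
print(suspicious)   # {'0xbbb'}
```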
I've also seen events where, say, someone transferred 10,000 times the total supply to a new address; the transaction failed, but the event was still emitted. That's why you shouldn't go by events — events for accounting is a bad choice. Events are totally optional for a developer, and you can essentially write whatever you want in there. You can actually do something different: you could make a call and emit something different, you could rewrite what the event says. I'm not aware that someone has done it, but that is totally possible. I think in this particular case there was a bug in the smart contract where they were emitting events even though the transaction failed — like, they check "do I have enough money to transfer? oh, I don't" — but they emit the event before that check. And I think at the end of the day this contract is unusable; basically it's a hack, it needs to be marked as garbage and thrown away, because if you have a wallet that needs to work with these things, there is a high chance it's going to rely on events, and it's going to show crazy stuff. So, I don't know — events should be built for having a reliable view of the state of things and how they change, but not all contracts are like that.

And the other interesting thing is that there is not really a good standard. ERC-20 is not a good standard at all — there are no events there for minting and burning, so how do you track the total supply effects? There is a better standard — which is, like, I don't remember the number — but nobody uses it. You don't remember the number? Yeah. That's exactly what we learned too. And I don't think events were ever meant to be a reliable form of communication; they're totally optional, because it's in the hands of the developer how to use an event within a function call. It's not a precise log of a function call; it's something additional. What is your connection with the outside world? The smart contract needs to have some kind of connection to the outside world, and events are how that comes back. It's up to the developer, and it doesn't have to be the same as the function call or in a tight relationship with it; it can be a loose relationship, and that leaves room for interpretation — it's not a definition.

Well, on the bright side, for DeFi, for example, I see some very nice events. The smart contracts being built now — the latest generation — seem to be much more well behaved from an events perspective. So if you want to build some dashboard about DeFi that tracks interest rates and borrows and all that, events will be a pretty good way to do it. Even Maker, really?
There's a lot of weird stuff going on. Well, Maker — I mean, Maker is weird. It has very weird field names, and that's just how it starts; you basically have to solve a little puzzle before you can make sense of it, which sucks. This is actually — you might know this as well — but if you go down the rabbit hole and interpret events and use them as a means of communication, the more complex the smart contracts get, the worse it gets, and I think there needs to be some sort of standardization in that process as well. The key-value field names are just the start, and there will be way more. For, say, not proof-of-work but a cheaper chain, you could actually use this as a means of interface — heavily rely on events — and that would totally change how you interact with the chain. And I think this will happen on proof-of-stake chains, for example, where transactions are cheaper.

Yeah, well, something that we noticed is that once you remove the proof of work, like with EOS, a lot of spam starts to happen, so this kind of analytics starts to take a huge amount of space. For example, for EOS, if you want to crunch the whole chain and put it into an analytics database, and throw all the internal transactions into the mix — so basically throw in everything — it's terabytes of compressed JSON, the whole thing, and it's been running for one year. Yeah, I wonder what these guys are doing — you have a really fast block time with Solana, so I wonder — he's not from Solana, he's actually from Sanctum — oh, yeah, but they are really fast and they have insane rates, and I wonder what you do then. Interesting.

Since you mentioned EOS: when you stake, you get those side tokens — I forget the actual terminology — but Everipedia, on the EOS stack: will they get lost in the spam, and do you think their value and their use case will continue on despite the spam? Which tokens are you referring to? Everipedia is one; it's decentralized on the EOS stack as well. But the spam is more like people posting random stuff; there are these side contracts that are not even doing tokens — this spam is not even transfers of tokens, just people publishing some binary data on the blockchain. Yeah, I mean, it's really weird, but you need to keep it, because you are doing analytics, so you need to have everything, and it's insane. I think for one year of EOS history it took us maybe six months to build a special full node — we needed to patch the full node itself to be able to extract the data fast — and it took one month to extract, because it's about four terabytes. I suspect there will be more — for example BigchainDB, so data on the chain, or, as I mentioned, Solana — I think these will be new challenges. But how is it a blockchain if it's producing this amount of data? You won't be able to run it anywhere. Ethereum, right now, it's possible to run: if it's not an archive node, you can run a full node, and it's probably going to take 300 gigabytes — there are laptops with 300 gigabytes of disk, so it's still possible. But with something like these other blockchains, I don't know how this is going to work. It's insane; it's not decentralized anymore.

Well, I suppose for Solana we'd have to ask Anatoly — he gave a nice talk yesterday; he's not here. But there are many other big chains; they're used in private environments, machine-to-machine communication, so there will be many of those big-chain things, and I suppose many others. They all have different use cases — maybe not a blockchain as such — but they all have the archive problem and the
audit problem as well. If you have these amounts of data, you tend to chuck it off and throw it away at some point, so the only thing that has it is a node holding the state with the power on; if you lose the power, you might lose the state as well. That's why we exist. Didn't Ripple already lose some of its state, like from the beginning? I can imagine. Who runs Ripple? Ripple is also an interesting case. I was seeing this guy who was building XRP Scan — so an Etherscan for Ripple — and he was boasting about how "I run my own node now, and it's on a pizza box in this data center; it took me three months to sync it." And, like, come on, man — you need like 10 terabytes of storage or something like that, and they actually write this in the documentation: if you want to sync a full Ripple node, you need at least 10 terabytes of storage, and it adds 10 gigabytes per day. And this is only for the node, not for transforming it into a database. Yeah, I don't know what the case is there. We crunch it, so there is activity there. But do you actually run Ripple nodes and analyze Ripple data? We don't run our own nodes; we rely on external nodes. It's just insane — I mean, there are no Amazon servers with that many hard drives; we couldn't believe it. The famous one: when Ripple were moving to the cloud, they moved their Ripple node to the one provider that would give them that many terabytes; the other cloud providers couldn't do it. So we use their nodes — we have to rely on the Ripple nodes to get started today, because it's just not possible to run a full node ourselves.

So that's as far as the prepared presentations go. By the way, this is Allen from Google BigQuery — you can ask him questions. Allen, you missed this one: the very first presentation was using BigQuery to obtain data. So I'm happy everybody is finding out about it. How many BigQuery users are there in here? Okay — well, even though you're all welcome to use it, you still can... That was very good. One more question.

Do you ever plan to use ABIs and translate the data, so you can actually do math in BigQuery on the decoded data? Well — say that again? If your data is not decoded, you need the ABI to translate the binary data into numbers, to use BigQuery with calculations. You want to decompile the ABIs? No — use the ABI to translate the data: take the data input and translate it into something else you can use, like text and numbers, so you can actually do math. I think you can do this by using — are there some Ethereum JavaScript libraries that could be used to do this, or do I have to do the math afterwards myself? What he means is: can it be done as part of BigQuery, if I understand his question. So, I remember Nick Johnson did something like what you're describing with the ethjs library, by calling it from a UDF — all of the BigQuery UDFs are written in JavaScript, and he found a way to shim ethjs into a UDF — so he said he could run an EVM emulator inside BigQuery. So there's a lead for you. I don't know how far he got on that, but he was doing something like that. I'm not familiar with the terminology, but that sounds like the kind of thing he does. UDF stands for user-defined function: you include a JavaScript library so you can then call those functions from your query. ENS, apparently — he's using it as part of the registry, they have some contracts — so he wanted to reproduce the results of the contract calls inside of BigQuery. I'm
probably not doing it justice, but yeah, something like that. So it's already supported? Well, you'd have to do it yourself — you have all the ABIs and the functions, and it has to work out. But that's good, because then they come to us. I guess the right thing to do would be: if you know what contract was called and you had the ABI, you could automatically process it. But the problem is that the structure of the inputs and outputs is very flexible, so we couldn't really define a schema that would easily support all of that. It wouldn't be easy to query; it would just be a kind of nested or repeated field of blobs, which doesn't really buy you anything other than having done the execution of the ABI. So it's not very structured — it's a list of key-values, and every contract has different key-values, and in BigQuery you need to define a schema. So if you index a thousand different contracts, you have different keys, and for the majority of them it won't be flat.

So with this pipeline that I showed, currently the schema is unified for all the events: it has this field called "decoded", which contains JSON with the decoded data, and then you have a function in the database called JSONExtract — you specify the key, and it's going to extract that key, so you can get at the data this way. But there is also a possibility, something that I didn't show: with these five lines of code you can plug in a callback, take the decoded data, and emit a flat JSON out of it, so you can rearrange the data however you want. And then there is a SQL script used for initialization to create the table, and you just need to create the tables with the matching schema. So you can adjust it: "okay, I'm parsing, let's say, Compound events — these events — and this is the schema I want to support," and you translate it in the script and insert it denormalized. So it's possible to do that. But yeah, it would be nice to get some feedback and see if this actually solves any problems for people. We use this pipeline for our own needs for many things, and it's been working pretty well — we extracted it from our own pipelines — but we use it a little bit differently: we tend to normalize the data into fields rather than JSON, because JSON is slower, but then for each table and each project you need an agreed, predefined schema. So yeah, it might be possible to extend it.

Do you have any kind of problems with dApps that you write — like, you write a dApp and you want to build a UI for it, or maybe some analytics for it? Anyone in the room? Visualization, aggregation, anything like "where can I find the cheapest, or the highest, loan rate at the moment"? Yeah? Or, alternatively, somebody was curious about the mempool — talking about data in the mempool. Is anybody else interested in the mempool? The mempool is an interesting topic. Yeah. Since there's no problem from the audience, is anybody on the solutions side doing mempool work? Currently we look at the mempool; I don't think we have exposed any metrics on top of it, but I believe the mempool is useful for front-runners. Yeah, and prediction — you can see immediately what things are changing. Yeah, it's about — was it like a gigabyte per day? Or maybe I'm wrong — for one, two, three locations. We just run the nodes and get the mempool from the node; we haven't done anything more complicated, we just do that, something like this. I need to double check, because I was recently working on trades from centralized exchanges, and that was, for those five or six exchanges, about one gigabyte per day,
so maybe I'm thinking of that — I need to double check. We have a use case for the mempool from a wallet provider: they want to display to their customers that a transaction got submitted, and it's not good enough to wait for the first block, which we actually have. But yeah, what about mempools — has anyone built something workable, or a little bit simpler? You wanted to say something about the mempool? Somebody asked and posted it, but then left, I guess. Yeah, for us, we do basic mempool aggregations to explore the state of pending transactions that we've seen, determine the time spent in the mempool, but nothing fancy. Aggregation — we're thinking about it, but we didn't really find a good use apart from what you mentioned: showing the transactions as soon as they're seen by anything, but that's a basic use case, that's just the real-time aspect. So you don't keep any data for a longer time? We do keep some records of it — of what has just happened — and we also use that to figure out which transactions... and I kind of have this list...

A basic question: if you run a full node, is the mempool of that node partial? Well, yes, it's full, but it's geolocated — it's just that area of the mempool. And that's actually another thing: if you want to do prediction, you need to have at least a bunch of nodes, and you need to have them geographically distributed. This is for full nodes; they're different. That's actually a nice question: how much do they differ? Usually they shouldn't differ much, but they might in terms of when a transaction is first seen. Where is the mempool? The mempool, as far as I know, is not equal over the globe; there are different mempools in different parts of the globe, so a transaction may be first in one mempool but not in another, even though in history they end up the same. It also depends on where the transactions are mostly coming from: if you use MetaMask, for example, and that's the source, then that's going to be the main place they get propagated from. It's hard to know, when you see a transaction at a node, at what hop of the mempool distribution that transaction is, or how long it has been travelling. You can have like 500 milliseconds to the first node — that's possible — or you just see it in a block soon after it hits the mempool. Is that the time? Right.
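For anyone who wants to experiment with the mempool discussion above, the simplest starting point is a pending-transaction filter against a node you run yourself (hosted providers often restrict or disable this). A minimal sketch with web3.py; the endpoint URL is a placeholder.

```python
# Sketch: watch pending (mempool) transactions via a node's pending-transaction filter.
import time
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))   # your own node
pending = w3.eth.filter("pending")                       # eth_newPendingTransactionFilter

while True:
    for tx_hash in pending.get_new_entries():
        try:
            tx = w3.eth.get_transaction(tx_hash)          # may already be mined or dropped
            print(tx_hash.hex(), tx["from"], tx["to"], tx["value"])
        except Exception:
            pass                                          # gone before we could fetch it
    time.sleep(1)
```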