All right. Well, hello everyone. I'm Adam Link. I'm an engineer with AirSwap, and today we're going to be talking about running a Geth node cluster in production at scale. As was alluded to about 15 minutes ago, there is no open-source way to run a Geth node at scale. We will be open sourcing our production infrastructure in about 20 minutes. Yep. This is the work of about three developers over about two months to develop the infrastructure, along with a lot of the pitfalls we experienced along the way. So it's at least something for you to reference as you're building a Geth node cluster, or perhaps use on your own.

At the core of this, dApps need to keep state, right? When you run a dApp, your data store is inherently distributed, and a distributed data store means you have to stay in sync with your peers to decide what the correct view of that data actually is. There's a whole host of issues that can occur if your front-end or back-end application does not properly reflect what the consensus of the database state actually is. And there's nothing worse than users not trusting your application because they can't trust the underlying data. But you also can't sacrifice the responsiveness of your application while you're waiting on an eventually consistent database. Users on the internet are generally not accustomed to waiting five minutes to do a state change operation. And while we in the industry may be okay with waiting for transactions to settle, if we want to actually get mass adoption, we need to understand what the general public wants, which is a lack of lag when they're doing a state change operation.

There are two types of data that you need for state management. The first type is the balances on your accounts, right? This is the ETH that you hold in your wallet. For many dApps, this is a critical piece of data in the user onboarding process, because users may need to swap their ETH for your proprietary dApp token in order to use the product you're creating. The second type of information is the state tree. This is information about events within your smart contract: things like the user's token balance, smart contract function executions, and token balance transfers. However, contract events, which make up the state of your smart contract, are actually implicit and computed from the data in the blockchain. This means your dApp needs processing power behind it when you're running your Ethereum nodes. If you're out of sync with your nodes, you're obviously misrepresenting the state of the world, and that's a dangerous place to be in as an application.

With the importance of staying in sync now resting on your shoulders as a developer, some of you, and the industry as a whole, have wondered: can we just outsource this and pay somebody else to do it? And the answer is, of course you can. There are many companies out there that offer services where they will run nodes for you. Not all of them are geared towards the OLTP-style processing that a dApp needs; some of them are more analytics-based. Google recently came out with a product like this. But the question we need to ask ourselves as an industry is, do we want to move towards the Mastercard/Visa model? Right now with credit card processing, there are only a few companies in the world that have the technical expertise, software, and hardware to actually keep up with the number of transactions that happen on credit cards.
And as we move state management towards centralized parties, we move the entire industry towards that model again. And the question is, do we want to do that? Do we actually want to be there? The solution in our mind to combat the centralization is having people run their own nodes. But running an Ethereum node at scale in production is really not well documented online right now. There are also costs associated with running your own nodes, and those are not well documented either. So not only do you have the overhead costs of the actual infrastructure for running your own node, but you also have costs on your team to create the infrastructure and the software that manages your nodes and makes sure they stay in sync. There are a lot of resources online about optimizing nodes for mining, but there are not many about optimizing for the read-heavy workloads that a dApp needs. It gets even worse when you get into data querying and trying to understand how you process this data. There are very few resources online for data querying, for taking block data and storing it internally. We, as well as everyone else who was up here before, have created tools for doing that. We're not open sourcing those yet, but we are all creating our own data querying layers on top of the blockchain. There are some companies right now, like The Graph, that are actively working on a way to query using GraphQL, but they still require that you run your own nodes, or find somebody who can run nodes for you, to get the underlying state data.

So right here is a simplistic view of our Geth architecture. We run a production cluster, and it's basically a group of load-balanced Geth nodes in an autoscaling group behind an application load balancer on AWS. The application load balancer allows us to upgrade non-secure requests to TLS. We can also run very tight health checks to cycle unhealthy nodes out of the load balancer target group, for instance if they fall behind the head of the chain. Based on our internal tests, we recommend setting the frequency of those health checks to less than a single block propagation time; right now that means doing a health check inside of a 15-second window. This allows nodes that fall behind to be dropped out of the pool quickly so they can sync back up, and you're not querying nodes that you know are behind in state. The load-balanced nature of our clusters also means that if a single node falls behind, our overarching service doesn't fall out of sync and become unavailable.

We use our Geth nodes on the front-end UI of the AirSwap platform, as well as in the back end for state management and storage. On the front end, this takes the form of balance checks in the UI, wrapping and unwrapping WETH, and approvals and approval state for the atomic swaps. If you've used the AirSwap platform, you know that in the new Spaces, on the right-hand side, we have a bunch of information that comes from our Geth node state. We also use a heavy polling method to handle trade confirmations. If you've ever popped open the network tab in Chrome while using AirSwap, you'll notice that when you do a trade, we do heavy polling against our Geth cluster to make sure the trade actually goes through and gets executed (there's a rough sketch of that polling loop below). We also allow very limited third-party access to our Geth cluster.
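To make that heavy-polling approach concrete, here's a minimal sketch of the kind of loop involved: after a trade is submitted, keep asking the cluster for the transaction receipt until it shows up in a block. The cluster URL, poll interval, and retry cap are placeholders, not our production values.

```typescript
// Hedged sketch of trade-confirmation polling against a load-balanced Geth cluster.
// CLUSTER_URL is a placeholder for wherever your application load balancer lives.
const CLUSTER_URL = "https://geth.example.com";

async function rpc(method: string, params: unknown[]): Promise<any> {
  const res = await fetch(CLUSTER_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method, params }),
  });
  return (await res.json()).result;
}

// Poll eth_getTransactionReceipt until the trade is mined or we give up.
async function waitForTrade(txHash: string, intervalMs = 2000, maxAttempts = 150) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const receipt = await rpc("eth_getTransactionReceipt", [txHash]);
    if (receipt && receipt.blockNumber) {
      return receipt; // mined; the status field tells you whether the swap succeeded
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`trade ${txHash} not confirmed in time`);
}
```

It's brute force, but it works against a round-robin cluster where you can't rely on any single node holding state for you.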
We do have some of our maker partners who use our Geth cluster as well for balance checks, both for their own wallets that they trade out of, and for counterparty balance checks to make sure that the counterparty taking a trade is a valid taker. Further, our partners also use the Geth cluster for market data and price information, alongside a couple of other sources, but we do provide chain state to them for pricing information.

When we were setting up our Geth cluster for the first time, we ran into many issues with the way Geth was written and our particular use case of heavy reads. This is not a knock on the people who built Geth, it's a great piece of software, but we just operated at a scale that really no one else had been publicly talking about. The first problem we had was that Geth wasn't written with modern pub-sub methods. This meant you have to do heavy, frequent polling in order to get state changes, and you can't simply subscribe to the events being emitted out of the blockchain that you want. Now, that's only partly true, because you can with filters (and I've heard a rumor here that this may no longer be the case), but at the time, filters were dropped when a node reset, and you'll find out in a couple of slides why that was a huge issue for us. Basically, filters being dropped was a hindrance to us properly getting a pub-sub system set up. Further complicating that, filters are single-node only. So when you cluster your nodes behind a load balancer, the round-robin aspect of the load balancer means you can no longer guarantee which node is being hit, or whether the filter even exists there. One of the solutions we found at the time was a Node.js library that basically allowed you to do a multi-node filter setup: all it would do is iterate over an array of Geth nodes and set up filters on each node (there's a rough sketch of that pattern below). That was a heavy client-side solution, and we really didn't think that was what we wanted to do. We were hoping for more of a server-side solution rather than relying on a client implementation.

The underlying issue we found, and some other people have alluded to this, is that the overall Ethereum ecosystem is meant to be fault-tolerant when it comes to chain state. However, no real thought has been given to the notion of creating a subset of nodes that are also fault-tolerant in and of themselves. For a dApp, this means either querying a single node and praying that it doesn't go down, using a bunch of third-party nodes and hoping you have the right state, or doing what Infura did, which is writing a really cool piece of software and hardware called Ferryman, which basically introspects the request coming in, sends it off to the proper type of node, provides a modern pub-sub layer on top of that, and allows you to subscribe to client events. The problem is that the Infura solution is very proprietary and only makes sense for their setup, so they can't really open-source it for the community.

Our story on our Geth nodes: we do about 800 requests per second on our Geth node cluster. That's about 70 million requests per day, over two clusters of three nodes each. So that's a fairly large amount of traffic, and under this load, Geth actually broke. We had a massive memory usage issue and a recurring memory leak that caused each node to slowly consume all the RAM available on the box.
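For reference, the client-side multi-node filter pattern mentioned a moment ago looks roughly like this. It's a sketch of the idea rather than that specific library, and the node URLs, polling interval, and event topic are placeholders.

```typescript
// Hedged sketch: install the same log filter on every node directly (bypassing the
// load balancer), then poll each node for changes. Filters live on a single node,
// so each node needs its own filter ID, and a node restart silently drops its filter.
const NODES = ["http://geth-1:8545", "http://geth-2:8545", "http://geth-3:8545"];

async function rpc(node: string, method: string, params: unknown[]): Promise<any> {
  const res = await fetch(node, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method, params }),
  });
  return (await res.json()).result;
}

async function watchEverywhere(topic: string, onLog: (log: unknown) => void) {
  // One filter per node.
  const filterIds = await Promise.all(
    NODES.map((node) => rpc(node, "eth_newFilter", [{ topics: [topic] }]))
  );
  setInterval(async () => {
    for (let i = 0; i < NODES.length; i++) {
      const logs = (await rpc(NODES[i], "eth_getFilterChanges", [filterIds[i]])) || [];
      logs.forEach(onLog); // naive: duplicates across nodes still need de-duping
    }
  }, 5000);
}
```

You can see why we didn't love it: every client has to know about every node, duplicate events have to be filtered out, and any node restart quietly kills its filters.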
We tried scaling up our hardware to, in AWS terms, m5.2xlarges, so that's 32 GB of RAM and 8 vCPUs per box. With six of those in production, you should be able to do substantially more than 70 million requests per day. But as you can see from the chart here, after about 15 hours we had topped out our memory right at 32 gigs. The drops you see are where we physically went in and restarted the geth process. The memory leak meant that we had to cycle our Geth nodes before the geth processes used all the available RAM. In fact, if we didn't catch it in time, the RAM on the box was so used up that you couldn't even SSH in to kill the geth process; you had to restart the whole node itself. This meant we were waking up every three to four hours to check memory consumption and manually restart the process to make sure we didn't get locked out of our boxes. Once we figured out the usual cycle of RAM usage, we were able to write cron jobs to go in and restart the geth process, staged over about an hour across the six nodes. And it worked. But for any of you who work in infrastructure, trusting your entire system to six cron jobs is incredibly scary.

So we started tinkering with some of our Geth settings, and when we dropped our max peers setting, the memory usage issue was largely resolved. It turned out that the memory leak tied to the max peers setting really only occurred at the scale at which we were operating, and it wasn't publicly documented prior to that. As you can see in the charts here, those drops are when we were restarting geth. The stability we finally got came at an incredible cost to our operations team. For at least two of us, we were getting nothing done during the day other than Geth node work. And I can tell you that your sleep quality certainly suffers when you're waking up every few hours to make sure your boxes aren't freezing or running out of memory. Your morale also really suffers when every morning before breakfast you wake up to 50-plus PagerDuty alerts, and they just keep rolling in throughout the day.

So we wanted to fix this, and we embarked on a massive round of experimentation to find the hardware and software combination that would be the most stable and cost effective. We ended up iterating over 10 different instance types, spinning up everything from t2.micros to 4xlarge boxes, and we tried everything from compute-focused to memory-focused to instance-store-focused, trying to profile the way the geth process worked under the loads we were seeing. We made some concessions along the way that we have, unfortunately, built into our open source solution. No pub-sub method meant that we were just going to brute-force it, and that was one of our big decisions out of the gate. We couldn't guarantee that a single node was up, so we needed to load balance our nodes. We couldn't guarantee that our nodes were in sync and up, so we needed to write health checks. We also couldn't guarantee that a node was at the latest head, so we wanted to do peer checking to make sure we were actually at the head state we thought we were at. That combination was the secret sauce no one was really talking about online: aggressive health checking, aggressive peer checking, load balancing, and proper hardware. So let's talk about the infrastructure we actually run, in depth. This is what we're going to be open sourcing.
When we run our nodes, we run on top of Ubuntu, so we use the service construct to start our processes. Here we're actually starting the geth process; this is in the open source repo. You don't need to worry if you can't read the actual code, but let's break down the options we use. We brought max peers down to 50, which solved the memory leak issue and also helped with the data transfer costs that we'll talk about in a couple of slides. We use a custom datadir parameter to take advantage of the NVMe drives that AWS recently made available. Right now our default is CORS turned on everywhere; you may want to restrict this for your own implementation. And we use a 4 gig cache to speed up the initial sync process.

This is the health check that we use to keep our nodes in sync. It's probably pretty small on the slide, but essentially it's a Node.js Express app that does waterfall queries over our peers to determine state. We compare against Infura and Etherscan, but you can certainly customize it to run against additional peers (there's a rough sketch of this check below). We consider ourselves to be in sync if we're within 10 blocks of the head. You can make this tighter, but we found that if you tighten it up too close to the actual head state, like three blocks or less, you will occasionally drop entirely out of sync with all your nodes when you get a small chain split at the head, which happens on occasion. So we decided we'd rather stay available than completely in sync all the time. We return HTTP status codes that correctly identify whether the service is up or down, and we also handle network timeouts, because we found that occasionally Infura just times out our requests. We've got a couple of nifty features: the health check is started as a service, because we've run into issues before where the health check was down but the node was actually up. We have cron jobs that reset the max open file limit; this was something Geth had an issue with earlier, where it was not respecting the max open file limit, so we just wrote a cron job to fix it. And then we wrapped this all in a cloud-init script, so it automatically spins up the boxes, installs all the required programs and Geth, mounts the NVMe drives, and starts the services. Your boxes spin up and start syncing, and you can check whether your node is running via the health check that we publish.

One of the biggest takeaways we had was the hardware selection, so let's talk about IO right now. Magnetic and spinning disk drives just don't work: you don't stay in sync, you become IO-bound, and the node will freeze. Similarly, provisioned IOPS is not cost effective; you over-allocate your IOPS and you're only using your maximum about 3% of the time, so you're just wasting money. We tried moving back to burst-based EBS drives, but you have to allocate three terabytes of storage space to get the IOPS you need. And because you're still below where you'd want to be in IOPS, you hit caps when a large block comes in, so a write queue builds up over time and your node falls out of sync. EFS is something AWS provides, an infinitely scalable network-attached storage, but the network overhead means your node will never sync either, so just don't try that. NVMe instance-store drives were announced a few months ago, and that's actually what we use; it's a very good IO-focused instance store.
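Going back to the health check for a second, here's a minimal sketch of the idea: a small Express endpoint that compares the local node's head against reference endpoints and returns a status code the load balancer can act on. The endpoints, port, timeout, and exact tolerance shown are placeholders, not the code from our repo.

```typescript
import express from "express";

// Placeholders: point these at whatever reference peers you trust.
const LOCAL_NODE = "http://localhost:8545";
const REFERENCE_NODES = [
  "https://mainnet.infura.io/v3/<project-id>", // hypothetical Infura project ID
  "https://reference-node.example.com:8545",   // or an Etherscan-backed check, another peer, etc.
];
const MAX_BLOCKS_BEHIND = 10;

// eth_blockNumber against one endpoint, with a hard timeout so a slow peer
// can't hang the whole health check.
async function blockNumber(url: string, timeoutMs = 3000): Promise<number> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "eth_blockNumber", params: [] }),
      signal: controller.signal,
    });
    return parseInt((await res.json()).result, 16);
  } finally {
    clearTimeout(timer);
  }
}

const app = express();

app.get("/health", async (_req, res) => {
  try {
    const local = await blockNumber(LOCAL_NODE);

    // Waterfall over the reference endpoints: take the first head we can get.
    // If none of them answer, report healthy rather than dropping every node.
    let reference = 0;
    for (const url of REFERENCE_NODES) {
      try {
        reference = await blockNumber(url);
        break;
      } catch {
        // try the next peer
      }
    }

    if (reference - local <= MAX_BLOCKS_BEHIND) {
      res.status(200).send("in sync");
    } else {
      res.status(503).send(`behind by ${reference - local} blocks`);
    }
  } catch {
    // Local node down or unresponsive: the load balancer cycles this box out.
    res.status(503).send("local node unavailable");
  }
});

app.listen(3000);
```

The load balancer only ever sees a 200 or a 503; the definition of "in sync" lives in this one place, so you can tune the tolerance without touching the cluster itself.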
Family-wise, Geth is IO-bound on incoming data and compute-bound on serving requests. The M4 and M5 general-purpose classes over-allocate memory, so we don't use those, and any of the memory-heavy families are not going to be cost effective either. The I3 class is actually what Infura uses, or used at the time I talked to them. Those work well, but NVMe-backed instance stores are now out, and they're a better instance store than the I3 class. We run c5d instances in production; that's the proper mix of compute power and IO for what we run. We found that anything smaller than a large instance is not going to give you enough power to run your nodes in production at scale, and any family other than these just costs too much. So we run six c5d.4xlarges in production, and we have not had a single instance of downtime since August.

The last factor is cost. So what do you pay for? You pay for the EC2 instances, you pay for the load balancers, and then you also pay for outbound data transfer. Something we didn't know when we had max peers set as high as it was: your outbound data transfer can run into thousands of dollars per month. We've dropped that down with 50 peers, and we now end up with about two to three terabytes out every month.

Real quick slide: why not Parity? When we tried Parity about five months ago, we were getting a bunch of stalled nodes, so we were not actually staying in sync appropriately with enough peers, and we just dropped it. But with version 2.0, we certainly want to start looking at other clients to roll out, and that's something we'd call for help on from the open source side. If you guys know how to properly configure Parity, like, this month, that would be great.

And lastly, here's the CloudFormation stack. Like I said, we do run on AWS, and this URL will take you to our GitHub repository, where a single Launch Stack click will spin up the exact stack that we run in production. That's all I have. Thanks, guys.