Welcome to FPGA, an easier path to parallelism for Postgres. In this webinar, we're going to explain FPGA acceleration for Postgres: how FPGA database acceleration works, why it's likely to be a hot DBMS topic in 2020, and how to apply it in your Postgres deployments. My name is Lindsay Hooper, and I'm one of the Postgres conference organizers and your moderator for this webinar. I'm here with Sebastian Dressler, Solutions Architect at Swarm64, and Andy Ellicott, Chief Marketing Officer at Swarm64. Sebastian has a wealth of experience in connecting hardware with software to build creative solutions. As a Solutions Architect at Swarm64, he helps customers understand how to fit Swarm64 technology into their environments. Sebastian holds an MSc in computer science from TU Berlin, and Neovim is his favorite editor. Andy has spent over 20 years launching and growing adoption of new databases. He's held executive product strategy and developer advocacy roles at Crate.io, IBM, Vertica, VoltDB, and others. He enjoys being on the forefront of DBMS technology because he enjoys working with early adopters, those in the software development vanguard who are transforming the way we work and live. With that, I'm going to hand it off to Sebastian and Andy. Go ahead and take it away. Thank you, Lindsay. This is Andy, and I want to thank everybody for joining us today to learn about accelerating Postgres with FPGA hardware acceleration. Next slide: just a quick introduction to Swarm64, since some of you may not be that familiar with us. We are a bunch of hardware programming and parallel processing nerds, mainly from Berlin, Germany, but also Boston and the Bay Area, and we're in the business of writing extensions for free, open-source Postgres that speed it up so that you can do more with the database. Next slide, please.
Our agenda today is, in the next 45 minutes, to help you learn some new ways to get more out of Postgres and improve Postgres performance by quite a lot. Specifically, we will talk about field-programmable gate arrays (FPGAs): what they are, why you should care about them now, which use cases you should apply them in, and how it all works, really how they, together with the embedded software that Swarm64 develops to run on the FPGA, bring greater parallelism, columnar indexing, and a whole bunch of other goodies to Postgres that speed up its performance in a lot of different use cases. After I go through that, I'm going to turn it over to Sebastian, my much better half, who will take you through an iterative exercise of scaling up, scaling out, and then scaling in with FPGA and Postgres, to show you how the performance changes as you scale out, then accelerate, then scale out again with FPGA. Then we'll close with Q&A. So, talking about faster Postgres: why do people talk to us about this? This is a bit of our bread and butter. The obvious reasons are that they either have more users that they want to allow to query or interact with the database concurrently, or more data. For those needs, faster answers, more people, more data, FPGA acceleration usually provides a 15x, in some cases as high as over 100x, faster query response and data ingestion. It also matters to companies with old data warehouse appliances who want to migrate to, or modernize with, free open-source Postgres, and to SaaS companies with expensive cloud data infrastructure costs who would like to cut those by anywhere from 30% to 80% and increase the profitability of their SaaS business. FPGA acceleration is critically important to those people.
And then, for all of us who love Postgres and want to use it anywhere we can, it allows people to use Postgres in more use cases outside its sweet spot of transaction processing workloads, instead of migrating to commercial alternatives or other open-source databases. Helping people stick with free, open-source Postgres is one of our favorite reasons to be in business. The timing of this is really important: if you follow Gartner research, you may have seen their state of open-source databases report, I think it was last October. One of the predictions they made based on that research is that within two years from now, 70% of new projects will be running on open-source databases, and half of the commercial databases running today will have been replaced by free open-source databases. So what we're seeing is that a lot of companies, especially big enterprises, have data modernization initiatives to help them make this transformation, and FPGA acceleration, along with some of the other acceleration solutions we offer, comes at a great time: it allows companies with extreme workloads, like data warehousing, larger SaaS applications, or unusual data, to accelerate free open-source Postgres to meet those needs and achieve the savings and economic benefits of moving to open source. Now let's turn to FPGAs specifically, starting with why you should care. This matters when your Postgres hits a wall: you're using it today, or you're thinking about using Postgres for a project and you're not sure it's going to get there, or you've tuned it as far as it can go. What we've been conditioned to do over the last 10 to 15 years is reach for one of the two strategies you see here.
One is the most traditional: I'm going to move my Postgres database from, say, a machine with eight cores to a bigger server. Safe, easy, straightforward, kind of expensive, and usually a little short-term, just because you tend to keep outgrowing servers as your business and application grow. More recently it has become fashionable, and usually more economically beneficial, to scale out: to move your data to a distributed database, whether it's Postgres or something else. The hardware costs are linear in that case, versus quite a bit steeper in the scale-up case. But it can be difficult to choose a distribution strategy or even a distributed database. Do I have the right sharding, the right database, the right machines? Plus, what do I do for failover, and so on. It's a database administration challenge for sure. But if you step back and think about it, no matter which path you take, people are generally on the same quest, and what they are after is more parallelism, more cores. Whether you go from a machine with eight cores to a single machine with 48, or from the same machine to a cluster of six similar machines to get your 48 cores, you're really trying to get more parallelism, like opening more checkout lines at a busy grocery store, to get more of those requests satisfied, scale more easily, and so on. What people don't realize, or are just starting to realize, and what we're going to talk about today, is that there's a much easier option to gain that parallelism: dropping an FPGA card into your server if you have your own data center, or, even though this is kind of a form of scaling up, moving to a cloud instance like an Amazon F1, which is FPGA-equipped. Microsoft and others are starting to introduce FPGA-equipped servers as well.
If you take nothing else away from this session today, it's this: dropping in an FPGA, or moving to an FPGA-equipped server, can be like adding 100 cores to your server, with the right FPGA software running on that device. You'll get an easy, quick acceleration of query and ingestion performance of anywhere from roughly 15x to 155x. Okay, next slide. So what is an FPGA? Just a short primer here. FPGA stands for Field-Programmable Gate Array. It's an integrated circuit that you can add to your server, or that is installed in a cloud server like the Amazon F1. What makes the FPGA unique is that it has a lot of processing power, but out of the box it's not designed to do anything in particular. The beauty of it is really in the FP part, the field-programmable part: at runtime, a firmware image is loaded onto the FPGA that allows it to behave like a custom processor, an ASIC, or application-specific integrated circuit. So if that FPGA is loaded with the Swarm64 Postgres accelerator image, the FPGA becomes essentially a custom processor that is completely focused on speeding up queries, data insertion, and so on for Postgres. That software runs at near-firmware speed. The other nice thing is that it supplements the CPU. Many of you will be in this habit: if you have a Postgres server that's updating data, heavy transaction processing, but you also want to enable some reporting and querying on that server, and the querying and transaction processing interfere with each other's performance, what people almost always do is set up separate servers, one for transactions, one for reporting, with a bunch of ETL in the middle to move data and keep the two databases in sync.
With FPGA, you can get away from that and handle mixed workloads much more easily: the CPUs continue to do the transaction processing that Postgres does, and the querying gets offloaded to the FPGA, which works independently of the CPU. So it really handles mixed workloads, and we'll go into that in a little more detail. Next slide. Okay, a couple of facts about FPGAs. Number one, the processing power of FPGAs has grown a lot more rapidly over the last 10 years than it has for CPUs: about 500x more powerful over the last decade, compared to only about 32x for CPUs. Number two, FPGAs are also widely available to you. For those of you running your own Postgres servers in the cloud, as opposed to Postgres-as-a-service offerings like Aurora or RDS, FPGA-equipped instances in the cloud are pretty affordable and very accessible on Amazon, and I think on Azure soon if not already; Alibaba and Baidu, I believe, also offer them. And for those of you in the data warehousing arena, if you go back far enough, or if you're an IBM customer, you've probably heard of Netezza, which more or less invented the term "data warehouse appliance": they would back a truck up to your data center and wheel in a very turnkey data warehouse cabinet. Inside that cabinet was a data warehousing system, a rack full of FPGA-equipped servers running, essentially, Postgres. So the idea of speeding up Postgres for data warehousing workloads with FPGAs actually goes back about 15 years. And as of last November at re:Invent, Amazon revealed AQUA, the Advanced Query Accelerator for Redshift, a new Redshift configuration that speeds up Redshift query processing with FPGAs as well.
So FPGA acceleration for databases is tried and true; database leaders like Amazon are embracing it now to speed up cloud databases, Swarm64 makes it possible for the Postgres community to do the same with Postgres, and there are companies similar to Swarm64 that offer FPGA acceleration for other databases. Long story short: even though there's rocket science inside, FPGA is easy to adopt, affordable, proven, and widely available, in the cloud and in the data center. Next slide. Okay, the next question is: how does this work with Postgres? I'll go over it at a high level; Sebastian will go into it at a deeper level later in the session. We're going to talk about one of our offerings, the Swarm64 DA (Data Accelerator). It's an FPGA acceleration extension for Postgres that works with free, standard Postgres as well as EnterpriseDB's version of Postgres. When you start everything up after installing it, Swarm64 does a few things. First, there's an extension that interfaces with the query engine in Postgres: for queries it sees it can accelerate, it develops a much more parallel query execution plan and executes that. So from the very beginning, the query plan is highly parallel. It then sends that query to the FPGA to be executed, where you have 100-plus SQL reader and writer processes working in parallel, scanning and filtering data, compressing and decompressing data. Today, those queries operate on data stored in a foreign data table in Postgres whose format is optimized for parallel access. Number one, it's columnar, which I think we all know by now lends itself well to fast querying. In addition, we chunk the data up into ranges to reduce IO and to increase the parallelism of IO and processing.
So in a nutshell, that's how we work: integrating with the Postgres query engine, creating parallel plans, offloading them to the FPGA for execution, and processing those queries against a columnar, indexed data format in a foreign data table. So what results can you expect? In the TPC-H benchmark, roughly speaking, to execute 20 of the 22 queries in the suite (the ones Postgres can execute) at scale factor 1000, which is a terabyte of data, we run about 16x faster overall. Some queries run a little faster than Postgres, a lot of them run quite a bit faster, and a few run astronomically faster, 50x or higher. Overall, you're talking about 18 minutes versus an hour and a half. This test was executed on Postgres 11. Under load, with about 10 concurrent users, it's a little less fast, but still 14x faster. That's the standard benchmark, and roughly what you should expect, but your mileage may vary depending on the type of data, concurrent load, and all sorts of things. In customer work we've seen things like IoT sensor readings, very much a time-series use case, inserting over a million rows per second, slightly faster than some time-series databases we've been benchmarked against, with the transformations (mainly querying, aggregating data into larger time buckets) generally 16x faster. A big automobile manufacturer testing PostGIS found it allowed them to achieve the same performance under heavy concurrent load with a cluster a third the size of what they were using, and in other demos we've seen individual queries run up to about 55x faster. And there was an insurance company benchmarking FPGA-accelerated Postgres against a data warehouse appliance with 10 terabytes of data.
For queries that had been highly tuned on the data warehouse appliance, the performance was about the same, but the cost was much, much lower. And for ad hoc queries, the performance was generally quite a lot higher on the modern FPGA-accelerated Postgres than on the old appliance. So again, as a rule of thumb, expect probably a 15x boost in query and ingestion performance, which may go up or down depending on your use case. Okay. Most people, when they hear about FPGAs and hardware acceleration, think acceleration: it's going to make things go faster. And yes, that's what we've been talking about. But there's a very interesting capability of FPGAs that we are starting to take advantage of, and that's really the FP in FPGA: the field programmability gives us the ability to change the software image running on the FPGA chip in about a second or two. That enables what we call shapeshifting: it allows Postgres to shapeshift and become something very different. If you notice the animation here, by default Swarm64's FPGA image is mainly readers: about 85% read processes and maybe 15% write processes. But we can replace the default image with one that flips that, so that the database goes from serving mostly queries by day, all those end-user queries, to becoming at night what we call werewolf mode: a data loading and ingestion beast. This matters for those of you who have a nightly load window, say in financial markets, where you're loading the previous day's market trading data overnight, or customer data every month over a weekend, whatever it is: for some amount of time, you shut off querying to the database while you load, aggregate, and test the daily load, before you re-enable querying.
With this shapeshifting ability to change the strengths of Postgres within a second, you can see here that in a small benchmark of querying and data loading, we reduced the data load window by about 75%. And if we take that even further (this is a little more R&D in our labs right now): over the last 15 years, we've been conditioned to think that special use cases require specialized databases. If I'm doing something other than relational SQL, I should be looking at a JSON document database, or a text search engine, or what have you. What we can do, and are doing, is creating FPGA images that accelerate specific types of workloads: geospatial, text search, and others. The beauty of this, eventually (I can't say today, but imagine some time from now), is that it would not be far-fetched or unreasonable to say you can use Postgres for everything. Instead of specializing your database for a workload, you specialize the accelerator that's speeding up Postgres for certain types of use cases. So the reprogrammability of FPGAs is very interesting beyond just the pure acceleration. Okay. With that said, I'm going to turn it over to Sebastian, who's going to take it a little deeper and go through a process of scaling in and out with the FPGA. Sebastian. Thanks, Andy, for the nice introduction and the brief overview. We're jumping in a bit more technical here. For this part, I think it makes sense to imagine you're the DBA of some company, which we'll call Company X. That company stores and analyzes data for reporting and analytics. Imagine, for instance, that they have a bunch of business analysts, and those people want to run queries during the day.
To let them work in a productive and efficient manner, it's of the highest importance that query run times stay really low. At the same time, your data is probably constantly growing, and you still want to keep those query run times low. On top of that, you could also be hiring more people, for instance, so your database experiences much more load. And the core question we're asking here is: how complex is the solution you pick to solve this particular issue? In order to assess the core value of a solution, you need to standardize somehow and evaluate the different options, and for that we typically use benchmarks, obviously. We picked the TPC-H benchmark, which is basically the industry workhorse for data warehousing. In our case it consists of roughly a terabyte of data, which sums up to seven and a half billion rows overall. We'll show you certain configurations: we start off with a single node, then we scale out a bit in order to get more performance, and after that we'll see how it works with an accelerated database. Now, before we get to the joy of seeing something working, let's reason a bit about how we could tune Postgres for analytics. There are two main options, besides what Andy said earlier about adding a bigger machine or going the scale-out route. The first thing people typically do is add indexes to their data. Why? Because it helps them access data faster, with more targeted access for certain data types, joins, and query patterns. The downside is that it can sometimes cause non-optimal IO patterns, and your storage subsystem has a very hard time, especially if it's, for instance, network-backed.
You then need to make your system more complex by, for instance, adding additional caching. The other option is to add partitions to your setup; that's typically used when you're getting to very large tables, so you start partitioning them. Why do you do that? Because, again, you want to lower the amount of data that is accessed. Postgres has the nice feature of partition pruning, so it's able to read only the partitions your query actually needs. However, the downside is that partitioning currently also, unfortunately, limits parallelism in Postgres, and you can have a hard time if you want to change your partitioning scheme. Now, that's enough theory; let's jump right into testing some things, because that's the most fun. We have a little setup here, consisting of a single node, named coordinator one for the moment. We show some graphs, most notably the CPU load, which will ramp up in a few minutes, and on the other side we have certain queries we will start. For the moment, it's always the same query, one query of the TPC-H benchmark, because we first want to analyze how the overall system behaves when we ramp up the load by simulating multiple users. This is a video; I'll start it now. You see we're adding users, simulated as this little PC here, and as you can see, the coordinator does some work: the CPU is ramping up as it processes this query. We sped the video up a bit so you don't have to wait so long for the first results, and here they are already. We see that this single query was processed within 670 seconds, and when we do the math, that boils down to roughly five, almost five and a half, queries per hour. So that is roughly what one analyst could get done.
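The queries-per-hour figures in the demo are simple arithmetic; a quick sketch, using the 670-second and roughly-800-second run times quoted above:

```python
def queries_per_hour(runtime_seconds, concurrent_users=1):
    """Sustained throughput if each user reruns the query back to back."""
    return 3600.0 / runtime_seconds * concurrent_users

single = queries_per_hour(670)   # one analyst, 670 s per query
print(round(single, 1))          # roughly 5.4 queries per hour

both = queries_per_hour(800, 2)  # two analysts, ~800 s each
print(round(both, 1))            # roughly 9, i.e. "10 queries per hour roughly"
```

Note the trade-off this exposes: total throughput nearly doubles with the second user, but each individual analyst waits longer.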
Now we spin up a second user, to show how the system scales under different load levels, and we see a jump of roughly 200 seconds. So effectively we reduced per-user throughput: we got some more parallelism, but both analysts have to wait longer in the end. What we see now is roughly 10 queries per hour in total, but each of these analysts would have to wait roughly 800 seconds until their query finished. The question we're asking ourselves is: why is that? Now we jump into the details of the query. This is query number six, for those familiar with TPC-H; it's a very straightforward query that does a scan only, plus some filtering. If we reason a bit about this query, what could make it more performant? We find two things. Number one, we want better statistics, because statistics in that case help us access data in a more directed manner. The other thing we want to add, obviously, is parallelism, to accelerate the scan part. Better statistics are commonly achievable in Postgres itself; that's the lower part. And for scanning, we could choose a different option to get more speed out of our setup, and that option is typically, for instance, a scale-out approach. Why would I want to choose this? Because, for instance, I want a separation of concerns. If we take query six again, the scan part is very IO-intensive, and the final number crunching on the result is very CPU-intensive. I could now build nodes into my system that are optimized for either of them: a data node, which is very IO-heavy, and a coordinator node, which is very CPU-heavy. Now, what happens if my data grows? That can be handled on the data nodes. And I can handle more concurrent users and more demanding queries by adding more nodes to the system.
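For reference, TPC-H query 6 is essentially a single-table scan with a few filters and one aggregate, no joins. A toy Python stand-in over a handful of synthetic rows (the row values here are made up; the real query filters the lineitem table by ship date, discount, and quantity, and sums extendedprice times discount):

```python
# Hypothetical, tiny stand-in for TPC-H Query 6: scan, filter, aggregate.
lineitem = [
    # (shipdate_year, discount, quantity, extendedprice)
    (1994, 0.06, 10, 1000.0),
    (1994, 0.05, 30, 2000.0),   # filtered out: quantity >= 24
    (1995, 0.06, 10, 3000.0),   # filtered out: wrong year
    (1994, 0.07, 20, 4000.0),
]

revenue = sum(
    price * disc
    for year, disc, qty, price in lineitem
    if year == 1994 and 0.05 <= disc <= 0.07 and qty < 24
)
print(round(revenue, 2))
```

Because every row is touched once and the only work per row is a cheap comparison and multiply, the run time of this shape of query is dominated by how fast you can scan, which is exactly why the talk focuses on parallelizing the scan.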
And by that, I fulfill my requirements. What I just described is a very typical scale-out approach: you split your nodes into two types. The coordinator does not know anything about the data itself; it only knows metadata, so it knows how the data can be accessed. When it needs data, it hands the request off to the data nodes, which perform the actual IO operation and hand the data back. Sometimes those data nodes can also do something more: they can, for instance, pre-aggregate, or already filter some of the data and give only a subset back to the coordinator, to make the coordinator's life easier in terms of number crunching. If you have less data to crunch, you're effectively reducing the load on that node, and by that you make room to do other valuable things. Now, we prepared this, of course, because we want to show something again. We extended our setup a little bit: the coordinator node stays, but it shifts its purpose. It no longer needs to do any disk IO to fetch data; instead it does network IO, which we see at the top. And we added two data nodes, so we've essentially created a scale-out system with three nodes in total. We start this and repeat the experiment: we add a user, and we'll see what happens. In a moment the query starts; we see some disk IO happening here, which essentially means the cache was not filled yet, and we see some network traffic toward the coordinator. We speed it up again a bit and wait for the results. As a reminder, it was roughly 650 seconds beforehand, and now we're going down to 280 to 300 seconds. So we got ourselves roughly a 2x speedup, and by that we also improved the productivity of our analysts two times.
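The pre-aggregation idea described here can be sketched in a few lines. The function names and the in-process "nodes" below are illustrative only, standing in for real networked services; the point is that each data node returns one partial sum instead of its raw rows:

```python
# Sketch of the coordinator/data-node split: each data node scans and
# pre-aggregates its own shard; the coordinator only combines partials.
def data_node_scan(shard):
    """Runs on a data node: filter locally, return a partial sum."""
    return sum(price * disc for disc, qty, price in shard if qty < 24)

def coordinator(shards):
    """Runs on the coordinator: combines partials, never sees raw rows."""
    partials = [data_node_scan(s) for s in shards]  # would be parallel RPCs
    return sum(partials)

shard_a = [(0.06, 10, 1000.0), (0.05, 30, 2000.0)]  # made-up shard data
shard_b = [(0.07, 20, 4000.0)]
print(round(coordinator([shard_a, shard_b]), 2))
```

The design benefit is that network traffic and coordinator CPU scale with the number of shards, not with the number of rows, which is why filtering and aggregation are pushed down whenever the query shape allows it.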
Now the interesting question again is: what happens if I add more users, if I scale up the workload again? We'll see that in a moment. We speed it up again and wait a bit longer for the query returns, and the result is there. So run times ramp up a bit, but it's not as dramatic as before: it's about a 50-second difference here, not the 200 seconds we had before. Now we add another coordinator to the system in order to handle more load, and by that also add more users at the same moment, and repeat the experiment again. And now you see that the load, with four users and two coordinators, a total system of four nodes, is stable in comparison. So this is potentially a very nice setup in terms of load balancing. Now the question is: we added complexity, but did we gain any value? What we've seen very clearly is that we made the system more complex. We went from a single node to four nodes in total, and of course I have to maintain those four nodes, and in a real system there would be many more nodes in the end, because you also want high availability, failover, and potentially query routing. So it suddenly became very complex, and it's also obviously more expensive, because four nodes, or maybe eight nodes for full redundancy, certainly aren't cheaper than a single node. And what did we get out of it? We got ourselves a 2x speedup. By increasing the complexity four times, we only gained a two-times result. So can we improve this somehow? The answer is obviously yes, we can: by going the scale-in route that Swarm64 offers. We take a piece of hardware, which might look like the card displayed on the right-hand side, and we add that card to our server, scaling the server in with an FPGA device. That helps us avoid the big-iron route.
We don't have to immediately migrate to a different vendor; we can keep our open-source database, and we can also deploy much smaller clusters, because scaling in does not mean you can't scale out anymore. At some point the data grows so much that you need a different solution, but scaling in helps you push the limits quite a lot. And as a bonus, it enables hybrid transactional/analytical processing (HTAP), where Postgres does the transactional part, which it's very well suited for, and the accelerator does the analytics for you. Now, coming back to the earlier statements about how to tune Postgres for analytics with the two things mentioned, indexes and partitions: you can basically get rid of both. Why? Because our technology gives you a different type of index, what we call optimized columns. You can think of it as a range-based index. It has very low overhead, because it's not a traditional index in the end, but it still enables you to fetch data more efficiently. The same holds for partitions: you don't necessarily need them anymore. You can still use them, of course, if you have a need for them, but the data accelerator by itself gives you higher throughput and more parallelism. To demonstrate this, now again on a single-node system, we have what we call a split-screen demo. On the left side you have Postgres with FPGA; on the right side, Postgres without an FPGA. We again take query number six and run it, repeating the same experiment, this time with just a single user. I start the video and you see it's running the queries; they are the same on both sides of the screen. We wait a bit, and we see that Postgres with the accelerator is done within 14 seconds.
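The "range-based index" idea can be illustrated with a minimal min/max-per-block sketch, similar in spirit to Postgres's BRIN indexes: per block of rows, keep only the min and max of a column, and skip whole blocks whose range cannot match the filter. This is an illustration of the concept, not Swarm64's actual implementation:

```python
# Minimal block-range pruning: store (min, max) per block, skip blocks
# whose value range cannot overlap the query's filter range.
def build_ranges(values, block_size):
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return blocks, [(min(b), max(b)) for b in blocks]

def count_matching(values, block_size, lo, hi):
    blocks, ranges = build_ranges(values, block_size)
    scanned = matched = 0
    for block, (bmin, bmax) in zip(blocks, ranges):
        if bmax < lo or bmin > hi:
            continue                  # prune: no row in this block can match
        scanned += len(block)         # only now pay the scan cost
        matched += sum(lo <= v <= hi for v in block)
    return scanned, matched

# Roughly time-ordered data prunes very well on date-range filters.
values = list(range(1000))            # e.g. days since some epoch
scanned, matched = count_matching(values, block_size=100, lo=250, hi=349)
print(scanned, matched)               # scans only 200 rows to find 100
```

The overhead is two values per block rather than one index entry per row, which is why this kind of structure stays cheap to maintain under heavy ingestion; the catch is that it only prunes well when values correlate with physical row order.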
And now we need to wait quite a bit longer for Postgres without the accelerator to finish. You see it's taking a long time, longer even than on the previous system; we're already pushing 900 seconds, and there the result is. In this particular case, with this particular setup, we gained ourselves roughly a 100x speedup if we compare those two values. So how does this work in detail? Or, better said, how do we work together with the accelerator? Taking query number six again: we start off with a parallel plan. That's nothing unusual for Postgres; the difference is that our plans can be much more parallel than you would expect from a stock Postgres plan, and we have query rewriting techniques in our extension that enable the plan to stay parallel to the end. Then this parallel plan is processed. As I already mentioned, we have the concept of optimized columns, which enables you to fetch only the blocks you need. This is a hybrid row-column store: the filter conditions you see down here determine which blocks to fetch from disk, and those blocks are aligned with each other so you can make one continuous fetch and avoid scatter-gather IO patterns. Once we have the blocks, they are streamed through the FPGA accelerator device, where they are first decompressed: the storage is also compressed, typically with between 2x and 4x compression on average, and in some extreme cases as much as 5x, depending a bit on the data. The decompression happens entirely on the FPGA device, so it does not load your CPU. Then the FPGA picks out the columns it needs (essentially, the select list is pushed down), and eventually it starts filtering the rows.
So you can see the optimized columns as a coarse-grained filter, and the FPGA does the fine-grained filtering. Whatever is then left over from the lineitem table here is sent back to the CPU, where it basically stays in RAM and cache, and then the CPU processing kicks in. In this case it applies the aggregation: query number six does a sum over those two columns, and that is your final result. The important point here is that this is still streaming. We are not storing anything on the FPGA device permanently; the data is streamed at all times. And the data also stays compressed in memory, which effectively increases the memory you have available to process queries. For certain data sets, for instance TPC-H at 1 TB, you can roughly estimate that on a medium-sized machine your whole data set is now cached. So in the end the database behaves very much like an in-memory database. But even if you need to go to disk, because of compression and because of how we fetch the data from disk, you still see the same effects as if you already had the data in memory. Now, a bit earlier I said you can still scale the solution out, and that's what we're showing here. We have the same setup again, as you've seen already. The difference now is that we are using Postgres with hardware acceleration. And the setup, by the way, is made with the Postgres foreign data wrapper, so it's fully open source. We have it available on GitHub as a patch. If you want to try the setup out, you can get the patch, patch your Postgres, and then build a fully open-source cluster without any third-party tools from other vendors. I'm starting this video again here, and we repeat the experiment. We add users and we see there's a bit of load on the disks now. The hardware accelerator starts loading because we process the data as described. We have some CPU load, and we are already done within 14 seconds.
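The stages just described, decompress on the device, project the needed columns, filter rows, then aggregate on the CPU, can be mimicked in a small Python sketch. This is purely illustrative: the toy rows, the use of zlib for compression, and the TPC-H Q6-style predicate values are assumptions, not Swarm64 internals:

```python
# Illustrative sketch of the described pipeline for a TPC-H Q6-style query:
# decompressed blocks are projected and filtered (the stages the FPGA handles),
# and only the surviving (price, discount) pairs reach the CPU aggregation.
import json
import zlib

rows = [
    {"extendedprice": 100.0, "discount": 0.06, "quantity": 10},
    {"extendedprice": 200.0, "discount": 0.05, "quantity": 30},  # fails quantity < 24
    {"extendedprice": 300.0, "discount": 0.07, "quantity": 5},
]
compressed_block = zlib.compress(json.dumps(rows).encode())  # storage is compressed

def accelerator_stage(block):
    """Decompress, project the needed columns, and filter rows -- streaming."""
    decoded = json.loads(zlib.decompress(block))
    return [(r["extendedprice"], r["discount"]) for r in decoded
            if 0.05 <= r["discount"] <= 0.07 and r["quantity"] < 24]

def cpu_stage(pairs):
    """Final aggregation: Q6's sum of extendedprice * discount."""
    return sum(price * discount for price, discount in pairs)

revenue = cpu_stage(accelerator_stage(compressed_block))
# 100 * 0.06 + 300 * 0.07 = 27.0; the middle row never reaches the CPU.
```

The point of the split is that the expensive, row-at-a-time work (decompression, projection, filtering) happens before the data touches the CPU, which only sees the already-reduced stream.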
And we now add multiple users to the system again, in order to determine how stable this runtime is when we increase the load. So we have two users now, and we wait again a bit. You also see, by the way, that the disk I/O stops completely here because the data set is cached by now. And you see the runtimes are really stable: we went up from 12 seconds to 14 seconds, and that gives us a whopping 500 queries per hour. With those two users active at the moment, we add one more to the mix just to see how this behaves. As a reminder, beforehand we added one more coordinator node in order to stabilize the whole system; now we are keeping it at one. And you see we have really small increases in the query times. These are really small, and the system behaves in a very stable manner. So in summary, what we've just shown you is that we benchmarked standalone Postgres without any acceleration as our base measurement, which we call 1x here. Then we scaled that Postgres out to a three-to-four-node cluster and gained a 2.7x speedup by that. And then we scaled it out and accelerated it with the FPGA device, which leads us to a 64x acceleration. Now, what does that mean? In the end, we increased productivity 64 times with this particular query. As Andy mentioned, your mileage may vary; different queries behave differently. The important point is also that you can now decide when to scale out, and essentially you have fewer machines to maintain. And if you do want to scale out, a much smaller cluster will suffice in the end, which is then even easier to maintain. For instance, a three-node cluster would obviously be easier to maintain than a 12-node cluster. And having said that, I hand it back to Andy. Thank you. Thank you, Sebastian. So we'll quickly wrap up. If you go to the next slide, please. Two things.
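The figures quoted in this summary hang together on a quick back-of-the-envelope check, using the approximate runtimes from the demo (the 14-second and 900-second values are the on-screen numbers, taken as round figures):

```python
# Sanity-check of the quoted demo figures (approximate values from the video).
accel_runtime_s = 14        # accelerated runtime for query 6
unaccel_runtime_s = 900     # unaccelerated runtime for the same query

speedup = unaccel_runtime_s / accel_runtime_s       # ~64x, matching the summary

users = 2
queries_per_hour = users * 3600 / accel_runtime_s   # ~514, i.e. the quoted ~500/hour
```

Two concurrent users each completing a 14-second query works out to roughly 514 queries per hour, which is where the "whopping 500 queries per hour" comes from.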
Number one, where is this a best fit today? Is this right for your project? As you may have gathered from what you've seen over the last 45 minutes, today we're really accelerating query-intensive workloads: customer-facing or user-facing query dashboards and visualizations. If you're a SaaS or a cloud company, it's an opportunity to cut your cloud costs by quite a bit. Data warehouse modernization, new GIS projects, and so on. On Amazon, you can launch an FPGA-accelerated Postgres instance from the AWS Marketplace and pay for it on an hourly basis. We sometimes see that used by data scientists who just want to run certain models or crunch a lot of data in a short amount of time and then shut it off. And in some cases we even see it used, or considered for use, in continuous integration pipelines where they do a lot of database querying for regression testing, just to speed up their software delivery. Next slide. If you'd like to try FPGA-accelerated Postgres, just a couple of points. It works with Postgres version 11 and up, standard free Postgres or EnterpriseDB's EPAS. You can run it in your data center; you can get FPGA hardware from an Intel or Xilinx reseller, or the Samsung SmartSSD, which is a new kind of hardware device that puts a small FPGA on your SSD board, bringing compute and storage very close together. It's very interesting. And as I mentioned before, you can run it in the cloud, which is the easiest way to do it. We actually offer a free trial: just go to our website and request it, and you'll have an FPGA-accelerated Postgres instance running within five minutes. It's a very easy way to give it a try. We have a TPC-H toolkit as part of that, which lets you populate the database with data and some queries so you can start experiencing the acceleration.
And if you are an advanced technologist at a larger company, or an architect who has a lot of project teams coming to you for database recommendations, we offer a Swarm64 lab license, a non-production version of Swarm64. It's only about $500 a month and allows you to run tests on basically any kind of hardware, to see whether FPGA-accelerated Postgres is a good fit for the various project opportunities that come up at your company. And I'm going to take one question here: somebody asked whether you need to be an FPGA developer, or know how to program an FPGA, to use this. The answer is no. We've done that software development for you; just install it. It's pretty plug and play. So I think with that, we will wrap up. Thanks, Sebastian, for co-presenting and sharing his expertise. Thanks again to all of you for your interest in the topic, and we hope to see you online again in the future.