Welcome to creating a single point of access to multiple Postgres servers using Starburst Presto. I'm joined by speaker Randy Cherkow, Solutions Architect at Starburst Data. Today we're going to review the architecture and environment of Presto, share the practical considerations and engineering choices for deploying Starburst Presto to solve your problems, show the most common reference architectures, and then demo it. We're going to run an install of Presto live and connect multiple servers, showing you how to run it in your own lab or environment. Let me give you a little background about your speaker. Randy has over 27 years of IT experience, and he spent the first 20 years of his career as an infrastructure architect at large enterprises such as Abbott Laboratories. He has a master's in computer science with a concentration in data communications and artificial intelligence. Randy has spoken at technical groups and meetups all over the country about database topics and has been a regular speaker at Postgres meetups. Besides the extensive technical background, Randy's a musician on the side and has written four books with major publishers about the music business. So with that, I'm going to hand it off to Randy. Take it away. Thanks, Lindsay. Thanks for having us back again. I'm looking forward to doing a demo version of what we talked about last week, and I'd like to hark back to that by starting with a bit of a review of what we covered. I won't go into huge detail because we actually have a recording of the last session, so you can refer to it if you want to dive into the background. But for now, let's do just a bit of review. Connecting to a server is really easy when you have one database. It's once you have two that things become interesting. And while there are ways to do it, for example using foreign tables or connecting both of them to the tools that you have, the biggest challenge you usually run into is that as soon as you have a third server and more, it becomes very complicated. You end up with a lot of lines and a lot of issues along the way. But very often your data is spread across the board, or you have reasons to connect to more than one server at a time. We've all run into those situations where you need to query multiple systems. That includes things like cross-DB queries, selectively moving tables between servers, which we're going to do today, and analytics, which is a key one: you might be using something like Tableau, dashboarding tools, or data science tools, which really need to connect to many different data sources and add more as you go. All of these become a real challenge to handle. Part of the reason I ended up joining Starburst Data, and really it was around the Starburst Presto product, is that you can make it simple by creating a federated access method, basically an abstraction layer, so that you can query many different data sources at once. Really important here is that a federation system lets you normalize all of that to ANSI SQL. As you've probably seen before, and this slide has some examples of it, there are many different flavors of SQL that seem to get created every time we have a new type of data source, like Cassandra, Kafka, Hive, Spark, and of course NoSQL and many others.
So the goal, and what you can do with a platform like this, the flexibility you have, is actually pretty amazing. First of all, accessing different data sources and adding them quickly to the platform: all you need is a connection string. I'll show you how to create one of these files in a little bit, because that's part of the demo. I have some pre-baked ones, naturally; you would need to customize the demo for yourself. Then there's accessing dissimilar data sources, which we just talked about. And naturally, getting the latest data rather than waiting for a scheduled ETL job to run. Now, Lindsay mentioned my background a bit. I used to be at Abbott Laboratories, and by the time I was done and left that place, I think I had created a data warehouse with 25 different data sources. Since they didn't understand SQL, as most people don't, most people are not database people, I kept asking for an ETL tool, and they didn't think I needed budget for it. I ended up just grabbing some Perl libraries, and Perl, being very good at regular expressions, was able to parse a lot of data sources that weren't even SQL in the first place and put it all into the same warehouse. This really came in handy, but I wish I had this tool back then, because I would have been able to get the latest data without having to write ETL jobs. I would have had concurrent access, so I could have had more people connecting to it at once, and I could have had centralized security and chosen which of the 25 data sources I gave people access to, because not everybody should see every data source. By the way, what you see here in the right corner is especially important when you're at a company like Abbott Laboratories, because we were in life sciences. We had to deal with PHI, protected health information, and quite a few other things that put us under a lot of regulatory requirements to prove that we were actually keeping it safe and had a security method in front of it. A federated access platform solves a lot of problems. You can think of it as basically a distributed SQL engine that provides a real-time, scalable, single point of access to your data. The system ends up looking like this, where we have a cluster in the middle with a coordinator node and multiple worker nodes. The workers can be expanded; that is to say, you can have more worker nodes, and by adding more you get more concurrency. You also get faster response times: you can deal with more data, and you can get the data faster. Just to give you an idea, because each of us in this Postgres arena has dealt with many different scales, from a little tiny server that's just a backend to a few systems, up to major systems with terabytes to petabytes of data, occasionally you run into people who have large needs. For example, I was talking to a major insurance and mutual fund company, with a lot of financial instruments basically all tied together. I can't use names; I apologize, I'm under many, many different NDAs. But I can tell you the problem they faced when I talked to them: they said they wanted a platform that could handle 10,000 users, and they wanted very fast response times. That would require you to add quite a few more worker nodes.
The system is meant to scale to the point where you could actually add that many users to the platform and have the flexibility. You can also scale it down when it's not needed, so it gives you an awful lot of operational choices for the needs you might have. Now, for an architecture like this, the most often asked question, and I did address this in the first talk, but I wanted to hit it one more time, is what's different between this and data virtualization? What data virtualization platforms do is kind of clever. Let's say you had a query and wanted to merge data between this data source and that data source. What the virtualization platform does is copy the tables into one of the data sources and then make that data source do the work. The key difference is that we do the work in memory, which means it's much faster for us to join things together. We also put a cost-based optimizer, a CBO, in front of this entire system, and are able to join things together in a much quicker way. And that leads to far higher concurrency. It would be very hard to meet the needs of the client I was just telling you about at a large scale with a virtualization platform, because you would end up straining the end systems to the point where they could fall over. Now, that's not to say you can't use this for smaller jobs. It's actually useful for quite a bit, and that's part of why I put together this demo. I have a GitHub repo where you can download it yourself and give it a try, and I'm going to step through every aspect of it. Just a little bit of terminology that I covered last time, but will touch on right now. You should get a link to the prior talk in the chat in case you want to see it; we'll make sure we include that. We have a coordinator, which is the master node controlling the cluster. This is what you connect to using your tools, and it can be made HA. And you have worker nodes, which you can add and remove statelessly as needed. You also have connectors inside the system. You can think of a connector as a driver, but this driver lives at the level of the cluster, not the level of the user. The only driver users need in order to connect to Presto is a Presto driver, which can be standard JDBC or ODBC. Many tools now have specialized Presto connectors, and they work very well. It also has a pluggable architecture. We're not going to be playing with that too much during today's demo; it's going to be more about standing a system up and putting it together. So it's very flexible; again, the prior talk has more detail. And it's ANSI SQL compliant, so it allows you to do things like CTEs and windowing, which is very handy. It also ships with TPC-H data, some of which I put in this cluster to play with, just so you can see some queries happening against this little tiny cluster. And we're always adding additional functions and functionality, really from the point of view of larger companies that need more connectivity. They usually need the cluster to work well with not only more data sources but more backends: we have an LDAP connection, and we tie into Ranger. Those of you who know Hadoop will know that's a really important security platform. And there are quite a few other aspects around both security and auditing and things along those lines.
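To make the CTE-and-windowing point concrete, here is a minimal sketch of the kind of query that works out of the box; the tpch.tiny names come from Presto's stock TPC-H sample connector, which the demo leaves enabled:

    -- CTE plus a window function against the built-in TPC-H sample data
    WITH customer_orders AS (
        SELECT custkey, orderdate, totalprice
        FROM tpch.tiny.orders
    )
    SELECT custkey,
           orderdate,
           totalprice,
           SUM(totalprice) OVER (PARTITION BY custkey ORDER BY orderdate) AS running_spend
    FROM customer_orders
    LIMIT 10;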
So when you connect to a new data source, and we're going to go into a couple of these files in a little bit, you usually just need four lines. What you're looking at here on the left is a Postgres connection. You just specify the connector name, the connection URL, which is the standard URL you would use in anything that connects to your Postgres server, and, naturally, the username and password. Our enterprise edition can put these credentials in a vault if you want; if you don't want them in a text file, that's very understandable. Once you do that and restart the cluster, you have access to an entirely new data source, and people can start querying it. Once you do start querying, you use a three-part dot notation, and we're going to look at a few queries very shortly: data source, schema, and table. In the case of Postgres, the middle part is the namespace. I like to call it the namespace rather than the schema, because I'm used to explaining to everybody that this "schema" is not the same thing as the schema we talk about everywhere else; it's more of a namespace. So you would have, in this case, the server name, the schema, and then the table name. And you can combine data sources in one query, which we'll also be doing in a little bit. Scaling is very straightforward. We won't be able to do that in this demo, because the idea of this demo is to run it on your laptop; I'll go into that in a little bit. Usually you use Kubernetes for scaling. That's likely going to be the demo next time; we're going to put something together, and it may make more sense for me to do something in Amazon, depending on what we decide. I'm going to narrow it down, and we'll see it next time. Either way, we will look at how to scale it and how to make a slightly larger cluster. For now, the idea is that we have the config files, which are also all you would need to back up, and they're tiny; there's just a handful of them. And you can provide HA with a warm standby, which in this case we're not going to do. So, skipping ahead: as mentioned, there are multiple deployment options we can cover, and if you wanted to try this out at a larger scale, we're going to talk about that next week. I want to dive into the actual demo now and break it apart. What we're going to be doing right now is running Starburst Presto as a container, running on your laptop if you want to follow along. Of course, you're going to just watch it happening on mine, and that might be enough for some of you, but I know a lot of people like to get their hands on it. It contains both a coordinator and a worker squished into one. You can do that: you can tell the Presto container that it is both a coordinator and a worker. So if you were to look at a picture of it, it would just be one box serving both roles. This is, and I can't emphasize this enough, not recommended for production. It's fine for functional testing, and it's of course limited by memory and your laptop's resources. I would further add: limited by network. I've actually connected to a remote server, and it's going through a VPN in order to connect securely, and naturally that network is going to limit you a little bit. But as far as it goes, you can run a full functional test. You basically have a Presto "cluster," which I'll put in air quotes, at your demand when you're doing this.
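For reference, here is roughly what one of those four-line catalog files looks like; the file name, host, database, and credentials below are placeholders for illustration, not values from the talk:

    # etc/catalog/postgres_laptop.properties -- all values are placeholders
    connector.name=postgresql
    connection-url=jdbc:postgresql://localhost:5432/demo
    connection-user=demo_user
    connection-password=demo_password

The catalog name used in queries comes from the file name, so with this file in place the three-part notation reads, for example:

    SELECT * FROM postgres_laptop.public.supplier LIMIT 10;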
The connection string is just localhost:8080 for this particular demo. You can change the port if you wish; it's really straightforward to do that. What the demo does is override the default settings in the container's Presto etc directory tree with the files in the working directory of what you're going to be downloading. This is important: if you don't properly do the override, you will get a connection to basically some sample data sets that are available to everybody, including TPC-H data, which I actually left in here so you can play with it, but you won't be able to do very much with the cluster unless you override it. That's why the shell script uses Docker's volume flag, which merges the files on your computer with the Presto container. So once your container starts up, it sees the files that are part of this GitHub repo, which you're going to see on the next slide; we'll make sure you get the link for it. I also use the --rm flag so it nicely cleans up after itself, which is something nice to have. You can, of course, remove that flag if you wanted to keep the container around. If you were to go into it afterwards, there's not much data in it except for the log, so it's possible you might want to see the log afterwards; that's kind of nice. Here are a few things about the demo. It's tested and works on a Mac. It will probably work on Linux. It will probably not work on Windows at the moment unless you have Linux extensions; there are some differences in how Windows handles its containers and how Docker runs on Windows. You can give it a try, and if it doesn't work, there are some simple tweaks you can make; changing the pathing alone can really help with that. It's written as a shell script at the moment; I certainly could create a Windows version if there was enough demand, and perhaps add it to the GitHub repo that you can find here. It's at github.com slash ranmark dot press slash presto container. You should have the link to it; I think I just saw it get posted to the chat. So you can see it yourself if you'd like and start to run it, and you can tweak the memory settings to be appropriate for your computer. I want to go through a couple of the settings, because I really want to make sure that those of you who try this out have the ability not only to do it, but to use this as a tool. In fact, while I leave this screen up for just a second, a short story: when I first joined the company, I started with a very crude version of what you're about to run. I've refined it a lot to make it a little more repeatable, and I keep refining it; this is not something static, I keep playing with this particular version. I was using it to show somebody who runs a few Postgres groups, is very active in the Postgres community, and does a lot of Postgres consulting. His work often requires him to connect to multiple servers, and one of the things he wanted to see is whether this would help him. He immediately connected multiple instances of Postgres that were running on his own laptop; he had many versions of Postgres running at the same time. I'm sure he's not the only one. A lot of Postgres people end up with different versions for different reasons. So he immediately connected them together.
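The shell script boils down to a docker run along these lines. Treat this as a sketch: the image name, tag, and in-container mount path are assumptions, so check the script in the repo for the exact values it uses:

    # Sketch of the demo launcher; image name and mount path are assumptions.
    # --rm cleans the container up on exit, -v overlays the local config,
    # -p publishes the coordinator on localhost:8080.
    docker run --rm -p 8080:8080 \
      -v "$PWD/etc:/usr/lib/presto/etc" \
      --name presto-demo \
      starburstdata/presto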
He wrote a CTAS query, that is to say, create table as select, where it does a select on one end, creates the table on the other, and then fills the table up. From that standpoint he was really thrilled, because he often had to go through a large number of intermediate steps to do very simple things like moving data around. This simplified things and kept everything in the SQL realm, which is what really excites me. You can do it. And he ran this with just this setting: we have it at two gigs, so it's very small. It should run on your laptop; unless you have a very, very old laptop or computer, this should be fine, and it shouldn't take up too much memory. You're also limiting the system quite a bit by doing that, though. One of the things he did with this demo that I'm about to show you is that he ported it to a Linux box, put it basically in the cloud near some of his other data sources, and started to expand its use. After a while, he called me back and said, "Randy, can you tell me why it might be running slow?" I'm like, "Oh, well, let's take a look at what you're running." We opened it up again, and sure enough, it was at two gigs. I'm like, "Well, you probably need more RAM." He adjusted the RAM, but then he came back again and said, "Hey, it helped a bit, but it's still running slow." I'm like, "Well, keep in mind that you're running this on a single machine, with the coordinator and the worker in the same place." So one of the keys, and I have a slide on it coming up, is that this is a nice functional test of the technology, but as soon as you want to scale it, that's what the next session is about. This one is about getting it to work on your laptop. So feel free to add more RAM. Don't add so much that your poor laptop falls over, and don't starve the system so much that it falls over either, because if you starve it for memory, at some point, when it tries to do an allocation, the container will just die, as many containers do; they just stop running. This one's not made to restart itself; it's made to do a kind of lab test and then disappear off your computer. But if you know Docker and containerized systems well, you can tweak the settings and actually do something with it. A few other things about this demo environment. You can tweak the JVM config; I wouldn't change anything unless you really feel like it needs a lot more. You can add some more reserve cache, but this tends to work fine for most people, so I don't recommend tweaking it too much. Here's another thing I wouldn't change at all: in the node properties you can change the node ID. It doesn't matter as long as it's unique. If you're using the one that I sent you, it won't conflict with anything in the world; it doesn't seek out others with the same name, so that's not a problem. But feel free to tweak it if you wish, and change the environment name to whatever you please; that's all good. So I wouldn't change too many of these things. The main thing is, if you want more memory, you can change the max memory and max memory per node. Now, you might wonder: why is it per node? That's because this is a cluster, so those settings apply across however many worker nodes you have out there.
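For orientation, these are the stock Presto property names behind the knobs just described; the 2 GB sizing matches the talk, while the node ID and environment name are placeholders you can change freely:

    # etc/config.properties -- the memory knobs mentioned above
    query.max-memory=2GB
    query.max-memory-per-node=1GB

    # etc/jvm.config -- JVM heap, one flag per line
    -Xmx2G

    # etc/node.properties -- the ID just needs to be unique
    node.environment=demo
    node.id=presto-demo-node-1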
And this is covered very nicely in our O'Reilly book, which is available for free from our website, even though I think it costs over 50 bucks if you buy it. So if you really want to dig into this, you can tune the heck out of the system; you can make it dance and do all kinds of really great things as you put it out there. Getting to the demo: as stated, this is a functionality test, not a performance test. If you really want performance, you want to do two things: bring the cluster close to the data, so you don't have the network in the way, and give the cluster more resources than just a laptop. It's helpful to give it a little more than that; I already told you the little story about it. That said, this is a perfectly good functional platform for running queries against multiple systems. Also, as you add workers, you get more concurrency, and you get better response times because you get more CPU cycles. But the really important thing is this last point here: more parallelism. What I mean by that is that as you write queries, the cost-based optimizer splits the queries up into multiple pieces, grabs the data from multiple worker nodes, and then combines it. You can see how that works in more detail in our book, which you can download easily from our website, starburstdata.com; like I said, it's free there. You can buy it, of course, but if you're fine with an electronic copy, this is a great way to get it. And with any luck, Lindsay, my compatriot here, could also provide a link to that. So with that, I want to get to the demo, now that you have a little background on how this comes together. There are a few tools I'm running here that are going to help you understand it a little better. They're not required, but I'm running something called Kitematic. For those of you who have run into container systems, and I'm sorry, I don't have a lot of time to talk containers in general and explain how they work, just know that if you install Docker on your computer, this will be available, and one of the tools you can bring to it is Kitematic, which gives you a little more visual way of looking at things. Right now I don't have Presto running yet. I do have a Postgres container, of course, as probably all of us who do any container work do, and a MySQL one as well. And I have another set of tools below, but I want to start by kick-starting the cluster. One last piece here: this is all going off of the GitHub repo that I think was sent to the chat before, the presto-container repo. You can download this and use it yourself if you'd like. So we're going to run it. It's not going to do anything amazing on the command line. I've kick-started it, but you can actually see in Kitematic that I now have a new container running, and I have a number of volumes, which is what I told you about: we actually did an overwrite. Now, the thing to wait for, and it happened so quickly here that it just came up immediately, though sometimes it takes a couple of extra seconds depending on how busy your computer is (my computer's starting to breathe hard because it's got a whole other system running on it), is the "server started" message. Once you see that, you know that it's going. Kitematic allows you to explore a container.
So really briefly, I just want to show you a couple of features of this. We have this set up to go off of localhost:8080, so now that we're running, we can connect to the console; there's a UI console for the tool. We also have these volume overwrites, as you can see right here. We've mounted these, and each of them becomes a directory inside of the container. The container has these directories already, but what happens in Linux when you do a mount is that it replaces them and makes yours available. The nice thing about this scenario is that the original files are just hidden temporarily by the files on your machine. Which leads me to the third part of this: if we take a look, we actually have an etc directory in here, and that includes the config files I just showed you a moment ago. This is how you change them; you can tweak these as needed, though we don't need to change anything. I've also created some property files. These are the catalog files we talked about earlier. The TPC-H one is a set of test data, which is kind of nice to have, but the really interesting ones are the AWS Postgres and the Postgres laptop files. This one is connected to a Postgres instance that I have running locally on this same laptop. You can see my super secure password, and the same thing here, where we have another super secure password. Don't worry, none of these are things you can get to even if you're looking at them, because they're only reachable through a VPN that I have connected to AWS in this case. As you can see, though, there are really only four lines, and the connector name, postgresql, is the same for all of your Postgres servers. If it were MySQL, it would say mysql. Our docs site does a very good job of showing you how to configure each kind of connector. In this case, we're connecting to a local Postgres as well as an AWS server that's basically in the cloud, so we're just going to connect to two. We're going to keep this really straightforward, since we only have an hour; it's not like we have the longest amount of time to go through it. You can see how I overwrite the directories with this -v, that's a volume. All of this is set up to pick things up from whatever directory you're running it in. You can tweak this to your heart's content if you wish; it's easy to change. You can also change the port. I'm leaving it at 8080, but you might be using 8080 for something else, in which case feel free to change the host port. I didn't really provide an override parameter for it, but that's the whole demo. You're going to have all these files available to you. And now that we have this, let's take a look at connecting to it and actually doing something with it. Now, I have this pre-baked a little bit, but I just want to make sure it's clear how you would do this kind of thing. Oh, one other thing I want to do very quickly is connect to the UI; the login is just presto. Once we do that, we can see that there's one active worker, and this poor worker is of course also serving duty as the coordinator. The finished queries are the interesting part, but naturally we don't have any right now. So let's make some finished queries by actually running some. I have some prepared right here.
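If you want to confirm the coordinator is up before pointing a SQL tool or the UI at it, the coordinator answers plain HTTP on the published port; this just assumes you kept the default 8080 mapping:

    # Quick liveness check against the coordinator's REST endpoint
    curl http://localhost:8080/v1/info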
And in case I didn't make it very, very clear, I want to say this too: these connection strings are not something you're going to have access to. I left them in the example so that you can see the same thing I'm seeing, but of course you should overwrite the connection URL, connection user, and connection password with your own. Because this is a demo, I have this all ready to go. We're using a separate tool right here called DBeaver. Those of you who like open source tools probably already know about it; those of you who don't, I recommend downloading it. It's free, it's wonderful, and it has lots of really nice features that let you run queries against things and explore them. One of my favorite parts is that it shows you a tree. So I'm going to connect to my local PrestoSQL connection. Really quickly, if you're doing this yourself, because I pre-baked this a little to save time: you just create a connection in whatever your favorite tool is. You would choose PrestoSQL, and then you would use localhost:8080 unless you changed the port. That's it. If I were to hit finish right here, it would create a second connection; you can of course change the name and other aspects of it later, which is pretty easy. PrestoSQL has a driver that's ready to go inside of DBeaver, but if you use other tools, you can find a PrestoSQL driver for them. In this case, I've downloaded it and connected, and now I actually see each of these separate servers reported here in DBeaver. Right now I have everything in the public schema, with a few different items here. The AWS one separately has a main database, and inside of public there I just have an orders table. So I have a handful of tables, and they're actually interrelated to some extent, just to show that it's really easy to query this kind of thing. Normally, if I were to try to merge data across these, we talked about the other ways to do it, but what we'd prefer is to keep everything inside a single SQL connection. Right now, if I were to run this, for example (let me close the results pane just so you know I wasn't cheating), I'm actually grabbing the supplier table right here. It's passing through my PrestoSQL server, going through to my laptop instance, and returning it; similarly for this lineitem. And the third one, of course, is the Amazon version, so it's going to take a little longer, since it's going over the net, and it's pulling all these tables separately. So, really briefly, to do a join: I can join these together using raw SQL. I've pre-built this, but I might build another one just to show you how this works. It takes about 10 seconds because of the network connection I have here; it's actually pulling the data to my laptop and then doing the merge. The thing to keep in mind is that if I had more worker nodes, it could naturally parallelize this more. I also could locate this cluster closer to the thing that has most of my data; that's usually the best choice to make. Right now it's located closest to the Postgres laptop source, since it's right next to that, so the remote part takes a little bit longer.
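A join along the lines of the one being run might look like this; the catalog names (postgres_laptop, aws_postgres) and the TPC-H style column names are assumptions based on what's shown on screen, so adjust them to your own catalog files:

    -- Federated join: supplier and lineitem live in the laptop Postgres,
    -- orders lives in the AWS Postgres; names here are assumptions.
    SELECT s.name               AS supplier,
           COUNT(*)             AS line_items,
           SUM(l.extendedprice) AS revenue
    FROM postgres_laptop.public.lineitem AS l
    JOIN postgres_laptop.public.supplier AS s ON l.suppkey  = s.suppkey
    JOIN aws_postgres.public.orders      AS o ON l.orderkey = o.orderkey
    GROUP BY s.name
    ORDER BY revenue DESC
    LIMIT 10;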
But what I've just done is merge data from two different Postgres instances, one in the cloud and one on my laptop, and I've done it just by having it all be available in the same space right here, which is perfectly browsable. The nice thing about having a tool like DBeaver is that it lets me browse what's available very easily. If I go to AWS, it shows me the name of the data source; if I hit the dot, it lets me choose public; and if I choose public, I can see there's just an orders table available, and I can browse to it. It allows you to build out queries very easily. This is also true in tools like Tableau, in tools like Superset, and anything else you might have that builds dashboards. The beautiful thing is it takes something that is kind of troublesome and difficult to do from the tool standpoint and makes it simple. You're usually limited by your laptop's memory, which of course is what we're running in this little demo, so naturally it's still within the laptop; but once you have a cluster, it makes it much easier to put together. We're going to do a really quick create table as select right here. I'm able to create the supplier table by doing a select against my local Postgres supplier table and creating the table in the AWS one. This is why my friend who does Postgres consulting was thrilled: all he had to do was create a couple of connection files and connect to the multiple data sources, and as soon as he did that, this was available to him. Now, you'll notice that the orders table is the only one I see right here; I don't see any others, and that's just DBeaver. If I do a refresh, it's going to scan the system, and once it scans the whole thing, it will find the new table, and now it becomes available. So remember, there was only orders before. If I rewrite my query (all I did was hit return right here, I'm just selecting the one, hit return again, this is DBeaver helping me), this window is just slightly out of date from the refresh, but the supplier table becomes available once you refresh this one too. It makes it easy to put all of this together in one place. Keep in mind that not every manipulation is available for every data source. While we did a create table as select in this case, you might have a scenario with, say, Kafka, or a Hadoop or Hive data source. Those don't necessarily have all of the nuances that we have with something like Postgres. So for manipulations like deletes and other types of DML, data manipulation language, what's available varies, and you'll want to take a look at the docs. But most of what you would need to do, certainly from a consulting standpoint, and almost everything you would possibly need from an analytics standpoint, is available to you, and it's available in a very simple way. So this platform is nicely self-contained. I'm going to go back through it again and make sure that you can do this. You download the container version, and it runs very quickly. It's about a gig and a half or so, I believe; if we open up a window here, we can see that this container is about 1.3 gigs. We're currently running the 343-e version. I have the latest, which I was doing some testing on as well; I believe that's the same version, though.
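Stepping back to that create-table-as-select for a second, spelled out it's a one-liner; the same caveat applies: the catalog names here are the demo's, so adjust them to match your own catalog files:

    -- CTAS across catalogs: read from the laptop Postgres,
    -- create and fill the table in the AWS Postgres.
    CREATE TABLE aws_postgres.public.supplier AS
    SELECT * FROM postgres_laptop.public.supplier;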
And it is, yes, it's the same version. It takes a little bit of time to download; I did not want to do that while we were on the live web stream, because I knew my stream would get a little cramped at that point, but it doesn't take long. Once you've downloaded it, make sure to customize these properties files, and in fact, you can create more if you want, just by creating more files in this catalog directory. Then you start the container, as we talked about before, using the run-presto-container demo shell script inside the directory where you downloaded this. Once you do that, it should connect to everything. You'll want to check the log files as it comes up; again, make sure you see the "server started" message inside the log file. Any errors connecting to your servers you'll find above it, so just take a look at the few lines above "server started." The server will still start if you have a failed connection, but what it will do in that case is skip that catalog; it just won't provide that connection. So you may have to kick-start this thing a bunch of times until you get the configuration exactly right, and then you can connect to the cluster. And of course you can see right here that it's keeping track of the fact that it's running my queries. So finally, now that we're done with all of this, we can see that we have finished queries, including the select queries and the merge queries we were running in various sorts. This is interesting to look at, not only because you can see how much time each takes, but because you can look at the plan and dig into it. This one's a little more straightforward. As you do others, it will tell you how it merges things together; when you have multiple workers, it will show you that it parses out different pieces to other worker nodes. This lets you do more performance testing and make changes in case you need to tweak something, and it's most meaningful on a real cluster rather than this kind of single-node environment. I want to show one more aspect. By the way, what you're looking at here is DBeaver pulling in the tree that it builds and letting you browse it. We'll just take a look at one more query here in the live plan. Yeah, this one's straightforward; it does this. So hopefully this is enough to have you run a query on your own, though you can feel free to reach out to me; we'll make sure to provide our contact information if you'd like. The real power comes in, of course, when you do this on more than just your laptop: when you connect it to the broader world and connect more than just Postgres servers. You connect things like Kafka, and especially Hadoop, where a lot of people are shutting off their Hadoop clusters, just leaving up a metastore, and treating it as if it's a SQL data source, which is much cheaper to store and much easier to query at that point, in order to take a big pile of data and make it part of your analytics needs and backend.
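For the log-checking step described above, plain Docker commands are enough; the container name here is a placeholder for whatever your run script assigns:

    # After startup, scan the log for the banner; connector errors,
    # if any, appear in the lines just above it.
    docker logs presto-demo 2>&1 | grep -B 5 "SERVER STARTED"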
I'd like to just leave this up very briefly: if you're interested in Presto as a system, like I said, you can download this as a free ebook, and it's really worth checking out to get an overview of it and how it works. And I'd like to leave the last bit of time for any questions that anybody has. Awesome, that's great. Thank you so much, Randy. We have two questions that have come in, so if there are any more, please get them in right now. The first one is: what are the recommended use cases for Starburst Presto? Oh, that's a very good question. There are a number of them. Analytics use cases tend to work out really well; when people need access to multiple data sources, that tends to work out very well. Replacing Hadoop, as I mentioned, is one that's increasingly popular, because Hive actually takes a lot of infrastructure to run. You can not only save a lot of money by doing it, but because Presto does all of its work in memory, it's many times faster than running things through Hadoop and Hive and the like. In fact, one of my colleagues was at a major insurance company; I guess I've mentioned multiple insurance companies, but it's not the same one I was telling you about earlier. Interestingly, they had a whole bunch of queries that they wanted to test, which is how people usually go in. They created a cluster, not the biggest in the world, it wasn't too huge. Then they ran it, looked at my colleague, and said, "Are you cheating?" And he goes, "What are you talking about?" They said, "This is like 10 times faster than we were running it through this other system, so you've got to be cheating." He's like, "No, we're just not hitting disk at every intermediate step, so it's much faster." So, going through the list again: one, analytics jobs where you need to connect to multiple sources. Two, replacing Hadoop, or at least supplementing your Hadoop with a much faster SQL-style connection. Three, when you have analytics, BI, and data science teams: they inevitably need access, and you need to make it much easier for them to connect with their tools. You can make them more productive because, as I showed you in the DBeaver tree, they end up seeing all of the different data sources in one place. I mean, this demo is just two. I have, somewhere around here, an AWS data source that's got a ton. See, yeah, here I've got one: it's got Glue, Hive, JMX, Kafka, Mongo, MySQL, and you can read all the others. The world opens up in a huge way once you do it. So hopefully I've given you a thorough answer to the question. Great. And the other question that came in: what are the key takeaways from this demo? The key takeaways are that you can run this on your laptop yourself, just to give it a good functional test. And the system itself, as just mentioned, is useful for many purposes beyond that. I would recommend giving it a try: connect it to some of your backend systems, both in your lab and up in the cloud, and see how it works for you. And if it does, come back, because in a couple of weeks we're going to be back with how to connect it to a broader system and get something like what we just showed you here, connecting your Postgres data to a broader world of data and letting your users get access to it. Great. Thank you so much.
So those are all the questions that have come in. I wanna thank you, Randy, for coming and giving us this awesome demo. And I think we're good to go. So regardless of where you are, I hope you have a wonderful rest of your morning, afternoon, or evening. Thanks everybody. Thanks for having us, Lindsay.