All right, good afternoon everybody. My name is Matt Fuller, and today I'm going to be speaking about Presto and cloud-native SQL on anything. Before I get started, I want to get a lay of the land. Raise your hand if you've heard of Presto, the open source project. Okay, great, cool. This is great, because today I'm going to cover the basics and background behind Presto, so by the end of the talk you will have learned about it. And of those few hands who have heard about it, who's actually used it or tried it? Excellent, cool, awesome.
All right. So in this talk, we will learn about Presto, which is an open source project, covering its background and architecture. After that, we'll learn about Presto on Kubernetes and using it on Red Hat OpenShift. And then finally we'll talk about the Kubernetes operator and how you can participate in the open source community around Presto.
So to cut to the chase: in collaboration with Red Hat, we now have a Presto Kubernetes operator that you can deploy to Kubernetes distributions such as Red Hat OpenShift. We'll get to that by first talking about Presto and what an operator is, so that it makes more sense what this means and how you can use it.
So first of all, what is Presto? Well, it's a community-driven open source project, and you can think of it as SQL on anything. It looks like a database because it speaks SQL, but it actually isn't a database, because it doesn't store any data.
What you can do, though, is issue SQL queries through Presto and reach out to virtually any data source. This diagram shows some examples of data sources you can reach: files on Ceph or other distributed storage like S3, relational databases such as MySQL or Postgres, or even non-relational data sources such as Cassandra or Kafka. We'll talk more about how you can accomplish that with Presto.
So it's a community-driven open source project. It was originally created and open sourced at Facebook. At the time, back around 2012 or 2013, there was a 100-petabyte data warehouse using Hive, which is another SQL tool, if you're not familiar. The problem was that it wasn't performing at the scale Facebook needed, so they set out and created Presto. Fast forward a bit, and it became a really popular project used not just by Facebook but by LinkedIn, Airbnb, Uber, Netflix: a lot of companies that had large amounts of data they needed to run SQL analytics on. Over time, it became really hardened and battle tested to work at that scale. So from the beginning, it was built for high performance, scale, high concurrency and so on.
But one really cool part about Presto is this notion of separation of storage and compute. Unlike a traditional database that's managing the storage, Presto doesn't store anything. All you do is issue your SQL queries to it, and it will reach out to the different data sources. This is really neat because it works well in a variety of environments. For example, whether it's public cloud or private cloud, you could keep your data in an object store like Ceph or S3, Azure Blob Storage or Google Cloud Storage. You keep all your data there, and then you provision the compute layer when you want. We'll talk about how we can do that in Kubernetes, but you can also do it in cloud environments as well.
You can deploy it practically anywhere. Why that's really neat is that now you don't have to have the system running all the time. You can provision it when you want to. You can scale it out and in as you want to, right? So if you have a lot of demand on the system, you can just add more horsepower to it. We'll talk about that more when we get to Presto's architecture and how that works. But because of this, it gives you SQL on anything.
Another piece we'll talk about later is how you can integrate data from different data sources. If we go back to the slide where I said you can issue a SQL query and go to this variety of data sources: in the same SQL query, you can actually reach out to multiple data sources and join the data together. So it eliminates the need to move data around if you don't have to.
And then finally, there's no vendor lock-in. You don't have to run this on a Hadoop distro. It can query data from Hadoop, like HDFS, but it doesn't have to. You can run it on MapR or Cloudera or Hortonworks, or outside of Hadoop. There's no storage engine lock-in. It's very open. You can run it wherever you'd like. And it queries open data formats, whether it's ORC or Parquet, which are types of files you can keep data in. And there's no cloud vendor lock-in. You can run this on the public cloud. You can run it on a private cloud. So it gives you that optionality and flexibility.
And as I mentioned, there are many well-known Presto users, whether it's Uber, Twitter, LinkedIn, Slack, Netflix, Facebook, of course, and many others. This is really just a small sample of the users in the community, but a lot of these are using it at a pretty large scale. So now that I've given the background on Presto, I'm going to talk about the Presto architecture.
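Before moving on, the idea of one SQL query joining data that lives in separate systems can be made concrete with a small analogy. The sketch below does not use Presto at all: it uses Python's built-in sqlite3 module, where ATTACH stands in for a second, independent data source. The table names, columns and values are made up for illustration. In Presto the same shape of query would refer to tables by catalog, schema and table name rather than by attached database.

```python
import sqlite3

# Analogy only: two separate SQLite databases play the role of two data sources.
conn = sqlite3.connect(":memory:")                     # source 1, holds orders
conn.execute("ATTACH DATABASE ':memory:' AS crm")      # source 2, holds customers

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)", [(1, 10.0), (2, 25.0), (1, 5.0)]
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)", [(1, "CA"), (2, "MA")]
)


def revenue_by_state(connection):
    """Run one SQL statement that joins tables from both sources."""
    query = """
        SELECT c.state, SUM(o.amount)
        FROM main.orders AS o
        JOIN crm.customers AS c ON o.customer_id = c.id
        GROUP BY c.state
        ORDER BY c.state
    """
    return connection.execute(query).fetchall()


print(revenue_by_state(conn))
```

The point of the analogy is that the person writing the query only sees tables; where each table physically lives is resolved by the engine.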
That should give you a better idea of how this system actually works, how it scales, and also provide the background for how this will work on Kubernetes. So there are really two main components in Presto. There's what's known as a coordinator and what's known as a worker. You can think of the coordinator as the brain of the system and the worker as the muscle of the system.
The user on the left here sends the SQL query they want to run against a data source. It could be data that's in Oracle or Postgres, or data that's in Ceph or S3, or Hadoop data. The query comes in to the coordinator, which really has three primary responsibilities. First, it parses the SQL, which is basically just a text string, and transforms it into an internal data structure that Presto knows how to operate on. Then it goes through an optimization process. I'm not going to talk about cost-based optimization in databases today, but you can think of it this way: SQL is a declarative language where you're declaring the data that you want to get. You're not actually specifying the steps. You're not saying do this, this, this. You're relying on the database system to figure that out. Because of that, there are many different ways to process the SQL query, and the optimizer is tasked with figuring out the most efficient way to get you the results you want the fastest. So that's the optimization phase. Then finally, it goes through scheduling. This is where it takes what's known as the plan, which is the sequence of steps needed to process the SQL query, and schedules that work on the muscle of the system, the workers.
So again, I was saying Presto doesn't store any data, so it's not technically a database. It's really a SQL query engine. So how does that work? Well, there's this notion of Presto's connector architecture.
So in order to query any of these data sources, Presto has to have a connector for it. The coordinator will reach out to the data sources through a special connector. What this connector does for the coordinator, for example, is return metadata. Because Presto doesn't store anything, it doesn't have metadata. It doesn't know the table names, it doesn't know the column names, it doesn't know the column types. It has to get that information from somewhere. So that's the connector's responsibility. It reaches out to the different data sources. They all provide the metadata in different ways, but the connector transforms it into something Presto understands. You can think of it like an adapter going from HDMI into my laptop. It's kind of a crude analogy, but it's translating something into something the other end understands, right?
Once it has the metadata, the coordinator schedules the processing to happen on the workers. And that also goes through this connector architecture to actually get the data from the system. In the case of distributed storage like HDFS or Ceph or S3, it's actually reading those files and streaming them back into the workers. In the case of relational databases, it might be issuing SQL queries. So if you're issuing a SQL query to Presto to query data from Postgres, it will push down part of that SQL query into Postgres and get the results back to the workers.
Now, Presto is a distributed system. You could have one worker or you could have a thousand workers. The more workers you add, the more horsepower you add to the system. So if you want more performance, you can add more workers. And this is where it works really nicely on cloud, private cloud, and Kubernetes, because you can set up what is known as auto scaling. Based on the load that you're putting on Presto, you can add more workers and queries will get faster. It's a distributed system.
So the plan is basically cut into pieces, and each worker gets, you can think of it as, a piece of the plan to work on. It'll work on chunks of the data at a time. When you're doing that in parallel, you're going to get much greater performance than having something read all the data at once. Naturally it makes sense: if the work is parallel in some way and you can split it, you're going to get better performance.
This plan could be much more complex if you've got a really complex SQL query. If anyone here is familiar with writing SQL, queries can be really simple or they can be these massive pages and pages. Presto will do either. It can handle pretty much any SQL query you throw at it. But the more complex the query, the more complex the plan is, and actually the more important query optimization is to make sure you get good performance. The data flows through these workers and, depending on the complexity of the query, it may have to redistribute data, meaning pieces of the data might have to be sent over the network to other workers in order to compute the query appropriately. When the query is done processing, the results go back to the coordinator, which sends them back to the user.
So let's double click into what this connector architecture looks like. Here's another way to look at it, if we want to double click into these red arrows where the data sources are. We have the Presto coordinator on the left and the Presto worker on the right here. If you were to look in the code, it's not going to look exactly like this, but conceptually you can think of it this way. There's what's known as an SPI, and each connector has to implement certain required methods in order for Presto to work. So for example, let's say the Presto coordinator calls a method, get me table names.
Well, the Hive connector will have to implement the method called get me table names, which returns the list of table names in a format that Presto expects. Same with Cassandra and Kafka, MySQL. And because Presto is pluggable, if you have some proprietary or custom data source, or a data source that Presto just doesn't ship with, you can write your own connector. The complexity of it varies depending on what your data source is. If it's a relational database, it's relatively straightforward, because a relational database has rows, tables, and columns that map naturally into Presto. But if it's something that isn't naturally relational, let's say Kafka, for example, it gets a little more tricky. If you're familiar with Kafka, it's basically a distributed pub/sub system, and you subscribe to topics to read the messages getting pushed to the topic. So in that case, the implementer of the connector has to figure out what is a table, what is a column, what is a row. In the case of something like Kafka, the topic, the thing you're subscribing to, appears as a table in Presto, and each message appears as a row. And then it's up to you what you want the columns to look like.
Then there's a data statistics SPI. If you want your data sources to work with the cost-based optimizer in Presto, you have to provide data statistics to it so it knows how to operate on the data. The data location is: where is the data located? For a relational database, what is the JDBC endpoint? For distributed storage, where is the physical location of these files?
If you double click in a little more, and this is a bit more technical so I'll go over it briefly, each of these workers has an operator pipeline in it. These operators could be: read me the data, do a filter.
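To make the connector SPI idea from the last few minutes more tangible, here is a minimal sketch. It is an assumption-heavy illustration: the real Presto SPI is a set of Java interfaces, and the class and method names below (Connector, list_tables, get_columns, read_rows, TopicConnector) are hypothetical, chosen only to mirror the "get me table names" example and the Kafka-style mapping of topic to table and message to row.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple


class Connector(ABC):
    """Hypothetical stand-in for the methods an engine requires of every connector."""

    @abstractmethod
    def list_tables(self) -> List[str]:
        """Return table names in the form the engine expects."""

    @abstractmethod
    def get_columns(self, table: str) -> List[Tuple[str, str]]:
        """Return (column name, column type) pairs for a table."""

    @abstractmethod
    def read_rows(self, table: str) -> Iterator[tuple]:
        """Stream the rows of a table to the caller."""


class TopicConnector(Connector):
    """Maps a pub/sub style source onto tables: topic -> table, message -> row."""

    def __init__(self, topics: Dict[str, List[dict]]):
        self._topics = topics

    def list_tables(self) -> List[str]:
        return sorted(self._topics)

    def get_columns(self, table: str) -> List[Tuple[str, str]]:
        # The connector author decides what the columns look like.
        return [("offset", "bigint"), ("payload", "varchar")]

    def read_rows(self, table: str) -> Iterator[tuple]:
        for offset, message in enumerate(self._topics[table]):
            yield (offset, str(message["payload"]))


connector = TopicConnector({"clicks": [{"payload": "home"}, {"payload": "cart"}]})
print(connector.list_tables())
print(list(connector.read_rows("clicks")))
```

The engine only ever talks to the abstract interface, which is why adding a new data source means writing a new connector rather than changing the engine.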
So say you have data with a state column in it, Massachusetts, Rhode Island, California, but you only want to return the California rows. One of those operators will be doing that filter, so it only pulls out the rows where the state is California. Another might be doing an aggregation, if you're doing a sum or a count or a standard deviation or something. So each of these operators has a task that it's doing.
I'll talk a little bit about the Presto ecosystem now. And then I think at that point you'll have a pretty good basic understanding of Presto and we can talk about how this fits into Kubernetes. The most popular and widely used connector in Presto is called the Hive connector. Now, it's a bit of a misnomer. It was originally written to query the Hive data warehouse on HDFS. But that is just a set of files, right? If you think of files and directories in the Hive table format, a directory represents a table name, and the files in it, whether one file or multiple files, represent the data of that table. So as long as you replicate that layout on the different distributed storage systems, whether it's Ceph or HDFS or Google or Azure or AWS, the Hive connector will work there.
So you can think of Hive as three things. First of all, it's a SQL engine that was developed before Presto's time. And it was really clever: you'd run SQL, and it translated that into a bunch of MapReduce jobs, MapReduce being a compute engine on Hadoop, to process the query. I'm not going to go into too many details about Hadoop and MapReduce. Just take for granted that it existed and it was slow, but that's what was done. What came out of that was this notion of a Hive data warehouse and a particular way data is formatted there. And that's the background behind the name. Now there are really three parts to Hive.
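As a brief aside on the operator pipeline described at the start of this passage, the sketch below chains a scan, a filter and an aggregation as Python generators. It is a toy model, not Presto's implementation: the function names (scan, filter_op, sum_op) and the sample rows are invented for illustration, and a real worker would process data in batches and in parallel.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Source operator: hands rows to the next operator one at a time."""
    for row in rows:
        yield row


def filter_op(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Filter operator: passes through only the rows matching the predicate."""
    return (row for row in rows if predicate(row))


def sum_op(rows: Iterable[Row], column: str) -> float:
    """Aggregation operator: consumes its input and returns a single total."""
    return sum(row[column] for row in rows)


data = [
    {"state": "CA", "amount": 10},
    {"state": "MA", "amount": 7},
    {"state": "CA", "amount": 5},
    {"state": "RI", "amount": 3},
]

# Roughly: SELECT SUM(amount) FROM data WHERE state = 'CA'
total_ca = sum_op(filter_op(scan(data), lambda r: r["state"] == "CA"), "amount")
print(total_ca)
```

Because each operator pulls from the one before it, rows that fail the filter never reach the aggregation.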
There's the actual way you're storing data on disk. Then there's the metadata catalog. And then finally there's the execution engine. The Hive connector uses two of those three. It uses the metadata store, and it knows how to read this particular table format. What Presto does not use is the Hive runtime, because Presto is essentially replacing that Hive runtime. Presto was developed to be much faster and more efficient than Hive. But Hive has a really good metastore and a really good way of representing tables, so Presto leverages that.
The other cool thing, and this is where separation of storage and compute comes into play, is that you don't have to move the data. If you have a Hive data warehouse deployed in your infrastructure, in your company, you don't have to migrate anything. You can just point Presto at it and it can start querying the data.
The other popular connectors are the relational database connectors. Presto connects to these over JDBC. You don't have to worry about this; Presto figures out how to do it. When you issue a query to Presto, Presto knows how to issue the right SQL query to that database. So to your end users, it might actually be transparent that they're even querying data from Postgres or from Oracle. All they see is tables in Presto. This can also solve the data silo problem, where you have to remember, oh yeah, for this data I have to go over here and for that data I have to go over there. Now you can point your users at one system, Presto, to access all your data. That's one of the values of the relational database connectors.
And for performance optimization, you can push filtering down. Take the example I gave earlier about filtering on a state column, where you only want to return the users that are from California.
You could pull all that data back into Presto and have Presto do the filtering, but the more you're moving data across a system, the more time and effort it's going to take, right? If you can eliminate data earlier in the processing pipeline, you're going to get better performance. So you can push these filters down to the database so it returns the minimal amount of data.
And then finally, there are non-relational data sources as well, whether it's Accumulo or Cassandra. Elasticsearch is one that went in recently. MongoDB, Redis. The community is constantly contributing connectors back, so this list is only going to grow over time. But again, if there's a data source you want to query that isn't covered, you can always write your own connector and drop it in as a plugin. And of course, contribute it back.
So now that we have some background on Presto, I want to talk about Presto and Kubernetes. Again, I want to do another attendee poll. Who has heard of Kubernetes? Way more hands, cool. And who has tried or uses Kubernetes? Okay, fewer hands, but still more hands than for the Presto question. Cool, so we're going to talk about how these come together. It seems like most people here know Kubernetes. I do have some slides to set this up. I'll still go over them, but I might go a little quicker, since it seems like a lot of people have knowledge of it.
So Kubernetes is a Greek word. I looked it up: what is Kubernetes, where did this word come from? It's apparently helmsman, pilot, navigator. And finally, after reading through the internet, it clicked for me that it's container orchestration: you have the helmsman of a container ship, right? So it's an open source container orchestration engine for automating deployment, scaling, and management of containerized applications. We're going to talk about how this relates to Presto.
So how do we containerize Presto so that it can work on Kubernetes? You may see K8s in the slides, which is just shorthand for Kubernetes, because having written it a lot of times, it actually becomes annoying. K8s is much easier to write.
As I mentioned at the beginning of the talk, we at Starburst partnered with Red Hat to provide a Kubernetes operator and Presto container on Red Hat OpenShift. So now, going through the container catalog, you can use our operator and container and actually run Presto on your Kubernetes distribution, Red Hat OpenShift.
Why is this important, and why are we involved? Presto is an open source project. We have a community edition. This is the free version, where you get everything that's in open source. And we at Starburst are constantly patching it, so you get a very stable and tested version. Now, if you're an enterprise organization, this could be important because you can try Presto out and use it on Kubernetes. But if you want extra enterprise features or the high-touch support that we provide, you can come to Starburst and get those additional features and long-term support. So we're not forcing you to do a major upgrade because some bug fix is in a much more recent release; we'll do patch backporting and so on. You can start out using the community edition, but if you need that extra enterprise touch, you can come to Starburst.
So briefly, why Kubernetes? I actually like this picture. I did not draw it; I gave credit, I got it from the Kubernetes website. Going back in the time machine, you would run applications directly on an operating system, like Red Hat. That is inefficient, right? You might have to over-provision resources, or there's contention for resources. Then VMs came out, which are better. You get more efficient use and better utilization of resources. You can isolate the applications from each other.
But the really cool thing about containers is that you can think of one as a lightweight VM that isn't packaging the whole operating system along with all the libraries and the application. It's packaging exactly what you need, and it runs on top of the operating system. So containers made deploying applications much easier, but the complexity of maintaining them became a problem. Imagine you had an application made of many containers, right? How do you actually orchestrate them? That's what Kubernetes helps with: the orchestration and management of these containers.
As a brief concept, a Kubernetes cluster is made up of nodes, and adding nodes to the cluster increases the CPU and memory available. But Kubernetes doesn't directly run the containers where applications like Presto run. It runs pods, and these pods run containers. So skimming forward, we've containerized Presto to run within a Kubernetes pod. If you remember the diagram from Presto, where you had a coordinator and workers, the coordinator could be running in a pod, and then we use what are known as replica sets to run the workers. As demand on the workers grows, you can configure it so that the number of pods, and therefore the number of Presto workers in the system, scales automatically.
So let's take a look at what the Starburst Presto on Kubernetes architecture actually looks like. We have what is known as an operator, and I have a slide in a minute to describe exactly what that is. Then we have the Presto coordinator in a pod, as well as the Presto workers in pods. We also provide the Hive metastore service. If you remember from a few slides ago, I was describing how, if you're querying distributed file storage, there is no catalog. You have to have a catalog that tells Presto what the table names are, what the column types are, and so on.
So if you want to query distributed storage, we'll provide that catalog for you so you don't have to provide it on your own. And then finally, we work with the horizontal pod autoscaler so that you can scale your Presto worker nodes up and down.
Again, the really cool thing about Kubernetes is that it's platform agnostic. You can run it on Red Hat OpenShift. We have the certified operator for it. But if you happen to be running on Azure or Google Cloud or Amazon Web Services, you can move the application to and from there, right? If you're on Google Cloud and you want to move onto OpenShift, you can do that. You don't really have to reconfigure much. It makes hybrid cloud and multi-cloud, whether public or private, really transferable.
You typically don't just launch pods directly on the cluster. You use an abstraction called deployments, where you declare the layout of how you want Presto to work. That's what we use to help deploy Presto on Kubernetes. Remember when I mentioned earlier that SQL is declarative: you define the results you want and the engine figures out how to get them. I think of Kubernetes as declarative in the same way. You're describing exactly what you want, and Kubernetes handles how to do that: how to deploy containers, and if a container goes down, how to bring it back up, so it's self-healing. So Kubernetes eases the burden and complexity of this.
Going back to here, I want to talk about the Kubernetes operator, which is the key to how we have Presto working on Kubernetes. As mentioned, deploying a non-trivial application on top of Kubernetes is hard, especially if it's a stateful application, right? Prior to this, you'd have to somehow manage all the bootstrapping, the complexity and lifecycle management, failure recovery, all these different scenarios on Kubernetes.
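The declarative, self-healing behaviour described above boils down to repeatedly comparing the state you asked for with the state that actually exists and acting on the difference. The sketch below illustrates only that idea. It is not Kubernetes code and not the Starburst operator: ClusterState, reconcile and the worker naming are all hypothetical, and real controllers watch the Kubernetes API rather than mutating an in-memory object.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterState:
    """What is actually running right now (toy model)."""
    coordinator_running: bool = False
    workers: List[str] = field(default_factory=list)


def reconcile(desired_workers: int, actual: ClusterState) -> List[str]:
    """Move the actual state toward the desired one; return the actions taken."""
    actions = []
    if not actual.coordinator_running:
        actual.coordinator_running = True
        actions.append("start coordinator")
    while len(actual.workers) < desired_workers:
        name = f"worker-{len(actual.workers)}"
        actual.workers.append(name)
        actions.append(f"start {name}")
    while len(actual.workers) > desired_workers:
        actions.append(f"stop {actual.workers.pop()}")
    return actions


state = ClusterState()
print(reconcile(3, state))   # first pass: everything has to be created
state.workers.pop()          # simulate a worker pod going down
print(reconcile(3, state))   # next pass: only the missing worker is recreated
```

Running the same function again when nothing has changed returns an empty list, which is the property that makes the declarative style safe to repeat.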
So this concept of an operator is meant to encapsulate all of that behind a level of abstraction. You can focus on the logic of what you want to do, and it reduces the complexity and boilerplate code you'd otherwise write throughout. It's a pattern for building Kubernetes-native applications. It runs as a container. What we did is take the operator framework and build a Presto operator to handle this.
The operator does a lot of things, but the four main things unique to Presto that are worth mentioning start with auto-configuration of the Presto cluster. I know I didn't see a lot of hands raised for who's used Presto, but you can imagine that a distributed system like Presto is pretty hard to configure. You have to tell the workers where the coordinator is. You have to specify how much memory to give it, what the thread count is for CPU utilization, and the configuration of where to get the data. You have to tell Presto where the data is. So there are a lot of knobs to tweak, but we can look at the system, the CPU available and the memory available, and do a basic auto-configuration to get that base configuration done for Presto. So you don't have to worry about that.
Second is coordinator high availability, for when the coordinator becomes unresponsive. If you remember from the diagram, I said there's one coordinator and any number of workers. If a worker goes down, it's really not a big deal, because you just have one less worker now. Of course, if a lot of them went down, performance would get slower. But if the coordinator ever goes down, that's a single point of failure, which means your system becomes inoperable. The really cool thing about Kubernetes that helps here is that if the coordinator goes down, it will just bring up another one in a different pod, as long as there's availability on the Kubernetes cluster, of course. Third, we can configure worker auto-scaling.
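For the worker auto-scaling just mentioned, the Kubernetes horizontal pod autoscaler documents its core rule as desired replicas = ceil(current replicas × current metric ÷ target metric). The sketch below applies that rule to a worker count. The function name and the CPU figures are illustrative assumptions, the talk does not say which targets or bounds the Presto operator uses, and the real autoscaler also applies a tolerance band and stabilization windows that are left out here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_percent: float,
    target_cpu_percent: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Scale the replica count in proportion to how far the metric is from target."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    # Clamp to the configured bounds, as an autoscaler would.
    return max(min_replicas, min(max_replicas, raw))


# Four workers running hot at 90% CPU against a 60% target: scale out.
print(desired_replicas(4, 90, 60, min_replicas=1, max_replicas=9))
# Ten mostly idle workers at 20% CPU against the same target: scale in.
print(desired_replicas(10, 20, 60, min_replicas=1, max_replicas=20))
```

The upper bound matters in practice: the autoscaler can only add workers while the Kubernetes cluster still has capacity to run them.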
So based on CPU utilization, for example, you can start adding more workers, so long as there's availability on the Kubernetes cluster. And then finally, there's this notion we call graceful scale down. When you issue a Presto SQL query, all the workers are processing the data. Let's say the demand on your cluster has gone down. The Presto workers may still be processing a query, maybe not as many queries, but still processing nonetheless. You don't want to pause the cluster while you scale it down, and you certainly don't want to remove a worker while it's processing a query, because that would ultimately fail the query. So we have this notion of a graceful scale down. When we determine, okay, we're going to go from 100 nodes to 25 nodes, we mark 75 of those nodes and tell Presto: hey, don't take any new work, finish up what you're doing, and when you're done, shut down. What that does is any new queries that come in are only going to run on the 25 nodes, and when the other 75 nodes finish what they're doing, they shut down.
Other things we'll be working on include some degree of operator metering. This is another thing that's important in the enterprise: they want usage reports, or even to tag users to see how many resources they're using on the system. There's a variety of reasons they may want to do this. One could be chargebacks: if the IT org has a Presto cluster for a bunch of departments, maybe they want to charge back to the different departments based on usage. Or it could be as simple as just knowing how much they're using it.
There are a variety of components. First, there's the Presto Kubernetes custom resource definition. This defines a resource of a Presto type. When you instantiate an instance of it, that instance represents a Presto cluster on Kubernetes.
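The graceful scale down described above, where surplus workers stop taking new work, finish what they have, and only then shut down, can be sketched as a small state model. Everything here is hypothetical: the Worker class, the draining and shut_down flags, and the function names are invented for illustration and are not how Presto or the operator represent worker state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    name: str
    running_tasks: int = 0
    draining: bool = False    # told not to accept new work
    shut_down: bool = False


def schedulable(workers: List[Worker]) -> List[Worker]:
    """Workers that may still be given work from new queries."""
    return [w for w in workers if not w.draining and not w.shut_down]


def begin_scale_down(workers: List[Worker], target_size: int) -> None:
    """Mark surplus workers as draining; idle ones can stop immediately."""
    for worker in schedulable(workers)[target_size:]:
        worker.draining = True
        if worker.running_tasks == 0:
            worker.shut_down = True


def finish_task(worker: Worker) -> None:
    """Record a finished task; a draining worker stops once it has no work left."""
    worker.running_tasks -= 1
    if worker.draining and worker.running_tasks == 0:
        worker.shut_down = True


workers = [Worker(f"worker-{i}", running_tasks=1) for i in range(4)]
begin_scale_down(workers, target_size=1)
print([w.name for w in schedulable(workers)])      # only worker-0 takes new queries
finish_task(workers[3])
print(workers[3].shut_down, workers[2].shut_down)  # worker-3 is done, worker-2 still busy
```

No in-flight task is ever interrupted in this model, which is the property that keeps running queries from failing during a scale down.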
There's the Presto operator, which continually monitors the Kubernetes resources and will create and remove Presto resources when asked to: a coordinator, a worker, and so on. I think we've covered what a coordinator and a worker do, but again, the coordinator is the central point responsible for accepting and planning the queries, and the Presto workers are actually doing the query processing. There's the coordinator service, where we expose an external IP so you can connect to the Presto coordinator to issue queries on the system. We have a network policy, really for security, that allows inbound traffic to the Presto workers from the Presto coordinator. Then there's the metastore, which I spoke about. If you're using distributed file stores that don't have a catalog, you can keep the catalog in the metastore that we provide. And if you're using distributed file storage and you want to persist your table names, columns, and column types, it would be wise to have a database backing it, whether it's Postgres or MySQL. You can connect to one outside the cluster, or if you just want to get up and running fast and try it out, we do have this notion of an internal Postgres that runs on Kubernetes. But its life cycle starts and stops with the Presto cluster, so it's really only for demonstration purposes.
And lastly, it's simple. With a few commands you can deploy, and in the next five minutes I can demo this. So sorry, it's a little hard trying to see my screen here. All right, so first of all, let's make sure there's nothing there. Oh yeah, of course I already have it running. So let me delete it. Oh yeah, sure. Let me get this deleted first and I'll work on that. It doesn't like it, huh? Get bigger. Can you see that now? Is that good? In the back? Is it getting bigger? Yeah, that is getting bigger, okay. Is that good?
In the back? All right, cool. So for time purposes I'm not going to tear down and bring up the operator, but there's an operator running. What I have is this YAML file that defines what we want to deploy for Presto. I can show that in a moment too. And now we've created it. Let me see if I can bring that example over here too. All right, so it looks something like this. We're not going to go through it all today, but you can say, okay, use Prometheus as the monitoring tool. You specify the amount of CPU you want for the coordinator and the workers. I'm choosing nine workers in this case, and so on. So that's how you define it. Now, if we do get pods again, all right, you'll see there's a bunch running now. And now what we want to do is expose. All right, so we are setting up the IP for external metastore. It takes a few seconds. An observer will see that there are only eight workers running, because the Kubernetes cluster wasn't big enough. Well, anyway, that's still starting. Let me go back to the final slides here.
If you want to customize the solution, we also have a bootstrap script. So if you want to set up security or download additional software, you can. Or of course, for heavier customization, let's say you want to install additional software on there or do something else, you can use a custom Docker image. You just pull from the one we have, build your own, and then specify it in the configuration.
You can join the community: there are mailing lists, there's a development Slack, there's GitHub. You can follow on Twitter as well. And we at Starburst have a monthly newsletter, so I highly encourage you to sign up for that if you want to learn more about Presto or just see what's going on in the community. Presto news, events, how-tos: we aggregate it all and put it in your inbox. And of course, contributors are welcome.
So if you wanna learn more about sort of the vision and development philosophy and the contribution process for Presto, you can go to prestosql.io and learn about that. All right, let's go back real quick to the demo here. I realize I have only a couple minutes but I think we can do this. Did I copy that all? Is it highlighted? All right, I'll figure it out. Oh, I see. It's hard for me to see. That's, didn't like that, huh? And really, so that's 34.73, is that 230? Yeah, 130. All right, so here we are. Presto's running nine. Okay, yeah, yeah, sure. So anyways, yeah, that's Presto running nine worker nodes on a Kubernetes cluster. And I'm told I'm short on time, so I can definitely take some questions. I guess one final piece or plug of course is that at Starburst we're hiring. So if anyone here is interested, reach out. Cool, can I take some questions? A quick question. Does Presto have a reduced, like, SQL language that you have to use, as it supports so many different things? Yeah, it's a great question. So no, actually Presto follows the SQL standard quite closely. The reason for that is so you can use your favorite BI tools or SQL developer tools, and they all kind of work seamlessly with Presto. The Presto connector will handle any sort of nuances or translations to the underlying data sources. So you use standard SQL, but if there's some MySQL or Postgres nuance, the Presto connector figures out that kind of translation. A good question. Hey, Matt. So you mentioned that Presto is not doing any storage, but is there a way to set up any sort of caching mechanism to improve, like, query timing or anything? Yeah, yeah, great question. So there's an open source project called Alluxio, and you can essentially just deploy it with Presto. That's really good for cloud storage or sort of hybrid or multi-cloud scenarios where the data isn't necessarily close to you, or you have to go over the network to reach it.
So it has a variety of caching algorithms it uses, and based on your workload, you can choose which one you want. But if you issue queries and the data becomes hot, then subsequent queries will speed up. And we are actually a partner with them as well. So if one is looking to deploy both, we offer Starburst with caching, which is essentially that. Does Presto use any integration with the scheduler on Kubernetes to better schedule the workers based on load and the execution plan or something? Presto itself does not natively integrate with the scheduler. But I would like to talk more about that with you offline, maybe after the chat. I'd be interested to know what you had in mind. Probably the last one, I'm imagining. Just a quick question. You mentioned JDBC as a connector tool for some of these data stores. Are there any shortcomings in using JDBC? Are there better methods for faster access? Yeah, that's a great question. So JDBC really isn't meant to be a big data pipe, right? It's of course the easiest way to write a connector to a data source because it's a standard interface. But some of the proprietary connectors that we have at Starburst are what we would call a native direct connection. So for example, our Teradata connector actually connects directly to the AMPs, so we bypass JDBC entirely. So the AMPs, if you're familiar with Teradata, I'm sure some people in here are not, but the AMPs will then be pushing the data to Presto. So we do it in a completely parallel way instead of a single-pipe way. And we're doing that with Snowflake as well, with our Snowflake connector. But great question. I think we're out of time, but thanks everyone for coming to the talk. And I'll be milling around, I guess, out there later if anyone has any additional questions.
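
To recap the operator behavior described in the talk (it continually monitors the Kubernetes resources and creates or removes Presto resources to match what was asked for), here is a minimal Python sketch of that reconcile idea. It is not the Starburst operator's code: `ClusterSpec`, `reconcile`, and `free_slots` are hypothetical names invented for illustration, and a real operator works against the Kubernetes API rather than plain dictionaries.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClusterSpec:
    """Desired state, roughly what the YAML file declares (hypothetical fields)."""
    coordinators: int = 1
    workers: int = 9


def reconcile(spec: ClusterSpec, running: Dict[str, int], free_slots: int) -> List[str]:
    """Return the actions that move the running pods toward the desired spec.

    `running` maps a role name to the number of pods currently up.
    `free_slots` is an assumed stand-in for how many more pods the
    Kubernetes cluster has room to schedule.
    """
    actions: List[str] = []
    desired_by_role = (("coordinator", spec.coordinators), ("worker", spec.workers))
    for role, desired in desired_by_role:
        actual = running.get(role, 0)
        if actual < desired:
            # Create as many missing pods as the cluster has room for.
            to_create = min(desired - actual, free_slots)
            free_slots -= to_create
            actions += [f"create {role}"] * to_create
            # Anything that does not fit stays pending until capacity appears.
            pending = desired - actual - to_create
            if pending:
                actions.append(f"pending {role} x{pending}")
        elif actual > desired:
            # Scale down when more pods are running than were asked for.
            actions += [f"delete {role}"] * (actual - desired)
    return actions


if __name__ == "__main__":
    # Nine workers requested, but room for only nine pods in total.
    for action in reconcile(ClusterSpec(workers=9), running={}, free_slots=9):
        print(action)
```

With nine workers requested and room for only nine pods in total, the sketch creates the coordinator and eight workers and reports one worker as pending, which mirrors the moment in the demo where only eight workers were running because the cluster wasn't big enough.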