Okay, well I think the room is small enough, the group small enough, that we can make this somewhat interactive. If you have questions while we're going along, just raise your hand, okay? So the first thing, I just want to start with a huge thank you for coming. I know it's Friday, it's right before lunch, and a lot of you are probably flying out this afternoon for the weekend, so thank you so much for coming; I really do appreciate the time. Today's session is going to be about Presto. It's meant to be an introduction to Presto. I'll speak a little bit about IBM, and by the way, I'm from IBM. Those of you probably may have heard that we acquired Ahana about a month ago to join the Presto Foundation, so I'll speak a little bit about that as well, and some of the things that we're thinking about. There's quite a bit on the agenda today, so I'll try to get through it quickly so we can also have that interactive Q&A. There's also a demo that I want to show you in terms of how to get started and what it looks like. But there are a lot of topics here that we want to cover, so I'll try to go through it quickly so we have time for questions, and maybe in the demo section we can make that live and interactive as well, okay? We'll talk about what Presto is, why it's interesting, and why it's interesting for you. We'll talk about some of the use cases, some of the roadmap and what's coming, and we'll finish and close with the community and how we can all get involved. I think all of you are probably more involved than I am. So the first section is about why Presto. What is it in Presto that's interesting? Why is it that we're here talking about Presto today? Presto is a distributed SQL query engine.
But interestingly enough, that's not really the word that I like to use, because everybody claims they're a distributed SQL engine, right? Even the classic originals: Teradata is a distributed SQL engine, so is Oracle RAC, so is DB2, so is Netezza. They're all distributed SQL engines. But Presto is a little bit different. We'll talk more about some of those differences, but the way I like to talk about Presto is that it has some very different architectural characteristics compared to traditional warehousing, and we'll go into those details in the following slides. I prefer to use the word disaggregated engine. That complete independence is important. When you get Presto, whether it's open source or whether you get it from a service, it really is just compute, and it also has some great capabilities around scale. To give you a little bit of history, and again, you're all open source folks, so you probably know this better than I do: Presto came out of Meta. They contributed it to open source over 10 years ago. Meta were the same folks who also did Hive, right? Hive: SQL for Hadoop. And Hive for Hadoop had some issues and gaps, certainly for Meta. The first gap being that it wasn't really meant to be ad hoc or interactive, and I'll talk about that in a performance slide, and about why that difference matters. There were other issues as well: Hive SQL is not pure ANSI SQL, and SQL is basically the language that binds it all together, right? So Meta, who created Hive SQL, then also created Presto to address those gaps, and they released it to the open source community, I think it was 2013, over 10 years ago. So the interactive nature of it is extremely important.
Again, I'll talk a little bit more about the performance, about why you can actually get that type of interactive, ad hoc SQL workload capability. From my perspective, when I look at Presto, and I was talking to a gentleman in the back about this as well, I see Presto going in two different directions. There's the virtualization perspective, through all of its connectors, over 30 of them, and then there's the lakehouse, that interactive, ad hoc SQL type of perspective. But it's designed to be both, and you can see those bullets here. It's designed to access petabytes of data; Meta, I think, is over 300 petabytes now. So we're talking about data volumes that are very, very different. It needed a new solution to be able to access that huge amount of data. They have 1,000 active concurrent users every day. You look at Uber, who is also a very big Presto user, and they're running 100 million ad hoc, interactive queries every day. So it's a different scale, a different volume, that needed something new and different than traditional warehousing or other types of engines. With that said, I want to run through real quick what a flow would look like. So this is Presto, in orange in the middle. Up at the top, through the JDBC layer, you can certainly connect traditional BI applications, be it MicroStrategy, be it Tableau, or Excel; all of that can connect to Presto over JDBC. You can also satisfy data science type use cases, and I'll go into that a little bit further as well, with things like Jupyter notebooks. And you can use an open source SQL editor and visualization tool like Superset if you want.
So, standard JDBC connectivity to connect to Presto, and on the bottom, the sources of your data. Again, that's where it's a bit different compared to traditional data management systems. There are connectors to give you all kinds of access: object storage itself, and I'll go deeper into that, but also a lot of data sources, from traditional relational services like SQL Server, MySQL, and Postgres, which is very big, as well as things like key-value stores like Redis, and streaming capabilities like Kafka, and Druid. So an extremely rich set of sources, letting you access that data in place. You don't have to load or migrate it, and I'll talk about why it's really cool to be able to immediately access that. I already spoke a little bit about whether Presto is a database. I've noticed there's lots of material out there that says the main reason Presto is not a database is that it doesn't come with storage. You might have heard that, but that's not really the way I would see it either. Those of you who work with traditional databases, and by the way, I've spent my entire career in data management and databases, worked with Teradata, Oracle, Informix, DB2, just all of them, certified in almost all of them: with all of them you can just install the engine and point it at generic storage. The real difference is that the way Presto does it is very different, the disaggregated nature. All these other databases I talked about tend to have a proprietary internal data format. They do it for performance, they do it for security, they do it for lots of reasons; that's the heritage of where they come from. They tend to have an internal technical metadata catalog. That's different.
So there are certain technologies that are different, but that scalable, distributed SQL access is similar in many ways too; it's just aimed at a different dataset. If we talk about data warehousing, the easiest way I like to describe the difference between data warehousing and, let's say, a lakehouse is that data warehouses work in the terabytes range; you'll see very few data warehouses that play in the petabyte space. They didn't come out of that big data intent, that Hadoop origin, which has now evolved into much better things, Presto and others. Whereas the big Presto-type services, like I talked about before, Meta: 300 petabytes, just enormous. And they're accessing and processing through petabytes every day. So that's one of the big differences, I'd say. What they do and how they do it is different, but in many ways the purpose they're trying to serve is similar: SQL access to your data. How is it related to Hadoop? I already spoke a little bit about that. Meta came out with Hive for Hadoop, but Hadoop is different. There are performance differences as well; I'll talk a little bit about that push versus pull model, why you get performance improvements with Presto, and how things have evolved. It is related to Hadoop; in fact, one of the most common use cases we see for Presto is using Hive metadata, accessing object storage or even accessing Hadoop directly through a Hive connector. So that Hadoop modernization, the relationship with Hadoop, accessing Hadoop data, maybe even modernizing by using Hive metadata to access object storage data and some of these newer, more advanced file formats and table formats: it is related.
So I would say there's certainly a history, a heritage, an evolution; there is a relationship here, with a lot of customers coming from a Hadoop source or Hadoop origin. I spoke a little bit about data warehouses already; again, the big difference is that they were built and designed at a different time for a different purpose. And although there are some more modern warehouse architectures that are somewhat similar, Snowflake is kind of similar in the middle, there are still differences. And would anyone suggest that Snowflake is processing the likes of data like Meta or Uber or Netflix? Certainly not, obviously not. So again, what are some of the things that make Presto unique? Scalable architecture, absolutely. Compared to more of the traditional warehouses, with Presto, and I'll show it to you in the demo as well, you can certainly scale Presto, add more worker nodes, and it's all online. There's a whole discovery process: as soon as the coordinator sees the new workers calling out to register, the next query will use them; it'll schedule them and use them. Similar with workers that drop out, through failure or whatnot: the next query will use the new cluster topology. So, highly scalable, all online, extremely easy. 1,000 nodes; Meta runs 1,000 nodes to be able to work with their massive amounts of data. Pluggable connectors, again, that's kind of the difference in concept with Presto. In so many ways, Presto was designed around: here is your engine, we recognize your data is in these formats, in these locations, in these sources, and we will give you a way to access them. So, pluggable connectors.
Extensible connectors, too. For example, at IBM, just recently, as part of our project, and I'll speak a little bit more about watsonx.data, we created connectors for some of our data management engines, like DB2 and Netezza, very easy to do. There's a whole rich world out there with lots of connectors, again giving you the ability to connect to all kinds of different data sources, and then to query across them. So you can certainly join across these different connectors and these different data sets, and process data across them. Very good virtualization type capabilities, as well as the performance of Presto: very good, again, intended and designed specifically for these very, very large data sets. So in the scalable architecture of Presto, the simplest view is the diagram here you see on the right. There are basically two different node types, or roles, or pod types if you're deploying this in Kubernetes: coordinator and worker. And like I said, you can just add workers, you can drop workers; the next query sees them, and the coordinator will then plan and schedule to the new set. So very, very simple, very, very easy. I'll show you a demo where you can use the same common Docker container; you just change the configuration to basically say this is the coordinator, these are workers, and they point to the discovery service. Extremely easy, and those two node types handle everything that's going on. Again, very, very scalable, up to 1,000 workers, validated at the largest companies; I've already given you examples of that. And here, just going a little bit more into what's happening: starting at the top left, you'll see typical applications that connect into Presto. They're typically using JDBC, submitting SQL to Presto, getting results back.
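The coordinator/worker split just described is driven entirely by configuration. As a rough sketch (the hostname and port here are made up; adjust them for your own deployment), the standard `etc/config.properties` files for the two roles look something like this:

```properties
# etc/config.properties on the coordinator (hostname and port are illustrative)
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://presto-coordinator.example.com:8080

# etc/config.properties on each worker: coordinator=false, same discovery.uri,
# so a new worker registers with the discovery service and gets scheduled work
coordinator=false
http-server.http.port=8080
discovery.uri=http://presto-coordinator.example.com:8080
```

Because workers find the coordinator through `discovery.uri`, scaling out is just starting another container with the worker variant of this file.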
Inside the Presto coordinator, there's a parser where we take that SQL, and then we create a plan inside the coordinator. Those of you who are familiar with databases, these are like SQL access plans, and I'll give you some examples of what that looks like. From that plan, work is then actually scheduled out to the worker nodes in terms of tasks; I'll show you what that looks like too. Then there's other information that the coordinator interacts with, and actually all the nodes interact with, which is the metadata. So if you use Hive, and I'll go into a little more detail on that, the Hive connector is set up on all your nodes, so they can all see things like: what is the table name? What are the columns in the table? As well as things like: where is the data file located? That's technical metadata. So the way it flows, again: SQL, plan, scheduled to the workers, and the workers are all reading their data. From a performance perspective, the workers give you very, very good parallel execution, down to the push model, much better than, for example, MapReduce in Hadoop, where you're pulling data, processing the reduction, writing it back out to disk, then in the next stage doing it again, and again. That's basically a waterfall type of parallelization, which isn't so good for interactive work, as opposed to Presto, which is very good at that type of capability. So I do want to speak a little bit more about connectors. These are what we call the top five most popular connectors. The Hive connector is extremely popular, either used to connect directly to Hive, or to Hive Metastores or Hadoop deployments that are already there, or used to create Hive tables that you want to work with residing in object storage.
Now, if you look at this top-five list, what you'll see is that four of them are related to table formats: Hive tables, Delta Lake tables, Iceberg tables, Hudi tables. So four of them are about table format, because these table formats offer different capabilities; that's a whole different session of its own. And one of them has to do with Postgres, which is a database connector. Postgres, awesome open source database, extremely popular, probably one of the more advanced open source database engines. So that's what you see: customers are using these connectors to access data in the format that they want. They have different pros and cons for what they're doing. Iceberg is very, very popular, for example at Netflix: they needed Iceberg because the time it took just to parse the hundreds of thousands of files that were out there, just to even get a query plan, used to take hours. With Iceberg, as well as some of the performance optimizations that are there, they were taking queries that would take days and running them in minutes. So they're all using different table types for what they're doing. Here at IBM, and I'll talk a little bit more about it, we're focused on developing, improving, and enhancing the Iceberg connector, so that's one of the areas I'll cover more when we get to the future section. This next one is what the slide calls the connector data model; what I like to call it is kind of a naming strategy. For those of us who are familiar with databases, you connect to a database and then there are typically these different levels for how you access your tables.
There's your database, there'll be a schema, and there'll be your tables. And this is kind of what this is. In the Presto model, there'll be a connector, and in the connector, that's where you choose what type of data you're going after. It could be virtualized, it could be any of the SQL database sources, it could be streaming services, or it could be Hive, or object storage. So you create this connector, and that connector will show up in Presto as a catalog; again, I'll show you a quick demo of this once we get there. The naming scheme will be the catalog, then the next name tier would be the schema, and then your tables. That's how it works together. The connector is the catalog; it shows up as a catalog. So you choose to enter that catalog, think of it like a database in the traditional space, then you create schemas in there for that naming strategy, and then tables. For this next slide, for the Hive connector, I want to walk through what that looks like. In this model, the diagram shows the coordinator, but actually, when you configure this, all the workers also have the connector property file for that Hive metastore. So when you're working with the Hive connector and your goal is to access Hive data, whether it's on HDFS or on object storage, as part of that connector we just talked about, you would configure your Hive metastore. You'd say, yes, I'm using this Hive metastore, this is where it's at. To identify where it's at, it uses Thrift, a Thrift URL. And then that metastore tells you all the technical details to get to your data: the table names, the schema of the table, all the columns in the tables.
Where it's located, whether it's in S3 or in HDFS. So it tells you everything you need to understand where to go to get to the data. And this is very, very common; like I said, it's probably the most common use of Presto, using the Hive connector to access object storage, as well as HDFS and whatnot. One of the reasons this is also popular, and again, this comes from a very rich heritage, that Hadoop Hive heritage: for the open file formats, the Hive table and the Hive connector support pretty much all of them. You've probably heard of some of these: ORC, Parquet. These days, at least at IBM, we like Parquet; it's very good for analytics, very good for compression. Avro is better for things like row-level, transactional work. But we'll support basically all the file format types that are out there. Again, with the Hive connector, if you're doing this on object storage, these are basically Hive tables: the table format is Hive, and then the file format can be any of these. The last line there, no data ingestion needed, I touched a little bit on that. The idea is that if you're using the Hive connector and you know where your data is, let's say I know my data is on AWS S3, and I know it happens to be in, say, Parquet, then what you can do with the Hive connector is just point to those assets. You say, yes, I know my data file is there, I know it's Parquet. You create a Hive table on top of it, so it shows up in that naming space: it'll be through the Hive connector, which shows up as a catalog, then the schema you create, the table you create, and then you can immediately SQL-access all that data in place.
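As a concrete sketch of that in-place registration, assuming a Hive catalog named `hive` has already been configured (the bucket, schema, table, and column names below are all made up), the DDL is just a CREATE TABLE pointing at the existing files:

```python
# Hypothetical example: expose existing Parquet files in S3 as a Hive table.
# Nothing is copied or ingested; Presto just records where the data lives.
DDL = """
CREATE TABLE hive.sales.orders (
    orderkey  BIGINT,
    orderdate DATE,
    discount  DOUBLE
)
WITH (
    format = 'PARQUET',
    external_location = 's3a://my-bucket/warehouse/orders/'
)
"""

def register_table(host="presto.example.com", port=8080, user="demo"):
    """Submit the DDL through the presto-python-client (prestodb) DB-API.
    The host, port, and user above are placeholders for a real cluster."""
    import prestodb  # pip install presto-python-client
    conn = prestodb.dbapi.connect(host=host, port=port, user=user,
                                  catalog="hive", schema="sales")
    cur = conn.cursor()
    cur.execute(DDL)
    cur.fetchall()
```

Any Presto client (the CLI, JDBC, a notebook) could submit the same statement; the point is that the table appears in the catalog.schema.table namespace immediately, with the data left where it already sits.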
You don't have to load it; there's not this concept of I need to ingest it. It's meant to be in place. But one of the questions the gentleman up here asked was: if I'm looking for performance, is that necessarily what I'll do? So you can get this instant access, no data ingest needed, but if you want performance, then yes, a lot of it has to do with optimizing things like the file format, okay? Now, since we're talking about connectors, we do want to talk a little bit about some of the connectors and the progress that's happening. There's always lots of connector development. I talked about how at IBM we're certainly looking at, or have already created, DB2 and Netezza connectors. But there's also other work happening throughout the open source community. There is a native Delta Lake connector that's already there; right now it's read-only, but write support for it will be coming soon. The Larksheets connector is also there. Some of the areas we're working on: we want the Hudi connector to be native, faster; that's really the purpose of it, as opposed to going through a Hive metastore. Same with the Iceberg connector. The current Iceberg connector in Presto doesn't support deleting rows; you can certainly insert. There are also other areas of table maintenance, things like expiring snapshots, compaction, and whatnot, that we also want to bring in. So some of those things will be coming, as well as performance improvements; DML performance improvement is one of the big areas that IBM is certainly looking at and trying to contribute to open source. Okay, so why is it that Presto's performance is very good in this space?
I was speaking about that throughout pretty much all these slides. The first part of it is the in-memory processing. Think about where it came from: Hadoop Hive. Hadoop Hive is really, really good at high bandwidth, but its latency is really, really poor, and I talked a little bit about that. Hive SQL with MapReduce is what you'd call a pull model. It's that same notion: you read all the data, you reduce it, you write it back out, and it's this waterfall model that's very serialized. With Presto, it's very much a push model: we actually push all the execution down into all the workers, and the tasks on the workers can even communicate with each other. So there's none of this waiting for the stage above to finish. The architecture is very different, designed for much better latency, whereas Hive came from a very high-bandwidth intent and design. Out of the box, it's all in memory. There are some experimental capabilities to spill, but that's really not the way most people run Presto. When we look at a lot of our customers, a lot of our Ahana customers, they don't run with spill enabled; it's an experimental capability. They run as intended, in memory. So that means everything's happening in memory; again, that's how you get the faster latency for that interactive, ad hoc query style. In terms of columnar storage and execution, Presto is certainly a columnar engine, so it favors things like Parquet, a columnar file format, feeding into the engine in column form, with a columnar method of processing. That is basically the standard, the normal, for anything that's really analytic workloads.
So basically, every reasonably modern engine is a column-oriented engine. And one of the other capabilities that's very good, certainly in the Ahana product, which is coming into the watsonx.data product, is its multi-level caching with RaptorX, multi-level. So clearly the data cache is available, but there's also a metadata cache. Like in the Hive example above, when you're constantly talking to that metadata: show tables, give me a list of all the tables. In many of these environments you can have thousands of tables, and the metadata cache helps speed that up. There's a file header cache too; that's like the example I was talking about with Netflix. When there are hundreds of thousands of files, to go through and process all of them, to optimize and look at the ranges and all the metadata in each of the file headers, that caching is available there as well. And there's a partial result cache, where you can have intermediate results that you can use to complete your query. So, multi-tier caching, many use cases, 10x performance improvement. Again, it's all about the latency; let's improve, in this case, IO latency with multi-tier caching. So let me quickly go through what the life of a query looks like. This is the simplest example: it's basically just select all the rows from an orders table with one filter applied, discount equals zero. What that looks like in terms of the operators, the stages you'd see: this looks kind of like an explain plan. If you use the Presto monitoring UI, you can look at the explain plan, and it looks a bit like this on the right. Very simple: you start with a scan of the table, you apply the actual filter, I only want the rows with discount equals zero, and then you produce the result.
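To make that scan, filter, output flow concrete, here is a toy Python sketch (not Presto internals, just the same operator shape) where each operator is a generator, so rows stream through the pipeline rather than being fully materialized between stages:

```python
# Toy operator pipeline mirroring the simple plan on the slide:
# TableScan(orders) -> Filter(discount = 0) -> Output.

def table_scan(table):
    """Emit each row of an in-memory stand-in for the orders table."""
    for row in table:
        yield row

def filter_op(rows, predicate):
    """Pass through only the rows matching the predicate."""
    for row in rows:
        if predicate(row):
            yield row

def output_op(rows):
    """Materialize the final result, like the output stage."""
    return list(rows)

# Made-up sample data standing in for the orders table.
orders = [
    {"orderkey": 1, "discount": 0.0},
    {"orderkey": 2, "discount": 0.05},
    {"orderkey": 3, "discount": 0.0},
]

result = output_op(filter_op(table_scan(orders), lambda r: r["discount"] == 0))
# result contains only the two rows where discount == 0
```

The streaming shape is the point: nothing above waits for a whole stage below to finish and land on disk, which is the contrast with the MapReduce waterfall described earlier.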
For something a little more complicated, complicated just from the perspective that we want to touch all the common operators you'd actually use inside analytic-type queries: here you'll see that we add a sum, we add a second table that we want to join with, and we also add a group by. So it's a similar type of thing in the actual operators, the actual explain plan of how this is processed. You'd start, and in general you want the large table on the left: you scan that large lineitem table, you apply that filter, that predicate with discount equals zero, you join it with the orders table on the common order key, and then from there you flow it up and apply the aggregations: you do the sum on the columns and the group bys. Okay, so next I want to go through the Presto use cases. You've been hearing me talk about interactive; so yes, lots of reporting, lots of dashboarding, but there are also things like data science use cases, because data scientists also often want to access a lot of data, particularly if they're coming from that Hadoop type of origin, as well as, again, federating queries across data sources. So if you look at the kind of flow we started with, what does it look like when you're talking about reporting and dashboarding? You'll see typical BI reporting tools on the top in this example, things like Tableau or Looker, using the JDBC driver to connect to the Presto cluster. You'll see that in this case, on the left, it's going to use a Hive Metastore connector to access data in S3, but it can also access data from MySQL. You can join them together and send your data up to Tableau.
Similarly, on the right, you can use Looker as your BI reporting tool. In this particular case we're not using the Hive Metastore; we're using the Glue Metastore instead, which is available in AWS. Again, accessing S3 to get that data, and sending the results up to Looker. Okay, and the data science case is similar: in this example, they're hitting the same data sources, but they're using different tools to access that data. For data science, it's very common that they're using notebooks, so certainly Jupyter notebooks, but also all the other typical data science notebook applications. You can certainly connect to Presto and access all the data sources that you're interested in. If you're interested in creating new training models, and you have data from all kinds of virtualized sources, you can do that as well with all those different connectors that are available, as well as access the petabytes of data that might be in your data lake, your object storage, which is typically where they reside. Okay, now, Presto can also do batch-type work, and certainly at Meta that is the case. The typical type of batch is typically around transformation; you might hear it as ETL or ELT. And it certainly works fine with common tools like Airflow to help you with that type of data transformation and data movement. You can certainly use the Presto cluster to help with that as well, again using the rich data source environment that Presto offers, be it connectors to database sources, or be it things like object storage sources through HMS; you can certainly access that as well and run batch jobs.
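A batch transformation like that often boils down to a single federated CREATE TABLE AS SELECT that a scheduler submits on a cadence. A hedged sketch, where the `mysql` and `hive` catalog, schema, and column names are all assumptions:

```python
# Hypothetical ELT step: read from a MySQL catalog, aggregate, and write the
# result out as Parquet through the Hive connector, all in one statement.
CTAS = """
CREATE TABLE hive.analytics.daily_revenue
WITH (format = 'PARQUET')
AS
SELECT orderdate, sum(totalprice) AS revenue
FROM mysql.shop.orders
GROUP BY orderdate
"""
# A tool like Airflow would submit this string through any Presto client
# (JDBC, the CLI, or the Python DB-API) on whatever schedule the job needs.
```

This is the federation point from earlier in one statement: the FROM clause reads a relational source through one connector while the result lands in object storage through another.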
Okay, so you've heard me talk a lot about this data lakehouse, the open data lakehouse, so I wanted to share what this might look like. What would a data lakehouse with Presto look like? You see Presto here in the middle. Right now, with almost every customer I speak to, you'll see the data warehouse and the lakehouse sitting side by side. I was just speaking to a customer before I came here, and they happen to be a Snowflake customer, but they didn't want to use Snowflake, which was also very expensive, for that ETL batch type of job I just talked about. They wanted to use Presto to do that. I know on the slide I talked about performance up above, but in many cases it's also about price performance, as Presto offers just excellent price performance, certainly compared to the likes of more traditional data warehousing. So this particular customer wanted to do a lot of the ETL work with Presto, write it out to an open Iceberg table format, and then use Snowflake external tables to read those tables for any of the reporting they wanted to do. So in many places you'll see these together: for the highest levels of reporting and dashboarding, something they might come from, or just the heritage of data warehouses they already have, a lot of times that's there, and then they'll bring in things like Presto or the open lakehouse to get access to things like the open formats. Because remember what I was talking about, the differences between traditional data warehouses and this new Presto or open lakehouse: the big difference was not the storage. Even in this slide here, it isn't really the proprietary storage; it's the proprietary data format, like for the Snowflake example, the customer I just spoke to.
Snowflake writes in its own internal proprietary data format. It happens to go to object storage — the same object storage that Presto would use in a similar scenario — but a lot of customers want to be able to move to an open file format that many applications can access. They want to use Parquet or ORC — open file formats — and they want to use open table formats: Iceberg, Delta Lake, Hudi. So you'll see a lot of this coming together, and this is what the architecture looks like. The last thing I want to talk about, and show you real quick before I jump into a demo, is an introduction. Let me check the time — good, I think we're fine. So earlier this week, IBM announced watsonx.data, which is IBM's entry into that open data lakehouse. The key differences come from what our customers were telling us: what are the pain points, and what could we do about them? I don't know if you can see this very clearly, but I'll highlight it in the demo. Over on the left, one of the areas they were really interested in was the ability to bring all this together. You heard me speak about how you can have all these different connectors pointing to different data sources, different technical metadata stores, and different storage where the open file and table formats reside. A lot of our customers still found setting that up pretty painful: there are all these configuration property files you have to create, and you have to set them up on every node. So one of the things we've done with watsonx.data is make that really easy.
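As a sketch of what "open file format plus open table format" means in practice, this is roughly what creating an Iceberg-managed, Parquet-backed table looks like in Presto. The catalog, schema, and partitioning choices here are assumptions for illustration, and exact table properties vary by connector version:

```sql
-- Hypothetical DDL: data stored as Parquet files, table state tracked per
-- the Iceberg spec, so any Iceberg-aware engine can read or write it.
CREATE TABLE iceberg.warehouse.events (
    event_id BIGINT,
    event_ts TIMESTAMP,
    payload  VARCHAR
)
WITH (
    format = 'PARQUET',
    partitioning = ARRAY['day(event_ts)']
);
```

Because both the file format and the table format are open specifications, Spark, Snowflake external tables, or any other Iceberg-aware reader can consume the same table without going through Presto.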
You register your object storage, you register your Hive Metastore if there is one, and you can just puzzle-piece them together; we'll create all those property files for you underneath, making it much easier to set up your diverse environments and connect to all these different sources. That's one of the big differences. Over on the right — again, I don't know if you can see this, but I'll show more in the demo — what you'll see on the top level are all the engines. One of the things we believe is very valuable in this space is that when you're working with all these different data formats, file formats, and data engines, in the end you want to be able to share them. You've heard me say that Iceberg is a strategy for us. The idea is that if you agree on, say, the Iceberg table format, which provides a level of consistency, you should be able to share those tables across multiple engines and benefit from the efficiencies of engines that are designed to do different things. So up on top there's Presto, there's Spark next to it, there's Netezza, and there's Db2. All of them share access to this level of catalogs — those technical metastores that show you where all the data is and give you its technical details — and then to the object storage, which is where all that data resides. All these engines can now share and participate together; Iceberg gives you the consistency, so one doesn't necessarily step on the other and they're all seeing the most recent data.
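For contrast, this is the sort of per-node catalog properties file that watsonx.data generates for you — hand-writing one of these for every connector on every node is the pain point being described. The property names and values below are illustrative only and vary by connector and version:

```properties
# etc/catalog/iceberg.properties (illustrative sketch, not exact syntax)
connector.name=iceberg
# Track Iceberg table state in the Hive Metastore
iceberg.catalog.type=hive
hive.metastore.uri=thrift://metastore-host:9083
# Object storage credentials (placeholders)
hive.s3.endpoint=s3.us-east-1.amazonaws.com
hive.s3.aws-access-key=<access-key>
hive.s3.aws-secret-key=<secret-key>
```

Multiply this by every catalog and every node in the cluster and the value of generating the files from a graphical registration step becomes clear.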
Each of them can change data and participate inside this new lakehouse world, and we're putting the tools together to help build that and make it easier for you. Many of our customers have different engines. There might be vendors that have only one engine and say their one engine can do it all, but I think we all know that's never true. So a big part of our strategy is to make it easy for all of them to participate. Okay, so with that, let me show you a couple of demos real quick. I'm sorry, what was the question? The question is: is there more than one catalog? There can be more than one catalog, but the catalog is where you share. For example, when you configure the Iceberg connector in Presto, you choose where to store what they call the Iceberg catalog, which is the pointer to the most recent version of the data. A typical configuration is to store that Iceberg catalog in the Hive Metastore, which is completely consistent and ACID-compliant. All the engines can then go to the Hive Metastore, make sure they're looking at the most current Iceberg snapshot pointer, and so they're all participating, sharing, and actually getting the most recent, up-to-date data. Okay, but there can be multiple catalogs that you share as well, because the metadata is the metadata for a particular set of files and data. You can have Iceberg connectors going through four different catalogs for four different departments — the finance department, the HR department — and they can all still share while going through their own paths, and we're making it easy for you to set that all up. So yes, there can be, absolutely — customers organize their data the way that they wish.
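Because each department's catalog is just another prefix in the query namespace, a single Presto query can span them. A hedged example with made-up catalog, schema, and table names:

```sql
-- Join across two hypothetical per-department catalogs (finance, hr)
-- that can share the same Hive Metastore underneath.
SELECT f.cost_center,
       SUM(f.amount)                 AS spend,
       COUNT(DISTINCT h.employee_id) AS headcount
FROM finance.ledger.expenses f
JOIN hr.people.employees h
  ON f.cost_center = h.cost_center
GROUP BY f.cost_center;
```

Each catalog keeps its own access path and ownership, but the engine sees them all through one SQL namespace.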
So they might have multiple metastores to do that, absolutely. So there are two things I actually want to show you real quick. The first is that there's a very, very easy way to get started. Ahana — again, the company we acquired — offers a Docker container in the Docker registry called the PrestoDB sandbox. I'm running it on Docker Desktop right here; let me go ahead and start this coordinator real quick so it gets going. It's the same container — you just configure it differently. I set this up in 30 minutes: on my laptop I can run a Presto cluster, one coordinator, multiple workers. As soon as that comes up and running, you'll see this — the Presto monitoring UI. And by the way, the way that I set up that coordinator, it's also a worker. I'm sorry, say again? How do I share that? Thank you for pointing that out. Let me end the slideshow — ah, there you go. I'll jump back to the slideshow as soon as we're done, sorry about that. So what I was showing is just Docker Desktop: grab the container — the same container — and configure it as a coordinator or as a worker. I quickly started this coordinator; my coordinator in this case is also a worker, so when you hit the UI — which, by the way, you can open from here — it shows up as one worker. Now what I'm going to do real quick, and this is also what I was talking about before, is fire up these workers. So now I'm starting the worker pods, and what you'll see is that it takes a couple of seconds, and then those worker pods pop up here, and you'll see them active.
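Under the covers, what makes the same container act as a coordinator or a worker is a handful of lines in the node's `config.properties`. A minimal sketch for a worker, with a placeholder host name — the `discovery.uri` is how the worker finds the coordinator to announce itself:

```properties
# etc/config.properties for a worker node (values illustrative)
coordinator=false
http-server.http.port=8081
discovery.uri=http://coordinator-host:8080
```

Flip `coordinator=true` (and enable the embedded discovery server) and the very same image runs as the coordinator, which is why one sandbox container covers both roles.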
As soon as they become active — again, they're going through a discovery service, announcing themselves: hey, I'm a new worker, please see me — the coordinator sees them and the active workers pop up. The next query is then properly scheduled, parallelized, and tasked out to those workers. That should come up any moment... and there, those two additional workers now pop up as active workers. Any query you submit will use them, all online — that's the very easy scalability I was talking about before. So one thing I'd recommend: just go to Google and search for the Ahana sandbox. You'll see very simple instructions, just a handful of steps. You can run this on your laptop, or you can run it on your Linux server, just to get started — it's all free, and I highly recommend it. Now I do also want to show you what IBM's watsonx.data is about. IBM, we're all about security — I'll talk a little bit about that too; we're bringing in a lot of improvements in terms of Presto security, service to service. So what I've done here is log in to IBM Cloud. This is a development account, so it's called test.ibm.cloud. If you search the catalog for watsonx.data, you'll see it, and then you can actually ask for the service. Right now it's in beta; it'll GA in July. With watsonx.data, we allow you to deploy your Presto clusters in IBM Cloud or in AWS, so you can click on either. Say we want AWS: you get different regions and different data zones, and then you'll get an estimate of what it would look like to get started.
And then all you have to do is make a couple of connection changes: enter a service name if you want a different one, pick your endpoints — public only, or private — and then go ahead and create it. Very easy to do: just subscribe, pick the location of your cloud, pick the data center you want, and you're good to go. Now the next thing I want to show you is what it looks like once you've done that. If we come back to the resource list, I already have a couple that I stood up. Under databases here, you'll see them — let's go ahead and take a look at this one, for example. I want to walk you through how easy it is to set up, and then we'll go into the web console. When you ask for the service, you land on this welcoming screen, which is designed to get you up and running really quickly. In the screenshot I showed you, there are three tiers in the infrastructure map we create for you — we make it very graphical. The top tier is all the engines you want to provision; the next tier is the catalogs, the metastores used to access your data; and the bottom is your storage, typically object storage buckets. So we made this extremely easy. We also give you our own provisioned bucket if you want — we'll just go ahead and use that for now. Then you choose what type of catalogs to create. For the beta, you can create an Apache Hive catalog — again, one of the most common ones — or an Iceberg catalog. Let's just go ahead and keep both of them on there. And you can see it filling up: the first choice I made, the IBM-provisioned bucket, you see on the bottom, and the catalogs on the next tier.
Then you choose what type of engine to create — Db2, Netezza, or in this case Presto — and what type of compute you want, compute-optimized or storage-optimized. You can see it filling in up in the infrastructure map; you click next, and then it'll go ahead and provision that for you. It takes a few minutes. So the next thing I want to do is show you what it all looks like and some of the new capabilities that are there to help make your life easier. Let's go into this one here — this is where we've been doing some of our performance work. So again, we just open the console, and in here we already have an engine and four catalogs. If we look at the infrastructure map, you can see we have a Presto engine, and it's using the multiple catalogs that are available: we have Hive, we have Iceberg, and then a couple of TPC-DS catalogs where we've been doing performance tests. We have multiple buckets here, and this engine can access all these buckets through all these different metadata services. Some other things I just want to point out real quick. Yes, we do have an ingestion hub: if you have data coming from different locations and you want to move it and automatically create the tables and the metadata for it, this tool makes that really, really easy. We also offer things like a data explorer and a SQL editor. What's really cool about the data explorer is that you can see all the catalogs — which correspond to the connectors — and just explore your data. Very, very easy to do. So for example, we have TPC-DS here all ready to go.
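With the TPC-DS connector ready to go, the kind of exploration you'd type into the SQL editor amounts to this (`sf1` is the scale-factor-1 schema the connector generates):

```sql
-- Point the session at the TPC-DS catalog and SF1 schema, then query
-- one of the generated tables.
USE tpcds.sf1;
SELECT * FROM call_center;
```

The same statements work from the Presto CLI or any JDBC client, since the editor is just submitting SQL to the engine you pick.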
You can take a look at the tables that are there, and then we also have things like the SQL editor, with the data explorer on the side so you can see what you're working with. So for example, with that TPC-DS catalog and its naming scheme, you can say something like: let's use this TPC-DS catalog and the SF1 schema, and then from the SF1 schema, let's do a select star from the call_center table. And if you go ahead and run it — and you can of course choose to run on different engines — it'll go through. I'm not sure why that is giving me a problem; that should be fine, but normally you'd just enter your SQL there. I think we're almost at time, so let me just hit a couple of things real quick before we close, because people are already having to go to their next session. Let me go back to the slides real quick and talk about some of the future things. Okay, so one of the things you'll see coming to our products is the notion of multiple coordinators — disaggregated coordinators — and a much easier way to stand that up: you say you want multiple coordinators, and you get high availability there as well. So you'll see that coming. A large part of the optimization work is what we call task bin packing; that's also in progress and coming as well. It really has to do with how, when you parallelize your workload and assign it to all the workers, there's typically a fixed set of resources — most often memory. So you'll see improvements in how work is parallelized and sent out to each of the workers. For performance and efficiency, there will be more of a push toward native execution; one part of that is Prestissimo, the native C++ workers.
You'll also see native access to different data sources in the actual connectors, things like the Aria and Parquet work that's happening, and certainly extensions to SQL functions — UDFs and external UDF servers — as well as improvements to the Apache Ranger integration. I'm sorry, I'm going kind of fast; I think I'm already over by two minutes. Other areas I want to talk a little bit about: what is IBM focusing on? We're certainly focused on security — all our customers absolutely need it — so things like access controls and governance, what we call data-centered security. The ability to protect your data matters because, remember, the engine is completely disaggregated: any engine can access any data, and they can all share. So what's really important is security on the data itself — the ability to create access controls and policies against the data and have them applied regardless of which engine is actually trying to access it. You can provision five or six different engines, but the security control sits on the actual tables or columns, down to whatever granularity you want, and it will be enforced regardless of which engine you use to access it — something that's really important to a lot of the finance customers in particular. Then there's service-to-service authentication and authorization. You saw in the lakehouse architecture diagram that there are lots of different pieces, and they all work and communicate with each other. So it's really, really important, certainly to a lot of IBM customers, that all of it is completely secure — encryption enforced, full authentication, full authorization — for every service-to-service connection, when you bring in different engines and when you access and share different metastores.
Another area that's extremely important, and where IBM has already contributed more than a dozen check-ins, is Prestissimo, built on Velox — the native C++ workers. Presto is written in Java. I was speaking with a Trino gentleman just a moment ago about how much of a performance difference native execution makes, and we're talking 10x plus. The performance profiles look very different because of the latency of Java — it's that same latency story: Presto's latency is much better than Hive on Hadoop MapReduce, and Velox's latency will be much, much faster than that. So in the interactive lakehouse space, it's a massive performance improvement, and we hope to have at least some sort of beta, if not a release, of Prestissimo sometime by the end of this year. We also want to round out the Iceberg table format support. A couple of things there: first, of course, is specification completeness — the current Presto implementation of the Iceberg spec isn't complete, and there are more capabilities we want there. I mentioned some of the table optimizations and the maintenance I spoke about, as well as just improving the performance, particularly around DML — things like inserts and deletes. That's been a problem voiced even by some of our open source partners. Okay, and then the last thing before I close — I'm sorry, I know I'm taking up more of your time — please do get involved with the Presto community. In fact, this is why IBM was so excited to join the Presto community: we love the governance model, we love the democratization of how everyone participates and contributes. You can join the Slack channel, join the whole community, write a blog, contribute to the project itself.
One of the things we're really excited about is that the community itself — the Presto Foundation community — grew by over 110% just this past year. A lot of really exciting things are happening, and I was only able to touch on some of it at the beginning. So please do join. PrestoCon is coming up on June 7th, another place where you can learn a lot more about what's going on. I've provided the link here about PrestoCon — please do try to attend if any of this is interesting to you. And again, please do feel free to reach out to the community; it's a really vibrant, exciting community. Thank you so much, and sorry for going over on this last day right before your lunch. Are there any other questions? Anything I can help you with? Yes, yes I will. And then I think we do need to close, but please do feel free to reach out or just walk on up and I'll help you with anything you ask. Thank you so much for coming. Thank you. Thank you.