Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Officer of Data Diversity. We'd like to thank you for joining the latest installment of the monthly Data Diversity webinar series, Advanced Analytics with William McKnight. Today William will be discussing architecture, products, and total cost of ownership of the leading machine learning stacks. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We'll be collecting questions through the Q&A section. If you'd like to chat with us or with each other, we certainly encourage you to do so; just note that the Zoom chat defaults to sending only to the panelists, but you may absolutely change that to network with everyone. You can open both the Q&A and the chat panels using the icons in the bottom middle of your screen. And we encourage you to share highlights on your favorite social media using the hashtag #ADVAnalytics. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested during the webinar. Now let me introduce our speaker for the series, William McKnight. William has advised many of the world's best-known organizations, and his strategies form the information management plans of leading companies in numerous industries. He is a prolific author and a popular keynote speaker and trainer. He has performed dozens of benchmarks on leading database, data lake, streaming, and data integration products. William is a leading global influencer in data warehousing and master data management, and he leads McKnight Consulting Group, which has twice placed on the Inc. 5000 list. And with that, I will give the floor to William to get today's webinar started.

Hello, and thank you, Shannon. Let me get my slides going here. Okay, I trust we can all see them correctly. Looks good. All right — whoops, I did not mean to do that. Okay. So, welcome, everybody. This is an exciting subject for me. I spend a lot of time on these stacks, keeping up to date and staying current not only with what vendors are doing, but with what enterprises are choosing out there. And I want to bring you a distillation of that information here: architecture, products, and total cost of ownership of the leading machine learning tech stacks. I have a lot of slides for you today, so we're going to be moving fast — I hope my eyes were not bigger than my appetite here in terms of slides, but there's a lot of information. Now, if you've seen me give a form of this presentation in the past, I used to go item by item: let's look at everybody's compute, everybody's storage, everybody's data cataloging, everybody's BI, and so on. I've changed that, and I'm actually going to show you three stacks, and we're going to go stack by stack. But there's more. I have come to believe over the course of time, working with clients on this a lot, that one of the very key pieces of the overall stack is the data warehouse — the database chosen for the data warehouse — and then other pieces start to fall into place around that. Now, that's not a hundred percent true; everybody's a little bit different. But because that choice is so important, I am going to take a hyper-focus on the various stack data warehouses today.
And then we'll fill out all the stacks. Now, I could only choose three stacks because of the time we have today. There are more, and I'm not advocating any of these three necessarily; I would need more time to really break down the market. So, moving right along: these are some of the clients that have influenced us over time, that we've had the pleasure to work for. And with that, we're going to jump right in. Let's jump into the Snowflake stack — yes, again with the data warehouse being the foundation of the environment. I feel very comfortable saying "the Snowflake stack," even though there are many Snowflake stacks — many more than for either of the other two I'm going to share with you today; we'll get to them. In Snowflake, one of the first things you have to think about is compute: are you going to go for reserved instances, spot instances, bring-your-own-license, containers, or serverless computing? Once you get past all that, you can find many great performance features in the Snowflake stack. These are some of our favorites. Micro-partitions: these are created during loading, and Snowflake prunes the partitions it doesn't need for a query, which is nice; it also prunes by column within the remaining partitions. The docs say that Snowflake only performs automated maintenance if the table will benefit from the operation, but we've never seen it not cluster by the keys we specify. Moving right along to clustering. Clustering is a great feature of Snowflake. You have to choose your key, and there's the notion of depth: before clustering, the ranges of values in all the micro-partitions overlap; as the number of overlapping micro-partitions decreases, the overall depth decreases. So that's a little bit about how that works. Transparent materialized views — we like those; it's a relatively new feature that came out in, I think, 2020. Search optimization service — this is still in open preview, and we have high hopes for it. And query acceleration service, also still in open preview; the goal of that is to improve overall warehouse performance by reducing the impact of outlier queries, offloading portions of the query processing work to shared compute resources provided by the service. So some of the big features here, again, are clustering, materialized views, and the search optimization service. And these are some of the types of queries you might be asking of your data warehouse in the stack: equality searches, where the goal is to find all occurrences of a particular value; range searches; sort operations; and so on — what you see there. As you can see, Snowflake has everything covered with one service or another. I will come back to this when I talk about pricing, because some of these have a gotcha to them: when it comes to pricing, they can spin the bill up a little bit on you. There are also a lot of usability features in Snowflake. First of all, do know that you have two different UIs — two complete styles of UI, I guess you might say. There's the classic UI and there's Snowsight, and Snowsight has dashboarding and governance. If you created your account after October 2022, I believe, you can only access Snowsight; on prior accounts you can toggle between the two and pick your favorite. Then there are external tables — a usability feature, with schema on read — which I'll come back to in a moment. First, since clustering is such a workhorse here, a quick sketch.
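This is a minimal sketch in Snowflake SQL, assuming a hypothetical orders table; the clustering-key syntax is standard, and the system function at the end reports the average overlap depth I just described.

```sql
-- Hypothetical large table; all names are made up for illustration.
CREATE TABLE orders (
    order_id   NUMBER,
    order_date DATE,
    region     VARCHAR,
    amount     NUMBER(12,2)
);

-- Declare the clustering key; Snowflake's automatic clustering service
-- then reorganizes micro-partitions in the background so that queries
-- filtering on these columns can prune partitions effectively.
ALTER TABLE orders CLUSTER BY (order_date, region);

-- Report clustering health; lower average depth means fewer
-- overlapping micro-partitions on the chosen keys.
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date, region)');
```

Keep in mind, per the pricing discussion coming up, that the background reclustering this triggers is itself billed compute.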
Back to external tables: they have Delta Lake support, and the workflow goes something like this — create a stage, create an external table, create a cloud object storage event notification, and automatic refresh will happen on those tables. So that's nice. As you'll see when I flesh out the stack, that has an impact on what we call the data lake in this stack. There's dynamic data masking, where Snowflake dynamically rewrites the query, applying the masking policy's SQL expression to the column. The rewrite occurs at every place the column specified in the masking policy appears in the query, so it's good that way — it's broad enough, it covers you. Moving on: there's Time Travel and Fail-safe. Now, Time Travel is, I guess, their hallmark feature, right? I think it came out in the first release of Snowflake, and it was a hit; as you'll see, some others have followed suit with time travel. Semi-structured data support covers JSON, Avro, ORC, and Parquet using the VARIANT data type. Snowpipe is the automatic loading feature — automatic ingestion. Only AWS accounts, though, support all three major clouds' object storage; Azure accounts support Blob Storage and Data Lake Storage Gen2 (ADLS Gen2), and Google accounts only support GCP storage. Snowsight dashboards — one limitation there is that you can only share them with fellow Snowflake users inside the same account who have the same privileges you have, or higher. And the Snowflake Snowpark API, as they call it, supports Python, Scala, and Java. So those are some of our favorite usability features. In Snowflake, you can choose from ten different sizes — and this is growing all the time — available in standard and Snowpark-optimized. The new Snowpark-optimized warehouses have 16x the memory of standard, and that's in open preview right now. There you see your sizes, from extra-small to 6X-large; 5X- and 6X-large are still in preview on Azure. Our analyst intelligence, I guess, does suggest how many CPUs and gigabytes of RAM you get per standard node and per Snowpark node. I'm not going to share that here, but with experience you can kind of figure out what you're really getting for what you're paying. And what you're paying for is consumption: you buy units ahead of time, you spend them down, and you get your wallet refilled — or you refill their wallet, as the case may be. Watch out, though, for concurrency and price-performance. "Effective warehouses" — this is our term for what happens when you have multi-clusters in Snowflake, and this is huge when it comes to your bill; it's what people sometimes, unfortunately, learn after the fact. This is where it will scale out in increments of 2x, leave it there a while, or what have you — and you're actually paying for the whole thing, right? These other things add on to compute and add on to your bill. They're good, but they do spin up the costs a bit: automatic clustering, materialized view refreshes, search optimization, and query acceleration. Time Travel storage is something else you pay for, so watch out for that. Now, I do put in a bullet here for discounts, because even though I'm showing you the standard, enterprise, and business-critical pricing, there are discounts available from all of these vendors. I'm just showing you the list price here, right?
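Before we leave Snowflake features entirely, here is roughly what the dynamic data masking I mentioned looks like in Snowflake SQL — a minimal sketch, with the table, column, and role names all hypothetical:

```sql
-- Show full card numbers only to a privileged role; everyone else
-- sees a masked value. All names here are hypothetical.
CREATE MASKING POLICY mask_card_number AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('PAYMENTS_ADMIN') THEN val
        ELSE 'XXXX-XXXX-XXXX-' || RIGHT(val, 4)
    END;

-- Attach the policy; Snowflake then rewrites every query that touches
-- this column to apply the expression above.
ALTER TABLE payments MODIFY COLUMN card_number
    SET MASKING POLICY mask_card_number;
```

Note that masking policies are an Enterprise-edition-and-up feature, which ties back to the tiered pricing I just showed.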
Now, back to pricing for a moment: I'm not a big fan of this usage-based consumption model, because by having it, you're telling the customer that the more they engage with you, the more they're going to pay. I would like a bit of a freer hand — to not worry about how many units I'm using if I run this query or not — but there you go. The Snowflake ML stack. OK, so this is the first time you're seeing all the stack components in our ML stack, so let me see if there are any I want to point out. Identity management is on here, and some people are thinking, well, why do I need that? It's used to manage digital identities — user accounts, authentication, authorization, and access control — to ensure that only authorized users have access to the right resources, and that they access them securely. It helps organizations manage user identities across multiple systems, applications, and networks. I think it is absolutely essential. And I have some Amazon components in here for you. I also have Spark analytics. It's an important part of enterprise ML: it provides the tools and features needed to quickly and easily analyze large data sets. These categories are going to be consistent across the various stacks we go through. It also helps with the ability to integrate with other ML tools and frameworks. So Spark analytics software can help enterprises gain insights from their data and make better decisions — that's what it's all about. Now, if you've seen my stacks in the past — my Snowflake stack in particular — what has changed? In terms of what we see clients selecting in the past year when they establish Snowflake as the foundation of the stack: we have Glue in here now for data integration — AWS Glue, of course — where we used to have Talend or Informatica. We now find that Glue is even more acceptable and used in these stacks. For the data lake, we now have Snowflake external tables, not what we used to have in here, the Cloudera Data Hub on S3. Also, for the data catalog, we have the AWS Glue Data Catalog, not the Alation data catalog like we used to have. Of course, you're free to use all of those components and more. Nobody — well, I shouldn't say nobody, but few people — will have a nice pure stack. And when it comes to Snowflake, there's no such thing as a pure stack from one vendor, right? Snowflake only provides the dedicated compute, the storage, and the data lake; the rest of it you've got to add in. So these are some things we see pretty commonly added into a Snowflake stack. Here is a picture of that. Sometimes I have different components in the picture than on the prior slide — for example, I show Databricks here for the data science engine, excuse me, as opposed to SageMaker. We see a significant Kubernetes application environment — we see a lot of that. I also have MDM in here, and a transactional database, which is a NoSQL database, and then the data warehouse, the data lake, and of course the machine learning environment. Now, you'll see this kind of image for all of the stacks I show you; I probably won't belabor them that much in this presentation. Let's move on to Amazon Redshift. Now that we've seen the Snowflake stack — some of the good and the bad, I guess, about Snowflake — and we've filled out that stack, what about over here on Amazon, with Redshift?
Now, some of the things we like about Amazon Redshift: in particular, the Redshift Advisor. It makes suggestions like compress Amazon S3 file objects loaded by COPY, isolate multiple active databases, reallocate workload management memory, skip compression analysis during COPY — lots of great tips, I guess you might say, along the way. Workload management is pretty good in Redshift. Concurrency scaling will automatically add additional clusters as concurrency increases. Now, they charge you for concurrency scaling clusters, though only for the time they're actively running queries; it supports COPY, INSERT, DELETE, and UPDATE, and it comes automatically in the serverless option, which I'll get to. Then there are transparent materialized views — we saw those with Snowflake. Keep in mind that with Redshift, automatic query rewriting to use the views does not support subqueries and a whole lot more, so it's pretty limited. And there's short query acceleration. In terms of usability, there is Spectrum. Spectrum resides on dedicated Redshift servers that are independent of your data warehouse cluster; it scales automatically, and you can't perform UPDATE or DELETE out there, but you can do INSERT. Automated materialized views — talked about that a little bit. Dynamic data masking — this is brand new; it's been in preview since, I think, November 29th. This is where you can apply multiple masking policies with varying levels of obfuscation to the same column in a table, assign them to different roles, and use a priority number to resolve conflicts. So, pretty cool. Federated queries — this is where you can query data in other operational databases, data warehouses, and data lakes, but currently it only supports Amazon RDS for PostgreSQL, Amazon Aurora PostgreSQL, Amazon RDS for MySQL, and Amazon Aurora MySQL. That will build out over the course of time, and we'll have much more inherent federated query capability within Redshift. There's semi-structured data support with the SUPER type, which supports data up to 16 MB — and actually, support for much larger values is in preview right now, I think, but we'll say up to 16 MB for now. Streaming ingest with Kinesis, Python UDFs, and Redshift ML, which works well with the Redshift database. So, you choose between provisioned clusters and serverless. These are some of the characteristics of provisioned Redshift versus serverless. Provisioned clusters are clusters of Redshift nodes that are managed and maintained by you: they require you to configure and manage the nodes, and you're responsible for scaling the cluster as needed. Serverless is a fully managed service that requires no configuration or management from you; it automatically scales as needed and provides a pay-as-you-go pricing model. A lot of these companies are getting into this. Your cluster sizes are based, of course, on the AWS shape that you choose. Here I show you the AWS node type, the CPU and RAM you get, the range of nodes you can have, and the price you pay per node. RPU, which you see on here, stands for Redshift Processing Unit, and it's the measure used to calculate the cost of using Redshift Serverless: the cost is based on the number of RPUs used, which is determined by the size of the cluster and the amount of data stored. RPUs are available in increments of eight — 32, 40, 48, and so on, up to 512.
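And since Redshift ML just came up, here is a minimal sketch of what it looks like — all table, column, role, and bucket names are hypothetical. Under the hood this hands training off to SageMaker, which is exactly the cost item I flag in a moment.

```sql
-- Train a churn model from a query; Redshift ML delegates training to
-- SageMaker behind the scenes (a separately billable service).
CREATE MODEL customer_churn
FROM (
    SELECT age, tenure_months, monthly_spend, churned
    FROM customer_activity
    WHERE snapshot_date < '2023-01-01'
)
TARGET churned
FUNCTION predict_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/HypotheticalRedshiftMLRole'
SETTINGS (S3_BUCKET 'hypothetical-ml-bucket');

-- Once trained, the model is just a SQL function you can score with.
SELECT customer_id,
       predict_churn(age, tenure_months, monthly_spend) AS churn_risk
FROM customer_activity
WHERE snapshot_date = '2023-02-01';
```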
So, some decisions to be made there for your Redshift cluster. Pricing and price-performance: you pay for compute resources as a function of time. Redshift also has reserved-instance pricing, which can be substantially cheaper than on-demand pricing. Watch for concurrency scaling, serverless RPU usage, and SageMaker costs for Redshift ML — those are some of the big items we see that can drive a Redshift bill up. Now, let's build the stack. These are all the things in the Redshift stack. This one is a little bit more — I'll use the word "pure" — because we're going to lean hard toward the Amazon offerings, right? Redshift is an Amazon offering, and if you start with that, a lot of companies will try to default to the things you see here. But again, if you have your favorite data integration tool, your favorite data lake or data catalog, by all means — there are reasons to go beyond what you see here and do more of a best-of-breed. But if you're doing a pure Amazon stack, this is what it looks like. What's new here from last time is the data lake: Amazon Redshift Spectrum now, instead of EMR. Those old Hadoop components are slowly going the way of the dinosaur. And here's your machine learning stack in visual form — kind of everything we saw before for Snowflake, just with some different labels on it, really. Now let's move on to our third stack, which is Azure Synapse. OK, a lot of good features here. If you were, or are, familiar with SQL Server, you will find your way in here very nicely. Workload management is good. They have workload groups, which are collections of workloads managed together, used to group workloads that have similar characteristics — resource requirements, performance requirements, other criteria like that. And then there's the workload classifier, a tool used to classify workloads into different groups based on their characteristics; it can be used to identify workloads that require more resources, or those that can be managed more efficiently. So, all in all, workload management is pretty good — I'll show you a quick sketch of it in a moment. We have an estimated query plan coming soon — soon enough that I went ahead and added it. Transparent materialized views, which we've talked about. Adaptive caching, which keeps recently used data on NVMe storage. And the Azure Advisor, which is doing some of the advising that the Redshift one does, and working its way up. Notice that we don't have auto-scaling here — you can trigger a scaling operation with a pipeline job, though — and we don't have short query acceleration or a separate search-optimization-type service. Usability-wise, though, there is dynamic data masking; this is getting to be common across the databases that want to compete, and Synapse does it pretty well. What this does is substitute a masked value for the real value in whatever anybody sees — you could substitute X's or a dummy number, so if it's a credit card, maybe a bunch of X's and you see the last four digits, something like that. Synapse Link is near real-time analytics over operational data — we're seeing more things like this — but that data must be in an Azure SQL database, SQL Server 2022, or Cosmos DB, and it's done via an isolated column store. And there is SynapseML, previously known as MMLSpark. Before we get to pricing, here's that workload-management sketch I promised.
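A minimal sketch of workload groups and classifiers in a dedicated SQL pool, with hypothetical names; the resource percentages are just illustrative.

```sql
-- Reserve a slice of the pool's resources for data-loading workloads.
CREATE WORKLOAD GROUP wgDataLoads
WITH (
    MIN_PERCENTAGE_RESOURCE = 26,              -- guaranteed share
    CAP_PERCENTAGE_RESOURCE = 100,             -- allowed burst ceiling
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 3.25  -- per-request grant
);

-- Route requests from a given login into that group, at high importance.
CREATE WORKLOAD CLASSIFIER wcNightlyETL
WITH (
    WORKLOAD_GROUP = 'wgDataLoads',
    MEMBERNAME = 'etl_service_login',
    IMPORTANCE = HIGH
);
```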
Now, Synapse gets its own callout for how it prices things. The unit is officially the Data Warehouse Unit, and the official definition here is pretty good: a collection of analytic resources. You could stop there, but it's defined as a combination of CPU, memory, and IO, which represents an abstract, normalized measure of compute resources and performance. So you may ask yourself, well, how many of these do I need? That's a good question, and it probably needs a bit of time to get into all the nuances behind it, but it can be figured out. And increasing DWUs linearly improves performance: from 100 to 200, you're looking at 2x, et cetera, et cetera, down the line — and they go all the way now to 30,000. The number of nodes per DWU depends on the size of the compute nodes: with small compute nodes you get five DWUs per node, medium ten, large twenty. From that we can easily infer the node counts, and consequently the price per hour — I'm not going to put that on here, but you can do it. Nodes range from one to sixty. So, a lot of choices here, really. Pricing-wise, there's serverless, which is $5 per terabyte processed, and there's dedicated, where you get discounts the more of a commitment you make — this is true across the board for these vendors. There are additional charges for Synapse Link, Data Explorer, and Spark pools, so watch out for that; I mentioned some of those, and they don't come free. Pipelines are priced by DIU-hour, runtime hour, and per activity run. So for DWUs, you see here the latest list price per hour. For Synapse, again, you're paying for compute resources as a function of time, and the hourly rate varies slightly by region as well. Also add the separate storage charge to store the data, compressed — that's at a rate of some dollars per terabyte per hour; it's far less, but it's not zero, and it's not necessarily insignificant. The Microsoft Synapse ML stack: now we're looking at more Azure components, but there are some Amazon components in here as well — still, this is what we're seeing get selected. So, what's new? I'll jump to that. For Spark analytics, we've got big data analytics with Apache Spark: it provides a unified platform for data processing, machine learning, and analytics, allowing users to analyze data from multiple sources, build machine learning models, and create analytics applications. For data integration, we now have Azure Data Factory, plus Azure Stream Analytics for analytics and Azure Event Hubs for streaming data — two different choices there, depending upon the data style. For BI, there have been a lot of changes to this stack over time. We have QuickSight, not Power BI — we just see it getting selected a bit. We have SageMaker on there, not Azure ML; we could have had Azure ML — it's a tough call, and it's not really that important what I chose here. A lot of things are possible; I'm just giving you an example stack. Amazon IAM — we have that in there instead of Azure Active Directory; that's what we're seeing. And for the data catalog, I put in Purview — that should have said Azure Purview; I am sorry, I just realized that. If you need a fully managed serverless data catalog, then the AWS Glue Data Catalog may be a better choice for you here; if you need a data governance service, then Azure Purview may be the better choice, and I would probably lean in more on that. The Azure machine learning stack — this is the picture of it all, and it should look familiar by now. Let's move on to Google BigQuery, a real up-and-comer here.
A lot of people are liking this, especially those getting into heavy machine learning. Performance features we like: the BQ architecture and slots. "Slot" is their word for the pricing unit; a slot is a virtual CPU, is what it is. There is clustering and partitioning available. Clustering is for columns you commonly filter on that have many distinct values — that's what you want — and it's good for tables that are fairly large, over a gig at least. Partitioning lets you partition by time-unit columns, by ingestion time, or by a range of integers. If you need partition-level management, you still have to do that yourself — partition expiration, partition deletion, things like that. The BI Engine is built in here — an in-memory analysis service. We could talk about Looker as well, but there is an in-memory analysis service built in that caches frequently queried data, which is nice. Usability features: of course, BigQuery Omni. There's an external table feature, where you can run BQ analytics on data stored in S3 or Azure Blob Storage — so you can reach out to those other stacks' storage. Time travel — here it is. The migration service with SQL translation, and that's pretty good; it's free, by the way, and you get many dialects, with many more in preview — so they want you to bring your SQL over. Looker Studio is free to use; you only pay the usual BigQuery and BI Engine charges. Colab notebooks — this is in preview now — pass BQ results to a notebook. And there is BigQuery ML. So you can fill out the stack here within Google. Pricing, as you see here: compute and storage. I won't belabor it — a lot of numbers. Omni, keep in mind, extends the capabilities of BigQuery to Amazon Web Services and Microsoft Azure. With Google BigQuery, one option is to pay for bytes processed, at dollars per terabyte; there's also BigQuery flat-rate, which is very common amongst enterprises. So let's build out the BigQuery ML stack. I built it out with all Google components here for you — again, as an example. There are some new things, though. For BI, it's the Google BigQuery BI Engine where it used to be Looker; we chose to go with BI Engine because we're seeing a little bit more adoption of it. BigQuery BI Engine is a powerful tool for fast, interactive data analysis, while Looker is a more comprehensive business intelligence platform that includes data modeling, data preparation, and report creation in addition to data analysis. So you might have both — you very well might have both, for those different needs. And more and more, I don't mean to suggest there's only one BI tool for the stack, or one data lake type for the stack; there could be many of a lot of these things. But if you're building one app — your first app — and you don't have a lot of technical debt, and you're able to do as you wish and get one big thing done, then you would probably have one of each to get started. And it doesn't stay that way for long. This is the Google picture, with all of the Google components.
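Before we move to costs, a minimal sketch of partitioning plus clustering in BigQuery SQL — dataset, table, and column names are all hypothetical:

```sql
-- Hypothetical events table: partitioned by day, clustered by two
-- commonly filtered, high-cardinality columns.
CREATE TABLE analytics.events (
    event_ts    TIMESTAMP,
    customer_id STRING,
    event_type  STRING,
    payload     STRING
)
PARTITION BY DATE(event_ts)
CLUSTER BY customer_id, event_type;

-- Queries that filter on the partition column and clustering keys
-- scan fewer bytes, which matters under on-demand ($/TB) pricing.
SELECT event_type, COUNT(*) AS n
FROM analytics.events
WHERE DATE(event_ts) = DATE '2023-01-15'
  AND customer_id = 'C-1001'
GROUP BY event_type;
```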
All right, I promised to talk about costs, and I did a little bit, but it's time to talk about them a little bit more. It's a big question, and this gets into a lot of money; this is where some of the engineering preferences can get put out to pasture for something else. And again, I'm showing you specific stack costs because that's what we ran this particular study on — but that is not to exclude other stacks that are very viable. I don't show you the Oracle stack, the Teradata stack, the Cloudera stack; I do show you the hyperscaler stacks. Now, this is for, let's call it, a medium enterprise. This is an example from AWS, based upon our study, and I use percentages here to show where the dollars got allocated. This does not include people costs — your consultants, your employees, your business input, none of that; I'll get to that. As you can see, dedicated compute is pretty big on here, and this is partly why I can safely say that the data warehouse — which is that dedicated compute — is a big foundational component of any machine learning stack. Beyond the dedicated compute, where you'll spend your money in the stack is on data integration, and on the data lake, as you can see; next in line would be data exploration. Everything comes with a cost, though. I do show some things at zero because they're built in and you're not paying extra for them — that's not true on all of the stacks, but on this particular one it was, so I couldn't show a slice of the pie for those. But now you can see, hopefully, where the money is going: into the infrastructure. Now, these are all the line items we went and looked at to come up with our prices and build this study. The point here is not to provide this as detailed information, but to show you that there are a lot of places you have to go: there are 35 to 40 line items per stack, of different prices, that you have to go figure out and look at. There are 21 different websites just for AWS to get all of this information. That may change in the future, but my point is, it's not for the faint of heart to do. We've done it. You'll have to do it if you're going into the stack — or maybe you don't; maybe you can roll the dice on it. But coming up with the stack costs is a bit of work, and that's kind of the point. So, what does it come out to? Well, let's take a use case for a medium-sized enterprise. Now, I know I should probably have ten dimensions on here, right, of what you're selecting. But what I'm showing you here is the cost of a first big ML project for a medium-sized enterprise — for as long as it takes to build the project. Now, I know we're all doing agile these days, right, and it's hard to declare victory, hard to declare you're done, because you keep going. But my rule of thumb on this — because I've got to draw the line somewhere — is that once you get most of the functionality up and running, it's in the enterprise: the enterprise is counting on it in production, not running it in parallel anymore with something else, but actually running whatever this may be — targeted marketing, fraud detection, you name it. That's what I mean by "as long as it takes to do the project." Of course, it then goes into maintenance, and that maintenance continues to be a significant cost, but it's nowhere near the cost of the build-out. So I'm looking at the cost of the build-out, once you add up all the components, at 1.3 million to 3.2 million USD. And this first large-scale ML project will probably take about a year. Of course, we all have small-scale ML projects today, even in the medium-sized enterprise — you may think you don't, but I suggest that maybe you do.
But I'm trying to come from a common point here: where are the medium-sized enterprises today, and when they raise their hand and say, "we want to do this with ML," what does that future look like? It looks like about a year for that project, and about 1.3 million to 3.2 million. Now, this is plus labor costs, isn't it? I haven't put labor costs on the slide. Let me back up. By a medium-sized enterprise — trying to use a common definition — I mean revenue up to 100 million and about 500 FTEs, plus or minus, of course, based upon the nature of the project. And by the way, this could also be a medium-sized project in a larger enterprise. This is a project built to enterprise standards — it's not a POC, it's not an MVP; the basic stuff on it does work. That, again, is my definition of done. Now, back to the labor costs: for something like what I just described, the labor costs are going to be approximately the same as the stack costs. So you can double the figure you're looking at on the screen for your total cost to build that first big ML project. Is it worth it? I sure hope so. You may have some inefficiency in your first project; you may have to bite the bullet on some things. That's OK. Have experience there, helping you out, keeping it as efficient as possible — but you will have some, there's no doubt. Now, let's move up to a large enterprise. All the same things apply here, except the numbers are different and bigger: 3.4 million to 8.5 million is what that first project will cost before it gets to the point of, quote-unquote, production. That's plus labor costs, and what we find in larger enterprises is that labor can be anywhere from equal to the stack costs to double the stack costs. So you can take the figure you're looking at — and I know I'm giving you a huge range here — and multiply by two or three to get the full cost to build that first ML project: at the lower end you're at 6.8 million, and at the higher end you're at 25 and a half million. I know that's a big range, but there are a lot of different things we're doing with ML, not just one, so it's hard to put a fine point on it. As you can see, there's a lot of money involved either way. And when there is a lot of money involved, you need to do some of the due diligence I've been showing you around how you're going to build out your stack. Do you have some pet components that you absolutely can't get rid of — say Informatica, say Tableau, what have you? OK, fine. There's a place for that. It may cost you a little bit more, but if you have that enterprise synergy with it, it may be worth it — totally fine. And by the way, I am not saying by any stretch that the hyperscaler components of the stack are less expensive, or necessarily more integrated with the other components of the stack, even the database, the data warehouse. But I am saying that they are there, and they are definitely something that a lot of people check out. OK, I've belabored that. Now, by the way, we're coming up toward the end, and I am happy to take your questions — we'll go into Q&A here in just a bit, so you can be thinking about them and queue them up for me, what have you.
I do want to take a moment, though, before we leave the bad part of this — the high cost — to say that the project return on investment should be there for this project. And I'm not chastising anybody, saying, "well, yeah, it should be there — where is it?" I'm saying it really should be there if you do a halfway decent job with this. And the benefit is going to come, ultimately, from the two things that make up ROI: increased revenue or decreased expenses. Now, I've given a whole Advanced Analytics webinar on that; you can find it on YouTube and learn more about my thoughts on ROI — how to get there, where it comes from, how to do the math for it, and all that sort of thing. But do know that the return part of this is harder than the cost — and I didn't say the cost was a walk in the park to determine, now, did I? The returns might be easy for you, but I'd say that more often than not, they're not only harder to come up with — because you have to engage so many people in conversations they're not used to — but they can be quite variable, and they have a lot to do with things we have less control over than, say, which BI tool we're going to pick. So the ROI can come from a lot of places. It can come from different things that lead to increased revenue, like improved customer satisfaction: if you're improving customer satisfaction, customers are more likely to stay, and they all have a projected customer lifetime value with you — you multiply something against that to understand the value of keeping a customer. If you're doing a great job with customer retention — let's say that's what this is about — then hopefully the benefits are there for you. Usually, I've found, any time you can segue the discussion from the cost over to the returns, and zero in on, let's say, a top-ten item for the company in terms of company goals — adding more customers, reducing expenses in the supply chain, any of those top-ten metrics — if you can link this project into that, and of course you have to execute and do it, and drop the fraud by a point, drop the returns by a point, what have you — that will almost assuredly pay for the cost of your machine learning project. Godspeed on that.
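To make that retention math concrete, here's a back-of-the-envelope version, with every number hypothetical:

```latex
% Hypothetical numbers purely for illustration.
\text{annual benefit} \approx N_{\text{customers}} \times \Delta(\text{retention}) \times \text{CLV}
                      = 500{,}000 \times 0.01 \times \$600 \approx \$3.0\text{M per year}
```

On those made-up numbers, a one-point retention lift roughly covers the medium-enterprise build-out range from earlier.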
All right, so, in summary — and I won't read all of this; we've covered a lot of ground. For large enterprise projects, the stack cost, again, is 3.4 to 8.5 million. The TCO of a cloud analytics platform scales up as the demand for analytics at your company grows over time. Yes, there are one-time costs, but there are also ongoing costs that keep on going — and in the models out there today, a lot of what I've shown you, you don't have as much of a big upfront cost as you used to; it's a little more balanced now, month over month, and it'll come down to usage. Snowflake adopts a usage-based, or consumption-based, pricing model where users are charged based on the amount of data processed, resulting in higher costs for higher usage levels. Redshift has your provisioned and serverless options — those are some of the highlights of going through that stack. Synapse is available for purchase in DWUs, which comprise a collection of analytic resources. BigQuery slots are virtual CPUs. I listed all the components of the stack, and some people might look at it and say, "well, we don't need identity management," so I'll give you a couple of components that might get pulled out. Identity management — I'd say 80 percent need it, but some may not. The data catalog — you can struggle without it for a while if you're looking to save some money getting going. Spark analytics — if you don't have large data sets to analyze, maybe you don't need that. I do believe these stacks need a data lake, and they need streaming for that data lake; they need data integration for the data warehouse; they need compute and storage; and they need BI. Why not BI? If you're going to all this work to build this stack and put that data in a place to succeed, let's put some BI on it, so users can use it in ways we cannot even predict right now — we want that. You're going to need machine learning. And the other three I mentioned might be more optional. Estimating the cost of building a tech stack can be a complex task; it requires careful consideration of various factors. It's recommended to seek reliable performance at a predictable price — that is your goal: reliable performance at a predictable price. Nobody likes variability in pricing; nobody likes to be told it's going to be $10 and it ends up being $25, et cetera. And you are looking for reliable performance — maybe not the very best performance possible out there, because there is a cost to be paid for that as well, and sometimes that great performance doesn't carry through to all the workloads of the company. So you want reliable performance at a predictable price. The true measure of project efficacy is return on investment, not the cost. You've got to get the conversation over to return on investment and away from just the cost — and if you can't carry the conversation over that way, then maybe you haven't thought enough about why you're doing this project in the first place; that does require quite a bit of effort as well. Now, I will show you this before we get into Q&A: we do have a hands-on cloud analytic database competitive workshop, where we can come on-site or virtual and teach you about all of the technologies you see here on the screen, including the user interface functionality — you will load data; this is a hands-on workshop. It's great for vendor competitive teams, product teams, pre-sales and sales, and it's also good for enterprises sitting at the cusp of that decision. And here's my guide, Jake, talking a little bit about Snowflake. All right. Yes. That has been Architecture, Products and Total Cost of Ownership of the Leading Machine Learning Stacks. I'll turn it back to Shannon to see if we have any questions.

Thank you so much — great webinar, thank you as always. If you have questions for William, feel free to submit them in the Q&A portion of your screen, and we'll answer the most commonly asked questions. Just a reminder: I will send a follow-up email by end of day Monday with links to the slides and the recording. So, William, can you share again the GCP machine learning stack?

Can I share? OK — give me just a second on that.

Thank you. I don't know if there's a specific question attached to that; they just wanted to see it again. Perfect, thank you. And then: where does Databricks fit into any of these stacks?

Well, if I had more time, I would have done the same for Databricks that I did for these other three.
As a matter of fact, there was a time in the preparation where I had Databricks in here, but I was afraid I would run out of time — and I probably would have. So Databricks is alongside all of these. We see higher adoption of Databricks, and we see maybe a loosening, I guess you might say, of the partnership with Snowflake — Snowflake has historically used Databricks for its machine learning engine — and Databricks is establishing itself, it seems to me anyway, on its own, trying to make the conversation "you've got two choices here, Snowflake or Databricks." I'm not sure that's true, but Databricks is definitely a consideration, especially if you're real keen on machine learning. Some of the more advanced enterprises out there, doing more advanced things, are really looking hard at Databricks, and we see it winning more deals than it used to. Definitely a company to keep an eye on in this space, for sure. I don't know if that answers the question, but that's where I went.

I like it. I definitely encourage the questioner to ask more if they want a more in-depth answer. But moving on here: is Snowflake's new addition of Snowpark a positive feature going forward, in regards to code review, organization, and applying machine learning? Have you played with Snowpark at all?

Oh, yeah. Oh, yeah. It's huge. Snowflake customers are adopting this in droves, and it was definitely a real key moment for Snowflake — so, definitely a big thumbs-up from us in terms of adding needed functionality to a Snowflake shop. I definitely see a lot more uptake of it, and it adding a lot more value to the enterprise over the course of time. I'm just trying to go back to where I showed it in context, to see if there's anything I want to pick up from that. Snowpark — yep, yep. I like it. And the 16x memory: they've decided — well, we decided as customers — that that was a better profile for many applications, especially the ones I see that are advanced and trying to do machine learning.

Oh, good. And MLOps seems to be the new trend on the market. Focus on costs and ROI is crucial, but machine learning offers opportunities for metadata analytics for a dynamic data catalog, metadata analytics for data warehouse administration, and data pipeline monitoring — so I see the benefits with the immediate effect. The question is: how can we best convince clients to implement machine learning in the data mart layer and in analytics engineering — small use cases, long-term implementation benefits?

Did you say MLOps, Shannon? And does the question come back to how we can convince our companies to do machine learning, basically?

Yeah, in the data mart layer and analytics engineering.

Yeah. Well, there are a couple of ways to go with this. I did, by the way, give a full Advanced Analytics webinar on machine learning ops, so if you want more of that, go and grab it on YouTube. But to answer the question more directly, I would refer you to my return-on-investment webinar, because that's what it's all about — and that's the way to convince. And, this sounds like I'm piling on here, but I would also recommend the webinar on organizational change management, because you are changing mindsets and behavior as part of trying to convince the organization to do machine learning.
Now, first of all, I applaud anybody who's actually trying to help their organization get into something that has tremendous value to them — and I believe machine learning is that, so we're in total agreement. I go in with the bias that machine learning should be here in the enterprise, and I'm looking for where and how, and things like that. Probably whatever you're trying to replace with machine learning is already delivering an ROI, and there's going to be a cost for transitioning to machine learning. OK — you've got to figure out what that is, what it looks like, how long it's going to take, that sort of thing, and then what the ongoing costs will be from that point forward. And some enterprises have such tremendous technical debt that moving over to machine learning is not going to cost you any more — it's probably going to cost you less. Plus, you have a 2023 opportunity to do a few other things right in that process, versus the 1990-whatever the thing was built in in the first place. So there's a lot of opportunity whenever you make a transition — a maturing, if you will, of an application. You can look at all that and see what the new application is going to look like, but the bottom line is: how much more value is it going to deliver to the organization to do it the machine learning way? If you're reducing returns, or let's say fraud — say you've dropped fraud by a certain amount. OK, great: if you leave things as-is, it's going to stay the same; if you add machine learning, it's going to drop another point, whatever — you figure it out. You drop something by a point like that, and you're more than paying for the project. Plus, machine learning is where a lot of the new capabilities are going with these hyperscalers and others, so there are just long-term efficiencies with machine learning that you want your organization to be a part of. Somebody's going to have to, quote-unquote, bite the bullet and be the first — but that can be you, and that's great for a leader to do.

Perfect. So I think we've got time for at least one more question; we've got three minutes left. Is it possible to create dimensional models in GCP, or can we only use OBT — one big table?

Sure, you can create dimensional models there. BigQuery doesn't care what the schematic is in the tables that are implemented. Now, I'm a fan of dimensional models for that end-user access layer, because I think it's modeled the way a user thinks about their query. But what you have to think about now — what we're seeing — is fewer data warehouses and ML stacks, except for the data lakes, being built for an end-user to sit there and kind of pound away at. We're seeing a lot of these stacks built to send the data into algorithms — machine learning algorithms — and that being the case, we're seeing dimensional modeling get less popular and less used. But certainly they're still out there. You know, when I see a dimensional model anymore, half the time I know it was built at least ten years ago and has just been carried forward. Not that that's a bad thing — you just kind of know, having seen so many. But sure, it's possible to have your dimensional models in GCP BigQuery.

Well, William, thank you so much, but I'm afraid that is all the time we have for this webinar.
Thanks to everybody who's been so engaged in everything that we do — we appreciate it. And if I can mention, I added the links to the previous webinars that William mentioned into the chat — specifically the one on ROI and the one on machine learning ops — plus the links to all the on-demand webinars for the series. And again, just a reminder: I will send a follow-up email by end of day Monday with links to the slides and recording from this webinar, along with those other additional links. Thanks, William. Thanks, everyone. Hope you all have a great day. Bye-bye.