Hello, and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager of the University. We would like to thank you for joining the latest installment of the Monthly University Webinar Series, Advanced Analytics with William McKnight, sponsored today by ChaosSearch. Today, William will be discussing estimating the total cost of your cloud analytics platform. Just a couple of points to get us started: due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A section, or if you'd like to tweet, we encourage you to share your questions via Twitter using the hashtag #AdvAnalytics. And if you'd like to chat with us or with each other, we certainly encourage you to do so. To open the Q&A or the chat panel, you'll find the icons for those features in the bottom middle of your screen. Just note that the chat defaults to sending to just the panelists, but you may absolutely change that to chat with everyone. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now, let me turn it over to Courtney from ChaosSearch for a brief word from our sponsor. Courtney?

Hello and welcome. Hello, Shannon and William. Thank you both very much for having me. I'm going to go ahead and share my screen here. I appreciate the opportunity to talk with this audience a little bit today. Hang on, let me just go to full presentation mode. Slide show. There you go. How's that? Hopefully that has shown. Is that good? Yeah. Okay, great. First of all, we were here back in January for an awesome session, so thank you very much for having me and the ChaosSearch team back today. If the session generates a little thought, or questions you may have as it goes on, do feel free to share them here, and William and I will answer them live at the conclusion of the call.

So ChaosSearch: what do we do? We help modern organizations know better. We are in the data lake space; we help our customers activate the data lake for analytics of all kinds. We do that by delivering a platform to our customers that indexes all cloud data and makes it fully searchable, making analytics at scale, with massive reductions in time, cost, and complexity, a reality. So for organizations today who are perhaps having a log analytics challenge, maybe they're using Elasticsearch or some sort of log analytics tool and looking to get access to more information and to get to that information really quickly, we allow you to store information like that in an S3 or Google Cloud environment and activate it for analytics on demand. So you can connect and be gaining insights in less than five minutes; you can actually see that on our site. But to talk a little bit about how we're helping people and some of the challenges that we've seen in this space, and I know we'll talk a little bit about efficiencies and managing costs today on the call: there are all sorts of challenges that have become realities when it comes to accessing data today.
If you think about the original promise of, hey, go to the cloud, get access to all your data for insights at scale: you're going to maximize efficiency, you're going to be the most secure, all your data will be sitting in one environment, you can efficiently manage growth, and all of a sudden you will get a fast time to insight. But the truth is that for many organizations, particularly organizations who are heavily invested in and have moved to cloud services as their backbone, what's happened is that they've been successful in moving data into those sorts of environments, but depending on the sort of question you're trying to ask, that information often exists in a very siloed environment. And what ends up happening is that this promise of being able to get to all data really quickly for better insights is not at all a reality for many organizations, because the data volumes themselves are exceeding both capacity and the limitations of cost, right? If you are paying for every piece of data you are storing, there is a threshold to how much information you can make readily available for insights. And the end result, ultimately, and we've all experienced this, is that you have gaps in data access: can you get enough historical perspective? Can you ask the questions you want on demand? Are you limited in the number of queries you can run? And ultimately a loss of insight, which we know is a problem that is not going away. So continuing to think about modern architectural solutions that will help organizations get access to more data, and do that in a really cost-effective way, is certainly a topic that we have spent a whole lot of time considering here at ChaosSearch.

What that really means for us: think about the challenge of saying, how do I actually get to any and all of my data? Maybe I'll put this in an operational analytics lens: how do I get my site traffic, my user data, my billing detail, and my customer interactions available to me without having to do a whole lot of moving and pipelining of data to do some analysis? What would I really want? Would I want automated data access at massive scale? Would I like to dramatically reduce time to insight and save up to 80% on my investments in tools for that sort of analytic infrastructure, without changing the way my users work? And what would the outcome be? You would get insights at scale, you would get immediate time to insight, you would free up resources to really think through garnering answers and spend less time working through a variety of data sets, and you could ultimately see the world the way you want it.

At ChaosSearch, those are the guiding principles for what we've built from a platform perspective. We have a data lake platform that, if you look at the bottom of this image and move to the top, comes right into your cloud storage environment. So let's assume that you have data sitting in S3, for example: you're using cloud object storage, and you have put a whole bunch of operational data into that environment. The ChaosSearch data platform directly connects to S3, and the combination of our fabric, index, and refinery means we have an index that sits in that environment, generates a single unified representation of all the data sitting in your S3, and makes it available for analytics at scale.
So you don't have to copy that data; we are reaching into that environment and virtually representing that data for end users. And then if you are the data consumer, the person who's running the reports, let's just say you're a DevOps professional looking into cloud ops data to see what's happening on your site at any given moment, you're coming in through a Kibana front door and Elastic API, right into the ChaosSearch platform. So we're really uniquely suited to meet people where they're at in cloud storage, and then make data accessible to them, no matter the type, for analytics at scale.

If you take the log scenario a little bit further, what we can do is say: you have operational asks that are tied to the performance of your website, but you want to marry that data to customer detail that maybe sits in your marketing system, in our case here at ChaosSearch, maybe a HubSpot or a Salesforce. Today you have multiple users creating different access points for that data. What we're doing at ChaosSearch is saying that if that data sits in S3, you can make it available in any way: a marketing user can access that information, and a DevOps user can access the same set of information but ask different questions of the data set. So for an end user, what does it really mean? It means: can you really get one unified data lake for analytics at scale? Absolutely. And the simplicity and automation of placing all this information into a single platform allows us to say we can get you out of data pipelining and data movement in a very, very significant way. There's no more schema management. There's no sharding. There's no managing of server clusters, no dealing with uptime requests. And our end users are using the same set of analytics tools. So basically, at the infrastructure level where storage sits, it's the same. At the application level, our open API approach allows each of the applications to exist exactly as they did before, but you now have access to more information directly from your cloud storage. And the end result is both scale and performance for these analytic workloads and a cost savings of up to 80%, which we know in today's cloud world is a very, very significant outcome to be able to achieve.

How do you do it? Well, if you're interested in trying ChaosSearch, or thinking about how simple this really is, the reality is you can store, connect, and analyze in five minutes or less: store your data in S3, configure your S3 or GCP connectivity, click to index, click to create a view, and then ultimately get into analysis in less than five minutes. That's something you can try on our website today, but it is as easy as store, connect, and analyze.

And finally, what I'll show here is what a log analytics environment looks like today. Very specifically, if you're using Elasticsearch, you likely have retention limits because you're trying to keep cost manageable. It's expensive to scale, and it's hard to scale. Management configuration can be a real challenge, and downtime ends up being a reality because you're constantly having to change how the system works. You have multiple different clusters in which your data resides. With ChaosSearch, your bucket is singular: you have everything sitting in your object store.
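[Editor's note: to make the "Kibana front door and Elastic API" point concrete, here is a minimal sketch of what an Elasticsearch-compatible search call looks like from code. The endpoint URL, index/view name, and credentials are hypothetical placeholders, not ChaosSearch's actual connection details, which may differ.]

```python
# Minimal sketch: querying an Elasticsearch-compatible API endpoint.
# The host, index/view name, and credentials below are hypothetical
# placeholders, not ChaosSearch's actual connection details.
import json
import requests

HOST = "https://example-analytics-endpoint.example.com"  # hypothetical
INDEX = "cloudtrail-logs-view"                           # hypothetical view

# Standard Elasticsearch query DSL: count 5xx responses in the last hour.
query = {
    "query": {
        "bool": {
            "must": [
                {"range": {"status": {"gte": 500}}},
                {"range": {"@timestamp": {"gte": "now-1h"}}},
            ]
        }
    },
    "size": 0,  # we only want the hit count, not the documents themselves
}

resp = requests.post(
    f"{HOST}/{INDEX}/_search",
    auth=("ACCESS_KEY", "SECRET_KEY"),  # placeholder credentials
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
    timeout=30,
)
resp.raise_for_status()
print("5xx responses in the last hour:", resp.json()["hits"]["total"])
```

Because the request shape is the standard Elasticsearch query DSL, existing Kibana dashboards and Elasticsearch client libraries can, in principle, point at such an endpoint unchanged, which is the compatibility claim being made here.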
Each of your users, no matter the request, can log in through the Kibana API and ask their series of questions without the limits around retention or access that have become all too familiar for organizations today. And just to conclude, thinking about how one of our customers is really experiencing this: consider Blackboard, an online education company whose growth just absolutely exploded during the COVID time period. What ended up happening is that their SRE teams were struggling to manage the vast and varied amount of logs that it takes to support millions of users on their platform every day. It's business critical that they have uptime so each of their students can access the platform. And with ChaosSearch, what can they do? They have a single solution for all of their logs without the hassle of managing it. It's been an awesome experience, and there's some additional detail on that on our website as well. So with that, I am more than happy to take any questions. As I noted here, you can try out ChaosSearch anytime directly at chaossearch.com. And with that, I will turn it over to Shannon, who will introduce William.

Courtney, thank you so much for kicking us off, and thanks to ChaosSearch for sponsoring and helping make these webinars happen. If you have questions for Courtney, feel free to submit them in the Q&A section of your screen, as she will be joining us in the Q&A at the end of the webinar today with William. Now let me introduce our speaker for the series, William McKnight. William has advised many of the world's best-known organizations; his strategies form the information management plans for leading companies in numerous industries. He is a prolific author and a popular keynote speaker and trainer. He has performed dozens of benchmarks on leading database, data lake, streaming, and data integration products. And with that, I will give the floor to William to get his presentation started.

Hello and welcome. Thank you, Shannon, and thank you, Courtney. I trust my presentation is being presented okay at this point. We have a big topic here today, and as I was rehearsing for this, I found it difficult to contain the subject in the 35 or 40 minutes that we have together. But I'm going to do my best here and hopefully stimulate some activity on your part to get further details on answering this very important question, which is probably the number one question that I get in regards to analytics: how much is it going to cost? That's the number one question I get. So clearly, people care about it. Of course, they care about the results, but first, they care about the cost. And so I want to give you some structure around giving an answer to that question. Because rolling back a few years, if you asked me that question, I would go, it depends. And I would say it's a silly question, and we have to break it down, and on and on. But after doing quite a bit of research last year, and also discussing this question with many actual enterprise clients that had multi-year budgets in front of them, in arrears as well as going forward, around their analytics platform, I've developed a stronger sense, I guess, of how to answer that question. And I'm going to share that with you today so that you also have a stronger sense of answering that question, because we can't just say it depends. Okay, so this is a research-based presentation. And I've been introduced.
I do this stuff for a living, absolutely, and I enjoy it very much, and I hope you do too. Now, I'm going to start, not with the numbers quite yet, but with getting data under management, because that's what we're trying to do in the analytics platform. I'm trying to compare apples to apples, which is very difficult, and I'm going to try to explain all my assumptions as I go forward and get into some numbers here; it's still difficult. But I'm talking about that data that we're trying to get under management from an analytics perspective. When is it under management? Well, when it's on a leverageable platform, or in other words, the appropriate platform for its profile and usage. If it's a single-application platform, so be it; let's make sure that it's not locked down too hard if we're doing that. But if it's a shared data warehouse or data lake, or MDM platform or some such thing, some hub, all the better as far as I'm concerned, because you get tremendous leverage off of that. And everybody has a mix of this. The data is under management when you have high non-functionals, when you have appropriate levels of the things you see there: availability, performance, scalability, stability, durability, and security. Now, all these things really play into the cost structure of the platform, because without features that you need, you're going to have to develop them and do the so-called workarounds, and that takes time, and that takes money. And I'll get into all that. Data must be captured at the most granular level today. It's not really good enough to say, I'll just absorb summaries into the data platform and that'll be good enough, because it's not good enough for very long, we've found. And data is at a data quality standard, as defined not by whims, but by data governance. So that does mean you have to have a bit of a sense of data governance. I am going to be focusing on the analytical side of the equation here today; of course, all this is true for all data, but we're going to stay on the analytical side, because that's enough.

Yes, the analytical architecture: something like what you see here, the data platforms, data warehouses, data lakes, streaming data. I show you some representative logos, not meant to be exhaustive by any stretch, but to give you an idea. It's the warehouse itself, really, that to me is the heart and soul of this whole platform, and everything kind of rolls down from there. It doesn't work that way in every shop, but I believe that's a good place to think about where your heartbeat is for this whole architecture: the data warehouse platform. But you have all these other sorts of things. And if you're still in a kind of older-school ETL structure, where you have source, target, data integration, and BI on top of your destination, you might want to think about jumping into this pipeline approach, because that is what's going to enable machine learning and AI, which will be, I believe, the foundation for competitive advantage over the course of the next probably 20 years, and beyond that, who knows. But this is all mostly in the cloud today, or at least headed toward the cloud, this analytical side; at least new things, more often than not, by a good margin, end up in the cloud. And then of course, you have your BI tools, your SaaS apps that are analytical in nature, and, very importantly, you have your machine learning applications on top of all this.
So this is what I'm talking about when I'm talking about the cost structure. I'm not talking about your ERP systems, supply chain management systems, operational systems, and so on. However, even as I say that, I think about all the analytics that go into those, but most of them are developed here and shared back operationally. And then you have things that are marginal, right? Like Kafka: is that operational? Is that analytical? It could be both; it could be sharing data out to both environments. And then also, as I move off this slide, I want to mention data virtualization over the top to get you access to multiple of these structures at once, which is a great way to make sure that you can get access to the data that you need.

Now, as we get more into the quantitative side here: total cost of ownership is more than just cloud costs, and we'll get into what all the costs are, or at least all the costs that I'm going to be thinking about. But these things factor in. Without autonomous administration, you have to do a lot of manual administration of the environment. By the way, in case you don't know, none of these environments are plug and play, where you don't have to have great expertise to make it happen; they're still very much expertise-driven. If you lack platform features, that tends to lead to increased configuration and management. Now, I'm not really getting into the people costs today; I'm staying more with the stack costs, the hardware, software, cloud sort of thing. Certainly the people costs factor in, but my point here is really that the lack of these things, autonomous administration, platform features, and so on, affects the cost of the stack as well. Performance clearly does: many of the cost structures have to do with how much processing is occurring, and of course, if the system performs better, it's going to incur less processing. Furthermore, you're then not running the system at less than what you really want just to get around poor performance.

The slide also kind of begs the ultimate question, the big question out there that I get a lot, which is indexes: are they a good thing or a bad thing? Because if you don't have them in your database platform, you tend to say, well, it's autonomous administration, you don't have to do all that. But if you do have them, you tend to say, well, but look at what more you can do with indexes. I'm kind of on the fence about it, although I would personally prefer having those capabilities. I'm just a little less trusting, I guess, of the system to do the right thing; I'd like to take it under my control, but not everybody's that way. So it gets a little philosophical here. We'll get into more of that as we go along.

Cost predictability and transparency: this is a huge want out there for any new environment. It's a requirement. You need to know what you're getting into in terms of the cost structure. And it's not super easy, but it's not super hard; you just want to wrap your mind around it and figure out how you can track it. I like to take a peek at these things at least weekly to see how our environments are tracking toward their anticipated budget, just to make sure that nothing is spiraling out of control. And some of the things that you see on the slide are important as we get into the numbers.
For some platforms, you pay for compute resources as a function of time, but you also choose the hourly rate based on certain enterprise features that you need. Snowflake is famously this way; you have, I think it's about five levels of enterprise features to choose from, and most enterprises are going to be skewed toward the upper echelon of those. With some platforms, you pay for bytes processed, and the underlying architecture is unknown. As a matter of fact, increasingly the underlying architecture is becoming unknown. That doesn't mean you can't figure it out with a little bit of experience and smarts; I believe we have figured them all out. But nonetheless, that's not public information, and they, being, you know, the Googles, the Amazons (AWS), the Snowflakes, and so on, Oracle, Teradata, are not really keen anymore on sharing with you what it is, so that they can swap it out and move on. You're paying for slots, or you're paying for units, or something like that, and it's more of a mystical kind of thing. The environment is scaled automatically without affecting price. And there's also a cost-per-hour flat rate, where you would need to calculate how long it would take to run your queries to completion in order to predict costs. Ask for a sample bill from these guys and make sure that you understand it, because oftentimes there's some shock that happens after that first bill, because you didn't know about this or that. And it's all there; they'll share it, but you have to ask.

Cost consciousness and licensing structure: these are other requirements as you step into your cloud data warehouse, which, again, I'll say is the heartbeat of the entire analytic infrastructure. So this is where we're going to start with a lot of the costs, and you're going to see what we found in our research in terms of how important this is to the overall cost. Be on the lookout for cost optimizations like not paying when the system is idle, and compression, to save on storage costs. Of course, compression can cost as well when you're actually querying the data, so there's a fine line there, but on balance, the better the compression, the better off you are. And moving or isolating workloads to avoid contention: if you have to do things like that, sort of forced processes just to make things happen, that is going to add up in your costs. Look for the ability to directly operate on compact open source formats, Parquet and ORC, because they're becoming increasingly popular, and if you have to convert them, that could be a really big cost for you. Also, costs can spin out of control if you have to pay a separate license for each deployment option or each machine learning algorithm. So, again, know what you're getting. And I guess I like the buffet style in terms of features and functions, because it slows things down considerably when you see something that you'd like, but, oh, you haven't paid for that; that's not part of your package. Finally, also consider whether you will be paying per user, per node, per terabyte, per CPU, per hour, etc., because all those options are out there. As a matter of fact, one thing I like to say is that in cloud data warehousing, data professionals who used to be valued for tuning queries are now valued for tuning costs.
So we've got to keep an eye out for those costs just as much as everything else. And a lot of it does fall back on the data professional that's also doing the tuning, because there's that level of expertise required. You have such things as reserved instances, spot instances, bring-your-own licenses, containers, and serverless computing. Reserved instances are typically a one-to-three-year commit, and cheapest when paid in full upfront. Oh, but I thought that you could pay by the month, pay as you go? Well, there's a cost for that, and discounts are offered if you can pay upfront and if you can make longer commitments. Although those longer commitments can be counter to agile, so what most people do is they'll do their POC, they'll do their MVP at the higher cost structure, not making the big commitment, and when they see the value in sight, I guess the ROI in sight, and something that they want to commit to, then they can make that longer-term commitment and get the cost break. Of course, at the same time, the vendor starts leveraging the situation from day one for that commitment. So it can be difficult.

The price-performance metric is dollars per query hour. If you've seen any of my benchmarks, you know that's really important. This is defined as a normalized cost of running the workload on the cloud platform, and it's, as you might imagine, how much things cost for the performance that you're getting, which is the ultimate metric: not just price, but price-performance.

So now, building up to the pricing model, we have to understand some things, and the terminology is a little bit all over the board when you're looking at these things. In Azure, for example, the Azure SQL Data Warehouse is scaled by Data Warehouse Units, DWUs (nobody else uses that), which are bundled combinations of CPU, memory, and IO. I'm not going to read the whole thing. Redshift uses EC2-like instances with tightly coupled compute and storage. Snowflake nodes are a loosely defined measure of virtual compute resources. And Google BigQuery doesn't use the concept of a node, but the concept of a slot, which is a unit of computational capacity. In Azure, for example, one unit may be something like 64 gig of memory, 8 virtual CPUs, and 1 terabyte of disk, plus or minus. So that's what we're talking about when we're looking at these funny words. With BigQuery, you pay for bytes processed, the underlying architecture is unknown, and the environment is scaled automatically. So it's serverless, endless, bountiful computing, right? You'd multiply the terabytes of data by the on-demand dollars-per-terabyte price. There's also a cost-per-hour flat rate, where you would need to calculate how long it would take to run your queries to completion, and it's hard to know that without actually having run some of the queries. So one of the things I'm going to encourage you to do is some benchmarking, before you buy and after you buy, on a regular basis, to make sure that you're in the right place, you can do a budget, and you're on track. And I'll just mention quickly that the slot and these other things could be evolving units of measure, where the hardware is incremented in the clouds. So a 2019 one could be underpowered compared to a 2022 one of these, and that makes equivalency rather difficult to come up with.
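[Editor's note: to put those units into arithmetic, here is a small sketch of the two broad pricing shapes just described, time-based compute versus bytes processed. The hourly rate, scan volume, and usage pattern are illustrative assumptions, not vendor quotes.]

```python
# Sketch of the two broad pricing shapes described above. All unit prices
# here are illustrative placeholders, not vendor quotes.

def time_based_cost(hourly_rate: float, hours_running: float) -> float:
    """Compute-as-a-function-of-time (Synapse/Redshift/Snowflake style)."""
    return hourly_rate * hours_running

def bytes_based_cost(tb_scanned: float, dollars_per_tb: float) -> float:
    """Bytes-processed pricing (BigQuery on-demand style)."""
    return tb_scanned * dollars_per_tb

# Example month: a warehouse running 12 hours/day at an assumed $32/hour,
# versus an on-demand engine scanning an assumed 400 TB at $5/TB.
print(f"time-based:  ${time_based_cost(32.0, 12 * 30):,.2f}")  # $11,520.00
print(f"bytes-based: ${bytes_based_cost(400, 5.0):,.2f}")      # $2,000.00
```

The point of running both formulas side by side is that neither shape is universally cheaper; the answer depends entirely on your workload.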
So why am I going into this level of detail? Well, it's because this all plays into the ultimate cost. At least it played into the research that we did that came up with what some of these costs are going to be. And predictability is hard. Anybody with any scale would not do on-demand with, for example, BigQuery, because they're charging you $5 per terabyte queried, and no one can predict that. So instead you do dedicated slots, and it gets a little bit easier: you say, I'm going to commit to some number of slots a month, and it's going to cost me $20,000 or whatever per month. And Redshift will have you set up CloudWatch, where you can get alerts and things like that, so that's pretty nice. But the bottom line I want to state is: don't be a one-issue voter. Don't just look at one thing and say, aha, this database is bad on that factor, I can't select it; or, this database is the best at this, it has this security certification that I need, etc., and just select on that.

So, understanding price. The price-performance metric is dollars per query hour; that's important, as I mentioned before. Each platform has options, and buyers should be aware of all of their pricing options. For Azure SQL Data Warehouse, you pay for compute resources as a function of time. For Amazon Redshift, you also pay for compute resources as a function of time. And I'm just picking on a few here. For Snowflake, you pay for compute resources as a function of time, just like those two; however, you choose the hourly rate based on certain enterprise features that you need, and there are some of the enterprise levels that they offer. With Google BigQuery, one option is to pay for bytes processed at a number of dollars per terabyte. There's also a BigQuery flat rate, where you get those slots for a flat fee.

So what might I want to mention about all of this pricing model? Well, Azure, for example, lets you quickly implement a high-performance, globally available, and secure cloud data warehouse by independently scaling compute and storage, while pausing and resuming your data warehouse within minutes, through an MPP architecture designed for the cloud. A slot, which of course is BigQuery's, is a logical unit of compute measure. Google doesn't say what it is, but we can tell it's an evolving unit of measure that Google might change over time. So you can have some, I guess, heterogeneous types of slots in your environment over the course of time, and they still could be priced similarly as time goes on. And time will go on, technology will go on, capabilities will increase. One final word on that slot: 500 slots today may perform like 250 slots in the future, if they allow it to evolve in that way. So we'll see.

Pricing gotchas: there's a lot of memory pressure on scale-out compute. Scale-out is a two-edged sword. Whenever a data warehouse does not have enough memory to build a join hash table and keep it in memory, it has to spill to disk, and this is obviously very impactful on performance, because the database has to do double work in writing, sorting, and reading the hash table information, rather than doing all this in memory. You'll want to be aware of this as you step into your platform. And not everything, by the way, is end-of-month peak processing or Monday-through-Friday, nine-to-five overflow processing; sometimes it's just queries, which can be run at any time, that are very important.
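[Editor's note: circling back to the on-demand-versus-slots point above, those two figures, $5 per terabyte on demand versus a slot commitment on the order of $20,000 a month, imply a breakeven scan volume you can compute directly. A quick sketch, treating both numbers as illustrative.]

```python
# Breakeven between BigQuery-style on-demand and a flat-rate commitment,
# using the illustrative figures from the talk.
ON_DEMAND_PER_TB = 5.0        # dollars per terabyte scanned (on demand)
FLAT_RATE_PER_MONTH = 20_000  # dollars per month for a slot commitment

breakeven_tb = FLAT_RATE_PER_MONTH / ON_DEMAND_PER_TB
print(f"Breakeven: {breakeven_tb:,.0f} TB scanned per month")  # 4,000 TB

# Above ~4,000 TB/month of scanning, the commitment is cheaper; below it,
# on-demand is cheaper. But only the commitment gives a predictable bill.
for tb in (1_000, 4_000, 10_000):
    on_demand = tb * ON_DEMAND_PER_TB
    print(f"{tb:>6,} TB: on-demand ${on_demand:>7,.0f} "
          f"vs flat ${FLAT_RATE_PER_MONTH:,}")
```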
So get enough concurrency to satisfy your needs, right? If you're not specced for that, the number of users is the thing that triggers a lot of the scale-out, so that is something to watch for. And you can get into a situation like this, where scale-out impacts cost. Here you can see some calculations that we did: we're in the two-and-a-half-million to 3.3-million range here for running a cluster, showing how some peak end-of-month processing, or some scaling out during Monday through Friday, nine to five, or what have you, or the combination, can really escalate your costs for the environment. So yeah, scale out at your own risk; there's great downside cost. If you've budgeted the 2.2 million and it's automatically scaling at 2x the units of what you have, because that's how these things tend to do it, then your costs are really jacked for that time period until it can get settled down, and it's pretty conservative about when it does that, by the way. After you go into production, you realize a single cluster is not enough compute power to get you through periods of heavy usage, but it's hard to add clusters on the fly after you get started. These additional concurrent users and their queries are in addition to the normal day-in, day-out load. The scenario is called in-month and represents in-the-month intensive processes that trigger a scale-out event every month, and you can see what happens to your cost structure. So we're going to assume a little bit of that; we're going to assume we're not perfect as we get into the technology stacks and the costs.

And I'm going to show you some stacks here so you know what I'm talking about. Again, these are the assumptions that underlie the numbers. These are very important, because you might think, well, is William talking about the full-time costs, the consulting costs to run all the projects too? No, I'm not; you have to be careful with that. Is William assuming the development and the QA costs? Yes, I am, for those. And how about all the operational systems that are providing the data to this? No, I'm not including that. What am I including? I'm including all this stuff: these 11 things that make up the modern stack. Now, you see some technologies in here, some stacks, if you will. Nobody's perfect in terms of sticking with one stack forever, right? Maybe some are for now, but things change. But there are shops that are trying to be all Azure, trying to be all AWS, trying to be all Google, trying to be all Snowflake as much as the technology's offered, and others. I am not meaning to exclude any stack, and I'm not being competitive about this at all today. I'm going to talk about these stacks a little bit, but I'm not going to try to distinguish their different cost structures very much; when you get into the millions, it can be a little more here or there. And, you know, if you have the partnership, if you have the skills, if you have the leverage going on with these vendors, that plays in as well. But I am going to talk about reality. I am going to talk about what I've seen out there in terms of what you're getting from a modern stack and how much it's going to cost.
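[Editor's note: before the stack categories, here is a rough sketch of how a recurring in-month scale-out event like the one above inflates an annual compute budget. Every number in it, the baseline hourly rate, the peak window, and the 2x step, is an assumption for illustration, not a measured figure from the research.]

```python
# Rough sketch of the in-month scale-out scenario: a baseline cluster that
# doubles its billed units for a peak window every month. All numbers are
# assumptions for illustration.
BASE_HOURLY = 250.0        # assumed baseline cluster cost, dollars/hour
HOURS_PER_YEAR = 24 * 365
PEAK_HOURS_PER_MONTH = 72  # assumed 3-day end-of-month crunch
SCALE_FACTOR = 2           # these platforms tend to scale in 2x steps

baseline = BASE_HOURLY * HOURS_PER_YEAR
# During each peak window, the 2x scale-out bills one extra baseline unit.
scale_out_premium = BASE_HOURLY * (SCALE_FACTOR - 1) * PEAK_HOURS_PER_MONTH * 12

print(f"budgeted (no scale-out): ${baseline:,.0f}")  # $2,190,000
print(f"actual with scale-out:   ${baseline + scale_out_premium:,.0f}")
print(f"overrun:                 {scale_out_premium / baseline:.1%}")
```

Even this modest assumed pattern adds roughly ten percent to the annual bill, which is why an unplanned scale-out event can blow through a budget set against the baseline alone.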
Quickly, though, the categories: you see them down the left. Actually, I'm going to let you read that; I'm going to let you read all of the different tools and technologies that you need to fill out the stack. But keep in mind, this is for an enterprise-level client, enterprise-level account, enterprise-level project: big projects. If you're not that, then you might be able to back off some of this. You might not need, for example, the data catalog. You might not need machine learning initially. All right, I'm going to question you about that, but you may not initially, and I get all that. So let's move on.

This is a sample stack cost breakout. I did this for all the stacks, and again, I'm not trying to be competitive here today, so I'm just going to talk about this one as a sample so that you can see what the cost breakout potentially is. You can see the big issues here are the dedicated compute and the data lake. Wow. That's the data warehouse and the data lake: your data platforms. Those are the big contributors to your underlying budget for the stack, by quite a margin. The dedicated compute, remember I said the data warehouse is going to be your heartbeat, that's 42%. The data lake, that's half of that. Data integration is not insignificant by any stretch; that can be 14% of your overall stack, or more. And then you have all the odds and ends that you need, that add up, that are important as well: data exploration, Spark analytics, streaming, storage. Now, storage can be plus or minus hundreds of percent relative to that 2%. The data catalog, you might go with what you get; that's why identity management and data catalog are zeros here: they're included in this stack at no extra charge, at least for now, though maybe not in your stack. So that's why I included them even though they're zero. You've got your BI, you've got your machine learning. And this is an example of the breakout, like I said before.

Now let's drill in a little bit on the big items. Dedicated compute. Here are some examples. Now, the prices that I'm showing you here that go into the calculations are from about one year ago, plus or minus a few months, so they may have changed; as a matter of fact, I know they have changed, but not by significant amounts. Azure Synapse Analytics Workspace: that's pay-as-you-go, $1.20 per hour per 100 of those DWUs I talked about before. And then you can see how all the others do it. The dedicated compute category represents the part of the analytics stack that is the data warehouse itself. A modern cloud data warehouse must have separate compute and storage, and the analytics platforms that we're using here today as examples have separate pricing models for compute and storage. Thus, this TCO component deals with the cost of running the compute portion of the data warehouse; that's what we're looking at. As you can see, it's the overwhelmingly highest cost contributor. So, for example, for Azure, we opted for the unified workspace experience in Azure Synapse Analytics. While you can purchase reserved capacity for a dedicated SQL pool resource under their legacy pricing model, reserved capacity pricing was not available for Azure Synapse Analytics Workspace, at least at the time; that may have changed a little bit. For AWS, we chose their latest RA3 family of clusters. For Google, we chose BigQuery with dedicated slots, which are much more economical than on-demand pricing, which was $5 for every terabyte scanned.
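[Editor's note: to turn that Synapse pay-as-you-go rate into a budget line, the arithmetic looks like this; the 1,000-DWU size and the 12-hour day are assumed for illustration.]

```python
# Worked arithmetic for the pay-as-you-go rate quoted above:
# $1.20 per hour per 100 DWUs. The 1,000-DWU size and 12-hour day are
# assumptions for illustration.
RATE_PER_100_DWU_HOUR = 1.20
dwus = 1_000
hours_per_day = 12   # paused outside business hours
days_per_month = 30

hourly = RATE_PER_100_DWU_HOUR * (dwus / 100)      # $12.00/hour
monthly = hourly * hours_per_day * days_per_month  # $4,320/month
print(f"${hourly:.2f}/hour -> ${monthly:,.2f}/month; "
      f"~${monthly * 12:,.0f}/year for this one component")
```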
For the stack built on Snowflake, you might say, well, why are you using Snowflake for a stack? It doesn't do everything. Well, that's true, but it does a lot, and it's very popular, so I do include it. You might call it a heterogeneous stack, if you will, because you have to fill it out a little bit, but I included it here, and we're calling it the Snowflake stack. So for the dedicated compute, of course we use Snowflake. We used their Enterprise Plus pricing model, which in my experience is the most common, and that gives you multi-cluster capabilities, HIPAA support, PCI compliance, and disaster recovery. Now, Synapse and Redshift both have the ability to manually pause compute, and therefore you're manually pausing the billing when that resource is not needed. Snowflake does an automatic pause; usually we just set that to five minutes or something like that. These three platforms allow you to scale the compute size up and down. With a BigQuery annual slot commitment, you get the best pricing, but there is no way to pause that billing; you need to buy more expensive flex slots for that. Now, coming up with the number of DWUs you need for Azure, the number of nodes, the number of slots, the number of credits, whatever is needed for each of these: respectively, we used our field tests to determine a like-for-like performance equivalent. So you're going to see some numbers, we're going to make some assumptions about what it takes to run projects and what it takes to run an enterprise, and it's going to be pretty revealing, I think.

The data integration category, and I'm not going to belabor it here, but you can see that with Azure, you have the Data Factory pipeline; for AWS, you have Glue; for Google, you have Dataflow. Those may or may not be good enough for what you're doing. We're going to assume they are, but they may not be. If they're not, then you're going to have to go with what we went with for the Snowflake stack, which is something else: we used Talend as an example. It did elevate the cost of that stack quite a bit, but it offers features that you may have to do workarounds for in the other ones, so you have to pick your poison in regard to that. So Talend, Informatica, Matillion, what have you, in that last category.

The data lake: it's becoming increasingly important. We're putting a lot of data there that might be considered, quote-unquote, colder, but it's becoming hotter, and it's becoming the platform where we are putting those big volumes of unstructured data. Azure HDInsight, Amazon EMR, Google Dataproc, and for Snowflake, we're going to use the Cloudera data hub on AWS S3. And there you can see they all have a per-hour cost, and incredibly, Google Dataproc is priced considerably less than its competitors per hour. So that's helpful if your stack leans a little more heavily in a data lake direction. I'm just picking on some of the different components of the stack cost.

Data exploration: of course you've got to do that, and you're going to need multiple tools here. I'm not really getting into the BI side of things, but more the exploration side of things. I would say ChaosSearch is also in at this level and something to consider, beyond the kind of givens here: you've got Azure Synapse Serverless, Amazon Redshift Spectrum, Google's BigQuery, and for Snowflake, Snowflake itself.
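[Editor's note: going back to the pause-the-billing point for a moment, here is a sketch of why it matters so much to the bill, comparing an always-on warehouse to one that auto-suspends five minutes after each burst of activity. The credit price, warehouse consumption rate, and usage pattern are all assumptions for illustration, and the model ignores per-second billing minimums.]

```python
# Why auto-suspend matters: compare an always-on warehouse to one that
# suspends five minutes after each burst of activity. The credit rate,
# warehouse consumption, and usage pattern are all illustrative assumptions;
# real billing granularity (per-second, with minimums) is ignored here.
CREDIT_PRICE = 3.0          # assumed dollars per credit
CREDITS_PER_HOUR = 8        # assumed warehouse consumption rate
busy_hours_per_day = 6      # actual query activity
bursts_per_day = 20         # separate bursts, each followed by 5 idle minutes
suspend_tail_hours = bursts_per_day * (5 / 60)  # billed idle tail per day

always_on = 24 * CREDITS_PER_HOUR * CREDIT_PRICE * 30
auto_susp = ((busy_hours_per_day + suspend_tail_hours)
             * CREDITS_PER_HOUR * CREDIT_PRICE * 30)

print(f"always-on:    ${always_on:,.0f}/month")  # $17,280
print(f"auto-suspend: ${auto_susp:,.0f}/month")  # ~$5,520
print(f"savings:      {1 - auto_susp / always_on:.0%}")
```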
With Snowflake, again, you pay for the compute, but not the data scanned. So for the scenarios that we're using, we have to assume a certain number of terabytes scanned: something like 500 per month for a medium enterprise, and maybe 2,000 terabytes scanned for large organizations.

So as we get into the, drumroll please, technology stack cost: again, I'm not trying to exclude other stacks, but for brevity's sake I can include only so many, and I'm not trying to be competitive, so you're not going to see stack names on the following charts; you're just going to get a range. Now we move up to our projects, and these are mid-sized projects, which begs the question: how do I know if I'm small, mid-sized, or large, what have you? Well, okay, that's difficult, and as a matter of fact, you might be a large organization but have more of a mid-sized stack for your project. Take customer 360: you might have more of a mid-sized orientation around that because you're B2B, you have a handful of customers, and you're not going to go too hot and heavy around that. But if you're a bank with millions and millions and millions of customers, you might even be a mid-sized bank, but you have a large-sized project when it comes to customer 360. So I am talking about the projects now; we'll get to the enterprise.

And this begs another question, because you see some millions of dollars on here, right? How long is that good for? Because I know for a lot of you, and a lot of me, running projects is agile, which means they just go on and on and on. And so the typical conversation around cost, for me anyway, goes like this: well, how much have you paid for customer 360? Oh, we've paid 10, 15 million dollars, and it's taken 10 years. Yes, but you started to get value out of it at year two, and I mean, it was significant value at that point? Yes, but we keep feeding it. Yes, of course you do. But I am looking at as long as the typical project of this nature takes, not including people, and not including post-production, which is obviously far less. I've removed the platform labels; you see the range there: anywhere from 2.7 million to 5.9 million for your customer churn project. Now, for larger projects, obviously the prices go up from there. I am saying, by the different colors, that the stack matters to the cost, and you can see the degree to which the stack matters to the cost. But we're still talking anywhere from, let's say, 10 to 15 million for IoT streaming analytics for predictive maintenance as a large-sized project, and most large organizations are doing large-sized projects. Okay, so you can start to add up these projects. Now, once you get a project up and running in production and it's successful, there is a knock-on effect in terms of all subsequent projects that use the same type of structure, that use all of the components that I showed you before.

And so now I'm going to show you the whole enterprise cost for a couple of years, and obviously it's going to be more than any one of these applications that I'm showing you here, that I'm picking on today. These are important applications to the enterprise; these are typical of what enterprises are doing with the modern stack, and they cost what you see here. You may not think it, and this is where I said that it might be a little bit eye-opening, because most of the time when I share this research, people think, oh, I thought it'd be a lot less than that. Well, this is what it is.
Now, when you add it up to the enterprise level, the two-to-three-year numbers: I know this slide says three years, but I'm more or less thinking it's along the lines of a couple of years, because I'm seeing some escalation, so I made this two years. Anyway, my crystal ball gets a little fuzzy as we get out in the number of years, but over the course of a handful of years you can expect to pay anywhere from six to fifteen million if you're a medium enterprise, and nineteen to forty-three million as a large enterprise. So, backing up to all the stack components: can you see how important it is to get it right? Because you see how it matters; the stack selection matters. And that doesn't necessarily mean that everybody's going to pay more for stack B versus stack A. If you're more geared toward stack B as an enterprise, if you have those skills, that leverage, etc., that I talked about, then it may not be more expensive. So there's a lot of "it depends" in here, but I'm trying not to just kick the can down the road and say it depends to everything. This is what you'll pay, roughly, over the course of two to three years for your enterprise, doing projects like, and I'm going to back up a slide, projects like this, and getting leverage from one project to another. I'm not including post-production, and I'm not including people costs; people are also going to be roughly plus or minus 25%, the same with any of them.

Project ROI: that's what it's all about. It shouldn't be about the cost that I just showed you; you've got to know that, you've got to have that walking-around knowledge, but it should be about what the project's going to bring to your bottom line. Now, there's no point in spending more than you have to in order to get your bottom-line benefit, but there is a point in spending more if you're going to be able to get more, and there is a point in spending more if you're going to be able to achieve results faster. So keep in mind that the ROI is the bottom line on this, and please don't forget that. I feel like organizations too often get wrapped up in the cost of things and don't look at the ROI enough, because that would get things kicked off faster.

Now, I said earlier: design your benchmark; know the components that go into the calculations. I did it, and I kind of hit it down the middle. I do it individually for our clients, situationally, and I do it individually when we run benchmarks. But you've got to know: what are you benchmarking? Is it query performance? Load performance? What's important here? It's all important, right? But there's only so much time to benchmark. First, you're not going to lock and load your entire enterprise into the benchmark; you're going to have bits and pieces that you're going to be able to pull into it. But what I find too often is that you underestimate the benchmark, or you end up measuring a very small piece and overweighting that piece in your overall evaluation, because, after all, that's why you benchmark. Don't forget the things that go beyond the benchmark as well, the beyond-performance factors; it's not all about performance. There's manageability, there's security, on and on, right? But performance is very important. So you want to do your benchmark before you do your calculations, because, as I've shown you, that matters: there are different ways of pricing, based upon performance, based upon data, based upon processing, etc. There are different levels of projects, and there are different numbers of projects that you're going to do over the course of time. So what is it now, March? Yeah, if you start now, by the budget cycle for 2023 you might be ready with a better answer than you were last year.
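[Editor's note: tying the benchmarking advice back to the dollars-per-query-hour metric, here is a minimal harness sketch that times a workload and normalizes its cost. It assumes simple time-based hourly pricing and a generic run_query callable you would implement against your own platform's client library; the normalization shown is one plausible reading of the metric, not the speaker's exact formula.]

```python
# Minimal benchmark harness sketch: time a workload, then normalize toward
# the dollars-per-query-hour price-performance idea discussed in the talk.
# `run_query` is a placeholder you would implement for your platform; the
# $32/hour rate is an assumed time-based price.
import time
from typing import Callable, Sequence

def benchmark(queries: Sequence[str], run_query: Callable[[str], None],
              hourly_rate: float) -> dict:
    """Time a workload; report cost and a price-performance figure.

    One plausible normalization: compute cost of the run divided by
    query throughput (queries per hour).
    """
    start = time.perf_counter()
    for q in queries:
        run_query(q)                       # execute on the platform
    elapsed_h = (time.perf_counter() - start) / 3600
    cost = elapsed_h * hourly_rate         # time-based pricing assumed
    qph = len(queries) / elapsed_h         # queries per hour
    return {"elapsed_hours": elapsed_h, "cost": cost,
            "dollars_per_query_hour": cost / qph}

# Usage sketch with a stubbed runner and an assumed $32/hour rate:
if __name__ == "__main__":
    print(benchmark(["SELECT 1"] * 10, lambda q: time.sleep(0.1), 32.0))
```

Run before you buy to compare candidates, and after you buy on a regular basis to confirm the budget is tracking, which is exactly the cadence recommended above.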
It's not as easy as it seems. As a matter of fact, just to get the AWS pricing for the research you saw here today, I researched 35 to 40 different line items. There you see them, and it's fine print; I didn't mean for you to be able to read these off, but these are all the line items that go into the cost of the overall stack. It's not easy: I had to go to 21 different websites to get the full stack cost. They don't make it easy.

So, in summary, and then I look forward to your questions, if you have any for Courtney or myself; go ahead and put them in, this is your last chance, we're about at the Q&A section. But let me summarize my part here. Large project stack costs run between 7 and 23 million to get a full ML-based project to production, and 19 to 43 million over two years for the enterprise. Now, that's a big enterprise, but nonetheless, that's what you can expect, and you're hopefully going to get more than that in terms of value, right? But it does underscore the need to do things right. Buyer beware: the total cost of ownership of a cloud analytics platform scales up, too. Demand for analytics will only increase. Hardware is often the biggest performance bottleneck, and most cloud analytical products scale hardware in powers of two; in many systems, you can add more memory here or more CPU there at a more fractional cost. Remember, "only pay for what you use" is a two-sided coin. The true gauge of value is price-performance; thus, we recommend that you demand reliable performance at a predictable price from your analytical platform. And the true gauge of whether you should do a project or not is the bottom-line ROI, but you shouldn't have to pay more than you need to pay to get those same results. So hopefully you've gotten some information here: you can answer some questions a little bit better, and you can maybe go do some research now with a little bit more of a foundation in place. You want to make sure your numbers are coming in more or less where my numbers came in, or you might want to question your assumptions. You've also seen what a full stack looks like these days, and maybe that's a little eye-opening; I know it is for some people that I share this with, because they think it's half of that or something like that. Anyway, it's been my pleasure to bring you this presentation today, Estimating the Total Cost of Your Cloud Analytics Platform, and it's been my pleasure to have ChaosSearch along as the sponsor, and Courtney. I'll turn it back now to Shannon to see if we have any questions.

William, thank you so much. Before we answer the most commonly asked question, just a reminder: I will send a follow-up email by end of day Monday for this webinar with links to the slides and the recording. Diving in here: Courtney, this question came in for you as you were giving your presentation. If a lot of calculations or transformations are needed to support some reporting analytics, how are those handled in ChaosSearch?

Thanks, Shannon, and I'm happy to go into more detail on this answer. To give it a starting point: if you're using something like a Looker or a Tableau as your front door and you are doing calculations in that platform, that behavior doesn't change at all. The open API architecture of our system allows you to work as you always have in the BI tools of your choice. What might be a win is that because you are not having to rely on data coming across from multiple sources, but instead on that single unified representation of the data, it may reduce the complexity of what you might be
doing or manipulating in your Looker or Tableau or other type of environment, because it's no longer necessary.

I love it, thank you so much. William, this one came in early in your presentation, about 10 minutes into it: if you need dev and test, which could be multiple, and production instances, those could cost more. Can some pieces be shared?

Can some pieces be shared? I did assume that you would have dev, QA, and production, three environments, when I did the calculations. If you have more, I would question that, but some shops do, and yeah, it would cost a little bit more for that. Maybe these vendors are increasingly working with you to not charge for dev, not charge for test, and so on. Can components be shared? I'm not quite sure what that question means; you would want to have the same components and same projects in all of the environments. And we can definitely get into migration and path-to-production strategies if you want sometime, but yeah, that was factored in.

I love it, thank you. And do the prices include IO and network costs?

Yes, everything you need from a stack perspective, I'd like to think. I'm sure there are some things that we missed that are just sort of part of the foundation of enterprises, things we don't even think about anymore, but you've got to draw the line somewhere.

Well, that does bring us right to the top of the hour. Thank you both so much for these great presentations, and thanks to ChaosSearch for sponsoring these webinars and helping make them happen. Again, just a reminder: I will send a follow-up email by end of day Monday for this webinar with links to the slides and links to the recording of this session. Thank you both.

Thanks so much. Thanks to all of our attendees; hope you all have a great day. Thanks, everybody.