Hello and welcome to the Analyst Angle. I'm Rob Strechay, lead analyst for the Collective from SiliconANGLE and theCUBE. This is a new segment where we talk about the research we're doing and bring up interesting things, really separating the signal from the noise of all the marketing and hype out there, and bringing you insights not only from our research but also from research by some of our friends who are part of the Collective.

Today we're going to delve into the fascinating world of data platforms and the brewing battle for dominance in that space. The rise of data platforms was inevitable. Given the massive total addressable market, estimated to be in the tens of billions of dollars, it's no surprise that every vendor with a data-related offering is vying for a piece of this lucrative market. But how comparable are these different approaches? Today we're going to break that down, pull the signal out of the noise, and explore just what it means to be a data platform.

But before we dive into the details, let's define what we mean by a data platform. To us, a data platform provides the ability to programmatically use data through SQL queries and programmatic APIs like RESTful APIs; has built-in visualization tools, because you want to be able to see the data after you put it where it is; provides the ability to programmatically ETL or ELT (extract, transform, and load) that data to and from the data warehouse, data lake, and programmatic object storage; and/or has data access protocols such as ODBC or JDBC in place. So you need to be able to use the data once you get it to where it is. Essentially, a data platform is a comprehensive solution that not only stores data but also offers APIs for data ingestion, provides methods for data processing, addresses data governance (that's a really important one, especially now), and ensures scalability and performance, because you want to use it at cloud scale no matter where you are.
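As a rough illustration of that definition, here is a minimal Python sketch of the "load it, transform it, then query it with SQL" pattern. It is only a toy: an in-memory SQLite database stands in for a real warehouse or lake, and the table names, sample records and function names are all hypothetical, not taken from any vendor's product.

```python
import sqlite3

# Hypothetical raw records; in a real platform these would arrive
# through an ingestion API or pipeline rather than a Python list.
RAW_EVENTS = [
    ("2024-01-01", "store-a", "19.99"),
    ("2024-01-01", "store-b", "5.00"),
    ("2024-01-02", "store-a", "7.50"),
]


def load_and_transform(conn):
    """ELT: load the raw strings first, then transform inside the engine with SQL."""
    conn.execute("CREATE TABLE raw_sales (day TEXT, store TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", RAW_EVENTS)
    conn.execute(
        "CREATE TABLE daily_sales AS "
        "SELECT day, SUM(CAST(amount AS REAL)) AS total "
        "FROM raw_sales GROUP BY day"
    )


def daily_totals(conn):
    """Programmatic SQL access to the transformed data."""
    rows = conn.execute(
        "SELECT day, total FROM daily_sales ORDER BY day"
    ).fetchall()
    return [(day, round(total, 2)) for day, total in rows]


conn = sqlite3.connect(":memory:")
load_and_transform(conn)
totals = daily_totals(conn)
```

The point of the sketch is only that storing the data is not enough; the platform also has to expose a programmatic way in (the load step) and a programmatic way out (the query step).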
A data platform can be delivered as a build-your-own platform or as a service, and we're going to break that down a little further as we go through today as well.

Let's discuss the market for a second. For this I got together with fellow analyst Dave Vellante to dig into and examine the data that we get from our partner ETR. We were intrigued by one question that could tell us how mature this market is: Was it done? Was it already won? How were the different approaches coming at this market, and what was the state of affairs today within the survey base that ETR uses? The thesis was that although there are some large vendors in the commercial data platform space, such as Snowflake and Databricks, there is still a ton of room in this market. In fact, we talked about this in a recent Breaking Analysis: how much room there was, and the fact that it didn't look like there was any dominant player just yet. But we wanted to go the next level down and start to examine how much room there really is and where we can see it.

So let's take a look at the slide from ETR. Underneath the hood, we put in every storage vendor that we could think of, and we then overlaid on top of that the most common data platforms. Why did we do this? We wanted to understand the pervasiveness of storage vendors such as Dell EMC, NetApp, VAST, Pure, HPE, IBM and others within accounts, so that became the base number, the N as you could call it. Then on top of that we wanted to slice and dice how often Databricks, Snowflake, AWS, Google, Microsoft, Oracle, HPE with its Ezmeral product, and Couchbase popped up. We actually had a number of others that didn't even rate within these accounts. You can see the N is in the high 400s, 480 plus, so that's a pretty good N to be directionally accurate about where things stand.
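One rough way to read that overlay is as a simple share calculation: take the storage-vendor accounts as the base N, count how many of them also cite a given data platform, and treat whatever is left as room to grow. The Python sketch below assumes that reading. It is not ETR's actual methodology, the function names are made up for illustration, and the citation count of 120 is a hypothetical number; only the base of roughly 480 comes from the discussion above.

```python
def pervasion(accounts_citing: int, base_n: int) -> float:
    """Share of the base accounts that also cite a given data platform."""
    if base_n <= 0:
        raise ValueError("base_n must be positive")
    if not 0 <= accounts_citing <= base_n:
        raise ValueError("accounts_citing must be between 0 and base_n")
    return accounts_citing / base_n


def headroom(accounts_citing: int, base_n: int) -> float:
    """Share of the base accounts where the platform does not show up yet."""
    return 1.0 - pervasion(accounts_citing, base_n)


BASE_N = 480             # roughly the N quoted for the storage-vendor base
EXAMPLE_CITATIONS = 120  # hypothetical count, for illustration only

share = pervasion(EXAMPLE_CITATIONS, BASE_N)
room = headroom(EXAMPLE_CITATIONS, BASE_N)
```

With those illustrative inputs the platform would be present in a quarter of the base accounts, leaving three quarters as potential room to grow.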
So underneath the hood, again, the baseline is what storage vendors they're using, and overlaid on top of that is momentum within those accounts for those doing ML, AI, databases, data warehouses and data lakes on top of that storage or in the cloud. What you see, not surprisingly, is that Microsoft has a significant lead in awareness and mentions inside these accounts, up in the 70% range. When you break that down further and look at AWS and Google, AWS is about in the middle and Google sits around the 30% mark.

I think what was fascinating is the height: all three of those are above the 40% line, along with Snowflake and Databricks. The height is the momentum within those accounts: how often, or how much market share you could almost say, they are gaining within these accounts. So you can see that Couchbase is at about 5% plus, and HPE's Ezmeral is actually at minus 10%. What that means for those is that they're not growing as fast within these organizations.

So, just breaking that down, this gives us an idea of where people are really looking and how much room there is. Snowflake and Databricks are growing fast, faster than their cloud competitors; what's really interesting is that they're not pervasive within those accounts yet. So when we look at it from a headroom perspective, how much total addressable market, or TAM, they still have left to go after, there's a significant amount of TAM left. I think this is why we thought this would be a very interesting way to look at it: it helps explain why things are going the way they're going and why there's so much buzz about Databricks and Snowflake, even though there are other competitors in this market, and maybe we haven't even seen all of the ones that are starting to compete for this. So let's now dig in and come back to
talking about the characteristics of each of these and identifying the types of data platforms. Now, these different types of data platforms are not mutually exclusive. Many companies will have more than one, and some of these live in multiple categories, so there will be some overlap. We took a pretty simplistic approach to this. I would call it the SWOT analysis approach, which is looking at the strengths, weaknesses, opportunities and threats within each of these data platform categories. Then we dropped the threats, because that's really used more when you're analyzing specific companies and vendors. So we took a SWO approach, I guess you could call it, and we're SWO-ing through this, because each of the approaches to data platforms definitely solves use cases for building data products, with some caveats, and we'll explore those caveats as we go along.

So let's jump in and start by examining commercial, independent, as-a-service data warehouses, data lakes and data mesh as a group of vendors. Platforms such as Databricks, Snowflake, Starburst, Couchbase and others make up this group.

The first thing we're going to look at is their strengths. They provide several ways to access data programmatically. Again, going back to the definition, that's really important: how I'm going to actually use the data, how I can get access, how I'm going to engineer. So there are several different ways: they have data lakes, they have data warehouses, SQL, Python, and RESTful APIs as well. Then they have normalized and programmatic ways of ingesting data. If you can't get the data into your data platform, if you can't build pipelines on top, then it becomes really hard to actually use that data platform. The folks in this category do a good job and have a really robust ecosystem, not just
their own ways of ingesting the data by sitting underneath applications, but also partner ecosystems that supply the data pipelines as well. They also have catalog and data mapping features for governance. As I highlighted, data governance is more important than ever in these platforms: understanding your lineage, understanding where the data came from. In a future discussion we'll get into why that's so important; I'm not going to go super deep into it here, but understanding that people are building data products on top of these data platforms is huge. They really are building out the next generation of data apps with those data products, and we'll dig into that in a future Analyst Angle.

They have built-in storage platforms, for the most part usually object or file, which can be completely abstracted. Some abstract it fully, some give you choices, some allow you to go onto object storage in a cloud, and some require you to have some object storage and some more block-based storage. Different architectures, different cost profiles, and we'll delve into that a little bit as well. All of them are cloud scale and have performance, so this is a huge advantage for them. They can be deployed widely, and as they get deployed and as they scale, you're not reinventing how your data platform has to be scaled out. This is super important. They do scale differently, so your mileage will vary depending on which data platform you're using. A lot of them have been built on open source components and use those for their scaling capabilities as well.

Now let's take a second and dive into the weaknesses of these data lake, data warehouse and data mesh platforms. Understanding or comparing the cost of the service can be complicated, especially with bring-your-own storage. What we mean here is that it's not necessarily all that easy to understand why it's costing you what it does when maybe you used to do something
on-premise and you're trying to compare apples and oranges between the two, an on-prem data platform service and a cloud-based data platform service. A lot of that gets abstracted and is complicated. Once you start to bring in bring-your-own storage, you have to do some pretty significant math to understand how this is all pieced together. That can complicate things as well, but it can also lead to cost savings.

Options to run in co-location facilities such as Equinix, in a sovereign cloud, and/or in on-premise data centers may not exist, or may be a very limited version of the product. This is another really huge weakness today, although some of these as-a-service models are being brought to colos through partnering with some of the vendors in the storage data product layer that we'll talk about. This is again where things get a little nuanced: there are ways to approach this, but whether they're cost-effective or not, your mileage will vary on that too.

From there we're going to jump into opportunities, again following the SWO methodology. Delivering as a service on-premise, in co-location or in sovereign clouds is solvable if the underlying platform is designed correctly. Knowing a lot of the vendors in this category, I feel very strongly that they can go and be in sovereign clouds and in co-location facilities. They have usually built on Kubernetes or containers and have the pieces to go and do it. It's just a cost-to-market-attainment calculation they have to do: are they going to bring their products to these co-location, on-premise or sovereign clouds? I think we'll see the sovereign clouds first. Based on the customers I'm talking to, there seems to be a great desire to go beyond the big three or big four companies that they're partnered with.

So let's take a step back now and next explore the capabilities of the hyperscaler clouds in the data platform space. Again, you know these companies, companies
like Amazon Web Services, Google and Oracle offer extensive sets of services that can rival the independent as-a-service data platform vendors. In fact, many of them partner with those vendors as well and run their data platforms in their clouds, some offering them as first-party services; in the case of something like Databricks on Azure, it's actually run by Azure.

So let's start with the strengths. A broad breadth of services can cover data warehousing, data lakes and data mesh, including first-party services, like I just mentioned, from some of the independents. I think we've all seen the 380-plus services in AWS. Figuring out which data platform service is the right approach for you, because there are at least seven or eight databases you could use, becomes very complicated. But they do have them all: they have vector databases, they have a number of different data lake and data warehousing services, they have Hadoop, they have Spark, they have all of these different pieces for you. So again, the hyperscalers have a very broad group of these data platforms, at least at the data warehouse and data mesh level, with some service offerings that go beyond what some of the independents offer as well.

They also come with AI/ML capabilities built in. You're starting to see that in some of the announcements made at the Snowflake and Databricks summits; they're bringing AI/ML capabilities as well. But the hyperscalers do have a bit of an advantage in that it's their ecosystem, and they've been working on this a long time. AI and ML are not new for these hyperscaler providers; they've been in this market for quite some time. Where they're working really hard, and what has changed from a prompt engineering perspective, is: how do I get over the skill-set gap that my customers have, and have service offerings that are easy to use? They're really working on
that, and they do have the capabilities.

Another advantage is that they have low initial pricing and enterprise contracts in place with many companies. Many of you out there already have these enterprise pricing agreements, and you have to use a certain amount of their services per year. This makes experimentation very easy, and it's why you see some of the momentum building within the hyperscalers. When we showed the ETR data earlier, it was really interesting to see how that continues to grow as new services are launched that make them easier to use.

They also have an abstracted, built-in storage platform layer that is usually object or file. It can be completely abstracted, or you can bring your own type of storage platform from one of their other services. So depending on whether you're looking to build your complete data platform yourself or to take one of their data platforms as is, you have different ways of trying to optimize your costs. But I will say it can become complex very quickly.

Obviously, one of their advantages is cloud scale and performance as well. That's where it comes from, the cloud, right? This is motherhood and apple pie from that perspective. They also have a choice of services that solve the problem. Most clouds, as I was saying, offer a variety of services that can address different use cases, and I think that becomes one of their selling points: not only do they have their own first-party services, they have second-party services like Snowflake and Databricks, or first-party services from those companies. So it becomes a really good way for you to get into a different set of tooling that can help you move very quickly.

Let's dive into the weaknesses. The first is what I've just been hitting on: the choice of services to solve the problem, with overlap between services. In many
cases a service doesn't equal a solution, leaving the customer to integrate different pieces or smaller services together to really attain their data platform. Right now this is probably the biggest deterrent: Do I have the people? Do I have the knowledge internally to go and architect my data platform? Or do I take one of the ones there that may cost me more initially but has the legs to solve that problem for that use case? When I talk to customers, this is really one of the big points they're starting to look at: when they're building out their data services, maybe they have to use a particular file system, a particular set of storage or object, or maybe some compute that has cost implications based on how they're solving their use case. It's true that not every service fits every use case the same, and it will vary pretty significantly.

It also only runs on their cloud. I think this is one of the big weaknesses of using the hyperscaler types of data platforms: if you go with their native platform, you're really locked into it, and getting out can be really cumbersome, especially migrating all of your data. Once you get to a petabyte or so of data in your data platform, it can be really tough to move.

Also, as I was bringing up earlier, some of the bigger hyperscalers do have agreements with sovereign clouds to run particular clouds in certain geographic areas such as Europe or China, but you're really forced into their on-premise solution if you want to bring things, or part of your data, on-premise or into a co-location. That means you're looking at something like AWS Outposts, which may or may not offer the services for everything you want to do, or you may have the data left on-premise and face high networking costs going back to a region to use that service. This is
the same across all of these different hyperscalers; it's not unique to AWS. Azure and Google all have these. OCI, or Oracle, has a slightly different approach to how they do on-premise, and we'll dive into that another day. But again, when you look at these services and what gets run there, sometimes what you can run on-premise is significantly limited. Availability of those services in a particular region, and sometimes in particular countries, can even be a challenge. That comes down to the different rollouts, or maybe you're limited by the number of GPUs you can find in a particular region as well. This can be really tough, especially if there are regulations around your data or a governance model that forces you to be in a certain region where the service is not offered. So going to the hyperscalers doesn't necessarily mean I can use their service everywhere. Your mileage will vary by service and by hyperscaler.

Now, it's not all doom and gloom for the hyperscalers by any stretch of the imagination. They have a lot of customers, as we saw in the ETR data, with Amazon and Microsoft having significant momentum, along with Google, which is at a much lesser amount of actual usage today. By partnering with most, if not all, independent data platforms, hyperscaler clouds gain direct or indirect revenue from the use of compute, network and storage. So the opportunity is an all-of-the-above strategy, which some of them have: I'm going to put all of those independent data platforms onto my hyperscale cloud, because at the end of the day I'm looking for consumption of network, compute and storage. So it's not a zero-sum game, and competition is healthy for them, which is why you see them supporting and working with the independents so much now. It may have been a different story six or seven years ago, when Snowflake was just becoming what they are today. Today they're embracing
that. Hyperscale clouds also understand that customers are looking for solutions, not services, and can do a better, if slower, job of building those customer requirements into their services. What I mean here is that a lot of the time I still have to piecemeal three or four different services together to get my pipelines, my database, my data lake, all of these different things set up within a hyperscaler. This is a place where I think they'll quickly seize the opportunity. It's a customer experience, or CX, improvement in making it easier to get to your initial data platform, one that can rival some of those independent commercial data platforms.

Now that we're done with the hyperscalers, let me take a second and dive into the ones that have traditionally owned the data: the data storage platform vendors. I think we all know who they are. They've been in the data centers for years and years, and there are hundreds of billions of dollars in this space still to this day.

So let's consider what these data storage platform vendors are doing today. These vendors traditionally focused on on-prem data centers or co-location facilities, and to a major extent still do. They've also expanded their offerings to include hyperscale clouds as a service, and some of them have done pretty well moving their file systems or what have you up there. Some examples of these vendors are HPE storage with Ezmeral, Pure Storage with Portworx Data Services, VAST Data's data services products, Dell storage as part of APEX, Hitachi Vantara storage with Pentaho software, and there are many others, including IBM, which has a set of software that runs on its storage or in its cloud. But there's a reason why they still have significant strength in this market, and this market is not won. As we saw, they still have a lot of headroom and are in a lot of this market that people like Databricks, that people
like Snowflake are not in, or Couchbase for that matter. They started with storage platform development, and they have a history of building economical storage offerings. What does that mean? They've built the storage, the flash, the object, the file, the spinning disk, mostly moving toward all-flash arrays, that can be very economically priced as a service with software on top.

Another strength is that they offer OpEx or CapEx options for platform acquisition. In the type of market we're in today, where IT budgets may not be growing as much as they could in future years, CapEx may be attractive to some. Maybe I don't want to rent all the petabytes of data all the time, month over month. So how you acquire this could be not just OpEx but CapEx, and that gives them a significant advantage over some of the hyperscalers as well as the commercial data platform offerings, because most of those are driven by ARR, or annual recurring revenue, so I go in with a subscription, whereas a significant amount of storage data platforms are still sold as CapEx today.

Most offer all types of storage, object, file and block, in a single scalable platform. The object offerings provide programmatic access to that data, usually through the S3 API or a RESTful API, so there are different programmatic ways to get at the data in its raw state. Often the storage offerings can be prepackaged with or in a compute platform with a cloud operating environment. What does that mean? Usually these are referred to as hyperconverged or converged systems, and they include virtual machines or a Kubernetes layer on top that can get you up and running really quickly. So maybe I then have a services vending machine that can pull down known-good software (we can get into software supply chains, and who has better known-good software, another time) to run things like Spark or what have you on top of it. So again, they get you up and running very quickly on-prem
or in your colo once the hardware arrives, and customers can deploy pretty much anywhere. I think this is a huge advantage, even, in some cases, in the hyperscale cloud. So maybe I want to have part of my environment up there, and they have a way for me to move data between on-prem, colo and hyperscale cloud. But the strength comes in the as-a-service models with co-location and MSP partners. This gives them more regional support and gives them breadth. They've been selling servers to you forever, and they understand what you're doing and how you're doing it. Some of the language they're using is very different nowadays, but again, they're able to come to you with these as-a-service models and with partners, co-location providers or MSPs (managed service providers), that can actually help you with the skills gap you may be experiencing. Having that partner network is a huge advantage. The hyperscalers do have partner networks like that, to a lesser extent, but where they compete with those partners in the hosting aspect, it really brings them down a little bit.

So that's the sunny side of it. Now let's go into the weaknesses and look at what the catches are. Most do not have data warehouse, data lake or data mesh environments installed out of the box in their storage systems, leaving it up to the customer to figure out, or to buy yet another product from that company to overlay on top of their storage foundation. So this again becomes "I need to build it myself," but at a very different level. This is also changing, however, and I think that's the exciting part. These vendors realize it. They're not dumb. They see how the data platforms and the clouds are bringing simplicity and customer experience, and they understand that they want to get there. For example, HPE Ezmeral leverages open source and data mesh technology such as Presto, Pure's Portworx provides an open source catalog and pipelines on top of Pure Storage, and IBM has a number
of technologies across the space, some of which have come from what was formerly the Red Hat side of the business. This is a space you should be watching. It's going to evolve quickly, and if I had to put a pin in anything here, this is the space I would really be watching, because I think you're going to see a lot of blurring of the lines between storage and data platforms over the coming months. And I definitely think that into the back half of the year these are going to be compelling offerings that you may want to look at in addition to what you're doing with people like Snowflake and Databricks.

Another weakness is that most of these platforms do not run in the hyperscale cloud. Those that do are, for the most part, just storage platforms, or they're an overlay over the storage in that hyperscaler cloud. At that point it really varies: are you going to go with one of these data platforms that's been built to run on-premise or in a colo, or are you just going to go with the hyperscaler services, or with one of the first-party data services from the commercial data warehouse, data lake and data mesh providers, who make it very simple inside those hyperscalers? I think one of the key battlegrounds is going to be what ends up in the hyperscale cloud. Can these platforms bridge that gap? It's a big piece for them to provide, being that pervasive layer. It's the same challenge the hyperscalers have, being only in their own cloud and maybe in their on-prem, Outposts type of offering. It's definitely a weakness, and we'll see how they overcome it, but they're definitely aiming at that.

Data ingest is up to the customer. All of the storage platforms focus more on the data at rest than on getting the data in there. That's where add-ons such as Ezmeral or Portworx come in, and you'll start to see others that are doing a really good job. I can't go into it, but under the hood you'll
start to see it becoming more and more integrated with those storage platforms, because they understand that it's not just about where the data sits, the type of data or the type of access (object, file or block). It's really about how you make the data usable.

So that brings us to the opportunities that these storage platform vendors and data platform vendors have: adding these data platforms, pre-integrated as a service, into the storage platforms. This will be a way for them to address what I would call the cost and value conundrums that data engineers and CIOs face today. It's, "Hey, I keep putting things into Amazon, it's all hot data, and I still have these costs that keep recurring. I don't get to tier the data off to a lower tier of S3 and use Glacier, for instance, because the data is used so often and I'm not sure how it's going to be used." This is really a conundrum for the hyperscalers, and it's an opportunity for these storage data platforms to come in and show their value, because this is where they've built their current customer base.

I also think regulations will force data platforms into co-location, sovereign clouds and regional cloud providers, because those regulations are coming. We had some news around Safe Harbor out of the EU just last week, which will probably get shot down in the European courts because it doesn't go far enough. Eventually you're going to see some US-based regulations that force a mapping to the GDPR-type regulations for all of Europe. You still have CCPA in California and VCDPA in Virginia. Those are the canaries in the coal mine showing you that there needs to be a data mapping between the US and GDPR, and it's going to come at the governmental layer of the US.

I would also say that one place they play very well, and an opportunity for them to sell, is that no egress or
ingress fees on data movement can give them an incredible cost advantage. On the networking aspect, most of these companies partner with, are working with, or have their own first-party networking solutions in this space. So again, I think networking could give them a huge opportunity and a huge win from a cost perspective.

Lastly, we're going to explore open source data platforms. These are solutions predominantly available through their supporting commercial entities. Again, Databricks has Delta Lake and Apache Spark under the hood, but you can still download them as open source platforms as well. At their core, they provide many ways to access data programmatically, similar to the commercial data platforms. As I was saying, many of the commercial products are built on top of these open source offerings, like Apache Spark, Delta Lake, Presto, Trino and many more. Those are the building blocks. You still have to take a Lego approach and connect the right pieces together.

As for the strengths, they provide several ways of accessing the data programmatically, and you can choose based on which platform your organization deploys. This is huge, depending on what the skill set is in your organization. They can be more SQL oriented, or they can be more Python oriented if you're looking to use Jupyter notebooks on top. They have ways to plug in, but again, you've got to build toward your end use case and goal. Are you really doing data engineering? Are you doing data science? Who is the end persona actually using the product and the output, the data product that comes out of that platform?

Like the commercial data platforms, open source has normalized and programmatic ways to ingest data as well, and most of them are built in. You also have different pipelines or ways of going about this. You have things such as RabbitMQ for moving
things around. There are a lot of different ways to look at the pipelines and how you move the data between where it's discovered or ingested and your data lakes. They're cloud scale, and the performance is there. These have been engineered for hyperscale cloud, so you're going to get that right out of the box without having to worry. Most of them have thought through how you scale on instances and how you scale across the storage layer behind the data platform. And I'll say that the initial cost of the software for the data platform can be minimal to zero. Where you do get hit is in the actual experience, the people and the skill sets that you have to have. So although I put this one here, it's also in the weakness category, as you'll see.

So let's jump into the weaknesses. When building from open source components, all the system integration work is on the end user. Again, it's that Lego approach. You're going to have to have the people, and that becomes pretty hard. Another weakness is that an organization needs to question whether this is core to its business or a science project, because the cost of doing pure open source can look attractive initially but quickly escalate. This is in people and in resources. You have to understand how to run these, because if you're running them in a hyperscale cloud, for instance, you download some open source and start to run it, and those instance costs, or function costs if you're using functions or Lambdas, can spiral pretty quickly. Plus you have to look at your networking expense as well, because where the data is, how I move the data, and where I acquire the data all have to be engineered into your platform as you build it.

Catalog and data mapping features for governance will not be built in from the start. This can lead to a data platform silo or, worse, overly permissive data access, leaving the organization exposed.
So what I'm saying here is that you're not going to have the bells and whistles of governance that a Snowflake or a Databricks or one of the storage vendors is putting in out of the box, and you're going to have to layer on some open source or some other components to really handle that, or maybe use some hyperscaler services to deal with the governance of that data. That's a huge problem, and I think that most organizations look at it as being way too complex. Or maybe, if it's core to their business, it becomes a data silo and they're able to manage it that way and justify the cost. That could be, hey, tech is my thing, or I'm an online e-tailer and I build a certain portion of it out of an open source data platform because that is so core to what I do, with recommendation engines or something like that built on top of those data platforms. We'll get into that in another session. But again, it can be very complex to go out and build it. I've been there, done that, seen it, and have the scars to prove that building your own SaaS can be very complicated. And building or buying your own storage platform means that the complexity is really not completely abstracted. You still have to have people who know how to run these systems under the hood of the open source if you're bringing it on-prem, or you need to know how to operate the open source inside your hyperscaler. Again, it's not completely abstracted; you have to have that knowledge in house. So it's not all doom and gloom for open source. When we look at opportunities for open source: open source is innovation. I'm very bullish on open source, and I do think it's a really good place to look for inspiration and experimentation that can lead to really good outcomes. So maybe starting with a data platform that is open source and then looking for the commercial component from one of the storage vendors or one of the commercial open source vendors or even within those
hyperscalers as well. I think that can give you a really great way to look at this. And I think actually contributing back into the open source community, if you have that luxury as an end user, can be really advantageous, because you can see some of the people who are pushing the cutting edge with open source because it is germane to their business and revenue generating. This can give you really great ideas on how to take the next step and how to decide: is it a science project, or is it really something that is going to be revenue generating? It'll be really something to be involved with. They all have Slack channels; I highly recommend getting involved. Open source will be where the next generation of data platforms is invented and evolved. This is where everything has come from already, and this is where it's going. This is why you have Apache Spark being the original underpinnings of Databricks. You have Delta Lake, which is the data warehousing that has been open sourced. You have a number of others, like Trino coming out of Presto, and where they're going from a data mesh perspective is really evolving. So being involved, understanding this, or at least tracking it is super important. We track it here, so you'll be able to see that on the Analyst Angle as well. Come back here: we do a lot with open source and we are very bullish on where it's going. Open source versus closed source? Open source has definitely won. So once we did our SWO, looking at strengths, weaknesses, and opportunities, we had the opportunity to put them in a quadrant, because really you're not an analyst without a quadrant. So here's our quadrant, looking at where everything stands from the perspective of how hard it is to integrate. On the middle axis here, kind of the equator line, are you going from build your own all the way to packaged? And is it cloud native, on-prem, or in colo? I wanted to simplify this down a
little bit so that it can give you a perspective, because again, you may use multiple different data platforms because your use cases tell you that you should. I don't think that any one data platform will win an entire organization; it's going to be very rare that that happens. But you have choices. You have choices in deployment, such as cloud. The hyperscaler clouds score poorly on being able to be hybrid because they are so wed to the services being in their hyperscaler cloud. Next to that you have the commercial platforms. They're a little bit better, and they've made some agreements. Snowflake, at their conference last year, made an agreement with Dell to bring Snowflake to Dell and to Dell APEX with some of their colocation providers. Do we see a lot of that in the market? Not yet, but you do have those options. So I think the commercial data platform folks are looking at this as, hey, we can go there. They also do more packaging in some cases. With Snowflake, it's heavily integrated; with Databricks, it's a little bit more federated in how they approach this. The same goes for others out there, like Starburst, where it may be more federated and you're bringing other pieces to bear. So they almost straddle the line a little bit between build your own and fully packaged. But for the pure data platform, where you don't have to worry about the storage, it's pretty much taken care of for you: the infrastructure, the scaling, the virtual machines or the instances that are in the cloud. They score really highly on that, and that's why they've dropped towards the center. Then you have your storage data platforms. They're really heavily packaged and very well integrated from a storage perspective up. They're not as packaged when you get to how they bring those data platform services, such as Spark or Kafka or something else, where you go and how you
download it. They're getting better, and I can't say it enough: this is absolutely a place to watch as these folks move closer and closer towards the center, towards where those commercial platforms are, leveraging a significant amount of open source. You'll see that most of those vendors are spending a good amount of money and a good number of engineers to support open source packages such as Presto in the open source community so that they can see it thrive. That's another way to look at it, because although I've put open source far to the left and a little bit above the line, as it's used more in cloud than on premises, I think open source coming together with the storage data platforms is a really compelling aspect of it. It will be interesting to see, and it's a place to watch. I think doing pure open source is really tough, and that's a key thing you need to look at. So let's bring this home. Let me wrap this up with what the angle is here: the things you should consider when you're looking at data platforms. I didn't want to just throw out a whole bunch of stats and what I'm seeing from all of the customers, but also what I'm hearing when we talk to enterprises and they're trying to figure out, how do I go and buy my data platform? Where you can deploy your data platform will vary. Most organizations will have some on-prem, some colo, and some hyperscaler. You have to consider this when you're trying to avoid those different silos. Costs will vary greatly depending on all of the components you bring to bear. Going fully packaged, using everything from one vendor, may have a higher cost from a dollars perspective, whether OpEx or potentially CapEx, but it may lower your costs from a people perspective, and it may help you get over skills gaps that you
have. So again, trade-offs and costs will vary greatly between where you deploy, how you deploy, and how you actually maintain that going forward. Who bundles and runs the platform is a major choice. Again, this builds off that skill set: you really need to understand what's going on there and how things will vary. A single platform versus build your own platform is going to be a big piece of this as well. Building your own platform can have some cost advantages from a software perspective, in ARR and OpEx, but it may have CapEx or OpEx requirements for people or gear that you then need to go and get. So this is not just about where it is, but how you bring it together as well. Choice of compute and storage will vary and will be either abstracted or not, and that becomes a skills gap issue as well. If you already have a really good AWS group, or Azure, GCP, or Oracle group, to go and run these data platforms, you can see that being an advantage, in that you may know how to run it there effectively. But again, that's something where we'll see how it pans out, and you'll tell me how you're doing. And with this, I want to thank you for joining the Analyst Angle on the Cube, powered by ETR, and thank you for joining us on this exploration of the brewing data platform battle. Stay tuned for more Analyst Angles on the Cube, where the Collective extracts the signal from the noise. Take care.