Live from San Jose, it's theCUBE. Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners.

Hi, I'm George Gilbert, and we are broadcasting from the Strata Data Conference. We're right around the corner at the Forager Tasting Room & Eatery. We have this wonderful location here, and we are very lucky to have with us Michael Nixon from Snowflake, which is a leading cloud data warehouse, and David Abercrombie from Sharethrough, which is a leading ad tech company. Between the two of them, they're going to tell us some of the most advanced use cases we have today for cloud-native data warehousing. Michael, why don't you start by giving us some context for how, on a cloud platform, one might rethink a data warehouse?

Oh yeah, thank you, that's a great question. Let me first answer it from the end-user, business-value perspective. When you run a workload in the cloud, there's a certain level of expectation you have of the cloud: you want unlimited scalability, you want to be able to support all your users, and you want to be able to support the data types, whatever they may be, that come into your organization. So there's a level of service one should expect once you're in the cloud. But a lot of the technologies built up to this point have been optimized for on-premises data warehousing, where perhaps that level of service, concurrency, and unlimited scalability was not really expected. Once you come to the cloud, though, it is expected, and those on-premises technologies aren't suitable there. So for enterprises, and I mean companies and organizations of all types, from finance, banking, and manufacturing to ad tech, as we'll hear today, they want that level of service in the cloud, and those technologies will not work. It requires a rethinking of how those architectures are built.
And it requires being built for the cloud.

Just to break this down and be really concrete, part of the rethinking is that we separate compute from storage, which is a familiar pattern we've learned in the cloud, but we also then have to have this sort of independent elasticity between the storage and the compute. And Snowflake has taken it even a step further, where you can spin up multiple compute clusters. Tell us how that works and why that's so difficult and unique.

Yeah, that takes us under the covers a little bit, but what makes our infrastructure unique is that we have a three-layer architecture. We separate, just as you said, the storage layer from the compute layer from the services layer. And that's really important because, as I mentioned before, you want unlimited capacity and unlimited resources. If you scale compute in today's on-premises MPP world, what that really means is that you have to bring the storage along with the compute, because compute is tied to storage. And when you scale storage along with compute, that usually puts a burden on the data warehouse manager, because now they have to redistribute the data, and that means redistributing keys, managing keys if you will, and that's a burden. In reverse, if all you wanted to do was increase storage but not compute, then because compute is tied to storage you'd have to buy additional compute nodes anyway, and that adds to the cost, when in fact all you really wanted to pay for was additional storage. By separating them, you keep them independent, so you can scale storage apart from compute. And then once the compute resources you put in place, the virtual warehouses you were talking about, have completed the job, you spun them up, they've done their work, you take them down, and guess what?
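To make this concrete, the lifecycle Michael describes can be sketched in Snowflake's documented SQL, where a "virtual warehouse" is an independent compute cluster that can be created, suspended, and resumed without touching storage. The warehouse and table names here are hypothetical, not from the interview:

```sql
-- Spin up an independent compute cluster; storage is unaffected.
CREATE WAREHOUSE reporting_wh
  WITH WAREHOUSE_SIZE = 'MEDIUM'
       AUTO_SUSPEND   = 60    -- release compute after 60 seconds idle
       AUTO_RESUME    = TRUE; -- wake transparently on the next query

-- Run the job on that cluster.
USE WAREHOUSE reporting_wh;
SELECT COUNT(*) FROM impressions;

-- Take it down when the job is done; billing for compute stops.
ALTER WAREHOUSE reporting_wh SUSPEND;
```

Because the warehouse is just compute, dropping or suspending it leaves all data in place for the next warehouse that needs it.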
You can release those resources, and releasing those resources cuts your costs as well, because for us it's pure usage-based pricing: you only pay for what you use.

That's really fantastic, and very different from the on-prem model where, as you were saying, compute and storage are tied together.

Yeah, let's think about what that means architecturally. If you have an on-premises data warehouse and you want to scale your capacity, chances are you have to have that hardware in place already, and having that hardware in place means you're paying for it. You may pay for that expense six months before you need it. Take a retailer example: you're gearing up for peak season, which might be Christmas, so you put that hardware in place sometime in June. You always put it in place in advance because you have to bring up the environment, and you have to allow time for implementation and deployment to make sure everything is operational. Then when that peak period comes, you can expand into that capacity. But what happens once the peak period is over? You paid for that hardware, but you don't really need it anymore. So our vision, or the vision we believe you should have when you move workloads to the cloud, is that you pay for resources when you need them.

Okay. So now, David, help us understand: first, what was the business problem you were trying to solve, and why was Snowflake uniquely suited to it?

Well, let me talk a little bit about Sharethrough. We're ad tech, and at the core of our business we run an ad exchange, where we're doing programmatic trading of bids under the real-time bidding spec. The data is very high volume, with 12 billion impressions a month. That's a lot of bids and a lot of bid requests we have to process. The way it operates, the bids and the bid responses in programmatic trading are encoded in JSON.
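The retailer scenario above maps onto a couple of one-line statements in Snowflake's SQL: capacity is changed by resizing a warehouse, or (on editions that support multi-cluster warehouses) by letting it scale out only while concurrency demands it. Again, the warehouse name is hypothetical:

```sql
-- Before the peak period: scale up for Christmas traffic.
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'XLARGE';

-- Or let the warehouse add clusters only under concurrent load
-- (multi-cluster warehouses; an Enterprise-edition feature).
ALTER WAREHOUSE reporting_wh SET MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 4;

-- After the peak: shrink back down; no idle hardware left to pay for.
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'MEDIUM';
```

Contrast this with the on-premises case, where the June hardware purchase sits on the books whether or not the December peak ever uses it.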
So our ad exchange is basically exchanging JSON messages with our business partners. And the JSON is very complicated. There's a lot of richness and detail, such that the advertisers can decide whether or not they want to bid. This data is very complicated and very high volume, and in advertising, like any business, we really need good analytics to understand how our business is operating, how our publishers are doing, and how our advertisers are doing. It all depends on this very high-volume, very complex JSON event data stream.

Snowflake was able to ingest our high-volume data very gracefully, and its JSON parsing techniques allow me to expose the complicated data structures in a way that's transparent and usable to our analysts. Our use of Snowflake has replaced clunkier tools where the analysts basically had to be programmers, writing programs in Scala or something to do an analysis. Now, because we've transparently and easily exposed the complicated structures within Snowflake in a relational database, they can use good old-fashioned SQL to run their queries. Literally, an afternoon's analysis is now a five-minute query.

As I'm listening to you describe this, we've had various vendors telling us about these workflows in the data prep and data science tool chain. It almost sounds like Snowflake is taking semi-structured or complex data and sort of unraveling it. Normalizing is an overloaded term, but it's making the data business-ready, so you don't need as much of that manual data prep.

Yeah, exactly. You don't need as much manual data prep, or as much expertise. For instance, Snowflake's JSON capabilities, in terms of drilling down the JSON tree with dot-path notation or expanding nested objects, are very expressive and very powerful, but still, your typical analyst or your BI tool certainly wouldn't know how to do that.
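The dot-path and nested-object expansion David mentions look roughly like the following in Snowflake's documented SQL. The table, column, and path names are hypothetical illustrations (loosely modeled on an OpenRTB-style bid request), not Sharethrough's actual schema:

```sql
-- Raw bid events land in a single semi-structured VARIANT column.
CREATE TABLE bid_events (payload VARIANT);

-- Dot-path notation drills into nested JSON, with a cast at the end.
SELECT
    payload:id::STRING                 AS request_id,
    payload:device.geo.country::STRING AS country
FROM bid_events;

-- LATERAL FLATTEN expands a nested array into one relational row
-- per element, so ordinary SQL and BI tools can work with it.
SELECT
    e.payload:id::STRING AS request_id,
    b.value:price::FLOAT AS bid_price
FROM bid_events e,
     LATERAL FLATTEN(input => e.payload:seatbid) b;
```

In practice, queries like the last two are often wrapped in views, which is what lets an analyst query the JSON relationally without knowing the path syntax at all.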
So in Snowflake, we sort of have our cake and eat it too. We can keep our JSON with its full richness in our database, yet simplify and expose the data elements that are needed for analysis, so that analysts can get right to work writing queries on their first day on the job.

So let me ask you a little more about the programmatic ad use case. If you have billions of impressions per month, I'm guessing that means you have quite a few times more bids. And then once you have a successful one, you want to track what happens.

Correct.

So tell us a little more about what that workload looks like, in terms of what analytics you're trying to perform and what you're tracking.

Yeah, well, you're right, there are different steps in our funnel. The impression request expands out by a factor of a dozen as we send it to all the different potential bidders. We track all of that data; the responses come back, and we track those; we have to track our decisions and why we selected a bidder. And then once the ad is shown, of course, there are various beacons and tracking things that fire, and we have to track all of that data too. The only way we can make sense of our business is by bringing all that data together in a way that is reliable, transparent, and visible, and that has data integrity. That's another thing I like about the Snowflake database: it's a good old-fashioned SQL database where I can declare my primary keys and run QC checks, so I can ensure the high data integrity that's demanded by BI and other sorts of analytics.

As you continue to push the boundaries of the ad tech service, what's some functionality that you're looking to add with Snowflake as your partner, either something that's in the product now that you still need to take advantage of, or things that you're looking to do in the future?
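One detail worth noting about the primary keys David mentions: in Snowflake, a declared primary key is recorded as metadata but not enforced, which is exactly why pairing the declaration with a QC query makes sense. A minimal sketch, with hypothetical table and column names:

```sql
-- Declare the key; Snowflake records but does not enforce it.
CREATE TABLE impressions (
    impression_id STRING PRIMARY KEY,  -- documented intent, not a constraint check
    served_at     TIMESTAMP_NTZ,
    publisher_id  STRING
);

-- QC check: verify the key really is unique, i.e. each event was
-- captured exactly once. Any rows returned signal an integrity problem.
SELECT impression_id, COUNT(*) AS copies
FROM impressions
GROUP BY impression_id
HAVING COUNT(*) > 1;
```

Running checks like this after each load is one way to get the "captured every event exactly once" guarantee David describes later in the conversation.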
Well, moving forward, of course, it's very important for us to be able to quickly gauge the effectiveness of new products. The ad tech market is fast-changing. There are always new ways of bidding, new products being developed, new ways for the ad ecosystem to work. As we roll those out, we need to be able to quickly analyze whether a thing is working or not. It's kind of an agile environment, pivot or persevere: does this feature work or not? Having all the data in one place makes possible that very quick assessment of the viability of a new feature or product.

Dropping down a little under the covers of how that works: does that mean you still have the base JSON data that you've absorbed, but you expose it with different schemas or access patterns?

Yeah, indeed. For instance, we make use of SQL schemas, roles, and permissions internally, where the different teams have their own domain of data that they can expose internally. And looking forward, there's the Data Sharehouse feature of Snowflake that we're looking to implement with our partners, where rather than sending them a daily dump of data, we can give them access to their data in our database. The top layer that Michael mentioned, the services layer, essentially allows me to create a view and grant select on it to another customer. So I no longer have to send daily data dumps to partners or maintain some sort of API for getting data; they can simply query the data themselves. We'll be implementing that feature with our major partners.

I would be remiss in not asking at a data conference like this: now that there's the tie-in with Qubole, and Spark integration, and machine learning, is there anything along that front that you're planning to exploit in the near future?

Well, yeah, at Sharethrough we're very experimental and playful. We're always examining new data technologies and new ways of doing things.
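The "create a view, grant select to another customer" flow David describes corresponds to Snowflake's secure data sharing objects. A rough sketch in Snowflake's documented SQL; every name here (database, view, share, partner account) is hypothetical:

```sql
-- A secure view scoped to one partner's slice of the data.
CREATE SECURE VIEW analytics.public.partner_metrics AS
SELECT served_at, publisher_id, bid_price
FROM analytics.public.impressions
WHERE publisher_id = 'partner_123';

-- Package the view as a share and grant a partner account access;
-- they then query it live instead of receiving daily dumps.
CREATE SHARE partner_share;
GRANT USAGE ON DATABASE analytics            TO SHARE partner_share;
GRANT USAGE ON SCHEMA analytics.public       TO SHARE partner_share;
GRANT SELECT ON VIEW analytics.public.partner_metrics TO SHARE partner_share;
ALTER SHARE partner_share ADD ACCOUNTS = partner_account;
```

Because the share exposes a view rather than a copy, the partner always sees current data, and no files or API payloads ever leave the provider's account.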
But now, with Snowflake as sort of our data warehouse of curated data, I've got two petabytes of data with referential integrity, and that is reliable. We can move forward into our other analyses and other uses of data knowing that we have captured every event exactly once, and we know exactly where it fits in the business context in a relational manner. It's clean, it has good data integrity, it's reliable, accessible, and visible, and it's just plain old SQL.

That's actually a nice way to sum it up: we've got the integrity that we've come to expect and love from relational databases, we've got the flexibility of machine-oriented data, or JSON, but we don't have to give up the query engine. And now you have more advanced analytic features that you can take advantage of coming down the pipe.

Yeah, again, we're a modern platform for the modern age, which is basically cloud-based computing. With a platform like Snowflake on the back end, you can move the workloads you're accustomed to into the cloud and have an environment that you're familiar with. It saves you a lot of time and effort, so you can focus on more strategic projects.

Okay, well, with that we're going to take a short break. This has been George Gilbert. We're with Michael Nixon of Snowflake and David Abercrombie of Sharethrough, hearing how the most modern ad tech companies are taking advantage of the most modern cloud data warehouses. We'll be back after a short break, here at the Strata Data Conference. Thanks.