Announcer: It's theCUBE, covering the Virtual Vertica Big Data Conference 2020, brought to you by Vertica.

Dave Vellante: Hello everybody, welcome to this special digital presentation of theCUBE. We're tracking the Vertica Virtual Big Data Conference. This is, I think, theCUBE's fifth year doing the BDC. We've been to every Big Data Conference that they've held, and we're really excited to be helping with the digital component here in these interesting times. Ron Cormier is here, Principal Database Engineer at The Trade Desk. Ron, great to see you, thanks for coming on.

Ron Cormier: Hi David, my pleasure, good to see you as well.

Dave: So we were talking a little bit about your background. You're basically a Vertica and database guru, but tell us about your role at The Trade Desk, and then I want to get into a little bit about what The Trade Desk does.

Ron: Sure. So I'm a Principal Database Engineer at The Trade Desk. The Trade Desk was one of my customers when I was working at HP as a member of the Vertica team, and I joined The Trade Desk in early 2016. Since then I've been working on building out their Vertica capabilities and expanding the data warehouse footprint in an environment of ever-growing database technology and data volume.

Dave: The Trade Desk is an ad tech firm, and you specialize in real-time ad serving and pricing. People talk about real-time a lot; you define real-time as "before you lose the customer." Maybe you could talk a little bit about The Trade Desk, the business, and how you define real-time.

Ron: Totally. To give everybody a frame of reference: anytime you pull up your phone or your laptop, go to a website or use some app, and see an ad, what's happening behind the scenes is that an auction is taking place, and people are bidding on the privilege to show you an ad. Across the open internet, this happens seven to 13 million times per second. So the whole auction dynamic and the display of the ad need to happen really fast.
Ron: So that's about as real-time as it gets outside of high-frequency trading, as far as I'm aware. The Trade Desk participates in those auctions. We bid on behalf of our customers, which are ad agencies, and the agencies represent brands. The agencies are the Mad Men companies of the world, and they have brands under their guidance. They give us budget to spend, to place the ads and display them. We bid on hundreds of thousands of auctions per second, and anytime we make a bid, some data flows into our data platform, which is powered by Vertica. So we're getting hundreds of thousands of events per second, and we have other events that flow into Vertica as well. We clean them up, we aggregate them, and then we run reports on the data: about 40,000 reports per day on behalf of our customers. The reports aren't as real-time as the auctions I was talking about earlier; they're more batch-oriented. Our customers like to see big chunks of time, like a whole day, a whole week, or a whole month, on a single report. So we wait for that time period to complete, then we run the report and send them the results.

Dave: So you have one of the largest commercial infrastructures in the big data sphere. Paint a picture for us. I understand you've got a couple of 320-node clusters and we're talking about petabytes of data, but describe what your environment looks like.

Ron: Sure. Like I said, we've been Vertica customers for a while, and we started out with a bunch of enterprise clusters. Enterprise mode is the traditional Vertica deployment, where the compute and the storage are tightly coupled, with RAID arrays on the servers. We had four of those, and we were doing okay, but our volumes are ever increasing. We wanted to store more data and run more reports in a shorter period of time, to keep pushing.
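To put the ingest volume Ron quotes in perspective, here's a quick back-of-envelope calculation (an editor's sketch, not from the interview; the 200,000 events/second figure is an assumed midpoint of "hundreds of thousands"):

```python
# Rough scale check on "hundreds of thousands of events per second"
# flowing into Vertica. 200,000/sec is an assumed midpoint.
events_per_second = 200_000
seconds_per_day = 24 * 60 * 60
events_per_day = events_per_second * seconds_per_day
print(f"{events_per_day:,} events/day")  # roughly 17 billion rows per day
```

At that rate the platform is landing on the order of 17 billion rows per day before aggregation, which is why the cleanup-and-aggregate pipeline Ron describes matters so much.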
Ron: So we had these four clusters, and then we started talking with Vertica about Eon mode, which is Vertica's separation of compute and storage, where the two can be scaled independently. We can add storage without adding compute, or vice versa, or we can add both. That was something we were very interested in for a couple of reasons. One, our enterprise clusters were running out of disk, and adding disk is expensive in enterprise mode. It's kind of a pain: you have to add compute at the same time, so you can end up in an unbalanced place. In Eon mode, that problem gets a lot better. We can add disk, effectively infinite disk, because it's backed by S3. And we can add compute really easily, to scale the number of things we run in parallel, the concurrency: just add a subcluster. There are two Eon clusters, in US East and US West on Amazon, so we're regionally diverse. And the real benefit is that we can stop nodes when we don't need them. Our workload is fairly lumpy, I call it. We're ingesting and aggregating all day, but after the day completes, the final hour or so still needs to be processed. Once that's done, the number of reports that we need to run spikes way up. So we spin up a bunch of extra compute on the fly, run those reports, then spin it back down, and we don't have to pay for it the rest of the day. Eon has been a nice boon for us for both those reasons.

Dave: I'd love to explore Eon a little bit more. It's relatively new; I think Vertica announced Eon mode in 2018, so it's only been out there a couple of years. I'm curious, for the folks that haven't moved to Eon mode, and presumably they want to for the same reasons you mentioned (why buy compute and storage in chunks when you run out of storage, if you don't have to?), what were some of the challenges that you faced in going to Eon mode? What kind of things did you have to prepare for?
Dave: Were there any out-of-scope expectations? Can you share that experience with us?

Ron: Sure. So we were an early adopter. We participated in the beta program, and I think it's fair to say we actually drove the requirements in a lot of ways, because we approached Vertica early on. The challenges were what you'd expect any early adopter to be going through: the business of getting things working as expected. There are a number of cases I could touch on. For example, we found an inefficiency in the way it accesses the data on S3. It was accessing the data too frequently, which ended up being expensive, so our S3 bill went up pretty significantly for a couple of months. That was a challenge, but we worked through it. Another area where Vertica recently made huge strides was the ability to stop and start nodes, have them start very quickly, and, when they start, not interfere with any running queries. When we wanted to spin up a bunch of compute, there was a point in time when doing so would break certain queries that were already running. So that was a challenge, but again, the Vertica team has been quite responsive to solving these issues, and now that's behind us. For those looking to get started, there are a number of things to think about. Off the top of my head, there are new configuration items you'll want to consider, like the instance type. Amazon has a variety of instance types, and it's important to consider one of Vertica's architectural advantages in this area: Vertica has a caching layer on the instances themselves. What we've found is that if we can keep the data in cache, the performance is basically the same as the performance of enterprise mode. So having a good-sized cache when you need it can be important.
Ron: We went with the i3 instance types, which have a lot of local NVMe storage, so we can cache data and get good performance. So that's one thing to think about: the number of nodes and the instance type. The number of shards is another technical item that needs to be considered. It's how the data gets distributed; it's a layer on top of the segmentation that Vertica engineers will be familiar with. And probably one of the big things to consider is how to get data into the database. If you have an existing database, there's no nice tool yet to suck all the data into an Eon database. I think they're working on that, but at the point we got there, we had to export all our data out of the enterprise cluster as Parquet, bump it out to S3, and then have the Eon cluster ingest that data.

Dave: Awesome advice, thank you for sharing that with the community. So at the end of the day, it sounds like you had some learning to do, some tweaking to do, and obviously you had to get the data in. At the end of the day, was it worth it? What was the business impact?

Ron: Yeah, it definitely was worth it for us. Right now we have four times the data in our Eon cluster that we have in our enterprise clusters. We still run some enterprise clusters; we started with four at the peak, and now we're down to two, plus the two Eon clusters. So I think our business would say it's been a huge win. We're doing things that we really never could have done before. 4x-ing the data on enterprise would have been really difficult. It would have required non-trivial engineering, things like daisy-chaining clusters together and then figuring out how to aggregate data across clusters, which would, again, be non-trivial. So we have all the data we want, and we can continue to grow it.
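Ron mentions that the shard count governs how data gets distributed in Eon mode, as a layer on top of segmentation. The sketch below illustrates the general idea of hash-based sharding with nodes subscribing to shards; it is an editor's illustration of the concept, not Vertica's actual implementation, and all names in it are invented:

```python
import hashlib

# Illustration of hash sharding: a row's segmentation key hashes to one
# of a fixed number of shards, and compute nodes subscribe to shards.
NUM_SHARDS = 12  # hypothetical shard count, fixed at cluster creation

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    # Stable hash, so the same key always lands on the same shard.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Nodes subscribe to shards. Scaling compute means changing subscriptions,
# not moving the underlying data, which stays in shared storage (S3).
nodes = [f"node{i}" for i in range(4)]
subscriptions = {s: nodes[s % len(nodes)] for s in range(NUM_SHARDS)}
```

The design point this illustrates is why compute can scale independently: adding or removing nodes only reshuffles shard subscriptions, while the sharded data itself never moves out of S3.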
Ron: We're running reports on seasonality, so our customers can compare their campaigns last year versus this year, which is something we just haven't been able to do in the past. And beyond growing the data vertically, we've expanded it horizontally as well. We're adding columns to our aggregates, enriching the data much more than we have in the past. So while we still have enterprise kicking around, I'd say our Eon clusters are doing the majority of the heavy lifting.

Dave: And cloud was part of the enablement here, particularly with scale. Is that right?

Ron: Definitely, definitely.

Dave: And you're running on-prem as well? Is it a hybrid mode, or is it all AWS?

Ron: Good question. When I've been speaking about enterprise, I've been referring to on-prem; we have physical machines in data centers. So yes, we are running a hybrid now. And it's really hard to get an apples-to-apples direct comparison of enterprise on-prem versus Eon in the cloud. One thing I touch on in my presentation is that to get apples-to-apples, I think about how many CPU cores we would need to run the entire workload on enterprise versus the entire thing on Eon. Basically it would be about the same number of cores, I think, for enterprise on-prem as for Eon in the cloud. However, my Eon cores only need to be running about six hours out of the day. For the other 18 hours, I can shut them down and, mostly, not be paying for them.

Dave: Interesting. Okay, so I've got to ask you: notwithstanding the fact that you've got a lot invested in Vertica and a lot of experience there, there are a lot of emerging cloud databases. Did you look around? I mean, you know a lot about databases, not just Vertica; you're a database guru in many areas.
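Ron's apples-to-apples comparison (same core count, but Eon compute running only about six of 24 hours) implies a simple compute-hour saving, sketched below. The core count is a made-up placeholder; only the 6-versus-24-hour ratio comes from the interview:

```python
# Compute-hour comparison implied by Ron's numbers: roughly the same
# core count either way, but Eon compute runs only ~6 hours a day.
cores = 1000                       # hypothetical core count, same for both
enterprise_core_hours = cores * 24 # on-prem boxes run around the clock
eon_core_hours = cores * 6         # spun down the other 18 hours
savings = 1 - eon_core_hours / enterprise_core_hours
print(f"compute-hour savings: {savings:.0%}")  # 75%
```

The ratio is independent of the actual core count, which is why the comparison holds even without knowing the real cluster sizes.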
Dave: Traditional RDBMS as well as MPP and the new cloud databases. What is it about Vertica that works for you, in the specific sweet spot that you've chosen? What's really the difference there?

Ron: I think the key difference is the maturity. I'm familiar with a number of other database platforms, in the cloud and otherwise, column stores specifically, that don't have the maturity that we're used to and that we need at our scale. Being able to specify alternate projections, so different sort orders on my data, is huge, and there are other platforms where we don't have that capability. Vertica is of course the original column store, and they've had time to build up a lead in terms of maturity and features. I think the other column stores, cloud or otherwise, are playing a little bit of catch-up in that regard. Of course, Vertica is playing catch-up on the cloud side. But if I had to pick between writing a column store from scratch and writing a cloud file system from scratch, I think it would be easier to write the cloud file system. The column store is where the real smarts are.

Dave: Interesting. Let's talk a little bit about some of the challenges you have in reporting. You have a very dynamic nature of reporting. Like you said, your clients want a time series, not just a snapshot of a slice, but at the same time your reporting demand is probably pretty lumpy, a very dynamic demand curve. First of all, is that accurate? And can you describe that dynamism and how you're handling it?

Ron: Yep, that's exactly right. It is lumpy, and that's the exact word that I use. At the end of the UTC day, when UTC midnight rolls around, we do the final ingest and the final aggregate, and then the queue of reports that need to run spikes. The majority of those 40,000 reports that we run per day are run in the four to six hours after that spike happens.
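The nightly spike Ron describes can be put into rough numbers (an editor's sketch; "majority" is assumed to be about 80% here, and the five-hour window is the midpoint of Ron's "four to six hours"):

```python
# Rough queue-drain math for the post-midnight report spike.
daily_reports = 40_000   # from the interview
spike_fraction = 0.8     # assumed share run during the spike ("majority")
spike_hours = 5          # midpoint of "four to six hours"
reports_per_hour = daily_reports * spike_fraction / spike_hours
print(f"{reports_per_hour:.0f} reports/hour during the spike")
```

Sustaining thousands of reports per hour for a few hours, then almost nothing, is exactly the lumpy shape that makes pay-per-hour elastic compute attractive.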
Ron: So that's when we need to have all the compute come online, and that's what helps us answer all those queries as fast as possible. That's a big reason why Eon is an advantage for us: the rest of the day, we don't necessarily need all that compute, and we can shut it down and not pay for it.

Dave: So Ron, I wonder if you could share with us, just to wrap here, where you want to take this. You're obviously very close to Vertica, and you're driving them hard on Eon mode. You mentioned before that a tool to load data into Eon mode would have been nice for you, and I guess you're kind of over that hump. But what are the kinds of things, if Colin Mahoney were here in the room, that you'd tell him you want the engineering team at Vertica to work on that would make your life better?

Ron: I think the things that need the most attention near term are smoothing out some of the edges, making the cloud aspects a little more seamless. Our goal is to be able to start instances and have them join the cluster in less than five minutes. We're not quite there yet, and some of the other cloud database platforms are beating that handily, so I know the team is working on it. Some of the other things are about control. Like I mentioned, while we like the control in the column store, we also want control on the cloud side of things, in terms of being able to dedicate workloads to specific subclusters. We want to pin workloads against a specific subcluster and take advantage of the cache that's over there. The subcluster is a relatively new concept for Vertica, so being able to control many things at the subcluster level, resource pools, configuration parameters, and so on, would help.

Dave: Yeah. I mean, I personally have always been impressed with Vertica and their ability to ride the wave of adopting new trends.
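The subcluster-level control Ron is asking for amounts to routing each workload type to a dedicated pool of compute so each keeps a warm cache. A minimal sketch of that routing idea follows; the subcluster and workload names are invented for illustration and are not Vertica configuration:

```python
# Sketch of workload pinning: each workload type routes to a dedicated
# subcluster so its cache stays warm. All names here are hypothetical.
ROUTING = {
    "ingest":    "ingest_subcluster",
    "aggregate": "ingest_subcluster",   # shares compute with ingest
    "report":    "report_subcluster",   # the elastic, spiky pool
}

def route(workload_type: str) -> str:
    # Unknown workload types fall back to a default subcluster.
    return ROUTING.get(workload_type, "default_subcluster")
```

The payoff of this kind of pinning is cache locality: the report subcluster only ever sees report queries, so the data those queries need stays resident in its local NVMe cache.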
Dave: They have a robust stack that's been around 10-plus years. They certainly embraced Hadoop, they're embracing machine learning, and we've been talking about the cloud. So I actually have a lot of confidence in them, especially when you compare them to the other mid-last-decade MPP column stores that came out. Vertica is one of the few remaining, certainly as an independent brand, and I think that speaks to the team there and the engineering culture. But Ron, a few final words: just final thoughts on your role, the company, Vertica, wherever you want to take it.

Ron: Yeah, we're really appreciative, and we value the partnership that we have. I think it's been a win-win; I know that we have some data that got pulled into their test suite. So it's been a win-win for both sides, and it'll be a win for other Vertica customers and prospects, knowing that Vertica is working with some of the highest-volume, highest-velocity, highest-variety data that's out there.

Dave: Well, Ron, thanks for coming on. I wish we could have met face-to-face at the Encore in Boston. I think next year we'll be able to do that, but I appreciate that technology allows us to have these remote conversations. Stay safe, all the best to you and your family, and thanks again.

Ron: My pleasure, David. Good speaking with you.

Dave: And thank you for watching, everybody. This is theCUBE's coverage of the Vertica Virtual Big Data Conference. I'm Dave Vellante. We'll be right back after this short break.