Hello everybody, and thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled Vertica Next Generation Architecture. I'm Paige Roberts, open source relations manager at Vertica, and I'll be your host for this session. Joining me is Vertica Chief Architect Chuck Bear. Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait; just type your question or comment in the question box below the slides and click submit. So as you think of one, go ahead and type it in. There will be a Q&A session at the end of the presentation, where we'll answer as many questions as we can in the time we have. Any questions we don't get a chance to address, we'll do our best to answer offline. Alternatively, you can visit the Vertica forums to post your questions there after the session. Our engineering team is planning to join the forums and keep the conversation going, so it's a bit like the developer's lounge at the live conference: it gives you a chance to talk to our engineering team. Also, as a reminder, you can maximize your screen by clicking the double-arrow button in the lower right corner of the slides. And before you ask: yes, this virtual session is being recorded, and it will be available to view on demand this week. We'll send you a notification as soon as it's ready. Okay, now let's get started. Over to you, Chuck.

Thanks for the introduction, Paige. Vertica's vision is to help customers get value from structured data. This vision is simple. It doesn't matter what vertical the customer is in; they're all analytics companies. It doesn't matter what the customer's environment is, as data is generated everywhere. We also can't do this alone. We know that you need other tools and people to build a complete solution. Still, our database is key to delivering on the vision, because we need a database that scales. When you start a new database company, you aren't going to win against 30-year-old products on features. But from day one, we had something else: an architecture built for analytics performance. This architecture was inspired by the C-Store project, combining the best design ideas from academics and industry veterans like Dr. Mike Stonebraker. Our storage is optimized for performance. We use many computers in parallel. After over 10 years of refinement against various customer workloads, much of the design has held up. And serendipitously, the fact that we don't do in-place updates set Vertica up for success in the cloud as well. These days, there are other tools that embody some of these design ideas, but we have other strengths that are more important than the storage format. We're the only good analytics database that runs both on-premise and in the cloud, giving customers the option to migrate their workloads to the most convenient and economical environment. We're a full data management solution, not just a query tool. Unlike some other choices, ours comes with integration with a SQL ecosystem and full professional support. We organize our product roadmap into four key pillars, plus the cross-cutting concerns of open integration and performance and scale. We have big plans to strengthen Vertica while staying true to our core. This presentation is primarily about the separation pillar and performance and scale.
I'll cover our plans for Eon, our data management architecture, smart analytic clusters, our fifth-generation query executor, and our data storage layer.

Let's start with how Vertica manages data. One of the central design points for Vertica was shared nothing, a design that didn't utilize dedicated-hardware shared-disk technology. This quote here is how Mike put it politely, but around the Vertica office, shared disk was an "over my dead body," over Mike's dead body. And we did get some early field experience with shared disk. Customers will, in fact, run on anything if you let them. There were misconfigurations that required certified experts, obscure bugs, and expense. Another thing about the shared-nothing design for commodity hardware, though, and this was in the papers, is that all the data management features like fault tolerance, backup, and elasticity have to be done in software. And no matter how much you do, procuring, configuring, and maintaining the machines with disks is harder. The software configuration process to add more servers may be simple, but capacity planning, racking, and stacking are not. So the original allure of shared storage returned. This time, though, the complexity and economics are different. It's cheaper. You can provision storage with a few clicks and only pay for what you need. It expands, contracts, and moves the maintenance of the storage close to a team that's good at it. But there's a key difference: it's an object store, and object stores don't support the APIs and access patterns used by most database software. So another Vertica visionary, Ben, set out to exploit Vertica's storage organization, which turns out to be a natural fit for modern cloud shared storage. Because Vertica data files are written once and not updated, they match the object storage model perfectly.

And so today, we have Eon. Eon uses shared storage to hold Vertica data, with local-disk depots that act as caches, ensuring that we can get the performance our customers have come to expect. Essentially, Eon and Enterprise behave similarly, but we have the benefit of flexible storage. Today, Eon has the features our customers expect. It's been developed and tuned for years. We have successful customers such as Red Pharma, and if you'd like to know more about how Eon has helped them succeed in the Amazon cloud, I highly suggest reading their case study, which you can find on Vertica.com. Eon provides high availability and flexible scaling. Sometimes on-premise customers with local disks get a little jealous of how recovery and subclusters work in Eon, so we offer Eon on-premise, particularly on Pure Storage. But Enterprise also has strengths, the most obvious being that you don't need to add shared storage to run it. So naturally, our vision is to converge the two modes back into a single Vertica, a Vertica that runs any combination of local disks and shared storage, with full flexibility and portability. This is easy to say, but over the next releases, here's what we'll do. First, we realize that the query executor, optimizer, client drivers, and so on are already the same; just the transaction handling and data management are different. But there's already more going on. We have peer-to-peer depot operations and other inter-node transfers. Since Enterprise also has a network, we could just get files from remote nodes over that network, essentially mimicking the behavior and benefits of shared storage with a layer of software.
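To make that depot idea concrete, here is a minimal sketch of the read path just described; the ObjectStore, Depot, and fetch_file names are hypothetical stand-ins, not Vertica's actual code.

```python
# Minimal sketch of a depot acting as a local-disk cache in front of shared
# object storage (hypothetical classes, not Vertica internals).
import os
import shutil

class ObjectStore:
    """Stand-in for S3-style shared storage holding the master copies."""
    def __init__(self, root):
        self.root = root

    def download(self, file_name, dest_path):
        shutil.copyfile(os.path.join(self.root, file_name), dest_path)

class Depot:
    """Local-disk cache of immutable storage files."""
    def __init__(self, path, object_store):
        self.path = path
        self.object_store = object_store

    def fetch_file(self, file_name):
        local_path = os.path.join(self.path, file_name)
        if not os.path.exists(local_path):            # depot miss: pull from shared storage
            self.object_store.download(file_name, local_path)
        return local_path                             # depot hit: read at local-disk speed
```

Because the storage files are written once and never updated, a cached copy can never go stale, and eviction is always safe, since the master copy remains in shared storage.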
The only difference at the end of it will be which storage holds the master copy. In Enterprise, the nodes can't drop the files, because they are the master copy; whereas in Eon they can be evicted, because they're just a cache and the master is in shared storage. And in keeping with Vertica's current support for multiple storage locations, we can intermix these approaches at the table level.

Getting there is a journey, and we've already taken the first steps. One of the interesting design ideas of the C-Store paper is the idea that redundant copies don't have to have the same physical organization: different copies can be optimized for different queries, sorted in different ways. Of course, Mike also said to keep the recovery system simple, because it's hard to debug, and whenever the recovery system is being used, it's always a high-pressure situation. It turns out these ideas are in contradiction, and the latter one was better. Node-down performance suffers if you don't keep the storage the same. Recovery is harder if you have to reorganize data in the process. Even query optimization is more complicated. So over the past couple of releases, we got rid of non-identical buddies. But the storage files can still diverge at the bit level, because tuple mover operations aren't synchronized: the same record can end up in different files on different nodes. The next step in our journey is to make sure both copies are identical. This will help with backup and restore as well, because the second copy doesn't need to be backed up, or if it is backed up, it appears identical to the deduplication mechanism present in most backup systems. Simultaneously, we're improving the Vertica networking service to support this new access pattern. In conjunction with identical storage files, we will converge on a recovery system that's instantaneous. Nodes can process queries immediately, retrieving the data they need over the network from the redundant copies, as they do in Eon today, but with even higher performance. The final step, then, is to unify the catalog and transaction model. Related concepts such as segment and shard, or local catalog and shard catalog, will be coalesced, as they really represented the same concept all along, just in different modes. And the catalog will make slight changes to the definition of a projection, which represents the physical storage organization. The new definition simplifies segmentation, introduces a flexible granularity of sharding to support evolution over time, and offers a straightforward migration path from both Eon and Enterprise.

There's a lot more to our Eon story than just the architectural roadmap. If you missed yesterday's Vertica in Eon Mode presentation about supported clouds and on-premise storage options, replays are available. And be sure to catch the upcoming presentation on configuring Vertica in Eon mode. As we've seen with Eon, Vertica can separate data storage from the compute nodes, allowing machines to quickly fill in for each other to rebuild fault tolerance. But separating compute and storage is used for much, much more. We now offer powerful, flexible ways for Vertica to add servers and increase access to the data. In Vertica 9, this feature is called subclusters. It allows computing capacity to be added quickly and incrementally, and isolates workloads from each other.
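As a toy illustration of that isolation, here is a sketch of routing different workloads to different subclusters; the subcluster names and the route_session helper are hypothetical, and this is not Vertica's actual connection load balancing.

```python
# Toy sketch of workload isolation with subclusters (hypothetical names, not
# Vertica's connection load balancing): ETL sessions land on the always-on
# primary subcluster, while dashboards and ad-hoc analytics land on secondary
# subclusters that can be added or deprovisioned without disturbing anyone else.
from itertools import count

SUBCLUSTERS = {
    "primary_etl":  ["node01", "node02", "node03"],   # sized for the ETL workload
    "dashboards":   ["node04", "node05"],             # read-oriented secondary
    "data_science": ["node06", "node07", "node08"],   # elastic secondary
}

ROUTING = {"etl": "primary_etl", "dashboard": "dashboards", "adhoc": "data_science"}

_counters = {name: count() for name in SUBCLUSTERS}

def route_session(workload_type):
    """Pick a node for a new session, round-robin within its subcluster."""
    subcluster = ROUTING[workload_type]
    nodes = SUBCLUSTERS[subcluster]
    return nodes[next(_counters[subcluster]) % len(nodes)]

print(route_session("etl"), route_session("dashboard"), route_session("adhoc"))
```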
If your exploratory analytics team needs direct access to the source data, they need a lot of machines but not the same number all the time, and you don't 100% trust the kinds of queries and user-defined functions they might be using, subclusters are the solution. While there's much more extensive information available in our other presentations, I'd like to point out the highlights of our subcluster best practices. We suggest having a primary subcluster. This is the one that runs all the time if you're loading data around the clock. It should be sized for the ETL workload, and it also determines the natural shard count. Additional read-oriented secondary subclusters can be added for real-time dashboards, reports, and analytics. That way, subclusters can be added or deprovisioned without disruption to other users. The subcluster features of Vertica 9.3 are working well for customers. Yesterday, The Trade Desk presented their use case for Vertica, with subclusters running at scale in the cloud. If you missed their presentation, check out the replay.

But we have plans beyond subclusters. We're extending subclusters into real, separate clusters. For the Vertica savvy, this means the clusters won't share the same Spread ring network. This will provide further isolation, allowing clusters to control their own independent data sets while replicating all or part of the data from other clusters using a publish-subscribe mechanism. Synchronizing data between clusters is a feature customers want; we know of several who have built this for themselves. This vision affects our design for ancillary aspects: how we should assign resource pools and security policies, and how we should balance client connections. We'll also be simplifying our data segmentation strategy, so that when data that originated in different clusters meets, we'll still get fully optimized joins, even if those clusters aren't provisioned with the same number of nodes or shards.

Having a broad vision for data management is a key component of Vertica's success. But we also take pride in our execution strategy. When you start a new database from scratch, as we did 15 years ago, you won't compete on features. Our key competitive points were speed and scale in analytics. We set a target of 100x better query performance than traditional databases, with fast loads. Our storage architecture provides a solid foundation on which to build toward these goals. Every query starts with data retrieval. Keeping data sorted, organized by column, and compressed, and using adaptive caching, keeps data retrieval time and I/O to the bare minimum theoretically required. We also keep the data close to where it will be processed and use clusters of machines to increase throughput. We have partition pruning, a robust query optimizer, and value indexes. We use segmentation as part of the physical database design to keep records close to the other relevant records. So it's a solid foundation, but we also need optimal execution strategies and tactics.

One execution strategy, which we've used for a long time but is still a source of pride, is how we process expressions. Databases and other systems with general-purpose expression evaluators break a compound expression into a tree. Here I'm using A plus 1 times B as an example. During execution, the CPU traverses the tree and computes the subparts before the whole. Tree traversal often takes more compute cycles than the actual work to be done. Expression evaluation is a very common operation, so it's something worth optimizing.
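To picture that tree-of-boxes evaluation, here is a toy row-at-a-time interpreter for the same kind of expression; the Const, Column, and BinOp classes are hypothetical stand-ins, not Vertica's executor. The point is simply how much of each row's cost is traversal and dispatch rather than arithmetic.

```python
# Toy row-at-a-time expression interpreter (hypothetical, not Vertica's code).
class Const:
    def __init__(self, value): self.value = value
    def eval(self, row): return self.value

class Column:
    def __init__(self, name): self.name = name
    def eval(self, row): return row[self.name]

class BinOp:
    def __init__(self, op, left, right): self.op, self.left, self.right = op, left, right
    def eval(self, row):
        # Recursive traversal: two child calls and an operator dispatch per node, per row.
        a, b = self.left.eval(row), self.right.eval(row)
        return a + b if self.op == "+" else a * b

# A plus 1 times B, parsed as A + (1 * B)
expr = BinOp("+", Column("A"), BinOp("*", Const(1), Column("B")))
rows = [{"A": 3, "B": 4}, {"A": 10, "B": 2}]
print([expr.eval(r) for r in rows])   # one full tree walk per row
```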
One instinct engineers have is to use what we call just-in-time, or JIT, compilation, which means generating code for the CPU that is specific to the expression. In essence, this replaces the tree of boxes with a custom-made box for the query. This approach is complex to debug, but it can be made to work. It has other drawbacks, though. It adds a lot to query setup time, especially for short queries, and it pretty much eliminates the ability of mere mortals to develop user-defined functions. If you go back to the problem we're trying to solve, the source of the overhead is the tree traversal. If you increase the batch of records processed in each traversal step, this overhead is amortized until it becomes negligible. It's a perfect match for a columnar storage engine. This also sets the CPU up for efficiency. CPUs are particularly good at following the same small sequence of instructions in a tight loop. In some cases, the CPU may even be able to vectorize, applying the same processing to multiple records with the same instruction. This approach is easy to implement and debug, user-defined functions are possible, and it's generally aligned with the other complexities of implementing and improving a large system. More importantly, the performance, both in terms of query setup and record throughput, is dramatically improved.

You'll hear me say that we look at research and industry for inspiration. In this case, our findings were in line with the academic findings. If you'd like to read papers, I recommend "Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask." And yes, we did have this idea before we read that paper. However, not every decision we made in the Vertica executor stood the test of time as well as the expression evaluator. For example, sorting and grouping aren't as susceptible to vectorization, because sort decisions interrupt the flow. We have used JIT compilation there for years, since Vertica 4.1, and it provides modest speed-ups. But we know we can do even better.

So we've embarked on a new design for our execution engine, which I call EE5 because it's our fifth. It's redesigned especially for the cloud. Now I know what you're thinking. You're thinking I just put up a slide with an old engine, a new engine, and a sleek plane headed up into the clouds. But this isn't just marketing hype. Here's what I mean when I say we've learned lessons over the years and that we're redesigning the executor for the cloud. And of course, you'll see that the new design works well on-premise as well; these changes are just more important for the cloud. Starting with the network layer: in the cloud, we can't count on all nodes being connected to the same switch, and multicast doesn't work like it does in a custom data center. So, as I mentioned earlier, we're redesigning the network transfer layer for the cloud. Storage in the cloud is different, and I'm not referring here to the storage of persistent data, but to the storage of temporary data used only once during the course of query execution. Our new patterns are designed to take into account the strengths and weaknesses of cloud object storage, where we can't easily make multiple passes over the data. Moving on to memory: some of our access patterns are reasonably effective on bare-metal machines, but aren't the best choice on cloud hypervisors, which add overhead to page faults and TLB misses. Here again, we found we can improve performance on dedicated hardware, and even more in the cloud.
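Coming back for a moment to the batch-at-a-time idea described above, here is a sketch of the same toy expression evaluated over a columnar batch instead of row by row; again, the ConstVec, ColumnVec, and BinOpVec names are hypothetical, not Vertica's code.

```python
# Toy batch-at-a-time evaluator (hypothetical, not Vertica's code): each tree node
# processes a whole batch of column values per traversal step, so traversal and
# dispatch overhead is amortized across the batch.
class ConstVec:
    def __init__(self, value): self.value = value
    def eval(self, batch): return [self.value] * len(next(iter(batch.values())))

class ColumnVec:
    def __init__(self, name): self.name = name
    def eval(self, batch): return batch[self.name]

class BinOpVec:
    def __init__(self, op, left, right): self.op, self.left, self.right = op, left, right
    def eval(self, batch):
        a, b = self.left.eval(batch), self.right.eval(batch)
        # One tree visit per batch; the per-record work sits in a tight loop that
        # the CPU (or a SIMD-friendly library) can execute efficiently.
        if self.op == "+":
            return [x + y for x, y in zip(a, b)]
        return [x * y for x, y in zip(a, b)]

batch = {"A": [3, 10, 7], "B": [4, 2, 5]}   # columnar batch: one list per column
expr = BinOpVec("+", ColumnVec("A"), BinOpVec("*", ConstVec(1), ColumnVec("B")))
print(expr.eval(batch))
```

One traversal now covers the whole batch, which is what makes the overhead negligible for reasonably sized batches.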
Finally, and this is true in all environments, core counts have gone up, and not all of our algorithms take full advantage. There's a lot of ground to cover here, but I think sorting is a perfect example to illustrate these points. I mentioned that we use JIT in sorting. We're getting rid of JIT in favor of a data format that can be compared efficiently, independent of what the data types are. We've drawn on the best, most modern technology from academia and industry. We've done our own analysis and testing. And you know what we chose? We chose parallel merge sort. Anyone want to take a guess when merge sort was invented? It was invented in 1948, or at least documented that way in a computing context. If you've heard me talk before, you know that I'm fascinated by how all the things I worked with as an engineer were invented before I was born, and that at Vertica we don't use the newest technologies, we use the best ones. What is novel about Vertica is the way we combine the best ideas together into a cohesive package.

So, all kidding about the 1940s aside, our EE redesign is actually state-of-the-art. How do we know if a sort routine is state-of-the-art? It turns out there's a pretty credible benchmark over at the appropriately named sortbenchmark.org. Anyone with resources looking for fame for their product or academic paper can try to set the record. The record was last set in 2016 by Tencent Sort: 100 terabytes in 99 seconds. Setting the record is hard; you have to come up with hundreds of machines on a dedicated high-speed switching fabric. But while there's a lot to a distributed sort, they all have core sorting algorithms. The authors of the paper conveniently broke out the time spent in their sort: 67 out of 99 seconds went to node-local sorting. If we break this out, divided by two CPUs in each of 512 nodes, we find that each CPU sorted almost a gig and a half per second. This is for what's called an Indy sort. Like an Indy race car, it isn't general purpose: it only handles 100-byte records with 10-byte keys. If the record lengths can vary, then it's called a Daytona sort. The Tencent Daytona sort is a little slower: 1.15 gigabytes per second per CPU. Now, for Vertica, we have a wide variability in record sizes and more interesting data types, but still, there's no harm in setting our sights on numbers comparable to the world record. On my 2017-era AMD desktop CPU, the Vertica EE5 sort does about 2.5 gigabytes per second. Obviously, this test isn't apples to apples, because they used their own OpenPOWER chips. But the number of DRAM channels is the same, so it's pretty close. It's the kind of number that says we've hit on the right approach. And it performs this way on-premise and in the cloud, and we can adapt it to cloud temp space.

So what's our roadmap for integrating EE5 into the product? I like to compare replacing the query executor of a database to replacing the crankshaft and other parts of the engine of a car while it's being driven. We've actually done it before, between earlier executor generations, and we never really stopped changing it, so we'll do it again. The first part we're replacing is an algorithm called storage merge, which combines sorted data from disk. The first enhancements to that are in Vertica 10. In coming 10.x point releases, we'll use EE5 for a resegmented storage merge, and then convert sorting and grouping to the new algorithms.
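As a quick check of the per-CPU arithmetic quoted above, using the figures as stated (100 terabytes, 67 of the 99 seconds spent in node-local sorting, 512 nodes with 2 CPUs each):

```python
# Back-of-the-envelope check of the Tencent Sort per-CPU figure quoted above.
total_bytes = 100e12            # 100 terabytes sorted
nodes, cpus_per_node = 512, 2
local_sort_seconds = 67         # portion of the 99 seconds spent sorting locally

per_cpu_rate = total_bytes / (nodes * cpus_per_node) / local_sort_seconds
print(f"{per_cpu_rate / 1e9:.2f} GB/s per CPU")   # ~1.46, "almost a gig and a half"
```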
Here are the performance results so far. In cases where the Vertica executor is doing well today, simple environments with simple data patterns, such as the simple query shown here, there's a modest speed-up. When we ship the resegmentation code, which didn't quite make the freeze, there's a much nicer bump. In the longer term, when we've fused grouping into the storage merge operation, we'll get to where we think we ought to be, given the theoretical minimum work the CPUs need to do. Now, if we look at a case where the current executor isn't doing as well, we see there's a much stronger benefit from the code shipping in Vertica 10. In fact, I've turned the bar chart sideways to try to help you see the difference better. This case will also benefit from the improvements in 10.x point releases and beyond. We have a lot happening in the Vertica query executor. That was just a taste, but now I'd like to switch to the roadmap for our storage access layer.

I'll start with a story about how our storage access layer evolved. If you go back to the academic ideas, the C-Store paper that persuaded investors to fund Vertica, the read-optimized store was the part that had substantiation in the form of performance data. Much of the paper was speculative, but we tried to follow it anyway. That paper talked about the WS and the RS, the write store and the read store, how they worked together for transactional processing, and how there was a tuple mover. In all honesty, Vertica engineers couldn't figure out from the paper what to do in the next interesting cases you'd want to try, and when we asked Sam, Dave, and Mike, we never got enough clarification to build it that way. So here's what we built instead. We built the ROS, the read-optimized store, which is actually on its fifth major revision. It's sorted, columnar, and compressed, and it follows the table partitioning. It worked even better than the RS described in the paper. We also built the WOS, the write-optimized store. We built four versions of this over the years, actually, but this was the best one. It's not a set of interrelated B-trees; it's just an append-only, insertion-order memory array: no sorting, no compression, row-based, no partitioning. There is, however, a tuple mover, which does what we call move-out: it moves the data from the WOS to the ROS, sorting and compressing it along the way.

Let's take a moment to compare how they behave. When you load data directly to the ROS, there's a data parsing operation, then we finish the sorting, and then compress and write out the columnar data files to stable storage. The next query that comes through executes against the ROS, and it runs as it should, because the ROS is read-optimized. Let's repeat the exercise for the WOS. The load operation responds before the sorting and compressing, and before the data is written to persistent storage. Now it's possible for a query to come along, and the query could be responsible for sorting the WOS data in addition to its other processing. The effect on queries isn't predictable until the tuple mover comes along and writes the data to the ROS.
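A simplified sketch of the two load paths just described may help; the function names and the in-memory stand-ins are hypothetical, not Vertica's code, but they show why a WOS load returns quickly while leaving the sort, compress, and write work for the tuple mover's move-out.

```python
# Simplified contrast of direct-to-ROS loads versus WOS loads with a later
# tuple mover move-out (hypothetical sketch, not Vertica internals).
import zlib

ros_files = []   # stand-in for sorted, compressed files on stable storage
wos = []         # stand-in for the append-only, insertion-order memory array

def load_to_ros(records, sort_key):
    data = sorted(records, key=sort_key)                  # finish the sorting
    ros_files.append(zlib.compress(repr(data).encode()))  # compress, write out
    # control returns only after the data is query-optimized and durable

def load_to_wos(records):
    wos.extend(records)   # append only: no sorting, no compression, not durable
    # control returns immediately; queries must cope with unsorted in-memory data

def tuple_mover_moveout(sort_key):
    if wos:
        load_to_ros(list(wos), sort_key)   # move-out turns WOS contents into ROS files
        wos.clear()

load_to_wos([(3, "c"), (1, "a")])
tuple_mover_moveout(sort_key=lambda record: record[0])
print(len(ros_files), len(wos))            # 1 ROS file written, WOS emptied
```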
Over the years, there have been a lot of comparisons between the ROS and the WOS. The ROS has always been better for sustained load throughput: it achieves much higher records per second without pushing back against the client, and it has since Vertica 4, when we developed the first usable merge-out algorithm. The ROS has always been better for predictable query performance. And the ROS has never had the same management complexity and limitations as the WOS: you don't have to pick a memory size and figure out which transactions get to use the pool. The non-persistent nature of the WOS has also always caused headaches when there are unexpected cluster shutdowns. We also looked at field usage data, and we found that few customers were using the WOS, especially among those that studied the issue carefully. So we set out on a mission to improve the ROS to the point where it was always better than both the WOS and the ROS of the past. And now it's true: the ROS of today is better than both of them as they were a couple of years ago. We implemented storage bundling, better catalog object storage, and better tuple mover merge-outs. And now, after extensive QA and customer testing, we know we've succeeded. In Vertica 10, we've removed the WOS.

Let's talk for a moment about simplicity. One of the best things Mike Stonebraker said is, "No knobs." Anyone want to guess how many knobs we got rid of when we took the WOS out of the product? 22. There were five knobs to control whether data went to the WOS or the ROS, six controlling the WOS itself, six more to set policies for the tuple mover move-out, and so on. In my honest opinion, there still wasn't enough control over it to achieve success in a multi-tenant environment. So a big reason to get rid of the WOS was simplicity: make the lives of DBAs and users better. We have a long way to go, but we're doing it. On my desk, I keep a jar with a knob in it for each knob in Vertica. When developers add a knob to the product, they have to add a knob to the jar. When they remove a knob, they get to choose one to take out. We have a lot of work to do, but I'm thrilled to report that, in 15 years, Vertica 10 is the first release where the number of knobs ticked downward.

But back to the WOS. I've saved the most important reason to get rid of it for last: so we can deliver our vision of the future to our customers. Remember how I said that with Eon and subclusters we got all these benefits from shared storage? Guess what can't live in shared storage? The WOS. Remember how I said a big part of the future was keeping the redundant copies identical to the primary copy? Independent actions of the WOS and the tuple mover are at the root of the divergence between copies of the data. You have to admit it when you're wrong. The WOS was in the original design, but it hasn't held up to the test of time. We held onto the idea of a separate WOS and ROS for too long. In Vertica 10, we can finally bid it good riddance.

I've covered a lot of ground, so let's put all the pieces together. I've talked a lot about our vision and how we're achieving it, but we also still pay attention to tactical details. We've been fine-tuning our memory management model to enhance performance. It involves revisiting tens of thousands of lines of code, much like painting the inside of a large building with small paint brushes. But we're getting results. As shown in the chart, in Vertica 9, concurrent monitoring queries use memory from the global catalog pool; in Vertica 10, they don't. This is only one example of an important detail we're improving. We've also reworked the monitoring tables, split out our network messages into two parts, and increased the data we're collecting and analyzing in our quality assurance processes. We're improving on everything. As the story goes, I still have my grandfather's axe. Of course, my father had to replace the handle, and I had to replace the head. Along the same lines, we still have Mike Stonebraker's Vertica.
We did replace the query optimizer twice and the database designer and storage layer four times each, and the query executor is now on its fifth redesign. When I charted out how our code has changed over the last few years, I found that we don't have much from a long time ago. I did some digging, and you know what we have left from 2007? We have the original curly braces and a little bit of Postgres code for handling dates and times. To deliver on our mission, to help customers get value from their structured data with high performance, at scale, and in diverse deployment environments, we have a sound architectural roadmap, we use the best execution strategies, and we have solid tactics. On the architectural front, we're converging Eon and Enterprise, and we're extending smart analytic clusters. In query processing, we're redesigning the execution engine for the cloud, as I've told you. And there's a lot more than just a fast engine, if you want to learn more: new native support for complex data types, improvements to the query optimizer and statistics, and extensions to live aggregate projections and flattened tables. And we continue to stay on top of the details, from low-level CPU and memory tools, to monitoring and management, to developing tighter feedback cycles between development, QA, and customers. Don't forget to check out the rest of the pillars of our roadmap. We have new, easier ways to get started with Vertica in the cloud. Engineers have been hard at work on machine learning and security. It's easier than ever to use Vertica with third-party products, and the variety of tool integrations continues to increase. Finally, the most important thing we can do to help people get value from structured data is to help people learn more about Vertica. So hopefully I've left plenty of time for Q&A at the end of this presentation, and I hope to hear your questions soon.