Hello everybody, and thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled "Vertica in Eon Mode: Past, Present, and Future." I'm Paige Roberts, open source relations manager at Vertica, and I'll be your host for this session. Joining me are Vertica engineer Yuanzhe Bei and Vertica product manager David Sprogis. Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait until the end. Just type your question or comment as you think of it in the question box below the slides and click Submit. In the Q&A session at the end of the presentation, we'll answer as many of your questions as we're able to during that time, and any questions we don't address, we'll do our best to answer offline. If you wish, after the presentation you can visit the Vertica forums to post your questions there, and our engineering team is planning to join the forums to keep the conversation going, just like a dev lounge at a normal in-person BDC. As a reminder, you can maximize your screen by clicking the double-arrow button in the lower right corner of the slides if you want to see them bigger. And yes, before you ask, this virtual session is being recorded and will be available to view on demand this week. We'll send you a notification as soon as it's ready. All right, let's get started. Over to you, Dave.

Thanks, Paige. Hey everybody, let's start with a timeline of the life of Eon mode. About two years ago, a little bit less than two years ago, we introduced Eon mode on AWS, pretty specifically for the purpose of rapid scaling to meet the cloud economics promise. It wasn't long after that we realized that workload isolation, a byproduct of the architecture, was very important to our users, and going to the third tick, you can see that the importance of that workload isolation was manifest in Eon mode being made available on-premise using Pure Storage FlashBlade.
Moving to the fourth tick mark, we took steps to improve workload isolation with a new type of subcluster, which Yuanzhe will go through, and at the fifth tick mark, the introduction of secondary subclusters for faster scaling, and other improvements which we will cover in the slides to come. Getting started with why we created Eon mode in the first place, let's imagine that your database is this pie. It's a pecan pie, and we're loading pecan data in through the ETL cutting board in the upper left-hand corner. We have a couple of free-floating pecans, which we might imagine to be data supporting external tables. As you know, Vertica has a query engine capability as well, which we call external tables. And so if we imagine this pie, we want to serve it with a number of servers. Well, let's say we wanted to serve it with three servers, three nodes. We would need to slice that pie into three segments, and we would serve each one of those segments from one of our nodes. Now, because the data is important to us and we don't want to lose it, we're going to be saving that data on some kind of RAID storage, or redundant storage, so that in case one of the drives goes bad, the data remains available because of the durability of RAID. Imagine also that we care about the availability of the overall database. Imagine a node goes down, perhaps the second node goes down. We still want to be able to query our data, and through nodes one and three we still have all three shards covered. We can do this because of buddy projections: each node's neighbor contains a copy of the data from the node next to it, and so in this case node one is sharing its segment with node two, so node two can cover node one, node three can cover node two, and node one can cover node three. Adding a little bit more complexity, we might store the data in multiple copies, each copy sorted for a different kind of query.
We call these projections in Vertica, and for each projection we have another copy of the data, sorted differently. Now it gets complex. What happens when we want to add a node? Well, if we wanted to add a fourth node here, what we would have to do is figure out how to re-slice all of the data in all of the copies that we have. In effect, what we want to do is take our three slices and cut them into four, which means taking a portion of each of our existing thirds and resegmenting into quarters. Now, that looks simple in the graphic here, but when it comes to moving data around, it becomes quite complex, because for each copy of each segment, we need to re-slice it and move that data onto the new node. What's more, the fourth node can't hold the buddy copy of its own data; that would be problematic in case it went down. Instead, we need that buddy to be sitting on another node, a neighboring node, so we need to reorient the buddies as well. All of this takes a lot of time. It can take 12, 24, or even 36 hours, a period during which you do not want your database under high demand. In fact, you may want to stop loading data altogether in order to speed it up. This is a planned event, and your applications should probably be taken down during this period, which makes it difficult. With the advent of cloud computing, we saw that services were coming up and down faster, and we determined to re-architect Vertica in a way that accommodates that rapid scaling. Let's see how we did it. So, let's start with four nodes now. We've got our four-node database. Let's add communal storage and move each of the segments of data into communal storage. That's the separation that we're talking about. What happens if we run queries against it? Well, it turns out that the communal storage is not necessarily performant, and so the I/O would be slow, which would make the overall queries slow. In order to compensate for the low performance of communal storage, we need to add back local storage.
Now, it doesn't have to be RAID, because this is just an ephemeral copy, but with the data files local to the node, the queries will run much faster. In AWS, communal storage really does mean an S3 bucket, and here's a simplified version of the diagram. Now, do we need to store all of the data from the segment in the depot? The answer is no, and the graphic inside the bucket has changed to reflect that. It looks more like a bullseye, showing just a segment of the data being copied to the cache, or the depot, as we call it, on each one of the nodes. How much data do you store on the node? It would be the active data set: the last 30 days, the last 30 minutes, or the last whatever period of time you're working with. The active working set is the hot data, and that's how large you want to size your depot. By architecting this way, when you scale up, you're not resegmenting the database. What you're doing is adding more compute and more subscriptions to the existing shards of the existing database. In this case, we've added a complete set of four nodes, so we've doubled our capacity and we've doubled our subscriptions, which means that now two nodes can serve the yellow shard, two nodes can serve the red shard, and so on. In this way, we're able to run twice as many queries in the same amount of time, so you're doubling the concurrency. How high can you scale? Can you scale to 3x, 5x? We tested this in the graphic on the right, which shows concurrent users on the x-axis by the number of queries executed in a minute along the y-axis. We've grouped execution in runs of 10 users, 30 users, 50, 70, up to 150 users.
Now, focusing on any one of these groups, particularly up around 150, you can see through the three bars, starting with the bright purple bar (three nodes and three segments), that as you add nodes, to the middle purple bar (six nodes and three segments), you've almost doubled your throughput, up to the dark purple bar, which is nine nodes and three segments. Our tests show that you can go to 5x with a pretty linear performance increase. Beyond that, you do continue to get an increase in performance, but your incremental performance begins to fall off. The Eon architecture does something else for us: it provides high availability, because each of the nodes can be thought of as ephemeral, and in fact each node has a buddy subscription, in a way similar to the prior architecture. So if we lose node four, we're losing the node responsible for the red shard, and now node one has to pick up responsibility for the red shard while that node is down. When a query comes in, let's say it comes into node one and node one is the initiator, then node one will look for participants. It'll find a blue shard and a green shard, but when it's looking for the red, it finds itself, and so node number one will be doing double duty. This means that your performance will be cut approximately in half for the query. This is acceptable until you're able to restore the node; once you restore it, and once the depot becomes rehydrated, your performance goes back to normal. So this is a much simpler way to recover nodes in the event of node failure. By comparison, in Enterprise mode, the older architecture, when we lose the fourth node, node one takes over responsibility for serving both the yellow shard and the red shard, but it is also responsible for rehydrating the entire data segment of the red shard to node four. This can be very time consuming and imposes even more stress on the first node, so performance will go down even further.
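The buddy-subscription failover described above can be sketched in a few lines of Python. This is an illustrative model only, not Vertica's actual subscription logic; the shard names and the neighbor-buddy layout are assumptions taken from the pie example.

```python
# Illustrative model of Eon buddy subscriptions (not Vertica's implementation).
# Each node primarily subscribes to one shard; its next neighbor holds a
# buddy subscription, so shard i can be served by node i or node i+1 (mod n).
SHARDS = ["blue", "green", "yellow", "red"]

def build_subscriptions(num_nodes):
    """Map each shard to its subscriber list: [primary node, buddy node]."""
    return {SHARDS[i]: [i, (i + 1) % num_nodes] for i in range(num_nodes)}

def plan_query(subs, down_nodes):
    """For each shard, pick the first subscribed node that is still up."""
    plan = {}
    for shard, nodes in subs.items():
        live = [n for n in nodes if n not in down_nodes]
        if not live:
            raise RuntimeError(f"shard {shard!r} has no live subscriber")
        plan[shard] = live[0]
    return plan

subs = build_subscriptions(4)
print(plan_query(subs, down_nodes=set()))  # each node serves its own shard
print(plan_query(subs, down_nodes={3}))    # node 0 also covers the red shard
```

With node four (index 3) down, node 0 appears twice in the plan, doing the double duty described above until the node is restored and its depot rehydrated.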
Eon mode has another feature: you can scale down completely to zero. We call this hibernation. You shut down your database, and your database will maintain full consistency in a rest state in your S3 bucket. Then, when you need access to your database again, you simply recreate your cluster and revive your database, and you can access your database once again. That concludes the rapid scaling portion of why we created Eon mode. To take us through workload isolation is Yuanzhe Bei. Yuanzhe?

Thanks, Dave, for presenting how Eon works in general. In the next section, I will show you another important capability of Vertica Eon mode: workload isolation. Dave used a pecan pie as an example of the database. Now let's say it's time for the main course. Does anyone still have a problem with food touching on their plate? Parents know that it's a common problem for kids. Well, we have a similar problem in databases as well. There could be multiple different workloads accessing your database at the same time. Say you have ETL jobs running regularly, while at the same time there are dashboards running short queries against your data. You may also have the end-of-month report running, and there can be ad hoc data scientists connecting to the database and doing whatever data analysis they want to do, and so on. How to make these mixed workloads not interfere with each other is a real challenge for many DBAs. Vertica Eon mode provides you the solution. I'm very excited here to introduce you to an important concept in Eon mode called subclusters. In Eon mode, nodes belong to predefined subclusters rather than the whole cluster. DBAs can define different subclusters for different kinds of workloads and redirect those workloads to the specific subclusters. For example, you can have an ETL subcluster, a dashboard subcluster, a report subcluster, and an analytics/machine learning subcluster. Vertica's Eon subclusters are designed to achieve three main goals.
First of all, strong workload isolation. That means any operation in one subcluster should not affect or be affected by other subclusters. For example, say the subcluster running the report is quite overloaded, or data scientists are running heavy analytic and machine learning jobs on the analytics subcluster, making it very slow, even stuck or crashed. In such a scenario, your ETL and dashboard subclusters should not be impacted, or should at least be only minimally impacted, by this crisis, which means your ETL jobs should not lag behind, and dashboards should respond in a timely manner. We have done a lot of improvements as of the 10.0 release, and we will continue to deliver improvements in this category. Secondly, fully customizable subcluster settings. That means any subcluster can be set up and tuned for very different workloads without affecting other subclusters. Users should be able to tune certain parameters up or down based on the actual needs of the individual subcluster's workload. As of today, Vertica already supports a few settings that can be done at the subcluster level, for example the depot pinning policy, and we will continue extending more settings, like the resource pool knobs, in the near future. Lastly, Vertica subclusters should be easy to operate and cost-efficient. What that means is that subclusters should be able to be turned on, turned off, added, or removed, and should be available for use according to a rapidly changing workload. Let's say in this case you want to spin up more dashboard subclusters because you need higher query throughput. You can do that. You might need to run several report subclusters because you might want to run multiple reports at the same time. On the other hand, you can shut down your analytics/machine learning subcluster because no data scientists need to use it at this moment.
We also made a lot of improvements in this category, which I will explain in detail later, and one of the ultimate goals is to support auto-scaling. To sum up, what we really want to deliver with subclusters is very simple. You just need to remember that accessing a subcluster should be just like accessing an individual cluster, while these subclusters share the same catalog, so you don't have to work on stale data and don't need to worry about data synchronization. That would be a nice goal. Vertica's upcoming 10.0 release is certainly a milestone towards that goal, which will deliver a large part of the capability in this direction, and we will continue to improve it after the 10.0 release. In the next couple of slides, I will highlight some issues about workload isolation in the initial Eon release and show you how we resolved these issues. First issue: when we initially released Eon, what we called subcluster mode was implemented using fault groups. While fault groups and subclusters have something in common (yes, they are both defined as a set of nodes), they are very different in all the other ways. That was very confusing in the first place when we implemented this. As of the 9.3.0 version, we decided to detach the subcluster definition from fault groups, which enabled us to further extend the capabilities of subclusters. Fault groups in pre-9.3 versions will be converted into subclusters during the upgrade, and this was a very important step that enabled us to provide all the amazing improvements on subclusters that follow. The second issue in the past was that it was hard to control the execution groups for different types of workloads. There are two types of problems here, and I will use some examples to explain. The first is about controlling group size.
Say you allocate six nodes for your dashboard subcluster, and what you really want is, on the left, three pairs of nodes as three execution groups, where each pair of nodes subscribes to all four shards. However, that's not really what you get. What you really get is on the right side: the first four nodes subscribe to one shard each, and the remaining two nodes subscribe to the two dangling shards. You won't really get three execution groups, but instead only get one, and the two extra nodes have no value at all. The solution is to use subclusters. Instead of having one subcluster with six nodes, you can split it up into three smaller ones. Each subcluster is guaranteed to subscribe to all the shards, and you can further handle these three subclusters using a load balancer across them. In this way, you achieve three real execution groups. The second issue is that session participation is nondeterministic. Any session will just pick four random nodes from the subcluster, as long as they cover each shard. In other words, you don't really know which set of nodes will make up your execution group. What's the problem? In this case, the fourth node will be double-booked by two concurrent sessions. You can imagine that the resource usage will be unbalanced, and both queries' performance will suffer. What is even worse is that if the queries of the two concurrent sessions target different tables, depot efficiency will be reduced, because both sessions will try to fetch the files of the two tables into the same depot. If the depot is not large enough, they will evict each other, which will be very bad. You can solve this the same way, by declaring subclusters, in this case two subclusters, and a load balancer group across them. The reason this solves the problem is that session participation will not go across the subcluster boundary. There won't be a case where any node is double-booked.
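A toy model makes it easy to see why splitting the six nodes into three two-node subclusters yields three execution groups. This sketch assumes round-robin shard placement inside a subcluster; it is not Vertica's actual subscription algorithm.

```python
# Toy model: shards are handed out round-robin inside a subcluster, and an
# execution group is a run of nodes that together covers every shard.
NUM_SHARDS = 4

def subscriptions(num_nodes):
    """Round-robin shard placement across the nodes of one subcluster."""
    subs = {n: set() for n in range(num_nodes)}
    for i in range(max(num_nodes, NUM_SHARDS)):
        subs[i % num_nodes].add(i % NUM_SHARDS)
    return subs

def execution_groups(subs):
    """Greedily count disjoint node sets that each cover all four shards."""
    groups, covered = 0, set()
    for shards in subs.values():
        covered |= shards
        if len(covered) == NUM_SHARDS:
            groups, covered = groups + 1, set()
    return groups

# One six-node subcluster: nodes 0-3 get one shard each, while nodes 4-5
# only re-subscribe to the two "dangling" shards -> a single execution group.
print(execution_groups(subscriptions(6)))                         # 1

# Three two-node subclusters: each node holds two shards, so every pair
# covers all four shards -> three independent execution groups.
print(sum(execution_groups(subscriptions(2)) for _ in range(3)))  # 3
```

The same model also shows why sessions confined to one subcluster can never double-book a node in another: each two-node subcluster is a self-contained execution group.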
In terms of the depot, if you use subclusters and avoid using a load balancer group, and carefully send the first workload to the first subcluster and the second to the second subcluster, the result is that depot isolation is achieved. The first subcluster will maintain the data files for the first query and doesn't need to worry about the files being evicted by the second kind of session. Here comes the next issue: scaling down. In the old way of defining subclusters, you may have several execution groups in the subcluster, and you want to shut down one or two execution groups to save cost. Well, here comes the pain, because you don't know which nodes may be used by which session at any point. It is hard to find the right timing to hit the shutdown button on any of those instances. If you do and get unlucky, say in this case you pull down the first four nodes, one of the sessions will fail because it was participating on node 2 and node 4 at that point. Users of that session will notice, because the query fails. We know that for many businesses, this is a critical problem and not acceptable. Again, with subclusters this problem is resolved, for the same reason: sessions cannot go across the subcluster boundary. All you need to do is first prevent queries from being sent to the first subcluster, and then you can shut down the instances in that subcluster. You are guaranteed not to break any running sessions. Now you are happy, and you want to shut down more subclusters. Then you hit issue 4: the whole cluster will go down. Why? Because the cluster loses quorum. As a distributed system, you need to have more than half of the nodes up in order to commit and keep the cluster up. This is to prevent that kind of divergence from happening, which is important. But you still want to shut down those nodes, because what's the point of keeping those nodes up if you are not using them and letting them cost you money? Vertica has a solution.
You can define a subcluster as secondary to allow it to shut down without worrying about quorum. In this case, you can define the first three subclusters as secondary and the fourth one as primary. By doing so, these secondary subclusters will not count towards the quorum, because we changed the rule. Now, instead of requiring more than half of the nodes to be up, it only requires more than half of the primary nodes to be up. Now you can shut down your second subcluster, and even shut down your third subcluster as well, and keep the remaining primary subcluster still running healthily. There are actually more benefits to defining secondary subclusters, in addition to the quorum concern. Because the secondary subclusters no longer have voting power, they don't need to persist the catalog anymore. This means those nodes are faster to deploy and can be dropped and re-added without the need to worry about the catalog's persistence. In most cases, for subclusters that only need to serve read-only queries, the best practice is to define them as secondary. Commits will be faster on the secondary subclusters as well, so queries running on a secondary subcluster will have fewer spikes. The primary subcluster, as usual, handles everything and is responsible for consistency. Some background tasks will be running there, so the DBA should make sure that the primary subcluster is stable and running all the time. Of course, you need at least one primary subcluster in your database. Now, with secondary subclusters, users can start and stop them as they need, which is very convenient. This brings up another issue: what if there is an ETL transaction running and, in the middle of it, a subcluster starts and comes up? In older versions, there was no catalog re-sync mechanism to keep the new subcluster up to date, so Vertica rolled back the ETL session to keep the data consistent.
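The primary-only quorum rule described above can be illustrated with a tiny sketch; the node counts here are assumed for illustration.

```python
# Sketch of the quorum rule for primary vs. secondary subclusters.
# Only primary nodes vote: the cluster stays up while more than half of
# the PRIMARY nodes are up, no matter how many secondaries are shut down.
def has_quorum(primary_up, primary_total):
    return primary_up > primary_total / 2

# Example: one 4-node primary subcluster plus three 4-node secondaries.
# Shutting down all 12 secondary nodes leaves quorum intact:
print(has_quorum(primary_up=4, primary_total=4))   # True
# Under the old rule, all 16 nodes voted, so 4 of 16 up meant no quorum:
print(has_quorum(primary_up=4, primary_total=16))  # False
```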
This is actually quite disruptive, because real-world ETL workloads can sometimes take hours, and rolling back at the end means a large waste of resources. We resolved this issue in the 9.3.1 version by introducing a catalog re-sync mechanism for when such a situation happens. The ETL transaction will not roll back anymore, but will instead take some time to re-sync the catalog and then commit, and the problem is resolved. The last issue I would like to talk about is subscription. Especially for a large subcluster, when you start it, the startup time is quite long, because the subscription commits used to be serialized. In one of our internal tests with large catalogs, committing a subscription on a primary node can take, you can imagine, five minutes. A secondary subcluster is better, because it doesn't need to persist the catalog during the commit, but it still takes about two seconds to commit. So what's the problem here? Let's do the math and look at this chart. The x-axis is the time in minutes, and the y-axis is the number of nodes to be subscribed. The dark blue represents the primary subcluster, and the light blue represents the secondary subcluster. Let's say the subcluster has 16 nodes in total. If you start the secondary subcluster, it will spend about 30 seconds in total, because 2 seconds times 16 is 32 seconds. That's not actually such a long time, but when you start a secondary subcluster, you expect it to be super fast, to react to the fast-changing workload, and then 30 seconds is no longer trivial. And what is even worse is on the primary subcluster side, because the commit is much longer: five minutes, let's assume. Then at the point when you are committing the sixth node's subscription, all the other nodes have already waited 30 minutes for the GCLX, also known as the global catalog lock, and Vertica will crash the nodes if any node cannot get the GCLX for 30 minutes. So the end result is that your whole database crashes.
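The arithmetic above can be checked with a quick back-of-the-envelope sketch; the per-commit times and the 30-minute GCLX timeout are the assumed figures from the example, not measured constants.

```python
# Back-of-the-envelope math for serialized subscription commits at startup.
NODES = 16
SECONDARY_COMMIT_S = 2        # assumed per-node commit time, secondary
PRIMARY_COMMIT_S = 5 * 60     # assumed per-node commit time, primary
GCLX_TIMEOUT_S = 30 * 60      # global catalog lock timeout from the example

# Serialized: total startup time grows linearly with the node count.
secondary_total = SECONDARY_COMMIT_S * NODES   # 32 s: slow for rapid scaling
primary_total = PRIMARY_COMMIT_S * NODES       # 4800 s: far past the timeout

# The nodes still waiting hit the 30-minute GCLX timeout around the sixth
# serialized primary commit, and the whole database goes down.
commits_until_timeout = GCLX_TIMEOUT_S // PRIMARY_COMMIT_S
print(secondary_total, primary_total, commits_until_timeout)  # 32 4800 6

# The 10.0 fix batches the subscriptions so all nodes commit concurrently,
# taking roughly one commit's duration (~5 min primary, ~2 s secondary)
# regardless of subcluster size.
```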
That's a serious problem, and we know that, and that's why we already planned the fix for 10.0, so that all the subscriptions will be batched up and all the nodes will commit at the same time, concurrently. By doing that, you can imagine the primary subcluster can finish committing in five minutes instead of crashing, and the secondary subcluster can finish in just seconds. Now, to summarize the highlights of the improvements we have made as of 10.0, I hope you are already excited about the emerging Eon deployment pattern shown here. A primary subcluster that handles data-loading ETL jobs and Tuple Mover jobs is the backbone of the database, and you keep it running all the time. At the same time, you define different secondary subclusters for different workloads, provision them when the workload requirement arrives, and then deprovision them when the workload is done, to save on operational cost. So, can't wait to play with subclusters? Here are some admintools commands that you can start using, and check out our Eon subcluster documentation for more details. Thanks, everyone, for listening, and I'll hand it back to Dave to talk about Eon on-prem.

Thanks, Yuanzhe. At the same time that Yuanzhe and the rest of the dev team were working on the improvements that Yuanzhe described, and other improvements, this guy, John Yovanovich, stood on stage and told us about his deployment at AT&T, where he was running Eon mode on-prem. Now, this was only six months after we had launched Eon mode on AWS, so when he told us that he was putting it into production on-prem, we nearly fell out of our chairs. How is this possible? We took a look back at Eon and determined that the workload isolation and the improvements to the operations for restoring nodes, among other things, had sufficient value that John wanted to run it on-prem, and he was running it on the Pure Storage FlashBlade.
Taking a second look at the FlashBlade, we thought, all right, does it have the performance? Yes, it does. The FlashBlade is a collection of individual blades, each one of them with NVMe storage on it, which is not only performant but also scalable. And so we then asked, is it durable? The answer is yes. Data safety is implemented with N+2 redundancy, which means that up to two blades can fail and the data remains available. And so with this, we realized DBAs can sleep well at night knowing that their data is safe. After all, Eon mode outsources durability to the communal storage data store. Does FlashBlade have the capacity for growth? Well, yes, it does. You can start as low as 120 terabytes and grow as high as about eight petabytes, so it certainly covers the range for most enterprise usages. And operationally, it couldn't be easier to use. When you want to grow your database, you can simply pop new blades into the FlashBlade unit, and you can do that hot. If one goes bad, you can pull it out and replace it hot, so you don't have to take your data store down, and therefore you don't have to take Vertica down. Knowing all of these things, we got behind Pure Storage and partnered with them to implement the first version of Eon on premises. That changed our roadmap a little bit. We were imagining it would start with Amazon and then go to Google and then to Azure and at some point to Alibaba Cloud, but as you can see from the left column, we started with Amazon and went to Pure Storage. And then from Pure Storage, we went to MinIO, and we launched Eon mode on MinIO at the end of last year. MinIO is a little bit different from Pure Storage. It's software only, so you can run it on pretty much any x86 servers, and you can cluster them with storage to serve up an S3 bucket. It's a great solution for up to about 120 terabytes.
Beyond that, we're not sure about the performance implications, because we haven't tested it, but for your dev environments or small production environments, we think it's great. With Vertica 10, we are introducing Eon mode on Google Cloud. This means not only running Eon mode in the cloud, but also being able to launch it from the marketplace. We're also offering Eon mode on HDFS with version 10. If you have a Hadoop environment and you want to breathe fresh new life into it with the high performance of Vertica, you can do that starting with version 10. Looking forward, we'll be moving Eon mode to Microsoft Azure. We expect to have something running in the fall and to offer it to select customers for beta testing, and then we expect to release it sometime in 2021. Following that, further on the horizon, is Alibaba Cloud. Now, to be clear, we will be putting Vertica in Enterprise mode on Alibaba Cloud in 2020, but Eon mode is going to trail behind. Whether it lands in 2021 or not, we're not quite sure at this point. Our goal is to deliver Eon mode anywhere you want to run it, on-prem or in the cloud or both, because one of the great value propositions of Vertica is the hybrid capability, the ability to run in both the on-prem environment and the cloud. What's next? I've got three priority and roadmap slides. This is the first of the three. We're going to start with improvements to the core of Vertica, starting with query crunching, which allows you to run long-running queries faster by getting nodes to collaborate. You'll see that coming very soon. We'll be making improvements to large clusters, and specifically large cluster mode. The management of large clusters, over 60 nodes, can be tedious. We intend to improve that, in part by creating a third network channel to offload some of the communication that we're now loading onto Spread, our agreement protocol. We'll be improving depot efficiency.
We'll be pushing down more controls to the subcluster level, allowing you to control your resource pools at the subcluster level, and we'll be pairing Tuple Mover operations with data loading. From an operational flexibility perspective, we want to make it very easy to shut down and revive primaries and secondaries, on-prem and in the cloud. Right now it's a little bit tedious, though very doable. We want to make it as easy as a walk in the park. We also want to allow you to revive into a different-size subcluster, and last but not least, in fact probably most important, the ability to change shard count. This has been a sticking point for a lot of people, and it puts a lot of pressure on the early decision of how many shards your database should have. We know it's important to you, so it's important to us. Ease of use is also important to us, and we're making big investments in the management console to improve managing subclusters, as well as to help you manage your load balancer groups. We also intend to grow and extend Eon mode to new environments. Now we'll take questions and answers.