This is the last lecture, and as I like to do every semester, I have somebody from industry come give a guest lecture, because they're basically going to prove that Andy wasn't crazy or making shit up during the last 15 weeks. So Anil is a VP, is that correct? You're high up; he's a director of something at SAP in their office in Waterloo. He's worked on HANA, and before SAP bought Sybase he worked on Sybase IQ for several years. He has a PhD from the University of Waterloo in Canada, which, as we were discussing today, is probably the largest database group in North America now. They have way more than five or six professors. It's pretty amazing. So he's been involved in the database industry for a long time, and he's here today to talk about the in-memory database that SAP has been building for a while, HANA, which we've covered several times over the semester. So with that, there's your clicker. Yeah, okay. Thanks Andy. Feel free to interrupt me and ask questions as we go along. Yeah, that's right. So let's make it a little bit interactive. Interestingly, my PhD thesis work was in a field that used to be called persistent programming, and if I look back at it, that was in-memory database management in any case. So today I want to talk about HANA and give you an idea of what we have been doing, if I can figure out the clicker. What I'd like to do is give you a bit of a recap: talk about SAP a little bit, and talk about what the original motivation was for building HANA. Perhaps some of you have heard that before; maybe my interpretation will be slightly different, which is always a good thing. Then I want to talk about some key design principles for HANA, how it was architected and some of the core reasons, and then I'd like to take the opportunity to do a bit of a deep dive into two newer topics within HANA. One is adding disk-based processing to HANA, and the other is the new query engine for HANA, and I'll talk about why we are doing this. Then I'll wrap up with what we see as the future challenges and the key technical next steps for our team; what I don't cover today I'll pick up tomorrow in the database group lecture and go into a lot more detail on what those challenges are and what we intend to do about them. And as part of that, one of the things you do in industry is you get scars, right? So I'll talk a little bit about this next challenge and what we have been doing in the last two or three years. As you try things, some things work and some things don't, but generally speaking nothing truly fails; it just teaches you a new lesson and builds expertise. So hopefully I'll paint that picture for you, and then I'll talk about the cloud, which is becoming a big topic generally, and what we see as the key challenges there. Okay, so for those of you that are not familiar with SAP: obviously a very large software company, depending on how you want to look at it the first or the third in terms of market cap, with about 100,000 people and a lot of acquisitions in recent years. As Andy mentioned, I joined SAP about seven or eight years ago as part of the Sybase acquisition, and you will recognize some common names. Obviously, at the core, SAP remains the enterprise applications company, but the database business has become quite substantial for us. Depending on how you count, maybe 10 to 15% of the revenue comes from databases, including the Sybase databases.
So at the current time, the SAP database team takes care of HANA, of course, but also the Sybase databases: SQL Anywhere, the embedded database; Sybase IQ, which I'll talk about a little bit, the big data management system; and ASE, the enterprise OLTP system. So one team manages all of that business, and obviously the products are in different life-cycle stages, with HANA being more on the core flagship side. At the moment we have development teams in five major countries. Canada, where I'm from, in Waterloo; I'll talk a little bit about that. Germany, obviously, where we have two locations, Berlin and Walldorf, which is one hour south of Frankfurt. India, in Pune. South Korea, in Seoul. And in China we have three locations, one major one in Xi'an, a significant presence in Shanghai, and some in Beijing. So I work out of the Waterloo lab. I've lived in Waterloo for 30 years now; I never left after grad school. The Waterloo site has a long history, 35-plus years, and started as a spin-off from the University of Waterloo. Some of you might have heard of the WATFOR Fortran compiler, or maybe not, without telling your age; that came from there. Then Powersoft, then Sybase, and now SAP. More than half the team in Waterloo works on the database side. We work on a whole bunch of technologies: Sybase IQ, SQL Anywhere, HANA, tooling, interfaces, drivers. And we have a strong co-op and grad intern program. At any given point in time my team will have about 14 students, mostly undergrads, but we have a strong grad intern program too; we get PhD students and master's students doing internships. And I now also have a PhD student who's in the lab for two years as part of our Waterloo program. Trump is your president, right? So you don't have to go to Canada. That's right. That's right. And we are hiring now; I have a dozen to two dozen positions open, for different functions, and there are links you can look at. Okay, so now going back to HANA. One thing to keep in mind as I talk through this: the origin of HANA was not to build a general-purpose database, although that has changed a little bit. The original purpose was to build the leanest, meanest database platform we could for SAP applications. So a lot of the focus was on what SAP applications need; in the end, that is our core business. Part of that comes from making those applications run better and faster, exploiting more hardware. But part of it was also financial, because a big chunk of the money spent on SAP systems was going to the database back ends that SAP applications ran on, so it made business sense for us to do this. The original idea from Hasso Plattner, who still owns a whole bunch of the company, was to question the traditional way of doing enterprise systems, where you typically have two kinds of systems, the front-end systems and the back-end systems, with maybe nightly data extraction between them. I'm overemphasizing a little bit, but nevertheless the fundamentals remain the same; traditionally, people still run their enterprise systems that way. And the argument was, well, is that so good? So I'll stay at a high level here: two systems. Later on, perhaps, we will argue that it's not necessarily about different components, but more about the complexity of the overall system, and I'll go into that a little bit later.
But the key piece is that a lot of big businesses like Nestle or Pepsi and others rely on something called operational analytics, or they want to. What operational analytics means is, for a company, at a boardroom level, at a CEO level, being able to do analytics on live data as opposed to stale data. So this is about getting to a point where you could literally have a system running in a company boardroom and do what-if analysis on the fly. That's very hard to do with ETL-based systems, and of course the ETL processes can become very complex. So that was the vision. The reality then was that the initial objective for HANA was to build a single system, and again, this is in the context of enterprise applications, which here means the whole data business for big companies. In some sense, you could say the sweet spot we were trying to hit was a system that is good enough for OLTP. So what does that mean? It doesn't mean we're trying to build a million-transactions-per-second OLTP system; that's a niche area for most enterprise systems anyway. Something like 40,000 statements per second, where by a statement you can think of a primary-key lookup and an update based on it, for example. That's in terms of throughput. In terms of latency, good enough here still means pretty darn good: we talk about one-millisecond OLTP transactions, so fairly quick. But then excel at analytics, because, as the term operational analytics implies, that's where the value is: being able to make quick decisions, real-time decisions. So everything we do, as you will see later on, everything from the beginning is focused on really fast analytics. And analytics here could mean all sorts of things. For example, from a financial point of view, companies run long-running reports, and those can take hours, sometimes days. The goal was to make a real dent in that. For many of these things, we are talking about achieving three to five orders of magnitude of performance improvement. That was the key goal, but then do it in one system, which means reduced complexity, and once again, I emphasize, for SAP applications, pick whichever one you want. So what I want to do next, based on that background and those objectives, is go through some of the key concepts we arrived at. Some of this may be repetition for you, but we'll go through it. At a very, very high level it looks like a typical database system: it has a parser, it has a SQL front end, it has a transaction manager, it does persistency, et cetera, et cetera. A couple of things to point out. One is that right off the bat we said we're only going to do snapshot isolation, MVCC-based transaction management, because otherwise it's very hard to bring everything into one system. The other thing to point out is in-memory processing: the whole motivation was that we could build something from scratch that is purpose-built for modern hardware. When I say modern hardware, it means two things for me: multicore, with full parallelism across those cores, and large amounts of memory. What can we do with that?
And fundamentally, we asked: if you could do things from scratch, what would it look like? Multi-model support was envisioned to be built in from the beginning. Now, as we go into the next stage of the journey, perhaps we would still want it built in, but perhaps not as a monolith; we would break those things up into more dynamically deployable components. It uses shared-nothing distributed computing, pretty standard on that end, and we'll come back to that later. Now, this chart is very interesting; I always go back to it. When you look at it, it explains the core design decisions we made. Remember, the goal is to do OLTP and OLAP in the same system, and if you think about that with TPC-C and TPC-H in mind, you immediately run into some design challenges: how are you going to design it? The numbers on the left come from something like 12 representative enterprise application workloads, and we tried to look at what those enterprise workloads actually do, breaking it down by reads, writes, deletes, and modifications. The key is that for both OLTP and OLAP, in the context of these enterprise applications, reads dominate the amount of work that's done. If you compare it to TPC-C, which is in some sense the gold standard for benchmarking OLTP, we are more or less lined up on the basic selects; the main difference is the write side, the modifications, where TPC-C has way more modifications than these workloads. So the writes are still important, but the key is that if you aren't really good on the read side, while still doing a good job on the write side, it's going to be difficult to build a system that can do both in the same place. Can you say how common the range scans are for OLTP, and how wide the range scans actually are? Not very wide. Think of it this way: in general, the application organizes itself around what's called a business object. A business object could be an invoice or an order, and behind that order could be a bunch of invoices, a bunch of line items, et cetera. So you're typically dealing with one business object at a time, and keep in mind that the SAP applications do a lot of logic in the application server itself. So from that perspective they're not very wide scans; they're fairly narrow scans. But they are important: as you've seen here, the range selects are quite a big part of the mix. So you will see in the design that, while the key lookups are important and we take care of them, we do want to pay attention to doing fairly efficient, narrow range scans. How about conflicts? In the production world, especially in OLTP, how high is the conflict rate? In most cases the conflict rate is very low; a lot of it is handled at the application level. So at the database level, for these applications, doing snapshot isolation where reads are block-free, there are conflicts, but keep in mind that the application is doing a bunch of the heavy lifting as well.
And so at the database level, snapshot isolation is good enough and it's not a big issue, generally speaking. So then, basically, what we ended up with is an in-memory store, and it's going to be a column store; that was an early decision as well. Now, there is a row store in HANA, as you might have heard, but that's not where the focus is. It is used primarily for two things: the metadata for the database is maintained in that store, and applications will sometimes use the row store for small, hot tables, and that's where some of that contention gets handled. So we ended up with a store that consists of two portions, and this kind of design is becoming quite common now; a lot of databases are ending up with it. One portion is called the main, and the main is read-optimized: focused on reads, on doing them really, really quickly, and you will see some key design choices beyond what columnar databases always do. Columnar databases always try to do dictionary compression, for example, so that's standard. The point was: what more can we do, given that we want to do reads very fast and we want to exploit in-memory processing and modern hardware, SIMD instructions and so on? The main is immutable. The database is organized so that each table, or rather each partition of a table (everything in this context you can think of as one partition of one table), is stored in two fragments or segments. One is called the main part: it is immutable, it doesn't change until it's recomputed, and it's super compressed. We start with dictionary encoding, dictionary compression, for the data. But because it's immutable, and it's going to stay immutable for the duration, we do some extra processing up front to get more compression and denser packing of the column values in memory, so you can get through more values at a time. What it does is the following. Once dictionary encoding is done, you end up with column vectors which effectively hold value IDs from the dictionary. Now there is an extra step, and this is mostly heuristic-driven today, which asks: for this data set, if I can sort the rows in some way I choose, can I do additional compression? A very simple example would be: if I have a column with two or three distinct values, I could sort along that column and then do run-length encoding. There are about half a dozen different compression techniques deployed, and as I said, it's a heuristic decision made up front, and then we compress it down further. Which means you now get a really compressed format for the main store. That has benefits in terms of how much memory you use; as you see here, for some examples that could be a factor of 10 additional compression on top of dictionary compression. But the second, more important thing is that scans can get faster, because now you're packing more values in, especially if you're using SIMD instructions. So that is the key piece. Now, obviously, it's not a read-only data warehouse, so we do want to make modifications. That's what the delta is: something that sits on top and tracks the updates.
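To make the encoding idea above concrete, here is a minimal Python sketch of a dictionary-encoded column fragment with a sorted dictionary (so a value-range predicate becomes a value-ID range) and run-length encoding of the value-ID vector after sorting the rows on that column. This is illustrative only, not HANA's actual code; the function names and data are made up.

```python
# Toy sketch of the main-store encoding described above: a sorted dictionary,
# value-ID vectors, and run-length encoding after sorting rows on the column.
# Illustrative only; not HANA's implementation. All names here are made up.
import bisect

def encode_column(values):
    """Return (sorted dictionary, value-ID vector) for one column fragment."""
    dictionary = sorted(set(values))
    code = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code[v] for v in values]

def range_scan(dictionary, ids, low, high):
    """Row positions whose value lies in [low, high]: the sorted dictionary turns
    the value predicate into an ID range, then it's one integer compare per row."""
    lo_id = bisect.bisect_left(dictionary, low)
    hi_id = bisect.bisect_right(dictionary, high) - 1
    return [row for row, vid in enumerate(ids) if lo_id <= vid <= hi_id]

def run_length_encode(ids):
    """Compress the value-ID vector into [value_id, run_length] pairs; this pays
    off when the rows have been sorted on this column first."""
    runs = []
    for vid in ids:
        if runs and runs[-1][0] == vid:
            runs[-1][1] += 1
        else:
            runs.append([vid, 1])
    return runs

status = ["open", "closed", "open", "open", "closed", "open"]
dictionary, ids = encode_column(sorted(status))   # heuristic: sort rows on this column
print(dictionary)                                 # ['closed', 'open']
print(run_length_encode(ids))                     # [[0, 2], [1, 4]], independent of row count
print(range_scan(dictionary, ids, "closed", "closed"))  # [0, 1]: rows 0 and 1 after sorting
```

In the real system the sort order is chosen once per partition for all columns together, and run-length encoding is only one of the roughly half-dozen extra encodings mentioned above.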
There are two or three key pieces to the delta. One is that it is write-optimized. All changes to the rows in the partition (again, the unit here is a partition) are tracked in the delta store. So what kinds of changes can I make? I can insert a row: if I insert a new row, it just goes into the delta store. I can delete a row that already exists: if the row I'm deleting happens to be in the delta store, then I just mark it as deleted, say in a bitmap that records which rows are deleted. If the row I'm deleting happens to be in the main, remember that main is immutable, so I don't change main; I simply record in the delta that I deleted this row. Without going into detail, a simple implementation could be a delete bitmap. And if I update a row, that's tracked as a delete followed by an insert: effectively, I mark the row in the main as deleted and insert a new row into the delta. Now, of course, the query processing has to take this into account. So there is a consistent view manager: any query that comes in goes through the view manager, and it's fairly straightforward to determine the view I present to the application. Remember, this is MVCC, it's versioned, so I'm looking for a particular version, and main is fixed as of a particular version. What I'm effectively saying is: the data for this query is all of the rows in main, minus the rows marked as deleted in the delta, plus the rows that are in the delta. Those are the key pieces. The other key piece is that the delta store is maintained as a simple append-only store, and that has huge implications: it simplifies a whole bunch of things around locking and version control, and it makes things faster. So the delta is an append-only store. Oh, sorry, one thing I forgot to mention. I talked about sorting, right? There are two sortings taking place. One is that the dictionary is sorted. Typically, in a dictionary-encoded compression scheme you will not sort the dictionary, because if you're going to modify the dictionary, it can be a huge cost to recode everything. But in main, things are not going to change for the life of a main, so the dictionary is sorted. That has advantages, because range scans become much easier to do; otherwise you have to scan the whole dictionary. The second is that the column vector itself is sorted, but that's more for compression purposes than anything else. What data structure are you using for the dictionary? Sorry? What data structure is used for the dictionary? I think it's a hash map; I would need to check. We saw the columns are sorted; do all the columns have to be sorted according to the same order? Yes, the sorting is at a row level, which means part of the heuristic is to pick one sort order that achieves the maximum compression across all columns. That's not an exact science, so it's a heuristic. In the delta store, the dictionary is not sorted, obviously, because sorting the dictionary would mean re-encoding everything, and that's difficult. There is a B-tree index to allow fast unique lookups and checks. So there is less compression of the data, but the write performance is much better. And there are a lot of lock-free algorithms, et cetera, implemented. Now, obviously, you can't keep running like this forever; there is a price to be paid.
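A minimal sketch of the read path just described: the visible data for a query is the immutable main minus the rows its delete bitmap marks as gone, plus the surviving rows of the append-only delta. This is a deliberate simplification (no MVCC timestamps, a single untyped row payload, invented class and method names), not how HANA actually implements it.

```python
# Toy main/delta fragment: main is immutable, deletes are recorded in bitmaps,
# inserts go to the append-only delta, and an update is a delete plus an insert.
# Simplified sketch with made-up names; real HANA also tracks MVCC versions.
class TableFragment:
    def __init__(self, main_rows):
        self.main = list(main_rows)                   # immutable until the next merge
        self.main_deleted = [False] * len(main_rows)  # delete bitmap over main
        self.delta = []                               # append-only inserts
        self.delta_deleted = []                       # delete bitmap over delta

    def insert(self, row):
        self.delta.append(row)
        self.delta_deleted.append(False)

    def delete_main(self, pos):
        self.main_deleted[pos] = True                 # main itself is never rewritten

    def update_main(self, pos, new_row):
        self.delete_main(pos)                         # update = delete the old version...
        self.insert(new_row)                          # ...plus insert of the new version

    def scan(self):
        """Consistent view: surviving main rows plus surviving delta rows."""
        for row, dead in zip(self.main, self.main_deleted):
            if not dead:
                yield row
        for row, dead in zip(self.delta, self.delta_deleted):
            if not dead:
                yield row

frag = TableFragment(["row-a", "row-b", "row-c"])
frag.insert("row-d")
frag.update_main(1, "row-b-v2")
print(list(frag.scan()))   # ['row-a', 'row-c', 'row-d', 'row-b-v2']
```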
And the price to be paid is that every so often you have to merge the changes from the delta into the main and create a new main, more or less. So the thing you store in the delta store, is it really a delta? If I insert a tuple, I have to serialize the entire tuple into the delta store. Then if I update one of ten attributes, is the next thing I see in that log just the update to that one attribute? No, it's the whole row. Okay, so that's a potential issue in some sense, because I need to pull out the whole new row. Right, so that's like the append-only version store we talked about: even though it's called a delta store, it stores the whole tuple. Right, so now the merge needs to be done. How does the merge work? Basically, you want to restore order. As you can imagine, there is a cost to be paid, and the cost is in the merge. I'll describe the merge process quickly, but there are improvements being made to the merge process itself, because it becomes a point of contention at some point, depending on the rate of changes and so on. At a high level: we can't afford to block the system while the merge is taking place, so it's done as an online operation. Not fully online; there is a small lock you need to take when you switch over to the new delta, but more or less. Going back to what I was describing before: if you look at the leftmost, before-merge view, you have a delta and a main. Write operations all go against the delta; read operations have to read from both main and delta to create the consistent view for any given query. When the merge starts, we have to keep serving the current queries as well as new queries that arrive, because something may not be committed yet and those queries are still working against the previous version. So the read operations continue, and we basically start a new delta in this intermediate phase. A query that was already running before the merge started goes against main 1 and delta 1, because that's the view it was looking at. A query that arrives later potentially has to pull from the old main, the old delta, and the new delta, because some of the new versions might be there. In the meanwhile, we rebuild: the merge process effectively takes the old main and the delta corresponding to it and merges the two. At the end we have a new main that contains all the rows from the old main, minus the rows that were deleted from main, plus the new rows from the delta. Generally speaking, assume for simplicity, and I think that's mostly the case as well, that we take the committed rows from the delta, those are the ones that get merged into main, and then we take the uncommitted rows from the previous delta and carry them over to the new delta, at least in a logical way. So this happens online, and at some point at the end we say, okay, we're done. At that point there is a small lock and we flip over to the new delta. Now, obviously, as you can see, this is a potentially memory-intensive operation. As rough guidance, during the merge process you need at least twice the memory for that data. So then, as I said, I'll talk about the new query engine and the new execution engine later on.
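Here is a minimal sketch of that delta merge, under the simplifying assumption from above that committed delta rows are folded into the new main while uncommitted ones are carried into the fresh delta. It uses plain Python lists and bitmaps with invented names; the real merge also re-encodes and re-compresses the new main and runs concurrently with readers.

```python
# Toy delta merge: new main = old main minus deleted rows plus committed delta
# rows; uncommitted delta rows carry over into a fresh delta. While the new
# structures are built, old and new coexist, which is why the process needs
# roughly twice the memory for that data. Simplified sketch, not HANA's code.
def delta_merge(old_main, main_deleted, old_delta, delta_committed):
    """old_main:        rows in the immutable main
       main_deleted:    delete bitmap over old_main
       old_delta:       rows appended to the delta
       delta_committed: bitmap saying which delta rows are committed"""
    # 1. Surviving main rows go into the new main.
    new_main = [row for row, dead in zip(old_main, main_deleted) if not dead]

    # 2. Committed delta rows are folded into the new main (the real system
    #    re-encodes and re-compresses the whole thing at this point).
    new_main += [row for row, ok in zip(old_delta, delta_committed) if ok]

    # 3. Uncommitted delta rows are carried over into the new, empty delta.
    new_delta = [row for row, ok in zip(old_delta, delta_committed) if not ok]

    # 4. Only now does the system take a short lock and flip readers and
    #    writers over to (new_main, new_delta); the old structures are dropped.
    return new_main, new_delta

new_main, new_delta = delta_merge(
    old_main=["a", "b", "c"], main_deleted=[False, True, False],
    old_delta=["d", "e"],     delta_committed=[True, False])
print(new_main, new_delta)   # ['a', 'c', 'd'] ['e']
```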
Over time we'll look to make this process a bit more memory-friendly. Okay, this is just an example of the extra compression we can do: run-length encoding. Pretty obvious: you do dictionary encoding, then you sort the rows, and then you compress the column vector with run-length encoding. So if I only had two distinct values, I would end up with effectively four values for my entire column: which value occurs first and how many times, and which value occurs second and how many times. That's it, regardless of how many rows I have. I could have a billion rows in the column and we'd still have this fixed size for that column. Okay, just to briefly mention, and of course I'm not going to go into detail since you're mostly familiar with SIMD processing: we make very, very heavy use of it. In fact, from the beginning of HANA we have had some very intense collaboration with Intel, and many of the SSE2, SSE3, and AVX instructions were developed by Intel based on our collaboration and needs. In fact, there has been a reasonably large Intel team that sits in the HANA office in Walldorf and works with us. Okay, so I wanted to stop there for the overview, and then I want to go into the two focus topics I mentioned. Any quick questions at this point? Okay, so let's talk about the query engine a little bit. So, is this the new one? This is the old one. The old one, okay. But you've got to talk about the new one. I will get to the new one. So I just want to paint this picture first. For historical reasons, HANA has specialized engines, and remember, this actually worked really well. There is a join engine that focuses on being able to do join reductions and so on. There is something called the OLAP engine: HANA supports SQL, but HANA also supports so-called analytical views, which have some special syntax and considerations, and that engine runs the analytical queries for SAP applications. So it's SQL plus some tweaks to it. But not joins, right? Joins, well, good point: joins mostly end up in the join engine. But this causes some problems. One is that you have to keep all of these engines up to speed. The bigger problem is that, depending on which code path a semantically equivalent query takes and how it's written (the analytical views, as I said, are a special syntax that goes through the OLAP engine), we can end up with different performance characteristics, and then we have to guide users. So going forward we're going to consolidate that into a single engine. The other thing is that the processing is set-based internally, so the intermediate results are fully materialized before they're passed on to the next stage. Those are some of the changes we want to make, because... I guess what's sort of confusing for the students is that I just spent the entire semester saying OLAP equals joins, right? And now you're saying there's a join engine for joins and then OLAP is joins plus something else, so say what that something else is. So there is SQL, and there are analytical views, a special syntax that the application uses, and I think originally it allows you to specify things that you wouldn't easily write in SQL. Okay, so beyond aggregation. Beyond that, yeah.
I can find the specific details for you. So that approach worked quite well as long as our target set was the specific applications we were optimizing against. But as we get into more general-purpose areas, allowing more applications to be developed, that causes some issues, for example with short-running queries. And again, as we were talking about this morning, the genesis was close collaboration with the applications, which are our own applications. In fact, sometimes, with BW, which is our data warehouse team, we sit together and do things specifically for their needs, as a custom effort. That works well, but as you try to attract more use cases and customers, particularly ones migrating from other databases to HANA, it becomes an issue. Is this outside of SAP? Non-SAP applications, yes. What typically happens is that even SAP customers running SAP applications often need to write custom apps against the same data, and that gets harder. So there is a new query engine being built, HEX. One of the goals is to reduce memory footprint, and part of that comes from the fact that, as I said before, the current execution engines are set-based; the new engine will add streaming support between operators. Still, generally speaking, we don't want to go to tuple-at-a-time processing; that's key for us. So it adds pipelining, it adds streaming support, and the goal is for this engine to replace all the other engines. That work is already going on: more and more things are going through the common SQL optimizer and common runtime. In the version that just shipped, this engine is actually live. It currently handles a subset of query classes that get routed to it; not everything is routed there, but the goal is to replace all the other engines with this one engine. And the new engine will continue to exploit the key techniques we have, which is working on compressed data, using SIMD instructions, and so on. In the process it is code-generation based as well, and we keep a close eye on the work going on, for example, in HyPer and other databases. You know Thomas Neumann's work, then? I know. So Thomas, just for people that don't know... We know. Oh, you know, okay. We read his paper, we know. So we have a strong history of working with Thomas, and a lot of his work got into the product. Generally speaking, it uses LLVM for machine code generation. Currently we don't go directly to IR; instead we generate L. It's easier to look at, easier to debug, et cetera, and then there is an L interpreter as well as code generation from the L. And L is a specific language that you said you wrote, but you guys had it before you did this, right? It was already in the product? Before we did this, yes; L was already used in HANA before we started building the HEX engine. Okay. And that's the current state of the engine: broadening its scope, broadening the set of queries it can handle, with the goal of replacing the other engines. Now, of course, there is compilation time to consider, and L already comes with a set of functions. So right now you have an L interpreter.
So you're basically converting the scan operators and the filters and the aggregations into L script and letting that interpreter handle it? Yes, with existing built-ins, existing functions that we can now leverage, going through the interpreter. We wouldn't be able to do that if we went directly to IR. And you use L in HANA today to write stored procedures, you mentioned; so a developer could have their own function, I don't know, something that adds two values together. The stored procedure language has its own functions and L has its own now? So the current stored procedure execution uses L internally, right? We're just piling onto the same implementation. It's consistent, it's easier to read, easier to debug, as opposed to going to IR directly; that's the main point. Of course, deciding when to generate code and when not to generate code, that work is all ongoing. The L interpreter needs more work; it's not fully optimal, et cetera, so we will continue to work on that. As I already mentioned, currently, to avoid compilation times, with interpretation the execution can start right away. There will be more work needed in this area. And then this slide and the next slide mostly talk about... We covered that paper too; that's Viktor's paper on adaptive query execution, where they generate the IR and interpret it and then fall back to the compiler. Right, so this slide and the next one track things we want to look at going forward. And again, some other challenges around when you compile, when you materialize, et cetera. Right, so then we'll switch topics to adding disk-based processing. As Andy and I were talking about before, there's this ongoing question: if you want to do in-memory processing in a database, where do you start? Do you start with a disk-based engine and add in-memory processing, or do you purpose-build an in-memory engine from scratch and then add disk processing to it? Yeah, I think that's part of the work that still needs to happen; it's not clear, and I don't think there's a clear answer. When do you go the L-interpretation route and when do you compile? The trade-off is execution time versus compilation time, and I don't think we have all the answers at this point. It sounds like you're not caching anything. No. Okay. Well, I don't know how it works exactly, but I imagine compilation is what's most expensive, and that's why they have to do that, plus at their scale. So then, back to disk processing: clearly there is a need to add some disk-processing capabilities to an in-memory database. There's the obvious expectation that a database doesn't stop when you run out of memory, although that might be okay for some applications; and even if you discount the data size, it's not a general expectation that intermediate results can blow up your memory, so you want to be able to deal with that. That's one. And then, as adoption of the database increases, you end up with situations where you want some native capability to age data and keep all the data together in one system, because if you have no disk capabilities in the database, then you have no way to say, hey, some of my data is now less important.
I don't want to pay the same cost to process that data as my hot data, but I also don't want to offload it to a different system altogether, because I want transactional consistency. Can you publicly give some examples, like customer names, of some really large in-memory HANA installations that you guys are dealing with? Sure: a 48-terabyte production system. A single machine, one node; I believe it's an HPE machine, I'm not sure exactly. So they're maintaining and refreshing 48 terabytes of DRAM? That's right. And it's a live system, it's in production. And 48 terabytes is not enough for them, so they wanted to add some disk to it for aging purposes. Now, 48 terabytes is probably not very common, but there are multiple systems globally in that range. And again, it comes down to what I was talking about before: if there are business-critical processes that you can run in minutes or seconds instead of hours and days, then people can find business value in maintaining these large systems. Multi-terabyte systems; the sweet spot is probably four to eight terabytes, but I'd have to check that. But there are large systems out there, and this is not even scale-out; this is a single-node system. So I have a question, to twist back to the earlier part about the column store, the main versus the append-only delta store: what's typically the ratio between the two? Or not the ratio exactly, but how big does the delta store get? Keep in mind that the current implementation has a two-billion-row limit on a single partition, so that bounds your data sizes. How large the delta grows varies. There are two ways the merge process gets triggered. One is that we just do it periodically, regardless of the rate of change. And then there are thresholds in place that trigger the merge process. Generally speaking, there is a hard threshold you don't want to cross, because the merge process will need at least twice the memory for that one merge. If your question is what typical ratio people maintain, I'm not sure I can answer that off the bat, but I can get you more information on it. But they can get large, right? They can get large, because there is obviously a design or operational trade-off in choosing when to trigger the merge. There is one more thing to keep in mind: it's okay if the delta grows quite large if it's mostly inserts, or if you're mostly modifying recent rows where everything is in the delta, because then it's still fairly efficient to get the query result. But if you end up with a lot of main rows being modified, then it will impact performance. So there are two things I want to talk about today, and then tomorrow I'll say a little bit more on this topic. We started to add some disk-based processing, let's say. The original work was published at SIGMOD 2016; the work actually happened a little before that. The objective there was not necessarily to add disk processing as a primary goal. That work was mostly trying to achieve some memory relief, because, especially in SAP application data sets, you end up with columns that are not so critical from a processing point of view.
And so the objective, we said, is this: generally speaking, you need a certain amount of memory for the data set to be kept in memory; if there are some columns that are not so critical to me, can I use less memory when processing those columns, when running a query on them? Remember, in the original design, at least for a column that you're working on, the entire column data structures have to be fully memory-resident before you can do scans on it. But now, if I have columns where I'm willing to pay a small penalty on scanning, because they're not that business-critical, and I'm effectively doing a scan on that column anyway, can we do that with less memory, particularly for the main? The delta is a small part of the memory anyway. But we don't want to go to a traditional disk-based paging format, because we still want to be able to do tight in-memory loops over chunks of data. So that's where this page-loadable column idea came from. The main idea was that you have a regular in-memory column, but if I'm doing a scan on it, I can do the scan by bringing it into memory piecemeal: still in big chunks, but piecemeal. That's basically what this says: we still have vectorized storage, but instead of loading the entire vector into memory, can I just bring it in as chunks, pieces of the vector? Similarly for the dictionaries, for search, and for the indexing. And this is only for the read-optimized portion, the main. Keep in mind, going back to your earlier point: even if I don't have an exact answer for you, in most workloads the majority of the data is going to be in main, which means memory consumption is heavily dominated by data in main. So if you can bring memory relief there, that was a quick way for us to bring some relief into these systems. And then, as you will see, that idea grew into something more: at VLDB 2019 this year there will be an industry-track paper about something called the native storage extension. That adds, I would say, more holistic disk processing to the system. Because it's done in the HANA engine itself, the goal is to do it in a way where all of the processing applies equally to that data, in contrast to what I'll talk about tomorrow, which is an attempt to build a combined system using HANA and IQ as two separate systems, where it becomes very difficult to maintain the same kind of functionality across both. Again, this is transparent to the application: the goal is to do this without requiring application changes. And this diagram gives you an idea of how it's being built. Remember, with the page-loadable columns I talked about, we had already come up with a disk format that was conducive to bringing vectors, or segments of vectors, into memory and processing them with essentially the original in-memory SIMD processing. Now, with this, we effectively come up with a common format on disk that is used for all data. The difference compared to the previous approach is that there the user was explicitly making an a priori choice, saying "I want this piecemeal loading capability for this particular column." This one says we store all data in a common format, and then at runtime we have the ability to run it using the page primitives.
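A minimal sketch of the page-loadable-column idea: the value-ID vector lives on disk in fixed-size chunks, and a scan pulls one chunk at a time through a small buffer cache instead of requiring the whole vector to be memory-resident. The chunk size, file layout, and eviction policy here are invented for illustration and are not HANA's actual format.

```python
# Toy page-loadable column: the value-ID vector is stored on disk in fixed-size
# chunks, and a scan streams those chunks through a small cache rather than
# requiring the whole vector in memory. Illustrative sketch only; the chunk
# size, file format, and eviction policy are made up, not HANA's.
import os, struct, tempfile

CHUNK_ROWS = 4  # value IDs per on-disk chunk; real chunks are much larger

def write_column(path, value_ids):
    """Persist a value-ID vector as consecutive 4-byte little-endian integers."""
    with open(path, "wb") as f:
        for vid in value_ids:
            f.write(struct.pack("<i", vid))

def load_chunk(path, chunk_no):
    """Load one chunk of the column vector from disk."""
    with open(path, "rb") as f:
        f.seek(chunk_no * CHUNK_ROWS * 4)
        data = f.read(CHUNK_ROWS * 4)
    return [struct.unpack_from("<i", data, off)[0] for off in range(0, len(data), 4)]

def paged_scan(path, total_rows, predicate, cache, cache_limit=2):
    """Scan the column chunk by chunk, keeping at most cache_limit chunks in
    memory; the cache dict stands in for the (elastic) buffer cache."""
    matches = []
    for chunk_no in range((total_rows + CHUNK_ROWS - 1) // CHUNK_ROWS):
        if chunk_no not in cache:
            if len(cache) >= cache_limit:
                cache.pop(next(iter(cache)))     # crude eviction: drop the oldest chunk
            cache[chunk_no] = load_chunk(path, chunk_no)
        for i, vid in enumerate(cache[chunk_no]):
            if predicate(vid):
                matches.append(chunk_no * CHUNK_ROWS + i)
    return matches

path = os.path.join(tempfile.gettempdir(), "toy_column.bin")
ids = [3, 1, 3, 2, 0, 3, 1, 3, 2, 2]
write_column(path, ids)
print(paged_scan(path, len(ids), lambda v: v == 3, cache={}))  # [0, 2, 5, 7]
```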
Really the main difference between the two is that one assumes contiguous memory vectors fully in memory, whereas the other is chunk-wise. We want to use common primitives, for example SIMD scans, as much as possible, and you will see some of the things that you would normally do in a disk-based system. There was an existing object buffer cache that is being expanded to deal with this more holistically, so it will start to look more like a traditional buffer cache in that sense, except it's still not dealing with fixed-size pages but rather with chunks of vectors. What makes it elastic, the elastic buffer cache? Okay, before I answer that: one of the goals is for the engine to automatically decide, based on access patterns, which way it wants to process a given vector, whether to load it all and use in-memory semantics or to do it piecewise. In that context, elastic basically means this is not something we want to decide up front, so that this can grow. Because, remember, the memory is shared between the buffer cache on one side and the normal in-memory data on the other, elastic essentially says the engine can make trade-offs: more memory here, or more memory there. So, this is a loaded question: did you consider mmap? (laughter) mmap has been considered on and off throughout the history of HANA. And would you take this conversation offline? The short answer in the end was that it didn't work as expected. (laughter) So, sorry for a short digression: for my PhD thesis, my persistent programming system was mmap-based. The idea, if you remember the goals of persistent programming... they have no idea what that is. Okay, the goal was to make persistence an orthogonal property of data. All data should be first class. Today it's not, right? If the data goes to disk, forget about type safety, for example. So anyway, my work used mmap in those days, and in fact I sat down with Microsoft one day and said, guys, if the operating system could support mmap more holistically, I think it could be a thing of beauty. But it's not. But this was like '98, '99? '98, '99, yeah. Actually, before that; I was done in '96. All right. Yeah. Sorry, it keeps coming back. But it's also an idea that doesn't die, so maybe there is something there. Okay. I make public statements about how much I hate mmap; I never try to prove it. That sucks. We'll come back to you. Yeah, I have my pains around mmap. So, about the recommendation engine: just theoretically, the vision here would be that the user doesn't need to make any choices, that we're able to use either format on the fly. But the reality, of course, is that there is a runtime cost to be paid if you decide on the fly to go from one to the other. And, like most self-driving systems, there is a period of time before we get to the utopia. So in the meanwhile we're trying to build a recommendation engine. At the moment it's mostly heuristic-based: it looks at statistics and tries to make recommendations to the user as to which way to go. It's pretty rudimentary at this point, but the goal would be to tie it into that internal HANA project we have where we could learn over time. And this is maybe something we can talk about in the context of your work as well.
Okay, so that's about all I have. I want to wrap up with two things. Obviously HANA has been extremely successful for what it was designed for, which was to solve the operational analytics use case; the business relevance was recognized very quickly, and it has brought a lot of benefits to customers. Again, the original motivation was: build a single system, keep it simple, no materialized aggregates. That was the initial starting point. The goal was that the system doesn't need materialized views and doesn't need indexing, because all of those things add to the complexity and to the cost on the load side. Of course, over time we have selectively added some capabilities here; if you look in the HANA documentation you will see something called calculation views, which are some form of materialization, but they're dynamically created. So what are the next steps, the next challenges? Of course, cloud is a big topic; I want to talk a little bit about that tomorrow. Memory footprint: reducing memory footprint and hardware costs, because for certain use cases, as I said, a 48-terabyte system serves that large company's needs, but there are other use cases where you want to reduce or relieve that. And there is ongoing work, going back to my original point about enterprise landscapes having two systems, specialized engines, et cetera: if you go outside the sweet spot that we defined for HANA, then yes, we need to build specialized engines, specialized things. But the goal and motivation for us now is, can we do the heavy lifting for the user and present a single system, which is not a single server; sometimes we use the term system of systems, and we do the heavy lifting inside. I'll talk a little bit about what we are doing there. Okay, so that's pretty much an hour. Awesome, thank you. We have time for questions. Let me say again, I didn't tell him to say any of the things he said, right? Like that they use snapshot isolation, which I said is what you want to do, or that they tried mmap. So, any questions? Just curious: for the use cases of HANA, do you have a number off the top of your head for the ratio between traditional relational data processing versus text search, graph, or whatever? I think text search has quite a bit, and let me separate text search from text analytics: there is a place for text search that should be more native in the database. Graph is interesting, but we're not seeing a lot of it yet; I think part of the issue is that companies are still trying to figure out how to make business sense and use of it. Spatial is a similar story: there are some very intense, relevant use cases, but it's still not mainstream in that sense. So for the most part, I would say most data processing is still relational, from what I see.
Even in the big data use cases. You know, in the last 10, 15, 20 years we have talked a lot about variety, but I think to this date volume is the biggest problem, and perhaps velocity. Because even when we have unstructured data use cases, oftentimes they boil down to keeping the unstructured data objects available for drill-downs or for reference, but doing entity extraction, storing the data about the objects, doing your searching with relational processing, and then maybe you have a result set that you want the original objects for. I have examples of customers who have petabytes worth of unstructured objects, but they don't want to look at them until there is a very specific need. For example, there is a company that provides enterprise solutions for cataloging and storing all the voicemails and all sorts of other things in a company, and nobody needs those voicemails until there is litigation happening somewhere and you need to go back to them. So what they do is write customized software that the unstructured objects are fed through; it extracts all sorts of information, going as far as sentiment analysis, et cetera, but nevertheless it boils down to a few pieces of structured information, and they store that in the database. If it's a columnar database, you can also keep the original object and store it as a column in the database, right? Then it's there, you don't have to maintain those files, and most of your processing happens on the structured data. And, sorry, since you asked, I have an opinion on this as well. This is the issue with today's so-called data lakes in the traditional sense. If I define a data lake in the theoretical sense, it's about keeping the raw data model-less, schema-less, et cetera, which is kind of like ignoring the fact that most applications have data models. And so very quickly these data lakes become swamps. So I think in the end, relational... okay, let me not use the word relational, let's just say structured. No, no, I will go for it, but what I'm trying to distinguish is this: relational to me means two things. One is relational-style structured data processing, and the other is that when I say relational, my brain immediately says ACID as well. For a lot of use cases, both actually go together. I will talk about a use case tomorrow where it's a 20- or 30-petabyte active data set, and they have relational requirements in the sense that they can't lose any records; these are business-relevant records. So I think relational is here to stay, personally. It's not a controversial statement. No, no, that's right. But at the same time we do have to adapt, and for example I'll talk about this tomorrow: if you have to move to the cloud and the cloud vendors don't change how they provide infrastructure, the current architectures will not work, because they were designed with different underlying infrastructure in mind. Sorry, when I say they won't work, I mean they won't work as is. How much work you need to do to make them work will vary from design to design; in some cases it will be simple things, in other cases significant things. So, in this case here, for the graph engine or the text search, let me take the graph one, for example.
Is it, and I don't need all the details, the same storage engine that you use in the row store and column store, or is it a completely separate system? It's mostly the execution engine; it uses the underlying column store. Yeah, okay. With some caveats, right? For example, if I look at text search, the text mining builds an in-memory text index, which is a complex structure, so that's not stored as is. Spatial is mostly stored in the column store, but with adaptations to fit the store. The column store is called the attribute store, or whatever, but there are specializations in it. For example, if you want to store time-series data in the column store, the column store remains the same, but you need a few additional tweaks; you maybe want revolving-door compression or some other kinds of things. So those kinds of things get added. Okay, and tomorrow we'll talk about our experience bringing these different engines together and what we think about it. A question about the old code paths: you said semantically similar queries can take different paths; does that mean the exact same query has that chance as well? No, let me refine that. If you write a query against an analytical view, it always goes to the OLAP engine. But you could write an equivalent SQL query that represents the same computation, and that will go through a different code path; it won't go through the OLAP engine. And the optimizations are sometimes different, because that consolidation hasn't taken place yet, so you could end up with different performance characteristics. That's inevitable, right? Because these engines run more or less independently, you get that kind of thing: you can express the semantically equivalent computation in different ways, they give you the same results, but your performance characteristics might differ. That's what I meant. Is there any way to tell beforehand which path a query will go down? Generally speaking, no, because, as I said, analytical views are handled by a specialized engine that interprets that specific syntax. But that will change as we bring in the HEX engine; it's coming in a staged way, it handles certain classes of queries, and currently there is not a lot of intermingling, but over time that might be the case. All right. So some of these students have taken the introduction class in the fall semester, and that was all disk-based databases. And in this one, we throw away the buffer pool manager; it's all in memory. And you've worked on both sides of this, you have experience with both. So I guess the question for you, to sort of finish up the semester, is: what are some things, since you started working on HANA as an in-memory database, that surprised you, or completely threw you for a loop, or confused you, coming from a disk-based system, things that make working on an in-memory database different? Or is it all the same? Well, let me go back to Hasso Plattner's first lectures on this in 2009. I'm sitting there as a Sybase employee, of course, and my first reaction is: well, that makes a lot of sense, why didn't we think of that? And so I think, looking at the... You saw, when Hasso Plattner presented here's-what-HANA-is-going-to-be in 2009? Yes. You were in the audience. You didn't work for him yet, because he hadn't bought you yet.
That's right. That's right, that's right. But SAP was not a database company at that point, right? So the general story has been, at a philosophical level: looking at the technology and the evolution, I think we are getting to a point where we say the lessons we learned in disk-based processing are valuable lessons, but some of them perhaps don't need to be implemented exactly the same way. And there has been migration the other way around as well. In the IQ case, to use that example, after coming together we did implement something similar to the HANA delta store in IQ later on. But it's done somewhat differently, because the IQ implementation was also MVCC, snapshot-based, but with table-level versioning, because the goal there was bulk inserts and very efficient query processing, and with that model we were willing to say one writer at a time. So it's a different design, but it means you can't really handle write concurrency. When the HANA idea of the main and delta came along, we said, you know, that makes a lot of sense, and that's an addition we can make to the disk-based IQ processing by adding a delta store on top. Now, one difference between HANA's merge and IQ's merge, of course, is what we are merging into. Effectively, we keep the table-level versioning for the IQ big data store, we add the delta store on top, which is able to handle concurrent OLTP-style trickle loads, and we switch from a table-level lock to a row-level lock while those are running. And when the merge comes, it's a different beast. The concept is the same, that you're tracking changes in a columnar in-memory store, but the merge is slightly different: we can't afford to rebuild the underlying IQ store, because that's potentially not doable. At the same time, the IQ engine already knows how to do bulk ingest very efficiently. So the merge basically says: okay, we do the merge effectively as a giant load of the delta changes, I mean of course a bulk delete followed by a bulk insert, but we can do that as a transaction that IQ already knows about, as an internal transaction. So, going back to the question: even before we added NSE, there was, as I said, a memory manager and a buffer manager in the HANA in-memory processing engine, because there is still this concept of loading and unloading columns. That's kind of like a buffer cache manager, except that the granularity of the management is at the column vector level, the whole column. And then from there, changing that to the page-loadable columns we talked about, it becomes the same kind of technique at a high level: you're dealing with less memory than your total data set, the granularity is different, and you're still talking about eviction policies. So some basic concepts are similar. And as we add NSE now, of course, the 30 years of disk-based database processing techniques, at a concept level, carry over. Yeah. Make sense? Yeah. I have one last question: how long does it take to restart the 48-terabyte machine? It can take a while. Days. Well, yes. But disk-based systems have the same problem. Let me go back to SQL Anywhere, right? It's a normal system: when the database restarts, the application always does the same activity, let's say, right?
So, a typical application would be... what I'm getting at is that this problem is not unique in some sense, and similar techniques will apply. So imagine this: it's an embedded database, and therefore it is designed to shut down; you can configure it so that the database doesn't run all the time. When the application first starts and the first connection comes, the database starts up; when the application stops, the database stops. You have to do that, right? Now, the first thing the application does when it comes up is present a screen. Think Quicken or some sort of financial software: the first thing it does is start up and present a screen, and that screen comprises 100 queries to the database. If you do nothing, then you're effectively starting with a cold cache, and it could take two or three seconds, which is not acceptable. So we added things to the engine: for example, the engine remembers which pages were brought into memory at engine start, up to a certain number of pages, and as the database is starting, before it receives queries, it pre-warms the cache with those pages. Those are the kinds of things we can do, and will do here as well. But one thing I should point out is that there is a lot of work we are doing with persistent memory, and one thing persistent memory can help with is fast restarts, almost instantaneous restarts. So yes, restart time is an issue in that context, but it's also an issue in the HA/DR scenarios where you're failing over, which is why some people now want a hot standby where the standby system is replaying the log and is ready to go. So you're running a 48-terabyte in-memory machine with that hot standby; you're running two 48-terabyte systems together. Any last question? All right, let's thank Anil again. As he said, there's another tech talk tomorrow at 12; that'll be in CIC on the fourth floor, and there will be pizza. The final presentations will be Monday, I think, whatever, maybe something like 5 or 6; I'll send out a reminder. Let's do it in this room, not where we're assigned. And let's do it at 9 a.m., not the 8:30 a.m. they set us up with. I'll send a reminder out about the second code review and what's expected in the final presentations. And if anybody has a question about how to write the final exam or doesn't know what paper to pick, let me know. They had to write a two-page essay on what paper they want to add to our system. They can't do the Hekaton garbage collector because these guys already implemented it, so pick something else. Anything else? All right guys, it's been an awesome semester. Good luck with your other exams and classes. For those of you who are graduating: everyone will pass this class, so don't freak out. Good luck with your other classes. And I will send out an email about internships and full-time positions from the various companies that are interested. All right guys, see you on Monday. Thank you. Thank you.