All right, guys, well, let's get started. We are very happy to have Ivan Bowman here today from SAP. He'll be talking about SQL Anywhere, but he also works with the HANA team up there. Part of the reason why we're really excited about having him here is that we weren't actually sure he was going to show up, because he doesn't actually work in Waterloo, where the main SAP Sybase office is. He has this other thing. Actually, let's turn the lights down so we can see it better. He has a telepresence robot called IvanAnywhere, and there's actually a YouTube channel if you're interested in seeing what it is. It's this little robot he has in the Waterloo office that wheels around, and that's how everyone interacts with Ivan.

And you've been doing this for how many years now? Eight years. Eight years. So if you're a grad student here, you cannot do this until you finish your classes. After that, I can do whatever you want. There's Ivan meeting Ivan.

So Ivan did his undergrad, master's, and PhD all at the University of Waterloo, which has a really awesome database group. He's been at SAP, originally at Sybase, since 1992. Ninety-three. Ninety-three. And now he is the lead architect for SQL Anywhere, which is SAP's embedded database system, which he'll talk about today. And he also works on the HANA system in an advisory role as well. So that's Ivan. Come on up.

Thanks, everybody. I'm really excited to be here. Actually, I enjoy this entire series; I had a chance to watch some of the videos and the slides. Richard Hipp's talk on SQLite was really interesting to me, and I'm glad I got to preview it before doing my own slides. So I'm kind of building on that and augmenting it, knowing that you've probably seen that material. And if you haven't, you can go back and see it on the web.

All right. Today I'm talking about what's currently called SAP SQL Anywhere. It's a database management system, and I'm primarily talking about it being used with occasionally connected devices. Occasionally connected means a couple of things, and it means different things to different people, so I'm going to go through what it means to me.

Here are some of the use cases where SQL Anywhere shows up. One that's important to me but few others: the robot that we just saw is running a copy of the SQL Anywhere database. It's been running for eight years or so now, and it's logging everything: the temperature, the voltage of the battery, the proximity sensors, how well I'm driving. And we use that for some predictive maintenance, because the batteries do die, and we want to find out when we need to order new ones and schedule a period when I can't use the robot so that they can be replaced.

A much more serious application is where SQL Anywhere is used behind an application that people just download and install on their laptops. We have some customers with over 2 million instances of their application. The customers probably don't know this, but when they install it, underneath the covers SQL Anywhere is installed, and their data is actually being stored within our database management system.

And there are a number of other cases that fall more into the embedded spectrum that I'm talking about here. For one of them, there's a hospital in Austria: SQL Anywhere is running in bathrooms right now. It doesn't sound very exciting to some people, but it's exciting to me.
There are sensors in the soap dispensers and various other things in these bathrooms in the hospital, and SQL Anywhere is running on a small embedded device, a Raspberry Pi. It's about a $30 computer, based on Linux ARM. It's collecting data from these sensors, and it lets the company that manages the bathrooms prove things like: when you push, do you get enough soap with every push? Are enough milliliters of soap being dispensed? And how well do people comply with the hospital's sanitation requirements? All of the data is gathered locally. There's even a user interface locally: when somebody goes in to clean the bathroom or refill it, they can just type on the screen and things happen. And the data is sent back up to the head office for the company that does the work, in order to do things like send out more materials and so on.

We're used in some other cases. Railway inspectors could be out in the field far away from a network connection, and they're able to enter details where track maintenance is required or further inspection work is needed.

It's used by utilities like water companies out in the field. They want to monitor: is there any leakage? I don't know if you know this, but a lot of water companies don't get very much water out of all the water that they process. There's a lot of money wasted on processing the water, making it very clean, and then just letting it fall into the ground through cracks and leaks in the aqueducts. This is one of the ways companies are able to figure out where water is being wasted, because they can drive around and use a radio link from the truck to pick up data from a SQL Anywhere database that's running locally at these installations.

There are also some very unusual installations of SQL Anywhere. I hadn't thought about the IT needs of an oil rig, but they're actually pretty demanding. An oil rig is really big and expensive; there's a lot of stuff going on and a lot of maintenance needs. They have a really high-speed network on the rig with pretty low latency, but when you have to connect to the mainland, it's really slow, if it's even connected. And when the rig is being moved into position, you may not have a connection at all, but that's a really good time to do maintenance. And it's a place where you really want to do the right thing: if you send somebody up to a special part of the rig to replace a part, they had better take the part with them, and somebody else had better not have already done the job, or you're wasting time. So this is a place where SQL Anywhere is running locally. There's a user interface where people are interacting with it, and the data is sent back up to the business occasionally, but not in an always-connected way.

We even have a case in Alaska where there's a SQL Anywhere installation doing something important, and the only way to get to it is by float plane. So if something goes wrong, if things are slow or there's a problem, well, the person that wrote the program can go there and help. But they're going to be making a very long trip, and nobody really wants to do that, especially at certain times of the year.

SQL Anywhere runs on a wide range of devices. We run on handhelds. I mentioned that we run on Raspberry Pi, the Linux ARM build. The minimum requirement on Windows XP is 512 megabytes; we run in less on smaller devices. But we also scale up to 512 gigabytes.
And we have some customers, which I didn't mention here because it's not really related to embedded, that are running on large server-class machines. The storage could be very slow laptop disks, up to high-speed SSDs, or even RAID or a SAN. One of the things that's kind of interesting about our product is that you can take a database file on any of these devices, just move it somewhere else, pick it up and use it, make modifications, and move it back. So if you were in Lilliput, I think we would be a neutral party, because we're happy with either Big-Endian or Little-Endian.

We see a lot of variation in database size. We have some customers with 20 megabytes in their database, but that 20 megabytes is very important to them. And there are examples where it goes up to 500 or more gigabytes. Obviously, you need different operational requirements for a larger database, just in terms of backing it up and having the infrastructure for it. But we have that wide range.

We also have a lot of variance in query complexity. Some applications are just doing single-table selects, inserts, updates, and deletes; they're basically using it as a key-value store. And then we routinely see 15 to 20 tables in the FROM clause; 50 tables is not uncommon. We've tested beyond 500, and there's no hard limit on the number of quantifiers in the FROM clause. Schemas also vary widely: some databases have just tens of tables and not much interesting going on, and others have thousands of tables and 30,000 or more procedures in the database.

What's coming up next is probably not very remarkable or novel; I just want to position SQL Anywhere so that we all know what we're talking about. A SQL Anywhere database can be viewed in different ways. From the operating system point of view, you can think of it as regular files; there are between 1 and 16 of them. Usually that's a good thing. Unfortunately, we had some customers that deleted all the *.log files because they thought they were text logs wasting space. But that was your forward recovery log, and it was gone. It's hard to help a customer in that situation.

From the cache manager point of view, these are page-addressable files. You address them using a key of (database number, file, page number), and you get back a page of 2K up to 32K. At a higher level, you have objects like tables or indexes, and these are always identified by bit strings. We have a data type for this, VARBIT. For example, for a base table, there's a bit string stored in the catalog with a bit on for every page that belongs to the table, and that's really useful for prefetching and scanning the table. Individual pages are envelopes for the... yes?

Audience: On your previous slide, you talked about a single file. Is it actually a directory that you have to move back and forth?

So there are up to 15 different dbspaces that can hold tables. But by default, there's only one database file and one transaction log, and those are the two things you need to move around. You can put them anywhere you want; they don't have to go in a special directory. The reason you might have more than one dbspace is if you want to put them on different physical devices because they're big, or if you want to somehow partition the work: you have some stuff that doesn't change very often, so you put it in one dbspace. OK.
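To make the page-bitmap idea concrete, here is a minimal sketch. The structures are invented stand-ins, not SQL Anywhere's actual catalog format; the point is that once a table is identified by a bit string with one bit per page, scanning the table is just walking the set bits, which makes it easy to issue prefetch requests in batches.

```python
class PageBitmap:
    """One bit per page in a file; bit i set means page i belongs to the object."""
    def __init__(self, num_pages):
        self.bits = bytearray((num_pages + 7) // 8)

    def add(self, page_no):
        self.bits[page_no // 8] |= 1 << (page_no % 8)

    def pages(self):
        # walk the set bits in page-number order
        for i, byte in enumerate(self.bits):
            for b in range(8):
                if byte & (1 << b):
                    yield i * 8 + b

def prefetch_batches(bitmap, batch=32):
    """Group a table's pages into fixed-size batches for prefetch requests."""
    run = []
    for p in bitmap.pages():
        run.append(p)
        if len(run) == batch:
            yield run
            run = []
    if run:
        yield run

table_pages = PageBitmap(1000)
for p in (3, 4, 5, 17, 200, 201):
    table_pages.add(p)
for group in prefetch_batches(table_pages, batch=4):
    print("prefetch pages:", group)
```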
I'm going to go through this quickly; it's probably things that you've seen before, just to make sure we're on the same page. If you want to change a row in a table, we take a write lock in our lock table. We remove the index entries that point to the old row. We get a write latch on the page, so no other thread is working on that page and we can change it around. If the page was clean and cached, meaning it hadn't been changed yet, then we mark it dirty and write a pre-image to a checkpoint log. And then we modify the row.

We may have to move other rows around as things grow or shrink, and if there's not enough free space on the page, then we use a continuation, a linked list of row pieces. That was something that Richard talked about for SQLite; we follow a similar approach where there's no limit on row length, we just use a linked list and keep going. One of the classic examples: some of the really bright guys that created SQL Anywhere in 1992 were Dave Yach and Dave Neudoerffer. So you would create the database and set all the values to 'Yach', which is four characters, and then change them to 'Neudoerffer', and everything blows up. Depending on how much you pack into the pages originally, you might end up getting these continuations, which give a little bit slower performance.

After you've modified the row, you add the new index entries, and you add redo log and undo log records.
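Here is a hedged sketch of that update sequence. Everything in it is a toy stand-in rather than a SQL Anywhere internal structure; what matters is the ordering of the steps: lock, de-index, latch plus pre-image, modify, re-index, then redo/undo records.

```python
class Page:
    def __init__(self):
        self.dirty = False
        self.rows = {}            # row_id -> value

class Table:
    def __init__(self):
        self.page = Page()
        self.index = {}           # value -> row_id (one toy index)

checkpoint_log, redo_log, undo_log = [], [], []

def update_row(table, row_id, new_value):
    # 1. write lock on the row (a real lock table is elided here)
    old_value = table.page.rows[row_id]
    table.index.pop(old_value)                    # 2. remove old index entries
    # 3. write latch on the page (elided); 4. pre-image of a clean page
    if not table.page.dirty:
        checkpoint_log.append(dict(table.page.rows))
        table.page.dirty = True
    table.page.rows[row_id] = new_value           # 5. modify the row in place
    table.index[new_value] = row_id               # 6. add the new index entries
    redo_log.append(("UPDATE", row_id, new_value))  # 7. redo and undo records
    undo_log.append(("UPDATE", row_id, old_value))

t = Table()
t.page.rows[1] = "Yach"
t.index["Yach"] = 1
update_row(t, 1, "Neudoerffer")   # the classic growing-row example
print(t.page.rows, redo_log, undo_log)
```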
So that's what it looks like internally, but to the customer it's always a SQL interface: standard SELECT, INSERT, UPDATE, DELETE. The phases of processing are similar to the stack that was shown for SQLite. You take a SQL statement with parameters, you have a tokenizer to scan it, and then you parse it. We use a version of yacc that was written at Watcom that addresses the same re-entrancy and multi-threading concerns that were talked about for SQLite.

Then we get into semantic transformations. These are rewrites that modify the parse tree in order to give better performance and get rid of things like an unnecessary DISTINCT: if you have a DISTINCT but you have a primary key in the select list, you can remove it. And there are a number of CNF-to-DNF conversions that we do to clean up predicates. We're really looking for something that's sargable, that can drive an index scan. A lot of times our users will give us something that looks really ugly, with a whole bunch of disjuncts, but if you tease it out, you can pull out some conjuncts that you can use for an index scan.

The portion in pink here is what's done at open time: a pre-optimization step, join enumeration, and post-optimization. Then we get to plan building, and what we're really building is a Volcano-style tree of iterators. These are pre-written C++ classes, and they stream out a row at a time. Our scalar expressions are evaluated using byte machines. After we've got the plan, we execute it, and then we close it.

And this is an example of what the plan might look like. This is an eye chart, so I don't expect you to read it, but there's a combination of different join types: nested loop, hash join, and merge join. Some of these operators are pipelined, which means they're positioned on a current row within a pipelined region. And those, something that's a little bit different from some other database vendors, can be scrolled in both directions. So if you have a fully pipelined plan and you have a cursor, you might be able to fetch forward and backward and, optionally, sensitively see the changes being made by other users. But for a plan like this one, where we have a work table at the top and an exchange operator to do things in parallel, there's no pipelining: you can still scroll, but you're looking at the results in the work table. You're not seeing live updates from your own connection or other connections.
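A minimal sketch of that distinction, with toy stand-ins only: a fully pipelined cursor is positioned on a current row and can step forward or backward over live data, while a cursor over a materializing operator scrolls over the work-table snapshot built at open time.

```python
class PipelinedScan:
    """Bidirectional cursor positioned on a current row of the live table."""
    def __init__(self, table):
        self.table, self.pos = table, -1
    def next(self):
        if self.pos + 1 < len(self.table):
            self.pos += 1
            return self.table[self.pos]
    def prev(self):
        if self.pos > 0:
            self.pos -= 1
            return self.table[self.pos]

class MaterializedCursor:
    """Scrolls over a work table built at open; later changes aren't seen."""
    def __init__(self, source):
        self.snapshot, self.pos = list(source), -1
    def next(self):
        if self.pos + 1 < len(self.snapshot):
            self.pos += 1
            return self.snapshot[self.pos]
    def prev(self):
        if self.pos > 0:
            self.pos -= 1
            return self.snapshot[self.pos]

table = ["r1", "r2", "r3"]
live, frozen = PipelinedScan(table), MaterializedCursor(table)
live.next(); frozen.next()
table[1] = "r2-updated"     # another connection changes a row
print(live.next())          # r2-updated: the pipelined cursor sees it
print(frozen.next())        # r2: the work-table cursor does not
```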
All right, just to go through the limits, we try to be very generous. We have up to 13 files per database, and each one can be up to two terabytes. That's controlled partly by some internal data structures, but also by the operating systems. We let you have 45,000 columns per table, and we would even let you have a single row that takes up the full file size if you wanted to try it. The way we look at it is: hey, we don't judge. I don't think it's a good idea to do that, but if you want to try it, it's up to you.

Another part of that is that we don't change our data structures to handle the high-end cases better if it's going to hurt our common case. So we let you have 45,000 columns in your table, and we do have some n-squared algorithms that work on them. If you look at making it a hash table or something with better complexity, the constants are worse at the normal number of columns, so we don't do that. It's up to you to decide: do I want to pay those n-squared costs to do this unusual thing in one spot?

Our customers came up with this, the Watcom rule: Watcom does things the way they should be done. We strive for that. We don't always get there, but there is a corollary that if we do something that's not the way you expected, then you expected it wrong. Watcom, by the way, was the company that originally created SQL Anywhere. It started out as a compiler company, and I started on the compiler side. It was Watcom SQL; Watcom was bought by Powersoft, then Sybase, and then SAP.

All right, I'm just going to go over some of the things that make SQL Anywhere a little bit different in terms of how it's used and what it delivers. In an embedded application, the application may want complete control over what the user experience looks like. If you're installing QuickBooks, they don't want to have you go out and do a SQL Anywhere install and figure out where to lay out your databases and so on. They just want to manage it; to the user, it looks like all of it is their application. Our product was designed from the beginning for this type of model, where it was supposed to fit underneath another application, even on a mobile device.

We talked about deployment diversity, from tiny to huge. We have a small footprint so we can fit in those small cases, a Raspberry Pi or a phone, but we can also scale up to very high-end machines. And we see developers with a wide range of database experience. We have some customers with multiple PhD graduates from Stanford or other places, and they really know what they're talking about and what they want from our product. At the other end, we have customers with no formal computer science background, and they're able to use our product in a way that meets their needs. The way we try to deal with this is by making it easy to embed, administer, and support, and that's really critical.

Applications can last for decades. If you have a working database application written 20 years ago, it may still be meeting your needs today, and you don't want to pay the price to rewrite it. The way we've tried to help with that is that we always read older database files. We broke that one time, in 2006: from the first release in 1992 up until then, we read all of the older versions, and you could take an old database, run it in newer software, then go back and run it in older software. Finally, we had a flag day and said, OK, you have to unload and reload into the new format. There were some physical store changes that we really wanted to get in. One example I mentioned is that bitmap that shows you which pages belong to a table; before we put that in, there were previous and next pointers on each table page, which didn't make it easy to do prefetching, and there were some concurrency issues. A few other things had built up that we really wanted to get in there, and we decided it was worth it. Our customers still haven't decided if it was worth it.

Failures happen. Failures happen a lot, because you're running on commodity hardware out in the field with customers that aren't really thinking about databases; they're getting their job done. People are running a database and they shut the lid on their laptop, it goes to sleep, and it comes back and it's a different day. Or they drop their phone and things crash. Batteries run out. So we deal with this. One of the common issues is strange hardware failures: you get a bad bit in memory, or something else happens when you're writing out to disk, and things just didn't get there the way they were supposed to. We do a lot to try to detect problems by using checksums, and we have a lot of redundancy, so that if we get into a salvage situation, where the customer doesn't have a good backup and recovery plan and they're not following best practices, we can still get their data back for them.

And we always have durable commits. Some of our customers have said to us, oh, we don't care if we lose the last few transactions. In my experience, people feel that way during development, but not during a disaster, when they've lost the last few transactions and are trying to figure out what happened. So we don't give you the option of making commits non-durable.
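Here is a sketch of the checksum idea just mentioned: stamp each page with a checksum on write and verify it on read, so that bad memory or a garbled disk write is detected rather than silently propagated. CRC32 and the page layout here are illustrative assumptions, not SQL Anywhere's actual format.

```python
import struct
import zlib

PAGE_SIZE = 4096

def write_page(payload: bytes) -> bytes:
    """Append a CRC32 of the payload as the last 4 bytes of the page."""
    assert len(payload) == PAGE_SIZE - 4
    return payload + struct.pack("<I", zlib.crc32(payload))

def read_page(page: bytes) -> bytes:
    """Verify the stored checksum before trusting the page contents."""
    payload, stored = page[:-4], struct.unpack("<I", page[-4:])[0]
    if zlib.crc32(payload) != stored:
        raise IOError("page checksum mismatch: corrupt page detected")
    return payload

good = write_page(b"x" * (PAGE_SIZE - 4))
read_page(good)                          # verifies cleanly
bad = bytearray(good)
bad[10] ^= 0x01                          # flip one bit, as a failing DIMM might
try:
    read_page(bytes(bad))
except IOError as e:
    print(e)
```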
The single-connection case is important in our use cases. I talked to Tamer Özsu at the University of Waterloo about a particular case where there's just not enough work for the database to do, and he said, well, just get more users sharing the database. I can't get more users; they're a scarce resource. So my machine is able to do more work, but it's kind of like a lazy high school student: it does the thing somebody came in to ask for, and then it goes to sleep. I'd rather have it be an eager university grad that goes out, sweeps the floor, restocks the shelves, and does the things that are going to make everything work better.

There are a number of key features. I'm going to talk about some of these in more detail, but I'll skip over most of them; there are just a couple I wanted to highlight. I talked about portable database files. Concurrency control is based on the standard three isolation levels, and we also have snapshot isolation. Something that's a little different for us is that we have different regions of trust.

In the embedded space, you've got the application developer, who owns the application code and logic and wants to have some control. You've got the data owner; they're the user that actually owns the data. And then you've got SQL Anywhere, and we, of course, own the source code. So we offer database and table encryption and wire-protocol encryption, but also some different things, like obfuscated procedure and function text. If you're an application developer and you're concerned about your end users going in and changing things on you without paying you to customize the application, you can use some of these features.

Rich SQL language support: we have many of the features of the SQL 2015 core and beyond, but we can't claim conformance at this time due to a few issues. Pretty full data type support: we've got spatial data, XML generation and consumption, and JSON, plus row and array types in procedures, though not stored persistently. Stored logic: triggers, procedures, and functions, which are written either in the Watcom SQL dialect, which is close to the 1992 persistent stored modules, or in native or external environments. We also have an interesting thing here where a procedure can make a call to a web site. So you could have a geotagging application where you've got an address and you want to turn it into lat/long; you can make a call out to a web server to do that for you. We also have remote data access to ODBC data sources or files on the file system.

Something that's really weird: an integrated HTTP server. I wrote a simple web server as an example for our compiler group in 1995, and Dave Yach, on an airplane flight, converted it into something in the database engine. I thought this was kind of an interesting or weird thing that nobody would want to use, but it's actually kind of useful: you have this Raspberry Pi running SQL Anywhere and you want a user interface, so what are you going to hit to get that? We can serve a web application directly from the database server, and you can read the strings, you can use blobs to store images, and so on. It's a little bit of a different environment in the embedded case, where it really does make sense to do things you might not think of.

A significant difference from SQLite is that we use logical logging instead of physiological logging. The transaction log has the individual inserts, updates, and deletes, and the DDL statements. One of the things this gives us is redundancy: if you have your forward recovery log and a backup, you can manually decode the log and figure out what to do. Some user comes to you and says, oh, I don't know, yesterday I had the shift key on and everything went wrong and I put in the wrong values; you can go through and find out exactly what changes that user made and correct them. That's a strange salvage situation, but logical logging enables a number of other things: our scale-out, high availability, and read-only scale-out are done through log shipping.

It also gives us resilience to two failure modes that we've seen. One is a memory error, where you read a page in, you change a few bits, and when you write it out everything else gets garbled. If you did physiological logging, you'd write out that bad new page to two spots, and then you're cooked; you can't get back. But we have a distinct copy, a different representation, that is not as likely to be affected by that problem.
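Here is a sketch of the logical-logging idea: the transaction log records the operations themselves (INSERT/UPDATE/DELETE and DDL) rather than page images, so it can be decoded by a human, shipped to a mirror, or replayed against a backup, fully or up to a chosen point. The sqlite3 module is used here only as a stand-in storage engine to make the sketch runnable; the log format is invented.

```python
import sqlite3

# the logical log: a sequence of operations, not page images
log = [
    "CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO t VALUES (1, 'alice')",
    "INSERT INTO t VALUES (2, 'bob')",
    "UPDATE t SET name = 'ALICE' WHERE id = 1",
]

def recover(upto=None):
    """Replay the logical log against a fresh copy; stop early for
    point-in-time recovery if upto is given."""
    db = sqlite3.connect(":memory:")
    for i, stmt in enumerate(log):
        if upto is not None and i >= upto:
            break
        db.execute(stmt)
    return db

full = recover()
print(full.execute("SELECT * FROM t").fetchall())           # [(1, 'ALICE'), (2, 'bob')]

before_update = recover(upto=3)        # recover to just before the bad UPDATE
print(before_update.execute("SELECT * FROM t").fetchall())  # [(1, 'alice'), (2, 'bob')]
```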
Logical logging also lets you recover to a specified point in time. If somebody accidentally truncates a table and you want to recover to the point just before that, you can do that with our logging format.

In addition to the SQL Anywhere database server, we've got something called MobiLink. It's an important piece of technology that replicates data from SQL Anywhere to a back-end database running, for example, in a data center. That back end could be HANA, Sybase ASE, Oracle, or other vendors; SQL Anywhere or UltraLite has to be running on the remote site. We enable some of the MobiLink technology through our logging and by running scripts within the database. UltraLite is another database engine that's a little bit closer to SQLite: it's an in-process database, and it's targeted at the most resource-constrained platforms. SQL Anywhere will run on a phone or a Raspberry Pi, but it needs some number of megabytes to run; UltraLite can go even smaller and take up less space.

OK, so far that was just material to get us all to the same point, and now I'm getting to what I think is the interesting part. There are a number of query optimization challenges that show up in our space that I don't think are the same in other database use cases. The real person to talk to about this is Anisoara Nica; she has taken over optimization for SQL Anywhere and has a large number of papers. I reference some of them in the slides.

There really is no expert available at most installations to tune the database to the workload, so there's nobody to manage it other than the software. The workload we see is usually a mix of cheap OLTP and expensive OLAP queries at the same time, from different connections. And often we don't see a lot of correlation between query complexity and execution cost, so long as you have a decent plan. The challenge for the optimizer is that the query as presented has many, many very expensive plans and a few good ones, and if you spend a lot of time finding those good ones, you've spent more than the query would have cost. So it's really important to get quickly to a good plan, even for 500 or more quantifiers. I had a paper with Glenn Paulley in 2000 about how we were able to do that even in memory-constrained environments.

The machine configurations are diverse, and they change at run time, so our costing has to take that into account. And the data characteristics are wildly different between installations. You could have a shrink-wrapped application where many, many of your customers are individuals working with a small amount of data, and then a few customers have a huge installation, and you want a single database schema that works in both of those spaces.

The design choice we made in SQL Anywhere is to optimize every open; from the beginning, we would optimize at open time. This is another of the significant differences I noted from the SQLite talk: Richard talked about a way that you could basically prevent plan changes when you deploy your application, and we've selected a different part of the space. What we get is the most precise cardinality estimates: you're looking at the parameter values coming in for your query, you're looking at the current estimates of the table sizes and skew, and we even estimate how much of the table or index is currently in cache. We also have accurate estimates of how many other requests are running right now.
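A minimal sketch of why optimizing on every open helps: with the actual parameter value in hand, selectivity can be estimated per call instead of from a generic assumption. The numbers and the cost formulas below are invented for illustration.

```python
def estimate_selectivity(histogram, value):
    """Fraction of rows matching `col = value`, from a toy value histogram."""
    return histogram.get(value, 1) / sum(histogram.values())

def choose_plan(histogram, value, pages, pages_in_cache):
    sel = estimate_selectivity(histogram, value)
    rows = sum(histogram.values())
    # invented cost model: one index probe per matching row, versus a scan
    # whose cost depends on how much of the table is already cached
    index_cost = sel * rows * 1.0
    scan_cost = (pages - pages_in_cache) * 2.0 + pages * 0.1
    return ("index scan" if index_cost < scan_cost else "table scan", round(sel, 5))

hist = {"rare": 10, "common": 100_000}
print(choose_plan(hist, "rare",   pages=5000, pages_in_cache=100))    # index scan
print(choose_plan(hist, "common", pages=5000, pages_in_cache=4900))   # table scan
```

The same query text gets an index scan for the rare value and a table scan for the common one, which a single cached generic plan could not do.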
Unfortunately, there are a couple of downsides to this approach. The optimization cost can be high, and it's not amortized over multiple requests. And another one is what we affectionately call the dancing optimizer, or the plan-of-the-month club: things look good during development and most of the time, and then you get a bad plan and things take a long time. This is a problem our customers don't like. It doesn't happen very often, but it is a downside. There are a few things that we do to...

Audience: Is there any one thing you can point to that causes your optimizer to do that?

Usually we've seen this show up when we don't really have a good idea of what your query is trying to do. You've got some search conditions in your WHERE clause that aren't your typical sargable predicates on indexed columns; it's some function equal to some other function, and we're kind of guessing at selectivity. Even with enough guesses in there, often we will be very consistent, but if you're just on the cusp of one plan or another, it can flip. I have a slide that touches on this a bit more.

So this is related to a problem we have with individually cheap requests. What's wrong with cheap requests? They're great; they're cheap. This is the way I like to think about requests: you have some think time at the client, you send your request, you do a little bit of work at the beginning, you execute for a while, you do one or two I/Os, and you're done. The optimizer can reduce the execution time and give you better performance. But in the cheap-request case, you're spending a lot of this blue overhead, building up data structures and doing your optimize-on-open, and only a very small amount on execute. If you look at where the server time goes for a primary key lookup on data that's in cache, your fetch time might be something like 26 microseconds, but your total overhead is on the order of 80 microseconds or so. If you did even one I/O, even on SSDs, this would be completely dwarfed by the execution costs. But what it means is that if we make an improvement to our processing that speeds up an expensive query from 10 seconds to 5 seconds, which is great, it may add a couple of microseconds to open time just to consider other options, and that shows up as a relative difference here.

This is a topic I looked at in my thesis with Ken Salem in 2005. We did this experiment with an IN-list with different numbers of elements, and if you look at the slope, that gives you an approximation of the cost of one index probe in memory. If you look over here in this bar chart, the yellow is the server cost and the blue is the client cost, and this little dotted line way down here is the cost of a single index probe, which is very small compared to the overhead. And we have applications that are written to basically do a client-side join: they open up a query, they pull down a bunch of rows, and for each row they do a primary key lookup. They're paying that overhead over and over and over again. The approach we proposed is to detect certain patterns and prefetch the results at the server, reducing the number of fine-grained requests that are sent. We haven't actually managed to transition all of that into the server. There are a number of issues; it's great theoretically, it'll get you a thesis, but we're not so sure about giving it to customers yet.
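A sketch of the client-side-join observation: an application that issues one PK lookup per driving row pays the per-request overhead N times, while detecting the pattern and batching the lookups pays it N divided by the batch size. The server function and batching scheme below are hypothetical; this only counts round trips.

```python
keys = list(range(1000))

round_trips = 0
def server(sql, param_rows):
    """Toy server: one call is one round trip, answering a batch of lookups."""
    global round_trips
    round_trips += 1
    return [(k, f"detail-{k}") for (k,) in param_rows]

# naive client-side join: one request per key, paying the overhead every time
for k in keys:
    server("SELECT * FROM detail WHERE pk = ?", [(k,)])
naive = round_trips

# pattern detected, lookups prefetched in batches of 100
round_trips = 0
for i in range(0, len(keys), 100):
    batch = [(k,) for k in keys[i:i + 100]]
    server("SELECT * FROM detail WHERE pk = ? /* batched */", batch)

print(naive, "round trips naive vs", round_trips, "batched")   # 1000 vs 10
```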
With parametric plan caching, we actually have gotten part of the way there. This is work that Güneş Aluç, David DeHaan, and I did; we had a paper in 2012. What this says is: OK, we have always optimized at open time, where we have the values of parameters; we're different from other databases that make generic assumptions about what the parameter values might be. How do we get the benefit they have of amortizing optimization costs across multiple opens, without giving up this nice feature that we're picking the right plan?

Here, this is a plan-space diagram for two predicates, looking at the selectivities S1 and S2, and the plan space is complicated. It's got these little islands; there are a bunch of different plans with different costs. Maybe some of those differences actually aren't that important, but a different plan is being picked because there's some special case the optimizer is able to exploit, or it's getting fooled.

The analogy that Dave DeHaan and I came up with is that you're really counting cards in blackjack, where there's a small upside: if you do everything right, you get the benefit of cutting down that optimizer overhead. But if you are not as good as you think you are, there's a huge downside: if the plan you picked is worse by even one I/O, one I/O is a couple of milliseconds, and you're looking at shaving off tens of microseconds. So what Güneş said is: let's monitor what the optimizer is doing. Look at the incoming parameters and selectivities, and look at the plans it picks. We're going to model the optimizer statistically, using locality-sensitive hashing, and predict that the optimizer would choose a particular plan within certain regions of the space. And if we can't tell with good enough certainty, we'll just go to the optimizer and let it figure it out. This actually works pretty well at picking good plans. It still has the occasional catastrophic failure, where you lose all your money and go home in shame.
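Here is a toy sketch of that idea: observe (selectivity vector, chosen plan) pairs, reuse a cached plan only inside regions where the optimizer has been seen to choose consistently, and fall back to full optimization when uncertain. The real work models the optimizer with locality-sensitive hashing; this version just buckets the selectivity space, and all thresholds are invented.

```python
from collections import defaultdict

class PlanCache:
    def __init__(self, buckets=10, min_evidence=3):
        self.obs = defaultdict(set)      # region -> set of plans seen there
        self.count = defaultdict(int)    # region -> number of observations
        self.buckets, self.min_evidence = buckets, min_evidence

    def region(self, sels):
        """Quantize a selectivity vector into a grid cell of the plan space."""
        return tuple(min(int(s * self.buckets), self.buckets - 1) for s in sels)

    def lookup(self, sels):
        r = self.region(sels)
        plans = self.obs[r]
        if len(plans) == 1 and self.count[r] >= self.min_evidence:
            return next(iter(plans))     # confident: reuse without optimizing
        return None                      # uncertain: go to the optimizer

    def record(self, sels, plan):
        r = self.region(sels)
        self.obs[r].add(plan)
        self.count[r] += 1

cache = PlanCache()
for s in (0.01, 0.02, 0.015):            # optimizer consistently picks one plan here
    cache.record((s, 0.5), "index-nested-loop")
print(cache.lookup((0.018, 0.5)))        # index-nested-loop, reused
print(cache.lookup((0.9, 0.5)))          # None: unseen region, re-optimize
```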
All right, moving on to access methods. Obviously we have the full table scan; I mentioned that we have a bitmap, so we can prefetch the pages that are in the table because we have a list of all of them. We also have B-tree indexes. Something that's a little bit different is that we don't have any bounds on the lengths of the keys in our B-trees. We don't judge: they don't have to fit into a page. You could have keys that are multiple two-gigabyte strings. I don't think that's a good idea, but we let you do it. The keys are binary comparable. We make a hash of 255 bytes of the prefix, because almost all of our customers have keys that fit in that size, and then we use a compact B-tree that's based on a Patricia trie internally within each node, which lets you traverse the tree without comparing the entire key. There are more details in the paper if you're interested.

The choice between a sequential scan and an index scan is one of the key decisions the optimizer has to make, and it depends on things like how much of the data is already in memory, how much of it is going to have to come from disk, and, for the disks it's on, what the penalty is for random I/O versus sequential I/O. Yee Ken was a student working with Ken Salem and Anil Goel, one of my colleagues in Waterloo, and they talked about a number of the trade-offs that are important when you're choosing between these two cases.

Something that's interesting, and may not come up in other database systems as much, is that when you have a single connection, you can really max out the resources that you give it. Pedram Ghodsnia is a PhD candidate at Waterloo; he was working with me as an intern, and he looked at parallel-I/O-aware query optimization. The observation is that if you're doing an index scan, this yellow line, and you're just doing one prefetch, then you get a certain level of performance; but if you put 32 requests to the OS at a time through prefetching, you can get much better performance.

The way we model this is with a disk transfer time (DTT) model, based on Anil's work from his PhD thesis on object databases, but Pedram took it further to look at the prefetch depth. If you look at only a single thread for SSDs and hard drives, the band size on the bottom is how many pages apart the pages you're reading are. If the band size is one, the pages are adjacent, so you're doing a sequential scan; if they're very far apart, you're doing a random read of the entire table. If you look at the SSDs, there's not a very big impact from random I/O. There's a little bit, probably because of some controller issues. But when you look at hard drives, there's a really significant impact from random I/O, and the DTT abstracts a whole bunch of things like head movement and waiting for rotational latency. When you look at parallelism, when you're willing to put lots of prefetch requests out, you get a huge benefit on SSDs, especially when you move out to higher band sizes, but on hard drives you don't get as much of a benefit. So it's important to know what your storage technology is. In Pedram's approach there are two options: one, the user can calibrate the devices they're working with; the other is, if you haven't done that, at startup we issue a few probing requests and figure out: this probably looks like an SSD, this looks like a hard drive.
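A sketch of that startup probing idea: time a few sequential reads (band size one) and a few far-apart random reads against the database file, and classify the device by the penalty ratio. The thresholds and prefetch depths are invented, and a real implementation would have to defeat the OS page cache (e.g. with direct I/O), which this sketch ignores.

```python
import os
import random
import time

def probe(path, page_size=4096, samples=64):
    """Return the random-I/O penalty ratio for the device holding `path`."""
    pages = os.path.getsize(path) // page_size
    with open(path, "rb", buffering=0) as f:
        t0 = time.perf_counter()
        for i in range(samples):                     # adjacent pages: band size 1
            f.seek(i * page_size)
            f.read(page_size)
        sequential = time.perf_counter() - t0

        t0 = time.perf_counter()
        for _ in range(samples):                     # far-apart pages: random reads
            f.seek(random.randrange(pages) * page_size)
            f.read(page_size)
        rand = time.perf_counter() - t0
    return rand / sequential

def choose_prefetch_depth(penalty):
    # flash shows little penalty for random I/O; spinning disks show a lot
    return 32 if penalty < 2.0 else 8

# hypothetical usage against a database file:
# penalty = probe("database.db")
# depth = choose_prefetch_depth(penalty)
```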
All right, moving on to a slightly different topic: load balancing. SQL Anywhere does support running multiple databases on a single machine. This could either be in a VM, or it could be a single server running databases for unrelated customers. We have one customer that stores databases that are actually for competitors; the end users are competitors. By consolidating onto a single server, they're saving on how much hardware they need to support the workload, which might be fairly light. You still need to predict: is it going to make sense to put these workloads together? And one of the things we observed is that you can't always predict what a workload is going to do in a different situation, especially if it has self-adaptive features. You could look at a workload and say, wow, this thing is doing a huge amount of I/O, it's keeping the disk saturated, so I'd better not put it together with anything else. But put it on a different machine, or alongside different concurrent requests, and it can switch to a much lower-I/O policy: it might switch from an index scan, which was cheap on that other system, to a sequential scan, which does less I/O but still gives acceptable performance to the customer.

So in this work, we looked at adaptation-invariant workload properties: things like how many rows of data are being changed and how many rows are matched by select statements, which are not going to be affected by plan choices. There's one idea that we haven't fully explored, which is to use the power of the optimizer against itself: get the optimizer to run in a what-if mode and tell us what would happen in a proposed configuration of multiple workloads. That's something we're continuing to look at.

All right, cardinality estimation. Cardinality estimation is one of the key ingredients of optimization: deciding index scan or sequential scan, the size of an intermediate result, how many rows will be materialized. We have self-tuning histograms on base tables and temp tables; the statistics are updated on the fly, automatically. We build join histograms temporarily, just for intermediate results, and then we get rid of them. And the server maintains persistent index statistics in real time. Some databases have clustered and non-clustered indexes, and it's binary: if it's clustered, you know that all of the leaf pages are in the same order as the rows of the table, and you may even use a B+ tree that stores the rows right there. In our case, it's approximate: we look at how far apart the keys are, what the band size is, what the average number of transitions is that you're going to see. We see a lot of cases where indexes are kind of clustered, but not 100%; they're not declared as clustered, but they give the performance characteristics of a clustered index. By tracking a few things, we're able to do a good job of cost estimation.

Our column histograms have a combination of range buckets and singletons. The singletons are used for the most frequent individual values, and they're updated by the results of predicate evaluation. So they really tune themselves to the search conditions, the predicates in your workload, and they try to give accurate answers for those. The goal is not to match the data distribution of your rows; the goal is to answer the optimizer's questions accurately for your workload. It's easy to get misled and say, well, this is a bad histogram because it doesn't match the data. That's not what's important; what's important is, for the queries you are presenting, do you make good choices?

And we have self-healing statistics. We monitor the quality of the estimates the histograms are giving. We characterize them as poor, and then we heal them; or bad, in which case we drop them and rebuild them through piggybacking. If we find out that we're going to run a plan that does a sequential scan over a table and sees all the data, that's an opportunity to recreate the statistics at the same time.
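A minimal sketch of the feedback idea behind those self-tuning statistics: whenever a predicate is actually evaluated, record the true selectivity that was observed, so the estimates converge for the predicates the workload really asks. The structures are toy stand-ins for the range-bucket-plus-singleton histograms.

```python
class FeedbackHistogram:
    def __init__(self):
        self.singletons = {}   # frequent individual values: value -> selectivity
        self.default = 0.1     # fallback guess for values never seen in a predicate

    def estimate(self, value):
        return self.singletons.get(value, self.default)

    def feedback(self, value, rows_matched, rows_total):
        """Update from the result of an actual predicate evaluation."""
        self.singletons[value] = rows_matched / rows_total

h = FeedbackHistogram()
print(h.estimate("toronto"))             # 0.1: just a guess
h.feedback("toronto", 4200, 100_000)     # a query ran; we observed the truth
print(h.estimate("toronto"))             # 0.042: tuned to the workload
```

Note that after feedback the histogram may describe the workload's predicates well while still being a poor summary of the raw data distribution, which is exactly the trade-off described above.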
We have a lot of adaptive algorithms in the database. One of them is the buffer pool size: you don't have to pick how big your cache is going to be. By default, it adapts based on the size of your database file, your workload, and what else is going on in the operating system. We view this as being a good corporate citizen. You're running a database on a laptop for some shrink-wrapped application, and then somebody starts up their email program and the memory pressure on the operating system increases. Well, we can shrink our memory consumption, especially if we don't think it's going to affect performance very much.

A few more things that are different. The database restarts probably more often in our cases because, especially in the single-user cases, you start up the application and the database starts up; you shut down the application and the database shuts down. So we have something called startup warming: it keeps track of the page references during the startup period, and it brings those pages in the next time.

There's a little bit of an issue there with the optimizer. An issue with self-adapting algorithms is that they may interact with each other. The optimizer looks at what's currently in cache, the cache warmer has brought in some stuff, and the optimizer says, hey, I'm going to change my plan. So now you've done warming based on plan A, the optimizer says the state's different and chooses plan B, you record a different trace for plan B, and you get into this cycle where the warming and the optimizer continuously flip between different alternatives. There were actually some heated discussions in the hallways between the cache-warming group and the optimizer group about what should happen here. What currently happens is that the cache warming lies to the optimizer and says the data's not in cache yet: you have to pretend it doesn't exist and pick a plan accordingly. The general lesson is that adaptive algorithms are good, but their interactions are hard to predict.

Audience: So are you saying that the adaptive algorithm isn't revisited periodically? Like, the database starts up when the machine boots, but then I open up a thousand browser tabs. At some point in the future, does that get adjusted back?

Yeah, so for the cache-warming case that I'm talking about, by default it records every startup and then uses that the next time. One of the things we've talked about is having some hysteresis, or history, so that it's a weighted average of previous behaviors. Generally, our adaptive algorithms work from what's happening right now or recently, as opposed to longer term, both because it's easier to look at what's happening right now and because things may have changed dramatically from the past.

We have a different type of cache warming for steady state. You recognize that your application has loaded up all of its data and you say, OK, that's good; and then the next time you start up, those pages can be loaded in the background before you get to the queries that ask for them. We commonly get users that say, well, I've been running this system for days and everything is good, but I just need to shut down to do this one thing, and if I bring it back up, I know things are going to go weird for a while, because the plans are going to change and we'll be doing I/O that users aren't used to seeing. So this is a way to deal with that: you can get back to that known good hot state. Every table and index keeps track of how many pages are currently in cache, and the cost model estimates how many disk reads are going to be needed based on that.
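Here is a sketch of startup warming, including the "lie to the optimizer" fix described above: record which pages are referenced during the startup window, prefetch that trace on the next start, but don't report warmed pages as cached until they're actually requested, so plan choice doesn't chase the warmer. All structures are toy stand-ins.

```python
class Cache:
    def __init__(self):
        self.pages = set()
    def prefetch(self, p):
        self.pages.add(p)
    def contains(self, p):
        return p in self.pages

class CacheWarmer:
    def __init__(self):
        self.recording, self.trace, self.warming = False, [], set()

    def start_recording(self):
        self.recording, self.trace = True, []

    def on_page_read(self, page_no):
        if self.recording:
            self.trace.append(page_no)     # the startup reference trace

    def warm(self, cache, prev_trace):
        self.warming = set(prev_trace)     # pages brought in speculatively
        for p in prev_trace:
            cache.prefetch(p)

    def visible_to_optimizer(self, cache, page_no):
        # pretend warmed-but-unrequested pages are not in cache yet
        if page_no in self.warming:
            return False
        return cache.contains(page_no)

# first run: record the startup trace
w1 = CacheWarmer()
w1.start_recording()
for p in (1, 2, 3, 7):
    w1.on_page_read(p)

# next run: warm from the saved trace, without telling the optimizer
w2, cache = CacheWarmer(), Cache()
w2.warm(cache, w1.trace)
print(cache.contains(7))                    # True: physically warmed
print(w2.visible_to_optimizer(cache, 7))    # False: optimizer sees "not cached"
```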
I'm going to go over some of this material quickly. We dynamically adjust our multiprogramming level, the number of worker threads that are available. You might think more is better. Well, it depends what your workload is doing. One of the cases we see is where, through application design or other reasons, a convoy forms: you have one resource that all of the connections need to hit, and if you add more workers, you're increasing the length of the convoy, and the overhead spent by every worker trying to do its work increases. So increasing the number of workers can be bad. It can also be good, if it allows more things to make progress. Mohammed Abouzour and Ken Salem wrote a paper on how to do this as part of Mohammed's master's thesis.

We have a number of adaptive query processing methods. We will choose a different access plan locally at execution time: if you have a hash join and we find out that the build side was a lot smaller than we thought, we're going to switch to a nested loop join on the right-hand side. We will switch to a low-memory strategy if you had a hash GROUP BY operator and we find out that we just don't have the memory we were expecting; it can stop and use something else that's not as fast, but it will work. We use parallelism within the query plan, but we adapt to what else is going on.

Audience: All of these things that turn this machinery on and off, it's always heuristics: check and see if something's wrong, switch to this other thing, go back and switch back. There's no sort of high-level thing that makes these decisions?

Yeah, I mean, it's done locally on a cost basis, with information from the optimizer. For this hash join switching to nested loops, for example, the optimizer says: I think hash join is good here, but if you see fewer than this number of rows, switch to a nested loop join. And we do it on a very local basis. I think if you look at Graefe's paper, the choose-plan operator was a more generic way to do that. So we've talked about that kind of thing in robust query processing. We do...

Audience: The triggers are almost like application...

Like, what we've got in the product are things where we can be sure that we've recognized there's a problem, and we can do something that we're sure is not going to make it worse. That's just an abundance of caution, not letting things go around in strange circles. We've talked about all kinds of interesting ideas, like running two subplans in parallel, especially if you know you have some uncertainty about selectivity, or doing some sampling locally before you commit, or re-optimizing. All of that sounds very interesting, but it can lead to a very unpredictable experience for the users, and we haven't gone there.

We do have, for plan caching right now, a blacklist that says: I thought this was a really good candidate for plan caching, and it wasn't. Our original version of plan caching looked at the first ten executions: is the plan always the same? Oh, that's good; on the eleventh, I'm going to do what the other vendors do and just pick a plan that doesn't look at the current configuration. And if it wasn't always the same, we'd say, oh, that wasn't good, but we'd execute it anyway, and sometimes that eleventh plan could be quite poor, especially if there was some special structure in the query: if you had a LIKE condition with a non-wildcard prefix, you could use an index scan, but if you optimized without using the parameter values, you'd have to go for a sequential scan instead. So now we re-optimize: we go back to the optimizer, and we also keep some history in memory; it's not persisted at this time.
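A sketch of the hash-join adaptation described above: the optimizer picks a hash join but attaches a row-count threshold, and if at execution time the build side turns out to be tiny, the operator switches to probing an index on the right-hand side instead. The structures and the threshold are toy stand-ins.

```python
from collections import namedtuple

Row = namedtuple("Row", "key payload")

def adaptive_join(build_rows, probe_rows, probe_index, threshold):
    build = list(build_rows)                   # materialize the build side first
    if len(build) <= threshold:
        # build side smaller than expected: one index probe per build row
        # is cheaper than hashing the whole probe side
        for b in build:
            for match in probe_index.get(b.key, []):
                yield (b, match)
    else:
        # planned strategy: classic hash join
        table = {}
        for b in build:
            table.setdefault(b.key, []).append(b)
        for p in probe_rows:
            for b in table.get(p.key, []):
                yield (b, p)

build = [Row(1, "a"), Row(2, "b")]
probe = [Row(1, "x"), Row(1, "y"), Row(3, "z")]
probe_index = {}
for r in probe:
    probe_index.setdefault(r.key, []).append(r)

# optimizer-supplied threshold: with only 2 build rows, the nested-loop
# strategy is chosen at execution time
print(list(adaptive_join(build, probe, probe_index, threshold=10)))
```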
Restart the database system and everything goes back to scratch. We've talked about having persistent query information that knows about all the queries that have been executed under different conditions, but we haven't gone there yet.

Audience: The shops that don't have any database expertise are presumably okay with this. But you also mentioned the Stanford PhDs, and the CMU PhDs, I'm sure they're in there too. Did those guys say, now we know what we're doing, turn all this adaptive stuff off? Did this make them happy?

They knew a lot about what they were doing, but it didn't match well with what worked. In practice, many of the things they wanted us to do made things worse, and we were able to show that through experimentation. But turning things off: yes. Our approach is good defaults, no knobs where we can be sure you don't need to change things, and then knobs you can turn where it matters. So you can turn off plan caching. Our product does not offer a way to say "use this plan," but you can say "I want to use this index," and there are a few other hints. So there are some controls, but for somebody that says, I'm a DBA and I want control over how everything works: we don't let you have that level of control.

This is actually the end; I have some other material here, but this is the end, so I'd be happy to take any questions.

So we have scalar expressions, you know, x = y + 5. Originally, in 1992, that was a tree of expressions evaluated through recursive descent, and there were flags that indicated that we had previously evaluated this y + 5 and only needed to re-evaluate it when y changed. In version 10, this was one of the changes when the database file format changed: we take that tree and compile it down to bytecode, so we figure out how we evaluate the y + 5, which may be eagerly, before it's needed, and then we retain the value. But it's our own byte machine; it's not a Java virtual machine or LLVM. If we had done it a little bit later, LLVM would probably have been a great choice. It's also stack-based, and three-address would probably have been better, so it was really interesting to hear Richard mention that about SQLite. A lot of people start with stack machines, which we did; I blame my co-op student. It works well enough, but our byte machine is not Turing-complete and it doesn't have recursion, so a three-address machine would better meet the needs; you don't need stack frames and so on. (There's a small sketch of the idea at the end of this transcript.)

There's a developer edition that you can use for academic use, if you want to download it and try it out. We have the best price/performance on TPC-C for databases that fit within that realm, so we're good for OLTP. We don't have published results for OLAP-style workloads, but we feel we do pretty well there; not as good as the data warehouses, but good.

Audience: Can you say, roughly, percentage-wise, what percentage of customers are running on cell phones versus laptops or desktops or rack servers?

I think the curve is really steep: laptops and desktops dominate. There are lots of instances on cell phones, so numerically they might be close to the laptops, but they're not used for very long. And then there are a number of customers that have an application they want to run in different spots, and some that just want to use us on high-end machines because they like the way we do things. So we do have a range.
Obviously, the overall architecture is exactly the same across all devices, and that's something that's important. It was surprising to me that on a handheld, our customers are using all of our features: materialized views and spatial and text indexes. We thought about making a low-end version, especially back in the days of Windows CE. Our executable size was 1.5 megabytes, and that was kind of big for devices at that time, and some other vendors were really stripping stuff out to fit in there. We probably could have saved a third of our executable size, but we left it in; to be honest, it was just easier than taking it out and retesting it. And customers really found that materialized views and all those features were what they wanted. They didn't want a stripped-down version; even if it was a bit slower to use some of the features, having them there was good.
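As promised above, here is a minimal sketch of the stack-based byte-machine idea for scalar expressions like y + 5, with the result retained until the inputs change. The opcodes and the caching scheme are invented for illustration; SQL Anywhere's byte machine is its own design, and as noted in the talk, a three-address form would avoid the stack entirely.

```python
PUSH_CONST, PUSH_COL, ADD = range(3)

def compile_expr():
    """Compile the tree for  y + 5  down to bytecode, once."""
    return [(PUSH_COL, "y"), (PUSH_CONST, 5), (ADD, None)]

class ExprEvaluator:
    def __init__(self, code):
        self.code = code
        self.cached_version, self.cached = None, None

    def eval(self, row, row_version):
        if row_version == self.cached_version:   # inputs unchanged: reuse value
            return self.cached
        stack = []
        for op, arg in self.code:
            if op == PUSH_CONST:
                stack.append(arg)
            elif op == PUSH_COL:
                stack.append(row[arg])
            elif op == ADD:
                stack.append(stack.pop() + stack.pop())
        self.cached, self.cached_version = stack[-1], row_version
        return self.cached

e = ExprEvaluator(compile_expr())
print(e.eval({"y": 7}, row_version=1))   # 12, computed by the byte machine
print(e.eval({"y": 7}, row_version=1))   # 12, returned from the retained value
print(e.eval({"y": 9}, row_version=2))   # 14, recomputed because y changed
```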