So, thanks for having me. MemSQL, for those of you who aren't already familiar with it, is an in-memory distributed database that's built to be really, really fast. A little bit of background on me: I was at CMU as an undergrad, and I've been at MemSQL for about three years now. So let's get started. The first thing we're going to do is set the stage for why an in-memory database even matters. The best framing I know of comes from a new Gartner category called hybrid transactional/analytical processing, which I'll come back to in a bit. In-memory databases have been around for a while, and part of what makes them matter now is clarifying these ideas in terms that standard enterprise companies can understand. So it's a good way to set the stage for why we built an in-memory system. And then the meat of the presentation is the key innovations inside MemSQL that enable us to deliver a very fast, very scalable distributed database. If you have questions along the way, feel free to ask and I'll answer as we go. First, why memory? Broadly speaking, there are two great categories of database workloads: OLTP, which is online transaction processing, and OLAP, which is online analytical processing. We've found that in both of these categories, people are really hungry for speed. People want to store more data at a faster rate, and in-memory enables them to do that. People want to load data faster, and they want to run queries faster. Loading data faster is about data latency, which basically means the time from when the data is born to when it's available for query. Running queries faster is about query latency: the time it takes to receive the results of a query. And it's really the combination of these two that's interesting. If you only care about one of them, these problems are pretty well solved. If you just want to react to individual bits of data as they stream past, you can use a stream processing system. And if you want fast queries but can tolerate taking a long time to load the data, there are plenty of old-school analytical databases that will happily do that for you. But a lot of real workloads are not what you find in a textbook. So why do people want both at once? One thing is simply processing data in real time.
Real time here just means shrinking the gap between when data is born and when it's actually used, and that desire is pretty much standard across enterprise companies: they want to see the business as it happens. Contrast that with the standard practice today, which is that you collect data all day without being able to use it, then do four to six hours of batch preparation overnight, and what you query the next day is a stale, materialized copy. For data that's constantly arriving there's really not a lot of flexibility in that model, and the time it takes to react to an event tends to be far too long. That's a whole stretch of time in which the data can't inform the business at all. From that perspective, in-memory technology makes it much easier to provide value to an organization, because the data is actually accessible the moment it arrives. And finally, dashboards are really slow. If you look at, say, the financial analysts at a large company, they need to be able to query a lot of data, and doing that in a sub-second manner while hundreds of users hit the system concurrently is a very hard problem. All this energy in the marketplace around these kinds of workloads has led Gartner to coin the term HTAP, which is hybrid transactional/analytical processing. So it's a new category — new as of this year, actually. And what's really cool is seeing a hard definition drawn around this space, and seeing our technology influence how the industry looks at the problem as a whole. So, again, this is sort of restating the last slide, but HTAP is all about analytics over concurrently changing data. And there are two really interesting requirements that an HTAP system needs to be able to provide. One is predictable performance and scalability. Because you're playing with both data latency and query latency, and you're trying to power hundreds of concurrent users on the back end, it's really all about performance: being able to predictably scale and perform at the same time. That's why MemSQL is distributed, for scalability, and in-memory, for predictable performance — memory has a very, very predictable cost model for how it's going to behave. The second requirement is that the data can't live somewhere else: you don't have time to move data between multiple systems if you want to process it as it arrives. So essentially — and I'll talk about this a lot more at the end — we're trying to take things that traditionally take twenty-four hours, because you're waiting on the overnight batch, and make them ad hoc, so you can do them whenever you want. Changing data means you query the data as it exists right now, at any moment; you don't have to wait for the previous day's batch, and what you're seeing is very, very fresh.
So it's all up to the user how fresh they actually want the data to be. Let me also define ETL: extract, transform, load. It's the generic process of getting data from multiple source systems into another system, and there's a whole industry around it. Traditionally you have a separate operational system and a separate analytical system, the data warehouse, and you use ETL essentially to get data from one to the other. With ETL you get high throughput, but also high latency, and most importantly, no flexibility: it runs at fixed times of day. If, say, Amazon is having an unusually busy day, you're stuck waiting on the pipeline — and predictability, like what memory gives you in the transactional world, is very, very important. In-memory, on the other side, enables you to query data as it is written. Another point that's very important to address up front, because it's the first thing people bring up when we talk about analytics in memory, is: obviously, I don't want all the data that I'm analyzing to sit in memory. You want the high-throughput, low-latency path for the fresh data, and something cheaper as data ages. The solution that we're proposing is to have a column store on disk as well. That's something we can talk about later, but essentially the columnar format enables you to move data between memory and disk very efficiently. So, here's the place where I show you MemSQL. It's in-memory, and it's distributed. Here's the basic picture of what it looks like. There are two kinds of nodes, called aggregators and leaves. Aggregators hold the metadata about what the cluster looks like, and they're what you connect to. Under the hood, they break a query into pieces, send those pieces to the leaves, collect the intermediate results from the leaves, and combine them into the final result. Each leaf owns slices of the data and processes queries on those slices — and I won't say too much more about the platform yet, because there's a whole section on the distributed system later. MemSQL also supports a rich set of online operations: things like taking point-in-time snapshots, rolling out schema changes, even adding and removing nodes. We enable you to do all of these things while you're both reading and writing data. That's something that makes building the system a lot harder, but in the context of wanting a lot of flexibility over constantly changing data, it turns out to be very important for a system that's running in production. And finally, we made a big investment in a best-of-class ANSI SQL interface. There are a lot of databases now with their own bespoke interfaces, which tools and people don't speak. SQL is hard to build and it's quite complex, but it's a standard, and it opens up a lot of doors to have a database speak it right out of the box. So in this talk, I'll go over five big topics — the big tricks in the engine — and then we'll open it up for questions.
So the first one, which is probably what we're best known for if you've read about the system, is lock-free skip lists as our index data structure. Another really cool part of MemSQL is that we do code generation to execute queries. Then, durability and replication: the design for an in-memory system is pretty different from a traditional disk-based system, and some of the tricks around that let us build really efficient durability and replication. Then the distributed system, which is one of the really interesting topics: how we think about distributing data, and separately, how we think about distributing queries. And distributed query execution is a lot of fun, so we'll also jump into the kind of data-exploration, long-running query workloads that people try to run. So let's start with skip list indexes. There's a really big blog post on our website, and this section is essentially a paraphrase of the information in that post, so if you're interested in a lot of the quick details I encourage you to check it out. But here are the reasons we chose skip list indexes. One is that skip lists are very memory-optimized. Random access — chasing a pointer — is very cheap in memory, and a skip list is built almost entirely out of pointer chasing. B-trees, by contrast, were designed around disk access: every node in a B-tree is shaped by the economics of a disk page. A disk-based structure also can't just dereference memory, so you have to build an extra layer of indirection in between — the buffer pool — and the buffer pool adds complexity and obviously costs performance as well. Skip lists are very simple. A new skip list implementation is on the order of 1,500 lines of code; a production B-tree in a traditional database can be on the order of 100,000 lines of code with everything that's involved. Building lock-free B-trees is still a very active area of research — there's the paper on Microsoft's Bw-tree, for example — whereas you don't need a research project to build a lock-free skip list. One really important consequence of investing in a fairly simple data structure is time to market: we were able to go from zero to a working database very quickly, and the skip list index is a major reason we could do that. And skip lists are, as I said, lock-free. Lock-free means that the system as a whole always makes progress, no matter how individual threads get paused or scheduled. For a database that's mixing reads and writes over constantly changing data, that's a great property, because it enables a lot of concurrency. Skip lists keep lock-freedom simple, in a way that would take quite a few heroics on a lot of other data structures. And as a result, it's really, really fast.
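To make the shape of the data structure concrete, here's a minimal, single-threaded sketch. This is my illustration, not MemSQL's actual code — the real implementation is lock-free, updating the tower pointers with atomic compare-and-swap, and inlines the tower into the row allocation:

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// One skip-list node: a key plus a "tower" of forward pointers.
// tower[i] points at the next node that is at least i+1 levels tall.
struct Node {
    int64_t key;
    std::vector<Node*> tower;
    Node(int64_t k, int height) : key(k), tower(height, nullptr) {}
};

// Coin-flip tower height: P(height >= h) = 2^-(h-1), so the expected
// height is 1 + 1/2 + 1/4 + ... = 2 pointers per row. That's where the
// "about 16 bytes of overhead per row" figure below comes from.
int random_height(int max_height) {
    int h = 1;
    while (h < max_height && (std::rand() & 1)) ++h;
    return h;
}

// Search: start at the top level, move right while the next key is still
// smaller, then drop down a level. Expected O(log n) comparisons.
Node* lower_bound(Node* head, int max_height, int64_t key) {
    Node* cur = head;  // head is a sentinel with a full-height tower
    for (int level = max_height - 1; level >= 0; --level) {
        while (cur->tower[level] && cur->tower[level]->key < key)
            cur = cur->tower[level];
    }
    return cur->tower[0];  // first node with node->key >= key, or null
}
```

The coin-flip height is also what makes the memory-overhead argument coming up work out: on average a row carries about two forward pointers.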
And beyond that, skip lists turn out to be surprisingly flexible. One really interesting innovation is what we call the skip list histogram. We use the higher-level towers of the skip list as an estimate of the cardinality of different filters. Normally a database maintains histograms by sampling over a decent amount of data; with a skip list, we can compute the same kind of estimate really cheaply, directly from the structure of the towers, because the expected number of rows between two towers at a given level is fixed. So it's basically a free, always-up-to-date histogram. And because the implementation is so small and simple, it's a lot more robust than the alternatives. Another really important thing to address with skip lists is the common objections, because people reasonably ask: why would you pick a skip list index? As far as we know, we're the first commercial database system to actually offer skip list indexes, so we do get a fair amount of skepticism. The first thing people ask about is memory overhead, since skip lists have towers that you need to maintain. The answer is that the average tower height is two, so the average overhead is about 16 bytes: 8 bytes per pointer, roughly two pointers per row. We also do a neat trick: because we already allocate every table row, we inline the tower into the row allocation itself. In a textbook skip list you'd allocate the towers separately and chase an extra pointer to reach them; we avoid that. The next objection is cache efficiency. A B-tree keeps data cache-local; a traditional skip list chases a pointer for every row. This is a very reasonable concern. It turns out that for OLTP-style point operations, cache locality matters much less than it does for OLAP; it's in grouping and joining over large scans that you really feel it, and there you need other execution techniques anyway to get good performance. Scan performance is also very, very important, and it doesn't disappear just because you're in memory: as we move along the skip list we batch work, in particular using batched reads for non-overlapping read and write ranges, to improve memory-level parallelism. One more objection: reverse scans. It turns out that to do a reverse-scan operation, you'd normally need backward pointers in your skip list, and making a doubly-linked skip list lock-free is very hard — I don't know of any papers right now on how to do it — and it also increases your memory overhead. So one of our engineers figured out a trick: you keep a stack of pointers from the descent, and every time you need to go backwards, you use them to find the predecessor. The scan still works, but it's about twice as slow as scanning forward — which is still plenty fast. The most common use for a reverse scan is a query like "give me the top ten items": an ORDER BY ... DESC with a LIMIT. Let me show how the backward walk works on the slide.
So, if you were scanning backwards, you would keep a pointer here, here, and here as you descend. Then to visit this node, you say: okay, I don't have a pointer directly to it, but I have a pointer back here, so I go right from there; and to step back again, you repeat that — like this, like this, and like this. The expected running time per step is still logarithmic. And, you know, originally we didn't have this feature at all. It seemed cool to tell people that they could iterate both forwards and backwards, so people would write queries that scanned backwards, and MemSQL would immediately run into this problem. So we had to build it properly. I should also say that the skip list solution here turned out to be much simpler than the equivalent for other structures, and hence much more robust. So that's the basic story of skip lists. Again, they're very simple, they're very fast, and if you want to get a database to market, that matters: you need to move fast and you need to build something that works. Code generation is the next technique, and it's another one that really matters. In a disk-based database, disk I/O is the dominant factor in performance, so CPU efficiency matters a lot less. In a system where everything is in memory, CPU really, really starts to matter. And so we invested in code generation from day one, basically. The difference is comparable to the difference between an interpreted language and a compiled one: most databases interpret a plan tree; we compile it. There are a few areas where code generation really, really helps. The first one is inlining. In MemSQL — I'll talk about this more in a bit — we invested very heavily in C++ templates, so things like hash functions and comparison predicates are pushed into the definition of the index itself. Stepping to the next row in a data structure isn't even a function call, and we can push a predicate all the way down into any of these operations as we iterate over the index. The other really common reason to do code generation is expression evaluation — getting rid of the interpreter for the expressions in your filters and projections — and that's the same reason you see compilation show up in systems built for, say, machine learning workloads. One really interesting lesson about code generation is that you need a very, very good plan cache. A plan cache is something that takes a query string — really, a parameterized query string — to an already compiled plan. The reason this matters is that compiling a query is very, very expensive: even a simple query can take one to two seconds to compile. So a cold cache is really noticeable. We've actually invested a lot of engineering effort into parameterizing queries in such a way that many, many query strings map to the same compiled plan. This pays off enormously on repetitive workloads, where the same query shapes run over and over. But if your workload is purely ad hoc and every query is new to the system, the upfront price is pretty high.
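Here's a toy sketch of the plan-cache idea — the names and structure are mine, not MemSQL's, but it shows the key point: the cache is keyed by the parameterized text, so many literal variants normalize to one entry.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// A compiled plan; in the real system this would be a handle to generated,
// compiled machine code. Here it's just a callable taking the extracted
// parameter values.
struct CompiledPlan {
    std::function<void(const std::vector<std::string>& params)> run;
};

// Keyed by parameterized SQL, e.g. "SELECT * FROM t WHERE a = ?",
// so that "... WHERE a = 42" and "... WHERE a = 7" share one entry and
// only the first execution ever pays the compile cost.
class PlanCache {
public:
    std::shared_ptr<CompiledPlan> lookup(const std::string& param_query) {
        auto it = cache_.find(param_query);
        return it == cache_.end() ? nullptr : it->second;
    }
    void insert(const std::string& param_query,
                std::shared_ptr<CompiledPlan> plan) {
        cache_[param_query] = std::move(plan);
    }
private:
    std::unordered_map<std::string, std::shared_ptr<CompiledPlan>> cache_;
};
```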
So in MemSQL, the generated code is split into two big categories. One is the common runtime: a bunch of headers which define what the engine is — the allocators, the index implementations, all the things we need so the generated code can inline against them. The other is the code generated per table and per query. Templates let us do things like arbitrary-precision decimal types that are fully defined at compile time, with no dynamic overhead with respect to their size. It does mean we've had to invest in some of C++'s most complex corners, and that has its pros and cons. The pro is that it enables us to build a lot of functionality very quickly, and the code that comes out of the compiler is really, really fast. The downside is compile time: compilation in MemSQL is really, really slow. Concretely, for each table we generate a file that includes the table definition and a bunch of functions covering every operation you can do on the table, and for each query we generate a .cpp file. If you go and download MemSQL, you can actually look at what those generated files look like. One trick, if you ever build something that compiles C++ at runtime: you should definitely look at precompiled headers. With precompiled headers, GCC essentially parses the big common headers once and saves the result. It's a trade-off between compile time and disk space, and it's fairly limited — it gives you something like a two-to-three-x boost in compile time. To give you an idea of the cost: here's a pretty simple query I ran on my laptop. The first time I ran it, it took 800 milliseconds, almost all of it compilation. The second time I ran it, it hit the plan cache and came back immediately. And then I ran it for 'abc', I ran it for 'def', I ran it for 100 million — different literal values — and because of parameterization they all map to the same compiled plan. So, durability. The first question people always ask about an in-memory database is: do you even write to disk at all? The answer is yes, of course — it wouldn't be much of a database otherwise. But the most important observation is that we never have to read from the disk during normal operation. The disk is write-only, and we never pay the really important cost of doing random I/O: we design everything around sequential writes. Building around those constraints, a bunch of design decisions fall out. The first is that the primary thing we write to disk is the transaction log, and unlike a traditional database, the transaction log contains only the primary data — the row data — not the indexes. The reason is that as we replay the transaction log, we can reconstruct all the in-memory state of the indexes from the rows. Logging index maintenance would be very inefficient anyway, because a single transaction can touch a lot of index state. Then, the pretty standard thing: we periodically compress the transaction log into a snapshot, which is a compact image of the full database state at a point in time, so the log doesn't grow forever and recovery doesn't have to replay history from the beginning. And of course, snapshotting happens in parallel with ongoing writes.
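Here's a sketch of that write path under the assumptions just described — record layout and names are hypothetical, but it shows the two properties that matter: the log carries row data only, and both the log and the snapshot are strictly sequential, append-style I/O.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// A log record carries only the row payload -- no index pages.
// Indexes are rebuilt in memory when the log is replayed.
struct LogRecord {
    uint64_t txn_id;
    std::string row_payload;
};

class LogWriter {
public:
    explicit LogWriter(const char* path) : f_(std::fopen(path, "ab")) {}
    ~LogWriter() { if (f_) std::fclose(f_); }

    // Append-only, sequential write. With async durability, the commit is
    // acknowledged from an in-memory buffer and a background thread drains
    // that buffer into this file.
    void append(const LogRecord& r) {
        std::fwrite(&r.txn_id, sizeof r.txn_id, 1, f_);
        uint32_t n = static_cast<uint32_t>(r.row_payload.size());
        std::fwrite(&n, sizeof n, 1, f_);
        std::fwrite(r.row_payload.data(), 1, n, f_);
    }

    // Periodic compaction: one sequential dump of the live rows, written
    // while new log appends continue in parallel.
    static void snapshot(const std::vector<std::string>& live_rows,
                         const char* path) {
        std::FILE* snap = std::fopen(path, "wb");
        for (const std::string& row : live_rows) {
            uint32_t n = static_cast<uint32_t>(row.size());
            std::fwrite(&n, sizeof n, 1, snap);
            std::fwrite(row.data(), 1, n, snap);
        }
        std::fclose(snap);
    }

private:
    std::FILE* f_;
};
```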
Recovery, then, is essentially replaying the most recent snapshot and then the remaining transaction logs to rebuild the full in-memory state. This is a place where in-memory databases really are a different world. In a disk-based database, the data you write to disk is the data, and you can read it back in place; in our world, we have to rebuild everything in memory, while still providing the guarantee that everything you committed is safe. So we've invested a lot in making recovery essentially a linear read. The transaction log is a purely sequential access pattern; the snapshot is a purely sequential pattern for both reads and writes. And the benefit is that the disk can actually keep up with memory for most workloads, so you can put a lot of transactions through the system. On top of that, we support both synchronous and asynchronous durability. With asynchronous durability, there's a transaction buffer in memory: a transaction commits once it's written to that buffer, and a background thread continuously flushes transactions to disk. Synchronous durability comes up in every single sales conversation we have — honestly, the main reason we have it is to check off a checkbox, because a hundred percent of prospects ask whether you have synchronous durability, including people whose data genuinely demands it. But as a result of using synchronous durability, you limit the ingest rate of the system: every commit waits on the disk, and you lose the ability to absorb spikes. With asynchronous durability you can absorb spikes by letting the buffer take the burst, and the number of transactions you can put through the system is meaningfully higher. You're sacrificing that last instant of data on a crash in exchange for a much higher average ingest rate — that's roughly what this slide shows; I haven't looked at it in a while. And replication, as we discovered very early on, is actually the most important building block of the distributed system, so we had to figure out the right way to think about it. Our very first release did not have replication, and the next release shipped three months later with replication, because it was such a pressing request — and everything else flows from it. Replication is built directly on top of durability. Essentially, the master streams its snapshot and log files to the slave, and the slave runs exactly the same code path as continuous recovery: it recovers from the snapshot and then just keeps replaying the log as it arrives. It's continuous recovery that never finishes, and the replica stays readable. And because replication is that simple, it's also very robust: we built it for the 2012 release and it has barely changed since, because the design worked very well. One thing I should mention as an aside is that replication is also the building block for moving data around in general — creating new replicas for redundancy, moving partitions, all the cases where you need to move data after it's been created. Both synchronous and asynchronous replication are available in the cluster: synchronous within a data center for availability, asynchronous across data centers.
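To show the shape of that design — recovery and replication sharing one code path — here's a sketch with hypothetical interfaces. A replica is just a node running "recovery that never finishes" against a log streamed from the master instead of read from local disk.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Compact point-in-time image of the database.
struct Snapshot { std::vector<std::string> rows; };

// Source of log records. For local recovery, next() returns false at the
// end of the files on disk; for a replica, it would block until the master
// streams more and effectively never return false.
struct LogStream {
    std::vector<std::string> pending;
    std::size_t pos = 0;
    bool next(std::string* out) {
        if (pos == pending.size()) return false;
        *out = pending[pos++];
        return true;
    }
};

class Engine {
public:
    // Inserting a row is also where the indexes get reconstructed,
    // which is why the log never needs to carry index state.
    void apply_row(const std::string& row) { rows_.push_back(row); }
    void load_snapshot(const Snapshot& s) {
        for (const std::string& r : s.rows) apply_row(r);
    }
private:
    std::vector<std::string> rows_;
};

// One function serves both cases: restart replays files from disk;
// a replica replays the network stream, forever.
void recover(Engine& db, const Snapshot& snap, LogStream& log) {
    db.load_snapshot(snap);    // state as of the snapshot
    std::string row;
    while (log.next(&row))     // then roll forward through the log
        db.apply_row(row);
}
```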
So, as I mentioned, the cluster is split into two tiers, aggregators and leaves, and both tiers scale independently. You add leaves to add storage capacity and compute to the cluster, and you add aggregators to add query-routing bandwidth for clients. The two tiers do different sets of things. We support both intra-data-center and cross-data-center availability. Before I started working on an enterprise product, I would have dismissed these as checkbox features, but these things really do matter a lot. Intra-data-center availability essentially means the cluster of machines transparently survives a node failing: another node already holds a replica of that data and takes over, and it's only barely visible to the user. Cross-data-center replication gives you a different set of trade-offs, and you generally use both features together: intra-data-center redundancy for node failures, cross-data-center for disaster recovery. In general, our philosophy with the cluster is to stay out of the way. There are systems that will automatically rebalance and move data around your cluster on their own; our design is not to do that automatically, and that's based on operational experience, which we'll talk about in a moment. When we were designing this, we did a lot of research on the problem, and we discovered that the more automatic the system was with your cluster, the worse the user experience was, and the lower the average performance was. So let's go over the basic availability setup with aggregators and leaves. Leaves are arranged in pairs: leaf one with leaf two, leaf three with leaf four. Each pair carries the same set of partitions — half of them masters and half replicas, split across the two nodes. It's very simple. Failure detection depends on how the leaf fails. If the process dies, the connections break, so detection is essentially immediate. If it fails because it's unreachable over the network, then there's a threshold you can configure: the leaves are heartbeated, and the default timeout is 10 seconds. Depending on the requirements of the cluster, you can tune that up or down. When a leaf is declared failed, the replica partitions on its pair are promoted to masters, and the cluster keeps serving reads and writes.
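A minimal sketch of those two failure paths, with hypothetical names — immediate detection on a broken connection versus a configurable heartbeat timeout, defaulting to the 10 seconds mentioned above:

```cpp
#include <chrono>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

class FailureDetector {
public:
    explicit FailureDetector(std::chrono::seconds timeout = std::chrono::seconds(10))
        : timeout_(timeout) {}

    void on_heartbeat(int leaf) { last_seen_[leaf] = Clock::now(); }

    // A dead process drops its connections: detection is immediate.
    void on_connection_closed(int leaf) { fail(leaf); }

    // A silent network failure is only caught by the heartbeat timeout.
    void tick() {
        const auto now = Clock::now();
        for (const auto& [leaf, seen] : last_seen_)
            if (now - seen > timeout_) fail(leaf);
    }

private:
    void fail(int /*leaf*/) {
        // Promote the replica partitions on the paired leaf to masters
        // so the cluster keeps serving reads and writes.
    }
    std::chrono::seconds timeout_;
    std::unordered_map<int, Clock::time_point> last_seen_;
};
```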
Across data centers, we want the same building blocks to generalize, but the constraints are different: links between data centers have fairly unpredictable performance, and bandwidth there is expensive, so we designed cross-data-center replication around that. The secondary data center replicates asynchronously from the primary: the nodes in the secondary cluster tail the corresponding nodes in the primary cluster. Reads are allowed on the secondary; writes only go to the primary. In this scenario, no matter what the latency between the two sites is, the asynchronous replication ensures that the secondary cluster never impacts the performance of the primary cluster. One great scenario here: we have customers in New York running trading applications, and those trading applications have hard latency requirements — they need to be really fast. But they have many analysts in Texas who also want access to all the data in real time. So what you can do is put a secondary cluster in Texas, use this cross-data-center replication, and do all the number crunching in Texas on your secondary site without ever affecting the performance of the primary site. The secondary will have the same number of partitions, but it doesn't need the same number of nodes, nor the same redundancy configuration — you might have redundancy two on the primary replicating to redundancy one on the secondary. The idea is that if you're running an ad-hoc query in Texas, you don't want it to ever consume resources in New York. You can even use a different kind of machine, one that may be better suited for an OLAP workload than for an OLTP workload, and size it for the analytical queries you're running on the secondary site. The secondary site is always strictly downstream of your primary site. Within a cluster, by the way, all of the nodes are active. Not active-active in the replication sense: it's because we have master partitions spread across every node, so every node and every CPU is carrying masters. Reads and writes always go to master partitions, which means we never have to think about things like eventual consistency; the replica partitions exist purely for failover. And you can't write to the secondary site, so there is essentially no way for any action on the secondary site to affect the primary site, other than, obviously, the replication stream itself. So now for the last set of topics I want to cover in this part of the talk. The most important is the thing that actually makes distributed queries fast: the shard key. Users coming to a distributed database have to think about something they've never had to think about before, which is how the data spreads across the cluster. Every table has a special index called the shard key. Any two rows with the same value of the shard key are guaranteed to be on the same partition — and, relatedly, a unique index has to include the shard key, so uniqueness can be enforced locally. You can use this property to make group-by and join operations local. So the shard key is basically a trade-off you make about how you want to distribute data: based on what you select as your shard key, some queries are going to be slower and some queries are going to be faster.
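The co-location property is easy to see in a sketch. The hash function and names here are illustrative, not MemSQL's actual scheme; the point is only that a row's partition is a pure function of its shard-key value:

```cpp
#include <cstdint>
#include <functional>
#include <string>

// The partition a row lands on depends only on its shard-key columns.
uint64_t shard_hash(const std::string& shard_key_value) {
    return std::hash<std::string>{}(shard_key_value);
}

int partition_for(const std::string& shard_key_value, int num_partitions) {
    return static_cast<int>(shard_hash(shard_key_value) % num_partitions);
}

// Consequence: with orders and line_items both sharded on order_id, an
// order and all of its line items map to the same partition, so
//   orders JOIN line_items ON orders.order_id = line_items.order_id
// runs entirely locally on each leaf, with no network shuffle.
```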
Now, how do queries actually move between the tiers? Leaves are actually just single-node MemSQL servers — the leaf is the same product, the same in-memory engine I've been describing — and each partition is simply a database on a leaf that doesn't know anything about the rest of the cluster. The way we ship work to the leaves is SQL itself: the aggregator sends each leaf ordinary SQL statements that run against the leaf's local slice of the data. Shipping SQL between nodes, as opposed to the standard approach of serializing an operator tree and sending it across the network, matters because of visibility: in MemSQL you can run EXPLAIN on a query at the aggregator and see exactly the SQL it's going to run on the leaves, and even go run those statements yourself. The interesting enabling fact is that any operator tree can be expressed as SQL. So we have this machinery in the engine: just as we can parse SQL into an operator tree, we can also generate SQL back out of an operator tree. That, in essence, is the building block for how the engine distributes queries: we parse the query on the aggregator, transform the tree into what we want to run on the leaves, and then we render pieces of the tree back into SQL and send those SQL queries to the leaves. It's also a great primitive for testing in general — you can round-trip SQL-to-SQL tree transformations and validate correctness — so if you ever build one of these, I highly recommend investing in it. In terms of joins, there are two techniques that we expose. One is called reference tables, and it exists particularly to serve broadcast joins, where you take the smaller side of a join and send the entire table to every node. A reference table is basically a pre-broadcast table: a small table, usually less than 100,000 rows — your classic dimension table — that we keep present in full on every node. Reference tables are very cheap and available for running any kind of non-co-located join. The other thing we expose is the shard key join. Declaring the same shard key in two tables means that if you join exactly on that key, the join is completely local to each partition, and no network traffic needs to happen at all. Shard key joins are for use cases like TPC-C's orders and line items, where you co-locate an order with its line items and still get the scaling property of spreading line items across the cluster, without having to reshuffle the table every time you run the join. Here's an example query — just a SELECT COUNT(*). It's a pretty simple query, and in fact all we do to execute it is run a COUNT(*) on every partition and sum the partial counts together on the aggregator.
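Here's what that fan-out looks like in miniature — the leaf interface is hypothetical, but the structure is the point: the aggregator rewrites the user's query into per-partition SQL, runs it everywhere, and merges with a trivial sum.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Each partition is just a database on a leaf, so the leaf-side query is
// ordinary single-node SQL against that partition.
struct Leaf {
    int64_t run_scalar(const std::string& sql) {
        (void)sql;  // stub: a real leaf would execute the SQL locally
        return 0;
    }
};

int64_t distributed_count(std::vector<Leaf>& leaves, int partitions_per_leaf,
                          const std::string& table) {
    int64_t total = 0;
    int partition = 0;
    for (Leaf& leaf : leaves)
        for (int i = 0; i < partitions_per_leaf; ++i, ++partition)
            // e.g. "SELECT COUNT(*) FROM db_7.t" against one partition
            total += leaf.run_scalar("SELECT COUNT(*) FROM db_" +
                                     std::to_string(partition) + "." + table);
    return total;  // the aggregator just sums the partial counts
}
```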
Finally, I'll mention that when we look at the really long-running analytical workloads — the queries that take, like, two hours — the simple model of reference tables and shard key joins starts to break down. If you're building a web application, reference tables and shard key joins have you completely covered; you can express pretty much any application with good performance using those two techniques. But if you're doing more complex analytics, or a little bit of data exploration, you need to be able to shuffle a table across the cluster. So what we're investing in over the next six months and going forward is a set of really cool new projects in distributed query execution. One thing we're building right now — and, fittingly, it's partly being built by some students — is distributed shuffle. We're building the ability to express repartition operations across the cluster within our existing model of sending SQL from aggregators to leaves. The primitive we've added is the ability to expose the result set of a SQL query on each partition as a table that other nodes can read. Generally, you want to shuffle group-by queries when the group-by key has very, very high cardinality. The rationale goes like this: each leaf first runs a query that repartitions its local rows by the group-by key and exposes the result; then each leaf pulls the piece of the shuffled data it's responsible for and computes the aggregate for its share of the keys. With those two queries you can express all the reshuffling you need for the standard techniques — shuffle joins and shuffle group-bys — which are the foundation of all the classic distributed query execution algorithms. The other thing that we're investing a lot in is the optimizer. As of now, the optimizer is very good at simple heuristics, and very good at optimizations local to a node: index selection and join ordering on a single node. What it has no concept of is distributed cost, so the next major project is real distributed query optimization. And the other really important piece — it's been a long effort to get where we are today — is the whole pipeline of SQL-to-SQL transformations: things like subquery decorrelation and constant folding of expressions, all those kinds of rewrites. This is how every database works — and, unfortunately for me, I'm talking to a lot of database people who already know this — you basically run a sequence of these tree transformations before and during planning. A lot of the transformations are ones you just always want to do, like decorrelating a subquery, or rewriting an IN subquery into a join. And then there are cost-based ones too, where the rewrite buys you flexibility in executing the plan, and you only want to apply it when the cost model says so. One thing we actually do today isn't a tree transformation at all — it's index selection at runtime. You can have a table with indexes on columns A and B, and a query with range predicates on both A and B, where either index could serve it. We actually generate both choices into the compiled plan, and because the plan is parameterized, at execution time we do a quick check — using those skip-list cardinality estimates — to see which index is more selective for the actual parameter values of the query.
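A tiny sketch of that runtime check, with hypothetical names; the compiled plan embeds both access paths, so choosing between them at execution time is just a cheap comparison of cardinality estimates, which the skip-list towers make nearly free:

```cpp
#include <cstdint>

struct Index {
    // Estimate the number of rows in [lo, hi]. A placeholder here; the
    // real estimate walks the top levels of the skip list, roughly
    // (towers seen at level k) * 2^k for coin-flip tower heights.
    int64_t estimate_range(int64_t lo, int64_t hi) const {
        return hi - lo;
    }
};

// Pick whichever index is more selective for this execution's parameters.
const Index& choose_index(const Index& on_a, const Index& on_b,
                          int64_t a_lo, int64_t a_hi,
                          int64_t b_lo, int64_t b_hi) {
    return on_a.estimate_range(a_lo, a_hi) <= on_b.estimate_range(b_lo, b_hi)
               ? on_a : on_b;
}
```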
That's a great question — the question was how we straddle queries between memory and disk, and how data moves between the two. We were actually talking about this right before the talk. What we do right now: the row store lives in memory. Row store data must fit in memory, and if you try to go past the capacity of memory, you simply can't write any more data to the row store. The column store, on the other hand, transparently moves data between memory and disk. And right now we expose the distinction explicitly — you declare a table to be a row store table or a column store table — and we don't transparently migrate data between the two stores. The reason we do this is that we wanted to get to market with what we could do really well, and then see what customers are actually interested in; we're still learning how people want to use disk in a system like this. But memory capacity is almost always the driving factor in customers' hardware decisions — almost always the limiting factor of the whole system. And once you tell people their data lives in memory, you run into the fact that people are simply not used to the idea of running out of memory in a database system, so they continually struggle with it. So a lot of the engineering is about confronting that: in the simplest scheme there's one allocation per row, and we've put a lot of effort into allocators throughout the engine to reduce the per-row memory overhead — things like the inlined index towers I mentioned. Some of these memory questions are genuinely fun things to think about after the talk. Stepping back: in memory, a lot of work that used to be impossible becomes possible. Being able to run analytics and ingest on the same database at the same time is a way of designing and architecting a system that you'd simply never do before. These systems are constantly pushing the bounds of what people thought was possible. And the thing about taking real problems onto a new medium is that you get to apply new solutions. Durability is a good example: we still have to write to disk and recover from it, but because we never read from the disk, the way we write to it is completely different from other databases, with completely different trade-offs. So we kind of have an open call for new data structures — and I'm still surprised how much of MemSQL we borrowed from general systems programming. People come in and apply techniques from completely different areas to the engine. And storage hardware is changing too; there are all kinds of new solutions and new trade-offs there. Databases tend to be the kind of platform where everything you believe is based on the set of trade-offs that were in front of you at the time, baked into components tuned for a specific set of workloads. So as the world changes underneath them, these systems will completely change too. And finally, we're obviously hiring, so if you're interested in talking about projects at MemSQL and the kinds of things we're working on, come find me.
I'll be around afterwards — and if you're still around in two or three years, we'll still be pushing on all of this as well. I think we're out of time, but I do want to take a couple of questions. One question was about cross-core contention: we don't do anything special there. With a storage format that's much, much more likely to have some sort of core-to-core contention you'd have to, but we tend not to have this problem, and when we do, it has only hurt us a little. In practice it's rare enough, and handled well enough by the techniques I described earlier, that we'd rather spend the engineering on the performance work I talked about. The next question was about cluster sizes. For MemSQL, the way we think about scale is pretty different from the web giants: the largest clusters that we have are on the order of a hundred nodes, whereas Google is dealing with hundreds of thousands of nodes. The average cluster size for MemSQL is more like ten nodes. Then there's the question of how much lives in each cluster, and density per node is something we invest in heavily. And the last question was how these features compare to other in-memory databases. Essentially, we've invested very significantly in performance, in usability, in durability, and in speaking standard SQL with a real optimizer. Several of the other in-memory engines are designed primarily as pure OLTP machines: they don't really let you run ad-hoc analytical queries. And then there's scalability — when it comes to growing past ten nodes, it gets very difficult for a lot of them. So those are the comparisons we usually end up making, and they're fairly straightforward for us to come up against. Thanks.