Okay, let's get started. Good afternoon, everyone. I'm Nisha Talagala from Fusion-io, and Rick invited me here today to talk about some of the things we've been doing with new interfaces and non-volatile memory. Quick show of hands: how many people here were at Rick's talk this morning? Excellent — so I'll try to pick up where he left off. One of the things he talked about was the arrival, or pending arrival, or perpetually pending arrival, of the new memories, and some of the crawl-walk-run kinds of approaches we can take to figuring out how to deal with them. What I'm going to share with you today is some of the things we've done at Fusion-io — which, depending on what you're talking about, fall into either the crawl, the walk, or the run category — and some of the software we've written as part of that process, which we are planning to open source or which is in various stages of being open sourced.

At a very high level, the progression we've gone down, starting with flash and then moving to some of the other new memories, is this: people have started looking at flash either as a tier in a hierarchy between memory and disk drives, or as a straightforward replacement for a disk drive. There are quite a few big data centers now that are effectively going all-flash; flash has become cheap enough that some people are pulling disk drives entirely out of their data centers. The place where these APIs and new interfaces come into the mix is when we talk about using flash, and these upcoming NVMs, in one form or another as forms of memory, and as non-block forms of IO.

The devices that we make, and that most SSD vendors make today, present themselves to the world as simple block devices, regardless of how they're built internally — and there's a variety of ways these devices are built internally. So when we decided to create new APIs and new interfaces, along with a number of other companies, we broke them down into IO-oriented models and memory-oriented models. The reason we did that is that, in talking to our customers and the people who write software that runs on our devices, it was fairly obvious that every programmer either loves to program the memory way or loves to program the IO way, and if they've been doing one of those for a really long time, they're going to continue doing it; they're not going to switch over to the other one. So it's imperative to support both the people who think about the world as IO — as in, "I send a request and some time later I get the response back" — whose programs are structured from the ground up to deal with that, and the people who think, "I dereference a pointer and something happens," whose programs are structured from the ground up to look like that. With that in mind, we created effectively two kinds of programming models, or APIs, or interfaces. Some fall into the IO category, where no matter what they do, they look like an IO model: a request and a response. And a set fall into the memory category, where regardless of what they do, the pattern is: I mmap something, or I malloc something, I use pointers to reference it, I manipulate memory, and certain things happen.
In terms of the way we handle media, we put both of them to work on effectively a hierarchy of NVMs: all of these, or a majority of them, work on flash, and they also work on a hierarchy of quote-unquote "new NVM" and flash.

The IO-oriented interfaces we've created fall into a few categories. Primarily, the changes we made to the IO interfaces add transactional updates on top of the existing block read and write operations. Atomic writes, for example, which we developed with HP — and Rob, who's here — are one of the biggest changes we made to enhance the IO interface model. We also have a set of slightly higher-level interfaces — more key-value, get/put-style interfaces — provided by libraries that tie those interfaces very closely to flash and NVM behavior.

In the land of memory access, we divided memory access into a spectrum. It ranges from purely volatile at one end, which we call extended memory, to purely non-volatile, byte-addressable, fine-grained updates at the other end, which we call Auto Commit Memory, with things in the middle that can be checkpointed to persistence but are volatile most of the time. This, again, is a programmer's view, which is not necessarily the same as the hardware view: you can take any of these models and implement them on the other hardware, and they will just get cheaper and slower, or more expensive and faster, depending on how you look at it.

Just to give you an idea of how we think about this — and this should be a repeat of what was in the previous slide — IO semantics means you open a file descriptor, you read, you write, you seek, you close; this is the standard stuff. What is added to this is something like a vectored transactional write, where you can write multiple blocks atomically; or you can open up a key-value interface and get and put immutable objects, do this in groups, and so forth. Memory access works pretty much the same way as before: you allocate virtual memory and do with it the same things you would always do with it, but you can do it somewhat performantly through a hierarchy of DRAM and flash. In the land of purely non-volatile, persistent memory, the notion is that you allocate persistent memory, you map it into your virtual address space, you may call something like a checkpoint if you want to create a consistent version of it, and you have the ability both to name it and to remap it back to the same or a different virtual address upon restart. What you do with it above that — the data structures you build on top of it and so forth — is another layer that can be developed almost independently.

So we've actually developed a good chunk of these things, and the way we've done it — and this speaks to some of the things Rick was mentioning — is that developing a lot of these new interfaces, they're hard.
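The two families above can be sketched side by side. This is a minimal toy model in Python, just to show the shape of the semantics; all of the names here (`atomic_write_v`, `PersistentRegion`, `checkpoint`, and so on) are illustrative stand-ins, not the actual Fusion-io library calls.

```python
class BlockDevice:
    """IO-oriented view: block read/write plus a vectored atomic write."""
    BLOCK = 512

    def __init__(self, nblocks):
        self.blocks = [bytes(self.BLOCK)] * nblocks

    def write(self, lba, data):
        assert len(data) == self.BLOCK
        self.blocks[lba] = data

    def read(self, lba):
        return self.blocks[lba]

    def atomic_write_v(self, updates):
        """All-or-nothing multi-block update (the 'vectored transactional
        write'): validate everything first, so a bad batch changes nothing."""
        for lba, data in updates:
            if len(data) != self.BLOCK or not (0 <= lba < len(self.blocks)):
                raise ValueError("batch rejected; device state unchanged")
        for lba, data in updates:      # apply only after full validation
            self.blocks[lba] = data


class PersistentRegion:
    """Memory-oriented view: a byte-addressable region you mutate through
    loads and stores, plus a named checkpoint() for a consistent version."""

    def __init__(self, size):
        self.mem = bytearray(size)     # stand-in for a mapped region
        self.checkpoints = {}

    def checkpoint(self, name):
        # A named, consistent snapshot that could be reattached after restart.
        self.checkpoints[name] = bytes(self.mem)
```

The point of the sketch is only the contrast: the IO model is a request/response batch that either lands or doesn't, while the memory model is pointer manipulation with an explicit consistency point.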
You've got this chicken-and-egg problem: you've got the interface, then you've got the app that doesn't exist; without the app you don't have the interface, and without the interface you don't have the app, and it goes on and on. So we've attacked the problem from every possible angle. We developed interfaces so that applications could use them; we modified applications so that we could see how the interfaces would be used; and simultaneously we worked through the standards bodies to put proposals for the underlying commands — the primitives — through the standards process.

One of the very first things we did in this area was atomics. The primitive here is fairly simple: the idea is that you can take a single IO and execute it atomically. This happens to be a very good fit for a flash device, because flash devices naturally do this — thanks to their write-anywhere nature — so it can be done very fast. We implemented this particular one as part of our FTL, and then we started by integrating it with the MySQL database; this is where we saw some of the initial use cases for it. Because it's implemented as part of the native capabilities of the non-volatile memory, it's almost as fast as just doing regular writes. But because you are now making this capability available to an application, the application can avoid some of the redundant writing it does with logs and so forth, and at the application level it results in about half as much writing as you would do otherwise.

This is an example of the MySQL database doing atomic writes through our flash translation layer. Prior to the integration with atomic writes, a fully ACID-compliant database was doing logging and multiple writes of the data to maintain ACID compliance. If you turned off that multiple writing, you lost your transactional correctness, but your performance got a lot better. When you use the native atomic IO capability, you get about the same performance as the non-transactional case — but now you are fully transactionally safe at the same time. So this is an example of the kind of benefit you can get by integrating at the primitives level. In this particular case, the two major forks of MySQL, MariaDB and Percona, both support the atomic primitive now: MariaDB supports it in the mainline code, and Percona has published patches for it.

So that's an example of how we've worked from the application, to the FTL, to the standards. This was done through a somewhat iterative process — the database people and us refining the work, going back and forth — and it took a little while, but it gives an example of some of the ways we've done application work that can be made available.
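A back-of-envelope for the "about half as much writing" figure: a database that must survive torn page writes without atomic support typically uses a doublewrite-style scheme, writing each dirty page twice (once to a scratch area, once in place), whereas a native atomic multi-block write lands each page exactly once. The page size and page count below are illustrative round numbers, not measurements from the talk.

```python
PAGE = 16 * 1024          # a typical database page size (illustrative)

def bytes_written(dirty_pages, atomic_writes_available):
    """Total bytes the database writes to flush `dirty_pages` pages."""
    copies = 1 if atomic_writes_available else 2   # doublewrite = 2 copies
    return copies * dirty_pages * PAGE

without_atomics = bytes_written(1000, atomic_writes_available=False)
with_atomics = bytes_written(1000, atomic_writes_available=True)
```

With the atomic primitive, the device itself guarantees the multi-block update is all-or-nothing, so the second protective copy — and half the write volume — disappears.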
Do you want to comment on the standards for this, Rob?

[Rob] Well, yeah. In T10, the SCSI standards body, we've been standardizing SCSI commands to do atomic writes and atomic reads, as well as scatter-gather: a new command called WRITE ATOMIC and a READ ATOMIC, and also commands called WRITE SCATTERED and READ GATHERED. The size limit is implementation-dependent; in our current implementation you can write up to four gigabytes — a total vector size of four gigabytes, and the elements don't have to be contiguous. The way the T10 standard is written, the device reports its maximum size.

[Nisha] Yes — we support effectively a form of capabilities query, which will tell you that atomics are supported, the maximum size, the maximum number of non-contiguous elements it will deal with, things like that, which is a somewhat softer interpretation of what's in the standards. And I think I have a link later in these slides to the active proposals these guys have in the works.

The next major area I'll go over is the plumbing exercise involved in getting some of these APIs deployed. One of the issues is that when you have an API surfaced by a device, practically speaking, no one is going to use it unless they can also have it with a file namespace, because the file namespace is so closely tied in to all the data management that people do. Even if the primary application does fast access through some sort of custom API, every primary application that's serious enough to do that has an ecosystem of secondary applications around it — backups, utilities, other random maintenance tools — that are not going to change to go through the new APIs. So as part of this crawl-walk-run progression, even if you have APIs surfaced by, say, a new kind of device, you also need the same data to be accessible through standard APIs — file read and write — for the rest of the ecosystem surrounding that app.

So one of the things we built is what we call a native file system. This is interesting in that it serves two purposes for us. It's a file system developed from the ground up to be optimized for non-volatile memory, so even if you don't use it for these APIs, it is also the fastest file system we have to date for accessing our devices, because it's almost as fast as the raw device. The second benefit is that if you want to use these APIs, you can use them through a POSIX-compliant file system that in every other respect functions as a regular file system. This is a piece of code we developed internally — its lead developer is Nick Piggin, whom I think many of you might know — and it's something we are going to open source before the end of the year. It plugs into the VFS layer the same as any other file system; the primary difference is that it really supports flash and other kinds of NVM devices.

A few notes on DirectFS. It appears like any other file system in Linux, and you can run unmodified applications on it. Some of the people using it in trials right now are using it just for that — they're not even using the primitives — but it also exposes the primitives. The fundamental difference between DirectFS and most other file systems is this: if you think about a file system, it performs a few different functions. It provides a namespace.
It typically does block management, and it has some kind of crash-recovery, or consistency, capability. DirectFS is different in that it actually only provides the namespace. It recognizes the fact that non-volatile memories do their own block allocation and their own capacity management, in ways that are very specific to that non-volatile memory: flash, for example, does wear leveling and things like that, and the new non-volatile memories do a whole other thing entirely. So what DirectFS does is use an abstracted namespace based on virtual addresses, not physical addresses. It leverages the block management and capacity management in the underlying NVM and adds only file names and directory hierarchies. It also relies on capabilities like the atomics, which let it benefit from the fact that the underlying device is a persistent, consistent, crash-safe device — and therefore it has no journal of its own. As a result, it's an extremely thin layer of code: the current DirectFS is a fully functional file system in, I believe, somewhere between 8,000 and 9,000 lines of code.

This slide gives you a somewhat cartoonish view of how DirectFS looks relative to some of the other file systems. DirectFS maintains the metadata management — it has notions of inodes and virtual extents, and it keeps names, directories, permissions, things like that — but it does not do any block allocation, it doesn't do any mapping, it doesn't do any journaling, and it doesn't do any crash recovery. It moves from consistent state to consistent state through the primitives available at the lower layer. It interacts with the underlying NVM through something like four different primitives; those four primitives have all been stated publicly, and we're in various stages of introducing them to the standards communities, with atomics being the most fundamental one.

Here's a quick example of the raw performance DirectFS gets. The way to think about it is that the measure of DirectFS for IO-oriented applications is how close it comes to raw device performance. What this chart shows is IOPS, or bandwidth in megabytes per second, for random IO at various sizes; in each case one bar is the raw device and the other is DirectFS, and we typically get within about 2 to 5 percent of raw performance, because of the very short code path and the very simple nature of what DirectFS does. The next chart shows a slightly different thing, but for the primitives: if you have a primitive like atomic writes, you can access it through the raw device or through a DirectFS file, and what this shows is that the performance of the primitive is nearly identical either way. That helps people adopt the primitive, because they get the file namespace for their regular data management tasks and it doesn't cost them a great deal in performance.

Another area I'll touch on very briefly is the notion of supporting KV kinds of APIs, typically for common NoSQL types of applications.
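One of the tricks discussed next — folding key expiry into flash-style garbage collection — can be sketched in miniature. This toy log-structured store is illustrative only (the names and structure are not Fusion-io's actual library): expired records are simply never copied forward during segment cleaning, so they cost zero rewrite bandwidth.

```python
class ExpiringLogKV:
    """Toy log-structured KV store where key expiry rides along with
    flash-style garbage collection."""

    def __init__(self):
        self.log = []             # append-only: (key, value, expires_at)
        self.index = {}           # key -> position of the latest version
        self.bytes_rewritten = 0  # proxy for GC write amplification

    def put(self, key, value, expires_at):
        self.log.append((key, value, expires_at))
        self.index[key] = len(self.log) - 1

    def get(self, key, now):
        pos = self.index.get(key)
        if pos is None:
            return None
        _, value, expires_at = self.log[pos]
        return value if now < expires_at else None

    def gc(self, now):
        """Segment cleaning: copy forward only live, unexpired records."""
        live = [(k, v, e) for i, (k, v, e) in enumerate(self.log)
                if self.index.get(k) == i and now < e]
        self.bytes_rewritten += sum(len(v) for _, v, _ in live)
        self.log = live
        self.index = {k: i for i, (k, _, _) in enumerate(self.log)}
```

Because the flash FTL must relocate data during garbage collection anyway, teaching that relocation pass about TTLs makes expiry essentially free.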
Many NoSQL applications are distributed in nature, but inside each node they support some form of a get-and-put model, and some of these models can be adapted quite well to flash by recognizing certain things. For example, many of the KV stores have a notion of key expiry: keys and their data are only valid for a certain amount of time, after which they simply expire. When you combine that notion of expiry with the underlying garbage collection of the flash, you can get a lot of effectiveness: the device can essentially auto-expire information and reduce write amplification considerably. Those are the kinds of things you can exploit when you try to understand how common key-value access patterns map onto flash devices. This is one of the things we've done with some of our KV APIs, and the way we've structured it is that it lives pretty much entirely on top of the primitives — the same primitives that drive DirectFS. It's effectively a user-level library, which we also intend to open source, that sits on top of that same set of primitives. People don't have to use it, but if they have a KV kind of application, they can put this on top of the primitives layer and use it to adapt the underlying capabilities to this kind of access model.

These are some performance numbers comparing our KV implementation with, in this case, MemcacheDB. It's a fairly straightforward comparison — MemcacheDB doesn't have an iterate operation, which is why the iterate chart doesn't have multiple lines — but in the other cases, what you're really seeing is that there are scaling limits in this particular case with MemcacheDB that show up at larger numbers of threads and prevent it from getting full usage out of the flash device. If you have something that's a little better tuned to what's going on underneath in the flash device itself, you can scale a lot better. This is specific to MemcacheDB; it's not a general problem with KV stores. We've also done some integration with Redis, for example, and we see some benefits there as well — and Redis is a very commonly used KV library. Any questions on those before I move on? Okay.

Moving on to memory access: the previous things I talked about were the IO-oriented models we've developed; now let's go to the memory-oriented models. As I mentioned earlier, we've chosen to divide our memory-oriented work into three areas. One is what we call extended memory, and extended memory is basically fast swap. The idea is that there are people who want to use large amounts of DRAM but can't afford to, either because there just aren't machines that can hold that much DRAM, or because getting there means very expensive DIMMs or four-socket machines or something that makes it impractical. So what we want is to create hierarchies of DRAM and flash that people use like virtual memory. At the other end we have what we call Auto Commit Memory, which is support for physical persistent memory of the kind we talked about this morning — things that show up as physical memory, either on PCIe or on the memory bus, and are byte-addressable and byte-level persistent. What we put in the middle, which we call checkpointed memory, is the ability to
take that same set of persistent-memory APIs but realize them through hierarchies of DRAM and flash. Because — to the point made this morning — not everyone is going to have persistent memory in their machine, if you have APIs that work for persistent memory, you also need those APIs to do something halfway intelligent when there is no persistent memory in the machine. Checkpointed memory is the ability to get the persistent-memory API, without the persistent memory, through a hierarchy of DRAM and flash.

The work we've done for extended memory is actually available on GitHub as a set of patches. They are patches to improve the performance of swap. If you think about swap, it was never intended to be actively used as a tier between DRAM and anything; swap is a last resort, a failure-avoidance mechanism. So it makes a bunch of decisions that are appropriate for hierarchies of DRAM and disk — for example, it spends a lot of time figuring out which page should be replaced when moving data between disk and DRAM — and those are not the right decisions when you're dealing with a hierarchy of DRAM and flash. All of this is public and available for people to test, and in our internal benchmarks, at least, it has shown a fairly dramatic improvement over what is there already. The key here is that if you want to tier data between DRAM and flash, the best you can do is exhaust the performance of the flash device; you can't do any better than that. In the internal tests we've done, these changes can keep up with something like three of our fastest devices ganged together, and that pushes the swap subsystem to a much better place than it currently is. From a programming-model perspective, there are no changes for extended memory: it is intended to be swap — just swap running faster — and in every other respect it's expected to behave like swap.

For the land of persistent memories, the model we've provided so far — and this is very early in its development — is that we see the memory as being part of a file system; it is part of the file system namespace. The way we've done it so far is that you have a file system — in this case it happens to be DirectFS — and that file system supports flash, persistent memory, and hierarchies of DRAM and flash as ways to represent a file. You can use read and write access to the file, but you can also use things like mmap, and extensions to mmap, as ways to access it. You can create checkpoints of these persistent memories, and those checkpoints present themselves as DirectFS files by another name; you can do anything with them that you would do with any file. When you create a checkpoint, it shows up as another DirectFS file with the name of your choosing. The idea is that it integrates with the existing storage
namespace, but is internally able to comprehend the notion of multiple different media types — some of it flash, and some of it something that isn't flash. How much of it you have is really going to depend on what's available. There's a lot of expectation that these new memories will show up at the price of DRAM. Chances are they will be more expensive than DRAM for a while, and then they will reach the price of DRAM; if they're successful, they will become cheaper than DRAM, and if they're really successful, they will become cheaper than flash. So at any point in time there's going to be some amount of it, and the approach we've taken is to not assume a certain amount. We assume there will always be a hierarchy, and depending on how much is practical, there will be different usages: there are things we can do with a very tiny amount of it, things we can do with a medium amount, and other things we can do with huge amounts. From an architectural standpoint, we haven't assumed a certain amount; we just assume it will always be part of a tier, not a thing all on its own.

A couple of weeks ago — or maybe three or four weeks ago — we announced an integration of a form of persistent memory with software. What we did here is this: we have a small amount of persistent memory that's essentially part of one of our devices, and it presents itself to the operating system as physical memory. We have a kernel module that takes ownership of this physical memory and doles it out, through an API, to anyone who wants it. Effectively, it shows up as a number of pages; those pages are accessible as physical memory, so they are granular — they support byte and cache-line updates. This kernel module owns the memory, there is an API by which other kernel modules can get access to the memory pages, and we have mechanisms by which you can recover the pages after a restart. These pages are consumed by DirectFS as another physical persistent resource it can get access to, and they are surfaced up to applications as part of a DirectFS file. On top of that, through a set of user-level libraries, we integrated this persistent memory with a MySQL database simulator called innosim and used it to accelerate the log writing of that database.

Log writing is one of the things you can do with a very small amount of persistent memory, because the access pattern is predictable. If you had a highly random, highly unpredictable workload pattern, you would need a lot of persistent memory; otherwise you'd be thrashing your cache like crazy. But when you're trying to accelerate a log, you actually don't need all that much. What you see in this chart is three lines. The blue line is this database simulator running all on flash — data on flash, logs on flash. The dark blue, or almost black, line is when we turn off the logs; by turning off the logs, you see how fast this database would run if its logs were infinitely fast. Actually,
sorry — the yellow line is when we turn off the logs, and the dark blue, or black, line is when we put the logs on persistent memory. You can see that by putting the logs on persistent memory — and database logs are typically where the performance of a commit is determined, because the write to the log is usually what commits the transaction — you get almost as much performance as if you turned the logs off entirely, and incidentally about twice the performance of logging to the flash itself. The underlying flash in all of these cases is the same, and the logs present themselves as DirectFS files: to the rest of the ecosystem around the database they are as accessible as they always were, they are effectively just files, and your backup utilities see them as the same files they've always seen.

What we see when we access, say, a DirectFS file backed by persistent memory from an application is the expected substantial latency reduction, because you're not going through the IO stack anymore; you're doing CPU store operations, with certain operations to flush CPU caches and the like, and the latency reductions are substantial. The other interesting thing is about small updates. If you're going through a file system that is forced to turn small updates into four-kilobyte writes, it ends up doing a lot of four-kilobyte writes; whereas if those updates are done at their native sizes, in cache-line multiples, it turns out that even when you destage to the flash, you end up writing a lot less to it — a very noticeable saving in the writes to the flash themselves. And then, obviously, the CPU overhead is much better: in this particular case, I think we're able to do somewhere between nine and ten million updates to this memory with one CPU core, simply because you're not going through the stack.

We already talked about some of the DirectFS integration. What we have is a combination of an extended mmap and a form of API. You can do an extended mmap that says, "mmap my file into persistent memory": the file is resident on flash, but where previously you would have mapped a flash-resident thing into DRAM, you now map it into persistent memory, so the notion of being able to destage data back and forth is still present. We also have some native, more IO-oriented APIs, simply because for applications that are doing logging, for example, an IO-oriented API is much friendlier; they already know how to do that. And, to the point made earlier, a lot of this is about doing tiering: half the battle of dealing with medium amounts of persistent memory is knowing how to tier it.
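The memory-mapped logging pattern just described can be sketched with an ordinary mmap'ed file standing in for persistent memory. This is a minimal illustration, not the actual DirectFS/ACM API: on real persistent memory the "commit" would be cache-line flush instructions rather than `msync()`, and updates would persist at byte or cache-line granularity instead of being rounded up to 4 KiB pages.

```python
import mmap
import os
import struct
import tempfile

LOG_SIZE = 4096  # one page of "persistent memory" for the log

def open_log(path):
    """Create a fixed-size log file and map it into the address space."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR)
    os.ftruncate(fd, LOG_SIZE)
    return fd, mmap.mmap(fd, LOG_SIZE)

def append(log, tail, payload):
    """Append a length-prefixed record with plain stores, then flush.
    The flush() is the commit point -- the stand-in for a cache flush."""
    rec = struct.pack("<I", len(payload)) + payload
    log[tail:tail + len(rec)] = rec
    log.flush()
    return tail + len(rec)

path = os.path.join(tempfile.mkdtemp(), "redo.log")
fd, log = open_log(path)
tail = append(log, 0, b"txn-1 commit")
tail = append(log, tail, b"txn-2 commit")
log.close()
os.close(fd)

# After a "restart", the records are still there in the backing store.
with open(path, "rb") as f:
    data = f.read()
```

The commit path is just stores plus a flush — no system call per record on real persistent memory — which is where the latency reduction over the IO stack comes from.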
Yes? The question is: do we have any comparisons between the I/O-oriented model and the memory-oriented model. We do, but I'm trying to think of whether there's something meaningful I can say. I think the crux of it is that they're just two very different patterns. So, for example, when we're dealing with the persistent memories, the memcpys and other things that show up when you do the I/O-oriented model show up as real overheads, because we're talking about a place where individual CPU cycles matter, right? We can tell when something takes several hundred nanoseconds as opposed to 200 nanoseconds, for example. On the other hand, when you're dealing with hierarchies of flash and DRAM, one of the problems with the memory-oriented model is that it is inherently synchronous: every thread can only have one I/O outstanding. Whereas the I/O-oriented model is lovely in that every thread, if it uses async I/O, can have a large number of I/Os outstanding, so the I/O-oriented model can actually do better in that case. We do have the comparisons, but there's no clear win on either side. For some of the persistent memory work that we've done, we see a notable overhead for the I/O-oriented model. Any other questions?
Okay, so, just a reference for some of the open source work that we're doing and have done. At the application level, there's the atomics integration with MariaDB. MariaDB, for those of you who don't know, is a fork of the MySQL database, and MySQL is by far the most popular open source database out there right now. The MariaDB core support for atomics is part of their mainline; the Percona branch has support for atomics as a downloadable patch. We have some other work in development at the database level and in some other areas that we are not announcing right now, but hopefully can once it's more mature.

For the primitives that we've developed, we intend to publish the interfaces. We also intend to open source directFS, which is a standalone, functional file system, but is also a good example of how the primitives are consumed and re-exported. So the primitives can be seen in code in directFS, and the various API libraries that sit in user space are all intended to be open sourced. We have some proposals in flight in the SCSI standards committee, T10, that Rob alluded to, which we've been working on with our partners like HP; I've listed a couple of them here, the ones for the atomic writes and for some of the vectored forms of the atomic writes. And we're also an active member in the SNIA NVM Programming TWG.

Yeah, so: can you use directFS on other non-volatile memory technologies, or is it currently just Fusion-io parts? Currently it's just Fusion-io, but this is what I was referring to: directFS relies on something like four different primitives, and we've got standards proposals in the works for all of them. Ideally I would also like us to develop a layer that helps directFS run on top of others, but we haven't announced such a thing yet.
Yeah, I have one question. You said that the disadvantage of the non-volatile memory, versus the I/O-oriented model, is that you don't have asynchronous I/O with the non-volatile memory? Oh, no, what I meant is: think of it as the memory access model; it's not about non-volatile memory per se. That model is inherently synchronous. But is it really? Because the CPU does things like prefetching of data, reordering and so on, so the CPU itself inherently does something equivalent to asynchronous writes. Yeah, and this is what it comes down to, and it's true not just for the CPU, but also if you are doing tiering of DRAM and flash: when the application can only ask for one thing at a time per thread, then you probably want to do some prefetch and some write-behind if you want to get more parallelism. But CPUs are already doing this with ordinary memory. Yeah, so I think it's just a different scale. So the NVRAM is slower, so the CPU would have to be much more aggressive in prefetching. Yeah. And it also depends on the I/O pattern, right? For example, with asynchronous I/O I can fetch a bunch of completely unrelated, completely arbitrary things from the I/O device. Well, if it worked like the CPU, it would look ahead into the code, try to guess what you would need, and prefetch that from memory. It's certainly plausible; in theory there shouldn't be such a limitation. I don't think we've done enough work to know how effective it would be for what we're doing, but the theory can be applied. Any other questions?
So that's pretty much everything I had. The one other point I was going to make: there was a reference this morning to the block I/O stack. I don't know if Jens is here, but I wanted to add a pointer to the work he has been doing to scale the block I/O stack, which was presented recently; there's a link. They've been doing a whole bunch of work with multiple queues and have demonstrated multi-million IOPS out of the block I/O stack, and something like 10 million with a very large number of cores; there's a bunch of detail in there. And I'm sure he'll be here shortly to tell you much more about it. That's pretty much everything I had. Any other questions? Thank you, guys.