Alright, hi everyone. Thank you for coming to this last talk of the conference. I think it's time to start. My name is Krzysztof; I'm a senior software engineer at Intel. For over two years now I've been working on persistent memory programming, in particular on the implementation of NVML, the non-volatile memory library. I prepared this presentation together with Tomasz; you could just listen to his talk about the C++ extensions for persistent memory. I'm going to tell you about a tool we created as a side project of our NVML development, a tool that helps you detect errors, specifically persistent memory errors. As usual for persistent memory talks, we start with the persistent memory programming model. We have seen this slide probably a couple of times today; the most important part is on the right side. As you know, we built our libraries on top of the NVM programming model, which is based on memory-mapped files. The essential part of this model is a PMEM-aware file system. In practice, this is a file system with the so-called DAX feature enabled. It can be perceived as a kind of RAM disk that runs on top of persistent memory and is aware of it. So if you map a file stored on such a file system, there is no page cache, and you get direct load/store access to the memory. This is an interesting model, because persistent memory is a new tier of storage, something between DRAM and storage. In some aspects it's more convenient to treat it as storage; in others it's more convenient to treat it as memory. It's like a sort of wave-particle duality of light. The file system part addresses the question of how to locate your data in persistent memory: you don't need to remember where it is located or what the physical address of your data is.
You can just use path names to find your data stored in persistent memory, and you can manage access permissions and so on. But once you map the file into memory, you have direct access; from that point on it behaves more or less like regular DRAM. Still, you need some software on top: when you map the file, you get one big block of persistent memory, and you need software to carve this region into smaller objects, and also to actually flush the data to persistence, because that's not as simple as it looks. We talked about it in the previous presentation, and I will talk about it on the next slides. So, yes: persistent memory programming is all about flushing. I like this quote because it's all true. I found it in some presentation, but I don't remember the author; I believe someone from Red Hat or maybe HP. Anyway, why is it true? Some people naively think that when you store data to persistent memory, it's already there, already durable. It's not. There is a long way the data needs to travel to reach the so-called persistent domain. When you issue a store instruction and you want that data to reach the DIMM, you need to flush the CPU caches and also the memory controller's write pending queues. You can do this in a couple of ways. For the CPU caches, you can use the CLFLUSH instruction; it has been available for a long time, and it flushes the given cache line. But recently new instructions were added: CLFLUSHOPT, an optimized CLFLUSH, and also CLWB, cache line write-back. CLWB flushes the given cache line but doesn't invalidate it, so the data is still in the cache if you want to read it again soon. The other option, if you want to bypass this entire path, is to use non-temporal stores.
With non-temporal stores, the data goes directly to the memory controller. There is also the WBINVD instruction, which invalidates all the caches; it's a heavy one, you probably don't want to use it, and in any case it can only be used in kernel space. Originally, when we started NVML development, the persistent domain was the smaller red box, so to make the data persistent you also had to flush the write pending queues of the memory controller. The example code looked like this: you do some stores; you flush the specific cache lines where the data resides; then you need a memory fence to make sure those instructions have really completed. You don't need the fence with CLFLUSH, because it is strongly ordered, but CLFLUSHOPT and CLWB are weakly ordered, so the fence is really required. And then you had to issue the PCOMMIT instruction to drain the memory controller buffers, followed by another memory fence. But recently Intel, together with some other companies, decided that a platform that supports non-volatile memory must also be equipped with ADR, asynchronous DRAM refresh. Because of that, the persistent domain became bigger: if there is a power failure, the write pending queues are flushed by the hardware, so you don't need to do it explicitly in software. So the old sequence with PCOMMIT is now the deprecated way to flush data to persistence, and here is the new way. PCOMMIT is gone, which is fine, because it was a heavy and rather problematic instruction. Okay. So flushing data to persistence is one of the problems you need to remember about; it's not just store and forget. The other problem is that stores are atomic only up to 8 bytes.
If you write more than 8 bytes and the operation is torn by a power failure or application crash before the data gets flushed, the result could be any of these. The first outcome is actually fine: you still see the old data, or an uninitialized buffer. The last one is also fine: you were lucky and the data got flushed anyway, for example because of cache pressure, and reached the DIMM without you explicitly flushing it. But the outcomes in between are probably not what you expected. Okay, what if we had an instruction that could atomically store 64 bytes or more? That would be better, because in some cases it would help; you could do more atomic updates with a single instruction. But if you have a chunk of data larger than 64 bytes, or whatever the limit is, you can still hit this issue. This is the example with string copy, but in general there could be multiple stores here, and you never know what happens when all these stores are executed but not yet flushed. The first problem is that the compiler can reorder these instructions, which is perfectly legal if you don't put explicit memory barriers in. The CPU can also reorder the stores, which is hard to control. So when you do multiple stores, you need to decide whether you need all of those stores to happen all-or-nothing, that is, whether you want the whole group of changes to be atomic. To solve this problem, we need some sort of transactions; this is what Tomasz was talking about in the previous talk. All right. They say a presentation should be storytelling, so maybe a short story. When we started the NVML implementation, at some point we found out that we needed a tool to test our library, to make sure it actually does all this stuff correctly.
Because the people who use our library rely on us; they expect that if they use our library, their data is safe. These are the sorts of problems that are hard to test in software, because, as I said, some of this happens in hardware, so it's hard to design a test that would simulate those conditions. So we decided to write a tool that would analyze the code, either the source code, through static analysis, or the binary code, to detect problems like missing flushes: you do the store, but you don't flush it to persistence correctly, as on the previous slide. There could also be performance issues: you do the store, you do the flush, and then you flush again. Unnecessary flushes are not free; they affect performance. And there are other problems as well. We were thinking about writing a tool from scratch, but eventually we decided to build on Valgrind. Let's start with the requirements we had for this utility. First of all, the tool had to recognize which memory is persistent and which is not, because the flushing and transaction rules apply only to persistent memory. It also had to detect stores that were not flushed: you store, you don't flush, and then you store new data to the same location; probably there is something wrong with your program. It had to detect unnecessary flushes, too, and provide some support for transactions, which I will talk about on a later slide. The last feature, probably the most interesting one, is something that allows us to simulate the things that happen inside the CPU.
Like reordering of flushes: because of CPU cache pressure, a later store could actually be flushed before an earlier one. This really may happen, so you need to care about it. We have a couple of examples of what the tool can detect. Here we have a store, another store to a different location, and then yet another store to the first location, with no flush in between; probably something is wrong here. Here we have some unnecessary flushes: a double flush, or a flush of a location that was never modified at all. And here is the interesting problem, the flush reordering one. We have three stores to different locations and three flushes. It looks like somebody made a couple of modifications and wants all of them flushed at the same time; they want these stores to be atomic. But in practice, if execution is torn here, if there is a power failure or crash before those flushes are executed, we don't know what actually ends up in memory: maybe only this store took effect, or that one, or nothing. In this example, this field is the actual data and this one is a completion flag; it's an attempt to implement a simple transaction by hand. You store your data, and then you set the flag: okay, this is done. Then you can recover if the operation is torn: after reboot, you find the flag is not set, so you discard the data. But in practice, as I said, the order of those flushes can change, so you never know whether the flag got set while the data did not. The correct order would look like this: store the data, flush the data, then store the flag and flush the flag. We are still missing fences here, but this is just a simple example.
Okay, so, as I said, we thought about maybe writing a brand new tool. But part of NVML is also a persistent memory allocator, and for that we had already done some instrumentation for the memcheck tool, which is itself built on top of the Valgrind framework. So we already had some experience with how Valgrind works: how it detects that memory was stored to, how it detects a store to memory that was already freed, and so on. We thought: Valgrind already does a lot of the work we need; it tracks all the accesses to memory, so we only need to teach it how to detect whether the order of operations is correct. Maybe a simple question: how many of you have ever used Valgrind? All of you, almost all of you; that's what I was expecting. But how many of you have actually read the code, taken a look at the Valgrind source to understand how it works? Okay, two guys, good. So Valgrind is a very smart creature. It's a binary instrumentation framework, and what's nice is that it's multi-platform. On the next slide I show a simple example of how it works internally. When you execute code, it is dynamically disassembled and converted into an intermediate representation, VEX IR, an intermediate language that is platform independent. This code is then passed to your tool, which could be memcheck, Helgrind, DRD, or pmemcheck, and the tool instruments the code by adding calls to callbacks that you can register for specific instructions. Then it is translated back to machine code and stored in a cache. This disassembly and translation happens only once, the first time a particular fragment of code is executed.
This is nice, because then you can detect, in our case, all the stores to memory and also the flush operations. We can then implement a state machine in pmemcheck to check whether those operations are performed in the correct order. That's basically it. Yet another nice feature of Valgrind is the client request mechanism, which lets you inject specific requests, or queries, into your code. This is actually quite tricky: the macro injects some magic assembly code that does nothing. It's transparent; it doesn't change the result of your program, but it is also not optimized out. It's like cheating the compiler into keeping the code, so it's actually there, and your Valgrind tool recognizes these magic sequences and knows this is a request from the application. This exists for actions that cannot be detected from the instruction stream alone. As I said, Valgrind just recognizes specific assembly instructions and converts them to the intermediate language, and in some cases that is enough to act on. But some actions cannot be inferred from any particular sequence of instructions, so you have to help the tool with these macros. One example: in pmemcheck, we tell the tool which memory is persistent and which is not. Another example, which will come up on a later slide, is informing the tool that we are starting a transaction or committing one. So, if you want to write your own tool: it's not rocket science, but it's also not trivial. There are some examples in Valgrind's repository.
You need to implement four mandatory functions. You can actually pick your own names; ours just describe what the functions do, and pmc is simply our prefix for pmemcheck. The most important one is the instrument function, which handles all the intermediate representation statements and performs the actions needed to instrument the final code. Because we handle the client macros, we of course have a handle-client-request function as well, plus some command line argument processing. All right. So what are the pros and cons of using Valgrind to build a new tool? Valgrind is a very feature-rich framework, and, as I said, multi-platform. Of course, we are Intel guys, focused on the x86 architecture, but we believe that the same programming model, the SNIA programming model, could be used on other architectures. Perhaps the instructions to flush the cache would be different, maybe not a single instruction but a sequence of instructions, but the general idea is the same. And the same tool could actually be used on other architectures, because the core of the pmemcheck tool operates on the intermediate representation, which is the same for all architectures. Valgrind is also widely used, as we have just seen: all of you have used it. So there is a good chance that a new tool based on Valgrind would be adopted by the community. Also, multi-threaded programs are much easier to analyze under Valgrind, because it serializes all the threads, which is both good and bad, but it does make analysis easier. The drawbacks: the API is not very well documented, and it takes some time to go through the code to understand how it works. But that is our problem, not the user's, right? The actual problem for users is performance.
When you run your program under the tool, it of course affects performance; execution is several times slower. And because of how we designed pmemcheck, we face this problem as well; I will talk about it later. Okay, so what is pmemcheck, actually? As I said, it's a persistent memory error detection tool focused specifically on persistent memory programming issues: the flushing-to-persistence problems and the flush reordering problems. It also provides basic support for transactions. You don't have to use NVML; if you implement your own persistent memory library, you will probably want some sort of transactions too, and pmemcheck supports that as well. We built this tool with NVML and libpmemobj in mind, but we strove to make it fairly generic, so it can be used with any software that follows the same programming model. If you use NVML, you get this for free: all the instrumentation is there, so you can build your application on top of the NVML libraries and run and test your program under pmemcheck. But if you decide to write your own software, pmemcheck will also help you detect errors in your programs. What pmemcheck is not: it is not a generic memory error detector. It's not designed to detect memory leaks, double frees, access to uninitialized memory, out-of-bounds access, and so on. For that purpose you should use memcheck, because that is what it's designed for. Actually, memory leaks are a tricky subject with persistent memory, because if you allocate and don't free before the program terminates, that's perfectly fine; this is persistent memory.
You want these objects to persist until the next run of your program, so that's perfectly okay. A leak is really the situation where you lose the reference to an object, right? We probably could detect that, but it doesn't have to be a problem; it depends on the design of your program. For instance, in the case of libpmemobj, all the objects you allocate from a libpmemobj persistent memory pool sit in one big container; they don't have to reference each other. We have functions and macros to iterate through all the objects, so even if you hold no reference to any of them when you start your program, you can still iterate through all of them. There are no orphans, no objects you cannot access or free, so there would be no leaks in such a case. But we can also imagine a design where you have a structure like a tree, where each object refers to other objects, and there should be no orphaned objects in the pool. Our library also provides a root object feature: whatever structure you have, or multiple structures, all of them should be reachable from the root object, the one object from which you can reach any other object in the pool. In such a case, an object that is not referenced from anywhere is a leak. But, as I said, it's not a must; you don't have to design your program that way. The other problem is, as we could see on previous slides, the question of what a persistent pointer actually is. In NVML we decided to implement it as a structure holding the pool identifier and an offset within the pool, but other implementers may build their allocators differently, so their persistent pointer could be something else entirely.
And pmemcheck is a generic tool. So unless we have a standard definition of a persistent pointer, what it is and how it should be interpreted, it is hard to implement this feature in the tool. We decided not to do it, at least not yet; who knows, maybe in the future. Okay. So how is pmemcheck implemented internally? The first thing we had to do was add support for the new instructions: CLFLUSHOPT, CLWB, and PCOMMIT. As I said, PCOMMIT is gone, deprecated, but we started the implementation two years ago, when it was still there, so we implemented it; I mention it only for educational purposes. Adding the support is not a very complex task; you can do a lot of copy and paste, because there are many instructions, all of which must be supported by Valgrind anyway, so adding a new one is not a big deal. But we also had to differentiate the flush instructions: the intermediate representation of CLFLUSH is just a generic flush, and, as I said, we now have both strongly and weakly ordered flush instructions, so we need to know whether a given flush requires a fence after it or not. We had to modify the core part of Valgrind to support that. The same problem existed for memory fences: all the fence instructions, load fence, store fence, memory fence, translate to the same intermediate opcode, but CLFLUSHOPT is ordered by a store fence, not by a load fence, so we had to differentiate those as well. We also added PCOMMIT support, though, as I said, it's no longer necessary. And we added support for the client request macros to tell the tool when persistent memory is allocated. And it's not only about telling which memory is persistent: sometimes we want to tell the tool that we have a big block of persistent memory,
but part of it is actually volatile, because we store some runtime data there and don't treat it as persistent memory, even though it physically sits on NVDIMMs. We have at least two examples of this situation in our libraries. As I said, in the header of the pool we store some runtime data; it's more convenient to have all the metadata in one place, but some of it is not persistent. The other example is our PMEM locks. Maybe there's no time to talk about them in detail, but these are essentially regular mutexes and read-write locks, and we want the state of these locks to reset, to be reinitialized, when you open the pool the next time, so you can be sure that all the locks are unlocked on the next open. Okay, some examples of those client request macros. Sorry, wrong button. Here are the macros to register or unregister a persistent memory region. And we also have macros to inform the tool that you flushed the data to persistence or drained the memory controller buffers. You may ask why we need these. The answer, again, is that this is a generic tool: if you have some other mechanism to flush the data or drain the memory controller buffers, on some other architecture for instance, where there is no dedicated instruction that we recognize, you can do your thing and then put in the macro to inform pmemcheck that you did the right thing, so it knows the flow is correct. And of course, if you're using NVML, you don't need to add any instrumentation to your own code. The other macros are for transactions and for the logging feature; I will talk about those on the next slides.
To track whether the flow of the data is correct, we implemented a simple state machine where each memory location, each byte of a persistent memory region, can be in one of four or five states. Actually, again because of the PCOMMIT deprecation, one of the states is gone; it's no longer necessary. Now, when you do the flush and then the fence, the location goes directly to the clean state. And, as I said, we track the state per byte. Someone might ask: why not per cache line? After all, when you flush memory, it's always a full cache line. But consider a couple of stores, say two stores to different bytes of the same cache line, followed by one CLFLUSH instruction. With cache-line granularity, the tool would conclude everything is fine: the cache line was dirty, then it was flushed. But maybe the two fields ended up in the same cache line only by accident; with a different alignment, those two stores would touch two cache lines, and then you would need two flushes. In that case you are actually missing one flush, and with per-byte tracking pmemcheck will detect it. That is why we decided to track the memory state per byte. Of course, if there is some big region of clean memory, or some contiguous region of dirty data, we don't keep a separate state for each and every byte; we keep a tree of regions.
But if you store to every second byte, so you have clean byte, dirty byte, clean byte, dirty byte, then of course the tree grows rapidly, and this is one of the performance issues we observe. We probably need to implement a better data structure for that. Okay, in practice, here is an example of the available options and their default values; you can of course use --help to display all of them. The first, basic issue that can be detected with pmemcheck is, of course, whether the sequence of operations was correct: whether you did the store and whether you flushed the data to persistence. The second issue is when you store data, don't flush it, and then store to the same location again; that's probably not correct. But if you store exactly the same value to the same location, it could be correct, so we provide an option to make the check less restrictive. This is there on purpose, because we observed that memcpy or memset does exactly this: if the data is not well aligned, at the beginning or the end of the buffer, some bytes can be written twice. That's due to the internal implementation of those functions, but of course it stores exactly the same value to the same location. So we allow it: if you enable this option, pmemcheck treats it as a legal operation. It is also able to detect double flushes, unnecessary flushes, and so on. And it provides some support for transactions, which I will talk about in detail on the next slides.
Yeah, so here is an example of the output of pmemcheck if you run it with some basic arguments and you have a store to this variable with no flush. When the program terminates, the output looks like this: you have a store that was probably not made persistent. Another example: a double store to the same location without a flush in between; again, an overwritten store. Here is an example of unnecessary flushes. This code is for x86: we use the CLFLUSHOPT builtin and flush the same cache line twice, which gets detected; and this other flush is also unnecessary, because there was no store to that location. Here we have the report for that as well. Transactions. Usually, if you want to make sure your data is stored atomically, you do multiple modifications. Adding a new element to a list, for example, means allocating memory, filling it with data, and putting it into the list; you need to modify some pointers, the links to the elements. You would like all or nothing to be executed, and if the operation is torn, you want to roll back to the old state. For that, as was presented in the previous talks, libpmemobj uses undo logs: we take a snapshot of all the objects that are going to be modified within the transaction, and if for some reason the transaction fails, you can restore the data from the snapshot. If you write your program that way, using transactions, then you should explicitly inform the tool that you are starting a transaction. Any changes to persistent memory made outside of that are assumed to be out-of-transaction changes, and that's usually a problem: it indicates an error in your software.
Also, even when you have started the transaction, you need to inform the tool that you are going to modify a particular object, a particular region of memory. This also indicates: I'm taking a snapshot of this block of memory, and now I'm going to modify it. With that, pmemcheck knows the modification is legal; you know what you're doing, right? A simple example: we have object A. We started the transaction, and A was added to it, so pmemcheck knows that object A, which to the tool is just a block of memory, a pointer and a length, is going to be modified, and it assumes you have a snapshot of it. We also have another object that gets modified but was not added to the transaction explicitly. That means the snapshot of this object was probably not saved anywhere, so if something goes wrong and the transaction is aborted, the old contents of the first object can be restored, but not of this one. This is a bug, and pmemcheck will report the issue. Here we have examples of that: we start the transaction, we add this piece of memory, saying we are going to modify it, and we do; but we also modify some other location that was not explicitly added to the transaction, and pmemcheck reports that stores were made outside of the transaction. Another problem is when you try to modify the same object from two different transactions. The issue is that both the first and the second transaction take a snapshot of the object; that's how undo logs work. That's not bad in itself, because you may want to modify different parts of the same object. But what happens when one of those transactions commits, say in one thread, and in the second thread the other one aborts? In the aborting thread, the old contents of the object are restored.
so it actually overwrites the changes made by the first one. Again, this is probably not what you expect. And this is an example of how it would look in your program. If you do all the instrumentation correctly, this would be detected by pmemcheck. And of course, the last one is leftover transactions: you started, you did some modifications, but you never commit. Probably a bug. Yeah, we also support transaction nesting. This is because we do that in libpmemobj, so the tool should be aware of it. In practice, in libpmemobj, transactions are flattened: if you start a nested transaction, all the changes apply to the outermost transaction, so actually you have just one. Yeah. And maybe one more piece of information about those transactions. In libpmemobj, we decided to support only one transaction per thread. So when we did the instrumentation, each transaction in pmemcheck has its own ID, and in the case of libpmemobj, we just pass the thread ID as the transaction ID. But we can imagine that somebody would implement their own library, their own implementation of transactions, and they would like to have their own IDs. Our macros support that too, so you can specify your own transaction IDs to pmemcheck. And the last nice feature we have, which is actually in a sort of prototype phase, is logging. It's implemented, but not yet publicly available; I mean the full feature, but some support in pmemcheck is already available. So, as I said, problems with flush ordering are hard to detect at runtime. When you stop your program and examine the memory, you would probably see the correct values. Also, if you terminate, like kill, your program, the system would do some cleanup and all the stores would be flushed anyway. So you can't observe problems like missing flushes, reordered flushes, and stuff like that.
So to simulate that, we implemented a feature that allows you to log all the stores and all the flush operations into some sort of text file or binary file. And then you can use another tool to do some offline analysis: you can take this log and simulate the execution. Every time you have a number of stores and a fence instruction is detected, the execution stops, and the tool tries to simulate all the combinations that could happen in the case of a real failure. In this case, we have three stores. And in practice, if the program crashes here, or there is a power failure at this moment, we could observe any of these effects. It could be that all the stores were successfully flushed, or none, which are actually the happy scenarios. But the others are probably not. So we can simulate that, do all those permutations, and for each of them you can specify a program or function that should be executed to check the consistency of the data. Because pmemcheck knows nothing about your data structures, what is actually stored here or there, or what the meaning of those variables and memory locations is, it is not able to analyze whether the state is correct or not. Perhaps it is; perhaps it's not. So we need some external tool: when you write your program, you also need to provide the tool that would do this consistency checking. And we have some Python scripts that would do this post-processing. All right. So here is how you can take the log from the execution of your program. This is the output, and later you can run it through these scripts to check whether there are any issues in your code. All right. So that's basically it. Here are some links to our repos. Originally, we had just one repo, Valgrind.
Currently, we have split this into two, so everything is under github.com/pmem. This reflects how the original source code of Valgrind is actually organized. Here we have some links to the tutorials on our blog at pmem.io, so you can find some examples of how to use it. Perhaps some of those are the same as in this presentation. And of course, this is the original homepage of the Valgrind project. Yes. So, yeah, I'm running out of time, so maybe the last slide. The last slide is about future work. As I said, the post-processing is in some prototype phase, so we need to complete this work and make it public. And actually, it's currently a separate repo, but I believe it should be part of the Valgrind repo together with pmemcheck. Also, pmemcheck currently still expects that you do the memory controller drains, because it was written when PCOMMIT was still required. Currently, in NVML, we don't have PCOMMIT; it's a no-op, actually, but we still call the client request macro to inform the tool that we did the memory controller buffer flushing. So it works, but we need to make it optional. By default, the tool should not expect any actions other than cache flushing. Also, the ordering: I mentioned that CLFLUSHOPT and CLWB require a memory fence, but there are also some other instructions that would order the execution of CLFLUSHOPT and CLWB, like XCHG and LOCK-prefixed instructions, and we don't support those yet. We have to add them. Performance tuning: as I said, in some scenarios the performance is terrible, so we need to fix it. Other architectures: if anybody would need it, if anybody would, let's say, port NVML to other architectures. Just yesterday, we had a question on our mailing list that somebody is trying to port it to PowerPC, so maybe pmemcheck would also support that at some point. And it would be nice to eventually upstream all of this work to the official Valgrind repo.
So, yeah. That's it. Thank you. Any questions? Okay. I'm sorry, could you use the mic? The other day, there was a talk about Clang and something called libsanitizer, where you link with a different library and then it's much faster, like an alternative to Valgrind. I don't know if you heard about that, or if that would be another approach that might make sense. So, you mean AddressSanitizer? Yeah. We are using it for testing, but I'm not sure whether, in the case of persistent memory, it could help or not. But maybe this is one of the options. Yeah, it was just another approach than using Valgrind. Okay. Yeah. Okay. Thank you. Any other questions? Nope. Okay. So, thank you. And here are some links to some more information you might like to check out. And also, I would like to encourage you, if you are staying here for the next day: tomorrow we have an NVML programming workshop. So, if you like, please join us tomorrow. Thank you very much.