Okay, I guess we start. Welcome to my talk about wear estimation for devices with eMMC flash memory. There you go. So, some points about myself: I joined Toradex eight years ago and pretty much spearheaded the embedded Linux work there. Nowadays our latest platform, Torizon, is built entirely on mainline technology: mainline U-Boot, mainline Linux with KMS/DRM graphics and Etnaviv for the Vivante GPU, plus OTA updates with OSTree and Docker on top. What do we cover today? I will give a brief overview of the technology. We look at the eMMC, of course, and check how flash health can be analyzed there. Then we look at some ways to do I/O tracking, and with all that we can dive into lifespan estimation. Then I will show the tool we're working on, the flash analytics tool, and finally the conclusions. So, flash is pretty much the non-volatile memory of choice nowadays for embedded systems. It helped decrease size, increase robustness because there are no moving parts, and also reduce power consumption. But why would one even want non-volatile storage at the edge? Well, you probably want to keep some redundant data on site, for example if you have intermittent connectivity problems, so that you still keep your data there. That's why it's ever increasing in size and importance for IoT. Then of course there is the question of NOR versus NAND flash. The difference is basically at the transistor level, in how the bits are actually stored: it's either a NOR or a NAND gate at the logic level. NOR is simpler in its principle of operation and has higher reliability, but it usually also has a higher pin count, and you get lower density in silicon, so you need bigger chips, which is more expensive. So nowadays NOR flash is only used in very specific applications: when you have really critical industrial-grade requirements, you might still use NOR flash.
Everyone else pretty much moved on to NAND flash nowadays. So how is NAND structured? The cell is the smallest entity, storing the data at the bit level. A page is the smallest array of such cells, and it is what's addressable for read and program operations. A write (program) operation means flipping bits from one to zero. Page sizes nowadays are in the range of kilobytes; the device we're looking at has four-kilobyte pages. The next entity is the erase block: basically an array of pages, and the smallest unit addressable for erase operations. There is no way to erase just a page; you always have to erase the whole block, which returns the state from zeros back to ones. The block size is usually in the range of megabytes, for example four megabytes in the device I will demo. Unfortunately, erase operations are slow, and they also wear out the flash over time: do too many erase operations and the device develops bad blocks. That's why we usually talk about the block erase count, an interesting metric of how many times a certain block has already been erased. Then there are single-level versus multi-level cells, which is about how many bits a cell can store, and that depends on voltage thresholds. A single-level cell (SLC) stores one bit per cell. There is also an intermediate thing called pseudo-SLC (pSLC): a multi-level cell operating in SLC mode, so in the end it also stores only one bit per cell. A regular multi-level cell (MLC) stores two bits, and then there are triple-level and quad-level cells, and so on; you get the idea. Of course there is a trade-off between density and cost versus reliability and lifespan: the more bits you can store, the less area you need, and it's also cheaper.
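The structure described above can be made concrete with a bit of arithmetic, using the page and block sizes of the demo device (a minimal sketch; the sizes are the ones quoted in the talk):

```python
# Geometry of the example device from the talk (4 KiB pages, 4 MiB blocks).
PAGE_SIZE = 4 * 1024           # smallest unit for read/program operations
BLOCK_SIZE = 4 * 1024 * 1024   # smallest unit for erase operations
CAPACITY = 4 * 1024**3         # 4 GiB device

pages_per_block = BLOCK_SIZE // PAGE_SIZE
num_blocks = CAPACITY // BLOCK_SIZE

print(pages_per_block)  # 1024 pages per erase block
print(num_blocks)       # 1024 erase blocks in the device
```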
But the drawback is that the reliability gets worse, and so does the lifespan. That's why it gets more and more interesting to actually know how long the flash will survive, which is what this talk is about. Of course you need ECC and bad-block handling. Only with error-correcting code algorithms can you even use these NAND flashes: you have to add some level of redundancy, which allows correcting, or at least detecting, certain bit-error conditions. Be aware that even healthy blocks will have random bit flips, and only with ECC can you actually use them to store data reliably. Over time, every time you erase a block again, the probability increases that you will get more and more bit flips, and at a certain point your ECC will be too weak. That's when the block is worn out and becomes bad. A device can even come from the factory with bad blocks already, the so-called factory bad blocks. The way you handle all of this is to keep some spare blocks, and as blocks turn bad one by one, you replace them with the spares you keep. Another concept is wear leveling. The problem is that if you used the same physical block all the time, for example for a certain file that needs constant updating, you would increase the wear-out of that one block and prematurely create bad blocks: those blocks would turn bad long before all the others. That's a bad idea, so you try to distribute the wear-out evenly, and the only way that is possible is to also move the data around.
One usually distinguishes dynamic versus static wear leveling. The difference is whether you only relocate newly written data and leave the static data where it is, or whether you include the static data too and redistribute it across the whole device. It's better to use all the space to distribute the wear, but that of course requires more complicated wear-leveling algorithms. Another concept is garbage collection. As I mentioned before, the erase operation is slow, so it would be a bad idea to erase only at the moment you actually want to write data and then wait for that operation to finish. What is usually done instead is to just mark the block as dirty and perform the cleanup, the garbage collection, later when the device is idle, or even in the background while other data is being written. Then one thing that is predominant in managed flash is the write amplification factor. The problem is that the amount of data actually written to the flash cells and the amount of data the host sends down are not the same, because due to, for example, differing program and erase sizes, the device often has to move data around beyond what you actually need to write. Internally it has to shuffle things around a lot, and another reason is that blocks need erasing before rewriting: you have to erase a whole block, and all the other data in that block has to be moved somewhere first. The memory-management features we covered before, wear leveling and garbage collection, all add to that. A good average write amplification factor for today's eMMC is around four, but of course it heavily depends on the usage scenario.
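The definition above can be sketched in a couple of lines (the one-gigabyte figures are made up purely for illustration):

```python
def write_amplification(host_bytes, nand_bytes):
    """WAF = bytes physically programmed into the NAND / bytes sent by the host."""
    return nand_bytes / host_bytes

# Illustration: the host sends 1 GiB, but read-modify-write of partially
# updated blocks plus garbage collection causes 4 GiB of internal programming.
waf = write_amplification(1 * 1024**3, 4 * 1024**3)
print(waf)  # 4.0 -- the "good average" for eMMC quoted above
```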
So one important thing is that the data size is chosen in an optimal fashion with respect to the page size: if you write lots of very small files and such, you will get a much worse write amplification factor. [In response to an audience question:] Yes, on average that's what happens. If you only write new data, that is not a problem, but once you have a system that overwrites data, the device has to move things around, so overall much more will have to be written internally. Exactly. All the old data has to be moved, even though only the one page you changed actually changed. Usually you only have one place with the hot data, but the rest is still in the block, so the old block is marked dirty and cleaned up again later. That factor of four is some average; I think I found it in a Toshiba application note, but like I said, it heavily depends on the usage scenario, and the idea with our tooling is that we actually analyze that. Yeah, that's a good question; it's really a question of these management algorithms the controllers use. The good thing is that because the technology has matured now, you can actually find a lot of papers about these exact algorithms online.
So I read through some of those papers, but of course it's quite abstract stuff, and I'm not sure how much of it the vendors use. I guess Micron or Toshiba actually run tests and maybe tweak all of that to get better values, but because it heavily depends on the usage scenario, it's tricky. I know that some of these algorithms even analyze the workload, keep different sections, and try to handle small files versus bigger files intelligently to get better values there, but especially in the eMMC this really is a black box. You can only hope the vendor did it in a smart way. The good thing, as we will see later, is that you can get some numbers about this low-level activity, and you can then use those for lifetime estimation. Even though you don't know the exact internal algorithm, if you know how many times a block actually gets erased, you can run your usage scenario, look at those counters, and predict what will happen. Okay. So, the eMMC. It's also called managed NAND: basically a raw NAND die with an accompanying NAND controller, so it abstracts a lot away, and like I said, it's a bit of a black box; you don't know all the details. The nice thing about it is that you can use regular block operations, so you can run a regular filesystem like ext4 and don't have to worry about any of these details. It's a JEDEC standard; the latest version is 5.1. The example eMMC I will be covering is a Micron part: four gigabytes in size, 1024 blocks, an average lifespan of 3,000 write/erase cycles, produced in a 15-nanometer process.
That's more or less standard for embedded systems. So how does the eMMC protocol work? It's basically a bus: you have command, clock, and eight data lines. Command is a serial command/response channel, data is a parallel read/write data path, and there is also CRC protection. You can do single- or multiple-block read/write operations. Then you have the register set; I'm not going into much detail here. What is interesting is the extended CSD (device-specific data) register, which contains more information, and we will see that at a certain point they also added lifespan information there. This is what is called the JEDEC standard health reporting. Basically you get a device lifetime estimation of type A and type B, both showing in increments of 10% what the device's current state is. In our case, with the Micron eMMC, type A refers to the pSLC blocks and type B to the regular MLC blocks. You also get so-called pre-EOL (pre-end-of-life) information, which knows three states: the normal state, where up to 80% of the reserved blocks are consumed; a warning state, above that 80% threshold; and an urgent state, where more than 90% are consumed. This was introduced with standard version 5.0. The slight problem with it: for a live product it's fine, of course, since you can use it to at least know when the device is going to die. But if you are in the process of creating a product and would like to know how long it will survive, you have to do really long runs just to see a single 10% increment. So for our purpose it's not very well suited. The nice thing is that it's there for all vendors if they implement the 5.0 spec. Then there is the vendor-proprietary health report, in our case the Micron one.
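The JEDEC 5.0 fields described above live in the extended CSD; a rough decoder might look like the sketch below. The byte offsets 267-269 follow my reading of the eMMC 5.0 spec (PRE_EOL_INFO, DEVICE_LIFE_TIME_EST_TYP_A/B), but verify them against your device's documentation; the sample values are made up:

```python
# Decode the JEDEC 5.0 health fields from a raw 512-byte EXT_CSD dump.
PRE_EOL = {0x01: "normal", 0x02: "warning (80% of spares used)",
           0x03: "urgent (90% of spares used)"}

def decode_life_time(val):
    """Map a DEVICE_LIFE_TIME_EST value to its 10% lifetime bucket."""
    if 0x01 <= val <= 0x0A:
        return f"{(val - 1) * 10}%-{val * 10}% of lifetime used"
    if val == 0x0B:
        return "estimated lifetime exceeded"
    return "not defined"

ext_csd = bytearray(512)
ext_csd[267], ext_csd[268], ext_csd[269] = 0x01, 0x01, 0x01  # fresh device

print(PRE_EOL[ext_csd[267]])           # normal
print(decode_life_time(ext_csd[268]))  # 0%-10% of lifetime used
```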
Micron has a technical application note about what they call the device health report, which contains more information. For example, you get bad-block counters: the factory bad blocks, the runtime bad blocks, and how many spare blocks remain. You also get per-block logging of failed program or erase operations, with the page address of where they actually failed. Then you get block erase counters: minimum, maximum, and average overall, and you can even get them per block, so for all 1024 blocks it gives you the exact number of times each one got erased. You also get some more internal data about the block configuration: the physical address of each block, and the pSLC versus MLC configuration, because that is actually user-configurable in this case; the user can say whether they want the whole device or parts of it in a certain mode. How do you access it? You use the general command, also called CMD56. But what is even meant by flash health? Basically, it's the percentage of the capacity that is already worn out. You can calculate the endurance either as the number of blocks times the average block lifespan, in our case 1024 times 3,000, giving about three million block erases over the lifetime of the device; or, as is more common in the SSD world, as how many terabytes you can write: block size times number of blocks times average lifespan. In our case, on this four-gigabyte device, you can write about 12 terabytes, and once you've written those 12 terabytes, things will start going bad if you write more. So how can we monitor flash health in Linux? One set of utilities is mmc-utils: software to extract meaningful information from MMC devices.
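The two endurance formulas above work out as follows for the demo part:

```python
NUM_BLOCKS = 1024        # erase blocks in the 4 GB Micron part
PE_CYCLES = 3000         # rated average write/erase cycles per block
BLOCK_SIZE_MB = 4        # erase block size in megabytes

endurance_erases = NUM_BLOCKS * PE_CYCLES
endurance_tb = NUM_BLOCKS * BLOCK_SIZE_MB * PE_CYCLES / 1e6

print(endurance_erases)  # 3072000 -- the "three million block erases"
print(endurance_tb)      # 12.288  -- roughly the "12 terabytes" quoted above
```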
What mmc-utils allows is basically reading the EXT_CSD data that contains this information, and as we saw before, since standard 5.0 this includes the lifespan estimates as defined by JEDEC. As an example, you can first read out which standard version your device supports; in our case it's 5.0, so it's fine, it should have that information. Then if you grep for the lifetime fields, you see how they are presented. In this case I have a one everywhere, which means it's a new device, within the first 10% of wear, and the pre-EOL state is still normal. Okay, then the vendor-proprietary health report: as mentioned before, you can use CMD56, so you could for example extend mmc-utils to go through and read that data out. That way you can read bad-block counts, erase counts, and all that kind of information. Another way is to use a proprietary tool: in the Micron case they have a tool called emmcparm which allows you to read that out. Then the I/O tracking: somehow we want to relate this information to what actually went on, what we wrote, and so on. It's a useful indicator, and it's also interesting to know which application is causing the most wear. That data, as I mentioned, we use as input for the wear estimation model. It's independent of any eMMC specifics; you could use it with any flash technology. To see how to do that, we have to look quickly at the I/O stack. You have the file operations somewhere in user space; writes trigger system calls, end up in the I/O stack, and somewhere at the bottom actually get written down to the flash. The block-device I/O stack looks like this; I'm not going into more detail there.
Of course, the question is: why can we not just monitor from user space? The problem is that it's not very accurate, because as we see here, the kernel does some advanced things: there is an I/O scheduler that might not commit data immediately, and there are layers of caches. So it's not well suited to just look at user-space writes, because you don't know whether they end up on the flash or not. For measuring real I/O writes, there are a lot of places in the kernel where you can hook in; the question is where we actually know the data will hit the flash. iotop would be the tool to use from user space; it's easy to use, but like I said, it's not accurate enough for what we want to do. blktrace, on the other hand, gives you more information, in fact too much information, so you need filtering, and you need to know which filter tells you whether a request really hit the device or not. For that you use the complete events, basically. And how do we now estimate the lifespan? We log the flash health and the I/O tracking, and correlate the two over time. The lifespan in seconds is basically the endurance divided by the average global block erase rate; or, if you want to look at the write rate, the endurance divided by the adjusted average write rate. One remark I want to make: this whole lifespan business is strongly affected by temperature, so when you do any such measurements, make sure you also consider that. If you want to run at industrial temperature, 85 degrees Celsius, you will have to do all your testing in that same environment too.
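The lifespan formula above, endurance divided by the observed erase rate, can be sketched like this (the erase-counter snapshots are invented for illustration):

```python
PE_CYCLES = 3000  # rated average erase cycles per block (datasheet figure)

def lifespan_seconds(avg_erases_t0, avg_erases_t1, interval_s):
    """Extrapolate time until the average block erase count hits the rated limit."""
    rate = (avg_erases_t1 - avg_erases_t0) / interval_s  # erases/block/second
    return (PE_CYCLES - avg_erases_t1) / rate

# Example: the average erase count grew from 10 to 12 per block over one day.
remaining_s = lifespan_seconds(10, 12, 86400)
print(remaining_s / 86400)  # 1494.0 days of life left at this write rate
```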
Otherwise you will get quite different numbers in the end, or you will estimate something and the device will die twice as fast in the field because you forgot to take that into account. Just be very careful about that. Okay. So we have the flash analytics tool. It's a tool developed under the Toradex Labs umbrella, where we try to abstract away all the complexity we just talked about. It mainly targets application developers who, of course, don't want to know all these details. They just write their application and want to know: will it survive five years on this device, or what will happen? Or do I have to adjust how much data we write, or how we write it, things like that. The current prediction model is implemented using plain linear regression, so we assume that whatever we calculated and measured evolves linearly. Okay, I can do a quick demo. I have a board here, our Aster carrier board with a Colibri iMX6. I can quickly hook it up; I use Ethernet to get to it. Unfortunately I came back a little late from lunch, so this should be quick. Just plug that in; this one can be powered just by USB. [Audience question about availability:] No, it will be in the slides as well; you can just go to the Toradex Labs page and download it, and it's free to use, yeah. Let's see, there's probably a way to increase the font... it doesn't want to... there you go, okay. And let's fire it up. Voila, that's now running Torizon, and we have the flash tool containerized. Let's see how that goes. Then, to access the data, we use the browser. Let's connect to it so we can also see what it's doing. It probably takes a few seconds. There you go. Usually at the beginning it will not have any data available, of course, because it needs some first.
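The linear-regression model mentioned above can be sketched with an ordinary least-squares fit; all the numbers below are invented, and this is a sketch of the idea, not the tool's actual implementation:

```python
# Fit average block erase count against time, then extrapolate to the
# rated 3000-cycle limit to predict end of life.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

hours  = [0, 24, 48, 72]         # sample times
erases = [0.0, 2.1, 3.9, 6.0]    # average erase count per block at each time

slope, intercept = fit_line(hours, erases)
end_of_life_hours = (3000 - intercept) / slope
print(round(end_of_life_hours / 24 / 365, 1), "years")  # ~4.2 with these numbers
```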
You can look at how much writing is going on, and here you get the flash information. This part is the regular standard information: you see the device is also still within the first 10% and the pre-EOL state is still normal. Over here you get the vendor-specific data: block erase counters, so you see the average; it's a fairly new device I pulled out of the cupboard. You can even look at all the individual blocks here, and you see that in this case most of the blocks are in regular MLC mode, and only a very small number of the pSLC blocks is used yet. And like I said, it doesn't want to show a prediction now, because it doesn't really have any data yet; if no writing is going on, it's hard to predict. So we first have to cause some writing. Let's see how we can do that. I can, for example, SSH in; let's also increase the font here a little. Very well, let's see. For example, we can use dd to write some random data. Let's go back to the browser and check what it's doing. It doesn't have any data here yet. We look at the write statistics here: you see this one, for example, jbd2, that's basically the journaling; the journal is probably writing a little bit of log data. But the system is more or less idle now. Now we go back, start writing some data, and we will hopefully see that appear here. That's the idea. It should refresh every ten seconds, so let's see. Not yet. Let's hope it's not a Murphy thing. Ah, there you go. You clearly see the dd, and it's writing away nicely: we're writing unthrottled to the flash now, basically, and that gives a nice bit of wear. So what does it do now? Ah, there you go: if you go full throttle like this, the device will not survive very long. And most people don't really realize that: you can wear out the flash within weeks, basically.
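To get a rough feel for that "within weeks" claim: the sustained write rate and write amplification factor below are assumptions (large sequential dd writes tend to have a WAF near one), not measured values:

```python
ENDURANCE_TB = 12.0   # writable terabytes, from the endurance figure earlier
WRITE_MB_S = 40       # assumed sustained eMMC write throughput
WAF = 1               # assumed write amplification for large sequential writes

seconds = ENDURANCE_TB * 1e6 / (WRITE_MB_S * WAF)
print(seconds / 86400)  # ~3.5 days of unthrottled writing to exhaust the endurance
```

With a less favorable WAF of four, the same arithmetic lands under a day, which is why an idle-looking logging task can still be a serious wear problem.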
It's really, I mean, you have to be very careful. As an application developer, you now really have to make sure you design your whole system in a way that actually meets whatever your marketing might have in mind for how long it should survive. And of course, it would be a bad thing to have all the devices in the field and, a couple of months later, they all call you and say: well, it kind of died, what happened? That's the idea, basically. Okay, any questions? [Question about supporting other vendors:] Yeah, like I said, in this case we only implemented the Micron report. In theory, you could just use the JEDEC standard one. The problem is that, depending on your application, you would have to run it maybe for a month or even longer to even see one 10% increment; before that you don't know anything, you could be running it forever, it's useless. So that is a bit of a problem. Okay, Mark, yeah. Ah, oh yeah, I totally forgot, otherwise we won't have it in the video. Does it work? Well, I can also repeat it. Just go ahead. Okay, go ahead. [Question about whether the wear leveling] inside of the chip itself works: do we just take it for granted that it will work well, or do we measure that at all? Well, of course you have to assume that they don't have the worst bugs and such. But we really look at the blocks at the lowest level, how many times each got erased, so it doesn't really matter how they implemented it. If you assume that your workload stays the same, it should basically also have the same effect, you know what I mean? And the write amplification factor and everything should basically be accounted for. But I agree, one dangerous thing is that a new device will probably behave much more nicely than one where everything has already been written and it's a mess and the controller has to do much more cleanup.
That's why you probably have to run it for a little while, you know what I mean? The demo I had here, just running for a couple of minutes, probably won't cut it. But because we really look at the low-level data, that should be taken into account. Of course, like I said, we don't know whether the algorithm will suddenly have trouble once the device is more fully written; that is difficult. [Question: so, is there a standard for querying just the number of erases per block?] Exactly, exactly, that's the thing. The vendor health report gives you all that data. The vendor-proprietary health report, it gives you all of it. Let's see. Here, the Micron proprietary one has, for every block, how many erases; you get all that information. The problem is that each vendor is allowed to implement that however they like, and there is no standard that says they have to include any of those elements. Well, this one is vendor-proprietary anyway, so they don't have to. And even the JEDEC one is not fully defined: like I said, there are these type A and type B estimates, and what A and B mean is up to the vendor again. I have a question. Yes? Have you investigated to see how many of the vendors are opting to put this health information in their vendor-specific area? Well, basically all the vendors we have used so far do have some way of getting at that data, but everybody does it differently, so you have to invest in your tooling. But it's possible. Is there a discussion going on with the vendors to express our desire to have a mechanism to accomplish this? Well, I think that at the top level that is what boiled down to the JEDEC standard health reporting. But like I said, for a live device it might be okay, but for estimating it's fairly useless because it's not very fine-grained.
So basically, I think what would have to happen is that in JEDEC 5.2 or whatever comes next, they would really have to extend that, I don't know. It's not super secret; it should just allow you to get that erase count from every block, and that would be enough, because that's how we do it now. But it really depends on the vendor; so far it's not standardized there. Okay. [Question:] Did you try to kill some of those eMMCs and see what happens? Yes, exactly. We actually ran a bunch of them in this continuous writing setup, and it was more or less like we saw here. The good thing is that none of them went bad before what we calculated here. But after three months you basically see various issues, and they behave very differently: some suddenly don't accept any writes anymore and just return errors; some just continue to work, but the data suddenly comes back a little bit random, things like that. It clearly shows that even that is not fully standardized. Yeah, pretty much across vendors, exactly. But of course, we only did around two or three dozen devices, not hundreds, so we don't know for certain, but we took more than just one per vendor and they behaved the same. Yes? [Question:] I'm really curious whether this also works for standard laptops or desktops using solid-state drives. Well, in that case you actually have standardized facilities: you usually have the SMART information, and with NVMe they have their own health log with that information, yes. [Question:] Since we're talking about eMMC and vendors, you mentioned temperature. Do you have any insight into the capabilities of a consumer part versus an automotive-grade part versus an industrial-grade part in this respect? Yeah, that's a very, very good question.
Because I remember, I've been in the industry quite a while now, and at the beginning, when we still had a lot of raw NAND and were wondering whether to move to eMMC, there weren't even any industrial- or automotive-grade devices available yet. That clearly shows it's probably not totally easy. And like I said, it heavily depends on temperature, so to estimate lifespan you should really run your tests at whatever temperature you think your product will see in the field. [Comment:] Yeah, I just found with my own part qualification that the data sheets don't really get into much detail: they'll talk about average write performance and read performance, but they won't go into MTBF. Well, in theory it's that thing I showed here. You basically have to ask what it means: if they say 3,000 erase/write cycles, usually what that means is that after the 3,000 cycles, the data you've written still survives one year; they use this data-retention criterion. And that is really the biggest problem: that retention time will decrease. So the problem will be, if you write at 85 degrees Celsius, you can basically still do those 3,000 write/erase cycles, but the data will probably go bad after a month. To counteract that, you have to rewrite the data, but that of course uses up some of those precious erase cycles. So it's a trade-off between the two, basically. But yeah, they don't really mention that in the data sheets. Thanks. [Comment:] Actually, in response to yours: we have to qualify each part from each manufacturer independently of the data sheet, because we do not see the data sheets reflect any semblance of reality. Well, I think we all do it more or less the same way: you really have to run the stuff yourself; you don't get that data. [Question:] I do have a question. eMMC and SD have a lot of elements in common.
Do you have any information about health on SD cards, as opposed to eMMC? Well, I think that is rather tricky, and that is also why there are these special industrial-grade SD cards, which suddenly do have that. We were in discussions with some of those vendors, for example Swissbit, companies like that; even Kingston does some special industrial parts. They do have tools where you can read that out, but how exactly they do it I don't remember right now. Plus, I also don't think it's standardized: for a regular SD card, I'm not aware of any way to get that data. [Audience:] There's nothing in the specification today. Okay. Very well, then I guess that's it. [Audience comment:] I actually want to respond to him. We did some investigation on SD cards, because there was no inside information like this, so we designed a tool to destroy the SD card while counting how many write events we were doing in the field. And we found that some vendors are really, really good compared to some others. SanDisk, and I don't want to make this an endorsement, but SanDisk has an incredible lifespan in terms of the number of writes we did, writing to a database like any other application. So yeah, we did it, but we basically had to destroy SD cards all the time. Yeah, right. So I think that's the way to go. Well, in the end, I guess we would have to stand together and pressure the vendors to actually standardize all that. Okay, thank you very much.