Hi. So I'm Boris Brezillon, and today I'm going to talk about the plans I have for the NAND framework. So first, let me introduce myself. I've been working for Free Electrons for almost three years now, and I have contributed a lot of drivers for different ARM SoCs. And since the beginning of the year, I've been the maintainer of the NAND subsystem, which is why I'm doing this talk, actually. So let's see what we're going to cover during this presentation. First, this is about the current state of the NAND framework and all its limitations and inconsistencies. Then I'd like to propose some changes, but nothing is set in stone. So if you think that the changes I propose are not relevant or do not fit what you need, then come and share your ideas with me. And yes, the last aspect is getting some feedback from people who are actually developing NAND controller drivers. So let's see where the NAND framework stands in the flash stack in Linux. So in the middle, you have MTD, which is abstracting all kinds of flash storage. And under this abstraction layer, you have the NAND framework, which is a layer between the NAND controller drivers and the MTD layer. And of course, on top of the MTD layer, you have all the MTD users, which include a file system like JFFS2 or a wear-leveling layer like UBI. So when we talk about the NAND subsystem, actually we're talking about raw NANDs, so the NANDs which are accessed through the NAND bus. Everything that is abstracted in the NAND subsystem is actually done through the nand_chip structure. So why is the current NAND framework so limited? Well, the first thing is that it was created a long time ago, in the 2.4.6 kernel. And it has evolved a bit, but it has never been reworked to handle all the new controllers and all the modern NAND chips, which led to the addition of new features but no real rethink of the whole subsystem. Another thing that is not so good is that you see a lot of code duplicated in all NAND controller drivers.
And this is usually boilerplate code which could be handled inside the subsystem itself. So we're trying to address that, but there is still a lot of this code in NAND controller drivers. Actually, sometimes the NAND framework is too open to bring any consistency to the subsystem, and sometimes it's too restrictive on some aspects, which prevents performance optimizations. And of course, this makes raw NAND usage way less efficient than what you have on eMMCs. So yeah, what I'm currently trying to do is improve the NAND subsystem to get better performance and to help developers write their NAND controller drivers. So, as I said, the main problem with the NAND subsystem is that it has evolved a bit each time someone needed a new feature. And this has actually been done by adding new hooks inside struct nand_chip. And if you look at struct nand_chip nowadays, it's filled with a lot of function pointers. And when you come into the subsystem and you don't know anything about it, it's completely crazy. So it's documented, but it's still not obvious which functions should be implemented and which ones should not. And some methods which should be handled by the core are actually exposed to NAND controller drivers, which means they can override them. And this led to some problems and some inconsistencies. So all that results in a hardly maintainable subsystem. Since all drivers can implement the functions as they want, this also brings some inconsistencies for MTD users, because depending on the controller you have on your board, you won't see the same behavior that you would have seen on another board. So this can be a problem. So yeah, at the other end, the NAND subsystem is also too restrictive to allow good performance.
So I don't know if some of you are developing NAND controller drivers, but almost all modern controllers provide some advanced features like pipelining, accessing several NAND dies in parallel, and other things that could bring a huge performance gain. But actually the framework is limiting all that, because everything is queued inside the framework, and then everything is serialized. And this is a big problem. So yeah, in some aspects the framework is too open, and in others it's too restrictive. And the other thing that is missing is support for all the advanced features that you can find on modern NAND chips. So here I'm talking about cached accesses, multi-plane accesses, multi-die accesses, and probably other things that NAND vendors provide. So yeah, the goal here is really to get the best performance we can have. And for example, I just tried on one of my setups to use the DDR bus and cached accesses, and it gave pretty good results. So we should really find a way to support those advanced features. But back to the changes I'd like to introduce. The first thing is that the NAND chip interface is kind of fuzzy, and the first thing we need to do is clarify the different concepts we have in the NAND stack. The first problem is that you have a single struct nand_chip, and this struct nand_chip is supposed to represent a NAND chip, except that it is the controller which initializes the NAND chip, and it is the controller which fills almost all the NAND methods. What is done in other subsystems is that usually you have the core which accepts a controller driver registration, and then the core links devices with controllers, and that's what I'd like to introduce in the NAND subsystem. So splitting the NAND chip interface, which is supposed to represent the NAND chip itself, from the NAND controller interface, which is supposed to provide some methods to access the NAND chip through the NAND bus.
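To make the split a bit more concrete, here is a minimal sketch in plain C, not the actual kernel code. Every name in it (`sketch_nand_chip`, `sketch_nand_controller`, `sketch_nand_attach_chip`, the `select_chip` hook) is hypothetical and only illustrates the idea of a chip object that describes the device and a controller object that owns the bus and can be linked to several chips.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CHIPS 4

struct sketch_nand_controller;

/* Chip-side object: describes one NAND device sitting on the bus. */
struct sketch_nand_chip {
	const char *name;
	unsigned int cs;                      /* chip-select line wired to this chip */
	struct sketch_nand_controller *ctrl;  /* set by the core when attached */
};

/* Controller-side object: owns the bus and the methods used to access it. */
struct sketch_nand_controller {
	struct sketch_nand_chip *chips[SKETCH_MAX_CHIPS];
	unsigned int nchips;
	/* Hypothetical bus hook that a controller driver would provide. */
	int (*select_chip)(struct sketch_nand_controller *ctrl, unsigned int cs);
};

/* Core-side helper: link a chip to its controller. Returns 0, or -1 if full. */
static int sketch_nand_attach_chip(struct sketch_nand_controller *ctrl,
				   struct sketch_nand_chip *chip)
{
	assert(ctrl != NULL && chip != NULL);

	if (ctrl->nchips >= SKETCH_MAX_CHIPS)
		return -1;

	chip->ctrl = ctrl;
	ctrl->chips[ctrl->nchips++] = chip;
	return 0;
}
```

With this kind of layout, one controller instance can carry several chips, and chip-specific code only needs a pointer back to its controller to reach the bus.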
So the goal is to take some of the methods we have in nand_chip and move them into the NAND controller, but not all of them, because some of them are really related to the NAND chip itself. So you know that some NAND vendors provide some features, and depending on the vendor, it's not exactly the same set of commands you have to send to the NAND to access the feature, so we still need some methods which are at the chip level. But most of the methods we find nowadays in the structure that mixes NAND controller and NAND chip are actually related to the NAND controller. And the second aspect is that I'd like to think a bit before moving all the methods from the NAND chip to the NAND controller, because some of them are actually not so good and do not fit how NAND controllers work nowadays. So we need to take our time and think about whether each method fits or whether we should introduce a method which better fits our needs. So if you look at most drivers nowadays, this, on the left, is what's done. The NAND controller drivers just assume that they have one controller and one chip, and they declare that as a single entity and register the NAND chip to the NAND framework. But actually when you look at the datasheets of those NAND controllers, they usually can handle more than one NAND chip. And this is how it should be done: you should have one NAND controller, and then you should discover several NAND chips on the NAND bus, and your NAND controller should be able to handle more than one NAND chip. So yeah, about the methods you'll find in nand_chip itself, most of them are really not clear. Sometimes developers think they are using them correctly or implementing them correctly, but actually they are not. And yes, I already said that it's not clear which methods should be implemented by the NAND controller driver and which ones should be implemented by the NAND chip driver. And yeah, if you look at the methods, you have two kinds of methods.
You have those which are designed to be used by simple controllers and those which are designed to be used by more advanced controllers. And this is not clearly stated in the documentation, which is kind of confusing. And as usual, everything is just mixed up in struct nand_chip, and it's really easy to get it wrong. So let's just take an example. And I think this is the method which is actually the most abused in the NAND framework. So let's have a look at the cmdfunc method. So this is the method which is used to ask the NAND controller to send a command to a specific NAND chip. And before we dig into the cmdfunc method, let's see what a NAND operation is. So a NAND operation is here to do a specific thing, like read a page, write a page, erase a block, or read the NAND ID. And if you look a bit deeper, you'll see that a NAND operation is actually formed of one or several command cycles, usually one or several address cycles, and zero, one or several data cycles. So let's take a few examples. The read page operation first sends a command cycle, the 0x00 command cycle, then a few address cycles which tell the NAND chip which page should be read, then the 0x30 command cycle, and then you have some data cycles to retrieve the data from the NAND chip. It's pretty much the same for the write page operation, except that the last command comes after you have transmitted the data, because you need to transmit the data before programming the page. But you also have less complicated operations, like resetting the chip. In this case, you only send a single command cycle. And read ID, which just reads the NAND ID to detect which NAND we have on the bus. In this case, you have a single command cycle, a single address cycle, and then a few data cycles to retrieve the ID. So now let's go back to cmdfunc. Actually, it only partially handles the NAND operation.
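The operations described above can be written down as short sequences of cycles. The sketch below is only an illustration: the `sketch_*` types and helpers are hypothetical, the opcodes are the common ONFI ones (0x00/0x30 for read page, 0x80/0x10 for program page, 0xFF for reset, 0x90 for read ID), and the five address cycles (two column plus three row) are an assumption that holds for many large-page chips but not all.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_instr_type {
	SKETCH_CMD,       /* one command cycle */
	SKETCH_ADDR,      /* one or more address cycles */
	SKETCH_DATA_IN,   /* data cycles, chip to host */
	SKETCH_DATA_OUT   /* data cycles, host to chip */
};

struct sketch_instr {
	enum sketch_instr_type type;
	unsigned char opcode;  /* only meaningful for SKETCH_CMD */
	size_t count;          /* number of address or data cycles */
};

#define SKETCH_MAX_INSTRS 4

/* Read page: 0x00, address cycles, 0x30, then data in. */
static size_t sketch_read_page(struct sketch_instr *ins, size_t len)
{
	assert(ins != NULL);
	ins[0] = (struct sketch_instr){ SKETCH_CMD, 0x00, 0 };
	ins[1] = (struct sketch_instr){ SKETCH_ADDR, 0, 5 };
	ins[2] = (struct sketch_instr){ SKETCH_CMD, 0x30, 0 };
	ins[3] = (struct sketch_instr){ SKETCH_DATA_IN, 0, len };
	return 4;
}

/* Write page: 0x80, address cycles, data out, and only then 0x10. */
static size_t sketch_write_page(struct sketch_instr *ins, size_t len)
{
	assert(ins != NULL);
	ins[0] = (struct sketch_instr){ SKETCH_CMD, 0x80, 0 };
	ins[1] = (struct sketch_instr){ SKETCH_ADDR, 0, 5 };
	ins[2] = (struct sketch_instr){ SKETCH_DATA_OUT, 0, len };
	ins[3] = (struct sketch_instr){ SKETCH_CMD, 0x10, 0 };
	return 4;
}

/* Reset: a single command cycle, no address, no data. */
static size_t sketch_reset(struct sketch_instr *ins)
{
	assert(ins != NULL);
	ins[0] = (struct sketch_instr){ SKETCH_CMD, 0xff, 0 };
	return 1;
}

/* Read ID: one command cycle, one address cycle, a few data cycles. */
static size_t sketch_read_id(struct sketch_instr *ins, size_t id_len)
{
	assert(ins != NULL);
	ins[0] = (struct sketch_instr){ SKETCH_CMD, 0x90, 0 };
	ins[1] = (struct sketch_instr){ SKETCH_ADDR, 0, 1 };
	ins[2] = (struct sketch_instr){ SKETCH_DATA_IN, 0, id_len };
	return 3;
}

/* Count how many instructions of a given type an operation contains. */
static size_t sketch_count(const struct sketch_instr *ins, size_t n,
			   enum sketch_instr_type type)
{
	size_t i, found = 0;

	for (i = 0; i < n; i++)
		if (ins[i].type == type)
			found++;
	return found;
}
```

Note how the data length is part of the description of the operation itself, which matters for the discussion of cmdfunc that follows.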
cmdfunc is here just to send the command cycles and address cycles. It's not doing the data transfers on the bus. The data transfers, or the data cycles, are actually done using the read/write byte, word, or buf methods. So you remember that I said that the framework provides an interface for simple controllers and an interface for complex controllers. And actually, the simple interface is the cmd_ctrl method, and then the core provides a wrapper which implements the cmdfunc method on top of it. So if you look at the default implementation of cmdfunc in the NAND framework, you'll see that it calls cmd_ctrl several times, once for each command or address cycle. And that's how it's implemented. So first thing, depending on whether you are implementing a simple NAND controller or a complex one, you have to choose between implementing cmdfunc directly, or letting the core do all the hard stuff for you and just focusing on implementing cmd_ctrl, which sends a single cycle, whether it's a command or an address cycle, as specified in its parameters. But again, even if cmdfunc is supposed to be designed for advanced controllers, it's not really the case, because nowadays NAND controllers are able to send the whole NAND operation in one go. So in cmdfunc you'd also want to do the I/O operation, the data transfer. And the problem is that when cmdfunc is called by the NAND framework, it's not passed any information on how many bytes you have to read, which is a real problem because some NAND controllers just can't do a NAND operation without attaching the I/O operation to it. So yes, even if it was designed for advanced controllers at some point, it's not really the case today. This is also a problem because NAND operations evolve over time. The NAND vendors decide to add new operations.
And if you ask all NAND controller drivers to support the whole set of commands, this means that when you add a new operation and you want to use it in the framework, you'll have to go over all the controller drivers and patch all of them to make them support this new operation. And this is really a pain to maintain because, yes, you'll have to patch everything and then ask all the people to test it, and that doesn't work well. This also implies that for the same NAND chip, depending on the controller you have on the board, you won't have the same behavior, because the controllers are free to implement cmdfunc as they want. So they may just support a small set of operations, which means you might not be able to access all the NAND features, which is again a pain to maintain and to explain to NAND users. And the fact that all NAND controller drivers have to re-implement everything kind of encourages people to just implement a minimal set of commands. And this just drags the whole support down for the whole subsystem. So to address this limitation, the idea is to add a new method inside the NAND controller structure. And this method would actually ask the controller to execute the whole operation, which means this time you would have the whole thing, including the I/O transfer and the size you want to transfer. So this should fit most NAND controllers, at least those that I've seen. But I've heard that some people are dealing with NAND controllers which do not allow such fine-grained configuration. So all they allow is some high-level commands, like read this page, write this page, read the ID, or reset the NAND. And in this case, the execute-operation approach is not really a good fit. So for this kind of case, I guess we'll have to add a new interface to handle those high-level NAND controllers. But maybe it's not even supposed to be in the NAND framework. Maybe it should directly be linked to the MTD subsystem.
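As a rough illustration of the "execute the whole operation" method described above, here is a hedged sketch. It is not the kernel interface: `sketch_nand_op`, `sketch_ctrl`, the `exec_op` hook name and the fake controller are all assumptions made for this example. The point is only that the controller receives the commands, the addresses and the data buffer with its length in a single call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* One complete NAND operation, including the data transfer and its size. */
struct sketch_nand_op {
	unsigned char cmds[2];
	unsigned int ncmds;
	unsigned char addrs[5];
	unsigned int naddrs;
	void *data;           /* buffer for the data cycles, may be NULL */
	size_t data_len;      /* number of bytes to transfer */
	int data_is_input;    /* non-zero when reading from the chip */
};

struct sketch_ctrl {
	/* Hypothetical hook: execute the whole operation in one go. */
	int (*exec_op)(struct sketch_ctrl *ctrl, const struct sketch_nand_op *op);
};

/* Core-side wrapper: no guessing, an unimplemented hook is an error. */
static int sketch_exec_op(struct sketch_ctrl *ctrl,
			  const struct sketch_nand_op *op)
{
	assert(ctrl != NULL && op != NULL);

	if (!ctrl->exec_op)
		return -ENOTSUP;  /* userspace stand-in for "not supported" */

	return ctrl->exec_op(ctrl, op);
}

/* Fake controller used for illustration: every read returns erased data. */
static int sketch_fake_exec_op(struct sketch_ctrl *ctrl,
			       const struct sketch_nand_op *op)
{
	(void)ctrl;

	if (op->data && op->data_is_input)
		memset(op->data, 0xff, op->data_len);
	return 0;
}
```

A driver for a controller that can issue a full operation at once would program all of it from the single `op` argument, instead of being called once per cycle without knowing how much data follows.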
So that's not really clear yet, and I don't have any of these controllers, which means I can't really provide something. So if some of you have ideas about this or have this problem, then just come and talk to me after the talk. So yeah, this is not the only problem we have in the NAND framework. Another one is that the NAND framework tries to be smart. And sometimes it helps, because this means NAND controller drivers don't have to implement everything, they don't have to implement all the methods which are described in nand_chip. But sometimes the decision taken by the framework is just wrong, and it leads to some weird behavior when you are trying to use the NAND. So the idea is to avoid guessing what the NAND controller driver wants, and instead provide helpers and ask NAND controller drivers to use these helpers when they want to rely on the default implementation. When the driver does not implement a function, instead of trying to do something, we should just say it's not supported and return an error. And that would be a bit clearer than trying to do something which is obviously wrong. We are also trying to adapt the framework to developers' needs. So some people complained that, for example, they couldn't test whether bitflips were present in erased pages. And instead of having the same block of code in all drivers, we decided to provide some helpers for that. Brian also did some work to automate the DT parsing, which is good, because this removes a lot of boilerplate code from all controller drivers. And we recently also automated the timing setting, which also removes a lot of code from some drivers. So this is good, but of course we need to continue that, and we need to push it even further. And when we move to the NAND controller approach, we should really take the new constraints into account. So new NAND controllers are able to do awesome stuff, and we should really support that from the ground up.
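For the erased-page case mentioned above, the kind of helper involved can be sketched as follows. This is a simplified userspace illustration, not the framework's helper: the function name and signature are hypothetical. An erased NAND page reads as all 0xFF, so every zero bit counts as a bitflip, and the buffer is treated as erased only if the count stays within a threshold the caller passes in (typically the ECC strength).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Return the number of bitflips found in a buffer expected to be erased
 * (all 0xFF), or -EBADMSG when there are more than max_bitflips of them.
 */
static int sketch_check_erased_buf(const unsigned char *buf, size_t len,
				   int max_bitflips)
{
	int bitflips = 0;
	size_t i;

	assert(buf != NULL && max_bitflips >= 0);

	for (i = 0; i < len; i++) {
		/* After inversion, each 1 bit marks a cell that reads as 0. */
		unsigned char zeros = (unsigned char)~buf[i];

		while (zeros) {
			bitflips += zeros & 1;
			zeros >>= 1;
		}

		/* Stop early once the buffer cannot be considered erased. */
		if (bitflips > max_bitflips)
			return -EBADMSG;
	}

	return bitflips;
}
```

Having one shared helper like this is what lets each controller driver drop its own copy of the same loop.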
Another thing that is, in my opinion, a problem, but maybe not that big, is that the NAND and MTD concepts are just mixed all over the place. If you look at the methods you have right now in nand_chip, they are usually passed a pointer to an MTD device and a pointer to a NAND device. And actually, those objects are the same thing. The MTD device is just the abstraction of the NAND device so that any kind of MTD user can use the device. So yeah, we tried to remove that. The first thing we did was to embed the MTD device directly in the nand_chip structure, which means now you are able to retrieve the MTD device from the nand_chip object, and this way you can just pass the NAND chip to the nand_chip methods. So this removes one of the parameters, and of course we would like to go further and completely hide the fact that the NAND chip is actually an MTD object. But that takes a lot of time. So now I think the most interesting part is this one, and the rework we are trying to do is actually based on the assumption that the current implementation does not provide good performance. So the idea is to try to get the best of the NAND chip and the best of the NAND controller and try to provide decent performance, which is not the case right now. So first let's have a look at the different designs we have in the wild. The first one, at the top, is where you have a single controller and a single NAND chip, and you connect them through the NAND bus. So that's what you usually have on boards. But sometimes you have NAND chips which actually embed two dies, and in this case you connect this NAND chip with multiple dies to a single NAND controller, and you are able to interact with the different dies. And the last case is pretty much the same as the one in the middle, except that instead of having one package, you have two different packages.
But from the NAND controller point of view, it's pretty much the same. So the case in the middle could just be handled as two different chips. It's just that they are in the same package. And the good thing is that modern NAND controllers are able to take advantage of that. So they are able to access the multiple dies in parallel, queue operations, and do fancy stuff which improves performance a lot. The bad thing is that the NAND framework is just preventing all of that, because all the accesses that go through the NAND controller are serialized at the framework level. And each time you want to, for example, read a page, the framework splits that read operation into several cmdfunc and page read operations. And this takes a lot of time, and it prevents all optimization at the NAND controller level. So the idea I have, and it's not definitive yet, is to completely remove the serialization at the NAND layer level, and instead ask the NAND controller to do all the queuing and dequeuing work. So instead of sending a cmdfunc, you would send a high-level operation, which is to queue a NAND I/O request, and then you would wait for the NAND I/O request to be completed. And this way you can send several NAND I/O requests in parallel and let the NAND controller dispatch the requests as needed and try to optimize all the operations. Now let's talk a bit about the optimizations you can do at the chip level. So the first thing is that no matter what you try to use at the chip level, it depends on the implementation of the controller. So even if we try to use cached accesses, multi-plane accesses or multi-die accesses, you need to have a controller which takes advantage of that. Otherwise, it's pretty much useless. So yes, the idea is really, once we have decent controller support, to try to use those advanced features because they provide better performance.
The thing is that not all chips support those features. So we need to have a way to let the NAND controller know which features are supported by the NAND chip. This is already standardized in ONFI and JEDEC, but of course, not all NANDs are compliant with ONFI and JEDEC. So we need a way to expose that in a generic way. And of course, in the end, the one deciding whether or not to optimize the access is definitely the NAND controller. So let's have a look at the cached access feature. When the NAND says that it can do cached accesses, what you actually have is two different regions in the NAND which are used to store the data which will be read or written. And instead of waiting for the data to be flushed to the NAND or retrieved from the NAND, you start retrieving the next page or sending the data to write to the next page. So this helps a lot when your ECC calculation and I/O transfers take a lot of time compared to the read operation or program operation. And with modern NANDs, those which are MLC and those with big pages, it actually helps a lot because you can start doing the I/Os before the program operation is done, or start reading the next page before the I/O operation and ECC correction are done. So yeah, as I said at the beginning, it could help improve performance a lot. Another kind of optimization, and actually that one is a bit harder to implement, would be to try to have some kind of I/O scheduling at the NAND level or at the NAND framework level because, as I said, you can access different dies in parallel. And this feature can help us optimize the access time. So the idea is to provide some kind of scheduling and some kind of queuing at the NAND die level and maybe at the NAND plane level, and then have some kind of algorithm to order those accesses and let the NAND controller just dequeue from one of these queues when the die is free and ready to be accessed.
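The per-die queuing idea can be pictured with a small sketch. Nothing here has been implemented in the framework as far as the talk says, so this is purely illustrative: the `sketch_*` names, the fixed queue depth, the two-die limit and the very naive "first idle die with pending work" ordering are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NDIES  2
#define SKETCH_QDEPTH 8

struct sketch_req {
	unsigned int die;   /* target die */
	unsigned int page;  /* target page within that die */
	int done;
};

struct sketch_die_queue {
	struct sketch_req *reqs[SKETCH_QDEPTH];
	unsigned int head;  /* total number of requests dequeued so far */
	unsigned int tail;  /* total number of requests queued so far */
	int busy;           /* non-zero while the die executes a request */
};

struct sketch_sched {
	struct sketch_die_queue dies[SKETCH_NDIES];
};

/* Queue a request on its die. Returns 0, or -1 on bad die or full queue. */
static int sketch_queue_req(struct sketch_sched *s, struct sketch_req *req)
{
	struct sketch_die_queue *q;

	assert(s != NULL && req != NULL);

	if (req->die >= SKETCH_NDIES)
		return -1;

	q = &s->dies[req->die];
	if (q->tail - q->head >= SKETCH_QDEPTH)
		return -1;

	req->done = 0;
	q->reqs[q->tail++ % SKETCH_QDEPTH] = req;
	return 0;
}

/* Controller side: pick the next request from any die that is idle. */
static struct sketch_req *sketch_next_req(struct sketch_sched *s)
{
	unsigned int i;

	for (i = 0; i < SKETCH_NDIES; i++) {
		struct sketch_die_queue *q = &s->dies[i];

		if (q->busy || q->head == q->tail)
			continue;

		q->busy = 1;
		return q->reqs[q->head++ % SKETCH_QDEPTH];
	}

	return NULL;
}

/* Mark a request as finished and release its die. */
static void sketch_complete_req(struct sketch_sched *s, struct sketch_req *req)
{
	req->done = 1;
	s->dies[req->die].busy = 0;
}
```

With two requests pending on die 0 and one on die 1, the controller can start the die 1 request while die 0 is still busy, instead of waiting for everything to complete in submission order.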
So I'm not sure exactly what the gain would be, but I guess it could bring a huge performance gain. So again, it's just a basic idea. I don't know if it's easy to implement or not because I never tried, but that would be good to have. Another problem I've seen is that a lot of modern NANDs do not comply with the ONFI standard or the JEDEC standard. And even those which comply with those standards still expose private commands. And actually, those private commands are just not supported right now. And some of them are quite important, like read-retry, which is almost mandatory for MLC NANDs. So in the current design of the NAND framework, the only solution we would have to support that is to implement it directly in the core. But if you look at the different NAND private commands, you'll see that the size of the code would grow quite fast. So the idea is instead to try to separate the NAND chip drivers from the core itself and put them in specific drivers. So there was a proposal, which I posted a few months ago, which was trying to rework all the NAND detection, take all of that code out of the NAND core, and put the vendor-specific code in separate C files. And actually, it worked fine on my different setups. But since I never got any Tested-by or Reviewed-by, I never pushed that further. But that would be a good thing to clean up the code in the NAND core a bit. Another thing that people are trying to do right now is to share some code between NAND-based devices. So far, in the slides, I described the NAND framework, which is actually here to deal with raw NANDs. But maybe you'll see some different NANDs in the future. And typically, I see a lot of SPI NAND drivers which are coming around right now. And those SPI NAND drivers would like to share some code with the NAND drivers, like the bad block table code, which is usable for all kinds of NANDs. It has nothing to do with raw NANDs.
So the idea is to provide an intermediate layer which exposes generic NAND features and abstracts away the NAND interface, so that we would have some common code which could be used for raw NANDs, OneNANDs, SPI NANDs, and maybe others. And currently, we're just trying to share the bad block table code, but maybe we could go even further and share the software implementation of BCH ECC or Hamming ECC. So, yeah. A new door is open. Let's see what happens. And actually, I posted a proposal recently to do that. And hopefully, it will be in 4.10 or 4.11, I don't know, but it should be in the near future. So the basic idea is to factorize the code as much as we can. So, as I said, people were working on that, and I took over recently and proposed a patch series which was creating a NAND device, which is the abstraction for the interface-agnostic NAND device. And then I made nand_chip inherit from this NAND device. So, with this trick, I was able to provide a generic interface at the NAND level and move the bad block table code to the generic NAND level. So, that's pretty much all I had for today. Again, this is just work in progress. Some of what I proposed has not even been implemented. So any help is welcome, any feedback is welcome. And you can do different things if you want to help. First, you can share your ideas or your problems, and explain what you'd like to see in the framework. Or you can directly propose implementations so that I can review them, or you can review other proposals. And, of course, testing is welcome, because each time I make a change at the NAND framework level, it may impact a lot of people. So it would be great if other people could test the changes. At the driver level, if you are implementing a driver or maintaining a driver, please try to convert the driver to the new infrastructure.
So, I try to patch as many drivers as I can, but I just can't patch all of them. So if you can help with that, that would be great. And, of course, review other submissions. Do you have any questions, suggestions or comments? Yeah. Do you have a mic? Yeah. Actually, the NAND controller is able to detect NAND chips automatically. All it needs is a chip select line, or several chip select lines. So it's just about describing the thing differently in the device tree. If you want to see a single device, just describe a single node under your controller, and then say that your node uses two different chip selects, and then it will be exposed as a multi-die chip. If you want to have two different chips, just define two different nodes under your NAND controller, and assign to each of these nodes the chip select line you want, and they should be seen as two different chips. So it's just about describing the thing properly, but internally, it's pretty much the same. It could work. You would just have more chip select lines, and you would just have more dies. That's all. So you would have two nodes, and each of these nodes would be assigned two different chip selects, for example. No more questions? Suggestions? So, okay, the question is, is there any plan to be able to use standard block file systems on top of NAND? Well, you need an FTL for that. And the only FTL we have right now is UBI, and it's designed to work with UBIFS, because it's acting at the erase block level, and not at the page level or with an even smaller block size. So, no, we don't have that right now. Yeah, no, mtdblock is not designed to work with read-write file systems. It works with read-only file systems, but not well with read-write file systems. So, no, there is no solution right now. But if you want to come with an FTL implementation, I don't think we would reject it. It's just that nobody has done it before. Yeah, you can use a read-only block file system.
Actually, that's done a lot. You put your read-only file system inside a UBI volume. Yeah, this works well. But it doesn't work with read-write file systems, which is, I guess, the question. No other questions? Okay, thank you.