It's time, so let me get started. Hi, everyone. I'm Faiz from Texas Instruments India, and today I'm going to be talking about the UFS subsystem in the kernel and in U-Boot. Just a show of hands: how many of you have worked on UFS or are planning to work on UFS? Quite a few people. Nice. A little bit about me: I've been working with Texas Instruments India since 2017. I've worked on peripheral drivers for various TI SoCs, mainly MMC, CAN and now UFS. This presentation is a result of my work getting the UFS subsystem into mainline U-Boot.
So what exactly is UFS? Primarily it's a managed flash, as a lot of you must know. Now, flashes are not very reliable, and that is why in most of today's applications they come together with a flash controller in a single chip. There are things like bad block management, ECC and wear leveling: some memory addresses might go bad, or you might read the wrong address, or if you continuously erase a given address, the flash might give out at that address. Because of all this you require a flash controller to manage the flash. The second thing is that in today's systems you can have a host communicating with the flash over a defined communication protocol, and then the host does not really have to worry about how much flash is there or how many lines it needs to attach to the flash. The communication protocol just lets the host send a command and get whatever data it wants, from this memory address to that memory address. Some of the most used communication protocols for flash subsystems are MMC, SATA and SPI; we just had a HyperFlash talk in Forum 1, I guess most of you were there.
Moving on to why UFS: it's a very high performance serial interface and it's supposed to take very little power. So in any of your low-power devices, your cell phones, low-power laptops, et cetera, UFS will be very useful.
It's in most of our smartphones. Up till 2015 or 16 we had eMMC storage, which is fast, but UFS has a lot of features on top of it that make it better. Unlike eMMC, UFS can have bidirectional, full-duplex transfer speeds of up to 1.45 GBps, which is in UFS 3.0, HS gear 4. And because they want UFS to replace both eMMC and SD cards, you can have removable storage as well as storage that is soldered on the board. Bytes, yeah. So here is a comparison between eMMC and UFS, just to bring home how much better it is. eMMC 5.1, which is the latest standard, has I/O speeds of up to 400 MBps, as compared to 1.45 GBps for UFS. UFS also takes much less power while transferring data: eMMC works anywhere between 1.2 volts and 3.3 volts, while UFS works at 0.2 to 0.4 volts differential. The random read speeds for UFS are also much faster: 68,000 I/O operations per second as compared to eMMC's 13,000. And the protocol is also better. In eMMC you can be either reading or writing data at any one point of time, because it has the same bus for both reads and writes. That's not the case in UFS, where you have different differential lines for input data and output data. And because it's a new standard, you have a bigger address space inside the device, so you can go up to 16 TB of space.
So, moving on to how the overall system looks in both the kernel and in U-Boot. This is a simplified representation of how a UFS device will appear to a host. There will be a bunch of logical units, which I will explain later, and a separate memory for configuration of these logical units. Going down the stack, you have the UFS interconnect layer, which is the physical and data link layers that connect the host to the device. Then you have the UFS transport layer, implemented as the UFS driver in Linux. And then the UFS application layer, which is implemented by the SCSI driver in Linux.
So this diagram basically represents what I'm going to say in this whole talk. Let's start with the application layer. The UFS application layer is based on the SCSI command set and the SCSI architecture model. They basically picked up a simplified SCSI command set, and the UFS device follows this protocol. If you are familiar with the SCSI protocol, it consists of transactions based on units called command descriptor blocks. Each command descriptor block needs to define an initiator, which is the host device, a target, which is the UFS device, a specific LUN, which is a logical unit inside the device, and a query. The query is basically the opcode of the command that you want to send. So you can have read, write, read the device capacity, read what logical units are there inside the device, et cetera. Most CDBs are six bytes long, although you have 10 and 16 byte long CDBs as well.
These are some of the important commands that you will be sending during UFS operations. Obviously read and write. Next is read capacity: if you target a logical unit, it will tell you how big the logical unit is in terms of the block size of the UFS device. Next is report LUNs: during enumeration, you can find out which logical units are active and what logical unit numbers they have been assigned. The next one is test unit ready: you can send this to a logical unit to find out whether or not it's ready to accept your other queries and read/write commands. Next is start stop unit, which is basically used to switch the power mode of the UFS device. You can go from active mode to a low-power mode to a completely powered-down mode and back to active mode using this. The next one is inquiry: you can send it to a logical unit to get some other information about the logical unit, like what the vendor ID is, whether this logical unit is writable or not, things like that.
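To make the command descriptor blocks mentioned above a little more concrete, here is a minimal Python sketch that builds a few of them as raw bytes. The opcodes are the standard SCSI ones; the helper names are made up for this illustration, and the power-condition values in the comments (1 for active, 2 for sleep, 3 for power down) are how I understand UFS to use START STOP UNIT, so check them against the spec before relying on them.

```python
import struct

# Standard SCSI opcodes for the commands named in the talk.
TEST_UNIT_READY = 0x00
START_STOP_UNIT = 0x1B
READ_CAPACITY_10 = 0x25
READ_10 = 0x28


def cdb_test_unit_ready() -> bytes:
    """6-byte CDB: opcode followed by zeroed fields."""
    return bytes([TEST_UNIT_READY, 0, 0, 0, 0, 0])


def cdb_read_capacity_10() -> bytes:
    """10-byte CDB: opcode followed by zeroed fields."""
    return bytes([READ_CAPACITY_10]) + bytes(9)


def cdb_read_10(lba: int, num_blocks: int) -> bytes:
    """10-byte CDB: opcode, flags, 32-bit LBA, group, 16-bit length, control.

    All multi-byte fields are big-endian.
    """
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, num_blocks, 0)


def cdb_start_stop_unit(power_condition: int) -> bytes:
    """6-byte CDB with the power condition in the upper nibble of byte 4.

    Assumed UFS values: 1 = active, 2 = sleep, 3 = power down.
    """
    return bytes([START_STOP_UNIT, 0, 0, 0, (power_condition & 0xF) << 4, 0])


if __name__ == "__main__":
    print(cdb_read_10(lba=0x1000, num_blocks=8).hex())
    print(cdb_start_stop_unit(2).hex())
```

The point is only that a CDB is a small fixed-layout byte string; the real drivers fill in the same fields through the SCSI layer rather than by hand.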
These are some of the more important commands; the actual command list you can go and read in the spec, and it's a lot more elaborate than this. So I've been talking about logical units all this time. I guess it's time to define what they are. Every UFS device contains a set number of logical units, and you can divide the complete UFS capacity among them. For example, I'll say this logical unit gets 100 MB, and the next one gets, let's say, 400 MB of the complete device storage. So the device capacity can be mapped across all of these logical units. And it's not necessary, if eight logical units are defined, to assign memory to all of them. You can have just some of them enabled. Usually in our systems we'll have one boot logical unit in which all the boot images are present, and a root file system logical unit.
The good thing is that this is not the only configuration that you can do per logical unit. There are a bunch of other properties of logical units that you can configure. Things like write protection, or whether it's a boot logical unit or not, which means that you can do a boot operation on that logical unit. A boot operation is basically used by the internal SoC ROM to get the bootloader images while your device is undergoing enumeration. During a boot operation the full UFS stack only comes up halfway and only enables the boot LUN, so you can fetch the bootloader images much faster. That's correct. It's similar to eMMC. But in eMMC you have to predefine what address, what image you are actually going to send. The ROM just needs to send one command, and it's predefined what image the device sends back. In this case, you can actually choose. There is a proper read command with which you can choose: okay, this is the address I want to go and read.
The next property of LUNs that you can set in UFS is memory type. There are four memory types that are possible. Default is your normal root file system memory type: you can read, write and modify memory, it's the default type. System code is used for memory that is not written to very often. For example, boot memory: you only need to go and update the firmware and the bootloader images once in a while. So it's optimized for reading and you'll have much faster reads. The next one is non-persistent. If you set your logical unit to be non-persistent, it's basically volatile. The application is that you can have it as a swap space to extend your virtual memory space from the RAM. A few more properties: there is priority access, so you can set it such that some logical units have a higher priority over others. Tasks assigned to those logical units will get done faster and will report back faster than others. And the last one is RPMB. For people familiar with eMMC, this is the replay protected memory block, which basically means you need a cryptographic algorithm to go and access this memory, and you need to provide hashes and keys to access this type of memory.
Going ahead. Now, the spec limits the number of LUNs to 32, but in the devices that I have used, and in most devices, the number is usually eight. Aside from these logical units, you can have four more well-known logical units, which are sort of special, and each of them has a separate function. These logical unit numbers are special: any REPORT LUNS logical unit will have this number, and any UFS device logical unit will have this number. So the first one is REPORT LUNS. This is the logical unit you need to address your REPORT LUNS SCSI request to, and it will return which LUNs are active on the device and what their numbers are.
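As a rough illustration of what comes back from a REPORT LUNS request, here is a small Python sketch that parses the standard SCSI response layout: a 4-byte big-endian list length, 4 reserved bytes, then one 8-byte entry per LUN. The function name and the sample buffer are made up for this example. Treating a first byte of 0xC1 as "well-known logical unit" is my understanding of the SCSI addressing format, and the value 0x81 in the sample is only a placeholder number, so verify both before depending on them.

```python
import struct


def parse_report_luns(buf: bytes) -> list:
    """Return a list of (is_well_known, lun_number) tuples.

    Layout assumed here: bytes 0-3 hold the LUN list length in bytes
    (big-endian), bytes 4-7 are reserved, and each following 8-byte entry
    carries the LUN number in its second byte for simple single-level LUNs.
    """
    (list_len,) = struct.unpack_from(">I", buf, 0)
    end = min(8 + list_len, len(buf))
    luns = []
    for off in range(8, end - 7, 8):
        entry = buf[off:off + 8]
        is_well_known = entry[0] == 0xC1  # assumption: W-LUN address method
        luns.append((is_well_known, entry[1]))
    return luns


if __name__ == "__main__":
    # Hypothetical response: two normal LUNs (0 and 1) and one well-known LUN.
    sample = struct.pack(">I4x", 24)
    sample += bytes([0x00, 0x00, 0, 0, 0, 0, 0, 0])
    sample += bytes([0x00, 0x01, 0, 0, 0, 0, 0, 0])
    sample += bytes([0xC1, 0x81, 0, 0, 0, 0, 0, 0])
    for well_known, number in parse_report_luns(sample):
        print("W-LUN" if well_known else "LUN", hex(number))
```

In the kernel this parsing is done by the SCSI mid-layer during scanning; the sketch is only meant to show how little data the response actually contains.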
Next, you have the UFS device LUN. Any inquiry request that you need to send goes to this, and you can basically use it to do a lot of configuration inside the device. Next is the boot LUN: like I just said, you can set any LUN to be a boot LUN, and once you set it as a boot LUN, you can access it using this logical unit number. Next is RPMB, which is the security-protocol-protected memory.
Moving ahead, this is a representation of how the Linux SCSI driver brings up a UFS device. The first thing it does after detecting a UFS host is send the REPORT LUNS command. After getting the response, it knows: okay, these logical unit numbers are active and I can go and start reading and writing from them. But before that, it needs to know the capacity of each of these logical units and basically save it on the software side. Once these steps are done, your logical units will show up as SCSI devices under /dev, like sda, sdb, things like that. After this, you are free to do read and write accesses, depending on how your logical units are configured.
Going down the stack to the UFS interconnect layer. This is implemented by the MIPI M-PHY physical layer standard and the MIPI UniPro data link standard. You can go to the MIPI website to read more about these standards. What I'm going to talk about is what the interconnect layer defines, which is what signals are present on the device. It's just four signals. There is a reset signal, which is used to reset the device and bring it back to a known state, the reference clock, and a pair of input and output differential signals. So DIN_t is the higher voltage of the differential signal, and _c is the lower voltage. And you can have up to two of these lanes: a pair of DIN and DOUT, and another pair of DIN and DOUT. You can have up to two of these as defined in the standard.
And the physical layer also defines a bunch of high-speed gears that the UFS device can go up to. At HS gear one, you can go up to a maximum rate of 182 MBps. At gear two, you can go to 364 MBps. At gear three, you can go to 728 MBps. And at the highest gear, four, about 1.4 GBps.
Next, what is defined inside the interconnect layer is the UFS power modes. Because it's a low-power device and you're supposed to use it in portable low-power devices, it needs to have some low-power modes that it can go into. The first mode that you will usually have is the active mode. This is when your UFS device is basically processing some request: you have sent a SCSI command to it, and it's processing the request to send a response back to you. In active mode, you can have up to 16 levels of power consumption. Level 15 has the highest power consumption and hence performance, and 0 has the lowest power consumption and performance. Why does this exist? Because you might have a use case where most of the time your device is running off battery. At that point of time, it should be consuming less power while active. But at other points of time, when you go and actually plug in your device, it has access to a greater current and greater power, and it can go to a higher active ICC level and do your requests faster, basically. After your device is done doing any background operations or replying to your request, it automatically goes into idle mode, which is a lower power mode. There is no software intervention required here. The third one is your UFS sleep mode. This consumes much less power than idle mode. At this point, you can't access any of the logical units, you can't send any commands, you can't access data, and it's possible to remove some of the power from the UFS device.
The last mode is the power down mode. This is at an even lower power level than sleep mode. At this point, if you had any volatile, non-persistent storage inside the UFS device, that will be lost if you go down to power down mode, and you can remove all power from the device. Now, the sleep and power down modes match very well with the Linux power management ops, and therefore you can basically have callbacks for runtime suspend, runtime resume, suspend and resume in your particular driver. If you assign these callbacks in your particular driver, the UFSHCD platform driver will automatically switch the device to sleep mode or power down mode depending on the situation. As I have already said, all of these power modes are set using the start stop unit SCSI command.
Moving ahead, now the UFS transport layer. This is the layer which is unique to the UFS spec itself. The UFS spec took the SCSI protocol for its upper layer and the MIPI UniPro and M-PHY layers for its interconnect layer, but this one is sort of unique to UFS. The minimum unit of transaction in UFS at the transport layer is called a UFS protocol information unit, or UPIU. At this layer, you again have an initiator and a target, a host and device configuration. There are different types of UPIUs: you have them for commands, for data operations, for task management operations and for queries to the device. Each SCSI transaction that you get from the application layer converts to a command UPIU, zero or more data UPIUs and one response UPIU. Each UPIU contains a 12-byte header at the beginning, which is common to all UPIUs, and the rest of the UPIU structure depends on which actual UPIU you're trying to send. So on the left side, I'll keep the UPIU header constant.
I'll highlight some particular features of it as we go. The header basically corresponds to a lot of the features that you have in the transport layer, so it's good to have it on one side. This slide explains the types of UPIUs that exist in the UFS transport layer. There are different UPIUs for the initiator and the target. An initiator will send a NOP OUT UPIU and receive a NOP IN UPIU, and similarly for all of the others. NOP OUT and NOP IN are used to check whether you have a connection to the device or not. It's basically used for debugging. If you have a connection to a device and you're able to access the LUN, you will get back a NOP IN UPIU, which is a good indicator that all your lower-level interconnect layers have come up properly and are functional. Next is the command UPIU. Your SCSI CDB will be embedded into this command UPIU and sent to the UFS device, and correspondingly you should get a response from the target. Next is the data out UPIU, which is to write data to the device. The data in UPIU is to read data from the device. That's very simple. Task management request and response: in each of the logical units you will have a task queue. Once you have sent a data out or data in command, the host can go ahead and cancel that task or clear the whole task queue. Those kinds of things are task management operations and have separate UPIUs. The next important ones are query request and query response. These are used for the configuration descriptors inside the UFS device.
This is an example of a UTP read transaction. You got a read CDB from the application layer and you convert it into a command UPIU. You get back a bunch of data in UPIUs, depending on how much data you want to read, and it ends with a response UPIU.
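To give a feel for the 12-byte header that every UPIU starts with, here is a minimal Python sketch that packs one. The transaction codes are the ones I know from the Linux UFS headers; the field order (transaction type, flags, LUN, task tag, then six bytes left as zero here, then a big-endian 16-bit data segment length) is my reading of the spec, and `pack_upiu_header` is a made-up helper name, so treat the layout as an assumption to be checked against the specification.

```python
import struct

# Transaction codes, initiator to target.
NOP_OUT = 0x00
COMMAND = 0x01
DATA_OUT = 0x02
TASK_MGMT_REQ = 0x04
QUERY_REQ = 0x16

# Transaction codes, target to initiator.
NOP_IN = 0x20
RESPONSE = 0x21
DATA_IN = 0x22
TASK_MGMT_RSP = 0x24
READY_TO_TRANSFER = 0x31
QUERY_RSP = 0x36

# Command flags (assumed values): the direction of the data phase.
FLAG_READ = 0x40
FLAG_WRITE = 0x20


def pack_upiu_header(trans_type: int, flags: int, lun: int, task_tag: int,
                     data_segment_len: int = 0) -> bytes:
    """Pack a simplified 12-byte UPIU header.

    Bytes 4 to 9 (command set type, function, response, status, EHS length,
    device information) are left as zero in this sketch.
    """
    return struct.pack(">BBBBBBBBBBH",
                       trans_type, flags, lun, task_tag,
                       0, 0, 0, 0, 0, 0,
                       data_segment_len)


if __name__ == "__main__":
    # Header for a read command to LUN 0 using task tag 5.
    header = pack_upiu_header(COMMAND, FLAG_READ, lun=0, task_tag=5)
    print(len(header), header.hex())
```

For a read transaction as described above, the host would send one header like this followed by the CDB, and then expect `DATA_IN` UPIUs and a final `RESPONSE` UPIU carrying the same task tag.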
Now, because your host will usually be much, much faster than the target device, a write command needs to be flow controlled with ready-to-transfer UPIUs. You send a command UPIU corresponding to a write CDB, and you need to wait for a ready-to-transfer UPIU before you can send out your data out UPIUs, because the target needs to take some time to allocate that memory and be ready to receive your data. And again, in the end you'll get a response UPIU.
Moving on to how the UFS driver represents these UPIUs in memory and how to actually do a transaction. These are the data structures you have to be aware of. The first one is the UTP transfer request descriptor, which points to a UTP command descriptor, which is a table of a command UPIU, a response UPIU and a physical region descriptor table. So whenever you want to send a SCSI command, you go and allocate memory for a command UPIU and a response UPIU, and if it's a data command, you need to give this physical region descriptor table some data buffers that you have allocated. This whole structure represents one SCSI request, and you can have up to 32 SCSI requests queued at one point of time, which corresponds to the task tag entry in the UPIU header. So at one point of time the different applications can go and queue up to 32 such requests. And also, depending on how many logical units you have, you can have task management UPIUs for doing these tasks. Having already queued a task, you can abort it, you can abort the whole task set, you can clear everything, you can do a logical unit reset, and you can basically find out whether or not a task is queued in this queue.
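The flow-controlled write described in this part of the talk can be sketched as a simple sequence generator. This is only a toy model in Python: the function name, the tuple format and the idea that the target grants a fixed chunk size per ready-to-transfer are assumptions for illustration, and on real hardware the host controller handles this exchange itself using the physical region descriptor table.

```python
def write_sequence(total_bytes: int, rtt_chunk: int) -> list:
    """Return the UPIU exchange for one write as (sender, upiu, offset, length).

    The host sends a COMMAND UPIU, then for every chunk it waits for a
    READY_TO_TRANSFER from the target before sending the matching DATA_OUT.
    The exchange ends with a RESPONSE UPIU from the target.
    """
    if total_bytes <= 0 or rtt_chunk <= 0:
        raise ValueError("sizes must be positive")

    sequence = [("host", "COMMAND", 0, total_bytes)]
    sent = 0
    while sent < total_bytes:
        length = min(rtt_chunk, total_bytes - sent)
        # Target signals it has buffer space for this slice of the data.
        sequence.append(("target", "READY_TO_TRANSFER", sent, length))
        # Only now may the host push that slice.
        sequence.append(("host", "DATA_OUT", sent, length))
        sent += length
    sequence.append(("target", "RESPONSE", 0, 0))
    return sequence


if __name__ == "__main__":
    for step in write_sequence(total_bytes=10000, rtt_chunk=4096):
        print(step)
```

A read needs no such handshake in this model, since the host is assumed to be fast enough to accept data in UPIUs as they arrive.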
So the complete structure that you need to have before you can start your UFS transactions is 32 UTP command descriptors, 32 UTP transfer request descriptors, each of which holds the base address of its command descriptor, and also the UTP task management request descriptors. Once you have this whole data structure set up in your memory, you can go ahead with your commands from the SCSI layer. This is some information about the host controller interface.
So the most important part of the UFS subsystem is: what logical units are there inside the device, and how do I configure them? How do I say, okay, this logical unit should have this much allocated memory, and it should be write protected, and it should be a boot logical unit? There are a bunch of configuration structures that exist inside the device that can be read using a query request UPIU. There are three types of configuration structures inside the device: descriptors, flags and attributes, and all of these basically define different things about the device and about the various logical units that are inside it. The main descriptor that we need to worry about is the configuration descriptor. You can read the configuration descriptor from the device, you can modify it, for example by allocating different types of memory to different logical units, and you can write it back to the device.
So, coming to the kernel implementation: you can find all the source code in drivers/scsi/ufs, there's a documentation file under Documentation/scsi for UFS, and you can find the device tree bindings inside Documentation/devicetree/bindings/ufs. The device tree bindings are very simple because this is a very highly standardized interface.
Mostly what you need is the base address of your host controller registers, a top-level interrupt property and a few others which depend on where you're actually putting your host controller. After you have written this device tree node, all you need to do is call ufshcd_pltfrm_init from your probe and it's done. It's as simple as that. Your UFS device should come up.
So this is an example of a UFS device coming up. The UFSHCD has been detected as a SCSI host. First it came up in gear one with only one lane active, but it quickly switched to gear three and the full bus width of two lanes. It detected two well-known logical units, but neither of them was a boot logical unit, and it detected the normal logical units: sda, which has 32 MB of space, and sdb, which has 32 GB of space allocated to it, plus a bunch of other configuration for both of them.
Now, once you have a UFS device, how do you access it from user space? The block SCSI generic (bsg) layer exposes your device as a bsg device, basically. It's under /dev/bsg as ufs-bsg, and it exposes an ioctl with the request code SG_IO and the data structure sg_io_v4. If you fill in this data structure with the request UPIU you want, a response UPIU and a bunch of other fields, you can go ahead and write to descriptors, write interconnect layer commands, write to flags, you can do all of those things. Now, in the wild the only tool I have been able to find is ufs-utils, and you can use a very simple command to read any descriptor, attribute or flag using this application. For example, if you want to read the configuration descriptor, you say ufs-utils, descriptor, and you give the path to the device. Now this is not very user friendly, because all this will give you is a data file.
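To show what making sense of such a raw dump involves, here is a minimal Python sketch that decodes just the first two bytes of a descriptor. It assumes, as I understand the UFS descriptor format, that byte 0 is the descriptor length and byte 1 is the descriptor type (IDN); the IDN values in the table are the ones I know from the Linux UFS headers, and the function name and sample bytes are made up for this illustration.

```python
# Descriptor identification numbers (assumed from the Linux UFS headers).
DESCRIPTOR_IDN = {
    0x00: "Device",
    0x01: "Configuration",
    0x02: "Unit",
    0x04: "Interconnect",
    0x05: "String",
    0x07: "Geometry",
    0x08: "Power",
    0x09: "Device Health",
}


def parse_descriptor_header(raw: bytes) -> dict:
    """Decode the two-byte header at the start of a descriptor dump."""
    if len(raw) < 2:
        raise ValueError("descriptor dump too short")
    length, idn = raw[0], raw[1]
    return {
        "length": length,
        "idn": idn,
        "name": DESCRIPTOR_IDN.get(idn, "unknown or reserved"),
        # True when the dump holds at least as many bytes as the header claims.
        "complete": len(raw) >= length,
    }


if __name__ == "__main__":
    # Hypothetical dump: 0x90 bytes long, type 0x01, payload zeroed out.
    sample = bytes([0x90, 0x01]) + bytes(0x8E)
    print(parse_descriptor_header(sample))
```

With a real dump you would read the file in binary mode, for example `open(path, "rb").read()`, and pass the bytes in; decoding the remaining fields needs the per-descriptor tables from the specification.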
So if you hex dump that data file you will have the configuration descriptor, and you have to basically understand which bytes represent what in order to actually change it. We really need to add a very user-friendly way of configuring these descriptors, and maybe I'll do it if I have some time later this year.
Coming to the U-Boot implementation: I added this slide recently. My U-Boot implementation got merged this Thursday, which is nice, and you should see it in the 2020.01 release. It contains only a very basic implementation. If you have a UFS device on your board, you can send a ufs init command, which will go ahead and initialize the whole stack, detect all the logical units and register them as SCSI devices, and then you can do whatever you want with those SCSI devices using the scsi commands. What it does not contain is commands to access the descriptors, flags or attributes, which I hope to add later on, but patches are welcome from you guys.
Okay, I think we are done. So how do you write these patches? First go and reference the Universal Flash Storage 2.0 specification on the jedec.org website, and also the host controller interface specification, and of course you can see the source code. You know where it is. Any questions? Yeah. Yeah, okay. I'm sorry. Yeah, you can have a scatter-gather list assigned to this PRDT, so you can point it to the base address of a scatter-gather list. Yeah. I'm sorry? Yes, you can say that it's reliable in the sense that if you're writing to a block and you have a power fail, it should either not have written to the block or have the complete data written to that block. It should not have some halfway-written corrupted data anywhere. Other than that, I don't really think it's more reliable than that. Yeah. Any other questions? Smart beta? Any other questions?
I don't really remember the kernel performance, but in U-Boot we have measured up to 700 MBps. Our device comes up to high-speed gear 3, and I've measured up to the maximum theoretical performance, at least in U-Boot. I can get back to you on the kernel performance. No, I have not done that analysis yet. I can get back to you. Any other questions? In that case, you can go.