Good morning, everyone. Welcome. My name is Vignesh. I'm from Texas Instruments India, and I'll be presenting on the SPI NOR subsystem. To introduce myself, I am part of the Linux team that works on supporting various TI SoCs in the mainline kernel. I work on peripheral drivers like QSPI, UART, touchscreen, and USB. This presentation is mainly based on my experience of getting QSPI to work on TI platforms with the mainline kernel. So let's begin.

This is the agenda for today's presentation. I'll be talking about SPI NOR flash devices and their types, communicating with SPI NOR flashes, the SPI NOR framework, the types of SPI NOR controller drivers supported under the framework, writing a controller driver, the ongoing development work, and what's missing today in the SPI NOR framework.

So what is a SPI NOR flash? It is a non-volatile solid-state storage medium whose storage cells behave like a NOR gate, hence the name NOR flash. NOR flashes are available with a parallel interface or a serial interface. Serial NOR flashes are typically interfaced to the SoC via the SPI bus, hence they are called SPI NOR flashes. These flashes have a reduced pin count compared with a normal parallel NOR flash.

This table compares SPI NOR with NAND and eMMC, which are the other two most common non-volatile storage media that you find on embedded boards. In terms of capacity, SPI NOR flashes are in the range of megabytes, whereas NAND and eMMC are available in gigabytes. NOR flashes have somewhere around 1 to 8 I/O lines, but NAND and eMMC may have up to 16 I/O lines. Comparing read speed, SPI NOR has fast random access: the time taken to access the first byte is usually on the order of nanoseconds, whereas NAND and eMMC have random read access times on the order of microseconds. In terms of writes, though, NAND and eMMC are usually faster than SPI NOR.
NAND technology as such is not very reliable: it suffers from random bit flips and poor write endurance, hence we need ECC and bad block management, either in software or in hardware. eMMC as such needs tuning to support higher speeds of operation. SPI NOR has no such software overheads. This reliability, fast random access, and reduced pin count make SPI NOR an ideal boot medium, and it is most commonly the primary boot medium, or the secondary or backup boot medium, in embedded devices.

This diagram shows a typical SPI NOR flash connected to a SPI controller. The top three signals are the common SPI signals: a clock running from the controller to the flash, master out slave in (MOSI), and master in slave out (MISO). The MOSI line is used by the controller to send data to the flash, and the MISO line is used by the flash to send data back to the master. There's a chip select line to select the appropriate chip to talk to. The write protect and hold lines are mostly flash specific. Write protect makes the NOR flash read-only, so that it does not respond to write or erase commands. The hold line is used to pause a transaction without actually deselecting the NOR flash, so that you can pause and then resume the transaction.

This diagram shows a multi-I/O flash, where four bidirectional I/O lines connect the controller and the flash. We therefore call the flash a quad SPI flash and the controller a quad SPI controller. In quad mode, the write protect and hold lines double as the IO2 and IO3 lines, making four I/O lines. There are also NOR flashes with up to eight I/O lines, which are called octal I/O flashes, and a NOR flash can also work with just two I/O lines, IO0 and IO1, in bidirectional mode; such flashes are called dual I/O flashes. So, to summarize, there are dual I/O, quad I/O, and octal I/O flashes.

A little introduction to the SPI NOR hardware itself. A SPI flash is composed of sectors and pages.
The sector is the smallest erasable block. It may be 4 KB, 32 KB, 64 KB, or 256 KB in size. Sectors are subdivided into pages, which may be 256 bytes or 512 bytes. A page represents the maximum number of bytes that can be programmed in a single write operation. Before we do any write or erase operation, we have to set the write enable latch inside the flash. This is done by sending the write enable command, which has to be sent for every sector or every page in turn when you are doing a write operation.

Most flash devices support the read ID command, which is used to discover the flash. The read ID command is a JEDEC standard. On receiving the read ID command, the flash responds with its manufacturer ID, device ID, and unique ID. This helps in identifying the flash uniquely.

A SPI transaction between a SPI master and a NOR flash has four phases: the command phase, the address phase, the wait phase, and the data phase. During the command phase, the master sends out the one-byte opcode, which indicates whether it's a read, a write, an erase, or an access to a flash register such as the configuration or status registers within the flash. This is followed by the address phase, which is three to four bytes long depending on the opcode and the flash configuration. If it's a read operation and the SPI bus frequency is greater than 50 MHz, or if it's a multi-I/O operation like a dual I/O, quad I/O, or octal I/O read, there is an additional wait phase, which lasts anywhere from eight cycles up to the number of cycles given in the flash data sheet. During this phase there is no data exchange between master and slave, but the clock is running; hence these clock cycles are called dummy clock cycles. During this phase, the flash prepares itself to send data back to the controller. Finally, in the data phase, depending on whether it's a read or a write transaction, the data is sent either from the flash to the master or from the master to the flash.
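As a rough illustration of the four phases and of the page limit on writes, here is a minimal Python sketch. It assumes a single I/O line, so eight dummy clock cycles occupy one byte on the wire. The helper names `build_header` and `split_into_pages` are hypothetical, and the 0x0B fast-read opcode with eight dummy cycles is only a commonly used value; the data sheet of the actual flash is authoritative.

```python
PAGE_SIZE = 256  # bytes; the talk mentions 256- or 512-byte pages


def build_header(opcode, address, addr_bytes=3, dummy_cycles=0):
    """Return the command, address and wait phases as bytes (1-wire bus assumed)."""
    if addr_bytes not in (3, 4):
        raise ValueError("address phase is 3 or 4 bytes long")
    if dummy_cycles % 8:
        raise ValueError("this sketch only handles whole dummy bytes")
    header = bytes([opcode])                       # command phase: one opcode byte
    header += address.to_bytes(addr_bytes, "big")  # address phase, MSB first
    header += bytes(dummy_cycles // 8)             # wait phase: clock runs, no data
    return header


def split_into_pages(address, data, page_size=PAGE_SIZE):
    """Yield (address, chunk) pairs so that no chunk crosses a page boundary."""
    offset = 0
    while offset < len(data):
        # Bytes left in the page that contains the current address.
        room = page_size - (address + offset) % page_size
        chunk = data[offset:offset + room]
        yield address + offset, chunk
        offset += len(chunk)
```

The data phase would follow the header on the bus. In a write, each chunk produced by `split_into_pages` would be sent as its own page program transaction, each preceded by a write enable command.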
Moving on, we'll talk about the SPI NOR controllers themselves. For ease of understanding, the controllers that are supported in Linux can be classified into three types: traditional SPI controllers, SPI NOR controllers, and specialized SPI controllers.

The traditional SPI controllers provide direct access to the SPI bus. That is, you can send any arbitrary data out on the SPI bus to a SPI slave. So these controllers can talk to any type of SPI device, whether it is a SPI flash or a normal SPI device like a touchscreen. These SPI controllers don't usually have a very deep FIFO and hence cannot support large bursts of read or write operations.

The second type of controllers are SPI NOR controllers. These controllers are aware that the slave they are talking to is a SPI flash, and they only support talking to SPI flash slave devices. They are also aware that the communication protocol has command, address, and data phases. SPI NOR controllers provide low-latency access to the flash, either via a memory-mapped interface or via some other accelerator interface. With the memory-mapped interface, the entire flash device appears to be memory mapped inside the SoC address space, and the CPU can just do a memcpy from that address range to get the data from the flash. These controllers also support read prefetch and have internal hardware buffers to increase the amount of data that can be fetched in one go. But they don't provide direct access to the SPI bus, which is why these controllers cannot talk to SPI devices other than flashes.

Then there are specialized SPI controllers, which are like a hybrid of the traditional controllers and the SPI NOR controllers.
They have an interface similar to the traditional controller, as well as an additional interface that provides accelerated access to SPI NOR devices, like the memory-mapped interface that I just talked about.

Coming to the SPI NOR framework. The SPI NOR framework was introduced in the kernel to support the second type of controllers that I described, the dedicated SPI NOR controllers, which only talk to SPI flash devices. It was merged in v3.16. It lives under the memory technology devices (MTD) subsystem, and this is the path to the source code. It was derived from the pre-existing m25p80 flash driver code, which supported all flashes on the SPI bus. That driver had code both to support the flash and to talk to the SPI core; the flash-specific part was refactored out to create the SPI NOR framework.

So what was the need for the SPI NOR framework? The main reason was to support controllers that only talk to SPI flash devices. These flash controllers need to know information about the flash they are talking to, and there was a need for a generic framework that could supply this information.

We'll now go into each type of SPI controller in detail. This is the traditional SPI controller. These controllers are quite simple. They have a TX FIFO, an RX FIFO, and a shifter unit. The CPU or DMA writes data into the TX FIFO, and whatever is written is just shifted out onto the SPI bus; similarly, the data that is received from the flash device is read out of the RX FIFO. Since there is direct access to these FIFOs, the CPU can implement any type of protocol, and the same gets shifted out on the bus. This makes it possible to communicate with any type of flash device, or any type of SPI device like a touchscreen. There is no notion of a command phase, address phase, or data phase anywhere here.
This represents the kernel stack used to access a SPI flash through the traditional SPI controller that I just showed you. At the top we have the MTD framework, the memory technology devices framework, which abstracts all types of raw flashes like NAND, NOR, and similar devices. MTD abstracts these devices and exposes them as character devices or block devices to user space. User space utilities like mtd-utils, or flash-based file systems, sit on top of these device interfaces to read, write, and access the flash. MTD also abstracts flash-specific properties like erase sectors and pages, and does the ECC handling in the case of, say, NAND flash. It provides wear leveling and bad block handling using Unsorted Block Images (UBI), which is the basis for the UBIFS file system. It handles partitioning of the flash storage space based on either command line arguments or device tree data. All the MTD devices in a system can be listed by reading the MTD procfs entry: you can do `cat /proc/mtd` and it lists all the devices.

Under the MTD framework we have the SPI NOR framework itself. The SPI NOR framework is responsible for implementing the SPI NOR specific abstractions: the read, write, and erase operations of the flash, detecting the connected flash, and configuring the flash to operate in the appropriate mode. It provides flash-specific information like the erase size and page size to the MTD layer, so that it can be used by the file systems. It also supports the dedicated SPI NOR controllers, which need to know flash-specific information like which opcode to use, the number of address bytes, the number of dummy cycles, and so on. And finally, it implements support for multi-I/O flash devices.

Below the SPI NOR framework we have the m25p80 driver, which is basically a generic driver to access flash devices on the SPI bus. The m25p80 driver acts as a translation layer between the SPI NOR framework and the SPI core.
The m25p80 driver implements the SPI NOR interfaces and, based on the information that SPI NOR provides, generates SPI transfer structures that encompass the command, address, and data phases, and then passes them as a SPI message object to the SPI core. All communication with the SPI core is always in terms of SPI messages. So m25p80 acts as a translation layer and generates those SPI messages based on the parameters supplied by the SPI NOR framework. The messages submitted by m25p80 land in the SPI core. The SPI core validates, queues, and sends the SPI messages from the upper layer to the controller drivers. The SPI controller driver writes data to the TX FIFO, and whatever data is received via the RX FIFO is sent along the same path back to the SPI NOR and MTD frameworks.

This diagram shows the second type of SPI controller, that is, dedicated SPI NOR controllers with a memory-mapped I/O interface. The entire flash appears as a memory-mapped region at a predefined address range specified by the SoC. The CPU first configures the registers within the SPI NOR controller IP with flash-specific properties: the opcode to use to read from the flash, the opcode to use to write to the flash, the number of address bytes, and the number of dummy cycles. All this information is pre-configured in the IP registers. Then the CPU does a memcpy operation to read data from the flash. This transaction lands in the SPI command generator block, which generates the appropriate SPI transaction on the SPI bus based on the information in the IP registers. The data received from the NOR flash is forwarded via the memory-mapped interface.
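The memory-mapped flow just described can be modeled in a few lines of Python. This is only a toy model under stated assumptions: the class `MemoryMappedNorController` and its methods are hypothetical, a `bytes` object stands in for the flash array, and the dummy cycles are assumed to be sent on a single wire so that eight cycles equal one byte.

```python
class MemoryMappedNorController:
    """Toy model of a dedicated SPI NOR controller with a memory-mapped window."""

    def __init__(self, flash_contents):
        self.flash = flash_contents  # stands in for the NOR flash array
        self.regs = None             # IP registers, unset until configured
        self.bus_log = []            # headers the command generator put on the bus

    def configure(self, read_opcode, addr_bytes, dummy_cycles):
        """Pre-configure the IP registers with flash-specific read parameters."""
        self.regs = {
            "read_opcode": read_opcode,
            "addr_bytes": addr_bytes,
            "dummy_cycles": dummy_cycles,
        }

    def mmio_read(self, offset, length):
        """Model a CPU read from the memory-mapped window."""
        if self.regs is None:
            raise RuntimeError("IP registers must be configured first")
        # The command generator builds the SPI header from the registers alone;
        # the CPU never formats opcode, address or dummy bytes itself.
        header = bytes([self.regs["read_opcode"]])
        header += offset.to_bytes(self.regs["addr_bytes"], "big")
        header += bytes(self.regs["dummy_cycles"] // 8)
        self.bus_log.append(header)
        return bytes(self.flash[offset:offset + length])
```

The point of the model is the division of labor: software configures the registers once, and every later read is just an offset and a length.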
Since the generation of SPI transactions on the bus is handled by the hardware, it is possible for the hardware to make 100% utilization of the available SPI bus cycles, so you can achieve the maximum possible data rate on the SPI bus. It need not be a memory-mapped interface: some controllers just provide a big internal SRAM or FIFO, the data received from the flash is stored in that FIFO, and the CPU or DMA can read it in large bursts, which makes effective use of the available interconnect cycles as well.

This kernel stack shows how dedicated SPI NOR controllers are supported in the kernel. The SPI NOR layer, instead of taking the traditional route, now talks directly to the SPI NOR controller driver. It provides all the information required about the flash, and the controller driver is responsible for programming the IP registers. The controller driver also provides a way to write to and read from the flash registers, and it implements interfaces to read data from the flash, either via the memory-mapped interface or via some internal hardware buffers.

This diagram shows a specialized SPI controller, which is the third type of controller that I talked about. These controllers provide two interfaces: a SPI interface and a memory-mapped interface. Using the SPI interface, which is marked here as the direct access path, the CPU can directly access the TX and RX FIFOs. So you can talk to any type of SPI device, not just a flash, using this direct access path. But the controller also provides a memory-mapped interface, which helps you read data from the flash at a faster rate. So read transactions to the flash go through the memory-mapped interface, all write and erase transactions can go through the normal SPI mode, and you can also access other SPI devices using the normal direct access path.
This shows the stack for accessing such SPI controllers. As I said before, the normal write and erase transactions go through the m25p80 translation into the SPI core, and the actual SPI controller driver resides under the SPI core itself, so this goes via the normal path. But for read transactions, m25p80 calls the SPI flash read API, which is a special API provided by the SPI core to talk to SPI controller drivers with memory-mapped I/O interfaces. The SPI flash read API is supplied with a SPI flash read message structure, which is similar to the spi_nor structure and contains all the information the controller driver needs to access the specific flash. So the writes go through the normal interface, but the reads go through the SPI flash read API and the implementation of that API by the SPI controller driver.

So if you have a SPI NOR controller driver, where do you put it? Whether it goes under the SPI NOR framework or under the SPI framework is decided by how you want to support devices on your SPI bus. If the SPI controller provides direct access to the bus and there is no accelerated interface as such, you put the driver under the SPI framework. If the controller IP only supports talking to SPI flash devices and cannot talk to other types of devices, you put it under the SPI NOR framework. And if you have a controller that has both the SPI interface and the memory-mapped interface, and your board or SoC has both types of devices on the SPI bus, that is, a flash device as well as a normal SPI device, then you put the controller driver under the SPI framework, implement the SPI-related callbacks, and also implement the SPI flash read API, so that flash reads go via this interface while other SPI devices are accessed via the normal SPI framework.

So how do you write a SPI NOR controller driver?
The SPI NOR framework expects at least these four APIs to be implemented: read_reg, write_reg, read, and write. You implement read_reg to read registers from the flash: the status register, the configuration registers, the flash discovery ID (that is, to send the read ID command), the Serial Flash Discoverable Parameters, and so on. Then you implement write_reg, which is used to write to these registers, say to send the write enable command or to put the flash into quad I/O mode; all write transactions to the flash configuration registers go through it. Finally, you implement the read and write APIs, which are used to read and write the actual data in the flash, either via the memory-mapped interface or via any other accelerated interface that the SPI NOR controller provides. The spi_nor struct that is passed to these callbacks contains the details required to access the flash.

In the probe of the controller driver you call spi_nor_scan. spi_nor_scan makes the SPI NOR framework send a read ID command, that is, the JEDEC read ID command, to discover the manufacturer ID and device ID of the flash. There is a table within the SPI NOR framework that gives, for a given flash device ID, the size of the flash, its erase size, sector size, and page size, whether or not it supports quad, dual, or octal mode, and which opcodes need to be used for that specific flash. So when spi_nor_scan is called, the SPI NOR framework populates the spi_nor struct with all the required details based on that table. This helps the controller driver, whenever it wants to do a read or write operation, to configure the IP registers that require the read or write opcodes to communicate with the flash.
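The discovery step can be sketched as a table lookup keyed on the ID bytes. The following Python is a simplified model, not the kernel's code: `FlashInfo`, `FLASH_TABLE`, and `scan` are hypothetical names, the table entries are placeholders rather than real parts, and `read_reg` stands in for the controller driver's register read callback. The one real-world constant is 0x9F, the JEDEC read ID opcode.

```python
from dataclasses import dataclass

JEDEC_READ_ID = 0x9F  # standard read ID opcode


@dataclass(frozen=True)
class FlashInfo:
    name: str
    sector_size: int  # erase size in bytes
    n_sectors: int
    page_size: int
    quad_read: bool

    @property
    def size(self):
        return self.sector_size * self.n_sectors


# Hypothetical entries: these IDs and names are placeholders, not real parts.
FLASH_TABLE = {
    0xAA0018: FlashInfo("example-16m", 64 * 1024, 256, 256, quad_read=True),
    0xBB0019: FlashInfo("example-32m", 64 * 1024, 512, 256, quad_read=False),
}


def scan(read_reg):
    """Identify the flash via read_reg(opcode, length) and look up its parameters."""
    raw = read_reg(JEDEC_READ_ID, 3)  # manufacturer ID plus two device ID bytes
    jedec_id = int.from_bytes(raw, "big")
    try:
        return FLASH_TABLE[jedec_id]
    except KeyError:
        raise LookupError(f"unrecognized flash ID {jedec_id:06x}") from None
```

A controller driver model would call `scan` once per chip select and then use the returned sizes and capabilities to program its registers.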
Then you call mtd_device_register, which registers the discovered flash with the MTD framework and results in the creation of the /dev/mtd interface. If there are multiple SPI NOR devices connected to the controller at different chip selects, you have to call spi_nor_scan for each chip select and register each of the devices.

This is an example of instantiating a SPI NOR controller driver using the device tree, taken from the Cadence QSPI driver. This is the controller node; its compatible string says it is compatible with the Cadence SPI NOR controller. You can see there are two reg entries. The first reg entry corresponds to the IP registers, where you configure flash-specific data and other IP settings. The second address range represents the memory-mapped I/O interface: this is the address range from which the CPU or DMA does the memcpy operation, which gets translated into SPI NOR messages that go out on the SPI bus, and you get back the data that is present on the SPI flash.

Each of the SPI flash devices appears as a child node of the controller node. You say compatible = "jedec,spi-nor" if the SPI flash supports the read ID command, and based on the flash that is discovered, the SPI NOR framework takes the appropriate actions. The property reg = <0> indicates that this flash is present at chip select zero, and spi-max-frequency dictates the SPI bus rate at which communication happens between the flash and the controller. Similarly, you can have other flash devices as well. If the flash doesn't support JEDEC read ID, you have to supply the name of the flash as one of the parameters to the spi_nor_scan API, and if that flash is supported by the framework, it does the appropriate configuration.
Also, the spi_nor_scan API accepts the hardware capabilities of the NOR flash controller driver, that is, whether it supports all the I/O modes or is just a single I/O device. Based on the properties of the discovered flash as well as the properties of the controller, the SPI NOR framework chooses the appropriate I/O mode to use.

This table shows a performance comparison between the various frameworks that I talked about, using the TI QSPI controller on the DRA7xx family of SoCs. The TI QSPI controller is a specialized SPI controller that can talk to both SPI devices and SPI NOR flash devices. Under the SPI framework without any acceleration, just going through the traditional route, the read speed is around 800 KB/s with a CPU load of 70 percent. But with a SPI NOR controller driver, or using the SPI core's accelerated read interface, which is the memory-mapped interface in the case of the TI QSPI controller, the throughput is approximately 4 MB/s. One of the reasons for this large jump is that in normal SPI mode the QSPI controller doesn't support bursts greater than 4K, whereas with the memory-mapped interface the burst can be as long as the read itself. Such limitations lead to lower read performance in the normal SPI mode of access.

Then with DMA: the controller doesn't support DMA reads from the FIFO, that is, it has no events going to the DMA engine, so it's not possible to read data via DMA in FIFO mode. But if you use the memory-mapped mode with DMA, we see approximately 20 MB/s of read throughput with 15 percent CPU load. This is because, compared to the CPU, the DMA engine can do larger bursts, so the speed increases further. The SPI NOR framework itself has no generic
support for DMA, but you can still implement DMA in your driver and see similar performance. There is no increase in write speed, because in a write most of the time is spent polling the flash to find out whether the write operation has completed.

I'll talk a bit about the ongoing development work in the SPI NOR framework. One of the main things being addressed is the use of 4-byte addressing opcodes. Older flashes had sizes of 16 megabytes or less and required only 3-byte addressing, but newer flashes have higher density and require 4-byte addressing. There are special opcodes that expect 4-byte addresses, and there are opcodes that take just 3 bytes of address. Flashes provide several ways to access the memory region above 16 megabytes: you can enter 4-byte addressing mode by setting a bit within the flash, after which all the 3-byte addressing opcodes also expect 4-byte addresses, or you can use the dedicated opcodes, which always expect 4 bytes of address irrespective of the state of the flash. We should make sure that communication with the flash is stateless, with no bit set within the flash to enter or leave 4-byte addressing mode. That's because such state would create incompatibilities with boot loaders that expect the flash to be in 3-byte addressing mode, for example because the boot loader resides within the first 16 megabytes.

Another thing being addressed is that the controller can send the command either on one wire or on all four wires, and similarly the address can be sent on one wire or on all four wires. We call this 1-1-4: one wire for the command, one wire for the address, and four wires for the data,
and similarly 4-4-4: four wires for the opcode, four wires for the address, and four wires for the data. To work in quad mode, there is a bit within the flash called quad enable, which you have to set before actually using quad mode. But this bit behaves differently on different flashes. For example, on Spansion, with this bit set, you can use the flash in 1-1-4 mode or 4-4-4 mode depending on which command you send; there are different commands for 1-1-4 mode and 4-4-4 mode. But on Micron, only 4-4-4 mode is supported once the quad enable bit is set. So we cannot just set the quad enable bit without worrying about the type of flash; we have to do some extra work for Micron.

The other thing is handling different sector sizes. A flash may support 32K, 64K, or 256K sector sizes, and optionally also something called small sectors, which are 4K in size. There are dedicated opcodes for the normal sector erase versus the 4K sector erase.

And there is what is called the Serial Flash Discoverable Parameters (SFDP) table, which is supported from v4.14. The Serial Flash Discoverable Parameters specify a table called the Basic Flash Parameter Table. This table contains all the information about whether 4K sectors are supported, which opcode erases a 4K sector, which opcodes to use for access in 1-1-4 mode or 4-4-4 mode, and so on. There are different versions of the Serial Flash Discoverable Parameters, mainly 1.0, 1.5, and 1.6. If the flash supports 1.6, the newest version, it has all the information, but older flashes don't support the latest version, and we still have to handle them in the framework.
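To make the earlier points about stateless 4-byte addressing and I/O mode selection concrete, here is a small Python sketch. It assumes the widely used opcode values for fast, dual output, and quad output reads and their dedicated 4-byte-address variants; the data sheet of a given flash remains the authority. The function `pick_read_op` and the mode strings are hypothetical names for illustration, and quirks such as the quad enable bit are deliberately left out.

```python
SIXTEEN_MIB = 16 * 1024 * 1024  # largest size reachable with a 3-byte address

# mode -> (opcode taking a 3-byte address, dedicated opcode taking a 4-byte address)
READ_OPCODES = {
    "1-1-1": (0x0B, 0x0C),  # fast read
    "1-1-2": (0x3B, 0x3C),  # dual output read
    "1-1-4": (0x6B, 0x6C),  # quad output read
}
PREFERENCE = ["1-1-4", "1-1-2", "1-1-1"]  # widest data phase first


def pick_read_op(flash_size, flash_modes, controller_modes):
    """Return (mode, opcode, address_bytes) for the widest mode both sides support."""
    common = set(flash_modes) & set(controller_modes)
    for mode in PREFERENCE:
        if mode in common:
            break
    else:
        raise ValueError("flash and controller share no I/O mode")
    op_3byte, op_4byte = READ_OPCODES[mode]
    if flash_size > SIXTEEN_MIB:
        # Use the dedicated 4-byte opcode instead of switching the flash into a
        # 4-byte addressing state, so a boot loader still finds 3-byte mode.
        return mode, op_4byte, 4
    return mode, op_3byte, 3
```

Because the choice depends only on the flash size and the shared capabilities, no mode bit ever has to be set or cleared in the flash for addressing.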
Finally, there have been patches for supporting octal mode flashes and also DTR, that is, double data rate mode, in which data is sent on both the rising and falling edges of the clock.

One of the things that is missing today in the framework is DMA support. The main reason it is not so easy to support DMA with NOR controllers is that the flash file systems that use SPI NOR flashes are not written with DMA in mind: they use vmalloc'd buffers, and it is generally neither easy nor safe to DMA into vmalloc'd buffers. This is known to cause issues with VIVT caches. A vmalloc'd buffer may also come from an LPAE-backed memory region, and if the DMA engine is only 32 bits wide, it cannot address buffers in the LPAE region, which require more than 32 bits of addressing. The SPI core tries to handle vmalloc'd buffers by mapping them to an SG list and passing that on to the SPI controller drivers, but this still cannot deal with VIVT caches or LPAE-backed buffers.

One of the solutions in use is bounce buffers. The TI QSPI driver uses a bounce buffer, which is an intermediate DMA-safe buffer: rather than doing DMA directly to or from the vmalloc'd region, the data passes through the bounce buffer and is copied between it and the actual destination buffer. Other SPI controller drivers also use bounce buffers, but this is spread all over the drivers and there is no common implementation as such.
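The bounce buffer idea can be shown with a short Python model of a read. This is only a sketch of the data flow, not driver code: `read_with_bounce_buffer`, `fake_dma_read`, and the 4096-byte bounce size are hypothetical, a `bytearray` stands in for the DMA-safe buffer, and the caller's `dest` plays the role of the vmalloc'd buffer that the DMA engine must not touch.

```python
FLASH = bytes(range(256)) * 64  # 16 KiB of fake flash contents


def fake_dma_read(flash_offset, view):
    """Stand-in for a DMA transfer from flash into a DMA-safe buffer."""
    view[:] = FLASH[flash_offset:flash_offset + len(view)]


def read_with_bounce_buffer(dma_read, dest, flash_offset, length, bounce_size=4096):
    """Read `length` bytes into `dest`, never letting DMA touch `dest` itself."""
    bounce = bytearray(bounce_size)  # the only buffer the DMA engine writes to
    done = 0
    while done < length:
        chunk = min(bounce_size, length - done)
        # Step 1: DMA moves data from the flash into the bounce buffer.
        dma_read(flash_offset + done, memoryview(bounce)[:chunk])
        # Step 2: the CPU copies from the bounce buffer into the caller's buffer.
        dest[done:done + chunk] = bounce[:chunk]
        done += chunk
    return done
```

The extra CPU copy is the cost of this approach, which is part of why a generic mapping or bounce implementation is attractive.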
So one of the questions that keeps coming up on the SPI and MTD mailing lists is whether the generic DMA mapping APIs in the kernel can be modified to support vmalloc'd buffers for DMA, so that all the driver frameworks could benefit. And if mapping such buffers is not possible, is it possible to provide a generic bounce buffer implementation within the DMA framework itself, so that all drivers can make use of it?

Here are some of the references that I used to make this presentation. I would like to thank Texas Instruments for sponsoring my travel and the Linux Foundation for giving me an opportunity to speak on this occasion. With that, I'm open for questions.

Yeah, that's a dual stacked device. I don't think there is support for any type of dual stacked SPI devices as of today. OK, the question was: there are SPI NOR flashes that are dual stacked, that is, there are two SPI chips connected to a single chip select, so you erase a 64K sector but it will actually erase 128K, and how do you support those in the SPI NOR framework?

SPI NAND, is it? OK. There are SPI NAND APIs now available that should work with SPI NAND devices, but we will have to explore how that can be merged with SPI NOR controllers. I don't think you could directly use it. The question was whether we could support a controller that supports both SPI NAND and SPI NOR. I guess we'll need some common interface that can talk to both SPI NAND and SPI NOR. Although it's abstracted at the MTD level, we'll need a SPI flash subsystem that is more generic and doesn't distinguish between NAND and NOR.

Any more questions? OK. Thank you for coming. Thank you.