Let me get started. Good morning, everyone. Welcome to my talk on HyperBus memory devices. My name is Vignesh. I'm from Texas Instruments, India. I mostly work on Linux SoC support for Texas Instruments devices, and I co-maintain the Memory Technology Devices (MTD) subsystem in Linux, especially the Common Flash Interface (CFI) part and the HyperFlash part of it.

This is the agenda of my talk today. I'll start with HyperBus, the types of HyperBus memories, the HyperBus protocol, the HyperFlash command set, the kernel framework for HyperFlash, and writing a controller driver, and then the recent developments and future enhancements that can be done for the HyperBus framework.

So what is HyperBus? HyperBus is a serial bus. Nowadays serial buses have multiple lines, so that's a pretty naive definition, but it's a high-speed bus connecting a HyperBus memory device to the SoC or the controller. It has exactly eight data lines, but it works in double data rate (DDR) mode, that is, data is transferred on both the rising and falling edges of the clock. It uses either a single-ended clock or a differential clock, depending on the voltage: if it's a 1.8 V part, it will use a differential clock; otherwise it can just use a single-ended clock. There is a chip select, similar to an SPI bus, to select the memory device, and a data strobe. Since this is a high-speed interface, the data strobe helps the host have an accurate data capture window.

There are two types of HyperBus memory devices in the market today. The first is HyperFlash, which is persistent storage similar to SPI NOR flashes. The second is HyperRAM, which is a pseudo-static RAM with read-write capability. HyperFlash devices are mostly used in embedded devices, not really on server machines and so on. Moving ahead: as I said, HyperFlash is a NOR-technology-based device, so it is organized into sectors and pages.
A sector is the smallest erasable unit, whereas a page is the maximum amount of data that you can write in one transaction. Since it's an eight-line bus working in DDR mode, it's effectively a 16-bit bus, that is, you can transfer 16 bits of data per clock, and therefore the word size is also 16 bits. HyperFlash only uses the data strobe while reading. That is, only the HyperFlash toggles the data strobe line, and it doesn't expect the host to toggle the data strobe when the host is writing data to the flash. It works at up to 200 MHz, so you can get a read throughput of about 400 MB/s because you are transferring 2 bytes per clock. So it's a pretty fast interface, and since it is a NOR-technology-based device, very little setup is required; if you have instant-on or fast-boot kinds of requirements, this is a very good storage medium.

What HyperFlash does is take the features of the parallel NOR memories that are already out there and combine them with serial memories to reduce the number of pins actually required to connect the device, so it takes the best of both worlds. It is a competing technology to the octal SPI NOR devices available today.

Just a brief note about HyperRAM as well: it is a pseudo-static RAM with its own self-refresh circuitry, and it is pin-compatible with a HyperFlash device, but it uses a bidirectional data strobe, that is, the host has to drive the data strobe when writing to the device, and the device drives the strobe when reading. This is required because RAM needs to be refreshed every now and then, and it is quite possible that during a write there is a need for a refresh; the wait phase gives the device time to refresh before you go ahead with the write transaction.
So yeah, this is how we can divide a transaction on HyperBus. Basically there are three phases: the command-address phase, the wait phase, and the data phase. The command-address phase indicates what type of transaction it is and which address the transaction is targeting. This phase is always driven by the master. Then there is a wait phase, where either the host is waiting for a refresh to complete or is waiting for the flash to start sending data, and then a data phase which, depending on the transaction, is driven either by the host or by the device.

Here is an example of how it looks on the bus, very similar to an SPI transaction. You have chip select going low, followed by the command-address phase. Since it's a DDR bus, it takes just three clock cycles for the six bytes of command-address to go out, and then the flash responds with data, and you can see the data strobe toggling to indicate valid data.

To explain a bit what actually happens in the command-address phase and what those 48 bits represent: the highest bit, bit 47, indicates whether it's a read or a write transaction. Bit 46 indicates whether the address is targeting the memory space or the register space. This is mostly used by HyperRAM: if you want to access the registers of the HyperRAM, you would set bit 46 to one to indicate it's targeting the register space; otherwise it's mostly zero, targeting the memory space where the actual data is stored. Bit 45 indicates whether it's a linear or a wrapped transaction. With a wrapped transaction you can wrap the addresses around a 16-byte, 32-byte, or 64-byte boundary, based on the configuration done on the flash side. This is useful for filling up a cache line or something like that.
Then the actual address itself: HyperBus as of today supports 32-bit addressing. You have address bits A31 to A0, of which A31 to A3, that is 29 bits, map to bits 44 to 16 of the command-address word. There is a reserved range in the middle, bits 15 to 3, and the last three address bits, A2 to A0, map to bits 2 to 0. There is a concept called a half-page, which is the region addressed by A2 to A0: eight 16-bit words, that is 16 bytes. It depends on the flash, but on HyperFlash, for example, these 16 bytes are the smallest unit over which ECC is calculated. So if you want to enable ECC on HyperFlash, you would do reads and writes at 16-byte granularity.

Now a bit about the programming sequence, that is, how software interacts with the flash itself. HyperFlash follows the Common Flash Interface extended command set 0002, which is the same command set that parallel NOR flashes from Cypress, or legacy AMD flashes, use. We have had a driver in the kernel implementing this for a long time; you can find it under drivers/mtd/chips/cfi_cmdset_0002.c. The flash comes up in read mode, where it is possible for you to directly go and read the data on the flash. Going back to the command-address word: you just set bit 47 to 1 for a read, set bit 45 for linear addressing, and then set the address bits, and the flash will start responding with data. So a basic transaction will start reading data from the flash, which is very useful for quick-read kinds of applications.

Programming, or writing to the flash, or anything that modifies content on the flash, is a bit trickier. There is a two-step unlock sequence, where you write a predefined value to a predefined address as step one and step two, unlock 1 and unlock 2, after which the flash is actually open for programming. For example, this sequence shows how to write 512 bytes of data to the flash.
So in that case, to the sector you want to update, you write the value 0x25, which is the write-to-buffer command, to the start of the sector address, then tell it how many words need to be updated, and then send the actual data. This data is not directly written to the flash array; instead it is buffered within the flash first, and you have to issue a confirmation command, the last step here, to the sector address so that the programming actually happens. HyperFlash supports up to 512 bytes of buffering in this way. Again, all these addresses are 16-bit word addresses, not the byte addresses that are being shown here.

Then the flash supports something called address space overlays. Basically there are multiple parallel address spaces within the HyperFlash. The default address space is where the actual data is stored. There are other address spaces, like the device ID or Common Flash Interface address space, where there is a table that tells you about the device itself and its parameters. There is a status register address space, and protection bits which can be set to protect individual sectors, and many more vendor-specific address spaces may be present. The datasheet gives a specific command sequence for entering each address space. Once those commands are sent and you enter a specific address space, the view of the entire flash is replaced with the new address space, so reading from the same address gives you different data depending on which address space you are in. There is no need for separate commands for each and every register access; they are just address space overlays.

Moving on to the different types of HyperBus controllers: we saw the flash part, now on to the SoC part, where the controller is. There are two broad varieties of HyperBus controllers that we can see.
The first is dedicated HyperBus controllers, which only understand the HyperBus protocol. They can support memory-mapped access to the flash, where the entire flash can be accessed directly by the CPU as if it were SoC address space. Then there are multi-I/O serial controllers which support a wide variety of protocols: SPI NOR, SPI NAND, octal SPI, or even HyperBus, and so on. They may or may not have memory-mapped access. These are the two varieties of controllers you see out there.

This slide shows how an MMIO-capable controller works. For example, on an SoC, let's say from address 0x8000 0000 to 0x8FFF FFFF there is a memory-mapped interface exposed for accessing the flash. Whenever the CPU does a read or write access to this address range, the SoC interconnect routes it to the HyperBus memory controller, and the HyperBus memory controller in turn generates the required command-address, wait, and data phases on the HyperBus. Since it knows whether the incoming transaction is a read or a write and what the address range is, the 48 bits of the command-address phase, such as whether bit 47 needs to be set or not, are decided by the HyperBus controller, and it can talk to the HyperFlash. This is very useful for XIP kinds of use cases; although that is not something for Linux, you can easily execute code out of HyperFlash memory.

Now let's come to the main part, which is how this is supported in the Linux kernel, the software side of it. The HyperBus framework is relatively new; it was merged in kernel 5.3. It only supports HyperFlash (HyperRAM is not supported at the moment), and it supports MMIO-capable HyperBus controllers. Most of the code is just reuse of the existing common flash framework.
At the top we have the MTD layer, which provides the user-space and kernel-level interfaces. Then there is the Common Flash Interface driver, which implements the parallel NOR command set used by HyperFlash. Below that is the map framework, which forwards the requests of the CFI layer onto memory-map-capable drivers, and the HyperBus framework acts as a layer between the HyperBus memory controller driver and the map framework, forwarding calls from the map framework onto the HyperBus driver. At the lowest level you have the hardware. So basically it's mostly reuse of the command set driver, plus a hook into the map framework so that HyperBus controller drivers can plug into it.

There were a few modifications that needed to be made to the CFI layer to support HyperFlash. The main one is status register polling in the case of write or erase: after issuing a write or erase, there is a need to poll the flash to know whether the operation is complete. The legacy CFI driver used to look at the data lines toggling to determine whether or not the programming was complete, but that is not possible with a HyperBus kind of device, where you have double data rate and just eight I/O lines for a 16-bit bus. The way it works in HyperFlash is that there is a dedicated status register which you can read to know whether the write or erase is complete, and it also provides a nice status saying what failed and why, in case of failures.
OK, so if you're writing a HyperBus memory controller driver, what needs to be done? It's a very simple interface as of now: you have to implement the hyperbus_ops, which consist of five functions; I'll go through them in the next slide. The first one is read16, which is used to read 16 bits of data in a single bus transaction, mostly used to read from non-default address spaces such as the ID or CFI space or the status register space. Then there is write16, the complement of read16, which is used to write 16 bits of data in a single bus transaction. Then there are copy_from and copy_to, which do the majority of the reads and writes of data from the actual flash memory array; if you are reading data from the flash, it's done by the copy_from interface.

The reason for having read16 and write16 is that, although HyperBus is a 16-bit bus and you are not supposed to do byte accesses, memory-mapped controller IPs are capable of doing non-16-bit accesses as well. For example, if you try to write one byte of data, the controller might append 0xFF as the higher byte to make it a 16-bit write and write that onto the flash. This would be OK if you were writing to the normal flash memory space, but it would be a problem if the target was a status register or a configuration register, for example, where you want exactly the value that needs to be written. Therefore read16 and write16 must use proper I/O accessors that do 16-bit accesses, whereas copy_to and copy_from are free to use other access sizes as well and do whatever optimization is needed to move data to and from the flash.
Finally, we have a hook for calibration, which is called very early on by the core code so that the HyperBus controller driver can calibrate the controller itself. Because this is a very high-speed interface operating at 200 MHz, most devices have a PHY and may need to calibrate the PHY, or do some DLL locking and so on, so the core provides a calibrate hook which is called even before trying to detect the flash device.

OK, so once the driver implements the hyperbus_ops, then for every HyperFlash device that is discovered on the bus you register a hyperbus_device with the core. This is the structure: the most important member is the map, which represents the start address and the end address of the physical map where you can access the flash device, followed by a pointer to the device tree node itself, the mtd struct which gets populated once the device is registered, and the HyperBus controller structure which contains the hyperbus_ops. We also have an enum for the memory type; for now we only support HyperFlash, and this is just for future extension. Once this is populated, you just call hyperbus_register_device to register the device with the core.

This is the device tree representation for the TI AM654 SoC's HyperBus controller node. There is a memory-mapped window available at this range, and it is being assigned to two chip selects: chip select 0 has 64 MB of address space reserved, and chip select 1 has another 64 MB of address space reserved. Using the ranges property, we map it to the flash device, where the first entry here represents the chip select itself and the second entry represents the start and the size of the flash. The compatible for HyperFlash is "cypress,hyperflash", and it's also backward compatible with "cfi-flash". I haven't shown the second slave device, but it would be exactly the same
but just with the chip select set to one.

So how do you access HyperFlash from user space? It's the same as any other MTD device: it exposes /dev/mtdX to user space, or mtdblock if you are using the block device on top of it. You can use the mtd-utils that are available at the infradead link, and you can also use any of the flash file systems, such as UBIFS, so that you can have a root FS on the HyperFlash.

One of the recent developments has been a new JEDEC specification called the eXpanded SPI, or xSPI, specification, JESD251. The aim of this document is to standardize the command set and programming of different serial flashes; it's kind of a mix of all the types of serial flashes out there put into a single specification. There are two profiles in the xSPI specification, profile 1 and profile 2. Profile 1 mostly covers the regular SPI devices, the SPI NOR flashes up to octal SPI flashes, and profile 2 covers the HyperBus protocol itself. So HyperFlash is now also an xSPI standard.

But if you try to compare how these two protocols look with respect to each other: the upper diagram is the SPI flash protocol, where you have one or two bytes of command phase (in DDR mode you have two bytes of command phase; in STR mode a single byte), followed by an address phase which can be three to four bytes (most likely four-byte addressing), followed by a wait phase and a data phase. On the HyperBus protocol, there is just a six-byte combined command-address phase, and the problem is that the address is not confined to particular bytes; it spans all the way from the first byte to the sixth byte, so you can't really divide this into two phases. But the xSPI standard adds another profile and says, OK, these two
are almost the same in terms of the phases of a transaction.

We do have xSPI-compliant HyperFlash devices on the market. Such flashes power up in SPI mode, which is single-bit 1S-1S-1S mode: a one-wire command, one-wire addressing, and a one-wire data phase as well. This is backward compatible with any of the SPI devices we find in the market. Using that mode, you can program a bit in the configuration register which switches the flash over to HyperFlash mode, and then it works with the normal command-address-data phases that I showed in the beginning. One advantage of xSPI-compliant flashes is that they have the Serial Flash Discoverable Parameters (SFDP) table, which describes a wide range of parameters and properties of the flash, so software can read it at runtime and find out what needs to be done with this flash without having to know the exact part number; everything is discoverable on the fly.

We still don't have support for xSPI-compliant flashes, or for flash controllers which can support both SPI NOR flashes and HyperFlash devices. But what we have today is the spi-mem layer, which is an abstraction between the SPI subsystem and the SPI flash memory devices, and it has been able to work with any type of flash, SPI NAND, SPI NOR, and so on. We could probably extend that to also support HyperFlash. The spi-mem subsystem has a spi_mem_op template which expects one byte of command and up to four bytes of address, followed by data, but HyperFlash is slightly different in that it has a combined command-address phase of six bytes. We could probably add a new member to the template saying this is HyperFlash mode, and then extend the command and address fields so that you can accommodate the HyperBus protocol as well. If we could do that, then it should be possible to use the SPI NOR core and the HyperBus
core as is, with the spi_mem ops, to have a single controller driver talk to both SPI NOR devices and HyperFlash devices.

These are some of the enhancements that can be done to the framework. One is that the write performance is quite slow, because writes are done at word granularity, that is, 16 bits at a time; but HyperFlash can in general do a 512-byte buffered write at a time, so we should be able to extend that. There's also a need to add DMA support for reading data from the flash: given that we can go all the way up to 400 MB/s, using DMA would be the only way to actually get such high throughputs. But most flash file systems, such as UBIFS and JFFS2, use vmalloc buffers, and we can't just pass vmalloc buffers around and try to map them and use them for DMA; that's not possible because of various limitations, so that needs to be handled in the HyperBus core or at the MTD layer level. And then, probably extend the core itself to use the spi_mem ops, so that we can support multi-protocol controllers, as well as xSPI support for the SPI and HyperFlash compatible devices.

OK, so that's it. These are some references where you can find the HyperBus specification and the HyperFlash and HyperRAM datasheets, and you can find the source of the HyperBus code at the git link. Thanks to my company, Texas Instruments, and the Linux Foundation for providing the opportunity to speak here. With that, I'm open for questions. Thanks for attending.

Q: I saw your patches on the Linux MTD mailing list; thanks for them, good work.

A: Thank you.

Q: You keep mentioning the UBI file system, basically UBI on the MTD on the HyperFlash, but isn't HyperFlash a kind of NOR flash? So why would you run UBI on top of that? Why would you run UBI on top of
a parallel NOR flash?

A: Well, you can run any of the file systems, but, I mean, you don't need to do the wear leveling that is required for NAND; still, erasing the same sector again and again and trying to use the same one is not really good for NOR either. So the purpose is to do the wear leveling, basically.

Q: OK, thanks. So where is the subsystem living? Is it directly under drivers, or is it under the current flash drivers?

A: It's under drivers/mtd/hyperbus, so it's under the MTD framework. As for HyperRAM, we need, you know, if there's a good use case, then we can start implementing targeting that use case, so that you really know, I mean, just supporting it out of the blue would probably not meet the requirements when somebody actually tries to use it. That's why I haven't come to that part yet. Yes, yes, because there's absolutely no programming required, actually. And it's most likely that HyperRAM is a complement for people trying to use it on the MCU side, where HyperFlash acts as storage and HyperRAM provides a small amount of memory, and it might come as a single multi-chip package which can be used for such applications. So it may not be a Linux use case, but you never know.

Q: Do you know which other vendors provide HyperBus on their devices?

A: Sorry, do you mean other vendors who provide HyperBus in their SoCs? Yes, there are TI SoCs, the TI AM654 SoC, which has a HyperBus controller on it, and I know Renesas also has a similar SoC.

Q: Did you compare the throughputs between using octal SPI mode and HyperBus? I think you can use the same controller with both protocols, correct? So did you compare the throughputs?

A: No, I haven't done the benchmarking yet. We have octal mode support, but octal DDR mode support is still something
that's missing in the kernel, so we don't support DDR mode there, and HyperBus is a DDR bus by default. That's still pending to be done. But theoretically it seems like both can operate at 200 MHz in DDR, so the throughput should be identical, if I take a guess.

Q: OK, thanks.

A: Thanks. Thank you.