Alright, I'm Mike and I'm going to talk about boot time memory management. These are the topics I'm going to cover, more or less, to some level of detail: how Linux initializes its memory management subsystem, what happens from the moment Linux starts after the bootloader jumps into it until the moment it is possible to use page allocation, kmalloc() and all the usual allocation methods we know from kernel development. One of the topics I'll cover is memblock, the boot time memory manager of the Linux kernel: what APIs it provides to developers and how they should be used.

So normally when we develop kernel code we use alloc_pages(), kmalloc(), kmem_cache_alloc() or whatever other allocator is there, and in the end every allocation in the kernel boils down to the buddy page allocator granting some physical memory to that or another allocation request. It can be cached with kmem caches, it can be used with genalloc or other mechanisms, but in the end everything is about the physical page allocator. The physical page allocator itself is a really complex beast, and it needs memory to allocate its own data structures that it uses to manage the entire physical memory. But these data structures are not readily available at the time the kernel starts executing, so there is a need for some other mechanism to take care of memory allocations before the page allocator is available for normal use.

So when the bootloader jumps into the Linux kernel, that's more or less the view of the physical memory; it's a bit difficult for me, so I'll just wave my hands. There are several used regions: the kernel code, data, etc.,
the parameters the bootloader passes to the Linux kernel, which can be the command line, some data structure, or a device tree, it all depends on the particular architecture. There is an optional initrd image that will be used as the very first initial user space file system. And there is sometimes firmware data that stays live during kernel execution, which can or should be used when the kernel tries to access the firmware; this is also architecture dependent, as not all architectures leave that piece in place. And most of the memory, as we can see here, is free. The problem is the kernel still doesn't know where it is, so whenever some code in the kernel tries to access physical memory it could actually overwrite the kernel, the initrd, or some other important piece of information in the parameters that were passed from the bootloader to the kernel. So the kernel should be very careful with what it does with memory before the last bullet here, mm_init().

So what happens? Quite a lot of code runs from the start of the kernel until the point when the allocators are set up and functional. Usually, and this is also architecture dependent, for the most part assembly code sets up basic page tables, with the memory for them pre-allocated in the kernel's .bss so that it won't overwrite some essential information, and the fact that they are in the .bss already makes them visible at a known location. The next step, usually, is that setup_arch() for each of the twenty-something architectures detects the physical memory configuration and notifies the rest of the kernel where the physical memory lives: what the starting address is, what the end address is, how it is organized in banks, NUMA nodes, what not. setup_arch() on every architecture also makes sure that the areas from the previous slide, the kernel, the parameters, the initrd, whatever essential data should live through kernel execution, are reserved, so it's known to the rest of the kernel that this memory should not be touched. Then start_kernel() from init/main.c must use early memory allocation, because it allocates several large buffers that the page allocator wouldn't grant to anyone because of certain restrictions. Only at that point does start_kernel() call mm_init(), which in turn transfers all the physical memory pages that were not yet reserved to the page allocator, and from that point everyone can use alloc_pages(), kmalloc() and so on and so on. I must also note that from the beginning of kernel execution to the point where the memory management subsystem is initialized quite a lot is going on: there is SMP initialization, cgroups, the kernel printk buffers and all other kinds of things, and they all must take proper care of their memory allocations.

So what happened a long while ago? Until Linux v2.3 (the code for this is actually from historical Linux git trees) every function that was called from start_kernel() was passed memory_start and memory_end, and if it did some memory allocation, if it needed some memory, it just returned the new memory_start to account for the memory allocated for that particular paging or console or PCI setup, whatever it was. So if the kmalloc initialization needed 1K of memory to create the metadata for kmalloc(), it just received memory_start at, say, 16 kilobytes and returned 17 kilobytes as the new memory_start for the rest of the system. In version 2.3.23pre3 there appeared for the first time a boot time memory allocator called bootmem, which was a first-fit allocator based on bitmaps, with every physical page represented by a bit in the bitmap: a zero bit says the page is free, a one says the page is occupied. Then the setup code, for instance console or PCI init, could use the bootmem allocation functions, which went to look up free bits in the bitmap and made sure the pages they found were reserved for whoever allocated them.
As time went on, people discovered various problems with bootmem, and it was quite challenging to evolve its infrastructure as physical memory configurations changed. NUMA systems appeared; not every machine had its memory start at address zero, it could be anywhere in the 32-bit physical address space; and for a larger machine of those days, say 32 gigs of RAM, the bitmap would be quite large, about one megabyte, so searching that bitmap would be quite expensive. From version 2.5 a new mechanism for early memory allocations was suggested. It was initially implemented for 64-bit PowerPC, which didn't use the bootmem bitmap at all; it used its own mechanism called logical memory blocks, which eventually was renamed to memblock, and as time went on it was adopted by more and more architectures. For a pretty long period there existed a compatibility layer between memblock and bootmem; for instance x86 used that NO_BOOTMEM compatibility layer, so that the actual memory allocations in the early boot process were done by memblock, but the rest of the system saw the old bootmem APIs. bootmem was completely removed in 4.20, so the rest of this talk is about memblock, and I apologize to the Android developers who are still using older kernels. Still, both the bootmem and memblock APIs are pretty well documented, and you can see the reference documentation in the kernel documentation pages at kernel.org.

Now, memblock does have some advantages over bootmem. Because it uses static data structures embedded in the .bss, it can be used straight away, virtually even from assembly code: you can call some memblock function and allocate memory. And it doesn't need to manage bootmem's bitmap, which was very difficult; you would see a lot of code trying to find the exact place where the bitmap could live without clobbering
other important things in memory. The other thing is that the allocation itself has a bit more complex logic than simply tracking the last used bit in a bitmap. And one of the more dangerous features of memblock is that whenever you allocate memory it may implicitly grow its internal data structures, allocating some more physical memory and consequently clobbering memory already in use, so some care should be taken to avoid that happening.

This is how memblock represents the physical memory. The yellow rectangles are memory banks, suppose we have four of them, and the blue rectangles are the memory that is already in use. memblock has two structures of type struct memblock_type, one called memory and one called reserved. Each of these structures wraps an array of initially 128 entries, and each entry in the array represents a contiguous physical memory region: it has a base, a size, some flags that allow distinguishing regions with different properties, for instance mirrored memory or all the crazy things with "nomap", a term people invented, and it has a node ID, so that memblock knows which memory banks belong to which NUMA node in the system. The memory array represents the plain physical memory banks, and the reserved array represents the used memory that somebody in the kernel cared to tell memblock about: hey, I'm using this, so please take care of it. These arrays are obviously not directly accessible for modification; you have to use APIs to modify them. The basic APIs for dealing with these memblock arrays are memblock_add() and memblock_add_node(): they get a base address and size of the region you would like to make available to memblock, and that region goes into the memory array, so these functions effectively register physical memory banks with memblock.
Then there is memblock_remove(), which is kind of weird because there is no memory hotplug at that time, but still some architectures want to make sure some parts of physical memory are not visible to the kernel and not mapped by the page allocator, but used solely by devices or firmware, things that are not exactly the kernel itself. So they chose to implement memblock_remove() and make some reservations of physical memory for purposes other than normal kernel memory usage. There is also memblock_reserve(), which is the lowest-level allocation function and simply says: okay, I'm using the memory from here to here, this is mine, and I don't want anybody to touch it. And there is memblock_free(): if you made some reservation of memory and you don't need that memory anymore, you can just free it, and it will be available for any subsequent reservation, or when the pages are handed to the page allocator it will be able to make use of that physical memory.

But then people realized these basics are probably not enough to do actual memory management, because every time you need to allocate memory you have to search for where the free memory is and then memblock_reserve() it, and that was a repeating pattern in a lot of PowerPC code back then. So several additional APIs were added, which were renamed over time but are now called memblock_phys_alloc_*(). What they do is allocate a chunk of physical memory of the specified size with the specified alignment, and there are variants that allow restricting where the memory is allocated: for instance memblock_phys_alloc_range() allows saying, okay, I want an allocation between this physical address and that physical address, and memblock will try to satisfy that request. You can also try to allocate from a particular NUMA node; if there is no memory in that particular node, memblock will fall back to the other nodes, so it tries as hard as possible to grant memory rather than to satisfy all the restrictions.
All these functions are considered, for now at least by me, somewhat legacy, and it's probably better to use the other set of functions that I will cover in a bit. These functions return the physical address of the allocated memory; on the contrary, there is another set of allocators that return the virtual address of the allocated memory. The two basic functions in that family are memblock_alloc_try_nid() and memblock_alloc_try_nid_raw(). The former gets a whole bunch of restrictions: where the memory should be allocated, what node it should come from, and some other flags specifying, for example, that the memory should have memory mirroring enabled. If the allocation succeeds, including all the fallbacks memblock does implicitly, which I'll cover a bit later, memblock_alloc_try_nid() converts the physical address of the memory it found into a virtual address, does a memset() to zero of the entire allocated range, and returns that virtual address to the caller. For some of the early allocations of large pieces of memory that memset() was considered something that hurts performance and therefore should not be done, because these areas are initialized to something else anyway, so people added memblock_alloc_try_nid_raw(), which does all the things try_nid does but in the end leaves the memory untouched: it either contains garbage, or it could be poisoned if your kernel has enough VM debug features turned on at build time or boot time or even both.

Now, these two are large beasts and have way too many parameters for most uses, so there is a bunch of convenience wrappers around memblock_alloc_try_nid(), and they differ in the amount of restrictions they put on the memory allocation. The simplest one is memblock_alloc(), which is used in most places where somebody needs early memory: it receives just a size and an alignment, and if you don't need anything more than that, you're good to go with memblock_alloc().
If you want your memory allocated above a certain physical address, you would probably want memblock_alloc_from(), which tries to give you memory where you specified; if it cannot, it again falls back to the rest of the memory, because for memblock it's more important to satisfy the allocation than to comply with all the restrictions. There's also memblock_alloc_low(); for some architectures it was very important to have certain pieces allocated from what's called low memory, for instance i386 must have its ISA DMA memory below 16 megabytes, so there is a lot of code that allocates memory from the low memory regions, and this wrapper is quite useful for that. Whenever somebody wants to allocate from a certain NUMA node, and that's the only restriction for the allocator, there is memblock_alloc_node(), which tries to give the requester memory from the exact NUMA node asked for. And there is also memblock_alloc_raw() for those who do not want their memory cleared to zero, for those who consider it a performance drawback; they can always ask for the memory as it is, and again, if enough VM debug features are turned on, it will be poisoned.

Now, how does the whole memblock allocation work? Suppose memblock got a request to allocate, say, this green rectangle. It will traverse both the memory and reserved arrays to see where the free memory is, the yellow bars here actually represent the free memory, and it continues the traversal until it finds a free memory area large enough to satisfy the allocation request. After the memory is found it will be reserved: there will be a new entry in the memblock reserved array representing the reserved memory, and if there are allocations with adjacent addresses, memblock tries to merge them so the array won't grow beyond its limits.
Here memblock can implicitly grow the reserved array: if you have already made 128 reservations and you're allocating the 129th, the array will be doubled, and this of course has the potential to overwrite some memory that is already in use, so some caution should be taken that this won't happen, and I'll talk about it a bit later. The basic function that traverses the arrays is memblock_find_in_range_node(), which takes into account all the restrictions and criteria, like minimal address, maximal address, flags and a NUMA node, and it can search in both directions, both bottom-up and top-down. The default for memblock is top-down allocation, so it starts from the end of the memory, but there is a control knob that allows forcing bottom-up allocations. These basic functions are almost never called by themselves; there are wrappers around them that do the allocation in a loop, so that memblock_alloc_range_nid() tries to make the allocation with the whole bunch of restrictions the user put there, but if it fails it retries the allocation from the other nodes in the system, so that the allocation will succeed even if the particular node doesn't have enough memory to accommodate the request. This function returns the physical address of the allocated memory. Then there is memblock_alloc_internal(), which actually implements all the memblock_alloc*() functions that return a virtual address, and it does another retry without the lower bound for the allocation: if somebody said, okay, I want memory above this address, and memblock_alloc_range_nid() didn't succeed in finding memory above that address, memblock_alloc_internal() will try to find memory below it as well.

There is also a bunch of functions that allow controlling the behavior of the allocator itself. To deal with the problem of implicit growth of the reserved array, resizing of the memblock arrays is controlled with memblock_allow_resize(): until resizing is enabled, the arrays won't be grown.
If implicit growth of the arrays is disabled, your allocation will fail and you'll get a nice crash, but at least you know you did something wrong, instead of having some memory corruption down the road a few milliseconds later. Then there is the ability to set the bottom-up or top-down allocation direction. And then every architecture thought it needed a different way of limiting the actual memory that can be used for allocations, so there is a whole bunch of limit-setting functions which behave more or less the same but a bit differently, each one used by the different architectures that implemented them and pushed them into memblock some time ago; probably at some point somebody will come along and unify at least some of them.

Next, you can inquire about memblock's state and see what's going on there. There is a method to find out what your physical memory size is, rather than going to ask the firmware or traversing all the memblock arrays yourself. memblock can tell you how much reserved memory there is in the system. It can tell you where your physical RAM starts and where it ends, which is useful for instance on ARM machines that may have a lot of possibilities for physical memory layout, it really depends on the SoC vendor. You can check whether some physical address is actually in physical memory, or whether it's already used, and you can see what the highest allowed address for memory allocations is. And if you need something beyond the interfaces that are already available, you can simply traverse the memblock arrays with a lot of different iterators; this list is not the fullest, by the way, there are more of them, and each of the iterators has some differences. The most important is for_each_free_mem_range(), which walks the parts of the memory array that are not covered by the reserved array, so what you get is actually the free memory regions, and you can do something with them. Another one is for_each_reserved_mem_region(), which is used mostly at the time when memblock hands the memory over to the
page allocator, so that the page allocator will know that this memory is already in use and will not give it out to somebody else. Most of these are pretty well documented, and you can take a look at the kernel documentation if you'd like.

So now let's return a bit to memory management initialization. The important piece that should be taken care of, and all architectures do this mostly right, is that the kernel, the initrd, the firmware pages, whatever memory is occupied by the time the kernel starts, should be reserved as soon as possible. After that you can run with memblock resizing enabled and allocate as much memory as you need, as long as you have all those areas reserved very early in the boot process. The next thing that goes on is the detection of physical memory; as we've already seen, each architecture does it in its own way: it queries the firmware, reads the device tree, asks the BIOS, whatever. It's also important to have this done as soon as possible, so you'll be able to allocate physical memory. Then, if a certain machine has some restrictions on physical memory availability, for example you'd like to have all your upper banks available only to a DSP or to virtual machines or whatever, you can limit memblock and ensure that it won't allocate that physical memory for normal kernel usage. And at the end of this boot memory process, start_kernel() reaches mem_init(), which in turn calls memblock_free_all(); at that point memblock traverses the memory and reserved arrays and hands the whole physical memory over to the page allocator, and voilà, you can live normally, you have kmalloc() and so on. And these are some references I used in the talk. Thank you very much.