Greetings, everyone. My name is Vicky Podek. I work at Semihalf, in embedded systems, and today I will be presenting our FreeBSD and NetBSD ports for the APM86290 system-on-chip from AppliedMicro. During the presentation I would like to compare both systems in terms of portability, but I am not going to make any statements about which system is better. I would like you to make that decision after the presentation.
So this is a brief outline of my presentation. First, I would like to say a few words about embedded systems in general and the market requirements for those systems. Then we are going to take a look at the hardware we worked on, that is, the APM86290, and especially at the message-passing architecture introduced in this chip. Then I will talk about porting an operating system to an embedded platform. This can be divided into several stages, and those stages are going to be described. Last but not least, I am going to say something about testing and debugging, the current state of the port, and the future work we would like to do on it.
So, first question: what is an embedded system? It is hard to say nowadays, because embedded systems are everywhere, from doorbells to server boards. But our main subject of interest is non-x86 platforms, so maybe I will skip that and go to the market requirements. The modern telecommunications industry demands more and more packet-processing capability from embedded platforms, along with lower energy consumption. Additional hardware features such as packet classification, security extensions, and offload engines are highly appreciated, and accordingly these offload engines are among the most important features implemented in embedded systems.
A few words about software in general: even the most sophisticated chip is just an empty vessel without software that operates its features and capabilities. And usually we want that software as soon as possible: when the chip reaches the market, it should already be supported by an operating system. Reliability, licensing, and availability matter too. Licensing is important for us because the BSD license is very vendor-friendly: not everyone wants to share their work for free, and the BSD license leaves the ownership of the code with the person who develops it. And, of course, there is support for the chip's extra features.
Now, a few words about the hardware. The APM86290 incorporates two PowerPC Book-E compliant cores in a single package. Apart from the cores, it comes with a set of integrated peripherals, including Gigabit Ethernet, PCI Express, NAND flash, SATA, USB, I2C, and others. The on-chip processors are also assisted by a rich set of configurable hardware accelerators focused on packet classification, scheduling, packet manipulation, and security. Scheduling and packet data manipulation are combined in a facility called the message-passing architecture.
Now I would like to say a few words about the message-passing architecture implemented in the APM86290. Its central element is the queue manager (QM). It allows data and packets to be moved efficiently between the processors and the integrated peripherals, and it offloads expensive data servicing from the CPUs. A queue manager interface (QMI) is located in each of the supported devices on the chip; it maintains the queue status and the prefetch buffer status. Data transfers are organized in queues, and the QM allows system nodes to communicate with each other through programmed queuing commands. The main abstractions the mechanism distinguishes are the queue, the message, and the buffer.
Queues are organized as circular buffers and are stored off-chip, in main memory; their contents are prefetched on-chip as needed. The queue state, which includes the head and tail pointers, the occupancy level, and other parameters, is maintained on-chip for each queue. When a queue contains messages that point to free buffers in memory, it has to be configured as a free pool. Otherwise, when its messages point to occupied buffers in main memory, it has to be configured as a working queue.
So we have two kinds of messages. A standard message is 32 bytes long and can point to one buffer, that is, one packet stored in DRAM. An expanded message is 64 bytes long and can point to up to six buffers in main memory; in that case, the fourth next-data-address field points to another message of the same size, forming a chain. It looks a little complicated, and this slide shows an example of expanded message usage.
Buffers are fixed-size memory locations used to store data, for instance packets. They are kept off-chip, in DRAM. Each message in a working queue is assigned to its corresponding buffer in main memory, and this assignment is one-to-one.
There are two models of queue usage. In the first, we have one free pool and one working queue. In the second, we have one free pool, one working queue, and, in addition, a completion queue. The second model is used when the producer of the data wants to know the completion status of the command it sent. This slide shows an example of the first model. The producer takes a message pointing to a free buffer from the free pool queue, writes the data into the corresponding buffer in main memory, and then sends the message to the consumer's working queue.
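As a rough illustration of the first model, here is a minimal host-side C sketch. It assumes a simplified queue that is just a ring of messages with head, tail and occupancy fields, and a message reduced to a buffer index and a length. None of these names come from AppliedMicro documentation; the real QM keeps queue state on-chip and works with hardware message descriptors, not C structs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define QDEPTH 8    /* hypothetical queue depth (ring slots) */
#define NBUFS  4    /* hypothetical number of buffers in DRAM */
#define BUFSZ  64   /* hypothetical fixed buffer size in bytes */

/* A message, reduced here to the buffer it points to and the data length. */
struct msg {
    uint32_t buf;   /* buffer index, standing in for a DRAM address */
    uint32_t len;   /* bytes of valid data in that buffer */
};

/* Queue state: a ring of messages plus head, tail and occupancy. */
struct queue {
    struct msg ring[QDEPTH];
    unsigned head;   /* next slot to dequeue from */
    unsigned tail;   /* next slot to enqueue into */
    unsigned count;  /* occupancy level */
};

static bool q_put(struct queue *q, struct msg m)
{
    if (q->count == QDEPTH)
        return false;                       /* ring is full */
    q->ring[q->tail] = m;
    q->tail = (q->tail + 1) % QDEPTH;
    q->count++;
    return true;
}

static bool q_get(struct queue *q, struct msg *m)
{
    if (q->count == 0)
        return false;                       /* ring is empty */
    *m = q->ring[q->head];
    q->head = (q->head + 1) % QDEPTH;
    q->count--;
    return true;
}

static uint8_t dram[NBUFS][BUFSZ];          /* the buffers, kept "off chip" */
static struct queue free_pool, work_q;

/* Put a message for every buffer into the free pool; the work queue starts empty. */
static void pool_init(void)
{
    memset(&free_pool, 0, sizeof free_pool);
    memset(&work_q, 0, sizeof work_q);
    for (uint32_t i = 0; i < NBUFS; i++)
        q_put(&free_pool, (struct msg){ .buf = i, .len = 0 });
}

/* Producer: take a free buffer, fill it, post its message to the work queue. */
static bool produce(const void *data, uint32_t len)
{
    struct msg m;
    if (len > BUFSZ || !q_get(&free_pool, &m))
        return false;                       /* too big, or no free buffer */
    memcpy(dram[m.buf], data, len);
    m.len = len;
    bool ok = q_put(&work_q, m);
    assert(ok);      /* QDEPTH >= NBUFS, so the work queue cannot overflow */
    return ok;
}

/* Consumer: when it gets round to it, read the data and recycle the buffer. */
static uint32_t consume(void *out, uint32_t cap)
{
    struct msg m;
    if (!q_get(&work_q, &m))
        return 0;                           /* nothing pending */
    uint32_t n = m.len < cap ? m.len : cap;
    memcpy(out, dram[m.buf], n);
    m.len = 0;
    q_put(&free_pool, m);                   /* message goes back to the free pool */
    return n;
}
```

The producer and the consumer share only the two queues, so neither has to wait for the other, and every message (and therefore every buffer) is owned by exactly one queue at any time.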
From that point on, the producer is no longer concerned with transferring that data, so it can go back to its own work. The consumer might be busy doing something else at that time, but the data is not lost; it just waits for its turn to be processed. When the consumer gets to it, it reads the data from main memory at the address given in the message, and the message is then returned to the free pool. The second model has an additional completion queue. It starts the same way; the only difference is that the consumer returns the message to the completion queue, so that the producer knows the command has been executed by the consumer. The message then goes back to the free pool.
Now I am going to say a few words about porting. The general phases of porting can be divided, for instance, the way I did it here. First comes baseline code selection. Then cross-build environment preparation, system bootstrap, early kernel initialization in locore.S, platform initialization, and device drivers along with support for the chip's special features. The last phase is testing and debugging.
I will start with the first point. For FreeBSD, we started from FreeBSD 8.1 with basic support for the PPC460EX, which was an unofficial port not integrated into the main source tree. Eventually we wanted to move our work to FreeBSD 9, so we needed to rebase after the basic preparations. Still, this was a good start: we had more experience with, and examples for, this kind of chip from our previous work, and we had a firm baseline in the PPC460, the older sibling of the PPC465. For NetBSD, we started from the MPC85xx port. This probably wasn't a very good decision, because there were more similar platforms, such as Walnut.
But we thought it would be the right way, because the earlier PPC460 port was based on the MPC85xx port. I will omit the part concerning the bootloader, because our board was already supplied with a sufficient bootloader, namely U-Boot. So we started from the first code that executes in the kernel: locore.S in FreeBSD, or in NetBSD, the start code written for every platform. As I said before, we assumed that U-Boot had already done the basic initialization and that initial mappings were present in the TLB so that our code could execute. I should say at this point that on Book-E platforms there is no way to switch off the MMU, so there always has to be a valid translation in the TLB for the code being executed.
The start code is written in assembly because it can run from any location, which C code cannot do at this stage (C code needs, among other things, a stack). The goals at this point are to remap the kernel in the virtual address space and to set up a temporary stack so that we can execute C code. Because of the MMU issue, we had to use a small trick here: to remap the kernel, we created a temporary mapping and switched to it, then created the final mapping for the kernel and switched to that, and in the end we invalidated all other entries in the TLB.
Now a comparison of locore and the start code in FreeBSD and NetBSD. In FreeBSD, we hooked into the existing locore.S for Book-E processors, which sets up the exception vectors through the so-called IVORs. On Book-E processors, the IVORs (Interrupt Vector Offset Registers) contain the offsets of the exception handlers, and they are set in locore. Then comes the mapping process I described a minute ago.
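The remap sequence can be modeled in C as a check that the currently executing address always has exactly one valid translation. This sketch assumes, as a hypothesis, that the temporary mapping lives in the other Book-E translation space (TS=1) while the TS=0 entries are rewritten; the kernel base address, mapping size and TLB slot numbers are made up, and the real code is assembly in locore.S built around the tlbwe and rfi instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_N    8
#define KERNBASE 0xc0000000u   /* hypothetical kernel virtual base */
#define KSIZE    0x01000000u   /* hypothetical 16 MiB kernel mapping */

struct tlbe {
    bool     valid;
    unsigned ts;               /* translation space the entry belongs to (0 or 1) */
    uint32_t va, pa, size;
};

static struct tlbe tlb[TLB_N];
static unsigned cur_ts;                      /* models MSR[IS] = MSR[DS] */
static uint32_t pc_va;                       /* address the code runs at */
static const uint32_t pc_pa = 0x00000100u;   /* where it physically lives */

/* Count matching entries; a correct TLB state has exactly one hit. */
static int hits(unsigned ts, uint32_t va, uint32_t *pa)
{
    int n = 0;
    for (int i = 0; i < TLB_N; i++) {
        const struct tlbe *e = &tlb[i];
        if (e->valid && e->ts == ts && va - e->va < e->size) {
            *pa = e->pa + (va - e->va);
            n++;
        }
    }
    return n;
}

/* The invariant: the instruction being executed must always translate. */
static bool can_execute(void)
{
    uint32_t pa = 0;
    return hits(cur_ts, pc_va, &pa) == 1 && pa == pc_pa;
}

static void tlb_write(int slot, unsigned ts, uint32_t va, uint32_t pa, uint32_t size)
{
    tlb[slot] = (struct tlbe){ true, ts, va, pa, size };
    assert(can_execute());
}

static void tlb_inval(int slot)
{
    tlb[slot].valid = false;
    assert(can_execute());
}

/* Models an rfi that changes the translation space and the PC together. */
static void switch_to(unsigned ts, uint32_t new_va)
{
    cur_ts = ts;
    pc_va = new_va;
    assert(can_execute());
}

/* State assumed on entry: one 1:1 mapping in TS=0, left by the bootloader. */
static void bootloader_state(void)
{
    for (int i = 0; i < TLB_N; i++)
        tlb[i].valid = false;
    tlb[0] = (struct tlbe){ true, 0, 0, 0, KSIZE };
    cur_ts = 0;
    pc_va = pc_pa;
}

static void remap_kernel(void)
{
    tlb_write(1, 1, 0, 0, KSIZE);         /* 1. temporary 1:1 mapping in TS=1 */
    switch_to(1, pc_va);                  /*    ...and run from it */
    tlb_inval(0);                         /* 2. drop the old TS=0 entry */
    tlb_write(2, 0, KERNBASE, 0, KSIZE);  /*    final kernel mapping in TS=0 */
    switch_to(0, KERNBASE + pc_pa);       /*    continue at the final address */
    for (int i = 0; i < TLB_N; i++)       /* 3. invalidate everything else */
        if (i != 2)
            tlb_inval(i);
}
```

Invalidating slot 0 before switching to the temporary mapping would trip the assertion, which is exactly the situation that cannot be allowed on a core whose MMU cannot be turned off.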
Then it sets up the stack, and after that we were able to go on to platform initialization in C code. In NetBSD, there is always a new start file for a new platform. This introduces some redundancy in the code because, for example, the Book-E platforms could share one file. Otherwise it was practically the same as in FreeBSD, except that NetBSD's locore is generic per platform and basically only creates the temporary stack.
In platform initialization, our main goals were to create mappings for the SoC registers, initialize the CPU, initialize the message buffer and the console, and finally bootstrap the virtual memory subsystem. Once again, in FreeBSD we hooked into the existing machdep code for Book-E: we extracted the part common to all Book-E platforms and a platform-dependent machdep. Next we mapped the SoC registers. At this point I have to say that in FreeBSD basically all embedded platforms have their internal memory-mapped registers organized in one place, in a single contiguous chunk of memory. The APM86290, however, has this area divided into several chunks, so we had to create a table of translations and map our registers according to that table. We applied minor changes to the UART driver and set up the console, and at this point in FreeBSD 9 we also set up the FDT framework. In NetBSD, of course, a new machdep file was created for the new platform. The rest was basically the same, except for filling in the exception handler stubs for Book-E processors, which are initialized at that point.
One more thing to clarify: low-level memory management support is the most sensitive area of the operating system. Bugs in memory management are the most fatal for overall system operation.
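The translation table for the non-contiguous SoC register area can be sketched as a static array of windows. The physical and virtual addresses below are hypothetical placeholders (the talk does not give the APM86290 memory map), and `map_fn` is a hypothetical hook standing in for whatever routine actually installs the TLB entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One physically contiguous chunk of SoC registers and where it is mapped. */
struct soc_window {
    uint64_t pa;     /* physical base, kept 64-bit since it may exceed 4 GiB */
    uint32_t va;     /* kernel virtual base chosen for this chunk */
    uint32_t size;
};

/* Hypothetical layout: three separate register chunks. */
static const struct soc_window soc_windows[] = {
    { 0x0000000dd0000000ull, 0xe0000000u, 0x00100000u },  /* chunk 0 */
    { 0x0000000de0000000ull, 0xe0100000u, 0x00200000u },  /* chunk 1 */
    { 0x0000000ef0000000ull, 0xe0300000u, 0x00010000u },  /* chunk 2 */
};
#define NWIN (sizeof(soc_windows) / sizeof(soc_windows[0]))

/* Translate a physical register address to its kernel VA; 0 if unmapped. */
static uint32_t soc_pa_to_va(uint64_t pa)
{
    for (size_t i = 0; i < NWIN; i++) {
        const struct soc_window *w = &soc_windows[i];
        if (pa >= w->pa && pa - w->pa < w->size)
            return w->va + (uint32_t)(pa - w->pa);
    }
    return 0;
}

/* Sanity check: the chosen virtual ranges must not overlap. */
static int soc_windows_disjoint(void)
{
    for (size_t i = 0; i < NWIN; i++)
        for (size_t j = i + 1; j < NWIN; j++) {
            const struct soc_window *a = &soc_windows[i], *b = &soc_windows[j];
            if (a->va < b->va + b->size && b->va < a->va + a->size)
                return 0;
        }
    return 1;
}

/* Hypothetical hook: the real port would install one TLB entry per window. */
typedef void (*map_fn)(uint64_t pa, uint32_t va, uint32_t size);

void soc_map_all(map_fn map)
{
    assert(soc_windows_disjoint());
    for (size_t i = 0; i < NWIN; i++)
        map(soc_windows[i].pa, soc_windows[i].va, soc_windows[i].size);
}
```

Drivers then look up their register base through the table instead of adding an offset to a single base address, which is the assumption that breaks on a chip with several register chunks.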
The lowest level of the FreeBSD and NetBSD virtual memory subsystems is pmap. pmap manages the physical address maps, maintains the page tables, and handles the memory management hardware. This was our point of interest, because this is where we had to make adjustments for our ports. We had to implement the TLB operations: functions for writing a TLB entry, reading a TLB entry, invalidating a particular entry, flushing (that is, invalidating all entries), and invalidating the entries with a particular translation ID.
Regarding device drivers: starting from 9.0, FreeBSD has adopted the flattened device tree (FDT) concept. FDT is basically a hardware description for embedded platforms that describes the existing hardware in a unified way. Thanks to FDT, we can use the same kernel for different platforms of the same family that differ in registers, configuration, number of peripherals, and so on. A few words about fdtbus and simplebus: fdtbus is the glue between FreeBSD's native newbus device driver framework and the flattened device tree description, and simplebus is the main bus attached to fdtbus. simplebus adds children to the newbus framework, and fdtbus manages resources such as interrupts and memory. In this area we had to make a small modification because, as I said before, simplebus assumes that all of the chip's registers are grouped in one place, while the DTS file we received describes the hardware by reflecting the actual bus and device hierarchy. So we modified the FDT code slightly to pass resources properly.
In NetBSD there is an autoconfiguration process. The config(1) tool generates a table describing the devices, and there are two possible ways to configure a device: direct and indirect configuration.
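The talk does not show the modified FDT code. As a hedged sketch of the underlying problem, the function below translates a register address from a nested DTS node to a CPU physical address by walking up the bus hierarchy and applying each bus's `ranges` entries, which a bus driver has to do once registers are split across several windows. Following the devicetree convention, an empty `ranges` means a 1:1 mapping and a missing one means the address is not translatable. The sketch simplifies the encoding by assuming every address and size fits in one 64-bit value, and the tree and its addresses are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct dt_range { uint64_t child, parent, size; };

/* A bus node: its `ranges` entries and its parent (NULL for the root). */
struct dt_bus {
    const char *name;
    const struct dt_range *ranges;
    size_t nranges;                  /* 0 and !identity: no `ranges` property */
    bool identity;                   /* empty `ranges;` means 1:1 */
    const struct dt_bus *parent;
};

/* Translate addr, expressed on `bus` (non-NULL), to a root (CPU) address. */
static bool dt_translate(const struct dt_bus *bus, uint64_t addr, uint64_t *out)
{
    for (; bus->parent != NULL; bus = bus->parent) {
        if (bus->identity)
            continue;
        bool found = false;
        for (size_t i = 0; i < bus->nranges; i++) {
            const struct dt_range *r = &bus->ranges[i];
            if (addr >= r->child && addr - r->child < r->size) {
                addr = r->parent + (addr - r->child);
                found = true;
                break;
            }
        }
        if (!found)
            return false;            /* not visible in the parent's space */
    }
    *out = addr;
    return true;
}

/* Hypothetical tree: two discontiguous register chunks behind one SoC bus. */
static const struct dt_bus root = { "/", NULL, 0, false, NULL };
static const struct dt_range soc_ranges[] = {
    { 0x000000, 0x0000000dd0000000ull, 0x100000 },
    { 0x100000, 0x0000000de0000000ull, 0x200000 },
};
static const struct dt_bus soc = { "soc", soc_ranges, 2, false, &root };
static const struct dt_range periph_ranges[] = {
    { 0x0, 0x100000, 0x10000 },
};
static const struct dt_bus periph = { "periph", periph_ranges, 1, false, &soc };
```

For example, a device with `reg` offset 0x400 under the hypothetical `periph` bus ends up at physical address 0xde0000400 after passing through both buses.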
In direct configuration, the bus is fully aware of the devices in the system, even if the corresponding driver is not present. In indirect configuration, the driver has to probe the bus to find its device. In NetBSD we had to write the bus drivers from scratch, whereas in FreeBSD we had the ready-to-use fdtbus and simplebus. This slide summarizes what I said about device drivers, and this is the list of devices that were brought up during our work on the APM86290: the interrupt controller, Gigabit Ethernet along with the queue manager, PCI Express, USB host, UART, I2C, GPIO, and RPC.
Starting with the interrupt controller: the APM86290 incorporates an interrupt controller compliant with the OpenPIC register interface specification 1.2. This is a well-known piece of hardware in FreeBSD: there was already a ready-to-use OpenPIC driver and a machine-dependent interrupt management layer located in intr_machdep.c. A very useful feature of FreeBSD is that incoming interrupts are serviced in a way similar to threads, so full preemption is implemented. In NetBSD there was no ready-to-use generic OpenPIC driver. There were OpenPIC drivers, but they were designed for specific platforms, so we decided to combine the generic PowerPC interrupt layer, our own code, and the OpenPIC driver to support the controller. Unfortunately, NetBSD uses SPLs, that is, system priority levels, and this gave us some trouble during development.
The Ethernet controller cooperates with the QM to maximize performance. We created four types of queues for the Ethernet controller: the receive queue and the transmit queue, which are working queues, a completion queue, and, of course, a free pool. In the data path there is also a packet classifier, which is not currently supported, but it is programmed to pass through all incoming packets.
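The four Ethernet queues listed above follow the second queue-usage model. Here is a simplified, single-threaded C model of them, assuming one free pool shared by receive and transmit and modeling the MAC as plain functions. All function names are hypothetical and are not the driver's actual entry points; a real driver would also hand the received buffer to the network stack instead of copying it out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define QDEPTH 16   /* hypothetical ring size */
#define NBUFS  8    /* hypothetical buffer count */
#define BUFSZ  128  /* hypothetical buffer size */

struct msg { uint16_t buf, len; };
struct queue { struct msg ring[QDEPTH]; unsigned head, tail, count; };

static bool q_put(struct queue *q, struct msg m)
{
    if (q->count == QDEPTH) return false;
    q->ring[q->tail] = m;
    q->tail = (q->tail + 1) % QDEPTH;
    q->count++;
    return true;
}

static bool q_get(struct queue *q, struct msg *m)
{
    if (q->count == 0) return false;
    *m = q->ring[q->head];
    q->head = (q->head + 1) % QDEPTH;
    q->count--;
    return true;
}

static uint8_t dram[NBUFS][BUFSZ];
static struct queue fp, rxq, txq, compq;  /* free pool, RX/TX work queues, completion */
static unsigned wire_frames;              /* frames sent by the model MAC */

static void eth_init(void)
{
    memset(&fp, 0, sizeof fp);   memset(&rxq, 0, sizeof rxq);
    memset(&txq, 0, sizeof txq); memset(&compq, 0, sizeof compq);
    wire_frames = 0;
    for (uint16_t i = 0; i < NBUFS; i++)
        q_put(&fp, (struct msg){ i, 0 });
}

/* Driver transmit: free-pool buffer -> TX work queue (QDEPTH > NBUFS, so no overflow). */
static bool drv_tx(const void *frame, uint16_t len)
{
    struct msg m;
    if (len > BUFSZ || !q_get(&fp, &m)) return false;
    memcpy(dram[m.buf], frame, len);
    m.len = len;
    return q_put(&txq, m);
}

/* Model MAC: send everything queued, report each message on the completion queue. */
static void mac_tx(void)
{
    struct msg m;
    while (q_get(&txq, &m)) {
        wire_frames++;
        q_put(&compq, m);
    }
}

/* Driver completion handler: recycle transmitted buffers into the free pool. */
static unsigned drv_tx_complete(void)
{
    struct msg m;
    unsigned n = 0;
    while (q_get(&compq, &m)) {
        m.len = 0;
        q_put(&fp, m);
        n++;
    }
    return n;
}

/* Model MAC receive: the frame lands in a free-pool buffer, posted to RX. */
static bool mac_rx(const void *frame, uint16_t len)
{
    struct msg m;
    if (len > BUFSZ || !q_get(&fp, &m)) return false;   /* no buffer: drop */
    memcpy(dram[m.buf], frame, len);
    m.len = len;
    return q_put(&rxq, m);
}

/* Driver RX handler: copy the frame out and return the buffer to the pool. */
static uint16_t drv_rx(void *out, uint16_t cap)
{
    struct msg m;
    if (!q_get(&rxq, &m)) return 0;
    uint16_t n = m.len < cap ? m.len : cap;
    memcpy(out, dram[m.buf], n);
    m.len = 0;
    q_put(&fp, m);
    return n;
}
```

The completion queue is what lets the transmit side learn that the hardware is done with a buffer before that buffer is put back into the free pool.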
When the network stack wants to transmit a packet, it calls the driver's if_start routine (AMI is short for AppliedMicro Ethernet, just to be clear), which starts sending packets. At that moment, the QM takes the messages from the processor and delivers them to the controller. Once the packet has been sent, the completion-queue callback handlers are executed and the processor is informed about the completion of the task. As for receiving packets: when a packet is received by the controller, the receive handler is called, a message is sent to the processor, and the data can be acquired from the buffer. Of course, to ease our development, we implemented extended debugging features hooked into the built-in kernel debugger, so it is fully supported.
Now, testing and debugging. Apart from JTAG debuggers and the debugging facilities integrated in the chip, it is convenient when the kernel has its own facilities, and these are presented now: the testing frameworks and the kernel debugging features. Both FreeBSD and NetBSD incorporate an in-kernel debugger. In FreeBSD it can be enabled easily from the kernel configuration file by adding two options, KDB and DDB, and it needs basic console initialization in order to communicate with the user. The kernel tracing facility, KTR, is a FreeBSD feature that can be enabled by adding the KTR option to the kernel configuration file. It logs kernel actions while the kernel is running; depending on the configuration, the log is either printed to the console or stored in memory and read after a problem occurs. NetBSD is equipped with the Automated Testing Framework (ATF), located in /usr/tests. Running it is very simple, as you can see. But in that case we need a working system, so it is mainly good for finding issues that are not visible at first sight.
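The idea of keeping a trace in memory and reading it after a problem occurs can be sketched as a fixed-size ring that overwrites its oldest entries. This is not FreeBSD's KTR implementation or API (in the kernel, the CTR* macros and ktrdump(8) play these roles); the names and sizes here are assumptions for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define TRACE_ENTRIES 4    /* hypothetical ring size */
#define TRACE_MSGLEN  64

struct trace_entry {
    unsigned long seq;     /* monotonically increasing event number */
    char msg[TRACE_MSGLEN];
};

static struct trace_entry trace_buf[TRACE_ENTRIES];
static unsigned long trace_seq;   /* total number of events ever logged */

/* Log one event; once the ring is full, the oldest entry is overwritten. */
static void trace(const char *fmt, ...)
{
    struct trace_entry *e = &trace_buf[trace_seq % TRACE_ENTRIES];
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(e->msg, sizeof e->msg, fmt, ap);
    va_end(ap);
    e->seq = trace_seq++;
}

static unsigned trace_count(void)
{
    return trace_seq < TRACE_ENTRIES ? (unsigned)trace_seq : TRACE_ENTRIES;
}

/* i = 0 is the oldest entry still in the ring; NULL if out of range. */
static const struct trace_entry *trace_get(unsigned i)
{
    if (i >= trace_count())
        return NULL;
    unsigned long first = trace_seq - trace_count();
    return &trace_buf[(first + i) % TRACE_ENTRIES];
}

/* Print the surviving history, oldest first, e.g. after a failure. */
static void trace_dump(FILE *out)
{
    for (unsigned i = 0; i < trace_count(); i++) {
        const struct trace_entry *e = trace_get(i);
        fprintf(out, "%lu: %s\n", e->seq, e->msg);
    }
}
```

This sketch is not safe against concurrent CPUs or interrupts; an in-kernel version would need an atomic index or a lock around the slot selection.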
This kind of testing framework is included in NetBSD. As for our current state, we support the full PPC465 core complex, the interfaces described before, and the chip's special features, namely the message-passing architecture. As future work, it would be nice to support SMP, because there are two processors in the package, as well as SATA and LDVACH, and to extend our use of the QM, because the Ethernet controller is not the only device able to use that facility. And of course the cryptographic engines and power management support, which is one of the most appreciated features of the APM86290.
I would like to thank some people at this point: Rafał Jaworowski and Tantua Wysinka, the mentors of this project, and all the people who contributed their work to it: Grzegorz Bernacki, Michał Mazur, Marcin Ropa, Łukasz Wójcik, and Piotr Życik. This is the whole Semihalf group. So that is all from me. Are there any questions? If not, then thank you.
I would rather not answer this question. Personally, it was easier for me to work with FreeBSD than with NetBSD on PowerPC platforms.
I have a short question. You talked about a queue management system implemented in hardware. Which peripherals can currently use this system? You said Ethernet can use it; are there more peripherals using it?
The Ethernet controller is using it, and other peripherals can also use these features, for instance USB, I believe. And crypto, yes.
One more time, please, because I can't hear you. I'm sorry.
Is your porting work shared back into the trees, FreeBSD and NetBSD?
We are going to integrate it into the FreeBSD mainline, but currently there are no plans to integrate it into NetBSD.
Oh, we can help. Here's some fallen tears.
How much more work is needed to integrate the cryptographic accelerator?
I can't answer this question because I simply don't know.
But if you are very curious about this, please leave me your contact details and we will answer in a few days. Okay, so I think that's all. Thank you very much for your attention.