Okay, good morning, everyone, and welcome to my talk. I'm tempted to say "and now for something completely different": I'll be talking about I2C multi-controller support and how to do controller target mode in QEMU. I'm a co-maintainer of the NVMe device in QEMU, so why would I even care about I2C? The reason is the NVMe Management Interface (NVMe-MI), which I want to be able to emulate. NVMe-MI provides an out-of-band interface to manage NVMe devices and enclosures, and it uses the Management Component Transport Protocol (MCTP). MCTP can run on a variety of transports: one of them is I2C/SMBus, another is PCIe vendor-defined messages, and both serve as out-of-band communication paths. This talk is about the work we're doing to emulate the I2C binding.

As I said, MCTP is transport-independent. Everything is transmitted in packets, which are the unit of data transfer, and their size is typically defined by the transport. Packets are assembled into messages, and messages are what is passed between MCTP endpoints. An endpoint is basically a terminus: the entity that receives a message, handles it, and sends back a response. This talk is not about MCTP itself; it's about what is required to support MCTP in QEMU.

The goal of this work was to be able to launch an emulated Baseboard Management Controller (BMC) platform with an NVMe-MI device hooked up to the I2C bus. Because of the way MCTP uses I2C, this requires target devices to be able to act as a controller on the bus, something QEMU could not handle prior to this work. It also requires a mechanism for delivering the data that these targets send back to the bus controller up to the host. That requires so-called target mode in the controllers, which we'll also look at.
I2C is the Inter-Integrated Circuit bus, and it's a very simple bus: a synchronous, controller-target, two-wire serial bus. The controller addresses a specific target on the bus by transmitting the address of the target it wants to communicate with. If the target is there, it acknowledges its presence, and then the controller can start transmitting data. This also works in the other direction.

As you can see here, a transfer begins with a start condition and ends with a stop condition, which are special states on the wires. After the start condition, the controller sends the 7-bit address followed by one bit indicating whether this is a write or a read. If the target acknowledges, data can start flowing. In this work we're mostly interested in writes, where the controller is always sending to the target, but data can also flow from the target, in which case the target drives the data line. Each frame is one byte, and each one is acknowledged by the receiving party. At some point the controller decides it has what it wanted, issues a stop condition, and the transfer, that is, the full message, is done.

MCTP over I2C is based on the SMBus block write protocol, which is just a predefined layout of these frames. There's the start condition, the address, and the read/write bit, which is always zero to indicate a write. Then comes a command frame that is always 0x0F to indicate MCTP, then a byte count indicating how many data bytes are in this transfer, and then the data bytes. At the end there is the Packet Error Code (PEC), which is a CRC-8 over the entire message.
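To make the frame layout concrete, here is a small C sketch that builds an SMBus block write carrying MCTP data and appends the PEC. It assumes the standard SMBus PEC definition (CRC-8 with polynomial 0x07, initial value 0, computed over every byte including the address byte). `smbus_pec()` and `smbus_block_write()` are illustrative helpers, not functions from QEMU or the kernel, and the data bytes are treated as opaque.

```c
#include <stddef.h>
#include <stdint.h>

#define SMBUS_CMD_MCTP 0x0F /* command code used for MCTP block writes */

/* SMBus Packet Error Code: CRC-8, polynomial x^8 + x^2 + x + 1 (0x07),
 * initial value 0, processed MSB first over every byte of the message. */
static uint8_t smbus_pec(const uint8_t *buf, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                               : (uint8_t)(crc << 1);
        }
    }
    return crc;
}

/* Build an SMBus block write to the 7-bit address 'addr'.
 * Layout: [addr << 1 | W] [0x0F] [count] [data...] [PEC].
 * Returns the number of bytes written to 'out', or 0 if it does not fit. */
static size_t smbus_block_write(uint8_t addr, const uint8_t *data, uint8_t count,
                                uint8_t *out, size_t out_len)
{
    size_t n = 0;
    if (out_len < (size_t)count + 4) {
        return 0;
    }
    out[n++] = (uint8_t)((addr & 0x7F) << 1); /* R/W bit 0 means write */
    out[n++] = SMBUS_CMD_MCTP;
    out[n++] = count;                         /* number of data bytes */
    for (uint8_t i = 0; i < count; i++) {
        out[n++] = data[i];
    }
    out[n] = smbus_pec(out, n);               /* PEC covers everything before it */
    return n + 1;
}
```

For example, a three-byte payload sent to address 0x1d produces seven bytes on the wire: 0x3a, 0x0F, 0x03, the three data bytes, and the PEC.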
Because MCTP uses block writes exclusively, the receiving party somehow has to know where to send its response. So the first data byte holds the I2C source address.

In QEMU, I2C is emulated with the I2C bus QBus. An I2C controller (there are various controller models that implement this) owns the bus, and the targets, such as sensors or the NVMe-MI device, are children on the bus. We try to emulate how the hardware works, so the controller signals a start condition by calling i2c_start_send(), i2c_start_recv(), or the generic i2c_start_transfer() on the bus. Note that these calls address the bus, not a specific target object; the target address is just one of the parameters. The stop condition is i2c_end_transfer(). Once the transfer is set up, we send or receive bytes with i2c_send() and i2c_recv(), again addressing the bus and not the particular target. The acknowledgement (ACK or NACK) is implicit in the return code of these calls.

When you implement a target, such as a sensor or some other basic I2C device, you implement the I2C slave device class and provide its send and recv callbacks. These are called on whichever target is addressed by the current transaction when the controller calls i2c_send() and i2c_recv(). There is also an event callback that lets the target react to the start and stop conditions, that is, to the start transfers and to i2c_end_transfer(). In the simple case of a single-controller transaction, the controller makes these i2c_send() and i2c_recv() calls to the target, and everything is synchronous: you start the transfer, send your bytes, and end the transfer.
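The synchronous model just described, where every call goes to the bus and the ACK/NACK is simply the return code, can be illustrated with a self-contained toy. The names (`ToyBus`, `toy_start_send()`, `toy_send()`, `toy_end_transfer()`, `buffer_send()`) are hypothetical; they only mirror the shape of QEMU's `i2c_start_send()`, `i2c_send()`, `i2c_end_transfer()` and the slave class's send/event callbacks, and the real signatures differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of a synchronous I2C bus (not QEMU code). */
enum toy_event { TOY_START_SEND, TOY_FINISH };

typedef struct ToyTarget ToyTarget;
struct ToyTarget {
    uint8_t address;                                /* 7-bit target address */
    int (*send)(ToyTarget *t, uint8_t data);        /* 0 = ACK, nonzero = NACK */
    int (*event)(ToyTarget *t, enum toy_event ev);  /* optional start/stop hook */
    uint8_t rx[32];
    size_t rx_len;
};

typedef struct {
    ToyTarget *targets[8];
    size_t ntargets;
    ToyTarget *current;   /* target addressed by the ongoing transfer */
} ToyBus;

/* Start condition + address byte: the call goes to the bus, which finds the target. */
static int toy_start_send(ToyBus *bus, uint8_t address)
{
    bus->current = NULL;
    for (size_t i = 0; i < bus->ntargets; i++) {
        ToyTarget *t = bus->targets[i];
        if (t->address == address) {
            int rc = t->event ? t->event(t, TOY_START_SEND) : 0;
            if (rc == 0) {
                bus->current = t;
            }
            return rc;
        }
    }
    return -1;  /* no device at this address: NACK */
}

/* One data byte; the ACK/NACK is the return value. */
static int toy_send(ToyBus *bus, uint8_t data)
{
    return bus->current ? bus->current->send(bus->current, data) : -1;
}

/* Stop condition: notify the addressed target and forget it. */
static void toy_end_transfer(ToyBus *bus)
{
    if (bus->current && bus->current->event) {
        bus->current->event(bus->current, TOY_FINISH);
    }
    bus->current = NULL;
}

/* Example synchronous target: stores bytes, NACKs once its buffer is full. */
static int buffer_send(ToyTarget *t, uint8_t data)
{
    if (t->rx_len == sizeof(t->rx)) {
        return -1;
    }
    t->rx[t->rx_len++] = data;
    return 0;
}
```

Everything here completes inside the call: by the time `toy_send()` returns, the target has consumed the byte and decided whether to ACK it. That is exactly the property that gets in the way once a target needs to talk back.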
So the problem now is: with this API, how can the target actually reply to the controller? Before this work, it simply couldn't, because the I2C core code did not support multiple controllers. Actually, nothing in the API prevents the target from getting a reference to the bus and just starting to send, and this can even be made to work. The problem is that you end up with recursive end transfers, and it's a mess, so that's not the right way to do it.

So what does hardware do in this case? With multiple controllers, hardware uses an arbitration mechanism: a controller senses whether the bus is free or busy, and if it's free, it takes control of the bus and starts sending its message. In QEMU, luckily, we can just have the devices that want to be a controller on the bus line up nicely, and we do this by registering a callback in the form of a bottom half.

The solution, then, is that upon the end transfer, when the target gets the finish (stop condition) event, it tries to acquire the bus by registering a bottom half on the bus. At some point, the I2C core code schedules that bottom half, and the target can start a transfer and send its data back. You end up in a situation like this, where you basically have two controllers sending data to each other, and that works fine. But this is not the full story, because the main controller that owns the bus is itself driven by something: typically a host I2C bus driver, for instance the ASPEED driver in Linux.
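The arbitration idea, where would-be controllers queue a bottom half and the bus is granted only once it is idle, can be sketched as a small FIFO. This is a toy model under simplifying assumptions: the "bottom half" is a plain function pointer, and `toy_run_pending()` stands in for the main loop running scheduled bottom halves. In upstream QEMU the corresponding entry points are, as far as I know, `i2c_bus_master()` and `i2c_bus_release()`, but the code below does not reproduce them.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

typedef void (*toy_bh_fn)(void *opaque);   /* stands in for a QEMU bottom half */

typedef struct {
    bool busy;                        /* a controller currently owns the bus */
    toy_bh_fn pending[MAX_PENDING];   /* FIFO of would-be controllers */
    void *opaque[MAX_PENDING];
    size_t head, count;
} ToyBus;

/* A target that wants to become controller queues its bottom half. */
static bool toy_bus_request(ToyBus *bus, toy_bh_fn bh, void *opaque)
{
    if (bus->count == MAX_PENDING) {
        return false;
    }
    size_t slot = (bus->head + bus->count) % MAX_PENDING;
    bus->pending[slot] = bh;
    bus->opaque[slot] = opaque;
    bus->count++;
    return true;
}

/* Stop condition: the current controller lets go. Nothing else runs here,
 * so there is no recursion back into a target from inside end_transfer. */
static void toy_end_transfer(ToyBus *bus)
{
    bus->busy = false;
}

/* Run later from the main loop: if the bus is idle, grant it to the next waiter. */
static bool toy_run_pending(ToyBus *bus)
{
    if (bus->busy || bus->count == 0) {
        return false;
    }
    toy_bh_fn bh = bus->pending[bus->head];
    void *opaque = bus->opaque[bus->head];
    bus->head = (bus->head + 1) % MAX_PENDING;
    bus->count--;
    bus->busy = true;    /* the granted controller now owns the bus */
    bh(opaque);
    return true;
}
```

The important property is that ending a transfer only marks the bus idle; the target's reply runs afterwards, from the main loop, instead of recursively inside the end of the previous transfer.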
Because the host driver works through MMIO, the old recursive approach had another problem: you would end up blocking the guest while handling all of this, and it could keep going on and on with the guest blocked and nothing really working. An MMIO write triggers the command, you start the transfer, and you get another bus write to actually send the data.

Even with the new approach, where the target acquires the bus once the full message has been sent to it, getting the data back is a problem because the transfers are fully synchronous. For the driver to work, it must pick up each byte before it can receive the next one, because the ASPEED model has only a single byte register for received data. So the target cannot just send synchronously: the interrupt for each byte must be handled before the next byte is sent, because that data somehow has to make its way up into the bus driver. We need some way to suspend the target at this point.

The first problem, of course, is that the controllers don't implement the slave interface, so there is no target for this transaction. And, as I said, if you implement it naively, you end up overwriting the received data. The way to make this work is for the controllers to support target mode, which basically means that the controller is a target on its own bus. This allows the host driver to read the data being sent to the device and to acknowledge to the controller that it has read it, so the controller can continue from there. No controllers in QEMU supported this. The biggest problem is that it breaks a fundamental rule in the I2C core code: all transfers must complete immediately.
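The single-register constraint, and why a controller in target mode must hold back its bus acknowledgement until the guest has consumed each byte, can be seen in a toy model. This is not QEMU's ASPEED code: `ToyCtrlTarget`, `ctrl_send_async()`, `ctrl_guest_read()` and `ctrl_guest_clear_irq()` are hypothetical names, the two guest functions stand in for the host driver's MMIO accesses, and `bus_acks` merely counts where an acknowledgement would be sent on the bus.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy controller acting as a target on its own bus (not QEMU code). */
typedef struct {
    uint8_t byte_buf;     /* the one-byte receive data register */
    bool irq;             /* "byte received" interrupt pending for the guest */
    bool ack_owed;        /* the sender is waiting for our ack */
    unsigned overruns;    /* bytes that arrived before the guest read the last one */
    unsigned bus_acks;    /* acknowledgements sent back on the bus */
} ToyCtrlTarget;

/* Asynchronous send callback: latch the byte and raise the interrupt,
 * but do not acknowledge it on the bus yet. */
static void ctrl_send_async(ToyCtrlTarget *c, uint8_t data)
{
    if (c->irq) {
        c->overruns++;    /* previous byte not consumed yet: it is lost */
    }
    c->byte_buf = data;
    c->irq = true;
    c->ack_owed = true;
}

/* Guest reads the data register (MMIO read). */
static uint8_t ctrl_guest_read(const ToyCtrlTarget *c)
{
    return c->byte_buf;
}

/* Guest clears the interrupt status (MMIO write); only now is the byte acked. */
static void ctrl_guest_clear_irq(ToyCtrlTarget *c)
{
    c->irq = false;
    if (c->ack_owed) {
        c->ack_owed = false;
        c->bus_acks++;
    }
}
```

If a sender pushes three bytes without waiting, two of them are overwritten and the guest only ever sees the last one; if it waits for each acknowledgement, every byte reaches the guest. The waiting is what requires transfers that do not complete immediately.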
So the first thing to fix is the constraint that transfers are synchronous. We do this by adding a new asynchronous version of i2c_send(), i2c_send_async(), and a new event for the event callback. The nice thing is that we already have the infrastructure: the bottom half that the target registers on the bus. If we reuse that bottom half, allow it to yield, and turn it into a state machine, then the bottom half knows whether it is just starting the send, in the middle of sending a byte, whether the byte has been acknowledged, and so on. To complete each step, we also add a new function to the I2C core, i2c_ack(). It is basically the same thing as returning an integer from the send callback, except that the receiver explicitly signals, at a later point, that it now got the byte.

Another nice thing is that this doesn't affect any existing device models. All the sensors and other targets that are synchronous in nature can just remain synchronous, and asynchronous device models, like the I2C bus controllers, can choose not to implement the synchronous version and only implement the asynchronous one.

The second thing we had to solve was adding target mode support, and we chose the ASPEED controller for this. The reason is that the Linux kernel driver already has good support for it. Also, in QEMU, the authors of the ASPEED controller model had left it as a fill-in-the-blanks exercise: the code was full of "TODO: make slave mode work" and the like. Given the Linux kernel driver code, it was surprisingly straightforward to work out which interrupts to raise at which points. This is already in upstream QEMU.
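The bottom-half state machine described at the start of this part can be sketched as follows. The state names and the hook-based `AsyncSender` struct are hypothetical: the three hooks stand in for starting an asynchronous send, `i2c_send_async()`, and `i2c_end_transfer()`, and `sender_ack()` plays the role of `i2c_ack()` followed by rescheduling the bottom half. It models the control flow only, not QEMU's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* States of the sending target's bottom half (hypothetical names). */
enum tx_state { TX_ACQUIRE, TX_SEND, TX_WAIT_ACK, TX_DONE };

typedef struct {
    enum tx_state state;
    const uint8_t *buf;
    size_t len, pos;
    /* Hooks standing in for the asynchronous bus API. */
    void (*start_send_async)(void *ctx);
    void (*send_async)(void *ctx, uint8_t byte);
    void (*end_transfer)(void *ctx);
    void *ctx;
} AsyncSender;

/* Body of the bottom half: do one step, then yield back to the main loop. */
static void sender_bh(AsyncSender *s)
{
    switch (s->state) {
    case TX_ACQUIRE:              /* bus just granted: address the receiver */
        s->start_send_async(s->ctx);
        s->state = TX_WAIT_ACK;
        break;
    case TX_SEND:
        if (s->pos == s->len) {   /* every byte acked: stop condition */
            s->end_transfer(s->ctx);
            s->state = TX_DONE;
        } else {
            s->send_async(s->ctx, s->buf[s->pos++]);
            s->state = TX_WAIT_ACK;
        }
        break;
    case TX_WAIT_ACK:             /* receiver has not acked yet: just yield */
    case TX_DONE:
        break;
    }
}

/* The receiver's late acknowledgement; the bottom half resumes in TX_SEND. */
static void sender_ack(AsyncSender *s)
{
    if (s->state == TX_WAIT_ACK) {
        s->state = TX_SEND;
    }
}
```

Each run of the bottom half does at most one bus operation and then returns, so the guest is never blocked waiting for the whole message; the sequence only advances when the receiving side acknowledges.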
Apart from making sure the right interrupts are raised, the way we added target mode was to add another target directly on the bus. As I mentioned, this target implements only the asynchronous send callback and the event callback. Whenever it gets a byte via send_async, it copies the data into its byte buffer register and raises an interrupt to tell the host that data is waiting. The host driver picks up the data and acknowledges the interrupt, and then the ASPEED device model acknowledges on the bus that it's ready for the next byte.

So it starts to look like this. When we're done sending the message, the MCTP packet, to the target, we end the transfer, and the target over here acquires the bus. It processes the command and prepares a response. It then tells the ASPEED controller that it wants to start an asynchronous transfer, and the host driver acknowledges that slave mode is enabled and that it's ready, so the target can start sending data. Then we basically do a dance: the target does an async send of a byte, the host driver acknowledges it, we call i2c_ack(), and this continues until the full message has been sent. Finally, the bottom half ends the transfer and releases the bus.

To put this to work, we added an abstract MCTP target device: an I2C target that implements the core of MCTP. It handles the I2C transport, sets up the right CRC (PEC) calculations, and encapsulates the MCTP data. It also handles MCTP control messages, which are used to set up the MCTP network, and it implements the send and event callbacks.
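Encapsulating a response means splitting the message into MCTP packets, each of which is then carried in one SMBus block write. The sketch below assumes the MCTP base specification's transport header (header version, destination EID, source EID, then a flags byte with SOM, EOM, a 2-bit packet sequence number, the tag-owner bit and a 3-bit message tag) and the 64-byte baseline transmission unit. `mctp_packetize()` is a hypothetical helper, not QEMU's `i2c_mctp_schedule_send()` implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MCTP_BTU      64     /* baseline transmission unit (payload bytes) */
#define MCTP_HDR_LEN  4
#define MCTP_FLAG_SOM 0x80   /* start of message */
#define MCTP_FLAG_EOM 0x40   /* end of message */
#define MCTP_FLAG_TO  0x08   /* tag owner */

typedef struct {
    uint8_t bytes[MCTP_HDR_LEN + MCTP_BTU];
    size_t len;
} MctpPacket;

/* Split 'msg' (whose first byte is the message type) into MCTP packets.
 * Returns the number of packets, or 0 if 'msg' is empty or 'max' is too small. */
static size_t mctp_packetize(uint8_t dest_eid, uint8_t src_eid, uint8_t tag,
                             bool tag_owner, const uint8_t *msg, size_t len,
                             MctpPacket *out, size_t max)
{
    if (len == 0) {
        return 0;
    }
    size_t npkts = (len + MCTP_BTU - 1) / MCTP_BTU;
    if (npkts > max) {
        return 0;
    }
    for (size_t i = 0; i < npkts; i++) {
        size_t off = i * MCTP_BTU;
        size_t chunk = (len - off < MCTP_BTU) ? len - off : MCTP_BTU;
        /* Packet sequence number wraps modulo 4; tag is 3 bits. */
        uint8_t flags = (uint8_t)(((i % 4) << 4) | (tag & 0x07));
        if (i == 0) {
            flags |= MCTP_FLAG_SOM;
        }
        if (i == npkts - 1) {
            flags |= MCTP_FLAG_EOM;
        }
        if (tag_owner) {
            flags |= MCTP_FLAG_TO;
        }
        MctpPacket *p = &out[i];
        p->bytes[0] = 0x01;      /* header version 1 */
        p->bytes[1] = dest_eid;
        p->bytes[2] = src_eid;
        p->bytes[3] = flags;
        memcpy(&p->bytes[MCTP_HDR_LEN], msg + off, chunk);
        p->len = MCTP_HDR_LEN + chunk;
    }
    return npkts;
}
```

A 100-byte response, for instance, becomes two packets: 64 payload bytes with SOM set and sequence number 0, then 36 payload bytes with EOM set and sequence number 1. A response clears the tag-owner bit and reuses the request's tag.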
The abstract MCTP device is synchronous on its own, because all it needs to do is soak up the message. As soon as the full message is there, it can deliver it to a derived device that implements the actual MCTP message type, in this case NVMe-MI. When the response data is ready, the derived device calls i2c_mctp_schedule_send(), which is just a way of getting all this asynchronous machinery started. Then the abstract device takes care of sending the data back to the host, packet by packet.

To get all this up and running and try it out, the MCTP framework in the kernel needs to know that an MCTP controller is available. In this case, you modify the device tree to say that the bus supports multi-master, that there is an MCTP controller on this bus, and that at this specific I2C address there is an I2C controller that supports slave mode, which we want set up at that address.

To launch this, we use qemu-system-arm with the ASPEED AST2600 evaluation board. We have a kernel supporting I2C and the MCTP framework, a very simple root file system, and our modified device tree, and then we add the I2C-based NVMe-MI device, giving it an I2C address and an MCTP endpoint ID. As it boots, you'll see the MCTP core being registered in the kernel and a slave device being added on the bus, and then we can start using this entire framework, which was only recently added to the kernel. The mctp tool works much like the ip route commands.
You tell it that on this particular interface, I'm giving myself MCTP endpoint ID 8, and I bring the link up. Then I add a route to the device over that interface, and because this is an I2C-based transport, we need to tell the routing machinery in the kernel at which I2C address the endpoint is available. With these four commands, the MCTP stack is bootstrapped and we have an MCTP network running.

Now we can use this to exercise MCTP commands and the NVMe-MI device. We can use the example tool included in libnvme to communicate over I2C directly from the host with the NVMe device and get this information out. So this does work.

When I posted this, there was a surprising amount of interest. In the work Peter is doing on upstreaming Meta's Baseboard Management Controller platforms, they apparently use target mode and multi-master functionality extensively. He raised the issue that, right now, two ASPEED controllers can't talk to each other: one of the boards basically has an AST2600 and, I think, an AST1030 that want to communicate over I2C. They can't do that yet because the controller's transmits are always done synchronously. So we need some way of changing the logic inside the ASPEED model to say: if I'm targeting an asynchronous device, then I need to use the asynchronous API. Peter and I talked a bit about that, and we have a good idea about how to solve it. Also, because MCTP uses block writes exclusively, the target mode in the ASPEED model currently doesn't support target-mode receive: it only handles data being sent to it, that is, the MCTP block write.

So that's it for my talk. Are there any questions? Yeah, so the question, I think, is whether we could just have one controller, basically without any arbitration.
Yeah, I guess that's possible, but the core idea was that we wanted to be able to hook up a target device on the same bus as all the other devices, which could be sensors and everything else, and to use the same API and the same I2C core for this transfer. As I mentioned, the biggest problem was actually the need for the asynchronous part. Multi-master was not: I had an initial solution that basically just swapped the active target around on the bus. But you still ended up in the situation where, when things were actually being driven through the target controller, you were in a VM exit and blocking everything. There needed to be some way of yielding, and that's why we did it like this.

So the question is whether you have to provide your own device tree. If you build this on the ASPEED, which right now is the only controller that supports this, then this is the size of the overlay you need. And this is not because QEMU or anything like that requires it; it's only because the kernel driver needs this attribute to initialize that device. I know that Jonathan Cameron did some work with multi-master and the I2C stuff where he baked the device directly into the board. Here we are adding it dynamically, but he built it into the ASPEED board and set up the device tree from within. So if your board includes a static device, you can build it into the device tree directly. Any other questions? Thank you.