So, welcome, and thanks for being here. I have to change things a little bit: as mentioned before, there will be no live demos today, sadly. But I'll be at the showcase tomorrow evening with my laptop, hopefully in better condition than today, and then we can have all the live demonstrations which I sadly can't do now.

As the title already said, this talk is about making I2C more robust. The motivation was that, as the I2C maintainer for (I'm surprised myself) five years now, I found out that when it comes to special error conditions on the I2C bus, the drivers mostly miss the same things. They look different because the IP cores and the registers look different, but at the ground level, the problems that arise are pretty much the same. And honestly, I wasn't too sure about my own area: I'm a consultant, mainly contracted by Renesas, so I'm taking care of the Renesas I2C IP cores, and I wasn't too sure if those all behaved correctly. The thing was, the problem cases were just too rare and pretty hard to reproduce. I mean, it's somehow a good sign: I2C just works, right? Mostly.

So I'm very thankful to Renesas for that. They wanted to have a reliable mechanism to produce these errors, and they funded this as a first step before we now, as a second step, can go and fix the drivers. And as good citizens in the community, this is all open source and will be available for everyone. I think it will be very good if more people use it.

Okay, here's a little advertisement for the sigrok project, which I'm mainly using. It's a bit moot now because I cannot do the live demos, but I'll mention it because I really like the fact that you have one software which you can use with various multimeters or logic analyzers, instead of having proprietary software with each new device and having to learn new stuff every time. You probably understand what I mean.
This is a short description from their website, with, I think, a pretty nice logo. Here, also from their website, are the design goals, which I think they meet pretty well, especially being cross-platform and having lots of drivers for hardware. I think this is also where most development is happening right now: adding new hardware drivers and protocol decoders. The decoders are super nice and easily accessible because they are written in Python, which makes it pretty easy for people to make something readable out of all the zeros and ones. So especially for I2C, you can immediately see where the start is, where the stop is, and what the single bits mean.

Why is that going so fast? Yeah, I'm not going through all of that. You can read it and check the website. I think it's a lively project, but not surprisingly they could use some more help, especially if you're into GUI coding. I think the guys doing the PulseView GUI are doing a great job, but there are lots of things which still could be done.

And to show that it's really super simple to measure I2C: there's my PC. Well, it's not down there. Via USB, I use the Open Bench Logic Sniffer. That's a little bit of advertisement, but hey, I like to advertise open hardware projects. It costs about 60 bucks, so I think this is doable on a private level, and if you have to convince your manager that you need that device, it's not super much either. You could also get a Bus Pirate, which is also open hardware and even cheaper; it has a little less functionality but will still do for I2C. Then we just need to connect three wires between the sniffer and some board. I had, of course, a Renesas board connected here, and then you're ready to go.

What I sadly cannot show you is how to set it up. It's mainly just selecting which hardware you're using and which channels you're using; then you give the channels a few names and add a protocol decoder.
Because of the names you have given to the channels, the protocol decoder knows, ah, this is the clock line, this is the data line, and then starts interpreting the data, which makes I2C debugging pretty easy. That would probably have needed less than a minute.

Now for the first thing which is super nice to check with this setup. It is not even an error case, but it is often done wrong. For that, I just want to give some very, very simple basics about start and stop conditions, bus idle, and stuff like that. Because I2C wires are open drain, a line has to be actively pulled down; otherwise it is always in the high state. So with bus idle, nothing happens: as shown up here, both lines, the clock line and the data line, are high. This means everything's fine, and somebody can start requesting the bus and doing transfers.

Usually, the data line is only allowed to change when the clock line is low. This is what always happens when data is communicated. That means when the clock line is high and the data line changes, this is a special event. In the first case, it is the start event: we start from the idle condition, both lines are high, and if the data line goes low, then some master is requesting the bus. This is the start of a transaction. As we will see later, it's pretty much the same for a repeated start. And the other way around, if the clock line is high and the data line has a transition from low to high, that is the stop, meaning: the bus is available, I'm done.

Remember that I2C is multi-master capable. That's not used super often, but once in a while. All the masters check for these conditions, and then they know: okay, the bus is busy, I will wait for a stop until I can start another transaction.

And there's one piece of terminology which is super important in I2C: the difference between a transfer and a message. A transfer is really simple: it's everything between a start and a stop.
During a transfer, the bus is held by one master and nobody else will be able to interfere. Then again, such a transfer may consist of multiple messages. Usually, one message is a write message and another one is a read message. Reads and writes need to be in separate messages, but they can still be in one transfer. The super standard example: if you want to read a register from an I2C client, you first have to send which register you want to get data from. So the first message is a write. Then you want the actual data, so the second message is a read. For that, you use a repeated start, which makes sure that the new message has a clear start, and you intentionally leave out the stop to make sure no one can interfere. So when reviewing patches, I try to be very clear about whether we are talking about transfers or messages, because there is a difference.

Oh no, that won't work. I would have loved to show you with sigrok how the difference looks on the wires. I sadly cannot; we can do this tomorrow. But just keep in mind that this is really worth checking on the wires. There are some drivers getting it wrong: instead of doing a repeated start, they use a combination of stop and start. That means there's a short bus free condition in which another master can take over the bus. So if you send out the first message, the write message to the device you want to read from, and another master interrupts you, your second message, the read request, will get a random result and not the one you want.

That is one thing. The other thing is that there are clients, and I had one, where you send a message to bring it from a default mode into a configuration mode. After the initial message entering the configuration mode, you send another write message to configure things. But as soon as you send a stop, the condition I showed you before, the client will fall back out of the configuration mode into the default mode.
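As a rough illustration of the start, stop, and repeated start rules described above, here is a minimal Python sketch. It is not how sigrok's decoder is implemented; the function names and the sample format, a list of (SCL, SDA) pairs, are assumptions made for this example only. It finds start and stop conditions in sampled line states and counts how many messages each transfer contains.

```python
from typing import List, Tuple

Sample = Tuple[int, int]  # hypothetical sample format: (scl, sda), 1 = high, 0 = low


def find_conditions(samples: List[Sample]) -> List[str]:
    """Return START/STOP events: SDA edges that happen while SCL stays high."""
    events = []
    for (scl0, sda0), (scl1, sda1) in zip(samples, samples[1:]):
        if scl0 == 1 and scl1 == 1:
            if sda0 == 1 and sda1 == 0:
                events.append("START")  # first start or repeated start
            elif sda0 == 0 and sda1 == 1:
                events.append("STOP")
    return events


def group_transfers(events: List[str]) -> List[int]:
    """Return the number of messages in each transfer.

    Every START (first or repeated) begins a new message;
    a STOP ends the whole transfer.
    """
    transfers = []
    messages = 0
    for event in events:
        if event == "START":
            messages += 1
        elif event == "STOP" and messages:
            transfers.append(messages)
            messages = 0
    return transfers


# Register read done correctly: start, repeated start, stop.
good = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1), (1, 0), (0, 0), (1, 0), (1, 1)]
print(group_transfers(find_conditions(good)))

# What a broken driver puts on the wires: stop + start instead of repeated start.
print(group_transfers(["START", "STOP", "START", "STOP"]))
```

The first case yields one transfer with two messages. The second yields two transfers with one message each, which is exactly the gap where another master could take over the bus.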
So if you don't have a repeated start there, you're writing something totally different somewhere else. This is really the standard case which you should pay attention to. Just recently, we had this on the I2C mailing list: there were guys assuming that this was working, and they were looking for weeks. I read here: "saved our team weeks of investigation on a major issue", because they thought their setup was wrong until they really got down to the fact that the I2C communication was wrong. They had taken it for granted that it was correct. They were modest and said, that's our problem, thank you. But I think that, in terms of good engineering, this should be the responsibility of the driver author, who should check such things. And as I've hopefully shown, or more or less indicated, it's neither complicated nor expensive to do these kinds of checks.

Now we're getting closer to the topic: the same setup can, of course, be used for the error cases I mentioned, which are not so easy to see on the bus with regular hardware because they happen rarely. The most obvious one is that the bus does not work. As I said before, the bus free condition is that both lines are in the high state. So if one of them is stuck low when it should not be, that's a problem. The bus is not considered free, and we cannot wait for years until it might accidentally become free or not. It's a bit of a problem that I2C has no timeouts defined; SMBus has them, but I2C does not, so that's a bit tricky.

As I said, I2C can be multi-master, so most authors of host drivers implement the arbitration handling: if two masters start communicating at the same time, one of course loses. The hardware detects that and reports it via an interrupt bit or some status bit. A lot of people implement that according to the documentation. But I'm not sure how many actually checked it in a multi-master setup.
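To make the arbitration idea concrete, here is a small Python sketch of two masters sending bits at the same time. It relies on the open-drain behaviour mentioned earlier: any device pulling the line low wins over a device leaving it high, so the bus level is the AND of both outputs. The function name and the bit lists are made up for this illustration, and it models only the data line, not a real controller.

```python
from typing import List, Optional, Tuple


def arbitrate(bits_a: List[int], bits_b: List[int]) -> Tuple[Optional[str], List[int]]:
    """Simulate masters A and B driving SDA bit by bit on an open-drain line.

    Returns (winner, bus_bits). The winner is "A", "B", or None when both
    sent identical bits and nobody noticed a conflict.
    """
    bus = []
    for a, b in zip(bits_a, bits_b):
        line = a & b          # open drain: a driven 0 overrides a released 1
        bus.append(line)
        if a != line:         # A released the line but reads back 0
            return "B", bus   # so A lost arbitration and must back off
        if b != line:
            return "A", bus
    return None, bus


winner, seen = arbitrate([1, 0, 1, 0], [1, 0, 0, 1])
print(winner, seen)
```

Here master A loses at the third bit because it wanted a one while B pulled the line to zero. A driver under test would have to notice exactly this situation and report it properly.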
So it would also be nice to create an easy scenario where you lose arbitration and then see how your driver reacts. And what might come to mind first, although I don't know how useful it is, is to just insert some faulty bits, make a one a zero or something like that. I will come back to that shortly.

So I got the idea: well, fault injection might be a good start to create these rare error cases. And it is as simple as that: you have an SoC with the I2C bus, here's your client, and you wire some cables so that GPIOs can access the same SCL and SDA lines. Then you use the i2c-gpio driver to manage those GPIOs, with the additional fault injector compiled in. And then you can create strange situations on the bus and check how the bus master driver handles them.

Just a few implementation details. Currently it is one block of code inside the i2c-gpio driver source, protected by an #ifdef. I think this is good for now because it is not much code. If we start adding new stuff, it might grow much larger than the actual driver source. So there's the option to do a separate module on top of that, but currently it is that way. And it uses the debugfs file system, which I can really recommend if you do this kind of stuff. It's super easy to use. You put all your files which trigger something into a subdirectory, and at the end, if you want to remove it, you just say remove that subdirectory and it will do all the necessary stuff for you. So to whoever worked on that: props, good stuff.

And now this is, I think, the error case which happens most, and the one where people want to get out of, which luckily is possible. This is when the SDA line is stuck low. That usually means you wanted to read from the device. It sends a zero bit to you, and then something happens, and the line stays at that level because the device still thinks you want to have a zero bit.
I wrote down some occasions when this can happen. We already had, I think, a proprietary first-level bootloader, where we didn't have the source code, which did some I2C transaction but handed over to Linux anyway. So in the middle of a transaction, Linux reset the I2C driver, and we had an inconsistent state on the bus. The other one, also obvious: when the watchdog kicks in while Linux itself is in the middle of a communication, bam, you might end up in such a state. And of course, always true, there are broken devices out there which just leave the bus in that state.

How to simulate that? There's a file in debugfs where you write the address of a known device. You have to know that this device exists. Then the fault injector will start putting out that address byte, saying: hey, you device, are you there? The device will acknowledge it and say, yes, I'm here. This is done by pulling SDA low. And at this point, we stop communicating. We're not saying, ah, thanks for letting me know. We just stop the communication at that point, so SDA will be left low.

Put this away, it doesn't work anyhow. This is what I wanted to show you; I hope you'll believe me that it actually works. And for me, there was a surprise: when I tried that with the PMIC, I saw that after 27 milliseconds, SDA went up again. After reading the documentation, I saw that the PMIC is a very well-designed device which pays a lot of attention to keeping the I2C bus functional. The client device has an internal timer watching the data line, and if the line is stuck, the device releases it again on its own. I didn't know that, and it was pretty nice. On another board, I discovered that we have an external chip just monitoring the I2C bus and trying to get it out of this inconsistent state on its own. So there are lots of surprises if you do that.
After that, I would have shown you the same thing with the audio codec, which is not so interested in the I2C bus, so it does not release the SDA line. If you do the incomplete transfer fault injection with that device, SDA will stay low, and you have exactly the condition we wanted to check. And as you can read up there, the I2C specification has a recommendation for how to get out of that case. What the client really wants is some clock pulses, because it still thinks you want a zero bit. If you give it more clock pulses on SCL, it says, oh yeah, okay, now you got it. At some point it will be done with transferring the byte and will release the line again. So what you have to do is clock a little bit and always check whether the line has been released or not.

We have support for that in the I2C core. Some hardware IP cores have a special bit which does this nine-times toggling to get the bus active again, so you can pass in your own recovery routine. If your IP core just has the ability to set SCL and SDA manually, you pass functions for how to do that, and the core will do all the toggling for you. Or you can specify GPIOs which are also connected to the bus, and then the I2C core will likewise try to recover the bus by generating the pulses. I think it makes a lot of sense to have that in the core because there are some gory details. We have had this for a few months, and there's still a patch pending to improve it further and get it rock solid.

So the good news is that for this case, we have a recovery with which we can get out of the situation. I would have liked to show you that, because I implemented it for one of the Renesas IP cores. It basically works: if we look at the wires, we get out of this stalled bus state. However, the hardware does not report it back the way I expected.
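The "clock a little bit and always check" procedure can be sketched in a few lines of Python. This is a toy simulation, not the code in the Linux I2C core: the StuckClient class, its bits_left counter, and the recover_bus function are hypothetical names, and a real client may behave in more complicated ways than simply counting pulses.

```python
from typing import Optional


class StuckClient:
    """Toy client that was interrupted mid-byte and holds SDA low.

    It releases SDA once it has seen `bits_left` more clock pulses.
    """

    def __init__(self, bits_left: int) -> None:
        self.bits_left = bits_left

    def sda(self) -> int:
        return 0 if self.bits_left > 0 else 1

    def clock_pulse(self) -> None:
        if self.bits_left > 0:
            self.bits_left -= 1


def recover_bus(client: StuckClient, max_pulses: int = 9) -> Optional[int]:
    """Pulse SCL up to max_pulses times, checking SDA after every pulse.

    Returns the number of pulses needed, or None if SDA is still low.
    """
    if client.sda() == 1:
        return 0              # nothing to recover
    for pulse in range(1, max_pulses + 1):
        client.clock_pulse()
        if client.sda() == 1:
            return pulse      # line released, stop clocking right away
    return None               # recovery failed, something else is wrong


print(recover_bus(StuckClient(bits_left=3)))
print(recover_bus(StuckClient(bits_left=20)))
```

The point of checking after every pulse is to stop as early as possible; a client that never lets go within nine pulses is not in the state this procedure is meant for.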
So I need to talk to our hardware team about how this bit works. But on the wires you could see that the toggling actually works and that the bus recovered from the stuck state afterwards.

No, there's one slide missing. The problem with this bus recovery is that a lot of people think it is magic and will solve all of their problems. I'm very conservative here. I think it should really only be applied to the state I just mentioned, which is described in the specification. There's one guy, maybe he's here, then we should definitely talk about it, who wants to do bus recovery although SDA is not low. As far as I understood it, I don't think that's appropriate. I think you can still send a stop signal and see what happens.

Also, a transfer timeout is not really an indication of a stalled bus. Some devices do internal operations which take a while. Think of an EEPROM which needs to erase memory before it accepts further writes. In that case, I think you should just report the timeout to the upper layer, to the client driver. The client driver knows the client and knows what the timeout should be and whatnot. I don't think you should handle that on the bus driver level and do some bit toggling which might affect other devices.

And apparently people even want to do recovery when the clock line is stuck low. I'm really confused about that, because if the clock line is stuck low, we cannot send the pulses. But I have heard of people wanting that, so we have had this on the list. So for me, it's really only for the case where you want to start a transfer and you don't find the bus free although it should be; then you can try it. And yes, as I said, I'm conservative. I understand that people want to fix their things, but if they say, yeah, we did this and it somehow works, then for me that is random. And I'm too afraid that while it works for them, it will break things for others.
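The conservative rules of thumb stated above can be written down as a tiny decision function. This is only a sketch of the policy as described in this talk, not code from any driver; the Action enum and the function name are invented for the illustration.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "bus is free, start the transfer"
    RECOVER = "SDA stuck low, try clock pulse recovery"
    REPORT = "cannot clock, report the error to the upper layer"


def action_before_transfer(scl: int, sda: int) -> Action:
    """Decide what to do when a transfer is about to start.

    scl and sda are the sampled line levels (1 = high, 0 = low).
    """
    if scl == 0:
        # With SCL stuck low no pulses can be generated at all.
        return Action.REPORT
    if sda == 0:
        # The one case where pulsing SCL is considered appropriate here.
        return Action.RECOVER
    return Action.PROCEED


for lines in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(lines, action_before_transfer(*lines).name)
```

Note what is deliberately missing: a transfer timeout does not appear as a trigger for recovery at all. In this view it is simply reported to the client driver, which knows its device.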
So I really need to be convinced, with arguments on an engineering level, before making changes to that part of the I2C core.

The next one is easy to do but still useful: we can also simulate that the clock line is stuck. For that, we have a file called "scl" which you can use to set the line high or low by just writing one or zero to it. If you write zero to it, the line is stuck low. Then you can try a communication. It should return EBUSY. And then the upper layer again, the client driver, knows: okay, the bus has been busy for several attempts now, I should really start pulling the reset line, or report to the upper layers that something is wrong here. But this is not the place where you should fiddle around with the I2C bus, apart from the fact that you cannot anyway, because you cannot clock. I wanted to show you this as well, but it's super simple: you just see that nothing happens.

This slide is basically what I just said, except that you also see that this case is mentioned in the I2C specification, and it says exactly that. It doesn't say anything about trying something on the I2C bus. It just says: do a reset. This is also why, if you have an I2C device with a reset line and you have a GPIO left, it is a good idea to have the reset line populated. It can help you a lot.

So this already works so far, and it helped me in fixing the Renesas IP core drivers. Now the outlook. What I still have in mind is the arbitration lost case. I think you can easily do it by getting an interrupt from the GPIO when another master starts a transfer and then just pulling the data line to zero for a certain amount of time. The other master should then detect that it lost the arbitration, and you can see how it reacts. I think using interrupts, this should be possible. The same goes for the SDA low case which we have here, but without needing an external device.
There, just by using the GPIO, you pull the line down yourself and count the pulses coming from the other driver. If you're happy with the count, some number between one and nine, then you release the line on your own. So I think that would be a good addition as well.

The faulty bits I mentioned before can be useful on the SMBus level. SMBus has an optional PEC byte, Packet Error Checking, which is a checksum. I have never used that in real life so far. It might be a good idea to check whether the Linux routines for that are actually working. It might also be useful if you have some integrity checking on a higher level, so you could check for that as well. I think TPM might like that. And of course, as I mentioned before, if the code grows too large, maybe we should have a separate module instead of just a huge #ifdef block.

So yeah, it was a bit rushed and without demonstrations, but this is what I wanted to talk about today. I wanted to recommend getting an easy setup to actually measure what you're doing with I2C, to check that what happens on the wires is what you expect to be there. That is already very useful for the start/stop and repeated start handling, which really must work if you don't want yourself or other people to have unexpected problems. We now have the fault injection using the i2c-gpio driver. I gave a summary of what can already be checked and what I plan as future additions. And I gave a small overview of bus recovery, when to use it and when not, because there's quite some confusion or misunderstanding about it. This is my view on the matter; if you have a different one, just meet me here at the conference. I'll be here all the time.
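Coming back to the PEC byte mentioned in the outlook: the SMBus PEC is a CRC-8 with the polynomial x^8 + x^2 + x + 1, calculated over the bytes of the message including the address bytes. The following Python sketch shows why injecting a single faulty bit would be a reasonable test for it. The function names and the example message bytes are made up for this illustration, and this is not the kernel's implementation.

```python
def smbus_pec(data: bytes) -> int:
    """CRC-8 with polynomial 0x07 and initial value 0, as used for SMBus PEC."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with one bit inverted, like an injected fault."""
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)


# Hypothetical write: address byte, command byte, one data byte.
message = bytes([0xB4, 0x06, 0x12])
pec = smbus_pec(message)

# The receiver recomputes the PEC over what it actually saw on the wires.
faulty = flip_bit(message, byte_index=1, bit=0)
print("PEC still matches:", smbus_pec(faulty) == pec)
```

A single flipped bit always changes the CRC, so a receiver that checks the PEC must reject the faulty message. Whether a given driver stack really does that is what such a fault injection would verify.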
I think we'll have a few minutes for questions. Otherwise, write me a mail or use the I2C mailing list to discuss, but you've been warned that I have a very conservative approach to this. Okay, this was an interesting talk for me because of all the challenges. Thank you for attending and for having the patience. And yeah, I will take a few questions; I think we have time for that. Thank you very much.

So then, the first question. Yes, the question was whether I have considered having an actual device which could be connected and which could also do this kind of fault injection. I have considered that. Actually, I have considered a separate device, and also, since we now have the I2C slave framework, maybe using that. But I chose this approach in the end because I wanted to have something really easy for developers to apply and to check their stuff against. Buying hardware was a bit over what I considered the limit. I wanted them to just enable a compile-time option, hook up some wires (I think having a logic analyzer is good anyhow) and be done with it. That was the motivation for doing it like this. If one of you or someone else makes a cheap, open-source, open hardware device which can be hooked onto something, I would definitely buy it. So I'm not against that approach in general.

Yes, right now, yes. So the question is: my example was just one master and one client; what happens if there are multiple clients and you start clocking the bus? Do they behave correctly, or what happens? I think a generic answer is impossible because it depends a lot on which clients you have and on whether they are broken or not. What I do think should be the standard case is that all the clients are in the same state regarding whether the bus is free or not.
So they should also know that the bus is currently busy and another client is communicating, just like the client doing the communication knows, oh, it's my turn now. So I hope the other clients will understand the clocking as: oh, it's not for me. And whenever they see the stop condition, they know: oh, now the bus is free and I can start listening for my address again. That should be the standard case for correctly working devices, but we all know hardware.

Can you repeat that again, louder? I see. So if I got you right, you're asking if I considered doing fault injection on the driver level, like raising an arbitration lost case once in a while or something. I haven't considered this as a generic framework. I do it once in a while by just hacking the driver and setting a bit in software. From a gut feeling, this is too driver-specific to be generic, or the overhead would be, I don't know. I'll think about it, but I can just tell you my gut feeling right now. If you have an idea about that, maybe we can talk about it later.

Technically, it should be enough. Every client should check for a stop condition all the time and then basically reset its state machine. I recall an email conversation where someone said that's not enough, but they didn't have a reproducible test case for it. So I was not very sure how well the problem was understood. And if it had been understood, one stop at the right point might have been enough. So my answer there: I still think one stop should be enough. And I want to be proven wrong with some non-random data.

So it looks like there are no more questions. Then I say thank you very much again. Have a nice lunch, and see you later at the conference.