Thank you very much. Today I'm going to talk about a little side project that I had to become intimately familiar with while developing drivers for the FFADO project, which writes drivers so that people under Linux can use some really cool audio devices that connect to the FireWire bus. I'll start by introducing the FireWire bus very briefly for people who may not be familiar with it, since it's not the most common thing around these days. Then I'll move on to why we do this and the easy bit of the whole problem, and then get into the main section of the talk, which deals with the so-called asynchronous packets; these end up being the really hard bit for various reasons.
For all practical purposes, the FireWire bus comes in two speed variants: 400 megabits per second and 800 megabits per second. Not many people see 800 megabit devices, but they are around; the Fireface 800 audio device from RME is one such unit. Each FireWire bus can have up to 63 devices, and the bus implements something best described as a global address space. When a device is plugged in, a bus reset occurs, the devices organise themselves and get node IDs, and at the end of the day each device ends up with a block of contiguous addresses on the bus. If any device wants to configure or communicate with another device, it does so by effectively writing into one of the addresses that belong to the target device. The address space is divided into further sections, such as node IDs, which aren't important for the purposes of this talk. Basically, we've got this unique address space and we can control devices by writing into these addresses.
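As a rough illustration of the global address space idea, here is a minimal Python sketch. It assumes the standard IEEE 1394 layout of a 64-bit address (10-bit bus ID, 6-bit node ID, 48-bit offset within the node). The function names and the example address are made up for illustration and are not taken from the talk or from any library.

```python
from typing import NamedTuple


class FireWireAddress(NamedTuple):
    """Fields of a 64-bit FireWire address (assumed IEEE 1394 layout)."""
    bus_id: int   # 10 bits; 0x3FF means "the local bus"
    node_id: int  # 6 bits; assigned to each device after a bus reset
    offset: int   # 48 bits; the register address inside the target node


def split_address(addr: int) -> FireWireAddress:
    """Split a 64-bit bus address into bus ID, node ID and offset."""
    if not 0 <= addr < (1 << 64):
        raise ValueError("FireWire addresses are 64 bits wide")
    return FireWireAddress(
        bus_id=(addr >> 54) & 0x3FF,
        node_id=(addr >> 48) & 0x3F,
        offset=addr & 0xFFFF_FFFF_FFFF,
    )


def join_address(bus_id: int, node_id: int, offset: int) -> int:
    """Build a 64-bit bus address from its three fields."""
    return ((bus_id & 0x3FF) << 54) | ((node_id & 0x3F) << 48) | (offset & 0xFFFF_FFFF_FFFF)


# Hypothetical example: a register at offset 0xFFFFF0000C04 on node 1 of the local bus.
example = split_address(0xFFC1_FFFF_F000_0C04)
print(example.bus_id, example.node_id, hex(example.offset))
```

Writing to a device is then just a matter of sending a write to the right node ID and offset, which is why knowing the offsets a vendor's software uses is most of the reverse-engineering job.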
There are two different packet types on the FireWire bus, each optimised for a different purpose, a bit like TCP and UDP really. The so-called asynchronous packets are addressed from one device to an address on the bus, which basically constitutes a register write to the target device. The bus gives no timing guarantees about when that write will actually happen, but it guarantees that it will happen. As a result, asynchronous packets are used for device configuration and control. If, for example, you hit play on your FireWire camcorder, that results in an asynchronous packet being written which tells the unit to play, and it plays.
The other type of packet is the isochronous packet. These are broadcast packets, so any node can see them if it cares to listen for them. There are up to 64 different channels on the bus, so you can conceivably have one device sending data on channel zero and another sending on channel five, and the originating device doesn't care whether there's a listener or not. It'll send it out anyway. The important point about isochronous packets is that, as part of the bus setup I mentioned earlier, each device can request a fixed bandwidth from the bus. If that request is acknowledged by the bus master, the device knows that it will always be able to get that amount of data through in a given time, so there's guaranteed bus bandwidth and guaranteed delivery timing as well. For obvious reasons, that is the sort of traffic used for transporting streaming data over the bus, so mostly audio and video. I think hard drives use that as well. Yes?
[Audience] Say you're doing transfers at 600 megabits and then plug a 400 megabit device into the bus.
Okay, so the question is: if you've got an 800 megabit bus on which 600 megabits has been reserved, and a new 400 megabit device gets plugged in and asks for 400 megabits, what happens? Basically there'll be a bus negotiation phase. Maybe the first device gets its reservation at 600 meg, then the second device requests its 400, and that request will be refused by the bus.
[Audience] Not even if the device is doing it. If you just plug in a device which is not...
Yeah, even if the device has requested the bandwidth and the bandwidth has been allocated.
[Audience] If you simply plug in a device which is idle...
Yes.
[Audience] ...and which is lower speed than the amount of traffic reserved on the bus. Will that still renegotiate the bus and kill the transfer?
It will cause a bus renegotiation, yes. Will it kill the transfer? It depends on what that new device requests during the bus reconfiguration. Are you saying that... okay. So you're trying to plug a 400 meg device into a bus that's 800 meg.
[Audience] Otherwise 800 meg.
The FireWire 800 plugs are actually different to the 400 meg plugs, so you need converter cables. What actually happens in that situation is, I think, that the bus link at the other end of that device's physical cable will drop back to 400 meg, but the other links between the 800 meg devices will stay at 800 meg, I think. But anyway, yeah.
[Audience] The FireWire bus is quite smart in that, because it's just double the bit clock, you can have 800 meg traffic go past the 400 meg device and the 400 meg device doesn't care. It's basically got a header, at 100 meg, that every device on the bus can understand. The 800 meg devices can understand everything, and if a device isn't fast enough, it just doesn't understand the rest and ignores it.
That's right, yeah. All right, I will continue.
I'm going to skip through how this bulk transfer is managed pretty quickly. The isochronous packets are managed like this. Each second is divided into 8000 so-called cycles, and each cycle is further divided into 3072 offsets. Basically, in each cycle each device can transmit exactly one isochronous packet, and that's how they arrange for the guaranteed timing and the bus bandwidth. There's an isochronous clock that runs at just over 24 megahertz, and that clock is synchronised across the bus, so every device has the same idea of what the bus clock is. The final piece of the puzzle is the so-called isochronous resource manager, the IRM. Which device becomes the IRM is decided by an election during bus negotiation. The IRM is responsible for tracking the bandwidth requests from the different devices and making sure it doesn't over-allocate bandwidth. That's just a diagram that summarises how it all works; I'll skip over it pretty quickly.
All right, so there are basically two different types of FireWire card that can be used on the bus. OHCI-compliant cards are what everybody in this room would have if they have a FireWire port in their laptop. Every OHCI-compliant card implements a standard interface to the hardware, which means an operating system basically needs one driver and it can talk to all of these cards. That's why, in the Linux kernel, we've basically only got a single driver for cards, and that's for the OHCI cards. There is another type of card called the PCILynx card, which uses a completely different chipset. It is completely incompatible with OHCI. The manufacturer, Texas Instruments, originally designed it, I believe, for hardware bus analysers, and as a result they never bothered writing example driver code for the chip.
As a consequence, while some people thought it would be really cool to put this chip on a PCI card, there was no example code from the manufacturer. They put the chip on the card and shipped the card, but with no drivers for any operating system. An interesting thing. What that basically means is that you can't use a PCILynx-based card for normal work. However, as we'll see soon, it's particularly nice that they did this, because it made my job a lot easier.
So, quickly, why do I want to see packets on the FireWire bus? Obviously, not everybody who makes devices provides drivers for Linux, so if we are to use those devices under Linux, we need to know how to control them. There's a whole debate going on about whether we should be supporting uncooperative vendors and the like. I do this work in support of pro audio devices, and the reality is that these devices cost many thousands of dollars. If somebody in the audio or music game wants to run Linux and their $4,000 investment isn't supported, then it's a non-starter. They simply won't do it. If somebody says, "Hey, I want to run Linux in my studio," and the answer is, "Oh, you've got to buy new hardware because none of yours is supported," you can probably imagine where that conversation ends up. So my motivation is to lower the bar of entry for people who have existing hardware, didn't know about any of this before they bought it, and want to use it under Linux. I think we ought to support them if we can. The other thing is that even if you've got manufacturer support, sometimes it's useful to see what packets are flying across the bus, particularly if your driver isn't doing what you think it should be doing.
So what we end up wanting to do here is capture the packets that are sent from a so-called supported operating system to our target device and observe them; that's one part. The other part is that if we're trying to debug our own Linux drivers, we want to see the exchange between the Linux system and the target device, basically to see where we're going wrong.
So, the easy bit. That slide's got a wrong title on it... sorry, no: isochronous packets, that's correct. The isochronous packets are actually easy because, as I mentioned earlier, they're broadcast packets. By definition, everything on the bus can receive these packets if it's listening to the right channel, without anything special having to be done at all. We can have an OHCI card on the bus, listen to the right channel number, and see the packets without any problem. So there's no special hardware. As long as you've got a second computer, for example, you can set that up anywhere on the bus. There's a little utility called dumpiso, which is shipped with libraw1394. It's very simple to run. The example command line is up there: dumpiso, a filename, hit Enter. Any isochronous packet that flies through the bus from then until you hit Control-C will be dumped into that firewire.dump.dat file. You can use a hex viewer to view it, and you can use dumpiso to find out what the format actually is. It's a bit messy, but it's easy, it's quick, and you can see what's going on.
As a side note, the PCILynx card, if you've got one in your system, can be used to view the isochronous packets as well. In many ways it's more flexible at the moment than the dumpiso command, simply because it's very easy to get human-readable ASCII hex out of it. You don't have to run a hex dump or anything like that; you can get it straight out of the system.
All these tools are open source, so we can hack them if we need to and change the formats around. If you're debugging a particular protocol that has weird data alignment conventions, we can deal with that without any hassle.
The asynchronous packets are a little bit trickier. I mentioned that asynchronous packets are addressed only to a specific receiver. The wonderful people who came up with the OHCI specification mandated that if an asynchronous packet isn't addressed to you, OHCI-compliant hardware drops it. What this means is that even if we want to see it, we can't, because the hardware drops it before the operating system has any chance of seeing it. Fortunately there are several ways around this, which I will outline in a minute.
This is where the useful fact about the PCILynx cards comes in. By design, unlike the OHCI cards, they won't drop asynchronous packets that aren't addressed to them. This is why they're very useful when you're doing this sort of work. It was on an earlier slide, but I didn't highlight it: you can still get these PCILynx cards for somewhere between about $100 and $300 US, I think, if you know where to look. So they are still fairly easy to find, and I think you can even pick them up secondhand on eBay at various times. They're relatively inexpensive and, as we will see, in my opinion they probably give the best compromise between expense and hassle for this sort of work.
The general idea with this protocol analysis is very similar to Tridge's French cafe technique. For those who aren't familiar with it, you basically sit down, do something, watch what happens, and try to work out which bits of what you saw correspond to a specific action.
In the context of the devices we're looking at, you might press play on something, look at the packets that go through and think, "Oh, okay, that packet caused a play." So you try it, you look at what pressing play sends, and you know at least partially how it happened. There's a question.
[Audience] Sorry, I wasn't sure. You mentioned the TI cards. Are they still available new?
Yes. The PCILynx cards are still available new. There is a company in Taiwan called IOI, I think, that makes them. They've got distributors in various countries, and that's all on their website. So it is possible to buy these things new as well. I got my card through them quite a number of years ago now, but the cards are still on their website.
So yes, we cause an action to happen, we look at what was sent to cause that action, we note it, and we do something with it. Then we repeat for every single possible control action that you could imagine for the device you're controlling, which can get tedious. I should note that some protocols are easier to deal with than others. On the MOTU audio interfaces that I've done a lot of work with, the device registers are read/write, so you can read back from a device register and get the value that was written to it. That's useful because you can see at a glance what the entire device state is just by reading its address space. Other devices use a given register for totally different things depending on whether you read it or write to it. The same register might be status when you read it and control device streaming when you write to it. Those sorts of devices can be rather confusing to analyse because the same register is being used for two totally disparate things. It's all about patience. So here's an example of the sort of thing I'm talking about.
This is again using the program called nosy-dump, which comes as part of Nosy, the tool that lets us use PCILynx cards under Linux for this sort of thing. We do two different things in this example. This is for an audio interface, so the first thing is to set the headphone source to channels seven and eight on the ADAT-B interface. The detail isn't important; we're just setting a source. We do that and we see a whole bunch of stuff. I don't know, can you see the mouse pointer there? The important thing is this data line here. That data line tells us the actual contents of the packet that was sent when I selected that sound source in the control application on the other computer. We note that; at the moment, in isolation, it's not much use. We then repeat the process, but this time we set a different source. In this case we're setting the main outs. Again we have nosy-dump output here. We note the offset, which is basically that address I was talking about: it's a whole bunch of stuff up here, and then C04. The data is now hex 110.
First of all we see that the same offset, C04, was used for both. So we can be reasonably sure that this register is used in some way to control the source we're setting. It may control other things as well, but we know it's involved. Then we look at the two data values. One is hex 110 and one is hex 11E, so the thing that changed was probably that lower nibble. We conclude, based on this very simplistic output, that bits 3 to 0 of device register C04 control the headphone source. We know that a value of 0 selects the main outs, and we can guess that a value of E selects ADAT-B channels 7 and 8.
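The comparison step above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not part of Nosy or FFADO. The helper names are invented, and the two values are the data words read off the slide (hex 110 for the main outs, hex 11E for ADAT-B 7/8).

```python
def changed_bits(before: int, after: int, width: int = 32) -> list[int]:
    """Return the bit positions that differ between two register values."""
    diff = (before ^ after) & ((1 << width) - 1)
    return [bit for bit in range(width) if (diff >> bit) & 1]


def changed_nibbles(before: int, after: int, width: int = 32) -> list[int]:
    """Return the indices (0 = least significant) of the hex digits that differ."""
    diff = (before ^ after) & ((1 << width) - 1)
    return [n for n in range(width // 4) if (diff >> (4 * n)) & 0xF]


# Data words captured for the two headphone source settings at offset C04.
main_outs = 0x110
adat_b_7_8 = 0x11E

print(changed_bits(main_outs, adat_b_7_8))     # bit positions that flipped
print(changed_nibbles(main_outs, adat_b_7_8))  # hex digits that changed
```

With only these two captures, bits 1 to 3 are seen to change and bit 0 is not, so treating the whole lower nibble as the field is still a guess. That is one reason every source needs to be captured in turn before the field boundaries can be trusted.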
What you would actually do in practice is go through and select each individual source in turn, capture the output, compare the data values, and make sure that your interpretation of the bits is correct. It can get very tedious because some of these controls can have something like 16 different settings, and when you've got 16 different controls with that number of settings, it can take a long time. But you've got to do this so that you can tell the difference between bitmap fields and enumerated fields in these device registers.
So that's the basic process. You go through every single control that you can possibly select in the device and more or less repeat what I just did: spot patterns in the output from nosy-dump, or whatever you're using to capture packets, and use that as the basis for working out which bits in the device control what. Sometimes it can be quite complex, because devices often won't get a single write like the example I gave you. Sometimes there's a whole torrent of stuff and only one of those writes is actually applicable to the setting you've changed. What I've found is that the control software for a lot of these devices will often send the entire device state to the device even if you've changed only one thing. So it can sometimes be very tricky to find the actual bit or bits you're after in amongst this huge torrent of stuff. After you've done it for a while it becomes easier, I assure you.
So, the options for asynchronous packet capture. I'm going to run through a number of different ways that we can capture these packets and outline the advantages and disadvantages. The first one is hardware bus analysers. I won't spend much time on this because they cost far too much money for most people. They can be hard to use and hard to drive.
I must admit that I've never actually seen one, because I don't know anybody who owns one. They're around, but they are mostly used by the device developers themselves. They're really expensive, and they also do a lot of stuff that we don't need, because they look at timings and all those other issues that are important for the manufacturers and not so important for us. So they're overkill for us. I mention them for completeness.
The second option, and this is the one I very much prefer, is using the PCILynx card. The process is summarised by the diagram on the screen. We have our so-called supported operating system on one computer. It's daisy-chained to the device via a Linux PC with a PCILynx card, which implies that the PCILynx card you use must have at least two ports, which unfortunately some CardBus variants don't. The reason for this is that although the asynchronous packets should be broadcast across the entire bus, experience has shown that this often does not happen in practice. If you try to sniff at the end of the bus, you will possibly miss the packets from one or both of the devices. The standard says that shouldn't happen, but it does.
The software we use is a package called Nosy. As of 2.6.36, it is part of the standard kernel. So if you get a standard kernel source, it's in there. It used to be that you'd have to grab it out of a Git repository, so it's a lot more accessible now. I should also note that this is distinct from the PCILynx driver that was in the kernel up until 2.6.36 or 35; I can't remember when it was removed now. That driver was totally different and very, very old. It was an attempt to write a driver that allowed you to use PCILynx cards for normal FireWire work, like getting audio and video through your FireWire. It had bit-rotted. It apparently never worked. I've never tried it.
When they removed the old FireWire stack, they removed the PCILynx driver as well. That's fine; I don't care. The Nosy driver is the one you need for this work. I make that distinction because it confused somebody on the mailing list two weeks ago. Running nosy-dump is as simple as the example given there, just nosy-dump -v. The -v tells it to produce slightly more verbose output, which in some situations gives better output for our purposes.
The advantages of this are, as I said, that the cards are relatively cheap, 100 to 400 US depending on the vendor, and that all the tools we need are available in Linux. They are all open source, which means that you can hack the output of nosy-dump to make it much easier to spot patterns in the packets if you need to. A practical example of that was with the MOTU audio interfaces that I dealt with. They align their audio data on 24-bit boundaries, while the FireWire bus standard is built around 32-bit quantities, so bytes from different audio samples ended up in the same 32-bit number. It was easier to spot channels and the like by changing nosy-dump to output in blocks of three bytes. So there's an advantage in that as well. The disadvantages are that, in theory at least, it requires a second PC, and you need the somewhat specialised PCILynx hardware.
Another option, which I'm assured works by some people although I've never tried it, is that two of the ancient Power Macs actually had a PCILynx chip in them for normal use. Apple wrote the driver that was needed, and apparently everything was good for them. But it's very specific machines: only the blue and white G3 and, I'm told, the Yikes G4, which had the PCI graphics adapter in it, used the PCILynx chipset. I've got a PCILynx card, so I don't need one; you can give it away. You've got to use Apple's FireBug software with this. Again, I've never used it, and I don't know anybody who has.
That may or may not be easy to use in this context. Some people may have these machines lying around, maybe. Well, somebody in this audience has. They are relatively hard to find, the hardware is dated, and it's hard to maintain. Also, because so few people have actually attempted this with these machines, it's probably difficult to find support, whereas there are quite a few people now, particularly on the FFADO mailing lists, who have used the PCILynx route.
Another thing that sometimes works, and I've talked a couple of people through this, is what I've termed indirect analysis. The idea is that we have a topology very similar to what we had before, with a supported operating system, our Linux machine and the target device all on the one bus, and we use the supported operating system in the same way as before. But in this case we don't have a PCILynx card; we have just another OHCI card in the Linux PC. We change the device configuration in the supported operating system, and then we use the Linux PC to query the device registers in the target device to see what's changed.
I've got an example here of setting the headphone source to phones on a MOTU Traveler, which is my audio interface. FFADO has a tool in the tests directory called scan-devreg, which is a fairly developer-centric tool. I will admit that if you want to change the address scan ranges, you've got to hack the source. But basically you start it up, and it scans the device registers and stores a copy. You then go to the supported operating system and change the source, and then you go back to the Linux computer and look at what scan-devreg has detected as a change, because it rescans the device registers every couple of seconds, I think, from memory.
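The snapshot-and-compare idea behind that kind of tool can be sketched as follows. This is not the scan-devreg source. It is a minimal Python illustration in which `read_quadlet` is a stand-in for whatever actually reads a 32-bit register over the bus, and a dictionary simulates the device. The offsets and values are purely illustrative.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A stand-in for a real bus read: takes a register offset, returns its 32-bit value.
ReadQuadlet = Callable[[int], int]


def snapshot(read_quadlet: ReadQuadlet, offsets: Iterable[int]) -> Dict[int, int]:
    """Read every register in the scan range and keep a copy."""
    return {offset: read_quadlet(offset) for offset in offsets}


def diff_snapshots(old: Dict[int, int], new: Dict[int, int]) -> List[Tuple[int, int, int]]:
    """List (offset, old value, new value) for each register that changed."""
    return [
        (offset, old[offset], new[offset])
        for offset in sorted(old)
        if offset in new and old[offset] != new[offset]
    ]


# Simulated device registers (hypothetical contents).
fake_registers = {0x0C00: 0x0, 0x0C04: 0x110, 0x0C08: 0xFF}
scan_range = range(0x0C00, 0x0C0C, 4)

before = snapshot(fake_registers.__getitem__, scan_range)
fake_registers[0x0C04] = 0x11E  # pretend the vendor software changed a setting
after = snapshot(fake_registers.__getitem__, scan_range)

for offset, old_value, new_value in diff_snapshots(before, after):
    print(f"0x{offset:04x}: 0x{old_value:08x} -> 0x{new_value:08x}")
```

The approach only reveals what the device is willing to read back, so it says nothing about registers whose read and write sides mean different things.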
In the example we've got on the slide (again, I'm not sure how visible my mouse cursor is), it just outputs a single line telling you that this C04 register changed, what the original value was, and what the new one is. In this case we can see that, again, these lower bits are associated with the phones source. You do this repeatedly and you can build up an idea of how to control the phones source on that device.
The advantage is that we don't need anything special to do this. As long as you've got an OHCI card in a Linux computer, you can follow this process. The disadvantage is that you're relying on the device registers to read back what was actually written to them by the controlling PC. That is valid on some devices, like the MOTU devices. It is invalid on devices like RME's, where the write and read modes do totally different things. The other important point is the case where the device has, for example, a generic configuration register that involves write-enable bits. By that I mean there's a bit that says, "If this bit is set, then this other bit sets a control of some description." You don't get any information about write-enable bits by this method, because you never see them; they won't read back. So although you can get some details from it, it can be misleading. You think, "Oh, I know how to do this," you implement it, and it doesn't work. That's probably because you're missing a write-enable bit.
Another option, which I'll cover very quickly, is doing packet capture on the supported platform itself. There's an example there called Bus Hound. It basically works by burying itself very deep in the driver layer of the target operating system, from which it can intercept packets very low in the driver stack and print them out. The advantages are that you don't need an extra computer and the FireWire bus topology is normal.
You're not plugging extra devices into the bus, which can be an advantage with certain buggy devices. The disadvantage is that there's no free and open source software that does this. The only options are commercial software, and that is very expensive, of the order of 700 US and above. There is a free version available, but it is crippled insofar as there's a limit on the size of the packet capture before it stops and you have to restart it. That is fine if you only have to capture one or two packets, but if you've got a whole stream of packets to analyse, like 64 or 128, you run out of buffer space before the interesting stuff comes through. So there are limitations there as well.
The other one worth mentioning quickly is using a virtual machine under Linux, with our supported operating system installed in the VM. We run it and use the underlying Linux system to spy on the packets that are coming out and report them to us. The problem is that, as far as I am aware, there's no free and open source VM that implements FireWire pass-through, which means we can't do this. At present I don't have the inclination to add this to any of the alternatives, because I've got a PCILynx card and it works fine for me, and I don't want to have to learn a whole VM infrastructure just to do this. I would love it if someone was able to add it to QEMU or VirtualBox or something, because it would mean that I can simply say, "Hey guys, do this and you can capture the packets for me and send them to me," with no special hardware and no special software. Really easy. So far that hasn't happened, but it would be seriously cool if we could add that at some point.
So to summarise, I don't believe that capturing these packets is overly difficult.
It's certainly much easier if you're willing to outlay 100 or so US dollars to get a PCILynx card. I think it gives you a lot more flexibility than the alternatives, mostly because your entire capturing system is open source and you can edit and change it as you wish. I've mentioned that there are other alternatives around. I don't believe they're quite as flexible as the PCILynx option, but they're out there, and we have used all of the methods that I've outlined, except the VM one, obviously, on the FFADO project at various times when appropriate to get us out of a bind. As I mentioned, if we really could get a VM to implement FireWire pass-through, it would make a lot of this much more accessible, because suddenly no one would need to buy a PCILynx card in order to proceed.
I'll throw up a couple of links here. The top one is just the FFADO project, which I'm part of and which spawned all this work. Then there's Nosy, and libraw1394, which should be packaged or available on almost every distribution in existence, and my email address for anybody who might have questions about what I've raised.
Probably the big take-home message is that capturing the packets is, at the end of the day, relatively easy. But if you're ever to attempt this, particularly with more complex devices like the multi-channel audio interfaces with 60-odd channels, onboard mixing controls and all the other paraphernalia that comes along with them, you need a lot of patience. It can take hours to capture all the variants you need in order to build up a complete picture of the protocol. But if you can do it, the end result is that it should be possible to take that information and write drivers based on what you've captured. I believe we've probably got time for a couple of questions, if there are any. Any questions?
[Audience] I know it probably hasn't happened much yet, but has anyone done much work with the Yamaha digital sound desks that connect via FireWire?
The Yamaha digital desks that connect with FireWire: you're probably referring to the mLAN stuff.
[Audience] Yeah.
There's been start and stop with that. Nothing happened for a long time. Then someone said, "Oh, we'll look into doing this," and they did a little bit and stopped. As I understand it, the problem with mLAN is, one, that nobody but Yamaha actually supported it, which made interconnects rather interesting. And secondly, I've heard from those who've actually looked into it that it is an absolutely horrible system.
[Audience] It's the only reason I'm still running Windows on that laptop.
And the big, significant issue is that Yamaha themselves have actually discontinued it. It was a very short-lived experiment, I guess I'd describe it as. So the problem there is that the motivation is fairly low. The other issue, as with all of this stuff, is that if people don't have the ability to do it themselves on the devices they've got, there are people like myself and a few others who can, but to do it we need physical access to the devices. Often you'll find that people are not willing to ship two and a half thousand dollar systems halfway around the world to somebody in Australia or wherever we happen to be based. I feel your pain. And look, that's unfortunate, but I can't honestly see mLAN being supported anytime soon. The documentation for the protocol is actually available; I think you can get hold of it. So there's less of an issue of having to capture packets there, and it's more an issue of implementing the protocol stack. It's a very time-consuming task because it's a horrible protocol.
And yeah, at this stage I haven't heard of anybody really having the inclination to do it unfortunately, or at least it's unfortunate for people who are stuck with the legacy devices because I can't see Yamaha supporting it for very much longer themselves, so yeah. Any more questions? In that case, we'll end a few minutes early. No worries. Okay, thank you very much.