Today's topic, as you can see, is all about the touchy-feely side. Please welcome with a big round of applause, Peter Hutterer.

Thank you. Thank you. Yeah, he's already introduced me. My name is Peter Hutterer. I work for Red Hat here in finally sunny Brisbane. It used to be a bit wetter than this, so you picked just the right week to come. In my role at Red Hat I maintain pretty much all the X input drivers these days, and a large part of the X input subsystem, at least whatever Daniel isn't working on at this point. Today, all I'm going to talk about is multi-touch. Not single touch, just multi-touch. The first warning I have to give you is that we are still in the process of actually implementing this. Certain bits and pieces of the stack are already in place, others are close to being in place, and others aren't even scoped yet. So today's talk is going to stay at a level where what I tell you is hopefully still going to be useful in a year's time. I'm not going to delve too much into APIs, because they may still change in the next couple of months. When I submitted the talk abstract in the middle of August, I was hoping that we would be close to a release by now. Unfortunately, it turned out to be a bit more complicated than expected. So I'm going to give you a bit of a high-level roller coaster through all the three or four layers of the stack.

First of all, touch. Everything is about touch these days. Everyone wants to have multi-touch, which is kind of funny, because multi-touch itself as a technology has been around since the early 80s. That's when the first research papers appeared. It became kind of unpopular during the late 80s; people pretty much stopped working on it. What made it really popular for the average user was your iPhone a couple of years ago. That really put multi-touch technology into everybody's hands. There is another touch product out there that is quite popular, at least in the media, which is Microsoft Surface. I'm sure you've heard of that. Surprisingly, it hasn't made it into the hands of most users yet. I have it from good sources that the current price tag for it is $25,000 Australian, which is probably $24,000 in monopoly money. So the iPhone really can be credited with putting multi-touch into everybody's hands. Since then, everybody has tried to do the same thing, so half of us probably have Android phones of some kind, Nokia is working towards it, and there are plenty and plenty of other devices. We're seeing laptops with built-in multi-touch screens. Some of them have serial Wacom tablets, some of them have other touchscreens. So it's really touch everywhere. Everybody wants to have touch.

Unfortunately for us, at least on the technology layer and on the user interface layer, touch is not the same as pointer. I don't know if you guys are aware, but for the last couple of years we've been working mainly on something called multi-pointer X, which allows you to use multiple mice at the same time. Once that was done, multi-touch became big, and I discovered: great, now I can start all over again, because everything I just did doesn't really apply to touch. Why? Because they are two fundamentally different concepts. In touch, there is no state, but there's plenty of user context. Think about how you use the average application with your mouse, for example a drawing application. You use your mouse, and you have one pointer that is your representation on the screen.
You move over to the color selector. You select the color green. The application attaches this color to this pointer. You move back, you start drawing, and you start drawing in green, surprisingly. That works. Our brain can comprehend that: we've just selected green with this pointer, so clearly that pointer has to draw green. It works the same way if you have 15 pointers on the screen, because you have 15 mice connected for whatever reason you'd want to do that. You can attach green to that pointer, red to that pointer, and depending on which one you move, it'll work out.

For touch, it is quite different. What we see coming from the hardware is touches appearing, possibly moving around, and touches disappearing. So for the example I just mentioned with the color selector, if you look just at the hardware events that arrive: if I select the color green, you see a touch point appear on the color selector and disappear again. Then later, you see another touch point appear, maybe move around on the canvas, and disappear again. There is no hint to the application, no hint from the hardware, that these two touch points may be related. The only hint that exists is in our brain, because we know that if we select green and then we start painting, we want to paint green. But this is not a straightforward assumption, because depending on the use case, it may be wrong. I'm two-handed, as most of you are, I believe. So you can, for example, select the color green with one hand and start painting in green with the other; select red, paint red. That's a non-dominant-hand interaction mechanism. Other people say, hey, I'm really ambidextrous, I can use both hands to paint. So I select the color green, start painting, then I select the color red and start painting in red. In both cases, the sequence of events that the application sees is identical: it sees two touch points appear on the color selector, and it sees two touch points appear on the canvas. The only link you have is in your brain. So that's what I mean: there's lots of user context, but there's very, very little information from the hardware.

This is, unfortunately, not a technology problem. Or, better said, it is a problem that we cannot solve in technology. So for all of you who are writing applications or application interfaces, this is becoming your problem now. The only solution is to write a user interface where this never becomes a problem — where the user is never tempted to select two different colors so that you then have to figure out which finger paints in which color. You have to write a user interface where the path of selection, the path of operation, is a single one. There's no ambiguity anywhere. So we can't simply take our desktop user interface, which relies heavily on state being permanently attached to the pointer, move it to a multi-touch interface, and say, hey, everything works now if we make the buttons large enough to actually hit them. You really have to rethink the UI so that it works with touch. And you can see that all the popular touch products that have taken off so far were phones and dedicated machines like the Surface table. None of them uses a traditional desktop. All of them use, at least at this point, fairly simplistic applications. If you compare your average desktop application to a touchscreen application on the iPhone, you'll notice the desktop application always has a more complex user interface, because there are so many more things we can do with the pointer.
With touch, usually, once you have more than three selections, it becomes really hard for the application to figure out where to attach the state. There are a couple of other problems caused by touch in general. One is the classical fat finger problem — that's actually the technical term for it. If you have a pointer, you have a one-by-one pixel selection area. You can hit a button that is one by one pixel, assuming your eyesight is good enough for that. With a finger, most people's fingers tend to be slightly larger than one by one pixel, which leads to a problem: if you have a phone, for example, and you put a user interface element in the top left corner, you may not be able to select it. If it's a flat touchscreen, you can. But if there's a bit of a bezel around it, you may not be able to actually reach that user interface element, because the roundness of your finger may not reach that topmost bit.

Another thing to consider in UI design is reach. It's not usually a problem on a phone; most people tend to be able to reach any area of the phone. But if you're talking about large horizontal touch surfaces, or even vertical ones, not everything that can be displayed can be reached by a user. Imagine this table here is one big touch surface. If you put a virtual keyboard on there and the space bar is in that upper corner, I'm probably not too inclined to write long paragraphs, because if I always have to stretch like this to press the space bar, it doesn't work. It gets even worse if you have multiple users standing around the same table, because what's easily reachable for me is pretty much this area here. If there's someone standing next to me, it's this area here. And you can see that there's an area, about shoulder width at least, that is overlapping space. I can't just reach into someone else's space and do something — they're probably going to punch me. There are social protocols to prohibit that. So again, a user interface problem; you need to figure that out. Again, on a phone, not really a problem.

And occlusion. Whenever I touch something, my hand occludes whatever's underneath it at that moment. A funny exercise for you to try at home: next time you're trying to write an SMS on your touch-enabled phone, turn it around. You can type on a keyboard that's upside down; it's not too hard. But you'll notice that every time you type a key, you can't see what you're typing anymore. All the phones — or all the on-screen keyboards I've seen so far — are designed so that when you hit a key, the key you just selected pops up just above your finger. If you turn the phone around, that happens to be under your finger, and you can't see it. So you're essentially typing blind on a phone that has no haptic feedback. Occlusion: a massive problem when you're doing UI design.

Gestures, the next big topic. Once you start doing touch, you want to do gestures. We all know, oh, there are so many gestures out there. Just pinch, swipe, rotate, and tap. And that's it — there are only four gestures. There are really only four gestures that are intuitive enough that you can expect people to know them. A friend of mine was doing multi-touch interfaces; he started about six or seven years ago. And he said that with the rise in popularity of multi-touch, he's doing less and less multi-touch in every single application he writes. Because six years ago, a multi-touch application was a dedicated application for, say, an air traffic controller.
The company building it could afford to spend two weeks of training time for that employee to learn every single gesture, tip, and trick, whatever he needed to. Nowadays, a multi-touch interface has to be usable in an instant, and if it isn't, people are going to throw it away. That limits us to swipe, pinch, rotate, and tap, because there's nothing else. Well, there's plenty else, but you have to train it. A good example I've heard: everybody, think of the gesture to copy an object, and then find someone else who thought of exactly the same gesture. Good luck. You might find two or three people in this room if you're lucky.

The next problem, once you start thinking about gestures, is: OK, so we have touch and we have gestures. But is touch more important than gesture, or is gesture more important than touch? The answer, again, is not always straightforward, and it's largely a UI problem. I'll give you an example. We have a window here. I put one finger down — that's clear. I put another finger down — simple. I put a third finger down. So what do we do now with the third finger? Is this a three-finger gesture? Or is it a two-finger gesture in this particular window, and the third finger is just something else I'm doing outside? Or — given that this is just hardware data, we don't actually know where the data comes from — I can put one finger down, two fingers down, then put the third finger down and start moving over this way. And if you just decided it was a three-finger swipe, unfortunately, by the time you realize it wasn't, you've already done the wrong thing. So again, largely a UI problem. Don't make this problem appear; then you don't have to fix it. Easy.

Another one: we have two windows. You start dragging an object from one window to the other window, and then you tap down with another finger. Is that other finger still part of the gesture, even though it originated in a different window? What happens if it was actually like this, and then I tap with the other hand? Because clearly, to me, two hands means two different things, but two fingers of one hand is clearly the same gesture. Again, you cannot know this. This is, I believe, one of the reasons why no one has actually tried a multi-window desktop on touch yet. Applications on the iPhone and the Android phones are always full screen. That's the reason why: you can't easily move gestures between windows. The other thing is how easy it is to confuse even simple gestures. We all know what happens if I do this — assuming this is, say, a map application, what happens when I do that? It zooms out. Unfortunately, where you put your fingers down, there were two waypoints. And instead of just moving those two waypoints together, which was what I actually wanted to do, you've just zoomed out the map and I can't see my waypoints anymore. So until you actually know what's happening at the very point you touch, you don't know how to interpret that touch. And even then, sometimes you don't really know, because it might still be a zoom gesture that just happened to start on those waypoints by accident. So again, massive UI problems are coming towards you there.

After all these problems, how do we actually implement this? Let's see what's happening at the lower levels, because the technology itself is solvable; the UI, so far, is still being worked on. If we go to the lowest layer, the kernel: as of 2.6.31, we have what's called multi-touch protocol A. That was the first one.
It's for devices without hardware tracking — devices that just always give you the whole frame. The DiamondTouch, for example, for those people who know it, has 192 by 128 antennas built in, and it just gives you the value for each antenna in each frame. So if you put two fingers down, you essentially get two spikes here and two spikes there, but with every cycle you get the full frame. Since 2.6.36, we have the so-called protocol B, or the slot protocol, for devices that can do tracking in hardware. There are a few of them out there; Wacom serial devices come to mind here.

I'll give you an example of both. First of all, let's look at a very simple input event from, say, an old traditional-style touchscreen or an absolute mouse, whatever we have there. This is the sequence of events you get for each movement: you get your absolute X position, 100, absolute Y position, 150, you get the time when it happened, and so on and so forth. What matters here as well is the SYN_REPORT. The SYN_REPORT is simply the kernel's marker for "this was one hardware event". So you get these three packets — two positions and a "hey, that's it for now". Quite simple.

A protocol A event looks a bit different. Again, there is one SYN_REPORT to mark the end of this particular hardware event. But before that you get ABS_MT_POSITION_X, the multi-touch position X, and ABS_MT_POSITION_Y, followed by a SYN_MT_REPORT, which simply means this is the end of one multi-touch part of this event. Then you get the second position, 500, 600, and another SYN_MT_REPORT which says, OK, this is the end of the second touch. So if you get a frame like this, you know that there are currently two touch points — two fingers, whatever it is — at positions 100/150 and 500/600. Every time you move these around, you get the full frame again, just like this one. So the next one would be 110, 170, 500, 600, for example. You always get the full frame. If you lift off one touch, what you get is, again, the full frame of everything the hardware sees, because the hardware now only sees one touch point: you get 500, 600 and a SYN_MT_REPORT. And in your user-space process you now have to say, OK, before I had two, now I get one, so one of them must have disappeared. That's it. And then if you lift the second one, because there's nothing left, you just get an empty frame saying, look, that's it, and off we go.

Protocol B is slightly easier for user space to handle, because it tracks the touch points for you. It has the concept of slots. Depending on the hardware, you can have one, two, 10, 15 slots — I'm not quite sure what the maximum is, 255 or something, probably. For the same event stream, what you get is: first, in slot number zero, we have a tracking ID of 30 — so it provides the tracking ID for this touch — and we have a position X and Y. Then, in slot number one, we have a tracking ID of 34 and the other position, and that's one event. The big thing about protocol B is that it is a stateful protocol, so you have to remember what happened. If I don't move the second touch point but I do move the first one, I get: slot zero has just moved from the previous position to 110, 180. I don't get anything for the second one, because we haven't been told it's gone, so it's still there. You have to remember the state.
If I lift the first point, all you get is a tracking ID of minus one, which means this one has disappeared now. And you'll notice that the slot is missing this time, because the kernel filters out certain repeated events: the last ABS_MT_SLOT we sent to the client was for slot zero, so if we send another one for slot zero, the kernel will just swallow it. So in the client, you have to remember: the last slot was zero, I just got a tracking ID of minus one, therefore the touch point in slot zero has disappeared. If you now lift the other one, you would get ABS_MT_SLOT 1 and a tracking ID of minus one for that. So if you can, always use the slot protocol; it's so much easier to handle. There is a guy called Henrik Rydberg, who is responsible for pretty much both protocols. He has written a library called mtdev, which converts in user space from protocol A to protocol B. In X we essentially have a strict requirement that we want protocol B; we don't deal with protocol A. If your hardware doesn't do tracking, you need to convert it before we can use it in user space.

A bit more on the protocol: it has a couple more features. You get the width and height of the touch point and the orientation — if you get width and height, you always get the orientation. You get a tool type. Wacom tablets, for example, can send simultaneous pen and finger events, so the tool type becomes important so you know which of the two is the pen and which is the finger. You get pressure, and there are a couple of other ones. The history of the multi-touch protocol is pretty much that Henrik Rydberg wanted his MacBook touchpad to work with multi-touch, and that's where the width, height, and orientation come from, because those touchpads actually give you ellipses.

In terms of driver state in the kernel, currently most of the multi-touch devices have their own driver. Stéphane Chatty and his crew have been working on a unified MT driver that essentially speaks the HID protocol. The Windows 7 driver is supposedly really, really good, in the sense that you can plug pretty much any device in and it will just work with Windows 7. That's kind of what Stéphane is aiming for as well with the unified driver. The Windows guys have it a bit easier, because you don't get the Windows certification unless you work with the Windows driver. So yeah, it's a bit easier to comply if you're the driver in that case, because the hardware has to work with you.
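To make the slot protocol above concrete, here is a minimal sketch of a user-space reader for a protocol B device. The device node and the 16-slot limit are assumptions for the example, and error handling is kept to the bare minimum.

```cpp
// Minimal sketch: consuming multi-touch protocol B (slot) events from evdev.
// The device node and MAX_SLOTS are assumptions for illustration only.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <linux/input.h>

#define MAX_SLOTS 16

struct Contact {
    int tracking_id; /* -1 means this slot is currently empty */
    int x, y;
};

int main()
{
    int fd = open("/dev/input/event5", O_RDONLY); /* hypothetical touch device */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    Contact contacts[MAX_SLOTS];
    for (int i = 0; i < MAX_SLOTS; i++)
        contacts[i].tracking_id = -1;

    int slot = 0; /* protocol B is stateful: the slot is only resent when it changes */
    struct input_event ev;

    while (read(fd, &ev, sizeof(ev)) == (ssize_t) sizeof(ev)) {
        if (ev.type == EV_ABS) {
            switch (ev.code) {
            case ABS_MT_SLOT:
                if (ev.value >= 0 && ev.value < MAX_SLOTS)
                    slot = ev.value;
                break;
            case ABS_MT_TRACKING_ID: contacts[slot].tracking_id = ev.value; break;
            case ABS_MT_POSITION_X:  contacts[slot].x = ev.value; break;
            case ABS_MT_POSITION_Y:  contacts[slot].y = ev.value; break;
            }
        } else if (ev.type == EV_SYN && ev.code == SYN_REPORT) {
            /* One hardware frame is complete: print the contacts we remembered. */
            for (int i = 0; i < MAX_SLOTS; i++)
                if (contacts[i].tracking_id != -1)
                    printf("slot %d (id %d): %d/%d\n", i,
                           contacts[i].tracking_id, contacts[i].x, contacts[i].y);
        }
    }
    close(fd);
    return 0;
}
```

A protocol A device would instead resend ABS_MT_POSITION_X/Y pairs separated by SYN_MT_REPORT in every frame; in practice you would run such a device through mtdev first, so that the same slot-based loop applies.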
Let's see. Next layer up: what do we have in X so far? Again, this is very much under development, and I had to change the slides several times during the last four days. The fixed part at this point is that multi-touch will be part of the X Input extension version 2.1. Version 2 pretty much revamped most of the input subsystem and gave us the ability to use multiple mice and keyboards, and we're tacking multi-touch on top of that, which turns out to be surprisingly easy once you figure out all the semantics. Daniel might disagree — he's actually been writing most of the code so far, I'm just blabbering along. It's great. True.

One of the things X Input 2.1 will give us is simultaneous use of multi-touch and traditional pointer and keyboard interfaces. So if you want to, you can have your multi-touch desktop, use your mouse, use a touchpad, and use multi-touch on the screen directly, and they all work simultaneously and independently if necessary. This is a rather novel approach. I haven't been able to figure out how to plug a USB mouse into my phone yet, but I'm pretty sure it wouldn't work properly anyway. It's rather pointless. But yeah.

We also get pointer emulation in the protocol. That means we should be able to use any X application, be it Firefox or xcalc, with a multi-touch interface. I know, as I said before, that those UIs don't really work with multi-touch. But if you want to transition from your current desktop to a true multi-touch desktop, you can't simply say — at least not on the desktop — everybody rewrite their application right now, because otherwise it won't work. So we do it one by one. We allow applications to use multi-touch if they can, we encourage people to write multi-touch applications, but until that happens, you can still use your old applications. So you can have Firefox and your drawing program open, draw like this, and still click on links in Firefox.

The big thing — and that's a bit of a break with the X server's internal infrastructure — is that traditionally we've had one device and one input point, or multiple physical devices and one input point, the pointer. What's new about the touch support is that we now have one device but multiple input points. So even though you only have one physical device, you can interact with multiple areas of the screen at the same time through that one device, and it turns out that the old, crufty X protocol is not particularly well suited for that. So we had to find a couple of ways to get around those issues.

One of the features we have built into it — or are planning to, once it's released — is the distinction between direct devices and dependent devices. A direct device is the simple one: you touch somewhere, stuff happens, exactly where you touched. A dependent device is, for example, one of the latest Synaptics touchpads that have multi-touch, or the Apple Magic Trackpad, I believe it's called, where the device itself is essentially a pointer device — you use it like a trackpad — but at the same time it supports full multi-touch. So you can do a swipe; you can do your zoom gesture on the touchpad. However, you do not usually want that to be directly mapped to the screen. If you use that touchpad and you touch in the upper left corner of the touchpad, you don't want the pointer to suddenly jump up there and perform some multi-touch gesture. So a dependent device in X simply means that the device has a pointer, and any touch events are sent to the window that is underneath this pointer. Any touch event is always in relation to this particular pointer. With a direct device, if you touch here, it's that window that gets the event. With a dependent device, the pointer is here, you do your gesture, and everything goes here. That's one of the biggest distinctions. The kernel doesn't need that — the kernel doesn't really care about graphics — so it's one of the things we have to add. We also have mixed-mode multi-touch and pointer/keyboard, which means that a device can be both a pointer and a multi-touch device — and a keyboard, if you're really fancy — which could be quite interesting for virtual keyboards, because you could present that as a hardware keyboard if you wanted to. And of course, traditional protocol events are still supported. As I said before, you can use your multi-touch drawing application at the same time as Firefox.

This is, as of two days ago, the simplified — actually no, it's a pretty accurate diagram of touch handling. It turned out to be a lot simpler than we expected. This is, unfortunately, not a joke.
It covers most of pointer emulation. What you see is that we can get three events from the kernel — well, from the input drivers — the three red ones. And you just follow whatever path until you hit something blue, which means you have to send something to a client. So far we haven't found any dead ends yet, or at least not any bad ones. So we're getting there. Looks good.

The three phases of touch that we care about in the server are touch begin, touch update, and touch end. It's quite simple: you put a finger down, you get a touch begin; you move it around, touch update; lift the finger, touch end. There's a special "touch update unowned"; I'll skip that for now. You also get a touch ownership notification — I'll come to that in a second. But the three phases of touch are what really matters for clients: you see some touch coming from somewhere, moving somewhere, and disappearing again. Touch events are delivered like pointer events, which means if you touch here, the window that is here gets the event, same as if you click somewhere with a pointer.

The one exception is grabs, and grabs are one of the reasons why the diagram before looked the way it did. For those of you who have never had to deal with grabs: enjoy, you've had a great life, you haven't missed anything. A passive grab is the X mechanism to intercept an event before the event is actually delivered to its true target. The most common use is that the window manager has a passive grab on the button, which means that if you click somewhere, it's the window manager that gets the event. The window manager reshuffles the windows to bring that window to the foreground, and then the event is delivered to the application underneath. That's how it works when you click on a submit button in Firefox somewhere: the window manager will bring the Firefox window forward, but the form will still be sent off. Passive grabs are extremely complicated in the core X protocol specification, and they became even more complicated once XI 1 and XI2 were added on top of it. Now we're trying to wrap this all up and still support passive grabs in the traditional way, through pointer emulation, while at the same time adding a new way for touch events to work.

What this means for touch events is, for mainly technical reasons: if a client has a grab on a touch somewhere, the events are delivered to the whole window tree at the position the touch appeared. I'll give you an example. Say you have a window hierarchy that is three layers deep, which could be, for example, a Flash window inside a Firefox window inside the window manager's window. I've drawn them here as a tree; in reality, they're inside each other. If I touch any window here — say I touch this Flash window, which is the bottommost one — and both Firefox and the window manager have a grab, all three windows will get the touch events. But only the topmost one that currently has the grab will also get the ownership notification. So you have two clients that get touch events but are basically being told: here's an event, it's not yours yet. The first one gets: hey, here are touch events, do something. The topmost client can then decide to either accept or reject the touch. If it accepts, the event stream to all the others is basically terminated, and it continues processing.
If the topmost client — the window manager, for example — shuffles the windows around and then says, I reject this, I don't need it anymore, it sends back a reject. What happens now is that we continue sending to the other two, and we send a new ownership event to the next client in the stack, saying: hey, this touch sequence that you've already received some events for — it's now yours, do something. And we keep going until some client either accepts the touch, or we arrive at the bottommost window and events are just processed as normal.

So the main thing to take away from this — and by the way, Daniel and I were sitting in the Criterion, which I can recommend, corner of Adelaide Street and George Street, it has Hoegaarden on tap; after four jugs we had it all sketched out. That diagram — yeah, four jugs. Awesome. So, only one client at a time owns the touch grab — or the touch stream, sorry. Regardless of whether you have a grab or have just selected for events on your window, eventually you may become the owner of the touch stream. Until then, you still get the events. Why? One, because that way we don't have to buffer them in the server — it's quite easy to run out of memory if you do that. The other thing is that some operations on touch events take time, such as gesture recognition. You might need a sequence of ten events before you can decide what you're going to do with it, or before you can assign it to a user, and so on and so forth. So by essentially giving everyone the touch events at the same time, even when they're not supposed to do anything with them yet, they can do some pre-processing just in case they do end up with this touch, so they don't always lag behind. With button events, we currently do that serialized: send it to the first client, the client says nah; send it to the second client, nah; third client, and it just keeps going. With touch events, everyone gets everything at the same time.

So, as I said, you will get touch events, but if you're not the owner of that touch stream yet and you process them, you do so at your own risk. You can start doing gesture recognition, you can start highlighting something — any visual feedback — but anything you do must be able to be undone. Because at any point in time you might get a notification: you know those touch events we just sent you? Someone else claimed them. They're not for you. So anything you did up to that point, you have to be able to undo. If it's gesture recognition, easy — you just throw out the result. If you've highlighted something, you have to un-highlight it. And then you have to wonder: was it really a good idea to highlight something? Because now the user is confused. So again, there are a couple of UI questions in there as well.

The ETA for all of that is approximately August 2011, I think. X server 1.10 is supposed to be out in two or three weeks, so six months from then, X server 1.11 hopefully has all of that built in.
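The touch additions to XI had not been released when this talk was given, so the exact names were still in flux. The sketch below uses the constants and calls that eventually landed in <X11/extensions/XInput2.h> (touch support ultimately shipped as XI 2.2), so treat it as an illustration of the begin/update/end plus ownership model rather than of the API as it stood at the time.

```cpp
// Illustrative sketch of a touch-aware X client. The event names below are the
// ones that later appeared in <X11/extensions/XInput2.h>; at the time of this
// talk they were still provisional, so details may differ.
#include <cstdio>
#include <X11/Xlib.h>
#include <X11/extensions/XInput2.h>

int main()
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy)
        return 1;

    int xi_opcode, evt_base, err_base;
    if (!XQueryExtension(dpy, "XInputExtension", &xi_opcode, &evt_base, &err_base))
        return 1;

    int major = 2, minor = 2; /* the version carrying touch was still undecided then */
    XIQueryVersion(dpy, &major, &minor);

    Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                     0, 0, 400, 400, 0, 0, 0);
    XMapWindow(dpy, win);

    /* Select for the three touch phases plus the ownership notification. */
    unsigned char m[XIMaskLen(XI_LASTEVENT)] = { 0 };
    XIEventMask mask;
    mask.deviceid = XIAllMasterDevices;
    mask.mask_len = sizeof(m);
    mask.mask = m;
    XISetMask(m, XI_TouchBegin);
    XISetMask(m, XI_TouchUpdate);
    XISetMask(m, XI_TouchEnd);
    XISetMask(m, XI_TouchOwnership);
    XISelectEvents(dpy, win, &mask, 1);
    XFlush(dpy);

    for (;;) {
        XEvent ev;
        XNextEvent(dpy, &ev);
        if (ev.xcookie.type != GenericEvent ||
            ev.xcookie.extension != xi_opcode ||
            !XGetEventData(dpy, &ev.xcookie))
            continue;

        switch (ev.xcookie.evtype) {
        case XI_TouchBegin:
        case XI_TouchUpdate:
        case XI_TouchEnd: {
            XIDeviceEvent *de = (XIDeviceEvent *) ev.xcookie.data;
            /* de->detail is the touch ID. Until we own the sequence, anything we
             * do with it (highlighting, gesture recognition) must be undoable. */
            printf("touch %d: %.1f/%.1f\n", de->detail, de->event_x, de->event_y);
            break;
        }
        case XI_TouchOwnership: {
            XITouchOwnershipEvent *oe = (XITouchOwnershipEvent *) ev.xcookie.data;
            /* The sequence is now ours. A client holding a touch grab would
             * instead accept or reject it via XIAllowTouchEvents(dpy,
             * oe->deviceid, oe->touchid, win, XIAcceptTouch / XIRejectTouch). */
            printf("we now own touch %u\n", oe->touchid);
            break;
        }
        }
        XFreeEventData(dpy, &ev.xcookie);
    }
}
```

The accept/reject call shown in the comment is what moves ownership down the stack of grabbing clients and selecting windows, as described above.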
Once the X server has it, we have to go further up the stack, because most of you are probably sane enough not to code against Xlib directly. So, Qt. I've been talking to some Qt guys. What they're doing is essentially waiting for X Input 2.1 to be in a stable release — and again, I'm just giving you a high-level view here. That doesn't mean they're just sitting on the sidelines waiting for us; they're actually tracking the current development. There's heavy work on supporting XI2 in the background as well, which is more or less a precondition for 2.1. They can't completely switch to XI2, because it's a lot of work, they would lose some features, and it's for relatively little gain. So they're probably going to have core events and XI 2.1 events at the same time. Qt has had a multi-touch API since 4.6, which came out, I believe, early last year at some point — I'm not sure — and it already works on Windows and possibly OS X. So what they're basically doing is, once we have the X input stuff in place, they hook up the X input backend to the API they already have, and then magically everything works. So the theory goes.

Pointer emulation is also done in the toolkit by Qt, so if you have an application built against an older version of Qt and you run it on a multi-touch-enabled display, Qt will do pointer emulation for you. That is going to be quite interesting, because, as I said before, we're going to do pointer emulation in X as well. So Qt is going to get the touch event and the pointer emulation event — or not — and then it has to figure out which events match up, and you can't do that without supporting XI2. From an X point of view, if there is a pointer event that was generated through pointer emulation — so if you touch somewhere and it causes a button press to be generated for the benefit of some old client — we will actually mark that pointer event as generated in response to a touch, so you can filter all of those out if you're handling multi-touch directly. It could be interesting for Qt to figure out how to do that.

In terms of API — again, very, very high level here — you can see where we got the idea for begin, update, and end from: Qt has the same three states for a touch. And in your graphics item, you essentially just say, hey, accept touch events, and all the events will be sent to you. So it's a fairly simple API, and there's excellent documentation out there.
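As a rough illustration of the Qt 4.6-era touch API mentioned here, the sketch below shows a graphics item that opts in to touch events and dumps every touch point it sees. The class name and scene setup are made up for the example, and whether the events actually arrive depends on the platform backend feeding them in.

```cpp
// Sketch of the Qt 4.6+ touch API: a graphics item that accepts touch events.
// TouchItem and the scene setup are hypothetical names for illustration.
#include <QtGui>

class TouchItem : public QGraphicsRectItem {
public:
    TouchItem() : QGraphicsRectItem(0, 0, 400, 300) {
        setAcceptTouchEvents(true); // without this, touches arrive only as mouse events
    }

protected:
    bool sceneEvent(QEvent *event) {
        switch (event->type()) {
        case QEvent::TouchBegin:
        case QEvent::TouchUpdate:
        case QEvent::TouchEnd: {
            QTouchEvent *touch = static_cast<QTouchEvent *>(event);
            foreach (const QTouchEvent::TouchPoint &p, touch->touchPoints())
                qDebug() << "touch" << p.id() << "at" << p.pos() << p.state();
            return true; // handled; no mouse emulation needed for this item
        }
        default:
            return QGraphicsRectItem::sceneEvent(event);
        }
    }
};

int main(int argc, char **argv)
{
    QApplication app(argc, argv);
    QGraphicsScene scene;
    scene.addItem(new TouchItem);
    QGraphicsView view(&scene);
    // The view's viewport typically needs to accept touches as well
    // (harmless if the scene already enables it).
    view.viewport()->setAttribute(Qt::WA_AcceptTouchEvents);
    view.show();
    return app.exec();
}
```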
GTK3 is going to use XI2 by default, which is kind of interesting — you can do all the multi-pointer stuff. There are no current plans to support XI 2.1; unsurprisingly, all the GNOME guys are rather busy with GNOME 3 at the moment. I've been talking to Carlos. He said he's currently trying out a gesture interpreter that passes gestures to widgets. That is currently pointer-based only: you do your circle or your swipe with the pointer, and it translates it. There are no direct plans for multi-touch yet. Having said that, I do have an email in my inbox with a couple of answers that I haven't read yet, so possibly he changed his mind in the last two days. The API for the gestures is quite simple again: on your widget, you essentially say, I want to listen to these five or six gestures, and the rest is just hooked into the event delivery with signal connects and callbacks.

The third big player in the multi-touch space is Canonical's uTouch. Canonical deserves credit for actually throwing people at touch; everyone else is just working on it in their spare time. Canonical, I think, has five people on the payroll for it — yeah, five or six people. They've already shipped some stuff in 10.10, which I've seen a couple of people use, and there's going to be more updated stuff in 11.04. Canonical has an extremely strong focus on gestures, which goes both ways — remember what I said before about touch versus gestures, you kind of have to consider both. In terms of what they're trying to do, they have a three-part stack; I'll show an architecture diagram in a second.

One of the main components is GRAIL — an acronym you probably don't need to remember, gesture recognition and instantiation library — which is essentially a gesture recognition engine. GRAIL talks to GEIS, which is the interface specification and the library that clients are supposed to use for system-wide gestures. And then you have Unity and the Compiz plugins for window-manager-wide gesture support. The architecture diagram is the latest I could find on the multi-touch dev list on Launchpad; this is by Duncan McGregor, I think, sent to the list. As you see, the X server sends out the XI2 events. For those toolkits or clients that handle the raw events, they go straight to the application. For the others, they go through the uTouch Compiz plugin, through GRAIL into GEIS, and then GEIS talks to the application layer — the GEIS instance — through some RPC mechanism. I'm not sure that's 100% scoped out yet; at least in this diagram it says it's likely to be D-Bus. In terms of API, it's quite simple: you create a new GEIS instance, and then you either register your event callback and just say dispatch events, or you handle the events manually one by one. And the GEIS events are essentially: a new device has been added, a device has been removed, and of course all the gesture events. Unfortunately, we live in interesting times. And I'm very, very sorry for you guys. All right, thank you.

If you have any questions, feel free to ask. Questions, guys? I've got a microphone here. Put a hand up. I've got a question — cool, I'll just wander up there and pass it to you. We've got a little bit of time, so just hang on to it if you want to ask a couple, and hand it back when you're finished.

What device could I purchase to test this stuff out? What's the cheapest option?

The cheapest option is probably the Apple trackpad at this point, the dependent device. There are a couple of laptops that have eGalax controllers or N-Trig tablets — and what's the one you have? OK, so I think the T4010S has an N-Trig tablet that I believe does two fingers. Is that an IdeaPad, for the recording? An IdeaPad, a Lenovo netbook. There's the Magic Mouse, which is multi-touch; I'm not quite sure how that's hooked up yet, but I think it works, so the Magic Mouse works as well. Phones — I guess any MeeGo device? Yeah, the Magic Trackpad is probably the cheapest option, I think. It's like $99 in Australia at this point.

OK, so first, real quick: do you know if the Wacom Bamboo Pen and Touch is supported yet?

As of 2.6.37, the Bamboo has kernel support. And I think Chris sent me the patches for multi-touch about two weeks ago, and I think most of them got merged. So by the time I get to deal with my inbox, it will probably work. Having said that, Wacom is a bit of a special case, in that it currently does gestures in the driver. So especially the zooming and swiping and whatever else it does — it does it in the driver and then submits key events to the application. If you do a zoom, the Wacom driver will convert that into Control-Plus and hope that whatever application currently has the focus does something with it. Yes. The driver can't really do anything else yet, because we don't have the server bits in place, which means we don't have the APIs for the input drivers in place, which means that, as much as it might want to, there's just no path upwards for multi-touch events other than this.

And have you seen what was done by Pogo?
They have a touchscreen stylus for phones and trackpads called the Pogo Sketch, and then they have software called Inklet that turns those into something behaving like a tablet driver.

Haven't heard of it, can't comment.

Just one thing I'd point out: according to them, their software doesn't work as well on the Apple multi-touch hardware — the external tablet — because the Apple external one does not give out as much information as the built-in trackpads do.

Could be. Yeah, I haven't looked at it; buying a trackpad is on my to-do list. It's been on there for about a week now, and sooner or later I'll manage to get to a shop.

Over here? Yeah. I'm not sure if this is too blue-sky, but you were talking about the difficulty of distinguishing between, say, fingers and two hands and that sort of thing. I'm just wondering whether the protocols and the APIs are open enough to consider developments in hardware along the lines of, say, combining multi-touch with something like the Kinect — I'm not sure what the generic term might be — where you could actually, using imaging, distinguish between fingers. So you could actually know that finger one has touched as opposed to another one. And the other thing is the orientation of the hand as it's moving towards the screen. I don't know — does it make sense to have enough hooks for that?

It's a multi-sided question. One thing is that, unlike a mouse — which is a fairly straightforward build, where the most exciting question is whether it's optical or has a ball — touch devices tend to use a wide variety of technologies. The Surface table has five video cameras underneath; they combine those into an image, and I think three of them track different objects, and so on and so forth. So with a high-res camera you could do something like fingerprint reading on touch, if you really wanted to. Other devices are capacitive or resistive, and there are, I think, mixed-mode devices out there. So a lot of it tends to be handled in hardware only and abstracted through the HID driver, which is currently seeing some changes as well to accommodate new features of these touch devices. Unfortunately, because I'm not directly involved with the kernel development, I don't know all the details. I can hook you up with the guy who's written the grand unified input driver; he knows all about that. So if you come by later, I can give you his email address and you can talk to him.

OK, one last quick question. How does Ubuntu moving from Xorg to Wayland affect this?

It could be interesting. Wayland doesn't really have an input infrastructure at this point — well, it does, but it's a very simplified one. It's a bit blue-sky. I don't think Wayland is going to be ready for, I don't know, another year or two. That is my strong guess, because input tends to look easy from the outside: technologically it's easy, but semantically it's very, very complex, as you've probably seen in the first half of the presentation. OK, thank you very much, and thanks for the questions.

Thank you very much, Peter. I have a gift for you on behalf of Linux Australia. Has anyone not heard the story yet of what the gift is all about? Raise your hands if you want to hear it. Very few, so I won't. But in a very brief nutshell, it's a very nice gift, and I'll tell you about it afterwards. Thank you, Peter, again. And that's the end.