I'm Tim Panton, I'm the CTO at Pipe, and this talk is really just trying to encourage you to think about WebRTC as being maybe a little bit more than just that thing you do conference calls with. That is what it was built for: it was built as a Skype replacement, that was the original idea, and that's effectively what most of you are either working on in the engineering sense or at least using daily for video sprint calls or whatever. So it's there, but it's actually useful for other things too, and I wanted to highlight that because I think it should influence the way we see the APIs, and we tend to forget that to some extent.

So, here's a conference call in action. It looks like a conference call, but actually I would argue that it sort of isn't. This is Meetecho, and what's interesting about Meetecho, although it's a video conferencing tool (I'm talking about somebody else's product here), is that it's very specifically designed to meet the semantics of a particular meeting type. It has the rules of that meeting embedded in the interface, and what's more, if you look at the way it works, the media priorities are fantastically complicated. This isn't actually a particularly good example, because normally you'd have a slide deck in the middle taking over most of the space, and then you have audio from multiple microphones, your own local feed, the person at the stand. There's a bunch of different video feeds coming in that are differently prioritised, and some of them aren't actually coming from browsers: a lot of the feeds coming into this are loose microphones, loose cameras, that kind of stuff. So although it looks like a conference call engine, it kind of isn't. And the other thing, the critical thing, is that for it to work you have to be able to join it with zero install. You have to be able to open your browser, browse to it, agree to the Note Well, and you're in. That's kind of important.

So here's another one. This is something that we've worked on. It's a baby monitor, a reasonably privacy-protecting baby monitor that runs WebRTC between the camera and your smartphone, so you can watch your baby in real time rolling over or whatever. You can see its heartbeat as well. No, sorry, its respiration rate as well, which is also carried over the WebRTC data channel. So you've got a privacy-protecting, secure thing between the two ends, and there's also an interesting point from the service provider's perspective: if it does end-to-end media, not only do you get the encryption benefits, you also get a reduced bandwidth cost, because most of that traffic is not going through a central server.
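To make that concrete, here is a minimal sketch of what the smartphone side of a monitor like that might look like. This is not the product's actual code: the channel label, the message shape and the element IDs are my own inventions, and signalling is omitted.

```javascript
// Hypothetical receiving end of the baby monitor: video arrives as a media
// track, respiration readings arrive on a data channel, and no media server
// sits in the path between the two ends.
const pc = new RTCPeerConnection();

// End-to-end video straight from the camera.
pc.ontrack = ({ streams: [stream] }) => {
  document.querySelector('video').srcObject = stream;
};

// Sensor readings ride a data channel alongside the media.
pc.ondatachannel = ({ channel }) => {
  if (channel.label !== 'vitals') return; // 'vitals' is an invented label
  channel.onmessage = ({ data }) => {
    const { respiration } = JSON.parse(data); // invented message shape
    document.querySelector('#respiration').textContent =
      `${respiration} breaths/min`;
  };
};
```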
Now, I built this thing. It's another example of something that is WebRTC but isn't a conference call. I do a podcast with a friend; we interview people who are doing stuff that we think might tell us what's happening in the future. It's audio only, and we wanted to be able to interview these people really easily, so we just send them a link. They open a link to a WebRTC page, that gets recorded automatically, and the whole conversation has got to be really easy to set up. So we did this. I built it in WebRTC; it's on GitHub, and the podcast is there as well, actually. But the critical thing was that it was mobile first. Both ends of this call are on mobile; there are no laptops involved. We tend to think of WebRTC as a laptop tool, and for a lot of use cases I think it really isn't.

This is the extreme example, which I bought the other day: Google Stadia. This is a games engine; you can play your favourite big shoot-em-up game, in this case in 1080p, though it'll actually run in 4K as well. This is the WebRTC internals from the Chrome browser on my old MacBook running Stadia, and the twitches in it are actually me flipping tabs to get this screen up. It's a rock-steady 60 frames a second, 1080p, 25 megabits of video streaming into my device. There are a load of other interesting things about the way they do that, but the critical stuff is that it's low latency with a very high but controllable, manageable bitrate, and it makes a playable game. I think it's amazing that there's a WebRTC use case that I don't think any of us really predicted when we started this game.

And the final thing I want to show you is the thing I use almost every day, which is remote access to devices. This, if it works, is remote access to a device sitting in my apartment in Berlin: a Raspberry Pi Zero sitting on my router, and I now have a terminal session to it. What's really exciting about that is that it's sitting behind that router, not exposing any ports, but I can still log into it, because the session is just going over the WebRTC data channel. There's a kind of interesting attribute of WebRTC in that.
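The device side of that terminal trick can be surprisingly small. Here's a hedged sketch of the idea, assuming a Node.js WebRTC library that mirrors the browser's RTCDataChannel shape (the 'wrtc' package is one example); signalling, and the pseudo-terminal plumbing a real interactive shell would want, are left out.

```javascript
// Hypothetical sketch: proxy a shell over an already-connected RTCDataChannel.
// 'channel' comes from a Node-side WebRTC library; a production version
// would allocate a pty rather than bare pipes.
const { spawn } = require('child_process');

function attachShell(channel) {
  const shell = spawn('/bin/sh', ['-i']);

  // Device -> browser: stream whatever the shell prints.
  shell.stdout.on('data', (buf) => channel.send(buf));
  shell.stderr.on('data', (buf) => channel.send(buf));

  // Browser -> device: keystrokes arrive as data channel messages.
  channel.onmessage = ({ data }) => shell.stdin.write(Buffer.from(data));

  // Tidy up in both directions.
  channel.onclose = () => shell.kill();
  shell.on('exit', () => channel.close());
}

module.exports = { attachShell };
```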
So what have we learnt? Basically, that WebRTC isn't just for video conferencing on laptops. That's not the only thing you can do with it. It may be the original point, it may be the major use case, but there are plenty of other interesting things you can do with it. The other thing we've learnt, or that I've learnt in all of this, is that the W3C WebRTC API, a.k.a. SDP, is really not a good environment for doing development outside the telecoms world. I'm not even going to discuss whether it is inside the telecoms world, but I would say that every single one of those use cases has had to manipulate the SDP to get the behaviour it wanted.

And the other thing that I didn't emphasise, but that is actually true of that lot, is that they're not running libwebrtc at both ends. The only one where there's a browser at both ends is actually my podcast. For the rest of them, you have a browser at one end and, at the other, a server running something that speaks RTCWeb, the wire protocol, but doesn't run the API. So if you think about Meetecho, that's not running libwebrtc at its end; the user is, but the other end isn't. And the same is true for the other devices: you're sharing a protocol, but not necessarily an implementation. And you're saying, there are other implementations of WebRTC? Well, yes, actually there are. There's a whole list here, in a variety of languages. I've written one in Java, Pion has written one in Go, there's Meetecho's in C, there's one in Python, there's JavaScript, there's C#. These have got various different licences; they're not all open source. I don't actually know what GStreamer's is written in, but it's there.

And what this tells you is that if you write an open standard, and this isn't really about open software, this is about open standards, it means that other people can implement it, if it's a well-written standard. The proof that it is, is that we've all done this. I want to run an interop session in Vancouver to try and get these people to prove that they can interop with each other, and not just with libwebrtc. And I should add, I realised at the end, as I was putting these slides together, that of course there are other WebRTC engines out there. There's Asterisk and FreeSWITCH; they're not really libraries, but they do talk the same wire protocol.

So with all of this, we ask ourselves: well, what is a good API? And the answer is, I have no idea, right? What's absolutely for certain is that the W3C doesn't know what a good API looks like for this, because they've had a couple of goes at it, or more, and the results are still unconvincing; the native library's API, the libwebrtc API, is pretty hideous. And I wonder whether that's because we're framing the problem wrong. When framing an API problem, it's always good to think about Albert Einstein, who said it should be as simple as possible, but no simpler, which is a really good dictum for APIs.

So I thought a lot about the use cases we'd seen, and what I realised was that people were effectively using WebRTC as a proxy. They had a local service that generated RTP, or that monitored the baby's breathing and gave data out over a WebSocket or whatever, and they just wanted to sprinkle on the magic: the NAT traversal, the encryption, all of those things, the browser-ness of WebRTC. They wanted to sprinkle that pixie dust onto their service and have it appear magically in a browser a long way away. They didn't really want to get into an API; they just wanted to configure a proxy, basically. And so we ended up, and it took me two or three iterations to get to this point, with the pipe agent, the pipe implementation of RTCWeb, being effectively a configurable proxy. You tell it which things it's allowed to proxy, and it does.

And what does that look like? Well, an obvious one is RTP, or RTSP. There are a lot of cameras out there that speak RTP, a lot of devices that do RTP, and what you can do is just wrap that up in DTLS-SRTP and push it out to the browser, and you get your video and your audio. There's some complexity about managing the encoder bitrates and things like that, but to a first approximation it's pretty clear how to do it.

The really easy one, and this is funny, turns out to be WebSockets. When we specified the data channel API in the browser, we made it look exactly like the WebSocket API, which means you can substitute one for the other and the rest of the JavaScript doesn't notice, which is kind of funny. So what we do in pipe is, when the page thinks it wants a WebSocket, we give it back a proxied WebSocket, which is actually a data channel. We create a data channel that connects back to the agent, the agent opens a local WebSocket to the service, and then it proxies the data between them. And amazingly, it works invisibly to most pages, which is kind of cool.
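Because the two APIs line up, the substitution can be as thin as this. A rough sketch, with a helper name and URL of my own invention, not pipe's actual code:

```javascript
// A data channel already exposes onopen/onmessage/onclose/send, so it can
// stand in for a WebSocket. The far-end agent opens the real local WebSocket
// and shuttles bytes both ways; the page code never notices the difference.
function proxiedWebSocket(pc, url) {
  const channel = pc.createDataChannel(url); // the label tells the agent what to open
  channel.url = url; // mimic the WebSocket property some pages read
  return channel;
}

// Existing page code keeps working unchanged:
const pc = new RTCPeerConnection();
const ws = proxiedWebSocket(pc, 'ws://localhost:8080/motors');
ws.onopen = () => ws.send('forward');
ws.onmessage = (ev) => console.log('from device:', ev.data);
```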
Then we have the one that's magic. Now, this is the one that I feel most guilty about, because basically you've got a web page that wants to get some web pages from the service running over there, but there's lots of NAT and other stuff in between. So what you can do is fetch them over a data channel, and then, with a little tricksiness, abusing the service worker API and abusing iframes, you can make the page not know. You can have a page that is completely unaware it arrived over a data channel connection, which is kind of cool, actually.

So what does this let us do? Well, it lets me, for example, take this thing, which is two motors, two wheels, a ball bearing, a Raspberry Pi Zero and a battery. It has a little local web service which tells the motors to run, it has a video streaming service using GStreamer from the camera, and what we do is proxy all of that, at least in theory, into a browser. It also has a web service on there with the control page, so what we're doing now is proxying that control page up into the browser on my iPad, and we have, at least in theory, a drivable device over WebRTC. So, get rid of that, back to Keynote. That lets us drive a device. It isn't quite small enough yet, our customers say it's using too much memory and whatnot, but in principle it lets us drive a device from here, with live real-time video.

How is that an API? How can I claim that's an API? What does it look like? Well, what we've done, and this is a bit tricksy, is this: when you create a data channel in JavaScript, you're allowed to give it a label, and we've slightly abused that. The label we give it is a URI, and the URI tells the far end what we want it to proxy. So in this case we've opened a local data channel whose label is a WebSocket URI for the motor service on localhost, and then under the hood we proxy that up to the agent. There's some verification and checking, but in essence the page hardly notices.

The stuff with RTP turns out to be a bit more complicated, but in essence we do the same thing: we create a data channel, we label it RTP, and then we have to do a little bit of extra negotiation to get the SSRC, the payload type and the RID across between the two ends, because those are the things you can't guess in the offer/answer. The rest are already known by the time you've got a data channel, but those you have to pass between the ends, and then you apply the offer and the answer, and boom, you get proxied streams, which is nice.
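In code, the whole "API" is little more than the label you pass to createDataChannel. A sketch of the convention as I've described it; the exact URI schemes, ports and message shape here are illustrative assumptions, not pipe's published interface:

```javascript
const pc = new RTCPeerConnection();

// The label is a URI telling the remote agent what to proxy. The agent
// checks it against its configured whitelist before opening anything.
const motors = pc.createDataChannel('ws://localhost:8080/motors');

// RTP needs the extra handshake described above: SSRC, payload type and
// RID travel over the channel itself, since they can't be guessed in the
// offer/answer.
const camera = pc.createDataChannel('rtp://localhost:5004/camera');
camera.onmessage = ({ data }) => {
  const { ssrc, payloadType, rid } = JSON.parse(data); // invented message shape
  // ...fold these into the local offer/answer, and the proxied stream appears.
};
```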
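And circling back to the page-over-data-channel trick: the service worker half might look roughly like this. This is speculative pseudo-architecture on my part, since service workers can't hold a data channel themselves (see the discussion below), so requests have to be relayed to a window that does. Paths and message shapes are invented.

```javascript
// service-worker.js: intercept fetches for the device's pages and relay them
// to a client window, which owns the actual data channel.
self.addEventListener('fetch', (event) => {
  const url = new URL(event.request.url);
  if (!url.pathname.startsWith('/device/')) return; // let normal fetches pass

  event.respondWith((async () => {
    // Find a controlled window; it holds the PeerConnection and the channel.
    const [client] = await self.clients.matchAll({ type: 'window' });
    const { port1, port2 } = new MessageChannel();
    client.postMessage({ fetchOverChannel: url.pathname }, [port2]);

    // The window sends the request down the data channel and posts the
    // reply body back through the message port.
    const body = await new Promise((resolve) => {
      port1.onmessage = (e) => resolve(e.data);
    });
    return new Response(body, { headers: { 'Content-Type': 'text/html' } });
  })());
});
```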
And so I'm asking myself whether this is actually something that should be standardised. Is it something we really want other people to build into their APIs, or into their RTC implementations, so that more people could take advantage of it at multiple levels, or is it just a quirk of mine? But, yeah, ask me questions, tweet me, catch me, whatever. Thank you.

All right, wake up, everybody. No. We have five minutes for questions, so that should be plenty for some. Anyone?

So, anybody building robots? Anybody building entry cameras, doorbells, those sorts of things? They're all candidates for this kind of trick, where you want your user interface to arrive easily into a web page. That's essentially the trick it does. And it could be big stuff too. I'm playing with small toys, but I suppose it could be a car or something larger. Questions? Use the mic up there. No, no, otherwise it's not in the recording.

First, thank you for an amazing presentation; it was full of really new ideas. And the question is: are there any frameworks, or special libraries, for doing all of those Internet of Things applications at the application level? What would be the best framework for handling the packets that aren't RTP but carry the application-level data? What does this look like from a framework perspective?

So I think my answer is: it doesn't matter. There are two questions there. One of them is, generally, how should you do it? And the answer is: with the engineers you've got available and the tools they know how to use. You'll find a ton of embedded web servers out there, you'll find a ton of RTSP clients out there, and it's just a question of which ones your engineering team feels comfortable with. And the meta-answer is that you can sandbox it. What's nice about this is that it hides the RTC-ness from the embedded people, so the people building this stuff can carry on doing the things they've always done, and you just put this proxy, effectively, in between. The proxy knows everything it needs to know, and it doesn't interfere; you're using those protocols as the gateways. As to the question about which protocols: I think it's RTP, and you might want something like SNMP, but I'm unsure about that. In the extreme case, what you saw me do with the shell is just a plain pipe: it opens a pipe to a shell and talks to that. I don't think you can call that a protocol, but it's the extreme back end of this. Did that answer the question?

For those of us who didn't get all of this first time from the explanation, is it all documented somewhere?

Less than it should be. There's a GitHub repo, the pipe webcam one, which lets you build a minimum version of this. But part of the question is actually whether this is something we should be writing up, trying to standardise and making something of, or whether it's just an interesting little game for me. I'm genuinely unsure about that at the moment.

So this is more feedback, actually. I think it's an absolutely brilliant idea, because being able to tell web developers that this is just like a WebSocket, without making them go through the whole PeerConnection offer/answer to get a session up to an endpoint, is absolutely superb. The only thing that makes me shiver a little bit is that iframe needed to get the data from the service worker into the PeerConnection. Is there any way you're ever going to be able to get away from that? Away from having to have a child iframe so that you can open up a PeerConnection?

Only if I can persuade the browser vendors to support data channels in service workers. That's not totally implausible; I've had soundings suggesting it's something they'll consider, but getting it done is a whole other game. Although, if it turns out that, for example, Stadia needs it, it'll happen in an instant. Absolutely.

I guess that crosses over with the permissions model as well, as a reason it's not done within the service worker.
There's a point to that. They actively don't want the full PeerConnection API in service workers, because it makes no sense to send video to a service worker. But for data channels there's some agreement that there is a point to doing it; it's just that everyone's looking at it and thinking, oh, do we have to? Thank you.