Thank you all. Thank you very much. Completely unnecessary. All I did was walk from there to here. But, you know, it's nine o'clock on a Sunday, I'll take what I can get. So, as this part of the slide says, my name is Steve. As that part of the slide says, I am a geek. As you can probably tell from my accent, I'm from England. I'm from London, England. And I'm a European, despite what my people keep saying.

So, I'm going to be talking about building your own version of Skype in the browser. Because you can actually do it. How do you do it? Well, it's quite simple. It's called WebRTC. Talk over. See you later. WebRTC is already built into the browser. It stands for real-time communication, and it just sort of occasionally works. WebRTC is the name of the JavaScript API, and it's also used to describe the flow, the protocol, between two machines.

So, how does it work? Well, I'll talk in theory, and then I'll show you a real-life demo, which I'll do at the end, because I'm not going to trust it. Let's face it, it's a computer, and it's a demo, and it's live.

So, WebRTC, how does it work? There are multiple steps. The first thing, you grab the local media with getUserMedia. That could be the webcam, it could be the microphone. Sometimes, if you're doing this on your phone, you have the option of two cameras, the front camera and the back camera. You can specify all of this in the parameters. The next thing, you say: okay WebRTC, off you go, create this peer connection. The browsers agree on what it is they're going to communicate. Are you communicating using a camera image? Are you doing a microphone? You can also say: I want the camera, but only if it's at least as big as this. So, if you're doing insurance, for example, and you want to use WebRTC to take a picture of the car that's been damaged, then you need a high-resolution image, otherwise it's probably not valid.
So, you can say: I want a camera image, but only if it's 1024 pixels or larger. You also say how you're going to communicate. Are you on the same local network? Do you have to go via another server? You have to specify this. And then, once you've done that, the streams just flow and it all works, normally without a problem.

So, the first thing: getUserMedia. Very simple. These are called constraints. You say: I would like video, true; I would like audio, true. This is also the place where you can say: I would like video, but only if it's really small, because I'm on a phone on low bandwidth, so don't give me too much video. Or, in the insurance example: I would like video, but only if it's this large. You can specify an awful lot in there. I'll leave you to go away and look at the docs.

Once you've done that, you need to give it the parameters for how it's going to be communicating: TURN servers and STUN servers. Actually, there were just STUN servers first. Most of you will realize that when you connect to the Internet, you don't actually have your own IP. You're NATted, and it's an IP which your ISP gives you. That's pretty useless if you've got another computer that wants to have a Skype-like conversation with you. What you need is the publicly accessible IP, which you can get via a STUN server. If you go to sites like What's My IP, it says: well, I know you think you're 192.168-blah-blah-blah, but actually I see you as this global IP. The STUN server gives you that, which you can then pass on to the other side of the conversation to connect. The TURN server is for when you are behind a nice chunky firewall and no one can talk to you at all. The TURN server says: right, I will take your media, your video or your microphone data, I will turn it around and send it back down to the other peer. That's what the TURN server does.
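The constraints described above can be sketched in a few lines of JavaScript. This is a minimal sketch, assuming a page with a `<video id="localVideo">` element; that element id and the exact 1024-pixel figure are just illustrations.

```javascript
// Simple constraints: just "yes please" to camera and microphone.
const basicConstraints = { video: true, audio: true };

// The insurance-photo case: insist on at least 1024 pixels of width,
// and prefer the back camera on a phone ('environment').
const hiResConstraints = {
  audio: false,
  video: { width: { min: 1024 }, facingMode: 'environment' }
};

// Browser-only: prompts the user, opens the camera/microphone,
// and shows the local stream in the assumed <video> element.
async function startLocalMedia(constraints) {
  const stream = await navigator.mediaDevices.getUserMedia(constraints);
  document.querySelector('#localVideo').srcObject = stream;
  return stream;
}
```

If the browser cannot satisfy a `min` constraint, the returned promise rejects rather than silently handing you a smaller picture, which is exactly what you want for the insurance case.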
That's a lot of data, and that's why there are generally no free TURN servers around. You have to run that server yourself. Once you have this configuration, you just pass it into a JavaScript constructor called RTCPeerConnection, and the job's done. You add that media stream, the video or the audio, to the peer connection. You create an offer to the other side. So, you've got these two parties: the person that wants the conversation and the person that's going to have the conversation with you. The first person creates an offer: I offer you the chance to have a web chat with me, using this microphone, with this video data. Do you accept? The offer is then sent to the other user. They go: yes, I accept this conversation, and they create an answer back to the first person. When both of those users have applied that information into their peer connections, traffic happens, and that's it.

How do they communicate? As I said, there are three ways you can generally connect two peers. The first one: you're on the same local network. That's pretty simple. You can see the local addresses, you send the data through. The next one: you're behind a firewall, but it's not a particularly well-implemented firewall, so you can just break through it. Job done. The other way: you're behind a really difficult firewall, where you have to go through a TURN server which is outside.

WebRTC, they call it a peer-to-peer protocol. It isn't. There's no such thing as peer-to-peer. Never. There always needs to be something that says: I know you're a peer, and I know you're a peer, and I'm going to connect you. That signaling is something you're going to have to build yourself. It's not difficult, but it's a thing they don't tell you in the brochure. Once you've done that, as we say, the streams flow and everything works fine.

So, let me do the live demo bits. getUserMedia. This is the first part. I'm online, so the URL should be available most of the time. There we go.
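The STUN and TURN details get handed to the connection when you create it. A hedged sketch; the server URLs, username and credential below are placeholders, not real servers.

```javascript
// STUN/TURN configuration for RTCPeerConnection.
// All server details here are placeholders - substitute your own.
const rtcConfig = {
  iceServers: [
    // STUN: answers "what is my public IP?" - cheap to operate.
    { urls: 'stun:stun.example.org:3478' },
    // TURN: relays your media when nothing else gets through -
    // expensive bandwidth-wise, which is why you run it yourself.
    { urls: 'turn:turn.example.org:3478', username: 'demo', credential: 'secret' }
  ]
};

// Browser-only: constructs the object that negotiates and carries the media.
function createPeer() {
  return new RTCPeerConnection(rtcConfig);
}
```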
So, we simply call getUserMedia, and there we go. And this is one line of JavaScript. It's so easy even I can do it. That one line creates the object and opens up the camera. (I could give the talk from here. Actually, I could have given the talk from home. It's a tradition, I have to do it.) So, getUserMedia creates the object which opens up the camera. If you have multiple cameras on the device, that's where you can say I would like the front camera or the back camera. You're also able to ask for large video or small video, all of that kind of stuff. Each side of the conversation has to do this. So, let's go to the second window and do the same thing. There we go.

Once you've created that object, you create the peer connection. Again, this is the same code that I showed you in the slides, and it is genuinely the same code that's running on the machine. All the normal stuff: setting the constraints for what you want, setting up the peer connection, and these things, ICE candidates. Now, ICE is the connectivity part. I said there are three ways you could connect: the local network, through a bad firewall, or through a good firewall. The ICE candidates come along saying: I think we could connect via this method. Oh, I also think we could connect via this method. Basically it creates a series of possible candidates that could work, and this can vary. It can be anywhere between one or two and, I've seen, about 50 different candidates. It says: I've got all these possible options, let's try all of them. Both sides try all of the possibilities until they find one that works, and when they do, that's how your data flows peer to peer.

So, we've created our peer connection. We then add the media stream to that peer connection, and that's all the code. There isn't any more than that, and to prove it, you can go to the link and view source, and you'll see that I'm not fibbing.
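Wiring those pieces together on the calling side might look like this sketch. `sendToPeer` is a stand-in for whatever signaling channel you build (in this demo, literally copy and paste); the rest is the standard API.

```javascript
// Stand-in for your signaling channel: here we just print the message,
// so a human can copy and paste it, exactly as in the demo.
function sendToPeer(message) {
  console.log('signal ->', JSON.stringify(message));
}

// Caller side: attach local media, collect ICE candidates, create the offer.
async function startCall(pc, stream) {
  // Hand every local track (camera, microphone) to the connection.
  for (const track of stream.getTracks()) {
    pc.addTrack(track, stream);
  }
  // Each candidate is one "maybe we can connect this way" suggestion;
  // there may be anywhere from a couple to dozens of them.
  pc.onicecandidate = (event) => {
    if (event.candidate) sendToPeer({ candidate: event.candidate });
  };
  // The two-step dance: create the offer, then apply it locally.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToPeer({ sdp: offer });
}
```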
This is when we come to the exchange, the offer and the answer. So, the person initiating the call says: I offer you the chance to have a conversation with me. It creates this SDP, a session description (the format is called the Session Description Protocol), and it looks like this. This is the information that gets passed between each machine. I hope you're making a note of all of this. There will be questions at the end. Everyone got that? Good.

So, that session description contains all the information about the offer: the type of video or audio it wants to send, the size it wants to use, little bits about the network, the protocols that might get used. It is an absolute mess. The reason it looks like a mess is because it is a mess. The people that designed this wanted to have some kind of resemblance to the SIP protocols that came before it, because there are millions of SIP phones in the world and they're not going away. So the WebRTC people thought: let's be nice to the SIP people, we're taking over their business. Let's use something which is ugly and horrible, just like the SIP protocol, and this is what they came up with. By the way, if this is being recorded: SIP people, I love you. You're lovely.

So, this is the packet which the sender needs to send to the receiver. At the moment, it doesn't know where the receiver is. It could be any machine in the world. So, you need to have a server that passes this information from one location to the next. And for the purpose of this demo, the way I shall be sending that is via the protocol of copy and paste. So, on the second machine, the receiver, let's receive the offer and paste that in from here. Blah, blah, blah. Done.

Now, this is a two-stage process. We create a session description from the thing that the other person sent us. We then have to set it as the remote description. That's two operations to do one thing. We've already said: create an object with this packet of junk.
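The receiving side's two-step process described above, sketched under the same assumptions (`sendToPeer` stands in for your signaling channel):

```javascript
// Receiver side: apply the remote offer, then produce and apply an answer.
async function receiveOffer(pc, offerSdp, sendToPeer) {
  // Step one: build a description object from the packet of junk...
  const offer = new RTCSessionDescription(offerSdp);
  // ...step two: apply it to the connection. Two operations, so you
  // could inspect or tweak the description in between.
  await pc.setRemoteDescription(offer);
  // The same two-step pattern for the answer, on the local side.
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  sendToPeer({ sdp: answer });
}

// A tiny, testable helper: the "type" field is the one obvious
// difference between the two blobs of SDP.
function sdpType(description) {
  return description.type; // 'offer' or 'answer'
}
```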
We then have to say it again. Why? Because you might want to change that packet of junk. You might look at all that stuff as a programmer and think: you know what, I disagree with this 8,000, I want it to be 7,000. So, WebRTC does everything in two steps. You create the object with all the settings in it, and then you apply it to your connection. So, we set that remote description. We then reply and say: thank you, I accept this conversation, thank you for your SDP. I will give you my answer for how I would like this conversation to go, and it gives you this.

You might recognize this. This is the same hunk of junk you saw in the first example. Again, there will be questions at the end. And it is pretty much exactly the same. It says: this is the sort of data I want to send. You might be calling me from a webcam which is high resolution, but the other person can say: that's fine if you want to send me high-resolution data, but I'm only sending you low-resolution data. This is my answer. The only real difference is that very first field: type, answer. Everything else is exactly the same.

So, the second person sends back their answer. And again, it's a two-stage process: first you create the object from that big text string, and then you set that remote description onto the local peer connection object. Both sides now know what the other is going to be doing. And that's pretty much it. You've told each machine what it is you're going to be transferring. Is it video data? Is it audio data?

You then have to pass in the ICE candidates. This is the way that the two machines are going to connect. Is it via the local network or not? So we send the ICE candidates from one machine. That's the signaling bit. If you have a look here, these are the ICE candidates, the bits I mentioned that it creates automatically.
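Applying the answer back on the caller's side is the same construct-then-apply pattern, sketched here for completeness:

```javascript
// Caller side: the answer comes back over the signaling channel.
async function receiveAnswer(pc, answerSdp) {
  // Construct the description object, then apply it to the connection.
  await pc.setRemoteDescription(new RTCSessionDescription(answerSdp));
  // Both sides now hold both descriptions; only the ICE candidates
  // remain to be exchanged before traffic can flow.
}
```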
The browser looks at the interfaces on your machine and says: I've got two network cards, there are these IP addresses, I'm on this part of the network, I could probably be reached remotely because the firewall is quite easy. Firewalls, I mean, everyone says: oh yeah, firewalls, they protect us. What this actually does is keep firing packets in. Firewalls have very simple logic. It basically says: well, if someone sent something out on this port, there's a reasonable chance it's legitimate traffic. So what the ICE layer does is send loads of stuff out on loads of random ports, hoping that the firewall has a look and goes: oh yeah, I've seen some traffic on this port before, it must be legitimate. So when the other person starts firing packets in, it gets through the firewall. Which is great, because it solves all the problems that the IT people put there to stop us doing this thing.

So you have a series of candidates, in this case nine or ten, and you send them over the same signaling channel to the other person. Again, I'm using copy and paste, but you'd have your own little server that says: oh yeah, person A wants to talk to person B, send these nice candidates along. So we send them along, we receive them, we add them in. Okay, so there are 11 candidates there, and that is it. That's a working demo. All I've got to do is scroll up now, and if there are two windows with my image in them, this is a live demo that's actually worked. The suspense is killing me as well. There we go. There's my one, there's the other one. We've got 11 candidates there. So there we have it, it's me again. There we have a very simple example, because it is very simple, of how these two things are communicating. You don't really need that much code, but what you do need needs to work. Let me show you some other bits. What I just showed you takes about 50 lines to write. There's nothing much there.
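Feeding the received candidates into the connection, plus a small hypothetical helper for eyeballing what kind of route each raw candidate line proposes (the helper and its name are my own illustration, not part of the API):

```javascript
// Each candidate from the far side gets handed straight to the connection;
// the browser then tries every pairing until one works.
async function receiveCandidate(pc, candidateInit) {
  await pc.addIceCandidate(new RTCIceCandidate(candidateInit));
}

// Hypothetical helper: classify a raw candidate line by its "typ" field.
//   host  = local network address
//   srflx = public address discovered via STUN
//   relay = TURN server relay
function candidateKind(candidateLine) {
  const match = /typ (host|srflx|relay)/.exec(candidateLine);
  return match ? match[1] : 'unknown';
}
```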
The trick is everything else you need to do around it. You need to build that signaling thing that says: here's machine A, it wants to talk to machine B, and then actually carry that communication.

You've also got stats in there. If you want to know what's going on at any time, you can query the browser and ask: what are you doing now? Are you expecting an answer? Have you just sent an answer? What sort of throughput am I getting here? If you use Chrome, and most people that do this stuff use Chrome, not because it's best, but because Google created WebRTC many years ago and they really are trying to push it, so they've got the best support in the browser for most things, you can go to chrome://webrtc-internals and you see stuff like this. It will say things like: I'm decoding this, this is the jitter buffer, this is how many packets I've lost. You can grab all of this from JavaScript using the getStats method: how many packets have I been losing? And then you can report that back to the user. Is this jittering a bit? Should I tell the user: if you're doing something else on your machine right now, it's blocking the network and it's jittering? The browser itself will automatically degrade the quality if it sees that the bandwidth is quite low. It will do that, you don't have to, but it's good to be able to grab this information so you can pass it on to the user: by the way, it's going to go a bit slow now. And you get pretty graphs as well, if you want to do that stuff.

There is also a data channel. Skype, as well as supporting voice communication, also supports text, and you can do the same with the data channel object. That's very simple. You can create as many data channels as you like.
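A sketch of that data-channel side; the channel label `'chat'` is just an example.

```javascript
// Open a text channel over an existing peer connection.
// You can create as many of these as you like; under the hood they
// all share one transport, distinguished by an id in each message header.
function openChat(pc, onMessage) {
  const channel = pc.createDataChannel('chat');
  // Deliberately WebSocket-shaped API: onopen, onmessage, send().
  channel.onmessage = (event) => onMessage(event.data);
  return channel;
}
```

The far side hears about the new channel via the connection's `ondatachannel` event, after which `channel.send('hello')` works in either direction, with strings, blobs or array buffers.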
Let's face it, if this thing is able to send real-time video between two peers, it can certainly handle a stupid piece of text. Although there are theoretically an infinite number of data channels, they all actually go over the one connection, and each message just has a little header saying: this is channel one, this is channel two. And you can send anything across: strings, blobs, array buffers. And, as the bottom bit says, which is the important bit, it matches the WebSockets API, because that's where they stole it from. And it works.

So, in conclusion, and getting there roughly on time: the peer connection handles everything. You create a getUserMedia stream and you add it to the peer connection. You create an offer for the other side to have that call. That goes into the peer connection. You get an answer from the other person that says: I'm happy to have this call. You add that answer to the peer connection. The order is very strict. Every single thing needs to be done in the order that's been given, otherwise it will not work. The states will go out of sync and you will never be able to recover. You'll have to refresh the page and go right back to the start if you get any of those steps wrong or in the wrong order. And there are other gems hiding, like the getStats thing, for example. It's there, it's really great, but it's not something you'd notice: when you get a video connection happening for the first time, you're jumping up and down so much you forget that there are all these other things hiding behind the scenes. So take a look for them. They are there, they're quite amusing.

And that's the end. I'll just update the scorecard. There we go, 11 talks given. So I might even have a couple of minutes for questions.

So, how much cross-browser support is there? This works, in theory, on the majority of modern browsers. So Chrome, Firefox. I don't think Opera support is fantastic.
Apple are only just beginning to put it into Safari, because Apple have had their own FaceTime stuff for years and they really don't want people using free stuff in the browser. There is a site, iswebrtcreadyyet.com or whatever, that gives you a version-by-version breakdown of which methods are supported in which browser. And that is a problem. It does work: it works Chrome to Chrome, it works Firefox to Firefox, and occasionally it works Firefox to Chrome. Occasionally it will work. And that is one of the bigger problems.

So, those of you that were here two years ago saw me give this same talk. Hopefully I've got it right this time. Those who were here last year saw me give a talk focused entirely on that point. There is a lot more, and a lot of other things you need to consider, when building WebRTC solutions. One of them is the cross-browser support. Yes, it is supported. But if you've got Chrome version X and Firefox version Y, there will be certain issues that you as the programmer need to handle. Unless you happen to be in an environment where you can control what each end user has, you will have to cater for at least 20 different versions of Chrome and probably about 10 different versions of Firefox. And that is something that WebRTC doesn't do for you. It gives you that responsibility. So, yeah, excellent. Thank you.