Greetings, everyone, from Atlassian. Yes, we have come from across the world to talk a little bit about this fun topic of WebRTC. Thanks very much, everyone, for coming here. This is really a lovely event.

A lot of the work that we do at Atlassian and HipChat is around this open source project called Jitsi, where we build various server components, and people who deal with WebRTC often use them. So I get to talk to a lot of people who are experiencing WebRTC for the first time, and one of the things I've heard many times is: we love WebRTC because it is peer-to-peer. I definitely understand the first part of that sentence. The second part leaves me a little bit confused. So I go and ask why exactly we are so excited by the fact that WebRTC is peer-to-peer. And they say, well, obviously it's more secure; also, it has lower latency; it doesn't require you to deploy any servers; and it costs less. This gets more and more interesting, because there is some amount of confusion here.

Let's speak about security a little. The notion that your data is going to be safe because it goes directly between you and your peer is very confused. First of all, unless there's a cable going between your computer and the other person's computer and you can see the entire cable, you're probably in for a rude awakening. There are a bunch of things you can do to make your data secure. WebRTC does some of them, like DTLS. There are others out there, like ZRTP, but none of them rely on the fact that your data goes from point A to point B. Also, when you're loading WebRTC pages, they go through websites that relay your signaling from point A to point B, and whether or not those sites have the grace to preserve the DTLS-SRTP fingerprints, so that they can be properly verified by the browsers, is entirely up to them. So unless you control those servers, you have to trust them in order to rely on that security.

Lower latency is somewhere in the middle. Yes, it might be true that because you're talking directly to another person, the latency between you and them will be lower. But the likelihood of your respective internet providers having better peering arrangements with a data center somewhere than they have between themselves is not low. So it is entirely possible for each of you to connect better to Amazon than to each other, and that would obviously have an impact on latency. It's not automatic.

Also, less hassle. That is entirely untrue. First of all, if you want to have WebRTC, you have to have a web server. That's the first thing. Secondly, you have to deploy a TURN server, unless you're okay with the fact that 10 to 20% of your users won't actually be able to talk to each other. So you are deploying servers anyway, and whether that server relays 10% or 90% of your data doesn't really change the hassle of deployment.

Lower cost is the argument I respect the most. I think that's the best justification for making the effort of supporting direct peer-to-peer sessions, because if you have a big service, you would have to pay for significantly less bandwidth on your server side, and that would actually have a real impact. But even that only starts being a factor after you reach a certain scale. So all of these arguments should be taken with a grain of salt, a bigger one in some cases. And people should not be afraid of using media servers, because they can help you with all of these things.
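As an aside, this is roughly what "deploying a TURN server" looks like from the client's point of view: the peer connection is simply told about the relay, and ICE falls back to it when a direct path fails. A minimal sketch; the hostnames and credentials below are placeholders, not anything from the talk.

```typescript
// Minimal sketch: a peer connection that can fall back to a TURN relay when a
// direct path is not available. Hostnames and credentials are placeholders.
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    {
      urls: "turn:turn.example.org:3478?transport=udp",
      username: "webrtc-user",     // placeholder credential
      credential: "webrtc-secret", // placeholder credential
    },
  ],
});
```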
They can help you when you need to do recording, talk to the PSTN, or do a broadcast over YouTube or something like that. They help you with NAT traversal. And one use case that combines all of these ways they help you is conferencing. We work a lot around conferencing, and this is really what I love doing, so I'm going to talk a little bit about that.

As for the different ways you can do conferencing with WebRTC, there are three popular architectures that you can use with more or less success. The first one, peer-to-peer mesh, sort of stems out of this love of the community for peer-to-peer and how it's great and secure and all of these things. When you look at the diagram where you have three browsers talking directly to each other, it's a beautiful triangle with beautiful colors. It stops being that beautiful when it actually becomes a real conference, and this has a bunch of problems. First of all, many users will actually be constrained in terms of the upstream bandwidth that they have, so they won't be able to stream to a bunch of people. Also, because of the way congestion control works, you have to produce a different encoding for each person you're streaming to, so that you can adapt separately to their respective bandwidth estimations. So you basically end up with a lot of CPU usage. You could potentially get around that, but not with WebRTC the way it is today.

Another way of doing conferencing is by using the also popular MCU. MCUs are actually good for a number of use cases. This is the thing that basically behaves as an endpoint: you send one video stream, you get one back, and what you get back is a composite image that contains all the participants in the conference. It is one single stream, very simple from the perspective of the endpoint, but relatively complicated from the perspective of the server. You have to decode 30 frames per second coming from every participant, create composite images, scale stuff down, then re-encode. That is very heavy, so it's going to cost you a lot of CPU, but not only that. Because this is an endpoint, you have jitter buffers, and you have to synchronize streams when mixing, so you are going to add latency on your MCU that you can't really solve even if you throw more servers at it. It's just a technical delay that you cannot avoid. This is still very good when you want to talk to legacy systems: if you want to talk to SIP endpoints, or to the PSTN, and you need to mix the media, you have to do that. But if you would like to scale, then you really have to go for video routing.

I have the Jitsi logo here because I'm subtly doing marketing, because I'm great at that. This is our stuff, but you can use other stuff as well, obviously. There are a bunch of SFUs out there, and all of them are basically video routers. This architecture is all about performance, all about scalability. When we did testing with our stuff, it was a very basic test on a quad-core machine. We got about 1,000 video streams coming off of it, taking about 500 megabits of data, and only 20% of the CPU was being used on that. It was a bare metal machine, not the most powerful, still not bad. Basically, what we got out of it is that we were going to be bandwidth constrained much earlier than we would be CPU constrained. So that was pretty important.
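Going back to the mesh case for a moment, here is a rough sketch of why the upstream cost grows with the group size: every remote participant gets its own RTCPeerConnection, and therefore its own encode of the local tracks. The `signal()` callback stands in for the application's own signaling and is purely hypothetical.

```typescript
// Full-mesh sketch: one RTCPeerConnection per remote participant, so upstream
// bandwidth and encoding CPU grow roughly linearly with the number of peers.
async function joinMesh(
  remotePeerIds: string[],
  signal: (to: string, offer: RTCSessionDescriptionInit) => void, // hypothetical signaling hook
): Promise<Map<string, RTCPeerConnection>> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  const connections = new Map<string, RTCPeerConnection>();

  for (const peerId of remotePeerIds) {
    const pc = new RTCPeerConnection();
    // The same camera track is encoded separately for every connection.
    stream.getTracks().forEach(track => pc.addTrack(track, stream));
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    signal(peerId, offer);
    connections.set(peerId, pc);
  }
  return connections;
}
```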
Now, when you talk about video routing too much, inevitably someone will say: yeah, but mobiles, how are you going to make me receive a bunch of high quality streams on a mobile and render them all there? How is that going to work? You need an MCU there, right? Well, no, not so fast. There are these two things, simulcast and scalable video coding, that SFUs and endpoints have learned to do over the years, and they actually let you get away with routing video to mobiles and still end up with something pretty good. So let me get into that a little bit.

What simulcast is, basically, is having every endpoint, every browser, stream three versions of the same stream toward the core of the network, toward your video router. So you would have the crappy quality stream, the normal quality stream and the high definition stream. All of these would be generated at the endpoint and they would all go toward the video router. They would be independent streams: other than the fact that they carry the same content, the same movie or camera or whatever, they would not be related in any way.

A different way of achieving this multi-stream thing is to do scalable video coding. There are various kinds of scalable video coding; I'm going to talk about spatial and temporal scalability today, because those are the most popular. Spatial scalability is actually pretty easy to understand. You have to think about it the same way you think about images when you see them slowly loading in your browser: initially you see a crappy version of the image, and then, as more data arrives, it gets better and better until it becomes the image you're actually supposed to see. Scalable video coding is something like that. Basically, the different video streams that the endpoint is generating are not entirely independent. You have a base layer that is just a regular video stream, but anything that comes on top of it is just the information you need to complete that image into a higher quality image. And then again, on the upper layer you have just the deltas, the higher definition information that helps you upgrade the image to a high definition one. These streams are related in that direction, so you could drop the top two layers and still be able to decode the base one, or you could drop just the upper layer and still be able to reconstruct everything below it. What you gain by doing that is a bit of a bandwidth optimization compared to simulcast and a bit of a CPU optimization, although these have to be weighed very carefully, because it depends a lot on the resolutions. If you have big differences in resolution, then simulcast might actually turn out to be more bandwidth and CPU efficient.

And the third of these approaches that I wanted to talk about today is temporal SVC. Again, we have the concept of three separate streams going from the endpoint toward the network, but rather than having differences in quality, they're all the same quality; you just have differences in frame rate. Your lower layer is, let's say, 7.5 frames per second, and then in the layer above it you basically just have the missing frames that get you to 15 frames per second. And then in the layer on top of that you just have the missing frames that get you to 30 frames per second, or to 60, or to whatever frame rate you want to support.
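A tiny illustration of the temporal layering idea, assuming the usual dyadic pattern with three layers; nothing here comes from a specific codec, it just shows how a 30 fps stream decomposes into 7.5, 15 and 30 fps subsets.

```typescript
// Dyadic temporal layering with three layers: layer 0 alone plays at 7.5 fps,
// layers 0+1 at 15 fps, layers 0+1+2 at the full 30 fps.
function temporalLayerOf(frameIndex: number): number {
  if (frameIndex % 4 === 0) return 0; // base layer: every fourth frame
  if (frameIndex % 2 === 0) return 1; // adds the frames that double the rate to 15 fps
  return 2;                           // remaining frames complete the 30 fps stream
}

// A router that forwards only layers 0 and 1 delivers a 15 fps version of the
// same encoded stream without any transcoding.
const forwardedFrames = [...Array(8).keys()].filter(i => temporalLayerOf(i) <= 1); // [0, 2, 4, 6]
```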
Now, how does this help you in the context of video routing? Well, whichever of these you use, whether it's SVC or simulcast or anything else, your browser or endpoint streams the layers toward a video router, and from there the router basically tries to send the best that it can toward each endpoint. So if you have something connected over fiber on the other end, the video router will relay the high definition stream. The bandwidth estimations that are being done use the algorithm that's implemented in Chrome, the one defined in the drafts that Harald wrote; the same thing obviously has to be implemented in the video router as well for the whole thing to work. As those bandwidth estimations start reporting lower bandwidth between the router and the endpoint, you start dropping layers: you move to 720p, or to a thumbnail image, whenever you detect that this is necessary. And obviously you can do exactly the same thing on the sending side, where the sending browser, when it detects that there's not enough bandwidth, drops resolutions as it needs to. So you can very easily, well, not very easily, but very accurately, adapt to the bandwidth, the device, and the network you're talking to, even though you're doing absolutely no transcoding, as an MCU would.

Now, how would you do that today? This is always a very interesting topic. Today there's a good way to do it in Chrome, and it works. You basically do your createOffer or createAnswer, you find the video media line in the result, and you just put in there a simulcast group describing three streams, or two streams, or however many streams you need, and that's it. You don't do anything else. It's actually really great. Early on, when we were thinking about simulcast, we were thinking, well, we would just clone your video track and then stream the copies separately. But then that means we have to worry, from within the JavaScript application, about getting the bandwidth estimations, and we don't actually have access to RTCP, so we would have to find another way of sending them, and the whole thing isn't very optimal to begin with. So having Chrome take care of all of this is really, really neat. The even more interesting part is that when you do simulcast with Chrome, in addition to the different simulcast layers, you also get temporal SVC within some of them. At least that was the case at some point, and if that has changed, please let me know. So you get temporal SVC in there as well, which leaves you with not just three but potentially four or five layers that you can choose to switch between on the router, and adapt in a very fine-grained way to what the endpoint actually needs to receive.

Unfortunately, right now this only works with Chrome, but it is very soon going to come to Firefox as well, which is really great news. It's likely going to work in a similar way, and, where's Nils, here's Nils, he can confirm or deny or maybe neither, but it is likely not going to require you to clone video streams; you would just, either using SDP or RTP sender parameters or something like that, tell your browser that you want to have multiple streams, and it would take care of it all. Which is, again, very, very neat.
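For illustration only, here is a rough sketch of the kind of SDP munging being described: inserting an ssrc-group:SIM line into the video section of the offer. The SSRC values are placeholders; a real munger would reuse the SSRCs the browser generated and add the matching a=ssrc and ssrc-group:FID lines before calling setLocalDescription.

```typescript
// Sketch: group three outgoing video SSRCs into a simulcast (SIM) group inside
// the video m-section of an offer. SSRC values are placeholders; real munging
// also has to add or keep the matching a=ssrc and ssrc-group:FID lines.
function addSimGroup(sdp: string, ssrcs: [number, number, number]): string {
  const simLine = `a=ssrc-group:SIM ${ssrcs.join(" ")}`;
  // Insert the group attribute right after the video m-line (very simplified).
  return sdp.replace(/(m=video[^\r\n]*\r?\n)/, `$1${simLine}\r\n`);
}

const pc = new RTCPeerConnection();
pc.addTransceiver("video", { direction: "sendonly" });
pc.createOffer().then(offer => {
  const mungedSdp = addSimGroup(offer.sdp ?? "", [111111, 222222, 333333]);
  console.log(mungedSdp); // the munged description is what would go to setLocalDescription
});
```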
Unfortunately, at the standardization level we don't yet have a single solution that everyone should implement, and from what I understand it is unlikely that we will see one in 1.0. There does seem to be consensus about a number of things; for example, we won't have to clone tracks, since everyone seems to think that is not a practical way of doing it. You may have to use the RTP sender, or it may be SDP only, who knows. But there are still some bottlenecks that were identified at the meeting that just ended in Seattle yesterday, and there is some work that has to be finished in IETF MMUSIC that is not quite done yet, so maybe the decision will be postponed to the next TPAC meeting, and again, it probably won't end up in 1.0. But, I guess, I am still very optimistic, because if we get support for simulcast in the individual browsers, even if it is a little bit different in each, as long as it works the same way on the conceptual level, with the browser taking care of the details, then syntax that differs a little is not a big deal, or at least not as big a deal as it would be if we didn't have simulcast support at all. So what you should take away from this is that simulcast is going to be a solution for your conferences in the near future, regardless of the browser. And with that, my friends, my talk is done. Thank you very much.

Thank you, Emil. Questions? Did we eat too much? Too much popcorn? Do you want Emil to do it again, a little bit slower?

Yes, let me start again: hello, Earth people. There is a question over there.

Thank you very much. So it seems pretty easy to use simulcast from the SDP modification that you showed. Do you have to do work on the forwarding unit to choose which of these streams to forward?

Absolutely, and it is actually very fun work; we have been dealing with that for the past year. We actually had one working version and then decided to do it in a different way. What you have to do in the forwarding unit is implement bandwidth estimation, so that you know when to do the switches. You have to recognize the different resolutions, not by looking into the video content itself, but they have to be indicated somehow. And then, what we decided is also necessary is for you to rewrite SSRCs, to basically be able to switch video streams without the endpoint noticing, because otherwise you have to implement support for that on the receiver end as well, and the whole thing becomes a big mess, and you have to do it every time you implement a client, and all of that. So yes, there is some amount of work on the forwarding unit, but it is substantially more efficient than, say... because you could just open multiple connections and have different getUserMedia calls give different... I am sorry, I thought you were asking whether there is development work on the SFU. Yes, I was. Well, there is no processing work, basically none, so it is infinitely more efficient than processing media.
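As a follow-up to that answer, here is a hypothetical sketch of what per-receiver layer selection and SSRC rewriting in a forwarding unit could look like. The types, thresholds and class names are illustrative and are not taken from any real SFU.

```typescript
// Hypothetical sketch of per-receiver simulcast layer selection and SSRC
// rewriting in a forwarding unit. Everything here is illustrative.
interface RtpPacket {
  ssrc: number;
  sequenceNumber: number;
  timestamp: number;
  payload: Uint8Array;
}

interface SimulcastSource {
  // SSRC of each simulcast layer the sender uploads, from low to high quality.
  layerSsrcs: number[];
}

class ReceiverProjection {
  private currentLayer = 0;

  constructor(private source: SimulcastSource, private outgoingSsrc: number) {}

  // Pick the highest layer the receiver's estimated bandwidth can sustain.
  selectLayer(estimatedKbps: number): void {
    if (estimatedKbps > 2500) this.currentLayer = 2;      // HD
    else if (estimatedKbps > 500) this.currentLayer = 1;  // normal quality
    else this.currentLayer = 0;                           // thumbnail
  }

  // Forward only packets of the chosen layer, rewriting the SSRC so the
  // receiver sees one continuous stream. A real SFU would also rewrite
  // sequence numbers and timestamps so the switch is seamless.
  project(packet: RtpPacket): RtpPacket | null {
    if (packet.ssrc !== this.source.layerSsrcs[this.currentLayer]) return null;
    return { ...packet, ssrc: this.outgoingSsrc };
  }
}
```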
There is another question over there. What about support for different devices having different codecs or different encodings? Do you have to implement an MCU in those cases?

OK, next question please. So, having different codecs in a selective forwarding conference is a big problem. Actually, the bigger problem is having devices that do not support the codecs that other devices are trying to use, and this is a very tough problem. There are various ways to solve it, and none of them are ideal. For example, depending on how many devices you expect to have with some weird codec, you could basically make sure there is something specifically for those devices running on your network, or something like that. One thing that I often say is that this is definitely not something you should implement in the forwarding unit, because it modifies your scaling behavior in a very dramatic way. You should definitely do it somewhere separate; using an MCU as a complementary element of your architecture would also be one way of doing it. But there is no ideal answer here, unfortunately. So everyone should support VP8, too. Thanks very much.