Hello, my name is Olivier Crête and I'm the multimedia lead at Collabora. I've been working on video calls for the last 15 years, first as the maintainer of a library called Farstream, which was used to do video calls over SIP and XMPP, and later with all kinds of other technologies. But in the last couple of years, just like everyone else doing video calls, I've really been concentrating on WebRTC. In this presentation, I will tell you three things. First, what is WebRTC? I'm going to give a high-level introduction. Then I will go over the most popular open source implementations of WebRTC. And last, I will give you a couple of tips and tricks on when and where to use WebRTC, when not to use it, and how to deploy it on an embedded system. So what is WebRTC? It's a way to send low-latency data, audio and video in particular, to browsers. It was really designed by the browser makers for their own use. It is designed for peer-to-peer use, meaning that you can send the data without going through a server. It is composed of a JavaScript API that the browsers implement, as well as a set of IETF standards that define how the data is actually sent underneath. The JavaScript API in the browser has two main parts. First, there's the getUserMedia function, which allows the JavaScript application in the browser to access the microphone and the camera in a secure way, meaning that the user has to approve it. This part is completely offline: it doesn't connect to the Internet, it just accesses the data, which you can then put on the screen, for example. In addition, there is the PeerConnection API. That is the actual API to connect to the other side, to establish the peer connection, and you can feed it the camera stream you got from getUserMedia to create a complete video calling solution in a browser. The way this API is designed, it abstracts away all of the encoding, decoding and transmission.
If you're a web developer, you don't need to know any of what follows. The rest of this presentation is really for people who want to understand how it works underneath and be able to put it in embedded systems. So what are the component protocols at the IETF level, the lower layer? There are dozens of RFCs involved, but the most important ones to know about are the Real-time Transport Protocol (RTP), the Interactive Connectivity Establishment protocol (ICE), and the security layer, which is DTLS. What is RTP? RTP is a protocol to send audio and video over the Internet. It is widely used with protocols like SIP, XMPP, RTSP, etc. It is meant to be sent over UDP because it's designed for low latency, but you can also send it over TCP as a fallback. RTP is actually very simple: an RTP packet only has a 12-byte header. This extra header adds a few pieces of information that the UDP header doesn't contain. First, it contains a flow identifier, allowing multiple streams to be multiplexed over the same underlying transport. It has a sequence number: the sender numbers every packet in sequential order so that the receiver, by looking at these numbers, can put the packets back in the right order, because UDP does not provide ordering. The receiver can also detect a gap, meaning a packet was lost, and take appropriate action. There's also a timestamp: the sender stamps each packet so that the receiver can play the media back in the same time sequence in which it was captured. It has an identifier of the media format, the payload type, so that the receiver knows what kind of media it's receiving; this is particularly useful if you want to switch codecs at runtime, or things like that. And then it has a system to extend the header, so if this 12-byte header doesn't have all the information you need, you can add extensions. WebRTC uses this: it carries a couple of things as header extensions.
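The 12-byte fixed header described above can be sketched with a small parser. The field names follow RFC 3550; the example packet values at the end are made up for illustration.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("RTP packet needs at least a 12-byte header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # always 2 for RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),  # header extensions follow (WebRTC uses these)
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,     # media-format identifier
        "sequence_number": seq,        # for reordering and loss detection
        "timestamp": ts,               # for playback timing
        "ssrc": ssrc,                  # flow identifier for multiplexing
    }

# Illustrative packet: version 2, marker bit set, payload type 96, sequence 7.
example = struct.pack("!BBHII", 0x80, 0xE0, 7, 160, 0x12345678) + b"payload"
```

Everything past byte 12 (plus any CSRCs and extensions) is the media payload itself.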
So what is ICE? ICE is a way to create a peer-to-peer connection between two computers. The way it works is actually not very complicated. Each peer, each agent, each side, finds all the possible ways that it can be reached, all the possible addresses. For example, there's the address on its local interface; a device like a telephone can have both a Wi-Fi and an LTE interface, so it can actually have two local addresses. Then it will try to get the address of any NAT box, either by using a local protocol such as UPnP, or by sending a packet to a server and asking: hey, server, where do you think this packet is coming from? That way it can learn its external IP address. It collects all of these addresses, usually adds the address of a relay to use as a fallback, and sends them to the other side using a reliable mechanism. Now both sides have a list of these addresses, which we call candidates. The implementation takes each local candidate and each remote candidate and pairs them up. Then it tries all these network paths, in a specific order defined by the protocol, and by trial and error it picks the best path that works. Once one of these pairs actually connects, meaning you can send a request and receive a reply, we say that ICE is connected and a connection has been established. So now that we have a peer-to-peer connection, the next protocol on top of it is TLS, Datagram TLS to be precise. DTLS is the same protocol that is used for HTTPS, and it does almost exactly the same thing. It just adds two things to work over UDP. The first is a sequence number, so that you can know which packet you're talking about, since there could be gaps. And the second is a mechanism for retransmission.
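The pairing and ordering step described above can be sketched like this. The candidate dictionaries and priority values are invented for illustration, but the pair-priority formula is the one from RFC 8445 (section 6.1.2.3), where G is the controlling agent's candidate priority and D the controlled agent's.

```python
def pair_priority(g: int, d: int) -> int:
    """Candidate-pair priority from RFC 8445, section 6.1.2.3.
    g = controlling agent's candidate priority, d = controlled agent's."""
    return (2**32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)

def checklist(local, remote):
    """Pair every local candidate with every remote one and sort by pair
    priority, highest first: the order in which the paths are probed."""
    pairs = [(pair_priority(l["prio"], r["prio"]), l["addr"], r["addr"])
             for l in local for r in remote]
    return sorted(pairs, reverse=True)

# Made-up candidates: host addresses get high priority, relays low,
# so direct host-to-host paths are tried before falling back to a relay.
local = [{"addr": "host-L", "prio": 126}, {"addr": "relay-L", "prio": 2}]
remote = [{"addr": "host-R", "prio": 110}, {"addr": "relay-R", "prio": 5}]
```

Real agents compute the single-candidate priorities from a type preference (host, reflexive, relayed) and a local preference, but the pairing and sorting logic is the same.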
So if a request hasn't had a reply within a certain timeout, the request is resent and the other side replies again. This allows it to work over an unreliable lower-layer protocol. What does TLS do, for those who are not familiar with it? First, it uses certificates to establish a session between both sides. Within this session, it negotiates a temporary session key: it generates a random session key and exchanges it securely between the two sides. Once this secret key has been exchanged, it is used to actually send the data. In the case of RTP, we use something called Secure RTP. Instead of encrypting the whole packet, some bits of the RTP header are left unencrypted, so that a middlebox can forward the packet more or less intelligently without having to decode the content. The way this works is that the session key from the DTLS session is extracted and then fed into the SRTP stack; this is called DTLS-SRTP. WebRTC has a couple of other features that I'm not going to go into in detail. There's a way to retransmit audio and video packets that is used by the browsers. It supports forward error correction, which is mostly useful when there are a lot of errors not caused by congestion, or when the latency of the link is quite high, for example on a satellite link. It also transmits information that enables bitrate adaptation. WebRTC itself does not specify how to do bitrate adaptation, that's really left as an exercise for the reader, but the appropriate information is transmitted and different browsers have their own implementations of these algorithms. Most actually use the code that comes straight from Google's Chrome, but other implementations and other algorithms do exist. Then there's the data channel: so far I've talked about audio and video, but WebRTC also has a separate way to send arbitrary data from the browser application.
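The retransmission timer mentioned above is a simple doubling (exponential backoff) timer; DTLS suggests starting around 1 second and capping at 60 seconds. This is a minimal sketch of that schedule, with the attempt count chosen arbitrarily for illustration.

```python
def retransmit_schedule(initial: float = 1.0, cap: float = 60.0,
                        max_attempts: int = 7) -> list:
    """Timeouts (in seconds) before each successive retransmission of an
    unanswered handshake message: double the timer each time, up to a cap."""
    timeouts = []
    t = initial
    for _ in range(max_attempts):
        timeouts.append(min(t, cap))
        t *= 2  # back off: wait twice as long before the next retry
    return timeouts
```

If no reply arrives after the last attempt, a real implementation would give up and report the handshake as failed.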
This data is sent as messages, and these messages can have different levels of reliability. It can go from fully ordered, fully reliable delivery, such as you would get from TCP; to fully reliable but not ordered, so messages can arrive in a different order; to partial reliability, where if a packet doesn't arrive in time it is retransmitted, but only up to a certain limit, either a certain number of retransmissions or a certain timeout; and finally to completely unreliable. This is controlled by the application. I said earlier that a relay server is possible: the protocol for this is called TURN. A couple of open source implementations exist, but by far the most popular is called coturn. It's very scalable and doesn't take many resources from the computer; in our experience you fill up the bandwidth well before any other resource becomes a problem. That is the main content-agnostic kind of server: TURN doesn't know anything about the content, it just forwards packets. There exist two other kinds of servers. The first, and probably the most popular, is called a selective forwarding unit (SFU). It doesn't decode or encode the video, it just forwards the flows. This is what is used, for example, by many of the online video call platforms. One of the big advantages, obviously, is that it's cheaper on the server: it can receive a lot of flows and forward only some of them. If you're ever in a call with a thousand participants, it's probably only forwarding you the flows of five or ten of them. There are a number of open source SFUs; the most popular ones are probably Jitsi and Janus, and there's also MediaSoup. All three are widely used and pretty reliable. Then there's something called a multipoint control unit (MCU), which is the more traditional video conferencing system. An MCU receives the video, decodes it, and can then, for example, compose the streams into a single image, a mosaic, and send a single stream back to each client.
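The data channel reliability levels described earlier map onto the options an application passes when creating a channel. The field names here mirror the browser's RTCDataChannel init dictionary, and the limit values (3 retransmissions, 500 ms) are just examples.

```python
# The four reliability modes of a WebRTC data channel, expressed as the
# per-channel options an application would choose. Note that in the real
# API, maxRetransmits and maxPacketLifeTime are mutually exclusive.
RELIABILITY_MODES = {
    "reliable_ordered":    {"ordered": True},                            # TCP-like
    "reliable_unordered":  {"ordered": False},                           # no ordering
    "partial_retransmits": {"ordered": False, "maxRetransmits": 3},      # give up after N resends
    "partial_lifetime":    {"ordered": False, "maxPacketLifeTime": 500}, # give up after 500 ms
}
```

Unreliable delivery is simply the limit case: set `maxRetransmits` to 0 and the message is sent once, fire-and-forget.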
The disadvantage of the previous system is that if you want to see multiple streams, say you're talking to multiple people, then the client actually receives multiple streams and has to decode each of them separately, which requires more resources on the client. With an MCU we move those resources to the server, which does the transcoding. The most popular open source MCU is probably FreeSWITCH, which started as a telephony server but can also do video, and there's also something called Kurento, a framework to build media processing applications, in particular WebRTC MCUs. Now that I've talked about the server side, let's talk about the endpoint. The endpoint is what you actually have with you. The most popular library by far is called libwebrtc, or just WebRTC. It's from Google: it's the code that is used in the Chrome browser, but it was also adopted by all the other browsers, so whether you use Firefox or Safari, it's all the same code base. There's also Pion, which is pretty actively developed, which is written in Go, and which is more of a framework to develop WebRTC applications. It doesn't do the encoding or decoding itself, a bit like GStreamer, whose WebRTC stack doesn't do that either, and it looks pretty decent. I haven't used it personally, but I only hear good things. There's also a library from Amazon called the KVS WebRTC SDK; I've only seen it used as a way to send video to the Kinesis Video Streams service from AWS, so it's a bit of a special-purpose implementation. And last but not least, there is GStreamer's implementation of WebRTC, which is based on GStreamer's mature RTP stack. GStreamer, for those who don't know, is a media framework to process audio and video that is based on the concept of pipelines. Pipelines are graphs of elements that each process the data in turn.
So the first element of the graph captures data and passes it to the next one, which does something to it and passes it on to the next, and so on to the end of the graph. This makes it possible to mix and match different technologies quite easily. One of the features is that we have encoder and decoder elements, some software based and some hardware based, which cover pretty much everything the industry can do. Since GStreamer's WebRTC stack is itself a separate element, you can connect an existing encoder to it without having to write any new code, and being a separate element also makes it easy to integrate into an existing GStreamer based pipeline. But a design choice that was made when creating the GStreamer WebRTC element was to mimic the WebRTC API from the browser, the JavaScript API. That creates the limitation that this element cannot be used from the command line with gst-launch-1.0; you actually need to write an application around it. That's normally what you want in real life, but it makes it a bit harder to prototype. To make simple problems easier, there is an element called webrtcsink, written in Rust, that has a web server and a small application built in, so that you can easily try the GStreamer WebRTC element and even use it for simple use cases. WebRTC is great, but it's not for everyone; there are cases where it's not the appropriate technology. The most evident is anything that requires large scalability. WebRTC is based on the idea of a one-to-one connection, which means that it's difficult to put caches and CDNs around it. Some people have tried, but it's really hard and costly to scale.
Also, if you don't need low latency or peer-to-peer, maybe you just want to use something like low-latency DASH or HLS, which gives you around one to two seconds of latency at scale, can go through a CDN, and allows you to send video from a single source to a lot of recipients relatively cheaply. But if you need latency under one second and you want to send audio and video to a browser, or receive audio and video from a browser, WebRTC is for you. If you want to go peer-to-peer and skip servers entirely, for example for privacy reasons, WebRTC is also for you. Another typical use case is replacing RTSP, the protocol commonly used by security cameras. RTSP works really well over a local network, but it dates from 20 years ago and doesn't have the modern features that WebRTC has: it doesn't have ICE to traverse NATs, it doesn't have retransmission, and it's normally not deployed with any of the other modern technologies. So it's quite common now that people want to replace RTSP with WebRTC, to be able to access a security camera remotely, but with the low latency that WebRTC provides.
That said, WebRTC is great but it has some performance requirements, even if you're not encoding. Encoding video normally takes quite a lot of processing power, but on an embedded system you normally have a dedicated hardware encoder for that, so it's often not the issue. What is relevant for WebRTC is that encryption is required; you cannot avoid it. That means you need a processor that can do AES in real time. If you use something like OpenSSL, it has really good ARM assembly and is quite fast, but a Cortex-A5, for example, is too slow to do 1080p HD video, so in that case you will need some kind of hardware acceleration. If you have a modern Cortex core, there are AES instructions built into the CPU that are very fast, and you've already won. But if you're using something older where the AES accelerator was built as a separate IP block, that often won't work for WebRTC: not because the accelerator doesn't support the right AES variant, but because there's a setup cost to using an external accelerator, and since you have to encrypt every packet separately, you lose all of the benefit of the hardware. The other thing you have to be careful about with WebRTC is that it has a larger attack surface than traditional HTTP-based systems. There are many different protocols involved: RTP, video encoding and decoding, the bitrate adaptation machinery, ICE. All of these could potentially be an attack vector. So it's very important that any device that integrates WebRTC capability have an update mechanism and an update process that make it relatively easy to ship critical updates in a timely manner. If it takes you six months to deliver an update to your devices, then WebRTC is probably not a good idea for you either.
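The per-packet setup-cost argument above can be made concrete with some rough arithmetic. The bitrate and packet size below are made-up but plausible numbers for 1080p video.

```python
def per_packet_budget_us(bitrate_mbps: float, payload_bytes: int = 1200) -> float:
    """How many microseconds the cipher has per packet at a given bitrate,
    assuming packets near the typical ~1200-byte WebRTC payload size."""
    packets_per_second = bitrate_mbps * 1_000_000 / 8 / payload_bytes
    return 1_000_000 / packets_per_second

# At 4 Mbit/s that's about 417 packets per second, or roughly 2.4 ms per
# packet. An external AES block whose per-use setup (key/IV programming,
# DMA transfers) eats a large fixed slice of that budget on every small
# packet can end up slower than AES done in software or in CPU instructions,
# which have essentially no per-packet setup cost.
```

This is why bulk-transfer crypto benchmarks for a separate accelerator IP block say little about SRTP performance: SRTP never gives the accelerator a large buffer to amortize the setup over.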
In addition to the security aspects, browsers move very fast; some of them are on a four-week release cycle, which means that today's alpha version will be the final version in two months. So any system that depends on doing anything non-trivial with the browsers needs continuous integration against the alpha versions of the major browsers to keep working. It means that when you find a browser has changed in an incompatible way, you have about two months to deliver updates to all of your devices out there, otherwise they won't work with modern browsers by the time the browser update rolls out to everyone. So you need to be able to update quickly, and in combination with that, you should also update to the latest version of your WebRTC stack relatively regularly, multiple times a year perhaps. Thank you very much, I hope this was useful and that you learned something. I will be available for questions in the chat, I believe. Thank you very much.