Thank you, Chad, for the wonderful introduction. I'm Varun Singh, CEO of CallStats.io, a startup based out of Helsinki, Finland. CallStats' basic promise is to help our customers detect, diagnose, and deploy fixes in real time. Today's talk, though, is more about how you can build your own platforms in a more elegant way using the statistics API from WebRTC.

I'm going to begin with a small refresher on the protocols for multimedia. When we talk about multimedia systems, especially something like WebRTC, there are a few things a multimedia system needs to do. First, it needs to capture audio and video. These come in a handful of codecs: for video you have H.264, VP8, VP9, and more codecs in the future; for audio there is G.711, G.722, and Opus. Typically an endpoint, which is a multimedia system, first captures the audio and video frames, then packetizes them and sends them over the network. On the receiving side, the endpoint receives these packets, depacketizes them, runs them through a jitter buffer to make sure they play out correctly, and then renders them.

A bit about how we do this: we use a protocol called RTP, which stands for Real-time Transport Protocol. In the next few charts I'm only going to talk about senders and receivers, just to make sure everyone understands how this works. In these cases the media is only flowing in one direction, but with WebRTC, media flows in both directions. So you have a sender and a receiver. The sender encapsulates the audio and video packets and sends them across the network. Along with the media packets, it might send protection packets, retransmissions, repair packets, and so on, in case packets get lost. The receiver receives these packets and plays them out. It needs some information to play these packets out in a synchronized manner, especially when it's getting both audio and video, so it has a jitter buffer, and it also does some monitoring and reporting so that endpoints can look at the data locally and see whether the media is being rendered properly.

With RTP there is also a control protocol, called RTCP, the RTP Control Protocol. It exists to create a feedback loop between the sender and the receiver. As I said, the receiver needs to play back audio and video in a synchronized manner, so the sender sends timing information to the receiver so that it can play them out in an accurate, smooth manner and there are no AV desync problems. The receiver, in turn, needs to send data back to the sender so that the sender can adapt. Typically this is carried in RTCP receiver reports, or RTCP RRs. They carry loss statistics and some congestion cues, which help the sender do short-term adaptation. When I talk about short-term adaptation, this can be automatic bandwidth control, or adapting the media bitrate, resolution, or frame rate based on the network characteristics observed at the receiver.

So that was a quick recap of the protocols associated with multimedia and, today, with WebRTC. We do all this mainly because every application and service wants really high quality video and audio. And typically that requires, especially in the WebRTC case, the latencies of these systems to be low, so that you can hear and interact with people in real time.
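RTCP itself isn't exposed to JavaScript, but the statistics a sender derives from incoming receiver reports do surface through getStats. As a rough sketch — using the stat names from the current spec, and assuming `pc` is an established RTCPeerConnection — the feedback loop described above can be observed like this:

```javascript
// Sketch: inspect the sender-side view of RTCP receiver-report feedback.
// Assumes `pc` is an established RTCPeerConnection.
async function logReceiverReportFeedback(pc) {
  const report = await pc.getStats();
  report.forEach((stat) => {
    // 'remote-inbound-rtp' entries are derived from the RTCP RRs the
    // remote receiver sends back for each outgoing media stream.
    if (stat.type === 'remote-inbound-rtp') {
      console.log(stat.kind, {
        fractionLost: stat.fractionLost, // short-term loss cue
        packetsLost: stat.packetsLost,   // cumulative losses
        jitter: stat.jitter,             // interarrival jitter, in seconds
      });
    }
  });
}
```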
The talk before this one, for example, discussed what happens when such situations are encountered, and one of the proposals was to do a pre-call test. The pre-call test is all good, because you make sure the call starts off well. But if calls are long enough, people may still encounter issues during a call: someone turns on their microwave, or walks out the door and loses connectivity, and so on.

When we talk about WebRTC, we always want to say we want the highest quality of audio and video. But really, the most important thing is not the highest quality audio and video; you want optimal audio quality and optimal video quality. The most important part is interactivity, and that is guaranteed in part by having low latency. When I talk about low latency, I also mean low one-way delay or low RTT; they are all measures of the same underlying problem. For example, in this chart you can see a call that is about 260 seconds long. The y-axis is the one-way delay measured by the endpoints. You also see a red line, which is the ITU-T threshold for audio latency: audio latency should be below this line for a call to be considered good. In the chart you can see that the audio latency from one endpoint to the other is spiky. At the beginning of the call it's low, but towards the end, around 200 seconds in, you see a three-second spike. And that's not all: the spikes continue after that. So one of the most important things is being able to measure this, and being able to respond to customer support tickets when users say they had a bad quality of experience.

A bit more about quality of experience: it is the ability to measure the user experience. With WebRTC, the most important thing we talk about is the call experience, the duration of the call from start to end. You want to measure the quality of experience by collecting metrics, and you also want to collect user interactions. For example: did they turn off their audio? Did they turn off their video? Did they switch from their camera to screen sharing at some point, and switch back, and so on? You want to be able to correlate the metrics you see during the call with user actions. For example, if you see a video bitrate drop, you want to know whether it was because the user indeed turned off their video; then you can speculate later about why they may have done that.

The other bit, which was also an important factor in the earlier talk from Google by Daniel Peterson, is the call setup time and how important it is. When you build applications, you want to measure when the call was started, when the interaction was made, and when the first media frame arrived, so that you can say the call has begun. That's an important aspect, because if the call takes too much time to set up, people might get annoyed or just leave, and you might see a larger drop-off in the usage of your product.
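As a minimal sketch of that measurement — assuming `pc` is your RTCPeerConnection, `callButton` is the UI element that starts the call, and `startCall()` is a hypothetical stand-in for your own signaling code — you can timestamp the click and then wait for the first media to arrive on a received track:

```javascript
// Sketch: measure call setup time from user intent to first media.
let clickedAt;

callButton.addEventListener('click', () => {
  clickedAt = performance.now();
  startCall(); // hypothetical: create the offer, do signaling, etc.
});

pc.addEventListener('track', ({ track }) => {
  // A received track starts muted; 'unmute' fires when the first
  // media packets actually arrive and can be rendered.
  track.addEventListener('unmute', () => {
    const setupMs = performance.now() - clickedAt;
    console.log(`${track.kind} first media after ${setupMs.toFixed(0)} ms`);
  }, { once: true });
});
```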
At the end of a call, you've seen with Skype and various other applications that they ask the end user how the quality of the call was. It's really important to ask the right question, because if you do not ask the right question, you might get really strange feedback. Being able to ask, "did you experience specific annoying issues with the video or with the audio?" is better than just asking "please rate the quality of the call." Otherwise you might get answers like one we once saw, where an end user wrote, "I do not see Mary." It's an interesting answer: you'd wonder who Mary is, and whether she was expected to be on the call.

More about measuring user experience: it depends on what type of service you build. For example, with Hangouts and similar services, where you pass a URL to a person and the video immediately starts rendering, you want to measure page load times and so on, to make sure the call starts immediately, in a snappy manner. You want to make sure all the libraries the call depends on are loaded ahead of time, and if you have any issues with caching or with some JavaScript, you want to be able to measure those at the beginning of the call. Not all calls work that way, though. There are other flows where, for example on Slack, you press a button and are initiated into a call. There you want to see what happens at the point when you press that button. So you want to capture all these aspects of the call initialization process, just to make sure you have all the data available to measure user experience.

As I said, there are a lot of things you can measure. You can measure at the network level: bits per second, RTT — those are network statistics. The second level of information you want is multimedia statistics: whether a frame was rendered, whether there were bursts of frame losses, whether the audio and video were played out in sync or out of sync, so that you know the pipeline is working correctly. Above all, once you have these network metrics and multimedia pipeline metrics, you want to build models out of them, which we call annoyances. For example, the resolution changed by too much: you started at 4K or 1080p, the video resolution dropped to 640p or 480p very quickly, over a very short period of time, and maybe went back up to HD shortly after. Those things are annoying for users, because the quality of the video changes dramatically over very short periods. By measuring these things you can come up with your own quality metrics, depending on the kind of service you're building: there are services where you want low frame rates, and others where you want high frame rates and maybe optimal resolutions.

You can do all of this today using the getStats API. The getStats API is an interface on the peer connection, which you can call. The response to getStats is asynchronous: you call the API and a short time later you get a callback with the data in it. You can call it at the peer connection level, or, if you know which stream you're interested in — say, the audio or a particular video stream — you can call getStats with that particular track as a selector. You can call the API as often as you want, although if you call it more often than about every 150 milliseconds, you're probably going to get the same response back as you got a moment ago.
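Concretely, the two forms of the call look roughly like this; `pc` and `localAudioTrack` are assumed to exist, and the code follows the current promise-based spec:

```javascript
// Sketch: the two ways of calling getStats.
async function dumpStats(pc, localAudioTrack) {
  // 1. Connection-wide: every stats object for the peer connection.
  const allStats = await pc.getStats();
  console.log('stats objects for the whole connection:', allStats.size);

  // 2. Selector form: only the stats relevant to one track — here,
  //    an audio track being sent on this connection.
  const audioStats = await pc.getStats(localAudioTrack);
  audioStats.forEach((stat) => console.log(stat.type, stat.id));
}
```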
By calling the API more often, you can get a series of data points that shows you the trend: is the bitrate increasing or is it decreasing? One of the charts you saw before was produced by calling getStats on the peer connection about every second.

Here's an example. In this case we're interested in the audio latency, or rather the audio track's latency. We call getStats on the peer connection, and this is a promise API: as soon as the data is available, the then function is called, and if it fails, the catch is called. So if there was some error in getStats — if it was not implemented, and so on — an error would be logged in the catch statement. As you can see, in this case I'm logging almost all the data out of the outbound statistics for the audio. The reason is that RTP's RTT, or round-trip time, is measured at the sender: you send a packet, the receiver sends something back, and you measure the time taken for that round trip. So if you want to measure the RTT, you look at the outbound RTP statistics. You can see the output on the right side, where it shows the packets sent, bytes sent, and round-trip time. In this case the round-trip time is in milliseconds, so it's 31 milliseconds.

When we talk about metrics, another way of looking at it is that there is media flowing between Alice and Bob. Again, for simplicity, I'm only showing media in one direction. You can gather statistics across the whole pipeline. On the sender side there are audio and video tracks that are put into an RTP sender, which sends the data over the network through an ICE transport. On the receiver side, the data is received on the ICE transport and given to the RTP receiver, which depacketizes it and sends it to the right track. In most cases, you can call getStats on all of these objects to get accurate information related to that particular stream.
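The exact code from the slide isn't reproduced in this transcript, but the polling example described above was along these lines. One caveat: in the stats spec as it stands today, the round-trip time is reported in seconds on the 'remote-inbound-rtp' entry that corresponds to the outbound stream, so this sketch reads both entry types and converts:

```javascript
// Sketch: poll getStats every second and log the outbound audio stats.
// Assumes `pc` is an established RTCPeerConnection.
setInterval(() => {
  pc.getStats()
    .then((report) => {
      report.forEach((stat) => {
        if (stat.type === 'outbound-rtp' && stat.kind === 'audio') {
          console.log('packetsSent:', stat.packetsSent,
                      'bytesSent:', stat.bytesSent);
        }
        // RTT for the outbound stream, derived from RTCP feedback.
        if (stat.type === 'remote-inbound-rtp' && stat.kind === 'audio') {
          console.log('roundTripTime:', stat.roundTripTime * 1000, 'ms');
        }
      });
    })
    .catch((err) => console.log('getStats error:', err));
}, 1000);
```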
Now I'm going to walk you through a quick example of a call between two people. There is a user zero and a user one, and I'm only using the data going from user zero to user one, shown by the orange line. We're showing the frame width in this case — sorry, it is frame width at the top even though I say frame height; apologies for that. As you can see, at the beginning of the call the frame width is 1080; it quickly drops to 640, I believe, then remains constant for most of the period, and around 1600 to 1800 seconds it switches between the current value and a higher one before coming back down. So at the track level you can see data related to the tracks — in this case, the resolution.

At the RTP sender level, you can look at the throughput. Here you can see the throughput is not constant: it varies at the beginning of the call, goes up to 2600, then drops, and there are quite a few moments where it drops below one megabit per second. Between 1600 and 1800 you notice the bandwidth drops a lot very quickly, then goes back up and comes down again. This looks similar to the frame-width chart, and it might be one of the reasons the frame width was changing in this period.

But it's also interesting that we see drops in the middle, between 600 and 1200, that do not cause any variation in the frame width, which on the previous slide was basically flat from beginning to end. So one of the things we did was go down to the ICE transport and look at the losses we were seeing on the same path. What we noticed is that there are periods of 40% losses which coincide with the earlier drops, at around 400, 800, and 1200 seconds. These peaks in packet loss perhaps indicate why the media throughput dropped in those periods. We also noticed spikes again around 1200 and 1400, which, along with the fact that the bandwidth also fluctuated in this period, indicates that the congestion control tried to change the frame width and the resolution to overcome these losses and the lack of capacity. These are the kinds of things you can do if you have data across the pipeline: you can go and investigate after the fact.

I'm going to walk you through another example: a simplified E-model. This one is for G.711. It's a recommendation from the ITU-T, from maybe 25 or 30 years ago, maybe even earlier. What it basically says is that the one-way delay from the mouth to the ear should be within bounds. At the best level, if you want users to be very satisfied, the mouth-to-ear delay should be within about 250 milliseconds; at the worst, if it's above 500 milliseconds, it's considered bad, or the user is dissatisfied. This is fairly easy to implement today, because we have the getStats API.

Remember the getStats code from earlier; we're just repurposing that data again. You have a selector that picks out the audio track, and we look at the getStats results only for that audio track. In this case we poll at a one-second interval — that's why there's a timeout that fetches the RTT measurement every second — and we can look at the variation over a longer period of time. The next thing we want is the outbound RTT. As I said before, the round-trip time is only measured at the sender side, so we look at the sender-side metrics, which are the outbound RTP and local statistics for that audio stream. Every time we get a round-trip time measurement, we push it into an array so that we can calculate an average over the lifetime of the call. You can take samples from the beginning of the call, or 10-second samples, or 20-second samples, or longer, depending on how robust you want the metric to be. Once you have calculated the average round-trip time, you divide it by two and pass that to the simple E-model, which compares it to the thresholds from the graph I showed a moment ago. You can then log that locally, or send the data to your logging service, where you can show the variation of call quality for that particular person over time.
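Here is a minimal sketch of that pipeline. The 250 ms and 500 ms thresholds are the ones from the chart; the middle band is an illustrative assumption, and `sendToLoggingService` is a hypothetical stand-in for your own backend call:

```javascript
// Sketch: a simplified G.711 E-model rating driven by RTT samples.
// Assumes `pc` and `localAudioTrack` exist, as in the earlier sketches.
const rttSamples = [];

function simpleEModel(oneWayDelayMs) {
  if (oneWayDelayMs < 250) return 'users very satisfied';
  if (oneWayDelayMs < 500) return 'some users dissatisfied'; // assumed middle band
  return 'users dissatisfied';
}

setInterval(async () => {
  // Selector form: only the stats relevant to the sent audio track.
  const report = await pc.getStats(localAudioTrack);
  report.forEach((stat) => {
    if (stat.type === 'remote-inbound-rtp' && stat.roundTripTime !== undefined) {
      rttSamples.push(stat.roundTripTime * 1000); // seconds -> ms
    }
  });
  if (rttSamples.length === 0) return;

  // Average over the call lifetime; windowed samples also work.
  const avgRttMs = rttSamples.reduce((a, b) => a + b, 0) / rttSamples.length;
  // Approximate the one-way (mouth-to-ear) delay as half the RTT.
  const rating = simpleEModel(avgRttMs / 2);
  sendToLoggingService({ avgRttMs, rating }); // hypothetical backend hook
}, 1000);
```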
So that was just one example, and you may be wondering what more you can do. It's not just about the getStats API: you can also attach yourself to the ICE states, via the ICE connection state change event, as sketched a bit further below. You want to attach a handler to it because whenever the connection goes to the disconnected state, for example, or if it fails, you want the application level to be aware that you lost connectivity, or that connectivity was only momentarily lost, because you can build things on top of that — for example, for when a user complains that they did not have good quality or could not hear another person.

In our case, what we've done is build a visualization of a call with three participants. They're present for about 16 to 18 minutes, and we have a metric called disruption, which means that they either lost connectivity — the network address changed — or there was low available capacity, very high RTT, or very high losses. You can see the disruptions as the gray periods on the charts. The most important thing to notice is that the third user, AE, is disrupted for about eight minutes together with user E6, and in this case it's because he was experiencing very high latencies during that period. So at a high level, if you go and look at your logs, you can easily spot this disruption, and if a user complains, you are aware that you are seeing the same behavior the user is complaining about. Then, as a second step, you can go and try to diagnose what the reasons might be.

A common occurrence we hypothesized earlier is that whenever there is a low-bandwidth situation or losses and you can't hear the other person — at least this is what I do — you switch off your video, and sometimes turn your audio off and back on, to see if the other person can hear you better. In this chart there are again three people in the call, and all three are disrupted at the same time, using the same definition of disruption as before. What we see now is that the users are actually turning their video off and back on; each line represents their interactions. This is another way of seeing how people self-diagnose their issues by turning off video. And this is how we do analytics: you don't only need data from the getStats API, you want to leverage as much information and context as you can gather from your application.

If you're interested in more about getStats: I'm one of the co-authors, with Harald Alvestrand from Google, of the getStats API. It's available at the URL listed there. We updated the document maybe six or eight weeks ago, and we're going to update it again over the next few weeks, before the end of the year. So if you're interested in using the getStats API, have a look there.
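Coming back to the ICE state handling mentioned above, a minimal handler sketch might look like this; what you do on each transition — log it, surface it in the UI, or trigger an ICE restart — is up to your application:

```javascript
// Sketch: surface ICE connectivity changes at the application level.
// Assumes `pc` is an established RTCPeerConnection.
pc.addEventListener('iceconnectionstatechange', () => {
  const state = pc.iceConnectionState;
  if (state === 'disconnected') {
    // Often transient: connectivity may come back by itself.
    console.log('connectivity momentarily lost at', Date.now());
  } else if (state === 'failed') {
    // This candidate pair is gone; consider an ICE restart.
    console.log('connectivity lost at', Date.now());
  }
  // Ship these transitions to your analytics alongside getStats data,
  // so disruptions can be correlated with user complaints later.
});
```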
The other thing we learned over the last few years is that our customers needed to deploy more TURN servers, and some of them needed to go as far as introducing TCP support. For example, this summer there was a product targeting, let's say, 12-to-18-year-olds — children at school. They released the app just before summer began, right after the schools closed down. The app was available over the summer, and in the week when schools reopened, they suddenly saw more failures on their calls, and a lot more TURN servers were being used. What was causing this was that when people went back to school, the schools had very restrictive firewalls that the developers did not anticipate and, during the summer, did not notice, because users were at home, where the normal TURN servers were fine. At school, because of the restrictive firewalls, they needed to deploy TURN over TCP, and that demand appeared on a particular day of the week that they had not anticipated during their first few months in production.

The other thing we've noticed is that our customers want to detect crashes and disruptions; the talks before and after this one cover how you can re-establish the call or handle these crashes in a more elegant way. The crashes might be due to the media pipeline crashing, the screen-sharing plugin crashing, or a loss of network connectivity.

To summarize: my talk covered the basics of RTP and RTCP, the protocols required for carrying audio and video; a brief introduction to the getStats API, which the talk before mine already introduced at a certain level — I hope you now have more use cases and ideas for using getStats in your own apps; and a simplified E-model that you can use today if you're using the G.711 audio codec. Thank you.