So, I'm Philipp. I work for appear.in; we're part of Telenor, the Norwegian telco, and we run the service called appear.in. I'm doing all the WebRTC stuff there. I want to talk about debugging tools and techniques for WebRTC, and about answering the question "what happened when a call went wrong?"

WebRTC works great usually, something like 95% of the time. The video quality is awesome, the audio quality is good, and everyone is happy. That's especially true when you test locally, because then you don't get packet loss. In production, however, calls fail. It's 1% or less, but they do fail, and people can't hear each other. Marcio said that earlier, and unfortunately it's true. Sometimes there's a large desync between audio and video; we had that in February. People describe the quality as horrible, whatever "quality" means and whatever "horrible" means. And don't get me started on interop problems, like Chrome and Firefox not playing well with each other, or Chrome not playing well with Microsoft Edge. I've also frequently seen browser upgrades breaking things, so every six weeks it's "oh, something broke." Great. If you want to read more about that, I have a blog on Medium with lots of stuff there.

So you get user reports. They say the call failed, the audio is bad, or the video is bad, and then your product manager or your customer support person is going to ask you what happened. What do you do then? How do you answer that question, how do you figure out what happened, and what action do you take?

So let me show you some of the tools and techniques I use. In terms of tools, most importantly there is Chrome's webrtc-internals page, Firefox has the about:webrtc page, and most likely you will end up building your own solution or using a service like Varun's. Firefox's about:webrtc is very basic compared to Chrome's page.
It shows you all the connections that are currently active, it shows you the SDP, and it shows you how the connection is established, that is, the ICE candidates and the state.

Chrome's webrtc-internals page is one of the tools I use most often. It shows you API traces and statistics for all peer connections. At the top you get all the getUserMedia requests and all the connections, and for each connection you get the configuration. On the left side you get the API traces, which basically list every peer connection call that happened, and on the right side and at the bottom you get the statistics, which are sent to the webrtc-internals page every second.

The API traces are really useful to figure out what happened in the call: when was an offer created, did you get any setLocalDescription failure callbacks, and so on. For example, you would call addStream on the peer connection, and the trace shows that the stream was added, with its label, its audio track, and its video tracks. That can be important: if you expect a stream with an audio and a video track and you only get an audio track, the user might not have a camera.

You can also see how the SDP offer is created and with what options. Here we are looking at a call to createOffer with offerToReceiveVideo and offerToReceiveAudio, so we want to receive both audio and video. We can also see the setLocalDescription call, which shows you the whole SDP. It's a big blob of text.
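webrtc-internals collects these API traces inside the browser. A similar trace can be gathered from JavaScript by wrapping the peer connection's methods, which is the general idea behind the single-line collection script described later in the talk. The TypeScript below is a minimal sketch of that wrapping technique, not the implementation of webrtc-internals or of any library; `traceMethods`, `TraceEntry`, and the `OnSuccess`/`OnFailure` suffixes are hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch: wrap selected methods of an object so that every call
// (and, for promise-returning methods, its outcome) is appended to a trace.
interface TraceEntry {
  time: number;  // Date.now() when the event was recorded
  type: string;  // method name, or method name + "OnSuccess" / "OnFailure"
  value: string; // serialized arguments or the error message
}

function serializeArgs(args: unknown[]): string {
  try {
    return JSON.stringify(args);
  } catch {
    return "[unserializable]"; // e.g. circular structures
  }
}

function traceMethods<T extends object>(target: T, methodNames: string[], trace: TraceEntry[]): T {
  const obj = target as unknown as Record<string, unknown>;
  for (const name of methodNames) {
    const original = obj[name];
    if (typeof original !== "function") continue; // skip methods this object lacks
    // The own property shadows the method inherited from the prototype.
    obj[name] = function (this: unknown, ...args: unknown[]): unknown {
      trace.push({ time: Date.now(), type: name, value: serializeArgs(args) });
      const result = original.apply(this, args);
      // Promise-based calls (createOffer, setLocalDescription, ...) also
      // record whether they succeeded or failed.
      if (result instanceof Promise) {
        result.then(
          () => trace.push({ time: Date.now(), type: `${name}OnSuccess`, value: "" }),
          (err: unknown) => trace.push({ time: Date.now(), type: `${name}OnFailure`, value: String(err) }),
        );
      }
      return result;
    };
  }
  return target;
}
```

In a page you might call `traceMethods(pc, ["createOffer", "setLocalDescription", "addIceCandidate"], trace)` right after constructing the peer connection. Events such as candidate gathering or ICE state changes are not method calls, so they would need separate event listeners.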
There's a great webrtcHacks post about SDP if you want to know more. You can also see all the onicecandidate and addIceCandidate events, which show you how the network behaves and which IP addresses are used to connect. You can see the ICE connection state and how it changes from checking to connected, then completed. That allows you to understand the network behavior of your application.

You also get the statistics on the right, and your application can get the same information that is displayed there with the getStats API. It shows you things like whether this is the connection that is currently active, and it shows you the IP addresses. However, this one is shown as googLocalAddress: don't use any stats prefixed with goog, they might change at any time without notice. It shows you the remote IP address as well, and, for example, the number of bytes received, the number of bytes sent, and the number of packets, so you can measure how the application is behaving on the network.

At the bottom you get a lot of statistics graphs. The page basically polls getStats every second, extracts a metric such as bytesSent, and makes a timeline graph with the time on the x-axis and the value on the y-axis. For example, here you can see the number of bytes sent per second and the number of packets sent per second. It's all nice and stable at 1.6 megabits per second and about 200 packets sent per second. There's no big variation here, so it's flowing nicely; that was a localhost connection.
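Those per-second graphs come from cumulative counters: each getStats sample reports the total bytes and packets sent so far, and the rate is the difference between two consecutive samples divided by the time between them. A minimal TypeScript sketch, assuming you collect one `{timestamp, bytesSent, packetsSent}` sample per second yourself (current browsers expose these fields on the standard `outbound-rtp` entries of `pc.getStats()`); `CounterSample`, `RatePoint`, and `toRates` are hypothetical names.

```typescript
// Hypothetical sketch: turn cumulative counters, sampled about once a second,
// into per-second rates like the ones plotted on webrtc-internals.
interface CounterSample {
  timestamp: number;   // milliseconds
  bytesSent: number;   // cumulative total
  packetsSent: number; // cumulative total
}

interface RatePoint {
  timestamp: number;
  bitsPerSecond: number;
  packetsPerSecond: number;
}

function toRates(samples: CounterSample[]): RatePoint[] {
  const points: RatePoint[] = [];
  for (let i = 1; i < samples.length; i++) {
    const prev = samples[i - 1];
    const cur = samples[i];
    const seconds = (cur.timestamp - prev.timestamp) / 1000;
    if (seconds <= 0) continue; // skip duplicate or out-of-order samples
    points.push({
      timestamp: cur.timestamp,
      bitsPerSecond: (8 * (cur.bytesSent - prev.bytesSent)) / seconds,
      packetsPerSecond: (cur.packetsSent - prev.packetsSent) / seconds,
    });
  }
  return points;
}
```

For example, two samples one second apart that differ by 200,000 bytes and 200 packets give 1.6 Mbit/s and 200 packets/s, the same steady numbers as the localhost call described above.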
So no packet loss. You can also see, for example, googEncodeUsagePercent, which is a measure of the encoder's CPU usage; if your users are reporting that their fans were spinning, that is a graph to look at. Another example is googFrameHeightInput, the frame height coming from the camera, and googFrameHeightSent, the frame height sent on the network. Those should usually be the same, unless you have bandwidth adaptation because your bandwidth was not sufficient, or the CPU was not sufficient to send that frame over the network.

So basically there are three steps when debugging something using webrtc-internals: the first step is to get a dump, the second is reading it, and the third is importing it somehow. I'm going to walk you through that process.

First you navigate to chrome://webrtc-internals and click on "Create Dump", which expands and gives you a number of options. The second and third options are quite important when you need to file a bug: they give you the dumps the Google developers will ask for, because those let them understand the behavior of Chrome much better than what we can get from JavaScript. What is more important for you is the button "Download the PeerConnection updates and stats data", which gives you a big JSON file that you can interpret yourself; even I usually don't interpret the other two. That JSON file contains a lot of information, and good luck if you want to read it yourself. I sometimes do that, I try to explain it in bug reports, and nobody can follow me then.

So you can import it instead. I have a tool written for that, and it's on GitHub.
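Once you have the JSON file, the first pass over it can be scripted. The sketch below assumes the layout of the dumps Chrome produced around this time: a top-level `PeerConnections` object keyed by connection ID, each with an `updateLog` array of `{time, type, value}` entries, using event names such as `onicecandidate`, `addIceCandidate`, and `iceConnectionStateChange`. That layout is an assumption; check it against a dump from your own Chrome version. For each connection it lists the ICE connection states and whether relay candidates showed up locally and remotely. `summarizeDump` and its types are hypothetical names.

```typescript
// Hypothetical sketch: triage a chrome://webrtc-internals dump.
// ASSUMPTION: layout is { PeerConnections: { [id]: { updateLog: [...] } } }
// with updateLog entries of the form { time, type, value }.
interface UpdateLogEntry {
  time: string;
  type: string;
  value: string;
}

interface InternalsDump {
  PeerConnections?: Record<string, { updateLog?: UpdateLogEntry[] }>;
}

interface ConnectionSummary {
  id: string;
  iceStates: string[];  // ICE connection states in the order they occurred
  iceFailed: boolean;
  localRelay: boolean;  // relay candidate gathered locally (onicecandidate)
  remoteRelay: boolean; // relay candidate received from the peer (addIceCandidate)
}

// Relay candidates carry "typ relay" in their candidate line.
const RELAY_CANDIDATE = /\btyp relay\b/;

function summarizeDump(json: string): ConnectionSummary[] {
  const dump = JSON.parse(json) as InternalsDump;
  return Object.entries(dump.PeerConnections ?? {}).map(([id, pc]) => {
    const log = pc.updateLog ?? [];
    // Event-name casing may differ between Chrome versions, so compare lowercased.
    const eventsOfType = (type: string) => log.filter((e) => e.type.toLowerCase() === type);
    const iceStates = eventsOfType("iceconnectionstatechange").map((e) => e.value);
    return {
      id,
      iceStates,
      iceFailed: iceStates.includes("failed"),
      localRelay: eventsOfType("onicecandidate").some((e) => RELAY_CANDIDATE.test(e.value)),
      remoteRelay: eventsOfType("addicecandidate").some((e) => RELAY_CANDIDATE.test(e.value)),
    };
  });
}
```

A connection with `iceFailed: true` and no relay candidate on one or both sides points at the TURN and network problems discussed next.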
It's open source. It basically takes a dump and reimports it into a web page that shows the very same information we have on webrtc-internals, on purpose, because we have this quasi-standard here. It also adds the ICE candidate grid from Firefox, which is not on Chrome's webrtc-internals and is useful quite often. One nice thing is that it shows me with green fields whether the connection succeeded; if it failed, it shows me a red ICE connection state change, so I don't need to expand all of these states and look at them. That saves me 20 seconds per day.

So, calls failing: the typical thing to look at is the ICE connection state change, and you're looking for a value of failed there. If that happened, you should check whether you're using TURN servers, as Chad said you should, and whether you're getting relay candidates in both the onicecandidate and the addIceCandidate calls. A relay candidate basically looks like this; the important thing here is the type relay. I have a whole 20-minute talk on that. You should get those candidates from both sides. If not, something is happening in the network: UDP might be blocked, TCP might be blocked, and the user can't get out in any way.

Another thing Marcio already mentioned: people don't hear each other. It happens quite often on OS X and Windows, and the issue is that if you send your laptop to sleep too often, Chrome can't open the microphone anymore, so you don't get a signal from the microphone. You can see that in the statistics. Here we have a timeline graph covering 110 seconds, and on the y-axis we have the audio input level, which is zero the whole time. That could be perfectly normal if your microphone is muted. More importantly, there are no packets sent for that period, and that is not normal. You can get that from the getStats API.
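A minimal TypeScript sketch of that check, written against the standardized stats rather than the legacy Chrome `ssrc` report described next: it looks for `outbound-rtp` entries whose `kind` is `"audio"` and flags the call if `bytesSent` is still zero. The `StatsLike` types and the function names are hypothetical, and the sketch assumes you run it a few seconds after the connection is established so that a call that has only just started isn't flagged.

```typescript
// Hypothetical sketch: detect a call that sends no audio at all, using the
// standard getStats fields (type "outbound-rtp", kind, bytesSent).
interface StatsLike {
  type: string;
  kind?: string;
  bytesSent?: number;
}

// Anything with a Map-style forEach works, including an RTCStatsReport.
interface StatsReportLike {
  forEach(callback: (stat: StatsLike) => void): void;
}

// Sum of bytesSent over all outbound audio streams, or undefined if there are none.
function audioBytesSent(report: StatsReportLike): number | undefined {
  const acc = { found: false, bytes: 0 };
  report.forEach((stat) => {
    if (stat.type === "outbound-rtp" && stat.kind === "audio") {
      acc.found = true;
      acc.bytes += stat.bytesSent ?? 0;
    }
  });
  return acc.found ? acc.bytes : undefined;
}

// True when an audio stream exists but has not sent a single byte.
// A zero input level alone can just mean mute; zero bytes sent is the abnormal case.
function microphoneLooksDead(report: StatsReportLike): boolean {
  return audioBytesSent(report) === 0;
}
```

In the browser this could be used as `if (microphoneLooksDead(await pc.getStats())) { /* show the restart-the-browser warning */ }`. A call without any audio stream returns `false`, because there is nothing to diagnose.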
So you're calling pc.getStats and then dealing with the results. You're looking for a report which has "send" in its name, of the Chrome-specific report type ssrc, with audio as the media type. Then you parse bytesSent from that report, because Chrome's stats are a little behind the spec, and check whether it's zero. If it is, you can show a warning to the user. We got a lot of complaints about this issue; we see it in about two to three percent of the calls, which is very bad. Google is going to fix it next year, but after we showed a warning to the user recommending that they restart the browser to reset the application state, the complaints dropped to zero. If you want to read more, there's a blog post about that.

Something we saw in February was a very large desync between audio and video, so it was no longer lip-synced, which is very annoying. I was working for TokBox at that time, and we got a customer report saying audio and video were out of sync. Then we tried to figure out how to measure that, and after some hours we found googTargetDelayMs, which describes an intentional delay on the video to sync it up with the audio. We saw this graph increasing linearly over time and realized we had found a bug. It turned out to be an issue with specific cameras whose timestamps were off, so the desync kept growing. It was pretty severe in Chrome 47; Google fixed it very quickly once they could reproduce it, and the fix got backported all the way to stable. There's more about that at the URL on the slide.

So what if the quality is still horrible after you've fixed all those things? I'm going to show you some examples of what to look for. For example, if you look at the throughput, you can see here a graph showing the packet loss over time. The packets-lost counter is cumulative, and you can see four events where there was packet loss. Typically in those cases the bandwidth
adaptation will reduce the frame rate and resolution, and the user perceives that as blocky, bad video quality. You could also have continuous packet loss, like in the blue line; that happens as well. It's a different kind of packet loss, but you need to deal with both.

If you look at the jitter, for example, here is the same graph for the same call, and we can see huge spikes. Typically, if the jitter is smooth, with little variation, that's good, but those spikes are really bad; that was about two to three seconds in that case. The same goes for the round-trip time, which Varun talked about: if it's smooth, it's good; if there are spikes like this, it's bad. That was two to three seconds, which is really bad; 100 to 400 milliseconds is acceptable, depending on where on the globe your calls happen.

Another thing to look at is usually the resolution and the frames per second. We can see here that there was an issue in the call: the blue line shows the frame height over time, and at this point it drops and stays low for one and a half minutes, and at the same time the number of frames sent went down. During that period there basically was not enough bandwidth to send the frames over the network, and in reaction to that Chrome first reduced the frame rate, dropping frames to reduce the bandwidth, and then reduced the resolution. What you can also see in that case is that the CPU usage drops a lot when the resolution goes down, because the encoder has fewer frames and fewer bytes to process.

Another thing to look at is the bandwidth estimate. In WebRTC the bandwidth between the two peers is continuously estimated, and you want to know how much bandwidth is available. What we can see here is that the call started out fine and went to a bandwidth of about 600 kbps, and then suddenly there's a drop to 40 kbps. It starts ramping up to 600 again, stays there for a second, and then drops down, ramps up
again, drops, ramps up, drops, ramps up, is stable for a while, and then drops again and ramps up. That kind of behavior is very annoying for users, because every time it happens the video quality is reduced, and that is perceived as bad. It's usually caused by latency and packet loss, and it results in bad video quality. So that call was not a very good user experience; I wouldn't want to be on that call when that happens.

So webrtc-internals is a great tool. However, there are some limitations, and the most important one is that you need to ask the user to have chrome://webrtc-internals open before things happen and then send the dump to you. When we had this desync issue, we couldn't reproduce it locally, and it took two weeks or more to get a dump from the customer who reported it.

So we started automatically collecting that data for each and every call, and we open-sourced it; it's available at the URL on the slide. It's a joint project between us and TokBox. You just include a single line of JavaScript before any of your WebRTC code. It's just a single line, no integration; you can have a deeper integration if you want, but it's optional. It transparently modifies all the WebRTC APIs and inserts itself into them. That way you can create the same dumps you get from webrtc-internals for all the sessions you run, which can be quite a lot of data, and you send all the API traces and the getStats data to a server. Then you can really figure out what is going on.

As a summary: when the customer support person asks you what happened, you now have the data to answer, namely the API traces and the getStats data, and you can try to figure out what happened in the call. Most of the time, honestly, it's just bad internet, and there's not much you can do. But there are some cases where you're doing something wrong, like not running TURN servers, and you can
figure that out. If you spend that time, your users will in the end be happier and use your service more, and your service will grow, hopefully. And that was it. Thank you.