WebRTC may seem a little hard for a web developer, but before web development I was an electrical engineer, so although I'm not from the VoIP side of the world, I can definitely appreciate that this is much easier than it could be. But there's a lot around the design of WebRTC applications that I think we're still learning, because this is very much an evolving, leading-edge space. So what I want to talk about is user experience in WebRTC-based applications.

As technologists and engineers who love and are fascinated by WebRTC, we love to talk about things like STUN and TURN, signaling, peer-to-peer, encryption, and all of that good stuff. And that is super cool, but the users of our applications don't care about it at all. We try to talk to them in metaphors, "it's Skype in your browser," which helps convey the business value, and that's fine as far as it goes, but we can't stop there and assume that the experience we have in one particular tool, in one setting, is what we should plug into a mobile app or a web browser. All that users really care about is the experience they have in the application, and that means a lot more than peer-to-peer and encryption.

Now, WebRTC provides some things out of the box that determine a lot about the interaction patterns with our users, so it's important not only to recognize that they exist and that we, as application developers, don't control them, but also to think about how they affect people using our applications. The most obvious one, on the left here, is the browser's allow-or-block prompt for using the camera and microphone. That's obviously important for permissions and security, but we also need to think about what our application is going to do when someone unexpectedly blocks the use of their camera. Even though we can't control the display of that prompt, nor should we be able to, we need to decide what to do when someone does something we don't want them to do, which users do all the time.
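To make that concrete, here's a minimal sketch of handling the block case instead of failing silently. It assumes a standard browser environment, and the showPermissionHelp helper is an invented placeholder for whatever in-page guidance your application shows:

```typescript
// Minimal sketch: react when the user blocks the camera instead of failing silently.
// showPermissionHelp is a hypothetical placeholder for your own in-page UI.
function showPermissionHelp(message: string): void {
  console.warn(message); // a real app would render guidance in the page
}

async function startLocalMedia(): Promise<MediaStream | null> {
  try {
    return await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  } catch (err) {
    if (err instanceof DOMException && err.name === "NotAllowedError") {
      // The user clicked "Block": explain the consequence and try audio-only.
      showPermissionHelp("Camera access was blocked, so we'll try to join with audio only.");
      try {
        return await navigator.mediaDevices.getUserMedia({ audio: true });
      } catch {
        showPermissionHelp("Microphone access was blocked too; you can still watch the call.");
        return null;
      }
    }
    throw err; // NotFoundError, NotReadableError, etc. deserve their own handling
  }
}
```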
The other obvious out-of-the-box piece of the WebRTC experience is that little red circle on the browser tab telling you that you're live, that you're on camera. It's a great reminder to users of which tab is using their camera. But think about someone coming back to your site for the second time over SSL: they may have given your application permission to use their camera on a previous visit, but they don't necessarily remember that when they return, and all of a sudden we're turning their camera on for them. We need to think about how our application's experience warns them that this is about to happen, even if the browser doesn't, and makes sure they're ready for it. Screen sharing is similar: there's the permissions bar displayed on their laptop reminding them that they're sharing their screen, but from an experience perspective, especially if you're balancing screen sharing with video streams at the same time, we have to make sure the user has a great experience with both the video and the screen-sharing aspects of the application.

There are a couple of user-experience terms you should be familiar with, because they're the themes behind everything in how people interact with our WebRTC applications. First, micro-interactions: the spit and polish of design, the little touches users don't necessarily notice but that make a big difference in how positive they feel about their experience, how easy something is to interact with, how intuitive it is. Then information hierarchy, which is all about the structure and placement of controls, and the implicit priority those controls get from their placement on the page. And finally feedback, which as engineers we might associate with audio echo, but which here means feedback to the users: making it obvious what's happening, what's going on, what we need them to do, and how the connection is going.

Okay, the first thing to talk about is the pre-call experience. In this larger screenshot, I had a very early-morning Hangouts call today with people on other continents where it was much later in the day, and thankfully Hangouts gave me that nice video preview reminding me that I was not ready to go on camera at five o'clock this morning, San Francisco time. It caught me, it saved me; although now I've shared the screenshot with you, so that was foolish, this morning it did save me. Before I went into the call it reminded me: you're going to be on camera, this is what you look like right now, and if you want to change that, at least pause your video or mute yourself before you join. That's really important.

Once I was in the call, the other participants were displayed, and there are a couple of interaction patterns in Hangouts that I really like. You've got people's names, so you're reminded who they are, and, at the moment I grabbed the screenshot, the little vertical bar of green dots showing the volume of my voice: a visual indicator that the others can hear me and that I didn't leave mute turned on. Those are really important interaction patterns to keep in mind in your applications.
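A preview-plus-level-meter like that is straightforward to sketch. This rough example assumes a page with a <video id="preview"> element and a <div id="mic-level"> bar, both invented for illustration:

```typescript
// Sketch of a pre-call screen: local camera preview plus a mic level meter,
// so users can see what they look like and confirm they aren't muted.
async function showPreCallPreview(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });

  // Show the user their own camera, muted so they don't hear themselves.
  const video = document.getElementById("preview") as HTMLVideoElement;
  video.srcObject = stream;
  video.muted = true;
  await video.play();

  // Drive a simple volume meter from the Web Audio API.
  const ctx = new AudioContext();
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 256;
  ctx.createMediaStreamSource(stream).connect(analyser);

  const samples = new Uint8Array(analyser.frequencyBinCount);
  const meter = document.getElementById("mic-level") as HTMLElement;
  (function tick() {
    analyser.getByteTimeDomainData(samples);
    // Peak deviation from the midpoint (128) as a rough 0-100 level.
    let peak = 0;
    for (const s of samples) peak = Math.max(peak, Math.abs(s - 128));
    meter.style.width = `${Math.min(100, (peak / 128) * 200)}%`;
    requestAnimationFrame(tick);
  })();
}
```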
Typically, one-on-one video chats follow the pattern where the main speaker, in this case our UX lead Mariana, gets the large view, and my camera is down in the smaller square in the bottom right: the person you're speaking with, and then you in a smaller window. I think that's a really important pattern to keep following; however, keep in mind that it can also be distracting. I don't want to get too enamored with looking at myself, but at the same time I do want to make sure I'm sharing the right camera with the other person, if I have multiple cameras attached to my laptop, and that I'm framed so people can see me. A pattern I'd like to see more often in these applications is the ability to move the view of myself around so it doesn't interfere with the other person's video stream and whatever they're displaying, or maybe even to minimize it once I know it's there.

Now, about the web experience, when you're building web-based WebRTC applications... did we just lose Ozzy? Can you hear? Are you here? Okay, there we go, all right. In a web browser we can take advantage of the real estate we have: a much larger space to deal with, so the person we're talking with can use more of that canvas, and the audio and video pause controls can be more subtle, up in the top-right corner where people expect to look for them; they don't have to be huge. Although if I mute myself, I want to make that more visible.

Mobile is different. Some of us, myself included, have very fat fingers, so those controls have to be evenly spaced and spread apart on the device. We have very limited real estate to work with, so we want to give as much priority as we can to the video we're looking at in the app, but still have overlays for the controls on top of it. Those overlays are really powerful, especially in a responsive or mobile layout, but we want to use them wisely and temporarily. In this sample from one of our applications, I'm chatting with a colleague, and as we exchange text messages they appear in bubbles overlaid on top of his video, but only briefly, moving up the video as new ones arrive. That way I can give maximum coverage to his video but still know what's going on in any text chat in the conversation. The bubbles obviously do interfere with the video, which is why we want them to move and to be temporary. Another pattern I've seen, for a more permanent listing of the chat so you can go further back in it, works a bit like reversing your camera: an icon you tap that flips the layout, hides the big video, and shows the chat messages, while my colleague Armand's video moves to a smaller space on the screen, so I can focus on text for a minute but still see him talking to me.

In multi-party chat, of course, the dynamics are a little different. In this multi-party conversation in an OpenTok call we've got lots of parties, and all the videos are given equal priority, which is great, but we should also think about other arrangements. What about giving the dominant speaker a larger video space, and maybe even lowering the quality of the other video streams if their owners aren't active participants? Also think about performance: in multi-party calls you have to deal with performance problems more often, so I want to be able to pause the streams of others to improve the performance I'm seeing, or have the application automatically downgrade them. And if you've ever been in a multi-party call where performance was low, so everybody paused their video and all you see is a bunch of generic silhouette icons, it's really confusing if you don't know everyone in the call and don't know their voices well enough to tell who they are. Showing profile pictures and names instead of just a silhouette when video is muted can add a lot to the user experience.
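As a rough sketch of that idea, with an invented Participant record and tile markup (no particular product's API):

```typescript
// Sketch: when video is paused, show the participant's avatar and name
// instead of a generic silhouette. The Participant shape and the
// "tile-<id>" markup are invented for this example.
interface Participant {
  id: string;
  name: string;
  avatarUrl: string;
  videoTrack: MediaStreamTrack;
}

function setVideoPaused(p: Participant, paused: boolean): void {
  // Disabling a local track stops sending frames without tearing down
  // the connection; re-enabling it resumes the stream.
  p.videoTrack.enabled = !paused;

  const tile = document.getElementById(`tile-${p.id}`)!;
  const video = tile.querySelector("video")!;
  const placeholder = tile.querySelector(".placeholder") as HTMLElement;

  video.style.display = paused ? "none" : "block";
  placeholder.style.display = paused ? "flex" : "none";
  if (paused) {
    placeholder.innerHTML =
      `<img src="${p.avatarUrl}" alt=""> <span>${p.name}</span>`;
  }
}
```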
We also want to give people feedback, both before and during the call, about the performance they can expect from the conversation, so they know what's coming. The screenshot on the left is from a tool we're building; think of it like live support. You're looking at people who are on a website, and you might start a conversation with them, but if I can tell in advance that someone has a really bad connection, I don't even want to reach out to them, because it's not going to be a good experience. We have simple ratings of the connection: good, okay, and poor. If I can see it's mediocre, maybe I still want to talk to them, but with video paused and that part left out. So you can anticipate problems before the call starts.

Then in the screenshot on the right, FaceTime does a nice job: when I was chatting with my son and he was walking around with his phone, the connection went bad, his video was automatically paused, and a message told me so. Priority was given to the audio, but that message is important too, so I don't just think his video has frozen and the connection has died; it tells me the video will resume automatically when the connection improves, and meanwhile I can still have the actual conversation with him.

We sometimes talk about how quickly we can get through the signaling process, get people into the conversation, and reduce that delay, but sometimes we might actually want to delay the connection just a little to test it first. A call quality check at the beginning only takes a few seconds, but tell the user what you're doing: "we're checking your bandwidth right now." In this example that we built, if the quality is too poor for video, it tells you that you failed the video check but that you're going to be put into the call with audio only. The important thing is providing that feedback to the user and making sure they know how it's going to improve their experience in the tool.
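One plausible way to derive ratings like these is from WebRTC's own statistics. This sketch uses getStats() with made-up thresholds on round-trip time and packet loss; the tools in these screenshots may well work differently:

```typescript
// Sketch: classify a connection as good/okay/poor from WebRTC stats.
// The thresholds are illustrative guesses, not taken from any product.
type Quality = "good" | "okay" | "poor";

async function rateConnection(pc: RTCPeerConnection): Promise<Quality> {
  let rttMs = 0;
  let lossRatio = 0;

  const stats = await pc.getStats();
  stats.forEach((report) => {
    if (report.type === "candidate-pair" && report.state === "succeeded") {
      rttMs = (report.currentRoundTripTime ?? 0) * 1000;
    }
    if (report.type === "inbound-rtp" && report.kind === "video") {
      const received = report.packetsReceived ?? 0;
      const lost = report.packetsLost ?? 0;
      lossRatio = received > 0 ? lost / (received + lost) : 0;
    }
  });

  if (rttMs < 150 && lossRatio < 0.02) return "good";
  if (rttMs < 400 && lossRatio < 0.08) return "okay";
  return "poor";
}

// e.g. run this on a short test call before joining, and warn the user:
// if (await rateConnection(pc) === "poor") { /* join audio-only and say why */ }
```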
Maintaining focus, the information-hierarchy part of user experience, is really important: always think about what the most important thing is that somebody cares about right now. In this screenshot Mariana and I are talking and one of us is sharing a screen. That's obviously the most important thing, and where the most detail is, so of course it gets the biggest video panel, but we still want to see what's going on in our video streams, so they get smaller tiles; in this screenshot it's a typical pop-out window. We also want controls so that the person watching the screen can take it full screen, bigger than the browser, to get as much detail as possible, and you could still leave the video overlays in that full-screen mode too, or maybe let people move them around, like I was saying earlier.

Coming to the end here, I want to emphasize the difference between web and mobile, which sounds obvious, but again: everything about the interaction matters even more with mobile than with a web application because of the constrained space you're dealing with. In this application, not one of ours, called GV, I like how, as the conversation with someone is loading, I see a video preview of myself, so I have the chance to check that I'm holding the phone in the right place for them to see me, while also seeing the profile picture and name of who I'm calling and the connection status, all overlaid on top of my video feed. It also follows standard mobile conventions, which really matter on mobile devices: swipes work where somebody expects them to, and the buttons are in the standard, conventional places, like any other call on that device.

So in summary, the key points are that WebRTC is much more than just a video tag. The VoIP side of things is fascinating, and it's amazing what we can now bring into web and mobile experiences, but just getting the stream from getUserMedia and putting it into a video tag is not enough. We've got to think about the full user experience around it: the information hierarchy of how we structure the application, the feedback we give to users, and the micro-interactions they have on their device. Those will make all the difference in whether people really love your application, because no matter how good the technology behind it is, if users don't have a good experience, they won't be impressed with the engineering. So that's my message to you today: keep it in mind as we hear from a lot of really fascinating speakers on technical topics today, which I'm looking forward to as well, but always remember, on top of all that, to make sure the user experience is awesome. Thank you very much.

Moderator: That was great, thanks Aaron. We have two minutes for any questions. Anybody have any questions for Aaron? Don't be shy, we have a nice and convenient...

Audience: Hi, do you have any suggestions for solving a UX problem during screen sharing? When you screen share, you lose the context of the web page where you have your WebRTC session, because you go to another app. How do you solve this?

Aaron: Well, one of the interesting challenges with screen sharing is that, as developers, we don't have control over what users are going to share with us, whether the whole screen or only a particular window, and I'm not sure of a way to force that. Once you're screen sharing, in the use cases where we've used it, it's important to still see the other video streams and keep that level of connectivity. But as for what I think you're referring to: if I've got the screen sharing maximized in full-screen mode but we're still having a text chat, or other data is streaming onto the page that we want to see, perhaps we can overlay that on top of the maximized screen-sharing area. That way, when someone brings the screen sharing to full size, the screen stays the focus, since presumably it's the most important thing to look at, but the other information from the periphery can still come in, in a temporary, scrolling sort of fashion, slightly out of the way, so that we can focus on the screen itself. Does that help?
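For reference, a minimal sketch of starting a screen share alongside an existing call using the getDisplayMedia API; it also illustrates the point above, since the application only learns what the user chose to share after the fact:

```typescript
// Sketch: add screen sharing to an existing call. We cannot control
// whether the user shares a window or the whole screen; we can only
// react to whatever they picked in the browser's own chooser.
async function startScreenShare(pc: RTCPeerConnection): Promise<MediaStreamTrack> {
  const display = await navigator.mediaDevices.getDisplayMedia({ video: true });
  const track = display.getVideoTracks()[0];

  // Send the screen as an additional track, keeping the camera running.
  pc.addTrack(track, display);

  // The browser's own "stop sharing" bar can end the share at any time,
  // so keep the application UI in sync.
  track.onended = () => {
    console.log("Screen sharing ended by the user");
  };
  return track;
}
```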
Moderator: Any other questions? Wait a second.

Audience: There are more than 20 participants in an application, and we have an active speaker; when the active speaker changes, we get issues with rendering the video content. How do we handle that? We get OSNAP issues and DJM issues, we face them every day, so how do we handle those things? Do we have any debugging tools in Chrome itself, or is that planned?

Aaron: Yeah, it's a really interesting problem, and you'd probably get a better answer from someone more on the voice side than me, but I think it comes down to your video implementation: how you handle all those streams and feed them out to the participants. Purely at the interface and interaction-design level, you can go with one of a couple of models, maybe more. One is to let some sort of media controller, something that isn't pure WebRTC but sits in the middle aggregating streams for you, handle it to reduce the load. Another is to do detection around the audio, maximize the video of whoever is speaking, and adjust the other video streams based on the content coming to you in real time. The other, if it fits the use case, is to ask whether all 20 participants really participate regularly, or whether it can be more of a few-to-many architecture. This is one we've done a couple of times: a few people who will be constantly talking, like the speakers and organizers at an event, and then other people, like those asking questions, who come in temporarily. They don't need an actual peer-to-peer WebRTC connection the entire time; they can come onto and off of the stage as needed to ask a question or debate a particular topic, and then they go back off the stage. For everyone else in the room there don't have to be 20 or 300 connections; there can be a broadcast of the few important connections to everyone else. If your use case supports that, it's probably the best way to handle it.
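A minimal sketch of the audio-detection idea, assuming an invented Remote record per participant; note that the audioLevel field in inbound-rtp stats is not reported by every browser:

```typescript
// Sketch: choose the dominant speaker from per-participant audio levels.
// The Remote shape is invented; treat a missing audioLevel as silence.
interface Remote {
  id: string;
  pc: RTCPeerConnection;
}

async function findDominantSpeaker(remotes: Remote[]): Promise<string | null> {
  let loudestId: string | null = null;
  let loudestLevel = 0;

  for (const r of remotes) {
    const stats = await r.pc.getStats();
    stats.forEach((report) => {
      if (report.type === "inbound-rtp" && report.kind === "audio") {
        const level = report.audioLevel ?? 0; // 0..1 when reported
        if (level > loudestLevel) {
          loudestLevel = level;
          loudestId = r.id;
        }
      }
    });
  }
  return loudestId; // null means nobody is audibly speaking
}

// The UI can then enlarge that participant's tile and, if bandwidth is
// tight, lower the quality of (or pause) everyone else's video.
```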