So thanks for joining us this afternoon for the RTC on Debian track. For our final presentation, we've got a special guest. We've got Emil Ivov, the lead developer and founder of the Jitsi project, formerly known as SIP Communicator. The project now does SIP and Jabber and other protocols. Emil is one of the real leaders in the free multimedia and real-time communications community. He's involved in the IETF processes. He's participated in FOSDEM and all the other major events in free software. So I'm going to introduce Emil and hand over to him.
Thanks Daniel, that's a great presentation. Thank you. So today I am going to talk about the things that we have running in the Jitsi community, which has been around for 10 years now. We just had our 10-year birthday a couple of months ago. It's been quite a ride. So today I'm going to talk about what Jitsi can do and all the projects that we have spun off from Jitsi, the things that we're currently working on, and the things that we have planned for the near future. We, by the way, just got accepted in Debian. So that's something we're very happy with. We've had our Debian packages for quite a while, but now it just feels different. Well, thank you, by the way, for that.
So Jitsi started out as a multi-platform instant messaging communicator. Today it supports a number of protocols that you can use for instant messaging and presence. And on top of that, you can do audio and video calls and conference calls with XMPP and SIP. This is what the application looks like. As I said, it does audio and video calls. We support a very wide range of codecs. Actually, I would claim that we are the most feature-rich communications application around. That's our thing. We do a lot of stuff. And at some point back in the past, some of that stuff was somewhat, you know, gadget-y, somewhat useless. But today we try to concentrate on things that can really make working together remotely a very lightweight process.
We actually work on Jitsi using Jitsi. So we support codecs like Opus, SILK, G.722. We even have G.729 for those who would want to use it for some reason. It's just not compiled by default because of all those licensing issues. We have H.264 and VP8 for video. By default, Jitsi sends 640x480 video. You can configure it to send 720p and have HD calls, which works especially well if you have a good webcam like the one that Daniel showed a minute ago.
We do conference calls, and note that this is not the kind of conference call where you just call into a conferencing server and the client just sees a regular call. Jitsi can actually host conference calls. In other words, you just start 5, 6, 10 different calls to different people, Jitsi mixes the audio from all those calls, and you can all talk together. You can do that on your own computer without placing any constraints on the server. You can do it over SIP, over XMPP, and it's going to work.
One of the things that we are really proud of, that we try to work on as much as possible and really care about, is the security and privacy aspect of the application. There are several things that you need to take care of when you are talking about privacy and security with VoIP. Obviously, securing the signaling is one of them, but then you get to securing the media, securing what's actually being said: making sure not only that no one can understand what's being said by looking at the packets, but also that you are protected against man-in-the-middle attacks. That is, if someone is in between you and your correspondent, you can still protect against them, or at least know very clearly that there might be a chance for that to happen. We do that using SRTP and the ZRTP key negotiation method.
For those of you who are not aware of it, it's a negotiation method designed by Phil Zimmermann, the creator of PGP, and it basically comes down to Jitsi showing you the four-letter hash of the key that got negotiated between you and your correspondent, so that you can compare those letters, and if you have the same ones, just click on the button. You do this only once. You click on the button and you know you are secure, and that's something WebRTC cannot claim today. I'm actually ready to argue that it's probably never going to be able to claim that, given how it's supposed to work, how easy it is to just serve different JavaScript to different users, and how difficult it would be for the browser to indicate reliably when communication is secure. But even if we forget the future discussions and the arguing between DTLS and ZRTP, today WebRTC does not have that. So if you want to be secure, this is a pretty decent way to get there.
We also support other encryption mechanisms just for interoperability's sake, and of course we are also going to support DTLS even though we believe it is inferior to... Sorry, am I moving too much? Okay. Even though we believe it is inferior to ZRTP, we're going to support it just so that the next time someone here asks, hey, can I call from that web page to Jitsi, we would be able to answer yes. Because don't get me wrong, I'm not saying that WebRTC is something I don't like. It's going to serve a great purpose for a great many different use cases, and we would like to be able to use Jitsi in some of those use cases. So that's eventually going to happen.
We also support encryption for chats and instant messaging. This is an end-to-end encryption method. It's called OTR, which stands for Off-the-Record instant messaging. And again, it protects you against man-in-the-middle attacks and it works over any protocol. So it's not something that's limited to XMPP, for example.
It is not something that secures the connection between you and the server. It just makes sure that between you and the other person there's encryption going on, and no one can listen in on what's happening and what's being said.
We also support DNSSEC. That actually got implemented by a Swiss contributor, Ingo Bauersachs, recently. He's currently working on a rewrite of that part so that it's going to be more lightweight and easier to package and run reliably. But we already do support it. For those of you who are not aware of DNSSEC, it basically allows you to discover your SIP servers or XMPP servers in a reliable way, so that no one can lie to you and make you connect to their SIP server or their XMPP server and then potentially eavesdrop on your traffic, or at least, because with ZRTP that wouldn't be possible, see who you're calling. You might not want that. You might want to avoid it.
We support video conferencing as well. Now, we support it in Jitsi. There are two ways you can do it. You can have Jitsi play the role of the conferencing server. Obviously, for that, you would need the organizer of the conference to have very decent bandwidth, and I'm going to talk more about that in a minute. Or, we have taken out the parts of Jitsi that do the video conferencing and we have created the Jitsi Videobridge project, which is obviously also open source, which is probably going to go... I mean, we are certainly going to submit it to Debian and hopefully it's going to be there at some point.
I would have very much loved to run a demo, but for several reasons I won't be able to. So we have a number of screenshots here. All of these are real calls. It's just regular calls that we had while working on Jitsi. None of this is Photoshop. This is actually one that we took and put up on our booth. So none of these are reworked images. It's just conferences that we had, and I've just taken snapshots. We do run on Windows, Linux, and Mac.
We provide RPM and deb packages on our website. Obviously now we are also in Debian, so that's a great thing. And we are also currently working on an Android version, which is still a little bit shaky, but it's getting there. And all of the features that are in Jitsi are eventually going to find their place in the Android version as well. This means all the things like XMPP multi-user messaging and SIP calls and being able to look up a user in LDAP and all these things. So we have the tablet version here. Here's a chat on the phone version, the contact list on the phone version, a call with XMPP... Sorry, going on the... on a phone.
We also support desktop streaming. And we do that in a very simple way. We simply capture a video of your desktop and send that over as if it were a simple video call, which is very practical because it makes it cross-platform. In this case, we have a Mac OS user watching the desktop of a Debian user. And you can actually even take control of the remote desktop. That's also possible with Jitsi, although admittedly that feature needs a little bit more work, because it relies too much on the server being stable right now and it could be improved in that way.
We have... Yeah, probably no one here cares about this, but I thought I'd mention it because it actually helps when you want to make enterprises move to open-source software. So it's great to be able to say, oh, you know, it's going to be able to do the same things that Lync is currently doing or Skype is currently doing. You don't have to dump everything right now. You can start by moving to Jitsi and you're still going to be able to start calls from Outlook, and they're going to go through Jitsi this time, not Skype and not Lync. So I believe that's important.
And we have a number of other features. Again, we work a lot on that and we support a number of codecs. We have things like native echo cancellation, depending on the platform.
We have spawned libjitsi, which is a media library that one can use, for example, to build server-side software for multimedia real-time communication applications. That actually works together with WebRTC. There's a project of Telecom Italia Labs, for example, that does WebRTC-to-PSTN interconnection, and they rely on libjitsi for that. One of the things that's not here and that probably came up earlier today is ice4j. That's a Java implementation of STUN, TURN, and ICE. And that can also be very helpful for NAT traversal. Even when not using Jitsi directly, you can use it with other projects or server-side projects when talking to WebRTC.
And that's also something that we hope to be able to integrate in Debian. Part of the adventure of integrating our package in Debian, and that was a great adventure, is that we have a lot of dependencies. Many of them come from ourselves, but many of them are third-party libraries, and we had to track them down. For some we were just using an old version that was already doing everything that we needed, and since then the project just died, for example, so we had to track down the sources to be able to get all that before submitting to Debian. So our first submission was really one relatively rough package for Jitsi, and we are hoping to now get to splitting that into several packages, like, for example, getting libjitsi apart, and Jitsi Videobridge, so that you can get these libraries and build applications with them even if you're not planning on using Jitsi itself.
We have also made sure that you can use existing protocols, such as SIP and XMPP, today in a way that takes the best from both worlds. Now, I'm not sure to what extent people here are familiar with SIP and XMPP, but in general, historically, everyone has been using SIP mostly for calls, and everyone has been using XMPP mostly for instant messaging and presence.
Now, that doesn't mean that the protocols can't do the rest of the stuff. You can do instant messaging with SIP, you can do presence with SIP, and you certainly can do calling with XMPP. The thing is that historically, the two protocols have been used for different things, and you can very easily feel the impact of that in servers, for example. It's very, very hard to find a media server for XMPP today, and it's also very hard to find a decent, simple instant messaging and presence implementation for SIP. So while it is really extremely trivial to get your contact list with all the avatars from an XMPP server, doing the same thing with SIP requires you to run three different protocols. You need SIP, you need XCAP, which runs on top of HTTP, and you need MSRP, which you need for the instant messaging. So these are three completely different protocols. You need three different servers, and you don't really have a lot of choice when you want to do that.
So what we thought is that we wanted to make it possible for people deploying Jitsi in a university, for example. This is something that we tried at the University of Strasbourg. Say you want to have a solution that does instant messaging and presence, has contact lists, and can share files, at the same time as audio and video calls, and you want to use specific SIP servers because you want to interconnect to the public telephone network and you want to have things such as voicemail, call pickup, directed transfers, and that kind of stuff. Then you really want to use SIP for the telephony and XMPP for the instant messaging. And with Jitsi, now you can do so. That's something we've called CUSAX, which stands for Combined Use of SIP and XMPP. It's a protocol that's currently in the last stages of standardization at the IETF. It should be an RFC relatively soon.
So I thought I'd mention it here in case some of you need to deploy a working infrastructure for real-time communication in an organization where you are involved. I'm not going to go into the details, but you can always ask me questions later if you want me to come back to that.
Now, something that we've spent a lot of effort on during the last several months, actually almost a year now, more than a year, is Jitsi Videobridge. It's a server-side application that allows you to do conferences for both audio and video. Here's what's special about it. Traditionally, when people did audio conferencing, it always happened the same way. You call a media server over whatever protocol, you behave as if that's a simple one-to-one call, you send your media, and in return you get a single audio stream, but that audio stream contains the mixed content of everyone else. And this is okay for audio because mixing audio is really a lightweight process, so much so that you can do it on your own computer. That's why we were able to do audio mixing in Jitsi, and that runs on most modern computers today. The reason for this is that mixing audio is an extremely simple process. It's just adding integers, basically, if you put it very roughly. You just do additions, and there you go, you have your mixed audio stream. In practice, it's a little bit more complicated, but not extremely more so. It's certainly not from the perspective of a computer, or of a CPU. It requires some thought in terms of implementing, synchronizing, and making sure that the result doesn't go beyond the ranges that you're supporting, but it's something that's very easy to accomplish for any CPU today.
Now, video mixing is something that is way harder than that. And why is that? In order to get video mixing, what you need to do in general is receive four different streams in a central point.
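As a rough illustration of the "just adding integers" description of audio mixing given above, here is a minimal Python sketch. It assumes every stream is already decoded to 16-bit signed PCM at the same sample rate and is already aligned in time; the function name `mix_pcm16` is made up for this example and is not taken from Jitsi or libjitsi. The clipping step is the "stay within your ranges" part.

```python
from typing import List, Sequence

# Range of a signed 16-bit PCM sample.
INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(streams: Sequence[Sequence[int]]) -> List[int]:
    """Mix decoded PCM streams by summing samples and clipping to 16 bits.

    Hypothetical helper: assumes equal sample rates and aligned streams.
    Shorter streams are treated as silent once they run out of samples.
    """
    if not streams:
        return []
    length = max(len(s) for s in streams)
    mixed = []
    for i in range(length):
        # Add the i-th sample of every stream that still has data.
        total = sum(s[i] for s in streams if i < len(s))
        # Clip so the sum never leaves the 16-bit range.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

A real mixer would typically also leave each listener's own audio out of the mix they receive, and handle jitter and resampling, which this sketch does not attempt.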
So four people have video calls established to some sort of video media server. And the video media server is supposed to take all of these streams, decode them, scale them down so they fit in a single image, re-encode the new image, and send it to everyone. And that needs to happen at least 25 times a second. This is extremely heavy. If you've tried doing regular real-time video on your computer, you've probably already noticed that it's quite CPU-intensive. Imagine if you had to do that for four, five, six different persons. So traditionally, that's how video mixing was being achieved.
Yes, sorry. Sure, go ahead. Behind you. Okay, now it is. Yes.
Thank you. Can you just change the scheme so that the four video clients send an already scaled-down video stream, and then the server, instead of decoding, laying them out on the same frame, and encoding again, just sends four already encoded video streams to all the listeners?
We'll get there in a couple of slides. So that's how video mixing has been accomplished for many years, but it's always required dedicated DSPs and things that are extremely expensive, and that's why video conferencing has traditionally been a very expensive thing. Certainly not something that you can run on your own computer. That's out of the question. What you could do, however, is exactly what you asked about, and you don't even need to scale down the initial images. You just keep sending whatever you send normally. It gets to the server, and when it gets there, it doesn't get decoded. It doesn't get scaled down. The server is simply going to take your packet and send it to everyone else. Now yes, that's going to require a little bit more bandwidth, but that kind of bandwidth today is way easier to find than a CPU that can handle the mixing in real time.
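The relay approach just described, where the server takes each packet and sends it to everyone else without decoding it, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `RelayConference` class and its method names are invented for the example, and the network is replaced by plain callbacks so the logic stays visible.

```python
from typing import Callable, Dict

# A participant is modelled as a callback that delivers bytes to them.
SendFn = Callable[[bytes], None]


class RelayConference:
    """Toy relay: forwards each packet untouched to all other participants."""

    def __init__(self) -> None:
        self._participants: Dict[str, SendFn] = {}

    def join(self, participant_id: str, send: SendFn) -> None:
        """Register a participant and the callback used to reach them."""
        self._participants[participant_id] = send

    def on_packet(self, sender_id: str, packet: bytes) -> int:
        """Relay one packet; returns how many copies were sent."""
        forwarded = 0
        for participant_id, send in self._participants.items():
            if participant_id != sender_id:
                # No decoding, scaling, or re-encoding: bytes in, bytes out.
                send(packet)
                forwarded += 1
        return forwarded
```

The cost per packet is one copy per other participant, which is why this trades CPU for bandwidth.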
We did that in Jitsi itself in the beginning of 2012, and then we thought, well, that still requires a user, not a server, but a user to have that bandwidth. At least one of the participants in the conference needs to be able to send all these streams to everyone else. So in a four-member conference, we have a total of nine streams going out. I'd say that, depending on how much you move, that's approximately 200 kilobits per second per stream, so you can do the math. That kind of upstream is rarely available today, unless you're working at a university, for example. If you do that on a server, however, now that becomes a lot easier to find. There are machines out there that you can get for 20 euros a month or even less, and they can very well cope with that task.
We did this XMPP extension protocol that we called Colibri, which allows you to take that part of Jitsi that did this packet shifting and put it on a server. It's an XMPP component, so you connect it to an XMPP server, and the organizer of the conference controls it through the XMPP server. So here's how this works. Basically, when you want to set up a conference between several clients, the organizer just tells the Jitsi Videobridge, hey, could you please open a conference for me with three different channels, three different sets of ports, and send me back your address and the port numbers that you allocated for me. The server responds, and when the focus, the organizer of the conference, sets up the calls, instead of sending its own addresses to the other participants, it sends the address it got from the Jitsi Videobridge. As a result, when the call is set up, everyone will end up sending their stream to the Jitsi Videobridge, thinking that this is actually the focus, which is okay. And then the Jitsi Videobridge is going to take care of all the relaying and making sure that everyone gets all the media.
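The "you can do the math" remark about upstream bandwidth earlier in this part of the talk can be worked out with a short Python sketch. The nine-stream count and the figure of roughly 200 kilobits per second per stream come from the talk; the function names and the general formula are an illustrative reading of it: when one participant relays for everyone, each of the other n - 1 participants receives n - 1 streams from that host (the host's own plus the other n - 2).

```python
def relayed_streams(participants: int) -> int:
    """Outgoing streams for a participant who relays for the whole conference.

    Each of the (n - 1) other participants receives (n - 1) streams:
    the host's own stream plus the streams of the remaining (n - 2).
    """
    if participants < 2:
        return 0
    others = participants - 1
    return others * others


def host_upstream_kbps(participants: int, kbps_per_stream: int = 200) -> int:
    """Approximate upstream needed by the relaying host, in kilobits/second.

    The 200 kbit/s default is the rough per-stream estimate from the talk.
    """
    return relayed_streams(participants) * kbps_per_stream


print(relayed_streams(4))     # 9 streams, matching the talk
print(host_upstream_kbps(4))  # 1800 kbit/s of upstream
```

Because the count grows with the square of the number of other participants, the requirement climbs quickly, which is the motivation given for moving the relaying onto a server.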
This is basically as if we just exported onto a server something that used to be part of Jitsi, because none of these clients has to know anything about that Colibri extension that I talked about. Not that knowing about it is a bad thing. It's an open extension. We just need to write it down, because we have been lagging a little bit, but you can always look at the code and see how it works. But maybe you want to be able to interoperate with standard clients that don't support this Colibri extension. It could also work with SIP, because as I said, the only communication that's dependent on XMPP in this case is the part that allocates the channels. So once you allocate the channels and get the IP address of the video bridge, you can still invite everyone over SIP, telling them, hey, I'm Emil and I'm calling you from this IP address. And then the clients can start sending media there, and you end up in the same situation. Question?
What about encryption? I mean, now we need to trust this server, don't we? The server is receiving the whole video in plain.
What was the first sentence?
What about encryption? When I'm talking with someone over XMPP, I'm encrypting between him and me, without the server, with OTR, for example.
Yes, that, yes. So currently, Jitsi video conferences are encrypted when they're hosted on Jitsi. It becomes a little bit more complicated when you want to encrypt the conferences hosted on the video bridge. This is currently work in progress. Right now, there's actually someone working on that. And we should have DTLS for the video bridge within, I'd say, a couple of weeks maybe, in a first version. The reason we chose DTLS here... because you remember I wasn't quite happy with what DTLS had to offer like 10 minutes ago. Now I'm changing my mind. Well, I'm actually not. The thing is that DTLS gives you good encryption once you... how to put that simply. Let's say that in this case this entity has to be trusted.
If you care about your privacy and you want to do a video conference that's hosted on a bridge somewhere, you have to trust this. Let's say it's almost impossible to have end-to-end encryption in this situation without trusting this guy. So once you make that assumption that you're trusting this, because, for example, this is something you're running yourself, you are your own service provider, or you know the service provider and you trust them because it's another member of your community or organization, then DTLS is entirely sufficient. And this is something that we're currently working on, and it will also make Jitsi Videobridge usable with WebRTC clients and browsers. So, two questions already, and both of them have anticipated something. I was just about to say: great questions. Next one, please.
I was wondering why we need to trust the server. We could share keys for just this conference, coordinated with all the clients.
How are you going to share the keys?
By using one protocol.
But how are you going to send that key? And then you trust your XMPP server, because it can see what you're exchanging. If you're doing OTR on top of that, for example, then that's something that could work potentially, but you can see that, from a user point of view, it's a complex way of achieving things. Now let's say that multi-party encryption is, on the whole, a complex topic, because you basically have two ways of doing it. Either everyone has to have a trusted connection with everyone else, which requires a lot of signaling to get right. Or you need to trust the server. In which case... Yeah?
I would have to actually think about it. But isn't it possible to extend the Diffie-Hellman protocol to support three exchanging parties or more?
It could be possible. If I remember correctly what happens in DH... Is there a question there? Or a comment maybe?
Actually, multi-party OTR is being specified and implemented right now.
Well, we'd have to look into that. The thing is that you also need to trust your server for other reasons as well. For example, it needs to be able to change the media. You might have negotiated different payload types with your different user agents. The bridge needs to change the numbers of the payload types, the dynamic payload types that it's using between the different clients. It would need to be able to decrypt and then re-encrypt the media anyway. If you try to... Well, it's something that we actually need to discuss in more detail because it's quite technical, but I'd be happy to talk about that after. If you want to completely eliminate your bridge from the loop and make it entirely incapable of changing anything in what you're saying, that gets you into a whole number of different issues. And I'm not saying they're unresolvable, but they're hard to get right. Yeah.
Hello? Okay. One solution maybe is to simply make a connection, like an SSH one or whatever, for example, from each machine, and then exchange one pair of keys. Then all the people encrypt with the same key, but it's a two-way key. It's used only for the session and it's exchanged via a secure channel, peer-to-peer. Then once it's exchanged, each connected peer knows the key and can do the encryption part.
Sure. And that actually is the same thing as setting up different OTR channels with everyone, and then you have a trusted connection with everyone, and then you generate a key and send it to everyone else. But even in this case, this puts us in the same situation where, if you don't want to trust the bridge at all, then you are preventing it from doing a number of other things for you, like, for example, redefining the payload types, or mixing audio, or in some cases being able to change the packets. For example, in the video bridge, we still mix audio. Why?
Because it's cheap. It doesn't cost a lot to do, it saves you a little bit of bandwidth, it simplifies the negotiation, and contrary to video, there are a whole lot of different audio codecs, so it's a lot easier to get codec mismatches. It's a lot more difficult to make sure that there's a single codec that everyone in the conference supports. So we thought we would just mix the audio. If you have an Android client, for example, joining a video conference, then the video bridge would definitely want to mix the audio and maybe just choose one or two of the video streams that are being relayed and switch between them. So the video bridge needs to be able to do stuff with the media. That means it has to change the packets, so it needs to know the encryption mechanism that's being used. So you can probably come up with something that's going to work, and I never said that's impossible. It's just that it makes it harder on the whole to achieve the whole conferencing thing. If you eliminate it, if you tie its hands, it won't be able to help you when you need it to help you. I'm sorry if I'm not being convincing enough, but we can probably get to that after and discuss it in more detail.
What about graceful degradation? So you have people who you trust, where you can negotiate the encryption directly, and then you don't need to trust the bridge. And then maybe, by sort of bootstrapping a web of trust on top of the star of people that you trust directly, there are people that you can negotiate with through trusted parties, and one of these entities could be the bridge. How much do you trust the bridge? If I set it up, I trust it completely, maybe. Then also, for every specific use, like a teleconference, you can say how much privacy you need. So if I say, for instance, I want expediency, not privacy, then the client can be like, okay, so for this call you trust the bridge. You don't even care to wait to negotiate strong encryption, even if you could do it.
So then you have graceful degradation of the privacy. Do you want me to take the microphone?
So let me remind you of something. If you want to have an encrypted session with everyone else without requiring trust in anyone but yourself and your peers, you can do that today with Jitsi. You just need to have the bandwidth so that you can stream to everyone. That's already supported. In that case Jitsi is being the video bridge. Jitsi does all the rewriting of the payload types, but that's okay because you're one of the participants in the conference, so you are trusted anyway. So you can do that, and in that case you have graceful degradation, because your user interface is going to show that these persons are who they are and we have encrypted sessions with them, and this last participant, well, we haven't been able to establish a ZRTP connection with him or her. And in that case, well, you'll be aware of what's going on.
If you want to use the video bridge, then you are going to be guaranteed that everything going to the video bridge is being secured, that no one is able to read what's going on except for the video bridge. It's just a requirement in our design. We use the Jitsi Videobridge too much and we can't get away with not trusting it, otherwise the whole thing falls down. But that doesn't mean that your sessions with the video bridge are going to be exposed in any way. You can very easily use a public/private key pair on the video bridge, or a certificate that you have signed yourself with your own certificate authority, or even a PGP pair. You would be able to do that and that's going to work. It has to be your video bridge. There's another question there.
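The payload-type rewriting mentioned in this exchange can be made concrete with a small Python sketch. It relies on the standard RTP fixed header layout from RFC 3550, where the second byte carries the marker bit in its top bit and the 7-bit payload type in the rest. The function name `rewrite_payload_type` and the mapping argument are hypothetical, and the sketch operates on plain RTP only.

```python
from typing import Mapping


def rewrite_payload_type(packet: bytes, pt_map: Mapping[int, int]) -> bytes:
    """Return a copy of an RTP packet with its payload type remapped.

    pt_map translates the sender's payload type numbers into the numbers
    negotiated with the receiver; unmapped types pass through unchanged.
    """
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte RTP fixed header")
    if packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    marker = packet[1] & 0x80        # top bit of byte 1: marker
    payload_type = packet[1] & 0x7F  # low 7 bits of byte 1: payload type
    new_type = pt_map.get(payload_type, payload_type)
    if not 0 <= new_type <= 0x7F:
        raise ValueError("payload type must fit in 7 bits")
    # Only the second header byte changes; everything else is copied as is.
    return packet[:1] + bytes([marker | new_type]) + packet[2:]
```

With SRTP the authentication tag covers the RTP header as well as the payload, so a relay that changes even this one byte has to be able to recompute the tag, which means holding the session keys. That is consistent with the point made in the talk that a bridge doing this kind of rewriting has to be trusted.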
No question? Okay, so the last thing that I was going to say is that, yes, we're currently working on the security with the video bridge. We're going to start with DTLS, and initially you would have to be able to trust the machine that's hosting the conference, unless you want to host the conference yourself. And we're also going to be adding support for Trickle ICE to the bridge so that you can use it with WebRTC clients. And that's the end of the official part of the presentation. I'm very much enjoying the questions, so if you have any more, keep them coming.
So I was just, you know, because of what you said before my last question, I was thinking again. Of course you can have anything if you're willing to code it. I understand the basic situation: the video conference is processor and bandwidth intensive. And me, I'm not a security freak, but then there are people who are. So maybe you could have a situation where you have channels going out in a star, but they are not bandwidth or processor intensive. For instance, I could have the video coming to and from the video bridge, but also a control channel to every participant where I'm basically negotiating identity with them. So as long as you've got this control channel open with me and can identify yourself periodically through it, then I trust you're the one that's there. And, I don't know, we could even run some basic checks to try to flush out the man in the middle, stuff like that. And again, if I trust the video bridge because I'm running it, or one of my friends is running it, or if I don't care about the security of the video conference, who cares. But there might be situations where I'm not being paranoid, I don't know who the hell is running the video bridge, but I still care. I still care to have as much security as I can in a situation where I understand I can't have really good security. So you take a look at the cryptographic objectives and you say: this
one, no, I can't do that one, but this one, maybe. And if it takes a low-bandwidth control channel, then I don't see why not.
So again, yes, you can do that, but in that case you would have to eliminate the bridge from helping you, and in our case we are relying on the bridge doing a number of things for us. We are relying on it being able to mix the audio, to make sure that we're getting the video with the payload types that we expect to get, and we won't be able to do that. It's very easy, actually, to make sure that the bridge runs with a certificate that you have signed and that you're only trusting that certificate. I should say that the problem is not in getting the security and preventing the bridge from seeing what's in the media. The problem is that if you do that, then the bridge won't be able to do all the things that it's currently doing. It is possible to get a simple media proxy that relays to everyone, but that's not something that would fit our requirements. We want to make the bridge relatively easy to use with other SIP user agents. We don't want it to be specific to Jitsi. We want it to work with WebRTC as well, maybe in a single call have web browsers, Jitsi clients, and other clients such as Empathy, if they... actually, that should work even without a lot of effort, I believe. So we want that to be possible, and for that we need the bridge to be a little bit smarter. Not too smart: we don't want it to start decoding video and all that. But we do want it to be able to change payload types, maybe mix audio, be able to read the RTCP packets. That's actually very important. It has to be able to read RTCP. Yeah, it's not only an encryption problem. It's not an encryption problem at all, actually. It's all the other requirements that come with it.
Okay, was that five minutes or was that over five minutes? Okay, so a couple more questions. Oh, one
minute. So one more question? Okay. Well, we'll be hanging around. There's also Jana from Jitsi here. She's actually the person who has written the highest number of lines of code in Jitsi, so a plus for her. And we'll be hanging around until tonight. If you want to chat with us, just come by and talk to us. Thanks.