Hello, everybody. Good morning. Thank you for queuing or whatever to come in here. Can everybody hear me? Can everybody understand me? Okay, I apologize because there's a lot of stuff to go through here and I've got 20 minutes, so it's going to be quite fast. So grab your coffee and kind of hold on. First of all, who knows what Matrix is already? Wow, that's petrifying, honestly. Okay, well, the good news is that we can probably skip very quickly through the first bit, which is a kind of generic, hey everybody, this is what Matrix is. The point of this talk is to compare where we are today with some crazy sci-fi for the future. Matrix is a non-profit open standard. It's all about defragmenting communication. The idea is that we build a kind of meta-network which links together all of the silos out there today, whether that's Slack or WhatsApp or IRC or whatever bubble of communication we have. And we link them all together with a lowest-common-denominator, decentralized fabric that liberates communication to be controlled by us. So in practice, you have a whole bunch of silo bubbly things: it could be a community on Gitter, it could be a community on Slack, IRC, XMPP, Skype, whatever. It could even be an application like GitHub, which needs to be integrated into this with some kind of bot. And you go and slap Matrix in the middle as a whole bunch of servers where you have conversation history that is decentralized across all of the different bubbles. So suddenly the conversations are no longer trapped inside, say, Slack. It functions a bit like Git, but for chat history, for conversation history itself. So you have bridges that go through to the various silos, and then the core network is a full mesh of servers which store the conversation. You can have native Matrix clients that connect to that, and that's basically Matrix in a picture.
The whole point is that no single party owns the conversations. Matrix aggressively decentralizes the conversation, so that if I'm talking to somebody on a different Matrix server, that conversation has to be replicated over both, so that if my server goes down, they can continue communicating. And that's the main core unique thing about Matrix: it literally, subversively, aggressively stops you from forming a walled garden or a silo. The conversations end up replicated over all the participants. You can use it for group chat. You can use it for VoIP signaling, for bridging any kind of silo. It's kind of fun for IoT data, because in the end it acts as a great big open JSON database where you can publish any old object out onto the web. People can subscribe to it, and it gets persisted and replicated around the place. In some ways we like to think of it as an evolution of the web itself, a step towards the read-write dynamics of the web that Tim Berners-Lee was originally aiming for. So in the simplest case, if I want to send you a message, I do an HTTP PUT to my server. It fans that out to the other servers, and then you, on your server, do an HTTP GET to get the object. And it's as simple as that. It doesn't have to be HTTP, but that's the baseline, lowest-common-denominator, stupid, simple transport that we use, because everybody knows what HTTP is. Some people are probably thinking, why are you reinventing XMPP? Is anybody thinking that? Or can I skip the next two slides? OK, we've got one, two, three, four, five, six, seven, eight. OK, we've got about 10 people asking, why are they reinventing XMPP? Our take is that we're really not, in that they are very, very different things. Probably the single biggest difference is the difference in philosophy and architecture: in Matrix you have one spec.
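As a rough illustration of the "HTTP PUT to my server" step described above, here is a minimal sketch that only builds the request and does not touch the network. The URL path and the `r0` version prefix follow the Matrix client-server API as commonly documented, but treat them as assumptions here, since the talk does not spell them out; the homeserver, room ID and transaction ID are made-up values, and authentication is left out.

```python
import json
from urllib.parse import quote


def build_send_request(homeserver, room_id, txn_id, text):
    """Return (method, url, body) for sending a text message into a room.

    Assumption: the path layout and the "r0" prefix mirror the Matrix
    client-server API; a real request would also carry an access token.
    """
    path = "/_matrix/client/r0/rooms/{}/send/m.room.message/{}".format(
        quote(room_id, safe=""),  # room IDs contain '!' and ':' so escape them
        quote(txn_id, safe=""),   # client-chosen ID that makes retries idempotent
    )
    body = json.dumps({"msgtype": "m.text", "body": text})
    return "PUT", homeserver.rstrip("/") + path, body


# Hypothetical values, for illustration only.
method, url, body = build_send_request(
    "https://example.org", "!abc:example.org", "txn1", "hello"
)
print(method, url)
print(body)
```

Using PUT with a client-chosen transaction ID is what lets a client retry the same request safely: the server can recognise the repeat rather than storing the message twice.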
The spec is currently at 0.2.0, and we're way overdue a 0.3.0 release. And if you implement that, you get Matrix. Now, bits of it are optional: if you're implementing it on an IoT device, obviously you don't want to do video calling, you don't want typing notifications. So there are different bits of the spec which are optional for different use cases, but in the end it is a single consistent spec. So that's probably the biggest difference. Then at the architectural level, the primitives are also completely different. Matrix is not message passing. I'm not chucking an XML stanza to you; I'm taking my eventually consistent history of the room and replicating it over to you. It's the difference between doing a git push and sending an email, if that makes any sense. Also, everything is a group conversation. So you have rooms and you have people in the rooms. If there are two people in the room, then it's a kind of direct conversation, but that's very much a subset of the general idea of group conversations. And also, end-to-end encryption is in there as a first-class citizen in the whole protocol. There are the obvious differences that we're using HTTP and JSON, but it's not XMPP with JSON. As I said, it's got very big other differences. And finally, I guess a big philosophical difference is that we're not trying to be the XKCD 15th standard that solves everybody's problems. Instead, we're focusing on bridging together all of the other silos. So if you want to keep using XMPP, please do. If you want to use IRC, like I do, please do. The only point here is to have a protocol really optimized, in some ways, for bridging. And if you want to use it natively, that's cool too. So, actual architecture; I've kind of already touched on it. A federation of servers; clients connect to servers. Application services are a bit like clients, but on steroids.
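To make the earlier "git push versus email" comparison a little more concrete, here is a toy sketch of history replication. It is a deliberate simplification and not how Matrix actually resolves history: each server's copy of a room is modelled as a dictionary keyed by event ID, syncing is a plain union, and ordering uses a hypothetical `ts` field. All names and data are invented for illustration.

```python
def merge_histories(local, remote):
    """Union two event stores keyed by event ID; shared events collapse."""
    merged = dict(local)
    merged.update(remote)
    return merged


def timeline(store):
    """Order events deterministically so every server renders the same history."""
    ordered_ids = sorted(store, key=lambda event_id: (store[event_id]["ts"], event_id))
    return [store[event_id]["body"] for event_id in ordered_ids]


# Two servers that each saw a different part of the conversation.
server_a = {
    "$1": {"ts": 1, "body": "hi"},
    "$2": {"ts": 2, "body": "anyone there?"},
}
server_b = {
    "$1": {"ts": 1, "body": "hi"},
    "$3": {"ts": 3, "body": "yes"},
}

# Replicate in both directions; afterwards both hold the full room history.
server_a = merge_histories(server_a, server_b)
server_b = merge_histories(server_b, server_a)
print(timeline(server_a))
```

The point of the sketch is only that the unit being exchanged is the room's history rather than a single message addressed to a recipient, so either server can carry on if the other disappears.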
The application service idea rips off the IRC structure, where you have an IRCd and then you have IRC services like NickServ, which basically are users, but they have superpowers. Same architecture here. The application services allow you to build arbitrary functionality and bridge to other things, and they hang off a server like a big client. And then finally, the interesting one for this talk (at last; I've been wanting to do this talk for ages) are these slightly suggestive-looking identity servers sitting in the corner. And they honestly are an embarrassment. I'll talk about it in a minute, but basically they map third-party identifiers like email addresses and phone numbers and Facebook IDs, whatever, into Matrix's internal IDs. And at the moment, these are logically centralized. This one is matrix.org and that one's running on vector.im, but it's basically a single database that maps identity. The one thing you need to know is that it's only mapping identity. It's not where your accounts live. It's not where your passwords or anything live. It's literally just a naming service that says, hello, this email address maps to this person on Matrix. And this is one of the big embarrassments of Matrix and one of the things that we're hoping to fix in the future. The ecosystem as a whole: the main deliverable is the spec. We provide a bunch of server implementations. Synapse is the only one that you can really use today. It's Python and Twisted. It hit 0.19 yesterday. And it actually doesn't suck anymore. Honestly, it's taken us two years to get to this point. It started off as a proof of concept that was very inefficient, very organically grown, and it then grew even more. Over the last six months or so, we've been collapsing it back down to a sensible size. And certainly, as of 0.19, it now uses an almost reasonable amount of RAM and is almost kind of efficient.
So, for instance, on the matrix.org server, we're taking about 10 messages a second in and pushing out about 2,000 messages a second, because the rooms are so big. And that's taking about 40% CPU, fairly flat, and admittedly quite a lot of RAM. But my personal server is now running at about 350 megabytes of RAM, which you might think is huge relative to a little XMPP server, but it's doing a lot more. We also cache a lot more. It's not perfect. And more excitingly, though I'm not going to talk about it much, there's Dendrite, which is the code name of our next-gen homeserver, which we're writing in Golang, and which we're aiming to be an order or two of magnitude faster than the Python one. And then we've got bridges that connect through to IRC and everything. The orange stuff is from the community. So we've got lots of servers, like Ruma in Rust, being written by the community, and lots of gateways and bridges and services. Then on the client side, we provide the JavaScript stack, the iOS stack and the Android stack, and each is split into three layers in theory: the HTTP wrapper, then the UI components, and then the app itself at the top. And we have some very ugly reference apps. And then you have Riot, which is the flagship, really glossy, hopefully really glossy, app built on top of those. And meanwhile, from the community, you have lots of different clients and services if you don't like any of those. So what do you get? Very quickly: conversation history, group messaging; end-to-end encryption is the big new one throughout the whole thing, currently implemented in there, there and there. Therefore, anything on top of that gets it. We're looking at how to roll it out to everybody else as well. VoIP signaling, push notifications, yada, yada, yada. It's sort of parity with everything you'd expect from a communication system. So what does it look like? Very quickly, let's go and look at Riot.
So here's Riot, having loaded a totally random room. Let's go to Matrix at FOSDEM. And that's our internal room. People comparing their panel of raisins. And this is the dark skin, by the way. If anybody has ever got upset that Riot was very, very bright before, this is now what it looks like if you turn on the dark skin, and it makes your eyes hurt less. If we go to a big room, like Matrix HQ, then this has got 6,000 people in it, and they're spread over about 600 servers. You can see that Matrix IDs here are of the form username, colon, domain. So it's a bit like an email address, but it deliberately isn't one, to avoid it being confused with an email address or a Jabber ID. You can see people chatting about the thing. Read receipts coming down the right-hand side doing a funky Tetris thing. Why does Slack still not have read receipts? If you want a chat system that has read receipts, come to Riot. On the left-hand side, I'm in about 400 one-to-ones and about 400 rooms. So we're pushing the limits a little bit there. But in practice, it works relatively well all the same. If I expand the list, you can see I'm not even lying, with all of my one-to-ones hanging around there. And we have it, obviously, on Android. It's on the F-Droid store from a FOSS perspective. And it's all great. Enough talking about Riot itself; on to the interesting stuff. Some quick stats, basically. We started two years ago. We're up to about 750,000 accounts on the matrix.org node. Now, we've got 1,500 servers that we can see out there, which take about 50% of the traffic. So at the moment it is a bit centralized on the original matrix.org node, but the community and the ecosystem itself is growing out. A much better graph is this one, which is showing the messages per day going through matrix.org. And we've filtered out all of the bridged ones going through to IRC and XMPP. So this is the pure native traffic.
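The Matrix ID form pointed out in the demo, a username and a domain joined by a colon with a leading `@`, can be pulled apart with a few lines of code. This is a minimal sketch under that description only; it does not attempt the full grammar or character rules of the real spec, and `parse_matrix_id` is a name invented for this example.

```python
def parse_matrix_id(mxid):
    """Split '@username:domain' into (username, domain).

    Only the shape described in the talk is checked; real Matrix IDs have
    further rules about allowed characters that are not enforced here.
    """
    if not mxid.startswith("@") or ":" not in mxid:
        raise ValueError("not a Matrix ID: {!r}".format(mxid))
    # Split on the first colon only, so a domain with a port stays intact.
    username, domain = mxid[1:].split(":", 1)
    if not username or not domain:
        raise ValueError("not a Matrix ID: {!r}".format(mxid))
    return username, domain


print(parse_matrix_id("@alice:example.org"))
```

The leading `@` and the colon separator are what keep the identifier from being mistaken for an email address or a Jabber ID, which is the design choice described above.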
On that graph you can see we have this slightly scary exponential whoosh happening, which is why we spend a lot of our lives doing ops and optimization work on Synapse at the moment to support that. So let's get to the future, halfway through the talk. What are the big limits today? Identity servers, already touched on briefly. Centralized accounts. And then the big ones: spam and reputation, and also metadata protection, but I think I'm going to punt that to this afternoon's talk if people want to come and hear about it. So, identity servers. As we said, we have Matrix IDs today. These are opaque: it's @username:domain.org. The whole point of these is they're not meant to be human-visible. We don't want people to put Matrix IDs on their business cards. Ironically, I have one on mine, but that's a bootstrapping step. So instead, the idea is that you're meant to invite people by third-party IDs. So you take your address book on your phone and you look at the email addresses if you want, phone numbers if you want, Skype IDs, Facebook, LDAP usernames, whatever the hell it is. We don't care. We just want to aggregate all of the other ones out there. And the identity servers just map, as I said at the beginning, from third-party IDs to Matrix IDs. And the current solution is terrible. It's like a thousand lines of Python running in a little HTTP server called Sydent. And it's logically centralized. This is a disaster, really, because you shouldn't have to trust a centralized ID mapping server. Also, we've got the problem that the data is very sensitive, because it's everybody's email addresses, or a hash of everybody's email addresses or phone numbers, which is still quite enumerable. And it's also got pseudonymous properties we want to protect. Identity mappings themselves have to be trustworthy. So if somebody asserts that I own my email address, we don't want to get it wrong. Otherwise, they're going to be hijacking my account, basically.
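The remark above that a hash of everybody's phone numbers "is still quite enumerable" can be shown with a small sketch. Everything here is an assumption made for illustration: the directory layout, the use of unsalted SHA-256, and the sample number and Matrix ID are invented, and this is not a description of how Sydent stores data.

```python
import hashlib


def hash_identifier(identifier):
    """Hash a third-party identifier (assumed scheme: unsalted SHA-256)."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


# Hypothetical published directory: hashed phone number -> Matrix ID.
directory = {hash_identifier("+447700900123"): "@alice:example.org"}


def enumerate_numbers(directory, prefix, digits):
    """Try every number under a prefix and report the ones that are registered."""
    found = {}
    for n in range(10 ** digits):
        candidate = prefix + str(n).zfill(digits)
        mxid = directory.get(hash_identifier(candidate))
        if mxid is not None:
            found[candidate] = mxid
    return found


# Only 1,000 guesses are needed to cover this three-digit range.
print(enumerate_numbers(directory, "+447700900", 3))
```

Because the space of valid phone numbers is small and structured, hashing alone hides very little; anyone holding the hashed table can walk the number space and recover the mapping.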
We also have the problem of somehow needing to trust the reputation of the validators. So if I am trusting some random third party, like Facebook, to claim that my Facebook ID maps to my Matrix ID, I want to know whether I trust Facebook to manage Facebook IDs or not, if that makes sense. So, possible solutions. This talk is not about telling you what the solution is. It's giving you some science fiction to muse on about the things that we're looking at at the moment. Keybase is so close to this, it's untrue, except it's still basically centralized, even if they publish the root of the mapping database that they maintain. And please, by the way, contradict me if I get any of this wrong, because half of the point is hoping that somebody's going to come up to me after this and say, oh, well, what you want to do is this. So please do tell me. Keybase also doesn't handle email and it doesn't handle phone numbers, which is a bit of a shame, because those are the two main identifiers that people use for human communication. Then there's something like Blockstack, which is obviously a blockchain-based way of broadcasting identities. But they have the problem, as far as I can tell, that they basically blindly trust the identity validator: if they let you on the network and you start asserting that an email address goes somewhere, they basically say, oh, OK, we believe you. Which is fine if it's a relatively closed network, but not if you're making it completely open. Then you have something like WebFist. Anybody know WebFist? Nobody knows WebFist, and I don't know why. So this was written by Brad Fitzpatrick at a hackathon a few years ago, at an IndieWeb thing, I think, in Portland. And specifically for email, it uses DomainKeys to assert ownership of an email address. It's basically decentralized WebFinger, hence the name WebFist. And it looks really promising, but it's never taken off. It even works today.
Persona had some characteristics of this for email, and as I'm sure everybody knows, poor old Persona, rest in peace. Then you've got other decentralized ledgers. There's Sovrin from the guys at Evernym, which uses its own ledger. You've got uPort, built on Ethereum. You've got Stellar, which is its own sort of ledger system. You've got Namecoin as a fork of Bitcoin. These look promising for distributing the mapping data, but they still don't really solve the validator trust problem of who the hell do I actually trust to start making assertions into this database. You can go down the DNS route, like good old ENUM back in the day on classic DNS. And there's the GNU Name System as a sort of decentralized alternative to DNS. And then in the last couple of days, in fact, we've seen the community getting pissed off with us and saying, hang on, guys, this sucks; there must be a better identity server than this. And somebody just contributed a Java alternative implementation of the same APIs called mxisd, which at least allows fall-through. And it looks a bit like DNS for this sort of thing. And I'm really running out of time. So the moral of the story here, though, is that this isn't anything specific to Matrix. Everybody in this room, especially this room, needs it. I mean, it's basically the problem of saying, hello, I've got an email address; how do I map that to a Bitcoin ID? Or any decentralized thing: it could be an XMPP address, it could be an IPFS hash, I don't care. And yet we still haven't solved it. And this is a huge stumbling block to the evolution of the decentralized web. Decentralized accounts. At the moment, the rooms are decentralized in Matrix. The accounts are not. @matthew:matrix.org is stuck on matrix.org. It means that we're bound to DNS. It means that we can't have backup homeservers. We can't even do secondary MXs. We can't migrate, which is pretty crap. Possible fixes.
Well, we can use the identity server to treat MXIDs as a kind of indirection layer and basically just say that, hey, @matthew:matrix.org should map through to these secondary ones. Kind of low-tech, but it would probably work quite well. And that's probably what we'll do. Alternatively, we could go the whole hog: ditch DNS, use fingerprints of users' public keys, and then you have mappings where you keep the MXIDs, but now they're a form of third-party ID. So you take your MXID, map it through to the public key, and then you use something like a DHT, possibly, to discover which servers are actually hosting that ID. The third one I had was spam. Unfortunately, spam is here today, and much of it is bridged from IRC. I'm sure some people here have had the unpleasant experience of receiving 2,500 room invites into their Matrix client overnight because some script kiddie on Moznet went and enumerated a room and did PM spam to everybody, and the bridge helpfully bridged it all through into Matrix. On the flip side, we also have a potential disaster for messages going the other way, because having bound everything into one network and put a simple HTTP API on it, unless we have very good reputation and anti-spam stuff, it just becomes the world's easiest way ever to spam people, short of SMTP. So it comes in two different flavors, invite spam and public room spam. We can mitigate this, mainly because we force an invite handshake for one-to-ones, but that still means that you can end up with 2,000 spam invites. And we're starting to see idiots coming into public rooms and saying, oh, come and buy Viagra or whatever. And they're doing that both natively and bridged in from other systems. We also have the complication that we're doing end-to-end encryption everywhere, so we can't do the SpamAssassin trick of just going and filtering everything out. So somehow we need something else, and one possible solution is to assign reputation to users.
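Going back to the "whole hog" option for decentralized accounts described above (MXID, then public key, then hosting servers), the two lookups can be sketched as follows. This is science fiction in code form, matching the spirit of the talk: the fingerprint scheme, the two tables and every value in them are hypothetical, and a plain dictionary stands in for the DHT.

```python
import hashlib


def fingerprint(public_key):
    """Derive a short fingerprint from raw key bytes (assumed scheme)."""
    return hashlib.sha256(public_key).hexdigest()[:16]


def resolve_servers(mxid, id_to_fingerprint, dht):
    """Follow MXID -> key fingerprint -> list of servers hosting the account."""
    fp = id_to_fingerprint.get(mxid)
    if fp is None:
        return []
    return dht.get(fp, [])


# Hypothetical data: the MXID is now just another third-party identifier.
alice_key = b"not-a-real-public-key"
id_to_fingerprint = {"@alice:example.org": fingerprint(alice_key)}

# Stand-in for a DHT: fingerprint -> servers currently hosting that account.
dht = {fingerprint(alice_key): ["hs1.example.org", "hs2.example.net"]}

print(resolve_servers("@alice:example.org", id_to_fingerprint, dht))
```

With the account keyed by a fingerprint instead of a DNS name, adding a backup server or migrating becomes a change to the second table, while the human-facing MXID stays the same.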
So, let's talk about reputation quickly. Users need to be able to filter out low-quality content. Now that can be anything. It can be obvious spam, it can be offensive stuff, it can be stuff they don't believe in. Who knows? Somebody they don't like, their ex-girlfriend, whatever. So in a system like this, it has to be morally relative. We are not in the business of doing DNS blacklists, where you say, this person is a spammer, please never receive anything from them. Instead, we need to let people curate their own blacklists or their own graylists. Because one guy's terrorist is another guy's freedom fighter; one guy's spam is another guy's direct marketing. And obviously, just because I want to filter out certain political yabbering doesn't mean that you want to do that. So we have to do this in a way that lets users visualize and curate the algorithmic filtering that's going on. So, possible solutions. Well, the thing that I've been thinking about the most is basically letting every message in Matrix be rateable in a very free-form way. So you can have upvote and downvote buttons on a message, or you can go a little bit further and do Facebook- or Slack-style reactions to them. It could be tags. So literally just say, well, I'm going to tag this as spam, I'm going to tag this as genius, and I'm going to tag this as stupid. And you can obviously see that the richer the rating, the more risk that the rating itself needs some kind of moderation system. So you probably want to keep it simple. Even the voting can go wrong: if somebody does something embarrassing, you can abuse that person by deliberately upvoting it so everybody can see the embarrassing thing they did. So, very quickly, last slide. Reputation solutions.
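The idea above, free-form ratings on messages combined with each user curating their own filter rather than sharing one global blacklist, might look roughly like this. It is only a sketch of the concept: the rating tuples, the tag names and the `hidden_events` helper are all invented for illustration and do not correspond to any existing Matrix API.

```python
# Hypothetical ratings: (rater, event ID, free-form tag).
ratings = [
    ("@alice:example.org", "$1", "spam"),
    ("@bob:example.org", "$1", "genius"),
    ("@carol:example.org", "$2", "spam"),
]


def hidden_events(ratings, trusted_raters, hide_tags):
    """Return the events one user chooses to hide.

    Each user supplies their own list of raters they listen to and the tags
    they want filtered, so the result is relative to that user.
    """
    hidden = set()
    for rater, event_id, tag in ratings:
        if rater in trusted_raters and tag in hide_tags:
            hidden.add(event_id)
    return hidden


# Two users looking at the same ratings and reaching different conclusions.
dave_hides = hidden_events(ratings, {"@alice:example.org"}, {"spam"})
erin_hides = hidden_events(ratings, {"@carol:example.org"}, {"spam"})
print(dave_hides, erin_hides)
```

Nothing in the sketch declares a message to be spam for everyone; the same ratings produce a different view for each user, which is the "morally relative" property being asked for.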
A possible solution is that you take all the upvotes and downvotes that you've seen, and that forms an implicit social graph. So we're not forcing people to say, I'm friends with this person, but just the fact that you've gone and repeatedly downvoted them probably tells you something about the relationship. There's a huge UX challenge here. You can also spot possible voting rings by doing cluster analysis on that graph. You can correlate the votes with the content in the public messages, not the end-to-end encrypted ones, and help visualize reputation, and that could be presented as something like: 95% of users who liked this message also liked whatever. Then you can also look at transitive trust: oh, by the way, 90% of your friends like this too, which is a useful data point even if you choose to ignore it. And critically, you have this huge challenge of letting the user curate and visualize. So 70% like it, 90% hate it. You want to do it anonymously, because you don't want to be gathering metadata on who likes whom on a global basis. So it's at least got to be pseudonymized, and you can also have other indicators like IP address, ISP rating, blah, blah, blah in there. So spam can be handled the same way. You can also make people spend money to talk, but that's hard and might be over-engineered. We don't know yet. Everybody needs it. It's not just Matrix. We're talking about basically having a flexible identity and reputation system for the web in general. Come to the talk this afternoon at three if you want to hear about crypto and metadata privacy. What's next? Lots of cool stuff, which I've just described. We need help. Please run servers. Thank you very much. Thank you. Any questions, if we have any time? One minute and 30 seconds for questions. Quick question and answer. Anyone? Sorry? Yes, yeah, I'll go and publish on the blog afterwards. The slides will be available on the blog. You mentioned IndieWeb with WebFist. Sorry, IndieWeb? Okay, yeah.
Yes, so the IndieWeb community are also looking at solutions along these lines, and I've spoken a bunch to some of the IndieWeb guys. Nobody there has said, well, clearly you just need to use this, we've solved this already. But somebody must be. Everybody's trying to work towards the same thing. We must be nearly there. Yeah, you put it on your website. Right, so I mean, is that like WebFinger? Basically. Yeah, so the W3C has a working group looking at basically solving this sort of thing. So you'd hope that we're going to get there eventually. And repeating the question: it was whether there is a