We have Krista here; she's head of engine development on the PEP team, so let's give a warm round of applause to Krista, please. Okay. Good afternoon. I guess it is, by one minute or so. So, as she said, my name is Krista Bennett. I am the core engine developer for the PEP project. How many of you actually know anything about the PEP project? Okay. A few of you have heard about it. Typically the talks about PEP have been the sort of sales pitches we want to get out to people about the project, but I am not a PR person. So instead this will be slightly more technical. It's not super technical, but I've worked there for about two and a half years now, and it's kind of an ambitious project. Pretty Easy Privacy, or PEP, aims to bring encryption to the masses and make it easy for everybody. Wanting to do that gives us a number of design goals that are really, really challenging, and I was thinking about myself 20 years ago, when I started doing software development, and about the lessons I would tell myself if I could go back in time, based on what we've learned in doing this project. So what I'm going to do is explain a little bit about PEP. It'll be a bit of a speed walkthrough, because I think a lot of people have already heard the long-form "use PEP, it's great and it's easy" speech. And then I want to talk a little bit about our architecture, because a lot of our architectural decisions have brought up some really unexpected challenges, at least for me as a developer. A large part of our software is intended to be free, because we like free software, it's good, but a lot of our users use proprietary systems, and we want to be able to reach them because we want everybody to be able to encrypt and be private.
That compatibility actually causes a lot of issues. And then email has been around for a long time, so you would really expect it to be super standardized, and it turns out it's really not. So I want to talk a little bit about that, because it's been, well, I won't call it the fun challenge, but it's been one of the more interesting challenges for us to continuously deal with. And then I'll wrap up with some of what we've all learned from working at PEP. So I'll start with what PEP is supposed to do. PEP is a secure, private messaging solution. Right now it's just for email, but it's basically intended to get everybody encrypting all the time so that we can beat mass surveillance, yay. If there's one person encrypting, you know, if I'm sending you some mail, and you and I encrypt to each other and nobody else does, that stands out. If everybody's doing it, nobody's interesting. So if we can get people doing it by default without thinking about it, then I think we've won something, and it also makes it a whole lot harder to search the internet for keywords. And I know grandma calls you up over the holidays, or makes you go up to her room and fix her computer, but she doesn't really want your tech support, so we'd like to give both you and grandma a break by letting her have a product that she can install and forget about, which still does the complicated work of encrypting your email. So this looks like I downloaded it off of our web page, because I did, but I'm going to give you a short overview of the PEP protocol just so you can understand what it's supposed to do, and then we'll get into the other stuff that I really wanted to talk about. So let's say grandma has downloaded PEP and she wants to communicate with her grandson Bob, and he also downloads our product.
When they both download PEP to handle their email, keys are generated on both sides, and they don't have to think about it. They don't do it on purpose; they don't have to understand what a key is. Then the first time Bob mails grandma, he sends her an unencrypted mail, because he doesn't know her key yet. He doesn't intentionally encrypt anything; he's just using the product, and the mail contains his public key. Grandma gets the public key, and when she reads the email, her PEP product sees that it's from a PEP user and goes, aha, it's a PEP user, I'm going to send my key back to him when I reply, and I'll encrypt to his key because it's there. It's not a trusted key yet, because we haven't gone over any particular secure channel to verify that these keys belong to these people, but even if they just stopped there, they would be communicating with encryption, and again, not worrying about it, not intentionally encrypting. But of course, we want to be able to verify keys. How many of you verify all the keys of all the people you encrypt to? And if anybody raises their hand, I'm going to call you a liar. Well, maybe one guy. Okay, one guy. We want people to be doing that verification process, but we want to make it easy, because reading out a super long hash over the phone to somebody is something that people just don't do. It's not that it's hard, but people don't do it. We want to make it simpler; we want to make it part of the interface and part of the process. So what we've done is map the fingerprint space to trustwords in various languages. This is a little bit wrong, because we now use a combination of both fingerprints to generate the trustwords. But basically, on each side, you push a button that says, I want to verify grandma's key.
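To make the trustword idea concrete, here is a minimal C sketch. It assumes the two fingerprints are combined with XOR (the talk only says "a combination of the fingerprints") and uses a hypothetical 16-word list indexed by 4-bit nibbles; the real engine's combination rule, dictionary size and word count may differ. The names `trustwords`, `combine_fprs` and `WORDS` are illustrative, not the PEP engine API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FPR_LEN 20 /* bytes in a 160-bit OpenPGP v4 fingerprint */

/* Hypothetical 16-entry wordlist: one word per 4-bit value. A real
 * deployment would use a much larger, per-language dictionary. */
static const char *const WORDS[16] = {
    "apple",  "bridge", "candle", "dragon", "eagle",  "forest", "garden", "harbor",
    "island", "jungle", "kettle", "lemon",  "meadow", "needle", "orange", "pepper"
};

/* Assumption for this sketch: the two fingerprints are combined with XOR.
 * XOR is commutative, so both partners get the same result no matter
 * whose fingerprint is passed first. */
static void combine_fprs(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    for (size_t i = 0; i < FPR_LEN; i++)
        out[i] = a[i] ^ b[i];
}

/* Write the first n_words trustwords (one per nibble of the combined
 * fingerprint, high nibble first) into buf, separated by single spaces.
 * Returns 0 on success, -1 if n_words is out of range or buf is too small. */
int trustwords(const uint8_t *a, const uint8_t *b, size_t n_words,
               char *buf, size_t buf_len)
{
    uint8_t combined[FPR_LEN];
    size_t used = 0;

    if (n_words > 2 * FPR_LEN || buf_len == 0)
        return -1;
    combine_fprs(a, b, combined);
    buf[0] = '\0';

    for (size_t i = 0; i < n_words; i++) {
        uint8_t byte = combined[i / 2];
        uint8_t nibble = (i % 2 == 0) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
        const char *word = WORDS[nibble];
        size_t wlen = strlen(word);
        size_t sep = (i > 0) ? 1 : 0;

        if (used + sep + wlen + 1 > buf_len)   /* +1 for the terminator */
            return -1;
        if (sep)
            buf[used++] = ' ';
        memcpy(buf + used, word, wlen);
        used += wlen;
        buf[used] = '\0';
    }
    return 0;
}
```

Because XOR is commutative, Bob's client can call `trustwords(bob_fpr, grandma_fpr, ...)` while grandma's calls `trustwords(grandma_fpr, bob_fpr, ...)`, and both still display identical words, which is the property a phone comparison needs.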
And so Bob and grandma get on the phone to verify each other's keys. The trustwords pop up, they read the words to each other, and if they match, hey, the keys are the ones they're intended to be: they can establish trust with one another, and from then on they will be using trusted keys back and forth to communicate with each other. The other thing we give them is a way to tell whether or not that trusted key is being used, or if there's been a compromise. And we do that simply by putting a color at the top of the message. So initially, when they send unencrypted mail back and forth to each other, that color is gray, because we have no trust information at all. Once we're actually encrypting to one another, we make it yellow, because we want people to know that those messages are at least encrypted. But again, no trust has been exchanged. So it's like a warning: yeah, it's encrypted and stuff, but you don't know that this is really Bob. It might be his evil younger sister, who you don't want to send mail to, because younger sisters are always evil, right? I don't have one, so I can say that. But once they verify things, then it's green. And if PEP detects some sort of attack, we'll turn it red. So if a bad key is used, or there's a known attack that we've detected within the system, we can let grandma and Bob know that, hey, something's not right. And we want to keep it that simple. And yes, under the hood, it's a lot more. It's what most of us are doing with GPG plus, plus, plus. But for Bob and grandma, we want to make sure that this is about as complicated as it gets. But we also know that, I mean, maybe grandma is technically inclined, but Bob does other things. He's more interested in other stuff. He's not necessarily going to want to install Thunderbird with Enigmail on his Linux system, because he doesn't have one. He has an iPhone. But we still want him to use it.
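The four colors can be read as a small decision function. The sketch below is a hypothetical simplification: the `msg_status` fields, the `privacy_color` enum and `rate_message` are invented for illustration, and the actual engine tracks a more detailed rating than three booleans.

```c
#include <stdbool.h>

/* Hypothetical, simplified summary of what the engine knows about a message. */
typedef struct {
    bool encrypted;        /* was the message encrypted at all? */
    bool key_verified;     /* did the partners confirm trustwords? */
    bool attack_detected;  /* bad key, known attack, ... */
} msg_status;

typedef enum { COLOR_GRAY, COLOR_YELLOW, COLOR_GREEN, COLOR_RED } privacy_color;

/* Map the status to the color shown at the top of the message. */
privacy_color rate_message(msg_status s)
{
    if (s.attack_detected)
        return COLOR_RED;      /* a detected problem overrides everything */
    if (!s.encrypted)
        return COLOR_GRAY;     /* no trust information at all */
    if (!s.key_verified)
        return COLOR_YELLOW;   /* encrypted, but key not verified */
    return COLOR_GREEN;        /* encrypted to a verified key */
}
```

Checking for an attack first means red overrides every other state, matching the "something's not right" warning described in the talk.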
And so one of the challenges for us is that we want to be able to reach people where they are. It does not make sense to say, well, you know what, if you want to encrypt, you're just going to have to get down and dirty, because most people won't do it. My father's a chemistry professor. He's a very bright man. He can understand the math behind encryption, but he does not want to go through the pain of setting things up to write encrypted emails to me. He just doesn't. He's got other things to do, and he's told me that. But I'd still like him to use it. Now, he uses Outlook. I can say whatever I want about that, but that's what he uses, and that's all he's going to use. So we want to ensure that we reach people where they are. But at the same time, we want as much of this as possible to be free and transparent. Closed-source solutions have all sorts of lovely problems. My former homeland is really fond of trying to slip back doors into various encryption products, and we don't want that. And so we've developed an architecture which, for us, seems to handle our main functions very well, in a way that's isolated. It remains free software, and we're still providing services for applications at the level where people actually are. So let me see if I can explain this. I am the core engine developer, so I work way down here in the engine. There are, I would say, two main functions, though there are probably other things we do that I should talk about. We make sure that all the cryptography happens down here, and we make sure that the mails get parsed and generated and dealt with, that everything gets decrypted and put in the right place, and that keys get imported. All of the crypto stuff, all of the really detailed MIME stuff that needs to happen to a message, happens down here.
Part of our architectural design, too, is to separate this sort of cryptographic complexity and protocol complexity from the UI complexity. You don't want me designing your UI. Anybody here use GeoCities back in the day? Blink tags? Yay. That's what it would look like. It would be awesome, and my son would never speak to me again. So we want people who are experts in writing applications to have an easy way to implement our protocol, but at the same time, we don't want them having to worry about the crypto and stuff. So we've separated those things and provided an API layer of adapters, which, sorry, sometimes I trip over my tongue, allow application developers to interact with our engine without having to deal with the really dirty stuff. And it allows us to rely on people who are way more expert and way better at user interfaces and design to deal with what actually faces our customers. So from the adapter level down, that's all free software, maintained by the foundation part of our entity, which is a nonprofit. The applications that we develop for money are handled by the company, PEP Security, and those are non-free. That doesn't mean you couldn't develop a free software client on top of the engine, but it does mean that this is how we have to make our bread. And again, we want to reach people where they are. It has some other kinds of implications, though. Because those are proprietary apps, it's sometimes very hard for them to interface with free software. We all kind of know this is an issue. And our decision to do things this way, to keep it free at the bottom and to make it available to everybody at the top, where everybody is, has created one main set of challenges for engine development. So there's that set, which comes from this architectural design.
And the other is simply the domain. Again, I've had email since 1991. You'd think it would be easy to deal with. I mean, it should be standard; everybody does the same thing. And it's really not. So yeah, it is not. I'll start by talking about the free and compatible ideas. We want free software, but it has to remain compatible with these upper-level proprietary software pieces, and there are three main challenges with that that I'll discuss. The first is making sure we all speak the same language, and using a portable language. C is great; you can use it everywhere. So the engine is all in C99, kind of. But Microsoft does some things in its compiler that are not exactly C99. So we always have these little challenges on various systems where, yeah, there is a standard, but platforms like to deviate just a little to make your life fun. We want to limit the dependencies we have, because, especially with free software, you might have several different licenses to figure out. And we have these two main things that we do down in the engine: cryptography and MIME processing. We don't really want to roll our own. I mean, rolling your own crypto, well, some people can do it. Where's Neil? Hi, Neil. You want somebody who knows what they're doing to roll their own crypto. And yeah, I understand the cryptography behind it, but I am not a cryptographer. And MIME, it just turns out, is really freaking hard. So we want to limit the dependencies to the stuff we really have to borrow from somewhere else. These adapters are really nice for making sure we speak the same language. Basically, an adapter is a translation from, say, iOS's version of a message object into what ours looks like, and that goes from Objective-C to C. Or for Android, it's JNI: Java to C. So that's how we sort of deal with it. But it's a challenge. Not quite as challenging, though, as the libraries themselves.
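The adapters described here do their translation in Objective-C or through JNI. As a language-neutral illustration, this C sketch shows only the translation step: a hypothetical platform-side message struct (comma-separated recipients, platform field names) is deep-copied into a hypothetical engine-side struct with an explicit recipient list. Neither struct reflects the real PEP engine or any platform API, and `adapt_message` is an invented name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical platform-side message, roughly as a UI layer might hold it. */
typedef struct {
    const char *sender;      /* "alice@example.org" */
    const char *recipients;  /* comma-separated: "bob@example.org, carol@example.org" */
    const char *subject;
    const char *body;
} platform_message;

/* Hypothetical engine-side representation with an explicit recipient list. */
typedef struct {
    char *from;
    char **to;
    size_t n_to;
    char *subject;
    char *body;
} engine_message;

static char *dup_range(const char *s, size_t len)
{
    char *p = malloc(len + 1);
    if (p) {
        memcpy(p, s, len);
        p[len] = '\0';
    }
    return p;
}

static char *dup_str(const char *s)
{
    return s ? dup_range(s, strlen(s)) : NULL;
}

void engine_message_free(engine_message *m)
{
    if (!m)
        return;
    free(m->from);
    for (size_t i = 0; i < m->n_to; i++)
        free(m->to[i]);
    free(m->to);
    free(m->subject);
    free(m->body);
    free(m);
}

/* Translate the platform object into the engine's struct (deep copy).
 * Returns NULL on allocation failure. */
engine_message *adapt_message(const platform_message *pm)
{
    engine_message *m = calloc(1, sizeof *m);
    if (!m)
        return NULL;

    m->from = dup_str(pm->sender);
    m->subject = dup_str(pm->subject);
    m->body = dup_str(pm->body);
    if ((pm->sender && !m->from) || (pm->subject && !m->subject) ||
        (pm->body && !m->body)) {
        engine_message_free(m);
        return NULL;
    }

    /* Split "a, b, c" into separate, whitespace-trimmed addresses. */
    const char *p = pm->recipients ? pm->recipients : "";
    while (*p != '\0') {
        while (*p == ' ')
            p++;
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        while (len > 0 && p[len - 1] == ' ')
            len--;
        if (len > 0) {
            char **grown = realloc(m->to, (m->n_to + 1) * sizeof *grown);
            if (!grown) {
                engine_message_free(m);
                return NULL;
            }
            m->to = grown;
            if (!(m->to[m->n_to] = dup_range(p, len))) {
                engine_message_free(m);
                return NULL;
            }
            m->n_to++;
        }
        p = comma ? comma + 1 : p + strlen(p);
    }
    return m;
}
```

The deep copy keeps ownership simple: once `adapt_message` returns, the engine no longer depends on memory owned by the platform layer, and `engine_message_free` releases everything.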
So for the most part, what we've had the most trouble with in terms of making things work together is reconciling crypto library licenses with what the proprietary platforms will let us use. Everybody knows the App Store is really, really nice about letting you have GPL libraries in there. As in, you can't. So right now, we have a couple of different crypto engines we use. We were using GnuPG. Most people use GnuPG; it's kind of a standard. But since we can't use GPGME, which is a library interface to it, in the App Store, we had to find some other free library that does crypto. Now, there aren't a lot of super awesome free crypto libraries out there, and you want to be really careful about those. So we use NetPGP, which is available under a BSD license. But I think the last official update of that was in 2010. We maintain a fork, which is more up to date, and that's what we've been able to find that works for us at this point. And that's something that's difficult, because then you're maintaining another project just so that you can deal with this proprietary store. Even if you have two great free software crypto libraries that everyone's going to use, you can't really just do a drop-in replacement. So we have kind of an additional layer; it's not an adapter level, but it packages things so that they look the same semantically as far as the engine's concerned. But it may be that one crypto engine does things incredibly differently. For example, NetPGP does not do detached signatures on anything but files. GPG will let us do it with a string we have in memory; we can get a detached signature for that, and it's something we need. NetPGP doesn't let us do that. So then you have to go into the code, figure out how to do it yourself, and write another adapter. And sometimes that can be messy.
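One way to make two crypto libraries "look the same semantically" is a table of function pointers plus fallbacks for missing features. This sketch assumes a hypothetical `crypto_backend` interface; the toy byte-sum stands in for real signing, and the stream-only backend mimics the NetPGP limitation described above by accepting only file or stream input. When the in-memory call is missing, the wrapper spools the buffer to a `tmpfile()` and uses the stream call instead.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical uniform interface the engine sees, whatever library is underneath. */
typedef struct {
    const char *name;
    /* Detached signature over an in-memory buffer; NULL if the library can't. */
    int (*sign_detached_mem)(const char *data, size_t len, char *sig, size_t sig_len);
    /* Detached signature over an open stream (e.g. a file). */
    int (*sign_detached_stream)(FILE *fp, char *sig, size_t sig_len);
} crypto_backend;

/* Toy "signature": a byte sum rendered as text. NOT cryptography; it only
 * lets the sketch run without a real OpenPGP library. */
static int format_toy_sig(unsigned long sum, char *sig, size_t sig_len)
{
    int n = snprintf(sig, sig_len, "toy-sig:%lu", sum);
    return (n >= 0 && (size_t)n < sig_len) ? 0 : -1;
}

static int toy_sign_mem(const char *data, size_t len, char *sig, size_t sig_len)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += (unsigned char)data[i];
    return format_toy_sig(sum, sig, sig_len);
}

static int toy_sign_stream(FILE *fp, char *sig, size_t sig_len)
{
    unsigned long sum = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        sum += (unsigned char)c;
    return format_toy_sig(sum, sig, sig_len);
}

/* One backend that can sign memory directly, and one that (like the
 * NetPGP case in the talk) can only sign files/streams. */
static const crypto_backend full_backend = { "full", toy_sign_mem, toy_sign_stream };
static const crypto_backend stream_only_backend = { "stream-only", NULL, toy_sign_stream };

/* Engine-facing call: same semantics regardless of backend. */
int sign_detached(const crypto_backend *be, const char *data, size_t len,
                  char *sig, size_t sig_len)
{
    if (be->sign_detached_mem)
        return be->sign_detached_mem(data, len, sig, sig_len);

    /* Fallback: spool the buffer to an anonymous temporary file. */
    FILE *fp = tmpfile();
    int rc = -1;
    if (!fp)
        return -1;
    if (fwrite(data, 1, len, fp) == len) {
        rewind(fp);
        rc = be->sign_detached_stream(fp, sig, sig_len);
    }
    fclose(fp);   /* tmpfile() files are deleted automatically when closed */
    return rc;
}
```

The engine only ever calls `sign_detached`, so swapping libraries means filling in a different table. A real implementation would also have to weigh the security implications of writing plaintext to disk, which this sketch ignores.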
One thing that is, I guess, one of my favorites: if you're a free software developer, one of the great things about it is that you get to make more decisions about how your project works. If you've got a religious conviction about a programming language or a way of doing things, and it's your project, you can do it. Your boss is not going to come to you and say, hey, you're fired, I want it this way. There are consequences to that, but you get to make those kinds of decisions. So even with really good libraries, sometimes the developers of those libraries have intentions that are not compatible with yours. One of the things we try to do with PEP is to give as few prompts as possible, especially prompts dealing with crypto, because we don't want people to have to answer, ten times, do you really want to delete this key? Do you really want to delete this key? We don't want them thinking about keys. But GnuPG, for example, prompts for certain things, and it does so very, very intentionally and, I think, with some force of purpose behind it. And getting out of those prompts turns out to be a lot harder than you would want it to be, because maybe the developer thinks it's so important that he doesn't want to make it easy for you to skip them. Or she; I could be doing it too. But sometimes there's also no good open source library solution, because the problem is hard. And that's what I'll get into after this: our domain, MIME, is really, really ugly. We do have a free software MIME library, and I don't want to be too critical of its developer, because I tried to write one, and it turns out it's a difficult, difficult problem. But regardless, often the tool you need for the development just isn't available. And then the question is: do you have the time? Do you have the resources? Do you have the knowledge to roll your own?
Or are you stuck with a solution that almost fits? Then you spend a lot of time trying to fit it into a box it's not really made for. That's been a constant challenge for us in this process, and I think everybody has it, but this is the first big paid free software project I've worked on, so I guess these are just my experiences. And then there are distributions, and under this I also count platforms. "Lions and tigers and bears, oh my!" is a line from The Wizard of Oz, a popular American children's story; this was "languages, libraries, and distros, oh my." The distributions themselves are not usually my nightmare, but the default versions they ship can be, and I'll give GnuPG as the example, because we had a big fun time with this. Debian's default, Ubuntu's default, and Fedora's default were all completely and wildly different. In fact, internal parts of those packages, so dependencies related to GnuPG, were very, very different from one another, and it turned out they had very different functionality. That functionality was super important to us, and some of those libraries were broken; we couldn't do anything with them. So if somebody installs the system default, which, of course, we kind of want to let people do, because we want this to be easy, it's bad when we have to tell them, OK, well, you have to go to a PPA to get a different version of this. Or we have to package another version and ship it with the product, so then they have two versions of the library on the system. It can be really difficult. I don't usually expect Debian stable to have bleeding-edge stuff, but sometimes it does, because it's well tested. And sometimes it's decrepit and hasn't been updated since dinosaurs walked the earth. And sometimes you've just got a mix of all of it. So that can be really tough. The other thing is that some distributions that we use have an installation candidate for a piece of software that we want, and sometimes they don't.
And sometimes that's something that's installed by default with the operating system; sometimes it takes a lot of extra installations. This is always a problem with dependencies, but it's become particularly messy for us because, again, we have these proprietary platforms on top. So it's not just a matter of, am I using Ubuntu or Debian? It's, well, what is iOS going to do with this? And what will Windows do with it? So this mobile-versus-desktop situation is bonus fun action for all of us. Sometimes I'm very glad I'm down in the engine, because while I do have to deal with it from a dependency point of view, I don't have to deal with the problem on top and answer the bug reports. OK, so as I said, this portion of our challenges comes largely from our architecture. But our domain is not really very easy either. Take the MIME standard: if you've ever looked at the source of a mail, sometimes it'll say something like Content-Type: message/rfc822. RFC 822 is the initial standard that was released for internet mail messages, back in, well, when dinosaurs walked the earth. And it would be nice if you could just take that grammar, shove it into a parser generator, and magically get something that'll parse all of your emails. You would also think it would be standardized, because you've got so many different email clients sending stuff out across the internet. How can we all possibly read each other's emails if we don't agree on something? Well, it's because there's this rule, the robustness principle, which says that if you're doing MIME, you should accept very liberally: take almost everything you get in, accept it, and find something to do with it. But what you produce, you should produce strictly according to the rules. Well, most people don't produce strictly according to the rules. So I guess it's accept liberally and produce whatever you want. And if your company is big enough, everybody has to accept it, which is really how it works.
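As a small illustration of "accept liberally, produce strictly", here is a C sketch that parses the value of a `Content-Type` header such as `Message/RFC822` tolerantly and emits one canonical lowercase form. RFC 2045 makes type and subtype case-insensitive; this sketch is more liberal still, ignoring any whitespace and dropping parameters after `;`. `parse_content_type` is a hypothetical helper, not libetpan's API.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Liberal parse of a Content-Type header *value*: case-insensitive,
 * ignores all whitespace, and stops at the first ';' (parameters are
 * dropped). Writes the canonical lowercase "type/subtype" into out.
 * Returns 0 on success, -1 if there is no non-empty type and subtype
 * or out is too small. */
int parse_content_type(const char *value, char *out, size_t out_len)
{
    size_t n = 0, type_len = 0;
    int seen_slash = 0;

    if (out_len == 0)
        return -1;
    for (const char *p = value; *p != '\0' && *p != ';'; p++) {
        unsigned char c = (unsigned char)*p;
        if (isspace(c))
            continue;                  /* accept liberally: drop stray whitespace */
        if (c == '/') {
            if (seen_slash || n == 0)
                return -1;             /* second slash, or empty type */
            seen_slash = 1;
            type_len = n;
        }
        if (n + 1 >= out_len)
            return -1;                 /* no room for this char plus '\0' */
        out[n++] = (char)tolower(c);
    }
    out[n] = '\0';
    /* produce strictly: require a non-empty subtype after the slash */
    return (seen_slash && n > type_len + 1) ? 0 : -1;
}
```

Canonicalizing once at the edge lets the rest of the code compare plain lowercase strings instead of repeating case-insensitive checks everywhere.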
But even if you're going to follow all the rules, it's not easy. An RFC, if you don't know, is a request for comments. It is often a formalized statement of the accepted practices for something. For internet mail messages, for example, it's what people were doing to send internet mail, and somebody said, well, we need to formalize this so everybody agrees on it. Then the request for comments goes to the IETF and is accepted or rejected or modified several billion times. So RFC 822 was the initial standard, but it's been superseded by RFC 2822, which has been superseded by RFC 5322, which I think is the current standard. But you can't just say, okay, I'm just going to read RFC 5322. For one thing, it refers back to the other two repeatedly. And if it were only those three documents, that would be great; those are just the ones for mail messages. But it's also one of these things where, let's say something changed between 822 and 5322: somebody out there has a client that doesn't implement the difference. So if something has been implemented or suggested in a standard, it is in the mail space forever. You have to deal with what was and what is, and sometimes they're in conflict. The other thing is, those were the standards I gave you, but these are updated by RFC 1123, 2156, 1327, blah, blah, blah, and that's all I could get with one level of clicking through the RFCs. So you have to read an awful lot just to kind of understand what's going on out there. The grammars sometimes even seem to be in conflict with themselves. So generating your own parser and generator would already be hard. But that's not enough, because that's just mail messages. At some point, people decided to send weird things to each other, like videos and PGP keys and other kinds of content. And that's where MIME comes in: the Multipurpose Internet Mail Extensions.
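One concrete piece of header grammar every mail parser has to get right is folding: a long header may be split across lines, and RFC 5322 (section 2.2.3) defines unfolding as removing any CRLF that is immediately followed by whitespace (a space or a tab), while keeping the whitespace itself. A minimal in-place sketch, with `unfold_header` as a hypothetical name:

```c
#include <stddef.h>

/* Unfold a header in place, following the RFC 5322 rule: remove every
 * CRLF that is immediately followed by WSP (space or tab). The WSP
 * itself is kept. Returns the new length of the string. */
size_t unfold_header(char *s)
{
    char *r = s, *w = s;

    while (*r != '\0') {
        if (r[0] == '\r' && r[1] == '\n' && (r[2] == ' ' || r[2] == '\t')) {
            r += 2;           /* drop the CRLF, keep the following WSP */
            continue;
        }
        *w++ = *r++;
    }
    *w = '\0';
    return (size_t)(w - s);
}
```

A CRLF that is not followed by whitespace is left alone, since it marks the end of one header field and the start of the next.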
And it's in five parts. It's great reading. If you've got nothing to do between now and New Year's, I highly suggest you avoid them. But again, these are also updated and obsoleted by lots and lots of RFCs, and they don't go away. If you really want to do it right, you end up having to come into contact with them. But there's more, because we don't just do MIME; we do MIME plus crypto. And I know I don't have all the RFCs for these, and Neil can make faces at me. These are the main top-level ones. There's one for encrypted messages and signatures via MIME, RFC 1847; it's the initial one. There's one for S/MIME, which PEP doesn't do. And then there are two for PGP/MIME, which we do. And I'm sure there are more; I think there's one that starts with 40 that I can't remember at the moment. So there's quite a bit of this, and this is really what it would take if you wanted to make a magic email parser that deals with everything out there. We have a free solution; it's called libetpan. And if you look at the code, which is difficult to understand and not terribly well documented, you see that the attempt has really been made to do what I just suggested: to take the grammar and literally translate it into a parser. And that is problematic, because big companies do whatever they want, and sometimes customers want weird things. For example, enterprises are concerned about security. I'm not really sure what attack they're concerned about, but Exchange has something called header firewalls. When you send out an email on the internet, you know, it's got all these headers: there are the ones we all see, if you don't hide them, like From and To and Subject. But there's also some additional stuff, like what time it was sent and the various servers it was received by along the way.
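For orientation, the layout defined by RFC 1847's `multipart/encrypted` type together with the PGP/MIME RFC 3156 is: a control part of type `application/pgp-encrypted` whose body is `Version: 1`, followed by an `application/octet-stream` part holding the ASCII-armored ciphertext. The C sketch below only assembles that outer structure with CRLF line endings; the boundary and ciphertext are caller-supplied placeholders, no encryption happens, and `build_pgp_mime` is a hypothetical helper, not the engine's generator.

```c
#include <stdio.h>
#include <string.h>

/* Assemble the outer PGP/MIME structure (RFC 3156, using the RFC 1847
 * multipart/encrypted type) into buf, with CRLF line endings.
 * `armored` must already be ASCII-armored OpenPGP data, and `boundary`
 * must not occur inside it. Returns 0 on success, -1 if buf is too small. */
int build_pgp_mime(const char *boundary, const char *armored,
                   char *buf, size_t buf_len)
{
    int n = snprintf(buf, buf_len,
        "MIME-Version: 1.0\r\n"
        "Content-Type: multipart/encrypted;\r\n"
        " protocol=\"application/pgp-encrypted\";\r\n"   /* folded header line */
        " boundary=\"%s\"\r\n"
        "\r\n"
        "--%s\r\n"                                        /* part 1: control part */
        "Content-Type: application/pgp-encrypted\r\n"
        "\r\n"
        "Version: 1\r\n"
        "\r\n"
        "--%s\r\n"                                        /* part 2: the ciphertext */
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
        "%s\r\n"
        "--%s--\r\n",                                     /* closing delimiter */
        boundary, boundary, boundary, armored, boundary);
    return (n >= 0 && (size_t)n < buf_len) ? 0 : -1;
}
```

Generating this is the easy direction; the hard part described in the talk is parsing what other clients actually send.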
And there's something called X-headers, which are supposed to be optional headers that people can use to put additional information in. We use some of those to identify our messages. But header filtering is used to filter out any non-approved header. That's perfectly legitimate according to the RFC, but it makes things difficult if you want to do anything with email headers. And I actually don't know if filtering out X-headers is okay; there are a bunch of others that are okay to filter out. But this kind of thing happens. Symantec decided to rename our attachments, I think, and that causes some additional problems. These are things that are either unspecified, or specified as things you're not supposed to do, and it doesn't matter. Sometimes people leave things out, or they add things that are not allowed. So there's, okay. There are two kinds of, I guess, returns: there's a carriage return and a line feed. They go back to the days when you were typing on a typewriter or a teletype terminal: technically speaking, a carriage return moves back to the beginning of the line, and a line feed advances one line. For us in C, these are just the characters \r and \n, but we call them CR and LF, for carriage return and line feed. This sounds really dumb, but it's used as a token everywhere in MIME messages. In order to parse a MIME message, understand what it's doing, and break it up into the right parts, the carriage-return-line-feed sequence, called CRLF, must be in all the right places at all the right times, or things go bananas. Except there are some clients which do what the RFC says not to do. The RFC says the body of a message is simply US-ASCII characters. There are extensions to that, but this is the mail RFC. And the only two limitations on the body are as follows. This is the first one.
CR and LF must only occur together as CRLF; they must not appear independently in the body. So a carriage return before a CRLF is technically not okay, and there are clients that don't accept that, and our library certainly didn't, because it was not okay in the grammar specified by the RFC. So we choked on that for a long time. It's small, but trying to find that and figure out what happens takes a lot of your time. So it's technical debt you take on every single time you use somebody else's library, because you're going to have to spend time figuring out why it doesn't work for you, and that means you get to know their code too. And sometimes people add things which aren't part of a standard. That doesn't mean they're wrong, but they aren't always entirely specified either. Now, I haven't looked at the current Memory Hole specification; it's a way of removing metadata from the headers of your mail and sticking it into the body so that it can be encrypted and stays protected until it's received on the other end. But when Memory Hole was initially specified, it was very vague. It's definitely not in the standard, and there are some additional things that Enigmail has put in to be able to use Memory Hole, which is a nice idea for protecting this metadata, but my parser is going to choke on it or, at the very least, ignore it. So that's one of those things where, again, we have to go back into the code, because it's not a standard thing. And sometimes something's totally standard but normal clients choke on it anyway. If you have a really long file name, or you use a non-US-ASCII character set, then there are some special things that happen with it in the MIME. Some clients just rename it, some clients ignore it entirely, and some use the correct format for it, and it turns out that a lot of clients choke when you do what the RFC says to do.
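The CRLF rule above (RFC 5322, section 2.3) is easy to check, and a liberal reader can repair violations before handing text to a strict parser. This C sketch is a hypothetical pre-pass, not what libetpan or the PEP engine actually does: `find_bare_cr_or_lf` reports the first stray CR or LF, and `normalize_crlf` rewrites lone LFs and lone CRs as CRLF, treating a run of CRs followed by LF (the "carriage return before CRLF" case) as a single line break.

```c
#include <stdlib.h>
#include <string.h>

/* Returns -1 if every CR and LF in s occurs as part of a CRLF pair
 * (the RFC 5322 body rule), otherwise the offset of the first stray one. */
long find_bare_cr_or_lf(const char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '\r') {
            if (s[i + 1] != '\n')
                return (long)i;       /* CR not followed by LF */
            i++;                      /* skip the LF of this CRLF pair */
        } else if (s[i] == '\n') {
            return (long)i;           /* LF without a preceding CR */
        }
    }
    return -1;
}

/* Liberal repair pass: returns a newly allocated copy of in where a lone LF,
 * a lone CR, or a run of CRs followed by LF each become a single CRLF
 * (a run of CRs with no LF after it becomes one CRLF per CR).
 * The caller frees the result; returns NULL on allocation failure. */
char *normalize_crlf(const char *in)
{
    size_t len = strlen(in), i = 0, o = 0;
    char *out = malloc(2 * len + 1);  /* worst case: every byte becomes CRLF */
    if (!out)
        return NULL;

    while (in[i] != '\0') {
        if (in[i] == '\r') {
            size_t crs = 0;
            while (in[i] == '\r') {
                crs++;
                i++;
            }
            if (in[i] == '\n') {      /* "CR...CR LF": one line break */
                i++;
                crs = 1;
            }
            for (size_t k = 0; k < crs; k++) {
                out[o++] = '\r';
                out[o++] = '\n';
            }
        } else if (in[i] == '\n') {   /* bare LF */
            i++;
            out[o++] = '\r';
            out[o++] = '\n';
        } else {
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
    return out;
}
```

Collapsing `CR CR LF` into one break is a design choice; the alternative of turning each stray CR into its own CRLF would insert an extra blank line that the sender probably never intended.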
So it's not an easy problem, and that's why I don't want to bash the provider of our library. But at the same time, it costs a lot of time and effort to figure out why something doesn't work, and then you have to go back to the RFC and go, aha, this is or isn't in the RFC, and that's why it's not implemented. So this is my personal nightmare; every time I have to go back in there, I want to cry. What does this all mean? For the domain we're in, it means that MIME is hard and standards don't solve everything. My boss is a really bright guy, he's really good at code generation and all sorts of stuff, and he really wanted to be able to just generate a MIME parser. It just turns out that I think the dream of magically generating a MIME parser may be doomed to failure, because figuring out what the actual grammar is at the moment is probably impossible. So I've kind of babbled about what I do and complained about things, and, like I said, the talk I wanted to give was more along the lines of: if I could talk to myself 20 years ago, when I was taking a fairly useless software engineering class, what would I tell myself that would be more useful? The first thing is about interoperability. We try to be as interoperable as possible, and I think we plan pretty well for it, but it's still a lot of work, and it's not always the work you expect. I think it's really important in development to realize that it doesn't matter how well you've crafted something; things will come up that you may not have the tools to deal with, and you have to develop them. Using the right language won't solve all your problems, because you may have to deal with too many other languages and problems for that to be your only source of interoperability. And libraries won't solve them either. You've still got work to do even if somebody else has done the hardcore work for you first. You'll often find yourself at the mercy of other people's software decisions.
Again, for us, we have these two core areas we deal with. For MIME, it's that we have such a literal parser and generator; for crypto, it's really more a matter of having different takes on how people should use cryptography. And software reuse is really great, but you have to pay attention to how much extra work you're going to have to put into it. And last of all, I think everyone should consider therapy, it's awesome, but if you're going to work with MIME, maybe have an extra therapist, because it's a nightmare. I hope I didn't babble everybody's ears off; that's really the end of what I had to say. I just wanted to give some feeling for what kind of work is going on under the hood, because we have a lot of PR that goes on with PEP, but not a lot of, well, here's where we get our hands dirty, and I'm the dirty person. I have a student out here who is laughing, I'm pretty sure. If you have any questions, I'm willing to take them, either about PEP itself or about my current nightmares. Thanks for being here and listening. Thank you, Krista. So we have lots of time for Q&A, so if you have any questions, there are two microphones in the room and you can line up there. And while there's no one standing up yet, do we have questions from the internet, Signal Angel? No, the internet is fully encrypted and has no questions. Excellent. Anybody in the room? Anybody still awake, raise your hand. Okay, well, it seems like there are no questions, so let's thank Krista again and give her a nice round of applause for this awesome talk, please. Thank you.