Thank you all for showing up today. I'm going to talk a bit about my pet project called Tala. That's going to take 30 to 40 minutes, and then there will be questions afterwards.

The talk is split into a couple of parts. We're going to start with an introduction, then talk a little about the C programming language that we all love. Then we'll walk through a bit of the architecture behind Tala, some testing, and how we represent a very specific component of the software in Erlang. At the end there are questions. If there's anything you want to interrupt me with and ask during the talk, feel free to do so.

So Tala is an attempt to write a robust third-party implementation of Tor. I decided to do it in Erlang because I really like Erlang. Is there anyone in here who's familiar with Erlang? Then you can raise a hand. Oh wow, there's a lot of people. Cool, very cool. This is also a talk to create some enthusiasm for the project, so if you want to hack on it afterwards, come by and we can talk about that.

Everyone who is here, I guess, knows something about Tor and knows the EFF picture of how Tor works. Does everyone know that picture?
It's used quite a lot, and I wanted to know in more depth how it worked. So I wanted to try to implement it: look at the specs, see how it works, see what the problems are with designing these things, and so on. It's also one of those typical evenings-only open-source projects where I have a little time every now and then, and if I have time to hack a bit on it, I do. Other than that it doesn't get too much attention, because I also have work to do.

A little bit of history. It all started at the Erlang User Conference in 2015, which takes place each year in the summer, where I met Linus from the Tor Project. He was very enthusiastic about doing a Tor implementation in Erlang but didn't have much time to do so. So we sat back during one of the talks and talked about what would be required and how far we would be able to get in what kind of time frame. We started development in August 2015, had a very simple proof-of-concept daemon up and running a bit later that year, and then went to a slightly more sensible design a couple of months after that. A guy called Lasse also joined, and Lasse has been working on it quite actively ever since, cleaning up some of the initial code and getting it to work in a more stable manner. So right now we're two active people on the project, and we're of course interested in finding more.

There is the official Tor implementation in C, and then there are a couple of other implementations, as you can see here. I'm not going to go over all of them. They are specific to their languages, and some of them are further along than others. Galois, a company that does a lot of Haskell, is pretty far with their implementation. They have hidden service support.
I think they got up to that point and then nothing else. I think the Subgraph people are the ones who started the Orchid project, which is the Java implementation, and which is also pretty interesting. So these are the ones you could look at. The one that inspired me initially to do it in Erlang was the Go implementation, which actually managed to run on the production network as a very, very fast relay.

So we set out to figure out a minimal viable product. But because of the anonymity-sensitive nature of Tor, we really didn't want to implement a client, because the client is where all the difficult logic is: it's the client that decides the circuit path, it's the client that does all the crypto negotiation, et cetera. So we settled on doing a middle relay as the primary priority and, as a secondary priority, exit nodes. I guess everyone is familiar with the difference between these two? Great. Onion services we're ignoring for now, except where we have to. We also want to use as few C dependencies as possible; that's part of the goal, and we're going to talk about that a bit later. It was also important when we started the design that we made it modular enough that we could go back and rewrite a core part of the system from scratch if we did something stupid, because there's a big chance we're going to do that.

When you do something like this you have to be a bit careful. We cannot run these things that we experiment with on the production network. Few test networks exist. I don't know if people are familiar with Bitcoin: Bitcoin has a very active test network that is running well, with nodes on it doing work. Tor has a small test network, which is only for internal people, and you have to get invited to it. So we have to settle for a smaller test network that we run ourselves. I'm going to get back to that when we look at how we test the source code.

I also started an email thread in August about what the Tor Project saw
as important aspects to think about when you start a project like this, because there might be other people doing it at some point.

There are already a lot of people in here who know Erlang. Erlang is a functional programming language made by Ericsson in Sweden. It focuses a lot on concurrency, where we pass data around using messages. Some people might be familiar with that from other languages, but Erlang is based around the actor model, where you design everything as separate processes that communicate through message passing.

It has one really neat feature, which the people who do Erlang probably know as binaries. It's a way to do pattern matching, as in functional languages, on binary structures, which is really, really awesome. It was the one feature that really sold me on Erlang initially, and it makes Erlang very nice for working with network protocols, both string-based protocols and real binary protocols like Tor.

It runs on a virtual machine called BEAM, and it compiles to bytecode files that you load into the virtual machine. We're going to look a bit more at that. You write code in modules, and modules consist of functions: you expose a function interface from the module. You tend to model your complex objects with state as processes, and processes communicate via message passing, as I said before. When we have a lot of concurrent processes doing a lot of different work, we sometimes need some ordering of the events in the system. So we have processes that take care of serializing the flow of data: you send a message to such a process, and since it can only handle one message at a time, the messages that come after get a well-defined order.

The language has very good, very mature testing frameworks, and it has very rich mocking features, where we can change a function just for a test
and alter it. It's like playing with Lego, where we can quickly move things around when we're doing some kind of test. That's especially useful for the cryptography code.

You have to have a special mentality when you're working with Erlang: you have to see it as writing an operating system that is domain-specific for your program. The way Erlang works is that you have a lot of modules, which are grouped into applications, and you then start the applications inside this domain-specific operating system that you have built. So we also have a set of applications, each consisting of a set of modules, that we are running.

One of the really nice features of Erlang that many languages don't have is that you can hot-load code. I thought that was nice for Tor, because when you have a Tor relay and need to restart it, you terminate every connection that is relayed through you. With hot loading, unless there is a very critical bug, we would be able to keep the connections open while we upgrade the system. That's a very nice property to have.

Any questions about this? No.

We have a community, and you're free to join our Tala IRC channel. It's on the same IRC network as the BornHack festival. We have an IRC server and an onion URL you can connect to for chatting, and all the developers are basically sitting in there.
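As an illustration of the binary pattern matching described earlier: in Erlang, a fixed-length Tor cell can be taken apart with a single pattern such as `<<CircId:32, Command:8, Payload:509/binary>>`. The sketch below is not Tala's code. It mimics the same idea in Python with the standard `struct` module so that it can be run on its own. It assumes the 514-byte fixed-length cell layout of link protocol 4 and later (4-byte circuit ID, 1-byte command, 509-byte payload), and the names `parse_cell` and `build_cell` are made up for this example.

```python
import struct

CELL_LEN = 514                  # assumed: fixed-length cell, link protocol >= 4
HEADER = struct.Struct(">IB")   # big-endian: 4-byte circuit ID, 1-byte command
PAYLOAD_LEN = CELL_LEN - HEADER.size  # 509 bytes


def build_cell(circ_id, command, payload=b""):
    """Serialize a cell, zero-padding the payload to its fixed length."""
    if len(payload) > PAYLOAD_LEN:
        raise ValueError("payload too long for a fixed-length cell")
    return HEADER.pack(circ_id, command) + payload.ljust(PAYLOAD_LEN, b"\x00")


def parse_cell(data):
    """Split a cell into its fields, returning a tagged tuple like Erlang code would."""
    if len(data) != CELL_LEN:
        return ("error", "bad_length")
    circ_id, command = HEADER.unpack_from(data)
    return ("ok", circ_id, command, data[HEADER.size:])
```

Calling `parse_cell(build_cell(1, 3, b"hello"))` gives back the circuit ID, the command byte, and the padded 509-byte payload. The Erlang pattern does the length check and the field extraction in one step, which is what makes the feature attractive for a protocol like Tor.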
So, a little bit about C. It's hard to write complicated C code. We already know that, and I guess we know that from history by now: we've seen a lot of projects having severe security issues every week. The Tor daemon is a very high-quality piece of C code in general, and it's considered that by quite a lot of people. Take PVS-Studio. Are people familiar with that tool? It's a proprietary static code analyzer, and they went over the Tor source code recently and couldn't find any bugs in it, which is quite well done. Do people know what Coverity is? Coverity is also a static code analyzer. They also ran a lot of tests, and in 2009 they gave a lot of kudos to Tor and a lot of other projects, where we got Rung 3 certification. I'm not fully sure what that means, but you can Google the report and find the explicit summary.

Tor in C works pretty well. I also work on C Tor for a living, so I do the Erlang thing in my spare time and the C things in my work time. We have very high test coverage.
That's one thing that's really good for us, so we detect things early because of the tests we have. We have active team rotations in the core team of Tor, so that each week we have different tasks to deal with, like handling new bugs and handling user support, and one of them is handling Coverity issues that come in from the static code analyzers. We recently joined OSS-Fuzz. Are people familiar with what that is? It's Google who set up a fuzzing infrastructure where you can submit small programs that are executed on some undefined big cluster at Google, doing a lot of fuzzing using AFL and libFuzzer, I believe. And of course we have code review, as all other mature projects do. All code that goes in has to be reviewed by someone, and at the end Nick Mathewson reads all lines that enter Tor, so there is an extra safety net there in Nick.

Some really interesting work started happening at the last Tor developers meeting, when we were in Amsterdam earlier this year. There was a breakout session about third-party implementations, led by Chelsea. There was a lot of discussion, mostly around things like Tala, which is an entire rewrite of Tor, but also from people who want to slowly change C Tor into something else. Sebastian, Chelsea, and Isis are now actively working on integrating Rust into the C Tor project. There is Rust code already in the C Tor repository, and you can build Tor with Rust enabled. I believe we use the Rust memory allocator as a sort of test for all of it. It's pretty cool, and it seems to be the way we're moving right now. We are talking about new features at some point in the future.
They are going to have to be written in Rust instead of C.

So, Tala and C. I want to write something that has as little C as possible. BEAM is C by nature. We use libsodium for some cryptography. We use libcrypto from OpenSSL or LibreSSL, but we do not use libssl. Do people know what the difference is? libssl is the TLS state machine and the TLS protocol, and libcrypto is just the crypto primitives. So when there are issues with TLS in OpenSSL, we're not affected; Erlang has its own TLS state machine. We also had to use some small C functions for RSA key generation, because that was for some reason not available in the Erlang VM.

I'm going to jump to a bit of how the architecture is in Tala. Were there any questions about any of these things? No.

So we have one component called enacl. NaCl is a small crypto library made by Daniel Bernstein and a couple of other people. enacl is written by Jesper, who's quite active in the Erlang community. It's a wrapper around libsodium, and we use it for X25519 Diffie-Hellman. It also has access to /dev/urandom, with a wrapper that is portable across operating systems, which we use for generating random byte sequences. The source code is on GitHub under Jesper's account. It's a pretty nice library.
It's very well tested and pretty high quality, and it's used by a lot of different Erlang projects.

Then we also need to use Ed25519. It turns out that there are different versions of this signature scheme that encode the signatures differently: some hash the public key into the final signature and some don't. So we couldn't use the Ed25519 that comes with enacl. We had to take the implementation from Tor, lift it out of the Tor repository, and make a small shim to interface it with Erlang. Yawning was a great help in finding out that there are different versions of this signature scheme. I think it took me a weekend to figure this out.

Then we have Luke. There is this big fear in the crypto community right now that everything has to move to post-quantum cryptography, where we're secure against attacks by quantum computers. There is some work by Isis and Peter Schwabe, who have made a specification for Tor to support the NewHope handshake mixed with X25519, so that if NewHope turns out to have a problem, we can still rely on the hash function, which we believe is secure, and on X25519, which we believe is secure for now as well. We haven't really integrated it yet, because there is no code in the C Tor implementation to support this right now. So we're still waiting a bit to see how that is going to turn out for Tor itself.

Yes? It's based on, what's it called, ring learning with errors. That's the problem they're using in this system. It has pretty big keys, but smaller than some of the other post-quantum schemes.

It's also important to note what Tor is trying to do here. We don't try to prevent an active attacker who has a quantum computer right now, because then we would need to change all the signature schemes as well. We only want to be sure that the data flowing through the network cannot be decrypted when someone builds a quantum computer in the future.
Those are two different attack scenarios. But this is mostly a fun project that I added one evening, so there is nothing yet from Tor about this.

Then we have the most important library. It's just called onion. It's a small Erlang application that binds all these other components together and exposes a nicely modular interface. This means that the big application, which does all the state machines for the protocol, has a utility library without many stateful functions, which we can use as an API that wraps the different providers. Everything we do, we generally try to generalize and lift into the onion library, and then we make sure that there are tests for all the code in there. So that is the most important and most stable part of the project right now. It's like the standard library for building Tor-related applications in Erlang.

Then there's Tala itself. If you're a bit familiar with Tor, there's a directory component, which is the system where you announce your relays and where clients figure out which relays exist. And there is voting about it.
Then we have the actual onion protocol, which we use to connect through the network. And then we have some kind of core, which is an abstraction: it holds information about the uptime of the relay and things like that, things that need to be generalized but are still somewhat stateful. The onion routing component and the directory component have a sort of circular dependency on each other, which is really nasty, and we're trying to figure out how to deal with it. It means the code is a bit messier than it's supposed to be, but we have some idea of how to abstract it out, of course by adding one extra layer of abstraction.

This is pretty much how it looks. We used to need a C shim in the onion library for RSA key generation. That has been moved into OTP, in the Erlang releases, so we don't have that need anymore from OTP 20, which was released in June this year. But this is generally all the dependencies, all the applications that we have running in the Erlang VM, of course ignoring everything that comes from the Erlang standard library.

Testing. Were there any questions about any of this before? No. Can people see this? Yeah, ish. Okay, cool.

We have classical unit testing. The Tor source code itself comes with a lot of tests, which is very nice when you're working on this, because you can just copy out some of the test vectors, play around with them, and make sure that your code works. These are generally taken from the C Tor implementation, but for components that are standardized we also add test vectors from the RFCs, which we include in the source code to build up some trust in the code we're writing.

We also use something called property-based testing, where we try to generalize our tests into testing properties instead of testing specific values. Are people familiar with property-based testing? Someone?
Yeah, of course the Erlang people probably are. You have a concept called a generator, which is a way to generate a random value of a specific type. Then you have a shrinker attached to the generator, which can shrink a value towards some zero value: for a list type it goes towards the empty list, for an integer it goes towards zero, et cetera.

We use a free implementation of this. There is a really good proprietary QuickCheck implementation by a Swedish company, which costs something like 5,000 euros a year, so we cannot use that, because nobody can afford it. Right now we mostly use it for stateless testing. With QuickCheck you can do some really neat stateful testing, where you start testing protocols. We plan on using that, but for now it's only the stateless parts we're using.

A simple example of that. I'm going to step a little bit away from you. We have a Base64 module. It has an encoder, which takes some binary data and returns some binary data; that's the signature of the encoder. The decoder is a bit different: it takes encoded data, which is binary, and returns a tuple, which is either ok and the decoded data or error and a reason, so the decoder can fail. Then we have a validate function, which takes some encoded data and tells us whether it is Base64 encoded, or rather whether we can assume that it is. This we can generalize into a property.
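The generator and shrinker idea, and the encode, decode, and validate interface just described, can be sketched in a few lines. This is a hand-rolled illustration in Python, not the Erlang property-testing library the project uses and not Tala's Base64 module. The names `gen_binary`, `shrink_binary`, `check`, and the three Base64 wrappers are invented for the example, and Python's standard `base64` module stands in for the real encoder.

```python
import base64
import binascii
import random


def encode(data):
    return base64.b64encode(data)


def decode(encoded):
    """Return ("ok", data) or ("error", reason), so decoding is allowed to fail."""
    try:
        return ("ok", base64.b64decode(encoded, validate=True))
    except (binascii.Error, ValueError) as exc:
        return ("error", str(exc))


def valid(encoded):
    return decode(encoded)[0] == "ok"


def gen_binary(rng, max_len=64):
    """Generator: a random byte string of random length."""
    return bytes(rng.randrange(256) for _ in range(rng.randrange(max_len + 1)))


def shrink_binary(data):
    """Shrinker: strictly shorter candidates, heading towards the empty binary."""
    if data:
        yield data[: len(data) // 2]
        yield data[1:]
        yield data[:-1]


def prop_roundtrip(data):
    """What we encode must validate, and must decode back to the input."""
    encoded = encode(data)
    return valid(encoded) and decode(encoded) == ("ok", data)


def check(prop, gen, shrink, runs=100, seed=0):
    """Return None if prop held for every input, else a shrunk counterexample."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = gen(rng)
        if prop(value):
            continue
        shrunk = True
        while shrunk:  # keep replacing value with a smaller failing candidate
            shrunk = False
            for candidate in shrink(value):
                if not prop(candidate):
                    value, shrunk = candidate, True
                    break
        return value
    return None
```

`check(prop_roundtrip, gen_binary, shrink_binary)` returns `None` when the round trip holds for every generated input. To see the shrinker at work, hand `check` a deliberately false property such as `lambda d: len(d) < 3`: the counterexample it reports is cut down to exactly three bytes, the smallest input that still fails.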
So we say we have some data, which is a random binary sequence, and we encode it into the encoded data. We then check with our validator that the data we generated is valid to ourselves; it would be pretty stupid if we generated data that we cannot validate. Then we decode the encoded data and get a decoded value, and at the end we say that the data should be equal to the decoded value. Do people get this? We're testing the isomorphism between these two things. This is a pretty simple example. The build tool for Erlang then generates a set of tests. You can have it run all night, or you can just have it run 100 tests, and it will try to find errors. If it finds none, everything is good. And we try to do this for everything we write.

A slightly more complicated example. We have Diffie-Hellman, classic Diffie-Hellman, which is used mostly for the legacy hidden services right now. We have a generator, which is two in this case, and we have some prime number. Then we have a function to generate a key pair: a secret key and a public key. We can compute a shared secret from someone else's public key and our secret key. We can check if the public key we received is degenerate; there are some specific properties we want to test for there, which are defined in Tor's specifications. And we have the parameters. We make the parameters a function that returns a list of two elements, just our P and G. This is so that we can mock it later: we can change the generator and we can change the prime number.

We then create a simple generator. Because we're using real random data here, we just generate the key pair using whatever OpenSSL does to generate it. We can then say: for all A secret and A public, and B secret and B public, both are key pairs that we've just generated.
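The Diffie-Hellman pieces named here (a parameters function, key pair generation, the shared secret, and the degeneracy check) can be sketched as follows. This is an assumption-laden Python stand-in, not Tala's Erlang module: the function names are invented, Python's `secrets` module replaces OpenSSL's key generation, and the prime is the 1024-bit second Oakley group prime from RFC 2409, which the Tor specification uses with generator 2; treat the constant as illustrative. The degeneracy rule follows the Tor specification's requirement that a received public value be strictly greater than 1 and strictly less than p - 1.

```python
import secrets

# Assumed parameters: RFC 2409 second Oakley group (1024-bit), generator 2.
P = int(
    "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD129024E08"
    "8A67CC74020BBEA63B139B22514A08798E3404DDEF9519B3CD3A431B"
    "302B0A6DF25F14374FE1356D6D51C245E485B576625E7EC6F44C42E9"
    "A637ED6B0BFF5CB6F406B7EDEE386BFB5A899FA5AE9F24117C4B1FE6"
    "49286651ECE65381FFFFFFFFFFFFFFFF",
    16,
)
G = 2


def params():
    """Kept as a function so a test can swap in other parameters."""
    return [P, G]


def is_degenerate(public):
    """Bad public values: anything <= 1 or >= p - 1 (negatives included)."""
    p, _ = params()
    return public <= 1 or public >= p - 1


def keypair():
    """Return (secret, public), retrying in the unlikely degenerate case."""
    p, g = params()
    while True:
        secret = 2 + secrets.randbelow(p - 3)  # uniform in [2, p - 2]
        public = pow(g, secret, p)
        if not is_degenerate(public):
            return secret, public


def shared_secret(my_secret, their_public):
    p, _ = params()
    if is_degenerate(their_public):
        raise ValueError("degenerate public key")
    return pow(their_public, my_secret, p)


def prop_shared_secret_agrees():
    """Both sides must derive the same secret after exchanging public keys."""
    a_secret, a_public = keypair()
    b_secret, b_public = keypair()
    return shared_secret(a_secret, b_public) == shared_secret(b_secret, a_public)
```

Running `prop_shared_secret_agrees()` in a loop plays the role of the "for all key pairs" property, and feeding `is_degenerate` values such as `-5`, `0`, `1`, `P - 1`, and `P` exercises the bad-key side.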
We've just generated If we compute this shared secret between A, S and B, P and B, S and A, P we want to be sure that they're the same This is sort of the property of DvHelman that we get these shared secret if we exchange public keys This is a pretty simple test to do This is a bit more complicated. We have to test for these G generate values So we still have the R, G and P. This is the same module We have these generate is degenerate value We now have a set of things that we know is bad values when you have DvHelman We know that all the negative numbers are bad We know that the integers 0 and 1 are bad We know that using the generator itself is bad And we know that every value From P minus 1 to infinity are bad values So now we can define a test down here that says for all bad public keys that we've generated with the bad public key generator The onion is generate function should return true Now we can generate a lot of bad keys that we can then test And again, we also test that the keys that we generate are not degenerate. That would be pretty bad Does people get that? Cool. It seems like it For network testing we use chutney. 
It's a pretty nice little tool that is provided by Tor. You write Python files where you say which types of Tor instance you want to run, and how many of each. Then there are a few steps: configure your network, start it, check the status of every node to see if it's still running, and stop it again. So you can quickly spawn a big Tor network of 200 nodes and have it run on your laptop, with directory authorities, middle relays, clients, exit nodes, everything. It's a very nice tool to use.

Yes? It is designed for Tor, but basically it takes a binary and starts it many times with configuration files from a template. It's a very generic tool, but the code that ships with it is designed for Tor. You can integrate your own stuff into it, so I integrated Tala into Chutney. I spawn a big network of ordinary C Tor daemons and then run a few Tala daemons in it. So that is possible.

This is the most important component of the Tala application. We have a peer process, which represents a node in the network, an active node that we're connected to. Because the network can block when we send messages, we also have a small satellite process connected to it, which holds the queue of cells that we're sending. This is where we serialize the data that we're sending out to this node. We then have a number of circuits that we build up, and these are also individual processes. This means that the circuit encryption happens in their own processes, so they're isolated.
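The peer and satellite arrangement can be pictured with a small sketch. This is only an analogy in Python, where a thread and a `queue.Queue` stand in for an Erlang process and its mailbox; it is not how Tala is written, and the class name `CellSender` and its `write` callback are hypothetical. The point it illustrates is that the owner hands cells to the queue without ever waiting on the network, while a single worker drains the queue in order.

```python
import queue
import threading


class CellSender:
    """Satellite worker that owns the outgoing cell queue for one peer."""

    def __init__(self, write):
        self._write = write          # blocking callable, e.g. a socket's sendall
        self._queue = queue.Queue()  # unbounded, so send() never blocks
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, cell):
        """Called by the peer: enqueue and return immediately."""
        self._queue.put(cell)

    def close(self):
        """Flush everything queued so far, then stop the worker."""
        self._queue.put(None)
        self._thread.join()

    def _run(self):
        # A single consumer means cells leave in exactly the order queued.
        while True:
            cell = self._queue.get()
            if cell is None:
                return
            self._write(cell)
```

With `sent = []` and `sender = CellSender(sent.append)`, a few `sender.send(...)` calls followed by `sender.close()` leave `sent` holding the cells in the order they were queued. In Erlang the same shape comes for free, since each process handles its mailbox one message at a time, and a crash in one circuit process does not take the others down.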
We can terminate one of them if we have an error, and it won't affect the others. When we build connectivity to another node in the network, because we're only a middle node and never act as a client, we always have a connection coming in that wants to connect to someone on the outside. This means we have a representation of both sides, and a manager which takes care of all the input and output and makes sure that if one of them disconnects, the other one is notified, et cetera. Right now we have one circuit process for each of the peers, which is stupid, because they have the same state. So we're moving towards a design that is more like this, where the circuit's state is mutually owned by the two peer processes.

There are some resources to look at. We have the Tor specifications: there's the Tor spec and the directory spec. Those are needed; you have to read all of them to understand the Tala code. There are some really good Erlang resources if you want an introduction to Erlang. And there is of course the C Tor code, which is also really easy to dive into to get an understanding of what's going on, extract test cases, and so on when you're building stuff. The source code of Tala itself is available. I would really like you not to run it on the production network; that's the whole carefulness thing. Run it and test it if you want to play around with it, and feel free to submit issues and patches if you want to work on it.

Some of the conclusions I reached when I started this project, and especially after I started working on the C implementation: it's really difficult to write a third-party implementation of Tor. The specs are really good.
There are really good tests. But there are a lot of things in the source code that are not in the specifications, because there is such a big community of researchers and security people looking at Tor constantly, and things evolve very fast. This makes it difficult to make a very safe implementation. It was a really good way for me to get a very deep understanding of how Tor works, which was one of the things I really wanted to learn initially. We're at the point where we can relay traffic as a middle node. We want to do exit nodes at some point this year. And we can run it on a small test network of our own, and that seems to be working pretty well.

Are there any questions? We are at the end. Yes?

You noted on your last slides that it's really difficult to write a safe implementation because some parts of the source code aren't really in the specification, if I understand correctly. Yes. Now that you've written this, do you personally see that as a problem or a weakness of Tor? Because I can imagine that if the protocol isn't completely standardized on paper, researchers who analyze how Tor works have the wrong picture of the protocol, and then issues can arise due to, well, basically implementation errors or source code errors that don't appear to exist at the protocol level.

Yes. I mean, I think one of the problems is that the specifications will always be a bit behind state-of-the-art research, because the research comes out, they release papers, we analyze the papers.
We find some issues, and then we have to go back and fix the specifications. The bigger problem is that I would have to read more text than I want to in order to understand all the things that are encoded in each line of the C Tor implementation. That's the scary thing. Reading the specs is very easy. That was what I wanted initially: my mental image was that I should be able to sit down, read these text files, and implement it. That's a pretty sensible goal, I think. But when it came to it and I started looking into the C source code, there were a lot of things there around timing, scheduling, how things go out, and so on that are important to keep the same as in that implementation.

Yes? No. Oh, I actually think the Java implementation might have some client nodes. But of the middle nodes, I think the Go implementation, written by Tom, is the one: he wrote it and ran it, and I think it broke some records for how much traffic it was relaying. He faced a problem with a memory leak in the copying between stacks, from Go into the C code of OpenSSL. That one I know has been running on the production network, but it's not running there anymore. He sort of left the project.

Yes? A very good question. For people who are not familiar with it: Erlang is sort of an old functional language, and Elixir is a more modern one. It's a bit Ruby-like. I don't know if it's bad to call it that; it's a more fancy version, and it runs on the same virtual machine. There is no reason other than that I know Erlang. I think we would welcome Elixir components in it. We use rebar3 as the build tool, so you would be able to use it. I actually think I like some of the things from Elixir, but I'm just more familiar with Erlang.

Cool. Other questions? If you want, you can come by our village and chat a bit about it if you're interested. Cool.
Thank you.