Good morning, everyone. My talk is titled Building Reproducible Python Applications for Secured Environments. Before anything else, a little bit about myself. I work at a US-based nonprofit, what we call an NGO. The name of the foundation is Freedom of the Press Foundation, where we try to empower, protect, and defend public interest journalism. I'm also part of various other free and open source software projects and groups. One of them is the Python Software Foundation. If you know about the Python programming language, this is the foundation behind the language and the community. I'm one of the core developers of the language and also a director at the Python Software Foundation, and also a core member of the Tor Project. The Tor Project is another free software project, plus an open network, which helps to protect your privacy on the internet. So if you came to this talk thinking I'm going to talk about some new tool, some new exciting thing, that's sadly not going to happen. In this talk, I'm going to explain how we are using already available tools and materials to make sure that we can protect and build our applications, which are written in Python, for a more secured environment. I'm going to start with very basic examples of what we do in general with Python applications, and we means every one of us. From there, we'll slowly build up to what a secured environment is in our case and what exactly we are trying to achieve here. So I'm going to take a very basic example of Python code, print hello world. It's supposed to print some text. And generally, for those of us here who are developers, when we write code, we say that everything is fine. It works on our laptop. Yeah, not desktop any more. It works on our laptop these days. So that's the point when we say: code done, work done, project done.
But we all know that in organizations, or in anything bigger than my personal use, it never works the same way. For the same code, we might think that it has to go onto one particular server. It may happen that we are using multiple servers. And then there is the thing we call the cloud. So we might be using a lot of servers, different systems altogether. And then there is how we deploy the final codebase. Every person has their own opinion, which is perfectly valid, and it is OK to have opinions. For example, some people will say that they want to deploy directly from the source code. Git is the current example, which almost everyone is using. So they deploy from Git. If you are packaging your application for a particular distribution, you may want to use a Debian package for Debian-based distributions. If you are in Red Hat land, or Fedora or CentOS land, you will be using an RPM package. And knowing the kind of people who are here, again, all of us in the room, you might be thinking: wait, there is something else which all of us are using these days. Sadly, again, this is not the talk for that. We are going to come back to the very basic tools and talk about Python applications in a certain way. Before the actual application packaging, I'm going to talk a little bit about the project I work on. It's a free and open source project called SecureDrop. It's an open source whistleblower platform. That means this software gets installed in various media organizations, let's say The New York Times, The Intercept, The Guardian. They run their own servers inside their organization. And then they expose an onion service like this, which you have to visit via Tor Browser. If you open any of their sites, you will see a different logo, but everything else will be exactly the same. You can click on submit documents. Or if you already submitted something, you can come back and check this website.
You can submit some more. You can send messages. That's what sources do. The whistleblowers can come to this website and leak information, information which different big organizations or governments, let's say, really want to hide. And what happens on the journalist side? They have a completely different application to log in to. After that, they will see something like this on a particular laptop inside a locked room. That laptop runs the Tails operating system. You can see this almost looks like an inbox where you can download these files. You can see read and unread. You can actually reply to a particular source. But the problem for the journalist at the next step is that these files are all encrypted with GnuPG, and the secret key is not on the same box. That means the journalist can download the file, but they cannot view the content of that file. For that, they have to use another device, let's say a USB drive or a CD, to put that file on and take it to a second computer, which is air-gapped. That is, it's not connected to any network. Only on that computer can they open and view the actual content. And this whole process takes a lot of time. When we talked to journalists, on average it might take them 45 minutes every week to go through the submissions. And not all of the submissions might be actually useful for them to work on. So that brings up the question of security versus usability, UX, because we very much want those journalists and sources to be secure, correct? Because as we all know, in the modern world, if you try to leak any information which may expose any big organization or people in power, they will come after you. At the same time, we have to think about the usability part. If just trying to read one email or one document takes 45 minutes, you may not want to do that every day, right?
So to actually solve the problem, the idea is always to get professional UX help. In our case, we got more funding from different other projects, and someone named Nina, who really helped us to build up the whole UX research on the project. And what we have now looks something like this. The thing on the left is a new desktop application which we are working on. It's at alpha level. The alpha release has been done. You can see it looks like a normal, standard app. It's written using Python and PyQt, that is, the Qt library and its Python bindings. Journalists can use this to log in and view all the material. But the big thing is that this is being deployed inside a particular operating system named Qubes OS. Qubes describes itself as a reasonably secure operating system. It's a Linux-based system. The base OS is Fedora. And on top of Fedora, it uses something called Xen, the hypervisor, in such a way that most of the applications you run on this operating system can run in separate VMs. So you can have three or four different Firefox instances running in separate VMs. And those VMs cannot talk to each other easily. I mean, you can allow it, but it requires a manual step. For example, you have to click something to say that this VM or this app requires SSH access. Only if you give it SSH access will it be able to use SSH. That way, Qubes OS gives us an option to move out of that whole big 45-minute set of steps and into something which takes less than five minutes for a journalist to view the same file, while keeping almost the same level of security. At the same time, what are all the things that can go wrong? A few stories from the last few months, or rather the last few years. You may have already heard the story where people got access to the ASUS signing keys. They managed to sign their firmware, which got into all the ASUS laptops and systems.
And that's one way of making sure that your malware gets onto people's systems. If you look at the programming languages, we are not doing much better anyway. In the popular JavaScript place, npm, some people tried to put in malware: if you downloaded one particular library, which used to get downloaded something like two million times a week, I guess, you would get code which tries to do Bitcoin mining and tries to steal your wallet. And the exact same kind of code was pushed to Python's PyPI too. There are people who tried something called typosquatting, which means using a name almost identical to the real project name, maybe one or two characters different, which you may type wrong. And they tried to push code which will try to steal your Bitcoin addresses or wallet details from your Windows boxes. So people have already tried that, and we know that this is happening. But this is not the only thing. Back in 2013, this gentleman, Mr. Edward Snowden, also leaked a lot of documents about the NSA and CIA, which give us a lot of information about how nation-state actors are working. In this particular tweet, he's talking about another document, which is titled I hunt sys admins. Do we have any sysadmin or DevOps people here? Just raise your hands if you have any kind of access to computers. So here is a screenshot of that. I don't know what the official status of the document is. But it shows a particular NSA employee boasting that they go to different conferences and different places and try to steal credentials from sysadmins. They particularly target the admins, because if you get into an admin's network and get an admin's credentials, you have access everywhere. Most probably there is no NSA in this room, but maybe we'll have similar friends from our government sitting beside you or behind you at this moment.
So thinking about the threats, we can actually narrow them down to four points. First, the source can contain malware, or maybe the source has been changed, because we are all using a lot of free and open source software projects where the source is written by someone else, they upload it to a particular place, and you download and use it. Second, the binaries might be replaced, which we know can also happen. Third, if there is a nation-state actor in your threat model, it may happen that they will try a man-in-the-middle attack. Or if it is a country like India, we know that our ISPs try to do similar things. I'm not going to name any of the companies, but yes, as you are saying. Fourth, there is storage and web server compromise, for example where people actually manage to get into your application server and try to change things the way they want. These are the different threats we try to mitigate through our process. The biggest thing we found for us is to review all the dependencies. We wanted to be very sure what we are running against, and we want to make sure that this code doesn't have any kind of problem. We all get the latest updates, correct, the latest and shiny tools. And we want to make sure that those tools, or the latest source code update, don't contain any malware or unexpected changes. Because we are a Python place, as I said, we use different Python tools to create the project and make sure that it works. We stopped using this particular tool, by the way, pipenv, but we still use the requirements.txt file, which looks something like this. If you are coming from a Python background, you will recognize a file which lists all the dependencies you have for your project. Let's say here it's written that we need arrow==0.12.1 and idna==2.7, the project names and versions. This is how a Python developer generally starts working.
Anyone can use this file to regenerate the particular environment where they can work on the project. But we do not want this. What we want looks something like this: the project name, the version, and a particular SHA-256 sum, which is the sum of a binary file we built ourselves, known as a wheel in Python. Python source code generally comes as .tar.gz tarballs. From the tarballs, you can create what you can call a binary package in the Python world, which is known as a wheel. So we want to build our own wheels. We want to make sure that there is nothing wrong with those wheels. And we also want to pin down those hashes so that this application can only be built and run against these hashes. This is the final thing we want. To achieve this, what we have is a Makefile, a few Bash scripts, and a few Python scripts too. You can easily imagine that we have people who have been doing sysadmin and other, quote-unquote, DevOps work for a long time, because we have Makefiles for things. These are what we call the Makefile targets. This is one particular target which builds those wheels for us. If you look at these four lines, these are the actual four steps being done. Before building anything, it tries to download all the already built wheels. Fetch wheels is that step. After that, it tries to verify all the checksums and then tries to build any new wheels, build and sync. I'm going to go into these steps now. If you are using a Pipfile, there is another part called Pipfile.lock which contains all the source hashes. So you can at least get the hash, the checksum, of the source package on PyPI, from where you will download those source files.
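The hash-pinned form described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual script: the helper names `sha256_of_file` and `pinned_requirement` are made up for this example, and the wheel path in the comment at the end is a hypothetical file name. The output uses the `--hash=sha256:...` syntax that pip's hash-checking mode reads from a requirements file.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pinned_requirement(name, version, artifact_paths):
    """Build one requirements.txt line pinned to the given local artifacts.

    Every artifact (for example a wheel we built ourselves) contributes
    one --hash option, so pip will refuse any file with a different digest.
    """
    hashes = sorted(sha256_of_file(path) for path in artifact_paths)
    parts = [f"{name}=={version}"] + [f"--hash=sha256:{h}" for h in hashes]
    return " ".join(parts)


# Hypothetical usage, assuming the wheel exists locally:
# print(pinned_requirement("arrow", "0.12.1",
#                          ["localwheels/arrow-0.12.1-py2.py3-none-any.whl"]))
```

Installing with `pip install --require-hashes -r requirements.txt` then fails if any downloaded file does not match a pinned digest.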
So we can use a command called pip3, or pip, which again is the Python package manager, to download only the source, because here we are saying --no-binary, and we make sure that we are using the hashes from upstream to download the source tarballs. After we have all the tarballs locally, we call pip again with another command, pip wheel, to create wheels from all the local source tarballs. This is the way you can build your own wheels on your system. These wheels will have different checksums than the ones built by the actual upstream author. Then I can store those values in requirements.txt. So in future, I know that this particular requirements.txt file has the SHA-256 sums of the packages which we built ourselves. And when a Python application developer wants to release the source code, they run a command like this, where they say: let's create a source tarball for ourselves. This is the command for that. But at this moment, you will ask me: I'm talking about building all these wheels and using source from upstream, but we never check the source code, correct? We're just downloading it directly and building it. So what about checking the actual source code for malware? We use a tool called diffoscope for that purpose. This is the diffoscope web page. I took a screenshot from there, but it's a command line tool too. You can run diffoscope against two tarballs, two zip files, or two different directories, and it will tell you the diffs, the changes that happened in the source code. Our current mandate is that we need at least one human being to verify and go through the whole source code change. So it's a person, a human. There is no AI, no fancy thing. A human is going to look at the source code and go through all the dependency changes.
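As a rough sketch of the download, build, and diff steps just described, the commands can be assembled in Python as argument lists. The function names, the `localwheels` directory, and the tarball names are assumptions for illustration, not the project's real scripts; the pip options (`--no-binary :all:`, `--require-hashes`, `--no-deps`, `--dest`, `--wheel-dir`) and the two-argument form of diffoscope are the standard ones. The requirements file is assumed to already carry the upstream source hashes.

```python
import subprocess
import sys


def download_sources_cmd(requirements, dest):
    """pip command that fetches only source tarballs, checked against pinned hashes."""
    return [sys.executable, "-m", "pip", "download",
            "--no-binary", ":all:",   # never accept prebuilt wheels
            "--require-hashes",       # every requirement must carry a --hash
            "--no-deps",              # the requirements file lists everything
            "--dest", dest, "-r", requirements]


def build_wheel_cmd(tarball, wheel_dir):
    """pip command that builds our own wheel from one local source tarball."""
    return [sys.executable, "-m", "pip", "wheel", "--no-deps",
            "--wheel-dir", wheel_dir, tarball]


def diff_sources_cmd(old_tarball, new_tarball):
    """diffoscope command comparing the reviewed release with the new one."""
    return ["diffoscope", old_tarball, new_tarball]


def run(cmd, check=True):
    """Execute a command list; diffoscope exits non-zero when inputs differ,
    so callers would pass check=False for it."""
    return subprocess.run(cmd, check=check)


# Hypothetical usage:
# run(download_sources_cmd("requirements.txt", "localwheels"))
# run(build_wheel_cmd("localwheels/arrow-0.12.1.tar.gz", "localwheels"))
# run(diff_sources_cmd("arrow-0.12.0.tar.gz", "arrow-0.12.1.tar.gz"), check=False)
```

The diffoscope output is what the human reviewer reads; nothing here replaces that review.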
The reviewer looks for malware or anything that looks suspicious: code making a socket call which it is not supposed to make, code trying to open a shell, the standard things you try to do to get privilege escalation. We tried to do this with at least two human beings. Then we found that it's very difficult to have two people go through thousands of lines of verification, and it's a waste of time. So we came down to one person. After we did all those steps, we stored those values not only in requirements.txt, but also in a plain text file called sha256sums.txt, where we store the wheel file names, the source code file names, and the corresponding SHA-256 sums. That file is also signed with our GPG keys. That means we are trusting GPG, or GnuPG, here so that in future we can verify whether anything changed or not. At this step, we have those source files. We have the binary wheels. And we want to make those available for anyone else, let's say anyone from the audience right now, to use. It doesn't require any fancy technology. In Python land, we can just create simple index.html files. If you remember the technology called HTML, we can use that, and it will help you make things available to the world. There is a document from the Python developers called PEP 503. If you want to learn more, you can just search for these terms. Otherwise, you can just skip it. Our index file looks like this. This is actually from live production a long time ago. It is as simple as an index, or rather an HTML file, can be. It has the names of the projects, each just pointing to a directory. And inside each directory, you will find links to the tarballs and the wheels. That's it, done. You can do it by hand. You can write some scripts in Bash, Python, Ruby, whatever your favorite language is, and get it automated. And you can serve these static files from any server.
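A minimal sketch of generating such index pages in Python follows. The function names and the sample file names are assumptions for illustration. The name normalization and the optional `#sha256=` URL fragment follow PEP 503; the exact layout of the project's real index may differ.

```python
import html
import re


def normalize(name):
    """Normalize a project name the way PEP 503 specifies."""
    return re.sub(r"[-_.]+", "-", name).lower()


def render_root_index(project_names):
    """Render the top-level /simple/ page: one link per project directory."""
    links = "\n".join(
        f'    <a href="{normalize(name)}/">{html.escape(name)}</a><br>'
        for name in sorted(project_names, key=normalize)
    )
    return f"<!DOCTYPE html>\n<html>\n  <body>\n{links}\n  </body>\n</html>\n"


def render_project_page(files):
    """Render one project's page from a {filename: sha256 hex digest} mapping.

    The #sha256=... fragment lets pip check each download against the digest.
    """
    links = "\n".join(
        f'    <a href="{html.escape(fname)}#sha256={digest}">{html.escape(fname)}</a><br>'
        for fname, digest in sorted(files.items())
    )
    return f"<!DOCTYPE html>\n<html>\n  <body>\n{links}\n  </body>\n</html>\n"


# Hypothetical usage: write the returned strings to simple/index.html and
# simple/<project>/index.html, then serve the directory as static files.
```

Pointing pip at the result is done with `--index-url` followed by the URL of the served `simple/` directory.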
Those static files then work as a mirror, or your own Python repository. Up to this point, everything is nice. But how are you going to deploy all of this onto those Qubes OS laptops? We are going to use Debian VMs for our particular use case. We know there are many people, at least in Python land, who still love to live dangerously, which is also known as sudo pip install -r requirements.txt. What this command actually does is install all the Python module dependencies as root into a system path. That means you may just destroy your Linux computer by doing this. Sometimes it works, and even in SecureDrop land we did that before. But there's always a chance that you'll break your computer. Python also has a feature called virtualenv, which we use regularly to create the same kind of environment everywhere. We can try to package that instead. And luckily for us, we found out that we are not the first people to think about it. Many other companies and projects have thought about that, and Spotify was one of them. They came up with a project called dh-virtualenv. The dh part of the project name is actually the thing Debian uses to create Debian packages. That's the latest way of creating Debian packages on a Debian system. What Spotify added is a set of scripts and commands on top of it so that you can build and install a Python application on Debian using a virtual environment, and then package it as a whole inside a Debian package. There are more than 1,000 projects using this particular tool. This really helps any third party who wants to update dependencies on their own. Again, Debian packaging means it requires a rules file, which comes down to a Makefile. And this is what we use. You can see it adds this new thing called python-virtualenv. And this is the index URL we are using.
This is where we publish all our source tarballs and verified wheels. And this is a public URL. That means you can actually open it right now: dev-bin.ops.securedrop.org/simple. That's the whole path. This file also contains, a few lines down, some find commands and a particular thing again coming out of dh. These last four lines help us to build something known as reproducible builds. This is a way you can verify that the final binary package you are using, let's say a Debian package, is exactly what we claim it to be: you can go back through the steps and rebuild exactly the same thing, bit by bit. reproducible-builds.org has a lot of different groups which are working together on this. And I just learned last night that FreeBSD as an operating system has been reproducible for over 10 years now. So it's not a new concept. It has been happening for a long time, and they did it pretty well. We wanted to have this kind of reproducible build, so that as a final user of our system, you can verify that you are installing the right thing. All of these things together helped us to have a secure and usable system which people can still use. And we can be somewhat assured about people's security, because our end users are journalists. They don't have time to make sure that everything is updated and everything works properly. We have to take care of that extra part. If you remember the four threats we talked about, we can now go through how we try to mitigate those problems. The first problem is source containing malware or being changed. For that, we do the human review of the source code changes, and we also store the SHA-256 sums with a GPG signature, so nothing can be changed. One thing about GPG signing in our case is that none of our keys are stored on any network-connected computer.
That also means nobody can easily get into a machine or a system and get access to our signing keys. Binaries replaced with malware: again, we store the SHA-256 sums of the binaries too, and we build our own wheels. We are not downloading anything randomly from the internet, so we can trust that place. Man-in-the-middle attack: again, HTTPS and SHA-256 sums. Storage and web server compromise: GPG keys, to sign our final packages. You know that you can sign the whole tag of the source code, or sign every commit. In our case, the apt repos are signed for the Debian operating system. And because the keys are not present anywhere that someone could break into, an attacker cannot reproduce the signatures or try to fool the operating system. And obviously the final thing is the reproducible builds. We are really happy that we are now able to do that in such a way that anyone who wants to verify the system can. There are a few links. Everything is under github.com/freedomofpress. If you go there, all our projects and all our work are under free software licenses, so you can just go and have a look. SecureDrop.org is the actual website for the project, so you can go and see who is using it. The Freedom of the Press Foundation website is freedom.press. And I'm available on Twitter as Kushal Das. If you have any questions, you can ask me there or here. I still have three minutes. Thank you.
If you have any question for the speaker, please raise your hands.
So apart from a journalistic setting, what other places do you think a reproducible build could be of use?
I mean, my general answer should be everywhere. If you have an internet-connected computer, or any kind of computer, you should try to have your own system as a reproducible build, so your final users can verify what they're using.
So would this help, say, if someone does a source code audit of XYZ company and then the company uses that code in their actual deployment? Can it be verified that what the audit was done on and what is being deployed are the same thing?
Exactly. So the question is: if someone did an audit of certain source code, how do you verify that you are actually running that source code and that nothing has been changed? If you have a reproducible build, you can build the application on your own and you will get exactly the same artifact at the end. That's how you make sure that you are using the same thing. That's what it helps with.
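The check described in that answer can be sketched in Python as a byte-for-byte comparison through SHA-256 digests. The function names and the package paths in the comment are hypothetical; this assumes you have both the published artifact and your own rebuild as local files.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(published_path, rebuilt_path):
    """True only if the rebuilt file is bit-for-bit identical to the published one."""
    return file_digest(published_path) == file_digest(rebuilt_path)


# Hypothetical usage, after rebuilding the package from the audited source:
# if not same_artifact("downloads/app_1.0_all.deb", "rebuild/app_1.0_all.deb"):
#     raise SystemExit("rebuilt package differs from the published one")
```

A match only shows that the two files are identical; trusting the published digest itself still depends on the signature checks described earlier in the talk.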