Okay, let's start. Let me introduce Dag-Erling Smørgrav, security officer for FreeBSD, working in the security team at the University of Oslo.

Thank you. So, as Olivier said, I am employed at the University of Oslo in Norway, where I work in the security team, and for the last two years I have also been a developer on the TSD project, which is what I would like to talk to you about today. I've also been a FreeBSD developer for more than 15 years, and I'm currently the security officer.

The title of the presentation is Securing Sensitive and Restricted Data, so I'm going to start by defining what sensitive and restricted data are. In the context of a university, these are genetic sequences, patient records, responses to surveys, audio-visual recordings of patients and respondents, and anything else you care to think of. In TSD we currently have projects that work on human genome data, human genetic sequences for diagnostic purposes. We have sociologists working on a research project regarding alcoholism, with interviews of patients. We have psychologists working on a research project regarding the psychological effects of the July 2011 bombing in Oslo, so they have audio and video recordings of interviews with victims and next of kin, things like that. In most cases these are personally identifiable data, so the concern is privacy, but in some cases leaking the data could actually bring harm to the person in question. We might, for instance, have researchers working with dissidents in the Middle East or in Asia (I don't think we do at the moment), where leaking information about their research might mean, in the worst case, that somebody dies.

So, the law. I'm not a lawyer, so I'm not going to explain this in detail, but I can summarize it as follows: personally identifiable data, which is anything that may be connected to a specific person, may only be collected and retained with the person's informed consent, for a specific purpose, and for a specific length of time. In the academic world, at least in Norway, the way this works is that a research group that wishes to collect a certain type of data, such as patient information, first has to get approval for its research from the academic ethics review board. Then they have to get written consent from everybody involved, and then they collect the data. The authorization they get from the ethics board has a date limit and a specifically stated purpose, and once that date limit is reached, the data must be destroyed.

So the dilemma is this: the data must be kept under lock and key (and thank you, Impress, for running the animation backwards), but the data must also be accessible to those who collected it. That is the dilemma we face, and we solve it by providing a fully functional working environment, which I will define shortly, within which the data is accessible but from which it cannot be extracted, or at least cannot easily be extracted.

A fully functional working environment means storage; we're talking about data, so necessarily there must be storage. It means databases for organizing that data, depending entirely on the type of data, of course: some of our users require relational databases, and we offer them PostgreSQL, while others require entirely different tools. We provide virtual Windows and Linux desktops with remote access, and we provide the usual suite of office software and scientific software.
We have a standard package that includes things like Biopython and R, and we can provide MATLAB, Stata, SPSS and other commercial software on demand, provided of course that the users have licenses or acquire licenses through us. We also have a high-performance computing cluster inside the TSD environment, which is separate from our main HPC cluster, the one that is currently number 400 or so on the Top500 list; it was number 96 when it was built.

Once again: the data originates outside of TSD and must be brought in, and the results must be extracted again in some form. The only direct access that we provide to this walled garden is, as I mentioned earlier, virtual Windows and Linux desktops, with either RDP or SPICE as the graphical remote desktop protocol, tunneled through SSH; I will describe this in a little more detail later. We turn off all side channels in those protocols: the clipboard, shared folders, USB tunneling and other easily used side channels.

Let me say this right away, because if I don't, somebody will inevitably ask me about it at the end: it is impossible to close all side channels. The only way to do that is to store the data on a disk, put the disk in a bucket, fill the bucket with concrete and bury it in the ground, at which point we're back to our dilemma: the data is not accessible, and it needs to be accessible. So there will always be side channels. You can screen-scrape the desktop connection; you can write an RDP client that sends key presses to display the data and then scrapes it. We have to draw the line somewhere, so we make a best effort, and we make a certain number of assumptions about the users and about the attackers. We will never be able to defend ourselves against a truly determined adversary anyway.

When data has to be transferred into the system, or results have to be transferred out of it, we use what we call a data lock, "lock" in the same sense as an air lock or a lock on a river. It is a pair of machines, which I will describe in slightly more detail later, where the outward-facing machine is an SFTP server: data can be deposited there, and it will be copied into the system. Likewise, users within the system can move data into a specific area of the storage system, from where it will be copied to the SFTP server, and they can download it from there.
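As an aside, the inner half of the data lock is essentially a copying daemon. The real implementation is BSD-licensed but, as comes up in the Q&A at the end, its repository is not publicly accessible at the moment, so what follows is only a minimal sketch of the export direction. The paths, the project name and the audit format are assumptions; the cap of eight simultaneous transfers and the SHA-256 audit record are both described later in the talk.

```python
# Minimal sketch of the outbound data lock, NOT the production code.
# Paths and the audit format are hypothetical.
import hashlib
import logging
import shutil
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

EXPORT_DIR = Path("/tsd/p42/export")   # hypothetical project export area
SPOOL_DIR = Path("/sftp/spool/p42")    # hypothetical SFTP-side spool
MAX_TRANSFERS = 8                      # the cap mentioned in the Q&A

log = logging.getLogger("datalock")


def sha256_of(path: Path) -> str:
    """Checksum the file so the transfer can be documented later."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def transfer(path: Path) -> None:
    """Copy one file out and write the audit record."""
    digest = sha256_of(path)
    shutil.copy2(path, SPOOL_DIR / path.name)
    log.info("exported file=%s owner=%s time=%d sha256=%s",
             path.name, path.owner(), int(time.time()), digest)
    path.unlink()  # drop it from the export area once it has been copied


def main() -> None:
    logging.basicConfig(level=logging.INFO)
    # At most eight transfers run at once; everything else waits.
    with ThreadPoolExecutor(max_workers=MAX_TRANSFERS) as pool:
        for f in sorted(EXPORT_DIR.iterdir()):
            if f.is_file():
                pool.submit(transfer, f)


if __name__ == "__main__":
    main()
```

The eight-worker pool also reproduces the flaw acknowledged in the Q&A: eight sufficiently large files will keep the pool busy, and nothing else is copied until one of them finishes.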
So, a bird's-eye view. We have the big bad internet, traditionally represented as a cloud, which is becoming a little confusing now that "the cloud" means something else, but anyway. There are two interfaces between the red zone, the world, the internet, and our green walled garden. One of them is the file lock I just described. The other is what I've labeled "jump host" in the singular, but there are actually two redundant jump hosts, and they are the main entry point for users into the system. From the internet, whether from my laptop here, my workstation at the University of Oslo or my PC at home, I connect to a jump host to establish a tunnel, and through that tunnel I connect to my virtual desktop. Depending on which services the research group I belong to has ordered and paid for, from that desktop I can access the high-performance computing cluster and I can access databases. And of course I can always access the storage system, where a certain amount of space, however many terabytes the project is willing to pay for, has been reserved for the project.

The storage system is a segment of a system called AstraStore, a seven-petabyte hierarchical storage system, split about evenly: three petabytes of disk and four of tape, I think, or the other way around. The segment that has been set aside for us has its own head, if you're familiar with storage terminology: there's a huge amount of block storage, and then there are CIFS and NFS heads, front ends, that we communicate with.

Right. So let's take a closer look at the network topology. We have two space heaters in the server room: one is a big Cisco box, the other is a smaller Cisco box, and they're our main routers. What I haven't shown, off the screen on the left, is the rest of the university network, the internet and the world. The jump hosts are directly connected to the routers, and the external half of the data lock is also directly connected to the routers. So these are accessible: they have public IP addresses, both IPv4 and IPv6, and they can be reached from anywhere, provided you have an account and credentials.

The inside network is divided into quite a large number of separate VLANs and subnets. We have a /48 IPv6 block that we use for everything, but as I'll mention on a later slide, we've had a lot of trouble with IPv6, so we also have to use an RFC 1918 IPv4 network on the inside. The IPv4 network is private address space and is not routed, whereas the IPv6 network is public address space and is partially routed, so that, for instance, machines on the inside can retrieve security updates and install software, within certain limits. Some of our machines need to be able to access license servers on the outside; some of those can be proxied, but some can't. Things like that.

The jump hosts and the external data lock are FreeBSD 10 machines. The inside half of the data lock is Linux, and most of the other machines are either Linux or Windows. The Nexus, which I will describe later, is also a FreeBSD 10 machine. Prism is a Linux VM that sits in a separate storage VLAN; apart from the data lock, it is the only VM in that VLAN. It's a management machine, and we call it Prism because it can access all the data: those two machines are the only ones that can access absolutely all the data in the system, and they do it by being in the same VLAN as the storage.

Everything is routed through the jump hosts. They have a dual role: they are both routers and firewalls, and also login nodes; I'll get back to that later. On the management VLAN we have, for instance, a domain controller. We actually have two, one physical and one virtual. We have an authoritative name server, which is not used as a resolver: the jump hosts run Unbound with multiple forwarders, so our internal authoritative name server is used to resolve our own DNS domain, and other requests are forwarded to the university's resolvers. Our DNS zone is also exposed to the world, by the way. We made a judgment call there and decided that having our DNS zone accessible makes our lives much, much easier, and the risk is not really that high, so we chose to do that.
RHEV-M is the Red Hat Enterprise Virtualization management node. The jump hosts and the external data lock are physical machines; most of the other machines are virtual and run on Red Hat Enterprise Virtualization in a Dell blade server. The management node is a separate physical blade, for obvious reasons: we don't want the management node to be a virtual machine. The storage, obviously, is a rather large black box, literally a huge black box, and not a virtual machine. Note that this is a network topology map, so that large black box is not actually located in the same room as everything else; it's in a different room. There are other details I've hidden as well: we use the built-in switch in the blade center, which has a 40-gigabit backplane, I think, and there's a 10-gig fiber connection going from the blade center to the storage facility and 10-gig connections going to the jump hosts and the data lock.

The jump hosts have, as I mentioned, a dual role: they're routers slash firewalls, and they're also login nodes. In hindsight, one of the things I would have done differently, and which we could probably still change, is to have login nodes that are separate from the routers and firewalls. The problem is that the login nodes have to have IPv4 addresses, because they have to be reachable and most of our users don't have IPv6. I can't just fire up a couple of VMs inside TSD and designate them as login nodes, because they won't be reachable: you would have to use the login nodes to access the login nodes. These are surmountable technical obstacles, though.

The jump hosts run FreeBSD 10, as I mentioned. They started out running 9.1, and over the course of two years they've been upgraded to 9.2 and then 10. 10 was a huge relief. There were many issues in FreeBSD 9, for instance with CARP, that are fixed in FreeBSD 10. Some of them are issues that I discovered and fixed in the process of developing this system, and some were resolved because Gleb completely rewrote CARP in FreeBSD 10; it's much, much simpler to configure and maintain now. The login function is implemented with OpenSSH with two-factor authentication; I'll describe the authentication system later.

Right, multiplicity. There are currently around 45 separate research projects. I'm not sure they're all active. I say around 45 because internally we have objects in our provisioning database that are called projects, the first 10 are reserved, and we are currently at 55 or 56. So 55 or 56 minus 10 equals 45 or 46, but I don't know precisely how many of those are active and how many are reserved for future users. And we keep getting new ones, from all over the country: the University of Oslo, with 7,000 employees and 30 or 40,000 students, is the largest in Norway, and we provide services to other universities and colleges, so this is a national service. We receive new applications almost daily, or at least weekly, and we've actually gotten to the point where we need to rethink the one-project-per-VLAN model. We're probably going to place smaller projects, projects with very small amounts of data, on the same VLAN but with separate subnets.
That gives us a slightly lower level of security, because if somebody manages to get root on a VM on one of the subnets on that VLAN, they can of course change their network configuration, or spoof, and talk to other machines on the same VLAN. But the point is that projects must be kept separate: they must be protected from each other, and to a certain extent from themselves, but above all from each other.

So, back to the network topology. I've removed the Cisco space heaters, and we have the jump hosts here and a few more VLANs that I didn't mention earlier. We have a separate VLAN for the DRACs, the management consoles for all the physical hardware. The management VLAN and the storage VLAN were shown earlier. There's the hypervisor VLAN, since the Red Hat Enterprise Virtualization host nodes are on a separate VLAN, and then there are the project VLANs, a lot of them, which are kept separate. All traffic between each project and anything else in the system is routed through the jump hosts, so the jump hosts can monitor and control that traffic, and we have a very fine-grained packet filter in place.

I mentioned the login function. We have a fairly complex identity, authentication and authorization system, composed of multiple different IAA systems; I've written "multiple provisioning systems" on the slide. We have a provisioning system called Cerebrum, which was originally designed as an IAA system but has grown to also be a machine database. We use it at the University of Oslo, and several other universities and colleges in Norway use it as well. It's a database of persons and users, which are distinct concepts, because a person can have multiple users associated with them, and also of machines. It can generate our entire DNS zone file: it knows about machines, and we use it to assign IP addresses and names to machines, CNAME and PTR records, and also roles. Roles assigned to machines are exported as groups in the LDAP directory, which means that at the other end we can do an LDAP lookup, "is this machine part of that group?", and if it is, the machine will, for instance, automatically install certain software, things like that.

Active Directory is used for identity and authentication internally. Cerebrum is not something you access directly; when you log in on a machine, the machine does not ask Cerebrum for information about you. It's a database, and when you make a change in Cerebrum, it propagates that change to our domain controllers, to Active Directory, and also to the Nexus, which I forgot to list on this slide but which was on the network topology map. The Nexus grew out of the fact that we made the decision to use Cerebrum rather late: we were initially going to have our own provisioning system, and the remains of that system are the Nexus, which is used specifically by the two-factor authentication system and also, to a certain extent, for network configuration. The second factor is handled by a RADIUS server: when you assign an OTP key to a user in Cerebrum, it is pushed to the Nexus, which places it where the RADIUS server will find it, and then... I'm getting ahead of myself; I have a slide for this.

Here is the entire flow. If anybody here actually knows and likes UML, I am terribly, terribly sorry for my horrible abuse of a sequence diagram.
Anyway: a TSD administrator or tech-support person creates a user in Cerebrum. I will not go into all the political details; there is a lot of paperwork, because we have to get a copy of the ethics board approval and so on, and create a project, but I won't go into that. So you create a user in Cerebrum, and that is immediately propagated to Active Directory and to the Nexus. Then you set a password, and that password is sent to Active Directory; it is not sent to the Nexus, because the Nexus doesn't need to know it. Then you set an OTP key, the secret for an OTP key, which is sent to the Nexus.

At a later point (there should be an arrow here that says "login"), the user attempts to log in. We do an identity request to Active Directory; that's LDAP. Then we ask the user for a one-time code, which is transmitted in a RADIUS request to the RADIUS server, and the RADIUS server runs a program that retrieves the key from the Nexus database and verifies the code; if it is correct, it increments the counter to prevent reuse. Then OpenSSH verifies the password, which is actually a Kerberos request to the Active Directory server (this line was supposed to be on the slide). Then it updates the firewall: it uses authpf to insert user-specific rules into the firewall rule set, and at a later point, when the user logs out, the authpf process dies and the rules are removed from the firewall.

Here are some of the things we do with the provisioning system other than user management. When a machine is created in Cerebrum, a machine object is created: I type "host add whatever" at my command prompt, assign it a role called auto-provision, and assign it a role that says, for instance, Linux operating system. What happens is that within a few minutes a back end creates a virtual machine in RHEV and starts the installation. Windows machines are cloned, I think; I'm not entirely sure, because I haven't been involved in the Windows side of things. Linux machines are PXE-installed with Kickstart. Every machine in the database has a role that is either Linux operating system, Windows operating system or FreeBSD operating system, so in the LDAP directory I can look up a group called, say, Linux operating system, and that group will contain all the machines that run that operating system, at least in theory, if the information in the database is correct.

And I can use that. There's a script that runs every minute, or every five minutes, that traverses specific groups in the LDAP directory and translates them into pf address tables. It looks up a group, for instance Linux operating system, and gets a list of machine names; then it looks up each machine object, gets the DNS hostname property, does a DNS lookup on that, and ends up with a long list of addresses, which are shoved into a pf table that is updated dynamically. So we have firewall rules based on those tables. For instance, we have a rule that says Linux machines and FreeBSD machines are allowed to talk to the CFEngine server, for configuration management, but Windows machines aren't: they have no reason to talk to CFEngine, and I don't want somebody logging in on a Windows desktop somewhere and trying to mess around with the CFEngine server for whatever reason.
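To make that mechanism concrete, here is a minimal sketch of such a script, not the production one. The LDAP server, base DN, group name, attribute name and table name are all assumptions; pfctl's table commands are real. It fetches the group's members, resolves each machine's DNS hostname, and replaces the contents of a pf address table in one operation.

```python
# Minimal sketch: LDAP group -> resolved addresses -> pf address table.
# Server, DNs, attribute and table names are hypothetical.
import socket
import subprocess

from ldap3 import BASE, Connection, Server  # third-party: pip install ldap3

LDAP_URI = "ldap://ldap.tsd.example"
GROUP_DN = "cn=linux_operating_system,cn=groups,dc=tsd,dc=example"
PF_TABLE = "linux_os"


def member_hostnames(conn: Connection) -> list[str]:
    """Get the group's member DNs, then each machine's DNS hostname."""
    conn.search(GROUP_DN, "(objectClass=*)", search_scope=BASE,
                attributes=["member"])
    members = list(conn.entries[0].member.values)
    names = []
    for dn in members:
        conn.search(dn, "(objectClass=*)", search_scope=BASE,
                    attributes=["dnsHostName"])
        names.append(str(conn.entries[0].dnsHostName))
    return names


def resolve(names: list[str]) -> list[str]:
    """Resolve every hostname to all of its IPv4 and IPv6 addresses."""
    addrs = set()
    for name in names:
        for info in socket.getaddrinfo(name, None):
            addrs.add(info[4][0])
    return sorted(addrs)


def main() -> None:
    conn = Connection(Server(LDAP_URI), auto_bind=True)
    addrs = resolve(member_hostnames(conn))
    # Atomically replace the table; rules that reference <linux_os>
    # pick up the new contents without a ruleset reload.
    subprocess.run(["pfctl", "-t", PF_TABLE, "-T", "replace", *addrs],
                   check=True)


if __name__ == "__main__":
    main()
```

A rule such as `pass out proto tcp from <linux_os> to $cfengine_srv` then tracks the table automatically, which is what makes the quick update cycle described below possible.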
Only Windows machines are allowed to talk to WSUS, the Windows software update service. Linux machines are allowed to talk to our yum proxy, which is a proxy for the Red Hat Network, for automatic updates. Our FreeBSD machines are allowed to talk to the FreeBSD update and package servers. Things like that, all updated dynamically: if you add a machine in Cerebrum, five minutes later (for a Linux machine at least, it's very fast) the machine has been created and installed and the firewall rules have been updated for it.

Users are affiliated with projects, and so are machines; actually, everything in the system is affiliated with a project, which is one of the modifications we made to Cerebrum. This means we can also have firewall rules that are specific to groups of users. That's on the to-do list and hasn't been implemented yet, but the idea is that if a user on project 42 logs in, the rules that authpf installs into the pf rule set will only allow that user to access machines that belong to project 42.

Here's an example of the login process. It's not really very interesting, but you get a banner: "connecting to the University of Oslo", blah blah blah, "restricted to duly accredited members". We had a lawyer who required us to print a banner before login, which is fine by me. The operative part is that we had a research project from the Oslo University Hospital, and their security people have restrictions on which machines their staff are allowed to use to log in to our system; that's actually why we have that banner. One-time code, password, and I am now logged in, and what I get is a screenful of text that describes how to use the system. Of course, you don't get a shell, because your login shell is authpf and your home directory is /var/empty, so the connection just remains open until you hit Ctrl-C or lose the network connection.

So, where the rubber meets the road. I've covered most of how it's supposed to work, but we've had issues. TSD was intended to be an IPv6-only environment. Unfortunately, as we discovered after a fairly short while, a lot of software still does not support IPv6, or claims to support IPv6 but doesn't do it properly. Red Hat Enterprise Virtualization does not support IPv6 at all; that was an eye-opener. You can't even have IPv6 name servers in resolv.conf on a Red Hat Enterprise Virtualization host node, because they don't use the glibc DNS resolver; they have their own resolver, which does not understand an IPv6 name server address in resolv.conf. Also, you can't PXE-boot over IPv6: there may be an RFC for it, but it's not implemented, and the PXE BIOS in the QEMU VMs doesn't talk IPv6. We can't use SLAAC, which would have been very nice. Yes, SLAAC has limitations and issues, security issues even, but the reason we can't use it is that Linux source address selection is broken, so instead of SLAAC and router advertisements we have to use CARP on the inside. Along the way I also found and fixed some bugs in FreeBSD's rtadvd, some crashes and such, and issues with pf and CARP. There was a bug in FreeBSD's source address selection as well: if you have a master and a backup node sharing an IP address, the backup node is not supposed to use that address while it is the backup, but it would still use it as the source address for outgoing connections. Fixed that.
We still have problems with the routing of IPv6 UDP packets, and I think it's related to the fact that pf does some weird things with checksums. I know that the IPv4 code there is somewhat wrong and the IPv6 code is very wrong, but it only fails under certain circumstances, and apparently we trigger them. There's a PR for that, assigned to me, that I haven't gotten to yet.

Our state table also kept filling up, and I discovered that it was actually due to DNS, NTP, Kerberos and LDAP requests. The script that populates the pf address tables, for instance: the moment I run it, I suddenly have two or three thousand new entries in my state table, one per individual LDAP request. So the state table limit is now 100,000 states. That should be enough.

Then, IAA issues. FreeRADIUS is difficult to configure correctly, and it's also slightly unreliable. Weird things happen when we add a user: sometimes FreeRADIUS insists that the user does not exist, so you get "invalid user". It doesn't ask the back end, it doesn't even try to verify the code, it just decides that the user doesn't exist, even though the user has been in /etc/spwd.db or wherever for days. And our LDAP integration is also slightly broken, and that's my fault: it had a huge bug, and when I fixed the huge bug I introduced a small bug.

I could go on for hours about all the things that don't work in TSD and all the things we should have done differently, but I'm out of time. So, questions. Please wait for the microphone.

Question: do you have a mechanism to rate-limit outgoing data transfers, to stop bulk export of the data if there turns out to be a bug in your implementation, or at least to mitigate it?

Rate limiting, okay. So, when you export data: at some level we have to trust our users. We have groups, and we have user roles as well, and initially only the administrator and the head of a project are allowed to export data. They do that by placing it in a specific directory; it's copied from there to the SFTP server, and they can download it from there. Currently there's a flaw in that mechanism, because I limited it to eight simultaneous transfers. The system scans the directory, picks up new files and starts transferring them, but it won't transfer more than eight files at a time. The problem, of course, is that if you have eight huge files, say eight huge FASTA files of gene sequences, then nothing else gets transferred until they're done. So yes, that's an issue. But other than that, no: we have to trust our users to a certain extent, and we chose to trust them.

Question: what are you using for OTP, exactly? Tokens, SMS?

It's the RFC 6238 time-based one-time password, the same algorithm that Google uses. Users who have smartphones can use FreeOTP or the Google Authenticator app; I prefer FreeOTP, which is Red Hat's version of the Google Authenticator app. Users who don't have, or don't want, smartphones get a YubiKey instead, so they get event mode instead of time mode.
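Since this came up more than once: the verifier is the piece the RADIUS server runs, and RFC 6238 is simple enough to sketch in a few lines. This is an illustration, not the OpenPAM module mentioned later in the Q&A; how the secret and the last accepted time step are stored is assumed.

```python
# Minimal RFC 6238 (TOTP) verifier, illustrative only.
import base64
import hashlib
import hmac
import struct
import time


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226: HMAC-SHA1 over the counter, dynamically truncated."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret_b32: str, code: str, last_step: int,
                step: int = 30, skew: int = 1) -> int | None:
    """Check a code against the current time step, allowing a little
    clock skew. Returns the accepted step, which the caller must store
    so the same code cannot be replayed, or None on failure."""
    key = base64.b32decode(secret_b32, casefold=True)
    now = int(time.time()) // step
    for t in range(now - skew, now + skew + 1):
        if t > last_step and hmac.compare_digest(hotp(key, t), code):
            return t
    return None
```

The `t > last_step` test is the "increment the counter to prevent reuse" step described earlier: once a step has been accepted, it is persisted, and any code for that step or an earlier one is rejected. A YubiKey in event mode would call `hotp` directly, with a persistent event counter instead of the clock.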
Question: are the jump hosts independent, is there some load balancing between them, or do users have to explicitly choose one?

There's CARP both in front and behind, so users are instructed to SSH to jh.tsd.uio.no, which is the CARP address, and they reach whichever host is currently the master. So it's a failover mechanism, not a load-balancing mechanism.

Question: if I understand correctly, users have to connect to a desktop inside the perimeter and access the data from there. Is there some way to prevent users from exporting the data from their desktops using DLP, data leak prevention, tools? You mentioned that the files are already checksummed with SHA-256, so using DLP with this information sounds quite easy.

First of all, they get a remote desktop, but they can't copy and paste out of that desktop; they have to use the file lock to send data out, and all accesses are logged. So we're back to what I said earlier: we have to trust our users. But there is also the legal point of view, where at some point our job is no longer to prevent the data from moving around but to be able to document that it has moved around, and that is what the checksum is for: we log the file name, the user, a timestamp and a checksum of the file. The system is useless if our users aren't allowed to extract the data at all. So it's difficult; I hope that answers your question.

Question: how easy would it be to leverage your implementation for other sites, like other supercomputing sites at universities that have similar requirements?

It's a huge system, really a huge system; it's not a product. But there are components that can be reused, and we know there are other institutions, such as Kungliga Tekniska Högskolan, the Royal Institute of Technology in Sweden, that are implementing similar systems, and they've expressed interest in our file lock implementation. That is actually a self-contained piece of software: if you have a need to copy data from A to B and log it, that's software you can actually build and use. It's under a BSD license, but I don't think the Git repository is publicly accessible at the moment; I'll have to move it to a different one. The OTP code is something I wrote myself. It's still somewhat experimental; it lives in the OpenPAM SVN repository. When I initially wrote it two years ago, there was no other BSD-licensed OTP implementation except a command-line one for OpenBSD's authentication framework, whose name escapes me. Now there are plenty.

Last question: how usable is the SSH connection and the local port mapping for end users? Do you provide some third-party utility, for Mac users and Windows users, something like plink?

We document how to set up PuTTY, basically; the screen I showed documents how to use SSH. Unfortunately, there's no good way to distribute a PuTTY configuration: PuTTY does not use file-based configuration, it uses the registry, so we would have to distribute pre-generated .reg files that install registry keys to configure it. Instead we just tell people how to do it, and most of them manage to figure it out. The follow-up question was whether batch files with plink could be used instead. I haven't looked into it myself; I think somebody did and decided it wasn't as simple, and there's the two-factor thing, so I'm not entirely sure why we chose not to do that. We did look into SSH implementations other than PuTTY, but we ended up with PuTTY. It's not really difficult to configure.

Okay. Thank you very much.
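As a closing illustration of the tunnel discussed in that last question: users set this up by hand with PuTTY or OpenSSH, but the same local port forward can be sketched with the third-party paramiko library. The hostnames, the username and the plain password authentication below are assumptions; the real login also involves the one-time code.

```python
# Sketch of the local port forward users set up with PuTTY or OpenSSH:
# point an RDP client at 127.0.0.1:3389 and it is tunneled via the
# jump host to the virtual desktop. Names and credentials are made up.
import select
import socket
import threading

import paramiko  # third-party: pip install paramiko

JUMP_HOST = "jh.tsd.example"            # hypothetical jump host
DESKTOP = ("desktop1.p42.tsd", 3389)    # hypothetical RDP target
LOCAL_PORT = 3389


def forward(local_sock: socket.socket, transport: paramiko.Transport) -> None:
    """Shuttle bytes between one local TCP connection and an SSH channel."""
    chan = transport.open_channel("direct-tcpip", DESKTOP,
                                  local_sock.getpeername())
    while True:
        ready, _, _ = select.select([local_sock, chan], [], [])
        if local_sock in ready:
            data = local_sock.recv(4096)
            if not data:
                break
            chan.sendall(data)
        if chan in ready:
            data = chan.recv(4096)
            if not data:
                break
            local_sock.sendall(data)
    chan.close()
    local_sock.close()


def main() -> None:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    # Simplified: the real login prompts for a one-time code and a
    # password via keyboard-interactive authentication.
    client.connect(JUMP_HOST, username="p42-alice", password="...")
    transport = client.get_transport()

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", LOCAL_PORT))
    listener.listen(1)
    while True:
        sock, _ = listener.accept()
        threading.Thread(target=forward, args=(sock, transport),
                         daemon=True).start()


if __name__ == "__main__":
    main()
```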