The next talk is going to be by Diomidis Spinellis from the Athens University of Economics and Business.

Hello, good morning. Can you hear me? Yes, okay. Unix will be 50 in a couple of years. This is more than most people in this room. A lot has happened through these years: the 1980s, the 1990s, the 2000s, this century. How has the architecture of Unix changed? This is what I will discuss today. I will lay some groundwork about Unix, describe the sources I will be using, then important architectural milestones, evolution in numbers, and evolution in words. I will start from Thompson and Ritchie programming a PDP-11 through a typewriter in the 1970s, to Bill Joy and the CSRG team hacking Unix on a VAX in the 1980s, to this century's modern open source communities.

Unix was born in the AT&T Bell Laboratories, an amazing place: home to eight Nobel Prizes, three Turing Awards, radio astronomy and cosmic background radiation, transatlantic cables, the transistor, the charge-coupled devices that are the cameras in all our phones, communication theory, lasers, solar cells, the C, C++, and AWK programming languages, fiber optics, and CDMA telephony. Unix started its life when another project called Multics, which was going on in the 1960s at MIT together with AT&T Bell Labs and General Electric, folded; it didn't go as well as AT&T expected, and so Ken Thompson, Dennis Ritchie, Doug McIlroy, and Joe Ossanna went off and developed an unnamed system on a PDP-7, a smaller computer. Based on their work, which looked promising, they created a funding proposal, for a word processing facility of all things, because Bell Labs was typesetting patents, to continue the work on a PDP-11. The history of what followed was very long and very complex.

Why is this all important? First of all, because of the exemplary design of Unix, its technical contribution, and its impact on all of us: all of us probably carry some device running some sort of Unix, or a system developed on its ideas. Also because of the development model it used, and I will describe how open source came into that model, its widespread use, and, in the words of Doug McIlroy, its unusual simplicity, power, and elegance. In recognition of these achievements, its developers received the U.S. National Medal of Technology.

Regarding technology, what does Unix give us? First of all, a hierarchical file system; compatible I/O for files, for devices, for networking, and for inter-process I/O; the pipes-and-filters architecture; virtual file systems; and the shell as a user-selectable regular process, something we can change and replace. A number of technologies were also associated with Unix: C and C++, parser and lexical-analyzer generators, software development environments, document preparation tools such as troff and its cousins, declarative markup, scripting languages, TCP/IP networking, again all around us, and configuration management systems. Unix systems form a large part of the modern internet infrastructure in the world around us.

The evolution I will describe is based on three sources. First of all, the availability of the Unix sources, which goes back to the decision by Caldera to open the early sources. Based on that material, I've created a repository that records the history and evolution of Unix.
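As a side note on the compatible I/O mentioned above, here is a minimal sketch of the idea, assuming a POSIX system; the specific paths (/etc/hostname, /dev/zero) are merely illustrative stand-ins for any regular file and any device.

```c
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
	char buf[512];
	ssize_t n;
	int file, dev;

	file = open("/etc/hostname", O_RDONLY);	/* a regular file */
	dev = open("/dev/zero", O_RDONLY);	/* a device */
	if (file < 0 || dev < 0)
		return (1);

	/* Identical calls, different objects underneath. */
	n = read(file, buf, sizeof(buf));
	if (n > 0)
		write(STDOUT_FILENO, buf, (size_t)n);
	n = read(dev, buf, sizeof(buf));	/* yields 512 zero bytes */

	close(file);
	close(dev);
	return (0);
}
```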
I created this repository to record the evolution of programming style, to consolidate artifacts that were, I thought, very important, to record a recent history that is fading away, and to provide a place where any one of us can go and look at the history. This is a rough diagram of what it contains. These here are snapshots of various editions. At some point at Berkeley they started using the Source Code Control System, so there is a version-controlled part whose versions I recorded, but there are also snapshots, and it goes down through the open source editions: 386BSD, a patch kit, and then I continued along the path of FreeBSD. I could equally well have continued along OpenBSD or NetBSD.

The source really goes back to the 1970s. Here I've run git blame on a specific file of the C library. You see there are changes that happened in 2009, and you see that there are also changes that happened in 1979, by Dennis Ritchie. So on the same repository, if you run git blame, you'll find code that spans many decades. In numbers, it contains more than 13,000 days of activity, and about 1,000 people have contributed to it. This is a timeline of the releases. Early on there are only snapshots; we don't have precise changes, because people were not using a version control repository at that time, computers weren't powerful enough. But from one point onwards we have each change in each file, and we have specific changes. If we look at how files come in, you see here with different colors how files gradually get introduced, and then each contribution is smaller as new additions come in.

The second source: I want to see the evolution of facilities, and this you cannot see in the source code. Where is the virtual node part of the system? It's difficult to find that. So what I did is I went through the documentation and created a database that records, for all parts of the manual, when each page of the manual first appeared and disappeared. You can go on GitHub and find a trace of every system call, every command, every system administration command, every library facility, and the releases in which it is documented. Some parts, of course, exist without being documented, and some parts are documented but don't exactly work, but you can easily get a rough idea of what happened when, and this is what I used to discuss the milestones that we will see today.

The third source: I typeset a few manuals that were missing, such as the third edition, in troff and nroff, and the fourth edition, in nroff, so I could easily go through them and find important things.

Let's now begin by describing the architectural evolution. I will describe this in terms of milestones and also in terms of numbers and insights, and I will begin with milestones. The things I want you to take away from here are: how evolution happens in practice over a number of decades; a repository of more than one gigabyte that contains the history of a large and important part of Unix, a project to which you can contribute as well if you want to enhance it; and architectural lessons to apply, because what happened in the past holds important lessons for what might happen in the future, and for the good and bad things we can do regarding architecture. The architectural part is joint work I performed with Paris Avgeriou.
So, starting with milestones: we said that history begins with an unnamed system in the 1970s, and this thing, which wasn't even named Unix at that time, contained a number of very important architectural facilities. It has not survived in digital form, but volunteers have taken printouts such as these, OCRed them, typed them in, and, to verify that it actually worked, made it run on an emulator. Yeah, I think this deserves an applause.

The kernel itself is extremely small, just 2,500 lines of code, yet it can load and execute user-level commands. It provides a file abstraction, so you can write data into named files. It virtualizes the hardware interface: you don't need to know about the paper tape punch or about the teletypewriter. And it establishes ownership of files, so there are already file owners.

There's also layering and partitioning. These are the files, and you see that there are different files for the kernel and different files for commands. Where more than one file is needed for a command, the files are grouped under the same name. The commands call the kernel; the kernel does not depend on the commands, so there is clear layering here. The commands are not coupled together, and I showed you the partitioning through a file-naming convention. Everything is in assembly, by the way: .s, right?

There is process management. Here is the source code of the fork system call, which is still with us today. There is a descriptor manager: the descriptors that we get when we open a file, and use when reading from and writing to files, are there, with entry points through which you can obtain and release descriptors. There is a separation of file metadata and file naming. Do you recognize what this is? It's the definition of an inode, the index node, which contains the metadata: flags, the user ID, the number of links to the file, the size, but not the name, so that we can link files together using a different mechanism and establish hierarchical directories, which were not there at the time, by the way.

Devices as files: there was a system directory, what is now called the dev directory, that contained links to the console, the paper tape, and the second terminal; those were the devices they had on that poor computer. There's also file I/O, with open, read, write, seek, and tell system calls, and the file system abstraction, where you could create files, rename them, link and unlink them. Nowadays we don't create files, we open them with a create flag, but this is how it worked at the time. There was an interpreter there too: at least two commands, an indentation command and a command that converts things to lower case, were written in a language called B, which was implemented through a threaded interpreter. You see here a main, which later became the main of new B and the main we have in C.

After that system, a number of research editions came out that were actually Unix, with the name Unix, developed on a PDP-11. The PDP-11 Unix is a complete rewrite of the PDP-7 Unix: because the assembly language is different, and the system was written in assembly, everything was written from scratch, on a computer such as this. This has also not survived, but there was a memo at the laboratories that contained a listing of the source code, in somewhat better form. You see here various things, like the definition of constants, a link to panic, very important, and so on. And again, it contained many important architectural innovations.
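To make the fork system call shown earlier concrete, here is a minimal sketch in today's C; not the PDP-7 code, of course, just the same interface as it survives in POSIX.

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
	pid_t pid = fork();	/* one process goes in, two come out */

	if (pid < 0) {
		perror("fork");
		return (1);
	}
	if (pid == 0) {		/* child: sees fork() return 0 */
		printf("child: pid %d\n", (int)getpid());
		_exit(0);
	}
	waitpid(pid, NULL, 0);	/* parent: waits for the child */
	printf("parent: child %d finished\n", (int)pid);
	return (0);
}
```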
The architecture of the system, which I have in this diagram, is recognizably Unix: user commands, administration commands, a small library, a system call interface, an I/O subsystem, a process control subsystem, and some utility functions. I will come back to this diagram later on.

The system established a binary API by numbering system calls. These are the first ten system calls in the 1972 first edition of Unix, and I've looked those system calls up again in FreeBSD 11.1: you see the same names and even the same numbers. And those of us who are also using Linux, if we go and look at the numbering used for the Linux system calls, a system not derived from this source code, we'll see the same numbers again. It's remarkable that a binary API survived over so many years.

At that point the shell was established, and we may regard this as normal. It was not, if you were working on an IBM system: there, the shell belonged to the kernel, and only the kernel's gods could modify it. In Unix you can see this documented: the password file contains the program to use as a shell, so anybody could go out, innovate, and create new versions of the shell. It also abstracted standard I/O, allowing commands to read from any file and write to any file, not yet pipes and filters, and established interoperability through documented file formats. A section of the manual documents file formats; you can see here the documented file formats and the clients of each format. I find this very interesting because it establishes the use of convention over a rigid mechanism: you don't hard-wire how programs communicate; you document the format and allow anyone to read and write those files, with the permissions that are needed, of course.

Another very interesting thing is section 6 of the manual, which contains user-maintained programs. Some of these programs, such as sort, are now with us as programs that are part of the system. Can anyone guess where these programs were located in the file system? Yes? In the usr directory. Exactly. The usr directory we have is the user-maintained programs directory. And there was even a mechanism to force their documentation: there was a cron script running, and any command that appeared there as a binary without a manual page was deleted. This is why we have such good documentation. Things migrated from usr to other places, and they were documented.

It also established a tree directory structure, with the system calls we know and a number of programs that use them, and a mountable file system interface. This seems like a luxury for a system that had one, at most two, disks, but it enforces a single-tree naming scheme and does away with the ugliness of drive letters and devices, which systems I won't name here are still using to this day. We have seen the diagram of this system; we have a manual page here of the second edition.

Now we are moving forward. The second edition established a software library. Many of these subroutines still exist as functions in the modern C library; this demonstrates the power of well-chosen utility functions.
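Returning to the binary API point above, here is a small hedged sketch of entering the kernel by system call number through the generic syscall(2) wrapper; the numeric claim in the comment reflects the talk's observation about stable numbering.

```c
#define _GNU_SOURCE		/* for syscall() on glibc */
#include <sys/syscall.h>
#include <unistd.h>

int
main(void)
{
	const char msg[] = "hello through the binary API\n";

	/*
	 * Enter the kernel by number, bypassing the C library wrapper.
	 * SYS_write is the platform's number for write: 4 on FreeBSD
	 * and 32-bit Linux, matching the early Unix editions; x86-64
	 * Linux renumbered its table, so there it expands to 1.
	 */
	syscall(SYS_write, 1, msg, sizeof(msg) - 1);
	return (0);
}
```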
The third edition established the pipes-and-filters architecture. Interestingly, the idea had been put forward by Doug McIlroy many years before. You see here the date 1964, where he said we should have some ways of coupling programs like garden hose: screw in another segment when it becomes necessary to massage data in another way; this is the way of I/O also. So, just as we plug garden hoses together, we should handle I/O in the same way. Programs at that time could have their standard input and output redirected, but only that. At that point the team behind Unix went and transformed all programs into filters. It happened over just a few months: as many programs as possible were converted into filters.

The fourth research edition, which coincides with the fourth edition of the manual, brought us a number of things. Structured programming: a lot of code was no longer written in assembly language, but in a language called new B, derived from the B language I showed you earlier. Thousands of lines were written in it, and just 700 lines in PDP-11 assembly. The system became more modular: it had about 105 C functions and 50 assembly symbols; compare that to 250 global symbols in the first edition. By using this language it became more modular.

It also gave us a language-independent API. Nowadays we use Unix system calls from all sorts of languages; this is the first time it happened. You see here the pipe system call, which is documented in two ways: as a system call you can make from assembly, with the read file descriptor returned in register 0 and the write file descriptor in register 1, and as the call we can make from C, with an array of two integers in which you get the read and write file descriptors.

It established data structure definition and reuse through header files. Up to that point, anyone who wanted to use some structure just copied it into their part of the code and reused it in that form. Through header files we get portability: we can change the definitions in one place and introduce new functionality.

It introduced dynamic resource management: two routines, malloc and mfree, which don't have anything to do with the malloc and free we have in the C library. Interestingly, these were used for two purposes: for allocating space in main (core) memory and also in the swap area; with two maps, these two routines could allocate resources in either of the two.

It also introduced a device driver abstraction. Here you see 16 special files; not many of those do you see today, but there are many interesting ones, such as a phototypesetter and a voice synthesizer, in the 1970s! And the device driver interface that is still with us, something also common in Linux and even Windows: devices have open and close routines, a strategy routine for block transfers, and read and write routines. It introduced a buffer cache, to cache things from the disk into memory; these are some constants for buffer cache elements, and I found them still existing, with the same names, in FreeBSD 11.1.
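Picking up the pipe system call documented above, here is a minimal modern sketch: pipe() still fills an array of two integers with the read and write descriptors, and a fork turns the two ends into a producer and a downstream filter.

```c
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
	int fd[2];		/* fd[0] is the read end, fd[1] the write end */
	char buf[64];
	ssize_t n;
	const char msg[] = "data flowing through a pipe\n";

	if (pipe(fd) < 0)
		return (1);
	if (fork() == 0) {	/* child: the downstream filter */
		close(fd[1]);
		while ((n = read(fd[0], buf, sizeof(buf))) > 0)
			write(STDOUT_FILENO, buf, (size_t)n);
		_exit(0);
	}
	close(fd[0]);		/* parent: the upstream producer */
	write(fd[1], msg, strlen(msg));
	close(fd[1]);		/* signals end-of-file to the reader */
	wait(NULL);
	return (0);
}
```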
Then came the idea that some parts of the system could be written not in assembly language, nor in new B or in C, but as shell scripts; you see here an example of a shell script that compiles some data.

The sixth edition is famous as the subject of the most widely photocopied underground document: the commentary on the sixth edition written by John Lions at the University of New South Wales in Australia. It gave us the portable C library, a different version of the library aimed at portability, at being able to compile on different systems, which, as you can recognize from the names, influenced the design of the standard C library that we have with us today.

The seventh edition brought many, many important things with it, and it was the last edition that came out of Bell Labs for a long time. It introduced Unix as a virtual machine. One problem they had was that they were moving to new hardware, and they were finding it difficult to port programs from one machine to another. At some point, as Steve Johnson describes it, Dennis Ritchie came to him and said that it might be easier to port Unix to a different piece of hardware than to port each application from Unix to a different operating system. So they decided to use Unix as a virtual machine, as a way to move applications from one hardware platform to another, something we commonly do today.

It offered dynamic memory allocation, malloc and free, in the C library. This became an instant success: 20 C programs used it at the time, and two library routines used it as well. It introduced static analysis: the famous lint tool, which searches not through your pockets for lint but through programs, to find pieces of code that are not exactly good. It was introduced as a separate program because static analysis is resource-heavy and puts a toll on the compiler, so they kept it outside the compiler, following the Unix idea that each tool should do one thing well.

It introduced environment variables. Again, here there's an important element of convention: we write environment variables as KEY=value, but that's just a convention; there is nothing underneath that requires it. If we don't follow it some things will not work, but it's not a requirement. This required modifications to the kernel, the shell, and the C library.

Already in the sixth edition there was yacc, a parser generator; it was now complemented by lex, a lexical analyzer generator, and this completed the language development tools. Twelve programs were written using them. This established the idea that you can easily create a small programming language, and documented the idea of domain-specific languages: special, small languages that can do one thing very well. Do you want to search through the file system? You write a find predicate. You want to search for lines in a file? You write a regular expression. You want to transform a file? You use the m4 macro processor. And so on and so forth.

It also documented the file system directory hierarchy. There is a manual page, quite long, four pages, the first place this was put down. This is important for two reasons: first, because we still follow it; second, because it described a self-hosted system. This hierarchy contained the source code, the development tools, the libraries, the documentation. So the system is self-hosted: you take it, and you can build the same system from source somewhere else. There's no magic thing in the background, no temple where the gods build the system for you.
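Since the KEY=value form was described above as pure convention, a small sketch can show it: the environment is just an array of strings that the kernel passes through uninterpreted, and getenv merely searches that array by the convention (environ is the traditional external, assuming a POSIX system).

```c
#include <stdio.h>
#include <stdlib.h>

extern char **environ;		/* the raw environment: an array of strings */

int
main(void)
{
	/*
	 * Nothing below the C library enforces the KEY=value form;
	 * the kernel passes these strings through uninterpreted.
	 */
	for (char **p = environ; *p != NULL; p++)
		printf("%s\n", *p);

	/* getenv simply searches that array, relying on the convention. */
	const char *home = getenv("HOME");
	printf("HOME is %s\n", home != NULL ? home : "(unset)");
	return (0);
}
```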
We utilize this idea of a self-hosted system day in and day out today.

After the sixth edition, Unix went to Berkeley, where the Computer Systems Research Group got a copy and started doing research on it. The first and second Berkeley Software Distributions weren't complete systems; they introduced software packages. Using the established directory hierarchy and the idea that you can build a program through a makefile, they sent out tools such as the C shell, the ex editor, which later became vi, mail, Pascal, and the terminal library.

3BSD, through funding given in order to establish a system that could be used for running a research facility, introduced virtual memory paging. Up to then, Unix could run on a VAX, but it looked at the VAX as a very large address-space machine; it did not utilize its virtual memory facilities. It was a big effort, some 2,800 lines of kernel code, and it also introduced a number of special system calls, vread and vwrite, to take advantage of this. That was a violation of abstraction, and you don't see these calls anymore; they didn't live very long, but they show how easy it is to veer off from the good way of architecture.

In 4BSD two important elements were introduced. First, a regular expression library. We now think this is very important, and many scripting languages today live and die by regular expressions, but it proved to be a very hard sell. At the time there were five different implementations of regular expressions in the system, yet when the library was introduced just a single additional program was using it; by 4.3BSD another two were using it, among them dbx. Today, with a lot of effort, programs such as ed, grep, sed, and expr also use it. This shows that when you introduce an architectural feature late in a system's life, it is difficult to make it catch on and be adopted. Second, optimized screen handling. At the time there was a variety of terminals: you could hook a computer up to a teletypewriter, which was developed for sending telegrams, or to a glass CRT terminal, and each one had different peculiarities. So a terminal library was built in order to standardize this. We still use this library: screen-handling programs still use it, but for a different purpose, for backward compatibility with older systems.

4.2BSD was extremely important because it brought us networking. It introduced the internet protocol family, a socket interface for local and remote inter-process communication, a number of databases that were needed to make the internet work, and the pseudo-terminal driver, through which you can program something that looks like a terminal, so that you can write a terminal emulator and allow one piece of software to talk to another computer as if it were a terminal. The socket interface is controversial. It introduced a large number of system calls, and you see that not many of them are widely used; this is how many programs use each of these system calls. It shows that just throwing architectural elements into a system isn't always a good idea. There were arguments for and against shoehorning networking onto the old system calls, read, write, open, and close, but maybe the socket interface went over the top.
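On the shoehorning debate above, a minimal sketch showing both sides: creating a socket needs a new, socket-specific call, but the resulting descriptors then obey the old file I/O calls. I use socketpair here to keep the example self-contained, with no network dependence.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int
main(void)
{
	int sv[2];
	char buf[64];
	ssize_t n;
	const char msg[] = "sockets are file descriptors too\n";

	/* The new, socket-specific call... */
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
		perror("socketpair");
		return (1);
	}

	/* ...but its descriptors work with the old file I/O calls. */
	write(sv[0], msg, sizeof(msg) - 1);
	n = read(sv[1], buf, sizeof(buf));
	if (n > 0)
		write(STDOUT_FILENO, buf, (size_t)n);

	close(sv[0]);
	close(sv[1]);
	return (0);
}
```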
Another interesting version that came out of Berkeley is the Tahoe edition, which supported two CPU architectures, the VAX and the CCI Power 6/32. It did not do very well in the marketplace, but it caused the kernel to be split into an architecture-dependent part and an architecture-independent part; this is the Tahoe machine-dependent code, and this the VAX code. This split proved to be very useful when Unix was later moved to the 386 as 386BSD.

The Reno edition introduced networking facilities that gave router vendors the idea that they could use the operating system to build routers, and the virtual file system interface: the idea of the vnode, which means that you can write different file systems that plug into the system at the vnode interface. You don't need special code above that layer; the rest of the kernel infrastructure can hook up to any file system through the vnode interface.

It also introduced another interesting facility, funopen; you may know it as fopencookie in glibc. How many of you know it? Not many, you see. This is very interesting: it allows you to create a FILE, capital FILE, which you can read and write and open and close, and underneath it can be something that you program, something virtual. Although it's very interesting, because it was introduced late it is not widely used. You can use it, for example, to read things from the internet through HTTP, or to read compressed files, but it is not something that's widely used.

4.4BSD, one of the last editions that came out of Berkeley, introduced stackable file systems, the idea that you can, for example, stack something writable on top of a read-only CD-ROM through the union file system, and the generic system control interface, through which you can examine and modify the kernel with the sysctl function and a documented set of values that now contains more than 3,000 entries. This is something controversial, I think; it still lives in architectural limbo. Parts of the interface are undocumented, and it violates the idea that everything in Unix happens through the file system directory hierarchy. It shows that there are hard architectural decisions that need to be made, and nobody may know the correct answer.

As I said, the split into architecture-dependent and architecture-independent parts allowed a group of people to create 386BSD, which allowed the system to be used on very cheap hardware that was widely available. On top of it a number of patches were developed, because Bill and Lynne Jolitz did not have time to integrate all those things into the system, and this established the idea of organized community contributions. Up to that point, people at Berkeley or at Bell Labs developed the system, but not a community. So it changed Unix from open source software (it had become open source by that time, because Berkeley had developed a lot of code that, as government-funded work, could be made openly available) to an open source project, where people can contribute. A group of people took it from that point and worked on FreeBSD; this is the route I will discuss, but OpenBSD and NetBSD are equally important.

FreeBSD introduced a package manager, the ports system, that allowed other packages, other compilers, utilities and so on, to be brought in and used by the system, handling patches to the source code, compilation, installation, and dependencies. This allows the system to grow without burdening the base system. Up to that point every new utility was part of the base system, but nowadays, with thousands of ports, this could not continue.
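A hedged sketch of the funopen facility mentioned above, as it exists on the BSDs (glibc's fopencookie is the analogous but differently-typed call): a stdio FILE whose write side is backed by our own function, here upper-casing whatever is printed to it.

```c
#include <stdio.h>
#include <ctype.h>
#include <unistd.h>

/* The write function behind the virtual FILE: receives the bytes written. */
static int
upper_write(void *cookie, const char *buf, int n)
{
	char out[BUFSIZ];
	int i;

	(void)cookie;
	for (i = 0; i < n && i < BUFSIZ; i++)
		out[i] = toupper((unsigned char)buf[i]);
	return ((int)write(STDOUT_FILENO, out, (size_t)i));
}

int
main(void)
{
	/* No read, seek, or close functions: a write-only virtual file. */
	FILE *fp = funopen(NULL, NULL, upper_write, NULL, NULL);

	if (fp == NULL)
		return (1);
	fprintf(fp, "a virtual file, no device underneath\n");
	fclose(fp);
	return (0);
}
```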
FreeBSD 2 introduced a proc file system, for seeing various things regarding a process (again there is tension here, between system calls that get process information and getting it through the file system hierarchy), and dynamically loadable kernel modules. In FreeBSD 3 we have the Common Access Method (CAM) subsystem, which abstracts all the functionality needed to access storage devices, and Netgraph, kernel networking and a user library that let us build protocols through a data-flow model. FreeBSD 4 brought in the OpenSSL secure sockets layer port, which represents the idea of bringing in a large external collection of software: this is more than 1,000 files and more than 200,000 lines of code, with a lot of functionality at the library level and a very rich user-level command that allows you to use it. In FreeBSD 5 (you see that the version numbers are going fast now) we have GEOM, the modular disk I/O request transformation framework, which allows you to virtualize and create various levels of functionality beneath file systems. In 5.3 we have a streaming archive access library and a miniport network driver interface. In 7 we have the ZFS file system; in 7.1 the dynamic tracing toolkit; in 8 a packet capture library was introduced that allows user-level programs to manipulate packets; and in 9 a high-speed interconnect library. Again, you see a quick succession of versions and relatively few major architectural additions; I can discuss this later.

So let's move now to our observations, not regarding milestones but regarding numbers and insights. On the numbers side, we see a large increase in size, from a few thousand lines to millions of lines. What has that meant in terms of modularity? We see that as code size grew, people tried to keep modularity at a constant level, through the introduction of static declarations, for example, and through the use of #include directives. At some points complexity rose, but it followed a self-correcting path: the number of lines per function increased and then started decreasing; the nesting of statements (an if, an else, then a while, and another if in between) increased and then decreased again; the density of C preprocessor directives other than #include, which can make code very difficult to follow, again follows this self-correcting path; even the goto statement became less and less used. There is a way to count this, called cyclomatic complexity, essentially the number of branches that happen within a function, and we see this decreasing in the kernel, the libraries, and the tools. I find this remarkable, because there is no dictator there saying you shall decrease your cyclomatic complexity; it happens through a very rich and varied community, as if an invisible force is guiding it, perhaps Lehman's laws of software evolution.
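For readers unfamiliar with the cyclomatic complexity metric used above, here is a toy illustration; the usual counting rule is one plus the number of decision points in a function.

```c
#include <stdio.h>

/*
 * This function has four decision points: the while, the if, the
 * else-if test, and the && operator, so its cyclomatic complexity
 * is 1 + 4 = 5.
 */
static int
classify(const int *v, int n)
{
	int score = 0;

	while (n-- > 0) {			/* decision 1 */
		if (v[n] > 0)			/* decision 2 */
			score++;
		else if (v[n] < 0 && score > 0)	/* decisions 3 and 4 */
			score--;
	}
	return (score);
}

int
main(void)
{
	int v[] = { 3, -1, 2, -5, 0 };

	printf("score: %d\n", classify(v, 5));
	return (0);
}
```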
How do facilities grow over that time? As I said, we can trace this by looking at the manual pages. We see a continuous growth in the number of commands. Three phases in system calls: it seems that every time the system moved from one place to another, from the research labs to Berkeley and then to open source, the way system calls grew changed, constant or increasing. The same with the C library: constant, followed by growth. A constant growth in the number of devices; this is expected, because the hardware industry is constantly innovating and giving us new devices, or forcing us to buy new devices. A stopped growth in the number of documented file formats; perhaps they are no longer serving us, and software is becoming too complex today to document through a file format such as /etc/passwd or /etc/group. A constant growth in the number of system administration commands, so innovation there is still happening. And growth in the kernel interfaces, so things are happening behind the scenes.

Regarding observations on the insight side, what can we say about the quality of these changes, not the numbers? I see the following things. One very important thing: the importance of conventions over rigid enforcement. There are other systems that force you to go a specific way; they have strict APIs where you must create elaborate data structures to do something. In Unix, the preferred way of working is to establish a convention and politely ask people to follow it. We see that in file naming from the very beginning (the grouping of a command's sources under names like name1.s, name2.s), in the environment variables, in the C calling conventions, in the idea of prefixing C functions' symbols with an underscore, in where files are located.

Continuous evolution: evolution has not stopped; it continues to this day. I showed how things were introduced over the recent versions of FreeBSD. However, we've also seen a slowdown of architectural evolution: few major, important facilities are added over time. Maybe it's difficult to make a big architectural change when we are dealing with a system of tens of millions of lines of code; how do you make an architectural impact? Or maybe it's like a big ship that you need to turn, and it's difficult to make it turn.

We see a wide influence of the system's internal design: even internal design features that are not documented can influence other systems and survive over decades. We saw that with how the internal device driver abstractions have travelled and are used even by other systems: the case of the character and block device switches, and the strategy routine, which can be found in Linux. There are also features that are added and not used; we saw that with the regular expression library, the funopen file streams, and the stackable file systems. So architecture is not an easy task; it can be a thankless task as well.

We also saw that architectural features change role over time. These are the bones of a human, a dog, a bird, and a whale. We saw that the curses library, for example, was initially introduced in order to provide compatibility with various terminals. Now we're not using terminals anymore, we're using terminal emulators, and the library is used not for its original purpose but for providing backward compatibility with all the programs that were written for the various terminals.

There's also the idea of inadvertent technical debt: architectural decisions that appear reasonable at a specific time can turn out to be suboptimal. When the socket interface was established, its stream and datagram abstractions over many protocol families seemed like a good idea at the time, but we know that the internet moved in a specific direction: the vendor-specific implementations are not used, and we're now using the open internet standards rather than, say, the ISO standards.

The importance of portability as a shaping force: many times along its history Unix was moved from one architecture to another, and, as Dennis Ritchie described it, we want a system that first of all is easily portable unchanged, but is also easy to change, to move it and take advantage of different hardware architectures and the power of the hardware. This influenced the design of the system, the C programming language, and the whole architecture of Unix.

Then we saw competition between alternative architectural styles: for example, the competition between system calls, such as setrlimit and getrlimit, and the file system interface, which provides similar access facilities.
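A small sketch of the two competing styles just mentioned, assuming a Linux system for the file-based side: the same resource limit can be fetched through the dedicated getrlimit call or read as text from /proc/self/limits (FreeBSD would expose such data through sysctl variables instead).

```c
#include <stdio.h>
#include <sys/resource.h>

int
main(void)
{
	struct rlimit rl;
	char line[256];
	FILE *fp;

	/* Style 1: a dedicated system call. */
	if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
		printf("open files (system call): %llu\n",
		    (unsigned long long)rl.rlim_cur);

	/* Style 2: the file system interface (Linux-specific path). */
	if ((fp = fopen("/proc/self/limits", "r")) != NULL) {
		while (fgets(line, sizeof(line), fp) != NULL)
			fputs(line, stdout);
		fclose(fp);
	}
	return (0);
}
```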
We also saw the preservation of architectural form. I showed you the diagram of the first edition; have a look at it now, squint your eyes, and look at the modern FreeBSD 11.0. The architectural form has remained stable, especially in areas where the technologies haven't evolved: everything that's colored with the same color in this diagram is still the same in that system. Maybe the initial architecture was very well thought out, or maybe there is a platonic architectural truth, a truth that this system and similar systems have adopted as well. The diagram also points us to the growth of federated architectures: a number of subsystems now live in their own boxes, such as OpenSSL, GEOM, CAM, and Netgraph. You no longer influence the direction of the whole system; you build your own federation of architectures. And finally, the idea of having individuals with the imagination and creativity to come up with powerful abstractions, the taste to select the most appropriate ones, and the energy to implement them.

This brings me to the end of the presentation. At this point, I wish you to go out and architect something great. Thank you very much.