So now, here's Jon "maddog" Hall, no need to announce him, and his topic is Two Contests, Almost No Waiting. Please give him a warm round of applause. Thank you. I will explain the "almost" in the course of the talk, but it was a slight change that happened recently. First of all, for the people that do not know me: I was inspired yesterday by Bdale's talk, where he explained his history with Debian. And although I haven't had quite as tight a history with Debian, I have had a history in computer science, and particularly with Linux since 1994. I've had a large number of different jobs. A lot of people think of me as this kind of a marketing, somewhat technical guy who goes around giving away free software, ho, ho, ho. But at one time I was a programmer, and I did teach operating system design and compiler design when I was teaching in university. So I do have a technical side to me also. In 1994, I saw Linus Torvalds for the first time and saw Linux for the first time, while working at Digital Equipment Corporation trying to promote the Alpha system. I saw this as a perfect target to do computer science research, with the ability to distribute the code freely after you've done the research, and particularly in large address spaces, which the Alpha had. I saw this as a perfect thing, and then later on I saw it not just as an educational and research vehicle or a technological thing for computer geeks, but as a commercially valuable system. And so in 1995, I became the executive director of a small organization called Linux International, which was made up of some very small fledgling companies that were trying to promote Linux. And at that time we developed the Linux Mark Institute to protect the Linux trademarks so that anybody could use them for any legitimate purpose. We helped to create the Linux Professional Institute for doing certification that is distribution neutral, and we created the Linux Standard Base project, which continues to today.
And then I've spent the last 20 years trying to promote Linux worldwide to companies, to universities, to governments. So that's a little bit about me. Who are ARM and Linaro? Most of you know that ARM is a corporation that develops an architecture for CPUs, GPUs, and other types of processing units. They design the architecture and license it out to companies like Broadcom, Samsung, and others to actually create chips, unlike Intel and AMD, who both create the architecture and manufacture the chips. A few years ago, what was happening was that all these companies producing these chips were hiring, say, 50 engineers apiece and trying to port Linux onto their chips and onto their systems on a chip, SoCs, and other types of boards. And then all of them would be sending a patch to Linus Torvalds to say, oh, Linus, here's a patch for the kernel that handles the memory management system and ARM architecture, and Linus would end up with 50 patches all purporting to do the same thing and, of course, having different code in them. And he was getting fairly exasperated with this. And so a friend of mine, David Rusling, who actually wrote the first bootloader for Alpha Linux, called MILO, was a fellow at ARM, and he decided to start an organization called Linaro, which has the idea of cooperation among all these companies: coming together and saying, let's collaborate on this, let's all pool resources together, you assign two engineers to us along with some money, and we will create the patches that will go to Linus. And so now Linus gets one patch coming to him from Linaro. And we work on more than just the kernel. We also work on toolchains and testing mechanisms and various other things in the ARM ecosystem. Now, most of the people in the room, I mean, if you haven't heard... I'll ask the question: who has not heard of the Raspberry Pi? Okay.
Now, the reason the Raspberry Pi was created was that professors at the University of Cambridge were a little bit concerned that students coming into the university actually knew less about how computers worked than students of 20 years ago. Because the students of today coming into the university get a laptop bought for them, or maybe they buy it themselves, and on the laptop it says: you open this thing up and you void your warranty. Well, of course, we don't want to do that, because if the thing breaks, our warranty is voided. So you never open it up, and then you go and you get a game at your local store, you pull it down off the web, you pirate it from your favorite friend or whatever, and you put that on your computer, and you don't have to compile it, you don't have to debug it. This is not like it was 20, 25 years ago, when you had a Commodore 64 and you copied the program from the bulletin board over the net and you typed it in, and you got these things called syntax errors and you got these things like buffer overflows and all sorts of weird nasty stuff, and you had to figure it out. And so the professors were really, really upset about this, and they decided to create a computer that purposely did not come in a case, purposely was small enough to fit in a student's pocket, and purposely was cheap enough that if you blew it up, it wasn't the end of the world. And they stuck these things on there, magical things called GPIO pins, general purpose input/output, so that you could cause these pins to put out signals. And then all of a sudden these prototyping boards showed up that people could start designing things with, and then there was a website that shared the designs with different people, and the source code was published. Does this sound familiar to you? This is why we did it 25 years ago. Only back in those days the computer might cost a little bit more, and the circuit board was fairly expensive.
I still remember the day when a transistor, a single transistor, cost $1.25. And the Raspberry Pi was only three watts of power. And when they started, they thought they'd only need a thousand of them. And then they updated it to 10,000. And then by the time they took the first order, it was 100,000 systems, and in the first year they manufactured a million. And this year they're looking at 5 million. So it's been a little bit of a success. Now, it of course spawned some clones. Along came some companies over in China that said, hey, we can create the Banana Pi. And we're going to make it a little bit better. Dual-core ARM. Now, this is a little bit more interesting from a computer science standpoint, because if you're putting an operating system on there, having dual cores means that you can actually have one CPU interrupting the other. And you can have race conditions and all sorts of other interesting things. It had a gigabyte of RAM. It had a SATA connector, which is really nice because it gave some decent input/output. And it had gigabit Ethernet, versus the Raspberry Pi only having 10/100 megabit per second Ethernet. Slightly more power, but hey, it only cost a couple of dollars more. And then, of course, the Raspberry Pi people came back with an even better Raspberry Pi. Four cores, now that's really becoming interesting. Unfortunately, only a gigabyte of RAM. And no input/output device. And it's still kind of working off of the USB 2.0 bus. So then we came out with the Banana Pro, which had gradual improvement. But when you actually take a look out on the internet, you find all these tiny little computers, and they are all very interesting in one way or the other. And you even have some interesting Intel computers, like the Galileo over there. I'm not afraid to mention it. There it is. But they're all computers that people can touch and put our favorite operating systems on.
And then you have things like Adapteva's Parallella board, which not only has a two-core ARM A9 processor, but also has a field programmable gate array. Now, we've known how to build these for a long period of time, a very long period of time. It's just that it probably would have taken up this entire room to build what's there on that chip right now, and would have cost many thousands of dollars, when you can buy this chip for a much lower price. It has some digital signal processing chips on it. And on this particular board, it even has a 16-core or 64-core CPU where each core has 32 kilobytes of memory that's associated only with that core, so that you can load your programs and your data into that core and work on them in parallel. And people go, oh, maddog, 32K of memory, what can you do with that? Let me tell you something. I programmed for the first 10 years of my life with less than 32 kilobytes of memory on my computer system. And if you remember, CP/M used to work in 64K of memory. That had the operating system in it and a whole bunch of other stuff. And we got along fine. And it only uses five watts of power. I will point out that for a certain number of months, this was the fastest, most power-efficient Bitcoin mining CPU on the market. And then there is the latest in my little range of computers, the BBC's micro:bit. The BBC, the British Broadcasting Corporation, has gone in with some other corporations to create a tiny little computer system that they want to make cheap enough to actually hand out for free to every seventh-grade student in the entire United Kingdom. And they're going to be working this into their programs: Doctor Who will be having a micro:bit. And, you know, East Side Street or whatever they call it will be having micro:bits worked into the programming to interest the students in programming them. So besides all of that, why am I showing you all of this? It's because of things like this. This is my latest little hobby.
I wanted to put together a nice little, in effect, Beowulf computer system, and I made it out of Banana Pros. And it worked very well. I had altogether six gigabytes of RAM in it and six HDMI ports, so I could have a decent number of screens. And six SATA ports, although I only used two. So if you notice, there are levels to this. There's the bottom level. And on the very bottom level, it has an eight-port gigabit switch. The next level up has two one-terabyte disk drives taken out of a notebook. And above that, the bottom two Banana Pis are controlling those two disks. And then up above that are just, in effect, CPU units. And the whole thing uses, including the switch, 70 watts of power. But the biggest thing for me was the fact that it actually fit in a standard-size briefcase. And so I can take this standard-size briefcase around to universities and stuff like that and, in a small amount of time, put it together so that I can demonstrate various things to them. Now, what type of various things can I show? Well, I can show high-performance computing using OpenMP, MPI, you know, a variety of other different types of free software. I can also set it up as a highly available system doing mirrored disks, and be able to show heartbeats and failover and that type of thing. I can use it to do heterogeneous computing. I can put, pardon me for saying this, other operating systems on there, such as BSD or the Hurd or Erno and stuff. And I've been told that sometime in the future I might even be able to put Windows 10 on one of those systems and be able to show heterogeneous computing with that. Likewise, I can do heterogeneous systems administration on there. Now, I know that we can do this very simply by using virtual machines and things like that, but, you know, it isn't as exciting as when you can actually see the lights blinking and hear the disk moving and stuff. That type of stuff.
So I wanted it to be very modular, and as time goes on and as I find less expensive and more powerful CPUs, I can unplug some and put some more in there. Now, this was the first version of this. It's the first prototype, and I made it out of things like Plexiglas, which is very expensive, and stuff. And so I want to go back through now and substitute other materials, with the drawings of where you drill the holes and everything else, and reduce the cost, so it should be able to get down to less than $400, maybe even less than $300, and actually have more powerful processors up the stack. When I say more powerful processors, I'm talking about things like this. This is an 8-core ARM64 chip inside of here with a gigabyte of memory, and this particular system costs about $100 US. And we'll talk more about this later on. So contest number one is sponsored by actually three different companies, one of which is called Inveneo. Inveneo is a nonprofit organization in San Francisco. I first met them at one of the first LinuxWorlds in Moscone Center in San Francisco many years ago. They are in the business of bringing electronics to places where electronics probably shouldn't even be, or, not shouldn't be, but couldn't normally be, because there's no electricity there. No electricity, no telephone, no nothing. And they bring telephony and electronics to these places. They have satellite links to reach up to a satellite for the communications, and they have some method of creating electricity, whether it be a solar panel, a water wheel, or, in this particular case, a bicycle with an alternator on the back of the bicycle charging batteries. Interesting side story on this: they would go into these villages in Africa and they would say, okay, here's the system. What type of power source do you want? A solar cell or a bicycle? And the village chiefs would look at them and say, we want the bicycle. And they would ask why, when the solar panel is so easy.
It's just, you've got plenty of sunshine. And they'd say, yes, but when the solar panel breaks, it's a long way to get one. And number two, it costs a lot of money, which we don't have. But we have lots of broken trucks and lots of broken bicycles. And so we can take an alternator out of the broken truck and we can fix the broken bicycle to pedal. But most importantly, the bicycle creates a job for somebody pedaling it. It shows a slight difference between their thought processes and Western thought processes. And some of these emerging... I don't like the words "third world countries". I don't like that phrase at all. So I usually say emerging economies. But in this case I will use "third world countries", because another place they are very good is in disaster areas like New Orleans after Katrina: third world country. They fly in their telephony systems and create a telephony system in cases of disaster. And that's what Inveneo does. And if you're not familiar with them, go to their website; they've done some amazing stuff. So they had this idea that, besides bringing telephony systems in, they could bring in a whole micro data center that could run off of solar power, and make it so efficient and so dependable that these people could now have greater communication and greater capability to run stuff. So they created this contest to design what they call the micro data center, using up to 15 small ARM boards hooked together. They have a 16-port gigabit switch in there. 10 SSD drives to hold data. And then it should be able to run off of either a 12 or 24 volt solar panel being plugged into it. They wanted to have a UPS built into the system, so that in case the solar panel came unplugged or something like that, the system itself would still be alive and the SSDs would have a chance to shut down normally. They needed to have it passively cooled, because, as you know, the fan is the first thing to go, particularly in a salt water environment.
And they wanted it to be portable and manufacturable at the lowest possible cost. They wanted to have a Faraday cage, because a lot of times these are used in telephony situations, and telephone standards say you need to have stuff in Faraday cages. And in their specifications they said, we want to run a LAMP stack. Now, I looked at the LAMP stack thing from a computer science perspective, and I said that they didn't really have any good answer at the time, and we'll get a little bit more into that later on. They separated the contest into two parts. Part one was developing the hardware, and they said you have to be in teams of three to seven people. We don't want individuals doing this, but three to seven people in the team. The first prize was 10,000 US dollars, and the second prize was up to seven Nexus 7 tablets, for the seven people in the team. Now, part of the specification was that everything which you donated, everything which you had in your design, had to be open, so that they were free, and anybody was free, to take any of the designs and combine any of the components together into an overall über-design. You can sign up. This part of the contest is already over. They have actually announced the winner, but they haven't actually put up the specifications for the winning prize. We're expecting that to happen any moment. I, with a couple of other people, did make an input to this. We called it the Pirata entry, and we estimated its cost at 2,600 dollars complete: UPS, systems, everything. We think we could bring that down in manufacturing cost. As far as the design of the box, we designed it to be very compact. It fit an overhead airplane cabinet, so you could carry it on as luggage, and it was completely redundant. The only thing that was a single point of failure was the switch, and because the manufacturer said it was typically a 75-year mean time between failures on the switch, we felt that that was okay.
We did have failover with SATA between all the boards, and we thought it was very good. We placed ninth out of the 50 different entries, so I'm waiting to see what the upper eight entries look like. This is the next step for the contest, and this is where Debian comes in. I'm coming here to tell you about this because I think that the people in this room and the people watching this video are probably the best people in the world to design the levels of software that would go into this system. They haven't defined what their needs are yet, but I have defined a small list that I think should be there. Easy to install, and there's a talk right after this that I'm going to hang around for that I think will help with that. Almost easy to manage. Notice I didn't say easy to manage, because no software is easy to manage, but if we can make it almost easy to manage, that would be good, because a lot of these units are going into places where they probably don't have a lot of experience with computer science. Make it scalable, so you can start off with maybe two or three processing units or disk units and then go up from there. High availability of course is stable. But we don't know, and they haven't defined yet, whether maybe we should think of this as a cloud type of device, perhaps a local cloud device with further storage in some other cloud, kind of a mirror site, or whether it should be a client-server type of relationship or a high-performance computing type of relationship or, I don't know. So for the people in the room, you can start thinking about this, and as they announce the contest you're welcome to join it, and I think that we might be able to have a good solution coming out with Debian as the basis. So that was the first contest, and now I'm going to talk about a second contest, yet another computer contest. GNU/Linux, if you think about it, is kind of like 45 years old.
It started with Ken and Dennis sitting down at their PDP-7 and writing all of the kernel in machine language. In fact, the entire operating system was written in machine language. C hadn't been invented yet. And then when the PDP-7 kind of ran out of steam and address space, they bought a PDP-11 and wrote the entire kernel in machine language again, and after the second time Dennis said, that's it, I'm not doing this again. He invented C, and then they wrote the entire kernel again in C, and they said, phew, that's the last time we'll have to do that. And then they went over to an Interdata 8/32, and all of a sudden they realized, oh, this is a different architecture, so we have to write the kernel again. You know the history. But back in those times 64K of memory was gigantic, and therefore a lot of the programs were written using data flow types of techniques and things like that, so that the program and the memory system could fit into a certain type of space. But things have changed. No longer is memory $128,000 for 64K, as I one time paid, but it's now more like ten dollars for a gigabyte or less. That's not to excuse bloated code, because, as we all know, the cache of the CPU still tends to be rather small, if it's there at all, but there are multiple levels of cache, that sort of thing. CPUs are multi-core, and even then we may have multiple CPUs that are multi-core on the same board. And algorithms have changed and become more prevalent, and algorithms have become more acceptable as the memory sizes have become larger. Back in the day, pipelining was something you did in plumbing, not in electronics, and cache was something that you put in your pocket on your way to the bank, but both of these things are now prevalent simply because the electronics have become cheap enough to allow them to become prevalent. I remember when the GNU compilers produced code that was 30% less efficient than commercial compilers, and today a lot of the GNU compilers are toe-to-toe with commercial compilers in terms
of efficiency, and we also have other free compilers available. And the need for assembly language has decreased, and sometimes it's actually detrimental to have assembly language in your code, particularly inline assembly language that throws off the optimization techniques that the compiler has been generating up until that point. So we're announcing maddog and Linaro's GNU/Linux optimization program. Now, I don't have any type of fancy name for it other than that. But what we did, in fact what Steve, sitting here in the front row, did, was go through the code in Debian and in Fedora and find out that there were 1,400 different modules in there that had ARM 32-bit code or assembly language code in them, and therefore were not portable to ARM64. And that, of course, is a problem, because ARM wanted to bring out their ARM64 chipset and they wanted to be able to have Linux running on it. But I came along and looked at this and said, you know, there's also an issue of performance in this, and maybe we should be looking at these pieces of code, which is, porting them, but also spending some time to do some optimization. And so the goals of the contest are, number one, to make sure that all these modules do work on 64-bit ARM: compile and test them. Sometimes it's just a matter of testing them. Now that Debian has an actual distribution, Jessie, that formally supports ARM64, it may be that people can just take and test the code, check it off and say, yes, it works. Or we find that it doesn't work; then the people have to go upstream, to the upstream people, where it hasn't gotten into the package yet, and say, are you working on this? And if not, if that doesn't work, well, then they would put a bug entry into the bug tracking system of the project and say, your code needs to work on ARM64, and then they can go in and perhaps do the work themselves and submit the patch to the upstream developers. But besides all of that, to take a look at some of these packages and say, can I improve the performance of
them? Can I spend some time? And the interesting thing, too, is that the performance is not just in the speed of the application; we'll get to that in a moment. And then to take all the information that we learned from these performance improvements and try to create a course where we can teach performance programming. Now, what do I mean by that? Well, in the old days performance was: you plugged your computer system into the wall, and how fast did your program run? And today performance is measured, in some cases, by how long my battery lasts. And it's not just about phones, but it's things like: if you're Google and you have a server farm, does this mean that you only have to buy 9,000 servers instead of 10,000 servers, that you only use 900 megawatts of electricity instead of a gigawatt of electricity? That's what performance and efficiency mean today. And so the categories of performance that we would like to measure are: memory utilization, since you may still be going into an embedded system where you don't have 16 gigabytes of RAM or 32 gigabytes of RAM, but instead you have half a gigabyte of RAM or less; and cache utilization. A friend of mine named David Mosberger-Tang did an experiment with the Alpha processor. He took two large arrays and multiplied them together using the same multiplication techniques you would learn in algebra. He then inverted the second array, did the multiplication, and inverted the answer, which gave you the same result. But because the second array was inverted, it meant that every single access of the data tended to access it in cache. The first method meant that almost every access was a cache miss, and therefore the second technique ran 40 times faster than the first one, or, looking at it another way, it operated in one-fortieth of the time. Now, these were very large arrays, obviously, but he dealt with very large problems. And another thing that Steve found was that in a lot of different places the same code had been cut and pasted or copied, and maybe this is
something that we should go back into the compiler for and say, let's create a compiler intrinsic, so that instead of cutting and pasting this code, when there is an improvement, we make the improvement to the compiler and it improves every place at the next compilation. And there may be more categories of performance that we can be looking at. So we have some suggested prizes for this. They're not huge prizes like a notebook from HP or something like that, but when you go to the Linaro site and you sign up for the contest and you do one port, where you test one program and say, yes, this works, then you get a fantastic Linaro golf shirt, and you also get 20 points, and I'll talk more about the 20 points later on. The next thing you get is your name entered into a contest, and that contest is for you to win a free, all-expenses-paid trip to Connect. Now, what is Connect? Connect is a meeting held twice a year where all of the Linaro engineers come from all over the world, along with all of Linaro's member companies, and they talk about the projects they're working on face to face. It's like a DebConf, but we hold it in a really nice hotel with really good food and a lot of beer, and I think you could look at both Steve and myself and realize that we like good food and good beer. And we typically have nice events to go to. For example, one year we had it in Dublin, and we went to the Guinness Brewery for a long time. And one year it was held in Hong Kong, and it's a very nice city. We have these day trips and stuff. This year, coming up, is one in Burlingame, California. A lot of people go, oh, it's just Burlingame. We're going to go to the computer museum and have a nice reception there as part of Linaro's fifth anniversary. The one after that is going to be in Bangkok. Now, for those of you who live in Bangkok, maybe that's not such a great trip, but you might be able to come to the USA or Europe in a future Connect. You would get the ability to come to Connect and spend a
week there talking with the engineers and things like that. Now, the final thing... oh, and so, if you do one port, you get your name put into the pool; you do two ports, you get your name put in twice; three ports, you get your name put in three times. So if you put 10 ports in, you have a 1 in 140 chance of going to Connect. That still doesn't sound like that great of a deal, but your name stays in the pool, and we do that drawing twice a year. So if the contest goes for four years, and I'll get to the length of the contest, then you have four times that 10 ports, so 40 chances out of 1,400 to go to Connect. So it mounts up. It's better than being struck by lightning or winning the lottery. Finally, you get an accumulation of 20 points towards a goal, and part of that goal is to win one of these little ARM development systems from a program we call the 96Boards program. Linaro noticed that a lot of the 64-bit ARM chips came on boards that were relatively expensive for developers to get. Some boards may cost $600, and it's kind of, you know, out of the realm for a hobbyist. So they came up with a program to make a series of little boards, and this is one of them. They're all the same size, they all have the connectors in about the same place, and they all have the same type of mounting holes, and it gives a space in the middle of the board for Linaro's customers to be able to put their systems on a chip, so that they can then innovate with these and hand them out to developers at a relatively low price. As I said, the goal for this type of board, which we call the consumer developer board, is that it would cost less than $100 for either a 64-bit or a 32-bit system. We have another size of board, which has more stuff on it, called the enterprise board, which is more for servers and things like routers, and that has a goal of being under 300 US dollars, but it would have more memory, more controllers, things like that. Now, the side effects of the... oh, I should go
back to this for a moment. Each one of these boards is going to have a certain cost, but every time you do a port or you do a performance enhancement, you'll get $20 towards the cost of buying one of these boards. And so if you do five ports, you will get the choice of one of these two boards. The bottom board is one from Qualcomm, which has a Snapdragon chip on it; it's also a 64-bit chip. So you get a chance of choosing one of the boards, and there will be more boards later on. You'll be able to assign your points to whatever board you want. Now, the side effects of all this are that you learn a really cool assembly language. Or not: you really don't have to learn ARM assembly language if what you do is eliminate the assembly language which is in the code. If you say, I'm going to do this only with an upper-level language, I'm going to let the compiler do this work, and you eliminate the assembly language, then you really don't have to learn the assembly language of ARM64. You'll maybe learn some code analysis techniques. I'm sure a lot of people in the room already know these types of things, but again, a lot of this is aimed towards university students, and we hope that college professors... we already have had college professors actually use this program to teach the students programming efficiency and assembly language and computer architecture, trying to make these programs be more efficient. If what you're doing is... went too fast. If what you're doing is... how did I do that? Sorry about that. If what you're doing is just the assembly language port, the things we ask for are very simple: you tell us what you did to create the patch to eliminate the assembly language, what versions of the compiler and OS you used for testing, things like that. If you're doing performance, we're going to ask you to give us a little bit more information: what was the performance of the application ahead of time, how did you do the testing, and what was the performance level of the application
afterwards. And there's yet another prize for performance: in any given Connect segment, the person who gets the best performance ratio increase will automatically get a trip to Connect. We would ask that you present this to the Connect engineers as the work you did. We have lots of resources that we keep building on: different books on different compilers and the way they work, and ARM assembly language. And the time frame of this is immediately; it's actually been going on for about a year now. And we're looking for a university to be a home to this course, so that we feed this information to them and they build up this course and then make the course open to other universities to share the information. So what should you do now? Go to the site, performance.linaro.org. You can read the information about the contest. If you decide to participate, you can then log in and type in some information, which we only need to be able to ship you your T-shirts and systems and stuff like that. Choose one of the modules to work on. Now, once you've chosen that module, it belongs to you. Nobody else can choose that module; it belongs to you. You start working on it. If you finish porting, you go back to the site and you mark it as ported. If you decide to increase the performance, you go back to the site, choose it for performance improvement, and mark it as improved. If you decide, no, I don't have time for this, then fine, go back to the site and release the module so somebody else can get it. That's perfectly okay. Because it's only recently that these boards have become available, we've been advocating that people use QEMU to do the porting. It's obviously a little bit harder to use QEMU to do performance work. However, since we're also advocating getting rid of the assembly language, you could do the performance work on almost any other matching type of hardware, such as Intel 64, and if you get the performance out of that, you can apply that to ARM64. And then you pick the module to investigate for porting or performance. With that,
it's the end of my talk. If you have any questions, I'll be around for the rest of the week. I am leaving on Friday to go to a different conference, but there's my email address, and there's the address of the site again. Thank you very much. Do we have any questions right now? I may have a minute or two. Okay, thank you.
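The cache-utilization experiment described in the talk can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the original experiment: the talk says the second array was "inverted", and the sketch assumes that a transpose is what was meant. The function names are hypothetical, and Python is used only for readability. In an interpreted language the timing gap will be far smaller than the factor of 40 quoted for the Alpha, so no timings are claimed here; the point is only the access pattern.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul_textbook(a, b):
    """Textbook multiply: the inner loop walks DOWN a column of b.

    In a row-major layout each b[k][j] access lands in a different row,
    which is the access pattern the talk blames for the cache misses.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0
            for k in range(inner):
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c


def matmul_transposed(a, b):
    """Same product, but b is transposed once up front (assumption:
    this is the 'inverted' array of the talk).

    The inner loop now walks ALONG a row of a and ALONG a row of the
    transposed b, so consecutive accesses touch neighbouring elements.
    """
    bt = transpose(b)
    c = [[0] * len(bt) for _ in range(len(a))]
    for i, row_a in enumerate(a):
        for j, row_bt in enumerate(bt):
            c[i][j] = sum(x * y for x, y in zip(row_a, row_bt))
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_textbook(a, b))
    print(matmul_transposed(a, b))
```

Both functions return the same matrix. In this formulation the answer does not need to be transposed back, because only the operand is rearranged. To see a real difference in run time, the same two loop orders would have to be written in a compiled language and run on arrays much larger than the CPU cache.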