So thanks, everyone, for coming. My name is Ed Vielmetti. I'm from Packet. This is a talk about Packet and Qualcomm and Mesos: a window into the development of the Armv8 ecosystem. Let's see. The theme is loosely around Lord of the Rings, because you need a theme for a talk: "All's well that ends better." I've been working on getting software in general to work as well as it possibly can on Arm systems since I got my first demo account at Packet last October. I started working for them as a consultant in January and full-time in May. We announced this week a partnership with Arm Holdings to expand that out even further. And I'm missing the control here. There we go.

So who am I? I'm special projects director at Packet. Special projects director means you can do whatever you want as long as you can convince someone that it's working really well. I run the Works on Arm project for them. The goal of the project is to get every bit of software that anyone would ever want to run to run as well or better on Arm as it does on any other system. I live in Ann Arbor, Michigan. I've been online since the 1980s. That's me in 1987, no, 1988, after the Morris worm hit, doing a TV interview. That's an Apollo workstation, if anyone in the room can go back far enough in time. I think it's somewhere on YouTube. I had someone digitize it, but a friend of mine annotated it with the raptor annotations there.
So I've been doing this for a long time. Long enough to remember that the world didn't always use Intel processors. Long enough to remember that when the new hot workstation came out, you had to do a bunch of work to port things to it. Long enough to remember when we were running Linux on Intel back in the 90s, and it was on cheap little machines, the lowest-end and cheapest way to do it, and it just barely worked. So I've seen the evolution of new architectures before, and when I found the Arm ecosystem through the Raspberry Pi, I realized that there was something really new and special and possible there.

A little bit about Packet. I work for Packet. Packet is based in New York City. It's a bare metal cloud aimed at developers. We offer hardware that you can access through APIs or through provisioning tools like Ansible and Terraform. It feels for all the world like a cloud. It has the same kind of behavior as cloud computing, but from a hardware standpoint it's more like co-location. We offer servers by the hour and by the month. We have about 10,000 users and about 50,000 deploys a month, with deploys in under eight minutes. We have x86 hardware in a bunch of configurations, and Arm hardware. AMD hardware is coming soon, although that was news to me when it got added to the slide. We're agnostic about what you do with this once things get turned up, but we're really all about empowering people to build things, especially to build things on bare metal. So we give access to infrastructure that lets people do things like develop hypervisors and do fundamental infrastructure work without a lot of intervening software that we would run getting in the way.

So, a little bit about this talk, just to frame it for you.
I want to talk a little about why hardware matters in an age of virtual machines. I want to go through how one brings up a new system, and the layer upon layer upon layer of stuff that needs to be done, and why this path through the system is different this time through because of the increasingly interesting nature of hardware. I have a very small demo that I did in advance, so I can show you a little bit about DC/OS on Packet that sets the stage for why someone would want to do things on bare metal. And then I have some hardware to show you. So let's get into this.

Why does it matter? When someone who doesn't know a lot about the current state of the Arm hardware world thinks about Arm, their first thought is usually embedded. They're thinking about very small machines. They might have touched or used a Raspberry Pi, or their nephew or cousin or whoever had one. There's this perception among some people that Arm is embedded systems. That was true for a long time, and it increasingly is not the only answer.

Hardware is where innovation is happening at a pace that's really distinct from how software is innovating. We've gotten really good at doing generic workloads. DevOps has meant that the things that used to take a lot of people a lot of time to manage by hand have gotten much simpler, and the labor to manage a hundred machines has gone way down.
We've been able to take on bigger and bigger compute tasks. But you can only do so much with the abstraction of a virtual machine. In particular, new hardware that's tuned to the software task it's aimed at, or software that uses all of the capabilities of the chip that it's running on, offers enormous potential performance improvements over the generic CPUs that have characterized most kinds of data center workloads.

In particular, think about the mobile industry and Apple's new iPhone. This is bits of their core. In addition to the CPU in the new Apple hardware, there's specialized image processing capability. There's a custom GPU that they have, and this accelerates their ability to serve the needs of the mobile phone user well beyond what you would have if you just had ordinary compute power. The full expectation that I have is that we will increasingly see specialized compute showing up in data centers. It's not as easy to consume as an abstract virtual machine, but where your application fits it perfectly, there are substantial opportunities to keep making progress on performance.

Other things that could get really... let me turn this down a little bit. Other things that could get really big and special: we've seen artificial intelligence and machine learning having a substantial opportunity to engage with lots of data and lots of compute power. Autonomous vehicles have an enormous stream of data flowing off of them that has a chance of changing things both on the scale side and on the specialized side. Gaming. The whole Internet of Things, where you have sensors spewing out data from devices all around. And then in the telco world, the conversion of specialized telco infrastructure into something which gets called NFV, or network function virtualization, where there's an opportunity to turn on its head how telcos provide services to their users.
So there's a bunch of essentially new applications, well beyond the "serve another web page" piece of the world, and these all give a motivation for considering other hardware. This is a side view of Google's TensorFlow hardware. This is hardware from Microsoft that they have in Azure, again tailor-made to particular applications. And then, with those two pictures of pretty hardware, this is Qualcomm's 48-core Arm server that I happen to have with me. I could pass this around to take a look, if people don't spill their soda on it.

It's a piece of hardware that's in some sense very ordinary, in that it looks like a server, and if you plug it into the right sort of rack it's exactly what you would expect to go into a data center. But it's also very special, in that it has 48 cores, which is more than the ordinary server of the same size, plenty of memory, and an Arm chipset in it. So to make it work, you are motivated by the fact that it has a very dense core count, and you're motivated by novelty, because new things provide new opportunities.

So I want to do just a super brief... pick another window here, see if I get the right thing. Qualcomm and Packet have been working together to give access to that hardware to people who are doing software development. Qualcomm has a very long history of doing mobile development, and that has given them deep experience in Arm development. Packet has a usage model that allows people to get access to bare metal, so we don't have to provide a hypervisor. We're not assuming that we know what you want to do with it. We provide that for you.

The chip on this device is a Centriq 2400, with up to 48 cores in it. The architecture is based on Armv8. It's a 64-bit only system.
So that's suitable for data center workloads. The structure is that it has a number of pairs of custom cores arranged on a system bus ring interconnect. This gives a substantial amount of bandwidth inside the chip, and there are a number of optimizations so that instructions can be executed out of order to improve performance as well. Like I say, it's one of these things where, when I first ran into it, you say, "Huh, that's different from all the other things that I've worked on in the past." The opportunity is there, if you can make use of it, to potentially gain a performance advantage, or at the very least to have some alternatives in the data center so that you can move your workload wherever it's most appropriate. So I'll pull that back.

If you have new hardware, there's a certain, for lack of a better word, layer cake of things that all have to work before your brand new hardware becomes a boring, routine part of your data center. There is plenty to talk about here over the course of many beers, but it starts at the boot firmware. If you're providing bare metal access to hardware, the system has to boot, and you'd be surprised how few people in the software world really understand the firmware (thanks), really understand how the particular chips work, the BMC chip, how IPMI works, and a bunch of things that never register on people's radar as being important. Yet if you're going to automate access to this hardware, all that stuff has to be right. So the first couple of weeks of our access to new hardware is just booting the machine over and over again, getting it rigged into our system.

Once the firmware is running, you have to get a kernel. With all new hardware, there are almost always new kernel patches that have to be incorporated. Fortunately for us, Qualcomm and Cavium, whom we also work with, have been very good about mainlining their kernel changes. But you have to be ready to run the latest and greatest, and not
old things. We keep pace with operating system developments, again nearly always running the latest versions of things like Ubuntu, Debian, CentOS, and Red Hat Linux, and working very closely with those operating system developers to make sure that their system boots cleanly and nicely and neatly on these systems.

We're not even at applications yet. All the languages that you have have to work, and they have to work really well. They need to have libraries that incorporate all of the hardware instructions that your chip runs. And, I lost it, you keep going up the stack. There's a lot of work to be done. I don't want to minimize how hard it is to have a brand new system be boring. So boring that you don't know, when you're typing in front of it, whether it's an Intel system or an Arm system. Where everything works just as expected, where there are no surprises.

As we look at bringing DC/OS to Arm, each of these levels of the stack has to get some attention, in some order. I've been working on the languages and libraries front for a long time. That was the first thing: compile all the compilers. How well do the compilers work? Do they work all the time? Are there any bugs? Do the people working on them know that Arm is a target? The containers world has been interesting, because there was an announcement just this week at Docker of native support for multi-architecture containers, so that you can run a single container and the system will automatically figure out what architecture you're on and load the right image for it. That was two and a half years in the making to get that done. So, a lot of work.

This journey is a little bit different, because a lot of that work has been done. Software has accelerated, so things like DC/OS, Kubernetes, and Docker are in mass adoption. We're not talking small amounts of people.
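To make the multi-architecture container idea mentioned above concrete, here is a minimal sketch in Python of the selection step: one image name resolves to a list of per-platform entries, and the runtime picks the entry that matches the machine it is running on. This is an illustration under stated assumptions, not Docker's implementation. The index contents, the image names, and the helper names `normalize_arch` and `select_image` are hypothetical, and the machine-name mapping is deliberately incomplete.

```python
import platform

# Map names reported by platform.machine() onto the architecture labels
# used in this sketch. Illustrative only; real systems report more variants.
MACHINE_TO_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}

# Hypothetical image index: one entry per platform, all published under one name.
EXAMPLE_INDEX = [
    {"os": "linux", "architecture": "amd64", "image": "example-image-amd64"},
    {"os": "linux", "architecture": "arm64", "image": "example-image-arm64"},
]


def normalize_arch(machine):
    """Translate a raw machine string into an architecture label."""
    key = machine.lower()
    if key not in MACHINE_TO_ARCH:
        raise ValueError(f"unrecognized machine type: {machine!r}")
    return MACHINE_TO_ARCH[key]


def select_image(index, machine=None, os_name="linux"):
    """Return the image whose platform matches the given (or current) machine."""
    arch = normalize_arch(machine or platform.machine())
    for entry in index:
        if entry["os"] == os_name and entry["architecture"] == arch:
            return entry["image"]
    raise LookupError(f"no image published for {os_name}/{arch}")
```

Calling `select_image(EXAMPLE_INDEX)` with no machine argument uses whatever `platform.machine()` reports on the host, which is the "figure out what architecture you're on" step. An image that was only ever published for one architecture shows up here as a `LookupError`, which is roughly what an unported package looks like from the user's side.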
We're talking large numbers of people. And software developers, whether they're doing web development or whatnot, are coming into this world at what seems to me to be an increasing pace: automating everything, making everything easy to do, removing the uncertainty, removing the grief from running a big system.

For better or for worse, software moves faster than hardware. High-level components have been polished. Things like boot firmware? There are just not that many people working on boot firmware. It's not exciting. It's not shiny. I think we know all of them from the course of working with the various vendors we work with. There's a tiny... you know, a handful of pieces of code. They're not moving very fast, and we depend on them utterly to do what we're doing.

So we're not really, as an industry, all that well tooled to deal with diverse hardware. We don't know as much, from a cultural standpoint, about how to deploy and secure it. It's rare to get a full-stack engineer who goes all the way down to the firmware or the chip design. Much more often, people say they're a full-stack engineer.
They know both front-end and back-end and the operating system, but getting all the way down through the kernel to the driver level, and then to the firmware that enables the drivers? Finding those people is like finding unicorns. So it's a challenge, and it doesn't get easier, because people are increasingly consuming hardware as abstractions in the cloud. The only counterpoint to this that I would offer is that, fortunately, there's a robust market of single-board computers, people experimenting with things like a Raspberry Pi, where there's hands-on access to enough control software that people can do things. I would love it if that was the trend, and I hope it is. But there's really not a store that you can walk down the street to and buy a stack of hardware. That just doesn't exist anymore.

So I was encouraged to invoke the demo gods, just to prove that I knew at least a little bit about what was going on, and I realized that automating access to hardware means that sometimes things that take a long time are not suitable for demos. So I want you to look at the screen on the right-hand side. It took about 20 minutes to bring up a DC/OS cluster on Intel on Packet, and this is just to prove that I did that. No, this is not the commercial part. This is the "he knows a little bit about his stuff" part. I was able to get it up after a couple of false starts. I worked with our team to make sure that I could get this thing up. It's not really running anything, so it's not very impressive. It's not a proof of deep knowledge. It's a proof of work.

And this is not running on Arm. This is running on Intel. So if you want a sort of degree-of-difficulty question, it's: well, how hard would it be to port DC/OS to Arm? You can take it from a couple of levels. Do we have the fundamental automation?
Yes. So I have a target. I say, if I'm going to do this, it's not going to be a proof of concept that takes a week to install. If it's done right, it's a 15-minute or 20-minute start-to-finish operation. That's our goal. But to get it running, look at all these components that are supported. I need to have a Kafka story. I need to have a Jenkins story. I need to have Spark running. There's a bunch of community packages that people rely on all the time. So a port of DC/OS to Arm really means porting all the things. You want to get a start on it, but to be convincing you really want to be able to say: I wouldn't know what hardware I was running on if I double-clicked on InfluxDB. It would just be whatever the best system would be for that. And I don't want to minimize it: that's a lot of work. Fortunately, it's work that can be done in parallel. A number of these systems have already had ports to Arm started, and some of them are completely done. It's a certain amount of work.

So that system is running on Packet's infrastructure. Maybe you can see that it has four nodes running Container Linux in Sunnyvale, and that's the real thing behind it.

So, DC/OS on Packet. Well, I didn't do a 20-minute demo, but I showed you that I had done it. How do we do it? We use Terraform to deploy nodes. On Intel, it just works. We've automated the whole bring-up process. I don't need to know a lot about the system to have it run. Unfortunately, oops, DC/OS doesn't work on Arm yet. So I'm not showing you this board with DC/OS. What I'm motivating is: hey, it's an interesting enough piece of hardware.
It's an attainable goal. Due to the work of many, it's inches away. What Packet has done, with Arm's cooperation, is provide my time to help do community management and wrangling of this ecosystem. I put out a newsletter every week with news of what's going on, so that people can find each other and make connections. We give people access to the hardware so that they can log in themselves and do all the ports and run all the tests. We make sure that fixes make their way all the way upstream, so that instead of fixing it once and doing a demo, you fix it forever and get it installed and get it to be part of the system.

So what needs to be done? The Works on Arm project has really solid funding for a year, so it's a question of what I am going to do this year. Of course, I hope it lasts longer, but it shouldn't last too long. At some point you give up and declare victory.

For all projects, the path is as follows. You identify the contributors to the system and who the maintainers are. Make friends with the community managers, which is usually easy because they're usually quite friendly. Try it out yourself. One of the things that I did before coming here was to say, all right, let's try to get Mesos running on Arm. And the answer was: well, most of it compiles, except for a library from Google called glog, or G-log. It's an old version, and it doesn't know what an Arm processor is, because the code in the distribution is from 2007. That wasn't a complete stopping point, but it's indicative of how things are. So I opened up some bug reports. You start the process of engaging with the community.

I've done this sort of bring-up work in a bunch of communities: with the Go language community, with the Node community, with the Docker community and the Kubernetes community. Find the people who are working on it.
Find the people who care. Find the people who are interested. Open up the bug reports and start tracking issues. What you end up with is this very wide pipeline of progress. At any moment, things seem to be taking a long time, but you've queued up enough things in parallel that something good and new happens every day, and you get these small wins that you can build off of.

It's always important to contribute patches to these projects, and crucially, you have to upstream all the things. There is no way to make this work unless every bit of work you do is destined for upstream. People in the kernel world have learned this. It's very rare to see a hardware manufacturer in the single-board computer world not try to get all of their patches upstream, because they know that people have a very low tolerance for forks. They have a very low tolerance for having to hive off on their own and figure something out. In the Arm world, there are some vendors that are more comfortable working upstream than others, let's just say.

Kernel work is different from application-level work. Often there is some hesitancy to admit that things are not perfect. I'm not shy about telling people that they have a bug and sharing the bug reports. I think that's part of my job. But I really want to get from the point of "it worked for me once," or "I was able to do it at a hackathon," or "I did it in the lab," to figuring out all of the changes that need to be made, and engaging in the potentially slow process of going through and getting community buy-in, understanding the risks, and understanding how you can get a system to change over time. Like I say, the crucial change in the Docker community was multi-architecture support.
That was about a two-year process, from the initial architecture description to final, or almost final, right now.

So, the call to action. Hardware is an innovation layer. New hardware means we can approach new problems and solve things five to five hundred times as fast if our hardware exactly matches our problem. The work of developing new ecosystems is a worthwhile endeavor. The fact that you can engage with people who have a common task is a noble cause. It's likely to be a more common task across the industry as more specialized hardware gets into things. GPUs, for instance, would be another good example of a system where hardware gives you a substantial advantage. We're actively looking at all of this new hardware coming online and asking how you consume it in a way that's easy.

So the call to action is: come hack on hardware with us. We have equipment coming online: Qualcomm systems, Cavium systems, systems from other vendors. I'm working with people literally around the globe to port workloads to Arm, to fix bugs, and to engage with folks. A year from now, I should be giving the talk about how it all just works. How it's boring. How it's indistinguishable from DC/OS on any other hardware.

So the pitch is: come explore Works on Arm. worksonarm.com is the website. We relaunched that this week.
It's my community site to keep track of things, a sort of catalog of all of the logos of things that we know have good behavior on Arm, and some that are in progress and need love. I produce a newsletter every week, Friday at noon Eastern time, and send that out to a list which is growing. If you want a login, if you want a whole machine like this to use for your efforts, contact me. We have a process of working with Qualcomm. The hardware is currently under NDA, so it gets metered out fairly slowly, but as they get closer to going to market, it'll be easier.

And you can reach me. I'm not the only Vielmetti in the world, but if you can type that successfully, you can find me on any network, or you'll find my brother. packethost is our handle. The logo on the left is the Packet bot logo. The logo on the right is the Works on Arm logo. And with that, I'll step down and take any questions. Thank you very much.

Sure. Yeah, there's a spot for a TPM here, and that's a really good question. You will have root-level access to the hardware. I think that's possible, but why don't you drop me a note and I'll...

No, no. It's all bare metal, and you get the whole thing. So if your hypervisor can engage the TrustZone, you can make it work.

So the question is: what am I trying to get working on Arm? In the Works on Arm project to date, the first thing is to get all the operating systems running. So engage with people who are doing distributions and who've gotten tens of thousands of packages to work: Ubuntu, Debian, CentOS, Fedora, Red Hat. For years, right. So there's a baseline of operating systems. Compilers are sort of the next frontier.
They all tend to just generally work. The challenge is optimization, in some cases where, if you use the hardware instructions correctly, you get order-of-magnitude performance improvements. There's a certain amount of algorithm development that goes on in parallel with that: finding algorithms that work really well with this hardware, so that you can do a hashing algorithm or something like that. So at the edge, there are people who know the Intel instructions that deeply and the Arm instructions that deeply, and who are inventing things that work really well from the start on both sorts of systems.

Bob?

The question is about other sorts of things that you might do, like FPGAs. At Packet, we have that sort of stuff on our radar. But the question from a service provider perspective is: if you have an FPGA that you give to one customer, and they're done with it, you need to undo everything that they've done before you give it to the next customer. That's hard. I mean, that's sort of fundamentally hard. Right, right. And that's part of the same problem. Yeah. The same. Right.

So the question is: if you're doing custom development and making changes to exploit the hardware, how do you verify that you get good results, or the same results? The answer to that is not one verification suite, because the world has gotten a lot bigger. Essentially, the larger the project, the more likely it is that they have some level of test-driven development to support their CI system. There are good verification suites. Let's see. "Good" being hours of work that all has to go correctly, or minutes of work if it's a super fast machine. "Good" being thousands of tests. "Good" being tests that test against specific regressions and whatnot. It's variable. Some systems are better than others at testing, but packages are available.
Yeah, so the challenge of the next 12 months for me is continuous integration. As packages say that they have been ported to Arm and have successfully gotten something running, it's getting from there to the point where, every time someone checks in new code, you run the whole regression suite and make sure that nothing is broken. If someone has a new bug, you add a new test to test for that new bug. There are some systems that are hard to do that on. Testing distributed systems is intensely difficult. But certainly languages and libraries are attainable.

Right, you don't have to do as much yourself. Right, right. Yeah, I mean, I have occasionally seen cases where a mathematical result on Arm will be different from the one on Intel, and you file a bug report, and you dive into the algorithms to make sure that the libraries are doing the right thing. So you're actually using all those instructions and working your way up the stack, from the fundamentals up to higher-level things.

The other thing you have to do is continuous performance evaluation, where you make a change and ask: did you have a regression on various tests? Can you get forward progress? And is it forward progress on all the architectures? Because you don't want to do something on Arm that makes the Intel system slower. That's a failure.

So what Arm is bringing is high core density. Lots and lots of cores on the same die size and at comparable power consumption; more cores than alternative systems. So if your workload is by its nature really parallel, because it's I/O bound rather than CPU bound, or you have a lot of threads that you need to run, those systems tend to have much higher core counts for the same price-performance envelope. That's valuable in some workloads and less valuable in others.

With that, I will thank you. Look forward to having
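The continuous performance evaluation described above, where a change only counts as forward progress if no architecture gets slower, can be sketched as a small check in Python. This is a minimal illustration under stated assumptions, not a description of any tooling used by the project. The function name `find_regressions`, the 5% tolerance, the data layout, and the timing numbers in the usage note are all hypothetical.

```python
def find_regressions(baseline, candidate, tolerance=0.05):
    """Return sorted (architecture, benchmark) pairs that got slower.

    baseline and candidate map architecture -> {benchmark name: seconds}.
    A benchmark counts as regressed when its candidate time exceeds the
    baseline time by more than the given fractional tolerance.
    """
    regressions = []
    for arch, benchmarks in baseline.items():
        for name, old_seconds in benchmarks.items():
            new_seconds = candidate[arch][name]
            if new_seconds > old_seconds * (1 + tolerance):
                regressions.append((arch, name))
    return sorted(regressions)


def is_forward_progress(baseline, candidate, tolerance=0.05):
    """A change is acceptable only if no architecture regressed."""
    return not find_regressions(baseline, candidate, tolerance)
```

As a usage example with made-up timings: if a change takes a hashing benchmark from 2.0 to 1.0 seconds on `arm64` but from 1.0 to 1.2 seconds on `amd64`, `find_regressions` reports the `amd64` entry and `is_forward_progress` returns `False`, which matches the rule that speeding up Arm at Intel's expense is a failure.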