Good morning. Hope you're all feeling well, first thing Sunday morning. My name is Michael Dexter, and I have taken it upon myself to document the bhyve hypervisor. I will give some background, and my goal today is to whet your appetite, as they say, and to make it easy for you to give it a try and hopefully begin developing on it.

So, a little background on bhyve. Who's familiar with an unconference? Okay, Fred. An unconference is a novel idea where you congregate, you typically put up a giant board with stick-on messages, and you propose topics. You announce your topic in front of the group, put it up, and through a combination of voting and rearranging and horse trading you come up with a schedule. Few people have prepared slides, and it's a remarkably effective tool for getting an up-to-date topic discussed. MeetBSD California took place in 2010 and will take place next month. It's an unconference format with a few prepared sessions: you congregate, you propose, you break out into one of the sessions, and if you're lucky you repeat the next day.

So, you're doing it wrong. Somehow at MeetBSD 2010 everyone wanted to attend the virtualization session. I had never seen that. Everyone got in one room, every other breakout room was empty, and we discussed virtualization. So, 100% attendance. They're like the vendor summit meeting. There was a small, intimate discussion the next day, after everyone had had time to think about what they discussed, and the conclusion was: we need a hypervisor. And, as always, we had no idea who would pursue that. The result: at BSDCan 2011, Neel Natu and Peter Grehan announced bhyve at the Dev Summit to a rather stunned audience. You can find their slides on the FreeBSD wiki, and there is an audio recording of that and maybe a video. I suggest you check them out.
And a history of my involvement: about six months after BSDCan I said, hey, that's pretty cool, what's going on? They have day jobs, and I said, well, it's out there, and yes, we can help you get going. So at my little journal I put up an article on bhyve demonstrating everything I'd learned to date and how to build it, and in April I set up bhyve.org simply as a repository. I should give special thanks to the New York City BSD User Group, who provided some storage space for some prepared guest images, because throwing around 60 to 400 megabyte images is not something for everyone's server. There was a Google Summer of Code project this summer that is not completed, because the developer ran into an issue towards the end; hopefully that will complete. It is a project to include a BIOS for supporting foreign operating systems.

So, if you're not familiar with the convention, here is the "too long; didn't read" for those of you who want to run it. It requires Intel extended page tables. Guests are booted from disk images, unlike jails, which typically run from a file system. JHB, John Baldwin, recently made some changes to the in-tree assembler that allow the existing GPLv2-licensed assembler to handle the new instructions; the newer binutils package is GPLv3, and that concerns a number of people. bhyve is close to being ready to be dropped into HEAD, but not yet, and there's a bit of a chicken-and-egg issue, so I want your involvement to hopefully push that over the top. It can be dropped into 9.0 quite reliably, and I'll show you that; it's easiest to test on FreeBSD 9 and 10-CURRENT. There were patches 15 hours ago, the last time I checked, so a fire has been lit under the developers. It began life at FreeBSD 8.1, and guests are generally possible, with some workarounds, from 7.2 forward, which might be quite attractive to those with, say, embedded products who want to contain them on newer host operating systems.
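The hardware requirement above can be roughly pre-screened from the CPU feature lines FreeBSD prints at boot. What follows is a minimal sketch, not a definitive test: it assumes feature lines of the form `Features2=0x...<FLAG,FLAG>`, the function names `cpu_flags` and `screen_host` are made up for this illustration, and the sample text is invented rather than captured from a real machine. The POPCNT check is only the rule of thumb mentioned later in this talk (POPCNT tends to appear on the same processor generations as EPT); it is a hint, not proof of EPT support.

```python
import re

# Matches boot-message lines such as:  Features2=0x9ae3bd<SSE3,VMX,POPCNT>
FEATURE_LINE = re.compile(r"^\s*([\w ]+?)=0x[0-9a-fA-F]+<([^>]*)>")


def cpu_flags(dmesg_text):
    """Collect every flag name found in Features-style lines."""
    flags = set()
    for line in dmesg_text.splitlines():
        match = FEATURE_LINE.match(line)
        if match:
            names = match.group(2).split(",")
            flags.update(name.strip() for name in names if name.strip())
    return flags


def screen_host(dmesg_text):
    """Report the VMX flag and the POPCNT hint (hypothetical helper)."""
    flags = cpu_flags(dmesg_text)
    return {"vmx": "VMX" in flags, "popcnt_hint": "POPCNT" in flags}


# Invented sample text; the hex values are placeholders, not real CPUID data.
SAMPLE = """\
CPU: Hypothetical Intel CPU (sample text only)
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE>
  Features2=0x9ae3bd<SSE3,VMX,SSSE3,SSE4.1,SSE4.2,POPCNT>
"""

if __name__ == "__main__":
    print(screen_host(SAMPLE))
```

On a real host one would feed it the saved boot messages instead of `SAMPLE`. Either way, the processor's entry on Intel's site remains the authoritative place to confirm EPT.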
It should build with Clang quite easily. I haven't tried this, but for those with Macs, VMware Fusion on the Macintosh reportedly has VT-x, and perhaps VT-d, pass-through, such that you can run this under virtualization on hardware that supports it. If you try it in VirtualBox, you'll find that it presents a pretty generic PC without extended page tables. That's a cool trick.

So, in context, and we touched on this briefly before the talk: the proprietary and dominant player is VMware, and quite a few hands came up for those using that. It contains BusyBox, and I challenge any user to get the sources from the vendor, build BusyBox and put it into your VMware instance, as per the terms of the GPL. They may be violating it, and that could be a mess. Linux has the KVM hypervisor, which is the closest work-alike, the closest design, to bhyve. They also have LXC containers, which were inspired by FreeBSD jails. They have a few interesting features and are missing a few features, so it's sort of the Linux jail. SmartOS, a derivative of illumos, formerly OpenSolaris, has the Linux KVM hypervisor, which they imported. They have zones, which are clearly inspired by FreeBSD jails. They've added new features, and they actually run their hypervisor in a zone, such that if you break out of it you're stuck in a zone. FreeBSD has made the most progress as a guest on, say, Amazon EC2, and there's various Xen work going on, but I'm not clear whether the FreeBSD Xen host is fully up to speed or not, should you want that. And an honorable mention: NetBSD Xen. Xen is the classic open source hypervisor; it is large, it is reliable, and you can see my Multiplicity talk for a little more about that.

So, in context, what is this hypervisor thing? Quite some time ago, when I was a year old, Popek and Goldberg defined a hypervisor, back on IBM mainframe hardware, where they were accommodating previous generations of the system in hardware.
So they defined the formal requirements for a hypervisor and the properties you look for. Now, it doesn't help to get too caught up in the different types of hypervisor. A Type 1 is a dedicated kernel and userland, generally built from off-the-shelf parts, such as VMware; it launches, and then come your guests. A Type 2 is something that is built under some form of host operating system. One can argue that the intricacies are hidden in the host operating system, but FreeBSD is rather proven: as Alistair Crooks pointed out yesterday, more data has gone through FreeBSD, thanks to Netflix, than through any other OS. So I wouldn't get too concerned about the risks of Type 2 hypervisors.

The properties of a good hypervisor start with equivalence, or fidelity: basically, the guest's environment should look like the host system it's running on. That raises the question of whether the guest needs to be modified in any way, or whether it can simply be installed from an ISO image and booted as if it's on a nice, friendly, familiar system.
Resource control: is the hypervisor indeed in control of the guest, or is there a risk of it running free and monopolizing resources? And efficiency, primarily compared to, say, a software emulator, where there's a significant performance drop: a hypervisor should deliver near-native performance, because you are using hardware extensions to get as close as you can to a proper experience. The Wikipedia pages on this are pretty good; not great, but good.

So, hardware-assisted virtualization. In the words of Peter yesterday, bhyve is all built on VT-x exits. They are used to build the PCI emulation, since I/O instructions are used for PCI config space. You may read that the biggest breakthrough was the extended page tables that were introduced with Nehalem hardware from Intel; that's basically first-generation Core i5 forward. I'll touch on that and on newer systems, but EPT was a breakthrough, and as I've joked before, let's party like it's 1980, because we finally have a PC with some tiny mainframe features.

You've probably heard the buzzwords VT-x, VT-d and others, and seen them show up in your dmesg. Systems have had VT-x for quite some time, I think even since Pentium 4 systems. It intercepts privileged instructions and routes them, while things such as page misses were all handled in software. Recently Intel added Virtualization for Directed I/O, VT-d, so that hardware devices can be masked out from the host system and provided to a hypervisor. And the extended page tables replace a large amount of MMU software emulation with hardware, which is a major source of performance improvement. I believe that, as with many of these technologies, like AMD64, AMD pioneered these. Their equivalents are out there, and bhyve can support them in theory, but it's a matter of just a few developer hours.

So, if you have a system, you can try this. Here you've probably seen the little VMX feature for quite some time. The one to look for is POPCNT. That is not the best
choice of a phrase; it could be mispronounced. Traditionally, and I believe in all cases, POPCNT comes with EPT on processors of that generation. Quite recently Intel has kindly added EPT compatibility information to their site, and I'll get to that.

So: Nehalem, Sandy Bridge and Ivy Bridge systems, meaning most Core systems that start with an "i", but they've complicated it by adding Celeron and Pentium systems, which is a surprise. That's i3, i5 and i7 processors, and most of the Xeon processors from that generation, perhaps all, although there are Xeons back to the 386, so the term Xeon is useless in this context. Surprisingly, let's see, Pentium, mobile and Celeron processors too. VT-d is generally available on higher-end processors, so I'll show you how to find that out. Some vendors will have a BIOS option to enable or disable the VT-x features. Why, I don't know; I don't know what harm they would do. And potentially a vendor might block them out entirely from your system, even though they're on the CPU. Fair enough.

So ark.intel.com is your friend. Core processors get all the attention, and I was surprised to find a Pentium processor with extended page tables and VT-x. For about the last year, at the very bottom of those processor descriptions, they've included EPT. There is absolutely no graphical matrix with all the processor names, the wattages, the EPT support and the price; that would be too convenient. You know what I'm talking about.

So, the assembler implications. I've touched on this earlier, so I won't recap too much, but FreeBSD has had a somewhat older binutils assembler because it was GPLv2-licensed. You could add the newer GPLv3 version, which includes support for the EPT instructions, but that would never make it into base. Key points: for quite some time one would just drop that newer one in, which seemed to be version 2.22_3, but John Baldwin implemented the missing instructions. Everyone thought it would be way too hard and take forever, but no, he just did it, and if you look at FreeBSD revisions 238123 and 238167 you
will see those changes. Those changes can be dropped onto a 9 system; you can build binutils and it works just fine. Here's a brief description of what he changed, just those instructions, for those who care. And here is how I pulled those in. You'll find that across the different revisions some files were updated again later, so you don't have to pull all of them twice. I just pulled out these five, and it worked. Hop in there and build it, and you have a working assembler. This makes life much easier.

So, versioning: a moving target. bhyve began life with FreeBSD 8.1, and I've found FreeBSD 9 to be a great point of reference, although I started my work before the release was even out. It's very good to have a fixed point of reference against which one can say it works or it doesn't work. Since 9 there have been a number of floating-point changes, and bhyve is slowly catching up, and of course that's all in HEAD. So FreeBSD 10 is the way to go, but they have not yet set up a build system. They would like to have ISOs built, and he mentioned hosting them, maybe at AllBSD. My first logical thought was, okay, I will build the current one, make a release and try it. I was spending about half a day doing each one and then having it blow up on me, and I knew there had to be a more efficient way to test for regressions. So, my solution: start with 9.0-RELEASE. Fortunately PC-BSD 9.0 is FreeBSD 9.0 and is fully compatible, so you can work with it, and do presentations and run a hypervisor at the same time.

Demo: the build environment. bhyve is tiny, and building world for what is, as you'll see, a very small number of components is a waste of time, resources and SSD ticks. So I came up with a way to, first, not touch the included release source directory: with the help of unionfs, mount over it a similar one, which is the bhyve source; drop in the changes from SVN, which are not very large; and, on an SSD system, put a memory device in place for obj, so I'm just blasting the results into RAM. And if you
unmount that unionfs, you get to see essentially a binary diff of what was changed. Very cool.

So here are the components of bhyve. There are just three user-facing utilities, in /usr/sbin, I think: bhyve, bhyveload and vmmctl. I've given the links to the sources, the directories they're in under src. There is a library that's used, in /usr/src/lib, a little libvmmapi, which has a number of components. The blue ones are visible: you'll see them in your file system; they're not built into the kernel. The red ones are actual utilities that you can call, but you will in fact only call bhyve, in which I include bhyveload and vmmctl. Pretty elegant. And the kernel module is vmm.ko.

Currently, guests must be modified just a little bit. It's running FreeBSD on FreeBSD, so it's not monumental. It's important to have the MP table, even backported, well, for each of the versions; in time that will change. There is the BVM console, which simply means the guest starts booting and jumps into your own console; it's just, as they put it, a brain-dead simple console. There is the x2APIC support, which is for performance, so it may be optional, but you want it. There is the MP table, mocked up, and the kernel configuration file, which simply has the console and the MP table, and when they get ACPI working that will go away. Not a whole lot of changes, and anyone who's built a FreeBSD kernel can figure these out.

Helping the guests is VirtIO. There are a number of VirtIO projects out there; this is the one from Bryan Venteicher, I guess (Germans in the audience, help me here), not the one out of Japan. And there is the tap network device, which allows a guest to have networking. You build a guest kernel without modules, because it will not be looking at hardware; it's simply in a little contained environment.

So this is the file layout. Who's built a jail in the past? That's basically a userland. Here is a userland in a disk image, which can either be a diskdev or an MD root
from a memory-backed image. The kernel is external to the guest file system, a bit like Xen, where you point it at a kernel and say, go to town. The VirtIO modules are outside of that environment, and the boot directory is pretty much off the shelf: I copy it from the host and it works. And userboot.so is required by the utilities. The red is the heavy lifting, where you must build the kernel and prepare your disk image, but you can use whatever tricks you like, be it even NanoBSD or otherwise, to build that userland. It's simply a FreeBSD userland.

So, host preparation. Currently one must deduct memory from the host to allow it to be used by guests, and one does that with the hw.physmem property. That is to take, say, an 8 gig or 16 gig system down to 4 gigs for the host operating system and allow the rest to be available to guests. Currently bhyve is a bit noisy, so you might want to suppress some debug output. The last steps can be done from the command line, such as loading the vmm kernel module and setting up your networking. Pretty straightforward. This is what my system puts out once I've made that entry, real memory and available memory: I'm now down to 4 gigs out of 16. And I'm amazed how affordable memory is right now, just let the record show.

From the beginning Neel has had a very simple downloadable guest image. It has a few shortcomings, in my opinion. It is MD root backed, meaning it's a memory file system, meaning any changes you make vanish upon reboot, and it's at something like a hundred and five percent capacity, so you can't really drop anything exciting in there and test it. For what it's worth, to accommodate a large MD root there's a kernel option you probably want to set, but the system is fine with that.

So here is the boot string. Neel has kindly provided a captive script that simplifies this, but for development purposes I want it as verbose as possible. You can see the userland utilities, vmmctl, bhyveload and bhyve, that are simply available in sbin;
variables such as the name of your guest and the memory allocation, I believe above and below the classic i386 memory map; and the networking interfaces in use. I found that I needed a sleep before the ifconfig: wait till the system boots, then launch it, so you can have networking. We'll get to this later, but it boots like a FreeBSD system. It is FreeBSD on FreeBSD, and that's just a capture from a console.

Shutdown: you can shut down or reboot. If you do reboot, it'll give you the splash screen; hit escape, type quit, and it'll drop back into your host console.

I mentioned not wanting to build world all afternoon. I did it through Thanksgiving, and the family didn't like that. So to build these components à la carte I had to do a few tricks. One is dropping in from SVN the modified components for bhyve on top of a source tree, and again, that I did with the unionfs mount. I had to make a few Makefile changes just so that they could build in situ as opposed to in the full buildworld, and several of the utilities needed the machine link in there so the build knows where the heck it is.

So, building the guest image. Again, use your preferred jail-building technique. I simply made a disk image 400 megabytes in size, attached it with mdconfig, labeled it, mounted it on /mnt, fetched the release over the wire and just dropped it on top. You can probably do the same with the basic PC-BSD release, but it's probably a lot larger. It did take time to get things right: fstab, rc.conf if you want networking, resolv.conf, and ttys, because you're going console to console. It's not complex, but it needs to be done, and I've scripted everything to do that. You might want to enable SSH login for further testing, although you do see the guest's console in your console.

Now, how you get there. I've had the bhyve menu shell script out there for quite some time, and hopefully by the end of the day I'll update the rest of these. I've done it all à la carte so you can see exactly what's happening. I want to be very verbose about
this, so there's no magic to worry about: prepare the sources with the unionfs mount, patch the assembler, build the assembler, copy it in, and patch the loader for the memory reduction.

So this was a quotation from a little while ago. Theo de Raadt was a bit concerned: "x86 virtualization is about basically placing another nearly full kernel, full of new bugs, on top of a nasty x86 architecture which barely has correct page protection. Then running your operating system on the other side of this brand new pile of stuff. You are absolutely deluded, if not stupid, if you think that a worldwide collection of software engineers who can't write operating systems or applications without security holes can then turn around and suddenly write virtualization layers without security holes." Theo de Raadt.

So, the heavy lifting of bhyve is vmx.c, which does all the bit banging for the extended page tables and VT-x functions. I downloaded it, I sorted it, dumped the comments, dumped the spaces, and it's 1,300 lines of code. Small. In my process I build a simple package of the userland utilities, the kernel module and all that, and I thought, wait, I should check this, and it's 259K. A hypervisor in far less than a meg. So again, who's run VMware, who's run Xen, who's run all the alternatives that you can keep in your head? This is good.

The future. I got some comments from Peter last night: planning APIC tables. I'm not a developer, and some of this is simply internal features. And of course, through the course of this, Intel is releasing new features, and so is FreeBSD simultaneously, so bhyve has to catch up with the two of them, but generally the result is very good. I think just yesterday Neel contributed either the first part or all of guest idle detection. Currently a guest will monopolize a CPU at a hundred percent; that will drop down to nothing if the guest is doing nothing. Then AHCI device emulation and VirtIO MSI support; if you're familiar with those, that will make sense. And the BIOS work: I sure hope that he will
complete his work. Let's work on him. It's exciting stuff. He took on a hard challenge; BIOSes are hard. I cornered him in a bar in Tokyo and said, hey, can you port this to OpenBSD? And instead he's doing the BIOS, which would be a killer app: to have, say, Windows or Linux booting natively on bhyve. So, good props his way. More from Neel: more internal goodies, and AMD support, which is a straightforward matter of using nested page tables instead of extended page tables, or RVI as they've renamed it; all these technologies get renamed by the marketing department every few months.

On the horizon: better integration with the host scheduler; memory overcommit, such that you can convince a guest that it has a gigabyte of memory and really only use what's being used, like a sparse disk image. And on that note, sparse disk images: I haven't tried this, but it should work with zvols right now. Very cool. VirtIO has been developed in parallel, and the developer is not a core committer, so it's been a bit of a game getting it into FreeBSD and then further into bhyve. And generalization of the CPU IDs: say you boot a guest on your little Core i5 and want to move it to your Xeon down at the data center. If some of the CPU ID issues are generalized, then on a migration or whatever the guest will not care that it's on a different CPU. Currently it will say, oh my gosh, what happened, what have you done to me?

My to-do list. PCI pass-through: it's a bit frustrating that not all hardware has the VT-d extension, so when you're shopping for a notebook or something, look pretty carefully at ark.intel.com. If it has the VT-d extension, there's the syntax: you find out what the device is, and you use the blackhole driver to mask it from the host operating system. So the physical card is there and the OS ignores it. I think we'll get to the proper term: it probes it, says, okay, let's not talk to you, and moves on. Then when you're launching the bhyve guest you add the -s option and tell it what card or device, what PCI device
to go after. Also, you can redirect to a serial console, for what it's worth.

And somewhat excitingly, NanoBSD plus bhyve. I'm impressed with FreeNAS, and at 259K, adding a hypervisor to that is quite simple. All those who have, like, their VMware served by an iSCSI machine could have the hypervisor on that little flash card. I'm excited. I don't know who saw Kris Moore's Warden talk yesterday; it should be quite simple to include bhyve support there: instead of a FreeBSD jail, instead of a Linux jail, have a bhyve jail. That's in the works.

And dogfooding: I actually did this running PC-BSD on my little refurbished ThinkPad, using LibreOffice. It was a bit painful at times, but I'm happy to do that, and I'll do a little live demo.

Any questions, here and now?

Possible; that's the heart of it. And there was a past project years ago that was a simple BIOS for another purpose, and that might be usable. You've probably seen that, I think, Xen and even KVM use a lot of the QEMU components. They have more compatible licensing, and they're probably big, bloated components, but that's the heart of it. Adapting another BSD to boot shouldn't be too difficult. It's mostly just loader issues, and it's far more familiar than, say, Windows. I hope that answers your question.

Other questions?

More people interested. Neel and Peter have day jobs and they blast out what they can, but again, it's a bit of a chicken-and-egg issue. If there are three of us using it, having that tiny user group is a bit limiting. So do get out there. There are countless opportunities to improve it, starting with packaging: I just throw things in an old tarball in the right directories, and it's definitely not a correct FreeBSD package, but it works. So, vendors, look at the code; it's not big. Test it, see what hardware it works on and breaks on. I haven't had incompatibility issues. Do check out my sites, callfortesting.org and bhyve.org. Again, I describe where the utilities are, how they're built, what they do, and just
familiarize yourself, because again, it's not a lot of code. Fair enough.

Other questions?

You will see that the host system has access to just over three gigabytes of memory out of 16. With kldstat you see, above all, the vmm kernel module loaded, and the diskdev is a 400 megabyte disk image built with scripts. The boot directory is quite familiar. There's the host OS; PC-BSD has quite a few added components, but the bhyve guest boot directory is quite simple: there are mostly VirtIO bits and the kernel. As I mentioned earlier, one cannot reduce host memory on the fly, one has to reboot, but one can obviously load the kernel modules. So I have a very simple script, vmprep, that does that preparation, and then, as I described in that breakdown of the command, vmrun. That is bhyve. It's strictly FreeBSD on FreeBSD. Funny treatment of the border there; no biggie. You'll see the VirtIO drivers fly by, and it's running a standard base installation. I didn't try doing the networking through the wireless, so at the moment I cannot properly network. You will see that it is consuming the processor; however, the patches went in, hopefully the full solution, literally within the last 24 hours, so that should drop down to near 0%. Ta-da. And do catch me in the hallway, et cetera, and I'll help set you up with it.

We have just a moment, so let's see. Okay, at bhyve.org you will find the images I'm booting from right here: the 400 megabyte image (thank you, nice bug) and the 239K package, 65 on that build, and a link to the presentation, and I will get my à la carte scripts up there as soon as possible. There is a wiki entry page, but it's hit or miss, because the developers threw in a few of their napkin notes, and what I started with was literally their own notes to themselves, which were not ready for consumption on the outside. There's my article on the subject, and the mailing list is the best place to discuss it. There is no dedicated bhyve mailing list. So, thank you.