Hi everyone, thanks for coming. My name is Andy Botting. I'm a systems engineer and I work for the ARDC, the Australian Research Data Commons, and we operate the Nectar Research Cloud. I'll give a little bit of background about the Nectar Research Cloud first. We're a nationally funded project in Australia and we provide a compute resource for Australian research. We operate across eight different sites around the country. Each of those sites operates semi-autonomously, and I'm part of the core services team that operates the central APIs. We run quite a large installation: more than a thousand compute nodes, something like 16,000 registered users, and about three and a half thousand projects active in a given year. So we do quite a lot of work. What I want to do first is give a little overview of how metadata works. As most of you might know, when you boot a virtual machine, Nova provides a mechanism for the instance to obtain metadata about itself, for setting up networking, working out what its hostname should be, fetching SSH keys, those sorts of things. The metadata can be served from a config drive or from a special URL at the link-local address 169.254.169.254. A request to that address is passed through to Nova, and Nova builds the information that allows the instance to provision itself. This is most useful when combined with cloud-init: cloud-init knows how to fetch that metadata and act on it. There are a couple of different types of data available through Nova. The first one on my list here is user data. User data is the metadata the user provides when they boot the instance. It's optional, of course, but if the user does provide it, cloud-init will discover it and perform provisioning tasks based on it. You might provide a script that installs some packages or sets up some software. It's up to you, really.
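From inside an instance, the metadata described above lives at well-known URLs. The following is a minimal sketch of those endpoints and the kind of fields an init system picks out of meta_data.json; the endpoint paths are the standard OpenStack ones, but the example payload and the field selection are illustrative only.

```python
import json

# Well-known OpenStack metadata endpoints. The link-local address is fixed;
# "latest" is the commonly used version alias in the path.
METADATA_BASE = "http://169.254.169.254/openstack/latest"
META_DATA_URL = f"{METADATA_BASE}/meta_data.json"
USER_DATA_URL = f"{METADATA_BASE}/user_data"

# A trimmed example of what Nova serves at meta_data.json; real payloads
# carry more fields (availability zone, keys, project id, etc.).
example_meta_data = json.loads("""
{
  "uuid": "9b2ac3f1-0000-4c7e-9d10-000000000000",
  "hostname": "myinstance.novalocal",
  "public_keys": {"mykey": "ssh-rsa AAAA... user@host"}
}
""")

def provisioning_facts(meta_data: dict) -> dict:
    """Pick out the fields an init system like cloud-init typically acts on."""
    return {
        "instance_id": meta_data["uuid"],
        "hostname": meta_data["hostname"],
        "ssh_keys": list(meta_data.get("public_keys", {}).values()),
    }

facts = provisioning_facts(example_meta_data)
print(facts["hostname"])
```

In a real instance you would fetch `META_DATA_URL` over HTTP (or read the same files off the config drive) rather than parse a literal string.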
Lots of possibilities there. The second one is the general metadata provided by Nova itself: system data such as the instance ID, the hostname of the instance, SSH public keys, and other things as well; the things the instance needs to know to provision itself properly. The last one, which is really what this talk is about, is vendor data. Vendor data is a third form of data, and it's provided by the cloud operator. There are two types of vendor data: static and dynamic. With static vendor data, you provide a single JSON file, and that file is served as-is through to cloud-init. It's available at the standard metadata URL where you'd get your other metadata, under the name vendor_data.json, so the file you give to Nova is served under that name. cloud-init understands that file and will request it when using the OpenStack data source, and the information in it is merged with the system metadata and the user data. So you can have all three of those sources providing data through to your virtual machine for cloud-init to use. The main part of this talk is about dynamic vendor data. Dynamic vendor data is generated on demand, based on context that Nova passes to an external web service; I'll talk about that context in a minute. It's made available to the instance as vendor_data2.json. The interesting part, though, is that cloud-init does not look at this file, at least not yet. I had a look at the source code, and it would be possible to extend cloud-init, but it currently doesn't request that file.
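On the operator side, both kinds of vendor data are wired up in nova.conf. This is a sketch of that wiring; the target name and URL are made-up examples, and the exact option names should be checked against your Nova release's documentation.

```ini
# nova.conf — vendor data wiring (a sketch, not a verbatim production config)
[api]
# StaticJSON serves the file below verbatim as vendor_data.json;
# DynamicJSON calls out to an external web service for vendor_data2.json.
vendordata_providers = StaticJSON,DynamicJSON
vendordata_jsonfile_path = /etc/nova/vendor_data.json
# Each dynamic target is name@url; the service's JSON response appears
# nested under that name inside vendor_data2.json.
vendordata_dynamic_targets = pollinate@http://vendordata.example.com:9500/
```

The static file can contain any JSON you like, e.g. `{"cloud_name": "nectar"}`, and is served to every instance unchanged.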
So if you do go down this path and you want to provide dynamic vendor data to your instances or your users, you'll need some way of handling that yourself. What I'd like to introduce now is a little project I built called nova-pollinate. The purpose of this project is to facilitate the generation of that dynamic vendor data. How it works: for the request to come through, Nova needs a URL configured in its config file pointing at a web service endpoint, and this project provides that web service. It's a very simple architecture, a basic web service, but it's designed to be pluggable by nature. You can build your own plugins, and each plugin is executed with the context passed from Nova, so it can perform whatever function you want: look up an external database or an external system, whatever you like. The information from each plugin is then formatted as JSON, merged with the output of any other plugins, and passed through to the virtual machine. It's Keystone-auth enabled, so you can't just make an arbitrary request to it to get data. It has Keystone authentication in front, and if your request is a legitimate one coming through Nova the right way, Nova will pass the token authentication information through, so there is an element of security there. The URL is up there if you want to check it out on GitHub; it's Apache licensed. This is just a little diagram to show you how it works. Starting from the top, you've got your instance here on the left. The first thing that happens is the instance makes the request for the vendor_data2.json file.
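The plugin-and-merge architecture just described can be sketched roughly like this. To be clear, this is not nova-pollinate's actual plugin API; the plugin names, the context keys, and the hard-coded lookup are all illustrative.

```python
import json

def institution_plugin(context: dict) -> dict:
    """Example plugin: derive a value from the Nova-supplied context.
    The lookup here is hard-coded; a real plugin might query an
    external database or web service instead."""
    site_by_project = {"proj-123": "melbourne"}
    site = site_by_project.get(context.get("project-id", ""), "unknown")
    return {"site": site}

def hostname_plugin(context: dict) -> dict:
    """Second example plugin, to show that results are merged."""
    return {"hostname-seen": context.get("hostname", "")}

PLUGINS = [institution_plugin, hostname_plugin]

def handle_vendordata_request(context: dict) -> str:
    """Run every plugin against the request context and merge the results
    into one JSON document, which Nova then serves to the instance as
    (part of) vendor_data2.json."""
    merged = {}
    for plugin in PLUGINS:
        merged.update(plugin(context))
    return json.dumps(merged)

body = handle_vendordata_request(
    {"project-id": "proj-123", "hostname": "vm1", "instance-id": "abc"})
print(body)
```

The real service additionally validates the Keystone token that Nova forwards before running any plugins.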
You can see it goes through the Nova API, which passes that request, along with the context we talked about earlier, through to the nova-pollinate server. Then, based on the plugins you've built, it makes external calls to whatever system is appropriate, and the response is passed back through Nova and then back to your instance. The context provided by Nova includes the project ID, the instance ID, the image ID, the user data, the hostname and the metadata. With all this context, you can start to think about the possibilities of what you might like to provide to your instance. I find the project ID is going to be a very useful one. So, use cases. A simple use case might be that you boot an instance and that instance wants to start performing API actions against your OpenStack APIs, but the instance itself doesn't know what project it's running under, because that's generally not available from inside the instance. Using this method, you can make that information available to your virtual machine. This case is provided in the source code, so you can go and have a look at it. What happens is that when you request the vendor_data2.json file from within your instance, the request goes through to nova-pollinate. From the context, nova-pollinate knows which project the request came from, so the plugin makes an API request to Keystone, looks up the project, fetches the project information, and passes it through as JSON to the instance. Once you have that data, you can do whatever you like with it. The next use case involves CloudStor. CloudStor is a storage product from AARNet in Australia, and they provide free storage for researchers to use.
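The project-lookup use case above might be sketched as follows. The real plugin queries Keystone with a proper client (keystoneauth/keystoneclient); here that call is stubbed out with an injected lookup function so the sketch is self-contained, and the context key names are assumptions.

```python
import json

def make_project_plugin(lookup_project):
    """Build a plugin that resolves the requesting project's details.
    `lookup_project` stands in for a Keystone client call such as
    "get project by ID" (stubbed below so this sketch runs offline)."""
    def plugin(context: dict) -> dict:
        project = lookup_project(context["project-id"])
        return {
            "project": {
                "id": project["id"],
                "name": project["name"],
                "description": project.get("description", ""),
            }
        }
    return plugin

def fake_keystone_lookup(project_id: str) -> dict:
    """Stub for the Keystone API call."""
    return {"id": project_id, "name": "research-project-42",
            "description": "Example project"}

plugin = make_project_plugin(fake_keystone_lookup)
vendor_data = plugin({"project-id": "p-001", "instance-id": "i-9"})
print(json.dumps(vendor_data))
```

Inside the instance, this JSON then shows up under the configured target name in vendor_data2.json, so a script can discover its own project without any extra credentials.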
So we have a use case where a lot of our users might have storage through CloudStor and want to use virtual machines on our cloud, and we wanted to make it super easy for them to use that storage. We were working with AARNet to build a system where we could send an API request to them saying, please provision us some storage. They would provision the storage and send us back some credentials. The storage is based on WebDAV, so they'd pass us a username and password, and we would store those in Keystone for use here. With this plugin, if the user had that storage, the plugin would check Keystone, find those credentials, and if they exist, pass them through in the metadata. We would then have a daemon, a small service that runs at boot time, that looks for those credentials, and if it finds them, automatically creates the mount point, adds the entry to the fstab, and mounts that storage for the user. Ultimately the plan was that the user could request storage from an external service and have that whole process automated, to the point where when the instance boots, the storage is mounted automatically. They don't have to mess around with fetching an external token or going onto the command line to edit the fstab to add the entry; we can automate that whole process. I think that's a very powerful thing, to relieve the users of that burden, especially when the type of users we're dealing with are researchers, who don't necessarily understand the command line or fstabs and those sorts of things very well. A third use case we thought about is MATLAB. Because of the distributed nature of our cloud, we service users from a lot of different institutions, and each of those institutions might have its own license server for MATLAB.
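The boot-time helper described above might look roughly like this. The vendor data key names, the mount point, the URL, and the davfs2-style fstab/secrets lines are all assumptions for illustration, not what the production daemon actually uses.

```python
# Sketch of a boot-time helper that turns WebDAV credentials delivered via
# vendor_data2.json into a davfs2-style fstab entry plus a secrets line.

def fstab_entry(creds: dict, mountpoint: str = "/mnt/cloudstor") -> str:
    """Build an fstab line mounting the WebDAV share with davfs."""
    return f"{creds['url']} {mountpoint} davfs user,noauto,_netdev 0 0"

def davfs_secret_line(creds: dict, mountpoint: str = "/mnt/cloudstor") -> str:
    """Build the matching /etc/davfs2/secrets line holding the credentials."""
    return f"{mountpoint} {creds['username']} {creds['password']}"

# As the data might arrive under our target's key in vendor_data2.json
# (hypothetical key names and values).
vendor_data = {
    "cloudstor": {
        "url": "https://cloudstor.example.org/remote.php/webdav/",
        "username": "researcher1",
        "password": "s3cret",
    }
}

creds = vendor_data.get("cloudstor")
if creds:  # only act when the plugin actually supplied credentials
    entry = fstab_entry(creds)
    secret = davfs_secret_line(creds)
    print(entry)
    print(secret)
```

A real daemon would fetch vendor_data2.json from the metadata service at boot, write these lines to the right files with safe permissions, create the mount point, and then mount the share.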
By using this service we could potentially create one Murano image, or just a regular Glance image, that has MATLAB preconfigured for users, and rely on something like dynamic vendor data to give us the correct license server information for that institution, based on the user ID or the project ID. We could store that mapping in some external database or some other web service, and not burden the users with knowing which institution's license server they need to connect to, or require the different sites in our cloud to build and maintain whole separate images for MATLAB. We can reduce all that duplication by having one image that uses dynamic vendor data, with the nova-pollinate service providing the correct license information at runtime, and just make the whole process simpler for users. Once you start to think about the sorts of things you can do for your users by providing dynamic vendor data, there are a lot of possibilities for automating things and making them much simpler. The OpenStack docs page on vendor data is quite useful if you want to look into it more, and Michael Still was, I think, the instigator of a lot of the dynamic vendor data work that went into Nova, so it's well worth looking at his blog posts for more context on how it works. That's it from me. I'm happy to take any questions if you're interested in talking about dynamic vendor data or the nova-pollinate service. Thanks very much.