Hi there. I'm Elise Gafford. This is Vitaly Gridnev and Nikita Konovalov. We're all cores on the Sahara project, and Vitaly is the current PTL as of Newton, long may he reign. Just to go over what we're going to be talking about today: first we'll do a very brief overview of Sahara, what it is and what it does. Vitaly will talk about some new features in Sahara this cycle: health checks and Kerberos integration. I'll go over some changes to the image generation capabilities of Sahara, and then Nikita will finish off with bare metal provisioning and some additional features.

So first off, the use cases that Sahara fulfills. Sahara is, of course, the data processing service within OpenStack. What we provide is a streamlined, unified API to provision clusters of popular big data processing frameworks. We have plugins for Apache Hadoop, Cloudera, Hortonworks, and MapR, as well as some newer entrants to the field like Storm and Spark. We integrate with Heat primarily, and through that with just about everything else on Earth; if you ever want to test your OpenStack deployment, you may just want to run the Sahara tests, because we use just about the whole thing.

We then have an elastic data processing (EDP) engine which, if you're familiar with AWS, is something like EMR, but much more intended to let you use any data processing framework of your choice. We support Java, MapReduce, Hive, Pig, Spark, Storm, just about anything you can name. We allow you to create transient clusters or permanent ones, depending on your use case. And for data sources, as well as being able to connect to local and external HDFS instances, you can optionally integrate with Swift or Manila if you want to store your data there instead.

So this is an overview of the API. Up there you can see that first you register an image; we have a repository for that which we'll talk about in a little while. You register an image to your plugin, then create a node group template, which provides basic configuration for each of the services that will make up your cluster. You launch that thing and have a cluster, at which point you can register your data sources and create your job binaries, which you write just as you would as any other Hadoop or Spark or Storm dev. You register those with Sahara, create a job template that lets you configure the job per run, and then you run it as many times as you want against your data sources and glean whatever insights you're looking for.
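To make that flow concrete, here is a rough sketch of the same steps using the OpenStack client plugin mentioned below. The option spellings are from memory of the Newton-era client, and the names (my-image, worker, and so on) are purely illustrative, so check `openstack dataprocessing --help` on your installation:

    # 1. Register an image and tag it for a plugin and version.
    openstack dataprocessing image register --username centos my-image
    openstack dataprocessing image tags add my-image --tags vanilla 2.7.1

    # 2. Node group templates: per-service configuration for the cluster.
    openstack dataprocessing node group template create \
        --name worker --plugin vanilla --plugin-version 2.7.1 \
        --processes datanode nodemanager --flavor m1.large

    # 3. Combine node groups into a cluster template, then launch it.
    openstack dataprocessing cluster template create \
        --name my-template --node-groups master:1 worker:3
    openstack dataprocessing cluster create --name my-cluster \
        --cluster-template my-template --image my-image

    # 4. Data sources, job binaries, a job template, then run the job.
    openstack dataprocessing data source create --type swift \
        --url swift://my-bucket/input input
    openstack dataprocessing job binary create --url swift://my-bucket/job.pig \
        my-binary
    openstack dataprocessing job template create --type Pig \
        --mains my-binary my-job
    openstack dataprocessing job execute --job-template my-job \
        --cluster my-cluster --input input --output output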
So that's basically what we do here. We've already talked about the cluster provisioning plugins a little bit, as well as enumerated the EDP job types. We also have an image packing repository, sahara-image-elements. We have a new framework to validate Sahara, which is now packaged and shippable to customers in its own right, and which we'll talk about in a little bit. We also have a Horizon UI plugin, as well as an OpenStack client plugin. So that's an overview of the whole project, and at this point I'm going to turn it over to Vitaly to talk about some new features in Newton.

Okay. I'd like to talk about a few updates on health checks and some management improvements on Sahara's side. First of all, I'd like to mention the event log for clusters. What is the event log? It's a feature that gives you the ability to follow and understand the current step of your cluster deployment. And in case of failures, you can understand the reason for the failure from a stack trace, an error ID, and so on. During the Newton development cycle we made some improvements to this feature, including adding it to clusters provisioned by the Ambari management framework, and we improved support for it in the CLI. You can dump all events and steps, so you have the ability to send that information to your admin, which can help them understand what went wrong during cluster deployment.

Okay, that's how it looks. You can see that a few steps have already completed successfully, and some of the steps are currently in progress. And this is the view from the CLI: you also have a few steps completed, and "configuring instances" in progress. That's how it looks for the Vanilla plugin, for example.

Okay, let's talk a little bit about health checks. Health checks are a really important topic when you manage long-lived clusters. When you have a long-lived cluster, it's quite important to have some monitoring service to understand the current state of your cluster. In Liberty and previously, Sahara didn't have a framework to monitor your clusters: a cluster could stay active forever after deployment, even if the underlying infrastructure no longer existed. So we implemented a new framework that provides the ability to track and understand cluster health. We implemented it for all plugins, but it's most complete for Ambari and Cloudera Manager; for the other plugins the feature is a little more limited. Since the Newton release, health checks are also available for the MapR plugin. Additionally, you can get notifications about health from Ceilometer. And you can easily recheck health: if something was wrong five minutes ago and the cluster is probably a little healthier now, you can just recheck it.

And here's how it looks from the UI: all the steps and health checks are green. And here's how it looks when there are issues: you can see that some of the checks are red, and an admin or user should probably take a look at the issue. And here's how that looks from the CLI.
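A minimal sketch of that from the command line, assuming the cluster verification subcommands that shipped with the Newton python-saharaclient (spellings are from memory, so verify against `--help`):

    # Kick off a fresh health verification of a running cluster (the
    # "recheck" just mentioned), then display the latest results,
    # one row per health check with its status.
    openstack dataprocessing cluster verification --start my-cluster
    openstack dataprocessing cluster verification --show my-cluster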
So what are the next steps for health checks? We're going to make more detailed health checks: for example, per DataNode, to understand which particular DataNode or slave has failed, or checks for available HDFS space. We also have some ideas about actions or suggestions to repair your cluster: for example, if you have a failed DataNode, you could replace it, add new nodes, or just restart services if something has failed entirely. And we'll probably implement more flexible configuration for health checks, like more advanced checks, or temporarily disabling health checks during cluster maintenance.

Okay, let's talk a little bit about Kerberos integration, and about security in general. Security is always important for clusters; many customers are eager to have their clusters secured. And moreover, security is always important not just for clusters, but for Sahara itself.

What we did in Newton: we implemented the ability for a user to create a cluster secured by Kerberos. You can point Sahara at a pre-configured MIT KDC server, or just use an MIT KDC server provisioned by Sahara itself, and then your cluster can be secured by Kerberos. We also reworked our remote job execution so that it supports authentication against the KDC server: we manage tickets so that Sahara can run jobs just as it always has. We also support that for Spark job execution. And we distribute keytabs for system users like HDFS, YARN, and so on. This feature is supported in the Ambari and Cloudera Manager plugins, but for the others it's still in progress. One additional note: when using Kerberos-secured clusters, you need to be sure you have the latest Hadoop Swift jars provisioned on your cluster nodes, to be sure you have correct integration with Swift if you use a Swift data source.

And here's how it looks from the UI: you just tick another checkbox to enable security for your cluster. You can also specify the IP address of your KDC server, the admin principal, and the password for the admin user, so that Sahara can provision the additional users required for cluster operations.
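For the curious, here is a hypothetical sketch of what those checkbox settings might look like expressed as cluster configs instead. Every key name below is an assumption based on memory of the Sahara docs, not a guaranteed spelling for any release:

    # All key names here are illustrative, not authoritative.
    cluster_configs:
      Kerberos:
        Enable Kerberos Security: true    # the extra checkbox from the UI
        Existing KDC: true                # use a pre-configured MIT KDC
                                          # (false: Sahara provisions one)
        Server IP of KDC: 192.0.2.10      # example address
        Admin principal: sahara/admin     # used to create the extra users
        Admin password: <secret>          # consider Barbican/Castellan storage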
And some words about security for Sahara itself. We run Bandit tests per commit to validate security issues in Sahara code. And additionally, one note: we implemented secure secret storage using Barbican and Castellan in previous releases, so you can use that to store secrets created by Sahara during cluster deployment. Okay, let's switch to the image generation session.

Cool. So you may have detected the theme: Sahara is actually pretty full-featured at this point, and we're really drilling down and trying to make what we already have better, with the health checks, and now enabling security. And part of this overall theme of our current work is that we had some problems in previous iterations of Sahara related to image generation. Really, we have three basic flows that touch image generation in some way. First, we had pre-Nova-spawn image packing: this is where we use sahara-image-elements and DIB to pre-pack an image, which clients then register with Sahara to spin up a cluster. Fair enough. We also had a case in which, after Nova spawn, users might want to take a clean OS image without Hadoop pre-packed onto it and spin up a cluster from that, so we wanted the plugin to be able to act on those nodes to configure them appropriately to run Hadoop. And finally, you'll note that while we had two flows that were obviously relevant to image manipulation, there were secretly three, because we really should have had the plugins validating the images that you give them, to catch errors early and spit out something reasonable if, say, some Oozie dependency happens to be missing, instead of failing somewhere mysterious.

So there were a few problems with that. First off, we had huge duplication of logic: we had to maintain essentially the same logical flow in sahara-image-elements for the packing side as in the server for clean-image cluster generation, and it had to be expressed both in DIB and in Python. That was very difficult to keep in sync, and there were a lot of persistent errors around it. We didn't have much validation at all, as I said. So errors related to image generation did come up, because sahara-image-elements was versioned separately from the Sahara server: you effectively had two components of each plugin application versioned completely differently. And finally, related to that, it's just poor encapsulation to have a plugin model in which all of the image-building requirements still live in one monolithic repository for the entire project. A lot of projects in OpenStack do this; Sahara certainly isn't the only one. But fundamentally we didn't like the pattern: it makes our whole architecture less pluggable, less well versioned together, and harder to maintain.

So our dream implementation, the shining city on the hill that we want to aim at, is that all of these flows share the same logic: one definition that does all three things. We want to store and version image manipulation logic within the plugins. We still want the user to be able to generate these images via the CLI and register them, but we also really want an API, so the user can just click a button, say "I want an image for this plugin, make it for me," and it will. That's been difficult with previous tech. And we also really want both dev-test cycles and user retries to be quick and painless. This hasn't really been true with DIB in the past, and we wanted a better development cycle for ourselves and our customers.

So we came up with a plan. Basically, we would first build a validation engine that ensures images meet a specification: we have a YAML-based spec definition so the plugins can say, yes, this image in fact meets my spec. We then extended that engine to optionally modify those images and bring them up to spec instead of just testing them. We wanted to build a CLI to expose this, then create and test specifications for each plugin that supports it. Once we have that across all plugins, we can go ahead and deprecate sahara-image-elements once the new path is stable, and then build the API: spawn a clean image in an environment we know will work, run a reliable process in it, build your image from a base image, push it back into Glance, and have the thing be done.

This is where we are. We actually have the full abstraction built. The image generation CLI works today: you can write an image specification in this language and it'll pack an image for you. The methods to validate and to modify images on the server side are there; they haven't been implemented for each plugin yet, and that takes a significant amount of testing, so that's really where we're at now.

But this is about what it looks like. Up here you can see that you've got a set of arguments, so you can make your image packing configurable however you wish. And then you've got a set of validators, and those validators will either test or modify, as you specify in the original call. We're really going for idempotence with this. One problem with DIB is that the retry cycle was rather arduous, because DIB doesn't really aim for idempotence much at all. So idempotence is a big goal: all the validators we're writing have idempotence enforced, more or less. We do have a script validator, so you can just run your own bash code. If you do, really, please keep it idempotent; you're going to like your life a lot more if you do. And then we also have some logical control operators, like any and all: you can switch on OS, you can switch on arguments that you specify, just to give yourself a little bit more control over the logical flow.
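For reference, a minimal sketch of what such a specification might look like, based on the shape just described (a set of arguments plus a list of validators); the package and argument names are illustrative rather than copied from a real plugin spec:

    arguments:
      java_distro:                    # an illustrative, configurable argument
        description: Java distribution to install.
        default: openjdk
        required: false
        choices:
          - openjdk
          - oracle-java
    validators:
      - package:                      # test for, or install, these packages
          - hadoop
          - hadoop-hdfs
      - os_case:                      # a control-flow operator, as described
          - redhat:
              - package: java-1.8.0-openjdk-devel
          - ubuntu:
              - package: openjdk-8-jdk
      - script: install_extras.sh     # free-form bash; keep it idempotent!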
This is the fundamental CLI that was built: the sahara-image-pack call that's built into Sahara now. You give it a base image, a plugin and a version, and any arguments you want to pass. Some nice features of this: if you build your YAML, it'll actually auto-generate the CLI help text for you. It is idempotent, and it modifies images in place, so if something fails, you can change your YAML and run it again, and hopefully it'll succeed where it failed before, but it won't redo any of the intervening steps, because they're already done. It allows free-form bash scripts, and it also allows more structured resources. We really built it to be as extensible as possible, so you can use our stuff or map in your own sets of validators if you want to do something crazy we haven't imagined. Please contribute it back if you do; we're all very excited about that. And there is, of course, also a test-only mode, if you don't want to change your image, just validate it and nothing else.
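In practice the call looks roughly like the line below. The plugin, version, and extra argument are illustrative: everything after the plugin/version pair is auto-generated from the YAML spec's arguments (this one assumes the hypothetical java_distro argument from the sketch above), and the test-only switch's exact spelling may vary, so consult `sahara-image-pack --help`:

    # Pack (or re-pack, idempotently) a base image for a given plugin/version.
    sahara-image-pack --image ./centos7.qcow2 cdh 5.7.0 --java-distro openjdk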
So what's happening under the hood is that this images module we wrote runs the sequence of steps against a remote machine. On the server side, it uses the same Sahara SSH remote we've always used; on the image packing side, it actually uses a libguestfs Python API image handle. I don't want to get into the holy war between libguestfs and DIB that ever rages; they are both officially supported in the OpenStack community. We really went with libguestfs because it gave us the option to feed the same logic through in Python on both sides. That's our use case and why we chose this technology, fundamentally. And because we have that, we can actually achieve our goal of unifying these flows: clean-OS cluster generation, image packing, and validation all actually using the same logic. So if you're interested in image generation, you want to get involved, you want to see what this looks like: we are actively implementing in Ocata, so please contact me and get involved. This is coming up quite soon. And I'm going to turn it over to Nikita now to talk about bare metal, which everybody is always excited about all the time. Because who needs virtualization anymore? It's all bare metal and containers now, don't you know?

Okay, thank you. So, bare metal clusters. Why are they important in the first place? Because the big data workload orchestrators, frameworks, job types, whatever you can imagine in the big data world, all originated from bare metal installations. So with this capability in OpenStack, it was just a matter of time before we supported these workloads on bare metal. What we provide with Sahara and bare metal together is quick cluster scalability, and beyond scalability, persistence can be managed as well. In this case we're providing the best performance by design: you own your hardware, you know how to configure it to get the best performance, and Sahara will not touch that configuration. And there is obviously no virtualization overhead: no virtualized storage or network that can become a bottleneck due to poor configuration or missing options. That's why you get the best performance out of Ironic and Sahara for data processing.

And still, Ironic is a back end to the Nova provisioning API, so the ability to manage your Sahara bare metal cluster is basically the same as with VMs, and it stays the same on most points in both the data processing engine and the provisioning engine.

So I'll go through some major points of comparison: why would you choose one or the other? If you're deciding whether to run virtual machine or bare metal clusters, where should you go? The points to cover are how flexible you want the cluster to be. If you just want a persistent, long-running data cluster, then you should probably have it on Ironic and bare metal, if you don't have it there already. If you want quick-running, fast-provisioning virtualized clusters, then you should go for VMs and correctly tune your flavors and other settings.

Then, if you're targeting 100% host utilization, bare metal is again your choice, because you're in control of the host operating system, you're in control of the raw devices, and you can install your drivers and utilize the hardware as hard as you can. On virtual machines it depends more on the hypervisor you use, but in the default reference KVM architecture you will be committed to memory overhead, because KVM-based virtual machines consume more memory than they have committed, and you'll also face CPU and network overheads due to the abstraction layer between the real device and the virtual device in your virtual machine.

A major point for all big data workloads is data locality, and locality matters more if you're doing batch processing than streaming. Data locality was built into the Hadoop distribution since its very early releases: it's a multi-level algorithm that passes device handles directly to the Java process and then directly to the MapReduce tasks, which gives a huge boost in performance compared to randomly reading from the disks. So data locality is basically achieved for free on bare metal, but if you configure your virtual machines to use the Cinder block device driver, and you use the correct scheduling options to fit your VMs and volumes on the same compute nodes, the same effect can be achieved, if you're okay with the memory and CPU commitments.
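One hedged sketch of that VM-side locality setup, assuming Cinder's block device driver together with its InstanceLocalityFilter, which exists for exactly this purpose; the exact config placement is from memory:

    # cinder.conf: add the locality filter to the scheduler's filter list.
    [DEFAULT]
    scheduler_default_filters = AvailabilityZoneFilter,CapacityFilter,CapabilitiesFilter,InstanceLocalityFilter

    # Then request a volume on the same host as a given instance, so the
    # VM reads its HDFS blocks from local disks:
    cinder create --hint local_to_instance=<instance-uuid> 100

Sahara's node group templates also expose a volume-to-instance locality flag that drives this hint for you, if your release supports it.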
And maybe one of the last but also important points here is live migration support. Right now we have to say that there is no way to live migrate an Ironic host, or the compute host completely: if you lose the host, you lose all its disks and all its compute power, so be prepared for that. As for virtual machines, it again depends on your back ends for the hypervisor and for storage, but most reference architectures with KVM and Ceph or LVM back ends support replication and live migration without any data loss, sometimes with only a very small downtime for the node. And an important point to mention in the live migration and host-loss scenario is that the Hadoop-based frameworks are actually built by design with this in mind, and they can handle the loss of a host at their own level, so the bare metal side is probably not that critical: if you lose a host, just make sure you've configured your cluster right and you have the proper replication set for it.

And as I've already said, running bare metal clusters with Sahara is almost as simple as running VMs; it is not different at all from the user's perspective. However, it is different on some points from the administrator's, or cloud operator's, perspective. It's very important to place the hosts into their own availability zone, or separate them with metadata, so that these nodes can be scheduled properly with their own flavor. This requires special metadata to be set, and it is not Sahara's feature to set that metadata, so please be aware of that.
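That availability zone and metadata step looks roughly like this with standard Nova commands. The aggregate, zone, and flavor names and sizes are illustrative, and the extra-specs match only takes effect if the scheduler's AggregateInstanceExtraSpecsFilter is enabled:

    # Put the Ironic-backed compute service in its own aggregate and AZ,
    # and mark it with metadata a flavor can match against.
    openstack aggregate create --zone baremetal-az baremetal-hadoop
    openstack aggregate add host baremetal-hadoop ironic-compute-1
    openstack aggregate set --property baremetal=true baremetal-hadoop

    # A flavor keyed to that metadata, sized to match the bare metal nodes.
    openstack flavor create --ram 131072 --disk 900 --vcpus 32 bm.hadoop
    openstack flavor set \
        --property aggregate_instance_extra_specs:baremetal=true bm.hadoop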
The second point is that storage is not backed by Cinder when you're running bare metal. Sahara will discover the disks on the compute host for you; it will attach, mount, and format them for HDFS, but you will not see them and you will not manage them through the OpenStack API anymore. So if you still want some smarter management, like separating disks between HDFS and non-HDFS workloads, there are Hadoop managers like Cloudera Manager and Ambari that can do that; please use them in that case.

Also, Sahara provisions bare metal servers with images built with diskimage-builder or with the new image packing mechanism, but all these approaches are based on distributions of cloud-oriented operating systems, and if your hardware runs something non-standard or requires non-standard drivers, those will obviously be missing in these images. The administrator or cloud operator should take care of that and update the images properly, so the hardware is fully utilized.

And network tenant isolation is always a hard question when it comes to bare metal and Ironic. It has made good progress in the Newton release, and right now there are also manual switch configurations available to make tenant isolation work, but in the default setup you should be prepared to have a flat network across your compute hosts, where they are reachable from one to another regardless of which OpenStack tenant your cluster is in.

We also have a few more updates which are not as large as features like Kerberos or health checks, but we still think they're worth mentioning, and we got them implemented in Newton. There are a few services we have new integrations with. One of the most important is the DNS solution, with Designate. Sahara can now provision clusters that don't rely on fixed IPs spread almost manually across the nodes; a proper name resolution service helps build large clusters without any issues. The API improvements include pagination and iteration over lists of instances, which now allows users to see a thousand-node Hadoop cluster in Horizon, which they previously could not, because there was too much data to load. Plugin management now allows operators to enable different plugins for different tenants and disable unneeded plugins if, for some reason, they're not supposed to be used in an environment. Of course, we support the newest plugin versions as of the new release: HDP 2.4, MapR 5.2, CDH 5.7, and the latest releases of Vanilla and Spark, which are no longer separate plugins; they have been combined under a single Apache plugin and can run both job types, which were previously separated. And there are a few more improvements to the testing framework, which is now available and can validate any Sahara installation with a lot of health and cluster readiness checks.

The output is now better formatted, so the user can see on which step or which check the cluster is failing, and can find the issue directly from the test output, rather than going into the Sahara logs and sifting through a bunch of strange output. The Tempest plugin is also growing, to support the CLI and more test coverage of Sahara's CLI and API. And Sahara tests are now published to PyPI, so you can download the packaged version of our tests, install it, and validate your Sahara installation on most of the plugins and distributions. And those are actually all the highlights and features that we wanted to hit in the Newton release. We're very glad you came to our late session, and we welcome your questions.

I'll repeat the question, since we've been told that way it'll probably carry. Okay, the question is: are we calling Nova to boot the bare metal servers, or are we provisioning them directly? Right now we go through Nova. We actually build a Heat template to provision all our resources, and the Heat resource type is still OS::Nova::Server, which is backed by the Ironic server in the proper availability zone and with the proper flavor.

Okay, then, if we don't have any more questions, I just wanted to leave this useful set of links where you can learn about Sahara, its new features, and the specs we're working on right now. Thank you.