Hello everyone. I'm Telles, the current PTL for the Sahara project, and today I'm going to present an overview of the project and the updates we're planning for this cycle and the two cycles ahead of us. I was going to present this with Alice as well, but due to some medical issues she's not going to be here, so it's just going to be me.

First, for anyone here who is not familiar with it: Sahara is the data processing service in OpenStack. What it does is provide a way for users to deploy clusters of data processing engines with minimal effort. At this point Sahara allows the user to create Hadoop, Spark, and Storm clusters, with different Hadoop vendors such as Cloudera, MapR, and Ambari, plus upstream Spark and Storm. Another important part of Sahara is that it provides a way for users to deploy jobs onto these data processing clusters.
So the user doesn't have to log in to a machine to run a job; they can do it from Horizon, the OpenStack interface.

Sahara was founded around the Juno release of OpenStack, and in this latest release we had contributions from 55 people, although actively there were more like nine or ten: most contributors wrote one patch or did a small review, but the active core is a small group of nine to ten people. In the latest OpenStack User Survey, 11% of clouds in production or test phase indicated they are using Sahara: of that 11%, 3% are in production and 8% in the test phase. Another 25% of survey respondents said they are considering using Sahara in their clouds.

Now I'm going to talk a little about how Sahara actually works. Sahara lets the user deploy data processing clusters, but the cluster is the end product; there are some steps in the middle to get that cluster running. The first step is to create node group templates. A node group template identifies what a group of nodes is going to do in the cluster. For example, in a Hadoop cluster you usually have a master node and slave nodes.
So in Sahara you create one template for the master and another template for the slaves, and choose which processes are going to run on each of these node groups. Once the node group templates are ready, you can create a cluster template, which puts them together and lets you select how many of each you're going to use. For example, in Hadoop you usually have only one master node, but you may have tens or hundreds of slave nodes in the cluster, depending on how far you want to scale out.

Another important part of Sahara is job processing, which is very important in some cases and something we're working hard to make better every day. That's called the Elastic Data Processing (EDP) part of Sahara, and it's divided into three main pieces. The first is data sources: a data source is where you put your data, and Sahara reads from that source; you can also use a data source as the output of the processing, so your data goes into the cluster from one source and comes back out to another. The second is the job binary, which is the jar file you actually run inside your cluster.
To run a job in Sahara, you create a job binary inside the Sahara project that points at the path of this jar file. Right now we usually use Swift to store the job files, so the job binary points at the Swift URL where the jar is stored, and Sahara pulls it down to run the job inside the cluster. The third piece is the job template, which puts the whole thing together: in the job template you select the input data source, the output data source, and the cluster you're going to run on, and then you can submit the job to be run on the cluster.

The architecture of Sahara is a very basic OpenStack architecture. We have a REST API that receives all the requests from Horizon and the Sahara client or CLI. We authenticate with Keystone, so all of that is configured. We have a data access layer, so we manage our own database. We can do secure storage with Barbican, but that's not required; it's an option. And we have two main parts in that architecture: the EDP part, which I just explained, that manages job submission; and the provisioning engine, which is the part that actually creates the cluster and configures the whole thing for the user. For provisioning a cluster we use Heat: we create Heat templates, and Heat communicates with Cinder, Glance, Neutron, Ironic, or whatever else we need to create those machines. The data source up there is kind of outside of Sahara, but it's very important because that's where we store the files and the data to be processed.
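The workflow just described, from node group templates through cluster template to an EDP job submission, can be sketched as a set of payloads. This is an illustrative sketch only: the field names loosely follow Sahara's REST API, but every concrete value here (plugin name, counts, container names, URLs) is a placeholder, not something from the talk.

```python
# Illustrative sketch of the Sahara workflow: node group templates ->
# cluster template -> data sources, job binary, and a job submission.
# All concrete values (plugin, names, URLs) are hypothetical.

master = {"name": "hadoop-master", "plugin_name": "vanilla",
          "node_processes": ["namenode", "resourcemanager"]}
worker = {"name": "hadoop-worker", "plugin_name": "vanilla",
          "node_processes": ["datanode", "nodemanager"]}

# One master, many workers, as described in the talk.
cluster_template = {
    "name": "hadoop-cluster",
    "node_groups": [{"template": master["name"], "count": 1},
                    {"template": worker["name"], "count": 10}],
}

# EDP pieces: data in, data out, and the jar to run.
input_source = {"name": "wordcount-in", "type": "swift",
                "url": "swift://demo-container/input"}
output_source = {"name": "wordcount-out", "type": "swift",
                 "url": "swift://demo-container/output"}
job_binary = {"name": "wordcount.jar",
              "url": "swift://demo-container/wordcount.jar"}

# The job template ties everything together for submission.
job = {
    "type": "MapReduce",
    "libs": [job_binary["name"]],
    "cluster": cluster_template["name"],
    "input": input_source["name"],
    "output": output_source["name"],
}

total_nodes = sum(ng["count"] for ng in cluster_template["node_groups"])
print(total_nodes)     # 11
print(job["cluster"])  # hadoop-cluster
```

The point of the shape is the separation of concerns: node group templates describe roles, the cluster template describes counts, and the job only references names, so the same job can be rerun against a different cluster.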
Another interesting part is the vendor plugins. Sahara is built in a plugin-based way, so we have a different plugin for each data processing engine we run today. For Hadoop we actually have four different plugins: upstream Hadoop, which is our vanilla plugin, plus the Cloudera, MapR, and Ambari vendor plugins. We have Spark inside Ambari and upstream as well, and we have upstream Storm; we don't have any vendor plugins for Storm.

Now, talking about the new features we're working on in Pike: I think the biggest one is the ability to create and validate images inside Sahara. One thing I skipped mentioning is that to use Sahara, we create images with the data processing engine already installed on them. Before, we used to set that up on the fly: while the cluster was being created, we would download the files to install, for example, Hadoop onto the image. But that takes a long time, especially with vendor plugins; I think the Cloudera packages are over one gigabyte, so it would take forever if you were creating a big cluster.
So we Encouraged people to create that the image with Cloudera already installed So you can just say here I can just use that image to create a new Machine and then we configure that after the instance is created And for that we actually provide a project called Sahara image elements Which uses this image builder to create the instance to create the image and we can use that but the downside of that it's Very complicated to get it working this game edge builders not the most friendly image creation too and Anyone who's actually worked this so knows that it has some difficulties to get it working So we are actually creating a new Way to create images for Sahara That's gonna be we're gonna use lib gas FS to manipulate the images and This gonna actually make a lot easier for us because lib gas FS doesn't We don't need to Reset everything after it fails. So lib gas FS gives us the opportunity to just keep growing the image So if it failed at some point we can just do a little tweak and work from that point on So the image won't be needed to be rebuild it completely every time And also this feature is very important because after the image is created We're gonna have a validation process in Sahara So we're gonna run checks During the before the cluster is created to verify that the image has everything that it needs to have to create Properly and the cluster will be running properly after with Sahara so this is like one of the biggest features that we're gonna have right now because Certainly it's gonna make life of the life of the users very Easier to start with Sahara because most of the people that we know and we try to introduce to Sahara Had a very hard time working with the images from the start to actually start creating clusters So that's gonna be a big improvement for us The other improvement that we actually is already on master. 
Another improvement, which is already on master and will definitely be released in Pike, is that we made it easier to add new data sources and job types. Before, the data sources and job types were kind of hard-coded inside Sahara, and recently we did work that transformed that into plugin-based code. So you can have different plugins for data sources and job types, just as we have for Hadoop, Spark, and Storm. If we want to add a new data source to Sahara, it's fairly easy now, and we're actually planning to add a new data source for Amazon S3; it's probably not going to land in Pike, but it will probably be there in Queens. Hopefully we can get that working.

One interesting feature that is already there, but that most people don't really know about, is bare metal provisioning: we can use Sahara to create clusters on bare metal nodes, not only virtual machines. That's one of the major things we have, because real big data doesn't run well on virtual machines; you need bare metal nodes for that, and Sahara can actually do it now. I think it still needs improvement, because it's fairly new and needs more testing, and it's not always easy to get a good lab with bare metal nodes, because it's not cheap to buy computers just to try this stuff out. Still, I think it's going to be a huge differentiator for Sahara: once we get it properly working and advertise it, people are going to start using Sahara more, because it provides the use case most people want, clusters on bare metal nodes, more easily than anything we have today.

This slide shows a little of the themes we focused on for the Pike release. Scalability was not a focus for us this release, because we're actually at a very good stage on scalability, so we focused
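The plugin-based data sources mentioned above can be pictured as a small interface: each data source type knows which URLs it accepts, and a registry resolves a URL to the right plugin. This is a hypothetical sketch of the idea, not Sahara's actual class hierarchy or interface.

```python
from urllib.parse import urlparse

# Hypothetical sketch of plugin-based data sources: each type declares
# the URL scheme it handles. Sahara's real interface differs; this
# only illustrates why adding a new type (like S3) becomes easy.

class DataSourceType:
    scheme = ""

    def accepts(self, url):
        return urlparse(url).scheme == self.scheme

class SwiftDataSource(DataSourceType):
    scheme = "swift"

class S3DataSource(DataSourceType):  # the planned new type
    scheme = "s3"

REGISTRY = [SwiftDataSource(), S3DataSource()]

def resolve(url):
    """Pick the data source plugin that handles this URL, if any."""
    for ds in REGISTRY:
        if ds.accepts(url):
            return type(ds).__name__
    return None

print(resolve("s3://bucket/data"))        # S3DataSource
print(resolve("swift://container/data"))  # SwiftDataSource
print(resolve("ftp://host/data"))         # None
```

With this shape, supporting a new backend is just one more class in the registry instead of a change to hard-coded core logic, which is the point of the refactoring described in the talk.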
a little more on resilience, and the big improvement there was the image validation and creation work. We're trying to make Sahara not break so easily: we want clusters to be harder to fail during creation, because it's very frustrating for a person trying to use Sahara when it keeps failing over and over. Another major focus was user experience, which comes back to that image creation work. We're trying to make using Sahara a lot easier than it was before, because it took a while to get to know all the parts of Sahara when you started working with it; right now we're trying to make those steps easier. We want the user to do less work to get Sahara running, the same way Sahara already means you don't have to configure Hadoop yourself. That's something we're focusing on in Pike: making things easier for the user.

For Queens we have a somewhat different focus: security and modularity. Let me explain the modularity part. Right now the Sahara plugins are released inside the Sahara code, and that's mostly good for us, but in some cases it can be bad, because we could release plugin updates faster than we can release OpenStack releases. This means that if Spark 2.1.0 is the newest version and you're on the Ocata version of OpenStack, you won't be able to get Spark 2.1.0; you'll only get it after you upgrade to Pike. So we're studying the possibility of moving the plugin code out of Sahara into a new library containing only the plugins. This would allow users to update only the plugins library while keeping
the Ocata version of OpenStack while still getting the new plugins in Sahara. That's something we're still studying: we're not sure how well it's going to work for us, or whether the community and the users will accept it, but we're looking for the opportunity, because from my point of view it's something great for us and for the users.

Another major feature in Queens is going to be the release of API v2 of Sahara. For people who are involved in Sahara development, API v2 has been around forever: I started working on Sahara in 2013 and there were already documents talking about API v2 back then, but it was never done. This is something I've really focused on, because our API right now is kind of divided; we actually have two APIs that are not completely integrated with each other, and we need to make that a stronger, much better API. API v2 is going to be a better API for Sahara and make things easier for the user, and it's going to be released in Queens. We're almost done with the implementation in Pike; the plan is to have everything implemented in Pike and everything tested in Queens, so we can release something we're actually sure works and that users won't have trouble with.

Then there's the focus for the R release. I can actually change that R to Rocky now, because that's going to be its name; it hadn't been named yet when I created these slides. We're going to focus a little on scalability again. We always have to come back and check that, because our project is meant to be scalable and to facilitate that kind of thing for the user, so we have to keep checking how we're doing there. Things like high availability fall into that too.
We're not sure whether Sahara needs high availability, but we always have to come back, check how it's doing, and see if we need to improve it. In any case, we're going to keep working on resiliency, and we're going to focus even more on user experience. That's something I'm really focused on, because before I started working on Sahara I was a user of Sahara, and I know it can be hard to start using. So I focus a lot on making things easier for the user to get started, and making the project easier on first contact. Security and interoperability too: interoperability is something we have to keep working on with other projects. We always have to check with Heat that our communication is working nicely, because we depend on Heat right now, so that's something we have to keep coming back to and improving.

I think that's all I have. Actually, I have a question for the people here: what are the major difficulties you have when you try to deploy Sahara, or, most importantly, when you try to use Sahara? That's something we need to hear from the users, because we are used to Sahara and we don't see the flaws right in front of our faces the way people who are just starting to work with it do. So that feedback is the most important thing for us. If you have anything for us, we're grateful to hear it, and we want to work to make things better for us and for you. So please, if you have any feedback, let us know what you need and we'll try to work on it. Now we have time for questions; if you have any, please come to the mic so we have it on the recording. Thank you, guys.
Question: Is API v2 going to include microversion support?

We want it to, and that's actually something I'm working on: I've been in all the API working group meetings and read all the documents. We plan to have microversions in API v2, but we've had some discussions about the problems with keeping consistent backward compatibility and maintaining too many APIs, so that's something we're still working out in Sahara. We're not sure exactly how we're going to do it, but we'll probably have API v2 with microversions.

Okay, if you don't have any more questions right now, or if you don't want to come to the mic, just come to me afterwards and we can talk. Thank you for listening; I hope you use Sahara and let us know what the problems are and how we can make it better for you. Thank you.
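Microversioning, as other OpenStack services implement it, is negotiated through a request header that names the service and a version. The sketch below illustrates that negotiation pattern; the service name string and supported range are assumptions following the convention used by services like Nova, not Sahara's actual implementation.

```python
# Minimal sketch of OpenStack-style microversion negotiation: the
# client sends a header like "OpenStack-API-Version: data-processing 2.1"
# and the server checks it against its supported range. The service
# name and version range here are assumptions for illustration.

MIN = (2, 0)  # oldest microversion the server supports (assumed)
MAX = (2, 3)  # newest microversion the server supports (assumed)

def parse_version(header_value, service="data-processing"):
    """Split 'service major.minor' into a (major, minor) tuple."""
    name, _, version = header_value.partition(" ")
    if name != service:
        raise ValueError("unexpected service name: %s" % name)
    major, minor = version.split(".")
    return int(major), int(minor)

def negotiate(header_value):
    """Return the requested version if supported, else None."""
    requested = parse_version(header_value)
    return requested if MIN <= requested <= MAX else None

print(negotiate("data-processing 2.1"))  # (2, 1)
print(negotiate("data-processing 2.9"))  # None
```

The backward-compatibility concern raised in the answer is exactly this mechanism's cost: every microversion in the supported range has to keep working, so the range can only grow.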