Hi, everyone. I hope you can hear me well. I'm Doniziano, sales engineer at Mesosphere. I had an issue with my laptop just now and had to reboot my Mac. That normally happens only with Windows, but for some reason it happens with my Mac too, so hopefully everything will be fine from here. I'll present Mesosphere DC/OS: I'll go through some slides and then I have a demo. Again, I have two phones, so that if one phone doesn't work, I still have the second one; hopefully everything will be okay. Let's perhaps wait one or two minutes so that everyone is there. Normally when I do my demo, everyone complains that the characters in my terminal are too small. I think this time that should be fine, right? You should be able to see it correctly. If it still doesn't work, I can change the characters I'm using.

So let's start. Machine learning: why does it matter? There is nothing new here. You know that in all industries, people are using machine learning either to improve processes or to generate new revenue, and so on. Nothing new, and I won't go through examples; I'm sure you have seen dozens and dozens of them since you joined this morning. What I want to highlight is that several kinds of people are now involved in ML projects. You have the system admins and DevOps people who prepare the environment and take care of the infrastructure. You have this emerging group of DataOps people who take some of the lessons learned on the DevOps side and apply them to data science. And at the end you have the data scientists, who create the models and things like that. Obviously, in an ideal world, we would like the data scientists to just focus on creating their models, and that's it. But in reality, in data science projects, the time you spend creating your model is quite small.

As I said just before, people tend to reuse the concepts they've learned with DevOps in their data science projects. In a DevOps project, you develop your application and you have a CI/CD pipeline so that your new code directly generates a container, the container is started in a Kubernetes environment, for example, and then you run your tests and so on. You want to do the same with machine learning: you write your code, this time not to create an application but to create a model, and when the model is ready, you want a CI/CD pipeline that creates, say, a Docker image containing your model, runs some tests, and so on.

To do that, you need many different components. Obviously, you need the different tools people generally use for machine learning: data stores like HDFS, Spark, TensorFlow, notebooks like Jupyter notebooks. But if you follow this principle, you also need software like GitLab and Jenkins so that you have your CI/CD pipeline in place, and you perhaps need Kubernetes running so you can deploy your containers there. That's what we generally see in data science projects. But I'm also working on a lot of projects where people want to modernize their applications or create new cloud-native applications, and these two worlds are really merging; you see the same kind of requirements.
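To make the CI/CD-for-models parallel concrete, here is a minimal sketch of what such a pipeline step could look like; the registry, image names, and test paths are all hypothetical, not something from the talk:

```bash
# Build an image that bundles the freshly trained model (all names hypothetical)
docker build -t registry.example.com/team/model-serving:v2 .

# Run the model's sanity tests inside the container before shipping it
docker run --rm registry.example.com/team/model-serving:v2 \
  python -m pytest tests/test_predictions.py

# Only a tested image gets pushed and rolled out to Kubernetes
docker push registry.example.com/team/model-serving:v2
kubectl set image deployment/model-serving \
  model-serving=registry.example.com/team/model-serving:v2
```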
Obviously, for this kind of cloud-native application use case, I focus first on having my CI/CD pipeline and my Kubernetes up and running. But then I also need data: I need to store my data, so I need things like NoSQL databases. And once I have my application in production, I start to think that perhaps I could extract some value from this data, and I end up with the same kind of project you have in data science, where you want to extract value from the data using Spark or TensorFlow and so on.

If you just deploy all these different technologies in silos, like we have generally done for a long time, then what happens is that you have some physical servers or virtual machines dedicated to Kubernetes, others dedicated to Spark, others dedicated to GitLab, Jenkins, and so on. You get a lot of different silos, and they are generally all underutilized. The second issue, which is probably the most important one, is that you have to learn about all these technologies. You have to learn how to deploy and maintain Kubernetes, how to scale it, how to secure it, how to upgrade it without disrupting the applications running on it. And then you have to do exactly the same for Kafka, the same for Cassandra, and so on. It's really painful, and it also means it takes more time to develop your application or create your models, because you spend a lot of time up front preparing everything.

That's why the public cloud is so popular: people say, "I have no value in becoming an expert in managing Kafka, Kubernetes, and so on, so let's just consume these services in the public cloud." But there are two main problems there. The first is that you'll be locked in. If I start to consume all the services provided in Google Cloud, then I cannot easily move to AWS. And perhaps a regulation tells me, "You should not be in the cloud for that workload; you should go on-premise," and there is no way to easily go from the cloud to on-premise. And even if you can manage that, you have a second problem, which is the price of these services. It gets really expensive when you start to use all these very specific services, and the way the billing is done is not clear at all. If you have a project and you know you want to use Kinesis from AWS for exchanging messages, you will pay based on the number of messages you have. How can you know in advance how many messages you will have? You don't. So you don't even know how much it will cost; you only know it will cost a lot of money.

The idea of Mesosphere is that we only depend on machines. It can be a physical machine in your data center; it can be a virtual machine on OpenStack or VMware; it can be a virtual machine in the cloud on AWS or Google. We even have customers who, when they want to go to China, for example, just use Alibaba Cloud. There is no problem: you just need a VM or a physical server. That means you have no lock-in and you get exactly the same experience wherever you are. Then we have the Apache Mesos layer that we use to aggregate all the resources of all these machines. And when we say resources, we mean RAM, CPU, GPU, disk, and so on. As soon as we have aggregated all these resources, we can launch our services.
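As a quick illustration of that resource pool, here is roughly how you can inspect it from the DC/OS CLI once agents are attached; this is a sketch, and the exact output format varies by DC/OS version:

```bash
# List every agent (VM or physical server) that Mesos has pooled together
dcos node

# Inspect the per-agent resources (CPU, memory, GPU, disk) that Mesos aggregates;
# jq is only used here to pick the resources field out of the JSON output
dcos node --json | jq '.[].resources'
```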
And because we have this Mesos layer, we can have many different kinds of workloads running on the same machines. So instead of having machines dedicated to Kubernetes, Kafka, Jenkins, and so on, I can have all these machines running mixed workloads. If I log in to one of the machines, I see a little bit of Kafka, a little bit of Kubernetes, a little bit of Spark. That means I have higher utilization, so I reduce the cost of my infrastructure.

Then we have this catalog of services with certified and community packages. For the certified packages, we provide full support of the life cycle of these services. If you try to deploy Kafka and it doesn't work, you can call us; if you try to upgrade Kafka, you can call us; and the same goes for all the other certified packages. We really take care of the full life cycle of these services, with all the security options you need. So you can very easily say: I want to deploy HDFS, Spark, and Kafka, and I want Kerberos integration and SSL. Something like that would take weeks if you had to do everything by yourself, but with DC/OS it can be done in a matter of a few dozen minutes.
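To give a feel for what that looks like in practice, here is a hedged sketch using the DC/OS CLI. The install command is the standard one; the option keys follow the general service.security pattern of the certified data-services packages, but the exact schema differs per package and version, so treat the JSON as illustrative:

```bash
# Write the security options for the Kafka package (keys are illustrative)
cat > kafka-options.json <<'EOF'
{
  "service": {
    "security": {
      "kerberos": { "enabled": true },
      "transport_encryption": { "enabled": true }
    }
  }
}
EOF

# Install certified packages from the DC/OS catalog
dcos package install kafka --options=kafka-options.json
dcos package install hdfs
dcos package install spark
```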
So what does it mean for you? You get a lower management cost, because the life cycle of these applications is automated, and you get a faster time to market, because you can now focus on developing your applications or your models. Basically, you get the same time to market as with public cloud offerings, but without the lock-in and at a cost you can manage.

These are the services that are currently certified, so supported by us. You see we have the things you need for DevOps, like Kubernetes, Jenkins, and so on. We have NoSQL databases like Cassandra, MongoDB, and Couchbase; with those three, you probably cover 95% of the market. And we have big data software like HDFS, Spark, TensorFlow, and so on. Before the end of the year, we are adding all these other ones. We are adding a JupyterLab service, which I'll go through in a minute. We are also adding relational database support, because we see that even in very innovative projects you still need relational databases, perhaps to manage your users or to do some configuration management; it's very common, and a lot of our customers have requested this kind of support.

So in 1.12, the current version of DC/OS, we have added this nice JupyterLab as a service. I don't know who is familiar with JupyterLab and notebooks, so I'll go through it very quickly. In the traditional world, when you want to develop your model, you write your code in Java or whatever language you want on your laptop, you compile that code into a binary, you store it somewhere on a shared file system, and then you go to your Spark cluster and start a Spark job that creates your model, for example. And then you discover it's not exactly what you wanted, because it's generally an iterative process: you have to modify your code many times until you get the right model, and every time you go through the same process. With a notebook, you get a web UI where you write your code in the language you want, and then you click on a Play button and it does everything behind the scenes: it compiles your code and sends the job to a Spark cluster, for example, and you're done. You can modify your code, click on Play again, and it does it all again, keeping the Spark cluster running, so every time you click on Play you get the new response very quickly. You don't have to wait dozens of minutes or anything like that.

What we've done is take this JupyterLab notebook, which is actually the most popular notebook. The only issue people had with JupyterLab is that it was only for Python, so only for developers who want to use Python. But one of our customers, Two Sigma, a FinTech company (I don't know if you know them), has created what are called kernels for JupyterLab, so that it can support many other languages. You can now use JupyterLab with Java, Scala, R, whatever you want. We have added all these kernels to our JupyterLab service, along with all the Spark and TensorFlow dependencies, so that data scientists can just focus on using their notebooks and creating their models.

Another thing we have added in 1.12 is the ability to deploy as many Kubernetes clusters as you want on the same machines, which is also unique. So now you can have one Kubernetes cluster for one team, another one for another team, different versions, and so on. We also have a hybrid cloud capability, which means you can deploy your DC/OS cluster in your data center and then deploy agents in the cloud, in a new remote region of your DC/OS cluster. Later, you can decide where you want to launch your workloads. Perhaps I want to use my data center, but at some point I need GPUs and I don't have any available there; I can just say I want to launch my JupyterLab notebook in the remote region in AWS where I have GPUs, and accelerate the creation of my models.

So I'll go through a demo. The purpose of this demo is to demonstrate several things: that we have many different services available that will help you in your data science projects, and that we can deploy these services with security. You'll see in the demo that I have secured all the communication between all these different pieces with SSL, and that I am using Kerberos to secure access to the data. Some of the jobs I have already run, because some of them can take 20 minutes or so, and I'll show you the results; otherwise we would be here for an hour, which I think is not the goal. So: we use Flickr, we'll get some pictures of cats and dogs using the Flickr API, and we'll store them in HDFS. Then we'll go to the JupyterLab notebook and launch a job (in fact, I already launched it, but I'll show you) that retrains a model to classify pictures. And as soon as my model is finished, I store it in a GitLab repository, where I have set up a CI/CD pipeline so that it automatically creates a Docker container with my model and pushes that container to a Kubernetes cluster. So you have the full pipeline, the full story I described at the beginning, everything running on only one platform, without having to maintain all the different dependencies.
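On the multiple-Kubernetes-clusters point from a moment ago, the flow from the DC/OS CLI looks roughly like this. These are the 1.12-era subcommands as I recall them, so check dcos kubernetes --help on your version; the options files are hypothetical:

```bash
# Install the Kubernetes manager package once per DC/OS cluster
dcos package install kubernetes

# Create one Kubernetes cluster per team, each with its own options
dcos kubernetes cluster create --options=team-a-options.json
dcos kubernetes cluster create --options=team-b-options.json

# See all the Kubernetes clusters running side by side on the same machines
dcos kubernetes cluster list
```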
Let me check that I have a good network and that everything works well. I have to escape that first. Yeah, that's good. Last time, it was when I escaped it that everything broke.

So, as I said, the first thing I did here is use Apache NiFi to get pictures of dogs and cats from the Flickr API. I'll not go through all the details, but I'll show you that I have the same job twice, and if I go into that component here, you can see that on this one I have indicated that I want to look for cats. Oh, you don't see it; that's not good. Let me just move it over there. At least I noticed you weren't seeing it before I finished. Sorry, let me do something different, because otherwise I'll never be able to do it; let me just change the way I clone my desktop. That will be easier.

So let me go back and show you again. I have my NiFi flow here with two jobs, one to get pictures of cats and one to get pictures of dogs. If I go into the first one, I can see that I have indicated I wanted pictures of cats in that case. Then I go through the Flickr API and get the list of the pictures, with a loop to go through the different pages. Then I get my pictures one by one, and when I get them, I store them on HDFS. And as I explained before, I have Kerberos integration: you can see here that I have indicated information like my keytab and so on, so that all communications are completely secured.

As soon as I have these pictures, I go to JupyterLab, the notebook I described before, where all these different languages are available. In this case, I'll just open a new terminal, because what's nice is that each data scientist has their own environment, with a shell available and so on. So I can take a look here and check that I have pictures of cats and dogs. Yes: if I list recursively with -R, I see all my pictures of cats and dogs. I have 2,000 pictures, 1,000 of cats and 1,000 of dogs. And this is the nice thing about the way you retrain a model with TensorFlow: you don't have to start from zero. The model already knows a lot, having classified millions of pictures at Google before, so with only 1,000 pictures you can get very good results.

So then I started a retrain job here that looks at this HDFS directory with the pictures of cats and dogs, and it has written the model to the /tmp/output directory, as described there. What I want to do now is add this model to my GitLab repository. I have a GitLab repository here where I have already made a first commit with all the files I need to create the web application that will consume this model. But you see that I don't have the model here yet, okay? I need to add my model there. To do that, I go to my JupyterLab terminal, do a cp of everything in /tmp/output into that directory, do a git commit -m with the model, and do a git push.
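Spelled out, those terminal steps look something like this; a sketch assuming the model was written to /tmp/output and that the repository was already cloned in the notebook environment (the HDFS path is illustrative):

```bash
# Check the training data landed in HDFS
hdfs dfs -ls -R /data/crawler

# Copy the retrained model files into the GitLab working tree
cp /tmp/output/* .

# Commit and push; Jenkins will pick up the change from GitLab
git add .          # the step I manage to forget a bit later in the demo
git commit -m "Add retrained model"
git push
```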
So in a few seconds I'll see here that I have a new commit with my model; it will come in a minute. Then, what I did is configure Jenkins to watch this GitLab repository: whenever there is a change there, Jenkins discovers it and starts running the pipeline. What that means is that at the end it executes this file, which is generally called a Jenkinsfile. When it executes that file, it takes a Docker image, which is my base image, gets the data from the GitLab repository with the model and so on, creates a new Docker image based on that, pushes that image to Docker Hub, and finally runs some kubectl commands to deploy that container in Kubernetes. So: I have my new model, I put it in my GitLab repository, Jenkins discovered it by itself, built a new Docker image with this model, and started containers of that new image in Kubernetes. And if everything goes well, I can then show you the web application, and we can check whether the model works well or not, okay?

So let me first take a look at Jenkins here and refresh it... we see it has detected that there is a new commit, and if I go there, I can even see the output of what is happening. That will probably take two or three minutes, so I'll use that time to show you a little more about how everything has been deployed on the DC/OS side. I have the DC/OS dashboard here, where you see that I have some CPU, some memory, some GPU, and so on. As I said before, we have this catalog of services with many certified services like Kafka, Cassandra, and so on, and I have deployed many different services, as you see here. We spoke about NiFi, for example: if I go there, I can see that I have NiFi with two nodes. If I wanted to download my pictures from Flickr faster, I could just add more nodes here; I just edit, change the number of nodes I want, click on run, and I would get more nodes in about two minutes. If I want to automate this process, I can easily get the JSON corresponding to what I want to deploy or upgrade and automate that operation very quickly, perhaps even storing that configuration in a GitLab repository to follow good practices like infrastructure as code. And I have a lot of other components: as I said, HDFS configured for high availability, with two name nodes, several data nodes, Kerberos enabled, and so on. We have a Kubernetes cluster here with several Kubernetes nodes. So everything has been deployed, and I have a GitLab repository where everything is available if you're interested: basically, you start from zero, and in 45 minutes you have everything deployed.

What you can also see, if I go back to the Jenkins side now, is that a task was started about three minutes ago. When Jenkins discovered it had to execute the pipeline, it requested some resources from Mesos to start a container to execute that job, and once the job is done, it can just get rid of that container on Mesos. So you don't even have to dedicate resources to Jenkins all the time; it just gets the resources it needs from Mesos.
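As a rough idea of what that Jenkinsfile boils down to, here are the shell steps behind the stages described above; the repository, image, and deployment names are hypothetical, and BUILD_NUMBER is the standard Jenkins environment variable:

```bash
# Fetch the repo containing the Dockerfile and the freshly pushed model
git clone https://gitlab.example.com/demo/model-serving.git
cd model-serving

# Build a new image on top of the base serving image
# (the Dockerfile layers the model in with a COPY instruction)
docker build -t mydockerhubuser/model-serving:$BUILD_NUMBER .

# Publish the image so Kubernetes can pull it
docker push mydockerhubuser/model-serving:$BUILD_NUMBER

# Roll the Kubernetes deployment to the new image
kubectl set image deployment/model-serving \
  model-serving=mydockerhubuser/model-serving:$BUILD_NUMBER
```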
So if I go there, we should be close to the end... and we have a failure. Which is good. No, it's not good, obviously. So I just have to check whether my push worked or not... oh, you know what I did? I forgot to add my files, so obviously they were not in the commit, and that's why we didn't see the model when I refreshed. But it will be very quick to do it again, because I already have this worker on Mesos. So if I do a git push here, this time I should be able to see the model... yes, and the push takes longer, which makes sense because my model is in it. So, just to save some time, I'll trigger the pipeline myself instead of waiting for Jenkins to discover it... actually, it has already discovered it, so that's great. And this will be fast, because the executor we just used is still there; it hasn't been deleted yet, so the docker pull on this machine will be very fast and so on. It will just take a minute or so to finish the full pipeline.

While it progresses, I'll go back to the UI and show you a few other things that are quite interesting. As you see here, I have one GPU used by my JupyterLab notebook: when I retrained my model, I was able to use the GPU allocated on that node. And in Mesos you have the ability to create quotas: you see that for JupyterLab I have a quota of 7 CPUs and 48 GB of RAM that it can use when it launches new jobs, okay? That's very nice, because you can give a notebook to a data scientist and make sure that person will not consume all the resources of your cluster. At the very end, when the rest is finished, I'll start a quick distributed Spark job so you can see that in action and see the usage of the GPU. This is really unique: with Docker in general you can use a GPU, but you cannot limit how much of the GPU you consume; in Mesos you can, and that's really powerful.

So let me go back here again... oh, I have a success, which is better. Now, if I go to my Kubernetes cluster and refresh here, I should see that it is starting some instances. Obviously it asks me for a token, and I don't know whether my file has expired or not, so let's see... it should be fine. Yes: you see it's doing a docker pull of the image I just created and starting the deployment. I'm not going into all the details about Kubernetes, because that's not the purpose of this session, but you need something like Traefik to expose your application to the outside world, and you see that this rule has been created automatically to route traffic to my two containers running in Kubernetes. So if I go there, and if everything works well, I have access to my web UI (a very beautiful web UI, as you can see), and I'll now be able to try it and see whether my model works well. If I give it a picture and ask it to classify it, we'll see whether our model is working or not. There is some caching, so if I do that, it should be fine, I assume... yes, that's a cat: the model works quite well. Don't ask me why, but when I do the same with dogs, it's never as accurate; it looks like TensorFlow prefers cats to dogs. Let's see, perhaps today it will be different... no, it's not so bad, but it's not as good as for the cat. Poor dog; that's perhaps why. And you can even try to confuse it sometimes, to see what it thinks about a picture like this one... so the model doesn't work very well on that one, I don't know why... oh, that's probably why. Sometimes it's recognized as a cat, sometimes as a dog, depending on when I do it.
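For reference, checking that Kubernetes rollout from the command line would look something like this; a sketch with a hypothetical resource name, since in the talk this was all shown in the UI:

```bash
kubectl get deployments                   # the deployment the pipeline just created
kubectl get pods                          # two replicas pulling the freshly pushed image
kubectl describe ingress model-serving    # the Traefik rule exposing the web UI
```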
So, that was a quick demo. I will add one thing: as I said before, the job we executed to retrain the model was not a distributed job, so I'll run another one that uses distributed Spark. What we use here is a very nice project that has been open-sourced by Yahoo, TensorFlowOnSpark, which uses Spark to distribute a TensorFlow job. You just have to slightly modify your TensorFlow job and you get it distributed on Spark, which is a lot easier than creating your own cluster for each TensorFlow job, which is what you generally have to do. We don't really care about the results here; I just wanted to show you that on Mesos I can see my two containers starting now, that these two containers are using GPUs, and that if I look at my usage, I see I am using two CPUs, two GPUs, and 13 GB of RAM, okay? I have no quota for GPU here; I could have added one if I wanted, but this is just to show you the advantage of using DC/OS for GPUs and the way you can manage quotas.
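A hedged sketch of how such a job might be submitted on DC/OS, based on the TensorFlowOnSpark examples; the script names, HDFS paths, and GPU setting are illustrative and version-dependent:

```bash
# Submit a TensorFlowOnSpark job through the DC/OS Spark CLI
# (mnist_spark.py and tfspark.zip come from the TensorFlowOnSpark examples;
#  spark.mesos.gpus.max caps how many GPUs the Spark job may consume on Mesos)
dcos spark run --submit-args="\
  --conf spark.mesos.gpus.max=2 \
  --py-files tfspark.zip \
  mnist_spark.py \
  --images hdfs:///data/mnist/images \
  --model hdfs:///models/mnist"
```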
So let me go back to one last slide, just to summarize what I have demonstrated, if it agrees to show it... yes. That's great... I don't see it on my screen... yeah, that's good. So, DC/OS provides all the software you need for this kind of data science project, and it can deploy each piece in a few minutes. I didn't demonstrate that here, because the talk would obviously be too long, but I have some demos available, and a video on YouTube as well, if you want to take a look later. You can secure everything with Kerberos and TLS, which is generally very complex to do. You can do it anywhere: what I demonstrated here ran on AWS, but I could have done it on-premise, on Google Cloud, or wherever. You get a nice notebook experience, which is really what data scientists are asking for currently. You can leverage GPUs, and you can use Mesos quotas to make sure people don't use up all the resources of the cluster. And you can even use the same platform to deploy Kubernetes and serve your models.

That's all I had. I think we still have some time for questions. I don't know who has the microphone... I think we have a microphone now. Do we have any questions? I'm sure we have, right? I can't see you guys; I've had a light in my face since the beginning of this talk.

[Audience] Can you hear me?

Yeah, very well.

[Audience] A technical question related to Jenkins: where is the decision made about spinning up a new machine? Is it on the Jenkins side or on the Mesos side?

It's on the Jenkins side. Basically, you have something in Jenkins... I can show you very quickly.

[Audience] So it's a plugin?

You have what is called a cloud. I'll show you here: the Jenkins package that we deploy on DC/OS already contains what it needs to talk with Mesos, so when you go to Configure in Jenkins, you have this cloud section, and this is just another cloud, one that knows how to talk with Mesos and so on. And one thing I didn't show you, I just realize now, because I've done everything in the terminal: obviously, people tend to use the approach where you write your code, you click on Play, it executes your job, and so on.

Any other questions? Sure. Good, that's great.

Just one last thing: we have a raffle. If you come to our booth and register there, tomorrow you can win different prizes; we have a nice drone. So feel free to come by our booth and say hi, and if you are lucky, you may even go home tomorrow with a nice prize. Thanks everyone.