I know it's hard to pay attention in the evening. So say it again: good evening. Good evening. OK, that's better.

Before I start, let me ask you: how many of you are data scientists? Raise your hands. One, two, three, four. That's about one fourth of you. How many of you are developers, software architects, DevOps engineers, testers? So this talk is about building tools for DevOps for data science. If you're a software developer or a DevOps engineer, I hope you'll get some insights into how to build tools for data scientists. How many of you have built some kind of tool or system for data scientists to use? Raise your hands. A few of you.

This talk comes from my experience of building a platform like that, and also from seeing others build platforms like that. Some of them are good; some are things that could have gone better. So there are lessons about how to do things and how not to do things, and I'm going to share my experiences.

My name is Anand. I'm co-founder of a startup called rorodata. It's a PaaS for data scientists. For the last two years I've been building this platform for building and deploying machine learning applications, to enable data scientists to build and deploy machine learning applications completely on their own, without having to depend on other folks. I also run advanced programming courses at Pipal Academy, including courses on machine learning. So I get to speak to people who are trying to use machine learning and observe what kinds of things are working and what kinds are not.

Let's look at the landscape of DevOps for data science. On the left, you have the DevOps tools; on the right, the data science tools. On the DevOps side you have EC2, AWS Lambda, Docker, Kubernetes, a bunch of things to provision your hardware, CI/CD, a bunch of databases, ways to deploy APIs, and all that. It's a fairly complicated landscape. On the right, you have Jupyter notebooks, NumPy and Pandas for data analysis, machine learning algorithms, a bunch of deep learning libraries, and Airflow and the like for managing your data pipelines. Each one of these is fairly complicated on its own. When you want to do data science in production, you have to manage both sides together, which is a pretty complicated task.

Managing data science in production is really hard, because it's a very young discipline. The practices are still evolving; they're not mature yet. Tools and practices keep changing, and nothing is really proven for everyone. If you want to build a web application, everyone follows more or less the same approach. But in data science, everyone invents their own way of doing things. The problem is that building your own solution is not easy: it requires careful system architecture and fairly complex DevOps. Very often the people developing it have a short-sighted view of the problem at hand. They don't think long term, so the system ends up not matching the requirements they actually have. And sometimes the organization may not even have the capability to build that kind of system.

Before we jump in, let's see what our goal is, what we're trying to achieve.
The data science team should be self-sufficient in building their own machine learning applications. What that means: if there is communication between two different teams, there's a lot of latency, a lot of lead time. You have to ask someone to do something. "I want to work on a bunch of GPUs. Can you please allocate GPUs for me?" It takes a lot of time. If you want to work on a GPU or some other kind of compute, you should be able to allocate it yourself and release it when you're done. Having to depend on some other team for those tasks slows things down.

The other important thing is that the learning curve should not be steep. Why? Say a new person joins your team. How long does it take to get them up to speed? A couple of hours, maybe a day. If it's more than that, something is seriously wrong with your setup. Can you get an intern for a month and have them start working from day one? These are the kinds of goals we're looking at. So: how do you build systems that make data science teams more productive? That's the problem we're trying to figure out.

Before we jump in, let's take some inspiration from origami. This is called a flapping bird. Has anyone seen this before? You take it and pull its tail, and it flaps its wings. Now, give this to a kid. Kids really enjoy this: they just pull the tail and the wings flap. But behind the scenes, there are a lot of folds going on. This is made from a square sheet of paper, and those are all the folds that go into it. There are two kinds of folds: you can crease the paper so it forms a mountain or a valley, and the two kinds are shown with solid lines and dotted lines. It takes some 15 to 20 folds to get there. But if you give a kid the paper and the instructions and say "follow all the steps", they'll just throw it away and run off.

So the key thing is to focus on the experience. When you give it to a kid, they should just be able to play with it. And the important thing is the right level of abstraction, the level of abstraction we're giving to the end user. A kid understands pulling a tail; that's the abstraction they really understand. If you talk about mountain folds and valley folds, that's not their language, and they won't understand it. In the same way, when we're building tools for data scientists, we should really focus on the experience.

Let's talk about that level of experience a bit. In the UX world, people talk about user experience a lot: if you're on a website, how many clicks does it take to make a purchase? That's the language UI designers often speak. If it takes four clicks to make a purchase, you should be able to do it with just two; they work to optimize that experience. There's a similar idea in the developer world, called developer experience. How many steps does it take to start something? How many steps does it take to start a notebook?
How many tasks do you need to remember and follow to start a notebook, for example? What's the cognitive overload: how many things do you have to keep in your head, how many times do you have to go to Stack Overflow before getting it done? We should try to optimize those things.

The other thing is that when designing these kinds of systems, people often forget about the end user. As engineers, we love complexity. We want to work with the modern buzzwords: Docker, Kubernetes, Lambda, and so on. What happens a lot of the time is that people want to work with these technologies so badly that they pick them to solve the problem, even when they're not the best fit for the solution at hand. Often this produces inefficient solutions. I call it resume-driven development: people optimize for their resume, not for the organization. That's a very common problem I find when looking at solutions built for DevOps. When you look at conference proposals and hear people talk about the kinds of things they've built, it's fairly complicated; it didn't have to be that complex. They don't focus on the end developer experience, and after a year or two they move on, and then who's going to manage that system? People tend to dream up and build complex systems, but what's really needed is a push button: start a notebook, and the notebook should be there. Focusing on the end user experience is the crucial part.

Let me take the example of Heroku. How many of you use Heroku? Not too many, okay. Heroku is a platform-as-a-service for deploying web applications. It started almost a decade ago, I guess. Back then, building and deploying a web application was really hard: you had to get a server, set things up, a database, et cetera. What Heroku did was simplify the whole process. There's only one way of doing things, and it's a push button: just git push, and your application gets deployed. You don't have to worry about anything else. It's one of the pioneers of developer experience, you could say; it's the best developer experience you can find for deploying web applications.

Now, how do you take those ideas and apply them to data science? I'm going to take a couple of case studies from what I've worked on. First, building systems for launching notebooks. Take a data scientist who wants to start working on something: he needs a notebook, and he may need to run any kind of software on it, scikit-learn, TensorFlow, Keras, or PyTorch, on a CPU or a GPU. Second, deploying machine learning models as APIs. Once people have worked in their notebooks, they end up with some kind of machine learning model, and they want to deploy it as an API so they can start using it in production. I'm going to look at these two case studies and show how I've approached them, keeping the developer experience in mind. I hope that gives you some insights.

So let's look at launching notebooks. If you look at the challenges, a very important one is switching between different compute needs.
What I mean by that is, when people start working on a problem, they usually start with a smaller subset of the data and try different algorithms. Once they find something that works, they want to try it on the bigger dataset, so they may have to move to a bigger instance with more resources. It's very important to be able to take whatever you've done so far and seamlessly move to a different instance.

The next thing is installing the required software. One of the usual problems is that people set things up entirely on their own machines. What happens then is that it's very difficult to replicate on someone else's machine: everything that went into the setup lives in one person's head, and it becomes difficult for someone else to reproduce. So it's important that software dependencies are handled automatically by the system.

The next thing is data storage. Typically people download some data from the web or a database and keep it somewhere. It's an iterative process: when you're working in notebooks, you get some data, do some processing, store the result somewhere, run something else, and it goes on. So you need somewhere to store the data, and even when you switch between compute sizes, or stop the notebook and come back later, the data should still be there. And if you're working on deep learning, you may need GPU support. Those are the challenges a system like this needs to address.

So what we have done is create three abstractions to solve this problem: project, runtime, and instance size. I'll explain what each means.

A project is the unit of work that holds all your code, data, et cetera. All your notebooks, data, and software dependencies are part of one project. You can start multiple projects, work on them, come back to a project, or switch between projects.

The runtime is one of the key abstractions. A lot of the time people want to work with TensorFlow, Keras, PyTorch, et cetera, and these versions keep changing. If a new TensorFlow version gets released, you have to make sure compatible GPU drivers are loaded and so on; it's a complicated setup. Has anyone here set those things up for a GPU? I have spent sleepless nights whenever an upgrade happens and the system suddenly stops working because some library somewhere changed, and it's hard to go and debug. So it's very important to capture all of that in the system, so you don't expose it to the end user. The runtime captures the base software setup: the tensorflow runtime includes TensorFlow and the common ML libraries, the keras runtime includes TensorFlow and Keras, and there's one for PyTorch. The runtime also manages the different versions. We version runtimes by date, with a release every three months, so 2018.03 and 2018.06 are versions. When you start a project, you can pin it to a version of the runtime so that you always use the same version.

The next thing is the instance size. We have defined a set of instance sizes: S1, S2, M1, and so on. This setup runs in the cloud.
If you just want to start working, you start with an S2, which has 3.5 GB of RAM. If someone wants more memory, they can use an M1 or M2, which has 16 GB, et cetera. And if you really want to do some quick analysis with a terabyte of memory, you can use an instance called X1 that has 64 cores and one terabyte of RAM. Run it for an hour and you pay some $10 or $20 for it. So it's very easy to switch between compute needs without spending a lot of time, because you're on the cloud and you can really exploit that: start an instance, use it as long as you want, and stop it.

A lot of the time, when people want to process large amounts of data for a very short time, they put a lot of engineering effort into setting up something like Spark. That makes sense for a large organization that will do it again and again. But if it's a one-time job, or something you do once a month to analyze the monthly data and generate a report, you don't have to spend so much engineering effort on Hadoop or Spark. It makes sense when there's a real need, but you may be able to buy a couple of years of time before you need that kind of expertise, by using a system like this: just start an instance with a terabyte or two of RAM, pay some $20 or $30 extra, and you're done.

So these are the abstractions we've defined. When a data scientist starts, he picks a project, picks a runtime, and starts a notebook on one of these instance sizes. That's the language we are speaking now. The data scientist doesn't worry about where the machine comes from, what the infrastructure is, how to install software, any of that. (Audience question.) Sure, I'll come back to that; I'll show you what happens behind the scenes, and then we'll come back to your question.

Now, how do you specify additional dependencies? A lot of the time, people build systems that solve 80% of the problem, and for the remaining 20% you have to find workarounds, which takes 90% of your time. That happens with a lot of systems. It's very important to make sure the systems you design are open, so they can accommodate the extras. One of the guiding philosophies: common things should be simple, and difficult things should be possible.

So how do you specify additional dependencies? The way we do it: you write a runtime.txt that specifies the runtime. You can write an environment.yml, which, if you're using Python, is the conda environment specification; you list all the dependencies and the system picks them up from there. Or you can write a requirements.txt file that just lists Python dependencies. Most of the time people just write requirements.txt; 99% of the time nobody wants to do anything more than that, and 50% of people don't even get that far, they just pick a runtime and start working. If you want to install apt packages, that is, system packages ("I need graphviz installed so I can visualize my decision tree model"), there's an apt.txt where you list all the apt packages you want installed. That's good enough for almost everyone, but some power users want something more.
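As a concrete sketch, the dependency files of a project that needs extras might look like this. The file names are the ones described above; the specific contents are only illustrative:

```
# runtime.txt -- pin the base runtime, including its dated version
tensorflow-2018.03

# requirements.txt -- additional Python dependencies
scikit-learn==0.19.1
pandas

# apt.txt -- system packages, e.g. graphviz for visualizing a decision tree model
graphviz
```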
For those who need more, there's a post-build script you can write that gets executed at the end of the build, so anything that doesn't fit in any of those files can go into a script you keep there. That takes care of everything else.

Now, if you look at this, there are very careful design decisions here. Data scientists never see this complexity, but if someone wants to do more than the system provides, there's an escape hatch. Typically you'd do all of this by writing a Dockerfile: people use containers and ask data scientists to write Dockerfiles because that's generic, but it's really too low-level for data scientists to work with. And you don't want to do that every time you start a project. There may be cases where you want customizations, maybe 1% or 0.5% of the time, and it's a sin to force everyone to deal with that complexity just to anticipate it.

So let's see what happens behind the scenes. (It's a 45-minute session, yes.) Every time a project is built, the platform builds two Docker images. Docker is a container system where you can build an image and run it on any machine. Behind the scenes it builds two images, one for CPU and one for GPU. Runtimes are built the same way: one runtime image for CPU, one for GPU. The platform manages compute instances automatically. It also pools compute resources to optimize utilization. For example, if 20 people are working, you can put all of them on one machine with half the capacity, because when people are working in a notebook, a lot of the time they're looking at the code and deciding what to do next; utilization is very low most of the time. So the platform has the option to pool resources: some instance types are dedicated, while S1 and S2 are pooled, so you don't get a dedicated instance to yourself, but a lot of the common usage goes there. It uses a network file system to persist data and notebooks, so if you stop a notebook and start again on a different instance, you continue exactly where you left off. And it automatically manages URL endpoints and HTTPS: when you start a notebook, you get an HTTPS URL to work with.

Now, coming to the question you asked: you're saying it's not safe, right? This system runs inside the customer's VPC, so the data stays in the customer's cloud; it doesn't come out. If you're an organization using a system like this, everything stays within your own network. We're building it for others to use, but if you're designing a system for yourself, even though it's in the cloud, it's inside your private network, so data will not go out. That's not a problem at all.

The important thing is to know who you're developing it for, and whether you're speaking the right level of abstraction for those people. The three abstractions we've created are the project, the runtime, and the instance size. Data scientists just focus on those and don't have to worry about anything else; everything else is managed by the system automatically. Let me show you how it works.
You create a project, pick the runtime, select the instance size when starting, and say "start a Jupyter Lab". That takes a while, and then you launch a notebook and get a Jupyter Lab you can work in. When I stop it, the system takes care of releasing the resources that were used. So everything is managed by the system, except choosing the project, the runtime, and the kind of instance you want to start with. You can stop here, come back tomorrow, and continue your work.

(Audience question.) Sure. When you set it up once, it creates a Docker image behind the scenes and keeps it, and when you start it again, it starts from there. (Audience: So say you set up a machine on GCP, tried something, and found an algorithm that works better for you. Now you want to work on the entire dataset, which doesn't fit on the same machine. You want a larger instance, but you still have to run the whole installation and setup again on the new machine, right?) Okay, so the thing is, Docker is a design choice I made that's not exposed to the end user. What's important is the experience we give to the end user; behind the scenes, the technology choices are about ease of managing the platform. (Audience: If you want to run it on a GPU, will it still work the same way?) No, it doesn't work exactly the same. I've never used GCP for GPUs, but the thing is, you need a completely different software setup for GPUs to work. You need to install TensorFlow's CPU build for CPU and its GPU build for GPU, and that requires the CUDA libraries, and whenever a version changes you have to make sure the appropriate versions are installed.

Believe me, I've spent so much time on this. I count myself as an expert in infrastructure and Docker and all those kinds of things, and I've still spent countless sleepless nights whenever I upgraded TensorFlow, from 1.6 to 1.9 or so. Let me tell the story. It was before a deep learning workshop that a friend of mine was running, and I was providing the support; the participants were using GPUs on the platform. I upgraded the TensorFlow version to 1.9, and it simply stopped working. I realized it was complaining that the CUDA library version was mismatched: it insisted on one specific minor version. I went to Docker Hub and couldn't find the exact version of the CUDA image, because it was a minor version change, so NVIDIA didn't maintain a Docker image for that version, but TensorFlow insisted it needed exactly that version. What I had to do was go to NVIDIA's GitLab, look at their versions, clone the repository, go back to the commit where they switched to that version, get that code, build the image myself, push it to my own Docker repository, pin it, and start using it. This took me the whole night, and the training was the next morning. Given that I had so much experience doing this, I could do it.
But if you start managing these things on your own, when an issue like this comes in, will you be able to handle it yourself? Maybe you have Docker experience; what about the rest of your data science team? Will they be able to do the same thing? Say an intern joins your team and runs into this trouble: will you stop all your work and go help him? So it's very important to build systems and platforms that speak the right level of abstraction; otherwise you get these kinds of issues. What we do is make this part of the runtime. We update runtimes once every three months, so if the TensorFlow version changes, the next release picks up the new version, and we do thorough testing to make sure issues like this are addressed. If you try to handle it on a per-project basis, you have to repeat the same thing for every project you work on. Does that make sense?

(Audience question about Google Cloud.) Sure. Right now this runs on AWS; what we've built is on AWS. It could run on Google Cloud as well, but if you look at AWS or Google Cloud directly, they're too low-level for data scientists to use. Fair enough, and I'm not saying you should use our platform. The question is whether what Google provides is the right level of abstraction for your team to work with; that's a fair choice. What I'm saying, the important message here, is to keep the end user experience in mind. And frankly, I have never found the tools from Google or AWS solving for that abstraction. For example, I've looked at SageMaker, and it doesn't let you switch between compute needs: you have to decide the machine up front, it starts that machine, and you can't switch to a different instance type. People are using it, but frankly, I didn't find it there. You don't have to buy my approach; the important thing is to understand the key ideas behind the design choices. I made these design choices because the target audience is data scientists, and the platform is optimized for that audience. Any other questions?

(Audience question.) Sorry? Okay, I think it depends on the platform. In our case it really depends on the much lower-level question of how the architecture is designed. For example, we use a network file system, so if you create a file, another person can pick it up. But if people are editing the same notebook at the same time, that might create trouble. Whenever you have shared state, concurrent writes are a problem. When you're doing concurrent writes, you can acquire a lock to make sure only one person writes at a time; one person writing and multiple people reading is usually not a problem. (Audience follow-up.) No, I think that's probably a limitation of the way that system is architected; I don't really know how it's designed. (Audience: Do you support deployment?) Not in this part; this part of the system gives you a notebook. I'm going to talk about deploying later. (Audience question about Databricks.) Sorry, I've never worked with Databricks. It runs on Spark, right? Okay, yeah, I believe it runs ML on that.
See, the thing is, frankly, this grew out of our own requirements. I'm sure anyone who keeps the focus on the end user's developer experience would design something along similar lines; if you study the problem in some detail and keep a long-term plan, you'd probably come up with something similar.

So that's the first case study, and that's how we built it. The important thing is to focus always on the developer experience: the end user is a data scientist, and he should be able to work comfortably without needing anyone else. (Audience question about storage.) Yes, everything else is behind the scenes. We're just using a network file system; we use EFS for that. We provide that, but you can also use anything else: save to a database, or an S3 bucket, et cetera. Anything with quick reads and writes, you keep on the network file system. It's not very high-performance, but it's very convenient for writing files and such. If performance starts hurting, you probably have to switch to S3 or a database, but for the most common needs it does the job.

The second thing I said I'd talk about is deploying machine learning models. This comes up often, and if you google "deploying machine learning models", you find very complex pictures of how people do it. This is more challenging than launching notebooks, because you have to design and document your APIs. How do you actually build an API? How many of you have designed some kind of API before? Okay. It involves designing and documenting the API, running the service and configuring your endpoints (a lot of DevOps), scaling to meet the demand, and once the API is built and running, you probably have to write a client library so it can be used easily. And then: how do you authenticate it? How do you track usage and performance? These are the common tasks you have to do every time you deploy an API, so it becomes repetitive unless you build some kind of system to handle it.

Also, keep these things in mind: once you've done this once, how long does it take to deploy another API exactly like it? Do you have to repeat n different steps, or can it be done in a single step? How long does it take to explain to someone new? These are the concerns to keep in mind when designing these things.

The traditional approach: people pick a web framework, write an API, deploy it using gunicorn and nginx or something, put it on some machine, Docker, AWS. These are the buzzwords that go in when someone deploys a machine learning model, and if you search for how to deploy a machine learning model, you'll find diagrams showing all these things interconnected. But what we observed is that even running a plain Python function as an API is really hard. Take any simple Python function and try to deploy it as an API; there are so many things you have to do.
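To make that concrete, here is a minimal sketch of the traditional route for a single toy function. The framework choice and names are my illustration, not something prescribed in the talk, and even this leaves out gunicorn, nginx, HTTPS, documentation, and the client library:

```python
# app.py -- the "traditional" way: hand-write a web API around one function
from flask import Flask, request, jsonify

app = Flask(__name__)

def square(n):
    return n * n

@app.route("/square", methods=["POST"])
def square_endpoint():
    # parse and validate the input yourself
    payload = request.get_json()
    return jsonify({"result": square(payload["n"])})

if __name__ == "__main__":
    app.run(port=8000)
```

And that's before provisioning a machine, a process manager, monitoring, and an endpoint.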
It doesn't have to be that way. So we thought about the right level of abstraction for this. You start with a function, say a square function you've written in Python, and you want to deploy it as an API without doing anything more. "Run this function as an API": that's the best abstraction you can think of, right?

So we built a tool called Firefly to do that. It's an open source library, built to solve exactly this problem of deploying machine learning models as APIs. All you have to do is run firefly with module_name.function_name, and you get an API. You have a Python function you've already written that takes some parameters; the tool looks at the function and the arguments it takes, and creates an API out of the box. And how do you use it? Firefly comes with a client out of the box: you create a client with the URL, call the function on it, pass the parameters, and get the result. Behind the scenes it calls the REST API and gives you the result. When we started looking at how to deploy machine learning models, we realized there wasn't any tool that spoke the right level of abstraction, so we built Firefly, and this is how it works.

What about real machine learning models, rather than the toy square problem? It's no different. You write a face_detection.py that imports joblib, loads the model, and has a predict function that takes an image or an image URL, runs predict on the model, and gives back the results you expect. Makes sense?

That's fine, but you don't really want to run all of this yourself; you want it integrated into the bigger platform. So we added a config file. You write a config file in the project that specifies what APIs to run. It's a simple YAML-based config file: it says, these are the services; this is the API I want to run; the function to call is face_detection.predict; use an instance of size S2. You deploy, it starts running that API, and it gives you an endpoint to use. If you look back at the notebook case study, we had instance sizes S1 and S2; those are the same instance sizes you can specify here.

Now, when we built this, we wanted to put together a showcase to demo the APIs we'd built, and we realized we couldn't call them from JavaScript, because you need to enable cross-origin resource sharing (CORS). If you build an API yourself, you have to go and add that to every API you build. So we added the support to the platform: you just add a cross-origin allow-origins entry and specify which origins you allow. If you've ever worked with JavaScript: browsers have a security model that won't let a page send requests to any domain other than the one it's on, because allowing that freely would be a security risk. For it to be allowed, the API you're calling has to permit it by sending a special header, Access-Control-Allow-Origin. So you add that additional flag in the config file, deploy again, and now it's ready to be used from JavaScript. And if I want to scale up, I say scale: I want to run four instances of this, and it runs four of them.
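Here is a rough end-to-end sketch of what I just described. The shape of the service file and the client follows the talk; the exact config file name and field names are illustrative and may differ:

```python
# face_detection.py -- an ordinary Python function that Firefly exposes as an API
import joblib

model = joblib.load("face_detection.model")  # load the trained model once, at startup

def predict(image_url):
    """Detect faces in the image at the given URL.

    The docstring and parameters also feed the auto-generated API docs.
    """
    # any preprocessing of the image would go here
    return model.predict(image_url)
```

Locally, running `firefly face_detection.predict` serves this function on your machine. On the platform, a config file describes the services to run:

```yaml
# roro.yml -- sketch of the deploy config (file and field names illustrative)
services:
  - name: face-detection-api
    function: face_detection.predict
    size: S2
    cors_allow_origins: "*"   # enable CORS so browsers can call it
    scale: 4                  # run four instances of the service
```

And the consumer never writes any REST plumbing; the bundled client does it:

```python
import firefly

client = firefly.Client("https://face-detection-api.example.com/")  # hypothetical endpoint
faces = client.predict(image_url="https://example.com/photo.jpg")
```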
And then you actually get your endpoint, with HTTPS and everything; the API is up and running. So if you're a data scientist who wants to deploy an API, all you have to do now is write a plain Python function, which is still within your own territory, right? Write a Python function that takes these arguments and gives something back, write this small YAML-based config file specifying everything you need (and you can have multiple of these services), and then it's a push button: roro deploy. That's the command-line tool we use. The deploy sends all the code to the platform and deploys those applications.

Behind the scenes, it's part of the same platform we talked about for the notebooks. It builds the Docker image, again starting from the runtime; you can write your requirements.txt or apt.txt for installing other things. It starts all the specified services and provides the endpoints with HTTPS automatically. When we talk about deploying machine learning models, what the data scientist, the end user, wants is: "I have this function, I have a model file, I want to deploy that." A lot of the time you end up doing some kind of preprocessing, so you wrap it in a plain function, and then the data scientist just specifies which function to call as an API. He has all the levers he needs.

Also, out of the box it generates the API documentation from the Python function: it looks at the docstring of the function you're exposing and the parameters you're using, and generates the API docs automatically. So you don't have to worry about writing API docs, and you don't have to worry about writing a client library, because Firefly gives you the client library by default.

Now we're working on adding support for type checking. Python 3 has this beautiful feature where you can write function annotations: you can say this function takes a list of strings as an argument and gives one string back, et cetera, like you would in a language like Java. We're trying to integrate that into the library. We're also working on adding gRPC support: eventually you'd set the protocol in the config to gRPC, and it would start a gRPC service instead of the REST service. gRPC is known to be more efficient than typical REST APIs. That's the next extension we're planning, but the important thing is that architecting a system like this, keeping the developer experience in mind, allows you to extend it that way, and the data scientist will still be able to do these things without having to understand what goes on behind the scenes. Any questions?

(Audience question about infrastructure.) We built it on our own. Right now we build with Docker and manage starting servers ourselves; we're moving to Kubernetes now, so that it takes care of starting services and gives us the scaling ability out of the box.

So that's the second case study. Let me end with a quote: good design is invisible. What it means is that you don't really feel the presence of a good design; you only feel the absence of it.
When you architect a system with the right abstractions, it doesn't get in your way at all; you never notice that the system is even there. You won't see that it's doing all these things behind the scenes, because you don't have to worry about it. But if the abstractions are not right, the system obstructs you and your flow, and that's when you notice it.

Let me summarize. Making the data science team self-sufficient is key to their productivity. Always optimize for developer experience, and the right level of abstraction is the key to making that happen. Thank you, and I'm open to questions.

(Audience question about parallelism.) Sure. When you're deploying a machine learning model, the way the web typically scales is that you start many instances of the same process. So it scales automatically; you're not really doing any parallel computation there. There are two separate things: whether you're trying to optimize throughput or latency. Starting multiple servers will surely improve throughput. If you want to improve latency, you have to make sure the ML algorithm itself can run on multiple instances or multiple cores; the algorithm has to support that. Again, the important thing is that once you have a system like this with the right interface, you can change the implementation underneath and build something else, and the data scientists don't even have to worry about what's changing. For example, there are libraries that take machine learning models built in Python and run them on the JVM, so they start and run faster. I was speaking to someone at Uber who built Michelangelo; they said they try to get five-millisecond latency for each API call. How do you achieve that? Python is too slow for their needs, so they run everything on Java. Now, if that's your need, maybe you add a feature to the platform where, once you supply the machine learning model, it gets translated to something Java can read; the deployment process stays the same, the data scientists have the same experience, but behind the scenes it gets deployed on Java or something.

The other thing: when you have the two-language problem, the way people solve it now is with microservices, because you run the model as an API, and the consumers of the API are free to choose whatever language they want. So we're not really addressing that beyond prediction at this point. And prediction, that is, inference, is usually very fast; training is what takes time.

There are other parts of the system we're working on, trying to make the common patterns people follow available as part of the platform. One of them is an offline queue: you want to be able to submit a job, get an ID, and keep polling it (there's a sketch of that pattern below). Setting that up is actually quite involved: usually you have to set up a queue system, a number of workers, and all that. So what we're working on is having a core platform and then these kinds of plugins.
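As a sketch of that submit-and-poll pattern from the caller's side (the client methods here are hypothetical, just to show the shape of the experience we want to offer):

```python
import time

def run_offline(client, function_name, **params):
    """Submit a job to a hypothetical offline queue and poll until it finishes."""
    job = client.submit(function_name, **params)  # returns immediately with a job ID
    while True:
        status = client.get_status(job_id=job["id"])
        if status["state"] in ("done", "failed"):
            return status
        time.sleep(5)  # poll periodically instead of holding a connection open
```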
Then you say: for this project, I want an offline queue, or I want Airflow as a plugin. All you have to do is write the Airflow files and put them in an airflow directory; the platform picks them up from there, automatically configures and runs them, and the end user just says what scale they want to operate at, how many workers to run, et cetera. Everything else is part of the platform.

So in anything you're doing, what you should keep in mind is: what is the experience you want to give, and what level of abstraction do you want to provide to the end user? And then build those kinds of systems accordingly. Sure, yeah. Thanks.