Welcome to our talk about CGI-BIN, which is definitely a really new technology. It makes developing web applications really simple: all you have to do is drop a simple Perl script into a directory called cgi-bin on your web server, and that's it. It's deployed, and you can run it from whatever path it's at. So, unfortunately, our talk is not actually about CGI-BIN. It's about functions as a service, which really is a new technology for Kubernetes. This talk is about Fission.

On a more serious note, it's been more than two decades since CGI-BIN, and we've lost the simplicity it had. Today you have a long series of steps: build a container, push it to a registry, figure out your Kubernetes configuration, make sure your cluster is set up correctly, and then repeat some of these steps every time your app changes: rebuild the container, push it, figure out how the new version gets rolled out. This is not to say that this new world is terrible. We've gained a lot: homogeneous deployments using containers, and really powerful, orthogonal distributed-systems primitives with Kubernetes. But we've made it really hard to get started in this world. What we're really trying to do is get to a point where we have the power of containers and Kubernetes, but make it much simpler as well.

So hold that thought; I'm going to switch topics a little bit to resource utilization, specifically CPU and memory. We're heading toward a world where we've divided our applications into a lot of tiny pieces, and many of those pieces are rarely used, especially if they're driven by events that occur fairly rarely. Once you have enough of those services, your cluster capacity has to account for all of the services that are deployed and idle, as well as for the services that are actually loaded, which are the only ones you'd really like to pay for. Ideally, the services that are idle should be free.

So what if we could solve both of these problems? What if we could have the power of containers and Kubernetes, but really simple dev workflows? And what if our cluster capacity could be more directly proportional to our actual usage? Functions as a service is one of the really good answers to both of those questions. One point to focus on: if you want services to be free when they're idle, and you also want them to have good performance, especially latency, when they do get a bunch of requests, then you need to go from no instances to enough instances really quickly when traffic comes in.

So this brings us to Fission. Fission is a functions-as-a-service framework for Kubernetes. You, the user, write short-lived, stateless functions. You define them declaratively, and at the source level; we'll talk much more about that. They are free when idle: you only pay for the storage of those functions, and they consume CPU and memory only when they're running. And they start quickly on demand.

As a user, you have only three concepts to learn in Fission: functions, environments, and triggers. Functions, more properly, are modules, but the entry point is a function. A function runs inside an environment. An environment is where all of the language-specific parts of Fission live.
There are Node.js and Python environments, and so on. An environment is a container that loads your function dynamically. And a trigger is a mapping from an event to a function call: HTTP triggers map HTTP requests to function calls, there are message queue triggers, and so on. There's a whole bunch of environments and triggers supported: Node.js, Python, Go, and more on the environment side; synchronous HTTP, two or three message queues, Kubernetes watches, timers, and so on for triggers. There's a more complete list on the website.

So let's get a little bit into how Fission executes functions. This is one of the ways Fission executes functions, but it's the more interesting one, so we'll spend a bit more time on it. Fundamentally, the way to get fast cold starts is to have a pre-warmed pool of something, and since Fission runs on Kubernetes, it keeps a pre-warmed pool of containers running in pods. There's a Fission client, and Fission resources are stored as Kubernetes custom resources. Let's denote functions as these colorful circles. The client uploads these functions into the Kubernetes API; I'm oversimplifying a little bit. The Fission pool manager notices that these functions have been created in the Kubernetes API and figures out what environments they need. So if there's a Node.js and a Python function, it creates pools of Node and Python containers.

Let's focus on HTTP requests. An HTTP request comes into the router; let's say it's a blue request for that blue function, and it's the first time this function has run. That request is now waiting, and we need to create a running instance of this function while it waits. This is a cold start. The router makes a request to the pool manager, which takes an already-running pod from the pool, loads the function into that pod, and hands the address of that pod back to the router, which then proxies the request into the pod. As more requests come in, the same process repeats. This cold start takes on the order of 100 or so milliseconds, give or take. Those pods are cached for a while, even if there are no more requests for a few minutes, so that subsequent requests can reuse them. And if a function stays idle for several minutes, those pods are killed, and you get back the CPU and memory they were using. So the function is free again.

So let's look at how this actually works. I'm going to switch my display so I can actually see that screen. That's way too small. People in the back, can you read? Yes? No? Anyone? Yes? All right, thank you. This is a Kubernetes cluster deployed on GKE with Fission installed. We're connected to the cluster; there are the nodes, and you can see the Fission deployment. It's not super important what those pods are. We'll run a really simple hello world function, and we'll create an environment for that function. On this demo cluster it was already created, but all you have to do is specify the Node.js environment image that Fission ships with. Here we're looking at the pool that got created: all of those pods are idle. You can tune how many there are, but in this case there are three. And we create the Fission function. This function is now uploaded to Fission and stored as a Kubernetes custom resource, but it's not executing yet.
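(For reference, the commands behind this part of the demo look roughly like the following. The file name hello.js and the fission-function namespace are assumptions, and the exact flags can vary a little between Fission versions.)

```sh
# Create the Node.js environment; Fission ships a stock image for it.
fission env create --name nodejs --image fission/node-env

# The pool manager keeps a pool of warm pods for that environment;
# you can inspect them with ordinary kubectl commands.
kubectl get pods -n fission-function

# Create the function from a source file. This stores it as a custom
# resource; nothing is running yet.
fission function create --name hello --env nodejs --code hello.js
```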
No runtime resources are allocated to it just yet. Then we set up an HTTP trigger for this function, also called a route, and we actually hit that route with curl. Let's see. We ran curl three times on that hello world function. First of all, it actually worked and we got hello world. You can see that the first invocation was a little slower than the subsequent ones: the first took 290 milliseconds, and the next two took about 90 and about 100. So there's somewhere between 100 and 200 milliseconds of overhead in the cold start. And that pod is cached, and you can find it with your usual Kubernetes commands; the pod is labeled with the function name, so if you have any monitoring tools that use labels, you can keep using them.

Okay, so that's hello world. I'm going to switch back to the slides for a bit and talk about specifying an application. We saw that this demo creates your function using the command line, which is great for starting out and experimenting. But how do you do that in production? Where do you save that command line? Do you write a script and check it in? That sounds kind of horrible. Ideally, you would have some sort of declarative specification and you'd check it into source control; that's great for doing updates and having idempotent behavior. What we really want is both: command lines to get started, and declarative specifications that we can check in for ongoing maintenance. And we can do both of these things. We're doing something that we call config by example: we say, create this function, but save what you're doing in a declarative spec, so that you can then apply that spec to another cluster.

So let's switch back to the demo. All right, let's try this. Okay, hopefully people in the back can read this. Let's take a simple Python function; it's over there, and I'm going to full-screen it. What we've done here is the same function create command that you saw earlier, except that we've saved what it did into a specification. That specification is saved in a specs directory at the root of your application. There's a lot more in that YAML, but it's basically a Kubernetes custom resource. You don't ever have to write this YAML from scratch; you can just create it using these command lines. Then you can apply this spec: you can check that YAML in and apply it to the same cluster, or to any other cluster. So that function is now created and we can run that test. That actually ran.

Now, what we'd also like is a really nice and fast edit, test, debug, change-the-code cycle, especially in a developer workflow. Since we have a declarative specification, we can have the Fission client watch your file system and, every time you change a file, rebuild the function, upload it, and deploy it. This is basically to make your dev workflows really easy and fast. So let's save that. The Fission CLI says, okay, I noticed a file change, I'm going to apply the spec again. And now if we just reload this, we see it's deployed, okay? So you have a feedback loop that works on the order of milliseconds, instead of a really long time. We didn't rebuild any containers here; your code was just deployed in the cluster. Now, this is great for a dev workflow.
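(As a rough sketch, the config-by-example flow looks something like this. It assumes the fission spec subcommands as they exist in more recent releases, plus stand-in names like hello.py; flag names are approximate.)

```sh
# Initialize a specs/ directory at the root of the application.
fission spec init

# Create the function, but record it as YAML under specs/ instead of
# only applying it to the current cluster.
fission function create --name hello-py --env python --code hello.py --spec

# Apply everything under specs/ to the cluster; --watch keeps watching
# the filesystem and re-applies whenever a file changes.
fission spec apply --watch
```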
Once you're satisfied with your testing, you can check in your specifications and use them in production, and everyone on your team can have an easy development workflow using this kind of thing.

Now, this was Python, where it's fairly easy to do these things because it's an interpreted language, so we don't have to build it. But what about something like Go? With Go, we need to build the function, and we can do that too in Fission, and we can also do it declaratively. Here I have a Go hello world application. The specs have already been created the same way, and I'm going to do the same apply-and-watch thing. Oops, that one was already there, but let's do the same thing. So we save it, and now Fission needs to build it, so it just builds it. The pattern is again declarative: the source code is uploaded, a build controller watches that package, notices there's source code, and runs the Go compiler. Nothing needs to be installed on your local machine; all of the compilation happens on the cluster itself. And you can reload that, and that works too. You can keep watching this and keep making changes; it takes a few seconds for the Go builder to run.

All right. So we've talked a bit about development workflows; I'm going to switch to things you care about at runtime. Two of those are observability and autoscaling. If you were at Ben Sigelman's keynote yesterday, you saw how modern outages are kind of like murder mysteries, because there are so many interacting components. And if you think about it, functions as a service makes those murder mysteries even worse, because now you have ten times as many interacting components. So observability becomes really important when you have a lot of interacting functions. Like everyone else, we're really excited about Istio, and we've integrated Fission with Istio so that your function pods get the Istio sidecar, Envoy, and report data into Prometheus, Grafana, and so on.

Let me find that. We have a hello world function in Node.js on this cluster, and we have Istio deployed as well. You can see Istio is deployed here, and when we actually test the function... okay, pretend that didn't happen. All right, this is a hello world function, for those who can't see it at the back; here it is again. And it shows up on both the Prometheus and the OpenTracing/Jaeger dashboards that Istio creates for us. So now let's generate a little bit of load and see how that shows up in our dashboards. This is the function being loaded; ab is a simple load generator, and what we're doing here is sending it 200 requests with a concurrency of 20, so 20 requests going on simultaneously. That earlier blip was the little test request we did, so we should see this graph go up in a moment. This is the usual Prometheus dashboard that Istio creates. Meanwhile, okay, here it is. It'll eventually go to 20, I think. Oh, it actually finished; we can do some more.
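(The load-generation command is roughly the following; $FISSION_ROUTER standing for the router's address and /hello for the route are assumptions for illustration.)

```sh
# Send 200 requests, 20 at a time, at the function's route through the router.
ab -n 200 -c 20 "http://$FISSION_ROUTER/hello"
```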
All right, and similarly, we get tracing. Let's pick an arbitrary trace and look at it. This trace is pretty simple: there are just two spans. The first one up there, which you probably can't read, is the Fission router, and the lower one is the actual Fission function, which just does hello world. In this case the router took thirty-something milliseconds and the actual function took 28 milliseconds, so this might have been the cold start. This is a slightly more representative trace... not really, I'm not sure why. Anyway, this trace shows you the overhead of the router, which in this case is in the single-digit milliseconds; here it's two and a half milliseconds, which is on the high side, but at least you have some visibility into your system. And we can talk a little later about what we're trying to do about the router's overhead.

All right, so that's observability using the Istio service mesh. Now I want to talk a little bit about putting more traffic on functions. The execution method that I showed you earlier is great for really low latency: loading a function into a pod and being able to send it a request. But it's not great for systems that have high throughput and may or may not care about latency. For those systems, you need something like the Kubernetes horizontal pod autoscaler. I'm going to demo that with a video, because the actual autoscaler takes a few minutes to notice things; this is a sped-up video of the autoscaler running.

The Fission environment is created, and we allow you to specify min and max CPU and memory. These are defaults for functions created in that environment, and a function can override them. So we're creating a function, pointing it at that environment, and defining a min and max scale of one and six. And we specify that we want to use the separate new-deployment backend, which will also create a horizontal pod autoscaler. Next, we create an HTTP trigger for that function so we can generate some load and point it at it. We do a simple request on it to make sure it works; that's, again, a simple hello world function. We've artificially kept the max CPU really low so that hello world will actually autoscale; otherwise, you would be fine with a very small number of instances. You can see that Fission created a horizontal pod autoscaler for that function automatically. Again, we've set the CPU target a bit low for this demo, but you can set the target to something more realistic like 80 or 90 percent, and it uses the min and max scale that you provided at the function level. Currently, there's just one replica.

So we start our load generator with a concurrency of 250 and send it to that same URL, and what we're looking for is the CPU to go up and the pod autoscaler to catch up and create more replicas. It does so pretty quickly; again, this video is sped up about 2x. You quickly reach three replicas, and the CPU usage is still pretty high. Once the load generator finishes, the CPU usage comes way back down and the pod autoscaler notices that. It has a scale-down delay of several minutes, which is why this is a video, but five or six minutes after your load stops, the autoscaler brings your instances back down to the minimum scale that you had set on the function. So this is function autoscaling; the sketch below shows roughly what that setup looks like as commands. And there's work going on to combine both of these backends, so that you can have low latency on startup as well as high throughput when you have a lot of load.

All right, so that's autoscaling. Let's go back to our slides for a bit, and I'm going to talk a little bit about larger applications that contain interacting functions.
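(A rough sketch of the autoscaling setup from that video. Flag names are from memory and may differ between Fission versions, the resource numbers are deliberately tiny so hello world will scale, and the request count and namespace are assumptions.)

```sh
# Environment with default CPU/memory requests and limits for its functions
# (millicores and MB; functions can override these).
fission env create --name nodejs --image fission/node-env \
    --mincpu 20 --maxcpu 100 --minmemory 64 --maxmemory 128

# Function on the new-deployment executor, scaling between 1 and 6 replicas,
# with a low CPU target so that hello world actually triggers scaling.
fission function create --name hello-scaled --env nodejs --code hello.js \
    --executortype newdeploy --minscale 1 --maxscale 6 --targetcpu 50

# Route so we can send it load.
fission route create --method GET --url /hello-scaled --function hello-scaled

# Fission creates a horizontal pod autoscaler for the function automatically.
kubectl get hpa -n fission-function

# Generate load with concurrency 250 and watch replicas scale up, then back
# down a few minutes after the load stops.
ab -n 100000 -c 250 "http://$FISSION_ROUTER/hello-scaled"
```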
So there are a lot of different ways that your functions can interact. You can just do plain old HTTP requests between functions, and because of the Istio integration you'll have some measure of observability into those, some insight into how those requests are going. But it would be pretty cool if that entire interaction between functions were abstracted away in some way. So we've created a system called Fission Workflows, where you define what is, in some sense, a flowchart of functions, and the workflow engine coordinates those functions. It manages both data flow and control flow, so functions can effectively talk to each other without calling each other explicitly, using a separate workflow definition that operates on them.

So I'm going to switch back to a demo; this talk is mostly demos. Let's see if I can find it, okay. I'm going to show a really trivial workflow here. We're going to run the Unix fortune command as a function, which outputs some sort of random quote, and we have this other function, which outputs an ASCII-art cartoon whale with a speech bubble containing whatever it's sent. Now we're going to create a workflow that combines these two functions without actually having the first function call the other. We do that by defining a workflow in YAML. It's got tasks, and each task is a Fission function call. Let me make that a bit bigger. The first one is the GenerateFortune task, which calls fortune. The second one is WhaleWithFortune, which calls whalesay with the input of that first task; the input there is the data flow. And there's a requires, which makes sure that GenerateFortune runs before this task; that's the control flow. So you can define dependencies this way; there's a rough sketch of this YAML just below, before the Q&A. This is a really simple workflow with two tasks, but if you had more tasks, you'd get implicit parallelism.

Hopefully we should see a whale saying something silly once the whole workflow has run. Okay, so the first function was run, the workflow engine interpreted the YAML, and the data was passed from one function to the other. We don't have a whole lot of time to dive into how the workflow engine works, but essentially it uses a message queue to keep persistent events as the workflow executes. As each task finishes, it's recorded in the message queue, and that triggers the workflow engine again to invoke the next function. That's also how concurrency works: if you have a task that depends on a bunch of different tasks, and those don't have any dependencies between them, then all of those tasks can run in parallel, and you get that without any explicit parallelism in the workflow definition.

Okay, so I've run through my demos. A little bit about the status of the project. Fission core was actually open sourced exactly a year ago at KubeCon. It's close to beta; real soon now we should be releasing a fairly stable beta. We're going to focus on performance, security, scalability and so on, and have a 1.0 around the middle of next year. Fission Workflows is a relatively early project; it should have a beta mid to late next year. Yeah, and security, scalability, and performance are the focus of the project for the next few months at least. For more on the roadmap and everything else, check out fission.io and GitHub, and talk to us on Slack or Twitter. And I think we have a few minutes for questions. Anyone? Hey, thanks.
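(For reference, here's a rough sketch of the fortune/whalesay workflow described above. The field names follow the general shape of a Fission Workflows definition but are approximate, and the workflow environment name and the flag used to attach the YAML are assumptions.)

```sh
cat > fortunewhale.yaml <<'EOF'
apiVersion: 1
output: WhaleWithFortune
tasks:
  GenerateFortune:
    run: fortune                                   # first task: call the fortune function
  WhaleWithFortune:
    run: whalesay                                  # second task: call whalesay...
    inputs: "{ $.Tasks.GenerateFortune.Output }"   # ...with the first task's output (data flow)
    requires:
    - GenerateFortune                              # and only after it has finished (control flow)
EOF

# A workflow is created like any other function, in the workflow environment.
fission function create --name fortunewhale --env workflow --code fortunewhale.yaml
```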
Is this on? Yeah, it's on. Hi. Can you explain how imports, like requirements, are baked into a function? Yes. Let me open up an example. Essentially, the declarative build system that I showed you, the builder, is the one that can do imports and requirements gathering as well. You can provide a build script with a function, or you can use the built-in one in the environment, and whenever the function is built, it uses that; actually, let me just use my editor. The stuff I demoed is actually in a pull request, and I'm having a hard time finding it, so let me tweet out that link later to show you how it works. But essentially you can have a spec of requirements using whatever is idiomatic in the language: if it's Python, there's a requirements.txt; if it's Go, there's either a glide file or one of the various other dependency tools. You can write a script that runs those, and Fission then packages it and is in charge of transporting that package to the function when it's supposed to run.

What kind of message queue do you have? We do not deeply integrate with any message queue. The picture I showed you earlier of triggers is how we integrate with any message queue. There's a NATS message queue trigger, there's a Kafka trigger, which is super early right now, and folks from Microsoft have contributed an Azure storage queue trigger as well. So that's it for message queues. The workflow engine uses NATS Streaming, which is a durable version of NATS.

Where are you storing the build output? Yes, so Kubernetes custom resources have a size limit, obviously, so Fission actually installs what we call a storage service. It's a really thin wrapper on top of persistent volumes, or it can also be configured to use something like S3. And you fetch that when you specialize the containers? Exactly, right. The custom resource points to that object and verifies its integrity and so on. So if I were to run something like npm dependencies through the builder, you would fetch the entire node_modules down when you were doing specialization of the container? No, no; the build process is separate from the running process. Right, but if you're storing that in S3, when you specialize the container you'd have to fetch the whole node_modules. Right; there's some work going on on prefetching. That would affect the cold start times, especially if these dependencies get really large. Yeah.

Could I write a pod initializer in Fission, and if so, how would I do that? Yeah, you're talking about the new Kubernetes initializers feature. Right. Today, the best way to do it would be to create a trigger for Kubernetes initializers: you'd need a new type that says this is an initializer, and the Fission implementation of that would be a custom controller that watches it and executes a certain function whenever the API call specified in the initializer actually occurs. So this isn't implemented yet in Fission, but we want to have a way for you to do this without having to do all the work of writing the controller. We do have Kubernetes watches, so you can watch an arbitrary set of resources after they're created. That's quite different from an initializer, but if you're trying to do custom behavior on top of Kubernetes, it fits some of the use cases. So you could use Fission to create, like, a controller that watches for CRDs, for example? Yes. Okay, thank you. Yeah. Anyone? All right.
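(As an addendum to the dependencies question above, a minimal sketch of shipping a Python function with requirements. The package subcommand, flags, and file names are approximate and may differ by Fission version.)

```sh
# requirements.txt and a build script live next to the function source.
# build.sh would typically run pip install -r requirements.txt into the output
# directory the builder provides (details depend on the environment's builder).
zip -r source.zip hello.py requirements.txt build.sh

# Upload the source archive; the environment's builder runs the build command
# on the cluster and stores the resulting deployable package.
fission package create --sourcearchive source.zip --env python --buildcmd "./build.sh"

# Create the function from that package, naming the entry point.
fission function create --name hello-deps --pkg <package-name-from-previous-step> --entrypoint hello.main
```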
Thank you very much.