So, welcome everyone. We will start the talk now. This is about an enterprise-grade serverless framework built on Kubernetes. We call it pontoon, and we will go over how it has been built and for what purpose. This may be a bit of a repeat if you have attended our earlier sessions; we have already had three sessions on serverless in this room today, and one more is coming after my talk, so it may be a bit of an overdose. But very quickly to recap: serverless lets you run your backend logic without managing the servers, and it gives you near-instantaneous scalability because most things are event-driven. Of course, it puts some constraints on you. A process should have a bounded finish time, it should be ephemeral, and it should be stateless, because once the execution is done you lose everything maintained there. And as good practice, aim for minimal shared state across executions, ideally none at all. Typical use cases are scheduled tasks, dynamic and burstable workloads, and message-driven applications. If you are thinking that serverless is a new paradigm, it is actually not. It has been around for quite some time; we are just catching on to it as a trend. And with the trend comes a lot of community enthusiasm, and hence good, proper tools you can use for writing your applications. We hope this is not yet another hype cycle but something here to stay, because we as a company, in the work that I do, are benefiting from it. The major benefit is the economics: you should be able to fairly reduce your overall cloud spend, because you pay only when you have real usage of the underlying hardware. And the projections are looking pretty good.
Most workloads will move in this direction sooner or later, and hence all the big names, all the cloud vendors, are coming up with native implementations of serverless. So with this brief on our agenda: we will start with our needs, then translate them into requirements, and then I will hand over to my colleague, who will go over the pontoon architecture. For the key takeaways at the end we will also have a live demo. So what is our need here? About the two of us: you can call me K.G., my initials, and this is my colleague Magesh. We work on VMware cloud services, a multi-cloud management solution which combines capabilities for application orchestration, monitoring, cost insight, and network insight. We ship our binaries in Docker container packaging for all the goodness that comes with it, and we use Kubernetes heavily as the orchestration layer. We run both on AWS and on vSphere private cloud, which means, sorry, we can't use AWS Lambda. A few months ago we realized we needed a framework to run scheduled jobs, on-demand jobs, and some bursting applications. We evaluated a few of the popular frameworks to see if we could readily consume them, but in the end we built our own from the ground up, which is what we call pontoon. It was fairly easy to build: it is essentially a pattern on top of Kubernetes Jobs, and it took hardly two to three months of one engineer's effort to get it done. Before we proceed further, keep in mind why we built it: we built it primarily for self-consumption. We are solving for ourselves, so it may not cover all use cases, which we will highlight. And an important disclaimer: unlike Google and Red Hat, we don't get paid for solving technical problems, only business problems, so there is a cap on the engineering investment we can make.
But yes, we are pretty happy with the end result. Changing gears, let's look at the requirements we came up with. We have two sets: requirements on the serverless functions, and requirements on the serverless framework itself. For the functions we came up with these six. First, a function is a process capable of running business logic in response to an event. It should be finite, so you can preempt it on timeouts. It should, of course, be stateless: anything it has to persist should go to shared persistent storage. And as good practice, it should have a shared-nothing architecture between executions. We don't impose any language constraints beyond containers, but we know we will mostly be using JVM languages, Python, and maybe Go at some point, so the container was the default choice for us. The runtime of our functions is not on the order of milliseconds; pontoon is built to cater to runs of a few seconds to a few minutes, and for that a container was absolutely fine. And though our functions have to be idempotent, we acknowledge that accidental multiple invocations can and will happen, so we handle it in the persistence layer as well as in the application code. The second set of requirements, quickly recapping, is on the serverless framework itself. Since we are heavily built on Kubernetes, we wanted it to be a sort of facade on top of it. We should be able to persist the logs and do root-cause analysis when needed. Functions should be able to access other Kubernetes resources; this is a very, very important requirement for us, because when we were designing these functions we knew they were going to run for quite some time and would interact, using Kubernetes network constructs, with other live services. Hence they must be able to access those resources.
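The idempotency point above — deduplicating accidental repeat invocations in the persistence layer — can be sketched roughly like this. This is a minimal illustration, not pontoon's actual code: the class, the in-memory dict standing in for shared persistent storage, and the use of the event ID as a deduplication key are all assumptions.

```python
# Sketch: deduplicating repeat invocations with an idempotency key.
# The event ID doubles as the key; the dict stands in for shared
# persistent storage. Illustrative only, not pontoon's actual schema.

class IdempotentExecutor:
    def __init__(self):
        self._results = {}  # event_id -> cached result

    def run(self, event_id, fn, payload):
        # If this event was already processed, return the stored result
        # instead of executing the business logic again.
        if event_id in self._results:
            return self._results[event_id]
        result = fn(payload)
        self._results[event_id] = result
        return result

calls = []
def handler(payload):
    calls.append(payload)
    return payload * 2

ex = IdempotentExecutor()
ex.run("evt-1", handler, 21)
ex.run("evt-1", handler, 21)  # duplicate invocation, served from the store
```

In a real deployment the store would be the shared database, and the write of the result would need to be atomic with the dedup check.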
We also have a configurable way to say what the retry limit for a function should be and how many parallel invocations should run. We can't assume infinite parallelism, because it costs money after all. We limit our resources on AWS using reserved instances, and hence we know we have to limit the parallelism to a great extent. The framework should be highly available, and the execution of functions should be visible — in today's terms, observability. We should guarantee at-least-once execution of each function. Exactly-once is desired but may not always happen; there can be cases where two invocations get fired. And lastly, developer friendliness, meaning the ability to register a function with an API. Let me also call out the non-goals: we haven't seen a need for chaining or workflows of functions. If I translate these requirements into a pictorial diagram, it looks like this. We have a concurrent job store, and along with it a disk-backed queue implementation — call it a priority queue implementation. We also have a pod scheduler which keeps track of how many open jobs there are at a moment and hence how many pods should be in the system to service them. Then we need an orchestrator, the job scheduler — the earlier one was the pod scheduler; this is the job scheduler — which takes input requests through APIs and does the following important tasks: adding jobs to the queue, managing jobs when they are done, and in case of error putting them back into the queue based on some policy. Then on this side we have the pods themselves. We use a sidecar container pattern — an init container which helps the business-logic worker container do the job. This init container consumes jobs from the concurrent store in priority order and keeps updating the progress of each job.
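The job store described above — priority-ordered jobs, a claim that records which container took a job, and completion bookkeeping with a retry policy — can be sketched like this. The class, field names, and retry policy are illustrative assumptions, not pontoon's real implementation.

```python
import heapq

# Sketch of the concurrent job store: jobs queue by priority, a claim
# records the container ID that took the job, and completion records the
# exit code so the job is either finished or requeued per retry policy.

class JobStore:
    def __init__(self, retry_limit=3):
        self._queue = []   # heap of (priority, job_id); lower = first
        self._state = {}   # job_id -> {status, owner, exit_code, attempts}
        self._retry_limit = retry_limit

    def add(self, job_id, priority=10):
        self._state[job_id] = {"status": "pending", "owner": None,
                               "exit_code": None, "attempts": 0}
        heapq.heappush(self._queue, (priority, job_id))

    def claim(self, container_id):
        # Called by the init/sidecar container to take the next job and
        # record its own ID against it.
        if not self._queue:
            return None
        _, job_id = heapq.heappop(self._queue)
        st = self._state[job_id]
        st.update(status="running", owner=container_id)
        st["attempts"] += 1
        return job_id

    def finish(self, job_id, exit_code):
        st = self._state[job_id]
        st["exit_code"] = exit_code
        if exit_code == 0:
            st["status"] = "done"
        elif st["attempts"] < self._retry_limit:
            st["status"] = "pending"   # requeue for another attempt
            heapq.heappush(self._queue, (10, job_id))
        else:
            st["status"] = "failed"

store = JobStore()
store.add("job-1", priority=1)
jid = store.claim("container-abc")
store.finish(jid, exit_code=0)
```

In pontoon the equivalent state lives in the shared store so that any pod can claim work and progress survives pod restarts.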
Whenever it picks a job, it should record that this particular container ID is taking up this job, and on successful completion it should record the exit code so the job doesn't get retried — or, if it has to be retried, what the error code was. It has to maintain all of that. And then lastly, our business-logic worker: it executes the job, persists the output, and also needs interactivity with other live Kubernetes services, and of course the logs have to be on shared storage. A question we should be asking at this moment: how do we package these functions? There is a full CI/CD pipeline which is expected to produce the pod definition, right from the point where a developer says this is a function and this is the runtime packaging of the Docker container around it. And of course there is a spec for our developers on how they have to publish a particular container. Before we ventured into building pontoon, we did a comprehensive analysis of what already exists in the market. We looked at a few dimensions: native Kubernetes integration, access to Kubernetes cluster resources, what type of packaging or language bindings the technology supports, event monitoring, persistent logs, and so on. We evaluated four of them: Iron.io, Fabric8 Funktion, Fission, and OpenWhisk. If you skip all of this and just read the last lines for the cons: we thought we had a good handle with Fabric8 Funktion, but this was approximately the time when Funktion was sandboxed — Red Hat said they were no longer going to support it and recommended moving towards OpenWhisk or even Kubeless. So at that moment we decided: let's invest in our own. Now let me introduce my colleague Magesh, who will take us through the pontoon details. So after all the evaluation we decided to go ahead with pontoon. What is it? A pontoon is an airtight hollow structure that basically helps things float in water.
So what are the major characteristics of a pontoon? It's lightweight, it's built for a purpose, and it's versatile. What do we mean by lightweight? The whole pontoon infrastructure can be deployed into your cluster using a single deployment file — a kubectl apply of one deployment YAML, nothing else. It's built for the scenario where you already have a Kubernetes cluster, which was our case — ours had been productionized for more than a year — and you quickly need a solution that works. So it helps float the functions around. We did a lot of whiteboarding while designing pontoon, and we made a few major choices. The first: to support polyglot functions, we standardized on containers. We decided pontoon would not interact with the Docker daemon directly. We didn't want to handle DinD — Docker-in-Docker — or Docker on top of Docker. We did not want to handle cleaning up images after execution, or how your monitoring tools understand a container launched under some new name, or how to get inside a container dynamically when you want to debug. We said we are not going to deal with all those things; rather, we will go the Kubernetes way. Kubernetes Job and Deployment looked like they solve all those problems for us. So, Kubernetes Job. It provides two awesome abilities. The first is that you can tell how many times a pod should run to completion before the Job is considered complete — you can map your pending incoming events directly to the number of completions. At the same time, you can tell how many pods may execute in parallel at any given point. Why is that required? It's very easy to say a function can scale to a great extent, to event scale. But the real problem is what your function is actually doing.
If your function depends on a few databases to retrieve data, you are limited by the concurrency your database can hold. So it's very important to limit how many function instances execute in parallel; we cannot downplay that aspect. What a Kubernetes Job doesn't do — what it leaves to the user — is the actual message passing: if your pods run five at a time and the Job completes, say, 100 times, what are the corresponding inputs to pass to each pod? That is left to the user, and pontoon handles it for you. And if you want warm nodes to reduce startup time, you can use a Kubernetes Deployment — same construct — and use the Kubernetes scale API to control the parallel executions. Pontoon is written in Java, not Go. And we do not aim for a pure-play function-as-a-service: if you are thinking of enhanced security or multi-tenancy, those things are not in; we depend hugely on the Kubernetes cluster for all of that, and so on and so forth. So let's see how pontoon works. It has three major components. The first is the pontoon API server. This is the only stateful component in the whole architecture, and it's built using the Xenon microservice framework. So it's highly available — you can start as many replicas as you want, and the replication is handled by the Xenon framework — and it's DR-capable: it has streaming backup to S3 at the moment, so at any given point in time you can recover from S3 if your cluster is lost for some reason or other. It primarily holds metadata. How do you register a function? There's an HTTP API where you define what your function is — a Docker image — and you register it. Then, how is a function triggered, or when does it trigger? Whenever an event is thrown. An event can be registered by firing a POST API. The event goes and sits in the event queue, which is a priority queue, built on top of Xenon itself.
The priority is applicable only within one type of function: if you have three different functions, priorities are effective only within a function, not across functions. The next component is the scaler, which keeps monitoring the event queue. By looking at the event queue, it understands which pending events it has to process. It also looks at the Kubernetes API server to figure out the status of the Jobs and Deployments it has previously created. So now it has a view of what is pending and what slots are available. By looking at what is already executing, it can detect that, okay, the function is configured for a parallelism of five, whereas Jobs are already running with a concurrency of two, which means a newly created Job definition must run with a concurrency of only three. It has that basic logic, creates a Job definition dynamically, and interacts with the API server to create it. Now, when a Job is deployed, you can see it has not only the function container but also a sidecar watcher. The watcher keeps watching the function container to say: okay, you are supposed to execute for five minutes, but you are going beyond five minutes, so let me time you out. It does a mini lifecycle management of the function, and it also speaks to the API server to get the event. When pods are executing in parallel, you need someone to ensure an event is picked by only one pod at a time — basically acquire a lock — and, if your function goes crazy, to release the lock so the event can be retried. Those jobs are done by the watcher. Finally, how does your function get its input? Your function simply fires a GET on localhost:8000 — give me an input — and once it finishes execution, it says either "I am done" or "I failed" to indicate success or failure. Every other piece of logic is wrapped inside the sidecar watcher. So now we have a full end-to-end flow executing.
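The scaler's slot arithmetic described above can be sketched as follows. The helper names are made up, and the Job manifest is just a plain dict in the shape of a Kubernetes `batch/v1` Job with illustrative values — pontoon's actual Job definitions will differ in detail.

```python
# Sketch of the scaler: given pending events and the function's
# configured parallelism, compute how many more pods a new Job may
# run, then build a batch/v1-shaped Job manifest with that cap.

def open_slots(configured_parallelism, running_pods):
    # Never go negative if more pods are running than configured.
    return max(configured_parallelism - running_pods, 0)

def job_manifest(function_name, image, pending_events, slots):
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"generateName": function_name + "-"},
        "spec": {
            # One completion per pending event; pod concurrency capped
            # by the open slots (and never above the work available).
            "completions": pending_events,
            "parallelism": min(slots, pending_events),
            "template": {"spec": {
                "restartPolicy": "Never",
                "containers": [{"name": function_name, "image": image}],
            }},
        },
    }

# Function configured for 5 parallel pods, 2 already running -> 3 slots.
slots = open_slots(5, 2)
manifest = job_manifest("prime", "registry.example/prime:1.0",
                        pending_events=10, slots=slots)
```

The real scaler would then POST this manifest to the Kubernetes API server rather than keep it as a dict.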
The same thing is applicable for warm nodes, with the difference that instead of a Job we create a Deployment, and instead of a normal container we need a warm container with some extra logic to simulate the completion semantics that a Kubernetes Job provides. We again use the Kubernetes scale API to control the parallelism. Okay, so let's see what other capabilities are available in pontoon. You can define functions and group multiple functions into a single app. We have the ability to pass static configurations both at the app level and at the function level; those configurations can be received as a header or as an environment variable. You can configure retries and the number of parallel executions you want. You also have the ability to define a cron to create events automatically: your function can be invoked only via an event, and events can be created by setting up a cron saying, every five minutes, create an event of this particular type. And it integrates perfectly with OpenTracing: if an incoming event has a trace ID, pontoon retains it and passes it to the function; if it doesn't, pontoon creates a new trace ID, which is applicable in the cron kind of scenario. One of the most important reasons for us to build this from the ground up was the operational aspects. It has to integrate with your existing logging solutions. If you already have a Kubernetes cluster in production, you have your own logging solutions, your own monitoring solutions, your own PagerDuty, and so on and so forth — it has to work well with everything. Since we built everything on top of Kubernetes, and we don't dynamically create any containers but depend on Kubernetes constructs, all of that simply works. And we had a use case where these things should run on a private cloud: by deploying Kubernetes on the private cloud, this simply works.
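The trace-ID behaviour just described — keep the incoming trace ID if there is one, mint a fresh one for cron-created events — is simple enough to sketch. The event shape is an assumption for illustration; pontoon's actual event schema is not shown in the talk.

```python
import uuid

# Sketch of trace-ID propagation: an incoming event keeps its trace ID;
# events created without one (the cron scenario) get a fresh ID.

def make_event(function, payload, trace_id=None):
    return {
        "function": function,
        "payload": payload,
        "trace_id": trace_id or uuid.uuid4().hex,
    }

# Incoming event already traced upstream: ID is retained.
incoming = make_event("prime", {"range": [1, 1000]}, trace_id="abc123")
# Cron-fired event has no caller, so a new trace ID is generated.
cron_fired = make_event("prime", {"range": [1, 1000]})
```

Either way, downstream spans can attach to the same trace, which is what makes cron-triggered executions observable end to end.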
One other important aspect: when you are writing a function, how do you enable testing it? That becomes a limiting factor in a lot of frameworks. With pontoon it's very simple: you can bring up a simple HTTP mock server, and all you need is three API mocks — get the input, report success, report failure. So let's see a quick demo. This is a Java application. All it does is get a range from the input and find all the prime numbers in that range, then say "I'm done" with the output, or give the failure result if it fails. Let me quickly jump in. All we are doing is: okay, find me all the prime numbers between 1 and 1,000. So we triggered an event. Now, if you see, we have triggered an event for demo. Oops. Okay. So, yes, we can create a duplicate event. We created an event to find all the prime numbers between 1 and 1,000, fired against demo and prime. What are demo and prime? Demo is the app we created earlier, and prime is the function that's already registered — it's a Docker container we registered. This is how we registered the function, giving a Docker image of a particular version, and we created an event just by passing a payload. Now your function will have started; it reads the input as a long array and computes the prime numbers. When you trigger a function, you get an event ID which you can query against. So you can see all the events we have created, and you can also see the response body: we asked it to find the prime numbers between 1 and 1,000, it found them, and we got the response back. Going back, the most important aspect is: yes, whatever log monitoring solution you had earlier still works. You will be able to see the logs using whatever solution you already had.
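The local-testing setup mentioned above — a mock server with just three API mocks — can be sketched like this. Note the assumptions: the real sidecar listens on localhost:8000 with its own paths, while the paths (`/input`, `/done`), the payload shape, and the prime function here are simplified stand-ins for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sketch: a tiny mock of the sidecar's endpoints (get input, report
# done/fail) so a function can be exercised without a cluster.

RESULTS = {}

class MockSidecar(BaseHTTPRequestHandler):
    def do_GET(self):          # "give me an input"
        body = json.dumps({"range": [1, 20]}).encode()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):         # "I am done" / "I failed"
        length = int(self.headers["Content-Length"])
        RESULTS[self.path] = json.loads(self.rfile.read(length))
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # keep the output quiet
        pass

server = HTTPServer(("localhost", 0), MockSidecar)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://localhost:%d" % server.server_address[1]

# A function like the prime demo: fetch input, compute, report done.
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

lo, hi = json.load(urllib.request.urlopen(base + "/input"))["range"]
primes = [n for n in range(lo, hi + 1) if is_prime(n)]
req = urllib.request.Request(base + "/done", json.dumps(primes).encode(),
                             {"Content-Type": "application/json"})
urllib.request.urlopen(req)
server.shutdown()
```

The same three mocks work for any function, regardless of language, since the contract is just plain HTTP.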
If you had built a few dashboards based on logs, or on Datadog for your monitoring, all those tools work. Whatever investment you have already made in your Kubernetes cluster, you can completely leverage with pontoon. So finally, our experience: before picking an open-source framework, look at your requirements and see whether it actually fits. Since serverless is in a very early phase, there are so many projects, and a lot of them are getting better day by day. Back then, we didn't find a suitable one, so we built something of our own, and it was very easy — Kubernetes does all the heavy lifting. When using serverless, watch out for scalability issues, not with respect to the function itself, but with respect to all of its dependencies. And even though there is no server, you still have a lot of operations to do; your observability and monitoring become tougher, not simpler. But by adopting serverless, your product becomes more robust and starts transitioning towards the twelve-factor model, if you are familiar with it. What we saw is that by using pontoon we were able to save more than 1,000 bucks a day, and we expect our scale to go up around 10x by the end of this year. We are also looking at how we can enhance this to dynamically leverage AWS spot instances instead of predefined EC2 instances. Yeah, if you have any questions, we'll be happy to answer. So your question is: if we were to start today, what options would we look at? Of course, new things have come up — Kubeless, which is very exciting, and OpenWhisk is now more mature. So yes, those two options. Thank you.