Over the last 12 months or so, I have been working with at least 12 of our key customers, trying to understand how cloud is being used across different industries. Being part of the engineering team in IBM Cloud, I get to see the architecture and the implementation of both Cloud Foundry and IKS, the IBM Cloud Kubernetes Service. I'm also fortunate to be closely associated with the open source community: I'm the co-chair of the Istio Performance and Scalability workgroup, and I work closely with Cloud Foundry and Kubernetes. So I get to see three different points of view when it comes to cloud and Cloud Foundry: a customer perspective, an engineering perspective, and what's happening in the open community. Good afternoon, everyone. My name is Surya Dugiralla. I am an STSM with Watson and Cloud Platform and an Engineering Guild Leader in IBM Cloud. Today, we're going to talk about three things: how containers evolved, the current pain points that customers are seeing and what we are working on to alleviate them, and some forward-looking material on how the architectures are changing, especially Cloud Foundry, and how the three major open cloud communities, Kubernetes, Istio, and Cloud Foundry, which have been evolving in parallel, are actually going to merge and integrate in the future. We'll cover the current design first, then get into the futuristic stuff. There are many projects I'll touch on today, and in fact there are breakout sessions that go much deeper into those specific projects. You might have heard about these three open cloud community projects, Cloud Foundry, Kubernetes, and Istio, time and time again; we're going to talk about how, within Cloud Foundry, they're going to work together.
Before we get to that, I want to go down memory lane to when this whole container story started, because we always talk about containers, and containers are not new. It really started with chroot in Unix; you can call that the starting point of the whole container journey, because that's where the resource-isolation concept was born. Officially, in 2008 you have Linux containers (LXC); that's where we really started talking about containers. Then look at Cloud Foundry: when we announced it around 2011, we had the Warden containers, and from Warden we moved to supporting Garden and Docker. How did we get there? In 2013, Docker containers were born. In 2014 you have the Garden Linux containers, because that's when we were moving from DEAs to Diego in Cloud Foundry. In 2015, Kubernetes 1.0 was released and the Open Container Initiative was formed; runC came out of that, and we started supporting it in 2016 with garden-runc. In 2017, IBM, Google, and Lyft announced the Istio project. And now, in 2018, we are talking about how these three projects are going to work together. Of course, there are many engineering issues that all three communities are working on in parallel. In Cloud Foundry, there are many customers, but we are still hardening the platform based on customer input: performance, scalability, stability, and resiliency issues. There are also multiple integration projects going on in the Cloud Foundry world. Istio is very young; it started just last year, and we have gotten very good momentum for a year-old project.
And you can clearly see we need to harden the mesh, because the potential is really high; a lot of people in the community are working on hardening it, especially the performance and scalability aspects. We have also announced minimal Istio: if you don't want all the features of Istio, you can deploy a minimal footprint of it through Helm. And Kubernetes, of course, is a very nice orchestrator and scheduler platform, but we still have some issues, especially in the scheduler. Some of you might have seen that more intelligence needs to go into the default scheduler, because it cannot identify an unstable cluster and correct it into a stable one; the default scheduler is not able to adjust imbalances in the cluster. So there is a project under the scheduling track in Kubernetes called the descheduler. What we are working on there is whether we can dynamically move some of the pods to nodes that have free resources when you are bottlenecked on a specific node. Then, of course, there are the new interfaces, like CNI and CSI, the Container Storage Interface. So there is a lot going on in each of these three cloud communities to make them even better. Now let's look at Cloud Foundry for a moment. Even though Cloud Foundry is embraced and used in multiple production environments (I myself have worked with health care, banking, and consumer electronics customers), these are some of the things some of you might have seen. When it comes to routing, you may see long-tail latencies if you're not using the Gorouter's keep-alive.
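To make the descheduler point above concrete: its behavior is driven by a declarative policy file. A minimal sketch, assuming the v1alpha1 policy API and purely illustrative threshold values:

```yaml
# Illustrative descheduler policy: evict pods from overutilized nodes
# so the default scheduler can re-place them on underutilized ones.
# (Threshold values are examples, not recommendations.)
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:        # nodes below all of these count as underutilized
          cpu: 20          # percent
          memory: 20
          pods: 20
        targetThresholds:  # nodes above any of these count as overutilized
          cpu: 50
          memory: 50
          pods: 50
```

Note that the descheduler only evicts; the default scheduler then does the actual re-placement, which is how the imbalance gets corrected.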
When it comes to networking, if you have microservices, say a two-tier or three-tier application, you may experience significantly increased latencies if you are not using container-to-container networking. When it comes to buildpacks, you might have seen a sudden spike in CPU when you do a cf push. That's mainly because of how the file system works in the current buildpack mechanism: it's a flat file system. You also might have seen CPU-sharing issues with the default CPU-share algorithm; in a contended environment you may see staging failures because you don't have enough resources. And then Garden doesn't have the concept of a pod, where you can have multiple containers the way Kubernetes does, so you can't inject a sidecar, which is essential for supporting Istio. That's another drawback. And, as many of you might have seen, if you're using both Cloud Foundry and Kubernetes together, you will have multiple-scheduler issues. Then there is service mesh support. Lots and lots of people are moving to microservices, and as a developer, I don't want to deal with traffic management, security management, certificate management, and all those things. If somebody else can do that for me, I can focus my energy on my industry-specific, domain-specific problem solving rather than handling the plumbing. That's where service mesh support is really important. So I'm going to talk about what we are working on to resolve the current design pain points, and how we're going to get to the next architecture to solve some of the other problems. You can see it here: the long-tail latency. We have run a health care application, and we have seen this.
For those who may not know, long-tail latency is the gap between the average response time and the 99th percentile. When that gap is significantly large, as you can see on the left-hand side, the max latency and the 99th percentile are much, much higher than the average response time. That's mainly because of the Gorouter's inability to keep connections alive to the back end. We have that feature now; it is not on by default, but once you enable it, you can significantly reduce that long-tail latency. So that's one of the things we have worked on from the routing side. The second feature is container-to-container networking. Most of you are familiar with this: if Service A is calling Service B in a microservice application, that invocation has to go all the way out to the proxy and into the Gorouter, and then the Gorouter goes to the service. The two may be in the same Diego cell, but they still can't talk to each other directly. Once you have container-to-container networking enabled, you don't need to go all the way around; you can cut out those hops, and that's where you will see significant improvements in latency. Again, you can see some proof here. This is a pharmacy application from health care: you can see the default, without container-to-container, and then, with container-to-container enabled, almost a 2.5x reduction in latency. You can see the same thing with an online banking application, too: it drops from 329 milliseconds to 278. These are some of the features we have introduced and optimized, but they are not on by default, so if you are a cloud provider, you have to make them available. The third one is about cf push.
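To make the container-to-container feature just described concrete: the direct path is enabled per app pair with a network policy. A sketch with a recent cf CLI, where the app names and port are hypothetical:

```shell
# Allow "frontend" to reach "backend" directly over the container
# network, skipping the Gorouter hop (names and port are examples).
cf add-network-policy frontend --destination-app backend \
  --protocol tcp --port 8080

# Inspect the policies currently in effect
cf network-policies
```

Without such a policy the call still works, but it takes the longer route through the Gorouter.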
If you look at these, you can see that a Java application, a Node application, and a Ruby application each manifest the CPU spike differently. With Java, you normally see a much bigger spike, Java being a little heavier compared to Node and other programming models. The reason you see this big a spike is that your droplet size is much bigger and you have a flat file system, not a layered file system like Docker's. So each time you push an app, you need to zip and unzip the whole droplet, which are highly CPU-intensive operations that may take four or five seconds, and you will be impacted by them. How are we fixing that? Julz talked about the OCI buildpacks, and the OCI layer is what addresses this. Basically, OCI buildpacks change the file system from a flat file system to a layered file system like Docker's. As you can see here, you get a much smaller droplet, you don't need to spend as much on compression, and the CPU spikes will be significantly lower. So those are some of the existing pain points with the current design and architecture. But now, as most of you have seen, Cloud Foundry and Kubernetes are actually working together. There are multiple projects going on here; you can see at least five, led by different vendors. The first one is CFEE, the Cloud Foundry Enterprise Environment, which Tammy mentioned this morning. Then you have SAP's CF-Kubernetes integration scenarios. And then, of course, the Eirini project; I will touch on that a little. That's where you replace the Diego scheduler at the application level, not at the container level, and just use Kube instead of Diego.
And then, of course, you have other projects like Kubo. So what exactly is CFEE, the Cloud Foundry Enterprise Environment? Basically, we are dividing the whole thing into isolated segments: a control plane and a data plane. The control plane runs on a Kube cluster and has all the CF-specific components, like the Gorouter and the Cloud Controller, deployed on it. Your applications reside on the Diego cells, which form the data plane. So you have a nice isolation there, and all of it is running on a Kube cluster. The beauty of this is that, at the same time, you have access to all kinds of services available in the marketplace, whether it's blockchain or AI or IoT. You have the nice orchestration layer handled by Kube, but you also have integration with all kinds of services, and you can use the Helm concept. So it's a first step toward a nice integration between Cloud Foundry and Kubernetes. This is just a dashboard: you can see resource usage at the application level, the container level, and the pod level. So now let's talk about what kinds of problems you will see from an engineering point of view. You will get into nested-container issues: now I have Garden containers within a Docker container, so how am I going to scale? Do I have any issues here? You will get into all of that with this architecture, because the way we are doing it, we limit one Diego cell per node. The cell is basically a pod, and the node has other pods alongside it, like a Prometheus pod or a Fluentd pod for monitoring.
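The "one Diego cell per node" layout described above maps naturally onto a DaemonSet. A heavily simplified sketch, with placeholder names and images rather than the real CFEE charts:

```yaml
# Illustrative only: one Diego-cell pod per node via a DaemonSet.
# Image name and privileges are placeholders, not the actual CFEE setup.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: diego-cell
spec:
  selector:
    matchLabels:
      app: diego-cell
  template:
    metadata:
      labels:
        app: diego-cell
    spec:
      containers:
      - name: diego-cell
        image: example/diego-cell:latest
        securityContext:
          privileged: true  # Garden creates nested containers inside this pod
```

The monitoring pieces, such as Prometheus and Fluentd, then run as their own pods on the same nodes.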
But within that Diego cell pod, you will have multiple Garden containers, maybe up to 80 or 100 based on how big your node is. So with this kind of nesting, do we have any performance or scalability issues? That's exactly the first thing we looked at. We have a direct comparison with IKS: you run the same network-intensive application on the Kubernetes platform in pods, and then the same thing on CFEE, the Cloud Foundry Enterprise Environment, where those applications run in Garden containers inside the Diego pod. Performance-wise, you don't see any extra tax beyond a little bit, and you don't have additional scalability issues with the CFEE nested-container architecture. You can also see the scalability: for both the Java application and the Node application, you can clearly see how elastic those applications are, even with this nested-container architecture. Now, CFEE gives you Kubernetes integration at the lower level, because you're building your Cloud Foundry on top of a Kube cluster. But at the application level, you still use the Diego scheduler. So now you have two schedulers: the Kube scheduler at the container level, and the Diego scheduler at the application level. You may see some problems from that later on. The project called Eirini is mainly working on this; we are working on it, and Simon is going to talk about it later. The idea behind the Eirini project is that CF is pretty good for the developer experience, so we'd like to keep the best parts of CF; and Kubernetes has a large community behind it and a nice scheduler, so let's keep Kubernetes for scheduling and keep the DevOps and developer experience, with the Cloud Controller and cf push and all that, with CF.
If you combine those two things, that's where Eirini comes in. What does it look like? On the left-hand side, you can clearly see Cloud Foundry as we described it with CFEE, the Cloud Foundry Enterprise Environment: you have the IaaS, then the Kube cluster on top, then the Cloud Foundry stack. With CFEE plus Eirini, the starred pieces, the Diego scheduler and the Diego cells, go away. When you do a cf push, it is now pushed to the Kube scheduler, and instead of the Diego cells you have Eirini. So this is a direction in which we may see Cloud Foundry go in the future. For more details, there are some sessions: today at 3:50 we have a panel discussion where Simon and other folks are going to talk about this in detail, and tomorrow we have two more sessions, the CFEE one from Simon and Tammy and another one as well. So we've talked about integrating Kubernetes both at the container level and at the application level. Now let's talk about Istio. As I mentioned, Istio is the one really getting traction because, to put it bluntly, as a developer it means I don't need to care about anything except my application; everything else will be handled by Istio. So what exactly is Istio? With Istio, you get all these different qualities of service for microservices: intelligent routing and load balancing, the resiliency that you need, fleet-wide policy enforcement, all kinds of telemetry and reporting, and service-to-service authentication and security management, because it has its own CA to provide the certificates.
As an application developer, I don't need to worry about rate limiting, connection timeouts, and all that; everything is provided by the service mesh. That's exactly why Istio is getting more and more popular: everything else is taken care of, so as a developer I can just focus on my domain-specific work. Istio has two parts, a control plane and a data plane. The control plane has Pilot, Mixer, and Citadel for certificate management. The data plane has the intelligent Envoy proxies running as sidecars next to each of your services. Your service doesn't know where the other service it calls resides; the sidecar, the intelligent Envoy proxy, handles all of that for you, and Pilot controls the sidecar. Through configuration, you can adjust traffic management and everything else, like canary testing, A/B testing, and so on. There are some work items, and people are working on getting Istio and Cloud Foundry to work together. The main thing is that Cloud Foundry has a highly opinionated, simple way of handling things; that's its main strength, and we don't want to lose it. Cloud Foundry also remains optimized for developer velocity, and we want to keep these major design points while we integrate. There is also the complexity of maintenance: we don't want to lose the control that operators have, and we need the ability to control the service mesh resources. These are the major design points we are keeping in mind as we look at integrating Istio with Cloud Foundry. So where exactly are we trying to integrate Cloud Foundry and Istio? There are four points, spanning north-south and east-west traffic.
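As an aside on the configuration-driven traffic management mentioned above: a canary split is expressed as a routing rule that Pilot pushes to the sidecars. A minimal sketch on the v1alpha3 networking API, with hypothetical service and subset names:

```yaml
# Send 90% of traffic to v1 and 10% to the v2 canary.
# Host and subset names are examples; the subsets would be
# defined in a matching DestinationRule.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10
```

Shifting the weights over time gives you a gradual rollout without touching application code.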
North-south is when you go through the Gorouter; there is a plan to even replace the Gorouter with Envoy, the ingress gateway that comes with Istio. For east-west, you need the sidecar concept for app-to-app communication and all that east-west traffic; we want an integration point there as well. And when it comes to security, because Istio handles a lot of security, we need to integrate with UAA for OAuth and OIDC. Those are the four touch points where we are trying to get this integration going. But again, Cloud Foundry didn't have the concept of a pod, so you might have heard from Julz about peas, which are a kind of sidecar pattern for our Garden containers, so that you can have the Envoy intelligent proxies attached to the services. That's the work the containers tribe did recently to support this. So what are the risks of embracing Istio in Cloud Foundry? First and foremost, performance and scalability: we have hundreds if not thousands of routes, so we want to make sure Istio's Pilot can handle that and scale as we integrate. We also have to balance the desire to do enhancements, because Istio itself is a very active, dynamic open source community, so we want to make minimal enhancements on the Cloud Foundry side. And keeping current is also going to be a challenge: as I said, there are four touch points, so multiple teams within Cloud Foundry need to stay current as Istio and Cloud Foundry evolve together. Fortunately, as I said, we have taken the performance and scalability concerns into consideration, as I am co-leading that workgroup in Istio, and we have a few issues already opened.
We are working on Pilot scalability issues, apart from the 13 other items we are working on right now. Istio, as we speak, is getting optimized on multiple fronts, from telemetry to cache management, buffer management, Pilot, and the ingress gateway; everything is being worked on. There is a SWAT team that I'm leading, with a work item in progress; you can see all of it in those links. And to make Istio performance easier, we are integrating multiple dashboards that will make it easy for anybody using Istio to understand the resource requirements. As you can see here, we recently integrated this: how many vCPUs you need for 1,000 transactions per second for a specific Istio component, in this case the Istio proxy, Istio telemetry, the policy component, and so on. So we are making it easy. As I said, Istio is a very dynamic community; it's being developed, deployed, and optimized as we speak. For more details, we have some sessions: Shannon and Aaron are going to be talking at 2:30 today, and we have office hours and an Istio Birds of a Feather session as well. And if you want to see all of this put together, Watson, blockchain, Kubernetes, serverless, Cloud Foundry, everything, there is an app, CloudCoins, that you can go and download; it will give you a perspective on how all of them work together to give you an excellent user experience. With that, I will complete this session. Any questions? Thank you.