So let's get started, since we are tight on time; people will continue to filter in. So welcome everyone. My name is Mohsen Ahmed, I'm a Senior Systems Engineer at Ford Motor Company. I'm Manu Pasari, Senior Systems Engineer working along with Mohsen at Ford Motor Company. I delivered a presentation here last year as well, and this slide and its content is a repeat from last time, just to provide a quick recap for everyone. Early last year, Ford Motor Company announced its transformation plan with the creation of the Ford Smart Mobility LLC subsidiary to expand from an automotive company into both an auto and a mobility company. This strategy has allowed Ford to stay focused on strengthening and investing in its core business of manufacturing cars, trucks and utilities, and at the same time enabled us to aggressively pursue emerging opportunities through Ford Smart Mobility to transform the customer experience to a new dimension, which is core to our strategy. Next, this is Ford's high-level One IT strategy, which has a futuristic theme. IT's main areas of focus are to run and protect the business; improve IT capacity, agility and efficiency; and continuously innovate to improve the business. We embarked on the Cloud Foundry journey about two years ago with the development of a mobile application titled FordPass. The launch of FordPass was part of the Ford Smart Mobility initiative and the beginning of Ford's transformation into an auto and a mobility company. FordPass is the experience platform deployed on PCF to deliver mobility products and services to a broader base of mobility users, providing both in-vehicle and remote features and capabilities. After the launch of FordPass last year, there was a significant increase in interest from application teams in testing and consuming the microservices architecture. We continued to onboard additional applications onto the platform by expanding its footprint in the public cloud and in our enterprise data centers.
Besides expanding the platform, a lot of focus was placed on improving the governance of the platform and implementing a robust tool set to secure, automate and fine-tune the infrastructure. The main theme of today's presentation will be how we expanded the PCF platform and the efforts over the last year to improve its governance. The six main areas are architecture, security, availability, scalability, automation and maintenance. I will briefly touch on each of these areas, and we will get into their respective details in a moment. On the architecture side, two years ago we started PCF's implementation in Azure and completed the architecture for on-prem last year. We expanded the number of Cloud Foundry foundations and improved certain areas to extend the integration with internal and external services. On the security side, a lot of focus was placed on implementing a tool set. We deployed Vault and on-prem GitHub Enterprise and expanded the use of certificate-authority root-signed SSL certificates. On the availability end, we improved the backup and DR capabilities. For scalability, we enhanced logging and alerting to monitor resource consumption and to streamline the scaling of resources. On the automation end, we deployed Concourse for managing the platform. And for maintenance, as you know, it's a high-maintenance platform which requires continuous and rapid iteration through various updates. We enhanced the use of Concourse to create new pipelines for various use cases. Next, we'll get into each of these governance areas and cover them in more detail. We'll start off with a review of the architecture for Azure and on-prem, followed by the rest of the governance areas. This is the high-level current snapshot of our on-prem and Azure deployments. We initially started in Azure and then applied the lessons learned to our on-prem implementation, making some improvements along the way.
We tried to keep both implementations aligned as closely as possible. However, as you can see, there are some differences between the on-prem and Azure architectures due to the availability of features and components at the time of implementation and differences in the network and storage tiers. We implemented the ops stack model on-prem and plan to roll it out in Azure in the next few months. Now moving into the individual architectures for Azure and on-prem: this is the high-level architecture of the Azure deployment, with an active-active topology across two regions. As you can see, we are leveraging services in our enterprise data centers where it made sense, and deployed components in the public cloud where there was technical and business justification. The PCF foundations deployed across multiple regions utilize a global traffic manager and regional load balancers. Some applications are utilizing an API manager as well as their client API front end. As you can see on the slide towards the left, we are consuming on-prem infrastructure services, which include GitHub Enterprise, directory services for single sign-on, and logging via syslog with dashboards we developed in Splunk for monitoring and alerting. Towards the lower center of the screen, the diagram shows the platform integrated with Vault and Concourse, with some applications using SQL Database for persistent storage along with Event Hubs for handling event-based messaging. Next, Manu will cover the details of our on-prem architecture, and we'll delve into the commonalities and uniqueness of the PCF platform in Azure and on-prem by touching on the areas of architecture, security, scalability and availability. So, Manu. Thank you, Mohsen. So now that we've looked at all the different implementations we have done across Azure and on-prem, and how we implemented so many different foundries, I'd like to take us down to a single foundry and how we implemented it in our environment.
So this is a fairly typical implementation of a Cloud Foundry foundry in most data centers. Most of you might be thinking, why am I seeing this all over again? The key things I want to point out here are the top-level edge firewall, then the load balancing systems combined in the SLB and the HAProxy layer, and then the NAT gateway and firewall systems. The reason I touch on these points is that these were the biggest pain points for us, which we solved using different mechanisms throughout our implementation phases. What this gave us is different control points to make sure that the ingress traffic is known traffic, and we also had a control point on the egress traffic with the NAT gateway or firewall device. Another point I want to touch on is that the HAProxy we deployed in our foundries is open-source HAProxy, which provides the ability to offload SSL and to support custom URLs for applications running on the foundry. We have also implemented ACLs at the HAProxy level to make sure we are not exposing the administrative endpoints of these foundries to external networks. Oh, actually, one more thing I want to touch on: the dotted line that you see there is the RFC 1918 address space where we deployed these foundries, and that is one of the reasons you see a NAT/firewall system at the perimeter of the foundry. Now let's go one layer down, to how our infrastructure is laid out across the ops stack we've been talking about and at the foundry level. The ops stack is deployed in a management vSphere cluster that uses storage replicated across data centers.
So in case of a single data center failure, I still have my management infrastructure up and running, so that I can continue maintaining and managing the rest of the foundries running in the other data center. Now going down to the foundries themselves: as you see there, they are deployed across two different vSphere clusters, and each vSphere cluster spans two different racks. That gives me resiliency: even if a full rack goes down, my foundry is still up and running, and even if a full cluster goes down, my foundry is still up and running. In that way we introduced multiple layers of resiliency, both at the Cloud Foundry level and at the vSphere, or infrastructure, level in our implementations. Now that we've talked about the architecture and designs and how we implemented the different foundries across Azure and on-prem, I'd like to take us through the differences and commonalities between the Azure and on-prem implementations as of today. The commonality is that we've been deploying Cloud Foundry foundations across multiple regions in Azure and multiple data centers in our on-prem implementations. The differences are in the tool sets we used to deploy these foundries. In our on-prem environment, we've been using a full stack, the ops stack as we've been calling it, which is a combination of different components that we'll get to in the next few slides. What we brought in using those ops stack components is a fully automated deployment of not just the ops stack, which is the management stack, but the Cloud Foundry implementation as well. On the Azure side, we've been leveraging a jump server to deploy our foundries.
One other thing I'd like to touch on here: the ops stack is deployed using proto-BOSH, which gives us resiliency at that layer as well. proto-BOSH monitors the ops stack components, and if an ops stack component fails, it will resurrect that component. So moving on to what the ops stack is and how we deployed it: when I say fully automated stack, what did we do? The only virtual machine or component we deployed manually, from a template, is the bastion; from there, every other component we deployed in the ops stack as well as in the individual foundries is fully automated using Concourse pipelines. We also leveraged a couple of tools that are part of the bastion you see there, Genesis and Spruce, which gave us the ability to split our long deployment manifest files into smaller components and smaller property files that we can use, maintain, manage and understand, making deployment much easier. It also makes it much easier to remove the credentials and certificates out of the manifest files and put them into a secrets management tool like Vault, which we've been leveraging in our space. So now that we've looked at the ops stack, let's look at how we brought in security, which is one of the governance pillars we were working on all this year.
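To make the credentials-out-of-manifests idea concrete, here is a minimal sketch of how a Spruce-style `(( vault "path:key" ))` operator gets resolved against a secrets store. This is an illustration only: the `SECRETS` dict stands in for a real Vault server, and the secret path shown is hypothetical; in the actual setup Spruce itself performs this resolution.

```python
import re

# Hypothetical secrets store standing in for Vault; in the real setup
# Spruce resolves (( vault "path:key" )) operators against a Vault server.
SECRETS = {"secret/ccdb:password": "s3cr3t"}

VAULT_RE = re.compile(r'\(\(\s*vault\s+"([^"]+)"\s*\)\)')

def resolve_vault_refs(manifest_text, secrets):
    """Replace Spruce-style (( vault "path:key" )) operators with values
    looked up in the secrets store, so no credential lives in the YAML."""
    return VAULT_RE.sub(lambda m: secrets[m.group(1)], manifest_text)

snippet = 'ccdb_password: (( vault "secret/ccdb:password" ))'
print(resolve_vault_refs(snippet, SECRETS))  # ccdb_password: s3cr3t
```

The benefit of this pattern is that the manifest checked into GitHub contains only references, never the secret values themselves.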
So the commonalities are: we've been leveraging Vault for secrets, credential and certificate management; AD federation across all the foundries to maintain access control to Apps Manager; GitHub for version control as well as for authentication for Concourse itself; SSL certificates across the foundry using HAProxy with SNI support; and extensive logging and monitoring, with logs pushed into an external SIEM system sitting outside the foundry. Now that we've talked about the commonalities, quickly touching on the differences: from an on-prem point of view, we implemented ASGs. Even though ASGs bring in only basic security functionality today, we are expecting future releases of ASGs to bring in a much more robust implementation process as well as logging capabilities. Also, as you have seen, we have implemented perimeter firewalls for both ingress and egress in our data centers. In the Azure space, we're leveraging Azure network security groups to control ingress and outbound access into our foundries. The next slide shows, at a very high level, our automated implementation of ASGs. The way we implement ASGs in our environment is: the application team gives us the ports and protocols, or access control list, they need implemented at their space level; one of our engineers reviews them and puts them into GitHub; once that is approved, they are pushed through using Concourse pipelines; and then the rules are in place in production. We're working on another pipeline to validate what we've implemented versus what's approved, so that we can reconcile them on a regular basis.
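To illustrate the review step in that ASG workflow, here is a small sketch of the kind of sanity check an engineer or pipeline might run on a submitted rules file before applying it with `cf create-security-group`. The rule format follows Cloud Foundry's ASG JSON; the validation policy itself is an assumption, not Ford's actual process.

```python
import ipaddress
import json

def validate_asg_rules(rules):
    """Basic sanity checks on a CF application security group rules list
    (the JSON an app team submits via GitHub) before a pipeline applies
    it with `cf create-security-group`."""
    for rule in rules:
        if rule["protocol"] not in ("tcp", "udp", "icmp", "all"):
            raise ValueError("unknown protocol: %s" % rule["protocol"])
        # Destination is a CIDR, single IP, or IP range; check the first part.
        ipaddress.ip_network(rule["destination"].split("-")[0])
        if rule["protocol"] in ("tcp", "udp") and "ports" not in rule:
            raise ValueError("tcp/udp rules need ports")
    return True

submitted = json.loads('[{"protocol": "tcp", "destination": "10.1.2.0/24", "ports": "3306"}]')
print(validate_asg_rules(submitted))  # True
```

The same check can run in the reconciliation pipeline, comparing the approved rules in GitHub against what is actually bound in the foundation.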
And this leads us into one of the future projects we've been working on, which is to auto-onboard users onto the foundry: creating the orgs and then onboarding the users. If we automate the ASG implementation as well, that merges both worlds together, where teams get access to the foundry along with the necessary security policies in place, so they can start developing their code right away without engaging yet another security group to open those firewall holes. Now that we've talked about the ASG workflow at a high level, let's talk about availability. How did we bring availability into our foundry implementations? As we've been saying, we've implemented foundries across multiple data centers and multiple regions. We also have replication across the multiple foundry implementations, with extensive logging, monitoring and alerting in place that sends alerts about anything that happens to our operations team for them to act on. The differences are in the tooling we've used to back up the data, the metadata that is critical for the foundry to be resurrected. On-prem we've leveraged a tool called SHIELD, which backs up the CCDB, the blobstore, the BOSH blobstore, the BOSH DB and a few other components for us. That gives us the ability to resurrect and bring the foundry up to a state where the application teams can start deploying their applications within a short window of time. From an Azure point of view, we've been running some custom scripts which back up all those same components, and the jumpbox where those custom scripts run is itself backed up using Azure Recovery Services. So now that we've talked about availability, let's touch on how we scale and how we keep up on a regular basis.
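The auto-onboarding project mentioned above boils down to scripting a handful of cf CLI calls. As a hedged sketch, here is how a pipeline might generate those calls; the org, space and user names are purely illustrative, and the real pipeline would of course also handle ASG binding and error cases.

```python
def onboarding_commands(org, space, users):
    """Generate the cf CLI calls an auto-onboarding pipeline could run to
    create an org and space and grant developers access. Names here are
    illustrative, not actual Ford orgs."""
    cmds = [
        ["cf", "create-org", org],
        ["cf", "create-space", space, "-o", org],
    ]
    for user in users:
        # cf set-space-role USERNAME ORG SPACE ROLE
        cmds.append(["cf", "set-space-role", user, org, space, "SpaceDeveloper"])
    return cmds

for cmd in onboarding_commands("mobility", "dev", ["jdoe"]):
    print(" ".join(cmd))
```

Pairing this with the automated ASG push means a team lands in a new space that already has the network policies it asked for.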
So we generate regular weekly and monthly capacity reports, and we proactively monitor Diego cell capacity to make sure we stay ahead of our customers' consumption of Diego cells. Teams are currently working on the same capacity reporting for service instances as well; once that is in place, we're planning to roll it into production. Today we have Diego cell scale-up and scale-down automated through Concourse pipelines, so to keep ahead of the demand we are seeing, we leverage those pipelines to scale the cells up and down. Now that I've talked about a few of the governance items and a bit of the foundry architecture, I'd like to hand it over to Mohsen to talk about automation. All right, thanks Manu. So on the automation end, as Manu mentioned, we implemented Concourse pipelines along with Genesis. As you know, deploying code or updates across multiple environments can be a challenging task, especially when it comes to identifying which elements, like properties and networking, are common across environments and which should be specialized. An ideal deployment pipeline is one where you implement tested code in a sandbox, perform your integration and acceptance testing in pre-prod, and then move the code to production. We encountered significant challenges when we needed to make changes to the common elements of these different environments, and observed that without proper discipline these environments can easily drift, leading to negative consequences. To address these challenges, we introduced Genesis, Spruce and Concourse in our environment.
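The proactive capacity monitoring described above can be reduced to a simple headroom calculation. Here is a sketch of the decision logic such a weekly report might apply; the 20% headroom threshold and the per-cell memory figures are assumed for illustration, not Ford's actual policy.

```python
def cell_headroom(cells, headroom_ratio=0.2):
    """Given per-Diego-cell (total_mb, used_mb) figures, decide whether
    to trigger the scale-up pipeline. The 20% free-memory threshold is
    an assumed policy for illustration."""
    total = sum(c[0] for c in cells)
    used = sum(c[1] for c in cells)
    free_ratio = (total - used) / total
    return "scale-up" if free_ratio < headroom_ratio else "ok"

# Two 32 GB cells, nearly full: below 20% headroom, so scale up.
print(cell_headroom([(32768, 30000), (32768, 29000)]))  # scale-up
```

In practice the pipeline acts on this signal by bumping the Diego cell instance count in the manifest and redeploying, which is why having scale-up and scale-down in Concourse matters: the reaction to the report is a pipeline run, not a manual change.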
So just to level-set everyone on what Genesis is: Genesis changes how we make changes to the common elements of these different environments by breaking up the BOSH deployment manifests along three logical strata: global, site and environment. At the top, the most generic configuration is considered global. This is the general outline of the deployment; which jobs run on which instances is specified here and used everywhere. Beneath that, the site stratum defines the composition and configuration of the infrastructure. At the lowest level, environment, it provides the place to set the networking for a single deployment and to specify override properties and scaling factors. Genesis combines these different levels of configuration to produce a single BOSH manifest for each environment, using another tool called Spruce to handle overrides and references in a straightforward and predictable manner. At this point, we can leverage Concourse to consume the manifest produced by Genesis. This is the architecture of how we implemented Concourse, a pretty standard deployment: a single web front end with a backend database and a cluster of workers across all the foundations. Concourse is integrated with GitHub and Vault, and we are using GitHub as the backend OAuth provider for both Vault and Concourse. Now let's review how Genesis and Concourse play together. Our infrastructure can be considered as code: each PCF deployment uses a YAML file exceeding 5,000 lines to deploy it. By using Genesis, we are taking our 5,000-plus-line manifest and turning it into an object-oriented YAML design. We split our code into smaller, more manageable files and reduce duplication by using references instead of duplicating lines of YAML. Some of these references pull multiple certificates from Vault, further reducing the size of the manifests we need to work with.
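The global/site/environment layering amounts to a recursive merge, with more specific strata overriding more general ones. Here is a minimal sketch of that merge semantics; the property names are invented for illustration, and Spruce's real merge also supports operators like `(( grab ))` and array merging that this toy version omits.

```python
def merge(base, override):
    """Recursively merge override onto base, the way Genesis layers the
    global, site and environment YAML strata. Maps merge key by key;
    scalars in the more specific stratum win."""
    out = dict(base)
    for key, val in override.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], val)
        else:
            out[key] = val
    return out

# Illustrative strata, not real Ford manifests.
global_level = {"jobs": {"router": {"instances": 2}}, "stemcell": "ubuntu-trusty"}
site_level = {"network": "az-eastus"}
env_level = {"jobs": {"router": {"instances": 4}}}

manifest = merge(merge(global_level, site_level), env_level)
print(manifest["jobs"]["router"]["instances"])  # 4 (environment override wins)
print(manifest["stemcell"])                     # ubuntu-trusty (from global)
```

The payoff is exactly the drift problem described earlier: a change to a shared property is made once at the global or site level and is guaranteed to flow into every environment's generated manifest.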
We achieve this by taking properties that apply to all foundations and putting them at the global level. Properties that apply to foundations in the same region are stored at the site level, and properties that apply to a specific foundation are stored at the environment level. All of these are merged by Genesis at build time to create the one manifest we deploy with. Once the manifests are organized using Genesis, we can use Genesis to create Concourse deployment pipelines for us; PCF and all services are then deployed using the pipeline that Genesis creates. This is just a high-level layout of our Concourse CF pipeline. Next, this is the end-to-end Concourse workflow we already talked about: starting from an admin checking their manifest into GitHub, which triggers the pipeline in Concourse, and then the Concourse worker doing the heavy lifting of using Spruce to do the merge, pulling the secrets from Vault, and then deploying with BOSH to either Elastic Runtime or PCF services. Next, Manu is going to get into our day-two operations and how our operations team is managing the platform on a day-to-day basis. So, Manu. Thank you, Mohsen. All right, so now that we've talked about all the automation we built in and all the different architectures, how is our operations team managing this on a regular basis? We have a daily stand-up call in our environment, and every day the operations team looks for efficiencies: what are the different things we can automate today? That's the mindset we start with. So far, as you have seen, they've automated deployment, upgrades, scaling, and regular reporting to our management. We are also in the process of bringing in automation, as I mentioned briefly earlier, around onboarding a user without the ops team being engaged in the onboarding process.
That's being coded and tested in our lab environments, and pretty soon we hope to roll it into our production environments. I also briefly touched on the next line item: we have pipelines being built to reconcile the ASG rules we have documented in GitHub against what's actually running, so that we have a good understanding and a clear picture of why a particular application is working versus not working. So that's the operational activity. Coming to the maintenance activity for the foundries, we have created a true DevOps model in our environment where the engineering team and operations team work hand in hand; we basically sit in the same room. We created a very efficient and nimble environment where we iterate through and discuss which versions are needed, the compatibilities and so on, so that we can keep upgrading and fixing the systems to facilitate our application teams' onboarding process and the application code itself. There were some instances where we had to work with the application teams to help fix their code. So that's the kind of environment we work in. We extensively leverage Concourse pipelines, as we saw in the earlier slides, and we perform platform upgrades during normal business hours; that's the most important thing I'd like to stress here. Now that we've talked about the operational and maintenance activity, let's look at the challenges and lessons learned by our team over the last year to two years. Specifically from an on-prem point of view, we ran into a license limit at the LTM level while performance-testing just one application with eight microservices running within the foundry.
So what we did is we worked with our partners and brought in this custom, open-source HAProxy, which provides SNI support and custom URL support to alleviate that problem. We also have the ability to scale the HAProxies up and down, because they are BOSH-deployed and BOSH-managed. When we were deploying the latest foundries in our data centers, we also deployed a NAT device which ended up being a firewall device. You never want to do that, because now you have to work with your firewall team to document the ports and protocols of every application team being onboarded. That's one of the overheads we introduced into the process: every application team has to go through this laborious process, which we would like to eliminate in the future if at all possible. Then, in order to give a true active-active experience from an application point of view, we had to introduce a GSLB sticky configuration to keep a user's session pinned to a given Cloud Foundry for about 20 minutes, to avoid users going through an authentication loop if they were bounced back and forth across the foundries. That's one of the things we wanted to avoid. We also implemented SHIELD which, again, has a temporary shortfall, most likely: it doesn't back up Vault today, which holds all the secrets we need, and it doesn't encrypt the data it backs up today. We're hoping those will be fixed in future releases. And now I'd like to hand it over to Mohsen to talk about challenges in the Azure space. All right, thanks Manu. Some of the lessons learned and challenges in the Azure space: we found that the network address space has to be unique across all your foundations; that just makes the logging and the integration of the tool set much easier. There's also a limitation of mounting external storage in Azure of one terabyte.
But with the recent announcement from Microsoft of Managed Disks, we will be evaluating that feature in the near future. Another thing: we don't have the ability to create custom roles to delegate a granular permission set to our developers. On the Concourse pipelines, we've seen that they aren't fully portable; there's still manual work required to move a Cloud Foundry deployment from one environment to another. On the logging side, we ran into issues where, any time a developer turns on verbose logging, it has exceeded our licensing threshold, especially since we are using Splunk in the backend. So that does create an issue with logging. Now, what's next? Some of the items we plan to focus on are: drawing more alignment across our Azure and on-prem architectures and tool sets; frequent credential rotation; logging and monitoring enhancements. We'd like to evaluate some of the Azure service brokers around Event Hubs and Cosmos DB. We'd like to introduce a self-service developer portal where developers can go in and create their own orgs. And there's another initiative in the works now for the developer ecosystem, where we would like to automate not just CF but the tool set as well, including Jenkins and some of the other things. Before we open up for Q&A, I'd just like to recognize one of our team members, Tim Dickelborg. He couldn't make it to the conference; he's part of our PCF ops team and helped us put this together. And I see Chris Gullion and his team, who have been very helpful along the way, post-implementation, in automating our Cloud Foundry through Concourse and implementing that tool set. So with that, I open the floor for Q&A. [Audience question: Do you run these simultaneously, like a hybrid, across both?] No, we are not running in a hybrid mode. Those are two independent implementations, yes. On-prem and Azure, it's not a hybrid cloud.
Yeah, so the way we've implemented the load balancing is active-active both on-prem and in Azure. It's an active-active implementation in Azure across two regions, and the same thing across multiple data centers on-prem. And to add to that, the applications we're running in Azure and on-prem are scoped to those environments. Things that are mobile-ready and customer-facing, which need proximity to the customer, run in the Azure space, whereas the corporate stuff runs in our data centers. So most of our connected-vehicle applications, which need geographical proximity to the vehicles, are in Azure, and most of the marketing and sales types of applications are on-prem. Part of it is in Azure, but we are bringing back all the logging data. Especially with our deployment in China, there are some regulatory constraints that require keeping the data in those specific regions. So it's a big stack. Yeah, any other questions? All right, thank you there. Thank you for your time. Thanks everyone for coming. Thanks for attending.