All right, well, thank you for coming to our talk. As was mentioned earlier, this is about operating PCF with Concourse, or, as we like to say, how you can sleep more and worry less as an operator of the platform. First, I have to give this fire exit announcement. I was going through the slides earlier today and realized the word "concourse" is in this slide; that's referring to a public space outside a building, not the CI/CD engine. So take a minute to read through that if you haven't seen it already. A quick round of introductions: my name is Ryan. I'm a PM for a number of different projects at Pivotal, and I mostly go around telling people to automate their operations as much as possible. That's pretty much what I do. We also have Yuri, a senior cloud engineer at Scotiabank. He's been working on the new platform there, part of Scotiabank's digital transformation, which has been ongoing for a couple of years now. And also Therese, the PM of Platform Recovery, whose claim to fame is, of course, BBR, the BOSH Backup and Restore tool. All right, so we'll go ahead and start. I saw from the number of hands up earlier that most of you are familiar with Concourse, so we won't go into detail about what Concourse is; we'll just do a quick recap. Then we'll dive into Scotiabank's PCF architecture and usage: how many foundations they have, how those are laid out, what their failover scenario is, and what the workloads look like in terms of apps. We'll also talk a little bit about their Concourse deployment, and then all the operations they do with Concourse, such as backup and restore, upgrades and installs, and the kinds of SLAs they set for their users. Then we'll talk about what, in general, the before and after state was for them when using Concourse and BBR, and also briefly about the PLATO platform at Scotiabank.
So, a quick recap on Concourse first. This is a CI/CD tool that specializes in operations, particularly complex operations and systems that you have to operate, such as Cloud Foundry. We've used it for a variety of different operations: software updates; provisioning new environments; managing your networks, such as firewalls, making sure all the right ports are open and IPs are allocated, and things of that nature; monitoring, that is, ensuring that the health of your platform and of your apps continues to be good; and doing backups and restores. So there are a lot of different activities that can be automated with Concourse, and we've seen many users of the tool really grow to love it. That's been its specialty so far. Now, a quick anatomy lesson on a pipeline, in case you need a refresher. A pipeline is composed of a series of jobs, which are composed of tasks, and the pipeline essentially runs those jobs. It can also run them in parallel; they don't have to be in series. What ties these jobs together are the resources, and a resource can be almost anything. It could be a file. It could be a schedule, which essentially turns a Concourse job into a cron job, telling the job when to run and at what time. And here's an example of a pipeline that does an update of a PCF tile. How many of you are users of PCF, by the way? A quick show of hands. OK, cool. And how many of you have used BBR? OK, so we'll go over that for sure. All right, and with that, I'll hand it over to Yuri to tell the Scotiabank story.

Hello. All right, good morning, ladies and gentlemen. It's my pleasure to be here. Thanks, Ryan. As Ryan said, I'm a senior cloud engineer working for Scotiabank, and I'm going to tell you our story of using PCF and Concourse, and what we built with them. As I said, I work on the Cloud Development Platform team.
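To make that anatomy concrete, here is a minimal, illustrative pipeline in Concourse's YAML format: one job with one task, tied to a `time` resource that acts like the cron-style schedule just described. All names here are made up for illustration; this is not the tile-update pipeline on the slide.

```yaml
resources:
- name: nightly          # a "schedule" resource: produces a new version every interval
  type: time
  source: {interval: 24h}

jobs:
- name: say-hello
  plan:
  - get: nightly         # the job triggers whenever the time resource fires
    trigger: true
  - task: hello          # tasks are the units of work inside a job
    config:
      platform: linux
      image_resource:
        type: registry-image
        source: {repository: busybox}
      run:
        path: echo
        args: ["hello from Concourse"]
```

Jobs run in parallel by default; ordering only appears when one job's `get` declares `passed:` constraints on another, which is how resources tie jobs into a pipeline.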
We build a platform that is effective, reliable, easy to use, and highly available for developers to run their applications on. That helps them achieve high velocity, and it promotes innovation and agility and all those good things. The project I'm going to talk about is part of a bigger initiative that Scotiabank is undergoing as part of its digital transformation, which I'll come back to later. But for now, let's jump into our platform and its architecture. At the core of the CDP architecture is Pivotal Cloud Foundry. We currently have 10 PCF foundations running on the fabric of two cloud providers, Microsoft Azure and Google Cloud Platform. Foundations are organized in active-passive pairs, meaning that a passive foundation is a full replica of an active foundation. When developers push their code or deploy their applications, they use the pipelines, and those pipelines push the code to both foundations simultaneously. At first, though, all traffic goes to the active foundation. So the question is: why do we have the passive foundation in place? I'm pretty sure many of you know the answer: to achieve high availability. We have smart DNS configured so that if something happens to the active foundation, if it detects an issue, it changes the DNS records to point to the passive foundation, and all traffic seamlessly goes to the same app, now running in the passive PCF foundation. Let's talk about the workloads now. We currently have 50 applications running in production and 120 in non-production. Those include different kinds of apps: user-facing applications, such as components of websites, mobile apps, encryption and decryption applications and services, and apps responsible for authentication, as well as internal, non-customer-facing apps, such as support and miscellaneous services.
As you can tell, those are pretty critical apps; if something ever happened to them, there would be a noticeable customer impact. All apps are stateless. We use both cloud and on-prem databases. There's also a so-called incremental readiness process for how developers onboard or transform their apps to run on our platform. Incremental readiness means that we encourage teams to build and transform their applications with the operational stories included in the development lifecycle. So instead of just going through operational-readiness checkboxes as the delivery dates approach, we encourage them to inject this transformation gradually and build their applications in an agile way. And we are proud to serve customers in many countries: Canada, of course, but also Chile, Colombia, and Mexico, and that list is growing. All right, let's go to the next slide and talk more about what helps Scotiabank operate our PCF environments. And of course, that's the tool that was introduced earlier: Concourse. At Scotiabank, we have a separate Concourse deployed into each PCF foundation. It has its own web subsystem, its own database, and two workers per foundation. Concourse is deployed by a secondary BOSH director as a BOSH deployment. What are the benefits of this approach? That's a good question. It gives us environments that are fully segregated and isolated. Since we have two Concourse workers per foundation, they never get overloaded and are able to run without any issues. And it's easier to operate: we found that one operator is enough to oversee all Concourse instances. We also have email alerts configured for when builds fail. And it's easier to manage and upgrade, since it's a BOSH deployment. On the left part of the slide, you can see an example of what our Concourse looks like. All right, now let me invite Therese to tell you more about BOSH Backup and Restore, another great tool we use.

Thanks, Yuri.
So BOSH Backup and Restore is a framework designed to back up and restore distributed systems, which is a hard problem because you've got data all over the place. There are essentially two parts to it. There's a CLI called bbr, which sits outside the thing you're backing up and restoring: it sits on a jumpbox or, as recommended, in Concourse. It triggers backup and restore scripts that live inside the releases. Because they're packaged in the releases, they're always compatible with the release. The backup script generates a backup artifact, whatever that needs to be, and then bbr transfers that artifact back to Concourse or the jumpbox. So BBR can back up or restore any BOSH deployment or BOSH Director that implements the BBR scripts. Currently it supports PAS (ERT), cf-deployment, the BOSH Director, and PCF Redis, and a lot more products are building support for BBR. It's part of the Cloud Foundry Extensions incubator, and it's being promoted as the de facto way of backing up and restoring Cloud Foundry deployments. We have office hours this afternoon, so if you want to know more about BBR, please come along.

Thanks, Therese. So, as you can guess, at Scotiabank we built our Concourse backup pipeline based on BBR. Before that, we used a tool called CFOps. How many of you are familiar with that? CFOps, good. I see hands. Awesome. But BBR is the next-generation way of backing up PCF, and we found it to be pretty good. We built our pipeline based on Pivotal's BBR pipeline, plus a bunch of enhancements: backups can be scheduled, we added encryption of the artifacts, and since we operate in the cloud, we upload backups to cloud storage on both Azure and GCP. There were also some improvements to make it more efficient, to save space and, of course, time.
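As a sketch of what such a backup job can look like (this is illustrative, not Scotiabank's actual pipeline), here is a Concourse job that runs a BBR deployment backup. The task image, resource names, deployment name, and credential variables are all assumptions; the `bbr deployment ... backup` invocation uses the CLI's documented flags. Note that the artifact is pushed to storage by a `put` step, i.e. through a Concourse resource rather than inside the task script:

```yaml
resources:
- name: nightly
  type: time
  source: {interval: 24h}
- name: backup-bucket        # stand-in for an Azure/GCS blobstore resource
  type: s3
  source:
    bucket: pcf-backups
    regexp: ert-backup/(.*).tar
    access_key_id: ((storage_key))
    secret_access_key: ((storage_secret))

jobs:
- name: bbr-backup-ert
  plan:
  - get: nightly
    trigger: true
  - task: backup
    config:
      platform: linux
      image_resource:
        type: registry-image
        source: {repository: my-registry/bbr-task-image}  # hypothetical image containing the bbr CLI
      outputs:
      - name: ert-backup
      params:
        BOSH_ENVIRONMENT: ((bosh_target))
        BOSH_CLIENT: ((bosh_client))
        BOSH_CLIENT_SECRET: ((bosh_client_secret))
      run:
        path: bash
        args:
        - -c
        - |
          cd ert-backup
          bbr deployment \
            --target "$BOSH_ENVIRONMENT" \
            --username "$BOSH_CLIENT" \
            --password "$BOSH_CLIENT_SECRET" \
            --deployment cf \
            backup
          tar -cf ert-backup.tar ./*
  - put: backup-bucket       # upload via a resource, not in the task code
    params: {file: ert-backup/ert-backup.tar}
```

Encryption, second-cloud upload, and retention would be additional steps or resources layered on the same shape.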
Backups are taken daily, or rather nightly, in our environment, and that happens for all PCF foundations: labs, non-production, and production. As I said, we store our artifacts in Azure and GCP cloud storage for seven years. Now let's have a look at Scotiabank's BBR pipeline. You can see three jobs here. The one on top backs up the elastic runtime; the next backs up the primary BOSH Director; and the third exports the Ops Manager installation. They are all triggered by a Concourse time resource, and each one can be triggered at a different time. All of the artifacts are uploaded to cloud storage, again through a Concourse resource, not directly in the task code. That gives you more flexibility in how you build your pipeline: the interaction with cloud storage goes through a Concourse resource. All right, now let's dive into a little bit of theory. How many of you are familiar with what an SLA is? Cool, almost everybody. What about SLO? SLI? Fewer hands, OK. You can see them on the slide. They're all related, and they're about setting expectations with your customers. A service level agreement is like a contract with your customers that defines the level of service you provide. An SLO is the target value, or range of values, for the service level, as measured by an SLI, a service level indicator. And an SLI is a quantitative measure of some aspect of the level of service: for example, uptime, latency, availability, durability, et cetera. There are two other definitions: recovery time objective (RTO) and recovery point objective (RPO). So let's talk now about what Scotiabank's platform offers as part of its service level agreement. We have four SLOs for four SLIs. The first one is platform uptime, platform availability, and we guarantee 99.95% uptime ("three nines five") for platform availability. However, the platform has had five nines of uptime year to date.
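For a rough sense of what those availability numbers mean in practice, here's a quick back-of-the-envelope calculation (mine, not from the talk) of how much downtime each level permits per year:

```python
# Downtime budget allowed per year at a given availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of downtime per year permitted at the given availability."""
    return (1 - availability) * MINUTES_PER_YEAR

# "Three nines five" (the stated SLO) vs. five nines (the year-to-date actual):
print(round(downtime_minutes_per_year(0.9995)))   # 263 minutes, about 4.4 hours a year
print(round(downtime_minutes_per_year(0.99999)))  # 5 minutes a year
```

So running at five nines against a 99.95% SLO means using roughly 2% of the contractual downtime budget.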
And of course, we use health checks to monitor uptime. Another metric in our service level agreement is push availability, and we set the SLO for that at 99%. That is basically the portion of time we guarantee that developers can push their applications to our platform. It's fair to say that after we implemented BBR in place of CFOps, our cf push unavailability decreased 16 times, from four hours to 15 minutes every night, which is pretty much negligible. So that was a great success story for us, and the benefit we found from using the BBR tool. The two other metrics are recovery time objective, where we offer four hours, and recovery point objective, up to five minutes. We also use Concourse for PCF installs. As mentioned here on the slide, PCF deployment used to be a highly manual process. It involved lots of toil, doing repetitive tasks while following the official procedures. But we made it better and implemented Concourse pipelines to install PCF foundations, based on the pcf-pipelines provided on PivNet. We use different pipelines to install different parts of PCF: Ops Manager, the elastic runtime, tiles, IPsec and other BOSH add-ons, and different BOSH releases. Same with upgrades. Here is the table showing the different PCF upgrades we undertook. From 1.8 to 1.9, the upgrade process was highly manual. Again, it involved lots of toil, following all the procedures, and it cost a lot of engineering time; the total time for all foundations was four weeks. It was painful, and it led us to think we needed something better, so we decided to automate. From 1.10 to 1.11, the PCF upgrade was built on multiple Concourse pipelines, and the total time for that upgrade was about two weeks. Much better, huh?
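The "16 times" figure checks out arithmetically; a trivial verification using the numbers from the talk:

```python
# Nightly "cf push" unavailability during backups (figures from the talk).
cfops_outage_min = 4 * 60   # CFOps: about 4 hours per night
bbr_outage_min = 15         # BBR: about 15 minutes per night

print(cfops_outage_min / bbr_outage_min)  # 16.0 -- the "16 times" improvement

# Push-availability time regained over a year of nightly backups:
saved_hours_per_year = (cfops_outage_min - bbr_outage_min) * 365 / 60
print(round(saved_hours_per_year))  # roughly 1369 hours a year
```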
And the next upgrade we did, from 1.11 to 1.12, was pretty much a one-click Concourse pipeline, which you can see at the top over there. You click it once, and you go do your job while Concourse runs the upgrade. The total time to upgrade all the foundations we have was less than one week; that's less than 24 hours to upgrade one PCF foundation with all the applications running. And not only was it painless and required almost no human interaction, there was also no downtime for customers. Yay. Thank you, thank you. So how's life with Concourse and BBR for us? We found it to be good. With this automation in place, it definitely became better. Automation helped us eliminate so-called toil, allowing our engineers to spend their precious time and talent building things and solving engineering challenges, rather than digging into routine procedures. And this magic unicorn, "agile ready," is how our engineers look and feel after implementing automation. So the cloud platform, the CDP (Cloud Development Platform) I've been talking about, is actually part of a bigger vision we have at Scotiabank: the vision of digital transformation and a bigger platform that we call PLATO, the Scotiabank PLATO platform. So what is PLATO? We can describe PLATO as people plus processes and technology. To give you more detail, it's an integrated set of technical products that encapsulate critical enterprise capabilities, plus automated processes to enable standardization, reuse, and automation. It has reusable transaction and data APIs, it automates security and compliance on one common cloud platform, and it enables application developers to build awesome customer experiences without needing to understand the details of the core banking platforms underneath.
In simple words, we provide technology that developers trust, so they can focus on solving their business problems and driving better, faster outcomes. And if you're excited about the cloud, about running at scale, about solving interesting engineering challenges, about working on an agile team with talented people in digitally transformed banking, and you want to be part of that journey, I have good news for you: we are hiring. Thank you for your time.

All right, awesome. Questions? I'd have to look at the schedule. Just a second. OK, another question. Any other questions? Yeah. Say it again, please? Yeah, I just wanted to ask: you probably mentioned it, but how many foundations are you updating when you say that you're now able to do it in a week? How many installations are you updating? We did that for six foundations, six PCF foundations. Office hours are at 3:55 PM today, to be very specific. Yes, you? More questions? As you were going through the process of building your platform team, did you engage with a platform dojo through Pivotal? Yes, Pivotal helped us at some point in understanding PCF and building the platform. In your previous slide, you showed the 1.11 to 1.12 upgrade as one click. This one? Not this one. Yes, yes, we will talk later. OK. No, in the previous slide, the major upgrade, 1.11 to 1.12, you showed as one click. Is it truly one click, or is there a semi-manual process involved? It's one click. You start it there, and by the end of this pipeline you get a pretty much fully upgraded foundation. Does it include majors and minors, both of them? So 1.11 to 1.12 is minor, right? Yes, it includes them. I'm talking about the major version here. OK, OK. Anyone else? Oh, yeah, I'm coming. I had a question about the size of the team that you had to do this transformation. How many people did you have working on this specifically?
Yeah, so currently we have a team of five people as part of the Cloud Development Platform. But we are hiring. Anyone else? Oh, no, one more question. I saw in one of your earlier slides that you hold the backups for seven years, and I was curious whether that's just regulatory, or if you imagine using a six-year-old backup for something. No, it's because we are a highly regulated industry and have certain compliance requirements that we've got to follow; one of them is to keep backups for seven years. We have one or two more minutes, so we can keep going. Do you get to test your restores into a non-prod environment at any point? Say it again, please? Do you get to test your backup by restoring into a non-production environment at any point? We did that in labs, and fortunately we haven't had to do it in non-production or production, because we have highly available infrastructure, so it hasn't been required. OK, last question, I think. How often do you test your passive side? Do you fail over often? It happens once in a while. It mostly happens when we decide to test it; I can't recall it happening because of an incident in the recent past. Do we have time for any more questions? Yeah? Yeah, we do. You said you're just using stateless applications in the cloud. Is there any discussion about stateful? Maybe moving databases? Is that something you're considering? Right now we are focused on stateless, but we might end up doing stateful as well. Yes, of course, we do have stateful apps. So when you said before that the upgrade from 1.11 to 1.12 was a one-click upgrade and it took less than a week to do, how many foundations was that? Six foundations. Is this only production, or did you include your entire lifecycle, from dev? All foundations: labs, non-prod, and prod. OK, last chance. One more.
During the upgrade, do you first upgrade the passive foundation and then shift traffic to it, or do you upgrade only the active one? We started by upgrading the passive foundation and then upgraded the active foundation, but a failover didn't have to happen. While the upgrade was running, there was no downtime and no need to switch to the passive foundation. OK, thank you. OK, let's call it. Thank you so much, everybody. Thank you so much for your time. Hope to see you again. Thank you.