All right, I think this is working, right? Cool. I encourage everybody to come to the front, because this is a panel discussion, believe it or not. Of course, nobody's gonna move. Thanks for coming. My name is Jason DeLorm, I work at Microsoft, and my job is to manage Microsoft's relationships with strategic cloud partners, Pivotal prominent among them. We recently worked very closely with Ford and Pivotal to deploy FordPass on Azure. The idea behind this talk is to share some of the lessons learned, let you meet some of the engineers who were involved in the project, and open it up to questions. I've got a couple of prepared questions from the experience, but I'd love to hear questions from the audience, and we'll tackle them directly. So with that, I'm gonna introduce John Sermon, or let him introduce himself. So yeah, my name's John Sermon. I'm a Senior Program Manager on the Windows Azure CAT team. The CAT team, as we're called, is the customer-facing part of engineering. We do the largest, most complicated Azure implementations worldwide. They send us on-site, we write a lot of white papers, we talk at conferences, and we also handle executive escalations. What that usually means is somebody's really, really mad, their CEO has called our CEO, and then they send us on-site. Just for example. Yeah, just an example. Just an example, so no pressure. Hypothetically speaking. It actually worked out, because I was an escalation engineer before joining the CAT team, so I absolutely love escalations where everybody's screaming and mad at everybody. When I got on-site with Ford, I was kind of disappointed, because things actually weren't bad. Nobody was screaming at me; it was a pleasant environment. With that said, I'll let Hayden introduce himself. Great, thanks. My name's Hayden Ryan.
For those of you who didn't meet me yesterday at my talk, I'm an advisory solutions architect with Pivotal. I'm part of what we refer to internally as Dino's team, Dino being our boss. We go on-site and work with large corporate customers to run dojo operations: we pair with them, build up their platforms, teach them day-two operations and how to manage their Cloud Foundry environments, and mix in a little professional services as well. I've been at Ford since October last year, I've been working for Pivotal for a couple of years, and this is my second Summit. Awesome, thanks Hayden. So tell us a little about what we actually built together with Ford. Give us a quick overview of the product. What is FordPass? FordPass was the first application that Ford wanted to deploy to Cloud Foundry on Azure. It is a mobile application with a back end written in Spring. Ford used Pivotal Labs as well as their internal developers, ICOM Mobile, and a couple of other development houses for the application development, and then they engaged our team, the PCF Solutions team, to come out and help them build the platform. What we originally built was a deployment of Pivotal Cloud Foundry 1.5. At the time, 1.6 was already out (this was back in October), but we hadn't done a lot of testing of it on Azure, whereas with 1.5 we'd done a couple of POCs and were very confident it would work well. So we stuck with what we knew would work and deployed that. That was the first foundation that went up. We then worked with Ford's application teams and gathered a set of requirements, one of them being that they now wanted this to be active-active, in the US East and US West regions to start with.
That was because another component of the application was built on Microsoft's PaaS offering, and they wanted to keep latency down for communication between PCF and the vehicle SDN. Yeah, so this is the architecture slide that you shared in your other presentation. I put it up a little late, and I actually had this other slide up for a second. This is "bring your toothbrushes," right? Can you tell us a little about the toothbrushes? Let's go back to the architecture first. In terms of active-active, one issue you have to be cognizant of is how to get your data to replicate. In this deployment we used GemFire. GemFire has a feature called WAN replication that lets you connect over a VPN and sync what's in GemFire across the regions. That's what we use to keep state and data consistent between the two deployments. But going back to the toothbrushes picture: this is a bunch of us, mostly Ford guys, on the day of the launch. There was a lot of work leading up to the launch and a lot of iteration, which I mentioned a little yesterday, in terms of deployments and additional Azure features we wanted to use. Leading into the launch we were all really nervous, because this was a big launch for Microsoft, for Pivotal, and above all for Ford. After one of the meetings, Marcy Klevorn, the CIO of Ford, said, well, you guys better bring your toothbrushes. So we have this photo of all of us, toothbrushes in hand, ready to jump on any issues that came up. And unfortunately, well, fortunately, we didn't have a single issue, and we still haven't. We've had zero downtime on this platform since launch, which is pretty awesome.
There have been a couple of references so far at the conference to unicorns, rainbows, and butterflies. So was it all butterflies and rainbows? Were there any lessons learned, John? Actually, what was the biggest lesson learned? Yeah, there was one. I always look for broken stuff because I love fixing stuff, and unfortunately there wasn't a lot, but there was one thing. I like to use analogies when I'm explaining things. If you're familiar with TCP, you create a socket connection in a client-server environment. Connections are a lot like relationships. You can have short relationships, long relationships, and relationships that are abruptly ended, sometimes by third parties. So let me explain what was happening here. Hayden, you're single, right? Yeah. So you're a single guy. Say you start dating somebody. You're talking to this girl, let's call her Miss Azure. She's real pretty. She's an evil mistress now. Yeah, you're talking to her quite a bit and your conversations are pretty frequent. Then, say, for four days you don't contact her. She's sitting there looking at her connections, and she's like, I haven't talked to that guy in four days, I'm gonna delete him from my contact list, because I ain't got time for that. On the fifth day Hayden's like, hey, I'm gonna contact that girl I was talking to. When he does, his number shows up as unknown. She acknowledges that she got a call from this guy, but she doesn't know who he is anymore, so she basically does an ACK-reset: she acknowledges the packet and resets the connection.
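To make the analogy concrete, here is a toy Python model of the behavior John describes: a load balancer that keeps a flow table, silently forgets flows that have been idle longer than its timeout, and answers a packet on a forgotten flow with a reset. It is a sketch under stated assumptions, not how Azure's load balancer is actually implemented; the 240-second default comes from the four minutes mentioned in the talk, and the class, method, and address values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IdleTimeoutLoadBalancer:
    """Toy stateful load balancer (hypothetical, for illustration only).

    Assumptions: flows are keyed by an arbitrary tuple, an idle flow is evicted
    silently (no reset is sent on eviction), and a packet on an evicted flow is
    answered with a reset, as described in the talk.
    """

    idle_timeout_s: float = 240.0  # four minutes, the default mentioned in the talk
    _last_seen: dict = field(default_factory=dict, init=False)

    def open(self, flow, now):
        # A new connection creates a flow-table entry.
        self._last_seen[flow] = now

    def _expire(self, now):
        # Silently forget every flow that has been idle longer than the timeout.
        stale = [f for f, t in self._last_seen.items() if now - t > self.idle_timeout_s]
        for f in stale:
            del self._last_seen[f]

    def packet(self, flow, now):
        """Return "FORWARD" for a flow still in the table, "RST" otherwise."""
        self._expire(now)
        if flow not in self._last_seen:
            return "RST"
        self._last_seen[flow] = now  # any traffic, including a keepalive, refreshes the entry
        return "FORWARD"


if __name__ == "__main__":
    flow = ("10.0.0.5", 50432)  # hypothetical client address and port

    lb = IdleTimeoutLoadBalancer()
    lb.open(flow, now=0)
    print(lb.packet(flow, now=100))  # FORWARD: idle for 100 s
    print(lb.packet(flow, now=400))  # RST: idle for 300 s, entry already gone

    # Same flow, but the client sends a keepalive every 60 s.
    lb = IdleTimeoutLoadBalancer()
    lb.open(flow, now=0)
    for t in range(60, 420, 60):
        lb.packet(flow, now=t)
    print(lb.packet(flow, now=420))  # FORWARD
```

The second scenario is the whole point of the fix discussed next: as long as some packet crosses the load balancer before the idle timer runs out, the flow entry never disappears.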
In Azure, what was going on is that we had an idle timeout, I think the default was four minutes, after which the load balancer destroyed the connection. And that's standards-based: we don't proactively send a reset to the client. So we kill the connection and the client doesn't know about it. He's still trying to talk to the server through the network load balancer, and she's like, I don't know who you are, you're nothing to me, so she resets the connection. The client gets the reset, goes "what is this?", and throws an unexpected error. The lesson learned here is that we had to lower the TCP keepalive time on the client to be consistent with the idle timeout on the Azure load balancer. Once we did that, all the intermittent problems we were noticing went away. It almost became a joke: if we ran into an issue, it was, are TCP keepalives enabled? Did we lower the keepalives? And if we had, then we were like, well, I don't know what it is. Ultimately that was the most effective change, and once we made it, we were heroes at Ford. Well, our lives got a lot easier at that point. Yes, it definitely did. It should be added as well that the TCP packet that gets sent back isn't required by the RFC; it's optional, so it's completely up to the vendors whether they implement it. So the resolution is kind of posted here, right? Hayden, can you describe the change that was implemented? Yeah, sure. The Pivotal engineering team developed a BOSH release that was applied to every single virtual machine using the BOSH update runtime-config command. All it did was set the kernel TCP keepalive on each VM. And for the runners and cells, that setting flows up into the containers themselves as well.
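The fix boils down to making the first keepalive probe fire before the load balancer's idle timeout expires. On Linux the kernel default for net.ipv4.tcp_keepalive_time is 7200 seconds (two hours), far longer than a four-minute idle timeout. The Python sketch below shows both halves of the change: a hypothetical helper that builds the kernel sysctl values a node-wide change might set, and the per-socket options a client can set for itself. The specific values (120, 30, 5) and the helper names are illustrative assumptions, not what the Pivotal BOSH release actually configured.

```python
import socket

# Idle timeout of the load balancer in front of the VMs; four minutes per the talk.
LB_IDLE_TIMEOUT_S = 240


def keepalive_sysctls(keepalive_time_s, interval_s=30, probes=5):
    """Hypothetical helper: kernel TCP keepalive settings for a Linux VM.

    The kernel sends the first probe after keepalive_time_s of idleness, and an
    acknowledged probe restarts that idle timer, so keepalive_time_s must be
    shorter than the load balancer's idle timeout.
    """
    if keepalive_time_s >= LB_IDLE_TIMEOUT_S:
        raise ValueError("keepalive time must be shorter than the LB idle timeout")
    return {
        "net.ipv4.tcp_keepalive_time": keepalive_time_s,
        "net.ipv4.tcp_keepalive_intvl": interval_s,
        "net.ipv4.tcp_keepalive_probes": probes,
    }


def enable_socket_keepalive(sock, idle_s=120, interval_s=30, probes=5):
    """Per-socket equivalent for a client that sets its own keepalive."""
    if idle_s >= LB_IDLE_TIMEOUT_S:
        raise ValueError("idle time must be shorter than the LB idle timeout")
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_KEEP* constants are platform-dependent, so only set the ones
    # this Python build exposes.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)


if __name__ == "__main__":
    # Lines one might write to a sysctl.conf fragment.
    for key, value in keepalive_sysctls(120).items():
        print(f"{key} = {value}")

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_socket_keepalive(s)
        enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
        print("SO_KEEPALIVE enabled:", enabled)
```

Setting the value at the kernel level, as the BOSH release did, has the advantage that every process on the VM (and, per Hayden, the containers on runners and cells) inherits it without each client library having to opt in.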
So all the networking issues we were seeing were resolved straight away once we enabled that. It was interesting digging into it, because it manifested in different ways. We had Spring Cloud Services, which is an application pushed to Cloud Foundry, and we found that after a while it couldn't communicate with RabbitMQ. That was because the connection pool was filling up and not clearing out. Once we enabled the TCP keepalive, that worked correctly and the issue was resolved. Yeah, and I think it manifested in other ways too, in the sense that the environment felt unstable, right? For a while there was a broad "unstable" label across it, until we really got to the root cause. Yeah, and that was about the time the CAT team got called out, which... Yeah. I'm from Australia, and in Australia, CAT team stands for the Psychological Assessment Team. It's something the hospital sends out when they think you might need to be admitted. So when I was told that the Microsoft Azure CAT team was coming out to assist at Ford, I was like, what's that saying about me? Is it okay? We're kind of different. We're like a dysfunctional family; I look a little different than most guys at Microsoft, maybe, and yeah, we're strange, and we work weird hours, but we're able to fix stuff. Whenever we ran into an issue, it was Hayden and me and Ford, and Hayden and I were both like, oh crap, is it Azure or is it Pivotal? So we worked as a team, which was really nice. I've been at Microsoft about 15 years, I was an escalation engineer, and I've worked with tons of partners.
This was a breath of fresh air because we were on the same team. Whenever there was an issue, I was able to pull in our CSS team, which is our support team. We have CSS escalation engineers who work with the Pivotal escalation engineers; they know the product, they know each other, and they work together. The one time we actually did have to create a support request, there was a massive amount of technical horsepower on the phone. It was really impressive. I was like, yeah, these are my guys. And then your guys got on the phone, and I was like, yeah, this is pretty cool. I think it's worth mentioning that Microsoft's a big company with lots of different teams, and there are lots of thanks to go around the table here. John, can you describe a couple of the teams that were involved? You mentioned CSS, and we've got Azure CAT. I live on a team called DX, and I'm responsible for managing the partnership with Pivotal. We've also got a customer team at Ford: they have a cloud solution architect and a technical account manager for support. Yeah, and our team is part of engineering, so we're kind of unique. When we come in, we're not as sensitive to different teams. A lot of times I'll stomp all over sales and marketing; I'll say things and they're horrified that I've said them. They usually hate it when I start talking. They're like, don't tell them that. But we keep it real. And we've had to interface with all these different teams: sales, marketing, DX. You guys are great when you come in and help the customer get up to speed, and you've got breadth across customers; we're deep technical. Then there's our engineering team: Kim Donna's team, she did a talk yesterday, and Ning and her team from Shanghai, who are amazing. I don't think Ning and her team actually slept for about two weeks.
I don't know, she was always on IM, and whenever I had a question on the CPI and Abel, they would fix it. I'm like, they're robots. So we interfaced with them, and with the CSS team, which is where I came from, and we were all on the same page. I really enjoyed it. It was exhausting: I lived in Detroit for about three weeks, and I'd packed clothes for three days. My boss asked, you need anything? I was like, I need some clean clothes. But I stayed for three weeks just to make sure everything was fine, and it was. So those are the main teams, plus Pivotal engineering and Hayden's team. And Ford was deeply involved too. Yeah, Ford. Let me take a second on Ford's role in this. The Ford guys were great. We've got, yeah, Shaji, I don't know if he's here. There he is. He was amazing to work with. Most of it as well. Yeah, motion. Yeah, they were awesome to work with. Sometimes I'll get on-site with customers and they're not as deep technically as you'd like, but these guys were solid. They knew their stuff, and it was really nice working with them. Switching topics: the mics are open, so if you want to ask questions, come on up. I've got a couple more here. Hayden, Pivotal talks about certain levels of availability in the way you deploy Cloud Foundry. Can you describe some of the subtle differences between Azure as an IaaS and the other public cloud IaaSes you work with, in terms of how Pivotal Cloud Foundry runs in the context of availability? Yep, sure. In PCF we usually talk about four levels of high availability. They range from BOSH recreating virtual machines using the BOSH Resurrector, to Monit, which runs on each of the VMs, restarting processes. Then Cloud Foundry itself, as you'll be aware, automatically restarts application instances if they go down.
And then we have high availability at the availability zone level. Currently Azure doesn't support availability zones, but it does have something called availability sets, which is quite distinct from other public IaaSes. I understand that availability zones will be coming to Azure very, very shortly, so there's gonna be a fifth level of high availability there. Yeah, we have managed to... did you mention that? Well, not in the scope of it to say, but yeah. Yep. What are some of the other challenges you faced? We talked a little about the way storage works in Azure relative to other IaaSes. What did you learn there? Yeah, Azure provides a lot more transparency into how the storage actually works than other IaaSes, so it really gives the user the ability to control where they put their data. All the virtual machines we have in Azure are backed by a VHD; BOSH attaches persistent disks for data, and there's also an ephemeral disk, which runs on the local hard drive of the compute host. Because Azure exposes a lot of those details, you need to be cognizant of what you're doing and where you're putting your data. You can get into situations, and I mentioned this briefly in yesterday's talk, where you need to make sure you're managing your storage accounts correctly. A storage account is a fault-tolerant cluster of physical hardware located in what's called a stamp. A stamp is a very, very large set of storage racks located in a data center. So as a best practice, if you want to be super paranoid, you absolutely should be splitting your deployments across multiple storage accounts, and potentially even your jobs, until Managed Disks comes along, which will resolve this.
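As a rough illustration of "splitting your jobs" across storage accounts, here is a hypothetical Python placement helper that spreads each BOSH job's instances round-robin over a list of storage accounts, so that losing one account does not take out every instance of the same job. The function, job names, and account names are assumptions for illustration only; in a real deployment this placement is expressed through the Azure CPI's own configuration, which is not shown here, and Managed Disks removes the need for this bookkeeping.

```python
def assign_storage_accounts(instances, accounts):
    """Hypothetical placement helper: map each (job, index) to a storage account.

    Each job walks the account list round-robin from its own starting offset.
    Assuming instance indexes run 0, 1, 2, ..., two instances of the same job
    share an account only when the job has more instances than there are accounts.
    """
    if not accounts:
        raise ValueError("need at least one storage account")
    offsets = {}
    placement = {}
    for job, index in instances:
        # The first time a job is seen, give it the next starting offset so the
        # first instance of every job does not pile into accounts[0].
        offset = offsets.setdefault(job, len(offsets) % len(accounts))
        placement[(job, index)] = accounts[(offset + index) % len(accounts)]
    return placement


if __name__ == "__main__":
    accounts = ["cfdisks1", "cfdisks2", "cfdisks3"]  # hypothetical account names
    vms = [("router", 0), ("router", 1),
           ("runner", 0), ("runner", 1), ("runner", 2)]
    for vm, account in assign_storage_accounts(vms, accounts).items():
        print(vm, "->", account)
```

Here the two routers land in different accounts, and the three runners each get their own, which is the property Hayden is after when he recommends multiple storage accounts.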
Yeah, the takeaway is that you can never have too many storage accounts, sometimes. They give you a lot more granularity in managing things. When Managed Disks comes out, that's gonna make things a lot different. But I think that's one of the key differences from the other clouds you might be familiar with. Now, are there any questions from the audience? If you don't mind coming to the microphone, we'll get it on the recording. Is there any shared storage support, is the question? Uh-huh, which, yes. Can you qualify that? Shared storage in... Like, you know, containers? So beyond the container, beyond the VM, is there shared storage? In general, Azure Blob Storage is a shared storage source. In the context of Cloud Foundry, can you qualify it a little more? Sure. State, I mean, essentially maintaining state. And I think you guys used... There was a lot of GemFire use. Maybe you can talk about how GemFire was deployed, which is a... Yeah, so to address the question, there are a lot of options for storage. Cloud Foundry is a cloud-native platform, which essentially separates the processing unit from data storage, so you have a lot of different options... Test. Yeah. A lot of different options for where you want to store your data. GemFire was one of those; it's an in-memory data grid. You could just as easily use MySQL, or Blob Storage. And from Cloud Foundry on Azure, you can use any of the Microsoft services directly. But someone doesn't like me; the microphone. The mic is not working. Oh, yeah, right. Yeah, so you do that using a user-provided service. Yes, Sean. On some of the other IaaSes, there are tools to test your high availability. Is there anything similar on Azure? High availability.
Like the Simian Army: shut down your instances at random, disconnect your security groups. Yeah. Actually, yeah, there are some. I don't know if we've announced that yet. We don't want to make any product announcements here. Yeah, I know. I'm the worst about it. There's nothing public? Yeah. It's funny, I was just in a session on that the other day, but I don't know if I'm allowed to say. Happy to follow up with you offline on that. Yeah, definitely. Good question. Short answer, yes. There are multiple ways. Our team has solutions we've written to do that, there are broader Azure-specific ways to do it, and there are things coming out. But yeah, definitely catch up with us, because we know people and we know different ways to do that. Thank you. You talked about Ford's mobile application. Could you tell us more about the application architecture? Did it do more reads or more writes? Did you run any performance benchmarks on Azure to check that you were meeting the required SLAs? Yeah, I can talk a little about that, but a different team at Ford was involved in the application development. They ran full load tests with their expected load all the way through to the end of the year, so they absolutely hammered Cloud Foundry on Azure, and all those tests succeeded. Did the app have more reads or writes? What was the mix? I don't know; as I was saying, that was a different team. At this stage I believe it's predominantly reads, but I think eventually it'll do more writes. We also had Microsoft Consulting Services, another team I forgot to mention, which partnered with Ford early on to create the mobile app. That app talks to the telematics platform, which is the part inside the car. I think it's the 2017... The Escape.
Yeah, the Escape, and I think I can say that, but... You're okay with that one. Yeah, I'm okay with that one. All right, the Ford guys are gonna get mad at me. But yeah, that's what it talks to. I think the majority of the workloads are read-oriented; eventually they'll probably do more writes. There's lots of documentation on this. It was kind of like drinking from a fire hose when I was ramping up; I was like, oh wow, it's a lot of stuff. But yeah, it was pretty performant. If you meet up with me afterwards, I think I have some public docs I could share with you. What we dealt with mostly was, I forget the marketing acronym for it, the microservices framework written on PCF and running on Azure; that was the big piece we were focusing on. So how many users are on it right now? How many downloads have there been? It was... We can't say that. We haven't been cleared by Ford. See, I was about to say it. I was about to say it. That's why I kind of... Yeah. Thank you for the question. Good question. Thank you. Hi, I'm Tushar. I had a question about monitoring at the IaaS layer, things like storage monitoring, where you'll have two different kinds of storage accounts. What does the roadmap look like for supporting monitoring at the IaaS layer in Azure? A lot of things are not very visible or clear. Yeah. To draw parallels, and I don't want to do this, but Amazon has things like CloudWatch, which shows metrics at the IaaS layer. What's the roadmap looking like? Yeah, the roadmap, again, the question is... Yeah, so there are some capabilities in Azure. We have Application Insights, which does some monitoring. There are also a lot of third-party solutions that plug in, like New Relic, et cetera. And PCF has built-in capabilities as well. In the case of Ford, we started prototyping a little.
Microsoft has Operations Manager, and Ford were users of that, so we did some initial prototyping of pulling some of that data into Operations Manager with a plug-in. And that big green dashboard we're all standing in front of with our toothbrushes, weird as it seems, so it looks like we're in front of a green screen, was the dashboard they used: Splunk. And I think internally you guys use Datadog for your end of things. So my question is, are those BOSH health monitor metrics, or are they Azure metrics? Are they coming from the infrastructure or from the platform? Yeah, those are the BOSH ones. That's a great question. It's good to hear questions like that, because a lot of people roll out some giant application architecture with no monitoring, and then there's a problem and nobody knows until an end user calls in. So this is good. All right, thanks. Thank you. Thanks. Hey guys, I have two questions. The first one is more on the operational side. I can see Pivotal, Microsoft, and Ford are involved. Who's really operating it right now, and how did the handover work out? Well, that guy right there, that guy right there, and that guy right there. Yeah. So at Pivotal, in terms of our platform and operations dojos, we've actually been hands off keyboards. It's basically been Ford driving all of this. They've built new environments themselves and deployed PCF themselves. We've been there as an escalation point, but they're very self-sufficient. Yeah, they've got mad tech skills, I'm impressed. Yeah, I think it's an important point to reinforce that our role from a Microsoft perspective is to make sure PCF runs great on Azure.
So we've been working very, very closely, and we didn't really get to it, but the CPI got revved many, many times during this project, with rapid iteration by the Microsoft team that was building it and contributing it to open source. Pivotal was teaching the customer how to run and manage this, and in the end the customer is managing it. Thank you, that makes sense. I want to make sure that was... Yeah, yeah, yeah. You mentioned you deployed with 1.5. Have you already upgraded? When are you planning to upgrade? Is that a decision Ford would make? What's your thinking? The 1.6 upgrade has been scheduled; resources and timing are the only open questions. So right now you're running on 1.5. 1.5, yeah. Great. We've got about three minutes left, so to recap, are there any other big learnings you want to share? Thank God. I know he likes beer. Yeah, yeah. We kind of formed a bond there: he likes beer and I like beer, so we get along great. I'll take one takeaway: one big happy family, right? Very close collaboration. Yeah, Pivotal and Microsoft have partnered together, and I'll be going on-site to Pivotal for the next six weeks with another engineer from Redmond. So we've got a strong partnership, and we're committed to making customers successful. That was the main takeaway. When I go on-site with customers, sometimes I'm like, oh, this is gonna be bad; this time I had no idea what to expect, and I actually had a blast. It was fun. It got stressful at times, but overall we worked well together. I'm also gonna put you on the spot a little, John, and then I'll come back to you. One of the things I understand the CAT team delivers is best practices, patterns and practices. Exactly right.
Yeah, I'm glad you asked that. We have a patterns and practices team, which is part of the CAT team. It's funny: when I was on-site at Ford, everything in my inbox was some variation of "hurry up, I need this." I was feeding all this information to the engineering team, and one of the groups I was giving engineering feedback to is patterns and practices: hey, we need to document best practices, architecture best practices, how to configure storage options. So I've given the patterns and practices team tons and tons of material: write this up, make it pretty, and take all my typos out, because it's something we need to get out there quickly. Some of it is already out there. For OSS Cloud Foundry in general, some of it is in the GitHub repo; there's some documentation there. And some Pivotal Cloud Foundry documentation has been published recently. If anybody has specific questions, email me, or see me at the end of the talk. Yeah, I'll throw it up on the screen. Hayden, any closing thoughts from your perspective? Yeah, I've got a couple. I think documentation is also super important for the operators who are actually running the platform. If you see an issue, collect as many logs as you can and submit a ticket as early as possible. Microsoft has been super responsive in addressing these issues. It was a lot of fun cross-company pairing with John and others. And I just want to say thank you to Ford, thank you to Pivotal, thank you to Microsoft, thank you to Abbey, thank you to everyone. It definitely hasn't been just John and me on this. So many people have been involved in this project and gone above and beyond in every aspect of it. So thank you. Yeah. Yeah, thank you. And we'll send this out afterwards.
Actually, I was going to type the email addresses in there, but... Yeah, John SI at Microsoft. John SI. I'm Jason DEL at Microsoft. I'll put these up in a second. If you're looking to run Pivotal Cloud Foundry on Azure, your company likely has a relationship with Microsoft, and we'd happily get engaged and work with you and Pivotal to get it running on Azure. There's an Azure Marketplace offering that lets you deploy it in your own Azure account. I'll throw that up there right now: aka.ms-cs-summit. This is agile presenting, right? Yeah. Oh, and it's not right. Well, you're doing that one last thing. So I'm the functional lead for Pivotal Cloud Foundry on the Azure CAT team. We have 10 other guys on my team who have ramped up, and Brian was actually on-site with us for a week at Ford, so he'll also be working with Pivotal Cloud Foundry going forward. Awesome. Well, thanks guys. And thanks for your questions.