dot AT. You can append slash DODSG, or if they ask you for the event passcode, just enter DODSG, and go to the Q&A. Just type your question here. Easy.

Mark? Who has a mobile phone? Oh. For real now. Good morning. He said good morning. I thought it was a little half-hearted. The answer... Did you guys drink too much yesterday? Yes? Good job. Let's see if this works. We'll test this. All right.

Does anyone know what this quote is? Has anyone ever heard this quote? It's Picasso. That's right. That's a little Picasso, a little talk about painting. But his point was that everything is derivative, right? He didn't invent art. He was a very gifted painter, he was a prodigy, but he didn't invent art, and his point was that everyone has good ideas. You could steal them, right? Has anyone ever stolen anything? You don't have to answer.

This is an alternative title, and this is the theme of what I've been trying to convince people is the core of DevOps: learning. I'm gonna revisit that theme a few times. But there's a couple of themes: learning, and then advantage, sustained advantage. Does anyone want an advantage? Yeah? Who likes to lose? Anyone? No.

All right, so this is a word from my sponsor. They pay for me to fly around. Pivotal: transform how the world builds software. And then, as Andrew, my mission personally is to try to transform how the world operates software. But I'm not doing it alone. I'm doing it with you, right?

So this is an agenda. I'll just let you read it while I get a drink of water. I took out most of the bikeshedding that was on the agenda earlier. Does anyone know what bikeshedding is? I'll explain bikeshedding later.

So this is an introduction. I have done a few things, and I think the best way to frame this is DevOps has been very good to me, and the last ten years of my life
I've basically been trying to help people build better IT systems with better tools and better processes, and that's heavily influenced by Agile software development and open source. It's a bunch of this stuff, with, you know, everything touching everything at this point. So these are projects I've been involved with. These are books or events I've helped organize. Right.

And then this is the best methodology for delivering software: it is artisanal retrofuturism crossed with team-scale anarcho-syndicalism. And if you don't know what that is, well, maybe that's why you're not very good at this. Kidding.

This is the most important slide, because it has my Twitter handle, and that's going to be important for your future. And just to mix things up, I wanted to make sure you knew that littleidea comes in various phases of beard and hair. So that's important for the next time you see me. Especially outside of Asia, I might look very different than I do right now. Support it. I've also got a plethora of videos you can watch on YouTube, or I'll sell you CDs from the trunk of my car, if anyone has a device that still plays CDs.

Has anyone ever seen me talk before? Besides Sergio, and Matt, who's laughing?

All right, so this is the most important thing you should do. I'm serious about this. The most important thing you should do for your tech career is: you should learn how to write.
You should learn how to speak. You should steal good ideas. And you should follow littleidea on Twitter.

So, moving on. I'm gonna tell you some stories. I'm gonna tell you some patterns and some framing about how I see the world, and that is maybe gonna be meaningful to you and maybe it won't be. The one thing to keep in mind here is this notion of bias, selection bias. Has anyone ever heard of survivorship bias, or survival bias?

So this is a very famous story, and there's a couple of other examples, but this is my favorite one. There was a study of the aircraft that were returning from war. They were studying where the aircraft had bullet holes, and they decided that where they had bullet holes, they needed to add more armor. But that's actually completely wrong, because the only aircraft that were returning were the ones that weren't lost. It's actually the opposite: the places without holes are the vital places on that aircraft, where you need more armor.

So there's lots of things that we see where we have bias, and people have given whole talks about bias and how that frames how we view the world. But what I hope happens a little bit today is that we shift some of our bias as a group towards things that help us change our behavior and do things, and that's gonna be reinforced.

So this is the short version of the talk. You're software-enabled... How many people work in a company that has said out loud, "We are not a software company"? Anyone? No? Got a hand up. I guarantee more of you have. I've been in meetings with executives, probably at companies that you work for, where they said, "We're not a software company," right? Like, "We're a bank. We're manufacturing. We do all this other stuff. We're not a software company." It's like, well, that's interesting, because you have, like, 10,000 developers on payroll. So what are you doing with that?
Right? So if you're not software-enabled, you're gonna lose to someone who is, in the world that we live in. I saw a few hands go up when we asked who had cell phones. I got off an airplane in Singapore yesterday morning at 6 a.m. I got out my little supercomputer that I carry in my pocket, I pushed a button, and a car came and got me at the airport and drove me to this building, right? That was not possible 10 years ago. Everything has changed, right? It's not the world we were born into, and software is what's eating that, changing that.

So, the best way to improve development... I don't actually believe in developer happiness. I keep hearing people talk about developer happiness. Like, what the hell is that? Give them cocaine? What are you talking about? It doesn't make sense at all. I don't understand, but people keep saying it. So the best way to improve development is actually to improve ops. I don't actually want happy developers. I want healthy developers, right? I want a healthy system. I'm trying to think about this whole system, not about your, you know, random mystery-meat Dockerfile that you're building.

And then, last but not least, you're gonna build a learning organization or you're gonna lose to someone who is. I'm gonna define learning organization at the very, very end, but I want to start with this: where are we, and why are we here?

When I go to places in the world (I've never been to Singapore before), I like to read a little bit about their history and the events that kind of formed that culture. I learn lots of interesting things along the way, and I've incorporated lots of interesting things from that into how I think about the world. But Singapore literally didn't exist until, you know, 70, 80 years ago, right?
It was, like, created. What I learned is that most of the land is actually reclaimed from the sea. Right where we are right now was built up from nothing. And I know it's true that this lion symbolizes courage, strength, and excellence, because I read it on Wikipedia, and everyone knows that Wikipedia is true. It also stands for this resilience in the face of challenge. The reason I bring that up is I think that a lot of the challenges that people are facing in IT are basically rebuilding things, reclaiming things from the sea, right? Especially if you work in a legacy situation.

But why we're here is we want to continuously DevOps microservices, or die trying, right? If you're reading Hacker News. Does anyone read Hacker News?

This is a blog post from O'Reilly Radar from 2007, so it's 10 years old. It's exactly 10 years old now, because it was October. And this blog post is arguing, in 2007, 10 years ago, that operations is a competitive advantage. It's the secret sauce of startups. What it's arguing here is that if you do things with traditional operations, you're going to have all this toil, you're going to have all this human work. So that's the number of hours on the y-axis, and then there's the scale, the number of servers, on the x-axis. And this new secret way, the new secret sauce, was giving people, you know, the competitive advantage, right? And this is sort of the golden age of Puppet. 2007. Chef's about to be born, basically.

It's funny, because we still have to keep saying this stuff. I heard people in open space yesterday asking: is there any data? Is anyone doing anything better? Is anything faster? And in fact the data is there. This is not debatable anymore. If you go look at the State of DevOps Report and you look at what people are actually doing, what they're reporting they're doing, the difference between the high-performing teams and the
low-performing teams, in terms of delivering IT, in terms of recovering from failure, is a drastic difference, right? And if you don't believe that, you're just in denial. Faster is safer, right? People say that in lots of talks, and I'm going to try to give you a little more than what you could get from reading a blog post or watching the rest of these talks, but this is what's going on, all this stuff.

There's a little bit of a problem, though, in the way that gets framed, and part of this is the survivorship bias: it all just starts to sound like pandas vomiting rainbows. Especially if you're in a place that's difficult to deal with. You're dealing with legacy systems, you have a legacy culture, you have legacy processes. It's very difficult. You hear people talk about, you know, the fancy stuff they're doing with some new tool, or the fancy thing they did with some process, and this is hard. It's hard to bridge the gap between what I should do in the world that I live in and this thing that I just saw on Hacker News or on, you know, the stage.

And everyone wants these pandas vomiting rainbows. Who doesn't want that? I mean, come on. But they actually don't want that. Here's what they really want. I mean, they do want the rainbows. They want scalability. They want availability. They want reliability. They want operability. They want usability. They want it all for free, and they want it without changing anything. Right? Who's met this person? Who works for this person? Who is this person, you know, that's raising...

So to make a point, sometimes I just like to make the words bigger. I used to get really frustrated with this in some of the work I did and some of the consulting. I did everything I could, and I got emotionally attached to the outcome, and I wanted people to succeed. And they're like, you know, "How can we do the DevOps without changing anything?"
It's like, just pay me and I'll go, you know? Like, I can't help you.

But what I came to understand, and this is actually a recent thing I've been reading that's very interesting, is this notion of institutionalization theory. It kind of finally explains to me why people don't actually want to change, or why they want to change the minimal amount. And this curve, which some of you should be familiar with... has anyone ever read Crossing the Chasm, or is anyone familiar with this adoption curve? Yes. Yes.

So there's this tendency, as things move towards institutionalization, that people are doing things not to seek advantage, right? In this institutionalization theory model (and let's be clear, all models are wrong, but some models are useful), the early adopters and the innovators are seeking advantage. At some point, as that advantage becomes clear, as the data demonstrates it, as the State of DevOps reports demonstrate it, institutions and organizations stop actually seeking advantage. What they're actually trying to do is legitimize themselves. They're trying to become, quote-unquote, legitimate, and so they start to adopt the wording, but they don't adopt the practices. The words cross the chasm, but actually changing behavior doesn't, or it does so much more slowly, right? And you see the same thing with all these waves, with Agile, with the rest of it, right?

These are based on this model of institutional theory, these three forces that create the isomorphic pressures. Isomorphic means the same shape. So there's copying, there's coercion, and then there's normative. At the point where something is, you know, the certified accreditation that people want, then that's what they do, right? Like, how many people have Scrum Masters? Yeah. That's a terrible thing. Like, how the hell can you be a master in two days?
Give me a break.

And then it's further complicated by the fact that these things just keep building and building on each other, right? There's not a single wave of change that's happening in tech right now. There's this wave of configuration management, there's this wave of cloud, there's this wave of DevOps, and all of them are superimposed on each other, right? And so it's very hard to keep that straight. I think you have this tension between learning about all the stuff that's changing and actually doing anything useful in your life, right, getting any work done.

This is a very famous painting. I love it. Basically, these waves are gonna crash. If you have the waves crashing, then that is potentially bad, but it can also be potentially awesome, right? Has anyone surfed before? Yeah. So, warning: surfing is hard. Has anyone ever eaten crap in the surf, in the breakers? Yeah, the waves will pummel you.

And I also want to give fair warning: software is actually hard too. It's not pandas vomiting rainbows. Software was always hard. People talk about the renaissance and, you know, the collaboration and all these metrics and the pandas vomiting rainbows, but I was there, and it often felt a little more like this, right? This is actually a picture of me trying to manage OpenStack. It's all mud and blood. You try to tell people it's hard, and they're like, "Well, I'm just gonna build an OpenStack public cloud and somehow become a service provider." I'm like, okay. Cool story.

But this is how I kind of came to frame my understanding of DevOps and software: the five stages. What stage are you in?
Depression? I heard some depression.

So this is how I frame DevOps in my own understanding, as Andrew. I wrote a blog post in 2010 that had these three bullet points, which I think are still true now, but I've simplified it, and I'll give you my working definition today in a moment.

The first is that developers and operations can and should work together. I don't think that's controversial, although sometimes it seems hard in organizations.

The second is that system administration is evolving to look more and more like software development. So in 2010, you know, EC2 had been out for a while, Puppet had been out for a while. You had the ability to provision systems with APIs, you had the ability to configure systems with APIs, you had the ability to do things with monitoring with APIs. All this stuff that was traditionally operations work looks suspiciously like software development. If you're writing to APIs, that's software development. And you can take all the lessons that you've learned and all the tools that you have for software development, with respect to planning, with respect to version control, the rest of that, and apply them to operations work.

And then, last but not least, and I think this is actually the most important one, and this is what brings me to places like Singapore: we are a global community of practice sharing solutions. The solutions that we have and the problems that we have are similar, and we should help each other with them. All these communities are doing that. Is anyone on Slack or IRC channels, talking to people about the tools and processes? I mean, there's a lot of great communities to be part of right now, and if you're not part of communities right now, you should become part of them.

And then this is a model that I'm very fond of, and I'm gonna use this model a lot for the rest of the talk. Stealing from my friends John Willis and Damon Edwards, it's this model of CAMS, and they
started with CAMS, and then I like adding Lean, because I think saying CALMS is way better than saying CAMS. But it's this notion of culture, automation, lean, metrics, and sharing. I'm gonna break those down a bit more as we go.

But my most simple version of DevOps, what DevOps means to me in 2017, is: DevOps is optimizing human performance and experience operating software, with software and with humans. Right? This notion of a sociotechnical system: it's not all about humans, it's not all about tools. Both of them are one thing, one system. And software is hard, but humans are harder.

And here's a little aside. So these are humans, and these are humans I've trained martial arts with in my life, so I just put them up there. The reason I put this up here, to transition from this notion of humans, is what I learned from my practice, you know, when I was young and not slow: the principles are actually more important than the practices. If you get locked into this notion of what the technique is, and then someone throws a punch slightly differently, you're in trouble, right? The punch that hits you is never wrong. The punch that hit you was not the wrong punch, right? So you have to understand the principles and apply them in real time, not something that you did as a ritual. And if you don't put principles in the practice, they're worthless, right?

So a lot of times you have this opposite thing, where people get so caught up in their mental models of how things should work that they can't actually fight anymore. That's lost, especially as we move to these cultures.
They're more and more nonviolent, which I kind of like, you know, not fighting all the time. But in the old, old world it wasn't a theory, right? You either died or you didn't, right? And that's the survivorship bias again.

And then the other thing I learned, which I believe is very useful, is that if you don't have the intention to do better, to win, to have that advantage, it doesn't matter. It doesn't matter how good your technique is. If someone else has more intention to kick your ass, they will. But you can have both. And I never really cared about any belts or ranks or whatever. The only system that I care about is: you can either beat me or you can't. Right? And that's manifested in lots of ways, and you get a lot of feedback when someone punches you, right, or chokes you, or whatever.

So I was never interested in being DevOps. This is a thing that happened not because anyone wanted to be DevOps. People wanted competitive advantage. People wanted to win. People were trying to solve problems. That competitive advantage is what drove all this stuff we're talking about. Without that impetus to have a competitive advantage... that's in the curve, right? On the institutionalization curve, those initial innovations are driven by the desire for an advantage.

And it's cool if you just want to do things from ritual. Does anyone know what cargo cults are? No? This is a very fascinating human behavior. So cargo culting is typically referred to in, like, sociology, archaeology, those kinds of things. It's a reference to a group of
It's a it's a reference to a group of Indigenous people in the South Pacific that happened to be on an island where the US built a base during a war and they were flying these planes full of supplies onto the island and the and the people there decided those were gods sending stuff from heaven and When the people left the base they built a essentially a religious ritual and practice Around they would build these coconut earphones and they would make these things and they make these straw airplanes To try to get the gods to bring back the the goods right But they didn't understand and then it's cool that people go through and like they copy the thing that they saw in the blog post or the Conference talk, but they're not going to get an understanding of what that person like actually did Just by doing the the ritual with the coconuts right It won't deliver better software So here here's something I added from conversation yesterday. So I till sort of this last wave Right and people are talking about in implementing I till and I till probably seemed like a good idea once upon a time Right the idea is basically we're going to minimize the number of incidents We have by going really slow by making sure everyone checks off and signs off and thinks about it really hard And then we'll think about it again they'll sign a thing and then we'll think about some more and they'll sign everything right and That's cool, but Like we moved on from that right there's there's there's ways to do things better and we're not thinking about We're basically not going to think about minimizing incidents We're going to think about Minimizing the time to recover and we're also going to get rid of a whole host of complexity There's a bunch of questions about complexity yesterday that you can solve a lot of your complexity problems by collapsing them to not have them Right so in in the cloud native world. 
You don't have a different configuration for every possible machine, like with a project that moves through things. You build racks and racks and racks of identical stuff. You collapse the complexity at the bottom, you make people build on top of that, and you spend all your complexity budget delivering value at the top of the pyramid. That's what I mean. If you look at the first version of EC2 when it came out, there were, like, three different instance sizes. That's it. There's no networking model. It's all L3. There's none of this stuff that people do with complicated architectures.

And ITIL is not how the web's built. It's a huge disadvantage now. So this is a question I want you to seriously ask yourself if you're working on implementing ITIL today: are you doing that for some advantage? Or are you doing that because everyone else does, or everyone else did, in your little vertical? Right? You're just doing that to be normative. You're just doing that because that's the professional thing to do. It's not gonna help you win.

The other thing, and I verified this, because people are saying, "Oh, ITIL has this notion of separation of responsibility, and that's how you do it": ITIL does not say that. ITIL got implemented that way, and that's been copied and coerced and normalized. But if you go look at what ITIL actually says, that's not what it says.

So this is the traditional IT model. We're gonna put a wall of confusion between everyone. Don't let them talk to each other except through the ticket system. Wonder why everyone's unhappy. And then there are shenanigans, right? Who does this?
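The earlier point about minimizing time to recover rather than the number of incidents can be made concrete with the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below only illustrates that arithmetic. The function name and all of the numbers are hypothetical; none of them come from the talk or from ITIL.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a service is up.

    mtbf_hours: mean time between failures (how rarely incidents happen).
    mttr_hours: mean time to recover (how long each incident lasts).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be > 0 and mttr_hours must be >= 0")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical team A: change slowly, fail rarely, recover slowly.
slow_and_careful = availability(mtbf_hours=1000.0, mttr_hours=10.0)

# Hypothetical team B: fail ten times as often, recover a hundred times faster.
fast_recovery = availability(mtbf_hours=100.0, mttr_hours=0.1)

# Under these made-up numbers, the team that fails more often is still
# available for a larger fraction of the time, because each incident is short.
print(f"slow and careful: {slow_and_careful:.4%}")
print(f"fast recovery:    {fast_recovery:.4%}")
```

With these assumed figures, cutting recovery time matters more than cutting incident count. Real systems will have different numbers, so treat this as a way to frame the question, not as evidence.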
Yeah, this is the transformation. But in reality these are actually opposing forces. They're at odds with each other, in some ways by design. Ops is there to keep things stable, and dev is there to make things happen. But the reality is your business, your organization, whatever, doesn't really get differentiating value from all this other stuff. And this could be, you know, let's get out more and more stuff.

But the point I actually want to make here is that this is all software, right? To the very bottom, until there's actually silicon, it's all software, and in organizations you have operational responsibilities at all of those layers. So when people say, "Oh, you know, developers are gonna have this operational burden from under their application": no, they're not actually gonna have that much operational burden. They just have a little tiny bit of operational burden at the very, very top of the iceberg, right, for the application that they wrote. They don't have to understand how networking works. They don't have to understand how storage works. Just make the logic of your thing not, you know, bring down the database too often. That's all I'm asking.

And this practice of operations is moving, right? What we call operations today is going to look very different tomorrow, and it already does from ten years ago. It's changing even now. Serverless, whatever: there's a next wave of this changing. So the tasks will change, but someone should make sure things actually work. Who's with me?

I never understand the developers who commit code, deploy, and then it's literally busted. You're like, you never even ran this thing. So why did you go home on Friday anyway?
So people often refer to Amazon as this innovator on cloud. They certainly did some things and should get credit for bringing it to market. I'm gonna talk a little bit about Google in that too. But this is a quote from 2006, so this is the year that EC2 was launched, and Amazon already had a decade of managing the, you know, large-scale e-commerce infrastructure they had. So this quote, I'll just read it:

"The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service."

Now, I know people that were working at Amazon, you know, Jesse Robbins and some of these other people were there at this time, and it doesn't mean that all these developers were operating the networking and the storage, right? They were operating their service, right? And we need to understand, as a community, what that means. When I say your developers should wear pagers, it does not mean your developers will now do every task that sysadmins used to do.

Who knows what that is? So this is a model I've been using. It's the Chinese five elements, from Chinese medicine. The reason I like this model is it sort of maps to, you know, the five pillars of DevOps in a way that points out that they're not separate. They reinforce each other, and in some ways they sometimes kind of destroy each other too, right? So these are cycles of creation and destruction. I'm gonna walk through this model really quick.

So, culture. You know, people say all this stuff about culture. This is a spectrum of culture.
So this is the Westrum typology model, and it talks about pathological cultures, bureaucratic cultures, and generative cultures, and then it has a bunch of things that categorize each of those. What I'm gonna try to do, copying off of Westrum, is sort of bring that notion of a spectrum to all five elements. I think that if you read these, most sane people will read this and think that things like "novelty crushed" are probably not a good thing if you're trying to innovate around software, right? How many people work in an organization where they feel that it's generative? Right, it ebbs and flows, right? At least, you know, you don't have to say anything or agree. But I think it's easy to say that things on one end of the spectrum are probably better than things on the other if you're actually trying to generate, create, innovate, any of this.

So these cultural ideas are really, to me, about aligning incentives and interests. When people talk about the problem with silos, the problem is not that you have silos. The problem is that you have misaligned interests and information hiding, right? People are strategically politicizing the work instead of doing the right thing for the organization.

One of the things that really, really hurts IT in general is this framing of a cost center versus a competitive advantage. In much of IT you even have CTOs that report to the CFO, right? And when you've framed everything as a cost center, you get very different behavior than if you're actually trying to drive competitive advantage, right? You're often under-resourced. There's a bunch of problems that come out of that. Probably most people in this room aren't in a situation to refactor your organization, but you should at least know that this is possible, and there are other organizations, right?
So one of the things I'm fond of saying is: you should change your organization, or you should change your organization.

So automation is this thing that we all get excited about. I'm going to add that it's actually also quite a bit about architecture. You go through this phase when you start learning about these tools where you want to automate everything, and that's very exciting. You know, you've got your favorite tool and you're going to do that, and then you start building stuff, and... has anyone ever built this? This is automation, right? No, I'm going to say this is not automation. That's manual. This is automation, right?

The point I want to make here is that what we automate is often as important as what we do, right? There are things that you can try to automate that will resist automation. They weren't designed to be automated. They were designed before this. And so it's a huge opportunity, when you go in and do these kinds of projects, to revisit why things exist, and you don't want to lose that opportunity. It is a failure to just take what exists and automate it with the new tools, right?

If you do that, you're going to take things and, you know, maybe you've got this container thing, and you're going to just put it all in containers, and we're going to schedule it, and it's gonna be awesome. Except that doesn't actually help, because what you really got out of that, for the most part, was just that the sysadmin didn't have to restart the process, right? You didn't get a lot of advantages around the way those things fail, the way those things can actually be operated. You just got the ability to restart it and let the scheduler do stuff. Has anyone ever played Tetris?
Yeah, so I'm sure a lot of you have lived this, or you're about to.

So this spectrum, for me: on the one end you have a bunch of toil, humans doing work. In the middle you've started to script things. And then on this far end, the software itself is a platform that's providing lots of these pieces for you, right? When I say platform, I don't mean things like, you know, Cloud Foundry or Kubernetes or whatever. I mean more like the platform in your organization is specific to you. It does whatever it needs to do to be able to take care of the business you're in, right? And that's evolving rapidly, those capabilities.

Has anyone ever read the Borg paper? Does anyone know what Borg is? Yeah. So all this stuff is inspired by Borg. Mesos, Cloud Foundry, Kubernetes: it's all inspired by Borg. It's a tool that was built internally to run all the stuff that Google does, and they've run it, you know, for a while. They've got some good ideas. They just published a paper. I remember back in the day, 2008, Puppet was doing lots of work with Google. They were managing 30,000 machines with Puppet, and they would not talk to you about Borg. If you said the word Borg in a room with people that worked at Google, they would all look at each other and then stop talking to you for the rest of the night. Till they got really drunk.

But this is the most important sentence, the most important piece of the Borg paper. All this stuff about container scheduling and the rest of that: this is actually more important, in my opinion, and it's overlooked. "Every task run under Borg contains a built-in HTTP server that publishes information about the health of the task and thousands of performance metrics." If your software tells you when it's healthy, you have a huge advantage over trying to come in after the fact and put monitoring on it, right? So getting that is a huge thing. So that takes us to metrics.
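A minimal sketch of what "software that tells you when it's healthy" can look like, using only the Python standard library. This is not Borg's actual interface. The `/healthz` path, the `TaskStats` counters, the `render_health` helper, and the error-ratio threshold are all assumptions invented for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple


class TaskStats:
    """Counters the task updates as it does its real work (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._requests_served = 0
        self._errors = 0

    def record(self, ok: bool) -> None:
        """Record one unit of work and whether it succeeded."""
        with self._lock:
            self._requests_served += 1
            if not ok:
                self._errors += 1

    def snapshot(self) -> Dict[str, int]:
        """Return a consistent copy of the counters."""
        with self._lock:
            return {"requests_served": self._requests_served,
                    "errors": self._errors}


def render_health(stats: TaskStats,
                  max_error_ratio: float = 0.5) -> Tuple[int, str]:
    """Build the HTTP status code and JSON body for the health endpoint.

    The task reports itself unhealthy (503) once more than
    max_error_ratio of its work has failed. The threshold is an assumption.
    """
    snap = stats.snapshot()
    served = snap["requests_served"]
    healthy = served == 0 or snap["errors"] / served <= max_error_ratio
    body = json.dumps({"healthy": healthy, **snap})
    return (200 if healthy else 503), body


STATS = TaskStats()  # shared by the task's own code and the handler below


class HealthHandler(BaseHTTPRequestHandler):
    """Serves the task's own view of its health and metrics."""

    def do_GET(self) -> None:
        if self.path != "/healthz":  # hypothetical path
            self.send_error(404)
            return
        status, body = render_health(STATS)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Run the health endpoint; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

In a real task, `serve()` would probably run in a background thread next to the actual work, and the work would call `STATS.record(...)` as it goes. The design point is that the health logic lives inside the software, so a scheduler or monitoring system only has to ask.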
So that takes us to metrics. This is from the site reliability book, the service reliability hierarchy. I don't know why they switch back and forth between service and site in their own writing. But the base of the pyramid, from a Google perspective, and this is Google telling you how they do things, is monitoring. So, service level objectives. I'm going to come back to that a little bit later. What are your service level objectives? Does anyone know what that means? Does anyone know what their service level objectives are? This is language that I hope we all adopt as an industry; steal it from Google. What are your service level objectives, and how do you know if you're meeting them? What are your service level indicators? So this next phase is adding to this idea of the evolution. I didn't set this up as properly as I could have, but there's this notion of the S-curve of commoditization: things start as innovation (this is stealing from Simon Wardley) and then they move up to commodity. In IT, what cloud computing actually represents is a movement from innovation to commoditization to this utility model. That slide was probably a little bit out of place. But the spectrum for monitoring is the level of insight. Now there's this massive debate, slash discussion, around observability. Is anyone following this discussion? We've moved from unmonitored, to having some base understanding that we should have some metrics, to now getting really fine-grained tools, being able to ask questions about the software that we're writing. We're coming to the home stretch of this model.
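As a small illustration of the service level vocabulary above, here is a hedged Python sketch. It assumes a simple request-based availability indicator (good requests divided by total requests) and a 99.9% objective; both the numbers and the function names are made up for the example and are not taken from the talk or the book.

```python
def availability_sli(good_events, total_events):
    """Service level indicator: the fraction of events that were good."""
    if total_events == 0:
        return 1.0
    return good_events / total_events


def meets_slo(slo_target, good_events, total_events):
    """Service level objective: is the indicator at or above the target?"""
    return availability_sli(good_events, total_events) >= slo_target


def error_budget_remaining(slo_target, good_events, total_events):
    """Fraction of the error budget still unspent; negative means overspent."""
    allowed_bad = (1 - slo_target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        return 1.0 if actual_bad == 0 else 0.0
    return 1 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical month: one million requests, 600 of them failed.
    target, good, total = 0.999, 999_400, 1_000_000
    print("SLI:", availability_sli(good, total))
    print("meets SLO:", meets_slo(target, good, total))
    print("budget left:", round(error_budget_remaining(target, good, total), 3))
```

With these assumed numbers the objective allows about 1,000 failed requests, so 600 failures leave roughly 40% of the budget. That is one way to answer "how do you know if you're meeting them" with a number instead of a feeling.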
So then there's sharing. We're a global community of practice, right? Group hug, everyone smiles. And then people have this wall of confusion between their dev and their ops, and they think the way they're going to solve that internally is to make a new silo. So we're going to solve our problem with silos by adding a new silo. I'm going to say you shouldn't do that. Not because you shouldn't have silos, but because that won't actually solve the problem of silos. Just don't do that. What you actually want, and this is oversimplified, starts from the fact that developers are not a monolith. Developers on the front end are not the same as developers on the back end. And then operations: we already talked about the stratification of the platform, from the hardware at the bottom up through the services that do things like deployment and monitoring. Each of those has its own operational aspects. And then let's not forget the business.
That's probably a good idea, to have something that gets this paid for. And then, last but not least (but they should always be the least, and on the bottom) is security. What we actually want to do is create these communities of interest. Each of these in its own right is a community of practice: it has notions of excellence, it has notions of best practices. And inside our organizations we want to be connected, sharing information and aligning incentives to deliver value as a community of interest across all these concerns. So this is my little model, a spectrum in terms of sharing. In pathological situations, there's a lot of information that's hidden; there are very strong silos; everything is secret. In the middle, you may get a little more alignment, and you start to have things that are secret to the company but may be shared internally. And then I think the highest evolution of this is participating in a global community of practice. I could call Matt Ray in Australia anytime and ask him questions about Habitat, and he'll probably answer them, and you could too. And then, last but not least, lean actually subsumes all this stuff. If you go look at lean and what that model talked about and what they did in that community, it got started in manufacturing, but it has this notion of continuous improvement, and I think that's what I like adding lean for. It's not just about having metrics.
It's not just about having culture. It's about continuously improving that stuff. And also because CALMS sounds way better than CAMS. So that's a little whirlwind walk through this notion of the five pillars, this five-part model. We can talk about it in the open space later. And here's a ham-fisted attempt to build the spectrum, so that you can see at least a framing that I like about what is better and what is worse for each of these five. And then this is just me being funny, because you've got to get that sheet of flow, right? So, coming towards the end. There's still a bunch of slides, but I'm going to go really fast. What I want you to understand is that context is key, and that scale breaks everything. I said this yesterday in the open space. So there's this ant, and ants can lift 50 times their body weight, and they're very, very humble about it: "I can lift 50 times my body weight." And then there are elephants, and they're big, and they spend all day eating. So can you have an elephant-sized ant?
Well, there's a problem, and that's that physics gets in the way. You have this thing called the square-cube law: when scaling a physical object, the new surface area is proportional to the square of the multiplier and the new volume is proportional to the cube of the multiplier. Which means that the elephant needs to have bigger bones. The ant actually breathes oxygen through its exoskeleton, and if you tried to make an elephant-sized ant, it would require hurricane-strength winds to drive enough air over the pores in its exoskeleton to even have a chance not to die. And then if it tried to move, it would immediately die; it would just crush itself. So lots of DevOps conversations sound like, "Hey, I can lift 50 times my body weight," when you're getting advice from the people at, you know, whatever conference, and they have this little startup and they're like, "We did this fancy thing." It's like, well, I work at an elephant, and if I try that, everyone will die. So what's the organizational equivalent of the square-cube law?
I don't know either, but I know scale breaks everything. And for DevOps at scale, there's an existence proof: there's this way to do things at Google, and it's free, and you should go read this book. The Site Reliability Engineering book has a bunch of great things from Google. I'm going to give you some homework in a minute, but sharing is caring, right? They're part of this global community of practice. It took them a while, and they used to be secret about Borg, but now they're sharing and you can read the book. All these things that I said about developers and operations working together, solving system administration with software, and participating in a global community: it's all in that book, the five check marks. But the thing you start to realize when you read this is that SREs are actually architects too. They're basically the architecture, governance, and compliance of the way Google does stuff. They changed that whole mental model about how they do security and how they do the rest of it, because that process and the software-first mentality change everything. So this is actually a quote from the book: another way, perhaps the best, is to short-circuit the process by which specially created systems with lots of individual variation end up arriving at SRE's door: provide product development with a platform of SRE-validated infrastructure upon which they can build their systems. This platform will have the double benefit of being both reliable and scalable. You collapse the complexity by giving people solutions that don't have that complexity. You make that problem go away. The SRE builds framework modules to implement canonical solutions for the concerned production area; as a result, development teams can focus on the business logic, because the framework takes care of correct infrastructure use. So people read about SRE and they're like, "Oh, the SREs took the burden from the software developers. They took the burden of production."
It's actually wrong. It's a blessing and a benefit to get SRE support inside Google, and most projects start without it. So developers are actually the operations for their own stuff, on their own services. They have the full palette of Google's platform to build with, but they're the ones that run their service until the SREs get involved. The goal is not to take the operational burden from the software developers. The goal is to make the operational burden negligible. We're trying to eliminate toil by making those tools, those libraries, those modules reduce the cost of operations. So this is your homework: I want you to read Embracing Risk, Service Level Objectives, and Eliminating Toil from the free book, which you can get if you search for "SRE book". So what are your service level objectives? You should tell me next time I talk to you. We don't get to decide when the competition gets an advantage, but we can decide to learn and we can decide to act. And I've been to lots of places. I've seen people read the same books, go to the same conferences, get the same advice, the same tools, the same resources, in basically the same organizations. Everything about them looks the same, but they get drastically different results. Why is that? What I came to realize is that it's correlated with their ability to change their behavior. So learning, to me, is about changed behavior. If you're trying to learn to play chess, if you're trying to learn to play music: you're not better at chess until you can beat someone better, and you're not better at music until you can play the new song. And that's really easy to measure. In organizations, and in delivering IT,
We're not always good at that. So this is a model (all models are wrong, some models are useful) about organizational learning. What I recommend you do is read the organizational learning model; there's a questionnaire, so take the ideas from the questions and reframe them as statements. I don't have time to go into all this stuff, but if you're in a place where people are continuously learning, everyone can ask questions, the team is learning, everyone is empowered regardless of their rank, you have a model for how you communicate, the system is connected, and your leadership is strategic, you're probably going to have a pretty good time. So, very, very big finish, coming to the end. You haven't learned anything until you change your behavior. This is one of my friends. I hope it's not you. I'll just let you read it. So: saying DevOps does not fix a pathological culture. Saying DevOps does not fill a lack of vision. Saying DevOps does not align incentives and interests. You haven't learned anything till you change your behavior. Software is creative. Software is complex. Software is not digging ditches. Software is not running factories. Software is closer to art than science. The principles are more important than the practices, which are more important than the tools. I didn't really talk about tools at all today. The mindsets are more important than the skill sets, which are more important than the tool sets. Adapting is more important than adopting. Why is more important than what. So this is my call to action: invest in yourself by investing in your community. Don't attach your identity to your tasks. To be is to do. Talk does not cook rice. Eliminate toil. Change your behavior, change your behavior, change your behavior. Good ideas are for everyone; steal them, and then give them away. So these problems are not technical, and they're not people problems. The problem is sociotechnical. It's one system all together, and you have to solve both sides of it. So let's build better software.
Let's build better organizations, and let's build a better world. We could change everything. Again, I'm going to keep saying this: you haven't learned anything until you've changed your behavior. So here's what you need to succeed: you need courage, strength, excellence, and resilience in the face of challenges. I'm going to steal that. Thank you. So I'm not here to answer questions; I'm here to have conversations. We'll have the open space later. Last but not least, that's it. Thank you. Thank you. That was a very good start to the second day. We'll continue the conversation in the open space, the same as we did yesterday, so please continue the conversation there. Please welcome our second speaker, Ion, from Standard Chartered. He gave an ignite talk last year, so it's always really good to see that as a community. Hello, everybody who's participating in the DevOps Singapore conference. It's quite a challenge to speak after Andrew; he covered most of my points, so I'll try to add something new. First of all, I'll speak very quickly about myself. I have a background in academia, five years as a university lecturer.
So, 20 years in IT, 13 of them mostly in the financial industry. From the beginning of the 2000s I was very much involved in agile, and then lean, Kanban, DevOps, automation, and new technology. In my current role at Standard Chartered I'm doing digital transformation for foundation services, which is the data group. So what will we discuss? I joined the bank a year and a half ago, for the digital transformation of one of the biggest projects in the bank, the enterprise service bus. So we're going to talk about the successes and failures in our journey, what challenges we have in our enterprise environment, the lessons we have learned and are still learning as a team, what we are trying to build next, and how we can mitigate the risk. Just very quickly, to understand: how many of you are working for big enterprises? Okay, so these people can relate to my pain. And how many of you know about the enterprise service bus? Is anybody familiar? Okay, good. So for the people who are not familiar with this, what is an enterprise service bus? Historically, when you start a company you have one single application. You don't need to talk with any other application; it talks with external clients, or it's a website or anything. Once you start adding additional applications, they need to communicate one to one. At some point your environment starts growing and growing and becomes similar to a mesh, and this is the case with only six applications. Instead of six, we have around three thousand. I mean, not all of them need to speak with each other, but many do. And all these applications speak different languages and different protocols. Some of them speak JMS, another speaks XML, some speak MQ. So it creates a lot of complexity. So, to solve this kind of problem,
We're moving to an enterprise service bus. It's considered middleware: all applications talk to the enterprise service bus, and the service bus basically talks to the other applications. The enterprise service bus is based on service-oriented architecture. This was very popular about 10 years ago, and it's still popular in some enterprises. Now everybody is moving to REST, GraphQL, and all the new tech stuff, but as a bank, like many other big enterprises, we are still pretty much stuck with the old style, and we are slowly moving to the new technology stack. An enterprise service bus in our case does two things: message transformation and message routing. When we talk about message transformation, we're talking about protocol exchange from one protocol to another. Some applications might send XML while another application consumes JSON, so in this case the enterprise bus basically translates XML to JSON. When it comes to message routing: Standard Chartered is across Asia Pacific; it's a global bank. We have different countries, and different countries have different regulations, so the enterprise bus must know, for each country, what the regulation is, where to send information, and when not to send it. And with everything related to the enterprise service bus, you have basically only two players: the consumers of the service and the providers of the service. The ESB is somewhere in the middle. So this is a very high-level, general architecture.
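The two jobs described above, transformation and routing, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the flat `<payment>` message shape, the queue names, and the per-country rules are all hypothetical and are not taken from the bank's actual bus or from any vendor product.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical per-country rules: the destination a message may be sent to,
# or None when regulation says it must not be forwarded at all.
ROUTING_RULES = {
    "SG": "queue.sg.core-banking",
    "HK": "queue.hk.core-banking",
    "XX": None,
}


def xml_to_json(xml_text):
    """Message transformation: a flat XML message becomes a JSON object."""
    root = ET.fromstring(xml_text)
    payload = {child.tag: child.text for child in root}
    return json.dumps(payload, sort_keys=True)


def route(json_text):
    """Message routing: pick a destination from the message's country code."""
    message = json.loads(json_text)
    return ROUTING_RULES.get(message.get("country"))


def handle(xml_text):
    """What the bus does for one message: transform it, then route it."""
    json_text = xml_to_json(xml_text)
    destination = route(json_text)
    if destination is None:
        return None  # unknown or blocked country: do not send
    return destination, json_text


if __name__ == "__main__":
    incoming = "<payment><country>SG</country><amount>100</amount></payment>"
    print(handle(incoming))
```

The consumer only knows how to send XML and the provider only knows how to read JSON from its own queue; everything about formats and country rules sits in the middle, which is both the appeal of the bus and, as the rest of the talk shows, the reason it becomes a bottleneck.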
This is how it looked. The consumer talks to the integration services. You have a message broker, and you have an integration manager with a caching or management layer, which talks to the database. So in some cases you write your audit data into the database or any other platform. The message broker usually routes the messages, the integration service takes the functional services, and the services that are published do some routing internally as well. A year and a half ago, when I joined the company, the project was already around three years old and mostly implemented, but we identified a lot of problems. In the financial industry, you have the notion of the criticality of an application. So you have BCP one, two, three, four, up to five. Five is considered the most critical kind of software in the bank. This means it's pretty highly regulated by MAS. You may have your application down for around four hours in a year, and every single problem with the application needs to be reported to the regulators. The enterprise message bus is one of these applications. Just to understand the complexity: every time you go to an ATM and put in your credit card or debit card, it basically sends messages through the enterprise bus. It will send a few messages to different systems: one to iBanking, for example, or retail, to check your accounts; another will go to risk, just in case you used your credit card five minutes ago in Cambodia and now you're trying to do online banking from Singapore. Basically, that is high risk. So this kind of complexity is there, and that's why it's very critical. In order to accommodate this criticality and the high availability of the application, we needed to fix two problems, basically: an organizational problem and a technological problem. When we arrived, we saw a few problems.
With the teams, it was the classical, traditional waterfall approach to development. It was project-centric delivery, so basically no clear ownership, and no program or portfolio governance level. Everything was kept in Excel files, all the services that were published, and sometimes you had multiple versions of them, and it was pretty messy. There was narrow specialization in the team, even in development: we had a team managing infrastructure and a team actually developing the services, and they were aligned on a domain-specific level. Every single delivery took around two to four weeks. Even if you had a one-line change in your code, you needed to wait around three weeks to deploy it in production, just because of the paper-heavy deployment. And the team is pretty big, around 100 people, 70 of them developers. There was almost no automation, just very basic shell scripts. That was the organizational side. Additionally, the way the environment was historically structured made very inefficient use of resources. Even though we had enough capacity to accommodate most of the projects, historically only a few projects were using their specific infrastructure, or more precisely the virtual boxes. This actually led to all the projects sharing the common infrastructure, with only a few infrastructure boxes in use. There were conflicting deployments: each project might have some specific configuration, and if one project changed it, the rest of the projects would suffer. Again, the problem was that the development or infrastructure team had absolutely no control over the development environment. One example: if your application user account was locked, you needed to raise a ticket to another team, and sometimes you might wait two days just to reset your account in the development environment. So, extremely heavy processes; everything needed to be a BMC ticket or a JIRA ticket,
a Remedy ticket. So it was very, very heavy. From the technology stack side, we discovered a few scripts which were more or less running, and most of the time didn't. There was no automation or provisioning of infrastructure at all, because people didn't have the right access. Code deployment was manual, through the web interface: click, click, click. Infrastructure maintenance was manual too. If a partition with the logs filled up, you usually found out from your consumers and providers, who complained that messages didn't go through. Then you would go and look until you found that it was actually a log-related problem, the disk space. So it took another two to four hours just to figure out that something was wrong. So, obviously, no monitoring or alerting. This is the challenge we started facing a year and a half ago. This is an approximation of the factory model. Only a few teams had their own boxes; even when there was enough infrastructure, they still tried to share. And again, there was a reason behind it, because these virtual boxes had access: for example, the firewall was open for external consumers and providers, say if you use Google Pay, Apple Pay, or Alipay. Usually, in order to do the testing with your clients, you need to have that access, and once you start with this infrastructure you don't want to move. If you don't have any automated model, it becomes very difficult to move to any new infrastructure. So we had a pretty difficult problem to solve, and we focused on a few things. We needed a new organizational model. You can't implement any automation if your underlying organization is not strong enough to support you; all your automation efforts will fail. So, first, we decided: no manual code deployment, from development up to production.
No manual infrastructure maintenance. This means everything needs to be monitored and automated, to clean up whatever is required. On top of that, it's a vendor-driven enterprise application, not in-house development, and we are usually lagging two or three years behind the newer versions and newer patches. Any deployment of a new version, even a minor one, usually takes six to ten months, because of the dependency on the consumers and providers: you need to test everything. So we decided to look for a new technology stack for the enterprise service bus. The first thing we did was to start looking into implementing agile frameworks and agile methodologies. We started looking into the scaled agile frameworks, because of the size of the team. We focused on product-centric, business-domain delivery, so we needed very clearly defined program and delivery levels, and we tried to get much more collaboration. We implemented Kanban, not Scrum. One of the reasons for Kanban was the size of the team and its experience with agile frameworks: it would be much easier to implement Kanban than Scrum. In addition, the type of work done there is more linear. The teams involved in the project are not self-sustained; they usually depend on the consumers and providers, and the consumers and providers usually manage the pace of the development. So Kanban was much better suited to our case than Scrum. And we started working towards DevOps, "automate everything", as a culture for the team. This was a very hard push, from the leads down to the team members who do the coding. We didn't have any testing framework.
So we started developing a testing framework: specification by example, for example, or ATDD. We didn't have any catalog. As I mentioned, we had Excel sheets, different versions, very big. They didn't contain any relevant, real-time information about what we were actually running in production compared with what we had in the Excel sheet. So we developed a service catalog which is directly linked to the production system; it is automatically updated once anything is released in production. We built a continuous delivery pipeline using Jenkins and streamlined our release management. We couldn't go for a big bang: it was a very big change for the teams and for the dependent consumers and providers, and we couldn't afford it, again because of the criticality. If you look at it from a release management perspective, I think 30 to 40 percent of the weekly deployments in the bank were done by this team. So think about it: around 3,000 applications, and 30 percent of the deployment load in the bank is done by one single team, because everybody tries to change something, use some services, or deploy new services. So we split it into phases, three basic phases. In phase one, we tried to stabilize the environment so that development could work properly and teams would not conflict with each other. We decoupled infrastructure based on domain. We have domains like wealth, security, and risk, which have very similar types of packages that they deploy. This helped us manage resource utilization efficiently. We got control over the development environment. This was the first thing we did: we fought with our Unix team to get root access to the boxes. We needed full control.
We couldn't afford to wait for it. It took a while, but we got it. We automated most of the tasks which were done manually. So it was pretty straightforward, one to one: each team had their own environment, their own VMs, and they could deploy. Another step: we started training the teams so that they take ownership of what they run in development. They should also support it; if they have a problem, they need to try to fix it. In phase two, we implemented continuous deployment using Ansible. It's a vendor product which was probably initially developed in the 90s, so everything was pretty manual, and it took some time to make Ansible work with it. We implemented continuous integration using Jenkins and a source control system with Bitbucket inside the bank. We split the configuration level and the source level because of their different life cycle management. We further decoupled our common components. We implemented a proper monitoring solution: we already had ITRS Geneos, if you're familiar with it, running in the bank, and we implemented the Elastic stack with Kibana for logging and analytics. This is approximately how it looked after the second phase. We had a reverse proxy in front, so we tried to abstract the direct communication of teams and users connecting to the service. Elastic would take over the analytics. Previously we shipped all our analytics into an Oracle database, and it was very heavy. If you wanted to run a more complicated query, it took around 10 to 20 minutes to query any data. With Elastic it took less than a second, maybe one second. So, a big improvement there. And we started using Git and Artifactory, where we can publish our artifacts. Phase three is what we are working on now.
So, the main problem with a big enterprise application is that it tries to do everything for everybody, and usually when you try to do this, it doesn't work very well. The more mature a team is, the more specialization it needs, the more specific its use cases, and it doesn't work. At some point we realized that we are the bottleneck for many teams trying to evolve: they need to comply with our processes because we are in the middle. So our idea now is to build infrastructure as code, to give each team the possibility to deploy their own integration servers as they need, instead of coming to us and begging for different changes. That is only possible with a new tech stack, which could be Docker and Kubernetes, which we're working on now. We are trying to move ahead with continuous deployment into production with Ansible, though, if you remember, this is a BCP 5 application, so it is quite challenging. Again, Ansible, and then Kubernetes. We work with our support team, the operations team, and we are trying to shift the focus from stability, which never worked historically, to recoverability. And we are trying to move towards the model of "you build it, you run it", as Andrew mentioned earlier, from Vogels' 2006 speech. So basically we are trying to go in the same direction. In this phase, we'll be running on Kubernetes clusters. It will be completely auto-deployed. Each namespace will be per project, and each container will run one single service. Because Kubernetes allows us to do the replication model, we can replicate as many as we need in the YAML definition file, on the replication controller side. It allows us much more flexibility and reliability. We can do canary deploys, or blue-green deployment.
We can shift 30 percent of the traffic to make sure one of the environments is stable enough, and then switch the rest of the Kubernetes cluster for the specific service. Challenges: we had tons of challenges, we still have them, and we probably will have them. One of the challenges we faced was that front-to-back integration projects are executed using the traditional waterfall model. Here we're no longer talking about the enterprise service bus team; we're talking about the consumers and providers. Some consumers and providers are slow, some could be very fast, but you are as fast as your slowest consumer or provider. You cannot go faster. Incremental delivery is very challenging, so we tried to introduce the agile release train model, which should mitigate this part. Again, we have a huge wall between the delivery and operations teams. It's a classical thing: anything to be done goes through a very tedious process of documentation, where you need to specify every single line, so it creates a lot of headaches. We are trying to bring operations into development. Recently we started working very closely with the operations team; basically, they are now becoming part of development. We identified a few champions from the operations team who work day to day with development, and they spread the knowledge further. Again, we don't have adequate tooling for supporting CI/CD, so manual deployment still takes a lot of time. We are now trying to move from Ansible more to the Kubernetes model. If you treat your infrastructure as pets, Ansible will probably not be the best answer. Ansible is stateless.
We saw a few issues during deployment in production. Many things that you assume in your playbooks will not work if you treat your infrastructure as pets, if you cannot spin it up or bootstrap it. So we had a few issues with Ansible. Kubernetes will probably solve this for us, because it abstracts all this complexity. We didn't have any performance monitoring tools; now we're looking into a few possibilities. We implemented the Elastic stack in production, and now we're moving to a complete audit log. Everything is done in Elastic for the development environment, and very shortly, I think in the next few weeks, we are going into production with the complete audit side. One of the challenges is at the human resource level. Most of the team is located outside of Singapore, so it's very difficult to find champions inside the team who can drive the change. Everybody is usually pretty compliant: you tell them what to do and they will do it, but you need people who can drive it. So this is one of the challenges.
So we try to hire the right people for the job.
Budgeting is a big problem for us as well. If you want to be agile or lean in your mindset, you try to decide as late as possible. You cannot decide as late as possible with current enterprise budgets, because of how the budget usually works: at the beginning or the end of the year you provision, or guess, what you will do next year, and you need to spend as much as possible, just in case, because otherwise next year you will not get the money. This creates a lot of problems because you don't know exactly what you will build. You may have some ideas, but you either over-provision or under-provision. If you over-provision, you will probably get something that doesn't make sense anymore because you may have changed your mind. If you under-provision, the new option or new technology you choose may not be covered, and it becomes political again.
Another challenge is the vertical organizational structure. For example, the support model and the development team all end up at CEO level. So if you need anything to be done across teams and you don't have the authority, the only person who has the authority is the CEO, the global CEO. If I want to complain, I need to go up the ladder, which is probably not going to happen. So on the organizational side we probably need more horizontal teams who have more power to change things.
Another point we saw in enterprise-level companies: the teams that are less dependent on the global enterprise application are the most efficient and the fastest in development. So we try to move away from the model of one big application that supports every single thing for everybody. We try to go very small. If you want to move fast, you cannot rely on the global application. So we try to bring our own application.
We try to manage it and try to deploy it. So that's it. We are just at the beginning, so we still have a lot of work to be done. Wish us good luck.
Hi everyone, hi again. A quick reminder regarding Pigeonhole. We don't do questions during the speakers' talks. Instead, go to pigeonhole.at and submit a topic or a question that you want to discuss during open spaces. Second, use the DevOpsDays SG hashtag if you want to share something on social media. And third, if you have colleagues who didn't make it to the event, we have a live stream on YouTube; you can share it with them so that they can virtually attend the conference.
The next speaker is Amit Kumar, who's going to talk about creating a DevOps culture in a bank. When I asked him how this is relevant to DevOpsDays: it's actually tips and practical experience that can probably be used in... How many of you are working for a bank? So hopefully Ion and Amit will share practical experience of how they do the digital transformation at their workplace, and you can take it and actually put it into practice at your workplace, with your managers. So, without further ado, Amit.
Yep. Thank you. Thanks. Can everyone hear me clearly? Should I use this one? Do you want to quickly send me this presentation? We have a USB stick. I do have one, yeah. This is the one that I got yesterday. Should we try this? It works. Okay.
Good morning, everyone. Finally, we are wired up. Okay, this is working. So, as Sergio said, my name is Amit Kumar.
I am a software craftsman, currently helping a bank in their digital transformation journey. You can find me on the internet with this handle everywhere.
Like I said, the transformation journey in a bank is a topic that is interesting for people, because banks are being disrupted by the ever-increasing demands of the customer. Has everyone seen this movie? How many of you have actually seen it? Okay, it's The Curious Case of Benjamin Button. In short, the story is about a kid who becomes younger as he grows, right? If you think about it, a transformation in any large enterprise, including banks, is the same.
Large enterprises are actually similar. Any change in an enterprise, including banks, is slow because of the processes, regulations, and governance requirements. This is by design: banks are designed to be cost-efficient and stable, and that's why these processes are there, and it makes sense for them to be there. But in the current digital era, where everyone is saying "I want to be digital", things are changing at a very fast pace. Banks can no longer just do the savings account business or the lending business. They have to change the way they operate and the way they deliver software. Banks have to become more like an IT shop, which means they have to build new solutions.
The way I see it, what most large enterprises have started to do is build beautiful interfaces for customers via a web channel or a mobile channel. But they fail to realize that if you build a Ferrari, you need a road to run the Ferrari on. If you want to run beautiful software, you need to change the core systems as well.
This does not mean that you have to change the whole core system. It's about achieving speed and efficiency, which means there are certain things you need to take care of during the transformation journey. You need to change the way you build software: move away from traditional waterfall development and introduce agile. You need to bring in a more cross-functional culture and break the silos between business and IT, which means business is no longer going to throw BRD documents over the fence for IT to build; it's closer collaboration. You need to get the skill set and the mindset, and we are going to talk about how we are doing that at the bank. And you introduce DevOps, continuous delivery, and all of the beautiful things you have heard about. I'm not going to talk about the details, because the focus of my talk is DevOps culture at BTPN.
How many of you have heard about BTPN? Anyone? One hand. No one else? Okay, a quick introduction. BTPN is a pretty old bank in Jakarta, established in 1958. Their core business has been national pension savings; they have been the national pension savings bank. But over a period of time they realized that they were being disrupted and that they had to disrupt the way they do business. So in 2016 they launched a pure digital bank, and when I say pure digital bank, I mean completely branchless. There is no branch of Jenius. I would strongly recommend that you go and read about what Jenius is capable of doing. Jenius is mostly about the customer value proposition: a bank in your smartphone. Everything that you can do in a branch, you can do on your smartphone.
However, the topic of the talk is DevOps culture. What do you think the culture is? Andrew was talking about CAMS, the DevOps culture, the culture that needs to be built. But how do I know what the right culture is?
What culture do I need to build in the organization? Every organization has its own way of living and breathing, which means you need to identify the culture of the organization and then change it. This goes back to people and mindset. Changing the mindset is the hardest thing any human being can do, and it was very difficult for us as well to change the mindset and bring in the culture of automation and the culture of engineering practices.
The first thing that becomes very important, and where Agile Scrum helped us, is breaking the barriers, the silos, of the organization. We brought cross-functional teams together. We made business responsible for the ownership of what gets built, and the IT team responsible for the software, including operations.
When you start to build the culture, you also need to take care of improvement. Continuous improvement is part of the cycle. When we say DevOps, the fancy phrase is: you build it, you run it; you break it, you fix it. Right? It is very easy to say that statement. How many of you are actively developing? One, two. Have you ever run software in production as a developer? Okay, very good. It's not an easy task. As a developer, your mindset is always limited to the backlog story in front of you. You will not be thinking about how this software is going to run in production, because the easiest thing is to hand it over to the operations guy. I am responsible, I will do the automation, I will do the deployment. But how does the software run?
I don't know how the software runs, because it works on my local machine. It worked on SIT, it worked on UAT, but if it fails in production, then there are environment leveling issues. How many of you have actually heard "the environments are not leveled"? This is happening in production but it worked in UAT, because they are not similar, and then we go into this loop of leveling the environments: we have to make the UAT or staging environment similar to production. But in fact, if you are able to run software in the UAT or staging environment, running it in production should be very straightforward.
So that culture, the continuous improvement culture, is important. We started to go beyond "you build it, you run it; you break it, you fix it". What I started saying to them was: you touch it, you improve it. This is very important because in large organizations you will have very few opportunities to build software or a solution greenfield, from scratch. You will have many existing solutions that have to be extended, and core systems that need to be extended. If an extension is required, you will be adding features, and if you are touching a feature or a particular part of the software, you have to improve it, because if anything around that boundary breaks, the person who touched it is responsible. If we are able to build that culture, it becomes easy, because then people start to improve things. None of this software had any test coverage, so the onus on the developer was: if you are going to touch it and add any functionality, you are responsible for writing test coverage. Then you have this beautiful heat map where you can see the test coverage increasing in bubbles in the various sections or components, and the software starts to improve and become better.
When we started implementing DevOps, things started to fail. And they will fail.
And I'm sure all of you in this room have experienced it: when things fail, it becomes very hard to justify, especially to senior leadership, because for them running software all of a sudden started to break because a new way of development, deployment, or automation was introduced, and the person who did that is responsible. So we had to break this culture, or rather bring in the culture of fail fast, fail cheap. Very easy to say, but actually celebrating failures is very hard. You cannot stand in front of business and say, "Hey, I failed, let's celebrate." They are not ready to accept it; their business is topsy-turvy. You cannot say that things are falling apart because I'm introducing a new way of working. So the culture of celebrating, or the culture of failing quite often, is very important. We started to reduce the cycle of going to production. When you are introducing automation, the sooner you go to production, the sooner you will identify the bottlenecks, and that becomes a very important part of automation.
So we started doing this, which is a beautiful concept: the culture of post-mortems. How many of you actually do post-mortems when something goes wrong in production?
What do you usually do? You find the problem, you fix the problem, and that's it. Usually we forget to take actions, because the next important thing is to deliver something else. So celebrating post-mortems becomes very important. In the bank we have started to adopt this culture of celebrating post-mortems for any incident that occurs in production. The link at the bottom of the slide is how Google does post-mortems. It's a beautiful template that you can use for doing post-mortems. At the end of the day, if we are able to create a culture where the cost of failure is education, then you are in the game of building a culture in the organization that is going to change over a period of time.
Culture is also about how the teams are organized, and Agile Scrum helped us there, like I said. But culture is also about the work environment. In banks, what you typically have is cubicles and cabins where the managers sit, and the developers sit in one corner in a closed box. We broke that culture of cubicles and cabins. It's a completely flat structure. This is a new breed of bankers, which was quite fun, because bankers typically sit behind a teller window, behind closed glass, and you have to insert your hand to get some cash out. This is the new breed of bankers that is helping BTPN and its customers become digital.
Let me go back a little, to when we started the journey. There are typically three approaches to the transformation that I have seen. The first two work pretty well; the third one does not, but let's see. The first one is the lab approach. Essentially, you ring-fence a small cross-functional team: programmers, security people, QAs, sysadmins, operations guys. You ring-fence the team and run it to implement the whole chain of DevOps practices. And once the team is
successful, then you start to scale. That's the lab approach. The second is the spike approach, where you pick two to three teams and start implementing DevOps with those teams. The problem with this approach is that if the skill set is not there, or people are not cross-functional, adoption becomes slow. And the final one, which is the easiest one, is the big bang approach. There is sarcasm there: it is the toughest one, because you have to design the whole DevOps org structure and then do a big bang. This is the most difficult to implement, and I have seen it fail many, many times, because there is a lot of disruption caused by the expectation that this culture or org structure is going to work in the organization when it has not been tried and tested.
So we went ahead with a combination of options one and two: a lab and a spike. We set up a cross-functional team of programmers, QA people, sysadmins, and operations, and we called them Kopassus. Kopassus is an Indonesian word that means special commandos. They are on a special mission, and their mission was to start implementing DevOps practices.
When we started, we put together objectives with measurable goals. This is what Stephen was talking about yesterday with story mapping: you have to put together your objectives, and the objectives should be quarterly. Typically what we do is put together a big plan and then start the execution. We said we will not put together a big plan.
Let's put down what we can achieve in the next quarter. We put together the objectives we wanted to achieve, including individual aspirations: a programmer aspires to learn more about sysadmin work, a sysadmin wants to become a programmer. Those aspirations were really good, because then you are actually creating a cross-functional team. Once you have that, you create your backlog. This is a snapshot of mission six for Kopassus. We said mission one, which is sprint one, mission two, and so on. So the missions were established as bi-weekly missions for the Kopassus team, and they started to sprint. They learned, they stumbled, they got up, and they started to run.
When we started, there were certain principles and practices that needed to be put in place. The bank had SVN and IBM RTC as its version control systems. We said that we would use a more modern way of version control, which is Git. Now, if you have been using SVN for a long time, you will have what I call the SVN mindset: you will start using Git as if it were SVN. So you have to unlearn a lot of SVN behavioral patterns to start using Git. That's where this manifesto became very important for us. We published this manifesto and required the developers to pledge to it, which means that if they are not following the manifesto, they have to pay the penalty for it.
Every team has CI/CD; it is very common. You create a CI/CD process, you have continuous integration and deployment, and so on. What is important is: when you set it up, are you religious about following these principles? That's part of the change of culture, and I'll talk about it in a minute. Let me quickly explain some of the practices that are very salient for us. You see this term, Cakra; I'll talk about Cakra as well. C is pronounced "ch" in Indonesian.
That's why it's pronounced "chakra", not "kakra" or "sakra".
One of the salient things that we held to very strongly was: you build once and you deploy multiple times. This is very common, but actually practicing it is very important as well. We build once, which means you have a build cycle and it gets built only once, in the SIT environment; beyond that it gets promoted across multiple environments. That applies to any software, whether it is a mobile app, an API-driven backend app, or a web app. It doesn't matter; the principle is the same. You build once and you continue to promote across the various environments.
The second thing, which became a little challenging, was how we manage the quality of our software. Most of the quality tools out there are cloud-based, so it was a bit of a challenge for us to use SonarQube and configure it, because there are various ways to do that. Everyone knows about SonarQube, but how to use it efficiently? I don't think I have seen that work very well. We are able to get some good quality attributes of the software, but we are still not there.
What this process helped us do was go to production at any point we want, within 10 minutes from the developer machine to production.
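The "build once, deploy multiple times" principle can be sketched roughly as follows. This is only an illustration, not the bank's actual pipeline: the environment order, the `Artifact` and `Pipeline` classes, and the digest field are all hypothetical names chosen for the example. The point it shows is that promotion copies the already-built artifact forward and never rebuilds it.

```python
from dataclasses import dataclass, field

# Hypothetical promotion order; the talk mentions SIT, UAT, staging and production.
ENVIRONMENTS = ["sit", "uat", "staging", "production"]


@dataclass(frozen=True)
class Artifact:
    name: str
    version: str
    digest: str  # content hash recorded once, at build time


@dataclass
class Pipeline:
    deployed: dict = field(default_factory=dict)  # environment -> Artifact

    def build_and_deploy(self, artifact: Artifact) -> None:
        """The only place an artifact enters the pipeline: the first environment."""
        self.deployed[ENVIRONMENTS[0]] = artifact

    def promote(self, target: str) -> Artifact:
        """Copy the artifact from the previous environment; never rebuild."""
        index = ENVIRONMENTS.index(target)
        if index == 0:
            raise ValueError("the first environment is filled by a build, not a promotion")
        source = ENVIRONMENTS[index - 1]
        if source not in self.deployed:
            raise RuntimeError(f"nothing deployed in {source} to promote")
        self.deployed[target] = self.deployed[source]
        return self.deployed[target]
```

Because `promote` only ever copies what is already in the previous environment, the digest that reaches production is by construction the one that was tested earlier, and skipping an environment fails loudly.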
We are able to ship it, and here OpenShift helped us a lot. I'm not going to talk about the benefits of OpenShift, but it is easier to have a cloud-native solution where you don't need to worry about the skill set that is needed to run a Kubernetes setup.
Talking about Cakra: since we started implementing microservices in the bank, it became very challenging to know what version of a microservice is deployed to what environment. At the same time, microservices get reused by various business units, which very soon started happening to us: an identity management system developed by one team gets reused across the bank. When that happens, there is a version maintained by the team, and once it gets reused across teams, how do I know which team is using which version? All of this information is spread across various systems.
I must say this was the first culture change that we started to see in the bank. No one told the team that they needed to know this. The Kopassus team felt that there was a need to know what version is deployed where. The source of truth was in GitLab. The information about the version of the current build was in Nexus or the Docker registry. The information about the build pipeline
for CD was with Jenkins, because Jenkins was the orchestrator for us, and the actual truth of what version is deployed in production was in OpenShift. This means you have to collect all this information from various systems by calling their APIs, and then you have a dashboard that says this particular application, which is a project, has 16 microservices at these versions. That's where Cakra solved the problem. The culture was there, people started to feel it. This is a homegrown product that we built for ourselves.
The second pattern we saw came from using OpenShift extensively in the bank. The adoption was very good. People loved it, because it decoupled operations from the sysadmins and brought it closer to the developers, because they were writing their Dockerfiles. However, when we set up OpenShift for the first time, it was a manual installation following the manual that we got from Red Hat. This means that to scale it you have to keep adding worker nodes, and adding a worker node on demand with manual steps is very difficult. It was not very complex, but it was critical that we do the setup once again and stabilize it. What the team did here is another culture change. There was no requirement to do it, but they felt it was needed because a lot of time was being wasted creating a node, adding it to the cluster, and then running it. So they felt that we should automate it. So they learned Ansible. No one in the bank knew Ansible.
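The kind of aggregation Cakra is described as doing, pulling version information out of several systems into one view, might look roughly like the sketch below. It is not Cakra's code. The three input dictionaries are stand-ins for whatever would be fetched from the source repository, the artifact registry, and the cluster; the function names and data shapes are assumptions made for the example, and no real GitLab, Nexus, Jenkins, or OpenShift API is called.

```python
def version_report(source_tags, built_images, deployed):
    """Merge per-system version data into one row per service.

    source_tags:  service -> latest tagged version in source control
    built_images: service -> latest version in the artifact registry
    deployed:     environment -> {service -> version currently running}
    """
    services = set(source_tags) | set(built_images)
    for running in deployed.values():
        services |= set(running)

    report = {}
    for service in sorted(services):
        report[service] = {
            "source": source_tags.get(service),
            "built": built_images.get(service),
            "deployed": {
                env: running[service]
                for env, running in deployed.items()
                if service in running
            },
        }
    return report


def out_of_date(report, environment):
    """Services whose version in `environment` differs from the latest build."""
    return sorted(
        service
        for service, row in report.items()
        if row["deployed"].get(environment) != row["built"]
    )
```

With a report like this, the question "which team is using which version" becomes a lookup per service and environment rather than a search across four tools.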
So they learned Ansible, they wrote the playbooks, and now in 30 minutes they are able to set up an OpenShift cluster across multiple data centers, including the network file storage; it is all "Ansible-ized". That culture change brought a big shift towards people adopting automation. This becomes crucial, because then you start to see the pattern of "I want to write more automation for the tests, because only then can we be fast." So we started creating this culture: you have to have unit tests, you have to have functional tests, and you need to test your APIs, because with microservices the only thing you have is APIs. You need to test the customer journey across those APIs. You need to have performance and security tests as a feature, not something that happens towards release time, when you have to do VA and pen tests and everything. We started to make sure that these NFRs, the non-functional requirements, became a feature of the stories, or part of the definition of done, to make sure we are not losing the rhythm.
We also started to have weekly visual performance dashboards, which are critical from a management perspective, because for them the question is: how do I know that it is working? These visual dashboards clearly say, "Hey, the code quality CGPA is dropping, or going up, from the previous sprint", or "The API test coverage has dropped, so buckle up, you have to take care of that."
The other aspect that becomes important is monitoring. Andrew was talking this morning about how we should monitor a system.
I have personally not seen a single tool that is efficient enough to help you monitor everything, especially with microservices. And to add the icing on the cake, when microservices are not orchestrated but choreographed, which means they're completely asynchronous, it becomes very painful to understand what is happening and map it to the customer journey. For a customer journey to work efficiently, you need to know what the microservices are doing, what is in which message queue, where the topics are, and whether they are actually functioning the way they are supposed to. So it's extremely important that you start building those constructs, because the person who knows the most about this is the developer. Your ops guys are not going to know, because they have no freaking clue where it broke and which queue has the message that a service needs to pick up.
So we started creating these monitoring and performance dashboards, including operational dashboards. Because it's a bank, you will definitely not have full flexibility to run the software or access production. There will be some operations people involved in running the software. So operational dashboards became the responsibility of the devs. They need to create these dashboards, and the services need to report how they are working: the health check.
The final thing is keeping track of the DevOps transition. This becomes extremely important. What I have seen in my experience is that a lot of organizations cherry-pick: they pick the practices that work for them and leave the stuff that is very difficult to implement. So it is crucial that we have a snapshot showing the practices that are required to implement DevOps in the organization and where we stand. This is a snapshot of where we stand. We have most of it, but we are lagging behind on a lot of stuff.
So this is a journey that we are taking. We are still far away from saying that we are a DevOps organization, but the culture is changing, the mindset is changing, and people are starting to adopt new ways of working and new tools. In banks people are always scared: if I take this open source tool, I will implement it, but where is the enterprise license for it? Who is the wringable neck if something breaks? I need to go and grab someone, and that person has to come and fix it. That's a culture I have seen in large enterprises, including banks, and I see people who work in banks nodding their heads because they have experienced it. For any product or solution the enterprise uses, there is always someone holding the enterprise license, because if something breaks I need to catch someone. So we are on that journey.
I would like to end the session with this beautiful statement. The most important part is the last sentence: DevOps isn't something you can buy; it's something you have to do. And the most important part is that you have to do it yourself. Typically, in enterprises you will find someone who will come and implement DevOps for you, but that does not work, because then the culture will not be there. You have to build that culture, and building that culture requires that you do the DevOps yourself.
This is the last slide. When I was having coffee in the morning, I read this on Twitter. This is a beautiful statement: a computer shall not harm your work or, through inaction, allow your work to come to harm. So it's all about people. It's all about mindset. That's all I had. Thank you very much. And yes, BTPN is hiring; if you are interested, we can chat offline. Thank you very much.
Please, again, go to Pigeonhole for other topics and vote. Does anyone actually share the live stream with your colleagues? One, two. Please share it with your colleagues.
I think it's very good that you are present here and you get all the knowledge, but then when you go back... Interesting talk; we saw it coming up over the years. Okay, it's working.
Okay, so let's start. Hello everyone. My name is Marcin Wielgus. Seven years ago I joined Google, and for more than two and a half years I have been working on the Kubernetes project.
Google is an engineering-driven company, or at least tries hard to be one. And engineers, including me, are often lazy and hate repetitive, boring tasks like manual maintenance of their applications. Moreover, with large-scale applications, manual tasks are often impossible to do, like managing thousands of services that handle search queries or show those stupid cat videos on YouTube. It would require an army of DevOps people, and still the results would be uncertain. So over the years Google built an internal system called Borg that manages containerized applications, so that engineers can slack off and enjoy the legendary free food instead of worrying about deployments, at least in theory. Based on the experience with Borg, about three years ago a small group of Google, Red Hat, and CoreOS engineers started a project to bring Google-style container management infrastructure to the outside world.
But what exactly are these software containers? How many of you have used Docker? Please raise your hand. Okay, almost all, but a few of you didn't raise your hand, so for those who didn't, here is a super quick explanation. Probably everyone here is somehow familiar with the concept of a virtual machine. Simplifying a lot, a virtual machine consists of an operating system, an application, and a bunch of libraries and dependencies.
It runs under some other operating system. A VM provides isolation: it prevents the application running in one VM from interfering with either the other apps or the hosting system, at least in theory. It is usually quite easy to take a VM from one machine and run it elsewhere, so in principle VMs are super useful for building and releasing software. The ability to move VMs from one place to another often results in having a couple of them sitting on a single machine. Quite often the hosting machine runs exactly the same operating system as you have in the VMs. So there is Linux inside, Linux outside, repeated multiple times, and the resources are wasted. If we cut this extra OS out of the VMs while maintaining a decent level of isolation and high mobility, we get a software container.
Software containers are meant to be lightweight and easy to use, and this lightness invites engineers to have many of them, especially when developing microservices. Many containers require many machines. With more than a handful of machines of different sizes and shapes, one quickly loses any desire to manage the whole thing manually, and for that some system is needed, like Kubernetes.
Kubernetes is a system for automating deployment, scaling, and management of containerized applications. It is 100% open source, and when we say open source we really mean it: with the release of 1.0 we donated the whole source code of Kubernetes to the newly created Cloud Native Computing Foundation, so Google doesn't own the system and cannot close it. It is a joint effort between all of the people involved. It runs on cloud as well as bare metal.
It supports Docker, rkt from CoreOS, and other container runtimes.
Okay, let's take a look at Kubernetes in detail. A pod is the main building block of the Kubernetes environment. A pod is a set of containers that should run together on the same machine. The containers have the same IP address and can easily communicate with each other via localhost. They share volumes, so the data produced by one container can easily be consumed by another.
One of the key parts of reducing the maintenance cost of an application is a detailed application specification. The pod API is meant to allow the application developer to express all aspects of the pod lifecycle in a declarative way that is understandable both to humans and to the container management system. At a high level, it is divided into two parts. The first part describes what containers should run within the pod. It defines the image to execute and the CPU and memory requirements for each of the containers, as well as the list of volumes that should be mounted there. The second part covers things like node preferences, information about friends and foes, and how important the pod is. A large part of this information is used to automatically determine which machine in the cluster should run the pod.
Kubernetes stores information about the capacity and capabilities of each machine and the list of pods that run on each of them. As I said a minute ago, containers declare how many resources they need in terms of CPU, memory, and maybe GPUs, and the sum of all container requests is calculated to get the total needs of a pod. For example, on this diagram the small pod on the left-hand side has small requirements; it can fit on the second or the third node. However, a bigger pod has fewer nodes to choose from. This one can only go to the third node. The scheduler won't put it on the second node, and if the user tries to somehow manually place the pod there,
It will be rejected by the node management agent, called kubelet.
Some pods have more sophisticated requirements. They may want to run on a specific type of node, for example nodes with SSDs instead of regular hard drives, nodes attached to a better network, or nodes that are pre-allocated for them. A Kubernetes user may assign labels to nodes and later consume them in pods via a node selector. On this picture we have a node which is labeled as green and a pod that wants to run on a green node, so the green pod will go only to a node that is marked green. And if you don't specify any label selector, a pod can go to a green, a yellow, or some other node.
Some nodes can be really special in terms of capabilities or maybe price. Nodes with GPUs on AWS or Google Cloud Platform are a couple of times more expensive than nodes without GPUs. So as a cluster administrator you may want to put a kind of gatekeeper in front of them, so that only pods that are really meant to run on these nodes are scheduled there. Appropriately configured taints make nodes unschedulable for pods that don't have an explicit toleration for these taints.
Some pods like each other so much that they should be placed on the same machine if possible. We call it pod affinity. For example, a web serving application might benefit from being close to a distributed cache instance. Sometimes the situation is the opposite and pods shouldn't be placed together. It might not be the best idea to place two super IO-heavy applications on a single node, or to have a couple of instances of the same super mission-critical server running on a single node.
What if all of the nodes are already occupied and you want to start a new instance of the super hyper mission-critical application? Well, there are priorities. With the information about how important each pod is, the scheduler can find a node running low-priority pods, remove them, and make space for this superstar pod.
As you can see, the pod specification gives quite a lot of
flexibility. It allows engineers to express the needs of the application in terms they understand: CPU, memory, friends, foes, node preferences. They don't need to do extensive planning or manually find machines for the new shiny microservice. It will be done for them automatically. They just send the pod definition to the system, and Kubernetes will figure out what to do and what actions need to be taken in order to run this pod on a node that matches the pod's needs.
Okay, but what if the cluster is simply full? In Kubernetes there is an add-on component that we call Cluster Autoscaler. Cluster Autoscaler monitors the pods that failed to schedule, and once it finds such a pod it checks whether adding a new node, similar to the nodes that are already present in the cluster, would help. It simulates the scheduler, and if the scheduler would place the pod on the node, should it be there, it calls the cloud provider API and simply buys you the mentioned node. Minutes later, when the node is up and running, the pod is placed and started on it.
On the other hand, if there are too many resources in the cluster and moving some stuff around would free them, Cluster Autoscaler takes the appropriate actions to restart the pods on some other nodes, so that the unneeded capacity is deleted and the monthly bills are lowered. On the picture you can see that moving a pod from the third node to the second node would free the third node, and it would be okay to remove it. The node is removed and now the cluster is smaller. It has all of the pods that you had before, but now you pay for two nodes instead of three.
If you run on one of the popular clouds, Cluster Autoscaler allows you to worry less about infrastructure and capacity planning. If Cluster Autoscaler finds that your cluster is too small to run all of the pods, it will scale it up, and if you have too many resources, it will try to scale it down automatically.
So there is a way to put pods exactly where they want to be.
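The resource-fit check and the autoscaler decisions described above can be sketched in a few lines. This is only an illustrative model, not the real scheduler or Cluster Autoscaler: the `Pod` and `Node` classes, the first-fit placement, and the function names are all assumptions made for the example, and labels, taints, affinity, and priorities are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pod:
    name: str
    cpu: float     # total CPU cores requested by the pod's containers
    memory: float  # total memory (GiB) requested by the pod's containers


@dataclass
class Node:
    name: str
    cpu: float     # CPU cores the node can hand out
    memory: float  # memory (GiB) the node can hand out
    pods: List[Pod] = field(default_factory=list)

    def fits(self, pod: Pod) -> bool:
        """True if the pod's requests fit into what is still free here."""
        free_cpu = self.cpu - sum(p.cpu for p in self.pods)
        free_memory = self.memory - sum(p.memory for p in self.pods)
        return pod.cpu <= free_cpu and pod.memory <= free_memory


def schedule(pod: Pod, nodes: List[Node]) -> Optional[Node]:
    """Bind the pod to the first node it fits on; None means unschedulable."""
    for node in nodes:
        if node.fits(pod):
            node.pods.append(pod)
            return node
    return None


def scale_up_would_help(pod: Pod, template: Node) -> bool:
    """Simulate an empty node shaped like an existing one (scale-up check)."""
    return Node("simulated", template.cpu, template.memory).fits(pod)


def can_remove(node: Node, nodes: List[Node]) -> bool:
    """True if every pod on `node` could be restarted on the other nodes."""
    # Work on copies so the simulation does not change the real cluster state.
    others = [Node(n.name, n.cpu, n.memory, list(n.pods))
              for n in nodes if n is not node]
    return all(schedule(pod, others) is not None for pod in node.pods)
```

A pod for which `schedule` returns `None` corresponds to a pod that failed to schedule; `scale_up_would_help` mirrors the "would a new, similar node help?" simulation, and `can_remove` mirrors the scale-down case where moving pods frees a whole node.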
There is an infrastructure that automatically adjusts to the pods' needs. What about the application? How do you manage multiple instances of pods? Well, for applications there are concepts called replica set and deployment, which I will describe in a moment.
A replica set makes sure that you have exactly N instances of your pod up and running, like three nginx servers. It creates the pods based on the user-provided template and allows the scheduler to place them in the best locations. Just like here: the user asks for three pods, and the replica set created these three almost identical pods. The scheduler put them on the appropriate nodes and everyone is happy.
But sometimes something bad may happen to a pod. For example, the machine it was running on crashes. In such a case the replica set recreates the pod, and the scheduler is likely to put this pod on some other node, like here. Hopefully there is still some free capacity in the cluster. But if there is not, Cluster Autoscaler, if enabled of course, comes into play and makes sure that the pod gets a place to live, or the priority and preemption mechanism comes into play, some of the low-priority pods are killed, and the space is created.
Deployment, which is built on top of replica set, gives you other nice features: rolling update and revision control. Usually you don't want to update all of your replicas at once, but slowly, one by one. So you want to wait until the first updated replica boots up, and when it's up and running you start the next update and proceed to the second and then to the third.
Okay, so we have our N pods up and running. But what is the right value for N?
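The replica set and rolling update behaviour described above can be modelled as a small reconcile loop. This is a simplified sketch under stated assumptions: the `ReplicaSet` class, its pod naming scheme, and the `rolling_update` helper are hypothetical and only imitate the idea of keeping N replicas alive and updating them one at a time.

```python
from typing import Iterator, List


class ReplicaSet:
    """Keeps exactly `replicas` pods alive (pods are just names here)."""

    def __init__(self, name: str, replicas: int) -> None:
        self.name = name
        self.replicas = replicas
        self.pods: List[str] = []
        self._next_id = 0

    def reconcile(self) -> List[str]:
        """Create missing pods, drop surplus ones, return the new pod names."""
        created = []
        while len(self.pods) < self.replicas:
            pod = f"{self.name}-{self._next_id}"
            self._next_id += 1
            self.pods.append(pod)
            created.append(pod)
        del self.pods[self.replicas:]  # remove surplus pods, if any
        return created


def rolling_update(versions: List[str], new_version: str) -> Iterator[List[str]]:
    """Yield the replica versions after each single-replica update."""
    current = list(versions)
    for i in range(len(current)):
        # A real rollout would wait here until the updated replica is ready.
        current[i] = new_version
        yield list(current)
```

Calling `reconcile()` again after a pod disappears from `pods` recreates it, which is the crash-recovery case from the talk; `rolling_update(["v1", "v1", "v1"], "v2")` walks through the intermediate states of a one-by-one update.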
In the previous example we had three replicas. But why three? Because some app developer said that three would be okay? Come on, application developers cannot be trusted. They probably just handpicked the number and claimed they did some performance tests. And even if they did these performance tests, the real-world traffic might be completely different. It might be bigger, smaller, or have completely different characteristics. So there is a need for a mechanism that will automatically adjust the number of replicas to the load on the application, and for that purpose there is Horizontal Pod Autoscaler in Kubernetes.
For example, here we have four replicas that are burning their CPUs. They run at about 90 to 95 percent of the requested CPU capacity, and if any spike comes, this whole thing will simply collapse and crash. So there is a clear need for yet another replica, and when this new replica is added, the traffic on each of them will go down on average. On the other hand, if our deployment is having a slower time and the load is small, then maybe removing one of the replicas wouldn't be a bad idea. The load on the other replicas will go slightly up, but your cluster will have more free resources that can either be used by other applications or be collected by Cluster Autoscaler, so you don't pay for resources that you don't need. And obviously Cluster Autoscaler or the priority and preemption mechanism also helps if a sudden spike in traffic drives the number of pods beyond your cluster capacity.
In general, you probably want to establish a target load for your deployment. If the average load goes above the target, new instances are added, and if it goes below the target, instances are removed. In our example we use CPU load measured on the system side as the metric for scaling. However, the same operational model can be used with other metrics, like for example the number of requests per second exported by the application itself.
With Horizontal Pod Autoscaler
you get more peace of mind. If for some reason the traffic to your application goes up, the system will react accordingly. Engineers will still need to run their performance tests. HPA, Horizontal Pod Autoscaler, is not a silver bullet against all performance problems and bottlenecks, but it may save your evening when your continuous integration slash continuous delivery pipeline pushes a version that has, let's say, a 25 percent performance degradation and so needs more replicas.
Moments ago I claimed that developers usually have absolutely no idea how many replicas they need. What about the resources? Can they really estimate how many resources they need? Well, the answer is no. For that reason, some time ago we started a sub-project that will automatically adjust pod resource requests based on current and historical needs. Vertical Pod Autoscaler hasn't been launched yet, but hopefully sometime in Q1 next year we'll have a component that solves the resource estimation problem.
Okay, so we have enough pods, in the right version, running on the appropriate machines. What else needs to be done? Well, the application is not guaranteed to always be in good shape. Due to programmer errors it might hang in some corner cases. In such a situation a short-term solution would be simply restarting the application, hoping that it will run for some time before it crashes again. As a DevOps or on-call person, you don't want to be called in the middle of the night just to restart your pod. You want a system to do it automatically for you. And for that specific reason we have probes that can be defined on pods. There are two types of probes in Kubernetes. The first one is the liveness probe. It checks if the container is working properly, and if this probe fails, the container that failed it is restarted. The other probe is the readiness probe.
It checks if the containers are ready to serve traffic, and if the pod hasn't been fully initialized yet, traffic is not sent to it. With these two mechanisms you can have some type of guarantee that your deployment is healthy and your requests are appropriately handled. With probes you can have fewer on-call alerts, so it's a good habit to define them for all pods that you run. You may even go to the extreme and require them for all user-facing pods.
Kubernetes allows you to expand the admission checklist that is validated whenever someone issues a new request to Kubernetes, so that you can plug in your own stuff there. For example, something that will do this probe checking and will reject pods that run in production and don't have probes defined. Or you may reject pods whose images come from an unknown source. Or you may execute some custom initialization when you, for example, create a service. The possibilities are huge, and they allow you to proactively avoid problems.
In a similar fashion you can get storage for your application. Kubernetes may automatically fulfill claims for persistent storage and automatically attach the newly created storage to your application. In some environments, like Google Container Engine, you can also specify auto-upgrade and auto-repair policies for your nodes, so that you always have the newest stable version of Kubernetes running on your nodes or master. And if, for example, the Docker daemon breaks permanently on a node, this node will be automatically recreated.
All the capabilities I described during the last 30 minutes, more or less, allow more seamless deployment, more robust infrastructure, fewer alerts, and fewer production issues. It moves the responsibility for keeping the application up and running from system administrators towards developers, by simplifying stuff, by allowing them to specify exactly what they want to achieve with the application, and by building trust in automation that can handle the regular
operations for them. Restarts, resizes, and simple adjustments don't need humans. Humans are better utilized when building the software itself or defining policies for how to run the software. And I see this approach as the future of DevOps: more dev, less ops. Thank you. If you want to learn more about Kubernetes, please visit kubernetes.io, or catch me during lunch or one of these open sessions.
Okay. We've moved the tables, so we actually will have two rows. You can go from the left side or the right side, so hopefully this lunch will be much faster. We will again do the Pigeonhole. Again, for those who haven't used Pigeonhole before: go to this link, like a question here, and a poll. That's it. We have a one-hour break. I'll see you at 1 p.m. Thank you.
Good afternoon everyone. How is the conference so far? How's the lunch? Good, good. So we're supposed to do the open space kickoff. I guess everyone is already familiar with how open space flows, and you all already confirmed that you are using Pigeonhole. So what I'm going to do is start reading the questions, and again, we'll do the same as we did yesterday: the top 12 will be chosen for open spaces. And by the way, I know not everyone got the t-shirt yesterday. The contractor brought the t-shirts, so please drop by our registration booth and pick up a t-shirt in your size.
Will all the participants get access to all the presentations shared? Yes.
What do you look for in a DevOps job? How can we attract talent to DevOps roles? Who's field? So what do you mean? That sounds interesting. I could be a really good helping about the space. I just thought that maybe it's a recruit. Yeah.
Continuous development and employment as a part of DevOps: are we going to cut down on the presentations like we are the fsd test cases? Who's shashin? So what do you mean?
We're talking more about the classification test cases. So I'm saying that you don't just cut down that. Let's start with the documentation we are doing. So the reason why I'm asking is just to make everyone else aware of who asked the question and, you know, just to start the conversation.
Any thoughts about getting rid of a manual change release management or cover process? Who is Mishra? Igam, Igam, are you here? What do you mean? What do you want out of an open space discussion?
From the manager perspective, how do you measure full spectrum engineer and animus? Is anyone interested in this discussion? Obviously many people voted. Can one of the persons who voted raise their hand and say why you think it's a good topic to discuss in an open space? If nobody raises their hand, we'll just skip it. Anyone? That's an important topic. I just wanted to make it here.
Flavio, who helps with the tenure. Who wants ice cream in the afternoon, and won't use the polls as the number of ice creams, so there will be a big fight. Unfortunately, we are not able to capture who voted, so there might be some intricacies and senties.
Again, the slides will be posted on our website. We do a live stream and we'll also record all the talks, so they will be made available. I don't want to tell you a date, but it will be soon. Two weeks, maybe more, just so that we capture it and make sure everything is in sync. Event notification: we usually don't want to spam people, but if you can, follow us on Twitter and we will publish it there. And again the website, that's for table stays all in a program. There can be a link added to the slides and the YouTube video. So just watch the website, but if you want an email, just give me your business card and I'll notify you.
Why are there so few women in DevOps? Actually, I think it's a great discussion. With all these things, we should be more inclusive.
We should be more inclusive, and we tried, as one of the organizers, to ask women to come and speak, because we see it in the attendees and in the speakers list that we have under-representation. So I think it's a bad thing for us, because we're losing out on so many opportunities to hear different ideas. So, definitely.
How to persuade management that IT engineering teams need to dedicate time, 20 to 30 percent, to clean up technical debt? We've gotten the outside years here. Who's constantly second? What do you think this is? To have question team. Well, we have six folks. Will anyone else raise their hand and say that this is a great topic to have a discussion on in this space? Can any of you just comment? Why do you think it's. Antonio. I believe that we should be doing a better job on that. I think we can have a true discussion on what the formulas are, and see if someone is actually achieving it here. Anyone else who raised their hand want to comment?
Why do we preserve complexity instead of introducing better, simpler solutions? Is simpler even possible? Who asked this question? Yeah, it's a variation of the one that I had yesterday. So do you want to continue that discussion? But if anyone is interested in it: what do we learn from the stuff that we do, in terms of how would the architecture of the next generation solution, that's easier, or can we just easier.
DevOps and Agile shift focus from individual superheroes to the team. Most enterprises still use stack ranking. Are there alternatives that work and line up with teams versus me? Hello, Chrissy.
I should hold on, Chrissy. And who's jacquoise? The best way to teach our users and reduce the resistance when encouraging them to move towards DevOps. Does anyone think this is a great topic for an open space? Anyone want to discuss it? Can I know why? Let's stop it here. Again, here's your chance to actually add topics. Voting on them is very easy: just grab a mobile phone, go to this URL, type a question, and vote. I think we're all here acknowledging that we have a problem in the industry, so the best way to learn and come up with a solution is by learning from others and sharing with others.
So without further ado, I would like to introduce Matt Ray, who is our final keynote speaker, to do the closing keynote. We still have the Ignites and we still have the karaoke, so there's still a lot of fun. Matt.
Yeah, so I think one of the reasons why we chose this topic is the theme of the year for DevOpsDays. It's actually everything as code. We started in the previous years on the infrastructure and monitoring, those parts of the organization where we wanted to automate, automation on the infrastructure, how you build a platform. However, we see more and more where you have compliance as code, and all the other lines of business, like recruitment or HR, begin to be codified. So we would also like to include that in the DevOps ecosystem, an anything-and-everything-as-code approach to it.
I'm going to be talking about compliance as code and the idea of being able to audit all your infrastructure and applications and everything above. So let's go ahead and get into it. All right. So my name is Matt. I'm the manager and solutions architect for APJ for Chef. That's basically a way of saying that I'm over here in APAC doing a whole lot of technical stuff. And I've been at Chef for about seven years.
So that's like forever in startup time, and I've done a lot of things over the years: professional services, consulting, engineering, community management. Long background in open source. That's how you get a hold of me. I also have a podcast I do with some friends, called Software Defined Talk. Andrew Shafer has been a guest on it a few times. If you'd like to hear me ramble, it's not professional.
So, starting off: who's got software-defined pipelines, you know, CI/CD, right? You've got some sort of CI/CD infrastructure. I used to do some development like that. I'll try to stay right here. And we had dev to QA to staging to production, fairly familiar kind of stuff. And my pipeline was working well. We're coding away, things are getting delivered in a continuous fashion, doing test-driven infrastructure. I've got unit tests for everything. It's moving along nicely.
And then one day, out of the blue, the auditors show up, right? Anybody work with auditors? They show up every quarter, every six months, if you're lucky maybe once a year, but their job is to see what's going on and to stop you from releasing code, right? And they start off with a big binder, or maybe PDFs, but gigabytes of PDFs. So they've got these rules and regulations, and they slap it down on your desk and say: for the next two weeks we're going to read through this thing and you are not going to release features. You're going to stop what you're doing, and we're going to make sure that things are set up right.
So you pop it open, and the first thing we see is an SSH control, and you start to read it. It says: SSH supports two different protocol versions. SSH v1 was broken a long time ago. It's not maintained. You should not be using that. Please use SSH v2. And you're like, all right.
I got this. How am I going to verify this? You're a sysadmin. You know a little bit of Bash, a little bit of Perl. So you're like, I'm going to whip up a one-liner. I could check to see what version of SSH we're doing, and use grep, because everybody loves grep, and a little bit of sed, because I'm an old-school Unix sysadmin, and I'm going to be able to see which version of the protocol I'm using. Cool. Check the exit code and we're good to go.
And then the next thing you know, you open it up a little further and you get to Apache server information leakage. This is essentially where you go to a website and you chop off the HTML page, and it drops you in a directory and it says Apache 1.6 on Solaris 8. And everybody knows you shouldn't be using that old version of Apache, you shouldn't be using that old version of Solaris, so you don't want to leak that information. That's what this is saying: don't let people know which version of Apache you're using and which OS. So you're like, all right, more grep and sed. I still got this. I can handle that. I'm going to look for the ServerTokens. That's how I know that I'm serving that stuff up. You've seen this before, right?
And then they're like, not so fast. I've got this thing called the Center for Internet Security benchmark for CentOS 6. If we look carefully, we'll see it has 172 pages. And you're like, wow, 172 pages, that is a lot of grep and sed, and I'm going to be beating my head into a wall for a while, at least for two weeks, right?
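The two hand-rolled checks described above (SSH protocol version and Apache `ServerTokens`) might look roughly like this when written out. This is a sketch, not the script from the talk: the function names are made up for illustration, the checks take the config file contents as strings, and they are deliberately strict, assuming the directive must be set explicitly and that every occurrence must have an acceptable value.

```python
from typing import List


def directive_values(config_text: str, directive: str) -> List[str]:
    """Collect the values of every uncommented occurrence of a directive."""
    values = []
    for line in config_text.splitlines():
        # Drop comments, then split into "<directive> <value>".
        parts = line.split("#", 1)[0].split(None, 1)
        if len(parts) == 2 and parts[0].lower() == directive.lower():
            values.append(parts[1].strip())
    return values


def ssh_protocol_ok(sshd_config: str) -> bool:
    """Pass only if Protocol is set and every occurrence is exactly 2."""
    values = directive_values(sshd_config, "Protocol")
    return bool(values) and all(v == "2" for v in values)


def apache_tokens_ok(httpd_conf: str) -> bool:
    """Pass only if ServerTokens is set to the least revealing value."""
    values = directive_values(httpd_conf, "ServerTokens")
    return bool(values) and all(v.lower() in ("prod", "productonly") for v in values)


def exit_code(sshd_config: str, httpd_conf: str) -> int:
    """Shell-style result: 0 when both checks pass, 1 otherwise."""
    return 0 if ssh_protocol_ok(sshd_config) and apache_tokens_ok(httpd_conf) else 1
```

Even in this small form, each new rule needs its own parsing and pass/fail logic, which is the maintenance burden the talk goes on to describe for a 172-page benchmark.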
But you'll make it through, because the auditors are like, hey, you know what, if you can pass this audit we can come back in three months. And you're like, all right, whatever we need to do to get a passing grade of a C-minus. We'll get the auditors out and we can go back to having fun and playing golf, if that's what you're into. So you finally get through it. You get the auditors off your back, the machines are passing the audits, and you can get back to delivering code, right? You can get back to delivering features.
But the security officer said, you know, that was kind of painful. Let's introduce a security review step to releasing. Anybody have a security review team? We were talking about ITIL, where you've got to have the checkoffs and the signoffs and the release, and everyone's got to approve things. It's just a big dam, because you're like, hey, we're doing continuous integration, continuous delivery, we would like to release 10 times a day, and the guy's like, all right, that'll be two weeks turnaround. And, wait, 10 times a day? Well, 10 times in two weeks, right? So it kind of starts to back up. You introduce this roadblock. There's this dam, and all your features are held back by Hoover Dam. And so compliance becomes this wall. They become this line that says you have to go through us.
And so you're like, all right, we'll slow it down. Or maybe you're thinking, you know what, we'll just release it and we'll scan things in production after they're out the door. That way I still get to do my continuous integration, my continuous delivery. But my code's been released into the wild, and I turned telnet back on, and there's all sorts of craziness going on over there, but we're scanning.
We're going to find it eventually. Hopefully fairly fast. It's not really the greatest of patterns, right? And the problem with compliance and security is that it's not really something you can bolt on after the fact. You have to build it into how you operate. You have to make it part of the process. You can't say, every three months we're going to think about compliance, every three months we'll make the auditors happy, get our systems secure, and then they become insecure again.
And we've seen this. Verizon put out a state of PCI compliance report. I need to get the 2016 version, but I bet it hasn't changed. You don't actually have to read it. Right there on the cover it says two-thirds of organizations did not adequately test the security of all in-scope systems. So people aren't even checking the machines they have today. Does that sound familiar? Nobody wants to raise their hand. You don't have to. And what we've seen is that these trends are continuing. You're getting more systems, you're moving into more and more diverse application sets, and sustainability is low. Keeping up with the amount of change is just not happening.
And the good news is, we now live in a transparent society, because all your information is online. You just don't control it, but you don't have to pay for credit checks, so that's good, right? So yeah, security theater is a thing. You go through the motions, you get your compliance officers happy, you pass the audits, you move on, even though it's a tire fire in the background. There's just this burning mess of things that aren't getting better. But the auditors aren't incented to stop you forever, because if they piss you off too much, they're not going to be invited back. You'll find an auditor who's more compliant. Right, so you've gone through this process.
You've got reports, you've got PDFs. You've generated all these sign-off signatures. You have a whole bunch of paperwork that says you are compliant. What does that mean? It means you take this big binder and you put it on a shelf, and it sits there for the next three months until you get back to it. And so what you're left with is a whole bunch of shell scripts that are a little bit hard to maintain. I mean, I know grep and sed, but maybe not everyone does, and keeping up with something like the CIS benchmarks for the various operating systems is going to be a little bit troublesome.
And you've got your engineers, who are using tools like Chef or Puppet or Ansible. They're doing infrastructure as code. They've got their own set of tools. They're not using grep and sed anymore. They've moved on to higher-level abstractions like Chef. So we've got this communications problem where everyone has their language of choice. The compliance guys use Excel. There's a mismatch in how they work together.
And one of the things we know, and we had the infrastructure as code conversation yesterday, is that humans should not be logging into machines. You want automation. You don't want humans talking to boxes. You want scripts doing it. You want Chef recipes, Ansible playbooks, whatever. You want to have an interface that goes through code. That's how you want to talk to your servers, because people don't scale. And one of the things we talk about at Chef is the idea that tools and culture are reinforcing. You can have the best DevOps team in the world using Bash and CVS, right?
But most people have moved on to better tools. They're using Git and VSTS and Jenkins. They've moved on to higher-level tools, because the tools reinforce the patterns and the patterns reinforce the tools. You want the tools designed for the way you want to work. And the way we want to work is through code. We want to have a common interface to talk to our infrastructure. We've already got that for automation of our servers and our infrastructure. But wouldn't it be nice if our compliance, DevOps, and security folks were all on the same page? So that's what we're going to talk about now.
So InSpec is an open source project from Chef. It's Apache licensed. I'll talk about the open-sourceness of it in a bit, but let's dive into it. It is a compliance language. It's based on the spec style of testing. If you're familiar with unit testing, spec is in general a style with a very human-readable format. So over here, down at the bottom, we're describing the SSH configuration we want to see, and we want the protocol: it should equal two. Your auditors could probably understand this. They're not sysadmins. They don't know Perl and grep and sed and stuff like that. But if you tell them protocol should equal two, they can read that. And if you have a junior sysadmin, on day one they understand that. But it also has some things here that your auditors are looking for: the impact. Is this critical? If you're running SSH v1, you might as well be running telnet, right?
So this is a critical thing. You also have a title, a description, and tagging, so you can generate reports from this stuff. Your auditors will get a nice big report that says, hey, SSH protocol equals two, the SSH control one-two-three-four passes, and if they want to drill down they can see what that's about. So this is what it looks like.
It's one language. It works on Linux, and it works on Windows. Everyone's got some Windows, and we're always worried about the state of our security and compliance there. So this is what a Windows control looks like: a registry key. This is strong Windows NTLMv2 authentication. This is how your Windows machines join an AD domain, and what you're saying is we want the new authentication enabled, so we can't have those Windows XP machines joining our AD. So we look for a registry key. If you're a Windows admin, you've seen this before. The key should exist, very human readable, and the value should equal four. So that's high-level, readable, understandable. It also does macOS, Solaris, AIX, HP-UX, the BSDs.
There's a lot of stuff built into InSpec. InSpec has pretty much anything an operator is going to care about in their infrastructure: how your Windows registry is set up, what services are installed, what packages are installed, what users and directories are set up, what the permissions are, what hotfixes are there. The things that your auditors are looking for. They've got this big checklist of, oh, is this installed, is this patch there, is that patch there? We're bringing that into code. This isn't intrusion detection.
It's not a firewall, it's not antivirus, it's not a pentesting tool, but security folks like this, because it allows them to just quickly blast it onto a bunch of machines to see what's going on in the infrastructure. It's just another tool in your toolkit. It does bare metal. It does VMs. It works with containers, it works with Docker. So that's all in there.
It also works with databases. We have the ability to connect to a database and run queries against it. So we can connect to your Oracle database or your MySQL database and run a query that says something like, did we ship the default user with MySQL? Because that's something that happens sometimes. And so you can add these sorts of controls around your databases. That's fairly handy and very human readable.
It can also talk to APIs, and this is where things start to get kind of interesting. There's an HTTP resource, so I can just make a query to a web server or an API and parse out the results that come back from it. But I could also talk to clouds. I could talk to AWS, I could talk to Azure, I could talk to the VMware vSphere API and run queries against that, and see things like: do our security groups have the proper inbound rules? Are the AMIs that we're using on AWS coming off an approved whitelist? Do you know which AMIs you're using? Do your users have multi-factor authentication enabled? So we've moved beyond just traditional Windows and Linux and Unix, and actually moved up to the next tier: let's talk to the infrastructure layer itself.
InSpec works remotely. InSpec connects to machines via SSH, so for your Linux and Unix machines we're going to be able to connect to them and run a script remotely against them. So you don't need an agent like Chef on those things.
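The shape of a control as described in the talk (an identifier, a title, an impact, and a human-readable expectation such as "protocol should equal two") can be modelled in a few lines. To be clear, this is a Python sketch of the idea and not InSpec syntax: the `Control` class, the `run_profile` function, the fact keys, the control ids, and the impact values are all hypothetical, and the 0.0 to 1.0 impact scale is an assumption of this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Control:
    control_id: str
    title: str
    impact: float  # assumed scale: 0.0 (informational) to 1.0 (critical)
    check: Callable[[Dict[str, str]], bool]


def run_profile(controls: List[Control], facts: Dict[str, str]) -> List[Dict[str, object]]:
    """Evaluate every control against collected facts and build a report."""
    report = []
    for control in controls:
        report.append({
            "id": control.control_id,
            "title": control.title,
            "impact": control.impact,
            "status": "passed" if control.check(facts) else "failed",
        })
    return report


# Hypothetical profile: ids, fact keys, and impact values are illustrative only.
PROFILE = [
    Control("ssh-1234", "SSH protocol should equal 2", 1.0,
            lambda facts: facts.get("sshd.Protocol") == "2"),
    Control("win-ntlm", "NTLM registry value should equal 4", 1.0,
            lambda facts: facts.get("registry.ntlm_level") == "4"),
]
```

Because each control carries its own title and impact, the same list can drive both the check and the auditor-facing report, and a group of such controls is what the talk later calls a profile.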
It's not related to Chef at all. It actually connects to them and allows you to scan machines and see what's going on. It also works with WinRM, so you can connect to any Windows machine that has WinRM enabled. Right now it looks like 2008 is about where that cuts off. If you can get InSpec installed on 2003 it does work, I can attest to that, but we mostly stick with the supported operating systems. So you can connect to your Windows boxes. It'll also connect to a Docker host and check the state of containers running on it for compliance, right? So that's all pretty handy, and that's all completely agentless. You can also run it locally and just check the state of the machine that you're on. And it works with cloud platforms, and a lot more stuff is in progress. We've recently started working on networking devices, and networking devices will give way to storage, right? And so then you can start to say, what's the state of our Cisco routers? What are our firewall rules? What do those look like inside the data center? And so it allows you to do things like lock down your PCI compliance in the data center and start thinking, well, if I want to move to Azure, I can take those same PCI rules that I have in my data center, run them against my Azure infrastructure, and check the state of my Azure compliance when we get there. So that's pretty handy. So InSpec is completely Apache-licensed open source, right?
This means that you can take it, embed it in your commercial products, do whatever you want with it. inspec.io is the website that it all starts at. There's the GitHub repo. It's up on Supermarket, which is our Chef community site where people are sharing compliance profiles. So you have your controls, you package them in a profile, and the profile is the set of controls that represent how you're testing something. Up on Supermarket we've got profiles. There's an open source group called dev-sec.io that does hardening profiles: what's the state of my Windows patching? I want to harden my Ubuntu Linux, what does that look like? And so they've got a bunch of compliance profiles up there. And then learn.chef.io: there are completely free tutorials that let you test your compliance with InSpec and learn how InSpec works, on Linux, on Windows, on your laptop with Vagrant, on Azure, on VMware, on AWS, wherever you like to run it. And of course we have a Slack channel. I don't have an exact number, it has something like 600 people in it. So a very big, thriving open source community. Just this year, talking about being open source, over 100, you know, over 116 pull requests from non-Chef employees. And that's one of the signals of a healthy open source project, right? The patches don't all come from one company. And so we've seen a whole bunch of new resources added. We've seen contributions from the likes of Oracle, Microsoft, HPE, Accenture, Deloitte, PwC. You're starting to see this get picked up by the auditors, right? Because this is the grunt work that they don't want to do, and it allows you to get to the higher-level conversations of, let's start talking about remediation, and how we get the auditors to provide extra value. And so this comes back to a very simple process, right?
The first step is detect: you need to find out what's wrong. Earlier we talked about the State of DevOps Report. One of the things that we saw is that 55 percent of organizations have no idea what the state of their compliance is. Either they're little startups who don't care, or maybe they're not touching money; that's a Venn diagram of overlapping people sometimes. People who don't touch money don't have auditors, and that's kind of where that starts. So most people have some sort of audit requirement that they're not working on, and so the first step is really just simple. It's detect: let's find out what's wrong with your infrastructure. And the second step is correct. Maybe you put Chef on those boxes, maybe you're using Ansible, maybe you're old school and you're still managing machines by hand, but you need to correct the things that get flagged as needing to be remediated. And then for the things that you fix, rather than continuing to fix them by hand, let's automate that. Let's move the process forward and work towards continuous compliance. It's a very simple process of finding out what's broken. This is not DevOps. Scan the machines, find out what's broken, what's not configured properly. Which machines are still vulnerable to WannaCry?
That kind of stuff. Find the machines that are broken, correct them, ideally with some sort of config management tool, and then automate that process. So instead of constantly fighting fires, you build a little bit of good hygiene around building machines right the first time. Maybe when you go to build your golden images, you scan them for compliance before you take that snapshot. But then you take that same compliance profile that you used to build it and you run it in production, so you're testing day one and day two problems. And so this is not DevOps, but this is getting you going through the motions. When we start to talk with customers about, oh, how do I get started with DevOps, we're like, look, it's hard. We've heard a lot of talks about how hard it is, right? It's an arduous process. So what we want to do is just get you going through the motions. If you start acting like a duck and quacking like a duck, maybe you're a duck. Maybe if you go through the motions of DevOps and get used to the idea of testing your infrastructure before you deploy it, well, that's a radical idea for some organizations. Having your compliance as code, that's kind of a different idea. But what you want to get to eventually is this achievable utopia of all your infrastructure being defined as code and all of your compliance being defined as code. So when your auditors show up, you say, hey, look, we have this big data center. No humans have been in there; we don't allow access by humans. The machines have all been provisioned with Chef or Puppet or whatever. There are no humans on these boxes. Those audits that we were going to step through for two weeks, we've automated that. Would you like to review that code, or would you like to review the output of those reports, right?
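The detect, correct, automate loop described above can be sketched in a few lines of Python. This is only an illustration of the process, not InSpec or Chef code: the `Control` class, the setting names (`ssh_protocol`, `lm_level`) and the machine dictionaries are all hypothetical stand-ins, and the two rules simply reuse the examples from the talk (SSH protocol equals two, the registry value exists and equals four).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A machine is modelled as a plain dict of settings (hypothetical stand-in for a real host).
Machine = Dict[str, object]


@dataclass
class Control:
    """One compliance rule: an id, a readable title, a check and a fix."""
    control_id: str
    title: str
    check: Callable[[Machine], bool]   # detect: is the machine compliant?
    fix: Callable[[Machine], None]     # correct: bring it into compliance


def detect(machine: Machine, controls: List[Control]) -> List[Control]:
    """Return the controls that the machine currently fails."""
    return [c for c in controls if not c.check(machine)]


def correct(machine: Machine, failed: List[Control]) -> None:
    """Apply the fix for every failed control."""
    for control in failed:
        control.fix(machine)


def enforce(fleet: Dict[str, Machine], controls: List[Control]) -> Dict[str, List[str]]:
    """Automate: detect, correct, then re-scan to confirm the fix held.

    Returns, per machine, the ids of the controls that failed the first scan.
    """
    report: Dict[str, List[str]] = {}
    for name, machine in fleet.items():
        failed = detect(machine, controls)
        correct(machine, failed)
        still_failing = detect(machine, controls)
        if still_failing:
            ids = [c.control_id for c in still_failing]
            raise RuntimeError(f"{name} still fails {ids} after correction")
        report[name] = [c.control_id for c in failed]
    return report


# Hypothetical controls mirroring the two examples from the talk.
CONTROLS = [
    Control("ssh-1234", "SSH protocol should equal 2",
            check=lambda m: m.get("ssh_protocol") == 2,
            fix=lambda m: m.update(ssh_protocol=2)),
    Control("ntlm-v2", "NTLM level key should exist and equal 4",
            check=lambda m: m.get("lm_level") == 4,
            fix=lambda m: m.update(lm_level=4)),
]

FLEET = {
    "web1": {"ssh_protocol": 1, "lm_level": 4},
    "win1": {"ssh_protocol": 2},  # registry value missing entirely
}

report = enforce(FLEET, CONTROLS)
print(report)
```

Running the same `CONTROLS` list against a golden image before the snapshot and again against production machines is the day one and day two testing the talk describes; the second scan inside `enforce` is what turns a one-off fix into something you can schedule.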
That can all be automated, and as you start to work through that process, you build up that muscle of, hey, let's write tests for our infrastructure, and you start working towards it. Commercial pitch: of course, that's what Chef Automate, our commercial offering, helps you do, and we're shipping a whole bunch of compliance profiles with that. But yeah, that's the journey, right? That's the journey. So learn.chef.io has completely free tutorials to get you up to speed on InSpec and let you go and test machines. A little bit of cross-pollination with the conference upstairs: after this I'm going to go teach a workshop on this. So if you want to come to Windows configuration management and InSpec at the PowerShell conference, maybe you can get in for free. I don't know. But we'll be doing that upstairs. So do I have time for questions? Yeah? Yes. So the question is, Chef configures things and InSpec detects what's wrong, it audits. InSpec doesn't do any correction; it's completely a remote reporting tool, hands off your infrastructure. And so the question is, is there remediation available, right? Yes. Oh, okay: are there existing standards? Right. So there's a group called the Center for Internet Security. It's an industry consortium. They put out free PDFs of what they call the CIS benchmarks, and they put them out for every commercial operating system. There's one now for Kubernetes. There's an AWS one. They've got a couple of applications in the mix. The PDFs are free, and then they make Microsoft SCAP profiles. SCAP is an XML format that System Center can use to run audits. We can translate those to InSpec, but we're not allowed to open source those. So the PDFs are free.
It's that weird gray area of free but not open source, but we can ship them, so the CIS benchmarks are part of our commercial product. So we ship all the CIS benchmarks. The dev-sec.io ones, though, are free, right? Those are just the open source community building up hardening profiles. You can go read the CIS benchmarks, and if you take something like the Windows 2012 R2 benchmark, it is 280 controls, and it's checking registry keys and patches and that kind of stuff. You could go and implement that yourself. On Supermarket there are a couple of CIS benchmarks that people have implemented as well. So there are a lot of existing ones. And one of the things that we've started to see is people adding InSpec to their toolkit, like a lot of service providers. If you are an MSP and you have a bunch of customers and you're looking for an easy way to distinguish yourself from your competition, being able to say, oh yeah, we're running compliance audits in the background for you, isn't that cool? Or if you're an SI working with government or health care, you can come and say, hey, we have these compliance audits for HIPAA or, I don't know the Singapore standards, but in Australia we've got APRA, right? We've got some APRA standards. So we've seen a lot of that, where people are writing them. Some of them get open sourced, some of them don't, but there's a lot of existing content. The really nice thing, though, is all the documentation on inspec.io. There are examples of every resource in action. So if you're like, oh, what am I going to use the Windows hotfix resource for, what's an example of that? Well, that's how you can determine whether or not you're vulnerable to WannaCry. There's a WannaCry profile. So that's kind of a fun story. Everyone's familiar with WannaCry, right?
It's that Windows vulnerability that showed up, based off the WikiLeaks NSA leak of some stuff that Microsoft hadn't patched, or they had patched but they hadn't told people why they'd patched it, right? And so the WikiLeaks leak was like, here's how you exploit that, and then a couple of days later it gets weaponized and starts taking over hundreds of thousands of machines. It shows up on the news, and so your boss calls you up and is like, WannaCry, are we vulnerable? And so about six hours after that hit the news we put up a blog post and said, hey, Microsoft patched this back in March, and this came out in May, and here are the hotfixes: if you have these applied, you are not vulnerable; if you don't have them applied, you're probably vulnerable. We just put that up, and that took off like wildfire. About three hours after that we got a call from one of our large customers, who said, we just scanned 10,000 machines, we're not vulnerable, but thank you very much, thanks for putting that out. So what we're starting to see up on our Supermarket community site is people starting to do CVEs, right? A lot of the named exploits, like your WannaCrys, your POODLEs, your Heartbleeds. It's easy to check: do I have the latest version of Bash? If I connect to this machine with an OpenSSL client connect, do I get TLS 1.1? That kind of stuff. DevSec does have an OpenSSL one that is maintained, so it's easy to stay on top of that. So there are a lot of community resources available, and it has a very healthy open source ecosystem around it, which is cool, right? Any other questions? Yes. No, no. So the question is, does the target node need to have an internet connection?
So InSpec can run inside the data center, firewalled off completely from the internet. If I have a machine, I can have InSpec run on that machine, and then I don't even need remote access. If I'm inside the data center I can say, InSpec, SSH to that box. And then the source of those profiles can be local, it can be off of a website, it can be off of a Git repository, it can be off of a private Supermarket instance. So, yeah, we have lots of customers who are completely firewalled off. We had a really good talk at ChefConf, which is our user conference, from a US Department of Defense contractor, and they were doing military compliance checks, completely firewalled off, right? So, no. And the target node does not require an agent, other than SSH or WinRM, right? Most of the compliance profiles are going to just use basic shell commands. Some of it may be, hey, if I do a ps aux and grep that and check the responses, what does that look like? That's under the covers. So with InSpec you're going to be using shell commands, mostly just the basics. And on Windows, it's usually calling WMI or PowerShell commands, right? So there's a certain expectation of, I think, PowerShell v3. Is that right? I'm looking at Steven Murawski because he... probably. So, like, I've been doing some Windows 2008 support lately, and that's where things start to get bumpy, right? So all the supported OSes, 2008 R2 and later, everything works fine, and most of the Linux and Unix resources are shell. There are a few that are Python, so you've got to watch out, because not everybody has Python. Which sounds weird, but if you're securing an operating system, you might not want a full-blown language runtime on it.
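The WannaCry check from the blog post story comes down to a set comparison: which hotfixes does a machine have installed, and is one of the fixing ones among them? Here is a minimal sketch of that logic in Python. It is not the published profile: the KB identifiers are made-up placeholders rather than the real hotfix list, the host names are invented, and treating "at least one fixing hotfix installed" as patched is an assumption about how such a rule would be written.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical placeholder ids; a real profile would list the actual
# hotfixes named in the vendor advisory.
FIXING_HOTFIXES: Set[str] = {"KB0000001", "KB0000002"}


def is_patched(installed: Iterable[str], fixing: Set[str] = FIXING_HOTFIXES) -> bool:
    """True if at least one fixing hotfix is installed (case-insensitive match)."""
    normalized = {kb.strip().upper() for kb in installed}
    return bool(normalized & fixing)


def vulnerable_hosts(inventory: Dict[str, List[str]]) -> List[str]:
    """Return the sorted names of hosts with none of the fixing hotfixes."""
    return sorted(host for host, kbs in inventory.items() if not is_patched(kbs))


# Invented inventory: host name -> hotfixes reported as installed.
INVENTORY = {
    "win-app-01": ["KB0000001", "KB9999999"],
    "win-app-02": ["kb0000002"],
    "win-xp-07": ["KB9999999"],
}

print(vulnerable_hosts(INVENTORY))
```

Nothing in the check needs an internet connection: the hotfix inventory is gathered over the existing WinRM or SSH session, and the rule itself is just data that can live in a local directory or a private Git repository.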
So, yeah, it's minimal. Yes? Yes. So I'll be upstairs at PowerShell Conference Asia. Can people just go to that? Yeah, just come on with me, I'll let you in. It's a three-hour workshop. Maybe we'll run out of machines, I don't know. We'll see what happens. Yes? Yeah, a lot of that. Right, so I used to have a slide where I talked about bare metal, and part of that was looking at your grub.conf and making sure that was set up, because there are resources for that. There are some resources around BIOS setup, and I think it's going to call something like dmidecode, so you would have to have that installed, or maybe it's going through your /proc and looking through what's exposed there. It's easy to write new resources and write custom ones. So if there's something that's not there, well, the reason there are so many open source contributions is because it's easy, right? Adding new resources is pretty easy. And so with the BIOS stuff you may have trouble, like, what works on Dell might not work on HPE, but writing new resources is fairly straightforward, and dmidecode is where a lot of that information is hidden, or it's in /proc. So yeah, you have the ability to just read things off the Linux file system really easily, and there are resources built in for most config files as well. Wrap it up? Okay. Any other questions? Okay, well, I'll be over at the Chef booth for a little bit longer and then at the workshop upstairs. So thanks a lot, y'all. Good. Yeah, next time bring your friends and tell your colleagues to join next year. Pigeonhole is looking good, which is always fun, but here's a chance to actually add a topic, and I'm going to show it again, probably because a lot of you haven't actually seen it. So go to pigeonhole.it, you enter your name, and then again I'll ask the question. You ask. Okay. Now we have the Ignites.
It's always a pleasure to see speakers deliver a topic in five minutes. Very concise, very brief, and not that much marketing. So I'll welcome our first speaker. Are you ready? Okay, so I'm here to talk about automation and how I came to see the light. I often get asked, why am I in IT? But what I find more interesting is to tell why I got interested in automation. I started as a traditional Windows engineer, and as a Windows engineer there's going to be a lot of clicking around in GUIs. This is Windows 3.1. It hasn't really changed that much. We have tiles now, but it doesn't really make us happy to be clicking around. The first couple of years in my career, I want to say that I enjoyed it, but I didn't really enjoy it. So I started taking my first steps with scripting, and in particular, because I come from the Microsoft world, I did a lot of PowerShell scripting. But I didn't really bring it much into production or into my daily work, because at the time I worked a lot with SharePoint. As any of you who know the earlier versions of SharePoint can tell, they were extremely hard to automate, and we had some deployment scripts, but they ended up taking longer than manually clicking through a GUI. So while I kept on scripting in my spare time, I didn't really use it much for my day-to-day work. For me, this really changed when I took a different job. I took a different job in a different country, because why not? And here I was put on a project to migrate a product from a third-party vendor back to a customer. It started off quite well, but then we found out that we actually didn't have any vendor support. We didn't have any tooling. And if we wanted to get the data out, we needed to do it manually, and manually meant clicking through a GUI. So at this point, what do you do? Well, you get frustrated. You start screaming, shouting. I was hiding under my desk for a couple of days. But after I recovered from that we started to look at solutions.
One of my colleagues was really into clicking, so what he did was go through it manually. We looked at the speed that was going at: we would have needed more than a dozen people to perform that for even a single site, and we had to migrate dozens and dozens of sites. So it wasn't a realistic option. So me and another colleague, he was good at VBScript and I knew a bit of PowerShell, we got into looking at what we could automate. But because we also had day-to-day stuff to do, the first thing we did was get rid of, well, not all the manual tasks, but a lot of the manual tasks we were doing for day-to-day work. Once that was automated away, we could start taking a look at how we could actually automate those migrations. So we started looking at the documentation we could find online, and we started looking at online communities for how we could automate this. After we found the information, it took about two or three months, we came up with a solution where we could just do a single click, and a single person could perform an entire migration, and we didn't have to do any manual work anymore. The reason this was possible for us was what other people shared online. There was a lot of code we copy-pasted from Stack Overflow; I think we're all guilty of that at some point. And because of the knowledge shared by other people, we were able to build our own custom solution, automate the boring stuff away, and get involved with the automation. So my biggest takeaway from my experience on this project was: make sure you have the right learning materials when you start, whether it's a book, a video series, or an in-person training course. I ended up with a lot of technical debt, and rewriting and refactoring scripts multiple
times because I didn't have the right knowledge at the start. So invest in good training materials when you're going to start with this. Furthermore, the online communities were a big benefit for me. I learned a lot from them, and there's a wealth of information that helped me automate other boring stuff away. When I'm looking at what to automate, I always like to refer to this one. I do like to add a third dimension: how much do I hate a task? Because if it's a one-minute task I hate, I automate it, even if it takes me a day to get rid of it. So, I've automated my stuff away. Everything is awesome. Thank you for your time. Hello, testing. Works. Tell me when you're ready. Yeah, ready. So I'll try to compress this talk into five minutes. It's a little bit more technical. Okay, thank you, sir. So I'm trying to compress this talk into five minutes. It's a little bit more technical, about monitoring, which we were talking about yesterday. So let's see how it goes. First, hi to everybody. My name is Antonio. I'm a site reliability engineer at Cloudflare, based here in Singapore, and I'm involved with the Singapore team on our monitoring solutions. What I'm going to talk about today is monitoring at scale, as it says in the title of the presentation. For us, monitoring means visibility, and visibility is key. We had some changes when we scaled up our systems. Back in the day we had 50 servers. We slowly grew, and 100 servers was already a huge environment. By then we had Nagios as a monitoring system, which worked great at that point. There was plenty of documentation out there and it was working like a charm. However, as we kept growing we started to experience problems, first of all with scalability. We found that Nagios broke very often. We had a single point of failure. It was very hard to make changes. So in the end, this was the infrastructure.
So we had a centralized Nagios server that basically connects to all the metals and gets the data. You can see the problems here, obviously. One of the big problems was that we're talking about a centralized daemon that handled an insane number of connections. Then the configuration file was complex, rigid, very hard to change, and we could not afford that. So we moved forward and went looking for solutions. What we were looking for was a standard setup, something that was easy to customize, and something that could do self-registration and expose services. So we chose Prometheus, because it's robust, it's HA, it's very easy to troubleshoot, and you can easily add metrics, millions of them, and convert them into monitoring points. But of course the challenge that we faced was first of all the philosophies. Nagios and Prometheus are two different ways to monitor, and both systems needed to coexist, so we could not have downtime during that migration. That was quite a challenge. Regarding implementation, we use horizontal sharding. The process was verifying the HTTP endpoints first, aggregating metrics, defining alert rules, making sure that the rules match the previous ones we had defined in Nagios initially, and then, once the metric is ready, we deploy it and we verify that it's firing. It comes as well. It's open source, it's free to use. Beneath that, the deployment that I'm using has Ceph as storage. That's basically a highly scalable distributed storage solution. It avoids a single point of failure and has self-healing embedded within it. So, that's all. Thank you very much for your time. I wish you all also do some automation, and play around with physical servers if you get the opportunity like me. Thank you. It's a system administrator. Yep. And you also work... Do you need a mic?
Hi everyone, my name is Vessel and I also work for Palo IT. I'm going to be talking about virtual assay, which is a chatbot that does provisioning on the cloud, and it also installs software packages that you might need. I would like to share why we did virtual assay, how we did it, the technology stack that we used, and our experience in general. Right. So at Palo IT, we focus a lot on learning and innovation. One of the main goals with virtual assay was to let our developers provision machines or development environments as soon as possible, so they can focus on innovating and delivering software that adds business value as quickly as possible. We did it over a weekend where some of us got together, basically to learn more about chatbots, machine learning and provisioning. The idea was to learn in a collaborative way, where we're sharing with each other and everybody gets to learn based on their own interests. With the bot, we can now set up an environment for a specific project in under four minutes. You need a development environment for a new application in React? That's not a problem at all. You just ask the bot and it's done. The machines can be provisioned on OpenStack.
They can be on Google Cloud, AWS, Alibaba Cloud. You could even integrate it with Kubernetes or Docker. We use Slack for internal communications, but we explored Facebook Messenger, Skype a little bit, and Telegram, and we even did a mobile application in React Native with ClojureScript. In terms of design, the virtual assay has three components: the bot, machine learning and provisioning. The bot is essentially a Node.js web server, which is talking to MongoDB. The different channels talk to the same bot so that you get a seamless experience across the different channels. To make it more seamless in terms of the interaction with users, we use something called Dialogflow. It's basically a natural language processing API from Google, and it has built-in fallback responses and it does learn over time, so that was definitely a win. We felt that it would be interesting in the future to augment this with project-specific synonyms. For now, basically, we just match different keywords to the same underlying package if they're synonymous with each other. For example, java, java 8 or jdk would all be mapped to JDK 8. On the machine learning side, this is an interesting one. We tried to build a model to predict whether a machine that was being requisitioned would be used effectively or not. We did not have a lot of luck on this front, because at that point we did not have a lot of data. We definitely need to go revisit that soon. The third component, the last one and the most interesting one, I think, for the audience today, is provisioning, and the way we did it is a REST API using Node.js, which is talking to Ansible and Terraform. It's a single Ansible playbook that we use, with different roles that you need to set up, and it does everything in one single step, where it basically sets up the infrastructure that's required. The bot also has integration with Ansible Galaxy, which is a
public repository of various Ansible roles. This enables the users to install packages that the bot has never even heard of in the past, which is quite interesting. I would now move on to how we could improve virtual assay in the future. A few key things would be to identify the same user across the different platforms. This can be done by building a user profile with their mobile numbers and Slack IDs, for example. NLP could be improved further to better handle things like project, package and operating system names. We could curate more Ansible roles based on the common requests that we get, as not all roles on Ansible Galaxy work out of the box. That's something that we struggled with a little bit. Having this curated set of roles basically ensures that installation would work in the first step. We would definitely want to work more on machine learning. For example, one of the things that we want to do is to make suggestions for better machine configurations based on the project that you're doing. That's all the time that I would take today. I've been told to plug this in, so I'm going to do that: we have limited edition t-shirts, so come visit us at the booth. We would like to get your comments and feedback, and that's it. Thank you so much, guys. Yes, if you can raise your hand. We have one. I was actually concerned that we'd need to actually time this out because there would be a lot of hands raised. I remember. So, okay, one and a half. Any more? Now we'll have two. Again, the beer is limited. It's actually very good. I don't see... great. How's the beer? It's actually craft beer, so it's not bottled beer. Okay, three. Can you come to the front? You're the first one, second, and the third. Can I have your names? Just tell us a little bit about your name or what you do. My name is William. I'm currently serving national service. I'm a software developer. My name is Vaz.
I'm working in the world as a French company I'm happy to be in the conference and I would like to thank you very much for your enthusiasm So for the first volunteer We'll add one extra. I mean you can actually have three times to click the new topic and choose That's the first quality So if you want to go off continue to do it or you could click three times This is fine, okay There you go Again, it's two minutes 10 seconds per slide Random image I think the only caveat that it's not profanity So Yes, so that The question is what do we do during karaoke? Not is You get to speak for two minutes Try to stick to the topic and explain how the image aligns to the theme If you come plug in double today's or your job or anything By the way, make it fine. This is this is not serious. Let's give it to you Okay ready So continuously continuous delivery is about making money and Putting money on a weighing scale And when you put money on a weighing scale You will have tall buildings in a nice scenery And because you're you're rich already And you're at the top of the pyramid because you're the CEO and you basically just do nothing and continuously deliver Imaginary value while making politically correct speech about cats hiding down there And the cat is looking at the chair and the chair is something that you have just made thing And sometimes when you're bored you would want to make a Make something spiky and then Lie on a beach and Start doing programming like a real developer Right. 
So as deep almost like to say developers developers developers and then um Oh That's that's deep almost and and then you you get all the people from the new foundation writing on the bicycle going to his house test and then Yeah, the bridge is The bridge is a significant figure of being of the first step to quitting your job and uh Signing on in the national in the army itself and while you're in the army you realize that you can actually go to the navy itself and do Do sailing and then and then you You realize that um, you want to quit army and then you start a going for a wall and thank you Actually stand here because I think it's video recording. So So do I get that choice of two topics? Yeah And this is digital transformation So digital transformation is not something that can happen immediately. It happens over time And uh, this august represents that time that it takes So, uh much like gender change You know, it's in in that process of changing that you can't just chop it off and It takes time Okay, so Digital transformation is much like the matrix where you get offered two choices. You either choose to transform or not And I would strongly recommend you To go with it and I'm also would strongly recommend you to wear pants on the daily basis Uh, especially in singapore, you know, because you have all sorts of rules here that you don't want to break And if you do break them, uh We'll show you the sign and You might attain And then deported In an order that we surprise you Okay, so back to digital transformation. So, uh Take this dinosaur from the brave story age who is trying to undergo Self transformation. He's trying to become a basketball player and Uh, he's going through some teething issues He's having some problems, but he's working them up. 
Okay, and now back to this time period, I guess. So when you undergo digital transformation, uh, there are some integration issues that you might face, uh, such as introducing all these new technologies that may or may not be relevant to your organization, such as Wikipedia and bicycles. Yeah, uh, and this is just a non sequitur, guys: don't text and cycle at the same time. It's very dangerous. From personal experience, don't do it. Right. Thanks.
It looks like you're going to speak about change management after all. There's still a couple of them. Okay, so, uh, let's try it.
All right, uh, change one such one, actually. And, uh, if I look at it from a negative perspective, I think it's an evil insight, because when we try to figure it out, it takes some time to see how the new things come up in our organization, though you would have to pace up between the colleagues who try to do the new things in our company at the same time. We like the knowledge. We need something to zoom in and check how the things work out, because we have to develop ourselves at the same time. Yeah, we feel sleepy at the same time to accomplish the new things. So it's not an easy task.
We have to pace up. And then, uh, if sometimes the plans don't work out, you have to switch to different plans. But I think switch to planner is always a good thing, because it deviates to your direction. And most probably, if you can figure it out, you can elevate yourself to the highest point of learning. As I'm seeing this graph, I'm not sure what this is. But everyone is present the way we looked in the previous slide, so that everyone is scared to look into it. Yeah, coming back to... so those are the people that got the portrait. So yeah, there's... yeah, change management still takes time to analyze what we do with the company.
Do we have one more volunteer? Anyone wants to try? It's a very good ice cream or brand you already have your... What? Anyone else? Maybe a woman would like to try? About all ladies in the... We will have representation. Anyone wants to join the stage? Yeah, yeah. Is that a yes? Yeah, is that a yes? Yeah. Yes. Don't make it easy. We'll choose a topic. What do you do?
My name is Julie. I'm in marketing at Palo IT. Is there something else I can talk about? What's the topic? Okay, I can... well, I'm a fashion student. I can talk about fashion.
Fashion what? Let's see what... yeah.
Um, well, so this is, uh... wow, to be honest, I have no idea what's happening, but, uh, okay, so, um, for fashion... oh my god. So people are playing with some Lego. Uh, so I think they are... now it's a cookie. Uh, yeah, so, um, I'm thinking, uh, that, uh, sometimes, you know, in life there's like a bridge. You just have to... there's always a lot of obstacles to pass. Sometimes you just take a bike and you just cross the bridge, and then next you find some crazy organism thing, and you go back to the human form and you think about life as you're sleeping on the bed. And you're like, what the hell am I doing in my life right now, giving your presentation at a, uh, DevOpsDays? And then next you think about, oh, I need to build a family.
I need to build a home. So, um, this is like an IKEA manual. So, um, and then, uh, you think about what kind of sports should I do to get healthy, to improve my lifestyle. So then maybe basketball, uh, yeah. So, um, and then you think about reading. You know, this is an important thing to become more intelligent. And, um, you regret your life because you didn't start early. Um, yeah, so, um, you hope that your brain can, uh, not degenerate. And, yeah, so, um, I think you think about making money. But money is not all that matters. I hope all of you agree with me on this. And you think about the numbers in your life, and... thank you.
Sorry us. We'll do the voting this time. And to make it much easier, we'll only vote... And Julie, yeah, so who wants to give the beer to Julie for the current? Okay, come on. One more wine bottle, but you'll have to share it.
So, uh, we've done... Let's go back to the Pigeonhole. So as agreed, we'll select the 12. So we have the 12 topics. So: What do you look for in a DevOps job? How can we attract talent, uh, to DevOps roles? How to persuade management that they engineer team need to be capable of time. And it's also about getting rid of manual change with this manager top. Continuous development and deployment as part of DevOps. Why are there so few women in DevOps? What is the best way to teach our users to reduce their resistance? DevOps, agile: all shift focus from individuals to the team. Daily stand-ups: how-to and any best practices. What are the essential skills to build a DevOps career? Why are we pretty? Why are we pretty? Please have some vegetarian food for the fish. Uh, that's... any one operator no cloud people like this cluster are playing as possible in going against individuals. And the last one: When are the t-shirts arriving?
Please, if anyone didn't pick that up, the t-shirts are actually at the registration booth. So go grab this limited-edition t-shirt.
I've seen one of our previous attendees who has a 2016 t-shirt. What's your name, sir? Denmark, you have a 2016 t-shirt. Is it good? It's awesome. So again, grab your limited t-shirt.
It's next: "How do I ask question of..." "Does the replicas in a pod gets restarted when dpa is applied in Kubernetes? For example, currently the replicas are running with one." Oh, that's... someone is working on that.
Okay, so what we'll do next is we will plan it in the group: the first open space. And we'll have the same grouping that we did yesterday. I have to ask one favor again. Please, can you help me rearrange the chairs in two groups? So we'll have on this side one, two groups, and on this side two groups, and a lot of facilities to help us to the partition. And let's get back at 3 p.m., when we discuss the open space. Again, can you help me? Can you help me rearrange the chairs? The chairs will not be rearranged by themselves.