There are two talks today and three speakers, so without any further ado. Sorry for the technical difficulties. We will also be moving to a larger space soon, because we had more people on the wait list than on the attendee list, given the limitations of this space. So we will be doing that possibly from next month's meeting. We are moving to a bigger space, so we'll have a couple of hours.

I think that's gone. Okay, that's fine.

Hi, everyone. Thanks for coming today. So today we're going to share our experiences from working in the government space: continuous delivery in the government space. We're going to discuss how the environment is different in a government project versus a private workspace.

The KPIs, the metrics, the measures of success are very different in a private space compared to a government environment. For instance, in an e-commerce environment, the measures of success would be revenue, number of new users, number of products sold and so on. But in the government space, it's completely different. It's how stable the environment was, how stable the application was, how fast we reacted to customer feedback, how high the customer satisfaction is, and so on. In fact, in the government project that we worked on, payments were linked to customer satisfaction, to the user survey. So the measure of success and the metrics used to measure it are all different in government versus private space.

And while auditability and traceability are important in both private and government, in government those are the ones that are most important. Having control: control over what is deployed, control over who can deploy, control over everything. And auditability especially. Typically in the government space, auditing takes more time than development. They spend months trying to validate whether what is done is right and whether we are compliant with all the policies, rather than on productive work. That's how a typical government space works.

And organizational
structure is very different; it's very hierarchical. There's role segregation. We say the name "DevOps team" is a misnomer, but unfortunately that's how it is in the government space. Those who do development, those who change business and domain logic, cannot deploy to production. So there has to be exclusive role segregation. What we're going to discuss today is how we still achieve continuous deployment, continuous delivery, given all these constraints. That's what we're going to focus on, coming up.

Do you want to go to the next slide?

We know that in a private space, success is very clearly defined. There is clarity on our goals. We know our goal is to achieve this much revenue or this many users by the end of this quarter, and so on. But in the government space, there is no such success defined. So how do we still make it successful when nobody is looking for it? That's what this is about.

What we've seen is that uptime, performance, user satisfaction, and better streamlining of processes are the things that everyone is generally focused on, those who are very forward-minded at least. A very interesting example: in the e-commerce space, we're all very used to things like performing analytics, doing some blue-green deployments, doing a bit of percentage-based testing, and seeing whether we can increase our user traction, and so many other such things. Over here, the metrics are different. One of the things that Muthu had pointed out was payments which are linked to certain other metrics. So when the user is not so much concerned about growth, how do they make sure that the vendors give them what they need?
We realized that these were the kind of interesting success criteria for them. I tell you, the bulk of my work has been very e-commerce-ish, and coming from there to this sort of a world was a big change. When you give recommendations to them, "Oh, maybe you want to try this," they look at each other and say, "Yeah, that doesn't quite apply to us." That would often be a very interesting surprise. But the better thing was that our experiences were good. There was a lot of focus on improving processes. There would be a lot of wake-up calls: you would give them showcases very regularly, and they would realize, "Oh, you can do this with computers, is it? Or you can do this in a new way? Okay, maybe we'd like to try it out." So our experiences have been generally very optimistic. But we realized that growth means something else over here. Go ahead, Muthu.

We spoke about auditing and how auditing is very important. It's the prime thing in the government space, right? We also saw that auditing typically takes more time than actual productive development. Most of the developers' time, or the infrastructure and systems people's time, in government is spent in constant arguments with auditors over why we think what we did is the right thing, while auditors say, "No, I need proof," and so on.

How we addressed it in this environment, with all these constraints, is that we had auditing embedded in our deployment process. If the government comes up with a security policy, we automate all of it. And every environment, not just production, right from development to QA and UAT, any environment that we might have, has the same deployment steps. All of it is audited similar to production, with automated auditing reports. What auditors look for is reports. It could be CIS auditing reports, or it could be any metric that they're looking for. Let's say we have...
But auditors are primarily concerned about probably just pre-production and production. Even so, if we have, say, 50 servers together in both those environments, generating those reports for every environment, getting the auditor to go through all of it, and getting a sign-off is a big deal. So automating that also helps. That's what we did.

Yeah, one thing I'd like to add here is the CIS benchmark. Yes, you've got a question? The CIS benchmarks, which are very popular, right? When you have external auditors come and audit you, they say, "Please run the CIS benchmarks." I tell you, these can be highly misleading. The CIS benchmark reports have certain defaults of their own. If the finding doesn't match that, it's immediately marked as a fail, even if you are more secure. Let's say it has a policy like "change passwords every 120 days." You would have gone and said, "Oh, I want to change it every 90 days." It's now marked as a failure. And then good luck trying to sit and explain that to certain kinds of auditors, who are sometimes not very tech savvy.

The other interesting challenge with the CIS benchmark scripts is that often their checks are actually not correct. They may not really check for the right stuff, or there'd be some error in the script and it just reports a minus one or something where you actually have values in production.

So what we realized was that sometimes you can't explain a lot of these things to auditors, so you have to come up with some stupid workaround yourself, where you say: these are going to be known findings. These are what we actually have. This is our script, this is what we check for. And please, when you review this, don't waste your time flagging it as a failure and engaging in multiple meetings. Here's what we expect you to see as a failure, here's what the actual value is, and here's the extract of the file. Those actually seem to get us through faster
than sitting and debating whether the CIS script should be fixed or not, because we hardly have time for that.

And with automated deployment and automated auditing reports, we don't have to wait for auditors to come up with findings. Even before the auditor comes, we know whether it's going to be green or red, as in whether it's going to pass or fail. We get proactive reports, and we know in advance if some security policy is not followed. We know it in our development environment, well before production, so that gives us time to fix it. And if there are any known issues, then we know about them well in advance, before the auditor comes, and we can be prepared. Our goal in all of this was to make sure that eventually auditing becomes a non-event.

So, something that we've discovered about Singapore culture. As you can make out, I'm not from Singapore. In the other countries I've worked in, the metrics were different. It was always about revenue, getting in revenue, making sure you don't have downtime, and so on. Over here we realized that there's a very intense audit culture. You have the external auditors who are invited to audit, and you have the government auditors who are invited to audit. And there is a strong feeling that even if you have answers to questions, the fact that questions are raised has itself become something that guides a lot of people's behavior. You would say, "Hey, someone asked a question, that's good. I've got some responses for them. I've got very nice clarifications." But culturally, what we've realized is that in certain groups the fact that questions were even raised itself becomes a flag: "Hey, you know what, questions were raised." So that was very interesting for us. This was a bit of an adventure, really. We just had to survive and go through it.

So I'd really like to go and look at this very quickly with you. To tell you the truth, I really enjoyed the whole thing. Sometimes you go through difficulties and you
realize, wow, I've learned a lot. We've come up with a better product, and we actually have something that's very good. To the extent that I went and gave the exam to become an auditor, and now my paperwork for that is still pending. I cleared the exam, and I'm pretty pleased, actually.

The reason is, there are some topics that we are going to come to later, like segregation of duties and network segregation and whatnot. There were a lot of problems that we faced. My feeling was, instead of sitting and debating it, or calling names, or making snide remarks, I'd rather come up with working examples, some revised papers and revised examples, by which we can go and influence the industry and say, "Folks, there are better ways to do it." We can still stick to all the principles. We can still continue to protect our assets, and do it in better ways which are not as painful as they are today. That was our goal.

So one of my own realizations was this: what is it that security and audit folk do?
They are not actually out to give you a difficult time. After we got past that barrier, where they were a bit skeptical about automation, and started to show the auditors a lot of automation, we generally had a good time with them. To the extent that towards the latter half of the various audits that we'd go through, we'd be done in about 45 minutes. Imagine: what would usually take me about 10 to 12 days, and with evidence another month, and what we have seen with other projects can go a month or longer, over here we would now be done in about 45 minutes. That's because we had learned our lessons.

So the top thing that we realized is this: what everyone really wants to know is the answers to these questions. If you as a team understand that, and you have good, sound answers for these, you're actually in a good spot. The answers to these, plus automation, and you're good to go.

At another meet-up last week, one of the people in the audience asked a good question: what is a good security policy? I gave a whole lot of answers. But one of the guys, Stefan, said, "By the way, a good security policy is also one that can be codified." If you can express your security policy in code form, and be able to run that regularly and review it, then you have a security policy that can be live, versus a security policy that's just a bunch of documents, painfully filled, painfully reviewed, but perhaps not updated on time. Right?

So how did we win the trust of the various auditors that we worked with? The top thing that we did was show them that, as much as possible, we are going to follow revision control. We're going to make sure that our entire revision control, and the process beyond that, is fully automated. We are going to make sure that we provide all the assurances that everyone needs in terms of: how good is your code? Are you coding the right stuff?
Are there any shortcuts that you have put in? Is there any back door that you have snuck in? Right? So first we explained all of those things, on the next few slides, to the auditors.

We started with this, saying that at an application level, this is what we have at the very minimum. We make sure that we have a test-driven approach to writing unit tests, which developers author and which QAs also contribute to, by way of saying, "I will be checking for these things." For all our services and inter-service calls, we make sure that we have the right kind of tests over there. And we demonstrated that for end-to-end workflows, too, we have tests. An end-to-end workflow test is something like this, and I would like to take an e-commerce example: you log in, you search for some items, add them to a shopping cart, check out, and it says thanks for the payment. A test that does just this much is a sort of end-to-end test. Those kinds of tests are ideally done at the UI level.

You see, the interesting thing about this sort of pyramid structure: we call this the test pyramid. Just search for "the test pyramid" and you will see a bit of a write-up. The interesting thing about such a test structure is that a test that fails up here, at the UI level, tells you what failed. It would tell you things like "login failed" or "I could not add to cart." A service-level test is one that tells you where it failed: add to cart failed because this service could not talk to that service. Why? Your API signature changed, or it talked to it but the response was something else. The unit test is the one that tells you why it failed. This is the one that says, "I received three parameters, but I used only two of them." Things like that are what you catch with unit-level tests. So: what failed, where did it fail, why did it fail.

So once we are able to explain to
auditors that, look, this is the approach that we take so that the application is robust, they realize, "Okay, so these guys have removed a lot of the human element out of the testing, and they have automated those things." This is important. Muthu?

Yeah, I just want to add something here. It's in the shape of a pyramid for a reason, because tests have a cost. We don't write tests just to get an auditor's sign-off. It helps us with quality control. What makes a good build? A developer commits a code change, and how do we make sure that the code change is not going to regress any functionality that's already in production? If all of these tests pass. And why do we have more unit tests than service tests? Because it's more cost effective. It's easier to write unit tests, it takes less time, it has more value, and it gives you quick feedback. The cycle time is so short. When a developer checks in, these are the set of tests that will be run first. Even when there is a functional test failure, a UI test failure, we will try to write a failing unit test before fixing it, so that we catch it early in the cycle rather than late.

Actually, this brings up a very important point. We don't write our tests on already approved code. There are organizations where what everyone does is say, "Let's write the code, let's deploy to, say, a QA environment, let the QAs give a sign-off, let everyone say yes, this looks like a good build." And then some dev team, or a separate unit testing team, will take up that code and start writing unit tests. There are companies who like to do things like this. What we know from experience is that's not really the right way to go about it. I'm just putting it out there. I would like to debate it some other day, if you don't mind, because we want to tell you a very important thing.

Because we explained these things to the auditors, we were
able to talk to them next about pipelines. And we don't use things like Jenkins. We use more advanced things than that. Open source.

We have this concept called a pipeline. Please note, this is not just a visual indicator that I've used because a picture is worth a thousand words. You saw the tests on the earlier slide. This is how we run them: unit first, integration next, functional after that, and then regression, which is whatever end-to-end or other critical tests we have identified. Often it is bugs that have come up in the past which you want to catch, to make sure you have not regressed again. Whatever those sorts of tests are, the point of having a pipeline, especially one that's automated, is that as you go through this from left to right, you have an increase in confidence in the commit, or the set of commits, that went in.

So please understand this. I said "fully automated," and I said "or set of commits that went in." That's because when we set up our CI systems, we don't build every night. We set up the CI system to poll at maybe a one or five minute interval. I commit a check-in, someone else is, you know, still waiting to check in, but the CI system has detected mine, gone ahead, and begun the whole pipeline process. This much we made sure we always have automated. And the build that comes out of this, we send off for manual pen testing, or for showcases, or other such checks.

So just to go back to the previous slide: these kinds of tests are run over here. We were able to show this to the auditor, and she liked it. Because the message of continuous delivery is that your code should always be in a production-ready, ever-deployable state. And we wanted to tell them that we want to deploy every week, and to have them give a sign-off on that basis. So we were able to show that we're not skipping testing: this is how we do it. We're not skipping human pen tests: that's how that fits in over there. And as a consequence, we were able to get a
go-ahead for a typical workflow that was like this. We get whatever the requirements are. We have the dev, QA, pen test, and showcase phase, and this happens as much as possible. We release it into a UAT environment. External people can pen test if they would like to. Of course, performance tests run all the time. There's a whole separate slide on this, with some interesting stuff there. And once all this is ready, we are ready to deploy to prod. Okay?

Yeah. The next thing that we explained to our auditors... You see, a lot of these things were very interesting for them, because they're all used to this: "You have a development phase, then regression will start in two weeks," all those things, right? And we said, "Here's our regression. A commit comes in, regression is done in 25 minutes, all the way through." Because we were able to show these, and they enjoyed it, we then took them to something like this, where we said our hypervisor infrastructure was VMware.

What we were able to explain, and this was a whole audited process, was that even the VMs that come up come from approved templates. And how do you approve templates? We had a proper CI setup for that. This is a Red Hat ISO image, here's the GPG file for it. We finish our validation, use the VMware API, start up a VM, prepare the VM with the whole OS install, and convert it into a template. For just this bit of the process, we were able to establish that there's no human intervention in it. Zero human intervention. And now that's become a template. If you ever suspect a template, you can compare checksums, compare everything. This flow gives you an approved template.

Okay, I just said "approved template." You see, I should have said "automatically created template." The approval was just saying, "Yes, use the Red Hat Linux CD." That's what that was. Even the operating system, however, we were able to show is automatically provisioned into the VM used to create a template. We were able to trace all this: the kickstart, all the RPM files, the
GPG checks turned on. And the whole script is revision controlled. The whole CI system knows exactly which revision from the source control system it took to make it happen. We were able to demonstrate all this. So from an infra auditing point of view, we got a very good check-off for these.

And then began all of these. You see, one of the things we wanted to tell them was that we may sometimes just destroy an environment and create it again. One very nice situation: we were talking to one of the senior government officers before we had shown them all this. We just had it running. You see, we made the mistake of assuming, as in "u" and "me." That was my mistake over there. I argued with him for about two or three days, and he said, "Hey, Ram, you guys set up all this infrastructure with such pain, man. Why do you want to destroy all your environments?" I hadn't showcased it. So we came and showcased all this to them. Then we were able to show them that something like a 23 to 25 virtual machine environment, with all of this done, we were able to finish in 5 minutes. Why?
Because that's the time that VMware takes. Click a button, make sure it's approved, check in your environment configuration, like how many VMs and such, and VMware takes care of it. We were able to do that. We did not buy any tools like vCloud or anything. There are standard open source tools available. VMware supports the API; you can just program it. You do not have to spend thousands of dollars on tools.

You see, the second thing that we were able to show the auditors was that we do not want to buy a tool, and we don't need an official approver or a raise-a-ticket system and other such things. Someone who owns an environment now has the ability to click a button, delete the VMs, and spin them up. In the time that I spoke to you on this slide, perhaps we would have had an environment come up. Perhaps.

We were also able to show that all of these things are revision controlled from source. You name it: auditing rules, monitoring and logging, all the network services, the certificates with which we connect, full machine configuration. All of these are completely automated, checked in to revision control systems, and deployed, and we can verify the origin of each of these.

I must say, if I were to sit and do this from scratch on my machine, it would take me maybe a week. But when you have to go into any existing organization where they have a lot of processes, implementing something like this, perhaps even in the dev environment, might take a month: to convince people, encourage them to give up a bit of control while you do this, and ramp them up. Taking it all the way to prod can take a few months more. That's the reality check. But the benefits are really awesome, because imagine being able to recreate an environment on demand, whether test or dev or prod. Yes?

Let's say there's a new OS patch. Will you destroy your environment and spin it up again, redeployed?

So, the way we handle OS patches: we have a running production system, and we want to apply OS patches on prod. Before we do that in
production, we want to do it in our dev environment, because production we obviously cannot destroy and recreate. So the way we do it is we make production our standard benchmark. That's our reference. We create a similar reference VM in our dev environment, register it with Red Hat, and subscribe to Red Hat so we get OS patches. So we know what is new with reference to production every day. It's a nightly build; every day we get some patches. Every time we deploy to any environment, we take all these delta patches from Red Hat and deploy them on all environments.

So when we run these tests... Can you go back to the previous slide? Let's say the QA environment, the automated tests. Even in those environments we will be installing all these delta OS patches and making sure all our functional tests pass even with those OS patches, that our application is still working fine. So when we have a good build, that good build contains not just the application; it also contains the OS patches. Our build is everything together: application, tech stack, OS patches, everything. We also get the set of delta patches. When we deploy these patches to production, we update the reference. Whenever we update production, we update the reference. So our reference of what patches are installed in prod is always in line with production.

So look at this. In any of these environments, we would of course know what is installed. But there's one more thing which we have not shown here. That's the box where she, you know, syncs with Red Hat and tests the delta and whatnot, right?
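The reference-and-delta bookkeeping described here can be sketched roughly as follows. This is only an illustration, not the script the team actually wrote: the class name `PatchReference`, its methods, and the package versions are all made up for the example, and the comparison is plain string inequality rather than real RPM version ordering.

```python
from typing import Dict


class PatchReference:
    """Hypothetical record of the package versions installed in production."""

    def __init__(self, installed: Dict[str, str]) -> None:
        # Copy so later updates do not mutate the caller's dictionary.
        self.installed = dict(installed)

    def delta(self, available: Dict[str, str]) -> Dict[str, str]:
        """Return packages whose available version differs from the reference.

        Assumption: any version string that differs counts as a pending patch.
        """
        return {
            name: version
            for name, version in available.items()
            if self.installed.get(name) != version
        }

    def record_production_deploy(self, deployed: Dict[str, str]) -> None:
        """Bring the reference back in line once production has been updated."""
        self.installed.update(deployed)


# Illustrative data only: what production has, and what the synced box sees.
reference = PatchReference({"bash": "4.1.2-15", "openssl": "1.0.1e-42"})
available = {"bash": "4.1.2-15", "openssl": "1.0.1e-48", "tzdata": "2015g-2"}

# The delta is what gets bundled with the build and tested in every environment.
delta = reference.delta(available)

# Only after the same bundle reaches production is the reference updated.
reference.record_production_deploy(delta)
```

Keeping the delta as part of the build, rather than recomputing it at deploy time, is what lets last week's build go out with last week's patches.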
So the idea was that she would do that, and she actually ended up writing the whole script in the end. The thing that we did over there was this: we know what was deployed to prod, and we have just pulled in stuff from Red Hat, so we get the delta. That set we would always promote.

This is an interesting question you raised. It made me realize I took this for granted. You see, for us, this whole thing is our deployment bundle. Imagine this whole thing, our deployment bundle, being promoted in every environment. Whether dev or QA or pre-prod or staging or perf or wherever, you would always deploy this. Always. And we had the ability to just revert it back to a known good state. It depends upon the environment and tooling and all of it, but this would be our deploy bundle. This way, if there is a conflicting OS patch, if for some reason we are using a system function that it changes, we realize that well in advance in our testing environments, like the automated testing environments. It would not be a surprise in production.

So at this point, in your in-house environment, you could connect to the internet, so that's not a problem?

We would run a script to do a yum update. Not all of our environments can connect to the internet, but the one which is pulling the patches can. We would bundle the RPMs.

Okay, but are you using a Satellite, or are you creating your own yum repository?

Our own yum repository.

So you are actually editing your yum XML in-house?

Yeah, we would update the package manifest. That is correct.

The challenge is when you push it into the customer's environment, which is sealed off from the internet. Then the challenge is how you push it out.

That becomes part of our build. When we push our deploy bundle, all the dependencies will get pushed, and one of the dependencies will be the OS patches.

So let's take a point ahead and answer this. It bears mentioning, and I think we forgot to mention it: a lot of these things were all delivered as GPG-signed RPMs. So what that
means is, when we come up with a deployment bundle and, let's say, move into a different environment, into that environment's yum repository we would add these GPG-signed RPMs and regenerate the package manifest. We did not have a Satellite license and all that. This was our plan B. And everyone was fine with it, because we have full traceability of the GPG-signed RPMs. So it was good. Yes, sir?

So I can answer that right away. We never had a Satellite. We just did not have a Satellite. What we had to do instead was purchase Red Hat licenses. For example, in the provider's environment, when our client subscribed over there, they were already paying for those licenses. In the data center that we provided, we had already purchased Red Hat licenses per physical VMware box. So, for example, there is the Dell hardware, with VMware to run the hypervisor. We could have done Red Hat KVM, but no, we chose to stick with VMware. So we also purchased something called the Red Hat data center edition. What that gives you is that it lets you run up to 500 in parallel on one physical server. So that is the approach that we took.

We actually have a ticket filed with Red Hat. They are moving from the old registration model to this so-called new one. We told them, "Guys, when we register with you using your new approach, we want to find out what packages should have been applied to this machine." That is not a solved problem yet. I still have this ticket. It is still open a year later. So we had to resort to generating our own delta manifest.

See, the approach that we really wanted... I would like to take this a bit further, because you mentioned Satellite. For those who have not used this Red Hat Linux thing, here is what happens; it is very interesting. They give you this subscription model. If you have computers that cannot connect to the internet, they sell you something
separate called Satellite. That is cool. Suppose you do not have Satellite. Then what they urge you to do is something called an offline registration. But you see, now that we are moving to the new model, that offline registration, and getting what is applicable, does not work. Ideally, what you should be able to do is take a computer, register it with Red Hat, and then go to the dashboard and say, "Tell me what packages are applicable over here." That way you should be able to prove to an auditor that it is patched and up to date. But we did not have that.

It is a procurement thing. If in the procurement cycle, or whenever, you did not say, "I need this software," you are not going to be able to buy it. Worse, you cannot even give it for free. We had a situation where we had to give a piece of software, and it cost just $200, but there was a lot of paperwork. So it was unfortunate for both of us. We went through it. But you see, in the government space you have to say at the start, "For this project, I need this software." If your required software is not in that list, it is unfortunately a hard time for one and all.

Right, so I have two more short questions. The first one you should be able to answer. Is there a license, or is there any way... I mean, if you test, and you test bigger environments, you create 50 or more instances and then you destroy them again. Is there any development license from Red Hat where they say, "Hey, you can have Red Hat Enterprise and you can do whatever you need to do in your development stuff"? I mean, I know that they have a developer program where you can have one license for yourself.

So that was the developer program license, but that is for one instance. Now, you see, in our development environment we had around 250-odd virtual machines. So we put in the money and just bought a production-grade license even for our dev environments. What to do? That is the first question.

I don't know if you have heard of aptly, which is a tool that solves this problem in the deb
space?

Yes.

The problem is that you have a public repository which is run by some people, and they update packages and delete the old versions.

Yes.

Which means that once they do this, you can't recreate a machine in the same state as it was in the past. I use aptly for this.

Yes.

I know the guy behind it. He wanted to integrate the yum part into the tool so that you can use the same thing. Which means you have a local mirror, and then you can do a snapshot and refer to that. At least, I use snapshots.

Yes.

You have to tell them why you are not developing your software in open source. It's something I think you should all tell them: please open source your ideas. She is not doing it yet. Go into that open source project and offer your help. It's a nice Go project. And he is looking for people to help maintain the project.

So maybe I'm going to be presumptuous about this. We have worked on a particular project where, simply for the fun of it, we pulled a repository from an unauthenticated source and pushed it down to CentOS. Simply for the fun of it, not for commercial reasons. We trashed it. We repeated the process. It's a bad idea, as long as the source is not authenticated. Say you have 10 servers and one server doesn't have a license. If you mess up the other 9, you have to take your patches down, so you have trouble. You can prepare in advance, but your patches are gone. So nonetheless, I think you can spend your time with the local account manager and work out some creative arrangements. Suffice to say that if you are going to manage your own, that repository is very dangerous, because there are a lot of unauthenticated Red Hat repositories out there, some which seem authentic, some which are not. And it's not only the yum XML, but also the authenticity. So granted, we can do GPG, but we could also get things which look authentic. So once you have the right licenses, you have to go through that process.
Because for us, we have tried local repositories, we have done the Satellites, we have pushed it out, we have done the scripts, and it worked for us. For us, the challenge with OS patches was this: Red Hat can supply the list of applicable patches for us, but even if Red Hat were to supply it, we cannot just apply all the applicable patches on production directly. So if we take a build, we have to identify a set of patches which we have tested and only apply those. That's where the challenge for us was. I might have gotten that list as of today, but if I choose the build from last week to be deployed to production, then the OS patches should be of that build, not of today.

Just in the interest of time, can we save the questions for later? Because you do have another speaker as well. Okay.

Yeah, so I guess what we want to say is: if you can demonstrate auditability and traceability, if you can demonstrate proper test coverage, if you can demonstrate that you are able to do this fast enough, and that you are not just deploying to prod because you have the automation, if you can demonstrate things like that, then you can have continuous delivery. CD is not continuous deployment; it is continuous delivery. Be able to deploy. Be prod ready. Not "my tests pass, I am going to prod." It depends on the application, of course. But, you know, not everyone in the world is one of these e-commerce sites where they will just roll back.

Actually, one interesting thing. We said the measures of success are different, right? So why do we need such a robust CI system? Because, even more than the private sector, government has to respond to change very quickly. Control is what it is all about. If something goes wrong, I want to be in a controlling position. I want to control what is there. Or if I want to put in a quick fix: it may not always be a roll back, it could be a roll forward, but I want to deploy quickly. How long does it take to fix things in production? That is why we may
not deploy every hour, or even every day, but when we have to deploy, we should be able to deploy within the shortest time possible. So we have two interesting things that we did not mention here. One thing that we explained to all was: we are going to roll back only if we cannot roll forward. Because we have such fast turnaround times, we would prefer to roll forward rather than roll back.

While we have so much testing and whatnot in place, there were other departments or other agencies that we connected to. They provide their own API endpoints. They will tell you something. They may not have a staging environment; that is their reality. So you test with them in prod, and: "Hey, you said you would give me a 302, but you are giving me a 200. What does this mean?" "Sorry, it is an issue." But you see, someone else had signed it off. So even though you may call it a defect, it was signed off, which means it is now a separate change request, and God bless everyone on when it will get done. So we cannot wait for all that. We have to turn around quickly. That is why it was to our advantage to have all of this in place. It is a very important thing for us. And just to reiterate: we said our goal is to be able to roll forward and not to roll back.

Do you want to take this?
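As an aside on the patching point made a little earlier (a build should go to production with the OS patches it was tested with, not with whatever happens to be applicable today), here is a minimal Python sketch of that idea. The manifest layout, the build ID and the package names are hypothetical and made up for illustration; the talk does not say how the team actually recorded the tested patch set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildManifest:
    """Hypothetical record kept with each build: the OS patches it was tested with."""
    build_id: str
    tested_patches: frozenset


def patches_to_apply(manifest, applicable_today):
    """Patches that are applicable today AND were tested with this build."""
    return sorted(set(applicable_today) & manifest.tested_patches)


def untested_patches(manifest, applicable_today):
    """Patches listed as applicable today that this build was never tested with."""
    return sorted(set(applicable_today) - manifest.tested_patches)


# Illustrative data only: a build from last week and today's applicable list.
last_week = BuildManifest(
    build_id="build-101",
    tested_patches=frozenset({"openssl-1.0.1e-42", "glibc-2.17-106"}),
)
applicable = ["openssl-1.0.1e-42", "glibc-2.17-106", "kernel-3.10.0-327"]

print("apply with this build:", patches_to_apply(last_week, applicable))
print("hold back (untested): ", untested_patches(last_week, applicable))
```

Keeping the tested set next to the build is one way to make the deployed patch level traceable per build; an auditor can then be shown both what was applied and what was deliberately held back.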
In government... In a typical e-commerce environment, or in the private sector, if it is a good build, we can just go ahead and deploy. That is the norm. But in the government space, can anybody deploy a good build, given just that it is good? We ran all those tests, our performance testing has passed, and testing has passed, so we can deploy? No, unfortunately, because there is an approval process. It is there and we just have to do it. So how do we still achieve CI given this constraint? Authorization is embedded within our deployment process itself. We have somebody called the environment owner. For a given environment, he will be the one who has access to deploy, and he can just deploy as he likes. But if it is production, then our product owner is our environment owner, so she or he has to approve that. Even given that it is a good build, it still can't just be deployed, because there could be policy changes. Before government can roll out policy changes over the internet, they have to announce to the world that there is indeed a policy change, and the time period in which they can deploy. So it can't just be deployed because it has passed all the tests.

So how do we embed authorization? This actually goes through a ticketing process. As our product owner approves a ticket to deploy a specific build, we will have a tag that identifies the changes that have to be deployed. Before deployment we have additional checks which ensure that the tag matches the set of changes that is coming in, and only if it matches is it going to deploy. That gave our product owner a lot of confidence. It also gave us visibility, because our product owner can clearly see what is in production right now and what changes are going to go through as part of this future deployment. She'll have clarity on those requirements, and if they are ready to be deployed, then she will approve it, and then it will be deployed. And as it gets
deployed, she can see and track that these are the changes. When I say track, that includes the functional specifications, OS patches, everything that's in the deploy model. Everything is traceable.

Yeah, Muthu, you take this. Okay, I will try.

So, security. Yes, we know security is important for government applications. Penetration testing, like we pointed out earlier in our pipelines, is part of our deployment process. Every good build, if it has to be deployed, will go through penetration testing. If there are issues, the cycle repeats: we fix it, we get a new build, and then we redo the deployment. So we need security clearance before a new build can be deployed. And it's not just once in a while, that just before production deployment we do the security testing. It's embedded in our development process. We had a security tester in our team who regularly tests all our changes and makes sure that we don't have any last-minute crisis just before deployment. We made it mandatory: security clearance is mandatory for deployment. And in all the two years that we spent on this project, our external pen testers didn't find even a single security... I mean, they found stuff, but it was not really relevant to us: some other front-ending load balancer or whatever.

Performance. Performance of any application is important for user experience. Because customer satisfaction is of high importance to the government, that's really important. And even otherwise, our payments are linked to the performance SLAs: the uptime, how fast the response times are, all of it. So how do we ensure that our system doesn't break or slow down due to some of the changes that go into the new build? That's part of our development process itself. One of those pipelines that we saw earlier is a pipeline that generates automated performance testing reports. We can run it as often as we want to, but we run it
every day. So all the changes that went in today will be run tonight, and will go through the performance testing pipeline. That will test whether our functionality is right, whether all our load times and response times are good, whether the database is good, whether memory is good on the new build. All of it will be tested, and you'll get automated reports on that. So even if something breaks, we have managed to set...

And it's practical. I must tell you that, as a sysadmin, I really enjoyed this project, not just because it taught me a lot by way of due diligence in the security space, but also in the performance space. Just imagine: we had SLA parameters like CPU utilization. They had a limit called "warning." Warning was set to 60%. We had performance SLA payments and penalties linked to this, where it says you should not touch or breach warning for more than 3 consecutive 5-minute samples. Like, I sample now, I sample again, and if this third time also it has touched or exceeded 60%: performance SLA breach. You don't get paid. How awesome is that?

And this payment was... you know, you have this thing called... It was very interesting for us, because unlike many other projects... I see people laughing. So look, I'll tell you something interesting. There is a concept called the performance guarantee period. The intent is, in the waterfall approach, when you have finished developing something, you put it into a production environment, make sure all app issues and such are fixed, and then for a particular duration of 21 or 27 days there should be no breaches. After that period begins a warranty period, and then everything else is paid. Sounds good. For us, we had already gone live about 1.5 years earlier. By the time we were done replacing the last functionality, our first deployment had already taken place some 21 months before. So in all that time our app was actually live and we kept maintaining performance. And yet, because the contract
said so, even after our last deployment we had to sit through a performance guarantee period like that. I mean, we have all of this, but the contract says so. So no problem, what else to do? You have to go through it.

Yes, sir?

The process where you spin up the... But at the same time you're saying that for 21 months your performance has been running?

No, not performance. We were live in production for 21 months. The context is that we are replacing an existing web app's functionality module by module, and then we are just going through all of it.

No, my question is about the performance. The thing is, in reality, most of your production systems fail because of the load in the actual environment. The load, or the kind of transactions happening in production, is not similar to what you find in your UAT or SIT. Do you back up those transactions, to have some mechanism to find those kinds of...?

Actually, that's a very good question. We missed mentioning it. Our performance test environment is the only environment which replicates the production data load. We have anonymized data pulled down from production and we use that in our performance test environment. If we have four servers in production, we have four servers taking the same load in the performance environment, and we simulate the same load. That's how we make sure the performance tests are really testing real-world performance. Plus, our performance guarantee was not on the basis of present load, but of five years from now. So we used to take the five-year projected load and run it now.

And your normal tests, also in your UAT one: are those having negative tests, or do they have some data down from production? Do you get that?
Maybe the load wouldn't be similar to production, but yes. That was more from a functionality point of view.

That reminds me: the system gets... not because of a new deployment, not because of changes, but probably because of how it was used. And then the new deployment, you can just scrap it and then put it... So that reminds me of the example of iOS 10. Apple came out with it, and they immediately had to come up with a patch, because it was not... to install all of the... although they had tested it 100 times with the same build, but not with the same data set. So something got screwed up. In fact, my phone was one of them, you know. So I reported that. I had to restore my phone and all of that. I didn't have the backup, I would say.

So even after all this, that is especially why we should have all these processes in place, so that we can respond to change and roll forward. With you on that.

I would like to go to the... Yes, sir?

I hate to do this, but we do have to... So maybe you can save it for after the presentation, where you can ask them directly.

Maybe the very short one I can ask: how many people worked on this project?

Beyond how many were there... when we were at most, there were 50 people in the room. That was the whole application, 47 or something. But to do just this work, we were anywhere between 4 and 5 of us. I would say the first cut of it took about 3 months, and then there were improvements as we went along.

I would like to talk about this. It's our last slide, actually: working with infrastructure constraints. This has been widely published: the government has decided to make sure that a lot of networks are segregated. And I'm sure they must have some major insights into what else could go wrong. But we also had to work with it. So you see, we didn't have things like automated promotion of artifacts into other environments. The environments in which we built were not connected to anything by which we could promote the artifacts. So in this project, we just survived it. It was
unfortunate. We had to do stuff like: connect over here, take your government-secured hard disk drive, download artifacts into it, connect somewhere else and upload it. We were not too worried because, thanks to a lot of our tests, we did not really ever have to encounter very many bugs in prod. And when we had to deploy, we would deploy on a Wednesday. That was our day; we go live with a new build. So we were kind of fine with that. Since then, there are some other government-linked companies we are in conversations with, and what we have had them understand is: yes, you can and perhaps should have network segregation, but you can also think about automated promotion of assured artifacts across these environments. That is a discussion that has now begun, and I am involved in those things. That's one.

The second thing is... do you want to take this, the ticketing system?

Yes. So, if it's our own data center and we are controlling the complete infrastructure, then if you want to switch a service from port A to port B, we just open the firewalls and we just get it done. I mean, it's a non-event. We just change the infrastructure and are done. But unfortunately we had to rely on some other infrastructure provider, which means we have to go through a ticketing system, and it might take anywhere from days to months to get even the smallest infrastructure change made. And we just survived with it. I mean, we handled it in a different way. Like, we would use iptables forwarding, or whatever resources were available; that is just an example. We just work within those constraints and make it happen. It may not be ideal. It is not what we would do if we had control. But given this constraint, we just have to make do with it.

This is another thing: limited access to production. We had to figure out some creative ways to handle things like how you find out where the system is flawed. This is a strict no, in terms of so-called segregation of duties, which there are better ways to handle
it nowadays. We had to work around those systems. We had to work with those systems. Because of our success over here, our clients were highly supportive. They would often sit with us while we diagnosed and debugged prod issues, and we would get into those boxes and review logs and whatnot. There were those of us who were... so that we could review a lot of logs ourselves. What we realized was that the sheer pain of sometimes getting into certain boxes was so much that we tried to compensate by making sure we had a lot of monitoring, auditing rules, logging and all of those things in place. We had an intense amount of logging and log alerts and whatnot. This is something that we were pretty pleased about.

We had a lot of proactive alerts on possible issues, for situations where network service providers in between would suddenly change firewall rules, or install a patch, or apply some random policy which causes issues for us, and then it's a great circle of trying to figure out whom to talk to. Instead of users telling us, we made sure that we had all manner of checks all along the way. For example, it's a very well-known thing: if you have an IDS and you're making an OpenSSL connection somewhere, a simple telnet port check is not enough. The IPS will respond to you and say, "Oh yeah, I can connect you there." You have to do stuff like run an OpenSSL client, connect to the other side, download the server-side certificate, compare it with the last known cert, and raise an alert in case there are changes.

We had to do stuff like that. When the OpenSSL vulnerabilities just kept coming up, we got into issues where, one day before we went live, or maybe two days before we went live, they had suddenly gone and changed the SSL policies. They just dropped SSL 1 and 2 and said it's just SSL 3. Or maybe it was SSL 3. Yeah. You see, we were still on an older build of Red Hat Linux, and there were certain apps we were using that were not able to recognize this new protocol and cert enforcement.
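The certificate check described a moment ago (connect over TLS, download the server certificate, compare it with the last known one, alert on change) can be sketched in Python roughly as follows. This is an illustration only, not the team's actual tooling: it uses the standard library `ssl` module in place of the OpenSSL command-line client, the host name `partner.example` is a placeholder, and comparing SHA-256 fingerprints is an assumed way of doing the "compare with the last known cert" step.

```python
import hashlib
import ssl


def fingerprint(pem_cert):
    """SHA-256 fingerprint of a PEM certificate, computed over its DER bytes."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha256(der).hexdigest()


def check_endpoint(host, port, last_known_fp, fetch=ssl.get_server_certificate):
    """Fetch the server certificate and compare it with the last known fingerprint.

    Returns (changed, current_fingerprint). The default fetch performs a real
    TLS handshake, which is the point: a bare TCP port check can be answered
    by a device in the middle. `fetch` is injectable so the comparison logic
    can be exercised without a network connection.
    """
    pem = fetch((host, port))
    current = fingerprint(pem)
    return current != last_known_fp, current


if __name__ == "__main__":
    # Offline demonstration with stand-in bytes instead of a real certificate.
    stand_in = ssl.DER_cert_to_PEM_cert(b"stand-in certificate bytes")
    known = fingerprint(stand_in)
    changed, _ = check_endpoint(
        "partner.example", 443, known, fetch=lambda addr: stand_in
    )
    print("ALERT: certificate changed" if changed else "certificate unchanged")
```

In a real monitor the last known fingerprint would be stored somewhere durable and the check run on a schedule, with the alert going to whoever has to chase the network provider.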
And we were just in a very, very bad situation. And we were on an older build of Red Hat again because of our infrastructure provider. Yeah, so it was really messy for us. We had to work at the last minute: replace one thing, then another, change a bit of the app, God knows what not. Okay, it was not a day before, but four days before go-live. But, you know, what the hell. So, something to live with.

But proactive alerting and monitoring on a whole lot of things is also helpful for customer satisfaction, because even before the affected customers call in, we know there is an issue. And if we tell our clients, they would trigger the standard SOP. They'll get the call centers ready; they'll be prepared for customers. And if there's something that we could do to fix it (it's not always out of our control), if it is in our control, then we could fix it before someone is affected. One more thing: after we deployed our application, compared to the legacy system, the cost of the call center for issues came down drastically. I don't know the exact numbers.

Yeah, we're not sales, so we'll not throw out some random "calls reduced by 30 percent" or whatever. But they were reduced enough that people were wowed.

And I guess this is the last slide. In summation, what I would say is that I personally kind of enjoyed it, because I ended up becoming a more quality-minded person. My due diligence levels have improved vastly. When I document or read something, or even when I respond, I now think about multiple aspects. I have worked in many countries around the world, and I was really, really pleased. And I guess I kind of understood the Singaporean spirit of wanting to always be the best in the world. It doesn't matter how much; we have to be the best. Or, you know, "Hey, this had better not regress." And when you look at how well the country is run and everything, you realize it's all these people, the civil servants, doing their best. They're doing the best that they can. I walked away very impressed. I
must say I was really impressed.

Please come, ma'am, sir.

I just want to tell a story about the civil servants. Thanks to our customers... Our product owner actually shared her own credit card details with a random user, because they were not able to pay with their expired credit card in our app. I mean, she really wanted to help the customer, and to understand that there is this concept of expired cards. So she shared her credit card details with a random user so they could pay and use this application. That's how much commitment the civil servants have. We were really, really, really impressed.

Because of a lot of our proactive alerting and logging and everything, we got into situations where we would notice something: we'd get an alert, the user is still trying it out, and we'd tell one of these civil servants who we are working with. They'd give the user a call and say, "Our system has detected that you're trying to do something. It's a mistake from our side. Sorry, please try again in five minutes," or whatever, "and we'll have a fix ready for you."

I guess, as a last closing note, the success of all this is shown by a case with Muthu, our BA and the product owner. Something had happened, and the fix was either approach A or B. And they just looked at her and said, "You have to tell us. We'll be done in an hour, but you have to tell us: do you want approach A or B?" They were not used to this. They were used to a life where they'll tell us today, we'll have a triaging meeting next week, and it'll be delivered in some other month's delivery cycle. But here we scared the hell out of them: "You tell me." It was pretty cool to be able to do that.

We're done. Your last question? Yeah, you had a question.

Problems come from the application and the systems, but you said that you automated a lot. So were there any kinds of problems coming from the automation side, where you also have updates of the automation software, scripts going bad? You have a lot of things running somewhere in the background. Hard to see what's going on.
Man, it was the wildest bad adventure. I have dealt with stupid things, like IDS signatures changed, like random patches applied, some HP software changed.

You never got to a point where you found that the automation part is stabilizing?

So you know what? In all of these, we ran the entire automation scripts. So if we had an issue, we'd catch it here, or even earlier. We'd not catch those in prod. But there were other things in prod, where they just changed things beyond all of our automation scripts. A really bad number of issues: firewall policies changed, some patch applied by somebody else, some HP thing had been revised so it hung. One of our PGP (performance guarantee period) SLAs once got breached, and it was only after a lot of begging that we could make them see that there was a SAN-level failure. It was miserable. Very miserable, I can tell you this.

Yes, sir?

So how long does it take to complete the full automated pipeline? And is there really no manual testing involved? Because as I know...

We do exploratory testing. The thing in our world is... do you see this QA? So we have dev, QA, pen testing and showcase. And by dev I don't mean just a dev environment; the whole dev is this whole flow. But then the QAs click, deploy the environment, get the app in it, and perform the exploratory testing.

That is correct. So the pipeline doesn't complete in just one hour. It can be more than one hour. It depends.

Hey, you know what, this is a very interesting thing. I must say, the tool and the technology is maybe just 10% of the effort. A lot of making these sorts of things happen is people and process and getting the right mindset. You cannot buy some stupid enterprise Jenkins or, you know, some IBM UrbanCode or some dbdeploy or whatever, and say, "I have paid 15 million dollars, I'm sure we've got continuous delivery." Why?
Because my Excel spreadsheet says so? Cannot, like that. For a lot of these things... I really like the question you asked. You see, for us, right from the time the dev commits and does all these things, the QA is involved; even before, in the requirements phase itself. The security guy is involved in the requirements phase. When the devs say, "I've done all of this, and yeah, it's ready for showcase," before that the QAs come to the dev machine and do a dev machine check. Like, right there: okay, you committed it, you think it looks good. Before we move it to done, does it actually work? We call it a desk check, a dev box desk check. And then, before we showcase (we showcase to the client every Tuesday at 5pm), we do all of those things.

Right, so if that is going to be handled by QA, can we still quickly respond to change within, say, 1 hour?

If we are going to quickly respond to change, then it's just one issue that we are focusing on at that time. Like Ram said, QAs will be involved even in the requirements phase. While a dev is fixing the bug, the QA will know what the bug is, and he would have come up with all the possible scenarios that he would have to test once it is fixed, and so on. The QA will quickly pull it and test it while all these automation tests are being run. So it's parallel. So it's still short.

So the QA actually worked one sprint behind the dev?

No, no, no, in parallel with the dev. There is no lag as such. They just work in parallel with the dev.
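Going back to the CPU "warning" limit from the performance part of the talk (60%, sampled every 5 minutes, with a breach after consecutive samples at or above the limit), the rule can be sketched as a small Python function. The function name and the sample data are made up for illustration. The speaker's wording leaves open whether the breach is declared on the third consecutive sample or only after more than three, so the count is a parameter; the default of 3 is an assumption.

```python
def find_sla_breach(samples, warning=60.0, consecutive=3):
    """Return the index of the sample that completes a breach, or None.

    samples: CPU utilisation percentages, one per 5-minute interval.
    A breach is `consecutive` samples in a row that touch or exceed `warning`.
    """
    run = 0
    for i, value in enumerate(samples):
        # "Touch or breach": a sample exactly at the limit counts too.
        run = run + 1 if value >= warning else 0
        if run >= consecutive:
            return i
    return None


# Illustrative samples: two short spikes, then three in a row at or above 60%.
cpu = [40, 61, 60, 59, 62, 65, 60, 30]
breach_at = find_sla_breach(cpu)
if breach_at is None:
    print("no SLA breach")
else:
    print(f"SLA breach completed at sample {breach_at}")
```

Running the same check against nightly performance-test output, and not only against production, is one way to see a regression coming before it costs a payment.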