So my name is Deepak Jain. I have around 13 years of experience in the architecture, design, and development of software applications for financial institutions: high-resiliency, high-availability applications, and some trading applications as well. And before I start, I have a confession to make. As much as I love technology, my two passion areas really are solving business problems using technology and managing people: how do we get people to collaborate, work together, and deliver that business value to our end users?
So why do I want to talk about this? The theme of the talk today is failure. And it's not just about our own failures. It's about learning from others' failures as well. This is a strong community here, and it's important that we learn from each other's failures. So I wanted to confess some of the mistakes we made when we were trying to implement DevOps for an organization.
Before I get started, I just want to introduce the problem statement. Let's call the bank Global iBank, because I can't name the client here. We were working for the credit derivative business of this bank, and a small function of that business was clearing and settlement. Now, do people here know what settlement is in terms of trading? Anybody? I see one hand, but I'll talk about it. When you execute a trade on the market, that's where you're making most of the money. And what happens is that cash flows are generated. These cash flows could be one-time or they could be periodic. There are multiple currencies, different countries. There could be a lot of different scenarios. Settlement is the process where those cash flows are actually settled, and an actual transaction takes place with your bank.
So it's a fairly complex system with different components: netting, rules, straight-through processing, and so on. I just wanted to put this picture up to give you an idea of the complexity of the problem we were dealing with. We had a team of 15 developers, seven QAs, three DAs, and one PM. The application was fairly large, about 1 million lines of code. We were an agile team with two-week sprint cycles. We released every three months, not every iteration. And the release cycle itself was definitely a problem. The regression was a problem. It used to take us one day to run all of our regression tests. It actually took eight hours on, I think, 10 machines at that point in time.
So one day, about a year and a half back, a client MD came in and said, "Hey, I'm hearing a lot about this word called DevOps, and I want to get into DevOps. What is it that you can do?" And that's when it all started. We made a lot of mistakes as we went along this journey. Let me start with the first one. I have a video to show what the problem was.
"DevOps is a mythical beast with 22 eyes." "DevOps is actually an old Ukrainian casserole that my grandmother used to make. This is when she was alive." "Dev is the Greek word for 22." "DevOps is, it's like that fancy term that project managers use around the office." "DevOps is short for Devon Oppston, the creator, the founder of hashtags. He invented the first hashtag in 1977." "It means deviant options, and it refers to distracting things in the office, like smartphones or Facebook or Pottery Barn catalogs, stuff like that." "It's how you make a development. How do we make opportunities in development?"
Thank you. So we were not at this level, I think, when we started looking at it. But we were not very far away from it.
So when our MD talked about DevOps, there were a bunch of people in the room from the ops team, the architecture team, the release management team, development, and QA, and everybody had their own understanding of what DevOps is. And we always want to look good in the eyes of our bosses. So as soon as we came out of that room, everybody started on POCs on their own front. One team started doing POCs on Chef and Ansible, trying to see how they could automate their infrastructure. The QA team started looking at how to automate our performance tests and load tests and how to integrate them into the build pipeline, and so on and so forth. So all of our effort got scattered right at the beginning, we had a very slow start, and that hurt our progress from day one.
I'll give you 10 seconds to read this. So I read Dilbert a lot. Whenever I have a problem, I look for a strip that matches it, it kind of gets me the genius of Dilbert, and I can solve that problem easily.
But what happened? The second thing we did wrong was that we did not talk to our people about what was happening. As soon as we started doing DevOps, or started thinking about DevOps, we realized that there were a lot of new tools, a lot of new platforms, a lot of new stuff that we had to learn. As developers, we need to understand how monitoring and operations and all of that works. Operations has to understand how development works. And we did not talk to our people enough. We simply pushed them: why don't we start doing this, why don't we start doing that. The impact was that within the first two months, I actually lost 25% of the people on my team. It was a setback. I had a couple of rock stars in there who left, and it was not a very happy situation.
It probably turned out better in the end, but at that point in time we took a hit.
Ten seconds again. So, the next thing that we didn't do very well. I think Yagnik was talking about silos in an organization. In a financial enterprise, these silos have been there for so many years. They have been working in a certain manner for so many years. It's very hard to break these silos. It's very hard to get them to understand how a transformation is going to help them. You're looking at people who are from legal, who are from marketing, this is a client-facing website. You're looking at people across areas, and they are not ready to jump into this. They are saying, how is this going to help me? How is this going to solve my problem of making sure that people are ready to listen to me, or that I'm getting more customers to my website? And that didn't go very well when we started.
Now, as much as I tried, I couldn't get Dilbert to push the elephant for me. So I had to get some of my colleagues to do it. But that's the next mistake we made. Even though settlement was a smaller part of the entire application, it was still a huge application. And when we started, we thought that if we tried something and showcased success early on, we'd be able to prove that this works for the organization. What we didn't realize was how difficult it was going to be to do this for the settlement application. I talked about the regression suite taking almost eight hours on three different machines to complete. That itself was a blocker for us to get into a development-to-production push kind of mode. We were not able to showcase the success that we wanted to. We had to go back, rethink our application design, try to break it down, and work on it.
So the next thing that we didn't do very well was look at all the risks.
Now, as a technical manager, I would look at a lot of technical risks, but I think we overlooked some things. For example, one of the risks we overlooked was information security. We started using ELK for log centralization, and all of our logs were coming into one place. We were able to debug with that. We took it very successfully through the QA environments. And we were about to deploy it onto the staging environment. What happens in a financial organization is that up to the QA environment, the data is always scrubbed. But as soon as you enter the UAT environment, because it's the end users who are testing there, the data is not scrubbed. So now our ELK server was getting logs from the UAT environment, which had all the client information and all the trade information, and these were real trades. That was a huge information security risk. The information security team kicked in and said, this is not something that you can do. Immediately, we were asked to either roll back ELK, shut it down, or remove all the logs from the application. None of those was an easy task for us. And that, again, put us in a lot of trouble.
The next one is about governance. So I'll go back to the same silos of a financial organization, right? I'll take an example here again. Similar to log monitoring, we also wanted to do metrics monitoring for our application. How is the server performing? How is the application performing? We had a lot of application metrics as well. So we picked Grafana as a monitoring tool and said we'd implement it. We set up Grafana in the local environments on a Unix box, and all the metrics were being collected from the QA environment. What we missed was that the client's actual application was deployed on a Windows platform, and the client team didn't have the capability to manage Grafana on a Unix box.
They said, we don't have a Unix box, we can't take it there, we cannot deploy it, you can't do this. So again, we had built something that we were not able to take to a staging or a production environment, which was again a roadblock.
And if we have Dilbert, we can't leave out Dogbert. So that's the last confession that I have to make. When we started doing this, one of the things we didn't do very well was ask: how would we measure success six months down the line? How would we measure success one year down the line? Is there something that we can showcase to our business users? Now, in our organization, we have users ranging from the technology lead, the MD who's supposed to deliver success from a technology standpoint, to a business lead who's actually giving you the money to do all of this. And we didn't have any metrics to show the end user or the business owner that this is what we had achieved in six months. As time passed, he was on our heads: I'm making so much of an investment, I'm giving you so much money, what's my return on the investment? And we didn't have a good answer for him, because we didn't have any metrics to measure our success.
So one of the things I did is I intentionally kept the solutions that we used to get through these pitfalls out of the scope of this conversation. My perspective is that, since we're talking about people and we're talking about a business problem, every context is very different and each solution has to be relevant to that context. There's no silver bullet that's going to help you here. So as you go along, find the solution to the problem. I just wanted you to be aware of some of the challenges that you can actually face in a financial organization and an enterprise organization.
So that's what I have. Thank you. I think I finished very early. Any questions?
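As an illustration of the information security pitfall described earlier (unscrubbed UAT logs reaching a central ELK store), one option is to redact sensitive fields inside the application before log lines are shipped anywhere. The Python snippet below is only a sketch under stated assumptions: the field names (`client`, `counterparty`, `account`, `notional`), the `key=value` log format, and the logger name are all hypothetical and do not come from the talk, which deliberately leaves the team's actual solutions out of scope.

```python
import logging
import re

# Hypothetical sensitive keys; a real system would derive these from its own data model.
SENSITIVE = re.compile(r"\b(client|counterparty|account|notional)=([^\s,]+)")


def scrub(message: str) -> str:
    """Replace the value of every sensitive key=value pair with a placeholder."""
    return SENSITIVE.sub(r"\1=<redacted>", message)


class ScrubbingFilter(logging.Filter):
    """Logging filter that redacts the fully formatted message of each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so values passed as %-style arguments are redacted too.
        record.msg = scrub(record.getMessage())
        record.args = ()
        return True  # keep the record, now scrubbed


# Attach the filter to an application logger (the name is an assumption).
logger = logging.getLogger("settlement")
logger.addFilter(ScrubbingFilter())
```

Scrubbing at the source like this means the same log pipeline could, in principle, be pointed at scrubbed and unscrubbed environments alike, although whether that would satisfy a given information security team is a question for that team.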
What made you choose Grafana for monitoring, and do you really do real-time visualization and metrics viewing across sites, across geographical locations?
So what made us choose Grafana? I don't think we had really thought about why we wanted Grafana. We started off with Grafana as a POC, and we started thinking about it in terms of what metrics we wanted to capture. In addition to the server metrics and the JVM metrics, Grafana gave us the capability to capture metrics from our programs. How much time does it take for a trade to get settled? How much time does it take for trades for a particular counterparty to get settled? For all of those metrics, Grafana had a good interface for us. We were able to visualize them across different regions, and that's why we went ahead with Grafana. After that point, I think we couldn't move forward with it once we got that blocker from the infrastructure team, but until then I think we were fairly good with it.
Hey Deepak, thanks for the talk. I think it's really interesting to hear from someone in a completely different domain. A lot of talks were from big internet companies, which have, I guess, different challenges. So my question is this. I think a big part of DevOps is the cultural aspect, where everyone has to be on board. But you also talked about things like governance and approvals. There will be some resistance to change in a financial organization, but how much of it comes from regulations imposed by external regulators, and how much comes from internal resistance to change?
So regulations, or external regulations, in my view are really business logic, right? What they tell you is: send me this data, you cannot do this trade, you cannot do that. Those are really business requirements in my head. I don't think any business requirement should have an impact on how you deliver that business requirement.
So most of the challenges we face with the clients I interact with are primarily political, and a lot of the time they arise because people don't understand what we are really trying to do. Take the silos example. One of the things we have tried to do since then is go to each of the teams, talk to them, and help them understand what we are trying to do and how it is going to help them as well. Then we form a smaller sub-team with representatives from every team and take the process forward. That actually helps a lot. Having a conversation, in my view, is always very helpful in any organization.
Yeah, hi. Get down. So you said that your regression cycle takes one day, right?
Yeah.
So how did you optimize your regression cycle, and what tools did you use?
So, the regression cycle. It's a fat-client application, a Swing application, and the test tool we were using could not run multiple instances of the client on the same machine. That's why we had to run the tests one after another: one machine could run only one test at a time, and that's where the time was going. We were trying different options for how to parallelize that. What we have done very recently is use Docker to create multiple client instances. Our infrastructure is also moving at this point from a Windows base, which was in Ireland, to New York, and it's a cloud-based infrastructure where we have Unix boxes. So we've started using Docker to replicate the client instances, and we are trying to run multiple tests at the same time against the same server. That is working well for us now, so we achieved success there.
Thank you, Deepak. General reminder.