This is a talk on going from waterfall to weekly releases. My name is Tathagat Verma, and I'm going to talk about one of the experiments we did about ten years back. We had a waterfall-based process for a product I was managing, and we experimented with some ideas from Evo. Anyone familiar with Evo? It comes from Tom Gilb, one of the thinkers who has been around the scene for a long time. Unfortunately, a lot of his ideas never quite became mainstream, but I think Tom Gilb has done a lot of wonderful work there. And then Kanban, and of course I want to be careful when I use the word Kanban, because Kanban did not exist at that point in time. If you have read David Anderson's book Kanban, you will have read about how some of the teams at Microsoft in Hyderabad were doing some of this work. It is just a coincidence that we were doing similar work in Bangalore at the same time, at the company I was with then, which had been part of McAfee before we came out of McAfee. So these are some of those experiences.

In 2003, Tom Gilb had come down to Bangalore; he used to come to Bangalore very frequently. I attended a one-day workshop by him on Evo, and that gave us some ideas on how we could apply these things, because he actually talked about a weekly release cadence. At the time that was a very radical thing: going from a thoroughly waterfall way of thinking to a weekly cadence. How do we do that? So we took some of the ideas from Evo. As for Kanban, to be honest, we didn't know it; I would very honestly say it was serendipity for us. We discovered some patterns, we stumbled upon them, without really knowing what we were doing. So I'll talk about that and take you through the journey of how it unfolded.

Just a little bit about our product. We were in the network management domain, and since it's been ten years, I can talk about the company I used to work for. Anyone here from the network management domain? Anyone heard of Sniffer, the packet sniffer? This was that Sniffer; I was heading the Sniffer India engineering operations at that time. Basically you look at the TCP/IP packet, or whatever packets you are capturing. The payload, of course, is encrypted, so you cannot look at the payload; you look at the header, you play around with the header, you decipher it, you try to understand the traffic. That is what we were doing for a living. The specific name of the technology is deep packet inspection, along with protocol analysis.
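For anyone who has not worked at this layer, here is a minimal sketch of what header-only inspection looks like. This is a toy Python illustration of the idea, not anything resembling the actual product code, which was native Windows software:

```python
# Toy sketch of header-only packet inspection: parse the IPv4 and TCP
# headers of a raw Ethernet frame and never touch the payload.
import struct

def inspect(frame: bytes):
    eth_type = struct.unpack("!H", frame[12:14])[0]   # EtherType field
    if eth_type != 0x0800:                            # not IPv4
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4                          # IPv4 header length, bytes
    proto = ip[9]                                     # protocol: 6 = TCP
    src, dst = ip[12:16], ip[16:20]
    info = {"src": ".".join(map(str, src)),
            "dst": ".".join(map(str, dst)),
            "proto": proto}
    if proto == 6:                                    # TCP: ports from its header
        sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
        info.update(sport=sport, dport=dport)
    return info                                       # payload deliberately ignored
```

Everything the analysis needs (endpoints, protocol, ports) comes from the headers, which is why an encrypted payload is no obstacle to this kind of troubleshooting.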
The product was essentially Windows-based, and we actually sold it as an appliance: we had specialized hardware, with Ethernet, Fast Ethernet, and other interfaces. The appliances were typically installed in data centers for traffic monitoring, analysis, and network troubleshooting, and they were generally not put on the production network, just so you get the context. So we were not the most important lifeline of those data centers, but people still needed us whenever they had a problem, so that they could troubleshoot faster and localize problems much sooner.

The typical users of our appliances were technical folks: CIOs, network managers, network engineers. It was a typical enterprise play, and the selling cycles aligned with quarterly or annual budget cycles, because there was a lot of CAPEX involved, a lot of capital investment. That was just how enterprise software was sold ten years back. If I'm a sales guy, I have a quarterly sales quota, and before the quarter expires I have to make sure my quota is met, otherwise I don't get paid, right? So we would typically see a lot of deal-blocking feature requests coming in literally at the end of the quarter: hey, if you can deliver these three features, then I can sign this deal with this Fortune 100 client. That is how it typically happened, and many times sales would require customer specials, like I said: the deal would not be signed until we were in a position to commit that we could deliver that feature. So there was a lot of push from the sales side onto engineering.

In 2003, our old process looked something like this. We had customer bugs, prioritized based on multiple business parameters. Severity was a typical one. Impact on revenue: what revenue lying on the table is going to get impacted, especially if we do not implement the fix. Volume: how widely we had sold the affected product. The competitive angle: somebody else is pitching there, and if we don't do it, the deal does not come through. And the age of the case itself: how long it had been open. I will show you some real data from that time of how long some of these bugs and feature requests had been open. There was just a humongous amount of backlog lying there.
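To give a feel for how a multi-parameter prioritization like that can be expressed, here is a hedged sketch of a weighted scoring function over those same parameters. The weights, scales, and field names are all my invention for illustration; the real process lived in the program manager's head and spreadsheets, not in code:

```python
# Illustrative weighted scoring over the prioritization parameters named
# in the talk. All weights and normalization scales are invented.
WEIGHTS = {
    "severity":    0.30,   # sev-1 highest
    "revenue":     0.25,   # revenue on the table if we don't fix it
    "volume":      0.15,   # how widely the affected release was sold
    "competitive": 0.15,   # will the deal be lost to a competitor?
    "age_days":    0.15,   # how long the case has been open
}

def priority_score(bug):
    # Normalize each parameter to the 0..1 range before weighting.
    norm = {
        "severity":    (4 - bug["severity"]) / 3,         # sev 1..4 -> 1..0
        "revenue":     min(bug["revenue_usd"] / 1e6, 1),  # cap at $1M
        "volume":      min(bug["units_sold"] / 5000, 1),
        "competitive": 1.0 if bug["competitive"] else 0.0,
        "age_days":    min(bug["age_days"] / 365, 1),
    }
    return sum(WEIGHTS[k] * norm[k] for k in WEIGHTS)

backlog = [
    {"id": "ESC-101", "severity": 1, "revenue_usd": 250_000,
     "units_sold": 1200, "competitive": True,  "age_days": 200},
    {"id": "ESC-214", "severity": 3, "revenue_usd": 900_000,
     "units_sold": 300,  "competitive": False, "age_days": 30},
]
for bug in sorted(backlog, key=priority_score, reverse=True):
    print(bug["id"], round(priority_score(bug), 2))
```

The point is only that severity alone never decided the order; a lower-severity escalation from a large account could outrank a sev-1 from a small one.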
The team was in Bangalore, and we had the program manager in San Jose. The program manager would typically prepare the plan of record and then get buy-in for the various MRs. The typical process at that time was very episodic. We would have a service pack where we bunched up 50 to 60 issues, which could be feature requests or bugs, and every quarter we released them. So every quarter we are working on these 50 or 60 issues; they are not in any particular priority among themselves, they are just there as a bucket, and we have to get all of them done. Then if there was an urgent issue we would release a hot fix, and for a specific case we might even do a patch. The service packs had a very simple above-the-line, below-the-line arrangement, so we had some negotiating power with our tech support and sales folks: hey, if we run out of time, then we will not do these. Now, that sounds like a very simple process which normally should work, but we had tons of problems with it. I'll list out some of them.

We had a development team at that point which was supposed to take care of these issues as well. But what tended to happen, and I joined the company in 2003 and realized this, was that the developers were not willing to sign up for any of these service packs or bugs. They had already signed up for the dot releases, as we used to call them, and after that they just didn't have any bandwidth to take care of these issues. So they would not do any maintenance releases. We had a huge pile of customer escalations without a home. "Home" was a term we used to signify that a release vehicle had been identified: if there is a bug in my Bugzilla, then I know this one goes into 4.7 SP1, this one goes into 3.1 SP2. That was a home. We had a situation where a lot of these customer escalations were lying orphaned because nobody was willing to sign up for them; they had no home, really. And this was compounded by a high incoming field rate. Ten years back, the group I used to lead had a major backlog because there was nobody to work on them. Maybe a very high-priority issue would come in, but you will actually be shocked when I show you the real before-and-after graph of how long some of the sev-1s were lying there. We had sev-1s open for like 200 days, which was considered okay, because, well, people were willing to wait for them; it was not a SARS kind of emergency. See, an SLA is a different thing from severity, really; we didn't say that just because there is an SLA on something we would drop everything. Tech support was responsible for prioritization, not us, and they prioritized based on the multiple factors I mentioned. But the challenge, and this became the crisis for us, was that there was simply nobody to deal with such a high incoming field rate and all those sev-1 issues.

On paper we had a team; on paper I was making the commitment that we are going to do SP1, SP2, and we would try. But SP1 and SP2 were episodic in nature, so I had to wait for a whole quarter before I could ship something out. If in between one of my Fortune 50 customers came back and said, I need something, this is really causing us trouble, we had a serious limitation: we would have to drop everything else, get somebody to work on just that for two or three weeks, and somehow go back and deliver the patch. And then what we would realize is that, because our configuration management system was not robust, that patch would typically get overridden when we did the next dot release or service pack. The short answer is we did not have a lot of these things in place. There was only the one development team; there was no separate team working on sustaining. If I pulled a developer out and said, hey, can you work on this, that developer was working only on the main development line; there was no separate service line. So there were challenges because of that. We had a low closure rate, largely due to no dedicated resource, as I said; nobody was really owning it. Customers had a long wait to get bug fixes. And tech support often tasked the team directly and broke the process, as is typical in a startup. So: it was a group within McAfee.
The business had been an acquisition into McAfee, and then we came out of McAfee and became a private company, because we were acquired by two of the big private equity players. So we became a pre-IPO private company, and like a typical startup, people would call up all over the place. Somebody would just call up the team directly; even the CEO would call up sometimes and say, hey, can you work on this? It was a typical startup kind of thing, and sometimes that would just break the process, because there is nobody really keeping the sanctity of the process.

The hot fixes were not available to all customers. If my Fortune 20 customer came and literally shouted down the other end of the phone at my CEO, and my CEO somehow passed the grief down to the team, then I would do the fix; but I had no systematic way to make sure that other customers facing the same issue also got it, because we had to ship CDs in those days, so they literally had to wait. Sometimes a new bug might even break a hot fix, because, like I said, the version control was pretty bad at the time; and if a hot fix failed in the field, rollbacks were very difficult. That was another problem for us.

It was also difficult to estimate the time to resolve a bug and give an ETA. The bugs themselves were not something you can actually time-box. If anyone has been in the business of fixing customer issues, there are a few things you would know, and one of them is that bugs follow a stochastic arrival pattern; they actually follow a Poisson distribution. Scrum was known to us in 2004 when we thought about this, but we said, hey, we will not be able to splice every single thing down into a weekly cadence. We looked at Evo as a model and said, every bug is different: sometimes it takes two months, sometimes three weeks, sometimes two hours, and severity and the time it takes to fix have no correlation. I could be working on a sev-1 which takes me six months to fix, or a sev-3 which takes me one year, but there could also be a sev-1 which I fix in 20 minutes. There is no correlation between the two. So it is very difficult to create a time-boxed way of doing these releases and say, I am going to be done in two weeks. A story you can split, but a bug you cannot split like that. That was an inherent issue. The high-priority bugs could arrive at any time, theoretically, because that is just the nature of it; like I said, field bug arrivals follow a Poisson pattern. The customer specials could also arrive at any time with top priority, because any time we had a customer who was bigger than us, we had to make sure we found a way to satisfy their requirements.

And there was a high internal rejection rate of bug fixes by tech support. 2003 is when we opened up the Bangalore center; the team was still coming up to speed, and we did not have enough technical depth in the team yet. We got about six or eight guys from the US team who were in the process of moving back to India, and that is how we were able to seed the team, but we still did not have the right capability to fix the issues well, and the fixes were invariably rejected by the tech support team, which was again in San Jose. So that is how it used to happen.
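Since that Poisson claim drove the whole design, here is a minimal sketch of what it implies; the arrival rate and effort distribution are invented for illustration, not our field data:

```python
# Why fixed time-boxes fit field bugs badly: Poisson arrivals mean
# exponentially distributed gaps between bugs, and fix effort is drawn
# independently of severity. Numbers are made up.
import random

random.seed(42)
ARRIVAL_RATE = 4.0                  # assumed: ~4 field bugs per week on average

def simulate_week():
    t, bugs = 0.0, []
    while True:
        t += random.expovariate(ARRIVAL_RATE)    # gap to next bug, in weeks
        if t > 1.0:
            break
        severity = random.choice([1, 2, 3])      # sev-1 .. sev-3
        fix_weeks = random.expovariate(1 / 1.5)  # effort, independent of severity
        bugs.append((severity, round(fix_weeks, 1)))
    return bugs

for week in range(4):
    print(f"week {week + 1}: {simulate_week()}")
# Some weeks bring one bug, some bring eight; a sev-1 may need a tenth of
# a week or six weeks, so "done in this two-week box" cannot be promised.
```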
In 2004-2005 we worked on a new process for this whole bunch of stuff. I had just joined in 2003 and was busy setting up the India operations, and as soon as we were done with that, we started looking at our engineering process. We created a very simple process. The first thing we did was create a dedicated customer sustaining team: these people would work exclusively on fixing customer issues. They would not write new features and create more mess; they would only clean up the mess already lying there. Because one thing we realized, and just think about it for your own teams: who creates the bugs in the product? We developers create the bugs, and we get paid to do it. So we said, we will pay people to fix the bugs, but they will not write any new bugs.

Then we created a so-called cumulative hot fix process, which allowed us, exactly to the point raised, to improve some of the configuration management practices that were problematic and broken, and to improve collaboration all around. We had the same version with multiple patches going on at one time, and different developers on them. What would happen is that tech support knows: if this is a device driver thing, I will go to this engineer, and that guy is sitting in Plano, Texas; if there is an issue in monitoring, I know this guy is sitting in Bangalore, I will go to him. So the tech support guy is effectively running a PMO operation, routing things directly. Nobody else knows; these developers don't know about each other, and they will be overwriting each other's files, because there is no separate branch in which we are really managing it. It was like a typical startup.

If that actually makes you curious, I will tell you something. I have been an independent agile, lean startup, and design thinking coach and consultant for the last six months, and I have been to some of the most successful and well-known startups in India and talked to them. You would actually be shocked to know that their processes are totally broken. The horror stories I am telling you from ten years back look like fairy tales compared to what I have seen. At one of them, I asked an entire team of seventy people: how many of you are doing test-driven development, refactoring your code, doing CI builds? Not a single hand went up, and I told them, you are going to have your Salesforce moment. In 2006 Salesforce had a problem where they could not ship a single feature out for one full year; their processes were so broken that they could not ship anything for a year, and then they decided to go from zero to one hundred percent agile in just one quarter. It was a big-bang approach, but they had such a burning platform that they had to take such a radical step. What I find today sounds like a work of fiction, but even some of the best names we know in the Indian startup space, driving fantastic half-billion and billion dollar valuations, have a lot that is broken; it is all shiny on the outside, but under the hood there are a lot of problems. It actually needs a somewhat different type of rigor to really get some of that stuff right.
So, this was Evo. Like I said, I attended one of Tom Gilb's workshops and learned something about it. Of course we did not imbibe all of the ten principles that are part of Evo; we took up some of them, and that gave us some opportunities. For example, "decompose by performance results and stakeholders" sounded like a very agile way of doing it: every time you do something, even if it is two or four percent of the features you are releasing, you deliver the whole meal, not half-baked stuff. So we took some elements from there and started applying them.

This is one of the poor-quality pictures from that time, where we started looking at exactly this question of how to branch the code. We created the cumulative hot fix process, where we started creating branches in a somewhat more methodical manner: this is how we'll have the gold branch, and then we fork out a branch for the service pack, and people cumulatively check their fixes into the service pack branch.
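Just to make the word "cumulative" concrete, here is a toy Python model of that branch policy. This is my reconstruction for illustration only; our actual setup was an ordinary version control system of that era, and nothing like this existed as code:

```python
# Toy model of the cumulative hot fix branching policy (a reconstruction,
# not the real tooling). Fixes land on a service-pack branch forked from
# the gold (main) line; each weekly cumulative hot fix build carries every
# fix checked in so far.

class ServicePackBranch:
    def __init__(self, forked_from):
        self.forked_from = forked_from   # e.g. "gold/4.7"
        self.fixes = []                  # all fixes checked in, in order
        self.builds = []                 # weekly cumulative hot fix builds

    def check_in(self, fix_id):
        self.fixes.append(fix_id)

    def cut_weekly_build(self):
        # Cumulative: the new build contains *all* fixes to date, so a
        # customer on CHF3 automatically has everything from CHF1 and CHF2.
        build = {"name": f"CHF{len(self.builds) + 1}",
                 "contains": list(self.fixes)}
        self.builds.append(build)
        return build

    def release_service_pack(self):
        # After a regression round, the accumulated fixes are rolled back
        # into the gold line and a fresh branch is carved for the next SP.
        return {"service_pack_of": self.forked_from,
                "contains": list(self.fixes)}

sp = ServicePackBranch("gold/4.7")
sp.check_in("BUG-1021"); sp.check_in("BUG-1044")
print(sp.cut_weekly_build())   # CHF1 carries both fixes
sp.check_in("BUG-1050")
print(sp.cut_weekly_build())   # CHF2 carries all three
```

The property that mattered is that tech support never has to reason about which combination of individual patches a customer's box is running; a later cumulative hot fix always supersedes the earlier ones.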
Again, this was a time when we did not follow any test-driven development, and there was no QA automation either. Given the kind of work we were in, anybody from the networking space will know what I mean when I say this: my QA engineers were essentially all CCNAs and CCNPs, and some of them were CCIEs. That was the level of rigor needed. Developers knew how to write BGP, OSPF, and other complex protocol code, but the QA guys had to simulate a topology with, say, 100 routers or 500 agents or whatever was needed. They needed to understand network topology; it was not regular black-box QA where they could just do any kind of testing. So there was a lot on the table, and we needed a way to do much of this manually at that time.

We also changed the way we were doing the releases. Earlier it was very episodic: like I said, it was a three-month journey, where we would run for two and a half months and then in the last two weeks start integrating everything together, and like a typical waterfall we would run into all kinds of problems. We immediately brought it down to doing builds every week. That by itself was a huge thing for us ten years back. Today a lot of us would say, what is a one-week build, I am building on the go whenever my CI runs; but at that time it was a very radical thing for us to do. What made it really possible is something I am going to talk about now, so just pay close attention, because I have tried to do some animation here to show what the process looked like.

This is how our process looked. We had the PMO sitting in San Jose, and he was the person responsible for prioritizing and allocating the issues to the team. We had a CST manager, who was responsible for working with the 15-member team, and those 15 people were responsible for the entire company's products. They represented competencies in GUI, in back end, in device drivers, in protocols, and so on, and they had experts in each of these areas. "In theory," we might say, and this to me is where it gets a little more controversial, because especially when we talk of agile and immediately jump to Scrum, we sometimes use the term "fungible" in a very derogatory manner: oh, they are fungible resources. I have a problem with that phrase. First of all, fungible is a bad idea for software developers, and "resources" is yet another bad idea; I hate both of those words. These were all people with real proficiency. The people on the device driver team were not people who would understand how to write Windows GUI code; they were deep experts in device drivers, and they knew how to write drivers for Fast Ethernet, or wireless, or 802.11a/b/g, or what have you. That was the level of competency, and as much as we want to make people generalizing specialists, and I believe in that, by the way, the fact of the matter was that these people had a specific competence; no matter how much anyone wanted, the others just could not go and do that device driver work. That was just the way it was. So the CST manager had to continuously make sure the work was flowing.

Then there was a QA team, and this was in Bangalore as well. The PMO was the only person sitting in San Jose, apart from the tech support guy, who was also back there; the rest of the entire team was in Bangalore. The work would be pulled by the CST manager from the PMO, because there is always a laundry list of things that have to be done. The work would be allocated to the team, they would complete the activities (and I'll tell you how they did that), and then it would go on to the QA team; QA would validate it and send it on to tech support. The process by itself was pretty simple, but what we did over and above it was what helped us: we imposed a work-in-progress limit. In fact, we didn't know anything about Kanban, like I said; there was no understanding of Kanban. The only thing we said was: one person will work on only one thing at a time. That is how it started. We didn't even call it WIP; I use the WIP label now that I give talks about it, but in those days we never used the word. We said, one person works on only one issue at a time, and when you are done with it, you go and talk to the CST manager to choose the next thing that has to be done. By default, then, since we had 15 people, the work in progress became 15. It was a fixed work in progress, and there was no way around it. Now, you might say that's a bad idea; anyone who has done Kanban will say so, because what if I get stuck and need some way to keep working until I am unblocked? Honestly speaking, we didn't know any better. We just said working on one thing is the natural way; if we run into problems we will change it, maybe make the work in progress two. Luckily that didn't happen. We were not smart, we were just plain lucky. But this is how we really did it.

Now, each of these 15 people is working on some issue, and looking at an issue, there is no way to say whether it will be done in one week, two weeks, three weeks, or four. It is all random: some issue might take one week, some might take two days, some might take four weeks, and so on.
So, as you can imagine, if I want to keep the same cadence on the QA side of the equation, I cannot say that the work in progress there will be some fixed count. So we created a different notion of work in progress for QA, and we said: the work in progress is one week. That means whatever number of issues lands in that one-week period will be taken up by the QA team, consumed, tested, and then sent on a weekly basis to tech support. One particular week I might have two bugs and one feature request going in; the next week I might have four bugs; the week after that, nothing. So we created a time box here, but the throughput of that time box was not known; it could not be ascertained.

We still followed estimation, even though, if you read David Anderson's Kanban, he talks about no estimation. Frankly, ten years back there was no conversation about whether estimation is good or bad; we just estimated. And if you asked me to do this today, I would still estimate, even while following Kanban, for the simple reason that, like I said, testing was an extremely involved process. We had to set up test data generators and routers and all that stuff, so I needed to know when I was likely to need a particular setup; otherwise my tech support would not be able to give commitments outside. Estimation was really a way to forecast what resources we would need at what point in time, just so we could make more realistic assessments. (To answer the question: yes, estimation would happen for everything, just so we knew by when it would land in QA's column and they would be in a position to plan for it.) Because the thing is, one fix might take two weeks, another four weeks, another one week, and QA doesn't know when to expect it. If a bug requires a particular device driver setup to be tested, I need to keep that setup ready; and if I am already resource-constrained, I want to be able to flag it up front and not manage by crisis. That was the whole idea.

(To the next question: how did the weekly gate work?) Say some particular GUI issue takes one week, and at the same time there is a device driver issue that takes four weeks. At the end of week one, QA would pull the work completed by the GUI folks, plus anything else delivered by that time, and then they would lock the gates. They would take one week to do the QA on that. If a fix was not acceptable, they would give it back to the developers; if it was good, they would send it on to tech support. This worked for us.

Another thing we had never heard of was the concept of a queue. We didn't really know what a queue was at that time, and we just didn't have one; we never earmarked work waiting in a queue, which, when I think of it now, was actually a blessing. (And to the other question: yes, it was a team agreement. One person works on only one issue at a time; once it is complete, only then do you sign up for something new. So the WIP limit became a team agreement, really.)
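To show the shape of that arrangement, here is a small simulation with invented numbers: 15 developers each hold exactly one issue, pull the next one only on completion, and whatever completes inside a week rides that week's gate to QA.

```python
# Sketch of the pull system described above, with made-up durations:
# WIP = 1 per developer, a prioritized backlog, and a weekly QA gate
# whose contents vary from week to week.
import random

random.seed(7)
DEVS = 15
queue = list(range(1, 61))                     # prioritized backlog of issue ids
in_progress = {d: None for d in range(DEVS)}   # exactly one slot per developer
remaining = {}

def tick_week():
    done = []
    for dev in in_progress:
        if in_progress[dev] is None and queue:        # pull the next issue
            issue = queue.pop(0)
            in_progress[dev] = issue
            remaining[issue] = random.randint(1, 4)   # 1 to 4 weeks to fix
        issue = in_progress[dev]
        if issue is not None:
            remaining[issue] -= 1
            if remaining[issue] == 0:                 # lands in this week's gate
                done.append(issue)
                in_progress[dev] = None
    return done

for week in range(1, 5):
    shipped = tick_week()
    print(f"week {week}: gate carries {len(shipped)} fixes {shipped}")
```

The numbers make the point above: the development side has a fixed WIP of 15 but unknowable weekly output, while the QA side has a fixed one-week time box with variable contents.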
The PMO, the program manager, was actually working very closely with tech support, and he had access, from the sales side of the equation, to who the largest customers were, who the most unsatisfied customers were, and how long each bug had been open. So he had a complex matrix to basically arrive at the priority.

(To the question: was this team only for sustaining?) Yes, this was a team meant only for sustaining, for the existing versions that were already in the market; but if a new feature request came in for those versions, it entered the same queue. There was another team, 50 to 70 people, which was working on the new products that would be released in the next three, six, nine months.

(To the question: how did fixes get back to the main line?) Like I told you earlier, we created the cumulative hot fix process. The SP branch would be forked, and the gold branch would continue on for the next version. On the service pack branch we are cumulatively putting in the hot fixes: one, two, three. At some point we say we are done with that; we might do a regression round, and we release the service pack, and at that time we carve out a new branch and roll all the changes back into the main line, so the main line continues. One of the bigger problems earlier had been that people were checking raw code straight into the main line, which then interfered with some of the new features we were developing; so we said, let's separate out the service pack work and then merge it. We still had challenges, but the cumulative hot fix process made sure that, even though the weekly builds were not full-blown regressions, the fixes would be cycled through on a weekly basis, so by the time the service pack branch was getting merged into the main line, it had already been tested multiple times and most of the major issues had been discovered.

(To the question: was this the same QA team?) Sorry, no, this was a separate QA team; glad you asked. We had the dedicated development team of 15 people, and we had a dedicated QA team of three or four people as well, only for these activities. The QA team was responsible only for black-box QA; the developers were responsible for whatever white-box testing they had to do. QA was a small cross-functional team: somebody knew wireless, somebody knew Ethernet, and they would pool their knowledge. What we found, more by trial and error, was that a small team of three to four was reasonable, because at any point in time this development team's throughput was only three to four bugs per week on average, so they were able to handle it. In that sense the QA team was actually more fungible than the development team: they could adapt to different things. And some of the work was shared across products; for example, if I change the device driver in my wireless card, that is not just one product, that same device driver now has to be tested across five products.

(To the question: did two developers' fixes collide?) Not really; they were very clear that, most of the time, that would not be the case, because a device driver issue and a GUI issue, at least, were two different things.
I am sure there must have been the odd random overlap, but I would call that out as not a significant number.

(To the question: what happened while QA was testing?) When we initially started the process, the developers would not be able to pick up new work until the QA team had certified, this is okay, I can send it to tech support; only then would they sign up for new work. But over a period of time, what happened was that the QA guys knew the developers; for a few of the guys they would say, you have done it, I know it's going to work. See, process is not a straitjacket to me. I'll tell you something now: this was a team with zero attrition. That's the point. Our mental model is, first of all, that sustaining is not a glamorous job and nobody wants to do it. This was a team that I managed for one and a half years with zero attrition. In fact, we had to dismantle the team later on because we ran out of bugs, and people were not very happy with that, because they lost their identity. They had created an identity that was known all over the company, and of course I was their cheerleader; my job as head of operations was to make sure I was cheerleading for something. So they were very unhappy when we had to wind it down, but not a single person had left that team. I'll talk about that.

So this is how we did it, basically a very simple way of managing. There was no real queue in the sense of saying, this has to be next; we were really very dynamic, and the PMO was always looking at the backlog and constantly re-prioritizing, like a product owner. (Sorry, you had a question as well?) Yes, that's possible. Until we had found a solution that worked on the customer's premises, we would not roll it into the main line. Our strategy was to roll changes back into the main line only once tech support said good to go and the customer said, I'm happy with that. We had realized earlier that rolling fixes into the main line too soon also created problems for us. (And conflicts on the main line?) That's possible, and that is what we would sort out during the integration of that dot release and its QA.

(And if QA sent a fix back?) Then it goes back to the developer, and you again have to assess how much time it's going to take. You literally have to decide: that guy says, hey, I didn't take care of that, maybe it's going to take another two days. If QA says, I can still accommodate that in this cycle, because my test setup is ready, if you can give it to me then it's good enough; we were a small organization, we didn't have to be rigid. But if QA said, hey, I'm already in the last two days, I don't want to take this, the risk is too high, then we would move it to the next week. So every week there was this lag between what the developers are doing and what QA is doing, and then there is an outcome coming out. It was like a wagon train: every week the wagon train leaves, the wagon train has a work in progress of one week, and after one week it delivers to tech support. Tech support would also do some QA of their own, maybe half a day, one day, two days, two hours, and then they would release to the customer.

(To the question about tooling: yes, correct.) I don't remember what we were using at that time; maybe we were on DDTS, I think.
(To the question: how did an issue get picked?) It won't come up in the priority order if it doesn't have the right business priority. We had six or seven parameters: the volume of business, the size of the customer, and so on. See, not all customers are equal, as much as we may not like to hear it; the reality is that some customers are more equal than others. If there is a customer who is giving me 30% of my business, I am probably more likely to understand their pain point and find a way to address it; even if what is coming from them is a lower-severity issue, maybe I'll have to find a way to do it. The PMO was responsible, because he was the most business-facing person; when I say business-facing, I mean inside the organization. He would talk to the field guys, he would talk to the sales guys, he would talk to the CEO; if somebody picked up the phone, called the CEO, and said, hey, I'm not happy with this, he was the guy who would get to know. So he knew, and we all trusted his process. Nobody tried to second-guess him and say, hey Don, why are you doing it this way, why should this bug not be here (his name was Don). We would go with Don, because Don had a transparent process, and that used to work.

(To the question: did people work on items individually?) Yes, because these were all individual pieces of work. What we have found, at least in the last ten years as Kanban has gone more mainstream, is that people now use it for new development as well; ten years back, what people were beginning to do was really this kind of pattern of individual work items. Though I won't say there was no interaction; sometimes people would talk to each other. This team was all set up from the ground up, actually; they were not people with deep experience in the product. Like I said, 2003 is when I set up the group, and 2004 is when this story happens, so they were still learning a lot about the product. They were experts on UI, product back end, database, device drivers, but you are right to the extent that they were working independently, and that worked pretty well for us. It does not stop you from making some kind of a crew, like a feature crew, two or three people working together on a single Kanban item; that is equally possible. I'm sure Mahesh is sitting here, and, sorry if I just put you on the spot, but as makers of Kanban tools I'm sure they have the ability for not just one individual to pull a card but a group of people to pull cards when they need to. So I would probably look at that.

So let me just complete what's there. It was not originally stated as the vision or goal, but work in progress was being limited to the number of team members; that's how we accidentally ended up doing it. At any time, one developer is assigned only one piece of work, thereby achieving one-piece flow; we did not really have an episodic flow, it was a single piece flowing through the system. New work is only assigned when the current work is completed, cancelled, or stalled and the team member is available. If the work was likely to be stalled for a longer period of time, then we allowed people to pick up another piece of work; but if it was one or two days, people were expected to learn more about the code, or do a code review, or do something, but not pick up another piece of work. So there was no wait state and no switching cost at the individual level.
The context-switching cost that typically happens when people switch between pieces of work did not happen. We had a much shorter lead time for bugs, in contrast to the lead time for a service pack: the lead time for a service pack was three months, whereas 85 to 90 percent of those bugs could be completed within one to three weeks but had previously had to wait because of the nature of the three-month episodic release. We improved the process to allow weekly deployment of each hot fix. And finally, the flexibility gained was not a zero-sum game; there was no penalty on performance in the rest of the process.

So let me show you a few graphs of how this moved the needle, and then I will stop. (But first, sorry, to answer the question: if somebody picks up an extra bug and starts working on it, that is their individual discretionary effort, and we have nothing against it; but they cannot shortchange the commitments. If they have made a commitment that, I am going to deliver this after two weeks, they are not allowed to de-commit just because they feel like working on something else. So we did not stop people from picking up a new piece of work, as long as they did not go back on the commitment part of it.)

Okay, let me show you some of the graphs. We moved from service packs to cumulative hot fixes while maintaining high quality. If you see what is happening: this blue line here is the failure rate, and these yellow bars you see are the number of hot fixes we were doing. From 2003 onwards we were not doing too many hot fixes; we were doing more service packs, in fact at first it was all service packs, and then we gradually moved until eventually we were doing mostly hot fixes. We had a major quality problem at this point, when the acceptance rate was only 80%, but over a period of time we improved it again and were able to maintain it at almost 95%. That is, 95% of the time, when developers completed a bug, the QA guys did not have to send it back; the RTQA, the return-to-QA, did not happen. The fixes worked the first time and could be sent on.

Increase in the number of bugs with homes: if you see here, the number of bugs that had homes was only about 50% by and large, and at times it came down even below that; over a period of time, our goal was that 80% of bugs should have a home, which we were able to accomplish reasonably well.

Total bugs open was an interesting number, because what you see here is sev-1, sev-2, sev-3, even sev-5, all put together across all the products. We had about 120 to 140 bugs open, and then we started working on it; the classical thing of "go slower so that you can go faster" worked for us, and we were able to reduce it. Even the final number does not tell the whole story, because that is 40 bugs across all the products in the company that were open at that point; it essentially meant that per product we had only single digits, none of the products had more than ten open bugs. That was the point up to which we tracked it.
Days open is where you see the real horror stories. The average days open across all the bugs was coming to about 300 to 400 days, across the sev-1s, sev-2s, sev-3s, which we were able to bring down to about 125 days. Now, you might say that still sounds very high, but compared to what we were doing before, we brought it down dramatically, by almost 200 days or more, and that was within acceptable ranges for us.

People motivation, like you were asking about: it started with a 16-person development team, and there was zero attrition in the team. Once the backlog started coming down, engineers were moved back to the development teams to do new features. What we did was say, hey, this team is not required to have 16 people anymore, because the work doesn't exist; the team was created to solve a problem that didn't exist anymore. So we started ramping it down, and at some point we had only six or eight people left in the team, at which point we said, let's roll them back into their respective development teams. We dismantled the team because the job was done. So this is how we were basically able to do it.

I will have to skip ahead, because I've almost run out of time, but I've added to the deck a little more about Kanban, the later-day Kanban from when David Anderson's work came out. Visualize the workflow, represent the work items and workflow on a card wall: we didn't do any of that, because honestly these principles emerged much later. Limit work in progress: that is something we accidentally stumbled upon, and it's one of the big things. Measure and manage the flow: well, we didn't explicitly do anything to manage the flow; we were just tracking the bugs and making sure we stuck to the work-in-progress limit. Make the process explicit: everybody knew this was the hot fix process, this is how you check in, everybody is allowed to take up only one item at a time, this is how the program manager allocates priority, and so on. Use models to evaluate improvement opportunities: we basically looked at improvement when we said, hey, that many bugs are no longer there, let's find a way to optimize the team, and we started dismantling it.

Just to give a point of view on why Kanban matters for engineering, I think this is important: it makes sense not to build features that nobody needs right now. Don't write more specs than you can code, don't write more code than you can test, and don't test more code than you can deploy. A lot of times we make the classic mistake of writing so much code that we cannot test it, and it just lies there; then we test it and cannot deploy it, and when we deploy it, nobody is able to use it. At least we were able to avoid that.

I'll stop here on the last slide, just to tell you what we learned. Process improvement should be driven by business needs, not because some process looks sexy. Many a time a lot of us read the airline magazine that says, hey, this is the new version of agile, go and do it, and the CEO comes back to the office and says, guys, let's do this. I think that is all nonsense; we should solve the problems. To me that is most important: drive it by the business needs rather than racing to be the first to deploy that process. Don't let a process limit your potential. Think beyond gurus.
In our case, honestly speaking, we believed in the crowd-sourced knowledge that we all represented as developers, test engineers, and product guys. We said, there is no process, so we will create it. We did not blame the process, we did not adopt a wrong process; we just tweaked what was the most natural thing for us to do. So I think there is a need for us to recognize that it is all common sense, and we can actually do a better job than we think. Don't let the absence of a process limit you; do whatever it takes to serve the customer better.

These were the few things we learned out of that process. Like I said, it's been ten years; in fact, most of the people in the Dev and QA teams at that company are still there. I was the guy who ran away. And this was a great study for us in human motivation as well, because my mental model at that time was also that people in India don't like to do sustaining, people in India don't want to fix bugs and test all the time. But we created a system where people were not only willing to do it, they were technically challenged by it, and they did not even leave the team. So that's how it was.

I've run out of time, I've taken five minutes more, so I'll stop here. If there are only one or two questions I can take them; meanwhile I would request the next speaker to come and set up if they want to, and till then I'll go on.

(Was the testing automated?) It was all manual, basically manual; we had to scavenge the hardware and software and do it. So yes, you're right, but we had to do that. (How did fixes reach customers?) There was no pushing into production; we were literally shipping the fix as an exe, as an executable file. Tech support would either ship a CD or just email it to them; there was no Dropbox, so they would just email it.

(Why did we do all this?) Because we had a problem. We were looking at this increasing pile of bugs, and my SVP in San Jose told me, TG, you guys have to fix this issue. For me it became the top priority; I would literally have lost my job if I did not fix it. That was the business urgency, and I was chartered to do it. I had joined the company to do it, but I was not able to, because of the way we were structured. So we had to look at the org structure and separate out these two things, create a different process, create a different tool, create different policies. And then it worked for us; it did not work by changing only one piece of that jigsaw puzzle.

Okay, I will stop here. Thank you. If there is anything, you can talk to me.