And the change that comes in there may invalidate the design that I do today; that is the fear. Of course, the policy here is that design is this thing that is cast in stone, and now that it is written down, according to law it cannot be changed except by an act of parliament. That is not true, but that is the way we tend to fear design changes. Now, if you take it further and say, OK, let us be a little more challenging: what about architecture changes, can we do them? That question is much more frightening. So that is the story I am going to tell this afternoon. Along the way I will also make one or two disparaging comments about people who go for sprint zero. If I insult some of you, I am sorry, but the whole idea is that we try to learn from it.
So, architectural hiccups. This is a story where I went to a company and did a little bit of consulting for them. This company was building what is called an MMOG, which stands for massively multiplayer online game. That was the product. I will give you a little bit of background and then we will go through the story. One of the reasons I picked this case is that a lot of things were in place. It was not a total mess, it was not a disaster. So this is not a failure story, and neither is it a huge success story. It is just a real story where some things went well, some things went badly, and not all the wisdom that was available, let us say in Scrum, was applied, and maybe that was a mistake. Let us see.
This was a Dutch company, very well funded; they had a lot of money. There was a time, I do not know if it is true even now, when there were US laws, and even in India there are some laws, about gambling and online betting and things like this.
I am really not an expert in this, but the Netherlands at one point was pretty much a leader in this kind of software, which allowed a lot of people to play poker online and bet online. This company wanted to build an online poker product. They were already in the online lottery business, and interestingly, do you know where they placed the servers? They placed them in Gibraltar. Do you know where Gibraltar is? It is at the south of Spain, just across from Africa. That is where they placed it all. The whole idea was to build a complete offering of online betting, especially for poker and a bunch of other card games, and to open the environment to professional players who may make a living out of playing online. The idea was a first version in 12 months.
They did consider a bunch of different technologies. Strangely, the client was a bespoke C++ application and the server was in Java, which is not a very common combination, and they had their own protocol to communicate between the server and the client. The client is where the poker player would log in and start playing. They used their own testing scripts; we will talk about testing a little later. The server was basically on the RedDwarf platform with a MySQL database. The database was actually not a very important part of the application per se; it was just for recording all the moves that were played. And then there was RabbitMQ for messaging. RedDwarf was a particular Java-based server framework. This is not J2EE; it has nothing to do with any of the J2EE stuff. It is a server framework which is able to handle certain kinds of asynchronous transactions. I think the name has even changed; RedDwarf is no longer there, I think it was called Darkstar at some point, or something else. So a couple of these technologies are not even available now; they have evolved into better technologies.
How about the project background? A few things here, since this is an Agile conference. Was the project scrummy? It was not very scrummy. They had no Scrum Master, they had no product owner. They had stand-ups, that is OK, but not really anything that you would normally recognize as good Scrum. Nothing of that sort.
Did they have the right people? This is one reason why this is a good case study, because they did have the right people. One of the lead architects is a friend of mine; he worked along with me when we were at IONA Technologies. How many people have heard of CORBA? Just a show of hands. OK, a bunch of people have heard of CORBA. IONA Technologies was the leading vendor of CORBA. The reason I mention this is that CORBA is middleware that helps distributed applications talk to each other even if they are on different platforms. We were the engineering team that built the CORBA services and maintained the CORBA product, so we understood distributed applications very well. He was the lead architect here. He had also worked earlier in another online gaming company, so that was great background. We had other people who also knew something about extreme programming. So it was a good, strong team. One of the lessons I would like to mention is that you can have a good, strong team and still make plenty of mistakes. The people were not bad; even looking back, in hindsight, the people were good.
Prioritization, what is that? Remember, many of us came from an extreme programming background. Extreme programming does talk about prioritization, but it is not as clear and as insistent on prioritization as Scrum is. So somehow prioritization was not something that we really paid a lot of attention to.
In extreme programming you do pay attention to it, especially if your system is in production, but we had 12 months to go and there was nothing in production. So we did not pay attention to prioritization.
Did we know exactly what to build? Yes, we thought we knew exactly what to build. Who is "we"? Not me; the people in the team as well as the business. They had been in online betting and various such applications, so the feeling was: oh yes, we know what to build. Interestingly, the team also included a couple of professional poker players. So it was not just developers and testers; you had a couple of poker players who were hired to play. That is good.
So why is this relevant to a conference like this? One good thing was that, because of people's backgrounds, we did everything in an iterative and incremental way. If you learn nothing else, that is OK, as long as you realize that all software development is essentially iterative and incremental. Your understanding of what you are going to do and your understanding of how you are going to do it both improve in cycles. It does not all arrive one fine day, after which you switch off your brain and just do it. So it was all incremental and iterative. That was the background.
Let me take you quickly through how the application looks. These funny faces you see on the left are meant to represent poker players; they have cards, and they are accessing this server over the internet. Yeah, they have got it down on the time first. This is the backend infrastructure of the application. You basically have a database which is just there to store data; it is not really there to provide any transaction support. Everything happens in memory.
You have a tournament server and a game server, you have a player portal and a lobby server, and these are connected to a messaging service. Just to spend a couple of minutes on this: the game is played on tables, and one round of a game is provided by the game server. You can have a bunch of games played across different times to make a tournament, and you need some server to take care of that. On this server you could run multiple tournaments at the same time. So the tournament server would use the game server to provide a tournament interface to anyone who is joining, and to join the game, people come to the lobby server.
But this is all PowerPoint; this is death by PowerPoint. Let me try to show you some real screenshots. Can you see that? I mean, you can see it, but can you see what is written? Probably not. Anyway, on the left side, the person comes in, and this is the lobby. They see a list of tables; they can choose games or tournaments from the left-hand side, and there are a bunch of game tables they can choose from. Once you choose, you can also get the details of the seats available and a bunch of other things. The professional poker players also know who the other good players are and whom to play against. There is a huge community out there, and they tend to make their choices according to who else is on the table. So that is the lobby.
So these are screenshots from... gambling, sorry? It is gambling, yes, it is gambling. And the reason the servers are in Gibraltar is the laws there, which mean that you can place your servers there; many countries do not allow it. Anyway, that is a separate conversation you can have if you are interested in gambling.
So that is the lobby. This is how a game table looks. The players have names and some numbers which represent the amount of chips or money they have, and that is the table. What you see on the right is a chat window. There are also chat windows; it is all simultaneous, and it is part of the client. So this is how it looks, and this is where you can see information about tournaments and so on. And this is where players can select which game to play. Just a few screenshots.
So let us look at what was known to this team when they started. By the way, when they started on this project they did not have very strong project management, process or anything like that. But when they started, the lead developers and the architects knew most of this. The players who are playing are serious, and there is a lot of money being bet; it is easily of the order of many millions every day. If you are bringing a new product into a market like this, reputation is very important, because if players find out that it is not good, that it is not reliable, they will quickly desert it, and they are going to talk to each other. So it is critical that when this is opened up online it is really stable and robust. It has to be robust and it has to be network-ready; it should be able to recover from network errors. It has to support statefulness, because the whole game is a series of states, and even if someone is disconnected they need to be able to come back, within the rules of the game, so we need to record the state. It has to support that as well.
It should be easy to interface with external systems. I think this was a bit of a wish list, and it is an interesting point, because the external systems they planned to interface with did not come up, and an external system they never thought they would have to interface with did come up. I will give you an example of that a little later. That is how planning goes, basically.
It has to be highly threaded. You have a minimum supportable concurrent player base of at least 20,000 people simultaneously. Remember, it is also graphical, so all these messages have a lot of content in them as well. And logged-in players don't always play, but they add load. It is a bit like Amazon: if you visit the Amazon site, even if you don't buy anything and don't add anything to a shopping cart, you are already in a transaction, because if you click on something it tells you that people who have shown interest in this have bought other items like it. So even if you don't do anything on Amazon, you are already initiating some kind of transaction. In the same way, people who are not even playing, who are just logged in and watching, are putting load on this system, because the system has to keep sending them messages. And the order of messages exchanged between client and server is important, because that is how the game flows.
Are you with me, or have I completely lost you? Oh, you are still there, OK. So this is what was known at pretty much the start of the project. The next slide has more requirements, more things that the server must do, but most of these were not known, or at least people didn't think of them, when they started.
These were discovered by and by. Some of them are design aspects, some of them are architecture aspects. First, we needed randomness, and it has to be proven to be random, because the system is going to deal the cards for a game. Now, when this system was built, in the first three or four sprints they had a simple game to be played, and they just used the Java library to generate random numbers. That seemed fine, but later the business said: we are not too sure whether this is OK. It is probably not good enough, because we have to show that it is random, and probably we can't do that. I will tell you the story in a second; that created quite a bit of an issue.
The other interesting thing is that arbitrarily fired timers affect software behavior. I don't know if it is uniformly true, but there are timers in Java: you can set a timer and kick off an event when it fires. Suppose you set a timer for two minutes. What the timer guarantees is that it will not fire before two minutes. It doesn't guarantee that it will fire at exactly two minutes; it could be two minutes, or two minutes and one second, or two seconds, or something. The problem this generated is that when players are playing you cannot wait forever for someone, because it is an online game and one person could bring a game or a tournament to a halt. So you maintain certain time limits, and you try to use timers to enforce them, but then the timers create problems too, because they don't fire at quite that time.
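The timer behavior described here can be observed with a small experiment. The following is a minimal sketch, not the project's code: it assumes `ScheduledExecutorService` as the timer (the talk does not say which Java timer API the team used), and the class and method names are made up for illustration. It measures how long a one-shot timeout really takes to fire; the measured time is never shorter than the requested delay, and it is usually a little longer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class TurnTimerSketch {

    /** Schedules a one-shot timeout and returns the nanoseconds that really elapsed before it fired. */
    static long measureFiringDelayNanos(long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch fired = new CountDownLatch(1);
        long[] firedAt = new long[1];
        long start = System.nanoTime(); // taken before scheduling, so elapsed >= requested delay
        try {
            scheduler.schedule(() -> {
                firedAt[0] = System.nanoTime();
                fired.countDown();
            }, delayMillis, TimeUnit.MILLISECONDS);
            fired.await(); // block until the timeout task has run
            return firedAt[0] - start;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the timer", e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        long requestedMillis = 200; // hypothetical turn time limit
        long elapsedNanos = measureFiringDelayNanos(requestedMillis);
        long lateMicros = (elapsedNanos - TimeUnit.MILLISECONDS.toNanos(requestedMillis)) / 1_000;
        System.out.println("requested " + requestedMillis + " ms, fired " + lateMicros + " microseconds late");
    }
}
```

The lateness varies from run to run and grows under load, which is why game logic that assumes a timeout fires at an exact instant can behave differently in testing and in production.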
So that's another issue that came up. Players may log in just to watch others. And the network connection can drop at any time. This is something that should have been anticipated, and it was there as a requirement earlier, but I remember the dropped network connection because it created a big bug that killed one whole sprint. If you look at the previous slide, we said "network-ready", right? And here's the thing: we all said "network-ready" and everyone nodded their heads as if they understood what it meant, but when it actually happened, people said, oh, is that so? It's very strange. It's all about communication; sometimes I feel we hear things but we don't really hear them.
Anyway, players may decide to sit out and rejoin at any time. So you also have to provide facilities for people to rejoin, which basically means you need to keep track of their moves and their money as well. Multiple simultaneous tables, multiple simultaneous tournaments: all of these you are supposed to support on the server. And last but not least, the game history must be retained. It's a legal requirement, because you are collecting money and then redistributing it, and if someone has a question, it should be auditable.
So these are some of the things that came into the requirements. Things like retaining game history, arbitrarily fired timers and randomness did create problems for the architecture as well. One more thing that is not on the slides is the whole idea of an inactivity timeout. That was not part of the original design, and it cuts across all screens. If you are on a screen and not doing anything for a long time, you are supposed to be logged out, and remember, this is a custom-made client. This was not designed for earlier, so that was another problem.
Then there are issues with guaranteeing that we have delivered these requirements. To prove that, one has to show test results, and it is tricky to test software which is asynchronous. We have to show that many users are playing this game according to the rules of the game, and functionality also has to be maintained. This is asking testers to play the game, and it is also asking testers to be both good and bad poker players. The other problem with testing poker is that people very often have to play simultaneously. Can we do that easily manually? Can we do that easily automatically? There is no easy way, absolutely no easy way. So this is something that the architects have to take care of up front. I think we have discussed some of these things.
Let me go to the next slide and run you through an example of asynchronous behavior, to show you the kind of issue that had to be solved. It was not anticipated when the project started that asynchronous behavior would have to be handled. Let us say there are four players. In step one, player one (P1) plays and then P2 plays. In step two, P3 is either disconnected or just sits out of the game. In step three, P4 is able to close the game, and now a new round starts and a new hand is dealt. At this stage P3 either rejoins the game or is reconnected. P3 does not need to know all the things that have happened in between, because if those messages are simply queued, they will just irritate the person. So you do not want them to be queued; you want the latest message to be sent. That is what the server has to do, as an example of asynchronous behavior.
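The P3 scenario can be sketched in a few lines. This is an illustrative sketch only, not the team's protocol: the sequence number, the hand id and the `REPLAY`/`STATE` message strings are all assumptions. The rule it encodes is that a rejoining client reports the last move it saw; if that move belongs to the hand still in play, the server replays what was missed, and otherwise it sends only the latest state instead of a queue of stale messages.

```java
import java.util.ArrayList;
import java.util.List;

class ResyncSketch {

    /** One recorded move; handId and seq are hypothetical fields, not the real protocol. */
    static final class Move {
        final int handId;
        final long seq;
        final String description;

        Move(int handId, long seq, String description) {
            this.handId = handId;
            this.seq = seq;
            this.description = description;
        }
    }

    static final class Table {
        private final List<Move> history = new ArrayList<>();
        private int currentHand = 1;
        private long nextSeq = 1;

        void startNewHand() {
            currentHand++;
        }

        void record(String description) {
            history.add(new Move(currentHand, nextSeq++, description));
        }

        /** Messages for a client that rejoins having last seen the move numbered lastSeenSeq. */
        List<String> resync(long lastSeenSeq) {
            boolean sameHand = false;
            for (Move m : history) {
                if (m.seq == lastSeenSeq && m.handId == currentHand) {
                    sameHand = true;
                }
            }
            List<String> out = new ArrayList<>();
            if (sameHand) {
                // Still in the same hand: replay only the moves the client missed.
                for (Move m : history) {
                    if (m.seq > lastSeenSeq) {
                        out.add("REPLAY " + m.description);
                    }
                }
            } else {
                // The hand has moved on: send one snapshot of the latest state.
                out.add("STATE hand=" + currentHand + " lastSeq=" + (nextSeq - 1));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Table table = new Table();
        table.record("P1 plays");
        table.record("P2 plays");            // P3 disconnects after seeing move 2
        table.record("P4 closes the game");
        table.startNewHand();
        table.record("new hand dealt");
        System.out.println(table.resync(2)); // prints [STATE hand=2 lastSeq=4]
    }
}
```

Keeping the full history on the server while sending the client only what it needs also fits the requirement that every move be recorded for audit.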
So basically there is a mechanism by which the server sends messages and hands control to the client, and if the client has stopped receiving messages, the server only has to send the latest state to that client. How did they handle this? Fortunately, because it was a custom-built protocol, they managed to fit this in. They were lucky in this; they actually managed to build it into their communication protocol, as an application construct, since it was their own protocol. They basically said: we will have a code by which we understand what the last move is that you received, or the last move you made, and what the latest one is that we have. If there is no correspondence, we just send the latest; otherwise we replay some of the old ones. These are some of the things they did to make this happen.
Some of the problems detected along the way: a disconnected player was not able to log in at all. There were negative balances for players. If a player's connection to the game server was lost, his balance of money was not returned, and this is not fair. The business did say that you are supposed to return his or her money, because they were not able to play the entire round. The poker rules for raising the stakes were not applied correctly. So there were a bunch of things that were not being handled properly. A tournament ended abruptly and winners were declared even though the tournament had actually been interrupted in between.
So there were a bunch of things that the team discovered along the way. Most of the solutions were provided by the team themselves, but the critical thing here was that the communication between the team and the business owners was good. That is what Scrum asks for in terms of a product owner, and there was no product owner here. What happened was that the business did manage to tell the developers which kinds of mistakes are OK and which kinds of mistakes are not OK.
This is somewhere around the midpoint of the project, and I think at this point the business was understanding that not everything is always possible. When I go to projects and look at companies, sometimes I feel that the business, the stakeholder side, thinks that everything should be possible cheaply, or that any business requirement can just be taken care of. That is not always true, and unless you have this collaboration between both sides, I think there is more conflict than resolution.
So here, what I am saying is that a tournament ends abruptly, or especially a game ends abruptly. The developers explained: we cannot guarantee, 100 percent of the time, that the server will never come down or that some part will never go off the rails. Mostly it will work. What we can always guarantee is that we record a log of what has happened. That we can guarantee, but we cannot guarantee that every game will end successfully. Then the business said: that is fine. If a game does not finish because something happens in between, then as long as we can simply return all the money that people have put in, the same money back, it is OK. People do not like it, but at least it is acceptable. They do not end up with negative balances or with some wrong winner being declared; as long as that does not happen, it is OK. This was not done very consciously, but it is actually a prioritization exercise that they did.
Moving ahead, about testability: a few steps were taken to increase testability. One of the main ones is the second point on the slide: impregnate the code base with event watchers.
So in the code base they had some pieces of code that you could inject while testing, to look for certain events that were fired, based on the game. This would help us run multiple tournaments and see the sequence of steps, and it would help a lot in testing. Remember, in testing you have to prove that your dealing of hands is random. If it is random, then when you run it again you will obviously get different numbers, so how can you prove that it was random the first time?
This is where they got into a whole discussion. There was an external stakeholder who talked to some auditors, regulators of online betting, and they said that according to certain standards you cannot use Java's random number generator. So they actually had to buy a piece of hardware that plugs into the server, and this hardware generates the random numbers. Whatever server they had built now had to pull from this hardware device.
What was the effect on the architecture? Earlier, when a player was coming in or a new game was starting, the server itself could simply hand over some number, and keeping track of a game or running that game again simply meant starting the same server with the same seed. Now you cannot do that anymore, because in real life you have to ask this hardware device, although for testing you could still use the old one. So they had a switch within the server for whether it was production or staging. If it was production, you would call the hardware device; you had to get the number from the hardware device. Now, the hardware device was very fast, but you still have to make a call, and that was a JNI thing. They had to write something against the native interface, and when they make the call they get the value and pass it to the game server.
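The production/staging switch can be pictured as one interface with two implementations. What follows is a hedged sketch under stated assumptions: `RandomSource`, `SeededSource`, `HardwareSource` and `sourceFor` are invented names, the hardware implementation is only a placeholder for the JNI call mentioned in the talk, and the Fisher-Yates shuffle is a standard technique chosen for illustration, not necessarily what the team used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class DealerSketch {

    /** Source of random integers in the range [0, bound). */
    interface RandomSource {
        int nextInt(int bound);
    }

    /** Staging/test source: seeded, so the same seed replays exactly the same deal. */
    static final class SeededSource implements RandomSource {
        private final Random random;

        SeededSource(long seed) {
            this.random = new Random(seed);
        }

        @Override
        public int nextInt(int bound) {
            return random.nextInt(bound);
        }
    }

    /** Production placeholder: a real version would call the hardware device through JNI. */
    static final class HardwareSource implements RandomSource {
        @Override
        public int nextInt(int bound) {
            throw new UnsupportedOperationException("hardware device is not available in this sketch");
        }
    }

    /** The switch: the hardware device in production, a seeded generator everywhere else. */
    static RandomSource sourceFor(String environment, long testSeed) {
        return "production".equals(environment) ? new HardwareSource() : new SeededSource(testSeed);
    }

    /** Fisher-Yates shuffle of the card indices 0..51, drawing every choice from the given source. */
    static List<Integer> shuffledDeck(RandomSource source) {
        List<Integer> deck = new ArrayList<>();
        for (int card = 0; card < 52; card++) {
            deck.add(card);
        }
        for (int i = deck.size() - 1; i > 0; i--) {
            Collections.swap(deck, i, source.nextInt(i + 1));
        }
        return deck;
    }

    public static void main(String[] args) {
        List<Integer> first = shuffledDeck(sourceFor("staging", 42L));
        List<Integer> replay = shuffledDeck(sourceFor("staging", 42L));
        System.out.println("same seed replays the same deal: " + first.equals(replay));
    }
}
```

Because the game logic only sees `RandomSource`, tests stay repeatable while production gets its numbers from the certified device. Proving that production deals are random is a separate problem that this sketch does not address.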
The issue is this: the game server is not handling just one game, it is handling multiple games. So the effect on the architecture was that the game server now had to know when it is getting a request for a new game and when it makes a call for a random number. And the random number in this case is not just a number; it is a card, one of the 52 cards. When you get it, you have to know whom to give it to. If 10 of you are asking me, then when I make calls to the hardware device I have to know which of you I have to give this card to. I have to keep track of this. Believe it or not, it took them just a couple of days to implement, but it took them three sprints to test it and show that it was tested. Three sprints.
There were a few good programming habits: carefully structured log messages. Maybe this is a bit similar to what Joshua was speaking about in the morning, if you want to make things safe; or maybe I am making it too grand. But if you get too many log messages out of your server, and there are many people firing a lot of events, you will not be able to keep track. You don't want rubbish, you don't want too many log messages; you have to be very careful with them. So there was some careful structuring of log messages there. These are the steps taken to increase testability. When we did a retrospective with the whole team, one of the questions we asked was: what if you had not taken these steps for testability?
There is another one, and I am not sure if there is a slide for it. Yes, OK, it is on this slide as well: if you look at the top left, you have your bots. They actually made little virtual robots, I mean software agents, that would play the game. You could configure them to play poker as good players or as bad players. Remember, the tester sometimes had to be a good player and sometimes a bad player.
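A configurable bot of this kind could be as small as the sketch below. Everything in it is an assumption made for illustration: the `mistakeRate` parameter, the hand-strength thresholds and the three actions are invented, and the talk does not describe how the real bots decided their moves. The point is only that one parameter can turn the same agent into a "good" or a "bad" player, and that a seed makes its play repeatable in tests.

```java
import java.util.Random;

class PokerBotSketch {

    enum Action { FOLD, CALL, RAISE }

    static final class Bot {
        private final double mistakeRate; // 0.0 = always follows the simple rule, 1.0 = always acts at random
        private final Random random;

        Bot(double mistakeRate, long seed) {
            this.mistakeRate = mistakeRate;
            this.random = new Random(seed); // seeded so a test run can be repeated exactly
        }

        /** Chooses an action for a hand strength between 0.0 (worst) and 1.0 (best). */
        Action decide(double handStrength) {
            if (random.nextDouble() < mistakeRate) {
                Action[] all = Action.values();
                return all[random.nextInt(all.length)]; // a "bad player" move: ignore the cards
            }
            if (handStrength < 0.3) {
                return Action.FOLD;
            }
            if (handStrength < 0.7) {
                return Action.CALL;
            }
            return Action.RAISE;
        }
    }

    public static void main(String[] args) {
        Bot good = new Bot(0.0, 1L);
        Bot bad = new Bot(0.8, 1L);
        double[] hands = {0.1, 0.5, 0.9};
        for (double strength : hands) {
            System.out.println(strength + ": good=" + good.decide(strength) + ", bad=" + bad.decide(strength));
        }
    }
}
```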
So you could set some parameters and get these bots to play multiple games simultaneously. One of the questions we asked the team was: suppose you had not done many of these things up front, how would that have worked out for the project? It was very clear that doing this helped the project in multiple ways. Remember some other projects. When do we do testing, if not in Scrum then in general waterfall? Towards the very end. And when do we least want to find bad news? Towards the end, because that is when the pressure is highest. So if you don't have this testing infrastructure and you are finding bad news at that point, how would most people react? They would react like headless chickens, basically, and the project gets into death-march mode.
Let me come towards the close of my talk, because we have already discussed a couple of the things on testability. This is a hypothetical question, because at the beginning of the project it was not a Scrum thing, but just as a coffee-table discussion we did discuss what would have happened if there had been a sprint 0. Later on, in consulting, I went to a lot of companies, and I am now trying to bring the lessons of this project to other projects. I go to companies, and when they start on Scrum they often say, let's have a sprint 0. Trust me on this, this is a true story: there is a team I coached, along with others, for a whole release. When the next release came, sadly, they started with sprint minus 1. It was even worse: they started with sprint minus 1 and then sprint 0. And the funny part is that when they were doing sprint 1, they got news from their customers in the US that something else was important.
So, all the planning. Basically they used sprint minus 1 and sprint 0 to do a bit of planning, to gather some courage, to have some plans or some crutches to go and do the rest of the release. They did all that for two-plus sprints, and then the news came that they had to implement something completely different.
So why do we start with sprint 0? Why do projects have a sprint 0? Well, we would like to think it is because programmers like counting from 0; they do not count from 1 like normal people. That is why we have sprint 0. But the reality is sad, it is much worse: we start from sprint 0 because we are flat-footed, lily-livered poltroons who do not have the guts to really try something. The idea is that you really have to try something and get a result. If you are good, if you are the right people, you will get some result. It may not be the perfect result, but then you inspect and adapt your way out.
So the question is: if there had been a sprint 0, what would we have done? We would have done nothing and learned nothing, because we would not have seriously done any work. Serious work is for sprint 1 onwards; minus 1 and 0 are not for serious work, they are just something you do to prepare for the work. So you do not really work. What will you do for sprint 0 or minus 1 of a second release? It is a very painful story, because we coached them for a whole release and then... well, OK, that is a good point; actually, it is on the next slide. So instead, we should take a cold, hard look at where we are on the ground. Do people know what a Stacey diagram is?
OK, great. Imagine an x and a y axis. At (0, 0) everything is simple. Let us say the y axis is the technology domain: up there is uncertain, that is, we do not know exactly what kind of technology or what kind of technical approach we will use. On the x axis, again, at 0 it is very simple, we know exactly what we want to do, and far away is complex, that is, we do not know what we want to build. That is the functional domain. So on the Stacey diagram, one axis is what and one is how. When what and how are both very simple, when we know what we are doing and we know how to do it because many of us have done something like it before, then it is OK; we do not feel afraid. If tomorrow you are asked to do a little static web page, you are not afraid. You just say, yes, OK, I will do it in a day or whatever, and you do not give it much thought. When both are complex, when both are high on the Stacey diagram, then we are very unsure, and we do not look at this consciously. So I think when we go into a new project we should look at where we are on the Stacey diagram.
Now, what is the relevance? If we know we are high up there, not sure of what we are going to build and not sure of how we are going to build it, then we do not need these numbers, minus 1 and 0 and all that. We will say: let us do a sprint, let us look for a result, let us try to build this thing, and if we cannot build it there is no shame, because we knew it was not going to be that easy anyway. If anyone is following the news, remember the probe on the comet? There was a lot of suspense about whether it would land or what would happen, but if it had not landed, no one would have blamed anyone. If you think about Josh's talk: we do not fear failure, actually; we fear blame.
So if you know that what we are attempting is complex, there is no fear in that. It is OK. We look at the Stacey diagram and we say, OK, maybe in these first few sprints we are going to fail. Not a problem, but let us actually try to do something and get a potentially shippable product out. That is the main thing. Yes, we have questions.
They were not sprints like traditional two-week Scrum sprints, because there was no Scrum, but it was very much iterative; every month we had to do a demonstration and so on. Yes, sir, we did; we never called them sprints.
How do you know what you are thinking when you try to find something that you should ask. No, in this team there was no such thing, because first of all the team had many people who had done a similar project before. So there was no fear in this team. The point is that many of the team members have since gone to other companies, and when we go to other companies we find it is worse: we see the results of this fear, having a sprint zero and all that. So this is a lesson about sprint zero. This particular slide is not about this project; it has nothing to do with this project. The case study is to show that architectural changes can also be handled, maybe not always easily, but you need not have the fear. The other thing is: don't take the crutches of sprint zero. That is the message of this slide.
No, no. Sprint zero is actually an anti-pattern, and I think it is an outcome of most people's lack of confidence in getting something done. They do not want to say that they are not sure, so they try to give themselves some breathing space. That is what they do. Can I just finish this slide, and then we take questions? And how is it working here?
There is actually just one more slide. So, the experience of this project in general reminds us, and this has nothing to do with sprint zero, that testing is something you must think about very early, even if it is about architecture, because mostly architects are very far away from testing and the testers. It is very important, and it is a role collaboration. You must think: if I build the system, it is supposed to do all these great things, and my architecture is supposed to be great, so how do I prove it? Architecture is not a holy cow; it can change. Point two: don't be tentative in action. It is kind of the theme of this conference: action precedes clarity. So don't be tentative, don't fear failure; just make sure that there is safety. Try it out, learn, move, and you get some result. One thing we did not do in this project, and that was bad, was prioritize. Had we prioritized architecture properly, I think there would have been less pain in trying to handle it. Again, don't be fooled: many of these architectural changes we could handle, but it was not without pain. There was some pain, and the pain would have been less had we prioritized the architectural needs better earlier on. We did not do it. With that I close the formal part of the talk and open up for questions. So, there are some questions, yeah.
No, so the thing is, this is playing on our minds a lot, that we need a sprint zero. But look at the Ralph Stacey diagram and have an open conversation with your stakeholder: in the Ralph Stacey diagram, are we at 0, 0? And then you say, you know, we are not sure. When would you do a proof of concept? When you are not sure, right? So if you say you are not sure, let it be. If it turns out wonderful, that means a great success: either we are lucky or we are very good, or both, so that's ok. But people need to know that if you are attempting something that is non-standard, well, it may not work out as you thought. Organizations just need to be comfortable with that thought. Any other question? Yeah.