 All right, so we started, I think, about more than five years ago, and we have been keeping continuous core three-week sprints doing agile, right? I guess all of you must have seen this one, right? It's agile manifesto. This is where the story started. And in this talk, I'm really going to focus on responding to change, right? At this point, I want to understand from you, how do you recognize that there is some change happening to your requirements? In your experience, what do you do? How do you recognize change? Anyone? Right? A request from a requirements team. During testing, you talked about customer behavior changing, right? Sorry? Customer feedback, right? So there are many ways in which we recognize that something is changing. Some of them are really apparent, like the requirements team telling us that, hey, I want this new requirement, or this functionality, I need something different. But there are a whole lot of things which are not that apparent. So we talked about customer behavior changing, right? That there is no direct way for us to see it. We need to be able to get that information somehow. And that is my premise here, that it is my assertion that you need data to understand that. The right way, the best way to get an understanding of how your requirements are changing is to ensure that you are collecting data about every aspect of your work. Whatever software you're building, when it reaches a customer and how the customer uses it, you need to be able to get the data on all of them. Not just that, you also use the data to measure your progress against the goals. And your progress against the goals itself is, I believe, a driver for your requirements, right? If that is not being met, if you are not able to measure the goals and if you are able to measure the goals and those goals are not being met, that in turn should drive a set of requirements for you. You may need to change what you're doing. 
You may need to go and revisit what you're building, right? All of that is required. So before we start, let me first put in that big rider, right, privacy. We have learned through a lot of experience that this is very, very critical. Now, we are going to collect data from customers about how they are using the product. The first thing we do is ask the customer for consent. These are the scenarios in which we will use your data and these are the data that we will collect. That is an absolute must. And one of the tools that we use, and I will share my experiences, if you guys have anything to share on this, please do speak up and share your experience with the other people in the audience. I have about 30 minutes of slides, so we do have time for discussions. So I'll talk about the asset classification that we are using, and one of the guiding principles I use within my team is that we err on the side of caution, especially in this area, right? If there is a feature that is required for privacy, that trumps almost everything else, right? So this is an example from my team. I don't know if you can read all of it, but just to set the context around it, when we look at data, we go and look at different types of data. So the example that I've given here is about access control data. That is basically passwords, certificates, et cetera, which are used to access the administrative functions, right? So we look at it and say, what are the permitted usage scenarios for that data? In this case, it is used for administration of our site. And another aspect we look at is geolocation requirements. In this case, there are no geolocation requirements. We can keep it anywhere. But when you come down to uploaded customer data, there are several cases where there are geolocation requirements. So we captured that here. So this asset classification and the questions therein give us a framework under which we can think about the privacy aspects of the data that we're collecting.
And then the other aspect is access restriction. Who can get access to the data and what are the processes to get access to the data? We will think about that. Then retention, how long are we going to keep this data? In this case, there is a maximum of 300 days. But we also define our rotation policy. So the data is not kept continuously. And then we also talk about what are the protection, sorry, how it is encrypted in at rest when it is stored and when it is in transit. So this is the asset classification example that we go through for all the data that we collect. And what that in turn drives us is it helps us to build a privacy statement. And you can actually see this in the public site, right? Which is available for every customer who is using our services. And there are subsequent sections in this privacy statement for all types of data. We tell the customer, this is the data that we collect. This is how we are going to use it and this is how it is protected. And there is a section at the end where we give them the ability to ask us questions, right? So that's sort of the high level privacy aspect that I wanted to touch upon. So before you start going and collecting data, make sure you have a privacy policy in place and a privacy statement. All of the large corporations have one. If you don't have one, you can easily look at them and customize it for your benefit. Having said that, okay? Now let's go into the meat of the business, which is designing your data, right? So when we start a project to collect data about whatever we are building, be it a service, a software, a package software, we will have so much data that we can collect. We start brainstorming. You can fill in multiple pages of data points that we can collect. How do you prioritize them? And here is something which worked for us. We start with the business questions, right? And in this presentation, I'll talk about the business questions that we use. 
I run the Visual Studio Marketplace, which is like an app store for the Visual Studio. Visual Studio is an IDE, and extensions can be plugged into it. So people can go and get extensions for different features, which are not present in Visual Studio, but other developers have built it and they can share it in the marketplace. So my marketplace is, I own the marketplace. And what are the key things that I want to drive from here? So that is what we try to capture in the business questions that are present here. My first question is, are we able to drive user traffic to our site? And what is the data? So from this business question, I determine what are the data that I need to collect to answer this business question. And then once I complete that, I then go into the next steps in the process. Now, the second business question that I use is the site available and performant. I might get a lot of people, but when they come in, if the site is not available, then I'm not giving them what they want. And after that, it should be performant too. So that is critical business question for me. Third one, are we able to convert visitors to customers? Do they actually install extensions, or do they go and buy something? Then I call them a customer. So I'm converting from visitor to customer. And since we are a subscription-based offering, we need to ensure that they stay engaged with our service. It is not okay for them to just install and then leave. We want them to understand really what is required, and we want to be able to continue their engagement. And finally, we know that customers leave us and we try deeply to understand why they are leaving us. So these are the kind of business questions that we look at. So what this also allows me to do is prioritize all my investments based on these business questions. I have hundreds more questions that I need answered, but when I am talking about data points that needs to be collected, they need to inform one of these decisions. 
And throughout the rest of the talk, I will use these as examples. These are actually real things which I track. And the subsequent slides, we will link this through how we use the data to get this information. Now, another useful tool that we have used is this guidance. How do we come to this business question? We do it by first asking the question, can we correctly measure the success of a feature? Now, the success could itself be different. It could be that you are talking about revenue. You want to drive a lot of revenue for your site, or you could be really focused on adoption. You don't care about revenue. So you need to understand from all your stakeholders what the key business drivers are and what is the success criteria that you're looking for. And if you are able to answer that question, answer the set of questions involved there, those are your business questions, right? Second one is get all the data that is required in your current development cycle to inform your decision. So while you are in a development cycle, you basically go through several decisions about which feature to build, what not to build, what are the various options, among the various options, which one should you choose. We should collect data to help inform those decisions. And finally, how do you influence future releases? You have some idea about what you want to go and build. Do we collect what data we collect, what questions are needed to answer those questions and answer those aspects. Now, so we have looked at business questions. Those business questions will tell us what the data points that we need to collect there are. And then the next question before us is, how will you visualize all of that data? Visualization is interesting, but it always is tied to what decision you will make, right? The whole intent of visualization is to build that knowledge discovery to derive insights from the data that is getting presented. Visualization will help you get to that decision faster. 
You should not build visualization just for the sake of building visualization, right? So whenever somebody asks you for some kind of visualization, what is the decision you are going to make? When you are building it for yourself, you determine what decision you want to make and that informs, that decides what visualization you will build. So here I will go through a set of visualizations that I use and I will then connect back to those five business questions that we originally described, right? The first one is the usage funnel. We use this extensively. This basically shows the customer journey through our site, right? So we define it as multiple stages in the cycle of a customer, lifecycle, right? And in this case, we say that a user who comes into the site logs in, he gets into the next stage. After he logs in, he goes in and visits the details of one of those extensions. That's the third stage and he goes in and installs something, that's the fourth stage and then he actually buys something. There is commerce involved. It becomes the fifth stage. So that's really a funnel and we look at the drop-off in each stage in the funnel, right? And in some cases, we actually look at the end-to-end flow also. So stage one to stage four is our primary metric here. That is what we want to target. So what is the decision that I want to make? I want to ensure that the friction here is completely gone. I want to be able to maximize the number of people who are moving from stage one to stage four. And therefore, this visualization clearly gives me information about that. Now I can double-click on it and say who are the people who are dropping off? Those are the people whom we should talk to probably and understand why they are dropping off. If you can, you know, there might be a credit card involved in the workflow. Okay, he doesn't have that credit card. We add more credit card providers. So that in turn drives the decisions that you're making in the product. 
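The funnel described above can be sketched in a few lines of Python. This is only an illustration: the stage names are labels chosen here for the five stages in the talk, the counts are made up, and `funnel_report` and `conversion` are hypothetical helpers, not part of any tooling mentioned by the speaker.

```python
from typing import Dict, List, Tuple

# Labels for the five stages described in the talk (names chosen for this sketch).
STAGES = ["visit", "login", "view_details", "install", "purchase"]


def funnel_report(counts: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Return (stage, users, drop-off versus the previous stage) for each stage."""
    report = []
    previous = None
    for stage in STAGES:
        users = counts[stage]
        # The first stage has no predecessor, so its drop-off is reported as 0.
        drop = 0.0 if not previous else 1 - users / previous
        report.append((stage, users, drop))
        previous = users
    return report


def conversion(counts: Dict[str, int], start: str, end: str) -> float:
    """End-to-end conversion between two stages, e.g. stage one to stage four."""
    return counts[end] / counts[start] if counts[start] else 0.0


# Made-up numbers, purely for illustration.
example_counts = {
    "visit": 10000,
    "login": 4000,
    "view_details": 3000,
    "install": 1200,
    "purchase": 300,
}

if __name__ == "__main__":
    for stage, users, drop in funnel_report(example_counts):
        print(f"{stage:13s} {users:6d}  drop-off {drop:.0%}")
    print("visit -> install:", conversion(example_counts, "visit", "install"))
```

The stage with the largest drop-off is the natural place to start asking who is leaving and why, which is the decision this visualization is meant to support.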
This is another interesting visualization. So I talked about another one of those business questions that I had was about around engagement. How deeply are my customers engaged with this, with my service? Now what we have done is we have actually classified our customers into three buckets. The first bucket is called a tire kicker. A tire kicker is somebody who has used my site only for a day, right? The second bucket is potential. That he has used my site between two and 10 days in a month. So we are talking about monthly analysis here. And the third one is somebody who has used it more than 11 days in a month. So that is a dedicated customer. So if you look at that, look at this map. What this diagram shows you is how the engagement level of a customer has varied. You can see how easily we are able to quickly draw some very easy but obvious conclusions, right? There are a set of, I mean, when you are coming in as a tire kicker, there is a very great chance that you will churn out in the next month, right? So the majority of the tire kickers are churning out. Now, if you look at the dedicated, which is the last one, you can see the majority of them continue to be dedicated throughout the flow, right? But I'm not sure if you can see that, but there is a very fine thread which connects dedicated through to churn. Now, those I really want to double click, right? I want to know what are the people, why did somebody who used it so heavily not use it at all in March? A key decision that I can make using this visualization. Then, obviously, there are map visualizations. Now, this is useful if you are looking at geographic data. Now, in this case, I have sort of expanded it to show a lot of pictures, but you can actually collapse it down to just the areas like North America, Asia Pacific, or EMEA, or so on, right? And the map visualizations, they are actually, I mean, you can put a lot of very dense information out here. 
Here, what I'm showing you is the split of how customers are buying our offers. Each different type of offer is shown as a different section in the pie that is shown there, right? So, you can actually get a bunch of insights just by looking at it, especially if you have ownership assigned to different people. The person who is looking at North America now knows clearly that his customer behavior is significantly different from the behavior of the China market, right? And can he get some learnings from there? So if you notice there, if you say the green one is the most profitable offer among our SKUs, China is not as successful as North America. And he'll be able to go and talk to the China guy and say, hey, what are you guys doing? What should I be doing now, right? So that insight becomes easily available once you look at this kind of visualization. Another one of the key business questions that I asked was, is the site available and performant, right? This is another visualization that we use to measure that. Now, the area graphs are commands that are either slow or fail, right? The line, the dark blue line, is availability, right? And our goal is to get the availability up to 100% all the time. But that obviously doesn't always happen, right? But the key thing for us is, when it falls below 99.9, we definitely need to take action. So there are alerts being triggered based on this graph. Whenever it dips below 99.9, somebody is woken up and has to go and look at it and see why it has fallen below 99.9. Again, very action oriented. What's the decision you need to make? The visualization helps you to arrive at that quickly. Of course, we still use tables. Now, the interesting thing here is, Excel gives you these capabilities to do things like sparklines, right?
Even within the table now, you are able to quickly make that decision that something is either doing well or doing badly, or there is some action that is required. Creating those sparklines is just one or two clicks in Excel, right? It's really easy. You may not be able to draw a simple conclusion from the numbers alone. So if you look at the bottom lines, everything seems to be green, but not really. The sparkline clearly shows that it is actually dipping, and probably there is an action that we need to take from there. Now, animations. Let me set the context here. So here we were trying to present a business case for investment in a particular feature in our area. The feature that we were interested in is shown in the green bubble at the bottom, okay? And our hypothesis was that we need to invest more in this feature area. We believe that is significant, and one of the triggers that we used was we actually showed this. So let me play that. So what is happening here is that we are plotting the data from 2014 to 2015. The axes on the graph: on the y-axis, there is depth of usage. That is, how frequently are people using it, right? On the x-axis, it is percentage of users. Of the total population of users, these are the people who are using it. So my business case was to say that the green bubble is actually dropping in the percentage of users using it, but the people who are using it are using it in greater depth. And in fact, the size of the bubble represents how many actions they perform in the feature. So I'm pulling three plus one, four parameters, counting time, into a simple visualization which I'll be able to easily explain, right? The feature is getting heavily used, but its usage is dropping. I want to invest more in that. I also anticipated the immediate next question, how are similar features doing in a related area? So you notice I have plotted other features here also, and they have also grown.
But my argument is, because I have the most number of users, you need to give me the funding to go and enable this, right? So this visualization was, I mean, we actually won that one, right? We got the investment funded. So it is fancy, but really useful in certain scenarios where you really want to get attention, right? And we were able to do that. And by the way, again, it is built into Excel. You actually don't have to do anything more fancy than installing an Excel plugin called Power View. If you use Power View, any four-parameter table can become an animated bubble chart. Next one is clustering. So here, the business question at that stage was, I want to do a promotional activity, but I want to really target it. I want to identify the set of customers who will benefit the most from this promotional activity. So here we used a machine learning algorithm to actually understand the relationship between two different characteristics of the user and then determine how much impact the promotion will have on their behavior. And this heat map that is visualized here actually helps us to determine that this red zone is the set of customers who will get the most benefit. And we were able to then drill down and go into the set of customers by looking at the property set. So those users who have their property X in this value range and property Y in this value range are the ones which we should target with this promotion. And while we were doing this experiment, we also identified a set of centroids. Now these are the representative user samples. So in effect, we have this many representative users who are distinct in behavior. And if you are building custom offers, these are the people whom we should really target and get the maximum mileage out of. Again, a visualization which is in this case actually derived from a business question.
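The talk does not name the algorithm that produced the centroids. As a rough illustration of the idea only, here is a minimal k-means sketch in plain Python; k-means is an assumption on the editor's part, and the two coordinates standing in for "property X" and "property Y", the sample points, and the starting centroids are all made up.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (property X, property Y), both hypothetical


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points: Sequence[Point], initial: Sequence[Point],
           max_iterations: int = 50) -> Tuple[List[Point], List[int]]:
    """Plain k-means starting from caller-supplied centroids."""
    centroids = list(initial)
    for _ in range(max_iterations):
        # Assignment step: each user goes to the nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


# Made-up users forming two obvious groups.
users = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(users, [(0.0, 0.0), (10.0, 10.0)])

if __name__ == "__main__":
    print(centroids)
    print(labels)
```

Each returned centroid plays the role of a "representative user" in the sense used above: a point that summarizes one group of users with similar behavior.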
We wanted to actually identify what the promotional offer should be, who the promotional offer should be targeted at and this visualization helps us to answer that very easily. So we started our journey with identifying the business question, going into the data points which actually answer those business questions, then identified a bunch of visualization. Obviously, this is not the final list. I have a list at the end, or a set of resource at the end where you can go and see more such visualizations. But once we have done that, let's go and build the infrastructure and probably this one is actually what we use today. You can, if you have other examples, other best practices, I would love to hear about that too. So the one on the left is my service. We have a health monitoring system which pumps data into three different streams and I will talk about those three streams. And there is another system which tracks how the users are interacting with my system. So the monitoring system is for the server. So server-based data goes through that path and this path is the client-based data. And we call the three paths, among the three paths we call the first one the hot path. Here the intent is to do monitoring and alerting. So earlier I talked about availability goal of 99.9. Not just that, we have several other metrics which you are tracking where there is a threshold value. As soon as the threshold value is crossed, we want to be able to be alerted. Either via email or via phone calls, somebody has to be woken up. How do we do that? We use this hot path and that is the thick green line which shows there. It is high bandwidth, low latency. Every event that happens in the system is available in that store within one minute, within less than one minute. That is the guarantee that we have set up on this. So the infrastructure is set up so that the hot path is very, very quick. And the other aspect of it is the retention of this data in that store is very low. 
It is only kept for one day because the intent is actually immediate action. As soon as the alert is raised, it needs to be, we need to take action on it. The second one, which is the middle one, that is used for diagnostics and incident investigation. So if something is broken, we need to be able to go in and identify the root cause of that. We need to be able to get all the data to help understand the root cause of that. That's where this comes in. The data store is actually SQL which obviously allows you the SQL based querying, but we also have another log analytics tool out there which also provides interactive visualization also. So the key here in this particular path in the architecture is to really focus on diagnostics and incident investigations. And the data is kept for 28 days because we need at least a month data to if you need to go back and see if such incidents have happened in the past but were not noticed, but now we are noticing it and so on, right? So we collect all that data and that information goes into the warm path. And finally, the very thin line is the cold path. So here what happens is the data from the system goes in and is put into an Azure data lake, right? And what that is, it's actually a big data store. So very standard big data tools like Hadoop, obviously, because it's Azure data lake, Azure machine learning and Office with Excel or Power BI can connect to it and do lot of aggregations, machine learning, all those are enabled in this cold path. Here though the data is not very fresh, right? I mean, it could actually be eight hours old. So if you're looking at troubleshooting or if you want to look at the data for the last five minutes cold path is not the place to look at. Cold path is primarily for the aggregation. So the data that I showed you for drawing the Sankey diagrams where the set of users we saw their journey through different months, we will build it from the cold path Azure data lake. 
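The month-over-month engagement flows behind a Sankey diagram are the kind of aggregation that suits the cold path. The sketch below shows one way such flows could be computed; it is not the speaker's pipeline. The bucket boundaries follow the talk, except that 11 or more active days is treated as "dedicated" (the talk leaves day 11 ambiguous), zero days is treated as churned, and the user IDs and day counts are invented.

```python
from collections import Counter
from typing import Dict, Tuple


def bucket(active_days: int) -> str:
    """Map active days in a month to an engagement bucket."""
    if active_days <= 0:
        return "churned"
    if active_days == 1:
        return "tire_kicker"
    if active_days <= 10:
        return "potential"
    return "dedicated"  # assumption: 11 or more days


def transitions(prev_month: Dict[str, int],
                next_month: Dict[str, int]) -> Counter:
    """Count users flowing from each bucket in one month to each bucket in the next.

    Only users seen in prev_month are counted; users missing from
    next_month are treated as having zero active days (churned).
    """
    flows: Counter = Counter()
    for user, days in prev_month.items():
        source = bucket(days)
        target = bucket(next_month.get(user, 0))
        flows[(source, target)] += 1
    return flows


# Made-up active-day counts per user.
february = {"a": 1, "b": 5, "c": 20, "d": 15}
march = {"b": 12, "c": 18}
flows = transitions(february, march)

if __name__ == "__main__":
    for (source, target), users in sorted(flows.items()):
        print(f"{source:12s} -> {target:12s} {users}")
```

Each `(source, target)` pair corresponds to one band in the Sankey diagram, and the thin dedicated-to-churned band is the one singled out earlier for a closer look.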
Now, the key message that I wanted to land was analyze that data, right? I mean, we could go through all of these examples, and we have fallen into this trap many, many times. We start this whole process of collecting data, building all of these reports, scorecards, dashboards and everything, but nobody goes and looks at it. And so one of the key disciplines, I think, when you're looking at data in the context of agile is that you should have really efficient processes to analyze. And here are my recommendations, right? When there are incidents, analyze them daily. We need to be looking at different strategies. We use five whys, right? Now, we see that the availability has gone down below 99.9, why? Because service X was down, why? Because a deployment happened, why? Probably the whys stop there, I don't know, right? But we can keep asking those questions until we get to the root cause of the problem. And we do that very religiously, right? Every incident is always marked with a root cause, and it is actually monitored also at regular frequencies to understand which ones do not have root causes and which ones have delayed root causes, but we drive towards that. Next one, analysis per day. Here what we do is every feature crew actually spends five minutes after their scrum to talk about what happened operationally during the last one day, right? Some things may have failed but got fixed. I mean, everything happening perfectly has never happened in the last two, three years where we have run the service. Some blip or the other will inevitably happen. And each one, we treat it as a learning opportunity, right? I mean, say you see that the database is going down at frequent intervals. In one case, we actually found a pretty serious bug just by ensuring that we are watching this. It was not customer affecting, but we did see that happening regularly and that in turn led us to do that analysis.
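For the daily operational review, one simple input is the list of minutes in which availability dipped below the 99.9 threshold. The sketch below is illustrative only: the talk does not define how availability is computed, so the formula here (the share of commands that neither failed nor were slow) is an assumption, as are the `MinuteStats` record and the sample numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MinuteStats:
    minute: str   # label for the one-minute window, e.g. "09:01"
    total: int    # commands received in the window
    failed: int   # commands that failed
    slow: int     # commands that succeeded but were slow (disjoint from failed)


def availability(stats: MinuteStats) -> float:
    """Percentage of commands that neither failed nor were slow (assumed definition)."""
    if stats.total == 0:
        return 100.0  # no traffic: treat the window as healthy
    good = stats.total - stats.failed - stats.slow
    return 100.0 * good / stats.total


def breaches(window: List[MinuteStats], threshold: float = 99.9) -> List[str]:
    """Minutes whose availability fell below the alert threshold."""
    return [s.minute for s in window if availability(s) < threshold]


# Made-up sample data.
sample = [
    MinuteStats("09:00", total=10000, failed=2, slow=3),   # 99.95
    MinuteStats("09:01", total=10000, failed=20, slow=5),  # 99.75
    MinuteStats("09:02", total=0, failed=0, slow=0),       # no traffic
]

if __name__ == "__main__":
    print(breaches(sample))
```

Every minute returned by `breaches` is a candidate for the five-whys questioning described above.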
So this one is really operationally focused. The next set of analysis we do on a weekly basis, and here it is more learning focused. So one member from each feature crew actually goes into a common place and says that during this week, we learned this particular thing from our data, right? And then we ask the question, is this applicable for the other teams? Can they leverage this learning? And we do basically a scrum. It's a scrum of scrums with a representative from each crew going in there, and the intent is to understand the learnings that have been derived from the data that we have collected. And finally there is the analysis per month, right? So here it is more business focused. We build out scorecards, and some of the visualizations that I showed earlier were all taken from the business scorecards that got generated. The funnels are drawn per month, right? The Sankey diagrams are drawn per month. Each of them will help us answer the business question. So for each business question, at least the format that I use is I put the business question on top and draw the visualizations that I want to present to answer that question in one slide. So I have five slides which I present monthly to my leadership and to my team saying this is how we are performing as a business. That regular cadence of analysis happens. So in effect, that sort of completes my end-to-end story on using the data, right? We started with business questions. We went into identifying data points, did the visualizations and did the analysis, and that fed back into a set of actions that we need to take so that we keep improving our product. And I just wanted to touch upon machine learning, because as the data size grows, we see a lot of benefit in running these machine learning algorithms. I talked about clustering, which was used to help us target a very specific question.
Similarly, we run several experiments, and the data from those can also be fed into one or more of the machine learning algorithms. So if you are thinking about building up a data pipeline, do evaluate the various alternatives that are available here. I mean, in our case for example, Microsoft has a free offering called Azure Machine Learning where you can actually go in and test each of the various machine learning alternatives that are available and see if it applies to you. That's it. Use data to make your decisions better, understand the changes that are happening to your product, and be able to react quickly, respond to the change and be successful with enterprise agile. Thank you.