All right, it's been a long day. Can I request people to come up to the front? It'll be easier for us to see everyone and work as a small group. Quick show of hands so I understand who's in the room: how many people here work on some kind of BI system or data science related project? Okay, we have a couple of people. The rest of you are agile experts, so we're going to leverage your expertise in figuring out how to apply this. Can we invite the four panelists to please join us on stage? Let's do a quick intro and then I'll get started. This chair is open for any expert from the audience to come join us; leave room for the unexpected. All right, Joy, if you want to go.

I'm Joy Montello. I work for Target Corporation in Minneapolis, Minnesota.

Hi, I'm Gopal Krishnan. I work for Walmart Labs here in Bangalore. I lead some of the engineering teams that work on the Walmart.com site and other sites; they do a lot of data science and machine learning, which is why I'm here.

Raghu Kashyap, from Orbitz's Bangalore office. I'm still trying to figure out what I'm doing in the office. Maybe I'll let you know soon.

Hi, everyone. I'm Sangamitra. I work for a startup called Help Chat, which has a personal assistant app, and I have just started the BI practice there. We're trying to make sense of the data that we receive, do some personalization, and get some insights out of the data we have.

Because I see a lot of people here are not directly working on BI or data science projects, I'm going to quickly run through an introduction to what it is. It might not be an expert's introduction, because I am not an expert on this topic; I've just dabbled in this field for the last couple of years, so I'm giving my perspective. I've taken help from Vishal over there, if you can show a quick hand, to put this together. Hopefully this is fun and entertaining.

How many people have walked into a store and seen something that doesn't seem to make sense? Beer bottles kept next to diapers. How many people have ever stumbled into a store and seen that? That too on a Monday. There's a lot about this online, and some people say the story is true while others say it isn't. But the story is basically that in '92, in one of the stores, I believe a Walmart store, they figured out the correlation between new parents, new dads, wanting to party but at the same time buying diapers. Putting these two things together helped them increase sales. Supposedly they did this on Fridays; as Kalpesh is pointing out, he's even found it on a Monday. So the question is, how did they do this? Was it just a random thing, putting these items together, or is there something more to the story? Again, I don't know exactly what happened at Walmart, and Gopal can correct me if I'm misstating something. But I believe this is when people started trying to apply some kind of data science and make sense of what people buy, at what frequency, on what days, and whether you can put those together to improve sales, make things more convenient, and personalize them.
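As an aside for the non-BI folks: the usual formalization of that beer-and-diapers intuition is association rule mining, where a candidate rule like "{diapers} => {beer}" is scored by its support, confidence, and lift over the transaction log. A minimal sketch with invented baskets follows; nothing here is Walmart's actual data or method.

```python
# Toy market-basket analysis: is "diapers => beer" a real pattern?
# Baskets are invented sample data for illustration only.
baskets = [
    {"diapers", "beer", "wipes"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "formula"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
]

n = len(baskets)
both = sum(1 for b in baskets if {"diapers", "beer"} <= b)
diapers = sum(1 for b in baskets if "diapers" in b)
beer = sum(1 for b in baskets if "beer" in b)

support = both / n              # how often the pair appears together
confidence = both / diapers     # P(beer | diapers)
lift = confidence / (beer / n)  # >1: diapers buyers favour beer

print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 is the kind of signal that would justify putting the two items near each other on the shelf.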
And that's a very quick example of what data science, and BI, are trying to get at. I'm going to jump into a very oversimplified version of what BI is. The lights are off because the projector is coming up; I'll try to explain some of it, and the slide will be visible in a minute.

Typically, we have a lot of transactional systems, where we're capturing sales or other kinds of transactions that have happened. So traditionally we have a large number of transactional systems where a lot of information is sitting, and someone wants to make business decisions based on that transactional data. The way they want to make a business decision is that they want some kind of report they can look at, with some interesting data, and then decide: okay, if this is what is happening, let's do this. So we're going from transactional data to some kind of report that can help them decide what to do. But traditional transactional systems, while they work at a small scale, are not suitable for ad hoc reporting as the system grows. They're not suitable for pulling data out in arbitrary orders and querying it in arbitrary ways. That's the core problem: transactional data sitting in one place, and you're trying to build ad hoc reports and similar things on top of it.

So what came up, and this dates back quite some time, is the idea of taking this data and reorganizing it into a different format that is more friendly for reporting. We started calling the place where we dump all of this information a data warehouse, and those structures are more report friendly. We use an ETL process, which is an extract, transform, load process, to get the data from the transactional systems into the warehouse, and that helps us. So that's the 30-second introduction to BI: going from transactional data to reports that people can make sense of and use to make business decisions. Did I do justice to the topic? Probably not, but that's the quick 30-second elevator pitch.
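For anyone who wants that elevator pitch in code: a minimal ETL sketch, with hypothetical table and column names, moving raw transactional rows into the kind of denormalized, report-friendly summary a real warehouse load produces at much larger scale.

```python
# Minimal ETL sketch: transactional rows -> report-friendly warehouse table.
# Database files, tables, and columns are hypothetical stand-ins.
import sqlite3

src = sqlite3.connect("transactions.db")   # OLTP side
dst = sqlite3.connect("warehouse.db")      # reporting side

# Extract: pull the raw sales transactions.
rows = src.execute(
    "SELECT store_id, sku, quantity, unit_price, sold_at FROM sales"
).fetchall()

# Transform: aggregate to the grain the reports actually query.
daily = {}
for store_id, sku, qty, price, sold_at in rows:
    key = (store_id, sku, sold_at[:10])    # day granularity, ISO dates assumed
    daily[key] = daily.get(key, 0.0) + qty * price

# Load: write the denormalized summary the BI tools will read.
dst.execute(
    "CREATE TABLE IF NOT EXISTS daily_sales "
    "(store_id INTEGER, sku TEXT, day TEXT, revenue REAL)"
)
dst.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?, ?)",
    [(s, k, d, rev) for (s, k, d), rev in daily.items()],
)
dst.commit()
```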
What this led to... a lot of systems have come up over the years to help us do this, but as internet reach has increased, with the rise of social networks and the ability to track so much more data than before, what we're facing is often referred to as a data tsunami. Have you heard this term before? Data tsunami, which basically means we are overwhelmed by the data coming in, and we now need to make intelligent decisions based on it. Of course, there used to be this one smart dude in the company who would make these decisions. Now that person is standing there watching this data tsunami come all the way in, and it's just not practical for one person to make these kinds of business decisions anymore. And that, in my opinion, is what led to the rebirth, or the emergence, of data science.

My slides are a bit mixed up because I exported them from somewhere else, all right? So, what is data science? If I were to explain it in a very simple manner, data science has three core parts. One is some programmatic ability, referred to here as hacking skills; then some math and statistics knowledge; and then some subject matter expertise. Right at the center, where the three meet, is the core of data science; having those three skills together gives you the ability to do it. There's obviously the danger zone: people like me who have some hacking skills and some subject matter expertise go off and make arbitrary decisions, because I can program and I can read data, but I have no math or stats background. That's the danger zone. Or you have the traditional researchers who go off for years trying to come up with meaningful patterns, which is the math knowledge plus subject matter expertise. It's the part right in between that is data science. A lot of people will say you need to add some black magic to it, and that's what gives you data science.

And no data science explanation is complete without a complex flow chart, right? There are all these different steps involved, and this is typically the space in which your data scientists operate. So what are the typical challenges people in the data science field face? One: a lot of the work is exploratory in nature, so how do you fit it into a two-week cycle? Because it's exploratory, you might find a pattern, you might not; at the end you might just throw it all away, say you don't even have the data to build anything, and move on to something else. The nature of the work makes it very hard to slice, because you might make premature decisions if you only looked at a slice of the data. You need all the data, and at a past company where I worked, collecting all the data took six months. How do you fit that into a two-week cycle? These are the kinds of challenges people face. And if anyone has worked with data scientists, they can seem like people from a different planet. It's not always easy to work with them. I'm not saying they're bad, I think they're awesome people, but a lot of us struggle to work in a very academic setting with them. It's hard, and I'm just highlighting some of the problems I've faced in the past.

So I've given a quick intro to what BI is, what data science is, and some of the typical challenges. What I'll do now is turn to the panel and ask them to briefly give the context in which they're applying some of these techniques, and the challenges they're facing.

I've been at Target for three and a half years, and I worked heavily on the reporting side of our BI space. On that project we were bringing in a new system that was capturing transactional data, and we were taking that data, moving it into our data warehouse, and also consolidating reports. We went from 630 reports down to about 15.
Those reports were used across all of our stores in the US and across headquarters, for a variety of purposes. That was the first project I was on at Target, and we were very new in our agile journey at that point. So we did all kinds of crazy things, like estimating stories and having a definition of done, all these really terrible things. Kidding. But we ran into lots of challenges. We were waterfall for, I think, over a year and a half or two years, and the project was super cross-functional; I want to say there were upwards of a hundred and some people on it. In my first intro to the company, I walked into a meeting, and the project status was so red across every function it was practically magenta. People were yelling at each other, there were estimates of how much we were going to save by implementing this new system, so they just kept throwing more people at it; and the more people and resources that were thrown at it, the further red we got. So the solution was: let's move to Scrum. Let's implement agile. And it was a great idea. The problem was that people described BI as the tip of the dog's tail. We had our system in, and now we had all these stores that needed the reporting, but we hadn't been able to vet our data in advance of needing to push it out on those reports and give it to the field to make actionable decisions. As we moved to Scrum, we ran into a couple of challenges. One: how do you really slice something vertically when you're trying to deliver end to end for a store? If a store needs to know, say, what was my average cost per hour today while my store was operating, how do we get the data all the way from the transactional system into the operational data store, into the data warehouse, into meaningful metrics that are accurate, and then onto a report our field people can use? So that was one for us: really understanding how to get vertical slices that were meaningful and the right size for our team to deliver in a fixed period of time.

Okay, cool. So vertical slicing, something that's valuable and can be accomplished in two weeks, was a challenge you ran into. Yep. Awesome. Over to Gopal.

Sure. Hi. At Walmart we use data science and a lot of machine learning, for a variety of things. Start with search: when you search on Walmart.com, the search results are ranked, so that things that sell more or are more popular rank higher on the search page. All of that comes from a lot of data and machine learning work that the engineering teams do. We also use it for things like ads; we work with ad networks. You may come to Walmart.com, see some item, maybe you buy it, maybe you don't, but it gets surfaced back to you as an ad when you're somewhere else out there. You know, the ads that you see everywhere. All of that comes from a lot of user behavior data. And as the internet explodes, mobile devices explode, and the internet of things explodes, the amount of data we end up collecting is enormous. Which means data scientists need to figure out how to make sense of it and then apply it to problems, like how my search ranking is going to change, or which set of products to surface to you when you're somewhere out on the road. There are many more examples of how we use it. But here are the challenges we face.
Over the last two years or so, we've tried to transform ourselves into an agile organization, and the teams are in various states. As an example I usually use the search team and the search relevance team, which is responsible for relevance and ranking. We're in the process of transforming them to an agile model, and the challenges we face are these. These are people who have to do a lot of research. You have a ton of data, and, like Naresh was mentioning, getting the right data is a challenge. Once you even get the right data, they make a hypothesis: okay, let me look at this and analyze which machine learning model I'm going to use to try to solve this problem. So they may say something like, I'm going to use a naive Bayes classifier to address this specific problem. And that itself is an experiment. So we end up saying, okay, take a spike within the sprint, and it may run for three to five days, something like that, and hopefully you'll find something. Five days later they may say, okay, that model really only worked for the head queries, the top queries on the site; when you get into all the long-tail queries, it really doesn't work. So now I have to go to a new model. This is the pattern. A search system is almost an experimentation platform, and the whole model of development is really: I will try something, I will experiment, which is a very agile concept. But then we try to be a little process-oriented and say, okay, now come up with an estimate and tell me how long this is going to take. And they're like, I don't know, I've got to go read some research papers, I could go talk to ten more people and then discuss, and maybe it'll take two weeks, maybe three, I don't know what the answer is. And this happens even for other algorithmic work, without data science; we have people who develop complex algorithms, and they go read published papers and journals. It's a bit researchy, but we're not really a research organization, we're a product organization: we have to deliver value to the business at the end of the day. So the challenge is, and I question myself on this, should we really be trying to enforce a time-boxed, estimate-bound approach to solving some really, really complex and challenging problems? I'm curious to hear what other people are doing about it and whether you have opinions.

So, to quickly summarize: the challenge you've highlighted is that the nature of the work is exploratory data analysis, a lot of it genuinely exploratory, so trying to figure out a time box, how to fit the work into it, and whether estimates can even be applied, is an open question and a challenge right now.
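For the curious, the kind of spike Gopal describes might look like the sketch below: a naive Bayes text classifier over search queries. The data, labels, and framing are invented for illustration; Walmart's actual models are not shown here.

```python
# Sketch of a naive Bayes spike: classify search queries by intent.
# Queries and intent labels are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

queries = ["cheap tv 40 inch", "diapers size 3", "lego star wars",
           "tv wall mount", "newborn diapers", "star wars lightsaber"]
intents = ["electronics", "baby", "toys", "electronics", "baby", "toys"]

# Bag-of-words (unigrams + bigrams) feeding a multinomial naive Bayes model.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(queries, intents)

# Score held-out queries; in a real spike you'd check head vs long-tail.
print(model.predict(["50 inch tv", "lego set"]))
```

If the held-out accuracy collapses on the long-tail queries, the spike has "failed fast" and the team moves on to the next model, which is exactly the loop he describes.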
Okay, cool. Awesome. Raghu?

Thank you. So I've been with Orbitz a little over 12 years now. Have people heard of Orbitz? Anyone? Expedia? Okay, a few more hands. We got acquired last year, by the way, so we're part of the Expedia group now. For the first almost seven years I was on the product development side working in Java, and we started using Scrum in almost 2007, so as an organization we were pretty mature in terms of agile. But in 2009 I decided, okay, let me try my hand with data, so I moved over to the BI and analytics team. The first project I worked on with the team was building out customer lifetime value. This is pretty common in the e-commerce world: you build customer lifetime value to figure out what a customer is worth to you, so you can personalize better. Now, the teams were very siloed. Within BI you have a visualization team, an ETL team, probably a business analyst team. How do you get them to work together? That was the biggest challenge for us. I talked to the data modelers and told them, hey, we've got to build this, and the guy says, give me two months and I'll build you the model. Okay, that's not how this works. Then we talked to the visualization people, and they say, I need a month to build the visualization. ETL: I need another two months. The whole waterfall concept was built into the DNA of the BI people, and that was hard to change. What we really had to figure out was how to change the mindset; agile is more mindset than anything, from my perspective. So what we did was bring all the skill sets into one team. You want co-location, you want vertical slicing, so you bring them all together and you give them a simple problem; you don't tell them the bigger problem. And you ask, how long do we think we need to do something like this? We started off with a simple experiment: how do we extract data out of our HDFS system, which is a Hadoop system, and move toward the CLTV we were building out? We started with a four-week sprint, came down to three, then to two. As people started working in this fashion, understanding the value they were providing to the business, and getting constant feedback from the business, the mindset really changed. And the picture you showed about transactional data, that's how we used to be, 2010 and prior. Now we work a lot more on real-time data analytics, and we're on the cloud, actually. BI on the cloud used to be a no-no; putting financial data on the cloud is a big issue. Anyway, that's how we tried to move away from the whole waterfall concept. And obviously we had a lot of challenges. The biggest was the mindset. The second biggest was skill set. As BI and data science evolve, your skill set needs change a lot. You need people who can program, people with some math background, a combination of things. So it was quite a challenge to work through. I wouldn't say we're all the way there, but we're a long way ahead of where we were in 2010. That's pretty much my story.

So skill sets: getting people to become more of a generalizing specialist, in some sense, was one of the challenges. Right. Initially when we were hiring, we would hire saying, okay, we need an ETL developer who works in, say, Informatica, whatever it is. Now we hire people who have some programming skills, because ETL is no longer just the traditional proprietary systems. It's a lot more than that.
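As a rough illustration of what Raghu's team was building, here is the naive baseline version of customer lifetime value, computed from an order log. The numbers and the lifetime and frequency assumptions are invented; real CLTV models juggle far more variables, which is exactly why the work was hard to slice.

```python
# Toy customer-lifetime-value estimate from an order log.
from collections import defaultdict

orders = [  # (customer_id, order_value) - invented sample data
    ("c1", 120.0), ("c1", 80.0), ("c2", 40.0), ("c1", 60.0), ("c2", 55.0),
]

totals, counts = defaultdict(float), defaultdict(int)
for cust, value in orders:
    totals[cust] += value
    counts[cust] += 1

expected_lifetime_years = 3   # assumption to revisit against real retention data
orders_per_year = 4           # likewise an assumption, not a measured rate

for cust in totals:
    avg_order = totals[cust] / counts[cust]
    clv = avg_order * orders_per_year * expected_lifetime_years
    print(f"{cust}: avg order {avg_order:.2f}, naive CLV {clv:.2f}")
```

In practice each of those hard-coded assumptions becomes its own variable to model, which is where the one-story-per-variable sprint slicing Raghu mentions later comes in.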
So, like I was mentioning, I work for a startup, and being agile with BI is the only resort. There is no way you can follow a waterfall model, because it all comes at the cost of the business. The app is evolving: every week there are releases, every day there are changes happening, experiments happening. And today everybody is data driven; they need data to understand what is happening. Is it working or not? How are people reacting to it? The main drivers at this point, for an app that's only a few months old, are how to drive engagement, how to bring more people in, more installs, things like that. So your analytics also has to evolve at the same pace as the app. You have to be agile; there's no other way. You cannot go with a waterfall model.

I think the main challenge is, like Raghu was mentioning, everybody from a BI and analytics background who's worked a good number of years has this vision of BI where you create one mammoth structure with all the organizational data in what you'd call a data warehouse, or data lake, whatever you call it, and then you build visualizations on top and can go across different kinds of data and cross-reference, et cetera. But building that out takes a lot of time, and the design itself is a big phase. So how do you break that up into an agile structure? That's the challenge we work on every day. The biggest challenge is working at that pace while ensuring you're reporting correct data, because all the data is landing in your transactional system, and because you're receiving app data from different sources, different kinds of installs, different referrals, et cetera, everybody has their own way of denoting the data. Somebody may call the same thing by a different name. How do you bring it all together, show it in one place, and ensure the data is correct? Bringing the data in is not the big part; creating visualizations on top of it is not the big thing. The logic is. How do you work at the pace you want to work at and still report things correctly? That's the big challenge I face today: ensuring the correctness of the data. The other thing is that it's a very evolving process, so you have to be very flexible with the toolkit you go in with, especially in BI. You'll have seen the Gartner Magic Quadrant, and there are so many tools, but not all of them will fit your requirements. You have to be very flexible about what you want. You have to be ready to program, ready to change your tools, ready to work in an Excel sheet too, if required. That flexibility, and everybody on the team having the mindset that yes, we need to be flexible and work with whatever tools suit the requirement at that point in time, is essential.

Cool. You touched on lots of interesting challenges, but the last one is being flexible enough to pick whatever tool gets the job done, because you need to move fast as a startup. You can't afford a two-year timeframe to build the nice data visualization and data analytics stack; that might simply not be an option. How do you stay really agile even getting started in this space? So those are the interesting problems each panelist has brought to the table. What we're going to do now is turn the tables on you.
You are the agile experts here, and you're going to help these panelists figure out how to address the challenges they have. So what I'd request is that you form groups at your tables, and each group picks one particular challenge the panelists brought up. You can pick any challenge you like, and then try to come up with a solution for how you would deal with a situation like this. How would you be agile? How would you be anti-fragile, whatever you want to call it? How do you actually deal with this? We'll give you a fixed 10-minute time box; if you need more time, maybe we'll extend by five minutes. Then each group will quickly present its approach, and we'll let a panelist judge whether it would help, or give you a counterpoint. The panelists are also here if you want to ask them questions; you can come up to them or call them to your table. Okay, does that make sense?

Yeah, okay, there's a question there. Sure. So, quickly to summarize, I think what you're describing is that in your context you have four kinds of, let's say, not-so-modern tools and data ETL processes, data flows through all of them, each step is managed by a different group or team, and to get an end-to-end piece of functionality delivered you need to cut across everything. So how do you apply agile in a context like that? This could be another question one of the groups may want to take a stab at. So let's do a quick 10-minute time box, and you might be the fifth person here; the chair is empty. I want each group to pick one problem, take a stab at it, and we'll get into more discussion after that. I'm setting a 10-minute timer as the initial timer: pick your groups, pick a problem, and take a stab at how you would apply agile methods to solve it. Okay? Feel free to use the experts here; ask them more questions about the challenges.

Hello? 10 minutes are up. Do you have a workable solution? Looks like everyone's having interesting conversations at their tables, which is a good thing. Do you want five more minutes, or are we good? You want to present what you've come up with? All right, five more minutes.

Let me quickly go around. Can each group quickly state the problem it was addressing? We'll come to the solutions next; I just want to see who's tackling what.

Our group was looking into the problem that they're doing research, they really cannot predict, and they're trying to squeeze the prediction into two or three weeks. So our first response was: hey, don't do it. I mean, you know...

We'll hold the solutions; hang on, I just want to go around quickly. So the problem you're looking at is that the work is research oriented: can you actually fit it into a two- or three-week cycle and estimate it? All right, awesome. Let's quickly see who else is solving what.

We also wanted to solve your problem, Gopal. We're with you. Same thing, the Walmart problem. That one sounds more interesting, right? We were trying to solve the problem for Target, the vertical slicing. Vertical slicing, okay, cool. We're trying to solve the problem for startups. Which problem?
Basically, when they get rapid requirement changes and still need the data analysis to be done. So that's the problem we're talking about. Gopal's time box challenge over there, okay. We'll go back there. And that table? Time box challenge as well. Time box challenge, that's the easy one; everyone wants to go after that.

All right, so let's look at what you came up with. Explain a little more. I mean, the product owner is saying reduce the time, and they're saying we cannot reduce the six weeks. So we were still debating what the solution is. So, I don't know. Others, do you want to add anything? Jordan, do you want to add anything? No time box is what I heard. Yeah, just no time box as a beginning; then we dug deeper, but that was really what we wanted.

What would you say, Gopal, to "don't time box"? What's your reaction? I think my teams would love that, most certainly. That's definitely a discussion we've already had: don't ask me for a deadline on this, because I've got to go figure it out. But we still need to give some form of deadline, some time frame, to say the system will be delivered soon, because we're still a business at the end of the day. One other possibility is just pairing them up with somebody to speed up the learning process.

We'll go to that table. We felt a two-week sprint is viable as a way to fail fast, since this is an R&D kind of project. Exploratory work has to be time-boxed; if not, the research can go nowhere. While you're picking the right variables, you also want to use the right tools if possible; if you find a tool isn't right, you move on from it and find another one. So picking the right variables, using the right tools, is very beneficial in a short period, and failing fast is the theme we want to project. You must fail to move forward. What we intended is: you know what you require to do that R&D, you need to look at a specific set of things. So you come up with the variable you want to go with; you know the most used variables in your market. Use it, fail. The first iteration is to fail. Once you fail, you know where not to go the next time. That becomes a good basis for estimating: for this variable I tried this and failed, so I shouldn't go back to similar things, but I know where to go instead. Even R&D can take up specific chunks to analyze: by the end of two weeks, we will have analyzed this, this, and this market. So R&D is also scalable to a two-week span; you just have to be open to it. Along with that, we would look at which variables carry the most value and go after those first; if we fail there, we take the lessons learned and work from them.

My response to that is that it's very premature. I've been there: when you take only a few variables, the outcome is very different from when you add a new variable. So you might just be wasting time doing that. I agree. I work in the healthcare industry, and the data itself is variable for us. Today the doctor tells you his criteria for a condition.
If they say, I want to know how many people in the age span of 25 to 30 don't sleep or have sleep apnea, tomorrow they'll add one more condition. After they get the solution in hand, they'll say, can you add RDI as well, meaning how many times the patient starts out of sleep? So the entire thing we've built is gone, and we have to add one more condition to the hypothesis. It's okay to fail at the first pass. You're handling data, and data is always volatile. You'll have legacy data, but you have to fail to know what works and what doesn't.

I'll add some specifics we discussed: whatever time box you use, the two-week window or otherwise, you use it to set a target, saying we need the variables defined by then, or we'll know that the tool set we're using isn't working. Either way you've failed usefully: either you haven't locked onto the variables, or you've decided the tool set isn't working and you need another set of tools to figure out what those variables are.

Right, so thanks for all of that. We do fail fast, actually. One of the teams, as I was just explaining, operates on a three-week development cycle, and within it they try out different models on a week-by-week basis; sometimes a single model can take two weeks to figure out, tune, validate, and so on. They do fail fast; that's not the issue. We're already practicing that. The challenge is what comes after: we have the three-week cycle, then what's essentially algorithmic QA, where some people validate the work that's been done and see how it affects, for example, the search results on the site. So that's four weeks. We do a two-week A/B test after that, because you want to make sure it doesn't affect something really badly in production, and therefore revenues. So that translates into roughly a six-week cycle. And sometimes it extends: after the three weeks and one week of algorithmic QA, they'll say, okay, no, this works only for these cases, now go and fix it for the rest. So the cycle is a bit unpredictable, and that's where the business gets antsy. The fail fast is there. The question is how you shorten this while still producing good data analysis and models that don't affect revenue adversely.

We actually said maybe you're already practicing it. I'm sorry? Amongst ourselves, we said that maybe you're already doing it, while you were debating this. Yeah, we are doing something; that's how we operate, actually. I just wanted to add: if the purpose of shorter iterations was to fail fast and get early feedback, and you already have that mechanism, then why have the useless ceremony of the iteration, of asking what will fit into the time box? It just feels like a lot of wasted effort, in my opinion.
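The two-week A/B test Gopal mentions is the revenue guardrail in that six-week cycle. As a rough illustration, with made-up numbers rather than Walmart's actual tooling, the go/no-go decision often reduces to a two-proportion z-test on conversion between control and the new ranking model.

```python
# Two-proportion z-test on conversion: control vs new ranking model.
# Counts are invented; thresholds would come from the business.
from math import sqrt
from scipy.stats import norm

conv_a, n_a = 5_200, 100_000   # control: conversions, sessions
conv_b, n_b = 5_450, 100_000   # new model

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))   # two-sided

print(f"lift={p_b - p_a:+.4f}, z={z:.2f}, p={p_value:.4f}")
# Ship only if the lift is positive and p is below the agreed threshold;
# the two-week window exists to accumulate enough sessions for this test.
```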
I'm coming to this, actually. What if you look at this problem a little differently? After two weeks, instead of saying, hey, I didn't get any results, you say, I figured out one way that doesn't work. You work with your business and say, hey, this was one of our hypotheses, and guess what, it's either negative or neutral. So we know at least something about what's out there. Twist it a little and put it back on the plate: hey, this two-week sprint told us that. And try to frame that at the acceptance level if possible. One example I want to give: we had a very similar problem when we built out CLTV. Customer lifetime value is not easy to build out; there are so many variables. One of the ways we looked at it was, okay, what if we take one story per sprint, per variable, and we come out and say whether it's positive, negative, neutral, whatever it is, and then change the story afterwards based on the results. So at the end of the sprint I still have something to show: hey, guess what, we tried this and here's the result. Whether you like the result or not is a different story. So that's the little twist we applied. Yeah.

No, that's what we... I mean, you pick some cycle, but the point is not two weeks or three weeks. It's an X-week cycle, whatever that is. Yeah, so what that regular interval is, is really the question: at what intervals do you sync up? At the end of the day, shortening it only matters if you can release stuff to production that increases revenue. That's the goal, right? And can you do more things during the timeframe of a year? More is always good. Is more always good? Sorry, more revenue is always good.

Okay, go ahead. I have more of a leading question. I want to take a step back here. We have accepted this thing as a golden rule: it is important to fail fast, to do something real fast, because it has yielded results in certain fields. Action precedes clarity is a principle we have accepted. Should we take a step back and think about why this rule, this strategy, has not worked with data science and BI for some time?

Again? Fail fast actually works. In the case I'm talking about, at least, the engineers do try out one thing, then they find it works for some percentage of cases but not for the other 60 percent, and they say, okay, now I've got to try a different model. That is the fail fast I'm thinking of in the development process. What I'm trying to ask is, perhaps, looking back, you'd probably say the first few attempts could have been avoided, because obviously we didn't think of points X, Y, Z; that's why we went that way. So what I'm trying to get at is... Unfortunately, it's not that way. That's exactly my challenge. Data science is a little different from a standard engineering problem. In a standard engineering problem, you can at least say, here's what I learned in the past, and these kinds of problems I can avoid. With data science, it's completely data driven. The problem you're solving today and the algorithm you use today may not work for the next one, and what you avoided last time may actually work this time, because you're solving a different problem with a different set of data. That's where it becomes a bit of a challenge.
Both of us are agreeing; probably I'm just not choosing the right words. Okay. I would like to go to the word "science". How does science happen? How do scientists arrive at results in fields other than data science? It's the same scientist working on a particular thing, researching it for, say, ten years, and then one fine day suddenly coming up with a result. Maybe he had the same data in the first year, but there was one small part he didn't look at for a long, long time. Can we create an environment for data scientists to be more aware, more conscious, to reach the stage they would otherwise have reached after eleven years, so they can come up with the same results sooner? What I'm trying to say is: instead of "action precedes clarity", let them get to clarity first. Let's try to create an environment where clarity is given more importance. Scientists enjoy that, I know it. And then perhaps we can try it that way around and address at least some of the challenges they might hit. This is just a guess, a leading question I wanted to ask.

I'm going to time box this and move on, because I want to make sure every table gets the opportunity to share. So let's go to this table: what problem were you tackling, and what solution did you come up with?

It's not a complete solution; we were trying to solve all of it, you could say. One thing: many of the suggestions that came up for Gopal's problem apply to this case as well. Instead of looking at the whole set of reports, or the whole set of inferences you're after, it's better to look for a smaller piece of inference and do that. And many times, at least in the projects I was involved in, proving the architecture's stability was itself a problem. So what we did was send one piece of data from the input, through the ETL, through the data store, to the visualization. That way we were able to prove that the architecture really works; many times that alone is a problem. We also figured out a way to change the architecture quickly. By changing the architecture quickly, what we mean is that the data has to move from wherever it is currently stored to the new place without affecting current functionality and without bringing the system down. For that you need a lot of technical expertise, and, like she was saying, you should be willing to change your tools at any point in time. The other technique we used was making people work together. If the visualization people sit separately and never talk to the ETL folks, it's not going to work. So we made everybody sit and work together. That helped.

Quickly to summarize: your last point was that if you can get people to sit together and work more closely, that's one way to shorten the cycle. The earlier point was to find a thin slice that proves out the architecture, or get one variable across, and get at least that validated. Right. And after that... this is a very simplistic way of representing it. Many times you won't be able to reach the inference within those two or three weeks, so we ended up having some horizontal stories: data moves from the input to the ETL.
And the struggle we had was that we couldn't even validate whether the data quality was good at the next level, because we didn't even know what data we were dealing with. The data itself was very dynamic in nature. That problem we still haven't solved.

Okay, cool. I want to quickly come to Joy and see your reaction. Yeah, so this project was a couple of years old, and that's actually what we did. We co-located, I think it was 17 people. We were super lucky that we were able to put everybody in the same space, and they spanned every need we had. We had data management, which was doing our data science work; our business person, the product owner representing our stores, would come and sit with the team at a specific time every day to look at the data and answer questions. We had our MicroStrategy architect and our DBA, who was helping us with our Teradata stuff. Everybody came there, and then we really challenged the business, the product owner, to strip out anything that wasn't critical to get to that MVP. Can we just give you one piece of data, team member name and the hours they worked each day? Maybe it doesn't have any fancy functionality; that's a future feature. But all those things you talked about are what we did.

Cool. I think you wanted to add something? Yeah, I know we're time-boxed, just to squeeze this in. When you're slicing the reports, it will probably be very useful to write some automated test cases, because with reports you're playing with the data. If you slice your reports, you already know the expected results: if we run the report in the existing system, this will be the outcome, this is how the calculation works, this will be the result. So if you slice those reports, match the results against the existing reports, and write automation cases incrementally, you always have that reassurance that the calculations from the existing system still hold, that the current reports give the same results you've been building up incrementally over time.

Yeah, that would have been ideal. Unfortunately for us it was a cutover. We had physical clocks in each store; a person would come in, take their punch card or key in a code, and that would be the start of their shift. Those clocks cut over to a brand new system. The old system had a bunch of configuration and calculations happening behind the scenes that we couldn't get to, so we didn't have the benefit of running the two systems simultaneously to compare and refine. We had to do some guesswork, and I say that totally respectful of the scientific process. It was a theory, and I actually asked the question a few weeks ago: at what point does a theory become a fact? Apparently never. Like, gravity's still a theory? I didn't know this. Anyway, it was a lot of theory until we could prove it wrong. Yep, thanks. Cool.
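The audience suggestion, pinning the legacy system's numbers and asserting that the re-sliced reports still match, could look something like the sketch below. The report, metrics, and tolerance are hypothetical stand-ins, and as Joy notes it only works when the old system is still around to produce a baseline.

```python
# Sketch of report regression testing: pin the legacy numbers, then assert
# the rebuilt report still matches as slices are migrated incrementally.

LEGACY_BASELINE = {  # captured once from the existing system (invented values)
    ("store_042", "2016-03-01"): {"hours": 812.5, "avg_cost_per_hour": 14.37},
}

def new_report(store_id: str, day: str) -> dict:
    # Placeholder for the rebuilt pipeline's output for one store-day.
    return {"hours": 812.5, "avg_cost_per_hour": 14.37}

def test_report_matches_legacy():
    for (store, day), expected in LEGACY_BASELINE.items():
        actual = new_report(store, day)
        for metric, value in expected.items():
            # Tolerance is a project decision; payroll math may demand exact.
            assert abs(actual[metric] - value) < 0.01, (store, day, metric)

if __name__ == "__main__":
    test_report_matches_legacy()
    print("new report matches legacy baseline")
```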
We'll go to that table quickly. Yeah, go ahead, you guys. Very similar? Okay, Bhavan, you want to add something? Sure. This is based on my experience from when I was working at GE, in the same kind of division. Every experiment typically has some mechanics, certain steps, to produce a result. And a lot of times what we realized was that bad data was the really big problem, because bad data gives you bad conclusions, and the feedback came much too late; knowing it had gone wrong at the end of the day was too late. So we tried applying things like data profiling and cleaning the data right up front, as part of our process of running the experiment. That really helped us a lot. The other piece was that not all engineers are very aware of the data. If you put data profiling and cleaning up front, people like the ETL, data warehouse, and reporting folks become aware of the data and understand it much better. And if your team has a better understanding of the data, they can talk with the business on the same page, which makes collaboration much easier. That's something you can really look into.

Sounds good. I think data profiling creates exactly that, and it relates to what this gentleman said earlier: create the environment where they actually have more clarity. Thank you so much.

Okay, we were considering the startup issues. Typically startups don't have much budget, or the luxury of buying expensive ETL tools, reporting tools, and a lot of other stuff. So we broke the problem into two parts. The first part is where the requirements are governed by business needs: a business user gives their requirements, which the team executes over a period of time, call it an iteration. What we thought was: have the BI specialists also be part of the team. These are the people who actually understand the database and can convert it into the star schema thing, I think that's the right word, into a star schema, write their own scripts, and do it while the feature is being developed into the system, not later. So by the time the feature ships, you already have your system for data monitoring in place, rather than later on saying, okay, I've got the historical log, now I have to make sense of it. You build it early and have it tested. So you have a cross-functional team that covers all of this. The second part of the problem is analyzing the trends: okay, you've got the data, now you have to analyze it. The people analyzing the trends will typically have experience with the kinds of problems they're going to face. If they also act as consultants to the team when the reporting requirements are being put in place, I think that's a big step toward solving the problems you'd otherwise hit later, in the exploratory phase, if I'm not mistaken. And your ETL can be done with even simple scripts; it's not necessary to have big-time ETL tools and all that stuff. So this is just what we thought about.
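A toy version of that "simple scripts instead of big ETL tools" idea: shaping a flat app-event log into star-schema style fact and dimension tables with pandas. All column names and data are invented for illustration.

```python
# Toy star-schema shaping with plain scripts: flat events -> fact + dimensions.
import pandas as pd

events = pd.DataFrame({
    "user": ["u1", "u2", "u1"],
    "channel": ["organic", "referral", "organic"],
    "event": ["install", "install", "purchase"],
    "amount": [0.0, 0.0, 49.0],
    "day": ["2016-03-01", "2016-03-01", "2016-03-04"],
})

# Dimension tables: one row per distinct user / channel, with surrogate keys.
dim_user = events[["user"]].drop_duplicates().reset_index(drop=True)
dim_user["user_key"] = dim_user.index
dim_channel = events[["channel"]].drop_duplicates().reset_index(drop=True)
dim_channel["channel_key"] = dim_channel.index

# Fact table: surrogate keys plus measures, which reports aggregate over.
fact = (events.merge(dim_user, on="user").merge(dim_channel, on="channel")
        [["user_key", "channel_key", "event", "amount", "day"]])
print(fact)
```

The point of the proposal is that a script like this can ship in the same iteration as the feature generating the events, so the monitoring exists on day one.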
So if I summarize, and if I understood correctly, you're saying the BI team and the analysts should be part of whatever changes are going into the applications, and have things ready alongside. But I think for us the challenge is not the involvement, because the involvement is there. The challenge is how quickly things are changing. Today you plan to bring the data in with a simple architecture and think these are the kinds of analytics you want to do. But once you look at the data, things change. You realize the answers you were looking for have actually brought up more questions, and then you want to explore more.

That's precisely what I'm trying to say. The data analyst or data scientist is already experienced, and they're like what in Scrum you'd call a product owner. They also act as product owners during the development part, so it helps the exploratory phase much more. And the people doing the development are executing the BI piece during development itself, so you don't have to wait until later. You've got roughly 50, 60 percent of the work already done. That's what I'm trying to say.

I think the fundamental problem here is that when you actually look at the data, everything can change. And you don't look at the data while you're building; you build, you ship, and two weeks later you actually have some data. Precisely. So I don't think the combining you're suggesting is going to be practical, because the data comes two weeks later. That's right; there's no way you can design for something when you don't know what's going to come. But we can definitely... because the business user is giving you a requirement, right? There's no such thing in startups. There is. There are no business requirements coming to you. Well, the business user, I'd say, is maybe the entrepreneur: they say, I want this feature shipped in the next one month. I've done two startups. I'll break down the problem: what you've proposed solves maybe 60 percent of the cases, but there's another 40 percent that goes unanswered, because things are so dynamic. Absolutely right. Sometimes you know how to accommodate that; sometimes you don't. Precisely. There's definitely some lacuna there, but then, you know, that's a startup. Startup pains.

So we'll go to that table; they've been eagerly waiting for their chance. We were trying to tackle the whole challenge around the work being too big, or getting too many spikes, which you called out; there's a lot of spiking in our world. Do things fit into a two-week sprint or not? One of the things we brought to the table was: rather than worrying about a two-week sprint or too many spikes, can we collaboratively come to the table and say, what is the goal of the day? If there were one assumption we could tackle by the end of the day, what would it be? Try to tackle that, and the next day's standup is basically a discussion of pivot or persevere: should we look at this further or not? And that's how we'd run on a daily basis. It's almost a mob programming version of BI, if I have to put it that way: the team is together at the table, combined, solving the same challenge, tackling it one day at a time and making the call, should we continue or should we not? Because you know where you are and where you want to be.
It's through a series of experiments, and these pivot-or-persevere decisions, that you're going to get there. And at the end of the two weeks, look back and see where you stand. Did we accomplish what we set out to do? If we did, great; if not, okay, what's the next direction to take? You're basically making that call almost daily, rather than dealing with 50 spikes running in parallel. Take one step at a time: goal of the day, and daily stand-ups that are pivot or persevere.

So you're suggesting: don't ask the engineers to estimate. I mean, yeah, there's no estimate here at all. Rather than getting bogged down in estimates, let's talk about the problem to solve and what can be accomplished by the end of today. Okay, cool. That just sounded like estimation to me; I'll rephrase it and give a consultant's answer. "What are you going to solve by the end of the day? Let's time box it." A one-day sprint: you've taken two weeks, and now you're saying, okay, estimate what you'll complete by end of day. I did give you the autonomy of saying, whatever you can accomplish by the end of the day, we'll look at that. I didn't say I want this by the end of the day; I said, let's see what we can accomplish by the end of the day. What assumptions can we knock out of the park, and what decisions do we need to make by tomorrow morning? Should we continue down this path? Did we make enough progress to keep looking into it?

But your data collection can take... in some cases, in systems I've worked with, it takes six months. So you can't make daily decisions. Again, we're throwing stuff at the wall here. You want to run 50 experiments in parallel, because you're not going to run them sequentially at six months per experiment. So I think the context here can be quite different from an engineering context. In an engineering context, most things are under your control, while here you're dependent on data that can take up to six months to get. So again, the big challenge I see in data analytics is the reliance on exploratory work and on things outside your control; it makes it very hard to work in very short cycles, much as we'd all love to. Are these the constraints, if I can speak on your behalf?

My partner in crime over here has something. So apparently I'm a criminal, maybe. Yeah, to solve the six-month problem: I usually call BS on that right away. My approach is, well, what data do we have right now? Let's look at that. If we get new data six months from now, we'll look at it six months from now, and whatever we learn at that point, we'll deal with. Whenever I'm told it's going to take months or years or weeks, my response is always: what are we going to get done by lunch? And then let's do it. Because right now you can take a look at some data, play with it for the next three or four hours, and probably learn something. So let's do that. Let's not look for excuses for why it's hard; let's look for things we can actually get done. What can we learn today? That might be a better way to say what you just said, but yeah. I'll throw out an example.
We're trying to help a hotel make better pricing decisions. We're onboarding a new hotel, and we're going to help them price better. I have no idea what transactions have happened over the last six months, so there's nothing to derive patterns from to tell them, okay, here are some pricing optimizations you could do. It's going to take us six months to collect data, and there's nothing we have today. Okay, so do you wait six months? Look at what data they have right now. They have no data right now; they don't use any of your systems. Right, they don't use any of your systems today. Okay, so you start putting your systems in place, and then, instead of waiting six months, you take a look at the first month of data, because this month the hotel will have several hundred or several thousand people go through it, and there's a little bit of data being generated there. So let's take a look at one month's worth of data. Sure, more interesting things will happen over time: the seasons change over the year, and summers will look different from winters. Yes, it's going to take time to figure out all those trends, but I bet I could find some fairly interesting trends from just one month's data. Maybe even one week's data... probably not, but let's start looking at it and see what we've got. Let's not look for excuses about how long it will take. It's going to evolve, and that's okay; there's nothing wrong with it evolving. Let's get into a position where we can change and keep changing as the data changes, because this hotel's competitive market will change. Some clown will build a Hilton right across the street, which will completely change the nature of their business, in however long it takes to build a Hilton. And that's okay; we want to be able to react to that. No intention of bashing Hilton or anything.

Yes, sometimes that's what we've done too, when we can't wait until the feature rolls out, which is when we start collecting data. We just roll with it; that's what we've done. And the other challenge with this is that these are scientific models that get built, and then you question the veracity of what you've built so far; it's just a hack at that point, until you get better data. Unfortunately that affects revenue, so that's where it becomes a challenge as well.

To the same point, about the nature of data and how the data can change the analysis you draw from it: here's a very common-sense hypothesis we tested recently. Compare people who are on Wi-Fi most of the time with people who use their data connection. The hypothesis was that usage should be higher for the people who are mostly on Wi-Fi, and that they would also upgrade more frequently. Surprisingly, the hypothesis was proved wrong. The people on Wi-Fi are not the ones upgrading; yes, they're using the app more, but they're not upgrading. Other people, who are not on Wi-Fi as much, upgrade more. So this is how things change: this was not a hypothesis we thought would fail, but it failed when we looked at the data.
So this is how data, once it starts coming in, changes things a lot: how you look at it, how you analyze it, what models you apply. And Gopal is probably struggling with this a lot.

To add to the hotel problem: if I had a hotel, I would take at least five customers every day and push that data through, because if you want real value from the data, it has to reach the end user every day. Take this conference and the number of likes it gets: the way you presume the data will behave is one thing, but when it reaches the end user, culture and other parameters come into play that actually change the trend. So if I were running a hotel, I would push data from five customers every day.

Well, in data science we talk about a statistically significant amount of data. You need a certain amount of data to be able to derive any meaningful pattern from it. It varies from problem to problem, but you need data that is statistically sufficient. If I just took five customers, I could be completely off the mark. I could be giving them recommendations like "increase your price, sell at a higher rate," and that could just shut down the hotel. It could have severe consequences, so we have to be very careful, because there's real money and reputation at stake.
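For a rough sense of what "statistically sufficient" means, the textbook back-of-the-envelope calculation for estimating a proportion gives the flavor. A minimal sketch of that standard formula, not anyone's production method:

```python
import math
from scipy.stats import norm

def sample_size_for_proportion(margin_of_error: float,
                               confidence: float = 0.95,
                               p: float = 0.5) -> int:
    """Observations needed to estimate a proportion p to within
    +/- margin_of_error at the given confidence level.
    p = 0.5 is the worst case (largest required sample)."""
    z = norm.ppf(1 - (1 - confidence) / 2)  # e.g. 1.96 for 95%
    return math.ceil((z ** 2) * p * (1 - p) / margin_of_error ** 2)

# To pin down, say, a booking rate to +/- 5 points at 95% confidence:
print(sample_size_for_proportion(0.05))  # 385 observations
# Five customers a day reaches that in about two and a half months,
# which is the panelist's point: tiny samples give risky advice.
```

Real problems need problem-specific calculations, which is exactly the "it varies from problem to problem" caveat above, but the worst-case arithmetic already shows why five customers a day is nowhere near enough for a daily pricing recommendation.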
From a science point of view, data science tends to be closer to something called grounded theory, as opposed to, well, normal science. You run an experiment, you see what happens, and based on the results you evolve your hypothesis, run another experiment, evolve it again, and run another experiment. In the grounded-theory world, each iteration of those experiments often takes months or years. With data, it could take a week, or days, or sometimes even hours; there are wonderful case studies out there where the iteration lengths are literally hours. That's extreme, but don't underestimate it. Walmart, or Target, has to be overwhelmed with data, so it's got to be possible to have at least daily iterations based on looking at the data, if not hourly, depending on where your data feeds are coming from in your stores. Do you want to reply to that?

There's tons of data, but I think the piece that's missing is all the complexity that plays into it. When you look at an enterprise as big as Target, it's not as simple as having the same backend system for every single user in every single place. A lot of times we've got different systems responsible for feeding in the same type of data, depending on the location or what it's doing. Part of what takes so long is needing a model that works across every single one of those. So: make sure your sample size is right, number one. Number two, identify who even owns the freaking data source, and then figure out how you get into that data source. It's so big and so matrixed that it's not as simple as "get your data, look at a few days' worth, and you'll have your answer the next day." The process of just finding a data owner sometimes takes weeks or even months.

I was just on the phone last week with the product owner for one of our data engineering groups out in Sunnyvale, and he was explaining that there's a middle system going away, and the pain they're in right now because we're so big. When you grow, sometimes you don't scale up super efficiently; growth pops up kind of everywhere. Because you need to be fast, you do the solution that's going to work at that time, which is good. But then, when you're so big and you're trying to unify everything across the board, there's a lot of complexity.

Yeah, so that particular data stream isn't currently available to you, but I'm sure you have hundreds of data streams coming into your data warehouse right now. The question is, what question are you trying to answer?

The models differ geographically. What we sell at a store in Louisiana isn't going to be the same as what we're selling at a store in Minneapolis, and the models don't scale, so you have to get data across the board. And I wouldn't say we're not thinking about it the right way; it's that the gamble is so big if you don't have the right amount of certainty when you make a decision based on data. There's just no room for a mistake. We make a mistake and there are huge consequences.

So then for those types of questions you have a longer pathway than for other types of decisions. I'm sure you have different scales of impact: the more impact a decision has, the longer the runway.

Here's what I would say. I worked on an HR project for a long time, and there was no data we provided that didn't carry some kind of huge risk. Anything tied to team member time has legal ramifications, even if you're off by a millisecond, because that adds up and all of a sudden you've got a class action lawsuit. Right now I'm the product owner for our Hadoop space, our big data space, and with all the clients we interface with, it might seem that one small piece of data won't have bigger consequences. But the reality is that you're never after just that one piece of data to answer a question; you need data from all the different pieces. So things become very complex, and if you take a gamble on that one piece, the impact on the question you're trying to answer is suddenly astronomical. I don't know if that answers the question.

Are there any other questions? We can quickly open the floor, maybe for five minutes if you guys are not too tired, and then we'll close. Does anyone have a question they want to ask the panel?

[Question from the audience about tools, partly inaudible.]

Tools, in terms of agile tools or data science tools? My teams mostly use things like MATLAB, and they use Python a lot. Python is really popular; it has a whole bunch of machine learning libraries from the open source community. Those are probably the most common. It depends on the team; we were just discussing earlier what even defines BI, because it's different in different teams and it's evolving. If you talk about proprietary tools, we use things like Teradata, MicroStrategy, QlikView, Informatica. But our machine learning teams use Python a lot.
We use Apache Mahout. So again, it's really federated; people use different tools and technologies right now.

I would say the same thing. It depends on your budget, your preferences, and how fast you want to get there, and there's a whole array of tools available. There are a lot of open source tools: R is there, and if you're ready to do programming, there's Python. Then there are third-party tools you can buy: Teradata, Informatica, and IBM has a whole set of tools. And then there are the front-end tools like Tableau, MicroStrategy, SAP, everybody. So there are a lot of tools out there. Take your pick: go open source if you're ready to do a lot of coding work and handle whatever changes are required, or go with third-party tools, whatever your preference.

One last thing I forgot to mention: we're using everything. Teradata, SAS, everything you can imagine that does the job well. Right now we're really focused on our big data space, on data ingestion, and also on data egress for our external partners, because we have clients and vendors who need the data. So you name it, we're probably using it somewhere.

BI has typically been batch based, where it takes days or months before you get your reports. Now there's a lot of push around real-time analytics; Google Maps, for instance, is really real-time analytics. How mature is that space, in terms of tools and techniques? And are you actually applying it in your area?

Again, that depends on the area. When I was on the HR project I mentioned before, they wanted compliance information right away, but when you look at the cost of doing something like that, it doesn't support the use case. However, right now we have an internally developed tool called Datastream, which essentially "recordizes" data so it's available in real time, wherever it's coming from. Now, we run into issues all the time with our operational systems: how much of a draw from a system can it support to get the data into whatever space, and once it gets there, what needs to happen to it? So I think it's a hot topic right now, really being able to assess whether real-time data even belongs in the BI space. Shouldn't that just be reporting directly off the transactional systems, rather than coming through central warehouses? Some people feel really strongly one way or the other about that. What I would say is that from a leadership perspective, we're very focused on delivering the right data in the fastest way possible, to equip our leaders to make the right decisions, and on getting even more action oriented. But the first things we're focused on are stability and accuracy.

All right, thank you guys, and thanks to the panel for coming in and enlightening us. Thank you again. Hopefully you got some answers, and some more questions. I don't know if we did justice to your time, but hopefully people benefited, and I think there are some new insights here in terms of how we think about data analytics and agile in that context.
Something for all of us to go back and think about. So thanks again for taking the time this evening and coming to the conference. Some of you have been here even longer, so thanks again, and thank you guys for hanging around. There is dinner being served outside; please do have dinner, and if you're coming tomorrow, we'll see you sharp at nine, okay? Thank you.