Hello, everyone. Today, I will be talking about how to be versatile with data as a product manager. As a PM, we constantly have to make decisions in our various roles, and we have to make the right call very quickly and sometimes under ambiguous circumstances. So I want to focus this talk on the various tactics you can use to excel at making the right decisions and getting the results you need efficiently. The key points I want you to take away from this presentation are: how to flex your skills in using data for decision-making across the various roles that come up, how to identify the various sources from which you can get data to make decisions, and how to understand the various types of data you may have to handle and how to process them. A little bit about me. I have been at AWS as a product manager for the last three years. Currently, I am on the machine learning team, and previously I was on the Elastic Compute Cloud (EC2) team. As you can imagine, at AWS we rely heavily on data to make decisions, and being conversant in data analytics is key to being a successful product manager here. I'm also an alum of the University of Michigan Ross School of Business. When I get time, I like to go out and take photographs; you can usually find me behind a camera, mainly focused on landscape and nature photography. What you see behind me is one of my images, a sunrise at Great Smoky Mountains National Park. So let's look at why being versatile with data is important for the PM. This is because, as the owner of a product, feature, or service, you're going to be at the center of interactions with marketing, sales, support, customers, engineering, UXD, and even your leadership and finance teams. You're going to be asked to make decisions or talk to any one of these partners or stakeholders in your day-to-day activities. And to make these decisions, you're going to rely on one or more of them for your data needs.
For example, one day you may be writing product requirement docs and writing user stories based on data you have already gathered. Another day you may be talking to customers, understanding their pain points and needs, and trying to address them. You may be talking to finance to project the growth of your business, or convincing leadership that they need to support this product, or helping them make a decision. Across all these different roles, you're going to be wearing different hats, and for each of them you're going to have to make a decision and handle it in a different way. A second reason is that you have to handle data from different sources as part of your decision-making. You may be looking at raw data from a log file or from sales transactions. Other times you may be looking at dashboards to help support your case — performance dashboards, or charts that show the growth of your product. You may be evaluating soft data: things like customer interviews, surveys, feedback, and focus groups. Sometimes I even had to gather data from social media and from marketing campaigns. As a PM, you will often be asked to draw meaningful conclusions from all of these different sources of data and take a stand. So you have to quickly identify what makes sense for a given issue and evaluate which source to use to support your case. Another thing you have to recognize as a PM is the difference between qualitative and quantitative data, and this is an important distinction to make. Why? Because qualitative data — the data you get from surveys and focus groups — helps tell stories in a very different way compared to quantitative data, which you get from dashboards and data warehouses. To be versatile with data, you need to distinguish one type from the other and learn which one to use in each situation.
And even after you've identified your data sources and categorized qualitative and quantitative data, you still have your work cut out for you: actually using the data to get the results you need. Sometimes it may be as simple as pulling data from a table and creating a chart out of it, say the growth of a product or sales data. But other times it may be a lot more involved. Maybe you have to transform your original problem statement into something that fits the data you have. Or you have to convert data from one form to another using tools like Excel or Python or other data management tools. Sometimes you have to artfully combine qualitative and quantitative data and tell a story with it. And you may have to do all this without the help of a data engineer. It's not the end of the world, but it's not an easy process either. That's why you have to be resourceful and inventive in such situations — be scrappy, creative, and persistent — to make sure you take it all the way to the decision you're trying to make. So we have looked at why. But before we go down the rabbit hole of what we need to do to be versatile, let's look at how we can frame the problem so that we can use data to solve it. Thankfully, this is a familiar concept called the scientific method, and the basis of it is hypothesis testing. You first formulate the problem statement you have to make the decision for. Once you've formulated the problem statement, you derive the hypothesis that will help you make the decision. This will become clearer once I give an example. The final piece is knowing what metrics to track that will allow you to test that hypothesis and eventually make a decision. So for example, let's say you have to update a design element for a website — say, where the search bar is located on the page.
To decide whether or not to do it, you would create a hypothesis that tests whether updating the search bar location improves, say, the traffic to that website. Now that you've identified the hypothesis and the metric, you go about doing what is called A/B testing. A/B testing is really just comparing that metric — traffic — for the updated design, where the search bar has been moved, against the original design, in a randomized manner. That means some users of the website are shown the original design, while others are shown the new design. By running this across hundreds of thousands of users, you now have meaningful data that will tell you whether or not changing the design improved the traffic by, say, 20%. And if that was your hypothesis, you can conclude that it was a valid hypothesis and go ahead with making the change. The primary challenge you will face in hypothesis testing is that it's not always easy to transform the problem statement into a hypothesis you can test. Additionally, defining the correct metrics for success is not always easy. In the previous example, it was easy to just say it was the traffic to the website; other times it may be a bit more nuanced than that. And finally, you may not always know the full universe of available metrics or data you could look at. This is where looking at previous examples of such changes, or talking to your data engineering team to see what is available, is very helpful. I would highly recommend a webinar from Product School by a PM, Sohei Thia, who gave an excellent presentation on how to be a data-driven team; it breaks down this process of hypothesis testing very clearly. Now let's look at an example of data gathering that I did as a PM.
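As an aside, one minimal way to evaluate the kind of randomized comparison described above is a two-proportion z-test on the two arms. This is only a sketch — every number here is hypothetical, and the function name is my own:

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: does variant B's rate differ from variant A's?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a, p_b, z, p_value

# Hypothetical numbers: 50,000 users per arm, original vs. new search-bar placement
p_a, p_b, z, p = two_proportion_z(conv_a=5000, n_a=50000, conv_b=6000, n_b=50000)
print(f"original={p_a:.1%}, new={p_b:.1%}, lift={(p_b - p_a) / p_a:.0%}, p={p:.3g}")
```

With these made-up numbers, the new design shows a 20% lift at a vanishingly small p-value, which is the kind of result that would validate the hypothesis.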
I was working on a marketing campaign for a new product that I had launched, and we wanted to do an email campaign to help grow the product. Typically, these kinds of campaigns are run in bulk, where thousands of customers are emailed information about the new product, and then an analytics team gauges how effective the campaign was and what the success rate was. For this campaign, we faced an issue: we clearly couldn't email customers in bulk. First, we had a very limited budget for running the campaign. Second, customers were very sensitive to such campaigns. These are enterprise customers, and if the email wasn't relevant to their use case, it's likely they might unsubscribe — as we all do when we get an irrelevant email. The problem is, that could impact future campaigns for products that are relevant to their use case. So the hypothesis we formulated to address this was that running a targeted campaign improves the efficacy of the campaign by a specific amount, and that this improvement is worth the cost incurred in creating the targeted campaign. The metrics we tracked were the usual ones for gauging the effectiveness of email campaigns, that is, click-through rate and conversion rate. But in addition, I also wanted to track the growth and usage of the product by the targeted customers. The way this testing was accomplished was as follows: first, we identified a select group of customers by using data about existing products and figuring out who these customers would be. Once we identified the customers, we ran a targeted email campaign to them; the control was a set of random customers who were not using a similar product. After this was done, the marketing analytics team measured the email campaign metrics — the click-through rate and conversion rate.
But in addition, I also measured the growth and usage of the product using our internal dashboards, roughly two weeks and one month after the campaign was done, and compared the targeted customers against the control. What I noted was a marked usage improvement in the targeted customers versus the non-targeted customers a month after the email was sent. My hypothesis on why it took 30 days, or a month, was that it took that long for customers to really digest the campaign email, run a proof of concept with the new product, and decide that, yes, migrating to the new product was worth the effort involved. As a result, this targeted campaign improved product adoption, and it became a prototype for future email campaigns. So in summary, I was tasked with growing the usage of a product and ran a targeted email campaign to grow it. It had to be a targeted campaign because we had a limited budget and we didn't want to spam customers. The end result was that by being scrappy and inventive — getting a targeted customer list and identifying the right metrics for usage — we were able to validate the hypothesis that running a targeted campaign justifies its cost and improves the efficacy of such a campaign. So we've looked at why we need to be versatile and how to frame a problem. Now let's look at what we need to know in order to be versatile with data. Let me start by revisiting an earlier slide on distinguishing qualitative from quantitative data. We recognized earlier that this is an important distinction to make. Let's put that in the context of the hypothesis testing we defined for making decisions: based on how you frame your hypothesis, you get to decide what format of data you want to use to support that hypothesis. And this also determines your methods for collecting data and how you process it.
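To make the targeted-versus-control comparison from the campaign example concrete, here is a rough sketch of the metrics involved. All numbers and names are hypothetical, purely for illustration:

```python
def campaign_metrics(sent, clicks, conversions):
    """Standard email-campaign metrics: click-through rate and conversion rate."""
    return {"ctr": clicks / sent, "conversion_rate": conversions / clicks}

# Made-up numbers: 2,000 emails per arm
targeted = campaign_metrics(sent=2000, clicks=400, conversions=80)
control = campaign_metrics(sent=2000, clicks=120, conversions=12)

def usage_growth(before, after):
    """Relative growth in product usage, e.g. read off an internal dashboard."""
    return (after - before) / before

# Usage measured ~30 days after the campaign, targeted arm vs. control arm
targeted_growth = usage_growth(before=1000, after=1300)
control_growth = usage_growth(before=1000, after=1050)
print(targeted["ctr"], control["ctr"], targeted_growth, control_growth)
```

The decision then comes down to whether the gap between the targeted and control numbers exceeds the extra cost of building the targeted list.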
So let's look at examples of these two data types, to really help you understand what to look out for and how they differ from each other. Qualitative data sources are descriptive and conceptual data collected through sources such as questionnaires, interviews, observations, and focus groups. They are needed when you want to tell a story from these responses and customer insights. They are usually point-in-time — say, collected during a focus group; you're not going to repeat a focus group every day. And because of that, the number of data points is likely to be fewer as well, especially compared with quantitative data sources. I mean, how many times are you going to run customer interviews? They're pretty taxing. The primary advantage of qualitative data is that insights really jump out at you from these responses. You may have to do some work in interpreting them, but the challenge — or the flip side — is that it is much harder to avoid bias in gathering the data. You have to make sure you're not leaning one way or the other when gathering it. Apart from focus groups and interviews, the other key source of qualitative data is industry and market reports, because they aggregate quantitative data into useful snippets, and this can really help tell a story or support your analysis. Typically, your employer or your university is likely to have subscriptions to sources such as IBISWorld, which has some excellent market analysis reports. Now let's look at quantitative data. Quantitative data should be used when you're trying to quantify a problem based on data, or when your hypothesis requires you to measure changes on a numeric scale. Right away, you can easily distinguish this from qualitative data. And while this data can be point-in-time, more often than not it is continuous in time. If you properly define your thresholds and constraints for collecting the data, you can make sure you're avoiding bias or anchoring when gathering quantitative data. Of course, because of the volume of data you get, the primary disadvantage is that you have to do some sort of processing or filtering to make sure you get the right insight from the data to support or oppose your hypothesis. This is especially important when you source raw data sets — say, processing log data from a fleet of IoT devices, or looking at audio files for a particular product, say an Echo device. You'll be lucky if you get access to dashboards that succinctly present the data; more often than not, you have to do some analysis yourself. Notice that the one thing I didn't mention was customer surveys and feedback. That's because this data source can be both qualitative and quantitative, point-in-time or continuous — it all depends on how you formulate your questions and gather responses. If the responses are long-form question-and-answer, it's going to be qualitative data, and again, you're not going to get a whole lot of responses in long form. On the other hand, if your responses are multiple choice or simple options — things like rating a restaurant — then you're going to be collecting data from probably hundreds or thousands of users, and this becomes quantitative data. And based on how you process and handle it, you can also end up with bias in these situations. So let's look at an example of dealing with multiple data sources to address an issue. One of the problems I faced when I was working in EC2 was designing a compute server for running a very specific workload. This may seem arcane, but it's really similar to designing a spec for a laptop intended primarily for gaming or video editing.
It's just that in this case the workload and the customers are different. The hypothesis I formulated was that a very specific design configuration is optimal for this workload. The challenge in testing the hypothesis was that the design configuration isn't an exact science — nothing in the literature says that you have to have four CPUs and eight GB of memory, for example, to run this workload. And I had to rely on many data sources to eventually come to a conclusion. So in formulating the product requirements, I actually had to look at multiple sources. Key amongst them was talking to customers: I interviewed multiple customers and ran focus groups. These gave me qualitative input on what customers were looking for when running this workload — few data points, but very valuable. I also talked to workload experts for this particular workload, and they gave me even more precise input on what was needed. They didn't have the customer view, but from a technical point of view, they helped me understand what was required. I also wanted to validate these against data collected in a non-biased manner, so I looked at dashboards and usage metrics for similar workloads run internal to the company. I also looked at log files to see the performance of this workload on similar platforms. Using all of this together, I was able to create a product requirements document and justify the specific configuration I was proposing to leadership. So now we have looked at how you need to rely on different sources of data to make decisions. Let's also look at why we need to be resourceful in handling data. This is because data is never in the format you need. Sometimes it's a large table or an Excel file. Other times it's just raw data — I mentioned the IoT use case, or maybe an audio file. And if it is in one of these raw formats, you need to transform it into the one you need, maybe a report or a dashboard or a chart.
That's not intuitive all the time, mainly because data sources can be pretty diverse, especially raw data, and there's no single way to transform them into the format you need. Audio files, for instance, are very different from sales transaction data. And you may need to combine all of this in order to make your case. This is where the help of a business intelligence or data engineering team comes in. But what if you don't have such a team? What do you do? Let's look at a few tricks or tactics that can help you address those gaps. In a typical medium or large enterprise, you're likely to have data in a data warehouse. That means your data sits in a database, and you can write what are called SQL queries to extract the data you need from it and present it in the needed form. The problem is, SQL is just going to give you a simple table, which isn't a presentable form. Tableau or some other graphical analysis tool is extremely useful in such situations, because not only can you pull data from a data warehouse, you can also present it in a very visual manner using graphs and charts. This is important because such visual dashboards are far more convincing and powerful in making your case and in supporting your decision to leadership. What if you don't have data in a data warehouse — say it's in a log file, in an unprocessed form? What do you do? First and foremost, see if you can handle it with Excel. It's one of the simplest tools, easy to pick up, and it has a fair bit of flexibility in handling unstructured data. More often than not, if you have any form of tabular data, you can import it into Excel and process it there. Additionally, Excel is pretty extensible; it has some powerful BI tools that allow you to do additional analysis. So if you're comfortable with Excel, this is a path you should look at.
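As a rough sketch of the warehouse-query step — using SQLite here as a stand-in for a real data warehouse, with a made-up table and columns — a query like this is often all you need before handing the result to a charting tool:

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("us-east", 120.0), ("us-east", 80.0), ("eu-west", 60.0)])

# The kind of SQL a PM might write: aggregate revenue by region.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total DESC"
).fetchall()
for region, total in rows:
    print(f"{region}: {total}")
```

The output is exactly the "simple table" mentioned above — useful, but it's a tool like Tableau that turns it into something presentable.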
On the other hand, if you have a bit more programming experience, Python is certainly a better way to process raw data. Even though there's a learning curve, once you pick it up, it gives you a lot of power in how you process data. It is very versatile, and it builds upon libraries and analyses that have been done by other experts. You can also use tools like R, but really, unless you're a data scientist, it's probably overkill for what you need to do. One thing to note is that all four of these tactics are repeatable, and that is key, because more often than not, you will be asked to redo the analysis that led to the original decision, and you may need to do it multiple times. Now let's look at an example of how what we learned applies. One of the tasks I was working on was tracking the availability of a specific product across different regions, and the goal was to provide a roadmap to leadership on what to do next — where to expand the product. Typically, in this case, I would have just looked at a dashboard maintained by our BI team. This time, however, there wasn't a dashboard; there wasn't even a data warehouse where I could pull this data reliably. There was no hypothesis in this particular case — it was just a data-gathering exercise — but we had to do it in a very quick and repeatable manner. So what I did was a bit of scripting with Python to check whether the product was launched in a given region, one region at a time. Then I imported the results from the Python script into Excel to present them in a more human-friendly, graphical manner. And the key requirement was that it was repeatable, because we had to track this for every single region and present it to leadership.
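A sketch of that Python-then-Excel workflow might look like the following. The availability check here is a hypothetical stub (in practice it would call the product's API), and the region names are just examples:

```python
import csv
import io

# Hypothetical stub: in practice this would query the product's API per region.
LAUNCHED = {"us-east-1", "eu-west-1"}

def is_available(region: str) -> bool:
    return region in LAUNCHED

regions = ["us-east-1", "us-west-2", "eu-west-1", "ap-south-1"]

# Write a CSV that Excel can import and chart for a leadership readout.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["region", "available"])
for r in regions:
    writer.writerow([r, "yes" if is_available(r) else "no"])
print(buf.getvalue())
```

Because the script is just rerun and re-imported, the whole exercise stays repeatable, which was the key requirement.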
The key takeaway from this is that you don't need to know every skill, but it's important to be resourceful in gathering the data and quick in processing it, so that you're not stuck on any specific aspect. So the key takeaways from that exercise were: be resourceful and inventive with processing data so that you're not hung up on very specific pathways or tools, ask for help if needed, and be persistent in following through to the result. When I began that exercise, I didn't even know what the end result would be, but by pursuing it to the end and making sure I had a repeatable result to present, it was a successful venture. So we've now reached the end of the presentation. We've learned how to be successfully data-driven and versatile with data. The key things to note are: you need to know your end goal and the decision or hypothesis you're trying to make; you have to learn to be flexible with where you get your data from and be resourceful and inventive in handling it; and finally, make sure the process you use for making these decisions is repeatable, because you never know when you'll be asked to do it again. Thank you.