Hey, everybody, welcome. As the talk title says, this is going to be about how to build a data-driven culture. It's a lot of material and not a lot of time, so I'm going to go pretty quickly. And I hope to leave enough time for questions so we can dig into the parts that people do care about. Just to give you a little bit of history on where this is coming from, all of these slides are available right now on datapractices.org. We have a series of open courseware that's up there, and this is one of those things that we're drawing from. About a year ago, November now, we took a whole bunch of folks that we felt were kind of leaders or visionaries across the data ecosystem, whether that meant the academic world of semantics, data viz, data journalism, open source, and we kind of threw them all in a room and shook it just to see what would happen. And we noticed that a lot of the conversations that were happening were very similar to what we saw 10, 15, or more years ago in the software development ecosystem. And we decided that, hey, let's skip that whole waterfall development awfulness that we had in software development, and let's go straight to Agile for data. And so we wanted to come up with some values and principles that we felt best enumerated what was important about modern data teamwork as we saw it. And so that was kind of the first start. And from there, that community has grown and also helped to build all of the open courseware you can see that was relaunched in March on datapractices.org. We have four core values and 12 principles, had almost 40 authors, and we have a ton of signatories already, so definitely some wonderful momentum that we really appreciate. And I'll skip over some of this, but just to give you an idea of some of the people involved, if you look on the left here, those are some of the folks that were in the room.
There was a really good brain trust, and it was really nice to hear some of their frustrations with the data ecosystem, especially as it's taking off in the realm of the buzzword. But a lot of times it's not keeping up in terms of the actual ability to deliver. There's this really big gap between what people want and what people are able to get from their workforce. And a lot of times it's just miscommunication. So we're hoping that by providing some of this courseware and continuing to grow that community, we can kind of elevate the level of data literacy. So here, taking a look at some of the things that we're gonna cover, mostly I'm interested in framing this problem, and then looking at some of the things that are good indicators, or perhaps other things that are good places where you can start. What's the low-hanging fruit if I am part of an organization, whether that's a nonprofit, a governmental organization, or the company that I work for, my day job? How can we make data a first-class citizen? Now, I will say that these blue slides, this is designed as an hour and a half workshop for people to actually dive into, and I've got like 15 minutes to go through it. So we're gonna skip the exercise portion of this part, but I did want to let you know that it is there. All of these slides, as I said, are up on datapractices.org, and if you hit that S key when you're looking at the first slide, all of the speaker notes and everything that you need to dig into this yourself, if you want to do something like that, is all available to you. So, framing the problem. I wanted to look at a few different tangible examples of how people are framing the problems in the data ecosystem.
I'm gonna look at two books and then just my experience at data.world. I felt there were some really valuable lessons that we've learned over the last few years that I wanted to share, that I was intimately involved with. So, first I'll start with the definition of what data-driven means. You can see kind of the dictionary sort of definition, but if there's only one thing that you walk away from this talk with, I want it to be that being data-driven is about the people. It's not about the technology. I mean, if you look at the research that's being done out in the ecosystem right now, whether it's Gartner or Forrester or academic researchers or whatever, they'll tell you that the Chief Data Officer is the new kid on the block in the C-suite, right? A lot of times they are underutilized, under-resourced and unappreciated. The research is saying that 50 to 60% of those Chief Data Officers are going to fail in their mission in the next three to five years. And the reason for that, that we've seen anecdotally, is that the ones that are doing well are approaching this as a people problem, an education problem, a culture problem. And the ones that are not getting the results that they're really looking for are treating this as a technology problem. What can I install? What are the tools that I need? If I buy Tableau, that's the answer, the silver bullet. There is no silver bullet. Let's get that right at the beginning here. So if there's no other takeaway from this, it's a people problem. So there are a couple of books here, and the first one is Play Bigger. This is not about data. The reason I selected it is that there are some really good lessons in it about building practice. Play Bigger is all about category design. It's about creating a new business, but creating it in a way that there's a new category.
I am creating a category that's called data cataloging and I'm going to tell everybody that's the answer to everything and this is how you grow and explode and dominate a market by creating the market to dominate. So the interesting portions of this book and I won't dive in too deep just because we don't have enough time but I like that it starts asking questions about what are the problems you need to solve? So looking at are we data driven? If we're not, what can we do to solve that problem? Do we have underutilized data? So are we getting poor ROI? Do we have disconnected data? Have we spent the last 10 years aggregating technologies that now have stuff scattered to the four corners of the universe? And then taking those problems and looking at kind of what are the key components? So is my data hard to understand? Is it hard to access? Is it like looking at what are the things that lead to your frustrations? And then at the end of the day kind of teaching you, this book tells you how to kind of define the villain and it's an interesting way to kind of personify the thing that you're trying to defeat, the thing that you are doing battle against, whatever it is, right? Whether it's data silos, whether it's dumb data, whether it's self-serve analytics that no one knows how to self-serve, it's looking at ways to kind of do some of these things yourself. And then the second book I would recommend, highly recommend for anyone in the data space is called Winning with Data. I think this is a fantastic book and especially as it relates to this topic, I mean even the first chapter is titled, Mad Men to Math Men, The Power of the Data Driven Culture. So it's really neat because they kind of, they reframe a lot of the problems that you see in the data ecosystem in a very unique way and a way that's really easy to kind of get your mind around. 
It's a lot less, I know it's a lot less of that ephemeral well, you know, data silos are this or you know, some of those different things that, yeah, they exist, but they're talking about kind of four key problem areas. Talking about what they call data breadlines, data obscurity, data fragmentation and data brawls. And so going through some of those things, I like the data breadlines. This one is the one that I seem to see most often when I go into an organization. This is the idea that small data service teams exist within the organization but they have to service the data requests for everyone. And this leads to that data breadline. You can imagine the, you know, great depression breadlines where there's a round the corner and everybody's waiting with a soup bowl. It's the same thing with people waiting for their data, right, and a lot of times you may have been in line and you may have been ready, you may have had all your ducks in a row, but some executive comes along and says, I need this right now and they jump the line and now you're waiting even longer. So, you know, it's this idea that we have data experts and they will answer our questions. They are the black magic box and that becomes a problem, right? Because then everyone else is beholden to their time schedule, their workloads, et cetera. And so, you know, that's obviously a huge problem. The second one that they were framing was data obscurity. So it's similar but different. It's talking about how there are very few people within an organization that know where all the data lives, where all the bodies are buried, right? You've got maybe a couple of data engineers that know, okay, I know I have to pull these different things from these different pipelines for different people, but if you ask any one of the subject matter experts or stakeholders, they have no idea what exists beyond their own very small niche view of their data universe. And there's data fragmentation. 
So this is the shadow IT problem, right? This is where people got frustrated with data obscurity or data breadlines and they've decided, well, the heck with that, I'm gonna stand up an Access database on a machine that's stashed under my desk and then I'm gonna use that to make millions of dollars worth of business decisions. And no one else knows where that data came from, if it's clean, if it's accurate. Or better yet, the hit-by-a-bus problem: that person goes away. Now, how are those decisions being made? We have no idea. And then the idea of data brawls. I like this one because if you take some of the problems and silos and things like that, that are built within an organization, you look at, okay, two people have decided that they're going to reference our Q4 sales from last year. Now, they both came up with different numbers. They may be similar, they may be close, but they both came up with different numbers. Now, we spend more time arguing about where did this data come from? How was it generated? What kinds of transforms were done? And so now we're brawling about the data. We're not actually answering the question that we came there to address. And so those four things, I think, really do a good job of describing the problems that you see in an organization as it relates to data. Now, as I said, I wanted to bring in my own data.world POV. So early on, we wrote kind of an internal position paper about the data ecosystem. And we had a bunch of main points that we looked at. We looked at community and governance and kind of the tool fatigue problem and transformation through data and all these other things. But I wanted to read one excerpt about data just to kind of kick it off here. So we believe that data can transform the world, causing an unprecedented boom in efficiency and effectiveness. Data can make the world smaller by connecting people to problems and solutions. Data can radically change company cultures too.
As William Gibson, speculative science fiction author, said, the future is already here, it's just not very evenly distributed. The future is already here at a handful of these companies that have transformed their industries. Airbnb, Warby Parker, Facebook, Google, Amazon were literally built from the ground up to take advantage of data. At Airbnb, their fifth employee was a data scientist. Today, they have their own data catalog, and 45% of their 3,100 employees regularly engage with it. At Warby Parker, they have a similar data catalog, but in this case, 80% of their employees are regularly engaged with it. Traditional Fortune 500 and Global 2000 companies are lucky if even 1% of their employees are doing the same, or if they have a democratized data catalog in the first place. Most don't even measure this percentage. Ask them. These traditional companies weren't built from the ground up around data. They were built from the ground up around atoms, not bits, which is one of my favorite quotes from an investor named Mike Maples Jr. He's very quotable, you can look him up. The new order of the world is that bits will control atoms. This creates many exciting opportunities but is very scary for traditional companies. In the 2018 NewVantage Partners big data executive survey, traditional companies' fear of a highly agile data-driven competitor has jumped to 79.4%, and this is a significant jump from like 47% in 2017. So that was kind of our look at the data ecosystem, and understanding that there is a large gulf between the haves and the have-nots, between the companies that have incorporated data into their DNA and those that are struggling just to understand how they can even recognize that they have data that's important. So I wanna look at the pillars of a data-driven company. What are the important things that make up a good data-driven company? And so I think the first is data infrastructure.
Now this is a tough one, because remember I said that this is not a tool problem, but the problem involves tools. We've spent the last 10 years or more really advancing some of these data technologies, whether they're data lakes, data warehouses, relational databases, all the way down to your spreadsheets and flat files. Like we use all of these tools on a daily basis, we just don't use them together and we don't use them well. So the idea here is looking at these tools and trying to find a way that, or identifying what data infrastructure looks like from a this is disparate, this is siloed, this is not working together well versus how do we put that together? And I think the questions that you have to ask yourselves are around data access, data context, and just making a single source of truth. And so when you look at these things, data access, this is important because historically you have had these, I'll call them the data elite, right? These are the folks that know how to write a SQL query. These are the folks that are your BI analysts and all the way on down the data science chain. And we are doing ourselves a disservice by relying so heavily on them. We need to start bringing in more of our subject matter experts. Yeah, I can be the best BI analyst in the world and I can crunch all the numbers I want, but if the numbers are off, I'm not gonna know it. I'm just gonna know that my math is right, but the subject matter expert can come in and say, yeah, Microsoft, we analyzed our gender makeup in terms of our employees. I know it's bad, but I can tell you that we are not 98% male. So it's that context that's really important. It's the ability for you to bring more people into that conversation. And so I think that's one of the first things that you can identify is, is there this gulf? Do we have the data people in a black, dark room somewhere and it's kind of this magic in and out? Or are more people involved in this conversation? Then I talk about context. 
This is your data dictionaries. This is understanding. If anyone has ever worked with data from the federal government, you'll know that you get a data dump from them, whether it's a spreadsheet or access to a database or whatever it is. And there are these columns, FS 37 2019 AG. Like, what is this? I have no context, I have no understanding. So a lot of times that data just gets dropped or ignored because no one has an understanding of it. And so it's about bringing the context with you wherever you go and trying to centralize that context in a single place, so that the water cooler discussions, the tribal knowledge that's gathered over time, these things can kind of be centralized, and they're just as important as the data itself. So I think that's kind of the other question. And then like I said, bringing this all together into a single source of truth. This is the data catalog movement, which is something that's big in the industry right now as people are looking at all of these disparate data resources and how do we bring them into a single place? That's what we like to play with, but at the same time, there are a lot of catalog vendors out there. Don't take my word for it. Moving on to data governance. Now, I will say this, that data governance isn't a dirty word. Or a dirty term, I guess I should change that. A lot of people think about data governance as the red tape that gets in the way between me and my data. And really, if it's done well, it shouldn't be that. We should have the ability for data governance to maintain things like integrity and security, but also promote things like availability and usability in a way that is coherent across the organization. So it should establish a good framework on how to use your data, not get in the way of actually using it. And then data literacy. I think this is important. Again, this goes back to getting more people involved in the conversation.
I don't have to even be able to spell SQL, but I should at least understand there are ways that I can interact with my data and my data team and other things like that without having to be super technical. And we should provide avenues and resources to those in our organization. And you look at these big orgs like Airbnb or Warby Parker, they're great examples of this: my receptionist knows my Q4 sales numbers, because she can then tell people, hey, this model is selling better than that model, as she talks to people and interviews people. Or our VP of marketing can talk about the sales numbers in the same way and say, look, now I can go out and tell the market this. One person is talking to an individual and the other person is talking to the market, but it's important for you to be able to understand the underlying data and how it relates to the business. So, data-driven leadership. I don't wanna dive too deep into this one, but a lot of times we will go into an organization and talk, and we're preaching to the choir when it comes to the practitioners. They get it. They're like, yeah, I know, we have data silos, it's a problem, these edicts have been handed down from on high. And it's mostly a lack of understanding, but there are a lot of things that go into it. And so one of the things that you can do when you go in and you want to establish or build your ability to be data-driven is find the places where, when you ask questions, someone answers with the phrase, that's the way we've always done it. Because that is the very first place that you should change something. If that's the way you've always done it, it's probably a good indication that that's the way you shouldn't do it any longer. The world is changing and you need to evolve with it. Unfortunately, change doesn't happen in a vacuum. It needs to be from the ground up and the top down.
So I think there are a lot of things that go into this, but one of the dangers that we always caution our executive friends about is expert syndrome. You've got an executive who goes to a conference and listens to someone like me talk and suddenly decides that they have inferred the answers to all of the universe, and that expert syndrome is really tough. So you gotta stay away from that. So we talk about some data decision-making, and I won't go too far into this, but I think the biggest thing to walk away with is just knowing that there are questions that every person in your org can answer, and enabling them to do so is important. And don't just go build another dashboard. Go ask specific, purpose-built questions and find a way to answer them, because most likely you're gonna learn something along the way and you're gonna have a better answer when you come out the other side. Treating data like an asset. Historically, most people have treated data like a cost center. This is a thing that we have to pay to store, but they don't think about what this means. At the very best, the most forward thinkers for years just treated data as a way to squeeze a little bit better efficiency out of some process. How to operationalize that asset? Yeah, okay, we've kind of gotten there, but sometimes you can turn your data into an additional revenue stream. Or even better yet, there's a book, the author's name is Doug Laney and the name of the book is escaping me, but he talks about ways that you can use data to barter as well, and I think there's some really awesome potential in the market for that going on. I'll skip past some of this stuff. Breaking down silos. I think we all know the restricted data silo. This is the, I can't use that data because it belongs to a different business unit. But looking at dark data and unused data, depending on whose study you look at, up to 80% of data gathered by an organization goes completely unused.
So I think it's just looking for what are we gathering? Do we need to gather it? If we're gathering it we should use it and then dark data, the stuff that lives on people's laptops and underneath their desktop so take a look at those and then just ask questions and I won't go into that, it's pretty self-explanatory. So I guess the last thing I wanna make a point here I'm running low on time is just building your culture through practice. So you want data access? More collaboration, we've talked about that. Data dictionary, better definitions of your data, better understanding, human readable that's the most important part. Data literacy, a workforce of people who use your data and data decisions. So we're actually using data and not our visceral engagement with a problem to answer business decisions. So it's the idea of let data lead you there, be data informed. Sometimes I like to say data driven is not necessarily the answer because you don't want the data to be doing the driving, you want to be informed about it and with it but definitely should be using data and not your gut. So I guess I will stop there and see if there are any questions but that is an hour and a half workshop in 24 minutes. So...