Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining the latest installment of the monthly webinar series, Data Architecture Strategies with Donna Burbank. Today, Donna will be joined by Kelly Stirman to discuss self-service reporting and data prep: benefits and risks. Today's webinar is sponsored by Dremio. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. We very much encourage you to chat with us and with each other throughout the webinar; to do so, just click the chat icon in the bottom middle of your screen to activate that feature. For questions, we'll be collecting them via the Q&A section in the bottom right-hand corner of your screen. Or, if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag DAstrategies. As always, we will send a follow-up email within two business days containing links to the recording of the session and any additional information requested throughout the webinar. Now, let me introduce to you the speaker for the series, Donna Burbank. She is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She is currently the Managing Director of Global Data Strategies Limited, where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa, and speaks regularly at industry conferences. Joining Donna today is Kelly Stirman, the CMO and VP of Strategy at Dremio, a data-as-a-service platform company. Over the last 14 years, Kelly has served in multiple senior roles at companies such as MongoDB, Hadapt, and MarkLogic. And with that, let me give the floor to Donna to get today's webinar started. Hello and welcome. Thank you.
It's always a pleasure to join these and see familiar names on the attendee list, so thanks to those who join fairly regularly. This particular webinar, and we'll talk more about it, will be particularly interactive. One of the great things about the DATAVERSITY crowd is that you always ask great questions, and because this is more of a panel style, we'll be opening it up to you for Q&A. For those of you who like to take your Q&A and discussions out on Twitter, there is a hashtag for this event, DAstrategies, and we'll be looking at that as well. And I'm on Twitter at Donna Burbank, if you want to DM me or anything. I will not get to it until after the webinar, because I'm terrible at multitasking. But with this format, it'll be more interactive. The other question that comes up: as you know, this is a series where we talk about all various and sundry topics that relate to data architecture. This is the last one this year, but if you've missed any, they are all on demand. That's often a question we get, but DATAVERSITY keeps all of these in the archives, forever as far as I know, so you can catch any of the other topics that you may have missed earlier in the year. For next year, we already have a good lineup of topics, and we hope you can join us as well. In January, and this has been a bit of a tradition for us, we'll be talking about the new trends to be thinking of in the coming year: what's the next big thing, and what you should be paying attention to. So today, which is why you joined, the topic is self-service reporting and data preparation. We list those two together because we are all data professionals on this call, and we realize that reporting doesn't come without good prep, right? This is a hot topic. Most of the customers I work with are either doing some sort of self-service data prep or asking whether it's right for them, and how to get there.
It isn't all about tools and technology. That's a big part of it, and that's part of the driver. But the other big part is: how do you have the right architecture supporting those tools? And how do you get governance across different roles and different stakeholders? So before we jump in, we have a quick poll question for you, just to get a feel for the audience today. The poll is: are you currently implementing self-service? Yes, you are: it's running, it's in place. The middle option is yes, but we're just beginning to investigate or we just started, so I can't really say a full yes. Or no, we don't have one in place. So those are your three options. I will give the floor to Shannon to open up the official poll. So over on your right, you should see the question of whether you do or do not have a self-service strategy. After you hit submit, it'll take a little time for the results to come in, and then we will share with you what everybody else has been saying as well. So hopefully you will be curious. And if you don't participate, you will not be part of the voice of the crowd, so I encourage you to hit the little button, whether you're doing self-service or not. It looks like the poll has been submitted, and it is going through the staging area and into a star schema to be reported shortly, so we'll get the results. There's always some latency. That's why Shannon tells me not to do these, but I never listen, so there's always a bit of a lag before we get them. All right, Shannon, should we be seeing the results, or am I missing something? Right, well, I do not see the results of our own poll. We will skip this awkward moment and go on to the discussion, and if we can get the results later, we will. All right, so maybe that'll be our little excitement at the end. What are some of the drivers towards self-service data prep?
Right, and it shouldn't be a surprise that a lot of it is just shorter time to insight, right? There's so much information we want to get, and so many different types of information. Everybody wants to be a data-driven company, right? That is the goal. But do we have enough people to do that? We want to become data-driven, we want to react in real time to business conditions, and we want operational efficiencies, right? So I think this is not a surprise. That is the driver: we want more analytics, we want more reporting. Traditionally, the sources have been the traditional ones, right? When we do a lot of the surveys, things are still in relational data, like the warehouse, but also spreadsheets, unfortunately, which are one of the more popular "databases" out there. I have to put that in air quotes, right? But that makes it a little easier. And in the past, even though things were in the relational world, it's not that there wasn't a place for this. But typically a business person who needed a query would sort of throw it over the wall. They would send it over to the data warehouse team, which was already very busy because they were trying to report up to management. And there was this lag. That wasn't really fun, or isn't fun if you're still doing some of this, for either side, because there just aren't enough people on the planet on the data warehouse side, and the way people think is more exploratory, right? I have an idea now. I'm going to build a marketing campaign, and I want to see what the results are by region. I can't wait a few weeks for the data warehouse team to get back to me. The data warehouse team is busy with some very important reports and can't respond to every version of slicing and dicing. So there has to be some sort of self-service capability, and obviously there's some prep involved. Is there one data warehouse that everybody sends information to?
Or are there marts? Or are there more lakes or hubs or whatever you want to call them? So this is the problem we're talking about. Not that either side of this goes away, right? You need both. But how do we break down that wall and have it work correctly? What compounds the problem, or the opportunity as I like to think of it, and makes our reporting environment more complex, is that not everything lives in a nice, clean relational database in a warehouse. If that were the case, I think self-service would be a bit easier. If you look here, although relational databases are still the leader, whether on-prem or in the cloud, and I can't get rid of those spreadsheets, right, you'll see the breadth of the different platforms we have. This was a survey we did with DATAVERSITY earlier, I guess in the last year. When we asked about the future, you'll see that those bars are even more dispersed. Clearly relational databases stay, with more of them in the cloud, but people are looking at a lot of different platforms: real time, Internet of Things, big data platforms, streaming. And a lot of people don't know, because things are changing so drastically. So I think it's not only the volume and the timing of getting those reports; there are just a lot of different platforms. So, one of the discussions we have, and I'm going to open it up to Kelly, our guest speaker, in a minute: for folks who want to make active use of the chat, please do, because we'll open it up and take your questions throughout. We won't do the typical questions at the end; we're going to open it up throughout so it's more collaborative. So Kelly, you've been in this business a long time, with quite a pedigree. I want to get your thoughts on this expansion of the data sources beyond relational.
Is that complicating self-service reporting? Is it workable? Is it exciting? And how does someone tool up in this new world? What skills are needed? What are your thoughts here? It's a good question. To be clear, I think that overwhelmingly, if you look at the data that is of interest and valuable to companies, it is still dominated by relational databases. But newer strategic applications, where there may be interesting new types of data that are relatively small compared to decades of accumulated data in relational systems, tend to be in non-relational sources. That might be a data lake; it might be something like S3 on AWS or ADLS on Azure, which are much closer to a file system than a relational database. And the underlying data tends to be modeled and structured in something called JSON. This does complicate self-service, because those data structures are fundamentally incompatible with the sorts of tools that people are already using today. If you take your favorite BI tool and try to point it at these new sources of data, you're stuck, because you can't simply point an ODBC driver at those sources and go to work the way you're used to with something like SQL Server or Oracle. So I think that does pose some interesting challenges. What you end up doing is going backwards in time, to the point when you didn't have self-service and you asked, hey, can you get this data ready for me? Or can you build a report for me? Because I can't really do it myself with the tools I'm used to using. So is there hope? Does everyone now need to become what Harvard Business Review called, and I love to say it, the sexiest job of the 21st century, the data scientist? Do you think it's feasible for business people to learn JSON?
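To make concrete the mismatch Kelly describes between nested JSON in a data lake and tools that expect rows and columns, here is a minimal Python sketch using only the standard library. The two records and their field names are hypothetical, and this is one simple way to flatten JSON into a table shape, not how Dremio or any particular product does it.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested JSON objects into one flat dict with dotted column names.

    Lists are serialized back to JSON text, because a flat table has no
    native place for repeated values (a fuller pipeline might explode them
    into separate rows instead).
    """
    row = {}
    for key, value in record.items():
        column = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            row.update(flatten(value, column))  # recurse into nested objects
        elif isinstance(value, list):
            row[column] = json.dumps(value)
        else:
            row[column] = value
    return row


# Hypothetical records, e.g. lines of a JSON file sitting in S3 or ADLS.
lines = [
    '{"id": 7, "customer": {"name": "John", "address": {"state": "NY"}}, "tags": ["web", "promo"]}',
    '{"id": 8, "customer": {"name": "Ana"}}',
]
rows = [flatten(json.loads(line)) for line in lines]

# Collect the union of columns so every row fits one table shape;
# missing fields become None, the way a SQL NULL would.
columns = sorted({col for row in rows for col in row})
table = [[row.get(col) for col in columns] for row in rows]
print(columns)
for r in table:
    print(r)
```

The second record has no address or tags, which is typical of JSON sources: the schema varies from record to record, and something has to reconcile it before an ODBC-style tool can query it.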
Or do you think that's the place for a partnership between IT and more business-focused people? Well, I don't think it's feasible for every BI user to become a data scientist. Some of them can and have made that transition, but I don't think it's necessary either. At Dremio, we see this as an opportunity for a new class of technology that we call data as a service. In the same way that software engineers love AWS because, instead of waiting on IT to rack and stack servers, they can click a button, just like shopping on Amazon, and get infrastructure on demand, we think there's a similar opportunity for data. For people who are used to using self-service reporting, visualization, and data prep tools, we think there's an opportunity for a similar kind of experience in getting to the data itself. That is what our product is focused on: making it so you don't have to be dependent on IT. You can go into a data-as-a-service platform, do a Google-style search to find a data set, inspect it visually, and then click a button to launch your favorite tool and go to work. It doesn't matter if the data is in JSON, an Excel spreadsheet, a relational database, or some other format. You don't have to worry about that, because the data you need is at your fingertips on demand, and Dremio takes care of the heavy lifting in the background to make that possible. Yeah. Another point you brought up is a good one, because when we talk about data, we so often think of the database, or the JSON files, or the XML that's being passed across. But the other big excitement in the industry is cloud technology and its scalability. Again, this is from the DATAVERSITY survey we did last year.
You'll see that an overwhelming number of people are starting to implement a cloud strategy or planning to, because, as you mentioned, it's so easy to spin up AWS and some of the other platforms. It can help with scalability. It can help with costs. It's also, obviously, not without its risks. I think some people are rightly concerned about things like security and privacy, but skills are another big slice of that pie. I'm glad you brought that up, and I'd like your thoughts on how that helps self-service. We're so used to dividing up roles: IT does the provisioning and gets the platform set up, and then we in BI, or self-service BI, run the queries on the data. Do you think even that's changing, and that now a more traditional business person could just spin up their own servers as well? I think the way that you, and attendees, should think about this is that, increasingly, the experience you will have as an employee is that instead of double-clicking an icon on your desktop to launch a desktop tool for these tasks, you will log in through your browser to a cloud-based version of the same software. You no longer have to worry about upgrades on your desktop, or about some incompatibility between the tool and the way your particular desktop is configured. Instead, it will be like using any other browser-based service, where somebody else in the background takes care of all of that complexity, and you get access to these services in a very simple, clean way.
So instead of thinking, oh, I'm going to go to AWS or Azure and spin up infrastructure, you will log into a data prep capability in the cloud, or a BI capability in the cloud, and have everything you're used to having, but managed in the background for you seamlessly. The real opportunity for companies as they transition to the cloud, I think, is that employees don't have to know about it. What you're really doing is taking away the infrastructure layer of complexity. In the same sense that today, as an employee, you don't really think about servers running in a data center, or hopefully you don't have to, one day when you deploy some kind of workload it will just happen to be running in the cloud instead of in your company's data center. Yeah, I think that's an interesting distinction. We get set in our own patterns, but really it's a whole ecosystem, not "I'm doing the platform, and then I'm moving it across to the BI layer on top." It's an interesting perspective. Our audience has been a bit quieter than normal, so I want to throw that out to the attendees. Is anybody here successfully implementing what we've described: using the cloud for more self-service, with business people doing all of the above that we've mentioned, or with IT doing some of the provisioning and the business doing more of the reporting? I'll give a little pause there. And again, if you haven't found the chat button, that's a great way to type in your response.
I've found with some of my own clients, and I find it myself, that a lot of us who have been in the business for so long fall into our old patterns and think, oh, I've got to spin up the server, or I have to do some of these old-fashioned things that we may not have to do anymore. So it's a different way of looking at it. I know there are also some risks in thinking of some of these web platforms as if they were a regular desktop server. I had one client that was treating it like SQL Server, spinning up all of these cloud platforms, and forgot that it was subscription-based, and they had a ridiculous bill, with a lot of zeros, at the end of the day. So there are a lot of different paradigms to be thinking of. And I think we had an issue with WebEx; I'm wondering if we're all still on. Well, I'm still here and I can hear you loud and clear, if that helps. Great. Do we still have Shannon? Can you see the slides? I can see the slides. All right. It is one of those days. All right, great. So I'm going to move on with the discussion. Sorry for the technical difficulties we're having today. I see one question in the chat, if you want to take it. Okay, great. Yeah, if you want to take that, that would be great. Yeah. So the question is: how do you sell the cloud to highly security-conscious organizations? Do you want to take a first shot at that one? Yeah. I talk to tier one banks, government agencies, Fortune 10 manufacturers, and healthcare providers. Expectations around security have made a significant turn, and I think now, by and large, companies believe that cloud providers are better at enabling security than individual organizations are on their own.
If you talk to Gartner, if you talk to Forrester, or to many industry analysts, that has become the prevailing wisdom. And look at just the frequency of data breaches: we just had one with SPG (Starwood Preferred Guest). My data is out there via SPG, in a breach affecting three or four hundred million people, and those happen with regular frequency. I think companies have become more comfortable relying on the cloud providers to implement security on their behalf more effectively than they can themselves. Yeah, I think a lot of folks are looking for that silver bullet, and having it in house isn't necessarily more secure if you're not securing it, right? That's a very good point you just brought up. You have locks on your car and your house, but if you don't lock them, somebody's going to come charging right in, and I think that's what happens in a lot of companies: people forget to lock the doors. Yeah, and another comment I see is similar to what I mentioned about the cost implications of cloud versus the data center. People noted that this is part of why IT spins up the platforms: they're in control of that budget. To me, that's similar to what we were just saying about security, in that there's not necessarily a right or wrong answer, but there are implications, right? Someone has the budget, so you don't want to do either of these willy-nilly. Did you have thoughts, Kelly, on the cost side of that, because it's come up a couple of times now? Yeah, it's interesting. I've made that mistake firsthand: I accidentally left a fleet of servers running over a weekend, only to discover a bill of about $50,000 on Monday morning, and I'm not exaggerating.
Fortunately, we were able to talk to the cloud provider, plead ignorance, and cut a deal. But they're not incented to automatically terminate instances. Now, though, there are lots of services that automatically turn things off if you forget and leave them on, or that issue alerts when you pass certain price thresholds. So all three of the cloud providers have much better offerings for helping you understand ongoing costs, but it is a risk. It's cheap to get started, but the costs can certainly add up. You ultimately pay for the flexibility and convenience that the cloud providers offer, and it's up to you as a customer to make sure the costs don't get out of control. Yeah, and I think we'll talk a lot more about governance, and that is a key part of governance: whether it's security or cost, somebody needs to take ownership. When we ask whether it's IT or the business, we have to be very clear on those roles. I think that's something we'll talk a lot more about. Another comment that came up is the idea of the semantic layer, and I'm glad it came up, because we're big fans of that and we will talk more about it as well. Data doesn't just magically query itself, right? So you need that semantic layer, with the business and technical definitions. I'm assuming you'll agree there, Kelly; I know that's near and dear to your heart. Oh, absolutely. How many times have you looked at a data set and the name of a column was C03, or something meaningless like that? Different teams have a different sense of what the data means and how they want to describe it, and then there's tracking ownership, who to call if you have a question, how often it's updated, how it's related to other data sets, and what reports are based on a particular data set.
There's just so much tribal knowledge floating around, and it's really not captured effectively anywhere in most companies. If it is captured somewhere, it tends to be in a particular tool, like one BI tool, where it can't be used by other teams who are using a different BI tool or other technologies. So I think the concept is incredibly important, and everyone ends up doing it somehow, just not as efficiently and universally as they would like to. Yeah, and there are a lot more comments around something similar: the roles involved, right? Just as we saw that the database platforms are so diverse, so are the people looking at the data: of course data architects doing architecture, but also DBAs, programmers, and business people. It's about getting the roles right and being collaborative where it makes sense, but I think we're also coming up with ideas of where, in some cases, we do need stricter ownership. Somebody's paying the bill, right? Who owns that? Who owns the architecture? Sorry, it's one of those days. Yeah, one of the tactics I'm seeing increasingly from companies is crowdsourcing the semantic layer. Instead of a big IT project that lasts for months and months, completes, and then leaves a static sense of what the data assets are and their semantic descriptions, companies provide capabilities through things like data catalogs that let teams describe what the data is, almost like a wiki, and keep it up to date as the business, the teams, and the people working with the data evolve. That makes it crowdsourced, instead of something that only IT can develop and maintain for business users.
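As a rough illustration of the catalog Kelly describes, here is a minimal Python sketch of one semantic-layer entry. The class, its fields, the owner name, and the team notes are all hypothetical and invented for this example; it is not any data catalog product's API. The idea it shows: a cryptic physical column such as C03 is mapped to a canonical business name with an owner, refresh frequency, and dependent reports, while individual teams can add their own descriptions, wiki-style, without overwriting the canonical one.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical semantic-layer entry describing one physical column."""
    physical_name: str   # cryptic name as stored in the source, e.g. "C03"
    business_name: str   # canonical, vetted label
    owner: str           # who to call with questions
    refresh: str         # how often the underlying data is updated
    reports: list = field(default_factory=list)     # reports built on this column
    team_notes: dict = field(default_factory=dict)  # crowdsourced, per-team notes

    def annotate(self, team, note):
        """Add or update one team's description without touching the canonical name."""
        self.team_notes[team] = note


catalog = {
    "C03": CatalogEntry("C03", "Customer State", owner="sales-data-steward",
                        refresh="daily", reports=["Regional Sales"]),
}

# Different teams describe the same column in their own terms, wiki-style.
catalog["C03"].annotate("marketing", "State used to target regional campaigns")
catalog["C03"].annotate("finance", "Billing state used for tax reporting")


def describe(column):
    """Return a one-line, human-readable summary of a catalog entry."""
    entry = catalog[column]
    return (f"{entry.physical_name} -> {entry.business_name} "
            f"(owner: {entry.owner}, refresh: {entry.refresh})")


print(describe("C03"))
for team, note in catalog["C03"].team_notes.items():
    print(f"  {team}: {note}")
```

Keeping the canonical name and the per-team notes in separate fields is one way to hold both the vetted definition and the crowdsourced context in a single place, rather than inside one particular BI tool.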
Yeah, I think that idea of crowdsourcing is so critical, and I wanted to get to the next question before we go to some of the other comments, because I think this is a new paradigm, right, this self-service user. It's a mix of the self-service data prep tools we've been talking so much about and this idea of crowdsourcing. But there's also a place for, well, it doesn't mean that data warehouses go away or that master data systems go away, right? Or glossaries, data models, and all of the semantic layer. I think there's a balance between this Wikipedia-style crowdsourcing and the more encyclopedia-style approach, right? In some cases, like financial data where we're reporting to the Street, things are locked down, and there are certain definitions that we do need. So where do you see that balance? And again, is this a place where you think tools help? Is it a governance model? All of the above? Kelly, what are your thoughts on that? In a way, I think all of the above. Certainly tools help, but a lot of it is really about the culture of the organization. Are people okay with the truth about the data being something that the business owns and maintains? Are they empowered to own and maintain it? Or is the culture such that no one wants to step up, and they want it to be something IT is responsible for? Companies vary in how they balance those responsibilities. Tools open the door to a possibility where ownership can rest in the hands of the business, but just because the tool is there doesn't mean it's going to be applied effectively in that organization. There has to be a cultural aspect to embracing that change, with strong empowerment from leadership to make that a reality.
Yeah, I'm sure this one will generate a lot of discussion; it already has. I'll throw out two of the comments. Sharisha mentioned earlier that IT provides a layer of abstraction to augment self-service BI, whether that's OLAP, a relational view, or a Power BI data set. From the tool perspective, that makes sense. But then Gail chimed in and said, isn't data owned by the business? I think they're both right, in a way. This is the idea of data stewardship or data custodians, and a lot of my practice is spent trying to get that balance right. You're right: I think the business owns the data definitions, but IT has typically had its fingers on the tools. I'm going to pause a bit, because I know people are having some discussions on this in chat. How are people getting that balance right? Is it the business people providing the business definitions? Is it a governance committee that works together, and then IT types them in? What's working in other people's companies? Oh, you guys are getting shy. I have seen in my practice, and this is where I think some of the tools have evolved a lot, that there is still a place for what I don't want to call the old-fashioned approach, the encyclopedia approach, where maybe a group vets certain definitions: this is how we define total sales, or customer, et cetera. But I think in the real world, as we're doing self-service data prep, there's this idea of collaboration. As I'm using a data set, I might find out that, oh, Europe is using a different definition for total sales, and both are right. I think you need both: some places where a committee locks things down, but also that discussion that comes through as well. Gail mentioned that she has a data steward team with a mixed level of engagement. Sometimes it takes a few examples to get people excited
about why they need metadata, right? And my favorite, insert company here, because it happens almost all the time: if you're just a lonely IT voice saying, guys, the data quality is bad, we need a definition, nobody listens. Then they get self-service, and they'll say, guys, the data quality is bad, we need a definition! So I think that's a way self-service has helped bring governance into play, because people are seeing the data firsthand, and that's a place where there is some overlap. Peter Lam mentioned that his organization is just beginning the journey of having the business own 100% of the definitions rather than IT. Yeah, I'd be curious how that journey is going, because I think that is the evolution, right? Data is that weird thing: it's an IT thing, but it's also used by the business. A lot more folks are chiming in. James mentioned that in his enterprise, the business owns and defines the data. Roger makes what is probably a good distinction: the business owns the glossary, while IT handles the data dictionary. Kelly, I'll pass that back to you. Do you see that as a good, definite split between the glossary and the dictionary, or with self-service is there an overlap there as well? Yeah, and I think it's worth explaining exactly what those terms mean, because glossary and dictionary sound like synonyms. I think what Roger means is that, for example, the valid values for a particular field, and this is a trivial example, say states in the United States, are controlled by IT, and Joe User can't go in and add a state to the list of states, right? That's a dictionary question. The glossary says that this column called ST is actually state, which means states in the United States of America, and that we use it in systems A, B, and C and in reports one, two, and three. I think
that's the distinction between the dictionary and the glossary, and I think that's a nice balance. Yeah, but I bet we'll get a lot of discussion on that, because I might slightly differ. For an industry that defines terms for a living, I think we're horrible at defining terms. To me, and maybe it's just a subtlety, the dictionary is more about my data structures: I have a table with columns A, B, and C, and this is the technical definition of how you query it. A glossary may have some of the same terms, like you mentioned with the state code, or maybe that's reference data, but also: what is a customer? Is a customer different from a prospect? How do we define total sales? There are business terms, and one of the first things I do when I go into an organization is ask what all these acronyms mean. What's HIPAA, if I'm in health care and don't know what HIPAA is, right? So a question I often get is: where is that fine line? Is the data dictionary more technical and the glossary only business terms, or is there an overlap? I think there's probably a continuum, especially as more of the data is exposed directly to the business. Any other thoughts on using data dictionaries or glossaries? Am I the only one who's passionate about that one? Now there's a first. Okay, so Gail chimed in that when they were talking about a prospect, sales only wanted the first name to be mandatory. Great, yeah, we have a lot of interest from John. John keeps raising his hand every week; he's interested in our products. It's hard if that's the only mandatory field. Yeah, exactly. But that's a balance, and that's where business and IT can work together. I worked with a large retail company that took that whole question to the data governance committee, because I can see the salesperson's
point of view: I have this person, they're a lead, they're interested, I don't want to bog them down with what's your name, what's your address, what's your propensity to buy, and all of that. But once they realized your very point, you can't do much with just "John." If we want to run a marketing campaign for John, that doesn't help. And I think as we start doing more of that collaborating, as business and IT start to work closely together, some of these walls that we showed in the beginning start to break down, because people start to understand the why: why data is being used, why do I have to get an email address from a customer? Well, because you want them to get a sales campaign. And I think often that's where some of the data quality starts to improve, making sure we capture everything. So Peter again was talking about some of the naming standards, and I think that is kind of what we were talking about: sometimes it's a struggle, because you want standards, but you also want some freedom of query. So I'm thinking of passing it back to you, Kelly. What do you think the balance is between how much should be locked down in terms of naming standards and data type standards, and how much should be opened up to the business? Well, I think there needs to be a canonical way to name and describe data that is effectively owned by IT, but the idea that everyone is going to use one way to describe their data just doesn't work in practice. If you look at marketing's relationship with a data set about customers versus sales' versus finance's, those three groups of people have different needs for the same data, different ways they think about it, and different ways that it might make sense for them to describe it. And so this idea of a crowdsourced way of describing, and maybe even preparing and manipulating, data for different sorts of work, I think that needs to be something that
companies embrace. They need the data itself to be central, one copy, vetted, with its integrity ensured, and IT is ultimately responsible for the data itself. But different people have different needs for the data, and a one-size-fits-all approach, I just don't see that working very well in practice. And if companies don't embrace that, what happens? Well, people copy the data into spreadsheets and start to do whatever they want with it, and then it leaves the governed environment that IT worked so hard to create in the first place, and you end up with teams reaching different conclusions about the same data. So I think there needs to be a way for the data to be in one copy that is centrally governed, but different people can use it and describe it in different ways. Yeah, and I think that's a great way of adding context to this picture here, where we had that idea of the encyclopedia and Wikipedia. The way I look at it too, and I think we agree, is that there are some things... and I have another slide, I'm famous for having a slide for everything, but almost: the more data is shared, that common subset of core, maybe master data, should be locked down. There should be a canonical way. Let's not argue and have everyone keep a different version of how we're storing first name. If a first name needs to be longer, we'll all agree, we'll make it longer, and make it work for everybody. I think there's a place for that core set, and that's true because that's where I see a lot of the spinning: are we spending all of our time cleaning up addresses when we could just have that right? But I think you're right, and when we get into this idea of self-service, take the question of what's the right definition of total sales: there may be the one we send to the street, there may be the one that finance uses internally, there may be... As long as you
put that context in, which is part of that Wikipedia-style dynamic approach. I think we've got metadata on read, I guess kind of like we have schema on read. As long as that core is defined, we're not spinning on it; there's a valid reason why it depends. And as long as we have the core... I have some clients that actually store the query used for a definition, to give that context: in this report, this is the calculation we're using, and there's a reason for it. I think that gets rid of all of the spinning. Yeah, we have a similar approach in Dremio. As people provision data sets for particular needs, the query that defines that data set, any applicable transformations, the description of the data set's purpose and who it's for, and the security controls that go along with it are all part of the context for the data set that's provisioned for a particular user. And we do that virtually, without making copies of the data, which is kind of a core tenet of the product. You don't necessarily need Dremio to do that, we just make it easier, but that idea that everything has context, and that you need to capture that context to preserve the chain of custody and the information about why that data made sense for that particular user at that time, I think that's incredibly important. I'm really glad you brought that up. Yeah, and I think a lot of light bulbs are coming on as we move into this self-service data prep; it is a changing world. Gail chimed in: does anyone else have difficulties with one part of IT eliciting business requirements without considering the requirements of other areas of the business? I think that's almost a symptom of what we've just discussed. If we have core shared data, it has to be a core shared decision, or we open it up and let individual groups define their own, as long as that's open and people can see it. And I
think with some of these tools you mentioned, Kelly, where IT doesn't have time to talk to everybody, sometimes you find some hidden secrets in an area. I had one customer who had a query, and someone down the hallway was using a very similar query, and they said, "I didn't know that was already being calculated." They were down the hall from each other and didn't know until they looked at this crowdsourced data catalog. So I think that's a way to get more voices into the conversation that you might not have heard before. Sometimes technology is a way to bring people together, which is sort of ironic: it's not replacing people, it's actually helping that conversation. Donna, I know that you had other slides. I don't know if you like what you're seeing on your end, but what we see is still a slide like the one with the polling questions from a while back; slide 14 is what we're seeing. Okay, we are having... I am now... well, now it's jumping again, so we should be on... Oh, it looks like I can move your slides. Would you like me to move your slides for you? What slide would you like to go to? I think we're both moving it, so it should be on 18. I'm not sure why we're having... Um, a little bit more from the audience, perhaps, on this idea of crowdsourcing metadata: are people trying that? Any curmudgeons on the call who think we're naive and that there need to be more rules and strictness? Or maybe someone tried this collaboration and thought it went too far the other way? Because I know a lot of this is a continuum, and we're still fleshing that out. Now, I had one example, and I think this is the minority and not the majority, but I think you want to pick what can be open-source defined. When there's a core decision about what we mean by customer and prospect, or whether Social Security numbers should be obfuscated, and we're talking about
PII or Social Security numbers, I think those shouldn't be defined there; that was one of those core decisions. And there were two team members who did not agree: one was in Asia and one was in the U.S. The person in the U.S. had her definition and she would change it, and then overnight the other person in Asia would override it. There had to be a better way to resolve that conflict, because they were never going to agree. And finally, that was the place where the old-fashioned, more human-touch steering committee came in and said, "Okay, we need to agree on this." That was the one way crowdsourcing went bad. Most often, though, I think that's why there's that balance between the encyclopedia and the Wikipedia approach. Any other thoughts on that, Kelly, or do you see that working well? Well, I see different things at different companies, and I guess from my perspective, the crowdsourcing is happening whether you want it to or not. The question is: are you providing a way to capture it, so people can collaborate, benefit from each other's work, and communicate more effectively, or are you letting these things happen on spreadsheets and email chains? And I think Peter just said, "The reason we're starting the crowdsourced glossary is because previously there was no single source of truth for the business to refer to besides the CDM. The CDM is a very large academic model that provided no use to the business, because only IT knew how to navigate the artifact." I've seen that so many times, and it's probably not up to date either, because things change, new data flows in, new use cases pop up, and people need a way to stay on top of that. I don't think it's going to happen from IT, because, let's face it, for most people the experience with IT is like going to the world's
most popular deli: you take a number and wait in line. For every hundred people who are what we call data consumers (users of BI tools, data science people, anyone who relies on data), there's on average, in most companies, one person in IT who is tasked with supporting their needs, and so we end up waiting in line a long time. So you have to take some of this work and put it back into the hands of the data consumer, so they can do their jobs effectively. I think this is a great example of something that most users can do very effectively, in their own best interest, and it's something they want to do, so let's get out of their way and let's do it. Yeah, no, that's good. I liked your comment, because I think there's a bit of truth there: when we used to lock it down, people worked around us anyway, right? You could say, "This is the definition," and people said, "Yeah," and did their own thing on a spreadsheet, right? So that's not ideal. I think there is a balance. I think it was Joe who mentioned being clear about what metadata is curated and what is not. I think that's key, back to that encyclopedia: this is the definition we go to the street with, this is the curated definition of what a customer means. For some of these core artifacts we're talking about, master data or a corporate data warehouse, think of Twitter's verified users: being clear on what is curated, what has been mastered, what has had all that hard work behind it, versus "this is how I wrote a query against it." I think that's kind of the best of both worlds, because the old way doesn't go away; from my point of view, it's broadened. Because I think you're right, you can't lock down everything, and I think that's the last thing IT wants to do as well. So I think
it's the idea of focusing on the piece that needs to be curated, opening up everything else, and then providing that. I think a data scientist would love curated customer data, right? They'll certainly use it. I will question you a bit on the CDM, or conceptual data model; I think you called my baby ugly. I think it wasn't a bad CDM, because I find a conceptual data model very different from a glossary: it has a different use and a different way of visually showing the information. I've often found it to be an excellent tool in addition to a glossary and a dictionary, a way to show the relationships in the data. For some of these arguments, is it a prospect or a customer or a lead, just showing a visual picture can be the right way to do it. So with any technology, we can say we had one that didn't work, but I think the CDM is a tried-and-true way of really showing things, and often it feeds your glossary, right? That's where the customer versus the prospect gets battled out, and then that clean definition can go into the glossary. I've sometimes had CDMs published next to the glossary. So I think it's an "and," not an "either/or." Great, well, I think that's been some great discussion. Um, oh, someone says their business people didn't know the value of the CDM until they saw it, and it told them things they didn't know. Yeah, sometimes spelling it out on a model shows why we're trying to say there's a difference between a product and a component, or a finished product and a raw material, some of these things that are core to the business. And I have found that's often a way we can get buy-in from the business, so they understand why the model didn't work, or why the analytics didn't go right when some of these core decisions weren't made. There's a good question here that we didn't talk about.
Oh, sure. It's from David, who says, I think, "It needs to be curated, by whom?" And by "it," I think he means anything: the glossary or the dictionary, whatever it is. If it's crowdsourced, that means you can potentially have garbage, right, that anybody puts in there. So the notion of vetting or curating the information is really important, and there's a role we've started to see emerging in companies that many folks call data stewards, or data curators in other companies, who are tasked with being the official vetter of this information. So what you see in some tools is the ability to mark tags or descriptions as official or vetted, or to color them differently, so you can say, okay, that's the authoritative view on this particular data set or this particular column; meanwhile, here's what other people have to say about it. Being able to distinguish between what is a curated or vetted representation of that metadata and what is generally crowdsourced and unvetted, I think both are really important, but you have a specific role associated with that, which generally sits in the business but is closer to IT and bridges to IT on behalf of the business users. Yeah, and I think you brought up a good point. We've talked so much about tools, but there's also that whole idea of governance and how we establish it with the right roles. And as different as every company is, that governance model works differently: is it a formal steering committee, or is it data stewards? I think they have a place living together. I often see a data steward, maybe a technical steward and a business steward, depending on the different roles. I think a steering committee is often a nice way to act as the arbitrator when there's a difference of opinion, and that offers the enterprise view. But I think you're right, and
often what I've found is that when we start to talk about governance, a lot of folks say, "Oh, it's an extra layer, it's an extra..." Well, doing it without governance, we see the mess, right? But often these data stewards probably exist already, right? In the old days it was, "Go to Joe, he knows." Well, do we give Joe the formal role, with some accountability and some voice? I've often seen people jump right into these roles, because in a way it's, "Thank you, I now have a place to share either the issues or the knowledge that I have." So yeah, I would assume other folks have some experience with data stewardship and governance committees, and I think getting that right is as hard as getting the tool right. Someone else mentioned the volume, right? Someone said, "We've got 9,000 attributes; we're not going to put all of that into the glossary." And I think that's a place where your CDM, or that kind of prioritization, can help. Let's pick that back up with your canonical model, Kelly: what are the core things we need to focus on, and which other things do we let go? Well, again, back to crowdsourcing: you let people decide what's important. Not all data sets are equally valuable or important to the business, and even within a particular data set, not all fields are essential in all cases. So letting people describe and annotate things helps you understand which things are valuable. Another thing we see, for example in Dremio, and this is true of other tools as well, is that you can start to get a sense of the popularity of data sets based on how frequently they're accessed, and in a sense get a heat map of data across all the different systems, down to the column level, to understand, hey, these are the data sets people keep using over and over again.
What if it's the wrong data set? What if you need to redirect people: "Hey, that's the outdated copy, you need to be using this newer copy"? Or other sorts of interesting patterns that are really hard to understand if you have 10 different BI tools and 100 different data sources: how do you know what people are querying and accessing? Some of these newer technologies give you a sense of that and let you quickly understand where there is ongoing interest; like I said, kind of a heat map of where things are going on. I think that's a great way of looking at that overlap we were talking about, of when things are locked down, that encyclopedia versus Wikipedia, I keep using that example. But it could be that in many cases some of that prioritization comes from the top down: the CEO has a customer-centricity mantra, we're changing from a product-centric to a customer-centric model, we want to focus on customer data, we need a customer master. That's all well and good, and we have a customer master. But from your great example, maybe we've published the customer master, nobody knows it's there, and everyone's going to the old version, right? It could be that people aren't using the standards, or it could be that in our customer master we've decided these are the 10 most valuable fields, and when we look at the usage, people say, "No, these other six are really valuable as well." So maybe that's the kind of feedback where, once we have the vetted master, we can listen to the voice of the people. I think it's a mix: some things are locked down, but we also have that open voice. You don't want chaos with everyone just chiming in, "This is customer data," right? But I think it's that feedback loop of which data is allowed to become... you know, some of the queries, and allowed that crowdsourcing view, and then for some data we don't get to choose whether it's
visible or not; there are laws around that one, right? Yeah. So I think that's great. Just quickly, because I have to answer it, since it's near and dear to my heart: someone asked about the conceptual data model and what level of detail it should have. I would say it's a great place to start; just don't over-detail it, for the person who had a bad experience. These are those core terms: what is a customer, how is that different from a prospect, how is that different from a patient or a member, and so on. I'd start small with a piece of the business where there's a pain point. Sales, marketing and finance can't agree on how we handle customers versus prospects? Get that hashed out, and then just keep it very simple. I often don't even show attributes, just boxes with definitions and the cardinality between them. That can be a great way to hash out some of these core issues, and it can also help identify which core data needs to be curated and what can be opened up for crowdsourcing. Great, we are getting close on time. I think this has been a great discussion, partly because there's no one answer, right? I think the beauty of the tools and the different ways of governing in this world is that it's an "and," not an "or." Things like master data are still super valuable, crowdsourcing and open source and self-service are a great way to evolve, and that idea of collaboration, getting the right roles and the right stewardship, is really key to that success. And to double-check, I'm on slide 20; is that what you're seeing as well? Yes, I think so. All right, I don't see a slide number on this one. Yeah, slide 20, okay. Just to remind folks: next year we'll be starting the data architecture strategies series again, beginning with what the next big things to think about are. Maybe it's self-service; I think there's a lot of
discussion there, right? What are some of these new tools that people can use to really make this doable? There is a white paper we mentioned on the transient data architecture, which is available if you want more details on any of those figures we showed. I am from Global Data Strategy; we do this for a living, if you need consulting help. And Dremio is our sponsor; Shannon will send out some more information on their tools, because I know there were some questions about that as well. So Shannon, I'm going to pass it over to you to wrap up, or to take any additional questions before we go. I don't see Shannon on, Donna. That is odd. I know, I know... I wasn't bored; maybe she was so bored she left. There's a Shannon on the call, but not our Shannon. Is Tenley still on the call? Someone was asking about the polling slide; I'm wondering if you could bring that up for us. Which slide is that? We weren't able to get the final poll. It's not a slide. Is it slide 21? No. All right, so unfortunately we're not able to show the poll. Unless there's any other final Q&A, thanks everyone for joining. As usual, Shannon will send out the slides, because that's always the first question we get: will these slides be available? They will. We'll ask Shannon to include the results of the poll, because we're all curious, and you can expect that within the next three to five business days. So thanks all for joining, and we hope to see you again in January. Thanks, Donna. Thank you.