From theCUBE Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.

Hi everybody, this is Dave Vellante, and welcome to this special digital CUBE presentation sponsored by IBM. We're going to focus in on DataOps, DataOps in action. A lot of practitioners tell us that they really have challenges operationalizing and infusing AI into the data pipeline. We're going to talk to some practitioners and really understand how they're solving this problem. And I'm really pleased to bring on Victoria Stasiowicz, who's the Global Information Systems Manager for Information Management at Harley-Davidson. Vic, thanks for coming to theCUBE, great to see you. Wish we were face to face, but really appreciate you coming on in this manner.

That's okay, that's why technology's great, right?

Cause a couple of decades ago it would not have been possible. So you are steeped in a data role at Harley-Davidson. Can you describe a little bit about what you're doing and what that role is like?

Definitely. So obviously I'm manager of information management and governance at Harley-Davidson, and what my team is charged with is building out data governance at an enterprise level, as well as supporting the AI and machine learning technologies within my function, right? So I have the portfolio; that portfolio really includes data, AI, and governance, and also our master data and reference data and data quality functions. If you're familiar with the DAMA wheel, of course. What I can tell you is that my team did an excellent job within this last year, 2019, standing up the infrastructure. So those technologies, right, specific to governance, as well as the newer, more modern warehouse on cloud technologies and cloud object store, which also included Watson Studio and Watson Explorer. Many of the IBMers of the world might hear about it as IBM ISCE, or work on it directly.
We stood that up in the cloud, as well as DB2 Warehouse on Cloud, like I said, and Cloud Object Store. We spent about the first five months of last year standing that infrastructure up, working on the workflow, ensuring that access security management was all set up and configured within the platform. And what we did the last half of the year, right, was really start to collect that metadata, as well as the data itself, and bring the metadata into our metadata repository, which was our XMETA database with ISCE, and then also bring that into our DB2 Warehouse on Cloud environment. So we were able to start with what we would consider our dealer domain for Harley-Davidson and bring those dimensions within to DB2 Warehouse on Cloud, which was never done before. A lot of the information that we were collecting and bringing together for the analytics team lived in disparate data sources throughout the enterprise. So the goal, right, was to stop with redundant data across the enterprise, eliminate some of those disparate data sources, right, and bring that into a centralized repository for reporting.

Okay, wow, we've got a lot to unpack here, Victoria. But let me start with sort of the macro picture. I mean, years ago, data used to be this thing that had to be managed. I mean, it still does, but it was a cost, largely a liability. You know, governance was sort of front and center. Sometimes, you know, it was the tail that wagged the value dog. And then the whole big data movement comes in and everybody wants to be data-driven. And so you saw some pretty big changes in just the way in which people looked at data. They wanted to, you know, mine that data and make it an asset versus just a straight liability. So what are the changes that you discerned in data and in your organization over the last, let's say, half a decade?
To tell you the truth, we started looking at access management and the ability to allow some of our users to do some rapid prototyping that they could never do before. So what more and more we're seeing, as far as data citizens or data scientists, right, or even analysts throughout most enterprises, is that they want access to the information. They want it now. They want speed to insight at this moment, using pretty much a minimum viable product. They may not need the entire data set, and they don't want to have to go through leaps and bounds, right, just to get access to that information, or to bring that information into necessarily a centralized location. So while I talk about our DB2 Warehouse on Cloud, that's an excellent example of when we actually need to model data; we know that this is data that we trust, right, that's going to be called upon many, many times by many, many analysts. There's other information out there that people are collecting because there's so much big data, right? There are so many ways to enrich your data within your organization for your customer reporting that people are really trying to tap into those third-party data sets. So what my team has done, and what we're seeing change throughout the industry, is that a lot of enterprises are looking at, as technologists, how can we enable our scientists and our analysts, right, the ability to access data virtually? So instead of repeating, right, recreating redundant data sources, we're actually enabling data virtualization at Harley-Davidson. And we've been doing that first working with our DB2 Warehouse on Cloud and connecting to some of our other trusted data warehouses throughout the enterprise, that being our dealer warehouse as well, to enable, obviously, analysts to do some quick reporting without having to bring all that data together. That is a big change I see.
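A rough sketch of that virtualization pattern: one query layer joins across two physically separate stores, with no data copied into a central repository. SQLite stands in here for DB2 Warehouse on Cloud and the dealer warehouse, and every table, column, and value is invented for illustration:

```python
import sqlite3
import tempfile
import os

# Two independent "warehouses", stood up as separate database files.
tmp = tempfile.mkdtemp()
wh_path = os.path.join(tmp, "wh.db")
dealer_path = os.path.join(tmp, "dealer.db")

dealer = sqlite3.connect(dealer_path)
dealer.execute("CREATE TABLE dealers (dealer_id INTEGER, name TEXT)")
dealer.executemany("INSERT INTO dealers VALUES (?, ?)",
                   [(1, "Milwaukee"), (2, "Madison")])
dealer.commit()
dealer.close()

wh = sqlite3.connect(wh_path)
wh.execute("CREATE TABLE sales (dealer_id INTEGER, units INTEGER)")
wh.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 4)])

# The "virtualization" layer: expose the dealer warehouse inside the main
# warehouse's query engine and join across both in one query, with no ETL
# job copying dealer rows into a new table.
wh.execute(f"ATTACH DATABASE '{dealer_path}' AS dealer_wh")
rows = wh.execute(
    "SELECT d.name, s.units FROM sales s "
    "JOIN dealer_wh.dealers d ON s.dealer_id = d.dealer_id "
    "ORDER BY s.units DESC").fetchall()
print(rows)  # [('Milwaukee', 10), ('Madison', 4)]
```

In DB2 the mechanism would be federation or virtualization objects rather than SQLite's ATTACH, but the shape is the point: one SQL statement, two sources, no redundant copy.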
The fact that we are able to tackle that has allowed technology to get back ahead, because most organizations, right, have given IT a bad rap: it takes too long to get what we need; my technologists cannot give me my data at my fingertips in a timely manner to allow for speed to insight and answer the business questions at the point of delivery. Most often we've supplied data to our analysts, right, and they're able to calculate, aggregate, and create the reporting metrics to get those answers back to the business, but they're a week, two weeks too late. The information is no longer relevant. So data virtualization through DataOps is one of the ways that we've been able to speed that up and act as a catalyst for data delivery. What we've also done, though, and I see this quite a bit, is say: well, that's excellent, but we still need to start classifying our information and labeling it at the system level. We've seen most enterprises, right, have the same struggle; I worked at Blue Cross as well, with IBM tools. They were trying to eliminate their technology debt, reduce their spend, reduce the time it takes for resources to work on technologies and maintain technologies. They want to reduce the IT portfolio of assets and capabilities that they license today. So what do they do to do that? It's time to start taking a look at what systems should be classified as essential systems versus those systems that are disparate and could be eliminated. And that starts with data governance.

Right, so okay, so your main focus is on governance, and you talked about people wanting answers now. They don't want to have to wait; they don't want a big waterfall process. So what would you say were some of the top challenges in terms of just operationalizing your data pipeline and getting to the point that you are at today?

You know, I have to be quite honest: standing up the governance framework, the methodology behind it, right?
You get data owners, data stewards, a catalog established. That was not necessarily the heavy lifting. The heavy lifting really came with setting up a brand new infrastructure in the cloud, to be quite honest. We partnered with IBM and said, you know what, we're going to the cloud. And these tools had never been implemented in the cloud before; we were kind of the first to do it. So some of the struggles that we took on were actually standing up the infrastructure: security and access management, network pipeline access, right, VPN issues, things of that nature, I would say, were some of the initial roadblocks we went through. But after we overcame those challenges, with the help of IBM and the patience of both the Harley and IBM teams, it became quite easy to roll out these technologies to other users. The nice thing is, right, we at Harley-Davidson have been taking the time to educate our users. Just today, for example, we had what we call Data Bytes, a lunch and learn. And in that lunch and learn, what we did is take our entire GIS team, our Global Information Services team, which is all of IT, through these new technologies. It was a forum of over 250 people, with our CIO and CTO on, and we took them through: how do we use these tools? What is the purpose of these tools? Why do we need governance to maintain these tools? Why is metadata management important to the organization? That piece of it seems to be much easier than the initial standing up. So now it's good enough to start letting users in.

Well, it sounds like you had real sponsorship from leadership and input from leadership, and they were kind of leaning into the whole process. First of all, is that true, and how important is that for success?

Oh, it's essential. We often said, when we were first standing up the tools, to be quite honest, does our CIO really understand what it is that we're standing up?
Does our CIO really understand governance? Because we didn't have the time to really get that face-to-face interaction with our leadership. So I myself made it a mandate, having done this previously at Blue Cross, to get in front of my CIO and my CTO and educate them on what exactly we were standing up. And once we did that, it was very easy to get an executive steering committee, as well as an executive membership council, right, on board with our governance council. And now they're the champions of it. It's never easy, though, selling governance to leadership, and the ROI is never easy, because it's not something that you can easily calculate. It's something that has to show its return on investment over time. And that means that you're creating dashboards, you're educating your CIO and CTO on how you're bringing people together, how groups are now talking about solutions and technologies in a domain-like environment, right, where you have people at an international level. We have people from Asia, from Europe, from China that join calls every Thursday to talk about the data quality issues specific to dealer, for example: what systems we're using, what solutions are on the horizon to solve them. So now, instead of having people from other countries that work for Harley, as well as within the US, right, creating one-off solutions that answer the same business questions using the same data, creating multiple solutions to solve the same problem, we're bringing them together, we're solving together, and we're prioritizing those as well. So for that return on investment down the line, you can show: you know what, instead of this turning into five projects, we've now turned this into one. And instead of implementing four systems, we've now implemented one. And guess what?
We have the business rules and we have the classification tied to the system, so that you, CIO or CTO, right, can now go in and reference this information in a glossary, a user interface, something that a C-level can read, interpret, and understand quickly, right, and dissect the information for their own needs, without having to take the long, lengthy time to talk to a technologist about what this information means and how to use it.

You know what's interesting, a takeaway based on what you just said is, Harley-Davidson is an iconic brand, a cool company, we're talking motorcycles, right? But you came out of an insurance background, which is a regulated industry where, you know, governance is sort of de rigueur, right? I mean, it's table stakes. So how were you able, at Harley, to balance the sort of tension between governance and business flexibility?

So there are different levers, I would call them, right? Obviously within healthcare and insurance, the importance becomes compliance and risk and regulatory, right? Their big push is: gosh, I don't want to pay millions of dollars in fines, so start classifying this information, enabling security, reducing risk, all that good stuff. For Harley-Davidson, it was much different. It was more or less: we have a mission, right? We want to invest in our technologies, yet we want to save money. How do we cut down the technologies that we have today, reduce our technology spend, yet enable our users to have access to more information in a timely manner? That's not an easy task, right? So what I did is I took that and married governance to our TIME model. And our TIME model is specific: we're either going to Tolerate an application, we're going to Invest in an application, we're going to Migrate an application, or we're going to Eliminate it.
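The TIME model she describes could be sketched as a simple classification over governance metadata. The fields, thresholds, and system names below are invented assumptions for illustration, not Harley-Davidson's actual rules:

```python
# Hypothetical TIME (Tolerate / Invest / Migrate / Eliminate) scoring driven
# by the kind of business-impact signals governance collects: user counts,
# essential reports, and capability duplication.

def time_classification(system: dict) -> str:
    """Classify a system into a TIME bucket from simple impact signals."""
    essential = system["essential_reports"] > 0
    healthy = system["active_users"] >= 50 and not system["duplicates_capability"]
    if essential and healthy:
        return "Invest"
    if essential:
        return "Migrate"      # keep the capability, retire the platform
    if system["active_users"] > 0:
        return "Tolerate"     # low value, but low cost to keep for now
    return "Eliminate"

inventory = [
    {"name": "Dealer warehouse", "active_users": 300,
     "essential_reports": 12, "duplicates_capability": False},
    {"name": "Legacy profiler", "active_users": 8,
     "essential_reports": 2, "duplicates_capability": True},
    {"name": "SharePoint metadata sheets", "active_users": 5,
     "essential_reports": 0, "duplicates_capability": True},
    {"name": "Retired reporting mart", "active_users": 0,
     "essential_reports": 0, "duplicates_capability": True},
]
for s in inventory:
    print(s["name"], "->", time_classification(s))
```

The interesting design point is that none of these inputs are IT opinions; they are the business-impact questions from the conversation (who uses it, which reports depend on it, is it duplicative), which is what makes the resulting eliminate list defensible.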
So in talking to my CIO, I said, you know, we can use governance to classify our systems and act as a catalyst when we start to decide what we're doing with our technologies: which technologies are we going to eliminate tomorrow? We, as IT, cannot do that unless we assess some sort of business impact, unless we look at a system and say: how many users are using this? What reports are essential to the business teams? Do they need this system? Is this something that's critical for users today? Is this duplicative, right? Do we have many systems serving the same capability? That is how I sold it to my CIO, and that made it important to the rest of the organization. They knew we had a mandate in front of us: we had to reduce technology spend. And that really, for me, made it quite easy, in talking to other technologists as well as business users, to explain why governance is important and why it's going to help Harley-Davidson in its mission to save money going forward. I will tell you, though, that the business's biggest value, right, is the fact that they now own the data. They're more likely, right, to use your master data management systems. Like I said, I'm the owner of our MDM services today, as well as our customer knowledge center. They're more likely to access and reference those systems if they feel that they built the rules and they own the rules in those systems. So that's another big value-add, right? Many business users will say: okay, you know, I think I need access to this system. I don't know, I'm not sure. I don't know what the data looks like within it. Is it easily accessible? Is it going to give me the reporting metrics that I need? That's where governance will help them, our data science team for example. Using a catalog, right, you can browse your metadata. You can look at your server, your database, your tables, your fields. Understand what those mean.
Understand the classifications, the formulas within them, right? They're all documented in a glossary, versus having to go and ask for access to six different systems throughout the enterprise, hoping, right, that Sally next to you, who told you you needed access to those systems, was right, just to find out that you don't need the access, and it took you three days to get it anyway. That's why a glossary is really a catalyst for a lot of that.

Well, it's really interesting what you just said: you went through essentially an application rationalization exercise, which saved your organization money. That's not always easy, because even though IT may be spending money on these systems, businesses don't want to give them up. But it sounds like you were able to use data to actually inform which applications you should invest in versus sunset. As well, it sounds like you were giving the business a real incentive to go through this exercise, because they ended up, as you said, owning the data.

Well, that, and what's great, right: who wants to keep driving the old car if they can truly own a new car for a cheaper price? Nobody wants to do that. I've even looked at Teslas, right? I can buy a Tesla for the same price as I can buy a minivan these days. Think I might buy the Tesla. But what I will say is that we also used that. We built out a capabilities model with our enterprise architecture team, and in building that capabilities model, we started to bucket our technologies within those capabilities, right? Like AI and machine learning, warehouse on cloud technologies or even warehousing technologies, governance technologies, those types of classifications, integration technologies, reporting technologies, kind of grouping all those into a capabilities matrix, right?
It was easy for us to then start identifying: all right, who are the system owners for these technologies? Who are the business users? Based on that, right, let's go talk to this team, the dealer management team, about access to this new profiling capability within IBM, or this new catalog within IBM, right, that they can use today, versus the SharePoint Excel spreadsheets they were using for their metadata management, or the profiling tools that were old, you know, 10 years old, some of the SAP tools that they were using before. Let's sell them on the new tools and start migrating them. That becomes pretty easy because, I mean, unless you're buying some really old technology, when you give people a purview into those new tools and those new capabilities, especially with some of the IBM tools that we have today, the buy-in is pretty quick. It's pretty easy to sell somebody on something shiny that's much easier to use than some of the older technologies.

Well, let's talk about the business impact. I mean, my understanding is you were trying to improve the effectiveness of the dealers, not just go out and brute-force sign up more dealers. Were you able to achieve that outcome, and what has it meant for your business?

Yes, actually, we were. So what we did is we stood up something called CDDR, our consumer and dealer development repository, right? That's where a lot of our dealer information resides today. It's actually our dealer warehouse. We had some other systems that were collecting that information, Talend and Lightspeed, for example. We were able to bring all that reporting into one location, sunset some of those other technologies, but then also enable that centralized reporting layer, where we've also used data virtualization to start to marry some of that information to DB2 Warehouse on Cloud for users.
So we're basically allowing those that want to access CDDR and our DB2 Warehouse on Cloud dealer information to do that within one reporting layer. In doing so, we were able to create something called a dealer harmonized ID. We have so many dealers today, right? Some of those dealers actually sell bikes, some sell just apparel, some just sell parts. For those dealers, right, can we have certain IDs, kind of a golden record, mastered information, if you will, brought back in reporting so that we can accurately assess dealer performance? Up until two years ago, right, it was really hard to do that. We had information spread out all over. It was really hard to get a good handle on which dealers were performing and which weren't, because it was tough, right, for our analysts to wrangle that information and bring it together. It took time. Many times we would get multiple answers to one business question, which is never good, right? One question should have one answer if it's accurate. That is what we worked on within this last year, and that was really for our CIO. So the value-add is that we can now start to actually report on which dealers are performing at an optimal level versus which dealers are struggling. And that's allowed even our account reps and field staff to go work with those struggling dealers and share with them: these are what some of our stronger-performing dealers are doing today that makes them more effective in selling bikes; these are some of the best practices you can implement. That's where we make our field staff smarter and our dealers smarter. We're not looking to shut down dealers. We just want to educate them on how to do better.

Well, and to your point about the single version of the truth, if you will, the lines of business are kind of owning their own data.
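A "dealer harmonized ID" of the sort just described might, in a heavily simplified form, look like the following. Matching on a normalized name alone is an assumption made for the example; real MDM matching uses addresses, source-system keys, survivorship rules, and steward review. All record values are invented:

```python
# Toy golden-record builder: collapse dealer records arriving from different
# source systems into one harmonized record per real-world dealer.

def normalize(name: str) -> str:
    """Crude match key: lowercase, strip punctuation, squeeze whitespace."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())

def harmonize(records):
    golden = {}  # match key -> golden record
    for rec in records:
        key = normalize(rec["name"])
        g = golden.setdefault(key, {
            "harmonized_id": f"HD-{len(golden) + 1:04d}",
            "name": rec["name"],   # survivorship rule here: first name wins
            "sources": [],
            "sells": set(),
        })
        g["sources"].append(rec["source"])
        g["sells"].update(rec["sells"])
    return list(golden.values())

records = [
    {"source": "CDDR", "name": "House of Harley, Milwaukee",
     "sells": ["bikes", "parts"]},
    {"source": "Lightspeed", "name": "house of harley milwaukee",
     "sells": ["apparel"]},
    {"source": "CDDR", "name": "Badger H-D", "sells": ["parts"]},
]
for g in harmonize(records):
    print(g["harmonized_id"], g["name"], sorted(g["sells"]), g["sources"])
```

With one ID per dealer, "how is this dealer performing?" has a single join key across bikes, parts, and apparel, which is exactly what makes one question yield one answer.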
That's critical, because you're not spending all your time pointing fingers, trying to understand the data. If the users own it, then they own it. How does self-service fit in? Were you able to achieve some level of self-service, and how far can you go there?

We were. We did use some other tools, I'll be quite honest, aside from just the IBM tools, that have enabled some of that self-service analytics. SAP SAC was one of them; Alteryx is another big one that our analyst teams like to use today to wrangle and bring that data together. But that really allowed our analysts, right, and our reporting teams, to start to build their own derivations and transformations for reporting themselves, because those tools are more user-interface based, versus going into the backend systems and having to write straight code, right, SQL queries, things of that nature. That usually takes time and requires a deeper level of knowledge than what we'd like to require of our analysts today. I can say the same thing about the data science team. They use a lot of R and Python coding today. What we've tried to do is make sure that the tools are available so that they can do everything they need to do without us really having to touch anything. And I will be quite honest, we have not had to touch much of anything. We have a very skilled data science team. So I will tell you that the tools we put in place today, Watson Explorer and some of the other tools as well, have enabled the data scientists to move really quickly and do what they need to do for reporting. And even in cases where maybe Watson or Explorer may not be the optimal technology, right, for them to use, we've also allowed them to use some of our other resources, our open-source resources, to build some of the models that they were looking to build.

Well, I'm glad you brought that up, Victoria, because IBM makes a big deal out of being open.
And so you're kind of confirming that you can use third-party tools, and if you like tool vendor ABC, you can use them as part of this framework.

Yeah, it's really about TCO, right? So take a look at what you have today. If it's giving you at least 80% of what you need for the business, or for your data scientists or reporting analysts, right, to do what they need to do, to me it's good enough. It's giving you what you need; it's pretty hard to find anything that's exactly 100%. It's about being open, though, to when your scientists or your analysts find another reporting tool, or let's just say a data science tool, that requires minimal maintenance and that's free, right, because it's open source. IBM can integrate with that, and we can enable that to be a quicker way for them to do what they need to do, versus telling them no, right? You can't use the other technologies or the other open-source resources out there; you've got to use just these tools. That's pretty tough to do, and I think it would shut most IT shops down pretty quickly within larger enterprises, because it would really act as a roadblock to most of our teams, right, doing what they need to do for reporting.

Well, last question. A big part of this DataOps, borrowing from DevOps, is this continuous integration, continuous improvement, kind of ongoing raising of the bar, if you will. Where do you see things going from here?

Oh, I definitely see a world, right, where we're allowing for that rapid prototyping like I was talking about earlier. I see a very big change in the data industry. You said it yourself, right? We are on the brink of big data, and it's only gonna get bigger. There are organizations right now that have truly understood how much of an asset their data really is, and they're starting to sell their data off to similar people or similar industries, right?
Similar vendors within the industry, similar spaces, right, so they can make money off of it, because data truly is an asset now. The key to it, though, is obviously making sure that it's curated, that it's cleansed, that it's trusted, so that when you are selling it back, you can truly make money off of it. What we've seen, though, and what I really see on the horizon, is the ability to vet that data, right? Because what have we been doing in the past decade? We were just buying big data sets, trusting that it's good information. We're not doing a lot of profiling on it; most organizations aren't. So you're gonna pay top dollar, you're gonna receive this third-party data set, and you're not gonna be able to use it the way you need to. What I see on the horizon is us being able to do that vetting. We're building data lakehouses, if you will, right? We're building those Hadoop-like environments, those data lakes, right, where we can land information, quickly access it, and quickly profile it with tools, where it would otherwise take hours for an analyst to write a bunch of queries to understand what the profile of that data looked like. We did that recently at Harley-Davidson. We brought in some third-party data and evaluated it quickly through our Agile Scrum team. Within a week, we determined that the data was not as good as the vendor selling it, right, pretty much sold it to be. And so we told the vendor: we want our money back; the data is not what we thought it would be; please take the data sets back. Now, that's just one use case, right? But to me, that was golden. It's a way to save money and start vetting the data that we're buying. Otherwise, what I've seen in the past is many organizations just buying up big third-party data sets and saying, okay, it's good enough. We think that just because it comes from the motorcycle industry council, or a similar body, that it's good enough.
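That kind of quick profiling pass can be sketched in a few lines: compute null rates and distinct counts per column, then gate acceptance on the key fields. The column names, sample rows, and the 10% threshold here are all invented for illustration:

```python
# Toy data profiler of the sort an analyst would otherwise hand-write
# queries for: per-column null rate and distinct count over a sample.

def profile(rows, columns):
    report = {}
    n = len(rows)
    for col in columns:
        values = [r.get(col) for r in rows]
        nulls = sum(v in (None, "") for v in values)
        report[col] = {
            "null_rate": nulls / n if n else 0.0,
            "distinct": len({v for v in values if v not in (None, "")}),
        }
    return report

# A stand-in for a purchased third-party data set.
third_party = [
    {"vin": "1HD1", "zip": "53202", "model": "Softail"},
    {"vin": "",     "zip": "53202", "model": None},
    {"vin": "1HD3", "zip": "",      "model": "Sportster"},
    {"vin": "",     "zip": "",      "model": None},
]
rep = profile(third_party, ["vin", "zip", "model"])
print(rep["vin"]["null_rate"])  # 0.5 -> half the key field is missing

# A simple acceptance gate: reject the purchase if key fields are too sparse.
acceptable = all(rep[c]["null_rate"] <= 0.10 for c in ("vin", "zip"))
print("accept data set:", acceptable)
```

Running a gate like this inside a one-week Scrum evaluation is exactly the "money back from the vendor" scenario above: the decision rests on measured sparsity, not on the vendor's claims.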
It may not be. It's up to us to start vetting that. And that's where technology is going to change, data is going to change, analytics is going to change.

Well, Victoria, this is a great example. You're really on the cutting edge of this whole DataOps trend. Really appreciate you coming on theCUBE and sharing your insights. And there's more in the CrowdChat, CrowdChat.net/dataops. So thank you, Victoria, for coming on theCUBE.

Perfect. Well, thank you, Dave. Nice to meet you, and it was a pleasure speaking with you.

Yeah, really, the pleasure was all ours. And thank you for watching, everybody. As I say, CrowdChat.net/dataops for more detail, more Q&A. This is Dave Vellante for theCUBE. Keep it right there. We'll be right back right after this short break.