From theCUBE Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE conversation. Hi everybody, welcome to this special CUBE digital event where we're focusing in on DataOps, DataOps in Action, with generous support from our friends at IBM. Let me set up the situation here. There's a real problem going on in the industry, and that's that people are not getting the most out of their data. Data is plentiful, but insights perhaps aren't. What's the reason for that? Well, it's really a pretty complicated situation for a lot of organizations. There are data silos. There are challenges with skill sets and a lack of skills. There are tons of tools out there, sort of a tools creep. The data pipeline is not automated. The business lines oftentimes don't feel as though they own the data, and that creates some real concerns around data quality and a lot of finger-pointing about quality. The opportunity here is to really operationalize the data pipeline, infuse AI into that equation, and really attack the cost-cutting and revenue-generation opportunities that are right in front of you. Think about this: virtually every application this decade is going to be infused with AI. If it's not, it's not going to be competitive. And so we have organized a panel of great practitioners to really dig into these issues. First I want to introduce Victoria Stasiewicz, who's an industry expert in DataOps at Northwestern Mutual. Victoria, great to see you again. Thanks for coming on. Excellent, nice to see you as well. And Caitlin Halferty is the Director of the AI Accelerator at IBM, and also part of the Chief Data Officer organization at IBM, which has actually practiced what it preached, let me say it that way. Caitlin, great to see you again. Thank you, Dave, great to be here. And Steve Lueck, good to see you again. Steve is Vice President and Director of Data Management at Associated Bank, the largest bank in Wisconsin. Thanks for coming on. Thanks, Dave, good to be here. All right guys, so you heard my little narrative. You're each at different stages of maturity in terms of operationalizing your data and getting the most insight out. As I often say, data is plentiful, insights aren't, and getting insight in real time is critical in this decade. So I'm going to ask each of you to give us a sense as to where you are on that data journey. Victoria, in your case, because you're brand new to Northwestern Mutual but have a lot of deep expertise in healthcare, manufacturing, and financial services, tell us where you see the general industry climate, and we'll talk about the journeys that you are on, both personally and professionally. So Victoria, kick us off here. Sure. I think right now where I see the industry going is needing to have speed to insight, right? As I've experienced going through many organizations, we're all facing the same challenges today, and a lot of those challenges are: where does my data live? Is my data trusted? Has it been curated? Is it of quality? Has it been classified? Those are all open questions, right? What we often see happening is that businesses know their KPIs, they know their business metrics, but they can't find where that data lives in their back-end assets. There's redundant data distributed all over the place, replicated because it's not well managed. So a lot of what governance, and the platform of tools that comes with governance, can
offer back to organizations today is just that piece of it: I can tell you where your data is, I can tell you what's trusted. That way you can quickly access the information and bring back answers to business questions, one answer, not many answers that leave the business questioning which is the correct answer, which way do I go at the executive level. That's the biggest challenge, and where we want the industry to go moving forward is, one, breaking that down, allowing that information to be published quickly, and two, enabling data virtualization. A lot of what you see today in most businesses, right, is that it takes time to build out large warehouses at an enterprise level. We need to pivot quicker. So we're leaning businesses toward taking advantage of data virtualization, allowing them to connect to these data sources, right, to bring that information back quickly so they don't have to replicate it across different systems or different applications, and to be able to provide those answers back quickly, also allowing for seamless access from the analysts that are running at full speed, right, trying to find the answers as quickly as they can. Great, okay, and I want to get into the how-to of this, but Steve, let me go to you. One of the things that we talked about earlier was infusing this mindset of a data culture, and thinking about data as a service. So talk a little bit about how you got started. What was the starting point? Take us through that. Sure. I think the biggest thing for us there was to change that mindset from data being just for reporting, for insights on things that have happened in the past on data that already existed. What we've tried to do to shift the mentality is to start to use data and infuse it into our actual applications, so that we're providing those insights in real time to the applications as they're consumed, helping with customer experience, helping with the personalization and optimization of our applications. The way we've started down that path, or the journey that we're still on, was to get the foundation laid first. Part of that has been making sure we have access to all of that data, whether it's through virtualization like Vic talked about, or whether it's through having more of the data collected in a data lake concept where we have all of that foundational data available, as opposed to waiting for people to ask for it. That's been the biggest culture shift for us: having that availability of data ready to provide those insights, as opposed to having to make the businesses or the application owners ask for that data.
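To make the data virtualization idea Vic and Steve describe concrete, here is a minimal sketch of a federated query. The panel doesn't name a specific tool for this step, so the sketch uses the open source Trino engine and its Python client as a stand-in; the host, catalog, and table names are all hypothetical.

```python
import trino  # pip install trino

# One connection to the virtualization layer, not to each source system.
conn = trino.dbapi.connect(
    host="trino.example.com", port=8080, user="analyst"
)
cur = conn.cursor()

# The engine pushes work down to each source; nothing is replicated
# into a warehouse first. Catalog names depend on your configuration.
cur.execute("""
    SELECT c.customer_id, c.segment, SUM(o.amount) AS total_spend
    FROM postgresql.crm.customers AS c
    JOIN hive.lake.orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.segment
""")
for customer_id, segment, total_spend in cur.fetchall():
    print(customer_id, segment, total_spend)
```

The same pattern applies whether the virtualization layer is Trino, the data virtualization service in Cloud Pak for Data, or something else: analysts get one SQL surface over many back ends without waiting for a replication pipeline.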
Okay, when I first met Inderpal Bhandari, the IBM Global Chief Data Officer, I asked him, okay, what's the role of the CDO? And he mentioned a number of things, but two of the things that stood out are, one, you've got to understand how data affects the monetization of your company. That doesn't mean selling the data; what role does it play in helping cut costs, increase revenue or productivity, improve customer service, et cetera? The other thing he said was you've got to align with the lines of business. Well, it sounded good, and this was several years ago, and IBM took it upon itself to drink its own champagne, I was going to say, you know, dogfooding, whatever. But it's not easy. You can't just flip a switch and infuse AI and automate the data pipeline. You guys had to go through, you know, some real pain to get there. And you did, you were early on, you took some arrows, and now you're helping your customers better understand that. But talk about some of the use cases where you've applied this; obviously you're one of the biggest organizations in the world, so the real challenges are there. Sure, happy to, Dave. You know, we've been on this journey for about four years now. We stood up our first global chief data office in 2016, and you're right, it was all about getting our data strategy authored and executed internally, and we wanted to be very transparent about it, because, as you've mentioned, a lot of the challenge is thinking differently about the value of data. So we wrote that data strategy at the time around becoming a cognitive enterprise, and then we quickly pivoted to see the real opportunity and value of infusing AI across all of our major workflows. To your question on a couple of specific use cases, I'd say, you know, we invested the time getting that platform built and implemented, and then we were able to take advantage of it. One particular example that I've been really excited about: I have a practitioner on my team who's a supply chain expert, and a couple of years ago he started building out a supply chain solution so that we could better mitigate our risk in the event of a natural disaster like an earthquake or hurricane anywhere around the world. And because we invested the time in getting the data pipelines right, and getting all of that data curated and cleaned and its quality assured, we were able, in recent weeks, to add the really critical COVID-19 data, deliver that out to our employees internally for their preparation purposes, and make it available to our nonprofit partners. And now we're starting to see our first customers take advantage of it with the health and well-being of their employees in mind. So that's an example where, and I'm seeing this with a lot of the clients I work with, they invest in the data and AI readiness, and then they're able to take advantage of all of that work very quickly, in an agile fashion, to spin up those applications. Well, I think one of the keys there, Caitlin, is that we can talk about that in a COVID-19 context, but that notion of business resiliency is going to carry through; it's going to live on in this post-COVID world, isn't it? Absolutely. I think for all of us, the importance of investing in business continuity and resiliency type work, so that we know what to do in the event of a natural disaster or something beyond, will be grounded in that. And I think it will only become more important for us to be able to act quickly, and so the investment in those platforms and the approach that we're taking, and that I see many of us taking, will really be grounded in that resiliency moving forward. So Vic and Steve, I want to dig into this a little bit, because we use this concept of DataOps, which we're stealing from DevOps, and there are similarities but there are also differences. So let's talk about the data pipeline.
If you think about the data pipeline as a sort of quasi-linear process where you're ingesting data, and you might be using tools whether it's Kafka or whatever your favorite tool is, and then you're transforming that data, and then you've got to discover it, you've got to do some exploration, you've got to figure out your metadata catalog, and then you're trying to analyze that data to get some insights, and then ultimately you want to operationalize it. You could come up with your own data pipeline, but generally that sort of concept is, I think, well accepted. But there are different roles, and unlike DevOps, where it might be the same developer who's actually implementing security policies and taking it into operations, in DataOps there might be different roles, and in fact very often there are: there's data science, there's maybe an IT role, there's data engineering, there's analysts, et cetera. So Vic, I wonder if you could talk about the challenges in managing and automating that data pipeline, in applying DataOps, and how practitioners can overcome them? Yeah, I would say a perfect example would be a client that I was just recently working for, where we actually built up a team using agile methodologies, that framework, right, for rapidly ingesting data and then proving out whether data is fit for purpose. So often now we talk a lot about big data, and that is really where a lot of industries are going. They're trying to add enrichment to their own data sources, so what they're doing is purchasing these third-party data sets. In doing so, right, you make that initial purchase, but many companies today have no real way to vet that data. They'll purchase the information, they won't vet it up front, they'll bring it into an environment, it will take them time to understand whether the data is of quality or not, and by the time they do, typically the sale is done and gone and it's too late to ask for anything back. What we were able to do at the most recent client was take an unstructured data source, right, bring that in and ingest it with modelers using this agile team, and within two weeks we were able to bring the data in from the third-party vendor, what we considered rapid prototyping: profile the data, understand whether the data was of quality or not, and quickly figure out that, you know what, it's not. In doing so, we were able to contact the vendor, tell them, you know what, sorry, the data's not up to snuff, we'd like our money back, we're not going to go forward with it. That's enabling businesses to be smarter about their data purchases today, because as much as businesses want to rely on their own data, right, they also want to enrich it with data from third-party sources. And that's really what DataOps is allowing us to do. It's allowing us to think at a broader, higher level, right? What can we do to bring the information in? What structures can we store it in so that it doesn't necessarily have to be modeled first? Because a model is great, right, but if we have to take time to model all the information before we even know we want to use it, that's going to slow the process down. And that's slowing the business down. The business is looking for us to speed up all of our processes. A lot of what we heard in the past, right, is that IT tends to slow us down, and that's the perception we're trying to change in the industry: no, we're actually here to speed you up. We have all the tools and technologies to do so, and they're only getting better. I would also say data scientists, right? That's another piece of the pie for us. If we can bring the information in, quickly catalog it in a metadata environment along with the back-end data assets, right, we can supply that information back to the scientists. Gone are the days where scientists are going and asking for connections to all these different data sources, waiting days for access requests to be approved, just to figure out what the relationship diagram, the design, looks like in that back-end database, how to get to it, write the code to get to it, and then find out that this is not the information they need, or that Sally next to them, right, pulled the wrong information. That's where the catalog comes in. That's where DataOps and data governance come in: with that catalog, that metadata management platform, available to you, analysts can go into the catalog quickly without having to request access to anything, and within five minutes they can see the structures. What do the tables look like? What do the fields look like? Are these the metrics I need to bring back answers to the business? That's DataOps. It's allowing us to speed all of that up, taking stuff that took months down to weeks, down to days, down to hours.
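The two-week vetting exercise Vic describes comes down to fast, repeatable profiling. Here is a minimal sketch of that kind of fit-for-purpose check in Python with pandas; the file name, columns, and thresholds are all illustrative and would be set per use case.

```python
import pandas as pd

# Hypothetical extract from the third-party vendor.
df = pd.read_csv("vendor_extract.csv")

# Profile every attribute: completeness, cardinality, inferred type.
profile = pd.DataFrame({
    "null_rate": df.isna().mean(),
    "distinct_values": df.nunique(),
    "dtype": df.dtypes.astype(str),
})
dup_rate = df.duplicated().mean()

# Fit-for-purpose rules -- thresholds here are illustrative only.
issues = []
if dup_rate > 0.01:
    issues.append(f"{dup_rate:.1%} duplicate rows")
issues += [f"{col}: {rate:.1%} nulls"
           for col, rate in profile["null_rate"].items() if rate > 0.05]

print(profile)
print("FAIL:" if issues else "PASS", "; ".join(issues))
```

Running a check like this on day one, rather than weeks after the purchase, is what makes the "we'd like our money back" conversation with the vendor possible.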
So Steve, I wonder if you could pick up on that and just help us understand what DataOps means to you. We talked about this earlier in our previous conversation, and I mentioned it up front: this notion that the demand for data access is through the roof, and you've gone from that to more of a self-service environment, where it's not IT owning the data, it's really the businesses owning the data. But what does all this DataOps stuff mean in your world? Sure, I think it's very similar. It's how do we enable and get access to that data quicker, ensuring the right controls and the right processes, and building that scalability and agility into all of it so that we're doing this at scale and it's much more rapidly available. We can discover new data sets quickly and determine if they're right, or more importantly, if they're wrong. Similar to what Vic described, it's how do we enable the business to make the right decisions on whether or not they're going down the right path. The catalog is a big part of that. We've also introduced a lot of frameworks around scale, so just the ability to rapidly ingest data and make it available has been key for us. We've also focused on a prototyping environment, that sandbox mentality: how do we rapidly stand those up for users and still provide some controls, but give people the ability to do that exploration? What we're finding is that by providing the platform and the foundational layers, the use cases sort of evolve and come out of that, as opposed to having the use cases first and then building things from them. We're shifting the mentality within the organization to say, we don't know what we need yet; let's start to explore. That's kind of that data scientist mentality and culture. It's more of a way of thinking as opposed to an actual project or implementation.
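The five-minute catalog lookup Vic describes above is, at its core, a search over business metadata rather than over the data itself. Here is a toy in-memory sketch of that idea; it is not any vendor's API, and a real implementation would sit behind the REST interface of Watson Knowledge Catalog, Collibra, or a similar tool. All asset names and stewards are invented.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogAsset:
    table: str
    description: str
    fields: dict                      # field name -> data type
    business_terms: list = field(default_factory=list)
    steward: str = "unassigned"

catalog = [
    CatalogAsset("crm.customers", "Golden customer record",
                 {"customer_id": "int", "segment": "varchar"},
                 business_terms=["customer", "segment"], steward="jdoe"),
    CatalogAsset("lake.orders", "Raw order events from the web channel",
                 {"order_id": "int", "customer_id": "int", "amount": "decimal"},
                 business_terms=["revenue", "orders"]),
]

def search(term: str) -> list:
    """Find assets tagged with a business term; no source access needed."""
    return [a for a in catalog if term in a.business_terms]

# An analyst browses structures without requesting access to anything.
for asset in search("customer"):
    print(asset.table, "->", asset.fields, "| steward:", asset.steward)
```

The point is the shape of the interaction: tables, fields, types, and ownership are answerable before any access request is ever filed.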
Well, I think that cultural aspect is important. Of course, Caitlin, you guys are an AI company, or at least that's part of what you do, but for decades, maybe a century, you've been organized around different things, like the manufacturing plant or the sales channel or whatever it is. So how has the chief data officer organization within IBM been able to transform itself and really infuse a data culture across the entire company? One of the approaches we've taken is what we talk about as the blueprint to drive AI transformation, so that we can achieve and deliver these really high-value use cases. We talk about the data and the technology, which we've just touched on, but the organizational piece and its considerations are so important: change management, enabling and equipping our data stewards. I'll give one specific example that I've been really excited about. When we were building our platform and starting to pull in disparate data, structured and unstructured, our data stewards were spending a lot of time manually tagging and creating business metadata about that data. We identified that as a real pain point, costing us a lot of money and valuable resources. So we started to automate the metadata generation, doing that in partnership with our deep learning practitioners and some of the models they were able to build. We pushed that capability out into our Cloud Pak for Data product last year. And one of the really exciting things for me to see is that our data stewards, who we so value for the expertise and the skills they bring, have reported that it's really changed the way they're able to work. It's really sped up their process, and it's enabled them to move on to higher-value capabilities and business benefits. So they're very happy from an organizational-considerations point of view. So I think there are ways to identify those use cases, and in our particular case we drove some significant productivity savings. We also really empowered and enabled our data stewards, who we really value, making their jobs easier and more efficient, and helping them move on to things they're more excited about doing. So I think that's another example of the approach you can take. Yeah, so the cultural piece, the people piece, is key. We talked a little bit about the process. I want to get a little bit into the tech, Steve. I wonder if you could tell us, what's the tech? We have this bevy of tools; I mentioned a number of them up front. You've got different data stores, you've got open source tooling, you've got IBM tooling. What are the critical components of the technology that people should be thinking about tapping in their architecture? Sure. From an ingestion perspective, we're trying to do a lot with Python frameworks and scalable ingestion-type frameworks. On the catalog side, what we've done is gone with IBM Cloud Pak for Data, which provides a platform for a lot of these tools to stay integrated together: everything from the discovery of data sources, the cataloging, the documentation of those data sources, all the way through the actual advanced analytics, with Python models and R models on the open source side, combined with the ability to do some data prep and refinery work. Having that all in an integrated platform was key for us to roll out more of these tools in bulk, as opposed to having point solutions. So that's been a big focus area for us. And then on the analytics side and the web services side, there are a lot of different components you can go with, whether it's MuleSoft, whether it's AWS and some of the native functionality out there. You mentioned before Kafka and Kinesis Streams and different streaming technologies; those are all in the toolbox that we're starting to look at. One of the keys here is that we're trying to make decisions in as close to real time as possible, as opposed to the business having to wait weeks or months, and then by the time they get the insights, it's too late, really rear-view mirror.
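One common shape for the "scalable Python ingestion frameworks" Steve mentions is metadata-driven ingestion: each new source is declared as configuration, and one generic pipeline validates and lands it, so onboarding a feed doesn't mean writing new code. A hedged sketch under that assumption; the source names, paths, and rules are invented, and at bank scale the pandas calls would typically be swapped for Spark or a similar engine.

```python
from pathlib import Path
import pandas as pd

# Each feed is a config entry; onboarding a new source means adding one.
SOURCES = [
    {"name": "branch_txns", "path": "raw/branch_txns.csv",
     "key": ["txn_id"], "required": ["txn_id", "amount", "posted_at"]},
    {"name": "online_txns", "path": "raw/online_txns.csv",
     "key": ["txn_id"], "required": ["txn_id", "amount"]},
]

def ingest(src: dict) -> int:
    """Validate a declared feed and land it in the lake as Parquet."""
    df = pd.read_csv(src["path"])
    missing = [c for c in src["required"] if c not in df.columns]
    if missing:
        raise ValueError(f"{src['name']}: missing columns {missing}")
    df = df.drop_duplicates(subset=src["key"])
    Path("lake").mkdir(exist_ok=True)
    df.to_parquet(f"lake/{src['name']}.parquet")  # needs pyarrow installed
    return len(df)

for src in SOURCES:
    print(src["name"], ingest(src), "rows landed")
```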
So Vic, your focus in your career has been a lot on data quality, governance, and master data management, from a data quality standpoint as well. What are some of the key tools that you're familiar with, that you've used, that have really enabled you to operationalize the data pipeline? You know, I would say definitely the IBM tools; I have the most experience with those. Also Informatica as well, though; those are, to me, the two top players. IBM has definitely come to the table with a suite, right? Like Steve said, Cloud Pak for Data is really a one-stop shop. It's allowing that quick, seamless access for a business user, versus some of the previous versions that IBM had rolled out, where you're going into different user interfaces, right, to find your information, and that can become clunky, it can add to process. It can also create almost a bad taste in people's mouths, because they don't want to navigate from system to system to system just to get their information. So Cloud Pak, to me, definitely brings everything to the table in a one-stop-shop type of environment. Informatica is working on the same thing, though, and I would tell you that they haven't come up with a solution that really comes close to what IBM's done with Cloud Pak for Data. I'd be interested to see if they can bring that over the horizon, but really, IBM's suite of tools allows for profiling, quality analytics, right, metadata management, access to Db2 Warehouse on Cloud. Those are the tools that I've worked to implement in my past, as well as Cloud Object Storage, to bring all of that together to provide that one-stop shop. At Northwestern, right, we're working right now with Collibra. I think Collibra is a great tool, a great governance catalog, right? But that's really what it's truly made for: it's a governance catalog. You have to bring some other pieces to the table in order for it to serve up all that Cloud Pak does today, which is the advanced profiling, the data virtualization that Cloud Pak enables, the machine learning at the level where you can actually work with R and Python code and put your notebooks inside of Cloud Pak. Those are some of the pieces, right, that are missing in some of the other vendors' tools today. Well, so one of the things that you're hearing here is this theme of openness. We've talked about a lot of tools, and not all IBM tools; there are many, and people want to use what they want to use. So, Caitlin, from an IBM perspective, what's your commitment, number one, to openness, but also, number two, we talked a lot about Cloud Paks, to simplifying the experience for your clients? Well, I have to thank Steve and Victoria for speaking to their experience; I really appreciate the feedback. Part of our approach has been to really take on the challenges that we've had ourselves.
I mentioned some of the capabilities that we brought forward in our Cloud Pak for Data product, one being automated metadata generation, and that was something we had to solve for our own data challenges and needs. So we continue to source our use cases from, and ground them in, a practitioner perspective of what we're trying to do, solve, and build. And the approach we've really been taking is a co-creation one, in that we roll these capabilities out in the product, we work with our customers, like Steve, like Victoria, to really solicit feedback on the product, route that back to our dev teams, push updates out, and just be very open and transparent. We want to deliver a seamless experience, we want to do it in partnership, and we'll continue to solicit feedback and improve and roll out. So I think that has been our approach and will continue to be, and I really appreciate the partnerships that we've been able to foster. So, we don't have a ton of time, but I want to go to the two practitioners on the panel and ask you about key performance indicators. When I think about DevOps, one of the things we measure is the elapsed time to deploy an application, start to finish; we're measuring the amount of rework that has to be done, the quality of the deliverable. What are the KPIs, Victoria, that are indicators of success in operationalizing the data pipeline? Oh, I would definitely say your ability to deliver quickly, right? So how fast can you deliver? Is that quicker than what you've been able to do in the past? What is the user experience like, right? Have you measured how long it used to take users to bring information to the table, versus whether you've been able to reduce that time to deliver the right information, business answers to business questions? Those are the key performance indicators that tell me the suite we've put in place today is providing information quickly: I can get my business answers quicker than I could before, and the information is accurate. So, being able to measure: is what I've been given back quality, or is it not? Is it the wrong information, so that I've got to go back to the table and find where I need to gather it from somewhere else? That tells us, okay, you know what, with the tools we've put in place today my teams are working quicker, they're answering the questions they need to accurately, and that's when we know we're on the right path. Steve, anything you would add to that? I think she covered a lot of the key components there, around the data quality scoring, right? For all the different data attributes, coming up with a metric around how to measure that and then showing that trend over time, to show that it's getting better. The other one for us is just around overall data availability: how much data are we providing to our users, and showing that trend. So when I first started, you know, we had somewhere in the neighborhood of 500 files that had been brought into the warehouse and published, with in the neighborhood of a couple of thousand fields available. We've grown that to thousands of tables now available. So it's been hundreds of percent of growth in scale, as far as just the availability of that data: how much is out there, how much is ready and available for people to just dig in, put into their analytics and their models, and get those back into the other applications. That's another key metric that we're starting to track as well.
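The attribute-level quality scoring Steve describes can start very small: a completeness check plus a validity rule per attribute, scored and logged on every run so the trend is visible over time. A minimal sketch in pandas; the rules, weights, and sample data are illustrative only.

```python
import pandas as pd

def quality_score(df: pd.DataFrame, rules: dict) -> dict:
    """Score each attribute 0-100 from completeness and rule validity."""
    scores = {}
    for col, is_valid in rules.items():
        s = df[col]
        completeness = s.notna().mean()
        validity = is_valid(s.dropna()).mean() if s.notna().any() else 0.0
        scores[col] = round(100 * (0.5 * completeness + 0.5 * validity), 1)
    return scores

# Toy sample: one null amount, one negative amount, one lowercase state.
df = pd.DataFrame({"amount": [10.0, None, -3.0],
                   "state": ["WI", "wi", "IL"]})
rules = {
    "amount": lambda s: s > 0,            # business rule: positive amounts
    "state":  lambda s: s.str.isupper(),  # standardized state codes
}
print(quality_score(df, rules))  # log per day and source, then chart the trend
```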
So, last question. I said at the top that every application is going to need to be infused with AI this decade; otherwise that application is not going to be as competitive as it should be. So for those that are maybe stuck in their journey and don't really know where to get started, I'll start with Caitlin, go to Victoria, and then Steve, bring us home. What advice would you give to people that need to get going on this? My advice is to poll the folks that are either producing or accessing your data and figure out where the greatest challenges are. I mentioned some of the data management challenges we were seeing: processes that were taking weeks, prone to error, highly manual. That was ripe for an AI project. So identify those use cases that are really causing the most rework and manual effort; you can move really quickly, and as you build this platform out, you're able to spin those up in an accelerated fashion. I think if you identify that, and figure out the business impact you're able to drive very early on, you can get those going and start really seeing the value. Great. Vic? Yeah, I would actually say Caitlin hit it on the head. What I would add to that, right, is that first and foremost, in my opinion, the important piece here is data governance. You need to implement data governance at an enterprise level. Many organizations will do it, but they'll have silos of governance. You really need an enterprise data governance platform that consists of a true framework: an operating model, charters. You have data domain owners, data domain stewards, data custodians; all of that needs to be defined. And while that may take some work in the beginning, the payoff down the line is that much greater. It's allowing your business to truly own the data, and once they own the data and take part in classifying the data assets for technologists and for analysts, right, you can start to eliminate some of the technical debt that most organizations have accrued today. They can start to look at: what are some of the systems we can turn off? What are the systems we see value in? And truly build out a capability matrix where we start mapping systems, right, to capabilities and start to say, where's the redundancy, right? What can we get rid of? That's the first piece of it. And then the second piece is really leveraging the tools that are out there today, the IBM tools and some of the other tools as well, that enable some of the newer, next-generation capabilities in data and AI, right? For example, allowing for automation, which for all of us means that a lot of the analysts in place today can access the information quicker and deliver it accurately, like we've been talking about, because it's been classified and that pre-work has been done. It's never too late to start, and once you do start, it really acts as a domino effect, where you start to see everything else fall into place.
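Vic's capability matrix can begin as something as simple as a mapping from systems to the capabilities they serve, inverted to reveal redundancy. A toy sketch of that idea; every system and capability name here is invented.

```python
from collections import defaultdict

# Which business capabilities does each system serve?
capability_matrix = {
    "LegacyDW":  {"reporting", "customer_360"},
    "CloudLake": {"reporting", "customer_360", "ml_features"},
    "DeptMart7": {"reporting"},
}

# Invert the matrix: which systems provide each capability?
providers = defaultdict(set)
for system, caps in capability_matrix.items():
    for cap in caps:
        providers[cap].add(system)

# A system is a retirement candidate when everything it does is
# also covered by at least one other system.
for system, caps in capability_matrix.items():
    if all(len(providers[cap]) > 1 for cap in caps):
        print(f"{system}: candidate to turn off; capabilities {sorted(caps)}")
```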
All right, thank you. And Steve, bring us home with some advice for your peers that want to get started. Sure. I think the key for me, too, is, like those guys have talked about, everything they said is valid and accurate. The thing I would add, from a starting perspective, is: if you haven't started, start, right? Don't try to overthink it or over-plan it. Get started, just do something, and start to show that progress and value. The use cases will come, even if you think you're not there yet. It's amazing, once you have the foundational components there, how some of these things start to come out of the woodwork. So get started, get going, have an iterative approach to this and an open mindset. Encourage exploration and enablement. Look your organization in the eye and ask: why are there silos? Why do these things exist? What are our problems? What are the things getting in our way? Then focus on and tackle those areas, as opposed to putting up more rails and more boundaries and encouraging that siloed mentality. Really look at how you focus on enablement. And then the last comment would just be on scale. Everything should be focused on scale. What you think is a one-time process today, you're going to do again. We've all been there. You're going to do it a thousand times, so prepare for that, and start to instill that culture within your organization. Great advice, guys: data, bringing machine intelligence and AI to really drive insights, and scaling with a cloud operating model no matter where that data lives. It's really been great to have three such knowledgeable practitioners. Caitlin, Victoria, and Steve, thanks so much for coming on theCUBE and helping support this panel. Thank you, thank you very much. All right, thank you for watching, everybody. Now, remember, this panel was part of the raw material that went into a CrowdChat that we hosted on May 27th, crowdchat.net/dataops, so go check that out. This is Dave Vellante for theCUBE. Thanks for watching.