From theCUBE Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.

Hi everybody, this is Dave Vellante with theCUBE, and welcome to this special digital presentation. We're really digging into how IBM is operationalizing and automating the AI and data pipeline, not only for its clients but also for itself. With me is Julie Lockner, who looks after offering management in IBM's data and AI portfolio. Julie, great to see you again.

Great to be here, thank you.

Talk a little bit about the role you have here at IBM.

Sure. My responsibility in offering management in the data and AI organization is really twofold. One is I lead a team that implements all of the back-end processes, really the operations behind delivering a product from the data and AI team to the market. So think about release cycle management, pricing, product management discipline, et cetera. The other role I play is making sure that we're working with our customers and that they have the best customer experience. A big part of that is developing the DataOps methodology. It's something I needed internally for my own line-of-business execution, but it's now something our customers are looking to implement in their shops as well.

Good, I really want to get into that. So let's start with DataOps. I think a lot of people are familiar with DevOps; maybe not everybody's familiar with DataOps. What do we need to know about DataOps?

Well, you bring up the point that everyone knows DevOps, and in fact what DataOps really does is bring a lot of the benefits that DevOps brought to application development to data management organizations. DataOps is a set of data management principles that helps organizations bring business-ready data to their consumers quickly. It borrows from DevOps in that every data pipeline is associated with a business value requirement: I have this business initiative, it's going to drive this much revenue or this much cost savings, and this is the data I need to deliver it. How do I develop that pipeline, map to the data sources, know what the data is, and know that I can trust it, ensuring it has the right quality and that I'm using the data for what it was meant for, and then put it to use? Historically, most data management practices deployed a waterfall-like implementation methodology, which meant all of the data pipeline projects were implemented serially, often on a first-in, first-out basis through a program management office. With a DevOps mental model, the idea is to slice through all of the different silos required to collect the data, organize it, integrate it, validate its quality, create the data integration pipelines, and then present the data to the consumer, whether that's a Cognos dashboard, an operational process, or a data science team. That whole end-to-end process gets streamlined through what we're calling the DataOps methodology.
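[To make that end-to-end flow concrete, here is a minimal Python sketch of the stages Julie describes: collect, organize, validate quality, and deliver. The function names, the in-memory source, and the single quality rule are illustrative assumptions, not IBM tooling.]

```python
# A minimal DataOps pipeline sketch (illustrative names, not IBM APIs):
# collect raw data, organize it with metadata, validate quality, deliver.

def collect(sources):
    # Pull raw rows from each mapped data source (here, in-memory lists).
    return [row for src in sources for row in src]

def organize(rows, source_name):
    # Attach metadata so every record carries its origin.
    return [{"data": r, "meta": {"source": source_name}} for r in rows]

def validate(records):
    # Simple quality rule: trust a record only if no field is missing.
    passed = [r for r in records if all(v is not None for v in r["data"].values())]
    flagged = [r for r in records if r not in passed]
    return passed, flagged

def deliver(records, consumer):
    # Publish business-ready data to a dashboard, process, or model.
    consumer(records)

sales = [{"region": "NE", "amount": 100}, {"region": None, "amount": 50}]
trusted, needs_review = validate(organize(collect([sales]), "sales_db"))
deliver(trusted, lambda recs: print(f"published {len(recs)} trusted records"))
print(f"{len(needs_review)} records routed for quality review")
```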
As you well know, we've been following this market since the early days of Hadoop, and people struggle with their data pipelines. It's complicated for them. There's a raft of tools, and they spend most of their time wrangling data, preparing data, and proving data quality across different roles within the organization. So it sounds like, to borrow from DevOps, DataOps is all about streamlining that data pipeline, helping people really understand and communicate across it end to end, as you're saying. But what's the ultimate business outcome that you're trying to drive?

When you think about projects that require data to, again, cut costs, automate a business process, or drive new revenue initiatives, how long does it take to get from having access to the data to making it available? Every delay spent trying to connect to data sources, or trying to find subject matter experts who understand what the data means and can verify its quality, all of those steps across different teams and different disciplines introduce delay in delivering high-quality data fast. So the business value of DataOps is always associated with something the business is trying to achieve, but with a time element. If for every day we don't have this data to make a decision we're either making or losing money, that's the value proposition of DataOps. It's about taking things people are already doing today and figuring out the quickest way to do them, through automation and workflows, and cutting through all of the political barriers that often come up when the data is spread across different organizational boundaries.

Yeah, so speed, time to insight, is critical. But with DevOps you're really bringing the skill sets together into sort of one super dev or one super ops. It sounds like with DataOps it's really more about everybody understanding their role and having communication and line of sight across the entire organization. It's not trying to make everybody a superhuman data person; it's the group, it's the team effort. It's really a team game here, isn't it?

Well, that's a big part of it. Just like any type of practice, there are people aspects, process aspects, and technology aspects, right? People, process, technology. And while you describe it as having that super team that knows everything about the data, the only way that's possible is if you have a common foundation of metadata. We've seen a resurgence in the data catalog market in the last six, seven years, and the innovation in that market has actually enabled us to drive more DataOps pipelines. Meaning, as you identify data assets, you capture the metadata, you capture its meaning, you capture information that can be shared with other stakeholders. The catalog then becomes a central repository for people to very quickly know what data they have, very quickly understand what it means and what its quality is, and very quickly, with the proper authority and privacy rules included, put it to use in models, dashboards, and operational processes.
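[A rough sketch of the kind of metadata a catalog entry holds, per Julie's description: what the data is, what it means, who owns it, its quality, and its privacy tags. The fields are assumptions for illustration, not the Watson Knowledge Catalog schema.]

```python
# Sketch of what a data catalog entry might capture. All fields are
# illustrative assumptions, not a real product schema.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str                  # asset name, e.g. a table or file
    meaning: str               # shared business definition
    owner: str                 # steward / subject matter expert
    quality_score: float       # 0.0-1.0, from automated quality scans
    privacy_tags: list = field(default_factory=list)  # e.g. ["PII"]

entry = CatalogEntry(
    name="customer_orders",
    meaning="Confirmed orders placed through the web channel",
    owner="sales-data-steward@example.com",
    quality_score=0.97,
    privacy_tags=["contains_email"],
)
# Anyone searching the catalog can now see what the data is, whom to ask,
# whether to trust it, and which privacy rules apply before using it.
print(entry)
```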
Okay, and we're going to talk about some examples, one of which of course is IBM's own internal example, but help us understand where you advise clients to start. I want to get into it. Where do I get started?

Yeah, traditionally what we've seen with these large data management and data governance programs is that sometimes our customers feel like this is a big pill to swallow. What we've said is, look, there's an opportunity here to quickly define a small project aligned to a high-value business initiative, target something where you can quickly gain access to the data, map out the pipelines, and create a squad of skills. That includes a person with DevOps-type programming skills to automate and instrument a lot of the technology, a subject matter expert who understands the data sources and their meaning, and a line-of-business executive who can bring that information to the business project and associate it with business value. So when we say, how do you get started? We've developed what I would call a pretty basic maturity model to help organizations figure out where they are in terms of the technology, and where they are organizationally in knowing who the right people are to involve in these projects. And then from a process perspective, we've developed some pretty prescriptive project plans that help you nail down which data elements are critical for the business initiative, and for each role, what their job is to consolidate the data sets, map them together, and present them to the consumer. We find that six-week projects, typically three sprints, are the perfect timeline to create one of these very short, quick-win projects. Take that as an opportunity to figure out where the bottlenecks are in your own organization and where your skill shortages are, then use the outcome of that six-week sprint to fill in the gaps, kick off the next project, and iterate. Celebrate the success and promote it, because it's typically tied to a business value, to help create momentum for the next one.

All right, that's awesome. I want to get into some examples now. I mean, we're both Massachusetts-based. Normally you'd be in our studio and we'd be sitting here face to face. Obviously with COVID-19 and this crisis, we're all sheltering in place. You're up somewhere in New England; I happen to be in my studio, but I'm the only one here. So relate this to COVID. How has DataOps, or maybe you have a concrete example, helped inform, or actually anticipate and keep up to date with, what's happening with COVID?

Yeah, well, we're all experiencing it. I don't think there's a person on the planet who hasn't been impacted by this COVID pandemic crisis. We started down this DataOps journey a year ago; this isn't something we just decided to implement a few weeks ago. We've been working on developing the methodology and getting our own organization in place so that we could respond the next time we needed to act on a data-driven decision. Part of step one of our journey has been working with our global chief data officer, Inderpal, who I believe you've had an opportunity to meet and interview. So part of this year's journey has been working with our corporate organization, while I'm in a line-of-business organization, to establish the roles and responsibilities, and we've established the technology stack based on our Cloud Pak for Data and Watson Knowledge Catalog. So I'll use that as the context. Now we're faced with a pandemic crisis, and I'm being asked in my business unit to respond very quickly: how can we prioritize the offerings that are going to help those in critical need so that we can get those products out to market, and offer 90-day free use for governments and hospital agencies? In order for me to do that as the operations lead for our team, I needed access to our financial data, I needed access to our product portfolio information, and I needed to understand our cloud capacity.
So in order for me to respond with the offers that we recently announced, and you can take a look at some of the examples with our Watson Assistant for Citizens program, I was able to provide the financial information required for us to make those products available to governments, hospitals, state agencies, et cetera. That's a perfect example. Now to set the stage, back in the corporate global chief data office organization, they implemented technology that allowed us to ingest data, automatically classify it, automatically assign metadata, and automatically associate data quality, so that when my team started using that data, we knew the status of that information as we started to build our own predictive models. That's a great example of how we partnered with the corporate central organization and took advantage of an automated set of capabilities, without having to invest in any additional resources or headcount, and were able to release products within a matter of a couple of weeks.

And that automation is a function of machine intelligence, is that right? And obviously some experience. But you and I, when we were consultants doing this by hand, we couldn't have done this, certainly not at scale. Is it machine intelligence and AI that allows us to do this?

That's exactly right. And as you know, our organization is Data and AI, so we happen to have the research and innovation teams that are building a lot of this technology, so we have somewhat of an advantage there. But you're right: the alternative to what I've described is manual spreadsheets. It's querying databases. It's sending emails to subject matter experts asking them what the data means, and if they're out sick or on vacation, you have to wait for them to come back. All of this was a manual process. In the last five years, we've seen the data catalog market really become an augmented data catalog market, and that augmentation means automation through AI. With years of experience and natural language understanding, we can comb through a lot of the metadata that's available electronically, and we can comb through unstructured data and categorize it. And if you have a set of business terms with industry-standard definitions, through machine learning we can automate in a matter of seconds what you and I did manually as consultants. That's the impact AI has had on our organization, and now we're bringing it to the market. It's a big part of where I'm investing my time, both internally and externally: bringing these types of concepts and ideas to the market.
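[Here is a hedged sketch of the automated classification Julie describes. A production augmented catalog would use trained machine learning and natural language understanding models; hypothetical regex classifiers stand in for them here to show the shape of the workflow.]

```python
# Sketch of automated metadata assignment: classify a column's values
# against business terms and report a confidence score. Real augmented
# catalogs use trained ML/NLU models; simple regexes stand in here.
import re

CLASSIFIERS = {
    "email_address": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "phone_number":  re.compile(r"^\+?[\d\s().-]{7,15}$"),
}

def classify_column(values):
    # Return the best-matching business term and its confidence.
    best_term, best_conf = "unknown", 0.0
    for term, pattern in CLASSIFIERS.items():
        confidence = sum(1 for v in values if pattern.match(v)) / len(values)
        if confidence > best_conf:
            best_term, best_conf = term, confidence
    return best_term, best_conf

column = ["dave@example.com", "julie@example.com", "n/a"]
print(classify_column(column))  # ('email_address', 0.666...)
```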
So I'm hearing, first of all, that you've got multiple data sources and data that lives everywhere. You might have your supply chain data in your ERP system, and maybe that sits on-prem. You might have some sales data sitting in a SaaS application or in a cloud somewhere. You might have weather data that you want to bring in. In theory, anyway, the more data you have, the better the insights you can gather, assuming you've got the right data quality. So let me start with where the data is. It's anywhere; you don't know where it's going to be, but you know you need it. So part of this is being able to get to the data quickly.

Yeah, it's funny you bring it up that way; I actually look at it a little differently. When you start these projects, the data is in one place, and then by the time you get to the end of the project, you find out it's been moved to the cloud. The data location actually changes while we're in the middle of projects. Even during this pandemic crisis, many organizations are using this as an opportunity to move to SaaS. So what was on-prem is now cloud, but that shouldn't change the definition of the data, and it shouldn't change its meaning. It might change how you connect to it. It might also change your security policies or the privacy laws that apply; now all of a sudden you have to worry about where that data is physically located and whether you're allowed to share it across national boundaries, whereas before, we knew physically where it was. So when you think about DataOps, DataOps is a process that sits on top of where the data physically resides. And because we're mapping metadata and we're looking at these data pipelines and automated workflows, part of the design principle is to set it up so that it's independent of where the data resides. However, you have to have placeholders in your metadata and in your tool chain for automating these workflows, so that you can accommodate the data moving when a corporate policy changes from on-prem to cloud. That's a big part of what DataOps offers. It's the same thing, by the way, for DevOps: they've had to accommodate building on platform-as-a-service as well as on-prem development environments. It's the same for DataOps.

And the other part that strikes me in listening to you is scale. It's not just about scale with the cloud operating model; it's also about what you were talking about, the auto-classification, the automated metadata. You can't do that manually. You've got to be able to do that with automation in order to scale. That's another key part of DataOps, is it not?

Well, it's a big part of the value proposition and a big part of the business case, right? When you and I started in this business and big data became the thing, people just moved all sorts of data sets to these Hadoop clusters without capturing the metadata. As a result, over the last ten years this information is out there, but nobody knows what it means anymore. You can't go back with an army of people and have them query these data sets, because a lot of the context was lost. But you can use automated technology; you can use automated machine learning with natural language understanding to do a lot of the heavy lifting for you. And a big part of DataOps workflows and building these pipelines is to do what we call management by exception. If your algorithm says it's 80% confident that this is a phone number and your organization has a low risk tolerance, that will probably go to an exception. But if you have a match algorithm that comes back and says it's 99% sure this is an email address and your threshold is 98%, it will automate much of the work we used to have to do manually. So that's an example of how you can automate, eliminate manual work, and have human interaction based on your risk threshold.
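[A minimal sketch of the management-by-exception rule Julie just walked through, using her 98% threshold example; the function name and threshold value are hypothetical. In practice the threshold is a policy decision tied to the organization's risk tolerance, exactly as she frames it.]

```python
# Sketch of management by exception: classifications at or above the
# organization's risk threshold are accepted automatically; anything
# below it goes to a human review queue. The threshold is illustrative.

RISK_THRESHOLD = 0.98  # a low-risk-tolerance organization sets this high

def route(term, confidence):
    if confidence >= RISK_THRESHOLD:
        return f"auto-accept: tagged as {term} ({confidence:.0%} confident)"
    return f"exception: {term} at {confidence:.0%} goes to human review"

print(route("email_address", 0.99))  # above threshold, automated
print(route("phone_number", 0.80))   # below threshold, reviewed by an SME
```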
Yeah, that's awesome. I mean, you're right: the "no schema on write" idea, oh, I'll just throw it into a data lake, and the data lake becomes a data swamp. We all know that joke. Okay, I want to understand a little bit, and maybe you have some other examples of the use cases here, but just some of the maturity of where customers are. It seems like you've got to start by just understanding what data you have, cataloging it, and getting your metadata act in order. But then you've got a data quality component before you can actually implement and get to insight. So where are customers on the maturity model? Do you have any other examples you can share?

Yeah, so when we look at our DataOps maturity model, we tried to simplify it, and I mentioned this earlier, so that really anybody can get started. They don't have to have a full governance framework implemented to take advantage of the benefits DataOps delivers. What we said is, you can categorize your DataOps programs into really three things. One is, how well do you know your data? Do you even know what data you have? The second is, can you trust it? Can you trust its quality, can you trust its meaning? And the third is, can you put it to use? So when you begin with what data you know, the first step is: how are you determining what data you have? If you're using spreadsheets, replace them with a data catalog. If you have a departmental, line-of-business catalog and you need to start sharing information with other departments, then start expanding to an enterprise-level data catalog. Now, you mentioned data quality. The first step there is, do you even have a data quality program? Have you established what your criteria are for high-quality data? Have you considered what your data quality score is composed of? Have you mapped out the critical data elements required to run your business? Most companies have done that for their governed processes, but for these new initiatives, and in my example with the COVID crisis, which products are we going to help bring to market quickly, I need to be able to find out what the critical data elements are and whether I can trust them. Have I even done a quality scan, and have teams commented on the data's trustworthiness for this use case? If you haven't done anything like that in your organization, that might be the first place to start: pick the critical data elements for the initiative, assess their quality, and then start to implement the workflows to remediate.
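[As a rough illustration of the data quality score Julie mentions, here is a minimal sketch that scores completeness over a set of critical data elements; the element names and records are hypothetical.]

```python
# Sketch of a simple data quality score over critical data elements
# (CDEs): the share of records where every CDE is present. Real quality
# programs weight more dimensions (validity, consistency, timeliness).

CDES = ["customer_id", "order_date", "amount"]  # illustrative CDEs

def is_complete(record):
    return all(record.get(cde) not in (None, "") for cde in CDES)

def quality_score(records):
    return sum(1 for r in records if is_complete(r)) / len(records)

orders = [
    {"customer_id": 1, "order_date": "2020-04-01", "amount": 250.0},
    {"customer_id": 2, "order_date": "",           "amount": 99.0},
    {"customer_id": 3, "order_date": "2020-04-02", "amount": 40.0},
]
print(f"quality score: {quality_score(orders):.0%}")  # 67%
```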
And then when you get to putting it to use, there are several methods for making data available. One is simply making a data mart available to a small set of users; that's what most people do. Well, first they make a spreadsheet of the data available, but then if multiple people need to access it, that's when something like a data mart might make sense. Technology like data virtualization eliminates the need for you to move data while you're in the prototyping phase, and that's a great way to get started. It doesn't cost a lot of money to set up a virtual query to see if this is the right join or the right combination of fields for the use case. Eventually you'll need a high-performance ETL tool for data integration. But nirvana is when you really get to self-service data prep, where users can query a catalog and say, these are the data sets I need; it presents a list of available data assets; I can point and click at the columns I want as part of my data pipeline; and I hit go, and it automatically generates that output for data science use cases or a Cognos dashboard. That's the most mature model, and being able to iterate so quickly that as soon as you get feedback that the data elements are wrong, or that you need to add something, you can do it at the push of a button. That's where the DataOps journey should bring organizations.

Well, Julie, I think there's no question that this COVID crisis has accentuated the importance of digital. We talk about digital transformation a lot, and it's certainly real, although I would say a lot of the people we talk to will say, well, not on my watch, or I'll be retired before that all happens. Well, this crisis is accelerating that transformation, and data is at the heart of it. Digital means data, and if you don't have your data story together and your act together, you're not going to be able to compete. And DataOps really is a key aspect of that. So give us a parting word.

Sure. I think this is a great opportunity for us to really assess how well we're leveraging data to make strategic decisions. And there hasn't been a more pressing time to do it than when our entire engagement becomes virtual. This interview is virtual, right? Everything now creates a digital footprint that we can leverage to understand where our customers are having problems and where they're having successes. Let's use the data that's available, and use DataOps to make sure that we can iterate, access that data, know it, trust it, and put it to use, so that we can respond to those in need when they need it.

Julie Lockner, you're an incredible practitioner, really hands-on. We really appreciate you coming on theCUBE and sharing your knowledge with us. Thank you.

Thank you very much, Dave. It was a pleasure to be here.

All right, and thank you for watching, everybody. This is Dave Vellante for theCUBE, and we will see you next time.