From around the globe, it's theCUBE, covering Data Citizens 21, brought to you by Collibra. For the past decade, organizations have been effecting very deliberate data strategies and investing quite heavily in people, processes and technology specifically designed to gain insights from data, better serve customers and drive new revenue streams. We've heard this before. The results, quite frankly, have been mixed, as much of the effort has focused on analytics and technology designed to create a single version of the truth, which in many cases continues to be elusive. Moreover, the world of data is changing. Data is increasingly distributed, making collaboration and governance more challenging, especially where operational use cases are a priority. Hello, everyone, my name is Dave Vellante and you're watching theCUBE's coverage of Data Citizens 21. We're pleased to welcome Michele Goetz, who's the Vice President and Principal Analyst at Forrester Research. Hello, Michele, welcome to theCUBE.

Hi, Dave, thanks for having me today.

That's our pleasure. So I want to start here: you serve a wide range of roles, including enterprise architects, CDOs (chief data officers, that is), analysts and many other data-related functions. My first question is, what are they thinking about today? What's on the minds of these data experts?

So there are actually two things happening. One is, what is the demand that's placed on data by our new intelligent digital systems? We're seeing a lot of investment and interest in things like edge computing, and then in how that intersects with artificial intelligence to really run your business intelligently and drive new value propositions, so that you're both adaptive to the market and resilient to unforeseen changes. The second thing is that this creates massive complexity in managing the data, governing the data and orchestrating the data, because it's not just a centralized data warehouse environment anymore.
You have a highly diverse and distributed landscape, some of which you control internally, while you're also taking advantage of third-party information. So the struggle then becomes: how do you trust the data? How do you govern it and secure or protect it? And how do you ensure that it's hyper-contextualized to the types of value propositions that our intelligent systems are going to serve?

Well, I think you're hitting on the key issues here. You're right, the data is sort of out there: it's distributed, it's at the edge. But generally our data organizations are actually quite centralized. And you talk about the need to trust the data; obviously that's crucial. But are you seeing the organization change? I know you talk to clients about this, and you've been discussing collaboration. How are you seeing that change?

Yeah, so as you have to bring data into the context of the insights you're trying to get, or of the intelligence that's automating and scaling out the value streams and outcomes within your business, we're actually seeing a federated model emerge in organizations. There's still a centralized data management and data services organization, typically led by enterprise architects for data, with a data engineering team that's managing warehouses and data lakes. They're creating this great platform to access and orchestrate information. But we're also seeing data, analytics and governance teams come together under chief data officers or chief data and analytics officers. This is really where the insights are being generated, from either BI and analytics or from data science itself, with dedicated data engineers and stewards who help access and prepare data for analytic efforts. And then lastly, and this is the really interesting part, when you push data to the edge, the goal is that you're actually driving an experience and an application.
And so in that case, we're seeing data engineering teams start to be incorporated into the solution teams that are aligned to lines of business or divisions themselves. So if there's a solution consultant who is also overseeing value-based portfolio management, then when you need to instrument the data for these new use cases and keep up with the pace of the business, it's this engineering team, as part of the DevOps workbench, that executes on that. So really the balance is: we need the core, we need to get to the insights and build our models for AI. And then the next piece is how you activate all of that, and there's a team over there to help. So it's really spreading the wealth and expertise to where it needs to go.

Yeah, I love that. A couple of things really resonated with me. You talked about context a couple of times, and about this notion of a federated model, because historically, with the big data architecture, the team didn't have the business context, and my inference is that's changing. I think that's critical. Your talk at Data Citizens is called "How Obsessive Collaboration Fuels Scalable Data Ops." You talked about the DevOps team. What's the premise you put forth to the audience?

So the point about obsessive collaboration is taking the hubris out of your expertise on the data. Certainly there's a recognition by data professionals that the business understands and owns its data. They know the semantics, they know the context of it, and it was assumed to be okay just to receive the requirements on that. Then you could provide a data foundation, whether it's just the lake or a warehouse environment that you're pulling from for your analytics.
The reality is that as we move into more of an AI and machine learning type of model, one, more context is necessary, and you're balancing between the things you can ascribe to the data globally, which is what data engineers can support, and what is unique about the data and its context as it relates to the business value and outcome, as well as the feature engineering being done for the machine learning models. So there has to be a really tight link and collaboration between the data engineers, the data scientists and analysts, and the business stakeholders themselves. You see a lot of pods starting up that way to build the intelligence within the system. And then lastly, what do you do with that model? What do you do with that data? What do you do with that insight? You now have to shift your collaboration over to the workbench that is going to pull all these components together to create the experiences and the automation you're looking for. That requires a different collaboration model around software development, one that still incorporates the business expertise of those stakeholders, so that you're satisfying not only the quality of the code that runs the solution but also the quality of the outcome, meeting the expectations and the time to value your stakeholders have. So data teams aren't just sitting in the basement or in another part of the organization, digitally disconnected, anymore. They're finding that they have to work much more closely, side by side with their colleagues and stakeholders.

You know, I think it's clear that you understand this space really well. Hubris out, context in. I mean, that's kind of what's been lacking. And I'm glad you used the word "anymore," because I think it's a recognition that that's what it was. They were down in the basement or out in some kind of silo.
And I want to ask you about this. I keep coming back to organization, because I think a lot of organizations have figured that the most cost-effective way for them to serve the business is to have a single data team with hyper-specialized roles: that'll be the cheapest, most efficient way to serve them. Meanwhile, the business, which as you pointed out has the context, is frustrated. They can't get to the data. So this notion of a federated governance model is actually quite interesting. Are you seeing actual common use cases where this is being operationalized?

Absolutely. I think the first place you were seeing it was within operational technology use cases: manufacturing, industrial devices, any sort of IoT-based use case. Those teams really recognized that without applying data and intelligence to whatever process was going to be executed, it was going to be challenging to know that they were creating the right foundation, meeting the SLA requirements and ultimately bringing the right quality and integrity to the data, let alone handling any data protection and regulatory compliance that is required. So you already started seeing solution teams come together with the data engineers, solution developers, analysts, data scientists and business stakeholders to drive that. But that's starting to come back into the IT mindset as well. And so data ops starts to emerge from that paradigm into more corporate types of use cases that parallel it, because there are customer experience use cases that have an IoT or edge component to them. We live on our smartphones, we live on our smartwatches, we've got our laptops, and all of us have been pushed into virtual collaboration. So we really need to take into account not just the insight from analytics, but how you feed that forward.
And so this is really where you're seeing the evolution of data ops as a competency, not only to engineer the data and collaborate, but to ensure there's an activation and alignment where the value is going to come out, while the data is still trusted and governed.

I've got kind of a weird question. I was talking to somebody in Israel the other day, and they told me the masks are off and the economy's booming. He noted that Israel said, hey, we're going to pay up for the vaccine, whatever the cost per dose, 28 bucks or whatever it was. And he pointed out that the EU haggled big time, saying we don't want to pay $19, and as a result they're not as far along. Israel understood that the real value was opening up the economy. There's an analogy here, which brings me back to organization, and it relates to data ops: if the real metric is, hey, I have an idea for a data product, how long does it take to go from idea to monetization? That seems to me to be a better KPI than how much storage I have or how many petabytes I'm managing. So my question, as it relates to data ops: should the data ops individual, and maybe even the data engineer, live inside the business? And is that even feasible technically with this notion of federated governance? Are you seeing that? And maybe talk a little bit more about this data ops role. Is it fungible?

Yeah, it's definitely fungible. In fact, when I talked about those three units, there's your core enterprise services and data services, there's your BI and data, and then there's your line of business. In all of those, the engineering and the ops, that is, the data ops, is living in each of those environments, as close as possible to where the value proposition is being defined and designed. So absolutely, you can federate that.
And I think the other piece on data ops that's really important is recognizing how the practices around continuous integration and continuous deployment, using agile methodologies, are really reshaping a lot of the waterfall approaches used before, where data was lagging 12 to 18 months behind any sort of insight. A lot of the platforms today assume that you're moving to a standard, mature software development life cycle. You can start seeing return on investment within a quarter, really, and then iterate and speed that up so that you're delivering new value every two weeks. But it does change the mindset. This data ops team, aligned to solution development and to a broader portfolio management of business capabilities and outcomes, needs to understand how to appropriately scope the data products it's delivering into incremental, value-based milestones, so the business feels that it's getting improvements over time and not just waiting. So there's an MVP; you move forward on that and optimize, optimize, extend, scale. Again, that CI/CD mindset helps you avoid bottlenecks and not wait for the complete field of dreams to come from your data and your insights.

Thank you for that, Michele. I want to come back to this idea of collaboration, because over the last decade we've seen attempts, I've seen software come out, to try to help the various roles collaborate. Some of it's been okay, but you have these hyper-specialized roles. You've got data scientists, data engineers, quality engineers, analysts, et cetera, and they tend to be in their own little worlds, but at the end of the day we rely on them all to get answers. So how can these data scientists and all these stewards collaborate better? What are you seeing there?

You need to get them onto the same process. That's really what it comes down to.
If you're working from different points of view, that's one thing, but if you're working from different processes, collaborating is really challenging. And I think the one thing that's really come out of this move to machine learning and AI is the recognition that you need processes that reinforce collaboration. So that's number one. You see agile development and CI/CD not just for data ops, not just for DevOps, but also encouraging and propelling projects and iterations for the data science teams, or for machine learning engineers if they're incorporated. And then the business stakeholders are inserted as appropriate to accept what's going to be developed. So process is number one. Number two is the platform that's going to reinforce those processes and that collaboration, and it's really about what's being shared and how you share. What we're seeing within the platforms themselves is everybody contributing to some sort of library where their components and products are published, and that helps different teams grab those components and build out their solutions. And what gets really cool about that is that you don't always need hardcore data scientists anymore, because you have this social platform for product development: data products and analytic products. This is where a lot of AutoML begins, because those who are less data science oriented but can build an insight pipeline can grab all the different components, from the pipelines to the transformations to the capture mechanisms, bolt them into the model itself and deliver that to the application. So it's really balancing process and platforms that enable, encourage and almost force you to collaborate and manage through sharing.

Thank you for that. I want to ask you about the role of data governance.
You've mentioned trust, and that's data quality, and you've got teams and specialists focused on data quality. There's the data catalog. Here's my question. You mentioned the edge a couple of times, and I can see a lot of that. Today, I would say most AI is modeling. In the future, as you mentioned, with the edge, it's going to be a lot of real-time inferencing, and people may not have the time, or even be involved, in that decision. So what are you seeing in terms of data governance? We talked about federated governance, this notion of a data catalog, and maybe automating data quality without it being so labor intensive. What trends are you seeing there?

Yeah, so I think our new environment, our new normal, is that you have to be composable, interoperable and portable. Portability is really the key here. From a cataloging and governance perspective, we used to bring everything together into our catalogs and business glossaries, and that would be a reference point, like a massive wiki. Well, that's wonderful, but why just house it in a museum? You really want to activate it. What's interesting about today's governance technologies is that you can turn those rules, business logic and policies into services that are composable components, and bring those into the solutions you're defining. That creates portability: you can drive them wherever they need to go. And from the composability and interoperability side, you can put those services in the right place at the right time for the outcome you need. So you start to become behaviorally driven in executing governance, rather than trying to write all of the governance down into transformations and controls where the data lives. You can have quality, and observability of that quality and performance, right at the edge, in the context of the behavior and use of that solution.
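To make the idea of governance rules as composable, portable services concrete, here is a minimal sketch in plain Python. It assumes ordinary functions stand in for a governance platform: `Rule`, `compose`, `not_null`, `in_range`, `run_at`, the field names and the thresholds are all hypothetical and are not any vendor's API. It shows one policy defined once and executed unchanged at the edge, on a gateway and during sync to the lake.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A named, self-contained governance or quality check (hypothetical)."""
    name: str
    check: Callable[[Record], bool]


def not_null(field: str) -> Rule:
    # Passes when the field is present and not None.
    return Rule(f"{field}_not_null", lambda r: r.get(field) is not None)


def in_range(field: str, lo: float, hi: float) -> Rule:
    # Passes when the field is present and lies within [lo, hi].
    return Rule(
        f"{field}_in_range",
        lambda r: r.get(field) is not None and lo <= r[field] <= hi,
    )


def compose(*rules: Rule) -> Callable[[Record], List[str]]:
    """Bundle rules into one portable service returning the names of failed rules."""
    def service(record: Record) -> List[str]:
        return [rule.name for rule in rules if not rule.check(record)]
    return service


# One policy definition (hypothetical sensor fields and limits).
sensor_policy = compose(not_null("device_id"), in_range("temp_c", -40.0, 125.0))


def run_at(location: str, record: Record, policy: Callable[[Record], List[str]]) -> Record:
    """Annotate a record with where the policy ran and which rules failed."""
    return {**record, "_checked_at": location, "_violations": policy(record)}


reading = {"device_id": "pump-7", "temp_c": 180.0}
# The same policy object is deployed unchanged to each execution point.
for loc in ("edge", "gateway", "lake_sync"):
    print(run_at(loc, reading, sensor_policy))
```

The design choice being illustrated is that the policy is data plus small functions, so it can travel with the solution instead of being hard-coded into one transformation where the data lives.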
You can run those governance services on gateways that are managing and routing information for those solutions, and where synchronization between the edge and the cloud comes up. And if it's appropriate, during synchronization of the data back into the data lake, you can run those services there. So there's a lot more flexibility and elasticity in today's modern approaches to cataloging, glossaries and data governance than we had before. That goes back to what we talked about earlier: this is the new wave of data ops. This is how you bring data products to fruition now. Everything is about activation.

So how do you see the future of data ops? I've kind of been pushing you toward a more decentralized model where the business has more control, because the business has the context. I feel as though we've done a great job of contextualizing our operational systems. The sales team knows when the data within my CRM is crap. But our data systems are generally context agnostic, and you obviously understand that problem well. So how do you see the future of data ops?

I think what's interesting about that is that we're going to move more toward governance on read versus governance on write. What do I mean by that? From a business perspective, there are two sides to it. First, ensuring that governance runs, as we talked about before, at the appropriate place and the appropriate time, and that it's semantically and domain-centric driven, not logical and system-centric. So that's number one. Number two is recognizing that business owners or business operations actually play a role in this, because as you're working within CRM systems like Salesforce, for example, you're using an iPaaS environment like MuleSoft to connect to other applications, other data sources and other analytic sources.
What's happening there is that the data is being modeled and personalized to whatever view, insight or task has to happen within those processes. So even CRM environments, which we think of as the traditional technologies we're used to, are getting a lift, both in terms of intelligence from the data and in your flexibility in how you execute governance and quality services within that environment. That actually opens up the data foundations a lot more and saves you from having to do a lot of moving, copying and centralizing of data, and from creating an overweighted business application, both in terms of the data foundation and in terms of the business services, status updates and processes that happen in the application itself. You're drawing those tasks back down to where they should be and where performance can be managed, rather than trying to over-customize your application environment. That also gives you a lot more flexibility later for any upgrades or migrations you want to make, because all of the logic is contained in a service layer instead.

Great perspective, Michele. You obviously know your stuff, and it's been a pleasure having you on. My last question: when you look out there, is there anything that really excites you, or any specific research you're working on that you want to share and that you're super pumped about?

I think there are two things. One is the truly incredible amount of insight and growth coming through data profiling and observation: really understanding and contextualizing data anomalies so that you understand whether data is helping or hurting the business value, and tying that very specifically to processes and metrics, which is fantastic, as well as to the models themselves, really understanding how data inputs and outputs make a difference in whether the model performs or not.
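As a small illustration of the profiling point, here is a minimal sketch of flagging anomalies in a data quality metric against its historical profile with a simple z-score test. The metric (a daily null rate), the threshold of 3 and the function names `profile` and `anomalies` are assumptions for illustration, not a specific product's method. It covers only the detection step; tying anomalies to processes, business metrics or model performance is beyond the sketch.

```python
import statistics
from typing import List, Sequence, Tuple


def profile(values: Sequence[float]) -> Tuple[float, float]:
    """Baseline profile of a metric: mean and sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)


def anomalies(history: Sequence[float], new: Sequence[float], z: float = 3.0) -> List[int]:
    """Return indexes in `new` whose z-score against the historical profile exceeds `z`."""
    mean, sd = profile(history)
    if sd == 0:
        # A perfectly flat history: treat any deviation from it as anomalous.
        return [i for i, v in enumerate(new) if v != mean]
    return [i for i, v in enumerate(new) if abs(v - mean) / sd > z]


# Hypothetical daily null rate (%) of a customer-email column.
history = [1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1]
today = [1.0, 6.5]
print(anomalies(history, today))
```

A z-score against a rolling baseline is about the simplest profiling check there is; real observability tooling typically adds seasonality and per-segment baselines, which this sketch deliberately leaves out.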
And then I think the second thing is really the emergence of more active data and active insights, as we talked about before: your ability to package up services, for governance and quality in particular, that allow you to scale your data out toward the edge or wherever it's needed, and to do so not just to run analytics but to drive overall processes and value. So the research around the operationalization and activation of data is really exciting, and looking at networks and service meshes to bring those things together is where I'm focusing right now. Because what's the point of having data in a database if it's not providing any value?

Michele Goetz, Forrester Research. Thanks so much for coming on theCUBE. Really awesome perspectives. You're in an exciting space, so I appreciate your time.

Absolutely, thank you.

And thank you for watching Data Citizens 21 on theCUBE. My name is Dave Vellante.