From around the globe, it's theCUBE covering Data Citizens '21, brought to you by Collibra.

We're back talking all things data at Data Citizens '21. My name is Dave Vellante and you're watching theCUBE's continuous virtual coverage, hashtag DataCitizens21. I'm here with Jim Cushman, Collibra's chief product officer, who shared the company's product vision at the event. Jim, welcome, good to see you.

Thanks, Dave. Glad to be here.

Now, one of the themes of your session was all around self-service and access to data. This is a big, pointed discussion among the organizations we talk to. I wonder if you could speak a little more to what that means for Collibra and your customers, and maybe some of the challenges of getting there.

So Dave, our ultimate goal at Collibra has always been to enable self-service access to data for all customers. One of the challenges is that these knowledge workers are limited in how they can access information, so our goal is to totally liberate them. Why is this important? Well, in and of itself, self-service liberates tens of millions of data-literate knowledge workers. It will drive more rapid and insightful decision-making, and it will drive productivity and competitiveness. And to make this level of adoption possible, the user experience has to be as intuitive as retail shopping, like I mentioned in my previous bit, like buying shoes online.

But that's a little bit of foreshadowing, because there's an even more profound future than just enabling self-service. We believe a new class of shopper is coming online, and she may not be as data-literate as our knowledge worker of today. Think of her as an algorithm developer: she builds machine learning or AI. The engagement model for this user will be to build automation, personalized experiences for people to engage with data. But in order to build that automation, she too needs data. Because she's not data-literate, she needs the equivalent of a personal shopper, someone who can guide her through the experience without her having to know all the answers to the questions that would be asked. So this level of self-service goes one step further and becomes an automated service, one that helps find the best, unbiased and related training data to train an algorithm of the future. So-

Oh, go ahead, please, continue.

So all of this self-service and automated service needs to be complemented with the peace of mind that you're letting the right people gain access to it. When you automate it, the question becomes, well, are the right people getting access to this? So it has to be governed and secure. This can't become the wild, wild West, or a data flea market where data is everywhere. History quickly forgets the companies that do not adjust to remain relevant, and I think we're in the midst of an exponential differentiation, and the Collibra Data Intelligence Cloud is really established to be the key catalyst for the companies that will be on the winning side.

Well, that's big, because I'm a big believer in putting data in the hands of those folks in the lines of business. And of course, the big question that always comes up is, well, what about governance? What about security? So to the extent that you can federate that, that's huge, because data is distributed by its very nature. It's going to stay that way. It's complex.
You have to make the technology work in that complex environment, which brings me to this idea of low-code or no-code. It's gaining a lot of momentum in the industry. Everybody's talking about it, but there are a lot of questions. What can you actually expect from no-code and low-code? Who are the right potential users? Is there a difference between low and no? So from your standpoint, why is this getting so much attention, and why now, Jim?

You know, to go back even 25 years, we were talking about the fourth- and fifth-generation languages people were building. And they really didn't reach the total value folks were looking for, because they always fell short. You'd say, listen, if you didn't do all the work it took to get to a certain point, how are you possibly going to finish it? And that's where the 4GLs and 5GLs fell short as a capability.

With our software, if you really want increased self-service, how are you going to be self-service if it still requires somebody to write code? Well, I guess you could do it if the only self-service users are people who write code, but that's not very effective. So if you truly want the ability to have something show up at your front door without you having to call somebody or make any effort to get it, then it needs to generate itself.

The beauty of doing cataloging and governance is understanding all the data that is available for choice and giving someone a selection based on objective criteria: this is the best choice, objectively, because of its quality for what you want, or its label, or because it's unbiased. It has that level of deterministic value to it, versus guessing, or subjectivity, or what my neighbor used, or what I used on my last job. Now that you've given people the power to say with confidence, this is the one that I want, the next step is, okay, can you deliver it to them without them having to write any code? So imagine being able to generate those instructions from everything we have in our metadata repository, saying, this is exactly the data I need you to go get, performing what's called a distributed query against those data sets and bringing it back to them, with no code written.

And here's the real beauty, Dave. Data pipeline development is a relatively expensive thing today, and that's why people spend a lot of money maintaining these pipelines. But imagine if there was zero cost to building your pipeline. Would you spend any money to maintain it? Probably not. So if we can build it for no cost, then why maintain it? Just build it every time you need it, and again, on a self-service basis.

I really like the way you're thinking about this, because you're right. A lot of times when you hear self-service, it's about making the hardcore developers able to do self-service. But the reality is, and you talk about that data pipeline, it's complex. The business person is sitting there waiting for data, wants to put in new data, and it turns out that the smallest unit of work is actually that entire team, so you sit back and wait. To the extent that you can actually enable self-service for the business through simplification, that's been the holy grail for a while, isn't it?

But let's dig a little bit into where you're placing your bets. I mean, you're head of products. You've got to make bets, certainly many months, if not years, in advance. What are your big focus areas of investment right now?

Yeah, certainly.
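To make the no-code idea above concrete, here is a minimal sketch of metadata-driven query generation of the kind Jim describes: pick the objectively best-quality dataset from a catalog, then generate the query rather than hand-write it. Everything in it, the catalog structure, the quality score, and the function names, is an assumption for illustration, not Collibra's actual data model or API.

```python
# Minimal sketch: metadata-driven, no-code query generation (illustrative only).
from dataclasses import dataclass


@dataclass
class CatalogAsset:
    name: str             # physical table, e.g. "crm.public.customers"
    system: str           # source system the table lives in
    columns: dict         # business term -> physical column name
    quality_score: float  # objective quality metric from profiling


def pick_best_asset(catalog, business_terms):
    """Choose the highest-quality asset that covers every requested term."""
    candidates = [a for a in catalog
                  if all(t in a.columns for t in business_terms)]
    if not candidates:
        raise LookupError(f"no asset covers {business_terms}")
    return max(candidates, key=lambda a: a.quality_score)


def generate_query(asset, business_terms):
    """Generate the query text instead of having anyone hand-write it."""
    select_list = ", ".join(f"{asset.columns[t]} AS {t}" for t in business_terms)
    return f"SELECT {select_list} FROM {asset.name}"


# "Build the pipeline every time you need it" rather than maintaining it:
catalog = [
    CatalogAsset("crm.public.customers", "crm",
                 {"customer_id": "cust_id", "email": "email_addr"}, 0.92),
    CatalogAsset("legacy.dbo.cust_master", "warehouse",
                 {"customer_id": "id", "email": "mail"}, 0.71),
]
asset = pick_best_asset(catalog, ["customer_id", "email"])
print(asset.system, "->", generate_query(asset, ["customer_id", "email"]))
```

Because the query is regenerated from the catalog on every run, there is nothing to maintain when a source or a quality score changes, which is the "build it every time you need it" point.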
So one of the things we've done very successfully since our origin over a decade ago is build business-user-friendly software in what was predominantly a plumbing or infrastructure area. Business users love working with our software. They can find what they're looking for without needing some cryptic key to work with it. They can think about things in their own terms, use our business glossary, navigate through what we call our data intelligence graph, and find just what they're looking for. And we don't require the business to change everything just to make that happen; we give them a kind of universal translator to talk to the data.

But with all that wonderful usability, the common compromise you make is that it's only good up to a certain amount of information. It's kind of like Excel: you can do almost anything with Excel, but when you get into large volumes it becomes problematic, and then you need a hardcore database with an application on top. What the industry is pulling us toward is far greater amounts of data, not just millions or even tens of millions, but hundreds of millions and billions of things we need to manage. So we have a huge focus on scale and performance on a global basis. And that's a mouthful, right? Not only are you dealing with large amounts of data at performance, but you have to do it in a global fashion and make it possible for somebody operating in Southeast Asia to have the same experience as somebody in Los Angeles. The data therefore needs to go to the user, as much as possible, rather than the user coming to the data. So it really puts a lot of emphasis on what we call the non-functional requirements, also known as the "ilities." Our ability to handle those large, enterprise-grade workloads at scale and performance, globally, is what's driving a good number of our investments today.

I want to talk about data quality. This is a hard topic, but it's one that's so important, and I think it's been really challenging and somewhat misunderstood. You think about the chief data officer role itself: it emerged from highly regulated industries, out of the back-office data quality role, and it's now gone front and center and become pretty strategic. Having said that, the prevailing philosophy is, okay, we have to have this centralized data quality approach that then gets imposed throughout. And it really is a hard problem. I think about these hyper-specialized roles like the quality engineer, and again, the prevailing wisdom is that if I centralize that, it can be lower cost and I can service these lines of business, when in reality the real value is speed. So how are you thinking about data quality? You hear so much about it. Why is it such a big deal, and why is it so hard and such a priority in the marketplace? Your thoughts.

Thanks for that. So we of course acquired a data quality company earlier this year, OwlDQ. And the big question is, okay, why them and why now, not before? Well, at least a decade ago we started hearing people talk about big data; it was probably around 2009 when it became the big talk. And what we don't really talk about with this ever-expanding data is that the by-product is the velocity of data increasing dramatically.
The speed at which new data is being presented and the way in which data is changing are dramatic. And why is that important to data quality? Because data quality, for the last 30 years or so, has been a rules-based business: you analyze the data at a certain point in time and you write a rule for it. There's already room for error there, because humans are involved in writing those rules. But now, with the increased velocity, the likelihood that a rule is going to atrophy and no longer be valid or useful to you increases exponentially.

So we were looking for a technology that was doing it in a new way, similar to the way we do auto-classification when we're cataloging: how do we look at millions of pieces of metadata and decide what something is, to put it into context? The ability to automatically generate these rules, and then continuously adapt to data changes and adjust those rules, is really a game changer for the industry itself. So we chose OwlDQ for that very reason. They not only had this really modern architecture to automatically generate rules, but they could then continuously monitor the data and adjust those rules, cutting out huge amounts of cost; rules that are no longer valid clearly aren't helping you. And frankly, you know how this works: no one really complains until there's a squeaky wheel, you get a fine, or something gets exposed, and that's what causes a lot of issues with data quality.

And then why now? Well, and this is my speculation, but there's so much data moving to the cloud right now. Anyone who made big investments in data quality for their on-premise data warehouses, their Netezzas, Teradatas, Oracles, et cetera, or even their data lakes, is now moving to the cloud. And they're saying, hmm, which investments that we had on-premise are we going to carry forward, and which ones are we going to start anew? Data quality seems to be ripe for something new. So these new investments in data in the cloud are saying, let's look at this next-generation method of doing data quality, and that's where we fit in nicely. And finally, you can't really do data governance and cataloging without data quality, and data quality without data governance and cataloging is a kind of hollow long-term story. So the three working together is a very powerful story.

I've got to ask you some Columbo questions about this, because you're right, it's rules-based. So immediately I go, okay, what are the rules around COVID or hybrid work, right? If the rules are static, there's so much unknown, and what you're saying is you've got a dynamic process for that. And one of my gripes about the whole big data thing, and you referenced 2009, 2010, I loved it because there were a lot of profound things about Hadoop, and a lot of failings. One of the challenges is that there's really no context in the big data system. The folks in the data pipeline don't have the business context. So my question is, it sounds like you've got this awesome magic to automate, but who adjudicates the dynamic rules? Do humans play a role? What role do they play?

Absolutely. So there's the notion of sampling. You can only trust a machine up to a certain point before you want some type of steward, some assisted or supervised learning, involved. So maybe one out of 10 or one out of 20 of the rules that are generated,
you might want to have somebody look at them. But there are ways to do the equivalent of supervised learning without actually paying the cost of a supervisor. Let's suppose you've written 1,000 rules for your system and they're five years old. We come in, analyze the same data, generate rules ourselves, and compare the two. There's absolutely going to be some exact matching, some overlap that validates one another, and that gives you confidence the machine learning did exactly what you did. What's the likelihood that you guessed wrong and the machine learning guessed wrong in exactly the same way? That seems like a pretty small concern. So now you're really asking, well, why are they different? And you start to study those samples. What we learned is that we were able to generate between 60 and 70% of these rules, and any time we were different, we were right almost every single time; in maybe one out of 100 cases was the handwritten rule proven to be the better outcome. And of course it's machine learning, so it learned and caught up the next time. That's the true power of this innovation: it learns from the data as well as from the stewards, and it gives you confidence that you're not missing things, so you start to trust it. But you should never completely walk away. You should constantly do your periodic sampling.

And the secret sauce is math. I remember back in the mid-2000s, the 2006 timeframe, you mentioned auto-classification. That was a big problem with the Federal Rules of Civil Procedure: you had humans classifying, and humans don't scale. So you had all kinds of support vector machines and probabilistic latent semantic indexing, but you didn't have the compute power or the data corpus to really do it well. So it sounds like a combination of cheaper compute, a lot more data, and machine intelligence has really changed the game there. Is that a fair assertion?

That's absolutely fair. I think the other aspect to keep in mind is that it's an innovative technology that brings the compute as close to the data as possible. One of the greatest expenses of doing data quality was, of course, the profiling concept, bringing up the statistics of what the data represents. In the traditional sense, that data is pulled out of the database entirely into a separate area, and when you start talking about terabytes or petabytes of data, it takes a long time to extract that much information from a database and then process through all of it. Imagine bringing that profiling closer to the database, so it happens in the same space as the data; that cuts out something like 90% of the unnecessary processing. It also gives you the ability to do it incrementally, so you're not doing a full analysis each time. You have kind of an expensive first pass when you're looking at a full database, and then maybe over the course of a day, an hour, 15 minutes, you've only seen a small segment of change. So now it feels more like a transactional analysis process.

Yeah, and again, we talked about the old days of big data, the Hadoop days, and it was profound. It was all about bringing five megabytes of code to a petabyte of data, but that didn't happen; we shoved it all into a central data lake. I'm really excited for Collibra. It sounds like you guys are really on the cutting edge and doing some really interesting things.
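A rough sketch of the pattern described above: profile the data, generate candidate rules from the profile, and queue only the disagreements with existing handwritten rules for a steward, so human supervision is sampled rather than constant. The rule representation, thresholds, and names here are assumptions for illustration, not OwlDQ's or Collibra's actual implementation.

```python
# Minimal sketch: generate rules from a data profile and review only the
# disagreements with existing handwritten rules (illustrative assumptions only).


def profile_column(values):
    """Basic profiling statistics for one column (the step pushed down close
    to the data in the approach described above)."""
    non_null = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(non_null) / len(values),
        "min": min(non_null),
        "max": max(non_null),
    }


def generate_rules(profile, tolerance=0.05):
    """Turn profiling statistics into candidate data quality rules."""
    return {
        "not_null": profile["null_rate"] <= tolerance,
        "range": (profile["min"], profile["max"]),
    }


def review_queue(generated, handwritten):
    """Where the rule sets agree, trust them; where they differ, queue the
    rule for a steward to sample and review."""
    return [name for name in generated
            if name in handwritten and generated[name] != handwritten[name]]


ages = [34, 41, 29, 57, 62, 38]                       # freshly profiled data
generated = generate_rules(profile_column(ages))      # machine-generated rules
handwritten = {"not_null": True, "range": (0, 120)}   # a five-year-old rule set
print(review_queue(generated, handwritten))           # ['range'] goes to a steward
```

Rerun only on the changed segment of data, the same profile-and-compare loop becomes the "transactional analysis" Jim describes, rather than a full scan each time.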
I'll give you the last word, Jim. Please, bring us home.

Yeah, thanks, Dave. One of the really exciting things about our solution is that it aims to be a combination of best-of-breed capabilities that are also integrated, to create the full and complete story customers are looking for. You don't want them to have to worry about complex integration, managing multiple vendors, the timing of their releases, et cetera. If you can deliver one solution where you never have to say, well, that's good enough, where every single component is in fact the best of breed you can find, and it's integrated and managed as a service, you truly unlock the power of the data-literate individuals in your organization. And again, that goes back to our overall goal: how do we empower the hundreds of millions of people around the world who are just looking to make insightful decisions? Today they feel completely locked out. It's as if they're looking for information before the internet, limited to whatever their local library has. If we can truly become something like the internet of data, making it possible for anyone to access it while it's still governed and secured for privacy, I think we have a chance to change the world for the better.

Great, thank you so much, Jim. Great conversation. Really appreciate your time and your insights.

Yeah, thank you, Dave. Appreciate it.

All right, and thank you for watching theCUBE's continuous coverage of Data Citizens '21. My name is Dave Vellante. Keep it right there for more great content.