From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Stu Miniman.

Hi, I'm Stu Miniman, and welcome to theCUBE Conversation here in our Boston area studio. Happy to welcome back to the program Paul Barth, who's the CEO of Podium Data, also a Boston area company. Paul, great to see you.

Great to see you, Stu.

All right, so we last caught up with you at a fun event that we do at MIT talking about information and data quality, so you can kind of understand why your company would be there. For our audience that doesn't know, just give us a quick summary: your background, and what was the why of Podium Data back when it was founded in 2014?

Oh, that's great, Stu, thank you. I've spent most of my career helping large companies with their data and analytics strategies, next generation architectures, new technologies, et cetera. And in doing this work, we kept stumbling across the complexity of adopting new technologies. Around the time that big data and Hadoop were getting popular, with lots of hype in the marketplace, we realized that traditional large businesses couldn't manage data on this because the technology was so new and different. So we decided to form a software company that would automate a lot of the processing, manage a catalog of the data, and make it easy for non-technical users to access their data.

Yeah, that's great. And I think back to when we were trying to help people understand this whole big data wave. One of the pithy things we did was say it was turning all this glut of data from a problem into an opportunity. How do we put this into the hands of the users? But we hit bumps in the road as an industry. We did studies, and it was like more than 50% of these projects fail. You brought up a great point. Tooling is tough. Changing processes is really challenging. But that focus on data is core to our research; it's what we talk about all the time.
But now it's like, oh, well, automation and AI/ML, choose your favorite acronym today. This is going to solve all the ills that the big data wave didn't. Right, Paul? So maybe you can help us connect the dots a little bit, because I see a lot in the foundation that carries over from big data to the automation and AI thing. So you're maybe just a little ahead of your time.

Well, thanks. I saw an opportunity before there was anything in the marketplace that could help companies really corral their data and get some of the benefits of consolidation, some oversight and management through an automated catalog and the like. As AI has started to emerge as the next hype wave, what we're seeing consistently from our partners like DataRobot and others who have great AI technology is that they're starved for good information. You can't learn automatically, or even through human learning, if you're given inconsistent information: data that's not conformed, ready, or consistent, so that you can look at a lot of different events and start to build correlations. So we believe that we're still a central part of large companies building out their analytics infrastructure.

Okay, I'd like to see how your users, and how you, fit into this changing ecosystem. We all know things are just changing so fast. From 2014 to today, cloud is so much bigger. There are the big waves of IoT we keep talking about, and everybody's got some kind of machine learning initiative. So what are the customers looking for? How do you fit into some of those different environments?

Yeah, I think when we formed the company we recognized that the cost performance of the open source data management platforms like Hadoop, and now Spark, was so dramatically better than the traditional databases and data warehouses that we could transform the business process of how you get data from raw to ready, right? And that's a consistent problem for large companies.
They have data in legacy formats on mainframes, they have it in relational databases, they have it in flat files, in the cloud, behind the firewall, and these silos continue to grow. And a consistent view of your business, your customers, your processes, and your operations is central to optimizing and automating the business today. So our business users are looking for a couple of things. One thing they're looking for is some manageability and a consistent view of their data no matter where it lives. Our catalog can create that automatically in days or weeks, depending on how big or how broadly we go. They're looking for that visibility, but they're also looking for productivity enhancements, which means that they can start leveraging that data without a big IT project. And finally they're looking for agility, which means there's self-service: an ability to access data that you know is trusted, secured, and safe for the end users to use, without having to call IT and have a programmer spin something up. So they're really looking for a totally new paradigm of data delivery.

I tell you, that hits on so many things we've been seeing, and a challenge we've seen in the marketplace. In my world, talking to people, they had their data centers, and if I look at my data and I look at my applications, it's this heterogeneous nightmare. We call it hybrid or multi-cloud these days, and it shows the promise of making me faster and all this stuff, but as you said, my data is all over the place. My applications are getting spun up, and maybe I'm moving them and federating things and all that, but my data is one of the most critical components of my business. Maybe explain a little bit how that works. Where do the customers come in and say, oh my gosh, I've got a challenge, and how is Podium Data helping in the marketplace to fix it?
Sure. I mean, first of all, we targeted from the start large regulated businesses: financial services, pharmaceutical, healthcare. We've broadened since then, but these companies' data issues were really pressure from both ends. One was a compliance pressure. They needed to develop regulatory reports that could be audited and proven correct. If your data is in many silos and it's compiled manually using spreadsheets, that's not only incredibly expensive and non-reproducible, it's really not auditable. So a lot of these folks were pressured to prove that the data they were reporting was accurate. And on the other side, it's the opportunity cost. The FinTech companies are coming into their space offering loans and financial products without any human interaction, without any branches, and they knew that data was central to that. The only way you can make an offer to someone for a financial product is if you know enough about them that you understand the risk. So the use and leverage of data had reached a critical mass. There was good money to invest in it, and they also saw that the old ways of doing this just weren't working.

So Paul, does your company help with the incoming GDPR challenges that are being faced?

Sure. Last year we introduced a PII detector and protection scheme. That may not sound like such a big deal, but in the Hadoop open source world it is. At the end of the day, this technology, while cheap and powerful, is incredibly immature. So when you land data, for example, into these open data platforms like S3 out in the cloud, Podium takes the time to analyze that data and tell you what the structures of the data are and where you might have issues with sensitive data, and it has the tooling, like obfuscation and encryption, to protect the data so you can create safe-to-use data. And I'd say our customers started out behind the firewall. Again, these regulated businesses were very nervous about breaches.
They're looking and realizing they need to get to the cloud because, frankly, not only is it a better platform for them from a cost basis and for scalability, it's actually where the data comes from these days; their data suppliers are in the cloud. So we're helping them catalog their data, identify the sensitive data, prepare data sets to move to the cloud, and then migrate them to the cloud and manage them there.

Yeah, such a critical piece. I lived in the storage world for about a decade. There was a little acquisition that they made of a company called Pi, P-I. It was Paul Maritz's company. A lot of people know Paul: he had a great career at Microsoft and went on to run VMware for a bunch of years. The vision you talk about reminds me of what I heard Paul Maritz talking about, gosh, that was a decade ago. So with information there's so much sensitivity. Expand a little bit on the security aspect there. When I look through your website, you're not a security company per se, but are there partnerships? How do you help customers who say, I want to leverage data, but it needs to be secure? All the GRC and security stuff is super challenging.

And in this space, to achieve agility and scale on a new technology, you have to be enterprise ready. So in version one of our product we had security features that included field-level encryption and protection, but also integration with LDAP, Kerberos, and other enterprise-standard mechanisms and systems that would protect data. We can interoperate with Protegrity and other kinds of encryption and protection algorithms through our open architecture. But it's kind of table stakes to get your data into a secured, monitorable infrastructure if you're going to enable this agility and self-service. Otherwise you restrict the use of the new data technologies to sandboxes. And the failures you hear about are not in the sandboxes and the exploration; they're in getting those to production.
I had one of my customers talk about how, before Podium, they had 50 different projects on Hadoop, all of them were in code red, and none of them could go to production.

Paul, you mentioned catalogs. Give us the update. What's the newest from Podium Data? Help explain that a little bit more.

So we believe that the catalog has to help operationalize the data delivery process. One of the things we did from the very start was say, let's use the analytical power of big data technologies, Spark, Hadoop, and others, to analyze the data on its way into the platform and build a metadata catalog out of that. So we have over a hundred profiling statistics that we automatically calculate and maintain for every field of every file we ever load. It's not something you do as an afterthought or selectively. We knew from our experience that we needed to do that, plus data validation, and then bring in inferences such as "this field looks like PII data" and tag that in the metadata. That process of taking in data even applies to legacy mainframe data coming in in a VSAM format. It gets converted and landed in a usable format automatically, but the most important part is that the catalog gets enriched with all this statistical profiling information, validation, and all of the technical information. We also interoperate with other tools, as well as having a GUI to help with business tagging, business definitions, and the like.

Yeah, Paul, just a little bit of a broader industry question. We've talked about the value of data; I think everybody understands how important it is. But how are we doing at really understanding the value of that data? Is that a monetization thing? You've got academia in your background. There are debates; we've talked to some people at MIT about this. How do you look at data value as an industry in general? Is there anything from Podium Data that helps people identify it? Are we leveraging it? Are we doing the most? What are your thoughts around that?
So I'd say for someone who's looking for a good framework to think about this, I'd recommend Doug Laney's book, Infonomics. We've collaborated for a while, and he's doing a great job there. But there's also just the blocking and tackling, which is: what data is getting used? Or a common one for our customers is: where do I have data that's duplicated, or comes from the same source but isn't exactly the same? That often causes reconciliation issues in finance or in forecasting and sales analysis. So what we've done with our data catalog, with all these profiling statistics, is start to build some analytics that identify similar data sets, which don't have to be exactly the same, to say: you may have a version of the data that you're trying to load here already available. Why don't you look at that data set and see if that one is preferred? And the data governance community really likes this. For one of our customers, there were literally millions of dollars in savings from eliminating duplication, but the more important thing is the inconsistency when people are using similar, but not the same, data sets. So we're seeing that as a real driver.

Okay, I want to give you the final word. What are you seeing out in the industry these days: biggest opportunities, biggest challenges from the users you're talking to?

Well, what I'd say is that when we started this, it was very difficult for a traditional business to use Hadoop in production, and they needed an army of programmers. I think we solved that. Last year we started on our work to move to a post-Hadoop world. So the first thing we've done is open up our cataloging tool so we can catalog any data set in any source and allow the data to be brought into an analytical environment or production environment more on demand, rather than the idea that you're going to build a giant data lake with everything in it and replicate everything.
That's become really interesting because you can build the catalog in a few weeks and then actually use the analysis and all the contents to drive the strategy. What do I prioritize? Where do I put things? The other big initiative is, of course, cloud. As I mentioned earlier, you have to protect data and make it cloud ready behind your firewall, and then you have to know where it's used and how it's used externally. We automate a lot of that process and make that transition something that you can manage over time. And that is now going to be extended into multi-cloud, multi-lake types of technologies.

Multi-cloud, multi-lake. All right, well, Paul Barth, appreciate getting the update on everything happening with Podium Data. Well, theCUBE is at so many events this year. Be sure to check out theCUBE.net for all the upcoming events and all the existing interviews. I'm Stu Miniman. Thanks for watching theCUBE.