From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Dave Vellante.
Hello everyone, and welcome to this CUBE conversation. You know, theCUBE has been following the trends in the so-called big data space since 2010. One of the things we reported on for a number of years is the complexity involved in wrangling and making sense out of data. Thanks to the allure of no schema on write, very low-cost platforms like Hadoop became a data magnet. For years, organizations would shove data into a data lake, and of course the joke was that it became a data swamp. Organizations really struggled to realize the promised return on their big data investments. Now, while the cloud certainly simplified infrastructure deployment, it introduced a much more complex data environment and data pipeline, with dozens of APIs and a mind-boggling array of services that required highly skilled data engineers to properly ingest, shape, and prepare that data so that it could be turned into insights. This became a real time suck for data pros, who spent 70 to 80% of their time wrestling with data. A number of people saw the opportunity to solve this problem, automate the heavy lifting, and simplify the process to ingest, synchronize, transform, and really prepare data for analysis. One of the companies attacking this challenge is Infoworks. With me to talk about the evolving data landscape is Buno Pati, CEO of Infoworks. Buno, great to see you, thanks for coming in.
Well, thank you, Dave, thanks for having me here.
You're welcome. I love that you're in Palo Alto and you come to MetroWest Boston to see us, so that's great. Welcome. So you heard my narrative: we're 10 years, 10 years plus, into this big data theme and meme. What did we learn? What are some of the failures and successes that we can now build on, from your point of view?
All right, so Dave, I'm going to start from the top with why big data, right? I think this big data movement really started with the realization by companies that they needed to transform their customer experience and their operations in order to compete effectively in this increasingly digital world. In that context, they also realized very quickly that data was the key asset on which this transformation would be built. So given that, you look at this and ask: what is digital transformation really about? It is about competing with, or fending off, digital disruption. And over time this has become an existential imperative. You cannot survive and be relevant in this world without leveraging data to compete with others who would otherwise disrupt your business.
You know, let's stay on that for a minute, because when we started covering the big data space, you didn't really hear about digital transformation. That's a more recent trend. So I've got to ask you, what's the difference between a business and a digital business, in your view?
That is the foundational question behind big data. If you look at a digital native, and there are many of them that you can name, these companies start by building a foundational platform on which they build their analytics and data programs. It gives them a tremendous amount of agility and the right framework within which to build a data-first strategy, a strategy where business information is persistently collected and used at every level of the organization. Furthermore, they automate this process, because if you want to collect all your data and leverage it in every part of the business, it needs to be a highly automated system, and it needs to be able to seamlessly traverse on-premise, cloud, hybrid, and multi-cloud environments. Now, let's look at a traditional business. In a traditional enterprise, there is no foundational platform.
There are point tools for ETL and data integration, and you can name a whole slew of other things that need to be stitched together and somehow made to work to deliver data to the applications that consume it. The strategy is not data-first; it is use case by use case. When there is a use case, people go and find the data, gather it, transform it, and eventually feed an application, a process that can take months to years depending on the complexity of the project. And they don't automate this. As you pointed out, it is heavily dependent on highly skilled engineering talent, which is scarce. And rather than seamlessly traversing the various clouds and on-premise environments, they have fragmented those environments, with individual teams focused on a single environment, building different applications using different tools and different infrastructure.
So you're saying the digital native company puts data at the core. They organize around that data, as opposed to, say, a bottling plant or people, and then they leverage that data for competitive advantage through a platform that's kind of table stakes. And then obviously there are cultural aspects and other skills that they need to develop.
Yeah, they have an ability that traditional enterprises don't. Because of this choice of a data-first strategy with a foundational platform, they have the ability to rapidly launch analytics use cases and iterate on them. That is not possible in a traditional or legacy environment. So their speed to market and time to value are going to be much better than their competition's.
This gets into the risk of disruption. Sometimes we talk about cloud native and cloud naive; you can talk about digital native and digital naive. It's hard for incumbents to fend off the disruptors and ultimately become disruptors themselves.
But what are you seeing in terms of trends where organizations are having success there?
One of the key trends we're seeing, or key attributes of companies that are seeing a lot of success, is that they have organized themselves around their data. What do I mean by that? This is usually a high-level mandate coming down from the top of the company, where they are forming centralized groups to manage the data and make it available for the rest of the organization to use. A variety of names are being used for this. People are calling it their data fabric. They are calling it data as a service, which is pretty descriptive of what it ends up being. Those terms all represent the same concept: a centralized, and ideally highly automated, environment that serves the rest of the business with data. And the ultimate goal is to get any data at any time for any application.
So let's talk a little bit about the cloud. I mentioned up front that the cloud really simplified infrastructure deployment, but it didn't solve the data wrangling problem we talked about. So why didn't it? You've got companies like Amazon, Google, and Microsoft who are very adept at data; they are some of these data-first companies. Why is it that the cloud in and of itself has not been able to solve this problem?
Okay, so when you say solve this problem, it raises the question: what's the goal? If I were to state the goal very simply, I would call it analytics agility, gaining agility with analytics. Companies are going from a traditional world where they had to generate a handful of BI and other reporting dashboards in a year to one where they literally need to generate thousands of them in a year to run the business and compete with digital disruption. So agility is the goal.
By the way, the cloud is all about agility, is it not?
It is.
At least, when you're talking about the agility of compute and storage infrastructure. But there are three layers to this problem. The first is the compute and storage infrastructure. The cloud is wonderful in that sense. It gives you the ability to rapidly add new infrastructure and spin it down when it's not in use. That is a huge blessing compared to the six to nine months, or perhaps even longer, that it takes companies to order, install, and test hardware on premise, only to find that it's partially used. The next layer is the operating system on which my data and analytics are going to run. This is where Hadoop comes in. Now, Hadoop is inherently complex, but operating systems are complex things. Spark falls in that category too, and Databricks has taken some of the complexity out of running Spark with its managed service offering. But there's still a missing layer that leverages that infrastructure and that operating system to deliver agility, where users can access the data they need anywhere in the organization without intensely deep knowledge of the infrastructure and what the operating system is doing underneath.
So in my upfront narrative, I talked about the data pipeline a little bit, but I'm inferring from your comments on platform that it's more than just the narrow data pipeline. There's a macro view here. I wonder if you could talk about that a little bit.
So the data pipeline is one piece of the puzzle. What needs to happen? Data needs to be ingested; it needs to be brought into these environments. It has to be kept fresh, because the source data is persistently changing. It needs to be organized and cataloged so that people know what's there. From there, pipelines can be created that ultimately generate data in a form that's consumable by the application. But even surrounding that, you need to be able to orchestrate all of this. The typical enterprise is a multi-cloud enterprise.
Eighty percent of all enterprises are working with more than one cloud, as well as on premise. So if you can't orchestrate all of this activity, and the pipelines and the data, across these various environments, that's not a complete solution either. There's certainly no agility in that. Then there's governance, security, and lineage. All of this has to be managed. It's not simply the creation of the pipeline but all the surrounding things that need to happen in order for analytics to run at scale within enterprises.
So the cloud solved that layer-one problem. And you saw this in the, not early days, but sort of middle days of Hadoop, when the cloud really became the place where people wanted to run a lot of their Hadoop workloads. It was kind of ironic that guys like Hortonworks, Cloudera, and MapR really didn't have a strong cloud play. But now it's flipping back, because, as you point out, everybody's multi-cloud. So you have to include a lot of these on-prem systems, whether it's your Oracle database, your ETL systems, or your existing data warehouse. Those are data feeds into the cloud for the digital incumbent who wants to be a digital native. They can't just throw all that stuff away, right? So you're seeing an equilibrium there.
An equilibrium between?
Yeah, between what's in the cloud and what's on-prem, and how do you... So let me ask it this way. If the cloud is not a panacea, is there an approach that really does solve the problem of different data sets, the need to ingest them from different clouds and on-prem, and bring them into a platform where they can be analyzed and drive insights for an organization?
Yeah, so I'm going to stay away from the word panacea, because I don't think there ever really is a panacea for any problem.
That's good.
That means we've got a good roadmap for our business, then. However, there is a solution. And the solution has to be guided by three principles. Number one, automation.
If you do not automate, the dependence on skilled talent is never going to go away, and that talent, as we all know, is very scarce and hard to come by. The second thing is integration. So what's different now? All of the capabilities we just talked about, whether it's ETL, cataloging, ingesting, keeping data fresh, or creating pipelines, need to be integrated together as a single solution. That's been missing; most of what we've seen is point tools. And the third is absolutely critical. For things to work in multi-cloud and hybrid environments, you need to introduce a layer of abstraction between the complexity of the underlying systems and the user of those systems. The way to think about this, Dave, is much like a compiler. What does a compiler do? You don't have to worry about what Intel processor is underneath, what version of your operating system you're running, or what memory is in the system. Ultimately...
As much as we love assembly code.
As much as we love assembly code. Now, take the analogy a little further. There was a time when we wrote assembly code because there was no compiler. So somebody had to sit back and say, hey, wouldn't it be nice if we abstracted away from this?
Okay, so this sets up my next question: is this why you guys started Infoworks? Maybe you could talk a little bit about your why and where you fit.
So let me give you the history of Infoworks, because the vision of Infoworks, believe it or not, came out of a rearview mirror, looking backward, not forward, and then predicting the future in a different manner. Amar Arsikere is the founder of Infoworks. When I met him, he had just left Zynga, where he was the general manager of their gaming platform. And what he told me was very simple.
He said he had been at Google at a time when Google was moving off of legacy systems, I believe it was Netezza and Oracle and a variety of things. They had just created Bigtable, and they wanted to create a data warehouse on Bigtable. He was given that job and he led that team. That, as you might imagine, was a massive project that required a high degree of automation to make it all come together. He built that, and then he built a very similar system at Zynga. These are the foundational platforms I was talking about for digital natives. When I met him, he said, look, looking back, Google may have been the only company that needed such a platform, but looking forward, I believe everyone's going to need one. There is absolute truth in that, and that's what we're seeing today. After going through this exercise of trying to write machine code or assembly code, or whatever we'd like to call it, down at the detailed, complex level of an operating system or infrastructure, people have realized: hey, I need something much more holistic. I need to look at this from an enterprise-wide perspective, and I need to eliminate all of those dependencies on talent. The cloud plays a role, because it eliminates some of the dependencies or bottlenecks around hardware and infrastructure. And ultimately I need to gain a lot more agility than I'm able to with legacy methodology. So you were asking early on what the lessons learned from that first 10 years are. A lot of technology goes through these types of cycles of hype and disillusionment; we all know the curve. I think there are two key lessons. One is that just having a place to land your data doesn't solve your problem. That's the beginning of your problems. And the second is that legacy methodologies do not transfer into the future.
You have to think differently, and looking to the digital natives as guides for how to think when you're trying to compete with them is a wonderful perspective to take.
But those legacy technologies, if you're an incumbent, you can't just rip them out and convert. You're going to use them as feeders to your digital platform. So presumably, and you call this space enterprise data operations and orchestration, EDO2, you have products and a portfolio to support those higher-layer challenges that we talked about, right?
Yeah, so that's a really important question. No, you don't rip and replace stuff. These enterprises have been built over years of acquisitions and business systems; these are layers, one on top of another. Think about the introduction of ERP. By the way, ERP is a good analogy to what happened, because those were point tools that were eventually combined into a single system called ERP. Well, these are point capabilities that are being combined into a single system for EDO2, or enterprise data operations and orchestration. The old systems do not go away. We are seeing some companies wanting to move some of their workloads from old systems to new systems, but that's not the major trend. The major trend is the new things that get done: the things that give you holistic views of the company, and then analytics based on that holistic view, are all being done on the new platforms. So it's a layer on top. It's not a rip and replace of the layers underneath. What's in place stays in place. But for the layer on top, you need to think differently. You cannot take all the legacy methodologies and just say they're going to apply to the new platform or new system.
Okay, so how do you engage with customers? Take a customer who's got on-prem and legacy infrastructure; they don't want to get disrupted, they want to be a digital native. How do you help them? What do I buy from you?
Yeah, so our product is called Data Foundry.
It is an EDO2 system, built on the three founding principles I mentioned earlier. It is highly automated. It integrates all the capabilities that surround pipelines. And ultimately, it's also abstracted, so we're able to very easily traverse from one cloud to another, from on-premise to the cloud, or even back; there are some customers moving some workloads back from the cloud. Now, what's the benefit here? Well, first of all, we lay down the foundation for digital transformation. We enable these companies to consolidate and organize their data in these complex hybrid-cloud and multi-cloud environments, and then generate analytics use cases 10x faster with about a tenth of the resources. And I'm happy to give you some examples of how that works.
Please do. Maybe you could share some customer examples?
Yeah, absolutely. Let me talk about Macy's. Macy's is a customer of ours; they've been a customer for about 14 months at this point, I think. They had built a number of systems to run their analytics, but then recognized what we're seeing other companies recognize: there's a lot of complexity there, and building it isn't the end game. Maintaining it is the real challenge, right? So even if you have a lot of talent available to you, maintaining what you build is a real challenge. So they came to us. Within a period of 12 months, and I'll just give you some numbers that are mind-blowing, they are currently running 165,000 jobs a month. Now, what's a job? A job is an ingestion, synchronization, or transformation job. They have launched 431 use cases over a period of 12 months. And you know what, they're just ramping. They will get to thousands.
Scale.
Scale. And they have ingested a lot of data and brought in a lot of data sources. To do that in a period of 12 months is unheard of. It does not happen.
Why is it important for them?
So what problem are they trying to solve?
They're a retailer. They're being digitally disrupted like no one else; they have an Amazon war room, no doubt. They have had to build themselves out as an omnichannel retailer. They are online, and they also have brick-and-mortar stores. So take a look at this. The key to competing with digital disruptors is the customer experience. What is that experience? You're online; how does that meld with your in-store experience? What happens if I buy online and return something in a store? How does all this come together into a single, unified experience for the consumer? That's what they're chasing. So that was the first application they came to us with. They said, look, let's do a customer 360. Let's understand the entirety of the customer's interactions and touch points with our business, and having done so, we'll be in a position to deliver a better experience.
That's a data problem. Different data sources, trying to get to a 360 view. I mean, you've got data all over the place.
All over the place. And there's historical data; there's stuff coming in from online and from the stores. And then they progressed from there. They're not restricting it to customer experience and selling. They're looking at merchandising, inventory, fulfillment, and store operations. Simple problem: you order something online. Where do I pull it from, a store or a warehouse?
So this is, you know, big data 2.0, just to use a sort of silly term. But it's really taking advantage of all the investment. I've often said that Hadoop, for all the criticism it gets, did lower our cost of getting data into at least one virtual place, and it got us thinking about how to get insights out of data. And so what you're describing is the ability to operationalize your data initiatives at scale.
You can absolutely get your insights off of Hadoop, right?
And I know people have different opinions of Hadoop given their experience. But what most of these customers have not achieved yet is that agility, right? How easily can you get your insights off of Hadoop? Do I need to hire a boatload of consultants who are going to write code for me, shovel data in, and create these pipelines and so forth? Or can I do this at the click of a button? That's the difference. That is truly the difference: the level of automation and the level of abstraction away from this complexity that you need has not been delivered.
We did, I think it must have been in 2011, the very first big data market study from anybody in the world, and put it out as free research on Wikibon. One of the findings was that this is a huge services business. Professional services was where all the money was going to flow, because it was so complicated. And that's exactly what happened. But now it seems like we're really entering a phase where you can scale, operationalize, and simplify, and focus your attention on driving business value versus making stuff work.
You're absolutely correct. I'll give you the numbers: 55% of this industry is services, about 30% is software, and the rest is hardware. 55%. So what's going on? People will buy a big data system. Call it Hadoop; it could be something in the cloud; it could be Databricks. And then it's welcome to the world of SIs, because at this point you need these SIs to write code and perform these services in order to get any kind of value out of it. And look, we're staring at some dismal numbers. According to Gartner, only 17% of those who have invested in Hadoop have anything in production. This is after how many years? And look at the surveys from, pick your favorite; they all look the same. People have not been able to get value out of this because it is too hard.
It is too complex, and you need too many consultants delivering services to make this happen.
Well, what I like about your story is that you're not... I mean, a lot of the big data companies have pivoted to AI. We often joke: same wine, new bottle. But you're not talking about, I mean, machine intelligence I'm sure fits in here, but you're talking about really taking advantage of the investments that have been made over the last decade and helping incumbents become digital natives. That sounds like it's at least a part of your mission here.
Not become digital natives, but rather compete with them effectively, right? So yes, that is absolutely what needs to get done. Let me talk for a moment about AI. Way back when, there was another wave of AI in the late '80s. I was part of that; I was doing my PhD at the time. And that obviously went nowhere, because we didn't have any data. We didn't have enough compute power or connectivity.
Pre-internet, right.
So here it is again. Very little has changed, except that now we do have the data. We have the connectivity. We have the compute power. But do we really? What's AI without the data? Just A, right? There's nothing there. So what's missing, even for AI and ML, and I believe these are going to be powerful game changers, is this: for them to be effective, you need to provide data to them, and you need to be able to do so in a very agile way so that you can iterate on ideas. No one knows exactly which AI solution is going to solve your problem or enhance your business. This is a process of experimentation. This is what a company like Google can do extraordinarily well because of its foundational platform. They have the agility to keep iterating, experimenting, and trying ideas, because without trying them, you will not discover what works best.
Yeah, I mean, for 50 years this industry marched to the cadence of Moore's law; that was really the engine of innovation. Today it's about data, applying machine intelligence to that data, and the cloud brings, as you point out, agility and scale. That's kind of the new cocktail for innovation.
The cloud brings agility and scale to the infrastructure.
And low risk, as you said: experimentation, fail fast, et cetera.
But without an EDO2-type system that gives you a high degree of automation, you could spend six months running one experiment with AI.
Yeah, gathering data and feeding it. Because if the answer is throwing people at the problem, then you're not going to scale.
You're not going to scale, and you're never going to really leverage AI and ML capabilities. You need to be able to do that not in six months, but in six days or less.
So let's talk about your company a little bit. Can you give us the status, where you're at, and, as the newly minted CEO, what your goals are and the milestones we should be watching?
Yeah, I'll be honest. So, newly minted CEO: I came in July of last year. This has been an extraordinary company. I started my journey with this company as an investor. It was actually funded by two funds I was associated with, the first being Nexus Venture Partners and then Centerview Capital, where I'm still a partner. My two partners and I looked at the opportunity and what the company had been able to do. In July of last year, I joined as CEO; my partner David Dorman, who used to be CEO of AT&T, joined as chairman; and my third partner, Ned Hooper, joined as president and chief operating officer. Ned used to be the chief strategy officer of Cisco. So we pushed pause on the fund, Dave, and that's about as all in as a fund can get.
Yeah, so you guys are operational experts who became investors and said, okay, we're going to dive back in and actually run the business.
And here's why. As investors, we obviously see a lot of companies as they go out and look for funding. There are three things that very rarely come together. One is a massive market opportunity, combined with the second, which is the right product to serve that opportunity. But the third is pure luck: timing.
It's timing.
And timing is a very, very challenging thing to try to predict. You could get lucky and get it right, but then again, it's luck. This had all three. It was the absolutely perfect time, and it's largely because of what you described: the 10 years that had elapsed, during which people had run the experiment and were not going to get fooled again about how easy this is supposed to be just by getting one piece or another. They recognize that they need to take this holistic approach and deploy something as an enterprise-wide platform.
Yeah, I mean, you talk about a large market. I don't even know how you'd calculate a TAM. What's the TAM? It's data. It's the data universe, which is just massive. So I have to ask you a question as an investor. I think you've raised, what, 50 million?
We've raised 50 million. The last round was led by NEA.
Right, okay. Great investors, a hefty amount, although, you know, in this day and age you're seeing just outrageous amounts being raised. Software obviously is a capital-efficient business, but today you need to raise a lot of money for promotion, to get your name out there. So what are your thoughts, as a Silicon Valley investor, on this wave? I mean, get it while you can, I guess. We're in the 10th year of this boom market. Your thoughts?
You're asking me to put on my other hat. I think companies have in general raised too much money at too high a valuation, too fast. And there is a penalty for that.
And the down-round IPO, which has become fashionable these days, is one of those penalties. It's a clear indication. Markets are very rational; public markets are very rational. When the pricing in a public market is significantly below the pricing in a private market, it's telling you something. So we are a little old-fashioned in that sense. We believe that a company has to lay down the right foundation before it adds fuel to the mix and grows. You have to have evidence that the machinery you've built, whether it's for sales, marketing, other go-to-market activities, or even product development, is working. If you do not see all of those signs, you're building a very fragile company, and adding fuel in that setting is like flooding the carburetor. You don't necessarily go faster; you just consume more. So there's perhaps a little bit of old-fashioned discipline that we bring to the table. And you can argue against it. You could say, well, why don't you just raise a lot of money, hire a lot of sales guys, and hope for the best?
See what sticks?
Yeah. We are fully expecting to build a large institution here, and I use that word carefully. For that to happen, you need to lay the right foundation down first.
Well, that resonates with us East Coast people. So, Buno, thanks very much for coming on theCUBE and sharing your perspectives on the marketplace. Best of luck with Infoworks.
Thank you, Dave. This has been a pleasure. Thank you for having me here.
All right, we'll be watching.
Thank you.
And thank you for watching, everybody. This is Dave Vellante for theCUBE. We'll see you next time.