From around the globe, it's theCUBE, presenting ActiveDQ, Intelligent Automation for Data Quality, brought to you by Io-Tahoe.

Okay, now we're going to look at the role automation plays in mobilizing your data on Snowflake. Let's welcome in Duncan Turnbull, who's a partner sales engineer at Snowflake, and Ajay Vohora is back, CEO of Io-Tahoe. They're going to share their insights. Gentlemen, welcome.

Thank you, David. Good to be back.

Yes, great to have you back, Ajay. And it's really good to see Io-Tahoe expanding the ecosystem, so important now, of course, bringing Snowflake in. It looks like you're really starting to build momentum; we've seen progress month by month over the past 12, 14 months. Your seed investors, they've got to be happy.

They are, they're happy, and they can see that we're moving into a nice phase of expansion here, new customers turning up, and now we're ready to go out and raise that next round of funding. Maybe think of us like Snowflake five years ago; we're definitely on track with that. There's a lot of interest from investors, and we're really now trying to focus in on those investors that can partner with us and understand AI, data and automation.

Well, so personally, you've managed a number of early-stage VC funds, four of them. You've taken several software companies through many funding rounds and growth, all the way to exit. So you know how it works: you get product-market fit, you've got to make sure you get your KPIs right, and you've got to hire the right salespeople. But what's different this time around?

Well, the fundamentals that you mentioned, those never change. What I can see that's shifted this time around is three things. One, there used to be this choice of, do we go open source or do we go proprietary? Now that has turned into a nice hybrid model, and we've really keyed into Red Hat doing something similar with CentOS. The idea here is that there's a core technology that underpins a platform, but it's the ability to then build an ecosystem around that, made up of a community. That community may include customers, technology partners and other tech vendors, enabling platform adoption so that all of those folks in the community can build and contribute, while still maintaining the integrity of the core architecture and platform. That's one thing that's changed: we're seeing a lot of that type of software company emerge into that model, which is different from five years ago.

The second is leveraging the cloud, every cloud, the Snowflake cloud being one of them here, in line with where customers in enterprise software are moving. Every CIO is now running some configuration of a hybrid IT estate, whether that's cloud, multi-cloud or on-prem; that's just the reality.

The other piece is dealing with the CIO's legacy. Over the past 15, 20 years they've purchased many different platforms and technologies, and some of those are still established and still function. How do you enable that CIO to make a purchase while still preserving, and in some cases building on and extending, the legacy mature technologies that they've invested their people's training time and financial investment into?

Yeah, of course. Solving a customer pain point with technology never goes out of fashion.

No, that never changes. You have to focus like a laser on that.
And of course, speaking of companies focused on solving problems, Duncan Turnbull from Snowflake, you guys have done a great job brilliantly addressing pain points, particularly around data warehousing, simplifying that, and you're providing this new capability around data sharing, really quite amazing. Duncan, Ajay talks about data quality and customer pain points in enterprise IT. Why has data quality been such a problem historically?

Sure. So one of the biggest challenges in the past is that, to address everyone's needs for using data, organizations evolved all these different places to store it, all these silos or data marts, this whole proliferation of places where data lives. All of those end up with slightly different schedules for bringing data in and out, slightly different rules for transforming and formatting that data and getting it ready, and slightly different quality checks for making use of it. That becomes a big problem, in that different teams are going to have slightly different, or even radically different, answers to the same kinds of questions, which makes it very hard for teams to work together on the data problems that exist inside the business, depending on which of these silos they end up looking at. Whereas if you have a single, scalable system for putting all of your data into, you can sidestep a lot of this complexity and address the data quality issues in a single way.

Now, of course, we're seeing this huge trend in the market towards robotic process automation, RPA. That adoption is accelerating. You see UiPath's IPO, a 35-plus-billion-dollar valuation, Snowflake-like numbers, nice comps there for sure. Ajay, you've coined the phrase "data RPA". What is that, in simple terms?

Yeah, it was born out of seeing how, in our ecosystem, across that community of developers, customers and general business users, people were wanting to adopt and deploy Io-Tahoe's technology. And we could see that, I mean, it's not marketing RPA, we're not trying to automate that piece, but wherever there's a process that was tied up in some form of manual overhead, with handovers and so on, that process is something we're able to automate with Io-Tahoe's technology, applying AI and machine learning specifically to those data processes, almost as a precursor to getting into financial automation. That's really where we're seeing the momentum pick up, especially in the last six months.

And we've kept it really simple with Snowflake. We stepped back and said, well, the resource that Snowflake can leverage here is the metadata. So how could we turn Snowflake into that repository, the data catalog? And by the way, if you're a CIO looking to purchase a data catalog tool, stop, there's no need to: working with Snowflake will enable that intelligence to be gathered automatically and put to use within Snowflake, reducing that manual effort and putting that data to work. That's where we've packaged this with our AI and machine learning specific to those data tasks. It made sense, and that's what's resonated with our customers.
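To make the metadata point concrete: Snowflake already exposes an inventory of every object it holds through its ACCOUNT_USAGE views, so a catalog can be seeded with plain SQL. A minimal sketch, assuming a hypothetical target table name `data_catalog` (not an Io-Tahoe or Snowflake artifact):

```sql
-- Sketch: seed a catalog from Snowflake's own metadata views.
-- SNOWFLAKE.ACCOUNT_USAGE.TABLES is a standard Snowflake view;
-- DATA_CATALOG is a hypothetical target table.
CREATE TABLE IF NOT EXISTS data_catalog AS
SELECT table_catalog AS database_name,
       table_schema  AS schema_name,
       table_name,
       table_type,
       row_count,
       bytes,
       last_altered
FROM   snowflake.account_usage.tables
WHERE  deleted IS NULL;  -- ignore dropped tables
```

A cataloging tool layers classification and profiling on top of this raw inventory; the point is that the inventory itself is already queryable.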
You know, what's interesting here, just a quick aside, is that I've been watching Snowflake now for a while. Of course the competitors come out and maybe criticize: well, they don't have this feature, they don't have that feature. And Snowflake always seems to have an answer, and the answer oftentimes is the ecosystem. The ecosystem is going to bring that, because we have a platform that's so easy to work with. So I'm interested, Duncan, in what kind of collaborations you're enabling with high-quality data, and of course your data sharing capability.

Yeah. So I think the ability to work on data sets isn't just limited to inside the business itself, or even between different business units; we were discussing those silos before. Looking at this idea of collaboration, we have these challenges where we want to exploit data to the greatest degree possible, but we need to maintain the security, safety, privacy and governance of that data. It could be quite valuable, it could be quite personal, depending on the application involved.

One of the novel applications of data sharing that we see between organizations is this idea of data clean rooms. These clean rooms are safe, collaborative spaces which allow multiple companies, or even divisions inside a company with particular privacy requirements, to bring two or more data sets together for analysis, without having to actually share the whole unprotected data set with each other. When you do this inside Snowflake, you can collaborate using standard tool sets. You can use all of our SQL ecosystem, all of the data science ecosystem that works with Snowflake, all of the BI ecosystem that works with Snowflake, but in a way that keeps the confidentiality that needs to be preserved inside the data intact. And you can only really do these kinds of collaborations, especially cross-organization, but even inside large enterprises, when you have good, reliable data to work with; otherwise your analysis just isn't going to work properly.

A good example is one of our large gaming customers, who's an advertiser. They were able to build targeted ads to acquire customers and measure the campaign impact and revenue, while keeping their data safe and secure as they worked with advertising partners. The business impact was a lift of 20 to 25% in campaign effectiveness through better targeting, and that pulled through into a reduction in customer acquisition costs, because they didn't have to spend as much on the forms of media that weren't working for them.
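One common way to build the clean-room pattern Duncan describes is with secure views over shared data, so each party sees only aggregates across the joined sets, never the other side's raw rows. A simplified sketch of the provider side, with all database, table and column names hypothetical:

```sql
-- Clean-room style view: expose only aggregate overlap, join on hashed
-- identifiers, and suppress small groups to resist re-identification.
-- All object names below are hypothetical.
CREATE SECURE VIEW campaign_overlap AS
SELECT i.campaign_id,
       COUNT(DISTINCT c.customer_key) AS matched_customers
FROM   advertiser_db.crm.customers  c
JOIN   publisher_db.ads.impressions i
       ON c.email_hash = i.email_hash
GROUP  BY i.campaign_id
HAVING COUNT(DISTINCT c.customer_key) >= 25;
```

Production clean rooms add row access policies and secure functions on top, but the principle is the same: the analysis travels to the data, rather than the raw data travelling to the partner.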
So Ajay, I wonder, with the way public policy is shaping up, obviously GDPR, and then in the States the California Consumer Privacy Act, people are taking the best of those, and there's a lot of differentiation, but what are you seeing in terms of governments really driving this move to privacy?

Yeah, in government and the public sector we're seeing a huge wake-up and a lot of activity across the whole piece. Part of it has been data privacy. The other part is being more joined up and more digital, rather than paper- or form-based. We've all got stories of waiting in line holding a form, taking that form to the front of the line and handing it over a desk. Now government and the public sector are really looking to transform their services into online, digital self-service. And that whole shift is driving the need to emulate a lot of what the commercial sector is doing: automating their processes and unlocking the data from silos to feed those processes.

Another thing I'd say is the need for data quality that, as Duncan mentioned, underpins all of these processes, in government, pharmaceuticals, utilities, banking, insurance. The ability for a chief marketing officer to drive a loyalty campaign, the ability for a CFO to reconcile accounts at the end of the month and do a quick, accurate financial close, also the ability of customer operations to make sure the customer has the right details about themselves in the application they're served from: all of that is underpinned by data, and it is effective or not based on the quality of that data. So whilst we're mobilizing data to the Snowflake cloud, the ability to then drive analytics, prediction and business processes off that cloud succeeds or fails on the quality of that data.

I mean, I would say it really is table stakes: if you don't trust the data, you're not going to use the data. The problem is it always takes so long to get to data quality; there are all these endless debates about it. So we've been doing a fair amount of work and thinking around this idea of decentralized data. Data by its very nature is decentralized, but the fault of traditional big data is that everything is monolithic: the organization is monolithic, the technology is monolithic, the roles are hyper-specialized. And so you're hearing a lot more these days about this notion of a data fabric, or what Zhamak Dehghani calls a data mesh. We've been leaning into that: the ability to connect various data capabilities, whether it's a data warehouse, a data hub or a data lake, so that those assets are discoverable and shareable through APIs, they're governed on a federated basis, and you're now bringing in machine intelligence to improve data quality. I wonder, Duncan, if you could talk a little bit about Snowflake's approach to this topic.

Sure. So I'd say that making use of all of your data is the key driver behind these ideas of data meshes and data fabrics, and the idea is that you want to bring together not just your strategic data but also your legacy data and everything you have inside the enterprise. I'd also like to expand on what a lot of people view as "all of the data". A lot of people miss that there's this whole other world of data they could have access to: data from their business partners, their customers, their suppliers, and even data that's more in the public domain, whether that's demographic or geographic or other kinds of data sources. And the Data Cloud really facilitates the ability to share and gain access to this, both between organizations and inside organizations, without making lots of copies of the data and worrying about the storage, or about this federated idea of governance, things that are quite complex to manage. The Snowflake approach enables you to share data with your ecosystem, or the world, without any latency, with full control over what's shared, and without having to introduce new complexities or complex interactions with APIs or software integration. The simple approach we provide allows a relentless focus on creating the right data product to meet the challenges facing your business today.
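The no-copy sharing Duncan refers to is Snowflake's native share mechanism. A minimal sketch of both sides, assuming hypothetical database, share and account names:

```sql
-- Provider side: publish a data product without moving or copying data.
CREATE SHARE customer_360_share;
GRANT USAGE  ON DATABASE analytics                      TO SHARE customer_360_share;
GRANT USAGE  ON SCHEMA   analytics.curated              TO SHARE customer_360_share;
GRANT SELECT ON TABLE    analytics.curated.customer_360 TO SHARE customer_360_share;
ALTER SHARE customer_360_share ADD ACCOUNTS = partner_account;

-- Consumer side: the share mounts as a read-only database; no copy is
-- made, so the consumer always sees the provider's current data.
CREATE DATABASE customer_360 FROM SHARE provider_account.customer_360_share;
SELECT COUNT(*) FROM customer_360.curated.customer_360;
```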
So, Ajay, the key here, and Duncan's talking about it, in my mind anyway, my takeaway is simplicity. If you can take the complexity out of the equation, you're going to get more adoption. It really is that simple.

Yeah, absolutely. And I think on that whole journey, maybe five, six years ago, the adoption of data lakes was a stepping stone. The Achilles' heel there, however, was the complexity it shifted towards consuming data from a data lake, where there are many, many sets of data to curate and consume. Whereas, actually, the simplicity of being able to go straight to the data you need to do your role, whether you're in tax compliance or customer services, is key. And, you know, listen, for Snowflake and Io-Tahoe, one thing we know for sure is that our customers are super smart and very capable. They're data-savvy, and they'll want to use whichever tool, and embrace whichever cloud platform, is going to reduce the barriers to solving what's complex about that data, simplifying it, and using good old-fashioned SQL to access the data and build products from it to exploit it. So simplicity is key to enabling people to make use of that data, and CIOs recognize that.

So, Duncan, the cloud obviously brought in this notion of DevOps and new methodologies like Agile, and that's brought in the notion of DataOps, which is a very hot topic right now: basically, DevOps applied to data. How does Snowflake think about this? How do you facilitate that methodology?

Yeah, I'd agree with you absolutely there. DataOps takes these ideas of agile development, agile delivery, and the DevOps world that we've seen just rise and rise, and applies them to the data pipeline, which is somewhere it traditionally hasn't happened. And it's the same kinds of messages we see in the development world: delivering faster, having better repeatability, and really getting towards that dream of the data-driven enterprise, where you can answer people's data questions so they can make better business decisions.

We have some great architectural advantages that support this: things like cloning data sets without having to copy them, and time travel, so we can see what data looked like at some point in the past. This lets you set up your own little data playpen as a clone, without having to copy all of that data, so it's quick and easy. And with our separation of storage and compute, you can provision your own virtual warehouse for dev usage, so you're not interfering with anyone's production usage of the data. These ideas, this scalability, make it easy to make changes, test them, and see what the effect of those changes is.
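The three capabilities Duncan lists, zero-copy cloning, time travel and separate compute, are all standard Snowflake SQL. A short sketch of that dev "playpen" setup, with object names hypothetical:

```sql
-- Zero-copy clone: a writable dev copy that shares storage with production.
CREATE DATABASE analytics_dev CLONE analytics;

-- Time travel: query a table as it looked an hour ago
-- (within the configured retention period).
SELECT * FROM analytics.curated.orders AT (OFFSET => -3600);

-- Separate compute: a dev warehouse that cannot contend with production.
CREATE WAREHOUSE dev_wh
  WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
USE WAREHOUSE dev_wh;
```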
And we've actually seen this; you were talking about partner ecosystems earlier. The partner ecosystem has taken these ideas inside Snowflake and extended them, integrating them with DevOps and DataOps tooling, things like version control in Git and infrastructure automation in Terraform, and built that out into fuller DataOps products you can make use of. So we can see a huge impact of these ideas coming into the data world; we think we're really well placed to take advantage of them, and the partner ecosystem has been doing a great job of that. It really allows us to change the operating model for data, so we don't have as much emphasis on hierarchy and change windows and all these things that maybe feel a bit old-fashioned. And we're taking this shift from batch data integration to streaming, continuous data pipelines in the cloud. That gets you away from a once-a-week, or once-a-month if you're really unlucky, change window, towards pushing changes in a much more rapid fashion as the needs of the business change.

I mean, those hierarchical organizational structures, when we apply them to data, are what actually creates the silos. So if you're going to be a silo buster, and Ajay, I look at you guys as silo busters, you've got to put data in the hands of the domain experts, the business people. They know what data they want, but today they have to go and beg and borrow for new data sets, et cetera. And so that's where automation becomes so key. And frankly, the technology should be an implementation detail, not the dictating factor. I wonder if you could comment on this.

Yeah, absolutely. I think making the technologies more accessible to general business users, or those specialist business teams, is the key to unlocking this. And it's interesting to see, as people move from organization to organization, where they've had those experiences of operating in a hierarchical sense, they want to break free from that. We've all been exposed to automation and continuous workflows; change is continuous in IT, it's continuous in business, the market's continuously changing. So having that flow of work across the organization, using key components such as GitHub and similar tools to drive process, Terraform to build code into the process, and automation, with Io-Tahoe leveraging all the metadata from across those fragmented sources, it's good to see how those things are coming together. And watching people move from organization to organization, they say: hey, okay, I've got a new start, I've got my first hundred days to impress my new manager, what kind of an impact can I bring? Quite often we see that as: let me take away the good learnings, how to do it or how not to do it, from my previous role; this is an opportunity for me to bring in automation.
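That shift Duncan described earlier, from batch change windows to continuous pipelines, maps onto Snowflake's streams and tasks. A minimal sketch, with all table, warehouse and column names hypothetical:

```sql
-- A stream captures row-level changes on a landing table.
CREATE STREAM raw_orders_stream ON TABLE raw_db.landing.orders;

-- A task applies those changes on a schedule, but only runs when the
-- stream actually has new data to process.
CREATE TASK merge_orders
  WAREHOUSE = transform_wh
  SCHEDULE  = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('RAW_ORDERS_STREAM')
AS
  MERGE INTO curated_db.core.orders t
  USING raw_orders_stream s
    ON t.order_id = s.order_id
  WHEN MATCHED THEN UPDATE SET t.status = s.status
  WHEN NOT MATCHED THEN INSERT (order_id, status)
                        VALUES (s.order_id, s.status);

ALTER TASK merge_orders RESUME;  -- tasks are created suspended
```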
I'll give you an example, David. We recently started working with a client in financial services, an asset manager whose book of financial assets has grown over the last 10 years through M&A. Each of those acquisitions brought with it technical debt and its own set of data, so they now have multiple CRM systems, multiple databases, multiple bespoke in-house applications. When the new CIO came in and had a look at this, he thought: yes, I want to mobilize my data, and yes, I need to modernize my data estate, because my CEO is now looking at the crypto assets on the horizon and the new funds emerging around digital and crypto assets. But to get there, and data absolutely underpins it as the core asset, cleaning up that legacy situation and mobilizing the relevant data into the Snowflake cloud platform is where we're giving time back. That now takes a few weeks, and that transition to mobilize the data, to start with a clean slate to build on, as a new business in digital and crypto assets alongside the legacy, traditional financial assets, bonds, stocks, fixed income, you name it, is where we're starting to see a lot of innovation.

Tons of innovation. I love the crypto examples, NFTs are exploding, and in the face of that, traditional banks are getting disrupted. I also love this notion of data RPA, especially because, Ajay, I've done a lot of work in the RPA space, and what I would observe is that in the early days of RPA, I call it paving the cow path: taking existing processes, applying scripts, letting software robots do their thing. That was good, because it reduced mundane tasks, but where it's evolved is into a much broader automation agenda; people are discovering new ways to completely transform their processes. And I see a similar analogy for the data operating model. So I'm wondering what you think about that, and how a customer really gets started bringing this to their ecosystem, their data life cycles.

Sure. Step one is always the same: figuring out, for the CIO or the chief data officer, what data do I have? That's increasingly something they want to automate, and we can help them there with automated data discovery, whether that's documents in the file share, a backup archive, a relational data store or a mainframe, really quickly hydrating that and bringing that intelligence to the forefront: what do I have? The next step is: okay, now I want to continually monitor and curate that intelligence on the platform I've chosen, let's say Snowflake, so that I can build applications on top of it to serve my internal and external customers. Then comes the automation around classifying data and reconciling it across different fragmented data silos, building those insights into Snowflake. As you'll see a little later on, when we're talking about data quality, ActiveDQ allows us to reconcile data from different sources, as well as look at the integrity of that data, and then go on to remediation. You may want to harness and leverage techniques around traditional RPA, but to get to that stage you need to fix the data. So: remediating, publishing the data in Snowflake, allowing analysis to be performed in Snowflake. Those are the key steps that we see, and shrinking that timeline into weeks, giving the organization that time back, means they're spending more time on their customers and solving their customers' problems, which is where we want them to be.
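The reconciliation step Ajay describes can be pictured as paired checks between a staged source and its Snowflake target. ActiveDQ automates and extends this; the sketch below is only the generic idea in plain SQL, with hypothetical table and column names:

```sql
-- Generic reconciliation checks between a staged source and its target.
-- All object names are hypothetical; ActiveDQ's internals differ.
SELECT 'row_count' AS check_name,
       (SELECT COUNT(*) FROM staging.src_customers) AS source_value,
       (SELECT COUNT(*) FROM curated.customers)     AS target_value
UNION ALL
SELECT 'key_checksum',
       (SELECT SUM(HASH(customer_id)) FROM staging.src_customers),
       (SELECT SUM(HASH(customer_id)) FROM curated.customers)
UNION ALL
SELECT 'null_emails',
       (SELECT COUNT(*) FROM staging.src_customers WHERE email IS NULL),
       (SELECT COUNT(*) FROM curated.customers     WHERE email IS NULL);
```

Rows where source_value and target_value disagree flag candidates for remediation before the data is published for analysis.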
Well, I think this is the brilliance of Snowflake, actually, Duncan. I've talked with Benoit Dageville about this, and your other co-founders, and it's really that focus on simplicity. So you picked a good company to join, in my opinion. I wonder, Ajay, if you could talk about some of the industry sectors that are going to gain the most from data RPA. Traditional RPA, if I can use that term, was largely back office, a lot of it financial. What are the practical applications where data RPA is going to impact businesses, and what outcomes can we expect?

Yeah, so our drive is really to make the general business user's experience of RPA simpler, using no code to do that, where they've also chosen Snowflake to build their cloud platform. They've then got the combination of relatively simple scripting techniques, such as SQL, with our no-code approach. And the answer to your question is: whichever sector is looking to mobilize their data. That seems like a cop-out, but to give you some specific examples, David: banking, where our customers are looking to modernize their banking systems and enable better customer experiences through applications and digital apps; that's where we're seeing a lot of traction in this approach of applying RPA to data. Healthcare, where there's a huge amount of work to do to standardize data sets across providers, payers and patients, and it's an ongoing process there. Retail, helping to build that immersive customer experience, recommending next best actions, providing an experience that's going to drive loyalty and retention; that depends on understanding what the customer's needs and intent are, and being able to provide them with the content or the offer at that point in time, all of it data-dependent. Utilities is another one, with great overlap with Snowflake, where we're helping utilities, telecoms, energy and water providers build services on their data. And this is where the ecosystem just continues to expand: if we're helping our customers turn their data into services for their ecosystem, that's exciting. And nowhere more exciting than insurance. I always used to think of insurance as very dull and mundane; actually, that's where we're seeing huge amounts of innovation, creating new, flexible products that are priced to the day, to the situation, with risk models that adapt when the data changes on events or circumstances. So across all those sectors, they're all mobilizing their data, they're all moving in some way or form to a multi-cloud IT setup, and I think Snowflake and Io-Tahoe being able to accelerate that, and make that journey simple rather than complex, is why we've found such a good partnership here.

All right, thanks for that, and thank you both; we've got to leave it there. Duncan, really appreciate you coming on, and Ajay, best of luck with the fundraising. Keep us posted.

Thanks, David.

All right, great. Okay, now let's take a look at a short video that's going to help you understand how to reduce the steps around your DataOps. Let's watch.