From around the globe, it's theCUBE, presenting ActiveDQ, Intelligent Automation for Data Quality, brought to you by Io-Tahoe.
Welcome to the sixth episode of the Io-Tahoe Data Automation Series on theCUBE. We're going to start off with a segment on how to accelerate the adoption of Snowflake with Glenn Grossman, who's an Enterprise Account Executive from Snowflake, and Yusef Khan, the head of data services from Io-Tahoe. Gentlemen, welcome.
Good afternoon, good morning, good evening, Dave.
Good to see you, Dave.
Indeed, good to see you. Okay, Glenn, let's start with you. I mean, theCUBE hosted the Snowflake Data Cloud Summit in November, and we heard from customers that go, and I love the tagline, from zero to Snowflake in 90 minutes, very quickly. And of course you want to make it simple and attractive for enterprises to move data and analytics into the Snowflake platform. But help us understand, once the data is there, how is Snowflake helping to achieve savings compared to the data lake?
Absolutely, Dave, it's a great question. You know, it starts off with a notion that we kind of coined in the industry, our t-shirt-size pricing. You don't necessarily always need the performance of a high-end sports car when you're just trying to go get some groceries, driving down the street at 20 miles an hour. The t-shirt-size pricing really aligns to whatever your operational workload is to support the business and the value that you need from it; you don't need data every second of every day, it might be once a day or once a week. And through that t-shirt-size pricing, we can align the performance to what the business environment needs, what those drivers are, the key performance indicators that drive the insight to make better decisions. It allows us to control that cost. So to my point, you don't always need the performance of a Ferrari. Maybe you need the performance and gas mileage of a Honda Civic, if you will, just to deliver the value to the business, knowing that you have that entire performance landscape available at a moment's notice. And that's really what allows us to control cost and get away from the "how much is it going to cost me?" of a data lake type of environment.
Got it, thank you for that. But Yusef, where does Io-Tahoe fit into this equation? I mean, what's unique about the approach that you're taking toward this notion of mobilizing data on Snowflake?
Well, Dave, in the first instance, we profile the data itself at the data level, so not just at the level of metadata, and we do that wherever that data lives. So it could be structured data, it could be semi-structured data, it could be unstructured data, and that data could be on-premise, it could be in the cloud, or it could be on some kind of SaaS platform. And so we profile this data at the source systems that are feeding Snowflake, within Snowflake itself, and within the end applications and reports that the Snowflake environment is serving. So what we've done here is take our machine learning discovery technology and make Snowflake itself the repository for knowledge and insights on data. And this is pretty unique. Automation in the form of RPA is being applied to the data before, after, and within Snowflake. And so the ultimate outcome is that business users can have a much greater degree of confidence that the data they're using can be trusted. The other thing we do, which is unique, is employ data RPA to proactively detect and recommend fixes to data quality.
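To make Glenn's t-shirt-size analogy concrete, here is a minimal sketch of sizing Snowflake virtual warehouses for different workloads. The warehouse and table names are hypothetical; the statements are standard Snowflake SQL, but the sizes and auto-suspend settings are purely illustrative.

```sql
-- A small "Honda Civic" warehouse for routine daily reporting:
-- it suspends itself after 60 seconds of idle time, so it only bills when in use.
CREATE WAREHOUSE IF NOT EXISTS reporting_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND   = 60
  AUTO_RESUME    = TRUE;

-- A larger "Ferrari" warehouse kept suspended for month-end or heavy ad hoc analysis.
CREATE WAREHOUSE IF NOT EXISTS heavy_analytics_wh
  WAREHOUSE_SIZE = 'XLARGE'
  AUTO_SUSPEND   = 60
  AUTO_RESUME    = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- Workloads pick the size that matches the job; both warehouses read the same data.
USE WAREHOUSE reporting_wh;
SELECT COUNT(*) FROM analytics.public.daily_sales;  -- hypothetical table
```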
So that removes the manual time, effort and cost it takes to fix those data quality issues if they're left unchecked and untouched.
So that's key, two things there: the trust, because nobody's going to use the data if it's not trusted, but also context. If you think about it, we've contextualized our operational systems, but not our analytics systems. So this is a big step forward. Glenn, I wonder if you could tell us how customers are managing data quality when they migrate to Snowflake, because there's a lot of baggage in traditional data warehouses and data lakes and data hubs. Maybe you could talk about why this is a challenge for customers and, for instance, whether you can proactively address some of those challenges that customers face.
Yeah, we certainly can. Legacy data sources always come with inherent data quality issues. Even with master data management and data stewardship programs over really the last two decades, you still have systemic data issues. You have siloed data. You have information in operational data stores and data marts. It became a hodgepodge, and now organizations are starting their journey to migrate to the cloud. One of the things we do first is that inspection of the data. First and foremost, we even look to retire legacy data sources that aren't used across the enterprise but stayed there because they were part of long-running operational on-premise technology. When we start to look at data pipelines as we onboard a customer, we want to do that error checking. We want to do QA, quality assurance, so that we can, and this is our ultimate goal, eliminate the garbage-in, garbage-out scenarios that we've been plagued with over the last 40, 50 years of data in general. So we have to take an inspection approach. Traditionally it was ETL; now, in the world of Snowflake, it's really ELT. We're extracting, we're loading, we're inspecting, then we're transforming out to the business, so that these routines can be done once and, again, give great business value back to making decisions around the data, instead of spending all this time re-architecting the data pipeline to serve the business.
Got it. Thank you, Glenn. Now, of course, Snowflake's renowned for, I mean, customers tell me all the time, it's so easy. It's so easy to spin up a data warehouse. It helps with my security. Again, it simplifies everything. But getting started is one thing; adoption is also key. So I'm interested in the role that Io-Tahoe plays in accelerating adoption for new customers.
Absolutely, Dave. I mean, as Glenn said, every migration to Snowflake is going to have a business case, and that is going to be partly about reducing spend on legacy IT: servers, storage, licenses, support. All those good things that CIOs want to be able to turn off entirely, ultimately. And what Io-Tahoe does is help discover all the legacy, undocumented silos that have been built up, as Glenn says, across the data estate over a period of time, build intelligence around those silos, and help reduce those legacy costs sooner by accelerating that whole process. Because obviously the quicker that IT and CDOs can turn off legacy data sources, the more funding and resources are going to be available to them to manage the new Snowflake-based data estate on the cloud. And so turning off the old and building the new go hand in hand to make sure those numbers stack up, the program is delivered and the benefits are delivered.
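As a minimal sketch of the extract-load-inspect-transform flow Glenn describes inside Snowflake: raw data lands as-is, an inspection step routes failing rows to an exceptions table, and only passing rows are promoted to the curated layer. The schema, table names and the five-digit zip rule are illustrative assumptions, not a description of Io-Tahoe's or Snowflake's actual implementation.

```sql
-- "EL": raw data is loaded as-is into a landing table.
CREATE TABLE IF NOT EXISTS staging.customers_raw (
  customer_id STRING,
  email       STRING,
  zip_code    STRING,
  loaded_at   TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

-- Inspect: route rows that fail simple quality rules to an exceptions table for review.
CREATE OR REPLACE TABLE staging.customers_exceptions AS
SELECT *, 'invalid zip or missing email' AS failure_reason
FROM staging.customers_raw
WHERE NOT COALESCE(REGEXP_LIKE(zip_code, '^[0-9]{5}$'), FALSE)
   OR email IS NULL;

-- "T": only rows that pass are transformed into the curated layer the business consumes.
CREATE OR REPLACE TABLE curated.customers AS
SELECT customer_id, LOWER(email) AS email, zip_code
FROM staging.customers_raw
WHERE REGEXP_LIKE(zip_code, '^[0-9]{5}$')
  AND email IS NOT NULL;
```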
And so what we're doing here with Io-Tahoe is improving the customer's ROI by accelerating their ability to adopt Snowflake.
Great, and we're talking a lot about data quality here, but in a lot of ways that's table stakes. Like I said, if you don't trust the data, nobody's going to use it. And Glenn, I mean, I look at Snowflake and I see obviously the ease of use, the simplicity, you guys are nailing that. The data sharing capabilities, I think, are really exciting, because everybody talks about sharing data, but then we talk about data as an asset and everybody wants to hold it. And so sharing is something that I see as a paradigm shift, and you guys are enabling that. So what are the things beyond data quality that are notable, that customers are excited about, that maybe you're excited about?
Dave, I think you just called it out. It's this massive data sharing play as part of the Data Cloud platform. Just as of last year, we had a little over about 100 vendors in our data marketplace. That number today is well over 450, and it is all about democratizing and sharing data in a world that is no longer held back by FTPs and CSVs and the organization having to take that data and ingest it into their systems. If you're a Snowflake customer and you want to subscribe to an S&P data source, as an example, you go subscribe to it and it's in your account. There was no data engineering, there was no physical lift of data, and that becomes the most important thing when we talk about getting broader insights. Data quality? Well, the data's already been inspected by your vendor; it's just available in your account. It's obviously a very simplistic way to describe what our founders have created behind the scenes to make it very, very easy for us to democratize, not only internally with private sharing of data, but with this notion of the marketplace and sharing across your customers. The marketplace is certainly at the top of all of my customers' minds. Another area that you might have heard about out of our recent Cloud Summit is the introduction of Snowpark. Where all this data is going is toward ML and AI, along with our partners at Io-Tahoe and RPA automation: what do we do with all this data? How do we apply the algorithms to it? In the future we'll be able to run R and Python scripts and Java libraries directly inside Snowflake, which lets you accelerate even faster than what people found traditionally when we started off eight years ago just as a data warehousing platform.
Yeah, I think we're on the cusp of just a new way of thinking about data. I mean, obviously simplicity is a starting point, but data by its very nature is decentralized. You talk about democratizing data. I like this idea of the global mesh. I mean, it's a very powerful concept. And again, it's early days, but a key part of this is automation and trust. Yusef, you've worked with Snowflake and you're bringing ActiveDQ to the market. What are customers telling you so far?
Well, Dave, I mean, the feedback so far has been great, which is brilliant. So firstly, there's a point about speed and acceleration. That's the speed to insights, really. Where you have inherent data quality issues, whether that's with data that was on-premise and being brought into Snowflake or on Snowflake itself, we're able to show the customer results and help them understand their data quality better within day one, which is a fantastic acceleration.
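A minimal sketch of the "no data engineering, no physical lift" consumption pattern Glenn describes: once a provider's share or marketplace listing is available to your account, you mount it as a read-only database and query it in place alongside your own tables. The provider, share and table names here are hypothetical placeholders.

```sql
-- See which shares have been made available to this account.
SHOW SHARES;

-- Mount a provider's share as a database; no copy of the data is made or maintained.
CREATE DATABASE market_reference
  FROM SHARE provider_account.market_data_share;  -- hypothetical provider and share

-- Query the shared, provider-maintained data immediately, next to your own tables.
SELECT r.ticker, r.close_price, p.position_qty
FROM market_reference.public.daily_prices AS r        -- shared data
JOIN analytics.public.portfolio_positions AS p        -- your own data
  ON r.ticker = p.ticker;
```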
Related to that, there's the cost and effort to get those insights; it's a massive productivity gain versus what you see with customers who've sometimes been struggling to remediate legacy data and legacy decisions that they've made over the past couple of decades. So that cost and effort is much lower than it would otherwise have been. Thirdly, there's confidence and trust. You can see CDOs and CIOs have got demonstrable results, that they've been able to improve data quality across a whole bunch of use cases for business users in marketing, in customer services, for commercial teams, for financial teams. So there's that very quick growth in confidence and credibility as the projects get moving. And then finally, really all the use cases for Snowflake depend on data quality, whether it's data science or the kind of Snowpark applications that Glenn has talked about. All those use cases work better when we're able to accelerate the ROI for our joint customers by very quickly pushing out these data quality insights. And I think one of the things that Snowflake has recognized is that in order for CIOs to really adopt it enterprise-wide, as well as the great technology that Snowflake offers, it's about cleaning up that legacy data estate, freeing up the budget for CIOs to spend on the new, modern data estate that lets them mobilize their data with Snowflake.
So you're seeing this kind of progression: simplifying the analytics from a tech perspective, you bring in federated governance, which brings more trust, then you bring in the automation of the data quality piece, which is fundamental. And now you can really start to, as you guys were saying, democratize and scale and share data. Very powerful, guys. Thanks so much for coming on the program. Really appreciate your time.
Yeah, thank you. I appreciate it as well.
Tired of performing manual data quality reviews, dealing with data incidents and spending valuable time on manual data remediation? Re-establish trust in your data with Io-Tahoe's data RPA technology. AI-driven digital workers, packaged into a seamless user experience, connect directly to all your data sources. They automate repetitive, laborious tasks like data discovery, data cataloging, data mapping, data lineage, data enrichment, data de-duplication and data remediation. A specialized ActiveDQ digital worker provides continuous, automated data quality assessments for data producers, powered by machine learning, to ensure data is fit for consumption by data consumers on your Snowflake data cloud. ActiveDQ alerts data producers when data anomalies are detected, providing data consumers with continuous reporting of trends and analysis across your data quality KPIs. ActiveDQ then proactively generates recommendations for auto-remediation to accelerate data quality improvements and reduce the cost and business impact of poor-quality data. Ready to accelerate your data modernization journey to Snowflake's data cloud? Start with our low-cost, minimal-effort data mobilization for Snowflake package and achieve key cloud migration and data quality improvement milestones in hours. Download the brief to learn more and book time with an Io-Tahoe engineer now.
Okay, now we're going to look at the role automation plays in mobilizing your data on Snowflake. Let's welcome in Duncan Turnbull, who's a partner sales engineer at Snowflake, and Ajay Vohora is back, CEO of Io-Tahoe. He's going to share his insights. Gentlemen, welcome.
Thank you, David. Good to be back.
Yeah, it's great to have you back, AJ. And it's really good to see Io-Tahoe expanding the ecosystem, so important. Now, of course, you're bringing Snowflake in. It looks like you're really starting to build momentum. I mean, there's progress that we've seen every month, month by month, over the past 12, 14 months. Your seed investors, they've got to be happy.
They are, they're happy, and they can see that we're running into a nice phase of expansion here, with customers signing up, and we're ready to go out and raise that next round of funding. Maybe think of us like Snowflake five years ago. So we're definitely on track with that. There's no shortage of interest from investors, and right now we're trying to focus in on those investors that can partner with us and understand AI, data and automation.
Well, so personally, I mean, you've managed a number of early-stage VC funds, I think four of them, and you've taken several software companies through many funding rounds and growth, all the way to exit. So you know how it works. You get product-market fit, you've got to make sure you get your KPIs right and you've got to hire the right salespeople, but what's different this time around?
Well, the fundamentals that you mentioned, those have never changed. What I can see that's different, that's shifted this time around, is three things. One, there used to be this kind of choice of, do we go open source or do we go proprietary? Now that has turned into a nice hybrid model, and we've really keyed into Red Hat doing something similar with CentOS. The idea here is that there is a core capability, a technology, a platform to depend on, but it's the ability to then build an ecosystem around that, made up of a community. That community may include customers, technology partners, other tech vendors, and enabling platform adoption means all of those folks in that community can build and contribute while still maintaining the core architecture and platform integrity. That's one thing that's changed; we're seeing a lot of that type of software company emerge into that model, which is different from five years ago. Then there's leveraging the cloud, every cloud, the Snowflake cloud being one of them here, in order to meet customers in enterprise software where they're moving. Every CIO is now in some configuration of a hybrid IT estate, whether that is cloud, multi-cloud or on-prem; that's just the reality. The other piece is in dealing with the CIO's legacy. Over the past 15, 20 years, they've purchased many different platforms and technologies, and some of those are still established and still running. So how do you enable that CIO to make a purchase while still preserving, and in some cases building on and extending, the legacy technologies that they've invested people's time, training and financial investment into?
Yeah, of course, you know, solving a problem, a customer pain point, with technology never goes out of fashion. No, that never changes. You have to focus like a laser on that. And of course, speaking of companies who are focused on solving problems, Duncan Turnbull from Snowflake, you guys have really done a great job of brilliantly addressing pain points, particularly around data warehousing, simplifying that, and you're providing this new capability around data sharing, really quite amazing. Duncan, AJ talks about data quality and customer pain points in enterprise IT.
Why has data quality been such a problem historically?
Sure, so one of the biggest challenges in the past is that, to address everyone's need for using data, organizations have evolved all these different places to store it, all these different silos or data marts, this whole proliferation of places where data lives. And all of those end up with slightly different schedules for bringing data in and out. They end up with slightly different rules for transforming that data, formatting it and getting it ready, and slightly different quality checks for making use of it. And this then becomes a big problem, in that these different teams are going to have slightly different, or even radically different, answers to the same kinds of questions, which makes it very hard for teams to work together on the different data problems that exist inside the business, depending on which of these silos they end up looking at. Whereas if you have a single, scalable system for putting all of your data into, you can sidestep a lot of this complexity and you can address the data quality issues in a single way.
Now, of course, we're seeing this huge trend in the market towards robotic process automation, RPA. That adoption is accelerating. You see it in UiPath's IPO, you know, a 35-plus billion dollar valuation, Snowflake-like numbers, nice comps there for sure. AJ, you've coined the phrase data RPA. What is that in simple terms?
Yeah, I mean, it was born out of seeing how, in our ecosystem across that community, developers, customers and general business users were wanting to adopt and deploy our technology. And we could see that, I mean, it's not marketing RPA, we're not trying to automate that piece, but wherever there is a process that was tied into some form of manual overhead, with handovers and so on, that process is something that we're able to automate with our technology. And applying AI and machine learning technology specifically to those data processes, almost as a precursor to getting into financial automation, that's really where we're seeing the momentum pick up, especially in the last six months. And we've kept it really simple with Snowflake. We kind of stepped back and said, well, the resource that Snowflake can leverage here is the metadata. So how could we turn Snowflake into that repository, into being the data catalog? And by the way, if you're a CIO looking to purchase a data catalog tool, stop, there's no need to. Working with Snowflake, we enable that intelligence to be gathered automatically and to be put to use within Snowflake. So we're reducing that manual effort and putting that data to work. And that's where we've packaged this with our AI and machine learning, specific to those data tasks. And it made sense. That's what's resonated with our customers.
You know what's interesting here, just a quick aside, is I've been watching Snowflake now for a while. And of course, the competitors come out and maybe criticize: well, they don't have this feature, they don't have that feature. And Snowflake seems to have an answer. And the answer oftentimes is, well, it's the ecosystem. The ecosystem is going to bring that, because we have a platform that's so easy to work with. So I'm interested, Duncan, in what kind of collaborations you are enabling with high-quality data and, of course, your data sharing capability.
Yeah, so I think the ability to work on datasets isn't just limited to inside the business itself, or even between different business units. As we were kind of discussing with those silos before, when looking at this idea of collaboration, we have these challenges where we want to be able to exploit data to the greatest degree possible, but we need to maintain the security, the safety, the privacy and the governance of that data. It could be quite valuable. It could be quite personal, depending on the application involved. One of these novel applications of data sharing that we see between organizations is this idea of data clean rooms. These data clean rooms are safe, collaborative spaces which allow multiple companies, or even divisions inside a company where they have particular privacy requirements, to bring two or more datasets together for analysis, but without having to actually share the whole unprotected dataset with each other. And when you do this inside Snowflake, you can collaborate using standard tool sets. You can use all of our SQL ecosystem. You can use all of the data science ecosystem that works with Snowflake. You can use all of the BI ecosystem that works with Snowflake. But you can do that in a way that keeps the confidentiality that needs to be preserved inside the data intact. And you can only really do these kinds of collaborations, especially cross-organization, but even inside large enterprises, when you have good, reliable data to work with; otherwise your analysis just isn't going to work properly. A good example of this is one of our large gaming customers, who is an advertiser. They were able to build targeted ads to acquire customers and measure the campaign impact and revenue, but they were able to keep their data safe and secure while doing that, while working with advertising partners. The business impact was that they were able to get a lift of 20 to 25% in campaign effectiveness due to better targeting, and that actually pulled through into a reduction in customer acquisition costs, because they just didn't have to spend as much on the forms of media that weren't working for them.
So A.J., I wonder, I mean, with the way public policy is shaping out, obviously GDPR started it, then the states with the California Consumer Privacy Act, and people are sort of taking the best of those and there's a lot of differentiation, but what are you seeing just in terms of governments really driving this move to privacy?
Yeah, government, public sector, we're seeing a huge wake-up and a lot of activity across the whole piece there. Part of it has been data privacy. The other part of it is being more joined up and more digital, rather than paper- or form-based. We've all got stories of waiting in line, holding a form, taking that form to the front of the line and handing it over a desk. Now, government and the public sector are really looking to transform their services into online, digital self-service. And that whole shift is then driving the need to emulate a lot of what the commercial sector is doing to automate their processes and to unlock the data from silos to feed into those processes. Another thing I can say about this is that the need for data quality, as Duncan mentions, underpins all of these processes: government, pharmaceuticals, utilities, banking, insurance; the ability for a chief marketing officer to drive a loyalty campaign; the ability for a CFO to reconcile accounts at the end of the month and do a quick, accurate financial close.
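A minimal sketch of one building block behind the clean-room pattern Duncan describes: the provider exposes only a secure, aggregated view over its raw event data and shares that view, so a partner can analyze campaign overlap without ever seeing the underlying rows. The account, database, table and share names are hypothetical, and production clean rooms layer additional controls (row access policies, approved query patterns) on top of this.

```sql
-- Provider side: raw, customer-level data stays private; only an aggregate view is exposed.
CREATE SECURE VIEW sharing.public.campaign_overlap AS
SELECT SHA2(email)          AS hashed_id,      -- pseudonymised join key
       COUNT(*)             AS impressions,
       SUM(conversion_flag) AS conversions
FROM raw_events.public.ad_impressions
GROUP BY SHA2(email);

-- Expose only the secure view through a share; the partner never receives the raw table.
CREATE SHARE campaign_measurement;
GRANT USAGE  ON DATABASE sharing                      TO SHARE campaign_measurement;
GRANT USAGE  ON SCHEMA sharing.public                 TO SHARE campaign_measurement;
GRANT SELECT ON VIEW sharing.public.campaign_overlap  TO SHARE campaign_measurement;
ALTER SHARE campaign_measurement ADD ACCOUNTS = partner_org_account;  -- hypothetical account
```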
Also the ability of customer operations to make sure that the customer has the right details about themselves in the right application that they consume it from. All of that is underpinned by data and is effective, or not, based on the quality of that data. So whilst we're mobilizing data to the Snowflake cloud, the ability to then drive analytics, prediction and business processes off that cloud succeeds or fails on the quality of that data.
I mean, and you know, I would say it really is table stakes. If you don't trust the data, you're not going to use the data. The problem is it always takes so long to get to the data quality; there are all these endless debates about it. So we've been doing a fair amount of work and thinking around this idea of decentralized data. Data by its very nature is decentralized, but the flaw of traditional big data is that everything is just monolithic: the organization's monolithic, the technology's monolithic, the roles are very hyper-specialized. And so you're hearing a lot more these days about this notion of a data fabric, or what Zhamak Dehghani calls a data mesh. And we've kind of been leaning into that and the ability to connect various data capabilities, whether it's a data warehouse or a data hub or a data lake, so that those assets are discoverable, they're shareable through APIs and they're governed on a federated basis, and you're now bringing in machine intelligence to improve data quality. I wonder, Duncan, if you could talk a little bit about Snowflake's approach to this topic.
Sure, so I'd say that making use of all of your data is the key driver behind these ideas of data meshes or data fabrics. And the idea is that you want to bring together not just your strategic data, but also your legacy data and everything that you have inside the enterprise. I'd also like to expand upon what a lot of people view as "all of the data." I think a lot of people miss that there's this whole other world of data that they could be having access to, which is things like data from their business partners, their customers, their suppliers, and even stuff that's more in the public domain, whether that's demographic data or geographic data or all these other types of data sources. And what I'd say, to some extent, is that the Data Cloud really facilitates the ability to share and gain access to this, both between organizations and inside organizations, and you don't have to make lots of copies of the data and worry about the storage, and this federated idea of governance, and all these things that are quite complex to manage. The Snowflake approach really enables you to share data with your ecosystem or the world without any latency, with full control over what's shared, without having to introduce new complexities or have complex interactions with APIs or software integration. The simple approach that we provide allows a relentless focus on creating the right data product to meet the challenges facing your business today.
So A.J., the key here, in my mind anyway, my takeaway, is simplicity. If you can take the complexity out of the equation, you're going to get more adoption. It really is that simple.
Yeah, absolutely. And I think that whole journey, maybe five, six years ago, the adoption of data lakes was a stepping stone.
However, the Achilles' heel there was the complexity it shifted towards consuming that data from a data lake, with many, many sets of data to curate and consume. Whereas actually the simplicity of being able to go to the data that you need to do your role, whether you're in tax compliance or in customer services, is key. And listen, for Snowflake and Io-Tahoe, one thing we know for sure is that our customers are super smart and very capable, they're data savvy, and they'll want to use whichever tool, embrace whichever cloud platform, is going to reduce the barriers to solving what's complex about that data, simplifying it and using good old-fashioned SQL to access data and to build products from it, to exploit that data. So simplicity is key to enabling people to make use of that data, and CIOs recognize that.
So Duncan, the cloud obviously brought in this notion of DevOps, and new methodologies and things like Agile have brought in the notion of DataOps, which is a very hot topic right now. Basically DevOps applied to data. But how does Snowflake think about this? How do you facilitate that methodology?
Yeah, so I'd agree with you absolutely there. DataOps takes these ideas of agile development, of agile delivery and of the kind of DevOps world that we've seen just rise and rise, and it applies them to the data pipeline, which is somewhere it traditionally hasn't happened. And it's the same kinds of messages as we see in the development world: it's about delivering faster development, having better repeatability and really getting towards that dream of the data-driven enterprise, where you can answer people's data questions so they can make better business decisions. And we have some really great architectural advantages that allow us to do things like cloning data sets without having to copy them, and things like Time Travel, so we can see what data looked like at some point in the past. And this lets you set up your own kind of little data playpen as a clone, without really having to copy all of that data, so it's quick and easy. And you can also, again with our separation of storage and compute, provision your own virtual warehouse for dev usage, so you're not interfering with anything to do with people's production usage of this data. So these ideas, this scalability, just make it easy to make changes, test them and see what the effects of those changes are. And we've actually seen this; you were talking a lot about partner ecosystems earlier. The partner ecosystem has taken these ideas that are inside Snowflake and extended them, integrated them with DevOps and DataOps tooling: things like version control in Git, and infrastructure automation with things like Terraform. And they've built that out into more of a DataOps product that you can make use of. So we can see there's a huge impact of these ideas coming into the data world. We think we're really well placed to take advantage of them, and the partner ecosystem has been doing a great job of doing that. And it really allows us to change that operating model for data, so that we don't have as much emphasis on hierarchy and change windows and all these kinds of things that are maybe viewed as old-fashioned. And we're taking this shift from batch data integration into streaming, continuous data pipelines in the cloud.
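A minimal sketch of the DataOps building blocks Duncan mentions, zero-copy cloning, a separate dev warehouse and Time Travel, using hypothetical database and table names; the statements themselves are standard Snowflake SQL.

```sql
-- Spin up an isolated "data playpen": a zero-copy clone of the production database.
CREATE DATABASE analytics_dev CLONE analytics_prod;

-- Give developers their own compute, so tests never contend with production warehouses.
CREATE WAREHOUSE dev_wh WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
USE WAREHOUSE dev_wh;

-- Time Travel: compare the current row count with how the same table looked an hour ago.
SELECT COUNT(*) AS rows_now
FROM analytics_prod.public.orders;

SELECT COUNT(*) AS rows_one_hour_ago
FROM analytics_prod.public.orders AT (OFFSET => -60*60);
```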
And this kind of gets you away from a once-a-week, or once-a-month if you're really unlucky, change window, to pushing changes in a much more rapid fashion as the needs of the business change.
I mean, those hierarchical organizational structures, when we apply them to big data, that's what actually creates the silos. So if you're going to be a silo buster, and AJ, I look at you guys as silo busters, you've got to put data in the hands of the domain experts, the business people. They know what data they want, and it defeats the purpose if they have to go beg and borrow for new data sets, et cetera. And so that's where automation becomes so key. And frankly, the technology should be an implementation detail, not the dictating factor. I wonder if you could comment on this.
Yeah, absolutely. I think making the technologies more accessible to the general business users, or those specialist business teams, is the key to unlocking this. And it's interesting to see, as people move from organization to organization, where they've had those experiences of operating in a hierarchical sense, they want to break free from that; they've been exposed to automation and continuous workflows. Change is continuous in IT. It's continuous in business. The market's continuously changing. So having that flow of work across the organization, using key components such as GitHub and similar tools to direct the process, Terraform to build code into the process and automation, and Io-Tahoe leveraging all the metadata from across those fragmented sources, it's good to see how those things are coming together, and to watch people move from organization to organization and say, hey, okay, I've got a new start. I've got my first 100 days to impress my new manager. What kind of an impact can I bring to this? And quite often we're seeing that as: let me take away the good learnings, how to do it or how not to do it, from my previous role, and this is an opportunity for me to bring in automation. And I'll give you an example, David. We recently started working with a client in financial services, an asset manager, managing financial assets that have grown over the course of the last 10 years through M&A. Each of those acquisitions has brought with it its own technical debt and its own set of data, so they now have multiple CRM systems, multiple databases, multiple bespoke, in-house-created applications. And when the new CIO came in and had a look at this, he thought, well, yes, I want to mobilize my data. Yes, I need to modernize my data estate, because my CEO is now looking at these crypto assets that are on the horizon, and the new funds that are emerging around digital assets and crypto assets. But in order to get to that, where data absolutely is the core asset that everything depends on, cleaning up that legacy situation and mobilizing the relevant data into the Snowflake Cloud platform is where we're giving time back. That is now taking a few weeks, and that transition to mobilize the data, to start with a new, clean slate and build a new business on it as a digital crypto asset manager, alongside the legacy traditional financial assets, bonds, stocks, fixed-income assets, you name it, is where we're starting to see a lot of innovation.
Tons of innovation. I love the crypto examples, NFTs are exploding, and that's the pace at which traditional banks are getting disrupted.
And so I also love this notion of data RPA, especially because, AJ, I've done a lot of work in the RPA space, and what I would observe is that the early days of RPA, I call it paving the cow path: taking existing processes, applying scripts, letting software robots do their thing. And that was good, because it reduced mundane tasks, but really where it's evolved is to a much broader automation agenda. People are discovering new ways to completely transform their processes. And I see a similar analogy for the data operating model. So I'm wondering what you think about that, and how a customer really gets started bringing this to their ecosystem, their data life cycles.
Sure, yeah, step one is always the same: figuring out, for the CIO, the chief data officer, what data do I have? And that's increasingly something that they want to automate. So we can help them there and do that automated data discovery, whether that is documents in a file share, a backup archive, a relational data store or a mainframe, really quickly hydrating that and bringing that intelligence to the forefront of what do I have? And then it's the next step of, well, okay, now I want to continually monitor and curate that intelligence with the platform that I've chosen, let's say Snowflake, so that I can then build applications on top of that platform to serve my internal and external customers. Then there's the automation around classifying data, reconciliation across different fragmented data silos, and building those insights into Snowflake. As you'll see a little later on when we're talking about data quality, ActiveDQ allows us to reconcile data from different sources as well as look at the integrity of that data, to then go on to remediation. I want to harness and leverage techniques around traditional RPA, but to get to that stage, I need to fix the data. So remediating, publishing the data in Snowflake, allowing analysis to be performed in Snowflake, those are the key steps that we see. And just shrinking that timeline into weeks, giving the organization that time back, means they're spending more time on their customer and solving their customer's problem, which is where we want them to be.
Well, I think this is the brilliance of Snowflake, actually. Duncan, I've talked to Benoit Dageville about this, and your other co-founders, and it's really that focus on simplicity. So, I mean, you picked a good company to join, in my opinion. I wonder, A.J., if you could talk about some of the industry sectors that are going to gain the most from data RPA. I mean, with traditional RPA, if I can use that term, you know, a lot of it was back office, a lot of it financial. What are the practical applications where data RPA is going to impact businesses, and the outcomes that we can expect?
Yeah, so our drive is really to make the general business user's experience of RPA simpler, and to use no code to do that, where they've also chosen Snowflake to build their platform on. They've then got the combination of using relatively simple scripting techniques, such as SQL, with our no-code approach. And the answer to your question is: whichever sector is looking to mobilize their data. It seems like a cop-out, but to give you some specific examples, David: in banking, where our customers are looking to modernize their banking systems and enable better customer experiences through applications and digital apps, that's where we're seeing a lot of traction in this approach of applying RPA to data.
Healthcare, where there is a huge amount of work to do to standardize data sets across providers, payers and patients, and it's an ongoing process there. For retail, we're helping to build that immersive customer experience: recommending next best actions, providing an experience that is going to drive loyalty and retention. That's dependent on understanding what that customer's needs and intent are, and being able to provide them with the content or the offer at that point in time, and it's all data-dependent. Utilities is another one; there's a great overlap there with Snowflake, where we're helping utilities, telecoms, energy and water providers to build services on their data. And this is where the ecosystem just continues to expand. If we're helping our customers turn their data into services for their ecosystem, that's exciting. And nowhere is it more exciting than insurance, which I always used to think of as very dull and mundane. Actually, that's where we're seeing huge amounts of innovation to create new, flexible products that are priced to the day, to the situation, and risk models that are adaptive when the data changes on events or circumstances. So across all those sectors, they're all mobilizing their data, they're all moving in some way or form to a multi-cloud setup with their IT. And I think Snowflake and Io-Tahoe being able to accelerate that and make that journey simple and less complex is why we've found such a good partner here.
All right, thanks for that. And thank you both, we've got to leave it there. Really appreciate you coming on, Duncan, and AJ, best of luck with the fundraising. We'll keep you posted.
Thanks, David.
All right, great. Okay, now let's take a look at a short video that's going to help you understand how to reduce the steps around your DataOps. Let's watch.
With legacy, traditional data catalog software, it is commonly known that there are five steps to completing an enterprise data catalog. These steps are: number one, interviewing stakeholders again and again to obtain context and understand how data is consumed. Number two, manually creating the catalog by personally inspecting, finding and logging data sets, where they're stored and how they are consumed. Number three, manually organizing and reorganizing the categorization, classification and usage of each data element. Number four, connecting the dots by manually linking data elements with each other to show relationships, and producing a business glossary of terms and governance policies that the organization should adhere to. Then comes the task of wrangling every data consumer together under these rules, training them on these policies and ensuring quality by repeatedly, manually re-inspecting and checking data sources. The obvious problem with these steps is that they do not work at enterprise scale. Imagine thousands and thousands of data stores that have to be combed through and then cross-referenced by dozens of employees working across different departments, often remotely, and all manually. Not only that, but you have more data coming through the pipeline all the time and constant changes to business systems. So most data professionals find themselves drowning in a proliferation of data with no feasible way to keep up with the task of data governance.
Traditional data catalogs exacerbate these challenges by attempting to scale up resources with more people, implementing an operating model for data governance that can only scale with manual effort, and requiring expensive implementations for which there is no benefits realization for at least 12 months. Adoption by business users is slow, time-consuming and difficult to coordinate, as many business users do not want to take on the manual effort; naturally, most of their time has to be allocated to their primary role. This means high labor costs persist even after implementation. Even when you bring in external consultants to help your team, there is no time for business users to attend meetings, manage the project and participate in adoption of the manual operating model. So how are businesses supposed to have the ability to make agile, data-driven decisions and achieve data modernization if their data governance operating model is manual and every step is managed as a waterfall plan? Turn five steps into one step with the Io-Tahoe platform. Get ahead by automating discovery, cataloging, mapping, enrichment, lineage and data quality assessment in one go. Data RPA, built on advanced algorithms, machine learning and AI, enables our digital workers to perform all five of the repetitive, traditionally manual steps so you don't have to. Get insights in hours, achieve results in days and complete projects accurately with less manual overhead. Generate a holistic view of data across all your system silos, with a single version of the truth that you can trust. See what Io-Tahoe can do for your organization and sign up for our minimal-cost, commitment-free data health check. Let us run our automated data discovery on key unmapped data silos and sources to give you a clear understanding of what's in your environment. Book time with an Io-Tahoe engineer now.
Are you ready to see ActiveDQ on Snowflake in action? Let's get into the show and tell and do the demo with TJ Matthew, data solutions engineer at Io-Tahoe. Also joining us are Patrick Zymet, data solutions engineer at Io-Tahoe, and Centville Nitin Karpaya, who's the head of production engineering at Io-Tahoe. Patrick, over to you. Let's see it.
Hey Dave, thanks so much. Yeah, we've seen a huge increase in the number of organizations interested in a Snowflake implementation who are looking for an innovative, precise and timely method to ingest their data into Snowflake. And where we are seeing a lot of success is a ground-up method utilizing both Io-Tahoe and Snowflake. To start, you define your as-is model by leveraging Io-Tahoe to profile your various data sources and push the metadata to Snowflake, meaning we create a data catalog within Snowflake as a centralized location to document items such as sources and owners, allowing you to have those key conversations and understand the data's lineage, potential blockers and what data is readily available for ingestion. Once the data catalog is built, you have a much more dynamic strategy surrounding your Snowflake ingestion. And what's great is that while you're working through those key conversations, Io-Tahoe will maintain that metadata push, and paired with Snowflake's ability to version the data, you can easily incorporate potential schema changes along the way, making sure that the information that you're working on stays as current as the systems that you're hoping to integrate with Snowflake.
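A minimal sketch of what a profiled-metadata catalog living inside Snowflake could look like, with each discovery run merged in so the catalog tracks schema changes over time. The GOVERNANCE database, column names and staging table are illustrative assumptions, not Io-Tahoe's actual schema.

```sql
-- A simple catalog table: one row per discovered data element, maintained by the profiler.
CREATE TABLE IF NOT EXISTS governance.catalog.data_elements (
  source_system   STRING,          -- e.g. 'legacy CRM', 'mainframe extract'
  object_name     STRING,          -- table, file or API the element came from
  element_name    STRING,          -- column / field name
  inferred_type   STRING,          -- what profiling inferred (zip code, email, ...)
  business_owner  STRING,
  pii_flag        BOOLEAN,
  profiled_at     TIMESTAMP_NTZ
);

-- Each profiling run merges fresh results in, so the catalog stays current with the sources.
MERGE INTO governance.catalog.data_elements t
USING governance.catalog.latest_profile_run s      -- staged output of the latest scan
  ON t.source_system = s.source_system
 AND t.object_name   = s.object_name
 AND t.element_name  = s.element_name
WHEN MATCHED THEN UPDATE SET
  t.inferred_type = s.inferred_type,
  t.pii_flag      = s.pii_flag,
  t.profiled_at   = s.profiled_at
WHEN NOT MATCHED THEN INSERT
  (source_system, object_name, element_name, inferred_type, business_owner, pii_flag, profiled_at)
VALUES
  (s.source_system, s.object_name, s.element_name, s.inferred_type, s.business_owner, s.pii_flag, s.profiled_at);
```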
Nice. Patrick, I wonder if you can address how the Io-Tahoe platform scales, and maybe in what way it provides a competitive advantage for customers.
Great question. Where Io-Tahoe shines is through its ActiveDQ, or the ability to monitor your data's quality in real time, marking which rows need remediation according to the customized business rules you can set, ensuring that data quality standards meet the requirements of your organization. What's great is that, through our use of RPA, we can scale with an organization. So as you ingest more data sources, we can allocate more robotic workers, meaning the results will continue to be delivered in the same timely fashion you've grown used to. What's more, since Io-Tahoe is doing the heavy lifting on monitoring data quality, that frees up your data experts to focus on the more strategic tasks, such as remediation, data augmentation and analytics development.
Okay, so maybe, TJ, you could address this. I mean, how does all this automation change the operating model? We were talking to AJ and Duncan before about that. I mean, if it involves fewer people and more automation, what else can I do in parallel?
You see, Dave, I'm sure the participants today will also be asking the same question. Let me start with the strategic tasks Patrick mentioned. Io-Tahoe does the heavy lifting, freeing up data experts to act upon the data events generated by Io-Tahoe. Companies that have teams focused on manually building their inventory of the data landscape see longer turnaround times in producing actionable insights from their own data assets, thus diminishing the value realized by traditional methods. Our operating model, however, involves profiling and remediating at the same time, creating a cataloged data estate that can be used by the business or IT accordingly. With increased automation and fewer people, our machine learning algorithms augment the data pipeline to tag and capture the data elements into a comprehensive data catalog. As Io-Tahoe automatically catalogs the data estate in a centralized view, the data experts can focus in parallel on remediating the data events generated from validating against business rules. We envision that data events, coupled with this drillable and searchable view, give a comprehensive way to assess the impact of bad-quality data. Let's briefly look at the image on screen. For example, the view indicates that bad-quality zip code data impacts the contact data, which in turn impacts other related entities and systems. Now contrast that with a manually maintained spreadsheet that drowns out the main focus of your analysis.
TJ, how do you tag and capture bad-quality data and, you've mentioned these different dependencies, how do you stop that from flowing downstream into the processes, the applications or the reports?
As Io-Tahoe builds the data catalog across source systems, we tag the elements that meet the business rule criteria while segregating the failed data examples associated with elements that fall below a certain threshold. The elements that meet the business rule criteria are tagged to be searchable, thus providing an easy way to identify data elements that may flow through the system. The segregated data examples, on the other hand, are used by data experts to triage for the root cause.
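A minimal sketch of the tag-and-segregate pattern TJ describes, using a zip code conformity rule as the example. The table names, the 95% threshold and the rule itself are hypothetical, and ActiveDQ manages such rules through its own machine learning and RPA rather than hand-written SQL like this.

```sql
-- Evaluate one business rule: zip codes must be exactly five digits.
CREATE OR REPLACE VIEW dq.checks.zip_code_conformity AS
SELECT customer_id,
       zip_code,
       COALESCE(REGEXP_LIKE(zip_code, '^[0-9]{5}$'), FALSE) AS passes_rule
FROM curated.public.contacts;

-- Segregate the failing examples so data experts can triage them for root cause.
CREATE TABLE IF NOT EXISTS dq.exceptions.zip_code_failures (
  customer_id STRING, zip_code STRING, detected_at TIMESTAMP_NTZ
);
INSERT INTO dq.exceptions.zip_code_failures
SELECT customer_id, zip_code, CURRENT_TIMESTAMP()
FROM dq.checks.zip_code_conformity
WHERE NOT passes_rule;

-- Tag the element as compliant or non-compliant based on a pass-rate threshold (here 95%).
SELECT IFF(AVG(IFF(passes_rule, 1, 0)) >= 0.95, 'COMPLIANT', 'NON-COMPLIANT') AS zip_code_status
FROM dq.checks.zip_code_conformity;
```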
Based on the root cause, the potential outcomes could be: one, changes in the source system to prevent bad data from entering the system in the first place; two, adding data pipeline logic to sanitize bad data so it isn't consumed by downstream applications and reports; or three, simply accepting the risk of storing bad data and addressing it when it reaches a certain threshold. However, Dave, as for your question about preventing bad-quality data from flowing into the system, Io-Tahoe will not prevent it, because the controls on data flowing between systems are managed outside of Io-Tahoe, although Io-Tahoe will alert and notify the data experts of events that indicate bad data has entered the monitored assets. Also, we have redesigned our product to be modular and extensible. This allows data events generated by Io-Tahoe to be consumed by any system that wants to control the targets for bad data. Thus, Io-Tahoe empowers the data experts to keep bad data from flowing into their systems.
Thank you for that. I mean, one of the things that we've noticed, and we've written about, is that you've got these hyper-specialized roles within the centralized data organization, and I wonder how the data folks get involved here, if at all, and how frequently they get involved. Maybe you can take that.
Well, based on whether the data element in question is in the data cataloging phase or the monitoring phase, different data folks get involved. When it's in the data cataloging stage, the data governance team, along with enterprise architecture or IT, is involved in setting up the data catalog, which includes identifying the critical data elements, business term identification, definition documentation, data quality rules and data event setup, data domain and business line mapping, lineage, PII tagging, source of truth, and so on and so forth. These are typically a one-time setup: review, certify, then govern and monitor. But when it's in the monitoring phase, during any data incident or data issue, Io-Tahoe broadcasts data signals to the relevant data folks to act and remediate as quickly as possible, and alerts the consumption teams, whether that's data science, analytics or business ops, about the potential issue so that they are aware and can take the necessary preventive measures. Let me show you an example of a critical data element, going from the data quality dashboard view to the lineage view to the data 360-degree view, for a zip code conformity check. So in this case, the zip code did not meet the pass threshold during the technical data quality check, was identified as a non-compliant item, and a notification was sent to the IT folks. Clicking on the zip code takes us to the lineage view to visualize the dependent systems, who produces the data and who all the consumers are. And further drilling down takes us to the detail view, where a lot of other information is presented to facilitate a root cause analysis and take it to a final closure.
Thank you for that. So, TJ, Patrick was talking about the as-is to the to-be. So I'm interested in how it's done now versus before. Do you need a data governance operating model, for example?
Typically, a company that decides to make an inventory of their data assets would start out by manually building a spreadsheet managed by the company's data experts. What started as a draft then gets baked into the model of the company. This leads to a loss of collaboration, as each department makes a copy of the catalog for its specific needs.
This decentralized approach leads to a loss of uniformity, with each department having different definitions, which ironically creates the need for a governance model for the data catalog itself. And as the spreadsheet grows in complexity, the skill level needed to maintain it also increases, leading to fewer and fewer people knowing how to maintain it. Above all, the content that took so much time and effort to build is not searchable outside of that spreadsheet document.
Yeah, I think you really hit the nail on the head, TJ. Now companies want to move away from the spreadsheet approach, and Io-Tahoe addresses the shortcomings of the traditional approach, enabling companies to achieve more with less.
You know, I'm interested in what the customer reaction has been. We had Webster Bank on one of the early episodes, for example. I mean, could they have achieved what they did without something like active data quality and automation? Maybe, Nitin, you could address that.
Sure. It is impossible to achieve full data quality monitoring and remediation without automation, or digital workers, in place. The reality is that enterprises don't have the time to do the remediation manually, because they have to analyze, confirm and fix any data quality issues as fast as possible before they get bigger, and Webster is no exception. That's why Webster implemented Io-Tahoe's ActiveDQ to set up business metadata management and data quality monitoring and remediation in the Snowflake cloud data lake. We helped build the center of excellence in data governance, which manages the data catalog, and scheduled, on-demand and in-flight data quality checks, where Snowflake's Snowpipe and Streams are super beneficial for achieving in-flight quality checks. Then there's the data exception monitoring and reporting. Last but not least, the time saver is persisting the non-compliant records for every data quality run within the Snowflake cloud, along with remediation scripts, so that during any exceptions the respective team members are not only alerted but also supplied with the necessary scripts and tools to perform remediation right from Io-Tahoe's ActiveDQ.
Very nice. Okay, guys, thanks for the demo. Great stuff. Now, if you want to learn more about the Io-Tahoe platform and how you can accelerate your adoption of Snowflake, book some time with a data RPA expert. All you've got to do is click on the demo icon on the right of your screen and set up a meeting. We appreciate you attending this latest episode of the Io-Tahoe Data Automation Series. Look, if you missed any of the content, it's all available on demand. This is Dave Vellante for theCUBE. Thanks for watching.