Hello and welcome. My name is Shannon Kampen. I'm the Chief Digital Officer of DATAVERSITY. We would like to thank you for joining the latest installment of the monthly DATAVERSITY webinar series, Advanced Analytics with William McKnight. Today William will be discussing Data Integration: News Flash, We Still Just Move Data, sponsored this month by Informatica. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will be collecting questions via the Q&A panel. If you'd like to chat with us or with each other, we certainly encourage you to do so; to open the Q&A panel or the chat panel, you will find the icons for those features at the bottom of your screen. And just to note, the chat defaults to just the panelists, but you may absolutely change that to network with everyone. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now let me turn it over to Preetam from Informatica for a brief word from our sponsor. Preetam, hello and welcome.

Hey, thanks, Shannon. I hope you can hear me clearly. You sound good. Awesome. Hello, everyone. Thank you for joining this webinar. My name is Preetam Kumar, and I'm a Director of Product Marketing at Informatica for the data integration and data engineering business. Today I will be presenting on a very key topic. We all know about AI and generative AI, but we don't talk enough about how data engineering is critical for AI: how AI needs data engineering and data engineering needs AI, on both sides. Okay, I don't want to waste any more time, so let's get started.

We know that the AI market has been growing at a brisk pace, as you can see here, almost a 19.1% CAGR; by 2027, the entire AI market is expected to reach almost 300 billion dollars. And generative AI, which is a subset of overall AI, is growing even faster, expected to grow at about 35% between 2023 and 2027. It's obviously a no-brainer with the advent of ChatGPT: every organization you look at is trying to infuse AI into its business processes, and so are the tech companies. You see the AI copilot offerings; a lot of companies are infusing that. It has existed for a while, but now it has come into the mainstream. And of course NLP features, building LLM models, and all the rest. We see huge adoption when we talk to our customers; everyone is really excited. And this time it's not that an AI project will take five or six years; they want to do it immediately. And so do we at Informatica.

So I was talking about generative AI. We saw how AI has been around for the last 10 years; there has been a lot of buzz around AI and machine learning, then deep learning, and now the emergence of generative AI has actually accelerated things. You can see here the current state of AI strategy and the updated AI strategy after the introduction of generative AI. There is a huge shift in terms of vision. Customers are looking to augment their workforce by introducing AI in marketing, sales, technology, development, everywhere you look. The roadmap has shrunk from three years to one year.
Now it's not a nice-to-have; it is a must-have. And of course the use cases have also evolved significantly. Typically the AI use cases used to be advanced analytics, predictive analytics, maybe prescriptive analytics in some cases. Now it has gone into generating artifacts, whether that's video, audio, code, everything. And governance has become more important, because now you're talking about responsible AI; we talk about AI hallucinations and all these kinds of terminology now. So governance has become extremely critical. Otherwise, we all know there are many examples where AI went wrong, and it can have a catastrophic effect on the business. And in terms of talent, it's not that AI will take our jobs; AI will help us evolve in our jobs.

And AI needs data engineering, because without a strong data foundation you cannot have a successful AI model. You need high-quality, trusted data; otherwise it's a garbage-in, garbage-out situation. And the second thing: data engineering also needs AI, because you want to automate your data pipelines. There are so many sources and targets you are dealing with, you can't do it manually. You need a low-code, no-code, augmented solution to do that. And this is typically how Informatica is integrated into the data engineering journey for AI, right from data ingestion to integration, to applying data quality, making sure that you build the model, deploy the model, and monitor the model so that you can detect anomalies and act on them accordingly.

And here are a couple of examples of customers who have used it. SparkCognition built their Darwin data science platform on Informatica's AI-powered Intelligent Data Management Cloud, which covers everything from ingestion to integration to quality to governance. You also have MDM and data quality, all powered by CLAIRE, which is our AI engine. And there are many examples from global banks. In fact, Informatica IT has also used our MLOps solution, which is called ModelServe, to operationalize AI/ML models and put AI into action.

What differentiates us is that we have been here for 30 years. We have the best data management product; from our on-prem days, we have now completely pivoted to cloud, with end-to-end capabilities and unprecedented innovation. It's the only data management platform with that breadth and depth of capability, powered by CLAIRE, and you can deploy it on any platform, whether cloud, on-prem, or a hybrid model. And there are the two solutions we are talking about: Informatica for generative AI, where you need a platform that provides holistic, high-quality, governed data; and generative AI for Informatica, which helps data engineers and data professionals automate their data engineering tasks using generative AI capabilities. So this is a call to action. We call it CLAIRE GPT, a generative AI solution that uses a text-based, natural language interface, where you can build your data pipeline by just typing, "I want to ingest data from Oracle to Snowflake. Can you build me a data pipeline?" And it uses the 50,000 petabytes of metadata in our database to access that information and provide the right recommendations for you.
So I strongly suggest, after this call, scanning it, starting to use it, and giving us your feedback. With that, I would like to pass it on to Shannon for the next session. Thank you.

Thank you so much for kicking us off. And thanks to Informatica for sponsoring today's webinar and helping to make these webinars happen. If you have questions for Preetam or William, just submit them in the Q&A section of your screen; he will be joining us in the Q&A at the end of the webinar. Now, let me introduce our speaker for the series, William McKnight. William has advised many of the world's best-known organizations. His strategies form the information management plans for leading companies in numerous industries. He is a prolific author and a popular keynote speaker and trainer. He has performed dozens of benchmarks on leading database, data lake, streaming, and data integration products. And with that, I'll hand the floor to William to get his presentation started.

Hello and welcome. Hello, Shannon. Thank you, Shannon. Thank you, Preetam. Welcome, everybody. We have an important topic to talk about today, something that we spend a lot of our time on as data architects, and that is data integration. Data integration has evolved, in the last five years especially, and it has a lot more evolution ahead in the next few years. Preetam just reminded us of the importance of artificial intelligence. I think it's really important. I think whichever vendor, let me say it again, whichever vendor utilizes AI the best in their products is probably going to come out the winner. It's that important, no matter the foundation that we're starting from. So keep an eye on that. I know that a lot of products are undergoing an AI makeover, if you will.

And when it comes to data integration, we depend more and more on products. Yes, there are still some code-bound solutions out there. I'm not going to acknowledge them too much in this presentation, because I see a lot more going the way of tools, and I'm all for it, by the way; that's what I recommend in, like, 99% of situations. So we're still just moving data. I had to put News Flash on there, though, because it's really not that sexy. It's moving data from one platform to another: it happens to originate in Platform A, and it needs to be in Platform B. It sounds so easy. And it's getting easier and easier to do from a data architect perspective, but the plumbing behind the scenes has gotten more complex. Think about your own environment five years ago and how much more complicated it probably is today. As a result, when you want to do something like, oh, let me find the data that I need to move, there are a lot more places to look and a lot more things to consider. And if you don't have a data catalog, for example, in place, it's just that much harder. But if you do, it's as simple as it ever was, and probably even simpler. And it's going to get simpler, like I mentioned. We still do mapping. We still have to do mapping, but increasingly the tools are saying: you don't have to do that anymore, we'll do that for you. And there are different ways that the tool sets of today are taking cycles out. We'll get into some of that. So: just moving traditional data point to point, nothing fancy. But you want the tool to manage all of the other things. It's more important than ever to use the tool so that those other things are managed, which I'll get into.
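To make the "still just moving data" point concrete, here is a minimal sketch of point-to-point movement in Python. The connection strings and table names are invented placeholders, and a real pipeline would layer on the mapping, quality, and security concerns discussed below.

```python
# Minimal point-to-point data movement: read from Platform A, land in Platform B.
# Connection URLs and table names are hypothetical placeholders.
import pandas as pd
from sqlalchemy import create_engine

source = create_engine("postgresql://user:pass@platform-a/sales")  # Platform A
target = create_engine("snowflake://user:pass@account/dw")         # Platform B (illustrative URL)

# Extract: pull the source table into memory (fine for small sets; chunk for large ones).
df = pd.read_sql("SELECT * FROM orders", source)

# Load: land the same rows in the target, replacing any prior copy.
df.to_sql("orders", target, if_exists="replace", index=False)
```

Everything beyond those two calls, finding the data, mapping it, securing it, monitoring it, is what the tools are increasingly managing for us.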
And really, like I said, this is the thing that we spend the most time on as data architects in any data environment: data integration. So I like to say data architecture is data integration. And these are some of the clients that I've learned from over the years. I like to say I'm just an aggregator of all the good ideas that I learned from my clients. So to date, I have, I think, 423 good ideas for you as we go forward.

So why are there so many data stores? Yes, maybe in a future that I'm starting to see come into focus a little bit, we might have data lakes with multimodal capabilities and more aggregation going on in that single platform. It didn't happen with the whole data warehouse revolution, although I'm not knocking it, because it did centralize and simplify some environments. And truth be told, every environment out there could use some of this today: some more organization, some more architecture. Those things seem to get short shrift. As a matter of fact, some clients have told me that they have so many data warehouses and data marts and data lakes, and it's so confusing, that when they have a new need it's easier just to build a new one than to go sorting through all the existing data platforms to see if they already have the data somewhere. And that is kind of crazy if you think about it.

So over time, data is going to move from A to B all over the place inside organizations. As a matter of fact, most of you will look at this, which I'm trying to say is kind of complicated, maybe, and say, well, I wish my environment were that simple. Yes, things change over time and so on. Across the top there, you see all the different parties that are interested in what's going on in the data architecture. Not many businesses have a good handle on this, and the cost of it is very high. I don't think very many companies out there would blink if I referenced spaghetti architecture, because they totally relate. In the beginning, you know where a few things are. It's simple, but then you just add more and more. And we have to keep pushing forward. In fact, the pace of progress is probably at an all-time peak today, and I don't see it slowing down at all. I said this like five years ago, and it's still true. And now we have the AI overlay on everything that we do.

So you have multiple data lakes competing internally, but part of what we need to do is organize this mess and raise the foundation of the company so that it can keep going and not become so inefficient that it just grinds to a halt and things become really complex. And the answer is not always throwing more people at it. I know we'd like to do that, but there are limits to that, and there are TCO ramifications of doing that. What I like to say is, nobody's going to say, William, organize this like you have a blank sheet of paper. Okay, they're not going to do that with you either, but I like to have contained expansion of the data environment.

So why do we have so many data stores? First of all, it's okay; there are good reasons. And if that is you today, with many, many data stores, maybe overlapping and so on, so what: that's where you are. That's where you have landed. You can only go forward from there. Price performance, I'm going to call that out as the number one reason why we have different platforms. Anybody who says all databases are the same is wrong.
When you have a mission-critical application, or anything close to it, you should choose your platform, and that is notwithstanding any enterprise agreements that you have. I have benchmarked databases, for example; I have benchmarked more databases than anybody else, and they are different. And I'm going through the things here today where they are different, and where you hopefully care about these things, especially as we get into larger and larger amounts of data. Keep in mind, though, it is price performance. We all have to acknowledge that we can't have unlimited budgets thrown at great performance, not that that is even possible; there has to be a price consideration in the mix. That's why, when I do performance benchmarks for a client, I always bring in the price part of it, and we always look to see what we are paying for this great, or otherwise, performance.

Cost predictability and transparency: in environments that have had some, shall we say, spending challenges, some budget challenges, what have you, cost predictability and transparency is something that those companies overweight as a selection criterion for platforms. Now, I encourage everybody to consider all of these considerations, but let's face it, some are going to be more important than others in certain situations. Cost predictability and transparency has become pretty high on the list of what people are concerned about when they get into a platform. For example, you want to get into the right platform, right? AWS has 822,000 options: 392 VM sizes, six storage options, on-demand versus spot versus reserved, three tenancy options, and 25 regions. And 822,000 has probably grown since then; that was as of a couple of months ago. It keeps growing. You want to get into the right one, and you have like a one in 822,000 shot. Azure has 797,720 options; Google Cloud has 120,000 options for you. So it's not that easy. And you're constantly weighing capacity versus usage and consumption.

Now, other shops have found themselves in situations where the administration cost of the environment is very high. That's a key factor at the moment, so they're going to lean into those environments that have low administration. And administration is cost too, by the way; it's just more or less people cost. The database should provide a single point of control to simplify some system administration tasks. And even though we look to modernize the database with easy administration, the platform should provide tuning capabilities should you need them. That's kind of a religious debate, right, whether the platform should give you a lot of knobs or not; that would be another criterion that people use to consider which platform to get into.

The optimizer is another one, and this is related to performance. Some people have the foresight to think: well, my queries are only getting more and more complex. I'm not even writing SQL queries anymore; they're being written by a machine. What I'm doing is asking the database for what I want, and it's generating some pretty hairy queries behind the scenes, because I can certainly think of a lot more than I can create SQL for. Well, SQL is kind of going by the wayside. I don't know if you've noticed, but a lot of the interfaces I'm seeing come out in the market are definitely language-based, and that is generating more complex SQL. Keep your SQL skills going out there, everybody, because you're going to still need them. But our end users may not.
OK, but back to this, the optimizer. These are some of the things that we live for in a great optimizer: conditional parallelism, and what the causes are of variations in the parallelism deployed; dynamic and controllable prioritization of resources; the time required to do the whole optimization routine; and the other things that are there to support a query in its performance, like indexes. That's what indexes are for, period. Or updated statistics; that's what statistics are for, creating better paths. Workload isolation capabilities. These are some of the things that we live for in an optimizer, and sometimes that becomes the guiding factor in a platform selection.

Concurrency. All of what I've said before is great, but what if it doesn't scale to the concurrency level that you're going to be throwing at it? Some of you have hundreds of users that might hit a platform at the same time, doing these more and more complex queries that I talked about. This is really something that becomes hard to benchmark for by somebody who's not accustomed to doing that kind of benchmark, and so sometimes this gets left out when you're evaluating a platform. But it really shouldn't, especially if you have, say, five-plus levels of concurrency on the platform that you're considering.

Resource elasticity. This is for those environments where, well, the project might go gangbusters, or it might not. And when it does, it's just going to move fast, up and down. So I need resource elasticity. This is another big consideration when selecting a platform out there, and it leads to more and more platforms in the mix.

And today we must consider the machine learning capabilities of the platform; some are more committed than others to the machine learning future. I think it's inevitable. I think it's great when you're thinking this way, and I think this is a high consideration for a platform. What is the built-in machine learning functionality? Some databases offer built-in machine learning libraries and algorithms, potentially simplifying model training and deployment. Also in this regard is integration with machine learning frameworks. Sometimes shops have data scientists that are aligned with certain frameworks, like TensorFlow or PyTorch; ensure that your chosen database seamlessly integrates with your preferred machine learning frameworks. That could lead to a selection right there.

And finally, data storage format alternatives: unstructured data, semi-structured data. Some of you have more of that than others. Some data warehouses are more of the unstructured data variety now than they used to be. And then any other kind of data, really. So we might be talking about Apache ORC, Apache Parquet, JSON, Apache Avro, et cetera. Modern databases need to be able to analyze that data as it is, without moving or altering it, and that requires some capabilities on the part of the platform. They are all over the place in terms of those capabilities.

So those are considerations that have led to a lot of the challenges of today's environment: complexity, context, and cost. Systems largely are designed independently, each for its own purpose, with little regard sometimes to what came before or after, and I've cited some reasons why that happens. The complexity of the environment is keeping users from taking full advantage of these systems. Users don't understand where the data they need is.
Once they find the data, they don't know how to fit it with data from other sources. And once they are shown how it fits together, they don't understand, nor could they be expected to understand, how to join it in a reasonable way. So we end up with this platform escalation. Hopefully we're managing it. But there is no doubt that today we sit at the place in history where there are more reasonable platforms in a shop than ever before. It'll probably start turning at some point, but in the meantime, we have all these platforms that serve finite needs. And that's really the bottom line: unless it's a multi-purpose data warehouse or multi-purpose data lake, and not too many of those have been built yet, we have finite-purpose data stores out there. This is a real challenge.

So we move data. I had a client tell me this week, and these are the words he used: we need data on three levels. And he was talking about Oracle Service Cloud, Salesforce Data Cloud, and the data warehouse. Well, there you go. Who's to argue, oh no, you don't, you don't need that data in Oracle Service Cloud? You do or you don't, right, in order to produce the service out of the organization that's required. Same thing for Salesforce Data Cloud: oh, we don't need that data there. Well, if you do for better marketing purposes, you do. And the data warehouse, well, it's running 20 reports for the company. So there you go. You need data in a lot of places, and you simply want to move it, and move it in a timely fashion, with quality, and so on.

So we need to look at capabilities in our data integration tool. Now, I'm not getting so much into streaming; I'll get into that later. In my mind, I've decided that data integration is separate from data streaming, because the data sets that they work on are very different, and I'll call this out a little bit later when we get into streaming. I'm also not going into change data capture tools too much. If you have low transformation requirements, you might consider change data capture. But please hold that CDC tool to the same standards that I'm going to give you right now for data integration. When you do that, you find that a lot of times they're lacking something that you really do need. Maybe you're not transforming a lot of the data, but you need a lot of these other things, and when you need them, you need them.

So let's talk about it. Now, unfortunately, my evaluation and ranking report on the data integration industry is not published yet. I just put this out there to mention that I have recently gone through, and I'll move to the next slide, these 10 vendors, who I consider the top 10 vendors against the criteria that I'm about to share with you. I've recently gone through each of these 10 vendors' products, taken briefings, worked with them, and considered all these things about them. So a lot of the information I'm sharing with you is very up to date and relevant; I'm bringing you what the industry really provides today. These are listed in alphabetical order. Hopefully this thing will be published soon.

All right. One of the criteria is comprehensive native connectivity and multi-latency data ingestion. I threw the two together because they're kind of related, right? Native connectivity empowers businesses to manage their data effectively and leverage it for advanced operations in the dynamic data landscape. There is one critical area: creating custom connectors.
You want connectors for all of the data store types that you're going to be connecting to, right? And that can be hard or easy. Some vendors will be very helpful to customers who need to create, test, and refine custom connectors in a low-code or no-code manner. Others, and some do both, provide you with a lot of great connectors out of the box. Maybe that's not custom, I don't know; a lot of connectors out of the box. So there are diverse ways to handle this data source onboarding: as I talked about, change data capture, streaming, and good old data integration, which is what we're really talking about here. The connectors should include cloud, lakehouses, data warehouses, data lakes, files, databases, all the databases, Databricks, Snowflake, Teradata, BI tools like Tableau, and so on. Anywhere there's a data store, and everybody seems to want their own data store, right? All these products do. So it's not easy to train an enterprise product on a data store that you already have. You pretty much throw your hands up after trying that a few times and say, well, okay, what do you need? What does your schema look like? Let's feed it. And that drives the complexity.

Data transformation. Data transformation includes SQL queries, data pipelines, or code-based transformations, like ETL, ELT, and streaming. The tool should provide a user-friendly visual environment to do this; to empower non-technical users, tools resemble spreadsheet-like interfaces now for data manipulation and pipeline creation. So there is a vast array of transformations that are possible. Not all data integrations have data transformations; fewer have them than I ever thought would. Most of the time the problem becomes: I just need to move data from A to B, right? Well, not so fast, all right? Let's think about data quality, make sure the data is fit for purpose, and so on. Sometimes the source data has issues that simply are not going to get fixed at the source. That's not your domain. You don't fix SAP; you work on the data warehouse that is fed by SAP. And if you identify problem data in that movement, you don't just move it and say, well, have at it, we know what's wrong, you do something about it. You make sure that the data is fit for purpose, and therein come many data transformations. These include aggregations, expressions, normalizer, rank, filter, joiner, lookup, sorter, stored procedures, update strategies, routers, XML source qualifiers, sequence generators, and more, and these are available in some of the best products out there. And data format support should include Avro, Parquet, JSON, Iceberg, and Delta today as well. Just because it's in a different format doesn't mean it shouldn't be transformed, if that's what is required.
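As a hedged illustration of that fit-for-purpose idea, here is a small pandas sketch that combines a quality gate with a few of the transformation types just listed (filter, joiner, aggregator). The file names, columns, and thresholds are invented for illustration.

```python
import pandas as pd

df = pd.read_parquet("orders.parquet")  # hypothetical extract from the source

# Data quality: know about the garbage rather than passing it through.
bad = df[df["order_total"].isna() | (df["order_total"] < 0)]
if len(bad):
    bad.to_csv("rejects.csv", index=False)  # quarantine for follow-up at the source
    df = df.drop(bad.index)                 # keep the movement fit for purpose

# Filter, joiner, and aggregator, three of the transformation types named above.
recent = df[df["order_date"] >= "2023-01-01"]                 # filter (assumes ISO dates)
enriched = recent.merge(pd.read_parquet("customers.parquet"),
                        on="customer_id", how="left")         # joiner
summary = enriched.groupby("region")["order_total"].sum()     # aggregator

# Land in a modern columnar format, per the Avro/Parquet/Iceberg/Delta point above.
summary.reset_index().to_parquet("region_totals.parquet", index=False)
```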
Data security and access control mean a lot of things, and this is something that you just want the tool to handle. There are things in here that need to be handled as data-integration-level security, not at the project level. At the project level, sometimes there's an overlay from enterprise data security, but that's not necessarily going to cover, let's say, audit trails or data-in-transit encryption and things like this. So there should be some standards around how data integration is used in the enterprise, and most of those standards are going to be about data security and access control. This should definitely be an overlay on everything we do, so lean into that.

Data quality and data governance. Yeah, I talked a little bit about data quality: changing the data, making it more applicable. Data quality capabilities are wildly variant in these tools. Some of the vendors behind them hope that you don't really care about data quality. Maybe you don't. Maybe you should. You definitely should. You should bring data quality up to a level that makes the data fit for purpose. You don't have to go beyond that, and you don't have to get it to 100%, but you do have to get it to where it's fit for purpose. It's never good enough, as a data integration professional or someone presiding over any data integration, to say, well, garbage in, garbage out. How many times have we heard that? That's not good enough. If it's garbage in, we should know it and fix it, so it's not garbage out. And again, we have all those data integration rules at our disposal. I will add, though, that as we move more into streaming data, there's less concern about this. There's really less opportunity in most streaming data for there to be errors. I know that sounds like heresy to say, but it's a pretty fixed format, pretty small data, and the volume is so incredibly high that if there is an error here or there, it usually just gets buried in the volume of that kind of data. So we don't tend to do a whole lot of data quality on that data, but we do some, and we try, more or less, to make sure that it is systemically good data.

Now, workflow automation and orchestration: this is the process of building the pipeline, and this we want to be pretty easy to do. There are a lot of benefits from great workflow: improved efficiency and reduced manual effort, simplified data management and smoother operations, enhanced real-time responsiveness to data changes, and timely data processing for improved decision-making. You want a visual interface, something like a Visual Basic form, where users can construct a process flow for data extraction, transformation, and loading by dragging and dropping various items. Maybe at some point in the future this will even become automatic, and we'll be able to give our data integration tools higher levels of goals and have them fill in the rest. It's getting close, but in the meantime, we're doing workflow, and we want it to be great.
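For a sense of what such a process flow looks like outside a visual designer, here is a minimal sketch, assuming a recent Apache Airflow 2.x, with stubbed task bodies. The DAG and task names are invented, and the drag-and-drop tools discussed here build the equivalent graph visually.

```python
# A hedged sketch of extract-transform-load orchestration with Apache Airflow 2.x.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(): ...    # pull from the source (stub)
def transform(): ...  # apply quality rules and mappings (stub)
def load(): ...       # land in the target platform (stub)

with DAG(
    dag_id="orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # timely, repeatable movement rather than ad hoc runs
    catchup=False,
) as dag:
    e = PythonOperator(task_id="extract", python_callable=extract)
    t = PythonOperator(task_id="transform", python_callable=transform)
    l = PythonOperator(task_id="load", python_callable=load)
    e >> t >> l  # the dependency graph: extract, then transform, then load
```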
Analytics, automation, and AI. This is where you want to be able to speak to the pipeline, if you will, and vice versa, have the pipeline speak to you; the more abstraction that can be done in this process, the more accessible it is for anyone and everyone to do. And that's sort of a goal of the industry right now: democratizing data-driven decisions, future-proofing with AI. You definitely want to look for a lot of AI. A commitment to AI is essential in all tools that I recommend, for example, and I want to know what that commitment looks like and whether it's moving in a direction that's going to benefit clients down the road, because let's face it, these vendors are making these decisions sometimes in a vacuum. Yes, they have their input from customers, but that could be sketchy. So hopefully there's a great person, how shall I say, a grand wizard behind the curtain, the Wizard of Oz, I'm referring to that, making these great decisions for AI. That's pretty much what it comes down to today. So keep an eye on that. You want analytics flowing out of your pipelines, and you want those analytics to be relevant and actionable.

Seamless integration with BI and analytics tools. You might say, hmm, what do you mean by that? Because we're just landing the data in point B, right, in another platform, and then the BI tool picks it up there; I don't really need to know, I'm the data architect. Well, you do, because you want to deliver the data. Number one, you're going to actually move a lot of the data to real BI tools; BI tools have their own data stores, and you're going to be delivering that data. But you also want to deliver data to, or in, a format that can be understood and worked with by the tools that are going to sit on top of that platform. So yes, some knowledge about that is important. Again, a lot of this is being thought of by AI, but it's still important to think about today.

Data cataloging and metadata management. Yes, this helps us find the right data, move the right data, understand the quality of that data, and understand the profile of that data. Let me point out a few things in this area. Universal metadata connectivity: a thorough and detailed data lineage is ensured by this approach, and it is very much of interest to a lot of shops today, especially those undergoing a lot of regulation. Utilizing sensor metadata is another one I have to call out; the administration of metadata is streamlined by this level of automation. We also have on here data relationship identification. I had to squint there a moment, sorry. This makes educated decisions easier with that kind of knowledge. Then pipeline creation; we talked a little bit about that. Analytics and data exploration are made easier by that capacity. And suggestions and search results ranking: this feature streamlines the process of ingesting data into cloud data lakes or warehouses.

Another consideration: enterprise scaling with performance. Performance is kind of uninteresting at low levels, when you can tolerate anything and everything. But when you get into mission-critical, and into data volumes where it could impact performance, you definitely want to test what you're going to do at the level that you really care about. And that's hard sometimes; it might require a lot of data. You want a data integration solution with the speed to keep up with source data, because your source data is probably going to grow, to power timely decision-making, and to not have to be re-engineered or replaced as the application becomes more successful. And data volumes grow naturally: if the pipeline is successful, data volumes will grow. I always say we're building this as if we're going to be successful. What other way would you build it? You don't build it with hedging in mind: well, let me just cut this corner because I'm not committing 100%. No, we're going to commit. We'll build a strong pipeline, and we're going to expect it to scale; you can figure out for yourself what level you want that thing to scale to. Your platform must be able to scale horizontally or vertically to accommodate increasing data volumes and the evolving demands of systems and users.

What about ecosystem compatibility and platform versatility? It should be reliable. The compatibility of a solution with a diverse ecosystem holds significant importance today, when a lot of applications are hybrid cloud or multi-cloud. The connectors and integrations need to allow enterprises to seamlessly connect and integrate systems with other platforms and services, enabling them to leverage the full potential of the multi-cloud environment.
And additionally, a diverse ecosystem also provides opportunities for collaboration and innovation, as enterprises can then tap into a wide range of expertise and solutions from various vendors within the ecosystem. So it's about reliability, ease of use, and multi-cloud capabilities.

Financial operations, compliance, and data auditing. Yeah, the same is true of the data integration platform as is true of any platform in the enterprise: you want to be able to control its costs, and you want to know what its cost is. I can't tell you how many times I am told: William, yes, the budget number is important, and certainly the lower the better, but what is more important is that it's predictable, that it's the same month over month, and that we hit it and we don't go over; we get close, but we don't go over. That's what's more important. So that's a trickier number to try to figure out, but we do it, and this is all part of financial operations, compliance, and data auditing. Also in here are features like auto-scaling and resource allocation optimization, which are quite helpful in cutting down on wasteful spending. You might lean into this if you're really not running the DI tool, and even your databases, that frequently.

So when I looked across these 10 capabilities, I applied a percentage based upon what I see enterprises caring about today, allocated across these capabilities. Your mileage may vary. Keep in mind that sometimes you're considering a tool for the enterprise, yay, and sometimes you're considering a tool just for this application, and you're not thinking about the next application or any prior application. Maybe that could use a little more data architecture in the mix. Remember, in the early days of data integration, if anybody was around then, it was just moving data, data transformation. It was just moving data in the same way that we move it today, but there weren't all these considerations; it was just: move the data. But now it's gotten more complex. And again, the process of moving the data is still pretty simple, it's still GUI-based, and it's even getting simpler, but the things that have to happen behind the scenes in the setup are more and more complicated.

So let's take a look at streaming solutions. Keep in mind that if the data is streaming data, you probably need a streaming solution to manage that data, if you want to capture the data. And by capture, I might mean putting it into another data store forever, or I might mean managing the data, or getting that data involved in your processes, your business processes, in real time. Either way, if you have to touch that streaming data, you probably are going to need a streaming solution. Now, you might be saying, well, what's streaming data? Let me try to sort that out a little bit. ETL, or ELT for that matter, is insufficient for this combination: data platforms operating at an enterprise-wide scale, I know that's a little nebulous, but everybody's going to interpret that differently and that's okay; a high variety of data sources; and real-time, or as I use the word, streaming data. I kind of know it when I see it. I think a lot of practitioners do. And it's going to be a bit contextual; that's why I don't have a number in here for you. It's going to be contextual to the environment and to your maturity, frankly, around data. But it's really not that hard.
If it's being generated all the time at an enterprise scale, as we sit here on this webinar, is that data being generated by the millions? If that's the case, millions of records, it's probably streaming data; it probably falls into this category. I have had clients that have a legitimate dilemma between a DI tool and a streaming tool, but most of the time it's: let's pick the right DI tool, or let's pick the right streaming tool. This is about real-time data, aka messaging, live feeds, event-driven data; it's gone through some different names and still is. It comes in continuously and often quickly. It needs special attention and can be of immense value, but only if we are alerted in time, because a sudden price change, a critical threshold that's met, a sensor reading that changes rapidly, or a blip in a log file needs to be attended to right away if you want to get any value out of it. And if that's the case, you cannot wait for any kind of batch process, even if the batch is occurring every five minutes.

Real-time data: I called it out as the foundation for artificial intelligence. Now, we're nowhere near getting the other data in shape for artificial intelligence, for the most part, but streaming data, we're even further from getting that data under control. And that will be, I say, the battleground in the next year, two, or three, where companies are distinguishing themselves through the use of analytics over streaming data. So it is essential that you get all your data under control now. I used to be able to say, well, just integrate the data that you need, just save the data that you need, and we'll age off history data and so on. Now it's all data. And if you don't see the need, that's the problem. If you don't see the need for all data, if you don't see the way to exploit all of your data, then you need to grow your data science capabilities, and you need to champion how data can be used. Usually it's people on calls like this that have the good information that the enterprise needs to hear, and I always encourage you to take that message back to your enterprise and talk about the importance of data. Create a plan: how are you going to influence the enterprise, in the way that it should be influenced? By the way, this is nothing artificial. This is important, and this is probably the difference between success or failure of the business down the road anyway. And I view, I'm on a soapbox now, excuse me, but I view artificial intelligence as part of the lineage of data. So we had data warehouses, we have data lakes, and we have artificial intelligence. Yeah, I know it's a use of data, but data is so important to AI that I view it on the maturity spectrum of data. AI, that is. Okay, off soapbox.

So, back to the problem of real-time data: enter message-oriented middleware, aka streaming and message queuing technology. We've all heard of Kafka. There are others, but this is the type of technology that I'm talking about: intelligent data platforms for fast data that connect, process, and store data in real time in a unified, flexible solution, able to meet demanding SLAs even at scale, without operational burdens and complexity. These products give you throughput, storage, latency, and operations; you want to look at all of those. And the streaming architecture conceptually is pretty simple. There's request-response from all of the connected applications and platforms. Sometimes change logs are read.
But the streaming platform will distribute the data, by, let's say, topic in the case of Kafka, to whoever is subscribed to it. It's a more flexible arrangement for certain data. But again, it's the data profile that I lean into to determine whether it's a streaming problem or a data integration problem, or opportunity.

Apache Kafka. This is the one that everybody knows about, but there are others. I'll just say a little bit about it. It's an open-source streaming platform developed at LinkedIn; they all have a nice heritage, very interesting stories there sometimes. It's a distributed pub/sub messaging system that maintains feeds of messages called topics. Publishers write data to topics and subscribers read from topics, and this is where you get the source-to-sink data pipeline language. Kafka messages are simple. You can put the T, for transformation, into a Kafka stream through the Kafka Streams API; it's kind of hard, but you can. There's also Pulsar. There's also NATS. There's also Redpanda. There are also a few others. So Kafka is not the only game in town when it comes to this stuff, but it's the one people know, so I called it out as an example of streaming.
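To ground that publish/subscribe vocabulary, here is a minimal sketch using the kafka-python client; the broker address, topic name, and alert threshold are invented for illustration.

```python
# Minimal Kafka publish/subscribe sketch (kafka-python client).
# Publishers write to a topic; subscribers read from it as data arrives.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=lambda v: json.dumps(v).encode())
producer.send("prices", {"symbol": "ABC", "price": 101.7})  # publish to the 'prices' topic
producer.flush()

consumer = KafkaConsumer("prices",
                         bootstrap_servers="localhost:9092",
                         value_deserializer=lambda b: json.loads(b.decode()))
for msg in consumer:               # subscribers read continuously, not in batches
    if msg.value["price"] > 100:   # act on a threshold while it still matters
        print("alert:", msg.value)
```

The point of the loop at the end is the one made above: a sudden price change or a threshold crossing has value only if you act on it in time, which is why a five-minute batch is already too slow for this profile of data.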
Pulsar, the reason I call it out, is that it ensures message data is never lost under any circumstance. It achieves this with Apache BookKeeper, which provides low-latency persistent storage. So some good things about it, and really a lot of them.

So, in summary of the presentation, and now is a great time, if you have any questions for myself or Preetam about data integration, to get them in; we'll be there in a couple of minutes. By necessity, we have numerous data stores for enterprise data. This is not to give you a license to ignore the complexity in your environment and just charge on, right? This gets back to a judgment call. This is a decision point that is hard to articulate in a presentation, because everybody's going to be a little bit different and there are going to be a lot of circumstances. But there comes a point when there may be some obvious integrations that you can do between platforms, and you certainly should. Data integration is moving data. That's what it is. That's it. We want the tool to take care of the details, and I've gone through the details; those are the things that we want the tool to take care of for us. Data integration vendors show signs of growth that allow them to drive customer-focused data strategies, et cetera. As a matter of fact, that market is at an all-time high. I thought we were creating these data lakes and data warehouses where data is aggregated, but we just have so much data, and data architecture is still a challenging word, I'll put it that way, in a lot of environments. So you end up with a lot of data integration. Probably the data integration vendors like that, I don't know. An enterprise's data stack would not be complete without data integration, and there is a wide variety of options available from vendors today to meet a wide range of requirements. What I said before about databases, that they're not all the same: the data integration vendors are not all the same either. And really, that tool selection could be the difference between success or failure of your project. Certainly some are going to be better for the project, and better for the long run to have in the shop, so it is important to vet it out very well. Message-oriented middleware, aka streaming and message queuing technology, provides an intelligent data platform for what I'm labeling fast data. Again, you know it when you see it. And evaluators should consider whether they need a solution that is full-spectrum, solution-specific bespoke, or framework-based, according to the intended application of data integration, the complexity of the project scope, and how intricate the project's technical environment will be. This is something I call out in my report. But do note that I have left a lot of room in here for your good judgment, and nowhere in data that I can think of requires better judgment calls than data integration. So I've given you some of the science behind it, but the judgment calls are there for you to make every day in your data integration environment. Sometimes they're small, sometimes they're big, but they're important. So this has been Data Integration: News Flash, We Still Just Move Data. I will turn it back over now to Shannon to see if we have any questions for myself and Preetam. Shannon?

Well, thank you so much for another great presentation. Just to answer the most commonly asked question: a reminder that I will send a follow-up email by end of day Monday with links to the slides and the recording from this webinar. And if you have questions, feel free to put them in the Q&A section. So, diving in here: why not update your disparate data sets to be directly interoperable, thereby eliminating the need for data-transformation-based data integration?

I'll take that first, and Preetam, you may have a view on that as well. One thing I didn't mention in this presentation, because it's about the opposite of it, is data virtualization and the ability to look across different environments. I'm a proponent of data virtualization and not just integrating for integration's sake all the time. That being said, there are limitations to data virtualization in terms of performance. And there are also limitations in a lot of these products out there. I'm working now with Salesforce Data Cloud, for example, which is fairly new, but it wants its own schema, it wants all the data in there, and maybe it doesn't have great virtualization capabilities. You give it the data in its schema; there's no wiggle room there. And if you look at a data architecture, there often isn't the wiggle room. So in cases like that, virtualization, as well as it may work elsewhere, doesn't work. And again, it's not going to perform as well, if that's a big consideration. So you want physical cohabitation of your data for best performance today, and virtualization for the edge cases and things like that. Preetam, anything to add?

Yeah, I mean, I think you summarized it very well. Virtualization is definitely a solution, but it also depends upon the use cases you want to drive, and that's when data movement becomes extremely critical, if you want to do ELT, which we see a lot of our customers doing, as William was mentioning about CDC and all. So it absolutely depends upon the use cases and the volume of data you have. In a typical enterprise, you do have to move the data, transform the data, and make sure the data is available in the right format. So data quality becomes extremely critical, including governance as well.

Perfect, thank you both. So: how do data mesh and data fabric fit into your presentation?

Yeah, that's a great question. A data fabric is a lot of data virtualization.
And so I would direct you back to the prior answer for a lot of that answer. But as for data mesh, since it is a way of organizing the data stores, I just absorbed them into the presentation as data stores. However you have architected them is up to you; you can architect in different ways and still have a lot of data integration needs. As a matter of fact, data mesh is something that is driving a lot of the exponential growth in data stores. It's an acknowledgement that we can't do it all centrally. So let's open the doors, let's open the barn doors: you get a data warehouse, you get a data warehouse, you get one. You get data integration over there, and so on. You get a pipeline, you get a pipeline. So yes, it is opening the doors for a lot of data integration. It's an acknowledgement of where we really are in the industry, and so I think it's just opening up more doors for data integration. Preetam, anything you want to add there?

Yeah, I mean, I think William summed it up really well. Data fabric, data mesh, even lakehouse architecture: these are different kinds of architectures that we see customers adopting. It's a different way of consuming the data. So it depends: some customers want to have centralized governance, so they go with data fabric, which helps them in terms of controlling a lot of stuff. In data mesh, it's actually the departments that would like to drive things. So it purely depends upon the customer's use cases. But yeah, as William rightly said, we also see a lot of our customers adopting data mesh, with each department building its own data warehouse to drive its specific use cases.

Thank you. Well, thank you both so much. That does, unfortunately, bring us to the top of the hour. Preetam, thank you so much for joining us this month, and thanks to Informatica for sponsoring the webinar and helping to make these webinars happen.

Thank you. It was a pleasure talking.

And William, thank you as always. And thanks to all of our attendees. Again, just a reminder, I will send a follow-up email by end of day Monday with links to the slides and links to the recording. Hope you all have a great day. Thanks, everybody. Thank you. Bye. Bye-bye.