Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager at DATAVERSITY. We'd like to thank you for joining the latest installment of the monthly DATAVERSITY webinar series, Advanced Analytics with William McKnight, sponsored today by Matillion. Today William will be discussing the shifting landscape of data integration.

Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will be collecting questions through the Q&A panel, or if you'd like to tweet, we encourage you to share highlights or questions on Twitter using the hashtag #AdvAnalytics. And if you'd like to chat with us or with each other, we certainly encourage you to do so; you will find the icons for the Q&A and chat panels at the bottom of your screen. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now, let me turn it over to Paul from Matillion for a brief word from our sponsor. Paul, hello and welcome.

Hey, Shannon, good to be with you, and hello to everyone: William, my co-presenter, and everyone on the call here today. I'm looking forward to having a really good conversation around the shifting landscape of data integration, and we'll kick it off here with a vendor perspective from Matillion. But first, a quick note about myself. My name is Paul Lacey, and I'm Senior Director of Product Marketing here at Matillion. I've been involved in the data integration world for quite some time now, holding various product marketing roles in the business and in several companies in the industry. Before that, I was an engineer through and through: a hardware engineer who transitioned into firmware and then full-on software engineering. So I have a lot of experience managing the technical aspects of data, and I'm really excited to get into that with you here today.

In thinking about what we could talk about on this subject, I thought the most interesting thing we could share is Matillion's perspective; I'll tell you a little bit more about Matillion later on. Matillion is a cloud ETL, or cloud data integration, provider. We were born in the cloud, and we work with all of these modern cloud data platforms. What we've noticed is some interesting shifts happening in the discipline of data integration in the cloud, and that's what I'd like to share with you here today. A lot of you might remember the three Vs of big data: the volume, variety, and velocity of big data. What we've noticed over the past couple of years, as people transition to more cloud-based architectures, is that a new set of challenges is waiting for them in the cloud. So we're going to dive into a little of the research we've been doing into what that is and how we can address it with modern solutions. To start off, I have a bit of research to share with you from our partners at IDC. They have been looking into this space a great deal, and we've been working with them on it as well.
This is a survey they ran late last year, published early this year, around some of the challenges people experience when they migrate their infrastructure to the cloud, as well as the things they find to be accelerators of their analytics in the cloud once they have dealt with those challenges. It's quite interesting: looking down this list, you see a lot of things you would expect, things like data security and quality, cloud migration, and compliance. But at the top of the list we see data distribution, which continues to be a challenge, and an accelerator once it's effectively dealt with in the cloud. So we've noticed a shift in the things people might be used to dealing with in an on-prem environment once they move to the cloud. In fact, IDC has come to the conclusion over the last couple of years that this is part of a much larger shift away from the traditional three Vs of big data, which cloud-scale technologies actually do a really good job of dealing with. They do a very good job of enabling scale and elasticity to handle high volumes and velocities of data, and with the more modern approaches to data lakes, lakehouses, and so on, they can deal with a variety of data quite well too. But when people migrate to the cloud, there's a new set of challenges waiting for them: data is ever more diverse, distributed, and dynamic in the cloud.

So what do we mean by this? When we think about the challenge of diversity in the cloud, data is increasingly coming from new sources and in new formats. It's coming from more and more SaaS APIs; we've seen an explosion of SaaS in the enterprise over the past five to ten years, and that's really starting to shape how people think about architecting their cloud infrastructure and their integration paradigms. Things like data fabrics and data meshes have become more popular, primarily to deal with the fact that it's easier than ever to pull out a credit card and spin up a data silo in the cloud. With that also comes the challenge of processing lots of different types of data: different JSON formats and schemas, as well as many other formats. As people have realized they now have the ability to do so, pressure is being put on them and their data teams to process all types of data. IDC breaks this down in broad strokes: transactional data, geospatial data, multimedia data, and so on. They find that more than 50% of organizations are processing at least four distinct types of data, in those broad categories, in their analytics today.

The second thing, related to diversity, is the distributed nature of data in the cloud. Data is being stored in more and more operational systems, some of them at the edge; IoT is becoming more of a factor when it comes to centralizing data or thinking about how you want to process it. So teams need to find new ways of dealing with the distribution challenge in the cloud. And finally, data is more dynamic in the cloud, with more systems owning the data and more APIs to integrate with.
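That diversity point, the flood of JSON shapes and drifting schemas, is easy to see in code. Here's a minimal, hypothetical sketch in Python (the field names and records are made up; this is not any vendor's feature) of the kind of drift check a data team might run over an incoming batch:

```python
# Minimal sketch: flag records in a JSON batch that drift from an expected schema.
# EXPECTED_SCHEMA and the sample records are hypothetical, for illustration only.
import json

EXPECTED_SCHEMA = {"order_id": int, "customer": str, "amount": float}

def check_record(record: dict) -> list[str]:
    """Return a list of drift problems found in one record."""
    problems = []
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"type drift on {field}: got {type(record[field]).__name__}")
    for field in record.keys() - EXPECTED_SCHEMA.keys():
        problems.append(f"unexpected new field: {field}")
    return problems

batch = [
    json.loads('{"order_id": 1, "customer": "acme", "amount": 99.5}'),
    json.loads('{"order_id": "2", "customer": "globex", "amount": 10.0, "channel": "web"}'),
]
for i, rec in enumerate(batch):
    for problem in check_record(rec):
        print(f"record {i}: {problem}")
```

The second record here would be flagged twice: once for the string-typed order_id, once for the new "channel" field, which is exactly the sort of silent change an upstream SaaS API can introduce.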
Teams have less and less control over things like schema drift and other changes that can happen in the cloud on infrastructure that's not under their control, and that requires them to be dynamic as well. They need to be agile and iterative in how they roll out their infrastructure and how they make incremental improvements to it. So we see the rise of concepts like DataOps as an offshoot of the need for people to move faster and deliver results faster in the cloud.

Now, a little bit about Matillion. We think of Matillion as a modern solution to modern challenges. As I mentioned, we are a cloud data integration provider. Our secret sauce is the ability to give data teams low-code environments with absolutely no compromise. For those who have worked with more legacy ETL providers, there's usually a trade-off: either you can do things easily and quickly in the tool but you're forced into some rigid paradigms, or the tool allows a lot of customization but you have to reinvent the wheel from scratch quite frequently. What we've done at Matillion is strike a good balance. Because of the way we've architected our tool, using best-in-class cloud frameworks and push-down transformation technologies, we can give teams really rich, granular levels of control when they need it, along with the simplicity of drag-and-drop, low-code development when they don't, that is, when their goal is to move quickly and reuse things.

Toward that end, we apply four key principles to every product we design and develop. First, products these days need to be easy to use. As we discussed with the dynamism of data, data teams need to move much quicker now than they ever did before, and to do that these tools need to be intuitive: low management, low overhead, fast to pick up, and easy to hand off, so that people can come into the team and understand what's going on very quickly. All of those things are behind our design philosophy. Second, we are built for the cloud. That means we leverage the full power of the cloud, with push-down transformations on all the native platforms we support; you can see all of them over here on the left. These modern cloud data platforms are regularly coming out with new features and capabilities, including new AI and ML capabilities native to their platforms. Matillion is so tightly integrated that we surface all of that functionality in our low-code tool and let people take advantage of it without falling back on Python scripts or the other workarounds required in more legacy environments. It's very powerful; we win awards all the time for how tightly integrated we are with our partners, and we really focus on bringing the power of the cloud to you. Third, we are built for the enterprise, through and through, from the ground up. We take things like enterprise security very seriously, and we have designed our processes and our products to scale seamlessly across large volumes of data.
In fact, one of the key innovations we've unlocked here at Matillion is the ability to separate logic from compute, in much the same way that the modern cloud data platforms have separated compute from storage. That was a tremendous innovation on their end, and it allows them to offer best-in-class compute; we leverage that compute by separating the logic layer from the compute layer. So we can run the same logic on wildly different volumes of data, and on different infrastructures, with relatively low overhead and good portability. Again, we are built through and through for the enterprise.

And finally, we focus on allowing our customers' teams to deliver transformative value. We do that in a number of ways: in how we let people rapidly deliver insights and results back into their business, but also in how we let them pay for and use the software. We've recently rolled out an industry-leading credit-based consumption model built around the concept of universal credits on Matillion. There are no subscriptions and no contracts to be signed for new functionality; you simply pay for credits and consume them as you use parts of our platform. As we continue to roll out new parts of the platform, those can be consumed using the same credits you paid for in the past. So you can essentially use it when you need it and not pay when you don't. It's very flexible, and it allows very tight alignment between the ROI of data initiatives and the spend you have on the infrastructure underneath.

We have a couple of core products that help us do this. We have Matillion ETL, our flagship data integration suite, as well as our lightweight and free Matillion Data Loader, a SaaS data collector that pulls data from a number of SaaS sources and, with a no-code wizard, lets you load it seamlessly into some of the leading cloud data platforms. Both are available for use today: Matillion Data Loader is free at Matillion.com, and Matillion ETL is our flagship product with much more functionality, as we just discussed. If any of this has piqued your interest, we would be happy to have more conversations with you. You can visit us on the web at Matillion.com and get in touch with someone who can answer all your questions and show you Matillion in action; we welcome the opportunity to do so. With that, I'll thank you for your attention and turn it back over to Shannon.

Paul, thank you so much for kicking us off, and thanks to Matillion for sponsoring and helping these webinars happen. If you have a question for Paul, feel free to submit it in the Q&A section of your screen; he will be joining us for the Q&A at the end of the webinar today. Now let me introduce our speaker for the series, William McKnight. William has advised many of the world's best-known organizations; his strategies form the information management plans of leading companies in numerous industries. He is a prolific author and a popular keynote speaker and trainer, and he has performed dozens of benchmarks on leading database, data lake, streaming, and data integration products. With that, I'll give the floor to William to get his section of the presentation started.

Hello and welcome. Thank you, Shannon, and thank you, Paul. It's great to have Matillion aboard Advanced Analytics here.
They are quite a compelling data integration vendor, and that's not easy to do these days, as you'll see when we go through some of the requirements you have for data integration; it's a pretty high bar.

Anyway, before we get started: I asked my friend John, who attends all of these (hello if you're out there, John) how these webinars were going for him as an attendee, and he said great: great information, well organized, and all these accolades, which I appreciated. But he also said, you've got to let your hair down a little bit more, because we spend an hour with you every month and we don't know that much about you. So I'll let my hair down a little here. The first thing I'll let you know is that I'm a dog owner. We here at the McKnight house are two-time foster failures, so we've accumulated four dogs in the house. I guess we specialize in the twelve-pound-and-under crowd, because that's about where they all are, but they keep us on our toes. I'm also heavily into fitness, and I'm the current national age-group champion in events called DEKA FIT and HYROX Pro. If those mean nothing to you, you're probably normal, but if they mean anything to you and you want to reach out about any of that, I'd be more than happy to talk; I enjoy talking about the racing I do and the training for it. So a lot of fun there. And I play a little piano, although I must admit it's been hard to find the time lately. I play mostly classical, but I'll play just about anything if I can get the time for it.

It's hard when your data environments out there look like this: data, processes, people, privacy problems, and projects. Over time it begins to look like a spaghetti mess, and that's just normal. I don't want to excuse all the messes out there, of course. If you've been following this series, you know I like to get things as organized as possible while keeping the business moving forward. I know that's hard, and I've given you lots of tips over the years on how to do this very thing. But today we have code all over the place, we have data all over the place, and we have various profiles of usage of this data; quite frequently those users have demands about how they want to consume the data. And now we definitely have to throw in compliance and audit, because those requirements are looming large over us. And, oh by the way, things change quite a bit. So we're trying to support applications, and I don't care if you're in a business area or central IT; wherever you sit, if you're working in this technology and architecture area, you're trying to support applications. Every application is going to be a little bit different: some of them are going to be more keen to share, some less. I want you to have a handle on this as the data expert, and I do speak to the data experts within enterprise companies in this series. I know we get a ton of vendors and consultants, and they're welcome of course, but I mostly speak to the persona of the person in the enterprise trying to make sense of all this and apply governance over the top of it. My point is, we have a lot of data integration going on, more than ever. Just because we're not only doing data warehousing anymore doesn't mean the data integration has gone away.
Actually, with the more decentralized approach, you could argue that there's a whole lot more data integration going on. And there are a lot of valid platforms, which is what I want to talk about: why do we have so many data stores? I talk to clients all the time about this, and a lot of them, a lot of you, are fretting about why you have so many data stores. Some of you are reasonably fretting, because you have a hundred, it's out of control, and nothing gets reused at that point. If you can graph this in your mind, there's a bubble in the middle of the curve where you have just the right number and variety of data stores in your environment. Beyond that, it becomes far less efficient, because you get overwhelmed and now you have a heavy management burden around what you have. There is such a thing as too few data stores, though I haven't found that client yet. A lot will try to have few, which I think is a worthy goal.

And we want to aim for reuse; I think the real keyword for getting this right is reuse. We want to build things for reuse. I call it leverage: data warehousing, data lakes, operational data lakes, master data management environments. This is where I train my CDO clients to focus: on leverage, on the biggest bang for the buck, on creating structure that can actually be leveraged across multiple applications over the course of time and is future-proofed. Many of you have an enterprise agreement with one of the big vendors, right? And you reach for that software, that database, for most of your needs. For a lot of those needs, that's going to be just fine, because it doesn't matter too much. But I have found in the past five years that the majority, over 50%, of the applications I'm exposed to have some unique requirements where you cannot just do that thing I mentioned and expect to succeed. So what I'm seeing now is that we're probably in the middle of a good five-, six-, seven-year stretch of companies deciding they need to break away from that primary vendor and consider some best-of-breed options, because they have unique requirements. It's not that our requirements are outsizing the abilities of technology; it's not that at all. It's about right-fitting technology to the demands of the application in the enterprise. And that means you're going to have different data stores, and you're going to have data integration as a result.

Performance is one thing. Your everyday, if I may say generic, database may not have the performance characteristics you need for a given application. I know this is true for, again, the majority of the applications we've worked on and been exposed to over the past five years. And people want performance out of the box. Gone are the days when it's okay to tinker, tinker, tinker forever with the database to dial in that performance. The demand is really day one: day-one performance at the level that is needed. Of course we can improve from there, but out-of-the-box performance is very important, and that's usually what we measure in our benchmarks, because that's what we hear people want. What else? Cost predictability and transparency.
It used to be that we'd spend a big cycle trying to spec the platform, and we usually got it wrong. I'll say I usually got it wrong: too little, too much, what have you. With the elastic platforms we're almost all on now, at least for new work, that need kind of goes out the window. But the need to manage costs does not go out the window at all. As a matter of fact, it's now an ongoing concern within the enterprise. You need much more dedicated skills and knowledge around cost, and you have to keep a closer eye on it, because you're getting a bill every month. Before, you could spec it, the question would go away, and maybe you'd revisit it in six months when you realized you were paying for capacity you weren't using, or the reverse, that you needed more capacity; that would generate an event. Cost predictability and transparency is less of an event now; it's an ongoing concern. Somebody in the data team needs to be tasked with keeping an eye on it. That means putting a notice on their calendar, whatever it takes, to check in on ongoing costs and make sure they stay in alignment. There's a lot of frustration out there with cost predictability and transparency, even in these clouds, and especially with these cloud databases; it can kind of hit you in the face when you need more resources than you thought, and things just add up.

Next, administration. I find that companies with an excess of qualified staff in the data area have less concern about this, although even there it can become a concern when efficiency becomes front and center. And in shops that are short-staffed on great administration for these data products, which is most shops, ease of administration certainly becomes a thing. So this is a criterion by which some people have decided they need to go with a different database. It's usually multi-factored, but this is one of the leading factors, alongside performance, for why you sometimes change platforms and fork off. You fork off of that data warehouse, and you don't necessarily go with the same database when you build a mart, a lake, a specialized structure, and so on.

You may also need some of the things that special optimizers do: not all optimizers do conditional parallelism, not all optimizers do dynamic and controllable prioritization of resources, et cetera. And workload isolation. Do you need this stuff? Well, you only know that when you know your workload. When you go into these things heads-up and you know your workload, you can decide whether you need an optimizer that does these things, because some optimizers have not been worked on very much in the past five to ten years. That's really an opportunity missed, because optimizers are needed more than ever now that we're demanding much faster turnaround of great performance. We need the ability to introduce a few less-than-optimal things in our designs and our queries and have it still be okay; the work then falls back on the optimizer. And concurrency, or I'll say concurrency scaling, because most databases are pretty good at the one-to-five-concurrent-user level and will scale to that point, but beyond it we see a vast divergence in concurrency.
We've tested up to 50 concurrent users, which some shops will have, and of course you can play design games to get around that, but again, we're looking at out of the box. We're trying to look at these platforms from the perspective of: if I land this platform tomorrow, what is the environment going to look like the next day? Because that's what my clients care about. So we're looking at concurrency scaling to 50, 100, whatever the case may be. You may not have that need in whatever application you're working on; okay, great. But I will add that over 50% of the workloads I've worked on and been exposed to in the past five years have had exceedingly greater concurrency requirements than the ones before them. Also look at the cost of concurrency scaling; the cost might get you, so keep an eye on that.

Resource elasticity. I know we throw this phrase around, but if you really don't know what the workload is going to be, or it's going to vary quite a bit, or it's going to grow steadily, whatever the case may be, you don't want to be paying for the just-in-case gap. Some of my clients in the past, when I've given them an estimate, have doubled it just in case. Well, that's expensive, and it's an expense we don't really need anymore when you have great resource elasticity. And although all the vendors will say they have it, this refers to hands-off elasticity, not "oh, we have to call the vendor, we have to do a budget negotiation, it's going to take a week, we have to schedule it in", and then it's a huge increment, ten terabytes at a time or something. That's kind of ridiculous today. So that's one thing to look out for, and many workloads need it, which is another reason why we have so many platforms in the enterprise today.

Oh, this is a big one: machine learning. We are considering machine learning now in everything we do, in everything we spec from an application perspective, unless it's super low complexity. Machine learning is replacing BI in some cases; it's taking its place in many cases. Where before we'd design an application around data analysts, and time, and specialized structures for the analysts, now we're thinking more about: can I get the data into the algorithms, and how do I get the data into the algorithms best? For that we need machine learning, and there are very different ways to skin this cat right now. I think it'll settle. I think machine learning is coming into the database; some databases, Vertica for example, are touting some 400-odd machine learning capabilities. So we're seeing a lot more of that: a lot more built-in access to machine learning, kind of like how we used to access any SQL statement, letting it go get trained on the data and do its thing. We want machine learning, like everything else, to be sort of hands-off and just work. So the machine learning characteristics of platforms are causing some to diverge.

And data storage format alternatives. There are many out there now: sometimes we're getting data in as ORC, Parquet, JSON, Avro, et cetera, and it just makes sense to store it that way. Well, not all data platforms are great about that.
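To make the storage-format point concrete, here is a small illustrative sketch, assuming the pyarrow library and made-up sensor data, that lands the same records as JSON lines and as Parquet; the contrast (schema-on-read text versus typed, columnar storage) is the trade-off being described:

```python
# Minimal sketch: land the same small table as JSON lines and as Parquet,
# assuming the pyarrow library is available. The data is made up for illustration.
import json
import pyarrow as pa
import pyarrow.parquet as pq

rows = [
    {"sensor": "a1", "ts": 1700000000, "reading": 21.4},
    {"sensor": "a2", "ts": 1700000060, "reading": 19.8},
]

# JSON lines: flexible, schema-on-read, but re-parsed on every query.
with open("readings.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Parquet: columnar, typed, compressed; friendlier to analytic engines.
table = pa.Table.from_pylist(rows)
pq.write_table(table, "readings.parquet")

print(pq.read_table("readings.parquet").schema)
```

ORC and Avro make similar trade-offs with different strengths, which is part of why platform support for these formats varies so much.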
So you might diverge. Sometimes we actually want to store whatever data we're getting in these formats, for all the benefits they provide. Sometimes we turn to the NoSQL marketplace, which obviously specializes in these data structures, but frequently the capabilities are growing within the database marketplace too, so we're looking at those capabilities as well. There are big decisions to be made when you have data storage format alternatives to good old alphanumeric data, so hopefully you're taking account of that when you choose platforms.

Again, platforming is important; it is important to the success of the project. Today's environment is complex. We need context, we need lineage. We need to know where the data is coming from, where it's been, who's done what to it, and things like that. And we have to consider the cost of the environment. None of us are going to get it perfect, but if we can get into that great zone, we'll be far better off. And today we have BI tools, SaaS tools and applications, and machine learning applications that want data from multiple of these platforms. Usually we're not perfect about putting all the data into one platform for a given complex application, so sometimes we introduce data virtualization, and sometimes we add more data to these leverageable stores so they can support the application. There are different ways to go about it.

So when it comes to data integration: now we see that we need it, now we see that it's normal, now we see some of the reasons why we get into the situations we get into, and why many of the platforms out there are very relevant. If you have 10, if you have 20, whatever, you might be right where you need to be, not to worry. But you do need to put it all together, through data virtualization as I mentioned, and good old data integration. Although I shouldn't say "good old", because it's changing, and that's sort of the emphasis today. We have new goals now for the data integration product we use on a given application. I don't believe for a minute that any enterprise that's mid-sized or up is optimal with one data integration product. You might need an anchor product, a default that you're good at, and most of my better clients in the data area have that sort of enterprise data integration product, but they also have other products, maybe from the same vendor, maybe not, that have a more ease-of-use feel to them: something that works great for a business user or a department, as opposed to something that feels more central-IT oriented. And to be fair, a lot of the enterprise data integration players are improving their UIs to make them more user-friendly, because they know this trend is very real.

So, the requirements. We want the product, whatever it is, to be cloud-ready today; I cannot imagine that not being a huge requirement for data integration in any modern application. We want intelligently driven automation, and within a data integration product there's a lot that can be done intelligently; I won't go into all of it right now. Here I'll say: generate new pipelines, source and target, without manual mapping or design.
Yes, without that tedious manual mapping step, which frankly I haven't figured out how to do any better than pulling out a good old spreadsheet, listing out the columns, typing in the transformation rules I want, and building a spec that way.

Data orchestration. This has got to be part of data integration: managing the ebb and flow of data throughout the ecosystem. You want to see everything that's going on; determine what data is analyzed; determine what data is moved upstream, and at what granularity and in what state; and move, integrate, and update data, metadata, master data, and machine learning models as evolution happens at the core. That whole feel of data orchestration is important to have within your data integration product.

Trust, created through transparency and understanding. So definitely security on top of this, but a lot of this trust will come not from the tool but from your programs, from the processes that go around it. Are you communicating how the data evolves and how it gets to the point where the user, or the application, actually grabs it and wants to use it? Are you communicating how that data got there, and whether it's right, right for use? There's still a lot left to human understanding in this process to make data integration successful.

And: able to dynamically, even automatically, scale to meet the increasing complexity and concurrency demands of query executions, because those demands continue to grow.

Okay, so let's look in more detail at some of the capabilities for cloud data integration. Data lineage is one of them, and I won't belabor this too much, but I have to say this is a new one. It has really been added to our vernacular when we go to market for clients, and when we look at the market for tools, it's severely lacking, severely behind the requirements that organizations have. So organizations are home-rolling, if you will, some of their data lineage requirements, which is tedious at best, or they're doing something else. There's a tool out there called Manta, for example, that we like for data lineage. If lineage is a huge requirement, you might want to augment whatever you're doing for data integration with something like that, because it will track all of this at a very detailed level. And that's increasingly important, because we have legislative bodies and compliance requirements, and we have data discovery initiatives; these are all use cases for data lineage, and sometimes you have several of them at once.

So, data lineage requirements. We need the tool to represent things graphically. We need impact analysis these days; in data integration, you need to have impact analysis. Sometimes what's in the data integration tool is good for that, sometimes it's not. We want to see: if I change this column, if I change this feed, what gets affected? In an enterprise, we can't just make the change and wait for the fallout. That used to be on the table, but these data systems are pretty critical these days, so we have to know. And usually, when we don't have that capability, it means we're piling on, adding to what we've got because we don't want to touch what we've got, duplicating work, and adding every day to the inefficiency of the environment, which one day comes back to bite you. So we love a clean environment.
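Impact analysis is easy to picture as a graph walk. Here's a minimal sketch (the feed names are hypothetical, and a real lineage tool such as Manta derives this graph automatically rather than by hand) of answering "if I change this feed, what gets affected?"; the same walk over reversed edges gives the root-cause view that comes up next:

```python
# Minimal sketch: impact analysis over a hand-built lineage graph.
# The feeds named here are hypothetical, purely for illustration.
from collections import deque

# Downstream edges: "this feed or column flows into these".
LINEAGE = {
    "crm.contacts": ["staging.contacts"],
    "staging.contacts": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["mart.churn_report", "ml.churn_features"],
}

def impacted(node: str) -> set[str]:
    """Everything downstream of a changed feed or column (breadth-first walk)."""
    seen, queue = set(), deque([node])
    while queue:
        for child in LINEAGE.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# "If I change this feed, what gets affected?"
print(impacted("staging.contacts"))
# -> {'warehouse.dim_customer', 'mart.churn_report', 'ml.churn_features'}
```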
Root cause analysis is a big part of that. We also want data lineage to extend to non-standard or custom sources, not just your everyday Salesforce and ERP systems, but really everything, along with general accessibility to that lineage and its metadata. Okay, that's what I wanted to say about the data lineage requirement, which is kind of new.

And here are the data integration requirements we're working with right now. If you want a fuller treatment of these, check back on my April Advanced Analytics webinar; I don't know if it's on YouTube, but it should be on dataversity.net. I go through each of these in great detail there, so I won't do that today, but I will go through the list. Make sure you understand that these are some of the things you're looking for in data integration for a given application. Every application is going to have a different profile. That's why I love it when we actually profile the data integration requirements for an application and take that sensible document to market, rather than just reaching for the same old same old. When you do this, it might turn out that your enterprise DI vendor is great, or close enough, and that's fine; but you might find you need something else.

Comprehensive native connectivity. Multi-latency data ingestion: you need to be able to take data in in different patterns. Data integration patterns: ETL, ELT, batch, streaming. Data quality and data governance over the top; that's really important, and I'm seeing more and more that we love to abstract our rules into a tool focused on that, like a data catalog. So that's the next one: data cataloging and metadata management, letting those rules take effect throughout all new development within the enterprise. That's the short-term holy grail, if you will, that I'm looking for in vendor solutions. I don't think we're quite there yet. Integration with the premier data catalogs like Alation and Collibra is, shall we say, a work in progress, but when that matures, and frankly when data catalogs come up within the enterprise data integration vendors themselves, we're going to see more abilities in this area. This is one of many things that should be pushing down our cycle times in data integration over time. Our data integration cycles don't need to scale linearly with the volume of data or the number of platforms we manage; it's time for the tools to step in, put a ceiling on that, and let us do some other things.

Enterprise trust. Artificial intelligence and automation: to me, this is nascent, just coming on board in these tools, and it's something that will help you know you're getting a future-proof tool. A tool that is absorbing artificial intelligence is going to be more efficient over time. It's really almost too late for a tool to be just now stepping up to artificial intelligence; it's high time. So that's something to look at in your tool of choice. And the whole ecosystem: we're kind of all multi-cloud these days, so it's great when the tool is too. There are some really good tools within a given environment, like AWS, Azure, or Google, but you're committing that full application to that platform, because those tools don't work outside their platform. So make your choice wisely there; make sure you're future-proofing what you do.
And a lot of times that means the tool that works in multiple clouds.

We also have data prep requirements. I don't necessarily mean a data prep tool per se, but we have requirements at a lower scale of the need I was just talking about: smaller scale in terms of data, users, complexity, and the lead time you have to get the thing up and running. There are tools designed more as low-setup, no-config, no-maintenance data pipelines that lift data from operational sources and deliver it wherever; my example is a modern cloud data warehouse. They're well suited to popular cloud applications. We should not be reinventing the wheel to pull data out of Salesforce, Stripe, Marketo, et cetera, these popular cloud applications, and very few of you are doing that anymore. We used to. I used to do this for SAP, even before they had their BW, and we'd be pulling the data out; the column names were in German, and it was very difficult and tedious. Hopefully nobody does that anymore. That's an extreme example, but people probably still do a little of it. In these tools, transformations are SQL-based rules written by business users and set to run on a schedule; there need to be some transformations even at the data prep level. They may not be as elegant or as profound as what an enterprise tool provides, but this is a different price point and a different scale. Now, all of this has gotten kind of blurry lately, because you have tools like Matillion stepping up, raising a hand for more enterprise needs and bringing that ease of use I said is required up to the enterprise. You may sacrifice a little of that deeper level of control today, and you may or may not need it, but bringing ease of use to the enterprise is pretty important. So again, you're stepping into this marketplace; it's not for the faint of heart. It's for someone who really knows it and can get into the right tool, one that is future-proofed and will do the job now and as platforms change; maybe you're working in a data mart today, and eventually this needs to become part of the data warehouse just for efficiency's sake. So there again, you have a need.

Now, application programming interfaces. I've been talking about data integration tools, and they're great and all that, but sometimes we need something that's going to bring data to the table a little better than what we can do otherwise, and today, increasingly, that's an application programming interface, with or without a data integration tool. APIs are becoming numerous; they're becoming de facto standards of communication within and beyond the enterprise. They've begun to replace older, more cumbersome methods of information sharing with lightweight endpoints, owing to the popularity and proliferation of microservices. In particular, the need has arisen to manage the multitude of services a company relies on, because organizations depend on these services. Now, we at McKnight have a working definition that has held true for us for a couple of years, across several benchmarks in the API arena, and that is 1,000 transactions per second on the API endpoints. I don't know, does that sound like a lot?
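As a rough illustration of what testing against a working definition like that can look like, here's a minimal Python sketch using only the standard library. The endpoint is a placeholder, the load is far below a real 1,000-TPS harness, and it simply reports the two reliability numbers discussed below: completion rate and tail latency.

```python
# Minimal sketch of an out-of-the-box API load check: fire N requests
# concurrently, then report completion rate and tail latency. The endpoint
# URL is a placeholder; a real 1,000 TPS benchmark needs a proper harness
# (and a server that consented to the load).
import time
import statistics
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URL = "http://localhost:8080/health"   # hypothetical endpoint
N = 500

def call():
    """Return the request latency in seconds, or None on failure."""
    start = time.perf_counter()
    try:
        with urlopen(URL, timeout=5) as resp:
            resp.read()
        return time.perf_counter() - start
    except Exception:
        return None   # count as a failed message

with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(lambda _: call(), range(N)))

ok = [r for r in results if r is not None]
print(f"completion: {len(ok) / N:.1%}")                               # want 100%
print(f"p99 latency: {statistics.quantiles(ok, n=100)[98] * 1000:.1f} ms")
```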
It's higher end, I would say, but it's certainly within the realm of need in the next year or two for many of the applications we're working on, and probably that you're working on. So how are the APIs doing at that level? There are companies out there, I'll mention Kong, NGINX, and API7, that provide a host of API capabilities, and I think enterprises now need one of those vendors aboard, so they have access in their architecture to all the capabilities those tools are building. So go choose one of those as well. I've given you several things today that you can buy and not build; always a good thing over the course of time.

The API microservices ecosystem: there are public APIs, and we enjoy working with these, right? There are over 20,000: weather, stocks, currency, news, things like that. You can look at programmableweb.com; most programmers are familiar with it. I don't know what number we're at now, but it was 20,000 not too long ago, so probably 25,000 by now. You might also have private APIs, with external partners or with internal ones. As for the platform architecture: you've got clients, you might have a load balancer (we've used NGINX a lot), and then the different API nodes that you set up; I'm showing the Kong logo there, but there are different ones. They have their database, and they distribute the data to the endpoints. Anyway, this is a great way to take cycles off of, and add performance to, the whole idea of data integration, with or without a data integration tool. These endpoints can feed data integration tools that take the data into deeper transformation and load it, with much more manageability, into the data warehouse and so on. You want to bring a lot more rigor and standards to how you load data into these leverageable platforms; it can't be "let's just load it this way and hope for the best, we know it'll work on day one". We want to put high standards on our data warehouses, data lakes, and so on, because those are going to get heavily used, and the same is true for critical applications. Sometimes you can take it as it goes, but frequently we can't.

So, requirements around APIs: good for high-performance workloads like streaming solutions, which I'm going to get to in a minute. Reliability: all workloads completed with 100% message completion, no failures. So test your APIs; grab one of our benchmarks to see how you can test them, confirm you get 100% message completion, and see how they do at the different 99.9-plus percentiles. And by the way, make sure you turn on the plugins you're actually going to run with when you execute the API, and see what kind of overhead they put on the whole process.

Now, I've talked about data integration tools and APIs; there are also streaming solutions, to round it out. When might you need something else? Well, I'll get back to this thing I keep saying: the workloads are changing out there, and the requirements of modern workloads are at a different level, a different scale, than they used to be. And that means the volume of data coming in falls into what we call the streaming category.
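A quick back-of-the-envelope check makes the streaming category concrete. With purely assumed, illustrative rates, you can see when a periodic batch loader has to fall behind:

```python
# Back-of-the-envelope check: can a periodic batch loader keep up with a
# stream? All rates here are assumptions, purely for illustration.
events_per_sec = 5_000          # assumed arrival rate
batch_interval_sec = 300        # loader runs every 5 minutes
load_rate_per_sec = 4_000       # assumed sustained load throughput

arrived = events_per_sec * batch_interval_sec    # 1,500,000 events per cycle
drained = load_rate_per_sec * batch_interval_sec # 1,200,000 events per cycle
backlog_growth = arrived - drained

if backlog_growth > 0:
    print(f"falling behind by {backlog_growth:,} events per cycle; "
          "batch ETL won't keep up at these rates")
else:
    print("loader keeps up at these assumed rates")
```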
And ETL can be quite insufficient when you know you're dealing with streaming data; and when I say ETL, I mean ELT as well. You just need something that handles this better. Now, there are brokers you can put into the middle of all this, I call them traffic cops, or post offices if you will, that really help out a lot here, feeding the data out in a sensible context. That's stuff like Kafka, which we'll get to here. But first: ETL forces you to choose between real-time loading without scalability, and scalability with batch loading but without real time. That's kind of where a lot of those tools are today. This isn't a hard-and-fast or quantified statement, but it should trigger you to make sure you're not forcing yourself into something that isn't future-proofed by using ETL when you have real-time data.

Also known as messaging: live feeds, real-time or event-driven, where the data comes in continuously and often quickly. So we call it streaming data, and it needs special attention. This data is needed not only for whatever application you're building (supply chain, fraud detection, and customer churn analysis come to mind) but as a general foundation for anything you're doing with artificial intelligence, which wants all your data, down at a granular level. And it doesn't get much more granular than real-time data. Streaming data forms the core of data for artificial intelligence, truth be told. The base alphanumeric, financially oriented, et cetera kind of data, that's master data, if you will; it's important for artificial intelligence because it provides context to the transactions, but the real transactions are found within streaming data.

So, enter message-oriented middleware, also known as streaming and message-queuing technology. This is all about the messages, and I'm going to go a little fast here; that's okay, I think. Things you're going to look for in a solution: throughput; storage, meaning how the data is stored; latency, meaning are you going to fall behind? Falling behind is sort of the death knell for streaming, because when you're loading that data there's usually no gap coming up in which to catch up; that's why it's streaming data. And also how the thing operates. The streaming platform goes into the middle of all the apps and all the downstream nodes on the pipeline. That will be something like Kafka, which is very popular these days: an open-source streaming platform developed at LinkedIn, which not everybody knows; kind of an interesting nugget, I would say. It's a distributed pub/sub messaging system that maintains feeds of messages called topics. If you're in that world, you know about topics, about assigning records to topics, and about the subscribers, the sinks if you will, that subscribe to topics and pull off just that kind of data. Just be careful, in doing all this, that you're still doing something architected; don't use Kafka et cetera as an excuse to have an unarchitected environment and just spray all data everywhere. I've seen that, and it's a little naughty.

Pulsar is something we're very keen on; you can find our benchmarks on it. It was originally developed at Yahoo and began incubation at Apache in late 2016, having been in production at Yahoo for about three years prior, in many things.
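Since Pulsar deliberately mirrors these concepts, one sketch covers both. Here's a minimal, illustrative producer/consumer pair, assuming the kafka-python client and a broker at localhost:9092; the topic name and payload are made up:

```python
# Minimal sketch of the pub/sub model just described, assuming the
# kafka-python client and a Kafka broker running on localhost:9092.
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer: assign records to a topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orders", {"order_id": 42, "amount": 99.5})
producer.flush()

# Consumer (the "sink"): subscribe to the topic and pull records off.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,        # stop polling after 5s of silence
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.topic, message.value)
```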
I don't know if you feel Yahoo is credible or not, but they did quite a bit there before their decline. Pulsar follows the pub/sub model and has the same producers, topics, and consumers as Kafka, but as we have shown, it operates with very good performance. So look at your workloads, distinguished by the number of topics, the size of the messages, et cetera; these are the things you want to look at when determining the right streaming solution for you, or whether it's a streaming solution at all. If you can't look at the workload and see topics, messages, subscribers, and producers, then maybe you're back in the data integration world. Nothing wrong with that; just make sure you're not in this world with the wrong tool on it, right?

Okay, key takeaways before I get to your questions. By the way, if you have questions, go ahead and lob them into the Q&A panel, and Paul and I will do our best with them in just a minute. My summary slide: an enterprise has many different types of data stores, as well as many data stores of the same type. We try to limit that, but there are reasons, and this extends to clouds as well; now we have multiple clouds with multiple stores, so it's not just our physical enterprise we have to be concerned about anymore. The reasons for this are many, and I went through them. It is fully expected that enterprise environments will have heterogeneous vendors: that enterprise vendor I spoke of, as well as other vendors, for data integration. APIs have begun to replace older, more cumbersome methods of information sharing; definitely avail yourself of that marketplace if applicable. And then streaming and message queuing: it's going to be around for a while, able to meet the real-time data volume, variety, and timing requirements of the coming years. So that's another future-proofed item for you to consider as you do all that data integration in your enterprise, trying to keep all that data in some sensible architecture. That has been my presentation, and I will turn it back over to Shannon to see if we have any questions.

William, thank you so much for another great presentation. If you have questions for William or for Paul, feel free to submit them in the Q&A panel, which you can find at the bottom middle of your screen. And to answer the most commonly asked question: just a reminder, I will send a follow-up email by end of day Monday with links to the slides and the recording. Everyone's quiet right now; I know we all have questions, so we'll give everyone a minute to type them in. So Paul, any observations on William's presentation? Anything you want to add or chime in on?

Yeah, definitely. Great presentation, William; thank you for taking us through all of this. I can tell you have a tremendous amount of insight, and you have the really unique perspective of sitting across a number of different clients and pulling out best practices; I always love hearing those. You talked a little bit about message queues, right?
For data, we do see a lot of people starting to do that: publishing sort of micro-events onto message queues and expecting the data integration platforms to pick those up and do something with them, which is a use case we support at Matillion as well. But another, I think more interesting, use case we see, and I'd be curious, William, whether you see this too: you mentioned that people have more and more components in their data integration architecture than before. What we see happening is people wanting to write out to message queues not with data, but with events that drive other orchestration around their platform. So you'll see people wanting to publish to Amazon SNS, for example, or Google Pub/Sub, when a job completes for an integration task or a transformation task, and that alerts other downstream systems to take action based on the fact that the data set is prepared for analysis. I'm curious, William: do you see that happening in your architectures as well?

I think that's clearly something that's going to happen, and something that leading-edge organizations are considering, and doing, today, because just moving data usually doesn't actually do the thing the enterprise wants done. It's about picking the intelligence out of the data and doing the right thing, even automatically. So yes, we have applications doing that, vendors are talking about it, and I think it's definitely a way of the future. All this automation, all this efficiency, anything to do with AI, anything AI can do, which is a lot, is a wave of the future and something leading organizations should be thinking about now. So yes, we're seeing that for sure.

I love it. And we do have some questions coming in now. Do you have any rules of thumb for choosing the right integration paradigm or product type? You gave a few examples as we walked through this, but do you have a quick summary?

I don't know that I have a quick summary; grab the slides, because I think it's kind of embedded throughout. When I talk about requirements for DI tools, I'd say your DI tool, as we've come to call it, is sort of the default. So when do you need to break away? Well, if there are APIs, you definitely want to consider that. And if it's streaming data, if you recognize yourself in that arena, if it's that volume of data, then consider what Pulsar or Kafka can do and add that to the architecture mix.

And are you aware of any data lineage automation that truly works? Yes, I would steer that questioner to Manta, M-A-N-T-A. That's the only one I know of that is great at this, and I think it's an emerging market. So yes, it works.

All right, lots of product questions here too. And then a question on BI and reporting tools: how do they handle ingestion, and what's your opinion on those becoming the ingestion tools nowadays? Okay, I have a fairly strong, soapboxy opinion on this. If you've heard this series at all, you know I say repeatedly to make the data scream out what you should be doing with it. So let's work the data layer. Let's put our energies into the leverageable parts of the architecture, which is the data layer, and have the data sitting in, or created in, the data layer ready to go, so you can slap any BI tool on top.
I don't mean that that's an inconsequential selection, but truly, you can slap any BI tool on top of it and it will get you the information you need, because you've already worked the data layer: the data has your calculations, your summarizations as needed, the granular level of data, all the components of any calculation, right there. So yes, you want to do more work in the data layer, and with data integration you want to minimize what has to go out to a BI-tool hub type of thing. I think that's kind of an older architectural approach.

Yeah, I would just plus-one that, William, for sure. One of the things we see is that some people get buyer's remorse when they do all of their modeling in the BI tool, and then they want to leave that BI tool behind, or reuse that modeling in other contexts like you mentioned, and they realize that work is non-transferable, because it happened in the BI layer rather than the data layer.

Exactly.

I love it. Well, thank you both for these great presentations, but I'm afraid that is all the time we have for today. Thanks again to Matillion for sponsoring and helping to make these webinars happen. Just a reminder: I will send a follow-up email to all registrants by end of day Monday with links to the slides and the recording. Thanks, everyone. Hope you all have a great day. Thanks, guys. Bye-bye. Cheers, everyone.