Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager at DATAVERSITY. We would like to thank you for joining this DATAVERSITY webinar, DataOps: The Foundation of Your Agile Data Architecture, sponsored today by DataKitchen. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A panel in the bottom middle of your screen, or if you'd like to tweet, we encourage you to share your questions via Twitter using hashtag DATAVERSITY. And if you'd like to chat with us or with each other, we certainly encourage you to do so; to open the Q&A or the chat panel, you'll find the icons for those features at the bottom of your screen. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar.

Now let me introduce our speaker for today, Chris Bergh. Chris is co-founder, CEO, and Head Chef of DataKitchen, a DataOps software platform provider. He has more than 30 years of research, software engineering, data analytics, and executive management experience. At various points in his career, he has been a COO, CTO, VP, and director of engineering. Chris is a recognized expert on DataOps, he is the co-author of The DataOps Cookbook and the DataOps Manifesto, and he speaks on DataOps at many industry conferences. And with that, I will give the floor to Chris to get today's webinar started.

Well, thank you very much. I'm happy to be here, and thank you, audience, for attending. Hopefully you'll find this interesting, because it's not going to be a product pitch; it's going to be an ideas discussion. And it's an idea that's really followed me through a lot of my career. As Shannon said, I was a software engineer and a manager and wrote a lot of code. And then about 2005, I got the bright idea, the bright shiny idea, that I should do data and analytics. And you know what? Agility was hard, because in a lot of ways things were breaking left and right, my customers were mad that the data was wrong, my team was trying to innovate, and it was really hard to get new things into production. So agility is actually really hard when you apply it to data and analytics. And by data and analytics, I mean anything: data science, models, data. Any time you're trying to wrest insight out of data.

So here are the four patterns that I'm going to talk about today. The first is DataOps, which my company focuses on; we have a software product built around it, and we're not going to go too much into that. The second is a new term that analysts like Gartner and Forrester are talking about, called the data fabric. I'm going to talk about that, and about why a data fabric alone doesn't lead to agility. Then I'm going to talk about something called the data mesh, which is a way to bring another software idea, domain-driven design, into data and analytics. And the fourth is functional data engineering. In some ways, if we look at DataOps, data mesh, and functional data engineering, these are ideas that originally emanated from dealing with complicated software systems, and due to the complexity of the tools and teams and data we have today, I think they really apply. And we're going to go through an example.
And at the end, we're going to talk a little bit about how DataKitchen can help you make all four of these things easier in your environment.

So the first thing is DataOps. One of the challenges that I've noticed a lot of data and analytics teams have is that we focus on our backpack full of things to do. I've got some data to integrate, I've got an algorithm to tweak, I've got a visualization to update, I've got to update my governance. There's a lot of work to do, and we sort of go into work with our backpack full of these tasks. Those tasks are the stream in front of us, the river of work right in front of us. And I think that excessive task focus is really blinding us, in some ways, to the general problems that we have in data and analytics: that most projects fail, that we have too many errors in the data systems we create and deliver, and that lots of models and other artifacts just take forever to update. In some ways this task focus is blinding us to the real problem, which is something upstream of doing all those tasks. It's really about how you do these things: how you develop, how you deploy, how you iterate, how you monitor, how you test, how you collaborate. These upstream problems are really the focus.

What DataOps focuses on is four specific processes to improve. The first one is: how do you lower your error rates in production? How do you make sure that you don't deliver wrong data or bad charts or a model that's off, so that your customers trust the data? The second is: how do you decrease the cycle time? What that means is, how fast can you get something from a development environment into production that's properly regression-tested and doesn't have any problems once it's in production? Third is: how do you avoid meetings? We all have a lot of meetings and documentation and PowerPoints to do, so how can you actually improve the collaboration within your team, so there's less bureaucracy and you get more time to do fun stuff? And finally: how can you show that your team is awesome? So all of that is what DataOps is about. It's really about aligning people and process and technology, and it's focused on bringing rapid experimentation, low error rates, fast cycle time to deploy, and clear measurement and monitoring. In some ways, DataOps is an agile focus. It's an idea that came from lean manufacturing, and it came from software's Agile and DevOps. It's one of the ideas we're going to talk about that leads to having an agile team, and I may be biased, but I think it's the most important one.

So let's talk about the second one, and this has gotten a lot of discussion: something called a data fabric. As part of my job, I have to talk to industry analysts. I've gone off and talked to the Forrester analysts and the Gartner analysts about DataOps and the data fabric, and so I'd like to go into it, because one of the claims about the data fabric is that by enabling a data fabric, you enable agility. And honestly, I'm not quite sure that that's the truth, or that I believe it. One of the reasons I'm talking about data fabric is that it's a hot term. It was one of Gartner's most downloaded articles and one of the most searched terms on Gartner's site last year.
And it's getting a lot of press; a lot of companies are talking about data fabric, and there are different definitions of what a data fabric is that differ slightly between Forrester, Gartner, and your average person. In some ways, a data fabric focuses on how to build a flexible, agile, and scalable architecture that will be able to supply data to humans or machines. It's a design concept, not a set of technology components. And I guess I agree with that, but in some ways I think a data fabric is a renaming of all the data management components that we've already had. If I look at it and ask, well, what are those components? Well, we get data, put it in a database, store it, and maybe that database is Redshift or Snowflake, or maybe it goes through an S3 bucket. We do some transformation on it; maybe that's in Informatica or Talend or an ELT tool. We catalog it with data catalog tools. If you asked anyone what their data management tools were, it'd be: oh, an analytic database, something that transforms the data, and something to govern the data. And you might throw MDM in there as a category as well.

What a data fabric adds is really two parts. One is that it actually includes the idea of streaming, streaming ingest, in the pattern. And the second is that it also includes data virtualization: things that sit on top of more data. And I think that makes sense, because those are more and more common patterns in how people are building data and analytic systems. They have a streaming component, a tool like Kafka. They have a data virtualization component that allows them to sit on top of multiple databases and have a common semantic layer, and also do nice things like masking. And I think that's great, and I think the analysts are right.

One of the claims they make, though, if you read the literature, is something that I'm not sure I quite agree with: that there's AI inside all these tools, that artificial intelligence is going to end up doing a lot of the work to build that semantic layer, to build data governance, to transform code, to load code. And I think it's sort of "Danger, Will Robinson." I started my career working on a project at NASA, and I studied AI in my master's degree. I think AI is great, don't get me wrong, but there are degrees of AI. One way to think about what the analysts mean when they talk about this sort of AI inside is to think of the levels of autonomous driving. We've all heard about self-driving vehicles, and how you're going to be able to call one from your phone and have it pick you up. In a lot of cases they may work, but there are different levels. Level one is kind of: it's a sunny day, you're on a freeway, there's not a lot of construction, and you've got to keep your hands on the wheel. We actually have that now, and it's kind of cool to see. But I don't think I'm going to be taking any autonomous vehicle here in Boston in the winter, during construction, on a slushy wet day at night. That's just not going to happen, at least for the near term. And I think we're at sort of level one of AI in the data fabric. There could be some things: you could get a new data file in, inspect it for an additional column, and automatically update the columns. And I think that's great stuff.
But don't expect that all the stuff is going to be done automatically, and that your data fabric is going to pour data in and get data assembled in a way that's mastered and analytically friendly and well-governed without a lot of people doing the work. And I think that goes to the point here: we're really building systems that people work in. AI plus new tools does not equal agility. That's really the question here. It's like: well, Chris, Gartner's analysts are smart people, and they say I buy a bunch of new tools or I move to the cloud and I get agility. And I just don't think that's true. I just don't think that adding AI magic dust on top of it means agility. I think agility is really a how problem. It's about how your people work, the patterns they work in, and an architecture that follows DataOps principles. This may get me in trouble with the analysts, but I just don't believe new tools give you agility. They may add some things to your resume, but you have to start working in a different way.

One way to see this, since this is a technical architecture discussion as well, is to think about our data architecture. In a lot of cases, I see data architecture diagrams that look something like this: there's source data on the left, there are consumers on the right, and there's a chain of tools in the middle. The buckets may differ, but the data is put in a lake, and there are some data engineering tools, some refined data, some data science, data visualization, and governance. So it may get into S3, maybe it goes through AWS Glue into Redshift, you may use DataRobot to do your data science, you may use Tableau to do your visualization, and you may use Collibra to do your data governance. It's a chain of tools, and it's very focused on production. And I think that's fine, but one of the most important things is that nothing is really static. You have to build in the idea of change. So the addition that I'd like to see to any data architecture is to make the right to repair, the right to change, part of the system.

What that means is: take your production environment, and maybe you call that a data fabric or maybe you call it your toolchain, but think about how to run all these pieces together to get end-to-end visibility, so that if something goes wrong, you're not having two-hour meetings with people pointing fingers; you can point to the exact spot in the data or model or visualization where it failed. That means testing and monitoring and observability. But also think about how you get things from a development environment into production, and really think about what it means to properly regression-test a system of data and analytic tools. What I think, and what I think the industry is coming to believe, is that you really want to see the end-to-end view. Regression doesn't mean that you throw it over the wall and someone does manual tests; if you can automate the regression, automate the end-to-end testing across all your tools, you have a much higher likelihood of being able to move things quickly from dev into production. And then, in order to properly regression-test all the tools and all the data that you have, you need to manage your environments. Hence this idea of building for change: you may decide to adopt a new tool, the code that's driving that tool may change, and you want to be able to make that happen.
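To make that automated end-to-end testing idea concrete, here's a minimal sketch, in Python, of the kind of checks a pipeline might run on every build, in both dev and production. The column names, thresholds, and sample data are hypothetical illustrations, not anything specific from the talk or from DataKitchen's product.

```python
# A minimal sketch of automated regression checks on a pipeline output,
# assuming the output lands as a pandas DataFrame. Columns, thresholds,
# and sample data below are hypothetical.
import pandas as pd

def check_output(df: pd.DataFrame, expected_cols: set,
                 min_rows: int, key: str) -> list:
    """Return a list of failure messages; an empty list means green."""
    failures = []
    missing = expected_cols - set(df.columns)
    if missing:
        failures.append(f"missing columns: {sorted(missing)}")
    if len(df) < min_rows:
        failures.append(f"row count {len(df)} below floor {min_rows}")
    if key in df.columns and df[key].duplicated().any():
        failures.append(f"duplicate values in key column {key!r}")
    return failures

# Run the same checks after every dev build and every production run,
# so a red result points at the exact spot in the chain that failed.
df = pd.DataFrame({"physician_id": [1, 2, 3], "calls": [5, 0, 2]})
problems = check_output(df, {"physician_id", "calls"}, min_rows=1,
                        key="physician_id")
print("GREEN" if not problems else f"RED: {problems}")
```

The point isn't these particular checks; it's that the same automated suite runs everywhere, so promotion from dev to production is a test result rather than a meeting.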
And likewise, a lot of organizations want to work in more than one environment, and there are security concerns. So a product like DataKitchen has a hub-and-spoke architecture where you can run it in one data center or have separate, segregated dev, test, and production environments, because the whole idea is: can you move your work? Can you take that idea from your data scientist's or data engineer's head, have them apply it to data, and get it into production quickly without hitting any regression or functional errors, and at least have an idea whether it works in order to get feedback? Because that's what agility is about in a lot of ways: being able to maximize the amount of feedback your team gets so they can learn more, and doing that in a way that doesn't cause a lot of technical debt, doesn't cause a lot of problems, and doesn't create a lot of governance issues.

The last thing I want to say about data fabrics is that our data fabrics are, in a lot of cases, in more than one place. A lot of you on the call may be fortunate enough that all your data is already in Amazon or Azure or GCP and you're working in one environment. Most companies that we come across are kind of halfway: they'll have an on-premise Teradata environment or SQL Server environment, and they're halfway into Azure. How do you make it work when you've got two different worlds that you're living in? The transition to the cloud for most organizations is not a week-long process; it spans a year. Some of the toughest situations I've had in my engineering career involved running two systems that are supposed to do the same thing at the same time, and that's just very hard. So my heart goes out to everyone who's transitioning to the cloud, and thinking about how to do that in an agile way is tough for a lot of people. I think the ideas of DataOps can help.

So lastly, let me close out on the idea of the data fabric. It's a hot word; Gartner and Forrester are talking about it. What's a data fabric? Well, it's all the tools that you normally use, right? And it's got some fancy new stuff: data virtualization, streaming and batch. Plus it's got this AI component, and as I said, that's a little bit of magic pixie dust right now. Its goal is to drive agility, but I don't think agility is a direct effect; I think it's a second-order effect of better tools. The primary driver is people and process and an architecture following DataOps, and perhaps following these other two ideas that I'm going to talk about: the data mesh and functional data engineering. I don't mean to dump on the data fabric. It's a good idea, and a lot of people are doing it, but you don't get agility just from new tools. You get agility from building a system that allows your people to work in an agile way. The technical characteristics of that system involve DataOps and a data architecture, and increasingly this idea of a data mesh.

So my next section is going to talk about that other mega pattern for agility, the data mesh, in addition to DataOps, and after that the functional pattern. I'm going to go through this one quickly, and I apologize if it's a little bit fast for some people. There's been a lot written on data mesh and data fabric, but I'm trying to give you the highlights. The first idea of the data mesh is that building systems that take data and transform it, analytic systems, is hard; it's complicated, and it's full of complexity.
And in general, centralized complicated systems fail. One of the patterns of centralization is having too many people work on the same thing at the same time. There's this thing in software, The Mythical Man-Month: add more people and things get done slower. Another idea is that knowledge of the data, knowledge of the domain the data works in, is actually important to the success of teams working together, and that universal one-size-fits-all patterns tend to fail. The data mesh is actually inspired by the same idea in software, called domain-driven design: if you've got a really complicated centralized thing, the logical step is to break it into smaller units and not have everyone work on everything together.

Here's what I mean. If you've got a team of 50 data engineers, you may think: okay, I've got 200 tickets to do next month, I'm just going to assign them to different people on the team. Some people do the data work and toss it over to the people doing the model work, who toss it over to the people doing the biz work. And what happens is that coordination between the teams, either because people aren't perfectly interchangeable or because of the handoffs themselves, actually makes it hard. So what if we took a different paradigm and said: you know what, your data engineers are not perfectly fungible; they can't work on everything. What if, instead, of all the data in your company, maybe a hundred data sets, you had a team of data engineers that knows 20 of them and another team that knows 30? They understand them, they get to know the problems and the people who provide the data, and they start to think about them in a different way. And that's the idea of product thinking. Instead of a project where I get a task and jump onto any one of those hundred data sets, I dig into my 20, and I want them to improve, and I want my customers to be better served.

So this is the idea of having a team of people work in a domain, which you can think of as a grouping of data sets (and we're going to talk about what that grouping pattern is), organized around that domain. The data domain is a product: you're building data sets and visualizations that go to specific customers. Data engineers, and in fact people doing data science, must live and work in and understand a finite number of data sets to really deliver value. And I think that really helps. I've just noticed this in my career: it takes a while to learn a data set, to learn its problems, and basically to understand it. This idea that you can take any random data set, profile it, and automatically understand how it should be put into a useful form for your business customers is proving not to be true. And I think one of the reasons the data mesh is popular is that it empowers people to serve their customers in a way that is focused on their needs, given the data sets that they have.

So then the question becomes: how do I divide up the domains? I've got this data lake, we've got a hundred different data sets. How do I put them together, and what are the edges of my domain? That tends to be the key objection here: we've got a hundred data sets, how do I group them up? And I think there are a couple of ways to think about it.
One is that the domains could be aligned with the sources or the types of data. They could be data sets organized around the master entities of a business: subject areas, customers. They could be organized around sources of data, the facts on the ground, for instance things like web logs or interaction history. Or the domains could be the exact opposite: they could be aligned with the consumption of data. Maybe four of those hundred data sets are all integrated into one set of facts and dimensions and aggregate views, or the domain is organized around the product the data is used in. So this is where it gets complicated: how to carve off the pieces of each domain.

If we start thinking about it in a more abstract way, and this sort of abstract thinking is very reminiscent of software, we can ask: what's in a domain? Well, it's the data that you're getting. It's the artifacts that are created from the data: models, views, reports, integrated data sets. It's the code and the tools that act upon the data to transform it. It's the team used to create, update, and run that domain. And then it's metadata: catalogs, lineage, test results, processing. So there's a bunch of stuff that actually goes into a domain. And one of the challenges is: I've got one domain; how do my other domains relate to it? That's interesting, because the output of one domain may be the input of another domain. Let me say that again: I'm building an integrated data set, and the results of that integrated data set may be fed in as a source to another domain, which also integrates other data. I'm going to show you an example to be concrete. They have to be composable and controllable. So these are not silos; think of them as components, Lego bricks that are being pieced together, and think intentionally about how you design your domains and how your domains work together.

If we go a little more into each domain, you want to think about its interfaces. The what: what raw data or other data comes out of that domain, and, as the comment on this slide says, hopefully immutable; I'm going to talk about what immutable data is in the next section, which has to do with functional engineering. The when: what are the test results, when was it processed, when was it last updated? The where: how do I get at it? The what-does-it-mean: what's the dictionary? And then the how: what steps were used to process it, what's the code configuration? Ideally these things should be URLs: a JDBC connection to get at the data, a URL to get at the wiki or the data catalog, a URL to get at all the steps that were used and the timing of the processing, and a URL to get at the code that was used to work on it. And all these domain interfaces actually help teams work and compose their domains together.

If we look at it a different way, what do you want out of a domain? From a customer standpoint, they want data, and things in the data, that are trusted; that are usable by me, the customer; that are discoverable and findable; that are understandable and well-described; that are structured and permissioned; that have APIs and URLs to get at them. And in some ways, in my experience, the customer wants kind of a single throat to choke: if something goes wrong with the data, or they've got a question, they want to know who to talk to.
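As an illustration of those interfaces-as-URLs, here's a minimal sketch of what a published domain interface could look like. Every field name and URL here is a hypothetical placeholder, not DataKitchen's actual format.

```python
# A minimal sketch of a domain interface published as a set of URLs,
# following the what/when/where/how idea above. All names and URLs
# below are hypothetical.
from dataclasses import dataclass, asdict
import json

@dataclass
class DomainInterface:
    name: str
    data_url: str     # the where: how consumers get at the data (e.g., JDBC)
    catalog_url: str  # the what-does-it-mean: dictionary / catalog entry
    runs_url: str     # the when: test results and processing timestamps
    code_url: str     # the how: steps and code used to process the data
    owner: str        # the single throat to choke

physician = DomainInterface(
    name="physician",
    data_url="jdbc:redshift://warehouse.example.internal:5439/physician",
    catalog_url="https://catalog.example.internal/domains/physician",
    runs_url="https://orchestrator.example.internal/runs/physician/latest",
    code_url="https://git.example.internal/domains/physician-pipeline",
    owner="physician-domain-team@example.com",
)
print(json.dumps(asdict(physician), indent=2))
```

The exact format matters far less than the fact that each question (what, when, where, how, who) has a stable address a consuming team can hit without scheduling a meeting.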
And that partnership between the people who are building the domain and the customers is really important. And lastly, it's a change in focus: instead of having 100 data sets and 50 people who are all interchangeable, you're breaking those up into different groups, with different people who own and work in each domain, and you're building a set of services that can be componentized. It's a bit of a decentralization, and each team has a degree of autonomy; that's an idea that comes out of Agile and Agile software. It really ends up being more of an ecosystem than a platform. So this way of breaking data sets down into domains and working on them is, I think, a very effective thing.

Now I'm going to give an example, and it's actually an example I know very well, because I've been working in it, kind of doing this, for the last 15 years. One of the things about ideas in software is that they often end up being good common sense, and I really appreciate the people at ThoughtWorks giving this one a name, because it's a great name. I've had to work with a lot of commercial pharma analytic data over the past 15 years, and there we've had to work, by default, in a domain-driven design, just because of the needs of the customer.

As background on pharma analytics: in the United States, there are patented pharma products. You come out of drug discovery, you get a patent, and you've got a bunch of years where you can make a lot of money, and hopefully you take that money and use it to create more products, depending on your politics. The area under the product's sales growth curve is the total amount of money, and how fast sales ramp during the launch phase actually determines the overall lifetime value. So products have a launch phase, a growth phase, and a mature phase, and usually in the mature phase you're trying to compete on price through discounts, within the U.S. at least, with insurers. And the domains here are called NPP, for non-personal promotion; physician; and payer. And honestly, they have a lot of different data sets that go in. Here's an example on the left: some of them come from data that pharma companies purchase, some come from Veeva, which is a CRM system, and some come from internal small data sets like rebate data and product hierarchies. There's a whole rich, multi-billion-dollar industry providing this data. Some are big, some are small, some come every day, some come every week. But in general, in the U.S., they're all really about the same entities: physicians, insurance companies or payers, products, et cetera.

And there tend to be different teams that use these. During launch there's the NPP domain, where you're interested in providing radio ads or TV ads; that's the non-personal promotion. Of course it continues after launch, but it's especially important around launch. Then, if you've ever been in a physician's waiting room in the U.S., you usually see one very good-looking man or woman sitting there waiting to see the doctor. Those are detail reps, they're salespeople, and that's the physician domain.
Primarily, one of the ways to inform physicians of the efficacy of your product is to have a sales rep go in. And then lastly there's the payer domain, which is really about the price component of a pharmaceutical product. These have different teams and different marketing functions, and largely the data sets are about the same things, but they come from different sources and have commonalities.

So what a number of companies, and what we, have done is imagine there are different domain layers. One domain layer is how you master things. For instance, there are a million physicians in the U.S.; what's the master set for that company? Maybe the ones they call on in the U.S. number only 40,000. Then there are the integrated data sets with all the different facts and dimensions and tables that people use. And then there's the self-service layer, where a lot of analysts use the data, mix in their own data sets, and develop different models. And there's actually a really interesting coupling between these layers. On the left-hand side there are these raw sources of data, which in a lot of cases are about the same things: physicians and payers and products. Then there's this mastered world: physician MDM, target lists, market baskets, small data sets. Then there are the integrated data sets, which are about the physician but not exclusively, about the non-personal promotion but not exclusively, about the payer but not exclusively. And then there are the tools being used by the brand team or field sales. And there are processing relationships between each of these. For instance, when the AMA physician list is updated, that may update your master, which may update both the physician and the payer domains, which may update your brand team reporting. So there's this coupling between updates on every system that has to be considered.

So if you like domain-driven design, if you like the idea of having a small team to be agile, you really have to think about how your team's work relates to the other teams' work. You have to think pretty intentionally about the relationship between each one of these domains and how they interact. And in a lot of cases there are different technical subsystems running here. There may be a data mastering tool; Veeva's got data mastering, Informatica's got data mastering. There may be ETL and data processing here. There may be Alteryx and Tableau there. So there may be different tools. And if you get this right and you're able to work quickly, the teams actually end up knowing the data very well, they're able to work with their customers, the analysts and the end customers, very quickly, and they end up being incredibly productive. Some of the results we've seen: a very small number of people, a team of about seven or eight, supporting over a billion dollars in sales, integrating hundreds of data sets, with tens of thousands of automated tests, 50,000 then, 100,000 now, and greater than a hundred true schema or data changes per week. And that rapid velocity with low error rates, in a data mesh that sits upon whatever data fabric you want, is really the benefit of combining these things. And of course we've got some software that makes that work. But one of the patterns here is DataOps, and another one is the data mesh.
And then the third pattern that was actually applied here, the one I'm going to talk about next, is not as popular, but it's one of my favorites: functional data engineering. Again, this is another complete rip-off from software. In software, there are, roughly, two major ways to structure your programs. One is called object-oriented, where the entities in your software are things that represent the real world; those things have state embedded in them, and they have actions that happen upon them. When I started programming, object-oriented programming was all the rage, and actually one of the inspirations for this talk was a book I had on object design patterns. Lately people have been saying: well, objects are great, but there are some problems with them, especially when you have a lot of complex distributed systems, and maybe another way is to work in a more functional way.

I don't know if you remember third or fourth grade, when your teacher started to talk about multiplication, and they said: there's this function machine. You put in three, it adds two, and you get five. You put in three, it multiplies by two, and you get six. And no matter what, when you put in three, you always get the same answer. In a lot of ways, that way of working is the same in software. You start with something called immutable data, which means data that never changes. You put in three, you're going to get out six, every time, no matter what. And when it runs, it doesn't actually change the input; it's not like you put in three and get six, but somehow the input has been altered along the way. That complexity of changing state is a challenge in software, and it's a challenge in data. That's what some people call pure functions: you put in some data and you get something out. And there's another word, idempotency, which means you can run the whole thing over again and get the same result. So there are these words, immutable, pure, idempotent, which are kind of fancy words, but really it's just the function machine you learned about in third grade. This function machine just runs, and it doesn't break anything, it doesn't touch the inputs anywhere, there are no side effects; when you run something through it, you understand the way it works.

And one of the benefits of a functional approach to analytics is that it's reproducible. Imagine you put in the same data and you always get the same result: the model gets updated, the data sets get transformed, the visualization gets refreshed, because you start with immutable data and you always process it the same way. That sort of reproducibility is a foundation of data science, and it can be critical from a sanity standpoint. The second benefit is that things are just complicated, and one of the ideas here is to take a functional approach to reduce complexity. And you know what? We're all in the cloud, and disk and storage are cheap. What that means is that throwing around a 10-terabyte or 50-terabyte database and rebuilding it from scratch is not an issue anymore. You can throw 30, 40, 50 processors at it for an hour and get the results, and you don't have to talk to the Oracle salesman anymore; you can have multiple databases. And this also fits the emerging pattern in the cloud of having an object store to hold your raw data and then the processed data.
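Here's a minimal sketch of that function-machine pattern applied to one pipeline step, assuming raw data lands in an immutable, date-partitioned location and each run rebuilds its output from scratch. The paths, file layout, and the transform itself are hypothetical.

```python
# A minimal sketch of a functional pipeline step: reads only the immutable
# raw partition for one day, rewrites its output wholesale. Paths and the
# aggregation below are hypothetical.
import csv
from pathlib import Path

def build(raw_dir: Path, out_dir: Path, day: str) -> Path:
    """Pure-ish step: reads only raw/<day>, writes only derived/<day>.
    Rerunning it on the same inputs yields the same output (idempotent)."""
    src = raw_dir / day / "calls.csv"
    dst = out_dir / day / "calls_by_physician.csv"
    totals = {}
    with src.open() as f:
        for row in csv.DictReader(f):          # raw data is never modified
            npi = row["physician_id"]
            totals[npi] = totals.get(npi, 0) + int(row["calls"])
    dst.parent.mkdir(parents=True, exist_ok=True)
    with dst.open("w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["physician_id", "total_calls"])
        w.writerows(sorted(totals.items()))    # deterministic output order
    return dst

# Hypothetical usage: same day plus same raw data means the same result,
# every time, so a failed run can simply be rerun.
# build(Path("raw"), Path("derived"), "2021-06-01")
```

Because the step reads only the immutable raw partition and rewrites its output wholesale, rerunning it after a failure, or in a fresh environment, is safe by construction.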
So one way to think of this is: you load your raw data into a bucket store like S3, and your whole data and analytic process is one big function. It takes that raw data, does the transformation, builds the facts and dimensions, builds the report, applies the model, updates the governance. What that means is that it becomes very simple to recreate what you do. And I think this really helps in a data mesh, because you can think of the processing that happens in your data mesh as one big function: your transformation, your model, your code. Rerunning a task from the same date on the same data set gets the same result. And when things do go wrong, because things will always go wrong, you can rerun a task; if you break it, you can repair it, and you can think about having a big red-green light. And with the cloud, it's possible to reprocess data a lot. There are some cases where this may not apply; if you've got 10 petabytes of data, you don't want to spend a week reprocessing. But a large portion of business data is terabytes in size, and I can have terabytes on my laptop now. Running things in a functional way reduces your complexity, so you're better able to apply your skills to adding new features, and things are just easier to test and deploy.

So if you look at the top here, it's the production team: I've got production data running in my domain. Then my development team takes that same data, maybe an update of the data, and runs it through. If everything passes, I can just flip it over, or I can run it in parallel for a while, kind of a blue-green deploy, and see if it works. The functional pattern makes it easier for people to create parallel versions, and it's a safer, more controlled, more complexity-reducing process.

So these three patterns, DataOps, data mesh, and functional data engineering, applied on whatever data fabric you happen to have running, are, I think, a way to achieve agility. And of course, there's more to agility, like team process and scaled Agile, that I don't have time to talk about. In this last section, I'd actually like to talk about how our software helps with this and give some examples.

The first is that we've talked about domains and their processing relationships. If you build those domains, how do you update the data, and how do you connect one step to another step? What causes the update of each domain? Is it a time? Is it orchestration? Is it an event? How do you manage running all of these data mesh systems? In some ways you need a master DAG, a directed acyclic graph, a master way to sit above all of it: a meta-orchestration. So let's talk about the ways the domains can communicate. One domain runs, and then, say, another one has got to start. One way is that the second domain could ask the first domain: hey, when's the last time you were updated? Are your test results right? Is the data good? Can you prove it? You could query one domain or the other and see the last time it was updated. Another way is: I've just finished my batch build, now you go. That's a process linkage. Or another way is a process linkage with parameters: I'm done, and you know what? Here's my data, it's in this new spot, you go ahead and work on it.
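To make that master-DAG idea concrete, here's a minimal sketch of a meta-orchestration over domains like the pharma example earlier, assuming a simple in-memory dependency graph. The domain names and edges are hypothetical, and a real meta-orchestrator would also run tests, retries, and monitoring at each step.

```python
# A minimal sketch of a meta-orchestration DAG across domains, using the
# standard-library topological sorter (Python 3.9+). Domain names and
# edges are hypothetical, loosely modeled on the pharma example above.
from graphlib import TopologicalSorter

# Each domain maps to the set of domains it depends on.
dependencies = {
    "physician_master": {"ama_physician_list"},
    "physician_domain": {"physician_master"},
    "payer_domain": {"physician_master"},
    "brand_team_reporting": {"physician_domain", "payer_domain"},
}

def run_domain(name: str) -> None:
    # Stand-in for triggering the domain's own orchestration and tests.
    print(f"updating {name} ...")

# Update every domain only after everything it depends on has updated,
# so an AMA list change ripples downstream in the right order.
for domain in TopologicalSorter(dependencies).static_order():
    run_domain(domain)
```

Any of the linkage styles just described, a query, a process trigger, or a trigger with parameters, can sit on the edges of a graph like this.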
And then another design pattern that some organizations are using in their architectures is event-based processing, where instead of running in batch, one thing finishes, creates an event, and puts it on a bus like Kafka or some pub/sub system, and then there are subscribers. That could be a way of doing it. Then there's the obvious case where data in one domain is being consumed by another domain. And finally, there's the development linkage: I'm making a change within my domain, and I want to be able to use what you've done, or perhaps even change what you've done, in order to make the new functionality work. Can I see the code you used? Can I modify the code? Is there a path to production? These inter-domain communication needs are actually incredibly important for how you compose your domain-driven systems. And one of my concerns with this new way of thinking, and we've got some customers who are working on data mesh, is that people don't think intentionally enough about how to make this work. Our software actually does a lot of these things. It doesn't do the data linkage, but it does the process, event, domain-query, and development linkages. So I think you're either going to need to build that yourself or, if you're doing a data mesh, use a software tool like DataKitchen to support it.

So let me go into another case where our software can help. On the bottom you've got a development environment, and on the top you've got production, and there's a whole bunch of domains, and you've got to change them. So when I change something in one spot, let's say I've changed my MDM and grouped three physicians together, or maybe it's not MDM, maybe it's just a small file, a product hierarchy, and I've changed the master product hierarchy. Well, that's going to ripple all the way down, across the data sets and the reports. How do you know it's right? And how can you tell beforehand that if you change that one small product grouping file, everything doesn't go to pot? You need to think about how you develop the change in those domains and then how you deploy it: how you combine that local change with global governance, and how you see the end-to-end effect of that change.

The way our software helps is that we have this thing called a recipe. It can be in a domain itself, or it can sit on top of all these other domains. Think of it as an intelligent, test-informed, system-wide production orchestration, either within a domain or across domains. You may have other tools like Azure Data Factory or Airflow; they're kind of in the same vein, but they're not quite enough. And then, on the next slide, think about those domain interfaces as URLs. One of the things our software provides is the when and the how. We don't do the data linkage, we're not a data catalog tool, but these additional things show up as URLs: our order-run information and the recipe information, the when, the how, and the what. And finally, there's this idea of thinking about the composition of domains, right? How you can make this work.
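As one sketch of the event-based linkage mentioned a moment ago, here's a tiny in-memory pub/sub illustration. In practice the bus would be Kafka or a similar system; the topic name and payload here are hypothetical.

```python
# A minimal in-memory sketch of event-based domain linkage. In a real
# architecture the bus would be Kafka or another pub/sub system; the
# topic and payload below are hypothetical.
from collections import defaultdict
from typing import Callable, Dict, List

subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

def subscribe(topic: str, handler: Callable[[dict], None]) -> None:
    subscribers[topic].append(handler)

def publish(topic: str, event: dict) -> None:
    for handler in subscribers[topic]:
        handler(event)

# Downstream domains react when the mastering domain announces it's done.
subscribe("physician_master.updated",
          lambda e: print(f"physician domain rebuilding from {e['location']}"))
subscribe("physician_master.updated",
          lambda e: print(f"payer domain rebuilding from {e['location']}"))

# The upstream domain publishes an event rather than calling anyone
# directly, passing its output location as a parameter.
publish("physician_master.updated",
        {"location": "s3://example-bucket/master/2021-06-01/"})
```

The value of the pattern is that the upstream domain doesn't need to know who its consumers are, which keeps the domains composable.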
And the features in our software, the ingredients, the kitchens, the recipes, if they don't make you hungry for lunch, are components in how you build a composable DataOps, domain-driven-design, and functional system. And the combination of these three things is a very, very powerful way for your teams to achieve agility. So: the new tool chains out there with a data fabric, and the patterns of data mesh and functional engineering, are really great new paradigms. And I think DataOps is incredibly important, and of course I'm biased, but I think it's the most important part, because you need to be able to wrap your data mesh and your functional system in these linkages and compositions, and central process and governance is important. I think our software has actually been built for this; since we've been working in this way for years, we've had to build for and follow these three paradigms.

So lastly, we're getting near the end, and I want to leave some time for questions. Not to be too salesy, but our software does this stuff and it can help you do it. Come to our website; we've got a great platform, and we can certainly help you, not only with how to achieve DataOps, but also with how to implement a data mesh and apply functional data engineering principles. And the second thing is, we've got a lot for you to learn from. We wrote a book called The DataOps Cookbook, and we've given away probably 15,000 copies. Just last week, we released a second book about how to transform your organization, called Recipes for DataOps Success. And if you're interested in the DataOps idea and you don't have time to read 300 or 400 pages of books, they have a lot of pictures and are actually not that bad, not too much marketing fluff, you can just go to the DataOps Manifesto. So I'm going to stop there and start to answer some questions; feel free to put them in the chat and we'll take it from there.

Chris, thank you so much for this great presentation. There are questions coming in already, and just to answer the most commonly asked one: a reminder that I will send a follow-up email by end of day Thursday to everybody, with links to the slides, links to the recording, and anything else requested. So, diving in here: how does the data mesh concept relate to e-commerce? For example, it has customers, products, orders, inventory, et cetera, and there are analytical needs across domains.

Yeah. Well, I think it's interesting, right? Because that's one of the challenges with domains: how you organize your work, and whether you organize the domain by the customer need. In e-commerce there are, of course, people who do merchandising, and they're trying to optimize merchandising and sales. There are people who are trying to optimize the websites for speed. There are people on the financial side looking at the costs. There are people who are looking at the suppliers. And in some ways the data is overlapping a bit, or the same. So one way, perhaps, to organize e-commerce analytics is to have domains focused around specific customers: merchandisers and marketing and sales as one, suppliers and finance as another, and maybe your web development group as a third.
Another way may depend on how the data is coming into your website, right? If you're getting data from your website, shopping cart abandonment for instance, maybe that is another way to group your domains. It takes a point of view on how to do this. But in most e-commerce sites you've got dozens of data sets flowing in: your website, your shopping cart, marketing lists that come in, specials, product lists. So how do you manage all those and build integrated data sets? Because honestly, one of the biggest challenges in dealing with any e-commerce or marketing analytics is the rapid pace of questions from your customers. The whole point of agility is being able to rapidly give them insight, in a way that doesn't kill your team and doesn't build up a lot of technical debt. The idea of data mesh says one big centralized system fails, and the idea of DataOps says: don't just work on the tasks, build a system that allows you to iterate and run with low errors. These patterns can actually help you accomplish your core job, which is giving insight to your business customers.

I love it, thank you. So does the data mesh approach also envision different technology by domain, different governance, or even different analytic stacks altogether?

That's a good question. There's a debate on this. If you look at the software industry, their application of domain-driven design sometimes uses this term microservice, and some organizations have gone so far as to allow the people who build a microservice to pick their own tech stack. What happens is you end up with one team doing everything in Python, another doing it in Go, a third doing it in Elixir: different languages and different stacks. And of course, as a manager, that may not be the best thing, because if they're different tech stacks, people have to learn not only a different domain but a whole new set of tools. So I don't think there's anything inherent in following these patterns of DataOps and data mesh that says you can't have different tools for each one, but there are these other, more organizational, concerns about having lots of tools. Then again, even today, without a data mesh, there are still a lot of tools running in organizations; most big companies have a pile of ways to transform data or visualize data, and a dozen different tool sets.

And there was a question that came into the chat earlier; it says: we are doing this, and one of the issues we are facing is the intersection of data domains. Does data set X, which involves domains one and two, go into domain one or two?

Yeah, that's right, and I think that's a very common case. You have data sets that are shared across both, and then how do you handle the relationship? One way, at least the pattern we ended up on, is to think of it like the mastering case, especially with small data files that need to be shared across data sets, like groupings of products or other small data sets that end up having big effects when you don't get them right. And that's why the idea of thinking of domains as composable and connected is important. So let's say you have a small data set that's fed into two different domains.
Well, perhaps that small data set is a domain itself, and it needs to be updated independently of the other ones. Then the two downstream domains need to be able to interrogate it, to see when it was last updated and whether it's been tested. Or, if you're doing the processing, you need to be able to say: okay, I've updated this product hierarchy in the mastering domain, and now that's got to go into two different domains at the same time. And that's where, when you really start thinking about domains and their composability, their components, the order of operations, there is some complexity in the overall meta-orchestration and coupling of the systems that I think needs to be addressed.

Love it. So if you have questions for Chris, feel free to submit them in the Q&A section; we've got time for a couple more. Speaking of which, there's a request: would you mind sharing the link to Gartner's data fabric report?

Sure, sure. I can't share the report itself, but I can share the link to it.

Awesome. Yeah, if you want to send it to me, I can put it in the follow-up email as well.

And so, yeah, I guess to summarize: data and analytics is getting complicated. Software has been dealing with complexity for decades, and there are ideas about how to build software: DevOps is one, Agile is one, domain-driven design, functional programming. Those are great ideas that help reduce the complexity of building software systems, and you can pick them up and move them onto our complicated data and analytics systems, and they can help; they can actually help drive agility. I've had experience working with these, and there might be more, but DataOps, data mesh, and functional data engineering are the principles I particularly believe in as ways to help. And of course, with any way of working, you've got to consider the team, its size and ability to work, and its ability to handle the impact of new technologies. And if you're one analyst who's got one or two data sets, maybe doing a data mesh isn't the right thing, at least at this point.

So Chris, how can a data mesh environment work in a smaller organization, where a small number of groups or people may own many domains?

Yeah, I think that's it, and I get the same question around DataOps too: is DataOps only for big teams? I think with any complicated system, it helps to have a little design. So maybe if you're two people and you've got 10 data sets, maybe one domain is naturally three of them and the other is naturally seven, and you start working in a domain-driven way right away. That way you're set up for expansion, you're set up for growth, and more specifically, you set yourself up to really understand the data sets and really understand the users of those data sets. And I think that helps. One of the criticisms is that this ends up just being a recreation of the data silos, and I don't think so, because one of the aspects of this, if you follow tested DataOps principles, is that the team has the knowledge of the data, but it's much easier to bring new people in to work with that data. So even though data engineers aren't perfectly fungible, even with two people you can start doing automated testing, start doing automated regression testing, and start thinking about the design of your data mesh.
You can start thinking and working in a functional way if you're on terabytes or tens of terabytes of data. I've worked in these ways even when it was just me and one other guy. It's just a better way to work, and once you work with these patterns, you kind of don't want to go back.

I love it. So how do you see data governance evolving in a data mesh? Do you see governance as a central function, or managed within the domain?

Well, I think that in a domain there's data, and to make data useful we should have some idea of the catalog of data that people can get at. There's a question of whether you have a domain data catalog or a central data catalog that all the domains fit into, and there are, I guess, arguments either way. But conceptually, if I'm going to add a column to a table in a domain, that should register in my data catalog as being there, with the metadata associated with it, and I should deploy that in a DataOps, automated, tested way, all at once. So I think that data about data is still amenable to the data mesh and DataOps concepts, and I think data governance applies. Then it's just a question of how you manage that system, whether you federate it or there's one central system that it's pushed into. That's almost a practical question, but really the update and maintenance of the catalog and the lineage is done within the mesh.

Makes sense. And so, Chris, are these concepts mainly applicable to analytical systems?

Yeah, that's the whole point. This isn't about software at all; this is all about analytical systems, systems that use transactional data and do things like ETL and data visualization and governance. The concepts, data mesh, DataOps, functional data engineering, come from software, but you can apply them, with a little change and a few new ideas, to the processes that people use to get insight from data.

All right, and I'm going to sneak in one last question here; I think we've got time, just a little more than two minutes. Thanks for the great talk. How would you advise a person to introduce DataOps, data mesh, and functional data engineering in an ongoing data project?

Yeah, I tend to think that the best way to do things is to find a problem, fix a small problem first, and then iterate and improve upon it. The problem that you solve with DataOps could be cycle time or error rates. The problem that you solve with data mesh may come up more when you're building a whole new platform and functionality and you want to work in a more independent way, and with functional data engineering it may be the same. So I think it really starts with the problem: what problem are you trying to solve? And the second thing is: do you have an opportunity for a little bit of a green field? It also helps, it's easier, though not impossible, to do functional data engineering in the cloud, where you've got these ephemeral resources that you can spin up and spin down. So that's a complicated answer, but: start small, get value, and iterate and improve. And whatever you do, do not spend a year building a data mesh and only let your customers see it after that year.
The whole idea of this is to improve agility, and agility means you're giving early results to your customers and iterating and improving. So build to iterate. Whether what you build has a component of DataOps, data mesh, or functional data engineering is not the important part; build to iterate.

I love that. That's great advice, Chris. And that does bring us to the top of the hour, and I'm afraid that is all the time we have for this webinar. Again, Chris, thank you so much for this great presentation, and thanks to all of our attendees for being so engaged in everything we do. Just a reminder: I will send a follow-up email by end of day Thursday with links to the slides and the recording, and we'll get that link out to the Gartner report, which of course you need a subscription to access. Thank you, everybody. Hope you all have a great day. Thank you, Chris, and thanks, DataKitchen, for sponsoring today.

Thank you very much. Have a great day, everyone. Thanks for your time.