And here we go. Hello and welcome. My name is Shannon Kemp, and I am the Chief Digital Manager of DataVersity. We'd like to thank you for joining today's DataVersity webinar, A Modern Approach to DI and MDM, sponsored by Information Builders and brought to you in partnership with Speculus Media. It's a deep-dive continuation of a conversation from The World Transformed podcast, which you can listen to at worldtransform.com. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the bottom middle of your screen for that feature. For questions, we will be collecting them in the Q and A section in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions on Twitter using hashtag DataVersity. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now let me turn the webinar over to Phil Bowermaster, the host of The World Transformed podcast, to get today's webinar started. Phil, hello and welcome.
Hello, Shannon. Hello, everybody. Thank you all very much for being with us. It's great to be here for this webinar today. My name is Phil Bowermaster. And as Shannon mentioned, I am the co-host of the future-facing podcast The World Transformed, as well as our special series on The World Transformed called Fast Forward. In that series, we have conversations with thought leaders who are shaping our future through new ideas and new technologies. And speaking of new ideas and new technologies: have you ever noticed how our thinking and our priorities and even our activities within data management tend to cluster around a few key concepts? It's understandable, actually, that they would cluster around these, because all of them are critical, both to the overall success of our data operation and to the ultimate objective that we have of turning our data assets into business assets. And I think that would include probably everything that is shown here as well as some others. I imagine most of you could probably think of two or three others. And then on top of these key concepts come new ideas that we apply to our data management operation, and then we get new terminology that accompanies these new ideas. Now unlike the core concepts that we were just looking at, these terms are gonna change over time, right? Some of them are gonna become part of our permanent data management landscape, and some are gonna fall into and out of favor. First they're hot and then they're cold. And you know which ones are hot because people just keep using them; you hear them over and over. In fact, it can be fun, when you're attending an industry event or even hanging around your workspace, just to observe how often some of these terms get repeated, or even to try to string five of them in a row and play a little game of buzzword bingo. It's always a lot of fun. But when we talk about buzzwords in data management, especially over the past decade and a half or so, there's one that really stands out. Call it the one buzzword to rule them all, right? And that buzzword would have to be big data. Now some of you, I'm sure, are gonna say, wait a second, big data is more than a buzzword.
Hold on, you can't just call that a buzzword. After all, didn't the introduction of Hadoop, the ecosystem that evolved around it, and then all those subsequent technologies like Spark and Kafka kick off a whole new paradigm in data management? Well, I think that's a fair question. Surely there has to be more to the whole big data era than just hype. I think that's probably true, but there definitely was hype. And I think it probably tells us something that it's now been more than four years since Gartner removed the term big data from their annual hype cycle. So the hype may be over. The buzz has definitely died down a bit, and that raises an important new question: if the hype is over, what's left? And that is going to be the topic for today's webinar. We're gonna be talking about a modern approach to data integration and master data management. We'll look at how big data has failed to live up to the hype even while delivering value to the organizations that use it. And we're going to outline a set of strategies for how your business can stop spinning its wheels and start leveraging the modern data management technologies and practices that we need to address three fundamental challenges. Those are mapping data from different sources to a single environment, addressing the longstanding disconnect between business people and technicians, and keeping data in context to ensure that it is accurate and relevant. So those are not three new problems, but they're three problems that we continue to struggle with. And maybe this modern approach can give us some new insights as to how to solve those problems. Stepping us through these topics today will be Jake Freivald. Jake is the Vice President of Product Marketing for Information Builders. Jake, it is great having you with us talking about this modern approach. Take it away, Jake.
Thank you very much, Phil. Thank you, everyone. I'm proud to be here. This is actually my first DataVersity webcast, and I've heard a lot about this great community, so I'm very happy to be presenting in front of you, and I look forward to the conversation with you. I'm gonna talk a little bit about Information Builders' approach to these topics and my personal ideas about them. Just very briefly, if you haven't heard of Information Builders, we're a data and analytics company. I'm gonna be focusing my attention today on MDM and data quality and things of that sort, data integration writ large if you will, but Information Builders as a whole also deals a lot with analytics, and a lot of my use cases are going to be around that; the way my thinking has been informed, I will fully admit, has been shaped by a lot of those analytical-style use cases. So with that, let's talk about a modern approach to data integration and MDM. And I'd like to really focus on the three major problems that Phil just talked about. The first one is data modeling. I'm gonna talk about how we spend too much time coping with slight changes in our business data, and really talk about the way that ripples through our entire data ecosystem when it happens. When Phil mentioned business and IT alignment, we're really talking about the fact that we have trouble communicating between our business peers and our technical peers. Those two classes of people have a difficult time talking, and I'll discuss a little bit about why and maybe some things we can do about it.
And then finally, with respect to process, we see that very often we've lost way too much detail by handing off responsibility for that business data to different people. So that's really what we're gonna be talking about today, and let's get started with issue number one, which has to do with data modeling. Let me state from the outset, I know I'm talking to a lot of data professionals out here, I'm not bashing data modeling in any way. Data has to be modeled many times in order for it to be productive. But one of the things that we saw in the big data era is that people didn't wanna model the data before they captured it. They wanted to capture the data and use things like schema on read, which would reduce the amount of governance necessary and increase the amount of flexibility involved. So from a big data perspective, data modeling was not something we necessarily wanted to focus too much time on. But when we are building operational applications that aren't based on what we're doing with the data after the fact, we find a lot of little changes coming in with any slight changes to our application logic. For example, I like to use composers. If you're building an application that has names in it, you might know that Johann Sebastian Bach is easy to handle given the rules that you've got around given, middle, and last or family names; you can do abbreviations and alphabetizing appropriately. If you're smart, instead of using first, middle, and last, you use things like given, middle, and family, so you can handle names like Chen Yi, a Chinese composer, where obviously the family name comes first. So you're able to accommodate that pretty easily. Good job. We did a good first take on our data modeling. However, we quickly find things like honorifics. Ludwig van Beethoven is not a guy with a middle name of van. Van is part of the last name, but it's not used in alphabetizing; the B is what you alphabetize with. So you've got a different set of rules that come in when you make this accommodation for the van in Ludwig van Beethoven, both for abbreviation and for alphabetization. And we see the same thing with patronymics. With Russians, you know, Dmitri Shostakovich's father's name was Dmitri, so his patronymic is Dmitriyevich, and he actually gets called Dmitriyevich quite a lot. You can't handle that like an honorific or like a middle name. And I don't even know a lot about classical music in the Arab world, but I'm telling you right now, those articles, al- and el-, are used differently from either honorifics or patronymics. So by coping over and over again, even just in this one operational application, with these slight changes in names that we've had to accommodate over time as we built out, let's say, our worldwide presence or our ability to handle certain circumstances, I've made a lot of little changes over time that finally get me to the point where I can do things like alphabetize and create initials correctly. These repeated changes to an operational system's row-and-column structure, a system that's designed for transactions, then have to be reflected downstream into things like your data warehouse. Data warehouses aren't designed for transactions; they're designed for abstractions. So all these things get exploded out into multiple tables and so on, and they get handled differently.
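To make that modeling headache concrete, here is a minimal sketch, in Python, of what those accumulated name rules might look like once given, middle, and family names, particles like van, and patronymics all have to coexist. The field names and rules are purely illustrative, not anything from a real application, and real-world name handling is considerably messier than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonName:
    """Illustrative name record; the fields and rules are hypothetical."""
    given: str
    family: str
    middle: Optional[str] = None       # e.g. "Sebastian"
    particle: Optional[str] = None     # e.g. "van", ignored when alphabetizing
    patronymic: Optional[str] = None   # e.g. "Dmitriyevich"
    family_name_first: bool = False    # e.g. Chen Yi

    def sort_key(self) -> str:
        # Alphabetize on the family name; the particle is not part of the key.
        return self.family.lower()

    def initials(self) -> str:
        parts = [self.given, self.middle or self.patronymic, self.family]
        return "".join(p[0].upper() + "." for p in parts if p)

    def display(self) -> str:
        if self.family_name_first:
            return f"{self.family} {self.given}"
        pieces = [self.given, self.middle or self.patronymic, self.particle, self.family]
        return " ".join(p for p in pieces if p)


composers = [
    PersonName("Johann", "Bach", middle="Sebastian"),
    PersonName("Ludwig", "Beethoven", particle="van"),
    PersonName("Dmitri", "Shostakovich", patronymic="Dmitriyevich"),
    PersonName("Yi", "Chen", family_name_first=True),
]

for c in sorted(composers, key=PersonName.sort_key):
    print(f"{c.display():35} -> {c.initials()}")
```

Every new case, honorifics, patronymics, Arabic articles, adds another optional field and another tweak to the sort and initials logic, which is exactly the kind of change that then ripples downstream.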
And then when I wanna do analytics, well, that actually gets put into a data mart that's designed for analysis, and that's managed in a completely different way using different structures like star schemas and so on. So one of the big problems that we've had is that with this one operational application and the changes that have been made for it, I've got many different changes downstream of that in the data, where I'm modeling for different reasons in different areas, and that's caused me to have a whole lot of headaches involved in managing the data that flows from one use case to the next. And that's just for one operational application. When we do this with 20 or 30 or 100 different applications all coming together, it becomes quite a challenge. That's problem number one. Problem number two: all the stuff that I just talked about, IT gets it. They hear the word customer and they immediately explode it into 100 tables so that they can talk about customer effectively. They want the model in order to be able to manage the data appropriately. They wanna be able to govern that data, and they're very, very suspicious of quick fixes, which means when somebody from the business side comes to them and says, yeah, this modeling stuff you're talking about is slowing me down, they're immediately nervous about whatever the business is going to offer as a supposed fix to that. And they're nervous that the business is gonna go out and buy some new application or some new tool, they're gonna do data prep, they're gonna do shadow IT that's going to be completely outside of any governance scope; in the business side's belief it's going to speed things up, but in the IT side's belief it's going to make a mess out of any kind of governance that we try to put in place. And of course, you know that over time the business is going to say, well, we've created this thing that's now mission critical, so IT, you take it back and start to manage it appropriately, but it was never architected for that. So we frequently have a hard time talking about what's important, because we have different values, and the values that we have may be expressed in different language. It's a significant problem for us just in being able to communicate between business and IT to get the issues on the table that we need to talk about. And then finally, with many of the processes that we do end up putting in place, we lose a lot of information because we're diffusing the responsibility for that business data across many different processes. So for example, with operational applications that are being brought into a data warehouse, you're frequently going to extract that information, you're going to transform it, you'll cleanse it, you'll standardize it, all in some tool that's being used as part of an overall process. On the mainframe, you might see some of that happen there; yes, people still use a lot of mainframe technologies these days. You'll see the data being dumped using mainframe tools, then being cleaned up somewhat and preprocessed before it goes on to the next step in the data warehouse load process. If you're doing it from a cloud, frequently people will start to apply this transformation somewhere in the cloud and then push it down, and then you start to pick it up in a more centralized ETL process.
And the problem that we have here is that these cleansing steps that we're going to be taking are frequently being done in different places, which means you're firing different rules for cleansing in different places, which means that that cleansing isn't actually going to happen in the way that it needs to. The metadata that's being used differs based on the platform that the process is occurring on. And perhaps most important is the loss of context and timestamps. You can't take information and, say, backtrack to what it was before and rerun history as if a different scenario had taken place. Now this is a true story. My mother and my wife have the same name. And that means I have to be very careful about who I text what to and what email goes to whom and so on and so forth. But one real challenge that happened recently is that my wife's email address ended up on my mom's credit card account. Now I think the less that these two women know about the financial situation of the other, the better. But it took a lot of hassle to try to get my wife's email address off of my mom's account. And it actually happened more than once before they were able to figure out a permanent solution. I think we called back three different times where it was fixed and then it wasn't fixed again; it would show up the next month. That, to me, looks like a bad merge somewhere. And all they should have to do is an unmerge on the master data that shows that my wife and my mom are not the same people. The problem is that by the time that happens, there's actually a lot of downstream processing that has already taken place. There are lifetime values of customers. There are things having to do with credit reporting. There are things that have to do with what kinds of offers each of us is going to get. And from the context of the credit card company, not only did it cost them some embarrassment and hassle and time and money spent with people on the phone and so on and customer satisfaction, but it also caused them to have bad data downstream, and they can't just roll it back. In a perfect world, you'd roll it back, put in a rule saying Sue and Susan are two different people, never the twain shall merge again, and then rerun history so that all of that history would look like it was supposed to have looked in the first place. And in most cases, because the processes are fragmented, the rollback is completely impossible. One other example of a place where that might be useful is being able to say, I've got five regions in North America for my sales organization, and I wanna see what it would look like with respect to people's quotas if I now had three sales regions in North America. What would things have looked like if the business that was done last year fell into the buckets that I want them to be in next year? Now you have a baseline to roll forward from, saying here's what sales quotas should look like next year for those people. So having this fragmentation means none of that's possible; the processes that load that warehouse become too difficult. So Phil talked about how big data was a big buzzword and didn't live up to its promise, but it wasn't nothing either; we learned a lot from big data. What kinds of things did we learn from big data that would help us prevent the endless mapping of one set of rows and columns to another?
And one answer is that you'd use post-relational data. That was a big part of what we did with big data: instead of rows and columns, we took document models and objects and instantiated those in Hadoop, let's say, and made it possible for us to look into those things in a more subject-oriented way. So instead of talking about customers in terms of a hundred different rows and columns, we'd talk about customers in terms of subjects and what the object looks like, or what the document looks like, that contains the information about that subject. So a modern solution for doing the kind of work we're talking about now would include capture, transformation, storage, and perhaps exchange of documents that are subject-oriented, in a post-relational store instead of relational models. That's one key element that we learned from the big data era. So now instead of moving sets of rows and columns, you're moving the entire subject from the operational application to the data warehouse to the analytics. Now that's only one piece. We also lose a lot of information from lost context, as I said. So how do we avoid the diffusion of responsibility that comes from the fragmentation of those ETL processes? And one thing that we learned from big data is that it's often more efficient to pick up all of the data that's available from a given operational application and just dump it directly into your Hadoop implementation, let's say. So in that case you're not taking a change data capture of the system, you're not doing an extract-transform-load process; you're actually doing a complete pick up and move just to get the data in place. And then you can start to do things like look at what needs to be changed, what has changed since yesterday, what rules should be fired, and so on. But a really important part of that is that you've captured all the data as it existed at the time that you took it. There's no question about what data was in that application at the time, because you took all of that data, you saved all of it, and you time-stamped exactly what it looked like at that moment in time. So you know exactly where those changes took place. You can do things like apply business rules to it, for example data quality rules and data mastering rules, various forms of data integration rules, and you can apply those rules to the data in the store where it landed. So you see both what the data looked like as it existed at that moment in time and every single step in the process that took it from how it was to how you wanted it to look. That is what gives you things like the complete auditability and the rollback capability and so on that I talked about a few minutes ago. You can't do that using typical ETL processes in the data warehouse. What we learned is that the capture-and-integrate process happens better in some kind of a data capture and transformation hub. That's quite a mouthful, I know. And as a marketing guy, I understand that it's a terrible phrase to use, but it really describes what it is without any additional fluff. We're capturing the data as it lived at the time that it was generated, or the time that we needed to capture it, and we're transforming it in that hub in a consistent way, so I can see every single step that happened along the way. That is the kind of thing that you can only do in a data lake. It's not the kind of thing you do in an operational application.
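As a rough illustration of that idea, and not of any particular product, here is a toy "capture and transform" hub in Python: every source snapshot is stored as-is with a timestamp, and every rule applied afterwards is recorded as a separate step, which is what makes the audit trail and the rollback-style reasoning possible. All the names and the sample rule are hypothetical.

```python
import copy
import datetime


class CaptureHub:
    """Toy data capture and transformation hub (illustrative only):
    keeps raw snapshots plus an audit trail of every rule applied."""

    def __init__(self):
        self.snapshots = []   # (timestamp, source, raw documents)
        self.steps = []       # (timestamp, rule name, before, after)

    def capture(self, source, documents):
        # Land everything exactly as it looked at this moment in time.
        ts = datetime.datetime.now().isoformat()
        self.snapshots.append((ts, source, copy.deepcopy(documents)))
        return ts

    def apply_rule(self, rule_name, rule, documents):
        # Apply a cleansing/mastering rule inside the hub and keep the lineage.
        before = copy.deepcopy(documents)
        after = [rule(copy.deepcopy(d)) for d in documents]
        self.steps.append((datetime.datetime.now().isoformat(), rule_name, before, after))
        return after


def standardize_zip(doc):
    # Example data quality rule: normalize US zip codes to five digits.
    doc["zip"] = str(doc.get("zip", "")).strip()[:5].zfill(5)
    return doc


hub = CaptureHub()
crm_rows = [{"name": "Jake", "zip": "7030"}, {"name": "Sue", "zip": " 10001-1234 "}]
hub.capture("crm", crm_rows)
cleaned = hub.apply_rule("standardize_zip", standardize_zip, crm_rows)

print(cleaned)
for ts, rule, before, after in hub.steps:
    changed = sum(b != a for b, a in zip(before, after))
    print(ts, rule, "changed", changed, "records")
```

Because the original snapshot and each rule application are kept side by side, you can always see what the data looked like when it landed and, in principle, replay the steps differently if a rule turns out to have been wrong.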
But it's the kind of thing that you do in a data lake that applies governance to that data, in a way that previous instantiations of data lakes maybe didn't do so much, where you would just dump the data into the lake and not apply significant amounts of governance to it at all. Very flexible, but no governance. This approach gives you the flexibility without necessarily removing the governance that's around the data. Now, this can be complicated, right? It's a very interesting process to take place, because you are taking the data as it lives, you're taking it as a subject as opposed to as rows and columns, and you're dumping maybe everything that you have to know about a customer into your Hadoop implementation or into your data store in a way that is designed specifically to be flexible. And it really is a document kind of a model. And that means that when you add something in an operational application, you can just capture that. It gets applied to the data capture and transformation hub and it just comes along for the ride as that change gets made. And then you start to apply the cleansing and transformation rules to it after it has already been captured. Which means that those applications and those processes that didn't need it continue to not get it. If you don't need a patronymic because you're not tracking Russian composers, then you don't need to have that patronymic exposed in your analytics. But in those cases where you do need it, you can start to adapt those processes pretty quickly and easily to be able to access that information and use it appropriately in analytics or in other use cases moving forward, without disrupting the previous set of analytics that didn't require it. So it can be complicated; it can be an interesting way of managing data. It's very much like what we used to call, what we still call in some cases, data vaults, with the exception that data vaults were designed to create systems where the data was captured as it existed, and there was really no cleansing in mind or no integration in mind. This is like a data vault, but with all of the cleansing and integration aspects that you would want out of a data warehouse based process as well. So it's a different kind of mentality, to some extent, than you would have with either a data warehouse or a data vault or, for that matter, big data on its own. So this is something that we looked at when we built out what we called our OmniGen product. We actually started this approach because we had built out an MDM platform, an MDM capability, and the more projects we did, the more we realized that we had difficulty talking to business users about the data that we were going to get, the data that we were retrieving and integrating and so on. We had more difficulty talking to them than we wanted to because they weren't really engaged in the process. It was all seen as something that IT did. So it became very important to say, you know what, business, your job is to define what the subjects are. Your job is to define what a customer is or what a citizen is or what a product is, and not to define it in this really complicated way that means it's got to be fragmented into a bunch of different rows and columns, but instead in a way that just deals with the subject as it stands. And of course we know that that's not gonna be perfect on the first iteration, because the business user defines what they want.
They say, I need the following things that define customer, and then they come back to you after you've created something and they say, yes, it's what I asked for, but it's not what I wanted. So they start to add more pieces into it, or they say, well, this is really more complicated than I originally told you, and being able to follow that iterative process became really important, perhaps the most important thing we did with OmniGen. So we built out that capability in order to make our own MDM implementations easier. And that also involved additional elements of data management that were also not necessarily front and center in these processes to begin with. So for example, if you talk to some MDM vendors, you'll realize they don't really have data quality, but it makes no sense to integrate or master data that isn't clean. My last name is Freivald, spelled F-R-E-I-V-A-L-D. Everybody gets that wrong; it's always misspelled F-R-I-E-V-A-L-D, go figure. And if you're not going to fix that somewhere along the line, then there's no sense in mastering, because you're gonna end up with two different people, right, if you use that as any part of your key. So bringing data quality together helped business users define the rules that they wanted in order to get the subjects that they wanted. And then of course we realized that master data management isn't the only thing. We need to be able to capture, not just mastered subjects, but what we consider transactional subjects in MDM stores. So in health insurance this can be things like the claims that people actually file or the clinical tests that they take, or in P&C insurance it might be the claims that they file and the contracts and policies that they use. All of these different transactional subjects are just as important as the master data that's being mastered in order to govern those transactional subjects. So really capturing more and more of that in the store became useful for our customers. And that is really the genesis of why we did OmniGen. We struggled too much with MDM at the start and said, you know what? This is a problem that somebody should make software for, and we were the people to do it. As we continued to develop OmniGen over the past number of years, we decided that one of the things we needed to do was capture the information in an automatically generated data hub. When I talked about capturing things based on subjects and I talked about bringing the entire topic over, maybe as a document-oriented view of a customer, I didn't talk about how that stuff gets stored. You can actually store it in XML or JSON or something else; we don't force you to use any particular kind of document format, but conceptually everything we do is document oriented. And conceptually everything we do is similar to that data vault that I talked about before. But those are somewhat more difficult to manage. They aren't something that you can store as easily or manage as easily. So we decided we had to do something to take the process of adapting to changing document formats, for example, and automatically generate that hub to handle it.
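Here is a tiny, purely hypothetical illustration of that "comes along for the ride" point: when an operational application starts sending a new attribute, a document-oriented view simply carries it, consumers that never asked for it keep working, and the hub's storage columns can be derived from whatever attributes actually show up. None of this is OmniGen code; it is just a sketch of the mindset.

```python
# Yesterday's customer document from the operational application.
old_doc = {"given": "Dmitri", "family": "Shostakovich"}

# Today the application adds a patronymic; the document just carries it along.
new_doc = {"given": "Dmitri", "family": "Shostakovich", "patronymic": "Dmitriyevich"}


def mailing_label(doc):
    # An existing consumer that never asked for the patronymic: unaffected.
    return f"{doc['given']} {doc['family']}"


def formal_name(doc):
    # A newer consumer that uses the patronymic when it is present.
    return " ".join(filter(None, [doc.get("given"), doc.get("patronymic"), doc.get("family")]))


def generated_hub_columns(docs):
    # The hub's storage structure is derived from whatever attributes show up.
    columns = set()
    for d in docs:
        columns.update(d.keys())
    return sorted(columns)


print(mailing_label(old_doc), "|", mailing_label(new_doc))
print(formal_name(new_doc))
print("generated hub columns:", generated_hub_columns([old_doc, new_doc]))
```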
So now you point your operational applications at on-ramps, and the on-ramp takes in the information in whatever form it takes, it can be JSON, it can be XML, it can be relational data, it can be whatever it is, and we will automatically generate the data hub that will store all of the information that you need, based on what your business people have asked for. The master data needs to be business-user oriented and subject oriented, like I talked about. The data quality rules are built into the process and are applicable across any data coming from anywhere. So for example, if you just give me a data dump from a mainframe and you give me a data dump from Salesforce, then we're able to apply the same kinds of data quality rules to all of the zip codes, for example, that are in them, or more complex data quality rules too, like handling different names and the way that transliterations occur across languages, for example Chinese or Arabic, where the transliterations might be different. We can apply those same kinds of rules across data from different platforms and integrate them all, and you can see all of that happen in one place. Of course, it includes the master and transactional subjects, as I mentioned before, and the most important aspect of all of this is that because as much of this as possible just comes right out of the box, it becomes something that you tap into rather than something that you have to build, and the cycle times become really rapid. So the business user says, yeah, that's the stuff that I want. You come to them with something and they say, it's what I asked for, but it's not what I wanted. That cycle can take a matter of a couple of days or a week instead of a couple of months or half a year. And as a result, we've actually taken implementation times that used to run out to about 18 months and shortened them dramatically. The particular one I'm thinking of was six different business domains for a health insurance organization, and we needed to onboard information from multiple hospitals. We took an 18 month project plan to do multi-dimensional, multi-subject MDM and brought it into a six month window, and that included onboarding the clinical data from the hospitals. So by doing all of this rapidly, by keeping the business engaged, which they only do if they see rapid iterations on what's going on, by implementing a lot of best practices right out of the box, including things like a data governance console that allows you to manage the data, see what's going on in the data, look at master records and see if they're correct, and see whether there are merges or unmerges that need to be done, and so on and so forth, and by putting those in the hands of business people where appropriate, all of that makes the capacity to support large-scale projects, with lots and lots of different kinds of data and different operational applications being brought together, much faster and easier than it would be otherwise. Now, there are specific things that we looked at with respect to things like customer and supplier, where we did everything that we did with OmniGen, but we also built in prebuilt models. I've been talking about customer for quite a bit here; it's one that's in almost every engagement we do. In fact, in many cases, you have a customer even if you don't think of it that way.
So for example, when we've worked with school districts, a customer is frequently the student. How is the student doing? We're looking at some kind of longitudinal study of whether or not people are graduating and what kind of grades they're getting and what the demographics are and how that shakes out. That's your customer in that case. And you can do all of that data gathering and analysis using a prebuilt customer model. Then there's being able to link across the different models. So for example, with customers and suppliers, being able to figure out what SKUs come in from your suppliers and how those apply to the products that are going out to customers, and therefore being able to tie that back to survey data that shows how satisfied they are. All of that becomes critically important as far as linking those models together. And then again, we're trying to make as much of it prebuilt as possible. The data quality and governance, the match and merge, all of that kind of thing is critical as well. Finally, nothing is ever static. If there's one thing that doesn't change, it's change, they say, right? And so all of these things require us to make sure that we've got a core view that is going to be the same, but then the extensions that you've got can go well beyond anything that we've ever dreamed up. And that means in things like, for example, our vertical applications, such as Omni Health Data, which deals with all of this in a healthcare context, or Omni Insurance, which deals with a P&C context, or with these customer and supplier models. All of those things allow you to extend in particular ways that can then be upgraded, and the extensions that you've made come along for the ride with the upgrade. So you're not rebuilding things as you extend the technology. Just to give some concrete examples of how that's important: when you're dealing with P&C insurance, for example, some insurers need to address farmholds. A farmhold is when you get more than one family, say a brother, and a sister who's married to someone else and therefore took that person's name, and then an uncle. And all of those people might have different names, but they want to farm collectively, they use the same equipment, they use some of the same facilities, and they want to get insurance on the entire farmhold instead of each one having their own policy. Well, that's a special extension that's necessary for people who do that kind of P&C insurance, and you wouldn't see it in your typical automobile policies; there's no such thing as an automobile hold or anything like that. So being able to support all of those different classes of consumers through an extension of that 360 degree core view makes sure that you've got the ability to support any kind of model that you need, without breaking the original model with customizations that previously you felt couldn't be done. Again, this is an example of the kind of thing that comes out of the big data mindset. We wanted to be able to just add stuff in, and all the original things that we had created would come along for the ride, and the extensions to what we did wouldn't break the bank as we did that. What we've seen from this is, again, a lot more value in a lot less time. We've seen build-it-yourself development environments based on data management tools taking 12 to 18 months.
We can reduce that using OmniGen, with all of the content that's out of the box: the MDM, the data quality, the integration rules themselves, the remediation portal, and so on. We can shrink a lot of that down to four to six months. By the way, that brings along for the ride a lot of things that would normally be the first things to drop off. We've all seen situations, for example, where data quality was one of the first things that got lost in a data integration project, or where the ability to provide feedback was dropped, or the metadata was dropped, and so on and so forth. All of that comes out of the box, so none of it gets dropped off the face of the earth. And then finally, when we do the Omni-for-persona thing, whether Omni Health Data or Omni Insurance or Omni for Customer or Omni for Supplier, all of those provide that same stuff plus more, which reduces the project timeline even more than I had previously talked about. So that is a bit about a modern approach to MDM and data quality and data integration. It is our mindset, so naturally it's gone into the products that we create, and that was a little bit about the products that we have created with this technology and some of the benefits that we see from it. And with that, I'd like to ask Phil to comment and open the floor for discussion. Phil?
Well, great stuff, Jake. Thank you very much. And we will take your questions now; just submit those in the field shown there and we'll be happy to try to answer them. I've got one to start with. Jake, as you look back, reminiscing over the passing of the big data era and leading us into this new era, you talked a little bit about the lessons that we learned from the big data era. I wonder if one of the lessons is what to watch for the next time the next big thing comes along, right? What should we be careful about when the next new technology comes along that's going to solve all our problems?
It's a really funny question. If you think about all the big things that were supposed to change the world, they all did change the world, but not the way that was expected, and a lot of them got lost, right? So you look, for example, at CORBA, which was great, theoretically, but pretty heavy and difficult to work with in reality. J2EE had similar kinds of requirements, but that also was frequently dropped; a fair amount of it's still out there, but a lot of people have chosen to go with lighter weight methods. You look at web services, which we think of as something that replaced some of these object-oriented capabilities that J2EE had and the service orientation that CORBA had, and they were great, but very often the first thing that people thought about was using SOAP, if you remember, Simple Object Access Protocol, right? Which was something that people wanted to use to build out those service-oriented architectures and so on, and people dropped those too. I think when you look at examples like that, or you look at examples like big data, for me, the things that are red flags are, first of all, people thinking that one thing will solve all of the problems, right? It's this either-or kind of mentality. I think that was there with big data. People were asking how they're gonna replace the applications they have, the relational databases they have, with Hadoop, and I think it was the wrong question.
There's a role for relational databases and there's a role for Hadoop, and the two might not be the same thing. So the either-or mentality is part of what was interesting to me, and I thought it was a red flag. I think the miracle success stories were red flags too, believe it or not. For example, I remember there was this one really great example where the father of a teenage girl saw that she was getting circulars from her local big box store relating to pregnancy, and he was ticked off. He called up customer service and sure enough, it turned out that she was pregnant, and the big box store knew before he did. Now that was something that was first held out to me for data warehousing, way back, maybe 15, 20 years ago. And so people believed it, which meant that business users said, yeah, I want that miracle. And when they said I want that miracle, they didn't really think through what it would take to implement the miracle. And I think marketers do a disservice by overhyping things like this. I think that technicians need to make sure that they don't have blinders on and that they walk into things skeptically. But I think that miracle is a sign that somebody is going to read about it in Forbes and is going to say to their tech staff, hey, you know what, I want a data warehouse or I want a Hadoop or I want the whatever. And I think one of the big red flags is seeing those very early miracle stories that galvanize people; it's just begging for them to get lost in the hype.
Yeah, I like those answers. Watch out for people saying all your problems are solved. Watch out for the miracles, and especially the one you mentioned: watch out for any prescription, any time people start touting that this is going to completely replace X, right? It seems like oftentimes that doesn't turn out to be the case, and the database is a really good example. I remember a few years ago when some really cutting edge folks were essentially reinventing the relational database on top of Hadoop, and it was like, wait a minute, we already have this technology. You know, it's like you've come back full circle: you don't need a database anymore, and now we're building one here. So there are a lot of different red flags to look out for there, for sure. Right. Now, let's talk a little bit about, well, I loved those examples you were giving at the beginning about the interesting complexity that you run into in datasets. The composer example was great, because you see these names coming from different languages, from different cultures, and names are handled very differently, and so this new complexity gets added. And I think that maps very closely to what businesses are dealing with in terms of doing more global business; you are dealing with exactly those kinds of realities. And you take that complexity, you marry it with that cascading effect you were talking about under data modeling, where the problems just cascade through each iteration. It's almost like we're facing exponential complexity with our datasets these days, isn't it?
Yeah, and for that I think the lessons we learn from big data are pretty appropriate, right? I need to be able to capture what's going on even if I don't know how to use it yet. I think the problem with big data isn't that mindset; I think that's a perfectly reasonable mindset. I think the problem comes when we say, okay, now I don't need to worry about governance. I'm just gonna put the data there and I'm gonna just use it however I use it.
Well, no, you really need to think about how you're gonna relate this to other things. Let me give you an example there from healthcare. If you have an EKG, an electrocardiogram, it measures your heart rate, and if you've got an EKG pumping out data constantly, then only a very small amount of that data is being stored in an electronic health record. Well, when the electronic health record captures it, maybe once a minute, once every 15 minutes, whatever it is, it's going to miss things. It's going to miss things like the fact that when you got an injection of a particular drug, you had a sudden surge of heart rate five minutes after you took it, and then it tapered off five minutes after that. So there's this big surge in your heart rate, let's say; it could miss that. So what you wanna do is capture all of the data from the EKG. That's great. So capture all the data from the EKG and store it somewhere, because you're gonna wanna do analysis on that. But the thing is, the analysis doesn't just relate to what happened in the EKG, which is the big data, right? That's the high volume data you've got. What matters is the fact that this data from the EKG came at a time when such and such an injection was being prescribed for that patient, and that patient's taking other medications that could have some sort of interaction effect with that injection. And this customer, this patient, same thing in a way, has a similar medical history that needs looking into. That context is what I would consider the master data for the patients. So it's not enough to just say, take the data and dump it. You have to take the data and dump it with connections back, with hooks back into the context in which that data was captured. That's gonna be important for IoT. It's gonna be important in healthcare. It's gonna be important for applying AI to large data sets. And so you may not know what you wanna do about it yet, but you need to capture the context, the metadata, the information about all of that, as you're taking the data that you're capturing and storing it. So it really is that blend of old and new thinking that I think is really important to having a modern outlook for data integration.
Absolutely. It's always gonna come down to context, isn't it? It seems like that is a recurring theme, Jake, every time you and I talk. Well, we've got a question here from Elliot, who says, do you see ETL being replaced by ELT?
So, being replaced by is very different from what I think the best thing is. I think ETL has a long history, and therefore a lot of people have invested lots and lots of money into ETL, and I think therefore it is going to continue to be highly present in the market. I do think that ELT is a smarter way to go. It's one of those ideas that came out a long time ago, before the technology was even properly available for it, but we've caught up to the idea from a technological standpoint, if you will. So I do think that it will become much more common to capture the data as it exists, to put that into the place where you want to ultimately do analysis or ultimately want to manage the data, and then start to manage the data.
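As a bare-bones sketch of the ELT pattern being described, with SQLite standing in for the warehouse or hub and everything else hypothetical: land the rows untouched with a load timestamp first, then cleanse them with the store's own engine, leaving the raw layer intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse / hub

# 1. Extract and LOAD: land the data exactly as it arrived, with a load timestamp.
conn.execute("CREATE TABLE raw_customers (src TEXT, name TEXT, zip TEXT, "
             "loaded_at TEXT DEFAULT CURRENT_TIMESTAMP)")
conn.executemany("INSERT INTO raw_customers (src, name, zip) VALUES (?, ?, ?)",
                 [("crm", "JD Freivald", "07030-1234"),
                  ("web", "jake freivald", "07030")])

# 2. TRANSFORM afterwards, inside the store, leaving the raw layer untouched.
conn.execute("""
    CREATE TABLE clean_customers AS
    SELECT src,
           UPPER(SUBSTR(name, 1, 1)) || SUBSTR(name, 2) AS name,
           SUBSTR(zip, 1, 5)                            AS zip5,
           loaded_at
    FROM raw_customers
""")

for row in conn.execute("SELECT * FROM clean_customers"):
    print(row)
```

Because the raw table is never modified, the same rows could later be re-transformed under different rules without re-extracting anything from the source systems.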
And the reason for that is it'll give you that data capture and transformation hub that I talked about in the presentation section, and enable you to have much more insight into what happened, to give you the lineage that you need, give you the ability to back up and say, what would the world be like if I had applied this data quality rule earlier rather than later as I did, that sort of thing. So I don't think ETL is going away. I think a lot of people are still investing in it, but I would certainly look at doing more ELT style architectures, because I think that is the long-term direction of where data management's going. That's a personal opinion, by the way. That's not a company position, that's a personal opinion.
Yeah, and going back to the earlier discussion about big data, I mean, that was another one; schema on read was supposed to take care of the problem altogether, and we weren't ever even supposed to have this conversation anymore. And yet here we are, still talking about ELT.
Still talking about ELT, it goes on. Yep.
Okay, we got a question from Sen about identity and entity resolution functionality: the company I'm working at is a data aggregator where we don't own and can't change the data, but we need to know that person A, B, C is the same as person A, B, C, X, Y, Z, or the same person in all caps and lower case. What level of governance would be in an MDM like this? How do you govern that data in that kind of situation?
Yeah, that's a classic MDM style problem. I go by Jake Freivald, right? That's how I was introduced today. But my handle, when I log into your website, if I do that, is always gonna be JD Freivald. So if you can guess passwords, then you can pretty much guess whatever I am now, wherever I am. My legal name is Jacob D. Freivald, right? So when I swipe a credit card, that's what you're gonna see. And you need to be able to know that your fulfillment systems contain Jacob D, that your CRM contains Jake, and that your website contains JD, and all of those things have to connect. So the key thing here is being able to ascertain that these people are the same person, which sometimes is stochastic, right? You figure out what's going on. There are a number of different kinds of rules that are used to determine whether somebody is the same person or not. It requires you to be able to link them together even though they're different in these source systems, but keep them permanently linked together so that you know that all these things are the same thing, that all these people are the same people. And make sure that when they point back to the source records, those source records contain the original information, because that information's not wrong, right? I really am Jacob D. Freivald, and I really am Jake Freivald, and I really am JD Freivald. So you're pointing back to those source systems, and they do contain all that information, but the information is linked together in the MDM application. So that is kind of a classic problem. You also need to be able to do things like unlink. I mentioned my wife and my mom having the same name. My wife's name is Susan and my mom's name is Sue, and you need to be able to identify that the one that lives in Virginia is this person and the one who lives in New York is that person, and make that permanent as well. You need to be able to do things like simply manage the way that people might spell their names differently in one system versus the other.
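To illustrate that linking idea with a toy example (the matching rule and record layout here are deliberately crude and hypothetical, not how a real MDM engine matches): the source records keep their original spellings, the hub only groups them under a master identity, and a steward-entered "never merge" rule keeps two genuinely different people apart even when the naive rule would link them.

```python
from typing import Dict, List

# Source records keep their original values; the hub only links them.
records = [
    {"id": "crm-1",  "source": "CRM",         "name": "Jake Freivald"},
    {"id": "ful-9",  "source": "Fulfillment", "name": "Jacob D. Freivald"},
    {"id": "web-42", "source": "Website",     "name": "JD Freivald"},
    {"id": "cc-7",   "source": "Cards",       "name": "Sue Freivald"},
    {"id": "cc-8",   "source": "Cards",       "name": "Susan Freivald"},
]
by_id: Dict[str, dict] = {r["id"]: r for r in records}

# A steward-entered rule: these two records are never the same person.
never_merge = {("cc-7", "cc-8")}


def looks_like_same_person(a: dict, b: dict) -> bool:
    # Deliberately crude matching rule: same family name, same first initial.
    same_family = a["name"].split()[-1].lower() == b["name"].split()[-1].lower()
    same_initial = a["name"][0].lower() == b["name"][0].lower()
    return same_family and same_initial


masters: List[List[str]] = []  # each master identity is a list of source record ids
for rec in records:
    placed = False
    for cluster in masters:
        blocked = any(tuple(sorted((rec["id"], rid))) in never_merge for rid in cluster)
        if not blocked and any(looks_like_same_person(rec, by_id[rid]) for rid in cluster):
            cluster.append(rec["id"])
            placed = True
            break
    if not placed:
        masters.append([rec["id"]])

for i, cluster in enumerate(masters, start=1):
    linked = ", ".join(f'{by_id[rid]["name"]} ({by_id[rid]["source"]})' for rid in cluster)
    print(f"master-{i}: {linked}")
```

Pointing each master identity back at the untouched source records, rather than overwriting them, is what lets a bad merge (the Sue and Susan case) be corrected later without losing the originals.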
If you have a system that allows Chinese characters, then that person might use Chinese characters to put in his or her name at first, and then use English letters to put in a transliteration of their name in another place. That's critically important. And this, by the way, isn't only true with people. If you're a data aggregator with parts, manufacturing parts, well, you have the same problem there with SKUs, right? Stock-keeping units. A SKU is a unique identifier for a particular product, but you can have the exact same resistor being used where one is mil-spec, sorry, suitable for military applications, and the other is not, and the only reason that one is mil-spec and the other isn't is that it's gone through different testing and any that were outside of a tighter tolerance were discarded. So it's the same exact thing, but with different SKUs on it. So this is, if you will, an ordinary problem that needs to be managed with any MDM application and needs to be brought to bear anytime you're doing data integration in a high-level kind of a way. How about that as an answer to the question?
Yeah, Sen has actually put a clarification on the question up here. Let's make sure we've addressed the actual question. What do you think of an MDM system with only the identity entity resolution functionality? That's all it's got. Do you feel like we covered that, or do you wanna say more in that light?
I think if it only has the identity entity resolution function, then you're only getting a piece of what an MDM system needs to be. If it doesn't do data cleansing, if it won't attempt to reconcile the identities or the entities that you have with the transactions that are associated with them, then you're not able to do a lot of the analytics that you would otherwise need to do. I think it's part of a solution, but I don't think it's the whole thing. So you're either gonna end up spending more money for additional application components to do that, or you'll end up spending time to build things out in order to manage that more effectively. So I think if it only has that, it's probably not sufficient.
Right.
By the way, if I can just tack on to that quickly: that's also true for MDM where it's only a single dimension. You really need multi-dimension or multi-domain MDM in order to handle this, because if you're working in a hospital, you've got a patient, but the single view of your patient is going to have in its context what doctors are associated with that patient, and those doctors need to be mastered as well, because that doctor could be somebody who works at a healthcare clinic, at a hospital, and at their own practice, and yet they're all the same person, they're all the same physician. So in order to get a 360 degree view of the patient, you need to have a 360 degree view of the doctor. And so really it's about all the different kinds of domains that are necessary. If you currently have customer MDM in place, for example, that might be a good source of master data for a multi-domain MDM implementation, but it's not really going to be fully sufficient on its own.
So we've got another question here from Gail, and she says, how do you ensure context with ethics, especially in healthcare? So we're talking about putting data in context, but now we've got to kind of balance these conflicting concerns here, right?
We want to have context, but we also need to maintain ethics, especially where there are privacy concerns, those kinds of concerns with healthcare data. How do you do that?
Sure, and Gail, you might have been thinking about the example I gave where you have an EKG and you want to associate it with that patient. There are HIPAA requirements that we keep that patient's information secure and private, and so doing analysis against it could be a major issue. So yeah, I won't claim to be an expert on this particular topic, because I could get myself in trouble if I did, but there are definitely steps that need to be taken to make sure, for example, that the individual patients are anonymized, and that the data that is in the data capture and transformation hub is secure, so it doesn't get released to people who are not allowed to see it. So for example, you might be able to provide aggregate data to analysts, where various kinds of heart conditions and so on are rolled up into a data set where the data is anonymized and no individual patient can be identified by his or her characteristics. For example, if there's only one patient who's got MS and diabetes and a particular heart condition and was taking a particular set of drugs, a cocktail of drugs, well, if you can identify that person based on those circumstances, that could be an issue. So all of those extractions need to be looked at. But that's part of the reason, again, to make sure that you're not doing this in some external system or an ETL process that loads a data mart separately for an individual person. The hub is where you do have the governed data, and you can look at the extractions that are being taken from it for analytics or what have you, and say, here's the situation: I have somebody who's authorized to get this information and not that information; how do I make sure that they get information in a way that they can use it, but that doesn't jeopardize the privacy of an individual person? It's a balancing act. I'm not gonna say I'm an expert at it, but it's a balancing act that needs to be considered for any situation in which you've got ethical concerns. It might be worth adding that very often right now people are talking about AI and ethics too. And a good data integration hub is going to have data that could potentially be used unethically if it were just open to everybody in the universe. It's going to have information in it that talks about individual people and what they want and what they can do. And when you want to apply that to AI, you get a large data set with lots and lots and lots of attributes, and you want the AI to run algorithms over it to see what shows up. That's also something that should be looked at, whether it's in terms of bias or whether it's in terms of unethical use of AI. And so that's another consideration there as well.
Absolutely. Well, sticking with the subject of privacy, we've got a question from Narender, who says, with the increasing focus on privacy regulations, should that be managed in the customer MDM or should it be managed separately?
Super question, and a complex one. The answer is it needs to be in both, in my opinion. Again, this is my opinion, it's not legal advice, all that jazz, but in my opinion, you need to use the MDM hub as much as possible to manage what you're talking about, and that management, though, still needs to press outward to the other systems. So let me explain what I mean by that. Somebody comes to you, let's talk GDPR. Somebody comes to you with a right to be forgotten request.
What that means is that they have to be able to say to you that they don't want any information about them in your systems anymore. And by the way, this isn't just customers; it's also prospects, what have you; and it's also employees. An employee has to be able to say, I want to be forgotten by your company, once they've terminated employment, of course. If the MDM hub shows you all of the different places in which you have captured information about that person, then you should be able to trace all of the sources of information back to those original locations. You should be able to trace where you got all that information from, and that should help you identify the different locations and applications from which you need to expunge that customer's information. Now, sometimes the customer information can be anonymized rather than completely deleted; sometimes it probably has to be deleted completely. But either way, the actual management of it is really more of a business process question, which can use master data to trace where all of the data needs to be expunged from, and then the evidence that you have expunged it can come from those operational systems. You can do things like, just as a simple example, take a screenshot of a search where it shows that Jake Freivald was there before I did the expunging, and then another screenshot afterwards where I did a search on Jake Freivald and he didn't show up, right? So that kind of evidence might have to come from the source systems themselves, but the ability to guide the process and to show where the data comes from could easily be centered on the MDM hub.
Right, and I see a clarification from Narender. The question was with respect to opt-in and opt-out preferences. Do you feel we covered that, or is there more to say?
Yeah, I would certainly say that the opt-in and opt-out preferences should be one of the things that gets captured in your MDM hub. And opt-in and opt-out is frequently per usage. So my information might go into your CRM application, but I will choose whether I want you to use it for marketing or just for being able to send me information about products and services that I've already purchased. Those are two different kinds of permission. So it's really about the permissions that you have to use my data, and that is what needs to be tracked, as well as the source system. And that way, when you put information out, for example for analysis or for a marketing campaign, you know that you're selecting information that has the right permissions for that particular usage. So absolutely, it needs to be managed as part of the MDM process too.
Right, absolutely. Okay, well, that does it for the questions so far from the audience, and I think we're just about done here on time, but Jake, I wanted to throw in one more question. We talked a little bit about OmniGen; you talked a little bit about it, and earlier you talked about specialized flavors of it for insurance and healthcare. Maybe as we wrap up here you could tell us how it's being applied in those or other industries and what the future of it might look like. We're in kind of a new age of data management. How's that new age going to evolve over time?
Well, I think the key thing that everybody comes back to us with is, we want to be able to extend what you've got, but we want more in the box when you show up at the door.
So from that perspective, healthcare was an obvious one, because healthcare is so complex; it's just brutally complex, both for payers and providers. And so we've had that for a number of years, and we built it out with one of our health partners, the St. Luke's University Health Network in Pennsylvania and eastern New Jersey. It's a fantastic product just because it has so much knowledge embedded in it. With insurance, P&C insurance specifically, since health insurance was already covered by Omni Health Data, we created Omni Insurance with intellectual property that came from customers as well and built it out in the same kind of way. And I think the more rapid implementations and then the extensions on top of that are a key element here; being able to do things like farmholds and so on, all of those things get rolled into the product over time. And what we see is people wanting to be able to just select the elements that they need. So they know that they're gonna get the core product, and then if there are extensions for them, they'll get those extensions. We're moving forward right now with law enforcement, because we've done so much work around things like the opioid crisis through making something that one of our customers called Google for cops. I don't think I'm allowed to say that, because it's a trademarked term, of course, but that's what our customer calls it. You can find information about, for example, where a person gets his prescriptions filled, which is something that is in a healthcare system that's part of the health network, along with the information about where that person lives, which is public information. And when somebody has to drive 10 miles to get their prescriptions filled, somewhere that's not near where they live, and they don't work near there either, then you can start to look at whether or not that might be an opioid prescription, for example, that's being filled in a particular way that might not be legitimate. Or you find a pharmacy where most of the people who use that pharmacy, or many of the people who use that pharmacy, are from a long distance away. If you plot that out and you can see hotspots on a map, you can start to see where a particular pharmacy might be open to doing illicit business. Those are just a couple of examples, but they obviously take data that's from more than one place and assemble it into a view of data about the citizen, about the customer, about the patient, about whatever it is, and make it capable of answering questions that we were never able to answer before. I think that's the direction things are going. We're gonna see specific buildouts of those use cases in law enforcement and probably other places before too long. We're gonna see people looking for more AI out of the box. A lot of what our customers are asking for is, just highlight to me the rules that you think need to be applied, whether it's match-merge rules or data cleansing rules or whatever it may be. Show me the rules that you think need to be applied, even though those are rules that you've never built before and that we didn't ask for; show us what those things might look like. So those are a couple of things that I see in our future and the future of the MDM and data integration processes that are gonna be coming up pretty quickly, actually. These are all developments that are happening pretty quickly.
Well, things are definitely happening fast.
Jake, it looks like we're just about out of time, so I just wanna say thanks so much for presenting all this today. It was great talking with you.
Thank you very much, Phil. I appreciate you and DataVersity and The World Transformed for hosting.
Yeah, absolutely. Well, if you have questions, if you want more information about OmniGen, do check out informationbuilders.com, and also come see us at worldtransform.com. Thank you all for being with us, and at this point I'll hand it back to Shannon.
Thank you both so much for this fantastic presentation. Thanks to Information Builders for sponsoring and making it all happen. And Phil, great to have you on as a partner and guest moderator. I love it. And thanks to all of our attendees for being so engaged in everything we do; you know, we just love it, and the questions coming in. Just a reminder, I will send a follow-up email by end of day Thursday to all registrants with links to the slides and links to the recording of this session. I hope all of you have a great day. Again, Jake and Phil, thank you so much.
Thanks, Shannon. Thanks, Phil.