Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We want to thank you for joining the latest in the monthly webinar series, Data Architecture Strategies with Donna Burbank. Today Donna will talk about data virtualization: separating myths from reality. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar, and we very much encourage you to chat with us and with each other throughout. To do so, just click the chat icon in the bottom right-hand corner of your screen to activate that feature. For questions, we'll be collecting them via the Q&A section, or if you like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DAStrategies. And as always, we will send a follow-up email within two business days containing links to the recording of the session and any additional information requested throughout the webinar. Now let me introduce our speaker, Donna Burbank. Donna is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She is currently the Managing Director of Global Data Strategy Limited, where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide, in the Americas, Europe, Asia and Africa, and speaks regularly at industry conferences. And with that, let me give the floor to Donna to get today's webinar started. Hello and welcome.

Hello Shannon. Hello everyone. Always a pleasure to join these webinars. As you know, the topic today, as Shannon mentioned, is data virtualization: what that really means, and separating some of the myth and hype from reality. A question we always get is: will this be recorded? Will we be able to get the slides?
Yes, and the good news is, if this is the first time you've joined us, this is part of a regular series that we have every month, and you can get all of the series from this year, as well as all past years, on dataversity.net. So if you're interested in any of those previous topics, you can get them on demand. And if you're interested, we'd love you to join us for the last two of this calendar year, which you can see coming up: one on the different roles in data architecture, and one on graph databases, which is always popular. But the reason we're here today is data virtualization. If you're familiar with it, I'd be curious, folks, since we always love an active chat, how people are using it and what people's experiences are. I'll talk more about what it is, but really it's this idea of logically integrating data from different sources across the organization without physically moving the data. That sounds very cool. There's a lot of confusion about what that means and how you can use it, and of course vendors love to give their pitch, and it almost seems like magic. So I thought I'd demystify that a little bit and talk about what it is, what it means, some of the pros, some of the cons, and maybe some practical use cases, so that if this is new to you, you can see where it might fit into your current architecture. So what is data virtualization? I've sort of covered that, and we'll go more into it, but think of it as a logical data layer where you can integrate these different, disparate data sources across the enterprise without physically moving the data. We can talk more about that, but the idea is that it doesn't necessarily replace the data warehouse or a database. You can still integrate those. Again, it doesn't replace a data lake either; you can augment that as well.
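
As a rough illustration of the logical data layer described above (an editorial sketch, not something shown in the webinar): the Python below assumes two SQLite databases stand in for separate source systems. The names crm, erp, customers, orders, and customer_sales are hypothetical. The view stores only a query definition, so the rows stay in their source tables and are read at query time. A commercial virtualization platform does far more (remote connectors, caching, query optimization, security), so treat this only as the basic shape of the idea.

```python
import sqlite3


def build_virtual_layer() -> sqlite3.Connection:
    """Return a connection whose temp view spans two separate source databases."""
    conn = sqlite3.connect(":memory:")

    # Two independent in-memory databases stand in for separate source systems
    # (hypothetical names: a CRM and an ERP).
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS erp")

    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE erp.orders "
        "(id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO erp.orders VALUES (?, ?, ?)",
        [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)],
    )

    # The "virtual" layer: a view holds the query, not a copy of the rows.
    # SQLite only lets TEMP views reference tables in other attached databases.
    conn.execute(
        """
        CREATE TEMP VIEW customer_sales AS
        SELECT c.name AS customer, SUM(o.amount) AS total
        FROM crm.customers AS c
        JOIN erp.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        """
    )
    return conn


def sales_by_customer(conn: sqlite3.Connection) -> list:
    """Query the logical layer; the join runs against the sources at call time."""
    return conn.execute(
        "SELECT customer, total FROM customer_sales ORDER BY customer"
    ).fetchall()
```

Because nothing is copied, a row inserted into erp.orders after the view is created shows up the next time customer_sales is queried, which is the practical difference from loading a warehouse on a schedule.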
Another place this is often used is if you have different types of data or external data feeds, say a Bloomberg daily feed, or weather or Twitter feeds, that are more real time, where it doesn't really make sense to move and co-locate the data. There are a lot of reasons you wouldn't want to move the data; maybe you're just doing a test case and want to understand that full data fabric, as someone mentioned. On top of that you will have the query or reporting layer, so you can use your BI tools, you can use a good old-fashioned Excel spreadsheet, or you can just write SQL, but it's that interim layer that is the virtualization layer in between. I always like to go to Gartner, because they generally have very good definitions. They go into a little more detail in their definition of data virtualization: not only the logical views, but also whether the data is cached on that server layer and whether or not you redefine the source data. So data virtualization, and we'll go more into this, isn't magic. There generally is some sort of server or tool that manages a lot of that caching work, and, especially now that these tools have come a long way, it doesn't always mean that you don't do any redefinition or changing or quality work on that data; it isn't necessarily just taking it from one place and passing it across. That's from a report that's referenced there; if you want to go see the full report, that's always interesting. Another report we can refer to is hot off the presses, I think just last week. You may be familiar with it: we do a report, and I guess it's been, wow, I'm old, four or five years now. We do an annual report with DATAVERSITY on trends in data management, or trends in data architecture, and I thought this was interesting.
I hear a lot about data virtualization, and again, I'd be curious in the chat about people who are using it. I hear about it a lot and get a lot of questions about it, but when you actually look at the numbers, and again, this is the DATAVERSITY audience, so folks like you on the call, it has one of the lowest adoption rates of any of the technologies up there. No surprise that BI and data warehousing are really high, but data virtualization was low. You'll also see that this year we did something a little different: we compared the 2020 findings with 2019. 2020 is odd, so I hope we can take a bit of that with a grain of salt, but you'll see that even between last year and this year it's gone down significantly, so nearly 30%. We did notice that across everything, if you look between 2019 and 2020, almost everything went down except for the tried and true: BI, data warehousing, security, governance type things. If you read the report, we've got our thoughts: is that people going back to security, to the tried and true? We don't have a lot of extra time, budget, or bandwidth to be doing some of the exploration, and when there's a crisis people go back to the core value propositions. That could be it, but again, it's 2020, so who knows. But even without 2020, data virtualization has been low across the board. Again, I hear a lot about it, I get a lot of questions about it, and the vendors tout it a lot. I know in our practice we have used it, but it's not a big part of what a lot of our customers are asking for. That said, there's a lot of talk about it. Notably, going back to Gartner, they admitted as well that current adoption is low. In fact, their number in 2011 was only 11%, up to about 40% in 2018, so they're seeing an increase, and they expect a much larger increase through 2022. Their estimate is that up to 60% of enterprises would implement some sort of data virtualization as one option for data integration, and I like that they sort of call that
out, and I will talk about that more. If you've heard me present in the past, I do my little-old-lady rant about the whole either/or, right? Just because there's virtualization doesn't mean data warehousing goes away, or data lakes go away, or ETL goes away. It's another tool in the toolkit, and by using it effectively you can get a lot of benefit. So, a little bit more about it. I mentioned things like ETL and data warehousing; comparing those to data virtualization, what are some of the key differences? I would assume a lot of people on the call are familiar with the more traditional data warehousing approach, where you have your source systems, typically database systems or even spreadsheets, and you extract, transform, and load. So you're not only moving the data, but typically you're doing transformation on it. It could be as simple as data cleansing, or it could be summarization for more historical trending in a data warehouse, but there is definitely physical movement from the systems into the warehouse. There are a lot of great reasons for that, right? You do often want that single source of truth. Again, we saw in the DATAVERSITY report, and I'm seeing in our practice, that more than ever people are looking for data warehouses; companies are looking to be more data driven, so that use case doesn't go away. And then, generally, because folks on this call are probably data experts but not every person who wants to look at a report is, there is this need for self-service. A common way to add that usability is what we often call a semantic layer: the usability, what the data means, maybe business-friendly terminology instead of table and column names. We might build a cube or a semantic layer, which sits between that warehouse and the BI reporting layer. That isn't a virtualization layer, but in a lot of ways it's similar: it's a layer between the end user
and the data itself. Now, if we want to compare that picture to data virtualization, here you still have the source systems, but the data stays there, and that's where the virtualization layer can sit between the end user and the source systems for that BI reporting. So in a lot of ways that's similar, and I've heard some confusion about how it is different and how it really is the same. The other nice thing about data virtualization, as distinct from something like a data warehouse, is that you can more easily have a bit more diversity in your systems, because again, the data stays in place. You're not trying to conform it to a database structure, which again has its pros and cons. Say I'm trying to get a daily external feed for weather and have that in my portal, or market trends, stock market closing prices, and things like that. Integrating cloud and on-prem, you can do that with data virtualization; ERP systems, CRM. You can definitely see the benefit there, but you can also see where that can start to sound like magic: take anything, put this layer across it, and magic happens. As we know, there is no magic. That's why, well, it's lucrative to be in data management: because there is hard stuff underneath that. So I thought I'd go through some of these and show some of the differences and similarities with things we may be familiar with. So, data virtualization versus ETL. And I don't mean "versus" as in one fighting against the other. Again, that's one of my pet peeves in technology: we act as if it's either one or the other when, as you can see at the bottom, these two solutions can complement each other in an organization. It absolutely does not need to be an either/or. But there are differences. ETL, again extract, transform, and load, is for when you do want to physically move the data; this is the tried-and-true workhorse to do that. So if
you are doing that enterprise-grade effort, where you do want planning and enforcement of the rules, and you do want it to be the same over time, that's something like a warehouse. There are other use cases for ETL, obviously, but let's just use the warehouse, as a very common one, as a comparison to data virtualization. I do want to do that upfront planning; it is rigid, it is locked in, and I probably want to run it nightly, weekly, or monthly, depending on my needs, so that I can really do that processing and transformation. It can help with some of the aggregation and cleansing and things like that. Now, with data virtualization: it doesn't always make sense to move the data. I mean, it's expensive and time consuming; there are resources needed to move that data, so that is not always practical. Another use case is if you just want a rapid prototype, right? I'm doing a POC, I'm not even sure of the efficacy of this data or whether it makes sense to integrate it, I just want to get an idea of what this data looks like, and I don't want to have to build an entire data warehouse to get some results. Data virtualization is very good for that. It's also good when you're not in that data warehousing batch mode of understand, aggregate, load, verify, et cetera, and you really want near real time. Again, it's not an either/or: maybe you have your data warehouse and you want to have some of those streaming things come in and join them with other areas. So it really is delegating the queries, joins, and aggregations more to the source system, which returns the rows. Again, you're not moving the data and then transforming it; you're keeping the data in place. That is the difference between ETL, which moves it, and virtualization, which does not. We'll talk more about that, but the other similar yet different aspect of this is that there's that user layer, the virtualization layer we're talking about. In a lot of
ways it's similar to what you might have with a BI cube, because it's the layer that allows you to query across. One of the nice things about a business intelligence cube is that it is, almost by definition, easily understandable by business users. It adds that user-friendly semantic layer. You organize the data in a way to slice and dice: I want sales by region, I want sales by day versus week versus month, right? Not only is it structured in a way to slice and dice, but it's generally in the tool itself and people can use it. But that's also a negative. You're locking into a tool, because you've built all of those BI cubes in that tool. That might be fine, maybe you've standardized, but if you want to move, that doesn't always translate nicely, right? So one of the benefits of a data virtualization layer is that it can be used across multiple BI tools, so you avoid some of that vendor lock-in. On the flip side, it is more of a SQL query layer. More and more business users know SQL; it's not that complicated, it's not brain surgery, right? It's something that someone with fairly technical skills can take on, but a lot of business users don't want to, or that's not something we would expect from a business user. It is a good solution for your data scientists and your BI developers. Again, think of that rapid prototyping, or "I'm trying to get a large view across the organization"; that's where data virtualization fits in. And again, it doesn't have to be an either/or. They can complement each other; they each have their use case. I called this one out, and one thing I try to add in these webinars each month is a little bit of real-world experience, the scars you can't see at the moment, but they're there, from all of the different projects we've done. And I do get this question a lot in terms of if we're trying to have more and more
of that (the term data fabric came up in one of the questions) ecosystem of tools, and also a way to easily aggregate the data for consumption across the organization, that is where data virtualization comes in. I see one of the questions: do the data virtualization tools provide a semantic layer? Yes, a lot of them have come a long way. Back in the day it was mostly about just the virtualization, right? But they have progressed. It isn't magic, and here is one of the reasons I put this in. I had a colleague, a technical colleague, actually in the data management industry as a consultant, and he was a fan of virtualization, but he would always say, "It doesn't matter, they just do it. It just works." I don't know about you, but whenever I hear that as a tech person, it makes me very, very nervous. Nothing just works, right? So what is behind the curtain of data virtualization? There are platforms: a lot of the data integration vendors have a data virtualization layer as one of the tools in their toolkit. There are some pure-play data virtualization vendors out there, but fewer and fewer; they seem to have been gobbled up by the integration vendors, because this is a nice fit to augment some of the other solutions out there. As for what they offer, they've come a long way, and if you track these vendors, the line blurs, as with so many technologies now. One of the things they will offer is query optimization, and that's part of what isn't magic, right? There's the query, there's performance, there are things you need to do to get that data back, and that absolutely isn't magic. But they are also adding things adjacent to that: infrastructure and federation, how you manage the movement of the data, things like data quality, a semantic layer, a data
catalog, often. Everyone is in the catalog game now, right? But there's understanding that semantic layer, as well as some data governance functionality. I think that was missing in the past; again, anything that's easy sort of screams "I can get around governance," right? I'll talk about that in a bit. It doesn't preclude the need to actually do some of this hard work as well. In fact, I would argue things like data governance might be even more important as you're looking at all of these disparate sources. Who's the data owner? What's the data privacy? When you have everything locked down in a warehouse, it's a little easier to control and manage that, but now you're distributing, right? So do we have good data stewards? Do we know what that data means and its provenance? That's why I think more and more data virtualization tools are touting the fact that they not only have governance, they have security, and they have almost full data catalogs, or they're a data integration tool that works with some of their other pieces that offer some of this as well. So it absolutely does not obviate the need for that; I think it actually heightens it. So what it is not: an easy way out of doing the hard work of data modeling, data quality, data governance and stewardship, et cetera. You need to understand what data you're moving. That doesn't magically happen; you don't just point to the area and let it go. If your data quality is not good, well, data quality is a whole topic in and of itself, right? I'm a fan of fixing at the source, so ideally the source systems have good data quality. I'm not a fan of using ETL to fix it unless you absolutely have to, but that is a consideration, along with understanding your source systems. Governance and stewardship I mentioned: who is the steward of that data, and how do we manage it? It absolutely doesn't matter if
the data is moved to one place or is staying at the source; there still is a data steward, there's still privacy and security, there are still HIPAA rules, there's still the data meaning, and I would say even more so, getting them involved in that. And again, any technology that can seem simple tempts people to skip that work. That said, simplicity is often a good case for it: I'm just doing a prototype, so let's use it for that as well. Master and reference data is another one. There's a whole webinar in that, right? There are different methods of mastering. So could you have a logical, you know, there's that idea of a logical warehouse, could you have logical master data that lives across source systems with a data virtualization layer? I would say I'm not a fan of that. That takes a lot of orchestration, and generally the idea of master data is that it's a centralized area that can, yes, be cascaded across. Personally, and I'm happy for anyone in the chat to disagree with me, because that's what's helpful about these webinars, I think sometimes when people are looking to do too much virtualization or distribution of master data, they're not really mastering, right? If I want a single view of my products or my countries or my codes, it's probably good to have that in one place. That said, it can be a source for data virtualization as well. So: not an easy way out of doing hard work, just another method of integrating that data. Moving along. In some of these cases, again, it can be a great way to integrate disparate data sources. It's great for real-time data access, rapid prototyping, and doing some data exploration of the sources, right? In some cases it's a precursor to your data warehouse: does it even make sense? It's really a nice way to have a virtual layer across. And again, it's not an either/or; they complement each other. But I would say, and again, I'm open to folks
disagreeing with me, it's not great for a centralized data warehouse or master data management. Reference data I put in a similar area, and, I mean, graph databases. What can be confusing is that it can manage a lot of these different systems, like documents, but I wouldn't say it's a document management system, right? If you want a broad view across these areas, it's a great way to understand that information. If I want to do true, pure-play document management, with taxonomies and versioning and life cycle and all of that, a great use case is that it would live in the source, and when you're trying to integrate, you just have that virtual layer across it. So, I have gone much faster than I usually do, so Shannon's going to kill me, but we are actually getting close to time. In summary, when we look at data virtualization, it is that flexible way to integrate disparate systems without moving the data. While current adoption is low, I think expectations are high. That said, a little bit of, I don't know, we've been sort of saying that for a while. But again, things change and adoption changes. I know Gartner is one where they are really expecting big things for data virtualization. It is a great tool in the toolkit, but it does not obviate the need for strong data governance, security, and things like that, and it can definitely be a complementary technology to other data warehouse solutions. One point I wanted to make: the survey I mentioned has a lot of great information, not only about data virtualization but about trends in general, and it is available on the DATAVERSITY site. Again, I think it's just a week old, hot off the presses, so it's free to download. I think Shannon usually puts a link in the follow-up email that you'll get in a few days, and you can see a lot of the other trends that we've talked about as well. We do this for a living, so if
you like, let us know. And then, as always, I'm going to pass it back to Shannon to open it up for questions. Oh, before I do that, sorry: next month is on the topic of how terminology matters, right? What's the data architect versus the data engineer versus the data modeler? All these hot questions of where we overlap and what the difference is. So if you're available October 22nd, please join us for that. Without further ado, Shannon, I'll pass it back to you for questions.

Donna, thank you so much for this presentation. We do have a lot of great questions coming in, and we've got lots of time to get to them. If you do have questions for Donna, please send them in the bottom right-hand corner, in the Q&A portion of the screen. And just to answer the most commonly asked questions, a reminder: I will send a follow-up email for this webinar by end of day Monday with links to the slides and the recording, and of course a link to the paper, along with anything else requested throughout. So, diving in here. Donna, with so much teleworking occurring, it would seem that data-management-related issues would be of high priority, yet slide seven showed a marked decrease in all aspects of the survey. Why was this the case?

Yeah, I'll go back to the survey that the person is asking about. I found this one very interesting because, for anyone listening to this after the fact, 2020 was the year of the COVID lockdown, and this survey was right after that. So yeah, you would think so. I know in our practice we're seeing an increase in demand for data analytics, for people working remotely and understanding data, digital transformation, all of that. So here's what I'm thinking. Again, I think a lot of people are still doing data management. I don't think data management's ever going to go away; it's a great career to be in. But again, when things are risky, I
think people have to go to the core. I think there's a lot of exploration out there, and a lot of people want to look at things like new technologies, right? But when we absolutely have to cut to the core, the good news is that we're cutting to the core, and I'm seeing this: we need to understand our business, we need to be more data driven. Where should we cut costs? Where should we expand our business? That's almost a prime use case for business intelligence, data warehousing, and data governance. And security, of course: with telework and things like that, security is of massive importance. There have been some issues around that. Only the survey participants know, but my theory is that it's not that data isn't important; it's that people really have to cut to the core fundamentals. And I bet that in future years, when people have a little more budget and are less in crisis mode, we'll go back to some of the exploration. So that's my theory, but if anyone who took the survey was one of those whose numbers went down, I'd be curious what you have to add in the chat. So that's a good question.

So, "near real time" for data virtualization? It should be real time, which supports getting data from message queues.

Yeah, real time is one of those terms that has so many meanings. Literally, when I think of real time, I think of split-second stock trading and things like that. There is generally a bit of a lag here; often there still is some transformation. It's not like there's a lag of days, right? So I guess one person's real time is another person's near real time. I'm finding that word used more and more. In fact, I find it interesting: we've been doing a lot of BI and data warehousing, and a lot of the users say, "I need it, I need it real time," and some of the things were annual reports, monthly finances,
and I said, really? So I was trying to get my brain around what real time meant for something like a monthly sales report. They said, "No, no, I want the report when I want it. I want to go to the BI tool and have it real time." And that is absolutely not what I meant by real time. I would think of the data being real time; they meant accessing the data when they wanted it, sort of the Netflix version of getting your data. But yeah, I think of real time literally; with virtualization there's a bit of a lag, and that's my definition of it, but I'm open to other people's in the chat.

I like it. So, if you delegate join compute to the source system, are you not in contention for resources on an operational system?

It could be, yeah. Let me go back to the picture of "it's not magic" behind the curtain. This whole idea of query optimization is important, and that's where it's not magic. It is still an issue to consider. It doesn't change the discussion; it moves the discussion, right? So that could definitely be a con: maybe you don't want to keep it in the source system, and maybe that's why, for a production data warehouse, you do move it. That is one of the 101s of data warehousing. When you're trying to report off the operational system, there are many reasons to build a warehouse, but one is that you're not contending with that operational system. So yes, definitely something to consider. It doesn't get rid of these things; it moves them, right? So that's a good question. I don't want to pre-answer, but someone was asking about life cycle management and dev, test, production: does that go away? It absolutely doesn't, and you'll see this box here as well: a lot of these virtualization layers can manage that too. Some people do use this as a sandbox, right? Because it is a great first brush before you go into all of the effort to move the data or get your servers and all that sort of
thing. If you have this tool, it can be a dev-type prototype. But if you do have it in production, I don't think dev, test, and production go away. So definitely look for a vendor that can support that when you get one of these platforms.

So, I can see that virtualization can be organized for self-service by business users, and most of the data virtualization tools provide declarative and drag-and-drop features for people having some Excel knowledge. How mature are those data virtualization tools in providing a true data quality solution?

Data quality is really holistic. What's difficult about that in any of these tools is that often the vendors who have virtualization also have pure-play data quality as well. I think clearly a robust data quality tool on its own can always do better. I mean, I might be showing my bias: if data quality and true reconciliation and all of that is the main use case, I would lean more towards the data warehouse, partly because real time doesn't give you that. You can do some rule-based data quality, but say you need human-in-the-loop verification. A lot of data quality is holistic. Some pure data quality checks, or some formatting and things like that, I think you can do. But if you really need, how would you call it, validated financial statements and things like that, or, with some of my healthcare clients, when they're doing either MDM or even data warehousing, they do have a human data steward, just due to the sensitivity and risk if any of that data is wrong, and that typically doesn't lend itself to a data-virtualization-style approach. So if data quality is a massive concern, I would lean towards some of the others, but again, I'm open to other people's views there.

And then on that also: the data virtualization solution enables faster time to deploy data as
a service for application modernization as part of a digital initiative. Do you agree with that?

I think that's fair, yeah. I think there's rapid time to deploy, but with the risk associated, right? So just make sure. Rapid prototyping doesn't mean it would go away, but that's where your dev, test, and prod would come in. I just always get nervous when we go too rapid, but that is one of the benefits of this. I need something fast; I want just enough quality and integration and things like that. It absolutely can be a nice way to quickly join this information, and it may be your permanent way, right? I had the example earlier: you may already have a data warehouse and you want to augment it. It doesn't exclude that. You can integrate it with other systems without having to build a whole layer. So yes, it definitely can speed time to market, and that's one of the main use cases and why it's appealing to people.

Is it an appropriate interim solution for a long-term migration project? I want to present a harmonized view in the new system of reference data contained in the old system that is spread out through disparate, siloed data sources.

I think so. Again, it's this rapid prototyping, before you want to actually move the data and do all of the work, when you really just want to do that sandboxing or exploration. I think data virtualization could be a very powerful tool for that. So again, it may stay as a long-term solution, depending on the use case and the type of data, or you may know that, yes, I need to do a more robust warehouse down the road, but for now let's just do some initial joins and queries and things like that. It can be a very powerful way to do that. So yeah, definitely a good use case.

And can data
virtualization be used for building a cloud-based enterprise data warehouse on a data lake supported by a computer cluster? Yeah, cloud is definitely a solution. I think a cloud data warehouse doesn't necessarily need virtualization; I see those as two separate questions. But it definitely can support cloud, and it definitely can integrate with a warehouse. There is this idea of kind of a virtual warehouse, and that could be this idea that you're integrating things together. You could do cloud and on-prem as well. And again, just make sure that you know there is this complexity behind the curtain; again, that query optimization doesn't go away. A lot of these tools have come a long way, and with each release they do manage some of that contention and things like that better, but it's still an issue. So yeah, the cloud is definitely something that it can support. Reporting tools have a presentation layer, and you can do virtualization there, a so-called data virtualization layer. What do you think? I don't know if that would be pure virtualization. I mean, you can have something like a Power BI, and you can have a cube onto your relational data. You can also bring in, you know, I can get Google Maps and kind of overlay onto that. You can kind of link the things. I see that as different than a data virtualization layer. With some of those, you bring the data in. Data virtualization is really more that whole ecosystem of tools. I mean, in a lightweight way you could kind of say that's virtualization; I kind of see that as just linking data from a bunch of different places. I guess maybe it's data virtualization light. But yeah, I guess you could, and I have customers doing that now, where you've got kind of your warehouse data and then you bring in, you know, some weather data
or map data or things like that onto that tool. But I think that's a slightly different use case than pure data virtualization with, you know, the security layer and the query optimization and a lot of that as well. And if you have additional questions, feel free to submit them in the Q&A portion of your screen. I do see some questions that have come in throughout in the chat. Do most data virtualization tools provide a semantic layer? A lot of them do, yeah. When I was comparing the cube with data virtualization, I mean, a cube has a semantic layer. That's another one of those terms that there's a lot of different semantics around. But more and more the data virtualization tools can have that. They often have data catalogs, they often have some light data governance as well. So that definitely can be in place. I saw another question. Oh, go ahead. So how does a data virtualization layer address historical data persistence, for example type 2 dimension historical tracking? I would think, and again people can chime in, once you're doing that, that seems to make much more sense to be in your data warehouse. To me that's almost a prime use case for: I want to ETL it over, and I want to do my facts and dimensions and slowly changing dimensions and things like that. It makes a lot more sense, and you do want to have that history stored over time. That use case to me is much more of a traditional data warehouse. Again, you could keep that data warehouse and use the data virtualization layer to integrate that with other sources. But I think if you're totally doing that kind of consolidation and trending and cleaning, all the things you need to do, that to me feels like more of a warehouse use case. Speaking of, you actually got to that question quite a bit, so we'll get back
ahead and... I just want to jump in, because I think it's related. Someone had a question about data vault. So if you're familiar with data vault, this is another kind of agile way to do some data warehousing. And could you use it over a data vault? Yes. Again, like the warehouse, I wouldn't see it as a replacement for a data vault, but a data vault database or data set definitely could be one of the sources for this data virtualization layer. I have personally not done that, but I don't see a reason why that couldn't be done. That seems like it makes sense. If you delegate join compute to the source system, are you not in contention with resources on an operational system? Yes, yep. I think that is one of the risks. And again, like in this picture, that is one of the reasons why you might want to do a warehouse. There are a lot of reasons for a warehouse, but one is to take that contention off of the source system. So again, that is one of the considerations. Again, with this magic box, people kept saying, don't worry about it, it just fixes it. Well, it's never that simple, right? So there is query optimization, and where that query optimization occurs is not a trivial thing. So yes, you do need to consider that, and it isn't necessarily magic. So you need to think that through. And Donna, what do you think about the new concept of a data hub? A data hub is one of those overused words, so everyone has a different definition of it. I do have several customers using it very successfully. Yes, it can be this idea of, you know, you have hubs. I have one international customer where they very successfully had their customer hub and their product hub, and then, someone had a question about the data fabric, it wasn't that you had this one central warehouse. They sort of had various data hubs that were loosely integrated with more of
a data fabric approach, and it was very, very successful, and it sort of broke things out. So I'm seeing more and more people go toward the data hub approach. So, another tool in the toolkit. So why don't you recommend using data virtualization for master data management? I think, and I said "I think" three times in a row, I'm a fan of a centralized approach for master data and reference data, right? So reference data is a good example. I want to list my country codes. I just want to list my country codes, and I sort of do want that in one place. There's a publish-and-subscribe model for MDM, so it doesn't mean that the data has to stay in that place. I can have my MDM, quote, hub, or reference data hub, and have that pushed out. So that's reference data; maybe that's a little easier, and maybe that wasn't the question. You know, yes, to have one table with a list of your country codes, that code-type data is a little more simple. With master data, I can see the question, and I could see the temptation for virtualization, because I say that, for customer master for example, nobody owns all of the customer master. And I think some people go wrong by saying, oh yeah, the customer master is your CRM, or the customer master is this system. Generally there are a lot of systems that have that customer master, but that's part of the devil in the details: understanding which fields came from where. And I guess you could have that virtual layer across. I think if it were very robustly done, yes. But it makes the subscribe a little harder, I think, into the different systems. And I guess I just get a little twitchy too: was that full analysis done? Yes, you could have that virtual layer, and if there are 10 different attributes of customer and I know that they're in 10 different systems, to be extreme, and I take one from each,
you know, there's issues there. If that were completely well thought out, that's fine. I suppose that is one approach. I just ask, a lot of times: was that full analysis done, and is that the right approach? But I am biased, I suppose, and that is how I've done it. You still get it from those different systems, but having that central place, I want a list of my master data, and then integrating or pushing it through, seems to be how I tend to do it. And does data virtualization represent a cheaper alternative to combining different sources in an incremental data warehouse? Cheaper is always... but yeah, I think as a rapid prototype to do that. I think over time, if the use case is for a warehouse, doing an incremental approach, I think they could fit together, right? I want to understand the use case for the warehouse, or I need some data quickly. But if the use case is for a true data warehouse, doing an incremental approach, more of an agile, build it in pieces and deliver it out, is probably the proper use case. Or is it pay me now or pay me later, right? But yeah, it can be very useful, especially for rapid prototyping, or if I don't need the full conformed dimensions and all my history and everything. Warehouse is something else that is used loosely. Is it more of a hub? Is it a relational warehouse or a true star schema, or do I want my history over time? If I'm just joining up data, that version of a warehouse, and that truly is the use case, absolutely this could be a nice way to do it. So I'd just be careful. I guess when I hear this, and not anyone who asked the question, but because this can be seen as easy: are we skipping the analysis phase, and did we pick the right use case? But I've also seen people build a warehouse that is big and expensive just because they wanted to join up some data that didn't really need a warehouse, and it was time consuming and
expensive. So that's a long-winded way of saying it depends on the use case, but it can be a very quick and easy way to get some data that just, I'll say just, needs to be joined up for queries but doesn't truly need to be mastered or warehoused or overly integrated in that way. And that could be a long-term use case; it doesn't have to be just a prototype, but that is also a very big use case for it. And can data virtualization be used to cache reference data locally, managed centrally, by multinational enterprises? Wait, I need to think that one through. So what was it, cached locally and managed... excuse me, again? I feel like that was a quiz question. Cached locally, managed centrally, yeah, by a multinational enterprise. I do feel like this is one of those quiz questions where there's a trick in it. I think it depends what the reference data is. It could be the person was getting at the issue, often, with governance and security, of where that data is stored. So if it's truly reference data, like what are the ISO country codes, unless I'm missing something, I don't see necessarily a problem with that. If it's truly master data, and it's customer data, and the person is caching it in a country where, you know, it's European data and I'm now taking it here in the U.S.
and it's living, quote, in Europe, then that's, I think, more problematic. But that would be master data, not reference data. I think that would be a consideration: where that data lives is where that data lives. I'm not an auditor or a lawyer, but I would think that it might be tricky. It's not a way to get around that requirement. I do think you need to be careful with that data, if that was what the question was. But if it's just truly reference data, you know, what's the list of the valid colors for the shirts that we're selling, I can't see that being a problem. But with master data, maybe. And can you elaborate on the real-time aspects of data virtualization? Is it because all the data remains in the source? Because it remains in the source, yes. Real time came up before. So if I'm trying to get data into a warehouse, generally there's a source system and there's some sort of transformation and movement on that, and often there is a batch approach, because you are doing that analysis. Also, you may get this data at different times, and you want to make sure you pause and integrate that all at one time, right? I might get one at noon and one at five, and you want to make sure that it's loaded at a certain time. With data virtualization, if you are just querying it, you could get that in real time. So I want to know what the weather is now from the weather API; someone had asked about APIs, right? So you can get that through the virtualization layer, and you're querying it more in real time, at the source. You're not batching it and then uploading it later. So it's more of a direct query against the source: I'm doing the query now. Can data virtualization technology be leveraged to spin off virtualized database or math instances, especially like dev tests, or is that
a separate topic by itself? You broke up when you said the beginning of it. Could you repeat that? Sure. Can data virtualization technology be leveraged to spin off virtualized database or math instances, especially like dev tests, or is it a separate topic by itself? I'm thinking that's a different topic. That sounds more like a virtual machine, not a data virtualization layer, which I see as more of a query layer. But I might be misunderstanding the question, and if I'm not, then I don't know the answer. There are a lot of words with "virtual", and I kind of see that as like a VM, I'm spinning up a VM, which is a different thing, and that might be more of the different environments you're talking about. Let me talk a little bit about cloud. Is data virtualization getting usurped by cloud alternatives that are just as dynamic? Oh, I'm trying to think: could that be the reason for that? I don't know if it's usurped, because I'm not sure this has totally been adopted yet. I mean, maybe it's more the difference that a cloud could be more of a data lake solution, in a way. I have to have a picture for everything. I don't have one. Oh, it was my first one. I kind of see that with a lot of these cloud solutions, you have a lot of these things right there in that platform. I can have a data lake in the cloud, and a data warehouse component in the cloud, and some real-time streaming components in the cloud. I guess what that question is getting at is that moving data is becoming easier, or you don't have to move it because it's all consolidated with the different areas. That could be a reason why this isn't taking off, but I don't know. I do see it as a different thing, or you can virtualize across clouds. But yeah, it's an interesting point. I'll give that one more thought. I don't have anything more on that
one, but that was clever. I love all these questions coming in. Can there be persistence and read-write access to data sources, or just read access through the data virtualization layer? I generally see it as read access. I usually see the use as trying to virtualize the query, which is different than more of an API, someone had asked about APIs, to do a read, write, put type of thing. And I just lost my question. There are a lot of questions in here about products, but just so you guys know, we don't make recommendations about products. We'll review products on DATAVERSITY, the pros and cons of that individual product, but we don't recommend one. I never do, and they always ask, but I'd love to say a generic thing. In the past there were a lot of pure-play data virtualization tools. There's at least one out there that is just virtualization. But I would say, if you're looking at vendors, a lot of the data integration vendors have bought up some of these tools, and often it's the vendors that may also offer ETL and/or MDM and/or a lot of these other solutions. I think that's a good thing, because then you have all the tools in the toolkit, whether or not each is the best of breed. You don't have to say, if I've bought into only data virtualization, I'm kind of stuck; a lot of the vendors have a lot of these tools in the toolkit, so you can easily mix and match within the same vendor, which is awesome. And Gartner does have their Magic Quadrant for data integration, and they talk a bit about this, so you might want to look that up as well, as well as the DATAVERSITY Demo Day, where you get to see them in action and not just read reports. So yes, Demo Day. Do you have any experience using data virtualization with data vault, creating a virtual layer over data vault to quickly deliver data
sets, even data marts? Yep, I think I touched on that with the warehouse. I have not, but I can't see why that wouldn't work. To me that's similar to integrating a warehouse, if vault is a sort of data warehouse style, whether it's more agile or, you know, we could go into data vault. I think there's no reason you couldn't. It's the same thing: if this data warehouse were a vault-style warehouse, I can see that in this case, but I personally have not done it. And there was a question in here, it's not a bad question, I just think it's looking more for attendees to answer: has anyone on this call implemented data virtualization, and how hard was the, quote unquote, magic, building the APIs and views for the data virtualization front end? Donna, if you have a quick answer to that, or if anybody wants to add their comments in the chat on their experience. I think that's a great question, because I was trying to ask that as well. I'd be curious on this call, given that if you look at all the industry trends it sounds like a lot of folks don't use it. I'd be curious if anyone did want to chime in and say, hey, yeah, I'm using it, it's great. We'll give people a chance to chime in there in the chat. When we talked about it: does the technology support MPP caching to further enhance performance? I think that would depend on each of the vendors, so I don't know the answer, sorry. That's quite okay. Thinking out loud: would a feasible evolution be to persist source data in an immutable data lake store with full transaction history, and enhance data virtualization layer software to provide a virtual dimensional model over the lake source? Wow, let me think that through. So, lake source. I guess it had structured data in the lake. That seems odd to me; I'm just trying to think that through. I would see more typically, and I have not done that, we would
typically, I would think, if that were the use case, almost use the lake as a landing area, and then from there transfer that if you really wanted the full dimensional warehouse. That seems like a more traditional approach that I've used. But if anyone on the call has done that... that seems strange to me, but I might not be understanding the full use case. I would have thought that the lake would have been more of a landing area, and then you would take that and conform it into more of a warehouse with your dimensions, because you'd need some sort of transformation on that. That seems like it would be a more feasible solution, but unless there's something out there I'm missing, I will let that person chime in if they want to add anything. I think we have time to slip in one more question here, Donna. Data virtualization seems like a very flexible way to introduce new and different functionality to indecisive users, or a way to reduce the full cost commitment for exploring the whims of a demanding user. What do you think? Sounds like someone has had trouble with their users. I mean, I think it is a good way to rapidly prototype before you commit. And again, I don't know if they're problematic users or the use case is problematic, right? We don't know the answer until we've had a chance to look at it. In that case this can be very valuable, before you know what's out there. In a way, I think someone mentioned, some of these cloud platforms and the lakes can do a bit of that. But data virtualization can also do that without having to move anything. It can be a nice way to see what's out there, see what it would make sense to integrate, and then if there's a use case for doing something more integrated, like a warehouse,
then that's a nice way to prototype without having to make that commitment and do a lot of work. I mean, not that there's no work, but I think that makes sense. All right, well, that is all the time we have. Thank you guys, and thank you so much for all the great questions that have come in and a great discussion. Just a reminder, I will send a follow-up email by end of day Monday for this webinar with links to the slides, the recording, and of course the new research paper. And Donna, thank you so much for this. Thanks, everybody. Hope y'all have a great day and stay safe out there. Great, thanks everyone.