Welcome everyone to our next EDW session, "Data Fabrics and Enterprise Data Management: What Analyst Firms Are Peddling as the New Trend in 2021," which will be presented by Ravi Shankar, the SVP and CMO at Denodo. All audience members are muted during these sessions, so please submit your questions in the Q&A window on the right of the screen, and our speaker will respond to as many questions as possible at the end of the talk. Please note that there is a link at the bottom of the page called "EDW Conference Session Survey"; this is where you can submit session feedback, and we encourage you to do so. So let's begin our presentation now. Thank you, and welcome, Ravi.

Thank you, Jim. Good morning, good afternoon, all. So the topic today is this concept called data fabrics, and we will talk in some detail about what it is. But before I jump into the topic itself, let's talk about the reality of what happened last year. Going back to March of last year, when things started shutting down: all the assumptions we had made at the beginning of the year — the growth targets, the budget, the planned staffing increases — went out the door. The reality was that the economy shut down, revenue was going to go down, and as a result CEOs and other C-level executives had to trim the budgets that had been allocated at the beginning of the year, and in some cases staff was reduced as well. So the new reality was that we had to do more, or at least sustain operations, with less. And that called for business agility: how agile was the business? If you look back toward the end of last year, many businesses actually went out of business because they were not very agile. Their business models were structured in such a way that they could not survive the structural impact on the economy, and as a result they suffered. So business agility is very important in situations like these if you want to come out of them in good shape.

Now, quite often business agility these days is very closely tied to technology agility, because many of our businesses are already digitalized and we rely very heavily on technology to conduct our business. Back in March, I remember, as the chief marketing officer, going to my team: we were very heavily reliant on physical events, but all of those shut down and we had to switch to virtual. But what was the efficacy of our virtual events? That's the question I asked my team, and based on the answer we had to determine our marketing mix. In the same way, your executives could have asked you: give me this data, give me that data, I need to determine what I can do in the current situation. How long did it take you to deliver? I gave a presentation two days back in which I ran a poll asking exactly that: did it take hours, days, weeks, months? 77% of the people who responded said it took weeks to get the data back to management. So if the executives were asking for data in March, they were not getting it until at least April or May — too late to affect the business. That is not good. And it all comes down to the technology side: your technology stack is so monolithic, so fixed, that it is not agile enough for you to make the changes.
So that's where this notion of a data fabric comes in. Data fabric is a term that was introduced by Gartner back in 2019, when they said it was going to be one of the top 10 trends in 2020. So here we are in 2021, and it's no longer just a trend; if you look at Gartner, they now talk extensively about the notion of data fabrics. We will get into what a data fabric is and what technology components make it up, but the whole idea of the data fabric is to provide that agility we just talked about, which is not currently present in many organizations. And one of the key technologies I will touch upon within the data fabric is data virtualization, which is a core component of the data fabric and enables much of the functionality the data fabric stands for.

This slide gives you a pictorial view of what the data fabric looks like. As I showed a couple of slides earlier, if you are a pretty large company with business operations across multiple countries, it is no secret that you have multiple applications, data repositories, and so on, ranging from databases and data warehouses, which hold more structured data, to data lakes, which contain unstructured data, and nowadays on to the cloud and even your local applications such as Word and Excel. So there's a plethora of applications spread across multiple locations, all in different formats, structured or unstructured, and at various levels of latency: either at rest, or in motion if it is streaming data, IoT data, and so on. At the end of the day, business users need an integrated, cohesive view of the information. I would like to know which customers have bought which products. If I'm in the pharmaceutical industry, I want to understand which compounds make up which products. If I'm in the insurance industry, I want to understand which customers have filed which claims. We all need this integrated view of data that is distributed across multiple systems, and it doesn't matter to us where the data actually resides or what format it is in.

Forrester was another analyst firm that started talking about this as well; in fact, they came up with a Wave for it and named Denodo a leader in that Wave, published just last year. So this notion of a data fabric is not anything new, but everyone has started thinking about it from the same perspective. If you look at Gartner's definition, they talk about it as an architecture pattern. So it is not a product; a data fabric is an architecture pattern that comprises many different products. We will take a look at that, and I mentioned data virtualization as one of the key ones. Its purpose is to automate the design, integration, and deployment of data objects, regardless of deployment platforms and architectural approaches, and it uses artificial intelligence and machine learning to provide actionable insights and recommendations on data management and integration. And why do we want this? Basically, to provide faster results, in many cases in an automated fashion. So that's how Gartner defines it.
At the end of the day, the data fabric is a data integration and delivery platform. It integrates the data across all these disparate sources and provides it to the consumers, and it does so in an automated fashion using AI/ML. Forrester's definition is similar, a little briefer: dynamically orchestrating disparate data sources intelligently and securely in a self-service manner, leveraging various data platforms to deliver integrated and trusted data to support various applications and use cases. A similar definition; again, the key terms are data integration and delivery, and here they use the word "intelligently," meaning using artificial intelligence to automate much of the process.

So why do we need the data fabric? The reality is this: you have your data spread across multiple systems, and when a business user comes and says, "I want to understand the relationship between customers and products," you immediately bring that data into yet another repository — one set from the CRM, another from, say, a PLM system — and then you combine it and deliver it. And how often are you going to do this? How often are you going to replicate? There are so many repositories where you start replicating the data, and very quickly these copies get out of sync with the original sources, because those systems keep accumulating data as the business runs. So this is not a sustainable strategy. And many of these connections are very brittle, because you're migrating systems, maybe from on-premises to the cloud; some tables are getting deleted, new ones are getting created. There is a lot of motion that actually happens, and it's very difficult to keep replicating across multiple systems.

In fact, I talk about this as a conflict between centralizing and decentralizing data. Back in the 80s — those of us who were doing computing at that time will know — IBM and Oracle started the database wave. That became so popular that there was no longer a single source of the data, and because the databases multiplied, we had to come up with the concept of the data warehouse for analytics, and the ODS, the operational data store, for operational purposes. That became the unified view of the information until the 2000s, when unstructured data started coming in, in the form of social data, IoT data, and so on, which could no longer be put into structured repositories. Then we needed a system like Hadoop — a big data system, a data lake — where we could dump both our structured and unstructured data. But that failed to provide a unified view across the entire enterprise, and it never became the single repository. So now we have multiple repositories: data warehouses, data lakes, databases, and cloud systems. And given where data generation is heading, this is not going to work. As you can see, there is a constant struggle between centralizing, which for business purposes makes it very easy to find information, and decentralizing, because the amount of data that gets generated and the new sources that come into place far outweigh our ability to physically collect them into one place.
So that's why Gartner says to stop collecting the information, or replicating it into additional repositories, and start connecting to the data wherever it is — bringing the data from the sources rather than trying to move it all into one central repository.

So, a very quick look at what the data fabric architecture is about. You can see it as three layers: you have the data sources at the bottom, the data fabric in the middle, and the data consumers at the top. The data fabric integrates the data from the underlying sources and delivers that integrated view back to the consumers, and in doing so it has the data integration and orchestration layer, the insights, active metadata, semantics, and the data catalog. What I'm showing is the first version of the data fabric architecture, although the fuller version is more detailed and has five different layers. There's data ingestion, which brings together the data across multiple sources, whether cloud or on-premises, and then replicates it into a persistence layer — Hadoop, a data lake, or a similar system. From there it applies transformations, cleansing, and preparation for delivery, which is the last part, where the BI tools and other consumers take the data out. And you see AI/ML alongside each of the sections; that is there to automate every step of the way. This is called a data fabric, and a data fabric can be physical or logical. This one is physical, because all the data is actually persisted into a repository like a data lake. Now, if you remove the persistence, it becomes a logical data fabric, which is more advantageous than a physical data fabric, and I'm going to talk more about that in the subsequent slides.

I mentioned at the beginning that the core premise of the data fabric is that you integrate the data and you deliver the data, but in doing so, multiple different technologies come into play to help achieve that. If you look at the Forrester stack with its five layers, it's not a single technology that delivers all five of them; different technologies come into each of the areas — data integration, data preparation, and so on. From a data integration perspective, data virtualization is a more modern technique, or style, of data integration. It is much faster, and it overcomes the replication problem, because it is a virtual, or logical, way of integrating the data. So let's look at that in a little more detail.

Data virtualization consists of, I would say, six different capabilities. Key among them is this concept of data abstraction. In the diagram, you have the bottom layer, which is all the data sources, and these data sources are on-premises, in the cloud, wherever they are. Then you have the business users at the top, consuming the data from all these different sources. And data virtualization is the mezzanine — the in-between layer — that sits between the business users and the underlying sources.
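To make that abstraction idea concrete, here is a minimal sketch in Python — an illustration of the principle, not of any vendor's implementation. The class name `VirtualLayer`, the view name, and the in-memory "sources" are all hypothetical; a real platform such as Denodo exposes this through SQL views rather than Python objects, but the shape is the same: consumers query a logical view, and the layer resolves it against the sources at request time.

```python
# Minimal sketch of data abstraction: consumers query logical views,
# never the physical sources. All names here are illustrative.
from typing import Callable, Dict, List


class VirtualLayer:
    """Mezzanine layer between consumers and sources."""

    def __init__(self) -> None:
        # Maps a logical view name to a function that fetches rows on demand.
        self._views: Dict[str, Callable[[], List[dict]]] = {}

    def register_view(self, name: str, fetch: Callable[[], List[dict]]) -> None:
        self._views[name] = fetch

    def query(self, view_name: str) -> List[dict]:
        # The consumer neither knows nor cares where the data lives.
        return self._views[view_name]()


# Two "sources" in different shapes/locations: a CRM table and a REST-style API.
crm_rows = [{"customer_id": 1, "name": "Acme"}]
api_orders = lambda: [{"customer_id": 1, "product": "Widget"}]

layer = VirtualLayer()
layer.register_view(
    "customer_products",
    # Join across both sources at query time -- nothing is copied or stored here.
    lambda: [
        {**c, **o}
        for c in crm_rows
        for o in api_orders()
        if c["customer_id"] == o["customer_id"]
    ],
)

print(layer.query("customer_products"))
# [{'customer_id': 1, 'name': 'Acme', 'product': 'Widget'}]
```

The point of the sketch is the indirection: if the CRM moves to the cloud tomorrow, only the registered fetch function changes, and every consumer of `customer_products` keeps working.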
Now, I'm a business user — the chief marketing officer. I really don't care where the data comes from. But my first criterion is that I want the data quickly: when I ask for it in March, I want it in a day or two, so I can make the decisions I need to make. Second, I need trusted data, because I'm making decisions on it, so the data had better be accurate, without data quality problems. And the third thing is that I want the data as fresh as possible. I don't want the data landed somewhere with a sync lag of a day or a week; I want to know what is happening right now, because my business situation is changing very dynamically and I need the freshest data. So the data virtualization layer becomes this logical layer where I can go and ask for all the information in one place: give me the current data, give it to me with the highest quality, and in the format I actually need. And it will figure out where the data sits in the sources underneath — cloud or on-premises, structured or unstructured, at rest or in motion, it doesn't matter. It integrates the data and gives me the data the way I want it. That's a very powerful concept, because there is no other technology in the IT landscape that provides that level of disintermediation, through which IT can function on its own — you can go about modernizing your platforms the way you want — while the business users continue to get the data they want without disruption. More often than not, when you migrate a system from on-premises to the cloud, there is a break in business continuity: it's a new application, new technologies, sometimes capabilities that were in the old system are not in the new one, and I need to be trained to use the new system. A lot of challenges come with that. But if I can just go to a familiar place and get all the data I need, then I don't need to worry about what IT does. So it provides me self-service; it provides me the ability to continue my operations.

The second capability is the data integration aspect, which is data integration without replication. There are a lot of other data integration styles: ETL, the dominant one that you know of; change data capture, which you will have heard about; and nowadays there is streaming data integration. So there are multiple ways of integrating data, but in all those cases, replicating the data costs you in three ways. One is the time it takes to replicate the data and the resources needed to do it — and, as we discussed, the copy gets out of sync. The second is storage: you're going to store the data somewhere, so you need to find a place to put it, and that is costly. And the third is easy access to the data in real time. When the blue layer at the top — the business users — requests the data, data virtualization passes the query down to the sources and brings back the data fresh. It all happens in real time, because the data is not stored anywhere in between; it has to hit the sources to bring back the data. So the data is as fresh as it was created at the sources, and you can make your real-time decisions right now rather than waiting for the data to be refreshed, as you would with a physical repository.
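Here is a toy contrast to make that freshness point concrete. The in-memory `source` list and `etl_copy` are stand-ins for an operational system and a batch-loaded replica; this illustrates the trade-off, not any product's behavior.

```python
# Replication (ETL copy) vs. virtual access: the copy is frozen at load time,
# while a delegated query is as fresh as the source. Names are illustrative.
import copy

source = [{"order": 1, "amount": 100}]       # the operational system

etl_copy = copy.deepcopy(source)             # replicated at batch-load time

source.append({"order": 2, "amount": 250})   # business keeps running

def virtual_query() -> int:
    # Virtualization delegates to the source at request time: no stored copy.
    return sum(row["amount"] for row in source)

print(sum(r["amount"] for r in etl_copy))  # 100 -- stale until the next batch load
print(virtual_query())                     # 350 -- as fresh as the source itself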
And then, being the enterprise data layer, it has a data catalog where you can keep all your business definitions, and it has metadata — metadata being, basically, information about the data. The data itself stays in the sources, and the data virtualization layer holds the information about it: which data is in which source, how do I go get it, how do I combine it. It contains all of these rules, which is essentially the metadata. And it's a single place where you can lock down security, so the data is very secure.

Delivery is where, as a marketing user, I might want my data presented differently than a finance user would. The finance team looks at a customer as a paying entity, but I view my customer as a buying entity, and different attributes come into play for each of those views. So I want the data delivered my way, not the finance way, and finance wants it their way, not the marketing way. The layer can deliver it both ways. So that's where the power of data virtualization comes in, and it is a very core part of the logical data fabric.
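As an illustration of that per-consumer delivery, here is a small sketch. The field names and roles are hypothetical, and a real platform would define these as secured SQL views rather than Python functions, but it shows the idea: one logical customer object, projected differently for each team, with no physical copy made.

```python
# Role-specific "delivery" views projected from one logical customer object.
# Fields and roles below are illustrative only.
CUSTOMERS = [
    {"id": 7, "name": "Acme", "balance_due": 1200.0,
     "last_purchase": "Widget", "segment": "enterprise"},
]

MART_COLUMNS = {
    "finance":   ["id", "name", "balance_due"],               # paying entity
    "marketing": ["id", "name", "last_purchase", "segment"],  # buying entity
}

def delivered_view(role: str):
    cols = MART_COLUMNS[role]
    # Same underlying rows, delivered per consumer; nothing is replicated.
    return [{c: row[c] for c in cols} for row in CUSTOMERS]

print(delivered_view("finance"))
print(delivered_view("marketing"))
```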
For those of you who are not very aware of data virtualization: it's a technology that has been around for many years, and if you look at the Gartner hype cycle, it sits all the way to the right, which means it's a very stable technology. The data fabric, on the other hand, as I mentioned, only started coming up last year, so it's very much at the peak of the hype cycle. But with data virtualization providing the support, it rests on a much more stable foundation. And here is another chart, from Forrester, to substantiate that: when they asked which technologies organizations are using or planning to use in the next 12 months, data virtualization comes out near the top, right next to the database. So even if you're not familiar with it, it's a well-established technology.

I presented only the conceptual picture in the earlier three-layer diagram, but there is a technical side to it. Here you can see the multiple sources; in this case, the Denodo data virtualization platform has 150-plus adapters to connect to them, whether structured or unstructured, and it can also reach your existing unstructured sources, or even local files such as Excel and Word, to pull the data. It uses multiple methods to extract the data — a SQL query, APIs — to bring the data back in. So that's the connect part. The combine part is where it actually integrates the information, and it uses the middle layer to do much of that work.

One of the key aspects: some tools bypass a virtualization layer entirely. If you're using Tableau or Power BI directly, you're connecting straight to the sources and lifting the data. Quite often you ask a question such as: I want to know my most profitable products over the last five years. Your data warehouse could have millions of rows; a data lake could have billions. The data warehouse is probably where you keep your current data — the last year's worth — and historical data you might dump in the data lake, because storage there is cheap. So if I'm asking for the last five years, I need to go to the data warehouse for my current data and to the data lake for the previous four years, and combine the two.

With many reporting tools, that combination happens up in the blue layer, which means they have to lift all the data. Now, if there are two things technology has done very well, they are compute and storage: our iPhone today is much more powerful than the IBM XTs we used back in the 80s, and storage has become much cheaper. The one thing that has not progressed as much is bandwidth. Transporting the data is where the problem is, and if you're lifting millions or billions of rows, you'll be sitting there staring at the chart for a minute, or two, or even ten, waiting for it to refresh. What data virtualization does is use dynamic query optimization to run the queries at the sources and bring back just the results. If you're asking for a cup of water, I don't need to bring you the entire bottle; I can just fetch you the cup, which is what data virtualization does. It optimizes the queries, lifts just the results, and gives you the aggregated information, and there are many optimization techniques it uses to do that. It also provides caching: if you're running the same query again and again and getting the same data sets, there's no need to go to the sources every time; it has a cache where it can store the results and deliver them from there. So that is a capability it has.
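Here is a minimal sketch of those two ideas together — the aggregation pushed down to each source, and the merged result cached — using hypothetical in-memory lists standing in for the warehouse and the lake. A real optimizer rewrites SQL and is far more sophisticated; this only shows why so little data has to cross the network.

```python
# Sketch of aggregation pushdown plus result caching. The "sources" are
# in-memory stand-ins for a warehouse (current year) and a lake (history).
import functools

WAREHOUSE = [{"year": 2021, "product": p, "profit": i % 5}
             for i, p in enumerate(["A", "B"] * 10_000)]
LAKE = [{"year": 2017 + i % 4, "product": p, "profit": i % 7}
        for i, p in enumerate(["A", "B"] * 100_000)]

def pushed_down_aggregate(rows, year_from, year_to):
    # In a real system this GROUP BY runs inside the source database, so only
    # a handful of aggregated rows -- not millions of raw rows -- move over
    # the network.
    totals = {}
    for r in rows:
        if year_from <= r["year"] <= year_to:
            totals[r["product"]] = totals.get(r["product"], 0) + r["profit"]
    return totals

@functools.lru_cache(maxsize=32)
def profit_by_product(year_from: int, year_to: int):
    # Route each slice of the query to the system that holds it,
    # then merge the small aggregated results.
    merged = pushed_down_aggregate(LAKE, year_from, min(year_to, 2020))
    for k, v in pushed_down_aggregate(WAREHOUSE, max(year_from, 2021), year_to).items():
        merged[k] = merged.get(k, 0) + v
    return tuple(sorted(merged.items()))

print(profit_by_product(2017, 2021))  # computed once; both sources are hit
print(profit_by_product(2017, 2021))  # identical call served from the cache
```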
And it has many other things. It has AI/ML, which was the requirement for data fabrics, to automate many of these data integration and delivery functions and provide recommendations. It has governance, because it has a data catalog in it, and it has security, because it is the central place where access can be governed and secured. From a consumption perspective, we provide a user interface for business users to search and do data discovery using the data catalog. Alternatively, you can use your current BI tools, and your reports and charts will all be much faster. Or you can use your own portals, or any other application, through API access, because the data virtualization layer ultimately holds all the enterprise data objects; we can expose them as APIs, and you can call them to get the data you want.

One example I'll quickly cite is Seacoast Bank, a regional bank based in Florida. They were acquiring multiple other banks — mortgage banks, wealth management banks, and so on — and that limited their ability to see a universal view of the customer across all of them, which they needed in order to cross-sell and upsell. If you're a retail bank, you want to cross-sell mortgages; if you're a mortgage bank, you want to cross-sell checking and savings. But they could not get that view, because FIS, Financial Information Systems, was a third-party application they used, and they couldn't make the changes needed to see the integrated picture. So they wanted to bring the data back in-house into their enterprise. Their decision point was whether to go with a physical approach or a logical approach, and they went with the logical approach, using data virtualization; we will see how they benefited from it. They set up this logical layer and created different virtual marts, which are views of the data: one for finance, one for marketing, one for credit verification, one for risk aggregation. They set all of those up and consumed the data using SAS, their primary BI analytics application, along with Tableau for reporting.

By doing so, they were able to cut their time to production by 50%. Had they gone with a physical data warehouse, it would have taken them about eight months; with data virtualization, they cut it down to less than four months. And where it used to take two to three days to deliver a customized report when business users asked — which we do very often — they can now get it done in two to three hours. Understand the amplification there: roughly ten times faster, and you can do it yourself rather than going to IT. Those were the benefits they gained from the logical data fabric. And they are not the only firm; there are many firms that have used the logical data fabric, with data virtualization at the core, and benefited from it.

So let me stop there. There are some questions coming in, so let me take a few here. And please make sure you visit our virtual booth; if you have any questions, my team will be there to answer them.

Here's a question: "Glad to hear data virtualization has been around for years. I wondered why it wasn't fully embraced across the industry before now." Fair enough — compared to the other styles of data integration, data virtualization has not been that prevalent, and the reason is that many companies have been stuck in legacy ways of doing things. Which company doesn't have an ETL tool? ETL, to me, is legacy data integration. Any time somebody wants an integrated view for a dashboard, the first thing they think is: can you integrate the data from systems A, B, and C using some ETL script and deliver me a report from that? They think about it that way. But look at my customers who are already using data virtualization. For example, Spectrum Health is a healthcare provider in the Midwest, and when they had to create a COVID dashboard, they used data virtualization, and within one week they were able to stand up a dashboard showing which providers were healthy, what the supplies were, all of those things. People are not thinking that way yet. Those who are used to data virtualization are attuned to using it, but most people stay structured in the ways they already know and go with that. Change your mindset. Explore data virtualization and the logical data fabric; you will benefit from it.

Another question: how does the logical data fabric integrate with the logical data warehouse? And what about the lakehouse? Good question. The logical data fabric, as I mentioned, is an architecture; multiple technologies come into it. The logical data warehouse is very much a core part of it. The logical aspect is brought about by data virtualization, whether it's a logical data warehouse or a logical data fabric.
The logical data warehouse, all it says is that it integrates a physical data warehouse with other repositories, like a data lake and other systems. A logical data fabric is no different. The only difference, from an analyst's perspective, is that the logical data fabric adds artificial intelligence and machine learning to automate much of what would happen manually with a logical data warehouse. That's about the only difference. Now, the lakehouse is a new concept many people are throwing around — combining the data lake and the data warehouse. Go back and read the Gartner reports: five years back, when Hadoop came along, people were writing off the data warehouse, thinking they could run analytics on data lakes. Did that happen? No, it did not. Gartner has a very good two-by-two chart about known data and unknown data. If you know your data and you know your questions, you use a data warehouse — for example, what are my most profitable products over the last five years? If you don't even know the questions you're going to ask, and you don't know what data is needed to answer them, that's your data lake: you dump all the data in there, let loose your data scientists, and they figure out the questions and then the answers. So those are two separate technologies. I still haven't bought into this concept of a lakehouse combining the two. The data warehouse has been around for years, and it's not dying any time soon. The data lake was at the peak of the hype cycle, and it all crashed — we saw what happened to Cloudera, Hortonworks, and MapR. Now there's a resurgence that I see, though it remains to be seen where it lands: maybe a cloud version of the data warehouse and a cloud version of the data lake. But the lakehouse — I don't see that happening any time soon, at least.

So that's our 30 minutes for the session, and I think we are done. Sorry we weren't able to get to all of the audience's questions, but they are available in the Q&A, and Ravi, if you have an opportunity to go back into the session and check them out, maybe you can answer some of them if you'd like. Thank you so much for your presentation, and thank you to our attendees for tuning in. Please remember to complete your conference session survey on the page for this session. The next sessions will start in about 10 minutes. Thanks, everyone. Thank you all.