My name is Gal, Gal Ziton, like Gal Gadot. I'm from Octopai. I have 20 years of experience in the business intelligence domain, starting as a BI developer, later a business analyst, a data analyst, after that a data architect, and after that a private consultant. So, lots of experience in data intelligence and as a business intelligence consultant for big enterprises, and then, six years ago, founded Octopai. Today we will not talk about Octopai; we'll talk about the places you will go with data literacy. Most of you, I guess all of you, know Dr. Seuss and one of his most famous lines, "Oh, the Places You'll Go!" The idea of this line is a new beginning, for when you want to start a new path. It's a motivational book with a motivational message that drives you to try and to explore new things, and not to wait at the waiting place. You need to go and explore, take the adventure and find new things. The idea is: don't wait for things to come to you. You need to find the way to achieve what is best for you. If you wait for someone to give it to you, you can wait and wait and wait, and sometimes it doesn't happen and then it's too late. So when someone is starting a new path, this is a good book to give them, someone who wants to find a new adventure, new challenges. How is it connected to data literacy, you are probably asking yourself. The connection to data literacy is this: if you use and understand the data, you will find great new valuable information for your organization. Don't wait for something to happen; it will not happen. You need to do analysis and get insight from your data in order to find new valuable things. There are a lot of goodies in the data. So, that's the reason this sentence is here.
By the way, if you have any question or something you want to add, be my guest. It's an open discussion. So, I read a lot on the web, and there are different approaches to how data literacy and data governance come together and what the difference between them is. From what I heard from analysts, from the web, and from what I read and understand, this is the clearest way to define the difference between data literacy and data governance. Of course there are common attributes and an overlap between them, but the main idea is that data literacy is how you use the data and data governance is how you manage the data. They are different, but again, there is an overlap. For example, finding where the data comes from, understanding the data and analyzing the data are more relevant to data literacy. Data governance is more about the policies, how to manage the policies, the processes, the workflows and, of course, regulation and security. The overlap is, of course, that if you want to manage the data, you need to understand it. You can't manage something that you don't understand. Any question about this slide? Again, you will find more concepts on the web. Some of them separate data literacy completely from data governance and also from data discovery. From my perspective and my knowledge, from what I read in the book and from talking with a lot of analysts, data discovery is part of both of them, data literacy and data governance. They all live in the data ecosystem. You know, data is everywhere, scattered across the BI landscape of the organization. And many times the VP sales or VP marketing wants to generate a report, or wants to find a report that exists in the company, and can't find it. So what does he do? You are probably familiar with that.
He goes to the BI manager and asks, well, can I find a report on revenue for quarter three or four? Then the BI manager goes to the data architect to find the report, and they check what the requirement is, what he is looking for, why he needs it and where the data comes from. And they design the report. Most of the time, if they don't find the report, they duplicate the same report again because they can't find it. And in other cases, the data analyst who created the report in the first place is no longer in the company. No one knows where the report is, it's frustrating, the process is long, and this is our day to day. It's time to take action, whether you want to implement data governance or to understand how to use the data. Data literacy is one of the things that can help you find, understand and use the data. The major challenge of today that data literacy can solve is the lack of visibility and control over the data that is scattered through the entire BI landscape. I have a slide here showing only some of the systems across the BI landscape and the data intelligence landscape. You can see the ETL: we have Informatica, DataStage and SSIS, from which lots of organizations are shifting to Azure Data Factory, the new players in the area, Matillion and Talend, and of course the traditional stored procedures from SQL Server and Oracle. Same thing with the databases and data warehouses: lots of Hadoop, lots of SQL Server, Snowflake of course, Teradata, Oracle SQL, and now we also have Azure Data Lake. And analysis services and reporting. Why are all those systems here?
Because we did a survey at Octopai, and we also used Dataversity and another firm that helped us with the analysis, and it seems that BI groups invest more than 50% of their time and effort in manually finding and understanding the metadata. Why is that? Because almost every organization uses different vendors in its BI landscape. I spoke with 100 of them, and every time I asked the basic question: do you know how many reports you have in your organization? I don't remember anyone answering this question even within 10% of the correct answer. Same thing with the ETL: do you know how many processes you have in your organization? Again, almost 100% of them didn't know the correct answer. That means we need to take action and find a solution for that. Data literacy is one of the solutions. For data literacy you need to understand the lineage, starting from the DB sources, the operational systems and the source systems, marketing, CRM, ERP, all the finance and HR systems, which flow through the ETL to the database and the data warehouse, and from there to the analysis and reporting services and the analytics. In order to understand all those data flows, the downstream and the upstream, the impact analysis and the root cause analysis, you need to implement a solution that will help you manage and use the data. As you know, today lots of organizations use more than one system, even more than two. I suppose that if I ask how many reporting systems you have, one, two, three, most of you will raise your hands to say you have two or three reporting systems in your organization. One of the reasons is self-service BI: Tableau, Power BI, Looker, the nodel, which generate more and more reports. Some of them are duplicates of reports in the original reporting system.
I can share with you that last week I spoke with one of our customers that uses Cognos and also Qlik Sense. We found that there is 20% duplication between the reports in Cognos and in Qlik Sense, not because of a migration, but because departments started to create their own reports and didn't know that those reports already existed in Cognos. And it's growing more and more. The main challenges in the data environment: first, as I explained on the previous slide, the growing amount of data in the organization. There are a lot of reasons for it. On the next slide there are some blogs and surveys showing that data will keep growing rapidly in the next few years, and it will not stop. Second is the increased pressure on the data teams for analytical reports, which I will talk about later on. Also, inefficient use of data and lack of independence in using data, meaning that data is scattered all over. Because of that, it's hard to manage and hard to define definitions for the data assets in the data intelligence landscape. Also the consumption of the data: most organizations are data driven and make decisions based on data, and it's hard to locate which asset to use, which report to use, and which ETLs you need to run in order to fill the report you want to use. And actually you can find a lot of ETLs that run and load data into a table, but those tables are orphan tables. No one consumes them. That's one of the biggest challenges today. Even if you want to do a migration from one ETL tool to another, for example from SSIS to Azure Data Factory, you want to understand what you need to migrate, and if you go deep inside to map what you need to do for the migration, you will find that 20, even 30% of the ETLs are not necessary anymore. Lots of business analysts and data analysts are duplicating assets because they are not managed.
There are a lot of redundant reports that are duplicated and already exist in the system, and there is no single source of truth for the organization. Data literacy can solve this problem. The next one is regulatory compliance. As you know, today we have GDPR, CCPA and all the PII regulations. Some assets are managed and some are not, and even among the managed ones, not all assets are sensitive, and some become sensitive only in combination. For example, first name is not sensitive, last name is not sensitive, but first name, last name and email together are sensitive. So we need to manage not one asset at a time but the combination of assets in order to define what is sensitive and what is not. This is another challenge we have today in our data environment. The last one is the loss of available knowledge and communication. This is one of the biggest challenges in the data environment. It can be because the BI teams are separated in different areas, or because the organization has not implemented proper collaboration and communication between the teams, and that makes it very hard to manage the data environment. Also, some people leave the organization and take with them the knowledge of the process design and the architecture of the processes, and, if they are business analysts or data analysts, also the knowledge of which reports exist. So this is one of the challenges: we need to share knowledge and collaborate between the teams. Without that it will be very hard to manage the data environment. This slide shows the growing amount of data. You can see that World Economic Forum, Statista and Sign Focus show us the ever-growing amount of data in organizations from year to year. It's exponential growth. Look even at Twitter. Every 24 hours 500 million tweets are published on Twitter.
That's amazing. The data just keeps growing. So this is a real challenge for our data environment. Regarding the pressure: all of you are in the same boat. Lots of pressure from the business, the business demands, the business requests. There are lots of projects right now because the organization is a data-driven organization. Migration projects: today we are in an era of lots of migrations. There is Snowflake, the data warehouse in the cloud. Lots of organizations move from the traditional SQL Server, Oracle and other databases and data warehouses to the cloud, to Snowflake. Same thing with Talend and Azure Data Factory, and Power BI in the cloud and Tableau in the cloud. All the traditional legacy systems such as Cognos, BusinessObjects and Oracle OBIEE move to the cloud. So there are lots of migrations out there. And this is another challenge, because the business wants to do the migration, and IT and business intelligence want to do the migration, but the business is not stopping. We need to continue to do business. So the data team has lots of pressure on it, and this is another challenge that data literacy can help us with. Data literacy connects all the data citizens in your ecosystem and can answer the following questions: Where should I look for my data? Where is the data? Where did the data come from? Does this data matter? What does this data represent? By the way, there are different kinds of description. There is the data dictionary description, the technical description, and there is the business description. They are not always the same. I can say that most of the time they are not the same. The data dictionary is more focused on the technical side. It can be a description of the asset that you need to manage in your IT.
But the business description is different, because it's for the business people, for the data analysts, the data architects and the data stewards who want to understand the business meaning of the asset. Also: is this data relevant and important to us? What is the tagging of the data asset? How can I use this data? Do I need to use it with another asset? What is the calculation behind this asset? Is it linked to another asset? There is also linking between assets; it's called impact analysis. So as you can see, data literacy lives in your ecosystem, and we need to put it in order, connect the dots and find the single source of truth for the data environment. What are the pitfalls on the way to effective data literacy? First of all, today we are trying to manually create inventories of scattered data assets. I can tell you that it is very hard to do manually. It takes lots of time and lots of effort, and it involves millions of assets that you need to enter one by one into your data inventory. This is something you want to avoid. The solution here is automation. Automation can very quickly load the whole inventory into the data catalog or data literacy platform, and you can also add descriptions automatically using machine learning, using AI. There are some solutions out there that you can use. Second, when you want to manage the assets in your organization, you need to choose the right platform, one that is user friendly for the business and not only for the IT people, because if it is not user friendly the business will not use it. You can manage the assets in your IT system and think that everything is managed, but if the business does not use it, you are actually managing assets for yourself, not for your organization.
So this is one of the things you need to get right when you choose the platform to manage the assets. The next one is the lack of ability to communicate in the context of the data. Okay, that is clear. You need to communicate and collaborate with all the other stakeholders in the organization, and even to talk about assets. Lots of people want to share their knowledge about some of the assets, to post to each other about an asset, and to get feedback from their colleagues in the organization. The last one is choosing a platform that doesn't include traceability through advanced multi-dimensional data lineage. If you don't know where the data comes from and all the manipulations between the source system and the reporting system, you will probably lose some of the insight you need in order to manage the data and the assets in the data literacy platform. Because a column can be transformed on the way from, for example, Salesforce to the reporting system three, four, five times, together with its name. It has the same meaning, but the name changes from one platform to another. This is something you need to manage and to trace back to the source system. Impact analysis is very important here, because when you change a column in Salesforce you want to understand which reports will be affected by the change before you go live in production. Or if your report is empty, or something in the report is wrong, you want to understand which ETLs, which processes, you need to run in order to fill the report with data in the correct way.
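The two lineage questions above (which reports are affected by a change to a source column, and which ETLs feed a broken report) can be sketched as a simple graph traversal. This is only a minimal illustration, assuming the lineage has already been extracted into a list of upstream-to-downstream edges. The asset names, the `etl.` and `report.` prefixes and the helper functions are all hypothetical and do not describe how any particular lineage product works.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream asset, downstream asset).
EDGES = [
    ("salesforce.Account.AnnualRevenue", "etl.load_accounts"),
    ("etl.load_accounts", "dw.dim_account.annual_revenue"),
    ("dw.dim_account.annual_revenue", "report.quarterly_revenue"),
    ("dw.dim_account.annual_revenue", "report.top_accounts"),
    ("erp.Invoice.Amount", "etl.load_invoices"),
    ("etl.load_invoices", "dw.fact_invoice.amount"),
    ("dw.fact_invoice.amount", "report.quarterly_revenue"),
]


def build_graph(edges):
    """Index the edges in both directions: downstream and upstream."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def reachable(start, neighbors):
    """Breadth-first search: every asset reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def impacted_reports(asset, downstream):
    """Impact analysis: reports that depend on `asset`, directly or not."""
    return sorted(n for n in reachable(asset, downstream) if n.startswith("report."))


def upstream_etls(report, upstream):
    """Root-cause view: ETL processes that feed `report`, directly or not."""
    return sorted(n for n in reachable(report, upstream) if n.startswith("etl."))


DOWNSTREAM, UPSTREAM = build_graph(EDGES)
print(impacted_reports("salesforce.Account.AnnualRevenue", DOWNSTREAM))
print(upstream_etls("report.quarterly_revenue", UPSTREAM))
```

The same traversal answers both questions; only the direction of the edges changes. In practice the hard part is the automated extraction of the edges from each ETL, database and reporting tool, which this sketch takes as given.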
The three pillars of effective data literacy. One: automatic generation and centralization of data assets. We talked about it. This is very important; automation can solve lots of the problems we have. It's very hard to manage manual processes and manual input of assets one by one into the data literacy platform. So we need to centralize everything in one place to get a single source of truth. The second one is traceability. As I described before, we need to understand how the data flows between the systems and also how each data asset is created. Some of the assets in the reporting system are calculated columns, columns that do not exist in the data warehouse or database, and not even in the source system, because they are calculated from three or four columns and exist only in the reporting system. So we need to manage those columns too. The calculated columns are very, very important. If there is a column called average amount of sales, for example, you will not find it in the data warehouse, the databases or the source system, because it's a calculation over different columns in the reporting system. When you have automation that extracts those assets from the reporting system and injects them into the data literacy platform, you can manage those assets, understand how to use them and find them, and then you get more value for the business and for the decision makers. The third one is built-in collaboration. It's very effective to get feedback from your colleagues in the organization, to post and talk about a data asset, its attributes, tagging, description and links, with the data steward who manages the asset, and about how to use the asset. Communication is very, very important in data literacy, and it's very effective because you get feedback from the other players in your organization. Then you can
understand whether the description is accurate, whether the asset is sensitive, or what rating the item gets. When it's collaborative you get a lot of value, and it's shareable with everyone else in your organization. How can you get your organization data literate? First of all, data discovery. You need to understand where the data is, what the systems are, what data is inside those systems, and which assets you want to extract from those systems. I'm not sure it's relevant to extract all the assets from the ETL, for example, from all the processes. Maybe you want to extract only the source of the ETL and the target of the ETL, and not all the manipulation inside the ETL. In the reporting systems we have the physical layer, the semantic layer and the presentation layer, and in the self-service BI tools we also have the report layer. I'm not sure you want to extract and maintain all the assets from all those layers. So you need to define with your organization what you want to manage and what you want to extract. You can extract everything, but it can create a lot of mess in the organization. So data discovery is about finding the right assets to extract. I can share that now, with Power BI for example, you can do a live connection to Tabular, and there are also descriptions in Tabular. If the organization uses those descriptions, we need to extract those descriptions and assets from Tabular as well. But if not, why do that? It would just add a lot of noise to your data literacy. So data discovery is very important. Next, data lineage, as described before: tracing any data end to end through the entire BI landscape. You can't do that with an Excel spreadsheet or with documentation, because they are not kept up to date. Today data is growing rapidly and you need automation to do that. If you try to maintain it in Excel, it will be out of date after one week, because all the time we are changing processes, we are creating new
processes, and it's very hard to do manual maintenance without automation. So data lineage is very important for data literacy. The last one is the data catalog: create company-wide consistency with a self-creating, self-updating data catalog. You need to let the business analysts, the data analysts, the data architects, all the users in the BI landscape and the data intelligence landscape, create new assets, link between them, and add tagging and descriptions. The calculation description is very important, as is who the data steward is and whether the asset is sensitive or not. But you can't choose only one of them. You can't choose data discovery and data lineage and not the data catalog. If you want to do data literacy you need to combine all of them together. And of course, as I said before, leverage automation to create a single source of truth for your data that includes all the documentation about the data. The world of maintaining everything in Excel or Word or PPT has passed. Today we have solutions to do that, and today we have automation, so this is very important in data literacy. A summary before the questions. Data is the core of business management and operations. For the decision makers it's very, very important to understand that we need to manage the data and that this is the core: the business makes decisions when it has data, so it is very important to manage the data well. Every enterprise is data driven, because data is everywhere. It can be in IoT, in the source systems, in the reporting systems, databases, data warehouses, analysis services, everywhere. If we want to optimize the ROI, the return on investment, we must invest in data literacy. As we saw at the beginning with Dr.
Seuss, if we stay in the comfort zone, in the waiting place, then nothing will happen. No one will help us manage the data if we do not take action. If we invest in data literacy, that will bring an optimized ROI for our organization. We need to take action, we need to define what we want to manage and how we want to manage it, and to find the right solution to manage the assets. The last point is very important: data literacy is not a project. It is not a one-time event. It's not something where you can say, OK, we will put 3-4 people on it for 3-4 months and then we will have data literacy. That will not work for you. You need to understand and adopt the concept that data literacy is a lifestyle. It is something that will go with you all the time. Of course there is the implementation, the first-time initialization, but later on you need to keep investing in data literacy. Otherwise it will not be correct, it will not be accurate, and then the business decision makers and the business people will not trust the data. And if they do not trust the data, they will not use your data literacy, and then all the work you have done will go to waste. So this is it. We have 9 minutes for questions, if you have any. If not, it's OK. By the way, you can come to the booth if you have more questions for me or for my colleagues at Octopai about data lineage, data discovery and data catalog. I can share with you some use cases from our customers that use Octopai to implement data literacy. Any questions or comments? OK, thank you everyone.