Hello and welcome. My name is Shannon Kempe and I'm the Chief Digital Manager of DATAVERSITY. We'd like to thank you for joining this DATAVERSITY webinar, The Missing Link in Enterprise Data Governance: Automated Metadata Management, sponsored today by Octopai. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DATAVERSITY. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and additional information requested throughout the webinar. Now let me introduce our speakers for today. Donna Burbank will be joining us as the analyst for this webinar, along with Amnon Drori, the CEO and co-founder of Octopai. Donna is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She is currently the Managing Director of Global Data Strategy Ltd., where she assists organizations around the globe in driving value from their data. Amnon has over 20 years of leadership experience in technology companies. Before co-founding Octopai, he led sales efforts at companies like Panaya, Zend Technologies, Modus Nova, and Alvarion, and also served as the Chief Revenue Officer at Cooladata, a big data behavioral analytics platform. Amnon studied management and computer science at the Open University in Tel Aviv. And with that, I will give the floor to Donna to get today's webinar started.

Hello and welcome. Thanks, Shannon. Always good to join you and talk about metadata, a topic near and dear to my heart, as folks know. So today we're going to be talking particularly about automated metadata management, what might have changed in the industry with some of the tools like Octopai since you may have last looked, and then particularly that link with enterprise data governance, because governance and metadata are closely linked, and the more you can really visualize that data journey, especially with things like business intelligence, the more aligned your data governance is. So just quickly, to go through the agenda: as I mentioned, that link between data governance and metadata; a little bit on the business need, although I'm sure this isn't a surprise to a lot of the folks on the call, since whenever we do a metadata topic at DATAVERSITY it's always well attended, because metadata is driving so many business initiatives, but we're going to delve a little deeper into why that is; and then some new strategies and approaches for how you support this ever-evolving data landscape, because not only are business needs changing, but the technology environment is changing day by day. And then we'll pass it over to Amnon to give us a little insight into how Octopai can help. So just some definitions, because we're data people, and as metadata people even more so, definitions are a key part of metadata. What are we talking about when we're talking about data governance? Both of these definitions are from the DAMA DMBOK, the Data Management Body of Knowledge. If you're not familiar with that, it's a great resource.
And data governance, as you most likely know, is that exercise of authority and control over the management of data assets, right? On past DATAVERSITY webinars, you've heard us talk about the need for aligning both business and technical metadata; today we'll particularly be talking about the technical metadata. If you look at the definitions for metadata, it's not only about business processes, but also those data rules and constraints, and how you link both your logical and physical data structures. Some of that requires human involvement, and some of it lends itself to automation, which is what we'll be talking about today. So I'm a big fan of trying to make the word metadata a little more approachable, and there's one phrase I'll use once and then never again: I hate the term "data about data," because it does not convey much information; it just makes things more complicated. But metadata is one of those things where, when you see it, you know what it is. It's really that who, what, where, why, when, and how of data, or simply data in context. So if we think about that who, what, where, why, how, particularly today, the items in bold are some of the areas we'll be talking about. Where is this data stored? Where did it come from? What's the lineage of this information? How is it used and shared? Particularly when you're thinking of governance and trying to understand the path, the journey of data, how data was calculated on a report, or something like audit trails: any of you who have tried to do that by hand, and most of us who have been in data for a while have probably suffered through it, know it can be a painful process. So there's metadata embedded in tools, and the more you can extract that in an automated way, the easier all of our lives become. So there are a lot of pieces of metadata; just for clarification and structure, we'll mostly be speaking about the technical kind: the where, who's creating it, how it's stored, and how it's formatted across different platforms. So this graphic is one I use a lot, because I think it sets a lot of the technologies we talk about in a bit of context. At the top is always the business strategy, right? What are we trying to do as a business? Are we trying to be more efficient? Are we trying to align with audit regulations? Are we trying to be more successful in the market? All of the above, right? And so it's important to align anything you're doing with data with that strategy, and that's really where I see data governance. Part of data governance is the policies and the regulation, but a lot of it is about the collaboration and the people, and the better view we get of data, and the way we can act more quickly on data, that's a business benefit. And I think we always want to think of that when we're doing things like metadata. So the top is that top-down approach, the why we're doing it, and the bottom layer, if you look at the very bottom, is the bottom-up, the how. And again, this is exploding day by day in the industry: whether it's databases or CRM or ERP or unstructured or big data, all of the above, we really need to get that managed. Maybe 30 years ago you could do it by hand, though I would doubt it even then. But especially as volumes and types of data are exploding, that need to automate is really where things like automated metadata management come in.
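To make that "data in context" idea concrete, here is a minimal sketch of what a single metadata record capturing the who, what, where, when, and how of a data asset might look like. All field names and values are invented for illustration; no particular tool's schema is implied.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class MetadataRecord:
    """One 'data in context' entry: the who/what/where/when/how of a data asset."""
    name: str                      # what: the asset (a table, column, or report)
    definition: str                # what it means in business terms
    owner: str                     # who is accountable for it
    location: str                  # where it lives (system.schema.object)
    data_type: str                 # how it is formatted
    sources: List[str] = field(default_factory=list)  # where it came from (lineage)
    last_refreshed: Optional[datetime] = None         # when it was last loaded

total_sales = MetadataRecord(
    name="total_sales",
    definition="Sum of invoiced order amounts, net of returns",
    owner="Finance / Sales Operations",
    location="EDW.sales_mart.fct_sales",
    data_type="DECIMAL(18,2)",
    sources=["CRM.orders", "ERP.invoices"],
    last_refreshed=datetime(2018, 6, 1, 4, 0),
)
print(f"{total_sales.name}: {total_sales.definition} "
      f"(from {', '.join(total_sales.sources)})")
```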
So there are a lot of use cases for metadata, but if we just pick some of the common ones, like BI and data warehousing, there's that very common case of: I'm just trying to understand, in a simple way, how this data was calculated. In some cases that's simple, but if you've been in the industry, you know there's a lot of transformation that occurs. So a big part of this is automating that and finding the linkage from the different sources of data at the bottom, through how it got there, for things like data warehousing and BI, and governing it in a way that both business and IT resources can easily see. Some of you may have heard us chatting before the call, and I'm always a fan of saying that metadata is hotter than ever. I have been doing metadata for longer than I'd like to admit, you know, over 20 years, and I'm always pleased, I should stop being pleased and shocked, that it continues to grow in popularity. So DATAVERSITY and I had done a research paper, I guess it's a little over a year old now, on emerging trends in metadata. And one of the survey responses there was that over 80% of respondents said metadata was at least as important as, if not more important than, it was in the past. Some of the reasons we've already mentioned, in terms of the growth and explosion of data. But we did ask in more detail why this growth, and I think it'll be no surprise to the people on the call, but it ties nicely into the topic today: the number one use case was data governance. The graph on the right is from a more recent report we just did on trends in data architecture overall, and so we were able to show, between the 2016 and the 2017 reports, that data governance is still hot. And the other things that are aligned with data governance are things like data quality. Data quality and data governance go hand in hand: you need better governance to get better data quality. And then the use cases we mentioned, the data warehouse and business intelligence. That's a driver for a lot of metadata management, partly because it gives us traceability into your data. You're looking at a report; the term might be wrong, the figure may be wrong. Why is it wrong? How do we really get to the root of that, or how was it calculated? Or we have two different definitions of total sales; how did we arrive at that? Master data management is kind of the sister to data warehousing: if you're going to feed your conformed dimensions of things like customer and product, you need solid MDM. And again, MDM is not done in a vacuum; there's metadata and lineage behind how we achieve that golden record. So all of these are really driven by metadata. More on the business side, there are things like regulation and audit, everyone's favorite GDPR, which everybody has now completely complied with, of course, because the deadline is here. If not, you may need some help with things like metadata. But it's not just GDPR; even internal audits ask: do we understand the lineage of our data? And then master data again. Those are some areas where the need for metadata is growing. So just to put this in perspective, this is probably something we've all run into, either the left or the right of this picture. Again, this rise of the data-driven organization is only growing. More companies are realizing that to be a leader in the market, you need better analytics, you need better visibility into how you're performing, and BI and analytics is a big part of that.
So you might be the executive on the left, and you're looking at the report and saying, well, those total sales figures just don't seem right. A business person, if they're good at their job, has a gut feel for what they expect the numbers to be, because they're looking at them closely. And it may not be right. So: this doesn't seem right, could you tell me how that was calculated, and could you get it to me by this afternoon's meeting? That does not seem like a crazy request, unless you're the person on the right, who realizes how much work it often takes to really delve into how those calculations were done and what the lineage was in that report. So of course she says sure, because she wants to keep her job, but meanwhile you can tell by the look on her face that this is going to be an arduous task. Because that person on the right has probably done a nice job creating some beautiful visualizations and making it look clean and prepped, and there's a lot of hard work that goes into that. So, as an analogy, I'd never heard the term Rube Goldberg until I got into data, actually. It sort of describes our daily life, right? You see the nice clean report in the upper right, and it came from numerous systems on the lower left, and that's obviously a simplified version; it's probably more. And then there's that sausage machine in between that makes the report. Of course, we all have best practices and we try to make it as clean as we can. A colleague of mine, Karen Lopez, whom you might have heard on DATAVERSITY as well, has a favorite saying: if you want to make the data simple, make the world simple and get back to me. So it's not that we're necessarily messy; it's that the business rules and the business are messy. There are systems, there are different regions, there are different functions within those regions. So even with the best-designed data architecture, it's still complicated. And that's what the well-intentioned business person on the upper left here probably does not understand, and it's not their job to understand, but it is yours if you're an IT person. So how do you go from this Rube Goldberg diagram, where the football hits the net and then sets off the watering can that generates your report, and that's often how I feel building these, to something more streamlined, more textbook? How do we get that clean, automated traceability, so that I know total sales was X by quarter, and it came from a data warehouse with a dimensional model that was sourced from a staging area that came from six or seven or eight different source tools, and transformations were done along the way? The good news is that a lot of these systems do have embedded metadata within them. And it's not a bad thing; often that human view is helpful. You may have a spreadsheet that does those sorts of mappings: I know that target A came from source B with a certain transformation. But often we don't have time for that; we're trying to get the report out. So a lot of the systems themselves have embedded metadata that automated metadata tools can capture and discover. It's tool-to-tool. It's doing things that a computer can do better than a human. You have enough on your plate trying to build the report, doing the visualizations, being that data scientist; that's what humans are good at. Let the computers do what computers are good at and really do that automated lineage.
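That "systems have embedded metadata" point is easy to demonstrate. The sketch below reads a database's own catalog instead of asking a human to document it, using SQLite's built-in catalog as a convenient stand-in; a real scanner would do the equivalent against something like Oracle's ALL_TAB_COLUMNS or SQL Server's sys.columns. The tables are invented for the example.

```python
import sqlite3

# A throwaway in-memory database standing in for a source system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, cust_name TEXT, ssn TEXT);
    CREATE TABLE orders (order_id INTEGER, cust_no INTEGER, total REAL);
""")

# The "scanner": harvest table and column structure from the engine's
# own catalog rather than from hand-written documentation.
inventory = {}
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
for (table,) in tables.fetchall():
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk)
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    inventory[table] = [(name, ctype) for _, name, ctype, *_ in cols]

for table, columns in inventory.items():
    print(table)
    for name, ctype in columns:
        print(f"    {name:10} {ctype}")
```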
And I see with a lot of my customers that this works at several levels. One is: I just need that high-level lineage. Tell me even what systems I'm hitting. Is it coming from six databases? Seven? Is there a staging area that feeds a warehouse? Are we hitting the source operationally? All of that is valuable; even that very basic lineage helps. But especially if you're a developer, an ETL developer or data warehouse developer, the devil is in the details. So you also need that field-to-field mapping within each system. Is it ASSOC_ID in one system and CUST_NO and CUST_NUM in others? Who hasn't done this, right? And it's spaghetti. So again, this is where a lot of the tools can do an automated mapping. Say you're using an ETL tool: you've already done that mapping in the tool, and the metadata can be inferred from it by another automated tool. So that's almost your standard, textbook data warehouse, which I am pleased to say, despite the rumors of its demise being greatly exaggerated, as the saying goes, I still see on the rise along with data analytics. Yes, there's big data; yes, there's machine learning and artificial intelligence. That doesn't mean the core analytics and reporting go away; it means you can augment them. So I've seen less of the, what do you call it, fear, uncertainty, and doubt of "no one does data warehousing anymore." I see just the opposite; I think it's growing, and it's growing because of the growth in these other technologies I mentioned. So that is your classic BI and data warehousing. But that's not the only thing we're doing, and we're trying to govern that whole chain. So again, take the well-intentioned business person on the left, who just by nature is so positive: we're going to change the world, and hey, we have this big marketing launch next week, and oh, by the way, we just changed the product name. It's awesome, it's really cool, you'll love it. Just make sure all the systems and reports show that new name. Thanks, get right on it. And of course, the same sort of reaction on the right: sure, not a big deal at all. But if you've ever tried to do that, something as simple as changing the length of a field from six characters to seven can have a massive impact. So it's about being able to proactively see that change. Maybe I'm changing a field from 10 characters to 30, from "cool name" to "very cool name," right? You can see in an automated way where that field is used, whether that's XML feeds or source systems or that classic data warehouse, and the reports I may have built on it. What's going to be affected? And we can all say, yes, we're much more organized than that and we don't have to worry. But I was working just a few months ago, name withheld to protect the innocent, with a massive, major retailer in the U.S., and they literally had this problem: a product code was changed from, whatever it was, 10 characters to 20, and nobody looked at the impact. They brought down a system and sales were affected. So these are those boring, banal things that folks on the front end don't think of, and we in IT often wish we didn't have to think of, but they're critical, and they do affect change. Being able to see in an automated way, if I change X, what will be affected, makes you not only faster and more responsive, but reduces that risk.
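Both of the views described here, "how was this report calculated?" and "what will this change affect?", have the same structure underneath: lineage as a directed graph, walked in opposite directions. A toy sketch follows, with systems and edges invented for illustration; in practice, the edge map is what the discovery tools described later build automatically, and the traversal itself is the easy part.

```python
from collections import deque

# Directed edges: each source feeds the targets listed (all names invented).
flows = {
    "crm.orders":       ["staging.orders"],
    "erp.invoices":     ["staging.invoices"],
    "staging.orders":   ["dw.fct_sales"],
    "staging.invoices": ["dw.fct_sales"],
    "dw.fct_sales":     ["bi.total_sales_report", "bi.exec_dashboard"],
}

def walk(start, edges):
    """Breadth-first walk over a directed edge map; returns all reachable nodes."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(edges):
    """Flip every edge so the same walk runs upstream instead of downstream."""
    flipped = {}
    for src, targets in edges.items():
        for tgt in targets:
            flipped.setdefault(tgt, []).append(src)
    return flipped

# "How was this report calculated?" -> walk upstream from the report.
print(walk("bi.total_sales_report", reverse(flows)))
# "If I widen this field, what is affected?" -> walk downstream from the source.
print(walk("crm.orders", flows))
```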
So a big part of not only driving business change but reducing risk is governance, as I mentioned. And there are a lot of pieces to governance; you could have a whole webinar on governance, right? This is the framework I often use, and it just highlights that there are a lot of pieces: what we're trying to do as a business, what data issues are involved with that. A lot of people think of governance as the people, process, and workflow side of things. And it's sort of like that classic story of the blind men and the elephant. One is feeling the tail and says, the tail is what I feel, it's just like a rope; an elephant is just like a rope. Another is feeling the leg and says, I think an elephant is like a tree, because the leg is like a tree. And they're all right, in part. So data governance is about people and process, but it's also about the tools and technology, and that idea of: what are the systems I'm using, and what is the linkage between them? And that is where automation is critical. Now, I wouldn't want someone to automate the organization and people part of governance. There is that human eye, but that human eye can work a lot better when you're automating the tough parts, like the mappings and the lineage. That, I think, is the nice fit between tools, technology, and automation on one side, and the right human piece on the other, in terms of your steering committees and your business process workflows, and more importantly, that culture around data. So one of the things I love about metadata, and I love a lot of things about metadata, is that it makes governance actionable. Some folks knock governance as just a bunch of people sitting in a room talking about policy, or telling you what to do through a policy. Well, one of the nice things about metadata, the way I like to look at it, is that it really makes governance actionable. You can have a policy like: don't put PII in the cloud. But unless you can literally trace the lineage of where PII is used, that's just a piece of paper; it's not a real policy. Or: I have a report, and I want to know how total sales was calculated, because I'm reporting that to the Street in our financial report. I need to have that lineage. You don't want a bunch of people doing an audit every time you want to send out that company report. That's the type of thing where automated lineage really is an automated policy. We don't have to worry about it; that's governance happening every day as part of the metadata within the system. So again, allow the systems to help you. When you're using a relational database or a Hadoop Hive system or any technical system, they by definition have the structure within them, and a person trying to document that by hand is not the most efficient way. We've all probably done that, right? So that is a big frustration in the industry. I talk with a lot of people, I read a lot, and we've done our own surveys on it, and one of the biggest issues with something like the data lake, right, is that a lot of people are saying, oh, data lakes are difficult, they're challenging. What's challenging is getting the metadata around them, and one of the biggest risks for a data lake is not getting those definitions right.
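To make "governance actionable" concrete: with column-level tags plus a lineage map like the one sketched earlier, a written rule such as "don't put PII in the cloud" becomes a check you can run on every refresh rather than a piece of paper. This is a hypothetical sketch; every system, column, and tag here is invented.

```python
# Lineage edges (source -> targets) and governance tags, all illustrative.
flows = {
    "hr.employees.ssn":  ["staging.emp.ssn"],
    "staging.emp.ssn":   ["cloud.analytics.emp_feed"],  # the lurking violation
    "hr.employees.dept": ["cloud.analytics.headcount"],
}
pii_columns = {"hr.employees.ssn"}
cloud_prefixes = ("cloud.",)

def downstream(node):
    """Everything reachable from `node` in the flow map (depth-first)."""
    out, stack = set(), [node]
    while stack:
        for nxt in flows.get(stack.pop(), []):
            if nxt not in out:
                out.add(nxt)
                stack.append(nxt)
    return out

# The policy, executed: does any PII column reach a cloud system?
for col in pii_columns:
    hits = {d for d in downstream(col) if d.startswith(cloud_prefixes)}
    if hits:
        print(f"POLICY VIOLATION: {col} reaches {sorted(hits)}")
```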
And one of the barriers is getting the right, consistent data for things like data-driven digitization. We have surveys showing that one of the most frustrating parts of a data scientist's or data analyst's job is the manual effort. If you're that person on the lower left, right, if I have to manually map the definitions for a data lake or the definitions for the warehouse, I'm going to shoot myself. What I would love to do is actually build a report, actually get insights from the data lake and the data warehouse. That's really where the people have value, not in doing by hand what an automated system could do. So the human effort does matter; let's focus it in the right places. Part of the challenge is that the types of information an organization is using are exploding exponentially. This again came from our trends in metadata management survey, asking what people are using metadata for now and in the future. So no surprise that a big part of metadata today is around data warehousing, relational databases, BI, and ETL tools, and that will continue; you'll see on the right, those are still there, and the relational database isn't going anywhere. But it's augmented by other systems: big data, NoSQL platforms, machine learning and AI, which isn't necessarily a platform, but it's a usage, all adding to the complexity that already existed. So trying to manage this is where a lot of companies' heads are exploding: because of the growth in these types of systems, the need to understand all of this is even greater. What was nice about the survey we did, this is the data architecture survey, is that, because I know a lot of you are, like me, very passionate about these topics, we always have a sort of type-in-your-own-thoughts section. When it came to challenges around metadata management, the top two were the ones we've already mentioned. One is: how do you understand the lineage across these heterogeneous environments? Say it really were that simple: one system going from a SQL Server source to SSIS to a staging area to a warehouse, one nice clean system. It may be inconvenient to document that in a spreadsheet, and better to automate it, but I could probably live with it. But who has that anymore, right? You probably have six or seven source systems, and you have a CRM system, and somebody before you built it, and you have no idea what they did. I'm working on a project right now where that's the case: no documentation was left, because who loves documentation? And that problem is only getting worse, as we saw, or better, depending on how you look at it. As a lover of data, I'm excited by all these new platforms; that's the beauty of being in the industry. But trying to document them is a different story, and that is a challenge. And because of what we mentioned, the other big challenge that people typed into their comments was this idea of automation. Again, some things you can't automate. I can't automate, no one can, how we want to define total sales and from which systems; that's a business rule, something the business needs to define.
But how we can see the lineage across these environments, how we actually calculated something, or what sources we used, that can be automated, and that's the nice balance between human business rules and system implementation. So the good news is that there is a lot of technical innovation in the market, and the innovation hasn't stopped with the source systems. There are new ways to manage metadata as well, right? It would be odd if only the databases were changing. The metadata tool vendors are managing metadata in new ways too, to keep pace with this explosion of data. So, I love this cartoon. Machine learning is the new buzz, right? Machine learning, AI, however you want to classify it. The robots are in machine learning class: "Robbie, if you keep misbehaving, I'm going to send you back to data cleaning," right? So machines might feel the same way we do: we want to get to the cool stuff, building visualizations, generating insights, and data cleaning is really not the most exciting part. But sorry, computers, that is what they're good at, so yes, they do have to do it, so you don't have to. A lot of the tedious tasks that I have done, raising my hand, and you probably have too, come down to basic data mapping. We had an example before: CUST_NUM in one system is customer number in another. Or something like Social Security number here in the U.S.: the field might be SSN, or SOC_NUM, or FIELD_1, right? But there's a certain pattern to a Social Security number: it's number, number, number, dash, number, number, dash, number, number, number, number. Machines are really good at recognizing patterns like that, and that's a place where a machine can do a better job than a human, because they tend not to miss things, and it takes that time off your plate. I mean, there are places for defined mapping rules; in some cases you want to create a mapping, a data standard that SSN should always be SOC_NUM, and you can still do that. It doesn't mean you can't do certain manual mappings, but use them wisely, right? Where a machine can infer some of these patterns, all the better. That really is what machines are good at, so let them do it. And this is technology that didn't exist 20 years ago, or the theory was there, but the technology has advanced to such an extent now that we can use it to really help us understand this ever-exploding landscape. That's the slide I keep showing, right? Because we'll do the survey again next year, and we have plans to, and it'll be an even longer list. So again, let the computers help us keep pace with that. So there is this field of tools called metadata discovery tools, and that really is what they live and breathe and do for a living, and that's where automation is critical. They have existed in the past; it's not that the idea of reading a data dictionary from Oracle is new. But, A, there are more sources now, and, B, the way we can do some of that discovery has really changed with things like machine learning. So on the left, again, is that classic picture we showed: I have six or seven BI tools; I know there's a standard in the organization, but I really like Tableau, even though everyone else wants Power BI, right? Who doesn't face that, right?
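Here is a small sketch of the two kinds of inference just described: fuzzy-matching field names that are spelled differently but mean the same thing, and recognizing a value pattern like the SSN's three digits, dash, two digits, dash, four digits. The thresholds and names are illustrative, and real discovery tools layer trained models on top of simple rules like these.

```python
import re
from difflib import SequenceMatcher

# Rule-based value pattern: U.S. SSNs look like 123-45-6789.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def looks_like_ssn(sample_values, threshold=0.9):
    """Classify a column as SSN if most sampled values match the pattern."""
    hits = sum(bool(SSN_PATTERN.match(v)) for v in sample_values)
    return hits / len(sample_values) >= threshold

def name_similarity(a, b):
    """Fuzzy match for field names after stripping separators and case."""
    def norm(s):
        return re.sub(r"[_\s]", "", s.lower())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

# A machine can suggest that differently named fields are the same thing...
print(round(name_similarity("CUST_NUM", "customer number"), 2))  # fairly high
print(round(name_similarity("SOC_NUM", "order_total"), 2))       # low

# ...and confirm it from the data itself, whatever the column is called.
print(looks_like_ssn(["078-05-1120", "219-09-9999", "457-55-5462"]))  # True
print(looks_like_ssn(["42.50", "19.99", "7.00"]))                     # False
```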
And I know everyone else is on SQL Server, but I have this Oracle warehouse that I love, and I'm not getting off it. And, oh, I want to have something on Hadoop, right? So that's the reality in any organization: you have dozens of different types of systems, and then actual instances of those systems. What these metadata discovery tools can do, again, is use their own scanners, or whatever you want to call them, to read that internal metadata, what Tableau has in its computer mind and what Oracle has in its data dictionary, and do that heavy lifting for you. So that picture on the right we showed, that classic, what I called textbook lineage, they do a pretty darn good job of producing. Of course, a human still makes sure it's right, there will be some gaps, and you may have to customize some rules, but you would be surprised. I mean, you might want to try one and just give it a chance. Run the scanner against your system; I've seen a lot of light bulb moments, and that's why I have that light bulb in the middle. Again, there's so much legacy code. In fact, I have to go back to this picture I keep showing. Let me see if I can highlight it for you easily, because I know there are a lot of things here. When you look at what is going to be managed in the future, one of the largest categories that's actually growing is legacy systems, right? Is it that more people are going to suddenly start building things in COBOL and JCL? No, but a lot of those people are retiring, and the need to understand some of these legacy systems is growing. A lot of these systems you may never have seen before. Just think of it: you haven't built this warehouse, somebody else built it in Oracle, and they've gone away. You point the scanner at it, and then you can see some of that lineage, right? You're not relying on human beings. So I would just give that a try. These tools can do a lot more, especially with things like machine learning. And the other nice piece, and I don't want to steal Amnon's thunder because he'll show you, is the visual side. I'm a big visual person; it's why I love things like data models and visual lineage. The way these tools present it will look almost like this picture you've seen: you can literally see that flow. Because, again, you might like to see it in a spreadsheet-type mapping, and sometimes there's a great place for that, but often we think in terms of this data flow. So being able to see that, and then drill down into, oh, what does this line do, I need more detail on that ETL process, is really what some of these tools can do. So, I see some of the comments, and you're saying, yeah, Donna, that's nice, I want to see it. So I'm going to pass it over to Amnon, and he'll actually show you his solution on the market, Octopai. Amnon?

Thank you, Donna. Yeah, so I'm going to continue from where you left off. I think the key takeaway is that the role of metadata has become more and more critical year by year, and for data-savvy companies it is really enormous. And the pain you mentioned around metadata management is exactly what led to building the company. Four years ago, we decided to deal with that pain, because we used to lead BI groups in large companies, and the pressure was so high, the deliverables were taking longer, and the chaos of adding more data, as the number of data consumers grew and the velocity and volume of data increased, created a very, very complex environment.
So what we wanted to do is discover it, get it, analyze it, and work with it. And it's all around metadata. The three things we've taken upon ourselves are to do exactly the opposite of what we had at that time. One of the things we wanted to do is cover the entire infrastructure, rather than use multiple tools that are each focused on very siloed systems. Why? Because then you can connect the dots and understand the flow and the transformations of a single metadata item or data element across systems. This is something that is really, really critical, and it leads to data lineage in various forms. The second thing is that we wanted to avoid investing a lot of work just to get up and running. Whereas data lineage is important, the journey to get there is very, very painful. So one of the things we wanted is for you to spend one to two hours of your time and get the results within 48 hours. That's it. To be able to get the results fast, painlessly. And the third, which is really, really important, and we see it as some of the greatest feedback from our customers, who are in the healthcare, telecom, insurance, banking, retail, and government industries, all using our product, is the ability to socialize that information among the entire BI team. So the information is not stuck with, or restricted to, certain people within the team. Everybody can communicate; everybody can get access to an understanding of the data movement process, and that with just a click of a button, in five seconds. So, as you mentioned, metadata is everywhere. As you can see on the screen, here are what we call the top six use cases in which we've seen our customers use metadata and metadata automation in our product over the past two years. The most popular use case, which you mentioned, is a business user looking at a report who doesn't trust the data, or suspects that the data cannot be trusted. What we're actually being asked is: can you trace back, can you reverse engineer, how the data landed in that report? That's the upstream direction. In the downstream direction, if you want to make a change, the one thing you worry about if you're a data architect or business architect is: if I were to add this field, if I'm going to change this map, what other things are going to be impacted by that change? Because I don't have the full mapping of all the dependencies related to that change, I might end up back at use case number one: I go live, and then I get calls from business users suspecting that the data is wrong. Now, I'm not going to get into the other use cases, but let me fulfill the promise we made about automation and actually show you the product. I'm going to share my screen, and you should be able to see three round circles; if you don't see them, just let me know. What you see here is a collection of metadata from a challenge one of our prospects gave us, which we now use as a demo platform. The challenge was to take a report flagged by a business user as showing suspect, mismatched data, and to understand, very fast, the data movement process that landed the data in that report. Now, the interesting part of that story was that it took that specific customer five days to draw the lineage, in other words, to find the exact named ETL processes and database tables that participated in landing the data in that report.
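What makes the search in the demo that follows possible is the step before it: every system's metadata has already been extracted into one searchable inventory, so finding a report does not require knowing which of several BI tools it lives in. Below is a toy sketch of that idea, with invented entries standing in for what the extractors would collect.

```python
from collections import Counter

# A centralized metadata inventory: one record per object, tagged with its
# type and the system it was extracted from (entries invented for the demo).
inventory = [
    {"name": "Customer Product",    "type": "report", "system": "SSRS"},
    {"name": "Customer Product v2", "type": "report", "system": "BusinessObjects"},
    {"name": "load_data_warehouse", "type": "etl",    "system": "SSIS"},
    {"name": "dim_customer",        "type": "table",  "system": "SQL Server"},
    {"name": "cust_product_view",   "type": "view",   "system": "Oracle"},
]

def search(term, object_type=None):
    """Case-insensitive name search, optionally filtered to one object type."""
    term = term.lower()
    return [r for r in inventory
            if term in r["name"].lower()
            and (object_type is None or r["type"] == object_type)]

# "I was looking for a report, not an ETL or a table" -> filter by type.
for hit in search("customer product", object_type="report"):
    print(f'{hit["name"]} ({hit["system"]})')

# And for the compliance use case shown later: where a term appears, per system.
print(Counter(r["system"] for r in search("customer")))
```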
So the first thing we did goes back to one of our three values: we didn't want the customer to work very hard. We just asked them to identify which systems they have in their infrastructure and connect them. That's it. The only thing you need to do is decide which systems you have in your infrastructure, click, and Octopai fires up the extractor relevant to extracting the metadata from each specific system. That's the only thing you need to do. And as a matter of fact, you can schedule this to run as often as you want. For the first time, our customers can refresh the metadata 30, 40, 50, 60 times during the year in order to understand the differences between versions of that metadata. So let's go to the challenge. In this case, you can see 260 ETL business processes, 1,700 database tables and views, and 21 reports, across systems from the different vendors that participate in this specific infrastructure. I'm going to ask Octopai, first of all, to find that report for me, without going into any of these four BI reporting systems, because I don't know where the report lives. I'm going to type the name of the report here: customer product. Now, as I type, I'm looking for a report, not an ETL process or a database table. I found two of them. Here is the report the business user was complaining about. I can get information about the report, I can access the report from here, and I can do the following: I'm going to ask Octopai to show me the lineage by reverse engineering how the data landed in that report. I'm going to click the button, and see what you get. Within about four seconds, this is what you get. Here's the report the customer was complaining about. It's actually built on a view, and three ETL processes, here are their names, store data in the three database tables that feed the view on top of which you have the report. You can access the report, as I said, and we know it's coming from SSRS. The reason this picture is a little complicated: when we asked the customer how long it took to draw it, he said five days. When we asked why it took five days, he said, well, if you look deep into those ETLs, you'd find this one is generated in Informatica, and this one is generated in SSIS. As a matter of fact, they had no record, no documentation or mapping, of which ETLs they had in those systems; they didn't even know that two of them, from two different systems, were responsible for landing the data in that specific report. So then we asked the customer: all right, what happened with that business user's complaint? Was he right to suspect that the data was wrong? The answer was yes. And then we asked why, what happened? They said that they had changed some of the maps within this load-data-warehouse ETL. So then we asked the customer: in the design phase, did you know everything that was going to be impacted by the changes you made to that ETL, including this report? Obviously the answer was no. Then we said, you know what, do you know what else has been impacted by the fact that you changed something in that ETL? So now let's do the impact analysis, the forward lineage between those systems. I'm clicking this, and now see what you get. Now you see a different picture.
You can actually see that this ETL impacts much more than that specific customer product report. It impacts all of these reports, and in practice, 18 business users were complaining about all of those reports, suspecting that the data did not make sense to them. That poor customer had to go one by one, doing manual root cause analysis on each and every one of those reports, when they could have saved a huge amount of time, months, just by using Octopai and understanding the data flow. In this case, as you remember, this report here is SSRS, and this one is actually from BusinessObjects. So they didn't even know that these reports, generated in two different reporting systems, were associated with anything they did in that map. And this is what we call horizontal lineage between systems: from the report backwards, from the ETL forward, or from the database right and left. But there are other kinds of lineage. Say I want to dive in and see lineage at the column level, not between systems of different vendors, but within the maps themselves. How can this be done? Well, we decided to tackle that as well, by diving into the layers of the ETL, which we can do for any of the systems we analyze. You can see six different maps that are associated, all of them involved in moving the data from the business applications to the data warehouse, and if you drill in, this is what you get. You can see the lineage of the exact map, out of the six maps, that moves data from its source, through the transformations, all the way to the target table on the database. This kind of map was impossible for them to draw. Some of the maps were documented but not updated, and they had to go find the specific people who had been responsible for writing those maps two years back. What we offer is a way to avoid all of that. As Donna said, let's have a system that automates the mapping, the discovery, and the analysis of that metadata, so the only thing you need to do is, essentially, Google it: go to Octopai and ask for what you want to see. The last thing I want to show you here is another interesting use case we were challenged with by one of our customers, who needed to handle a regulatory audit. They wanted to find a certain field in order to adapt it or mask it. This was an insurance company that wanted to understand where a certain field exists in order to comply with government regulations. As you type the name of the data element, or the calculation, or anything that corresponds to a collection of different metadata items that represent the same meaning, you can click and see how fast it is. Again, because this metadata has already been analyzed by Octopai, the results come back very, very fast. On the left-hand side you can see all the systems that have been analyzed within your infrastructure. We have customers who have four or five different systems, and we have customers who have 30 or 40 different systems. The green button shows how many times we found that item, that calculation, that instance you were looking for, and at any given point in time you can drill in, get information about it, and share it with whomever you need. The message is: let the technology work for you, instead of you spending so much time just to be able to draw a lineage. And it really comes down to meeting the business demands regularly, as these things change.
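One more sketch, on the scheduled refreshes mentioned earlier: if each extraction is kept as a snapshot, comparing one snapshot to the next surfaces schema drift, like the product-code widening in Donna's retailer story, before business users start calling. All names and types here are invented.

```python
# Two metadata snapshots from consecutive scheduled extractions (illustrative).
january = {
    "dw.dim_product.product_code": "CHAR(10)",
    "dw.dim_product.product_name": "VARCHAR(50)",
}
february = {
    "dw.dim_product.product_code": "CHAR(20)",  # widened: was the impact checked?
    "dw.dim_product.product_name": "VARCHAR(50)",
    "dw.dim_product.launch_date":  "DATE",      # new column
}

def diff_snapshots(old, new):
    """Report added, dropped, and retyped columns between two extractions."""
    added   = sorted(new.keys() - old.keys())
    dropped = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, dropped, changed

added, dropped, changed = diff_snapshots(january, february)
print("added:  ", added)
print("dropped:", dropped)
print("changed:", [(k, january[k], february[k]) for k in changed])
```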
That's a short demo of the product. What we wanted to show you today is how easy it is to get lineage, so that the next use case, the next reason you need a lineage, is just a click of a button and five seconds away, because we're not sure what will be required of us tomorrow, next week, or next month. And to connect that technical capability to what Donna spoke about so deeply, the role of metadata in data governance: the ability to collect metadata not only for metadata management purposes, but also as a vehicle to feed data governance needs, and to extend data governance capabilities into the analysis of metadata across the BI landscape as well, in order to get a full picture of the data movement process and to be able to govern both data and metadata together. That's it for me. If you have any questions, Donna, the stage is yours.

Great, thank you. That was a great overview. And I'm going to pass it back to Shannon, who I think is going to manage some of the Q&A folks have typed in.

Indeed. And we've got a lot of great questions coming in; if you've got any questions for Amnon and Donna, submit them in the bottom right-hand corner in the Q&A section. And to answer the most commonly asked question: I will send a follow-up email to all registrants by end of day Thursday with links to the slides, the recording of the session, and additional information requested throughout. So diving right in here. Amnon, who do you see as the typical end user for Octopai?

That's a good question. From a company perspective, we see almost any industry, any vertical that needs to govern metadata for the variety of use cases you see on the screen. It can be any company with even the simplest BI infrastructure, one ETL tool, one database, and one reporting tool, let's say SSIS, SQL Server, and Tableau or Qlik or any other tool, and it just gets more complicated from there. It could be a company from any vertical, including the ones I mentioned we're engaged with. From the user point of view, we see very nice coverage of the people using it, from the head of BI, to data scientists, to support and operations within the BI team or data management. It can be business architects, data architects, business analysts, developers: for the first time, they have a common ground for understanding and sharing the data movement process among everybody who is responsible and liable for moving the data from the business applications through the BI all the way to the business user, who needs to be able to trust the data.

So does an organization have to do something extraordinary to make Octopai work with their systems, or is it simply a turn-it-on type of tool where IT makes the necessary connections for it to work?

Well, that's exactly one of the critical questions that led us to establish Octopai. We were spending a lot of time getting other products and solutions up and running, and we wanted to address that in one of our three values. We ask a customer to spend maybe one or two hours of their time running the extractors and scheduling them as often as they want the metadata to be extracted, and that's it. Within 24 to 48 hours they get access and start using the product, just like I showed you.

And can you trace back to source files? Yes. I love that. That's simple. Well, yeah, it's what Donna said.
I think metadata was a nice topic to deal with maybe ten years ago, five years ago, but the role of metadata, and managing metadata, is just like Donna said and shared: it is almost as important as managing the data itself, if not more important than in past years. Now, the question is how you enable whoever needs to control and govern metadata not to work as hard as they used to. And lucky for us, there are available technologies, some of which Donna just mentioned, that we have adopted within our product, enabling the automation of the entire process: what we call discover it, get it, analyze it, and work with it. And this very much has to do with the modern way metadata needs to be managed, by leveraging available technologies.

I'll add one flavor to that. I think it was the person who asked how hard it is to embed this in your organization. Just to add to what Amnon said: if you think of this as your governance, and that bottom piece is automated, that part can now be easy. Then the harder part, or rather the effort, is in how you get the right people looking at it. Do I add something like this lineage report to my steering committee? How do I embed it into a software development process? So the effort is more about how to leverage it, rather than spending three days producing it, which is a good problem to have. Then it's about how you make it part of the culture. Because once people see it, for example, I had a client where we added something similar into the agile stage gates: before you make a change, you have to run the lineage to see the impact. So it's more about how to embed it in the culture and less about how to get it to work, which is the nice problem to have.

And there is a question in here about how to get a personalized demo from Octopai. I will get that information from Octopai and make sure to include it in the follow-up email. So: does Octopai have built-in pattern recognition, which would look at physical columns during a scan and recommend business term names?

The short answer is yes, but definitely, if you'd like to see a demo of how we do it, I'll be more than happy to set up an offline, personalized demo, just like you recommended.

And are there any limitations on data sources for Octopai, such as mainframe, relational database, NoSQL database, et cetera?

So our focus now is mainly on the BI landscape. One of the things that surprised us, I would say, is how much BI is suffering from a lack of technologies. So our focus at this point is anything from the first step, starting with the ETL, all the way through the database and the reporting, as you can see on the slide. Covering the business applications is on our roadmap. But one important thing to say: if there's any piece of data that the business user is looking at, we can trace it back from the report all the way to the source table where it enters the ETL. And we will tell you if the source is Salesforce or SAP Business One or Marketo or PeopleSoft; you will know that within Octopai. We have not analyzed metadata directly from the business application sources, simply because we have found that about 90% of the data being used by business users, the data they can be liable for, runs through the ETL. That is why we started working from the report backwards, analyzing the data from the ETL in a very thorough manner.
But definitely, at a later stage, we will cover the business sources as well.

And can you do data profiling with Octopai?

There are better tools for data profiling. But if I need to answer, this is definitely something we are going to address as well. For now, we want to focus on what we do best, which is what I showed you today.

Love it. So besides the data movement, can I see the metadata collected by Octopai?

Absolutely. One of the things we provide is what we call the inventory of the metadata. With a lot of clients, even beyond looking at lineage, the first thing they tell us is: we don't know what metadata we have. They have four or five different BI systems, and some have dozens of BI systems in different countries. So this is the first time they can centralize everything into a single repository, and what we provide is the first inventory of the metadata they have within their BI infrastructure, along with very interesting attributes about that metadata, even before they start using the product and connecting the dots in the form of a lineage. So the short answer is yes.

All right. So can you confirm how critical the scanning of ETL tools is to the success of automated metadata capture?

As Donna also said, I think it starts with the ETL. I don't think reporting is less important than ETL, but if you look at the funnel, the data flow, it starts with the ETL. Whereas there are tools that focus on either the ETL or the reporting side, we've made a commitment to show the full picture, not half the picture, not a hint: the data from the ETL all the way to the report, and whatever is in between. One of the things we didn't show here, for example, is when you use analysis services, if you use Tabular in between; then it becomes more complicated. But the ETL is definitely the starting point from which we track the data, all the way until it appears in a column or table in that report for the business user. Donna, anything you want to add to that?

No, I think he said it very well. There are several things. I liked the user question from before: can you just get the inventory? Because Amnon showed some of the fancy stuff, which is great, because that's where a lot of the complexity is. But a lot of people are at the level of maturity of, as Amnon said: we have a new region; I don't even know what we have in Greece; can I just do that base inventory? And that can be massively valuable. But I think with the complexity, as Amnon said, when you get into reporting, it generally lives in that ETL. That's often the black box, or even the analytics layer, as he mentioned as well. So I would agree.

So Amnon, what ETL tools does Octopai connect with?

Some of them you can see right here on the screen, but definitely feel free to shoot us an email. We cover most of the classic ETL tools, including some of the more hand-crafted approaches: stored procedures, for example, which are very complicated, SQL scripts, and much more, like PL/SQL and T-SQL. I mean, the whole point is for you not to worry about what it is you have in your infrastructure. So there are a lot of tools that we cover, and more to come. And yeah, just follow Octopai. If there is a tool you are using today that is not covered, just let us know; either it's already part of our roadmap, or we will consider it as part of our roadmap.
But the infrastructure you see here pretty much covers about 90% of the hundreds of enterprises and companies we have worked with. By all means, our goal is to make your dream come true. There's no way for us to know what you need unless you communicate with us, and we invite you to share your thoughts with us. If there's anything we can do that you can share with us, we are here for you.

The big question of the day, Amnon: how much time would Octopai save the BI team?

Great question. So it varies. We have customers who testify to how much time we have saved them, but I will answer this differently. There was a CIO of an insurance company whom I really wanted to meet, and we actually met, and I asked, well, why did you end up investing in Octopai? His answer was: I don't remember exactly what your features are, but the head of BI said that you were able to double my team's capacity with automation, with technology. And they have 19 people on the team. The ROI around this product can show up even during the trial. If you were to test the product, just let us analyze metadata from your systems, as I said, painless and free. Use the product for two weeks in production and see how it works today compared to what you did a week ago, a month ago, or a quarter ago, because what our customers do every day is similar to what they did half a year ago and will probably continue to do going forward. Trialing the product is free, and the reason is that we want you to get firsthand experience validating the use of the product, everything I showed today, and to be able to craft the cost justification for using it.

So how is the business logical information, for example descriptions and definitions, input to the tool? Through front-end UI edits? Is there a friendly change control and approval mechanism for governance to review and approve those posts and puts?

So that's an answer that's going to take more than just one minute, and I think we have only a few minutes left. But yes, the product at this point is more of a viewer, so you cannot edit this, but that is definitely a capability that's going to come very, very soon in the product.

So what kind of training or orientation is needed for end users to effectively use Octopai?

This is something we challenge ourselves on every time. Typically we ask a customer to spend an hour, and it usually takes no more than half an hour, because the product is very, very intuitive. Not because we say so; it's because customers say, well, I get it, it's very simple, just give us a little orientation on what to look at and how to use it. How complicated is it to use Google? We're not Google, but the concept is pretty much the same: just go and type in whatever you need and find it with a click of a button. So I would say anything between half an hour and 45 minutes to start getting pretty advanced use of the product. Typically we do a follow-up training about two or three weeks later. But that's it. That is how easy it is.

So we've got about four minutes left here, a little less than that. I think we can fit in at least one more question. Can I use Octopai to reconcile data between multiple systems?

If the question is, can I reconcile, meaning centralize, metadata from multiple systems, then the answer is yes. This is exactly what we do.
The whole point is to create a layer, just like you see here, that extracts metadata, centralizes it, and builds logic on top of it, even though the systems can be from different vendors with different ways of storing and managing their metadata. So the answer is yes.

All right, and let me just throw in one more here; I think we can get this in. Does Octopai also include building and maintaining a glossary, for example, similar-sounding terms with the same or different definitions?

The short answer is yes.

Okay, that makes it easy. Then we can get in another question. There are so many great questions coming in; everybody, we just appreciate it. I will get these questions over to Octopai as we run out of time and make sure you have contact information to get the additional information you're looking for. So: Octopai appears to be all about technical metadata. What about business metadata? Is that supported as well?

Some of it we actually show in the product already, for example if you look at the semantic layer of the reporting systems. We are not yet full-blown business metadata, but it's coming, very, very soon.

Alrighty. We are just coming to the top of the hour here. Amnon, Donna, anything you want to add before we wrap up?

Not for me. I think we covered a lot.

Yeah. It seems from the questions that the topic is right on the spot. And if there are any questions, by all means, we are here for you. Just ask, contact us, and we'll be more than happy to serve you.

Well, thank you so much, Amnon and Donna, for this great presentation, but I'm afraid that is all the time we have for today. Again, just a reminder: I will send a follow-up email by end of day Thursday with links to the slides and the recording, and I will get all these questions over to Octopai so we can start getting answers to the ones we didn't have time to get to. And thanks to all of our attendees for being so engaged; we just love all the great questions and comments that have been coming in, and the interest and engagement in our webinars. Thanks so much, and I hope everyone has a great day. Thanks again, Amnon; thanks, Donna; and thanks to Octopai for sponsoring. Thank you.