So, he's a tech lead at Allstate Insurance and has been a member of the data management group for more than ten years as the lead architect and designer of their metadata repository, data guide, and code management system. He's an experienced volunteer, so be nice to him. But he leverages those skills to design better solutions for managing the project. He was able to position the code management system as an enterprise tool, currently used as a system of record in multiple applications for cleansing, verification, or translation of simple and complex codes. He doesn't sound like a developer at all, right? Actually, we have an enterprise thing. He's looking forward to this speaking opportunity. Is this your first speaking opportunity?
This is my third.
Third, excellent. Now, you can actually tackle him. Because on the third presentation, you're allowed to do that, right? Yeah. So, something interesting about him. I'm going to tell you two things. One of them is true. One of them is a lie. The first one is he's a former space shuttle pilot. And the second is that he's a professional bridge player. So, which one is it? That's true. Yeah, you guys know me too well. That's a real one, by the way. I'm going to be working that into your presentation. Thank you. Now, I'm going to read back to your meeting. Oh, and the other thing is, you guys have your evaluations to do. Remember that if you got introduced by me, that's an automatic possible. Thank you.
Thank you, Karen. As Karen said, my name is Sal Yacou. I'm a tech lead at Allstate. I've been there with the data management group for 11 years now, and I've been doing only data management for the past 10 years. I'm heavily involved with all the enterprise repository actions in there, whether it's about the metadata or about the codes management system. This year, I'm presenting about a metadata repository.
In the past couple of years, I've presented about the enterprise codes management system at Allstate, and I also talked about the enterprise code services. So it's a different flavor this time, and that's how I'm going to tackle it. Hopefully you're going to enjoy it.
The agenda. I have a full agenda for the 50 minutes. I have to pay Allstate's dues, so there's just one slide I'm going to throw in there. Then I'm going to do an introduction about the topic, what's under the hood. I'm going to talk about the end users and their expectations. Then something we're pointing toward, metadata federation. That's how we're trying to do our metadata repository; we're trying to position it in that direction, and I'm going to talk about what I mean by that. Then system of record and the sanctioned copies, or the authoritative sources, whatever you want to call them. That's a different topic, which is probably a keystone to building all the infrastructure, and I have a slide for that. I'm going to talk about the metamodel. Some of it is our existing metamodel; some of it is where we want to be with our metamodel. Then I have to tackle a question that is key to any discussion whenever the metadata repository comes up: build versus buy. I'm going to draw some conclusions from the presentation, and then I'm going to close with questions. If you have some questions you want to ask, please do, like you've been doing in the presentation.
This is Allstate. You can read this; it's going to be on the deck. We're in the insurance business and we're one of the public insurance companies.
So why did I come up with "under the hood"? What does that mean? Why did I put that in the presentation? One of the things we struggle with, and I actually struggled with it for a long time, is that we as data management professionals understand what a repository is.
Once it goes two levels above me, nobody has a clue. They know it's a black box. Just go and do it. But what is it? For them, how can I convince them that it is something useful? We were never able to convince upper management of what a metadata repository is. So in this presentation, what I wanted to talk about is this: somebody says, hey, my car broke. I'm going to open the hood. What do I see in the engine? Somebody has to point and say, oh, that's a battery. That's an engine case. That's the carburetor. Just to identify the pieces that make up a metadata repository, or actually an enterprise metadata repository. So I'm going to try to tackle that answer in this presentation. I'm going to go a little bit deeper into the details, but the context is that I want to explain why. What makes up my enterprise metadata repository? And how did we build it that way?
This is a true story, so I have to tell it. As you can tell by now, I have an accent. One day we were debating something at work, the debate started heating up, and my peer said, what did you say? Did you say Metacrap? And I said, okay, that's an excellent thing. So I immediately went and tried to copyright the name, Metacrap. I couldn't. I went to register the domain name. The domain name had been taken. This happened two months ago, after I submitted my presentation title. Otherwise, I would have called it Metacrap. So I decided it had to make it in as a slide. And it is so related, because what we do is collect a lot of metadata, and sometimes we over-collect data or metadata, and we don't know whether that metadata is current, whether it is of high quality, or whether it is any good. We just collect it for the sake of collecting. And to me, sometimes we end up asking, okay, what is the threshold where you're overdoing it, and do you have current data in there?
So I thought it's very relevant, and I'm probably going to challenge you on it. I don't have a slide for this one. But I'm going to challenge you when we look through the metadata, through the metamodel: when I look at the pieces and how we combine them, does that make it Metacrap or metadata?
The other question I'm going to tackle: normally I go into a presentation and it starts with, oh, I want to get a metadata repository, let me figure out how I'm going to use it. Most people tackle the metadata repository from that perspective. I want to start with the repository, and then I'm going to find applications. In this presentation, I want to flip it. The business needs this, this, this, and this, and here is how I can do it through the metadata repository. That way it's easier for me to sell it to upper management, or to somebody who is not very technical in understanding what metadata is. So that's the flavor I'm going to bring into this presentation: why does the business need a metadata repository?
I'm going to have some icebreakers here. Normally by now I've gone deep into my presentation, but this time I'm going to do this. I come from an engineering background, so I love analogies, and I'm going to try to equate those to a metadata repository. I'm going to put up some slides and then talk about how they relate to a metadata repository.
So here are two things. How do they relate to a metadata repository? The analogy here is: what kind of repository do you have? Do you have the old one, and I hope this is the old one, or do you have the new one? The old one is an old model, okay? It has multiple systems all put together.
In reality, when they designed this, and these are probably cars from maybe the 60s or 70s, each unit in here, whatever it is, the air filter for example, was designed to be efficient by itself and to come up to spec. But when they built the car, they bought the air filter from somebody, they bought the carburetor from somebody else, and then they put them all together. They assembled them. The whole doesn't equal the sum of the individual parts when you put them together, because it was not designed for everything to work together efficiently. They all work, but when you put them together, they are not as efficient. Whereas in the new models, when you open the hood, this was designed as one unit from the get-go. The designer who came up with this model kept in perspective that he was going to need this piece in this place, so everything is designed properly. So they become more efficient compared to the old one.
My challenge is, is there any difference in what they produce? To me, both of them are cars, both of them are engines, both of them will drive from point A to point B. So if you have the old one, you don't need to move to the new one just because there's a new one which is nicer. You have to justify why you want to do that.
The major difference between those two is this: the old one, you take it to any service engineer, to any technician, and he can fix it. Similarly, in our case, you take it to any developer, and they can update the repository and add more features to it. It's easy; you don't need experienced people to do it. For the new one, you need to be specialized in this type of car to go and maintain this engine. Same thing with your tools, with your repository. If you go and get a state-of-the-art repository from a specific vendor, you need some people from that company, some consultants, to go and manage it.
Okay, both of them can do the work.
Another one. What are you going to see here? This is one of the engines. As you can see, they cut a hole in the hood to make space for the engine to grow. The analogy here is that if you're doing a metadata repository, design it for growth. If you don't keep the future in perspective, you're definitely going to outgrow it. Normally, with a metadata repository, we start with, oh, we only need to collect A, B, and C. And lo and behold, after we build A, B, and C, before we even release it to production, we need D in there and we need E in there. All of a sudden we keep on building, we're trying to fit everything into the thing, and it doesn't work. The one on the right side, as you can probably see, is properly designed. It has room to grow, and it has a lot of space in there.
The third and final one of those three things. My analogy here is between a small compact car, an average car, and a super-duper state-of-the-art car. To me it says, hey, you can go and buy the latest and greatest model, and you can pay as much money as you want, but before you do any of that, before you decide which one fits you, figure out what you need and what the business needs, because metadata repositories should serve the business. If you don't keep in perspective what the business needs, you can actually deliver something they don't want or don't need right now. So all I'm trying to say here is that your needs might be this one, in which case just go and buy a small metadata repository or build one. If you are very much interested in performance and all of the great things, then go to the last one. Maybe what you need is something in the middle. It's up to your business, up to your requirements. There is no recipe that says, hey, that's the one you should all use.
So let me get into some interesting stuff now. Why do we need a repository?
So the first thing I did was look at who, at least in our business, is using our metadata repository. And by the way, we do have an existing one in the enterprise. I tried to see who's using it, and the people who came up on top are the architects and the system analysts, very clearly. Those are heavy users of the enterprise repository, and I'll talk about what they use in there, but this is one segment of people we need to address and help.
The second group is the warehouse developers. To me, warehouse developer means the ETL developers, the DBAs, all of the people who do the back end. Then the application developers, all the people who are doing the front end. They access the metadata repository a lot of the time, for a lot of purposes, and I'm going to talk about that. Then we have the business report designers. Those are different from the application developers. The application developers develop the UIs; the report designers generate the reports that most of the business runs. Then I have some casual users. That's a lot of people who just come in to search for a term or look for something. By number there are probably a lot of them, but what they do in the repository is very little compared to the others.
And we are specifically positioning the repository to be used by automated systems. My theory is that if you're building your repository just for documentation, then it's a useless thing, because eventually the data in it will get outdated. Nobody cares about it, quality will go down, and users will drift away. What we have done in a lot of the applications we've built, definitely for codes management, and now we're extending it to other places, is make the applications leverage the repository at runtime.
So in that way, if one of my run-the-business applications needs content from my repository, that content had better be right, otherwise the application is going to do something wrong. That is what keeps the data in the repository consistent and current.
Now, those users I've listed, what do they do when they use the repository? The first thing they do is research definitions and meanings. That's a very normal thing, and most of you probably relate to it. There are a lot of overloaded names for this functionality: business glossary, business definitions, call it whatever you want. I'm just lumping it into one place, which is to come up with those definitions. So what does "policies in force" mean? They come to the repository and they can figure out what it is.
Another one is to research business rules, metrics, or capabilities. We document all of those in the repository, and we give the user the capability to go and figure out how a metric is calculated. What is it based on? What are the components that make up that metric? And then relate those to capabilities. What can the system do? And align them to the business, of course. Without the business, there's no need for any of it.
Another usage is to research logical and physical models. We document both, so if somebody wants to look up the key to a table, they can come in here. What are the keys to the entities, or what are the relationships? They can see them from here.
They can research codified domains, whether simple or complex. That's the topic of my previous presentations, and we're very involved in codes management. We have a big practice there. We document simple domains, which is just a state code or gender or a lot of those. We document some complex domains which have relationships.
For example, you can write this policy in this state but you cannot write it in a different state.
They can research mappings of logical and physical attributes to code sets. For every column in the database, we identify the allowed values associated with that column. A lot of the developers use that functionality.
They can research system lineage. You can look at a column or a report element and trace it back all the way to the source system. And we can research not just an attribute; we can research how data moves between systems. So the claim system gets the data from this system, which gets it from that system. Whether the data moves system-wide or element-wide, we can trace it.
Finally, we can do change impact. It's a derivative of the lineage, and it asks: what happens if I change a column in the source system? What is the impact on the target systems? If I change the width of a column from two characters to three characters, how many systems do I have to change because of that change?
A lot of those we have achieved. But what I'm talking about is the expectations of users like the ones I've listed. That's what they are currently using in our system, and they expect to get that functionality.
We also have a discussion side, where they can discuss. If somebody puts a definition on a term, and maybe another SME in a different group does not like it or wants to extend that definition, we have some collaboration on that definition where they can extend it, suggest a different meaning, or suggest alternatives to it.
Now, when those people have those expectations of usage, what are their expectations of the data? This is easy. They can't build on any assumptions if the data is not complete. So they expect the data to be complete.
They expect it to be of high quality. They don't want to have incorrect data in there. They expect it to be accurate. So if I'm saying that the allowed values for one of the fields are, and I'm just coming up with something, auto lines or property lines, they had better be just those two. I don't expect to see business lines in there. So it has to be accurate. They expect the metadata to be consistent, which means that if I have a definition for one of the key factors, and I define it in one system as something, I had better use the same definition across all systems. I don't want performance to mean something here and something else in a different system. And they want the metadata to be traceable. They want to be able to look at certain metadata in a target system and trace it all the way to the originating systems.
So how did we go about this federation? I believe in any enterprise, and I guess a lot of you are from large enterprises, you normally start with an ETL tool. Everybody has an ETL tool in place. Most ETL tools come with their own repository. They have a metadata repository in there. A lot of the tools will position their repository as, quote unquote, the enterprise repository. They want everybody to fit in. Normally those repositories are very good, and they are targeted, in this case, toward ETL. They might not do a good job with different kinds of metadata.
And by the way, those two things in there, the symbols: I'm saying, hey, we have some users who are using this repository. There's nothing wrong with that. And there are some people who manage the content of this repository.
Then the company goes and acquires a different company. This is very normal. Now we have two ETL tools in the house. So let's say this is tool number one and this is tool number two.
All of a sudden we have two metadata repositories. Which one is the real one? Which one is the correct one? There are a lot of problems here, and we actually struggled with this. Do we make this one the real one and have the other feed into it, or the other way around? Then all of a sudden you get a reporting tool which has its own repository. I think the pattern here is very clear. Then we have our reference data, which is our codes data, which is another repository by itself. Then you have a metrics and key performance indicator repository, which is a separate one. And then we have a services repository.
So with all of those, how do we solve this? If somebody is looking for a certain piece of information, where does he find it? In which one of those?
So the notion we came up with is this. We said, hey, how about we leave all of those alone, and each of them stays the authority on its own data. How about we come up with something which sits on top of all of them? What we do is get data from those systems into what we call an enterprise metadata repository. We have a couple of people there who just manage permissions or security, but we let all the tools keep their own metadata repository. And then this one becomes the enterprise one. On top of it we put our metadata viewer, and we open it to the general public. So the public all use a metadata repository which sources data from a lot of repositories. That becomes the place where people can search, yet we let those many repositories live. Live and let live, in a way. Just don't try to say, I'm going to override the others, or I want to kill the others and be the only one. That is the perspective of what we call, at least here, a metadata federation. I'll talk to that in a second. That's my next slide.
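The federation idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Allstate's implementation: the class names (`SourceRepository`, `EnterpriseRepository`, `MetadataRecord`), the system names, and the sample records are all hypothetical. Each tool repository stays the authority on its own records, and the enterprise repository only harvests and searches them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """One piece of metadata plus the repository that owns it (hypothetical shape)."""
    name: str
    description: str
    source_system: str


class SourceRepository:
    """A tool-specific repository (ETL, reporting, codes, ...) that keeps owning its metadata."""

    def __init__(self, system_name, records):
        self.system_name = system_name
        self._records = dict(records)  # name -> description

    def export(self):
        # Every exported record is stamped with the owning system.
        return [MetadataRecord(name, desc, self.system_name)
                for name, desc in self._records.items()]


class EnterpriseRepository:
    """A harvested copy that sits on top of the source repositories."""

    def __init__(self):
        self._records = []

    def harvest(self, sources):
        # Rebuild the copy from the sources; nothing is edited here.
        self._records = [rec for src in sources for rec in src.export()]

    def search(self, term):
        term = term.lower()
        return [rec for rec in self._records
                if term in rec.name.lower() or term in rec.description.lower()]


# Hypothetical sample content for two tool repositories.
etl = SourceRepository("etl_tool_1", {"POLICY.STATE_CD": "State code loaded nightly"})
reporting = SourceRepository(
    "reporting_tool", {"Policies in Force": "Report element counting active policies"})

enterprise = EnterpriseRepository()
enterprise.harvest([etl, reporting])

hits = enterprise.search("polic")
for hit in hits:
    print(hit.source_system, "->", hit.name)
```

The design choice worth noting is that the enterprise copy has no edit method at all: a correction has to be made in the owning repository and picked up on the next harvest.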
We also source our operational systems from this enterprise metadata repository, and that's how we can feed the data to the applications in production.
Yes, please. Yes, yes. That's absolutely correct. We just leave them, but we source from them. Yes. I'm probably going to address that concern on this slide.
The way we address this issue is that we've said that for every piece of metadata we will identify who is the SOR for that metadata. Without knowing that, there is no use. We have to identify one system as the owner of that metadata, and we call that system the SOR of that metadata. In this case, let's say I have an SOR for info one, which is a single piece of information. From this SOR we can take it to some sanctioned copies, whether you call them authoritative sources or sanctioned copies. Those are copies of the data which is in the SOR. Those two systems do not own the data. They are just a copy of that data. They cannot change the data.
Now, granted, I can take one of those sanctioned copies and move it to another system where I can enrich it with other metadata. That system is the SOR for the new piece, but the sanctioned copy still lives off the original one, so we don't change it. So for ownership, there is only one system that owns a given piece of information. Not sure if I addressed your concern here. Did I? I'm sorry?
Not necessarily. No, not necessarily. Actually, the enterprise metadata repository is a sanctioned copy. The owners are the systems at the bottom. These are not the owners of the data.
So you could have, maybe you have an M and an F, who owns it for one. Would the owner do it for two? That particular component you may have included a zero and a one. So that's where the PPL would change the difference?
No, that's not.
Does that still have the same definition?
That's a different thing. For the SOR here, what we do is have domains for the SOR.
So as a domain concept I have something called gender type. In gender type I define male and female. Then I can say system one uses M and F, and system two uses zero and one. But the master copy is male and female, and those are golden. Nobody can touch them. But you can attach some codes to those, which become like SORs in that system.
That's a really key point, that the key thing you're controlling is the meaning. The codes can be anything.
Exactly.
And you're not going to talk to the business about codes.
Exactly. Yes. And I don't want to go into the details because I...
You want to?
Yeah. Sorry. I'll give you an easier example of this. In this SOR, let's say we have an attribute with a definition, right? Or a column with a definition. That lives in the SOR. Now let's say we have a job which loads that column with contents, and I want to know the timing of that job. This is runtime information. It has nothing to do with the model of the data. It has to do with what happens to that column at runtime. So this is a runtime system which adds this piece of information: the last run time, or whether the job failed. This system knows it. But it doesn't change the column name. It doesn't change the column definition. It just appends that information to it. Okay.
Okay. So what is our metamodel at a 10,000-foot level? Sure. That's okay. This one.
So that functionality is all managed in this ETL tool. What we move in here is only the source and target columns and the lineage between them. Whether it's a transformation or a filter, whatever it is, we move that, and maybe the diagrams, the pictures of those. So those get drawn in here. But this system doesn't change any of this information. It's just a view of them.
Are you saying you allow users to use two different lineage functionalities?
The Informatica users are still going to be in here. But if somebody else, like...
I'll give you an example. In Informatica, you're doing lineage between columns. In a couple of slides you're going to see it. We tie those to requirements. So if I have a requirement which says I want to implement this, and I want to trace that requirement, how it was implemented in the model, then how it was populated through lineage, what happened to it and where it came from, the metrics or the requirements are coming from this repository. Informatica, the ETL tool, might not know about this requirement; it just has the implementation of the requirement. So we do that through this semantic layer in here. I'm not sure if I'm addressing your question or I'm not getting it. I'm sorry. That's in here. Yeah. Correct. That's what we move in here.
So at a 10,000-foot level, what we document as metadata is this. We have some KPIs and metrics, which is the key performance indicators, the metrics, and the business rules. I'll have a slide for each one of them to explain what I mean. We have some logical and physical metadata, sorry, some logical and physical data in there. I have some reference data, again at a high level, which has the codified and non-codified data. And as you can see, as I'm putting those boxes in there, I'm getting lines, because there are dependencies. When I put in a logical model, there is a dependency between the logical model and the KPIs. When I add the codes, there is a dependency between the codes and the models, and between the codes and the metrics. I also bring up data... Go ahead. Yes. Yes.
Then I go and do data mappings and lineage. That's another box in there, which connects the data across the systems. I have some business reporting models, and I have some runtime metadata jobs and data profiling jobs which run.
Across all those, I have some applications, or AORs as we call them, and I'm going to talk about why we need this. I have some versioning information. We provide some auditing information. We provide some governance data, and we collect some security information about those. So I'm going to talk about why the business needs those boxes, how I came up with them, and why I think the business needs them.
The first one is the logical and physical models. The business comes in and says, okay, we need you to document the system capabilities. So we start with the capabilities, and I also want to map those capabilities to the requirements somehow. And I want to do some research across this data. So what we do here is say, okay, those capabilities will translate into some entities, some attributes, some modeling, whatever it is. And we use whatever modeling tool you have. Through this, you can create a logical model, which is composed of entities, and every entity has some attributes in it. We also model the physical side, whether it's Oracle, whether it's SQL, whatever it is. That becomes a relational data store which has some files and fields in it. Then we map the logical to the physical: the entities to the files and the attributes to the fields.
We also handle hierarchical data, and we have a lot of it, whether it's XML for the canonical models or WSDLs or interfaces for web services between systems. We document those, and then we map those models to the logical model. We map an element to an attribute, and those elements can be hierarchical. I'm not going very deep into this. It's under the hood, and I'm trying to keep it at a high level. So I'm not going to go into the details of how each one of them is implemented.
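The logical-to-physical part of the metamodel just described can be sketched as a small data structure. This is a minimal sketch under assumptions, not the actual repository schema: the class names (`Entity`, `PhysicalFile`, `LogicalPhysicalMap`) and the sample entity, table, and column names are hypothetical. It shows entities with attributes, files with fields, and a mapping that can be read in both directions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Logical side: an entity and its attributes."""
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class PhysicalFile:
    """Physical side: a file (table) in a data store and its fields (columns)."""
    name: str
    fields: List[str] = field(default_factory=list)


class LogicalPhysicalMap:
    """Records which physical field implements which logical attribute."""

    def __init__(self):
        self._links = {}  # (entity name, attribute) -> (file name, field name)

    def link(self, entity, attribute, physical_file, field_name):
        # Refuse links to attributes or fields that were never modeled.
        if attribute not in entity.attributes:
            raise ValueError(f"{attribute} is not an attribute of {entity.name}")
        if field_name not in physical_file.fields:
            raise ValueError(f"{field_name} is not a field of {physical_file.name}")
        self._links[(entity.name, attribute)] = (physical_file.name, field_name)

    def physical_for(self, entity_name, attribute):
        return self._links.get((entity_name, attribute))

    def logical_for(self, file_name, field_name):
        return [logical for logical, physical in self._links.items()
                if physical == (file_name, field_name)]


# Hypothetical sample model.
policy = Entity("Policy", ["Policy Number", "Written State"])
policy_table = PhysicalFile("POLICY", ["POLICY_NBR", "STATE_CD"])

mapping = LogicalPhysicalMap()
mapping.link(policy, "Policy Number", policy_table, "POLICY_NBR")
mapping.link(policy, "Written State", policy_table, "STATE_CD")

print(mapping.physical_for("Policy", "Written State"))
print(mapping.logical_for("POLICY", "POLICY_NBR"))
```

Hierarchical models such as XML elements could be mapped to attributes the same way, with an element path in place of the file and field pair.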
The other one is the metrics and the key performance indicators. The rationale is that the business wanted to have one version of the truth. When I talk about policies in force, is policies in force in the claim system the same as the one in the policy system? They might be different, or different again in the financial systems. So they wanted to have one way to measure metrics across the whole enterprise. That one version of the truth is what we're trying to address here, with what we call the metric on this slide.
What is the rationale for doing this? We care about having one definition. The most important thing is that it drives data quality. Without it, I can't have data quality. If two people don't agree on the meaning of something, there's a big problem in how we measure the quality of the data. And then we want to have a consistent way to report on key metrics.
As for the way we've done them, I'm going to give an example with a non-insurance business term, but this is the real thing. We have processes, and one of them is a problem management system. We have multiple processes, but I'm just giving one example here. We break the process into categories of what we want to measure, whether it's compliance, capacity, all of those things we want to categorize by. Through those, we come up with something called the KPI, which is a key performance indicator. In this example, it's mean time to service resolution. So if there's a problem, how do you measure it? That's one of the measures we report on. Oops, excuse me. And that KPI is actually based on multiple measures, such as the number of problem tickets assigned to multiple parties. So there are multiple measures which we collect from the systems. We call them operational measures. They make up a formula which makes up the KPI.
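The process, category, KPI, and operational measure chain above can be sketched as follows. The talk does not give the actual formula, so the one used here (total resolution hours divided by resolved tickets) and the measure names are assumptions made purely for illustration, as is the `KPI` class itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class KPI:
    """A KPI documented with its process, category, measures, and formula."""
    name: str
    process: str
    category: str
    measures: List[str]                          # operational measures the formula needs
    formula: Callable[[Dict[str, float]], float]

    def evaluate(self, collected: Dict[str, float]) -> float:
        # Fail loudly if a documented measure was not collected from the systems.
        missing = [m for m in self.measures if m not in collected]
        if missing:
            raise KeyError(f"missing operational measures: {missing}")
        return self.formula(collected)


# Hypothetical definition: the measure names and the formula are assumptions.
mean_time_to_resolution = KPI(
    name="Mean time to service resolution",
    process="Problem management",
    category="Compliance",
    measures=["total_resolution_hours", "resolved_problem_tickets"],
    formula=lambda m: m["total_resolution_hours"] / m["resolved_problem_tickets"],
)

# Hypothetical collected values.
collected = {"total_resolution_hours": 120.0, "resolved_problem_tickets": 30}
result = mean_time_to_resolution.evaluate(collected)
print(mean_time_to_resolution.name, "=", result, "hours")
```

Keeping the formula next to the list of measures it depends on is what lets every system report the same number for the same KPI name.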
The data mappings. This traces data as it travels through the systems. What's the benefit of it? It answers the what-if question: if I make a change, what happens? And it enables research into data quality issues through this lineage. The way we've done it, we go to the reports, and the reports have fields. Then we have some databases. So these are the two databases here, and there's a mapping between those databases. The database mappings come from the ETL tool. The report mappings come from the design documents of the report: which field populated that attribute, and which attribute populated the field in the report. And we trace those back all the way to the XML system. Of course, I'm showing you just one leg here. But as this data hops between systems, it might go through five or six systems from the source systems all the way to the reports.
Reference data. This is, again, what we call codified data. We wanted to have a consistent way. When I talked about gender type, I want to have one place where I document all the allowed values of gender type, and then allow for multiple usages of those values. That's what we do in the reference data. This piece is actually integrated into our ETL. The ETL leverages this content and translates data as it moves from system one to system two, especially from legacy systems to the operational systems or to the warehouse. The ETL does not hard-code any of the translations. It goes to the repository and pulls the data from the repository to do the translations.
The way we do it, we identify a code set. One of those code sets could be gender type. We identify the codes used in a system, and we identify the descriptions associated with those codes in that system. We also identify the non-codified sets, like last name.
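The repository-driven translation described above, where the ETL looks codes up instead of hard-coding them, can be sketched like this. It is a minimal sketch, not the actual codes management system: the `CodeSet` class and the system names are hypothetical, and which of zero and one stands for male or female is an assumption, since the talk does not say.

```python
class CodeSet:
    """One business meaning per value, with per-system codes attached to it."""

    def __init__(self, name, business_values):
        self.name = name
        self.business_values = set(business_values)  # the master ("golden") values
        self._to_business = {}    # (system, code) -> business value
        self._from_business = {}  # (system, business value) -> code

    def register(self, system, code, business_value):
        # Systems may choose their own codes, but only for existing master values.
        if business_value not in self.business_values:
            raise ValueError(f"{business_value!r} is not a value of {self.name}")
        self._to_business[(system, code)] = business_value
        self._from_business[(system, business_value)] = code

    def translate(self, code, source, target):
        # Go through the business value, never directly from code to code.
        business_value = self._to_business[(source, code)]
        return self._from_business[(target, business_value)]


gender = CodeSet("gender_type", ["Male", "Female"])

# Hypothetical system names; the 0/1 assignment is an assumption.
gender.register("system_1", "M", "Male")
gender.register("system_1", "F", "Female")
gender.register("system_2", "0", "Male")
gender.register("system_2", "1", "Female")

translated = gender.translate("F", source="system_1", target="system_2")
print(translated)
```

Because every translation passes through the business value, adding a third system means registering its codes once, not writing new pairwise translation rules.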
So we identify the non-codified sets, the attributes like last name which do not have a finite list of values. We need that because sometimes we set up complex relationships across codified sets, and sometimes we combine them with non-codified data. I'll give you an example of this. Let's say we're writing some policy lines. Policy line could be one of the codified sets. Another codified set is state. But if I have a certain line which we can only write in certain states, and we don't write it in other states, we have to identify that relationship, and we identify it through a complex relationship. And we have a huge repository which serves this.

Yes, all of them is based... This is a separate system by itself. I'm just trying to explain what it does. I'm sorry, we also use it for different things. Sometimes you get a discount, let's say, in different states on different lines. That becomes a non-codified set, because a discount is a percentage. I'm just trying to be aware of the...

Now, we have one business value, we set up codes by application, and then we establish the mapping to the values through that. So for gender type, I manage male and female, but I allow system one to have zero and one, and system two to have M and F. And the ETL tool takes whatever it gets, a zero or a one, and translates it to an M or an F automatically.

We identify which one of those it is using. Exactly. In the attribute, we say this attribute is using this code set. I'm sorry, I'm just going to go faster here. I have some slides to go through.

So, the business reporting model. Here we have some reporting universes, and in every universe we have some elements. What we do is map those to the databases, which are normally based on materialized views or views. So we map the universe to a database table.
I'm sorry, to a database, and we map the elements to the fields. All of this is done automatically for us. We even generate the universe from the repository.

Runtime metadata. The rationale for this is that we want to have some information about the state of the data, I'm sorry, the state of the metadata in the repository. When was it last loaded? What is the profile of it? So all of those things which are not in the model, which happen at runtime, we load those into the repository. That's what I talked about with the SOR, so that's a piece of that SOR too. So I can give some information about the currency, and I can also provide some volume metrics about the metadata which we have in the system.

The way we do it is we identify the runtime metadata. We collect data profiling, so when I look at a column in the metadata repository, I can see the profile of that column. I can see the runtime jobs, so if there was an ETL job which ran last night, what was the outcome of that job? Did it fail? Did it run? What happened to the data? I can have some data about that. I can have some volume metrics. How many attributes do I have? How many columns do I have? This is probably for business reporting, sometimes for work. I'm going to have some usage metrics. Why usage metrics? Because of a lot of the services we open. Remember I said application automation, where the application can call a web service which taps into the metadata repository to get data back. So we can get how many times that web service got invoked from an application. Exactly. That's the runtime effect.

The other one is the AOR, the area of responsibility: how do we manage our data? The notion was that we wanted to be self-service. We didn't want to be in the business of managing, for each application, who has permission. If that's the case, then we're just going to be managing who has permission to do what.
So we wanted to delegate that functionality. The way we did it is we said, by area of responsibility, when you come into this tool, we'll assign one of you to be the owner, and then you manage who has permission to do what. It's a way to delegate that functionality to the users. So the way we achieve that is like this example. Again, this is marketing, and I have multiple systems here and multiple applications. All of those are part of marketing, so I can delegate that functionality. It's hierarchical in nature, and through this we can control the permission of who has access to the systems. I'll show you that in a different slide where we talk about security.

Versioning. This is one of the trickiest things which we've dealt with: how do you version your data? There are two types of data: as designed or as built. If we want to target the developers, we have to document the as designed. If I want to target the end user, I have to do it through the as built. In this example, these are the versioning options, and actually we do a flavor of both. So we say, okay, in production this is the metadata, in UAT this is the metadata, in development that's the metadata, and so forth. This is by location, where the data is. Or we can document it by releases. So maybe I have a claim system which has release one, release two, release 2.5, and release three, and I want to see all the data across all those releases. That's another way of versioning our metadata. And we apply this whether it's our codes or our metadata.
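The two ways of versioning described above, by location and by release, can be sketched like this. It is only an illustrative model: the `MetadataVersion` fields, the `loss_date` attribute, and the sample records are hypothetical, not the actual metamodel.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MetadataVersion:
    system: str
    attribute: str
    definition: str
    release: str
    environment: Optional[str] = None  # None: not currently deployed anywhere


def release_key(release: str) -> Tuple[int, ...]:
    """Sort '2.5' between '2' and '3' by comparing numeric parts."""
    return tuple(int(part) for part in release.split("."))


def in_environment(records: List[MetadataVersion],
                   environment: str) -> List[MetadataVersion]:
    """Versioning by location: what does the metadata look like in UAT?"""
    return [r for r in records if r.environment == environment]


def release_history(records: List[MetadataVersion], system: str,
                    attribute: str) -> List[MetadataVersion]:
    """Versioning by release: one attribute across all releases, in order."""
    matches = [r for r in records
               if r.system == system and r.attribute == attribute]
    return sorted(matches, key=lambda r: release_key(r.release))


records = [
    MetadataVersion("claims", "loss_date", "as designed", "3", "development"),
    MetadataVersion("claims", "loss_date", "as built", "1"),
    MetadataVersion("claims", "loss_date", "as built", "2.5", "UAT"),
    MetadataVersion("claims", "loss_date", "as built", "2", "production"),
]

history = [r.release for r in release_history(records, "claims", "loss_date")]
production = [r.release for r in in_environment(records, "production")]
print(history, production)
```

Storing both the release and the environment on each record is what lets the same repository answer either question without keeping two copies of the metadata.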
Auditing. This answers the questions: who changed this information, when was it changed, what was the change order for it? We are able to trace back any change which happened, for any reason. What we collect for auditing is changed by and change date, the normal standard things. We keep an audit trail of the history of the changes, and we have some change control records in there.

Security. This is what I was talking about, to allow people to use certain pieces without us controlling the permissions. The way we do it, we identify the application systems, we identify security groups, and we identify the people who belong to those security groups. Then through this association we come up with a role between the group and the application. So every group within this application has a different role: whether you can change the data, whether you can just view the data, or be an admin of that content. We do it at the system level.

Governance. What we collect is the business owner, data steward, metadata owner, and technical steward of the data, mostly used for verification, and when it needs re-verification. So if I collect the metadata and I say, okay, this is when we agreed on the term, in six months that term may change. So sometimes we need to re-verify this content. We collect that so we can call those people to make sure that the content is current.

Build versus buy. I looked at certain things. Initial cost: when I look at that, if I'm going to go and build that layer on top, whatever it is, then building is lower in cost. But remember, that's just the hood of the car, the engine hood. If you buy it, it's going to be more expensive to get it in, and I'm talking about just the initial cost, even to install it, the servers and all of this. If you buy certain pieces and
then you build the wrapper, it's still going to be high, but not as high as when you buy everything.

Running cost: of course the least is if you build it, but you're going to have some problems with the build. Production: the worst is if you build it, because it's going to take you some time to go and build it. If you need the repository tomorrow, the only option for you is to go and buy it.

Maintenance: it's cheap if you build it. If you buy it, maintenance is going to be expensive, because you're going to need those special consultants who can do the work, whereas with build, any developer can do it if it's custom built. Resources: the build is cheaper, the buy more expensive.

Features: when you build it, you're going to cut down on features. You're going to make it so it can just do the job. So normally you have to sacrifice some features if you want to go with the build. If you go with the buy, you're definitely going to get a lot of features. Flexibility: for the build, you have the flexibility to control all the source for your application. For the buy, of course, the flexibility is going to be less. You have to go through the vendor to implement your changes.

Polished interface: for the build, it's not going to be polished. It's going to be something that just runs. If you buy, it's going to be very nice, like when you get these products. Standards based: normally the build will not be following standards. You're just going to follow your own standards. The buy will most probably follow the industry standards. Customization and API: all of those will be able to do those. I'm putting that under buy pieces. I'm going to talk about it in a second.

So, conclusions. For us, we're thinking that the federated repository is the solution. As opposed to saying one repository, we're going to allow multiple repositories, federated in one metadata repository, and that is going to be the place
where we use it.

Second conclusion: buy whenever possible. I won't recommend you build anything if it already exists. It's just wasting your time. Build on top of owned products when you cannot buy a capability. So if you have a product which you bought off the shelf, and there's some feature you'd like to have, then instead of going to the vendor to do it, because they're going to do it in their system, it probably makes sense to build just that feature.

If you design for minimum specs and settle for it, you'll definitely realize that you're going to outgrow your implementation, and you're going to go into scope creep and all of those issues which happen when you go for the minimum standard. So when you design, put some future expansion in perspective, as opposed to building for the least requirements.

It is definitely okay to do agile work, but don't misuse the word agile. A lot of people say, hey, agile, I'm going to cut specs. Agile, at least from our perspective, says know where you want to go, but do it in steps. And it's very fine if you say, I want to be there, I want to build that metadata repository, and today I can only build this piece. So build them as you go, and then grow as time goes by.

Now, just put this as another user, because I want to keep on using this. Sure. Area of responsibility. And second, if your security model only goes to the system level, are you able to say what individual modified this, like a salary? For audit purposes, are you able to say this individual modified their own salary, or that of this person? Can you do that?
Yes, we do, through this, not through the security. Security means who is allowed to change the data, but who actually did it is in the auditing model. So the security is just control, and the auditing is where it's saved.

Sorry? All of it is automated, yes.

Yeah, so for SAP and Siebel, what we have is, again, some services which we expose. For codes, we send the set of codes to them. Sometimes they collect it through a service. They don't go into this model as is. They model it, and that's how we connect to them. Not sure if I'm answering your question exactly.

One question: besides the data, do you consider the process as part of the data? We're trying to get there. There are two types of lineage which we're trying to consider. One of them I called system lineage. That's probably what you're calling a process. The other one is the attribute lineage. We are now focusing on the attribute lineage, which is how data moves from one system to the other.

I don't believe we have adopted that model. The question is, have you adopted the ISO model? So the ISO model, if I'm not mistaken, you're talking about the international one, not the insurance standard? There are two ISOs. I'm sorry, no, we have not.

Any other questions? Thank you, everyone. Hope you enjoyed it.