Thank you, Cathy. And that's my day job. For the last three years, they've had me leading some web application development projects, too. So, long story. But anyway, welcome, everybody. I know we're all anxious to get to the reception. It's been a long day, your head's probably full of information, and you want to get out of here. So I'll try to make this light and breezy. And cruelly, the exhibits opened 10 minutes before this presentation was scheduled to end, which compels me to talk very fast and to ask you not to ask any questions. If you feel a question coming on, lie down until it goes away.

The other thing is, I'm a late addition to the program. I was called up from the minor leagues when one of the speakers got injured, or they put him on waivers or something. So this isn't on your CD, but it is on the handouts website, although I've tweaked it a little since then. We'll get it updated in the next couple of days, so by the time you get home, if you download it, it will be this version that you're seeing now.

So I'm going to briefly tell you about the company, the problem we were trying to solve, and the solutions. This is a case study of a homegrown solution set. It was homegrown because when we started on this about eight years ago, the tools market in this particular problem space was not very robust. So today I'm really describing the problem, the approach we took, and the solutions, which you might be able to use as a lens through which to evaluate MDM solutions. Actually, some of the metadata tools are encroaching on this problem, and some of the MDM tools are encroaching on this problem. Maybe somebody has the total solution; maybe you need a combination of them. Then we'll describe where we went from there. We were working originally in the analytical space, trying to solve an analytics integration problem. But as we built our service-oriented architecture over the last few years, we leveraged the same tool set and the same approaches there. So we'll describe how that happened, talk about the whole convergence, and then some big, profound takeaway messages. Or they may be very simplistic takeaway messages. But hopefully they'll be useful anyway.

So, about Harvard Pilgrim: we're a nonprofit health plan in New England, serving something over a million members now. We've been around since the late 1960s, when we started as a community-based HMO plan, and we've been ranked for eight years in a row as the top health plan in America. I work in Corporate Information Management, which is a teeny tiny division of IT that sits at the intersection of business and IT. We're the bridge. So we work in analytics, business intelligence, data warehousing, data governance, data quality initiatives, and data architecture, which is supposed to be my day job.

Technical environment: we have a heavy Oracle footprint with our Oracle E-Business Suite, and we use Oracle transactional databases for other stuff. The warehouse is on Teradata. And for the last few years, we've been working on migrating from our old legacy business-in-a-box monolithic system to a component architecture integrated through SOA, which has not been easy to do. I had solid brown hair when we started out on it, and look what's happened. It's an analytics-intensive environment. So we have a lot of data analysts working on complex problems, the typical kinds of insurance problems as well as medical informatics problems, and they're very technically oriented.
And in fact, we've just reorged a bunch of them under a new VP of medical informatics to make that function even more robust.

So let's start by level-setting on what we mean by reference data, because it isn't always clear, and some people describe it differently. I'll borrow from Malcolm Chisholm; some of you may have heard Malcolm's presentation earlier today. Way back in 2001, he wrote an article for TDAN in which he said reference data is any kind of data that is used solely to categorize other data found in a database, or solely for relating data in a database to information beyond the boundaries of the enterprise, which is a very thorough and crisp definition. Another quote: reference data is the core of your enterprise vocabulary, what you use to describe and define your business and give useful meaning to your data assets. That was said by me, just now. And you were there. So I think that the core of your enterprise semantics is in your reference data. It's in your metadata and your reference data, and the key function of it is to give useful meaning.

Malcolm has also handed out in the past, I don't know if he will this year, his framework for enterprise data classification, which I kind of like, actually. It's useful in data governance discussions to talk about what layer of the cake we're operating in. You don't have to read the eye chart; it's the large concepts that are important. At the top of the stack, he has metadata, our all-important data about our data. At the bottom of the stack, he has the detailed transaction stuff, the stuff that flows in and out of our enterprises on a daily or minute-by-minute basis. And in the middle, he has this thing called master data, at the top of which he clearly delineates and segregates reference data, which a lot of people don't, especially in MDM strategies. Reference data is sort of subsumed under other domains, and it just comes along for the ride. I think it deserves its own holistic approach and its own care and feeding, and that's one of the takeaway messages I hope you take home today.

The other thing I want to distinguish are the homonyms: reference, with a c-e, and referents, with a t-s. The referents are important because they are the global identifiers that uniquely identify an individual, a product, a customer, a location, et cetera, et cetera. They are often included under the umbrella of reference data, but I'm explicitly segregating them as a separate problem space. So this is about the reference data that categorize and semantically enrich your other data.

Malcolm sort of agrees with that in another piece of that handout he gives out. It's a big tabloid thing, so you can't show it on one slide. But he categorizes reference data as: external reference data, things like postal service codes, country codes, ISO standards, that kind of thing; data structure data that defines subtypes, types of transactions, that kind of thing; taxonomies and classification schemes, which are actually a big driver of the work that we did in this space; and rates, values, constants, other kinds of things that are sort of intrinsic in the world but need to be explicitly described in your data. So that's the realm of reference data that we're discussing here.

So the initial problem we were trying to solve: in 2005, we put into production the first release of our Enterprise Data Warehouse, which was a very different paradigm from the legacy data marts.
Of course, what we were doing was homogenizing data from many sources into a common third normal form database at the base, and we denormalize that into presentation views. But the base model is 3NF. And now, of course, as we're replacing the old platform, a lot of this is decomposed into multiple sources. And this piece is imminently to be replaced and will also get homogenized in here.

Now, analysts using the legacy marts did a lot of their own aggregation and categorization of data: grouping procedures and diagnoses, instead of using all the detailed categories, rolling them up into higher-level, analytically useful groups; rationalizing multiple source system codes over the course of the life of the legacy system. Dependent relationship types, for example: you have a subscriber to health care, and you have all their dependents who are also on their contract, and we somehow evolved more than 20 types of those relationships. We rationalized those down to seven. And now we have to do that rationalization all over again for the new system, so that's maintained by the business as well. Categorizing benefit plan designs across six dimensions. And the way this was maintained is that these were kept in local SAS data sets or Excel spreadsheets or comma-delimited files, all the usual suspects. And they would send a monthly service request to IT saying, please update this little reference data set that I'm responsible for.

So here was the process. Business sends a request to IT. And if you paid attention to Terry Mulholland's slides this morning, either we share the same clip art library or my business guy once worked for the IRS; I'm not sure which. But in any case, business sends a request to IT. The load cycle happens. An error happens, and depending on the severity of the error, pagers go off in the middle of the night and so on, and IT has to get back to the business guy and say, fix this. And the cycle goes around again. This is what we unaffectionately refer to as the churn. This, I think we can all agree, is not good.

The parallel problem, or concurrent problem, is that code sets were often the private property of a department or even an individual analyst, kept on the desktop or in some offline data mart. This inherently diminishes the value of the enterprise warehouse, and it means that different departments are producing different results for their constituents, which is also not a good thing.

So the goal was to manage enterprise sources of truth for this reference data: maintain the master data in the transaction environment and mirror it to the EDW; make it production grade, that is, put the right kind of controls and backups and validations and auditability around it; make it accessible across the enterprise; and identify data stewards tasked with the responsibility for the accuracy, timeliness, and quality of the reference data, also ensuring valid entry at the time of entry, which is a point we'll get back to in a moment.

So, the solution set. The first piece is totally unprofound: we made a schema. We designated an Oracle database schema; this is going to be the Corporate Reference Center, a three-letter acronym, so it works. And that's where we're going to house these reference data sets, in relational tables rather than offline. And we set up a bunch of principles about what goes into the CRC. It's data that doesn't originate from an application, so it's really extrinsic to any system of record. It's relevant to the enterprise.
Harvard Pilgrim is the authoritative source. The industry standard data we keep in a separate place; in fact, we have multiple separate places, depending on the system that needs it most, and I'll talk about that later on when we get to SOA. But industry standard code sets like ICD-9 codes and procedure codes and SIC codes and all that kind of stuff live elsewhere. This is for stuff we're originating inside our enterprise, maybe a classification scheme or one of those taxonomies. It has an identified data steward, and the DDL goes through a gatekeeper, a committee of which I'm the chair, and if you can't say who the business data steward is for this, we're not putting it in until you go find that person. And it has one or more consumers in the enterprise. In other words, it's really enterprise grade; it's not a private thing.

So obviously, this has some governance requirements. If you're going to standardize 20 dependent types down to seven, say, you need some cross-functional dialogue among the stakeholders to say what the seven values, or however many it turns out to be, are that we're going to agree on. And this was an ad hoc process, also, that we instigated in Corporate Information Management, which was: round up the usual suspects. What movie is that from? The Usual Suspects. No, round up the usual suspects. It's not a Western. Major Strasser has been shot. Round up the usual suspects. Casablanca. OK, it's a classic line at the end of Casablanca. No, the next line was, I think this is the beginning of a beautiful friendship. But anyway, we'll have more movie trivia at the end of the presentation. But actually, our early, kind of nascent efforts in our data governance journey were partly around the reference data problem.

Now, this slide reduces to four tight bullets about a 10-year journey in implementing data governance at Harvard Pilgrim. It's a story of bottom-up meets top-down. Historically, we would run around gathering stakeholders who might have input around certain data pain points, largely as an ad hoc, quasi-formalized process. We actually created standing committees with monthly meetings and that kind of thing, or we just gathered people when a new pain point popped up. So it was very much the round-up-the-usual-suspects kind of approach. But in 2008, when we adopted our five-year IT replatforming strategy, moving off that old monolith into the new SOA environment, we actually got buy-in from the senior executives of the company that enterprise data management principles were vital to making this succeed, that data governance was one of those principles, and that data governance was a function of the business. So now we had the top-down imperative to formalize what we had been doing in an ad hoc way.

Yes. So a lot of the pain wasn't so much around the reference data. It was certainly useful to formalize groups with input around how we were going to rationalize and harmonize reference data when consolidating it from multiple sources into the data warehouse. But our data quality issues were more in the operational environment. Our provider data, at one point, was really a mess; in the late 1990s, we couldn't pay providers right. It was the result of a merger of two health plans, together with multiple GL systems and multiple claim processing systems. And basically, the company went bankrupt in 1999.
We went into receivership in 1999, which concentrates the mind wonderfully and says we'd better get formal about our data management practices, which is when we embarked on developing an enterprise logical data model, forming data quality committees around particular pain points, that kind of thing. We also do a lot of data exchanges with third parties, and the health care payer sector is primitive relative to other industries in standardization of data exchange; everything was a one-off and always had problems. So we had committees around pharmacy data quality, behavioral health data quality, because those are benefits that are actually managed by third parties, that kind of thing. Does that answer your question?

About 160 now. When the CRC was first conceived, there were something like 34 code sets that they were worried about, that they initially identified. But the CRC has now grown considerably.

So we have the executive imperative. Now we can go get commitment of resources from the business to support the data governance function, with end-to-end representation, data producers and data consumers, so you have the operational folks and the analytical folks sitting at the table together, which was novel, and with consistent charters and responsibility, or I could just as well say accountability, for enterprise reference data within their particular domains.

So we have the schema where we store the stuff. We have the people and processes for governing it. What tool set do we give them? As I said, when we embarked on this many years ago, there wasn't really a tool that did what this thing I'm about to describe does, which is why we built our own. But today, you very well may not have to. So we created this thing called the Reference Table Utility, also a three-letter acronym; that's very important in anything you name. And this is what we call the loading dock and inventory control for master reference data. It will accept data uploads from a spreadsheet or a flat file, or you can type data directly into a table through an edit screen in the application. It has role-based access, so there are data custodians and data stewards; the steward has the ultimate responsibility, and a custodian can maintain but not publish the data.

So it maintains the data in a staging environment, which is described better in this picture. You have the published database, which is what external systems or ETL processes draw from. RTU has a workflow where you check out the data, so when you go to RTU, it shows you the status, and if it's checked out, it means somebody is working on it now. The staging table is what RTU actually operates on. You can then download it to an Excel or ASCII file, maintain it with some desktop tool, and upload it. And when you click the promote button, that is when you have officially published the data to the enterprise. You can actually skip the download/upload piece, because as I said, there's a built-in editor; for one-off entries or small table maintenance, that is perfectly good. If you've got a somewhat larger data set, Excel is a good tool to keep it in.

So the application itself has this data-driven design where the only things RTU really knows about are user roles, the workflow state a particular object is in, and what actions you're allowed to perform based on that workflow state. Otherwise, all the other instructions are in metadata, either the native physical metadata or metadata stored in RTU's own application control tables.
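As a rough illustration of that data-driven design, the application control metadata might look something like the sketch below. This is a sketch only; the real RTU control tables aren't shown in the talk, so every table and column name here is hypothetical.

```sql
-- Hypothetical sketch of RTU-style application control metadata (not the real schema).
-- Physical names, types, and keys come from the database catalog; tables like these
-- would hold the friendly names, presentation instructions, and validation rules
-- layered on top of the catalog metadata.
CREATE TABLE rtu_managed_table (
    table_id         INTEGER        PRIMARY KEY,
    physical_schema  VARCHAR2(30)   NOT NULL,      -- e.g. 'CRC'
    physical_name    VARCHAR2(30)   NOT NULL,      -- e.g. 'CUSTOMER_TYPE_STG'
    logical_name     VARCHAR2(100)  NOT NULL,      -- friendly name shown to stewards
    workflow_state   VARCHAR2(20)   NOT NULL       -- e.g. 'AVAILABLE', 'CHECKED_OUT'
);

CREATE TABLE rtu_managed_column (
    table_id         INTEGER        NOT NULL REFERENCES rtu_managed_table,
    physical_name    VARCHAR2(30)   NOT NULL,
    logical_name     VARCHAR2(100)  NOT NULL,      -- friendly column label
    display_order    INTEGER,                      -- how to present the data
    edit_control     VARCHAR2(20),                 -- e.g. 'TEXT', 'DROPDOWN'
    validation_type  VARCHAR2(20),                 -- e.g. 'RANGE', 'LIST', 'SQL_LOOKUP'
    validation_rule  VARCHAR2(4000),               -- rule detail, e.g. the lookup SQL
    PRIMARY KEY (table_id, physical_name)
);
```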
So it reads the semantics in the database catalog to get things like physical names, column names, types, lengths, primary keys, all that good stuff. And then it augments this, or an administrator augments this, by adding a logical database name, logical table and column names so that users see something more friendly, instructions about how to present the data, and validation rules about the data, which we'll describe in more detail in a sec.

We also established some standards about how you model reference data under RTU maintenance. Here's the staging table, and it has three columns. It's a made-up example: customer types. They have a customer type code, a description, and a long description. Those are all business-meaningful data, and if you had a spreadsheet, those would be the three columns in the spreadsheet that you upload to RTU. When you publish it, it then goes into this publish table, which has a lot more junk in it. First, the primary key, rather than being the code, is some integer value, just a dumb machine-assigned value. And then there are a whole bunch of date columns. So the business-meaningful key, the primary key of the staging table, becomes the alternate or natural key in the publish table.

We're practicing here the rules of no physical deletes and no overwrites. So any time some data changes, we're end-dating the old version of the row and inserting a new row, a typical date-versioning kind of policy. This gives you history and auditability, and it also, incidentally, supports rollback, because you can specify a point in time, and at that point in time only one row was effective for a given natural key. Do people understand how date versioning works? Do you want to see an example? I need feedback. Yeah, very good.

OK, so we'll just flip through these animations. I just want to point out, this is how we represent infinity in our systems, which means we have an inherent Y10K problem. We'd better get to work on it now. I know it seems like a long way off, but that was the attitude with Y2K also. You know, someday they're going to look back on those nuts from the 21st century and say, why did they saddle us with this problem? So you insert a new row: you end-date the previous one and effective-date the new one. And then when you finally come to delete something, after 3/31 it's logically deleted, as if it never existed. But you can still get at it; downstream users are just looking for rows where the end date is greater than today. It gives us the ability to do point-in-time querying and to do that rollback function as well. So we can look at a bunch of related data and find the reference data associated with it at the point in time that we're interested in. Exactly right, yeah.

So, data validation and edit controls. RTU treats every object as an independent thing, and it kind of has to, because if you enforced a lot of constraints in the database, you would have real complexity about the sequence in which a user could load certain things, that kind of stuff. So everything is its own thing. But we can define constraints, relationships, and validity constraints in the metadata maintained by RTU. Those constraints, in fact, can be used with other instructions by RTU through the data entry screen to restrict entry through list boxes or other mechanisms to valid values.
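To make the staging/publish and date-versioning pattern just described concrete, here is a minimal sketch using the made-up customer type example. The surrogate key, the specific date columns, and the sample values are stand-ins for illustration, not necessarily what RTU actually generates.

```sql
-- Staging table: only the business-meaningful columns the steward maintains.
CREATE TABLE customer_type_stg (
    customer_type_cd  VARCHAR2(20)   NOT NULL,     -- business (natural) key
    description       VARCHAR2(100),
    long_description  VARCHAR2(500)
);

-- Publish table: machine-assigned surrogate key plus date-versioning columns.
-- No physical deletes, no overwrites.
CREATE TABLE customer_type (
    customer_type_id  INTEGER        PRIMARY KEY,  -- dumb machine-assigned key
    customer_type_cd  VARCHAR2(20)   NOT NULL,     -- alternate / natural key
    description       VARCHAR2(100),
    long_description  VARCHAR2(500),
    effective_date    DATE           NOT NULL,
    end_date          DATE           NOT NULL      -- 12/31/9999 means still in effect
);

-- A "promote" of a changed row: end-date the current version, insert the new one.
-- (A logical delete is just the end-dating step with no replacement row.)
UPDATE customer_type
   SET end_date = DATE '2011-03-31'
 WHERE customer_type_cd = 'EMP'
   AND end_date = DATE '9999-12-31';

INSERT INTO customer_type
VALUES (1042, 'EMP', 'Employer group', 'Employer-sponsored group customer',
        DATE '2011-04-01', DATE '9999-12-31');

-- Downstream users just look for rows whose end date is greater than today...
SELECT * FROM customer_type WHERE end_date > SYSDATE;

-- ...and point-in-time queries (or a rollback) pick the one row effective on a given date.
SELECT * FROM customer_type
 WHERE customer_type_cd = 'EMP'
   AND DATE '2010-07-01' BETWEEN effective_date AND end_date;
```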
So we have some simple validation constraint types like numeric or date ranges, constants, lists of values, defaults. But more powerfully, we can do a SQL lookup to other tables. These rules, in turn, get represented in XML and sent off to the validation engine. The validation engine is actually abstracted into a service, so it's quasi-independent of RTU. We've defined some XML, based on some patterns we found on the internet, for how you'd represent data validation rules in XML, and we operate it, actually, as a service.

All this is administered through a very simple user interface. So here, we've defined an edit control type: I'll present this particular column, which is for product line code, in a pull-down menu. We're going to use a SQL lookup to this schema, this table, and this column, and if there's an error, we can specify a custom error message. This part of the screen is all Ajax, so depending on what I select here, different things refresh here, and I can select one of many different validation types for this thing. It's disconcerting to talk when you have a question waiting here. Yep. The enterprise validation engine right now is the RTU validation service, but the types are defined in RTU. They're extensible; they're defined in an XML structure in RTU. We're working on that; I'll get to that in a later slide. So see, if I turn my back to them, I don't even want to click the button.

So here's how it's presented on the data entry screen. This screen is rendered dynamically based on the metadata about the table. And there's our drop-down list for the product line code. Notice it has the nice English name that I gave it, and it's showing you the translation of the code, which is actually in a separate column from that lookup table. Here's something cute we're doing. Do people hate drop-down lists of the US states? If you're in Wyoming, you're really screwed; you've got to scroll all the way down to select it. But everybody knows the abbreviation of their state, right? So you type MA, and we have it go look it up, do a little Ajax-y thing, and echo back that you meant Massachusetts. And say you had some validation errors: if you're uploading, you'll get this little red thing on any row that had an error on it, and you can actually filter down to just the error rows here and identify all the errors. We do a little trick where we actually number the first row of data as row two, so if you have your data in a spreadsheet, the number here matches the one in your spreadsheet.

So now we've evolved this tool over time to do all kinds of wonderful things. It's basically the Swiss Army knife for maintaining any code set, any data set that doesn't originate in an application system. We use it to maintain tables of application rules, like inclusion and exclusion lists for reports or extracts, and other rules, data that get assembled into dynamic SQL queries for analytics. So when we're running certain analytics and the rules about them change, we can adjust some data rather than adjusting program code. We've made it database neutral, so we connect to Oracle and SQL Server, and we're doing a Teradata connector next week, actually. And multiple schemas: as all these other use cases for RTU came in, we wanted to keep the CRC pure, just for enterprise reference data, so we can now configure what schema as well as what database server it goes against. Yes? No. If it originates in the suite, it stays there.
This is being used by other systems. That's right. We don't violate that sacred ground.

We can do complex validations now through a scripting language called Groovy. Groovy is an open source scripting language that you can actually write whole applications in; it's Java-like. It runs as a kind of standalone service in the way we're using it: we pass the script at runtime to the Groovy engine, which compiles it and runs it on the fly. And then we can do things like, if column A equals such-and-such and column B equals the other thing, then column C must be between this and that. We can do lookups with multiple SQL queries to multiple places. We can even do some data derivation or data cleansing. So we can say to a user, upload a member number and an effective date; we'll go validate that that member was effective on that date, and based on that point in time, populate some other values so that you aren't tasked with, A, entering them, and, B, possibly getting them wrong. So it becomes really wildly flexible.

We introduced the concept of partitioned virtual tables within a table. What this allows you to do is have many data stewards, each responsible for a subset of, say, a classification scheme. They can all share the same physical structure, but each of their data sets looks like an independent table to them, and their role permissions are on the partition, not on the table. And we made versioning optional. So you can have an audit table, which just has create and update dates and the last user; you can have a versioned table; you can have a partitioned table; and it can combine any of those types. RTU just knows how to handle all the date adjustments and the population of those system fields and that kind of thing. So like I say, it's a Swiss Army knife right now.

And therefore, it has some ROI. We've got more than 200 users. The 540 number, from when I put that in a month or so ago, is now out of date; we've had at least eight or ten new tables go into production over the last month. And because of the partitioning, there are around 1,800 discrete data sets represented in this. So you can just intuitively sense that there must be ROI around this tool. And in fact, by eliminating the churn, for all the fine-print reasons down here, like a drug advertisement, which you can read on your own, we estimate we save about $100,000 a year alone. We spent about $85,000 on version one of this thing, which means it paid for itself in under a year. And obviously, you get all the implicit ROI from empowering business stewards, reducing ETL and application maintenance costs, ensuring data quality and analytical consistency, and supporting our ongoing master data strategy.

So let's take a quick trip through SOA-land. My friend Dave McComb is sitting in the room; six, seven years ago, I learned a lot of this stuff from him, and now we're actually doing it. Here are some of the major objects in our solar system. I'm only showing you the really big ones; each of them has many moons of subsystems revolving around it. And there's also an asteroid belt of all the other little applications that exist in any enterprise, in desktops and Access databases and so on. So just wipe the asteroid belt from your mind and behold these wonderful heavenly bodies. This, at an extremely high level, is how we've federated our approach. Now, if you enroll a member, you need to know what customer that member belongs to and what products that customer has said that member can enroll in.
If you have a claim pricing system, it needs to be aware of what contract you have with the provider it's pricing a claim for. If you're adjudicating a claim, you need to know a whole bunch of things. And when you're federated, how do you do this effectively, efficiently, with high quality, and so on and so forth? Of course, you set up a wonderful enterprise message bus, or enterprise service bus, where nobody is talking directly to its neighbor or needs to know anything about its neighbor's data structure; each system just needs to know how to interface with the bus. For folks who were at the Lightning Talks last night, this is the kind of canonicalization example that Dan McCreary was talking about.

So, service-oriented architecture: it's about loosely coupled, distributed systems integrated via web services. It uses prescribed interfaces, which are in the form of XML messages in canonical form, which you can just think of as a standard that is independent of any particular application system. The internet protocols are canonicals: things like FTP and SMTP are standard forms that are independent of the particular systems they're communicating between. This is the same sort of idea. Each system in this process is a black box with respect to the others, which means that a data consumer is insulated from changes in the data producer. In fact, we are going to swap out the provider system of record over the next year and a half. If we've done our work right, the consumers of provider data today will be largely, and I hedge just a little bit there, largely unaffected; the provider canonical will remain intact. So the emphasis really becomes specifying the standard interface rather than all the other details about the data exchange.

This poses a bunch of new challenges. It means the end of silos. Application owners used to operate happily in their own silo, but they are now sharing their data with the enterprise, where they are the system of record for a particular domain. A couple of years ago, I almost reached across the table and strangled a project manager. I was trying to convince him that some data that used to live under a web app we were rebuilding as a SOA-enabled app, where we were getting rid of the underlying database, now needed to be in the CRM system of record, because it was customer data. He said, well, it's really a system in service to sales. And I said, no, it's a system in service to the enterprise. So there's a whole mental model shift that happens when you go to a SOA environment. Systems are functioning in an ecosystem. We used to confront these data integration challenges way downstream, in the data warehouse, which is where a lot of our data management and data integration disciplines originated. Now we're moving that upstream to the operational environment, which means establishing your master data sources becomes critical, the need for data governance becomes extremely acute, and you require system-neutral descriptions and representations of data, which includes your enterprise reference data. RTU to the rescue.

So as an example, and yes, we could get into a whole separate, long discussion, that's another presentation, on how you design SOA services and some lessons learned from the way we did that. In fact, we swung the pendulum very far in one direction, and we're now bringing it back to a more sensible middle.
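As a rough sketch of what one of those system-neutral representations can look like on the data side, a cross-reference table in the CRC can map each source system's local code to the enterprise standard value. All of the table, column, and code names below are hypothetical, made up for illustration.

```sql
-- Hypothetical sketch: an enterprise standard code set plus a cross-reference
-- that translates each source system's local value to the enterprise value.
CREATE TABLE dependent_type (
    dependent_type_cd  VARCHAR2(20)   PRIMARY KEY,  -- enterprise standard value
    description        VARCHAR2(100)
);

CREATE TABLE dependent_type_xref (
    source_system      VARCHAR2(30)   NOT NULL,     -- e.g. 'LEGACY_CLAIMS'
    source_value       VARCHAR2(20)   NOT NULL,     -- the code as that system sends it
    dependent_type_cd  VARCHAR2(20)   NOT NULL
        REFERENCES dependent_type,                  -- validated against the standard
    PRIMARY KEY (source_system, source_value)
);

-- Semantically equivalent source codes all land on one agreed enterprise value.
INSERT INTO dependent_type VALUES ('DP', 'Domestic partner');
INSERT INTO dependent_type_xref VALUES ('LEGACY_CLAIMS', 'LIFE_PARTNER', 'DP');
INSERT INTO dependent_type_xref VALUES ('ENROLLMENT',    'SP_EQUIV',     'DP');

-- Translating a source value as it crosses the bus (or lands in the warehouse):
SELECT x.dependent_type_cd
  FROM dependent_type_xref x
 WHERE x.source_system = 'ENROLLMENT'
   AND x.source_value  = 'SP_EQUIV';
```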
So where reference data is concerned in the enterprise message bus arena, here are code values for gender as they might be in three systems: male, female, unknown; 0 and 1. Oh, true story: we just bought an analytical package from a prominent vendor who will remain nameless that is actually stitched together from three products they acquired. You feed a bunch of claims data into it, it does some magical things, and it requires the gender of the patient. In one system, the codes are M and F, and in the other system, the codes are 0 and 1. Same product suite, same vendor, two different code domains. And here is another example: 101, 201, 301, da, da, da. There are some health care organizations that have eight gender codes, because in addition to male and female, they have male transitioning to female, female transitioning to male, male formerly female, female formerly male. Things that are probably important in a clinical setting or a clinical problem space, but maybe not important to other parts of the enterprise. So what do you do on the message bus? Maybe you say, well, these aren't important to everybody; we're going to collapse them down to male, female, or unknown, which is what most people are worried about. Or there may be another clinically oriented system that needs all of this detail, and you can still carry it on the message in its own kind of private area, or you might develop a point-to-point interface with that particular consumer. But the point is, you've got to have some governance discussion about what you're going to do on the message bus. This isn't just a data warehouse problem anymore; it's now an enterprise operational problem.

Another similar example: here are descriptions of three different codes for dependent type. Domestic partner, life partner, spousal equivalent. Semantically equivalent terms, just different words. What are we going to agree on for our enterprise standard? So that's a long way of saying: code sets need standardization, descriptions and names need standardization, and you need to define and maintain the translation rules as you push these codes and descriptions across your enterprise bus.

So all this stuff now converges in this picture. You have coded data, and all of those codes need to be translated to an enterprise standard, or decoded, that kind of thing. And I'm oversimplifying, because not all of these are just code and description. Some are code, description, long description, really long description; some have other metadata attributes. So you can't design something totally generic in this problem space, but for a large portion of it, you can get highly generic. So you have a couple of systems sharing the enterprise bus. This is work in progress, building an enterprise code lookup service. Remember I mentioned the other data stores for industry standard codes? Those will be comprehended by the code lookup service, as well as our good old friend, the CRC, for doing things like saying, how do I take eight gender codes and collapse them to three? And that means the already established data governance process and the data stewards, who have the tool, RTU, come into play here. So we've now taken a tool set that originated in the analytics integration space, and we're leveraging the same tools and processes in the operational space.

So in conclusion, I'm going to end quickly, and then we can have time for some more questions. Some takeaways, the general points: Reference data is an essential part of a master data strategy.
It shouldn't be assumed to be taken care of as you're working in the customer or product or other domains; it should have its own holistic kind of approach. As you're looking for MDM tools, look for things that will empower the business experts with the processes and tools to govern and maintain this data. Don't make it an IT problem, and avoid churn and rework. Make sure you have validation tools and processes in place. RTU operates at the convergence of master data, data quality, and governance; they really are all related, and so should your solution mindset be as you look for a solution in this problem space. And also, if it hasn't been explicit, your master data hub doesn't necessarily need to be a physical thing. You can virtualize it, as we've done in this picture: the Oracle CRM is our customer system of record, and the enterprise bus virtualizes the master data hub for anybody who needs authoritative customer data, and similarly now for reference data.

And then finally, I borrowed this picture from him; he does this 24-month road map to Nirvana. He shows master reference data as one of the entry points in an MDM strategy, sort of culminating with analytics integration. But we really moved this box way down here and shuffled some of these others. The only point being, and David agrees with us, that these are all valid destinations on your MDM journey, but your particular road map might take you there in a different sequence. And with that, I'll conclude and take questions. We have five minutes. Please.

Sure. We would have a cross-reference table that essentially provides that map: system A value, original value, translated value, with the translated value validated against the enterprise code set. And we could make a call out to system A for this value to make sure that it actually exists in that system. He could theoretically, but more typically he's going to be a warehouse user, where we've already done the translation to an enterprise standard for him.

So for any code field, we pretty uniformly do VARCHAR(20). And don't you know, we had a new system introduced that has a sort of long, keyworded code that exceeds that, and we had to change it one time. But typically we do VARCHAR(20). And for descriptions, VARCHAR(100), because we figured that will accommodate most everything. Was that short or long description? No, long description. I mean, we may do VARCHAR(250) or VARCHAR(500) or something. Numbers as descriptions? Not typically. We will use numbers; we will often use meaningless codes, because nobody really looks at the code anyway. They're interested in its meaning. There was one other thing I was going to say about that, which just flew out of my head, but it'll come back to me. Dave?

So you thought that would break down the OCX turnhole? No, that's all that other stuff that people glommed on to RTU to use as the Swiss Army knife. Well, not directly external, but, well, so for example, we collect metadata about our data warehouse and ELDM and everything else in Excel spreadsheets. All of that is loaded to our meta model through RTU, loaded and validated through RTU. So that's a gazillion things right there. All the rule sets for HEDIS measures that we do annually for NCQA, those are all represented as rules that are used to generate dynamic SQL, and there are hundreds of those.
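A minimal sketch of the flavor of those data-driven rules: a rule table whose rows drive the filtering of an extract, so a steward can change the rule data in RTU instead of IT changing program code. Every name here, including the claim_extract table being filtered, is hypothetical.

```sql
-- Hypothetical sketch of a data-driven rule set: one column names the report or
-- extract, another lists the customer numbers to include or exclude for it.
CREATE TABLE report_customer_filter (
    report_name   VARCHAR2(50)  NOT NULL,
    customer_num  VARCHAR2(20)  NOT NULL,
    include_flag  CHAR(1)       DEFAULT 'Y' NOT NULL,  -- 'Y' include, 'N' exclude
    PRIMARY KEY (report_name, customer_num)
);

-- The extract query is assembled against the rule data rather than hard-coded,
-- so changing the rule is a data change made through RTU, not a code change.
SELECT c.*
  FROM claim_extract c          -- hypothetical extract table
 WHERE EXISTS (SELECT 1
                 FROM report_customer_filter f
                WHERE f.report_name  = 'MONTHLY_EMPLOYER_EXTRACT'
                  AND f.customer_num = c.customer_num
                  AND f.include_flag = 'Y');
```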
So all these report drivers for customer-specific ETL things, data extracts that need to be filtered or aggregated in a certain way, all those rule sets are maintained through RTU. No, it depends on what the rule is. It can be a set of columns that can be used for dynamic SQL. It can be two columns, one describing the particular report purpose and one a list of customer numbers to include or exclude from that particular report, that kind of thing.

Sure. Yeah. We didn't do the latter so much, although we did identify more things we wanted to create cross references for, to rationalize. But yeah, the 34, or whatever the number was, was the initial inventory of all these independent code sets, maintained mostly by finance analysts, for the particular application we were first building on top of the EDW, which they knew intimately. So they said, oh, here are all the little data sets we keep. But then as we started building out the EDW and discovering more of them, it grew and grew. Now, as we're implementing more robust transactional systems to replace our old, somewhat limited mainframe thing, we should be able to source more of that from these new systems of record than we were able to get from the old one. So I have a feeling that some of these cross references will go away, because we have more robust data coming from the new applications. Or we may augment the cross references we use in our rationalization or categorization schemes to include these new reference domains.

Time for one more, yes? So by the system, do you mean the warehouse, specifically your CRC? No, the CRC is the place where you maintain what you want as your enterprise standard, plus one or many cross references of the data coming from your other systems that you need to translate to the standard, as well as cross references of source data into a more aggregated categorization scheme. That is where the harmonization occurs, and then the result of that gets reflected in places like the enterprise warehouse and other places.

So, great. I'll be around the rest of the conference if you have any more questions. Thanks for coming.