We know what elegance is. You know elegance: if you're going to get up in the morning, a simple alarm clock is typically going to be a lot more effective than a Rube Goldberg contraption. What we want to talk about today is why we're so interested in complexity and elegance, how complexity contributes to the cost of your systems, and what we can do about it. To get into that we have to talk about what complexity is, where it comes from, why you currently have it, and how to get rid of it. We've got some examples of places we've been where we were able to get to much simpler systems than they'd had in the past.

I think there are two things going on with information systems that contribute to complexity. One of them is scale: as things scale up, they typically get more complex. But that's not the only thing. The tricky bit is that as things scale up and as they interact, that's where the real complexity comes from. We're going to go through, area by area, which things seem to be more subject to this kind of interaction complexity and which things scale just fine.

We've been studying this for probably a decade now, and some of what I'm going to say has come from just estimating information systems. How is it that this accounts receivable system costs a hundred times more than another one? They both look like accounts receivable systems. They have a lot of the same functions, but there's something different, either in the organization or in the particulars, that makes one way more complex, far larger, and so on.
That's what we're going to unravel here a little bit.

In traditional economics, if the cost goes up in proportion to something scaling up, we say it scales linearly: twice as much output for twice as much input. In the real world a lot of people believe there is an economy of scale. When you manufacture things, most people would tell you that every time you double the number of things you manufacture, the cost per item comes down by some predictable small percentage. But some things have a diseconomy of scale: as you double the number of them, the cost per unit goes up instead of going down. That's what we want to explore here. Why does that plague some of the things we do in information systems and not everything, and how does that translate?

So here are the seven or eight things I'm going to talk about, and how each is affected by scale and complexity. We've been hearing a lot about big data. One question is whether big data is, per se, complex. Actually, in most cases, at least most of the ones I've looked at, it's just the opposite. Big data is typically very simple. Whether it's sensor data or whatever it is, the data itself is pretty simple and it scales up easily. That's what "big" is.

We're going to go through a thought process here. Think to yourself whether these parts of an information system scale linearly, or have economy of scale, or diseconomy of scale. For instance, and here's how to think about this: if you had a database with a million records in it, is it
you know, a thousand times more expensive to run and operate and keep going than one with only a thousand records in it? I think most people would say no. Data in a database by itself probably has economy of scale. Once you've built the system and the database, you can put more and more data in. Yes, we get some issues as we scale up, but for the most part that's what a system is: you can put more and more things in it and it scales up.

How about schemas? Do schemas have economy of scale or diseconomy of scale? My experience is that the more complex a schema gets, every time you go to add another attribute or another column or another table, it's more expensive than the last one. If each new unit is more expensive than the last one, you've got diseconomy of scale. If each new unit is cheaper than the last one, you have economy of scale.

How about lines of code? You have programs with lots of lines of code, and you go to add another line. Typically, the bigger the system, the more expensive every line tends to get. That is why big systems are complex and cost a lot.

If you have a lot of content on your site and you go to add more content, does the next piece of content cost you more than the last one? I don't think so. I think content is probably one of those things that doesn't have a lot of interaction with the other content. There are links, there are things going on, but for the most part it feels like content is sort of linear.

How about having lots of users come to your site? Does the incremental next user cost you more or less to add than the previous user? That one has an answer.
Yeah, there it is. People like Facebook have proven that you can scale up the number of users pretty rapidly and pretty easily without a lot of incremental cost, and certainly the 800 millionth user is not disproportionately more expensive to bring on than the first few. Whereas with code or schema, the 800 millionth line of code is probably the most expensive line of code you've got, or the 8 millionth element in your schema.

If you add another user interface to your application, is that getting incrementally more expensive or less expensive? I'd say more. Typically every time you add another user interface, there are more procedures and more training. There's more interaction. It has to deal with all the other ones that came before it. And the pattern here is pretty much this: if the next thing you add has to deal with all the things that came before it, those things tend to have diseconomy of scale. If they stand alone and you just add them, they have economy of scale.

Interfaces: if you've built complex systems that have lots of application-to-application interfaces, I think you'd agree that when you add another one of them it gets more complex. Each interface has a tendency to have to deal in some way with every interface that went before it. And then finally processes, which for the most part seem to be mostly linear. I've seen some really weird extreme cases. I was reading a trade magazine recently, and somebody was talking about one of these business process tools and how it saved them, and they happened to say, "Prior to implementing this tool we had 400,000 steps in our processes and 600,000 customers." I thought, God, there's no economy of scale at all there if every single customer gets his own process. I don't know what they were doing.
It's good customer service, I suppose, but a little bit weird. For most people, you add more processes and it probably scales linearly.

So why am I even going through that? Because every part of your information system isn't created equal and doesn't contribute the same amount to the complexity of the whole. When we start to understand that, we start to see: oh, that's why this is so expensive, that's why this is so hard.

What we're trying to say with this whole complexity conversation is that if you allow complexity to creep in, your system costs go up and flexibility goes down. We've seen that every place we've been. It may not be exactly obvious what's contributing to it, but it certainly seems to be true. Now, some of the complexity and some of the scale is what I would call good complexity. Most people want to have more data, big data, and want to have more users, and want to have more processes. So we're not trying to completely discourage scale or adding things on. But some of the things we just talked about increase complexity and cost, and decrease flexibility, out of proportion to what they add. Having lots more schema doesn't by itself make you a better business, but it does make you a more complex and more expensive business. The same with code: having more code in all your systems doesn't actually get you anything the way having more users typically does, or even having more data. And having more interfaces, more application integration points, and so on isn't by itself what you want. That's just what you have to put up with.

So if we look at that and ask how each of those contributes to the complexity and cost of the whole, it looks pretty much like this.
We have arrayed these seven, eight, nine things here. Adding more users typically contributes to more content, whether it's user-generated content or some of your own content, just from the act of having more users. Also having more data: that's where content comes from in a lot of cases. There's a slight relationship here in that the more data you have, the more you tend to start finding variations in the data and therefore deciding, oh, I should have more schema to represent the variation in the data that I have. Typically, as you have more users and more kinds of users, you will eventually create a different user interface for some group of those users, but it doesn't go up exactly proportionally. And often the more users you have, the more processes you end up with. Simple little systems with a small number of users typically have relatively few, simple processes, fewer exception processes, all that kind of stuff. But this is the benign side of this little picture, the virtuous part, and this is what we were saying earlier: this is what you want. You want to scale these up. You will typically make money if you have more users, more content, more data.

The other side of this is where your money goes. As you have more schema, every estimating approach I've ever seen says that increasing the complexity of your schema directly increases the size and complexity of your code. Indirectly, at some point schemas get complex enough that people break them off into separate applications.
You know, because it's just gotten too big and we can't launch that project, it's too much. Once you launch more applications you end up with more interfaces, and the interfaces cost you money. The schema itself very often contributes to the size and complexity of the interfaces. Every time you add more schema you end up with more user interfaces, because every time you create another table you've got another user interface for it, and that tends to lead to more processes. So this side of the picture is where you're spending all your money. All of this is, for the most part, money spent; the other side is where you're making money. What we want to focus on is the key role this guy, the schema, plays in blowing all of this out. If we can save even a tiny percentage here, we can save a lot over there. That's what we're going to talk about with elegance.

I think you're at this conference and in this session because you've noticed that there appears to be an explosion of schema out there. Everybody has tons of it. Is that just a fact of life? Is that something we have to accept, or is it something we can do something about? To figure out whether we can do something about it, we have to ask ourselves where it came from. I just described something very general: more schema leads to more code, and so on. But why do we have so much schema?

There are three or four reasons that I've observed. One is the way we design things, some of it is our tools, and some of it is just bad habits. I'm going to talk a little about each of these.

Probably one of the main contributors (though I think the slide after this one is a bigger contributor) is this: when we go to design things and talk to our users, every time we encounter what we think is a new thing, it's "Oh, we've got this different kind of thing here. Oh, I know what that is: we need a new table, because it has different attributes."
That's just the way we think: this thing is different from this other thing, so new table, new attributes, off we go. And we've very subtly introduced a lot more complexity. We've started that whole cascade of expense, because you get the new table and it has more, different columns. Every column you stick on the table is now a new contributor to the problem you have. And then the mere fact that you created a table means you've typically got at least four user interfaces, a create, an update, a read, and a delete, for almost every table, and that will in turn cause you to have more processes.

This next one is one that we fight all the time. Our business sponsors have decided that they don't want to reinvent the wheel; they've got to acquire a package. And if you could black-box a package, yeah, maybe you could reduce complexity. But nobody has ever black-boxed a package. In the first place, they have a user interface. Or rather, they don't have a user interface, they have thousands. You buy a package and it has anywhere from dozens to thousands, and right away that runs up the cost of your training and your procedures and all those multiplier effects we talked about. But the more insidious thing is that the package you bought has its own logical, physical, and conceptual model baked right into it. It's got all its own terminology, and sooner or later you're going to have to deal with it. The bigger and more complex it is, the more it's going to cost you to deal with it. It doesn't matter whether you created the schema or somebody else did: if you've got to deal with it and it interacts with everything else
you're doing, the meter's running. It's running the cost up. And the other thing that's interesting about packaged software is that it's typically far more complex than anything you could or would make up on your own. Lots of people run their businesses and have something equivalent to an ERP system, but nobody would make up something as complex as SAP. Nobody has that much imagination. It takes decades to do that.

Here's one of the other things that comes in, and it's kind of amusing. We were at an agency for the state of Washington that does all the financials for the state. One of the nice things was that they had a metadata repository, so they had sucked stuff out of all the systems they had. They had a really nice inventory of everything in all their systems, which is unusual, and they had tokenized all the variables, so they had torn them apart into the little words and phrases they were made of. I went combing through this thing (unfortunately this slide is my mental recreation; I couldn't find the original slide), and looking through all their systems I found that they had something like 24 different ways to say "sort of."
They sort of they say things like well This is the you know general editor budgeted year-end amount of the estimated amount of the allotted the predictor of the set So we just went on and on it was kind of comical and they looked at that and we sort of nodded and I said but the point is I I bet two things One was that they're not 24 nuanced differences in sort of Nor do these things all mean exactly the same thing They don't most people would look at and it took several hours, but it turns out there's about there's about four Conceptually different things in there You know and you can imagine and I'm not going to get into what the four are but but what's important is if you Implement 20 when there's really four That's not five times more complex That's way more than five times more complex because all the interactions of every time you have to get some data from one system another and figure out whether estimated is Close enough to predict it or approximate it or set aside or whatever it was um, the harder things get So this is another and and we unfortunately we tend to do this and and what we what we're doing is Is trying to make up was two things we're doing one we like to make up new words when we make up new tables I don't know why that is I've observed it myself and I've observed it in others And we like to contextualize it so it sounds good on that table that you're working with So a lot of these just come from one of those two Tendencies, but then that result is you got a lot of extra complexity there that you don't want or need so What i'm going to suggest is if we can We can get a hold of the complexity of our schema and reduce that like i'm saying with the irs example He'll have A very positive knock-on effect So here's our methodology for doing so a lot of that was just setting the stage. Here's the problem. 
I think you've probably bought into that. I'm going to describe how to go about it. This is mostly inspired by what we've been doing with semantics, but you don't really have to understand OWL or semantics or anything in order to do this. I think it just requires a mental shift: start thinking about the problem differently, start thinking "I'd rather reduce complexity than continue to contribute to it," and that's probably almost all you need to do.

Most of the places we've been have upwards of a hundred thousand, on occasion a million, attributes in their collective "schemasphere," if you will, or schema zone, whatever you want to call it. But really most places have a thousand or two thousand essential concepts. If you can find those essential concepts and relate everything back to them, you've really done something. If you've reduced the total by one percent you've really done something, but if you reduce it a hundredfold or a thousandfold, then you've really, really done something. John?

Both, actually. Yeah. I'd just add, because we get deeply into semantics, that we very often define concepts from their properties, and we just mix them together and add them up.

So we're shooting for a reduction of a hundredfold here, but we'd be happy to get 10%. And what happens if you can get there? Take the IRS example: there's probably nobody who understands all 30,000 concepts in the IRS, would be my guess. But if you can get it down to 3,000, there are a handful of people who can understand all of it, and smaller numbers of people who can understand major subportions of it. If you get down to that size, it's governable. That's the effect scale has.

So I just made some wild claims: you can reduce things a hundredfold, and so on.
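To put rough numbers on the earlier "way more than five times more complex" point about the accounting terms, here is a minimal sketch. It assumes, purely for illustration, that in the worst case every pair of terms may have to be compared when data moves between systems; the function name is hypothetical and the 20 and 4 are the figures from the story.

```python
def pairwise_reconciliations(n_terms: int) -> int:
    """Count distinct pairs among n_terms terms (n choose 2).

    Assumption for illustration: in the worst case, every pair of terms may
    have to be compared when data moves between systems.
    """
    return n_terms * (n_terms - 1) // 2


implemented_terms = 20  # near-synonyms implemented (figure used in the talk)
essential_terms = 4     # conceptually distinct meanings (figure from the talk)

term_ratio = implemented_terms / essential_terms
pair_ratio = (pairwise_reconciliations(implemented_terms)
              / pairwise_reconciliations(essential_terms))

print(f"terms: {implemented_terms} vs {essential_terms} ({term_ratio:.0f}x)")
print(f"pairs: {pairwise_reconciliations(implemented_terms)} vs "
      f"{pairwise_reconciliations(essential_terms)} ({pair_ratio:.1f}x)")
```

Under that assumption, five times as many terms means 190 possible pairings instead of 6, roughly thirty times as many. Real systems will not compare every pair, so treat this as an upper bound on the interaction effect, not a measurement.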
I'll give you a couple of examples with numerical information from clients we've worked with. As I said, at the time this wasn't what we were trying to do. We were trying to build an enterprise ontology and do various other things, whatever happened to be on that project. But this is what happened.

Many folks know Sallie Mae. They do student loans; it's a pretty big organization. When we did this modeling, they had 51,000, if you added up the attributes and the entities, in their existing loan origination, loan servicing, and guarantee systems, just five or six major systems. We did this enterprise ontology and ended up with 1,200 classes and 353 properties. If you're paying attention here, you're going to find something that looks odd, and this oddness will repeat itself, and I'm going to talk about where it comes from: how can you have fewer properties than classes? How can that even be? We'll get to that. Bear with me for a second.

With this one, they subsequently went out and outsourced one of their new lines of student loans, but then realized that it had a different database and a different user interface; everything was different again. And then they said, "Oh, hey, let's use the ontology to integrate this with our other systems," or rather, "let's use the ontology to provide a single interface, based on the ontology, to the new system and the systems we already have." And it actually worked out pretty well. We did extend the ontology some. As you get down to the details you realize, oh yeah, here are a couple of things we didn't think of. But it didn't blow back out to 51,000. It grew 10 percent or something. And that's pretty significant. That's a lot of extra point-to-point, n-squared kind of interfaces
that you don't have to do.

We worked with Procter & Gamble in the research department. The research department is 10,000 people. Huge. There are hundreds of different disciplines. In this case the issue was that they have people retiring, and these are knowledge workers, people who invent Swiffers and Duracell batteries and whatever else they invent, and they were worried they were going to lose the collective wisdom they had. It was kind of a knowledge management thing; we wanted to build an ontology. But the problem in this case was that they didn't have a model to start with. Researchers just do research, and they fill out their time sheets, and that's about it. And every research area literally has its own language. The battery guys talk about anodes and cathodes and chemical reactions and all that kind of stuff. But they believed that they should be able to do queries across all the research groups. In fact, they think that a lot of their key successes are where they found odd combinations between disparate groups. But the battery guys call deposition "sputtering": putting small particles on a flat surface, or something like that. Everybody had a different name for it, but essentially it's the same thing.

Anyway, when we did the ontology there, it took 400 classes and 192 properties to represent the shared concepts of research and development. Then we blew it out, in this case for batteries and toothbrushes. It added a lot of classes, but only nine properties. So now we've got all the batteries and all the toothbrushes as an extension. Almost all of these extra classes were defined in terms of classes that already existed, so if you knew the core ones and could do a query on them, you would find the new ones. That's the whole idea of the reuse. And then they extended it to a couple more product lines, and what they said was, yeah,
they hardly had to extend this at all. Every time they went to another area, say baby care, they obviously had to add some more classes, but they were all derived from this core.

Then we have LexisNexis. Here the scope is content. They have hundreds of millions, billions, of documents of content. We worked with them, and this one's a little bit different. As Terry Mahalin said, very few people have enterprise models. That's been our observation as well. Lexis does and did, so we extracted a lot of what they had in their logical models and put it in the shared model, and it's still fairly elegant. What's interesting is that, I think because we didn't do it by hand, we weren't as concentrated on reusing the properties, but it's still a pretty elegant model for something the size of LexisNexis, and that's what they're going forward with now.

And just recently we did a healthcare company, and healthcare is actually pretty complex. This is the delivery of healthcare, so it's hospitals and clinics and extended life care and home care and all kinds of stuff, and an insurance company. That ended up with that size of numbers of properties and so on. Right now we're doing, actually with Bernadette and 3 Round Stones, a proof of concept of taking this ontology and showing how we can extend those concepts into the linked open data cloud: bring external data in, map it up with internal data, and so on. And again, one of the things that we think makes that tractable is getting your core down to a manageable size.

So if we've learned anything, it's that this is possible. What we now want to talk about a little bit is how you do this. How do you get down to something simpler?
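The "point-to-point, n-squared" remark from the student loan example can be sketched as a count of interfaces. This is only an illustration under two stated assumptions: in the point-to-point case every system talks directly to every other one (a worst case), and in the shared-model case each system needs exactly one mapping to the ontology. Both function names are hypothetical.

```python
def point_to_point(n_systems: int) -> int:
    """Interfaces needed if every system talks directly to every other one."""
    return n_systems * (n_systems - 1) // 2


def via_shared_model(n_systems: int) -> int:
    """Interfaces needed if each system maps once to a shared ontology."""
    return n_systems


for n in (6, 7, 12):
    # Marginal cost: how many interfaces the n-th system adds in each style.
    added_p2p = point_to_point(n) - point_to_point(n - 1)
    added_hub = via_shared_model(n) - via_shared_model(n - 1)
    print(f"{n} systems: point-to-point={point_to_point(n)} (+{added_p2p}), "
          f"shared model={via_shared_model(n)} (+{added_hub})")
```

The marginal numbers are the point: in the point-to-point style the newest system has to deal with every system that came before it, which is the diseconomy-of-scale pattern, while the shared-model style adds one mapping each time.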
Semantic technology, or really just thinking semantically, gets you a lot of the way there. It's sort of like what object-oriented people later said: you don't have to have an object-oriented language to do object-oriented programming. And it's literally true. If you already know how to do object-oriented, yes, you could do it in assembler. But most people, given an assembler and the idea of object orientation, aren't going to snap those together. It just doesn't happen that way. So some of the technology actually helps you think differently, but the real point is to think differently.

Obviously I can't go into a whole deal about semantics right now, but I think there are three or four things that are sort of interesting. I'll give you a couple of case examples to get the wheels going.

The semantic approach is more about attempting to model the real world. At first we think the real world is way more complex than the simplification we have in our models, but the truth is almost the opposite. The real world becomes the final arbiter of everything you do. I could model this little thing I'm holding in my hand a hundred times, and it probably has been modeled a hundred times in a hundred different systems, but it's just this thing. The real world is actually kind of simple. If I put an RFID tag on it, I'd pretty much be done. I'd just say, well, it's a thing and it has an ID, and if I want to say some more about it, I can add that on. My observation is that because we have so many identifiers for the same thing and so many ways of categorizing the same thing, we've made what should have been a simplified world more complex than the one we're trying to model. That's one observation.

This next one is the one I hinted at with the properties. In semantic technology, properties are first-class objects.
They exist on their own, which is a novel idea if you've been doing either relational or object-oriented work most of your life, because in relational or object-oriented design you create the class or the table first and then you put attributes on it. Because of that, you end up with more attributes than concepts or classes or tables. But in semantics, properties exist on their own. You invent them first and you reuse them later, and if you invent them well, they're incredibly reusable. "Has part" could be a relationship, and really the fact that this thing has a part and my body has a part is semantically the same relationship, and we should recognize it as the same relationship and reuse it wherever we can. That's where a lot of that dramatic reduction comes from. You just recognize: oh yeah, this means that the thing on the left describes the thing on the right, so I should just use the same term again.

And then you get yet another add-on benefit when you start thinking about reusing properties: properties now have inheritance. We're used to thinking of classes having inheritance, but in semantics, say we've got a property called "has ancestor," and that's the relationship between two people who are genetically related. And I later say, oh, I've got this specialization of this property called "has parent." I can then say that having a parent implies having an ancestor, and specifically that that person is one of your ancestors.
But not vice versa. Just because somebody is your ancestor doesn't mean they're your parent, but if someone is your parent it does mean they're one of your ancestors. You get this extra reuse benefit where you don't have to say this every time you use it.

But probably one of the more interesting things about semantic technology, or thinking semantically, is something we've just fallen into sort of subconsciously. In a traditional system, every time we make a new table or a new class, we imply that it's different from everything else we've done. We don't say that, and sometimes with object-oriented design we actually make it a subclass of something that existed, but for the most part that's more the exception than the rule. Whereas with semantics, when you're constructing a formal definition of what something means, the system is constantly figuring out behind the scenes which things are similar to something you've already got, and it's telling you that, which helps you design. So you end up with a lot of concepts that are similar to other ones, and you can use that similarity as you're constructing queries and that kind of thing. It cuts down on a lot of excess joins and that kind of stuff.

One of the last benefits of thinking semantically is that you don't have to deal with structure issues. You're really trying to figure out what something means, not how you're going to put it in the database. If you find yourself drawing something like a junction record, it's because you're thinking about how you're going to store it. But that isn't a semantic concern: how the foreign keys and all that stuff link up, or even whether it's many-to-many, or any of that. You're really just trying to say: what is the concept in the real world? How am I going to describe it?
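The parent and ancestor relationship described above can be sketched in a few lines of plain Python. This is a hedged illustration, not how any particular semantic toolkit implements it: the property names, the `SUB_PROPERTY_OF` table, and the `with_inferred` helper are all hypothetical, and the sketch only covers the one-directional implication from a specialized property to its more general one. It does not model the fact that ancestry itself chains through generations.

```python
# Hypothetical sub-property hierarchy: specific property -> more general one.
SUB_PROPERTY_OF = {"hasParent": "hasAncestor"}


def with_inferred(triples):
    """Return the triples plus everything implied by SUB_PROPERTY_OF."""
    result = set(triples)
    pending = list(triples)
    while pending:
        subject, prop, obj = pending.pop()
        general = SUB_PROPERTY_OF.get(prop)
        if general is not None and (subject, general, obj) not in result:
            result.add((subject, general, obj))
            pending.append((subject, general, obj))
    return result


facts = {
    ("alice", "hasParent", "bob"),     # stated with the specific property
    ("bob", "hasAncestor", "carol"),   # stated with the general property
}
closed = with_inferred(facts)

# A parent is inferred to be an ancestor...
print(("alice", "hasAncestor", "bob") in closed)   # True
# ...but an ancestor is not inferred to be a parent.
print(("bob", "hasParent", "carol") in closed)     # False
```

The reuse benefit is that the implication is stated once, on the property, and every use of "has parent" anywhere in the model picks it up without being restated.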
How do I describe membership in this class, and how do I describe what it means to have properties for this thing? Then later, in another step, you derive the structural stuff from the semantic description. Just putting that out of your head makes things a lot less complicated in the first place. You drop a whole bunch of things that don't exist in the real world, and that makes it simpler just for starters.

So, how do you go about doing that, assuming that was a good idea? We use an upper ontology. You don't have to, but this seems to work really well, so I'd suggest it. Before we go into a company, we have something called gist. It's freely available under a Creative Commons license, so you can take it and do whatever you want with it as long as you attribute it. It's on our website, and it has about a hundred, or actually about 200 if you take the classes and properties together, of the most common classes and properties that we have found in business systems. Those are the bedrock things that you see over and over again. We start with that and specialize from there. What we find is that somewhere around 98% of the things we find in a healthcare company or a legal research company or a research and development company or a student loan company are derived from things we already knew about.

For the other 2%, it's fine to create a few new classes. At Procter & Gamble, interestingly, and this is probably no surprise, we came across the concept of a brand. We scratched our heads for a long time, and it didn't fit any of our preconceived notions. In fact a brand is a very complex thing, and of course probably nobody knows that better than Procter & Gamble; they've been doing this for 100 years. So we just said that would be a separate class.
We don't exactly know what a brand really is. It's a promise, and it's a this and it's a that, and it has logos. We just set it aside: here's your brands. So not everything has to be derived from the top.
In this upper ontology there are about eight really key high-level concepts. Some of them you can barely read, like this little red one. This is from the work we did with the healthcare folks, and we drew this picture where the size of the bubble was proportionate to the number of classes that were underneath it, just to get an idea. And we would have thought, you know, healthcare, there's a lot of physical stuff going on. They've got scalpels and blood and all kinds of things. But it turned out there weren't that many that you really need to distinguish at this semantic level. They had a bunch more; they were more interested in kinds of places, and even more in social beings: people or organizations or insurance companies, or anybody who can get involved in contracts. This one is kind of a catch-all for a lot of things you just have to have to make a system work: collections and dates and units of measure and all that sort of stuff. But real big, down here, are all the things that happen to you in the healthcare world, whether you have an appendectomy or go to the doctor or have a follow-up or even get a bill. And then the content, the body of knowledge of medicine itself. Yeah? What did you say was in that red dot?
Physical things. And then motivation is why you are doing the things you're doing, which includes things like obligations, agreements, contracts, stuff like that. There are several things in that category. But it turns out most things fit into this scheme. One level down, for Centera, there are about 200 classes. You can still print this on the wall. You can't read it on the PowerPoint, but you print it on the wall and find most of the key concepts, and then there's another level below that.
So here's a methodology. Here's kind of step by step how to go about this, and this was just reverse engineered from these last four projects: we asked ourselves what we did. That says "elicit structure." That's a typo; it should be "elicit semantics." A lot of the way we do this is to find people, subject matter experts and business analysts, but also some IT folks who understand what's either in or should be in these systems, and elicit information from them. But pretty rapidly after that, start aligning it with the upper ontology, because, as you'll see in a slide in a minute, you want to challenge your thinking as this stuff comes at you and say: what is that thing they just described? Where does it really belong? And in the act of doing that, you get a lot more clear about what kind of thing it really is. That has a tendency to feed back: we do more interviews, get more data, and start aligning it some more. But then you want to start creating what we call exemplars, little examples of parts of their model expressed semantically, so that you can see: am I creating valid inferences? If somebody looked at these, would they make sense? Does the patient actually have a visit? Does the visit have a physician, an attending physician, all that kind of stuff? And then at some point you just discover,
Oh, I've got a lot of very similar things here. And you can start saying maybe there's a missing concept in here that would resolve that.
Then we start looking at existing systems, whether it's interfaces to existing systems or the databases or profiling or whatever, because what we want to start exploring now is completeness. We've done some of this exercise; have we really covered most of the important concepts in the organization? Most of the important concepts are somewhere in somebody's system. You can't, well, you could, but this is kind of a sampling thing. You want to go through and see: am I roughly covering everything? Because there'd be 50,000 attributes and millions of rows, and there's a lot of work to really do this all the way out.
Typically we want to come up with some way to visualize it. That last one had the crop circles, but there are all kinds of ways to visualize these things. You eventually need to visualize it because we're going to socialize it. In other words, we have to show it. We have to get people to understand: what does it mean? Why should you buy in? Et cetera.
So conceptually, you start with an upper ontology, you build out an enterprise ontology, and then eventually you specialize this, either by line of business, like we said with Procter & Gamble, or these could be applications. An application takes the concepts from the core, specializes them, puts some structure to them, rearranges them, and very often adds in distinctions that they're interested in locally that the rest of the enterprise isn't. I was sitting in John's presentation yesterday and it occurred to me
I should take one of these and put it in between them and say: these taxonomies are typically not first-class, major structural elements. They are ways we make fine distinctions between similar things. Very often they should be owned and managed by people who understand the domain. And if they're shared, they should be kind of in between these guys, to where they share them like reference data. And very often some of these live outside your organization. If you're in healthcare, this is CPT codes or ICD disease codes and procedure codes and all these kinds of things that you don't necessarily want in your ontology. You want to recognize that your ontology knows about diseases, but it doesn't need to know about all hundred thousand diseases they've come up with, and they keep changing every year. Those are taxonomy things that will specialize things later.
So there are a few tips. I think I'm good for five more minutes here, and I've got a slide for each of these. I'm going to put this one up because we made this mistake. Every once in a while you aren't rigorous enough. You come across a concept, like in healthcare they had a concept of a facility, a healthcare facility. And we thought, that sounds pretty good. We just made a class and started specializing and building other things around it. And it was quite late in the day that we realized we should have asked ourselves a question earlier on, which was: what is a facility, anyway?
And there are four logical things that it could be, at least. It could be literally a building, which is a physical place; on my previous slide it would be in that red dot of physical things. Sometimes people refer to a group of buildings, a campus if you will, as the facility. Sometimes they mean the actual region, if you drew a space on the earth and said that's a facility. That's a little rare, but one of the guys who was doing this, I think, had that concept from his days at Boeing. And as it turned out, in most of the usage they were referring to the organization, not the physical plant, although I find that weird. Unfortunately, we left this one confused, and so much later some people would think it was the organization, some people would think it was the building, etc. And that is actually a problem. That's one of those things where, if you're more rigorous up front, and every time you hear something new you ask "well, what is that?", you won't fall into that trap.
Address is another one. Everybody's system has an address; everybody thinks they know what an address means. It was great to hear in the presentation yesterday that 23 percent of all addresses are wrong. I had a feeling that was true, because you put an address in an address field and it's one of three things. It's either a tiny piece of content that tells you where a building is, and if you're UPS, that's what you want to know, and what's different about this one is you can get the geocodes, a lat/long, and put a pin in a map. Or it's just a routing code for the post office. A lot of the time they're one and the same, but they aren't always one and the same, and it's when they aren't that it screws up. Post office boxes and APO addresses and various other things are postal addresses that are not building addresses. And when you get to this point you realize: if I have an address, and I've done data profiling and absolutely know that it's
both, I can make two of them and say here is the building address, and it's also their postal address. So in the cases where it isn't... Yeah? Same or different? It's very similar. In fact, in semantics you do end up with supertypes and subtypes, although there are two things that are a little bit different. One is that it ends up being much more of a lattice than you typically would design yourself. When we do supertypes and subtypes, we typically, although not always, have single-inheritance trees. You don't have to, but my observation is most designers do that. It's because of how they think. And the other thing is, in semantic technology we use the definition of the thing to help us place the thing in the hierarchy. In other words, when you make a formal definition, and I won't get into that much because time doesn't permit, the system will figure out that, for instance, a patient is a person, just from how you construct the formal definition of what a healthcare patient is. We'll run the reasoner, and it does a lot of that subtyping. Some of it's obvious, but every once in a while it comes up with some interesting subtyping and you go, "oh, yeah, I guess that is one of those." But beyond that it is pretty similar, and my earlier admonition still holds: if, with subtyping, supertyping, and everything, you could reduce the complexity of your schema, you would reduce the cost of your systems. I absolutely believe that.
And sometimes an address is what we call a geo region. We did some work for a workers' comp company, and some of their addresses are neither deliverable by the post office, nor would UPS take a package there. If you got hurt at exit 243, that was the address, and that's only a geo region. It's not either of these kinds of things. So that is it.
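The three meanings of "address" described above can be kept apart by modeling each as its own concept instead of one overloaded field. The sketch below is only an illustration: the class names, field names, and sample values are all assumptions, not part of gist or of any system mentioned in the talk.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class BuildingAddress:
    """Says where a physical building is; can be geocoded."""
    street: str
    lat: float
    lon: float

@dataclass(frozen=True)
class PostalAddress:
    """A routing code for the post office; need not be a building."""
    lines: Tuple[str, ...]

@dataclass(frozen=True)
class GeoRegion:
    """A place on the earth that is neither of the above."""
    description: str

@dataclass
class Location:
    building: Optional[BuildingAddress] = None
    postal: Optional[PostalAddress] = None
    region: Optional[GeoRegion] = None

    def can_pin_on_map(self) -> bool:
        return self.building is not None

    def can_receive_mail(self) -> bool:
        return self.postal is not None

# Made-up sample data. When profiling shows one string is both kinds
# of address, record it twice, once per meaning.
office = Location(
    building=BuildingAddress("123 Example St", 40.0, -105.0),
    postal=PostalAddress(("123 Example St", "Anytown")),
)
po_box = Location(postal=PostalAddress(("PO Box 1", "Anytown")))
injury_site = Location(region=GeoRegion("Highway exit 243"))
```

With the meanings separated, a question like "can a package be delivered here?" is answered by which concepts are present, not by guessing at the contents of a single address field.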
I've got a few white papers about this, which... well, yeah, and time for questions. Social security number. And you know, we do it as a verb, "has social security number," right? Yeah. And I've had different directions in the team, that we were not going to constrain the domain on that, so that it would be reused, right? And my colleague is saying it really bugs him, because it's a person that has a social security number, not the card, or not an information exchange between the things. And so he was saying he needed to create more properties so that we could be more precise in our language. But that introduces complexity. Right. So what's the obvious...? Right.
I don't know if everyone heard that. The question was, they got into a thing about reusing properties, and if you reuse them at one level they appear to be less specific, I guess, but it does introduce the complexity problem. And how we typically do that, in fact the example is with social security numbers: we have a property called "has identifier." If you wanted to, you could say "has social security number" is a specialization of that, and there's a benefit already, which is that anytime you ask anything for its identifier, if it happens to have a social security number, you'll pick up the social security number. That is a benefit. But you actually don't need this extra property, and the reason is this. When you go to define, this would be the definition of a legal person in the U.S. or something like that, you would say that they have an identifier that is in the range of a class you're going to invent over here, that is owned by the Social Security Administration. So they are the domain; they're the one who assigns these identifiers. This is a subclass of a type of ID.
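One way to picture the "has identifier" approach is a single generic property whose values carry their own type and issuer, so no extra property per kind of ID is needed. This is a rough sketch under stated assumptions: every class, field, and function name here is hypothetical, and the sample numbers are placeholders, not real identifiers.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class IdentifierType:
    """A class of IDs, characterized by who assigns them."""
    name: str
    issued_by: str

@dataclass(frozen=True)
class Identifier:
    value: str
    id_type: IdentifierType

@dataclass
class Person:
    name: str
    # The one generic property, standing in for "has identifier".
    identifiers: List[Identifier] = field(default_factory=list)

SSN = IdentifierType("SocialSecurityNumber", "Social Security Administration")
EMPLOYEE_ID = IdentifierType("EmployeeId", "Example Employer")  # made up

def identifiers_of_type(person: Person, id_type: IdentifierType) -> List[str]:
    """Narrow the generic property by the type of the value."""
    return [i.value for i in person.identifiers if i.id_type == id_type]

pat = Person("Pat", [
    Identifier("000-00-0000", SSN),      # placeholder value
    Identifier("E-42", EMPLOYEE_ID),     # placeholder value
])

all_ids = [i.value for i in pat.identifiers]
ssns = identifiers_of_type(pat, SSN)
```

Asking for all identifiers picks up the social security number automatically, and asking for the social security number is a filter on the identifier's type, not a separate property.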
So your actual number, your social security number, is going to be over here, and we say this is qualified by that, and it's part of the definition of this class. Occasionally you do need to be more specific and add an extra property. That's why there are 400 instead of 200. But most of the time I don't think creating the extra property buys you anything. It takes the whole way. Yeah? So... Yeah, absolutely.
Yeah, that's a great one. I don't know if everybody heard that, but: if you do end up having to have lots of schemas to handle your storage needs, your application specifics, all that kind of stuff, don't you still have the same level of complexity? Well, what you want the relationship to be is model driven, from the ontology to these schemas, at best, or at least mapped, so that as you create dozens or hundreds of applications, they're all derived from the same elegant concepts. Even if you rename them: in some cases we've created tools where, as it's derived from A to B, it gets renamed. But if you keep that map, then a lot later, when it's time to go integrate two systems, you're not back to having to do a lot of investigative research to figure out whether these concepts really mean the same thing. No, they do mean the same thing, because they derive from the same place. So it takes a lot of complexity out at that level. It obviously doesn't get rid of all the complexity; it's just trying to reduce it.
So I think I'll take one more question and then a round of applause, but hold that. Yeah, yeah, that's true. Okay. Well, thank you. I do have some white papers up here and