Hello, everyone, and welcome to our next EDW session, Towards a New Age of Conceptual Data Modeling, which will be presented by John Singer, the founder of Nodera software. All audience members are muted during these sessions, so please submit your questions in the Q&A window on the right side of the screen, and our speaker will respond to as many questions as possible at the end of the talk. Please note that there is a linked form at the bottom of the page titled EDW Conference Session Survey. This is where you can submit session feedback, and we encourage you to do so. So let's begin our presentation now. Thank you, and welcome, John. Hi, thank you very much, and it's a pleasure to be here. Even if "here" is just a virtual conference, it's good that we're having this. So today I want to talk about conceptual data modeling and what I see as its future. This is very much a futuristic talk, so let's get going. First, a little bit about myself. I'm a recovering data architect; I've been doing this for over four decades. Recently, I've gotten involved with property graph databases, and especially data modeling for property graphs. I've built an open source data modeling tool called Nodera; you're welcome to go to our GitHub page and get your free copy. I also have a property graph database fundamentals and modeling course at eLearningCurve. So once again, there's my contact info, and I'd love to hear from you. Today I want to talk about conceptual modeling, and I'm posing a question: can conceptual models save IT from itself? I definitely think that we need a little bit of saving. We need a little help in terms of how we build systems, especially from a data perspective. So we're going to look at the current state and what the fundamental issue is as I see it, and then we're going to look at how we might change this in the future. 
And we'll go through a number of existing technologies that seem to have elements of what I believe we need, and then talk about how this might all come together. What we're looking for is really moving from data modeling to meaning modeling: meaning-based modeling that's more language oriented. And I believe we need to move from databases as they exist today to something that's more of a semantic or conceptual persistence layer. The way I see it, the systems we build today are so much bigger and faster, with more of everything, and it's not that our systems don't produce good results. People are doing amazing things with machine learning and business analytics; there's no question about that. But at the end of the day, we're really still building unit record processing systems. They're just faster and better at what they do. And I don't think we can move forward until we address that issue. Now, I've been accused of being a physical data modeler. Some people see that as an insult; I'll accept it, I'm okay with that. What I mean by that is this, and maybe those of you who work in data modeling groups can appreciate it: you get assigned to a project as the data modeler, and what the project says it wants is a data model, but what they really want is a physical database design. So at the end of the process, what you give them is a physical database design. I've always taken the approach that that's what people want, and so that's where we're going to go. Now, the modeling methodology that we've all been taught is that we build a conceptual data model, then we extract from that a logical data model, and then we refine that into a physical data model. 
And I don't have a problem with this process per se, but the problem I have is that when you start to ask questions like "what is a conceptual data model?", you really don't get much of an answer. You just get broad brush strokes: well, it's more abstract; it's only the entities; it's the less refined model. To me, that's not sufficient. It's really not what we need to accomplish, but it's what we have. The modeling tools do support this. You can create these different models and link them together, but the problem you run into is that it's really hard to maintain. And at the end of the day, honestly, you can create the greatest conceptual model in the world, but nobody cares about it because it's just not impactful to anyone other than the data modeler. The other problem is the polyglot persistence layer. We now have so many different target databases that an entity-relationship model doesn't really apply to a lot of the databases we're using today. So I like the process, I don't have a complaint with it, but it's really not sufficient for conceptual models. Now, all that being said, I am going to make the statement that I truly believe we desperately need a conceptual data model. And I think this conference is itself the evidence for why we need one. In my view, most of what this conference deals with are topics that exist because we need them to fix the lack of a conceptual data design. All of these processes, the data catalog, the glossary and dictionary, data quality, data governance strategy, data lineage, are required because the design, to the extent that we even design these things at the front end of the system, gets lost. It isn't captured in the data model because the data model can't capture it. And when we persist the data into the database, it sure doesn't get captured there. 
So I believe we need this, and I think we need it to fix the issues with the systems we're building. Thinking about this, I've come up with a couple of high-level requirements, and what I'm describing here are solution requirements. It's not just a data model; I believe it's both the model and the persistence, so I'm calling it a conceptual database. These are the three requirements. First, the model has to equal the data: the model and the data should be defined using the same language. Second, it should be technology neutral. Of course, as soon as we persist it we have to commit to some technology, but it has to be in a form that we can easily map back and forth to our existing databases. And finally, I think it needs to more closely mirror our human behavior, because we're really good at defining concepts and talking about them. Language, to me, is really the missing piece, and we'll talk a little more about that as we go on. What I want to do next is go quickly through some existing technologies that seem to exhibit some of these capabilities. My general thesis is that a lot of this existing work is going to start merging together. Innovation rarely happens in a vacuum; there's really no such thing as something that's brand new. It's always rethinking or reworking existing stuff. All right, so let's go back to 1975, when Peter Chen wrote the paper "The Entity-Relationship Model: Toward a Unified View of Data." His goal at the time was to unify the different data models that people were talking about and building databases with. And in the introduction to the paper, he wrote, and I quote here, the relational model is based on relational theory, but it may lose some important semantic information about the real world. 
So there he hits the nail on the head right at the very beginning: we can create a conceptual model that's more semantically rich, but as soon as we put that data in a relational database, we lose all the context. Now, his modeling approach: this diagram is from a company called ConceptDraw. They have a diagramming tool that does lots of different diagram types, and it can do a Chen model. This is a bit of an eye chart, but there are a couple of things to notice. Here, relationships can have properties, which is not terribly relational, and you model properties out separately: you represent them and their relationships to their entities separately. I think it's a deeper dive into understanding the data. And I often think, you know, maybe Chen was right. If the database vendors had built the early databases more along this model, maybe we would be a little further along than we are today. Here's another modeling approach that's more conceptually oriented. It was originally named after its creator, Nijssen (I'm probably mispronouncing it), but then it was renamed the Natural language Information Analysis Method, NIAM, and it eventually became more generally known as ORM, Object-Role Modeling. This came along, I believe, in the 90s. Their stated goal is right there in the name, natural language: they wanted a conceptual modeling approach that more closely reflected the language you use to describe the concepts you're modeling. One of the things they do here, if you look at the lower right corner, is there's an entity in that circle that says "value dollar sign plus." That's the entity representing all of the possible positive dollar values. And to create a property of an entity, you actually create a relationship that describes that property: a product is nominally charged a dollar value. 
So it's really that relationship, nominally charged, that describes the property. And that, once again, is a more semantically rich way to model data. Of course, the problem is that this one didn't catch on any more than Chen's methodology, and that's because it doesn't persist in this form to a database. You could derive a relational design from it, that's fine, but you lose all that semantic detail. So now I want to switch gears and look at a couple of relatively new database technologies that seem to give us a little bit of what we're looking for. The first one is the property graph, probably best realized by a company called Neo4j, which produces a property graph database. The property graph is a very simple model: nodes and relationships. The ovals represent nodes and the arrow represents the relationship between two nodes. In this example, you can see the node has a label called Person and a property called givenName with my name. The relationship is called homepage, and it points to another node that has a label URL, and there's my homepage. So this is more of an ORM-style model. And that's what's interesting: when you start designing property graphs, all of a sudden you realize that you just intuitively start designing them more like a Chen or an ORM-style model. To understand property graphs, you really need to let go of everything you know about relational databases, because that's just not what this is. I've listed a few of the distinctions here. The physical data model for property graphs is fixed and very simple: it's nodes and relationships, and you put properties on them. It's an extremely flexible model; you really can do almost anything you want with it. The conceptual data model is not defined. You really just define it at runtime. 
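To make the node-and-relationship structure concrete, here's a minimal sketch of the person/homepage example in plain Python. This is not Neo4j's actual API; the helper functions, labels, and the URL value are all illustrative stand-ins for the structure described above:

```python
# A minimal property-graph sketch: nodes and relationships both have a
# type (label) and can carry arbitrary key-value properties.
nodes = {}          # node id -> {"label": ..., "props": {...}}
relationships = []  # (start id, relationship type, end id, props)

def add_node(node_id, label, **props):
    nodes[node_id] = {"label": label, "props": props}

def add_relationship(start, rel_type, end, **props):
    relationships.append((start, rel_type, end, props))

# The example from the talk: a Person node linked to a URL node.
add_node("n1", "Person", givenName="John")
add_node("n2", "URL", address="https://example.com/john")
add_relationship("n1", "HOMEPAGE", "n2")

# No schema was declared up front; traversal just follows relationships
# at runtime, which is what makes the model so flexible.
homepages = [nodes[end]["props"]["address"]
             for start, rel, end, _ in relationships
             if rel == "HOMEPAGE" and start == "n1"]
```

Note that the "conceptual model" here, the fact that Person nodes have HOMEPAGE relationships to URL nodes, exists only by convention in the data; nothing in the store enforces or records it.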
And as I said before, you do kind of intuitively start modeling your data by treating every property as an entity, because you can, because it's simple to do. The downside, of course, is that the semantics are still all in your head. It's by convention only, and the underlying database doesn't really have any understanding of the semantics. So here's another database technology that's relatively new, and that's the semantic web. This is the list of standards; it's a bunch of W3C internet standards. The goal is that anyone anywhere can say anything about anything. It's distributed by its very nature. You can publish data using this approach and link it to any other published data. If you're interested, look up the Linked Open Data Cloud; it's really fascinating to see how they've merged all kinds of diverse data on the internet. We're talking primarily about RDF, RDFS, and OWL; those are the standards that let you define ontologies and describe resources. So let's take a closer look: what exactly do you store in an RDF database or an ontology? You describe everything in terms of what's called the RDF triple. Triples are basically a subject-predicate-object statement, if you will. It is a graph model if you look at it, and semantic web databases like to call themselves graph databases because they want to jump on that bandwagon, but I think that's more marketing than an actual comparison to property graphs. The triple is the way you make any statement about your data. And notice that this is a bit of a linguistic structure: subject, predicate, object. I could say subject, verb, object, and a linguist would understand that's a grammatical construct for a complete sentence. So each triple is an assertion of a fact, and it's a relationship that exists between these subjects and objects. 
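A triple store can be sketched as nothing more than a set of (subject, predicate, object) tuples. The prefixed names below are shorthand stand-ins for the full IRIs a real RDF store would use, and the `match` helper is illustrative, not any standard's API:

```python
# Each triple asserts one fact: subject-predicate-object.
triples = set()

def assert_triple(s, p, o):
    triples.add((s, p, o))

# Three statements about the same subject, mixing a type (model)
# statement with plain instance data.
assert_triple("ex:JohnSinger", "rdf:type", "foaf:Person")
assert_triple("ex:JohnSinger", "foaf:givenName", "John")
assert_triple("ex:JohnSinger", "foaf:homepage", "ex:johnsHomepage")

# Querying is pattern matching over the set; None acts as a wildcard,
# much like a variable in a SPARQL triple pattern.
def match(s=None, p=None, o=None):
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

facts_about_john = match(s="ex:JohnSinger")
```

Because every statement, including the `rdf:type` one, has the same shape, the model and the instance data really do live in the same structure.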
But fundamentally, in the semantic web, what you're really doing is describing things using a form of logic. That's essential to grasping how ontologies and the semantic web are different from relational databases. Here's another quick example of how you would illustrate some data diagrammatically; this is my contact information. In the center we have a contact called John Singer ID, and notice, if you go to the upper left, there's a relationship called type, and it's of type Person. So here we are defining the thing called John Singer and saying it's of type Person. Then if you go around clockwise, you can see more properties of this contact: I have a homepage, and there's the value. This is all drawn from a vocabulary; you can design these vocabularies and give them names. This one happens to be FOAF, for Friend of a Friend. The thing to notice here is that, like in an ORM model, every property is described as its own entity, and it's defined by the relationship it has with the other entity. So properties are definitely modeled as first-class citizens. The other thing that's interesting is that the data model, the semantics, is defined using the same language as the instance data. You see this mixed here, where I'm defining my contact as a type, which is Person. That's really defining the meaning, classifying what I am. The rest of it is actually calling out specific instances of data. And it's all done using the same language, so you don't lose the semantics. When you store this in a database, you've got the instance data and the data model defined in the same place. And the database has knowledge of this; it can make use of it. So once again, understanding the semantic web: the first thing you need to do is let go of what you know about relational databases. And you realize we're modeling properties, domains, and ranges. 
These are first-class citizens. We're defining the types in the model using the same language. One of the things that's interesting is that you can define instance data without declaring a type; it's just assumed to be a thing. Then you can come back later, and the database can actually compute the type for you: it can analyze the instance data and figure out what the type is. This is a backwards process if you think of it from the relational world. In a relational database, I have to declare a type, which is a table, before I can add data to it: I have to declare that I have a customer table first. In this world, you can collect instance data and then ask the database to classify it for you, to determine what category it belongs in. Now, with respect to the physical and conceptual models: the physical data model is essentially fixed. It's the triple; that's how you express everything. But the conceptual data model is rigorously defined, as opposed to the property graph, where we define the conceptual model only by convention. Here it's specifically called out. You don't use a type name without first declaring that you have that type, and the same goes for a relationship or a property name. So once again, when you go to persist data, you don't lose the semantics. Now, like I said earlier, when you're defining a semantic web database using RDF, RDFS, and OWL, you're really building an ontology, and it's based on logic. You need to think of what you're defining in those terms: in making this description of something, what can I infer from it using the logic? Logic is, to me, both its superpower and its kryptonite. The superpower is that you get an inferencing engine: you can infer new facts from given facts, you can infer types, you can classify things. 
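The "classify it for me" behavior can be sketched with a single RDFS entailment rule: if a property declares an `rdfs:domain`, then any subject that uses the property can be inferred to have that type. The store contents and vocabulary names below are illustrative, and a real reasoner applies many more rules than this one:

```python
# Schema triples and instance triples live in the same store.
triples = {
    # Schema: the domain of foaf:homepage is foaf:Person.
    ("foaf:homepage", "rdfs:domain", "foaf:Person"),
    # Instance data, asserted with no type declared up front.
    ("ex:JohnSinger", "foaf:homepage", "ex:johnsHomepage"),
}

def infer_types(store):
    """One RDFS rule: (p rdfs:domain C) + (s p o)  =>  (s rdf:type C)."""
    domains = {p: c for p, rel, c in store if rel == "rdfs:domain"}
    inferred = set()
    for s, p, o in store:
        if p in domains:
            inferred.add((s, "rdf:type", domains[p]))
    return inferred

# The "engine" classifies ex:JohnSinger as a foaf:Person, even though
# no type was ever asserted for it directly.
new_facts = infer_types(triples)
```

This is the backwards-from-relational flow described above: the data arrived first, and the type was computed from the model afterwards.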
All of this can be done by the inferencing engine. The kryptonite part is that it's hard to understand. Really smart people get logic, and the rest of us all kind of struggle. So once again, the semantic web, with its databases and ontologies, is a technology that seems to have some of the requirements we're interested in. Most importantly, the model-equals-data requirement is clearly here. The real issue is ease of use: how can we make this stuff easier to use and accessible to people, and really to our business users, not just IT experts? All right, we're going to switch gears again now and look at another technology, if you will. I want to talk about linguistics, or language. I've mentioned this several times as we've been going along; I believe it's an important next step. So let's take a quick look at a couple of things in this area. Linguistics, the study of language and grammar, is obviously a big field. At some point a branch came about which is referred to as formal semantics, and they defined a model of how language and conceptualization work. They talk about sentences, which are of course grammatically well-formed strings of words, but the question is: what's the meaning of the sentence? The meaning carried in a sentence is called a proposition. The thinking is that we convert the propositions into a formal semantics, which is a kind of logic metalanguage. So these propositions are then rewritten using a predicate calculus, which is a formal logic. The procedure is over on the right. You have spoken language: somebody makes a statement, a sentence, which contains a proposition. And this goes through a translation into this logic metalanguage. 
And then you go through a process of matching, where you compare this proposition to your understanding of the world, your model of the world, and you determine whether the statement is true or not. Of course, this can all be accomplished because we've converted it to logic, and we can process that. So here we have a fairly well worked out way to structure language and use it to compute, if you will. I don't know that this has been productized; if it has, I would be interested in hearing about it. The question, though, is whether this is really how our mind works. This is the brain-as-a-computer model, if you will. But it is an example of a grammatical system that's been worked out and is computable. Moving ahead to a more modern view of the world, we have cognitive linguistics. Once again, it's another branch of the general linguistics enterprise, but it came out of the psychology world: the study of behavior, and cognitive psychology, where people started really thinking about how our mind works, how we build concepts in our mind, and how our consciousness works. That field of study merged with linguistics. Some people had the insight that our use of language is really based on our ability to conceptualize, which is a cognitive process. And so these fields merged into cognitive linguistics. This is very much the state of the art, and what follows is a very simplified way of looking at the model that's come out of all this research. 
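The translate-then-match procedure just described can be sketched as simple model checking: the proposition carried by a sentence is hand-translated into a predicate-calculus-style form, then evaluated against a set of facts standing in for the model of the world. Everything here, the facts, predicate names, and the hand-written translation, is an illustrative toy, not a real formal-semantics system:

```python
# A "model of the world" as a set of ground facts.
world = {
    ("Person", "john"),
    ("Homepage", "john", "example.com/john"),
}

# The proposition carried by the sentence "John has a homepage",
# hand-translated into predicate-calculus style:
#     exists x. Homepage(john, x)
def has_homepage(model, subject):
    return any(fact[0] == "Homepage" and fact[1] == subject
               for fact in model)

# "Matching": compare the proposition against the world model to
# decide whether the statement is true.
statement_about_john = has_homepage(world, "john")
statement_about_mary = has_homepage(world, "mary")
```

The hard part that this toy skips entirely is the translation step: getting from free natural language to the logical form automatically, which is exactly where the formal approach becomes difficult.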
If we look over on the left-hand side, the thing that cognitive linguistics sees differently from the more formal linguistics approach is that everything we do, in terms of our way of thinking and talking, is based on our embodied experience. What they mean by that is that you build your conceptual structures based on how you sense the real world. That becomes your baseline, and from there you build more abstract concepts. Obviously, we deal with a lot of things that are abstract and not just what we experience in the world, but you build those abstract concepts basically as metaphors of the physical world you've experienced. That's how you're able to build up this large conceptual structure in your mind, and it all sits below, in the subconscious. Now, the way language works is that before I talk, I build up in my mind a simulation, a mental space, and I pull the meaning out of my conceptual structure, which is encyclopedic: it has everything I know, as little as that is. I pull out just what I need into this mental space, and that's just what I need in order to produce language. I have an idea of what I want to talk to you about, and I pull that into this simulation, because I don't need to be thinking about everything I know, just what it is I want to talk about. And then from that simulation, I'm able to construct a grammatical statement that you can understand. Okay, so then the person listening to me does the process in reverse. They hear the words, where the words are prompts for meaning. The word doesn't carry all the meaning; it's just enough of a prompt that you can reach down into your storehouse of concepts and knowledge and build this simulation. It's kind of the video game that's constantly running in your head. 
And that's how you understand what you hear the other person saying. The point behind this is that if you think about it, it's like a miracle. It's something we do without even thinking: we're able to build these complex meanings and speak them to other people, and the other people actually understand. So to me, this is the ultimate in ease of use, and this is why I think language is important. We need to move to where the way we model data concepts is in a form that we easily understand and that survives being persisted to a database. The only way I can see this happening is if we go to more of a language-based API. This is how we can close the gap between the meaning of things and the persisted data. Just imagine if the system was actually able to explain itself. All those data quality, dictionary, and stewardship processes: if we implemented them up front when we designed the data, then, as everything is persisted and captured in this conceptual database, the system ought to be able to come back and explain to us, well, what is the definition of that? Which part of the business cares about it? We should be able to capture and maintain all this business context in a way that stays with the data. Now, the challenge as I see it is that somehow we've got to bridge from the logic to the language. The first linguistic model I talked about was a formal approach. The second one is, I think, the more realistic one: the human brain as a fuzzy-logic pattern-matching machine. Somehow we've got to imitate that, because that's how we work, but it has to be computable, something we can put in a computer. So once again, we started with three high-level requirements: model equals data, technology neutral, which I'm kind of contradicting myself on in a way. At some point, I think this language API has to persist data. 
Well, that's committing to a technology, but it still needs to be technology neutral enough that we can interface with the existing legacy world, because that isn't going to go away. There are always specialized tools for processing data, and we need to be able to push data out in a pipeline to those tools, and to read raw data in as the computer's sensory system, if you will. But we need to do this in a way that more closely mirrors our human behavior, and once again, I believe language is the way to accomplish that. We looked at some existing technologies. The conceptual modeling methods are more conceptual than what we use today, and frankly they didn't get picked up because they don't persist. We looked at some newer persistence mechanisms that seem a little more oriented toward conceptual models: the property graph, and the semantic web, which is kind of what it is; it is a model of understanding. So there seems to be progress there. And we looked at some language processes. There are a number of theories of grammar construction that show how we map words to meaning; we just need to tie that back to some of these persistence approaches. There are other technologies too; we just lack the time here. There's a lot of work being done, obviously, in natural language processing. I'm not saying we need to solve that problem; natural language itself is hard to nail down, but I do think we need to systematize it somehow using standard grammar constructions. And there's a lot of work that's been done in various knowledge representation systems that are more conceptually oriented, knowledge graphs for example, which you've heard about at this conference. That's a great place to start. So to me, the good news is that a lot of what we need already exists. It just needs to get merged together in a way that unites it with language, because the more human-oriented way we process knowledge is through language, right? 
And that's easy, because we do it without even thinking about it. Somehow we need to merge that with a database technology that lets us capture that knowledge and move it in and out of a system. So I believe this is going to happen. There's an arc, if you will, to these technologies, and there's a great deal of research. We already see a lot of intersection between the technologies we looked at, in that some of them are language oriented and some are more oriented to our actual way of thinking about concepts. This is going to happen. All we need is a spark, and I think we're going to get this fire started. So this is very much a future vision. Like I say, there's research and productization going on in these different areas, but nobody's really pulled it together, and I'm suggesting that this is what's going to happen, maybe in five to ten years. And with that, I think I've hit my time limit. If these are new concepts for you, there are a couple of things you can do to get started. You can work on knowledge graphs: you can download Nodera, which is a free graph data modeling tool, and work on building a knowledge graph. A lot of people at this conference have talked about knowledge graphs for those data-oriented processes, and I think that's a great idea. If you'd like to learn more, I'd also recommend dipping your toe into the ontology and logic world, and you can do that for free: just download Protégé. There's a thing called the Pizza tutorial, and that'll give you a gentle general introduction. So I think I'm out of time. It was great, John. Thanks for that. I do have a couple of questions in the side chat that I've passed to you, if you want to weigh in on those. Okay, yeah, I'm switching over to that. Okay, so the first one: how do you see semantic data model adoption within the enterprise? What's the outlook for the next five years? 
So over the next five years, I think you're going to see knowledge graphs growing. People are starting to build zero-code or low-code front ends that make knowledge graphs easy, and those make use of a simple semantic, linguistic structure. That's probably the first thing you'll see more and more of. But who starts building a language front end? By language, I mean something where I can type in sentences and questions and get answers back. I do believe we really need to get to that level of ease of use. I think that's in the research world, in the universities, but it's five years before we see that kind of thing commercialized. People have done a lot of amazing things with language acquisition, scanning text and extracting meaning. Okay, next question: property graphs, semantic web graph databases, and the linguistic models all depend on clear meaning and definitions, and unless these are well defined, the specific modeling technique doesn't matter. So I guess the point is that we need a specific modeling technique. Yeah, I mean, you're right: as long as it's all in your head, it doesn't matter, because those meanings don't get into the database. Now, with the semantic web, to give it credit where it's due, there is a rigorous definition of the meaning of the terms you're defining. So I'm not sure I totally answered that question, but yes, what I'm looking for is a way, and I believe we will use a structured language for this, to describe both the model and the data in terms that we intuitively understand. Next: please suggest a good primer book or training material to take this forward. I did suggest a couple of things on this last chart, and I'll try to get more of that onto my website too. 
Okay, page six: doesn't every model lose some semantic information about the real world? Well, fair enough. Even our own minds do not retain all of the experiences we've had in the real world. The way your mind actually works is that you have your sensory experience, but then you create an attenuated version of it. Think of taking a 12-megabyte picture from your phone and compressing it into a JPEG that's small enough to send in an email; that's the attenuated version. So absolutely, we're not necessarily trying to simulate the entire real world, but we are trying to get a better semantic understanding of what we do want to retain. And this is where I wish I were actually in the room with the audience, because this is a philosophical question, right? Philosophers have been studying truth, meaning, and existence for 2,000 years. All right, here we have another one: I would disagree that natural language is easy; we're only very experienced using it, which quickly becomes apparent when we start trying to master a foreign language. I don't disagree. I guess my point is that the natural language you speak is easy to use, but try to learn a foreign language and that becomes a real challenge, because we acquire language, not at birth, I'm sorry, but starting at basically a year of age. So I think that's it on the questions, and I think we're pretty close to out of time. The slides, I apologize, I uploaded them today; if they're not here yet, they'll be on my website and on the Dataversity website soon, hopefully. And John, actually, the slides did end up getting posted to the session page about midway through, so I think people were able to get them. Okay, good. 
And I did have a couple of other comments come in here; I'm going to send you a question right now. The traversal from conceptual to physical is one of de-abstraction; how would a semantic approach more effectively allow for this de-abstraction process? So, I'm not sure I agree with that, or I guess I'm not sure I want that to happen. I want to be able to model concepts and understand the concepts in our minds. One of the things that comes out of cognitive linguistics is that the concepts we have in our minds are actually built out of layers, and the layers are based on metaphors, if you will. When we create an abstract concept, we actually build it from the concepts of how we interact with the real world. So I'm not sure I agree with the statement that it's a de-abstraction. I think the physical database just becomes more technical. My hope is to see something where the abstraction, the concepts, are identified, and if we have layers of more abstract and more specific versions of the concepts, that gets modeled effectively. I don't think we should really even be aware of the physical persistence of it; that should just happen without us even understanding or feeling it. Cool, so that puts us at time. Thank you so much, John, for this great presentation, and thanks to our attendees for tuning in. Please be sure to complete your conference session survey at the bottom of this page. The next session should start in about 10 minutes. Thanks, everyone. Okay, and thank you.