Thanks for having me back, although you may regret it at the end. Terry asked me to talk about OHDSI, so I made up some very elaborate title. But the subtitle is really what matters: why worry about clinical data comparability, and what are we going to do about it? I show this slide in every talk I ever give, whether it's relevant or not. You might think it's a stretch here, but it's not. Everything ultimately begins with a patient, and we go around this cycle of knowledge discovery and so on and so forth, and then we re-instill it. That's the learning health system. But what makes it work is comparable and consistent data, and absent comparability and consistency, this thing grinds to a halt. Arguably, precision medicine is the same darn thing; we could use the same cycle, genomic medicine being a component of that, of course. We require knowledge, but at the end of the day we have to put patients, through deep phenotyping, into small homogeneous groups and assemble comparable and consistent data. That's the bottom line. I submit that analytics is actually the easy part. The hardest part is getting the data together to enable the kind of analysis we all want to do.

So in the fabled genotype-to-phenotype association world, nobody questions the importance of reliable, consistent, and accurate genomic data. We spend huge resources trying to get that right, never mind represent it; the previous talk, I think, complements this beautifully. We often forget that on the clinical side it's equally important to have comparability, consistency, quality, and reproducibility. But that's really hard, because we usually get clinical data opportunistically from electronic health records, and it doesn't always come out of the box in a comparable and consistent kind of way. That's the problem.

So you've got two options. We can map what we have to what you need, and that creates this terrible spaghetti tangle of everything mapped point to point. Or we define, and CDM is going to be an important abbreviation for the rest of this talk, a common data model: we pick one and map what we have to that common data model. That defines, theoretically, the canonical form, and then the whole interoperability problem is fixed, because we agree upon this common framework. Well, the nirvana of common data models is that once we choose one, and there's that "if," we have a clearly defined hub. We can create spokes and derivatives from it easily. Map once, use many; where have you heard that before? It obviates redundant work. Data creation using the CDM is desirable; it's not always done, but once you start thinking about it, gosh, why don't we just generate the data that way? And it defines, by definition, this notion of interoperability, if we're all using the same common data model. So that's the nirvana side.

But with heaven, you also end up with the other place. So here's the Hades side of the thing. We're all happy to use a CDM, as long as it's mine. And it's the old joke: so many CDMs to choose from, which makes the term, of course, an oxymoron. The very fact that you have multiple CDMs, a plurality of them, is contrary to the principle. The other challenge we confront is that people are willing to use a CDM, but they're going to extend it for their use case. Or worse, make a new CDM for their use case. Or worse still, change an existing CDM to fit what they really needed to do. And therein creep the challenges; therein creep the difficulties.
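To put a number on that spaghetti tangle, here is a minimal back-of-the-envelope sketch (mine, not the speaker's) of the mapping arithmetic in Python: with point-to-point translation every pair of systems needs its own mapping, while with a hub CDM each system maps once.

```python
# A minimal sketch (not from the talk) of the point-to-point vs. hub-and-spoke
# mapping arithmetic. The system counts below are hypothetical.

def point_to_point(n_systems: int) -> int:
    """Every system maps to every other system, in both directions."""
    return n_systems * (n_systems - 1)

def hub_and_spoke(n_systems: int) -> int:
    """Every system maps once to the shared common data model."""
    return n_systems

for n in (5, 20, 100):
    print(f"{n} systems: {point_to_point(n)} point-to-point mappings "
          f"vs. {hub_and_spoke(n)} mappings to a CDM")
```

At five systems the difference is annoying; at a hundred it is fatal, which is the whole argument for the hub.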
And it is, of course, a recipe for non-interoperability. That begs the question: which CDM do we use? I'm only focusing on what I characterize as the high-profile research CDMs; I'll come back to non-research CDMs, because that actually is important. You've all heard of Sentinel: a great pilot, FDA surveillance for adverse events, quite sensible. It intelligently leveraged a health services model because that's where the data was. It was available. It was accountable. And as our colleagues from Vanderbilt said yesterday, sometimes very simple data can get you very useful information. So it made sense for Sentinel to emphasize administrative data. PCORnet then adopted Sentinel, more or less. But when you start to do deep phenotyping, a health services research model built on administrative data becomes a little brittle, particularly when you try to sprinkle clinical sugar on it. And it didn't quite work. In the CTSAs you have ACT, which is an i2b2 adaptation, partially implemented, with some grumbling in the community. OMOP, which was a great consortium out of pharma, quite open and aboveboard, with great goals, became OHDSI; it focuses on large-population questions. I'm going to spend more time on OHDSI because that's basically what Terry asked me to talk about. And then the other model, of course, is TriNetX, which is interesting, because TriNetX probably has the advantage of being the only one that really works effectively. And that's because it's commercially sponsored. When you sign up for TriNetX, and Hopkins did, they send the stormtroopers into your space. They model and map all the data on their nickel. They set up their own server. It's a Faustian deal: you sign away your soul, and then they push a button and your data is on their nodes, on their network. You can create subnodes and consortium nodes, and it's all interoperable and it all works. It's kind of interesting, but perhaps not what we want.

So OHDSI is pretty cool. I actually like OHDSI. These slides were shamelessly stolen from George Hripcsak, who is Dr. OHDSI. You can see that, simplistically, they focus on tallying, causality, and prediction. It came out of the pharma industry, out of trying to leverage data across very heterogeneous environments, mapping at large scale to a CDM in a way that would allow credible analytics across multiple nodes, environments, and communities. And in that context, it's not bad. It's probably the best among the clinical data models for this kind of large-scale purpose. It was not designed for deep phenotyping; we have to remember that. It has a very large community, lots of data partners, and a very robust coordinating center that's well funded and well developed, with components that kind of make sense. You can see they have fairly clear semantic divisions: the pink space is the clinical model, more or less, with the blue space on the right being the vocabulary that corresponds to it, and different health services elements and components relevant to the questions they pose. They have fairly sophisticated management of multiple vocabularies; you can see that NDC, of course, dominates, coming as it does out of the pharma world, but the usual suspects, ICD, SNOMED, Read codes, and the like, are fairly well managed by that consortium and community. And they do this openly and share it. So in that context, they're pretty good.
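To give a concrete flavor of that vocabulary management, here is a toy sketch (mine, not OHDSI's tooling) of the OMOP-style pattern in which heterogeneous source codes are mapped onto a single standard concept. In the real model these mappings live in the CONCEPT and CONCEPT_RELATIONSHIP tables via "Maps to" relationships; the concept IDs below are hypothetical placeholders, and the codes are illustrative.

```python
# A toy illustration of the OMOP-style vocabulary pattern: source codes from
# different vocabularies resolve to one standard concept per clinical idea.
# All concept IDs are made-up placeholders; codes are illustrative.

# (source_vocabulary, source_code) -> standard concept ID
maps_to = {
    ("ICD10CM", "E11.9"): 4001,       # type 2 diabetes, SNOMED-based standard concept
    ("Read",    "C10F."): 4001,       # same disease, different source vocabulary
    ("NDC", "99999-9999"): 9002,      # a hypothetical metformin product, RxNorm-based
}

def standardize(vocabulary: str, code: str) -> int | None:
    """Resolve a source code to its standard concept, if a mapping exists."""
    return maps_to.get((vocabulary, code))

# Two sites coding the same disease in different vocabularies land on the
# same standard concept, which is what makes cross-site analytics credible.
assert standardize("ICD10CM", "E11.9") == standardize("Read", "C10F.")
```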
George will point out that there's a lot of NIH uptake of OHDSI: the eMERGE Network, this All of Us study, I don't know if any of you have heard of it, FDA, NCI. And that's true. These communities have more or less agreed that OHDSI is a useful shared model among the large-scale research models. And of course, they have a lot of tooling available to do what you want to do: design, implement, test, and so on. Those tools have evolved in an open-source community to work with OHDSI data and to improve the quality of its interpretation and functioning. And in that sense, it's pretty good. But it doesn't get my vote, and I'll tell you why.

Any large-scale research model has intrinsic limitations. They're invariably designed for purpose. And when you take a very, very large-scale model and, in the jargon of computer science, prematurely bind data to that model, I don't care whether it's OHDSI or PCORnet or Sentinel or any of those, that limits the flexibility and nimbleness with which you can do the underlying analytics. It's just an intrinsic fact of large-scale relational schemas. Orthogonal questions with multiple outer joins can bring a query to a crawl. It's brittle in terms of reuse, maintenance, and uptake. And it begs the question: what's the optimal size of a data model for discovery, for innovation, for the kinds of use cases that I think genomic medicine ultimately needs? This is the Goldilocks problem. Models that are too small, almost by definition, leave the data inchoate; there's not enough substance to hold the data together, and it falls apart. Models that are too big, as in many of the large-scale research models, lead to brittle structures, inefficient queries, and very problematic analytics. Some of us have been focusing on what we consider, in the Goldilocks world, the just-right level of data model: the data element. Something at the level of a coherently defined laboratory observation as a self-describing object, a coherently defined medication as a self-describing object, and so on, is the sweet spot.

This is happening in parallel with the reality that, historically, much of the reason these clinical research models evolved was that the clinical data models were very primitive and very limited. If you've looked at HL7 V2, you run away in horror; it is not what you'd want to base a research program on. That has changed, dramatically, particularly in the past five years. There's international agreement now, pragmatic adaptation, RESTful resources, and frankly the emergence of robust and flexible object-level clinical models that, in my opinion, obviates the need for large-scale research models. Why not just use them the way they come out of the box? And of course, what I'm referring to is FHIR. If you haven't caught on to FHIR yourself, you're living under a rock; it is fundamentally changing the way we conceptualize clinical data. It can de facto function as this object-level data element and enable international consensus on what the heck these data elements might look like. And it is rapidly being adopted in clinical environments, obviating, again, the need for transformation. If we can get data out of clinical systems in native FHIR format, why in the world would we then map it to some arbitrary large-scale data model? My assertion is that we should retain and persist it as nimble, pluripotent FHIR objects.
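To make the Lego-piece idea concrete, here is a minimal hand-built sketch (mine, not from the talk) of a FHIR R4-style Observation for a serum glucose, held as a plain Python dict, along with an illustrative projection down into a flat CDM-flavored row. The LOINC code is real; the projection function and column names are loosely OMOP-inspired placeholders, not any official mapping.

```python
# A minimal, hand-built FHIR R4-style Observation: a self-describing
# "data element" carrying its own code system, units, and timing.
glucose = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "2345-7",          # Glucose [Mass/volume] in Serum or Plasma
            "display": "Glucose [Mass/volume] in Serum or Plasma",
        }]
    },
    "subject": {"reference": "Patient/example"},
    "effectiveDateTime": "2019-03-14T08:30:00Z",
    "valueQuantity": {
        "value": 6.3,
        "unit": "mmol/L",
        "system": "http://unitsofmeasure.org",
        "code": "mmol/L",
    },
}

# Projecting the pluripotent object into a flat, CDM-flavored row is the
# easy direction; column names here are illustrative, not an official mapping.
def to_cdm_row(obs: dict) -> dict:
    coding = obs["code"]["coding"][0]
    return {
        "person_ref": obs["subject"]["reference"],
        "source_code": f'{coding["system"]}|{coding["code"]}',
        "value_as_number": obs["valueQuantity"]["value"],
        "unit_source_value": obs["valueQuantity"]["unit"],
        "measurement_datetime": obs["effectiveDateTime"],
    }

print(to_cdm_row(glucose))
```

The asymmetry the talk describes is visible here: flattening the rich object takes a dozen lines, while reconstituting units, code system, and provenance from a flat row is where the pain lives.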
So FHIR resources define a Goldilocks level of clinical data organization that is amenable to discovery and to unanticipated query and analytics, unlike large-scale data models. Those are the features, stolen shamelessly from the FHIR site. It can function as the ultimate clinical data model. It's a right-sized level of specification: think of FHIR objects as Lego pieces that you assemble into the use-case needs and requirements before you; you're not trying to draw from a large-scale table. And oh, by the way, if you want to go from a pluripotent FHIR object into a fashionable common data model like Sentinel or PCORnet or OHDSI, that's a much more straightforward task, because going from a nimble, flexible object into a fixed large-scale model is quite achievable, whereas the opposite is often very difficult. So it is the pluripotent data model: data from multiple sources can be shredded, elementized, normalized, and used to generate an arbitrary number of projections, data marts, registries, and other clinical data model formats and functions as they emerge.

It's not unheard of in the research world to use FHIR. Frankly, All of Us uses it at the source; the whole Sync for Science thing is premised on FHIR. Why you would take Sync for Science, get FHIR objects, and then transform them into some arbitrary large-scale data model is a mystery to me, but it's sort of their current strategy. That's because I didn't win the competition for the coordinating center, but anyhow. NCATS and the FDA partnership have adopted FHIR as the interoperable, pluripotent model. On genomic results, we just finished hearing about the huge progress in the clinical FHIR community on representing genomic data as native FHIR resources. And the CTSA next-generation repository, through the Center for Data to Health, the very large coordinating center over all the CTSAs, is embracing FHIR as the framework for translational research moving forward.

So where is this going? Biomedical practice, research, and data are knowledge-intensive. Comparable and consistent data is what matters, and it has to be done at the right level of granularity. Canonical rendering is a prerequisite for precision medicine and, obviously, genomic medicine. Data-element models are optimal for precision medicine, and FHIR resources are the obvious candidate to underpin our management of clinical data for use in discovery and other secondary applications. Thank you. And just to clarify, I won't do the run-away part yet.

Great, thank you, Chris. Then to wrap up the session for the discussion, Gil Alterovitz from Harvard and Boston Children's is going to talk about how he solved the biggest problem in healthcare, and that is patient data portability. Or at least talk towards it, anyway. So Gil, thank you very much for coming and opening this up as a critical element to the future.