Hey, welcome everyone to grand rounds. I'm super excited today to have Michelle Hribar visiting. She was planning on being in person but wasn't able to make it, but we're really grateful that she's able to speak to us virtually today. Dr. Hribar is an informaticist with really deep expertise in ophthalmology; she's worked a lot with ophthalmology EHR data and with phenotyping from electronic health record data. She works at OHSU, and she's also been a data scholar with the National Eye Institute, which is really pushing to improve informatics for ophthalmology and eye disease. I know her because I'm part of a small glaucoma informatics interest group with researchers from across the country. Dr. Hribar attends that and has been really helpful and has taught me a lot about data standards and using electronic health record data, so I'm just super grateful that you could come and speak to us today. We've got some folks online — I think you may be able to see them — and then we have a number of people here in our auditorium. I have a mic that I'll run around if there's time for questions or comments at the end. But otherwise, you can start speaking, and thank you again for coming.

Oh, thank you, Brian, for inviting me, and I apologize — I really wish I were there in person; this would be much more fun. But I'm glad I have the opportunity to talk to you virtually today. As Brian mentioned, I do a lot of research in the secondary use of EHR data, particularly in the field of ophthalmology and vision science. I'm a data scholar at the National Eye Institute. This is a new program that the NIH has started out of the Office of Data Science Strategy: each institute or center can have one new data scholar per year, and they can stay one to two years, so you can have two at a time. I am the inaugural one for the National Eye Institute.
So I'm going to be talking to you today about the reuse of EHR data for research. That will touch a little bit on the work I've been doing at the National Eye Institute, but the seminar at noon today will focus more specifically on that. To start off — let's see if I can get this to advance — I just want to mention that I don't have any financial disclosures, and that we're going to be talking, as the title says, about EHR data: what it is, the challenges for its reuse in research, and then some opportunities. The important thing here — I have to put my informaticist hat on — is that while the focus is on eye care and vision research, these concepts really apply across medical specialties. A lot of what we struggle with in ophthalmology in terms of reuse of our data is common across medicine, so just keep that in mind.

So let's start off talking about EHR data. The healthcare data collected in the EHR during the course of clinical care is just huge — healthcare is estimated to account for about 30% of the data generated worldwide, which totaled roughly 64.2 zettabytes in 2020. And really the holy grail of all this data — the reason we have the HITECH Act, which encouraged EHR adoption, and why we have very widespread EHR adoption now — is the promise of learning from all this data. When clinical data is collected only in paper charts, it kind of gets stuck there; it helps that single patient, but it's very difficult to learn from it and apply what you've learned from some patients to others. And that promise is slowly becoming a reality — not as quickly as some of us would like, but I'll talk a little bit about some of the things we have been able to do today.
This EHR data is often called real-world data, or it falls into the category of observational health data, because it's data captured outside of a controlled or experimental setting. For the most part it's there for clinical care, though sometimes people do capture some research study data in EHRs. It can also be collected as part of monitoring — devices monitoring patients in an inpatient setting, but also patient-reported or patient-collected data such as Fitbit or continuous glucose monitoring data. Billing and claims data is also considered part of this observational health data. It's really useful for studying patient outcomes during real care and over a much longer, extended time. The randomized controlled trial is still going to be the gold standard for determining the safety and efficacy of a treatment. But observational health data research can be really helpful for figuring out what happens over a much longer time than a trial could feasibly cover, and how patients respond when they're different from the trial cohorts. We all know that the inclusion and exclusion criteria of a clinical trial are optimized to show the efficacy of a treatment, but most of the patients you actually treat are often quite a bit different from that — how do those patients respond to medications over time, and how do different treatments compare? It can also be very useful for studying and learning from a variety of practice patterns, and sometimes that variety of practice patterns becomes a natural experiment.
You're all familiar with the clinical data captured in the EHR: that core data — demographics, vitals, exam, history, notes, that kind of thing. On the left side is data that right now isn't necessarily integrated into the EHR but that we still use during patient care. In ophthalmology, imaging data is a really important piece of data that we collect about patients and use for clinical care, but it's not fully integrated into the EHR — there are often connections and links you can click to see it, but it's not fully part of your EHR. On the right is data like labs and monitoring data that exists outside of the EHR but gets pulled in and used during clinical care (some of the medication data might actually be considered core EHR data). The important thing is that data in the EHR is stored in a way that's optimized for accessing a single patient's data very quickly, and that's not necessarily the best arrangement for research. So when we use EHR data for research, we're actually using data that comes from the EHR's data mart. Many of you may already know this, but as an informaticist in ophthalmology I've realized that some people don't: the data you use for research is stored separately from the data you use in clinical care. Part of that is because the format of the data in the EHR is really not well optimized for accessing data about multiple patients. So overnight — and this is part of your EHR system; you have Epic, and we have Epic at OHSU — data gets pushed from the EHR into a relational database called the data mart. That data is then used for things like reporting and for querying data for research; it can be used for financial reporting as well.
Part of that is because the data is stored in a better format, but the other, probably more important piece is that it doesn't slow down the production EHR database — we really don't want to be interfering with clinical care when we're running our queries for research. Also, to do research using EHR data, we could be sending that data to other places such as clinical registries. I'm not entirely sure if you all contribute to the IRIS Registry or SOURCE, but those are ophthalmology-specific registries, and this really allows us to access multiple patients' data for research.

In eye care and vision research using clinical data — especially machine learning and AI — the focus has really been on imaging. The reason for this is that it avoids some of the challenges of using other clinical data in the EHR that we're going to talk about today. And while these models have used imaging pretty successfully, there are still challenges that remain for doing this at wide scale across the United States or even the world, and for having very large and diverse data sets — we'll talk a little more about that. We're all familiar with the first AI system in medicine to gain FDA approval, which performs autonomous diabetic retinopathy screening. Other prediction models have been built for AMD, glaucoma, and RP, and this has really made ophthalmology a leader in artificial intelligence. That's exciting, but it's focused on imaging. We're lagging a little bit in terms of using the unstructured and structured data in the EHR for machine learning and AI — that research is often limited to a single site, or very few sites and studies.
We've done things like looking at IOP after anti-VEGF injections, surgery prediction for glaucoma patients, and looking at IOP changes after cataract surgery. But really, much of this research — both with imaging and with EHR data — was performed using data from a single site or a few sites. Even in imaging, the data sets used to train the models usually took years to curate. I work closely with colleagues at OHSU who did a lot of the work on the ROP AI models, and I have seen the amount of work they've done to curate their data set. It's a lot.

So let's talk now about the reuse of EHR data for research and what the challenges are. Many of you are probably familiar with a lot of the challenges with EHR data, and a lot of these are challenges not only for research but for clinical care as well. Data is missing. Sometimes data is only entered in notes, which makes it difficult to extract for research. Sometimes the data that was entered is unusable — it's in a format where we're not sure what it means. Sometimes data is incorrect, and it can be very difficult to figure that out, because for a lot of the data we collect during clinical care we don't have a gold standard. One issue I often struggle with is that data is fragmented among multiple institutions — we'll talk more about that. Data isn't standardized, so it can be very difficult to do research between multiple sites. I had an experience where I was helping some researchers at OHSU in pediatrics pull some data in conjunction with UCSF. UCSF uses Epic just like OHSU does. They sent us the queries they used to pull together their data set; we pulled the data and sent it off to them, only to realize that we did not store data exactly the same way.
It looked like something like a third of our patient visits were self-pay, which would be a very weird thing to have happened. So even when you use the same EHR at two different sites, the way you implement that EHR can have big effects on your data and on how you can harmonize it. And then sometimes data is not accessible, and data is not representative — that's a big challenge for any healthcare research, because we can only see the data from patients who actually access healthcare, and we all know that not everyone can access healthcare in the same way and equally at every single institution.

So some very smart people in informatics have come up with a framework for data readiness for research for EHR data, and it has four components: data quality, data availability, data interoperability, and data provenance. We're going to talk about each of those aspects of data readiness and how it plays out in ophthalmology.

So first, EHR data quality. There's a really benchmark paper from Kahn et al. that laid out a data quality assessment terminology that has been widely used. It describes three data quality categories. One is conformance: do the data values adhere to specified standards and formats? Completeness: are data values present, or do we have a lot of missing data? You'd think the third C should be correctness — and in a way it is — but that relates back to the idea that it's very difficult to determine whether something is correct, because there really isn't a gold standard. Even looking at a patient's chart, all you know is what someone observed, what someone measured, and their conclusions, and you don't necessarily know if that's absolutely correct.
So we talk about plausibility instead: based on the evidence in the chart, are the data values believable? To demonstrate this, I think it helps to look at examples. There is a paper by Mbagwu, from when he was at Northwestern University, looking at what visual acuities were recorded in the EHR. They looked at almost 300,000 outpatient visits at Northwestern. The good news is that the completeness of this data was very high — 99% of those visits actually had visual acuity measurements. The conformance, however, was not as wonderful as you'd expect: in those 300,000 visits there were over 5,000 unique values. By unique they meant each exact string, so you have the ones you expect, like 20/20 or 20/10, but then you get ones that have the measurement plus some kind of modifier — 20/50 plus three, 20/40 minus one, or notes like "slow" or "squinting" — or you might even have just text like "same" or "no difference." So there has to be some sort of processing to make that data useful, because some data won't look exactly the same but is still usable. What they found was that 80% of the recorded values were an exact match for what they expected — that's good — but 20% needed processing, and that 20% contained that huge variety of different values. You can see here a snapshot of the most common visual acuities: for some, like no light perception (NLP), there was a very high exact-match rate, but counting fingers needed more processing, which is to be expected, since counting fingers often comes with some extra information, particularly about the distance. The values on the left are the standardized forms they processed toward. The important question — and this is something I struggle with as we work on standardizing this data — is: are we losing information when we process it and force it into these really clean categories? Is there something we should be paying attention to in that plus or minus one?

The next example of data quality is plausibility, and this is with medication lists. I think many of you know that EHR medication lists are notoriously error-ridden. My husband was just complaining about this the other day: with all the money we're spending on EHRs, and all the work he's seen me do over the past 15 years, why is he asked all the time about medications that were prescribed years ago? And it is difficult — the lists are often not updated, medications prescribed elsewhere are often not recorded, so you can have that fragmented view as well, and over-the-counter medications are often not recorded in the medication list. I had a PhD student a few years ago who did a review of 150 glaucoma patient visits, looking specifically at glaucoma medications, not all medications. He showed that in 36% of those visits there was a discrepancy between the medications that appeared in the notes and what appeared in the medication list for that patient. The good news is that he was able to develop some NLP methods to reliably extract the medications from the notes.

Okay, so the second concept we talked about is availability.
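Going back to the visual acuity example for a moment: the kind of normalization that study describes can be sketched roughly like this. The regexes, canonical names, and return format here are illustrative assumptions, not the paper's actual rules.

```python
import re

# Illustrative mapping of qualitative acuity abbreviations to canonical names.
CANONICAL = {
    "NLP": "no light perception",
    "LP": "light perception",
    "HM": "hand motion",
    "CF": "counting fingers",
}

def normalize_va(raw: str):
    """Return (canonical_value, modifier) for a recorded acuity string."""
    text = raw.strip().upper()
    # Snellen fraction with optional +/- modifier, e.g. "20/40-1" or "20/50+3".
    m = re.match(r"^(\d+)\s*/\s*(\d+)\s*([+-]\s*\d+)?$", text)
    if m:
        base = f"{m.group(1)}/{m.group(2)}"
        modifier = m.group(3).replace(" ", "") if m.group(3) else None
        return base, modifier
    # Qualitative values, possibly with extra detail such as a distance.
    for abbrev, name in CANONICAL.items():
        if text.startswith(abbrev):
            detail = text[len(abbrev):].strip() or None
            return name, detail
    return None, raw  # unrecognized: flag for manual review

normalize_va("20/40-1")  # → ("20/40", "-1")
normalize_va("CF 3ft")   # → ("counting fingers", "3FT")
```

Note that the modifier is kept rather than discarded — that's the open question above about whether the "+1"/"-1" carries information worth preserving.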
One way I like to think about this — not necessarily the way a lot of people think about it, but as an informaticist — is that how data is stored in the EHR can affect its availability. Structured data is collected through things like radio buttons, check boxes, and dropdowns. That data is very easy to access on the back end; we really don't have to do very much, and we can pull it into a report or a query very easily. Then there's data entered into something like a progress note or a surgical note. We can pull that as a big blob, but we can't get information from it very easily — that requires processing, sometimes natural language processing, to extract what was in the note and what was meaningful. Sometimes you can do it with simple filters, but other times it needs really complicated natural language processing, because you have to make sure that what you're finding isn't negated, and that you're not finding it as part of patient history or family history, those kinds of things. And then you have this thing in the middle: free-text fields. In Epic, if you're using Kaleidoscope, any of those fields where you can type data in are free-text fields, so you can wind up with any type of text in there. Even if you're entering an IOP, which should be a numeric value, you can wind up with numeric values, numeric values plus text, or even just text. That can make things difficult, and it's the reason why, in that visual acuity study, you see those differences in conformance — the data isn't forced into a nice format.
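As a small illustration of why those free-text numeric fields need cleaning before research use, here is a minimal sketch. The example field contents are invented stand-ins for what a free-text IOP box can hold: a clean number, a number plus text, or text only.

```python
import re

def parse_iop(raw: str):
    """Extract the first numeric IOP value (mmHg) from a free-text field, else None."""
    m = re.search(r"\d+(\.\d+)?", raw)
    return float(m.group()) if m else None

values = ["17", "18 (Tonopen)", "soft to palpation"]
[parse_iop(v) for v in values]  # → [17.0, 18.0, None]
```

Even this tiny cleaner makes a judgment call — dropping "(Tonopen)" throws away the measurement method — which is the same information-loss tradeoff as with visual acuity.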
All of this can affect availability, because depending on how, and by whom, your data is being accessed from the EHR — say, a data analyst who just extracts structured data — you won't have really good access to some of this other data. There are other examples of EHR data availability issues. One I've often run into when working with clinicians at the Casey Eye Institute at OHSU: we're very lucky, in one way, that our major healthcare institutions all use Epic. That means they're all set up with Epic Care Everywhere, so it looks pretty seamless from the clinician's view — when they access a patient's record during a visit, they can see all this data from Legacy and all the other healthcare systems within Portland. But then they're always surprised when they ask me to pull that data, or I pull data for them, and they ask where all this other stuff went. Well, I can't really pull that from Legacy, because it's not in our institution; we need data use agreements to access that data, and sometimes that data will have to be de-identified, which can make it difficult to line up with the data at our institution. So the data that's available for research is often incomplete. There have been studies that have shown this: a study in Massachusetts showed that 31% of patients who had at least two hospital visits were seen at at least two different hospitals, which is pretty fragmented. We even did a study looking at systemic data in ophthalmology patients. We were trying to come up with a cohort of patients to look for risk factors for glaucoma, and we used cataract patients to get a comparison cohort that we felt didn't necessarily have glaucoma.
We were looking for risk factors, trying to look at systemic factors such as hypertension — particularly anti-hypertension medications. So we started reviewing charts, and we pulled a sample of 134 cataract patients. We found that roughly half of them had a primary care physician at OHSU. Then we had a couple of different physicians review the charts of those 134 patients to figure out whether or not we thought each patient had hypertension. When we compared the patients who had primary care providers at OHSU versus those who didn't, there was a significant difference in the percentage of patients who had an accurate diagnosis of hypertension in their charts. So this made it difficult for us to even look at these systemic risk factors for our patients, knowing that we aren't guaranteed a complete look at their data.

The next one is interoperability challenges, and these affect essentially everything. Interoperability involves the ability to connect other data sources, but also to harmonize the data between sources and link it up. This requires physical connections, standardizing and harmonizing the data to make sure we're talking about the same thing, having the permissions to access the data, and making sure the data is linked correctly between patients. This is really necessary for multi-institutional studies and large data sets. But I think a lot of you have run into this even in clinical care, because we have interoperability challenges even with connecting to imaging devices. Right now, I know a lot of you are probably either pulling PDF reports for imaging or visual fields into your EHR or manually entering elements, which is a challenge.
We are challenged in particular with some national data sets that the NIH has created. N3C stands for the National COVID Cohort Collaborative; that was a large national data set spun up in 2020 in response to COVID, and it collected data about all patients who were tested and treated for COVID. There's also the All of Us data set, another NIH initiative that is trying to collect data about a million patients. Right now, ophthalmology-specific data is not included in either of those data sets — and some others as well — because of these interoperability challenges, and I'll talk about that in a little bit.

And then finally, we have EHR data provenance. This is really about understanding the origin of data, how it changes over time, and how that affects research. For example, if you train an AI model using some data, that data can change — patient data can change over time, and that's called data drift. That means you really have to figure out how frequently you need to retrain those models to make sure they actually work for the patients you're seeing or applying them to. Another example is that data entered for billing may not be as accurate as data stored in the clinical record, and this can be a real challenge. We all know that when you enter diagnosis codes, because they're used for billing, they're not necessarily completely accurate or representative of what you're seeing in the clinic. One example of this we found at OHSU: we were trying to develop a cohort of newly diagnosed amblyopia patients, and we found that in that time period about 4,000 patients had a billing code for amblyopia on their first visit.
When we went in and did a manual review of the charts, we found that another 1,000 patients really did have amblyopia but did not have a billing code for it. This is because amblyopia is often a secondary diagnosis, and it wasn't necessary for the provider to enter the amblyopia diagnosis to get full reimbursement, so it wasn't entered.

So what does this all mean? Unfortunately, it means that with this poor data readiness, we can have biased data, and this can definitely impact the models that we develop. As I mentioned, we can have what's called selection bias, in which only patients who use healthcare are included — and sicker patients have more data — and this can bias our results and our models if we don't handle the data carefully. There's also measurement bias, or information bias, which relates to that lack of a gold standard for disease: we don't necessarily know exactly what a patient has, though we can look at the different evidence in the chart to try to figure it out. Disease staging can also be subjective, and that too can influence results or models. And then there's confounding bias, which means that a factor influencing the outcome is missing from your analysis or model. That sometimes happens because relevant data is missing — maybe it was entered in a note as unstructured data rather than as structured data — but also because of the data fragmentation we talked about. The end result is that we can have biased data, models, and algorithms. I'll talk more later about how we can get around this, but one thing I want to point out is that this also impacts clinical care: if we can't accurately identify which patients have a disease, it doesn't just affect our research; it affects our ability to take advantage of electronic digital data. We can't effectively trigger practice alerts, we can't integrate models into workflows, and we often can't fully track our patients' progress over time. Think about that imaging data: if those measurements and the metadata aren't being pulled into the EHR, we can't integrate them into visualizations very well. We can't fully automate or support some care management activities, and we don't have complete data about patients — and all of that affects our clinical care as well.

So, not to sound too discouraging, but there actually are opportunities for research and ways to get around these factors. EHR data has been used for research in many examples across healthcare: drug surveillance, comparative effectiveness research, patient-level prediction, and — what I'm going to talk a little bit about — clinical workflows. The best results really come when data sets are large and diverse. One example of research I want to talk about uses audit log time stamps. This is an area I have focused on; when I started on it over 10 years ago, it was really an emerging area, using data that's automatically generated during clinical care. Under ONC's regulations for EHR certification, EHRs have to track who's in the record, for which patient, when, where they are in the chart, and what they're doing. Vendors can provide some reporting of this data, aggregating it to show how people are using the EHR. For example, Signal reporting in Epic, which some of you may be familiar with, is a little separate from their regulatory audit log.
They're very careful about saying that, but it's doing the same kind of event tracking — looking at where you are in the chart and what you're doing — and aggregating it into results, so you can figure out, for example, how much time you're spending documenting or how much time you're working outside of clinic hours, that sort of thing. This data, in its raw form or in the aggregated form that vendors provide, can be used for research. There are some caveats in terms of how vendors implement it, but for the most part it's a very robust data source. One way we've used it is to study EHR use over time: we studied over 600,000 visits with 70 ophthalmic providers over a decade at OHSU, from 2006 to 2016. We found that EHR use increased over time. (I do need to go back and update this to see what's happened since 2016.) It was really interesting: at the start of EHR use, the average amount of EHR time per visit was about 4.2 minutes. It hit a peak around 2014 at roughly double that, and then it seems to be leveling back down to about six minutes per visit. It was really interesting when we showed this data to our Epic data analyst at Casey — before he became the data analyst, he had actually worked on the implementation of Epic back in 2005 and 2006. He looked at this and said, "Meaningful use worked," which I thought was a really interesting take. Most physicians look at this and say, oh man, the requirements for EHR use have just become untenable — but it's an interesting way to see how EHRs are being used over time. Another study used this data to compare EHR use for providers when they used a scribe versus when they didn't.
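The event-level audit log data described above can be turned into "active EHR time" per provider per chart roughly like this. The five-minute inactivity cutoff is an assumption for illustration — studies in this area pick various thresholds — and the event list is invented.

```python
from datetime import datetime, timedelta

# Gaps longer than this are treated as idle time, not active EHR use.
IDLE_CAP = timedelta(minutes=5)

def active_time(events):
    """events: sorted datetimes of audit-log events for one provider in one chart.

    Sum the gaps between consecutive events, skipping gaps that look idle.
    """
    total = timedelta()
    for prev, curr in zip(events, events[1:]):
        gap = curr - prev
        if gap <= IDLE_CAP:  # only count gaps that look like continuous use
            total += gap
    return total

log = [datetime(2024, 1, 8, 9, 0),
       datetime(2024, 1, 8, 9, 2),
       datetime(2024, 1, 8, 9, 3),
       datetime(2024, 1, 8, 9, 30)]  # long idle gap, excluded
active_time(log)  # → 3 minutes of active time
```

Per-visit numbers like the 4.2-minute average above come from aggregating totals like this across all events linked to a visit.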
We found that, as we expected, the total amount of EHR time decreased for the provider when they had a scribe, and most of that benefit came during the visit, when they were actually in the exam room with the patient. But in some cases, providers actually spent more time in the chart after the visit when they had a scribe than when they didn't. When you looked at each of the seven providers we studied, it varied: some providers spent less time afterwards, and some spent more. We thought this was very interesting. Of course, this data tells us nothing about how this worked for the provider or their level of satisfaction — some of them may actually have preferred it, having that one-on-one time with the patient, even if it meant spending a little more time cleaning up their notes afterwards.

And then finally, we did a very interesting study where we used this data to create simulation models of clinics that let us test different scheduling strategies. We came up with a scheduling template that decreased patient wait time by about 15%, at a time when our clinic volume was actually increasing. How it works is that we schedule the patients who are going to take a short amount of time early in the day, and the patients who take longer later. We didn't put the long patients at the very end, because we didn't want our clinic to run over, but even just putting the short patients at the very start really decreased patient wait time. The idea is that if you start with the longer patients — which a lot of clinics were doing because they didn't want the clinic to run late — you can wind up creating delays that you just can't recover from.
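The delay-propagation effect behind that scheduling result can be shown with a toy queue model. The visit lengths, slot size, and the model itself are invented for illustration — this is not the simulation used in the study.

```python
def waits(durations, slot=20):
    """Patients arrive on fixed slots; return each patient's wait (minutes)."""
    free_at, result = 0, []
    for i, dur in enumerate(durations):
        arrival = i * slot          # scheduled arrival time
        start = max(arrival, free_at)
        result.append(start - arrival)
        free_at = start + dur       # provider busy until visit ends
    return result

# Same four visits, two orderings: long-first vs short-first.
long_first = waits([40, 40, 10, 10])
short_first = waits([10, 10, 40, 40])
sum(long_first), sum(short_first)  # → (90, 20) total minutes of waiting
```

Starting with the long visits builds a backlog that the short visits never clear, while short-first keeps the queue empty until late in the day.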
And I am part of a national group of researchers, I believe over 100 of us, who use audit log data, and about three years ago we came up with some measures that we thought EHRs should be reporting from this audit log data so that researchers can do better studies of EHR use. You can see here what they are: total EHR time, work outside of work, time on notes, note composition, time on prescriptions, time on the inbox, how much teamwork was associated with an order, and the sort of aspirational metric of undivided attention. The reason I mention this is that, over time, vendors have adopted these measures, and they have also released these really large data sets, some of it audit log data, some of it actual EHR data, that allow us to study this at scale across multiple sites: how physicians use EHRs, how they compose notes, comparisons of work patterns. And the point is that while this audit log data isn't perfect, it doesn't suffer from some of the data quality issues I described, and when we actually made the effort to say, hey, these are the metrics you should be providing, it enabled these large multi-site data sets. So I think it's a good model for us moving forward with other EHR data. We really want large, diverse data sets in order to do research that represents all the diversity of patients, practice patterns, and data, and that allows us to do studies of rare diseases and outcomes. And there are a couple of different approaches to this: we can pool the data into registries, or we can study it in a distributed manner. Pooling the data would be the ophthalmic data registries such as IRIS and SOURCE.
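To make one of those measures concrete, "work outside of work" is roughly the portion of EHR activity falling outside scheduled clinic hours. Here is a minimal sketch of that computation for same-day sessions; the session records, the 07:00 to 18:00 workday window, and the field layout are all illustrative assumptions, not the standardized vendor definition.

```python
from datetime import datetime, time

# Hypothetical per-session EHR activity records: (start, end) datetimes.
sessions = [
    (datetime(2023, 5, 1, 8, 30), datetime(2023, 5, 1, 8, 50)),   # in hours
    (datetime(2023, 5, 1, 19, 0), datetime(2023, 5, 1, 19, 45)),  # evening
]

WORKDAY_START, WORKDAY_END = time(7, 0), time(18, 0)

def minutes_outside_work(start, end):
    """Minutes of a same-day session falling outside 07:00-18:00."""
    day = start.date()
    ws = datetime.combine(day, WORKDAY_START)
    we = datetime.combine(day, WORKDAY_END)
    # Clip the session to the workday window, then subtract.
    inside = max((min(end, we) - max(start, ws)).total_seconds(), 0)
    total = (end - start).total_seconds()
    return (total - inside) / 60

wow = sum(minutes_outside_work(s, e) for s, e in sessions)
print(wow)
```

The real published measures pin down exactly these boundary choices, which is what makes multi-site comparisons possible.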
There are some limitations with this: you have to contribute to the registry to use the data, and getting those data use agreements in place can sometimes take years. Pooling and harmonizing this data is really resource intensive, and the imaging data isn't included. There are some prospective data sets that have pooled data: the All of Us data that I talked about, the Bridge2AI projects, the UK Biobank. These can really help us, because when they're prospective they can avoid some of the challenges of EHR data, since we can have better data quality. And they do require some standardization, and the standardization they are using is what's called the OMOP common data model. We are looking at using this data model through a group that Brian mentioned, OHDSI, which is Observational Health Data Sciences and Informatics. This is an international collaboration of thousands of researchers who are trying to make this observational health data, particularly data collected in EHRs, useful and networked. One aspect of it is the common data model, which identifies and standardizes the naming of the data and how it's stored, especially over time, and makes sure that everything is normalized, so that when you're referring to IOP at one institution, it's the same data value at another institution. It's already widely used: it's used in All of Us, it's used in N3C, it's used in these Bridge2AI projects. We are trying to get the ophthalmic data in the EHR to be included in that, and that is what we're working on now.
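As an illustration of what that normalization looks like, an IOP reading gets mapped onto a shared record shape so that "IOP" means the same thing everywhere. This is only a sketch of the idea using an OMOP-style MEASUREMENT record; the concept IDs below are placeholders, not real OMOP vocabulary IDs, and the real model has many more fields.

```python
from dataclasses import dataclass, asdict
from datetime import date

# Minimal sketch of an OMOP MEASUREMENT-style record.
@dataclass
class Measurement:
    person_id: int
    measurement_concept_id: int   # standard concept for IOP (placeholder)
    measurement_date: date
    value_as_number: float
    unit_concept_id: int          # standard concept for mmHg (placeholder)

def harmonize_iop(person_id, raw_value_mmhg, measured_on):
    """Map a site-local IOP reading onto the shared model, so the same
    query runs unchanged against any institution in the network."""
    return Measurement(
        person_id=person_id,
        measurement_concept_id=999001,  # placeholder concept ID
        measurement_date=measured_on,
        value_as_number=float(raw_value_mmhg),
        unit_concept_id=999002,         # placeholder unit concept ID
    )

row = harmonize_iop(42, 17, date(2023, 5, 1))
print(asdict(row))
```

The payoff is that analysis code written against these standard concepts runs identically at every site, which is what enables the networked studies described next.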
And I think due to time I'll skip this, but the neat thing about this data is that it has really changed practice. During the pandemic, when there was a concern about the AstraZeneca vaccine causing blood clots and it was initially pulled from the market, OHDSI studies were able to show that it really was safe, and it was returned to market; that was an OHDSI study. And there are some new studies coming out, I don't know if they've been fully released yet, that have shown a reduced risk of long COVID for vaccinated patients versus unvaccinated patients. So what we're trying to do at the National Eye Institute, within this OHDSI work with OMOP, is to create an ophthalmic data network, where each of the sites in the network has data that has been standardized to the OMOP standard. We're also looking at ways to integrate imaging into that standard. Then the sites can come together and do multi-site studies, with a data coordinating center in the middle. But the important thing is that the data will stay at the home institution, which allows for somewhat easier data use agreements. Hopefully we can integrate imaging into that too, because sharing or pooling imaging data is almost impossible, so that's another thing we're hoping for. We'll be sharing code and models between sites, and then the results will be returned. And if this is interesting to you, I encourage you to join us; Brian can give you information about how to join the work group, or you can follow this link here. We meet once a month on our summer schedule, but normally we meet twice a month. And it's a very welcoming community.
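The federated pattern just described, where code travels to the data and only results come back, can be sketched very simply. Everything here is illustrative: the record shape, the IOP threshold, and the two "sites" are made up, and a real network would run this behind each institution's firewall rather than in one process.

```python
# Shared analysis code: runs at each site against local data; only
# aggregate counts ever leave the site.
def local_cohort_summary(local_records, min_iop=21):
    """Runs inside one institution; raw patient records never leave."""
    n = len(local_records)
    elevated = sum(1 for r in local_records if r["iop"] > min_iop)
    return {"n": n, "elevated": elevated}  # aggregates only

# Hypothetical local data at two sites (would live behind each firewall).
site_a = [{"iop": 14}, {"iop": 24}, {"iop": 30}]
site_b = [{"iop": 18}, {"iop": 22}]

# The coordinating center pools the returned aggregates, not the data.
results = [local_cohort_summary(site_a), local_cohort_summary(site_b)]
total_n = sum(r["n"] for r in results)
total_elevated = sum(r["elevated"] for r in results)
print(total_n, total_elevated)
```

Because the query only works if "iop" means the same thing at every site, this design depends directly on the OMOP standardization work described above.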
And if you're looking for an area of research to get involved with, this is great, because you just come in as a subject matter expert in your field, and then we can think about this data and how best to standardize it for use in clinical care and research. So: EHRs are a rich source of longitudinal, real-world observational data, and while ophthalmology is leading the way in AI and machine learning using imaging, EHR data is still an emerging research area, and this is true across medicine. So we're really working on these large, diverse data sets that can include all these different facets of patient data: systemic, ophthalmic, and imaging data. And I think I'll stop here. And I'll stop sharing, or maybe I'll keep sharing and look at you. Great, thank you. This was just a really amazing summary of EHR data in ophthalmology, all the issues with it and also the opportunities, and really some good teaching about EHRs too. I think we'll open it up to questions if people have thoughts. I wanted to lead off with a question, if that's okay. Yeah, your information on the fragmentation was really striking, and I guess it fits with what we'd expect. My question is that in some ways that fragmentation is hard to fix, because so much of it is just built into our healthcare system, and in some ways it puts us at a disadvantage compared to other countries trying to use AI or build algorithms. I wonder if you have any thoughts on that, maybe from a national health system like England or the UK, how that's playing out there, if you've seen differences in what's been published. Yeah, you've hit on one of the things that is really a big sticking point.
It's something I get frustrated with as well, and yes, it is true that for countries that have national healthcare systems, the data collected within that national healthcare system has a single patient identifier. I'm not sure whether that same patient identifier persists across the more private healthcare institutions or not. I do know that in the US there has been, for at least as long as I've been in informatics, a push for a national medical record identifier, but I believe Congress has stated pretty unequivocally that Social Security numbers will never serve as a single identifier, so I don't think we'll ever have that. So, Regent Strieff especially has done a lot of work in trying to match patients. In N3C, the National COVID Cohort Collaborative, they have done some work in trying to link patient data, because with COVID, a lot of times patients who were tested in one place got cared for in a different place, so they did do some linking of that patient data. The IRIS Registry does patient linking as well, for patients who have received care at multiple institutions. I've worked with IRIS data, so I know exactly how awry that can go. It's not perfect, but there are attempts at getting around it, and that is an area I would love to see more work in, especially from the ophthalmic perspective, because if we are going to be doing things like using ophthalmic imaging as a way to either detect or monitor systemic conditions, I think we really do need a focus on improving that fragmented data if we're really going to be using observational health data. Otherwise we're just going to be stuck using prospective data sets, which aren't bad, but we'd love to take advantage of those decades of observational health data. Can you hear me? Yeah. Okay, we've got another question here. Oh, she's got her own mic.
Can you hear me, Michelle? Yes, I can. I Lean Long, I'm a retina researcher here. Thanks to Brian Stagg for doing years of work through the bureaucracy to get us onto the SOURCE database. Many of us are just starting to do this kind of research, so this was a really great and helpful talk for us. I'm just wondering: you mentioned some examples of the limitations of the EHR data, specifically the systemic data regarding hypertension, and the notes versus the billing diagnoses regarding amblyopia. At what point is a problem so big that we just shouldn't try to ask that question or use that data? And how did you handle those two specific examples? Are there ways to say, okay, this is a limitation, but it's not affecting my ability to answer this research question with a reasonably low level of bias? Yeah, that's a really great question, and luckily we have highly trained statisticians and epidemiologists who can help us out with this. This is not my area of expertise, but I know enough to know that I need to include those people. One of the things that I didn't touch on in this talk, and I'll talk a little more about it at my talk at noon, is that the OHDSI community has come up with these wonderful tools, developed by epidemiologists, that really help you figure out whether you have high enough quality data to do a particular type of study using a particular data site. And we also do things like cohort matching, so that you are truly comparing patients that have the same characteristics.
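To make the cohort matching idea concrete, here is a minimal 1:1 matching sketch on age and sex. The records, fields, and two-year age tolerance are purely illustrative; real studies typically match on many more covariates, often via propensity scores.

```python
# Hypothetical exposed patients and comparison candidates.
exposed = [{"id": 1, "age": 64, "sex": "F"}, {"id": 2, "age": 70, "sex": "M"}]
candidates = [
    {"id": 10, "age": 63, "sex": "F"},
    {"id": 11, "age": 71, "sex": "M"},
    {"id": 12, "age": 40, "sex": "M"},
]

def match(exposed, candidates, max_age_gap=2):
    """Greedy 1:1 match: same sex, closest age within max_age_gap."""
    used, pairs = set(), []
    for e in exposed:
        best = None
        for c in candidates:
            if c["id"] in used or c["sex"] != e["sex"]:
                continue
            gap = abs(c["age"] - e["age"])
            if gap <= max_age_gap and (
                best is None or gap < abs(best["age"] - e["age"])
            ):
                best = c
        if best:
            used.add(best["id"])
            pairs.append((e["id"], best["id"]))
    return pairs

print(match(exposed, candidates))
```

Unmatched exposed patients are simply dropped in this sketch, which is itself a design choice with bias implications, another reason to involve the epidemiologists.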
There's also a lot of work that's gone into developing phenotypes and testing them. We can't necessarily expect to identify all of our amblyopia patients with diagnosis codes, for example, so what else could go into that phenotype that would allow us to identify them? Can we find other evidence in the chart that would lead us to think that a patient has amblyopia, and what's the reliability of that evidence? That's something I'm working on right now; I don't have the answers yet. But it really is trying to figure out, like you said, which questions and which studies we really can do, what data is good enough, and when we just say, okay, this isn't good enough, let's not try, because we wouldn't think the evidence that came out of the study was very reliable. So I think that's the appropriate way of approaching it, but yes, it is a challenge. In the time we have left, does anyone else have a question or anything they'd like to talk about? Well, great. We're looking forward to hearing more from you, Michelle, and we really appreciate your presentation. A lot of great teaching points there, so thank you. Yeah, I wish I could see all of you, but it was nice to be able to speak with you today.