I'm Alex Gonzalez. Originally this presentation was going to be given by our senior developer, Kevin Murphy, but he had a concussion about a week and a half ago, so I'm gracefully stepping in. If you have any questions at the end, I'll do my best to answer them.

Let me give you a little background on Kevin Murphy, because he is the man behind this presentation; he wrote the abstract. He's our senior software developer. He has his BS in mathematics from Swarthmore College. He does a lot of our genomics programming, and he's just finished working with our team to develop an iPad app which allows researchers to browse genomic data on their iPads, which is pretty cool. He's been with the program for about 15 years. I'll explain a little later what our department is; we're the Department of Biomedical and Health Informatics.

I am Alex Gonzalez. I'm obviously a lot younger than Kevin Murphy. Everyone jokes around because I'm always shadowing him like I'm his kid. I'm a graduate of Temple University in mathematics. I work at the Children's Hospital of Philadelphia in a relatively new role called data integration analyst. What a data integration analyst does in our department is work front-end with researchers to combine genomic, clinical, biomedical, and laboratory data into our homegrown applications, which we then present to researchers so they can query it. I'll get into that a little later.

Who is CHOP? The Children's Hospital of Philadelphia was the first pediatric hospital in the United States. We generally rank between one and two every year in U.S. News & World Report. We kind of have this thing going on with Boston Children's every year. 
They're one, we're two; we're one, they're two. So it's a little bit like the Red Sox and the Yankees. We're a top pediatric and research institution, a 535-bed hospital with 55-plus ancillary care facilities, so we really do provide pediatric care to children in the region.

I guess the interesting thing about Children's Hospital is that we really are patient first. A lot of hospitals in our region and throughout the country are concerned, at the end of the day, with treating the patient, obviously, but also with making a little profit margin. The majority of our funding comes from research grants and grants from people in the area, so we really are patient focused, and the majority of our work is truly trying to help the children who come in. We have this pretty cool program where, if a child's family comes in for care and they can't pay for it, CHOP foots the bill at the end of the day. So we're really proud of our organization.

So who are we? We are the Department of Biomedical and Health Informatics, a relatively new department within Children's Hospital. We're kind of a four-pronged institution. We have an application development team, and as I said, they develop the homegrown applications I mentioned earlier, which range from genomic reporting to clinical reporting. We also have a data reporting and management group, where a lot of our researchers get research data. We have a genomic analysis group, a team of biomedical researchers and bioinformaticists who develop better ways to sequence DNA, sequence genomes, and query that information for researchers. And we have the fourth prong, which is the one I sit on: the education wing. We do a lot of outreach, obviously at conferences talking about what we're doing, but internally as well. We're really trying to push health and bioinformatics, which is a relatively 
new field within healthcare as a whole.

So I'm going to define some buzzwords that I'll use down the road; these are big buzzwords in the industry right now.

Informatics has a very simple definition, but it encompasses a whole range of different applications, techniques, and procedures. It's using health information technologies to improve patient care. That ranges from using social media to reach out to patients to using electronic health records to try to improve patient care. You may not know, but the federal government is doing a huge push now with Meaningful Use to get healthcare institutions across the country to adopt electronic health records, and is actually providing lots of money to incentivize it.

So what is bioinformatics? Bioinformatics is a little more of a deep dive; it's very specific to genomic data. Essentially, what researchers do in our institution and around the country is sequence DNA, or different parts of it, a genome or an exome, and determine what subset of that DNA causes a specific illness or a specific phenotype. The example I always give is that I was in a genomic study a few years ago. They took my DNA and looked at it. I have Tourette's, and they were looking for the Tourette's gene. So I was in a study with 300 different people where they sequenced our DNA and said, okay, this subset of the sequence, A, T, G, whatever, is what we think causes this illness. And we're doing that with congenital heart disease. We're doing that with psychiatric diseases. 
We're doing that with cancer. One of our big products, which I'll go into later and which actually uses a Postgres foundation, is our Children's Brain Tumor Consortium project. It has thousands of samples of children's brain tumors that have been sequenced, with clinical data surrounding them. What researchers do is go into our application, and I'll show you this a little later, and say: I'm looking for children with this specific DNA subset, this specific sequence subset, who have these specific characteristics; let's run a study on them and see why they develop this illness and what we can do to treat it better.

So let's get into the reason I'm here: why we use Postgres. We're not DBAs. We're database people, but not database people in the sense of database administrators. Why we really like Postgres is that it's extremely scalable and has a robust user community. I think the biggest thing about Postgres that we like as an institution is that it easily has the best user documentation of any open source program, period. Through our Postgres implementation we use the Postgres docs to implement our software, and not only the Postgres documentation but the documentation for the drivers surrounding it. We use a lot of Python, so a lot of the Python drivers and things of that nature that communicate with Postgres were extremely well documented. I'll get to that again a little bit later.

Open source is kind of a big deal for us. Every dollar we don't spend on an Oracle license, and every dollar we don't spend on a Microsoft SQL Server license, is one dollar that we can spend toward bettering children's care. That's big for us. The hundreds of thousands of dollars we save by not buying these licenses and the associated support, we directly funnel into patient care. We funnel into researchers. 
We funnel into grants, which researchers then use to treat patients. That's our main mission, and we find it exceptionally important.

And this goes without saying, I'm preaching to the choir at this point, but the usability of Postgres is amazing. We've had people come in, new analysts who have used other database systems but have never used Postgres, and within a week and a half or two weeks they're using pgAdmin, they're using Navicat, and they're up and running. It's extremely usable from an end-user point of view.

So I'm going to start talking about the applications that Postgres powers, and this is our big one. This is our flagship application, which we've developed over the last five or six years. Two of our main application developers built it; I'll go into who they are later. Byron Ruth is the main one. Essentially this is the Django-powered web app I talked about earlier, where we put clinical data and genomic data into a Django web app with a Postgres back end, and we have researchers query their own data in a de-identified way.

So what does that mean? HIPAA is essentially an act of the federal government which disallows people from using identified data in certain ways. So we can't say, you know, Alex Gonzalez has this set of stuff. We'd have to write an IRB protocol and go through a bunch of research steps. But if we de-identify all the data and say, I don't know who Alex Gonzalez is, I just know he has this stuff, we can stick it in a reporting tool like this and actually design research studies around these query tools.

We actually have a pretty cool demo set up. This is available... uh-oh. Stuff like this always happens, right? There are screenshots later; that's good enough. I was really hoping to show this, but it's not opening up. 
I have some screenshots later. It's a really cool research tool. But I'll just talk about some of the goals of Harvest, and hopefully we have some screenshots down the line.

Harvest is a user interface which promotes discovery of the underlying data: you query without being forced into thinking in terms of a relational database. And this is a big thing, right? If you're a clinician, you don't really care how data is joined in the back end. You don't care how data is structured. You don't care how data persists. You just want your data. So Harvest is very user friendly: you go into Harvest and you have a series of filters, gender, race, disease, location, et cetera, and all researchers do is click and drag, and that forms their cohort. So instead of asking our reporting team to write a query, "give me all the male patients who have a specific heart defect at the health system, and give me that now," they can go into Harvest and produce that cohort rather easily.

Another major thing is delivering complex data in a familiar format. Our Harvest tool exports data into Excel, SAS, R, and CSV formats. And it's not only small sets of data; there are large sets of data that we use, which I'll explain later. We're actually in a consortium with 10 children's hospitals, which I'll get into a little later, where we use this Harvest query tool, and it's really cool.

So where does Harvest fall? Everyone talks about these BI tools and all these intelligence tools that exist. Between a flexible data model and a fixed data model you have your basic software, Crystal Reports, these things up there, and i2b2, which are extremely hard to use for a clinician, extremely hard to use for an end user. We made Harvest extremely user friendly, where even you guys, anyone, can use it. I'll send the link out later because I really want you to take a look; it's open source. 
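The click-to-filter cohort idea described above can be sketched as a small query builder. This is a minimal illustration only, not Harvest's actual implementation: the `patient` table, its columns, the `ALLOWED_FILTERS` mapping, and the helper names are all hypothetical, and SQLite's in-memory database stands in for Postgres so the sketch runs anywhere.

```python
import sqlite3

# Hypothetical mapping from UI filter names to column names.
# Harvest's real data model is not shown in the talk.
ALLOWED_FILTERS = {"gender": "gender", "race": "race", "diagnosis": "diagnosis"}


def build_cohort_query(filters):
    """Turn {filter name: [accepted values]} into a parameterized COUNT query."""
    clauses, params = [], []
    for name, values in sorted(filters.items()):
        column = ALLOWED_FILTERS[name]  # raises KeyError for unknown filters
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT COUNT(*) FROM patient WHERE {where}", params


def demo_connection():
    """Build a tiny in-memory table of made-up, de-identified rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient (id INTEGER PRIMARY KEY, gender TEXT, race TEXT, diagnosis TEXT)"
    )
    conn.executemany(
        "INSERT INTO patient (gender, race, diagnosis) VALUES (?, ?, ?)",
        [
            ("male", "white", "heart defect"),
            ("male", "black", "heart defect"),
            ("female", "white", "heart defect"),
            ("male", "white", "asthma"),
        ],
    )
    return conn


def cohort_size(conn, filters):
    """Return how many rows match every selected filter."""
    sql, params = build_cohort_query(filters)
    return conn.execute(sql, params).fetchone()[0]
```

Each filter a user adds becomes one more `AND` clause, and the values travel as bound parameters rather than being pasted into the SQL text, which is the usual way to keep a user-driven query tool safe.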
It's a wonderful program, and I really want you to take a look. Anyone can go into their clinical data and query it in a very easy way. That was the main thing we wanted to produce: an easy way to look at your data from a 30,000-foot view and produce a cohort which you can do a research study on. That's the most important thing.

So I'm going to talk about a couple of projects which we use Harvest on and which are also powered by Postgres. We are the main data aggregator, as they like to call us, of a consortium of 11 hospitals. We essentially house pediatric data on 1.4 million unique patients and 120 million unique observations. What I mean by an observation: a patient comes in for a visit, they have a specific diagnosis, and they have a specific diagnosis note that goes along with that. That's 40 million visits and 10 million-plus diagnoses.

What's the overall goal of PEDSnet? Children's medicine is something where the federal government makes it even harder, obviously, to select specific cohorts, because they're children. If I ask a child to be consented on a research study, we have to go through a few more hoops to do that. So what does this do? It creates a huge de-identified data set which this 11-hospital consortium can query. We have millions and millions of records, and if someone sitting at Boston Children's wants to look at all the patients we've put in there from all 11 hospitals who have a specific congenital heart defect, are between the ages of 10 and 12, and are Hispanic or African American, they can do that. They can get that de-identified data set and run prospective studies and prospective analyses on those data sets.

So why Postgres for PEDSnet? I don't know how many of you guys are familiar with Docker, with Dockerization. Yeah? Yeah. So this is really awesome. 
When we started using Docker a couple of years ago, it changed our world a little bit. The official Postgres Docker image is amazing and extremely functional. We do a lot of spin-ups for testing and a lot of spin-ups for development. We live in a world where we have a very short development cycle for any database schema changes. We're working with different electronic health record systems and different vendors that have different database schemas, so what we have to do at the end of the day is bring them into a unified schema in our data warehouse. By using Docker, where we can bring Postgres instances up and down rather easily, we have shortened our development cycle significantly.

Then there's our robust use of PL/pgSQL for dynamic loading. This is really cool. Kevin Murphy, who again was supposed to give this presentation, has written some pretty cool stuff here. There's a lot of bulk loading into our PEDSnet system. We've been entrusted with the role of doing this for all 11 hospitals. They essentially send us their data as CSV or flat files, and we just pipe it into this data warehouse.

Another reason we really like Postgres is the easy authentication schemes. We attempted a while ago to use MySQL and put an LDAP, Active Directory-type authentication around it, and it was just buggy as all hell. We tried our best to get it implemented, and then we just said, screw it, we'll try it in Postgres. We stood up our Postgres instance behind PEDSnet, internally within CHOP, and implemented LDAP, and within a week we had Active Directory logon for the back-end schema. It was amazing. 
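The CSV bulk loading mentioned above could look roughly like the following. This is a client-side Python sketch rather than the PL/pgSQL the team actually wrote, which the talk does not show; the function names and the `visit` table are hypothetical. It only builds the SQL text for a Postgres `COPY ... FROM STDIN` command, taking the column list from the CSV header so that each hospital's file drives its own load.

```python
import csv
import io


def quote_ident(name):
    """Double-quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_statement_for(table, csv_text):
    """Build a COPY statement whose column list comes from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = ", ".join(quote_ident(col.strip()) for col in header)
    return (
        f"COPY {quote_ident(table)} ({columns}) "
        "FROM STDIN WITH (FORMAT csv, HEADER true)"
    )
```

With a driver such as psycopg2, a statement like this can be handed to `cursor.copy_expert(sql, file_object)` to stream the file into the table; that step is left out here so the sketch runs without a database.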
It was absolutely amazing, and that story is told around the department as one of the major reasons that we are going to continue using this open source product.

And again, the user base. The Postgres wiki is actually really great. This audit trigger 91, specifically, is an audit function we use rather robustly to audit all our data. And I just want to say again, we're not database administrators. We're not really database people. Most of the people in our department come from either the Python world or a C-type language world. We're less concerned about database things and more concerned about end-user things, so having such a great user base, where we can say, okay, I have no idea how to write this in PL/pgSQL, I need to go find it, and you can go find it, is pretty great in that respect.

The next thing that we use Postgres to back is a query tool we developed called Varify. What Varify does is look at variant genomic data. So what is variant genomic data? 
99 percent, or 95 percent, of the human genome is pretty standard. It's genomic data that we don't really concern ourselves with right now. There's this 5 percent of data that varies, and this 5 percent is the stuff we're looking at. So, for example, someone with Tourette's, in my example, has a very specific subset of DNA which varies from that specific subset of your DNA. Well, we want to concern ourselves with that. We want to say what specific variant in that person's DNA causes a child of four to have a congenital heart defect, or what specific variant of that child's DNA causes them to be more susceptible to a brain tumor at the age of five or six. We want to concern ourselves with why that's happening.

So we originally thought, okay, let's just put this in a relational database and it'll be fine. And we know that's a terrible idea, because genomic data is huge. To give you an idea of how big Varify is: we have 1,647 data samples, but within that we have 300 million variant calls, and these are the subsets of DNA fragments that vary. The problem with this is that our database is huge. To give you an idea, all the data that's in the Library of Congress right now can fit onto essentially a 15-terabyte drive. We have 47 terabytes of genomic data. That's a lot of data.

So we were thinking, well, this is probably not a good idea to put in a relational database, but let's try to do it anyway. This was four or five years ago, before NoSQL became a big thing, which is where we're moving in the future with genomic data. But okay, how can we query this data? 
So we put what I talked about earlier, our Harvest instance, our query tool, around this genomic data. It's very slow, but it works. A researcher can go into our Varify model and say, okay, I'm looking for, again, this subset of patients with this subset of variants that causes congenital heart defects; go find me those patients. And you can go and find those patients, and it's rather powerful.

So why PostgreSQL, why Postgres? This is actually really interesting. We thought about a MySQL-type thing, but we really like materialized views: when we create our views, that data persists. That's extremely important when you query the data, because the query isn't going and looking in those different tables, tables that are hundreds of millions of records long, for a specific genetic variant. We can pre-populate those views with the specific queries that we know researchers are going to look for, instead of having those researchers hit "go" and wait hours and hours to query a very specific, small subset of variant data.

I want to show you this. It looks relatively simple. 
It's not that big of a database in terms of schema. What we really want to focus on is that it's huge, and there's no way to store genomic data in a small way. The human genome is crazy. One human genome, once it's sequenced, can be hundreds of gigs, just one human genome, and that's rather impressive.

So I'm going to talk about a case study where we incorporate Varify and the Harvest tool that we talked about earlier, and what this does is combine clinical data with genomic data. Varify is strictly genomic, Harvest is strictly clinical, and we did our best to marry the two.

So what is PCGC? It's the Pediatric Cardiac Genomics Consortium. Essentially, it's a consortium of hospitals that are looking at congenital heart defects in children. They send us exome sequences and genome sequences of their children, and we house them in our database. I mentioned earlier what variants are and what we do with this query tool. The really cool part is that this is the first time we did this on a multi-institutional level for children. We have never had a database or a consortium at the pediatric level, internally, which houses this much data and where we had the ability to actually marry clinical data and exomic data.

And this is what I wish had worked earlier, because it looks really cool. This is what PCGC is, and this is what marries it with that Harvest tool I talked about earlier. So you are a researcher, and you can imagine logging into your research portal. This could be anything: PCGC, or, like I said earlier, a Tourette's study, or congenital heart defects. And you go, okay. 
Well, I'm looking for male patients. So the researcher clicks on "male" and hits "add the filter," and it tells me how many patients are there. Well, I really want patients that have a tissue location of the brain, with the tissue stored at negative 80 degrees centigrade. That's extremely important, because it matters what temperature you house tissues at: if you house it at negative 80, it maintains RNA stability; if you house it in liquid nitrogen, it maintains DNA stability. And then the variants. If you click this "pass" button, a subset of variants is going to pop up: the specific variant associated with disease X, the specific variant associated with disease Y. Click, click, and it auto-populates here and then filters down.

And the cool thing, what really makes everything great, is that you hit "view results" at the end and you get a subset of de-identified children. You get 100 or 200 or 300 people, and now you've formed your cohort. Not only can you do studies and statistical analysis on that cohort, you have the ability to go back to the IRB, which is the institutional review board. They are the bridge between researchers and research. If you want to run a research study on children, or even adults, you have to go through a very strict process to get approval, to make sure it doesn't harm the patient, et cetera. So you go back to the IRB and say, well, I found these 300 patients, and they're de-identified; I'm going to write a request, and I want you to give me all the clinical data. Otherwise you would never be able to know how many people you could have in a cohort. You don't know if that cohort even exists until you have a tool like this, where you can easily go and query your data in a de-identified way.

So why Postgres? I think I went into this earlier. 
We're huge fans of the audit trigger function. It's also important that we maintain MD5 checking. This is extremely important when you want to ensure that the data in your database, the data that we're getting sent, hasn't been corrupted in any way. And it's really interesting, because we had a problem a few years back. We were getting sent data from one consortium member, and their data was actually getting corrupted on the way, before we instituted MD5 checking. Once we started MD5-ing the data and the IDs associated with it, we were able to say, okay, give me the manifest of everything you sent to us, and we'll check it against our internal manifest to make sure what you sent us isn't corrupted. This is something really cool, because Postgres has that md5 function natively, so we could easily MD5 our stuff.

We also use MD5 for de-identification. If I have an MRN, a medical record number, I can't actually present that to the researcher. We MD5 some subset of the medical record number plus some other number, plus some other value, so the researcher never knows who that patient is at the end of the day.

So, future state: what's coming and what we're really excited about. I said this before, and I want to start with number two, which is foreign data wrappers, because genomic data sits better and is queried better in a non-relational way. 
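Both MD5 uses described above, the manifest check and the hashed identifier, can be sketched in a few lines. This is an illustrative Python version, assuming a manifest is simply a mapping from file name to hex digest; the talk does not give the real manifest format or the exact values mixed into the hash, so `find_corrupted`, `pseudonym`, and the `secret` parameter are hypothetical. The hex digest produced here has the same form as the 32-character string that Postgres's built-in `md5()` returns.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


def find_corrupted(sender_manifest, received_files):
    """List files that are missing or whose bytes do not match the sender's digest."""
    bad = []
    for name, expected in sorted(sender_manifest.items()):
        payload = received_files.get(name)
        if payload is None or md5_hex(payload) != expected:
            bad.append(name)
    return bad


def pseudonym(mrn: str, secret: str) -> str:
    """Hash a medical record number together with a site-held secret value."""
    return md5_hex((mrn + secret).encode("utf-8"))
```

The same MRN and secret always give the same pseudonym, so records for one patient still link together, while a researcher who sees only the hash has no way to read the MRN off it. MD5 is used here only because it is what the talk names.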
There are actually companies out there, Curoverse is one of them, working on distributed platforms that query genomic data in a very fast and robust way. Google Genomics is doing a similar thing. We're envisioning running foreign data wrappers to talk to those distributed systems and those APIs, to query genomic data in a very fast way and integrate it better than we do now, where the genomic data sits in a relational database. The guy who developed Varify, his name is Matt, did a really good job of putting these variant calls in the database in such a way that you can query them in a relatively quick amount of time. But it's still way too slow. If I'm a researcher and I click there and have to wait 10 minutes to query 100 million things, I'd rather have it instantly. So we're really excited about that.

And then the other stuff everyone else is excited for: better query planning and the JSONB data type. A lot of our honest brokering, and if you don't know what honest brokering is, it's that de-identification part I talked about, is electronic; our electronic honest broker is a REST API. So a lot of what we have is stored in a JSON form, and the ability to store that in our database and query it in a better way is really exciting for us. And percentile calculations, stuff like that, and aggregate filters. Everyone else is looking at these like, oh wow, this is great, and they're very native needs for data analytics.

And last slide: this is our team. It's a rather big team, like I talked about with our three prongs. I was looking for Kevin Murphy's picture. There he is. Yeah, this cool guy. So if you have any questions or problems with the presentation, call him. And please follow us on... What was that? 
Yeah. I feel really bad, because he called me up 10 days ago and said, hey, I have this presentation, and I don't really have many notes for it, but I really want to give it. And I said, okay, well, how are you feeling? He said, I'm not really feeling well; can you do it for me? I was like, okay, I'll go do it. So 10 days ago I had no idea I was coming to this conference and giving this talk. But our director really wanted us to at least talk about something, so I slapped his notes together.

I really wanted to get the Harvest link to work. I'm going to tweet it out and put it on the wiki, and it'd be really cool if you could go see it in action, because it really is an amazing tool, and it can be used not only for genomic data but for other types of data sets as well. We have this dream where we have a thousand Harvests, a ton of de-identified data, and a ton of researchers out there just looking through the data in a very intuitive way. Because, like I said before, a clinician doesn't care that it takes me three hours to write the 15,000-line SQL query to get their data. They just care that they get their data. The fact that someone who has never touched SQL or a relational database in their life can go into a tool like this and easily filter down their data in a very intuitive way is pretty cool. It's something we hope other hospitals start using. Again, it's open source; it's on GitHub.

[Audience question about PEDSnet, partly inaudible.]

So we're kind of the brokers in everything. We house everything. 
So it's our Harvest instance, and it's actually a really intuitive process. You go and sign up with our consortium. You sign up, we know who you are, and we say, okay, you're fine. So now you have a username and password, and you log on to our instance that way. But internally, like I said before, it's the LDAP credentials that we use, and we really love the usability of that. As I said, we were working with MySQL in the past, and the LDAP support seemed to be really buggy. When we started using the Postgres LDAP feature for all that authentication, it eased a lot of our problems, and we were able to go from there.

So I really want this to work. Do I have a minute? Did I go a lot faster than I was supposed to? Yeah, I think I did.

So let's go to our main website. I thought I had my Wi-Fi working. What you can do is go to our main website, and I'll tweet this out again. There's a Harvest button; you click it, and it brings you to the Harvest stack and an explanation of what Harvest is. And then what we did was put up a public Harvest instance around open, public data, and you can explore it that way. Things are really working out for me today, huh? 
Oh, there you go. So I can take any other questions as I'm going through this. Yeah?

[Audience question, partly inaudible:] How stable is that model outside of, say, the research field? If you took this out to something like a social service, collecting data and presenting it to researchers or program staff in a digestible way, do you feel like what you're doing is really only unique to the research area, or could it be expanded out to others, getting such services together, collecting their needs, and doing something open source like this?

So that's a really interesting question. We've been really lucky as an institution in that we've had a lot of support from our administration, like you said before, the suits and ties, who really put together a crackerjack team that's not only very technical but very forward focused. I think that type of data integration, that type of showing data to people who don't really understand data, is a big problem everywhere. The thing is, we have this nice layer, which is us, which separates the hardcore data people from the hardcore clinicians. We serve as that layer, and we serve as the forward-facing end to the clients; we call them clients, the customers of our programs. And it's really funny, because I used to work at the University of Pennsylvania Health System, and that's something you see everywhere. There's no easy solution for how to talk to clinicians in a way they understand, and for bringing data together as a whole and presenting it. But we're hoping that these stacks and these solutions we 
make intuitive enough that someone with a limited amount of programming experience, like limited Python experience, can up one of these Harvest instances and put their data around it. The problem is we're not quite there yet. We're not at the point where you can just take our Harvest instance, with no knowledge of the Django web framework or no knowledge of Python, and up an instance around your data. We're working towards that, and that's something we're hoping to have eventually. But like you said, everyone kind of gets sucked into this whole, you know, we have to be a .NET shop, or we have to use Oracle, because we pay for them, so they have to be better, they obviously have to be a lot more robust and have a lot better support.

Yeah. The thing about Harvest is, Harvest is, I'd say, about 85 percent of the way there. There's actually a Harvest instance at the University of Pennsylvania Health System, which is the University of Pennsylvania in Philadelphia.
What we did is we gave them the Harvest stack and said, here's the Harvest stack, you guys have some knowledge of programming, go. And they did. They upped it, we helped them a little bit with upping it, they got their data around it, and it looks really good. The problem is that sometimes, and you can correct me, but from my experience at least, a lot of researchers want to funnel their grant money in a very specific way, toward things that have nothing to do with data. In their minds, data costs nothing, and data cleaning and data grabbing should cost very little: I enter the data, so grabbing it should be a free and easy process. And it's not. I've had researchers in the past go, well, I don't know how to get the data, but I don't want to pay for getting the data that I created. But you eventually have to pay someone to pull it if you don't know how to do it. So I feel like there's that intersection between the research world and the IS world, and that overlap is where we sit at CHOP. We're doing our best to not charge researchers at this moment, but we are becoming more strapped for cash. Like I said before, every dollar that gets put into the Department of Biomedical and Health Informatics is one dollar that doesn't get put into a vaccine for a child. So we do our best to cut costs, and not only that, but maintain a very competitive organization. But that's the question that we're all trying to solve, and it's so hard. I could be blabbering, but I've met with researchers who go to, like, an electronic health record reporting team and say, well, give me this report, I want to see all this, and the reporting team has no idea what the hell they're talking about. You could say,
you could say, I want this ICD-9 code and this specific thing, and they don't speak that language. So I feel like researchers need to see that you either need to find someone who understands that language, or you need to start understanding their language to get your data back out. But this is Harvest.

So in this case, this is everything that's in your database. I'm sorry, it's kind of wonky. In this current database we have 3,720 patients; for 1,500 of them the birth date is estimated, and for the rest it's not. So I can go, okay, well, I want everything that isn't, and hit apply filter. And I have all these: patient gender is male, and diagnosis name is whatever, and then I go view results. What this does is go back and query the data and go, okay, give me that entire data set. If I weren't on such a small screen it would work. It's because I'm getting zero patients, that's why. Okay, two patients. So you go view results and it spits out, you know, this is de-identified, but this is open data for anybody who wants to go see it. This is their age, this is their gender, and you can go and download that information in any format you want, go back to the query, all that other stuff. It's pretty neat. Like I said, this is very simple to use if you're a clinician. So I go, oh, well, I'm a clinician.
I want this. Oh, well, clearly I want this, you know what I mean? It makes a lot of sense to look at your data from a 30,000-foot view. Instead of saying, well, left outer join to the diagnosis table on blah where blah equals blah, a clinician can just go, oh, I want to look at all the white blood cell counts and look at the distribution of my cohort, and I want to look at all the patients that are male, all that other stuff. So we're really hoping that this takes off. But like you said, and you made a really good point, we're about 85 percent of the way there to where we make it extremely automated. We haven't automated it to the point where you can just say, give me a CSV file, and it'll port, because we still have relational models. You have to have some understanding of how a relational model works. So, any other questions?

Well, that's interesting. Okay, it's very slow because of the nature of genomic data, the complexity of genomic data.
It's essentially one field with hundreds of thousands of characters, and that's just something that doesn't work well in a relational database, regardless of what relational database you're in, because of the vast complexity of how genomic data is structured. So what we really like about Postgres, and the thing we're really interested in, is this: we really like our Harvest stack, and we want to keep it on top of a Postgres model, on top of the Django framework. What we're really looking at now is foreign data wrappers, and how we can access non-relational databases and query them in a SQL-like way that the Django framework can recognize. Then we can get really fast queries of genomic data in a robust and agile way. I think that's just the nature of genomic data. And that's why, like you were saying with Hadoop and Google Genomics, everything seems to be moving towards this non-relational structure, which works really well for how genomic data is structured.
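
As an editorial illustration of the foreign data wrapper idea described above, here is a minimal pure-Python sketch. It is not the Harvest code and does not use the PostgreSQL or Multicorn APIs; the `Qual` class, the `DocumentStoreWrapper` class, and the sample documents are all hypothetical names and made-up values. The idea it shows is only this: the relational side hands the wrapper a list of filter conditions and the columns it wants, and the wrapper answers with flat rows pulled from a non-relational document store.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List


@dataclass(frozen=True)
class Qual:
    """One filter condition handed down by the SQL side (hypothetical shape)."""
    field: str
    op: str
    value: Any


# Comparison operators this sketch understands.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": lambda a, b: a == b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
}

# Made-up stand-in for a non-relational variant store: nested documents.
VARIANT_DOCS: List[Dict[str, Any]] = [
    {"sample": "S1", "chrom": "7", "pos": 1000, "annotations": {"gene": "GENE_A"}},
    {"sample": "S1", "chrom": "17", "pos": 2000, "annotations": {"gene": "GENE_B"}},
    {"sample": "S2", "chrom": "7", "pos": 1000, "annotations": {"gene": "GENE_A"}},
]


class DocumentStoreWrapper:
    """Presents nested documents as flat, table-like rows (illustrative only)."""

    def __init__(self, docs: List[Dict[str, Any]]) -> None:
        self.docs = docs

    def execute(self, quals: List[Qual], columns: List[str]) -> Iterator[Dict[str, Any]]:
        for doc in self.docs:
            # Flatten the nested annotations so they look like ordinary columns.
            flat = {**doc, **doc.get("annotations", {})}
            # Keep the document only if every condition holds.
            if all(q.field in flat and OPS[q.op](flat[q.field], q.value) for q in quals):
                yield {col: flat.get(col) for col in columns}


if __name__ == "__main__":
    wrapper = DocumentStoreWrapper(VARIANT_DOCS)
    # Roughly: SELECT sample, gene FROM variants WHERE chrom = '7'
    for row in wrapper.execute([Qual("chrom", "=", "7")], ["sample", "gene"]):
        print(row)
```

In a real deployment the wrapper would sit inside Postgres as a foreign table, so Django would see an ordinary table while the data stays in the external store; the sketch only mimics that division of labor in memory.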
So, I'm sorry? Yeah. And we do. I think the big thing is, I want to circle this back to how genomic data is structured. Because not only do you have genomic data, so you get your exome, which is a very small subset of your genome, the small part of your genome that actually encodes proteins, and you get that sequence. But then what the machine actually spits out, and what you actually do through your post-sequencing processes, is you get all these annotations. So you get, you know, at line one million, that protein does this. So we get these vast amounts of data, which we just throw into a database. And honestly, at this point, Varify is great, but it's not being used as much as it should be because of how slow it is. So I think, honestly, at the end of the day we're going to move to this type of stack with a NoSQL model.

Right. We're not using any NoSQL right now. We're waiting for companies like Curoverse and Google Genomics. Because Google Genomics, every few years they want to take a stab at genomic data, and they stop for a little bit, and then three years later it's like, well, we want to do genomic data again, and they start working on it again. We really want companies to develop their non-structuring, as I like to call it, and how they query genomic data, so we can access their development and their framework in a foreign data wrapper, API type of way. Curoverse, like I said, has this thing called the Lightning API. They just got this grant. It's actually really cool.
There's this kind of race to whoever can find a way to search a specific subset of genome sequences the fastest, and whoever gets that gets this award. And we're kind of just sitting back and saying, okay, you guys go, and then whoever gets it, it's going to be open source, so we're just going to access that. But right now everything's in a relational form. Clinical data is great in a relational form, because the data is not nearly complex enough to cause a lot of problems, and any EHR system is structured in a relational way. So it's really easy to adopt those types of data models into our data models and port that data over pretty easily. When it comes to genomic data, it's a whole different story, and that's a problem the whole biomedical world is trying to solve right now.

Yeah. We actually just won a grant from the federal government, I think it was [unclear]. We got a $700,000 grant, and they said, okay, well, a lot of the stuff you do is great, so here's $700,000 to continue the funding of your department. And the thing about children's hospitals is that we have millions of donors that give us tens of millions, hundreds of millions of dollars to further patient care. So in our mind, this could have a really positive impact on patient care. So we're going to keep developing it till it gets to the point where researchers can just take this, take their data, and spin up an instance. So having that ability to go back to our administrators and our people and say, okay, we want to do something like this, really allows them to say, okay, well, we can keep doing it.

Yeah, absolutely. But we like to make it specific to children. So our donors go, oh, yeah, it's specific to children.
This is great. And to be honest with you, we've had a lot of positive feedback. My job specifically is to help researchers design data models that can be put into a Harvest instance, integrate their data from their different sources, and throw it in there, from a LIMS system, from a clinical system. And we have a lot of people in the health system going, wow, this could really work for me, this could really work for our study, this could really work for our consortium. I don't know if you guys have ever heard of REDCap. It's research data capture. It's this form tool that was created out of Vanderbilt, and it has a MySQL back end. That's literally all it is, research data capture: it's basically just a bunch of forms, and the research data person fills them in. It's okay, but it's kind of terrible when you're querying for stuff. So researchers need a lot of help, and we're lucky enough to have the funding to go help a researcher take their 600-field spreadsheets, stick them into a relational database, and put up a Harvest instance on it. But our hope is that one day we don't have to do that, where a researcher can take the Harvest stack and say, okay, I understand how the Harvest stack works, it's pretty automated for me, I understand that it incorporates these different sources, let me just up the data, I'm just going to hit the button, and it works. We may never get there, and I personally don't think we'll ever get there, because research data is too mutable and too dynamic, but we can get 95 percent of the way there. Thanks.