[inaudible] ...we will finish the two sessions and come back with another link, which is for the breakout groups. The link is in the agenda, so you can find it there, but it's also going to be sent out to your email around 12.30. So you will have it in your email, and it will be posted in the chat here. We have two sessions today that are amazing. The first is on AI in data governance and infrastructure, and the second session will be on technologies and tools to advance environmental health and biomedical research. But without much further ado, I would like to ask our next panelists to come up on the screen or in the room. We have a wonderful set of speakers, leaders from federal and state agencies and non-profit organizations, on AI and data. My name is Gwen Ottinger. I am a professor at Drexel University in the Center for Science, Technology, and Society and the Department of Politics, and I am delighted to chair this panel. Our first speaker, Dr. Susan Gregurick, is the Associate Director for Data Science and the Director of the Office of Data Science Strategy at the National Institutes of Health.
The Office of Data Science Strategy leads the implementation of the NIH Strategic Plan for Data Science through scientific, technical, and operational collaboration with the Institutes, Centers, and Offices that comprise the NIH. Dr. Gregurick advances research in computational biology, biophysics and data science, mathematical and biostatistical methods, and biomedical technologies in support of the NIGMS mission to increase our understanding of life processes. She received the 2020 Leadership in Biological Sciences Award from the Washington Academy of Sciences for her work in this role, and she was instrumental in the creation of the ODSS at NIH in 2018 and served as a senior advisor to the office until being named to her current position. Dr. Gregurick received her undergraduate degree in Chemistry and Mathematics from the University of Michigan and her PhD in Physical Chemistry from the University of Maryland. She completed a Lady Davis Postdoctoral Fellowship at Hebrew University in Israel and a Sloan Postdoctoral Fellowship at the University of Maryland Center for Advanced Research in Biotechnology, now the Institute for Bioscience and Biotechnology Research in Shady Grove, Maryland. Dr. Gregurick, over to you. Hi, thank you so much, and thank you to the organizers for the invitation. Would you like me to share my slides, or would you like to project them? Either one is fine with me. Okay, thank you very much. And then I'll just say "next" when it's time to advance. I'm really delighted that I'm able to provide perspectives on AI and data governance and infrastructure for the panel. I'm sorry, am I sharing them or are you? I think I misunderstood you. Go ahead and share them yourself, if you'd like, but we can pull them up. Okay, I will share them. Sorry for the delay. All right, it just takes a moment to get everything up and running and to switch the panel. Oh, goodness. Okay, so just give me a sec here. All right, does everything look good on your end?
Yes, that looks perfect. Awesome, thank you so much. And just as a note, if by chance I freeze, my internet is acting up today. There's no particular reason why; it's a beautiful day here with fine weather, but for whatever reason it keeps freezing. So I apologize in advance for any hiccups on my end or yours. So again, thank you again for inviting me to the panel. This is a really important topic, and I'm really happy that I can give you a little bit of our perspectives. And I'm sure that you're very, very familiar with the power of AI. I don't need to convince you of that, or of its tremendous achievements in science in my field, which was the field of protein structure prediction, protein folding, and protein dynamics when I was a researcher. Today we see some incredible advances due to artificial intelligence in general, and generative AI and large language models in specific, and in the ability to create very, very large-scale structure template databases of protein structures, like what's being used in AlphaFold. Now it's possible, for example, to just take a sequence of a protein for which you do not know the structure: there's no experimental cryo-EM, there's no crystallography, there's no NMR necessarily. And for the most part, if it's been seen in even some way before, you can predict the likely structure. This is super important for drug design and for understanding the relationships between structure and any potential biological mechanisms or functions. And I'm just thinking here of some of the advances in AlphaFold, which I'm sure you've heard about, but also ESMFold, which is based on a large language model; it may not be as accurate as AlphaFold, but it is orders of magnitude faster. And then there's work by David Baker and his group on Rosetta and all its iterations, which really allow you to think seriously about the design of proteins.
In addition to my field, there are the fields of clinical studies and clinical data informatics that have advanced significantly due to large language models, which rely on a lot of training on word triplets to predict next words, for example. But here, in particular in the work of Lehman and colleagues, what they illustrate is that you can improve on generative large language models that are very generalizable with very specific clinical models: they trained a moderately large language model on MIMIC-III and MIMIC-IV data, and it did outperform the general-purpose generative AI large language models. And so that points to something that I think people are really starting to explore, which is, hey, let's take ChatGPT, improve it with our own data and our own models, and then use it. And you're gonna see this in the healthcare setting for helping doctors write clinical notes or review notes. In addition, there is just a plethora of really interesting articles coming out; for example, the New England Journal of Medicine has a new AI-focused journal, NEJM AI. It allows researchers to explore the role of AI technologies in clinical, medical, and digital health, and it provides examples of promising research and of pitfalls of the application. And I don't have to convince you that large language models, and in some cases AI generally, do hallucinate. You have to really think about bias, transparency, and what we're getting as output as a result of the large training sets. So that brings me to some of the challenges. There are some challenges, as I'm sure you're aware. Aligning the data sets and algorithms to the use case in question is super important. It allows you to make sure that the data sets that you are gonna use and train on, if that's how you're gonna construct the AI, are really representative of the ability to answer those questions. You're interested, and I'm interested, in integrating clinical, research, healthcare, and related environmental data together.
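The next-word training objective mentioned a moment ago can be pictured with a toy trigram model: count which word tends to follow each pair of words, then predict the most common follower. This is a deliberately tiny sketch; the corpus, function names, and "clinical note" sentences below are all invented for illustration, and real clinical language models use neural networks at vastly larger scale.

```python
from collections import Counter, defaultdict

def train_trigrams(sentences):
    """Count, for each pair of words, which word tends to follow."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for a, b, c in zip(words, words[1:], words[2:]):
            model[(a, b)][c] += 1
    return model

def predict_next(model, a, b):
    """Return the most likely next word after the pair (a, b)."""
    followers = model[(a.lower(), b.lower())]
    return followers.most_common(1)[0][0] if followers else None

# Toy "clinical note" corpus (entirely made up for illustration).
corpus = [
    "patient denies chest pain",
    "patient denies chest tightness",
    "patient denies chest pain on exertion",
    "patient reports shortness of breath",
]
model = train_trigrams(corpus)
print(predict_next(model, "denies", "chest"))  # most common follower: "pain"
```

Fine-tuning a general model on domain data is, in spirit, re-estimating these statistics on clinical text so the model's predictions match the domain's vocabulary and phrasing.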
That's hard. You know that, and I'm living it too; it's hard, and that is a challenge. There are biases in data and in assessing the algorithm. We're promoting ideas of ground truth, transparency, and trustworthiness. There's the new NIST AI Risk Management Framework that's worth looking at and considering, and there is other work going on in the European Union on trustworthy AI that's also worth considering. Something that is coming up more and more is creating an inclusive AI workforce, and I'll talk a little bit about that, so that we bring all of our wonderful researchers across the United States and elsewhere to join the workforce in developing AI. And finally, AI is a fast-moving industry, and so is the infrastructure of AI. There are new architectures, new chips being designed that are very specific for AI, being implemented at Google, for example, Microsoft, and other companies, and so what we're seeing today is surely, surely gonna change in the next five years, and it is incredibly challenging to keep up with this fast pace, so much so that most of my day is now spent on the fast-paced world of AI. Maybe you don't know this necessarily, but I'm sure you know that AI needs data. It is a data-hungry research field. But what you may not know is that NIH, through partnerships with cloud service providers such as Google, AWS, and Microsoft Azure, has made over 206 petabytes of data available across these three clouds, and that figure is from about two months ago, so it's probably more now. I'm just listing a few of the data types that you see. It's quite vast. It's quite a wild west. I would not say that this data is necessarily harmonized or easily integratable, but it is there. And just as a point of reference, one petabyte is about 230,000 high-quality DVD movies, so we have a significant amount of data that can be harnessed for AI. Not that that's entirely possible today.
We have to build a lot of infrastructure in order to aggregate and utilize that data, but it is there, and it is an area of active interest at NIH and in the research community. Using data for AI and other methodologies, as I'm sure you're aware, requires that those data, especially for clinical research, are collected in particular formats or particular ways. In particular, if we have a large study, let's just take, for example, the HEAL program, Helping to End Addiction Long-term. There are thousands of investigators in the HEAL program, and they're all doing excellent research. Moving together as a field and working on a project that requires the input of thousands of investigators requires that we have some common understanding of what kinds of questions we're asking and what kinds of data we're collecting, so that we can actually perform that integration, and there are many ways to think about this. One of the ways we're thinking about this is through the practice of common data elements. They play a very important role in structured data collection. Common data elements are ways to ask questions like "How tall are you?" and get a very formatted response: "I am five foot five, standing up." So that's a way, a very simplistic way, that we can all discuss data together in a meaningful way and then hopefully integrate it. For the work that we're doing in this space, an area that seems to be underutilized is the construction of social and environmental determinants of health together, so that we can actually answer meaningful questions. So let's just take an example that we've been working on: asthma. In order to have a large group of people working on asthma together, we have to have formatted questions in terms of: describe your housing conditions, including the repair or disrepair of your house, your exposures to pests, mold, and pollution, and air quality.
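The "How tall are you?" idea above can be sketched in code: a common data element pairs a question with a constrained answer format, so every study collects comparable values. The element names, units, and permissible values here are illustrative only and not drawn from any real CDE repository.

```python
from dataclasses import dataclass, field

@dataclass
class CommonDataElement:
    name: str
    question: str
    value_type: type
    units: str = ""
    permissible_values: list = field(default_factory=list)

    def validate(self, value):
        """Accept a response only if it matches the element's type and
        (when given) its list of permissible values."""
        if not isinstance(value, self.value_type):
            raise ValueError(f"{self.name}: expected {self.value_type.__name__}")
        if self.permissible_values and value not in self.permissible_values:
            raise ValueError(f"{self.name}: {value!r} not a permissible value")
        return value

# Two hypothetical elements: one numeric with units, one categorical.
height = CommonDataElement(
    name="standing_height",
    question="How tall are you, standing up?",
    value_type=float,
    units="cm",
)
housing = CommonDataElement(
    name="housing_condition",
    question="Describe the repair state of your housing.",
    value_type=str,
    permissible_values=["good repair", "minor disrepair", "major disrepair"],
)

print(height.validate(165.1))            # accepted numeric response
print(housing.validate("good repair"))   # accepted categorical response
```

Because every site answers against the same schema, responses from thousands of investigators can be pooled without per-site translation.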
These are all incredibly important components of asthma, but so is understanding your proximity to healthcare facilities, your utilization of healthcare facilities, and your food security or insecurity. These are also important questions, and having a way to have that conversation with structured data that is collected in a meaningful and harmonizable way is important. And so I just want to bring this up when you're thinking of your considerations. This is something we're thinking about quite a bit: how do we integrate determinants of health in a meaningful research capacity? So what's possible today with large data sets? I just want to give you two examples. There are actually many examples at NIH, but these two I'm hoping you have some familiarity with. The first, on the right-hand side, is MIDRC, the Medical Imaging and Data Resource Center. It's an integrated, very large set of radiological images. Most of these are CT scans. This was stood up as part of our COVID initiative, so of course we have a lot of body-part images related to COVID, but now of course we're expanding beyond COVID. What's really nice about this is that there are 54,000 case studies, or cases of research, here, which have generated 135,000 AI-capable images that are harmonized and ready for the community to use. And so this is just one data set, but what we're doing with that data set is integrating it into other resources, like the National COVID Cohort Collaborative, which happens to have electronic healthcare and related data on COVID. So together we can get a much bigger and better picture of an individual who may have had a COVID trajectory, including long COVID, which is an important area for us to investigate. On the other side, on the left-hand side, is All of Us. We have over 342,000 participants. The data that we're collecting is very structured and very AI-amenable. It's electronic healthcare data and wearable data, like your Fitbit data.
There's survey data, including a very wonderful survey for COVID, genomics data, and family history. And so I'm showing you some examples of what researchers have done with the All of Us data. These are just two, but there are many, including understanding migraines among adults with atopic dermatitis, a cross-sectional study in the All of Us Research Program. This was actually done by European investigators using our data. And the one below is an investigation of hypertension and type 2 diabetes for folks who are at risk for dementia in the All of Us cohort. So, just two examples of how researchers are utilizing All of Us and MIDRC for their work. In addition, we have been working across our funded investigators to make the existing data that they've been collecting more AI-ready. And so you'll see the number of institutes and centers that we have been working with. And I wanted to just give you one example from our Environmental Health Sciences Institute, which is addressing the imbalance and missingness of data within the PROTECT database. The PROTECT database is an extensive data set of environmental and prenatal conditions of pregnant mothers. Data sets around birth and pre-birth have a lot of missingness; they're missing lots of data. As you can imagine, it's very hard sometimes to have those regular check-ins about prenatal conditions. And so what the researchers are doing is imputing these missing data using a super learning method developed by, and here you'll see, the researcher and his coworkers. The idea is to fill the data gaps in the preterm and full-term data sets so that we have a better understanding of the relationship between environmental and prenatal conditions. Creating an inclusive AI research workforce and AI capabilities is an important goal for NIH, and an important goal of our AIM-AHEAD program.
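The super-learning idea just described can be sketched in miniature: fit several simple candidate learners on the observed rows, weight each by how well it predicts, and fill missing entries with the weighted blend. The toy gestational-age and birth-weight numbers below are invented, and the real PROTECT analyses use a far richer learner library; this only shows the ensemble shape.

```python
def column_mean(rows, target):
    """Learner 1: always predict the observed column mean."""
    vals = [r[target] for r in rows if r[target] is not None]
    mean = sum(vals) / len(vals)
    return lambda row: mean

def linear_from(feature, rows, target):
    """Learner 2: least-squares line predicting target from one feature."""
    pts = [(r[feature], r[target]) for r in rows if r[target] is not None]
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return lambda row: intercept + slope * row[feature]

def super_learner_impute(rows, target, learners):
    """Weight each learner by inverse squared error on observed rows,
    then impute missing entries with the weighted blend."""
    observed = [r for r in rows if r[target] is not None]
    weights = []
    for learn in learners:
        err = sum((learn(r) - r[target]) ** 2 for r in observed) / len(observed)
        weights.append(1.0 / (err + 1e-9))
    total = sum(weights)
    for r in rows:
        if r[target] is None:
            r[target] = sum(w * learn(r) for w, learn in zip(weights, learners)) / total
    return rows

# Toy data: gestational age "ga" (weeks) with one missing birth weight "bw" (kg).
rows = [
    {"ga": 36, "bw": 2.6}, {"ga": 38, "bw": 3.0},
    {"ga": 40, "bw": 3.4}, {"ga": 39, "bw": None},
]
learners = [column_mean(rows, "bw"), linear_from("ga", rows, "bw")]
rows = super_learner_impute(rows, "bw", learners)
print(round(rows[3]["bw"], 2))  # blend is dominated by the exact linear fit
```

Weighting by held-out (here, in-sample) error is what lets the ensemble lean on whichever learner the data supports, rather than committing to one model up front.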
The goal of the AIM-AHEAD program is to enhance participation and representation of researchers and communities who are underrepresented in general, and to have them work within their communities to utilize AI, but also to develop new AI algorithms and to enhance our collection of data that could be used for AI from underrepresented communities. This is very much a ground-based effort. There's a network of over 1,000 researchers working in this program. And what we hope to do is to address health disparities and inequities using artificial intelligence, and also to improve the capabilities of this emerging technology with these communities. And you'll see that we have four main areas. Partnerships with the communities are incredibly important, as is providing research funding for these communities, training efforts and education in terms of how to use AI technologies, and then the actual infrastructure that makes this all possible. This is, I think, indeed my last slide, and it's just my short bit to brag a little about the program. In its second year, we have a number of fellowships that are awarded to early-career researchers. One of our fellowship awardees is now an NSF Early Career awardee. So I feel like we are making headway and really promoting the careers of young scientists to use AI and ML. We have 25 leadership fellowships to prepare diverse leaders, and these may not necessarily be researchers; they could be activists in communities who help champion the use of AI in addressing persistent health disparities. There are new and innovative pilots to test new algorithms in AI; these are for AI researchers working with or from underrepresented communities. Finally, there is a very large Connect program, which connects folks who are interested in mentoring or being mentees, and numerous webinars and symposia. And with that, I just want to thank you very much for the opportunity.
I'm looking forward to the panel discussion, and perhaps that's where the questions and answers will occur. And Gwen, thank you so much again for moderating. I'm turning it back over to you. I hope everything went well and I didn't freeze. And thanks again. Thank you very much. And we will take questions all together at the end. So, right now, I'm pleased to introduce Jorge Calzada, who is the Acting Associate Director for Platforms within the Office of Public Health Data, Surveillance, and Technology at the Centers for Disease Control and Prevention. This new office within the CDC seeks to advance the public health data strategy and data modernization initiatives by bringing together technology leaders with public health domain experts. Prior to joining CDC, Jorge spent the last 20 years building data science and machine learning expertise for organizations across several different sectors, including energy, market research, and artificial intelligence software startups. Jorge received his undergraduate degree in operations technology from Northeastern University, a Master of Science in Information Systems, also from Northeastern, and a Master of Business Administration from the Massachusetts Institute of Technology. Very pleased to welcome Jorge. Thank you, Gwen. I'm delighted to be here. I think the only thing that didn't get mentioned is that I'm brand new to CDC. I'm about seven weeks in, not just to CDC but to public health in general. I'm, by background, a technology leader, and I'm brand new to public health. But one of the things we're doing within this new office is partnering technology leaders with the deep public health domain expertise that already exists within CDC to try to modernize our infrastructure. Why does this matter? Well, we need to systemically address some of the gaps that we saw exposed during the last pandemic response. And I'll talk a little bit more about that, and how we think about it within our new office.
So the workflow that we see within public health is really one around surveillance: detecting and monitoring. So we have a division stood up for detecting and monitoring, and I'll talk through how we think about some of the AI applications there around anomaly detection. Then we investigate and respond to these outbreaks, use that data to inform, disseminate the data and the insights both to our state and local partners and to the public at large and other researchers, and be response ready. So that's really the goal of this new office and why I'm here at the CDC. Now, this was all new to me, so I'm always happy to share it; it may not be new to you. But I was amazed to discover that the CDC actually has very little power to force people to share data with it. All the data that the CDC gets is shared entirely voluntarily. It's the participation of our state, local, tribal, and territorial partners that allows us to execute our mission in public health. So, a little bit about this ecosystem. And again, we saw this exposed during the last response. We see an ecosystem where health care organizations are still using fax machines to transmit data, about 70% of them. That may not sound like an exciting use case for AI, but it is to me. I would personally love to eliminate the fax machine from public health data transmission. One of the other things: the USDS has done a bunch of case studies and time studies, and what they discovered is that at the local level, at our STLT partners (state, tribal, local, territorial), the epidemiologists who work there spend 80% of their time doing data janitorial services. So data generation, data translation, data transmission, reviewing and approving that data. We have a high degree of manual intervention, of burden that we put on our local epidemiologists to share their data locally and then eventually with us at the CDC. So whenever I think of a large human burden, I naturally think: how can we offload that burden onto machines?
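The anomaly detection mentioned for the detect-and-monitor division can be pictured with a very small sketch: flag any week whose case count sits far above a trailing baseline. Real surveillance systems, and the data feeding them, are far more involved; the counts, window, and threshold below are invented for illustration.

```python
import statistics

def flag_anomalies(counts, window=4, threshold=3.0):
    """Flag index i when counts[i] exceeds the trailing-window mean by
    more than `threshold` trailing standard deviations."""
    flags = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mean = statistics.mean(baseline)
        sd = statistics.pstdev(baseline) or 1.0  # avoid divide-by-zero on flat baselines
        if counts[i] > mean + threshold * sd:
            flags.append(i)
    return flags

# Toy weekly case counts with one obvious spike at index 6.
weekly_cases = [12, 14, 11, 13, 12, 13, 41, 12]
print(flag_anomalies(weekly_cases))  # the spike week is flagged
```

The trailing window matters: each week is judged only against recent history, so slow seasonal drift does not trigger alerts the way a sudden spike does.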
So to me, these are the exciting use cases for AI that our office is responsible for. And again, we have missing data elements that are really critical in understanding equity in our response. Things like race and ethnicity are missing in 30% of the cases, and when you're trying to determine if there's any disparity in how immunization is impacting one class or another, this data is absolutely critical. So one of the prompts was about the areas of need. Well, I think the brittle, inelastic data pipelines that come into the CDC are the biggest area of need for artificial intelligence, for multi-modal artificial intelligence. Things like computer vision and natural language processing could all be brought to bear on this problem to make this pipeline much more elastic, much more response ready, with much less human intervention in getting this data to us. I'll talk about some of the new roles and processes and how we make the CDC AI ready in a bit. So like I said, I'm brand new here, but there are plenty of data scientists already within the CDC who have used computer vision and natural language processing to develop solutions to help us with our public health mission. One of the ones I'd like to talk about is an example of computer vision, and let it not be said that we are without humor at the CDC, because the tagline to this "Harnessing Machine Learning to Eliminate Tuberculosis," otherwise known as HAMLET, is "TB or not to be." There we're leveraging computer vision to provide a quality assurance function for radiologists worldwide who are screening for tuberculosis. This is an AI that runs in batch mode: overnight it ingests all the x-rays that have been processed, scores the performance of each radiologist, and highlights areas of discordance between the model and how they scored, or how they diagnosed, the tuberculosis.
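The quality-assurance step just described, comparing the model's overnight score against each radiologist's read and surfacing disagreements, can be sketched as follows. The case IDs, scores, and cutoff are invented; this shows only the discordance logic, not the imaging model itself.

```python
def find_discordant(cases, cutoff=0.5):
    """Return case IDs where the model's TB probability and the
    radiologist's binary read land on opposite sides of the cutoff."""
    discordant = []
    for case in cases:
        model_positive = case["model_score"] >= cutoff
        if model_positive != case["radiologist_read"]:
            discordant.append(case["id"])
    return discordant

# Toy overnight batch: the model and reader disagree on one x-ray.
batch = [
    {"id": "xr-001", "model_score": 0.92, "radiologist_read": True},
    {"id": "xr-002", "model_score": 0.88, "radiologist_read": False},
    {"id": "xr-003", "model_score": 0.10, "radiologist_read": False},
]
print(find_discordant(batch))  # only the disagreement is surfaced
```

The point of a QA function like this is triage: human review time goes to the discordant cases rather than to every x-ray in the batch.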
One of the other use cases, again around computer vision, and one that is near and dear to the CDC and our heritage around outbreak response, is Legionnaires' disease. The Legionella bacteria causes a pretty nasty case of pneumonia. It's caused by inhaling small droplets of infected water or swallowing that infected water. And when there's an outbreak, once you've detected the outbreak, you move to that investigate-and-respond mechanism. One of the tools that we've built to help our state and local partners respond faster to these outbreaks identifies potential sources of that infected water. It turns out that large cooling towers for HVAC, heating and air conditioning, in large buildings are a breeding ground for this bacteria. So using computer vision tied in to the Bing and Google Maps APIs to grab satellite images, we're able to spot all the potential cooling towers where this outbreak might be occurring. I didn't think I'd have enough time, so I'll go through the rest quickly. Some of the other use cases: forecasting suicide risk, building a real-time forecast model, where real-time here is a matter of weeks, or the same week, in which we're able to estimate suicide risk based on a lot of the data that we're observing. Our data scientists in the chronic disease space are looking at environmental factors that contribute to certain disease prevalences. And one of the things that they look at is how the built environment has contributed. So we look at things like sidewalk inventories, using computer vision models to detect the presence or absence of sidewalks and providing scores to neighborhoods. And again, there are other use cases for natural language processing, where we use word embeddings and classification trees to provide some missing information, especially here around opioid use.
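The last idea, using text to fill in a missing structured field around opioid involvement, can be sketched with a deliberately crude stand-in: a keyword score over free-text narrative. The term list and records below are invented, and the real work uses learned word embeddings and classification trees rather than a hand-made lexicon; this only shows the fill-the-missing-field shape.

```python
# Hypothetical lexicon; a real system would learn features, not list them.
OPIOID_TERMS = {"fentanyl", "heroin", "oxycodone", "morphine", "opioid"}

def infer_opioid_involvement(narrative, min_hits=1):
    """Return True if the narrative mentions at least `min_hits`
    opioid-related terms."""
    tokens = {t.strip(".,;").lower() for t in narrative.split()}
    return len(tokens & OPIOID_TERMS) >= min_hits

# Toy records where the coded flag was never filled in.
records = [
    {"narrative": "Acute fentanyl intoxication.", "opioid_flag": None},
    {"narrative": "Blunt force trauma.", "opioid_flag": None},
]
for r in records:
    if r["opioid_flag"] is None:
        r["opioid_flag"] = infer_opioid_involvement(r["narrative"])
print([r["opioid_flag"] for r in records])  # inferred flags for both records
```

This is the same pattern as the race and ethnicity gap mentioned earlier: when a coded field is missing in a large share of records, unstructured text often carries enough signal to recover it.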
So with that, I think I answered most of the prompt, but some of the things that are missing within CDC specifically, around being ready to apply AI at a large scale, are really new roles. So I'm hiring machine learning engineers and ML ops engineers, which are probably new roles to CDC, as well as designers and product people, because we think about these as products to be consumed. And then systems: I spent a lot of time in data science, and I saw it evolve from an artisanal craft, where you were building models, and then if you wanted them run, you'd go to the data scientist and ask them to run it. Well, that doesn't really scale. Where industry has gone is really this idea of an AI factory, where you have a machine learning operations platform that governs the entire life cycle of machine learning, from design to development to deployment, to make sure that you build up front for the idea that it will be operationalized. And so it needs to fit into the existing code base. It needs to play well. It needs to be built for scalability, for safety, for ethics. You do that up front at the design phase. You don't go chasing it after the fact, after it's already out in the wild. So those are infrastructures that I'm looking to build. But also simple things like the ability to host solutions for our state and local partners, because they often don't have that ability within their own departments to deploy a solution. So we are essentially mimicking the services of a software-as-a-service company within the CDC. Thank you very much, Gwen. Thank you. That was impressive for just having come to your position, and we really appreciate your being here. Our next speaker is Janet Haven. Janet is the Executive Director of Data and Society and a member of the National Artificial Intelligence Advisory Committee, which advises President Biden and the National AI Initiative Office on a range of issues related to artificial intelligence.
She has worked at the intersection of technology policy, governance, and accountability for 20 years, both domestically and internationally. Before joining Data and Society, where she previously served as Director of Programs and Strategy, Janet spent more than a decade at the Open Society Foundations. There she oversaw funding strategies and grantmaking related to technology's role in strengthening civil society, and played a substantial role in shaping the field of data and technology governance. Janet started her career in technology startups in Central Europe and lived in the region for more than 10 years, deepening her understanding of the ways the internet and algorithmic technologies impact communities outside the United States. She sits on the board of the Public Lab for Open Technology and Science and advises a range of non-profit organizations. She holds an MA from the University of Virginia and a BA from Amherst College. Thank you very much, Janet, for being here. The floor is yours. Thank you. My pleasure, Gwen, and thanks so much for asking me to participate. And thank you to the other panelists. These talks have been great, and I'm excited that my talk, which is really focused on the issue of governance of AI and where we are in governing these systems, picks up on a lot of the themes that others have raised. So that's really fantastic, and thank you for that. Very briefly, I'm the Executive Director of Data and Society. We're an independent non-profit research institute. We study the social impacts of data-centric technologies and automation. We really center people in our work, and we tend to work through a social science lens. We also do policy engagement work, which is where my participation in the National AI Advisory Committee comes in. And I should say, as I believe I am actually required to do, that I am not speaking on behalf of the National AI Advisory Committee; I am speaking in my individual capacity. I should always say that.
So I want to start by setting us up with something a bit obvious, I think, which is that how we govern AI, and the rules that we're setting to govern AI, is a really big issue right now, both in the public discourse and in governance around the world. We're seeing a lot of movement around AI governance in the United States, particularly in the executive branch and within agencies, and also in the EU. And I think it's important to say that the reason for this is an increasing recognition that AI is a democracy issue. We're seeing concerns about discrimination against protected classes via algorithmic systems. We see amplification of mis- and disinformation on algorithmically mediated social media platforms, and there are a range of other ways that we've seen and documented harms that AI systems and algorithmic decision-making systems have brought about. At the same time, we know, as our other panelists have talked about, the incredible benefits that society can reap from AI systems, and so the need both to protect and to create space for innovation is critical. Technology governance is not new. AI raises some novel questions and challenges, but the core issues of protecting our rights and ensuring enforceable accountability of technical systems, and of the entities designing and deploying them, remain. And to be very honest, we haven't gotten that right yet. So this is a real moment of opportunity because of the urgency around the regulatory and governance conversation. So, two points that I would put on the table to start with. One is that the Biden-Harris administration has put out a request for information because they are planning to develop a national AI strategy, a whole-of-government approach to AI.
And my position is that this is a moment where we have an opportunity as a society to articulate a core set of values and of commitments to equity, to access to opportunity, and to a rights-based framework that I think should guide American AI policy. The second thing that I think is important in framing out that national AI strategy is that we need to build an AI research and development ecosystem that prioritizes the understanding of societal and social impacts of AI, including environmental health, and that works alongside technical advancement and innovation. Without understanding the impact on people and on the environment, we really cannot govern AI justly or sustainably. So what I'm going to do is just give a quick tour of the highlights of our current AI governance situation. And I'm going to start with the EU, because they are way out ahead of everyone else on that front. The EU is working on what's known as the EU AI Act. And in fact, yesterday, the European Parliament passed a draft version of that law. The final version of that law is not expected to be passed until the end of this year. And the important thing to know about the EU AI Act is that it's based around a risk-based framework as a core assessment tool of AI systems in context. So they are not regulating a technology per se; they're regulating the use of that technology, and they've defined four levels of risk classification, from high to low, within the EU AI Act. And I think one of the things that's probably critical for this conversation and this community is that the members of the European Parliament, in their negotiations in May, expanded the classification of high-risk areas to include harm to people's health, safety, fundamental rights, or to the environment. And that was, I think, an important expansion to include the environment. In the United States, we're not quite as far along. So there are a couple of major developments, one of which Susan mentioned.
And I'm going to start with the release by the Office of Science and Technology Policy in October of last year, 2022, of the Blueprint for an AI Bill of Rights, which came out under the leadership of Dr. Alondra Nelson. This presents a rights-based approach to governing AI. The blueprint is not enforceable law, I should say; it's essentially a policy blueprint, and most of it is not enforceable. It calls for five core protections and guarantees for the American public around AI: protection against algorithmic discrimination; safe and effective AI systems; data privacy; notice and explanation, which means you should know when an automated system is being used and understand how and why it contributes to outcomes that impact you; and what's broadly known as a human in the loop (human alternatives, consideration, and fallback), that is, there's somebody you can speak to if you have a concern about an AI system that's making a judgment about you. Currently none of these things are in place. We've seen some movement on algorithmic discrimination through an executive order that President Biden put out in February, which directed all agencies to protect the American public against algorithmic discrimination using existing civil rights law. And that was actually quite a big step and a big departure: to draw on existing law and ask for it to be put into action when we're seeing harms coming from algorithmic systems around discrimination. So that's a big step, and we've also seen some agencies say very clearly that they intend to take that forward. The National Institute of Standards and Technology, NIST, released (Susan mentioned this) the AI Risk Management Framework in January of this year, and that followed a very long and, I think, very thorough consultation process with industry and with some independent groups.
The important things to know about the Risk Management Framework are that it also is not enforceable, and it is not intended to be enforceable. It is a standards-based governance tool, and in fact it is intended to be quite broad and applicable across a number of different sectors and uses. The stage right now is that NIST is asking a number of actors in the field to take the Risk Management Framework and apply it to their own situation, their own company or product, to essentially develop a set of user profiles or case studies. And so I see the Risk Management Framework as a really excellent place to start a standards conversation. It also, obviously, uses a risk framework, but unlike the vision for the EU AI Act, which will become law, or the intention of something like the AI Bill of Rights, which is rights-based, the NIST standard is not enforceable, and that really leaves a lot of space for interpretation, and also for questions about how accountability, mitigation, and redress happen when harms occur in AI systems. Finally, Congress is also in on the act. There have been congressional hearings with AI leaders, and we'll see more of those, I think, as congressional members are trying to figure out what it is that they're regulating, and also what power they have over the kinds of concentration we see in the AI industry, of data and compute, money and talent, in a very few companies that are essentially holding most of the cards right now. I think that we're not as close to actual legislation around AI, given how hard it's been to pass even basic data privacy laws and other types of fundamental protections in the technology governance space at the federal level. But what I think we do have right now is a huge opportunity to bring environmental concerns to the forefront of these conversations. They've been in the mix but haven't been foregrounded.
The environmental impact of AI is usually mentioned in AI governance texts, but a major obstacle is a lack of information about what the environmental impact of AI systems actually is. We don't have good measurement systems for this at all. To that end, the OECD has formed an AI expert group on climate and compute. They released a report last fall focused on that question of how we improve the overall understanding of the environmental impact of AI systems, and that report is, I think, a valuable read. It distinguishes between the direct environmental impacts of developing and using AI systems and the indirect costs and benefits of AI applications, which I think really gets us back to that issue that we need to understand the social and societal impacts of AI to be able to govern it. Even choosing what the measurement standards are, in terms of the environmental impact of AI systems, is really a question about people and society and less about the technology. So, just to wrap up, a couple of big questions I think we're facing in this community are: how do we better understand and factor in the environmental costs of AI in this emerging governance ecosystem, and how do we ensure that those societal impacts are visible in, and central to, the governance design? And I think equally important, and Susan also talked about this in her talk, the design of AI governance systems has really not been participatory. The loudest voices that we have are those from the AI industry itself. The people who are most directly impacted by these technologies should have a seat at the table. And so I think there's a very big question that needs to be solved in governance design: how do we ensure participation in governance in meaningful ways that truly shift power, and aren't just a box-checking exercise? Thanks, Gwen. Thank you, Janet. Our next speaker, and our last speaker, is Dr.
Suzanne Dorsey, who is Deputy Secretary of the Maryland Department of the Environment. She manages a portfolio that encompasses regulation and enforcement of state and federal environmental laws, multi-jurisdictional restoration projects, climate policy, and environmental justice. Before this role, Dr. Dorsey worked with the agency's Water and Science Administration on Chesapeake Bay restoration and on major issues that require cross-agency collaboration on climate resiliency. She previously was executive director of the Harry R. Hughes Center for Agro-Ecology at the University of Maryland, and also was executive director of the Bald Head Island Conservancy and Smith Island Land Trust for 11 years. Dr. Dorsey has been a commissioner of the North Carolina Division of Public School Management and a professor at the University of North Carolina and Salem College. She has her bachelor's degree in biology from Drew University, her master's degree in marine estuarine environmental science from the University of Maryland, and her PhD in oceanography from the State University of New York at Stony Brook. It's all yours. Thanks, Gwen. If it's possible, can you present my slides? I'd really appreciate that, and if not, I can just go without. So, Janet, thank you; that was a perfect setup. We were mentioning the role of environment in AI, and I'm really gonna pose a potential opportunity, and I'm really grateful to be able to test this idea out with all the experts on this panel and participating in the workshop. So, the Chesapeake Bay. You can go to the next slide. Chesapeake Bay restoration is one of the nation's, and honestly the world's, most complex and, I would even argue, successful examples of multi-jurisdiction ecosystem restoration. The restoration effort has been in place for 40 years, and I wanna start centered on equity.
We recently had a report that analyzed communities in our Bay watershed, which encompasses six states and the District of Columbia, and the communities in the watershed that have the highest equity scores on EJ screens (EPA has one, Maryland has one, the University of Maryland has one). One of the really interesting findings is that there is a two-way relationship between areas with equity issues, areas where there may be high poverty, and the lack of success of restoration, and then the health. So, absolutely, we are beginning to build a strong case for the relationship between equity, environmental health, and human health. It goes both ways. Where do we fail to meet our pollution reduction goals? Often in communities that have been redlined or have suffered systemic racism. So, the Chesapeake Bay restoration is really impressive and important primarily because it's based on a very rigorous quantification and verification system. In fact, every restoration practice has a scientifically backed quantification system, and often, in addition, has measured outputs as well. So, for 40 years, we've been implementing restoration. We've had some really important changes over the years. One of them was in 2010, the establishment of a total maximum daily load, or TMDL, which is the pollution diet regulated by EPA. And then most recently, there was a report by the Scientific and Technical Advisory Committee of the Bay Program that had a couple of conclusions that I really want to get into, and then, at the end of this, I'll pose the question: is this an opportunity to apply AI to environmental restoration goals? Next slide, please. So, I'm not going to go over the details, but it's a complex system, right? It's a system that starts far upstream in New York and concludes at the mouth of the Chesapeake Bay in Virginia.
There is a substantial amount of science, going back well over 100 years, studying the ecosystems, the responses, the habitats, and the critters that live in and depend on the Bay, as well as now a growing set of social science that looks at the connectivity between environmental well-being and human well-being. As I said, there are quantifiable outcomes that we're seeking with our Bay restoration, and metrics for how we achieve those outcomes. Next slide, please. The approach for the Chesapeake Bay restoration has really evolved to look at things from an integrated perspective, right? So, there is a Bay model that is frequently updated and improved, there are real-time observations in the environment, and then there's the measurement of the outcomes. And after 40 years, we have determined that, you know, our restoration outcomes have not been achieved to the extent that our model expected them to be achieved. Next slide, please. If you can go back one. One of the factors, of course, that has caused uncertainty and gaps is the impact of climate change, which 40 years ago was not integrated into restoration. So, as the dark green line demonstrates, the restoration path needed to be adjusted based on the impact of climate change, which basically makes everything harder and moves the goalposts. So, you can see that over time we've had to adjust the rate of restoration and increase our pollution diet over time. Next slide, please. So, with all this, and again, on the left-hand side of this is the public policy, the tools that we use to achieve environmental restoration across all the jurisdictions, and on the right-hand side is an example of the response. And this CESR report that came out this year highlighted the fact that, despite increasingly sophisticated modeling, the response gap is obvious in almost all of the outcomes that we're looking at.
So, where we are ending up versus where we modeled ourselves to be: generally, we are underachieving our restoration goals, even though we've made progress in achieving health and habitat metrics for the Chesapeake Bay. Next slide, please. So, this is an amazing time. In the Chesapeake Bay restoration there is a milestone, and that milestone takes place in 2025. So, right now we have the opportunity to reimagine what restoration could be and how we want to move forward, both with the metrics we use to achieve restoration outcomes and the outcomes that we're going to be benchmarking to determine whether or not we're successful. Certainly, equity becomes much more of an important benchmark because, as I said, it is not only linked to human well-being but also to environmental well-being. Next slide, please. So, my question, and I'm with all of you esteemed artificial intelligence experts, and in collaboration with the University of Maryland Center for Environmental Science: is this the time to integrate AI to support the environmental outcomes and public health outcomes that we're seeking? We have a set of environmental actions that are backed by science and continuously improved. I would like to see an enhanced and expanded real-time environmental monitoring system, at multiple scales, remote as well as in situ, that communicates with our model. And my question to the data scientists is: at what point does the model become irrelevant? Can we move to AI, or can we? Can AI identify the gaps and opportunities to inform environmental actions? And again, this needs to be on both the environment side and the linked human health side of the equation. Next slide, please. So, here's an opportunity: a foundation of consistent, uniform metrics, both on the implementation side as well as on the measured outcome side.
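The "response gap" idea running through this talk, comparing where the model expected restoration to be against where monitoring says it actually is, can be sketched as a simple shortfall check. The numbers, outcome names, and threshold below are illustrative assumptions, not Bay Program data:

```python
import numpy as np

def response_gap(modeled, observed, threshold=0.2):
    """Flag outcomes where observed progress lags the modeled
    expectation by more than `threshold` (fractional shortfall)."""
    modeled = np.asarray(modeled, dtype=float)
    observed = np.asarray(observed, dtype=float)
    shortfall = (modeled - observed) / modeled  # fraction below target
    return shortfall > threshold

# Hypothetical outcomes: percent of each restoration goal achieved
modeled = [80, 60, 90]    # what the model expected
observed = [75, 40, 88]   # what monitoring measured
print(response_gap(modeled, observed))  # the second outcome is flagged
```

A real system would of course weigh measurement uncertainty and climate-driven shifts in the targets; this only illustrates the basic model-versus-observation comparison the CESR-style evaluation performs at scale.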
It is a complex system, but a system that is well studied and well understood, even as it relates to climate change. And we are right now looking for the next generation of environmental restoration and public health tools to inform our daily decision making, our annual decision making, and our funding strategies at the state and watershed scale. Next slide, please. Thanks a lot. Appreciate your time. Thank you to all of our speakers. We have about 15 minutes for questions. I have a few from the internet, and remember that there's a Q&A button at the bottom where you can put your questions, and we'll get to those. If there are any questions in the room, how will I know about those? Oh, there. Darrell, I see your hand. Thank you. Great set of talks; I really appreciate the additional info. But Susan, I had a question for you. Initially, it was the first time that I sort of had this feeling that I guess contemporary Americans have with regard to being scared by the power of AI. I spent four years at Vanderbilt in biophysics: circular dichroism, NMR, crystallography. And when you put forth that proposition with respect to the tools, I actually had a moment there. So could you say a few words about the actual comparison between the models, the protein-protein interaction models, and, say, crystallography, in terms of the agreement between the two, and what might that portend for postdocs going forward? Absolutely. So, speaking in general about the particular algorithms, AlphaFold, ESMFold, and RoseTTAFold (that one's got a longer name, something like that), I think that's the question: comparing the algorithms, where they're strong, where their weaknesses are, and where there's a real need for research in the future. I think that's what you're getting at, correct? Yes, thank you.
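As context for the question above: agreement between a predicted structure and a crystallographic one is typically scored by optimally superposing the two coordinate sets and computing a root-mean-square deviation (RMSD). A minimal Kabsch-style sketch on synthetic coordinates (not real protein data; the toy "predicted" structure is just a rotated copy of the "crystal" one):

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two N x 3 coordinate sets after optimal
    superposition via the Kabsch algorithm."""
    P = P - P.mean(axis=0)            # center both sets
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                       # covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal rotation
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum() / len(P)))

crystal = np.array([[0., 0., 0.], [1., 0., 0.],
                    [0., 1., 0.], [0., 0., 1.]])
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.],
               [np.sin(theta),  np.cos(theta), 0.],
               [0., 0., 1.]])
predicted = crystal @ Rz.T   # pure rotation, so RMSD should be ~0
print(kabsch_rmsd(predicted, crystal))
```

In practice, CASP-style assessment uses more robust scores (GDT-TS, lDDT) on top of superposition, but RMSD after Kabsch alignment is the basic quantity behind "agreement between the two."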
Okay, great. So, AlphaFold: you probably remember the CASP structure prediction competition (I don't know if competition is the right word) that I actually participated in as a postdoc. It's a long-standing competition: you're given a sequence and you have to predict the three-dimensional structure. AlphaFold came along, and it started with a pretty reasonable guess: let's get a template, and then refine it with some energy minimizations, and it did okay. But then they realized that really this is a pattern-matching problem: take everything that's been crystallized, develop template structures, and then do pattern matching in really sophisticated ways with AI. The work that they've done has been outstanding and really quite predictive, especially if certain elements of the structures are found in many different proteins. So that's really great for structures that have been crystallized. Then come large language models, which are incredibly fast; if you play with ChatGPT, it gives you answers fairly rapidly. So taking those large language models and really trying to look for ways in which the algorithms can predict orphan proteins, or proteins with very limited sequence homology, they're quite good. But when it comes to proteins that have already been crystallized, AlphaFold is really quite superior; the large language models don't do as well, which is, for me, just a little bit surprising, since, given the way they're structured, I would imagine they should actually outperform. So I'm thinking that an interesting area of research, if I were a researcher, would be to really look at the structure of the large language models and how they're incorporating data from databases. And I really would love to see researchers incorporating literature into the large language models, along with the data from the databases. I think there's just an enormous potential there to improve large language models and
just to really advance AI. RoseTTAFold takes a slightly different approach, where the language models are more on the sequence and not on literature, so that they can actually predict and design new proteins with new sequences. So, you know, it's a really interesting field, and I think there's going to be a lot of fast-moving capability here. Where I'm really excited is about using the integration of peer-reviewed, highly vetted literature with data and databases to advance curation, to advance structure prediction, or just to advance new technologies for improving large language models. So I hope that kind of gets to your question. The next question comes from Jana Asher: are there any efforts, perhaps through the UN, to create an international framework around AI ethics and deployment? I can jump in on that. So, yes, there certainly are a lot of discussions about that right now. The OECD has created a set of AI ethics principles, which are pretty high-level right now. I would say there's a lot of focus on national strategies; so, for instance, the UK has also put out a kind of national AI plan. On the other hand, there have been a number of calls recently, and I think to some extent this has been led by, again, AI industry leaders, for an international regulatory body that would govern AI or provide a governance framework for AI internationally. That is still very much at the idea phase; the UK just offered to host a summit on that this fall, and I'm not sure if that's going to happen or not. But my personal feeling is that there are a lot of pros and cons to that when you are talking about essentially something of a borderless technology. And I think the challenge, of course, with principles, of which there are many, and many companies have released them as well, is that they're just not enforceable. So I think that is the draw of something like the EU AI Act, or of seeing things like the executive
order that mandates protections against algorithmic discrimination. Those are the kinds of behaviors and actions by governments that are going to have real impact on people's lives. Thank you. And then we have a question from Soski Sharma that's related, but perhaps more for Suzanne: what should be the ideal framework that can be implemented in developing countries for creating AI-based models for environmental restoration and protecting the vulnerable hotspots of biodiversity? Yeah, I think that's a really great question. I know that NASA is looking at some remote sensing tools that are connected to biodiversity, so obviously data is essential. But the second thing is quantification, beginning to quantify actions, and those are tools that we can begin to amplify. There's a Chesapeake Bay report card put out by the University of Maryland Center for Environmental Science, and that report card is being widely used across the globe. And again, building on the tools of quantification, verification, and outcome monitoring, I think, allows us to begin to apply this. But, you know, I would argue that starting in a data-rich place like the Chesapeake Bay watershed will certainly inform those actions associated with large-scale goals, especially in developing nations. So, Chirag has been waiting. Hi there, I'm Chirag. I'm not part of the panel, but I spoke yesterday, and I'm a bioinformaticist from Harvard University. So I have a question. It's great to see our federal resources being spent towards AI and data science. As Dr.
Kozalda will know, the CDC has supported and fostered impactful surveillance data used for environmental health decisions, such as the CDC's NHANES. For example, every week in our top journals, JAMA, the New England Journal of Medicine, it's not unlikely we'll see a study that has been done on these data. So I ask: what types of AI or data science resources are being reinvested into those types of resources for us to enhance our environmental decision making? And a broader question about our infrastructure building: how do we use these resources that we're talking about to better have datasets talk to each other, like the fantastic resources that Dr. Gregurick presented, or All of Us? What are the AI tools and data resources that we need to be using to make sure that NHANES talks to All of Us, and back and forth? Thanks very much. Maybe I'll put the second one first. That's a real challenge within the CDC as well. I was again amazed to discover that CDC stands for Centers for Disease Control, not the Center; we have roughly, I think, somewhere between 12 and 15 centers, institutes, and offices with their own data models, their own funding, their own disparate capabilities; some have data engineers, some do not. So there are some real data silos that exist, and one of the opportunities, I think, for our division to address is connecting all these disparate data sources. There are a couple of ways to do that. You can build a centralized analytics platform, which is pretty straightforward, or there's a novel approach that I'm investigating: semantic integration. Leave the data where it is, because there are good reasons for the different data models, which are very mission-centric in support of the different centers. Instead, represent the data through a connected ontology, through a graph database structure. So now you've built a semantic knowledge graph that is queryable through a query language like SPARQL, and that in turn talks to the data fabric and pulls the right data
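The semantic-integration idea just described, leaving data in place but exposing it through a queryable graph of triples, can be illustrated with a toy in-memory triple store. A production system would use a real RDF store queried with SPARQL; the pure-Python sketch below only shows the concept, and all dataset and variable names are made up:

```python
# A triple store holds (subject, predicate, object) facts; a query
# layer matches patterns across facts contributed by many sources.
triples = set()

def add(subject, predicate, obj):
    triples.add((subject, predicate, obj))

def query(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    like a variable in a SPARQL basic graph pattern."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# Two hypothetical centers publish metadata into the shared ontology
add("dataset:env-health", "hasVariable", "var:blood_lead")
add("dataset:surveillance", "hasVariable", "var:blood_lead")
add("var:blood_lead", "unit", "ug/dL")

# Federated question: which datasets measure blood lead?
hits = query(p="hasVariable", o="var:blood_lead")
print(sorted(t[0] for t in hits))
```

The point of the design is that each center keeps its own mission-centric data model; only the lightweight triple layer (the ontology mapping) is shared, and queries resolve against that layer before the data fabric fetches the underlying records.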
so that's what I'm thinking about as far as how we integrate these disparate data sources: we use ontologies to tie them all together. Sorry, just to build on what my colleague from CDC is saying, we're probably very closely aligned, and that's not surprising. I think the world of data science is moving in this way, and with 27 different institutes, centers, and offices and many, many different programs, we see this problem repeat itself quite a bit. And so the work that we're doing right now is, of course, to create federated data infrastructures, exactly as you're saying, where we're looking to integrate data across different resources like All of Us and, outside of our programs, the NHANES study. We've been looking at harmonizing and integrating data models, and the semantic capabilities are certainly something that we have been working on as well, harmonizing common data elements and ontologies. There's a lot of really good work; Chris Mungall, for example, I think he's at Berkeley, has been doing some really good work on large language models and ontologies, and I can tell you that is a fast-moving and exciting field. Another thing we're thinking of is investing in large language models for data harmonization and integration, looking at ways in which we can integrate very many data models together. And so I think you're going to see a lot of really good research in this area, rather than the way we have done it in the past, which is brute-force harmonization through people skills, which is incredibly slow and painful and expensive. I think we can utilize a lot more modern technologies and algorithms to speed that up considerably. We have a wealth of questions, and I'm afraid we're not going to get to everyone, but I think we can sneak in one more, and this is again for Dr.
Gregurick: do you have recommendations on how exposure information could be most optimally captured in medical records for later AI analysis? Oh my, I'm so excited you asked that question, because we're updating our strategic plan for data science to exactly call this out as something that is needed: we absolutely need to integrate exposure, and what we're thinking of as environmental determinants of health, into our electronic health records, at least on the research side. EHRs are designed mostly for payers, insurance companies, as well as for healthcare provider communication. We are trying to leverage those, and we are having a big push; we're talking with the Office of the National Coordinator about integrating social determinants of health into EHRs, as well as environmental determinants of health. So this is something we're calling out. I can say we haven't solved this problem, but I'd love to have the community work with us on this journey, because this is incredibly important. So thank you very much for bringing that topic up. All right, so we are out of time. I want to thank all of our panelists one more time. We are very grateful to have had you as part of the workshop, and I hope you will stay for the rest of it. We have one more session, but I believe it's after a short break, so make sure we are heading right into the next session. Okay, so stick around for the next session. I guess I'll go on and get us started. Thank you so much. We'll make a quick transition here. So hello, I'm Alison Motsinger-Reif. I'm Chief of the Biostatistics and Computational Biology Branch at NIEHS. I've really enjoyed the opportunity to listen in on the talks and panels so far, and I'm equally excited to hear from our upcoming speakers in Session 5. This is really a group that knows a lot about both technologies for collecting data and how to implement those technologies in a variety of settings, which I'm really excited to hear more about. So I think it really brings a number of
things together here. Our first speaker is Dr. Lorenzo Henkla from the Department of Defense. He does a lot of work on wearable technologies in the program executive office for chemical, biological, radiological and nuclear defense diagnostics, and so I'm really excited to hear what he has to say. Dr. Henkla, are you with us? Again, as we've done all along, let's try to hold back questions until all the speakers have completed their talks, and then we can have some time for Q&A. Okay, yes, this is Lorenzo, just doing a mic check; can you hear me? We can't see anything yet. Give me a second to try to share my screen; of course, it was there for just a moment. How is that, can you see my screen? Nope, it still says it's starting to share the screen. Double click to enter full screen mode. There, we can see it. We see the PowerPoint; it's not in presentation mode, but we do see the PowerPoint. Perfect, all right, good deal. First off, good morning, everyone. I appreciate the opportunity to talk about some of the stuff that we within the Department of Defense, and specifically within the DOD Chemical and Biological Defense Program, are working on within AI and ML, and more specifically with monitoring warfighter human health readiness and performance. I have a handful of charts here over the next five, maybe ten minutes, just to kind of walk through and give you an idea of some of the stuff that we are working on, and some of our challenges at the back end. So, as I mentioned, I am with the Joint Program Executive Office for Chemical, Biological, Radiological and Nuclear Defense. It is a mouthful, but basically we are part of an enterprise whose mission space is protecting people, protecting warfighters, from hazardous threats that are present in, or possibly placed in, the environment by somebody wanting to do something bad. We all just lived through, or depending on your perspective, are living through, the COVID-19 pandemic. The pandemic has formally closed out, but we have small kids, and it seems like
the illness thing is just non-stop now. But that was just one primer, very much present in everyone's mind and in everyone's life, on one aspect of our mission space, the bio threat, and how it can absolutely affect everything you do: affect readiness, affect everything that we are able to do, both as people, as well as the military's ability to be prepared and to respond to whatever mission may arise. There is also the part of our mission space that deals with hazardous chemicals in the environment, and being able to monitor and protect people from those things. So in terms of what we actually do: we are in what the DOD calls the advanced development space, so we are responsible, for the joint services, the Marine Corps, Navy, Army, and Air Force, as well as our special operators, for building equipment to protect them from CBRN threats. We basically have two portfolio elements: there is a medical portfolio and what we call a non-medical portfolio. Medical involves medical equipment, prophylaxis, vaccines, those types of medical things. Non-medical are suits, boots, gloves, masks, sensors, and other things that look at threats in the environment. Wearable sensors, for us, are very interesting, and they kind of span the medical and non-medical. This was an area that actually accelerated very much for us as part of the COVID-19 pandemic, where I feel like I was kind of a background person in the COVID-19 response that began in 2020. I think it's fairly well known that the DOD helped a lot with that response. Our organization, the JPEO, as well as the Chem Bio Defense Program, we were front and center doing a lot. Through our partnerships and other things, we helped develop some of the initial technology that actually went into the vaccines and other things. One of the interesting spaces that came out of the COVID-19 pandemic was an investment in wearables. We were looking at wearable sensors, fitness trackers, Apple Watches, those
types of devices that are worn on people, and pre-pandemic that was to try to predict when someone has been exposed to some kind of hazardous chemical or when they may be getting sick, and that sickness was more focused on potentially being exposed to some kind of biological threat. When the pandemic happened, we very quickly pivoted away from biological warfare agents to COVID-19, and what we found is that's an additional way for us to really sense, and it can really start to fundamentally change how we protect our warfighters from the CBRN (chem, bio, rad, nuke) threats that are out there. It's an extra layer, an extra tool in our toolkit. We have lots of sensors that we can deploy all across some area of operation looking for threats in the environment; they're somewhat expensive, and they work very, very well. Now this is an extra layer, an extra tool: we could put a $500 smartwatch on someone, and provided we do all of our homework and get the network equipment and other things, the technical back end, set up, we can now monitor each individual person and have that feed into some larger command and control system to monitor for an emergent threat. So it's very, very interesting for us within the chem-bio defense program.
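The individual-monitoring idea described above, a worn device streaming a physiological signal into an algorithm tuned to flag something abnormal, can be sketched with a rolling z-score detector. This is a generic stand-in, not the DOD's actual algorithms; the window size, threshold, and heart-rate values are illustrative assumptions:

```python
from collections import deque
from statistics import mean, stdev

def make_detector(window=30, z_thresh=3.0):
    """Streaming anomaly flag on one physiological channel,
    e.g. heart rate in beats per minute."""
    history = deque(maxlen=window)

    def update(value):
        anomalous = False
        if len(history) >= 5:  # need a personal baseline first
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > z_thresh:
                anomalous = True
        history.append(value)
        return anomalous

    return update

detect = make_detector()
readings = [72, 74, 71, 73, 72, 75, 73, 72, 140]  # sudden spike at the end
flags = [detect(v) for v in readings]
print(flags[-1])  # the spike is flagged
```

In a deployed pipeline, this per-person algorithm would run on the watch, phone, or a local server, and only the flags (plus minimal context) would move over whatever network back end is available, which is exactly the data-movement challenge discussed next.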
Now if you take a step back and look from a technical lens: when we say wearables, we're really focused on physiological monitoring. Basically, that is a device of some kind that a person wears, either on or maybe very slightly penetrating their skin, and it feeds data to an algorithm. That algorithm is finely tuned for some threat, some thing that it's looking for, and it then passes information via some kind of network architecture so that someone can make a decision. This is all about influencing and improving performance to help the lives of our warfighters. This is more or less the general flow: devices to algorithms, and then to a decision maker via some kind of architecture. When you piece all of those together, you get a capability. We call this our sunshine chart, the wheel of capability of investments across the DOD, in partnership with academia and industry, on things to reduce cost to the DOD, decrease risk to our force and risk to mission, as well as increase our readiness in general and save lives. You have the Chem Bio Defense Program's kind of primary mission up there in the top left, some of these shaded red, but there are other investments by our partners across the DOD looking at heat strain, which is a huge problem within the DOD, and other things like human performance, cognitive performance, and tracking when someone has become a casualty. All these different things run on basically the same model: you've got to get a device on a person, feed it to an algorithm, the algorithm will be finely tuned to one of these things on this wheel, and then you move the data and the information via some kind of architecture so somebody can make a decision. One of our problems and challenge points, and something that my program very much is focused on trying to address, is how we can move that data off the skin into some place where these
algorithms are hosted how can we get the data from those algorithms to some kind of command center of sorts so that people can have insight at the individual level and they can also have insight in enterprise population level that is really really really challenging for the DOD and it makes sense if you think about our mission set we have people spread all over the place people are under the sea, on the sea in the air, in the US in all these different locations that are literally all over the globe and sometimes people have access to say a cell phone and AT&T wireless and they can use that old technical back end to move the data other times they can't so it's really really challenging to get the data off the skin past it to an algorithm that could be deployed say on a watch, on a device could be on someone's phone could be maybe on a local server or local laptop that's maybe like on a ship on an installation so we take that into account we have to take all of these into account because we have people spread across everywhere and what we saw with COVID is the bio threat in particular it doesn't really matter where you are the bio threat can spread very quickly and it can affect someone on a ship maybe worse than someone that may be deployed in Hawaii in a training event so it really is a challenge for us and it's also an opportunity because it will buy us time if we can figure out how to make all the technical back end how that we can coordinate together all these different algorithms I think I heard someone during the previous question and answer mentioned something about AI ML platform like an analytics engine to run all these different algorithms on we can figure out all of that it is an excellent opportunity to monitor for when someone's been exposed to can buy a new hazard or other chemical phase hazard burn pits for those that are tracking that's a big deal there's a lot of things that our war fighters get exposed to that wearables can help counter it's very 
very exciting it's a huge opportunity for us to improve the overall general health and readiness of our war fighters but it also comes with a lot of challenges that we are trying to work through so that I can pause here and just see if there are any questions or if we want to hold them off to the end of the entire session but that's kind of at a very very very high level few minute talk on within the defense department one of the areas that we're using AI ML for and more specifically AI ML has applied to human physiological monitoring or like I said just wait to the talent of this there's something quick we could do it now but otherwise we'll hold back for the discussion thank you that was a fascinating talk really highlighting all the different really extreme exposures that our servicemen and women are exposed to sounds like a really exciting and hopefully fruitful area of application for AI yeah we can discuss it more at the end of the session thank you so much our next speaker is Dr. 
Akane Sano. She is an assistant professor at Rice University in the departments of Electrical and Computer Engineering, Computer Science, and Bioengineering. She directs the Computational Wellbeing Group and is a member of the Rice Digital Health Initiative. Her research includes data science, machine learning, and human-centered intelligent systems for health and wellbeing, and spans the fields of affective computing, ubiquitous and wearable computing, and biobehavioral sensing, analysis, and modeling. She has been developing tools, algorithms, and systems to measure, forecast, understand, and improve health and wellbeing using multimodal data from mobile and wearable devices in daily-life settings and in clinical assessments. Recent awards include the NSF CAREER award, the best paper of IEEE Transactions on Affective Computing in 2021, and a best paper award at IEEE. She received her bachelor's and master's degrees from Keio University in Japan and her PhD from MIT here in the United States. Welcome; I'm very excited to hear your talk. Thank you very much, and thank you for the invitation. Today I'm going to talk about our research on multimodal machine learning and human-centered computing for health and wellbeing. Yesterday and today we have already discussed a lot about how to integrate multimodal data from humans and the environment for measuring and improving environmental and human health, and today I'd like to talk more about how we can design this kind of personalized feedback system by combining three components: sensing, which measures different multimodal data; interpretation, where we design new biobehavioral markers and build health and wellbeing prediction models, and where we ideally also want to understand what is causing the outcomes; and, not least, connecting those predictions to feedback, hopefully actionable feedback, for interventions, treatment plans, or information that supports decision making for users, including patients and people at higher risk for health conditions. Some of our studies target both patient and non-patient populations; for example, our studies span multiple clinical domains such as neurology, psychiatry, and oncology. In addition to patients, we have also been designing technology to support people who are at higher risk for health conditions. For example, in one study we are currently running a clinical trial to evaluate a personalized sleep and wellbeing assistant for shift workers, including doctors and nurses. They wear sensors that measure their physical activity, sleep, and heart rate, and in this study we deployed machine learning to provide wellbeing and burnout predictions to the shift workers. We also have medical doctors in this system who review the shift workers' data and provide suggestions to improve their sleep and health based on cognitive behavioral therapy for insomnia, and we deployed additional machine learning models to help the doctors review the data and make suggestions. We are currently evaluating the effectiveness of this kind of system and how the users, shift workers as well as medical doctors, actually use it. When we design this kind of system, we have a lot of challenges to solve at many different stages, for example in data collection, in modeling, in how to design feedback, and in how to deploy these models in real life. Today I want to focus on three topics: first, how we can design fair and equitable systems and machine learning models for diverse groups of people; second, how we can build robust, interpretable models with limited data and labels; and lastly, how we can deploy models developed with multimodal input into the wild with fewer modalities. So the first problem I want to talk about is how we can engage
users and participants in this kind of system while still diversifying data collection. We want to think about when to sample data and labels, and when and how to provide feedback to users. We can use a lot of multimodal data to predict the right moment to sample data or to deliver an intervention. However, the issue is that participant and user receptivity is also influenced by their contextual state: if participants are experiencing some issue, for example a health issue, they might not respond. That means if we collect data only when people are likely to respond, our data might be biased. A potential remedy is to diversify sampling by taking participant context into account when sending, for example, surveys, ecological momentary assessments, or interventions. Another issue related to equity is that an algorithm might make skewed decisions for particular groups of people, and this bias can come from different sources, for example data collection, data labeling, and model training. How can we design generalizable bias-mitigation techniques, for example by tuning models or generating data, so that our models work accurately and equally well for different groups of people? For that we have been testing a bias-mitigation framework based on multi-task learning and Monte Carlo dropout. I will describe it only briefly. We design a neural network that produces two outputs: the prediction labels and the protected labels, which include attributes such as gender or ethnicity. We then manipulate the weights of the neural network using Monte Carlo dropout so that we can control the uncertainty of the protected labels: we want to increase the uncertainty of the protected labels while preserving the performance on the prediction labels, so that we minimize the influence of the protected attributes on the predictions.
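The Monte Carlo dropout mechanism at the core of this scheme can be pictured with a small, pure-Python sketch. The two-head training loop itself is omitted here, and the `toy_head` model, dropout rate, and inputs are illustrative assumptions, not the actual network from the talk.

```python
import random

def mc_dropout_samples(head, x, n_samples=200, seed=0):
    """Monte Carlo dropout: keep dropout active at inference time and
    run many stochastic forward passes.  The spread of the sampled
    outputs estimates the head's uncertainty; the bias-mitigation
    scheme above trains the network to make this uncertainty high for
    the protected-attribute head while keeping the prediction head
    accurate."""
    rng = random.Random(seed)
    return [head(x, rng) for _ in range(n_samples)]

def toy_head(x, rng, p_drop=0.5):
    # Hypothetical one-layer "head": each input feature is dropped
    # with probability p_drop, then the survivors are averaged.
    kept = [xi for xi in x if rng.random() >= p_drop]
    return sum(kept) / max(len(kept), 1)

def variance(samples):
    mu = sum(samples) / len(samples)
    return sum((s - mu) ** 2 for s in samples) / len(samples)

samples = mc_dropout_samples(toy_head, [0.2, 0.9, 0.5])
# Dropout makes repeated passes disagree, so the variance is nonzero;
# training would push this variance up for the protected labels.
print(variance(samples) > 0)
```

The design intuition: if many stochastic passes cannot agree on the protected attribute, the network's shared representation carries little information about it, while the main prediction head is explicitly trained to stay accurate.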
We evaluated this framework on several different datasets, and so far our results show that we can improve the fairness metrics of our algorithms while preserving performance. The next problem I want to talk about is limited labels and data. Sensors can collect large amounts of continuous data, but it is very expensive to get user input, annotations, or labels. That means we have a lot of unlabeled data, and only a portion of the data is labeled. So how can we train robust models with a smaller amount of labeled data? There are many potential ways to go; one is to leverage the unlabeled data to design more robust and, hopefully, interpretable models. Contrastive learning is one technique for training a model on unlabeled data by contrasting samples against each other. However, contrastive learning relies heavily on data augmentation, which means we need to tweak and tune many augmentation parameters to learn robust representations. So our research question was: can we learn augmentation policies automatically, without manual tuning? We introduced LEAPS, a lightweight module for learning data augmentation automatically, which uses differentiable data augmentation and an adversarial training framework, and our experiments show that LEAPS can effectively and efficiently select augmentation parameters for robust training. The last thing I want to talk about is how we can integrate multimodal input into prediction models and systems and then deploy such models in real life. Many applications might want to use multiple sensors for better prediction performance; however, when deploying in real life we want to reduce the number of sensors, to reduce device size, cost, energy consumption, user burden, privacy issues, et cetera. So how can we bring a model designed with many sensor inputs to an application in the wild with fewer inputs? There are many ways to go here as well, but one thing we have been trying is a framework we call more-to-less, which allows us to effectively fuse information from multiple sensor modalities and to transfer knowledge from strong modalities to weak ones, so that when we test the model in the wild we can reduce the number of input sensors while preserving performance. Today I talked about only some of the challenges we have been encountering; we also have other challenges to overcome in designing these kinds of applications and systems. I'd like to thank my team members at Rice University, our collaborators, and our funding agencies. Thank you so much, that was a fantastic talk. To keep on time we'll move on to the next speaker; like I said, hopefully we'll have time for Q&A at the end of the session, and I'll ask all the speakers to stay on for questions. Our next speaker is Dr. Gengchen Mai, an assistant professor in the Department of Geography at the University of Georgia. Dr. Mai is interested in machine learning, deep learning, geographic information science, geographic question answering, NLP, geographic information retrieval, knowledge graphs, and Semantic Web applications. Currently his research focuses on geographic question answering and spatially explicit machine learning models. Dr.
Mai is also an affiliated professor and graduate program faculty member in computer science at the University of Georgia and the University of Georgia Institute for Artificial Intelligence, a member of UGA's environmental artificial intelligence faculty cluster, and affiliated with the Institute for Integrative Precision Agriculture. He received a PhD in cartography and geographic information science from the University of California, Santa Barbara, where he was a graduate student researcher at the Space and Time for Knowledge Organization (STKO) Lab and the UCSB Center for Spatial Studies. We're so excited to hear what you've got to share with us today. Thanks for the introduction; it's my great pleasure to be here to share our recent work on foundation models for geospatial and health tasks. As a geospatial researcher, my perspective will be from geospatial artificial intelligence. My talk will be in three parts: first, how we can use foundation models on geospatial tasks, what the unique challenges are, and whether we can do this in an automatic manner. Recent trends in machine learning and AI speak to the power of scale and generalizability: instead of developing task-specific models, we are interested in developing foundation models, which are large, task-agnostic pretrained models that can be adapted via fine-tuning or few-shot learning to a wide range of domains. Good examples are OpenAI's GPT-3 and DALL-E 2. In fact, foundation models have been developed in many domains. In natural language processing we have heard a lot about them: Stanford's Alpaca is one of the open-source large language models, and ChatGPT and GPT-4 are widely regarded as the state-of-the-art large language models. In computer vision, Google's Imagen, Stable Diffusion, and DALL-E 2 are important diffusion-based vision foundation models, and Meta's Segment Anything Model is one of the important segmentation foundation models. In reinforcement learning, DeepMind's Gato is a good example, and in speech and signal processing, OpenAI has combined Whisper and GPT to achieve many speech processing tasks. As a geospatial data science researcher, my question is: how do existing cutting-edge foundation models perform when compared with state-of-the-art, fully supervised, task-specific models on various geospatial tasks? In our recent work we tested four different domains: geospatial semantics, urban geography, remote sensing, and health geography. It's interesting, because in our field we call it health geography rather than environmental health. Because of the time limit, I will discuss just two applications: geospatial semantics and health geography. First, the applicability of foundation models to geospatial semantics. We investigated the performance of different large language models on some well-established geospatial semantic tasks: toponym recognition, which is recognizing place names at scale from text, and location description recognition. Basically, we give ChatGPT eight few-shot examples and let it generate text that we treat as the recognized place names, highlighted in yellow. On the right is a similar approach for location description recognition, but that task is more challenging because it involves recognizing not well-known place names but fine-grained location descriptions, such as home addresses, from tweet data. We tested on three well-established datasets and experimented with different GPT models, including GPT-2, GPT-3, and InstructGPT. To benchmark, we compared against 15 different baselines from the literature, especially state-of-the-art fully supervised task-specific models such as NeuroTPR. We found that on the toponym recognition task, foundation models such as InstructGPT can consistently outperform the fully supervised models with only eight few-shot examples, and for location description recognition GPT-3 can achieve the best results. This may seem unrelated to health, but you will immediately see how it applies to public health research. Another application is a health geography task: using a large language model to do time-series forecasting on US county-level dementia records. Basically, we give the model the historical dementia records for a specific county and ask it to predict the dementia record for the next year. We found that, without any training data or task-specific training, InstructGPT and GPT-3 are able to outperform fully supervised ARIMA models. This visualization maps the prediction errors: blue indicates underestimation and red indicates overestimation. You can see that the GPT-2 models significantly underestimate the dementia records, whereas InstructGPT and GPT-3 provide more balanced, reasonable predictions. Next I will discuss some unique challenges in applying foundation models to these kinds of tasks. One shortcoming is that, by design, these foundation models are unable to handle geographic coordinates. For example, if you ask ChatGPT not only to recognize a place name but also to predict its geographic coordinates, then even when it recognizes the place name very well, its predicted coordinates can be hundreds of miles from the ground truth, as you can see on the map. Why? Because by design these models cannot handle geospatial vector data such as points, polylines, and polygons, and they cannot perform spatial reasoning in a way that is grounded in the real world. Another unique aspect is the many data modalities in geospatial data. We see a lot of environmental health researchers using geospatial data, but geospatial data itself is multimodal: we have geospatial vector data, remote sensing images, street view images, geo-tagged text data, and geographic knowledge graphs, so this
really calls for a multimodal approach. In our recent vision paper we propose developing a multimodal foundation model for GeoAI that uses geospatial relations as the alignment among different data modalities; the advantage is that we can do knowledge transfer across data modalities. How do we achieve that? To tackle the first goal, making the model geo-aware, aware of geographic locations, we have developed a series of tools we call spatial representation learning. The idea is to represent spatial data such as points, polylines, and polygons in an embedding space so they can be used in deep neural networks. The other challenge, as I said, is multimodal training, and we took a first step by proposing a multimodal training objective specifically for geospatial tasks, which we call contrastive spatial pretraining. The basic idea is that we contrast the representation of a geographic location with a visual representation, or with a language representation, in a self-supervised manner. The advantage, again, is knowledge transfer across data modalities and making the model geographically aware. Lastly, I want to talk about our most recent work on making large language models operate in an autonomous manner to achieve a public health task: Alzheimer's disease infodemiology. We call it AD-AutoGPT. It is a GPT-4-based AI assistant that can conduct data collection, data processing, and data analysis of complex health narratives about Alzheimer's disease in a fully automatic manner given a user's textual input. This is a very high-level overview of what the model looks like: you just tell it the final goal, for example "could you help me learn something new about Alzheimer's disease and draw some plots?", and to do that, GPT-4 will first interpret this goal, divide it into several smaller tasks, and then it will identify the useful tools from the instruction library, solve each of the smaller tasks, and form a data-processing pipeline from them, with the whole process conducted automatically by the foundation model. Here are some results. AD-AutoGPT is able to automatically search Google for Alzheimer's disease-related news and save the articles to disk; it can then extract spatiotemporal information, doing the toponym recognition we discussed before, perform topic modeling on the news, and visualize the results. In the figure, panel A is the spatial distribution of all the places mentioned in Alzheimer's disease news over the past year; panel B is the monthly news count over the past year; panel C is LDA topic modeling of all the news articles; and panel D is a stream graph showing topic trends over the past year. That's all I'm going to say. We are also organizing a special issue on geospatial foundation models in our flagship journals in GIScience, and these are all the papers used in my presentation. Thank you for listening. Thank you so much, that was fantastic; I really learned a lot in this talk about new areas for GPT to expand into. We've got one final speaker in this session, so we'll move on to Dr.
Nicholas Skaff. He is an environmental public health fellow at the CDC, working with the Environmental Public Health Tracking Program. He applies principles of ecology, epidemiology, and data science to assess spatial and temporal patterns in human health issues arising from environmental factors. He is particularly passionate about developing machine learning models that can identify and forecast important health crises and creating compelling data visualizations to communicate findings on environmental concerns, and he collaborates with the Data Intensive Landscape Limnology Lab and the Global Lakes Ecological Observatory Network to better understand anthropogenic threats to fresh waters at national and global scales. Nick received his PhD from the Department of Fisheries and Wildlife at Michigan State University, where his dissertation research assessed the combined effects of extreme climate events and land cover on mosquito-borne disease transmission. Later, as a postdoc at the School of Public Health at UC Berkeley, he applied machine learning methods to understand the environmental drivers of West Nile virus transmission in California. Welcome, and I'm very excited to hear more about what you've got to share. We can see your slide, but I can't hear you. Is that working better? Now we can hear you. Okay, awesome. Give me one second. Of course. All right, thank you so much for having me here today. I'm going to give you a brief introduction to the Environmental Public Health Tracking Program, which is situated within the National Center for Environmental Health at CDC. I'm hoping that this overview of our program can seed some ideas about different AI applications, since we don't do a whole lot actively in that space right now, but it might be a potential area of growth for us in the future. At Tracking, our main aim is to connect environmental information and related health information in kind of a one-stop shop.
We make this data accessible to anyone, and we make it very easy to visualize, share, and download. This slide gives an overview of some of our data offerings. Starting on the left-hand side of the slide, we address a variety of environmental hazards including air quality, extreme heat, drinking water quality, and drought, among many others. We also look at human exposures, for example pesticide exposures. A key area for us is the health effects of different environmental exposures and hazards; these include asthma, cancer, heart disease, heat-related illnesses, childhood poisoning, and many others. And finally, we like to think about the characteristics of populations: what are the socioeconomic and demographic characteristics of different locations, what kinds of vulnerabilities do these populations have, and even things like the design of different communities, such as access to parks or proximity to highways. We collect these different kinds of data both as a standalone program at CDC and by funding 33 recipients, mostly state health departments but also local health departments, through a cooperative agreement that asks these jurisdictions to build and maintain their own tracking programs and environmental data networks. The goal of all of this is to grow public health capacity in these jurisdictions, to build expertise in environmental health surveillance, and, implicitly, to modernize data systems in these jurisdictions. To collect and maintain standardized data from these very disparate locations, the different tracking funding recipients, we've created something called NCDMs, or nationally consistent data and measures. These NCDMs help ensure that the data recipients release on their own, and the data they share with us, are consistent across all the different jurisdictions and can be harmonized with existing data on the CDC Tracking Network.
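The harmonization idea behind these nationally consistent data and measures can be pictured as a simple schema check at submission time. The field names and example values below are hypothetical illustrations, not the actual NCDM specification.

```python
# Hypothetical NCDM-style schema check: the field names and example
# records are illustrative assumptions, not the CDC specification.
REQUIRED_FIELDS = {"measure_id", "geo_fips", "year", "value"}

def validate_submission(records):
    """Split records into (accepted, rejected): a record is accepted
    only if it carries every required field, so submissions from
    different jurisdictions can be merged into one national dataset."""
    accepted, rejected = [], []
    for rec in records:
        (accepted if REQUIRED_FIELDS <= rec.keys() else rejected).append(rec)
    return accepted, rejected

ok, bad = validate_submission([
    {"measure_id": "asthma_ed", "geo_fips": "26049", "year": 2020, "value": 41.2},
    {"measure_id": "asthma_ed", "geo_fips": "26049", "value": 38.7},  # no year
])
print(len(ok), len(bad))  # -> 1 1
```

A real pipeline would also standardize units, geographies, and time aggregations, but the principle is the same: enforce one shared shape before merging.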
Some examples of the data our recipients produce, share, and provide to us include data on hospitalizations, such as hospitalizations for asthma; emergency department visits; reports of birth defects; community drinking water quality measurements; and radon testing, among others. So we have a lot of data, but our goals extend a lot further than just having data. We work really hard to deliver the data in ways that address the needs of different stakeholders and inform decision making at local, state, and national levels. Our flagship product is the interactive Data Explorer. At this point it has over 700 different environmental and health data measures. It offers a variety of data visualizations, including choropleth maps that you can overlay with other maps and show side by side with gridded data products; we also have charts and tables that can be visualized on the platform. The platform also lets you download the data for further analysis or export maps, and an exciting new feature produces HTML code that allows you to embed any map, chart, or table into your own website using very simple code. We also have dashboards that focus on specific topic areas; for example, one focuses on environmental justice. These dashboards differ from the interactive Data Explorer in that they focus on a specific topic and provide additional context that we can't show in the Data Explorer, such as text and infographics that help with data literacy and generally provide a better understanding of the data. All of our data can be accessed through a publicly accessible API that we produce, so the data are very easy to access for developers creating their own apps or websites, and also for epidemiologists or other stakeholders who might want raw data rather than a map that's already been produced.
To wrap this up, I want to provide some additional details about our API, because I think it would be really relevant to any AI applications that might emerge from our data. A bit of an overview: tracking data is organized in a tiered system, where we have measures that represent the actual data; indicators that house suites of measures addressing a similar topic; and finally broad content areas that group indicators together in the API. The API responds to several key functions that describe the data available, let the user understand what data is out there, and let them download it. For example, we have functions that return lists of the available content areas, indicators, and measures. We have functions that return lists of available geographic boundaries and temporal aggregations, so you can query whether a data product is available at the county level, the census tract level, or the state level, and whether it's available daily, monthly, or annually, that kind of thing. You can also use functions to return lists of the places and times where data is available, so you can ask whether data is available in a particular county or during a particular year: is this data available in 2020? Finally, the keystone function that actually retrieves the data is called GetFullCoreHolder, and it can take all the information queried from the functions above to retrieve the data of interest; for example, we can request data from a particular content area in the year 2020 and pull it down. And then finally, we have a software package for the R programming language called EPHTrackR that serves as a wrapper for the API.
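As a sketch of the tiered query flow just described, the snippet below composes API URLs. The base URL, endpoint names, and parameter values are taken from the talk or assumed for illustration; consult the official Tracking Network API documentation for the real paths and parameters.

```python
# Sketch of the tiered discovery-then-retrieval flow described above.
# The base URL, endpoint names, and numeric parameters are assumptions
# for illustration, not verified against the live API.
BASE = "https://ephtracking.cdc.gov/apigateway/api/v1"

def build_url(endpoint, *parts):
    """Compose an API URL from an endpoint name and path parameters."""
    return "/".join([BASE, endpoint, *map(str, parts)])

# 1. Discovery: which content areas (and, below them, indicators and
#    measures) are available?
content_areas_url = build_url("contentareas", "json")

# 2. Retrieval: the keystone data call, with hypothetical measure,
#    geography-type, and year arguments.
data_url = build_url("GetFullCoreHolder", 296, 1, 2020, 2020)

print(content_areas_url)
print(data_url)
```

The point is the workflow, not the exact endpoints: query the catalog functions first to learn what identifiers exist, then feed those identifiers into the data-retrieval call.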
This makes API calls trivially easy for our users and facilitates a bunch of additional applications, such as statistics, mapping, and creating dashboards like our Shiny dashboards, along with many other applications you can build in R. So I hope the API, and maybe the package, can be a springboard for a variety of different applications of our data. And with that, I'll conclude. Thank you. Thank you so much, that was great. We've got a few minutes now, before turning it over to the closing session, to take questions and have discussion. I'd invite our speakers to please put your cameras back on and be ready for questions, and if our speaker in the room could come near a mic, we'll open the floor to questions about anyone's talk. One thing I really enjoyed about this session was seeing, at a very practical level, some great translational uses: you each presented really important use cases and left me very optimistic about how close some of this technology is to actually helping. I think that's a great way to end the session, with something practical and optimistic. This is Megan, here in the room, and thank you all for great talks. Nicholas, good to see you. One of the things I've heard about Tracking is that the data are often delayed, and how great it would be to have real-time data. I'm wondering if AI might be able to help in that space. Do you have any thoughts about that, or is it something you really haven't thought much about? It absolutely is.
We are moving in that direction in terms of bringing in, not necessarily real-time data, but definitely low-latency data, in collaboration with some of our partners at NASA, NOAA, and EPA. We tap into gridded data coming from satellites or reanalysis products, process it in some of our newer cloud-based data systems, and present it in our interactive mapping portal, that kind of thing. We haven't done anything that specifically incorporates AI. We have a tangential familiarity with some of the other products mentioned in this session, like TowerScout, but at this point we haven't used that product or any other AI products. I think we're open to it; it's just about having the capacity and the funding, and our leadership emphasizing that it should be a priority area, which might be coming in the future. Since it's quiet, to follow up: real-time, or at least low-latency, data would be most interesting to me for prevention, again using AI to identify worrisome trends so there might be an opportunity to intervene. To come back to yesterday, I mentioned Flint, Michigan. If we'd had data such that we didn't have to wait until a pediatrician caught those high blood lead levels, maybe AI could have done that for us and we could have intervened sooner. It's a little quiet, and don't forget you can put your questions or comments in the chat as well. In the meantime, I'll take advantage of my moderator role to ask a few more questions. Across the areas you all talked about, what are the top research priorities? What do you see in your areas as the current limitations, and what are the top priorities needed to move the work forward? I'll ask that of all of you. This is Lauren Henkva from the APO, who gave the wearables talk. In my mind, one of the top research priorities for us is enterprise-level roll-up monitoring.
There was a mention of using AI to try to collect insight at the individual level. As I mentioned during our talk, we have challenges moving that data; that's not really a research problem, it's something that just affects the DOD. But once we do get that data moved into a centralized location, being able to pick out the needle in the haystack, monitoring and predicting when someone is getting sick before they actually show symptoms, and rolling that up at a large, enterprise population level, that is a hugely powerful tool. You could identify a pandemic, or some other bio-threat wreaking havoc in an area, and do it sooner. So from my perspective, one of the areas where we are lacking is that enterprise-level analysis: teasing out when we are actually entering a pandemic. When is this just a normal, run-of-the-mill flu outbreak, something to pay attention to but not pandemic mode, with everyone stocking up on groceries? When do we need to flip that switch? That's a very difficult thing, and I know there is research ongoing there, but I would be very interested in anyone who has insight or ideas on using wearable data, collecting heart rate and physiological data, to try to know when to flip that switch. Thank you.

Shirag, I see you've got your hand up, and then we've got a question in the chat too. Fantastic, yes, thanks for letting me ask a question about the great talk you gave, Dr. Mein. I think this is applicable to all of you: when you talk about foundation models, particularly in this dynamic environment in which you are processing these data, how are you thinking about updating this information so that it is timely for tasks that we cannot anticipate? Can you speak to that?
That's an excellent question. I have actually been thinking recently about the unique challenges of using foundation models for geography and health. If you try to use GPT-4 or ChatGPT, sometimes it will tell you, sorry, our model was trained only on information up to September 2021, and it cannot capture anything after that. But if you think about how much money and how many resources you need to retrain that kind of model, it is huge. And it's not only an economic impact; it also has an environmental impact, considering things like carbon consumption, and we recently have a paper on that. If you think specifically about geography and health issues, consider how much new data, like satellite data or street-view data, is collected: millions of records every day. In health you have new tracking data every day as well, so basically you would need to refresh the foundation model every day to keep pace, and if you think about the cost, it is unbelievable. In our paper we argue that just training a model and refreshing it over and over should not be the correct way; we need a new approach, for example a cheaper way to do it. Also, as we said in the previous sessions, we need to estimate the environmental impact of all these large language models. Some of the companies or countries that use these models a lot did not necessarily pay the environmental cost, because the models were trained in other countries, so we should also raise this as an environmental justice issue.

Rima, is your question a follow-up, or a different topic? I see your hand up. Sorry, Allison, it's somewhat of a follow-up, if you don't mind. Go for it, and then we'll go to the questions in the chat. So it's also for Dr.
Mei. I very much enjoyed your talk, thank you. I wonder if you can also talk to us a little bit about this: in health geography, say we're looking at trends across geography and across different aggregate boundaries in space. Do you see any challenges, or what do you think we still need to work on, to be able to use these amazing models you talked about for, say, individual-level health risk prediction? Do you feel like we're there, or do we need more work to get there? Thank you.

Thank you, that's an excellent question, and one I ask myself a lot. In geography we have a well-known problem called the modifiable areal unit problem, which means you will get different results when you run your analysis at different levels, for example the ZIP code level, the county level, or the state level. Sometimes when you analyze at the individual level you get a different result; a hypothesis may be supported or rejected simply based on how you partition the space, so this is really a challenge. When we meet this issue, there are several challenges. First is privacy: considering that foundation models are now openly accessible to all of us, we have actually seen papers showing that, for specific problems, you can attack a foundation model to extract sensitive information about users, like health records. How do we deal with that? For example, OpenAI, I think, is collaborating with some medical companies to build private foundation models for specific purposes. That is the privacy issue. Another issue, as I mentioned before, is that different spatial partitions give you different results. This is basically also related to the spatial heterogeneity of all the phenomena we have seen, for example in different countries and different states. However, all the AI models we have seen are universal models applied everywhere. For example, if you train a model in the US, ChatGPT can do a lot of
amazing things, but if it goes to another country that has limited data, it is very difficult for it to do the same, so we are also working on that. I think another speaker last week talked about spatial heterogeneity. This is a pressing issue, what we call fairness across space. Those are, I think, the two most important things. Thank you.

I'll share two of the questions sent to me through the chat. In AI regulation, there appears to be little mention of how humans might interact incorrectly with AI outputs, as in the uncertainty in an AI-derived prediction. What might regulation in the framework of human-AI communication look like? It's a big question.

Yeah, I think that's a very good question. We always want to know the uncertainty, or the confidence level, of an output, because of course the model won't behave perfectly. We want to carefully evaluate when the model makes errors and why, and always try to output not just 90% performance or some outcome, but how confident the model is. Thank you.

As a follow-up: as AI-derived predictions enter the forecasting mainstream, how might the potential for model hallucination create a parallel misinformation pandemic, for example by augmenting downstream models with prediction data from upstream AI models?

Yeah, I think this is a great question. I also talked about data augmentation in my talk, which means we might augment the data themselves, so a model could generate completely fake data. How should we stop that kind of thing? We might want to build another model to verify whether the data streams are real or fake, or integrate that kind of verification into the system, but then there might be yet another model developed to cheat those fake-detection models. Thanks.

We've got one more question in the chat: how should AI establish linkages for preventive measures when health data
and environmental data are in different time scales? Health data for healthy people are discrete, while environmental data are continuous. So, a general question on your thoughts about how we're going to deal with heterogeneous data types, in both scale and format.

I can talk about that. I really liked the answer in the previous session from Dr. Joe about using a knowledge graph as the solution, because a knowledge graph is by design meant to integrate data from heterogeneous sources: from health, from the environment, from other resources. We have actually developed a knowledge graph called KnowWhereGraph that brings together data across domains, but we realized there are still a lot of challenges, like data availability and data format issues. Still, I think this is one of the more promising ways. Thank you.

So, we're at the end of our time for discussion. Again, I really enjoyed everyone's talks, and I appreciate hearing your thoughts and input in this discussion. I know we've got just a minute before we switch over to give Carmen time for closing remarks. Thank you. A big round of applause, everybody. Thanks so much.

That was a fantastic session, so thank you very much to those speakers and to Allison for pulling it together. Good afternoon, everyone, and thank you all for joining us for these two days of thought-provoking and energizing presentations on the potential for AI in environmental health. We started today with a broad overview of issues regarding governance of AI data and the infrastructure needed to make some of the opportunities we've heard about work. Dr.
Grigorek started the session by providing examples of the power of AI, particularly highlighting AlphaFold, which can accurately predict the structure of a protein from its amino acid sequence. She highlighted that NIH, in collaboration with corporate partners, has made over 200 petabytes of data available, although these data may not be in the appropriate form to use in AI tools. This brings up the importance of developing common data elements, and she noted that the NIH Office of Data Science Strategy is working across institutes to ensure that the data generated by the researchers they support are findable, shareable, and AI-ready. She also highlighted the AIM-AHEAD program, which has the goal of enhancing the participation and representation of researchers and communities currently underrepresented in the development of AI, improving capabilities and addressing health disparities.

Dr. Kalzada, from the CDC's Office of Public Health Data, Surveillance, and Technology, is bringing his experience as a technology leader to public health. He described the development of an improved public health data strategy at CDC, bringing together public health functions and data technology to improve public health responses. He noted that a big challenge faced by CDC is around the collection, sharing, and use of data, and his hope is that there could be machine-based solutions to address many of these challenges. He provided a number of examples in which CDC is using AI tools to improve risk prediction and response. He noted that to continue to build AI tools at CDC, there needs to be growth in the data science and AI workforce, along with infrastructure to create scalable solutions that can serve as a resource to state and local partners. Dr.
Haven then joined us from Data & Society to talk about governance of AI. She underlined the increased recognition that AI is a democracy issue and that misuse of AI could bring harms and negative consequences to specific individuals and groups of citizens, underscoring the need for good governance of these systems. Currently the EU is leading on governance with the EU AI Act, now in a draft version, which is built around a risk-based framework as its core assessment tool. So they are regulating the use of the technology, not the technology itself, and they classify as high risk uses that include harm to people's health and impacts on an individual's rights. In the US there is a blueprint for AI governance, which is a voluntary guideline for technological protection and privacy. While we have models for regulatory guidelines, we are still strides away from implemented legislation. Dr. Haven emphasized that there is promise in a future of improved understanding and implementation of AI for our greater society and environment, and that governance designs should be participatory and democratized, not solely led by major corporate players.

To end the session, Dr. Shazan Dorsey discussed the relationship between equity, human health, and environmental health, particularly in the Chesapeake Bay, highlighting the Chesapeake Bay Restoration Project. Novel technology has been utilized to measure and model environmental inputs and outputs, but now we can reimagine restoration through AI, with examples including real-time environmental modeling, identifying environmental vulnerabilities, managing complex data sources, streamlining communications, and more.

In our final session we highlighted tools and technologies that, with AI, could advance environmental health and biomedical science. The session started with Mr.
Henkele from the US Department of Defense, talking about new wearable technologies being used to protect US military personnel. These devices are fundamentally changing the tools in use, moving from environmental sensors to personalized smartwatches, most of which are focused on physiological monitoring. These monitors collect data and use an algorithm to interpret those data, which are then used to make decisions. Impressively, this has to happen for personnel in a variety of settings around the world, which poses its own challenges but also important opportunities to personalize responses.

Dr. Sano discussed multimodal machine learning for health and well-being through personalized healthcare systems that sense, interpret, and provide feedback to improve patient experiences. As an example, she talked about work with shift workers to evaluate their health, provide support, and give feedback to their healthcare providers, all to prevent burnout and improve well-being. She talked about the challenges of data collection: sustaining participant engagement, the need to understand the participant's context when asking for data and providing interventions, bias in the algorithms and training data, limited labels for data, which can impact algorithm training, and bringing multimodal data collection and modeling into real-life settings. She provided ideas for solutions to each of these through a variety of technological and analytical innovations. Dr.
Maya, a geospatial researcher, talked about the utility of foundation models for geospatial data. He uses these AI tools for a variety of applications, including recognizing place-specific information like place names and descriptors, and predicting health outcomes. He talked about some of the specific challenges faced in geospatial data analysis and the work being done to address them. He highlighted opportunities to use foundation models in chatbot form to create visualizations of health and disease information, including geographic distributions.

Finally, Dr. Schauff talked about the Environmental Public Health Tracking Program at CDC, which is bringing together environmental and health-effect data in one place. By providing the data, they are creating opportunities to develop dashboards or visualizations relevant to specific needs. The program also provides support for different partners to improve data systems, and it provides training opportunities so that those partners can improve their high-quality, consistent data, bring those data to their residents, and share standardized data with CDC. They also have resources for statistical packages and interfaces that could help individuals and investigators make use of the wealth of their data.

Overall, I think we've seen, building on what Dr.
Bakarelli mentioned during his introduction, that over the last two days not only is AI an incredible opportunity for environmental health, but it's actively being utilized and holds incredible potential. Of course, the continued expansion of AI in environmental health and biomedical research will not be without challenges and bumps in the road, but we are now at a point where we can proactively work to improve data and data standards, develop clear and just governance structures, work toward reducing biases, assure broad and equitable distribution, and train a diverse new workforce in these tools and technologies.

I want to thank you all for taking the time to join us and learn about these innovations and the potential for multimodal AI in environmental health. I want to take this opportunity to thank all of our speakers over the past two days, who really opened our eyes to the potential of AI, and also the planning committee for this workshop, who helped bring all of these great speakers together; thank you all very much for that. In addition, we have been supported by a great group of staff from the National Academies of Sciences, including Lily Lujacak, O'Shane Orr, Natalie Armstrong, Jessica Desmoy, and Elizabeth Boyle; thank you for providing the support needed. This has been great.

I think now we just want to tell you a little bit about the breakout sessions happening next, so can we bring up that slide on the breakout sessions? Great. We hope you will join us for these breakout sessions and informal networking, and make use of the links you've been sent. If you're online, you've been sent links by email to join those sessions. The goals for the breakout sessions are to provide an opportunity for participants to engage and exchange ideas, to have a forward-looking discussion based on what we heard and learned during the workshop, and to identify actionable steps that can be taken to move the needle. For those
breakout groups, we do have some instructions on how to structure the discussions. We'd ask that you introduce yourself, your institution, whom you might represent or where you're from, and your area of expertise. Then, if any terms or concepts have come up in the past two days that were unfamiliar to you, and maybe are still unfamiliar or that you don't totally understand, please talk about those; there may be someone in your breakout group who understands them better and can help address that misunderstanding, so I think this is a great time to learn even more. We also have a few prompts about questions we'd like to have discussed in each of the groups, and each group will be assigned a theme around which to discuss some of them: thinking again about challenges and barriers and how they can be overcome, what steps should be taken, whether there are tools to better integrate AI and EH in biomedical research, what big key concepts and technologies you heard about in the past two days that you think will really advance the field, and what some of the next exciting advances could be. So we hope you take some time to join us in these breakouts; we think this is a great opportunity to keep moving this discussion forward. Again, thank you all for joining us, and have a great rest of your week.