My name is Dennis Barbour from Washington University in St. Louis, and I'd like to begin by laying out generally what the structure of the next two hours should look like. I'll spend a few minutes introducing the symposium itself, then give my presentation, and then transition to the other presenters. We don't have a time break in the middle, but we do have a demonstration that will run during the symposium. If you navigate to this location, and we'll put this link in the Zoom chat as well, there is an online data acquisition system that you can explore, and you can contribute your own data to this hearing screening test; you'll have multiple opportunities to navigate there. Again, there's no time break because we don't have the ability to adjust the starting times; we know there are other sessions going on synchronously, so we'll stick to the scheduled start times. But if you have a break, or if you have colleagues who aren't attending the session that you'd like to send the link to, it takes three to five minutes or so to see what it's like to do a web-based hearing screening test. At the very end we'll reserve 15 minutes or so to review the results we collected in real time during the symposium, and any remaining time we'll use for questions and answers across all of the talks. I also need to read an organizational paragraph from ARO: the organizers of the ARO Midwinter Meeting ask that all attendees adhere to our Code of Ethical Conduct. In particular, we would like to remind everyone that the creation of personal reproductions of any visual or audible aspects of the talks in the session is only allowed for the purpose of increasing comprehension by individuals who may have difficulty understanding the speaker during the initial presentation. There is also a companion webpage associated with the symposium at this link, and again, this will be put into the chat throughout the symposium. It has a little extra information about the content, as well as, below that content, a comment section where you can place questions of your own or add comments, and these comments will endure past the symposium. If you're inclined, you can also use the Q&A option of Zoom. The session itself came about because we understand, as researchers and clinicians, that the typical way of doing business today is to have an investigator or clinician working directly with a patient or research subject in the same physical location. Technology is emerging that allows you to break this physical link, and that's something we'll talk about in this symposium. But there are also additional capabilities: using off-the-shelf hardware, using advanced algorithms, drawing on experts outside the expertise of the experimenters or clinicians themselves. How we might go about exploiting those resources today, and what resources newly on the horizon might be available in the near future, are also subjects of this symposium. So our goal is to demystify how you might do this in your work today, and also what's coming in the near future. With that I will transition into my talk. Again, I'm Dennis Barbour from Washington University in St. Louis, and I'll be talking today about a framework of inference that I'm calling self-validating tests, specifically automated testing for precision hearing assessment. What I'll cover focuses on clinical scenarios of limited inference.
Several of us will be using the language of clinical investigation, but we believe that a lot of these ideas are general, well beyond clinical applications, and apply to research as well. I'll introduce what I mean by a self-validating test, clarify its advantages, and then give examples for perceptual estimation. So let's begin with a clinical example. In 2017 the American Heart Association changed the long-standing guidelines for hypertension. Before 2017 there were three categories: a normal range, a prehypertension range, and a hypertension range. For various reasons they eliminated an entire category in 2017, which sounds counter to the trends of precision medicine and big data, where we expect to find ever more refined estimates of quantities of interest. Again, there were several reasons why they did this, but they were all empirical. Here's an investigation leading up to that decision that I found interesting and illustrative of the point I'm trying to make: the third category that was eliminated was not predictive of the patient outcomes of interest. I found this particular study interesting because it was a meta-analysis of large numbers of study populations, nearly a million people, yet they still concluded that more data are needed. So their conclusion was that we need additional data to understand how predictive these measures are; the American Heart Association's conclusion was to eliminate the entire category because it's not predictive. I have an alternate proposal, and that is to use the data to refine our estimate of the underlying construct of interest, which we might call cardiovascular health, rather than relying on a single measure, a single biomarker, that may be measured once a year in a doctor's office. If we instead insert the actual construct of interest into our inference pathway, we're going to be able to make better predictions and do better by our patients. I break the limiting factors of this clinical scenario down into three categories. First, conflating a measurement with the underlying construct: a blood pressure measurement is not cardiovascular health, it's a biomarker. It's correlated, but not as highly as we'd like it to be. Second, relying on just that single predictor, even with large numbers of patients; as we can see, it's not as effective as it could be if we actually modeled the construct itself. And all of that is compounded by the heterogeneity in our clinical and research populations. So I'm introducing the concept of self-validating tests; it has familiar and maybe unfamiliar characteristics, and I'm arguing that self-validating tests can address all of these shortcomings. What is a self-validating test? It's one that results in what I'm calling a probabilistic model of immediate observations, or PMIO, and I use that language deliberately. The concept is similar to graphical models, or to within-subject or intrasubject variability models if you're familiar with those, but that doesn't capture the full extent of what I'm interested in conveying. For the blood pressure example I just used, a PMIO would be a model that predicts blood pressure at new observation points. The reason that's useful for the purposes of this symposium is that a predictive model allows us to quantify its predictive validity for new observations, where observations mean measurements and other non-measurement observations.
And we can do that immediately, while the patient is still in the clinic, while the research subject is still in the lab, or while you still have them on the other end of a website. New data in this framework can either be used to extend the model with additional training or to validate it, that is, to evaluate its quality, and we'll see examples of that as we proceed. I'll take each of these three limitations and use some analogies to try to improve our understanding. In terms of understanding constructs, I'm using an example from physics. We can't measure mass; there is no sensor that is a mass sensor. The construct we typically use to estimate mass is Newton's second law, in this case with a scale. We assume the gravitational field we're operating the scale in is the same one it was calibrated in, and then we just measure weight, a force, and we immediately have mass because the two are proportional. So one measurement, but there are assumptions about the calibration of the device and about the gravitational field. An alternate method that uses multiple measurements is a balance. It's more general; it doesn't require a specific gravitational field. Here we take calibrated weights and do a series of binary comparisons in an adaptive staircase until we converge on an estimate. And then there's a third way to evaluate mass that's probably not familiar: the inertial balance. Here I'm describing an inertial balance that takes elements from each of the two previous examples. We have a construct, again Newton's second law, but now instead of assuming the gravitational field or any acceleration, I'm measuring it: I have a sensor to measure force and a sensor to measure acceleration for this mass in motion. The measurements are still subject to calibration constraints, but it's easier to calibrate a sensor over its entire range on average than to establish a pointwise calibration curve at every single point of the input domain. So we still have the construct, only now we can accumulate data until we achieve a particular estimation quality metric and then terminate data collection at that point. What this kind of framework does, even though this is a simple construct that the other methods would work fine for in most cases, is make explicit that we're replacing assumptions with data. That's the main point of this example. In terms of over-reductionism: the trend in precision medicine these days is generally to take a more detailed focus on the disease and a more detailed, higher-resolution focus on the individual, and combine them to create a much richer picture of each individual. The problem with this scenario is that each of these particular measures is itself independently evaluated, and at a pretty low resolution. Here, for this breast cancer example, these are receptor and genetic profiles, but each is still a binary variable, receptor present or absent, so pretty low resolution and fairly independently considered. And here's our example from hypertension. So even though this method succeeds, and we'll see examples of success in adding biomarkers together to perform better inference, it has limitations that we'll come back to as well: the synergies between covariates are often lost when you use this type of methodology.
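To make the inertial-balance idea above concrete, here is a minimal Python sketch (simulated numbers, not data from the talk): estimate mass from repeated paired force and acceleration measurements via m = F/a, and keep collecting until the standard error of the estimate falls below a chosen tolerance, i.e., until a data-driven quality metric is met rather than an assumption.

```python
# Minimal sketch of the inertial-balance idea: estimate mass from paired
# force and acceleration measurements (Newton's second law, m = F / a) and
# keep collecting data until a precision target is met, instead of assuming
# a calibrated gravitational field. Numbers are simulated, not from the talk.
import numpy as np

rng = np.random.default_rng(0)
true_mass = 2.5                                      # kg, known only to the simulation

estimates = []
while True:
    accel = rng.uniform(0.5, 3.0)                    # measured acceleration (m/s^2)
    force = true_mass * accel + rng.normal(0, 0.05)  # measured force (N), sensor noise
    estimates.append(force / accel)                  # one mass estimate per observation
    if len(estimates) >= 5:
        sem = np.std(estimates, ddof=1) / np.sqrt(len(estimates))
        if sem < 0.01:                               # data-driven stopping rule
            break

print(f"mass = {np.mean(estimates):.3f} kg after {len(estimates)} measurements")
```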
And then for the last limitation, I have a thought experiment about biological heterogeneity. On the left I've constructed an example of a dose response, again using clinical terminology, but I think this is fairly general. As the dose increases, the response increases in the cohort averages. In the middle I've got an example where each person gets their own data point; this is an individual-differences kind of plot, and now I've regressed the behavior. Each person still gets only one dose and one measured response, but I have a population response across all these individuals that's consistent with what we found with the group dose. If we draw a conclusion from data such as these that a bigger dose means a bigger response, that's a reasonable conclusion; these are statistically significant trends. But the assumptions we're making are essentially that the individual model for each person, and these blue lines represent those predictive models for each person, is either the same as the cohort model or the same with an offset. There's an implicit assumption of homogeneity, or relative homogeneity, in the population when drawing a conclusion of that sort. In reality, these same population data, even the individual-differences data in the middle, can reflect wildly varying individual models, individual dose-response curves. In this example, on the left is the most heterogeneous case that could underlie these data, and on the right the most homogeneous. For the heterogeneous case, concluding that greater doses lead to greater responses probably would not be a great conclusion; on average it's true, but there are outliers. So the biological heterogeneity problem is only really a problem because we tend to use statistical inference methods that work well under reductionism and homogeneity, in fields like chemistry and physics. In fields of brain and behavior, which most of us are interested in, or biology more generally, reductionism and assumptions of relative homogeneity leave us wanting in terms of inference. So the concept of these probabilistic models that we can use immediately has some advantages. The self-validation piece I emphasize in the title of the talk; again, I think it's relevant as we look for ways to build more complex models to reflect more complex constructs, models that don't rely on assumptions but instead use data to infer quality. And there's a very interesting extension available: to allow for more ecological validity by testing more complex things, by bringing confounders into the model, and we'll see examples of that later as well. The big disadvantage of this kind of procedure, this kind of framework of inference, is that it's data heavy: replacing assumptions with data typically requires large amounts of data, and that has been a deal breaker most of the time. But there are two major points I want to make. First, this kind of framework handles more data better than traditional cohort-level analysis. And the point I'll make in the second half of this talk is that it is actually more efficient to use these methods than we have traditionally thought, by exploiting modern machine learning methods. This diagram represents the way I think about conventional inference: observations, meaning data, measurements, and other non-measurement observations, lead to a diagnostic class, which could be a cohort identity or label, and then we have a predicted outcome that we expect based on our inference so far.
The problem with this scenario is that each person, no matter how complex these diagnostic classes are, even in the era of precision medicine, still only generates one data point for training the model. So the best you can do if you've got a mislabeled person in the model, a false positive instead of a true negative for example, is to correct that with more data. These methods require wide data, a little data from a lot of people, to get better. Alternatively, in this example here, we take the same kinds of observations, and if we have mechanistic understanding of the disease or of relationships among variables, we can use it to refine our probabilistic models. These can make either predictions of future outcomes or predictions of new observations, which can then be used to validate the model, to evaluate its quality, or to update it immediately, without needing a long-term prospective trial. The value of this kind of framework is that if I have a lot of data from one person, deep data, or if I've got wide data, I can exploit either one, or both, to improve the modeling framework and the quality of our predictions. This is probably the most important point I'd like to make today: the major rationale for using these kinds of models is that wide data, meaning data from a lot of people, is inherently limited. There are only so many people who are alive, or who are sick with a particular disorder, and so on. But deep data from each person is not limited; it can grow essentially without bound. Here's an example I'd like to pull from critical care medicine, which has really started to adopt many of these ideas. This is a single instrument stream from one patient: time on this axis and heart rate on this axis. The data points in red are retrospective heart rates, the data points in black are prospective. The algorithm models the retrospective heart rates and tries to predict what the heart rate is going to be one hour in the future. As new data come along, the model gets updated. This is operating in real time, with only one data stream and only data from this person. And this is a probabilistic model too, so you can see an expected value and a range. The goal is to identify outlier events that might represent a step change in physiological state. Here that has happened: the purple data reflect heart rate measurements that are well outside the range predicted by the model. The algorithm then generates an alert and brings clinicians to the bedside to do a more complete evaluation. Again, this is something critical care medicine is thinking a lot about as a clinical decision support technology that brings in instrumented data streams and labels them, in this case just looking for outliers for this particular person. But you could bring in other data streams to improve the model, or, if you apply this framework to multiple individuals, you can find the time-series templates that reflect danger or the potential for an adverse event for this patient, and improve the model for everyone as a result. So it's an example of being able to pull in wide data as well as deep data.
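As an illustrative aside, here is a toy Python version of the general idea behind that kind of heart-rate monitor (my own sketch, not the clinical algorithm from the talk): fit a simple trend to a rolling window of recent samples, form an expected value plus an uncertainty band, and flag a new sample that falls outside the band. The window size and three-standard-deviation band are assumptions.

```python
# Toy one-step-ahead heart-rate monitor: fit a linear trend to a rolling window
# of recent samples, predict the next value with a band, and flag outliers.
import numpy as np

def check_sample(history_bpm, new_bpm, window=60, n_sd=3.0):
    recent = np.asarray(history_bpm[-window:], dtype=float)
    t = np.arange(len(recent))
    slope, intercept = np.polyfit(t, recent, 1)            # simple trend model
    predicted = slope * len(recent) + intercept            # one step ahead
    spread = np.std(recent - (slope * t + intercept), ddof=1)
    alert = abs(new_bpm - predicted) > n_sd * max(spread, 1.0)
    return predicted, spread, alert                        # expectation, band, flag

# Example: a sudden jump from roughly 70 bpm to 110 bpm triggers an alert.
print(check_sample(list(70 + np.random.randn(120)), 110.0))
```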
Okay, so that's enough of the clinical examples; I want to bring us to hearing and my thinking about how we can apply these principles today. For many behavioral questions of interest, we have a built-in construct that's well established and that reflects these PMIOs, and that is the psychometric curve: we know how to map task difficulty to task performance. There's a great deal of theory and decades of empirical results behind this. These psychometric curves represent the probability of getting the task correct as the task gets harder. They are not observable; they are latent constructs that we cannot measure, just like our example of mass in physics. We can only measure task performance, which is reflected in these black dots. The theory here is great, but the major limitation is that estimating one of these curves generally requires hundreds of samples and is traditionally impractical in most situations. What we perform instead, because it's so useful to have this information, are reduced estimation procedures. I'm showing here three different staircase designs that try to estimate the threshold, one point on the psychometric curve. So here's the psychometric curve; staircase designs make the task easier or harder depending on recent performance, and they can converge onto an estimate of the threshold value at some particular tolerance. It's a great method: staircases are robust and efficient. But I believe they leave something to be desired when we're thinking about expanding to more complex models, and I can illustrate that once we conceptualize a more complex model, in this case the audiogram. I've spent a lot of effort fleshing out these ideas on the audiogram because it is a more complex model, but only slightly more complex than a standard one-dimensional psychometric curve. Here we've got a staircase at one kilohertz; again, the audiogram measures tone detection threshold as a function of frequency, so it's got two input variables, frequency and sound level. Once I achieve this estimate at the end of the staircase and think about a staircase at another location, I just don't have many degrees of freedom to use the information I already have to improve that next estimate; I start with the same kind of staircase and the same termination criteria all over again. A way to conceptualize this a little more clearly is to look at the audiogram in its natural variables: frequency across this axis, sound level on this axis, and every frequency has a threshold, which you can see reflected in this plot. This is a screenshot of two different tests superimposed on each other; we'll see the movies that generated the screenshot on the next slide. If I deliver a tone at one kilohertz and get a response from this person, I know more about their hearing at one kilohertz than at any other frequency. I don't know much, but relatively speaking I know more at one kilohertz than anywhere else. So the last place I'd want to measure next is one kilohertz, yet a standard staircase design applied in this format does exactly that. Active machine learning procedures, by contrast, can be designed to optimize data acquisition across this entire space, and that makes a more efficient yet complete model. You can see this here: it's the same person's ear, a conventional audiogram on the left, a machine learning audiogram on the right, and we'll play these forward in lockstep. The machine learning audiogram on the right is roving these tones. It's fuzzy at first because I have uninformative prior beliefs; it's a Bayesian method. But as I acquire more data, I get a sharper boundary between where the subject can and cannot perform the task.
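To give a flavor of what an actively sampled audiogram like this can look like in code, here is a minimal Python sketch (my own illustration, not the actual estimator used in the talk): model the probability of detection over a frequency-by-level grid with a Gaussian process classifier and always probe the point the model is currently least certain about. The simulated listener, grid ranges, and kernel settings are all assumptions.

```python
# Minimal sketch of an actively sampled audiogram: model P(detect | frequency,
# level) over a 2-D grid and always probe the most ambiguous point.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

freqs = np.linspace(np.log2(250), np.log2(8000), 20)   # log-frequency axis
levels = np.linspace(-10, 80, 19)                       # stimulus level axis (dB)
grid = np.array([(f, l) for f in freqs for l in levels])

def simulated_listener(f_log2, level_db):
    # Hypothetical ground-truth ear whose threshold rises with frequency.
    threshold = 10 + 5 * (f_log2 - np.log2(250))
    return int(np.random.rand() < 1 / (1 + np.exp(-(level_db - threshold))))

# Seed with one near-certain miss and one near-certain hit so both classes exist.
X = [[np.log2(1000.0), -10.0], [np.log2(1000.0), 80.0]]
y = [simulated_listener(*X[0]), simulated_listener(*X[1])]

gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=[1.0, 10.0]))
for _ in range(40):                                     # active-learning loop
    gp.fit(np.array(X), np.array(y))
    p_detect = gp.predict_proba(grid)[:, 1]
    idx = int(np.argmin(np.abs(p_detect - 0.5)))        # most ambiguous tone
    f, l = grid[idx]
    X.append([f, l])
    y.append(simulated_listener(f, l))                  # deliver tone, record response
```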
A lot of times I'm quantifying the efficiency, but the point I really want to make here is that this is a PMIO. If, as a clinician or researcher, I'm not convinced about this slope or about this region, because there's not enough data there in my view, we can validate right on the spot: because this is a fully predictive model, for every possible task we could deliver in this case I have an estimate of what I expect this person would be able to do. Or we can acquire more data to extend the model. One of the reasons these PMIO frameworks are valuable is that they allow us to extend the models without compromising other features of the estimator. Here's an example of what we are calling the conjoint bilateral audiogram. We've now extended the model to include the other ear, so instead of a two-dimensional model it's now a four-dimensional model. We're only delivering sounds to one ear or the other, so we're not delivering sounds to two ears concurrently, but we are updating the model of both ears at each time point. You can see what this looks like here: each ear gets a tone or not, and the models of both ears are updated simultaneously. One value of doing this is shown here: across the population, the disjoint estimates, meaning sequential audiograms one ear at a time, consistently take longer to converge to their final estimates than the conjoint estimator. Again, the disjoint approach is two two-dimensional models run sequentially, but by judiciously exploiting shared variance among our input variables, we can rapidly converge by incorporating both ears into an admittedly more complex model, and we're able to achieve greater efficiencies. More importantly for the topic of this symposium, we can extend these models even further to include confounding variables. Here's an example from a cohort of individuals with one dead ear, six individuals I believe, where pure tones are delivered through a transducer over the dead ear and detection thresholds are evaluated. If you're detecting the sound in this configuration, then you're detecting it with the other ear, so this is measuring cross hearing, or the frequency-dependent interaural attenuation, for these two different transducers. There's variation even among these six people, and there's variation both across individuals and in how the transducers are seated, and so on. So if someone has fairly asymmetric hearing, this interaural attenuation, this cross hearing, is a potentially confounding factor. The conventional way of dealing with it is to add contralateral masking whenever there's a risk of cross hearing, that is, whenever the difference in thresholds between the two ears could be greater than about 40 dB. So we've extended our bilateral audiogram, adding higher dimensionality to include this cross-hearing, or interaural attenuation, piece, and we're measuring both ears simultaneously. This region in this particular person represents tones that, if they were delivered without contralateral masking, would be detected by the other ear because of cross hearing. And we're able to show that by dynamically masking in real time, we can eliminate that cross hearing, and with it that confounding variable.
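As a rough illustration of the kind of rule that dynamic masking replaces, here is a simplified textbook-style heuristic in Python (my own sketch, not the dynamic masking algorithm from the talk); the 40 dB interaural attenuation and 10 dB safety margin are assumed round numbers.

```python
# Simplified heuristic for when cross hearing is a risk and contralateral
# masking is needed (round numbers assumed for illustration).
def needs_masking(tone_level_db, nontest_ear_threshold_db, interaural_attenuation_db=40.0):
    # The tone crosses the head attenuated by roughly the interaural
    # attenuation; if what's left is still audible to the other ear, mask.
    return tone_level_db - interaural_attenuation_db >= nontest_ear_threshold_db

def masking_level(tone_level_db, interaural_attenuation_db=40.0, safety_margin_db=10.0):
    # Noise level for the non-test ear: enough to cover any cross-heard signal.
    return tone_level_db - interaural_attenuation_db + safety_margin_db
```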
Just to summarize the results of this particular study in this population: there's a range of hearing loss and a range of asymmetries. The average time across all the conventional audiograms, conducted by an audiology student, was around 14 minutes or so, and that includes the individuals who have clinically indicated masking. If you look at the average time over all of the machine learning audiometry procedures, including the high-asymmetry individuals, our acquisition time is only about six minutes, and we're hitting all of our target accuracy metrics as well. This is my last slide; I just want to finish on the note that we haven't sacrificed acquisition time by bringing more and more variables into the model. And I'm extremely optimistic that our goals generally, to leave the clinic or leave the lab and look for more ecologically valid stimuli, scenarios, and locations in which to interact with our patients and research participants, are practical. That's the main argument I want to make: it's practical to adopt modern machine learning methods, pull in more complex data streams, and still build predictive models of these individuals in reasonable amounts of time. If we are skeptical about the results we're getting with this kind of inferential framework, we'll be able to test in real time whether we got them within the tolerance we're looking for or not. And the last thing I'll mention in transitioning is that the data shown here were all collected from a website, so it's very straightforward to generalize out to remote data collection. I think that is a good segue to our next speaker: Josh McDermott from MIT, who will be speaking about why we shouldn't be afraid of online psychoacoustics. I'll remind everyone that if you have questions, feel free to place them in the comments section of the associated webpage, whose link we'll re-post in the chat.

I'd like to thank Dennis and Jan Willem for organizing this symposium and inviting me to speak. I'm sorry that we're not doing this in person; this is normally one of my favorite times of the year, but hopefully next year that's what we'll be doing. Today I'm going to talk about why, in the meantime, we shouldn't be afraid of online psychoacoustics. Most people are probably aware that over the past 10 to 15 years, crowdsourcing services have become available to connect online workers to jobs; the best-known example of this is Amazon Mechanical Turk. If you're looking for work you can register with one of these services, and if you've got jobs that you want people to do, you can register too: you post a job, the workers who want to do it sign up and do it, and then you pay them through the service. It's been used across a whole bunch of different industries, but it's also been adopted in many areas of psychology: if you've got an experiment that you want people to do, you can post it, they can do it, and you can pay them. It enables large samples that can be collected relatively quickly, so it has become very popular in a lot of areas of research, but it's less widely used in hearing research, I think due to understandable concerns about the lack of control over sound quality and maybe also over compliance with instructions. Certainly for some kinds of experiments this is probably not a great option, but there are also a lot of benefits to running experiments online, especially when in-person experiments are limited.
I think Dennis and Jan Willem asked me to speak in this symposium because our lab has been using online experiments pretty extensively for the last seven years. The overall conclusion we've drawn from this experience is that, for a pretty wide range of experiments, online data collection yields similar results to what you get in the lab, provided that you take some modest steps to ensure data quality. It also facilitates experiments that would be practically challenging to run in the lab. So in the short talk I'm going to give today, I'll first share some suggestions for obtaining good online data quality, and then give a couple of examples of how we've leveraged online experiments for our research, to get you excited about the possibilities. Our approach to maximizing data quality has three components. The first, which is very standard, is to restrict participation to workers that have high approval rates. If somebody does your job, you approve them afterward, saying that they did what they were supposed to do; so if you restrict your experiments to people that have approval rates of, say, 95%, that can be a good starting point. We also use a headphone check before the experiment to help ensure sound quality and basic compliance. And another thing we usually do is build in enough trials that a fraction of the data can be used to exclude poorly performing participants without double dipping. Most of us are used to working in nice, normal lab conditions: we've got our sound booths, we've got calibrated headphones, and we pride ourselves on controlling exactly what goes into the participant's ears. If you're going to run your experiment online, you're going to sacrifice some of that. What we might hope to attain is a situation like this, where you have a person who's listening attentively over headphones. What could happen, and what we would like to avoid, is a situation like this, where you've got a person who's straining to hear over laptop speakers, or maybe even this, where you don't actually have a person at all but a bot that's part of some fraud scheme. We thought one thing we could do to help avoid those bottom two scenarios is to use a test that would help confirm that we're dealing with a human who is wearing headphones. Kevin Woods was a PhD student in our lab who got interested in running online experiments, and he, along with a couple of other lab members, had the idea that they might be able to leverage antiphase tones for this purpose. These are tones that are 180 degrees out of phase in the left and right channels. The idea is that if you play such a tone over speakers, say from a laptop, the two channels will partially cancel in the air, reducing the amplitude of the sound. Kevin wanted to use this phenomenon to design an experiment that would be easy to do over headphones but not over laptop speakers. We came up with a pretty short six-trial experiment for this purpose. In every trial there are three intervals, each containing a 200 Hz pure tone. One of the tones is 6 dB softer than the other two, and the task of the participant on every trial is to say which of the three tones is quietest. And the sneaky thing that we do, without telling anybody, is that one of the other two tones is in antiphase.
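Here is a minimal Python sketch of one trial of this kind of headphone check (my own illustration, not the lab's released code): three 200 Hz stereo tones, one attenuated by 6 dB, one presented in antiphase across the two channels, in random order. The sample rate and duration are assumptions.

```python
# One trial of a headphone-check of this kind: three 200 Hz stereo tones, one
# 6 dB softer (the correct answer), one in antiphase across channels (which
# partially cancels over loudspeakers).
import numpy as np

def make_trial(fs=44100, dur=1.0, freq=200.0, attenuation_db=6.0):
    t = np.arange(int(fs * dur)) / fs
    tone = np.sin(2 * np.pi * freq * t)
    normal    = np.stack([tone,  tone], axis=1)           # in phase, full level
    antiphase = np.stack([tone, -tone], axis=1)           # 180 degrees out of phase
    quiet     = normal * 10 ** (-attenuation_db / 20.0)   # the target interval
    intervals = [("normal", normal), ("antiphase", antiphase), ("quiet", quiet)]
    np.random.shuffle(intervals)                          # random presentation order
    correct = [name for name, _ in intervals].index("quiet")
    return [sig for _, sig in intervals], correct         # stimuli plus answer key
```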
The notion is that if people are listening to the stimuli over speakers, then the antiphase tone will be the quietest one for the participant, and so they will systematically give the wrong answer, whereas if they're wearing headphones, the antiphase tone will sound pretty normal, maybe slightly funny due to binaural effects, and they'll be able to get the trial correct. That was the theory, and then Kevin ran some experiments in the lab to try to confirm that this would work as intended. He ran two groups of participants, one where we knew they were wearing headphones and the other where we knew they were listening over loudspeakers. This is a summary graph of those results: it plots the number of trials people got correct under the two listening conditions, and you can see that there's a big peak at six for the people who were wearing headphones. That means that when you're wearing headphones the task is really easy, which is what's intended. But you can also see that the people listening over loudspeakers tend to get zero or one trials correct, so they're systematically below chance. So under controlled conditions we're able to distinguish between these two groups of participants. What Kevin then did is run this experiment on a large number of people using Mechanical Turk. All of these participants were instructed to wear headphones; they had to click a button saying, yes, I'm wearing headphones. These are the results from 5,000 people. You can see there's a big peak in the histogram at six trials correct, which means there's a good chunk of people doing what they're supposed to do, but you can also see a second mode of the distribution down here at zero. And we think really the only explanation for this, given a whole bunch of control experiments that Kevin ran, is that these people are disregarding the instructions to wear headphones; that's why they're performing below chance. We have seen time and time again that under normal operating conditions about a third of participants fail the headphone check, so they get booted out and don't continue on to the main experiment that you actually want to run. Some of these people are clearly disregarding the instructions to wear headphones, others may simply not be paying attention, but in both cases it seems like a pretty good idea to screen them out. We've got code available to run this headphone check experiment, so you can use it if you want, and lots of people have done so successfully. The good news is that if you do this, we typically find that this brief pre-experiment on its own is often enough to give you results like what you would get in the lab. One example comes from the work of Malinda McPherson; she has a poster on Wednesday with some more data along these lines. Here, this graph shows a replication of tone-in-noise detection thresholds online, with a comparison to in-lab data. The thresholds were measured with an adaptive procedure, the total duration of the experiment was about an hour, and really the only screening procedure employed here is that headphone check; these are all the people who passed it. She measured detection thresholds for complex tones on the left and pure tones on the right; the dark blue and the black are online data.
And you can see that the mean performance is comparable online and in the lab. You can also see the distribution is a little wider online than in the lab, but overall it looks pretty similar, and that was encouraging. We've also managed to reproduce in-lab pitch discrimination thresholds. This is a result from a recent paper of Malinda's trying to characterize the representations that underlie pitch discrimination. The key to this study was to compare discrimination for normal harmonic tones and for inharmonic tones, the idea being that the inharmonic tones don't have a coherent fundamental frequency, so their discrimination must be based on something else. This graph plots discrimination thresholds for the harmonic and inharmonic tones, with and without a three-second delay separating the tones, and this is the result. The first thing I want to draw your attention to is that in the standard condition, where there isn't a delay between the tones, we're getting discrimination thresholds on par with what you would get from good participants in the lab: pitch discrimination thresholds are usually about a percent. But this experiment was actually pretty long and grueling relative to a lot of the stuff that we run: it lasted about two hours, there were a lot of trials, there were these conditions with delays between the tones, and you really had to pay attention. Over the course of two hours, the odds are that an online participant is going to occasionally have some distractions. And so we found that in order to match in-lab thresholds, we had to discard about two thirds of the participants. The key to making this work is that Malinda made four threshold measurements per condition and used one of those to determine whom to include in the analysis; that left her with three threshold measurements, which went into the graph here, so you get an unbiased measurement of the thresholds of the people you include. That is often a really valuable thing to build in: extra data that you can use to determine inclusion. The specific findings of the experiment are twofold. One, when there's not a delay between the tones, the thresholds for the two types of tones are comparable; but with a delay, performance is substantially better when the tones are harmonic, suggesting that there is some memory representation that's specific to harmonic tones. So we hypothesized that there might be two different representations people can rely on when doing pitch discrimination: a representation of the spectrum when there isn't a delay between tones, and a representation of the F0 when there is a delay. To substantiate this, Malinda took advantage of the fact that we had a pretty big sample size online, 164 people, which enables the analysis of individual differences. That's what's shown here: this graph plots the correlation between the thresholds on pairs of conditions across participants. The logic of individual differences is that if the correlation between two conditions is high, that's an indication that people are using similar representations in those two conditions. So we've got the harmonic and the inharmonic conditions that we just talked about.
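Here is a minimal Python sketch of the performance-based inclusion idea described a moment ago (hypothetical column names and an illustrative cutoff, not the study's actual values): one of the four threshold runs decides inclusion, and only the remaining runs go into the analysis.

```python
# Performance-based inclusion with independent data: one threshold run decides
# who is included, and only the remaining runs are analyzed.
import pandas as pd

def include_and_score(df, cutoff=3.0):
    # df columns: subject, condition, run (1-4), threshold_pct
    screen = df[df["run"] == 1].groupby("subject")["threshold_pct"].mean()
    keep = screen[screen <= cutoff].index                 # decided on run 1 only
    analysis = df[(df["run"] > 1) & (df["subject"].isin(keep))]
    return analysis.groupby(["subject", "condition"])["threshold_pct"].mean()
```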
In addition, Malinda included an interleaved harmonic condition. In this condition, the two tones being compared don't have any harmonics in common, so in order to compare them, the only real way to do the task is to extract the F0 and use the F0 to make that comparison. So we've got one condition where you can't use the F0, by hypothesis, and one where you would have to. These are the results. What she found is that the correlation between the harmonic and the inharmonic conditions is high when there is not a delay between sounds, suggesting that you're using the same representation in the two conditions, but it decreases when there's a time interval between the two sounds, whereas for the harmonic and the interleaved harmonic conditions you see the opposite: the correlation is low when there isn't a delay between sounds and becomes high when there is a delay. So we think this is implicating two different pitch representations depending on the time delay between sounds, a representation of the spectrum and a representation of the F0. That would have been hard to pull off in the lab just due to the number of participants. But you might look at this and say, all right, this looks a little too good to be true, and I would say, all right, then let's replicate it. That's another thing the online format really makes easy. We had another set of 200 participants do a very similar experiment, and you can see that you get a similar result. And importantly, instead of this taking a year of the experimenter's life, this was another week or two of data collection. Another thing that the large N you can get online enables is what we call one-shot experiments. These are experiments where you really want the participant to do the experiment, or a condition, only once, because once they've done it, they're spoiled in some way. In this case, the scientific question we were trying to answer is whether voice recognition depends in part on pitch. A natural way to investigate this is to run an experiment where you present voices that are familiar to your participants, shift the pitch of the voice, and measure the effect on recognition. Now, the challenges here are that the set of voices that most people can recognize is actually not that big, and once you recognize one of the voices during the experiment, there's a very high likelihood of priming on subsequent trials. The solution that Malinda came up with in this case was to run the experiment online to get a large sample size, and to use a design where each participant would hear each voice only once, at one of the many pitch shifts. Under this design, of course, you don't get that much data from every participant, so you really need a big sample size in order for this to be well powered. We can do that online; you can see we've got 250 participants. This graph plots percent correct as a function of the pitch shift, so zero is the natural pitch. And she got this beautiful result, which is that as the pitch deviates from the true pitch of the person's voice, the recognizability of the voice declines pretty rapidly. So it's pretty nice evidence that the fundamental frequency is a really important aspect of voice representations.
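Here is a minimal Python sketch of one way to set up that kind of one-shot assignment (voice labels, shift values, and the counterbalancing scheme are made up for illustration, not the study's design): every participant hears every voice exactly once, each at a single pitch shift, with shifts rotated across participants.

```python
# One-shot design sketch: each participant hears each familiar voice exactly
# once, each voice at a single pitch shift, with shifts rotated across people.
import random

voices = [f"voice_{i:02d}" for i in range(12)]
shifts = [-12, -8, -4, 0, 4, 8]          # pitch shifts in semitones (illustrative)

def assign(participant_index):
    order = voices[:]
    random.shuffle(order)                # each voice appears exactly once
    k = participant_index % len(shifts)  # rotate shifts across participants
    rotated = (shifts[k:] + shifts[:k]) * (len(voices) // len(shifts))
    return list(zip(order, rotated))     # (voice, shift) pairs for this participant
```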
One of the lessons we have learned from our online adventures is that the subject pool is large, but it's not that large; it's not as large as I had thought it might be when we were getting started. We think it's maybe a couple thousand people at any one time, given the various restrictions we place on participation. It refreshes every few months, so that's good, but if you're planning a very large-N study, you still need to plan pretty carefully. You can very easily find yourself in a situation where you run 2,000 participants, you realize you screwed something up about the experiment, and then you can't just immediately go and get another fresh 2,000 participants; we have found ourselves in that situation. Participant quality can also vary over time. There was a pretty notable increase in fraud over the summer, there were a lot of pandemic-related issues, and we saw our headphone check pass rate drop to 25% at one point. It has since recovered, but this does underscore the need for screening procedures. Still, we have had a lot of success using these methods. This is a list of some of the tasks we've run successfully online, and when I say successfully, I mean that we have replicated in-lab results. I told you about detection in noise and pitch discrimination; we've also successfully run fusion of musical intervals, judgments of illusory texture continuity, environmental sound recognition, attentional tracking, streaming of melodies, and speech intelligibility in noise. So just to summarize: don't be afraid of online psychoacoustics. Brief pre-tests can help ensure headphone use and basic compliance with instructions. Performance-based inclusion with independent data can be important for the long and grueling experiments we often like to run in psychoacoustics. Data collection is quick, and online experiments enable large N, which can facilitate the study of individual differences and enables one-shot experiments with few trials per participant. And it facilitates replication, so even if you've got an experiment that you're nervous about running online, you can at least try to replicate it online, which might be useful. I want to acknowledge the fantastic people in the lab who have led all this work: Kevin Woods got us started down the road of online experiments, Malinda McPherson is our current online guru, and Richard McWalter and Jarrod Hicks, who has a poster at ARO, also do a lot of this. Many other students have contributed in important ways, as have our funding sources. I'll leave it at that, thank you.

Great. So, moving along, I'd like to introduce De Wet Swanepoel from the University of Pretoria, who will be speaking on mobile technologies for remote diagnosis of hearing loss.

Great, thank you, Dennis. I assume everyone can see my slides. I'd like to thank all the organizers, and especially Dennis, Jan Willem, and Dave, who are chairing this session, for the opportunity to share a little bit of the work we've done, continuing along the lines of online testing and remote services but perhaps a little more translational, looking at how these can be used in a clinical setting. I want to acknowledge the team of individuals who've been involved with the work I'll be sharing with you today: Karina De Sousa, who is the PhD student, and the valued co-investigators David Moore, Cas Smits, and also Herman Myburgh from the University of Pretoria.
I'd also just like to acknowledge some of the funding sources that have supported this work, and to acknowledge a disclosure: I'm a co-founder and scientific advisor of the hearX Group. Maybe just to start off, it may be worthwhile to make a couple of observations on COVID and hearing testing in general. I'd like to put up this picture of traditional hearing testing, because suddenly, a year later, this situation almost seems untenable: sitting in a confined space with a patient without any personal protection. The traditional test setup for audiology, for hearing testing, is just a high-risk setup: confined spaces, lots of equipment, and oftentimes long-duration sessions with these patients. It's also true that the typical audiology or hearing-test patient is a high-risk patient, because they're most likely to have age-related hearing loss and they're more likely to be male, and both of those things are very big risk factors for COVID mortality and morbidity. Some other observations that have certainly transpired over the last year are that there have been a number of changes in perceptions among both consumers and professionals. A large consumer survey conducted in the UK and the US, reported in the Ericsson Mobility Report, indicated that six in ten consumers now predict that virtual appointments with their physician will become more popular than face-to-face appointments. Professionals have had similar shifts in their opinions. A survey we conducted with the International Society of Audiology of audiologists around the world indicated a shift: pre-COVID, 44% of audiologists viewed telehealth as important or very important to their practice, and that has now jumped to 87%, so certainly a big shift due to COVID. Another change we've seen is that telehealth regulations have become more enabling, so the world has become more accepting of virtual and remote services. That means we can really rethink the traditional pathways of care for hearing health care and audiology, and I think the remote, online world offers some alternatives that we'd like to put forward in this presentation. In the traditional clinic, for persons who have hearing loss, and I'm focusing on adults in particular, the aim has been to determine the degree, the configuration, and the type of hearing loss at the clinic. The degree and configuration are typically determined through air conduction audiometry and the type of hearing loss through bone conduction audiometry, and that's also why a sound-treated environment is so important. But in the world we live in now, utilizing remote types of testing, I think a triage approach, where we differentiate patients who need different types of care, may be a way to optimize how we provide services. Here the aim might be to determine the type of hearing loss remotely: in other words, what is the likelihood that they have a sensorineural hearing loss versus a conductive or ear-disease-related hearing loss? Those have two very different care pathways linked to them, and we could triage accordingly. We know, for example, that sensorineural hearing loss in adults could be assessed in alternative settings that are low- and no-touch, and the vast majority of the adult patients we're going to see are going to be in this category.
So if we can optimize care for the majority of them, then we can escalate those who have conductive hearing loss, ear disease, or complicated types of conditions to higher-touch, traditional clinic settings. Thinking of triage firstly with remote types of testing, I'm going to share a little bit around the digits-in-noise test examples we've been working on. This may be in the format of a web-based widget, and the link that has been pasted into the chat box is to one of these web-based widgets, but it could also be a smartphone application; I show two examples here, one being the national South African hearZA app and the other the hearWHO application, which provides a free downloadable hearing test using the digits-in-noise paradigm. So the question is, firstly, can we triage care pathways based on screening approaches, and secondly, can we triage through low- and no-touch types of assessment that may be alternatives to traditional test settings? If we consider triage with remote screening, these are two paradigms we've been using in our digits-in-noise tests, our online widgets, and the applications. The first is a diotic digits-in-noise test. This is an in-phase presentation of the signal binaurally to both ears, and we've shown that it's a rapid test of better-ear functioning, mostly representative of better-ear functioning. It is insensitive to conductive hearing loss, and it's also insensitive to unilateral hearing losses, and you can see that illustrated in the top figure there as well. We then subsequently used an antiphasic binaural digits-in-noise paradigm, where the signals are 180 degrees out of phase between the ears. This turned out to be a rapid test of poorer-ear functioning: it improved our sensitivity to detect sensorineural hearing loss, it was sensitive to asymmetric and unilateral hearing losses, and it was sensitive to conductive hearing loss. With that information, we constructed a two-stage screening protocol based on the data we had. The first screen is with the antiphasic digits-in-noise test, and if we use a speech recognition threshold cutoff to determine normal hearing, those who pass are classified as having normal hearing. If they fail the antiphasic test, they are then triaged to do a diotic digits-in-noise test, and based on whether they pass or fail that, we predicted that we should be able to classify, with reasonable accuracy, hearing loss into conductive or unilateral hearing loss versus bilateral sensorineural hearing loss, and in this way, through remote screening, provide some kind of triage for further care for these patients. These are some preliminary results that have just been submitted for review. You can see here we have the diotic SRT on the y axis and the antiphasic SRT on the x axis, and we're presenting two ways of classifying three types of hearing ability: normal hearing, unilateral sensorineural and conductive hearing loss, and bilateral sensorineural hearing loss. The first is just a very simple division into normal hearing and then into the bilateral group versus the unilateral sensorineural and conductive group.
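Here is a minimal Python sketch of the two-stage screening logic just described (the SRT cutoff values are placeholders, not the validated cutoffs from the study): the antiphasic test screens first, and the diotic test splits the failures.

```python
# Two-stage screening triage: the antiphasic digits-in-noise test screens
# first, and the diotic test splits the failures. Cutoffs are placeholders.
def triage(antiphasic_srt_db, diotic_srt_db=None,
           antiphasic_cutoff_db=-10.0, diotic_cutoff_db=-10.0):
    if antiphasic_srt_db <= antiphasic_cutoff_db:
        return "normal hearing"                        # passes the first screen
    if diotic_srt_db is None:
        return "refer for diotic DIN test"             # stage two still needed
    if diotic_srt_db <= diotic_cutoff_db:
        # better-ear function is good despite failing the antiphasic screen
        return "suspect unilateral or conductive hearing loss"
    return "suspect bilateral sensorineural hearing loss"
```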
We also used a tiered approach, on the right-hand side, where we got slightly better accuracy dividing into these categories: 89% accuracy in identifying normal hearing, 71% in identifying bilateral sensorineural hearing loss, and 60% in detecting unilateral and conductive hearing loss. There's certainly more work that needs to be done there, but it's promising that with this combination of tests we can, at least with a reasonable degree of accuracy, start to triage these patients, and of course those with unilateral and conductive hearing losses would be candidates for a high-touch type of approach in traditional test settings or medical assessments. If we then come to a more diagnostic test approach, where we want to triage for low- and no-touch testing, there is also a need to consider alternative service delivery models according to the required level of care, the risk factors of these patients, and also the cost, travel, and convenience factors that come into play. It's about differentiating patients, perhaps in unconventional or non-traditional settings, and those who need further care can then be escalated to those pathways. This is just a simple illustration of a patient who may come into a clinic but, right at the counter, get a home-based test kit, or they may do a counter-side assessment, which is very quick, and then the pathway can be directed from there. There are different approaches: one way is to use risk questionnaires like the Consumer Ear Disease Risk Assessment, or, in our case, to use rapid air-conduction tests to differentiate these patients. These are findings from a recent report of ours where we used air-conduction testing: we did our digits-in-noise test, the diotic version, and we used pure-tone air-conduction audiometry, both of which can be utilized in non-traditional settings outside of clinics and sound booths. In this instance, you can see how we differentiated sensorineural hearing losses, which are the red dots, from conductive hearing losses, which are the triangles, very accurately using this model. Using a linear regression, we had a sensitivity of 97% and a specificity of 93% in correctly differentiating sensorineural from conductive hearing losses, so a very accurate way, in the absence of bone conduction, to differentiate patients who may need further care. This has been implemented in self-test kits for COVID-19, where there's a tablet, a calibrated pair of headphones, and a self-testing sequence in which patients can do pure-tone air-conduction audiometry through those calibrated headphones, at a counter-side or even drive-through, and then do a speech-in-noise test, which is the binaural DIN test. The risk assessment then automatically classifies, based on those test results and some demographics, what their conductive-loss risk is. We can of course also screen for asymmetric hearing loss, and we can use red-flag questions and an optional CEDRA questionnaire. There's also an additional option where we can add digital otoscopy, which utilizes a machine learning algorithm to classify tympanic membrane images into different diagnostic categories. So in conclusion, COVID-19 has certainly challenged the traditional audiological care pathways, but it has also allowed us to innovate. Digital screening could support initial triage to direct referrals and optimize care. No- and low-touch care could certainly mitigate risk, improve safety and convenience, and could work for most adult patients with hearing loss.
And using these novel tests to triage for ear disease and conductive hearing loss can allow us to test outside of traditional clinic settings. Next we have Jan Willem Wasmann from Radboud University Medical Center in Nijmegen, speaking on hearing assessments freed from time and location constraints.

Thank you, Dennis, and good morning everybody, or good afternoon. I'd like to talk about how you can monitor patients remotely, now and also in the future. I'll start with a broader picture of the emerging capabilities, which I think is in the theme of this symposium, and then I'll report on a project we're running now at our hospital on at-home testing in cochlear implant users, together with our team. If you look at this picture, "Can you hear me, Major Tom?" is, I think, a nice example of a listening check, and maybe also a check of the whole system, from Houston more or less to the astronauts. I like this example because I believe that if you're able to perform telehealth in space, you can do it anywhere, and you can use it, of course, in places that are not easy to reach. Well, this is a virtual conference this time, and I am a morning person, so with this being my afternoon, the best moment of my day has already passed. There are also different things we should take into account when we're using remote care. For instance, I assume that the center of gravity of this audience is probably somewhere in the United States, maybe near Missouri, but the perceived distance between you and me will depend more on my distance to the webcam, and probably on my accent and local habits. These are all things we should take into account when interacting with subjects or patients in these tests. I would also like to use this opportunity to thank our clinical team, who over the last year have rolled out a lot of new technologies to provide care to our patients, and also the Cochlear team for technical support, so that we were able to do this in cochlear implant patients. I think there's an emerging capability in the sense that we can do measurements right at home, in the house of a patient for instance, and what's interesting then is whether we are able to do measurements that generalize to real-life settings. For instance, this person at the coffee table is able to communicate well with his friends and family; what should we measure to get an idea of that? Another thing I would like to emphasize is that we call this remote testing or remote measurement, but from the perspective of the patients these are nearby measurements. I would like to challenge the audience for suggestions on how we can better describe this, maybe in a patient-centric way; if you have suggestions, put them on the webpage, I'm looking forward to the discussion. I would also like to look at what ecologically valid measurements are in hearing science: measurements that reflect real-life hearing-related function, activity, or participation. A glance at this picture shows me that this kid is able to actively participate in this classroom; it looks like he's doing well. And these are, I think, the performance indicators we are interested in knowing. We can of course measure a lot more than in the past, and here I'd like to use an example from sports, with the variables that we can measure today.
I took cycling because I think the differences between the lab conditions, when you're cycling on a simulator, and when you are cycling outdoors are small. And to the right you see Lance Armstrong, so I put him in the center. Let's imagine he's a patient, and instead of a coach he has a care provider or audiologist who is counseling and helping this patient. He has a team around him, in this case maybe friends, family, colleagues. And what you want to know is how somebody performs here in real life, what the important measures to take are, and how they generalize, maybe, from the lab conditions.

So if we are tapping into all these variables, then we have to think about how to manage all these data streams across domains. We can of course focus on hearing tests, but maybe while somebody is performing a test, heart rate might be interesting to know, to see whether people are really dedicated to the task; or they might need a different interface due to a vision impairment; or there are other important domains of their health that interact with how they function in life. If you're able to collect these data streams, it also opens up new opportunities, for instance to explore the relationship between sound and environmental health. In a recent paper we explained how we think we could deal with the added complexity if you look into all these data streams; unfortunately I don't have the time to go into the details, so I'll continue.

One last important aspect, I think, is that not only can we measure across domains, but we can also test at a time that suits the patient, and maybe do multiple tests. Then a local clinician, somebody accessible to the patient, can look at the tests, maybe provide counseling, or maybe administer new tests to get a better idea. And while these data are collected, somebody in the background, what we call here remote experts, could, depending on the case, be an expert in otogenetics, or a data analyst who is observing trends and who maybe provides better advice or diagnostics for this particular case, or maybe finds ways to improve diagnostics in general, or therapy.

So now let's go to the auditory athletes of the project we are running. Here you see pictures of adults doing the tests at home from whom we are collecting data. So far we have anecdotal evidence that many people with cochlear implants report that they are really exhausted at the end of the day, or that they need time, for instance when they receive a phone call, to tune in for a couple of seconds. It looks like they are using a lot of cognitive resources and maybe top-down strategies where normal-hearing listeners would use the fast brain, for whom it would be much easier to follow conversations. But we don't actually know how interventions like a cochlear implant or hearing aids affect, for instance, fatigue. And when we're not testing in a sound booth but at home, there might be other factors affecting the test outcome; we cannot control for everything that happens in somebody's house. There might be effects like circadian rhythm, although we know that during office hours those effects are not so relevant, but maybe you're measuring in the middle of the night because a patient is stressed.
Yeah, that might affect the outcome; or, of course, the cognitive resources of people who are doing other things while they're doing a test, or who are fatigued, or who don't have attention for the test, might all cause new variability in the tests we do. With the project we are now carrying out, we want to get a better idea of what variability we can expect, whether we can maybe increase test accuracy, and whether it is possible to see trends that you cannot see if you assess a patient only once a year, but that you could see if you measure a couple of times a week, or when you measure when you think it's needed.

What we do know is that core body temperature fluctuates during the day, and it's thought to be associated with your performance, your cognitive and physical performance. For instance, Olympic records are mostly broken at the end of the morning, when people have been awake for about three hours. Of course, we hope that most of the auditory tasks you perform during the day are not record-breaking, but it gives some insight into how performance fluctuates during the day. Also, for this project, we are interested in what the long-term, or chronic, fatigue of subjects is. What we do is give them the Checklist Individual Strength, which is a survey consisting of 20 questions, reporting the subjective fatigue, concentration, motivation, and physical activity of people. An example question is that they have to indicate on a rating scale whether they feel physically exhausted, and they need to report on their state over the last two weeks. We have some reference data for healthy persons, but also for some groups of patients with complaints of chronic fatigue, and these reference scores can be used to assess the risk of, for instance, burnout, if you look at the subscale of subjective fatigue. I don't know exactly what the effects are of age, or now, for instance, of the lockdown and COVID-19, with, of course, changes to people's routines, which might also affect these outcomes.

For our study protocol, we administer nine hearing test batteries within eight days, and we assume that peak performance fluctuates during the day. To exaggerate the effects, we have chosen three times of day to administer the test: the morning test within one hour of awakening; the noon test when people have been awake for at least three hours; and the night test within one or two hours before sleeping. Day one of the schedule is always on Monday, so people always start this schedule on a Monday, doing a complete test and also filling out the survey. Then the next day, on Tuesday, they have two measurements in the morning, so that we can look at test-retest within a session; and by comparing those to measurements on other days in the morning, we can look at the effects of test-retest between sessions. By choosing different moments during the day we can see whether circadian rhythm has effects. As I said before, we know that's not really an effect within office hours, but that is maybe for people with a hearing impairment that is not as severe as in the case of cochlear implant users. So these are some of the assumptions behind why we think it's worthwhile to do this project. We use repeated-measures ANOVA to look for statistical effects and will include 50 subjects.
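For readers who want to see the shape of that analysis, here is a minimal sketch of a repeated-measures ANOVA with time of day as a within-subject factor. The data frame is simulated and the variable names are illustrative; the study's actual design also includes repeated days and additional factors.

```python
# Minimal sketch of a repeated-measures ANOVA (time of day as a within-subject
# factor) on a speech-in-noise score. The data are simulated; variable names are
# illustrative, not the study's actual coding scheme.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
subjects = range(1, 51)                      # 50 cochlear implant users
times = ["morning", "noon", "night"]
rows = [
    {"subject": s, "time_of_day": t, "srt_db": rng.normal(0, 1)}
    for s in subjects for t in times
]
df = pd.DataFrame(rows)

# One within-subject factor here; the real design adds repeated test days.
result = AnovaRM(df, depvar="srt_db", subject="subject", within=["time_of_day"]).fit()
print(result)
```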
And the test battery consists of, as I told you before, the CIS survey and the Remote Check. Remote Check is a monitoring check developed by Cochlear; you need an iOS device, an iPhone or iPad, and then you can take a photo of the implant site and fill out two questionnaires. One questionnaire includes questions from the SSQ; the other collects questions like: how was your hearing in the last three months, and are there any unpleasant sounds that you experience? And there are three tests that are performed: a free-field audiogram, which is done by streaming sound from the device to the processor of the CI recipient; a speech-in-noise test, which is the DIN, so maybe some of the people in the audience are by now familiar with this test; and an impedance check. When we repeat the measurements we only do the three tests; people don't need to fill out the questionnaires. At every measurement we ask people to rate their subjective state at that time, so they have to answer, for instance, how motivated they are to do this test, and after the test they have to answer how they rate their listening effort, and also whether they wanted to give up, because we think that's a proxy for listening effort.

So what have we learned so far from this project? We see that multiple tests a day are possible for this group. We can also do this assessment totally virtually, starting with instructions via email; we used WhatsApp a lot for giving instructions. And the survey is very sensitive for measuring chronic fatigue. What I also want to share are some constraints that we are experiencing. For instance, digital proficiency is really important. One example is that one subject had problems and couldn't get through the test, and I asked, well, could you maybe make a screenshot and share it with me, and then I got the question: what's a screenshot? So let's explain that. With some patients, I think it's all possible; people can learn this, of course, but you sometimes really have to see through the eyes of the subject, to see what they are doing, to see what's going wrong. But that means we also learn a lot, and I think this helps to alleviate some of the technical constraints. Another constraint is that we can only use iOS devices, and I hope that changes in the future. So far we have tested in a sound booth, and that's the gold standard, but if every brand or company had its own standard for how to stream tests, it would be hard to compare between different groups of subjects, so I think those will be things to consider in the future.

So that's my talk on this project so far. If you have projects you want to share, well, you can use the website we're promoting. And I would like to announce already the second Virtual Conference on Computational Audiology, which will be organized by Tobias Goehring and his team from the Cambridge Hearing Group; we'll follow soon with more announcements. Thank you for your attention.

So, moving on to keep us on time, I'd like to introduce the next speaker, Elle O'Brien from Iterative AI and the University of Michigan School of Information. She'll be presenting on building infrastructure for machine learning and big data in hearing science.

And so what I'm going to talk about is really zooming out quite a bit from what we've heard today.
So this is building infrastructure for machine learning and big data in hearing science. First I'll share a little bit about my background and how I got to this particular topic. I did my PhD in speech and hearing sciences at the University of Washington, and after that, instead of doing a postdoc, I joined an open source software project that works on extending Git version control, which is a way of managing versions of software projects. I did data science and machine learning, and in one week I'm joining the University of Michigan School of Information as a lecturer and research investigator.

Something that was really shocking to me, which I learned when I transitioned into industry briefly, was that data scientists don't spend their time on any of the activities I thought they would. I really thought it was going to be all about using the experimental knowledge you might have, or the statistics, and the majority of it really seems to be loading your data, cleaning your data, understanding your data. This seems to be pretty consistent; there have been a number of large-scale surveys establishing it, and it holds across many types of organizations where there's large data. Data at that scale is its own craft; it really requires a specialized skill set for managing it. And imagine we were to spend time at a big data lab (and by a lab using big data I mean there are lots of ways that could look, but basically all the projects we've heard about today have substantial streams of incoming data): it's beyond the scale that you can, or want to, manually enter into a spreadsheet, so you have real needs for quality control, you want some things to be automated, and you can't understand your data by looking at a spreadsheet anymore, because it's going to be quite high-dimensional and possibly, in some cases, too large to load.

The reason this seems to take up so much time, which I think will become kind of a natural law of working with data that all labs going this way will have to contend with, is that bigger data introduces complexity. The Netflix data science team has an open source project around how they manage their data science workflow, and they have a quote that I like: from experience, we know that much of the pain related to operating modern machine learning infrastructure is caused by the complexity of large-scale distributed systems. And by distributed, I mean this in a very general sense: many computers, many team members, many stakeholders, and many parts.

So I want to step through a couple of the ways that complexity occurs in, presumably, the kind of work we might be undertaking if we follow these methods. Part of it is that we've always had some complexity: any time you have a data set, derive some analysis from it, and then publish the analyses, you have complex dependencies to manage; you've still got to manage the computers in your lab and make sure you have the software and the hardware. But we can often get away with hackier solutions when our scale is small.
So we can get away with it: we manually entered our data into a spreadsheet, we copied the results over and wrote them up in a Word document, or we didn't version-control things because there are only a couple of us working on this and we have backups. You can do that quite a lot at small scale, but the larger your data gets, the more computers and the more platforms that are involved, whether you're doing mobile or web development, the more quickly technical debt will accrue if this isn't handled in some intentional way from the start.

So, some forms of complexity. One is data provenance. An easy example is that in any lab there is ongoing data collection. For example, I'm working on a project right now where we have new data rolling in every single day, and it can be thousands of new measurements every day. So in a few months, or even a day, today's data set will actually be the old version. If your data set is changing over time, then the analysis I do on any given day has to be linked by some kind of immutable record to the data set that gave me this analysis today; otherwise, in a couple of days, you have no idea which data set your analysis came from, even if it's nominally from the same table. Another example, going beyond just adding data, is that maybe you'll have some raw data and then do some preprocessing. Maybe you will low-pass filter some data, then aggregate measures within a subject, and then maybe you publish it, and somebody transforms your data in an entirely new way for their own analysis. This is an example of how a data set can be derived from a previous data set: I could take data from a public data set and transform it in some way, and so my new data set, and what I make with it, depends on the data I accessed as well as how I transformed it.

Another example is pipelines. If you're more on the mathematical side you might call this a directed acyclic graph: it's how a bunch of functions chain together to get you from raw data to a result. I particularly see this in neuroimaging labs, where you have some cleaning steps, but it will also happen if you're taking large psychophysical data sets off of an app or MTurk; you're still going to have to go through a quality control stage, an aggregation stage, and so on. And then maybe at the end you'll model, and modeling could be as simple as testing a hypothesized model, or it could be something like really doing a search over a large model space for the best model, where maybe prediction is the goal. But in all of these cases you've got a bunch of functions chaining together that transform your raw data into some kind of result. And in machine learning, especially when we're going for prediction, you will often go over the cycle several times. It's kind of an iterative process: you say this didn't work so well, maybe we really ought to include a different feature, maybe we want to reduce dimensions first. So there's really an iterative process here.
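Going back to the provenance example for a moment, here is a minimal sketch that fingerprints the exact data file an analysis used and writes that record next to the result. The file names are invented for the example, and a real project would more likely lean on a dedicated tool than on hand-rolled hashing.

```python
# Minimal sketch of recording data provenance: fingerprint the exact data file an
# analysis used, so the result stays linked to that version of the data even after
# new measurements roll in. File names are illustrative.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def file_fingerprint(path: Path) -> str:
    """Return a SHA-256 hash of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

data_file = Path("measurements.csv")
record = {
    "analysis": "weekly_srt_summary",
    "run_at": datetime.now(timezone.utc).isoformat(),
    "input_file": str(data_file),
    "input_sha256": file_fingerprint(data_file),
}
Path("provenance.json").write_text(json.dumps(record, indent=2))
```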
And so one solution for both of the kinds of complexity I've shown you is version control. Version control is a tool for tracking changes in your work: you take a snapshot of what your work is at all these points in time, so you can always go back to it. Without version control, this is what happens: you get all of the file-name suffixes like final_version2. I've definitely been there. Among version control systems, Git is the big one; there are others. And GitHub, which I see some of these labs are using, is a really nice place where you can host Git projects either privately or publicly. I had a pretty good time in the lab where I did my dissertation, where it was a lab convention that we would all post our code on GitHub. I thought that was pretty good, and it is a really good first step, but as I learned, it's nowhere near what it takes to actually reproduce all of the work of an experiment. That's partially because of this other kind of complexity, which is software dependencies. I know a lot of us have had the particular headache and heartache of making something in MATLAB on your version, sending it to your colleague who has an older version of MATLAB, and finding it doesn't run out of the box; or maybe the paths are coded in a way that works for Linux or Mac but not Windows.

The way I summarize all of this is that we're in a delicate balance between reproducibility and complexity. The complexity is that we have ideas we want to test out really quickly; we have evolving code and data sets; we have software and hardware dependencies, and hardware dependencies for a hearing science lab can be quite substantial, especially in times when we're all going into the lab: it's all the equipment you use, your sound cards, and any machines you use, which could be EEG or eye-tracking machines. But even in web and mobile development, for the kinds of things we're showing here, the dependencies are still substantial. And there's the complexity of being able to explore multiple modeling approaches, kind of like in Dennis's talk, where we're really exploring a very high-dimensional space of possible models with many possible predictors. On the other hand we have reproducibility, which is the ability to repeat your own work but also to share it: to give it to a colleague and enable them to basically run everything you did, to recreate the software environment and the hardware environment, so it's really trivial for them to take your code and your data, rerun it, and make something new with it. That goes hand in hand with ease of collaboration, and I think the whole of science is still taking its first steps toward getting there, because when we publish something, it's still not trivial to really pass off everything I did and enable a colleague to understand every step and build on it.

And there are so many tools out there. I thought at first about going in a very practical direction here and talking through some of the tools. In version control there's Git, and the tool my team works on, Data Version Control, just extends Git a little bit for bigger files.
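To make the dependency problem tangible, here is a minimal sketch of capturing the Python environment alongside an analysis so a colleague can later recreate the same package versions. It is a lightweight stand-in for proper dependency-management tools such as environment or lock files, and the output file name is just for illustration.

```python
# Minimal sketch of capturing the software environment alongside an analysis, so a
# colleague can recreate the same package versions later. A lightweight stand-in
# for dedicated dependency-management tooling.
import json
import platform
import sys
from importlib.metadata import distributions
from pathlib import Path

manifest = {
    "python": sys.version,
    "platform": platform.platform(),
    "packages": sorted(
        f"{d.metadata['Name']}=={d.version}" for d in distributions()
    ),
}
Path("environment.json").write_text(json.dumps(manifest, indent=2))
```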
Computing infrastructure: it's trivially easy to get compute now, or at least trivially easy from a technological side, but the interface is not that easy, and part of why this is still hard for labs to get into is that it's not super easy to start using cloud computing; there is a learning curve. Then there's dependency management, packaging up all of the software that's needed to run your software. And there are tools for automated code and data testing, too. But I don't think the tools are really the problem. There are lots and lots of tools out there, and we could have workshops all day about which tool is right for you and how to use them. I think it's really an infrastructure problem, and in part a question of how we design labs and how we design teams. Everything I'm talking about is an ongoing, huge issue in every organization that works with data; it's kind of an infrastructure question.

So part of it is: how do we develop expertise within our own community? How do we make sure there are some people who really understand how to build a mobile app for testing hearing, who understand the science of psychophysics too, and who can lead on that and grow that expertise? We should ensure we're investing in that, almost the way you might study microscopy or EEG; I think the complexity of developing these kinds of tools is now at the stage where it can be its own postdoc or PhD. Some of the skills I think are important are understanding where cloud computing fits in, web and mobile development, and data engineering, in long-term research engineering roles. Almost every data science team at an organization that works with large data has an ops team, too: people who just support the engineering that goes into having large data sets.

Some of this is shared resources: how do we make data sets public and shareable, or, even if we're not going to make them fully public, easy for collaborators to access, with some kind of browser-based way to host them so people can get to them? How can we share our software tools and even computing power? There are a couple of scales worth thinking about: across collaborating labs, departments, and research hubs. And I think these kind of have to be community decisions about what the priorities are and what is most important to start creating shared resources around.

There are also community-designed standards for data and model sharing. I'm seeing this particularly in neuroimaging communities, because they've had to contend with large data for quite a while, and we're starting to see standards emerge about which fields we're going to use when we store a data file from EEG or MRI, so that anybody anywhere who works with this kind of data can take it without having to figure out what each column means; it's standardized. That's an ongoing effort, and we'll watch and see how it goes and maybe learn something from it.
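Purely as an illustration of what a shared field convention could look like, here is a tiny sketch of one possible standardized record for a single hearing-test result. The field names are invented for this example and do not correspond to any existing community standard.

```python
# Illustrative only: one possible shape for a standardized hearing-test record.
# These field names are invented for the sketch, not an existing standard.
from dataclasses import dataclass, asdict
import json

@dataclass
class HearingTestRecord:
    subject_id: str
    test_name: str          # e.g. "digits_in_noise"
    device: str             # e.g. "ios_tablet"
    transducer: str         # e.g. "headphones", "free_field"
    srt_db: float           # speech reception threshold in dB SNR
    test_datetime_utc: str

record = HearingTestRecord("S001", "digits_in_noise", "ios_tablet",
                           "headphones", -8.2, "2021-02-20T14:30:00Z")
print(json.dumps(asdict(record), indent=2))
```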
But if there's a kind of data format that's used across many labs or many research groups, then getting into ways to share data and models, and agreeing on conventions, can reduce some of the complexity that individual labs and research groups have to handle. So I guess I don't have a practical solution, other than to say this stuff is really hard, and I think we need to know about it at this point. If we know about it now, I think that will make us a lot better equipped to handle it as we explore the scientific capabilities of the methods we've seen today. So my questions are: how will we embrace the emerging capabilities in hearing research that we've seen today, and how will we incentivize doing it really well? This is my contact information, and I'm happy to answer questions in the chat.

Thank you so much, Elle, especially for bouncing back so quickly from our technical difficulties. We're pretty much on time. I want to encourage everyone, especially young investigators and students who are in the symposium, to post your questions either in the Zoom chat or on the accompanying webpage; I'll post that link again briefly. We're entering the last segment of the symposium, so I'll introduce the speaker: this is Dave Moore from Cincinnati Children's Hospital.

So I'm going to present something here to finish off which is rather orthogonal to several of the other talks we've heard this morning. I'd like to thank everyone for sticking with us, and I want to give a particular call-out to people who are outside the Americas and having to work under very difficult time schedules to attend these sessions today. My particular shtick here is to talk about linking hearing and other data in large populations, so it draws on a lot of the big data issues that were just mentioned, and also the remote testing issues that have been taken up by a couple of other speakers. The people I've listed here are just some of the people who have been involved in the work I'm going to talk about today; I'll call some of them out specifically at relevant times. Across the top, we've got some of the funding organizations and two of the places where we've been doing this big data work: UK Biobank, which I'm particularly going to focus on in this talk, and 23andMe, which is kind of a marker, but we're about to have a data release from over 200,000 participants in 23andMe who've done an online hearing test. So with that appetizer, I'll move on.

Big data is a concept that's been around for roughly 20 years now, and it was originally defined as large volume and wide variety of data with high-velocity input, as we've heard about. More recently, it's moved more into predictive behavior analytics, where we're interested in correlating different aspects of function, often across very different systems. Many of you will be familiar with the traditional surveys of hearing that have been done at a national level. For example, the NHANES study here in the US has provided data for a whole cornucopia of studies, particularly on the auditory system. And I should mention the English Longitudinal Study of Ageing, which has a much smaller sample size but has enabled some causal connections to be developed, for example between cognition and hearing aid use, in a very interesting paper by Piers Dawes and his colleagues.
Then there is the Beaver Dam study in Wisconsin, on which Karen Cruickshanks and her colleagues have published a lot of work, and finally the Blue Mountains study in my native Australia, where, again, we've had a number of papers in the auditory domain. Now, all of this was kind of blown out of the water by the Human Genome Project, which took place between 1990 and 2003, when the first blueprint was laid down for the human genome. Of course that raised the hugely stimulating possibility that we might be able to combine hearing data with genome data, and indeed that's something of what I'm going to be talking about today. UK Biobank, which at the time it was launched was the largest medical experiment ever done, did in fact give a series of hearing tests, and I'll talk more about UK Biobank in a moment, but it particularly combined hearing with genetics, as has 23andMe and the US veterans program, which has up to a million participants in different aspects of the study.

I've heard from a couple of the speakers this morning about continuous data gathering, which I think is the new generation of these sorts of things, and I particularly wanted to draw your attention to a really wonderful paper by Poppy Crum that was in IEEE Spectrum in 2019; the reference for that is on the website that Jan-Willem and others have mentioned. Its title is "Hearables will monitor your brain and body to augment your life", about using devices tucked into your ears to make technology more personal than ever before. And the particular aspect I wanted to mention is, of course, the relationship between hearing and other aspects of function.

So, turning to UK Biobank: it's a research resource for the whole world, and it's important for listeners to know that, although it was collected in the UK, it is available to any researcher in the world. You can make an application; just Google UK Biobank and it will come up. Here we have a word cloud describing the sorts of connections that have been made with this resource. It consisted of over 500,000 middle-aged people between the ages of 40 and 70, of whom I'm a participant as well as an investigator. You can see this little figure down at the bottom left, showing some of the different things it was looking at, and if you look at the top, there's one pointing to the head saying cognitive function and hearing tests. These were the only tests of neurological function where quantitative data were actually collected from objective observations. As I've outlined briefly, a real highlight of UK Biobank has been the genotyping data, which has been available for some time, since July 2017, and a new development just last year has been whole genome sequencing, which by the end of this year should be available for 200,000 of these participants.

So here's a bit more about it. The hearing side of things, which I'm really going to focus on now, consisted of a digits-in-noise test, which De Wet introduced briefly, but I'll come back and say more about that in a moment. Look at this sample size: 185,000 people have taken this test. There were also questions about hearing, of which there were 10 in this case, which have been taken by between roughly 170,000 and 561,000 people.
Those tests were all repeated on smaller samples up to about 10 years after the baseline assessment, and the cognitive testing, which is particularly gaining attention within the auditory community, has followed a similar trajectory and can now be compared with the auditory data, as we'll see shortly. Finally, I want to mention MRI, another new addition to UK Biobank, which started about six years ago; thus far around 50,000 people have had it, and for the follow-on tests two years after the initial MRI we've got data on about 3,000 people. At all of these test periods, over a total duration of about 16 years now, we have repeat measures of the hearing testing, the questions about hearing, and the cognitive testing. So we've really got a wonderful resource here.

I'd like to run you through very briefly a paper, also mentioned on the website, by Helena Wells and her colleagues. This was a collaboration mainly between the University of Manchester, where I work part time, and King's College London, where Frances Williams and her colleagues are based; Helena was a PhD student at the time. Here's the design of the study. We've got the UK Biobank hearing difficulty data, which were simply two questions about hearing: do you have any difficulty with your hearing, and do you have difficulty following conversation if there's background noise? And the age was greater than 50 for this particular section. Here are some of the results, shown in a Manhattan plot. Some of you will probably be unfamiliar with this sort of plot, but essentially it's looking across all the chromosomes, and where a particular gene goes above this red dotted line, for either the hearing difficulty question or the use of hearing aids, that is a significant result. As you can see, approximately 40 genes were identified here, some of which were common to both hearing difficulty and the use of a hearing aid, and some were not. A number of these were completely novel gene discoveries, which could be made with great facility when you have such a large sample size. That work is published, as I said.

So coming back to the hearing test: we've already heard about digits in noise, and I'll just very quickly run through it. Here we see a typical participant in the study, seated in front of a touchscreen wearing standard headphones; these data were collected in 23 centers across the UK, a very good sample which was very carefully monitored to try to equalize demographics and so forth. The test consists of three spoken digits in steady digit-shaped noise, with an adaptive staircase of 25 trials, which takes about three minutes to do per ear (a toy sketch of such a staircase follows this part of the talk). It's internet-deliverable, as we've heard. You don't need a sound booth, mainly because the noise that is presented actually masks environmental sounds. It's also very tolerant of the use of different devices, which we've heard about today, an important issue. And it's usable by very young children and very old adults because it's cognitively relatively undemanding, and you can try it, as we've heard several times, at the website.

Now some funky results relating hearing, cognition, and age. They are a few years old now, but I love showing them, because when you're working with so many people, you get these amazing data relationships that are almost perfect. You can see a difference between the different curves here because there is a real difference; all of these things are statistically significant.
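For listeners unfamiliar with adaptive staircases, here is a toy sketch of a simple up-down track on signal-to-noise ratio, with the threshold estimated from the later trials. The listener model, step size, and averaging rule are invented for illustration and are not the exact procedure used in UK Biobank.

```python
# Toy sketch of an up-down adaptive staircase like the one used in a
# digits-in-noise test: the SNR drops after a correct triplet and rises after an
# incorrect one, and the SRT is taken as the mean SNR over later trials.
import random

def simulate_listener(snr_db: float, true_srt_db: float = -8.0) -> bool:
    """Toy listener: more likely to repeat all three digits correctly above their SRT."""
    p_correct = 1.0 / (1.0 + 10 ** (-(snr_db - true_srt_db) / 2.0))
    return random.random() < p_correct

snr_db, step_db, track = 0.0, 2.0, []
for trial in range(25):                          # 25 triplet presentations, as described
    correct = simulate_listener(snr_db)
    track.append(snr_db)
    snr_db += -step_db if correct else step_db   # down after correct, up after incorrect

srt_estimate = sum(track[5:]) / len(track[5:])   # average SNR after the first few trials
print(f"estimated SRT: {srt_estimate:.1f} dB SNR")
```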
And in fact, when you're working with these sorts of huge data sets, significance almost goes out the window, because everything is significant; what you have to look at is effect sizes. In this case we're looking at the speech reception threshold, which is a measure of hearing ability on the digits in noise, as a function of age. Over here we look at the relationship between the speech reception threshold on the y axis and a cognitive composite decile, where 10 is the highest level of cognitive performance and 1 is the lowest. You can see there's a very strong relationship, even with this very simple test, between cognitive performance and age. Males and females, represented by the blue and pink (sorry for the traditional approach there), perform relatively similarly across this. And you can see that at each age, people in the very lowest cognitive deciles were having particular difficulty with this task. I like to summarize these data by saying that a very cognitively able 70-year-old has the same level of speech-in-noise hearing as a very cognitively unable 40- to 49-year-old. So there's a good relationship seen there. And just by comparison, using another large data set, the NIH Toolbox, this is the same relative relationship for pure-tone audiometry. Here you can see clearly that the females are doing better than the males across the board, but you can also see that even for this very simple test, there is a relationship to cognition.

I'd like to finish with some future work. Currently we're trying to persuade UK Biobank to conduct further online testing, because we need to improve sensitivity and do further longitudinal analysis. Secondly, we want to harness the power of the whole genome sequencing that is currently underway. And we'd like to look at other types of phenotypes, for example tinnitus, which are not currently available in detail, and I should say that the 23andMe sample about to be released had 50 hearing questions altogether, so there's a lot of potential there for power. Finally, I'd just like to say that the combination of hearing testing, genetics, and imaging releases unparalleled power for human auditory science. I'd strongly recommend people go to the UK Biobank resource and use it if they can, and bear in mind that the NIDCD, at last inquiry at least, strongly encourages people to use existing data sets; you don't have to go to the expense of doing your own experiment when there's all this wonderful data already available. And finally, if you haven't done it already, you've still got a couple of minutes to go to computationalaudiology.com to do the demonstration digits-in-noise test. So thanks very much for your attention, and I'm going to pass back to Dennis to organize the Q&A session.

Thank you, Dave, and thank you to all of our presenters. We have 10 minutes left in the symposium. Many of our questions have been addressed in the Zoom chat and some on the website. So I'd like to start the remainder of our time by passing the torch over to De Wet to discuss the results. What are the results of our real-time hearing test experiment?

We had good uptake of the digits-in-noise test demo; I'm not sure if that's a reflection on how engaging our talks were. But we had 61 people complete it by half past the hour; it's 6:30 pm where I am, and I know it's morning where most of you are. So let me just share some of the results with you quickly. This is how the results end up in the back-end portal.
You can see which ones fail and which pass. Just to give you an illustration, this is an example of someone who passed their screening test; their birth year and their signal-to-noise ratio are shown there. You can also see the 23 steps of the test and how they responded. Here's another example, of someone who failed the hearing screening with a signal-to-noise ratio of minus 0.4, and you can see that in this instance the birth year was 1933. And then we do an export to Excel; I have a wonderful research assistant, who is also a PhD student, Karina, who assisted in exporting this for us.

And here is the final set of results: 61 participants who did their digits-in-noise test during the session. You can see this relates the speech recognition threshold on the y axis to age on the x axis. There were 26 fail results. In all likelihood the high proportion is because most people probably did not use headphones or earphones, so probably just the free-field speaker on their device; this does lead to slightly elevated SRTs. I think it also speaks to the point that Josh made so eloquently about having checks to make sure that people are actually wearing their headphones during these kinds of online tests, especially when it's for research purposes. In most instances these widgets are used by consumers or patients or clients, so they're usually motivated to do the test correctly, as opposed to participants sitting in an ARO session. But in any case, I'm going to leave it there. Thank you, everyone, for participating in the interactive nature of this presentation and actually conducting the test.

Great, thank you, De Wet, and I agree. If anyone wants to navigate to the DIN webpage, it has those results as well. In the remaining time that we have, we'll take questions from the audience. We still have a few unresolved questions from the later talks, so maybe I will read those out and allow the presenters to address them verbally.

We have one question: Dave, you might want to ask UK Biobank for balance data as well, for example the cVEMP data. And I see Dave is typing a response to that. Yes, I am typing a response; there are balance data in there, and I was about to suggest that the asker actually register for UK Biobank and look up the data himself. Dave, can you address whether there's a cost to researchers accessing the UK Biobank data? There is a cost. It's relatively modest, I think on the order of $1,000 or so, and that gives you access for several years and across, I think, a number of workers at one site, so it's essentially a site license. Excellent, so investigators might be able to join forces with other collaborators at their home institution. Correct, and in fact that's what we did in the UK. Initially I was in Nottingham, and we joined forces with Manchester, and then, as I mentioned in the context of the genetic data, I presented with King's College London. They strongly encourage that, and one of the first things they do when you make an application is bring to your attention the other groups who are interested in the same sort of domain. Great, thank you.

Next question, for Elle: I'm guessing there are some cultural elements here, and it can be difficult in a speech and hearing department to leverage limited funding for these open science, big data type pushes. Yeah, I totally agree.
I'm having a couple of back-and-forths in the chat here which are really good about this. There does seem to be a real cultural issue, and that's why I didn't want to focus on tools so much: there's a real shift needed into saying this is a priority, we have to fund it, and we have to invest in people learning these skills; they can't be an afterthought if we want to scale data collection this large. Right now, most people I know who are really good coders got there because they were passionate about it and wanted to learn it. So it seems like there's still a lot of room to grow, and it's not clear why we haven't, because some fields, like neuroimaging and parts of psychology, have been putting a lot of effort into this. Somebody else commented about the human brain mapping community, and I know I'm butchering the name, but it's the organization for human brain mapping, and they have had a lot of open science initiatives. Why we haven't seen that here is a really interesting question which, if people have ideas about it, I'd love to hear, because knowing what the barriers are is the first step to figuring out how we get over them. Some of it has to do with the heterogeneity of the people who work in hearing science: there are a lot of people with a lot of different priorities coming to each research project. You have clinicians, you have scientists, and you have people in various phases of their training who are going to go into different sectors afterward. That is really an asset and a strength, and it also makes it harder to agree on shared conventions about how we develop, access, and manipulate data. It's both a real strength and a challenge. But I'm very curious what other people think about this.

I would personally echo those sentiments: most of us on this panel have our feet in other disciplines as well, and we're seeing movement in other spaces, and that was actually the inciting idea for holding the session. We feel that it's time in speech and hearing sciences to pull in some of the already vetted procedures, and also to look forward to the future of advanced methods like this in our field. Okay, we have time for a couple more questions. Another one for Elle: it may be a lack of awareness or education within the field that those practices exist and of the benefits they provide; the journal Ear and Hearing recently added some open science policies to encourage these practices, but they're fairly new. I'm just going to refer back to the conversation we just had. So it sounds like the publishers in speech and hearing are beginning to consider policies that reflect open science, big data, collaborative initiatives, and so on.

Let me move on to a question for Dave. Does UK Biobank have cognitive health or mental health data in it? Yes, it certainly has cognitive health, because that was what I reported in my talk; that was actually collapsing down six different cognitive tests, and since then they've added additional cognitive tests in some of the follow-up work. On the mental health side, we've also done some work relating hearing to other mental health issues, including schizophrenia, and we've looked at tinnitus and depression and so forth.
It's really easy to make associations across these fields using these large databases, so much so that the collaboration with Manchester has resulted in almost 20 papers since the data were first released six years ago.

Okay, I'm going to wrap up now with a comment and a quick question. Roger Miller from NIDCD has posted that you should reach out to him if you're interested in seeking funding from NIDCD for big data archive and machine learning activities; I think that's an incredibly important public service announcement for this community. There's one last question, to me actually, about prediction modeling, and whether it can be extended when the outcome of interest is speech intelligibility, for example digits in noise. That's absolutely the case; that's where the research is right now. The audiogram was our test case to demonstrate that we could achieve the effects we wanted without taking a long time, but we've now built cognitive models and visual perceptual models; it works in a wide variety of contexts.

Okay, I feel like we succeeded in our goal of presenting this material. I thank my fellow organizers, the presenters, and the audience; if you stuck with us all the way to the end, kudos to you, and a special thanks to those of you who took time out to perform our online DIN test. That concludes this symposium. I wish you all an excellent remainder of your ARO virtual experience.