Yeah, welcome everyone. My name is Joey Love-Strand and I am a British Academy post-doctoral fellow at SOAS University of London, hosting this webinar on behalf of the Linguistics Department at SOAS. Our speaker today is Dr. Chelsea Sanker, who is a lecturer in the Linguistics Department at Yale University with expertise in phonetics and historical linguistics. Chelsea will be speaking to us today about the effects of recording devices and software on phonetic analysis, based on ongoing work with colleagues at Yale. This is of course a timely topic, as many linguists have had to adapt to new recording devices and software, and will no doubt be more inclined to continue to use technology that allows recording at a great distance. Chelsea will be speaking for about 30 or 40 minutes, and we'll use the rest of our hour together for further discussion and questions from participants. You'll note that I'll be activating a live auto-transcription that will be available in Zoom. If you prefer not to have this, you can deactivate it in your own Zoom by clicking on the live transcription button in your app. Chelsea, thank you for preparing this presentation to share with us today; we look forward to hearing your talk.

So, I'm going to go ahead and share my screen. So yeah, I'm going to be talking about how different recording devices and different software influence the phonetic results when we make measurements, which is collaborative work with the rest of the Yale linguistics fieldwork group. Of course, the pandemic has changed what sort of research we can do and has imposed limitations on our ability to do in-person research, so a lot of people have shifted to collecting data remotely in various ways. That makes it important to know how these different methods of making recordings, using different recording software like Zoom, or using different devices that people happen to have at home, are going to impact our acoustic measurements. There are a variety of potential sources of variation. There are effects of compression: different software compresses in different ways, and this can influence measurements of frequency and also measurements of duration, particularly when you have variable compression, so different parts of the signal being compressed in different ways. We also have filtering: Zoom, for example, has its own filters to boost certain frequencies and reduce background noise, and this can influence aperiodic noise in particular, but also overall intensity and the relative intensity of different frequencies. And then there are all the familiar concerns that we've seen from previous work, which actually has been studied in some detail, looking at effects of sampling rate, ambient noise, shielding to reduce interference from other electrical devices, microphone placement, and microphone sensitivity: basically all of the concerns about having a slightly different physical setup, which can change your results in addition to differences from software.

So we had two phases of gathering data. In phase one, we made simultaneous recordings on six different devices, so one speech event being recorded in six different ways. And I'll show you an image of what that setup looks like in a moment.
In phase two, we transmitted recorded speech over four different conferencing applications to look at how that software impacts the recordings, and then we compared each of these against our gold standard, which was a solid state Zoom H4n recorder. We had 94 target words embedded in the carrier sentence "We say X again," elicited in randomized order from three native speakers of English; it was just three because we wanted to have exactly the same setup for everyone, so we were limited by who could come onto campus and sit at our little array of devices. The target words were chosen so that we could test a variety of parameters of different types: measurements of duration, of frequency, and of aperiodic noise, and later I'll show you the full list of all the things that we measured. We wanted to look at both the raw measurements and also the relative measurements, so in contexts where these characteristics are part of phonological distinctions, to see whether we can capture those distinctions, like f0 as a correlate of stress and onset voicing.

So here's what our array looked like in phase one. You can see the six different devices numbered here, so you can see which one is which. We had the H4n recorder; that's the one numbered one. We had two cell phones. We had one computer recording with the internal microphone and one recording with an external microphone, a headset microphone, so you can't see the headset here because the speaker would be wearing it, and the chair that you see in the front is where the speaker would be sitting.

In phase two, we took the recordings from the H4n solid state recorder (henceforth we're calling it the H4n recorder, to avoid confusion with the Zoom program) and played them through the sound card of one computer, to transmit them to another computer through each program, so basically the recording gets treated as if it were the input from an external microphone. So you have identical signals being transmitted through each of these recording conditions. It's not quite the same as having live speech, but we really wanted to make sure that we were capturing differences that were just due to the program, instead of potential differences from what microphone was being used as the input for different programs. The programs we were testing in this phase were Zoom, Skype, Cleanfeed (a program that often gets used for podcast interviews and is focused on having clear audio), and Facebook Messenger, which doesn't have built-in recording capabilities, so you have to record in the background; we were using Audacity for that, but we had a separate condition using Audacity alone, to see whether effects in the Messenger condition are due to Messenger or due to Audacity. This selection of programs was basically based on things that we personally know some people are using, which are freely available and could all be tested in the same way.

For the acoustic analysis, first we converted the audio files to uncompressed mono WAV files with a sampling rate of 16,000 Hertz, because it was important that they all have the same sampling rate and the same file type, so that differences weren't going to be due to that; different devices do have different sampling rates, and we already know what the effects of sampling rate are, so we didn't want to test that. So it's important to have everything be the same there.
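[Editor's note: a conversion step like the one just described could be scripted in many ways; the minimal sketch below is not the team's actual pipeline, just one way to standardize files to 16 kHz mono uncompressed WAV using the librosa and soundfile libraries, with hypothetical folder and file names.]

```python
# Sketch: standardize recordings to 16 kHz, mono, uncompressed WAV.
# Not the authors' actual script; paths are hypothetical examples.
# Loading compressed formats (e.g. m4a) may require an ffmpeg backend.
from pathlib import Path

import librosa
import soundfile as sf

TARGET_SR = 16_000  # sampling rate expected by the forced aligner


def convert_to_wav(in_path: Path, out_dir: Path) -> Path:
    # librosa.load resamples to TARGET_SR and mixes down to mono
    samples, sr = librosa.load(in_path, sr=TARGET_SR, mono=True)
    out_path = out_dir / (in_path.stem + "_16k.wav")
    # 16-bit PCM keeps the output uncompressed and widely readable
    sf.write(out_path, samples, sr, subtype="PCM_16")
    return out_path


if __name__ == "__main__":
    out_dir = Path("converted")
    out_dir.mkdir(exist_ok=True)
    for f in Path("raw_recordings").glob("*.*"):
        print(convert_to_wav(f, out_dir))
```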
And then this is also the sampling rate that's required by the Penn forced aligner, which is what we used to do segmentation. Forced alignment in this case is useful because you know that it's going to be systematic in what cues it's using. Manual segmentation done by human segmenters is slightly more accurate, but it's going to be more variable, because you don't have the same consistency in what is being used to identify boundaries every time. And when you have this sort of clear, slow lab speech, forced alignment is extremely accurate. Then we extracted measurements from our target words using scripts in Praat, and we did our statistical analysis using mixed effects models, with the fixed effect being either device (for phase one) or program (for phase two), with the reference condition being the H4n recorder. We had a random intercept for speaker, and for formant and center of gravity measurements we also had a random intercept for segment.

Of course, our set of measurements is not exhaustive. It is just a selection of things that we're trying to cover to see what types of issues we're likely to see, so you have measurements that are basically looking at duration, measurements that are basically looking at aperiodic noise, and measurements of frequency, and all three of these categories exhibit effects, though you do see differences in the specific measurements and how they get impacted. So I'm going to talk a little bit about what types of issues are likely to be influencing our measurements in each of these categories.

In many of our conditions, we see effects on duration, with consonant duration being underestimated and vowel duration being overestimated, and there are two main sources of effects on duration. Partially this is due to compression that's being used by various algorithms, and in some of the device conditions, based on what program is being used to make the recording on that device. Lossy compression can result in effects on timing: some parts of the signal are compressed more than others, and the decompression doesn't end up reconstructing the exact durations that you started with, so you do see effects of that, where you get non-identical durations within the recordings. But you also get effects that aren't directly altering the timing of things within the recording but alter our ability to measure it, because changes in the degree of background noise or the intensity of the signal can obscure the boundaries and our ability to identify them. That's what I'm illustrating here in these two figures. This is the same word "tug," where you can see that the final /g/ doesn't have a full closure, so it has some formants; this is actually me saying "tug." As recorded by the H4n recorder, you have this clear drop in intensity and you have that fairly early boundary between the vowel and the /g/. As recorded by the iPad, you have a lot more background noise, so you don't have that really clear boundary; it's more of a gradient, where the formants are changing and get slightly less intense, but it isn't quite as clear where you would want to put that boundary point, and it ends up getting put much later. So this isn't just an issue of forced alignment; this is the same sort of doubt that a human would also have when saying, well, where do I want to put that boundary? But some boundaries are clear, and you have a lot of agreement across the different conditions, like what we have between the initial /t/ and the vowel. So we do get variation that's due not to the actual duration of the vowel having changed within the recording, in this case, but to our ability to identify that boundary being obscured by differences in how much background noise there is.
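[Editor's note: for readers who want to see concretely how measurements of this kind can be pulled from aligned recordings, here is a minimal sketch using the parselmouth interface to Praat. It is illustrative only; the original analysis used Praat scripts directly, and the file name and segment times below are hypothetical stand-ins for forced-aligner output.]

```python
# Sketch: extracting duration, formant, HNR, f0, and center of gravity
# measurements via Praat through parselmouth. Illustrative only.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("speaker1_tug_16k.wav")   # hypothetical file
v_start, v_end = 0.412, 0.587                      # hypothetical vowel boundaries
v_mid = (v_start + v_end) / 2

# Duration is just the interval between aligned boundaries
duration = v_end - v_start

# F1 and F2 at the vowel midpoint
formants = snd.to_formant_burg(maximum_formant=5500)
f1 = formants.get_value_at_time(1, v_mid)
f2 = formants.get_value_at_time(2, v_mid)

# Mean harmonics-to-noise ratio over the vowel
harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
hnr = call(harmonicity, "Get mean", v_start, v_end)

# f0 maximum over the vowel
pitch = snd.to_pitch()
f0_max = call(pitch, "Get maximum", v_start, v_end, "Hertz", "Parabolic")

# Spectral center of gravity of a fricative interval (hypothetical times)
fric = snd.extract_part(from_time=0.05, to_time=0.14)
cog = call(fric.to_spectrum(), "Get centre of gravity", 2)

print(duration, f1, f2, hnr, f0_max, cog)
```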
The next thing I want to talk about is aperiodic noise. We have some effects that are caused by differences in how much background noise is being captured and how well the speech signal itself is being captured, as well as filtering that's meant to remove background noise or boost the speech signal. Even though you have these enhancement methods, like the ones Zoom uses, that are supposed to make speech clearer, they are changing what's actually going on in the signal, which means they're going to influence our phonetic measurements. These sorts of factors can directly impact measurements that actually involve measuring the degree of aperiodic noise, like the harmonics-to-noise ratio or the center of gravity. And they can also have indirect effects on how well target characteristics are identified, as I mentioned before for duration, but also for things like identifying formants or other periodic signals: the more noise you have in the recording, the less reliably you'll be able to identify those. So this has both direct effects and indirect effects, and you see it both in the device conditions, based on how much a given microphone picks up of different parts of the signal, and in the program conditions, based on what sorts of filters they have.

The next category is frequency. Formants in particular were impacted in several conditions, and this is likely to be a result of several different factors: partially lossy compression, but also filtering and how much noise is being picked up. Depending on how the compression system handles repeating waves, they might get over-regularized, or they might be obscured if they're not accurately identified. But you also get changes due to filtering, particularly if you have programs that are boosting or suppressing particular frequencies, which changes the intensity, and this can shift measurements of frequency: basically, if you have a higher intensity slightly higher in the bandwidth of a formant, it's going to change the measurement of that formant to be higher. And then of course changing the intensity of frequencies also directly impacts things like spectral tilt, which, as we'll see, is impacted in all of our program conditions.

This is a summary of all of our results for phase one. To read this table: each cell gives the estimate for that factor, so how much the measurements for that device, for that measure, differed from our baseline H4n recorder, and then we have stars to indicate the significance level. Only the significant results are included, so the empty cells are ones that didn't reach significance, but it's important to remember that that is basically just about whether we had enough data to find a significant effect. So don't look at those empty cells and say, oh, that was being perfectly captured; it just means that the effect was either more variable or smaller, not that these were identical. The ones where we did find significant effects were large and consistent.
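[Editor's note: the estimates and significance stars in tables like this one come from mixed-effects models of the kind described earlier. The sketch below is a simplified illustration in Python, not the authors' analysis code; the data frame, column names, and device labels are hypothetical, and unlike the original analysis it includes only a single random intercept for speaker.]

```python
# Sketch: one measurement modeled as a function of device, with the H4n as
# the reference level and a random intercept for speaker. Illustrative only.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format table: one row per token per recording condition,
# with columns: measure, value, device, speaker
df = pd.read_csv("phase1_measurements.csv")
vowel_dur = df[df["measure"] == "vowel_duration"].copy()

# Treatment coding with the H4n recorder as the reference condition
model = smf.mixedlm(
    "value ~ C(device, Treatment(reference='h4n'))",
    vowel_dur,
    groups=vowel_dur["speaker"],
)
result = model.fit()
# Each device coefficient is the estimated shift relative to the H4n
print(result.summary())
```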
So, you can see a few things that are worth noting. One is that the internal microphone had substantially more effects than the external microphone. Even though they're both recording on very similar computers, there is a difference in the internal microphone not being as good at capturing the signal, based on the directionality of the microphone, what frequencies it's sensitive to, and how much noise it's picking up from the computer itself, the computer fan, and so on. You can also see a difference between the two Apple devices, the iPad and the iPhone, which are very similar except that we used different settings for the recording, one of which is compressed and one of which is uncompressed, and we do see that compression results in more differences.

There are a variety of other things that I wanted to comment on, one of which is the center of gravity. You'll note that we have these huge effects for center of gravity, and partially that is because of our sampling rate: when you have a low sampling rate, a lot of the noise of the fricative is cut off, which is going to make these measurements a lot more sensitive to background noise. With a higher sampling rate you're still probably going to get effects on center of gravity; they just aren't going to be quite as huge as what you're seeing here, so it's worth keeping that in mind. Some of the other things of note: we have differences in the signal-to-noise ratio, which is largely about how much background noise is being picked up, but it's also worth noting that a higher signal-to-noise ratio doesn't necessarily mean that the recording is better, because you can have various alterations in the recording that change the signal-to-noise ratio while still altering the signal. Basically, if a program is boosting certain frequencies that are common in speech, that is going to make it look like we have a better signal-to-noise ratio, but that doesn't mean that it's necessarily a more reliable recording. One of the other things to note, particularly for center of gravity and the formant measurements, is that there's a lot of variation by the particular segment being measured. Basically, you're only going to get significant effects if you have a consistent effect, and when we look at the differences across different vowels, you'll see that different vowels were impacted in very different ways. So some of these conditions weren't actually accurately capturing formants; it's just that they didn't shift all of the vowels in similar enough ways to make it show up as an overall effect. Whoops, I'm not sure why it jumped right to the end.

Okay, so next I wanted to talk about phase two, so here's the summary of all of our phase two results, laid out in the same way, again only showing the significant results. You'll see that there are more significant effects across different programs than there were across different devices. The first column is the one where we just had Audacity: basically you take the recording and you play it through Audacity and re-record it there, so it's actually surprising that we see any effects of that at all, because it's not being transmitted anywhere.
But it does confirm that basically all of the effects that we see in Facebook Messenger are a result of Messenger itself and not an effect of recording through Audacity. You'll note that there were lots of effects, very large effects, through Facebook Messenger, so one of our takeaways is that that's not a good recording setup at all, if you can avoid it. But we also see a different set of effects across different programs than what we saw as being common across different devices. We see more effects in spectral tilt, which are probably related to some of these filtering and frequency-boosting effects. We see more effects in duration, which is probably a result of some of the patterns in compression, which can also change the measurement of other timing-related things, like where the f0 peak occurs. And you'll note that we don't see as many formant effects, but this is partially because the measurements across vowels are even more variable here. So part of the takeaway is just that there is a lot of variation based on the particular condition being used; it isn't that certain measurements are consistently always affected, but you have some differences in which measurements are affected in which conditions.

One thing that's also worth noting is that we tried comparing several different Zoom conditions, even though we ended up just reporting one of them for the comparison with other programs. It's worth considering whether you get differences based on what settings were being used within Zoom: was the recording local (that is, you're the one playing the recording and you're also the one recording it) or remote; was the computer a Mac or Windows; were the files converted from MP4 files or taken directly from the WAV files; and did you use the original audio setting or not (this is one of the options you get up in the corner in Zoom, and it's supposed to change how much filtering you get for background noise and echo cancellation). But there was very little variation between these different conditions, so it seems like most of what we're finding is really just an effect of the Zoom program itself, and not about particular settings or combinations with the device.

Next I want to talk about relative measurements, looking at correlates of phonological contrasts and how reliably we can capture those contrasts using these different devices: stress as reflected in vowel duration and f0 maximum; coda voicing as reflected in vowel duration and the harmonics-to-noise ratio; onset voicing as indicated in harmonics-to-noise ratio, spectral tilt, and f0 maximum; vowel categories as indicated in F1 and F2; and fricative identity as indicated in center of gravity. Most of these contrasts were captured in all conditions. So even when the raw measurements were substantially shifted in different conditions, you still mostly were capturing these relative values, but sometimes the size of the difference varied across conditions, and there are some contrasts that weren't captured in some of these conditions. First, I wanted to show just one example of things that were captured. This is looking at f0 maximum as predicted by onset voicing: you get a higher f0 after voiceless consonants than after voiced consonants.
And this was captured in all of the conditions; all of them found this difference, and there weren't any significant interactions between onset voicing and the condition, either the device or the program. Though there is some variation in the size of the effect, like we have a smaller difference as measured in the Android condition, that is our second device here, than in the H4n recording, all of them have a clear separation. So mostly, if all you're interested in are the relative measurements, to say whether this is a correlate of a phonological contrast, that's mostly being captured, and this is just one example. Most of the other phonological measurements showed this same sort of pattern, so I'm just giving this one example, and then I'm going to focus on the measurements that actually did have significant interactions and are more of a concern.

So, moving on to center of gravity by fricative. We measured several different fricatives, and this is looking at the interactions between the particular device or program being used and what the fricative was, and we see several major interactions. As I mentioned, partially this is exaggerated by our sampling rate, such that we're getting a lot more sensitivity to background noise, but we have some very, very large effects. Either you end up with two fricatives where you don't find any difference at all in some conditions, so basically they're measured as being right on top of each other even though they're separate in other conditions, or you have ones where the measurements are even flipped, such that one of them is measured as having a higher center of gravity in one condition and in a different condition the other one is. So there are some major differences in center of gravity; with a higher sampling rate you're likely to get somewhat smaller effects, but still effects.

Next I wanted to talk about vowel spaces, first looking at vowel spaces as measured across the different devices. Here the different colors are the different devices, and each of the vowels is marked with its IPA symbol, and there is an interaction: including the interaction between device and vowel significantly improved the models. But you do still basically have a recognizable vowel space for all of the conditions, and you can look at the little clusters for each of the speakers: you really do have pretty similar measurements across the different device conditions, but you'll see that sometimes they're a little bit spread out, they aren't all right on top of each other, so you get a fair bit of variation, and notably it varies by the particular vowel; it isn't all shifted in the same way. Overall, mostly you still have separate vowels where you want the vowels to be separate, even though the particular measurements of the formants characterizing a given vowel aren't identical.

And then looking at the vowel space as measured across different programs, you'll note that this looks a lot worse: the vowels are really spread out within each category, where they all should be similar, and you end up with a lot of overlap between categories. You get these shifts that are vowel specific, so it isn't just that a given program consistently overestimates F1 or underestimates it; basically it just depends on what sort of filtering or amplification you get and how that aligns with the frequencies for that vowel for that speaker, which is why you get so much of this variation. But basically a lot of the contrasts wouldn't be retrievable here; the vowels are hugely shifted, particularly in the Facebook Messenger condition, but even in the other conditions we see a lot of vowels substantially shifted away from where we expect the measurements to be and into the realm of other categories. So basically you would not be able to accurately capture all of the different vowel qualities and their contrasts using many of these programs, and you might end up with a very inaccurate idea of just how many vowels exist for each speaker, not just issues of what the realization of each vowel is.
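[Editor's note: a vowel space figure of the kind described here can be drawn with a few lines of plotting code. The sketch below is not the authors' plotting code; the data frame and column names are hypothetical.]

```python
# Sketch: an F1-by-F2 vowel space colored by recording condition,
# with points labeled by their IPA vowel symbol. Illustrative only.
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical columns: vowel, f1, f2, condition, speaker
vowels = pd.read_csv("vowel_formants.csv")

fig, ax = plt.subplots()
for condition, grp in vowels.groupby("condition"):
    ax.scatter(grp["f2"], grp["f1"], label=condition, alpha=0.6)
    for _, row in grp.iterrows():
        ax.annotate(row["vowel"], (row["f2"], row["f1"]), fontsize=8)

# Phonetic convention: high front vowels appear at the top left
ax.invert_xaxis()
ax.invert_yaxis()
ax.set_xlabel("F2 (Hz)")
ax.set_ylabel("F1 (Hz)")
ax.legend(title="Recording condition")
plt.show()
```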
So then I wanted to say a little bit about what to take away from these results. Both different devices and different software affected the phonetic measurements, sometimes substantially. On the other hand, when we're looking at relative measurements, to look for acoustic correlates of phonological contrasts, these generally remained clear in most conditions for most contrasts, but there are some contrasts that were exaggerated or underestimated or not captured at all in certain conditions. So it's important to think about what the particular thing is that you want to measure with a particular data set, and whether that's going to be reliable given what devices or programs are being used. This is a major concern for any data that's gathered remotely, or gathered in person in different ways, for things like asking participants to record themselves with whatever device they have at home. When we went about this project, mostly we were thinking about it from a fieldwork perspective, but it's also going to be a concern for experiments or corpus work or typological work: basically anything with speech recordings where the recordings might have been made in different settings, by different devices, or by different programs.

So the first thing that's really important is just to always document the recording setup in as much detail as possible: what microphone was used, what program was used, and what the settings were for the program, if the program allows multiple settings. Sometimes they have a compressed setting and an uncompressed setting. Sometimes it doesn't seem to make a difference, which is what we found with Zoom, but for other programs it is likely to make a difference, so it's always better to have more information than less, because then, with that record, we can be appropriately cautious in saying, well, we found some of these effects: are these things that we can attribute to the speaker or to effects of the language, or are those confounded with recording conditions, such that we can't actually evaluate whether there are individual differences or not, or whether there are differences between languages or not?
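[Editor's note: one lightweight way to follow this documentation recommendation is to store a small structured record of the setup alongside each recording. The sketch below is just one possible shape for such a record, with hypothetical field names and example values; it is not a schema from the paper.]

```python
# Sketch: recording-setup metadata saved next to the audio file.
# Field names and example values are hypothetical.
from dataclasses import dataclass, asdict
import json


@dataclass
class RecordingSetup:
    device: str
    microphone: str
    software: str
    software_version: str
    software_settings: str     # e.g. compression or "original audio" options
    sampling_rate_hz: int
    file_format: str
    environment: str           # room, background noise, connection quality
    date: str


setup = RecordingSetup(
    device="laptop (Windows 10)",
    microphone="external headset microphone",
    software="Zoom",
    software_version="5.6.1",
    software_settings="original audio enabled, local recording",
    sampling_rate_hz=32000,
    file_format="m4a (lossy)",
    environment="quiet home office, wired broadband",
    date="2021-03-15",
)

# Save the record so the setup documentation travels with the data
with open("session_012_setup.json", "w") as f:
    json.dump(asdict(setup), f, indent=2)
```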
If you have these differences across different devices or different programs for different data sets, like if you're doing typological work, then it's hard to actually establish: is this an effect of one language versus another, or one speaker versus another, or is it just that we used one device for this language and a different device for that other language, such that the results might be due to that?

We also talked about the comparability of different recording conditions. We set up our recording conditions across different devices to be as different as possible, so we selected devices with very different settings, different types of microphones, and so on, but if you select more similar devices you can get more similarity across them. So you can think about how much of an effect to expect when comparing the two recording conditions that you want to compare, and also, specifically for the particular measurement that you're interested in, how much that is influenced by the different conditions being used.

We have some general recommendations. If you can, use the same recording for making comparisons, because you can make a lot of reliable comparisons within a recording, and it gets much less reliable when you're comparing across different recordings with different setups. It's also better to use the same setup when making multiple recordings that you want to compare to each other, or at least very similar setups if you can; even if you have different people in different places, you can make sure that they're all using external headset microphones, for example, rather than having variability with some of them using internal computer microphones, some using phones, and so on. It's important both to think about your own data and how comparable those results can be, but also, if you want to compare your results to someone else's data, it's important that you know how both of those sets of recordings were made, so you can think about what similarities or differences you're going to have based on differences in the setup. If you're using virtual recording, you might want to consider testing the setups that are being used: if you know that you have multiple consultants or participants who are using slightly different setups, some using their phones, some using their computers, you might think about doing something parallel to our tests and actually saying, all right, we're going to set up all of these devices and see how the measurements we're making are going to be impacted.

Then some specific recommendations. First, avoid compression whenever possible. This has been said before, but it's worth repeating: use lossless formats. Sometimes it's not immediately obvious that some devices or programs will default to compression, to lossy formats, just because it makes for more convenient, smaller file sizes. Use external computer microphones rather than the internal microphones; even if you have a relatively new computer with a high quality microphone, you still get the differences that I mentioned before about sensitivity to different frequencies, about directionality, and about how much background noise it's going to catch. And, particularly on hot days, if you have the computer fan running, the internal computer mic is going to pick that up.
Using in-person devices is going to be preferable to using video conferencing software; at least, we didn't identify any video conferencing software that seemed reliable enough that it would be preferable to use it rather than recording on in-person devices. In particular, you can reduce the differences across devices by using very similar devices, and you can reduce a lot of these effects just by making recordings in person and then having them sent to you by the people who made them. If you do end up needing to use video conferencing software, it's really important to use the same program. For example, Skype and Zoom had slightly different effects: if you look at the specific alterations that we see in measurements of the different formants or the other characteristics, you don't have the same shifts in each of the conditions. So it's important to use the same conditions, such that you'll at least get comparability within your own data, even if it isn't necessarily going to be comparable to the results that someone else has in their own data set, collected in different ways.

Some other factors to consider: we only tested a small sample of conditions, which are far from being all possible devices and software that might be used, so that's why we very much encourage everyone else to test their own setups, either ones they have used or ones they're now using, in order to get a better sense of what sorts of variation we're going to find across different devices and different programs, and to what extent that's going to be consistent or interact with other factors like the particular speaker. We also only looked at English, so you might get different results in different languages that have a different inventory of phonemes or different sets of contrasts. In particular, you might get differences if the noise reduction algorithms in particular programs have been trained on English speech data: they might alter non-English languages more than they alter English, based on having a particular set of expectations about what speech looks like and what gets categorized as non-speech noise. That's not something that we tested, but it is a potential concern. It's also worth noting that all of our virtual recordings were run on stable high-speed internet connections, which basically has made the effects of the programs as small as they might possibly be, because we really wanted to look at effects of the program itself rather than other factors; if you had slower connections, that is going to introduce a lot of additional issues and much larger effects than what we observed here.

Just to conclude: all of our tested recording options do distort the signal in some way; they alter what results we get, which is worth keeping in mind and really thinking about whenever you have data collected in a variety of ways. It's important to think about what the effects of all of these factors of the setup are. And it does vary a lot by the characteristic being measured, so even if you have a setup that's going to be accurate for one characteristic, it might not be accurate for other characteristics, and it's important to think about both what particular setup you're using and also what characteristics you might want to measure, or what future researchers might want to measure, in the data that you're making. But also, on the good side, most phonological contrasts are captured reasonably well.
So even when the raw measurements are altered, often you basically have a systematic alteration within that condition, such that you can still make relative measurements within a recording, and many comparisons can still reliably be made within a recording, though not all of them, so you don't want to just assume that all phonological work is still going to be possible. But relative comparisons are going to be more reliable than just the raw measurements. And then, just to end with: if you want to look at all of our data in detail, we have the paper and the supplementary materials available on LingBuzz, and I've given the link here. You can look at that if you want to examine things further; we have our summary tables, but also each of the individual models and figures illustrating each of the results for the individual measurements that we made. So, thank you. I will end there, and I can answer any questions.

Thank you so much, Chelsea, that's very clear and very helpful, and the results are quite striking on just what kind of effects you get through the software. So we have time for some more discussion or questions you may have. If you'd like to ask a question, you may use the raise hand function in Zoom, or in the chat you can just write the word "question" and I'll call on you. Otherwise, if you maybe don't have a stable connection yourself and want to write out your question in the chat, you can do that as well and I'll read the question for you. While waiting for those questions to come in, maybe I'll just ask Chelsea about one thing that we discussed briefly before the talk. I saw that there was a similar paper on LingBuzz, I think it's now published in the Journal of the Acoustical Society of America, comparing acoustic analysis of speech data collected in different ways, and I'm wondering if you would just comment on some of the similarities and differences between your study and theirs; I'll put the link to that in the chat for anybody who wants to see this other paper as well.

Yeah, definitely. We see some similarities: they measured f0 and F1, F2, F3, and we see similar results for f0. We see some differences for measurements of formants, and I think that can largely be attributed to, well, I guess two things. One is that they used a different set of vowels, and, as I mentioned, our overall effects for measuring each formant depend on the particular vowels that you have, so using a different set of vowels is going to result in different overall effects. They did look at Zoom, and they see different overall effects; I think F1 turned out the same, but F2 turned out differently, so partially it's just thinking about what the overall effects are versus what the effects are by vowel, and how just using a slightly different set of vowels is going to change your overall effects. The other difference is that we had our transmission of the existing recording from the H4n recorder, so that you had exactly the same input to Zoom that was identical to the base recording.
In their setup, they did concurrent recordings, I think with an internal computer mic, but in any case, basically you're getting two effects in their results: one is what Zoom is doing, and one is the effect of the particular microphone, whereas ours is just Zoom alone. So it's a slightly different set of things, but it's useful to also look at how these combine: is there a difference between making a recording with Zoom using our method, where you're sort of pretending to have an external computer microphone that is really just the sound card playing the H4n recording, versus running Zoom and making a live recording?

And they also see effects, so it's not like they discovered the magic way of solving this problem; they've seen similar effects, is that right?

Right, yeah, they do also find effects, and there are just some differences based on the particular parameters. They also looked at doing the analysis with Praat versus VoiceSauce and find additional effects of that, so there are also effects of what analytical tools we use, and how sensitive those are going to be to things like additional noise and what can be captured despite the noise.

Okay, thanks. Tim, do you want to ask your question?

Hi, thanks for the interesting talk. I've heard from a few colleagues already that they've been collecting data, not necessarily for phonetic analysis, through WhatsApp, because some of these messengers are better than others, in the sense that you press a button, it records a piece, and then it sends it to you afterwards, so you avoid this effect of the unstable internet connection; and also in some remote field sites people basically don't have access to computers and Zoom and all of that. WhatsApp is better than Messenger, for example, because Messenger allows only one-minute-long recordings, and so on and so forth, so maybe some are better than others. But you didn't test that, so do you know anything about where you would place WhatsApp within your results, or WhatsApp or WeChat or all these other messengers, considering that you're recording with most likely the internal microphone of a relatively cheap mobile phone?

Yes, that's something we didn't look at, but which I hope someone does look at, because it would be really interesting to know. As you say, there are going to be two potential effects: one is what the program itself is doing, and the other is what effects you're going to get from these recordings being made through mobile phones, so you can look at our data to get a hint of what sorts of effects you're going to get across different phones. But I don't know what sort of compression codec or anything WhatsApp is using, so I can't say how it might behave; that would be something that someone should look at, because it would be useful to know. I do know that, yes, that does get used, and it is potentially useful in that it avoids the potential effect of the internet connection, which is something that we didn't address but is a really important thing to keep in mind, because you can't assume that everyone has a high-speed internet connection such that you can get these clear recordings that aren't impacted.

Thanks, Chelsea. Thanks, Tim.
I had a general question. This is an issue that comes up for anyone working in language documentation as well: the fact that this technology, both the recording equipment and the software and internet access, is just changing year to year, and what might have worked last year might turn out not to work next year, or there's a new device that you'd never heard of that just came out. So do you have any recommendations for how your average linguist would stay on top of what's changing from year to year, and how to know, two or three years from now, whether these results still hold or whether things have improved or changed? How would we as a field stay in touch with what's going on with the technology?

I guess the main thing there is just, as long as you're documenting exactly what you're doing, then that's something that you can refer to; if you're looking at someone's results from ten years ago, say, you know what things looked like when they made them. But that's something that I guess we didn't really think about, in terms of whether the effects of Zoom today are going to be the same as the effects of Zoom in two years, when they've updated whatever process of filtering they have. That is a really important concern, but it demonstrates one of the things, which is that I don't think you can use our results to say, oh, here's the adjustment that we need to make for Zoom and then everything's going to be reliable; basically it's just that Zoom is not super reliable for these things, and we need to keep that in mind, which is likely to be somewhat stable over time unless there's some major update where suddenly something introduces a new filtering system. Yeah, I guess the bad news there is probably that people should keep doing work like this, to say, we found this in the past, is it actually still true, or have there been various updates? It's particularly hard because a lot of these programs don't actually tell you exactly what their codec is doing, so you can say, well, Zoom is doing a whole lot of things that alter the signal, but they don't have a web page that details exactly what it's going to do; they just say, oh, it makes things nice and clear for you as a listener, not, for you as a phonetician, what it's going to do. So it's actually hard to keep track of, because we don't get all that much information from these different programs, but it's certainly something to keep in mind as we look at things over time. There's some work that's looked at how you compare analog recordings that people were making on cassette tapes 50 years ago to the digital recordings that we're making now, and what sorts of effects there are when we have these major shifts in technology; so there's some work that's looked at that, but less work looking at shifts within a technology, as you have a program that's been around for 10 or 15 years, and at what point the updates in that program also substantially change what it's doing. So that's a really important point to keep in mind: basically, we need to document everything in as much detail as possible, to have as much information as we can to evaluate what's going on just as an effect of the setup.
And I suppose, as you say, their goal is never going to be to create good recordings for phonetic analysis, so we can't assume it's ever going to be perfect for what a phonetician would want to do with it. I think we have another question; you have a question?

Yes, thank you. Hello, I'm joining from Thailand, and thank you for your very interesting talk today. From your talk you have shown that phonetic details vary according to devices and also software, and so my question is: is it possible to normalize the acoustic values across devices and software?

So basically, to establish what sort of effects there are by device and then try to apply that to make the results comparable? I think not, because there's a lot of variability. Partially it's that there's a lot of noise introduced, and then there's a lot of sensitivity to the particular thing being measured; formants, for example, definitely depend on not just the particular vowel but the particular speaker, so it's about where that particular formant is, and there's no way to normalize that and correct for it with "here's the adjustment for Zoom, here's the one for Skype," and so on. You also get a lot of variability: there's the effect of the particular device that we used, this particular Android phone recording with this particular program, but you have potential sensitivity to how old the device was, how good its microphone was, and whether you would get the same result if you used a different program. So you have so many combinations of factors, both based on the setup not being quite the same, but also based on what parts of the signal it's sensitive to, that I think we don't want to jump to thinking that we've identified exactly what the problem is or what the differences are and how to correct for them. I think mostly it's at the level of: there are these problems, we know that we have these differences, and we just need to be aware of that and try to use conditions that are as similar as possible. Maybe at some point we'll get enough data that we know exactly what's going on and how to adjust for the different devices and programs, but we certainly aren't there yet. Thank you for the question.

Thank you very much.

Peter Austin has mentioned in the chat, wondering whether different language versions of Zoom show different effects. I guess that would be assuming that Zoom has different filtering algorithms depending on what language it expects to be used, and I don't know if their technology is that advanced.

Yeah, that's something we didn't test, but it's something we wondered about, so we did glance around to see whether Zoom actually gives us information about whether it has different algorithms for different languages, and how it identifies what language you're using.
And it wasn't clear; that information might be out there somewhere, but we don't have it. It's certainly something that would be interesting to know, because it certainly would matter if you want to do work on different languages that either will or won't be identified by Zoom, to say whether these effects are going to differ from what we get for English. So it certainly is something that might be a concern and is worth thinking about, though I don't actually know to what extent it's a concern; so, another thing in the category of "someone should test this" and see to what extent the results are similar.

Great, thanks. Are there any other last questions or comments before we end our session? The paper is already available as a preprint online, with further information also coming out in Language later this year, and the other paper that I linked to in the chat can be read for more details. And Chelsea, is there a good way for people to contact you if they have specific questions? Maybe there's a contact you can put in the chat, in case anyone has a particular question that they'd like to follow up on. Otherwise, I don't think we have any further questions, so I'll just say thank you to everyone for coming, thanks to those of you who asked questions, and of course to Chelsea and all the colleagues who worked with you on the study, for really doing this service to our field as a whole; we really appreciate this work and your willingness to share it with us today.

Well, thanks for this opportunity, and thanks everyone for coming.

All right, thank you.