Computational Audiology Network: complex models, technical care, digital health and patient-centered outcomes.

Welcome everybody, I'm really excited to record this second episode of the Computational Audiology Network podcast, and I'm really happy with today's guests. Jessica Monaghan, she's working for NAL, the National Acoustic Laboratories in Sydney. Good morning Jessica. Good morning. And Nicky Chong-White, she's also working for NAL in Sydney. Good morning Nicky. Morning. And thank you Jessica and Nicky for spending, or sacrificing, your weekend on this podcast. And Dimitri Kanevsky from Google, you're stationed in New York, so good afternoon Dimitri. Good afternoon. Good to see you all.

Before we start this interview I would like to explain to everybody at home the system we are using, so I've prepared a short statement, a disclaimer. You're witnessing a recording of an interview that was prepared as an experiment using automated speech recognition, that is, a system that translates speech to text, all live. One of the participants, Dimitri Kanevsky, is deaf and he reads the transcript; he needs this to follow the discussion. The other participants are normal hearing, and we all need to take time to read the transcript and confirm that we understand each other. That's something new for us to take into account: I'm used to listening to somebody and sometimes reading a transcript afterwards, but not talking myself while reading the transcript. So we'll see how that will work out. We are using Google Meet and Google Relate; Relate is a prototype system, not yet publicly released, and it's been specifically trained on Dimitri's speech. In addition we are in different time zones: Jessica and Nicky are 10 hours ahead of us and Dimitri is 6 hours behind, so we are 16 hours apart. And we haven't met in person before, which can sometimes be a barrier. English is not my first language, nor Dimitri's, so that might be a challenge for the speech recognition system as well. Let's hope that technology will not fail us. There will be a video recording and an audio-only recording, and the edited video recording will also include the transcript of what is said by Dimitri. I guess for people at home the final recording may look different from how it's experienced live. I was really glad that we were able to practice this a little bit.

I would like to continue with introducing the first guest, Jessica Monaghan. Jessica, I think we met two years ago, if I remember well. You remember the VCCA?
Yes, I remember I gave the first talk.

Yes indeed, and I remember I was a little nervous, and I think your video clip didn't start up right away, but you kept your cool, and I thought, I guess it will bring good luck today as well. So you work as a research scientist in Sydney, with a special interest in machine learning in audiology. You studied physics in Cambridge in the UK and received a PhD in Nottingham, and then you continued working as a research fellow in Southampton. Your work is focused on speech recognition and how to improve this in case of hearing loss, and you shared that you recently studied the effect of face masks on speech recognition. Jessica, could you explain to us your initial interest in ASR, automated speech recognition?

Thank you. So I started researching using ASR as a master's student in Roy Patterson's lab in Cambridge, and that was also my first experience of research and my first introduction to using machine learning, so that was something that I found really fascinating. There I was working on a project trying to improve the robustness of speech recognition to different talkers, by using a human auditory model as a front end, to try and give it the same robustness to different speakers. When a human hears someone talk, they hear the same thing no matter who is saying it, despite the different acoustics of the situation. But at the time automatic speech recognition had to be trained a bit more for individual speakers. So that was really interesting, and I worked in other areas for my PhD and post-doc, still looking at machine learning, but I always retained this interest in automatic speech recognition. And then I started working at NAL; that was in 2020, just at the start of the pandemic, and so we were seeing the impact of face masks and barriers on communication, particularly in clinics. So we did this research looking at how face masks impacted speech, and how we could apply a particular gain to try and improve understanding for hearing aid users. And with the ubiquity of ASR by that point, and having it on different devices, it was clear to us that it could be used to aid communication. So I was really excited to work on NALscribe with Nicky, and she'll probably talk more about that.

Yes, I'm sure that Nicky will further explain NALscribe. And I just wondered, with this effect of the face mask, is it then more acoustics, like filtering, or does it have an effect on your articulation?
Apparently there isn't much effect on your articulation; it really is just the filtering effect of the mask.

Because I experienced sometimes that if you wear a mask, your chin is pulling your mask more or less from your nose, and then you're maybe not articulating that well. But you didn't see that effect?

No. In fact, surgical face masks, for instance, don't have much effect on the acoustics, even though they're constricting your face in the same way as other masks. So yeah, it seems to be just an acoustic filter.

So it was primarily a gain, or some compensation, that you could then build into your system?

Yes, that's right, so you could apply it as an additional gain for hearing aids. That would be a setting that users could change to a mask mode when they needed it.

Ah cool, so you've applied it to different devices, both to NALscribe and to hearing aid prescriptions?

No, we haven't applied that to NALscribe, but we did some tests and found that it worked quite well with masks nevertheless.

Ah okay, I thought that it was also applied to NALscribe, but it's in the hearing aid devices and the rehabilitation that you applied it.

Yes, that's right. We considered applying it to NALscribe, but since you're able to be quite close to the microphone, whereas the calculations were done assuming the talker would be at some distance from the listener, we found it wasn't really necessary.

Okay, good to know.
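As a rough sketch of the compensation idea just described (not NAL's actual fitting algorithm; the per-band attenuation values below are invented purely for illustration), a mask can be treated as a fixed frequency-dependent loss, and a "mask mode" as the inverse gain applied per frequency band:

```python
import numpy as np

# Illustrative only: published measurements differ by mask type;
# these band edges and attenuation values are made up for the sketch.
BAND_EDGES_HZ = [0, 1000, 2000, 4000, 8000]   # band edges in Hz
MASK_LOSS_DB  = [0.0, 2.0, 4.0, 6.0]          # assumed mask loss per band

def mask_mode_gain(freqs_hz):
    """Return the compensating gain (dB) at each frequency:
    simply the inverse of the assumed mask attenuation."""
    gains = np.zeros_like(freqs_hz, dtype=float)
    for lo, hi, loss in zip(BAND_EDGES_HZ[:-1], BAND_EDGES_HZ[1:], MASK_LOSS_DB):
        gains[(freqs_hz >= lo) & (freqs_hz < hi)] = loss
    return gains

def apply_mask_mode(signal, sample_rate):
    """Boost the spectrum by the per-band compensation gain."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum *= 10 ** (mask_mode_gain(freqs) / 20.0)  # dB -> linear
    return np.fft.irfft(spectrum, n=len(signal))
```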
And then I guess this is a good moment to get over to Nicky. Nicky, you led the development of the app. Previously you studied electrical engineering at the University of Auckland in New Zealand, and you received a PhD in speech signal processing at the University of Wollongong in Australia; the system recognized at least the name. Then you worked as a DSP engineer with several research organizations, including the Motorola Australian Research Centre and AT&T Labs. I see that you hold 10 patents, and you were the lead developer of NALscribe, a live captioning app to help people with hearing difficulties. Could you explain to us why you started developing this app, and how your interest in speech recognition started?

Yeah, thanks for the introduction. I'm impressed that it got Wollongong right; someone must have programmed that, I'm sure. Yes, I did my PhD in speech signal processing at Wollongong University, and that was probably my first introduction to digital signal processing techniques to analyze speech features and find efficient parameter representations of speech. Even though during my PhD I was focused on speech analysis and coding, and a little bit of synthesis, it's all the same methods that are used in speech recognition, so it was quite a strong foundation. After that I worked at AT&T Labs, and I think one of my first meetings there was a presentation by a group who had just recently released a new intelligent voice response system which, behind the scenes, we called How May I Help You. An AT&T customer could just ring up on the phone, and instead of being met with an automated, robotic system that said press one for accounts, press two for products and services, you just had another automated voice that said, how may I help you? And the customer could speak naturally and say, I'd like to pay my bill. That was quite mind-blowing at the time; this was the late 90s, early 2000s, and I think it was the first system like that. And I think that really inspired me to delve more into speech recognition and natural language understanding.

So yeah, fast forward 20 years, we're working at NAL, and with the pandemic we saw new opportunities where we could revisit what we'd done previously in speech recognition; not so much develop it further ourselves, but ask how we apply that speech recognition technology and package it in an easy way for people to access. We were doing a lot of user research on the problems people were finding with communication, with masks especially, and that was when we thought there's a real opportunity here to produce something that can really help people and make a difference. Because we were discovering people had really strong negative emotions when they were trying to communicate: frustration and embarrassment and anxiety. People who didn't want to go out, who were staying home or avoiding social interactions because of all those communication difficulties. So that was the motivation behind NALscribe.

Wow, and also impressive how much has happened in 20 years of improvement in these systems. So if I understand well, the focus has been more on the last part, the design: how to make people use the technology, or how you translate it into benefits for the people using it?

Yeah, that's right. Amazing researchers, like the people at Google and Apple and Microsoft, have done all the hard yards and collected all this data, so that we now have these more sophisticated training methods. We're not just using our own acoustic models and a little speech corpus to work with; it's millions of hours of speech in real situations, from YouTube, from phone calls, from everything. Our focus at NAL is how we turn that into something that can help people.

Thank you Nicky. And I think we cannot wait any longer to listen to Dimitri, who has done quite some work that, I guess, was important preparation for the later work done by Nicky and Jessica. Dimitri, you work as a researcher at Google. You lost your hearing in early childhood. I understood you studied mathematics in Moscow and also received your PhD there, and then you worked at various research centers, including the Max Planck Institute in Bonn in Germany and the Institute for Advanced Study in Princeton in the USA. Then you joined IBM in 1986, and I think you've been working for more than 25 years in speech recognition, if I'm correct, Dimitri. And then you joined Google somewhere in the last five years, I think; I don't know the exact date. I saw that you developed Google Live Transcribe and Google Relate, the systems we are using today, but you also worked on other technologies to improve accessibility. In 2012 Dimitri was honored at the White House as a Champion of Change for his efforts to advance access to science, technology, engineering and math for people with disabilities, and Dimitri currently holds over 295 patents. I hope this was captured well, and Dimitri, I'm really honored to have you here on the show. I wondered, with your motivation to work on speech recognition: when you decided to study mathematics, did you already have the ambition to work in this field?
I had no intention to do speech recognition when I did math. My dream was to work forever in mathematics. But after I received my PhD, at that time it was the Soviet Union, my family and I decided to emigrate to Israel. I lipread very well in Russian, but I realized I would not be able to lipread so well in Hebrew and in English. So while I was waiting for permission from the Soviet Union to emigrate to Israel, for about 10 months I learned electrical engineering and developed a wearable device that had two channels: one channel just for low-band audio amplification, while the other transformed high frequencies to low frequencies, so that I could perceive the high frequencies. In Hebrew you have a lot of them: shalom, shabbat... I took this device to Israel. I got a grant from the Israeli government, and there were startups, and this device had a lot of impact; it was the first wearable device of its kind. But I continued to do mathematics.

Then, when I went to America and worked at the Institute for Advanced Study, it was very difficult, because there were no transcription services at all in America. I decided that I should work on speech recognition. It meant a temporary break from mathematics, but it would give me means of communication, and instead of the most abstract mathematics I could do practical applications. I thought that in five years we would develop speech recognition technology that would solve all my problems and other people's problems. Five years passed, and the next five years, and for the next 35 years we developed good algorithms; progress was significant, but still not enough to be used for communication. Then I found a way to move to Google, and finally our team achieved very good practical accuracy, and I moved from New York to California to develop practical applications. This is my story.
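A minimal sketch of the two-channel idea Dimitri describes, assuming a naive FFT-based downward shift; the split and shift frequencies are illustrative, not a reconstruction of his actual device:

```python
import numpy as np

def transpose_high_to_low(signal, sample_rate, split_hz=2000, shift_hz=1500):
    """Toy two-channel transposer: the low band is kept and amplified;
    the band above split_hz is shifted down by shift_hz and mixed back
    in, making high-frequency speech cues audible at low frequencies."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)

    low = np.where(freqs < split_hz, spectrum, 0)    # channel 1: low band
    high = np.where(freqs >= split_hz, spectrum, 0)  # channel 2: high band

    # Shift the high band down by a fixed number of frequency bins.
    bin_hz = freqs[1] - freqs[0]
    k = int(round(shift_hz / bin_hz))
    shifted = np.zeros_like(high)
    shifted[:len(high) - k] = high[k:]

    # Amplified low band plus the down-shifted high band.
    return np.fft.irfft(2.0 * low + shifted, n=n)
```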
And now you went back to mathematics?

I started to try to do mathematics again, to solve a 50-year-old mathematical problem that my advisor gave me 50 years ago. And finally I had time to finish it.

Great. So the problem of speech recognition was a little bit underestimated; you needed much more time to solve it, and we are all benefiting from it today. I'm glad to hear you now have more time for your other passion, mathematics. But still, is there mathematics, and maybe the tools that you developed there, in the machine learning or the data analysis that is needed for speech recognition?

First, I did work as a mathematician in speech recognition: I developed a new optimization algorithm. I don't know if you have heard about the Baum-Welch algorithm for hidden Markov models. It works only for polynomial functions, for maximum likelihood. Nobody knew how to extend it to different kinds of objective functions, like maximum mutual information. My contribution was that I discovered an algorithm called Extended Baum-Welch: it extended this efficient quasi-linear algorithm to different types of objective functions, and it allowed significantly improved speech recognition algorithms. But now I am trying to apply my abstract mathematics, algebraic geometry and number theory, in machine learning. I am trying to develop a new kind of machine learning that is based on more abstract mathematics.

Impressive, and I must admit I am not able to fully grasp and appreciate how you have been able to do this. So I expect that you have been working in a team with many different specialists, for instance for developing Google Live Transcribe?

Absolutely correct. Google Live Transcribe became possible because I had a very remarkable co-worker, my friend Chet, who was compassionate about the difficulties I had. At that time I used only manual transcription services, from stenographers. And I told Chet that speech recognition was already good enough: you already had speech recognition in Google Docs, but you needed to click on the microphone each time you wanted to speak, and it immediately stopped if you paused. I could not use this for conversation. So Chet ported this system to Android and gave me the first prototype, and we started to polish it, tested it with users, added many languages, and Live Transcribe was born. We got a few more very talented people, like Sagar, who was the product manager, and a big team of software developers who implemented it. We also got sound notifications, so you could see dog barking, baby crying. It gets more and more wonderful features: for example, now we are adding offline speech recognition. Before, Live Transcribe required a data or Wi-Fi connection; now Live Transcribe has offline speech recognition, and it is going to be rolled out to the public soon. You can use it while you are losing your connection; you can use it in India, in Africa, where there are no good network connections.

Wow, that's a really important development, also the robustness of this system with low connectivity. And Jessica, I remember you also had a question for Dimitri that maybe fits in nicely after his explanation so far.

So I was wondering, in your experience of using ASR and developing it, did you see a gradual improvement, or was there a particular step change that you can recall in the accuracy?

Yes, two factors. There was a significant improvement of accuracy from neural networks, of course; computers became powerful enough to process them faster. And the second factor was that we got unlimited data for training from YouTube. This was actually my basic work at Google: YouTube has manually uploaded captions, but manually uploaded captions have a lot of errors, so you could not use them directly for training speech recognition. So we built a filter that detected, with high probability, which segments had good manual transcription. And we got so much data that in one day we suddenly improved speech recognition by many, many percent. I remember that before, you could spend several years and we were happy to improve speech recognition accuracy by a quarter of a percent; suddenly it was a five percent improvement, and another five percent improvement, every day. It was exciting for our team.
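For reference, the update rule usually cited for the Extended Baum-Welch algorithm Dimitri mentions, in its standard form for a discrete probability parameter set $\{p_{ij}\}$ (each row summing to one) and a general objective $F$ such as the maximum mutual information criterion:

$$\hat{p}_{ij} \;=\; \frac{p_{ij}\left(\dfrac{\partial F}{\partial p_{ij}} + D\right)}{\sum_{k} p_{ik}\left(\dfrac{\partial F}{\partial p_{ik}} + D\right)},$$

where $D$ is a damping constant chosen large enough that the objective does not decrease; when $F$ is the log-likelihood, this reduces to the classical Baum-Welch re-estimation.

And a toy sketch of the caption-filtering idea: run an existing recognizer over each captioned segment and keep only segments where the uploaded caption and the recognizer agree closely. The agreement score, threshold, and helper functions here are illustrative, not Google's actual pipeline:

```python
def word_agreement(caption_words, asr_words):
    """Fraction of positions where the uploaded caption matches the
    ASR hypothesis (a crude stand-in for an alignment-based score)."""
    matches = sum(c == a for c, a in zip(caption_words, asr_words))
    return matches / max(len(caption_words), len(asr_words), 1)

def filter_training_segments(segments, recognize, threshold=0.9):
    """Keep (audio, caption) pairs whose caption is probably correct:
    the existing (imperfect) recognizer agrees with it closely enough."""
    kept = []
    for audio, caption in segments:
        hypothesis = recognize(audio)  # existing ASR system, passed in
        if word_agreement(caption.split(), hypothesis.split()) >= threshold:
            kept.append((audio, caption))
    return kept
```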
Wonderful. Jessica, do you have follow-up questions? Or Nicky, is there something you would like to add or ask Dimitri?

Yeah, my main question to Dimitri was: what are the next barriers to overcome, as you see it? I don't know if we can get another five percent improvement, but is there something that you see holding back the accuracy from where it is today? Are we close to the limit, or is there still a lot of room for improvement?

I think the next barrier is people who have non-standard speech, like me. You will not find many people speaking like me, so you could not use data to create a model for me. If you have enough people who have, say, ALS, then speech recognition will work for them too, but we need to record specific data; Relate specifically collects data from people who have non-standard speech, to build recognition for them. And this model is local: what you see now is not on the network, it is on my device. And it can understand you also very well. Do you want to try? Don't connect; just speak, and you will see it will start to transcribe you.

Okay, I'd say, Nicky, the honour is yours to have a try.

Now it's coming up... Speak... So the next thing coming is more personalized speech recognition, for individuals, and training. When I look at your transcription and the transcription that comes with Google Meet, I can definitely see that your transcription is better, so it does show a lot of promise for that individual training and the benefits that you can get. I guess the next question is: how can people do that without spending a whole year to train, like you have? Can that be done more efficiently, maybe by targeting the training material, to improve accuracy? So definitely very interesting times ahead.

And my local speech recognition does not understand only me; it understands other people too. That is one answer to your question. I see a solution coming when you get enough classes of similar speakers. For people who have ALS, we got a lot of speech from different people with ALS, so when a new person comes, that person doesn't need to train too much. This is an eventual solution for all such cases: to accumulate a lot of classes.
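A toy sketch of the speaker-clustering idea Dimitri outlines: represent each speaker by an embedding, keep one model per cluster of similar voices (for example, one per group with similar atypical speech), and start a new user from the nearest cluster so that little personal training data is needed. The embedding and model-loading helpers are hypothetical:

```python
import numpy as np

def nearest_cluster(new_speaker_embedding, cluster_centroids):
    """Pick the cluster of similar speakers closest to the new user.

    cluster_centroids: dict mapping cluster id -> mean speaker
    embedding. Returns the id whose centroid is nearest in L2 distance.
    """
    return min(
        cluster_centroids,
        key=lambda cid: np.linalg.norm(new_speaker_embedding - cluster_centroids[cid]),
    )

# Hypothetical usage: assign the new user to the closest cluster and
# start recognition from that cluster's pre-trained model, so only
# light personal fine-tuning (if any) is needed.
# cluster_id = nearest_cluster(embed(new_user_audio), centroids)  # embed() is hypothetical
# model = load_model_for_cluster(cluster_id)                      # hypothetical helper
```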
Yeah, I actually had to stop the transcription of my local system so that it doesn't distract me. I had two follow-up questions. One is: beforehand we wondered whether the system would have trouble with our accents, since Jessica, Nicky and I all speak differently, but it looks like there have been quite some clusters of British or New Zealand accents in the training. So my question would be more like: what role could clinics play here? Because I think that clinics can help in collecting data, or in motivating groups of patients with a similar disease or similar symptoms, similar atypical speech, and that could help in collecting data. What do the three of you think of this? And maybe Nicky, you are already working with different clinics in Australia; could there be an opportunity?

Sorry, my AirPods just went flat there, so I'm just relying on the captions, but I understand what you just said. Because we have now released NALscribe, we've been doing more clinical testing, so we are looking at how it has been performing in Australian clinics, in US clinics, and now, coming up, in the Netherlands. We do have different variants of English that the user can select. I'm not quite sure how different they are, when you select Australian English versus British English, how similar those models are. But I think the question was more in terms of atypical speech, and we haven't actually delved into that.

Yes, but it would be interesting. I think it's also about finding useful clusters, because in the UK, for instance, the accent differs from city to city, and you could debate what counts as UK English. The same goes for the Netherlands: in some regions the accent is quite different, it's a dialect, so the label Dutch doesn't capture it all. So what would be a good strategy to know when you have a valid cluster? And I'm wondering who is best placed to address that question for atypical speech. So Nicky, you didn't yet look into this problem?

No, and we have mainly been looking at the use case where a normal-hearing speaker is speaking to a hearing-impaired person in a clinical setting, to improve that communication. Looking at more of that two-way communication is something that would be really interesting, but in the development of NALscribe we were really looking at the hearing-impaired person as the listener.

Okay, and that brings me to a follow-up question about broader applications of this technology, which you now bring up: the barriers of communication between people regardless of hearing status. Nicky, it's actually a good example that you are now also relying on the transcription: we have two people listening and two people reading. So Nicky, maybe you experienced some new barriers. What do you think could be the next steps to relieve this? And maybe it's good if I ask you, Jessica, to comment on how to further develop this, with the opportunities that were already described before.

Okay, so I think the next advance would be to take advantage of the technology that we already have. There are all these situations, particularly at the moment, where there are in fact physical barriers to communication. If someone with a hearing difficulty goes to their GP clinic and can't be understood at reception, because the clinic is not using a technology that's widely available, I think that's excluding people unnecessarily. So I think there are a lot of situations where this could be beneficial already, and that particular situation is quite a good one: if you have a business that has a tablet and a microphone, then even if you're in a noisy waiting room, they're able to take advantage of that good signal-to-noise ratio and typically get very good captions. So that's using existing technology. In terms of future technology, I think what's very exciting is the emergence of augmented reality systems, AR glasses. If you had captions on AR glasses, you could see the captions when you look at a particular person, or different captions for different people, and maybe additional information, such as sounds that are going on around you. I think that's going to be a really wonderful use of technology.

Yeah, I fully agree; I think augmented reality could really help here. And it also brings me to another question, maybe for you, Nicky: if people start reading from a tablet, how does it affect, for instance, the ability to read lips, and to include the facial expression?

Yes, we've had feedback, and recommendations that we can put forward, based on the experience of just observing how people are interacting and using this technology in a clinical
situation. And definitely, for people with hearing loss, and even people without hearing loss, it's just become more apparent now that everyone's wearing face masks. Even as a normal hearer I struggle to understand, because the facial and lip cues have been taken away, which I didn't even realize I was using; now that I don't have them, I find I need to concentrate harder to understand. So when a tablet is used, we do encourage it to be placed close to the talker's face, so that the reader doesn't have to do a full head turn between the tablet and the person speaking. We're also encouraging people to pause more in between, so that the person who's reading the captions has time to catch up. And we've found a lot of people saying they like the captions to confirm what they've heard: a lot of people may be listening and able to recognize all the words, but maybe it takes a little longer for them to process and understand, and really get a good enough understanding to engage fully in the conversation. So yeah, the positioning of that screen is really important. Of course, when we all have augmented reality glasses, that may make things a lot easier, but until those become more available, and affordable for the average person, there are definitely things you can do with the existing technology to make it easier. And hopefully we can pass more of those acceptability barriers: is it acceptable to use, is it a usable thing? If we can't get past that, then it doesn't matter how good the technology is; people will come up with other strategies.

That sounds to me like you actually had a lucky situation: because people were wearing face masks, they had lost lip reading anyway, and the step to reading text from a tablet was smaller. And it could be that as soon as people get familiar with this technology, we'll find hybrid ways of using it in the future. In your approach to development, I read that you follow a more holistic approach in your design. Is there something in particular that helps you here in better finding the needs, and in how you change your design in this process?

Yeah, we followed what's known as a design thinking process, really starting from the customer and the user point of view: what are their problems and needs? And as you've just mentioned, the masks really heightened that need. Generally, people with a disability are your first adopters of such a technology; masks basically gave everyone a disability, and that made the need so much stronger. It does reduce that barrier, and it makes it more likely that the technology can become mainstream and be picked up by more people. We went through an iterative process of understanding the need, coming up with a prototype, and putting something out there. Our first version of NALscribe was really basic; it had very few features. And as we talked to our users, we found things like: they really wanted privacy; they wanted that offline mode for a clinical setting; they wanted the screen automatically cleared when it was used at a reception counter, where you don't want the next person in line to see the previous conversation between two people. All these little things we took note of, and we could incorporate them into our app.

Nice. And Dimitri, I saw you wrote a paper about different use cases. Some of those use cases are also covered by Nicky, but I also saw the example
of, for instance, students listening to a professor's lesson: when you're briefly interrupted by something else, a message on your phone, and then want to get into the story again, you can read the transcript. That way it could of course also come in handy for people without a hearing loss. And Dimitri, for getting to these use cases, what kind of strategy did you follow? Was it based on what you experienced yourself, or did you also ask other people with hearing loss? I hope my question was transcribed clearly.

We did consider various scenarios. Sometimes you don't want to follow the transcript of a meeting all the time; you need to do something else. Maybe you're bored a little bit, you got an interesting message on your phone, so you look at the phone, and then you can miss something. That is where glasses are very useful. I don't know if you saw all the published pages about using transcripts in glasses?

I didn't see it yet, but I see. So how was that experience with the glasses?

It is very nice to have glasses when you interact. There is a video where I have dinner with a lot of people: I do not need to hold up a phone to look at transcripts; I enjoy looking at everybody, I can eat and follow the transcripts. So, back to the example: I can do something else, but the transcripts keep running in the glasses, and I continue to follow what is being spoken. It also integrates with my haptic device: it vibrates if somebody calls me, Dimitri, people are calling you. So these are all use case scenarios. You are absolutely right, glasses change human interaction completely. You can follow a presentation; otherwise, when a lecturer points at something, you do not know what was pointed at. With glasses you see what was pointed at on the slide, and the transcription. We describe all the use cases.

And Dimitri, are you using all these technologies in daily life now? If you have a dinner with friends, are you already using these glasses?

Sometimes I am using them.

And what are reasons for you then not to do it?

Right now I do not need them; this is very convenient.

Not really; I am not planning to start watching my phone. While you are talking, you have our full attention. Cool. Let's see, for this round of potential applications: of course the most important user is the end user with hearing loss, but the clinician could also be a user of this technology, or play a role. What role do you see for clinicians, either in promoting this technology better to prospective users, or in helping to validate the technology, or to improve it? What are your thoughts? And let's see how this will work in the system if I do not give a turn; feel free to answer and take the initiative.

I was just going to talk about how I thought it was particularly useful for medical situations. We know that people with hearing difficulties are at a disadvantage: they have worse health outcomes, higher rates of re-hospitalization, lower compliance with medication, and all of these factors are magnified if they say that they have poor communication with their physician. So I think in that situation it is really, really valuable for the clinician to know that the information they are trying to get across has actually been understood. And there is also the possibility of having a transcript that the patient can take home, or share with a carer; hopefully that should improve the situation, improve outcomes a lot. But in terms of actually validating it in the
clinic: we did test it in the clinic, and we did some questionnaires, and I think the value there is really that it demonstrates to the clinicians that this is a worthwhile thing to do, that there is actual benefit. Because of course they do have a lot of skills, particularly audiologists, at communicating with hearing-impaired people, so they may not feel they need any help with that; that's what they do every day. So it gives them some confidence that clients actually do find a benefit, and that it's valuable. And it also helps them to justify to the clinic owners, if there's some expense or some time needed to get these systems set up, that it's worthwhile, that it's providing benefit and increasing satisfaction.

Thank you, Jessica. Also an interesting point you raise, that clinicians may actually be a barrier sometimes, if they feel that they don't need it because they already take it into account in their communication. Nicky, did you find in your pilot that clinicians were not open to this technology?

I think in our pilot our clinicians were very encouraged to try it out. Maybe it was more of a barrier from the client point of view: depending on the degree of hearing loss, if someone doesn't have a severe hearing loss, they would say, no, I don't need it, I'm okay. But even just as an assistive thing, when you're not relying on the captions, just having them there can be helpful. So we were trying to encourage more and more people to use it, or at least just experience it for a little bit, to see what they thought of it, because with any technology, you don't really know what it does until you actually try it. And I think clinicians can also help by showing this technology especially to our older clients, who aren't so tech-savvy and won't discover this on their phones for themselves. We had some clients say, wow, this is amazing, and we're like, there are other apps that have been around for at least a few years that have been doing this; it's not groundbreakingly novel right now. So giving that introduction to more people is a way the clinicians can help us. And then just one other point I wanted to make: we know that an improved client-clinician relationship leads to better hearing outcomes in the clinic, and definitely, introducing live captions makes the client feel more valued, more included: we understand your difficulties, and this is what we're doing to take steps to improve that for you. I think that can really add to the rapport and the relationship building, not just in the clinic but in personal situations as well. So yeah, that's another good advantage.

And for clinicians maybe now listening to this podcast: do you recommend a kind of minimum hearing loss, or a type of person for whom you would say, you should definitely recommend these apps?

We certainly recommended it for more severe hearing losses, or complex cases, but it can help over a very wide range, so I wouldn't say don't offer it to people with milder hearing losses. It also works well in appointments where there's a partner or a significant other present; it has a lot of wide-ranging benefits. So definitely, the people with severe hearing loss were more excited about it, and we could definitely tell that they gained the most benefit, but I think there is benefit there for everyone.

And do you think it could also work the other way around: that when people experience benefits from these apps, they are then also more open to
other assistive technologies?

Yes, certainly. And even as I'm talking here and we're reading our captions, it encourages me to speak more clearly and more slowly, and to enunciate better. So it's also helping me, even though I'm not relying on just the captions; I actually do have audio through the speakers now. It is helping train my voice, and as you've mentioned, we all have different accents and nuances in the way we speak. So it helps that part of the speaker's communication too, not just the listener's.

Yeah, you're correct: it gives feedback both to the listener and the talker, and both can learn from it. I expected it would also improve my English, for instance by seeing where there are regular errors. So it's interesting that this feedback can be used for training. I can also imagine that for some people it could help to focus on one channel of information instead of many different modes, and that it could help people with attention deficits. Jessica, I see you raised your hand.

I was just going to make a comment about that; it reminded me that there's a campaign in the UK at the moment for parents to switch on subtitles on their televisions, because research has shown that it helps children who are learning to read and improves their reading ability. So I thought that was interesting.

Yes, a nice example of a use of the technology that probably wasn't foreseen when it was developed. And I guess that widespread use of both speech recognition systems and, for instance, earbuds, which I see many young people wearing, reduces the stigma of using hearing aids, because everybody has something in their ears. And the same if everybody gets used to closed captions. For instance, in the Netherlands right now it would look really funny if everyone received closed captions, because that's only done for people with a dialect. So it would maybe be good if everybody on television received closed captions, also for those people who need them, or who are complaining that they cannot follow interviews. And that brings me to another question. The main complaint my patients tell me is that they want to understand their grandchildren better. A big problem is that the children are moving around all the time, but also that they have voices that are probably less familiar to the systems, because there are not so many recordings on YouTube of three-year-olds. So Dimitri, do you think there are solutions for this? Because I would put it high on the priorities for future developments.

I do have patents that I received at IBM for speech recognition that learns to recognize why babies are crying: do they have a stomach ache, or something else. The suggestion was to get this training data from experts who spend a lot of time with babies and can interpret why babies are crying, so that they could teach the recognition system, and parents who have their first baby could rely on this system.

Wow, that's wonderful. I remember reading in the news that a similar system was developed to recognize dog barks.

Well, if you found that this system was developed, that would be nice to see.

So if I understand well, the system would warn you, and it could either be a dog barking or a baby crying, or it could explain why the dog is barking: if the dog is hungry, or somebody is trying to enter the house. So for safety and sound awareness, a really important feature. Jessica, I see you have another comment or question.

So that sounds really... I would like to have
had a baby interpreter, I can tell you. But I saw there was a paper from Google, maybe from DeepMind, about speech recognition for children. Because they have this YouTube Kids app, they actually have a big database of children's speech, from children trying to interact with this tablet. They did try applying a system trained with children's speech to try and improve the accuracy, and got a small increase. So I thought that was really interesting, that they had this database.

So that's voice control by toddlers that has been collected, and this way they can better command their grandparents in the future.

That's right: what are you doing?!

But one of the jokes when we discussed this development: it will prevent toddlers from improving their pronunciation. They won't try to speak better to be understood; everybody understands them no matter what. Remember, this will be a big problem for their development.

Nicky, I see you want to respond.

I just had a thought when you said that: I think an application of automated speech recognition could be in speech therapy, in training children who have speech deficits, or trouble pronouncing certain sounds, to help them develop their speech, with the automated recognition giving the child feedback. I know even in my own circumstances: we have a Google Home, and my son will ask it a question in the morning, and Google says, very politely, sorry, I didn't understand that. And so he will repeat and change the way he's speaking, to speak more clearly so that Google understands. I think it's a good way to do it, because as a parent, when I was trying to do speech therapy with my child, you can try not to be, but you get a little bit impatient, whereas an automated system has all the time in the world and can be quite engaging for a child to interact with. So yeah, another application for speech recognition.

Yes. So there's learning, but this is more about the human learning. And then I wonder: how do you get the right direction of learning? If the machines adapt too quickly to the speech of the children, then the children never need to develop any further, because they're understood. So the machine somehow needs to encourage or motivate the children to improve their speech, while at the same time communication with the relatives remains important, so the system should still allow it. Any thoughts on this?
Dimitri, how could we both train humans and machines?

Yes, the patent that I wrote addressed this. It suggested incremental improvement. It is too big a step for babies to go straight to speaking normally, but if the speech recognition sees that only small changes are needed for the baby to speak correctly, it pretends that it does not understand. This way babies improve, because small things they can fix and then be understood; very difficult things they cannot improve right away.

Wow. Now that you mention it, the same principle is also, I think, what we need in fitting cochlear implants: if you make the change in the patterns that people hear too big a step, they have difficulty adjusting to it and improving, while with smaller steps, or rather the right steps, not too small and not too big, they improve better. That is of course completely away from speech recognition, but it's the same principle of training, of somehow providing personalized care, or personalized medicine, or the proper dose.

Looking at the time, we're running out of it. I think it's really nice that we have touched on other topics already, and it looks like we could somehow loosen up the structure a little, so that it was more spontaneous than we expected beforehand. I think it's a good point to wrap up. I want to thank you again for this nice conversation; I feel I learned a lot about this topic, but also about how to use this system. So maybe everybody, if you'd like, could share his or her experience of how you thought this interview and this technology went. Jessica, what did you experience?

I don't know... apart from that slight technical hitch, I find the captions really accurate, considering that normally I have to select British English, or put on my best Australian accent, for instance, to be understood, and luckily I haven't had to do that, even though I do live here. That's been really great, and I'm just amazed by the technology, with Dimitri's captions; it's such a great experience. I thought it might not be that conducive to having an easy conversation, but it's been really smooth, so I'm very impressed.

Thank you for your kind words. I really enjoyed talking to you and hearing such fresh points of view. And I do agree with you that current speech systems focus on certain accents, like a British accent or an Indian accent; there is the general British accent, but really many hundreds of thousands of British accents, and maybe you don't have so many hours for every accent. Live Transcribe also has special accent models, for India, for Australia, but I found the general one really nice.

And that also already answers for me the question about smaller languages: it probably depends on how active a community is on YouTube how easy it will become, and the more data, of course, the better. One of my other questions was that, for instance, Bengali is one of the 10 major languages, but it's poorly supported digitally; that could have to do with a lack of recordings of this language.

Exactly. When we started with many languages, I could see that all the other languages, in Europe, had 10 times fewer recorded videos than English, so for a long time English was the most accurate for speech recognition.

Are there other ways to circumvent this, or to further improve it, so that you'd need less data for these smaller languages?

In the past you could not, but now we are developing a lot of smart ways, even for languages that are being lost, where in the future we cannot
hear how people speak. We have some developments for very rare languages, to develop speech recognition with technologies that depend less on the amount of data, trying to do it in a smart way.

Nice. I'm thinking we should really have another session on further developing these ideas. Nicky, how was your experience in this session, with your temporary hearing loss?

Yes, it's been a really enjoyable discussion. Thanks for setting this up, and thanks to Dimitri for showcasing your Relate app as well. It's been amazing to see the improvements you can get from additional training on your own voice; it makes me more excited. But generally it's been a really good discussion, bringing out challenges that we are still facing and sharing feedback on what we have done already. I'm sure we could talk for much longer. It's been very good.

Thank you all for participating in this interview. I want to close with the quote that I had prepared: Be careful about reading health books, you may die of a misprint. Famous words by Mark Twain. I was a little bit anxious that we might get into misunderstandings due to missing or wrong transcriptions, but I must say that both the technology and we participants all did better than expected, and I felt that we could relax more and more over this conversation. So thanks again for joining, and also for all the preparation, and I hope to see you again soon, maybe at a different event, or, who knows, on a future project; we have already discussed a lot of potential work that could be done.

Thanks, it was really great. So nice to meet you, Dimitri. Bye. Bye. Thank you.