Today I'm going to talk about our machine learning challenges for hearing aid processing. I'm speaking on behalf of the Clarity project, which is a collaboration between the Universities of Salford, Sheffield, Nottingham and Cardiff. The Clarity project began in 2019 and, over five years, will run a series of machine learning challenges targeting the hearing aid's processing of speech in noise. Our advisory group includes representatives from the Hearing Industry Research Consortium, which includes several major manufacturers of hearing aids, and the UK charity RNID, or Royal National Institute for the Deaf, which represents hearing aid users.

We all know that hearing loss is a major problem. In the UK alone it has been estimated that by 2035 there will be 15 million people with hearing loss. Hearing loss affects people's working and social lives. Hearing aids are the main form of treatment, but they often work poorly when there is background noise, and that is the problem we are trying to address. Over the last few decades there have been major advances in machine learning applied to speech technology; in automatic speech recognition, for example, performance is unrecognisable when compared with what was possible 10 years ago.

The aims of this project were to bring communities together to work on the problem of speech in noise for hearing aids and to broaden the base of researchers working on hearing technology. Our argument was that it's not enough to only have people who are experts in, say, signal processing or machine learning. We also need to involve people with expertise in hearing and speech technology; however, currently the tools and software aren't readily accessible to people from the speech and hearing communities.

Today, first I'll give you an overview of the project and the challenge methodology. I'll talk about the two challenges in the first round, how we plan to evaluate competition entries, and what materials, tools and software we are providing. I'll end with a brief discussion of our future plans.

I want to spend a minute talking about the idea of machine learning challenges and what is required when applying such an approach. If we look at speech technology in particular, challenges have been going on since the 1980s, but more recently Jon Barker and colleagues have been running the CHiME challenges for distant-microphone speech recognition in noise. The idea of these challenges is to provide frameworks that allow people to contribute the component that they have expertise in. Other speech challenges, such as the REVERB, Blizzard and Hurricane challenges, are like our challenges in needing real listening evaluations. Each year they build on the state of the art from the previous year to improve the baseline; however, none of them has evaluated their systems with hearing-impaired listeners.

What makes a challenge? We need a common task with predefined evaluation criteria so we can compare systems, and we need a set of well-defined rules that constrain the solution space. We also need to provide common data sets, especially for testing but also for training and development, and these need to be large enough to be suitable for a machine learning framework.
We need to provide a baseline system that acts as a reference so that people can evaluate how well they are performing, and also so that people don't need to build a complete system; they can work on just one or two components. This is particularly important in this context, where we're trying to encourage people from many different fields to contribute. Then we have a schedule in which the test set is released immediately before the submission date, we evaluate the systems that have been submitted, and we hold an event at which we announce the winners.

We realised that it wasn't going to be sufficient to just have the signal enhancement challenge; that is, we needed to ensure that our machine learning models were maximising the right objective. So we need a system that can sensibly predict intelligibility. However, there isn't necessarily a good enough off-the-shelf intelligibility model available, so we realised that we needed to run speech intelligibility prediction challenges as well. The enhancement challenges provide data that can be used to design and test prediction models. There will be three rounds of increasing complexity, for example with a move from static sources in the first round to dynamic sources in subsequent rounds, and each round will build on the last one. The first enhancement challenge launched in February and we are currently evaluating entries. We plan to launch the second enhancement challenge early next year.

The objective of the enhancement challenges is to develop hearing aid signal processing algorithms that improve speech intelligibility. Entrants are given speech and noise stimuli, listener pairings, and generative tools to create all of the input signals to the hearing aid processor. The participants submit their processed speech, and the intelligibility of their signals is evaluated by means of listening tests. In the scenario for the first round, the listener is sitting or standing in a small room that has low to moderate reverberation. The person is listening to a target talker while, simultaneously, an interferer sound is playing. The room dimensions, boundary materials and the locations of the listener, target and interferer are randomized.

In the first prediction challenge, which will open in October, the objective is to predict the intelligibility of speech processed by hearing aids for specific hearing-impaired listeners. People can submit either a speech intelligibility model that incorporates a simulation of hearing loss, or separate hearing loss and speech intelligibility models. Entrants are given speech-in-noise signals processed by hearing aid algorithms, and the outputs of their prediction models are compared with the speech intelligibility scores from the listening tests.

In round one, for use as the target speech, we've made a new 40-speaker British English speech database, which includes more than 10,000 unique sentences selected from the British National Corpus. The noise sources for the first round of the challenge are speech and non-speech interferers. Only one noise source is present in each scene, and the speech interferer is a single competing talker. Speech interferers are taken from the SLR83 corpus of UK and Ireland English speech, and the non-speech interferers are recordings of domestic noise sources. All interferers are point sources. In round one, to simulate our environments, we used binaural room impulse responses generated by RAVEN.
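To make the idea of a randomised scene concrete, here is a minimal Python sketch of the kind of scene description implied above: a cuboid room with randomised dimensions, randomised listener, target and interferer placement, a choice of speech or non-speech interferer, and a target SNR. The field names, value ranges and sampling choices are illustrative assumptions for this example only, not the Clarity metadata format or the actual ranges used in the challenge.

    import random
    from dataclasses import dataclass, asdict

    @dataclass
    class Scene:
        # Illustrative scene description; field names and ranges are assumptions,
        # not the official Clarity scene metadata.
        room_dims_m: tuple          # (length, width, height) of a cuboid room
        listener_pos_m: tuple       # listener position inside the room
        target_azimuth_deg: float   # target within +/-30 degrees of the listener's front
        interferer_pos_m: tuple     # interferer position (the real challenge also keeps it
                                    # at least 1 m from the listener and the walls)
        interferer_type: str        # "speech" or "non-speech"
        snr_db: float               # target-to-interferer ratio for the mixture

    def random_scene(rng: random.Random) -> Scene:
        dims = (rng.uniform(3.0, 6.0), rng.uniform(3.0, 5.0), rng.uniform(2.3, 2.7))
        listener = tuple(rng.uniform(1.0, d - 1.0) for d in dims)
        return Scene(
            room_dims_m=dims,
            listener_pos_m=listener,
            target_azimuth_deg=rng.uniform(-30.0, 30.0),
            interferer_pos_m=tuple(rng.uniform(0.5, d - 0.5) for d in dims),
            interferer_type=rng.choice(["speech", "non-speech"]),
            snr_db=rng.uniform(-6.0, 6.0),
        )

    if __name__ == "__main__":
        rng = random.Random(0)
        for scene in (random_scene(rng) for _ in range(3)):
            print(asdict(scene))

In the challenge itself, each such description is then rendered into audio by the room acoustic simulation and scene generation tools described next.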
We generated 10,000 spatial configurations in total, and the room simulation was done in accordance with published statistics on British living rooms. These are simple cuboid-shaped living rooms that feature variations in surface absorption to represent doors, windows, curtains, rugs and furniture. In these simulated environments, the target speech source is at between plus and minus 30 degrees azimuth, inclusive, in front of the listener, at zero degrees elevation and at a distance of at least one metre. It has human speech directivity and is always oriented towards the listener. The interferer source can be in any position except within one metre of the receiver or the walls, and is omnidirectional. The 10,000 binaural room impulse responses are convolved with randomly selected target and interferer source combinations and can be used to generate about 10 million different stimuli. The interferer always precedes the target by two seconds; this is partly to make sure that the hearing aid has time to adapt to the background noise. Speech and noise signals are mixed to achieve specific speech-weighted signal-to-noise ratios at the better ear, where the SNR ranges were identified on the basis of pilot testing.

The listener panel will comprise about 50 adults who have sensorineural hearing loss. The listeners will have a loss of about 25 to 60 dB HL in their better ear and will be accustomed to wearing hearing aids in both ears. The listening tests will be performed in their homes using custom software on a tablet. The listener will repeat into the microphone what they heard, and automatic speech recognition will be used to ensure that the quality of the recording is adequate before the tablet uploads the recordings to our server.

In this first challenge, we'll provide the following data: the training set, which is limited to 6,000 of the 10,000 scenes; the development or validation set; and the evaluation or test set. We'll also provide the scene-to-listener pairings for the training, development and evaluation data, and the listener audiograms. The evaluation data set only includes the input signals for the three microphones of the hearing aid, the listener audiograms and the scene-listener pairings.

You can see how the scene generator creates the signals that are fed into the rest of the pipeline. The scene generator is the blue block underneath. We have the mixture signals and listener metadata entering the pipeline on the left, going through the yellow enhancement block, which involves a fitting, that is, a prescription of settings for the hearing aid, and a hearing aid model, and then the orange prediction block, which includes a hearing loss model and a speech intelligibility model. The outcome is a predicted speech intelligibility score for a particular mixture signal and listener. Our enhancement challenge involves replacing the yellow block, while the later prediction challenge will involve replacing the orange block.

Here we focus on the scene generator. The scene generator is open-source code for generating the hearing aid input signals for each scene. The scene generator takes the metadata for the scene and, according to that metadata, chooses the input target speech and noise signals, convolves them with the room impulse responses, shown here as the geometric room model, and with head-related impulse responses, shown here as the measured head-related transfer function database. The tool then scales the noise signals to obtain the correct mixture SNRs and outputs the speech and noise mixture signals.
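The core operation the scene generator performs can be sketched in a few lines of Python. The sketch below is only an illustration under simplifying assumptions: it convolves anechoic target and interferer signals with binaural room impulse responses, gives the interferer a two-second head start, and scales the interferer to reach a requested SNR. It uses a plain broadband SNR rather than the speech-weighted, better-ear SNR used in the challenge, omits the head-related impulse responses, and the function name and interface are invented for this example.

    import numpy as np
    from scipy.signal import fftconvolve

    def mix_scene(target, interferer, brir_target, brir_interferer, snr_db, fs,
                  head_start_s=2.0):
        """Toy version of the convolve-and-mix step described above.

        target, interferer: mono anechoic signals.
        brir_target, brir_interferer: binaural room impulse responses, shape (n, 2).
        snr_db: requested target-to-interferer ratio (broadband, not speech-weighted).
        """
        # Convolve each source with its binaural room impulse response (left/right ears).
        tgt = np.stack([fftconvolve(target, brir_target[:, ch]) for ch in range(2)], axis=1)
        itf = np.stack([fftconvolve(interferer, brir_interferer[:, ch]) for ch in range(2)], axis=1)

        # The interferer starts first; delay the target by the head start.
        delay = int(round(head_start_s * fs))
        tgt = np.vstack([np.zeros((delay, 2)), tgt])

        # Zero-pad both components to a common length.
        n = max(len(tgt), len(itf))
        tgt = np.vstack([tgt, np.zeros((n - len(tgt), 2))])
        itf = np.vstack([itf, np.zeros((n - len(itf), 2))])

        # Scale the interferer so the broadband SNR matches the request.
        target_power = np.mean(tgt ** 2)
        interferer_power = np.mean(itf ** 2)
        gain = np.sqrt(target_power / (interferer_power * 10 ** (snr_db / 10)))
        itf *= gain

        # Return the mixture along with the scaled components, which an intrusive
        # metric such as MBSTOI needs as references.
        return tgt + itf, tgt, itf

The real tools also output these components separately, because the baseline intelligibility model is intrusive and needs access to the clean target signal.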
The baseline hearing aid processor is based on the open Master Hearing Aid, or openMHA, and in this case it simulates a very basic behind-the-ear device. There is no connection between the two ears, so this is not a true binaural hearing aid, and we aren't modelling the tube and ear mould or considering the direct path, microphone sensor noise or feedback in this round. Our Python code creates the input signal from the front and rear microphone signals and a configuration file for a given set of audiograms, and runs openMHA with that configuration file and input signal. The gain table for the multiband dynamic compression component is calculated according to the Camfit compressive prescription, which is an early version of CAMEQ. Our baseline hearing aid is designed to be a basic model, as I mentioned, which performs multiband dynamic compression and attenuates noise coming from the rear hemisphere. We didn't want to implement any proprietary software components which can't be replicated, such as true binaural processing.

Our hearing loss model is a Python implementation of a model developed by Brian Moore, Michael Stone and colleagues at the University of Cambridge. It simulates the main aspects of sensorineural hearing loss: the raising of auditory thresholds, loudness recruitment, and impaired frequency selectivity and temporal resolution. The baseline speech intelligibility model is a binaural version of the short-time objective intelligibility metric, termed MBSTOI, which requires access to a clean target speech signal and hence is intrusive. MBSTOI is a correlation-based model that can estimate the effects of non-linear processing on intelligibility, such as any clipping performed by the hearing aid model, and which doesn't require the degradation to be in the form of additive noise or reverberation. MBSTOI includes an equalisation-cancellation stage and a better-ear stage to model the advantage of having two ears (a toy sketch of the underlying correlation idea is included at the end).

Our plans for the next challenge and the next rounds are as follows. We will create our planned large open-access database of audiological and auditory performance data for 50 hearing-impaired listeners, plus some normal-hearing listeners; this will be available for the first prediction challenge and will be augmented in future rounds. Future rounds will involve more complex scenarios. In round two, the scenario will again be indoors, but now the sources will be moving. The plan is that the third round will involve other environments, such as outdoor settings or communication inside a car. In future rounds, we also plan to provide state-of-the-art enhancement and prediction models as baselines, which build on the best entries submitted in the previous round. We also plan to evaluate simulated hearing aids both in terms of the intelligibility of the transmitted speech and in terms of its quality. In September, the results of the first enhancement challenge will be announced at our first workshop, which will be held as a satellite event of Interspeech 2021. If you are interested in entering the first prediction challenge, or you would like to be kept up to date on the project, please go to our website listed here. Thank you.
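Following up on the baseline intelligibility metric described above: MBSTOI is binaural, intrusive and includes equalisation-cancellation and better-ear stages, but the core idea it shares with STOI, correlating short-time spectral envelopes of the clean and processed signals, can be sketched very simply. The toy, monaural Python function below illustrates that idea only; it is not MBSTOI or STOI, it includes no hearing loss model or band analysis, and the window and segment lengths are arbitrary choices for this example.

    import numpy as np
    from scipy.signal import stft

    def simple_intelligibility_index(clean, processed, fs, win_s=0.025, seg_frames=30):
        """Toy correlation-based intelligibility index (monaural, illustrative only)."""
        nper = int(fs * win_s)
        # Short-time magnitude spectra of the clean reference and the processed signal.
        _, _, C = stft(clean, fs, nperseg=nper)
        _, _, P = stft(processed, fs, nperseg=nper)
        C, P = np.abs(C), np.abs(P)

        scores = []
        # Correlate short segments of the magnitude spectrogram, band by band.
        for start in range(0, C.shape[1] - seg_frames, seg_frames):
            c = C[:, start:start + seg_frames]
            p = P[:, start:start + seg_frames]
            for band in range(c.shape[0]):
                cb = c[band] - c[band].mean()
                pb = p[band] - p[band].mean()
                denom = np.linalg.norm(cb) * np.linalg.norm(pb)
                if denom > 0:
                    scores.append(np.dot(cb, pb) / denom)
        # Average correlation across bands and segments; higher suggests more intelligible.
        return float(np.mean(scores)) if scores else 0.0

In the actual baseline, the signals are first passed through the hearing loss model, the correlation is computed on band envelopes after an equalisation-cancellation stage, and the result is combined across the two ears.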