jai thili anjhi, deki asili pharaupare manjhi. mako jaythavare, kyahtavare ki sa hubo, kyi kulta jaythavare. ramanam maranam marasa aakku na koi go puli na. sitha vare, nari manamku, porthi varo, sujok jaya jau na thila, kul na samana tobe porthi pari na thila. mani sarajemti janmao chi, phasarajemti janmao chi. bahasa madhya, dhire janmao huye, pakalo huye, budha huye, tebhum tapre mare. haa, kaanara kichu koi me, avu achi na mandar vapo, kye te kaanala asili tini. dinega mandara pela, karna manjahi ji, kora. mili tidi akhi ni, mili tidi akhi ni, khoto tole, banna mali tidi akhi. choyita jayi chi, gayi chi, sadeshi janmao jau rau chi, mamu na gau dhru na, kaha hii chi. jokhi po pila mani, 2 km parjumpa challi challi jau kile, rakti re, peki pori vapa hai, peki, ekri lakum manam koi, pakya pori vapa, ati na huye utu na thila. soli chi chau rau baishi ma chau, nishi buru varo, munjiye nchucho, chuncho jayi chucho, chau kaha thau, muna kile jau chi, sabdo thau, jau rakti kile jau chi, mma kandi kile jau chi, jau. amur jau ancholi ko upo vasa, sehi sabdo rau ke chucho tole jau me, sayi titu gani ba, sehto bol, sayi ta amaro, mulo sayi titu abo, mulo vasa abo. ye goso pa chau chau kaha, chau ngo ko jingolo, ye guru mari misi jau, mari misi jau. ame shwadinata agaro ka tha di bhabhi ba, sehto bol karo vasa, uto rancho rau vasa, keki bibhoba vasa rau, seh lekha ho, audio ho, video ho, keki jinsi saita hei chiya, peki jinsi saita hei nahi. svarana kechi saita hei nahi.

Her language and worldview were shaped around oral culture. Later in life she also converted to Mahima Dharma, a movement that revolted against Brahminism and embraced a worldview of treating everyone equally and with mutual respect. She embodied all these complexities in her speech.
The lack of social contact that comes with modern education also played an important role, and it is reflected very much in how she and many of her contemporary women spoke, as opposed to how many men with more social and educational capital spoke. All these nuances are vital to a language and need to be documented. Unfortunately, there has been very little effort to document the dialects of different eras and of people from different social classes. Some of the researchers I spoke with shared that little or no audiovisual documentation exists for my dialect, and this is especially true of dialects and languages that are primarily spoken, whereas speakers could switch to a standard variety while writing, as is the case with mine. It is therefore paramount to make audiovisual recordings of speech. Obviously, I didn't know all of this when I started; I only wanted to listen to my grandmother's stories and songs again. As a Wikimedian, I also wanted to archive them and share them with the larger world. The knowledge and the wisdom were not mine but the society's, and they were rendered by somebody who had relatively fewer privileges. Obviously, if you are interviewing somebody who is 95 years old and whose memory is fading fast, there is new fiction within established fiction: her renditions of folklore were vivid but also full of newness, though her songs were surprisingly untouched. I managed to record all she wanted to share over a span of several months. I had posted a few of the recordings publicly, and they were received very well. I finally started putting these pieces together for a short documentary and released it on her death anniversary in October last year.
I was also recording pronunciations of words and phrases using Lingua Libre. A good piece of information: I now hold a record of making over 71,000 recordings in Odia, which happens to be the highest on Lingua Libre. It won't last long, but these are also nuggets of celebration in a Wikimedian's life. While making these recordings, I realized how different the tonality of my own speech is from that of my grandmother's. Most automatic speech recognition (ASR) systems are trained with speech data contributed by people like me, people privileged to receive generational privileges that they did not work for: the privilege to be educated, to probably live in cities; many are even more privileged, like myself, for their gender, and the list goes on. If we want speech recognition to work in the future, there are two things we need to ensure: the training data has to be diverse, and there has to be community ownership of the process. By diversity I am referring to intersectionality, not just token representation. This means many genders, age groups, socio-economic strata, and so on; but the tonality also has to be diverse, meaning speech rendered in the different moods of a person is better training data than neutral-sounding speech recordings. Keeping that in mind, I realized that I was honored and privileged to have access to extremely valuable speech data. I can't emphasize enough how important community ownership is in the data world. When we use the term community in a loose sense, it can be misleading, as different people in a community have different levels of access and privilege. How do we even arrive at a level of ownership that is inclusive enough? It's a rather complex situation, and there is no easy and simple answer to that. We can only make sure that the data collection is a fair process and try to increase the diversity of contributors. If there is a way, one must always try to compensate for labor and find funding sources. Something that is openly
licensed and freely distributed doesn't, in the end, need to be all donation. It is very critical to acknowledge here that donation, in most contexts, is a manifestation of privilege, and many don't have that. That said, openly licensed speech data of community wisdom, like oral literature, can be a good non-extractive model, and it can be collected without risking the interviewee's privacy. We are in an era of being both policed and threatened by corporate-built large language models; communities are losing access to their own bodily data. The hope is that by strengthening a community's own data and building free and open-source tools, we could help balance such skewed proportions to some extent. In short, there have to be consistent checks and balances to ensure that the data collection is non-extractive, fair, and just, and that those whose data is collected have full access to the resources created using that data. That said, community-led processes are slower and have their own shortcomings, but that's for another session. Since I primarily recorded stories and songs, the tonality was different from conversational speech, which itself has a strong intonational swing. The footage used for Nani Ma mostly has monologues: narration and recital of songs, all from memory. So using oral storytelling as a source of speech data is also uncommon. I have recently started this initiative to slowly listen to the speech and clean up the audio in a way that retains the natural speech and only removes or discards highly noisy non-speech audio. Each word is also transcribed; transcribed speech data is the basis of automatic speech recognition training. My current process of cleaning up data and creating audio files with transcribed words is fairly straightforward. There's a certain degree of simplicity to it, though it is tedious for sure. What I have is uncompressed audio or video files captured with a lavalier microphone, so the audio obviously has a
lot of noise from the environment, because lavalier mics are omnidirectional and capture environmental noise in addition to speech. I also recorded many interviews outdoors, and one can hear birds chirping and broken branches falling. This is the reason archival media processing is a tedious task. As I have less and less time every day to focus on this project, it is progressing very slowly; listening to someone you love who is not there anymore is also an emotional process. That apart, in order to retain the natural speech, I don't boost the low and high ends of the frequency spectrum using equalization. I also consciously and conservatively amplify the audio, increasing the volume to a listening level without clipping. Lastly, I remove the hiss, hum, and similar white-noise sounds. It is hard, and destructive, to remove natural environmental sounds like birds chirping, due to the complexity of the sound; the retention of natural speech without any distortion is the key here. Audacity has a wonderful feature to mark audio: you can mark the beginning and end of a word and then type the word, creating labeled markers. You can then batch-export later, and all these words will be perfectly exported. I chose the lossless WAV format for my audio; it is not only high quality but also an open standard, so it keeps the door open to free and open-source software developers. The batch export also exports some unwanted files. During my editing process, I generally don't name the markers for regions I don't want to use later, so when they get sorted within the folder after being exported with numerals or similar automated file names, it is easier to delete these files. What you end up having is a folder full of words, sometimes the same word with different intonations. I try to keep them all, though it is probably okay to delete recordings beyond ten occurrences of the same word. If I had time, I would add emotion markers; say, a word used in a question and the same word used in an
affirmative sentence are pronounced with different intonations, and if you have such tagged speech data, it helps build a more sophisticated speech engine. So this was a bit about my intent and the workflow behind this pilot. If you speak a language that is low- or medium-resourced, I strongly recommend seeking ways to get access to archival media in addition to building a repository of fresh data. It might mean getting permission for archival recordings, or even recording interviews with elders. You don't necessarily need to use Audacity, as most modern audio editing software has an option to add named markers and bulk-export; if you have access to a better tool, that works too. The same goes for licenses: I chose a universal public domain release since it made sense for me and for my workflow; you could choose what makes sense for you. There is a range of Creative Commons licenses, and your data also doesn't need to be completely open to the public if you fear exploitation. Many indigenous communities assert data sovereignty, and rightly so: to avoid potential exploitation by non-native speakers, to make sure that the data remains accessible to the native speakers without any external paywall, and to maintain community moderation and vouching, the checks and balances some communities create to ensure that the recorded data is used for the benefit of the community and not for profit-making that goes against the larger good of the community. I am extremely grateful to my grandma, who is no more, and who was kind enough to agree to the interviews I could make. I also acknowledge my own privileges that came to me due to caste and gender, namely socio-economic privileges, and also the opportunity to take out the time to be able to afford to explore ideas and pursue this entire project. There are also mistakes that I have made along the way; I remain curious and open to learning and correcting myself. I am leaving my contact details on the screen in case you want to chat more. Thank you so much for your time and interest.
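As a postscript to the marker workflow described earlier: the label-based cutting can also be scripted outside Audacity. Below is a minimal sketch, assuming a tab-separated label file in the format Audacity's label export produces (start time, end time, label per line, times in seconds); the file and folder names are placeholders of my own, not paths from this talk. It skips unnamed regions, the same throwaway markers mentioned above.

```python
# Split one WAV recording into per-word clips using an Audacity label file.
# Assumed label format: "start<TAB>end<TAB>word" per line, seconds as floats.
import os
import wave

def split_by_labels(wav_path, labels_path, out_dir):
    """Cut wav_path into one clip per named label region."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with wave.open(wav_path, "rb") as src, \
         open(labels_path, encoding="utf-8") as labels:
        params = src.getparams()
        rate = src.getframerate()
        for i, line in enumerate(labels):
            parts = line.rstrip("\n").split("\t")
            if len(parts) < 3 or not parts[2]:
                continue  # unnamed region: a marker I chose not to keep
            start, end, word = float(parts[0]), float(parts[1]), parts[2]
            src.setpos(round(start * rate))
            frames = src.readframes(round((end - start) * rate))
            out_path = os.path.join(out_dir, f"{i:04d}_{word}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # same rate/width; nframes fixed on close
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

Prefixing each file with the label's index keeps the clips sorted in recording order while still carrying the transcribed word in the name.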
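The "amplify to a listening level without clipping" step can likewise be expressed in a few lines. This is a sketch of conservative peak normalization on raw 16-bit mono PCM; the 0.9 target (roughly 1 dB of headroom below full scale) is my own assumption, not a value from this talk, and the gain is derived from the measured peak so the loudest sample can never clip.

```python
# Conservative peak normalization for 16-bit little-endian mono PCM.
# target_peak = 0.9 is an assumed headroom choice, not a quoted setting.
import struct

def amplify_without_clipping(pcm, target_peak=0.9):
    """Scale samples so the loudest one lands at target_peak of full scale."""
    n = len(pcm) // 2
    samples = struct.unpack(f"<{n}h", pcm)
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return pcm  # silence: nothing to amplify
    gain = (target_peak * 32767) / peak
    scaled = [max(-32768, min(32767, round(s * gain))) for s in samples]
    return struct.pack(f"<{n}h", *scaled)
```

Because the gain is computed from the clip's own peak rather than a fixed boost, quiet words and loud words end up at a comparable listening level without any sample ever exceeding the 16-bit range.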