Thank you. Normally I give workshops around the libraries I contribute to, or something else related to machine learning, but today's talk is a bit more emotional because of the topic, as you can see. I was introduced very well, so I'll move on to the next slide.

In February there was a very big earthquake, with a magnitude of 7.8, in southeastern Turkey and northern Syria. Many lives were ruined that day: within just a few days around 40,000 people had died and many more were injured. In the end an area as big as Germany was hit and 16 million people were affected; some cities were almost completely destroyed.

That day we noticed a common pattern among survivors. They would write down their situation, take a screenshot of the writing, and post it on Instagram saying, "I'm here, this is my address, can you come and rescue me?" Or they would try to send messages to their relatives, post to Twitter, or send them to one of several Telegram channels collecting these notifications. We were a bunch of volunteer hackers in a Discord group, and there was this survivor's guilt, this feeling that you need to do something about it. So we tried to come up with ways to make use of this data and actually rescue people, because otherwise it would get lost. The affected area was so big that civilians had to help a lot; there was a very large volunteer effort, and we asked ourselves how we could contribute to it.

We were first asked to build an OCR application. If you don't know OCR, it stands for optical character recognition: you have typed or handwritten text in an image, and you extract the text inside it. So the input is a screenshot of someone's writing and the output is the text. This application should return not just the raw text but the parsed address, split into street, apartment number and more.

We built one, and this is how it looked. You would input the screenshot and get the parsed address directly, or you could input the text instead. This let us crowdsource the data collection, and because the UI can also be used as an API, you could send batch requests to get addresses parsed. It was used a lot: you could send a request with the screenshot, base64-encoded, or with the text directly, and you would get the parsed address back.

For OCR we tried many libraries: PaddleOCR, EasyOCR and Tesseract. In the end we went with EasyOCR, because with the others some characters were not parsed properly. The address-parsing part is called named entity recognition: you can extract streets, apartments, names and so on. That is the abstract name of the problem we were solving. Initially we used the OpenAI API for it, but we quickly ran out of credits.
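To make the OCR step concrete, here is a minimal sketch of extracting text from a screenshot with EasyOCR; the file name and the language codes are illustrative assumptions, not the project's exact configuration.

```python
# Minimal OCR sketch with EasyOCR (illustrative file name and languages).
import easyocr

# Load detection/recognition models for Turkish and English;
# the weights are downloaded on first use.
reader = easyocr.Reader(["tr", "en"])

# readtext returns a list of (bounding_box, text, confidence) tuples.
results = reader.readtext("screenshot.png")

# Join the detected fragments into a single string for the address parser.
raw_text = " ".join(text for _, text, _ in results)
print(raw_text)
```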
Because of the credit issue, the application would crash a lot, so we later decided to train our own models, which brings me to today's talk about the disaster map application we developed. The disaster map is essentially a front end with data points visualized at the parsed addresses. You can filter by what people need, go to the data points, which are matched to the longitude and latitude derived from the person's tweet or message, and provide them with what they need. This is the amount of traffic it got in only one month; it was used a lot, and we received so many messages saying that people were actually saved thanks to this application.

What we were asked to do was to distill the tweets and messages into the following information: the post owner's needs; the post owner's address, in a structured form that we could pass to Google's GIS API; and the post owner's name and phone number. There were a couple of problems with this, because we are a bunch of hackers. One was around anonymization and storing the tweets: we later learned that even extracting this information and putting it into a plain CSV counts as processing it, and just collecting tweets like this might be in breach of GDPR and similar regulations. There were also many challenges around data drift, which I will come back to later.

Before I move on to that, I would like to explain our methodology a bit. We were mainly solving natural language understanding problems: text classification, token classification and so on, not generation like GPT-style models. For natural language understanding the state of the art is transformers plus transfer learning. The transformer is the architecture, and transfer learning means transferring knowledge from a very big pretrained model. Think of GPT-style models for text, or ResNet for computer vision: you take that pretrained model and adapt it to be useful for your own use case. This is called fine-tuning, and it is currently the state of the art. I will explain how it is done later, but we went directly for transfer learning for this use case. We later compared it to an XGBoost model trained from scratch, and it outperformed it. We also compared it to the OpenAI API used for constrained generation, and it outperformed that too, because we were actually fine-tuning a model for our own use case.

This is the BERT model, in case you don't know it. It was one of the biggest paradigm shifts in natural language processing. Today's models like GPT and Llama originate from the transformer architecture, and BERT is one of the first transformer architectures used for natural language processing.
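As a rough illustration of that fine-tuning idea, this is what reusing a pretrained BERT-style checkpoint with a fresh classification head looks like in the transformers library; the checkpoint name and the label count here are placeholders, not necessarily what we used.

```python
# Transfer-learning sketch: pretrained encoder + new classification head.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "dbmdz/bert-base-turkish-cased"  # any BERT variant would do
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=3,  # e.g. shelter / food / water
)
# Only the new head starts from random weights; everything else is reused,
# which is why fine-tuning can finish in minutes on a single T4 GPU.
```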
BERT was released by Google back in the day, and what you can do with it is this: you take the pretrained model and put a classifier head on top of it, for example for text classification. If you want to train for text classification with three classes, you put a classification layer with three output nodes on top, train that part, and you are transferring the knowledge in BERT to your use case. You can train only the classifier layer, or also unfreeze the earlier layers; it's up to you. Either way it only takes about three minutes on a T4 GPU, which is what I think we used, so it's very simple. That is why we picked it. On top of that, BERT comes with positional embeddings, which improve performance; I think that is why it performed much better than XGBoost or any other architecture.

Transformers is the library that abstracts away most of the transfer learning process, which is otherwise quite a handful. It takes very little change in your code to use it. It has something called the Trainer API, where you pass your data and some hyperparameters, fine-tune a model, and it just works.

For intent classification, we were initially asked to classify needs such as shelter, food, water and more. We used natural language inference models, which do zero-shot text classification. Zero-shot means the model hasn't been trained specifically for your classification task; instead you provide the text itself along with candidate labels, for example shelter, food and water, and it returns the class probabilities for that text. It's a large model you can use out of the box, without training, which is why we opted for it first. It performed as well as GPT-like models, and it's completely open source. The intent classification model we trained later is also a BERT-like model: the architecture is the same, an encoder-only model, a BERT variant.

For address parsing we tried GPT-3, but again we ran out of credits, and it would occasionally return wrong results and confuse named entities: you can have an apartment name that looks like a person's name, and it would mix them up. My intuition, and what I observed, is that you shouldn't use large generative language models for understanding tasks. It's like cutting bread with a katana: overkill, and you don't need it. You can train your own model in about three minutes and it will eventually outperform the case where you are paying a lot of money, and the inference cost isn't as bad either. Alongside the named entity recognition model that parses the addresses and names I just talked about, we also used regex, and very often the simpler solution actually gives better results.
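The Trainer API mentioned above looks roughly like this; the hyperparameters are placeholders and the tokenized dataset is assumed to have been prepared elsewhere, so treat it as a sketch rather than the actual training script.

```python
# Fine-tuning sketch with the Trainer API (placeholder data and hyperparameters).
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="intent-model",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,                  # the head-swapped model from the earlier sketch
    args=training_args,
    train_dataset=train_dataset,  # a tokenized datasets.Dataset, prepared elsewhere
)
trainer.train()
```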
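And the zero-shot setup with an NLI model is essentially a one-liner through the transformers pipeline; the checkpoint and the candidate labels below are illustrative.

```python
# Zero-shot intent classification sketch with an NLI model.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "We are trapped under the building, we need water and blankets",
    candidate_labels=["shelter", "food", "water", "rescue"],
    multi_label=True,  # a post can state several needs at once
)
print(result["labels"], result["scores"])
```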
So, how did we work together? The Hugging Face Hub is basically the GitHub of machine learning: it has repositories made specifically for machine learning, and it also has collaborative features like pull requests, so it was easier to work on it together, and people could try our models right away. It later also helped us with the inference part, because serving the models ourselves would have taken a lot of work, as I will explain.

This is the first open-source address-parsing model that we trained. Here you can see the inference widget: it parses the street, the apartment, the municipality and everything else. We then processed and parsed this output and passed it to Google's GIS API, which returns the longitude and latitude, and that gets marked directly on the map.

This was our multi-label intent classification model. The thing is, people might state multiple needs in one post, for example "I need blankets and shelter and food", so you can't classify it into just one class. We had to train a multi-label intent classification model, which is only a very small change compared to single-label text classification.

For model serving we used the Hugging Face Inference API, because if we had served the models ourselves we would have had to dockerize them, put them on a cloud instance and so on, and we were racing against time. The Inference API was very convenient: once you push a model to the Hugging Face Hub, it is enabled automatically. Thanks to this we gained a lot of time and could swap models very quickly in production, and they charged per token as far as I remember, so it didn't cost much at all.
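For what it's worth, the "very small change" for multi-label classification in transformers is essentially one extra argument; the checkpoint and the label list here are illustrative.

```python
# Multi-label intent classification sketch: same model class, different problem type.
from transformers import AutoModelForSequenceClassification

labels = ["shelter", "food", "water", "rescue", "electricity"]
model = AutoModelForSequenceClassification.from_pretrained(
    "dbmdz/bert-base-turkish-cased",
    num_labels=len(labels),
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss instead of softmax
)
```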
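Serving through the hosted Inference API then amounts to a plain HTTP call against the model repository; the model id and the token below are placeholders.

```python
# Sketch of querying a model hosted behind the Hugging Face Inference API.
import requests

API_URL = "https://api-inference.huggingface.co/models/<namespace>/<model-id>"
headers = {"Authorization": "Bearer <hf_token>"}

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "We need blankets, shelter and food"},
)
print(response.json())
```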
The problem was that after these models were deployed, there was a lot of data drift. What was happening was that, as time passed, and the effects of the earthquake lasted for a month or more, people's needs were changing a lot. In the first week people would ask to be rescued from under the buildings. In the second week people were asking for shelter and food, because it was very cold there; even people who had been rescued could freeze to death, or they couldn't find water and food, and that was a real problem. Later on this evolved into sanitary needs and other things. So the tweets, the posts and the messages we were receiving were constantly evolving, and it was a challenging task.

Another problem was that we weren't allowed to see the data coming in from production: it simply wasn't going to be stored, for privacy reasons. Because of this we were also told we needed to remove our data, or anonymize it and store that instead, because this data contains phone numbers, names of people and addresses, which would be unimaginable within the EU but was acceptable within Turkey. For this we used the Faker library to anonymize and store the data without hurting model performance.

For the parts where we had to adapt to the data drift, we had to crowdsource the process. We have the tweets with the timestamps and everything, so we built a UI using Argilla and Gradio; Gradio is like Streamlit, in the sense that you use it to build a UI around your machine learning model. We had several of these, but in this one you input the data, a model in the backend classifies the addresses, and then you can say: this is correct, this is incorrect, or this is ambiguous. That feedback drops into a database where we collect the incorrect cases so we can later adapt our model. It was hosted open to everyone, so anyone could contribute.

The problem is that data labeling itself is very hard. I remember we got a very generous grant from Microsoft Azure, 150,000, I think it was in dollars, and we also used that for labeling and everything. The initial dataset we labeled was very bad, because there was a rush and the guidelines weren't clear, and if you don't have clear guidelines you cannot end up with something standardized. In data labeling you should have people approving the labeled data: if a label passes, say, six reviewers, it goes into your dataset. We didn't have time for that, which was awful. After a while we improved our dataset as well, because we had clearer guidelines. But as you race against time, the problem is that it starts with people needing to be rescued from under the rubble, and that is when you have to race hardest; then the urgency gradually decreases. That was the problem with the labeling.

Thanks to the Hugging Face repositories we were able to host TensorBoard instances, and we also built a leaderboard for the models, because multiple people were training models.
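Going back to the anonymization step for a moment, a minimal sketch with the Faker library might look like this; the field names are illustrative, not our actual schema.

```python
# Anonymization sketch: swap personal fields for realistic fake values before storing.
from faker import Faker

fake = Faker("tr_TR")  # Turkish locale so the generated values look plausible

def anonymize(record: dict) -> dict:
    # Replace identifying fields with fake but structurally similar values,
    # so the stored data remains usable for training.
    return {
        **record,
        "name": fake.name(),
        "phone": fake.phone_number(),
        "address": fake.address(),
    }
```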
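And the crowdsourced review UI could look roughly like this in Gradio; the model call and the storage function are placeholders standing in for what actually ran in the backend.

```python
# Crowdsourced feedback UI sketch with Gradio (placeholder model and storage).
import gradio as gr

def address_model(text: str) -> dict:
    # Placeholder for the hosted address-parsing model call.
    return {"text": text, "entities": []}

def save_feedback(text: str, parsed: dict, verdict: str) -> None:
    # Placeholder: the real app wrote the verdict to a shared database.
    pass

def review(tweet_text: str, verdict: str) -> dict:
    parsed = address_model(tweet_text)
    save_feedback(tweet_text, parsed, verdict)
    return parsed

demo = gr.Interface(
    fn=review,
    inputs=[
        gr.Textbox(label="Tweet"),
        gr.Radio(["correct", "incorrect", "ambiguous"], label="Verdict"),
    ],
    outputs=gr.JSON(label="Parsed address"),
)
demo.launch()
```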
The leaderboard was a very big team effort. We evaluated the models and wrote the results into the model repositories themselves, so the leaderboard would automatically pick them up and rank them, and we would use the best model available. Here we cared about recall for the multi-label intent model, because you should be worrying about false negatives: if a person has a need and you are not attending to it, that is a problem. That's why we opted for recall and the F1 score. So it was a completely open-source MLOps pipeline: we would clean and version the data, experiment using transformers, version our models on the Hugging Face Hub, and do active learning.

There were a bunch of other machine learning projects as well. This one determines road and building damage with remote sensing. The issue was that buildings were crushed and some of the roads were completely unusable, so even if you wanted to deliver help somewhere, you couldn't, because the road was cracked and couldn't be used. We had a group of computer vision engineers who developed a model to do segmentation over the buildings, the roads and so on, so that we could inform the authorities that a given road was unusable and they should choose another one. That was a pretty cool application.

This is my significant other's blog; he led the front-end part, and if you would like to read about the front end, the back end and the other challenges of this process, you can go to his dev.to profile. People are unfortunately still struggling: ever since their homes were lost, some of them were never able to move anywhere else, and some are still living in shelters. They still need food and water, so you can donate at ahbap.org. I can take questions now; it's a bit early, I'm sorry.

So thank you very much for the talk. If there are questions, you can raise your hand and then go up to the microphone and ask them. Let's have a quick look. Yes, please stand up and go to the microphone. Thank you.

Thanks for the great talk, I have a question. I don't know if you have already heard about initiatives, maybe not only for Turkey but also for other seismically active regions, where it could be useful to have an application. While I was listening to you, I was thinking about the lack of an application where you can just press a button, specify your needs and so on, and it then gets sent to the authorities. Do you know whether anyone, maybe on the government's side or through some private initiative, has started thinking about creating that kind of application?

I agree that it would be very useful, but the problem was exactly that this didn't exist at all, so there was a need to actually build something. And why did we even need to build something? There are so many critiques, for instance about why the buildings weren't properly audited; some of the buildings were not good enough, and that's why they crumbled. There are so many questions. What you say makes sense, and I will forward it to the Discord server we developed this on. Later on we also developed things for the voting processes in Turkey, with OCR and everything, and that Discord is still active, so we might do something like that. Thank you.

Do you have any other questions? Yeah, it's loud.
Thanks for the inspirational talk, it really shows what even a rather small group of hackers can accomplish. I have a non-technical question. After you gained the information about what people needed, how did you actually help get them the food, shelter or water?

So, this is the map, and anyone can go to that website and see the address and the need; it is mostly there to surface this information. We have heard that people actually went through this website and rescued other people: they basically see the address, see the need, and just go.

But those were mainly private persons helping each other, not the military or other authorities?

It was both: NGOs were using it and civilians were using it. Later the government said, yes, we could use this, but what about the data, what about the privacy? And the main search and rescue teams of Turkey were using it as well.

Okay, thank you. You're welcome. Another question, please.

Thanks for the talk. I have more of a technical question. You mentioned fine-tuning all these language models, and I'm just curious how much data you would need to fine-tune a model. I guess the answer is "it depends", but maybe you could answer by talking about some factors that are important to consider, for example in the case of classification.

A cool question. Essentially, the thing with transfer learning is that you do not need that much data anymore, because the features have already been extracted; you only have to focus on the data-specific part. I would say the data quality should be better, but you don't need as much data: maybe around 5,000 samples should be enough. It really depends on your model size. If you have something BERT-like, those models are often quite small; even with something like DistilBERT, which is a smaller version of BERT, you just don't need that much. But if you are fine-tuning a generative model, maybe T5 or something like that, I think you would need more data. For classification you don't need as much, because it's a very simple problem compared to generation, where it's a bit trickier.

Okay, thank you. You're welcome. I also work at Hugging Face, by the way, so if you have questions related to Hugging Face, transformers or transfer learning, you can ask those too.

Seeing none. Okay, at this point I see no other questions, but there's one: will you be around tomorrow for the rest of the conference, if other people want to talk to you? Yes, of course. You can just look for her tomorrow around the conference, and again, check the schedule, which also has the information about her blog. So if you want to know more, just look there. Let's have another round of applause. Thanks for the talk.