Good morning everyone. Sorry to drag you out on a Sunday, but I think you will be having another wonderful day here at RISC today, and I hope you had one yesterday as well, with all the poster sessions, the student talks, and the keynotes. Today we'll be starting our day with a keynote talk from Dr. Alpan Raval. He's the chief ML scientist at Wadhwani AI. By training, he's a theoretical physicist, which came as a bit of a surprise to me when I was looking at his profile: he has worked on early-universe cosmology and black holes, a mix that we don't usually get in physics or in CS. He has taught applied mathematics and computational biology at the Claremont Colleges in California and later worked on research and AI teams at D. E. Shaw Research, Amazon, and LinkedIn. Today he'll be joining us to talk about ML in the wild, in the real world. We have all worked on toy ML problems, maybe house prices and things like that, where you have very nicely behaved data; today he'll be showing us the other side of ML algorithms, where we have to deal with real-world data and all the challenges that come with it. I'd like to welcome him to the stage now.

Thank you so much for that very kind introduction. Thank you, Varsha and Preeti, for inviting me to RISC, and thanks to all of you for coming here on a Sunday morning. I'm quite surprised to see the size of this audience given that it's a Sunday morning; when I was in undergrad, I don't know if I would have gone to a talk at 9 a.m. on a Sunday. So it's very nice to see all of you here, and I hope this talk proves to be worth it.

The title is deliberately a bit provocative: I'm going to talk about machine learning in the real world, as opposed to machine learning in the research world. I'm not making any value judgments here; I'm just trying to show you how different things are in the real world.

Before I get to the meat of my talk, a little blurb on who we are, since you may not know about us. I come from Wadhwani AI. We are based in Delhi, but we have a small office not too far from the IIT campus. We are an independent, non-profit, interdisciplinary, cross-domain AI impact institute: we want to create impact through AI, and we are devoted to developing and deploying AI-based solutions that create impact at scale by improving lives and livelihoods in underserved communities across the global South. It's quite a mouthful, but it's actually very specific. We only work on problems that impact underserved communities, and only in the global South, which is what was formerly called the third world; global South is the more politically correct term. We work on problems that save lives, meaning health and so on, as well as problems that improve livelihoods, in education, agriculture, and so on. So that's just a little bit about us.

Now let's get back to the title of this talk. What do I mean by the real world? As far as we in this room are concerned, many of us have an interest in AI or work in AI, and that's why we are here. There are two possible definitions of the real world.
One is that it's the world most of us live in, and by "us" I really mean the larger world outside, not just us in this room. The second definition is that the real world is the environment in which a technology, namely an ML model, will be used. Now, ML models today are largely deployed on the internet, and these two worlds do not coincide: many of the people who live in this real world are not exposed to the internet the way those of us in this room are, and this is especially true in the global South.

So what we do is try to work for that real world: we build AI for social impact. In countries like India, and we all know this, the real world is not the world of IIT Bombay, and it's not the world of the office in which I sit. It's a world in which farmers commit suicide because their farms are devastated by pest attacks. It's a world in which people die of TB, a disease that has been curable for the last hundred years; that's some kind of record, curable for a hundred years and yet not just a few people but millions die every year. It's a world in which babies are not receiving adequate nutrition in a country that has a food surplus, and in which school-going children are not able to read. This is the world that we want to address through the problems that we solve.

What are the characteristics of this world? Well, it's messy, it's dynamic, it's constantly changing. It's impoverished and yet rich: rich with data, with diversity of opinions, with diversity of problems, and so on. The data aspects are typically different from the type of data you get when working on internet problems. You typically deal with small datasets; you typically don't have a big-data problem. The datasets are incomplete, they're error-prone, and the data is often available in spurts, so it is definitely not high-velocity data. There are annotation issues: annotation is poor, and it's sometimes very difficult to get accurate annotation or labeling. Related to that, evaluation of any model that you build is very difficult in this type of real-world context. There are deployment and adoption aspects: you can integrate your model into a software platform, but that doesn't mean it's going to be used; the deployment platforms are themselves basic or non-existent; there are internet connectivity issues; and there's a need for user interfaces to your ML model that incorporate cultural context. I'll be talking about all of these in the context of specific case studies. And there's a need for committed partners who can make sure that the work you've done in building an AI model is actually used, so you need partners for rollout, adoption, and scaling.

Our assumption in doing all this is that technology can make a difference to lives and livelihoods, and as we roll out our projects (we are about a five-year-old institute) we find more and more evidence that this is in fact true. So what can we do with AI in this type of real world? As we know, modern societal problems across the developing world are very complex, they're dynamic, and they're all connected to each other, and this is a classic setting for the use of a technology like AI: you want to discover so-far-unknown correlations.
You want to discover patterns, so that when you solve one problem you may be solving other problems as well. In pretty much any area you can think of that impacts lives and livelihoods, AI can be used. In public health, you can use AI to screen for and diagnose diseases, to recommend treatments, to drive health-seeking behavior among patients, to carry out disease surveillance at a population level, and, obviously because of LLMs, to disseminate information better. In agriculture and climate change, again: detecting pests, monitoring crop disease through satellite images or through images taken with basic smartphones, driving better agricultural practices, predicting future crop yield so that you can take the necessary interventions in advance, disseminating agricultural advisories and information about weather and soil conditions, and monitoring the effects of climate change using satellite data. In education, you can drive student behavior through the use of ML models, improve learning outcomes, and create better content. I'm sure the list goes on.

So, the thesis behind AI for social impact. I make a distinction between AI for social impact and AI for social good. AI for social good is a term that I think has been hijacked by the West to mean removing the negative effects of AI, like biases and so on. AI for social impact I view as a more positive term, where you can actually use AI to do good, not just address AI's negative effects. The thesis of AI for social impact is really to improve life and economic conditions for the underserved as a first priority. Although it's important to address biases and things like that, we consider this part to be more important.

This fits in very well with the UN's Sustainable Development Goals. For those of you who are not aware, these are goals that were adopted by UN member states in 2015, with the aim of reaching them by 2030. We are nowhere near reaching these goals; COVID is partly responsible, but there are other issues as well: wars, political issues. The goals can be divided into three broad categories: goals that affect society, goals that affect the economy, and goals that affect the environment. The statements are fairly simple (things like "no poverty", "zero hunger"), but there are quantitative aspects to each of them, and the aim, at least in 2015, was to achieve them by 2030 across the world. I don't think we are close to achieving any of these goals across the world. There's been very limited progress; most of the Sustainable Development Goals are not on track to be reached by 2030, and in fact developing-country targets are much farther off than Western targets. Some examples: COVID-19 and the Ukraine crisis have derailed progress on extreme poverty; in 2020 the share of the population living in extreme poverty actually rose for the first time in two decades. There are still food issues and disease issues; progress against TB, malaria, and HIV was slow to begin with, and it's been further derailed by the pandemic. There are 10 million TB cases worldwide, 30% of them from India alone, and 33% of the world's TB deaths are from India alone. And this has been worsening.
I think the 2023 numbers were slightly better than 2022, but it's still pretty bad. There have been some studies, and I think these studies need to be repeated, a set of fairly old studies that show how AI can impact the Sustainable Development Goals. There's a five-year-old study from McKinsey that mapped many of the SDGs to specific AI methodologies that you could use to address them. There was a separate study in Nature Communications in 2020 which weighed the positive effects of AI on the SDGs against the negative effects; the negative effects include things like bias and the environmental costs of training large models. The conclusion was that the positive effects far outweigh the negative effects, so this is a worthwhile area to work in.

Now we'll get back to this issue of real-world ML versus research ML. Here is a very simplified representation of research ML. Please don't shoot me down if you disagree; I know there are lots of exceptions, but I'm giving you a cartoon representation of what research ML is. You typically start with a fixed dataset, what I call a God-given dataset, or GGD. You extract labels and features, you carry out some algorithmic experimentation on this dataset, and you come up with a model. There's a subset of the GGD that's usually reserved for evaluation, which we call the test set and which I'm calling GGD-prime, and you predict labels on that subset. You keep iterating until you can beat the state of the art based on some God-given metric that's universally accepted, which I call the GGM, and then you publish a paper. Now, I know not all research ML is like this; there are lots of exceptions. But the idea is that you build models that you can eventually publish, models that outperform others based on standards that are accepted in the community and on datasets that are accepted in the community. The standards are what I refer to as God-given metrics, and the datasets are what I refer to as God-given datasets.

How does real-world ML work, by contrast? First of all, it's not the blind application of research ML to the real world. There's a lot of misconception, especially among academics, that all you do is take a paper and apply it to a real-world dataset. It's not just that: it's the application of natural intelligence, not AI but our own intelligence, to select the aspects of research ML that may be relevant to the real-world setting and to tweak them appropriately, and much more than that; there are a lot of data and other issues that I'll talk about in the rest of this talk. And it really starts with a respect for, and an understanding of, the deployment environment. You can almost think of the environment in which the model will be deployed as the God-given environment; it's the environment that's fixed.
And you have to work backwards from that environment. You work backwards from the deployment environment to define the ML problem (often the problem is not well defined to begin with) and the relevant metrics, and you may iterate on these metrics: the business metrics and the ML metrics. You have to make sure these two sets of metrics are well aligned, so that when you optimize an ML metric you're actually also optimizing a business metric. You also have to work out what the data needs are, how you need to annotate the data, what other metadata you may need to collect, and so on. So you're really iterating in this space of model, data, problem definition, and metrics, and all of these are dynamic and keep changing in a real-world situation.

This is a cartoon representing real-world ML. You have the raw deployment environment in which you will deploy your model. You may design some nice user interfaces to this environment to make it cleaner; that's this box here. Then you may have some raw available data to begin with; that dataset may be of size zero, you may not have any data at all, so you will have to supplement it with additional collected data, and this is again an iterative process. You may also collect some metadata along with it. Then you have to sample this data, clean it, transform it in various ways (transformation includes imputation and things like that), and annotate it, and annotation is usually itself an extremely tedious process. This part is also iterative. Then you train your ML model, you do your algorithmic experimentation here, you train and validate it, and you deploy this model in your clean environment. Then of course new data comes in, along with any new metadata, and you predict a label. All of the aspects in green are under your control, things that you need to choose or that you can pretty much define; the only things that are given to you are the deployment environment, maybe some data to begin with, and of course a loose problem statement, which is usually quite vague. When you do this you also have to ensure that you don't get any type of drift, data drift, concept drift, and so on, between your training data and the data that's actually coming in during deployment, and you have to constantly monitor for this. Finally, once you have a predicted label, you have to do some type of continuous evaluation, and the final result of this whole process is impact, not a publication.
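To make the drift-monitoring step concrete, here is a minimal sketch of a per-feature data-drift check between a training snapshot and incoming deployment data. It only illustrates the idea, not any deployed pipeline; the pandas-style inputs, the example feature names, and the alert threshold are all assumptions.

```python
# Minimal sketch of a per-feature data-drift check using a two-sample
# Kolmogorov-Smirnov test. train_df / live_df are assumed to be pandas
# DataFrames; feature names and the p-value threshold are illustrative.
from scipy.stats import ks_2samp

def drift_report(train_df, live_df, features, p_threshold=0.01):
    """Return the features whose deployment distribution looks different
    from the training distribution."""
    drifted = {}
    for col in features:
        train_vals = train_df[col].dropna().to_numpy()
        live_vals = live_df[col].dropna().to_numpy()
        stat, p_value = ks_2samp(train_vals, live_vals)
        if p_value < p_threshold:            # distributions likely differ
            drifted[col] = {"ks_stat": float(stat), "p_value": float(p_value)}
    return drifted

# Example (hypothetical feature names):
# alerts = drift_report(train_df, live_df, ["age_days", "birth_weight_g"])
# A non-empty report would trigger a review, re-annotation, or retraining.
```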
Right, so this is how the real-world ML cycle works, and as you can see it's somewhat different from, and in many ways much more complex than, the research ML setting. In the research ML problem, all the complexity is in the algorithmic work, whereas in a real-world situation the complexity is much more spread out.

Now, there are various pitfalls, and I know most of you know about the pitfalls of deploying AI models; you read about them all the time. There are some things that we do to try to address these pitfalls. First of all, we never ingest any personally identifiable information in our datasets; we make sure that's deleted from the get-go. We make sure data collection happens with informed consent. Before we deploy a model, we don't just evaluate summary performance on an entire population; we evaluate performance on protected cohorts to make sure there are no significant variations in performance across cohorts, and if there are, we address them through debiasing techniques. This is very important, especially in health. Our AI models have a human in the loop: the AI should not be the final decision maker; it should always be the human. We implement conformal prediction almost as a given in almost all of the models we work on, so that we have uncertainty estimates and confidence measures in our predictions. When we use LLMs, you have to make sure there are guardrails; you implement RAG and avoid hallucinations as much as possible, but this is something we are still struggling with, to make sure LLMs can be used safely.
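On the conformal prediction point, here is a rough illustration of the split conformal idea for a regression task (not the exact recipe used in any particular model): calibrate a residual quantile on held-out data, then report it as an interval around each new prediction. The `model` is assumed to be any fitted regressor with a `predict` method, and `weight_model`, `X_calib`, and `y_calib` in the usage note are hypothetical names.

```python
# Minimal sketch of split conformal prediction for regression: calibrate a
# residual quantile on held-out data, then attach it as a prediction
# interval with approximately (1 - alpha) coverage. Illustrative only.
import numpy as np

def conformal_quantile(model, X_calib, y_calib, alpha=0.1):
    residuals = np.abs(y_calib - model.predict(X_calib))
    n = len(residuals)
    # finite-sample correction to the quantile level
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(residuals, level)

def predict_with_interval(model, X_new, q):
    preds = model.predict(X_new)
    return preds, preds - q, preds + q       # point estimate, lower, upper

# Usage: q = conformal_quantile(weight_model, X_calib, y_calib, alpha=0.1)
#        pred, lo, hi = predict_with_interval(weight_model, X_new, q)
```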
When we deploy models, we start with a passive deployment. A passive deployment is one where the model is deployed and making inferences, but the inferences are not being used to drive any interventions on the ground; we're just logging inferences in order to evaluate. Once we have passed the passive deployment stage, we go into pilot or small-scale deployments and check for robustness in these pilots. Ideally we'd like to have pilots in as diverse a set of locations as possible, so that we check the generalization ability and robustness of the models. Ideally we'd also want to do A/B testing, but that's not very easy to do in the kinds of situations we work in.

As you can see, all of this work requires a very collaborative approach. We, as a 200-person institute, can't do all of this ourselves, so we collaborate with a number of partners. We have 40-plus partnerships with governments, non-profits, academia (including IIT Bombay), and the private sector; we're an official AI partner of several ministries, and we partner with a number of such organizations. And our team really consists of a mix of people. It's not just ML modellers: we have domain experts, so there are doctors and agriculture experts and so on in our team; we of course have the ML folks; we have user research and design people, engineering, and an M&E team that designs evaluation studies.

So now I'm going to deep dive into one of our solutions: newborn anthropometry. That's the team that's working on it; I'm just giving the talk. I'll start with the problem statement. The problem statement really has to do with babies being born in rural areas. It's well known that the first 28 days of life, called the neonatal period, are the most vulnerable time for a child's survival. The global statistics are that one-third of all neonatal deaths occur in the first day of life, half within three days, and three-fourths within the first week. These are really sobering statistics, and this isn't from 2024; it's from 2020, but it hasn't changed much. In India, under the National Rural Health Mission, home-based newborn care has been implemented since 2011 to reduce mortality rates in rural areas. There are ASHA workers, employed by the government, usually housewives, who go door to door and counsel mothers on how to care for their babies, and they also do some basic things like checking the health of the baby, weighing the baby, measuring the baby's length, and monitoring growth.

Now, this brings us to the problem we are interested in: how do they weigh babies in a home setting? This is how: they usually take a piece of cloth, like an old saree, tie it to a spring balance to create a sling, put the baby in the sling, and then read the weight off the spring balance. This system is heavily prone to errors, as you can probably imagine. The spring balances are not well calibrated, so they have zero-offset errors. The ASHA worker eyeballs the weight on the spring balance as the baby moves, so there are lots of fluctuations. And lastly, there's no motivation to enter the true weight. According to WHO standards, if the baby at any point in (I believe) the first six weeks dips below two and a half kilograms, then a number of interventions need to be taken, because that's a danger sign. Now, these interventions have to be taken by the ASHA workers themselves, who are overworked, so the temptation is always to round up to 2.5: if the baby is 2.2 kg, the weight entered in the register is 2.5. If you actually look at statistics of baby weights across India, there's a huge mode at 2.5; it's completely artificial, and it's there because the WHO cutoff for taking interventions is 2.5. So this is a huge problem. There are similar issues with length measurements and so on, but the main issue is with weight measurement, and it's very important to measure weight in a timely manner so that you do in fact take the right interventions if the baby has low weight. So there are lots of issues that cause errors: as I mentioned, the supply, maintenance, and performance of spring balances; and cultural taboos that don't allow the ASHA workers to touch newborns, so there's this complicated thing that happens where the mother puts the baby in the sling and the ASHA worker is not allowed to touch it, which causes errors in the weight; all kinds of issues like that.
There's low motivation, as I already mentioned, errors in manual entries, and then data tampering, because these ASHA workers are overworked. So babies are not, in fact, accurately weighed on the ground in home settings. By contrast, they're actually very accurately weighed when they are born in a hospital setting: 80% of babies in India are born in hospitals, and there they are weighed with digital weighing scales that have an error of about 10 grams. So they're very accurately weighed there; the problem starts when they go home, where the weight is not being tracked properly.

So what can we do with AI? The idea is that we wanted to address the cultural taboos appropriately, so we wanted the solution to be touch-free, robust, and free of this complicated spring balance that you need to calibrate all the time, and it should be simple enough to use. We knew that ASHA workers are being supplied with basic smartphones by the government; in fact, this is rolling out across India as we speak. So we said, why not try to take a short video of the baby at various angles, so that we capture some sense of the baby's volume, and then use that to infer the weight of the baby? This is how we started working on this problem four years ago, and we wanted it to be accurate, real-time, tamper-proof, and so on.

The way this works is that the ASHA worker takes a video in a circular arc around the baby with their smartphone, and in real time they get a weight number. We call this Digital Taraju; a taraju is a weighing scale. The workflow is very simple: you capture a video, you capture some other data about the baby, like the birth weight if it's known, the model estimates the anthropometric measurements, and the relevant databases are updated. So you get both individualized information about the baby, which is tamper-proof and logged in the back end, and summary statistics, because this is automatic digitization as well. That's the other problem with the current process: the ASHA workers often write down these weights in registers, so there's a further digitization step, whereas in this case the digitization is done automatically. This is designed to work in home and community settings, not in a hospital setting, even though it works better in a hospital setting because of better lighting conditions and backgrounds; but in hospital settings you have digital weighing scales anyway. It works on low-end smartphones.
It doesn't need any high-end equipment like depth sensors or LiDAR, and it requires minimal setup. It does require a fixed reference object in the frame. The common reference object used in computer vision problems is a checkerboard, so we started working on this problem assuming that there would be a checkerboard placed next to the baby. After about two years of working on it, we were told that it's simply not feasible for ASHA workers across the country to carry checkerboards with them, and our solution didn't work without a reference object like that. So then we tried to find out what would be feasible for them to carry, and we looked at what is already in their bag. It turns out that the simple 12-inch wooden ruler that we've all used is something they carry around anyway. We started using that as the reference object, and after a lot of modeling work we could make the wooden ruler work just as well as a checkerboard. So we're now using a wooden ruler.

It doesn't require physical contact, and the video is not stored on the phones of the frontline workers. There's an interesting story behind this. When we created an initial version of this solution and went into the field to show it to mothers, we expected them to be impressed; we expected them to think, oh, this is really cool, you can tell the weight of my baby from a video. But instead the reaction was: obviously you can tell the weight of the baby from a video; my question is, where are you storing that video of my baby? Where does it go? So there was a huge privacy concern, even among rural mothers. Part of it is still a problem for us: we tried techniques where you automatically blur the face of the baby before it goes as input into the neural net, but that throws off the weight estimation quite a bit because of the size of the blur and so on, so we still can't do that. What we do instead is delete the data as soon as the inference is done.

It turns out this system now performs better than conventional methods; we've been working on it for about four years. Over 95% of babies have less than 10% relative error in weight, and weight is really the hardest thing to infer. The mean absolute error in weight is at 111 grams right now, versus an average error of about 183 grams with the spring-balance method. And of course you get all the other benefits of automatic digitization and so on. The data collection for this has been extremely complicated. We worked with three institutions, SEWA Rural in Gujarat, PGIMER in Chandigarh, and Niloufer Hospital in Telangana, over a four-year period.
We've collected about 24,000 baby videos from about 12,000 babies. That doesn't seem like a huge dataset if you're used to YouTube-scale video datasets, but for this type of thing it was a huge effort to get to this many baby videos, and our data collection is still ongoing. We have trained staff, provided with calibrated equipment, to measure ground truth. The data collectors visit children on days 3, 7, 14, 21, 28, and 42 after birth, which is the standard protocol for home-based newborn care, and they capture measurements on mobile phones along with the baby videos. One of the caveats is that the babies have to be unclothed; we are still working on methods that work equally well with clothes on the baby, but right now we do require the babies to be unclothed. The measurements we capture for ground-truthing are the length, the weight, the wrist circumference, the MUAC (mid-upper-arm circumference), the head circumference, and the chest circumference. We also ask for the gestational age, the actual age, the gender, and the birth weight if known; this data is typically noisy, but we log it anyway. Most babies are actually weighed in a hospital, so the birth weight is usually known.

There's a short video I'd like to show; it's about a minute long. Do we have audio? It's okay, audio is not critical, it's just music. This video shows a white background, but the model doesn't actually require a white background; it can be arbitrary, and we've tested it under various backgrounds and lighting conditions. The worker takes a video in an arc around the baby with the ruler as a reference object. This particular video shows an evaluation study, so the worker is also measuring ground truth using a digital weighing scale. The solution is in fact fairly robust: we've trained on Gujarat data and evaluated on Punjab, trained on Punjab and evaluated on Gujarat, and so on, and it works quite robustly.

So this is the model architecture. We have a video input and a CNN backbone.
For a long time we used a ResNet-50 backbone; we've also replaced this with CLIP embeddings, and the CLIP embeddings work slightly better than ResNet-50. We have a frame-pooling method, and then the critical thing is to pose this as a multitask problem and to combine it with tabular information; the tabular information is the additional data that I showed, age, birth weight, and so on. The input combines representations from multiple frames of the video, and the multiple tasks here are the different anthropometric measurements, such as length, head circumference, chest circumference, and of course weight. These are the corresponding multitask heads. In addition, we have two extra tasks that are not required at inference time but which improve the performance of the model: a baby segmentation task on the video, implemented through a fully convolutional network head, and a task where certain key points on the baby's body are predicted. We had to do additional annotation for these two tasks as well. When you set it up as a multitask problem like that, it actually improves performance on all of the individual tasks. We did some pseudo-labeling for these two tasks, the segmentation task and the keypoint prediction task: the annotations for the segmentation masks and the key points were produced by fine-tuned models. We did some manual annotation, fine-tuned PointRend and HRNet models to annotate automatically, and then used that as ground truth for the larger model. The segmentation head, as I mentioned, is a standard fully convolutional net, and the keypoint head is a two-layer multilayer perceptron.

This is what the data looks like. We had to make sure we got representative data from all weight groups, so this is binned by weight, across the weight range; the training set is in red and the test set is in blue. And this is the gender distribution of the data. The evaluation metrics are fairly straightforward: we evaluate using the mean absolute error in the weight, or the mean absolute percentage error, and I don't need to tell this audience what those mean.

This is the summary of our results, on a training dataset of roughly 10,000 videos corresponding to 2,000 babies and a test set of 1,300 videos corresponding to 150 babies. We get a mean error of 111 grams, which is actually really good for use in the field; this is entirely in a field setting, with poor lighting conditions, varied backgrounds, and so on. Over 50% of the samples have an error of less than 100 grams, and the MAPE is about 4%. All of these numbers look pretty good for deployment, and that's what we are now gearing up to do: pilots. What we'd like to do eventually is things like this: we want to be able to automatically plot growth charts for each baby. Here you can see an example for a male low-birth-weight baby and a female normal-birth-weight baby; the blue is the ground-truth growth chart and the red is the one predicted by this model. What we are gearing up for is an evaluation that's planned in Bihar. We haven't trained this model on any babies from Bihar, so we're really excited to see how well it does.
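As a rough PyTorch sketch of the kind of multitask video-plus-tabular architecture described above: a per-frame CNN backbone, simple mean frame pooling, fusion with the tabular inputs, and one regression head per anthropometric measure. The layer sizes, the pooling choice, and the head structure are assumptions, and the auxiliary segmentation and keypoint heads are omitted for brevity; this is an illustration, not the deployed model.

```python
# Sketch of a multitask video + tabular regression model (illustrative).
import torch
import torch.nn as nn
import torchvision.models as tvm

class AnthroNet(nn.Module):
    def __init__(self, n_tabular=4,
                 tasks=("weight", "length", "head_circ", "chest_circ")):
        super().__init__()
        backbone = tvm.resnet50(weights=None)
        backbone.fc = nn.Identity()              # keep the 2048-d pooled features
        self.backbone = backbone
        self.tab_encoder = nn.Sequential(nn.Linear(n_tabular, 32), nn.ReLU())
        self.heads = nn.ModuleDict({
            t: nn.Sequential(nn.Linear(2048 + 32, 256), nn.ReLU(), nn.Linear(256, 1))
            for t in tasks
        })

    def forward(self, frames, tabular):
        # frames: (batch, n_frames, 3, H, W); tabular: (batch, n_tabular)
        b, f, c, h, w = frames.shape
        feats = self.backbone(frames.view(b * f, c, h, w)).view(b, f, -1)
        video_feat = feats.mean(dim=1)           # mean pooling over frames
        fused = torch.cat([video_feat, self.tab_encoder(tabular)], dim=1)
        return {t: head(fused).squeeze(-1) for t, head in self.heads.items()}

# Training would simply sum per-task regression losses, e.g.
# loss = sum(torch.nn.functional.l1_loss(out[t], target[t]) for t in out)
```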
The second problem I'd like to deep dive into... I hope I have enough time. Yes? Sure, sure, we can pause for questions.

Since you talked about the real world: I feel that in the real world, the percentage that goes wrong has a bigger impact than a number in a paper. That's one of my questions. And you can't keep collecting ground truth for the actual deployment data either, otherwise there's no point in doing this. So what is the real impact of this five or ten percent that's going wrong?

Yes, so we actually analyzed where things are going wrong (I don't have that slide here), and it turns out that our model accuracy is much higher among low-birth-weight babies. So it's going wrong more among normal-birth-weight babies, and the errors tend to be normal-birth-weight babies being classified as low birth weight, which is a more acceptable error than the other way around. But again, that's the whole real-world-ness of it.

A follow-up on the conventional methods: could the magnitude of the error from the AI sometimes be just anything, whereas with the spring balance the magnitude is limited, or can be bounded in some way, because it's just a little bit off? And also, the person taking the weight will probably have an intuition about whether the reading is off.

That's a good question. We haven't done a detailed analysis of the tail of the predicted distribution; we should probably look at that. But overall we don't find huge deviations. We've had this tested in the field by various workers, and everybody reports back saying it's working well; they don't see massive deviations, like a case where the baby is two kilograms and the model is predicting five kilograms. You don't see things like that. That said, we do expect that if you take a model that's been trained on, say, Indian babies and take it to Africa or somewhere, things like that could happen, so we're not very confident that it will work in a completely different demographic.

The second question: has an end-to-end cost analysis of this been done? Because otherwise, is it really just better to get them something better than the spring balance? Is that going to be cheaper than throwing AI at it?

On the spring balance: there have been a number of efforts by the Gates Foundation to make sure that babies are weighed accurately in the field. They funded what are called programmatic grants to make sure that spring balances are well calibrated, that they're available, and so on. In fact, the Gates Foundation came to us with this problem and said nothing is working; none of these methods are working.
Now, of course, there's a huge development cost to this. We got a fairly large grant from the Gates Foundation to build this model over four years in the first place, and data collection is expensive. But we believe that once it's robust enough, and the indications are there that you don't need too much additional data collection for it to be usable pan-India, then the cost will be justified. It's a relevant question, and the answer is still somewhat up in the air, but we believe the cost will be justified.

Hello, how does the model perform with obese children?

I don't know how many obese children we have in our dataset. We go to rural areas to collect data, where there are hardly any obese children, and we're only collecting data from zero to 42 days of age. I don't think we have any obese children in the dataset, so I can't answer your question. I don't think the model will work very well on, say, city children from higher-income families, simply because that's just not reflected in our training data.

Hi, I want to ask about the aftermath of this. You are doing such a great thing at your institute, but if we could find, for each state, the point where all the healthy, risk-free babies lie, and we could calculate all of that from structured data and somehow predict what should be done, then could we keep all of this aside, the weighing and everything, and just tell the government to do this, so it happens automatically?

I'm not sure I understand. Are you saying that if you know the regions where the problems are, you can just tell the government to intervene? Is that what you're saying?

Yes, the problem is that we don't know how to find that point for healthy babies.

What do you mean by the elbow point here?

I mean the ranges for babies who are not at risk, by weight and height and all those ratios.

So that is what the previous slide shows. The idea is that once you use this model to collect baby weights and the other anthropometric parameters, you can aggregate all these numbers at a district level, at a village level, so you know how well babies are doing in each of these districts, and the government can then take interventions. And you can do this aggregation automatically in the back end. Whereas with the current process, because there's no digitization happening, everything is written in these registers. The ASHA workers are being told to use their smartphones to enter the data, but they're still used to using registers; some of them use smartphones, some of them don't. So there's a whole digitization issue behind this, in addition to the accuracy issue. I don't know if that answers your question.
So you said you are calculating volume? No, we are not calculating volume. When we started doing this, we had a different modeling approach, where we were trying to estimate a mesh around the baby, compute the volume inside the mesh, and then multiply by the density to get the weight, because it turns out most babies have similar body densities; the densities start varying after the first six weeks. But it turned out that the direct regression approach, where we directly regress on the weight, works much better. We think that may be a dataset issue: because we still have a relatively small dataset of videos, the direct regression approach works better, whereas if you had a much larger dataset, estimating a 3D mesh might actually work better.

If a neonate has an abnormal bone density, is your model robust to that, or is it error-prone in that situation?

If you had a mesh model, it would not be robust to that, because the density would differ. The current model may also not be robust, in the sense that it's only using visual information to estimate the weight. We are using some of the tabular data, like the birth weight, as one of the predictors, so hopefully an abnormal bone density would be reflected in the birth weight itself. But we have no way of measuring bone densities and checking this hypothesis.

So does your model generalize to normal babies only, or also to outliers like abnormalities or other diseases?

The main purpose of this model is to identify low-birth-weight babies, and it generalizes well across the weight spectrum. But if you look at other types of abnormalities, like missing limbs or things like that, we just don't have the data to test that. Collecting that type of data is going to be very difficult, and incorporating it into the training set is going to be even more difficult. Any more questions? Okay, I'll continue then. How much can I go over? Let me ask. Okay, then I'll try to go through this quickly, and if there are questions, please come to me later.

In contrast to the previous solution, this one is in a different space: education. It's something that's already being used; it's deployed at scale. The basic idea behind it is much simpler. It comes from the fact that only roughly 40% of grade 5 students in India can read at grade 2 level; that's at grade 2 level, not even grade 5 level, and this is in their own mother tongue, not English. So how can we fix this, or try to fix this, using technology? We want a way of automating and digitizing assessments (just identifying where the problems are is itself a problem), to provide good reading recommendations to students in multiple languages, and to make sure this happens at low cost and doesn't require very high levels of literacy. We want the technology to be deployable offline, without internet, and to be verifiable by students. What we've developed (and again, I'll go through this a bit quickly in the interest of time) is a solution called Vachan Samiksha, an oral reading fluency app. The idea is quite simple: a student reads a given piece of text, and the audio is pre-processed.
We apply various speech recognition models to it and create transcripts from these models; each of them will create a different transcript. These transcripts are then aligned to the target text that was supposed to be read, and combined to produce a consensus transcript. This consensus transcript is then compared to the target text to figure out which words the student got right, which words were substituted by a different word, which words were added (these are insertions), and which words were missed (these are deletions). Of course, these counts are functions of the student's reading ability as well as of the accuracy of the ASR models, so the two things get mixed up: the student could be reading accurately, but if the ASR model is not good, then the errors finally reported are really errors of the ASR model and not of the student. Conversely, we want the ASR models to be as good as possible within the scenario in which we are deploying them, so that any errors measured here are errors of the student. That is the real challenge; it's not a standard ASR problem, because of this.

We go through a lot of standard pre-processing steps on the audio, denoising, amplitude normalization, and so on; I'm going to skip through a lot of this, so please ask me questions later if you're interested. We looked at open-source ASR models; we didn't want to develop our own ASR from scratch, and the one we considered is AI4Bharat's wav2vec model. The reason we like it is that you have frame-level output for each audio, so you can actually generate a timestamp for each character in the transcript; it doesn't have an encoder-decoder architecture; and, the other thing we liked about it, there is no language model in the wav2vec model that will auto-correct the transcript. The Whisper ASR, for example, actually auto-corrects the transcript, and that defeats our purpose, because we don't want it to auto-correct: if the student makes mistakes, we want to catch those mistakes. We don't want a language model or some context in the background that will auto-correct the transcript. For example, if a student mispronounces a word while reading "happy birthday to my friend", wav2vec captures it as it was spoken, whereas Whisper auto-corrects it to "happy birthday", and that's exactly what we don't want. So we used wav2vec as a base model and then did a lot of different types of fine-tuning on this base model, and this is where the data aspects and the context in which this will be deployed come in.
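To make the comparison step concrete, here is a minimal sketch of the generic word-level alignment that the description above implies: align a transcript against the target paragraph with edit distance and count hits, substitutions, insertions, and deletions. This is the textbook dynamic-programming approach, not the exact implementation inside Vachan Samiksha.

```python
# Minimal sketch: word-level edit-distance alignment and error categorization.
def align_counts(target_words, hyp_words):
    T, H = len(target_words), len(hyp_words)
    # dp[i][j] = minimum edits aligning the first i target words with the first j transcript words
    dp = [[0] * (H + 1) for _ in range(T + 1)]
    for i in range(1, T + 1):
        dp[i][0] = i
    for j in range(1, H + 1):
        dp[0][j] = j
    for i in range(1, T + 1):
        for j in range(1, H + 1):
            sub = dp[i - 1][j - 1] + (target_words[i - 1] != hyp_words[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # backtrack to categorize each aligned position
    counts = {"hit": 0, "sub": 0, "ins": 0, "del": 0}
    i, j = T, H
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (target_words[i - 1] != hyp_words[j - 1]):
            counts["hit" if target_words[i - 1] == hyp_words[j - 1] else "sub"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            counts["del"] += 1   # target word missing from the transcript
            i -= 1
        else:
            counts["ins"] += 1   # extra word present in the transcript
            j -= 1
    return counts

# align_counts("the cat sat".split(), "the bat sat down".split())
# -> {"hit": 2, "sub": 1, "ins": 1, "del": 0}
# word error rate = (sub + ins + del) / number of target words
```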
We actually had an MoU with the Gujarat government; they were very interested in deploying it across the state, across all government schools in Gujarat, so our data collection is mostly in Gujarati. We created a set of 300 grade-appropriate paragraphs across six grades, grades 3 to 8, with 50 paragraphs per grade, appropriate to the reading level for that grade. These paragraphs had been used earlier for oral reading fluency assessments by the Gujarat government, so they conform to the standards.

The way we did data collection and ground-truthing: first we had teachers read these paragraphs aloud, and we assume that the teachers are mostly right. We collected about 7,500 recordings from 33 teachers. We also had a lot of student data: we did an initial pilot in which we collected about 2.8 million audio recordings of students, totaling about 90,000 hours. So we had this massive dataset of audio recordings of students reading these paragraphs; of course, these are error-prone. Then, in terms of annotation: we can't assume that all teachers will be perfect, so we applied the AI4Bharat model as a filter for teachers who were not reading well. We filtered out all recordings from teachers with an accuracy below 60% as output by the wav2vec model, and only annotated those with accuracy higher than 60%, assuming that any errors there were errors of the ASR, not errors of the teachers. So the annotation for those recordings was simply the target paragraph; you don't have to do any additional annotation, because you assume they are right, and the annotation of the audio is the target paragraph they were supposed to read. For the remaining data, the student data, we selected a subset and did a very detailed annotation, including timestamp annotation; we could only do this for a very small subset, about a hundred hours of data, with very detailed timestamp-level, character-level annotation. So those are two sets of annotations that we did.

We did a third type of annotation as well, pseudo-labeling. As I mentioned, we had 2.8 million audio recordings to begin with, and we used a pseudo-labeling strategy, using models to do the annotation. We took a combination of the wav2vec model and a model that we had fine-tuned on the teacher data, aligned the model outputs with the target paragraph, and created a pseudo-labeling strategy in which we keep only the hits and the insertions; that means we treat every substitution as correct, so we assume that any substitution error is an error of the model, not of the student. And we do this only for students who have a score greater than a certain threshold. This is a fairly complex pseudo-labeling strategy that we arrived at after a lot of experiments: we had a held-out validation set on which we evaluated word error rates, and we iterated on the strategy based on that. I'm going to skip the details in the interest of time, even though this pseudo-labeling strategy is quite interesting.
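One way to read the strategy just described, as a rough sketch: trust the target word wherever the model's transcript substituted something (treating substitutions as ASR errors), keep genuinely inserted words, and only pseudo-label recordings whose overall score clears a threshold. The threshold value, the handling of deletions, and the use of Python's difflib as a stand-in aligner are all assumptions, not the actual recipe.

```python
# Hypothetical sketch of the pseudo-labelling filter described above.
from difflib import SequenceMatcher

def pseudo_label(target_words, hyp_words, score, score_threshold=0.8):
    if score < score_threshold:
        return None                              # recording too unreliable to pseudo-label
    label = []
    sm = SequenceMatcher(a=target_words, b=hyp_words, autojunk=False)
    for op, t1, t2, h1, h2 in sm.get_opcodes():
        if op in ("equal", "replace"):
            label.extend(target_words[t1:t2])    # hits kept; substitutions trusted as target
        elif op == "insert":
            label.extend(hyp_words[h1:h2])       # keep extra words the student actually spoke
        # "delete" spans (skipped words) contribute nothing -- an assumption
    return " ".join(label)

# pseudo_label("the cat sat on the mat".split(),
#              "the kat sat on mat now".split(), score=0.9)
# -> "the cat sat on mat now"
```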
So this is how we fine-tune our final model. We start with the AI4Bharat wav2vec model. We have the annotated teacher data and fine-tune on that; this is full fine-tuning, we fine-tune all parameters, to get a version-one model. Then we have the pseudo-labeled data, and we again do full-model fine-tuning to get a second version of the model. And then we take only the small, very well annotated set of data and fine-tune using LoRA to get the final model. Why do we use full fine-tuning in the earlier stages versus only LoRA fine-tuning here? It's just what worked best in our experiments; our hypothesis is that because this last dataset is small, LoRA fine-tuning works better than full fine-tuning, but that is a hypothesis.

Then we had to compress this model to work in rural areas where there's no internet, so we carried out distillation, using the CTC loss on the teacher outputs; again, I'm not going to go into the details. We did distillation followed by another fine-tuning step on the data that we had. The data we had was 15-second audio chunks, but at inference time we would have had to chunk the audio into three-second intervals just to save on compute, so we did a further step, what we call streaming fine-tuning, on three-second audio chunks, and that's how we arrive at the final compressed model. The compressed student model has 27 million parameters, as opposed to the roughly 300 million parameters of the original model, and it fits well on a OnePlus Nord CE 2. There's only a 2% increase in character error rate between the original teacher model and the final student one. I'll skip the details of the distillation.

This is the way we evaluated the model. From the annotated data, roughly 19,000 files, we created two test sets. We created what we call an overlapping set, where the test set consists of people reading the same paragraphs as those in the train set; it's overlapping in that sense, but you have different people reading the same paragraphs in train and test. In the non-overlapping set, both the people reading and the paragraphs themselves are different. So you have shared paragraphs across splits in one case and unique paragraphs across splits in the other. And these are our results at various levels of fine-tuning, for the AI4Bharat model and ours, evaluated on two datasets: our overlapping dataset (O) and the IndicSUPERB dataset, the general ASR benchmark that wav2vec was evaluated on. As expected, wav2vec works better than our model on IndicSUPERB, and our model works much better than any of these on our overlapping dataset; but it's also good to see that our model doesn't do too badly even on a general ASR benchmark, so the fine-tuning hasn't resulted in catastrophic forgetting.
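As an aside on the LoRA stage mentioned above, here is a minimal sketch of what parameter-efficient fine-tuning of a wav2vec-style CTC model can look like with the Hugging Face transformers and peft libraries. The checkpoint name is a placeholder, the target modules and hyperparameters are illustrative, and the data loader is assumed to yield padded audio plus CTC label IDs; this is not the team's training code.

```python
# Sketch of LoRA fine-tuning for a wav2vec-style CTC model (illustrative).
import torch
from transformers import Wav2Vec2ForCTC
from peft import LoraConfig, get_peft_model

def lora_finetune(checkpoint, loader, lr=1e-4):
    """loader is assumed to yield dicts with padded 'input_values' and CTC 'labels'."""
    base = Wav2Vec2ForCTC.from_pretrained(checkpoint)
    lora_cfg = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],   # attention projections in the encoder
    )
    model = get_peft_model(base, lora_cfg)
    model.print_trainable_parameters()          # only a small fraction is trainable
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for batch in loader:
        out = model(input_values=batch["input_values"], labels=batch["labels"])
        out.loss.backward()                     # CTC loss is computed inside the model
        optimizer.step()
        optimizer.zero_grad()
    return model

# Usage (checkpoint id is a placeholder, not a verified model name):
# model = lora_finetune("<ai4bharat-wav2vec-checkpoint>", train_loader)
```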
This was good to see. We've done all kinds of ablations here, and we find that the pseudo-labeling significantly boosts performance even on unseen paragraphs. I've already mentioned the compressed model and the fact that the results are actually quite satisfactory even on the more generic IndicSUPERB dataset. This is a comparison of our model with commercial APIs, on both the overlapping set and the non-overlapping set, and you can see that doing this type of fine-tuning and doing the additional data collection was totally worth it: our word error rates on the tasks we want to deploy this for are much better than any of these commercial APIs.

These are the impact numbers. We launched in Gujarat in August 2023 as part of the G-Shala app. We've done 2.8 million assessments to date with this, as of February. It's being used in 26,000 schools across 40 districts, over a lakh teachers have been involved, and 2.2 million students have been impacted. And what we find is that, on average, only 51 percent of words are spoken correctly, so we do have a long way to go in using this tool to improve reading performance. This is a typical teacher usage profile of the tool, and it maps very well to school start times and school hours; we have timestamps for all the usage.

What we want to do next (I'll be done in a minute; this is my last slide): where we want to go with this is that we're collecting all this data from millions of students. We have their audio and we have the outputs of the ORF model, so we know which words are hard words, at a location granularity, across the state. We can create a list of hard words at a school level, at a district level, and so on. We can do semantic matching to find words in the dictionary that are similar to those hard words, and using that expanded set of hard words we can use LLMs to create new content at various difficulty levels, then have students read that content and lead them up the pedagogical ladder. So this is where we are going next, and I'll just end there; I think I'm way out of time. I was going to say something about our other projects, but I'll just leave the slide there. Happy to take questions.

Thanks for the talk. I was wondering: the models that you have trained, how well would you be able to use them to evaluate reading assessments of children in, say, schools in Maharashtra? I think the choice of using the teachers' recordings insulates against accent differences, which may not translate as well if you move to a different setting. If you could just comment on that.

Yes, so it won't work out of the box. We need to collect additional data for Maharashtra; in fact, we're doing that already. And on the point you made about being dialect-agnostic: the teacher recordings from across the state make sure that this model is dialect-agnostic. In fact, if you look at some of the other tools, like Google Bolo, speaking in a different dialect gets classified as erroneous, whereas we wanted to make sure that speaking in a different dialect is counted as correct.
It's not treated as wrong. But for Maharashtra, yes, we definitely need to do additional data collection and additional fine-tuning.

I had a question on that: we are calculating results from their data, like what they speak and how they are speaking. Can we also calculate some forms of errors in their speaking, like lisps and other things?

In principle, yes. One of the things we've been thinking about is whether you can diagnose learning disabilities using the data we have. We have timestamps at a character level, so if you analyze the pause patterns (are there unnatural pauses between words, in places where people don't normally pause?), can you diagnose something about the student? Of course, you need additional ground truth to do that. Similarly with lisps and so on: we don't currently classify the presence of lisps in the audio, but we could add that as an additional task, which again would require more annotation on the data that we have. Thank you.

All right, can we have another round of applause for Dr. Alpan? For this talk, I'd like to invite Titi ma'am to give him a token of appreciation. Well, I'm sure we have all gathered something about how difficult it is to apply ML in a real-world setup, especially when you don't have digitization readily available the way we have on our campus or in our offices.

Next we have Professor Mythili Vutukuru from our department. She's an associate professor in our department, and her research spans computer networking, operating systems, and networked systems. She earned her BTech from IIT Madras and did her MS and PhD in computer science at MIT, in 2006 and 2010 respectively. Today she'll be diving into the Linux network stack: she'll be talking about how network packets are received and processed within the operating system before they are delivered to your application, and she'll be giving a lot of examples. I'm excited for this; I want to understand how the latest network stacks work.

Hello, good morning, everyone. In this talk, I'm going to look a little bit under the hood at what happens inside the Linux kernel when we send and receive packets. I'd like to thank my students, Debojit and Triti, who are here, for some of these slides.

So how does this talk differ from the other things that we generally learn about networking? Normally, those of you who have taken a networking course will have seen what are called socket programs. For example, you know that most networking applications are built as client and server applications, where the clients and servers have sockets that are listening at some IP addresses and port numbers; you connect these sockets to each other and you can send and receive messages. So if you've taken a networking course before, some of this is familiar to you and you would have heard these terminologies. A lot of applications today, web servers, clients, email, gaming, are built this way: there is some API like sockets, which is a C/C++ abstraction, or in other programming languages there are different abstractions, but all of them point to the same thing, that there is some mechanism into which you can send and receive messages, and a similar entity on the other side will receive these messages.
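As a minimal illustration of that socket abstraction, here is a sketch in Python, whose socket module mirrors the underlying C API (socket, bind, listen, accept on the server side; connect, send, recv on the client side). The host and port values are arbitrary examples.

```python
# Minimal client/server sketch of the socket abstraction described above.
import socket

def server(host="127.0.0.1", port=5000):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, port))
        s.listen()
        conn, addr = s.accept()              # block until a client connects
        with conn:
            data = conn.recv(1024)           # read up to 1024 bytes
            conn.sendall(b"got: " + data)    # echo a reply back

def client(host="127.0.0.1", port=5000):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((host, port))
        s.sendall(b"hello")
        print(s.recv(1024))
```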
So this is how networking applications typically communicate. Now, in a networking course you learn a lot about what happens once this data leaves the computer: layer-2 switching, IP routing, routing protocols, TCP congestion control — all of that covers what happens after the data the application hands over leaves the host. In this talk, what I'm going to try to cover is what happens inside the end host itself when you send and receive data.

This content sits in a slightly odd boundary between a networking course and an operating systems course, and so it tends to get left out of most curricula, whereas for modern system design a lot of what happens here is very important when you look at real systems being built out there and where the inefficiencies arise. So in this talk I'll try to demystify this part: when a networking application writes something to or reads something from a socket, what happens from that point until the packet leaves the machine — at which point there are plenty of networking textbooks that take over with switching, routing, and so on.

Okay, so let's start the deep dive, step by step. The first thing, as in any operating system: these networking devices — your Ethernet device, your Wi-Fi adapter — are devices that connect to the operating system on a desktop or laptop, and all such devices are managed by device drivers. A device driver is a piece of code that manages an external device on a computer. For a networking card — also called a NIC, or network interface card — there are likewise device drivers, and different devices have different drivers. The first step to sending and receiving data from the computer is the device driver talking to that hardware device and configuring it.

Every device driver talks to the NIC through a process known as MMIO, or memory-mapped I/O. What this means is that there are registers — pieces of state in the device — that your device driver accesses and configures, saying: this is how you have to send packets, this is how you have to receive packets, and so on. That configuration is the first step that happens.
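As a purely illustrative sketch of what that MMIO-style configuration might look like, here is a made-up fragment in Linux-kernel-style C. The register offsets and the `toy_nic_*` names are hypothetical — a real driver uses whatever register layout that particular NIC's datasheet defines — but the pattern of writing configuration values into device registers through mapped memory is the same.

```c
/* Hypothetical MMIO configuration sketch. The register offsets and names
 * are invented for illustration; writel() and the __iomem annotation are
 * the usual Linux kernel mechanisms for memory-mapped I/O. */
#include <linux/io.h>
#include <linux/types.h>

#define TOY_RX_RING_BASE_LO 0x0100  /* hypothetical: RX ring address, low 32 bits  */
#define TOY_RX_RING_BASE_HI 0x0104  /* hypothetical: RX ring address, high 32 bits */
#define TOY_RX_RING_LEN     0x0108  /* hypothetical: number of RX descriptors      */
#define TOY_RX_ENABLE       0x0110  /* hypothetical: control register              */

static void toy_nic_configure_rx(void __iomem *regs,
                                 dma_addr_t ring_dma, u32 nr_desc)
{
    /* Tell the NIC where in memory the RX descriptor ring lives, how many
     * entries it has, and then enable packet reception. */
    writel((u32)ring_dma,               regs + TOY_RX_RING_BASE_LO);
    writel((u32)((u64)ring_dma >> 32),  regs + TOY_RX_RING_BASE_HI);
    writel(nr_desc,                     regs + TOY_RX_RING_LEN);
    writel(1,                           regs + TOY_RX_ENABLE);
}
```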
From here on, we are only going to focus on the receive path: what happens when you receive a packet on your computer. The packet has traversed the internet, it has been routed and forwarded, TCP and all of that has happened, and the packet has reached your computer — what are the steps until it reaches your application? We'll focus on this RX path; the transmit path is somewhat similar, and simpler.

So first, the device driver has configured the network card; they've established communication saying: if you receive any packets, this is what you're supposed to do. The next thing that happens is that when the network card — your Ethernet card or Wi-Fi card — receives a network packet, it does something called DMA, or direct memory access. It goes to some area of your DRAM, your main memory, which is under the control of the operating system. How does it know what that location is? The device driver, in its initial configuration, gave it that information. So the NIC uses that information, goes to that part of OS memory, and directly copies the packet into it. This happens automatically, without the operating system or anybody else getting involved — that is DMA.

After that, the NIC raises what is called an interrupt. Any external device — your network card, the keyboard, the mouse — when some event happens, say you've typed a character on the keyboard, the device raises an interrupt. This basically tells the operating system: look, some event has happened, you have to handle it. So the NIC puts the packet into OS memory and raises an interrupt.

This is just basic interrupt handling, which you should know if you've taken a course in operating systems. Interrupts come from external devices, and when an interrupt arrives your CPU is usually running some user program that is scheduled on it. When the interrupt happens — this is also called a trap — the context of that user program, the state of all its registers, is saved, and the operating system comes into play, because the program that was running doesn't know how to deal with this external event. The OS runs the interrupt handler, takes care of the event, and then goes back to the user process. That is what happens for any interrupt.

Note that there is a lot going on here: you have to pause the running program correctly so that it can resume correctly, and you have to switch from user mode to kernel mode, a more privileged mode in which the operating system can perform privileged operations. All of this takes CPU cycles and is not cheap — handling interrupts is expensive. Why am I emphasizing this? We'll come to it later. Anyway, this is general interrupt handling, and the same thing happens for networking devices: whenever a network packet arrives, there is an interrupt.
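For a flavour of what this looks like from the driver's side, here is a tiny, hypothetical sketch of how a driver registers an interrupt handler with the kernel. `request_irq()` and `IRQ_HANDLED` are real Linux kernel APIs; the `toy_nic_*` names are made up, and the handler body is deliberately left almost empty — what actually goes into it is the "top half" work discussed next.

```c
/* Hypothetical sketch: registering a handler so the kernel calls us when
 * the NIC raises its interrupt line. */
#include <linux/interrupt.h>

static irqreturn_t toy_nic_irq(int irq, void *dev_id)
{
    /* Runs in interrupt context when the NIC interrupts: acknowledge the
     * device, kick off the rest of the packet processing (the "bottom
     * half" described below), and return as quickly as possible. */
    return IRQ_HANDLED;
}

static int toy_nic_setup_irq(int irq, void *dev)
{
    /* Ask the kernel to invoke toy_nic_irq() whenever this IRQ line fires. */
    return request_irq(irq, toy_nic_irq, 0, "toy_nic", dev);
}
```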
But there are a few small differences when it comes to network interrupt handling. If it's just a keyboard interrupt — you typed a character — there is only a little work to be done: you take that character, copy it into memory, and you're done. For networking packets there is a lot of work to do when you handle the interrupt: layer-2 processing, layer-3 processing, IP routing, checksums, and if you've studied TCP congestion control, there are these large algorithms that have to run — was there a loss, should I increase or decrease my congestion window — there's a lot of thinking to be done. Doing all of that during interrupt handling, while some other process sits interrupted, is very cumbersome.

Therefore, for network interrupt handling, operating systems split the handling into two parts, called the top half and the bottom half. The bare minimum needed to handle the interrupt — essentially acknowledging the device, telling it "I've heard you, I'll take care of this" — is done in the top half, and the top half then schedules the second part of the handler, the bottom half, which does the bulk of the TCP/IP processing, because that takes a lot of time and you don't want to do it while you have interrupted an existing process. Later, when the CPU is free and no other process is running, the bottom-half handler gets scheduled, and the bottom half is the one that fully processes the packets. This split is fairly unique to network interrupt handling, simply because it is such a heavyweight interrupt to handle.

The other thing that happens between the top half and the bottom half is that the top half disables further interrupts. Once you have been interrupted, you don't want the network card constantly ringing the doorbell saying "there's a packet, there's a packet, there's a packet." So the top half temporarily disables interrupts until the bottom half runs; the bottom half handles all the packets that have arrived and then re-enables interrupts. So during the top half: the NIC has done the DMA and interrupted whichever process is running; that process takes a brief pause, the top-half handler runs, and control goes back.

At this point I'd also like to introduce some terminology: the device driver and the NIC share what are called TX and RX rings. These rings are nothing but pointers to transmitted and received packets. The NIC cannot just dump the packet somewhere in memory and walk away — it has to tell the operating system where the packet is. So a pointer to the packet is stored in this ring, which is a shared piece of memory whose location both the device driver and the NIC know. Whenever the NIC places a packet, it updates a pointer in the RX ring, and whenever the OS runs later, it looks at the pointers in the RX ring and processes those packets. That is as far as the top-half interrupt handler takes us.
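Here is a highly simplified sketch of that shared-ring idea in C. These structures and names are made up for illustration — real NICs define their own descriptor formats, and Linux drivers typically drive this loop through the NAPI framework — but the shape of the bookkeeping is roughly this.

```c
/* Highly simplified, hypothetical sketch of an RX descriptor ring shared
 * between the NIC and the driver. Not the actual Linux data structures. */
#include <stdint.h>

#define RX_RING_ENTRIES 256

struct rx_desc {
    uint64_t buf_addr;   /* DMA address of a pre-allocated packet buffer   */
    uint16_t length;     /* filled in by the NIC: how many bytes arrived   */
    uint8_t  done;       /* filled in by the NIC: this descriptor is ready */
};

struct rx_ring {
    struct rx_desc desc[RX_RING_ENTRIES];  /* memory shared with the NIC   */
    uint32_t next_to_clean;                /* where the OS will look next  */
};

/* Bottom-half-style poll: walk the ring and process every packet the NIC
 * has marked as done since the last time we ran. */
static void rx_poll(struct rx_ring *ring)
{
    for (;;) {
        struct rx_desc *d = &ring->desc[ring->next_to_clean % RX_RING_ENTRIES];
        if (!d->done)
            break;                         /* nothing more to process      */
        /* ... hand d->buf_addr / d->length to the network stack here ...  */
        d->done = 0;                       /* give the descriptor back     */
        ring->next_to_clean++;
    }
}
```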
So the next step: at some point the process that was running — the one during which the top-half handling happened — gives up the CPU, and the CPU scheduler runs. When there is nothing else to do, when the CPU is idle, this bottom-half interrupt handler gets scheduled. This is the process called ksoftirqd in Linux; it is basically an OS process that handles the bulk of the network processing. So the bottom half runs asynchronously, later — not when the packet is received, but when the CPU has free time.

When the bottom half runs, it takes the received packet and parses it: what is the layer-2 header, the layer-3 header, the TCP/IP headers, what information is in there? It creates another data structure called the socket buffer, or sk_buff — that is the name of the data structure in Linux. The sk_buff basically holds all the packet headers along with a pointer to the packet. All of this happens in the bottom-half interrupt processing. As part of the bottom half you also do the TCP/IP processing: check the checksums of the headers, do TCP congestion control, check the sequence number — was it received correctly or not, should I send an acknowledgement, and which acknowledgement — all of that happens in the bottom half. By this time, note that you have the packet in memory and the sk_buff pointing to it, and the sk_buff is what gets passed along to all the TCP/IP functions. By now the packet has been plucked off the RX ring and the device driver is no longer responsible for it; what traverses the network stack at this point is the sk_buff, which carries pointers to the headers along with a pointer to the packet.

Finally, after all the network processing happens, the packet has to reach a socket. Remember, at the beginning we said sockets are the most common API used to send and receive packets, so an application has opened a socket and writes into it or reads from it. When this packet has been received and all the TCP/IP processing is done, you identify which socket it belongs to — you look at the port number, the IP address, and so on — and you attach this sk_buff to that socket. The socket has a place where it stores all its received data, also called the socket buffer, and the sk_buff goes onto it. So far the packet is still in OS memory, with the RX ring and then the sk_buff holding pointers to it — there is one copy of the packet in OS memory.

Now, when the application reads from this socket, the network data — the payload, the message that has been received on that socket — is copied into user memory. The user has some array, some buffer, passed as an argument to the read system call. The OS and user space do not share memory by default, for security and isolation, so the data has to be copied from kernel memory into that user-space variable — the packet payload is copied from kernel memory into user memory.
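Just to make that kernel-side bookkeeping concrete, here is a very rough, hypothetical sketch of the kind of information an sk_buff-like descriptor carries. The real `struct sk_buff` in the Linux kernel has many more fields and different names; this is only meant to show the idea of one small structure holding header pointers plus a pointer to the packet data.

```c
/* Rough, hypothetical sketch of an sk_buff-like descriptor.
 * Field names are illustrative, not the kernel's. */
#include <stddef.h>
#include <stdint.h>

struct toy_skb {
    uint8_t *data;        /* pointer to the packet bytes in kernel memory  */
    size_t   len;         /* total length of the packet                    */

    /* Pointers into `data`, filled in as each layer parses its header.    */
    uint8_t *eth_header;  /* layer 2 (Ethernet) header                     */
    uint8_t *ip_header;   /* layer 3 (IP) header                           */
    uint8_t *tcp_header;  /* layer 4 (TCP/UDP) header                      */
    uint8_t *payload;     /* what will eventually be copied to user space  */

    struct toy_skb *next; /* so it can sit on a socket's receive queue     */
};
```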
So by the time you read from a socket, all of this has happened: the packet has been fully processed by TCP and its payload copied into user memory, and now the networking application — if it's an HTTP GET request, say, the web server — can read the request and send a response back, all of that happening in the user space of the application. I hope this is clear so far.

As you can see, there are a lot of steps, and this is a high-overhead process; there are many inefficiencies in the Linux network stack. First, you have frequent interrupts: whenever a network packet arrives, whichever process is running has to stop, save its context, jump into kernel mode, and come back, so interrupt handling imposes a certain overhead on the system. Second, you have frequent context switching: there are the regular user processes, and then there is this kernel process — the softirq handling, the bottom-half interrupt handler — doing all the network processing, so you constantly switch between your actual application and this kernel work, which adds context-switching overhead across processes on top of the switching between user and kernel mode. And finally, there is packet-copy overhead: the packet is first DMA'd into OS memory, the OS does all its processing there, and then the packet is copied into user space, so you have multiple copies of the same data lying around.

Now, when the Linux network stack was designed, network cards had much lower throughput, and none of this seemed like such a big deal. But today you have networking hardware at hundreds of Gbps, and the same operating system, with the same architecture designed decades ago at much lower network speeds, running the same network stack on top of higher and higher link rates. This is from a recent paper: if you take a 100 Gbps network card, run Linux on it, and run a networking application that simply reads packets, you can nowhere near match the line rate. Even with multiple threads, with parallelization, whatever you do, the throughput you get falls short — of course the exact numbers vary, and there is a lot of work on pushing them higher, but the bottom line is that you cannot keep doing what you were doing before and expect to match the speed of the networking hardware. 400 Gbps is also found in data centers today, and you cannot run the same operating system on top of these ever-higher networking speeds and hope to keep up. So if you have a web server, or any other application, that actually needs to process 100 Gbps of data, it cannot run on Linux the way Linux is today.
Linux simply has not been designed for those kinds of speeds — there are a lot of overheads, as we've just discussed. So today people are coming up with newer ideas, and while I cannot go into a lot of depth, I'd like to give a high-level flavour of what is being explored.

On the left here you have the generic network stack, where for every packet there is an interrupt, you process the packet, you copy it to user space, and so on. One alternative the state of the art is exploring is to offload some of this packet processing into the kernel itself — this is called in-kernel offload. Of course, you can always hack the Linux kernel, change the source code, and write kernel modules to process packets inside the kernel, but that is not something everybody can do. So people have come up with techniques that make it easy to offload some of the packet processing into the kernel, so that you handle the packet — or at least most packets — right there and avoid the copying into user space. The application communicates something to the kernel in an efficient manner, and the kernel handles the packet at that level itself, avoiding the packet copies and the crossing of the kernel/user-space boundary.

The other technique is what is called kernel bypass, where you say: forget it, this whole architecture makes my head hurt, I'm going to leave all of it aside, take packets from the NIC directly into my user-space application, and bypass the operating system entirely. So the second set of optimization techniques are kernel-bypass methods.

We'll briefly look at what these two are. There are pros and cons: kernel bypass is likely to give you better performance, because you get rid of all the machinery in the middle and go directly to user space, but at the same time you lose whatever control the operating system gave you — existing scripts and tools may not work there, whereas they work better with the in-kernel approach. So there is a trade-off.

First, let's look at the in-kernel offload techniques. The most common and popular one is eBPF, or extended Berkeley Packet Filter — let's not worry about how that name came about. What it means is that in the Linux kernel today you can write small programs, called eBPF programs, in which you express packet-processing logic — look at the packet, if this is the header do this or that — in the eBPF framework. These programs get compiled, verified, and then loaded into the kernel. This is not like hacking the Linux kernel to do packet processing; it is much safer. You are guaranteed that the eBPF code will not crash the kernel, because there is a verifier step that ensures the code you're writing is safe, and it can be dynamically loaded into the kernel. These eBPF programs then run at what are called hook points.
Hook points are specific places inside the kernel, along the packet-processing path, where the OS gives you a place to attach this eBPF code: in the device driver, during the TCP/IP processing, when packets are being assigned to sockets — there are multiple hook points at which you can attach your own eBPF code to customize the packet processing. So where earlier everything had to go to user space, now a lot of the work can be handled within the kernel itself, safely, through custom packet processing in eBPF. And there are things like eBPF maps: an eBPF map is essentially a shared key-value store, a hash-table-like data structure, through which user space can say "for these kinds of packets, do this," and the eBPF program can take instructions from user space or give information back — they exchange information, and you can do custom packet processing inside the kernel.

Let me mention a couple of the hook points. One is the XDP hook, which sits right in the device driver: as soon as the driver picks up the packet, your XDP program can run, so a lot of simple packet processing can be done there. Then there is the TC hook, which runs later, during the TCP/IP processing, once the headers have been parsed — there you have more information available for processing the packet, whereas at the XDP hook you have less information but it is faster. So there are different trade-offs, and multiple hook points at which you can run these programs.

eBPF has many applications today. Firewalls, for example: if you're getting a lot of unwanted traffic, instead of that traffic going all the way to your user-space program through the twenty-odd steps we've seen, you can look at the packet and drop it at the device driver itself. Observability — collecting statistics about applications — is another; you can write such code in eBPF. In fact, people are also offloading parts of the application itself: common-case processing that can be done quickly can be pushed into the kernel as eBPF programs. So there is a lot of interest in eBPF today; it is being developed very actively as part of the Linux kernel, and many interesting applications are being built on it.
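As a small illustration of the "firewall at the device driver" idea, here is a sketch of an XDP program in restricted C. The port number 9999 is an arbitrary example, and for simplicity the program assumes IPv4 packets without IP options; anything it does not recognize is passed up the normal stack.

```c
/* Sketch of an XDP program: drop UDP packets destined to port 9999 at the
 * device driver, and pass everything else to the regular kernel stack. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int drop_udp_9999(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;                       /* too short: let it through */
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;                       /* not IPv4                  */

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_UDP)
        return XDP_PASS;

    /* Assume no IP options, so the UDP header follows immediately. */
    struct udphdr *udp = (void *)(ip + 1);
    if ((void *)(udp + 1) > data_end)
        return XDP_PASS;

    if (udp->dest == bpf_htons(9999))
        return XDP_DROP;                       /* dropped in the driver     */

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```

Such a program is typically compiled with clang targeting BPF and attached to an interface with something like `ip link set dev eth0 xdp obj drop.o sec xdp`; the verifier checks the bounds checks above before the program is allowed into the kernel.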
Maybe I'll take a couple more minutes and quickly go through the kernel-bypass method as well. This is slightly different from eBPF: with eBPF we were pushing custom logic into the OS; here we are saying, let's just bypass the operating system. The device driver can be made to take packets and hand them directly to the user-space application, bypassing all the TCP/IP processing that happens inside the kernel.

For this there are special types of sockets. Normally you have TCP sockets and UDP sockets; there is also a type called AF_XDP sockets. Whatever packets match a certain pattern at the XDP hook — the device-driver hook — an eBPF program can pass straight on to your user-space program, and if you have device-driver support, you can even tell the NIC to DMA the packet directly into user-space memory. Now there are no more packet copies: you don't copy the packet into the OS and then copy it again to the user-space application. You are completely bypassing the kernel — you're telling the device to DMA the packet into the user-space application, and whenever a packet arrives it goes straight to that application; the OS is taken out of the loop. That is AF_XDP, a special kind of socket on which you can build applications.

All of these pieces — eBPF and XDP — work together: when you get a packet at the XDP hook, you can drop it, or forward it through the TCP/IP stack to regular applications, or hand it to an AF_XDP application that has bypassed the kernel stack entirely. XDP and eBPF are also used to connect up containers — they work along with the virtual devices used for container networking — so together they form a very important part of the cloud networking stack, and they let you run networking applications efficiently, at high performance, in the cloud.

Finally, DPDK is one more option; it is somewhat more widely used than AF_XDP, which is newer. DPDK, the Data Plane Development Kit, was initially pioneered by Intel for their networking cards, but support is now widely available for all high-speed NICs. What DPDK does is similar to AF_XDP in that you completely bypass the kernel: you put a kind of dummy driver in the kernel that doesn't do much, and the real device driver is a user-space driver that talks directly to the NIC and gets it to DMA packets into user-space memory. In user space you then have a lot more control: for example, DPDK uses huge pages, which let you avoid TLB misses and access memory more efficiently, and it pre-allocates all the packet buffers so that you don't have too many pointer traversals when processing packets. There are many optimizations in DPDK, but the basic thing it provides is a poll-mode driver: a user-space device driver that disables kernel processing and periodically goes to the NIC, grabs a batch of packets directly into user space, and processes them. This basically eliminates the inefficiencies of the Linux kernel stack.

With all of these techniques, networking applications today can quite easily get to 100 Gbps on a small number of CPUs — with AF_XDP using six threads, or DPDK using four threads, with a small amount of CPU usage you can ingest and process hundreds of Gbps of data. Without them, this is very challenging on the bare-bones Linux network stack, but with these techniques it is actually possible to build high-performance networking applications fairly easily.
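For a feel of what the poll-mode style looks like, here is a rough sketch of a DPDK-style receive loop. The EAL initialization, memory-pool creation, and port configuration (`rte_eal_init`, `rte_pktmbuf_pool_create`, `rte_eth_dev_configure`, `rte_eth_dev_start`, ...) are assumed to have been done already and are omitted; this is only meant to show the busy-polling pattern, not a complete application.

```c
/* Rough sketch of a DPDK-style poll-mode receive loop.
 * Assumes the port and its RX queue 0 have already been set up. */
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

static void rx_loop(uint16_t port_id)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Poll the NIC directly from user space: no interrupts, no kernel. */
        uint16_t nb = rte_eth_rx_burst(port_id, 0 /* queue */, bufs, BURST_SIZE);

        for (uint16_t i = 0; i < nb; i++) {
            /* ... application-level packet processing on bufs[i] here ...  */
            rte_pktmbuf_free(bufs[i]);   /* return the buffer to the pool   */
        }
    }
}
```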
So this is the summary of my talk. Just to recap, I've told you how the traditional Linux network stack works — from the packet coming in, to the kernel packet processing, the context switching, the copying into user space, and the inefficiencies involved — and the two kinds of optimizations used today. Given that you don't want to do this dance across the user-space/kernel-space boundary, there are two ways: either you push your packet processing into the kernel itself — there are ways to do that safely, using eBPF — or you bypass the kernel entirely, take your packet directly into user space, and deal with it there using frameworks like AF_XDP or DPDK. That is the state of the art today if you're building any high-performance networking application.

The buzzword in industry today is NFV, or network function virtualization: a lot of what used to be built as hardware — firewalls, load balancers, various gateways and routers — is being built as software today, and that software needs to perform well. It is by using frameworks such as these that we hope to see high-performance networking applications in software, which is what is driving the trend towards NFV. I think I'm out of time, so I'll stop here — happy to take questions, if I have time for a couple.

Yes, I think there are a couple of student talks now, and the break is at 11 o'clock — if there are any quick questions, yes?

Ma'am, as you said, the packet processing actually happens in kernel mode, and since kernel mode is non-preemptive in nature, if there are any big packet-processing programs, will there be starvation of other programs?

So that is why it happens in a separate process — the bottom half, the ksoftirqd process — which is preemptible. The non-preemptible part, the top-half interrupt handler, is kept very slim, and most of the processing happens in the bottom half. And yes, you could still have starvation: for example, at high speeds ksoftirqd may not get enough cycles and may not keep up with the packets — that can also happen.

Okay, thank you. Hello, thanks for the talk. I have a question: especially in data centers we see the trend that most of the network functions are moving into SmartNICs — Nvidia, AMD, and Intel are all coming out with SmartNICs, and almost all of the network functionality is being handled by them, freeing up the CPU to run user programs. How do you see this evolution going ahead — will the kernel evolve, or is this a permanent shift we will see in the future?

That's a good question. In this set of options there is actually another one I didn't talk about: besides moving processing into the kernel or doing kernel bypass, the third option is pushing it into smarter hardware — programmable hardware, SmartNICs with attached FPGAs, or programmable switches — all of which absorb some of the packet processing so that the kernel and user space are freed up. That is also a trend. I think these are complementary; I don't necessarily see one overriding the other, because that hardware is expensive, and you are in some sense locked into particular hardware.
For example, suppose you want to migrate your VM from one machine to another: if you are locked into a SmartNIC, you are limited in your VM migration capabilities. So there are pros and cons to doing things in hardware — programmable hardware is obviously more efficient than doing it in the kernel or via kernel bypass — and the way I see it, both approaches will go hand in hand; both are valid ways to make your applications perform better.

Okay, so with that we'll end the session. Thank you all — if there are any more questions, I'll be happy to take them during the tea break. I'd like to invite Professor Shivram to present a token of appreciation to Professor Mythili.

Now we move to the next segment of our event, which is the student talks. We'll be having three talks in this session — two from students and one from the GISV division of our college. I'd like to invite Anshuman Dhulia. Anshuman is a PhD student, and today he'll be talking about a synergistic program analyzer, a project he has been working on. Over to Anshuman.