Okay, so our next talk is an industry talk from Sebastiano Saccani. Right, you need to turn it on.

Can you hear me? Yeah. So, this is an industry talk, so it won't be as scientific. I am Sebastiano Saccani, a data scientist at a company called Aindo, and we have the SISSA logo because we are recognized as a SISSA startup. We work on applying machine learning to industry, with a particular focus on generative models. I used to be one of you, a researcher, and I realized with some disgust that the article on this slide was published 10 years ago, so 10 years ago I was doing my PhD. In my PhD I was essentially using the Metropolis-Hastings algorithm to sample a function of the form you see here, which you may well recognize and which you have also seen in the previous talks. Now, having moved to industry, I am still sampling probability distributions, sometimes of a very similar form. So I am going to try to give you a flavor of how we apply generative models in industry.

What is a generative model? You may have seen this before, but a generative model is simply a machine learning model that takes samples from an unknown distribution, tries to learn that distribution, and is then able to generate new samples from it. This is the one-dimensional case: you have a 1D distribution, you fit a distribution to it, say a Gaussian, and once you have the Gaussian you can resample, so you get new samples from the distribution you fitted. The characteristic of generative models in machine learning is that the probability function you are trying to learn lives in a very high-dimensional space. For example, in the space of images, a one-megapixel image like the one before has a million dimensions, so it is very high-dimensional. The same holds for the space of SMILES strings. So you are trying to learn a distribution from samples in a very high-dimensional space. Through training you obtain this almost magical box, the generative model, that at will is able to generate new samples from the distribution you learned, just as we have seen before, for example, with molecules.

Now, a little bit of a distinction. Typically we have two categories of models: conditional and unconditional. In unconditional models we try to learn the distribution plain and simple, while in conditional models we try to learn a conditional distribution; the name says it all. A good example of an unconditional model is plain image generation with a generative adversarial network, while what we saw before with SMILES strings I would call conditional generation: we try to predict the next token or the next word conditioned on the previous tokens and words. The other important distinction is implicit versus explicit models. In explicit models we know the probability p(y) explicitly, that is, we can calculate the probability of a given configuration.
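To ground the simplest case, here is a minimal sketch of the one-dimensional Gaussian example from a moment ago; it also happens to be an explicit model, since we can evaluate its density anywhere, and we can sample from it. Everything in it, the data and the names, is purely illustrative:

```python
# Minimal 1D example: fit a Gaussian to samples from an unknown
# distribution, evaluate its (explicit) density, and draw new samples.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)  # stand-in for the unknown distribution

# "Training": fit the Gaussian by maximum likelihood (sample mean and std).
mu, sigma = data.mean(), data.std()

# Explicit model: we can evaluate the probability density p(y) at any point.
def density(y):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Generation: draw new samples from the fitted distribution.
new_samples = rng.normal(loc=mu, scale=sigma, size=10)
print(density(2.0), new_samples[:3])
```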
In an autoregressive model, for example, we can calculate the probability of the next token, as we saw in the previous talk. But there are also implicit models, which are quite common, in which you cannot know p(y) explicitly but you are still able to sample from it.

Let me give you a couple of examples. An unconditional and implicit model is, for example, the variational autoencoder, which you may know already; for those who don't, it works as follows. We have an input through which we receive samples from the distribution we want to reproduce. The samples are encoded into a reduced-dimensionality space called the latent space; actually you encode each point into a distribution, but that is a bit of a technical detail. This encoding is typically done through a neural network. Then another neural network, the decoder, takes the points in this latent space and maps them back into the space of the original data. You train the system by minimizing a loss that essentially has two terms: a reconstruction term, which forces the system to reproduce the input data well, and a Kullback-Leibler divergence term, which forces the distribution in the latent space to be simple. Why is that? Because in this way we are able to map a complex distribution, the one on the left, into a simpler distribution that I know, the latent space, and then I have the machinery to map samples from that simple space back into the complex space I started from. At inference time, once training is done, you essentially discard the encoder part, sample from this known distribution, which typically is a normal distribution, apply your decoder, and generate new samples.

The other example I want to show you, which you have already seen in the previous talk so I will be very fast, is autoregressive language models. I will not go into the details, as they have already been discussed. People used to do it with recurrent neural networks and now do it with generative transformers, but the basics are the same: you take the previous tokens and try to predict the probability of the next token, and this allows you to generate pieces of text, large or small depending on your application, because you have essentially learned from a set of data.

So this was a quick overview of the types of generative models; of course we did not go into the details. The other thing I want to mention is the relation between self-supervision and generative models. There is a short post from Yann LeCun and collaborators calling self-supervision the dark matter of intelligence. Self-supervision is essentially a training procedure in which you try to predict hidden parts of the input from the observed parts: you slice your data in time or space, one part of the data is used essentially as a prompt to the model, and you try to predict the part you do not see. And that is exactly what we were doing before.
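Before going on, let me make those two examples concrete with a couple of minimal sketches. First the variational autoencoder objective; this is a bare-bones illustration in PyTorch, with made-up layer sizes, not the model we actually use:

```python
# Minimal VAE sketch: encoder maps data to a latent distribution, decoder
# maps latent points back; the loss is reconstruction + KL divergence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=8):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)  # outputs mean and log-variance
        self.dec = nn.Linear(z_dim, x_dim)      # latent point back to data space
        self.z_dim = z_dim

    def loss(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)  # each point becomes a distribution
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # sample a latent code
        x_hat = self.dec(z)
        # Reconstruction term: reproduce the input well.
        rec = F.mse_loss(x_hat, x, reduction="sum")
        # KL term: keep the latent distribution close to a standard normal.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return rec + kl

# Inference: discard the encoder, sample z ~ N(0, I), and decode.
model = VAE()
z = torch.randn(5, model.z_dim)
new_samples = model.dec(z)
```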
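And here is the autoregressive side, where the training signal is exactly the self-supervised one, the next token itself. The bigram table below is a toy stand-in for a trained recurrent network or transformer:

```python
# Autoregressive generation sketch: predict a distribution over the next
# token given the previous ones, sample it, append, repeat.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["<s>", "the", "cat", "sat", "</s>"]
# Toy "model": p(next token | previous token) as a row-stochastic matrix.
P = np.array([
    [0.0, 0.9, 0.1, 0.0, 0.0],  # after <s>
    [0.0, 0.0, 0.8, 0.1, 0.1],  # after "the"
    [0.0, 0.1, 0.0, 0.7, 0.2],  # after "cat"
    [0.0, 0.3, 0.0, 0.0, 0.7],  # after "sat"
    [0.0, 0.0, 0.0, 0.0, 1.0],  # after </s>
])

tokens = [0]  # start token
while tokens[-1] != 4 and len(tokens) < 20:
    probs = P[tokens[-1]]  # explicit p(next token | context)
    tokens.append(rng.choice(len(VOCAB), p=probs))
print(" ".join(VOCAB[t] for t in tokens))
```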
In this setting we have a part of the input that we see, the pieces of the string, and a part that is hidden, like the word here, that we want to predict. So essentially we can say that most generative models are trained through self-supervision. In fact, even unconditional generative models, like the one we had before, you could say are trained through self-supervision, only in that case you do not have any observed part of the input: the generation is unconditioned. I have made a bit of a mess here, but I hope you can follow me.

Self-supervision usually comes up when we deal with the annotation problem in machine learning. The problem is that annotated data are expensive, so we want to leverage data that are cheap instead of data that are expensive, such as labeled data. Self-supervision is therefore very often employed in the pre-training and fine-tuning scheme that we also saw in the previous talks, so my work is very easy today. We use lots of cheap data, in the previous talk it was, I believe, ChEMBL, here it could be Wikipedia, and, starting from a randomly initialized model, we obtain a pre-trained model. With this pre-trained model I can then do further fine-tuning using the supervised data, which is my expensive data. This can be done with conditional models, just as we have seen before, but it can also be done with unconditional models: I take my variational autoencoder again, say I have a lot of unlabeled examples, and I train an encoder that maps my complicated space into a reduced-dimensionality, simpler space. I can then use this encoder, or just the representation of the data that I obtain, for a further classification task. And because I now live in a simple space, it is much easier: I need less data to do it. So one of the big applications of these self-supervised generative models is actually pre-training the models you need, so that you do not need as many labeled data as you otherwise would. That is one of the things we do all the time with very different models, so I will not focus on any one of them.

The other thing I want to point out is that the industry is moving in a direction that tries to substitute unsupervised approaches for supervised problems, and I will give you one example. Say I have a translation task. This is generally considered a supervised problem, in that you typically need parallel corpora: bodies of text in which on the left you have the starting language, which might be Italian, and on the right you have the English translation, with a one-to-one correspondence between sentences, so that you can do supervised training. These kinds of data sets are of course hard to obtain and need to be created manually, so they are typically expensive. What has been happening recently is that purely unsupervised approaches can also be general learners, as they are called, so they can do multiple tasks.
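To make that unconditional pre-training recipe concrete before the next example, here is a minimal sketch; PCA stands in for a learned neural encoder, and all of the data are synthetic placeholders:

```python
# Pre-train an encoder on plentiful unlabeled data, then fit a small
# classifier on the cheap latent representation with few labeled examples.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
unlabeled = rng.normal(size=(10_000, 100))  # lots of cheap, unlabeled data
labeled_x = rng.normal(size=(50, 100))      # few expensive labeled examples
labeled_y = rng.integers(0, 2, size=50)

# "Pre-training": learn a map from the complicated 100-d space to a simple 5-d one.
encoder = PCA(n_components=5).fit(unlabeled)

# "Fine-tuning": a classifier in the simple space needs far less labeled data.
clf = LogisticRegression().fit(encoder.transform(labeled_x), labeled_y)
print(clf.predict(encoder.transform(labeled_x[:3])))
```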
The classic case of such a general learner is GPT-3. GPT-3 is a generative pre-trained model, one of these models that can generate text as we have seen before. But you can prompt it with something like "translate this into Italian:" followed by the text, and the model will try to complete the prompt, and it tries to complete it in the correct way, so you actually get the translation out of it. And it works well; it is not a chimera at this point, really. So a model that has been trained completely unsupervised is doing a task that is seemingly supervised. This can be done for other tasks too; question answering is a big one for us, because now you have models that are able to answer questions or extract things without being explicitly trained to do it. That is a big one for somebody like us who works in industry, because it relieves the burden of collecting data, or at least I can now use, say, ten times less data than I would need to train a question answering model from scratch.

The last thing I want to talk to you about is probably what we are most focused on: the use of generative models to create synthetic data. The bottom line is as follows. In general, when you work with data there is a problem of privacy, especially in the private sector; say an insurance company needs to analyze data. There is a risk associated with it because of privacy and the GDPR. You want to protect people's privacy, but you still need to be able to analyze the data, especially if you are dealing with an external analysis provider like us, where you are giving your data to somebody. But in fact, when you do statistical analysis, as the name says, you are not really interested in the record-level information. What you are actually interested in are the statistics of your data set, which are described by this probability function. So what we are trying to do, to market, say, is generative models as a way to capture this probability function so that you can publish it, either directly or through a synthetic data set that is a resampling of the function you have learned, the generative model essentially. In this way you have not given out any specific information about the people who were in the original data set; that is, you have given them plausible deniability. Even if I give you synthetic data, you will not be able to recognize the people who were in the original data, if the model is trained correctly. Or at least you are giving them plausible deniability: they can always deny that they were in the original data, again if the thing is done correctly.

So we use exactly the machinery of generative models that I have shown you, but we apply it to the space of typical corporate data sets, which are stored in tables. Now the question becomes how to train this variational autoencoder, or generative adversarial network, or diffusion model, whatever, on a data set that is not really standardized: it has different types of columns, datetimes, floats, strings, God knows what, and you have to transform them in a smart way so that you can feed them into your generative model and then sample out of it. We have our own machinery, a library that does that.
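Going back to the prompting trick for a moment, this is roughly what the pattern looks like; generate() here is a hypothetical placeholder for whatever large pre-trained language model you call, not any real client library:

```python
# Sketch of solving seemingly supervised tasks by prompting an unsupervised
# pre-trained language model. generate() is a hypothetical stand-in.
def generate(prompt: str) -> str:
    """Hypothetical call to a pre-trained autoregressive language model."""
    raise NotImplementedError("placeholder for a real model")

# Translation, with no supervised parallel corpus:
prompt = "Translate this into Italian: The invoice is due on Friday.\nItalian:"
# translation = generate(prompt)

# Question answering / information extraction, e.g. from an invoice PDF's text:
pdf_text = "...extracted invoice text..."
prompt = f"{pdf_text}\n\nQuestion: Who is this invoice from?\nAnswer:"
# sender = generate(prompt)
```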
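And for the column transformation I just mentioned, here is a minimal sketch of the idea; the table, the column names, and the particular encodings are illustrative, and our actual library handles many more cases (this also assumes a recent scikit-learn):

```python
# Turn a heterogeneous table (numbers, categories, dates) into a purely
# numeric matrix that a generative model can ingest.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [34, 51, 29],
    "city": ["Trieste", "Milano", "Roma"],
    "signup": pd.to_datetime(["2020-01-03", "2021-06-11", "2022-02-20"]),
})
df["signup"] = df["signup"].astype("int64")  # datetimes become numeric timestamps

enc = ColumnTransformer([
    ("num", StandardScaler(), ["age", "signup"]),           # continuous columns
    ("cat", OneHotEncoder(sparse_output=False), ["city"]),  # categorical columns
])
X = enc.fit_transform(df)  # numeric matrix: feed this to the generative model
print(X.shape)             # (3, 5): 2 scaled numbers + 3 one-hot cities
```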
One of the other things we are focusing on is extending this to relational data sets. Here the problem becomes really messy, for the reasons we have on the right. When we synthesize a single table with independent rows, everything is fine. When we have a relational data set, a lot of assumptions break: you have one-to-many relations between samples, and we somehow have to carry information across tables, because a user, in this example here, is represented not only by his features but also by what he does, say, in an e-commerce database, which items he has bought. What a single sample even is here is not particularly clear, rows might not be independent of each other, and so on and so forth. I will not go into the gritty details of what we do, some of it I would say is also a bit of a trade secret, but anyway, that is the direction we are going: using these kinds of models to essentially publish the distribution while protecting the privacy of the people involved. That brings me to the end of my 18 minutes, so I thank you for your attention, and if you have any questions or are curious about what we do, feel free to ask. Thank you.

Okay, thanks a lot for the insight into industry. Are there questions? Kevin has one. While I walk over to Kevin, maybe I can ask: do you have any clients from industries closer to engineering or materials science at the moment? Do you have any clients from, say, the manufacturing sector?

Well, actually we work with Chiesi Farmaceutici, so we do things very similar to what we saw in the talk before, and we work with retail and banking. These are the sectors we work with the most.

I was very fascinated by the parallelism and the transparency of these methods, but I guess you have answered my question of whether pharmaceutical companies also...

Yeah. A big one for us now is using these language models to extract information from text, because they have made it really simple to use a pre-trained model to extract information that would otherwise have needed a lot of training data. It is stupid stuff, really, but it is a burden for a business unit to do it: extracting information from invoice PDFs, for example, like who this invoice is from. If you had needed to do that five years ago, you would have needed tons of training data. Now you take a pre-trained model, feed it the text of the PDF, ask it who the invoice is from, and you need only a bit of fine-tuning data, say 100 samples, and you get good performance out of it. That is kind of amazing, I would say.

Okay, any other questions? Anything on Zoom? Okay. In that case, let's thank our speaker again.