Thank you very much. Welcome, everybody. Well, I understand that you are a bit tired after the whole day, but I'll try to be as light as I can. As she said, I'm Edward of Santa Carvalho. I work in strategy, and, well, over the years I've been designing lots of AI solutions, and today we want to talk about an alternative to the standard approach to complex problem solving. That's the idea, okay?

As I said, there are lots of complex problems that we want to solve with AI, and many times we go for assistants, or chatbots, or autonomous cars, and we always think of one single algorithm that solves everything, like a Rosetta stone or cornerstone of artificial intelligence. But today we are going to present an alternative to this.

A colony of termites can build more than a single human. Think about this: a person is far more intelligent than any termite or any insect, but what the insects can build is amazing. Why? Because they collaborate; they work together to build something that one of the most intelligent beings on the planet cannot build alone. How do they do that? By divide and conquer, and we can replicate this natural behavior by doing the same. We have a complex problem, but there is not a single individual solving the whole problem. No, they go for sub-problems that are easier, because the complex problem requires high computational power and is not always parallelizable; you don't have a parallel approach for every complex problem. Complex problems are also difficult to model and to prepare data for.
That's the reason you need incredibly good data scientists to model a complex problem. But if you work on the decomposition, you have an initial analysis that will be somewhat complex, but easy for any engineer, and the sub-problems that are the output of this analysis are simpler problems that can be modeled with small programs, each solving one specific sub-problem.

How do we do that? By going with agents. What's an agent? An agent is an entity that can do things in an environment where other processes and other agents exist. It's interesting because the word "artificial" appears nowhere in this definition. Why? Because an agent could be anything, artificial or natural; anything that is autonomous in the environment is an agent.

So we have an environment, we have requests, and what we design as the architecture of these simple agents is request-response, as in any program in the world; but they also have to be aware of the environment, with triggers, context, and a workflow. What does that mean? The environment is going to be sensed by sensors, the context is held in a status, and the workflow works with this status. How? The environment triggers the sensors, the sensors make changes to the status, and this status is read by the workflow, which reacts to it. This, like many other things we are going to see today, is inspired by the natural way we respond to any process in the environment: we sense that something in the environment has changed, levels change in our brain, and that triggers some specific response.

When you have the whole picture, you have domains of agents, some of one type, others of other types, all of them fed by queues of machine and human requests, and the responses come from those same agents.
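The sensor/status/workflow loop just described could be sketched like this. This is a minimal, purely illustrative sketch; the `Agent` class, its method names, and the temperature rule are my own inventions, not part of any framework mentioned in the talk:

```python
# Minimal sketch of the sensor -> status -> workflow loop.
# All names (Agent, on_sensor, workflow) are illustrative.

class Agent:
    def __init__(self):
        self.status = {}          # the agent's context: a mutable status map

    def on_sensor(self, key, value):
        """A sensor observed the environment and updates the status."""
        changed = self.status.get(key) != value
        self.status[key] = value
        if changed:               # a change in the status triggers the workflow
            return self.workflow()

    def workflow(self):
        """React to the current status; return a response for the request queue."""
        if self.status.get("temperature", 0) > 30:
            return "cool-down"
        return "idle"

agent = Agent()
print(agent.on_sensor("temperature", 35))  # a sensor change triggers the workflow
```

The point of the sketch is only the flow of control: the sensor writes the status, and the workflow never polls the environment directly, it only reacts to status changes.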
Which means that you can combine any solution that a sub-problem generates and mix it with a human request in a new queue, so you have the whole system working with the environment to solve that complex problem.

And when we say a domain of agents, we mean a bunch of agents of the same type with horizontal scalability: they are able to self-clone, to talk with the underlying platform, and to self-kill when no requests arrive. And we also have a kind of vertical scalability: every agent is able to know whether a request belongs to its domain or not, and if not, it simply reroutes it to another agent. "I don't like this problem; this is not my request, not my problem", and it passes it to another one.

This approach provides a truly cooperative model: you can have real cooperation between your agents to solve really large and complex problems, and you have everything you need for large-scale complex problem solving, because you can scale both vertically and horizontally the way you need.

And how do evolution strategies come in here? Evolution strategies are models that can solve particular small problems. What's the problem with evolution strategies? Historically, they don't scale well. But let's see a bit of evolution strategies before we judge, okay?
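The "not my request, not my problem" rerouting could be sketched as a chain of agents, each one either claiming a request or passing it on. The `DomainAgent` class and its fields are hypothetical names for illustration:

```python
# Sketch of domain-based routing: an agent handles requests in its own
# domain and reroutes everything else to a peer. Names are illustrative.

class DomainAgent:
    def __init__(self, domain, peer=None):
        self.domain = domain      # the kind of requests this agent accepts
        self.peer = peer          # next agent to try ("not my problem" -> reroute)

    def handle(self, request):
        if request["domain"] == self.domain:
            return f"{self.domain} solved {request['payload']}"
        if self.peer is not None:
            return self.peer.handle(request)   # pass it to another agent
        return None               # nobody claimed the request

trend = DomainAgent("trend")
mood = DomainAgent("mood", peer=trend)
print(mood.handle({"domain": "trend", "payload": "ACME"}))  # rerouted to trend
```

In a real deployment the reroute would go through the request queues rather than a direct call, but the claim-or-forward decision is the same.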
With evolution strategies, we always have an array of individuals called a population. Each of these individuals is a sample of the solution space, which means every individual is a solution for your problem, your small problem. And these individuals have an array of values which represent their genes, their chromosomes: what they are meant to do and how well they perform on that specific problem.

In evolution strategies we only have two operations. Crossover is the traditional genetic operation: you take two individuals, select parts of their chromosomes, exchange genes, and you get two new individuals with new performance, which could be better or worse than their parents'. But now you have a bunch of new solutions that can be evaluated to know whether you are doing well or badly at solving your simple problem. You also have mutation, which increases the entropy of the system by simply making changes to specific genes in a probabilistic or random way. So this is the traditional approach of evolution strategies: you have a bunch of individuals, you have a chromosome of real values, and you have crossover and mutation, iteration after iteration, to improve the solution to the problem.

So what we need is a better representation, a functional representation. Look, when you go for complex models, many people always go for neural networks, and neural networks are represented by a directed graph like this: you have lots of interconnected neurons, an input layer, hidden layers, and an output layer. For what? To represent a single function.
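Before moving on, the traditional loop just described (a population of real-valued chromosomes, crossover plus mutation, iterated) can be sketched in a few lines. The fitness function, population size, and rates below are arbitrary toy choices of mine:

```python
import random

# Toy genetic loop with the two operators described: crossover and mutation.
# Maximizes f(x) = -(x0^2 + x1^2); the optimal chromosome is [0, 0].

def fitness(ind):
    return -sum(g * g for g in ind)

def crossover(a, b):
    cut = random.randrange(1, len(a))        # one-point crossover
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(ind, rate=0.2, scale=0.5):
    return [g + random.gauss(0, scale) if random.random() < rate else g
            for g in ind]

random.seed(0)
pop = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(20)]
for _ in range(100):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                        # keep the best half (elitism)
    children = []
    while len(children) < 10:
        c1, c2 = crossover(random.choice(parents), random.choice(parents))
        children += [mutate(c1), mutate(c2)]
    pop = parents + children[:10]

best = max(pop, key=fitness)
print(fitness(best))   # best fitness found; 0 is the optimum
```

This is exactly the scheme the talk says scales badly: every generation touches the whole population, so distributing it just moves the cost into the network.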
You have a function that ideally could represent any problem and solve it. With evolution strategies, you can use an expression tree instead. An expression tree is a very good way to represent functions that are simple enough, and that's the key, because we are solving small, simple problems. We are not going for the whole cake; we are going for the best piece. We can represent any such function with this approach, and this is very important, because evaluating this kind of structure is far more efficient than the one above.

Now, which are the traditional implementations of this kind of algorithm? We have the traditional one that comes from the 90s, the genetic algorithm. It's based on crossover, as we've seen before, and mutation is optional, depending on whether the programmer wants it or not. Genetic algorithms are good for very specific problems, like finding the maximum of a small function with a very small number of parameters, and they are very bad at scaling, because when you need to cross over the whole population you have a real performance problem. You can distribute the population and try to cross between different nodes, but then you are transferring the complexity of the problem to the network, so you don't gain anything; you can't scale it well.

How can we solve this? Well, there are three modern alternatives. CMA-ES, which means Covariance Matrix Adaptation Evolution Strategy, is based on mutation, and only mutation.
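An expression tree of the kind described can be encoded very compactly, which is part of why it is so cheap to evaluate. The nested-tuple encoding below is one possible sketch of mine, not a standard from the talk:

```python
# An expression tree as nested tuples: (operator, left, right), with "x"
# as the input variable and numbers as constants. Hypothetical encoding.

OPS = {"+": lambda a, b: a + b,
       "*": lambda a, b: a * b,
       "max": max}

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

# Encodes max(x * x, x + 1) -- a small function an ES individual could represent.
tree = ("max", ("*", "x", "x"), ("+", "x", 1))
print(evaluate(tree, 3))   # 9
```

Evaluation is one recursive pass over the tree, touching each node exactly once, which is the efficiency point the talk makes against repeated passes over a neural network.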
This is very interesting because you discard crossover and use the covariance of the mutations to get a very good, gradient-descent-like traversal of the problem. When you run CMA-ES, what you get is a population that spreads out based on the covariance over the solution space, then concentrates where the maximum is, with the whole population iterating around that point of the solution space.

Then you have NES, which means Natural Evolution Strategies. It goes for a similar approach, but inspired by natural evolution: what you do is guide the mutation. And again, and this is very important, the main reason every modern approach discards crossover is that all of these solutions then scale really well, so you can put the mutation on a Spark cluster or on TensorFlow; you can have any large parallel backend, which improves performance a lot.

OpenAI ES is the only one that is really designed for scaling, and it's very recent: OpenAI described their evolution strategies last year, so this is a really new solution. It's based on CMA-ES, but it's designed for scaling, so you have something that can solve complex problems in a functional way, and simple problems in a computational way, so you can go and solve all of them.

Let's talk about performance, because as you can see I have been insisting over and over again on performance. This is the main point, and the reason this amazing technology was discarded years ago. What we have here is a chart of real executions, showing the iterations needed to get a good solution. If you look at the very start you will see the descent of the simple genetic algorithm, which is the blue line; then the OpenAI alternative, which goes reasonably well; then the Natural Evolution Strategy, which goes better; and then the best one, CMA-ES. As you can see, in very, very few iterations you have the best solution
reached. This is quite good, because very few algorithms can do that. But let's also look at time; this is the performance in time. As you can see, with CMA-ES, much like a regular neural network, the iterations are spaced out; they are long iterations, long epochs that keep improving, and then suddenly they take off. As you can see, the Natural Evolution Strategy is the first to get a very good solution, but far from the best one. For reference, the point where all the others are stuck is about 80 to 85 percent, and the point where CMA-ES ends up is about 99 percent. So they're quite good at finding a very good solution.

The problem they are solving here is a thousand-variable Rastrigin function. This is a very complex function that has literally a thousand variables, a thousand dimensions, and lots and lots of local optima, local maxima, and that's the reason the other solutions get stuck, while even the traditional genetic algorithm is still slowly going "no, look, I found a better one". The reason CMA-ES is far away from the best solution at the start is that, as we said before, the population spreads out based on the covariance, so it picks up a lot of very bad options, and it does this so it can truly explore the whole solution space.
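The mutation-only, embarrassingly parallel update that makes these modern strategies scale (the inner evaluation loop is what gets farmed out to a cluster) can be sketched in the style of the OpenAI ES update rule. Everything below is a toy, pure-stdlib sketch under my own assumptions (objective, step sizes, sample count); it is not the talk's benchmark code:

```python
import random

# Sketch of an OpenAI-ES-style update: perturb the parameters with Gaussian
# noise, evaluate each perturbation (the parallelizable part), then move the
# parameters toward the noise directions weighted by reward.

def reward(theta):
    # toy objective with its maximum at theta = [1.0, -2.0]
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)

random.seed(1)
theta = [0.0, 0.0]
sigma, alpha, n = 0.1, 0.02, 50        # noise scale, learning rate, samples

for _ in range(300):
    noises, rewards = [], []
    for _ in range(n):                 # each worker would evaluate one sample
        eps = [random.gauss(0, 1) for _ in theta]
        noises.append(eps)
        rewards.append(reward([t + sigma * e for t, e in zip(theta, eps)]))
    mean_r = sum(rewards) / n          # baseline to reduce variance
    for i in range(len(theta)):        # gradient estimate from noise * reward
        grad = sum((r - mean_r) * eps[i]
                   for r, eps in zip(rewards, noises)) / (n * sigma)
        theta[i] += alpha * grad

print(theta)   # converges toward [1.0, -2.0]
```

Notice there is no crossover anywhere: workers only need the current parameters and a noise seed, which is exactly why this family of methods distributes so well.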
They want to explore the whole solution space to know where the best is, and then they come back to it, so every individual ends up competing to be the best in the area around the real maximum. It's incredible to see.

But we also have execution performance, something many people simply don't talk about. Everybody is obsessed with training performance, because that's where all the money is, but execution performance is the key to knowing whether your model can be used in the real world or not. And how is that? With an ES, the typical execution time, the activation time of your trained model, is under a second, far under a second, which means it's really usable. But when you have, for example, a neural network on the same architecture, you need 20 seconds to activate it. And look: the expression tree is even bigger than the neural network, but it takes far, far less time to execute. This is because the expression tree is very efficient in execution, while the neural network needs to go through all its nodes several times to get the solution. Of course, the usual fix is adding more and more GPUs to reduce those 20 seconds to one second or half a second, but with this approach you don't need that: any small-resource architecture, a Raspberry Pi, a small AWS instance, can execute the model very fast, very, very fast. You don't need any backend to execute, to activate.

So how is this all put together? It's very interesting, because we saw at the start that we are going to solve all this with an agent architecture. So where do evolution
strategies come in here? Well, of course, it's our workflow. Our workflow is going to be trained with all the requests and all the context we have, to be able to produce very good responses, again for a small, simple problem; and then everything is combined to solve the whole problem. How does this work? You have the environment, and as you know, you have sensors that trigger changes in the status, and the CMA-ES reads this status, reads each request, to activate.

But you also get a very good thing here. Remember the execution times: they are very, very small. Training an ES, training a CMA-ES, is on the order of seconds. Not minutes, not hours: seconds, on a standard architecture; those times were measured on a standard i7 CPU. So what can we do here that is far more difficult with other approaches? Retraining. Continuous retraining, and you can have it absolutely automated, because you are fairly sure the retraining is not going to take long and is not going to take lots of resources. Even more important, it can be parallelized. It can be executed in a backend that can even be the same machine where the CMA-ES is executing. So you have a very small algorithm which needs very little RAM and very little CPU, that can be retrained very frequently, that doesn't even need a strong backend for retraining, but that is absolutely parallelizable.
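That continuous-retraining setup, a seconds-long retrain running beside the serving agent and atomically swapping in the new model, could be sketched like this. The `retrain` body is a stand-in for a real CMA-ES run, and all names are illustrative:

```python
import threading
import time

# Sketch of continuous retraining running beside the serving agent.
# retrain() stands in for a seconds-long CMA-ES run; names are illustrative.

model = {"version": 0}
lock = threading.Lock()

def retrain():
    """Pretend training: in the real system this is a short CMA-ES run."""
    time.sleep(0.01)                  # "seconds, not hours" (scaled down here)
    with lock:
        model["version"] += 1         # atomically publish the new model

def retraining_loop(rounds):
    for _ in range(rounds):           # in production this could run forever
        retrain()

def serve(request):
    with lock:                        # the agent always sees a whole model
        return f"answered {request} with model v{model['version']}"

t = threading.Thread(target=retraining_loop, args=(3,))
t.start()
t.join()
print(serve("ping"))   # answered ping with model v3
```

Because the retrain is cheap, this background loop can share a machine with the agent itself, which is the point the talk is making.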
So you can do it, and that's the fantastic thing about this: you can decide whether you want to go with Spark and a strong backend with lots of nodes for retraining, or with a small architecture like a Raspberry Pi and train only what you need. That's the real killer point of this technology: you can do all of this and decide exactly what you want.

Let's see a practical scenario. Of course, this is an oversimplified solution, but let's take a smart broker, a smart stock broker. A smart broker is the solution everyone wants to build, because if you build it you are going to get rich very quickly and be very, very happy for the rest of your life. So what's the first thing we need to do? We need to split it into sub-problems. We have the relevant facts, the news about the different stocks we are monitoring. We have the third-party recommendations, and these are very important, because if everybody is recommending buying some stock, it's likely to go up or down, and you need to know that. And of course we have the historical data, which shows you minimums, maximums, and trends.

What we do now is model those sub-problems into agents. You have an agent for mood that can extract from the relevant facts the mood about a stock, and with the same algorithm and the same instance you can scale by self-cloning, so you can have lots of agents looking at the relevant facts, reacting, and producing a value for every stock you want to monitor. You are also going to model an agent for influence, which means: what is everybody else saying about these stocks? This is very important, but it's also a simple problem; it's not very difficult to compute the influence. This is, many times, a sample toy problem that you find in many tutorials: you have lots of opinions and you have to categorize them.
Well, this is a small agent that simply categorizes a recommendation, to know whether a stock is in a good mood or a bad mood, whether people are recommending selling or buying. And the last one is, again, a simple agent, and this is very interesting, because finding maximums and minimums is the specialty of evolution strategies. So for computing the trend of the historical data you have the best algorithm there is, and the least resource-consuming.

So this is fantastic, because you now have your very complex problem, which is very difficult to model, because you have the relevant facts, which are unstructured data; the third-party recommendations, which are structured but come from many different sources; and the historical data, which is structured data. And you don't try to solve it in one shot anymore; you go for the small problems. As I said, this is oversimplified: in a real smart broker we would have many more problems, many more agents, but as a sample, this is it. This is the way to go: you have the smart broker, you just split the problems, you model the agents, and you train them. I keep coming back to retraining because another thing you get here is that it's very easy to retrain frequently when you have continuous data, like in this example.
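The trend agent in this scenario could be sketched with the simplest mutation-only evolution strategy there is, a (1+1)-ES fitting a line to the price history and reporting the trend's sign. The prices, hyperparameters, and function names below are all made up for illustration:

```python
import random

# Trend-agent sketch: fit price = a * day + b to historical data with a tiny
# mutation-only (1+1)-ES, then report the trend's sign. Illustrative names only.

def fit_trend(prices, iters=2000):
    days = range(len(prices))
    def loss(a, b):
        return sum((a * d + b - p) ** 2 for d, p in zip(days, prices))
    a, b = 0.0, 0.0
    for _ in range(iters):                    # mutate; keep the child if better
        na, nb = a + random.gauss(0, 0.1), b + random.gauss(0, 0.1)
        if loss(na, nb) < loss(a, b):
            a, b = na, nb
    return a                                  # the slope is the trend

random.seed(2)
history = [10.0, 10.5, 11.1, 11.4, 12.2, 12.8]   # made-up closing prices
slope = fit_trend(history)
print("up" if slope > 0 else "down")
```

A production trend agent would use CMA-ES rather than this bare (1+1) scheme, but the structure (propose a mutation, keep it if fitness improves) is the same family.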
Okay, so: evolution strategies are a very good solution for optimization problems, and they are very useful in multi-agent implementations. They can solve complex problems if you go through a previous analysis phase that splits every problem into simple problems, which can then be solved in a multi-agent environment. The training times are very small compared with all the other model solutions available today, especially considering all the power they offer you. And this allows, as I said and will say again and again, retraining: you can retrain your models in real time, and this is very, very good. Think about it: only a few kinds of models can do that, and here you have a model that can represent any function you want, for any volume of data, that can scale, and that can be retrained continuously. It's killer; it's fantastic.

So if you build a multi-agent system based on CMA-ES, or on another ES if you prefer, the important thing is that it will be scalable, and with the modern approaches you can massively parallelize across clusters: you can invoke TensorFlow, or you can invoke Spark clusters, and retrain again and again. And, importantly, in a multi-agent system you only have to retrain one of the agents, because all the others are clones, and you can spread the new model, the new values of the best solution, across all the clone instances of the agent. So you will have lots of solutions and everything you need.

I think I've been a bit too quick, so thanks for coming, and this is the time when, if you have any questions, and I hope you have lots of them, you can go ahead.
Oh, that's difficult to compare, because you need to compare the whole system, and multi-agent systems many times don't go for whole training. When you train an RNN, you are training it to solve the complex problem, the whole problem, and when you train these small CMA-ES agents you are training only a small part of the problem. So you will always have a better training time with the CMA-ES, but that doesn't mean a better global training time. Still, if you try to scale and set up similar circumstances, in my experience the CMA-ES approach is more flexible and also faster than the standard training of an RNN.

And there is another very important point here: when you are training a CMA-ES, you are not conditioned by the backend. When you are training a neural network, often you are going to decide to go with Keras and TensorFlow, always on a GPU or a cloud of GPUs, and you are stuck with the architecture of the backend. But here you aren't: you can train with MPI on your local computer, or go with TensorFlow because you have a good GPU and want to exploit the tensors, or go with Spark, or with any computing cluster you can think of, because you just have to model it and train it. And modeling expression trees is easy, because they are among the simplest trees you can model.

Well, I tried to solve the same multi-classification problem, a standard classification problem: you have some patterns and some categories, and you want to know which category a pattern belongs to. In my experience, what we got was similar. It's difficult to say "similar" because, as I said before, I trained the evolution strategy using a Spark backend and I trained the neural network using TensorFlow with a GPU, so the training times were similar, but the execution times, the activation times, were far, far apart. So, using TensorFlow over CPU for solving a similar
small problem, of course, the training time was similar, but the execution time, the activation time, of the neural network was around 20 seconds, versus the 160 milliseconds of the expression tree. Both models were achieving similar loss after training, but the activation time is the killer for me here. And think about this: I'm not saying you shouldn't use neural networks. What I'm saying is that you can use other things apart from neural networks. You have to think about your problem first and then go for the best solution, and this one is a very good solution, especially when execution time, the activation time, not the training time, is a really important factor for you. Thanks to you.

Hi, thank you for the talk. I have two questions. The first one is: have you used this approach to solve real-world problems, or is this just a kind of experiment?

Well, I personally used it when I was doing the comparison on NLP problems, on classification problems, when you are trying to generate text, and, well, as I said, the training was similar, which is not a fair comparison because we were using completely different backends for training, but the execution is the killer. So yes, I have tried it on real-world problems, and as I said, it's very good for some specific kinds of problems: for classification and optimization problems it's very, very good.

Okay, and the second one is: what kind of representations are you using, only expression trees?

Yes, I focus mainly on expression trees here, because I think they offer the best option for execution time.
Of course you can use them, and many people do use them, for training neural networks; then your representation is the neural network itself, and you can get an improvement on the training, a fairly good improvement. But there are two constraints you have to keep in mind if you are going to use evolution strategies for training, instead of backpropagation. Using evolution strategies, you have to accept that evolution strategies like to decide the shape of your network; they like it a lot. It works far better when you let evolution decide which is the better architecture inside the network. And this is a real consideration, because many people using neural networks say "this is my problem, I go for an LSTM" or "I go for a convolutional network", and they are stuck with that architecture; in that case your best option is standard TensorFlow training on GPUs, which is the usual choice for those problems. Because if you really want to use evolution strategies as the guide for training a neural network, you have to let the evolution strategy decide the internal architecture of the network. Okay? Thank you. Thank you.

There is another question back there. Thank you for the talk. You commented a bit about CMA-ES, right? Yes. And as far as I know, this method works for optimizing real-valued data, but it seems like you're using it to create these expression trees. Yes. So how does that work out?
Well, as you saw, the expression tree represents real-valued data, as you say, and if you want to go for classification problems rather than optimization problems, you do have to do a trick. Several tricks can be done, and you have to decide between several options to deal with the classification. The most usual is to go not for only one expression tree, but for an array of expression trees, one representing each category of the classification. So what you are doing is representing what is afterwards going to be your one-hot encoding of the solution, and you train every expression tree to decide the best probability for its category. So you train all those expression trees, which are very fast to train, very fast to execute, and which can be parallelized, even for execution. The result is a value between 0 and 1, which is the probability of being in that category, and with that you have a multi-category classifier. Okay, we can talk later.

Okay. Thank you, everyone, for coming.
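As an editor's note on that last answer, the one-tree-per-category trick can be sketched as follows: each expression tree scores one class, and the prediction is the class whose tree scores highest (the argmax of the would-be one-hot vector). The trees below are hand-written stand-ins for trained ones, and all names are illustrative:

```python
# Sketch of the one-tree-per-category classifier from the final Q&A answer.
# Each expression tree scores one class; prediction = highest-scoring class.

def evaluate(tree, x):
    """Evaluate a nested-tuple expression tree on a single feature x."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    a, b = evaluate(left, x), evaluate(right, x)
    return a + b if op == "+" else a * b

# One tree per category, each meant to output a score (higher = more likely).
trees = {
    "low":  ("+", 1, ("*", -1, "x")),   # scores 1 - x
    "high": ("+", 0, ("*", 1, "x")),    # scores x
}

def classify(x):
    return max(trees, key=lambda label: evaluate(trees[label], x))

print(classify(0.9))   # high
print(classify(0.1))   # low
```

In the real scheme each tree would be trained by the ES to emit a probability in [0, 1] for its own category; only the score-and-argmax structure is shown here.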