Hello and welcome. This is Act of Gesturing 52.1 on August 10th, 2023. We have Ahmed El-Said, Alexander Ororbia, and Travis Desell. Ahmed is going to give a presentation, and afterwards we'll have some reflections and discussion. So thank you all for joining, and Ahmed, over to you for the presentation.

Okay, thanks so much for having me. Today I'm going to discuss the methods that we came up with to solve neural architecture search and neuroevolution problems; the methods are ant colony optimization based solutions. So, as the title mentions: ant colony optimization for neural architecture search and neuroevolution. This work is a collaboration between me, my previous advisor, Dr. Travis Desell, and my co-advisor, Dr. Alexander Ororbia. I'm currently an assistant professor at the University of North Carolina Wilmington.

Moving on to the next slide. As an overview of the things I'm going to discuss today: I'll give a bird's-eye view of what neuroevolution is and why we need it, and from there I'll discuss our method based on ant colony optimization, which is called Ant-based Neural Topology Search, or ANTS for short. After that I will discuss how we advanced this idea by introducing a continuous version, CANTS for short, which trades the discrete search space for a continuous one. I'm also going to describe what we did by adding a dimension to the three-dimensional, continuous CANTS to make the evolution backpropagation-free, and later I'll discuss three of the points we are considering for future work.
In machine learning, as neural network structures got deeper and deeper, people tried to optimize the structures for better performance. People in different problem domains used to borrow the best performing neural architectures, modify them a little to work for their problem, tweak some features of the structure, compare these different tweaks, and then say they had found the best performing structure. But actually finding the absolutely optimal structure is an NP-hard problem, because to reach that solution they would have to try all the combinations of the different structural elements, and because these neural networks are massive structures, we don't have the computational power or enough time to actually train and test all the structures constructed from those combinations.

The alternative is to apply a metaheuristic method that converges to a near-optimal solution, which is much better than not optimizing the structure at all, or than relying on something like random search. A random search can get us a better performing neural network, but it's not going to converge to a near-optimal solution. A metaheuristic method gives us an automated approach that also converges to a near-optimal solution of this structure problem.

The way people approached NAS was by trying to mimic how optimization is done in nature. The first way people thought of was to mimic how living organisms evolve in nature, using genetic algorithms, like Darwinian genetic evolution. It started with NEAT, short for NeuroEvolution of Augmenting Topologies, which relied on genetic algorithms, a concept also applied in most NAS and neuroevolution methods, and
EXAMM is one of them; Travis came up with this method and it became one of the state-of-the-art methods in NAS. In such methods we again try to mimic genetic evolution by introducing new structural elements, removing structural elements, or altering the structure through the evolutionary generations. We can apply mutations by splitting edges, adding edges, or disabling edges. We disable edges, or other structural elements, rather than deleting them so that we don't lose the component entirely; like a dormant gene in a genome, it can reappear in later generations instead of being removed for good. So a mutation can disable an edge, and we can re-enable it in a later generation if we find we want to try that option. We can also add or disable recurrent edges. We can split nodes: we take a node from a previous generation, split it into two nodes, and divide the edges connected to the original node between the two new nodes. We can also add nodes to the structure. All of these are part of the mutation side of the genetic process. We can also disable a node if we want to get rid of one of the nodes, and disabling a node, or multiple nodes, also disables the edges connected to those nodes.

Besides mutations, the other side of a genetic process is crossover, where two of the best performing members of the population mate to produce an offspring. The offspring has a collection of characteristics coming from its parents: it could take some characteristics from one parent and some from the other, hopefully giving us a better performing neural network, or a better performing generation.

The main problem with genetic-based algorithms is that they start with a minimal structure, as we can see here, meaning just the inputs and outputs.
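The mutation and crossover operators described above can be sketched in a few lines. This is a minimal toy genome with hypothetical names (`Genome`, `split_edge`, `disable_node`), a sketch of the general NEAT-style idea, not the actual NEAT or EXAMM implementation:

```python
import random

class Genome:
    """Toy genome: nodes are ints, edges map (src, dst) -> enabled flag."""
    def __init__(self, nodes, edges):
        self.nodes = set(nodes)
        self.edges = dict.fromkeys(edges, True)   # all edges start enabled
        self._next_node = max(self.nodes) + 1

    def split_edge(self, edge):
        """Disable an edge and replace it with a node and two new edges.
        The old edge stays in the genome as a 'dormant gene'."""
        self.edges[edge] = False                  # dormant, may be re-enabled later
        new = self._next_node
        self._next_node += 1
        self.nodes.add(new)
        src, dst = edge
        self.edges[(src, new)] = True
        self.edges[(new, dst)] = True
        return new

    def disable_node(self, node):
        """Disabling a node also disables every edge touching it."""
        for e in self.edges:
            if node in e:
                self.edges[e] = False

def crossover(a, b):
    """Offspring takes a mix of edge genes from its two parents."""
    edges = {}
    for e in set(a.edges) | set(b.edges):
        parent = a if e in a.edges and (e not in b.edges or random.random() < 0.5) else b
        edges[e] = parent.edges[e]
    child = Genome(a.nodes | b.nodes, edges)
    child.edges = edges                           # keep the inherited enabled flags
    return child
```

A disabled edge remains a key in `edges`, so a later mutation can flip it back to `True`, which is the "dormant gene" behavior mentioned in the talk.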
Starting the optimization from this minimal search space can trap the method in a local minimum during the optimization process. So we were thinking about how to get around this obstacle by having a bigger, larger space to start with, from which we can sample solutions. We looked around and considered ant colony optimization, and I'll say why, but let me introduce the method first.

The method was first introduced as a graph optimization method in the mid 90s by Marco Dorigo. Dorigo applied it to the traveling salesman problem, which is mainly about a traveling salesman who wants to visit a number of cities in a country using the shortest path. If the number of cities grows, then the number of permutations of these cities that we have to consider to find the optimal solution grows so fast that we end up with an NP-hard problem, because we won't have enough time or computational power to do that exhaustive search. From his observations of ants in nature, Dorigo found that he could take how ants forage for food and apply that observation in an algorithm to find the optimal path that visits all the cities in the shortest span.

This slide and the coming slides give a picture of how ants forage in nature, and then how Dorigo took that concept and applied it to the traveling salesman problem. Observers found that ants go out from their nest to find food and try different directions, and when they eventually find food, they take some of it and go back to their nest, and on their way back they deposit a substance called a pheromone, so they
deposit that substance to communicate the path to the food source to the other ants. And other ants do exploit this pheromone: when they sense it, they follow the path the first ant took between the food source and the nest, hoping they will find food at the end of that path. When they actually do find food at the end of the path, they take some of it and do the same thing, depositing more pheromone on the same path, making it more appealing for other ants to take, so that more food is brought to the nest. This process shows us the exploitation behavior of ants. But from time to time, other ants resist following the pheromone trails and wander off, trying to explore other potential food sources for the nest. So ants are not only exploiters, they are also explorers, and these two concepts were used by Marco Dorigo to balance the search for a better, shorter path between the cities in the traveling salesman problem.

The third thing observed in how ants forage in nature is that pheromone also evaporates over time. Whenever a path to a food source is no longer appealing, or the food source is exhausted, no more ants take that path; or, when they take it and reach the exhausted food source, they won't take the same path back to the nest again but will instead wander to find new food sources. Because no ants travel that path anymore, and no pheromone is deposited on it, the pheromone eventually evaporates and disappears, making the path less and less appealing for other ants to take.

That's what Dorigo was looking at when he thought of the traveling salesman problem. He applied it to the traveling salesman problem by making one agent try these different paths and compare them through each iteration, so that agent will
take a path between the cities, compare the length of that path to its previous experience with other paths, and if it's shorter, deposit pheromone on the segments of that path. He was hoping, and he was right about what he was hoping, that eventually the shorter segments that make up the ultimately shortest path would accumulate more and more pheromone, making them more and more appealing for the agent to take through the iterations.

We thought about this concept and found it very appealing to apply to a NAS problem, because Dorigo's solution was applied to a graph optimization problem, and neural networks are, in essence, directed graphs. We also considered ant colony optimization because it's fault tolerant, decentralized, and scalable, and it's also easily traceable. And going back to being decentralized: that made this method a perfect candidate for a parallel, high-performance computing solution, which accelerates the optimization, and shortly I will discuss how we exploited this characteristic of ant colony optimization to accelerate the method we came up with.

The scheme of applying ant colony optimization to neural architecture search is illustrated in this flow chart. We start with a massive search space expressed as a superstructure, which embodies a neural network that is massively connected, meaning that each node in the superstructure is connected to the other nodes through edges, both forward and backward recurrent edges. We then let a number of agents swarm over the superstructure from an input node to an output node: each agent picks an input node and wanders from there along the connections and recurrent edges between the nodes and the hidden layers until it reaches the outputs.
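The forage-deposit-evaporate loop described above can be sketched for the TSP roughly as follows. This is a toy ant system under assumed parameters (evaporation rate `rho`, deposit of `1/length` per tour), not Dorigo's exact formulation:

```python
import random

def tour_length(tour, dist):
    """Total length of a closed tour over a distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def build_tour(n, pheromone):
    """Grow a tour city by city, choosing each next city with probability
    proportional to the pheromone on the connecting edge."""
    tour = [0]
    unvisited = set(range(1, n))
    while unvisited:
        cur = tour[-1]
        cities = list(unvisited)
        weights = [pheromone[cur][c] for c in cities]
        tour.append(random.choices(cities, weights=weights)[0])
        unvisited.remove(tour[-1])
    return tour

def ant_system_tsp(dist, iterations=200, rho=0.1):
    n = len(dist)
    pheromone = [[1.0] * n for _ in range(n)]
    best, best_len = None, float("inf")
    for _ in range(iterations):
        tour = build_tour(n, pheromone)
        length = tour_length(tour, dist)
        # evaporation: every edge loses a fraction of its pheromone
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= 1.0 - rho
        # deposit: shorter tours reinforce their segments more strongly
        for i in range(n):
            a, b = tour[i], tour[(i + 1) % n]
            pheromone[a][b] += 1.0 / length
            pheromone[b][a] += 1.0 / length
        if length < best_len:
            best, best_len = tour, length
    return best, best_len
```

Over the iterations, edges belonging to short tours keep getting reinforced while evaporation thins out everything else, which is the balance between exploitation and exploration the talk describes.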
It picks one of the output nodes, and then we take all the paths of the different agents, put them together to form a neural network structure, train and test that structure, and compare its performance against the population of best performing neural networks found so far. If it's better than the worst in the population, then we reward the paths the agents took over the superstructure with pheromone, making those paths appealing in later iterations of the evolution or optimization process. That's if the generated structure is better than the worst in the population; if it's actually worse than the worst, then we discard that structure, that neural network, and we don't reward any of the paths the agents took. Pheromone evaporation will also help us get rid of the pheromone that was deposited on edges that are not giving us better and better structures.

Again, because ant colony optimization is decentralized, we exploited this by having an asynchronous solution, an asynchronous evolution. We had a main process that took care of generating the new structures, updating the population of best performing structures, and updating the pheromone on the superstructure. The main process generates structures and sends them to worker processes; the worker processes train and test the neural networks on the data we have for the problem, and then send the results, the fitness of each neural network, back to the main process. Based on that fitness, the main process either compares it to the best performing population and, if it's better than the worst, rewards the paths taken on the superstructure by depositing more pheromone, or, if it's worse than the worst in the population, just discards it; and it keeps generating new structures and sending them to worker processes. Because the
training, which relies on backpropagation, is the most computationally expensive part of this process, having a number of worker processes that train and evaluate these neural structures in parallel speeds things up: different structures are trained and evaluated at the same time, in parallel, in an asynchronous evolution scheme.

This is actually an animation, but it's not working in this version of the slides because we're using a PDF. It's essentially a structure where you would see the edges, the connections between these nodes, taking on darker colors based on the pheromone values through the iterations. Each frame in the animation is an update of the pheromone values of the edges, based on the performance of the neural network that was generated by the agents when they swarmed from the start node, taking one of the input nodes, then going to the middle layer (there is one middle layer here), and from there to the output.

That was the concept of applying ant colony optimization to NAS. Now I'm going to talk about the actual method we came up with, a more generic, more powerful, more comprehensive, if I may say, neural topology search method. We opted to apply the method to recurrent neural networks, because they tend to be potentially larger than other neural network structures due to their recurrent connections. The method, the concept, applies to any neural network, but applying it to recurrent neural networks made it a more appealing challenge for measuring the performance of the method we thought of.

This slide and the coming slides discuss the different heuristics of the ANTS method. The first heuristic is the superstructure itself: as I mentioned before, it's a massive search space, as massive as the machine or hardware we're working on can handle.
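The asynchronous main/worker split described above can be sketched as follows. This is a minimal stand-in, using a thread pool instead of the separate processes the talk describes, and a toy fitness in place of actual RNN training; names like `evolve` and `evaluate` are illustrative, not the ANTS codebase:

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

POPULATION_SIZE = 4

def evaluate(structure):
    """Worker: stand-in for training and testing a generated RNN.
    Returns the structure and a toy 'validation error' fitness."""
    return structure, sum(structure) / len(structure)

def evolve(n_structures=20, n_workers=4, seed=42):
    rng = random.Random(seed)
    population = []  # (fitness, structure) pairs, lower fitness is better

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(evaluate, [rng.random() for _ in range(5)])  # stand-in generator
            for _ in range(n_structures)
        ]
        for fut in as_completed(futures):        # results arrive asynchronously
            structure, fitness = fut.result()
            worst = population[-1][0] if population else float("inf")
            if len(population) < POPULATION_SIZE or fitness < worst:
                population.append((fitness, structure))
                population.sort(key=lambda p: p[0])
                del population[POPULATION_SIZE:]
                # here the real method would deposit pheromone on this path
            # otherwise the structure is discarded; evaporation handles its edges
    return population
```

The key point mirrored here is that the main loop never blocks on any single evaluation: results are consumed in whatever order the workers finish, which is what makes the scheme scale when training dominates the cost.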
The superstructure consists of a neural network that is massively connected, meaning that every node in the structure is connected to other nodes through plain edges, forward recurrent edges, and backward recurrent edges. The simple structure we have here illustrates the concept of the superstructure we apply in ANTS: we have three input nodes, three hidden layers each with three nodes, and one output node in the output layer, and we're showing just one node connected to the other nodes through edges; the plain edges are the ones in gray, the forward recurrent edges are in green, and the backward recurrent edges are in red.

The concept of recurrent edges might be a little confusing if we look at this example: how can an edge be recurrent, coming in and out of nodes in the same time step? This next structure should make it clearer. Here we have a structure pretty much like the one before: three input nodes, three hidden layers each with three nodes, and an output layer with two nodes, but now we also have three time steps, the current time step t0, the previous time step t-1, and the one before that, t-2. The plain edges are illustrated with solid black lines; these edges are present in the current time step, which is what propagates through the neural network. The recurrent connections bring, or propagate, information from the previous time steps, from the inputs, the data that was fed to the nodes at those previous steps. The forward recurrent edges are depicted in red and orange: the red ones come from t-1 and the orange ones come from t-2. The backward recurrent edges are the dotted lines in blue and green, and we can
see that they go backwards, to earlier layers. We can do that because they are recurrent: they carry information that has already been processed, so we don't have to worry about propagating information backward through the structure within a time step; we can do it because it's coming from a previous time step.

The second heuristic of ANTS is colony weight sharing. Instead of randomly initializing the weights of the newly generated neural networks, we wanted to use trained weights to initialize them. We did that by saving the trained weights back onto the superstructure, and we used the last equation here to do this update, balancing between the weights that were saved previously and the weights coming from the trained and evaluated neural networks. We used two strategies for this update: the first was a fixed parameter phi, and the second was to compute phi from these two equations, which rely on the performance of the neural network that was trained and validated. If the performance was good, we let the weights of that neural network contribute more to the initialization of the weights of the newly generated RNNs; if it's not performing that well, the equation doesn't let it contribute as much.

The third heuristic is multiple memory cells. At each node an agent, or ant, reaches in the superstructure, it does a local search to pick the type of the node from three different types of memory cells. So in the generated RNN, the generated structure, the nodes are not all the same; each node's type is based on the local search the ant does at each node along its path from the input to the output.
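The slide's exact weight-sharing equations aren't reproduced in the transcript, so the following is only a sketch of the idea as described: a convex blend of the weights stored on the superstructure with the newly trained weights, where `phi` is either fixed or derived from the network's validation error. Both function names and the exact blend form are assumptions:

```python
def share_weights(superstructure_w, trained_w, phi):
    """Blend trained weights back into the superstructure's stored weights.

    Assumed form (the slide's exact equation isn't in the transcript):
        w_super <- phi * w_super + (1 - phi) * w_trained
    """
    return {edge: phi * superstructure_w[edge] + (1.0 - phi) * w
            for edge, w in trained_w.items()}

def adaptive_phi(error, best_error, worst_error):
    """Hypothetical performance-based phi: a better network (lower error)
    yields a smaller phi, so its weights contribute more to the update."""
    if worst_error == best_error:
        return 0.5
    return (error - best_error) / (worst_error - best_error)
```

With this shape, a network at the best error in the population gets `phi = 0`, so its trained weights fully overwrite the stored ones, while a network at the worst error gets `phi = 1` and contributes nothing.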
The fourth heuristic is multiple ant species. We applied, or came up with, different species of ants. The first species traverses only the plain edges of the superstructure, moving forward through them; these ants define the number of nodes in the generated structure and also the types of those nodes. When they're done with their work, the second species, which we call the social ants, traverses between these nodes, but using the recurrent edges to move, so they create the recurrent edges of the newly generated RNN. We have two subtypes of social ants: the forward social ants, which go from the input to the output but only over the forward recurrent edges, and the backward social ants, which go from the output to the input, traversing the backward recurrent connections. The reason we thought of these different species is that we wanted to control the ants' tendency to wander around the superstructure exploiting its convoluted mesh of recurrent connections. With this strategy we first define the structure using the forward ants, and then the recurrent connections can be defined afterwards by the social ants.

The fifth heuristic is regularization of the pheromone deposit. We wanted to give the ants an incentive to produce sparser but still well performing neural networks, by penalizing them if they constructed denser or bigger structures. So we added this regularization term to the formula that updates the pheromone value, and as you see, the regularization term relies on the performance
of the generated neural network, and it also relies on the size of the structure.

The last heuristic is jumping ants, which is something we wanted to experiment with: if we let the ants jump over layers as they move through the superstructure to construct the neural networks, compared to restricting their movement to one layer at a time, how would that turn out performance-wise? Would they give us sparser, well performing structures, or would the jumping hurt performance and give us weaker structures, weaker neural networks?

For the experiments we used time-series data collected from a coal-fired power plant. We divided the data to have 7,200 records for training and testing, and the plot here shows the data: we can see that it's non-linear, acyclic, and non-seasonal, so it's a hard problem for a non-neural-network solution such as linear regression. The data has 12 parameters, and we were trying to predict only one parameter, the flame intensity. The experiments covered all the heuristics of ANTS, giving different values for their different parameters. The superstructure consisted of 12 input nodes, 3 hidden layers each with 12 nodes, and one output node in the output layer, and the recurrent connections could span over 3 time steps. In total, the superstructure had 49 nodes, 924 edges, and almost 3.5 thousand recurrent edges; if you unroll the structure over 72 time steps for backpropagation through time, you get about 352 thousand nodes, about 6.5 million edges, and about 26 million recurrent edges.

In the experiments we also compared the performance of ANTS, on the same data set, to EXAMM and NEAT. EXAMM is the state of the art in neural architecture search among genetic-based methods, and we compared to NEAT because it's a benchmark in the neuroevolution and NAS realm. We also compared to fixed, unoptimized structures.
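The regularized pheromone update formula itself is on the slide, not in the transcript, so the following is only an illustrative sketch of the idea as stated: the deposit grows with the network's performance but shrinks with the structure's size, so sparse, well performing networks earn the largest rewards. The function names, the density penalty, and `alpha` are all assumptions:

```python
def pheromone_deposit(base_reward, n_edges, n_max_edges, alpha=0.5):
    """Hypothetical regularized deposit: scale the performance-based reward
    down as the generated structure gets denser, so sparser networks
    reinforce their paths more strongly."""
    density = n_edges / n_max_edges            # 0 (empty) .. 1 (fully dense)
    return base_reward * (1.0 - alpha * density)

def update_edge(pheromone, reward, evaporation=0.1):
    """Standard ACO edge update: evaporate the old value, then deposit."""
    return (1.0 - evaporation) * pheromone + reward
```

Two networks with the same fitness then leave different amounts of pheromone: the sparser one reinforces its path more, which is exactly the incentive the talk describes.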
These fixed structures had 1, 2, and 3 hidden layers and used different types of memory cells. The experiments comprised 1,600 configurations covering all the combinations of the ANTS metaheuristics, and each experiment was repeated 10 times for statistical analysis. ANTS generated 2,000 RNNs per experiment, each trained for 10 epochs; in total, ANTS generated, trained, and evaluated 32 million RNNs, and it took a month and 1,000 CPUs to finish the experiments.

The results showed that ANTS outperformed the unoptimized structures, it also outperformed NEAT, and some combinations of ANTS outperformed EXAMM. EXAMM here is the fourth from the left, and we can see that the mean absolute error for some of the combinations of the ANTS heuristics was better than EXAMM's. We did a statistical study of the results we got from ANTS: we looked at the top performing neural networks, the top 10, 25, 250, and 500 results, and at the contribution of the heuristics to these best performing structures. We found that the heuristics contributed effectively to most of these results. But the thing that was really intriguing, or surprising, for us is that the recurrent connections disappeared in these results: the best performing neural networks didn't have that many recurrent connections. To us that meant the memory cells did the job of carrying recurrent information from previous time steps, but we want to expand on this later; it's on our list of future investigations.

This is a summary of the achievements of ANTS based on the results of our experiments. ANTS was the first method to apply the core of ant colony optimization to recurrent neural network NAS, or neuroevolution. The ANTS heuristics that control the ants'
tendency to wander around the superstructure exploiting recurrent connections proved successful: the regularization heuristic gave us better results, shown here in this table, and the jumping ants gave us better performance compared to non-jumping ants. The weight sharing strategy also proved effective if we look at the results compared to not applying weight sharing. These strategies are also generic: the strategies we used are generic enough to apply to any problem or solution that is ant colony optimization based. The pheromone deposition regularization in our method is also a contribution that hadn't been introduced anywhere in the literature previously, and the performance of ANTS compared to the other benchmarks and the state of the art in the realm is also remarkable.

Going forward, we thought that ANTS did give us good results, but its main drawback was the discrete search space. ANTS worked on that massively connected superstructure, and it's massive, yes, but it's still discrete: the ants cannot move freely between the nodes; they are forced to move between the nodes over these predefined connections, whether forward edges or recurrent edges. So we thought of removing that discrete search space and replacing it with a continuous search space. We designed a 3D search space where the space has layers representing the time lags; ants can jump between these layers to give us the recurrent connections, and on each of these layers the ants' movements give us the edges between the nodes.

In this slide and the coming slides we'll show an example of how ants move in CANTS, continuous ANTS. An agent, or ant, starts by picking one of the layers it will move on; this is done in a
discrete fashion. Once that's done, it decides whether it's going to make an exploitation or an exploration movement. In this example it decides to do an exploration movement, so it picks the angle and the radius of its next location on that layer, moves forward to that location, and then decides whether the next move will be an exploitation or exploration move. In this example it's an exploitation move, so it will exploit the pheromone traces previously deposited by other ants in the search space: it uses its sensing radius, something that is defined for each ant, finds the center of mass of the pheromone traces within that radius, takes that center of mass as its next location, moves to that spot, and then decides again whether its next move will be an exploitation or exploration move.

At each location, before deciding the type of the step, the ant also decides whether it's going to stay on the same level, the same time lag, or jump to another time lag. If the movement stays on the same level and the ant is doing an exploitation movement, it only considers the pheromone traces that are ahead of it, because it can't move backwards on the same time lag; otherwise it would be making a backward step, which is not allowed within a time step in these networks. But if it jumps to another time lag, a layer above, then it creates a recurrent edge, and recurrent edges can go back in time in the structure, to a previous time-lag layer. In that case the ant can consider all the pheromone traces within its sensing radius, the ones ahead of it and the ones behind it. So in this example, for this step, it considers all the pheromone traces within its
sensing radius, calculates the center of mass, takes it as its next position, and keeps doing this until it reaches the proximity of the output nodes. Once there, it decides which output node to take as the final position of its path from an input to an output. The other ants do the same, and we get different paths from some inputs to some outputs. CANTS then takes these paths and condenses the nodes, so that we don't have many nodes that are very close to each other: nodes within a certain proximity are clustered together using DBSCAN, giving a smaller number of nodes. The paths are then collected and turned into a neural network structure, which is sent to a worker process to train and test, and then compared against the population of best performing candidates. From this point the process is almost the same as in ANTS: once the structure is constructed, the training and testing process is the same as the one we discussed for ANTS. This was also an animation, showing how these paths are taken by the ants from an input to an output in the 3D search space.

If we look at CANTS, it has only 8 tunable hyperparameters; comparing this to ANTS and EXAMM, it's half the number of hyperparameters of the other methods. These hyperparameters are: the number of layers of the search space; the number of ants, similar to the corresponding hyperparameter in ANTS; the sensing radius of the ants; the ants' probability to create a new node, which represents exploration instead of exploitation of the pheromone traces in the space; the node condensation parameters, or factors, meaning the variables of DBSCAN, which are also hyperparameters of CANTS; a pheromone update parameter; and a pheromone volatility parameter, and these last two are also present in ANTS and other ACO-based methods.
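The exploration and exploitation steps described above can be sketched roughly like this. It's a simplified 2D version of one layer of the search space; the sensing radius handling, pheromone representation, and "forward only" convention are illustrative, not the paper's exact formulation:

```python
import math
import random

def explore_step(pos, max_radius, rng):
    """Exploration: pick a random angle and radius for the next location.
    The angle is restricted to [-pi/2, pi/2] so the ant keeps moving 'forward'."""
    angle = rng.uniform(-math.pi / 2, math.pi / 2)
    radius = rng.uniform(0.0, max_radius)
    return (pos[0] + radius * math.cos(angle), pos[1] + radius * math.sin(angle))

def exploit_step(pos, pheromones, sensing_radius, same_layer=True):
    """Exploitation: move to the pheromone center of mass within the sensing
    radius. On the same time-lag layer, only traces ahead of the ant count;
    when jumping to another time lag, traces behind it may also be sensed."""
    visible = []
    for (x, y), amount in pheromones:
        if math.dist(pos, (x, y)) > sensing_radius:
            continue
        if same_layer and x <= pos[0]:
            continue  # no backward moves within a single time step
        visible.append(((x, y), amount))
    if not visible:
        return None  # nothing sensed; the caller falls back to exploration
    total = sum(a for _, a in visible)
    cx = sum(x * a for (x, _), a in visible) / total
    cy = sum(y * a for (_, y), a in visible) / total
    return (cx, cy)
```

In the full method, the sequence of positions an ant visits becomes its path, and nearby positions from different ants are then merged into shared nodes (the talk uses DBSCAN for that condensation step).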
The experiments used three time series with different numbers of parameters and different sizes. The results we got from CANTS were very competitive with ANTS and EXAMM; they weren't necessarily better, but they were at the same level, although CANTS didn't do very well on one of the data sets. That was comparing performance as mean absolute error. But comparing the sizes of the neural networks we got from CANTS, we saw that the CANTS structures were sparser: the performance was competitive with ANTS and EXAMM, but the structures were much sparser than ANTS's, similar in size to what we get from EXAMM. Remember, EXAMM is genetic-based, meaning the optimization process starts from minimal structural elements, which makes it susceptible to local-minimum traps; CANTS is not susceptible to that problem, yet it also gave us smaller structures, with performance competitive with the other methods, EXAMM and ANTS. So the advantages of CANTS are: it has an unbounded search space compared to ANTS; the results that came out were good compared to ANTS and EXAMM; the tunable hyperparameters are half those of EXAMM and ANTS; and it indirectly encodes the neural topology in a 3D search space, which is maybe one of the important contributions of CANTS.

So far, ANTS and CANTS are solutions that optimize only the neural topology, meaning they did not optimize the synaptic parameters of the neural networks during the optimization, or evolution, process. So we thought of making CANTS capable of doing this as well: to train the neural network, to optimize the synaptic weights, during the structural optimization process. We added a fourth dimension to the search space, in which we embedded the weights, mapping the synaptic
parameters, so that each connection's weight is read directly from a coordinate in that new dimension. We also wanted the ants to be self-adaptive, evolving themselves through the optimization process so they can adjust to changes and produce better-performing networks: characteristics such as the sensing radius become variables that each ant can change through mutation as the evolution progresses, based on the performance of the neural networks it is generating. The big advantage of doing this is that it eliminates the back-propagation step, which is the most computationally expensive part of the evolution process in all the methods discussed so far, EXAMM, ANTS and CANTS, so eliminating back-propagation gives us a much faster evolution process. For the results, I'll discuss the graph on the right-hand side first, which shows the fitness (the mean absolute error) of back-propagation-free CANTS, the four-dimensional CANTS, compared to normal CANTS (with back-propagation) and to ANTS. On this particular data set, back-propagation CANTS and back-propagation-free CANTS did a quite similar job, and both were better than ANTS. But the main advantage shows up in the graph on the left-hand side, which compares the methods by evolution time: back-propagation-free CANTS took much less time than the back-propagation version of CANTS and than ANTS, across different numbers of ants. The figure also breaks down where the time goes: the dotted curves show the time back-propagation-free CANTS and normal CANTS each took to generate the candidate neural networks.
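To make the fourth dimension concrete, here is an illustrative sketch (my own names and ranges, not the authors' implementation) of how a weight can be read straight off an ant's position, so that the assembled network never needs a back-propagation training step:

```python
# Illustrative sketch of the 4th search-space dimension in back-propagation-free
# CANTS: every point an ant occupies is (x, y, time-lag layer, w), and the w
# coordinate is decoded directly into the synaptic weight of the incoming edge.
import numpy as np

def weight_from_coord(point, w_min=-1.0, w_max=1.0):
    """Map the 4th coordinate, assumed to lie in [0, 1], onto a weight range."""
    return w_min + float(point[3]) * (w_max - w_min)

def edges_from_path(path):
    """Convert consecutive 4-D path points into weighted edges (src, dst, w);
    node identity is the first three coordinates, the weight comes for free."""
    return [(tuple(a[:3]), tuple(b[:3]), weight_from_coord(b))
            for a, b in zip(path[:-1], path[1:])]
```

Because the pheromone dynamics move ants (and thus weights) toward better-performing regions, the weight search happens inside the same swarm process as the topology search.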
Back-propagation-free CANTS took more time to generate the candidate networks than normal CANTS, because it has the fourth dimension and also has to evolve the ants themselves through the iterations, which needs extra time. But the other two curves, the dashed lines, show the time the two methods took to train and validate the networks: back-propagation-free CANTS doesn't have to train the networks at all, so its time there is an order of magnitude less than the back-propagation version's. The remaining lines show the cumulative time for both methods, and of course back-propagation-free CANTS took much, much less time overall than back-propagation CANTS. As for future directions, the first point we are considering is turning CANTS into a completely continuous search space. As you saw, the CANTS search space is not purely continuous, because the time lags are represented by discrete layers; we want to replace those with a continuous dimension representing the time lag. This is challenging, however, because the time lags would then have to be known prior to the optimization process; otherwise we would be mapping the whole time series as candidate lags and then picking lags out of that, which is not feasible. The second point is to investigate a finding from the CANTS results: the recurrent connections disappeared from the best-performing structures. Our theory is that the memory-based cells replaced those connections in carrying the information we need from past time steps, and that they were more
effective and more efficient at this than explicit recurrent connections, so we have to expand on this and investigate more. The last point, and these three are the top of our list, is to adopt a concept coined in a book by Dr. Deborah Gordon, where she argues that the living organism in the ants' world is not the individual ant but the colony: colonies start, grow, interact with the environment and the ecosystem, interact with other colonies, and die at some point, while the ants themselves are not the organisms but the cells of these colony organisms. We want to take that concept and apply it in our methods, having a number of colonies living together in parallel, evolving and communicating with each other, to see whether that gives us better performance; after all, we are trying to mimic nature in the solutions we investigate. And with this I'm done with my presentation, if you have any questions. Alright, awesome, what a great presentation. How about Travis and Alexander, whichever of you wants to go first, please feel free to give an introduction and any primary remarks. Okay, I can say something. I don't have too much to add; I think Abdulrahman did a pretty good job covering the core bits of the work. I'm Alex Ororbia, a professor in computer science, an affiliate professor in psychology, and affiliate faculty in computational neuroscience at the Rochester Institute of Technology. I work on a lot of stuff, but primarily predictive coding, active inference, variational free energy, a lot of the stuff that's actually of interest to this group. This was a particularly interesting project for me because a branch of my own research is in neuro-evolutionary methods, or
even just nature-inspired metaheuristic optimization. When I had the pleasure of working with Abdulrahman while he was a PhD student at RIT, we talked a lot about colony-based optimization approaches, and I encouraged him to look into their origins and to try to understand how physical ants actually behave, which was always fascinating to me. I don't have much to add on the technical parts; I think he covered all the core results. What I do like to think about, and I'm actually curious to hear from the Active Inference Institute, is their interest in ant-colony methods, and what in particular was interesting to them, because asking how you view ant colony optimization from an active inference perspective is, I think, particularly interesting. I even had a thought, and I don't know if this question is on anyone's mind in the audience: for the little ant agents, the CANTS agents Abdulrahman described earlier, is there a way to start viewing them as a multi-agent system that is optimizing some variational free energy quantity? A lot of even Karl's work, and I collaborate with them, touches on this area of collective intelligence and societal organization; you can look at free energy from these very high-level viewpoints all the way down to fine-grained cellular activity. So I'm actually more curious to hear Daniel and anyone else at the Active Inference Institute explain their particular interest in ant-based optimization and metaheuristic optimization, but there could be some interesting viewpoints on what the free energy is here. The ants themselves are very simple; I mean, we have made them more intelligent, and I know Abdulrahman and I talked at length about what if we even gave them, for example,
like a reinforcement learning control system. You could even imagine the ants themselves engaging in a form of active inference, optimizing their own free energy; what would that look like for the system? Those are just fun little thought experiments; we have obviously not worked on them, and Abdulrahman was never exposed by me to that part of my world, biomimetic intelligence. So those are my comments; I'm not sure they're particularly helpful, they're very general, and I'm actually more curious to hear from the Active Inference Institute about their interest and where it might intersect. Thank you, Alexander. Travis, do you want to say hello and give any reflections on the talk? Sure, I'll come in; hopefully you can hear me alright. Hi, I'm Travis DeSalle, an associate professor at RIT and also the graduate program director for our master's in data science, so if any of you are interested in any of this, or in data science generally, please shoot me an email. I thought this work was really interesting in a lot of ways as Abdulrahman was working on it. One of the coolest parts about it: a popular neuroevolution algorithm that came after NEAT is called HyperNEAT, and it transforms the discrete search space of neural architecture search into a continuous one, and it has been shown to be a pretty powerful method. Well, this new version of ant colony optimization does the same thing, it makes the search space continuous. But what I found really cool is that, as opposed to traditional ant colony optimization, where you have a graph, you send the ants along its edges, you take the best paths, and you construct a graph afterwards, here the ants are actually working like ants in the real world: they move a continuous amount through the space
between point A and point B, actually dropping pheromones along the way, so it's much more like a real simulation of how ants move around in the real world, and we're getting better results from it than from some of the older methods. That really made me happy to see, and I thought it made this very, very interesting work. That's awesome. Ahmed, want to add anything, or I'm happy to move on to the ants and ask some questions from the live chat. I'm good; I tried to cover everything I wanted to say. Just one thing Alex mentioned, about giving higher intelligence to the agents: this is something we discussed and I'm very open to; I've actually started to think about it but haven't implemented anything yet. And the last future direction I discussed is something I have actually started working on, though there's nothing to present yet, so hopefully we will see something come out of that. Awesome. Well, there's a ton of ways to go, and isn't that kind of the fundamental question in an interactive setting, whether pure agent–stigmergy interaction or multi-agent, but ultimately mediated through multiple stigmergies, with reading and writing and error-correcting codes in that communication setting? I felt like the work generalizes along multiple dimensions where previous approaches to multi-agent systems just didn't have those kinds of flexibilities, like the continuous search-space feature and several others. And with respect to the ants themselves, I did five summers of field work with ants in the US Southwest, in Arizona, and observed a lot of foraging activity, so that problem setting is a really fun and pervasive one, across any kind of living system, anything that's going to be active and living. So why did you pursue foraging-type algorithms overall, and does this class include the interaction-based methods, with direct agent contacts, that Professor Gordon highlights in Ant
Encounters? Yeah, I just wanted to mention that Travis actually started this idea of using ant colony optimization in neuroevolution, by applying the method to simpler neural networks, Elman and Jordan networks, and I took the lead from there and started working on my thesis pieces. I think I mentioned a little about why we used ant colony optimization on a previous slide, but to repeat it for the audience: ant colony optimization was originally applied as a graph-optimization solution, and we thought, why not neural networks, since neural networks are in essence directed graphs? They are directed because the flow of information goes in one direction, and they are graphs because they are nodes connected by edges, and the ultimate goal is to optimize the structure by removing and adding elements so that it gives us better performance. Awesome. One way Professor Gordon and others have talked about that bidirectional learning relationship, between the computer science, the math and analytical formulations, and the field work on actual behavior, is that ant species are working amidst a huge range of ecologies, with all these different patterns of regularity and resource distributions, and with foraging it's amazing how general it is. Yet it's also just one of the functionalities that has to occur, alongside slower processes like the allocation of tissue and faster processes like the response to alarm, so this one class of algorithms clearly scales across levels. And some of these foragers are lone foragers that don't leave pheromone trails, so even the idea of leaving a single positive pheromone, or leaving
more than one kind, opens things up. But there are also things models can do that real ants can't, like time travel, lossless pheromone perception, high-dimensional signaling profiles that can't occur with finite amounts of molecules. And then the agents are persisting on their paths as active agents, within one generation with a variational free energy at the behavioral scale, and across generations with that evolutionary layer, and the neural network implementation and an active inference model are kind of two ways of describing, slash implementing, the same thing. I'd be curious to hear your thoughts on where you see active inference coming into play, and how you see the similarities and differences between neural-network-based approaches and active-inference-based approaches: are they the same, complementary, overlapping? Want to take this one? I think this might be more of an Alex question, to be honest. Yeah, I mean, if you want to keep it on active inference, sure, but Travis, you're going to need to tag in when we get into the very specifics of the actual ant colony details, because again, I see ant colony optimization from a more global point of view. So to be clear, and so that Abdulrahman is not completely blindsided by your question, Daniel, even though the name is in the Institute itself, this isn't an active inference work. While there are obviously, as you pointed out, lots of interesting elements, for example the fact that the ants conduct their exploration, let's just think about the recurrent networks, across the superstructure, as Abdulrahman can explain, iterating with the pheromone trails to figure out what nodes they want to recurrently connect, forward-connect, skip-connect, and so on and so forth, you could say that what these ants are doing is
engaging in epistemic foraging, which is a key concept in active inference. So that Abdulrahman and Travis are not left behind by the jargon: active inference is a big general framework, a biological, neurobiological counterpart to RL, and "epistemic" refers to the kind of uncertainty that, Travis, you and I work on. The idea is, I want to understand my world, and the more I explore it, the less things will surprise me; but if I encounter information that is really weird with respect to the generative or predictive world model I've built, something very surprising, I should probably explore that. Of course I'm condensing the concept down to the exploration side of the explore-exploit tradeoff that characterizes reinforcement learning, but that's what we mean by epistemic foraging, and the foraging part, Abdulrahman, can obviously be likened to what the ants are doing: they're exploring their environment. So, with everyone on the same page, to get to your question about how this work differs from typical neural-based approaches to active inference, and I would fall into that category, I build neural models, biological process models: those are very much focused at the individual level, at least the ones I am aware of. When you build, for example, a back-propagation-based partially observable Markov decision process in active inference, that's a single agent; you're building a construct that tries to balance the epistemic quantity with its instrumental term, which, another jargon term for you, Abdulrahman and Travis, is just your reward signal, or prior preference, a prior distribution over goal states.
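For readers who want the formula behind this epistemic/instrumental split, the standard active inference decomposition (background notation, not something written out in the talk) expresses the expected free energy of a policy $\pi$ as

$$
G(\pi) \;=\; \underbrace{-\,\mathbb{E}_{Q(o,s\mid\pi)}\!\bigl[\ln Q(s\mid o,\pi) - \ln Q(s\mid\pi)\bigr]}_{\text{epistemic value (expected information gain)}} \;-\; \underbrace{\mathbb{E}_{Q(o\mid\pi)}\!\bigl[\ln P(o\mid C)\bigr]}_{\text{instrumental value (prior preferences)}}
$$

so that minimizing $G$ simultaneously maximizes expected information gain (exploration) and the match between predicted observations $o$ and the prior preferences $C$ (exploitation), which is exactly the balance being described here.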
These agents deal with that trade-off at an individual-agent level. Now, I'm sure there are interpretations from other perspectives, but the ant-based approach, even though I would not argue it has an explicit connection to active inference, at least none that Abdulrahman has made formal, is like a multi-agent approach. When the ants conduct their epistemic foraging, each ant is arguably a very, very simple model, essentially a bunch of coefficients and some hard-coded rules, and their job is to work together, through their pheromone trails, to figure out which parts of the superstructure are useful. I'd say that's different, and it lends itself, unless of course you make the ants more complicated and lose the benefit, to massive parallelization, which is one of the key strengths natural to a lot of nature-inspired optimization algorithms, ant colony optimization among them. Abdulrahman has been using hundreds of CPUs: you can put these ants on their own individual processors, and the communication occurring across them as they exchange information goes through the pheromone trails, an indirect mechanism that is not terribly complex to facilitate. I'm sure there are even better ways to do asynchronous forms of communication and further diversify this; I know Travis, who can add to this, has done things on citizen science and distributed computing through volunteer computing, and on how you can distribute this through massive global asynchronous networks. You can imagine adapting the ant colony optimization approach into some massively distributed version of active inference, where you essentially have to write down
the variational free energy, and I'm putting quotes around this because there is no such concrete term written in Abdulrahman's work, we haven't viewed it from the active inference perspective directly, such that each ant is optimizing its own variational free energy, and then there's probably a global quantity that is a function of the pheromone trails and the individual ant agents. And of course the exploitation term, the instrumental term, the reward signal to use an RL term, is driven by the performance of each candidate agent on the task: Abdulrahman, you compute something like mean squared error when doing time-series prediction, so in some sense there is a built-in reward function. And again, for those in the active inference group here, you can use the complete class theorem and view the reward as technically a prior preference, a log probability. With that in mind, you could squint at ant colony optimization, and I guess the big benefit comes from that massive parallelization, which you wouldn't easily, if at all, get with our single-agent neural-based approaches, and that might be an interesting place to build on. I'll stop rambling at this point; I'm not sure if that was helpful. That was awesome. Even earlier today, Chris Fields, in the Physics as Information Processing course, was talking about the classical information inscribed on the blanket, which is like the pheromone: perception and deposition, pheromone modification and perception, sense-making and action, which we can associate with the nest-mate cognitive system. So it could be as simple as a pass-through for the nest-mate, or any arbitrary relationship described with the blanket, a simple nest-mate or a sophisticated nest-mate doing another level of time-series
modeling, whereas there's also the environmental time-series modeling, and that's just within the ants. And then the fourth dimension is like that quantum rotation which goes from the lower-dimensional classical states on the Markov blanket screen into the quantum informational space. That's one of the discussions ongoing in ActInf right now: previous approaches to connecting quantum formalisms to macro, let's just say neural, phenomena based the connection on the plausibility of some molecular electronic effect bubbling up, whereas, given research from decision making and statistics and multi-perspectival modeling, and all the issues associated with the physicality of information transfer and its finiteness, the quantum formalism becomes useful just by itself, without reliance on some other electronic phenomenon. So there are a lot of very interesting connections, like having the degrees of freedom on the blanket, which could be noiseless, or could be noisy in some really specific way; in silico you get to play it from both sides, scale things up and down, and run these metaheuristics on top of it, and that arbitrary space could be really simple for learning, or however rich you like. And then, since the ant colony algorithm is ultimately federated through embodiment, that property makes it a really useful candidate for biomimicry. A lot of times when people think about collective behavior they're thinking about the flock of birds or the school of fish, and those are of course collective systems whose complex-systems properties can be studied, but that neglects at least one analytical degree of freedom: the stigmergy. So this really opens up both the quantum and the classical information, both niche modification and behavioral and cognitive modeling. And just to add on, what a node is could even be heterogeneous, or unknown, or
fixed through design processes. Just as ant colonies are flexible, enabling them to live in all kinds of places and make all these kinds of nested, interacting decisions, that flexibility is just the tip of the iceberg of what we could describe, because there will always be real environments we haven't yet tried with ant colonies, so we would never know the full extent of the repertoire and dynamics of the ant system. But we can abduct into new mathematical, statistical, distributional frameworks, pull back to different levels of the learning and meta-learning process, and just start there. And then, almost ironically, or maybe the opposite, it could be applied to ant colony video data, movement data, or foraging activity itself; it takes inspiration from the biology and develops in parallel, or in conversation, with it, so it's not bounded by what real ants can do, or you could constrain it so that it only has properties real ants have, like interacting only in a certain way, or having only so many pheromones, and do model comparisons. So it's a lot of degrees of freedom I feel you all are opening up with the ant colony modeling. One of the challenging pieces of multi-agent simulation is the open-endedness of the design space, so it's very hard, even for creative ideas, to find the right compute and processes, which are obviously still needed for what you discovered in the analysis here. I'll ask a question from Bert in the chat. First, I just want to clarify, to make sure I have the right term, and certainly Abdulrahman and Travis might want to look it up: when you were saying blanket, you were referring to a Markov blanket, correct?
Yes. The technical definition of a Markov blanket: when you have a Bayesian graph, where nodes are variables and edges are relationships among those variables, take any node of interest, which we'll call the internal states. These are not features of the world; it's not tagged onto the tissue of a real nest-mate. It's a perspective we can take on any node in a Bayes graph. All the nodes that insulate it, its parents, its children, and the co-parents of its children, are known as the blanket. There's a lot more discussion of the philosophical implications and the generalizations of that, but broadly, the Markov blanket is just the inbound dependencies, which we associate with sense-making, perception, learning and attention, and the outgoing dependencies of the agent, which we associate with action and influence in some downstream direction. Thank you, I just wanted to clarify that; I don't think Abdulrahman and Travis would necessarily be familiar with that terminology, it's very active-inference jargon, so I wanted to make sure they got it from the physics point of view. Yeah, totally, great point. And I'll ask a question now from Bert. Bert says: very impressive, solving generative models with more generative models sounds very promising; what about replacing ants with convolutions?
Talking about a meta-learning algorithm, like having a neural network that learns how to optimize other neural networks? Yeah, that concept has been introduced at some point in the machine learning literature, but we wanted a nature-based method, because nature is the most efficient optimization and evolution system we know; and looking at the results in the literature, nature-based methods were superior to many other methods, and our own results pointed the same way: we saw good performance coming out of our results and out of previous results from other methods as well. To add a little context: in neural architecture search there are a couple of classes of approaches. One is constructive, where you build larger and larger networks while trying to keep the network size minimal as you search for your optimal solution. Other approaches use a superstructure, which is how the earlier iterations of this work operated: you have a bound on your search space and you try to find the optimal network within that bound. So one builds things from the ground up, and the other trims a big network down to a small one. As to your question about convolutions: there's been a fair bit of research lately on what are called graph neural networks, which can use convolutions over a discrete graph search space and can potentially produce other graphs, and I believe there's been some neural architecture search work using this. But one of the main, and I think cool, things about the approach here, which is different from those, is that even with a graph neural network, if your search space is defined as some kind of matrix where entries are on or off depending on which nodes are connected to each other, you have a fixed search space, which may not be big enough.
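The contrast Travis is drawing can be sketched in a few lines (a toy illustration of my own, not code from either line of work): a superstructure search can only toggle entries of a fixed adjacency mask, while a continuous search places nodes freely.

```python
# Toy contrast: discrete superstructure NAS vs. a continuous search space.
import numpy as np

def discrete_candidate(n_nodes, rng):
    """Superstructure candidate: an upper-triangular on/off connection mask
    over a FIXED set of n_nodes; the search can only flip these bits."""
    mask = rng.integers(0, 2, size=(n_nodes, n_nodes))
    return np.triu(mask, k=1)  # keep it a DAG: edges only go "forward"

def continuous_candidate(n_nodes, rng):
    """Continuous candidate: node coordinates sampled freely in [0, 1)^2,
    so placements no fixed grid contains are reachable at all."""
    return rng.random((n_nodes, 2))
```

The discrete candidate can never express a node position outside its predefined grid, which is exactly the limitation the continuous CANTS search space removes.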
It might also not be the correct search space for the architecture search problem at hand, whereas this method is all continuous, which gives the algorithm a lot of freedom, maybe too much freedom, but a really open-ended way of generating a wide variety of neural architectures. If you pre-constrain your algorithm to work within a fixed discrete superstructure, you may never find those architectures, because they aren't even a possibility. So that's one reason we didn't go that route, though there are graph-based neural architecture search algorithms out there where you basically treat the architecture as a graph and train a neural network to emit another graph it thinks is better, and those sometimes use convolutions, and that helps. That's awesome; so, convolutions: yes. And one thought on that: the ants solve all these incredible patterns and do amazing things amidst informational and physical limitations, like we all do. Having the ants make trade-offs within a task space, and having a dial as modelers on that task space, touching the pheromone distribution, or metacognitive ants, something emulating, for example, the active-inference style of forward-looking and thinking through other minds, means there could be a kind of cognitive colony. That enables in-silico, total-thought-experiment colonies, and through data-driven processes you could keep continuity with the model, perhaps literal continuity, and connect it to the empirical, which is something very hard for agent-based modeling, which, as you point out, often fixes certain axes, performs a sweep, looks at one mechanism, and doesn't look at all the possible mechanisms of learning, intra- and intergenerational effects, and all these time effects. So how do you see this being used in different research or application domains? Well, I mean, the main use case
that we thought of was neuroevolution and architecture search; applying it to domains other than architecture search is something we haven't figured out yet, and I think Alex can expand on that. I can hop in a little too. Basically, this type of algorithm is useful whenever you need to generate graphs and you don't necessarily have a fixed structure for the graph, and when I say graph I mean a computer-science graph, with nodes and edges; pretty much anything involving graph construction. Most types of neural networks under the hood can be represented as graphs, and usually are, so we're using it for neural networks because they're really popular, but there are other problems out there, the traditional traveling salesman problem, routing problems, any kind of task where you might need to generate a graph in a smart way; it's good for that. To your other point, though, and I don't want to steal Abdulrahman's thunder on his last future direction: while a particular run of this ant colony optimization search is finding an optimal neural network, a colony has fixed parameters it operates within. But if you think of the colony as an organism, as opposed to the individual ants being organisms, you can evolve colonies, optimizing how the ants themselves act. So you can have evolving colonies that, in a smaller sense, also optimize what they're doing within the prescribed parameters for the agents they generate; you get a kind of meta-meta-optimizer. That's awesome. The evolutionary account of a "why" question for ant behavior today is, in part, that colonies that couldn't survive under a given regularity or constraint have had a long, long time to be wiped off the table, and so every biological system has to have that kind of multi-scale ordering.
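Travis's colony-as-organism idea can be sketched as a nested loop (purely illustrative toy code of my own; the fitness function here is a stand-in, not a real architecture search):

```python
# Sketch of the "colony as organism" future direction: an outer evolutionary
# loop mutates each colony's own hyperparameters (ant count, sensing radius)
# based on the fitness of the networks that colony produces.
import random

def run_colony(params):
    """Stand-in for one full ACO architecture search with these colony
    hyperparameters; here just a toy fitness peaking at sensing_radius 0.3
    and penalizing larger colonies."""
    return -(params["sensing_radius"] - 0.3) ** 2 - 0.001 * params["n_ants"]

def evolve_colonies(n_colonies=8, generations=20, seed=0):
    rng = random.Random(seed)
    colonies = [{"n_ants": rng.randint(4, 32),
                 "sensing_radius": rng.uniform(0.05, 1.0)}
                for _ in range(n_colonies)]
    for _ in range(generations):
        # Truncation selection on colony-level fitness, then mutate survivors.
        survivors = sorted(colonies, key=run_colony, reverse=True)[:n_colonies // 2]
        children = [{"n_ants": max(1, c["n_ants"] + rng.randint(-2, 2)),
                     "sensing_radius": abs(c["sensing_radius"] + rng.gauss(0, 0.05))}
                    for c in survivors]
        colonies = survivors + children
    return max(colonies, key=run_colony)
```

Under this toy fitness the surviving colonies should drift toward a sensing radius near 0.3 with small ant counts; in the real setting each `run_colony` call would be an entire (expensive) CANTS search, which is why the colony level is a natural place for coarse-grained parallelism.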
attention task about scanning around and learning a saccade policy, which had to do with epistemic foraging but without leaving a trace. The main modification to bring that active inference epistemic visual foraging model into the ant setting was to add a pheromone rule, just like you described, even though of course that's not the only possible pheromone rule; it's just the simplest pheromone rule that we can generalize from, as you definitely have. And there are many emerging ways of modeling those multi-scale active inference models: composing across layers, which we might associate more with the laterality of things that happen through interactions, and then also, as Mike Levin shows, systems that have a memory-retention component of some shape, a cognitive shape, and then protention, awareness, agency, or other attributes you can use to describe that. That's a statistically amenable way to describe things, and then there are a variety of implementations of a given statistical problem. On a federated compute architecture, say, it might be the case that you're not running the pure matrix multiplications that are shown in the early MATLAB code of active inference; different components of machine learning systems might be composed together in ways that still abide by those patterns of communication. There's a level of abstraction at which we can still describe it, but that doesn't mean active inference is causing it, and that's what gives a lot of flexibility. It's really cool that through your background and work, Alexander, and these kinds of collaborations, the active inference perspective on multi-agent modeling can at least come together with all these other views, comparatively, and I think that's going to be quite an interesting interchange: to apply this whole tissue-type or colony-type thinking above and within the models. Just a lot of
degrees of freedom, like you said, could be too much. And I think there are different ways too; it depends on how you want to take the ant metaphor. Again, it's interesting, some of the questions and comments that you're making, Daniel, and some from the audience, about how to think about it from a cognitive point of view. I do work in cognitive architectures, of course, again from the whole single-agent or single-entity side, modeling a single brain in its different regions. But if you take a nature-inspired optimization approach like the ant metaphor that Abdel Rahman latched on to, the way he formalizes it is to take a principle of how ants interact with their world and with each other and then mathematically model those particular concepts step by step. And if you bend the metaphor and ask, well, could the ant colony metaphor apply to multi-human-agent systems, or other entities? Can the ants be generalized beyond the physical creature upon which Abdel Rahman based his initial metaphor? That's an interesting philosophical take, and then how do you apply that to, say, building a multi-agent cognitive system? And of course Travis was discussing with you, and you were mentioning, metacognition: you could think of ant colonies of ant colonies, but you could even replace the word ant and just say we have clusters of intelligent agents, at whatever degree of modeling we're doing. Because again, I do want to emphasize that at least the CANTS ants, the agents that I have worked with in the context of Abdel Rahman's work, are not in and of themselves anything more than, I would argue, an extremely simplified generative model, or a very, very simple control system.
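To make the "very simple control system" point concrete, here is a minimal sketch of what one such ant agent could look like: a stochastic controller on a 1-D grid that weighs neighboring pheromone levels when choosing a step. The 1-D setting and the parameter names (`sense_weight`, `explore_prob`) are illustrative assumptions, not the actual CANTS implementation, but they show the kind of behavioral knobs an outer evolutionary loop could tune instead of giving each ant a brain.

```python
import random

def ant_step(position, pheromone, sense_weight=1.0, explore_prob=0.1, rng=None):
    """One decision step for a toy ant agent on a 1-D grid.

    `pheromone` maps grid positions to deposited pheromone levels.
    With probability `explore_prob` the agent moves at random; otherwise
    it weighs the two neighboring cells by pheromone raised to
    `sense_weight` and samples a move proportionally. These two numbers
    are the whole "controller" -- exactly the kind of parameters a
    genetic outer loop could evolve (hypothetical names, not CANTS code).
    """
    rng = rng or random.Random()
    neighbors = [position - 1, position + 1]
    if rng.random() < explore_prob:
        return rng.choice(neighbors)          # free exploration
    # small epsilon so a cell with zero pheromone is still reachable
    weights = [(pheromone.get(n, 0.0) + 1e-6) ** sense_weight for n in neighbors]
    return rng.choices(neighbors, weights=weights)[0]
```

Running many such steps while depositing pheromone where agents walk is enough for trail-like stigmergy to emerge, without any agent carrying more state than two scalars.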
There's no neural network under each one, because then you'd have to computationally simulate every one of these ants within the framework. So there's always that practical machine learning viewpoint of how you actually simulate this: Abdel Rahman is working with CPUs; it's not as if he has an army of GPUs to replace the ants with convolutional networks. Again, if you had the resources this would be awesome, but expense and money are another constraint on this planet. Still, I think there are interesting views and interesting directions one could take by drawing inspiration from the ant metaphor and the concept of pheromones and translating them to other real-world signals, for example communication patterns among other animal entities or human agents. I think that opens up an interesting perspective, especially if you keep connecting it back to free energy minimization and asking how the terms it decomposes into, an epistemic term and an instrumental term, are balancing out in the physical processes we specify. That's just a very interesting place to be, and you mentioned active inference versions of ants, which is fascinating in and of itself. The last comment I have is, again, about the degree of modeling and what you are modeling. If you're modeling a society or an organization, that's one way you could use the ant colony framework, or metaheuristic optimization frameworks generally, to cast any type of complex multi-agent system as an active-inference-style engaging process. Or you could go really low level and think about cells in a body, the units that make up organs, or organelles, and ask whether we can use this to model that level of granularity within a human or an animal entity. And I think there are some fascinating questions about how
does this metaphor manifest itself at different time scales and different degrees of perspective, in terms of how you're modeling, what you're looking at, and what picture you want to emulate. And of course there's always, under the hood, the practical consideration of the computational expense you allocate and whether you can actually run that simulation long enough. Because, Abdel Rahman, correct me if I'm wrong, you mentioned that one of the experiments, for the bigger systems, took a month, right? Of course that depends on the hardware, but it can get pretty prohibitive if you want to go even bigger than that. Yeah, yeah. And also, high-performance computing is not always feasible for smaller problems. If we tried to model the brain of each ant with a small neural network, using a GPU might not be a feasible solution. Travis, the high-performance computing specialist here, can speak to that better, but sending data to the GPU and getting it back is time-consuming and resource-consuming; it would worsen the time consumption rather than solve it, because communicating between main memory and the GPU has an overhead. The problem has to be big enough to actually utilize the GPU in such solutions. Sorry, so when you have the super-large language models or large models for computer vision, they do a lot of massive operations on tensors, which are basically multi-dimensional matrices, and when you have really big, wide tensors you can parallelize the operations really nicely across the GPU. A lot of this work, though, is based on time series forecasting and time series classification on sensor data, stuff from power systems. The input to a large language model might be a sequence of a thousand or more tokens, each a word
embedding, which is actually not huge, and if you go up to a computer vision model the input image may be a thousand by a thousand pixels, which gives you a million inputs. But when you're working with sensor systems off of aircraft or power plants, you may have 50 to 100 inputs, and with that type of time series data you don't need a massive, super-wide neural network. Then if you add in recurrency, where you have to do backpropagation through time, and other things like this, you actually can't do it more efficiently on a GPU. We tried; I wrote a bunch of code a long time ago to put this stuff on GPUs, and we found it was quite a bit slower. So depending on what you're doing with a neural network, a GPU actually isn't the right answer. The other cool thing here, which I think has real potential, is that one of the big not-talked-about problems in machine learning is that backpropagation, while it's the fastest thing we know, is inherently not scalable. You can get a bigger, better GPU to do the parts of your network in parallel and speed things up, but that only helps if you have a bigger network. If you want to speed up the training process itself, you can't just add another CPU or another GPU and make backprop go faster; you can make the forward and backward passes through your neural network faster, but you still have to do every epoch of backprop iteratively. A method like this, on the other hand, is backprop-free, so we can use a nature-inspired or other method across hundreds of computers, and you can throw twice as many computers at it and get a result twice as fast, whereas with backprop you can't do that. So if you think about actually being able to train a neural network faster, backprop has a pretty low speed limit for what we need to do, and it's kind of a big problem that the machine learning community doesn't like to talk about,
because people say, I'll just buy the next big NVIDIA GPU and that'll make things faster. That's super interesting. Does this maybe even bring up a kind of relationship where graphics visualization, which of course a GPU does well, is like the screen changing through time, a classical process that can be massively unfolded, while the cognitive models of the nest mates, which again can be nested, are the more cognitive thing that you can run in parallel, because the minds are not influencing each other except through stigmergy? Then that part is CPU-bound by the size of the colony. And there's the philosophical question of the scale at which an ant colony exists: the Argentine ant all throughout California, say, versus fifty ants in an acorn. How do we deal with those kinds of meshwork cognitive systems all the way through? There are all these different trade-offs being made: in featureless deserts there are different wayfinding and pathfinding strategies, sensor integration, polarization of light, different cognitive strategies, because the ants might be going out a long distance and dragging something home without leaving any pheromone, since it's not any more likely that there will be food there again. In that case the stigmergy is basically minimal to essentially none, while in other situations you could have very strong adherence to pheromone distributions, to the point of being fit to a very normative path. But that's all happening at a level that allows different compute architectures, different information architectures, and ultimately different biological embodiments to engage fruitfully, again looking at the variability and diversity of biological algorithms for collective behavior, which have been studied by Professor Gordon and others from so many different angles. Yet
sometimes it can feel like multi-agent models always start at square one: demonstrate some proof-of-concept phenomenon, which then gets utilized as part of a bigger perspective, but it's not as if that model was ever claimed to have been tuned to maximum performance. It's like, well, we got decision-making behavior, you could transfer this to group decision making or something, but there's still a big gap there. And I think that's what you're describing with CANTS, which is funny because it could be "can't", but a cant is also a dialect, a way of speaking. It was very funny when we came up with it. Great choice. And there are multiple perspectives to swap between on the classical screen, because the meaning of the word is something that's happening in that fourth dimension, cognitively; the meaning of the word isn't to be found just on the blanket, just on the interface itself. That's just the communication, and that's a bounded system. If you model a cognitive system that doesn't have that kind of constraint, a constraint represented by a map that has some kind of blanket index, some kind of blanketing, if you don't embody that constraint in the statistical model, the map, then you're ignoring one of the fundamental constraints on modeling the way things happen in an embodied fashion. Maybe there's some abstract space for a certain problem where that's a total slam dunk, but for full generality, at least over the space we know of biological life forms and their engagements, ecological engagements, not just within one behavior, that space is so vast and there's so much to learn across different systems, and then to abduce away into different information architectures, with active inference being some subset or type of those. So it's awesome work. Do you have any last comments? Well, giving the ants the ability to be so aware of their environment is actually something that we implemented
in our last work with BP-free CANTS. The ants are indirectly aware of their environment and adapt to changes in it, indirectly meaning that they evolve using a genetic-based algorithm to change their behavior: how they sense the pheromones, how they take their steps, and some other characteristics they have. So they are adapting, but not in an intelligent way; rather, through evolution, if you want to say that. We actually considered putting a brain in each one of these agents, but, as Travis and Alex mentioned, we found that it wouldn't be practical; it would hinder our asynchronous design, because it would take time to train each one of these brains as we evolve the neural networks. Awesome. Alexander or Travis, any last thoughts? Alex? Okay, I'm just really happy; I think this work is really interesting and it opens up a lot of pretty cool avenues, again if we can get to the point where we're evolving colonies that are producing ants and can see where that goes. One of the big issues in neural architecture search is the question of what an optimal neural network even is. What an optimal neural network is could be different, well, will be different, for different tasks. And not only that: even on the same data set, how you're using that neural network could lead to a less or more optimal network depending on what you're doing. Maybe you need one that's more energy efficient; maybe you don't care about energy efficiency or speed and you'll take a slower neural network because you need more accuracy. So having algorithms that can automate this whole process for whatever you actually want to use the neural network for is really important. And I think, one, just having ant colony optimization be able to optimize a network for a problem is great, but two, if we can make the algorithm itself self-optimizing, it
really can streamline this whole process, where right now, if you're doing machine learning, it can be kind of miserable: you make a neural network architecture, you try it out, you see how well it does, nope, that didn't do so well, let me tweak a couple of knobs. My whole life as a computer scientist is about being lazy, but being smart about it, so whatever I can optimize so that I don't have to do it over and over again seems like a good use of my time; I'll be smart about having to do as little as possible in the future. Then I guess I don't have too much to add to that. A lot of the good discussion has already happened about the various implications and ways of viewing ant colony optimization from other perspectives, including an active inference point of view. So really more of a closing thought on my end: it is an interesting direction to think about, as I suggested earlier, the adaptation of the metaphor to other systems, what you are trying to model, what your goals are from a scientific and philosophical point of view, and what questions you seek to answer. And I think it might be very interesting, given other developments in computing technology, to consider the ways in which you implement the parallelization. That's what attracted me the most to a lot of these metaheuristic algorithms, even things like particle swarm, and when I worked with Travis many years ago on the EXAMM algorithm, you saw our names on that. The part that always caught my attention is that ability to say I can put these entities on different computing resources or devices, and they will interact and exchange their results in some way to try to optimize some often complex objective cost function. So I think the part that will encourage the wider adoption of even metaheuristic algorithms
in general, not to say that they aren't already used a lot in, for example, the engineering domains, is again the development of parallel computing and processing systems, and exploiting things like asynchronous computing. That was another angle that caught my attention from Travis; he's done a lot of work on optimization and neuroevolution from an asynchronous point of view, and on how we can allocate whatever resources are available, distributing them across global networks. I think that might be the best shot at scaling up with what we've got right now. There might be, and you've mentioned this, Daniel, quantum technology; quantum computing is another interesting place that sort of changes the game. But barring changes that we don't exactly have at their best at this moment, how can we take advantage of citizen science, or distributed computing, or peer-to-peer communication, and build massive active inference systems that embody the multi-agent metaphor of ant colony optimization or other nature-inspired frameworks, and can such a system evolve over very long spans of time, just like evolution really worked? Another piece, my final one, is why I'm sometimes interested in evolution: to me it is the inductive bias that provided us with structures that allow, for example, a human agent to operate already as a baby. Babies can already recognize faces, and we have certain instinctual reactions and certain mechanisms that evolution has endowed us with. So a fascinating question is the interplay of this with the idea of simulating an artificial form of evolution, maybe building DNA-like structures, very, very simplified computational structures, which would answer Abdel Rahman's concern about, well, maybe we don't want the agents to be too smart in and of themselves, because I can't really simulate that unless you give me a decade to run the simulator. But you could maybe
come up with a more fundamental primitive and then use that as a starting point for your neural network. Let's say, Daniel, you want to do some task in image segmentation, and you ask, okay, what can your evolutionary framework give me? I'll say here's a template to start from; this is a kernel on which you build your framework. It's like a DNA structure, and it was evolved across many, many years of distributed peer-to-peer computing. You could imagine this mammoth, continual-learning-style evolutionary algorithm, whether it is based on a genetic algorithm or ant colony optimization, and that might be an interesting way to think about it. And by the way, I am spitballing here, generating an idea of how I could envision a scalable form without inventing a new computing system that I don't know will or will not exist, because quantum still has a lot of problems to solve too, like superconducting at very low temperatures or trapping photons, as I have learned. So that might be an interesting direction, and I think the scaling of this, especially from the practical end, is going to be the most important part; we're going to need to pull together all the tools that we have, as I mentioned before. I'll stop there too, because I'll end up rambling more, so hopefully that made sense. Well, this was very epic and inspiring, so good luck with the work. You're all welcome to suggest another piece that we might focus on, or to continue the discussion however you see fit, because it's a super interesting direction. So thank you; till next time. Thank you so much for having us. Bye.