All right, good morning, everyone. How are you? So let's start. I'm Tom. I work for Semiotic Labs, and today I'll have the pleasure of presenting our work on reinforcement learning for query pricing in The Graph. First, let me start with the outline. I'll say a few words about Semiotic and what we are doing, then I'll introduce what we are actually doing in this area, automated price discovery. I'll formulate the problem and add a few words about how we model it. Then I'll walk you through some reinforcement learning basics, just a refresher for those of you who may or may not know reinforcement learning. Agent-based modeling is the core of this talk, and I'll show how we're using it for testing system properties and for testing different behaviors in single- and multi-agent setups. Then I'll show that we actually deployed this solution in production and what the results are, and finish with a short summary. All right, so let's start. First, Semiotic. Semiotic Labs was founded in 2020 by AI and cryptography researchers. We focus on applied research, and we are one of the core devs of The Graph protocol. We are also the developers of other products, including an optimal DEX aggregator; actually, the lead of that product is giving a talk at this very moment on a different stage, so bad luck. Okay. So, our expertise: we combine cryptography with AI, that's our core expertise, and of course there's software engineering involved. We focus mostly on building infrastructure, and we focus on crypto-economics; today's talk is actually about crypto-economics. All right, so what's the scenario? I don't know how many of you actually know The Graph. Who knows The Graph protocol? All right, okay, 50-50. Good.
So basically, in this scenario, we focus on just one part of The Graph protocol. Imagine that you've got some customers that are sending queries, and you've got indexers that are indexing the blockchain and serving those queries. Between those two, there is an entity called the gateway, which acts as a query market. Depending on the price and the quality of service, a given query sent by a customer will be distributed to this or that indexer, and indexers earn money by serving queries. So they can control the prices of the served queries, and the core idea I'm going to talk about today is that we wanted dynamic pricing based on the query volume received by an indexer. We call that AutoAgora, and I'll explain in a second why. This is how we model the scenario. It's simplified in one respect because, as you can see, there are no customer agents here; we got rid of the customers for the purpose of these simulations. Instead, we've got a traffic generator, something that just says, okay, at a given time, this is the query volume that needs to be served. We've also got a query distributor, something that looks at the bids of the agents. The agents put in their bids, and depending on those bids, the queries are distributed across the different agents. Those bids are expressed in a domain-specific language called Agora; that's why we call the product AutoAgora, because it automates that bidding. Otherwise, the indexers would have to create those price models manually, which is kind of tricky. All right, so some selected assumptions about what is happening here. This is about how we generate that query volume.
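To make the query-distributor idea concrete, here is a minimal sketch, in Python with made-up names, of a naive distributor that splits a given query volume across indexers in inverse proportion to their price bids; the real gateway logic is more involved than this.

```python
def distribute_queries(total_volume, bids):
    """Naively split a query volume across indexers, weighting each
    indexer inversely to its price bid (cheaper bid -> more queries).
    `bids` maps indexer name -> positive price bid."""
    weights = {name: 1.0 / price for name, price in bids.items()}
    total_weight = sum(weights.values())
    return {name: total_volume * w / total_weight
            for name, w in weights.items()}

# An indexer bidding half the price receives twice the volume.
shares = distribute_queries(1000.0, {"a": 1.0, "b": 2.0, "c": 4.0})
```

This is only the "super naive" allocation used in the early single-agent experiments; later in the talk the distribution is replaced by the gateway's indexer selection algorithm.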
So here is the shape that we model. On the x-axis there is the query price, and this point is the budget of a customer. The customer is saying, okay, this is how much I can pay for that query. This budget is unknown, and that's important: the customer is not sharing it with the agents, not sharing it with the indexers. It's something the customer sets on its own, and it's something we want to discover. The budget can also move from left to right, which means that if an indexer sets its price higher than the budget, it won't be picked. Also, we added some noise to that query volume. And it's important to know that there's a game happening here. The agent's goal is to maximize revenue, but the gateway's goal, and basically the protocol's goal, is quality of service: we want all the queries to be served, you know, five nines and so on. So there's a game happening. All right. Now let's switch gears to reinforcement learning, just the most classical 101 refresher. In reinforcement learning, we've got two main entities: agents and environments. An agent interacts with the environment by executing an action. The agent's actions change the state of the environment, the agent gets a reward and observations after each action, and the agent can update its policy based on the received reward. In short, that's reinforcement learning. We're using different types of agents in our simulations, and here I put the two most important criteria. The first is trainable reinforcement learning agents, with online learning, versus rule-based agents with predefined behaviors. The second criterion is that agents can be stochastic or deterministic. Here I put a simplified classification of reinforcement learning algorithms.
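As a rough illustration, the demand curve just described can be sketched like this: volume stays near some base level while the price is below the customer's hidden budget, falls off sharply above it, and carries some noise. All parameter names and shapes here are illustrative assumptions, not the actual model used in the experiments.

```python
import math
import random

def query_volume(price, budget, base_volume=100.0, steepness=10.0, noise_std=5.0):
    """Hypothetical demand curve: roughly base_volume while the price
    is below the customer's hidden budget, dropping sigmoidally once
    the price exceeds it, plus Gaussian noise (clipped at zero)."""
    mean = base_volume / (1.0 + math.exp(steepness * (price - budget)))
    return max(0.0, mean + random.gauss(0.0, noise_std))
```

Pricing well below the budget yields nearly the full volume; pricing above it yields almost nothing, which is exactly the signal the agent has to learn from.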
And I highlighted the two that we'll be using in our simulations: vanilla policy gradients, and most of our agents will be using PPO. Importantly, there is one type of agent we mostly focus on, which we call Gaussian bandits. The idea is that this is a trainable stochastic agent with a policy represented as a Gaussian, and when the agent is supposed to perform an action, it samples from that Gaussian. So imagine that there is a distribution, and when the agent is supposed to make an action, it just samples. In this case, one agent samples 0.8 from the blue distribution, and a second agent samples 0.524. And why "bandits"? Because the agent is not building any internal model, any internal representation, of the environment. All right, moving to the most important part: testing properties with agent-based modeling. I already kind of explained this, but in the first set of experiments we focus on single-agent simulation, just testing the properties of an agent, and this is a single Gaussian bandit with a modified PPO policy update rule. The query distributor is a super naive version: depending on the price, the allocation is just inversely proportional. And in this first step, we've got a fixed customer budget, with some noise, of course. What we are checking here is the following: for every experiment, we've got some market conditions, some criteria, and a property that we want to test, whether the bandit fulfills that criterion, whether it is operating the way we want or not. In this case, the criterion is customer budget discovery. As I highlighted before, the agents do not know the budget, but we want to discover it. So there will be lots of things happening.
Because there are many things going on in these simulations, let me try to unfold that and go step by step through what you will be seeing in a moment in these animations, these videos that we captured during the experiments. First, there are five plots in a single video. The first one is perhaps the most important. What we've got here is the query volume plus the customer budget. The solid red line is the agent's current policy, and the dashed line is the initial policy the agent is starting from. The second plot from the top is the query volume served by the agent at a given time step. So what happened here is that the agent actually didn't serve a query, didn't get any queries, because a few steps earlier it sampled a price that was too high. That is also shown here: this is the aggregated query volume served by a given agent, in red, and the blue one shows how many queries were dropped, weren't served at all. You can see here there's a tiny step, which is associated with this. Then in the next plots, we've got the agent's revenue and the aggregated agent's revenue over time. So let's run our first experiment. Once again, what we are testing here, what we are trying to discover, is where that budget is. At first the Gaussian was super wide, much, much wider, and as the agent learns, you can see it's sampling from time to time somewhere up here, above the budget, and it's getting more and more sure that, okay, this is the max that I can get. That's the highest price, the highest bid that I can make.
So as you can see, the agent is dropping some of the queries, while its revenue is nicely growing. All right, first property tested. That was the static environment. How about a dynamic environment? By dynamic, I mean that the customer budget can vary over time. In this case, we are testing the same property, whether the customer budget can be discovered by the agent, but in more difficult market conditions, let's say. So once again we start, the agent converges, capturing the price that is kind of optimal, and then the price changes. What happens is the agent stops getting queries, so it says, okay, I don't know where I am. It spreads the Gaussian, and once queries are captured again, it moves to the left, like, okay, I was above the budget, let me go back. And that keeps happening over time. So plenty of queries are dropped, but still we are making some nice revenue, and we are reacting to the dynamic market. Cool. Second property, check. In a further experiment, we said, okay, there are some subgraphs that actually have no queries at all; there are moments where nobody is asking for queries. So what will happen then? That's a different scenario. In this case, the market condition is basically that there's no demand for your services, and we wondered how the agent would act. It will happen in a second; there will be three iterations. We recorded a few of those videos, but as you can see, there's already a hint in the right-hand corner that something is not right. All right, the Gaussian spreads. And now we speed this up, times 400, because every 200 steps look much the same. Okay, so what just happened? There are no queries.
So according to the design, the agent just said, okay, let me spread, so I can sample from a wider distribution, but it actually spread across the entire price domain. We said, okay, this is not a good behavior. If this happens, then the agent can sample super small price bids with the same probability as super high ones, and moreover, recovering from that will take a lot of time. So we said, okay, can we improve that? We implemented something that we call a graceful init pull. The idea is that there's this initial distribution that the agent starts from, something the indexer parametrizes the agent with when it is deployed, and when there are no queries, the policy just slowly pulls back towards that initial distribution. When queries appear once again, it picks up from there and starts sampling again. I'm losing my voice from time to time, I think. Okay, cool. So we passed that property. Now let's move on to multi-agent simulation. Up to this point, we weren't competing with anyone. Now we start competing with some deterministic agents. The idea is that right now, all the indexers have Agora models that are essentially fixed; they are not changing the models, so they can be seen as rule-based: okay, that's my bid and I'm sticking with it no matter what. This is what we are trying to model here, competition with deterministic agents, and the property that we are testing is discovery of the price bids of competing agents. We want to make sure that we can compete with them and discover what their bids were without actually knowing them, without having that explicit information.
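The graceful init pull can be sketched as a simple exponential decay of the policy parameters back toward the values the indexer configured at deploy time. The function name and the pull rate are illustrative, not the exact rule used in AutoAgora.

```python
def graceful_pull(mean, stddev, init_mean, init_stddev, rate=0.01):
    """On a step with zero queries, nudge the Gaussian policy a small
    fraction of the way back toward its initial distribution instead
    of spreading over the whole price domain."""
    mean += rate * (init_mean - mean)
    stddev += rate * (init_stddev - stddev)
    return mean, stddev

# After many empty steps, the policy ends up close to the initial one.
mean, stddev = 5.0, 0.1
for _ in range(500):
    mean, stddev = graceful_pull(mean, stddev, init_mean=1.0, init_stddev=1.0)
```

The benefit over unbounded spreading is that recovery is fast: when queries reappear, the agent resumes learning from a sensible, indexer-chosen starting point.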
Those three lines here are the three competing agents with deterministic policies that are simply fixed, and the blue agent in this case is the bandit. As you can see, after a while it sets a price just below the cheapest agent's, which means that it's trying, once again, to snatch the whole market. If you look at the number of total queries, you can clearly see a difference: that agent is dominating. So, okay, this is nice. Check. Cool. How about stochastic agents? A similar experiment: we also want to discover the price bids of the other agents, but their policies are stochastic, so we model them in a similar way, as Gaussian distributions, but ones that do not change. We observe similar behavior, which is great. Our bandit moves somewhere here, and of course it's sampling; the red stochastic, heuristic agent is also sampling from its distribution, so its bid changes too. But as we can see, our agent moved slightly below it, trying once again to capture the market. All right, great. I think we checked that property. Awesome. So how about a competition where we've got plenty of those bandits deployed? We hoped this would actually be the most interesting case, because we want AutoAgora to be run by all the indexers in The Graph ecosystem. Same property, slightly different market conditions. All right, so what is happening here? Sadly, it's not good. The agents are acting as designed, but as you can see, they're fighting, constantly moving to the left-hand side, which means that they're lowering their prices all the time.
And as you can see here, you can't quite see the labels, but this is the revenue they're getting at a given time step, and it's converging to zero, which means that after a while the agents will basically be serving for free. That's not a good situation for the indexers, because here we're only modeling the revenue, not the costs. At that point, the indexers would actually be paying to serve queries instead of making money. We call this phenomenon, this outcome, a race to the bottom. We ran many simulations and discussed it a lot, and we realized that, well, this is the expected behavior in this setup. We've got some agents that are trying to snatch the market, purely driven by query volume. If that is the case, and the environment just naively distributes the queries based on the price bid, a race to the bottom is the expected outcome. It can be addressed in many ways, but we looked at The Graph protocol, at what its features and assumptions are. One assumption is that all the indexers should have freedom with their pricing; there should be no limitations on that. If they want to serve queries that are super expensive or super cheap, that's okay. And all the indexers should be able to make a profit. So the conclusion was that it's actually the gateway, the thing that controls the market, that should implement some kind of anti-dominance policy. And as it happens, it already does. So in our next step, we wrapped the existing ISA; the ISA, the indexer selection algorithm, is one of the components of the gateway.
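The race-to-the-bottom dynamic is easy to reproduce in a toy form: with a winner-take-all distributor, every agent's best move is to shade its bid just under the current winner's, and prices collapse toward zero. This sketch hard-codes that undercutting behavior rather than learning it, purely to illustrate the equilibrium; all names and parameters are assumptions.

```python
import random

def race_to_the_bottom(n_agents=4, steps=2000, undercut=0.99, seed=0):
    """Toy winner-take-all market: each step, every losing agent
    re-bids just below the current cheapest bid."""
    rng = random.Random(seed)
    prices = [rng.uniform(0.5, 1.0) for _ in range(n_agents)]
    for _ in range(steps):
        winner = min(range(n_agents), key=lambda i: prices[i])
        for i in range(n_agents):
            if i != winner:
                prices[i] = prices[winner] * undercut

    return prices

final_prices = race_to_the_bottom()
# All bids have collapsed toward zero: nobody earns anything.
```

This is exactly why changing the agents alone cannot fix the outcome; the allocation mechanism itself has to stop rewarding pure undercutting.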
So we just wrapped it in here, and using the simulation, we are once again testing the same conditions: many Gaussian bandits competing with each other, and the property is, okay, can we discover each other's prices and maximize our profits? The outcome is totally different. By changing the way the queries are distributed, we actually reach a consensus, so that all the agents are discovering the market. At the same time, the gateway, the ISA, is looking at their quality of service and their prices, and trying to kind of feed everyone, not just saying, okay, you're the winner, you're snatching all the queries. So this is really nice. Please note that the gateway also keeps some kind of budget, but it's designed in such a way that it doesn't let the agents consume the whole budget; there is a portion that is left to the customer, let's say. Okay, so we ran some more simulations: can we run that with 10, 20 agents, and will that still work? It seems, yes, it does. We are testing the same properties, just changing the market conditions a little. This looks really good. If you look at the queries served and the revenue, all the agents are making money, which is the desired behavior, and they're rediscovering the market. Cool. We also wanted to see what happens with different initial conditions. So we assume different indexers say, okay, that's my mean price, that's my variance, maybe I want to start a bit more expensive or a bit cheaper. We wanted to see whether the system would still converge to that equilibrium, and it seems that this is really happening.
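The effect of an anti-dominance allocation can be illustrated with a softmax-style split: cheaper indexers still get more volume, but nobody is starved, so undercutting to zero no longer pays off the way winner-take-all did. This is a sketch of the spirit of the mechanism, not the real ISA; the `temperature` knob and function name are illustrative assumptions.

```python
import math

def anti_dominance_split(total_volume, bids, temperature=0.5):
    """Softmax over negative prices: lower bids earn larger shares,
    but every indexer keeps a nonzero slice of the volume."""
    weights = [math.exp(-price / temperature) for price in bids]
    total_weight = sum(weights)
    return [total_volume * w / total_weight for w in weights]

# Cheaper bids get more queries, but even the priciest indexer serves some.
shares = anti_dominance_split(1000.0, [0.5, 0.6, 1.0])
```

Under this kind of split, lowering your bid buys only a marginally larger share, so the agents can settle near the market price instead of racing to zero.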
So, disregarding the initial conditions, the system converges. This is good. We also ran some additional experiments comparing policy update rules. This one is our modified PPO agent. As you can see, it's much faster: the red agent is vanilla policy gradients, the elementary update rule, and the other one is just pure PPO. Our agent is kind of dominating, and it's interesting to observe that it's serving fewer queries (the blue, or cyan, line is clearly below, and this is the number of queries served) but making bigger revenue. In short, it's doing less work for more money. So the policy really matters. All right, having those results, we were quite happy with them. We said, okay, can we deploy that in production? And here is our solution on the battlefield. Once again, there are many things to unfold here, so let me focus on one thing at a time. First, this is the mean, so this is where the Gaussian sits on the x-axis. What happened is that the initial value was too high, so the mean went down first, but then it steadily climbed, moving to the right. At the same time, the variance was steadily going down, which means the Gaussian got narrower and narrower, which means the agent was saying, okay, I think I'm there, I'm good; I'm more confident that I should sample from this smaller distribution. The outcome is that the reward goes up. Maybe you can't see it super clearly here, but that's the revenue, the total revenue, and you can clearly see the tendency: it's going up. So we deployed that, and the results were great. So I think this is the moment where I should start telling you: if you're an indexer, please go ahead, there is a repository to download, play with it.
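For reference, the kind of update rule behind the PPO agents can be sketched via the standard clipped surrogate objective for a one-dimensional Gaussian policy; this is textbook PPO clipping, not Semiotic's modified variant, and the names are illustrative.

```python
import math

def gauss_log_pdf(x, mean, stddev):
    """Log density of N(mean, stddev) at x."""
    z = (x - mean) / stddev
    return -0.5 * z * z - math.log(stddev) - 0.5 * math.log(2.0 * math.pi)

def ppo_clip_objective(action, advantage, mean, stddev,
                       old_mean, old_stddev, eps=0.2):
    """Standard PPO clipped surrogate for one sampled action:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A), where ratio is
    the new-to-old policy density ratio at that action."""
    ratio = math.exp(gauss_log_pdf(action, mean, stddev)
                     - gauss_log_pdf(action, old_mean, old_stddev))
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped * advantage)
```

The clipping is what makes PPO-style updates more stable than vanilla policy gradients: no single batch can move the pricing policy arbitrarily far from the old one.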
And there are some additional materials, but those in a second. So let me first summarize. Agent-based modeling for crypto-economics was the core of this talk. Here we focused on dynamic pricing applied to The Graph protocol, on automated price discovery, and we used reinforcement learning and multi-agent modeling for revenue maximization. We showed how to use that for testing the properties of the protocol: we came up with a framework where we say, these are the market conditions, these are the properties, and we systematically test our agents, our solutions, against them. And finally, we have deployed AutoAgora in production and have shown that it actually makes sense. For future work: better policy update rules; agents with multiple rewards, taking quality of service into account, which we are not doing right now, since we are just looking at query volume; and modeling and putting consumer agents into play, that right-hand side that we simplified away, putting it back into the simulation. That will give us more insight into what is happening, a simulator with higher fidelity. And perhaps redesigning the game: in this case, there was no shared information, and the agents were acting totally independently. The question is, if we redesigned the game, redesigned The Graph protocol a little bit, would that kind of perfect information help? Here are some additional resources. My friend Alex wrote this blog post, which describes more of the technical side of deploying AutoAgora in production. He also gave a talk about this during the last episode, in June, I think, so there's a recording on YouTube. And of course, this is open source.
So I encourage you: go ahead, play with it. All those visualizations are there; you can basically reproduce this and try to deploy your own AutoAgora in production. Finally, I wanted to highlight some of the other work that Semiotic is doing. There's a talk happening right now that our friend Matt, the product and research lead, is giving, so that's sad. And Savvy is giving a talk today about our work, let's say our journey into the SNARKs world. And, this is more on the cryptography side, we are also focusing on verifiable payments for The Graph protocol. So if you're interested in those topics, please come talk to us; we've got plenty of positions open. Thank you for your attention. Excuse me, there's a gentleman in the back with a question. "Hello, and thank you for the talk. I was wondering: if the customer has a very, very low budget, will you try to match it and make a loss, or will you shut down?" That's a great question. Actually, I wasn't showing this here. Like I said, the goal of The Graph is to have five nines. So right now we are working with the people implementing the gateway, updating it a little bit, because for now, if the customer says, okay, I want this basically for free, the gateway will still try to distribute it. That's not the best outcome for the indexer, I would say, but we are changing that. And we've got the tooling now: in this work we focused on the agents, but we developed a tool that enables us to test the properties and the outcome of the whole system when there are different players in the game, including the gateway. So, great question. It's tricky.
It's a tricky thing: what do we want to achieve? Do we want the indexer to make money, or do we want five nines, the quality of service, to be super high? "I have another question, sorry, maybe more technical. When you showed the experiments, whenever the budget of the client went down, the spread of the Gaussian widened, but the mean stayed the same, which means that you are still sampling on the right side, where you know it's not the right price. Would it be more effective to move the Gaussian to the left as you spread it?" Yeah, so what's the question, actually? Because you just described the behavior, and that's the designed behavior. If we overshoot with the price, we don't know what the reason was: there were no queries, or we were too expensive. So we spread, because we want to sample from a wider distribution. All right, if there are any additional questions, I'm here and would be delighted to answer them, but it seems that my time is up. So thank you for your attention.