Linux box — my laptop broke two days ago. All right, good. So I'm going to try to focus on Python and TensorFlow here. But really, the context of what we're doing — I sort of have to spend a few minutes on that just so it makes a little more sense and might be more interesting. My company, Prediction Machines, is doing deep learning and predictions in transactional markets. That includes financial markets, energy markets, digital advertising. There are a lot of transactional markets where people are exchanging something of value on an exchange: there are buyers and sellers, sometimes they intersect, and there's an execution. So that's the kind of stuff we're working with. Let me move relatively quickly. Thank you to the Python group and thank you to the hosts. Traditionally in financial markets, people are doing statistical work that involves a lot of micromanagement of signals and features. It's a lot of work, labor intensive. And these machine learning systems are very played out now, so markets are generally efficient to them — there's not a lot of profit being made. But what we see is that certain traders can continually make good profits in markets, and the strategies they're employing are a lot more complex than the traditional machine learning strategies. That complexity means that so far people haven't really been automating it, because there's some magical intuition that certain traders have about the markets. They can look at the data — somehow they collect that data — and it's almost like a black box. They're like, OK, I know, I need to buy and sell. So what we're doing with deep learning is learning these complex strategies, which is one level above what is traditionally done, which is just statistical prediction. So we took inspiration from what DeepMind did — the company that was purchased by Google.
What they did, back in 2015, is build a neural network with a few additional features that really made it effective. The interesting thing is they taught it to learn how to play Atari — and they didn't actually tell it the rules of the game. All it was doing was looking at the screen, visually, and moving the paddle randomly. And at some point it just figured out how to play the game, how to score high. Most of the results were much better than what a human could do. So we took this inspiration and decided maybe we should model the markets like a game, use some of these results from DeepMind, and go beyond that. So we implemented deep learning methods, these neural networks, and we play the transactional-market space as a game. So again, here's the comparison between traditional ML and deep learning — deep learning is on the left. Basically, a kid doesn't really know the physics of the bicycle. They just get on it. They try something. If they do well, they get a reward. If they fail, they try again and do something differently. Whereas on the right, you have to model the bicycle physics. And it's a bunch of math, and it takes a lot of people and a lot of cost. And you can't take that model and apply it to a bigger bicycle. But the kid can get on a bigger bicycle and ride it just as well. So in reinforcement learning, there's always a feedback loop, where you have an environment that has a state. That state could be, for example, the price of a stock, or the volatility of some index, or something. And you have an agent that's acting on that environment. What happens is the agent observes the environment, makes a decision about what action to take, that action influences the environment, and the environment is then observed again by the agent. So you've got this feedback loop. And the reinforcement learning part is the reward.
So when you play a game in this environment, you get a reward or a penalty, and that's the feedback you get. There are examples online — this is a JavaScript example; you can look up Karpathy's GridWorld. GridWorld shows you that in reinforcement learning there are three key things. There's a value function, which tells you how good the state you're in is. There's a policy, which tells you what action to take given the state. And then there's a reward, which is given by the environment. I'm moving very fast because I want to show you code. So we created a game where we discretize — in this example, it's a pairs trade. It's a spread between two different assets that are correlated in some fashion. What you expect is that as one asset moves, the other one moves in a predictable manner, but it's not always going to stay the same. So you get a mean-reverting instrument. And this doesn't just apply to finance; it applies to many different transactional markets. We discretized it into a lattice, and this lattice is a state machine that moves based on time — which is the market moving — but also, as you buy or sell, you're affecting the state of the machine. That's the feedback part. I just want to get into code as quickly as I can. On our blog we have a demonstration of this game done in JavaScript, so you can play it — you can choose when to buy or sell. That gives you a good feel for how we're doing it. So once you formulate the game, you have to build a brain that's going to actually learn how to play it. This example code here is available — we've open-sourced it. It's called Trading Gym. Trading Gym is modeled after the OpenAI Gym, which is an Elon Musk endeavor. So this is an open-source project where we've implemented the trading game and the data generators, so we can simulate data or plug in actual market data.
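To make that feedback loop concrete, here is a minimal sketch of the agent/environment cycle in plain Python — the toy environment, the agent's hand-coded policy, and the reward rule are all invented for illustration, not taken from Trading Gym:

```python
import random

class ToyEnv:
    """A 1-D mean-reverting 'price' that drifts back toward zero."""
    def __init__(self):
        self.state = 0.0

    def step(self, action):
        # action: +1 buy, -1 sell, 0 hold.
        # Reward selling when the price is high, buying when it is low.
        reward = -action * self.state
        self.state = 0.9 * self.state + random.uniform(-1, 1)
        return self.state, reward

class ToyAgent:
    def act(self, state):
        # A hand-coded "policy": sell high, buy low, otherwise do nothing.
        if state > 0.5:
            return -1
        if state < -0.5:
            return +1
        return 0

env, agent = ToyEnv(), ToyAgent()
state, total_reward = env.state, 0.0
for _ in range(1000):
    action = agent.act(state)          # agent observes the state, picks an action
    state, reward = env.step(action)   # the action influences the environment
    total_reward += reward             # the reward closes the feedback loop
```

In the real system, the hand-coded policy is replaced by a learned one; the loop shape stays exactly the same.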
And in this example, we have a Keras implementation, which will learn how to play that trading game. This is all open-sourced. Keras is higher-level than TensorFlow. The example code here is ten lines, and it's the entire network that you see on the left there — so it's quite simple to use Keras. My demo is doing TensorFlow, and I'll tell you why I'm showing TensorFlow over Keras. In TensorFlow, the demo I'm giving is this network, which is more complex than that previous one. The reason is that DeepMind has made some improvements since 2015, and I'm implementing some of the current state of the art. And the cool thing is the code I'm demoing is also available on our GitHub, so you can also play with the more complex stuff that I'm going to show. This is the architecture that we have up on GitHub. The trading gym in that box is open-sourced, and it has two main modules: a data generator and an environment. This is plug-and-play — you can use any environment with any data generator. And it's quite easy to write your own data generator and environment, which is why I want people to pull the code, write some of that themselves, and contribute. We have example data generators that are random and deterministic. You can use the CSV version, which lets you plug in whatever data you have. Or — this part we have not open-sourced — we can plug in actual market data. The environment has different transactional models: single-asset, multi-asset, market-making models, et cetera. And the API is very simple. The data generator just has a next, which gives you the next value. The environment has a step, which, given an action, gives you a reward and the next state. So there are basically two API functions you have to implement — very basic and easy to use. All the other stuff I have there is in the trading brain, and here we have a sort of recommended architecture.
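That two-function API — next on the generator, step on the environment — can be sketched like this. The class names and the toy sine-wave "market" are my own, not the actual Trading Gym code:

```python
import math

class SineGenerator:
    """Deterministic data generator: the whole API is just next()."""
    def __init__(self):
        self.t = 0

    def next(self):
        self.t += 1
        return math.sin(self.t / 10.0)

class SpreadEnv:
    """Toy environment: step(action) returns (reward, next_state)."""
    BUY, SELL, HOLD = 0, 1, 2

    def __init__(self, generator):
        self.generator = generator
        self.position = 0          # +1 long, -1 short, 0 flat
        self.entry = 0.0
        self.price = generator.next()

    def step(self, action):
        reward = 0.0
        if action == self.BUY and self.position <= 0:
            self.position, self.entry = 1, self.price
        elif action == self.SELL and self.position >= 0:
            if self.position == 1:
                reward = self.price - self.entry   # realize P&L on the close
            self.position, self.entry = -1, self.price
        self.price = self.generator.next()          # the market moves regardless
        return reward, self.price

env = SpreadEnv(SineGenerator())
env.step(SpreadEnv.BUY)                   # buy near the bottom of the sine wave
for _ in range(13):
    env.step(SpreadEnv.HOLD)              # wait while the price climbs
reward, state = env.step(SpreadEnv.SELL)  # sell near the top: positive reward
```

Because the generator and environment only touch each other through those two calls, you can swap either side out independently — which is the plug-and-play point made above.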
So you have an agent that is acting on and observing the environment. That agent has a brain — and it doesn't even have to be a deep network; it could be any learning algorithm. And usually there's a memory also. So let me bounce very quickly through the other slides, and then I'll be ready to show you the code. The reason I made these is that they'll be available tomorrow on our blog, so you can download the PDFs. And you should read the papers, because without that I can show you the code and explain what the code's doing, but you won't understand why. You have to read the papers to understand why. More details about what's in those papers — I'll skip that. And here's just a high-level comparison. Keras sits on top of TensorFlow. TensorFlow is low-level, and Keras is a much higher level. The main reason I like TensorFlow is that when I read the papers, they have all the math in there and the derivations. If you don't understand that very well and you're just using Keras, then you'll get stuck on more complicated problems. With TensorFlow, you can directly program that math, almost line for line — it's much closer to what you're reading in the papers, whereas Keras sort of hides everything from you. I run everything on Linux. CentOS was a little harder to install on — I found one good blog post, for one version previous of CentOS, on how to install. Ubuntu is quite easy because Google supports that directly. One thing is that I did build from source, because if you don't, you don't get all of the compiler optimizations for your hardware. So I think it's important to get the source code for all of the libraries and packages and build it yourself. But you don't have to — it's just going to run a lot slower if you're running from the binaries. So — people don't really know what TensorFlow is. If you took computer science, you know graph theory.
And basically, TensorFlow is just a graph implementation. There's nothing specific about it that makes it deep learning. The reason it works well for deep learning, though, is that neural networks are represented mathematically as a graph: you've got neurons — nodes — arranged in layers and connected to each other. So you've got nodes and edges; it looks like this. TensorFlow will automatically evaluate the computational graph, which means you can throw data at the inputs and request any of the output nodes, and TensorFlow figures out the optimal way to compute that for you. So generally with TensorFlow, you create your graph first, and then later you can run it by giving it inputs and requesting outputs, and you don't have to worry about what's going on inside. The more complicated thing is the gradients that TensorFlow is computing — there's a lot of optimization necessary to do that well, and TensorFlow handles all of it. It's all nice, magical, and internal. So the takeaway is that TensorFlow is a computational graph solver, and it's not something specific to deep learning. I'll leave it on this slide because I'm going to bounce to the code now, but this slide shows some of the core functions that TensorFlow uses, and I'll show you the code where I'm using them. In TensorFlow you often use namespaces, because then you can render the graphs and understand what's happening very well with TensorBoard, which is a visualization tool that they provide. You can create multiple sessions if you want — for example, if you're parallelizing the algorithm, you can have multiple sessions that interact with each other and have their own graphs. Or you can use just one session, which is what I'm doing in the demo. Then you have to define the graph. To do that, you have variables, which are either placeholders or actual variables.
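As a rough illustration of the "graph solver" idea — build the graph first, run it later — here is a toy deferred-execution graph in plain Python. No TensorFlow is involved and all the names are invented; it only shows the shape of the build-then-run pattern:

```python
class Node:
    """A graph node: an operation plus the nodes feeding into it."""
    def __init__(self, fn=None, inputs=()):
        self.fn, self.inputs = fn, inputs

def placeholder():
    return Node()                      # a slot to be fed at run time

def add(a, b):
    return Node(lambda x, y: x + y, (a, b))

def mul(a, b):
    return Node(lambda x, y: x * y, (a, b))

def run(node, feed):
    """Evaluate only what is needed to produce `node`, recursively."""
    if node in feed:
        return feed[node]
    args = [run(i, feed) for i in node.inputs]
    return node.fn(*args)

# Build the graph once...
x, y = placeholder(), placeholder()
out = add(mul(x, y), y)                # out = x*y + y, nothing computed yet

# ...then run it with different inputs, like session.run with a feed_dict.
result = run(out, {x: 3.0, y: 2.0})    # 3*2 + 2 = 8.0
```

TensorFlow does the same thing at scale, plus automatic differentiation and hardware placement — but the separation between graph construction and graph execution is exactly this.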
So placeholders are sort of like defining a slot in memory that is not occupied at the moment. At some point, to be able to make a computation, you have to put something into that placeholder. And then you have variables, which are occupied. Often these are internal. So you might have an input node that's a placeholder, then variables internally, and then an output node. The internal variables get computed automatically, or you can request their values if you want, whereas the placeholders you need to fill externally. Then, to actually run the computation, you have two options. There's run, which is on the session, and then on any given variable you can call .eval() to evaluate it. In either case, whatever you're trying to get as an output, you have to give it the necessary inputs. If one node in your graph requires two inputs, then to call eval on that node you have to give it both of those inputs. The same with run. Run allows you to evaluate multiple variables in one call, whereas eval does one at a time. So my top example is just an eval on the output, which is q_actual, with the input s_t. And in the run, I'm calculating two values, q and loss, and I'm giving it two inputs. So let me — all right. This part is going to be interesting; I'm going to try this in front of people. I have my code in front of me, and I also have something that's helpful, which is TensorBoard. You see TensorBoard is running locally, and it's reading a log folder that I generated from the previous run. If we look at the Graphs tab, we can start out by just looking at the structure of my code, and then I can show you the actual code. We have six different namespaces here: step, prediction, target, optimizer, pred-to-target, and summary.
Step is simply a counter that tells you which step we're currently on. It's needed for TensorBoard, because when you draw a chart you have an x-axis, which is time or the number of ticks, and in my case that's defined by step. So it's not actually part of the learning system — you see it's not even connected to anything else. We have a prediction network, which is a deep Q network. In my case, it's the picture on the back of my shirt — that's what it looks like. It's got input nodes, hidden layers, output nodes, and it's a little more complex than that. Then there's a target network, which is actually a copy of the prediction network. The reason I did this is that back in 2015, this was one of the three things DeepMind figured out that actually help get good results. The target network is a clone of the prediction network, and only once every x number of steps do we copy the prediction network into the target network. And we actually use the target network to compute the value of the next state and the reward. The reason you do that is that the prediction network changes on every update, whereas the target network is a little more stable — and, I just said the word, that makes learning more stable. This is what actually does the copying: it's just an operator that takes all of the weights and biases from the prediction network and copies them into the target network. It's very easy because they're the same architecture, so you don't have to do any weird magic in there. Then there's an optimizer, which does the actual learning. In machine learning, the most common method is stochastic gradient descent. I'm using something different, called RMSProp. And nobody knows what "prop" means — it's a mystery. But RMS means root mean square. It's similar to stochastic gradient descent, but it's not the same. I'm not going to double-click on it yet, because if I do, you'll freak out. So let me show you some code.
So this is my brain, which is where all of my source is. From the architecture I showed you before, here we're just looking at this part, the brain. There's also an agent in this code, there's a memory, there's a runner — everything else is there — but I'm just going to show you the brain, and we'll see how my time goes. The brain has these variable scopes, like I said, and these match the six things we saw in TensorBoard. You start with a `with`, and it's nice because as soon as that block terminates, you're outside of it and you don't have to keep track of which context you're in. So I'm naming a scope "prediction" and creating my prediction network here, and then I have the output of that prediction network — I'll show you that in a moment. If we go down, you see I have a target network, and it's actually the same network as the prediction network; I've just named it differently. So it's the same architecture. Then further down, we see the prediction-to-target section. So let me explain this create-net function, and then I'll go back up. Here I'm defining the architecture of the network — and remember, this is inside the namespace that I've defined in the parent function. First we create a placeholder, and this placeholder is my input into the network. That input has to come from outside the network, because it's coming from my environment — that's why it's a placeholder and not just a variable. In TensorFlow, you define the shape of the variables; most often it's going to be a matrix. And can anyone guess what a more-than-two-dimensional matrix is called? Yeah — tensor, yeah. That's why they call it TensorFlow. So this is a three-dimensional matrix. What's interesting is that the first entry is of size None, and the second and third are integers. So what's cool is — let me go to TensorBoard.
Let's look inside prediction, and we see my input is of size question-mark by one by two. The None actually means it's a variable size, and that's necessary because we do batch learning, which means you could have it calculate for one observation, or you could throw in a batch of size 32 or 64 and learn much faster that way — and much better, because we use experience replay, which means we take 64 experiences, in this case, from the past that we've observed, plug this whole matrix into the system at once, and get a batch learning output from that. So let me go back to the code. Right, so that's what the None means. Then we give it a name, and the next thing I do is flatten it, because what you want going into your hidden layer is a one-dimensional vector. So I flatten it, and then I have my first hidden layer, L1, layer one. In TensorFlow you define it by giving it the input, which is the previous layer; the number of nodes that you want; and an activation function. Let me scroll down and show you that actual function. Yeah, so this is the layer that I defined. You see I have an input, I have an output dimension — which is the number of nodes inside that layer — and an activation function. In this case I have a nested variable scope. The parent function has a variable scope called prediction, and here I've got a variable scope which is the name of this layer, L1. I define two variables inside there, weights and biases, and then I connect them all together using this mathematical function, which is a matrix multiply combined with an addition of the biases. So this is what you call a fully connected layer — quite standard. And the reason I'm storing everything the way I am: you don't have to store it yourself. You can just create the computational graph and not actually save the variables; it remembers them when you create them.
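Stripped of TensorFlow, a fully connected layer is just the matmul-plus-biases described here. This numpy sketch — my own names, with sizes assumed to match the talk's ?×1×2 input and 24-node hidden layer — shows the flatten step and the layer:

```python
import numpy as np

def fully_connected(x, out_dim, rng, activation=np.tanh):
    """One dense layer: activation(x @ W + b) — a matmul plus biases."""
    in_dim = x.shape[1]
    W = rng.normal(0.0, 0.1, size=(in_dim, out_dim))   # the "weights" variable
    b = np.zeros(out_dim)                              # the "biases" variable
    return activation(x @ W + b)

rng = np.random.default_rng(0)

# The None in the placeholder's shape is the batch dimension: any size works.
batch = rng.normal(size=(32, 1, 2))        # 32 observations, each of shape (1, 2)
flat = batch.reshape(batch.shape[0], -1)   # flatten to (32, 2) before the hidden layer
h1 = fully_connected(flat, 24, rng)        # hidden layer L1 with 24 nodes
```

Stacking more such calls, each taking the previous layer's output as input, gives the whole architecture — which is the pattern the talk describes.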
I do that because later, in TensorBoard, I can observe all of those variables. One thing about deep learning is that it can be very black-box. What I've done here is try to expose all of the internals through TensorBoard, but I've also got quite a bit of code that exposes more of it outside of TensorFlow — I manually produce certain values and plot them separately. So I'll go back up again. Right, so we have this layer L1, and we have an option now: either it's a dueling network or it's not. The dueling network is more complicated — it's got four different layers, whereas the non-dueling one is just one more hidden layer and an output. The dueling network is a little harder to explain, so see those papers I posted. But the pattern is the same: you define a new layer, give it an input layer, an output size, and an activation function. That's the pattern I'm following. Once you learn a few key functions in TensorFlow, you can build whatever architecture you want quite easily. So I'll continue back up here again, assuming we understand how we built this architecture. The next thing is this copy function. In TensorFlow you can define operators on variables. They actually named the operators rather than overloading — I guess in Python operator overloading is a little harder to do. You can do it, but maybe they didn't want to, because non-C++ programmers might be a little thrown off by operator overloading. I'm not sure exactly why, but where this could have been an equals sign, they made it .assign(). So what we're doing is assigning to this target network this variable, which I defined in the previous line as a placeholder that has the same shape as the target. And later on I'll actually execute this to copy the network. Because it's based on a placeholder, it just sits there waiting to be executed — it's in the graph, but it's not run until I actually call it.
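The deferred copy amounts to this: the prediction weights change every step, and the target weights only catch up every so often. A plain-Python sketch — the copy interval, weight names, and shapes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights for two identically-shaped networks (names are illustrative).
prediction = {"L1/w": rng.normal(size=(2, 24)), "L1/b": np.zeros(24)}
target = {k: v.copy() for k, v in prediction.items()}

COPY_EVERY = 100  # assumed interval; the talk just says "every x steps"

for step in range(1, 301):
    # Training mutates the prediction network on every single step...
    prediction["L1/w"] += 0.001 * rng.normal(size=(2, 24))
    # ...but the target network only catches up occasionally, which keeps
    # the learning targets stable between copies.
    if step % COPY_EVERY == 0:
        for k in prediction:
            target[k] = prediction[k].copy()

in_sync = all(np.array_equal(prediction[k], target[k]) for k in prediction)
```

Because both dicts have the same keys and shapes, the copy is a dumb one-to-one assignment — the "no weird magic" point made above.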
And as I said, I only call it every so often, because I want the target network to be more stable than the prediction network. Now, to read this optimizer code, I recommend going from bottom to top. That might apply to quite a bit of what you do with TensorFlow: either you think in terms of what the inputs are, or you think in terms of what your target is — what you want out of it. So start at the bottom, which you could also do with the architecture: I want the result to be this; how do I get that? The same with the optimization. So I'm going to start with the target, which is: I want this optimizer to train my network. How do I do that? The optimizer requires a loss function. What is my loss function? Let's go up one line. Well, a little aside: this line of code here just decays the learning rate exponentially. That's something you want to do when you're optimizing, most of the time, because as it's learning you want it to settle in on the zone that's good. If you don't decay the learning rate and it stays steady, then as it gets better it's going to continue to wildly explore, and that's not going to give you good results. So this is just an exponentially decaying learning rate, which is an optional input into this optimizer. So we can go ahead, and here I have my loss function, which is what I feed the optimizer. The loss function — the concept is pretty simple. Your network is going to produce a certain output, and then it's going to observe that in the world, the real output was actually different — not what it expected. Basically you take the difference between those two things and, usually, square it and take the mean. That's called mean squared error, and that's what the loss would be.
So I largely did that, except one additional improvement is to use the Huber function, which is similar to MSE but clipped — it's sort of like a clipped mean squared error. So that's what my loss is, and as I said, it's on the error. My error I define here as the difference between the target — what did I want — and what I actually calculated, which was not correct. I compute that here, and the way I do that is: my target is a placeholder, because it's coming from outside, and my actual Q is computed previously and stored locally here. So to be able to call this optimization function, you can tell that what I have to do is give it the target Q, because that's required to compute this, and I also have to have my current Q pre-computed. Those are the two things you need to be able to learn with this network. I can go back to TensorBoard and we can look at this optimizer, and you can see various things here. It's using RMSProp; it has a magic box of gradients that it figures out for you, very complex — the same with this whole graph, which it computes on its own. And I have an exponential-decay variable that I'm applying, and you see that the prediction network is what feeds into it. You also have the action, which is used for calculating the actual Q value. One thing I have to mention: what is Q? Q is this magic thing which tells you how good your policy is. Given a state that you're in, what is the value of the various decisions you could potentially make? It's a big matrix over all of the states and all of the potential actions, and you fill out that matrix with numbers that represent how good or bad each choice is. So that's a Q matrix, and that's why we call this a DQN — a deep Q network. So let me go back to the code. So that's really building the network, and I'm done building it. The next thing is actually to run with it.
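The "clipped mean squared error" here is the Huber loss. A numpy sketch of it, together with the exponential learning-rate decay mentioned above — the delta and the decay constants are assumed values, not taken from the talk's code:

```python
import numpy as np

def huber(error, delta=1.0):
    """Quadratic for small errors, linear for large ones ("clipped" MSE).

    The linear tail keeps one huge error from producing a huge gradient,
    which is why DQN-style training prefers it over plain MSE.
    """
    abs_err = np.abs(error)
    quadratic = 0.5 * error ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return np.where(abs_err <= delta, quadratic, linear)

def decayed_lr(base_lr, step, decay_rate=0.96, decay_steps=1000):
    """Exponentially decaying learning rate: base * rate^(step/steps)."""
    return base_lr * decay_rate ** (step / decay_steps)

errors = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])  # target Q minus computed Q
loss = huber(errors).mean()
```

Note how the loss for an error of 3.0 grows only linearly (2.5) instead of quadratically (4.5) — that is the clipping.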
So the brain has two main functions for running. It has predict, right here, where we give it a state and it gives you an action. That action is going to be based on its Q values. You see here that my action is equal to this q_action.eval(). So what is q_action? We can go back up to the top real quick — I hope you don't get dizzy. My q_action is defined as the argmax of my Q. The network outputs three values in my case: one is "should I buy," one is "should I sell," and one is "should I do nothing." The action we choose is the maximum of those three values. So that's what I'm doing here. So I'll go back down. That's all I do — it's one line: action equals q_action.eval(). And that's the whole forward pass. In the network, you give it an input — in my case the state — and you get an output, which is the action. Then we have a train function, which is the backward pass. That's where we do our learning. For train, as I said, we first have to calculate the Q value, and then, using that and also the target, we can actually do the training. So here I'm running multiple things together: I'm running the Q function; I'm running the loss function, just because I want to display it, not because I have to; and I'm running the optimization, which is the core thing. I can run all of those in one call, and these are the four inputs necessary to do that. In API design, you want the minimum things needed to run something to be what you pass in — you don't want to mess up the API by making it too complex. In my case, all I need is the state, the target, the action, and the step — the step because I'm exponentially decaying my learning rate as I progress, so I need to know what stage of the game I'm in. That's all you need. I'm outputting the results to TensorBoard using a writer that I create.
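The forward pass really is a single argmax over the three outputs. A sketch with made-up Q values — the action names and numbers are illustrative only:

```python
import numpy as np

ACTIONS = ["buy", "sell", "hold"]

def predict(q_values):
    """Forward pass: the action is simply the argmax over the Q outputs."""
    return int(np.argmax(q_values))

# A made-up network output for one state: the Q value of each action.
q = np.array([0.2, 1.3, -0.4])
action = predict(q)            # index 1, i.e. "sell", has the highest Q
```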
So TensorBoard has a writer API that's very rich — you can define whatever you want to show in TensorBoard. The other core thing, which is part of reinforcement learning, is this: the famous Bellman equation. It's in the papers. This is some of the math, and one difference again between TensorFlow and Keras is that in TensorFlow, I am actually writing the math out, and each line of it closely matches what's in the papers. Whereas Keras hides all of that from you and you won't actually do it yourself — which is good if you're not a researcher and you only want results, but if you're trying to actually develop new stuff, then I think TensorFlow makes it easier. So what the Bellman equation does is a very simple concept. We take an action now — in a video game, let's say I'm playing Pong and I'm moving the paddle left or right. The ball right now is maybe one second away from actually hitting the paddle. At the moment I move the paddle, it doesn't change the score, but if I don't move the paddle now, then in one second, when the ball arrives, maybe the paddle is too far away and I lose the game. So generally in games, when you take an action, you don't necessarily get the reward immediately — you get it later. So what the Bellman equation says is that the immediate reward is added to a discounted future reward. The idea there is: maybe you want a cake. It's probably better to eat the cake immediately if it's available; a cake five days from now is worth a little less than a cake now, because it doesn't taste as good. So the reward decays over time — that's what this discount factor is. It's called the Bellman equation. And it's relatively simple here if you're not doing a double Q network. If you are doing a double Q network, it's a little more complex: it involves choosing an action with one of the networks and evaluating the value of that action with the other network.
So you're actually combining the two networks' outputs to produce the final result, whereas with a non-double Q network you use the same network to choose the action as you do to evaluate the value of that action. And it turns out that double Q gets better results, so that's how I've implemented it here. If we look at the agent, the agent also has a predict function, and most of the time it just calls the brain's predict function. Additionally, most reinforcement learning agents have an exploratory period at the beginning, where they make random choices. The reason you do that is that at the beginning you don't know anything — it's like that kid on the bicycle. They don't know what they're doing. So you start out by making random choices and you learn from that. As time goes on — in my case it's an exponential decay — you make fewer and fewer random choices and you start to actually listen to your brain. So that's what I'm doing here in the agent's predict function. Additionally, the agent has this observe function. After we take an action, the environment produces a new state from that action, and also a reward. That's what's fed into observe, because the agent is observing the results of what it has done. In doing that, we have to remember what we did, because later on we're going to learn from our memories. And there are two things we do, and neither of them necessarily happens every single iteration — you can define how frequently to do them. The first thing, which in my case I do every time — you could change it to every other time, or every fifth time — is the actual learning. That's the thing that takes up a lot of time in TensorFlow: that's where you're computing all of the derivatives, and that's the heavy math. So if we look at that function, we see that we're using the brain. Well, first we have to sample the memory.
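The Bellman target, in both the single-network and double-Q forms described here, can be sketched in numpy. The discount factor 0.99 and the sample Q values are assumptions for illustration; the talk doesn't give its actual constants:

```python
import numpy as np

GAMMA = 0.99  # discount factor: future reward is worth a bit less than reward now

def bellman_target(reward, q_next_target, q_next_pred=None, done=False):
    """Target Q value for the action just taken.

    Single Q: r + gamma * max_a Q_target(s', a)
    Double Q: r + gamma * Q_target(s', argmax_a Q_pred(s', a))
    """
    if done:
        return reward                          # no future after a terminal state
    if q_next_pred is None:                    # plain DQN target
        return reward + GAMMA * np.max(q_next_target)
    best = int(np.argmax(q_next_pred))         # prediction net picks the action...
    return reward + GAMMA * q_next_target[best]  # ...target net evaluates it

q_target = np.array([1.0, 3.0, 2.0])   # target network's view of the next state
q_pred = np.array([0.5, 0.1, 0.9])     # prediction network's view (made up)
single = bellman_target(1.0, q_target)          # 1 + 0.99 * 3.0
double = bellman_target(1.0, q_target, q_pred)  # 1 + 0.99 * 2.0 (index 2 chosen)
```

Splitting "choose the action" from "value the action" across the two networks is what damps the over-optimism of plain max, which is why double Q tends to get better results.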
So we're taking a batch size of 64, in my case, and we're sampling 64 past experiences. Then, using the brain and the Bellman equation, we calculate the target Q values — what those values should be according to the Bellman equation. And then we call the train function, which takes the state, the target, and the action — those are the three main ones. That's where we actually do the gradient descent and learn. So then the other one here is very basic: that's just update-target-Q-network. I named it that because that's what it does: it takes the prediction network and copies it to the target network. So — how am I doing for time? I'm almost done. All right, so that's the agent. Now finally we have the main, the runner. The runner has just a simple for loop, which goes for the maximum number of steps I've told it to do. And in that loop, we have these beautiful three functions: predict, which takes the current state of the environment and makes an action on it using the agent; then the environment does a step, where you give it the action that the agent took and it computes the next state and the reward — additionally, it computes whether or not we're finished with the game, which you need to know; and then the agent observes the result, the fruits of its labor, and that's where it learns from its experiences. So those three things are really the core of that loop. Everything else I have is boilerplate for statistics management, as we'll see. Like I said, I tried to make it not a black box, so you actually see what's happening internally. One thing it does is output pickle files which contain rich data sets. It also outputs a CSV file, which you can analyze in MATLAB or any of your other utilities. Additionally, whenever the thing does really well — hey, I've done a lot better than I used to — it goes ahead and saves the model to disk.
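The runner's predict/step/observe loop, plus the replay-buffer sampling, can be sketched like this. The stub Env and Agent exist only to show the loop's shape — they are not the open-sourced classes:

```python
import random
from collections import deque

memory = deque(maxlen=10000)   # experience replay buffer
BATCH_SIZE = 64

def sample_batch():
    """Sample past experiences uniformly at random, as in experience replay."""
    return random.sample(memory, min(BATCH_SIZE, len(memory)))

class Env:
    """Stub environment: counts steps and ends the game after 50 of them."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        self.state += 1
        return self.state, 1.0, self.state >= 50   # next_state, reward, done

class Agent:
    """Stub agent: acts randomly and remembers what happened."""
    def predict(self, state):
        return random.choice([0, 1, 2])

    def observe(self, experience):
        memory.append(experience)   # remember, so we can learn from it later

env, agent = Env(), Agent()
state, done = env.state, False
while not done:
    action = agent.predict(state)                 # 1. agent acts on the state
    next_state, reward, done = env.step(action)   # 2. environment steps
    agent.observe((state, action, reward, next_state, done))  # 3. store/learn
    state = next_state

batch = sample_batch()   # later: compute Bellman targets on this batch and train
```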
So later I can rerun this thing with a -L flag and it's going to load the model that it saved previously and run from there. So that's my runner, the main loop. And I can now show you a couple of demos, a couple of demos real quick, and then I'll sum up. So one demo is going to show the trading gym, which actually has a really nice visualization that I did not implement in the TensorFlow version. So let me run that. That's my Keras example. Let's think about this. Okay, so people think deep learning takes a really long time, and it does if you're doing image recognition or speech recognition, because the number of inputs is going to be massive. In my case, I only have a handful of inputs and my hidden layers are of size 24. There are two of them. It's a pretty basic network. So I can actually train it in real time in front of everybody. The loss is going to decrease, because that's what I'm optimizing on. EPS is the exploration rate, and the reward is how much money we're making. So this is the trained network. It's running on simulated data, and the red arrows are a sell and the green arrows are a buy. It gets penalized if it buys multiple times in a row. It gets penalized if it sells multiple times in a row. So the optimal result would be one buy and one sell, at the bottom and the top. So this is not optimal. That's because I trained it in less than a minute. And you see that it's making a bunch of money, and nobody told it the rules of the game. So that's the cool thing. We didn't tell it how to make money. The environment just gave it feedback saying this is good, this is bad, and it learns. So that's a nice visualization. This is available in our open-source trading gym. The other one is the TensorFlow version that I was walking you through. It's structured exactly the same, but the network is more complex. Before I run it, let me show you what it outputs in addition to TensorBoard. So I just did some matplotlib stuff to plot some of those pickled data sets.
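The penalty rule the demo uses, punishing a buy when already long or a sell when already short, might look something like this toy reward function. This is an illustration under my own assumptions (signed positions and actions, a flat penalty), not the gym's actual code.

```python
def trade_reward(position, action, price_change, penalty=1.0):
    """Toy trading reward.

    position: -1 short, 0 flat, +1 long (held over this step)
    action:   +1 buy, -1 sell
    price_change: price move over the step

    P&L is the position times the price move, minus a penalty for
    buying when already long or selling when already short.
    """
    pnl = position * price_change
    repeated = (action == 1 and position == 1) or (action == -1 and position == -1)
    return pnl - (penalty if repeated else 0.0)
```

Under a rule like this, the highest-scoring policy is exactly what the demo converges toward: one buy at the bottom and one sell at the top, never doubling up.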
So this was the run I did earlier today. You can see that the reward, which is green, climbs up until eight, which is not optimum; 10 is optimum in this case. And you can see that the penalty imposed drops to zero. The penalty, like I said, was if you buy when you've already bought or if you sell when you've already sold. So that's one plot. The length of the game is optimum at four, because the lattice that I drew had five points. So if we buy at the bottom and sell at the top, that's one, two, three, four; the length is four. So it's learned to buy at the bottom and sell at the top. And then these are the Q values. The Q values are very informative, but they only work for small networks. If the network is too big, they're already hard to read. In my case, white means that our current state, our position in the market, is flat. Red means that we're short and green means we're long. So this is when we're at the top of the market. When you're at the top of the market, we expect the best thing to do is to sell. And that's actually what we have here: the dot-dash line is sell. So that's the best thing to do. That's what we learned in the Q function. The worst thing to do is to buy if we're already long, because we're at the top. So that's a bad idea. You see that that's a negative value, and this is the maximum. So this looks good. This is no longer a black box. We actually see that it's learned a strategy. It's learned: I need to sell at the top and I need to buy at the bottom. And then you can also look at the similar chart for when we're at the bottom. So this one is when we're at the bottom. When we're at the bottom, you see that the maximum-value thing to do is to buy if we're already short. That's the solid red line. The worst thing to do is this green line, which is to sell if we are long, which makes sense. So visualizing the Q values is only possible with small networks, but I mean, the code is there.
It'll run and it'll give you thousands of data points if you want. It's only able to be visualized because it's a small network. Otherwise, you have to rely on TensorBoard. So in TensorBoard, you can look at the variables that you've output. I have my summary that I've chosen to output myself. In this case, we can see the average Q value increasing. We can see the reward increasing, and towards the end it settles in and the average reward gets up to 10, which is optimum. That's the most you can get. And you can see that some of these graphs don't look that great. This one is chaotic at the beginning and then it actually learns pretty quickly, or you could say it gets stuck at a local minimum. So what you can do with TensorBoard is do multiple runs and compare them on top of each other. So let me do that. Let's change one of the values. I have a configuration file. This is the Bellman equation discount: how valuable is a reward in the future? I use 0.95; let's change that to 0.99, because a lot of the literature uses that, not because anybody figured out mathematically that that's best, because it's a bit of an art. Let me run this. So I've done, it looks like I mistyped something. Live demos. All right, good. Okay, so the first thing we see is the Q table that's output. Now, we start out with random weights, so who knows what these things are going to be, but it starts out random. If we're short, then randomly it decides to sell all the time. So this is when we're at the bottom of the graph and this is when we're at the top. You can see these Q values are just random stuff. So now we see it's learning. The things to notice are the average reward; generally you want that to go up. You see the exploration rate, which is here. So 25% of the time we're making random decisions. Now we're down to 16% of the time. You see the maximum reward, minimum reward.
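For intuition on the discount being changed here: gamma is the factor in the Bellman equation that controls how much a future reward is worth now, and the gap between 0.95 and 0.99 compounds quickly over many steps. A quick numeric illustration (my own, not from the talk):

```python
def discounted_value(reward, steps, gamma):
    """Present value of a reward arriving `steps` steps in the future."""
    return reward * gamma ** steps

# A reward of 10 arriving 50 steps away is worth ~0.77 at gamma=0.95
# but ~6.05 at gamma=0.99, so the higher discount makes the agent far
# more willing to wait for distant rewards.
```

Which value works better is empirical, which is exactly why comparing the two runs side by side in TensorBoard is useful.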
You would want this minimum reward to become 10 optimally, because then it's doing the best thing it can do every single time. It's up to eight; that's not bad. It's almost done, 90%. So the average reward is moving up pretty well, and it's done, there it is. Okay, so it finished. We can see the Q values. When we're short, it learned that at the bottom it should buy and otherwise it should do nothing. If we're long, it learned that at the top we should sell, otherwise we should do nothing. And if we're flat, it learned at the bottom to buy and at the top to sell. So that's optimum, that's the perfect result. So now we can go back to TensorBoard and see if it updated. Otherwise I'll just kill it and restart it. There it is. So now we can make a comparison of what happened when I changed that discount. You can hover over it and it'll tell you which is which. But I think I'm running out of time. So this is great, because now I can make comparisons and I can turn the art into a bit more of a science by producing all these plots and showing the results. And so that was my demo. Let's see what else. I think that's pretty much it. Let's see what else we've got. So our GitHub is at prediction-machines, and that's because our URL is also prediction-machines. And we have this trading gym, and then we also have the brain. And a lot of the brain was inspired by a GitHub example by [inaudible]. And there's our GitHub account. And we're hiring. So we're looking for Python developers or C++ developers. Afterwards, I've got business cards; I'd like to talk to whoever's interested in coding some Python. You don't actually have to be good at machine learning, because we've got plenty of stuff to do. So I guess any questions now? Yes. [Audience question, partially inaudible] So you want to test your data, because the quality of your data matters, right? Right.
[Audience, partially inaudible] I was working on automated valuation of real estate data. Right. Yeah. Real estate. Real estate, right? Yeah. My problem was, whenever I was running this, it took me eight hours to model one small data set, right? Yeah. And the problem was that every time, I found that elastic net and random forest were producing better results than deep learning. Yeah. And I used Keras. I don't know the internal magic, I don't go deep into the TF functions there, I just used the Keras library. Yeah. We created deeper networks as well. Yeah. And both of them were inferior to elastic net and random forest. Yeah. So does this depend on the choice of neural networks, or? I think a lot of it, well, for us, we wanted to make sure that we were actually modeling our environment as closely as we could to the Atari game environment, because we were taking inspiration from the research that DeepMind did on Atari. So the domain needs to be similar enough to the research that you're applying. And if it's not similar enough, then it might not be the right tool. The second problem is categorical data. Yes. Yes. So often when you get feature data, it's mostly categorical, right? And then you need to convert it into a matrix, or you can use some pandas function to do that. Yeah. And the predictions were way off when I used categorical data. Yeah. We're not using categorical data in our system, so I'm not sure about that one. Generally with categorical data you're doing things like SVMs and regressions, because you're trying to separate the space with a hyperplane. Deep learning is a nonlinear mapping from one space to another, and so in theory it can actually calculate these separation spaces to be nonlinear.
But in practice, it is a bit of an art. So I'm not sure about that one. Because I have this problem. Yeah. What kind of input data do you use? Are you using any live market data, and at what frequency? Because you just learned that it has to buy at the bottom and sell at the top. Yeah. Which is fine, which is great when you're trying to teach something to play a game that has set rules. Yeah. But when you're looking at markets, what kind of data would you use to actually make an actual prediction? Right. So we use market microstructure, which is tick-level data, the tick-by-tick order book, 10 levels deep. We use that. But additionally, we inject more traditional signals that traders look at, such as MACD or stochastic oscillators and things like that. Just technical analysis. Yes. Yeah. And we try to focus more on just simple market microstructure signals, because those are the safest things to use, and in theory all of those other signals are actually generated from that, so it's a superset. In the example that you used, you were saying that you are doing pairs trading, making a ratio and trying to figure that out. Yeah. But the mean reversion of a pairs trade typically takes, what, in real human trading it can take like months to mean-revert, right? Not with a pairs trade, typically. If these two instruments are highly correlated during the day, it's an Ornstein-Uhlenbeck process, is what we're doing, an OU process. That is a mean-reverting process that in physics is modeled as a spring. So maybe I have a rubber band tied to my leg and I'm walking, and if I deviate too far, the force pulling me back gets stronger. That's one of the typical models that people use for pairs trades. [Partially inaudible exchange about OU trades.]
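The Ornstein-Uhlenbeck process mentioned here can be simulated with a simple Euler-Maruyama step. This is a generic sketch with parameters I've chosen for illustration, not the firm's actual model:

```python
import math
import random

def simulate_ou(x0=0.0, mu=0.0, theta=0.5, sigma=0.3, dt=0.01, n=1000, seed=42):
    """Euler-Maruyama simulation of dX = theta * (mu - X) dt + sigma dW.

    theta is the 'spring' strength: the further X strays from the mean
    mu, the stronger the pull back, like the rubber band in the analogy.
    """
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))       # Brownian increment
        x += theta * (mu - x) * dt + sigma * dw  # drift toward mu + noise
        path.append(x)
    return path
```

Started far from the mean (say `x0=5.0`), the path relaxes back toward `mu` and then fluctuates around it, which is the behavior a pairs-trading spread is assumed to have.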
And I'm going to open it up to the traders some time. Yeah. There's a lot of interesting detail to go into. [Partially inaudible question about whether the model can keep up when the spread keeps widening.] It very much depends on what you throw at it. So if the traders are able to observe that it's trending, then what we want to make sure of is that this nonlinear function mapping is able to learn the same thing that they're doing. So can we identify trending versus mean-reverting? And that's something that we are able to do. One of the things I have to ask is, are you putting the book depth in for training? Yeah. The depth being? Well, 10 levels. Oh, yeah. Yeah. All right. The price levels. Yeah. With the volume. You know. Yeah. This demo is a very simple simulator. Yeah. Yes. Have you been able to use this to identify by itself when the market is going into trending or mean-reverting, or do you just test it case by case? Right. So we do turn it off sometimes. We have it running when the market's working the way it normally works. Sometimes, especially if there's news, we will turn it off. We're continuing to advance it, though, to the point where hopefully we don't have to turn it off when that happens and can actually make a lot more money then. There are traders that are pretty good at guessing the direction of something through a news event, so we want to try to model that, and that's a direction we're going. Those traders are very good because they have been looking at 10 years of data, and there's a big difference between those 10 years of experience and what you're training on. Well, we are training on a lot of historical data. So it's not like in real time it's only working off what it's receiving; when we train it, that network is built from millions of data points. Last question.
First, have you ever tried LSTMs or RNNs or other models, or combined LSTMs with reinforcement learning in the trial? Yeah, so we are using RNNs and LSTMs right now for generating simulated market data. It's similar to speech recognition: what happens next is largely dependent on what happened previously. That's also true in markets. So we're using LSTMs and RNNs on the data-generation side. In the trading gym, we have these generators for data, and one of the ones that we're using internally is an RNN/LSTM that's actually generating simulated market data using these networks. Do you think it's going to be a model that you'll be using more in the future? Yeah, well, it turns out that if you have a history, so my history length in my example is one, if you use a history size of like five or ten, it's very similar to what an RNN does. And so we might do it, but we want to keep it as simple as we can for as long as we can. Yeah. One last question, regarding the activation function. Yeah. As I understand it, in a neural network the activation function is very important, and there are several you can choose from. Yeah. So have you tried other activation functions, and how are the results affected by that? So tanh and relu are the two that are used the most, and both of them work just as well here. Yeah. I'm sure in some cases one will work better than the other. But TensorBoard is nice, because you can change it and then see exactly what happens as a result of changing it. So it's quite easy to compare. All right. Let me turn this off and get the next speaker. So thank you, everybody.