Okay. Hello, everyone. My name is Vita. I'm a software engineer and I really like difficult problems, preferably with some mathematics in them. And that's exactly the kind of problem I work on at Quantlane. So at Quantlane, we develop and also operate our own stock trading platform and also the trading strategies running on that platform. So we build our own software. We are our only customers. And we've got in-house traders using the software to trade. We are a really small team and work a bit like a lean startup, which is not so common in the finance industry. And of course, all of our backend code is in very modern Python. We are currently in the process of migrating to Python 3.6. We've been doing this for about three years now. It's working quite well. But this talk is not really about trading or stocks, so you don't need to know anything about that. It's more about data and state on a backend server and how that gets transformed and displayed to users. I'll briefly show you the problem that we're solving at Quantlane. So a trading platform is essentially a stream processor, which receives market data updates from providers. Those providers might be stock exchanges telling us what's happening in the market or third-party businesses that sell that real-time data. And this data streams to us constantly in real time, in very small updates. And then we process that. We make some calculations on the data. We make predictions and estimates. And based on the results, we may decide to trade in the market, to buy or sell or do both at the same time. And crucially for us, not everything here is automated. So we don't make all of our trading decisions automatically, but there are human traders in the loop. So we need to display all that's happening in the platform to humans who may observe what's going on and make decisions or intervene in any automatic trading that we might be running. In our case, the user interface is web-based. It's built in React.
And it communicates with the main platform through WebSockets. But that's not very important for this talk. I'll give you some specific examples of what kind of data or state we might be showing to users. So the first example is trades. That is just a record of transactions that happened on the stock exchange on a particular stock. So it may look like a simple table like this. We see for each transaction the timestamp when it happened, what the price was, and how many shares changed hands at that time. And usually this is just an append-only log. So the stock exchange keeps sending us new trades as they happen. And obviously this is not very difficult to display in the user interface efficiently. We just need to send the new trades as they arrive and display them in the user interface. Another example is not something we get from the stock exchange, but this is our internal state. This tells us what orders we currently have open in the market. That means we're offering to buy or sell a certain number of shares at a certain price. But those orders might stand in the market for quite some time before they actually get executed or before we cancel them. And the type of trading that we do is called market making. That means we are at the same time offering to buy and sell very large quantities of shares in the market. And the value of this is that we provide a lot of liquidity in the market, meaning that every time you come to the stock market and you want to buy or sell, you will always find a counterparty that is willing to trade with you. And one of those counterparties could be us. So we have dozens or hundreds of orders open in the market at any given time. And we want to see what they are and what their parameters are, so the traders can check on them and maybe alter them or cancel them. So again, you can imagine this as a table of objects with some attributes.
And obviously, when we create a new order, it's a simple case of adding a row to that table. But what we also do all the time is alter the parameters of the orders that are already open. So perhaps the status of an order changes at some point when we start cancelling the order, or we might even alter the price of an existing order. So in this case, you need to be able to update pretty much anything in your table, not just append new rows. So those are two specific examples from the trading world. But of course, you can imagine any other real-time application with a user interface will have data structures like these. It will typically be a list of objects or maybe just a dictionary. So how can you do this? How can you efficiently show your internal state that lives in your Python process to one or more connected clients? The very first approach you might think about is to simply take snapshots and send them to clients on every data change. So each component in your Python process that generates some state will grow a get state method that returns whatever the state of that component is. So, for instance, if you have a list of trades, the state of that component is just the list of trade objects. And in your process, you just call all those methods periodically. You might have dozens of them or hundreds in your process. And you encode their return values in, say, JSON and ship them to the client every time something changes. So this is obviously extremely inefficient. But on the other hand, it's very, very simple to implement. This actually worked for us in the very beginning when we needed to get a quick prototype out and see if our style of trading would even work. So we simply used these snapshots because it was super simple to implement. And we didn't have so much data changing and streaming in the beginning. So it's a reasonable start, but obviously this will not get you far. So you need to get a little bit smarter.
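As a minimal sketch of the snapshot approach just described: the component exposes a get state method and the whole state is re-encoded as JSON on every change. The names `TradeLog`, `get_state` and `snapshot_message` are hypothetical and chosen for illustration only; they are not taken from the Quantlane code base.

```python
import json


class TradeLog:
    """Hypothetical component holding an append-only list of trades."""

    def __init__(self):
        self._trades = []

    def add_trade(self, timestamp, price, quantity):
        self._trades.append(
            {"timestamp": timestamp, "price": price, "quantity": quantity}
        )

    def get_state(self):
        # Return a shallow copy so callers cannot mutate the internal list.
        return list(self._trades)


def snapshot_message(components):
    """Encode the full state of every component as one JSON message."""
    return json.dumps(
        {name: component.get_state() for name, component in components.items()}
    )


trades = TradeLog()
trades.add_trade("09:30:00.001", 101.5, 200)
trades.add_trade("09:30:00.017", 101.6, 50)

# The entire list is shipped again after every change, however small.
message = snapshot_message({"trades": trades})
```

The cost is visible in the last line: the message grows with the total size of the state, not with the size of the change.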
So then you might start thinking about diffing the snapshots. So you keep your get state methods as they were before, and you still call them periodically. But you no longer take their entire output and ship it to all the clients all the time. Instead, you start sending incremental updates. So if you append a new trade to your list of trades, you only tell your client to append a trade. You don't send the entire list of trades again. And you do this by remembering the last state that your clients saw and then comparing the result of get state to that previously remembered value and computing a list of differences or incremental updates that you can send to the client. So in this case, you treat the state of each component as a black box. You don't really care what's inside. You just compare the old version and the new version, and you send the differences to the client in such a way that the client can reproduce the new state. So one idea of how you might do that is, amazingly, already contained in the Python standard library. There's a module called difflib that lets you find the differences between two sequences. And those sequences might be strings or lists or tuples of things. This is an example of how you can use that. So we have two sequences here, x and y. You construct a matcher object, and you ask it for so-called opcodes. Opcodes are operations you need to perform on x so that you arrive at y. And in this case, it gives us three opcodes. The first one tells us to delete the first item in x. The second one tells us that there is a subsequence that is equal between the two. And the third one tells us to replace the end of x with the end of y. And the indices in those tuples refer to indices in x and y. It's pretty intuitive to use this. So this is very useful if you can represent your state as a list of things, because then the diffs you generate are on the level of list items. This appeared like a really good idea to us. So we implemented it.
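The slide itself is not part of this transcript, so the following is a reconstruction with made-up sequences rather than the speaker's exact example. It uses `difflib.SequenceMatcher` from the standard library and yields the same three kinds of opcodes described above (delete, equal, replace). The helper `apply_opcodes` is a hypothetical illustration of how a receiver could rebuild the new state.

```python
from difflib import SequenceMatcher

x = ["a", "b", "c", "d"]
y = ["b", "c", "e"]

# The first argument is an optional "junk" filter; None disables it.
matcher = SequenceMatcher(None, x, y)
opcodes = matcher.get_opcodes()
# Each opcode is (tag, i1, i2, j1, j2): x[i1:i2] relates to y[j1:j2].
# opcodes == [("delete", 0, 1, 0, 0),
#             ("equal", 1, 3, 0, 2),
#             ("replace", 3, 4, 2, 3)]


def apply_opcodes(old, new, opcodes):
    """Rebuild the new sequence from the old one plus the opcodes.

    Only the new[j1:j2] slices of "replace" and "insert" opcodes would
    have to travel over the wire; everything else comes from `old`.
    """
    result = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            result.extend(old[i1:i2])
        elif tag in ("replace", "insert"):
            result.extend(new[j1:j2])
        # "delete" contributes nothing to the result.
    return result


rebuilt = apply_opcodes(x, y, opcodes)
```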
But then we found out there are quite a few problems that you might not think about at first. So the first one, perhaps the most obvious one, is that snapshots are not really going away. You still have to support snapshots because if a client comes to your already running system and wants to bootstrap its state, it has to get a snapshot to start from somewhere. So you still kind of have to support them somehow. What's more serious, though, is that you still keep calling your get state methods all the time. And maybe those methods do some work that will be thrown away when you find out that nothing has actually changed. A real-life example of the work you might be doing is converting some internal Python structures to something that can be encoded in JSON and sent to clients. So you need to work around that, so suddenly your producers also need to be able to tell you if they have actually changed. They might have a Boolean flag, and if nothing has changed, the flag is set to false and you don't call your get state methods and you won't waste time. So this of course works. This is doable. But the downside is that it's really complex to implement because you can make a lot of mistakes and create a lot of bugs by forgetting to set or clear this flag somewhere. And this is bound to happen if you have many producers of data and your state might change in ways you can't always predict accurately. But there are even more problems, although this slide is more about interesting challenges than problems. First, difflib only works with sequences of hashable types. So before I showed you a couple of strings. Turns out you can't diff sequences of dictionaries, for instance, because difflib will throw an exception. So to work around that, you suddenly must have a hashable representation of every single piece of state that you want to send to your clients. So that's a little bit of work or maybe a lot of work. So you might just say, I give up for small collections.
I'm not even going to try and produce this. I'm just going to send those snapshots just as before. And that's in fact what we ended up doing for smaller lists of items, because it's just faster. And I'm talking about these problems in such depth because I find them really interesting. It really makes you think about data and data structures in Python and how they're represented. For instance, that hashability requirement is actually really, really important, because if difflib didn't require you to do that and it simply let you diff tuples of dictionaries, you would find out that it's very hard to compare your old remembered state to your new state and find the differences, because you may have remembered a reference to a dictionary in your old state and then compare that to a reference to the same dictionary in the new state. So it's really not trivial to find what the differences are if you have two references pointing to the same dictionary, because they will compare the same. So to really make this work with mutable data, you have to do a deep copy of your old state and then compare that to your new state. So that's unfortunate, and it turns out that treating your state as a black box that you can simply diff at will might not be the best approach either. So we found a third way, and that is generating those diffs as soon as your state changes. So it's no longer a black box, but you really need to remember everything you do to your state. This is like a journal of mutations. So the way this works is you may have a list that's called orders, and you insert a new item into it. Every time you do that, something somewhere in your Python code remembers that this mutation has happened. So it appends this diff somewhere, saying that there has been an insert operation at the index 123 and the payload of that insert is a reference to the new order. And of course, you do that for item assignments and for item deletions as well.
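The journal-of-mutations idea can be sketched in a few lines. This is an illustrative toy, not the API of the library introduced next: `JournaledList`, `drain_journal` and `apply_diffs` are hypothetical names, and only integer indices are supported.

```python
class JournaledList:
    """Hypothetical list wrapper that records every mutation it performs."""

    def __init__(self):
        self._items = []
        self._journal = []

    def insert(self, index, item):
        # Mirror list.insert clamping so the recorded index is the real one.
        size = len(self._items)
        if index < 0:
            index = max(0, index + size)
        index = min(index, size)
        self._items.insert(index, item)
        self._journal.append(("insert", index, item))

    def __setitem__(self, index, item):
        index = self._normalize(index)
        self._items[index] = item
        self._journal.append(("replace", index, item))

    def __delitem__(self, index):
        index = self._normalize(index)
        del self._items[index]
        self._journal.append(("delete", index, None))

    def _normalize(self, index):
        # Integer indices only; turn negative indices into absolute ones.
        if index < 0:
            index += len(self._items)
        if not 0 <= index < len(self._items):
            raise IndexError("index out of range")
        return index

    def drain_journal(self):
        """Return the diffs recorded since the last call and forget them."""
        diffs, self._journal = self._journal, []
        return diffs


def apply_diffs(target, diffs):
    """Replay recorded diffs on a plain list, as a client would."""
    for op, index, payload in diffs:
        if op == "insert":
            target.insert(index, payload)
        elif op == "replace":
            target[index] = payload
        elif op == "delete":
            del target[index]


orders = JournaledList()
orders.insert(0, "order-1")
orders.insert(1, "order-2")
del orders[0]

diffs = orders.drain_journal()
replica = []
apply_diffs(replica, diffs)
```

Because the diffs are recorded at the moment of mutation, no comparison of old and new state is needed, and neither hashability nor deep copies come into play.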
To this end, we built a library that is now open source as of lunchtime today. It's called difftrack. It's also on PyPI, so you can install it and try it out. And this code sample simply shows you some boilerplate to set this up. So in difftrack, you have dispatchers and listeners. Dispatchers are the data structures that you can manipulate, like insert into them, overwrite items or delete items. And listeners are receivers of these mutations. So those are the components that actually get the log of writes. So here we just create a dispatcher and a listener and we connect them to each other. And this is how you use it. So you operate on the dispatcher, called orders here, as if it was a normal list. So here we insert two order objects and then we delete one of them. And then notice that we're asking the listener, not the dispatcher, but the listener for the diffs that happened. And we get this. So we really get a log of all the operations, two inserts and a deletion. And you get references to the insert payloads. So if you have this, you can encode this in JSON and send it to your client. And you didn't have to find out what the differences were ex post. This system obviously can support snapshots very easily because the listener, ever since it has been connected to the dispatcher, must have seen all the diffs, so all the operations that were performed on the dispatcher. So you can ask it any time for the current snapshot, what the list currently looks like. Notice we only have one order here because back here we inserted two, but then we deleted one of them. So the snapshot shows you the latest state of the dispatcher. And perhaps interestingly, the dispatcher couldn't tell you this information because it is stateless. It doesn't remember its own state. It only proxies those writes to listeners. And the listener, once you call get new diffs on it, like we did on the previous slides, will forget them.
And if you call it again and no writes have happened, you will get an empty list, of course. So this automatically preserves consistency between snapshotting and diffs. It cannot happen to you that you ask for a snapshot, and then you ask for diffs and you suddenly get diffs that were actually already applied in the snapshot. There are a couple more features that are worth mentioning. So this doesn't just support lists, but also dictionaries. So you can set items or delete items in dictionaries, and then you get a log of what you changed in that dictionary. One feature that we really like is compacting diffs. So in the previous example, I had two inserts and one delete that actually reversed one of the inserts. If this happens in a batch before you even send those diffs to your user interface, to your client, you might not actually want to send the entire log because you will be telling the client to insert an item and then immediately delete it afterwards, which is kind of wasteful. So compacting means cancelling out diffs that operate on the same index in the list. So if you insert something and delete it immediately afterwards, you can just throw away both diffs because it is as if this has never happened. And you can do similar optimizations when you set the same index in the list to new values. The only problem with this is we haven't quite finished this. Turns out it's quite a difficult thing to implement because when you start cancelling out diffs, you suddenly affect indices of other operations in the list of the diffs. So we've been working on this for quite some time, with breaks, because it proves to be a very frustrating task. Last week, one of the developers on my team was telling me triumphantly that he had finally fixed all the remaining bugs and we could release diff compaction, and half an hour later, he told me that he ran a fuzzing tool on it and found a couple of new bugs again after three months of trying to get this right.
But the latest message as of this morning is that it will work tonight. So hopefully we'll be able to release this feature maybe tomorrow. But according to our estimation, this will save us maybe 40% of traffic between the backend and the user interface. So it is really worth it. Another interesting thing you could do is squashing or aggregating diffs. So this means if you have a couple of inserts that affect subsequent indices, you could actually group them together and instead of saying insert an item at index five and six and seven, you could say replace indices five through eight with this list of items. So it's a small optimization. It doesn't really produce difftrack diffs. It produces grouped diffs that we can use in our own custom protocol that I'm going to talk about now. So beyond just optimizing your user interface updates through difftrack and diffs, you might also want to consider using a completely DIY protocol if it makes sense in your case. So we decided that JSON is nice because it's human readable, but it's really wasteful in terms of space. And it's quite fast to encode and decode, but you can do better. So we decided to use a binary protocol. And to encode the actual payloads, we use Apache Avro. I really recommend you check it out if you don't know it. It's a bit like Protocol Buffers. But we think it's slightly more evolved and has nicer Python implementations. By the way, all these things in the slides are links. So I'll put these slides up this afternoon and you can just click through. And this protocol, among other things, allows us to squash those diffs and instead of sending a couple of separate inserts, we can do like a slice operation, which the client then applies as if it was just a slicing operation on a Python list. But of course, it has to do that in JavaScript, which is a whole other story.
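A rough sketch of the squashing idea in Python follows. The diff tuples and the `slice_insert` operation are hypothetical stand-ins, not difftrack's data model or Quantlane's wire protocol. A run of inserts at consecutive indices is merged into one operation that the receiver applies as a single slice assignment.

```python
def squash_inserts(diffs):
    """Merge runs of inserts at consecutive indices into one slice insert.

    `diffs` is a list of (op, index, payload) tuples, applied in order.
    """
    squashed = []
    for op, index, payload in diffs:
        last = squashed[-1] if squashed else None
        extends_run = (
            op == "insert"
            and last is not None
            and last[0] == "slice_insert"
            # The next insert must land right after the run built so far.
            and index == last[1] + len(last[2])
        )
        if extends_run:
            last[2].append(payload)
        elif op == "insert":
            squashed.append(("slice_insert", index, [payload]))
        else:
            squashed.append((op, index, payload))
    return squashed


def apply_squashed(target, diffs):
    """Apply squashed diffs to a plain list."""
    for op, index, payload in diffs:
        if op == "slice_insert":
            target[index:index] = payload  # one slice assignment
        elif op == "replace":
            target[index] = payload
        elif op == "delete":
            del target[index]


diffs = [
    ("insert", 5, "a"),
    ("insert", 6, "b"),
    ("insert", 7, "c"),
    ("delete", 0, None),
]
squashed = squash_inserts(diffs)

state = list(range(10))
apply_squashed(state, squashed)
```

Only adjacent inserts are merged here, so the order of the remaining operations, and therefore the meaning of their indices, is left untouched.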
And again, about Avro, apart from the binary protocol saving you some space and time when encoding and decoding, the nice thing about it is that it requires you to write a schema for every message type that you send, which has the nice benefit of having always up-to-date documentation, because it is literally impossible to encode a message that doesn't conform to your schema. I'm running out of time, so check out these links in the slides. There are some other projects that inspired us when building this. And I thank you for your attention. Thank you. Thank you, Vita, for this good presentation and for your questions. I'm going to start with a question from the press library you released during your lunch. I understood. And thank you all for coming here. See you after the break. So I'm guessing we have no time for questions. Yeah, you can just catch me around here. I'll be here all week if you have any questions. Thank you.