Hello, everybody. I recently became a machine learning engineer at FreeNOW, formerly MyTaxi. I do a lot of backend development and put models from our data scientists into production, not just by wrapping them in a web service or a batch job, but by optimizing them so that they meet certain performance thresholds. This is also how I stumbled across Rust. The previous speaker told you about PyO3; I tried it out as well and it was very promising. Right now, though, I'm working on simulation for our business purposes, and the questions of what should be executed, how fast, and how expensive it is in terms of scalability or development time are all important to me at the moment. This is why I decided to present this topic here.

So what is a simulation? Put very simply, a simulation is an abstraction of some event, so a state machine is technically a simulation. An example would be how water flows through the pipes in a system, or how blood goes through our veins and how the blood flow behaves when you have an aneurysm. All of this can be recreated in a simulation. It's just a program that recreates some real event.

There are different types of simulation. Normally you will see continuous simulations, which are implementations of mathematical models that are continuous and define the whole flow of the logic. A very well-known example would be the Game of Life or, in chemistry, how different chemicals react with each other and whether you're going to get an explosion or not; this can also be played out in a continuous way in a simulation. The second type is discrete event simulation, which covers most man-made systems, for instance a post office: something happens only when an event occurs, when somebody, me as a customer, is in the post office.
Manufacturing pipelines, logistics systems and the like can also be modeled as discrete event simulations. Right now, though, a lot of papers and most production-ready simulation frameworks are mixed: you define an event dispatching system that says, okay, these resources are dispatched, and then the system decides how to proceed or whether there are any other side effects that can be described in a continuous way. Forestry is a good example, because a forest has a natural way of growing or recovering from a fire, but you can also dispatch discrete events like a man-planted forest or a man-made fire. So it's a mixture of both, and obviously it requires more development time.

There are numerous tools in every language you can imagine: frameworks, libraries, game engines. A simulation can technically be seen as a game engine without manual or human input. And of course people build simulations from scratch in different programming languages. These are just some examples of what you can use. For instance, you can use Unity to simulate how people learn by providing a model that replaces the human input: a model learns how to drive a car in Mario Kart or something like that. That can be considered a simulation.

In my case, there are several things to consider when choosing whether to go with a framework or a library, how you want to use the simulation, and whether you need a simulation in your business at all. The four main things are these. First, cost: if you go closed source, it might cost you a lot, or it might result in a lot of development hours, which also translate directly into cost. Second, the speed of the simulation: you probably don't want to simulate something at real-time or near real-time speed, right? Why bother then? The third point is scalability. For instance, you want to run different scenarios.
You cannot just wait for one to finish and then start another, so you want to scale horizontally, for instance: put it into a service, put it into the cloud, give it some resources. You need to know how the framework you've chosen or written behaves in that setting as well. The fourth, and one of the most important points, is extensibility. If you are using something that already exists, like a backbone library or even a ready-made framework, you might run into a situation where a very important business use case is not covered by it. Then you either have to pay the company to extend the framework or do it yourself, and again that translates almost directly into cost. So whenever you decide to do something, you need to keep an eye on those four.

For this presentation I decided to go with a situation very similar to the one I have at work: we're going to simulate the dispatch of a taxi tour request, or a customer. We have a world that can spawn a request with a certain chance P. We can have only max N requests, which you can think of as having only a thousand passengers using your app, with one request per passenger at a time. It's not possible for a passenger to request a taxi to point A and to point B at once, because you have only one physical body. What would you do? Would you split? I don't know. A request can be assigned to a free car only. We don't have anything like shuttle buses; it's just one request, one car. Requests can be canceled if they don't get assigned within a certain amount of time, which again is the real situation in the taxi business: passengers don't wait longer than, let's say, 15 minutes; they will just take a bus or a different taxi driver. Cars can be either free or occupied; we don't have any other states. And we will simulate one day, which means 24 times 60 times 60 ticks, that is, 86,400.
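
As a minimal sketch, the rules above could be collected into a handful of parameters. The class name, the field names and the spawn chance value are illustrative assumptions, not taken from the talk's code; the thousand-request cap, the 15-minute wait and the one-day runtime come from the description above.

```python
from dataclasses import dataclass

# One tick is one simulated second, so one day is 24 * 60 * 60 ticks.
SECONDS_PER_DAY = 24 * 60 * 60


@dataclass(frozen=True)
class SimulationParams:
    """Hypothetical container for the rules of the taxi dispatch world."""

    spawn_chance: float = 0.01           # P: chance to spawn a request per tick (assumed value)
    max_active_requests: int = 1000      # N: one request per passenger at a time
    max_waiting_ticks: int = 15 * 60     # pending requests are canceled after 15 minutes
    runtime_ticks: int = SECONDS_PER_DAY


params = SimulationParams()
print(params.runtime_ticks)      # 86400
print(params.max_waiting_ticks)  # 900
```
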
So one second is our atomic unit of time, and it drives the whole simulation.

Now the criteria. The talk is called Python versus Rust, so we have to compare them objectively, or semi-objectively, and these are the criteria I came up with. The objective criteria are the amount of code you need to at least prototype your flow, and then testing simplicity: how many packages are there for testing? Is it simple to write a test? How much time would you spend on it? Then documentation generation and available documentation, because you're probably not going to write everything yourself; you might need additional libraries and crates. Performance is obviously important. Memory usage is also important because of the cost and scalability points. And the ecosystem will play a big role later, once you have your prototype in place. For instance, if you already have an existing business system with Hadoop and co., you might run into problems where your simulation has no official adapters or connectors to Hive or to Hadoop itself, or there are just no crates, and you have to either write them yourself or rewrite everything in a language that has the libraries. And then language versions. The previous speaker mentioned it already: sometimes they play a crucial role, sometimes they don't, so you also have to evaluate this risk.

The two subjective points are, first, code simplicity. Obviously, if I have more experience in Rust, I will say, ah, it's easy, I don't understand Python at all; I will spend one hour in Rust and one day in Python, and vice versa. The second one is development speed. They are connected, but Python is well known for letting people prototype fast. With other languages, especially statically typed languages, you have to think first about what structures you want to use, how you want to lay out your program, and what the flow is, not of the logic but of the whole program: where objects are going, memory, collection, all that.
So you have to invest some time up front. Having said all that, let's go to the presentation. I will show you a couple of pieces of code, and we will start with Python. The implementations are identical, apart from slight changes for things that only Rust has or only Python has. Otherwise you have the same struct or class.

The first is the request. A request has a UID, which is its unique ID, and a driver ID that can be assigned. That is technically an option: there is either no driver or there is a driver. Then there are the remaining lifetime and the fulfillment time. These are the parameters that make our request either be canceled or be fulfilled. And is alive is a utility function that tells us, yes, still kicking, still working. Another one is the taxi, which has an ID so that we can see in the logs which request was assigned to which taxi, in case we have an assignment logic slightly more complicated than random to random.

And then we have our world. The world is just a container that binds it all together. It has the main loop that says: while I have time left in this day of simulation, do this. And within "do this" we have spawning, if we can; assignment, if there are free cars; and update, which is the ticking: all requests that are pending start decaying and all requests that are in progress move toward being fulfilled. Finally, the cleanup says, ah, this one is already done, free the taxi so it can be used for assignment in the next cycle, and move the request into canceled or finished.

You can see all of this in the attributes and functions. The world has a runtime, which is the number of seconds it will run for. Age is the current step. Request spawn chance is, again, the random chance to spawn a request. Max active requests is the limit on how many active requests we can have at once, where active means in progress plus pending. And taxis: taxis, as I said before, can be free or occupied.
And requests are split into four groups by their state: pending, in progress, finished and canceled. Then we have maybe spawn request, which does the obvious: if we can spawn a request, meaning we have not reached our max, we do it with a random chance. Then we have distribute: if there is a taxi among the free taxis, we assign this car to the first non-assigned request. So there is no special logic yet. Update requests does exactly that: it ticks down either the remaining waiting time or the fulfillment time. The cleanup just moves a request into its next state if it has to. And the main loop is: while we have time, do all the steps one after another. I'm not going to run this program for the whole day because it's actually slow, so I will go back to the presentation side. Sorry for fighting.

So, going through our criteria: it takes quite a small amount of code to prototype our flow, 94 lines without documentation. The performance for one day, without printing to the console, is 210 seconds. I measured it with Hyperfine, which does several runs and takes the average, so it's a believable number. It's not like I ran it once with a couple of other programs running in the background. Memory usage: just remember this number. For this execution, Python allocated about 35 megabytes on the heap, and that's small.

Tests: I don't have to tell you about Python tests. It's gorgeous. It's cool. You can use unittest, or doctest if you don't want to bother, or you can have additional packages like pytest, or something completely different like nose tests. The ecosystem is amazing. Tell me at least one use case that is not applicable to Python. The version of Python should not be a problem at all starting with 3.6. Sometimes you might run into rendering problems, but that's a slightly different issue and it depends on the OS you are running on. I spent about an hour writing this program, so I would say it's quite fast.
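
To make the flow concrete, here is a minimal Python sketch of the world as described: spawn, distribute, update, cleanup, once per tick. It is not the speaker's code. The class layout, the method and field names, the fixed seed, the ride duration range and the choice to assign as many free taxis as possible in one tick are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    uid: int
    remaining_waiting_time: int      # ticks left before a pending request is canceled
    fulfillment_time: int            # ticks of ride left once a taxi is assigned
    driver_id: Optional[int] = None  # None until a taxi is assigned

    def is_alive(self) -> bool:
        return self.remaining_waiting_time > 0 and self.fulfillment_time > 0


@dataclass
class Taxi:
    uid: int


class World:
    def __init__(self, runtime, spawn_chance, max_active, n_taxis, seed=0):
        self.runtime, self.age = runtime, 0
        self.spawn_chance, self.max_active = spawn_chance, max_active
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.free_taxis = [Taxi(i) for i in range(n_taxis)]
        self.occupied_taxis = {}        # taxi uid -> Taxi
        self.pending, self.in_progress = [], []
        self.finished, self.cancelled = [], []
        self._next_uid = 0

    def maybe_spawn_request(self):
        active = len(self.pending) + len(self.in_progress)
        if active < self.max_active and self.rng.random() < self.spawn_chance:
            # Assumed values: 15 minutes of patience, rides of 5 to 30 minutes.
            ride = self.rng.randint(300, 1800)
            self.pending.append(Request(self._next_uid, 900, ride))
            self._next_uid += 1

    def distribute(self):
        # Assign free taxis to the oldest pending requests, no special logic.
        while self.free_taxis and self.pending:
            taxi, request = self.free_taxis.pop(), self.pending.pop(0)
            request.driver_id = taxi.uid
            self.occupied_taxis[taxi.uid] = taxi
            self.in_progress.append(request)

    def update_requests(self):
        for request in self.pending:
            request.remaining_waiting_time -= 1
        for request in self.in_progress:
            request.fulfillment_time -= 1

    def cleanup(self):
        still_pending, still_riding = [], []
        for request in self.pending:
            (still_pending if request.is_alive() else self.cancelled).append(request)
        for request in self.in_progress:
            if request.is_alive():
                still_riding.append(request)
            else:  # ride is over: free the taxi for the next cycle
                self.free_taxis.append(self.occupied_taxis.pop(request.driver_id))
                self.finished.append(request)
        self.pending, self.in_progress = still_pending, still_riding

    def run_till_done(self):
        while self.age < self.runtime:
            self.maybe_spawn_request()
            self.distribute()
            self.update_requests()
            self.cleanup()
            self.age += 1


# Simulate one hour with a small fleet and print a summary.
world = World(runtime=3600, spawn_chance=0.05, max_active=50, n_taxis=5, seed=42)
world.run_till_done()
print(len(world.finished), "finished,", len(world.cancelled), "canceled")
```

Keeping the four request groups in separate lists mirrors the description above; a single mapping keyed by state would work as well.
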
The simplicity of the code you can judge for yourself, but I would say it's also quite good. It's easy to understand what is happening, not only from the structure but from the Python code itself.

Okay. Now we have the second contender, which is Rust. As you can see, it looks very similar. You have imports: this use std something is just an import that lets you use a certain structure or function from a crate. Then there are these things called macros; you probably saw them with the previous speaker. They give your struct the implementation of that functionality. It's just a simplification. So we have the same request. You can see that it is now a struct instead of a class, with an ID, the remaining waiting time, the assigned taxi, which is now an Option, so again it can be None or in this case a UID, and the fulfillment time. Then we have our request implementation. In Rust you can create a constructor directly, and this is, let's say, a utility function where you can also provide some default values if you want to. I have that here. And then is alive. The next one is the taxi, which looks almost the same as the Python one, with an implementation of the new function.

Then we are at our world, which has the very same fields as the Python class: runtime, age, request spawn chance, max active requests, and taxis, which is a vector; what would be a list in Python is a vector in Rust. Then active requests and archived requests. I have them separately and not united in one hash map. You could do that as well, but this was the first idea I went with. So it's not a completely optimal implementation in Rust, but it's something that works. Then you have the new function for the world. You have the RNG for randomness; you have to pass it through, in contrast to Python, where you can just use it as a global. Then we have a utility function for printing and its implementation. Then maybe spawn request, which you have already seen; it's very similar. Distribute unfulfilled requests is slightly different.
You can now see this iter_mut and find. In Rust you have the concept of mutable and immutable, you probably all know that, and you have to think in advance about whether you want to change certain structures or not, where you want to reference them or not, and so on. You normally get used to this after a couple of days of writing Rust, but it is something that is very different from Python, just so you notice. Then update requests is the same, and then we have cleanup requests, which is slightly different from Python because of this concept of borrowing and reference checking. You cannot just take an element from one vector and append it to a different vector, because that is a move and you can have only one owner at a time. Just ask me later about the technical details so that we don't spend too much time on this, but that's just the way it is. It's still kind of understandable from the code you can see. Then you have your run till done, which is the same. This display function is just an implementation for printing; Rust does not give you the ability to print every struct by default. And then we have our main.

So what does Rust tell us? It comes to 160 lines of code without documentation; documentation would be written with three slashes. The performance for one day with no logging is 154 milliseconds, versus Python's roughly 200 seconds. The memory usage is an order of magnitude less than Python's. That is heap allocation, not stack, so there is some shady business going on there, I must say, but in pure heap allocation Rust just beats Python dead. Another criterion was the simplicity of the code. I would say it's subjective, and you would understand the code after reading it two or three times if you have never done Rust before, but Python still wins in this regard. The time I spent writing this one is one day, because in my first implementation I actually managed to make it quadratic time. So it never ended.
I ran out of patience before it did. So you really have to invest time in advance thinking, okay, what am I doing here? You will also fight with the compiler a lot in the beginning, but it's your friend: if it compiles, it works.

The other criterion was tests. Cargo is the package manager, and really the everything manager, in Rust. You can just run cargo test, and you write the tests in the same file as your normal program or in a different file and annotate them with #[test], the attribute that marks a test. The ecosystem is not that good at all in comparison to Python. It can be good in certain domains, like embedded programming or low-level programming in general. Right now it's catching up on web services as well, but Python is very much ahead. And again, in the situation where you have something like Presto and Hadoop, or Hive and Hadoop, in your business use case, just forget it. There is no crate in Rust for using those SQL layers.

So let's have a more visual comparison. Amount of code: Rust is about two times more code. It can be even more, or about the same amount if you know how to write your program, but for beginners Python has a clear advantage there. Simplicity: in both cases it's very simple. Both languages try to keep things very simple and usable. Documentation: I would have said it's good in both cases, but Rust has one killer feature for me, which is that you can have offline documentation served on your local server, for instance a Python HTTP server, accessible offline if you have it cached with your cargo. So you build your project and you have the documentation offline, whether you have internet or not, which as far as I know is not the case for Python. This is why I left no mark or color for Python on this one, and Rust is like, yes, very good. Memory efficiency, duh. Performance, you saw it. By the way, there was no parallelization in the program, so it can run even faster and quite a bit cheaper for that matter.
Ecosystem: Python has a clear advantage there. There is no denying that. Versions: in Python you should have no problems if you have gotten rid of your Python 2, hopefully. In Rust it depends on whether you have to use nightly or not. Stable has most of the features. Nightly has really nice features that you might or might not need, depending on how deep you want to go in your implementation, since we are in the write-it-from-scratch use case. And then development simplicity and development speed: I am a Python developer, not a Rust developer, so I am biased here.

So would you rather write your simulation in Rust or in Python, if you need one? To be honest, I don't know, because you have to consider your pain points first, like cost, scalability, extensibility and speed. At some point, if you go for Python, you will have to do optimization. It always happens. And optimization can take quite a lot of your resources in terms of development time, or even in terms of getting libraries that are not open source or whatever. At the same time, if you go with Rust, you will be very slow in the beginning, maybe even very frustrated, but the end result is always worth it. You won't be able to reach this performance with pure Python code without using Cython or something like that. So I'm sorry for not giving you an exact answer to this question. You have to consider it for yourself, depending on what simulation you're writing. How fast should it be? How scalable should it be? How simple should it be in the end for other users to use? If you want a simple, reasonably fast, right-now simulation, then go with Python. If you are able to invest time, then consider Rust. Thank you.

So I think we have time for questions, if there are any.

Did you try to run your simulation with PyPy?

Not yet, but in previous years we actually had a similar program.
And we, well, I, tried PyO3, Cython and PyPy. PyPy out of the box gave me a 30% performance improvement, which is still not an order of magnitude or several orders of magnitude. PyPy is also not compatible with all packages, for instance scikit. Again, I'm a machine learning engineer, so it's my biggest pain: we use scikit, and we cannot use PyPy for that reason. To be fair, Rust will probably always have problems with that as well. But yes, I tried PyPy in a similar situation. It didn't perform as well as Rust.

Thanks for the talk. It was interesting. Tell me about the business of writing simulations. Is that a big thing? Is it critical to the business? How many people work on that?

I would say simulation is the next big buzzword in the industry. I can give you a couple of examples from the mobility industry: Tesla, Uber, MyTaxi. We're all writing simulations for slightly different purposes. Tesla has a giant and very complex simulator for Autopilot, to test new features before they roll them out onto the streets, so as to avoid certain fatal flaws or catch them a bit earlier. Uber has a simulator for marketing purposes. And at MyTaxi we are developing one in the first place for development purposes, meaning we want to speed up our development cycle. For instance, we have a team that works on allocation, that is, connecting a driver and a passenger, and they have a brilliant idea. But until they put it into production and run it in an A/B test for some time, they don't know whether it was worth it or not. We can even lose people if it was a bad decision. So what we are doing is building a simulation that will allow them to at least cut the bad ideas. You can also test things like: how would my system react if there is a strike? Suddenly we have very little supply; instead of 200 cars we have one car. We still want to make money, so we will probably have to serve our best customers first, and so on.
You just cannot replay that in the real world. Doing it from your historical data is also kind of hard, because it's just historical and not flexible. This is where simulation kicks in. Simulations are also used a lot for data generation. For instance, if you want to test your system not only for robustness, like, okay, this service works, the request is coming in, but you also want to produce some edge cases and test it end to end, content-wise, then a simulation can help you too.

Thanks.

Hi, thank you for your presentation. You mentioned data generation, and I know there are papers on using machine learning models to speed up simulations. You basically use the input and output of the simulation as a data set for the model. Have you heard of or considered using that approach?

Yeah, the thing that we are building, and it's quite famous, is agent-based simulation. So we have trained models of drivers or passengers or, for instance, your baristas, if we're talking about a coffee shop, or even customers, that are learned from historical data. For instance, I have a customer agent that has learned that he's, and sorry for the word, an asshole, and he likes leaving no tips and spilling coffee everywhere. So you would have something like this in your simulation. I don't know exactly how you would use models to speed up the simulation itself, but using models inside the simulation is quite a common technique.

Thank you.

Thank you for your talk. If you have any more questions, you can leave.