Hello everyone, I'm Darko, and Mislav is behind me; he will be joining you a bit later. We are from Nanobit, a Croatian company, and we will be talking about mobile games in the cloud with Python, or how we as a company made the transition from mobile-based gaming to cloud-based gaming. I will divide this talk into three parts: the first part will be mine, then Mislav will take over, and then I will finish.

So let's get started. The talk is divided into four parts, starting with the intro and the requirements and then the Python backend, but you will see the rest later. First I want to tell you what our objectives are. Our first objective is to show you how we connected various technologies using Python, how we created a scalable architecture as a result of it, and how we are using Python in a number of places in our company. The second objective, which I find more important, is to show you the lessons we learned, the challenges we faced, the difficulties we had, and the problems that come up when creating a system that has a lot of customers, a lot of players, who are on smartphones, together with the problems that arise specifically with smartphone gaming.

So who are we? This is going to be a short part about our company. We are Nanobit, a Croatian game development company. We started in 2009, and up until 2011 we were developing applications. Then we started developing games, because at that time games were the more profitable and more interesting part of the industry. After we switched to games we decided: our games are on the smartphone, but we need some kind of backend on the servers that is going to connect the players of our games together, the social part of gaming. So we started developing a backend server in Python, and after that we saw steady growth. We had all the games we developed connect to this backend, but after some time that backend became too slow; it wasn't sufficient for the new requirements in our new games. As you can see, we are using Amazon, we have a lot of instances, we handle a lot of requests daily, and that is rising every day and every year.

How did we use Python in the past? Like I said, we needed some social features, so we started with our first iteration of the server, which was pretty easy: a simple HTTP server in Python. That lasted for about a month, until we realized it wouldn't work. We switched to a more robust and more proven solution, which was Apache with Django, and that was okay for our requirements back then. Those requirements were only to connect the players in a social way, so they could view each other's games, send gifts to each other, and so on. After that we added some more functionality, like in-game updates, push notifications, settings, and so on, and we added a secondary server for our in-house analytics system. That was working pretty well, but we had one big issue: we only had vertical scalability. Every 12 months our server would become too small and would no longer be able to handle the load and the number of requests, so we had to bump up the server. That lasted for three years: every 12 months, a bigger server. We wanted horizontal scalability, but we couldn't have it, because we had a problem with save games, which were saved on the disk, and that would not allow us to scale horizontally.
So a change was needed. Why did we make the change? We had a new game, and this new game had new requirements: let's have better multiplayer, let's have players who play against each other. When you have Apache, that's not possible, because Apache is a pre-fork, worker-based web server which doesn't allow you a lot of persistent connections. So we had to change that. Another thing was MySQL. MySQL wasn't good enough for this kind of work; we had problems with it when we developed the analytics system, because we had 50-60 million rows of data in one table and we had to join it, and that didn't work. So we decided we were going to change that and build a solution that is more scalable, one that could support not only the social part of the game but the entire multiplayer.

One other thing we decided was to put the game logic on the backend. No longer would the game be played only on the smartphone; from then on it would be played on the backend. A player would send a request to the server every time something happens in the game, so that we could have save games on the server and players who play on multiple devices. When you have save games on devices, like we had in the beginning, you have problems with cheating, because as soon as a player can reach his own save game, he can cheat. He can change the number of coins he has to four billion, or he can change the price of some item in the game to minus two billion, so that when he buys it he gains two billion coins. That was an issue, so we decided we had to stop players from cheating like that; there's a small sketch of that idea below.

So what was the goal? The goal was to have a system where every primary part could scale horizontally. We wanted automatic deployment, because until then we had to upload the source code to the server each time and restart Apache, and that was not feasible; this time we decided to change something. We wanted to containerize each part of our backend, so that the layers of the backend would be easily transferable between instances, between servers. And since, like I said, the game logic had to move to the backend, each player had to have a persistent connection to the server, and Apache, like I said, wasn't good enough for that. So we had to find a solution for having 10, 20, 30 thousand players playing simultaneously, each of them holding a persistent connection to the server.
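To make the anti-cheating idea concrete, here is a minimal sketch of server-side validation in the spirit described above; the function and field names are illustrative assumptions, not Nanobit's actual code.

```python
def handle_buy_item(save_game, item_id, catalog):
    """Apply a 'buy item' request only if authoritative server state allows it."""
    item = catalog.get(item_id)
    if item is None or item["price"] < 0:
        # Reject unknown items and nonsense prices outright.
        return {"ok": False, "reason": "invalid_item"}
    if save_game["coins"] < item["price"]:
        # The balance lives on the server, so a client claiming
        # four billion coins changes nothing here.
        return {"ok": False, "reason": "not_enough_coins"}
    save_game["coins"] -= item["price"]
    save_game["inventory"].append(item_id)
    return {"ok": True, "coins": save_game["coins"]}
```

Because the save game never leaves the server, the only way to gain coins is through requests the server itself has verified.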
Another thing: until then, the game would pull new data from the server; it would send a request and receive a response. We wanted to change that. We wanted the server to push new data to the games, to the client, each time new data appeared. So we decided to use a sort of event-based architecture. A number of requests would be sent in a batch from the client, and if those requests did not go through, they would be sent again in three or four seconds. That ensured data would not be lost in transfer. It also made cheating very difficult, because each request that arrives at the backend is checked to see whether it is actually possible. For instance, if a player wants to buy something in the game, we check whether the player has enough coins, enough gems, or enough of any other resource in the game.

So why did we choose Python? We already had Python; like I said, we used Django and a lot of other things in Python, that had worked nicely for us, and we wanted to continue that way. Like I said, we needed a web server that would allow us a large number of persistent connections, and Tornado fits perfectly here, because that is what it is made for: persistent connections from the clients. After a couple of thousand connections to one Tornado instance, we can simply start another instance and have a proxy in front of them that load-balances between them. We wanted automatic deployments, and Fabric is also great for that: we had Fabric connect to our Amazon EC2 instances and check for each type of instance by its tags, and we used boto for that; you can see both on the slide. We also had workers, the part of the system that processes the requests arriving from the smartphones, from the games. Since we had a working Django server, we wanted some sort of backward compatibility, so we could reuse the database data, the code, and everything else we already had that the new game required. And we had three databases, and they all had great frameworks in Python. So Python was the solution, and we decided we could go with Python from this point on and continue the way it had been used until then.
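As an aside on the automatic deployment just mentioned: here is a minimal sketch of tag-based deployment, assuming Fabric 1.x and boto. The region, tag name, registry, and container name are illustrative assumptions, not the actual Nanobit setup.

```python
import boto.ec2
from fabric.api import env, run, task

def hosts_for_role(role, region="eu-west-1"):
    """Collect the public DNS names of running EC2 instances with a given role tag."""
    conn = boto.ec2.connect_to_region(region)
    reservations = conn.get_all_instances(filters={"tag:role": role})
    return [i.public_dns_name
            for r in reservations for i in r.instances
            if i.state == "running"]

@task
def deploy(role="worker"):
    """Pull the freshly built image on every tagged instance and restart it."""
    for host in hosts_for_role(role):
        env.host_string = host
        run("docker pull registry.example.com/backend:latest")  # hypothetical registry
        run("docker restart backend")                           # hypothetical container name
```

The point is that Fabric never needs a hard-coded host list; the EC2 tags are the inventory.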
So I'm going to leave you to Mislav now; he's going to walk you through the architecture and the parts of the system.

I'll show the architecture in more detail on the diagram later, but for now: the architecture is divided into several layers. A proxy routes the requests to the web servers, the web servers talk to the queue, the queue talks to the workers, and the workers talk to the databases and send a reply back. Everything is horizontally scalable, and every part that needs to be scaled, every unit in these layers, is dockerized, which makes deployment really easy. Python is great here, because not only did we develop the code that implements the system logic, the game logic, in Python, it also works great for binding all those services together with its libraries.

Everything runs in a virtual private cloud, and only the nginx proxy is available to the outside. You can see how the data travels through the system: blue lines represent the inbound traffic and yellow lines represent the outbound traffic. Inbound traffic comes in through HTTP requests, and outbound traffic goes out through the WebSocket. As you can see, we use Riak, Amazon RDS, and Redis as the backend for the web server part. The web server consists of two parts: the HTTP layer, which receives requests, and the WebSocket layer, which pushes the responses back to the game. nginx uses IP hashing, so persistent connections are possible and the WebSockets can survive. When a request comes in, it goes to the web server and is pushed to our in-house-built queue, and the responses come back through Redis. We use Redis as a queue: we use a simple list, and the WebSockets block on that list until our response arrives. That gave us some problems, but we'll talk about them later.

The workers are single-threaded Python programs. They are dockerized and they are connected to our queue; we had to write our own queue because of some issues with Riak, and we'll talk about that later too. The workers receive requests from the queue. Each message from the client has a unique message type, so when we develop new games we just have to process the different events that are specific to that game, and the rest of the system stays the same.

Our queue is written in Go. We had to write a queue because of an issue with Riak: because of its architecture, it doesn't support transactions, so when multiple writers write to the same data point, they can create what Riak calls siblings, which are pieces of data with different vector clocks, and you have to resolve that yourself. We solved that issue by having a queue which routes requests for a single user, or a single connection, always to the same worker; we use ring hashing for that. That makes sure that every piece of data is always written by the same worker, and multiple workers will never write to the same data point. The queue detects when workers go online and offline; everything in the architecture is made to be discoverable, so messages will queue up until new workers come up, or the queue will rehash and start using a different number of workers.

The technologies used: nginx for the proxy, Tornado for the web server, and ZeroMQ in several places, so that we can turn components off and on and the messages will go through as soon as a component is turned back on. For the databases we use Riak, Redis, and MySQL, and we use Docker for all the components.
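Before moving on, a quick illustration of the ring hashing mentioned a moment ago. This is a minimal Python sketch of the idea, not the actual in-house Go queue; the worker names are illustrative.

```python
import bisect
import hashlib

class HashRing:
    """Map every user ID to a stable worker; rebuild when workers change."""

    def __init__(self, workers, replicas=100):
        self._points = []                       # sorted (hash, worker) pairs
        for worker in workers:
            for i in range(replicas):           # virtual nodes smooth the load
                self._points.append((self._hash("%s:%d" % (worker, i)), worker))
        self._points.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def worker_for(self, user_id):
        """Walk clockwise from the user's hash to the next worker point."""
        h = self._hash(user_id)
        index = bisect.bisect(self._points, (h, "")) % len(self._points)
        return self._points[index][1]

ring = HashRing(["worker-1", "worker-2", "worker-3"])
# The same player always lands on the same worker, so only one worker
# ever writes that player's data and Riak never has to create siblings.
assert ring.worker_for("player:42") == ring.worker_for("player:42")
```

When a worker disappears, only the keys on its arc of the ring move to a neighbor; everything else keeps its worker, which is what makes the rehashing cheap.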
We just have to do a git push to the deployment master, and a post-receive hook builds the Docker containers. Fabric then tells the instances which have to be restarted that a new image is available; they pull the image, restart the Docker container, and we have the new code running. And we use SQLAlchemy for everything SQL-related. So, over to Darko.

So as you can see, we use Python to bind various technologies together, and that works great. We have a web server in Python, we have workers in Python, and we have them connecting various databases and clients together. But now I'm going to talk you through the lessons learned, because when developing such a system you always face difficulties, and we had a number of them, some specific to the smartphone part of the industry.

One of the first things: mobile devices have mobile connections, and mobile connections can be slow. When a person walks down the street, the phone sometimes disconnects from one tower and connects to another tower, or it connects to WiFi, or it disconnects from the WiFi. So you have connections which drop and new connections which start, and we had a problem with that, because TCP doesn't help you when you try to discover connection drops. We had to fix it by monitoring our connections frequently: we had to ping from the client to the server, and we had to ping from the server to the client. Why? Because each time a new connection was spawned, it would steal some of the data arriving through Redis, and that was the problem. We used publish/subscribe in Redis to fix that issue: we had to communicate between Tornado instances, because we had multiple Tornados, and when a connection dropped, no other Tornado instance would know that it had dropped. That was a big issue, and it's still an issue; I think it's a problem with most smartphone connections, because they drop frequently. Our old system also had this type of problem, but Apache would fix it for us: when a player left the game, it would leave a hanging connection which would only drop after three or four minutes. That wasn't an issue then; now it is. So we had to fix it with multiple mechanisms in multiple places, and that was the biggest issue.

Another issue is what Mislav mentioned: race conditions in Riak. Riak allows you to have multiple workers write to the same data, but when you do, you get a conflict which you have to resolve. Riak just tells you: you have a conflict, you have two siblings, and now you have to decide which data is correct; you have to merge the data yourself. That was an issue for us, because if two workers write to the same data in Riak, the workers have to know how to resolve the conflicts that arise, and since we have a lot of requests coming from the games, each with its own request type, we would have to know how to resolve a conflict for every request type, and that was a big problem. So we decided to just sidestep the problem, to pretend the problem isn't there. We fixed it like Mislav said: one worker receives all the data from one client, this data arrives in the same order it was produced, and only that one worker writes for its clients. That fixed the issue, because there are never multiple workers writing to the same data point.
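As an aside, here is a minimal sketch of the Redis publish/subscribe idea mentioned above, where Tornado instances tell each other about dropped connections. The channel name and payload are illustrative assumptions, and the listener runs on its own thread so the blocking `listen()` never touches the IOLoop.

```python
import json
import threading

import redis

r = redis.StrictRedis()

def announce_disconnect(player_id, instance_id):
    """Called by the Tornado instance that noticed the dead connection."""
    payload = {"event": "disconnect", "player": player_id, "instance": instance_id}
    r.publish("connections", json.dumps(payload))

def listen_for_disconnects(my_instance_id, local_connections):
    """Every instance listens and forgets its stale socket for that player."""
    pubsub = r.pubsub()
    pubsub.subscribe("connections")
    for message in pubsub.listen():             # blocking: keep it off the IOLoop
        if message["type"] != "message":
            continue
        event = json.loads(message["data"])
        if event["event"] == "disconnect" and event["instance"] != my_instance_id:
            local_connections.pop(event["player"], None)

# Run the listener on a background thread so the asynchronous server stays free.
threading.Thread(target=listen_for_disconnects,
                 args=("tornado-1", {}), daemon=True).start()
```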
So, another thing: SQL data. When we use SQL, it blocks the single-threaded workers, and since the workers have to process a lot of requests, we had to make sure they run as fast as possible. So we decided to push some data to additional threads, which write to SQL only when there is no other data to process. We decided that some of the SQL data is low priority and can be written two, three, or four seconds after the request was processed, so the main thread does not block and can process as many requests as possible, which in general increases the throughput of the system.

Another thing: Tornado and Redis don't work together all that well, because Tornado does not allow you to block on anything; it is asynchronous. But the blocking operations in Redis, a blocking pop or a subscribe, block on the web server side, and they would block the Tornado instance completely. There were a couple of solutions, but both solutions we tried had some issues, for instance when we tried to disconnect from Redis, so we had to implement some new things on that side, and that was a problem.

And one more thing about Riak. Our Riak cluster has five nodes, and at one point one Riak instance crashed and we didn't know about it, because we were in the middle of development. That one server was down for about a month, and everything kept working fine, because Riak healed itself, almost magically. We didn't know anything had happened until one day we tried to connect to the instance that was down and realized it wasn't responding to pings or anything. But the data was safe, nothing was corrupted, and everything worked the way it was supposed to work.
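To make the low-priority SQL idea concrete, here is a minimal sketch, assuming SQLAlchemy and an illustrative events table, of a worker handing writes off to a background thread. This is a sketch of the technique, not the actual worker code.

```python
import queue
import threading

from sqlalchemy import create_engine, text

engine = create_engine("mysql://user:pass@localhost/game")  # illustrative URL
low_priority = queue.Queue()

def record_event(player_id, event_type):
    """Main worker thread: enqueue and return without ever touching MySQL."""
    low_priority.put((player_id, event_type))

def sql_writer():
    """Background thread: flush low-priority rows whenever they show up."""
    while True:
        player_id, event_type = low_priority.get()   # blocks only this thread
        with engine.begin() as conn:                 # commits on exit
            conn.execute(
                text("INSERT INTO events (player_id, type) VALUES (:p, :t)"),
                {"p": player_id, "t": event_type})

threading.Thread(target=sql_writer, daemon=True).start()
```

The single-threaded request loop only pays for a queue put; the insert can happen a few seconds later without anyone noticing.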
So what's in the future? Like we said, we dockerized every component possible, and Docker is a great new technology. There is a great talk tomorrow by a colleague of ours which will show you how to use Docker in production and development environments, and I would suggest you listen to that, because it enables some great things, especially in production: you can transfer complete images of your code between servers, and Docker now has native Python support, so you can control Docker containers from your Python code, for automatic deployment and things like that.

Another thing we are working on is an automated testing facility, which uses Docker heavily. It allows us to run a suite of tests against a new iteration of our web servers, of our entire backend, to see whether each request that arrives still yields the same response. Also, we currently have two connections: an HTTP connection, which receives the requests, and a WebSocket, which is only for the responses to the game. We want to unify these into one single connection, because it would be much easier than it is now; that brings its own issues, but we are trying to solve them.

And one interesting thing: when we developed our first game using this backend, it required five months, because we had to write the entire backend from scratch. But the second game was finished in a month and a half, because we only had to move the game logic from the game to the backend; we already had the entire working backend, the web servers, the queue, the workers, and we only had to define how to process the requests that arrive at the server. So our new games will also need a lot less time to develop than the first game did. That's it; you can ask us some questions if you want. Thank you.

Well, we send it in JSON form. Yeah, the colleague was asking how we send the data from the client to the servers, and how it is formatted. We use JSON to format the data. We wait for about three seconds to gather all the requests that are going to be sent to the server, and then we send them in a batch. If the batch goes through, it is marked as completed and it isn't sent again; if a message isn't confirmed as processed, it is sent again in the next batch, and the next batch always keeps the messages in the same order they were produced in the game. So basically we have JSON data containing the requests that were generated in the last three or four seconds. That is one type; the other type is messages that are critical, high priority. Those are sent instantaneously, as a single message: things like a player trying to attack another player, or a player requesting the current leaderboard, data that is needed as soon as possible.
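Here is a minimal client-side sketch of the batching protocol just described; the transport interface, field names, and sequence numbers are illustrative assumptions about how the game client might be written.

```python
import json

class BatchSender:
    """Collect requests for a few seconds, send them as one JSON batch,
    and resend anything the server has not acknowledged, in order."""

    def __init__(self, transport):
        self.transport = transport   # anything with send(str) -> set of acked seqs
        self.pending = []            # unacknowledged requests, oldest first
        self.seq = 0

    def queue(self, request_type, payload):
        self.seq += 1
        self.pending.append({"seq": self.seq, "type": request_type, "data": payload})

    def flush(self):
        """Called every three or four seconds from a timer."""
        if not self.pending:
            return
        acked = self.transport.send(json.dumps({"requests": self.pending}))
        # Keep only what the server did not confirm; order is preserved,
        # so the next batch replays the same messages in the same order.
        self.pending = [r for r in self.pending if r["seq"] not in acked]
```

High-priority messages like an attack would simply bypass `queue()` and go out on their own, as described above.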
Well, we didn't have it for the first game, but we are doing it right now. Sorry, I forgot: the colleague was asking what our method of testing and staging is. We didn't have one at the beginning, but now we are building the automated testing I mentioned, which will allow us, as soon as we do a git push, to spin up a new backend in Docker that has a unit test for each request type in our system. It will set up some use cases from the game, run them, and check that the responses in the database and in the game are what they are supposed to be.

The question is why we use both ZeroMQ and Redis, and why we decided to use Redis as a publish/subscribe service and not ZeroMQ. Basically, when we started using ZeroMQ, we tried to use it in place of the custom in-house queue we developed, and we realized that it didn't support what we needed, and we already had Redis doing this kind of work. Redis is very powerful: it gives you blocking pops, publish/subscribe, key-value storage, everything you need. Since we were already using Redis for this kind of thing, we decided to use it for publish/subscribe as well. That's probably the main reason ZeroMQ isn't used as much in our system as Redis is.

So basically the question is why we used Tornado, and whether we tried other technologies, other WebSocket servers and things like that. When we started developing, we decided we needed a web server that would support a large number of persistent connections. We did some research, we read some documentation, and we decided that Tornado was, at that point, the best solution, because it had great support and it had integrations with Redis and with ZeroMQ, which was what we needed at the time. So basically that was the reason we chose it. But we are currently looking into alternatives, because, like I said, we are trying to unify the connections into a single connection, and that would let us try new technologies and see if something else fits the shoes Tornado fills right now.

That's a great question, I'm going to repeat it. The question is how we deploy a web server, because when you deploy a web server and you have persistent connections, those connections drop, and wouldn't the clients then send requests to a server that doesn't exist? The client is bulletproof against connection drops, so to speak. If a connection drops, the client knows immediately, because when nginx and Tornado drop it, the entire TCP connection is dropped, and the client then reconnects immediately to a new web server, which is deployed automatically. There is maybe a second or two of delay, but since we don't need immediate responses, we can send the messages in batches every four or five seconds, so that isn't an issue. We decided we can stop the Docker containers and start them again within a second, and the client will not know that anything happened; it will just reconnect again.
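A minimal sketch of that client-side reconnect behavior; the connection-opening function is an illustrative placeholder, since the real client lives inside the game, not in Python.

```python
import time

def connect_with_retry(open_connection, max_delay=5.0):
    """Reconnect immediately after a drop, backing off a little on failures."""
    delay = 0.5
    while True:
        try:
            return open_connection()    # e.g. re-open the WebSocket via nginx
        except OSError:
            time.sleep(delay)           # the containers are restarting; retry
            delay = min(delay * 2, max_delay)
```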
Okay, the question is why we used WebSockets, and only for the returning part. Because we wanted a persistent connection, and one of the solutions was a low-level TCP connection, but we wanted a connection at a somewhat higher level, like HTTP, because the requests that arrive at the server are also HTTP connections. So we decided to use WebSockets, because they are an upgrade to the architecture we already use, an upgrade of the connection we already use for receiving requests. We needed something with a stream-like connection, and the WebSocket fitted the bill perfectly here. So basically, that's it.
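As a closing illustration of that HTTP-in, WebSocket-out split: a minimal Tornado sketch where the push side pops a player's Redis response list without blocking the IOLoop. The key names and URL pattern are illustrative assumptions; this is a sketch of the idea, not the production code.

```python
import redis
from tornado import ioloop, web, websocket

redis_client = redis.StrictRedis()

class ResponseSocket(websocket.WebSocketHandler):
    """Push side only: responses queued by the workers go out through here."""

    def open(self, player_id):
        self.player_id = player_id
        self.alive = True
        ioloop.IOLoop.current().spawn_callback(self.push_loop)

    async def push_loop(self):
        while self.alive:
            # Run the blocking pop on a thread pool so the IOLoop stays free
            # to serve every other connection.
            item = await ioloop.IOLoop.current().run_in_executor(
                None, redis_client.blpop, "responses:%s" % self.player_id, 1)
            if item:
                self.write_message(item[1])

    def on_close(self):
        self.alive = False

app = web.Application([(r"/ws/(\w+)", ResponseSocket)])
# app.listen(8888); ioloop.IOLoop.current().start()  # behind the nginx proxy
```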