Hi everyone, it's nice to see you here. Today I'm going to talk about Pony ORM, but before that I want to explain why we decided to develop yet another mapper in the first place. In my previous life I was a technical lead at a company in New York. My co-founder Alexander has been interested in databases since we were students; databases became his passion, which he studied at Saint Petersburg University. At some point he came up with the idea of creating a new object-relational mapper, a good and efficient one. He shared this idea with me, and I just loved it. After years of developing products for such a dynamic field as finance, I saw the need for it: if such a mapper had existed, I would have used it. That is how we started working on Pony ORM together. Pony ORM is our mapper, and we tried to make it good and smart. What distinguishes Pony ORM from other mappers? First of all, it uses Python generators for database queries. Here is an example of a generator expression: it iterates over a sequence and can filter items with conditions. But a regular generator is evaluated over objects in memory, while a SQL query selects data from a database. Imagine what it would look like if we could use generators for querying data. That is exactly what Pony does. Here is an example of a query. The select function, provided by Pony, receives a generator as a parameter, but doesn't execute it. Instead, Pony gets the generator's bytecode, translates it into SQL, then sends this query to the database and creates objects in memory based on the query result. This function returns a list of objects. Product is a class declared earlier in the code; it represents the product entity and is mapped to a corresponding database table. So, let's compare the Pony syntax with other mappers. Here's the query in Pony. We select products with multiple conditions: the product name starts with A and the product has no image, or, instead of the previous two conditions, the product was added before 2014. In Pony, we iterate over the Product entity. Note how we can use standard Python functions and attributes: name is a string, so we can use the startswith function, and added is an attribute of datetime type, so we can use its year attribute here. This is the same query in Django. The Product class in Django has the objects attribute; it is a manager, which is used for querying this object. In our example, we have both "and" and "or" conditions. We can express an "and" condition just by providing a comma-separated list of arguments, but in order to express an "or" condition, we have to use the special Q object, and then we combine those conditions with a bitwise or operator. In Django, we have to use double underscores in order to apply a specific lookup, such as startswith, isnull and lt. In SQLAlchemy, the same query looks this way: we get the session object and query the Product class. In Pony, we used the p variable to apply the conditions, but in SQLAlchemy, we use the Product class itself. In order to get the year from the product's added attribute, we need to use the extract function. Also, we have to put parts of our condition into parentheses in order to keep the precedence of the bitwise operators. If we compare these queries, we can see that they are pretty similar in terms of length.
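To make that comparison concrete, here is a minimal, self-contained sketch of the Pony version of such a query. The Product entity and its name, image and added attributes follow the description above; the in-memory SQLite binding and the exact field types are just assumptions made for illustration.

```python
from datetime import datetime
from pony.orm import Database, Required, Optional, db_session, select

db = Database()
db.bind('sqlite', ':memory:')              # throwaway in-memory database for the example

class Product(db.Entity):
    name  = Required(str)
    image = Optional(str, nullable=True)   # e.g. a path or URL; None means "no image"
    added = Required(datetime)

db.generate_mapping(create_tables=True)

with db_session:
    # name starts with 'A' and there is no image, or the product was added before 2014
    products = select(
        p for p in Product
        if (p.name.startswith('A') and p.image is None) or p.added.year < 2014
    )[:]                                   # [:] executes the query and returns a list
```

The Django and SQLAlchemy counterparts express the same conditions with Q objects and with the extract function respectively, as described above.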
Some people like the Django syntax and some people like the SQLAlchemy syntax, but we like the generator syntax because it's easier to remember: it is native Python syntax. You work with objects which are stored in the database as if they were stored in memory. It makes development easier. Also, using generators is more performant than the query translation in Django or SQLAlchemy. This is because in Pony the result of the translation is cached. The select function receives a generator as a parameter, and the generator is a standard Python object, so we can use it as a key for getting the result of the translation of this query. In Django and SQLAlchemy there is no such key, because the filter function just receives a set of parameters, and that is why Django and SQLAlchemy have to translate the query every time it is executed. Pony performs the translation of the bytecode only once. With other mappers you can build a query step by step. How is that possible with generators? Pony also allows building a query step by step, adding conditions one after another. This is done with the help of the filter function, which receives a lambda expression as a parameter. This function returns a new query object with the applied condition. In this example, we add a couple of conditions, then order the result and slice it, and you can see the result after the last applied operation, the slicing. Now let's see how Pony translates generators into SQL queries. Some people may think that it is a complex process with too much magic in it. I want to show you how it works, so you can see that it's pretty fast and straightforward. The translation process consists of three steps. In the first step, we get the generator and its code object. As you know, Python doesn't provide an abstract syntax tree for generators, so we have to restore it from the bytecode. In the second step, Pony translates this AST into abstract SQL. It is an intermediate representation of SQL: it is almost SQL, but it's not a SQL query text yet, because databases have their own peculiarities in how SQL queries are written, and different dialects can vary. That is why we don't want to overcomplicate the translation part; we just put dialect generation as a separate step. So the first step is the decompilation of the bytecode and restoring the abstract syntax tree. For decompilation, Pony uses an object which implements the visitor pattern. It has a set of methods, and each method name corresponds to a bytecode command. We go through the bytecode from top to bottom and call the corresponding methods. While processing the bytecode, we accumulate AST nodes on a stack. Let's say we have such an expression and now we are going to translate it to SQL. We get the bytecode from the generator object, and this is the stack where Pony will accumulate the AST parts. Now we process the first command, LOAD_GLOBAL. We call the visitor method with the same name, and this method pushes a name node onto the stack. The next command is LOAD_FAST. Its name is different from the previous one because it loads a variable from a different namespace, but for the translator it does the same thing: it just loads a variable. Pony pushes the corresponding AST node onto the stack. The next command is LOAD_ATTR. We apply this command to the top of the stack: we wrap the existing node with a new one, a get-attribute node. The BINARY_ADD command pops two stack elements and then pushes a binary-add node which wraps those two existing nodes.
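As an aside, if you want to look at the raw material of this step yourself, the standard dis module can disassemble a generator expression's code object. This is only an illustration of the input Pony works with, not Pony's actual decompiler, and the variable names below are made up.

```python
import dis

# A generator expression with a shape similar to the example above.
# The names a and x stand in for an external parameter and another object;
# they are never evaluated, because we only disassemble the code object.
gen = (b for b in [] if a + b.c < x.y)

# On older CPython versions the listing shows LOAD_GLOBAL (a), LOAD_FAST (b),
# LOAD_ATTR (c), BINARY_ADD, LOAD_GLOBAL (x), LOAD_ATTR (y) and COMPARE_OP (<);
# newer versions use slightly different opcode names (e.g. BINARY_OP).
dis.dis(gen.gi_code)
```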
The binary-add node combines the previous two. Then we load the x variable and get the y attribute of x. Finally, we pop two stack elements and push a compare node which combines them. As you can see, this process is very straightforward; it doesn't have too much magic, and usually generator code is not that long. Also, we don't need to build a decompiler for the whole language here, because a generator uses just a subset of Python bytecode operations. As a result of the first step, we get such an AST. It is a tree representation of the same expression. The second step is the translation to abstract SQL. But which SQL should we build? Can we tell just by looking at this expression? It depends on the variable types. For example, a and c can be strings or numbers, and y could be a string or a collection. If a and c are numbers and y is a collection, then we should generate a numeric addition and a subquery. If they are strings, then we should generate a string concatenation. If the y attribute is a collection, then we should use a subquery. If all of them are strings, then we need to use the concat and like operators. If we analyze all these variable types inside the translator, we'll have a lot of if-then statements inside it, and that will overcomplicate the translator itself; it will be hard to maintain and extend such code. So we should somehow structure the logic of our translator in order to keep it simple, and for this purpose we use monads. A monad is a container which encapsulates the translation logic. Each node type has its own translation logic. The translator walks through the tree and delegates the translation to different monads. Each monad accumulates data and generates a part of the SQL query as the result. Pony has different monads for different types and different kinds of attributes. Each monad defines a set of permitted operations and also how it will be combined with other monads, and each monad generates a particular part of the resulting SQL query. Here I show you different monad types. We have a lot of them; this is just a part. String monads know how to perform operations with strings: for example, the string attribute monad can generate a join operation, and the string parameter monad puts a string parameter into a query. Using monads keeps the process of translation simple. We walk the syntax tree and process each node when we exit from it, walking the tree all the way down using the depth-first strategy. When we enter the addition node, we don't yet have all the information necessary for generating SQL, because we don't know the types of the nodes which lie lower, so we continue walking the tree all the way down. When we leave a node, we create a monad. Let's say that a is an external string parameter; then we create a string parameter monad here. Then we walk further. Before leaving the next node, we create another monad. In our example, b is the iterator variable, so we create an object iter monad. This monad knows which objects we are iterating over in our query. Then we walk further, and the translator processes this node. It doesn't analyze the node at all; it just delegates the get-attribute operation to the previously created monad. It says to the object iter monad: get the attribute c from yourself and create a new monad which will be the result of this operation. And it creates a string attribute monad, because it knows the type. Then we walk further, and now the translator takes the left monad and says: add yourself to the right monad.
By this time we know the types of all the operands, and they are strings, so we create a string expression monad. This monad can generate the part of the SQL for these nodes of our tree. This way all the processing logic is encapsulated inside the monads. It helps to keep the translator simple, and we can easily extend the translator by adding monads which can handle new types, for example. So we walk further, and at the end we have the top-level monad that holds the generated SQL for the whole tree. Here is the abstract SQL that we get as a result of the translation. It is stored as nested lists and resembles a Lisp structure: the first element of each list is a command, followed by the arguments of this command. For example, take this LIKE command: the first argument is the column y from the table T1, then goes the CONCAT function, which is followed by four arguments that should be concatenated. By this moment the process of the translation is almost done. The last step is to translate this structure into a specific SQL dialect. This is a very simple step. Pony has a set of database providers, and those modules hold the knowledge of the database peculiarities when it comes to SQL syntax. We walk our list structure, using the visitor pattern again, and combine the resulting SQL into a query string. Here we can see the result of the translation into the MySQL and SQLite dialects. That is it for the translation: this was the last step, and we have our SQL query, which can be cached and sent to the database. Now I would like to tell you about other Pony features. Pony keeps objects in an identity map. It means that each object is loaded only once within the same database session. When we try to get the same object again, no query goes to the database; the object is returned from the map. Pony also has a solution for the N+1 query problem, which is when the mapper generates a lot of unnecessary queries; I'll show you in a minute how it works in Pony. Pony can optimize queries: for example, it can substitute a subquery with a left join in order to improve performance. Also, Pony can work with the database using pessimistic and optimistic concurrency control. The pessimistic approach is when we lock a row in the database in order to prevent its modification by another transaction. With the optimistic approach, there is no lock: before committing, Pony just verifies that no other transaction changed the values of the attributes which it read. Pony also has an online entity-relationship diagram editor. You can build a diagram of your application online and get the code for your entities or the SQL scripts for creating tables. Let's look at this example. Say we use Django and we want to select a couple of students from the database; we also want to print out their names and their group IDs. How many queries will be sent to the database, and how many objects will be created after we execute this code? Let's see. After the first query is executed, a student object is created in memory. When we access the group attribute, a second query is sent to the database, and a second object is created in memory. Now we create another student object. Let's say that both students belong to the same group. When we try to access the group attribute, Django will create another group object, a copy of the existing one. This happens because Django doesn't implement the identity map pattern.
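Here is a rough sketch of the Django code being described. The Student and Group models and the group foreign key are assumptions for illustration, and the snippet presupposes a configured Django project, so it is not self-contained:

```python
# Assumed Django models:
#   class Group(models.Model): ...
#   class Student(models.Model):
#       name  = models.CharField(max_length=100)
#       group = models.ForeignKey(Group, on_delete=models.CASCADE)

s1 = Student.objects.get(pk=1)    # query 1: first Student instance
g1 = s1.group                     # query 2: a Group instance is loaded
s2 = Student.objects.get(pk=2)    # query 3: second Student instance
g2 = s2.group                     # query 4: another Group instance is created,
                                  # even though both students are in the same group

assert g1 == g2 and g1 is not g2  # equal by primary key, but two separate objects
```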
Each object exists independently, and Django doesn't track whether a similar object was already created. Let's see how it is implemented in Pony. The Pony syntax is a little bit different: as you can see, we use square brackets in order to get an object by its primary key. After we execute the first line, we have the following: a query went to the database and an instance of a student was created. Now Pony sees that there is a group object associated with this instance. It means that the group ID refers to a row in the group table in the database. Pony hasn't loaded this object yet, but it knows for sure that it exists, because the database is consistent. Pony creates a seed object. This is how we call an object which has only its primary key; the other attributes were not loaded yet, it's just like a seed of an object. Using this concept of seeds allows us to make some optimizations. For example, if you don't need any other attributes, just the group ID, Pony will not send any other request to the database, because the primary key of this group is already known. If we would like to get any other attribute of the group, then a request will be sent to the database. But in our example, we don't need to send any request, because the ID is present in the student table itself. Then we load the second student, and Pony sees that this object has the same group ID, so it links it with the same seed. If we need to get an attribute of the group object for one or both students, Pony will send just one query to the database. And there is a chance that we won't need to send any other request at all. The concept of seeds allows us to solve the N+1 query problem. Let's say we have an online store and we want to get customer orders whose total price is greater than a thousand. In this example, we use the order_by method. We sort the result in descending order because we want to see the latest customer orders first. Also, we are getting the first page of the result. The page method is just syntactic sugar: we could use a slice, but it is easier to use page, because usually we want to show the results on a web page. And this is the SQL query which will be generated for the first two lines. When we start iterating over the order list, this is the case where most mappers run into the N+1 query problem: a mapper generates a separate SQL query in order to load each customer to show its name, and N here is the number of items in the orders list. In Pony it works this way. The first query loads the orders into memory, and Pony creates seeds for all the customer objects; it can see which customers might be needed if we iterate over the list. And when we iterate over this list and access the first customer attribute, Pony loads all the seed objects with one SQL query instead of N queries. This query selects all customer objects whose IDs belong to the list. In terms of query cost, it is not a big difference whether we load one row from a table or ten rows. The performance degradation usually happens when we do a lot of database round trips. Of course, there is a chance that we don't need all the objects. But usually in a web application we don't load thousands of objects; we load just a part of them in order to show them on a web page. That is why loading all the objects with one query increases the overall mapper performance.
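A self-contained sketch of that orders example follows; the Order and Customer entities, the attribute names and the in-memory database are assumptions made for illustration, not taken from the slides.

```python
from datetime import datetime
from decimal import Decimal
from pony.orm import Database, Required, Set, db_session, select, desc

db = Database()
db.bind('sqlite', ':memory:')

class Customer(db.Entity):
    name   = Required(str)
    orders = Set('Order')

class Order(db.Entity):
    total_price  = Required(Decimal)
    date_created = Required(datetime)
    customer     = Required(Customer)

db.generate_mapping(create_tables=True)

with db_session:
    orders = select(o for o in Order if o.total_price > 1000) \
                 .order_by(desc(Order.date_created)) \
                 .page(1)                  # first page of results
    for o in orders:
        # The first access to o.customer.name loads ALL customer seeds
        # for this page with a single query, instead of one query per order.
        print(o.customer.name)
```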
In Django, of course, we can use select_related or prefetch_related, and other mappers also have similar features. But the thing is that the programmer needs to think about it himself. Sometimes newbie programmers put select_related everywhere, where it is needed and where it is not. If you put it everywhere, it hurts performance; but if you don't put it where it is needed, it hurts performance too. Having this concept of seeds allows us to avoid the N+1 query problem automatically, and you don't need to think about where to put it. Actually, it's a smart way to do prefetch_related. Also, select_related generates queries using joins in order to pull all the necessary attributes from other tables, and if you do a lot of joins in the database, it can be less performant than just selecting data from one table. So when we select those customer IDs and customer objects in a separate query, we can join those records in Python, and it can be more performant and more scalable. Another thing that we have in Pony is automatic query optimization. Here's an example. When we translate this generator to SQL, we get this query. It contains a correlated subquery, which uses values from the outer query. A correlated subquery has to be evaluated for each row selected by the main query, and this can be inefficient. There is a well-known technique of replacing correlated subqueries with left joins, and Pony applies it automatically. Now let's talk about transactions. Let's say we need to write a function that transfers a specified amount from one account to another. If we use Django, we could start with something like this. Here we get the first account object by its primary key and check the available amount. If there are not enough funds, we raise an exception. Otherwise we decrease the first account and save the object, then get the second account, increase it and save it. Are there any problems with this function? The problem is that these two account operations are not performed in a single transaction: we have two separate transactions here. And if we lose the database connection after the first save completes, we can leave the database in an inconsistent state. We can of course try to solve this problem by adding the transaction.atomic decorator. Does it solve the problem? Not necessarily, because this function can still overwrite data if it was changed by a parallel transaction. When we call save, it doesn't take into account whether this data was changed by another transaction, and we can end up with an inconsistent database again. In order to avoid this, we can use a special F expression or lock the rows in the database, and for this purpose we can use the select_for_update method. This way we lock the rows in the database, and no other transaction can change them until our transaction is finished. The inconvenience is that the programmer needs to call the select_for_update method explicitly. Often an application consists of several tiers, and in a more complex application we can get data in one tier and then pass it to another. At that point we can simply lack this knowledge and overlook the place where the data should have been locked in the database. And if we use select_for_update everywhere in order to avoid this problem, it will again hurt performance. In Pony you can use select for update too, but besides this, Pony uses the concept of optimistic locking. Here is how we can write our function in Pony. The db_session decorator is mandatory when we access the database.
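A minimal sketch of what such a transfer function could look like in Pony; the Account entity, its amount attribute and the in-memory database are assumptions made for illustration:

```python
from decimal import Decimal
from pony.orm import Database, Required, db_session

db = Database()
db.bind('sqlite', ':memory:')

class Account(db.Entity):
    amount = Required(Decimal)

db.generate_mapping(create_tables=True)

@db_session
def transfer(from_id, to_id, amount):
    acc1 = Account[from_id]              # get the object by primary key
    if acc1.amount < amount:
        raise ValueError('Insufficient funds')
    acc1.amount -= amount                # no explicit save() calls:
    Account[to_id].amount += amount      # changes are flushed when the
                                         # db_session scope ends
```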
It wraps the database updates in a single transaction. We get the first account object and check the available amount. If there are enough funds, we decrease one account and increase the other. You can see that there is no need to call a save method, because Pony tracks the objects which were changed, thanks to the identity map, and all the changes will be sent to the database automatically when we leave the db_session scope. What will happen if a parallel transaction changes the amount of either account? Pony tracks the attributes which were read and written during the transaction. When we read the amount attribute of an account object, Pony remembers the old value of this attribute. Now, when it saves the updated value to the database, Pony adds an additional WHERE condition with this check. The UPDATE query returns the number of rows which were updated by the operation. If the value was changed by a parallel transaction, the UPDATE statement will return zero. This means that another transaction changed the value, and we have to roll back the current transaction in order to avoid database inconsistency. This approach is called optimistic locking. We don't lock the row, but we keep the old value of each attribute which we read, and when we commit this object to the database, we verify that it still holds the same state. This approach is more performant and scalable than locking the row in the database: it allows more parallel transactions to work simultaneously. Pony helps to avoid lost updates and keeps the database consistent automatically. Even if a programmer misses something, when the db_session scope ends Pony will check that the data is still in the same state. Another thing which we have in Pony is the entity-relationship diagram editor. You can create ER diagrams for your applications online, and this editor has several tabs. The first tab is where you design your diagram. Entities are represented with rectangles, and each entity has data attributes and relationship attributes. In Pony you specify a relationship with attributes on both sides; here we follow the rule "explicit is better than implicit", and it is easier to see what attributes an entity has when you look at it. We use a line to depict a relationship, and if it is a to-many relationship, it has a fork at the end. You can also switch to the second tab to get the Python code for your entities, and there are several other tabs where you can see the SQL scripts for creating tables in the database. So, those are the main features which I told you about today: generators and lambdas for database queries, the identity map, a solution for the N+1 query problem, automatic query optimization, optimistic transactions, and the entity-relationship diagram editor. Here is the roadmap of Pony ORM development. At the moment we are working on adding Python 3 support, because it is the most frequently requested feature. Then we will add Microsoft SQL Server support, because at the moment we support only four databases: SQLite, MySQL, PostgreSQL and Oracle, and we were asked about adding support for Microsoft SQL Server. After this is done, we will add migrations and asynchronous query support. That's it, thank you very much. Do you have any questions? Hello. Can you return to the slide with the db_session decorator? Oh, it's okay. With db_session. This one?
Yes, this one. How is it done? It is thread-local. So, something like that, thread-local. I didn't quite understand that question. I hope I'm not asking the same question again: if this now fails, so the update is not done because there was some other change, will it automatically retry to do the thing, or will it just fail? There is a parameter to this db_session decorator, which can also be used as a context manager: you can specify retry, for example 3 or 5, and it will retry. Okay. In the N+1 query solution, how does Pony know for which items to load the values in the second query? Well, when we execute a query... So you do the first query to load the orders, and then you have the second query for the customers. Because an order has a to-one relationship and a customer has a to-many relationship; let me show the diagram which I used for that example. See, when we go to orders... Yeah, I understand that there is a relationship, but how do you know which customers will be in the loop? Do you just... We don't know what we are going to loop over. We just create seeds for those objects which it has a to-one relationship to. When we start iterating over these objects, we understand that all the other objects might be needed. So if there is one customer.name, you load the name for all the customers? Yeah, when we send this request to the database it's not a big difference whether we load one row from this table or ten; not a big difference in terms of performance. So then you load the orders, or the customer seed objects that exist? Yes, we have all the seeds, and then we load them using this query, the last line, where the customer ID is in the list of IDs, and then you load all the seeds that exist. It seems to me that after you have realized the seeds, so they have become objects, there is a lot more caching happening, and I can imagine that can also be very dangerous. Are you using weak references, or how do you know, after you have fetched all these customers, that they can be... and do you somehow globally cache them so that other queries... Okay, so for order one you select customer one and you realize it, and then for order three you have some global reference to it, so that you can fetch it again? Yeah, I think I got your question, but you know, this identity map, this cache, exists within one transaction. So when we start our transaction we use db_session, and when the db_session scope is over, we just free the cache. And usually you don't work with millions of objects when you build an application, right? You can use bulk update if you want to update or work with millions of objects; usually you load, say, 10, 200, maybe a thousand objects in order to show them on a web page. Hey, comparing MySQL and SQLite: for example, MySQL has strongly typed columns and SQLite allows you to insert any data type that you want. How do you handle these heterogeneous data types? Do you just disallow them, or what do you do? How do we handle what?
In SQLite you can conceivably stick different data types into the same column; SQLite allows you to do that. How do you handle this in your mapping? Do you only allow insertion of specific data types, or do you filter them out, or what happens really? When we specify the type of an object... sorry, I didn't quite hear. Okay, let me restate that. In MySQL your column specifies what will be contained within it. In SQLite the column type is just a suggestion: you can still have different data types, whether it's integers, or text, or dates. How do you handle that? Well, in Pony we have different DB API providers, and in terms of Pony all those types are exactly the same; the DB provider takes into account the peculiarities of the database. We use the Python notion of datetime, and we represent it the same way in SQLite. Maybe you are talking about the difference in microseconds, for example, with this type in MySQL and SQLite; Pony takes that into account in this layer over the DB API driver, the particular database provider. We handle that. Hi, is it possible in Pony to iterate over results without having all the results in memory? Is it possible to iterate over results without having all the results in memory... You mean not load all the objects into memory, but iterate just in the database? We need that if we want to update a lot of records at once, a bulk update. For that purpose we have one method, but recently we were asked about adding a little bit more, and we are going to add more functionality to that, so you can update objects without loading them into the cache. So I guess we have time for a last question. Okay. Hi, you mentioned on one of your first slides that this particular ORM is fast; so what makes it fast, and how does it compare to others? Do you have benchmark data? Mhm. We ourselves haven't created robust and good benchmarks yet; we definitely are going to do that, but other users have tried to compare, and they reported something like 3-4 times faster. Why is it faster? First of all, it's because we can cache the result of the translation, as I said: in Django and SQLAlchemy there is no way to tell that we are executing the same query, while with Pony, when our program is being executed, we can see that this is the same generator, it has an ID, so we can immediately understand that we don't need to translate it again and can just get the result from the cache. Also, when we develop Pony, we are very picky in terms of working with references and objects: we don't do deep copies a lot, we pay attention to details and keep a low resource footprint when we work with objects and the cache. Performance is one of the goals which we keep in mind. Thanks a lot for your talk. (Applause)