Hello, my name is Jachim Coudenys and I work for a company called Combell. We're a hosting company in Belgium, and next to that I co-organize a local user group in Belgium. Today we're going to talk about how Doctrine caching can skyrocket your application. I gave this talk before, and then I had this image as a background. As of last week I can use the real image instead of the simulation.

This talk is about Doctrine. I will first give a small crash course in Doctrine. Then I will talk about the internals, because by knowing how Doctrine works you can boost your application quite a bit. At the end I will talk about adding caching to Doctrine. Who uses Doctrine? All right. Excellent. There's a small disclaimer for this talk: knowing how the internals work will only skyrocket your application if you're actually using Doctrine, obviously.

So what is Doctrine? Doctrine is an object-relational mapper, and object-relational mapping is a programming technique for converting data between incompatible type systems using object-oriented programming languages. What an ORM basically does is map data from a database into objects that you can use in your application. And vice versa: it looks for changes in those objects and persists them back to the database.

Before we have a quick introduction to Doctrine, there's some terminology you have to be aware of. We have a concept called entities. These are just classes with an identity; they can hold data, collections and other things. We have a mapping mechanism, which tells Doctrine how to map properties to columns in database tables and how to map collections in an entity to foreign keys and relations in the database. We have repositories, which are a way of fetching entities from the object manager. And the object manager, or entity manager, is the way we interact with Doctrine to fetch entities from and store them in the database.
So there's a getting started tutorial on the Doctrine website, and there's a GitHub repository which contains all the code. I have a small fork of it which automates some things. So here is my fork, and I can run the tutorial and it will just show the PHP examples and then execute them, so you know what's going on.

First of all we run composer install, and once that's done we have the vendor directory here on the screen, and we also have some basic entities. We have a product, which has an ID and a name. Then we have a bug, and a bug has a description. It can be filed against products, so it has a list of products, and it has an engineer as assignee. This is the user entity, which again has a name and then some collections referencing the other entities. So we have three entities.

We have a mapping system, and the mapping system here is the annotation mapping. So here we say we have an ID value. We have a column of a specific type, so Doctrine knows it has to create a varchar. We have some relational mapping information, so Doctrine knows what to do.

After composer install we have some scripts that we can use, and here we see we have a schema tool. I will drop the schema because I will create it once again. Here we ask the schema tool to read all the annotations and then just generate a database for us. In our case it will generate a SQLite database. So here we have a driver and it will just create this SQLite database. So I have dropped it and I have a new database with all the schema definitions that are necessary to run this tutorial. It says don't do this in production, because it will just try to execute queries to get into the state it should be in. In production you should use migrations, and I won't talk about that here. So don't do this in production. So here we have a new script.
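To give a feel for what "read the mapping and generate a database" means, here is a minimal Python sketch. It is not Doctrine's schema tool: the `PRODUCT_MAPPING` dictionary, the `SQL_TYPES` table and the `create_table_sql` function are made up for this illustration, and the type mapping is an assumption, not Doctrine's.

```python
import sqlite3

# Hypothetical mapping metadata, standing in for what an ORM would read
# from annotations on a Product entity.
PRODUCT_MAPPING = {
    "table": "products",
    "columns": {
        "id": {"type": "integer", "id": True, "generated": True},
        "name": {"type": "string"},
    },
}

# Assumed translation from mapping types to SQL column types.
SQL_TYPES = {"integer": "INTEGER", "string": "VARCHAR(255)"}


def create_table_sql(mapping):
    """Build a CREATE TABLE statement from the mapping metadata."""
    parts = []
    for name, column in mapping["columns"].items():
        ddl = f"{name} {SQL_TYPES[column['type']]}"
        if column.get("id"):
            ddl += " PRIMARY KEY"
            if column.get("generated"):
                ddl += " AUTOINCREMENT"
        else:
            ddl += " NOT NULL"
        parts.append(ddl)
    return f"CREATE TABLE {mapping['table']} ({', '.join(parts)})"


def demo():
    """Create the schema in an in-memory SQLite database and insert one row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_table_sql(PRODUCT_MAPPING))
    conn.execute("INSERT INTO products (name) VALUES (?)", ("ORM",))
    return conn.execute("SELECT id, name FROM products").fetchall()
```

The point is only that the mapping is data: once it has been read, producing the DDL is mechanical, which is also why it is dangerous to let a tool run it blindly against a production database.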
A script that takes an argument and uses it as the name of the product, persists the entity into the entity manager and then flushes it. The entity manager makes sure it is flushed to the database and gives us the ID back. So if I execute it with ORM I will get ID 1. If I execute it with DBAL, the database abstraction layer, I will get ID 2. Now we have two products in the database, and if we list them, we can get a repository from the entity manager, find all the products that are available in the database and loop over them. No big surprise, we see what we just added to the database. We can find products directly by ID. Also pretty straightforward.

Here we can update the product. We find the product based on an ID that was given as an argument, we give it a new name and then we flush it to the database, and Doctrine takes care of checking what state was changed and what it should generate to update the database. So we update the first product to PHP. And that's it. The user is pretty straightforward too: I just create a new user and I get the ID back.

The next thing is the create bug script. I will try to scroll because it won't be that readable. For the bug we have a reporter ID and an assignee ID and then a list of products that we want to link the bug to. So we just fetch the users from the database, we create a new bug, add some values to it, then loop over every product ID, link the product to the bug, and then flush it to the database. Again no big surprise. We have a starting entity and Doctrine takes care of linking all the objects.

We can query some more using DQL. DQL is the Doctrine Query Language, a variant of SQL which is a bit simpler, or more usable, in the context of Doctrine. We use it to fetch data and then just loop over the data. Here we see that if we fetch the bugs, the reporter is also fetched and we can access it through the property. So we can just go through every relation if we want. So I filed one bug.
Here it shows up. No big surprise. Showing a bug is just the same, and as a final note we have a dashboard which gets all the bugs that I am the assignee of or that I reported. Again, nothing new: it shows me how many bugs are open and it displays all the bugs I filed.

I talked about repositories. A repository is a separate class, here the bug repository. I marked my Bug class as an entity and the bug repository as its repository class, so every time I request the repository, Doctrine will just give me that class. In that class I can group commonly used queries. So this is just a handy system to bundle all the queries. So no surprise, the output is exactly the same.

All right. So who learned something? All right. Excellent. So if you're not using Doctrine, this is how Doctrine works: you work with objects and you don't care how they are persisted in the database. The only thing we saw of the entity manager is that we persisted entities and that we flushed entities. But how does Doctrine do this?

This is a high-level schema of the internals of Doctrine. On the left side we have our own application, which can query or find entities. In the application we update them and then we flush them back through the entity manager. We can do that using repositories, or we can just ask the entity manager for a specific entity. On the other side we have the database abstraction layer. That's a different component. It takes care of the different SQL variants, for MySQL or another database, and it takes care of select, update and delete queries. The big chunk in the middle is the entity manager. The entity manager is the central point where our application communicates with Doctrine: it requests entities and gives them back to Doctrine to manage them and to calculate the differences. Inside the entity manager we have a unit of work.
And a unit of work maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems. So everything we give to the entity manager is stored internally in the unit of work, and when something has changed, we ask the unit of work what should be done to persist the new state to the database. It keeps track of all the changes that have been made, so Doctrine knows what to write to the database.

The unit of work is also responsible for calculating all the differences, and it uses a concept called transactional write-behind. Transactional write-behind means it creates a very small window in which it opens a database transaction and flushes all the changes it has calculated to the database. Say you have a script or a web page that takes five seconds to load. Imagine you open the database transaction, then fetch all your records, then do some manipulations on them, and only at the end of the request commit that transaction. Then you hold a lock on your database for five seconds. With transactional write-behind you do all the logic in your entities and then give them to the entity manager, and the moment you call flush it calculates everything that has to be done, opens a database transaction, sends all the update or delete statements, and commits. So if you change your entity, it's not directly persisted to the database.

Inside our unit of work we have an identity map. An identity map ensures that each object gets loaded only once by keeping every loaded object in a map, and it looks up objects using the map when referring to them. So instead of always going to the database and fetching the record with a given ID, it will first look whether it is available in the identity map, and it does that to avoid having multiple instances of the same object.
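The two ideas above, write-behind and the identity map, can be modelled in a few lines. What follows is a rough Python sketch, not Doctrine's implementation: the `UnitOfWork` and `Product` classes, their method names and the `log` attribute are all invented for the example, and change detection is reduced to comparing a single `name` property against a snapshot.

```python
import sqlite3


class Product:
    def __init__(self, name, id=None):
        self.id = id
        self.name = name


class UnitOfWork:
    """Toy unit of work: identity map plus transactional write-behind."""

    def __init__(self, conn):
        self.conn = conn
        self.identity_map = {}  # id -> the one Product instance for that id
        self.originals = {}     # id -> snapshot of the state last seen in the database
        self.new = []           # entities scheduled for insertion
        self.log = []           # every SQL statement that was actually sent

    def _execute(self, sql, params=()):
        self.log.append(sql)
        return self.conn.execute(sql, params)

    def _register(self, product):
        self.identity_map[product.id] = product
        self.originals[product.id] = {"name": product.name}

    def persist(self, product):
        # Only schedules the insert; nothing is written until flush().
        if product.id is None and product not in self.new:
            self.new.append(product)

    def find(self, id):
        # Look in the identity map first, go to the database only on a miss.
        if id in self.identity_map:
            return self.identity_map[id]
        row = self._execute(
            "SELECT id, name FROM products WHERE id = ?", (id,)
        ).fetchone()
        if row is None:
            return None
        product = Product(row[1], id=row[0])
        self._register(product)
        return product

    def flush(self):
        # Compute the change set first, outside of any transaction.
        dirty = [
            p for p in self.identity_map.values()
            if p.name != self.originals[p.id]["name"]
        ]
        if not self.new and not dirty:
            return
        # Short transaction window: write everything, then commit.
        with self.conn:
            for p in self.new:
                cursor = self._execute(
                    "INSERT INTO products (name) VALUES (?)", (p.name,)
                )
                p.id = cursor.lastrowid
                self._register(p)
            for p in dirty:
                self._execute(
                    "UPDATE products SET name = ? WHERE id = ?", (p.name, p.id)
                )
                self.originals[p.id] = {"name": p.name}
        self.new.clear()
```

With this model, `persist` followed by a slow request sends no SQL at all; only `flush` opens the transaction. Calling `find` twice with the same ID returns the same instance and adds a single `SELECT` to `log`.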
So if you have a user with my name and we know it has ID 1, and we fetch ID 1, it is stored in the entity manager, in the identity map. If I then do a query, give me all the users with first name Jachim, it will run the query, but then it will see it already has that entity in the identity map and it will just return that one to avoid having multiple instances, because otherwise, when you flush the entity manager, it wouldn't know which instance is the correct one.

So entities are stored in an identity map and they have a state. We have four states in Doctrine. We have the new one: this is where I have just created a new product and I haven't persisted it to the entity manager, so Doctrine doesn't know about it. We have the managed one: that's from the moment I persist the entity in the entity manager, or if I fetch it from the database, then it's managed too. It can be detached, if you explicitly ask for it to be detached or if you serialize it, because then it's not an object anymore and so it gets detached. Or it can be flagged for removal. I said we use transactional write-behind, so if you request a removal, it's not directly removed; it's just flagged as removed, and when you flush the entity manager it will effectively remove it from the database.

So this is the unit of work, and then here we have persisters and hydrators. A persister is some kind of serializer to SQL: it has all the changes and it knows how to save them to the database. The hydrator takes the raw data and hydrates it into objects. We have to keep hydration in mind, because this talk is about skyrocketing your application. Hydration can take quite a while and sometimes you just don't need it. If you have raw data and you just have to display it, you don't need to hydrate objects, and you get a bit of a performance boost. But hydration just takes raw data and puts it in an object that you can interact with.
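As a small illustration of what the hydration step does, the Python sketch below turns the same flat, joined rows into either an object graph or a nested array. It is only a model under assumptions: the `ROWS` data, the `Bug` and `Product` dataclasses and both function names are invented here, and a real hydrator does far more work per row, which is where the cost comes from.

```python
from dataclasses import dataclass, field

# Flat rows as a database driver returns them for a bug joined with its
# products: the bug columns repeat on every line.
ROWS = [
    {"bug_id": 1, "description": "Crash on save", "product_id": 1, "product_name": "ORM"},
    {"bug_id": 1, "description": "Crash on save", "product_id": 2, "product_name": "DBAL"},
]


@dataclass
class Product:
    id: int
    name: str


@dataclass
class Bug:
    id: int
    description: str
    products: list = field(default_factory=list)


def hydrate_objects(rows):
    """Object hydration: one instance per identifier, linked together."""
    bugs, products = {}, {}
    for row in rows:
        bug = bugs.get(row["bug_id"])
        if bug is None:
            bug = bugs[row["bug_id"]] = Bug(row["bug_id"], row["description"])
        product = products.get(row["product_id"])
        if product is None:
            product = products[row["product_id"]] = Product(
                row["product_id"], row["product_name"]
            )
        bug.products.append(product)
    return list(bugs.values())


def hydrate_arrays(rows):
    """Array hydration: the same nesting, but plain dictionaries and lists."""
    bugs = {}
    for row in rows:
        bug = bugs.setdefault(
            row["bug_id"],
            {"id": row["bug_id"], "description": row["description"], "products": []},
        )
        bug["products"].append({"id": row["product_id"], "name": row["product_name"]})
    return list(bugs.values())
```

For a page that only displays the data, the nested array carries everything that is needed; the objects only pay off when you want to change them and flush.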
Another concept that is used by the entity manager is proxies. Proxies are a lazy loading mechanism: a proxy is an object that doesn't contain all of the data you need, but it knows how to get it. Doctrine uses this in collections. If you ask for a user and the user has assigned bugs, then without proxies or lazy loading it would just fetch them all, which would load the complete database into memory, because it's all related. But you don't know when you will need it. So instead of loading the relations at once, it will just create a proxy object that knows how to load the objects. So the entity manager just gives you fake objects, but you shouldn't care, at least for now, why it does that.

So now we know the internal workings of Doctrine, and now we can run the tutorial again. If I run it again with the SQL logger, and this is just a flag, every SQL statement that is executed will show up here on screen. So again composer install. Here we drop all the tables from the SQLite database. Here we say create our schema, and you can see that it creates the products table and all the necessary information, and it creates indexes on all the foreign keys. So Doctrine does a lot for you and you don't have to care about the internal workings. If we create a product, the moment we flush it will just start a transaction, create an insert statement and commit it.
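Coming back for a moment to the proxy objects: the idea of "an object that doesn't contain the data but knows how to get it" can be sketched as follows. This is an illustrative Python model, not Doctrine's proxy classes; `LazyCollection`, `load_products`, `PRODUCTS_BY_BUG` and `QUERIES` are hypothetical names, and the "database" is just a dictionary.

```python
PRODUCTS_BY_BUG = {1: ["ORM", "DBAL"]}  # pretend database table
QUERIES = []                            # pretend SQL log


def load_products(bug_id):
    """Stand-in for the query that fetches the products linked to a bug."""
    QUERIES.append(f"SELECT products for bug {bug_id}")
    return PRODUCTS_BY_BUG.get(bug_id, [])


class LazyCollection:
    """Knows how to load its items, but does so only on first use."""

    def __init__(self, loader):
        self._loader = loader
        self._items = None

    def _load(self):
        if self._items is None:
            self._items = list(self._loader())
        return self._items

    def __iter__(self):
        return iter(self._load())

    def __len__(self):
        return len(self._load())

    @property
    def initialized(self):
        return self._items is not None


class Bug:
    def __init__(self, bug_id, description):
        self.id = bug_id
        self.description = description
        # No query yet: the collection only remembers how to fetch itself.
        self.products = LazyCollection(lambda: load_products(bug_id))
```

Creating a `Bug` and reading its description leaves `QUERIES` empty; the first loop over `bug.products` adds one entry, and later loops reuse the loaded items.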
The same with the DBAL product. If we list the products, it will just create a select statement. To show a specific one, it will again use a select statement and, surprisingly, add a WHERE clause to it. And here we have the update. We first need to load the entity into memory, so it does a select statement. We change some things, we flush, and when we flush it will see that only the name has changed. So it will open a transaction, this is the transactional write-behind, and update the data in the database. Same for the user, in a transaction.

And here is where it gets interesting. We have the create bug script, and as you can see here we find a user with ID 1, which is my ID, and then we find the same user, also user 1, but use that as the engineer. Then we create the bug itself, loop over every product ID, find the product, link it to the bug, and then flush. If we execute the script you will see that it performs only one query for the two users. We requested two users with the same ID, so the second time it skips the database query and just fetches the user from the identity map. So we have two users with one query. We have the products, and when we then flush to the database, Doctrine knows it has to create a new bugs record, and that we have a many-to-many relationship that it has to create statements for too.

Here we have the DQL. That is not understood by the database engine, so Doctrine has to create SQL for it. And then here we have a property that lets us loop over all the products, and this is where the lazy loading kicks in. Here we can see the query and the SQL query it has generated. These are many-to-one relationships, so we can just join the engineer and the reporter and it will load them in one go. Then we loop over the products, and products is a proxy object, so the moment I say getProducts it will execute the query and get them from the database. So
if I load the bug from the database and don't touch the getProducts method, it won't load all those objects. This is something you have to keep in mind to make sure you don't load the complete database into memory every time. So, show bug: here we see the proxy object again. We get the bug, and the moment we hit getEngineer it will lazy load the engineer information from the database. Here we have the DQL to SQL example, and the repository is just the same as before. So we saw all the concepts that we just talked about, and we can start using Doctrine a little bit smarter, to make sure we don't abuse it and it doesn't slow us down.

Another place where we can get some performance gains is understanding how the tracking of changes works. Every time we flush something to the database, we have to calculate what changes there were and how we should persist them to the database. I'll quickly go through some examples. If we create an object, it's in the new state. We persist it and it's in the managed state. We flush the entity manager and it will calculate all the changes, see that it is a new object, create insert statements and do the transaction thing. If we request an object, we first look in the identity map, we do a database query, we hydrate the object, and we save it in the identity map so the next requests will get it from there. If we update it, we first get it, then we change a property, and we persist it in the entity manager, but since it is already managed that won't do anything; we just let the entity manager know that we want to persist it. Then we flush again, it will see it's an already existing entity, it will create update statements for it and perform a transaction. When we delete it, we ask for removal, and only when we flush will it actually generate delete SQL to delete the
object. So you see that we always have this calculate changes step, and the way it is done is by using a specific tracking policy. Doctrine has three different tracking policies. The default one is deferred implicit. It does a property-by-property comparison of all the properties in the object, and it does that for everything that is reachable from within the identity map, so everything that is loaded and all the relations it has. When you say flush to the database, it has a copy of all the entities that are actively tracked, with all the changes in them, and it has a copy of those entities in the state of the database. So for every entity it has, it will do a property-by-property comparison and then see: this one has changed, we have to update that; this one has changed. So this is the default one, but as you can imagine, if it does this on all the objects, it is the slowest one.

The second policy is deferred explicit. This is exactly the same, it uses a property-by-property comparison, but it will only do it on entities that are explicitly persisted to the entity manager. If they are loaded via a proxy object or something like that, it won't trigger the comparison. So here you can boost the performance a bit by stopping Doctrine from checking every entity it has in its identity map. It's as simple as adding an annotation which says deferred explicit, and it works in a different way.

The third one is the most performant one, and it's notify. It's the most performant one because the entity itself is responsible for publishing events when something has changed. This is kind of a bit crazy: you are using something like Doctrine to do all the calculations for you, and then you start using the notify tracking policy, where you are responsible yourself for telling Doctrine that this field has changed. So it's the most
performant one, because Doctrine doesn't have to do a thing. You do this by implementing the interface; the interface just builds a list of listeners, and then in your entity you have to manually notify all the listeners that a specific field has changed. So if you want to go for maximum performance, do this, but then your code base will be very polluted with, yeah, event emissions.

For the improvements, we can use a different result format, and this is the hydration step I mentioned before that we are skipping. This is usually done for read-only data, because you don't get an object back whose properties you can change and then flush back to the database. So it's more like: if you have big tables to show, you can use an array result or something similar. There is one with only scalar values, but it can also give you a nested array graph. So if we take the query I showed you before, where we do a join of a reporter and some products, and here we ask for the array result, it will just generate a nested array with all the data. Here you can see it is nested, because we have all the data from the user entity added under the reporter key. So if you're just fetching data to display on a page, don't bother with the hydration step, just use the plain array format.

If you have, for instance, a log entity, you don't want the entity manager to calculate all the changes on that log entity; you just want to read the entries out and display them. For that you can use read-only entities. It's just a flag on the entity where we say to Doctrine: fetch the data from the database, but when we hit flush, just skip all the instances of this class, all the product instances, because we don't want to store them back in the database. I have a quick demo for read-only. Here we fetch a bug, and the bug is not read-only, so this is the normal way of working. We have an existing bug and we create
a new bug. We persist the new bug and change the bug we already had, and if we flush to the database we should see multiple statements: an insert statement and an update statement. And if we change the new bug and flush again, we should see an update statement. So, no surprise: we have a select statement for the one we already have, then we create a new one and change a property of the first one, so we have an insert statement and an update statement, and then we change the new bug again and we have an update statement.

If we add the read-only flag to the product entity and do exactly the same, but with products instead of bugs, we will see select statements and insert statements, but we won't see any update statements. So here we can see that we have a select statement and the new one gets persisted, but the change to product one is just completely ignored, and if we change the new product it's again completely ignored, because we said it was read-only. So if you have entities that are only for displaying things and you don't want to persist them back to the database, just flag them as read-only and you're good to go.

Another thing you can use is extra lazy collections. We have the proxy objects, and these are the default way of loading collections. This is the lazy loaded one, where we inject the proxy object and the moment we use that method it will fetch the collection from the database. The opposite is eager loading. Eager loading will force a join every time we load the entity. For instance, if we say we want the bug and we want to eager load all the products associated with it, it will just use a big join and always load all the products instead of lazy loading them from the database. Extra lazy collections have another trick. Instead of always lazy loading the whole collection when you hit the property or hit the methods, if you do for
instance a count on the collection, it will see that it is a count, and it won't first load all the items from the database, do a PHP count on the collection and return that. It will see that it is a count and do an optimized select count on the database and give that back. It does the same if you ask: does this collection contain this entity? Then it will try to figure out whether it can do a different query instead of loading everything. And this is done by adding a property to the annotation where you say how you want the fetch to work.

So this is everything I can tell you about the internals. Just by using your knowledge of the internals of Doctrine you can avoid a lot of the things that make ORMs kind of hated by some people, because they just don't understand how the internals work, and they are abusing the entity manager and saying: this is very slow, it's loading too much from the database. ORMs can't solve every problem, so you have to know what they are good at and what they're not good at. I hope you now have a better idea of when to use the ORM and how to optimize some constructs.

But if that's not enough, you can use caching in Doctrine. First of all we have a component called Doctrine Cache. This has nothing to do with the ORM; it is just an interface which defines that we want to fetch data by a cache ID, we want to check whether it is already cached, we want to save some data under a specific ID, and we want to delete a cache ID. It has a lot of drivers: you have Memcached, you have database drivers, you have file caching. And it is used not only in Doctrine but in other projects too. So this is the component, and Doctrine ORM uses this component, so the ORM can benefit from the caching too. We have different caches in the ORM. The first cache is the metadata cache, and the
metadata are, in our example, the annotations where we say to Doctrine: this property maps to this field in the database. You can do this with annotations, YAML or XML. In 3.0 you can't use YAML anymore because it will be removed, so you have XML or annotations. But all that parsing is something you just have to cache, because it's useless to redo, for every request on your website, all the logic of: okay, this field is mapped to this property. So this is a cache you have to have on by default. If you don't use the metadata cache you just don't care about performance, and you can leave the room, because you don't care about speeding up your website. The metadata cache is just a flag, or a configuration injection, where you say this is the kind of cache driver I want to use for the annotations or the XML. It will parse all the mapping information once, put it in a cache, and then it won't have to be parsed again.

The next one is the query cache. The query cache is not the same as query caching in MySQL. We have a dialect, the Doctrine Query Language, and that has to be translated to an SQL statement that the database engine can understand. Again, if you do this on every page request, you don't care about performance. So those two are the ones you have to enable by default, and you will see a jump in performance with two simple steps.

The third option is the result cache. The result cache stores your SQL query result in a cache that Doctrine can read again, so you can skip the round trip to the database. It stores the raw data, so you still have to hydrate all the objects, but that's not really an issue: if you can remove the network latency and the round trip to the database, you will already see a big performance boost. The thing to keep in mind is how joins work, like I said before. If you load a bug and you want to do a join with all the products, it will have a very
big table where the products are listed one per line, but the bug information is repeated every time. This is how joins work, and that information will be stored as is. So if you have very big join results, your cache will grow. Storage is cheap, but this is something you have to keep in mind. And you still have to hydrate objects.

Something you can do to go even further is custom caching. Custom caching is caching the hydrated results instead of, thank you, instead of always going through the hydration loop again. This can easily be done by building a decorator on the repository: instead of fetching all the data from the database and hydrating it, you store it hydrated in the cache and just bypass the repository altogether. But it's generally a bad idea, just look at the picture. It's generally a bad idea because you are serializing entities, and entities have collections, so you are serializing collections. Serializing and unserializing objects with big collections and lazy loaded collections is just something that will blow up in your face. And like I said, the moment you serialize an object it gets detached, and the object manager won't know it exists.

This is a pull request from 2011. Result caching used to cache the hydrated result, but they reverted that for the reasons I just mentioned. It's a real pain in the ass to figure out what was already hydrated, register it with the object manager and say: you already know this object, here is a cached version. So they just skipped it altogether and hydrate every time. But PHP 7 solves a lot of those issues too, so it's not really an issue anymore.

So those are the basic caches that are available in Doctrine. And there's a new thing since 2.5, and that is second-level caching. Second-level caching is, surprisingly, just like second-level caching in a computer. So
if we do the comparison with Doctrine: the CPU is the unit of work, and it has the level one cache, which is the identity map. Then we have the entity manager, which stores non-hydrated data close by in the second-level cache, and if it's not in the second-level cache it will just go to memory, in our case the database. So this one is small but fast, this one is bigger but slower, you know why; the speed goes down the further you go. So if you have second-level caching you can speed up your application a bit more.

It doesn't cache entity instances. If you enable the second-level cache, you only store identifiers and values. It doesn't store hydrated entities; you just have plain data, and it still has to hydrate that into an entity. It is best suited for read-only data, but you can also use it in the normal way, where you just persist things to the database; I will show that in a moment.

We have three types. We have entity data: the entity data is stored under the identifier, with the raw data as values. We have collection data: it stores the owner ID and then a list of IDs of the referenced entities in that collection, so no values are stored here. And we have query data: query second-level caching takes the generated query, hashes it, and just saves a list of the IDs that match that result, and it keeps the list of IDs in permanent cache storage. So if you use caching on the collection or the query, you must also use entity data caching, or you won't be able to find the data.

Entity caching is straightforward: you just add a cache annotation and you have entity data caching. The same goes for association caching. So it is pretty simple to enable it and get the benefits from it. The caching has a concept of regions. Regions are used to invalidate caches and to have different TTLs on caching, so every type
has its own region; every entity, every collection has its own region. This is used to label or group a bunch of cache entries that you can delete safely.

You have some modes you can use. You have read-only, which is the safest one, because if you update things it will just skip the caching. You have non-strict read-write: this is a read-write mode without locking, so it will just make a best effort at updating your data. And then you have read-write, which uses a lock on your caching layer, but your caching driver must support the locking. So we saw the entity and the collection cache. For the query cache, we just create a query, this is what we will query, and we say we want the result to be cached. I will skip the different modes.

The second-level cache is ignored by delete and update queries; they just bypass it, because you can't cache delete or update queries. They are sent directly to the database. The problem there is that this won't update the cache, so you have to do that kind of manually, but you can do that with some methods on the entity manager. Here we have a hint where we say: when we perform the query, we just want to evict the complete cache, so everything will be deleted. That's the safest option, but, yeah, you won't have any cache anymore. If we know we are updating a country, we can say we want to evict the entity region of this entity, so all the other caches stay intact and we affect only that entity region. And to be more specific, if you're updating one specific entity, you can evict that entity all by itself.

There's also a lot of logic in the background. You have a concept called timestamp regions, and this is used by the persisters in the background,
So if we fetch everything, it will store the query that was used here, and it will store every entry in a timestamped region. If we ask for it again, it will just load it from the cache. And if we update the country and flush it, the next time the persister will see that the specific cache region has a different timestamp than the previous one. So it also does some kind of magic to help you with second-level cache invalidation.
There are some limitations. If you're using the second-level cache, you can't have a back-office system or something like that, because the database will be updated from another application while you are looking at your second-level cache close by, so you won't see any updates. You can use it if you also invalidate the caches from the other side, but that is going down the rabbit hole. The other limitation is that you must use single primary keys, but that's not really an issue, because it's easier anyway to use single primary keys.
I have a small tutorial on that, but I don't know if I have time to show it. We'll just try it. So, second-level cache. I will store my cache on the file system. Here I just say that I want to create a caching factory with default regions. Per region you can specify a TTL and things like that, but we just use the default one. We say we want to use the file system cache in this directory, and then we just enable second-level caching and inject the factory into the configuration, and the entity manager knows it has to use second-level caching. So here we add read-only caching to the products, within its own region. If we list the products the first time, it will execute the SQL select statements, and here we can see that this triggers the second-level cache to store its data. So we have three data files with the data of the
entities, and we have a query cache entry, which just has all the IDs of the other entities. So this is the content of the data: we have a cache entry with an ID and the underlying plain values. And here we have the query cache: the SQL is encoded in the key, and we have a list of identifiers that match the query. So here we execute it again, and we see the select statement isn't executed.
The same goes for products. We have the bug, and we mark the products as cached. The first time, it will fetch the bug and it will do the lazy loading of the products. The second time, it has the products in cache and it won't execute the SQL anymore. This is an example of the query cache, where we just say we want a product with a specific name and we enable caching on it. So the first time one query, the second time no query. And that's the example of second-level caching.
In conclusion: keep in mind how the internals work to make sure your performance is getting better. Always use the query and mapping caching, and use opcode caching, because if you don't, you're just not interested in performance. If you have heavy queries with a large result set, cache those queries. And give second-level caching a try. You will see whether it's suitable for your application or not, but even if you're only caching the entities and not the queries or the collections, you will already see a performance boost, because the data is cached locally. And finally, the query cache of Doctrine is not the same as the second-level cache query cache, because the former stores the raw data from the database and the latter stores a list of identifiers. So it's not the same. The slides will be available online on joind.in. Please give me feedback, and thank you for your attention.
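As a companion to the demo, here is a toy Python sketch of how a second-level query cache entry relates to entity data entries: the query key is a hash of the query, the query entry holds only identifiers, and the entity entries hold raw values that are hydrated on every call. Everything here (the `Product` class, `ROWS`, `find_by_name`, the hashing scheme) is a hypothetical illustration and not Doctrine's implementation.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Product:
    id: int
    name: str


# Stand-in for the database table (made-up sample rows).
ROWS = {1: {"name": "Widget"}, 2: {"name": "Gadget"}, 3: {"name": "Widget"}}

entity_cache = {}  # identifier -> raw values (entity data)
query_cache = {}   # hash of query + parameter -> list of identifiers
db_queries = []    # log of queries that actually reached the "database"


def find_by_name(name):
    sql = "SELECT id, name FROM products WHERE name = ?"
    key = hashlib.sha1(f"{sql}|{name}".encode()).hexdigest()

    ids = query_cache.get(key)
    if ids is None:
        # Cache miss: hit the database, then fill both caches.
        db_queries.append((sql, name))
        ids = []
        for pk, values in ROWS.items():
            if values["name"] == name:
                ids.append(pk)
                entity_cache[pk] = dict(values)
        query_cache[key] = ids

    # The query entry only has IDs, so the entity data cache is
    # required to turn them back into objects (hydration).
    return [Product(id=pk, **entity_cache[pk]) for pk in ids]


first = find_by_name("Widget")   # one query reaches the database
second = find_by_name("Widget")  # served from the caches
```

In this sketch the second call returns equal but freshly hydrated objects, and removing the entity entries would leave the query entry pointing at data that can no longer be found, which is why query and collection caching depend on entity data caching.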