 Hello all of you And thanks Michael for the invitation to here. Thanks for organizing everything Yeah, my name is Harald Zettelhofer Let's make it Harald Zettelhofer was a little easy for me to pronounce, but maybe not for others Yeah, I've my background is actually I've been working on Databases and PHP projects for quite a while now so I started with as a database consultant in the mid 90s and I've been working with PHP since 99 so also quite a while and Especially with with databases My my topic was always performance. So I've worked on quite large projects here for a large enterprises mainly on Oracle databases and Performance was always an issue here. So I was always the one that was called when the database was slow and we really had Projects here for large medical companies with document management systems with millions of documents and they had performance issues in their applications and At least The first impression was always that the database is slow but The more I worked with these projects the more I saw that it's not always the database most time the problems are Originated in the application itself and that's what I want to talk about today So we could fill a lot of talks with performance tuning for databases Today, I'd like to focus on what we can do from the database and why it's not always The database itself that is slow, but we can do a lot directly in the application Yeah, right now I'm working for for Dynatrace as a Technology strategist Dynatrace is a company in application performance monitoring. So We have a tool that monitors the application the full Transaction from the browser so from what the user is doing there all the way through the network load balances web servers caching layers Application down to the database and especially in the application really down to code level so we can see What's executed where performance issues are where errors occur and can do root cause analysis very easy here But to start with my presentation Let me tell you a little story here most of these stories start with once upon a time I'd like to start with as long as everything runs fine and runs fast We are like in paradise right so no problems No Discussions no no issues, but then all of a sudden when we have users out there that experience performance issues Then the problem starts so the user calls The application is slow. What's going on here? I can't work anymore and then in the in the operation Department typically the blame game starts so This is what what I saw with with many of these projects when someone was or something was slow First they looked at network monitoring locks. Okay network was fast different peoples got involved and A lot of times it goes to the development team them because when the infrastructure is running fine servers are running Memory is Available in Yeah Just enough CPU is fine then it goes to development and We already we also had such an example in-house where we had to create or that the developers had to create a report for Meeting rooms so a reservation report for meeting rooms and that report was just slow So it got back to development. What development did here is they added Simple logging logic Just they captured the timestamp before they accessed the data and the timestamp after they access the data Before they created the report and then they found out. Okay, that took 45 seconds And it was easy Not my fault so it has to be someone else or something else and if we Look at the code here and we see okay. It's load data for room It's probably loading data from the database. Okay, so it's the databases fault And it went all the way to the DBAs Well, the DBAs had their own tools To check the performance here and they looked at slow queries in the database and They found out that all of the queries that were executed there were actually faster than one millisecond So it was easy not my fault So whose fault was it then and then the blame game started so what what happened actually The problem was We executed for that request we executed 25,000 database statements and The 40 seconds that were really spent in the database For these 25,000 seconds we have yeah Each web request is faster than or in average 1.5 milliseconds so the database is not really slow here and This is what we see a lot of times so the database performance hotspots or Performance hotspots where we see that in the application most time is spent in the database Has several reasons on the one hand and this is in my opinion the most important. It's the application design and Then database queries how they are executed and how they are designed and For sure also the infrastructure of the database server and for sure the network with connections But to bring that into a pie chart most of the time is really Caused by the application So that means most of the performance issues of the database can be fixed in the in the application We don't need to bother the DBAs here in detail and this is Feedback that we get from a lot of developers out there it's Slow database queries and slow database queries is not necessarily because of missing indexes or whatever It's really the design of the query itself in efficient application code and too many database queries so too many database queries the example that we had before and That's this Factor here that I want to talk about in the next couple of minutes here too many database queries Where do they come from? Let's do an exercise here We have to create a code for creating a list of courses So we have schools departments in schools and each department offers courses when we consider the database now just as a data storage and we have Existing access code for the database and we can just focus on creating the code here to Create the results that we want and this is actually a situation that we see a lot of times when we are using ORM layers so object relational mappers access Libraries for databases like doctrine for PHP or for Java hibernate. So doctrine is actually based on this hibernate approach so access to the database is coded there and We just use these layers in In our code so actually the developer a PHP developer does not really need to care about databases care about SQL language care about what the database is doing there on the one hand It's easy for the developer because he can focus on on his duties on the other hand We will see what the disadvantages are of that. So in that case we have school entities We have a loop we get each school out of these set of schools We loop over the departments for each school We look over the courses for each school and then we print it out. So department name and the course title Let's imagine we have 10 different schools 20 departments per schools and 50 courses per department. So what would we? end up here what we What we see here is a typical We call it the n plus one problem So where does this n plus one come from n is actually the number of different entities and for 10 schools 20 departments and 50 curses per department plus one initial query to get the set of entities We end up in 1001 database statements Just to create that list so if We have a different approach here to not use these Libraries these existing libraries, but if we as a developer are able to Create SQL code directly and send it to the database. What would we get here? We could create one query a join query where we join the source tables and get back the result set and just loop over the result set and and Print out that list We end up in one Database statement instead of ten thousand and one and that's quite performance improvement The only thing we have to consider here is that as a developer Coding development PHP code. We are no longer just Focusing on on PHP here, but we also consider SQL Because we consider the database not only as a data storage, but as a Powerful engine not only for storing data, but also for getting data back and we can leverage that Receiving too many records. Let's get back to that example before we want to filter that list to only show Curses in category SQL with a level expert and Let's imagine we have in total three courses For that criteria we still would create 10,001 queries in the database because we are doing the filtering in the application This is also something that we can easily add To the query in the database just by adding a word loss Category and level so we receive back three records and still have one Database query and Especially for databases that already have a high number of connections that already have a high load This is very important because we can reduce the load to the database here reduce the number of connections and We can save a lot of Resources we can we We don't need such a powerful database machine here caching and I'm not going to talk about Application caches here As we are talking about applications, but let me talk about leveraging database caches by Doing the right things in the application itself and for databases that are used in from PHP a lot of times we see a lot of in the one hand my SQL on the other hand Oracle and the SQL cache or the data cache in these databases is Significantly different and that's why I'd like to mention that here. So the the my SQL query cache stores Results so really result sets for queries in the cache So whenever a new query with the exact same text is Executed in the database and data is available in the cache then the result set is sent back from the cache That means if you have a query Select whatever and the next query select in in upper case letters. It's a different query It's really a full text comparison The Oracle SQL cache is a little different here. It also stores the information for the text of the of the SQL statement, but it does not store the result set the data But it stores the execution plan That means if We send a query to the database the database Optimizer first has to decide how data is extracted from the data tables and This execution plan is or can be stored then in the in the SQL cache from Oracle The big benefit here is that this parsing of the statements are creating the execution plan or from a Programming language perspective. Let's call it compilation Is the most time-intensive process here and also the Process that requires most of the CPU time and that's why it's very cool to store that in the database and again here Select in lower cases and selecting upper cases are different statements So if we receive the first statement and then receive the second statement It has to be parsed again because it cannot be found in cache the difference now is that for prepared statement That we sent from the application In in Oracle No matter what the content for the for the values of the bind variables in prepared statement is we will see an example later We will in most times Get the same execution plan However, the data set the result set is different. So in in Oracle when the when when when the database checks content in the cache it considers only the statement without the bind variables while in my SQL as the result set could be different it has to Set the bind variables first to check whether data is available in the cache anyway, but we will have that in an example So for instance Small piece of PHP code We create a PDO connection to an oracle database and send four different queries to the database select star from country where code is for different values So that's for different statements It has to be parsed four different times because when we receive the second statement in the database The execution plan cannot be found in the cache. It has to be parsed again However, the execution plan is probably the same so it could be used from cash, but it cannot be found there and what we see in In in in a in a database metric dashboard here that most of the time is really spent for parsing the So so the the dark blue section here for parsing the statement and if we check the CPU We have a very high CPU usage because of this a lot of statements That has to have to be parsed again and again So how could that be done differently? We can prepare a statement and use a bind variable here and just for the execution we set the Value for this bind variable to different values and in that case for the execution the execution plan For that statement is taken from the sequel cache statement There's no requirement to pass the statement and this makes it much faster And we also see that then in the in the result dashboard we there are some Times for sure spent for for parsing because initially it has to be parsed to be stored in the cache But CPU usage is is very low and again this on the one hand Saves a lot of resources on the database server and saves a lot of time that is not Creating overhead performance overhead in our transactions anymore Another thing and this is already getting a little into the database is Using indexes in Some companies or some organizations indexes are just seen as responsibility of database administrators, so the developer Creates his code and the DBA has to make sure that the Indexes are created accordingly I have seen a lot of Organizations where the creation or the definition of the indexes in detail where under responsibilities of the developers so developers created their code created their queries their sequel statements and Developers also defined the columns that had to be indexed for a proper access of the table That's why I'd like to mention it here. So what is an index an index is a data structure? Where a search value is is referenced to a destination table just like in a book an index when you look for a certain keyword in a book You don't want to go through the entire book to find the word You reference it in an index and then you get a list of pages where you find information for that can see that In a similar way So an index is used in a in a sequel statement when we have a where clause for instance like we had before where we filtered the result set to Sequel courses with level expert we had this where clause and then such an index is Used or can be used As an example here, we have a sequel statement Select star from msd where length is 10 So what happens here when we look at the execution plan and in my sequel? This is easy just by using explain Then in that case we see that no index is used it tells us no possible keys and the result set is 70 something thousand rows So why is that in that case because when we look at the existing indexes show indexes for that table We cannot find an index on that color however, if we use a different column in the where clause if we Filter it on group ID then we have a possible key and it's also used and The resulting number of rows is only 15 So on the one hand, it's it's faster in the database because of the Rows that have to be extracted on the other hand Transferring the data over the network is faster because we have much less data to be transmitted in the application It's faster because we have much less row to be processed It's always good to use indexes here But indexes are not only used for for ware clauses indexes are also used when tables are joined for instance like we had The example before where we joined the schools with the departments and the courses if we Are not using indexes there then we would again have these 10,000 rows, but when we or Actually We still have the 10,000 rows, but we need to do a full table scan on each table and then match the Criteria defined in the joint so in this example here, I Have a couple of tables joined and I see that the execution plan Uses these different keys and has just a small number of result sets Resulting rows We have to be a little careful here because in in sequel statements. There are so-called index killers So we can have certain Structures there to restrict result sets or join tables that are just not using indexes and For joining tables on the one hand, it's very popular because it's a very easy way to Do restrictions to use a so-called in gloss where we have a sub query and then we have a Query on top of that and we say where ID in and then the list of That the resulting list of the sub query and this is this is what we call an index killer Because these tables cannot be joined with the index and that's why it the explain plan also says Possible key null and has the full number of rows here in the result set So very useful commands here in my sequel are for instance the explain and Then just the sequel statement because this shows the execution plan How that statement would be executed and also to show indexes for table names. So if you Do the explain select whatever and then you find out that your statement is probably not Executed as you would expect it then you can use to show indexes to check Whether the index that you would expect to be used is available at all or maybe there's a better index Maybe a different column would make more sense It especially with with complex queries with complex structures. This can be a little tricky and sometimes requires some Experience on on how to to optimize that in general from from the developers perspective, so I Saw that a lot of times and a lot of developers tend to say well, I Focus on the business logic. I don't care what the database is doing because this is used for data storage Others should care about that on the other hand. I'm really convinced that if a developer Has knowledge about what's happening in the database and why this is happening in the database then We can avoid a lot of troubles already in development because Performance is not just a nice to have feature performance is really essential performance is considered by by the Google search results in your result ranking of your of your website Performance is more and more important a lot of people expect in online job, so we typically see this level of three seconds so when when a page Renders longer than three seconds a lot of people already are leaving and not returning to the online job so In my perspective the database is not what's slow In general, but the way it's used from the application and that's why it's very important to make Developers self-sufficient so that they get an understanding what's happening in the database even if We have to use existing libraries and we cannot Define our sequel code by ourselves at least the developer should get awareness for what's happening there So if the developers is I will definitely run into a problem here I have to maybe Escalate that to a different level to give it to a different team. They have to create new methods new access methods for data in the database that are Fulfilling my requirements then this has to be done and With with APM tools performance monitoring tools We are really able then to see for a certain web request that's coming in what is Executed in the database for instance. What what statements does this the trigger or See that in a transaction flow to see okay I have one web request and this triggers So many Executions to the database. However this Too many database statements is actually also The same for external web services, so Database could be fast, but when we call database Thousands of times the web request will be slow the same as for web services If we have a web service an external one and that's fast, but we call it a lot of times Then our entire transaction will be slow For single select statements, we can have them listed then by execution time and also by number of Executions and then it's easy to Do a breakdown? Where is this statement used at all or? The other way around to drill down to the database as I mentioned before For from the application even if the developer does not have access to the database itself The developer can see the execution plan and can see how this statement is executed and maybe raise I need an index here or My table structure or my my my few that I'm using in the table So I can not only use tables, but Predefined queries that are stored in the database. We call that fuse are executed and the developer So the few is on the one hand completely transparent It can be used as a table But when we see from our development team already that there is a few in the background and The query the joints to that few is slow and missing indexes and maybe called 500 times Then we can raise that problem from here because the faster the sooner we find problems the cheaper It is to fix these problems the longer we wait maybe until production The more expensive it gets so what are The key takeaways that I'd like to give you is much time spent in the database does not necessarily mean that the database is slow Performance hotspots typically that we see are too many secret statements missing caching Not missing caching because the cache is not available missing caching because we are using bind variables The bad query design for prepared fuse for instance, so inefficient joins Index miss usage or not used at all or wrong indexes It's also not good just to index every column for sure to extract data It would be very fast, but once we are doing updates to a table Then the database does not only need to update the table, but it does need to update every single index here so the more indexes I get the faster my Select might be but the slower the updates will be We have to find the right balance here or undersized database servers, but for sure if we Follow other best practices. We don't need that many resources on the server for that and really let the developers know and create awareness in your development teams that they Really care for what's happening in the database and always keep in mind that DBA is our friend Thank you very much. So if there are any questions Please let me know or meet me during lunch. I'm happy to have a chat about that