Good morning. It's bright and early for me. Well, not really, it's nearly lunchtime. In this slot I'll be talking about "It's All About the Goto", which has very little to do with the evil word itself. We can have a discussion about that later if you want to. This is very technical, which is probably why you're here. If I'm going too fast explaining anything, please let me know, because if you miss one part, then you sort of get lost totally. So please let me know if something isn't clear or if I haven't explained it properly.

I'm Derick. I am European, long story; talk to me about it later as well. I'm living here in London, so this is my local conference. I work for MongoDB on the PHP driver. I will not be talking about that whatsoever, but I feel obliged to at least mention it. I'm the author of the PHP debugger called Xdebug, which you might have heard of at some point, and among the things I like are maps. I also like whisky, but my font doesn't have an icon for that. If there are any comments, let me know on Twitter as well; I'm happy to read them later.

All right, so let's get started. We're going to talk about a few different things: stages, conversion stages, and a conclusion. That's a great agenda, isn't it?

So for the stages, what I mean is the stages in how PHP executes code, from the moment it starts reading your code until it is executing it. There are four main stages. First, it parses your script and converts it into tokens. I'll explain what all of those steps mean in great detail after this. Then from the parsed script we create a logical representation, something we call an abstract syntax tree. I don't want to say that 700 times in a talk, so I'm going to call it the AST from now on. So we convert the tokens into an AST; I'll also explain what that is.
Then we create some executable code out of that, where we convert the AST into opcodes, and then we execute this bytecode. And I will show you that as well, of course.

Okay, so let's have a look at the first thing, the parsing stage. When PHP sees your script, it tokenizes the script into smaller bits so that something can make use of what it actually says. The way this works is that it's a very big state machine, starting with INITIAL. I can show you that, actually, because who doesn't like live demos? Is this big enough for the people in the back? It's big enough for me, but I'm not sure. All right, I really could do with three hands here.

So it starts in this state called INITIAL. INITIAL is basically the start of your script, where nothing has happened yet. If you have HTML in it, that is also INITIAL. Any keyword can change the state to another state. The first one is the PHP opening tag, which starts a new state called ST_IN_SCRIPTING. ST_IN_SCRIPTING basically means we're currently parsing PHP code. Now, there are many different things in there as well. So if we, for example, look at... I should have set a bookmark here, really. Where did I go? INITIAL. So there's ST_IN_SCRIPTING. Sorry, with this keyboard at this angle I can hardly type. You also get things like ST_DOUBLE_QUOTES. The moment it sees a double quote, it goes to a different state, because the parsing rules within a string are different from those in normal PHP code, right? There's variable substitution in strings, which you don't have in normal PHP code, at least not in exactly the same way.

So this is how it works: it reads a token, and based on the token it can sometimes enter a different state. What comes out of this in the end is basically a stream of tokens. But no meaning is given to these tokens yet, so PHP doesn't know what they actually do.
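The state changes described here can be observed from userland. The following is a minimal sketch, assuming PHP's bundled tokenizer functions token_get_all() and token_name(); the input string is a made-up example, not the script from the talk. It lists the token names for a snippet that starts as inline HTML, enters PHP code, and then enters a double-quoted string.

```php
<?php
// Hypothetical input: inline HTML first, then PHP with an interpolated string.
// Single quotes keep $name literal so the tokenizer sees it as source text.
$source = '<h1>Hi</h1><?php echo "hi $name!";';

$names = [];
foreach (token_get_all($source) as $token) {
    // Multi-character tokens are arrays [id, text, line];
    // single-character tokens (", ;) come back as plain strings.
    $names[] = is_array($token) ? token_name($token[0]) : $token;
}

echo implode(' ', $names), "\n";
```

The text before the opening tag shows up as T_INLINE_HTML, and the pieces inside the double quotes are reported as T_ENCAPSED_AND_WHITESPACE and T_VARIABLE rather than as one string token, which reflects the different rules that apply inside a string.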
They're just names for bits of your code. PHP comes with a tokenizer extension, which makes it handy and easy to visualize what it does.

So let's have a look at an example script. This is a very simple, stupid class that doesn't do anything. Dram.io is my side project, which I'm trying to plug a little bit: if you like whisky and rum, it's your website. But that's not what I want to talk about. I want to talk about what happens with it. This script has a namespace declaration, has a class name in it, and so on. It doesn't do anything useful whatsoever; it's just an example.

What comes out of the tokenizer is the following. You can see that the first thing in the script is your open tag, so that's where you get T_OPEN_TAG. If a token is anything more than a single character, the tokenizer gives you two things: the token number, which you can convert to the name of the token, and the contents of it. So T_OPEN_TAG comes with a string of five characters. You get the namespace keyword, which is a token. Whitespace is also a token, but I've ignored that here because it's whitespace and doesn't matter. You get T_STRING, which just says it is a string. So at the moment PHP reads "namespace Dramio", it doesn't know that Dramio is actually the namespace. It just sees a string, and only later stages do something with that.

And you have a whole bunch of other things: T_PRIVATE, T_VARIABLE, T_FUNCTION, T_OBJECT_OPERATOR, and T_PAAMAYIM_NEKUDOTAYIM, if you have ever seen this really long token name. It means double colon in Hebrew. Paamayim nekudotayim: I practiced pronouncing it a few times. Yes, definitely did that. And so on. So this is what comes out of the tokenizer. But as I said, no meaning is given to this yet.
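To reproduce a listing like the one on the slide, here is a small sketch using token_get_all() and token_name(). The class below is a hypothetical stand-in, since the actual script is not part of the transcript, and describeTokens() is a helper name invented for this example.

```php
<?php
// Hypothetical stand-in for the class on the slide.
$source = <<<'CODE'
<?php
namespace Dramio;

class Whisky
{
    private $name;

    public function __construct($name = null)
    {
        $this->name = $name;
    }
}
CODE;

/** Return one "TOKEN_NAME text" entry per non-whitespace token. */
function describeTokens(string $source): array
{
    $lines = [];
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // Single-character tokens such as ; { } = are plain strings.
            $lines[] = $token;
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_WHITESPACE) {
            continue; // ignored here, as in the talk
        }
        $lines[] = token_name($id) . ' ' . json_encode($text);
    }
    return $lines;
}

echo implode("\n", describeTokens($source)), "\n";
```

Note that the namespace name comes out as a plain T_STRING: at this stage nothing records that it is a namespace.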
Which means, as you can guess, there needs to be something that gives a meaning to those tokens, right? And that is what the parser does. If you've ever looked at the PHP source code, it is this file, not the Zend language scanner, sorry, the Zend language parser. That's a very big file, which I will show you here. If you go to the start of this and read through it, I will just skip through this; don't pay attention to all of what you're seeing, it's very complicated. And then you get these big tables of numbers. Nobody has written that themselves, of course; that's very difficult to do. This is clearly generated by a computer, because computers are kind of good at doing this. What it basically is, is a big state machine again. From every state there is, you can transition to other states. The parser is much more complex than the scanner. The scanner only creates, what is it, 400 tokens or something like that, maybe even less. Whereas the parser, well, it's very, very large. I'm just scrolling and scrolling and scrolling, as you can see. Nobody writes this code. I was going to show how many states there are, but I've scrolled too far. It's about 500 states as well.

That has changed quite a lot between PHP 5 and PHP 7. PHP 7 is a lot easier to read; even this is easier to read than the PHP 5 version. Because instead of going directly from interpreting the rules to bytecode, it now goes through another stage, this abstract syntax tree, which I will call the AST.

But let's have a quick look at how the parser rules work. At the start of your script, unless you have HTML in there first, which you really shouldn't be doing anymore, you get a list of what can be the top statements. The top statements are basically the first things a PHP script can contain.
So it can be a statement, which is any sort of statement you can come up with, or a function declaration statement, a class declaration statement (again, difficult to pronounce), a trait declaration statement, a namespace declaration statement, and so on. Those are, again, the initial states it can be in. What the parser does is apply rules: it tries to find the meaning in the tokens, figures out which rule applies, and then builds up this AST.

As a simple example, and I won't go into great detail about all of this, say that your file contains a PHP class, which is your class declaration statement. That consists of class modifiers, in pink here, which is quite easy to read, actually. Class modifiers is either a class modifier, or class modifiers followed by a class modifier, so you get some recursion in here. The reason it is done this way and not the other way around, where you have a class modifier and then class modifiers, is the way the parsing works. If you do it this way, it's much easier for the parser to figure out how to give meaning to things. It's difficult to explain. I had a whole compiler class at uni about how this works, so I won't bore you with that too much.

From the things that it finds, so it finds class modifiers, at some point it says: well, it's either T_ABSTRACT or T_FINAL. Those are the class modifiers. When it sees one of these tokens, it knows this rule applies and then it does something. The doing something is between the curly brackets. So basically this says: we set the value that was found to ZEND_ACC_EXPLICIT_ABSTRACT_CLASS. So when you use an abstract class in PHP, it will attach this flag to your class internally. Then when you do something with it, it knows you can't instantiate it, and things like that. So it's a whole tree that goes down.
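The effect of those modifier flags is visible from PHP code. This is a small sketch with hypothetical class names (Spirit and Whisky are not from the talk); it uses the Reflection API to read the flags back and shows that instantiating an abstract class is refused at run time.

```php
<?php
// Hypothetical classes, one per class modifier mentioned in the grammar rule.
abstract class Spirit {}
final class Whisky extends Spirit {}

// Reflection exposes the flags that were attached while compiling the class.
$isAbstract = (new ReflectionClass(Spirit::class))->isAbstract();
$isFinal    = (new ReflectionClass(Whisky::class))->isFinal();

try {
    new Spirit();                 // the engine checks the abstract flag here
    $message = 'instantiated';
} catch (Error $e) {
    $message = $e->getMessage();
}

var_dump($isAbstract, $isFinal);
echo $message, "\n";
```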
So once it finds a class modifier, it knows it is one of these options. This bar is an "or", just like you'd have in PHP, right? So it is either this rule, or the other rule, or this one. And you can see a difference here: where there is a rule, there are also variables. Those variables are very much like PHP variables, except they are internal to the parser, but they do very similar things. And you can see that if it finds those rules, it calls this zend_ast_create_decl function, which then builds up the AST for you.

Let me go back to the AST. So the parser gives meaning to the tokens and constructs the AST through those rules. The generated code is very complicated, and nobody writes or reads it because it's generated by a computer. What people actually write is this file here, which is not generated; it's much easier to read and quite easy to follow what goes on there.

All right, so what does this AST look like? It describes the structure that comes out of the parser: the structure of your script, with meaning given to what all those tokens mean. Each node in it is a language construct, and a language construct can be many different things, as I'll show you in a moment. Nested structures are in there because it's a tree. But it also drops some of the original information from the source code. In some cases it doesn't remember your comments anymore: it still remembers the docblocks, but not the other comments in your code. It also sometimes gets line numbers wrong, or it doesn't need to store line numbers, because PHP doesn't need them for executing code unless you get an error message. Likewise, it doesn't keep the column in which a specific token was parsed. So there's information loss going to an AST.
However, from an AST you can always go back to a fully functional PHP script that behaves exactly the same, because all the information it does need to keep is still in there, including docblocks. They are important for doing reflection, so they are kept. To visualize the AST, there's an extension called php-ast by the person who did most of the AST implementation in the first place. It's also possible to do optimizations by looking at this tree and seeing which things cannot be reached, for example. PHP itself doesn't do that, but if you install OPcache, it will look at these things and do the optimizations that are possible.

When you use Nikita's extension and run ast\parse_code, you get this kind of thing back. Again, nearly impossible to read, and it's a lot of stuff even for my very simple script. However, there's also a way of formatting it, and when you format it, it's much easier to read and figure out what is actually in there. If any of you have ever used PHP_CodeSniffer: it used to be based on just the tokens. Because the rules are based on tokens that haven't been given any meaning yet, it's very difficult to write good rules without making them too complicated. But when you look at the AST, it's much easier to find these kinds of things, because it's a structure that has been parsed for you. Everything has been given a meaning. When you look at it, you can almost visually reconstruct in your head what the script looks like.

In this case, the top of your AST is an AST statement list, AST_STMT_LIST. I don't know where the constant comes from, so I'm skipping that. The first thing, again, is the class: the class definition, which has its name, as you can see, the name Whisky. It extends nothing, it implements nothing, and it consists of statements.
So a class can contain statements, but in this case they aren't normal statements. They are, as you can see here, your property declarations, your methods, and things like that. The property declaration basically says: we define a property, it is private, and the name of it is name, which probably wasn't the best example to pick. And the default of it is null. All of this information is in there, so you can quite easily write a script that parses this to figure out where you have used the argument name, for example.

Then you have methods. Again, they have modifiers and names. And of course a method or function starts with a list of arguments, which is the AST_PARAM_LIST. In this case there's a parameter called name, and again it has a default of null. Then you get a statement list, which is every PHP statement you've written yourself that belongs to this method. In this case, the only statement in there is the assignment. And this is something more complicated. Actually, I think I have better slides to explain this.

If you look at this very simple construct, I've tried highlighting in colors where the different bits end up. The first thing, the public modifier, I actually forgot to color here, but that's this one there. We have the constructor, which is the name. We have the assignment, and the assignment looks much more complicated, right? Because the assignment actually happens in two steps. The first one is an expression, the expression that means $this->name, and it looks as follows. Because this is a property access on $this, it knows that the variable you're fetching the property from is this. So that's the expression. And then the property name is name.
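To make the shape of that assignment node concrete, here is a toy model of the tree for $this->name = $name, built from plain nested arrays. The kind names AST_ASSIGN, AST_PROP and AST_VAR follow the php-ast extension's naming, but the array layout and the dumpNode() helper are assumptions made for this sketch, not the extension's actual output format.

```php
<?php
// Toy model (not php-ast output) of the tree for: $this->name = $name;
$assign = [
    'kind' => 'AST_ASSIGN',
    'children' => [
        // Left-hand side: a property access on the variable "this".
        'var' => [
            'kind' => 'AST_PROP',
            'children' => [
                'expr' => ['kind' => 'AST_VAR', 'children' => ['name' => 'this']],
                'prop' => 'name',
            ],
        ],
        // Right-hand side: the plain variable $name.
        'expr' => ['kind' => 'AST_VAR', 'children' => ['name' => 'name']],
    ],
];

/** Render a node and its children as an indented tree. */
function dumpNode($node, int $depth = 0): string
{
    $pad = str_repeat('    ', $depth);
    if (!is_array($node)) {
        return $pad . json_encode($node) . "\n"; // leaf value
    }
    $out = $pad . $node['kind'] . "\n";
    foreach ($node['children'] as $label => $child) {
        $out .= $pad . '  ' . $label . ":\n" . dumpNode($child, $depth + 1);
    }
    return $out;
}

echo dumpNode($assign);
```

Chaining more arrows on the left-hand side would nest further AST_PROP nodes inside the 'expr' child, which is how the tree grows deeper.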
If you nest this by using multiple arrows, you see the tree growing into something more complicated. The value that gets assigned through this assign is, of course, the variable name. And the return type is null, because we haven't set a return type. So that's what the AST does: from this function, or method in this case, it builds up this tree structure.

All right. PHP itself cannot execute this AST either. It now needs to convert it into a form that can be executed by the engine. To do that, it converts it to something we call bytecode, or opcodes, or op arrays, which are basically synonyms for the same thing.

So let's have a look at this bytecode, and this is where it gets even more complicated than what we've had. In PHP we call them opcodes. Each function, method, or main body of your script is represented by an op array, and an op array contains these opcodes. They are instructions for the Zend Engine, in much the same way that a computer runs machine code after an assembler has converted assembly code into it. PHP does not have that last step, but it does have the assembly layer in between, which is these opcodes. It's very similar, and I'll show you in a moment. There's an extension called VLD that I wrote many, many years ago that allows you to visualize these op arrays and opcodes.

Let's have a quick look at what this looks like. From the previous example, where we had our constructor method, we generate these opcodes. What VLD shows you is the following. There's a list of compiled variable names. Compiled variables are a bit of an optimization. I think that came in with PHP 5.3, actually: instead of having to look up a name in a table at run time, the compiler already associates a number, basically an array index, with the variable name.
So instead of having to do a hash lookup, like an array key lookup, it does a direct array index. That's much faster, and this is probably one of the biggest performance gains in PHP 5.3, actually. These compiled variables, as they are called internally, basically attach array indexes to names. In this case, the variable name is represented by !0. The exclamation mark just signifies that this is a compiled variable. PHP has other types of variables in its engine as well: there are temporary variables, and variables that actually represent a variable in PHP. Because if you use things like variable variables, it cannot attach an array index, since while parsing the script it doesn't know which variable it is actually going to represent, and it still has to do the hash lookup for you. That shows up in a different way in the opcodes that come out of it.

Now, this very simple construct actually creates four opcodes that do something. The first one, NOP, stands for no operation and does nothing. The engine sometimes generates these as a... Okay, it doesn't generate them up front. What it would have done is create a function declaration here. But you don't see that, because there are two passes: the first pass parses the script, and the second one looks it over and removes all the things it can already resolve while scanning and parsing your script.

An example is that a namespace declaration in PHP is basically an alias for something else you have in PHP, like for a class. So if you have Dramio\Whisky, after the parsing stage, or after the second pass, that has already been resolved. The PHP engine doesn't know anything about namespaces; it's all done while parsing the script.
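A quick way to see that the namespace is folded into the class name is to ask for the fully qualified name. This is a minimal sketch; the Dramio namespace and Whisky class mirror the talk's example, and the variable names are only for illustration.

```php
<?php
namespace Dramio;

class Whisky {}

// The short name "Whisky" is expanded using the current namespace,
// so the engine only ever deals with the full name.
$resolved = Whisky::class;
$runtime  = \get_class(new Whisky());

echo $resolved, "\n"; // Dramio\Whisky
echo $runtime, "\n";  // Dramio\Whisky
```

Both lines print the same fully qualified name: by the time the code runs, the class is simply called Dramio\Whisky.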
This is why you don't find things like "use star", use Namespace\*, because PHP then doesn't know what could exist in that namespace and hence cannot do this pre-optimization of copying and pasting the namespace into the class names, which the engine needs. That's why you sometimes don't get features in PHP: the engine can't really do them.

Anyway, the NOP is what that used to be in this case. The second opcode we have here, actually opcode number one, is RECV, receive. Anybody want to guess what that means? It receives an argument into the function, which is the name argument of your constructor in this case. It doesn't show name, because every !0 you get to replace by the name, so it says receive into name. Now, PHP 7 does this a little bit differently internally again. PHP 7 does not actually execute this RECV opcode. It jumps over all the RECVs because of the way the calling convention works: when calling functions in PHP 7, it puts the arguments in the right place in memory already, so you don't have to do a receive. But to keep things working with older things, and sometimes when you're combining things, the RECV is still generated, although it isn't actually used much anymore. It's nice to know that it's still there.

Then we get EXT_STMT. EXT_STMT doesn't do anything either. It is a placeholder opcode that debuggers can hook into. Single-step debugging works because debuggers hook into this EXT_STMT opcode, define their own handler, and then introspect the PHP function. You can really only do this with EXT_STMT.

Now the third one, or actually the fourth one, is ASSIGN_OBJ. It has two colors here because it is an opcode that is generated from an assignment but is already sort of pre-optimized.
So there's ASSIGN_OBJ, there's also an assign for arrays, and the normal ASSIGN. Each opcode can have at most two operands: so there's a return value and then two operands. Now the thing is, ASSIGN_OBJ needs three operands. It needs the variable representing your object, it needs the property's name, which in this case is the string name, and it needs the data. You don't see the variable here, because ASSIGN_OBJ has a flag that says: if it's $this, we don't also have to say that it is $this. So even though there's only one operand showing, this is actually the second operand. The first one has been flagged out already, so you don't see it here. But because there are occasions where you don't use $this and it's a different variable, it needs to accept both of those operands, and there's no specialized opcode for each case. Although I say every opcode can have two, they don't always use two. So there's $this and name, and then, because it needs three, the data. There's a special opcode called OP_DATA that basically adds extra space to provide more data to the opcode that precedes it. Very few opcodes have this; ASSIGN_OBJ is one of them, and there are a few others as well.

The last thing I'd like to point out is the return. It says RETURN null, but I didn't write that in my script at all, right? I didn't have a return statement. First of all, of course, constructors can't really return anything. But PHP will always generate this return at the end of every function and script. It has been doing that since the start and continues to do so, because in theory every function can return something. There are optimizations so that if there's no return value, PHP won't attempt to read the return value coming back either. So there's an additional step there that gets optimized out.
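The implicit return is easy to observe from PHP code. This is a small sketch with a hypothetical function (setName is not from the talk) that has no return statement; calling it still yields a value, namely null.

```php
<?php
// Hypothetical function without any return statement.
function setName(array &$record, string $name)
{
    $record['name'] = $name;
    // No return here; the function still ends with an implicit "return null".
}

$record = [];
$result = setName($record, 'Lagavulin');

var_dump($result); // NULL
```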
But there are lots of interesting, intricate things going on that make no sense when you look at them; they're all there for historical reasons.

Now, this was a very simple list of opcodes. I'm going to show you a few more, because you might remember the title of the talk was "It's All About the Goto", and so far I haven't shown you any gotos. But we'll get there, no worries. The goto I'm referring to is a jump. Not a jump between two prongs of a mountain. This is approximately 500 meters above sea level, and you actually have people jumping between the two prongs. It's somewhere in Norway, on the Lofoten. This is not my photo, but I have been in this position, able to take the photo. There were no crazy people trying to jump across at that moment, so I only have photos of people climbing up there.

Anyway, the jumps. We're not talking about jumps that can end like that; we're talking about jumps in computer stuff. The simplest one starts with an if statement. With an if statement, you can either go into the body or skip it, right? So a choice needs to be made. The choice is made by doing a jump if something matches and not doing a jump if it doesn't, or the other way around, depending on which type of jump is used.

In this case the script has an if with a variable a and the value 42, which come back here in the AST. When you look at the opcodes that come out of it, it looks something like this. The first one, on line one, ignoring the EXT_STMT again, is IS_EQUAL. IS_EQUAL has a return value, ~1, which is a temporary variable in the engine, and two operands: !0, which is a compiled variable, and the value 42. So it runs IS_EQUAL and stores the return value in temporary variable 1.
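As a hand-written illustration of the "compare into a temporary, then jump" pattern, the sketch below writes the same if twice: once normally and once with an explicit temporary and a goto that skips the body when the comparison is false. The function names and the body of the if are hypothetical, since the slide's script is not in the transcript, and this is ordinary PHP rather than what the engine literally executes.

```php
<?php
// The structured version.
function checkWithIf($a): string
{
    $out = '';
    if ($a == 42) {
        $out .= 'match';
    }
    return $out;
}

// The same logic with the comparison and the jump spelled out.
function checkWithGoto($a): string
{
    $out = '';
    $tmp = ($a == 42);   // comparison result stored in a temporary
    if (!$tmp) {
        goto after_if;   // conditional jump: skip the body when false
    }
    $out .= 'match';     // body of the if
after_if:
    return $out;
}

var_dump(checkWithIf(42) === checkWithGoto(42)); // bool(true)
var_dump(checkWithIf(7) === checkWithGoto(7));   // bool(true)
```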
Then the second opcode, JMPZ, which stands for jump if zero, checks this temporary variable. If it is zero, which means false, so IS_EQUAL returned false, we need to jump. So it basically says: if the return value of IS_EQUAL is zero, we jump to opcode five, which is the RETURN 1 at the end of the function. I'm not sure where that return really comes from here. I might have cut it off the slide because I ran out of space or something. So that's what this does. It's a very simple instruction: JMPZ only jumps when something is false. There are other jump instructions.

This one is slightly more complicated, because we also have an else. We have: if a equals pi, then we echo circles, otherwise squares. It makes no sense whatsoever, of course, but it's a simple example. When you look at the opcodes generated from this, there are a few more things in there. There's still the IS_EQUAL, checking whether !0 is pi. If it is not, we jump to six. That basically says: if this doesn't match, we jump to this bit here. If you look at six, it's followed by the echo for squares, right? ECHO is another opcode. If it does match, however, we run the echo for circles, and then you get another jump. This is an unconditional jump, or an unconditional goto really, which jumps to opcode eight, which returns from the function. So the only way these kinds of structures are implemented is through these jumps, or through these gotos, because that's really what they are. Your nested AST structure, which is a tree, gets flattened out to basically an array of opcodes to run, and execution can jump around in it with these jump instructions.

All right, so let's talk about rings. Can anybody guess which ring I'm referring to on this slide? I can't hear any.
The collider, that's exactly what it is, because there's no other ring that you can't see on satellite imagery and that is 27 kilometres around, somewhere in France. I don't know where this is exactly. France? It's on the Franco-Swiss border. Anyway, let me not talk about the LHC, although that would probably also be very interesting; it's not what I'm talking about this morning.

Let's have a look at for, because that is a loop structure, right? This gets more complicated again, because that's how the presentation evolves: every step gets more complicated than the previous one. There are now multiple statements in here, right? There's the for loop, and there's the initial state, which you can see in the AST. This has actually started blinking at me. Let's see, what is this? Should I throw it away? I think the battery might be getting empty. We'll see.

The for has four elements in the AST: the init stage, which is the red bit here; the condition, which is the green bit; the loop expression, which is the blue bit; and then of course the statements that sit inside your for loop. What it generates for that is the following. First there is an assignment, which is your init stage: it assigns the value zero to !1, which is i. And then we jump, directly to opcode six, which is the green bit here. So before we do anything else, we check whether the variable i is smaller than 42. If that's true, ~3 contains true. Then there's this EXT_STMT, which does nothing, and then a JMPNZ, jump if not zero, which is basically jump if true. So if i is smaller than 42, we jump to opcode three, which is the pink bit: the statements in your loop. It runs the statements, it does the echo, and after that it does the bit that comes at the end of the loop, the blue bit, which is the pre-increment of i. What it basically does is convert your for loop; if
we would do that in PHP syntax, it would look like this. You have the i = 0 here, and then it jumps to the condition. It checks whether the condition is true. I used $_ here, which you should never do in your code. If that's true, we jump to the statements; then it executes the statements, and then the bit at the end of the loop. So every loop structure you get in PHP can be rewritten with gotos. Should you do that? Absolutely not. But you can if you really want to, and the code that comes out of it is pretty much the same; it does exactly the same thing. You can of course get slightly different line numbers.

What is interesting, though, is the line numbers you get out of here. Those are the line numbers that PHP associates with these particular opcodes. PHP doesn't always get it right. If it gets it wrong, it's a bug, of course. But if you have ever seen PHPUnit's code coverage, which uses Xdebug under the hood, get the lines wrong, or show you that lines aren't covered, it's mainly because PHP doesn't get the line numbers right. This is vastly improved in PHP 7, so it should no longer be as much of a problem as it used to be. It's also a lot better at setting the line numbers at all sometimes. So that's the rewritten for.

Now, the other one we have is a while loop, which is a very similar thing again, right? Before the loop we set a variable, and then while a certain condition is true, in this case while i is larger than zero, I guess, we echo the variable i. That again results in this. You have the assignment, the first bit in red here. The dark red is the jump: it unconditionally jumps to opcode six, which is the pre-decrement. It does that first. If the result is true, meaning the decrement hasn't hit zero yet, it jumps back to opcode four to echo the variable. If nothing happens, it just
goes to the next opcode, which is the return out of this function. That's while. Do-while is very similar; I don't think I need to cover it in the same detail, because it's just a slight reordering of the conditionals. Again, we can re-implement it with gotos if we want to. Again, don't do that.

We have foreach, and foreach gets a bit more complicated. Did any of you follow the discussion on PHP internals when goto was introduced? Anybody read that? No? Okay. One thing that came up there as an argument against having goto is that in many languages goto allows you to jump around a function freely, any way you want. That would be really difficult to implement in PHP, because some constructs, like foreach, do a bit of setup beforehand. If you jumped directly into the statements within a foreach loop, then at the end of the loop, when some things need to be checked, it couldn't check them, because the setup hasn't been done. So the implementation of foreach made it very nearly impossible to jump randomly into control structures like the foreach loop. This is why goto in PHP only allows you to jump out of control structures and never into them. Implementation details. I would also say that it is a really nice restriction logically, because jumping all across your function with a goto is kind of rubbish to do.

All right, let's have a quick look at what this does. The first thing is the assignment of the array. You don't see the values of the array here because PHP has already optimized them away: because it's a constant array, it has created a structure in memory for you which is immutable. That's just a nice thing; PHP 5 doesn't do that. Then you get a few opcodes that do the setup of your foreach loop, which are the reset and the fetch, and then we do the assignment. So what do those opcodes do? FE_RESET resets the internal structure of the array to make sure that the
internal pointer starts at the first element in your array, so it resets that. If it can't do that, it jumps to opcode 11. Reasons why it might not be able to do that: for example, if you don't give it an array, or if you don't give it an iterable, because you can't loop over a string this easily. It generates a warning and then it jumps out of the loop, which is why it jumps with the 11 to the end. PHP doesn't know beforehand whether you're going to give it an iterable object or an array or a string, so it can't optimize for that upfront right now. We then do the FE_FETCH. That's a very complex opcode. Did I not say before that opcodes only have two operands? Well, this one has three. Because of PHP. It is actually two, but because of flags one of them is reused, so one operand actually encodes two things. Yeah, PHP, what can I say. So it fetches the array element here: from the reset array we fetch the element. Sorry, the reason for this is that the fetch fetches the value, and then there's also the assignment for the key. So the fetch actually gets the key... no, I said that the wrong way, sorry. It's complicated. The fetch fetches the value, that is the exclamation mark one here, and then it also does an assignment, and this assignment isn't always there, where it uses the return of this temporary variable, which then contains the key, and then it assigns that to exclamation mark two. It's complicated. It works most of the time, unless there's a bug somewhere. Then it executes some bits, and then at the end of the foreach loop it does a jump, an unconditional jump, back to the fetch. It doesn't go back to the reset here, because that would reset the array. Now this also means that if you add elements to an array, this works just fine in PHP. While you foreach over something, you can add more elements to the array. Now, if the internal array pointer has gone beyond the original, beyond the
element that you're inserting, then it won't show up. Again, I would recommend people just don't do this, because it's not particularly nice coding, really. But you can if you want to. That's foreach. Now we get to really complex loops and nested structures, and then it gets kind of mind-boggling, right? If you go through it, you can mostly follow what it does. But what I want to highlight here is that VLD also shows you the in and out: the little arrows basically indicate the start of a branch and the end of a branch. And in this complex loop, the if-else is two branches, right? You can go into either the if-true or the if-false statements, so those are two branches. Then within the else there's another two branches: the if matches or the if doesn't match. Even though there is no else case, it is still a branch in here, and that shows with all the little arrows here. And when you go through this, you can actually construct more interesting things. However, it just gets so complicated. Just like Primer, if you've ever seen this film. xkcd, where I stole the slide from, has in one of its comics the timelines of films. So one of them is The Lord of the Rings, where you have all the characters meeting and splitting apart and meeting again. And then they show Primer, where there's three main characters, but because this is a film about time travel, the loops get extremely complicated, because people go back in time. And it's not as complicated as this slide, but it's a mind-boggling film. It gets so complex that you can't make any sense out of it at some point, which is the same thing if you get so many statements here. It gets really complicated. It gets complicated to read, and it also gets complicated for other reasons we'll get back to in a moment. All right, so these complex loops: it would be nice if we could visualize them, right? And that's basically what VLD also allows you to do: it allows you to create a graph of all the complex
loops that are in there. So for example, the graph on the right-hand side here shows what the possibilities are. We start with opcode zero and one, which are the start of your foreach, right? Then it can either jump to the next one, which is opcode two, which is the fetch here, or, as I showed you, it can jump to number 12, and that is what this arrow shows here. It jumps to number 12, which is line eight in your script, which would be this one, I guess. Maybe I should have had line numbers on; that would be easier. And then it exits the function by going directly out of it, because that is the return here. Now, if the foreach is right, because this is a loop, there's another jump at the end of the loop. So this is the loop, and there's multiple ifs in there, right? There's the if on this side, there's the if on this side, and then that ends up at opcode 11, which is also line eight. That is the end of the loop here, and at the end of the loop of course it jumps back to opcode two, which is this jump here. So the numbers here, the opcodes, correspond to the numbers here, to sort of tie them together. I haven't figured out a better way of visualizing this yet, but if you have an idea, then I'm happy to hear that. These complex loops are important because they allow you to do some other things too. Okay, the last one I want to talk about is exceptions, because they are a bit of a tricky case. In PHP, every function has a starting point, right? Which is the start of your function. However, a catch statement is also a starting point of your function. It has to be like that, because suddenly your code is being executed there. So in the try you might be calling multiple functions here, nested levels of functions calling further functions. If you do a throw somewhere and there's no try-catch block around it earlier, then suddenly the control of execution gets transferred back to this function and it goes straight to the catch statements. So these catch statements, denoted by an E here, are
also an entry point to your function now. And then you have finally, and finally's implementation is very complicated, because finally needs to be executed regardless of whether there's a catch or not. And in some cases, if you have nested catches, it needs to figure out which entry point was being used to get into the specific catch statement, to be able to then jump to the finally. And that happens through this opcode called FAST_CALL. I don't like the name of this opcode, because it has the word fast in it and it has nothing to do with being fast or slow. There's no slow call. It is internally a way of remembering where we came from, and whether, after the catch has been processed, we need to jump out of the function or not. So if there was an exception in try, then it would do one of the catches. At the end of the catch you see the jump here to opcode 7, right? That is a FAST_CALL. You don't see that at the last one, because it just drops down to that automatically. It's like a break intentionally missing, in PHP internals. So this is FAST_CALL, and if it came out of a catch, then it remembers that and jumps to opcode 9, which is the next one. This is the echo in finally. And then the FAST_RET uses the value that was set by FAST_CALL to either jump out of the function, because there was an exception, and if there's an exception, anything after the try-catch-finally, the "after", doesn't get executed, right? It needs to jump immediately out of the function. Or it just drops down and then it executes up to the return here. So if there was an exception being thrown, FAST_RET also returns from the function, so it doesn't do this last echo "after". If you look at how this is implemented, then you might want to tear your hair out. As you can see, I don't have very much anymore. All right, so what the analysis of all these looping structures can actually give you is dead code analysis. And dead code analysis is basically a way of figuring out which branches
cannot be reached. And that is something that Xdebug does internally, and VLD shows you that as well, by showing the little star here. A typical example that I have here: I do an if statement echoing 40, then we return from the function, and then I have another echo statement there too. Of course, when you look at this code, it's quite obvious that it's never going to echo the two here, because it's after a return statement. In a similar way, if you do a throw and then you do an echo behind it, you also would never get that happening, because you simply cannot reach that code. Now of course this is a simple example. In some cases it is less obvious that you have dead code, for other reasons. PHP would not find that dead code if you had a constant value for the if, by the way, because PHP itself doesn't have an optimizer. So it doesn't eliminate the code that cannot be reached; it just doesn't do that. OPcache does do some of that, actually. So not only does it speed up PHP, it also removes code, so that it doesn't have to store and execute it. So in any case, what VLD and Xdebug do under the hood is actually allow you to follow the branches, and if some branches cannot be reached for logical reasons, then it marks them as such. And from there on, by knowing which bits of code cannot be executed, which paths cannot be hit, you can go more complicated by doing branch analysis. And branch analysis basically solves the following problem. Now, I know all of you are really good at writing unit tests for all of your code, right? I hear some people smirk and laugh, so they know that they should write more test cases, really. But a part of writing unit tests is that people try to go for 100% code coverage, right? Sometimes it's like: it needs to be green, otherwise it's not going to be great. Now, having everything green in code coverage doesn't tell you everything. It tells you that you have at least hit every single line in
your code, but it doesn't mean you have tested it all correctly. And the only way to do that is making sure that not only do you hit every line, you also hit every possible path. Or rather, from all the branches that you can take, you need to have all the combinations, so that you have tested every path through your code. And that is what Xdebug now allows you to do. Unfortunately, PHPUnit doesn't implement this yet. It doesn't do anything with the information yet, and that is not because they don't want to do it, but because it produces so much information that you basically run out of memory before you have the chance to visualize it. So my simple example is still quite easy to understand, but it gets more complicated, especially when you have so many nested loops everywhere. So let's see what happens here. We have the ifthenelse; I'm going to ignore loopy here. The ifthenelse has two statements, right? A can be true or false, and B can be true or false. But with two test cases I get to cover every single line, and the test cases here are: I call it with true and false, and with false and true. Which means if we look at my code coverage, it will show up as 100%. Actually, if you write this yourself it looks like this, but I'm just showing you the output of it, right? So you can see every single line in here is now green, which, yay, 100%. But you haven't done all of it yet, right? What you haven't tested is A being true and B being true, or A being false and B being false. We have only done the true-false and the false-true. So you're basically lying to yourself. 100% code coverage doesn't actually tell you that you have tested all of your code. So what you can do, because Xdebug and VLD do this internally: they do something called branch analysis, which is the new magic sauce. The slide here says so, so it must be true. And this new magic sauce not only tries to find out which code cannot be reached, with that dead code analysis, but it also figures out which
possible paths you can have through a function. Now, I have to say here that the problem is an exponentially complicated problem, right? Every if statement you add means doubling the number of paths that you have. So with my two if statements, you can probably guess how many possible paths there are, which is four. Okay, so what comes out of Xdebug, or actually VLD, is something like this. It shows you all the different branches and then all the different paths, and it will tell you whether the path has been executed or not. So the X means not, and the hit means hit. This is just a very simple visualization. It is easier to see when it's actually shown with a graph. So with this function and these statements, it outputs something like this. You have the four different paths, right? The dotted ones are the ones that haven't been executed, and the solid ones are the ones that have been executed. So the first one, where A equals true as well as B being true, is not being hit. The same thing for the purple one here: the first if not being hit and the second if not being hit is also a dashed line, because that path is also not being run. Now the problem here is, if even with two statements I have four paths already, then if I had 10 if statements I'd have 1024 paths. Or if I have 16 if statements, or loops, because they're basically if statements, they're the same kind of goto things, then you end up having so many paths that you run out of colors that your eyes can distinguish, right? Nobody can see 64,000 colors with their eyes. Also, how do you visualize that on a slide? I can't show you 64,000 lines here. You would never be able to find anything in this, right? You can never make heads or tails of that. So the visualization of the path coverage is actually a really difficult thing to do, and as I said, it's this exponential problem as well. So I don't think you'll see this showing up in PHP_CodeCoverage through PHPUnit anytime really soon, unfortunately. It is also ridiculously slow, but
let's not talk about that. All right, so a recap from me before we do a few questions. We've spoken about the different stages of executing code, right? We start with the PHP script, we turn it into tokens; you can visualize that with the tokenizer. Then we go with the parser to this abstract syntax tree, which you can visualize with an extension, for which I've put a few links for you here. And from the AST we go to the bytecode, the opcodes, which you can visualize with VLD. All the looping structures in PHP are implemented as jumps, or gotos. And then we looked at some code analysis for fun and profit. Not sure whether it was fun; definitely no profit. Anyway, those were the jumps, the gotos, that I wanted to talk about. Are there any questions? I know it was a bit of a tough one, wasn't it? No questions at all? Oh, there's one there. I'm being blinded by the light, so it's difficult to see. I'll just ask one, because it was a great talk and worth a question, I suppose. So if you go back to the if-else thing you were showing with the testing: theoretically the problem there is obviously that, because you're not using scalar hints or anything like that, A and B could be anything. They could be strings, they could be integers. So that graph doesn't even show the scale of what you need to test, theoretically, based on the script. So is that the sort of reason why, as you said about the memory issue, with some of this unit testing you'll never necessarily get a hundred percent code coverage, because theoretically it may be impossible? Yeah. There are also sometimes cases that you defensively code against but that would never ever happen when running your tests, right? Like the hard drive being full, because then you suddenly can't create a file. I mean, you can't really test for that. Well, unless you're SQLite; then you test for people unplugging the server. They actually have test cases for that. Not sure how they run that. I don't think there's physically a
person pulling the plug while they're running the test or something. But yeah, those things are difficult. However, the thing you mentioned about the types in there: at this stage, PHP doesn't know anything about that. So it is not something that code analysis can actually show you, unless you start using the scalar type hints. But even then, it is only a check when the function gets executed. After that has happened, PHP doesn't do anything with this information at the moment. So that might be a next stage for making PHP go even faster, by having optimizations in there that allow PHP to have special opcodes to deal with those specific cases. But that's not there right now. Hopefully in the future. So I have a question. Is there any existing way of visualizing these paths, which one is missing and which one isn't, when you work and you look at the code, not the opcodes or graphs or dumps? Is there anything in your toolbox, when you are working with this, which helps you with that, which we can maybe reuse? The graphs that I'm showing here are actually being generated, so I didn't hand-draw the diagram. Okay. I'm rubbish at drawing diagrams. So this is what the tool does: it creates a dot file. Yeah. And dot is a file format that you can interpret with different visualization tools to create graphs like this. So do you see any usefulness in overlaying that information over actual source code? So maybe the variants? I did look at doing that, but again the problem is the amount of information you get, the number of paths that you get sometimes. So overlaying that on top of source code, again, is a very difficult thing to do. It's difficult to visualize in such a way that you can still understand what it says. Yeah, to make it useful. Okay, thank you. Yeah, one right here. Hi. I know you said the Xdebug and VLD add-ons you've made practically won't be in PHPUnit, but can we still practically use these features outside of PHPUnit? Yes. And if so, like, let me
go... any advice? So, right. So yeah, I'm just saying, how could we take advantage of this at a small scale? You can't do it at large scale, because you run out of RAM, but at small scale, can you give any advice on how we could use this in our jobs to help us find stuff? Right. So when PHPUnit does this, it uses this tool called PHP_CodeCoverage, which is what this slide shows here. That doesn't handle all the different things yet, but Xdebug does output the information. So if you run code coverage with this extra magic sauce flag, then the result you get from the code coverage has all the information about the paths being hit and all the branches in there, and then you can visualize them. In a very simple case I visualize it like this, but certainly you can do more with this information than just throwing it on a slide and visualizing it like this. You could write some analysis to show what percentage of the paths have been executed, and things like that. So although the information that you get out of here is very simply visualized here, you can of course do more with it. And this information comes back in PHP arrays, so yeah, you can do something with that. And this is what PHP_CodeCoverage hopefully in the future will use to do something with, but that step is currently still missing. This one there. I think we have one more minute. Hello. Hello. You've shown how the foreach loop works from the point of view of tokens. Yeah. I've got a question, actually: what's the difference, how does the yield foreach loop differ from the original one? Can you repeat that once more? Yield. Like, yield returns. Yeah. I know it doesn't reset, probably, but it jumps to a completely different place. I haven't looked at that yet, personally. Let's find out. I don't have the time to show you right now; we ran out of time here. If I had five more minutes I could actually just show you. I don't know at the moment what it looks like, but I'm happy to have a look at that,
and come find me and we'll figure that out, because I haven't tried, because it's 12:13. Well, thanks very much. I have one more point. This slide should show you the joind.in link where you can leave feedback for this talk; however, it is missing. However, this QR code will go to my site, where I have uploaded the slides already and where I will also add a link to leave feedback for the talk, which I'd be more than welcome to hear. And I will also tweet about it. Yes, thanks, and enjoy the rest of the conference.
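A footnote to the branch-analysis part of the talk: the sketch below is not Xdebug output and not the code from the slide. It is a hand-instrumented stand-in, assuming a function with two independent if statements as described in the talk. The names ifThenElse, $bodies and the path labels are made up for illustration. It shows how the two calls (true, false) and (false, true) reach every if body while covering only two of the four possible paths.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the two-if function described in the talk.
// $bodies records which if bodies ran (a rough proxy for line coverage);
// the return value labels the path taken: upper case = branch taken,
// lower case = branch skipped.
function ifThenElse(bool $a, bool $b, array &$bodies): string
{
    $path = '';
    if ($a) {
        $bodies['a-body'] = true;
        $path .= 'A';
    } else {
        $path .= 'a';
    }
    if ($b) {
        $bodies['b-body'] = true;
        $path .= 'B';
    } else {
        $path .= 'b';
    }
    return $path;
}

$bodies = [];
$pathsHit = [];

// The two "test cases" from the talk: (true, false) and (false, true).
foreach ([[true, false], [false, true]] as [$a, $b]) {
    $pathsHit[ifThenElse($a, $b, $bodies)] = true;
}

// Two independent ifs give 2 ** 2 = 4 possible paths.
$allPaths = ['AB', 'Ab', 'aB', 'ab'];
$missed = array_values(array_diff($allPaths, array_keys($pathsHit)));

printf("if bodies reached: %d of 2\n", count($bodies));
printf(
    "paths hit: %d of %d, missed: %s\n",
    count($pathsHit),
    count($allPaths),
    implode(', ', $missed)
);
```

Run as a plain script, this reports both if bodies as reached but only two of four paths hit, with AB (both true) and ab (both false) missing, which matches the two dashed paths described for the graph. A real tool derives the same information from the opcodes rather than from hand-written labels.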