[The opening of the recording is unintelligible. The speaker gives his name, transcribed as "James Tickham", and mentions the year 2002.]
There is only one left for today. I'll bring another 10 tomorrow, so if you don't get hold of this one at the end of the talk, come and find me tomorrow and I hopefully will have some more.
Yes, PHP. So how many people roughly know this cycle of how it works already? Okay, about half, okay, cool. So, for the benefit of everyone: PHP code is fed into this thing called a lexer. Don't worry about what it means yet, I'll explain that in due course. The lexer generates these things called tokens, you may have heard of these. These tokens are then fed into this thing called a parser, which generates this thing called an AST. The AST is then compiled by this compiler thing to generate these things called opcodes. You've probably heard of those. The opcodes are stored in this thing called the opcache, which hopefully you've heard of, and are executed in this thing called a virtual machine. That's basically how PHP works. The virtual machine is not like Vagrant or anything like that. It's like a virtual CPU that executes these opcodes and makes things happen.
The AST part of that is new in PHP 7. Well, I say new, but PHP 7 is not that new anymore. It was introduced in PHP 7 and it was quite a dramatic change to how the engine works underneath. So let's have a look at how this actually works.
We start off with the lexer. The lexer in PHP is a generated program written in C. I say it's generated because writing one of these things by hand is a bit silly: it's time consuming and it's complicated. So we have a generator to do it for us.
re2c, however you prefer to pronounce it, is the lexer generator tool that PHP uses. There are others available, and other programming languages use different ones. So we have this file, zend_language_scanner.l. This is the syntax definition, if you like, of PHP. It describes how the tokens are formed, how they're split out and so on. And this is what it might look like. A very simple example, right? The format is quite common. It's similar to this thing called Flex, which stands for fast lexical analyser. Incidentally, that is used by HHVM, the alternative PHP runtime that Facebook built, which now only runs Hack as far as I know.
These are a few very simple examples. In the angle brackets here, we have ST_IN_SCRIPTING. These are the states in which these tokens will be matched. A lexer is a state machine, so as you go through, you can change the state you're in. You're in scripting, or maybe you're in double quotes, or in a heredoc or nowdoc and things like that. So you can change the context, or state, of what tokens you're looking for. When you're in a string, for example, you may be looking for the dollar sign that introduces a variable name to interpolate into the string.
The next bit is very simple string literals. We've probably seen these before: function, die, exit. And what's inside the curly braces is what happens when this token is matched. In all of these cases, they just return the token directly. T_EXIT is exactly the same for exit and die, it doesn't matter which. It returns the same token, so it has the same meaning. And function returns the function token. These then go out into this thing called the output stream of the lexer, the token stream, right?
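As a rough illustration of the rules just described, here is a minimal Python sketch of a rule table keyed by lexer state, where each rule pairs a pattern with the token it returns. The token and state names (T_EXIT, T_FUNCTION, ST_IN_SCRIPTING) come from the talk; the `RULES` layout and the `lex` function are hypothetical and are not how re2c or the PHP engine implement this.

```python
import re

# Hypothetical rule table: state name -> list of (pattern, token) pairs.
# A token of None means "match it, but emit nothing" (whitespace here).
RULES = {
    "ST_IN_SCRIPTING": [
        (re.compile(r"exit\b"), "T_EXIT"),
        (re.compile(r"die\b"), "T_EXIT"),  # die returns the same token as exit
        (re.compile(r"function\b"), "T_FUNCTION"),
        (re.compile(r"\s+"), None),
    ],
}


def lex(source, state="ST_IN_SCRIPTING"):
    """Walk the source left to right, emitting one token per matched rule."""
    tokens = []
    offset = 0
    while offset < len(source):
        for pattern, token in RULES[state]:
            match = pattern.match(source, offset)  # only matches at offset
            if match:
                if token is not None:
                    tokens.append(token)
                offset = match.end()
                break
        else:
            raise SyntaxError(f"unexpected input at offset {offset}")
    return tokens


print(lex("function exit die"))  # ['T_FUNCTION', 'T_EXIT', 'T_EXIT']
```

Because exit and die map to the same token, everything after the lexer cannot tell them apart, which is the point made in the talk.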
So, as I mentioned, it's a state machine, and the state allows you to match different tokens depending on where you are. Here is an example; let's go through it bit by bit. This first rule will match if we're in double quotes, backquote (we probably know those as backticks), or heredoc. And we're looking for the string dollar curly brace, so we're inside a string, basically. When we encounter that, we do this thing called yy_push_state. That changes the state to this one, ST_LOOKING_FOR_VARNAME. So now we're looking for a variable name. And we return the token that we've matched to the token stream.
Now we're in this looking-for-variable-name state, and we're looking for a label, basically, followed by a curly brace. There's lots of square brackets and curly brackets and so on. If you get them wrong, it'll probably break. So, you know, don't touch that. Then we do this thing called copying a semantic value. Basically, that's the name of the variable. That's useful information we want to have when we're looking at that token: we want to know the name of the variable that we matched. It's kind of important for the engine to know. Then pop state, so we go back into the last state we were in, and then push state, so we go into scripting. The pop state seems perhaps a bit redundant, but maybe there's a reason for that, I don't know. And then of course we return the token to the token stream.
Yay, we've lexed stuff. All we have done is break up our long string of PHP code into lots of little tokens, perhaps with some meaningful information attached, like variable names. So, given that syntax definition, we pass it into re2c, and it generates a C file. It's not all that readable.
There's bits of it that are readable, but then there's huge chunks of numbers and letters that are really meaningless. But you can go ahead, check out the PHP source code and run re2c on this with a load of flags, which are --no-generation-date --case-inverted -cbdF -o a.c zend_language_scanner.l, and so on. Remember that? You probably do, but you're probably the only person in this room who does. And that is the lexer implementation, and that's what actually happens.
So the next stage is the parser. The parser works in a similar way; it's a state machine as well. And we have another definition, zend_language_parser.y, another meaningless file extension. This contains the parser definition, and this gets a little bit more complicated. When we're lexing, all we're doing is breaking up a string into lots of little tokens. A parser actually tries to make sense of these tokens in order. I could throw a load of random words at you, like I'm doing now. But they're ordered, so that they mean something to you and you understand what I'm talking about, hopefully. If I'm getting the words in the right order, that is. If I start talking random nonsense, I'm still using real words, but in that order they don't make sense, and you won't be able to understand. So writing a programming language with a lexer and parser works in quite a similar way to how you could deconstruct a spoken or written language. The parser tries to make sense of these words in order, and says, right, that works, that's good.
So, let's try to understand the main parts of what's going on here. First of all, these are just labels. We have two definitions: if statement, and if statement without else.
Now, throughout this, you'll encounter two other definitions, which are quite large: expr, which is expression, and statement. These are really big definitions, so I can't fit them on a slide and I've omitted them. But we know what an expression and a statement are, so we can fill that in mentally.
We'll look at the if statement first. Basically, this says an if statement can be an if statement without else on its own. This thing here is to do with precedence. We don't have to worry too much about that. It's kind of important, but we'll skip over it. Or it can be an if statement without else, followed by an else token, followed by a statement.
But we also need to know what an if statement without else is, so we can fill in those blanks. That is the if token, followed by the literal open parenthesis, followed by an expression (an expression is something that returns a value), followed by a close parenthesis, followed by a statement. A statement doesn't necessarily return something. The definitions for expression and statement are huge, so we'll ignore them. But you can look at this file in the PHP source and see, wow, that's big, if you like. Alternatively, an if statement without else can be an if statement without else, followed by elseif, followed by an open parenthesis, an expression, a close parenthesis, and a statement.
Just for completeness, so you can hopefully visualise this a bit better, the statement includes the curly braces around the outside. So you write if, brackets, an expression that returns something boolean, or truthy or falsy, maybe. And then sometimes you have curly braces; they're optional if you only want one line. Most people, I think, use curly braces consistently for their ifs and elses. But that's included in the statement, so we don't see it in this definition.
So, let's have a look at some code. We know some PHP code, right? Who doesn't know PHP code? Okay. Cool.
You're at the right place, then. All right, so we'll try and use those rules. The first rule matches the first few lines of code. It's an if statement without else: it's got an if token, open bracket, expression, close bracket, statement. As I said, the statement includes the curly braces. And then it's followed by an elseif, which is why the next part is matched by the same definition, using the alternative grammar that was defined: an if statement without else, then an elseif, a literal open parenthesis, expression, close parenthesis, statement. And finally, the last one, if statement, matches because it's an if statement without else, followed by an else token, followed by a statement. With me so far? Some nods, good. All right. Cool.
So, if we have a look at PHP 7.0, or whichever version, it doesn't matter, we can see it's doing some stuff. It has these curly brace parts. That's, again, what happens when we match the grammar that's been defined. But there's something important that I've highlighted in big red boxes here. In PHP 7, the parser outputs the abstract syntax tree. If we go and look at PHP 5, the grammar is actually slightly different, so I've matched it up as best I can. But instead of doing this AST stuff, we're calling zend_do_if_cond, which is basically an if statement. And you can see we're actually generating opcodes here directly: the opcode is ZEND_JMPZ, things like that. You don't have to worry too much about what they are.
So, the AST is new. Going back to the first diagram, it's this bit here, in between these two steps, where the AST exists. The result of the parser in PHP 7 is the AST. The AST is then fed into the compiler, which converts it to opcodes, and the opcodes are executed, and magic happens. Anyone lost?
It's okay to put your hand up. It's fine. Okay, so I'm going to try and simplify this. First of all, what is an AST? Because I haven't really explained what it is, just that it exists. An AST is just a data structure. It's a representation of your code, and that's pretty much it. It uses a tree structure, hence the name tree, funnily enough. And it's not specific to PHP, which is kind of interesting. It's a data structure that says, this is how your code is laid out. It can be modified, in theory. And obviously, if you modify the tree, you modify the behaviour of the code, because basically you're changing the code programmatically. It can also contain some metadata, things like the line number, the file, and so on. And this data structure that represents your file of code can also be unparsed. That is to say, you can regenerate the source code of your program from it. The layout may change, because we may ignore certain elements of the layout, like whether you've got a space somewhere. That may or may not be relevant, depending on the programming language. I say that because the AST is a representation used in many other programming languages as well, not just PHP.
So let's have a look at what it might look like. We know what this does, hopefully. If we represent it as a tree, a brilliantly drawn ASCII tree, we have an echo statement, and as a child of that we have a string, which is a scalar value, and the value is Hello World. And that's it. Then, just to demonstrate it getting a bit more tree-like, let's do something slightly different. This does roughly the same thing, I think. There's a bit more going on, though, if we turn this into an abstract syntax tree. We have an echo statement. We have a concatenation operator as the child instead of the string directly.
The concatenation is what's called a binary operator. It takes a left and a right argument, if you like. Think of it as a function, perhaps. The left operand is the scalar string Hello, and the right operand is the scalar string World.
Okay. So let's do a bit of maths. Don't worry, it's not difficult maths, hopefully. We're going to assign some variables and then do this sum: A plus, in brackets, B times 2. This is what the AST for this looks like, and you can already see it's getting quite big and complicated, but it's okay. At the top, if you can just about see it, we have the assign statement: variable A with the integer value of 5. Then we have B with an integer value of 3. It's really tiny on there, so it's probably really tiny at the back as well. Then we have the echo statement. That's the next thing to happen. And then we have the add operation. Again, that's a binary operation, so it has two sides. On the left side, we have the variable A. On the right side, we have another binary operation, the multiply, which again takes two things. The left is the variable B, and the right is just an integer with a value of 2.
So we can represent stuff as trees. But why? Well, it was quite an important introduction to the PHP engine. Lots of other languages already do this. So maybe there's a bit of, well, PHP can be a grown-up language and we can have one too. But it also increases the maintainability and code quality inside PHP itself. And it decouples the compiler from the parser, so we have this intermediate thing. That also simplified some of the productions in the parser, which is nice. So the grammar actually got simpler, at the cost of some really weird edge cases that we don't really care about.
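To make the tree for the maths example above a little more concrete, here is a small Python sketch of how such a structure could be modelled. The node classes (`Int`, `Var`, `BinaryOp`, `Assign`, `Echo`) and the `dump` helper are hypothetical names chosen for illustration, and the values 5 and 3 follow the spoken description; none of this is PHP's internal representation.

```python
from dataclasses import dataclass


@dataclass
class Int:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinaryOp:
    op: str
    left: object
    right: object


@dataclass
class Assign:
    target: Var
    value: object


@dataclass
class Echo:
    expr: object


# Roughly: $a = 5; $b = 3; echo $a + ($b * 2);
program = [
    Assign(Var("a"), Int(5)),
    Assign(Var("b"), Int(3)),
    Echo(BinaryOp("+", Var("a"), BinaryOp("*", Var("b"), Int(2)))),
]


def dump(node, depth=0):
    """Return an indented rendering of the tree, one node per line."""
    pad = "  " * depth
    if isinstance(node, Int):
        return f"{pad}Int({node.value})"
    if isinstance(node, Var):
        return f"{pad}Var({node.name})"
    if isinstance(node, BinaryOp):
        label, children = f"BinaryOp({node.op})", [node.left, node.right]
    elif isinstance(node, Assign):
        label, children = "Assign", [node.target, node.value]
    else:  # Echo
        label, children = "Echo", [node.expr]
    return "\n".join([pad + label] + [dump(c, depth + 1) for c in children])


for statement in program:
    print(dump(statement))
```

The multiply node sits underneath the add node, which is how the tree records that the bracketed part has to be worked out first.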
And if you've already migrated to PHP 7 and you encountered some of these weird things, then perhaps you shouldn't have been doing weird things in the first place. I don't know.
And it's faster, asterisk. I have to qualify this, because there is one particular part of the process that is faster. Using the AST means that we can be faster in the compilation step. Despite this extra step, it kind of offsets it. But the runtime difference is negligible. Most of the benefits that we see in PHP 7 are actually from other optimisations, like rewriting how zvals and references work. Not this part of it. So runtime performance is not really changed much by this particular change. But the compilation step is about 10% to 15% faster, which is kind of nice. It does require more memory, because there's obviously this big old data structure. That makes sense. But once it's compiled into opcodes, it doesn't matter, because the opcodes are then in the opcache.
The reason it's more efficient is that this tree structure is very friendly for compilers. Computers understand it nicely, and you can go around the tree and figure things out, and I'll demonstrate that shortly. In theory, you could also do static collapsing of nodes and things like that. If we look at this code, we already know what the answer is, because the value will always be the same. You could just change that into an echo of the value, and then that AST becomes almost nothing. But we won't; we'll leave it as a big AST.
So we can draw this a different way, like a tree, but upside down. The exact same information is here on this slide. But when we draw it out like this nice tree, we can do something. We can follow a line around the graph and execute each node that we touch, in order.
And if we start at the top and go all the way around and run all of these, that's something called a pre-order traversal. If you write the nodes down in the order that they happen, you'll get something that looks like this. In pseudocode terms, that looks roughly the same as the PHP code we've written. The important thing is that the operation or the statement is prefixed. So the assign comes before the variable and the scalar value. It's called Polish notation when we write it down like this. And it allows us to very easily see the order of precedence. We know that we have to multiply something first before we can add. So we have to go all the way down, multiply B by 2, and then we can add A to the result of the multiply operation.
I've mentioned this thing called order of precedence. When writing a programming language, order of precedence is really important, because it defines how things behave when the order of operations is potentially ambiguous. That's demonstrated by this simple sum: if you do it two different ways, you get different results. Is it seven, where we do the multiplication first? Or is it nine, where we add first and then multiply the result? We know from the rules of mathematics that we multiply first. But you could potentially change that in a compiler and make it do something different, which would be unexpected, perhaps.
So when we write this out, we can see the order in which things have to happen. But it's actually kind of backwards, because you have to go all the way down, take the three and the two and multiply them together, and so on. Reading left to right, we say, okay, we want to add something. We've got the first operand, which is a one, but then we've got another operator, which is the multiply.
So we have to go and do something else first. The operator is multiply. We have a left and a right, and they are things that we can do something with immediately. That's great. We know that the values are two and three. So we return six from the multiply, and then we have the value six as the second operand of the add. Then we know to add one and six, and the answer is seven.
So we can turn it around. Maybe that's a bit easier to understand. We have this thing called, funnily enough, reverse Polish notation. It's kind of similar, but maybe it's not as obvious what's going on here. It does exactly the same thing. It's just a different way of writing it out. To parse this, we build a stack. So we can write this thing called a stack interpreter, if you like. We've got a stack, and we're going to step through each item one by one. If it's an operator, we're going to execute it on whatever's in the stack. Otherwise, we're going to add it onto the stack.
So, one by one. This is a number. It's not an operator, so we'll add it to the stack. The second one's also a number, so we'll add it to the stack. The third is a number too, so we'll add it to the stack. The fourth one is an operator, so we have to do something now. We know that the multiply operator takes two operands. So we take two things off the stack, the two and the three, and then we put the result back onto the stack. So now we have six at the top of the stack. The next one is the add operator. We know we need two things for an add. So we take the two things off, the one and the six, and then we put the result back on, which is indeed seven. And then we've reached the end of the input. The final result of that operation is the single item left in the stack. We know it's seven. If you have more than one thing left on the stack, or no items in the stack, something's broken, okay? All right.
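The two notations and the stack walkthrough above can be tried out with a short Python sketch. It assumes the sum 1 + 2 * 3 from the walkthrough, stored as nested tuples; the tuple layout and the function names are illustrative choices, not code from the talk.

```python
import operator

# The sum 1 + 2 * 3 as a tree of nested tuples: (operator, left, right).
# Bare integers are leaves. This layout is an assumption for illustration.
TREE = ("+", 1, ("*", 2, 3))

OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def pre_order(node):
    """Operator first, then its operands: Polish (prefix) notation."""
    if isinstance(node, int):
        return [node]
    op, left, right = node
    return [op] + pre_order(left) + pre_order(right)


def post_order(node):
    """Operands first, then the operator: reverse Polish notation."""
    if isinstance(node, int):
        return [node]
    op, left, right = node
    return post_order(left) + post_order(right) + [op]


def eval_rpn(items):
    """Stack interpreter: push values, apply operators to the top two items."""
    stack = []
    for item in items:
        if item in OPERATIONS:
            if len(stack) < 2:
                raise ValueError("operator is missing an operand")
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATIONS[item](left, right))
        else:
            stack.append(item)
    if len(stack) != 1:
        raise ValueError("expected exactly one item left on the stack")
    return stack[0]


print(pre_order(TREE))              # ['+', 1, '*', 2, 3]
print(post_order(TREE))             # [1, 2, 3, '*', '+']
print(eval_rpn(post_order(TREE)))   # 7
```

The final length check corresponds to the remark that more than one item, or none, left on the stack means something is broken.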
So now we have armed ourselves with some interesting and dangerous knowledge. We can now write a compiler. I'm going to constrain it for the sake of keeping it within this talk. And we're going to use an AST in it as well, of course. We can do this in three easy steps. Don't use this in production, though, because it's terrible. If you like GitHub things, you can have a look at the source code on github.com: basic maths compiler. All the code that you'll see on these slides is there in its entirety; here it's slightly redacted to fit on tiny little slides on this huge screen.
First of all, when you want to write a programming language, you have to define what's in it. Our language is just going to be basic sums. Just maths. Hence the name, basic maths compiler. We're only going to use positive integers, so nothing negative. That simplifies things somewhat. We're going to allow white space, but we're going to ignore it. And we're going to have four operators. There's only four things you can do: add, subtract, multiply, divide. We're only going to support one line of input. And to simplify this even further, we're not going to allow any way to override the order of precedence. So no brackets or anything like that to say, execute this bit first. Ignore that, because that gets a bit complicated.
So first up, we write the lexer. A very simple lexer can be written just with regular expressions. Who loves regular expressions? Who hates them? It's the other hands, right? Whether you love them or hate them, regular expressions can be used here. So we're going to define them. In case you don't know regular expression syntax very well, notice that all of them start with a circumflex, which means you always match at the start of the string. Then we've got the plus, minus, multiply and divide; some of them have to be escaped with a backslash to make sure that they parse properly. Backslash d plus is a number.
Any digit, if you like. Backslash s is any white space, so any amount of white space is fine. By using that circumflex, we ensure that we're always matching the next token, not something further on down our string. Because we want to do this in order. We don't want to say, oh, there's a number over there, because that's going to screw up our lexer. So we always match at the start of the string, and then we move along the string bit by bit.
What this bit of code does, just to explain it: we analyse from the offset onwards each time, and we try all of those regular expressions each time. We move the offset on if we match something. I've used this fancy word, lexeme; basically that's the thing that got matched. And the match function literally loops through the regular expressions and sees if any of them match. That's it. If one does, great, we make a token object. All right. So that's it, that's a lexer done.
So now we need to parse the tokens. Before we can do this, though, we need to know the order of precedence. As I explained, it's something very important for a programming language. We're going to use standard operator precedence here. It's kind of in inverse order: basically, the highest value is the most important precedence. So multiply, then divide, then add, then subtract. The list of tokens we get from our lexer, or the array of tokens, or stack of tokens if you want the correct vernacular, is what we're going to loop over. And we're going to create two other stacks as we go through this. We're going to have our token stack, which is the output stack, where we've made sense of things.
And a temporary operator stack, which we're going to chuck stuff into, operators as the name may suggest, and then take them out again. If the token that we find is an operator, that is, it's add, subtract, multiply or divide, then we run some extra code. So this is inside that if block. Basically, if it's an operator, we're going to put it on the operator stack and move on. But before that, if we find something on the operator stack that has a higher precedence than the operator that we're looking at, then we pop it off the operator stack and put it onto the output stack. And then at the end, we clean up any leftover operators in the operator stack and put them onto the output stack. So everything ends up on this output stack. All we're doing here is reordering, shuffling the tokens around into something that is more meaningful to our compiler.
So let's try and visualise this with an amazingly drawn load of boxes. We're going to go through one by one. This one is a value, so we put it onto the output stack. This one's an operator, so we push it onto the operator stack. There's nothing of a higher precedence there, so that's fine, we can just add it on. Then we add the two to the output stack. That's very simple. The multiply operator comes next. It's an operator, so we check the stack. If the last thing had a higher precedence, we would pop it off, but it doesn't. So we just add the multiply onto the operator stack. Finally, the last one is just a value, and then we need to do our cleanup: we take all the operators and put them onto the output stack. This is familiar. This is the reverse Polish notation we looked at.
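The lexer and the reordering step described above might look roughly like this in Python (the talk's own code is PHP and is not reproduced here). The token names, `lex` and `to_rpn` are hypothetical. Two details are assumptions that differ from what is said in the talk: multiply and divide share one precedence level, as do add and subtract, and an operator is also popped when its precedence is equal, so that something like 8 - 3 - 2 stays left-associative.

```python
import re

# Patterns are tried at the current offset only, which plays the role of
# the leading circumflex described in the talk.
TOKEN_PATTERNS = [
    ("T_INTEGER", re.compile(r"\d+")),
    ("T_ADD", re.compile(r"\+")),
    ("T_SUBTRACT", re.compile(r"-")),
    ("T_MULTIPLY", re.compile(r"\*")),
    ("T_DIVIDE", re.compile(r"/")),
    ("T_WHITESPACE", re.compile(r"\s+")),
]

# Assumption: two conventional precedence levels (higher binds tighter).
PRECEDENCE = {"T_MULTIPLY": 2, "T_DIVIDE": 2, "T_ADD": 1, "T_SUBTRACT": 1}


def lex(source):
    """Split the source into (token name, lexeme) pairs, skipping whitespace."""
    tokens = []
    offset = 0
    while offset < len(source):
        for name, pattern in TOKEN_PATTERNS:
            match = pattern.match(source, offset)
            if match:
                break
        else:
            raise SyntaxError(f"unexpected character at offset {offset}")
        if name != "T_WHITESPACE":
            tokens.append((name, match.group()))
        offset = match.end()
    return tokens


def to_rpn(tokens):
    """Reorder infix tokens into reverse Polish notation using two stacks."""
    output = []
    operators = []
    for token in tokens:
        name = token[0]
        if name in PRECEDENCE:
            # Move operators that bind at least as tightly to the output first.
            while operators and PRECEDENCE[operators[-1][0]] >= PRECEDENCE[name]:
                output.append(operators.pop())
            operators.append(token)
        else:
            output.append(token)
    while operators:  # clean up whatever is left on the operator stack
        output.append(operators.pop())
    return output


rpn = to_rpn(lex("1 + 2 * 3"))
print(" ".join(lexeme for _, lexeme in rpn))  # 1 2 3 * +
```

Running it on the walkthrough's input gives the same ordering as the boxes on the slide: values first, then multiply, then add.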
So what we've done is we've taken this potentially big stream of tokens, and we've shuffled them all around into this notation that we can use in our compiler. Now we need to create the AST. Because we have this stack of tokens in the right order, we can take those tokens and very simply make them into a tree. So we progress through the stack again, with another loop. IP is the instruction pointer, which is useful if things get a bit more complicated and you start going back and forward through the stack, which we don't have to in this compiler. It defines where we are in the stack.
Then we just create the AST. If it's a value, we create a value node and pass the lexeme to it, cast to an integer because it's a number. The lexeme is the bit that was matched, so that's going to be the digits. If it's an operator, we need to figure out what kind of operator it is. There's a bit of code that I've omitted there, but we figure out what it is, store it in a node type variable, and then we create this AST node. So we're representing our AST as a tree of PHP objects here. And because these are all binary operators, they all need a left and a right, so we pass those. We take them off the stack using that stack interpreter approach, and we make the AST node there.
So there we have it. It would look something like this: we have our add, we have the integer value one, and then a multiply with the two integer values. We have this parsed now, so we need to execute it. And again, this is quite an easy part. We're going to create this tiny virtual machine. It's kind of Inception, because PHP is running in a virtual machine and we're creating a virtual machine in the virtual machine.
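Building the tree from the reordered tokens, as just described, can be sketched in Python as follows. The class names `IntegerValue` and `BinaryOp`, the `NODE_KINDS` mapping and `build_ast` are hypothetical stand-ins for the PHP objects in the talk, and the input is simplified to a list of lexemes.

```python
from dataclasses import dataclass


@dataclass
class IntegerValue:
    value: int


@dataclass
class BinaryOp:
    kind: str
    left: object
    right: object


# Hypothetical mapping from operator lexeme to node kind.
NODE_KINDS = {"+": "Add", "-": "Subtract", "*": "Multiply", "/": "Divide"}


def build_ast(rpn_lexemes):
    """Turn lexemes in reverse Polish order into a tree of node objects."""
    stack = []
    for lexeme in rpn_lexemes:
        if lexeme in NODE_KINDS:
            # Every operator is binary, so it takes two nodes off the stack.
            right = stack.pop()
            left = stack.pop()
            stack.append(BinaryOp(NODE_KINDS[lexeme], left, right))
        else:
            stack.append(IntegerValue(int(lexeme)))  # cast the lexeme to int
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


tree = build_ast(["1", "2", "3", "*", "+"])
print(tree)
```

It is the same stack technique as evaluating reverse Polish notation, except that each operator produces a node holding its operands instead of a computed number.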
And if you run it inside Vagrant, it's inside another virtual machine. Anyway, lols. We descend through the abstract syntax tree and execute things as we come to them. That's basically it. But we can't execute anything until we go all the way down the tree and figure out what we're trying to execute. So we have this recursive calling of this compile node function. This is the entry point for our compiler. At the top, we check whether it's a binary operator. All of the binary operators in this particular source code extend from this abstract binary operator class; an interface would work as well. In that case we call another function called compile binary op. We'll come to that in a moment. If it's a scalar value, i.e. our integer, then we just return the value directly. That's all we need to do there. And we don't have any other node types, because we're ignoring white space. So that's it.
So, compile binary op. Let's have a look. This is where the meat of it is. Now, we don't know whether left and right are values that we can use straight away. So what we have to do is pass left and right into the compile node function. That's where the recursion comes from. One of the sides might be another add or multiply; in our case, one of the sides is a multiply. So that would go back into compile binary op, which calls compile node on its left and right, and that time we'd get integers. So left and right will eventually, hopefully, turn into integers. Then we figure out the type of the node, so whether it's add, subtract, multiply or divide. We've got a switch here, as you can see, and we just run it. And that's it.
All right, so what does this mean for me? On the surface, this doesn't seem very relevant to our day-to-day jobs, perhaps, unless you're Derek.
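The recursive compile node and compile binary op functions described above could be sketched like this in Python. The function names mirror the spoken ones, but the node classes and the use of true division are assumptions; how the talk's PHP code handles division that does not come out even is not stated.

```python
from dataclasses import dataclass


@dataclass
class IntegerValue:
    value: int


@dataclass
class BinaryOp:
    kind: str  # "Add", "Subtract", "Multiply" or "Divide" (hypothetical names)
    left: object
    right: object


def compile_node(node):
    """Entry point: dispatch on the node type and return the computed value."""
    if isinstance(node, BinaryOp):
        return compile_binary_op(node)
    if isinstance(node, IntegerValue):
        return node.value
    raise TypeError(f"unknown node type: {type(node).__name__}")


def compile_binary_op(node):
    """Resolve both sides recursively, then apply the operator."""
    left = compile_node(node.left)
    right = compile_node(node.right)
    if node.kind == "Add":
        return left + right
    if node.kind == "Subtract":
        return left - right
    if node.kind == "Multiply":
        return left * right
    if node.kind == "Divide":
        return left / right  # assumption: true division
    raise ValueError(f"unknown operator: {node.kind}")


tree = BinaryOp(
    "Add",
    IntegerValue(1),
    BinaryOp("Multiply", IntegerValue(2), IntegerValue(3)),
)
print(compile_node(tree))  # 7
```

The recursion bottoms out at the integer nodes, so the multiply is resolved to 6 before the add can run.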
But what it does mean is that we get a nice bonus: along with the other optimisations in PHP 7, we have faster and more efficient code. The AST is one of the reasons for that; faster, asterisk, yeah? It also means that we have some nice scope for optimisations in the future. I believe some of those are being discussed, and some are being implemented, which is nice, because we can reason a bit more easily about how the code goes together. We've got this separation of the compiler from the parser.
Knowing how the engine works can help somewhat. I would suggest, though, that just because you know how the engine works, you don't necessarily write silly micro-optimisations that say, well, I know this is going to generate a more efficient data structure in the AST, or whatever. No. PHP is actually a brilliant language for people to write terrible code in, and PHP just deals with it. That's evidenced as well by the vast improvements in PHP 7. Maybe Vim talked about those earlier; I didn't see his talk.
Something I mentioned earlier as well: the AST isn't specific to PHP. Most programming languages, apart from the weird ones that we'll ignore, work in the same way I've just described. We lex stuff, parse it, compile it. And that's pretty much how you make a programming language. So this can help you understand how your code is put together, and helps you understand that syntax is just that. Essentially, all programming languages do roughly the same thing in a slightly different way, with some weird quirks.
So can we use this? Well, no, unfortunately not at the moment. The AST is not directly available for you to say, oh, what does the AST look like for my code? But there are ways we can do this. Why would we want to? Well, it gives us big insights into our code.
What's going on? We can examine the structure of the code, and we can do some magic things as well. I like magic things. There is this extension from Nikita Popov called php-ast, and that exposes the AST to userland. So you can say, have a look at this source code and give me the AST for it, and it gives you a load of objects. The downside, depending on your opinions, is that it's an extension, so you have to go and install it. If you're on a shared host, perhaps, that might be a bit more difficult. You probably don't want to do that in production anyway. But this is how we'd use it. We install the extension, whether it's through PECL or by compiling it by hand, like a nutter. You give it some code; in this case, I'm just giving it a string of code. You use the ast\parse_code function. And then we've got this ast_dump, which is basically just a helper to make the result a bit more readable, because, as you saw, even with that simple math sum the AST got pretty big, and maybe even impossible to read at the back. So php-ast is what we can use to expose PHP's very own internal abstract syntax tree to you.

There is also another extension, which I advise you not to use, because it's madness. It's called ASTKit. Does anyone use RunKit? Does anyone use RunKit in production? Okay, cool, good. RunKit, if you don't know what it is, allows you to undefine classes and do things you probably shouldn't be doing. ASTKit is written by Sara Golemon, and it allows you to modify the AST and then execute it. So in effect what you get is this concept you may have heard of called monkey patching, which is where you take some code and change it before you run it, right? Which is quite cool. The downsides are that it's an extension, and it was written as just a bit of an experiment, right? So maybe don't use it in production.
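Going back to php-ast for a moment, the parse call described above might look roughly like this. It's a sketch, assuming the php-ast extension is installed. The dumpAst function is a hypothetical stand-in for the ast_dump helper, and picking the newest version from ast\get_supported_versions() is a choice made here rather than something from the talk; the sample code string is also an assumption.

```php
<?php
// Requires the php-ast extension (ext-ast); it is not bundled with PHP.
declare(strict_types=1);

/**
 * Render an ast\Node tree as indented text.
 * Hypothetical helper written for this sketch, not part of php-ast.
 */
function dumpAst($node, int $depth = 0): string
{
    $indent = str_repeat('  ', $depth);

    if (!$node instanceof ast\Node) {
        // Leaves are plain PHP values: integers, strings, null, ...
        return $indent . var_export($node, true) . "\n";
    }

    // Turn the numeric node kind into a readable name such as AST_BINARY_OP.
    $out = $indent . ast\get_kind_name($node->kind) . "\n";
    foreach ($node->children as $name => $child) {
        $out .= $indent . '  ' . $name . ":\n" . dumpAst($child, $depth + 2);
    }

    return $out;
}

$code = '<?php echo 1 + 2 * 3;';

// Assumption: use the newest AST version the installed extension supports.
$version = max(ast\get_supported_versions());

$ast = ast\parse_code($code, $version);
echo dumpAst($ast);
```

That's the read-only side. ASTKit is the one that lets you change the tree.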
So, some usage of ASTKit. Fairly straightforward. We give it some code and it exposes these extra methods like getChild, graft and so on. Because it's a tree, you're getting the children of the tree, swapping things over, and so on. Basically, what this example does is change the true in the if into false. And then you can execute it. The first time you execute it, it will say "this is a triumph", and the second time, once it's been changed, it will say "the cake is a lie". These examples are just taken from the READMEs. I didn't write them. But they demonstrate what these extensions are used for.

But that's not all. There is this thing called PHP-Parser. It's written by Nikita Popov, who is the author of the php-ast extension and does a lot of work on PHP core, so he knows how it works, which is a good start. It's more or less the same as the php-ast extension, but it's written in PHP. So you can install it with Composer: just composer require nikic/php-parser. The upsides are that it's not an extension and it's very easy for anyone to use. The downside, and it's a really big downside, is that it's really, really, really slow, because it's written in PHP. There are use cases for it, though. I'm just getting there. Don't disregard it straight away because it's slow. You use it in very much the same way as the extension: you give it some code. In this case I'm just doing a file_get_contents on whatever and calling parse, and you get this big old AST, depending on how much code you have, of course. All right. So it's very simple to get the AST, and then we can do stuff with it, right?

So we have this library that I'm going to shamelessly plug now. I don't care. It's called Better Reflection. In short, it's the reflection API that you may or may not be familiar with in PHP, but it uses the AST instead.
It's pretty flexible, it's pretty powerful, and it allows you to do monkey patching. This is roughly how it works. I'm not going to go into too much detail, because this talk isn't all about Better Reflection. I want to make it relevant, though. We have this reflector, which is like the public API; that's how you do the reflecting. The source locator is basically instructions for Better Reflection on how to find your code. The reason we do this is that we can now reflect on code that doesn't exist to PHP yet. So you could reflect on a load of code that hasn't been loaded yet, change it beforehand, and then run it, right? That's the point. It uses PHP-Parser; that's the main part of it, if you like. You've got most of the methods that you're used to from normal reflection, plus a few extra ones like getAst and getBodyAst. So you can start looking at the content of your methods, classes, functions and so on.

The API is roughly similar. Instead of saying new ReflectionClass, we have this Better Reflection way where we get a class reflector and so on, but we're still returning a reflection, right? We're using some sane defaults: we use whatever autoloader is registered in your SPL autoload stack, which is usually going to be Composer, right? So anything that Composer can load, we can load too, but we don't actually have to load it to look at it. We just grab the source code, feed it into PHP-Parser, and then we have the reflections. So this is that source locator and class reflector stuff. By default we can reflect on PHP's internal source, because there's a load of stuff there, and that's an awful mess, so we'll ignore that.
Eval'd code, if you're evil enough to be using eval. And then your autoloader one, which does some magic under the hood that's actually kind of fun. It overwrites the file stream wrapper and then asks, does your class exist? That triggers your autoloader to say, oh, open this file. But because we've overwritten the file stream wrapper, we stop the file from actually being loaded and just grab the file name. Then we restore the file stream wrapper, and now that we have the file name we can open it with PHP-Parser. So the class never actually has to be loaded for us to look at it; we just grab its source code directly. It works in 99% of cases. If you're loading your source code from something like a database, which is really weird, it wouldn't work, nor if you're loading it from a memory stream or something like that, which is also a bit weird. Most people load code from files, since that's where we store source code, so it works most of the time.

So, given a class structure like this, fairly straightforward, we get an AST that looks like this. We get all of the nodes and all the structure of the code. I've simplified it, because otherwise it would be huge. We get the class node with an array of statements; the statements are the body of the class. We have the property node, which has types, attributes and things like that. The method node has a type, some parameters, and the statements within the method. Actually, we didn't have any statements within the method, so that would just be an empty array. Then there are any relevant attributes like start line, end line, things like that.

So what can I use Better Reflection for? We have this AST, great, so we can do some monkey patching, as I kind of explained. You've got a class, for example MyClass, and it returns 5. Very simple. We can use Better Reflection to reflect on that, because it doesn't necessarily load your code into memory, so we can grab the AST of that, because hopefully, as you know, certainly if
you use RunKit, you know that once you have loaded a class in PHP you can't change it; getting around that is one of the benefits of RunKit. So we can do this instead. We grab the source code and find its AST. Make sure you do it before the class is loaded, of course. Then we use this class loader, and we cache the result in a file so we don't have to keep doing this over and over again. Do this after all the other autoloaders, because this class loader actually registers its own autoloader to get in the way of Composer, if you like, and say, well, if you try to instantiate the class that you're looking for, and I know about it and I've changed it, then I will make my own evil monkey-patched version instead.

Then we can use the reflection. We can say getMethod foo, which is probably familiar if you've used reflection to get a method off a reflection. But we also have this new function that doesn't exist in core reflection, called setBodyFromClosure, and we can pass it a closure, in this case a function that returns 4. You can also set the body from a string, if you don't like writing functions, or you can pass it a load of AST nodes in the format of Nikita's PHP-Parser nodes, which does have this whole way of building AST nodes, which is kind of nice. But basically what we're doing is making that method return 4 instead of 5. It's a fairly straightforward change. The autoloader that we registered, that class loader, will kick in and say, oh, you're trying to instantiate MyClass, and it will load the modified version of that code. So now it will return 4, not 5. Hahaha.

So, for the PHP engine, the AST is a nice, efficient data structure that we can use to represent code. It means that the compilation step is faster. Obviously, when you factor in OPcache, it doesn't really make any difference at runtime, except maybe the first time you run the code. But it provides that nice separation between the parser and the compiler.
But what's more useful to us normal developers who don't delve into PHP's core too much (not that you're not normal, Derek) is that these concepts can be used in userland, and they are very useful. We've got the PHP-Parser library, which means we can do this straight away, and do it slowly, of course. We have Better Reflection, which is based on PHP-Parser. [unintelligible] static analysis. [unintelligible] about a tool called Exakat, which uses the AST as well to do static analysis.

If you're not sure what static analysis is, it's basically looking at your code and seeing where it's going wrong. Not debugging; that's a different thing. It's looking at things like types and asking, does this function accept the types it's being given? Are you passing the right types to a function? Obviously, if you're using strict type declarations, PHP will now throw lots of errors for that, but certainly for PHP 5 code and earlier, this kind of thing is going to be immensely useful. So maybe we can use it to look at the docblocks and say, well, you're using this function and passing it an integer, but the docblock says a string. So what's going on? It's wrong, right?

And just, who uses PhpStorm? Okay, many of you. It does static analysis as you're writing code. When it shows the little squiggly underlines and says, oh, this doesn't look right, that's what it's doing: it's looking at your code, not running it. It's statically analysing it, right? There's also another tool called Phan, P-H-A-N, not F-A-N, which uses the php-ast extension and does a load of static analysis as well.
It was originally written by Rasmus, and then some other people took it over and are now maintaining it, and it's a very useful tool. These kinds of tools you can stick into your CI environment to make sure nothing breaks, or nothing obvious breaks, which is perhaps a better way of putting it. It allows you to catch your bugs faster. So actually, it's cool and it's very useful.
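To give a flavour of what these tools do under the hood, here is a very small static check written with PHP-Parser. It's a sketch under assumptions, not how Phan or Exakat actually work: it assumes composer require nikic/php-parser has been run with a recent release (one that provides createForNewestSupportedVersion) and that vendor/autoload.php sits next to the script. The rule itself, flagging calls to eval, and the sample function risky are made up for illustration.

```php
<?php
// Assumes nikic/php-parser is installed via Composer next to this script.
declare(strict_types=1);

require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node\Expr\Eval_;
use PhpParser\NodeFinder;
use PhpParser\ParserFactory;

// Sample source to analyse; it is parsed, never executed.
$code = <<<'PHP'
<?php
function risky(string $input) {
    return eval($input);
}
PHP;

// Turn the source string into an array of AST statement nodes.
$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts = $parser->parse($code);

// Walk the whole tree and report every eval() expression with its line.
$finder = new NodeFinder();
foreach ($finder->findInstanceOf($stmts, Eval_::class) as $evalCall) {
    printf("eval() used on line %d\n", $evalCall->getStartLine());
}
```

A script like this could return a non-zero exit code when it finds something, which is all a CI step needs in order to fail the build.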