Hi everybody. So this is an intro to PHP internals. I gave this talk almost exactly four years ago, but some of the things I talked about have actually changed. From PHP 5 to PHP 7, many internal things were changed to improve performance while keeping almost the same language. So you could run the same code and it would still run X times faster, as Rasmus talked about at the PHP conference. So quite a bit has changed, but the fundamental things are pretty much still the same. It's just the intricate stuff.

So PHP is actually written in C. And also in PHP, actually: there are PHP scripts that generate C code. So in theory you actually need PHP to build PHP. If you're doing some language stuff, that can be quite hard, because the PHP that generates that C code might be wrong or invalid. So there might be weirdness like that.

The source code for PHP is at github.com/php/php-src. There are branches for each of the versions and things like that, but master typically has the cutting edge.

So this is the framework of PHP. On the outside there is something called TSRM. Typically if you look at the source code it says TSRM. It's basically for thread management. To say more than that, I'm not sure, because the things I knew are gone. Then there is something called the SAPI. SAPI is the server API, server application programming interface or something like that. I think it's server; I can't remember. The SAPI is the one that connects PHP to the outside world, so if a request comes in, it typically goes through the SAPI. Then there are two parts. There's the PHP core, which has file I/O, networking, and things like that. Then there is the Zend Engine.
The Zend Engine is the one that actually lexes your code, parses your code, generates the opcodes, compiles them, runs the opcodes in the Zend VM, and kind of does everything. That's critical.

Then there are the PHP extensions, the first-class, I guess first-party, PHP extensions. These are extensions like date and time, SPL, cURL, tokenizer. I think there are about 20 first-class extensions. The most crucial stuff that you use, like the string functions, is in something called the standard extension. Most of the code you write is typically calling a PHP extension to get the functionality. Extensions like cURL are bindings to libcurl rather than a re-implementation or something like that. They're just bindings. That's why, if you look at the cURL extension's php_curl.h file or the C file, you'll see a lot of constants, each tied to the exact version of curl that has to be there to use it. So you'll see a ton of constants. If you like documentation, that's one place you can go, because all these constants are there. Or if you want to contribute, I think it's probably one pretty easy thing: you just have to add constants. Yeah, so that's the PHP architecture.

This is the process cycle. So some code comes in. I'm not sure whether you're familiar with OpCache, APC, what's the other one? Zend Cache? Zend OpCache? Zend Optimizer, I guess. So first, when the script comes in, it looks up OpCache to see whether there's already compiled code there. The way it checks, previously at least, was just whether the file was modified: if the file was modified, the lookup would fail. So if there's nothing in OpCache, it parses the code, generates the opcodes, saves them to memory, which is OpCache, then the Zend VM executes them, and you get the output.
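The cycle described here (look up the cache, compile on a miss, store the result, execute) can be sketched as a small toy model. This is only an illustration: `opcode_cache`, `compile_script`, `run_script`, and `stats` are made-up names, not the real OPcache API, and the modification-time check is an assumption based on the "was the file modified" remark above.

```python
# Toy model of the request flow: cache lookup, compile on a miss, execute.
# Every name here is hypothetical; this is not how OPcache is implemented.

stats = {"compiles": 0}
opcode_cache = {}  # path -> (mtime, opcodes)


def compile_script(source):
    """Stand-in for lexing, parsing, and opcode generation."""
    stats["compiles"] += 1
    return [("ECHO", line) for line in source.splitlines()]


def execute(opcodes):
    """Stand-in for the VM: collect everything that gets echoed."""
    return [arg for op, arg in opcodes if op == "ECHO"]


def run_script(path, source, mtime):
    cached = opcode_cache.get(path)
    if cached is None or cached[0] != mtime:
        # Miss (or the file changed): compile and keep the result in memory.
        opcodes = compile_script(source)
        opcode_cache[path] = (mtime, opcodes)
    else:
        # Hit: skip compilation entirely and go straight to execution.
        opcodes = cached[1]
    return execute(opcodes)
```

Running the same path twice with an unchanged `mtime` compiles once; changing `mtime` forces a recompile, which is the whole performance argument for keeping compiled opcodes in memory.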
The other path, as you can see: if it's in OpCache, it takes it from memory and goes directly to the Zend VM. So for obvious reasons, if you have OpCache, it's going to drastically increase your performance, because it doesn't have to recompile these things.

Also, you have heard me say "compiled" a few times, and you would say PHP is interpreted. But it actually does compile to something, and that something is reusable. It's not like, if you know Lisp, where you can pretty much write an interpreter that just goes through the code one piece at a time and executes while it's reading. PHP does not execute while parsing. It executes after parsing, optimization, and everything. So yeah, that's why I say compiled every time.

So, the lifecycle of PHP. When you run php run.php, the CLI comes in, the SAPI starts, and there's something called MINIT, the module init, which runs at engine start. What happens is, if you have 20 or so extensions, it will call MINIT on all the extensions, and that starts them up. So if you have an extension that has to declare constants in the global scope, or has to have database connections, it would open your database connections at the engine start level. Then RINIT gets executed, and that's when the request comes in. When you run it in the CLI, everything happens in one go. So after RINIT, it processes the code, then the request shuts down, the engine shuts down, and PHP stops.

Let's see how it's different here. So there's this multi-threaded lifecycle of PHP, where, as you can see, there's one MINIT. Your database connections are opened in the MINIT, and then you have the requests running in threads. So they're actually using the same database connection that was opened.
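The startup and shutdown order described here can be sketched as a toy model. MINIT, RINIT, RSHUTDOWN, and MSHUTDOWN are the conventional hook names; everything else (`ToyExtension`, `serve`, the fake connection) is hypothetical and only illustrates that per-process setup runs once while per-request hooks run for every request.

```python
# Toy model of the lifecycle: module init once, request init per request.
# Not real extension code; the hooks only record the order they run in.

class ToyExtension:
    def __init__(self, name):
        self.name = name
        self.connections_opened = 0
        self.connection = None

    def minit(self, log):
        # Once per process: expensive setup such as a persistent connection.
        self.connection = object()
        self.connections_opened += 1
        log.append(f"MINIT {self.name}")

    def rinit(self, log):
        # Once per request: reuses whatever MINIT already set up.
        log.append(f"RINIT {self.name}")

    def rshutdown(self, log):
        log.append(f"RSHUTDOWN {self.name}")

    def mshutdown(self, log):
        self.connection = None
        log.append(f"MSHUTDOWN {self.name}")


def serve(extensions, requests):
    log = []
    for ext in extensions:
        ext.minit(log)
    for request in requests:
        for ext in extensions:
            ext.rinit(log)
        log.append(f"run {request}")
        for ext in extensions:
            ext.rshutdown(log)
    for ext in extensions:
        ext.mshutdown(log)
    return log
```

A CLI run corresponds to `serve(exts, ["run.php"])`, where everything happens in one go; a long-lived server corresponds to many requests between one MINIT and one MSHUTDOWN.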
So you don't actually have to open a connection every time a request comes in. And this is the same for global variables and constants. Sorry, not global variables: constants. Then there's the multi-process type of SAPI, which I think Apache 2 or something probably uses. I'm not exactly sure. Here this happens in every process, so in every process there will be a database connection. So now you kind of understand why there's an engine init, an engine start, and a request start: how they are different, and what purpose they serve. OK.

So, has anybody heard about zvals? In PHP, at heart, everything is a zval. Every variable is a zval. Every array, everything is a zval. So the zval is the atom of everything in PHP. So this is on the right side. How many of you know C? OK. So this is a struct. This defines the zval. When you have $a = 1 or something, the $a actually points to this zval. I'll talk about it a bit more after.

So this has four pieces of information in it. I'll just skip the first one. The second one is the ref count, which I will talk about later. Ref counting is used for garbage collection. Next is the type. This is the type of the variable: int, float, string, null, whatever it is. There are, I think, about eight types that can be assigned here. So this is where your variable's type is actually set. Then you'll be asking: PHP is dynamically typed, so what's this about? This is because it's C, and in C everything has to have a type. That's why PHP can do a lot of casting and things like that, because when you do the casting, it just changes the type. And that determines a lot of things down the line. So if you see weird behavior when you're casting variables, it's probably because of this. But this holds the type. Then there is something called is_ref, which I'll talk about later.
So the value, the first field I showed you: this is where the actual value of $a = 1 is stored. So the 1 is actually stored here. This is a union. A union is a type of data structure that sits in one place in memory. Though it has multiple members, it only has one place in memory, so you can only use one member of it at a time. So let me talk about the members. There's lval to store long values, which are integers, longs, those things. Booleans and resource identifiers, like file resources, are stored there too. Then there's dval, which is for doubles. The next one is a struct, which is the string. If you have 1 in double quotes, this is where your string is actually stored: val stores the string and len has the length of the string. So if you do strlen or something, it's actually constant time, because the length is calculated for you when the string is assigned. That's a string. The hash table is where your arrays are defined; there's another complex data structure behind it. Then there is the object value. That's where classes, objects, and all those things are stored. Hash tables and zvals can actually go pretty deep. There are a lot of pointers and tables to look up and things like that, so it gets a bit more complicated down the line.

So this zval is very specific to PHP 5. Don't go looking for it in PHP 7; it's totally different. Reducing the size of this was one of the most critical things, I think the most critical thing, that optimized PHP, because this one took much more memory and the new version in PHP 7 takes less. When they were profiling, what they figured out was that PHP was spending a lot of time allocating and deallocating memory.
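The four fields and the value union described above can be mimicked with Python's `ctypes`, which lays out structures and unions the way C does. This is a sketch under assumptions, not the real header: the class names are made up, the object member is simplified to a pointer, and the type tag numbers are illustrative.

```python
import ctypes

# Illustrative type tags (eight of them, as mentioned in the talk).
IS_NULL, IS_LONG, IS_DOUBLE, IS_BOOL = 0, 1, 2, 3
IS_ARRAY, IS_OBJECT, IS_STRING, IS_RESOURCE = 4, 5, 6, 7


class ToyString(ctypes.Structure):
    # The string member carries its own length, so reading it is O(1).
    _fields_ = [("val", ctypes.c_char_p), ("len", ctypes.c_int)]


class ToyValue(ctypes.Union):
    # A union: all members share the same bytes; only one is meaningful.
    _fields_ = [
        ("lval", ctypes.c_long),    # integers, booleans, resource ids
        ("dval", ctypes.c_double),  # floating point
        ("str", ToyString),         # string pointer plus length
        ("ht", ctypes.c_void_p),    # hash table (arrays)
        ("obj", ctypes.c_void_p),   # object handle, simplified here
    ]


class ToyZval(ctypes.Structure):
    _fields_ = [
        ("value", ToyValue),
        ("refcount", ctypes.c_uint),
        ("type", ctypes.c_ubyte),
        ("is_ref", ctypes.c_ubyte),
    ]


def make_long(n):
    z = ToyZval()
    z.value.lval = n
    z.type = IS_LONG
    z.refcount = 1
    return z


def make_string(data):
    z = ToyZval()
    z.value.str.val = data
    z.value.str.len = len(data)  # stored once, at assignment time
    z.type = IS_STRING
    z.refcount = 1
    return z
```

Because the members overlap, `ctypes.sizeof(ToyValue)` is the size of the largest member rather than the sum, and writing `dval` overwrites whatever was in `lval`. The `type` field is what tells the engine which member to read.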
So memory was one of the key things they tackled, and that's why PHP 7 is much faster. Did I miss anything? Oh yeah. As I said before, this is a union. Therefore only one of these members has a value in it at any time. That's the purpose of a union.

Any questions? Sorry, I might be going a bit fast. My apologies. Is it okay?

About the threads in the SAPI: PHP is basically single process, single thread?

Yeah. So PHP is, but not the SAPI, or Apache or nginx or whatever your front end is, let's call it, that you're running. That's what handles your PHP. PHP might be single-threaded, but the front end could do threading if it wanted to.

Outside of PHP?

Yeah, outside of PHP. So you can actually choose what you want to do, what your priorities are. Okay.

So, does anybody know copy-on-write? What's meant by that? So this is a concept called copy-on-write, which is typically there in most languages. This philosophy, let's say. So let's go through this. Previously I showed you the zval and the value. So we have variable $a here, which is equal to 1, and you have a zval whose value is 1. Ref count equals 1, because it's assigned once. Now you have $b = $a. What happens is they're not actually two values; they're one. You can see they point to the same zval. The value is 1, but the ref count is 2 now, because there's one more reference coming in. Now you have $c = $b. All three then point to the same zval. The value is the same; ref count equals 3. This is done so they don't have to copy these zvals every time, because if you're not writing to it, why would you want to have another copy stored, right? Everything is passed by value, so if they had to copy every time, PHP wouldn't be a programming language.
It would be just a thing that copies variables everywhere. Okay. Now the next one is where everything gets interesting. Sorry?

So is this one of the optimizations you mentioned?

No, no, no. This has been there for long. Yeah, that's a pretty good question. It involves the zval, obviously, and the zval was optimized, so let's say this is optimized indirectly. But this concept is not the optimization.

So then you increment $a. Now $a is a totally different thing. What happens is $b and $c stay on the same zval and $a goes to another zval. And you can see the ref count on the first one becomes 2, because there are no longer three variables pointing to it, only two. And $a gets the value 2 and a ref count of 1. Then you unset $b and the ref count decrements. You unset $c, the first zval decrements again, and its ref count becomes 0. When the ref count becomes 0, that indicates to the garbage collector that it's good to go: delete this thing. Yeah. So that's the zval structure, ref counting, and copy-on-write. This is very common in most other programming languages also, I think. Okay. Any questions on this? Okay.

So now we have talked about how PHP works, some of the internal stuff. Now we're coming to the pretty interesting part. How does PHP get compiled? And what runs it? How does it run? So there are, let's say, three phases. One is called lexing. The next one is called parsing. And the next one is generating and running opcodes, or compiling. Okay. So the lexer identifies what is in your script. Let's just go through the example. So you have the PHP open tag, and it's tagged as an open tag, T_OPEN_TAG. So this kind of generalizes your code. Okay.
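Going back to the copy-on-write walkthrough ($a = 1, $b = $a, $c = $b, then incrementing $a and unsetting $b and $c), the bookkeeping can be sketched as a toy model. `ToyZval` and `ToySymbolTable` are hypothetical names, and this ignores references (is_ref) and everything else the real engine handles.

```python
# Toy model of ref counting with copy-on-write ("separate before write").

class ToyZval:
    def __init__(self, value):
        self.value = value
        self.refcount = 1


class ToySymbolTable:
    def __init__(self):
        self.vars = {}    # variable name -> ToyZval
        self.freed = []   # values whose refcount dropped to zero

    def assign_value(self, name, value):      # $name = <literal>
        self._release(name)
        self.vars[name] = ToyZval(value)

    def assign_var(self, dst, src):           # $dst = $src
        z = self.vars[src]
        if self.vars.get(dst) is z:
            return
        self._release(dst)
        z.refcount += 1        # share the zval instead of copying it
        self.vars[dst] = z

    def increment(self, name):                # $name++
        z = self.vars[name]
        if z.refcount > 1:
            # Someone else still uses this zval: copy now, then write.
            z.refcount -= 1
            z = ToyZval(z.value)
            self.vars[name] = z
        z.value += 1

    def unset(self, name):                    # unset($name)
        self._release(name)

    def _release(self, name):
        z = self.vars.pop(name, None)
        if z is not None:
            z.refcount -= 1
            if z.refcount == 0:
                self.freed.append(z.value)    # nothing points here any more
```

Following the slides: after the three assignments one zval has a ref count of 3; the increment moves $a to its own zval (value 2, ref count 1) and leaves the original at 2; the two unsets take it to 0, at which point it can be freed.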
Now, if you were not to do this, you would have to keep looking at strings, comparing strings, everywhere. Though I say it's T_OPEN_TAG, that's just the string version of it. It's actually just an integer. So it saves a lot of memory doing this. There's also an interesting thing about this, which I will talk about later. Next, you actually have whitespace. There actually is a token for whitespace; I didn't put it here because it clutters the slide. Next, you have class, which is T_CLASS. Then you have a T_STRING, which is A, and then you'll see this curly brace. One of the questions you would have is: okay, everything else has a token, why doesn't this have one, right? It's because it's a single character. It doesn't make any performance difference, because a single character takes the same amount of space as a token would. Sorry, there are a few places where this is a bit different. Not every curly brace is treated like this; in a variable-in-a-variable kind of situation this would not happen. Then you have a constant, a public, then again the equals sign, your number, and the semicolon, and then the closing curly brace. So this is what the lexer does. The lexer just replaces things with an identifier.

In the actual lexer code, this is how it looks. The lexer code is in a file called zend_language_scanner.l. So you might be curious: why is it .l? What the hell is this .l? It's a file that's input into a tool called re2c, which converts these regex kinds of rules into C code. I'm not sure why it's called .l though. They could really easily call it .r. So this is the lexer. It says if you see an "if", emit it as T_IF.
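What the lexer does to that example can be sketched with a few regular expressions. This is a toy, not the re2c rules from the real scanner: the token names are PHP's, but the integer ids, `RULES`, `KEYWORDS`, and `tokenize` are assumptions made up for the illustration, and whitespace tokens are dropped here only to keep the output short.

```python
import re

TOKEN_NAMES = ["T_OPEN_TAG", "T_WHITESPACE", "T_CLASS", "T_STRING",
               "T_PUBLIC", "T_VARIABLE", "T_LNUMBER", "T_IF", "T_ELSE"]
# Tokens are just integers; these particular numbers are illustrative.
TOKEN_IDS = {name: 258 + i for i, name in enumerate(TOKEN_NAMES)}

KEYWORDS = {"class": "T_CLASS", "public": "T_PUBLIC",
            "if": "T_IF", "else": "T_ELSE"}

RULES = re.compile(r"""
    (?P<T_OPEN_TAG><\?php)
  | (?P<T_WHITESPACE>\s+)
  | (?P<T_VARIABLE>\$[A-Za-z_]\w*)
  | (?P<T_LNUMBER>\d+)
  | (?P<T_STRING>[A-Za-z_]\w*)
  | (?P<CHAR>.)
""", re.VERBOSE)


def tokenize(source):
    tokens = []
    for match in RULES.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "T_WHITESPACE":
            continue              # a real lexer emits this token too
        if kind == "CHAR":
            tokens.append(text)   # single characters stand for themselves
            continue
        if kind == "T_STRING":
            # Keywords get their own token; other names stay T_STRING.
            kind = KEYWORDS.get(text.lower(), "T_STRING")
        tokens.append((TOKEN_IDS[kind], text))
    return tokens
```

For `<?php class A { public $b = 1; }` this yields integer-tagged tokens for the open tag, `class`, `A`, `public`, `$b`, and `1`, with `{`, `=`, `;`, and `}` passed through as plain characters.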
If it's an "elseif", T_ELSEIF, and so on. This is the simple section of the lexer; there are parts that get a bit more complicated, where it has to do a bit more to figure things out.

Then you come to the parser. So now you know what these tokens are, but you don't know if it's an if with an else, or without an else, or a short if, or those kinds of things. That's what the parser does. The parser creates something called an abstract syntax tree, which has all the nodes. It tells you this is an if, which has an expression and a statement. The statement is itself another definition like this one, which says it opens with a curly brace, it ends with a closing curly brace, and inside it can have these things. That's what the statement is. Now that you know what you have, you try to add it to your syntax tree. So you say, this is the if, this is the true side of the if, this is the false side of the if, and things like that. Yeah. Okay.

So you know all those things, and then you come to the compile step. In the compile step, it takes the abstract syntax tree and tries to figure out what opcodes to produce. Opcodes are the things that actually get run; that's what runs your code. So this is the opcode for an echo in PHP. And this is from VLD, if you haven't used it: the Vulcan Logic Dumper. It's a tool that you can feed PHP scripts to, and it produces the opcodes for you. So if you want to figure out whether isset or array_key_exists or something else is faster, you put it through VLD and you look at the opcodes. The opcodes will tell you what's faster.

There's a nice site called 3v4l, as in eval. So it's 3v4l.org. It's an interesting site where you can put PHP code in and it runs your code in over 200 PHP and HHVM versions.
So if you're looking for regressions or things like that, this is a pretty cool tool. Let me just show you this. So this is my code. I can look at performance on PHP 7.3, 7.2, 5.6, and all of those. And then there's this VLD tab. That's where the opcodes for this code are. These are the opcodes that are produced. This is actually produced by 7.1. I haven't figured out a way to produce the VLD output from, like, 5 on this site. It just generates it on the latest version. Not the latest version, actually; 7.1 is not the latest. Also, there's an interesting thing: this tool also has RFC branches. RFCs are requests for comments. Any new functionality added to PHP goes through an RFC, where people vote. And you can see the branches of RFCs. So you can actually run PHP on a version that nobody has even voted on yet, and see what interesting stuff it does.

So let me just show you. As you can see in the VLD output, the top stuff doesn't matter that much. Can you see this? You can see this, right? So this is the line number of the corresponding PHP code. This is the opcode number. So line three has two opcodes. I'll talk about the return later. And these are the opcodes that are going to run. So the first opcode is ECHO, and it echoes this. This is very simple. And there is a RETURN. I think almost everything in PHP opcodes has a return. If you don't have a return, it still has a return. That's a safe thing to say.

Now this is a function call. I made it green and red. So the red part is the function call and green is the execution of the function. Sorry, it's the other way around. The top one is where it calls the function, and the bottom one is the execution. So in the top one, if you can see, there's a NOP, which means no operation. I'm not sure why it's there. I'm sure it serves a purpose.
Then there's an INIT_FCALL, which is there to say now we are going to run a function. And SEND_VAL puts the data on a stack. So INIT_FCALL actually starts up a stack, and SEND_VAL puts that number 1 onto that stack. And then you actually do the function call. Then the red box gets invoked. There the first one is RECV. So you pull the number 1 off the stack and put it in variable zero. If you see exclamation point zero, !0, that's something called a compiled variable. It's not your typical notion of a variable in PHP; this is internal, very deep. So it receives the 1, assigns it to compiled variable zero, and returns compiled variable zero back to the caller. And as you can see, even though there's a return, there's still another return. There's always a return, in some way or form, I think. I'm not exactly sure why they always return, but they always return.

Let me see. So this is an if statement. Just looking at it: this code first has an ASSIGN, which assigns false to the flag. Then BOOL_NOT is the actual if condition that runs. It saves its result to variable two. The tilde means it's a temporary variable; the exclamation point means a compiled variable. The difference is that temporary variables are passed directly to other opcodes, as you will see in the next one. Then you have a jump instruction. The jump basically says: if ~2 is zero, jump to opcode five; if not, just keep executing. So in the true case, it should just echo "true", hit the next jump instruction, and return and go out. If it's false, the code jumps to opcode five, echoes, and then returns afterwards. That's that.

So this is a for loop, where it's kind of the same thing. It's just a bit more complicated in that it's jumping upwards rather than downwards. So it's actually looping, basically.
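The if-statement opcodes walked through above can be replayed on a tiny toy VM. This is a sketch under assumptions: the opcode names match the ones read off the VLD output, but the operand encoding, `run`, and `if_program` are invented for illustration, and the source is assumed to be something like `$flag = false; if (!$flag) { echo 'true'; } else { echo 'false'; }`.

```python
# Operands are (kind, payload): "cv" is a compiled variable (!n),
# "tmp" is a temporary (~n), "const" is a literal.

def if_program(flag):
    cv0, tmp2 = ("cv", 0), ("tmp", 2)
    return [
        ("ASSIGN", cv0, ("const", flag)),   # 0: $flag = ...
        ("BOOL_NOT", tmp2, cv0),            # 1: ~2 = !$flag
        ("JMPZ", tmp2, 5),                  # 2: if ~2 is falsy, go to 5
        ("ECHO", ("const", "true")),        # 3
        ("JMP", 6),                         # 4: skip the else branch
        ("ECHO", ("const", "false")),       # 5
        ("RETURN", ("const", 1)),           # 6: there is always a return
    ]


def run(opcodes):
    slots = {}     # storage for compiled variables and temporaries
    output = []
    pc = 0         # index of the next opcode to execute

    def read(operand):
        kind, payload = operand
        return payload if kind == "const" else slots[operand]

    while True:
        op, *args = opcodes[pc]
        pc += 1
        if op == "ASSIGN":
            slots[args[0]] = read(args[1])
        elif op == "BOOL_NOT":
            slots[args[0]] = not read(args[1])
        elif op == "JMPZ":
            if not read(args[0]):
                pc = args[1]
        elif op == "JMP":
            pc = args[0]
        elif op == "ECHO":
            output.append(str(read(args[0])))
        elif op == "RETURN":
            return output
        else:
            raise ValueError(f"unknown opcode {op}")
```

A loop needs nothing new in this model: it is the same jump opcodes with a target that is smaller than the current position, which is the "going upwards" mentioned for the for loop.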
So next is the demo. I've done some things with PHP. Let's see how it works. So I've made it specially for... okay. So I made some changes to the PHP syntax. First, you have false, and instead of it being if followed by the expression, I changed it so that the expression comes first and the if comes later. I just changed it for the hell of it, so I can show you some interesting things you can do. So this is mine; after my modification, this works. And the other version would serve the same purpose. Then I replaced my else with what? Pojio. Yes. I think it's fair, right? Fairly okay. I'm just using my code to rewrite PHP. Okay. Pojio is my else. I just replaced it.

Let's see how I did it. So this is the lexer. It's a pretty huge file. So what's the first thing I need if I want to do this in the lexer? What token? Pojio. Exactly. So if it's Pojio, I just say it's an else. Don't worry about it; it's an else. So basically I'm aliasing the else statement. Okay.

Then what should I do next? Yeah, the parser. So now PHP knows what this means. It knows this is an else, and it knows this is an if. Now we need to change the order. That's done in the parser code, not the scanner. Now, this is not the file. Yes, this one. Okay, where is it? I'm going to search for T_IF. Okay. So originally it looks like this. Let me try to set the syntax highlighting to something. Okay. So this is the typical rule in stock PHP: there is the T_IF, an expression, and then the statement. And then this part is just code; don't think about it. So this is the gist of everything that's there. What I did is very simple. I said my expression comes first, then there is a T_IF, and then there's the statement. This is the syntax in my world. And that's pretty much all you have to know.
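The two changes in the demo (aliasing a new word to the else token in the lexer, and reordering the pieces of the if rule in the parser) can be sketched as a toy. None of this is the real grammar: `lex` and `parse_if` are hypothetical, and conditions and blocks are treated as single pre-parsed chunks to keep the example small.

```python
# Toy version of the demo: "pojio" lexes to the same token as "else",
# and the parser rule accepts either "if EXPR STMT" or "EXPR if STMT".

def lex(words):
    keywords = {"if": "T_IF", "else": "T_ELSE", "pojio": "T_ELSE"}
    return [(keywords.get(word, "OTHER"), word) for word in words]


def parse_if(tokens, expr_first):
    tokens = list(tokens)

    def take(kind):
        found, text = tokens.pop(0)
        if found != kind:
            raise SyntaxError(f"expected {kind}, got {text!r}")
        return text

    # Only the order in which the pieces are consumed differs.
    if expr_first:
        cond = take("OTHER")
        take("T_IF")
    else:
        take("T_IF")
        cond = take("OTHER")
    then_branch = take("OTHER")
    else_branch = None
    if tokens:
        take("T_ELSE")
        else_branch = take("OTHER")
    # Either way the pieces land in the same slots of the same node.
    return {"kind": "if", "cond": cond,
            "then": then_branch, "else": else_branch}
```

Both spellings produce an identical node, which is why the later stages (opcode generation and execution) need no changes at all.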
And then you have your own brand new PHP functionality. Question?

Yeah. So when the lexer runs, it produces the tokens, and then you get the abstract syntax tree. So I thought that for an if you would get the condition, then the block for the if, and then the block for the else. In this case you are reversing them. So will this affect the building of the tree? Does the program actually cater for that as well?

Yeah. So just think of it as an AST, where it says this is my whole statement, and I identify these blocks and just dump them into this if node. So you can twist it around any way you want to. What you need to do is just identify the blocks. In an if statement, you know there's a condition, there's an else, and there's a true branch, in any order or fashion. Just take them and put them into these pockets. You can do it any way you want to. That's the rough idea. I'm just hand-waving some of this stuff so it's very easy to understand, but it's not all that simple. Okay.

Nobody saw that, right? Oh yeah. Yeah. Sorry, my bad. So this is my own version of PHP. I compile PHP and then I run it myself. So this is only for me. If I wanted to fork PHP, I could do that and then introduce my crazy PHP language. Yeah. There's actually no point in doing this, as you might say, but otherwise how can you have fun, right? Or, if you want to introduce some new language constructs into PHP, you obviously have to do these kinds of crazy things. I don't think anybody here actually compiles PHP by themselves, right? I have a few versions of PHP lying around, each with my hacky kind of things. So if you ever wanted to build PHP, you just clone the source. If you're on a Mac, it's a bit harder; you need to install some extensions.
After that you run buildconf, then you run configure with a prefix, then make, and then make install. And then you have your own PHP version to have fun with. Okay. So my PHP version is here. With this script, the else statement runs. And when you make it true, the true statement runs. It works. Okay. Anything else you want to know before I go ahead? Okay. Let's see.

Okay. So that's one thing I did. And then for fun I did something else. So this is Singaporean PHP. Okay. So I don't use the php open tag. I use Singapore to open every script I write. I say Singapore. And then you have the class lay, right? Lay and some class. And day in, day in, day in. I say day in and I'm so Singaporean, and I guess that's Chinese. Sorry. And then I have a boy a time. I throw it, and then my life is good, just for the fun of it. So yeah, you could run it and everything works perfectly. Actually, this is not needed. So you can change the language a bit, have some fun, and why not. When you want to learn the language, you can do these kinds of interesting things to figure out how it works and things like that. Yeah, I think that's that.

Yes, are there any questions after all of those things? Yeah.

Do opcodes always contain debug information, like the source code location or something? In production?

If you get it in production, you just run the opcode. So you need to actually look for it to get it. I don't know how to answer that. It's there, but yeah, it's there. Yeah, sorry, I couldn't answer it.

So you're familiar with GCC and LLVM, right? So what happens is, when you compile and you generate an opcode, you can actually pass in some flag to keep the tag, the line number, and the character position, or at least the line number. That way, when you look into it, the information will be there.
You generate a lot of stuff, so when you're debugging, you want to see that opcode. So when you run the program, you pass in some flag to say, okay, I want line numbers for debugging purposes.

Because when you get a stack trace, the stack trace already contains the line number, so it's already there. It might not be associated with the opcode, but you can get it. Yeah.

In the first or second slide, about the Zend cache: is that technology used in both PHP 5 and PHP 7? I think it has been there for a long time. So why does PHP 7 make that much difference?

Okay, so there are a few subtle things. Okay, let me see. It's just a matter of reducing the size of this. They tried quite hard to reduce the size of it. And they, I think, reduced it by probably 20, 30, probably more than that. I'm looking for the exact thing that gives the size information, but I can't find it right now. But the idea is to make the structure that is actually stored everywhere even smaller, so there are far fewer allocations and deallocations. That's the gist of the reason. But OpCache helps too. It definitely helps. And yeah, those kinds of things. Yeah. I could release this at the end. Yeah, that's pretty much all. Thanks.